Anti-Search patterns – SQL to look for what is NOT there – Part One
Usually in SQL queries, you will try to retrieve information from a database by reading specific records from the relevant tables. However, there are situations where what you want to learn from a query is the fact that something is NOT there. So instead of querying for existing records, you are looking for records that do not exist. This article discusses a number of such situations and demonstrates several “anti-queries”. Where possible, I have used the EMP and DEPT table of the classical SCOTT sample schema to illustrate the point.
Some of these examples are perhaps better used to illustrate specific SQL features – such as CONNECT_BY_ISLEAF, PARTITION OUTER JOIN, LISTAGG, MINUS, SCALAR SUBQUERY and more – than to answer the questions at hand.
Manager without subordinates – Leaf Nodes
The MGR column in the EMP table is used to reference the manager of an employee. It is therefore quite easy to find the employees without a manager (where mgr is null). The reverse is not as simple as that: find employees that are not somebody’s manager. There is no reference from the manager to her staff in the table.
To find out which employees are in fact not anybody’s manager, there are several approaches possible.
A very straightforward approach, using NOT EXISTS:
select ename from emp e where not exists ( select empno from emp s where s.mgr = e.empno )
With a Scalar Subquery we can achieve something similar. However, because we use a more crude approach – instead of stating exactly what we want as we do with NOT EXISTS, we have the database calculate the count and only then filter on the count result being zero – chances are that this approach does not perform as well. Of course if the functional requirement shifts from ‘find employees with no subordinates’ to ‘ find employees with zero or one subordinate’, this approach can still be used.
select ename from emp e where ( select count(*) from emp s where s.mgr = e.empno ) = 0
The MINUS set operator can be used to substract one set of rows from another. We can use this operator for answer our question here in the following way:
select ename from emp minus select m.ename from emp s join emp m on (s.mgr = m.empno)
First create a set with all [names of] employees in the company. Then create the list of [names of] employees that are someone’s manager. By substracting the second set from the first, we extract those employees that are not in the set of employees who are someone’s manager – and that are therefore NOT someone’s manager.
The following query uses a roundabout way: first create a comma separated string with all the empnos of the managers in table emp, next return the employees whose empno does not exist in that comma separated string:
select ename from emp where instr( ( select listagg(mgr,',') within group (order by mgr) from emp ) , empno) = 0
The relationship between Employees and their manager is a hierarchical one. In order to find the employees ‘at the bottom of the foodchain’ , we can use an hierarchical query and its facilities for identifying leaf nodes.
The connect_by_isleaf function can be used in an hierarchical query based on the CONNECT BY syntax – the long standing way of doing hierarchical queries in Oracle:
with emp_hierarchy as ( select ename , connect_by_isleaf leaf from emp connect by prior empno = mgr start with mgr is null ) select ename from emp_hierarchy where leaf = 1
As of Oracle 11g, the new way of doing hierarchical queries – also the ANSI SQL way – uses recursive subquery factoring. With this approach, there is no connect_by_isleaf function. The following query uses ‘modern’ hierarchical querying locates the the leaf nodes with a specific construction:
with employees (empno, name, mgr, lev) as ( select empno, ename, mgr, 1 from emp where mgr is null union all select e.empno, e.ename , e.mgr, m.lev + 1 from emp e join employees m on (m.empno = e.mgr) ) SEARCH DEPTH FIRST BY mgr SET seq , employees_with_leaf as ( select e.* , case when (lev - lead(lev) over (order by seq)) < 0 then 0 else 1 end is_leaf from employees e ) select name from employees_with_leaf where is_leaf = 1
Joins are typically used to combine records. Joins can readily be used to locate those records that cannot be combined. The following query uses an outer join to link all candidate managers (every employee in table EMP) to to their subordinates. The left outer join is used to also return candidate managers that are not referenced by any subordinate.
Every join returns a combination of a manager record with an employee (subordinate) record. Only for those candidate managers that do not have a subordinate – and are returned through the outer join – is the combination incomplete: the manager record is not joined at all. We find those incomplete combinations when we check for the rowid of the employee half of the combination. By filtering on the combinations where the employee rowid is null, we find the candidate managers without subordinates.
select m.ename from emp m left outer join emp e on (m.empno = e.mgr) where e.rowid is null
In the sequel to this article, we will answer burning questions like: which letters do not occur in the names of the employees, in which months of the year was no one hired, who in each department does not have a colleague with less experience (or a lower salary), which employees do not have a colleague in the same job (either in their own department or in the entire company), and more. Stay tuned.
Speaking of tuning: it is nice of course to have seven different ways – and there undoubtedly many more – to find out about managers who are not managers at all due to a lack of subordinates. However, the question is still open which approach to use. It would be interesting to compare these methods – not just by SQL elegance and query intuivity – but also by performance consequences and execution planning. If anyone feels like putting these approaches to the test, comparing the execution plans for a very large EMP table with or without constraints and indexes, then I would be most interested in taking a look at the outcomes.
- Using a SelectMany… component in a JHeadstart Advanced Search Form – search for all CLERKS & SALESMEN
- Oracle RDBMS 11gR2 – goodbye Connect By or: the end of hierarchical querying as we know it
- Oracle RDBMS 11gR2 – new style hierarchical querying using Recursive Subquery Factoring
- Writing a Word Search puzzle solver in SQL
- Using default search values on the JHeadstart advanced search
- Oracle SQL – Finding free adjacent seats in an airplane, using Subquery Factoring, Analytical Functions (LISTAGG and LEAD), Outer Join, PIVOT and good old INSTR
- Oracle SQL – spotting combinations that occur and those that do not – demonstrating Analytical Functions, Outer Join and SubQuery Factoring
- Oracle SQL: Using subquery factoring in an INSERT statement
- OTN Yathra 2013 – Spreading the story of Oracle across India – (Half time)
- ADF DVT Speed Date: Present Metrics per Year, Quarter and Month using a zoom-enabled ADF DVT Resource Utilization Gantt and ADF BC
- Using TRUNC in SQL to get the first date in a period
- ADF DVT: Visualizing valid periods using Project and Scheduling Gantt Charts
- Advanced SQL to find valid periods – juggling with outer joins, running totals and analytical functions
- The Very Very Latest in Database Development – slides from the Expertezed presentation
- AMIS Masterclass Advanced SQL – scherp je SQL skills in een pittige en praktijkgerichte dag – maandag 10 december