Anti-Search patterns - SQL to look for what is NOT there - Part One javacode 9085791

Anti-Search patterns – SQL to look for what is NOT there – Part One

Usually in SQL queries, you will try to retrieve information from a database by reading specific records from the relevant tables. However, there are situations where what you want to learn from a query is the fact that something is NOT there. So instead of querying for existing records, you are looking for records that do not exist. This article discusses a number of such situations and demonstrates several “anti-queries”. Where possible, I have used the EMP and DEPT table of the classical SCOTT sample schema to illustrate the point.

Some of these examples are perhaps better used to illustrate specific SQL features – such as CONNECT_BY_ISLEAF, PARTITION OUTER JOIN, LISTAGG, MINUS, SCALAR SUBQUERY and more – than to answer the questions at hand.

Manager without subordinates – Leaf Nodes

The MGR column in the EMP table is used to reference the manager of an employee. It is therefore quite easy to find the employees without a manager (where mgr is null). The reverse is not as simple as that: find employees that are not somebody’s manager. There is no reference from the manager to her staff in the table.

To find out which employees are in fact not anybody’s manager, there are several approaches possible.

Not Exists

A very straightforward approach, using NOT EXISTS:

select ename
from   emp e
where  not exists ( select empno
                    from   emp s
                    where  s.mgr = e.empno
                  )

Scalar Subquery

With a Scalar Subquery we can achieve something similar. However, because we use a more crude approach – instead of stating exactly what we want as we do with NOT EXISTS, we have the database calculate the count and only then filter on the count result being zero – chances are that this approach does not perform as well. Of course if the functional requirement shifts from ‘find employees with no subordinates’ to ‘ find employees with zero or one subordinate’, this approach can still be used.

select ename
from   emp e
where  ( select count(*)
         from emp s
         where  s.mgr = e.empno
       ) = 0

Minus

The MINUS set operator can be used to substract one set of rows from another. We can use this operator for answer our question here in the following way:

select ename
from   emp
minus
select m.ename
from   emp s
       join
       emp m
       on (s.mgr = m.empno)

First create a set with all [names of] employees in the company. Then create the list of [names of] employees that are someone’s manager. By substracting the second set from the first, we extract those employees that are not in the set of employees who are someone’s manager – and that are therefore NOT someone’s manager.

Instr

The following query uses a roundabout way: first create a comma separated string with all the empnos of the managers in table emp, next return the employees whose empno does not exist in that comma separated string:

select ename
from   emp
where  instr( ( select listagg(mgr,',') within group (order by mgr)
                from   emp
              )
            , empno) = 0

Hierarchical Query

The relationship between Employees and their manager is a hierarchical one. In order to find the employees ‘at the bottom of the foodchain’ , we can use an hierarchical query and its facilities for identifying leaf nodes.

The connect_by_isleaf function can be used in an hierarchical query based on the CONNECT BY syntax – the long standing way of doing hierarchical queries in Oracle:

with emp_hierarchy as
( select ename
  ,      connect_by_isleaf leaf
  from   emp
  connect
  by     prior empno = mgr
  start
  with   mgr is null
)
select ename
from   emp_hierarchy
where  leaf = 1

As of Oracle 11g, the new way of doing hierarchical queries – also the ANSI SQL way – uses recursive subquery factoring. With this approach, there is no connect_by_isleaf function. The following query uses ‘modern’ hierarchical querying locates the the leaf nodes with a specific construction:

with employees (empno, name, mgr, lev) as
( select empno, ename, mgr, 1
  from   emp
  where  mgr is null
  union all
  select e.empno, e.ename
  ,      e.mgr, m.lev + 1
  from   emp e join employees m
         on (m.empno = e.mgr)
)
SEARCH DEPTH FIRST BY mgr SET seq
, employees_with_leaf as
( select e.*
  ,      case
         when (lev - lead(lev) over (order by seq)) < 0
         then 0
         else 1
         end is_leaf
  from   employees e
)
select name
from   employees_with_leaf
where  is_leaf = 1

Anti Join

Joins are typically used to combine records. Joins can readily be used to locate those records that cannot be combined. The following query uses an outer join to link all candidate managers (every employee in table EMP) to to their subordinates. The left outer join is used to also return candidate managers that are not referenced by any subordinate.

Every join returns a combination of a manager record with an employee (subordinate) record. Only for those candidate managers that do not have a subordinate – and are returned through the outer join – is the combination incomplete: the manager record is not joined at all. We find those incomplete combinations when we check for the rowid of the employee half of the combination. By filtering on the combinations where the employee rowid is null, we find the candidate managers without subordinates.

select m.ename
from   emp m
       left outer join
       emp e
       on (m.empno = e.mgr)
where  e.rowid is null

The Sequel

In the sequel to this article, we will answer burning questions like: which letters do not occur in the names of the employees, in which months of the year was no one hired, who in each department does not have a colleague with less experience (or a lower salary), which employees do not have a colleague in the same job (either in their own department or in the entire company), and more. Stay tuned.

Speaking of tuning: it is nice of course to have seven different ways – and there undoubtedly many more – to find out about managers who are not managers at all due to a lack of subordinates. However, the question is still open which approach to use. It would be interesting to compare these methods – not just by SQL elegance and query intuivity – but also by performance consequences and execution planning. If anyone feels like putting these approaches to the test, comparing the execution plans for a very large EMP table with or without constraints and indexes, then I would be most interested in taking a look at the outcomes.

4 Comments

  1. Lucas Jellema December 22, 2010
  2. Laurent Schneider December 20, 2010
  3. Timo Raitalaakso December 20, 2010
  4. Anton Scheffer December 20, 2010