Anti-Search patterns – SQL to look for what is NOT there – Part Two

It is a strange thing really: querying the database to find missing data. Queries usually report on what is there. Not the queries in this article (and its prequel): they report on what is not there. We explore a number of approaches to uncovering missing information – because the fact that something is not found in itself is a fact we may want to know about. And it gives me a nice motive to demonstrate a number of fun SQL features.

We are still very much engaged in the EMP and DEPT tables in the SCOTT schema that back in the early ’80s was pivotal in demonstrating the capabilities of the first Oracle RDBMS releases. It still does an excellent job for most of my demos today. Must be me…

This article delves into urgent matters such as ‘which letters do not occur in the names of the employees’, ‘in which months of the year was no one hired’, ‘who in each department does not have a colleague with less experience (or a lower salary)’, ‘which employees do not have a colleague in the same job (either in their own department or in the entire company)’, and more.

Letters not used in any Employee Name

Our first search for information that is not there is a quest for the letters (from the alphabet) that are not used in any of the Employee Names in table EMP. We have 14 employees, each with a 4 to 8 letter ENAME. We wonder (yes, we do) which of our 26 letters in the western european alphabet is not used in any of those 14 names.

Several approaches to this puzzling challenge are available to us, and I am sure there are many more than I will demonstrate here. All of these make use of the alphabet query that was introduced in the article about static and somewhat dynamic domains: https://technology.amis.nl/blog/9577/retrieving-the-values-for-domains-in-sql-queries-in-the-oracle-rdbms-months-days-numbers-and-custom-value-sets.

The first approach creates one long string out of all the employee enames, then joins that string with every letter in the alphabet. Only rows where the letter does not occur in the string are returned, thereby giving us the list of letters not used in any ename:

with alphabet as
( select chr(64+rownum) letter
  from   dual
  connect
  by     rownum < 27
)
, employeenames as
( select listagg(ename,',') within group (order by sal) enames
  from   emp
)
select letter
from   alphabet
       cross join
       employeenames
where  instr( enames, letter ) = 0

 

The second approach, though similar, uses two special options: first a join condition that does not just look for equality in two columns but instead uses instr( ename, letter ) >0 . The second is the outer join with the specific filter for un-joined rows. Every letter that is not joined to any ENAME is returned by this query.

with alphabet as
( select chr(64+rownum) letter
  from   dual
  connect
  by     rownum < 27
)
select letter
from   alphabet
       left outer join
       emp
       on instr( ename, letter ) > 0
where  emp.rowid is null

The third approach uses the MINUS set operator to substract from the ALPHABET all letters that occur in any of the ENAMES – and note how the CONNECT BY is used to turn a string of characters into a rowset with a single row for each character in the string (it is almost pivoting, right there!).

unusedletters

with alphabet as
( select chr(64+rownum) letter
  from   dual
  connect
  by     rownum < 27
)
, employeenames as
( select listagg(ename,',') within group (order by sal) enames
  from   emp
)
,  ename_letters as
( select substr(enames, level, 1) letter
  from   employeenames
  connect
  by     level <= length(enames)
)
select letter
from   alphabet
minus
select letter
from   ename_letters

 

The jobs that a department does not contain

Another pressing matter is to find out which JOBs do not occur in each department. The EMP table has a JOB column that specifies for each employee which Job he or she is in. A simple query

select distinct job from emp

reveals the various jobs that the organization that is described by table EMP contains. Suppose we want to find out which of these various jobs do not occur in a specific department. We need to craft a query that returns for each department a comma separated values list of the jobs that are not represented in the department.

Again, the outer query comes in handy, to join each department with each job, even if the job is not represented in the department. In order to make this work, we need to use the partition by clause. This clause instructs the SQL engine to perform the outer join for each department again. Every JOB therefore is present in the end result, not just once, but once for every department. Next, we will filter on all joins of JOB for a DEPARTMENT where an outer join was required – or where the DEPARTMENT does not have an EMP row to do a real join.

select distinct
       deptno
,      position
from   emp
       partition by (deptno)
       right outer join
       ( select distinct job position
         from   emp
       )  functions
       on (emp.job = functions.position)
where  emp.rowid is null
order
by     deptno
,      position

unusedjobs

The next step is to turn this query result into the comma separated aggregate that was requested.

select deptno
,      listagg(position, ',') within group (order by position) unfilled_jobs
from ( select deptno
       ,      position
       from   emp
              partition by (deptno)
              right outer join
              ( select distinct job position
                from   emp
              )  functions
              on (emp.job = functions.position)
       where  emp.rowid is null
     )
group
by     deptno
order
by     deptno

aggunusedjobs

 

 

The rarest jobs

A simple query to find the most rare job (in terms of the largest number of departments not having that job), which is the same thing as the smallest number of departments having that job:

 

select job
,      count(distinct deptno) num_of_depts
from   emp
group
by     job
order
by     num_of_depts

rare jobs

 

In which months was no one hired

Every EMLOYEE has a hiredate. From the HIREDATE column, it is easy to find out in which year and month someone was hired. However, in search of more information not there, we would like to find out in which months there was no one hired at all.

The pattern is emerging: outer join and filter on the virtual join cases:

with months as
( select 0 + rownum month_num
  ,      to_char( to_date('01-'||rownum||'-2010', 'DD-MM-YYYY'), 'MONTH') month_name
  from   dual
  connect
  by     level < 13
)
select months.month_name
from   emp e
       right outer join
       months
       on extract(month from e.hiredate)= months.month_num
where  e.rowid is null
order
by     months.month_num

We make use of the MONTHS domain, also discussed in the article about static and somewhat dynamic domains: https://technology.amis.nl/blog/9577/retrieving-the-values-for-domains-in-sql-queries-in-the-oracle-rdbms-months-days-numbers-and-custom-value-sets.

A somewhat alternative approach uses simple aggregates and a filter on the employee-hire count:

with months as
( select 0 + rownum month_num
  ,      to_char( to_date('01-'||rownum||'-2010', 'DD-MM-YYYY'), 'MONTH') month_name
  from   dual
  connect
  by     level < 13
)
select months.month_name
,      count(empno) "number of hires"
from   emp e
       right outer join
       months
       on extract(month from e.hiredate)= months.month_num
having count(empno) = 0
group
by     months.month_name
,      months.month_num
order
by     months.month_num

 

This next elaborate query shows the number of hired employees in each month per department. Again, using the PARTITION BY clause, the outer join has been applied to every department, instead of just once for every row in the MONTHS set. The result of this query would help to get an answer two the question: for each department – give me the months in which no employee was hired.

with months as
( select 0 + rownum month_num
  ,      to_char( to_date('01-'||rownum||'-2010', 'DD-MM-YYYY'), 'MONTH') month_name
  from   dual
  connect
  by     level < 13
)
select months.month_name
,      deptno department
,      count(empno) "number of hires"
from   emp e
       partition by (deptno)
       right outer join
       months
       on extract(month from e.hiredate)= months.month_num
group
by     months.month_name
,      months.month_num
,      deptno
order
by     months.month_num
,      deptno

Who in each department does not have a colleague with less experience

One way of reading this section title is: finding a record (EMP) for which there is not a next record (EMP) [where hiredate is more recent]. When a query can be expressed in terms of a next record, you should immediately think Analytical Functions, and more specifically: LEAD (and LAG).

Using lead we can look for the next less experienced colleague for each employee:

 select ename
 ,      lead(ename) over (partition by deptno order by hiredate asc) next_less_experienced_in_dept
 ,      deptno
 ,      hiredate
 from   emp
 order
 by     deptno
 ,      hiredate

Then we filter on those rows for which next_less_experienced_in_dept is null to find the most junior employees in each department.

 select ename
 ,      deptno
 from   (  select ename
           ,      lead(ename) over (partition by deptno order by hiredate asc) next_less_experienced_in_dept
           ,      deptno
           from   emp
        )
 where  next_less_experienced_in_dept is null

Now it is easy to check if these least experienced employees are also the least-earners:

 select ename
 ,      deptno
 from   (  select ename
           ,      lead(ename) over (partition by deptno order by sal desc) next_less_paid_in_dept
           ,      deptno
           from   emp
        )
 where  next_less_paid_in_dept is null

 

It turns out that MILLER and JAMES make an appearance in both result sets, ADAMS is least experiences in 20 yet SMITH makes less.

An alternative approach to the ‘find least experienced employees’, and even more compact, is this query:

 select deptno
 ,      max(ename) keep (dense_rank last order by hiredate asc)
 from   emp
 group
 by     deptno

This query selects for each department the ename for the record that is ranked last when ordered by hiredate. I have never quite gotten used to this syntax, but it can be quite useful.

The equivalent for salary should be obvious:

 select deptno
 ,      max(ename) keep (dense_rank last order by sal desc)
 from   emp
 group
 by     deptno

No Colleagues

Our final question in this article: which employees do not have a colleague in the same job (either in their own department or in the entire company)

 select ename
 ,      deptno
 ,      job
 from   (  select ename
           ,      count(*) over (partition by job, deptno) -1  num_of_colleagues
           ,      deptno
           ,      job
           from   emp
        )
 where  num_of_colleagues = 0

It turns out that it is lonely both at the top and at the bottom (PRESIDENT, MANAGER and CLERK).

lonelytop

2 Comments

  1. Lucas Jellema December 29, 2010
  2. Laurent Schneider December 27, 2010