Summary results for all dates, including the ones that I do not have data for - example of using Partition Outer Join - Oracle10g SQL Feature Oracle Headquarters Redwood Shores1 e1698667100526

Summary results for all dates, including the ones that I do not have data for – example of using Partition Outer Join – Oracle10g SQL Feature

Trying to make myself useful by answering some of the questions on the OTN SQL and PL/SQL Forum, I came across this question: Group Above Report problem…not able to display absent rows in date range. The question could be translated to:

I am trying to aggregate records by date and I want results for every date in a certain range. However, I am missing some dates – because there is no data at all for these dates.

I was hooked, since it is a common situation. Of course, you cannot expect the database to return a result for a certain date if there is no data for that date. That is logical. So instead of querying for all records where date between start_of_date_range and end_of_date range – which will only give you agregation results for all the dates that are actually represented in your base set, you need to be more explicit about the dates you are actually interested in.
Let’s translate this problem to EMP. We want to see the number of employees hired in each of the years 1979 through 1985. And we want the count per department. Our very first attempt:

select deptno
,      extract(year from hiredate) hireyear
,      count(empno)
from   emp
where  extract(year from hiredate) between 1979 and 1985
group
by     deptno
,      extract(year from hiredate)
order
by     deptno
,      hireyear
/
    DEPTNO   HIREYEAR COUNT(EMPNO)
---------- ---------- ------------
        10       1981            2
        10       1982            1
        20       1980            1
        20       1981            2
        20       1982            1
        20       1983            1
        30       1981            6

I do not see results for 1984 at all and for departments 10 and 30 other years are lacking as well: 1980 and 1983. So I somehow must instruct the database to provide results for every year in my desired range. To do so, I must explicitly create all year-records for that range. I could perhaps query all years from a table – if there is such a table. I could also make use of a Table Function.But an easy solution is the following – based on Anton’s comment on the post Courtesy of Tom Kyte: Generating rows in SQL with the CUBE statement – no dummy table or table function required

This statement will give me the years I require:

select 1979 + years.year  year
from   ( select  rownum year
         from    dual
         connect
         by      rownum < 6 ) years
/
      YEAR
----------
      1980
      1981
      1982
      1983
      1984
      1985

I can use this query as in-line view for my summary:

 with years as
 ( select 1979 + years.year  year
 from   ( select  rownum year
          from    dual
          connect
          by      rownum < 6 ) years
 )
 select deptno
 ,      years.year hireyear
 ,      count(empno)
 from   emp
        right outer join
        years
        on (extract (year from hiredate) = years.year)
 group
 by     deptno
 ,      years.year
 order
 by     deptno
 ,      hireyear
/
    DEPTNO   HIREYEAR COUNT(EMPNO)
---------- ---------- ------------
        10       1981            2
        10       1982            1
        20       1980            1
        20       1981            2
        20       1982            1
        20       1983            1
        30       1981            6
                 1984            0
                 1985            0

The results are clearly somewhat more encouraging. This time we have records for 1984 and 1985. However, departments 10 and 20 still do not have results for 1980 or 1983. The outer join does nothing more than ensuring that we will have at least one record for every year. If department 20 provides a record for 1980 and 1983, the outer join does nothing to provide additional records for other departments as well.

And now finally to the Oracle 10g Partition Outer Join clause. This clause allows us to exactly solve the problem we are now facing: we want the outer join to be performed not just once but for every department again! Look at the following query:

with years as
( select 1979 + years.year  year
from   ( select  rownum year
         from    dual
         connect
         by      rownum < 6 ) years
)
select deptno
,      years.year hireyear
,      count(empno)
from   emp partition by (deptno)
       right outer join
       years
       on (extract (year from hiredate) = years.year)
group
by     deptno
,      years.year
order
by     deptno
,      hireyear
/
    DEPTNO   HIREYEAR COUNT(EMPNO)
---------- ---------- ------------
        10       1980            0
        10       1981            2
        10       1982            1
        10       1983            0
        10       1984            0
        10       1985            0
        20       1980            1
        20       1981            2
        20       1982            1
        20       1983            1
        20       1984            0
        20       1985            0
        30       1980            0
        30       1981            6
        30       1982            0
        30       1983            0
        30       1984            0
        30       1985            0

All this is brought about by adding 'partition by (deptno) to the EMP table. That simple clause instructs the OUTER JOIN to be performed on each department-partition of the EMP table.

If you do not happen to be on 10g just yet, you can simulate the Partition Outer Join with code like this:

with years as
( select 1979 + years.year  year
from   ( select  rownum year
         from    dual
         connect
         by      rownum < 6 ) years
)
select depts.deptno
,      years.year hireyear
,      count(empno)
from   (select distinct deptno
        from   emp
       ) depts
       cross join
       years
       left outer join
       emp staff
       on (extract (year from staff.hiredate) = years.year and depts.deptno = staff.deptno)
group
by     depts.deptno
,      years.year
order
by     depts.deptno
,      hireyear

Here I have cross joined a list of all departments with the list of all years. That means that I have set up as yet empty cells for every combination of department (deptno) and year. I then left outer join that grid of empty cells with the actual summary results of aggregating over emp. This will fill in as many cells as possible.