Trying to make myself useful by answering some of the questions on the OTN SQL and PL/SQL Forum, I came across this question: Group Above Report problem…not able to display absent rows in date range. The question could be translated to:
I am trying to aggregate records by date and I want results for every date in a certain range. However, I am missing some dates – because there is no data at all for these dates.
I was hooked, since it is a common situation. Of course, you cannot expect the database to return a result for a certain date if there is no data for that date. That is logical. So instead of querying for all records where date between start_of_date_range and end_of_date range – which will only give you agregation results for all the dates that are actually represented in your base set, you need to be more explicit about the dates you are actually interested in.
Let’s translate this problem to EMP. We want to see the number of employees hired in each of the years 1979 through 1985. And we want the count per department. Our very first attempt:
select deptno , extract(year from hiredate) hireyear , count(empno) from emp where extract(year from hiredate) between 1979 and 1985 group by deptno , extract(year from hiredate) order by deptno , hireyear / DEPTNO HIREYEAR COUNT(EMPNO) ---------- ---------- ------------ 10 1981 2 10 1982 1 20 1980 1 20 1981 2 20 1982 1 20 1983 1 30 1981 6
I do not see results for 1984 at all and for departments 10 and 30 other years are lacking as well: 1980 and 1983. So I somehow must instruct the database to provide results for every year in my desired range. To do so, I must explicitly create all year-records for that range. I could perhaps query all years from a table – if there is such a table. I could also make use of a Table Function.But an easy solution is the following – based on Anton’s comment on the post Courtesy of Tom Kyte: Generating rows in SQL with the CUBE statement – no dummy table or table function required
This statement will give me the years I require:
select 1979 + years.year year from ( select rownum year from dual connect by rownum < 6 ) years / YEAR ---------- 1980 1981 1982 1983 1984 1985
I can use this query as in-line view for my summary:
with years as ( select 1979 + years.year year from ( select rownum year from dual connect by rownum < 6 ) years ) select deptno , years.year hireyear , count(empno) from emp right outer join years on (extract (year from hiredate) = years.year) group by deptno , years.year order by deptno , hireyear / DEPTNO HIREYEAR COUNT(EMPNO) ---------- ---------- ------------ 10 1981 2 10 1982 1 20 1980 1 20 1981 2 20 1982 1 20 1983 1 30 1981 6 1984 0 1985 0
The results are clearly somewhat more encouraging. This time we have records for 1984 and 1985. However, departments 10 and 20 still do not have results for 1980 or 1983. The outer join does nothing more than ensuring that we will have at least one record for every year. If department 20 provides a record for 1980 and 1983, the outer join does nothing to provide additional records for other departments as well.
And now finally to the Oracle 10g Partition Outer Join clause. This clause allows us to exactly solve the problem we are now facing: we want the outer join to be performed not just once but for every department again! Look at the following query:
with years as ( select 1979 + years.year year from ( select rownum year from dual connect by rownum < 6 ) years ) select deptno , years.year hireyear , count(empno) from emp partition by (deptno) right outer join years on (extract (year from hiredate) = years.year) group by deptno , years.year order by deptno , hireyear / DEPTNO HIREYEAR COUNT(EMPNO) ---------- ---------- ------------ 10 1980 0 10 1981 2 10 1982 1 10 1983 0 10 1984 0 10 1985 0 20 1980 1 20 1981 2 20 1982 1 20 1983 1 20 1984 0 20 1985 0 30 1980 0 30 1981 6 30 1982 0 30 1983 0 30 1984 0 30 1985 0
All this is brought about by adding 'partition by (deptno) to the EMP table. That simple clause instructs the OUTER JOIN to be performed on each department-partition of the EMP table.
If you do not happen to be on 10g just yet, you can simulate the Partition Outer Join with code like this:
with years as ( select 1979 + years.year year from ( select rownum year from dual connect by rownum < 6 ) years ) select depts.deptno , years.year hireyear , count(empno) from (select distinct deptno from emp ) depts cross join years left outer join emp staff on (extract (year from staff.hiredate) = years.year and depts.deptno = staff.deptno) group by depts.deptno , years.year order by depts.deptno , hireyear
Here I have cross joined a list of all departments with the list of all years. That means that I have set up as yet empty cells for every combination of department (deptno) and year. I then left outer join that grid of empty cells with the actual summary results of aggregating over emp. This will fill in as many cells as possible.