Aggregating in SQL across multiple levels and along several dimensions using ROLLUP and GROUPING SETS

In preparation for the Oracle 7Up Workshop as well as the Oracle Quiz – The Water is Still Burning (presentation and paper) with Alex for the ODTUG 2006 conference in June in Washington DC, I revisited some old pals of mine: the ROLLUP and the GROUPING SETS aggregation operators. Let’s take a brief look at what again they can be used for. Note that I intentionally shy away from the CUBE operator for now…

Select statements have been perfectly capable of retrieving aggregate values. Maximum and Average Salaries, Count of Job occurrences, Minimum Hiredate: all hunky dory. There are two serious limitations though: all aggregations in a query have to be along the same dimension and at the same level. You cannot query in a single the query the Maximum Salary for all Salesmen and the Average Salary per Department. Nor can you query the average salary per Employee as well as per Hire Year.

In this article we discuss ROLLUP (8i) and GROUPING SETS (9i) that make both operations possible.

....

Crossing Dimensions using GROUPING SETS

Using Grouping Sets, introduced in Oracle 9i, you can aggregate along different dimensions without resorting to not so elegant and less than perfect performing UNION (ALL). The following query for example gives us the maximum salary per JOB as well as per DEPARTMENT:

select deptno
, job
, max(sal) max_sal
from emp
group
by grouping sets (deptno, job)

The results suggest that PRESIDENT in Department 10 is the best paid position:
Aggregating in SQL across multiple levels and along several dimensions using ROLLUP and GROUPING SETS sqlRollup1 
Grouping Sets are a quite flexible mechanism to determine exactly for which (combinations of) columns we want to have an aggregate value calculated. We can also add aggregates for the combination of DEPTNO and JOB:

select deptno
, job
, max(sal) max_sal
from emp
group
by grouping sets (deptno, job, (deptno, job))

Aggregating in SQL across multiple levels and along several dimensions using ROLLUP and GROUPING SETS sqlRollup2
Note: all these results can be thought of as to be constructed using multiple group by clauses in a single select-statement, like this:

select deptno
, job
, max(sal) max_sal
from emp
group
by deptno
group
by job
group
by job, deptno

This is invalid syntax, but it illustrates what grouping sets are like when executed.

Aggregation at various levels of a hierarchy

 

Now for aggregation at various levels in a single query. If we want to retrieve the Average Salary for all employees as well as the Average Salary per Year and per Quarter and per Month in which employees have been hired, we are out of luck with the normal aggregation; that is: we need to use UNION ALL to combine several queries. However, using the ROLLUP operator, we can aggregate from multiple levels in a single query.

SELECT extract(year from hiredate) year
, 'Q'||to_char(hiredate, 'Q') quarter
, to_char(hiredate, 'MONTH') month
, count(empno) num
, avg(sal) avg_sal
FROM emp
GROUP
BY ROLLUP
( extract(year from hiredate)
, 'Q'||to_char(hiredate, 'Q')
, to_char(hiredate, 'MONTH')
)

This query returns the number of employees hired per month, quarter, year and in total – as well as the average salary at each of those levels:
 Aggregating in SQL across multiple levels and along several dimensions using ROLLUP and GROUPING SETS sqlRollup3
With ROLLUP we specify to the SQL engine that we desire aggregations at each of the levels of a ‘hire-time hierarchy’ that stretches from Month via Quarter and Year to the Grand Total level. Note that the order of the expressions in the ROLLUP clause determines this hierarchy: the most detailed level is last.

The last record in the resultset is somewhat enigmatic. It only says 14 and 2073 – there is no clue as to what the year, quarter and month are. Does this mean we have 14 employees with NULL for a hiredate? Probably not, but to be sure, we can use the GROUPING function to determine for a record whether it contains an aggregation at a certain level:

SELECT case grouping(extract(year from hiredate))
when 1
then 'All Years'
else to_char(extract(year from hiredate))
end year
, case grouping('Q'||to_char(hiredate, 'Q'))
when 1
then 'All Quarters'
else 'Q'||to_char(hiredate, 'Q')
end quarter
, case grouping(to_char(hiredate, 'MONTH'))
when 1
then 'All Months'
else to_char(hiredate, 'MONTH')
end month
, count(empno) num
, avg(sal) avg_sal
FROM emp
GROUP
BY ROLLUP
( extract(year from hiredate)
, 'Q'||to_char(hiredate, 'Q')
, to_char(hiredate, 'MONTH')

With this query, the results look like this:
Aggregating in SQL across multiple levels and along several dimensions using ROLLUP and GROUPING SETS sqlRollup4 

One Response

  1. Viliam April 19, 2006