Aggregating in SQL across multiple levels and along several dimensions using ROLLUP and GROUPING SETS
In preparation for the Oracle 7Up Workshop as well as the Oracle Quiz – The Water is Still Burning (presentation and paper) with Alex for the ODTUG 2006 conference in June in Washington DC, I revisited some old pals of mine: the ROLLUP and the GROUPING SETS aggregation operators. Let’s take a brief look at what again they can be used for. Note that I intentionally shy away from the CUBE operator for now…
Select statements have been perfectly capable of retrieving aggregate values. Maximum and Average Salaries, Count of Job occurrences, Minimum Hiredate: all hunky dory. There are two serious limitations though: all aggregations in a query have to be along the same dimension and at the same level. You cannot query in a single the query the Maximum Salary for all Salesmen and the Average Salary per Department. Nor can you query the average salary per Employee as well as per Hire Year.
In this article we discuss ROLLUP (8i) and GROUPING SETS (9i) that make both operations possible.
Crossing Dimensions using GROUPING SETS
Using Grouping Sets, introduced in Oracle 9i, you can aggregate along different dimensions without resorting to not so elegant and less than perfect performing UNION (ALL). The following query for example gives us the maximum salary per JOB as well as per DEPARTMENT:
select deptno<br />, job<br />, max(sal) max_sal<br />from emp<br />group<br />by grouping sets (deptno, job) <br />
The results suggest that PRESIDENT in Department 10 is the best paid position:
Grouping Sets are a quite flexible mechanism to determine exactly for which (combinations of) columns we want to have an aggregate value calculated. We can also add aggregates for the combination of DEPTNO and JOB:
select deptno<br />, job<br />, max(sal) max_sal<br />from emp<br />group<br />by grouping sets (deptno, job, (deptno, job))
Note: all these results can be thought of as to be constructed using multiple group by clauses in a single select-statement, like this:
select deptno<br />, job<br />, max(sal) max_sal<br />from emp<br />group<br />by deptno<br />group<br />by job <br />group<br />by job, deptno <br />
This is invalid syntax, but it illustrates what grouping sets are like when executed.
Aggregation at various levels of a hierarchy
Now for aggregation at various levels in a single query. If we want to retrieve the Average Salary for all employees as well as the Average Salary per Year and per Quarter and per Month in which employees have been hired, we are out of luck with the normal aggregation; that is: we need to use UNION ALL to combine several queries. However, using the ROLLUP operator, we can aggregate from multiple levels in a single query.
SELECT extract(year from hiredate) year<br />, 'Q'||to_char(hiredate, 'Q') quarter<br />, to_char(hiredate, 'MONTH') month<br />, count(empno) num<br />, avg(sal) avg_sal<br />FROM emp<br />GROUP<br />BY ROLLUP<br /> ( extract(year from hiredate)<br /> , 'Q'||to_char(hiredate, 'Q')<br /> , to_char(hiredate, 'MONTH')<br /> )<br />
This query returns the number of employees hired per month, quarter, year and in total – as well as the average salary at each of those levels:
With ROLLUP we specify to the SQL engine that we desire aggregations at each of the levels of a ‘hire-time hierarchy’ that stretches from Month via Quarter and Year to the Grand Total level. Note that the order of the expressions in the ROLLUP clause determines this hierarchy: the most detailed level is last.
The last record in the resultset is somewhat enigmatic. It only says 14 and 2073 – there is no clue as to what the year, quarter and month are. Does this mean we have 14 employees with NULL for a hiredate? Probably not, but to be sure, we can use the GROUPING function to determine for a record whether it contains an aggregation at a certain level:
SELECT case grouping(extract(year from hiredate))<br /> when 1<br /> then 'All Years'<br /> else to_char(extract(year from hiredate))<br /> end year<br />, case grouping('Q'||to_char(hiredate, 'Q'))<br /> when 1<br /> then 'All Quarters'<br /> else 'Q'||to_char(hiredate, 'Q')<br /> end quarter<br />, case grouping(to_char(hiredate, 'MONTH'))<br /> when 1<br /> then 'All Months'<br /> else to_char(hiredate, 'MONTH')<br /> end month<br />, count(empno) num<br />, avg(sal) avg_sal<br />FROM emp<br />GROUP<br />BY ROLLUP<br /> ( extract(year from hiredate)<br /> , 'Q'||to_char(hiredate, 'Q')<br /> , to_char(hiredate, 'MONTH')<br /> ) <br />
With this query, the results look like this:
- Summary results for all dates, including the ones that I do not have data for – example of using Partition Outer Join – Oracle10g SQL Feature
- Courtesy of Tom Kyte: Generating rows in SQL with the CUBE statement – no dummy table or table function required
- Proxy Servlet for AJAX requests to multiple, remote servers
- Analytical SQL Functions – theory and examples – Part 1 on the Partition Clause
- Making up records in SQL Queries – Table Functions and 10g Model clause