Aggregating in SQL across multiple levels and along several dimensions using ROLLUP and GROUPING SETS

1

In preparation for the Oracle 7Up Workshop as well as the Oracle Quiz – The Water is Still Burning (presentation and paper) with Alex for the ODTUG 2006 conference in June in Washington DC, I revisited some old pals of mine: the ROLLUP and the GROUPING SETS aggregation operators. Let’s take a brief look at what again they can be used for. Note that I intentionally shy away from the CUBE operator for now…

Select statements have been perfectly capable of retrieving aggregate values. Maximum and Average Salaries, Count of Job occurrences, Minimum Hiredate: all hunky dory. There are two serious limitations though: all aggregations in a query have to be along the same dimension and at the same level. You cannot query in a single the query the Maximum Salary for all Salesmen and the Average Salary per Department. Nor can you query the average salary per Employee as well as per Hire Year.

In this article we discuss ROLLUP (8i) and GROUPING SETS (9i) that make both operations possible.

....

Crossing Dimensions using GROUPING SETS

Using Grouping Sets, introduced in Oracle 9i, you can aggregate along different dimensions without resorting to not so elegant and less than perfect performing UNION (ALL). The following query for example gives us the maximum salary per JOB as well as per DEPARTMENT:

select deptno<br />,      job<br />,      max(sal) max_sal<br />from   emp<br />group<br />by     grouping sets (deptno, job) <br />

The results suggest that PRESIDENT in Department 10 is the best paid position:
 
Grouping Sets are a quite flexible mechanism to determine exactly for which (combinations of) columns we want to have an aggregate value calculated. We can also add aggregates for the combination of DEPTNO and JOB:

select deptno<br />,      job<br />,      max(sal) max_sal<br />from   emp<br />group<br />by     grouping sets (deptno, job, (deptno, job))


Note: all these results can be thought of as to be constructed using multiple group by clauses in a single select-statement, like this:

select deptno<br />,      job<br />,      max(sal) max_sal<br />from   emp<br />group<br />by     deptno<br />group<br />by     job <br />group<br />by     job, deptno <br />

This is invalid syntax, but it illustrates what grouping sets are like when executed.

Aggregation at various levels of a hierarchy

 

Now for aggregation at various levels in a single query. If we want to retrieve the Average Salary for all employees as well as the Average Salary per Year and per Quarter and per Month in which employees have been hired, we are out of luck with the normal aggregation; that is: we need to use UNION ALL to combine several queries. However, using the ROLLUP operator, we can aggregate from multiple levels in a single query.

SELECT extract(year from hiredate) year<br />,      'Q'||to_char(hiredate, 'Q') quarter<br />,      to_char(hiredate, 'MONTH')  month<br />,      count(empno) num<br />,      avg(sal) avg_sal<br />FROM   emp<br />GROUP<br />BY     ROLLUP<br />       ( extract(year from hiredate)<br />       , 'Q'||to_char(hiredate, 'Q')<br />       , to_char(hiredate, 'MONTH')<br />       )<br />

This query returns the number of employees hired per month, quarter, year and in total – as well as the average salary at each of those levels:
 
With ROLLUP we specify to the SQL engine that we desire aggregations at each of the levels of a ‘hire-time hierarchy’ that stretches from Month via Quarter and Year to the Grand Total level. Note that the order of the expressions in the ROLLUP clause determines this hierarchy: the most detailed level is last.

The last record in the resultset is somewhat enigmatic. It only says 14 and 2073 – there is no clue as to what the year, quarter and month are. Does this mean we have 14 employees with NULL for a hiredate? Probably not, but to be sure, we can use the GROUPING function to determine for a record whether it contains an aggregation at a certain level:

SELECT case grouping(extract(year from hiredate))<br />       when 1<br />       then 'All Years'<br />       else to_char(extract(year from hiredate))<br />       end  year<br />,      case grouping('Q'||to_char(hiredate, 'Q'))<br />       when 1<br />       then 'All Quarters'<br />       else 'Q'||to_char(hiredate, 'Q')<br />       end  quarter<br />,      case grouping(to_char(hiredate, 'MONTH'))<br />       when 1<br />       then 'All Months'<br />       else to_char(hiredate, 'MONTH')<br />       end  month<br />,      count(empno) num<br />,      avg(sal) avg_sal<br />FROM   emp<br />GROUP<br />BY     ROLLUP<br />       ( extract(year from hiredate)<br />       , 'Q'||to_char(hiredate, 'Q')<br />       , to_char(hiredate, 'MONTH')<br />       )&nbsp;<br />

With this query, the results look like this:
 

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

1 Comment

  1. A remark to a statement, that UNION (ALL) is “less than perfect performing”:
    * UNION is bad performing, it has to sort all queries and remove duplicates
    * UNION ALL is perfectly performing, it just executes first query, returns rows, executes the second and returns rows. It is equal to executing both queries.

    Viliam