Courtesy of Tom Kyte: Generating rows in SQL with the CUBE statement – no dummy table or table function required
In a previous post Making up records in SQL Queries – Table Functions and 10g Model clause I discussed how you can make use of either a Table Function or the 10g SQL Model Clause to make up records in queries – as alternative to using a large table or dummy table to produce a certain number of records. During the three-day seminar by Tom Kyte that we – a seven man delegation from AMIS – attended this week, I saw a new way to do this ‘wizardry of new rows where none existed before’ using the CUBE statement.
From time to time there is a need in my SQL queries to have access to a table with a specific number of records. Sometimes I want to explode the results from a query – double, treble etc. the number of records. On other occasions I want to select from a specific range or outer join with a specified range to have an entry for every rank, date etc. More concrete examples (in terms of EMP and DEPT):
- select the number of employees hired on every day of the week – include entries for days on which no one was hired
- select the employees ranked on even positions when ordered by salary (note: there are many ways to approach this)
- present for the numbers 1 to 5 the number of employees that manage that number of subordinates
- select all employees as many times as there are Clerks
Of course the old way of doing this was create a utility or helper table that contained a very large number of records (something like dual but with many rows). Each row typically only contained a number. At other times you also have created tables with seven records (days of the week), twelve records (months of the year), 26 records (letters in the alphabet) etc. However, these could all have been derived in views from the one table with just many numeric values. You could also fake this table by using a view or table in your schema that you know to be always available and contain a substantial number of records. Tom Kyte’s favorite for example is ALL_OBJECTS. My previous post discussed ‘modern’ ways: TABLE FUNCTIONS (Oracle 9i) and the MODEL clause (10g). Now we will use the CUBE for an even more elegant – though slight less controllable way.
Using CUBE to generate rows
CUBE is used in the group by clause of select statements, to produce aggregations along every combination of dimensions described by the CUBE parameters. Typical use of CUBE is to produce subtotals at all dimension values. Some background on the CUBE for example in The SQL of OLAP Don’t overlook the core strength of your OLAP technology solution: SQL. By Michael Gonzales
The CUBE supergroup is the other extension to the GROUP BY clause that produces a result set that contains all the subtotal rows of a ROLLUP aggregation and, in addition, contains “cross-tabulation” rows. Cross-tabulation rows are additional superaggregate rows. They are, as the name implies, summaries across columns as if the data were represented as a spreadsheet. Like ROLLUP, a CUBE group can also be thought of as a series of grouping-sets. In the case of a CUBE, all permutations of the cubed grouping expression are computed along with the grand total. Therefore, the n elements of a CUBE translate to 2 raised to the power of n grouping-sets.GROUP BY CUBE (a,b,c) is equivalent to:
GROUP BY GROUPING SETS
(
(a,b,c)
(a,b)
(a,c)
(b,c)
(a)
(b)
( c)
()
)Notice that the three elements of the CUBE translate to eight grouping sets.
The number of additional records produced by a CUBE depends on the number of distinct values in each dimension. For example: show the sum of salaries for all job values as well as all departments as well as all combinations of the two as well as a grand total:
select d.dname
, e.job
, sum(e.sal)
from emp e
, dept d
where e.deptno = d.deptno
group by cube(d.dname, e.job)
/
DNAME JOB SUM(E.SAL)
-------------- --------- ----------
29025
CLERK 4150
ANALYST 6000
MANAGER 8275
SALESMAN 5600
PRESIDENT 5000
SALES 9400
SALES CLERK 950
SALES MANAGER 2850
SALES SALESMAN 5600
RESEARCH 10875
RESEARCH CLERK 1900
RESEARCH ANALYST 6000
RESEARCH MANAGER 2975
ACCOUNTING 8750
ACCOUNTING CLERK 1300
ACCOUNTING MANAGER 2450
ACCOUNTING PRESIDENT 5000
In this case, the number of records returned is 18: 4 grouping sets: (job, dept),(job),(dept),() with 3 departments and 5 jobs (but only three jobs per department): 3 (dept) + 5 (job) + 1 () + 3 * 3 (job, dept).
We can also use CUBE to generate ‘dummy totals’ without any real aggregations. CUBE will produce POWER( 2,x) records, where x is the number of dimensions listed in the parameterlist. For example:
select 1 from dual group by CUBE (1,2,3,4)
will generate 2 to the power of 4 = 2*2*2*2= 16 rows. Note that CUBE (1,1,1,1) or CUBE (a,a,a,a) would give the same result; I use 1,2,3,4 to easily derive the number of records returned.
Using this ‘row generator’ in an in-line view with where clause on rownum we can control exactly how many records are produced. This can be used in many ways.
List the days of the week (knowing that 21st november 2004 is a Sunday
SELECT to_char( days.day + to_date('21-11-2004','DD-MM-YYYY'),'DAY') day
FROM ( select rownum day
from ( select 1
from dual
group
by cube (1,2,3)
)
where rownum<8) days
The result:
DAY --------- MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY
Perhaps slightly more elegant is the following incarnation of this query:
WITH nums as
( select rownum num
from ( select 1 num
from dual
group
by cube (1,2,3)
)
)
SELECT to_char( days.day + to_date('21-11-2004','DD-MM-YYYY'),'DAY') day
FROM ( select num day
from nums
where num<8) days
Find for the numbers 0 through 5 the number of managers who have that number of subordinates:
WITH nums as
( select rownum num
from ( select 1 num
from dual
group
by cube (1,2,3)
)
)
SELECT num_subs.num number_of_subordinates
, count(subs_count.cnt) number_of_mgrs
FROM nums num_subs
left outer join
(select count(*) cnt from emp group by mgr) subs_count
on num_subs.num-1 = subs_count.cnt
where num_subs.num < 7
group
by num_subs.num
Results:
NUMBER_OF_SUBORDINATES NUMBER_OF_MGRS
---------------------- --------------
0 0
1 4
2 1
3 1
4 0
5 1
Select the employees ranked 5 through 7 when ordered by salary, descending:
WITH nums as
( select rownum num
from ( select 1 num
from dual
group
by cube (1,2,3)
)
)
select *
from nums
, ( select emp.*, rownum rn
from ( select emp.*
from emp
order
by sal desc) emp
) emp
where rn = nums.num
and nums.num between 5 and 7
COLUMN_VALUE EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO RN
------------ ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- -
5 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 5
6 7782 CLARK MARKETEER 7839 09-JUN-81 2450 10 6
7 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7
select rownum from (select 1 from dual group by cube(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20));
is slow but using a carthesian join is fast.
select rownum
from (select 1 from dual group by (cube(1,2,3,4,5,6,7,8,9,10))
, (select 1 from dual group by (cube(1,2,3,4,5,6,7,8,9,10));
and deleivers the same number of rows.
This is faster than
select count(rownum ) FROM dual CONNECT BY ROWNUM