In a previous post Making up records in SQL Queries – Table Functions and 10g Model clause I discussed how you can make use of either a Table Function or the 10g SQL Model Clause to make up records in queries – as alternative to using a large table or dummy table to produce a certain number of records. During the three-day seminar by Tom Kyte that we – a seven man delegation from AMIS – attended this week, I saw a new way to do this ‘wizardry of new rows where none existed before’ using the CUBE statement.
From time to time there is a need in my SQL queries to have access to a table with a specific number of records. Sometimes I want to explode the results from a query – double, treble etc. the number of records. On other occasions I want to select from a specific range or outer join with a specified range to have an entry for every rank, date etc. More concrete examples (in terms of EMP and DEPT):
- select the number of employees hired on every day of the week – include entries for days on which no one was hired
- select the employees ranked on even positions when ordered by salary (note: there are many ways to approach this)
- present for the numbers 1 to 5 the number of employees that manage that number of subordinates
- select all employees as many times as there are Clerks
Of course the old way of doing this was create a utility or helper table that contained a very large number of records (something like dual but with many rows). Each row typically only contained a number. At other times you also have created tables with seven records (days of the week), twelve records (months of the year), 26 records (letters in the alphabet) etc. However, these could all have been derived in views from the one table with just many numeric values. You could also fake this table by using a view or table in your schema that you know to be always available and contain a substantial number of records. Tom Kyte’s favorite for example is ALL_OBJECTS. My previous post discussed ‘modern’ ways: TABLE FUNCTIONS (Oracle 9i) and the MODEL clause (10g). Now we will use the CUBE for an even more elegant – though slight less controllable way.
Using CUBE to generate rows
CUBE is used in the group by clause of select statements, to produce aggregations along every combination of dimensions described by the CUBE parameters. Typical use of CUBE is to produce subtotals at all dimension values. Some background on the CUBE for example in The SQL of OLAP Don’t overlook the core strength of your OLAP technology solution: SQL. By Michael Gonzales
The CUBE supergroup is the other extension to the GROUP BY clause that produces a result set that contains all the subtotal rows of a ROLLUP aggregation and, in addition, contains “cross-tabulation” rows. Cross-tabulation rows are additional superaggregate rows. They are, as the name implies, summaries across columns as if the data were represented as a spreadsheet. Like ROLLUP, a CUBE group can also be thought of as a series of grouping-sets. In the case of a CUBE, all permutations of the cubed grouping expression are computed along with the grand total. Therefore, the n elements of a CUBE translate to 2 raised to the power of n grouping-sets.
GROUP BY CUBE (a,b,c) is equivalent to:
GROUP BY GROUPING SETS
(
(a,b,c)
(a,b)
(a,c)
(b,c)
(a)
(b)
( c)
()
)Notice that the three elements of the CUBE translate to eight grouping sets.
The number of additional records produced by a CUBE depends on the number of distinct values in each dimension. For example: show the sum of salaries for all job values as well as all departments as well as all combinations of the two as well as a grand total:
select d.dname , e.job , sum(e.sal) from emp e , dept d where e.deptno = d.deptno group by cube(d.dname, e.job) / DNAME JOB SUM(E.SAL) -------------- --------- ---------- 29025 CLERK 4150 ANALYST 6000 MANAGER 8275 SALESMAN 5600 PRESIDENT 5000 SALES 9400 SALES CLERK 950 SALES MANAGER 2850 SALES SALESMAN 5600 RESEARCH 10875 RESEARCH CLERK 1900 RESEARCH ANALYST 6000 RESEARCH MANAGER 2975 ACCOUNTING 8750 ACCOUNTING CLERK 1300 ACCOUNTING MANAGER 2450 ACCOUNTING PRESIDENT 5000
In this case, the number of records returned is 18: 4 grouping sets: (job, dept),(job),(dept),() with 3 departments and 5 jobs (but only three jobs per department): 3 (dept) + 5 (job) + 1 () + 3 * 3 (job, dept).
We can also use CUBE to generate ‘dummy totals’ without any real aggregations. CUBE will produce POWER( 2,x)
records, where x is the number of dimensions listed in the parameterlist. For example:
select 1 from dual group by CUBE (1,2,3,4)
will generate 2 to the power of 4 = 2*2*2*2= 16 rows. Note that CUBE (1,1,1,1) or CUBE (a,a,a,a) would give the same result; I use 1,2,3,4 to easily derive the number of records returned.
Using this ‘row generator’ in an in-line view with where clause on rownum we can control exactly how many records are produced. This can be used in many ways.
List the days of the week (knowing that 21st november 2004 is a Sunday
SELECT to_char( days.day + to_date('21-11-2004','DD-MM-YYYY'),'DAY') day FROM ( select rownum day from ( select 1 from dual group by cube (1,2,3) ) where rownum<8) days
The result:
DAY --------- MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY
Perhaps slightly more elegant is the following incarnation of this query:
WITH nums as ( select rownum num from ( select 1 num from dual group by cube (1,2,3) ) ) SELECT to_char( days.day + to_date('21-11-2004','DD-MM-YYYY'),'DAY') day FROM ( select num day from nums where num<8) days
Find for the numbers 0 through 5 the number of managers who have that number of subordinates:
WITH nums as ( select rownum num from ( select 1 num from dual group by cube (1,2,3) ) ) SELECT num_subs.num number_of_subordinates , count(subs_count.cnt) number_of_mgrs FROM nums num_subs left outer join (select count(*) cnt from emp group by mgr) subs_count on num_subs.num-1 = subs_count.cnt where num_subs.num < 7 group by num_subs.num
Results:
NUMBER_OF_SUBORDINATES NUMBER_OF_MGRS ---------------------- -------------- 0 0 1 4 2 1 3 1 4 0 5 1
Select the employees ranked 5 through 7 when ordered by salary, descending:
WITH nums as ( select rownum num from ( select 1 num from dual group by cube (1,2,3) ) ) select * from nums , ( select emp.*, rownum rn from ( select emp.* from emp order by sal desc) emp ) emp where rn = nums.num and nums.num between 5 and 7
COLUMN_VALUE EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO RN ------------ ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- - 5 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 5 6 7782 CLARK MARKETEER 7839 09-JUN-81 2450 10 6 7 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7
select rownum from (select 1 from dual group by cube(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20));
is slow but using a carthesian join is fast.
select rownum
from (select 1 from dual group by (cube(1,2,3,4,5,6,7,8,9,10))
, (select 1 from dual group by (cube(1,2,3,4,5,6,7,8,9,10));
and deleivers the same number of rows.
This is faster than
select count(rownum ) FROM dual CONNECT BY ROWNUM
Je moet quotes, kleiner dan , groter dan etc. escapen alsof je html of xml typt. Dus met een ampersand ervoor en een punt-komma erachter. Of, wat soms ook lukt, is een pre tag ervoor en een pre tag erna.
Anton,
Thanks for clearing that up. ๐
Anton,
On all three of your suggestions I got: ERROR at line 1:ORA-00920: invalid relational operator
SQL> select rownum from dual connect by rownum
2 /
select rownum from dual connect by rownum
*
ERROR at line 1:
ORA-00920: invalid relational operator
Did you leave something out of the scripts? (on the second one, there is a parenthesis missing)
What also works as a row-generator:
I wa looking for a script to generate lots of dates, grouped by quarter, month and year. Starting with Lucas’ day-generator script I experienced problems with leap years. I adjusted the script so it generates months in stead of days. Following is a leap year compliant script that generates months, quarters and years:
select to_char(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months), ‘Q’) kwartaal
, to_char(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months), ‘MONTH’, ‘NLS_DATE_LANGUAGE = dutch’) maand
, to_char(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months), ‘YYYY’) jaar
–, last_day(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months)) last_day
FROM ( select rownum months
from ( select 0
from dual
group
by cube (1,2,3,4,5,6,7,8)
)
) r
;
I happen to see this titled ‘Weird Results’ on askTom. It kinda explains the result I saw in comment 1.
(just to let you know)
I think for generating 1 million records it is much better to use a table (or cartesian product in an in-line view) that you know to contain at least a million records. This CUBE approach seems best suited for small numbers (say up to a 1000). Perhaps one day I will benchmark the Table Function, Model, Cube and Dummy Table approach.
Wow, it’s slow. 2^20 = 1 million rows takes forever.
I noticed something weird when running the examples you posted.
When I run this query in SQL*Plus, I only get one row back.
select 1 from dual
group by cube (1,2,3,4)
When I run the same query in TOAD, I get 16 rows as you predicted.
Do you happen to know why?
Alex