Courtesy of Tom Kyte: Generating rows in SQL with the CUBE statement – no dummy table or table function required

13

In a previous post Making up records in SQL Queries – Table Functions and 10g Model clause I discussed how you can make use of either a Table Function or the 10g SQL Model Clause to make up records in queries – as alternative to using a large table or dummy table to produce a certain number of records. During the three-day seminar by Tom Kyte that we – a seven man delegation from AMIS – attended this week, I saw a new way to do this ‘wizardry of new rows where none existed before’ using the CUBE statement.

From time to time there is a need in my SQL queries to have access to a table with a specific number of records. Sometimes I want to explode the results from a query – double, treble etc. the number of records. On other occasions I want to select from a specific range or outer join with a specified range to have an entry for every rank, date etc. More concrete examples (in terms of EMP and DEPT):

  • select the number of employees hired on every day of the week – include entries for days on which no one was hired
  • select the employees ranked on even positions when ordered by salary (note: there are many ways to approach this)
  • present for the numbers 1 to 5 the number of employees that manage that number of subordinates
  • select all employees as many times as there are Clerks

Of course the old way of doing this was create a utility or helper table that contained a very large number of records (something like dual but with many rows). Each row typically only contained a number. At other times you also have created tables with seven records (days of the week), twelve records (months of the year), 26 records (letters in the alphabet) etc. However, these could all have been derived in views from the one table with just many numeric values. You could also fake this table by using a view or table in your schema that you know to be always available and contain a substantial number of records. Tom Kyte’s favorite for example is ALL_OBJECTS. My previous post discussed ‘modern’ ways: TABLE FUNCTIONS (Oracle 9i) and the MODEL clause (10g). Now we will use the CUBE for an even more elegant – though slight less controllable way.

Using CUBE to generate rows

CUBE is used in the group by clause of select statements, to produce aggregations along every combination of dimensions described by the CUBE parameters. Typical use of CUBE is to produce subtotals at all dimension values. Some background on the CUBE for example in The SQL of OLAP Don’t overlook the core strength of your OLAP technology solution: SQL. By Michael Gonzales

The CUBE supergroup is the other extension to the GROUP BY clause that produces a result set that contains all the subtotal rows of a ROLLUP aggregation and, in addition, contains “cross-tabulation” rows. Cross-tabulation rows are additional superaggregate rows. They are, as the name implies, summaries across columns as if the data were represented as a spreadsheet. Like ROLLUP, a CUBE group can also be thought of as a series of grouping-sets. In the case of a CUBE, all permutations of the cubed grouping expression are computed along with the grand total. Therefore, the n elements of a CUBE translate to 2 raised to the power of n grouping-sets.

GROUP BY CUBE (a,b,c) is equivalent to:

GROUP BY GROUPING SETS
(
(a,b,c)
(a,b)
(a,c)
(b,c)
(a)
(b)
( c)
()
)

Notice that the three elements of the CUBE translate to eight grouping sets.

The number of additional records produced by a CUBE depends on the number of distinct values in each dimension. For example: show the sum of salaries for all job values as well as all departments as well as all combinations of the two as well as a grand total:

 select d.dname
 ,      e.job
 ,      sum(e.sal)
 from   emp e
 ,      dept d
 where  e.deptno = d.deptno
 group by cube(d.dname, e.job)
 /
DNAME          JOB       SUM(E.SAL)
-------------- --------- ----------
                              29025
               CLERK           4150
               ANALYST         6000
               MANAGER         8275
               SALESMAN        5600
               PRESIDENT       5000
SALES                          9400
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
RESEARCH                      10875
RESEARCH       CLERK           1900
RESEARCH       ANALYST         6000
RESEARCH       MANAGER         2975
ACCOUNTING                     8750
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000

In this case, the number of records returned is 18: 4 grouping sets: (job, dept),(job),(dept),() with 3 departments and 5 jobs (but only three jobs per department): 3 (dept) + 5 (job) + 1 () + 3 * 3 (job, dept).

We can also use CUBE to generate ‘dummy totals’ without any real aggregations. CUBE will produce POWER( 2,x) records, where x is the number of dimensions listed in the parameterlist. For example:

select 1 from dual group by CUBE (1,2,3,4)

will generate 2 to the power of 4 = 2*2*2*2= 16 rows. Note that CUBE (1,1,1,1) or CUBE (a,a,a,a) would give the same result; I use 1,2,3,4 to easily derive the number of records returned.

Using this ‘row generator’ in an in-line view with where clause on rownum we can control exactly how many records are produced. This can be used in many ways.
List the days of the week (knowing that 21st november 2004 is a Sunday

SELECT to_char( days.day + to_date('21-11-2004','DD-MM-YYYY'),'DAY') day
FROM   ( select rownum  day
         from   ( select 1
                  from   dual
                  group
                  by     cube (1,2,3)
                )
         where rownum<8) days

The result:

DAY
---------
MONDAY
TUESDAY
WEDNESDAY
THURSDAY
FRIDAY
SATURDAY
SUNDAY

Perhaps slightly more elegant is the following incarnation of this query:

WITH nums as
( select rownum num
  from   ( select 1 num
           from   dual
           group
           by     cube (1,2,3)
         )
)
SELECT to_char( days.day + to_date('21-11-2004','DD-MM-YYYY'),'DAY') day
FROM   ( select num day
         from   nums
         where  num<8) days

Find for the numbers 0 through 5 the number of managers who have that number of subordinates:

WITH nums as
( select rownum num
  from   ( select 1 num
           from   dual
           group
           by     cube (1,2,3)
         )
)
SELECT num_subs.num number_of_subordinates
,      count(subs_count.cnt) number_of_mgrs
FROM   nums num_subs
       left outer join
       (select count(*) cnt from emp group by mgr) subs_count
       on  num_subs.num-1 = subs_count.cnt
where  num_subs.num < 7
group
by     num_subs.num

Results:

NUMBER_OF_SUBORDINATES NUMBER_OF_MGRS
---------------------- --------------
                     0              0
                     1              4
                     2              1
                     3              1
                     4              0
                     5              1

Select the employees ranked 5 through 7 when ordered by salary, descending:

WITH nums as
( select rownum num
  from   ( select 1 num
           from   dual
           group
           by     cube (1,2,3)
         )
)
select *
from   nums
,      ( select emp.*, rownum rn
         from ( select emp.*
                from emp
                order
                by sal desc) emp
       ) emp
where  rn = nums.num
and    nums.num between 5 and 7
COLUMN_VALUE      EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO         RN
------------ ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- -
           5       7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30          5
           6       7782 CLARK      MARKETEER       7839 09-JUN-81       2450                    10          6
           7       7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30          7
Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

13 Comments

  1. Bert de Geus on

    select rownum from (select 1 from dual group by cube(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20));

    is slow but using a carthesian join is fast.

    select rownum
    from (select 1 from dual group by (cube(1,2,3,4,5,6,7,8,9,10))
    , (select 1 from dual group by (cube(1,2,3,4,5,6,7,8,9,10));

    and deleivers the same number of rows.
    This is faster than

    select count(rownum ) FROM dual CONNECT BY ROWNUM

  2. Pingback: AMIS Technology blog » Blog Archive » Puzzelen met SQL – Testdata Generatie

  3. Je moet quotes, kleiner dan , groter dan etc. escapen alsof je html of xml typt. Dus met een ampersand ervoor en een punt-komma erachter. Of, wat soms ook lukt, is een pre tag ervoor en een pre tag erna.

  4. Alex, I didn`t leave anything out of the scripts,
    but the software which displays this blog (or the browser ) couldn`t handle the "smaller than sign"  < I used.
    And updating a message doesn`t like the single quote ' by the way
    
  5. Alex Nuijten on

    Anton,

    On all three of your suggestions I got: ERROR at line 1:ORA-00920: invalid relational operator
    SQL> select rownum from dual connect by rownum
    2 /
    select rownum from dual connect by rownum
    *
    ERROR at line 1:
    ORA-00920: invalid relational operator

    Did you leave something out of the scripts? (on the second one, there is a parenthesis missing)

    What also works as a row-generator:

    select *
      from (select level l
              from dual
           connect by level < = 6
           )
    /
    
  6. Another number generator is "select rownum from dual connect by rownum < 10"
    
    select to_char( days.day + to_date('21-11-2004','DD-MM-YYYY'),'DAY') day
    from ( select rownum day from dual connect by rownum < 8 ) days
    
    And select count(rownum )  FROM dual CONNECT BY ROWNUM < 1048576 takes 4 seconds
    
    Updated: the database which stores this blog didn`t liked the < and quotes in my message,
    
  7. I wa looking for a script to generate lots of dates, grouped by quarter, month and year. Starting with Lucas’ day-generator script I experienced problems with leap years. I adjusted the script so it generates months in stead of days. Following is a leap year compliant script that generates months, quarters and years:

    select to_char(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months), ‘Q’) kwartaal
    , to_char(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months), ‘MONTH’, ‘NLS_DATE_LANGUAGE = dutch’) maand
    , to_char(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months), ‘YYYY’) jaar
    –, last_day(add_months(to_date(’01-12-1996′,’DD-MM-YYYY’), r.months)) last_day
    FROM ( select rownum months
    from ( select 0
    from dual
    group
    by cube (1,2,3,4,5,6,7,8)
    )
    ) r
    ;

  8. Alex Nuijten on

    I happen to see this titled ‘Weird Results’ on askTom. It kinda explains the result I saw in comment 1.
    (just to let you know)

  9. I think for generating 1 million records it is much better to use a table (or cartesian product in an in-line view) that you know to contain at least a million records. This CUBE approach seems best suited for small numbers (say up to a 1000). Perhaps one day I will benchmark the Table Function, Model, Cube and Dummy Table approach.

  10. I noticed something weird when running the examples you posted.
    When I run this query in SQL*Plus, I only get one row back.

    select 1 from dual
    group by cube (1,2,3,4)

    When I run the same query in TOAD, I get 16 rows as you predicted.
    Do you happen to know why?

    Alex

  11. Pingback: » Creating your own advanced search engine for any website - using Oracle Text - Searching the AMIS Technology Weblog