Pivoting in SQL using the 10g Model Clause

4

Pivoting records means changing their format and typically either splitting or merging records. Usual examples include transactional records such as a list of daily transactions that need to be turned into a record structure, for example a weekly record. The inverse operation is also sometimes required. You can also use pivoting to get a nicer, more compact result from a SQL query without procedural logic.

Pivoting (both merging and splitting) can easily be implemented using Table Functions. See for example: Turning On Pivot Tables By Jonathan Gennick (Oracle Magazine 2002). Table Functions require you to create an object type (most of the times), a nested table type and a PL/SQL Function. For a simple (!) query, that may be a lot of trouble. And although Table Functions can be made quite efficient – they even allow parallel execution – they still can probably not compete with pure SQL based solutions.

You can “trick” merging records in SQL using in-line views and an aggregation statement (this works in early 8.0 versions of the database). This SQL trick seems to be invented by Tom Kyte (see his Expert One-on-One Oracle book) – and can now be seen in many articles. For an example, see: Pivot Query on AskTom. This solution requires SQL that is somewhat difficult to read, understand and maintain: the purpose of the aggregation to condense several rows into one is not immediately clear.

You can ususally make use of a UNION for splitting records. This is not very efficient and requires you to duplicate entire SQL statements which is far from ideal.

The new way to pivot records – merging and splitting using the 10g SQL MODEL

The SQL MODEL clause, introduced in Oracle 10g Release 1, gives new ways to approach pivoting, both merging and splitting records. Let’s take a look at the following example:

Timesheets

We have a table employee_week_record that contains timesheets per employee per week:

create table employee_week_records
( weekno number(4)
, empno number(5)
, mon number(4,2)
, tue number(4,2)
, wed number(4,2)
, thu number(4,2)
, fri number(4,2)
, sat number(4,2)
, sun number(4,2)
)
/

A few records are inserted into this table:

insert into employee_week_records
( weekno, empno, mon, tue, wed, thu, fri, sat, sun)
values
(1, 1210,  8,   7.5, 8.5, 4.5, 8,null,null)
/
insert into employee_week_records
( weekno, empno, mon, tue, wed, thu, fri, sat, sun)
values
(1, 1215,  2,   7.5, 8,   7.5, 8,null,null)
/

Straight querying against this table will give us records with 9 columns. This is not an easy format for example to perform analysis on total hour-count per week or per employee. For such operations and reports it would be much better if the data were in a transactional format, with one record per day for each week and employee. So instead of:

  WEEKNO      EMPNO        MON        TUE        WED        THU        FRI        SAT        SUN
-------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       1       1210          8        7.5        8.5        4.5          8
       1       1215          2        7.5          8        7.5          8

we would get something like:

 WEEKNO      EMPNO DAY       HOURS
------- ---------- ---- ----------
      1       1210 sun
      1       1210 sat
      1       1210 fri           8
      1       1210 thu         4.5
      1       1210 wed         8.5
      1       1215 fri           8
      1       1215 thu         7.5....

The next query against the employee_week_records table and using a MODEL clause returns the transactional format we are looking for:

select weekno
,      empno
,      day
,      hours
from   employee_week_records
model
  return updated rows
  partition by (weekno, empno) -- we want to repeat the creation of 7 hours-cells (or really 7 new records) for every empno and every weekno
  dimension by ('base' day) -- with 'base' day we make up a new dimension (a new column in the spreadsheet); we need it to create new rows by assigning cell values using positional notation with new values for this dimension; note that the original row in the resultset returned by the from-clause will be extended with an extra value ' base'  and alias day.
  measures ( 0 hours, mon, tue, wed, thu, fri, sat, sun)  -- with 0 hours we make up a new measure (a new column in the spreadsheet) with initial value 0
  RULES upsert
  (
     hours['mon' ] = mon['base'] -- mon['base']  refers to the measure mon (one of the columns in the underlying table employee_week_records) where the dimension day has the value 'base'  (the default value is has been assigned during creation and the value that it will have in all records in the result-set fed into the rules)
   , hours['tue' ] = tue['base']
   , hours['wed' ] = wed['base']
   , hours['thu' ] = thu['base']
   , hours['fri' ] = fri['base']
   , hours['sat' ] = sat['base']
   , hours['sun' ] = sun['base']
  )

The results of this query:

 WEEKNO      EMPNO DAY       HOURS
------- ---------- ---- ----------
      1       1210 sun
      1       1210 sat
      1       1210 fri           8
      1       1210 thu         4.5
      1       1210 wed         8.5
      1       1210 tue         7.5
      1       1210 mon           8
      1       1215 sun
      1       1215 sat
      1       1215 fri           8
      1       1215 thu         7.5
      1       1215 wed           8
      1       1215 tue         7.5
      1       1215 mon           2

Now suppose we have records in transactional format and we want to create the weekly timesheet-records we started out with. The table containing the transaction records looks as follows:

create table worked_hours
( weekno number(4)
, empno number(5)
, day   number(1)
, hours number(4,2)
)
/

It has been populated from the EMPLOYEE_WEEK_RECORDS table using a multi-table insert statement, something of a pivot-statement in itself:

insert all
  into worked_hours (weekno, empno,day, hours)
  values (weekno, empno, 1, mon)
  into worked_hours values (weekno,empno, 2, tue)
  into worked_hours values (weekno,empno, 3, wed)
  into worked_hours values (weekno,empno, 4, thu)
  into worked_hours values (weekno,empno, 5, fri)
  into worked_hours values (weekno,empno, 6, sat)
  into worked_hours values (weekno,empno, 7, sun)
 select weekno
 ,      empno
 ,      mon, tue, wed, thu, fri, sat, sun
 from   employee_week_records
/

The data is something like

 WEEKNO      EMPNO        DAY      HOURS
------- ---------- ---------- ----------
      1       1210          1          8
      1       1215          1          2
      1       1210          2        7.5
      1       1215          2        7.5
      1       1210          3        8.5
      1       1215          3          8
      1       1210          4        4.5
...

To rotate these records and pivot-merge them into weekly records, we make use of the following query with MODEL clause:

select weekno
,      empno
,      mon
,      tue
,      wed
,      thu
,      fri
,      sat
,      sun
from   worked_hours
model
  return updated rows
  partition by (weekno, empno)
  dimension by ( day )
  measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun)
  RULES upsert
  (
     mon [0] = hours [1]
   , tue [0] = hours [2]
   , wed [0] = hours [3]
   , thu [0] = hours [4]
   , fri [0] = hours [5]
   , sat [0] = hours [6]
   , sun [0] = hours [7]
  )/

The records returned are

  WEEKNO      EMPNO        MON        TUE        WED        THU        FRI        SAT        SUN
------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
      1       1215          2        7.5          8        7.5          8
      1       1210          8        7.5        8.5        4.5          8

Calculating the Week-Total

One thing we can easily do is to pivot the week-records to transactional records, perform an analysis on the transaction data and in the same query pivot again, to record format. Thus we could add a week-total to each record without actually adding up the daily numbers.

 WEEKNO      EMPNO        MON        TUE        WED        THU        FRI        SAT        SUN  WEEKTOTAL
------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -
      1       1210          8        7.5        8.5        4.5          8                             36.5
      1       1215          2        7.5          8        7.5          8                               33

Granted, this is not the easiest way to get hold of week-totals but it does show you the trick. Of course when in transactional format you can do other things as well: filter out sundays, multiply hours on saturday by two, set limits on the number of hours per day (no more than 12, no less than 0) etc.

The combination of two nested queries – each with MODEL clause – to pivot and un-pivot the records could look as follows:

select weekno
,      empno
,      mon
,      tue
,      wed
,      thu
,      fri
,      sat
,      sun
,      weektotal
from   ( select weekno
         ,      empno
         ,      day
         ,      hours
         from   employee_week_records
         model
           return updated rows
           partition by (weekno, empno)
           dimension by ('base' day)
           measures ( 0 hours, mon, tue, wed, thu, fri, sat, sun)
           RULES upsert
           (
              hours ['mon' ] = mon ['base']
            , hours ['tue' ] = tue ['base']
            , hours ['wed' ] = wed ['base']
            , hours ['thu' ] = thu ['base']
            , hours ['fri' ] = fri ['base']
            , hours ['sat' ] = sat ['base']
            , hours ['sun' ] = sun ['base']
           )
       )
model
  return updated rows
  partition by (weekno, empno)
  dimension by (day )
  measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun, 0 weektotal)
  RULES upsert
  (
     mon['base'] = hours['mon']
   , tue['base'] = hours['tue']
   , wed['base'] = hours['wed']
   , thu['base'] = hours['thu']
   , fri['base'] = hours['fri']
   , sat['base'] = hours['sat']
   , sun['base'] = hours['sun']
   , weektotal['base'] = sum(hours)[ANY]
  )

Most remarkable in that query is the way the weektotal is calculated: by applying the aggregate SUM function on the hours measure for all dimension values within the current partition of weekno and empno.

Matrix Report

This second example demonstrates matrix-reporting of aggregate values in the context of good old EMP. We have this next query that reports for each department the nr1, nr2 and nr3 in terms of salary:

select deptno,
           max(decode(seq,1,ename,null)) highest_paid,
           max(decode(seq,2,ename,null)) second_highest,
           max(decode(seq,3,ename,null)) third_highest
  from ( SELECT deptno, ename,
                row_number() OVER
                   (PARTITION BY deptno
                        ORDER BY sal desc NULLS LAST ) seq
           FROM emp )
where seq < = 3
group by deptno

The result from this query is:

 DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
------- ---------- ---------- ----------
     20 SCOTT      FORD       JONES
     30 BLAKE      ALLEN      TURNER
     10 KING       CLARK      MILLER

Note that we have an alternative solution, using the LEAD() analytical function:

 select deptno
 ,      ename nr1
 ,      nr2
 ,      nr3
 from   ( select deptno
          ,      ename
          ,      lead(ename,1) over (partition by deptno
                                      order by sal desc) nr2
          ,      lead(ename,2) over (partition by deptno
                                     order by sal desc) nr3
          ,      rank() over (partition by deptno order by sal desc) rnk
          from   emp
        )
 where  rnk = 1

Using the SQL Model Clause, we come to a different solution – that you may feel is better readable. I have not yet done a comparison in terms of performance:

select deptno
,      nr1
,      nr2
,      nr3
from   emp
model
  return updated rows
  partition by ( deptno)
  dimension by ( row_number() OVER ( PARTITION BY deptno
                                     ORDER BY sal desc NULLS LAST) rnk
               )
  measures (ename,  lpad(' ',10) nr1, lpad(' ',10) nr2, lpad(' ',10) nr3)
  rules upsert
  (
      nr1 [0] = ename [1]
    , nr2 [0] = ename [2]
    , nr3 [0] = ename [3]
  )
/
    DEPTNO NR1        NR2        NR3
---------- ---------- ---------- ----------
        10 KING       CLARK      MILLER
        30 BLAKE      ALLEN      TURNER
        20 SCOTT      FORD       JONES

A few things to note in this query:

  • The MODEL clause is evaluated after all clauses in the query block except SELECT DISTINCT, and ORDER BY clause are evaluated. These clauses and expressions in the SELECT list are evaluated after the MODEL clause.
  • With the Partition By (Deptno), we do not need to focus on deptno anymore: we will do the same thing for every department without explicitly going through all departments (using for example nr1[FOR deptno in (select distinct deptno from emp),0] = ename[cv(),1] with deptno defined as Dimension rather than Partition element)
  • Oracle Documentation:”If your query has a MODEL clause, then the query’s SELECT and ORDER BY lists cannot contain aggregates or analytic functions. If needed, these can be specified in PARTITION BY, DIMENSION BY, and MEASURES lists and need to be aliased. Aliases can then be used in the SELECT or ORDER BY clauses.” This means for our query that either we use an in-line view:
    from   (select deptno, ename, row_number() OVER ( PARTITION BY deptno ORDER BY sal desc NULLS LAST) rnk from emp)

    or we include the analytical row_number() function in the DIMENSION BY list.

  • The lpad(‘ ‘,10) seems required to set up the nr1, nr2 and nr3 measures that are wide enough to receive the value set by the rules. If we define these measures with for example ‘ ‘ nr1 then we receive an
    ORA-25137: Data value out of range

    error. The 10 in this case suffices since no value for ENAME is wider than 10.

  • The rules UPSERT statement ensures that only updated or newly created records are returned. In this case that means that only newly created records are returned and the original resultset produced by [from emp]is discarded.
  • The measures are taken from the set of 14 records that was set up through select (dimensions, measures) from emp or more specifically: select row_number() OVER ( PARTITION BY deptno
    ORDER BY sal desc NULLS LAST) rnk, ename, lpad(‘ ‘,10) nr1, lpad(‘ ‘,10) nr2, lpad(‘ ‘,10) nr3 from emp.
    Within this set, measure nr1 for each department is found using the positional cell reference ename [1], which refers to the cell in the current department where the dimension rnk has the value 1.

One last example. I have been working on another post where someone asked me to create the following output – this show the salary-sum for each job and in each department as well as the grand-total for each job :

JOB       DEPT10     DEPT20     DEPT30     GRAND_TOTAL
--------- ---------- ---------- ---------- -----------
CLERK     1300       1900       950               4150
SALESMAN                        5600              5600
PRESIDENT 5000                                    5000
MANAGER   2450       2975       2850              8275
ANALYST              6000                         6000

The underlying query looks as follows:

select job
,      dept10
,      dept20
,      dept30
,      grand_total
from   (select distinct deptno,  job, sum(sal) OVER ( PARTITION BY deptno, job) sumsal , sum(sal) over( partition by job) sumjob from emp )
model
  return updated rows
  partition by (job)
  dimension by (deptno)
  measures (sumsal,   lpad(' ',10) dept10, lpad(' ',10) dept20, lpad(' ',10) dept30 , sumjob grand_total)
  rules upsert
  (
      dept10 [0] = sumsal [10]
    , dept20 [0] = sumsal [20]
    , dept30 [0] = sumsal [30]
    , grand_total [0] = max(grand_total) [ANY]
  )
/

Resources

Chapter 22 – SQL for Modeling – Oracle® Database Data Warehousing Guide 10g Release 1 (10.1)
The SQL Model Clause of Oracle Database 10g – An Oracle White Paper (August 2003)

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

4 Comments

  1. Hi Lucas,
    i tried using your pivoting logic to convert columns into rows, but i din’t quite understand whats happening in the code though. can you please please explain or answer the below questions:
    1.) why a partition was required in this case?
    2.) why measures have to start with 0 hrs, why it should be defaulted?
    3.) What is the model clause actually doing, can you please elaborate it step by step?
     
     

  2. How could I resolved the issue in which I don’t know before hand the number of columns. You know that you have mon, tue … sun. How about if you are trying to convert 20 departments and you want to filter by dept ID but you need the Department Name as the column name. The deparments filtered vary constantly so the names for the columns have to be according to the data extracted

    Dept ID Desc Emps
    10 Planning 3
    10 Planning 3
    15 Mktg 2
    ..
    200 Call Center 5

    I need an output
    Planning Mktg
    6 2

    another time
    Planning Call Center
    6 5

  3. From what I recently understood is that this may indicate that there was not enough memory space to hold the intermediate results and file swapping took place for the TEMP area. Can you somehow enlarge the memory available to this process?

  4. here is a performace test i’ve maded on worked_hours.

    insert into work_hours select mod(rownum,56),
    rownum,mod(rownum,7),1 from dba_objects;

    select count(1) from worked_hours;
    49647

    select weekno
    , empno
    , mon
    , tue
    , wed
    , thu
    , fri
    , sat
    , sun
    from worked_hours
    model
    return updated rows
    partition by (weekno, empno)
    dimension by ( day )
    measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun)
    RULES upsert
    (
    mon [0] = hours [1]
    , tue [0] = hours [2]
    , wed [0] = hours [3]
    , thu [0] = hours [4]
    , fri [0] = hours [5]
    , sat [0] = hours [6]
    , sun [0] = hours [7]
    )
    /

    Statistics
    ———————————————————-
    10 recursive calls
    0 db block gets
    184 consistent gets
    1260 physical reads
    0 redo size
    1228664 bytes sent via SQL*Net to client
    36900 bytes received via SQL*Net from client
    3310 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    49635 rows processed

    49635 rows selected.

    Elapsed: 00:00:04.21

    SQL> select weekno
    2 , empno
    3 , mon
    4 , tue
    5 , wed
    6 , thu
    7 , fri
    8 , sat
    9 , sun
    10 from (
    11 select weekno,empno,sum(decode(day,2,hours,0)) mon,
    12 sum(decode(day,3,hours,0)) tue
    13 ,sum(decode(day,4,hours,0)) wed
    14 ,sum(decode(day,5,hours,0)) thu
    15 ,sum(decode(day,6,hours,0)) fri
    16 ,sum(decode(day,7,hours,0)) sat
    17 ,sum(decode(day,7,hours,0)) sun from
    18 worked_hours group by weekno,empno)
    19 /

    49635 rows selected.

    Elapsed: 00:00:01.92

    Statistics
    ———————————————————-
    4 recursive calls
    6 db block gets
    184 consistent gets
    272 physical reads
    0 redo size
    943540 bytes sent via SQL*Net to client
    36900 bytes received via SQL*Net from client
    3310 SQL*Net roundtrips to/from client
    0 sorts (memory)
    1 sorts (disk)
    49635 rows processed

    It seems that the old pivot method is better. It seems that the difference connect somehow to phiscal reads.