Pivoting in SQL using the 10g Model Clause

Pivoting records means changing their format and typically either splitting or merging records. Usual examples include transactional records such as a list of daily transactions that need to be turned into a record structure, for example a weekly record. The inverse operation is also sometimes required. You can also use pivoting to get a nicer, more compact result from a SQL query without procedural logic.

Pivoting (both merging and splitting) can easily be implemented using Table Functions. See for example: Turning On Pivot Tables By Jonathan Gennick (Oracle Magazine 2002). Table Functions require you to create an object type (most of the times), a nested table type and a PL/SQL Function. For a simple (!) query, that may be a lot of trouble. And although Table Functions can be made quite efficient – they even allow parallel execution – they still can probably not compete with pure SQL based solutions.

You can “trick” merging records in SQL using in-line views and an aggregation statement (this works in early 8.0 versions of the database). This SQL trick seems to be invented by Tom Kyte (see his Expert One-on-One Oracle book) – and can now be seen in many articles. For an example, see: Pivot Query on AskTom. This solution requires SQL that is somewhat difficult to read, understand and maintain: the purpose of the aggregation to condense several rows into one is not immediately clear.

You can ususally make use of a UNION for splitting records. This is not very efficient and requires you to duplicate entire SQL statements which is far from ideal.

The new way to pivot records – merging and splitting using the 10g SQL MODEL

The SQL MODEL clause, introduced in Oracle 10g Release 1, gives new ways to approach pivoting, both merging and splitting records. Let’s take a look at the following example:

Timesheets

We have a table employee_week_record that contains timesheets per employee per week:

create table employee_week_records
( weekno number(4)
, empno number(5)
, mon number(4,2)
, tue number(4,2)
, wed number(4,2)
, thu number(4,2)
, fri number(4,2)
, sat number(4,2)
, sun number(4,2)
)
/

A few records are inserted into this table:

insert into employee_week_records
( weekno, empno, mon, tue, wed, thu, fri, sat, sun)
values
(1, 1210,  8,   7.5, 8.5, 4.5, 8,null,null)
/
insert into employee_week_records
( weekno, empno, mon, tue, wed, thu, fri, sat, sun)
values
(1, 1215,  2,   7.5, 8,   7.5, 8,null,null)
/

Straight querying against this table will give us records with 9 columns. This is not an easy format for example to perform analysis on total hour-count per week or per employee. For such operations and reports it would be much better if the data were in a transactional format, with one record per day for each week and employee. So instead of:

  WEEKNO      EMPNO        MON        TUE        WED        THU        FRI        SAT        SUN
-------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       1       1210          8        7.5        8.5        4.5          8
       1       1215          2        7.5          8        7.5          8

we would get something like:

 WEEKNO      EMPNO DAY       HOURS
------- ---------- ---- ----------
      1       1210 sun
      1       1210 sat
      1       1210 fri           8
      1       1210 thu         4.5
      1       1210 wed         8.5
      1       1215 fri           8
      1       1215 thu         7.5....

The next query against the employee_week_records table and using a MODEL clause returns the transactional format we are looking for:

select weekno
,      empno
,      day
,      hours
from   employee_week_records
model
  return updated rows
  partition by (weekno, empno) -- we want to repeat the creation of 7 hours-cells (or really 7 new records) for every empno and every weekno
  dimension by ('base' day) -- with 'base' day we make up a new dimension (a new column in the spreadsheet); we need it to create new rows by assigning cell values using positional notation with new values for this dimension; note that the original row in the resultset returned by the from-clause will be extended with an extra value ' base'  and alias day.
  measures ( 0 hours, mon, tue, wed, thu, fri, sat, sun)  -- with 0 hours we make up a new measure (a new column in the spreadsheet) with initial value 0
  RULES upsert
  (
     hours['mon' ] = mon['base'] -- mon['base']  refers to the measure mon (one of the columns in the underlying table employee_week_records) where the dimension day has the value 'base'  (the default value is has been assigned during creation and the value that it will have in all records in the result-set fed into the rules)
   , hours['tue' ] = tue['base']
   , hours['wed' ] = wed['base']
   , hours['thu' ] = thu['base']
   , hours['fri' ] = fri['base']
   , hours['sat' ] = sat['base']
   , hours['sun' ] = sun['base']
  )

The results of this query:

 WEEKNO      EMPNO DAY       HOURS
------- ---------- ---- ----------
      1       1210 sun
      1       1210 sat
      1       1210 fri           8
      1       1210 thu         4.5
      1       1210 wed         8.5
      1       1210 tue         7.5
      1       1210 mon           8
      1       1215 sun
      1       1215 sat
      1       1215 fri           8
      1       1215 thu         7.5
      1       1215 wed           8
      1       1215 tue         7.5
      1       1215 mon           2

Now suppose we have records in transactional format and we want to create the weekly timesheet-records we started out with. The table containing the transaction records looks as follows:

create table worked_hours
( weekno number(4)
, empno number(5)
, day   number(1)
, hours number(4,2)
)
/

It has been populated from the EMPLOYEE_WEEK_RECORDS table using a multi-table insert statement, something of a pivot-statement in itself:

insert all
  into worked_hours (weekno, empno,day, hours)
  values (weekno, empno, 1, mon)
  into worked_hours values (weekno,empno, 2, tue)
  into worked_hours values (weekno,empno, 3, wed)
  into worked_hours values (weekno,empno, 4, thu)
  into worked_hours values (weekno,empno, 5, fri)
  into worked_hours values (weekno,empno, 6, sat)
  into worked_hours values (weekno,empno, 7, sun)
 select weekno
 ,      empno
 ,      mon, tue, wed, thu, fri, sat, sun
 from   employee_week_records
/

The data is something like

 WEEKNO      EMPNO        DAY      HOURS
------- ---------- ---------- ----------
      1       1210          1          8
      1       1215          1          2
      1       1210          2        7.5
      1       1215          2        7.5
      1       1210          3        8.5
      1       1215          3          8
      1       1210          4        4.5
...

To rotate these records and pivot-merge them into weekly records, we make use of the following query with MODEL clause:

select weekno
,      empno
,      mon
,      tue
,      wed
,      thu
,      fri
,      sat
,      sun
from   worked_hours
model
  return updated rows
  partition by (weekno, empno)
  dimension by ( day )
  measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun)
  RULES upsert
  (
     mon [0] = hours [1]
   , tue [0] = hours [2]
   , wed [0] = hours [3]
   , thu [0] = hours [4]
   , fri [0] = hours [5]
   , sat [0] = hours [6]
   , sun [0] = hours [7]
  )/

The records returned are

  WEEKNO      EMPNO        MON        TUE        WED        THU        FRI        SAT        SUN
------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
      1       1215          2        7.5          8        7.5          8
      1       1210          8        7.5        8.5        4.5          8

Calculating the Week-Total

One thing we can easily do is to pivot the week-records to transactional records, perform an analysis on the transaction data and in the same query pivot again, to record format. Thus we could add a week-total to each record without actually adding up the daily numbers.

 WEEKNO      EMPNO        MON        TUE        WED        THU        FRI        SAT        SUN  WEEKTOTAL
------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -
      1       1210          8        7.5        8.5        4.5          8                             36.5
      1       1215          2        7.5          8        7.5          8                               33

Granted, this is not the easiest way to get hold of week-totals but it does show you the trick. Of course when in transactional format you can do other things as well: filter out sundays, multiply hours on saturday by two, set limits on the number of hours per day (no more than 12, no less than 0) etc.

The combination of two nested queries – each with MODEL clause – to pivot and un-pivot the records could look as follows:

select weekno
,      empno
,      mon
,      tue
,      wed
,      thu
,      fri
,      sat
,      sun
,      weektotal
from   ( select weekno
         ,      empno
         ,      day
         ,      hours
         from   employee_week_records
         model
           return updated rows
           partition by (weekno, empno)
           dimension by ('base' day)
           measures ( 0 hours, mon, tue, wed, thu, fri, sat, sun)
           RULES upsert
           (
              hours ['mon' ] = mon ['base']
            , hours ['tue' ] = tue ['base']
            , hours ['wed' ] = wed ['base']
            , hours ['thu' ] = thu ['base']
            , hours ['fri' ] = fri ['base']
            , hours ['sat' ] = sat ['base']
            , hours ['sun' ] = sun ['base']
           )
       )
model
  return updated rows
  partition by (weekno, empno)
  dimension by (day )
  measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun, 0 weektotal)
  RULES upsert
  (
     mon['base'] = hours['mon']
   , tue['base'] = hours['tue']
   , wed['base'] = hours['wed']
   , thu['base'] = hours['thu']
   , fri['base'] = hours['fri']
   , sat['base'] = hours['sat']
   , sun['base'] = hours['sun']
   , weektotal['base'] = sum(hours)[ANY]
  )

Most remarkable in that query is the way the weektotal is calculated: by applying the aggregate SUM function on the hours measure for all dimension values within the current partition of weekno and empno.

Matrix Report

This second example demonstrates matrix-reporting of aggregate values in the context of good old EMP. We have this next query that reports for each department the nr1, nr2 and nr3 in terms of salary:

select deptno,
           max(decode(seq,1,ename,null)) highest_paid,
           max(decode(seq,2,ename,null)) second_highest,
           max(decode(seq,3,ename,null)) third_highest
  from ( SELECT deptno, ename,
                row_number() OVER
                   (PARTITION BY deptno
                        ORDER BY sal desc NULLS LAST ) seq
           FROM emp )
where seq < = 3
group by deptno

The result from this query is:

 DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
------- ---------- ---------- ----------
     20 SCOTT      FORD       JONES
     30 BLAKE      ALLEN      TURNER
     10 KING       CLARK      MILLER

Note that we have an alternative solution, using the LEAD() analytical function:

 select deptno
 ,      ename nr1
 ,      nr2
 ,      nr3
 from   ( select deptno
          ,      ename
          ,      lead(ename,1) over (partition by deptno
                                      order by sal desc) nr2
          ,      lead(ename,2) over (partition by deptno
                                     order by sal desc) nr3
          ,      rank() over (partition by deptno order by sal desc) rnk
          from   emp
        )
 where  rnk = 1

Using the SQL Model Clause, we come to a different solution – that you may feel is better readable. I have not yet done a comparison in terms of performance:

select deptno
,      nr1
,      nr2
,      nr3
from   emp
model
  return updated rows
  partition by ( deptno)
  dimension by ( row_number() OVER ( PARTITION BY deptno
                                     ORDER BY sal desc NULLS LAST) rnk
               )
  measures (ename,  lpad(' ',10) nr1, lpad(' ',10) nr2, lpad(' ',10) nr3)
  rules upsert
  (
      nr1 [0] = ename [1]
    , nr2 [0] = ename [2]
    , nr3 [0] = ename [3]
  )
/
    DEPTNO NR1        NR2        NR3
---------- ---------- ---------- ----------
        10 KING       CLARK      MILLER
        30 BLAKE      ALLEN      TURNER
        20 SCOTT      FORD       JONES

A few things to note in this query:

  • The MODEL clause is evaluated after all clauses in the query block except SELECT DISTINCT, and ORDER BY clause are evaluated. These clauses and expressions in the SELECT list are evaluated after the MODEL clause.
  • With the Partition By (Deptno), we do not need to focus on deptno anymore: we will do the same thing for every department without explicitly going through all departments (using for example nr1[FOR deptno in (select distinct deptno from emp),0] = ename[cv(),1] with deptno defined as Dimension rather than Partition element)
  • Oracle Documentation:”If your query has a MODEL clause, then the query’s SELECT and ORDER BY lists cannot contain aggregates or analytic functions. If needed, these can be specified in PARTITION BY, DIMENSION BY, and MEASURES lists and need to be aliased. Aliases can then be used in the SELECT or ORDER BY clauses.” This means for our query that either we use an in-line view:
    from   (select deptno, ename, row_number() OVER ( PARTITION BY deptno ORDER BY sal desc NULLS LAST) rnk from emp)

    or we include the analytical row_number() function in the DIMENSION BY list.

  • The lpad(‘ ‘,10) seems required to set up the nr1, nr2 and nr3 measures that are wide enough to receive the value set by the rules. If we define these measures with for example ‘ ‘ nr1 then we receive an
    ORA-25137: Data value out of range

    error. The 10 in this case suffices since no value for ENAME is wider than 10.

  • The rules UPSERT statement ensures that only updated or newly created records are returned. In this case that means that only newly created records are returned and the original resultset produced by [from emp] is discarded.
  • The measures are taken from the set of 14 records that was set up through select (dimensions, measures) from emp or more specifically: select row_number() OVER ( PARTITION BY deptno
    ORDER BY sal desc NULLS LAST) rnk, ename, lpad(‘ ‘,10) nr1, lpad(‘ ‘,10) nr2, lpad(‘ ‘,10) nr3 from emp.Pivoting in SQL using the 10g Model Clause emp pivot
    Within this set, measure nr1 for each department is found using the positional cell reference ename [1], which refers to the cell in the current department where the dimension rnk has the value 1.

One last example. I have been working on another post where someone asked me to create the following output – this show the salary-sum for each job and in each department as well as the grand-total for each job :

JOB       DEPT10     DEPT20     DEPT30     GRAND_TOTAL
--------- ---------- ---------- ---------- -----------
CLERK     1300       1900       950               4150
SALESMAN                        5600              5600
PRESIDENT 5000                                    5000
MANAGER   2450       2975       2850              8275
ANALYST              6000                         6000

The underlying query looks as follows:

select job
,      dept10
,      dept20
,      dept30
,      grand_total
from   (select distinct deptno,  job, sum(sal) OVER ( PARTITION BY deptno, job) sumsal , sum(sal) over( partition by job) sumjob from emp )
model
  return updated rows
  partition by (job)
  dimension by (deptno)
  measures (sumsal,   lpad(' ',10) dept10, lpad(' ',10) dept20, lpad(' ',10) dept30 , sumjob grand_total)
  rules upsert
  (
      dept10 [0] = sumsal [10]
    , dept20 [0] = sumsal [20]
    , dept30 [0] = sumsal [30]
    , grand_total [0] = max(grand_total) [ANY]
  )
/

Resources

Chapter 22 – SQL for Modeling – Oracle® Database Data Warehousing Guide 10g Release 1 (10.1)
The SQL Model Clause of Oracle Database 10g – An Oracle White Paper (August 2003)–>10gr1-twp-bi-dw-sqlmodel-131067

4 Comments

  1. Dinakar October 11, 2011
  2. Oliver November 30, 2005
  3. Lucas December 17, 2004
  4. amihay gonen December 16, 2004