Pivoting records means changing their format and typically either splitting or merging records. Usual examples include transactional records such as a list of daily transactions that need to be turned into a record structure, for example a weekly record. The inverse operation is also sometimes required. You can also use pivoting to get a nicer, more compact result from a SQL query without procedural logic.
Pivoting (both merging and splitting) can easily be implemented using Table Functions. See for example: Turning On Pivot Tables By Jonathan Gennick (Oracle Magazine 2002). Table Functions require you to create an object type (most of the times), a nested table type and a PL/SQL Function. For a simple (!) query, that may be a lot of trouble. And although Table Functions can be made quite efficient – they even allow parallel execution – they still can probably not compete with pure SQL based solutions.
You can “trick” merging records in SQL using in-line views and an aggregation statement (this works in early 8.0 versions of the database). This SQL trick seems to be invented by Tom Kyte (see his Expert One-on-One Oracle book) – and can now be seen in many articles. For an example, see: Pivot Query on AskTom. This solution requires SQL that is somewhat difficult to read, understand and maintain: the purpose of the aggregation to condense several rows into one is not immediately clear.
You can ususally make use of a UNION for splitting records. This is not very efficient and requires you to duplicate entire SQL statements which is far from ideal.
The new way to pivot records – merging and splitting using the 10g SQL MODEL
The SQL MODEL clause, introduced in Oracle 10g Release 1, gives new ways to approach pivoting, both merging and splitting records. Let’s take a look at the following example:
Timesheets
We have a table employee_week_record that contains timesheets per employee per week:
create table employee_week_records ( weekno number(4) , empno number(5) , mon number(4,2) , tue number(4,2) , wed number(4,2) , thu number(4,2) , fri number(4,2) , sat number(4,2) , sun number(4,2) ) /
A few records are inserted into this table:
insert into employee_week_records ( weekno, empno, mon, tue, wed, thu, fri, sat, sun) values (1, 1210, 8, 7.5, 8.5, 4.5, 8,null,null) / insert into employee_week_records ( weekno, empno, mon, tue, wed, thu, fri, sat, sun) values (1, 1215, 2, 7.5, 8, 7.5, 8,null,null) /
Straight querying against this table will give us records with 9 columns. This is not an easy format for example to perform analysis on total hour-count per week or per employee. For such operations and reports it would be much better if the data were in a transactional format, with one record per day for each week and employee. So instead of:
WEEKNO EMPNO MON TUE WED THU FRI SAT SUN -------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 1 1210 8 7.5 8.5 4.5 8 1 1215 2 7.5 8 7.5 8
we would get something like:
WEEKNO EMPNO DAY HOURS ------- ---------- ---- ---------- 1 1210 sun 1 1210 sat 1 1210 fri 8 1 1210 thu 4.5 1 1210 wed 8.5 1 1215 fri 8 1 1215 thu 7.5....
The next query against the employee_week_records table and using a MODEL clause returns the transactional format we are looking for:
select weekno , empno , day , hours from employee_week_records model return updated rows partition by (weekno, empno) -- we want to repeat the creation of 7 hours-cells (or really 7 new records) for every empno and every weekno dimension by ('base' day) -- with 'base' day we make up a new dimension (a new column in the spreadsheet); we need it to create new rows by assigning cell values using positional notation with new values for this dimension; note that the original row in the resultset returned by the from-clause will be extended with an extra value ' base' and alias day. measures ( 0 hours, mon, tue, wed, thu, fri, sat, sun) -- with 0 hours we make up a new measure (a new column in the spreadsheet) with initial value 0 RULES upsert ( hours['mon' ] = mon['base'] -- mon['base'] refers to the measure mon (one of the columns in the underlying table employee_week_records) where the dimension day has the value 'base' (the default value is has been assigned during creation and the value that it will have in all records in the result-set fed into the rules) , hours['tue' ] = tue['base'] , hours['wed' ] = wed['base'] , hours['thu' ] = thu['base'] , hours['fri' ] = fri['base'] , hours['sat' ] = sat['base'] , hours['sun' ] = sun['base'] )
The results of this query:
WEEKNO EMPNO DAY HOURS ------- ---------- ---- ---------- 1 1210 sun 1 1210 sat 1 1210 fri 8 1 1210 thu 4.5 1 1210 wed 8.5 1 1210 tue 7.5 1 1210 mon 8 1 1215 sun 1 1215 sat 1 1215 fri 8 1 1215 thu 7.5 1 1215 wed 8 1 1215 tue 7.5 1 1215 mon 2
Now suppose we have records in transactional format and we want to create the weekly timesheet-records we started out with. The table containing the transaction records looks as follows:
create table worked_hours ( weekno number(4) , empno number(5) , day number(1) , hours number(4,2) ) /
It has been populated from the EMPLOYEE_WEEK_RECORDS table using a multi-table insert statement, something of a pivot-statement in itself:
insert all into worked_hours (weekno, empno,day, hours) values (weekno, empno, 1, mon) into worked_hours values (weekno,empno, 2, tue) into worked_hours values (weekno,empno, 3, wed) into worked_hours values (weekno,empno, 4, thu) into worked_hours values (weekno,empno, 5, fri) into worked_hours values (weekno,empno, 6, sat) into worked_hours values (weekno,empno, 7, sun) select weekno , empno , mon, tue, wed, thu, fri, sat, sun from employee_week_records /
The data is something like
WEEKNO EMPNO DAY HOURS ------- ---------- ---------- ---------- 1 1210 1 8 1 1215 1 2 1 1210 2 7.5 1 1215 2 7.5 1 1210 3 8.5 1 1215 3 8 1 1210 4 4.5 ...
To rotate these records and pivot-merge them into weekly records, we make use of the following query with MODEL clause:
select weekno , empno , mon , tue , wed , thu , fri , sat , sun from worked_hours model return updated rows partition by (weekno, empno) dimension by ( day ) measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun) RULES upsert ( mon [0] = hours [1] , tue [0] = hours [2] , wed [0] = hours [3] , thu [0] = hours [4] , fri [0] = hours [5] , sat [0] = hours [6] , sun [0] = hours [7] )/
The records returned are
WEEKNO EMPNO MON TUE WED THU FRI SAT SUN ------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 1 1215 2 7.5 8 7.5 8 1 1210 8 7.5 8.5 4.5 8
Calculating the Week-Total
One thing we can easily do is to pivot the week-records to transactional records, perform an analysis on the transaction data and in the same query pivot again, to record format. Thus we could add a week-total to each record without actually adding up the daily numbers.
WEEKNO EMPNO MON TUE WED THU FRI SAT SUN WEEKTOTAL ------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- - 1 1210 8 7.5 8.5 4.5 8 36.5 1 1215 2 7.5 8 7.5 8 33
Granted, this is not the easiest way to get hold of week-totals but it does show you the trick. Of course when in transactional format you can do other things as well: filter out sundays, multiply hours on saturday by two, set limits on the number of hours per day (no more than 12, no less than 0) etc.
The combination of two nested queries – each with MODEL clause – to pivot and un-pivot the records could look as follows:
select weekno , empno , mon , tue , wed , thu , fri , sat , sun , weektotal from ( select weekno , empno , day , hours from employee_week_records model return updated rows partition by (weekno, empno) dimension by ('base' day) measures ( 0 hours, mon, tue, wed, thu, fri, sat, sun) RULES upsert ( hours ['mon' ] = mon ['base'] , hours ['tue' ] = tue ['base'] , hours ['wed' ] = wed ['base'] , hours ['thu' ] = thu ['base'] , hours ['fri' ] = fri ['base'] , hours ['sat' ] = sat ['base'] , hours ['sun' ] = sun ['base'] ) ) model return updated rows partition by (weekno, empno) dimension by (day ) measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun, 0 weektotal) RULES upsert ( mon['base'] = hours['mon'] , tue['base'] = hours['tue'] , wed['base'] = hours['wed'] , thu['base'] = hours['thu'] , fri['base'] = hours['fri'] , sat['base'] = hours['sat'] , sun['base'] = hours['sun'] , weektotal['base'] = sum(hours)[ANY] )
Most remarkable in that query is the way the weektotal is calculated: by applying the aggregate SUM function on the hours measure for all dimension values within the current partition of weekno and empno.
Matrix Report
This second example demonstrates matrix-reporting of aggregate values in the context of good old EMP. We have this next query that reports for each department the nr1, nr2 and nr3 in terms of salary:
select deptno, max(decode(seq,1,ename,null)) highest_paid, max(decode(seq,2,ename,null)) second_highest, max(decode(seq,3,ename,null)) third_highest from ( SELECT deptno, ename, row_number() OVER (PARTITION BY deptno ORDER BY sal desc NULLS LAST ) seq FROM emp ) where seq < = 3 group by deptno
The result from this query is:
DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH ------- ---------- ---------- ---------- 20 SCOTT FORD JONES 30 BLAKE ALLEN TURNER 10 KING CLARK MILLER
Note that we have an alternative solution, using the LEAD() analytical function:
select deptno , ename nr1 , nr2 , nr3 from ( select deptno , ename , lead(ename,1) over (partition by deptno order by sal desc) nr2 , lead(ename,2) over (partition by deptno order by sal desc) nr3 , rank() over (partition by deptno order by sal desc) rnk from emp ) where rnk = 1
Using the SQL Model Clause, we come to a different solution – that you may feel is better readable. I have not yet done a comparison in terms of performance:
select deptno , nr1 , nr2 , nr3 from emp model return updated rows partition by ( deptno) dimension by ( row_number() OVER ( PARTITION BY deptno ORDER BY sal desc NULLS LAST) rnk ) measures (ename, lpad(' ',10) nr1, lpad(' ',10) nr2, lpad(' ',10) nr3) rules upsert ( nr1 [0] = ename [1] , nr2 [0] = ename [2] , nr3 [0] = ename [3] ) / DEPTNO NR1 NR2 NR3 ---------- ---------- ---------- ---------- 10 KING CLARK MILLER 30 BLAKE ALLEN TURNER 20 SCOTT FORD JONES
A few things to note in this query:
- The MODEL clause is evaluated after all clauses in the query block except SELECT DISTINCT, and ORDER BY clause are evaluated. These clauses and expressions in the SELECT list are evaluated after the MODEL clause.
- With the Partition By (Deptno), we do not need to focus on deptno anymore: we will do the same thing for every department without explicitly going through all departments (using for example nr1[FOR deptno in (select distinct deptno from emp),0] = ename[cv(),1] with deptno defined as Dimension rather than Partition element)
- Oracle Documentation:”If your query has a MODEL clause, then the query’s SELECT and ORDER BY lists cannot contain aggregates or analytic functions. If needed, these can be specified in PARTITION BY, DIMENSION BY, and MEASURES lists and need to be aliased. Aliases can then be used in the SELECT or ORDER BY clauses.” This means for our query that either we use an in-line view:
from (select deptno, ename, row_number() OVER ( PARTITION BY deptno ORDER BY sal desc NULLS LAST) rnk from emp)
or we include the analytical row_number() function in the DIMENSION BY list.
- The lpad(‘ ‘,10) seems required to set up the nr1, nr2 and nr3 measures that are wide enough to receive the value set by the rules. If we define these measures with for example ‘ ‘ nr1 then we receive an
ORA-25137: Data value out of range
error. The 10 in this case suffices since no value for ENAME is wider than 10.
- The rules UPSERT statement ensures that only updated or newly created records are returned. In this case that means that only newly created records are returned and the original resultset produced by [from emp] is discarded.
- The measures are taken from the set of 14 records that was set up through select (dimensions, measures) from emp or more specifically: select row_number() OVER ( PARTITION BY deptno
ORDER BY sal desc NULLS LAST) rnk, ename, lpad(‘ ‘,10) nr1, lpad(‘ ‘,10) nr2, lpad(‘ ‘,10) nr3 from emp.
Within this set, measure nr1 for each department is found using the positional cell reference ename [1], which refers to the cell in the current department where the dimension rnk has the value 1.
One last example. I have been working on another post where someone asked me to create the following output – this show the salary-sum for each job and in each department as well as the grand-total for each job :
JOB DEPT10 DEPT20 DEPT30 GRAND_TOTAL --------- ---------- ---------- ---------- ----------- CLERK 1300 1900 950 4150 SALESMAN 5600 5600 PRESIDENT 5000 5000 MANAGER 2450 2975 2850 8275 ANALYST 6000 6000
The underlying query looks as follows:
select job , dept10 , dept20 , dept30 , grand_total from (select distinct deptno, job, sum(sal) OVER ( PARTITION BY deptno, job) sumsal , sum(sal) over( partition by job) sumjob from emp ) model return updated rows partition by (job) dimension by (deptno) measures (sumsal, lpad(' ',10) dept10, lpad(' ',10) dept20, lpad(' ',10) dept30 , sumjob grand_total) rules upsert ( dept10 [0] = sumsal [10] , dept20 [0] = sumsal [20] , dept30 [0] = sumsal [30] , grand_total [0] = max(grand_total) [ANY] ) /
Resources
Chapter 22 – SQL for Modeling – Oracle® Database Data Warehousing Guide 10g Release 1 (10.1)
The SQL Model Clause of Oracle Database 10g – An Oracle White Paper (August 2003)–>10gr1-twp-bi-dw-sqlmodel-131067
Hi Lucas,
i tried using your pivoting logic to convert columns into rows, but i din’t quite understand whats happening in the code though. can you please please explain or answer the below questions:
1.) why a partition was required in this case?
2.) why measures have to start with 0 hrs, why it should be defaulted?
3.) What is the model clause actually doing, can you please elaborate it step by step?
Â
Â
How could I resolved the issue in which I don’t know before hand the number of columns. You know that you have mon, tue … sun. How about if you are trying to convert 20 departments and you want to filter by dept ID but you need the Department Name as the column name. The deparments filtered vary constantly so the names for the columns have to be according to the data extracted
Dept ID Desc Emps
10 Planning 3
10 Planning 3
15 Mktg 2
..
200 Call Center 5
I need an output
Planning Mktg
6 2
another time
Planning Call Center
6 5
From what I recently understood is that this may indicate that there was not enough memory space to hold the intermediate results and file swapping took place for the TEMP area. Can you somehow enlarge the memory available to this process?
here is a performace test i’ve maded on worked_hours.
insert into work_hours select mod(rownum,56),
rownum,mod(rownum,7),1 from dba_objects;
select count(1) from worked_hours;
49647
select weekno
, empno
, mon
, tue
, wed
, thu
, fri
, sat
, sun
from worked_hours
model
return updated rows
partition by (weekno, empno)
dimension by ( day )
measures ( hours, 0 mon, 0 tue, 0 wed, 0 thu, 0 fri, 0 sat, 0 sun)
RULES upsert
(
mon [0] = hours [1]
, tue [0] = hours [2]
, wed [0] = hours [3]
, thu [0] = hours [4]
, fri [0] = hours [5]
, sat [0] = hours [6]
, sun [0] = hours [7]
)
/
Statistics
———————————————————-
10 recursive calls
0 db block gets
184 consistent gets
1260 physical reads
0 redo size
1228664 bytes sent via SQL*Net to client
36900 bytes received via SQL*Net from client
3310 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
49635 rows processed
49635 rows selected.
Elapsed: 00:00:04.21
SQL> select weekno
2 , empno
3 , mon
4 , tue
5 , wed
6 , thu
7 , fri
8 , sat
9 , sun
10 from (
11 select weekno,empno,sum(decode(day,2,hours,0)) mon,
12 sum(decode(day,3,hours,0)) tue
13 ,sum(decode(day,4,hours,0)) wed
14 ,sum(decode(day,5,hours,0)) thu
15 ,sum(decode(day,6,hours,0)) fri
16 ,sum(decode(day,7,hours,0)) sat
17 ,sum(decode(day,7,hours,0)) sun from
18 worked_hours group by weekno,empno)
19 /
49635 rows selected.
Elapsed: 00:00:01.92
Statistics
———————————————————-
4 recursive calls
6 db block gets
184 consistent gets
272 physical reads
0 redo size
943540 bytes sent via SQL*Net to client
36900 bytes received via SQL*Net from client
3310 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
49635 rows processed
It seems that the old pivot method is better. It seems that the difference connect somehow to phiscal reads.