The Oracle 11g UNPIVOT operator - turning columns into rows Oracle Headquarters Redwood Shores1 e1698667100526

The Oracle 11g UNPIVOT operator – turning columns into rows

The PIVOT operation is one frequently discussed when talking about more advanced SQL Queries. Pivoting is the process of switching rows and columns, and is for example an advanced feature in Excel. Before Oracle 11g, there were several approaches to pivoting, that were not work-arounds for the lack of the PIVOT operator in the Oracle SQL Language.

With Oracle 11g, this has changed. Now we do have a PIVOT and an UNPIVOT operator at our disposal. Let’s first take a look at UNPIVOT. In Oracle, the UNPIVOT operation is the process of turning Columns to Rows. Put simply, by applying the UNPIVOT operator to a number of columns, every row is split into that same number of rows. Each of these rows has two new columns: one for the column that this row stems from – one of the columns the data set was UNPIVOTed by – and one with the value from the column. The original UNPIVOT-columns are no longer part of the newly created records.

Let’s take a look at a simple example, using the well known EMP table. We decide to consider the SAL and COMM columns as two different types of INCOME_COMPONENT. We would like to present our employees with an INCOME_COMPONENT column – and two rows for each employee with two income components.

This UNPIVOT query would look like this:

select *
from   ( select ename, job, sal, comm
         from   emp
       )
       unpivot
       ( income_component_value
         for income_component_type in (sal, comm)
       )
/


Here we have specified that the rows selected from emp, with columns ename, job, sal and comm, should be UNPIVOTed by the SAL and COMM column. This means that for each row with a value in the SAL column, we get a new row with columns ENAME and JOB, a new column INCOME_COMPONENT_TYPE with value ‘SAL’  and a new column INCOME_COMPONENT_VALUE with the value from the SAL column. If that row also has a value for COMM it spawns another new row with the same ENAME and JOB values, a value of ‘COMM’ for the INCOME_COMPONENT_TYPE column and the COMM value in the new column INCOME_COMPONENT_VALUE:

ENAME      JOB       INCO INCOME_COMPONENT_VALUE
---------- --------- ---- ----------------------
SMITH      CLERK     SAL                     800
ALLEN      SALESMAN  SAL                    1600
ALLEN      SALESMAN  COMM                    300
WARD       SALESMAN  SAL                    1250
WARD       SALESMAN  COMM                    500
JONES      MANAGER   SAL                    2975
MARTIN     SALESMAN  SAL                    1250
MARTIN     SALESMAN  COMM                   1400
BLAKE      MANAGER   SAL                    2850
CLARK      MANAGER   SAL                    2450
KING       PRESIDENT SAL                    5000
TURNER     SALESMAN  SAL                    1500
TURNER     SALESMAN  COMM                      0
ADAMS      CLERK     SAL                    1100
JAMES      CLERK     SAL                     950
MILLER     CLERK     SAL                    1300
SCOTT      ANALYST   SAL                    3000
FORD       ANALYST   SAL                    3000

18 rows selected.

With this result, it becomes easy to calculate the total income for our employees, although with a simple “sal+nvl(comm,0) as Income” that could be done without UNPIVOT of course:

select ename
,       job
,      sum(income_component_value) income
from   ( select ename, job, sal, comm
         from   emp
       )
       unpivot
       ( income_component_value
         for income_component_type in (sal, comm)
       )
group
by     ename
,      job
/

ENAME JOB INCOME ———- ——— ———- WARD SALESMAN 1750 SMITH CLERK 800 CLARK MANAGER 2450 TURNER SALESMAN 1500 JAMES CLERK 950 JONES MANAGER 2975 MARTIN SALESMAN 2650 ADAMS CLERK 1100 SCOTT ANALYST 3000 KING PRESIDENT 5000 FORD ANALYST 3000 ALLEN SALESMAN 1900 BLAKE MANAGER 2850 MILLER CLERK 1300 14 rows selected.

This example does not do much I fear to illustrate the usefulness of the UNPIVOT operation. Where it really shines I believe is when the data design has not been as denormalized as we would have liked and rather than the transactional format we would prefer for aggregations, we are stuck with multiple columns in each row. Typical examples are records with columns for each day of the week or even day of the month or for several fixed – or so we thought – times during each day. When the screen to enter values looks like a matrix, all too frequently the table has such a column layout. And data operations are harder than they should be. Enter the UNPIVOT.

Look at this example.

SQL> desc daily_measurements
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------
 DAY                                                DATE
 MORNING_MEASUREMENT                                NUMBER(5,2)
 AFTERNOON_MEASUREMENT                              NUMBER(5,2)
 EVENING_MEASUREMENT                                NUMBER(5,2)

At one point, this table design seemed a good idea. By now, we know better. Getting an average measurement per day is much harder than one would think, knowing that our overworked staff could easily miss one of the daily measurements – and of course we do not know which one.

DAY       MORNING_MEASUREMENT AFTERNOON_MEASUREMENT EVENING_MEASUREMENT
--------- ------------------- --------------------- -------------------
27-SEP-07                10.2                 45.91                35.6
28-SEP-07               16.12                 25.13               38.21
29-SEP-07               21.65                                      81.2

select (morning_measurement+afternoon_measurement+evening_measurement)/3 day_average is not going to do the trick. First of all we need NVL(measurement,0) to not have NULLs distort the summation. Then we should divide by the number of measurements that is actually there: case when morning_measurement is null then 0 else 1 end + case when afternoon_measurement is null then 0 else 1 end +… is one way to arrive at the factor to divide by. Not elegant!

Using UNPIVOT, life becomes so much easier in these circumstances:

select day
,      avg(measurement)
from   daily_measurements
       unpivot
       ( measurement
         for measurement_time in ( morning_measurement
                                 , afternoon_measurement
                                 , evening_measurement
                                 )
       )
group
by    day
/

DAY       AVG(MEASUREMENT)
--------- ----------------
27-SEP-07            30.57
28-SEP-07       26.4866667
29-SEP-07           51.425

That is all we need to do. We take the three measurement columns in the daily_measurements table and use them for UNPIVOTing. Each of these measurement columns spawns – when it contains a value – a new record for the day. Instead of one row for each day, with three measurements, we get a row per measurement.

In the old days, with Oracle 10g or earlier, we probably would have solved this with something like:

select day
,      morning_measurement measurement
,      'morning_measurement'  measurement_time
from   daily_measurements
where  morning_measurement is not null
union all
select day
,      afternoon_measurement measurement
,      'afternoon_measurement'  measurement_time
from   daily_measurements
where  afternoon_measurement is not null
union all
select day
,      evening_measurement measurement
,      'evening_measurement'  measurement_time
from   daily_measurements
where  evening_measurement is not null
/

It works, but with a growing number of columns, it is not pretty. I have not investigated, but it would be my expectation that UNPIVOT is faster as well.

2 Comments

  1. Robert Gauf October 4, 2007
  2. Vidya Balasubramanian October 1, 2007