Dynamic and Conditional grouping in SQL queries – for flexible results from single query (oh and a useful case for the CUBE operator)

3

 

My challenge of the day: a query that returns aggregate values. And that can be manipulated dynamically to sometimes aggregate grouped by one dimension, sometimes along multiple or another single dimensiopn. The query will be used to build a report page on that displays a table with rows returned from the query. In this page, the user can hide/display columns. Whenever a column is hidden, the query should no longer group by the columnn. However, it should still be the same query.

In this article, we will look at a way to bring this dynamic, conditional aggregation behavior into a SQL query. How the query will perform aggregating on different levels without change to the query itself. We will use wildly interesting concepts such as the database context, the CASE statement and last but not least, the CUBE (aggregate) operator.

In a subsequent article we will also take a look at the web page that we create on top of this query.

Let’s start with the mother of all queries:

select deptno
,      job
,      avg(sal) "average salary"
,      count(empno) "num in job"
from   scott.emp
group
by     deptno
,      job

The results from this initial query shows aggregate values – average salary and count of staff in job – per Department and per Job:

    DEPTNO JOB       average salary num in job
---------- --------- -------------- ----------
        10 PRESIDENT           5000          1
        10 CLERK               1300          1
        10 MANAGER             2450          1
        20 MANAGER             2975          1
        20 ANALYST             3000          2
        20 CLERK                950          2
        30 SALESMAN            1400          4
        30 CLERK                950          1
        30 MANAGER             2850          1

We want to change the query to only make it group per JOB for other departments than 30. We can change the query using a CASE statement like this:

select deptno
,      case deptno
       when 30
       then 'ALL'
       else job
       end  job
,      avg(sal) "average salary"
,      count(empno) "num in job"
from   scott.emp
group
by     deptno
,      case deptno
       when 30
       then 'ALL'
       else job
       end
order
by     deptno
,      job

The results:

DEPTNO JOB       average salary num in job
------ --------- -------------- ----------
    10 CLERK               1300          1
    10 MANAGER             2450          1
    10 PRESIDENT           5000          1
    20 ANALYST             3000          2
    20 CLERK                950          2
    20 MANAGER             2975          1
    30 ALL           1566.66667          6

Here you see that we group per JOB within departments 10 and 20 but not 30. This is our hook into a more general approach to conditional aggregating.

We want to be able to set switches that specify grouping by Department, Job and Hireyear with simple statements like:

exec query_switcher.group_by_job_yn('Y')
exec query_switcher.group_by_department_yn('N')

The best way to pass information into SQL queriesis through the use of an (application) context. The optimizer treats references to SYS_CONTEXT as bind parameters – making for a much more efficient  execution plan than we get through the use of calls to functions in packages that return the values of global package variables.

To set up a context, we first create the package to manage the context contents:

create or replace package query_switcher
is

procedure group_by_department_yn(p_yn in varchar2);
procedure group_by_job_yn(p_yn in varchar2);
procedure group_by_hireyear_yn(p_yn in varchar2);

end;

Now we can create the context

create context query_switch_ctx using query_switcher

And then the package body:

create or replace package body query_switcher
is

procedure group_by_department_yn(p_yn in varchar2)
is
begin
  DBMS_SESSION.SET_CONTEXT( 'query_switch_ctx', 'group_by_department', p_yn);
end;

procedure group_by_job_yn(p_yn in varchar2)
is
begin
  DBMS_SESSION.SET_CONTEXT( 'query_switch_ctx', 'group_by_job', p_yn);
end;

procedure group_by_hireyear_yn(p_yn in varchar2)
is
begin
  DBMS_SESSION.SET_CONTEXT( 'query_switch_ctx', 'group_by_hireyear', p_yn);
end;

end;

With all this in place, we can create the query itself that uses the switches from the context for deciding which behavior to assume:

select case sys_context('query_switch_ctx', 'group_by_department')
       when 'N'
       then null
       else deptno
       end deptno
,      case sys_context('query_switch_ctx', 'group_by_job')
       when 'N'
       then 'ALL'
       else job
       end  job
,      case sys_context('query_switch_ctx', 'group_by_hireyear')
       when 'N'
       then to_number(null)
       else extract(year from hiredate)
       end  hireyear
,      avg(sal) "average salary"
,      count(empno) "num in job"
from   scott.emp
group
by     case sys_context('query_switch_ctx', 'group_by_department')
       when 'N'
       then null
       else deptno
       end
,      case sys_context('query_switch_ctx', 'group_by_job')
       when 'N'
       then 'ALL'
       else job
       end
,      case sys_context('query_switch_ctx', 'group_by_hireyear')
       when 'N'
       then to_number(null)
       else extract(year from hiredate)
       end
order
by     deptno
,      job
,      hireyear

Before running the query, we can throw the switches to indicate on which columns we want to group.

For example, no column grouping at all – only the overall aggregates:

exec query_switcher.group_by_job_yn('N')
exec query_switcher.group_by_hireyear_yn('N')
exec query_switcher.group_by_department_yn('N')

The results:

   DEPTNO JOB         HIREYEAR average salary num in job
--------- --------- ---------- -------------- ----------
          ALL                      2073.21429         14

We would like to group by Hireyear, without changing the query. To achieve that objective, we have to throw the right switch:

exec query_switcher.group_by_hireyear_yn('Y')

And the results:

    DEPTNO JOB         HIREYEAR average salary num in job
---------- --------- ---------- -------------- ----------
           ALL             1980            800          1
           ALL             1981         2282.5         10
           ALL             1982           1300          1
           ALL             1987           2050          2

Try another grouping – throw the (additional) switches:

exec query_switcher.group_by_job_yn('Y')

And the results:

    DEPTNO JOB         HIREYEAR average salary num in job
---------- --------- ---------- -------------- ----------
           ANALYST         1981           3000          1
           ANALYST         1987           3000          1
           CLERK           1980            800          1
           CLERK           1981            950          1
           CLERK           1982           1300          1
           CLERK           1987           1100          1
           MANAGER         1981     2758.33333          3
           PRESIDENT       1981           5000          1
           SALESMAN        1981           1400          4

Change the grouping – no more grouping on Job and start grouping on Department:

exec query_switcher.group_by_department_yn('Y')
exec query_switcher.group_by_job_yn('N')

And the query returns:

 DEPTNO JOB         HIREYEAR average salary num in job
------- --------- ---------- -------------- ----------
     10 ALL             1981           3725          2
     10 ALL             1982           1300          1
     20 ALL             1980            800          1
     20 ALL             1981         2987.5          2
     20 ALL             1987           2050          2
     30 ALL             1981     1566.66667          6

exec query_switcher.group_by_department_yn('N')
exec query_switcher.group_by_hireyear_yn('N')
exec query_switcher.group_by_job_yn('Y')


  DEPTNO JOB         HIREYEAR average salary num in job
-------- --------- ---------- -------------- ----------
         ANALYST                        3000          2
         CLERK                        1037.5          4
         MANAGER                  2758.33333          3
         PRESIDENT                      5000          1
         SALESMAN                       1400          4

And on and on…

So we achieved what we set out to do: the ability to dynamically change the grouping of the query – no change needed to the query, only manipulation of switches in a context. We could create a View from this Query and use it from Oracle Forms, APEX or Java Web Applications.

Enter: CUBE

I had the feeling that the CUBE aggregate operator could also be used in situations like this. To achieve similar functionality in a perhaps even leaner way. CUBE has been around since 9iR2 – about 2002 – and I have not been able to find many situations where the CUBE could meaningfully be used. But here I have one.

Let’s return to a two dimension aggregation situation, aggregate on DEPTNO and JOB:

select deptno
,      job
,      sum(sal)
from   emp
group
by     cube(deptno, job)

The CUBE causes the aggregation to be performed in multiple ways in this single query. Aggregation is done for:

  • all combinations of deptno and job
  • all values of deptno (over all job values)
  • all values of job (over all deptno values)
  • over all (grand total)

The outcome:

DEPTNO JOB         SUM(SAL)
------ --------- ----------
                      29025
       CLERK           4150
       ANALYST         6000
       MANAGER         8275
       SALESMAN        5600
       PRESIDENT       5000
    10                 8750
    10 CLERK           1300
    10 MANAGER         2450
    10 PRESIDENT       5000
    20                10875
    20 CLERK           1900
    20 ANALYST         6000
    20 MANAGER         2975
    30                 9400
    30 CLERK            950
    30 MANAGER         2850
    30 SALESMAN        5600

We the CUBE is used in a query, we can use the GROUPING function. This function can be used as an indicator whether the query groups on (aggregates over) a  specific column. The previous query with GROUPING looks like this:

select deptno
,      job
,      sum(sal)
,      grouping(job)
,      grouping(deptno)
from   emp
group
by     cube(deptno, job)

and results in:

   DEPTNO JOB         SUM(SAL) GROUPING(JOB) GROUPING(DEPTNO)
--------- --------- ---------- ------------- ----------------
                         29025             1                1
          CLERK           4150             0                1
          ANALYST         6000             0                1
          MANAGER         8275             0                1
          SALESMAN        5600             0                1
          PRESIDENT       5000             0                1
       10                 8750             1                0
       10 CLERK           1300             0                0
       10 MANAGER         2450             0                0
       10 PRESIDENT       5000             0                0
       20                10875             1                0
       20 CLERK           1900             0                0
       20 ANALYST         6000             0                0
       20 MANAGER         2975             0                0
       30                 9400             1                0
       30 CLERK            950             0                0
       30 MANAGER         2850             0                0
       30 SALESMAN        5600             0                0

We can apply a filter on the grouping results:

select *
from ( select deptno
       ,      job
       ,      sum(sal)
       ,      grouping(job) gr_job
       ,      grouping(deptno) gr_deptno
       from   emp
       group
       by     cube(deptno, job)
     )
where gr_job = case sys_context('query_switch_ctx', 'group_by_job')
               when 'Y'
               then 0
               else 1
               end
and   gr_deptno = case sys_context('query_switch_ctx', 'group_by_deptno')
                  when 'Y'
                  then 0
                  else 1
                  end

With switches:

exec query_switcher.group_by_department_yn('N')
exec query_switcher.group_by_job_yn('Y')

And outcome:

 DEPTNO JOB         SUM(SAL)     GR_JOB  GR_DEPTNO
------- --------- ---------- ---------- ----------
        ANALYST         6000          0          1
        CLERK           4150          0          1
        MANAGER         8275          0          1
        PRESIDENT       5000          0          1
        SALESMAN        5600          0          1

So we see that the CUBE operator helps us to make the core query a little simpler – no group by clause riddles with CASE statements. And when we use the CUBE we can simply apply a where clause to determine through plain filtering which aggregation levels should be effective.

Note by the way the slightly counterintuitive values of the GROUPING function compared to the switchers I defined: when we say ‘group by Job’  we want to return all rows where grouping(job) is equal to 0. Probably my misunderstanding…

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

3 Comments

  1. Hi Lucas,

    You can use CUBE as an inefficient row generator (“select rownum from (select 1 from dual group by cube(1,1,1,1)) where rownum <= 13″) or maybe for some obscure BI query or report. I’ll have to ask this to some BI colleagues. However, you said it perfectly: “CUBE will aggregate along all dimensions and at all levels – while we will always need just a subset”. I think CUBE has become almost obsolete for normal development, since the advent of grouping sets. With grouping sets you can specify exactly which grouping you want, without having to calculate everything first and then filter.

    Regards,
    Rob.

  2. Hi Rob,
    Thanks for your response. Yes, you are right on both accounts. I did not use the HAVING because of the way in which I expect the application to make use of this query. Manipulating the where-clause is much easier in the application than fiddling with the HAVING clause, that’s why I approached it like this. And of course the CUBE will aggregate along all dimensions and at all levels – while we will always need just a subset. How costly that proves to be? The percentage unnecessary aggregations can grow quite large, depending on the logic in the application – up to 95% or more. So perhaps this is not a very good example of CUBE usage after all. Do you know one?
    regards, Lucas

  3. Lucas,

    Nice post.
    Two remarks though:
    1) the CUBE query will look much easier when using the HAVING clause, since there won’t be a need for the inline view in that case.
    2) the CUBE query will be less performant than your first query as it has to calculate all grouping sets first before being able to filter on them. To verify: in the explain plan you’ll see a “GENERATE CUBE” step with a much larger number in the A-rows column than the number of rows in your table.

    Regards,
    Rob.