Dynamic and Conditional grouping in SQL queries – for flexible results from single query (oh and a useful case for the CUBE operator)
My challenge of the day: a query that returns aggregate values. And that can be manipulated dynamically to sometimes aggregate grouped by one dimension, sometimes along multiple or another single dimensiopn. The query will be used to build a report page on that displays a table with rows returned from the query. In this page, the user can hide/display columns. Whenever a column is hidden, the query should no longer group by the columnn. However, it should still be the same query.
In this article, we will look at a way to bring this dynamic, conditional aggregation behavior into a SQL query. How the query will perform aggregating on different levels without change to the query itself. We will use wildly interesting concepts such as the database context, the CASE statement and last but not least, the CUBE (aggregate) operator.
In a subsequent article we will also take a look at the web page that we create on top of this query.
Let’s start with the mother of all queries:
select deptno , job , avg(sal) "average salary" , count(empno) "num in job" from scott.emp group by deptno , job
The results from this initial query shows aggregate values – average salary and count of staff in job – per Department and per Job:
DEPTNO JOB average salary num in job
---------- --------- -------------- ----------
10 PRESIDENT 5000 1
10 CLERK 1300 1
10 MANAGER 2450 1
20 MANAGER 2975 1
20 ANALYST 3000 2
20 CLERK 950 2
30 SALESMAN 1400 4
30 CLERK 950 1
30 MANAGER 2850 1
We want to change the query to only make it group per JOB for other departments than 30. We can change the query using a CASE statement like this:
select deptno
, case deptno
when 30
then 'ALL'
else job
end job
, avg(sal) "average salary"
, count(empno) "num in job"
from scott.emp
group
by deptno
, case deptno
when 30
then 'ALL'
else job
end
order
by deptno
, job
The results:
DEPTNO JOB average salary num in job
------ --------- -------------- ----------
10 CLERK 1300 1
10 MANAGER 2450 1
10 PRESIDENT 5000 1
20 ANALYST 3000 2
20 CLERK 950 2
20 MANAGER 2975 1
30 ALL 1566.66667 6
Here you see that we group per JOB within departments 10 and 20 but not 30. This is our hook into a more general approach to conditional aggregating.
We want to be able to set switches that specify grouping by Department, Job and Hireyear with simple statements like:
exec query_switcher.group_by_job_yn('Y')
exec query_switcher.group_by_department_yn('N')
The best way to pass information into SQL queriesis through the use of an (application) context. The optimizer treats references to SYS_CONTEXT as bind parameters – making for a much more efficient execution plan than we get through the use of calls to functions in packages that return the values of global package variables.
To set up a context, we first create the package to manage the context contents:
create or replace package query_switcher is procedure group_by_department_yn(p_yn in varchar2); procedure group_by_job_yn(p_yn in varchar2); procedure group_by_hireyear_yn(p_yn in varchar2); end;
Now we can create the context
create context query_switch_ctx using query_switcher
And then the package body:
create or replace package body query_switcher is procedure group_by_department_yn(p_yn in varchar2) is begin DBMS_SESSION.SET_CONTEXT( 'query_switch_ctx', 'group_by_department', p_yn); end; procedure group_by_job_yn(p_yn in varchar2) is begin DBMS_SESSION.SET_CONTEXT( 'query_switch_ctx', 'group_by_job', p_yn); end; procedure group_by_hireyear_yn(p_yn in varchar2) is begin DBMS_SESSION.SET_CONTEXT( 'query_switch_ctx', 'group_by_hireyear', p_yn); end; end;
With all this in place, we can create the query itself that uses the switches from the context for deciding which behavior to assume:
select case sys_context('query_switch_ctx', 'group_by_department')
when 'N'
then null
else deptno
end deptno
, case sys_context('query_switch_ctx', 'group_by_job')
when 'N'
then 'ALL'
else job
end job
, case sys_context('query_switch_ctx', 'group_by_hireyear')
when 'N'
then to_number(null)
else extract(year from hiredate)
end hireyear
, avg(sal) "average salary"
, count(empno) "num in job"
from scott.emp
group
by case sys_context('query_switch_ctx', 'group_by_department')
when 'N'
then null
else deptno
end
, case sys_context('query_switch_ctx', 'group_by_job')
when 'N'
then 'ALL'
else job
end
, case sys_context('query_switch_ctx', 'group_by_hireyear')
when 'N'
then to_number(null)
else extract(year from hiredate)
end
order
by deptno
, job
, hireyear
Before running the query, we can throw the switches to indicate on which columns we want to group.
For example, no column grouping at all – only the overall aggregates:
exec query_switcher.group_by_job_yn('N')
exec query_switcher.group_by_hireyear_yn('N')
exec query_switcher.group_by_department_yn('N')
The results:
DEPTNO JOB HIREYEAR average salary num in job
--------- --------- ---------- -------------- ----------
ALL 2073.21429 14
We would like to group by Hireyear, without changing the query. To achieve that objective, we have to throw the right switch:
exec query_switcher.group_by_hireyear_yn('Y')
And the results:
DEPTNO JOB HIREYEAR average salary num in job
---------- --------- ---------- -------------- ----------
ALL 1980 800 1
ALL 1981 2282.5 10
ALL 1982 1300 1
ALL 1987 2050 2
Try another grouping – throw the (additional) switches:
exec query_switcher.group_by_job_yn('Y')
And the results:
DEPTNO JOB HIREYEAR average salary num in job
---------- --------- ---------- -------------- ----------
ANALYST 1981 3000 1
ANALYST 1987 3000 1
CLERK 1980 800 1
CLERK 1981 950 1
CLERK 1982 1300 1
CLERK 1987 1100 1
MANAGER 1981 2758.33333 3
PRESIDENT 1981 5000 1
SALESMAN 1981 1400 4
Change the grouping – no more grouping on Job and start grouping on Department:
exec query_switcher.group_by_department_yn('Y')
exec query_switcher.group_by_job_yn('N')
And the query returns:
DEPTNO JOB HIREYEAR average salary num in job
------- --------- ---------- -------------- ----------
10 ALL 1981 3725 2
10 ALL 1982 1300 1
20 ALL 1980 800 1
20 ALL 1981 2987.5 2
20 ALL 1987 2050 2
30 ALL 1981 1566.66667 6
exec query_switcher.group_by_department_yn('N')
exec query_switcher.group_by_hireyear_yn('N')
exec query_switcher.group_by_job_yn('Y')
DEPTNO JOB HIREYEAR average salary num in job
-------- --------- ---------- -------------- ----------
ANALYST 3000 2
CLERK 1037.5 4
MANAGER 2758.33333 3
PRESIDENT 5000 1
SALESMAN 1400 4
And on and on…
So we achieved what we set out to do: the ability to dynamically change the grouping of the query – no change needed to the query, only manipulation of switches in a context. We could create a View from this Query and use it from Oracle Forms, APEX or Java Web Applications.
Enter: CUBE
I had the feeling that the CUBE aggregate operator could also be used in situations like this. To achieve similar functionality in a perhaps even leaner way. CUBE has been around since 9iR2 – about 2002 – and I have not been able to find many situations where the CUBE could meaningfully be used. But here I have one.
Let’s return to a two dimension aggregation situation, aggregate on DEPTNO and JOB:
select deptno , job , sum(sal) from emp group by cube(deptno, job)
The CUBE causes the aggregation to be performed in multiple ways in this single query. Aggregation is done for:
- all combinations of deptno and job
- all values of deptno (over all job values)
- all values of job (over all deptno values)
- over all (grand total)
The outcome:
DEPTNO JOB SUM(SAL)
------ --------- ----------
29025
CLERK 4150
ANALYST 6000
MANAGER 8275
SALESMAN 5600
PRESIDENT 5000
10 8750
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
20 10875
20 CLERK 1900
20 ANALYST 6000
20 MANAGER 2975
30 9400
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600
We the CUBE is used in a query, we can use the GROUPING function. This function can be used as an indicator whether the query groups on (aggregates over) a specific column. The previous query with GROUPING looks like this:
select deptno , job , sum(sal) , grouping(job) , grouping(deptno) from emp group by cube(deptno, job)
and results in:
DEPTNO JOB SUM(SAL) GROUPING(JOB) GROUPING(DEPTNO)
--------- --------- ---------- ------------- ----------------
29025 1 1
CLERK 4150 0 1
ANALYST 6000 0 1
MANAGER 8275 0 1
SALESMAN 5600 0 1
PRESIDENT 5000 0 1
10 8750 1 0
10 CLERK 1300 0 0
10 MANAGER 2450 0 0
10 PRESIDENT 5000 0 0
20 10875 1 0
20 CLERK 1900 0 0
20 ANALYST 6000 0 0
20 MANAGER 2975 0 0
30 9400 1 0
30 CLERK 950 0 0
30 MANAGER 2850 0 0
30 SALESMAN 5600 0 0
We can apply a filter on the grouping results:
select *
from ( select deptno
, job
, sum(sal)
, grouping(job) gr_job
, grouping(deptno) gr_deptno
from emp
group
by cube(deptno, job)
)
where gr_job = case sys_context('query_switch_ctx', 'group_by_job')
when 'Y'
then 0
else 1
end
and gr_deptno = case sys_context('query_switch_ctx', 'group_by_deptno')
when 'Y'
then 0
else 1
end
With switches:
exec query_switcher.group_by_department_yn('N')
exec query_switcher.group_by_job_yn('Y')
And outcome:
DEPTNO JOB SUM(SAL) GR_JOB GR_DEPTNO
------- --------- ---------- ---------- ----------
ANALYST 6000 0 1
CLERK 4150 0 1
MANAGER 8275 0 1
PRESIDENT 5000 0 1
SALESMAN 5600 0 1
So we see that the CUBE operator helps us to make the core query a little simpler – no group by clause riddles with CASE statements. And when we use the CUBE we can simply apply a where clause to determine through plain filtering which aggregation levels should be effective.
Note by the way the slightly counterintuitive values of the GROUPING function compared to the switchers I defined: when we say ‘group by Job’ we want to return all rows where grouping(job) is equal to 0. Probably my misunderstanding…
Hi Lucas,
You can use CUBE as an inefficient row generator (“select rownum from (select 1 from dual group by cube(1,1,1,1)) where rownum <= 13″) or maybe for some obscure BI query or report. I’ll have to ask this to some BI colleagues. However, you said it perfectly: “CUBE will aggregate along all dimensions and at all levels – while we will always need just a subset”. I think CUBE has become almost obsolete for normal development, since the advent of grouping sets. With grouping sets you can specify exactly which grouping you want, without having to calculate everything first and then filter.
Regards,
Rob.