Cost Based Optimizing through time travel? – the value of meta-data for enabling Query Rewrite

Query Rewrite is an increasingly more important feature of the Oracle database. Linked intimately to Materialized Views, Query Rewrite is used by the CBO to make queries leverage intermediate results – often aggregate values or pre-joined records – available in Materialized Views when executing a query, thus preventing the need for and performance hit of re-retrieving those data. This article gives a very simple example of such a query rewrite operation and then continues to illustrate how providing some additional meta-data to the database – next to constraints and proper data-type definitions – in the form of Dimensions can make all the different to the Query Rewrite capabilities of the CBO.

Before we start on this exploration, we will have to ask you to assume that the EMP table has millions of records. Otherwise, you will probably not be duly impressed.

....

In order to be able to efficiently query the records in our Employee data warehouse (i.e. table EMP), we have created a Materialized View, as follows:

create materialized view emp_time_mv
build immediate
refresh on demand
enable query rewrite
as
select to_char(hiredate, 'mmyyyy') mmyyyy
, count(sal)
, sum(sal)
from emp e
group
by to_char(hiredate, 'mmyyyy')
/
 

This Materialized View is Query rewrite enabled. That means that we allow the Cost Based Optimizer to make use of this MV if that helps for speeding up our queries against base table EMP. The consequence can be that the following query against our monstrous EMP table:

select *
from ( select avg(sal) avg_sal
, to_char(hiredate, 'mmyyyy') mmyyyy
from emp e
group
by to_char(hiredate, 'mmyyyy')
)
where mmyyyy = 121981
 

can be executed by the CBO as query against the Materialized View with a maximum of some hundreds of records:

Cost Based Optimizing through time travel? - the value of meta-data for enabling Query Rewrite qrw1
 
What might surprise you here is that the CBO is able to recognize the fact that AVG is calculated from SUM and COUNT, and therefore it can leverage the intermediate results in the Materialized view.

Now suppose we would like to have the average salary per year. What if we query for that value against table EMP, will the CBO be smart enough to do the query rewrite?

select avg(sal) avg_sal
, substr(to_char(hiredate, 'mmyyyy'),3) yyyy
from emp e
group
by substr(to_char(hiredate, 'mmyyyy'), 3)
 

The results are fine, as is the execution plan:

Cost Based Optimizing through time travel? - the value of meta-data for enabling Query Rewrite qrw2

Because we asked for aggregation by year – with year a substring of the myyyy string we have used as a basis for the Materialized View -the CBO was able to rewrite our query and make use of the MV.

Now suppose a different situation. We are still interested in aggregate salary values per timeframe. We want look at Quarters and we want to specify some additional characteristics for the years.

We retrace our steps a little bit. First we create a table to hold the dimension values, here for the time dimension:

create table time_hierarchy 
( mmyyyy, qtr_yyyy, yyyy)
as
select distinct
to_char(hiredate, 'mmyyyy') mmyyyy
, 'Q'||to_char(hiredate, 'q')||to_char(hiredate, 'yyyy') qtr_yyyy
, to_char(hiredate, 'yyyy') yyyy
from emp

This table contains records for every month in every year that we have Employee records for.
Cost Based Optimizing through time travel? - the value of meta-data for enabling Query Rewrite qrw3 
Now we recreate the Materialized View EMP_TIME_MV

create materialized view emp_time_Mv
build immediate
refresh on demand
enable query rewrite
as
select t.mmyyyy
, count(sal)
, sum(sal)
from emp e
, time_hierarchy t
where to_char(hiredate, 'mmyyyy') = t.mmyyyy
group
by t.mmyyyy
/

Here we have joined the EMP aggregation results with the entries in the Time Dimension. At this point, the CBO is still unaware of the hierarchical relationship between MMYYY through QTR_YYYY to YYYY. If we query:

select t.yyyy
, avg(sal)
from emp e
, time_hierarchy t
where to_char(hiredate, 'mmyyyy') = t.mmyyyy
group
by t.yyyy
/

the CBO does not know it can make use of the Materialized View.
Cost Based Optimizing through time travel? - the value of meta-data for enabling Query Rewrite qrw4 
Let’s tell the CBO about the hierarchy between Months, Quarters and Years:

create dimension time_hierarchy
level mmyyyy is time_hierarchy.mmyyyy
level qtr_yyyy is time_hierarchy.qtr_yyyy
level yyyy is time_hierarchy.yyyy
hierarchy time_roll_up
( mmyyyy child of qtr_yyyy child of yyyy
)

If we now try to select the aggregate year data – the average salary per hire year – we will see the CBO is able to leverage the Materialized, thereby speeding up the query tremendously:

select t.yyyy
, avg(sal)
from emp e
, time_hierarchy t
where to_char(hiredate, 'mmyyyy') = t.mmyyyy
group
by t.yyyy
/

Cost Based Optimizing through time travel? - the value of meta-data for enabling Query Rewrite qrw5

5 Comments

  1. Maxim April 21, 2006
  2. Karl April 21, 2006
  3. Alex Nuijten April 21, 2006
  4. Karl April 21, 2006
  5. Marco Gralike April 21, 2006