Cost Based Optimizing through time travel? – the value of meta-data for enabling Query Rewrite

5

Query Rewrite is an increasingly more important feature of the Oracle database. Linked intimately to Materialized Views, Query Rewrite is used by the CBO to make queries leverage intermediate results – often aggregate values or pre-joined records – available in Materialized Views when executing a query, thus preventing the need for and performance hit of re-retrieving those data. This article gives a very simple example of such a query rewrite operation and then continues to illustrate how providing some additional meta-data to the database – next to constraints and proper data-type definitions – in the form of Dimensions can make all the different to the Query Rewrite capabilities of the CBO.

Before we start on this exploration, we will have to ask you to assume that the EMP table has millions of records. Otherwise, you will probably not be duly impressed.

....

In order to be able to efficiently query the records in our Employee data warehouse (i.e. table EMP), we have created a Materialized View, as follows:

create materialized view emp_time_mv<br />build immediate<br />refresh on demand<br />enable query rewrite<br />as<br />select to_char(hiredate, 'mmyyyy') mmyyyy<br />,      count(sal)<br />,      sum(sal)<br />from   emp e<br />group<br />by     to_char(hiredate, 'mmyyyy')<br />/<br />&nbsp;

This Materialized View is Query rewrite enabled. That means that we allow the Cost Based Optimizer to make use of this MV if that helps for speeding up our queries against base table EMP. The consequence can be that the following query against our monstrous EMP table:

select *<br /> from   ( select avg(sal) avg_sal<br />          ,      to_char(hiredate, 'mmyyyy') mmyyyy<br />          from   emp e<br />          group<br />          by     to_char(hiredate, 'mmyyyy')  <br />        )<br /> where  mmyyyy = 121981<br />&nbsp;

can be executed by the CBO as query against the Materialized View with a maximum of some hundreds of records:


 
What might surprise you here is that the CBO is able to recognize the fact that AVG is calculated from SUM and COUNT, and therefore it can leverage the intermediate results in the Materialized view.

Now suppose we would like to have the average salary per year. What if we query for that value against table EMP, will the CBO be smart enough to do the query rewrite?

select avg(sal) avg_sal<br />,      substr(to_char(hiredate, 'mmyyyy'),3) yyyy<br />from   emp e<br />group<br />by     substr(to_char(hiredate, 'mmyyyy'), 3)<br />&nbsp;

The results are fine, as is the execution plan:

Because we asked for aggregation by year – with year a substring of the myyyy string we have used as a basis for the Materialized View -the CBO was able to rewrite our query and make use of the MV.

Now suppose a different situation. We are still interested in aggregate salary values per timeframe. We want look at Quarters and we want to specify some additional characteristics for the years.

We retrace our steps a little bit. First we create a table to hold the dimension values, here for the time dimension:

create table time_hierarchy <br />( mmyyyy, qtr_yyyy, yyyy)<br />as <br />select distinct<br />       to_char(hiredate, 'mmyyyy') mmyyyy<br />,      'Q'||to_char(hiredate, 'q')||to_char(hiredate, 'yyyy') qtr_yyyy<br />,      to_char(hiredate, 'yyyy') yyyy<br />from   emp<br /><br />

This table contains records for every month in every year that we have Employee records for.
 
Now we recreate the Materialized View EMP_TIME_MV

create materialized view emp_time_Mv<br /> build immediate<br /> refresh on demand<br /> enable query rewrite<br /> as<br /> select t.mmyyyy<br /> ,      count(sal)<br /> ,      sum(sal)<br /> from   emp e<br /> ,      time_hierarchy t<br /> where  to_char(hiredate, 'mmyyyy') = t.mmyyyy<br /> group<br /> by     t.mmyyyy<br />/<br />

Here we have joined the EMP aggregation results with the entries in the Time Dimension. At this point, the CBO is still unaware of the hierarchical relationship between MMYYY through QTR_YYYY to YYYY. If we query:

select t.yyyy<br />,      avg(sal)<br />from   emp e<br />,      time_hierarchy t<br />where  to_char(hiredate, 'mmyyyy') = t.mmyyyy<br />group<br />by     t.yyyy<br />/<br /><br />

the CBO does not know it can make use of the Materialized View.
 
Let’s tell the CBO about the hierarchy between Months, Quarters and Years:

create dimension time_hierarchy<br /> level mmyyyy is time_hierarchy.mmyyyy<br /> level qtr_yyyy is time_hierarchy.qtr_yyyy<br /> level yyyy is time_hierarchy.yyyy <br />hierarchy time_roll_up<br />( mmyyyy child of qtr_yyyy child of yyyy<br />)<br /><br />

If we now try to select the aggregate year data – the average salary per hire year – we will see the CBO is able to leverage the Materialized, thereby speeding up the query tremendously:

select t.yyyy<br />,      avg(sal)<br />from   emp e<br />,      time_hierarchy t<br />where  to_char(hiredate, 'mmyyyy') = t.mmyyyy<br />group<br />by     t.yyyy<br />/<br /><br />

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

5 Comments

  1. That are indeed exciting features, interesting is as well, the rewrite can be “bidirectional”, i.e. if your materialized view is built on count(*) and avg(*), it is used to query a sum (it means formula to calculate the total sum from partial averages and partial counts is known to Query Transformer), there are even much more complex transformations known ( total variance can be calculated from partial variances etc). Also big help to recognize rewrite capabilities does dbms_mview.explain_rewrite.

    Regards

    Maxim

  2. Great article!
    ‘That means that we allow the Cost Based Optimizer to make use of this MV’
    I am not sure about this : Not CBO rewrites the Query but a Component call Query Rewriter? the Query Rewriter rewrites the Query and then let’s the CBO evalualte the costs? Then out of the set of rewritten queries the lowes cost rewritten query is used.
    Greetings
    Karl