Cost Based Optimizing through time travel? – the value of meta-data for enabling Query Rewrite

Lucas Jellema 5
0 0
Read Time:3 Minute, 43 Second

Query Rewrite is an increasingly more important feature of the Oracle database. Linked intimately to Materialized Views, Query Rewrite is used by the CBO to make queries leverage intermediate results – often aggregate values or pre-joined records – available in Materialized Views when executing a query, thus preventing the need for and performance hit of re-retrieving those data. This article gives a very simple example of such a query rewrite operation and then continues to illustrate how providing some additional meta-data to the database – next to constraints and proper data-type definitions – in the form of Dimensions can make all the different to the Query Rewrite capabilities of the CBO.

Before we start on this exploration, we will have to ask you to assume that the EMP table has millions of records. Otherwise, you will probably not be duly impressed.

....

In order to be able to efficiently query the records in our Employee data warehouse (i.e. table EMP), we have created a Materialized View, as follows:

create materialized view emp_time_mv
build immediate
refresh on demand
enable query rewrite
as
select to_char(hiredate, 'mmyyyy') mmyyyy
, count(sal)
, sum(sal)
from emp e
group
by to_char(hiredate, 'mmyyyy')
/
 

This Materialized View is Query rewrite enabled. That means that we allow the Cost Based Optimizer to make use of this MV if that helps for speeding up our queries against base table EMP. The consequence can be that the following query against our monstrous EMP table:

select *
from ( select avg(sal) avg_sal
, to_char(hiredate, 'mmyyyy') mmyyyy
from emp e
group
by to_char(hiredate, 'mmyyyy')
)
where mmyyyy = 121981
 

can be executed by the CBO as query against the Materialized View with a maximum of some hundreds of records:


 
What might surprise you here is that the CBO is able to recognize the fact that AVG is calculated from SUM and COUNT, and therefore it can leverage the intermediate results in the Materialized view.

Now suppose we would like to have the average salary per year. What if we query for that value against table EMP, will the CBO be smart enough to do the query rewrite?

select avg(sal) avg_sal
, substr(to_char(hiredate, 'mmyyyy'),3) yyyy
from emp e
group
by substr(to_char(hiredate, 'mmyyyy'), 3)
 

The results are fine, as is the execution plan:

Because we asked for aggregation by year – with year a substring of the myyyy string we have used as a basis for the Materialized View -the CBO was able to rewrite our query and make use of the MV.

Now suppose a different situation. We are still interested in aggregate salary values per timeframe. We want look at Quarters and we want to specify some additional characteristics for the years.

We retrace our steps a little bit. First we create a table to hold the dimension values, here for the time dimension:

create table time_hierarchy 
( mmyyyy, qtr_yyyy, yyyy)
as
select distinct
to_char(hiredate, 'mmyyyy') mmyyyy
, 'Q'||to_char(hiredate, 'q')||to_char(hiredate, 'yyyy') qtr_yyyy
, to_char(hiredate, 'yyyy') yyyy
from emp

This table contains records for every month in every year that we have Employee records for.
 
Now we recreate the Materialized View EMP_TIME_MV

create materialized view emp_time_Mv
build immediate
refresh on demand
enable query rewrite
as
select t.mmyyyy
, count(sal)
, sum(sal)
from emp e
, time_hierarchy t
where to_char(hiredate, 'mmyyyy') = t.mmyyyy
group
by t.mmyyyy
/

Here we have joined the EMP aggregation results with the entries in the Time Dimension. At this point, the CBO is still unaware of the hierarchical relationship between MMYYY through QTR_YYYY to YYYY. If we query:

select t.yyyy
, avg(sal)
from emp e
, time_hierarchy t
where to_char(hiredate, 'mmyyyy') = t.mmyyyy
group
by t.yyyy
/

the CBO does not know it can make use of the Materialized View.
 
Let’s tell the CBO about the hierarchy between Months, Quarters and Years:

create dimension time_hierarchy
level mmyyyy is time_hierarchy.mmyyyy
level qtr_yyyy is time_hierarchy.qtr_yyyy
level yyyy is time_hierarchy.yyyy
hierarchy time_roll_up
( mmyyyy child of qtr_yyyy child of yyyy
)

If we now try to select the aggregate year data – the average salary per hire year – we will see the CBO is able to leverage the Materialized, thereby speeding up the query tremendously:

select t.yyyy
, avg(sal)
from emp e
, time_hierarchy t
where to_char(hiredate, 'mmyyyy') = t.mmyyyy
group
by t.yyyy
/

About Post Author

Lucas Jellema

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.
Happy
Happy
0 %
Sad
Sad
0 %
Excited
Excited
0 %
Sleepy
Sleepy
0 %
Angry
Angry
0 %
Surprise
Surprise
0 %

Average Rating

5 Star
0%
4 Star
0%
3 Star
0%
2 Star
0%
1 Star
0%

5 thoughts on “Cost Based Optimizing through time travel? – the value of meta-data for enabling Query Rewrite

  1. That are indeed exciting features, interesting is as well, the rewrite can be “bidirectional”, i.e. if your materialized view is built on count(*) and avg(*), it is used to query a sum (it means formula to calculate the total sum from partial averages and partial counts is known to Query Transformer), there are even much more complex transformations known ( total variance can be calculated from partial variances etc). Also big help to recognize rewrite capabilities does dbms_mview.explain_rewrite.

    Regards

    Maxim

  2. Great article!
    ‘That means that we allow the Cost Based Optimizer to make use of this MV’
    I am not sure about this : Not CBO rewrites the Query but a Component call Query Rewriter? the Query Rewriter rewrites the Query and then let’s the CBO evalualte the costs? Then out of the set of rewritten queries the lowes cost rewritten query is used.
    Greetings
    Karl

Comments are closed.

Next Post

Find the number of descendant nodes in a SQL query - how many employees work (indirectly) for me? And what do they earn on average?

Every once in a while you run into a SQL challenge that seems extremely simple at first glance and turns out to be not so very simple when you look a bit deeper. Hierarchical queries are not part of my every day diet, but they occur quite regularly. So I […]
%d bloggers like this: