Smart denormalization and Enhanced PL/SQL Function Call performance - Comparing Function Based Index and Materialized Views Oracle Headquarters Redwood Shores1 e1698667100526

Smart denormalization and Enhanced PL/SQL Function Call performance – Comparing Function Based Index and Materialized Views

Queries that contain calls to user defined PL/SQL Functions can be performance killers. If a function call is part of the WHERE clause, depending on the execution plan it can be performed for every record that is considered for returning in the query result. For large tables that can lead to thousands to even million of Function invocations (executions and SQL to PL/SQL context switches) in a single query. Needless to say this might add up. Especially if the function itself is non trivial and has to perform derivations or calculations that are not infinitely fast.

Since Oracle 8i we can make use of Function Based Indexes to alleviate some of our distress. With FBI the database pre-calculates the function on each record in the table and stores the outcome in an Index Structure. When the query tries to find records that have a specific result for the function call, the Cost Based Optimizer can decide to use the FBI to directly go to the records that satisfy that query condition. However, this will only help if we have an equality search (where function-result = value). If we want to find records that have values similar to or larger than the function result, the FBI cannot be used. Or if we want to query a number of records from the table, not filtering on the function result, but including the function result for each record, we cannot simply find that function result from the precalculated set of values recorded in the FBI.

With the advent of Materialized Views and the ever increasing query rewrite capabilities, it seems that Materialized Views can take over from or at least complement the Function Based Index functionality. In this article we will investigate what Materialized Views have to offer and how they hold up against the Function Based Index.

 

Let’s first create a test-table to run our queries against. Inspired by countless articles by Tom Kyte and Alex Nuijten I will create a table based on ALL_OBJECTS:

 create table ao1 
as 
select object_name
,      object_type
,      last_ddl_time
,      owner 
,      object_id 
from   all_objects 
where  rownum < 25000
/

alter table ao1
add constraint ao_pk primary key (object_id)
/

select count(*) 
from ao1
/
  COUNT(*)
----------
     24999

This table contains 25000 records – enough to get some meaningful results. The function to be used for our experiment is the following:

create or replace function calculate_stuff
( p_name in varchar2
, p_type in varchar2
) return varchar2
deterministic
is
begin
  keep_count.g_ctr:= keep_count.g_ctr + 1;
  for i in 1..100000 loop null; end loop; -- slow it down a little
  return initcap(p_type)||': '||lower(p_name);
end;
/

It does nothing special: concattenating three string values – and spending some time in an idle loop to make the function execution somewhat non-trivial timewise. The package it calls – keep_count.g_ctr – is a package I have used to keep track of the number of function invocations.

We will first set up the Function Based Index as well as the Materialized View. Note that all queries in the remainer of this article are executed against AO1 to test the Function Based Index and against AO2 to try the Materialized View approach. We create table AO2 in exactly the same way as AO1:

create table ao2 
as 
select object_name
,      object_type
,      last_ddl_time
,      owner 
,      object_id 
from   all_objects 
where  rownum < 25000
/

alter table ao2
add constraint ao2_pk primary key (object_id)
/

Setting up the Function Based Index

Time to create the Function Based Index:

SQL> create index calc_stuff_fbi on ao1( calculate_stuff(object_name, object_type) )
  2  /

Index created.

Elapsed: 00:02:10.50

Creating the index takes little over 2 minutes for our 25000 records.

Setting up the Materialized View

Our Materialized View is very simple: it contains the primary key and the function result:

create materialized view ao2_mv
refresh fast on commit
with primary key
ENABLE QUERY REWRITE
as
select object_id
,      calculate_stuff(object_name, object_type) as calc_stuff
from   ao2
/
Materialized view created.

Elapsed: 00:02:15.86

In order to create this query as Refresh On Commit, we need to create a Materialized View on the underlying table:

create materialized view log on ao2
/
Materialized view log created.

Elapsed: 00:00:01.11

Equality Search – The Function Based Index can Shine!

The query we compare first – the equality search on the Function outcome:

select object_id
,      calculate_stuff(object_name, object_type) 
from   ao1
where calculate_stuff(object_name, object_type) = 'Table: dual'
/

Here are the results for the two situations:

Execution Plan Timing #Function Calls #Records returned
Function Based Index    0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=242 Bytes=9922)
1    0   TABLE ACCESS (BY INDEX ROWID) OF ‘AO1’ (TABLE) (Cost=2 Card=242 Bytes=9922)
2    1     INDEX (RANGE SCAN) OF ‘CALC_STUFF_FBI’ (INDEX) (Cost=1 Card=97)
0.1s 0 1
Materialized View 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=40 Card=2 Bytes=40
30)
1    0   MAT_VIEW REWRITE ACCESS (FULL) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=40 Card=2 Bytes=4030)
0.3s 0 1

Now we can go a little beyond this result for the Materialized View. We can create an index on the column calc_stuff in the materialized view:

create index calc_stuff_index on ao2_mv (calc_stuff)
/

Index created.

Elapsed: 00:00:00.93

The results for the query are now:

 

Execution Plan Timing #Function Calls #Records returned
Materialized View  0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=2015)
1    0   MAT_VIEW REWRITE ACCESS (BY INDEX ROWID) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=2 Card=1 Bytes=2015)
2    1     INDEX (RANGE SCAN) OF ‘CALC_STUFF_INDEX’ (INDEX) (Cost=1 Card=1)
0.1s 0 1

 

Querying on similarity – not equality: What good can the FBI do now?

This next query queries for records whose Function Result is similar to a certain value. Will we be able to leverage the fact that the function results are pre-calculated and available in the FBI?

select object_id
,      calculate_stuff(object_name, object_type) 
from   ao1
where calculate_stuff(object_name, object_type) like 'Table:%'

/

And the results:

Execution Plan Timing #Function Calls #Records returned
Function Based Index Execution Plan
———————————————————-
0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=48 Card=1208 Bytes=49528)

1    0   TABLE ACCESS (BY INDEX ROWID) OF ‘AO1’ (TABLE) (Cost=48 Card=1208 Bytes=49528)

2    1     INDEX (RANGE SCAN) OF ‘CALC_STUFF_FBI’ (INDEX) (Cost=3 Card=217)

0.3s 0 20
Materialized View 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=17 Card=20 Bytes=40300)
1    0   MAT_VIEW REWRITE ACCESS (BY INDEX ROWID) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=17 Card=20 Bytes=40300)
2    1     INDEX (RANGE SCAN) OF ‘CALC_STUFF_INDEX’ (INDEX) (Cost=2 Card=20)
0.3s 0 20

 

Querying for the Function Result – no where clause referencing the function at all
here

In this case we look at a query that does not filter on the function result but still wants to return that result:

select object_id
,      calculate_stuff(object_name, object_type) 
from   ao1
where rownum < 500
/

We wonder of course whether we can make use of the precalculated values stored in the Function Based Index:

Execution Plan Timing #Function Calls #Records returned
Function Based Index 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=20459)
1    0   COUNT (STOPKEY)
2    1     TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=3 Card=24162 Bytes=990642)
4.4s 499 499
Materialized View 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=1005485)
1    0   COUNT (STOPKEY)
2    1     MAT_VIEW REWRITE ACCESS (FULL) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=3 Card=25929 Bytes=52246935)
1.4s 0 499

 

The FBI result almost suggests – comparing with the previous query – that if we had included the function call in the where clause, we could have leveraged the FBI “cache”. Let’s see about that. Well it turns out that queries like:

 select object_id
 ,      calculate_stuff(object_name, object_type)
 from   ao1
 where rownum < 500
 and   (calculate_stuff(object_name, object_type) like 'Table%' or 1=1)

still make the function call for all records returned. So we cannot leverage the FBI if we did not explicitly select records based on the expression the FBI was based on. The FBI is just to smart for its own good you could say. Well, that reminds me: we use a hint to try to enforce usage of the FBI:

select /*+ INDEX (ao1 calc_stuff_fbi) */object_id
,      calculate_stuff(object_name, object_type) 
from   ao1
where rownum < 500
/

The CBO happily ignores my hint. So still the 499 function calls.

Or-ring the results together

The next query filters on records that either have TABLE as their OBJECT_TYPE value or have a function result like ‘Synon%’. For each record returned, we want to see the function result.

select object_id
,      calculate_stuff(object_name, object_type) 
from   ao1
where  object_type ='TABLE'
or     calculate_stuff(object_name, object_type) like 'Synon%' 
/

We wonder how many function executions are required:

Execution Plan Timing #Function Calls #Records returned
Function Based Index  0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=56 Card=1438 Bytes=58958)

1    0   TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=56 Card=1438 Bytes=58958)

4m3s 37706 12727
Materialized View 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=289 Card=752 Bytes
=1533328)

1    0   CONCATENATION
2    1     HASH JOIN (Cost=199 Card=579 Bytes=1180581)
3    2       TABLE ACCESS (FULL) OF ‘AO2’ (TABLE) (Cost=49 Card=174
80 Bytes=419520)

4    2       MAT_VIEW REWRITE ACCESS (BY INDEX ROWID) OF ‘AO2_MV’ (
MAT_VIEW REWRITE) (Cost=38 Card=694 Bytes=1398410)

5    4         INDEX (RANGE SCAN) OF ‘CALC_STUFF_INDEX’ (INDEX) (Co
st=3 Card=125)

6    1     HASH JOIN (Cost=90 Card=173 Bytes=352747)
7    6       TABLE ACCESS (FULL) OF ‘AO2’ (TABLE) (Cost=50 Card=175
Bytes=4200)

8    6       MAT_VIEW REWRITE ACCESS (FULL) OF ‘AO2_MV’ (MAT_VIEW R
EWRITE) (Cost=40 Card=694 Bytes=1398410)

18.0s 0 12727

Aggregation
of the Function Results

Just to analyze all common situations, let’s see if aggegrating the function results has an unexpected behavior:

select object_type
,      max(calculate_stuff(object_name, object_type) )
from   ao1
group
by     object_type
/

And here are the results for this query:

Execution Plan Timing #Function Calls #Records returned
Function Based Index 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=52 Card=24162 Bytes=676536)
1    0   SORT (GROUP BY) (Cost=52 Card=24162 Bytes=676536)
2    1     TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=49 Card 24162 Bytes=676536)
2m6s 24999 18
Materialized View 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=420 Card=20945 Byt
es=42706855)

1    0   SORT (GROUP BY) (Cost=420 Card=20945 Bytes=42706855)
2    1     MERGE JOIN (Cost=418 Card=20945 Bytes=42706855)
3    2       MAT_VIEW REWRITE ACCESS (BY INDEX ROWID) OF ‘AO2_MV’ (
MAT_VIEW REWRITE) (Cost=219 Card=25929 Bytes=52246935)

4    3         INDEX (FULL SCAN) OF ‘AO2_PK1’ (INDEX (UNIQUE)) (Cos
t=60 Card=25929)

5    2       SORT (JOIN) (Cost=199 Card=20945 Bytes=502680)
6    5         TABLE ACCESS (FULL) OF ‘AO2’ (TABLE) (Cost=49 Card=2
0945 Bytes=502680)

1.4s 0 18

 

Concattenating the function result – any FBI leverage?

The last query we take a look at uses the Function Result to simply concattenate it with another value. We already know the FBI won’t do us any good in this situation. Let’s find out about the Materialized Vie. Here is the query:

select 'Object Id: '||object_id
     ||' ('
     ||substr( calculate_stuff(object_name, object_type)
             , 1
             , instr(calculate_stuff(object_name, object_type),':')
             )
     ||')' Object_Id_and_Type
from   ao1
where  rownum< 500
/

and here are the results:

Execution Plan Timing #Function Calls #Records returned
Function Based Index 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=20459)
1    0   COUNT (STOPKEY)
2    1     TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=3 Card=24162 Bytes=990642)
5.8s 499 499
Materialized View 0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=1
005485)

1    0   COUNT (STOPKEY)
2    1     MAT_VIEW REWRITE ACCESS (FULL) OF ‘AO2_MV’ (MAT_VIEW REW
RITE) (Cost=3 Card=25929 Bytes=52246935)

1s 0 499

 

Conclusions

The Function Based Index provides very efficient query performance improvements when we filter by an expression and (possibly) also want to return that expression. The FBI is a very lean structure with almost the lowest possible data duplication. If we want to find records that have a function result equal or similar to a certain value, the FBI is superior. (and I have to confess I was surprised to see that in the second query using LIKE on the Function Result the FBI could still be leveraged!).

The Materialized View approach is not as lightweight. It requires a Materialized View Log. And it stores a complete – though lean – record for every record in the base table. Adding an index on this Materialized View adds even more storage requirements as well as DML overhead (probably very slight). However, the precalculated values in the Materialized View could be leveraged in every scenario we went through. After creating the Materialized View, we never again had to make a single Function Call!

The over all conclusion can be that in many circumstances, Materialized Views can be a good alternative to Function Based Indexes. And – not something we specifically addressed in this article – MVs are a very good compromise between denormalization in the data design and pragmatical & performant data retrieval. We do not have to clutter our tables with column containing redundant – denormalized, derived, calculated – data, yet we can still benefit from the performance improvements such denormalization would give us!

2 Comments

  1. karl May 23, 2006
  2. David Aldridge May 18, 2006