Queries that contain calls to user defined PL/SQL Functions can be performance killers. If a function call is part of the WHERE clause, depending on the execution plan it can be performed for every record that is considered for returning in the query result. For large tables that can lead to thousands to even million of Function invocations (executions and SQL to PL/SQL context switches) in a single query. Needless to say this might add up. Especially if the function itself is non trivial and has to perform derivations or calculations that are not infinitely fast.
Since Oracle 8i we can make use of Function Based Indexes to alleviate some of our distress. With FBI the database pre-calculates the function on each record in the table and stores the outcome in an Index Structure. When the query tries to find records that have a specific result for the function call, the Cost Based Optimizer can decide to use the FBI to directly go to the records that satisfy that query condition. However, this will only help if we have an equality search (where function-result = value). If we want to find records that have values similar to or larger than the function result, the FBI cannot be used. Or if we want to query a number of records from the table, not filtering on the function result, but including the function result for each record, we cannot simply find that function result from the precalculated set of values recorded in the FBI.
With the advent of Materialized Views and the ever increasing query rewrite capabilities, it seems that Materialized Views can take over from or at least complement the Function Based Index functionality. In this article we will investigate what Materialized Views have to offer and how they hold up against the Function Based Index.
Let’s first create a test-table to run our queries against. Inspired by countless articles by Tom Kyte and Alex Nuijten I will create a table based on ALL_OBJECTS:
create table ao1 as select object_name , object_type , last_ddl_time , owner , object_id from all_objects where rownum < 25000 / alter table ao1 add constraint ao_pk primary key (object_id) / select count(*) from ao1 / COUNT(*) ---------- 24999
This table contains 25000 records – enough to get some meaningful results. The function to be used for our experiment is the following:
create or replace function calculate_stuff ( p_name in varchar2 , p_type in varchar2 ) return varchar2 deterministic is begin keep_count.g_ctr:= keep_count.g_ctr + 1; for i in 1..100000 loop null; end loop; -- slow it down a little return initcap(p_type)||': '||lower(p_name); end; /
It does nothing special: concattenating three string values – and spending some time in an idle loop to make the function execution somewhat non-trivial timewise. The package it calls – keep_count.g_ctr – is a package I have used to keep track of the number of function invocations.
We will first set up the Function Based Index as well as the Materialized View. Note that all queries in the remainer of this article are executed against AO1 to test the Function Based Index and against AO2 to try the Materialized View approach. We create table AO2 in exactly the same way as AO1:
create table ao2 as select object_name , object_type , last_ddl_time , owner , object_id from all_objects where rownum < 25000 / alter table ao2 add constraint ao2_pk primary key (object_id) /
Setting up the Function Based Index
Time to create the Function Based Index:
SQL> create index calc_stuff_fbi on ao1( calculate_stuff(object_name, object_type) ) 2 / Index created. Elapsed: 00:02:10.50
Creating the index takes little over 2 minutes for our 25000 records.
Setting up the Materialized View
Our Materialized View is very simple: it contains the primary key and the function result:
create materialized view ao2_mv refresh fast on commit with primary key ENABLE QUERY REWRITE as select object_id , calculate_stuff(object_name, object_type) as calc_stuff from ao2 / Materialized view created. Elapsed: 00:02:15.86
In order to create this query as Refresh On Commit, we need to create a Materialized View on the underlying table:
create materialized view log on ao2 / Materialized view log created. Elapsed: 00:00:01.11
Equality Search – The Function Based Index can Shine!
The query we compare first – the equality search on the Function outcome:
select object_id , calculate_stuff(object_name, object_type) from ao1 where calculate_stuff(object_name, object_type) = 'Table: dual' /
Here are the results for the two situations:
Execution Plan | Timing | #Function Calls | #Records returned | |
Function Based Index | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=242 Bytes=9922) 1 0 TABLE ACCESS (BY INDEX ROWID) OF ‘AO1’ (TABLE) (Cost=2 Card=242 Bytes=9922) 2 1 INDEX (RANGE SCAN) OF ‘CALC_STUFF_FBI’ (INDEX) (Cost=1 Card=97) |
0.1s | 0 | 1 |
Materialized View | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=40 Card=2 Bytes=40 30) 1 0 MAT_VIEW REWRITE ACCESS (FULL) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=40 Card=2 Bytes=4030) |
0.3s | 0 | 1 |
Now we can go a little beyond this result for the Materialized View. We can create an index on the column calc_stuff in the materialized view:
create index calc_stuff_index on ao2_mv (calc_stuff) /
Index created.
Elapsed: 00:00:00.93
The results for the query are now:
Execution Plan | Timing | #Function Calls | #Records returned | |
Materialized View | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=2015) 1 0 MAT_VIEW REWRITE ACCESS (BY INDEX ROWID) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=2 Card=1 Bytes=2015) 2 1 INDEX (RANGE SCAN) OF ‘CALC_STUFF_INDEX’ (INDEX) (Cost=1 Card=1) |
0.1s | 0 | 1 |
Querying on similarity – not equality: What good can the FBI do now?
This next query queries for records whose Function Result is similar to a certain value. Will we be able to leverage the fact that the function results are pre-calculated and available in the FBI?
select object_id , calculate_stuff(object_name, object_type) from ao1 where calculate_stuff(object_name, object_type) like 'Table:%'
/
And the results:
Execution Plan | Timing | #Function Calls | #Records returned | |
Function Based Index | Execution Plan ———————————————————- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=48 Card=1208 Bytes=49528) 1 0 TABLE ACCESS (BY INDEX ROWID) OF ‘AO1’ (TABLE) (Cost=48 Card=1208 Bytes=49528) 2 1 INDEX (RANGE SCAN) OF ‘CALC_STUFF_FBI’ (INDEX) (Cost=3 Card=217) |
0.3s | 0 | 20 |
Materialized View | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=17 Card=20 Bytes=40300) 1 0 MAT_VIEW REWRITE ACCESS (BY INDEX ROWID) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=17 Card=20 Bytes=40300) 2 1 INDEX (RANGE SCAN) OF ‘CALC_STUFF_INDEX’ (INDEX) (Cost=2 Card=20) |
0.3s | 0 | 20 |
Querying for the Function Result – no where clause referencing the function at all
here
In this case we look at a query that does not filter on the function result but still wants to return that result:
select object_id , calculate_stuff(object_name, object_type) from ao1 where rownum < 500 /
We wonder of course whether we can make use of the precalculated values stored in the Function Based Index:
Execution Plan | Timing | #Function Calls | #Records returned | |
Function Based Index | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=20459) 1 0 COUNT (STOPKEY) 2 1 TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=3 Card=24162 Bytes=990642) |
4.4s | 499 | 499 |
Materialized View | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=1005485) 1 0 COUNT (STOPKEY) 2 1 MAT_VIEW REWRITE ACCESS (FULL) OF ‘AO2_MV’ (MAT_VIEW REWRITE) (Cost=3 Card=25929 Bytes=52246935) |
1.4s | 0 | 499 |
The FBI result almost suggests – comparing with the previous query – that if we had included the function call in the where clause, we could have leveraged the FBI “cache”. Let’s see about that. Well it turns out that queries like:
select object_id , calculate_stuff(object_name, object_type) from ao1 where rownum < 500 and (calculate_stuff(object_name, object_type) like 'Table%' or 1=1)
still make the function call for all records returned. So we cannot leverage the FBI if we did not explicitly select records based on the expression the FBI was based on. The FBI is just to smart for its own good you could say. Well, that reminds me: we use a hint to try to enforce usage of the FBI:
select /*+ INDEX (ao1 calc_stuff_fbi) */object_id , calculate_stuff(object_name, object_type) from ao1 where rownum < 500 /
The CBO happily ignores my hint. So still the 499 function calls.
Or-ring the results together
The next query filters on records that either have TABLE as their OBJECT_TYPE value or have a function result like ‘Synon%’. For each record returned, we want to see the function result.
select object_id , calculate_stuff(object_name, object_type) from ao1 where object_type ='TABLE' or calculate_stuff(object_name, object_type) like 'Synon%' /
We wonder how many function executions are required:
Execution Plan | Timing | #Function Calls | #Records returned | |
Function Based Index | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=56 Card=1438 Bytes=58958)
1 0 TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=56 Card=1438 Bytes=58958) |
4m3s | 37706 | 12727 |
Materialized View | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=289 Card=752 Bytes =1533328) 1 0 CONCATENATION 4 2 MAT_VIEW REWRITE ACCESS (BY INDEX ROWID) OF ‘AO2_MV’ ( 5 4 INDEX (RANGE SCAN) OF ‘CALC_STUFF_INDEX’ (INDEX) (Co 6 1 HASH JOIN (Cost=90 Card=173 Bytes=352747) 8 6 MAT_VIEW REWRITE ACCESS (FULL) OF ‘AO2_MV’ (MAT_VIEW R |
18.0s | 0 | 12727 |
Aggregation
of the Function Results
Just to analyze all common situations, let’s see if aggegrating the function results has an unexpected behavior:
select object_type , max(calculate_stuff(object_name, object_type) ) from ao1 group by object_type /
And here are the results for this query:
Execution Plan | Timing | #Function Calls | #Records returned | |
Function Based Index | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=52 Card=24162 Bytes=676536) 1 0 SORT (GROUP BY) (Cost=52 Card=24162 Bytes=676536) 2 1 TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=49 Card 24162 Bytes=676536) |
2m6s | 24999 | 18 |
Materialized View | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=420 Card=20945 Byt es=42706855) 1 0 SORT (GROUP BY) (Cost=420 Card=20945 Bytes=42706855) 4 3 INDEX (FULL SCAN) OF ‘AO2_PK1’ (INDEX (UNIQUE)) (Cos 5 2 SORT (JOIN) (Cost=199 Card=20945 Bytes=502680) |
1.4s | 0 | 18 |
Concattenating the function result – any FBI leverage?
The last query we take a look at uses the Function Result to simply concattenate it with another value. We already know the FBI won’t do us any good in this situation. Let’s find out about the Materialized Vie. Here is the query:
select 'Object Id: '||object_id ||' (' ||substr( calculate_stuff(object_name, object_type) , 1 , instr(calculate_stuff(object_name, object_type),':') ) ||')' Object_Id_and_Type from ao1 where rownum< 500 /
and here are the results:
Execution Plan | Timing | #Function Calls | #Records returned | |
Function Based Index | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=20459) 1 0 COUNT (STOPKEY) 2 1 TABLE ACCESS (FULL) OF ‘AO1’ (TABLE) (Cost=3 Card=24162 Bytes=990642) |
5.8s | 499 | 499 |
Materialized View | 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=499 Bytes=1 005485) 1 0 COUNT (STOPKEY) |
1s | 0 | 499 |
Conclusions
The Function Based Index provides very efficient query performance improvements when we filter by an expression and (possibly) also want to return that expression. The FBI is a very lean structure with almost the lowest possible data duplication. If we want to find records that have a function result equal or similar to a certain value, the FBI is superior. (and I have to confess I was surprised to see that in the second query using LIKE on the Function Result the FBI could still be leveraged!).
The Materialized View approach is not as lightweight. It requires a Materialized View Log. And it stores a complete – though lean – record for every record in the base table. Adding an index on this Materialized View adds even more storage requirements as well as DML overhead (probably very slight). However, the precalculated values in the Materialized View could be leveraged in every scenario we went through. After creating the Materialized View, we never again had to make a single Function Call!
The over all conclusion can be that in many circumstances, Materialized Views can be a good alternative to Function Based Indexes. And – not something we specifically addressed in this article – MVs are a very good compromise between denormalization in the data design and pragmatical & performant data retrieval. We do not have to clutter our tables with column containing redundant – denormalized, derived, calculated – data, yet we can still benefit from the performance improvements such denormalization would give us!
Very interesting blog!
Karl
Ah, very interesting.
An interesting hybrid solution would be to place the MV table in a hash cluster, thus getting co-location of the same value in a way that approaches the index and fast lookup on equality predicates.
Plenty of food for thought in here though Lucas.