Testing Knowledge of Database Performance – Is this a good Execution Plan?

7

Database Performance Analysis and Tuning of queries and Database Design have never been my specialty. As of late, I have been reading a lot in this area, on Wait Statistics and events 10046 and 10053, on different strategies for table design and index design, on partitioning and clustering and many other more or less advanced topics. But the basics are also under construction. What does an execution plan tell me again? How can I interpret the plans of the CBO and how can I recognize a suboptimal plan or a plan based on wrong information. More in general: what does the CBO know and do? What does it mean to execute a query?

Doing database performance tuning is not my daily job. But it is something every self conscious Oracle professional should be well versed in. Besides, I am doing a lot of job interviews with Oracle professional applying for a job at AMIS. It happens quite frequently that they state to know quite a bit about database performance and query tuning. Then of course I try to assess exactly how much they know and have done in this area. Simple questions are whether they use TKPROF and what information they gather from Database Traces (that eliminates most of them by the way, as hardly anyone seems to get beyond a simple Explain Plan and using Trace files and TKPROF is only for the elite). Then I try to spar a little on the CBO, Hints, Execution Plan, Index Design and so on.

As we were recently designing a four week training program for junior Oracle Developers, it struck me that a very simple question could be a good first test as to the awareness with regard to performance related knowledge and experience of people I interview: is this a good execution plan?
....
I will show a simple enough situation: two tables – ORDERS and PRODUCTS – and a query that joins the two.

Something like:

explain plan for<br />select p.name<br />,      o.price<br />,      o.quantity<br />,      o.order_date<br />from   orders o<br />,      products p<br />where  o.order_date between to_date('18-12-2005','DD-MM-YYYY') <br />                            and<br />                            to_date('25-12-2005','DD-MM-YYYY') <br />and    o.product_id = p.id<br />and    p.name = 'Fair Trade Coffee'<br />/<br />&nbsp;

Next I will show the Execution Plan for that query and ask whether it is a good plan. The plan looks like:

The question is whether this plan is good or sub-optimal. The correct answer of course would be: I do not have enough information about the two tables involved to judge the plan. I cannot therefore tell you if the plan is good or bad. If you do not know more about these tables, the constraints and the available indexes, you cannot say whether this plan is good or bad. The CBO thinks each table contains one row. If that is correct, a Full Table Scan on both tables followed by a Hash Join makes perfect sense. If by chance that Cardinality information is erroneous, this plan could be awful. However, in the absence of Indexes, Clustering, Index Organized Tables, Partitions and Materialized Views, full table scans are probably unavoidable.

To which my response would be: tell me which information you need to decide wether this plan is good or bad? The ideal candidate would be able to name at least some of the following pieces of information that would be useful to evaluate the plan:

  • how many records are in ORDERS and PRODUCTS
  • are there any indexes on the columns accessed in this query, particularly the ones in the where clause
  • if there is an index on order_date: what is the distribution of the order_date like? Is more than say 5% likely to fall in the requested timeframe?
  • if there is an index on Product.name: is the index perhaps a Unique one? Or else is there a only a small percentage of Products with the name Fair Trade Coffee?
  • are there any supporting materialized views?
  • is one of the tables partitioned – ORDERS by order_date and/or PRODUCTS by Name?
  • is one of the tables perhaps Index Organized?
  • is ORDERS clustered by Product?
  • should the query be optimized for FIRST_ROW or ALL_ROWS?
  • how fast are multi-block reads vs. single block reads on your hardware (and how many rows fit into a block)? (very fast multi-block reads would favor full-table scans)

Let’s see a simple overview of how these pieces of information may change the execution plan.

Initial Situation

We start simple enough with two tables that do not contain any data. Only we pretend that they do, to give the CBO something to chew on:

create table orders<br />( id            number(10,0) not null<br />, product_id    number(10,0) not null<br />, quantity      number(5,0)<br />, price         number(8,2)<br />, discount      number(8,2)<br />, customer_id   number(8,0)<br />, order_date    date         not null<br />, shipping_date date<br />)<br />/<br /><br />create table products<br />( id            number(10,0) not null<br />, name          varchar2(100)<br />, weight        number(8,3)<br />, color         varchar2(20)<br />, price         number(6,2)<br />, full_color_image  blob<br />, description   varchar2(4000)<br />)<br />/<br /><br />begin<br />  dbms_stats.DELETE_TABLE_STATS<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'ORDERS'<br />  );<br />  dbms_stats.DELETE_TABLE_STATS<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  );<br />end;<br />/<br /><br />To not have my examples spoilt by that smart-ass CBO, I turn off dynamic sampling - which would reveal that my tables do not in fact contain any data at all: <br /><br />alter session set optimizer_dynamic_sampling=0<br />/ <br /> <br />explain plan for<br />select p.name<br />,      o.price<br />,      o.quantity<br />,      o.order_date<br />from   orders o<br />,      products p<br />where  o.order_date between to_date('18-12-2005','DD-MM-YYYY') <br />                            and<br />                            to_date('25-12-2005','DD-MM-YYYY') <br />and    o.product_id = p.id<br />and    p.name = 'Fair Trade Coffee'<br />/<br /><br />SELECT *<br />FROM TABLE(DBMS_XPLAN.DISPLAY)<br />/<br /><br />

In this situation we get the Execution Plan shown before for the query that tries to find all orders for Fair Trade Coffee in the days prior to Christmas last year.

Let’s give the CBO some information about the numbers of records in these two tables:

begin<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'ORDERS'<br />  , numrows =&gt; 100<br />  , numblks =&gt; 35<br />  );<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  , numrows =&gt; 500<br />  , numblks =&gt; 120<br />  );<br />end;<br />/<br /><br />

Well, the CBO is not too impressed, judging from this execution plan:

Execution Plan<br />----------------------------------------------------------<br />   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=38 Card=1 Bytes=113)<br />   1    0   HASH JOIN (Cost=38 Card=1 Bytes=113)<br />   2    1     TABLE ACCESS (FULL) OF 'ORDERS' (TABLE) (Cost=9 Card=1 Bytes=48)<br />   3    1     TABLE ACCESS (FULL) OF 'PRODUCTS' (TABLE) (Cost=28 Card=5 Bytes=325)<br /><br />

The only changes are the estimates for the Cardinality and the number of Bytes processed. And why should it be impressed? It is still only a small number of records. Apparently, the CBO thinks that 5 out of 500 Products will satisfy the search criterium as well as 1 out of 100 Orders. So without further knowledge, it assumes a 1% match for equality and range conditions.

Of course there is no real alternative Execution Plan available to the CBO: there are no supporting objects such as indexes to approach the tables.

If we step up the statistics a little, we see an even higher cost, but as expected there is still the same execution plan:

begin<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'ORDERS'<br />  , numrows =&gt; 1000000<br />  , numblks =&gt; 350000<br />  );<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  , numrows =&gt; 500000<br />  , numblks =&gt; 12000<br />  );<br />end;<br />/<br /><br />Execution Plan<br />----------------------------------------------------------<br />   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=79468 Card=2500 By<br />          tes=282500)<br />   1    0   HASH JOIN (Cost=79468 Card=2500 Bytes=282500)<br />   2    1     TABLE ACCESS (FULL) OF 'ORDERS' (TABLE) (Cost=76823 Card=2500 Bytes=120000)<br />   3    1     TABLE ACCESS (FULL) OF 'PRODUCTS' (TABLE) (Cost=2644 Card=5000 Bytes=325000)    <br />&nbsp;  

Constraints and Indexes 

Let’s throw in some constraints: primary key constraints on the ID columns (and the associated supporting Unique Indexes) as well as a foreign key constraint from orders.product_id to products.id.

alter table products<br />add constraint product_pk primary key (id)<br />/<br />alter table orders<br />add constraint orders_pk primary key (id)<br />/<br />alter table orders<br />add constraint orders_products_fk foreign key (product_id) <br />references products (id)<br />/<br /><br />

Now with the latest statistics, see if we get the same execution plan:

Well, the plan changed. The estimated cost is slightly lower – whatever slightly means in terms of cost as we do not really know of the unit of cost is. The CBO proposes to do a full access on the ORDERS table. It expects to find about 2500 rows that satisfy the query criterium. These 2500 rows are sorted (step 4), in preparation for the MERGE JOIN (step 1). It also does an INDEX FULL SCAN (step 3) to get all 500K Products.Id values (sorted), ready to be used in the MERGE JOIN. Using this sorted list of ID values, it inspects all PRODUCTS to see whether their name is satisfying the Name filter (step2). The CBO expects 5000 (again, 1%) records that live up to the filter on Product Name. The final step is the Merge Join on Product.Id = Order.Pr
oduct_
Id to bring the selected records together. I am somewhat surprised that this is the plan – it seems that a full table access on PRODUCTS followed by a Sort on Id of the resulting 5000 rows would have been seen as more efficient.

begin<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  , numrows =&gt; 50000000<br />  , numblks =&gt; 1200000<br />  );<br />end;<br />/<br />&nbsp;

If we suggest that the PRODUCTS table is much larger than before, now having 50M records, the CBO changes its tune:

PLAN_TABLE_OUTPUT<br />---------------------------------------------------------------------------------------------------<br /><br />-------------------------------------------------------------------------------------------<br />| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |<br />-------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT             |            |  3165 |   349K| 79327   (1)| 00:15:52 |<br />|   1 |  NESTED LOOPS                |            |  3165 |   349K| 79327   (1)| 00:15:52 |<br />|*  2 |   TABLE ACCESS FULL          | ORDERS     |  2500 |   117K| 76823   (1)| 00:15:22 |<br />|*  3 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS   |     1 |    65 |     1   (0)| 00:00:01 |<br />|*  4 |    INDEX UNIQUE SCAN         | PRODUCT_PK |     1 |       |     0   (0)| 00:00:01 |<br />-------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   2 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd<br />              hh24:mi:ss') AND &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd<br />              hh24:mi:ss'))<br />   3 - filter(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;)<br />

 

It full scans the ORDERS table, filters on the ORDER_DATE, finds the associated PRODUCT, using the index on its Primary Key, and then filter on the Name of the Product. 

begin<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  , numrows =&gt; 5000<br />  , numblks =&gt; 1200<br />  );<br />end;<br />/<br />&nbsp;

With only 5000 rows in the PRODUCTS table, the CBO ignores the index and goes in for the Full Table Scans:

PLAN_TABLE_OUTPUT<br />---------------------------------------------------------------------------------------<br />Plan hash value: 3353409128<br /><br />-------------------------------------------------------------------------------<br />| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |<br />-------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT   |          |    63 |  7119 | 77089   (1)| 00:15:26 |<br />|*  1 |  HASH JOIN         |          |    63 |  7119 | 77089   (1)| 00:15:26 |<br />|*  2 |   TABLE ACCESS FULL| PRODUCTS |    50 |  3250 |   265   (1)| 00:00:04 |<br />|*  3 |   TABLE ACCESS FULL| ORDERS   |  2500 |   117K| 76823   (1)| 00:15:22 |<br />-------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   1 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;)<br />   2 - filter(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   3 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00',<br />              'yyyy-mm-dd hh24:mi:ss') AND &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25<br />              00:00:00', 'yyyy-mm-dd hh24:mi:ss'))<br />

 

 

Let’s add a supporting index (non-unique) on the product_id (foreign key) column:

create index orders_product_id_idx on orders(product_id)<br />/<br /><br />

The CBO uses this index to get the Orders for all Products it has found satisfying the Name Filter:

PLAN_TABLE_OUTPUT<br />----------------------------------------------------------------------------------------------------<br /><br />----------------------------------------------------------------------------------------------------<br />| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |<br />----------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT            |                       |    63 |  7119 |   265   (1)| 00:00:04 |<br />|*  1 |  TABLE ACCESS BY INDEX ROWID| ORDERS                |     1 |    48 |     0   (0)| 00:00:01 |<br />|   2 |   NESTED LOOPS              |                       |    63 |  7119 |   265   (1)| 00:00:04 |<br />|*  3 |    TABLE ACCESS FULL        | PRODUCTS              |    50 |  3250 |   265   (1)| 00:00:04 |<br />|*  4 |    INDEX RANGE SCAN         | ORDERS_PRODUCT_ID_IDX |   508 |       |     0   (0)| 00:00:01 |<br />----------------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   1 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND<br />              &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))<br />   3 - filter(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;)<br />

 

 

If we pump up the PRODUCTS table, back to 500k rows: 

begin<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  , numrows =&gt; 500000<br />  , numblks =&gt; 120000<br />  );<br />end;<br />/<br />&nbsp;

We get exactly the same plan.

 

PLAN_TABLE_OUTPUT<br />----------------------------------------------------------------------------------------------------<br /><br />----------------------------------------------------------------------------------------------------<br />| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |<br />----------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT            |                       |  2500 |   275K| 26340   (1)| 00:05:17 |<br />|*  1 |  TABLE ACCESS BY INDEX ROWID| ORDERS                |     1 |    48 |     0   (0)| 00:00:01 |<br />|   2 |   NESTED LOOPS              |                       |  2500 |   275K| 26340   (1)| 00:05:17 |<br />|*  3 |    TABLE ACCESS FULL        | PRODUCTS              |  5000 |   317K| 26340   (1)| 00:05:17 |<br />|*  4 |    INDEX RANGE SCAN         | ORDERS_PRODUCT_ID_IDX |   200 |       |     0   (0)| 00:00:01 |<br />----------------------------------------------------------------------------------------------------<br />

 

If for some reason we do not like the performance of this query, there are easy steps to take. One obvious step is a Unique Index on Products.Name. It seems relatively safe to predict that for this query, access through that index would be quite attractive, followed by a access of the Orders through the index on the Product_Id column. Let’s see if that works out in actual practice:

create unique index products_name_idx on products(name)<br />/<br /><br />

We see the following plan: an index unique scan for the Product Name, returning an estimated one row (since it is a unique index, more than one can not happen). It loops over that one row, accessing the index on the Orders.Product_id column. The Ord

ers are than filtered by the order_date (step 4) and the result (an estimated one row) is returned. 

PLAN_TABLE_OUTPUT<br />----------------------------------------------------------------------------------------------------<br />Plan hash value: 1899464088<br /><br />----------------------------------------------------------------------------------------------------<br />| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |<br />----------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT             |                       |     1 |   113 |     0   (0)| 00:00:01 |<br />|   1 |  NESTED LOOPS                |                       |     1 |   113 |     0   (0)| 00:00:01 |<br />|   2 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS              |     1 |    65 |     0   (0)| 00:00:01 |<br />|*  3 |    INDEX UNIQUE SCAN         | PRODUCTS_NAME_IDX     |     1 |       |     0   (0)| 00:00:01 |<br />|*  4 |   TABLE ACCESS BY INDEX ROWID| ORDERS                |     1 |    48 |     0   (0)| 00:00:01 |<br />|*  5 |    INDEX RANGE SCAN          | ORDERS_PRODUCT_ID_IDX |   200 |       |     0   (0)| 00:00:01 |<br />----------------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   3 - access(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND<br />              &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))<br />   5 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;) <br />

 

With a smaller number of Products – which would seem the more natural situation – we will see a similar plan:

<br />begin<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  , numrows =&gt; 15000<br />  , numblks =&gt; 4800<br />  );<br />end;<br />/<br />

 

 

only with a larger number of Orders to be inspected for order_date compliance.

PLAN_TABLE_OUTPUT<br />----------------------------------------------------------------------------------------------------<br />Plan hash value: 1899464088<br /><br />----------------------------------------------------------------------------------------------------<br />| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |<br />----------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT             |                       |     1 |   113 |     0   (0)| 00:00:01 |<br />|   1 |  NESTED LOOPS                |                       |     1 |   113 |     0   (0)| 00:00:01 |<br />|   2 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS              |     1 |    65 |     0   (0)| 00:00:01 |<br />|*  3 |    INDEX UNIQUE SCAN         | PRODUCTS_NAME_IDX     |     1 |       |     0   (0)| 00:00:01 |<br />|*  4 |   TABLE ACCESS BY INDEX ROWID| ORDERS                |     1 |    48 |     0   (0)| 00:00:01 |<br />|*  5 |    INDEX RANGE SCAN          | ORDERS_PRODUCT_ID_IDX |   434 |       |     0   (0)| 00:00:01 |<br />----------------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   3 - access(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND<br />              &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))<br />   5 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;) <br />

Column Statistics & Data Distribution 

If we tell the CBO a little about the distribution of date values in the order_date column, we will not see an immediate change in the plan, but certainly in the estimated cardinality:

begin<br />  DBMS_STATS.SET_COLUMN_STATS <br />  ( ownname      =&gt; 'BOOK'<br />  , tabname      =&gt; 'ORDERS'<br />  , colname      =&gt; 'ORDER_DATE'<br />  , distcnt      =&gt; 500 <br />  , density      =&gt; 1/500  --  1 out of 500 <br />  );<br />end;<br />/<br />PLAN_TABLE_OUTPUT<br />----------------------------------------------------------------------------------------------------<br />Plan hash value: 1899464088<br /><br />----------------------------------------------------------------------------------------------------<br />| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |<br />----------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT             |                       |     1 |   113 |     0   (0)| 00:00:01 |<br />|   1 |  NESTED LOOPS                |                       |     1 |   113 |     0   (0)| 00:00:01 |<br />|   2 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS              |     1 |    65 |     0   (0)| 00:00:01 |<br />|*  3 |    INDEX UNIQUE SCAN         | PRODUCTS_NAME_IDX     |     1 |       |     0   (0)| 00:00:01 |<br />|*  4 |   TABLE ACCESS BY INDEX ROWID| ORDERS                |     1 |    48 |     0   (0)| 00:00:01 |<br />|*  5 |    INDEX RANGE SCAN          | ORDERS_PRODUCT_ID_IDX |   533 |       |     0   (0)| 00:00:01 |<br />----------------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   3 - access(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND<br />              &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))<br />   5 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;)<br /><br />

If we suggest a very focused data set in the ORDERS table – with just two months worth of data – we might eventually end up with a Full Table Scan on ORDERS as using the index does not pay off anymore. 

begin<br />  DBMS_STATS.SET_COLUMN_STATS <br />  ( ownname      =&gt; 'BOOK'<br />  , tabname      =&gt; 'ORDERS'<br />  , colname      =&gt; 'ORDER_DATE'<br />  , distcnt      =&gt; 50 <br />  , density      =&gt; 1/50 -- 1 out of 50 is OK<br />  );<br />end;<br />/<br />----------------------------------------------------------------------------------------------------<br />| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |<br />----------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT             |                       |     2 |   226 |     0   (0)| 00:00:01 |<br />|   1 |  NESTED LOOPS                |                       |     2 |   226 |     0   (0)| 00:00:01 |<br />|   2 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS              |     1 |    65 |     0   (0)| 00:00:01 |<br />|*  3 |    INDEX UNIQUE SCAN         | PRODUCTS_NAME_IDX     |     1 |       |     0   (0)| 00:00:01 |<br />|*  4 |   TABLE ACCESS BY INDEX ROWID| ORDERS                |     2 |    96 |     0   (0)| 00:00:01 |<br />|*  5 |    INDEX RANGE SCAN          | ORDERS_PRODUCT_ID_IDX |    90 |       |     0   (0)| 00:00:01 |<br />----------------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   3 - access(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - fil

ter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND<br />              &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))<br />   5 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;) <br />

Were we to drop the unique index on Products.Name, we will see an effect – but only if there is a substantial number of PRODUCTS:
 

drop index products_name_idx<br />/<br /><br />PLAN_TABLE_OUTPUT<br />----------------------------------------------------------------------------------------------------<br /><br />----------------------------------------------------------------------------------------------------<br />| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |<br />----------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT            |                       |  1587 |   175K| 10536   (1)| 00:02:07 |<br />|*  1 |  TABLE ACCESS BY INDEX ROWID| ORDERS                |     1 |    48 |     0   (0)| 00:00:01 |<br />|   2 |   NESTED LOOPS              |                       |  1587 |   175K| 10536   (1)| 00:02:07 |<br />|*  3 |    TABLE ACCESS FULL        | PRODUCTS              |  1500 | 97500 | 10536   (1)| 00:02:07 |<br />|*  4 |    INDEX RANGE SCAN         | ORDERS_PRODUCT_ID_IDX |    53 |       |     0   (0)| 00:00:01 |<br />----------------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   1 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND<br />              &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))<br />   3 - filter(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;)

Set new numbers of records:

begin<br />  dbms_stats.set_table_stats<br />  ( ownname =&gt; 'BOOK'<br />  , tabname =&gt; 'PRODUCTS'<br />  , numrows =&gt; 1500000<br />  , numblks =&gt; 480000<br />  );<br />end;<br />/<br />

Now the CBO thinks the number of PRODUCTS is getting too large to warrant a FULL TABLE SCAN followed by an INDEX RANGE SCAN on ORDERS. It expects to find 20000 rows (one in fifty) from ORDERS -

PLAN_TABLE_OUTPUT<br />----------------------------------------------------------------------------------------------------<br />Plan hash value: 1637760431<br /><br />---------------------------------------------------------------------------------------------------<br />| Id  | Operation                    | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |<br />---------------------------------------------------------------------------------------------------<br />|   0 | SELECT STATEMENT             |            | 14999 |  1655K|       | 77893   (1)| 00:15:35 |<br />|   1 |  MERGE JOIN                  |            | 14999 |  1655K|       | 77893   (1)| 00:15:35 |<br />|*  2 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS   | 15000 |   952K|       |   827   (1)| 00:00:10 |<br />|   3 |    INDEX FULL SCAN           | PRODUCT_PK |  1500K|       |       |    26   (0)| 00:00:01 |<br />|*  4 |   SORT JOIN                  |            | 20000 |   937K|  2376K| 77067   (1)| 00:15:25 |<br />|*  5 |    TABLE ACCESS FULL         | ORDERS     | 20000 |   937K|       | 76823   (1)| 00:15:22 |<br />---------------------------------------------------------------------------------------------------<br /><br />Predicate Information (identified by operation id):<br />---------------------------------------------------<br /><br />   2 - filter(&quot;P&quot;.&quot;NAME&quot;='Fair Trade Coffee')<br />   4 - access(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;)<br />       filter(&quot;O&quot;.&quot;PRODUCT_ID&quot;=&quot;P&quot;.&quot;ID&quot;)<br />   5 - filter(&quot;O&quot;.&quot;ORDER_DATE&quot;&gt;=TO_DATE('2005-12-18 00:00:00', 'yyyy-mm-dd hh24:mi:ss')<br />              AND &quot;O&quot;.&quot;ORDER_DATE&quot;&lt;=TO_DATE('2005-12-25 00:00:00', 'yyyy-mm-dd hh24:mi:ss')) <br />

Resources

 

On Joins and Query Plans by Tom Kyte http://www.oracle.com/technology/oramag/oracle/06-may/o36asktom.html

Reference to DBMS_STATS package http://www.psoug.org/reference/dbms_stats.html

Understanding Column Histgrams – http://www.mcse.ms/archive26-2005-5-1624465.html

Presentation on real life application of dbms_stats www.oaktable.net/getFile/117

Influencing Oracle’s Cost-Based Optimizer – Part 1 through 3 by Glenn Goodrum http://www.dbazine.com/oracle/or-articles/goodrum2 Very good introduction to reading execution plans and using dbms_stats.
 

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

7 Comments

  1. Good expample. Things can really get complicated when a query is having joins between 4 or 5 tables, one or two nested queries, order by/group by clauses.

  2. After learning sql_trace with the 10046 and 10053 trace with tkprof, I would also recommend using the following Note in Metalink:

    Note:215187.1 SQLTXPLAIN.SQL – Enhanced Explain Plan and related diagnostic info for one SQL statement

    This gives you a lot more detailed information on the underlying tables being used (especially useful when selecting against a view), the last time they were analyzed, constraint information and a whole lot more.

  3. @Jacco:

    I agree, reality differs.

    A workaround could be (of course the solution would be invest in your self…READ and learn…) running DB Console during development and make extensive use of ADDM and the AWR (but this is also not a failsafe method). This would minimize hind site approaches (in production environments!) like, using Anjo Kolks YAPP (see: http://www.oraperf.com – my prevered approach) or the current Oracle performance methology, tuning on the use of resources.

  4. I love the examples, but the major things I miss is skewed data / histograms. These can be a serious headache / great solution while looking for performance issues.

    In my experience, which is limited, in a large production environment with many databases (and too little DBA’s), an examination of the environment up to the level where you can set statistics manually is hardly possible. There is simply not enough time to analyze the database to this extend. Using the gather_table_stats with auto option on all columns is taking way too much time on a large environment, so you are stuck with hoping that gathering schema statistics on a daily (or weekly, depending on the amount of data entered) basis is enough.

    Sadly enough too many applications are developed without looking at any queries. Especially java developers using the lazy sql option of hibernate are killers, although Oracle’s toplink can create some serious trouble too. These objects mappers are the future of queries though….

  5. Tip.

    If you don’t have the space to create real indexes (because your are testing on a proper sized test environment…) use virtual indexes:

    Within (for example) sqlplus use:

    – Forcing the optimizer to use nosegment indexes / virtual indexes
    alter session set “_use_nosegment_indexes”=TRUE;

    create index orders_product_id_idx on orders(product_id)
    NOSEGMENT
    /

  6. One of the questions should also be: What database version are we dealing with?

    Now, you will have to come up with new interview questions… :-) You gave away to much; anyway, I think understanding TKPROF should be elementary knowledge, dear Watson ;-)