Tom Kyte part II. Using what we have learned Oracle Headquarters Redwood Shores1 e1698667100526

Tom Kyte part II. Using what we have learned

In this post we will tune a query from being hopelessly slow to fast. We will use “explain plan” to analyze the query plan. We will see how Oracle prevents you from doing stupid things. We will also see that preventing hard parses of queries will really speed up your query.

Suppose we have this table:

SQL> desc sao_processed
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------
 GSCREGION                                          NUMBER(11)
 SAOID                                     NOT NULL VARCHAR2(10)
 FLAMSTEED                                          NUMBER(11)
 BAYER                                              VARCHAR2(10)
 BAYERADD                                           VARCHAR2(5)
 CONSTELLATION                                      CHAR(3)
 DELETED                                            NUMBER(11)
 MAG                                                NUMBER(3,1)
 RA                                                 NUMBER(9,8)
 DE                                                 NUMBER(9,8)


This table holds information about the stars in the universe. The universe is divided in parts (gscregion). The mag column holds information about the brightness of the star. Suppose we want to create a map of a certain region of the univers. How do we do that? First we calculate which parts of the universe we need (this is beyond this post). The result is a list of gscregions. We also want to see stars with a brightness below a certain value.

Suppose the region we want to see consists of 5 gscregions. Our query will then look like this (note that these regions should not be a range, as they are in this example):

select *
from sao_processed
where mag < = 5.5
and (gscregion = 37
       or gscregion =  38
        or gscregion =  39
        or gscregion =  40
        or gscregion =  41);

It's possible that the number of gscregions is bigger (over 1500). So the where statement is dynamic. This will result in hard parses of the query. The query is different everytime it's executed. So the execution plan has to be determined everytime you execute the query. This is very time consuming as we will see.

We tried to rewrite the query using an in statement:

select count(*) from sao_processed where mag < = 5.5 and gscregion in (37, 38, 39,40,41)

This will not work when the number of parameters in the in statements is over a 1000 entries. That's stange: MySQL en Postgress were able to process the query; Oracle has this limitation. This limit is GOOD of Oracle. It prevents you from doing stupid things. Over a 1000 OR's in a query of in your in statement is not normal. Oracle wakes you up: "Hey stupid, think of a better solution". We will now see the steps to a better solution.

First we let Oracle explain our queries, so that we can see what it's doing:

set autotrace on explain

We create the table and fill it with 260.000 stars. We query the table against 1500 regions:

select count(*) from sao_processed where mag < = 5.5 and (gscregion = 37 or gscregion =  38 or gscregion =  39 or gscregion =  40 or gscregion =  41 or gscregion =
42 or gscregion =  43 or gscregion =  44 or gscregion =  45 or gscregion =  46 or gscregion =  47 or gscregion =  48 or gscregion =  49 or gscregion =  50 or gscregion =  51 or gscregion =  52 or gscregion =  53 or gscregion =  54 or gscregion =  55 or gscregion =  56 or gscregion =  57 or gscregion =  58 or gscregion =  59 or gscregion =  60 or gscregion =  61 or gscregion =
62 or gscregion =  63 or gscregion =  64 or gscregion =  65 or gscregion =  66 or gscregion =  67 or gscregion =  68 or gscregion =  69 or gscregion =  70 or gscregion =  71 or gscregion =  72 or gscregion =  73 or gscregion =  74 or gscregion =  75 or gscregion =  76 or gscregion =  77 or gscregion =  78 or gscregion =  79 or gscregion =  80 or gscregion =  81 or gscregion =
82 or gscregion =  83 or gscregion =  84 or gscregion =  85 or gscregion =  86 or gscregion =  87 or gscregion =  88 or gscregion =  89 or gscregion =  90 or gscregion =  91 or gscregion =  92 or gscregion =  93 or gscregion =  94 or gscregion =  95 or gscregion =  96 or gscregion =  97 or gscregion =  98 or gscregion =  99 or gscregion =  100 or gscregion =  101 or gscregion =
102 or gscregion =  103 or gscregion =  104 or gscregion =  105 or gscregion =  106 or gscregion =  107 or gscregion =  108 or gscregion =  109 or gscregion =  110 or gscregion =  111 or gscregion =  112 or gscregion =  113 or gscregion =  114 or gscregion =  115 or gscregion =  116 or gscregion =  117 or gscregion =
118 or gscregion =  119 or gscr........................

  COUNT(*)
----------
       516


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=393 Card=1 Bytes=2
          6)

   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'SAO_PROCESSED' (TABLE) (Cost=393
           Card=569 Bytes=14794)

Ok, full table scan. Lets create an index on gscregion and mag and do the same query again:

create index sao_in on sao_processed(gscregion,mag);

do the query.......  resulting in:
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=1 Bytes=26)
   1    0   SORT (AGGREGATE)
   2    1     INLIST ITERATOR
   3    2       INDEX (RANGE SCAN) OF 'SAO_IN' (INDEX) (Cost=3 Card=56
          9 Bytes=14794)

We got rid of the full tablescan, but it was still slow (over 30 sec). It’s looks that the hard parse of the query took too long. The dynamic where clause was taking too much time to be parsed. This should be solved to make this query speed up.

We learned from Oracle that we are stupid and that we should look for a completely different solution. The solution was to put the gscregions we wanted to query against in a temporary table:

CREATE GLOBAL TEMPORARY TABLE my_GSCREGION(
GSCREGION NUMBER(11)
) ON COMMIT PRESERVE ROWS;

insert into my_GSCREGION values(37);
insert into my_GSCREGION values(38);
insert into my_GSCREGION values(39);
insert into my_GSCREGION values(40);
insert into my_GSCREGION values(41);
insert into my_GSCREGION values(42);
..... for all 1500 rows


SQL> select count(*)
  2  from sao_processed a
  3  , my_gscregion b
  4  where mag < = 5.5
  5  and a.gscregion = b.gscregion;

  COUNT(*)
----------
       516


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=159 Card=1 Bytes=3
          9)

   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (Cost=159 Card=2975 Bytes=116025)
   3    2       TABLE ACCESS (FULL) OF 'MY_GSCREGION' (TABLE (TEMP)) (
          Cost=2 Card=1503 Bytes=19539)

   4    2       INDEX (FAST FULL SCAN) OF 'SAO_IN' (INDEX) (Cost=157 C
          ard=2975 Bytes=77350)

This was blazingly fast (1 sec). The index was used fully and we got rid of the hard parse. Well almost: in the real world the input 5.5 should be a bind variable. This way the query plan can be cashed and reused.

Is there a drawback? Yes. We have to fill the temporary table before we do the query. But the original query will take minutes when you fill the database with millions of stars. Entering 1500 record in the temporary table is nothing when you compare it like this.

One Response

  1. Lucas February 27, 2005