IN or EXIST or doesn’t it matter

7

In my previous post about package constants I mentioned the application I am working on right now. They gave me the task to speed up the application since performance was getting worse and worse. I analyzed the statspack results together with a DBA and we found two queries that together took 40% of the logical IO’s of the system. That’s a lot for only two queries, especially when you look how big our application is; we have many queries.

The two queries were both in one procedure. That must have been an off day for the original programmer :-). The queries were small and had an IN statement in the where clause. I simply rewrote the queries to use an EXIST and they became blazingly fast. That was strange. I attended the Tom Kyte seminar in Utrecht in 2005 and he claimed that it didn’t matter anymore. IN or EXIST, the database would see this and the optimizer would have the same execution plan for both, but not in my case. How could that be? I was running the queries in a 9R2 database.


So I wrote a (meaningless) example to show the difference between the IN and EXIST statements. The source of this test script is as follows:

set timing off
set autotrace off
set echo on

create table my_blocked_objects (object_name varchar2(200) not null)
/

create table my_objects (object_name varchar2(200) not null)
/

create index my_blk_idx on my_blocked_objects(object_name)
/

create index my_obj_idx on my_objects(object_name)
/

insert into my_blocked_objects
select owner || '.' || object_name
from   all_objects
/

insert into my_objects
select user || '.' || object_name
from   user_objects
/

commit
/

begin
  dbms_stats.gather_table_stats(user,'MY_BLOCKED_OBJECTS');
  dbms_stats.gather_table_stats(user,'MY_OBJECTS');
end;
/

set timing on
set autotrace on explain

select count(*)
from   my_objects obj
where  obj.object_name in (select blk.object_name
                           from   my_blocked_objects blk)
/

select count(*)
from   my_objects obj
where  exists (select ''
               from   my_blocked_objects blk
               where  blk.object_name = obj.object_name)
/

drop table my_blocked_objects
/

drop table my_objects
/

The autotrace results of this script on my database was:

SQL> select count(*)
  2  from   my_objects obj
  3  where  obj.object_name in (select blk.object_name
  4                             from   my_blocked_objects blk)
  5  /

  COUNT(*)
----------
      3376

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=166 Card=1 Bytes=132
          )

   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (Cost=166 Card=3378 Bytes=445896)
   3    2       INDEX (FULL SCAN) OF 'MY_OBJ_IDX' (NON-UNIQUE) (Cost=2
          6 Card=3378 Bytes=101340)

   4    2       VIEW OF 'VW_NSO_1' (Cost=156 Card=27148 Bytes=2769096)
   5    4         SORT (UNIQUE) (Cost=156 Card=27148 Bytes=814440)
   6    5           INDEX (FULL SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (Co
          st=26 Card=27522 Bytes=825660)




SQL>
SQL> select count(*)
  2  from   my_objects obj
  3  where  exists (select ''
  4                 from   my_blocked_objects blk
  5                 where  blk.object_name = obj.object_name)
  6  /

  COUNT(*)
----------
      3376

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=30)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (FULL SCAN) OF 'MY_OBJ_IDX' (NON-UNIQUE) (Cost=26
          Card=169 Bytes=5070)

   3    2       INDEX (RANGE SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (Cost=
          1 Card=1 Bytes=30)

Again not the same execution plan. The IN version was less efficient compared to the EXIST version of the query. And this was not what Tom Kyte was saying during his seminar. I then tested the same script on another instance; but also a 9R2 database. That gave the following results:

SQL> select count(*)
  2  from   my_objects obj
  3  where  obj.object_name in (select blk.object_name
  4                             from   my_blocked_objects blk)
  5  /

  COUNT(*)
----------
      2220

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1 Bytes=62)
   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (SEMI) (Cost=12 Card=2223 Bytes=137826)
   3    2       TABLE ACCESS (FULL) OF 'MY_OBJECTS' (Cost=4 Card=2223
          Bytes=71136)

   4    2       INDEX (FAST FULL SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (C
          ost=5 Card=24923 Bytes=747690)




SQL>
SQL> select count(*)
  2  from   my_objects obj
  3  where  exists (select ''
  4                 from   my_blocked_objects blk
  5                 where  blk.object_name = obj.object_name)
  6  /

  COUNT(*)
----------
      2220

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1 Bytes=62)
   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (SEMI) (Cost=12 Card=2223 Bytes=137826)
   3    2       TABLE ACCESS (FULL) OF 'MY_OBJECTS' (Cost=4 Card=2223
          Bytes=71136)

   4    2       INDEX (FAST FULL SCAN) OF 'MY_BLK_IDX' (NON-UNIQUE) (C
          ost=5 Card=24923 Bytes=747690)

Wow the same execution plan, but a slightly higher cost (Cost=12 compared to Cost=3). How was this possible???? TKPROF output showed the same results. It took some time to figure this out; but the difference was introduced by different optimizer parameters in the database. The database with two execution plans had the following parameters:

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
object_cache_optimal_size            integer     102400
optimizer_dynamic_sampling           integer     0
optimizer_features_enable            string      8.1.7
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     10
optimizer_max_permutations           integer     80000
optimizer_mode                       string      CHOOSE

The database with one execution plan had the following parameters:

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
object_cache_optimal_size            integer     102400
optimizer_dynamic_sampling           integer     1
optimizer_features_enable            string      9.2.0
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100
optimizer_max_permutations           integer     2000
optimizer_mode                       string      CHOOSE

 

For some reason the DBA’s kept the optimizer_features_enable on 8.1.7 after the migration of the database. This prevented the optimizer from using new features like optimizing IN where clauses. The argument was that it could be possible that another value of this parameter could negatively impact performance. That could be true; but no one tested that; they just took tha assumption.
Setting the parameter to 9.2.0 would have automatically optimized the two bad queries that led to this post. They would never have been a problem. I would never have found them as being problem queries. My lesson learnt is that you never should rely on assumptions; you have to test your assumptions.

 

Tom Kyte was right in this case. There is no difference between IN or EXIST. I believe the difference in cost between the two databases (cost=3 vs. cost=12) is hard to compare because of the difference of the optimizer_features_enable parameter.

Share.

About Author

7 Comments

  1. Andrew Markiewicz on

    Also of note is that the optimizer_index_cost_adj parameter is different in the two databases (unless that is a typo in the post).

    optimizer_features_enable string 8.1.7
    optimizer_index_caching integer 0
    optimizer_index_cost_adj integer 10

    optimizer_features_enable string 9.2.0
    optimizer_index_caching integer 0
    optimizer_index_cost_adj integer 100

    OICA in the”8.1.7″ version calculates indexes as 10% of the cost making indexes more likely to be chosen.
    That parameter should be the same for a comparison of the optimizer_features_enable parameter.

    Andrew

  2. Patrick Roozen on

    Hi Andre,

    In Jonathan Lewis’s book: Cost-Based Oracle Fundamentals on page 466 it also mentions that the optimizer_max_permutations will be 2000 when you set the optimizer_features_enable to 9.2.0. So that confirms what Alex says as well.

    But as you know I have run the original query to be optimized in the developers test database with the parameter optimizer_features_enable set to 9.2.0 and it didn’t change the execution plan. (I do not have the output here anymore, but will provide the info if you can let me have the database so I can change the parameter back and forth)

    I’m sad to see your comment “For some reason the DBA’s kept the optimizer_features_enable on 8.1.7 after the migration of the database. This prevented the optimizer from using new features like optimizing IN where clauses. The argument was that it could be possible that another value of this parameter could negatively impact performance. That could be true; but no one tested that; they just took tha assumption.” here because neither you nor me where there at that time. Therefor you cannot say that it wasn’t tested. You don’t know and neither do I. We just cannot find the reason anymore why it was left like that.

    Do not think that changing that parameter is a silver bullet and will fix all the problems…
    (as you know I pointed it out to you and we will change it to 9.2.0 but that has to be tested and thus will take it’s time)

    When I have the chance I will run the original query (not your test query) on the database with the optimizer_features_enable set to 9.2.0 and back at 8.1.7 and will post the outcome of that here as well.

    Keep up the good work ;-) and convince your collegues to use bind variables.

    Patrick

  3. “Also the “optimizer_max_permutations” should have been set to 2000 (instead of 80000).”

    I believe the default value optimizer_max_permutations is derived from optimizer_features_enable so setting it to 9.2.0 should “correct” optimizer_max_permutations as well.

  4. hopefully my last posting;
    – optimizer_features_enable string 8.1.7
    As you told that was the reason. CBO has a lot on development on it. this makes sometimes hard to explain why a query takes a specific execution plan.
    Karl

  5. Hi again;
    made a trip around a sea with my wife and now i know what to write about to finish my thougts.

    Its all about determine thd DRIVING TABLE/QUERY. The Driving table is te table which the Row Source generator starts to retrieve data.
    In RULE BASED times as RBO did not know about expected cardinality there were some fixed assumptions also using the IN and EXISTS operator.

    Using an IN operator means in RBO Mode that the driving TABLE is the one accessed in the subquery. This helped when only a few rows where retrieved.

    USING an EXISTS Operator in RBO mode means that the DRIVING TABLE/QUERY is the main query. this helps when only a few number of rows where retreived in the main query.

    Now in the new world the CBO’s one, the DRIVING TABLE is choosen via expected number rows of the operation on this table. Using IN our EXISTS for CBO is meaningless for choosing the DRIVING TABLE/QUERY.

    Using a rule hint of ths same table example demonstarate what i mean.

    SQL> — RULE BASED IN CLAUSE
    SQL> SELECT /*+RULE */ *
    2 FROM My_Objects Obj
    3 WHERE Obj.Object_Name IN
    4 (SELECT Blk.Object_Name FROM My_Blocked_Objects Blk)
    5 ;

    7600 Zeilen ausgewõhlt.

    Ausf³hrungsplan
    ———————————————————-
    Plan hash value: 2264125269

    ———————————————————-
    | Id | Operation | Name |
    ———————————————————-
    | 0 | SELECT STATEMENT | |
    | 1 | TABLE ACCESS BY INDEX ROWID| MY_OBJECTS |
    | 2 | NESTED LOOPS | |
    | 3 | VIEW | VW_NSO_1 |
    | 4 | SORT UNIQUE | |
    | 5 | TABLE ACCESS FULL | MY_BLOCKED_OBJECTS |
    |* 6 | INDEX RANGE SCAN | MY_OBJ_IDX |
    ———————————————————-

    Predicate Information (identified by operation id):
    —————————————————

    6 – access(“OBJ”.”OBJECT_NAME”=”$nso_col_1″)

    Note
    —–
    – rule based optimizer used (consider using cbo)

    Statistiken
    ———————————————————-
    1 recursive calls
    0 db block gets
    15505 consistent gets
    0 physical reads
    0 redo size
    104294 bytes sent via SQL*Net to client
    5950 bytes received via SQL*Net from client
    508 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    7600 rows processed

    SQL>

    The explain plan is expected RBO behaiver . it must use the SUB-Query as DRIVING QUERY. with very bad results :

    IN Operator – RBO : 15505 consistent gets
    IN Operator – CBO : 1352 consistent gets

    Greeting
    Karl

    PS.: the IN Operator – CBO Trace output (number expected rows was different from lasst posting)…

    SQL>
    SQL> — IN CLAUSE
    SQL> SELECT *
    2 FROM My_Objects Obj
    3 WHERE Obj.Object_Name IN
    4 (SELECT Blk.Object_Name FROM My_Blocked_Objects Blk)
    5 ;

    7600 Zeilen ausgewõhlt.

    Ausf³hrungsplan
    ———————————————————-
    Plan hash value: 3465247318

    ————————————————————————————
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ————————————————————————————
    | 0 | SELECT STATEMENT | | 8000 | 750K| 373 (4)| 00:00:05 |
    |* 1 | HASH JOIN SEMI | | 8000 | 750K| 373 (4)| 00:00:05 |
    | 2 | TABLE ACCESS FULL | MY_OBJECTS | 8000 | 562K| 27 (0)| 00:00:01 |
    | 3 | INDEX FAST FULL SCAN| MY_BLK_IDX | 251K| 5900K| 340 (2)| 00:00:05 |
    ————————————————————————————

    Predicate Information (identified by operation id):
    —————————————————

    1 – access(“OBJ”.”OBJECT_NAME”=”BLK”.”OBJECT_NAME”)

    Statistiken
    ———————————————————-
    1 recursive calls
    0 db block gets
    1352 consistent gets
    0 physical reads
    0 redo size
    104294 bytes sent via SQL*Net to client
    5950 bytes received via SQL*Net from client
    508 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    7600 rows processed

  6. Hi Anton,
    made it with a little bit more data and more columns – the results are the same on Oracle XE :

    set timing off
    set autotrace off
    set echo on
    DROP TABLE my_blocked_objects;
    DROP TABLE my_objects;
    create table my_blocked_objects AS SELECT do.owner || ‘.’ || do.object_name AS object_name , do.subobject_name, do.object_id, do.object_type, do.created, do.last_ddl_time, do.timestamp FROM dba_objects do, User_Objects uo;

    create table my_objects AS SELECT do.owner || ‘.’ || do.object_name AS object_name , do.subobject_name, do.object_id, do.object_type, do.created, do.last_ddl_time, do.timestamp FROM dba_objects do , User_Objects uo1, User_Objects uo2 WHERE do.owner = USER;

    create index my_blk_idx on my_blocked_objects(object_name)
    /

    create index my_obj_idx on my_objects(object_name)
    /

    begin
    dbms_stats.gather_table_stats(user,’MY_BLOCKED_OBJECTS’);
    dbms_stats.gather_table_stats(user,’MY_OBJECTS’);
    end;
    /

    SQL> set line 120
    SQL> select *
    2 from my_objects obj
    3 where obj.object_name in (select blk.object_name
    4 from my_blocked_objects blk);

    2548 Zeilen ausgewõhlt.

    Ausf³hrungsplan
    ———————————————————-
    Plan hash value: 3465247318

    ————————————————————————————
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ————————————————————————————
    | 0 | SELECT STATEMENT | | 2744 | 246K| 374 (4)| 00:00:05 |
    |* 1 | HASH JOIN SEMI | | 2744 | 246K| 374 (4)| 00:00:05 |
    | 2 | TABLE ACCESS FULL | MY_OBJECTS | 2744 | 182K| 11 (0)| 00:00:01 |
    | 3 | INDEX FAST FULL SCAN| MY_BLK_IDX | 264K| 6196K| 358 (2)| 00:00:05 |
    ————————————————————————————

    Predicate Information (identified by operation id):
    —————————————————

    1 – access(“OBJ”.”OBJECT_NAME”=”BLK”.”OBJECT_NAME”)

    Statistiken
    ———————————————————-
    0 recursive calls
    0 db block gets
    1349 consistent gets
    0 physical reads
    0 redo size
    35871 bytes sent via SQL*Net to client
    2243 bytes received via SQL*Net from client
    171 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    2548 rows processed

    SQL> select *
    2 from my_objects obj
    3 where exists (select ”
    4 from my_blocked_objects blk
    5 where blk.object_name = obj.object_name)
    6 /

    2548 Zeilen ausgewõhlt.

    Ausf³hrungsplan
    ———————————————————-
    Plan hash value: 3465247318

    ————————————————————————————
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ————————————————————————————
    | 0 | SELECT STATEMENT | | 2744 | 246K| 374 (4)| 00:00:05 |
    |* 1 | HASH JOIN SEMI | | 2744 | 246K| 374 (4)| 00:00:05 |
    | 2 | TABLE ACCESS FULL | MY_OBJECTS | 2744 | 182K| 11 (0)| 00:00:01 |
    | 3 | INDEX FAST FULL SCAN| MY_BLK_IDX | 264K| 6196K| 358 (2)| 00:00:05 |
    ————————————————————————————

    Predicate Information (identified by operation id):
    —————————————————

    1 – access(“BLK”.”OBJECT_NAME”=”OBJ”.”OBJECT_NAME”)

    Statistiken
    ———————————————————-
    1 recursive calls
    0 db block gets
    1349 consistent gets
    0 physical reads
    0 redo size
    35871 bytes sent via SQL*Net to client
    2243 bytes received via SQL*Net from client
    171 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    2548 rows processed

    SQL>

    The intersting ist that the access predicates are the sam (join)

    1 – access(“BLK”.”OBJECT_NAME”=”OBJ”.”OBJECT_NAME”)

    So the optimizer seem not to see the *exists* construct or *in* construct but joins both tables.
    Very interesting
    Greetings
    karl

  7. Marco Gralike on

    Also the “optimizer_max_permutations” should have been set to 2000 (instead of 80000). This is a known “feature” and should be corrected (probably also an off day of the Oracle employer who introduced this high value).