Lots of little interesting notes on Oracle 9i & 10g – database design, DBA, architecture, performance etc. from the Tom Kyte seminar

Last week, I sat with six colleagues at the AskTom Live – seminar with Tom Kyte. You can see us sitting here in the frontrow. Lots of little interesting notes on Oracle 9i & 10g - database design, DBA, architecture, performance etc. from the Tom Kyte seminar TKS 05You may have seen several posts on this seminar already. Here is my contribution, with things I noted down that struck me, seem useful to remember or are worthwhile sharing. There was a lot more of course. I could not write everything down, I did not fully understand all of it either.

Keywords in this article are 10g, Read Consistency, Flashback (Query, Table, Table to before Drop, Database), Tuning, Debunking Myths, Question Authority, Performance and Database Design.

10g – SQL, PL/SQL and utilities

  • optimized PL/SQL compiler: many coding errors like wrong data-type and dead code such as assignment of a constant in a loop are detected and corrected. Unfortunately the compiler does not tell us what code it decided to run: it does not optimize the PL/SQL code, it optimizes the byte-code
  • native PL/SQL compilation is much easier: the natively compiled code is no longer stored externally on the file-system (difficult with backups, less portable) but instead in LOBs inside the database; see post on AskTom on Native PL/SQL Compilation
  • the new PL/SQL DOUBLE and FLOAT datatypes are far less accurate than the NUMBER (12 or 13 digits of precision vs. 38 digits of precision). However, if you do not need the very high precision of NUMBER, FLOAT and DOUBLE (and PLS_INTEGER) can speed simple operations by substantial factors. Tom showed a calculation of PI, using NUMBER and BINARY_DOUBLE, in what was the exact same code. The timings were 1.5 vs. 0.3 seconds!
  • the expdp and impdb (the server side import and export architecture) and work ‘piped across the network’, that is: the things expdp is extracting on one database can be transferred or piped across the network to another database where impdb is immediately importing it without the intermediate dump file ever being created.
  • in 10gR2 range queries (using BETWEEN) and queries with LIKE can be performed in nls_sort_ci mode, meaning that they are case insensitive. select * from a where col like 'XT%' is then equivalent with select * from a where col like 'xt%' or select * from a where col like 'xT%'
  • in 10gR2 we will have a row level error handling mechanism for DML statements, somewhat similar to the PL/SQL SAVE EXCEPTIONS clause in BULK DML operations. This will allow DML statements to continue processing rows 5001 to 10000 even if the update of row 5000 resulted in an error, e.g. a constraint violation. The failed rows can be processed separately from the main statement.
  • Table Redefinition performs online reorganization of a table; for example: set up a table with all characteristics as we would like to have it – proper storage parameters, partitions, indexes and constraints – well perhaps we add the constraints after we complete data copying, start redefinition of the source table and finally swap the two tables. A good description of this process is found here
    (reasons for doing reorganization: Improve Performance–Packing the rows in contiguous space–Eliminating Chained Rows, Protect Availability–Prevent out-of-space error conditions, Recover Wasted Disk Space–Inside the block and below the High Water Mark�, Optimize Database Layout–Relocating to different tablespace to relieve I/O contention)
  • In 10g, PL/SQL program units no longer are invalidated when the synonym they rely on for accessing objects such as tables, views or other PL/SQL objects are defined – as long as the object the synonyms refers to now has the same signature, no invalidations takes place. This allows for on-line redefinition of synonyms without causing lots of invalidations, recompilations and enormous queues – as recompilation cannot be done for procedures that are currently being executed!
  • 10g On Line Segment Shrink reclaims space for tables, meaning that a full table scan will take potentially much less i/o (fewer datablocks to read)
  • The new 10g COLLECT aggregate operator can aggregate column values into Nested Table types. For example
    create or replace type ename_type as table of varchar2(30);
    select deptno, cast( collect(ename) as ename_type ) enames
      from emp
     group by deptno;
    DEPTNO ENAMES
    ------ ----------------------------------------
        10 ENAME_TYPE('CLARK', 'KING', 'MILLER')
        20 ENAME_TYPE('SMITH', 'JONES', 'SCOTT', 'ADAMS', 'FORD')
        30 ENAME_TYPE('ALLEN', 'WARD', 'MARTIN', 'BLAKE', 'TURNER', 'JAMES')
  • The UNDO_RETENTION time can be guaranteed or enforced in 10g. That means: if we specify a retention time of 3 days and indicate that it must be guaranteed, we can rely on flashback queries and flashback table statements to always be able to go back 3 days. Note however that if the UNDO tablespace fills up, this guaranteed 3 day period may lead to a complete stop of all transactions in the database until either the oldest undo information expires (is three days old) or more freespace is added to the UNDO tablespace.
  • The 9iR2 dbms_xplan package provides more details, especially on join-conditions and where-predicates in execution plans than normal autotrace explain only in SQL*Plus does. It even shows the VPD-predicates that are applied!
  • In 9i, DUAL is an IOT. In 10g, queries against DUAL that do not actually select a column are executed as FAST DUAL, which means no real table access and therefore less I/O. Compare select * from dual and select 1 from dual.
  • Function Based Indexes are Standard Edition as of 9iR2!
  • In 10g will EXECUTE IMMEDIATE of the same statement not require a re-parse; The ref cursor for (select ….) will still and always require a new parse

Tuning suggestions and observations

  • increase the CACHE value of sequences from the default of 20 to a much higher number; whenever a new set of sequences must be acquired, relatively expensive, serialized internal operations take place. Storing a larger set of cached values is very cheap. You only may loose (larger) ranges of sequence values upon database shutdown and restart.
  • in 10g, statistics gathering is by default automatically performed. This includes system statistics: information about CPU – speed, average load – and disks. Based on this information, the CBO can choose CPU_COSTING as optimization goal instead of the more common I/O_COSTING as goal. You may see different execution paths as a result of this when your system characteristics change
  • Bind variable peeking by the CBO when choosing an execution plan means that the CBO will not just assume a generic bind variable in a query such as ‘select * from emp where job=:x’, and act accordingly, it will actually look at the value. If that value happens to be one that is very rare in the job column, the execution plan that is chosen may use an index. If that value is very common in the job column, chances are the CBO will use a Full Table Scan. However, the execution plan that is chosen when first this query is run – since the database was started or the shared pool flushed – will be picked for subsequent executions of the exact same query. So queries for values of 😡 that are less common or more rare maybe hit with an inappropriate execution plan. If that happens, you may want to use literals in certain cases instead of bind-variables or deliberately create distinct queries.
  • The CBO is very clever in 10g. Instead of telling it how to perform an operation, e.g. turning NOT IN expressions into NOT EXISTS for getting a better execution plan, we can just tell it what we want. The SQL can be written in a very functional way with little regard for the subtleties of the execution plans. NOT IN and NOT EXISTS result in the same execution plan. In general we should focus on tuning questions and algoritms rather than queries.
  • Frequent committing is bad – see Logwriter Dilemma
  • Putting more CPU to a slow system may actually slow things further down. For example: the bottleneck was I/O. Some large reports consumed lots of CPU resources, they did not compete too hard for the I/O as the smaller transactions from the OLTP system. With more CPU, the reports ran faster and required more I/O: the struggle for I/O became more fierce and overall performance deteriorated!
  • dbms_profiler can help to find low-hanging fruit in poorly performing PL/SQL. It can also help to find out whether PL/SQL unit-tests cover all code in the application. It also may help to find unused code.
  • Hard parses are the death of many systems
  • Instrument your code. Put debug and trace statements all over your code; make them configurable from outside the application, to turn debugging on or off at various levels
  • Analytical functions are more efficient than alternative approaches in small (everything in memory) or large systems (memory paging, but alternative solutions also require disk-paging. It may be that in intermediate systems, for processing the analytical functions already memory resources are too small and memory-paging to disk is required while alternative approaches using subqueries for example still fit into memory
  • It is every so much faster to query select * from table where sys_context('context', ' key') = column than select * from table where plsql_function() = column. Applicatio context references are seen as bind-variables

.

Kyte says… (but always question authority)

  • It is preferable to have no SQL in any 3GL – except of course PL/SQL. All 3GL languages should access the database through APIs. These APIs shield the underlying tables and views as well as the SQL required to access the data. The 3GL language calls stored procedures to perform Business Services. These procedures or services can take data and perform DML or return data (typically using ref cursors) retrieved from the database. The API itself can be built in two layers: a T(able)API layer linked to a single table and a Business API (BAPI) or Module API (MAPI).
  • Read the Oracle Server Concepts Guide
  • Use bind variables – except perhaps in Data Warehouses where the time to run a query far outlives the time to parse ten. Bind-variable peeking maybe cause use of bind-variables in a DWH somewhat disadvantageous.
  • Set the parameter CURSOR_SHARING to EXACT. Never use FORCE as a global database setting. If for some poorly written application you think cursor_sharing=force is necessary, consider using LOGON triggers to detect that application and alter the session to use cursor_sharing=force
  • Use dbms_application_info as a way to instrument your code and be able to on-line, real-time follow the progress of PL/SQL code and long-running queries. Note that dbms_application_info updates v$-tables wihtout the need for a commit. It could almost be seen as alternative to dbms_pipe in some senses!
  • Parallel Query is not very useful in a multi-user environment. At the very least do not use FORCE PARALLEL; have the database handle degree of parallellism.
  • PERCENT_DISC is a neat way to find the value that caps a certain percentage of the records. For example select *
    from ( select ename
    , sal
    , percentile_disc(0.5) within group (order by sal desc) over() sal_p50
    from emp
    )
    where sal < sal_p50
    order
    by sal asc
  • Use locally managed tablespaces; there is no excuse for using dictionary managed tablespaces
  • If you can do it SET based in SQL, it will always beat the procedural, PL/SQL based approach

Debunking Myths

All of the following statements are NOT true

  • The most selective column (i.e. with most distinct values) should be FIRST in an index – in most cases it should be the exact reverse!
  • Data and indexes should be in separate tablespaces – only perhaps for transportable tablespaces if you want to transport the data and rebuild the indexes. Because of advanced disk-reading capabilities such as EMC read-ahead, it may make sense to distribute tables and indexes over different tablespaces by type of access: tables and indexes that are typically read in multi-block mode (full table scan) should be together in a tablespace. Tables and indexes accessed usually in single-block (index scan) mode should be put together in another tablespace.
  • NOT IN in SQL queries should always be rewritten with EXISTS – not true, in 10g both are treated the same by the CBO and result in the same execution plan
  • Buffer Cache Hit Ratio should be over 99% – it is like saying ‘your car should be returned to the dealer when it is running on average at less than 99% of its document maximum horsepower; ratios are indicative only, never a goal on their own
  • select count(1) from table is faster than select count(*) from table – not true, never was true
  • Select * from table where nvl(:bindvariable, column) = column is bad programming; you should (probably in the client) program an if-then-else that decides whether to do the query select * from table or the query select * from table where column=:bindvariable – the CBO in 10g is smart enough to create this if-then-else construction in the execution plan itself!
  • You should always use FETCH BULK COLLECT INTO when you want to process over 10 to 30 records in PL/SQL – in 10g this relatively new guidelines is redundant: the PL/SQL compiler automatically turns for var in cursor() constructions in BULK fetches (default LIMIT 100), that are even slightly more efficient than the explicit ones you program yourself.
  • Full table scans are bad; you should force the CBO to use indexes – sometimes true when you want to query only few rows from very large tables. However: the CBO knows best! Full table scan can be the optimal approach in many cases
  • Commit frequently! – not true: it generates lots of extra undo and redo, causes a log file synch (wait!) and you may break the logical cohesion of your transaction. Transaction is first and foremost a logical concept, not a physical one.

for me to remember

  • Use materialized views as vehicle to implement an ON-COMMIT trigger. Also use them to implement declaratively multi-table constraints (also see AskTom)
  • Write a Pie Chart in SQL using Table Functions
  • Database Dictionary and statistics are an interesting application for BI
  • In Analytics: the analytical functions are evaluated after the WHERE but before the GROUP BY
  • NULLS LAST == order by nvl(column, very_big_number) asc
  • Use Stored Java to access MySQL or other (remote) database; return the results wrapped as Table Function; mimic Database Link for non-Oracle (JDBC) databases.

Tidbits

  • AskTom is developed and deployed using HTML DB
  • If you want a query to fail under certain conditions, include a CASE WHEN CONDITION THEN 1/0 END in the query
  • Tom Kyte used VMWare for this seminar: one VM for Windows with PowerPoint, two VMs for a 10g and a 9iR2 database. A demo with 90-trial of VMWare with 10g Database can be downloaded from OTN
  • For test-cases it is very important to think Multi-user, multi-session. Things that you never expect to happen when sequentially processing statements may happen when parallel sessions are active
  • Workspace Management could be used for Workshop Data-management or Test-Data Management. It’s poor-man’s Flashback – an not limited in retention

Resources

Recent presentations by Tom Kyte are on his website AskTom. I suspect that most slides from the seminar last week in Utrecht will be here in some form or shape.

6 Comments

  1. keerthi January 4, 2007
  2. naveen September 15, 2005
  3. Farqualeet Ishrat July 2, 2005
  4. Lucas March 4, 2005
  5. Lucas March 2, 2005
  6. Lucas February 27, 2005