SAN, SUN and SOA, better run and hide – Likes and dislikes of Anjo Kolk – report of an inspiring AMIS Query on Performance
Last night, we hosted an inspiring and thought provoking AMIS Query, with some 70 developers, project managers and primarily DBAs. The presentator was well known performance expert ("I just solve problems, I do not call myself performance expert") Anjo Kolk whose YAPP paper and thoughts on Response Time helped shape our current way of thinking about (database) performance. Anjo told a lot of stories from his experiences in the field, literally around the world, and painted a picture of how to approach performance ‘situations’.
The main theme he seemed to radiate was: use your common sense. Do not blindly follow so called common and best practices, silver bullets and experts’ pearls of wisdom. Keep thinking! His primary method of analyzing problems: keep asking WHY? Do not focus so much on HOW operations are performed – trying to tune SQL statements to death -, start with WHAT should be done to begin with, as the best performing statement is the one you do not have to execute at all.
And be very aware that a cheap statement can still cause problems when executed zillions of times per day. He shocked the audience with some background on the behavior of JDBC drivers and connection pooling, stating that while the connection is kept alive with connection pooling, the session is not. JDBC Drivers will reopen sessions all the time (look at V$SYSSTAT en check logons cumulative – with many web applications, such as WebForms in Oracle E-Business Suite, these will shoot through the roof). As part of opening the session, the driver will usually execute ALTER SESSION SET NLS_TZ. Anjo related about a site where that statement was executed 100k times per day.
In this particular case, what makes it even worse is that NLS_TZ is an init.ora parameter. As long as a session does not change anyone of the init.ora parameters it has a pointer to a gobal, cental, one-copy of all database parameters – between 27k and 32k worth of memory. However, as soon as a session overrides even a single parameter, it will get its own copy of the entire table of parameters. With many sessions, especially many sessions constantly being openened and closed again, this not only takes a lot of memory from the shared pool, it also results in a lot of fragmentation and eventually quite possibly the forced flush by the library heap manager of a substantial portion of the pool. This in turn will most likely remove the execution plans for a lot of SQL statements that then require renew parsing, a quite costly operation!
It seems that only when the timezones of the middle tier that opens the (JDBC) connections and the database are in synch, this ALTER SESSION is not performed.
Another telling example was the case where a calculation was performed very frequently. The calculation was a division of two numbers: x/y. For each calculation, x and y had to be derived, which was quite expensive in terms of performance. Close investigation revealed that x was 0 in a large percentage of the cases. Using a simple if-then statement to only derive the value of y when x was not 0, made huge performance impact. Looking at WHAT should be done instead of HOW (trying to make the derivation of y as efficient as possible) was easier and had a higher rendement.
By the end of evening, any SAN Administrator present had probably hidden under his chair or chosen a new career. Anjo was not very positive about SANs and the way they typically are used and administrated. One quite telling example showed a huge, and ever increasing cache (CAS$$$ as Anjo spells it) at the SAN – with a proud SAN administrator, pointing at the low activity at the SAN – combined with a small database buffer cache and a network with perhapos huge bandwidth but very noticeable latency. Anjo argued that with a latency of the network controller of 0.3 ms, you can get about 3000 I/O operations per second. With an average data block size of 8Kb, that means that with single block reads you get up to 24 Mb/s – with multi-block reads that can get a little higher – but a pretty substantial bandwith of 256 Mb/s is pointless, and increasing it in order to improve performance is plain ridiculous: in this situation, the latency will constraint the network-traffic. Adding memory to the database buffer would in this case reduce the need for network traffic, improving the overall performance and saving money on the SAN. Anjo could quite understand why SAN vendors suggest to their customers that Full Table Scans are the way to go in order to optimize SAN performance: FTS means multi block reads means more networktraffic possible – and of course much more work for the database AND the SAN, so the overall gain is questionable.
Another endangered species if Anjo gets his way is the Java Developer. Well, in particular the Java developer without a clue when it comes to the database. Who does not know or care about database specifics, about properly dealing with cursors, about trying to perform in the middle tier what the database is very good in or trying to push to the database what the middle tier probably should be doing.
Anjo promotes an Holistic approach to the system. Which means for example that the only thing that truly matters is the performance as experienced by the end-user. Whether there is waittime on the middle tier or in the database or somewhere else is not relevant, the overall response time is what counts. R (as experienced by the end user) = W (aittime, from anywhere in the system) + s (ervice time, again anywhere in the system).
What is service time in the database will be considered waittime in the middletier. Trying to make the system as whole better performant, is not achieved by local optimizations, for example reducing the service time on the midde tier by pushing more work to the database. If anything, Anjo argued for more work being done on the middle tier, more CPU power being applied in the middle tier – but not for operations of course the database is better at.
He demonstrated how a slow middle tier can slow down the database, by increasing the contention: if through a JDBC connection an update is performed, followed by some work on the middle tier, followed by either commit or a rollback of the transaction, the updated record is locked while the middle tier is doing the work. Locks may result in contention and the longer the lock is held, the higher the risk at contention (and scalability inhibition). By making the middle tier faster (more CPUs for example), the period of time during which the lock is held is reduced and database performance increases.
Anjo mentioned an operation we can perform on cursors that I had not heard about before: after opening a cursor – parse, execute, first fetch – we can continue with one of three not two actions: keep open and do a subsequent fetch is well-known, as is close the cursor. However, there is a third option: CANCEL. With cancelling the cursor, the resultset associated with the cursor and all the resources tied to it are freed – just as with close. However, the cursor remains available in a parsed state, associated with its bind variables, and can easily be reused for fetching from! Anjo did a presentation on "The Life of a Cursor" that provides many more details (see: life of a cursor). Surprisingly, Anjo’s paper even provides the Java code for using the cancel cursor option:
DriverManager.registerDriver(new<br />oracle.jdbc.OracleDriver());<br />conn = DriverManager.getConnection<br />("jdbc:oracle:oci8:@", "scott", "tiger");<br />Statement stmt = conn.createStatement ();<br />ResultSet rset = stmt.executeQuery ("select ename<br />from emp");<br / >rset.next();<br />System.ou t.println (rset.getString (1));<br />rset.close ();<br />// stmt.close ();<br />((OracleStatement)stmt).cancel(); // instead of the normal close, cast to OracleStatement to access the cancel method<br /><br />
More details in Chapter 5, OracleÂ® Database Application Developer’s Guide – Fundamentals 10g Release 1 (10.1) Part Number B10795-01.
Cancelling Cursors: Cancelling a cursor frees resources from the current fetch.The information currently in the associated private area is lost but the cursor remains open, parsed, and associated with its bind variables.
Closing Cursors: Closing a cursor means that the information currently in the associated private area is lost and its memory is deallocated. Once a cursor is opened, it is not closed until one of the following events occurs:
1) The user program terminates its connection to the server.
2) If the user program is an OCI program or precompiler application, then it explicitly closes any open cursor during the execution of that program. (However, when this program terminates, any cursors remaining open are implicitly closed.)
Anjo stated that most CPU power should be used for WebServers, directly servicing the end-users and holding their sessions and least should be used for the database with an intermediate percentage of overall CPU power for the middle tier.
SOA, SUN and writing books
Some final thoughts on hardware: Anjo had some rough experiences with SUN it seems. Sparc is dead, there are many cheaper and better performing alternatives (he mentioned some good AMD examples).
Switching from 32bit to 64 bit is only meaningful alongside a drastic increase in cache-sizes. With no change in the cache-size, switching to 64 bit means dividing the number of ‘positions’ in the cache by two!!
SOA is a potential havoc wreaker when it comes to performance. With SOA, your overall response time becomes dependent on something you may not control: the service(s) invoked while servicing a request. Anjo does not like to be dependent on something you do not control and especially he does not like loss of predictability. For that reason he strongly advises against using features like Dynamic Resizing of Buffer Cache, Dynamic Generation of Statistics and dynamic fiddling with PGA_Target (Sort in memory/sort on disk). All these features cause unpredictable behavior.
Writing books does not make Anjo happy. In fact, the one paper that made him famous was not even written by Anjo himself – though of course the ideas and concepts are. We should not hold our breath for a book by him. Instead, presentations like this one are a good way to learn from a master problem solver.
White papers by Anjo, including the YAPP paper: http://www.oraperf.com/whitepapers.html
Blog by Anjo on how the YAPP paper influenced thinking about performance (tuning) http://oraperf.blogspot.com/2005/06/yapp-ten-years-later-what-has-changed.html
- AMIS Query: Anjo Kolk (Miracle Benelux, Oak Table Network) – How the Middle Tier is Slowing Down Your Database
- AMIS Query on Oracle Enterprise Service Bus, Oracle WebServices Manager, SOA & BPEL. …and not about JBoss
- BPEL-Soft – Oracle's previous acquisition showcased during AMIS Query
- Enhancing Web Applications with AJAX, XMLHttpRequest, UIX – Report from An AMIS Ten Minute TechTalk
- SQL*Plus or Report style Break Groups in SQL Query