After a great party with the Oracle house band doing a veritable “The Commitments”-style performance and some excellent food under a largely clear and sunny Finnish Summer sky and wild night in the woods (with the mosquitoes having their own little party), day two of OUGF 2014 Harmony is under way. For me, it will be another Chris Date day, together with Graham Woods on performance, my own presentation on ADF and some additional bonus sessions.
Graham Woods (Oracle, Real Performance Group) – The Changing Performance Landscape
Seekless devices (SDD, Flashdrives) mean that I/O is becoming much faster and much less of a bottleneck (data access in 1 ms instead of 10-15 ms).
Workload management required when the system is no longer I/O bound. The disk is no longer the limiting factor, you can get maxed out on the CPU. In order to prevent the system from effectively slowing down or even stalling completely, you have to actually manage the workload.
What does the CPU do? Being stalled most of the time – on cache misses or I/O (large value of the CPI metric)
In Memory structures help leverage modern CPUs to allow data intensive operations at true CPU speeds: columnar storage and encoding and compression.
12.1.0.2 will have in memory columnar stores – to be released on June 12th by Larry Ellison: in memory columnar representation of on-disk-structures, synchronization between disk and memory, query optimization for in-memory-structures. Note that a single query can leverage data from memory together with data from disk.
Disk Access is at ms level, DRAM is microsecond level and CPU Cache is at nano-second level (that is 10e6 times as fast as disk access). With in memory capabilities Oracle is aiming for CPU Cache data access – which will lover the CPI(cycles per instruction) dramatically. Real world example: from bitmapped index on conventional storage to Exadata (offloaded scan with Bloom Filters) to In Memory Columnar structures delivered 20 times and 8000 times better performance.
The big challenges:
- CPU over-subscription (and process over-subscription) (as a result possibly taking the disk bottleneck away, which takes away the limit on what the system can handle) – architecture and resource management (1-10 processors per node, use resource managers to ensure there is always some spare capacity on the
- ever increasing data sets – algorithms designed to work with large number of rows if that is what you are dealing with (row by row row processing is insane on millions of rows; if you can process a row per millisecond, 1M rows will take 17 minutes and a Billion rows some 12 days)
- a reluctance to change code – hardware is the new software (people seem much more prepared to change hardware than they are to change software, the very reverse from what it used to be; it was called software because it could be change easily). Why is there such a big reluctance to change code – that is where the real benefits typically are. Unlike the marketing message vendors are spouting, including Oracle.
Conclusion: big changes in industry. It is your decision to resist these changes or embrace them. In the latter case, accept the fact that accepting one change (faster CPUs, faster storage) will probably mean changes in other areas, such as the resource/workload management and the algoritms in your software. Be prepared to adopt the software (logic) to the changing circumstances in which the software is running.
Chris Date – Temporal Data and the Relational Model
<this deserves a separate article> I hope I will get round to it.
Some observations:
The SQL 2011 description of the temporal model is very limited. One would almost think the vendors have ulterior motives to come up with a proper standard – as they are deciding on a standard that seems flawed and incomplete in many ways.
Some very important elements seem to be missing – especially the fact that the interval (which is called period in SQL because the term interval was already taken for the duration) is not a first class citizen. It is not a data type (generator) in its own right, but rather a construct based on two separate column definitions. The official reasoning behind this is the fact that a new type in SQL would confuse the eco-systems – technologies that have to interact with SQL. It seems quite lame – to hold back the evolution of SQL because of this. Especially given the fact that being able to use proper intervals does not force you to do so. And for ignorant client technologies, you can use views to hide this unfamiliar construct until such time as the clients can handle them.
Melanie Caffrey – Three in 12c: Row Limiting, PL/SQL with SQL and Temporal Validity
Top N – WITH TIES to keep widows & orphans togerther: if you request TOP 5 and numbers 6 and 7 rank exactly the same as record #5, WITH TIES will result in the #6 and #7 being returned as well. Without WITH TIES you would just get the TOP 5 and be none the wiser about the other two records that should probably be in the list as well.
PL/SQL in SQL is helpful among many other ways when you cannot bring new PL/SQL objects into a production database and running SQL is not such an issue. Additionally, many SQL queries will perform much better with inline PL/SQL than if they would have to go through the context switch into non-in-line and non-UDF-pragmaed units.
WITH_PLSQL hint in UPDATE statements is required to successfully use inline PL/SQL inside UPDATE statements:
update /*+ WITH_PLSQL */ emp a set hiredate = (WITH FUNCTION func ( q in number) return number is begin return 1; end; select func(sal) from emp b where b.empno = a.empno) where a.job=’CLERK’
Oracle 12c 12.1.0.1 implementation of temporal validity is introduced by Melanie. It is a first stab at getting the SQL 2011 standard supported – and that falls short a long way from what Chris Date suggested in his seminar. At the same time, it is a very useful first step (or actually: it’s a continuation of the work previously done on Flashback Queries).
Currently no overlap detection.