The backdrop for Oracle OpenWorld 2016 – or indeed for any IT conference – is the constant evolution in IT. On many levels – from hardware to philosophy – ongoing fundamental developments open up opportunities and foster, and sometimes accelerate, change. To understand some of the directions taken by Oracle outlined in this document, it is useful to take a brief look at the elements of that IT evolution that most strongly influence how Oracle is moving. Note that many of these evolutionary changes are linked: they enable and reinforce each other. Also, a sequence of small steps may at some point lead to a breakthrough acceleration: each step by itself may not be spectacular, but the cumulative effect of all steps may bring us to a next level.
An example of that effect: small increases in processing speed may for a long time have been just that, small, incremental and evolutionary changes. At some point, however, operations that had to be done in the background as an asynchronous or batch job can suddenly be done in real time, synchronously and interactively, because at last the wait time has dropped below what is acceptable to humans. Interactive SQL against data on Hadoop is an example of this, as are speech capture and chatbots. This is a breakthrough change because we can offer a new class of services and functionality to end users.
Similarly, the combination of fast and smart compression and increased memory sizes has brought us to a point where [the active data set from] enterprise databases can be held entirely in memory. As a result, reporting and analytic operations against this data set can be done at speeds that are one or two orders of magnitude faster than previously.
Some of the relevant IT developments are briefly touched upon below.
More powerful IT
• Moore’s law – reduced costs and higher capacity – is limited by heat generation (components can be made smaller, but they would then generate too much heat). Even so, previous barriers (for example in bandwidth, latency and memory size) are being broken; we can get more compute, storage and network transport capacity than ever before, and at lower prices.
One of the many effects of this is that a flat network architecture with many direct connections, and therefore very low latency, is now feasible cost-wise. Oracle is showing this in its Generation 2 Data Centers, where it replaces the traditional hierarchical network architecture, which is slower and less predictable but much cheaper.
• Over the decades, the number of IOPS (I/O operations per second, for example between compute units and data stores) has been increasing by leaps and bounds. The speed of I/O operations was traditionally the main limiting factor in many data-intensive IT operations. However, many developments have taken place, and are still taking place, that are changing the picture. As Oracle states, for example: “it is very hard to be IO bound on Exadata (because of the InfiniBand internal network, the smart storage cells, the fast storage units and the use of large and fast DRAM and of very fast NVMe (non-volatile memory express on top of solid-state drive))”.
In general, systems have more and faster memory and use non-volatile memory stores between DRAM and disk, both offering very high IOPS rates. Additionally, disk storage units use large and smart memory caches to preempt more expensive disk operations, as well as compression, which reduces the data volume to be read and transported at the expense of some CPU capacity. This can further increase IOPS performance. The increased speed of networks, with both higher bandwidth and lower latency, helps to further increase IOPS within [engineered] systems and between systems.
The next bottleneck in terms of IOPS is not so much within the data center, but between corporate on-premises data centers and clouds, or between the cloud environments of different providers. It would make sense for all major cloud providers to work closely together to provide reliable, high-speed connections between their respective data centers.
• Distributed, parallel processing is not necessarily new. However, it has been growing in importance, and its effectiveness, efficiency, affordability and ease of use today are far beyond what they used to be. The compute capacity of our CPUs has been increasing primarily through a growing number of cores rather than through the clock speed of the individual cores. Modern programming languages such as Java and enterprise platforms such as Oracle Database know how to leverage parallel threads on multiple cores in a single box.
The much faster networks have made it feasible to distribute work across co-located systems, or even across systems that are quite far apart. MapReduce and other Hadoop workload execution engines are examples of this. Interactive jobs can be performed, for example, using Spark SQL running against a distributed file system, thanks to fast processors running close to the data, fast networks and smart job managers that use clever algorithms to divide the work, distribute the work packages, collect the results and compile the final outcome.
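The divide, distribute, collect and compile cycle described above can be sketched in a few lines of Python. This is a minimal illustration only, not how Hadoop or Spark are implemented: threads in a single process stand in for the nodes of a cluster, and the function names (`split_into_packages`, `count_words`, `merge_counts`, `distributed_word_count`) are hypothetical names chosen for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_packages(lines, package_count):
    """Divide the input lines into roughly equal work packages."""
    return [lines[i::package_count] for i in range(package_count)]


def count_words(package):
    """Map step: a worker counts the words in its own package only."""
    counts = Counter()
    for line in package:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """Reduce step: compile the partial results into the final outcome."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def distributed_word_count(lines, workers=4):
    """Divide the work, hand out the packages, collect and merge the results."""
    packages = split_into_packages(lines, workers)
    # Assumption: a thread pool stands in for the worker nodes of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, packages))
    return merge_counts(partial_counts)


if __name__ == "__main__":
    sample = ["fast networks fast processors", "smart job managers", "fast results"]
    print(distributed_word_count(sample).most_common(1))
```

In a real cluster the map step runs on the machines that hold the data, so that only the small partial results have to travel over the network.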
We are becoming smarter as well – or in some cases we are now finally able to put preexisting smartness to good use, because technological reality has at last caught up with mathematical theory. Out-of-the-box thinking – again, enabled by technological advances – brings us to new insights and different approaches. Briefly, some examples:
– given a bottleneck, what can you do to reduce its impact? By engaging non-bottleneck resources to take over some of the work, or to pre-process the work to make it easier, the overall throughput can be increased considerably. This realization has been applied in many cases, including Software in Silicon, compression of data (to fit more in a Gb of network transfer or a GB of DRAM) and filtering data as early as possible (such as Smart Scan on the Exadata storage cells).
– data arriving in real time – for example from IoT networks – can easily flood a system. Streaming Analytics algorithms have evolved to do filtering, aggregation and pattern matching on data as it streams in. Time slices of data are kept in memory, analyzed and reduced to what is usually a tiny number of useful events. The vast majority of the data can quickly be discarded.
– if an approximate answer to a data aggregation question is good enough – and frequently that turns out to be the case – then the mathematical algorithms known as Approximate Analytics can yield results up to an order of magnitude (10x) faster than traditional, exact aggregations. Oracle has implemented these algorithms in SQL in the 12c release of Oracle Database.
– the realization that data does not necessarily need to be queried through the same interface through which it is manipulated has led to several new, hugely efficient approaches to working with data. The acronym CQRS – for Command Query Responsibility Segregation – is applied to that school of thinking. It feels alien to anyone used to relational tables as the single source for queries and target for data manipulations. Introduce a view for the queries and the first cracks begin to show; with Materialized Views it becomes even more explicit.
With the In-Memory option, Oracle has made CQRS part of the database internals: DML goes against disk, queries go against memory, and Oracle keeps the two in sync. Realizing that data does not always need to be completely fresh, and therefore allowing a quick read cache to lag a little behind the transactional reality, can make a huge difference. Using an Elasticsearch index across various data sources and performing queries initially against that index, before perhaps retrieving relational records from the RDBMS, is a valid way of working in certain circumstances.
– by using the far more compact columnar storage format for data that is only queried and not updated, a lot more data can fit into memory. That makes In-Memory processing feasible for large data sets and even entire databases. Data that is available in memory in columnar format can be queried extremely rapidly. That means that traditional indexes can be discarded. That in turn means that OLTP can be done much faster and scale much better: no more locking because of indexes and no more effort to update the indexes. In this way the columnar format, introduced for queries, also makes DML operations go faster – an unexpected bonus.
– mathematical theory describes the potential of blockchain. IT developments and Cloud infrastructure have made the implementation of blockchains a reality. The impact of a completely reliable transaction ledger that is not affiliated with any bank, government or corporation is not yet fully clear. The assumption that this impact will be substantial does not seem too bold.
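To make the Streaming Analytics item in the list above more concrete, here is a minimal Python sketch of the idea: a time slice of readings is kept in memory, older readings are discarded, and only the rare cases where the window average exceeds a threshold are turned into events. The class `SlidingWindowAlert`, the helper `useful_events` and the threshold values are assumptions made for this illustration; real streaming engines offer far richer filtering and pattern matching.

```python
from collections import deque


class SlidingWindowAlert:
    """Hypothetical detector: keeps a time slice of readings in memory and
    reports an event only when the window average exceeds a threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.window = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0       # running sum of the values in the window

    def push(self, timestamp, value):
        """Add one reading; return an event dict or None."""
        self.window.append((timestamp, value))
        self.total += value
        # Discard readings that have fallen out of the time slice.
        while self.window[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.window.popleft()
            self.total -= old_value
        average = self.total / len(self.window)
        if average > self.threshold:
            return {"at": timestamp, "average": average}
        return None


def useful_events(readings, window_seconds=10, threshold=80.0):
    """Reduce a stream of (timestamp, value) readings to the few events that matter."""
    detector = SlidingWindowAlert(window_seconds, threshold)
    events = []
    for timestamp, value in readings:
        event = detector.push(timestamp, value)
        if event is not None:
            events.append(event)
    return events
```

Memory use is bounded by the number of readings in one window, however long the stream runs; everything older is dropped as soon as it leaves the time slice.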
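The Approximate Analytics item in the list above can be illustrated with a small Python sketch of one well-known technique, the k-minimum-values estimate of a distinct count. This is an assumption chosen for illustration and not a description of the algorithm Oracle uses in the database; the function names are hypothetical. The idea: hash every value to a number between 0 and 1, remember only the k smallest hashes, and derive the number of distinct values from how small the k-th smallest hash is.

```python
import hashlib
import heapq


def hash_to_unit_interval(value):
    """Map a value to a stable pseudo-random float between 0 and 1."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_count_distinct(values, k=256):
    """Estimate the number of distinct values while remembering only k hashes."""
    kept = set()  # the hashes currently held in the sketch
    heap = []     # the same hashes, negated, so heap[0] is the largest one kept
    for value in values:
        h = hash_to_unit_interval(value)
        if h in kept:
            continue  # duplicate of a value already in the sketch
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # The new hash is smaller than the largest one kept: swap them.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: the count is exact
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)
```

The sketch needs memory for only k hashes regardless of the size of the data set, and the typical relative error shrinks roughly with the square root of k. That is the trade described above: a slightly imprecise answer in exchange for far less work.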
Download the AMIS OOW16 Highlights for an overview of announcements at OOW16.