Reflections after Oracle OpenWorld 2015 – Data Management (Database) image23

Reflections after Oracle OpenWorld 2015 – Data Management (Database)

image

This article gives an overview of some of the most eye catching stories from Oracle OpenWorld 2015 around Data Management. It discusses on premises products such as Oracle Database as well as cloud services around the Database.

Data Management in the Oracle Integrated Cloud Platform Services composition covers Big Data, NoSQL and Oracle Database (RDBMS) in all its glory. MySQL is not as explicitly positioned – and is not currently available as a cloud service from Oracle – but deserves some attention as well.

Surprisingly prominent during the Oracle OpenWorld 2015 conference was the new release (12.2.0.1) of the Oracle Database. This release is currently in beta and is expected to become generally available in the Summer of 2016. The long awaited 12c release of Oracle XE is to be based on the 12cR2 release. It seems unlikely to be published before 2017.

The main messages about this release confirm the integration across the stack (in memory), join current industry trends (JSON, JavaScript) and first and foremost address challenging faced by anyone who wants a Cloud service based on the Oracle Database (either Oracle in its Public Cloud or any enterprise in their private cloud). The nice thing about the latter is that any improvement introduced into the database to help Oracle run its cloud services benefits us well for running our private cloud.

Multitenancy

Being able to use as few resources – physical machine resources as well as human administrator resources – for running certain workloads is important. The Total Cost of Ownership is dictated to a large extent by the human factor (up to 60%). For Oracle – owner of the database software – license costs are not as important as they are to the rest of us, but the cost of hardware is important as well as the ability to vertically scale within a given piece of hardware. Multitenancy is all about achieving higher density – more workload from the same physical set of resources (CPU, memory) – with low administrative burden.

The introduction of the pluggable database architecture in Oracle Database 12c Release 1 brought multitenancy into the Oracle Database. Pluggable Databases (PDBs) are on the one hand isolated databases in their own right – on par with database any standalone database – yet on the other inherit from the container database (CDB) all generic structures present within any freshly installed Oracle Database and share physical resources with other PDBs running in the same CDB on the same underlying hardware. Many administrative operations are performed on the container database as a whole instead of on every individual PDB – making these operations much more efficient per PDB. Additional operations were available to plug in and unplug as well as clone a PDB, the latter allowing incredibly fast ‘create database as this other database’ functionality.

The multitenancy capabilities are to be expanded considerably in 12c release 2 – judged by the presentations at OOW15, from Larry Ellison and Andy Mendelsohn down to dozens of conference sessions, hands-on labs and demo ground exhibitions.

image

Through the database resource manager, it will be possible to limit usage of memory by PDBs in addition to CPU and I/O – providing more isolation than in release 1.

The abilities to juggle with PDBs and do so while keeping them running (hot operations) are extended. In release 2, hot cloning of a PDB is supported as well as [hot] refresh of the cloned PDB at a later point in time, to resynchronize the clone with changes in the original PDB since the moment of cloning or the last refresh. This makes it very easy to create an environment to reproduce and fix a production bug or to create a production like test environment. The next figure describes how clones can be made according to Data Profiles – resulting in tailored, almost personalized clones that can automatically be managed.

image

The ability to mask data – apply data redaction at run time to hide confidential data – and use data masking and sub setting – functionality to create an extract from a database that contains a meaningful selection from the master data set with any restricted data hidden or scrambled – can play an important role in creating such environments as well, and were frequently referred to during the conference.

image

Another PDB juggling feature is the [online] relocate. This features helps us transfer a PDB from one CDB to another, without incurring downtime. This transfer can take place between two on premises container databases, and can also cross the great divide between on premises and public cloud, in either direction.

A new mechanism that helps both increase density and decrease [manual] administration effort is called Application Container. This is a vehicle for custom defined database objects – code and meta data – that should be shared by or made available across all PDBs. Just as the CDB contains Oracle’s generic, shared database objects, the Application Container contains the custom set of generic components. Any update to the contents of the Application Container is immediately available in all PDBs that consume the container.

In the Hands On Lab for multitenancy, Proxy PDBs were introduced. These are a special kind of Pluggable Database that it is kind of a “proxy” between two Pluggable Databases – representing a remote PDB locally and hiding the ‘remoteness’ from local consumers.

Currently, there is a limit of 252 PDBs per CDB. With release 2, this limit is raised to 4096, which means it is no longer a limitation for any practical purposes. Note that anyone adopting Release 12c of the Oracle Database can have one PDB for free – to run multiple PDBs requires the multitenancy database option. Also note that Oracle Database Unplugged – running without the pluggable architecture – will be deprecated in the future.

In Memory

The most rapid uptake of any new database option is claimed by Oracle for the In Memory option (generally available since July 2014) that uses data held in memory in a columnar data format to speed up analytical queries. Configuration of this option is declarative and very simple – and leveraging the feature is completely transparent. Applications are no aware at all whether they run SQL against a traditional database – relational data format stored on disk and perhaps retrieved from SGA – or against an in memory data set. In fact, they might well be running against a combination of both.

image

With the increased popularity of flash technologies, flash-based solutions such as all-flash arrays and PCIe-based flash servers, the notion of memory is shifting a little. Memory is not just DRAM – but potentially includes a spectrum of technologies with DRAM like access times, orders of magnitudes above [disk] storage IO. Smart Scans are sometimes limited by CPU not flash; that means that in addition to DRAM, we can engage Flash without loss of performance due to slower access times..

All Exadata machines, cloud or on premises, are now also able to store columnar data in the PCI-Express flash cards in the Exadata machines as well as in DRAM main memory, effectively boosting their columnar storage by a factor of 10X to 100X.

image

With Database 12c release 2, In Memory can be used on an Active Data Guard stand-by database. This makes it possible to off load high volume analytics away from the main OLTP database. Note that – somewhat unrelated – AWR will be available on Active Data Guard as well.

image

In 12cR2, the In Memory option will be much smarter about which data to hold in memory versus which data is on disk. The column store will be a cache, with aging out and repopulating managed by least recently used algorithms using the Heat Map that has been introduced in Release 12.1. In addition to this automated cache management, also policies can be defined to determine which data to keep or evict.

Other innovations with the In Memory option will bring additional performance improvements with joins (3x), expression evaluation (10x) and JSON parsing & processing (60x).

In the realm of in memory databases, Oracle also offers the TimesTen product, currently at its 11.2.2 release. TimesTen is not for the type of high volume analytical processing that the In Memory option for Oracle Database is intended for. Instead, TimesTen is for very low latency environments with high transaction volumes. TimesTen is frequently embedded in an application – somewhat similar to a in memory data grid but with transactions and standard SQL support, as well as built in PL/SQL.

Sharding

Sharding is fairly common term in the field of NoSQL databases. Oracle announced that starting with Release 12cR2 the Oracle Database will also have a support for sharding – a form of distributed partitioning. Simply put: each shard is an independent database on any physical location that contains a part[ition] of a very large dataset that is logically one. Because the databases are independent and have their own isolated physical resources, the sharded database is horizontally, linearly scalable.

Applications communicate with a shard director using regular SQL statements. Based on session characteristics and perhaps the nature of the query of DML operation, the shard director will either direct the statement to one particular shard or to multiple shards – perhaps rewriting the statement for each individual shard. In case of distributed execution, the shard director has to gather the responses from all shards involved and construct the combined answer that is handed back to the application. The shard structure is transparent to applications.

image

It seems logical to distribute the data across the shards based on geographical attributes – such as customers and orders for US customers go into the US shard and European customers have their data recorded in the European shard. Such a distribution will reduce latency challenges for most transactions and queries, while global queries and transactions are still supported as well, benefiting from the additional parallel query capabilities at each of the regional shards. Note that sharding – just like partitioning – can be based on any key or can be done randomly. The latter is called system managed sharding and results in many, relatively small chunks that are automatically balanced across shards. Detail data can automatically be assigned to the same shard as their master.

image

LinkedIn presented their home grown sharding architecture, based on Oracle Database 12cR1. They are heavily involved with the native sharding functionality in Oracle Database 12cR2 – and are quite happy with the way it is evolving.

With sharding, Oracle Database breaks through several barriers. Horizontal scalability is within reach – depending especially on the capabilities of the query coordinator to run distributed queries and assemble the results. Assuming that works as promised, then linear scalability to very large data sets and high volumes of distributed users seems feasible. Because queries can succeed even in the absence of some shards, fault tolerance of the database architecture is increased. For high availability, shards – the distributed individual databases – can easily make use of each other as stand-by database, to allow fast failover.

Of all database related topics during OOW 2015, I consider sharding as the one that most innovative and potentially has the most impact on Oracle Database Architects and Administrators. It may be for high end situations initially but the potential seems far reaching. It also introduces many interesting, tough questions. Do we get a three-phase commit for distributed transactions touching multiple shards? What is the definition of read-consistency in such a shared world?

 

Security

Security obviously is a very big topic – even more so with database in the cloud. Oracle wants us to move to the cloud because it is more secure than anything we can do on premises. With web scale security measures at all levels of the stack, including security in silicon and protection against social engineering and physical attacks at data center level, that may well be justified. Additionally, Oracle touts the use of encryption, for data in transit as well as data in rest: all data in the database should be encrypted. Requires no application changes, has “Near Zero” overhead with Silicon Encryption and is integrated with Oracle Database technologies such as Log files, Compression, ASM, DataPump.

image

If data in the Oracle Database cloud is encrypted, and the key required for decryption is in the on premises Key Vault, then no one – no employee of Oracle’s and no representative of law enforcement agencies – can read the data. As part of security measures, audit vault (to keep track of all actions around sensitive data) and data vault (to protect sensitive data from unauthorized actions) are promoted. The evolution of the still fairly new Real Application Security continues with 12cR2.

Database Cloud Services

Platform services for Oracle Database can be defined along several dimensions. One is the size of the shape – based on the intended usage. Oracle offers DBaaS for individual developers for development purposes, for small business (SMB) and departmental applications and SaaS extensions, for enterprise applications of a substantial size and finally extreme and mission critical workloads and consolidated enterprise databases. Along other dimensions, these services come with different levels and flavors of performance, security and availability; new services include RAC and Data Guard. Then to some extent there is some variation in the database editions that can be used.

clip_image004

The high end database cloud service announced at Oracle OpenWorld 2015 is Oracle Database Exadata Cloud Service. imageEnterprises can now deploy mission-critical production OLTP databases and data warehouses in the Oracle Cloud. The Oracle Database Exadata Cloud Service runs on the Oracle Exadata platform, which is in production at thousands of sites worldwide. The service includes built-in high performance and availability with features such as Oracle Multitenant, Oracle Database In-Memory, Oracle Real Application Clusters, Oracle Active Data Guard, InfiniBand Fabric, and Smart Flash Cache. This basically amounts to a managed Exadata platform – hassle free, physically secure, subscription based.

Some facts: Scale-out compute, scale-out intelligent storage, elastic expansion. Cloud automation: –Fast, elastic provisioning, –Automated management. Available in scalable units of:

–28 to 68 cores

–½ TB memory

–19.2 TB flash storage

–42 TB disk storage

Complementing the database in the cloud services is the Oracle Database Backup Service. This service stores backups from on-premises Oracle databases or databases in the Oracle Cloud using the same process you use to back up to disk or tape. the data is encrypted at the source and triple-mirrored in the cloud to be safe and always recoverable.

The new Sparc M7 CPU – the most powerful CPU in the world (dixit Oracle) – is the heart of the updated Oracle SuperCluster machine. A machine that, according to Oracle, is Engineered together for the best [private] DBaaS solution (and perhaps best public solution as well, but for now the Oracle Public Cloud seems to be running on Intel X86 chips in the Exadata systems). Together with multitenant Oracle Database 12c and Enterprise Manager 12c, The SuperCluster is positioned as the best Private Database Cloud solution – based on price/performance, security, low (cost) entry and quick ramp up.

image

 

Database Development

A new release of the Oracle Database brings joy to administrators as well as to developers. The main focus at OOW is on database architecture, high end features and breakthrough announcements. The new features most relevant or interesting for most database professionals are not as highly visible. Some made it to center stage though.

Shining through in the definition of some of the database dictionary views in 12cR1 we will have 128-byte object identifiers, up from 30 bytes (yes, only 30!) in the database today. Finally, meaningful object names for packages and views, even if you happen to work in a multi-byte language.

Java 8 comes to the database. The embedded JVM will be Java 8 and include all Java 8 SE features. This includes the Nashorn JavaScript engine and the bi-directional integration between Java and JavaScript. This in turn means that stored Java Procedures wrapped with PL/SQL interfaces can in turn invoke JavaScript code. That could be used for example to process JSON documents – stored in the database – in JavaScript.

Another integration with JavaScript is available in the form of the Oracle Database driver for Node.js. Node.JS can be described as a middle tier application server for JavaScript, somewhat similar to what Tomcat or Jetty is for Java code. Node.js is able to serve high volumes of clients through a very lean infrastructure and an asynchronous programming model, that works great with requests that are largely handled by back end components such as services or databases. This driver will make Oracle Database a more easily accessible option for Node.js developers. Similarly [new] Oracle Database drivers have been published for Python, PHP, R and .NET. Let there not be any barriers for using Oracle Database from any of the common application development technologies! In 12cR2, Application Continuity – the option that makes the application highly available – will be supported with all drivers and therefore all types of clients.

JSON is a hot topic. Support storing, validating, parsing and querying JSON documents was added to the Oracle Database in 12c R1. More support for schema-less documents such as JSON will be available in R2. I am hoping for capabilities to construct JSON payloads – similar to the XML construction we can do with abstract data type to XMLType conversion or even SQL/XML. Additionally, richer JSON filtering capabilities have been suggested, supported perhaps by a search index – similar to the binary index for XMLType.

Already, SODA (Simple Oracle Document Access) implementations are available through a Java API and as REST Services (with ORDS – Oracle Rest Data Services – or with Servlet under embedded XMLDB Http Listener).

image

SODA can be positioned as a set of micro services for interacting with JSON documents stored in an Oracle Database 12c.

ORDS also allows relational database objects to be exposed through REST/JSON services. imageNote that these REST services – on top of relational tables or through SODA against JSON documents – can be consumed directly from JavaScript in web pages running in a browser. ORDS itself is a middle tier component that runs outside the database. It leverages a Java EE container – either Tomcat, WebLogic or an embedded [Glassfish] container. ORDS 3.0 supports OAuth 2.0 Security for authentication and authorization.

image

 

SQL Developer is ORDS aware and can be used to easily configure ORDS for specific database objects.

The SQL*Plus command line has been with us for a long time. Even though many new user interfaces have been provided for interacting through SQL and PL/SQL – including browser based tooling (iSQLPlus) and of course SQL Developer – many database professionals refuse to let go of their command line tool. If the DBA does not come to the mountain, the SQL Developer team thought, we have to move the mountain. The result is a new tool, SQLcl, that offers the command like quality of SQL*Plus, with a number of enhancements taken from IDEs, in particular from SQL Developer.

SQLcl is a standalone command line, a small Java application. It can connect to a database through various connection types – EZConnect, TNS, LDAP, native SSH tunnels, Proxy, G/Login.SQL. SQLcl extends the SQL*Plus command set with improved formatting and some very useful commands, including ALIAS (shortcut for reusable complex set of commands), CTAS (create table as select..), DDL (generate DDL from an existing database object), History (recent command history), INFO (improved desc), code completion (like Linux command line completion of directory and file names). See this SlideShare presentation on SQLcl.

Package managers are an important component for example in Linux (RPM) and with Node.js (NPM). Package managers read meta-data that describe applications – their composition and dependencies, version details and installation & configuration aspects – and can install, upgrade and remove applications through simple command line operations. At the OOW conference the notion of a database package manager (dbpm) was discussed. Through this dbpm components consisting of database objects can be managed (note: the term package in this sense does not correspond one-to-one with a PL/SQL Stored Package). With a single install command, multiple database objects could be installed or upgraded and [smoke] tested. The dbpm could use a repository to keep track of the [versions of] packages that are installed or are available for installation. The database package manager could be integrated with SQL*Plus, SQL Developer and the SQLcl tool.

LiveSQL is another, entirely different, user interface for doing SQL and PL/SQL interaction. LiveSQL is a browser interface on top of a live Oracle Dataabse. Through this interface, users can easily try out SQL and PL/SQL statements. They can create their own database objects, import data and try out their own code. Additionally, many annotated and ready to run samples are available, of interesting SQL statements and PL/SQL snippets. Blog-authors or anyone who feels like sharing code can publish on LiveSQL. The longer term plan is to have (all) code samples in the Oracle Database documentation available on LiveSQL and to refer from the documentation to these live samples.

image

The MATCH_RECOGNIZE operator introduced in Oracle Database 12cR1 for pattern matching – data analytics beyond what analytical functions can do – was probably the SQL capability receiving most attention. It proves a powerful tool for advanced and quite fast data inspection. Another good to know about operator is APPROX_COUNT_DISTINCT, to be used for ‘how many’ questions, where the answers does not need to be exactly right. When a deviation of a few percent is allowed, use this operator with a 5 to 50 times better performance over count(distinct).

image