The "X" Product – An X(ML) Database Opportunity?

“Exadata – Extreme Performance Warehousing”: that’s how the presentation on the last Friday morning of Oracle Open World started (the original presentation title was “Oracle’s New Database Accelerator: Query Processing Revolutionized”), speaking about the new database machine. And indeed it was a revolution, and it took the blogosphere by storm overnight. My notes from this session are already “for historic use”, because a lot of people have already blogged about it.

Another nice side effect is that Kevin Closson started blogging again. If you want to know all there is to know about the “Exadata” hardware, you can now read up on Kevin’s “Exadata Posts” page and FAQs. It must have been hard for Kevin not to blog about this cool piece of machinery and its smart database software.


So if you want, you can skip the following, because it is old news.

The presentation on Friday started out by talking about data shipment, which slows down as data warehouses become bigger and bigger. So what Exadata does is ship less data, use more pipes, and use bigger pipes to ship the needed data faster through the system. In short, it puts brawny hardware to use for Oracle’s brainy software.

It makes use of what has been defined as a storage server “Cell”, which is the building block of the Exadata storage grid. The Cell comes prepackaged with hardware and built-in software, and its configuration is optimized for fast data processing across the full stack, using the bandwidth of all disks. Per cell, the aggregated disk bandwidth is over 1000 MB/s, the disk controller bandwidth is over 1000 MB/s, and the CPU power comes from two quad-core 64-bit Intel processors. The interconnect is InfiniBand, using the ZDP protocol (Zero-loss Zero-copy Datagram Protocol, based on RDS V3). Some of its pros are very low CPU overhead and the fact that it is also available as Linux open source. Each Exadata cell has two InfiniBand links for network redundancy.

A configured rack

8 Oracle database servers:

  • 64 Intel Processor cores
  • Oracle Enterprise Linux
  • Oracle Real Application Clusters

14 Exadata Storage servers:

  • 50 to 168 TB raw storage

Infiniband switches:

  • Optimized, certified, and supported by Oracle

You can scale out by just adding extra racks. From six racks upwards you will run out of ports to scale further. The cool part is that performance scales linearly as you add boxes.
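To put a number on that linear scale-out claim, here is a small back-of-the-envelope sketch in Python. It is my own arithmetic, not an Oracle figure: it assumes the 14 storage servers per rack listed above, treats the quoted "over 1000 MB/s" per cell as a floor, and assumes perfectly linear scaling. The function names are made up for the illustration.

```python
CELLS_PER_RACK = 14          # storage servers per rack, from the rack list above
MB_PER_SEC_PER_CELL = 1000   # "over 1000 MB/s" per cell, treated as a floor (assumption)


def aggregate_scan_rate(racks, cells_per_rack=CELLS_PER_RACK,
                        per_cell=MB_PER_SEC_PER_CELL):
    """Lower-bound scan bandwidth in MB/s, assuming perfectly linear scale-out."""
    return racks * cells_per_rack * per_cell


def scan_seconds(terabytes, racks):
    """Seconds needed to scan the given volume once at the aggregate rate."""
    megabytes = terabytes * 1_000_000  # decimal TB to MB
    return megabytes / aggregate_scan_rate(racks)


if __name__ == "__main__":
    for racks in (1, 2, 4):
        print(racks, "rack(s):", aggregate_scan_rate(racks), "MB/s,",
              scan_seconds(14, racks), "s for 14 TB")
```

Under these assumptions, doubling the number of racks halves the time for a full scan, which is all that "linear" means here. Real numbers will depend on the workload.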

It makes use of a massively parallel scale-out architecture. Communication is based on the iDB data protocol (built by Oracle), which runs over ZDP and accesses the grid disks. Provisioning is simple. The architecture consists of a storage grid and a database grid, and these layers are unaware of each other: iDB makes use of ZDP but is not “hooked” into it.

Fault management and diagnostics are built in. Your data is guarded by Automatic Storage Management, which protects against cell and block brownouts, and Data Guard provides protection against disaster and corruption via an automatically maintained standby copy of the database.

Database intelligence is built in via Smart Scans. Exadata cells implement smart scans to greatly reduce the data that needs to be processed by the rest of the architecture, and this results in an enormous amount of data reduction.
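The idea behind "ship less data" can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not how the Exadata cells are implemented: the `Row` layout, the region filter, and the crude byte counting are all invented for the illustration. It compares a scan where storage ships every row and column with one where the filter and the column selection happen at the storage side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    order_id: int
    region: str
    amount: float
    notes: str  # wide column that the query does not need


def make_rows(n):
    """Build n hypothetical rows, spread evenly over four regions."""
    regions = ["EMEA", "APAC", "AMER", "LATAM"]
    return [Row(i, regions[i % 4], float(i), "x" * 200) for i in range(n)]


def size_of(values):
    """Crude size estimate: characters needed to write the values out."""
    return sum(len(str(v)) for v in values)


def plain_scan(rows):
    """Storage ships every row and column; the database filters afterwards."""
    shipped = sum(size_of((r.order_id, r.region, r.amount, r.notes)) for r in rows)
    result = [(r.order_id, r.amount) for r in rows if r.region == "EMEA"]
    return result, shipped


def smart_scan(rows):
    """Storage applies the filter and column selection, then ships the matches."""
    result = [(r.order_id, r.amount) for r in rows if r.region == "EMEA"]
    shipped = sum(size_of(t) for t in result)
    return result, shipped


if __name__ == "__main__":
    rows = make_rows(1000)
    _, plain_bytes = plain_scan(rows)
    _, smart_bytes = smart_scan(rows)
    print("plain scan ships about", plain_bytes, "characters")
    print("smart scan ships about", smart_bytes, "characters")
```

Both scans return the same answer; the only difference is how much has to travel over the pipes, which is the point the presentation was making.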


Does the X in Exadata not also stand for XML?

As far as I know, there is no universal XML solution yet. IMHO (see also my blog post on this issue: “Why XML Does and Doesn’t Fit the Real World”), what XML desperately needs is a “brainy” optimizer that understands XML access paths and, with it, a dynamic storage container that is flexible enough to handle all that free-format XML data out there. Most of the time XML is not structured, so the question is how to retrieve your XML data fast. Although some pretend that there is a universal container, I think it doesn’t exist yet. At least, I couldn’t find a solution for this problem on the internet. I hope that someone proves me wrong. That would give me more insight into how to deal with this conceptual problem. In the meantime, I have found a definition that describes the underlying problem (although it has its roots in the Java world): it is called “Impedance Mismatch”.


Another issue that still exists is that too many big parties have interests at stake, so instead of agreeing on one standard, most of the time there are at least two “standard” solutions for an “ongoing” XML problem. In the end XML, or the transport of data via XML, will lose… and that would be very unfortunate, because I thought the big parties had at least agreed on something (the format for data transport between their systems).

Anyway, in the next XMLDB presentation I attended that Friday, I listened in on an interesting discussion among some of the XMLDB development team about the Exadata machine and its possible use for these kinds of XML problems. This made me wonder about the “X” machine and its usage in the XMLDB data realm. If we can’t solve these kinds of problems for now, why not address them via “brute force”? The new “X” machine is apparently very good at pushing data at high bandwidths. Oracle has multiple storage options for its XMLType datatype that can be used to deal with this free-format nature of XML. Some of those storage models could probably be used very effectively in conjunction with the Exadata storage architecture.


The discussion focused on XMLType Binary XML storage. This storage model can be compressed and, hopefully in one of the next releases, its data retrieval can be parallelized. What is interesting is that it can be indexed by something called an XMLType Index (XMLIndex in Oracle’s documentation). An XMLType Index is a domain index that has been defined for the realm that is XMLDB. It consists of a PATH TABLE in which IDs, fragments of XML data and, among others, the tree structure of the XML instance are stored. This PATH TABLE is indexed by multiple indexes that will be used if a query meets certain conditions or if it can be rewritten. There is much more to it, but the point is that if you are not precise in your create XMLType Index statement, this PATH TABLE can become very big, up to several times the size of the original XML document data. This is because it will contain a lot of references to possible XML elements.
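To get a feel for why an unrestricted path table grows so quickly, here is a toy Python sketch. It is not Oracle’s PATH TABLE format, just an assumption-laden illustration: it stores one (path, value) row for every element and attribute of a small, made-up order document, and optionally restricts the rows to a set of included paths. The document, the `path_rows` function and the `include` parameter are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Small, made-up XML document for the illustration.
DOC = """<order id="1">
  <customer><name>Ann</name><city>Utrecht</city></customer>
  <lines>
    <line><sku>A1</sku><qty>2</qty></line>
    <line><sku>B7</sku><qty>1</qty></line>
  </lines>
</order>"""


def path_rows(xml_text, include=None):
    """Return one (path, value) row per element and attribute.

    include: optional set of paths to keep; None means "index everything".
    """
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        if include is None or path in include:
            rows.append((path, (elem.text or "").strip()))
        for name, value in elem.attrib.items():
            attr_path = f"{path}/@{name}"
            if include is None or attr_path in include:
                rows.append((attr_path, value))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return rows


if __name__ == "__main__":
    full = path_rows(DOC)
    narrow = path_rows(DOC, include={"/order/lines/line/sku"})
    print(len(full), "rows when every path is indexed")
    print(len(narrow), "rows when only the sku path is included")
```

Even in this tiny example the "index everything" variant stores a row for every node, including containers that nobody will ever query, while the restricted variant keeps only the rows for the paths that were asked for.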


In general, creating a full XMLType Index (i.e. without an “include” or “exclude” parameter defined) is seen as the wrong way to go, because it is something like indexing every column of every table in your schema or database. Still, it is a very fast index routine dedicated to XML, and wouldn’t it be nice to make full use of it (and of the compressed binary XML data features)? Although I am not a hardware specialist, nor do I have the in-depth knowledge of a Kevin Closson, I still wonder if the Exadata machine couldn’t be used for XML storage as well…

Why limit it to data warehousing? There are probably other use cases out there that could benefit from the Exadata machine. There are very big (terabytes big) XML systems, and an E“X”adata system could do wonders for the needs of those systems; in other words, provide speed and scalable performance…

I still suffer from some residue of what is called “jet lag”, but this is just my $0.02; hopefully I am adding to the discussion on “where and how to use” such a cool piece of machinery…

 


2 Comments

  1. Marco Gralike October 2, 2008
  2. Gerwin Hendriksen October 2, 2008