Reflections after Oracle OpenWorld 2015 – Business Analytics (Big Data, GoldenGate, OBI (EE), ODI, NoSQL)

Note: I would like to thank Mark Rittman of RittmanMead for sharing many of his findings from Oracle OpenWorld 2015 as well as a comprehensive slide deck.

Business Analytics covers the areas of Business Intelligence, Data Discovery and Big Data as well as some of the data gathering and preparation that is required to get to the point where analysis can be performed. It is an area where several cloud offerings have been introduced and more announced – all of them with quite a bit of fanfare. The Data Visualization Cloud Service for example got plenty of airtime during Larry Ellison’s Sunday night keynote session. Underlying this visualizer is the Visual Analyzer that is also available in BI Foundation and BI Cloud Service (BICS), and will ultimately be part of the Oracle BI Apps offering too.

clip_image002

The product portfolio in the area of business analytics is quite broad – and a lot of evolution is taking place around it. The most prominent products in the area of Business Analytics include Oracle BI [EE] and BI Applications, the data integration offerings ODI and GoldenGate and the various Big Data products, and of course their cloud-based counterparts. Some of the highlights are discussed in this section.

Data Visualization Cloud Service

Announced as a cloud service targeted at business users for no more than $150 per month (for a minimum of 5 users), the Data Visualization Cloud Service provides a mechanism for rapid creation of highly visual analyses of data. Business Users can upload data from Excel, use a simple drag and drop mechanism to highlight the data elements that they wish to include in the analysis, and the Visual Analyzer will automatically generate an analysis (chart, graph etc.) using whichever type of visualization it thinks best suits the data selected.

The user can change the visualization to something they feel would work better – it takes just a couple of clicks to instantly see the changes. Adding further elements is simply a case of dragging and dropping those elements onto the visualization that was just created. Visualizations can then be saved, annotated and shared – and those you’ve shared the visualizations with can play around with them and enhance them, rather than only seeing a static image that cannot be challenged. This service is positioned to compete with Tableau. The underlying Visual Analyzer component can also be used as a stand-alone desktop application and is included in OBI 12c.

Oracle BI 12c

Oracle BI 12c is the strategic foundation of Oracle’s analytics platform. The 12.2.1 release of Oracle BI was made available for download in the second half of October, just before the conference. As with most FMW 12.2.1 products, a cleaner look & feel is one of the most striking new features. The option to just upload data from Excel, other spreadsheet tools or other data sources (“data mashup”) is also included in BI 12c. This last feature is part of one of two main trends in this release as identified by Mark Rittman: self-service and agility. As Mark puts it (in this article http://www.rittmanmead.com/2015/10/oracle-business-intelligence-12c-now-available-improving-agility-and-enabling-self-service-for-bi-users/ ): “[.. ]new features such as data mashups make it easier for end-users to complete the last mile in reporting by adding particular measures and attribute values to the reports and subject areas provided by IT, avoiding the situation where they instead export all data to Excel or wait for IT to add the data they need into the curated dataset managed centrally.” Business users can point and click to upload personal data and blend it with IT-managed data in BI 12c, which automatically infers connections between data sets.

Support for Mobile has been extended in 12c: keyword search (“BI Ask”) empowers users to literally talk to their data, asking questions and having visualizations automatically created as responses, opening up an easy entry point for authoring. Additionally, the interface for iOS has been completely redesigned, and Mobile BI for Android offers sharing and following for nearby devices, as well as the ability to project any dashboard or story to Google Chromecast-enabled devices.

Predictive analysis is more tightly integrated, enabling customers to more easily forecast future conditions based on existing data points, group elements that are statistically similar (binning), and expose outliers. BI 12c includes the ability to run the free Oracle R distribution on BI Server, and extend existing analytics with custom R scripts, which can point to any engine (R, Oracle Database, Spark, etc.) without needing to change the BI RPD to deliver results.
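To make the terms binning and outlier detection concrete, here is a minimal Python sketch. It is purely illustrative and does not use any Oracle BI or Oracle R API: the function names `equal_width_bins` and `zscore_outliers`, the sample data and the threshold of 2 standard deviations are all assumptions made for this example.

```python
from statistics import mean, pstdev


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equally wide intervals (0-based index)."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales figures with one suspicious spike.
sales = [10, 12, 11, 13, 12, 11, 95]
bins = equal_width_bins(sales, 3)
outliers = zscore_outliers(sales)
print(bins)
print(outliers)
```

In this sample the six ordinary values fall into the first bin and the spike falls into the last one, and the spike is also the only value reported as an outlier.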

Optimized in-memory processing (Essbase) is included for best use of system resources, enabling faster analysis on more data. Streamlined administration and life cycle management reduce the time and resources required to manage BI 12c. It is quicker and easier to provision new 12c environments and move workloads between on-premises and the cloud.

Oracle BI Cloud Service (BICS)

In Oracle BI Cloud Service ($250 per user per month), you can create detailed analyses and carefully curated dashboards. With Visual Analyzer, end-users have faster, simpler assembly of detailed reports. Oracle Business Intelligence Cloud Service (BICS) is the cloud based version of Oracle Business Intelligence Enterprise Edition, with an enhanced web interface and the standard Analysis Editor, Dashboard Editor and repository (RPD) under the covers. BI Cloud Service is designed to be used by every employee. Oracle Database Schema Service is integrated with BICS. Oracle refers to BICS as “agile” business intelligence due to the fact that there is no on-premises infrastructure for either the BI tools or the database, which speeds up the whole implementation process. In addition, with BICS you don’t need to buy perpetual or one/two year term licenses for the BI tool; instead, the BICS price model is based on named users (monthly) and storage sizing.

Essbase Cloud Service

Essbase will go into the cloud as the upcoming Essbase Cloud Service. It will provide a platform component for other cloud services from Oracle, as well as for custom analytic applications.

clip_image004

An early impression of what the EssCS could look like – using the simplified user experience originally created for the Oracle Cloud Applications such as HCM and Sales Cloud – is shown in this screenshot:

clip_image005

Note that Oracle indicated for this cloud service: “Essbase is Essbase, but we do not plan for 100% feature parity in v1”.

Oracle BI Applications

Oracle BI Applications is an on premises product. It is described as a set of prebuilt BI solutions that deliver intuitive, role-based intelligence for everyone in an organization from front line employees to senior management that enable better decisions, actions, and business processes. Designed for heterogeneous environments, these solutions enable organizations to gain insight from a range of data sources and applications including Siebel, Oracle E-Business Suite, PeopleSoft, and third party systems such as SAP. OBIA 11.1.1.10 supports the following Oracle Public Cloud SaaS offerings: Oracle Fusion Cloud Service (Sales Cloud, HCM Cloud, Financial Cloud, Procurement Cloud, Project Cloud), Oracle Talent Cloud Service (Taleo Enterprise Cloud), Oracle Service Cloud Service (RightNow Cloud). This release also has the Diagnostic Health Check – a preliminary ETL phase in which a diagnostic report is generated to identify problematic source data that may cause ETL failure or data loss or corruption in the data warehouse. In the event of a failure of an ETL task, diagnostics are run and error handling and automatic correction are performed to enable the load plan to restart and continue. Soon Oracle will release Visual Analyzer for Oracle BI Applications.

The cloud counterpart of BI Applications is called Cloud Business Analytics (CBA). There will be multiple releases of this service each year, with one major on premises OBIA release per year. Part of CBA is Oracle Transactional Business Intelligence Enterprise, designed for executives and analysts to drive strategic growth of business. Initially this includes strategic analytics for Human Resources and Enterprise Resource Planning. CBA also provides cross-source analytics for cloud, on-premises, and 3rd party sources, with E-Business Suite & PeopleSoft on-premises compatibility.

Somewhat related are cloud services around Enterprise Performance Management (based on Hyperion). Already available is Oracle EPM Cloud with the full power of Hyperion Planning, featuring Oracle Planning and Budgeting Cloud and Oracle Enterprise Performance Reporting Cloud. Financial Consolidation & Close Cloud Service and Account Reconciliation Cloud Service were in preview at OOW 2015, to be fully integrated with the EPM Cloud shortly. The longer term vision includes: Profitability and Cost Management Cloud Service, Tax Provision and Reporting Cloud Service, Dimension Management Cloud Service.

Data Integration

The whole Data Integration Platform consists of a considerable series of products, as shown in this next picture. Oracle Data Integrator (ODI) and GoldenGate are the most prominent among those, with ODI focusing on bulk ELT processing and GoldenGate taking on streaming, real time data delivery.

clip_image007

GoldenGate

In 2009, Oracle acquired GoldenGate, a leading provider of real-time data integration solutions. The short summary of what GoldenGate can do: low overhead capture of database events, distribution of these events in near real time, and application of the changes derived from these events. Few acquisitions have had such a tremendous impact within Oracle. GoldenGate is used in many critical use cases: real time data synchronization between heterogeneous (or even homogeneous) systems, migrations and upgrades, and now cloud to cloud and cloud to on-premises data exchange.
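The capture, distribute and apply cycle can be illustrated with a small Python sketch. This is a conceptual model only, not GoldenGate code or configuration: the `ChangeEvent` record, the `apply_changes` function and the in-memory list and dictionary standing in for the trail and the target table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured database change (hypothetical record layout)."""
    op: str                     # "insert", "update" or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # changed column values; None for deletes


def apply_changes(target: dict, events: list) -> dict:
    """Apply captured change events, in order, to a target table keyed by primary key."""
    for ev in events:
        if ev.op == "insert":
            target[ev.key] = dict(ev.row)
        elif ev.op == "update":
            # Merge the changed columns into the existing row.
            target[ev.key] = {**target.get(ev.key, {}), **ev.row}
        elif ev.op == "delete":
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return target


# A tiny "trail" of events captured at the source, in commit order.
trail = [
    ChangeEvent("insert", 1, {"name": "Ann", "city": "Utrecht"}),
    ChangeEvent("insert", 2, {"name": "Bob", "city": "Delft"}),
    ChangeEvent("update", 1, {"city": "Leiden"}),
    ChangeEvent("delete", 2),
]
replica = apply_changes({}, trail)
print(replica)
```

The point of the sketch is that only the changes travel, not the full tables, and that applying them in the captured order keeps the target consistent with the source.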

GoldenGate 12.2.1 was released in October of 2015. It offers improvements in ease of use, support for Big Data (targets such as Apache HBase, Flume, HDFS, Hive, Kafka, NoSQL) and MySQL 5.7, support for standard (AVRO, JSON, XML) and customizable data formats, automatic heartbeat, metadata in Trail File and more Oracle Database support (invisible columns, Data Pump). At Oracle OpenWorld 2015, Oracle announced Oracle GoldenGate Studio Release 1 (12.2.1). This is Oracle’s first step in bringing Oracle GoldenGate into a GUI environment. Studio is a design tool that complements the monitoring tools that Oracle provides for GoldenGate.

The GoldenGate Cloud Service (GGCS) was announced as well.

This service will allow organizations to run GoldenGate with a subscription on an hourly basis, and use it for upgrade scenarios or for on premises to cloud or cloud to cloud synchronization use cases. GGCS can have Oracle Database Cloud Service, Exadata CS and Big Data Cloud Service as a target (delivery to NoSQL or Hadoop), as well as third party clouds (Azure, AWS, etc.).

clip_image009

Oracle Data Integrator (ODI)

Proprietary ETL is dead – proclaimed one session at OOW 2015. Apache-based ETL is what is next. We have entered the era of Big Data ETL – for now in batch and soon after that as streaming ETL.

The somewhat provocative graphic that accompanied this statement is shown here:

clip_image011

As part of this (r)evolution, Oracle Data Integrator 12.2.1 came out in October 2015. Big Data support is an important area in this new release, following up on the ODI for Big Data release of April 2015. Data access is supported for many sources, using an array of native data access languages and protocols (including HCat, Sqoop, Kafka, Big Data SQL). Transformation engines are available in various Big Data technologies such as Hive, Apache Spark, Apache Pig. Oozie can be used for orchestration of the process. In line with NoSQL, Oracle has coined the term NoETL engine around this ODI for Big Data. The beta program has started for ODI Streaming ETL (on Spark Streaming).

Also in release 12.2.1 is better support for lifecycle management: release management capabilities are introduced to provide a distinction between development and deployment environments and artifacts. The ODI Exchange is launched, which allows browsing, download, and install of global ODI objects made available by Oracle or by other ODI users through Official or Third-Party Update Centers.

New Knowledge Modules keep being added; in 12.2.1 there is a module for Oracle Partition Exchange, to allow users to swap partitions as needed.

This release adds a new Oracle Enterprise Data Quality (EDQ) technology – meaning that it is directly available in Topology and allows the creation of data servers, physical schemas, and logical schemas for EDQ.

With regard to the cloud: ODI will be certified to run on the Java Cloud Service. Additionally, capabilities from ODI will be included in some of the other proposed cloud services, including Big Data Preparation CS. The next figure shows the proposed situation for ODI on the Oracle Public Cloud.

clip_image013

Enterprise Data Quality and Enterprise Metadata Management

A not so well known product that has only been part of Oracle’s portfolio for a relatively short period is Oracle Metadata Management. It is a product to manage and govern all metadata – such as definitions of business objects and technical data structures, their dependencies and the evolution of all of these. The product provides data lineage and impact analysis reports across technologies, and it can harvest metadata from Oracle and third-party data integration, business intelligence, ETL, big data, database, and data warehousing technologies. This product is targeted at Data Stewards, Architects and business users. It seems related to the API Platform that is to come out of the Integration team – in terms of governance of shared, canonical definitions.

Release 12.2.1 came out in October 2015. The product includes harvesters for a wide range of technologies, a business glossary (to catalog, link and collaborate on business terms), social and collaboration features (including external URLs, annotations and tagging), automatic stitching, Metadata Explorer, Data Flow Lineage & Impact Analysis.

A related offering around governing and managing data at enterprise level is Oracle Enterprise Data Quality: Oracle’s strategic data quality management platform, used to understand, improve, protect and govern data quality throughout the enterprise. For this product too, release 12.2.1 was published around OOW 2015. Part of this release is deeper integration into the Oracle stack – for example WebLogic and Coherence Clustering and Oracle Real Application Clusters (RAC), allowing a cluster of any number of servers to act as a single highly available and highly scalable EDQ system.

An additional Data Store type has been added to allow EDQ to read data from Hadoop using the Hive specification. EDQ’s integration with the Fusion Middleware Audit Framework has been extended in this release to provide auditing of a much wider range of events in EDQ. All Web Services that have been created in EDQ will now have a REST API as well as the existing SOAP API. The REST based API allows JSON objects to be passed over HTTP to a server and JSON to be returned to the caller. Improved Matching Flexibility allows designers of match processes to configure and maintain match rules more easily, especially where matching works on a large number of identifiers.
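The JSON-over-HTTP exchange described for the REST API can be sketched in Python as follows. This is a sketch under stated assumptions, not taken from the EDQ documentation: the URL, the web service name `CleanAddress`, the field names and the use of POST are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_json_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",  # assumption: the service accepts POST
    )


def parse_json_response(raw: bytes) -> dict:
    """Decode a JSON response body returned by the server."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical endpoint and attributes, for illustration only.
request = build_json_request(
    "http://edq.example.com/hypothetical/webservices/CleanAddress",
    {"street": "1 Main St", "city": "utrecht"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a real server the call would be completed with `urllib.request.urlopen(request)` and the response body passed to `parse_json_response`; the actual URL layout and attribute names have to come from the web service definition in EDQ.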

The longer term strategy focuses on Data Governance – to manage (the process for managing) the quality of the enterprise data – and on integrating even deeper into the Oracle stack, especially with some of the DaaS and SaaS services. EDQ – like ODI – will be supported to run on the Java Cloud Service and will be integrated into SaaS as Fusion DQ and into DaaS as DaaS DQ.

The next figure shows all planned public cloud services around data integration and management of metadata and data.

clip_image015

One special service – very much in the design stage it seems – is Oracle DataflowML Cloud Service (a data integration fabric with smarts, as one OOW presentation put it). This service targets data integration for Cloud and Big Data, differentiated by machine learning. This service is positioned against AWS Data Pipeline, Birst User Data Tier, BIReady, Google Cloud Dataflow and the like. It uses a Lambda/Kappa platform, powered by Apache Spark. It connects through REST APIs and with connectors for Oracle SaaS and PaaS and 3rd Party Cloud as well as on premises batch, GoldenGate, MFT and SOA Suite. It seems related to the IoT service – but probably less real time oriented and aimed at Big Data instead of small events.

Cloud Services for Big Data

Under the umbrella of business analytics are a number of cloud services for Big Data:

· Big Data Preparation

· Big Data Cloud Service

· Big Data Discovery – visual face of Hadoop

These fit into this larger Big Data Cloud Platform overview:

clip_image019

Big Data Preparation CS

The Big Data Preparation CS is defined as a unified system to prepare, repair, enrich, govern and publish any type of data (log, social, machine-generated) [for loading into Hadoop, a NoSQL database, the Oracle Database and other places where processing and analysis can take place]. The service itself is Oracle Public Cloud based – but sources and destinations can be in any cloud or on premises. Soon after its initial launch, the service will be able to publish to BI CS.

The service is massively scalable, built on Hadoop/Spark, and enhanced with Natural Language processing and Reference Dataset Knowledge. It is targeted at data domain experts, not at programmers.

clip_image021

It is the familiar story of decoupling components, specializing components on specific tasks and freeing up human and machine resources to focus on their specific job. For example: liberate Hadoop, ETL or DQ developers from writing custom scripts for data preparation. By preparing the data, the downstream processes receive consistent, complete and trustworthy data.

The Big Data Preparation CS performs continuous execution of recognized files; once configured for a specific type of input, no human intervention is required to do processing of that type of file. Governance dashboards are provided for insight into runtime metrics, health reports, alerts & error prone manual data curation efforts.

Here is a screenshot for the BDP CS with live analysis of preparation of a data set and suggestions – based on machine learning with semantic technologies:

clip_image023

Big Data Cloud Service

The Big Data Cloud Service is about storing and processing big data in the public cloud instead of on self-acquired and self-managed infrastructure on premises. It provides the power of Hadoop delivered as a secure, automated, elastic service, which can also be fully integrated with existing enterprise data in Oracle Database. Simply put: this service gives a company the use of an Oracle Big Data Appliance and its components, without purchasing and maintaining hardware. Oracle hosts the big data machine environment and takes care of management. Data can be stored in the Oracle Storage Cloud and processed with multiple Hadoop clusters.

Oracle Big Data Cloud Service provides a massively-scalable big data environment featuring:

  • Cloudera’s comprehensive software suite, including Cloudera Distribution with Apache Hadoop and Apache Spark.
  • Big Data Connectors that deliver load rates of up to 15 TB per hour between Oracle Big Data Cloud Service and Oracle Exadata Cloud Service.
  • Pre-configured tools to help accelerate analytical efforts on geospatial, graph, or regression problems, such as:
    • Oracle Big Data Spatial for scalable geospatial analysis and map-building.
    • Oracle Big Data Graph for storage and analysis of social and Internet of Things (IoT) networks.
    • Oracle R Advanced Analytics for Hadoop for scalable R processing on Apache Hadoop and Spark.

Oracle Big Data Cloud Service can be extended with Oracle Big Data SQL Cloud Service. This service allows a single query to span data warehouses running in Oracle Database Cloud Service—Exadata Edition and Oracle Big Data Cloud Service.

Oracle Big Data Cloud Service provides strong authentication, authorization and auditing of data in Hadoop with just a single click. Strong authentication is provided using Kerberos. Oracle Big Data Cloud Service leverages Apache Sentry to authorize SQL access via tools like Hive and Impala. Both encryption of data-at-rest and network encryption are included. Hardware-based VPN provides additional security for all Hadoop Services.

clip_image024

NoSQL Database Cloud Service

A brand new cloud service, announced but not yet available and with very few resources published on cloud.oracle.com, is the NoSQL Database Cloud Service. This service makes a dynamic scale-out, low latency key-value database available from the Oracle Public Cloud, with support for JSON and Table data types. Built-in are high availability, transactions, parallel query and clustering. Automated load balancing is provided for fast, efficient data access across the cluster. Simple APIs are offered for Java, Node and Python developers. Longer term features include time series, elastic search and mobile data sync.

The service will be integrated with Big Data Cloud Services & Database Cloud Services and allow associations from other cloud services such as Node.js (in the Application Container Cloud Service) and Java Cloud Service. This service is based on Oracle NoSQL Database, built on top of Berkeley DB.

Big Data Discovery – The Visual Face of Hadoop

One of the things slowing down the adoption of Big Data is the fact that even when you have your Big Data collected in the Data Lake and you are convinced that relevant information can be extracted, you may not know how to get to that information. Which patterns are hidden beneath the surface of the data lake? How do you filter, join and aggregate data to arrive at useful conclusions? Once you know all that, you define reports that can be run every day of the week on freshly collected [big] data as well as dashboards that provide a near real time insight based on the latest data and based on the insight on how to use that data.

The process of finding out how your collection of big data can be exploited – of unearthing the structures and correlations and joins in the data set – is called data discovery (or data exploration). Oracle acquired a company called Endeca a few years back (October 2011) that was very active in this area.

Based on technology such as Endeca, Big Data Discovery offers intuitive visual interfaces for business analysts and data scientists to find, explore, transform and analyze Big Data. The service allows for comprehensive data wrangling for fast, scalable data manipulation using the full power of the Hadoop cluster and has support for unstructured data, including text enrichment, sentiment analytics, keyword search, location based correlation, language detection, and classification.

clip_image026