Machine Learning was definitely the talk of the town during Oracle OpenWorld 2016. It seemed to be mentioned in every other session: sometimes fully justified, sometimes quite far-fetched. Our working definition of Machine Learning for the purpose of this article: software that uses historic data to make predictions and recommendations, and even to take actions, in new situations. These slides from Larry Ellison’s keynote visualize that line of thinking:
Data can be analyzed in many ways in order to discover patterns and simple or complex correlations, and to find formulas and rules that describe with a certain degree of certainty how the value of an attribute we care about can be influenced by or even derived from a set of input values. If we know which characteristics of our customers predict their product interests, which physical attributes of a patient are indicators of specific medical conditions, or which IT infrastructure data or heavy equipment metrics can be indicative of imminent breakdowns, then we can act much more efficiently and in a more timely manner. Machine learning, or predictive analytics as Oracle tends to translate it, is quickly becoming a reality, largely because of the availability of a lot of data and the ability to perform data analysis in fairly short time periods using commodity hardware, without necessarily bringing a PhD in data science to the table.
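To make "deriving the value of an attribute from input values, with a certain degree of certainty" a little more tangible, here is a minimal sketch in plain Python. It is not taken from any Oracle product: the data set (operating hours versus vibration level of a machine) is invented for illustration, and an ordinary least-squares line stands in for whatever algorithm a real service would use. The R² value plays the role of the "degree of certainty".

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys that the fitted line explains."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical historic data: operating hours vs. measured vibration level.
hours = [100, 200, 300, 400, 500]
vibration = [1.1, 1.9, 3.2, 3.9, 5.1]

slope, intercept = fit_line(hours, vibration)
fit_quality = r_squared(hours, vibration, slope, intercept)

# Prediction for a new situation the model has not seen yet.
predicted = slope * 600 + intercept
print(f"predicted vibration at 600 hours: {predicted:.2f} (R^2 = {fit_quality:.3f})")
```

The pattern is the same regardless of the algorithm: learn a rule from historic data, check how well it explains that data, then apply it to new input.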
Oracle offers machine learning inside a number of SaaS and PaaS products – resulting in recommendations, predictions and alerts. Examples are
· HCM Predictive Workforce that predicts performance of employees as well as their risk of leaving the company.
· Sales Predictor that recommends which products should be offered to which customer
· Communications Industry Data Model that does automatic customer segmentation, churn prediction and sentiment analysis.
· Management Cloud – metrics around Application Performance, Log files and IT infrastructure performance are collected, analyzed, reported in dashboards with visualizations and converted through machine learning into predictions and recommendations
Dataflow Machine Learning Cloud Service
Streaming data processing [aka real-time analytics] is a term used for finding meaning in real-time data streams by aggregating events in time windows, matching patterns in successively arriving events and detecting events with values above threshold levels. Apache Storm and Apache Spark Streaming are open source frameworks for this kind of streaming data processing. Oracle has announced Data Flow Machine Learning (DFML) Cloud Service for the same purpose, analyzing real-time streams of data:
The role of machine learning in this service is mainly to help with preparing the data flows: it recommends specific correlations, joins and aggregations between the various streams and non-streaming data sources.
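The two basic streaming operations mentioned above, aggregating in time windows and detecting values above a threshold, can be sketched in a few lines of Python. This is only an illustration of the concept and has nothing to do with how DFML CS, Storm or Spark Streaming are actually implemented: the function names and sensor readings are made up, and for simplicity the events are processed from a list rather than from a live stream.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows
    and return {window_start: average value} ordered by window start."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Every event belongs to exactly one window of window_seconds length.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def windows_above_threshold(averages, threshold):
    """Return the start times of all windows whose average exceeds threshold."""
    return [start for start, avg in averages.items() if avg > threshold]


# Hypothetical sensor readings: (seconds since start, temperature).
events = [(0, 20.0), (4, 22.0), (9, 21.0), (11, 30.0), (15, 34.0), (21, 19.0)]

averages = tumbling_window_averages(events, window_seconds=10)
alerts = windows_above_threshold(averages, threshold=25.0)

print(averages)  # one average per 10-second window
print(alerts)    # windows that should raise an alert
```

A real streaming engine does the same thing incrementally and continuously, emitting each window result as soon as the window closes instead of waiting for the complete list of events.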
These screenshots show how the DFML CS is embedded in the Big Data Tools:
first an overview of (big) data sources and data pipelines:
create a new data pipeline:
since at least one of the data sources for this pipeline is streaming, the pipeline as a whole is streaming and therefore a data flow – streaming ETL – is designed:
Do It Yourself Machine Learning
Customers can implement their own machine learning capabilities for leveraging their own data using the Oracle Database Advanced Analytics option, which includes in-database machine learning algorithms for data mining, text analysis, graph and spatial analysis and a range of statistical operations available through Enterprise R.
The ability to perform predictive analytics inside the Oracle Database platform is presented as a major USP compared to other vendors:
Oracle Advanced Analytics for Hadoop offers similar machine learning algorithms for data on Hadoop, in conjunction with Apache Spark. Inside the database, many R algorithms are accessible through SQL and, vice versa, there are R functions executed by the SQL engine, all transparently to developers and data analysts. A substantial number of R operations are even pushed down to the Exadata Storage Cells, just like some of the SQL operations are. Oracle Advanced Analytics is also available as part of the Big Data Cloud Service. One of many new features in OAA in Oracle Database 12c Release 2 is support for the ESA (Explicit Semantic Analysis) algorithm, which is good at extracting meaning from documents and finding mutually similar documents.
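To give an idea of what Explicit Semantic Analysis does, here is a heavily simplified sketch in plain Python. It is not Oracle's implementation and does not use any OAA API. The general idea of ESA is to describe a text not by its words but by how strongly it relates to a set of known concepts (in the original research, encyclopedia articles), and then to compare texts by comparing those concept vectors. The three-concept knowledge base and the sample documents below are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical miniature knowledge base: concept name -> descriptive text.
# A real ESA model would use a large encyclopedic corpus instead.
CONCEPTS = {
    "Databases": "database sql table query index storage",
    "Machine learning": "model training prediction algorithm data learning",
    "Cooking": "recipe oven bake flour sugar kitchen",
}


def build_word_concept_weights(concepts):
    """Map each word to {concept: tf-idf weight} based on the concept texts."""
    n = len(concepts)
    term_counts = {name: Counter(text.split()) for name, text in concepts.items()}
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    weights = {}
    for name, counts in term_counts.items():
        for word, tf in counts.items():
            idf = math.log(n / doc_freq[word]) + 1.0  # +1 keeps weights positive
            weights.setdefault(word, {})[name] = tf * idf
    return weights


def concept_vector(text, weights):
    """Represent a text as a weighted vector over the known concepts."""
    vector = Counter()
    for word in text.lower().split():
        for concept, weight in weights.get(word, {}).items():
            vector[concept] += weight
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


weights = build_word_concept_weights(CONCEPTS)
doc_a = concept_vector("tune the sql query with an index", weights)
doc_b = concept_vector("slow database table storage", weights)
doc_c = concept_vector("bake sugar cookies in the oven", weights)

sim_ab = cosine(doc_a, doc_b)  # no words in common, but the same concept
sim_ac = cosine(doc_a, doc_c)  # unrelated concepts
print(f"a vs b: {sim_ab:.2f}, a vs c: {sim_ac:.2f}")
```

The first two documents do not share a single word, yet they come out as similar because both map to the same concept. That is the kind of "meaning" a purely word-based comparison would miss.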
And of course, advanced analytics are the same in the cloud as on premises – although then of course the data has to move:
Machine Learning is frequently complemented by human effort.
Human insights can help to steer the machine activity in the right direction. Big Data Discovery Cloud Service provides functionality to users to explore raw data – usually big data on Hadoop – wrangling, correlating, enriching and analyzing data from various sources that can be quite unstructured.
An important facility is visualization: Big Data Discovery CS includes dozens of data visualizations and it knows how to select potentially meaningful visualizations for combinations of attributes that seem to have a correlation of sorts.
Using the initial results from machine learning and the patterns, rules and models that seem to exist in the data, the service offers candidate visualizations that human users can quickly discard, explore further or bring together into dashboards. Users can export transformed, enriched and blended data back to Hadoop, making it available for onward processing.
A lightweight subset of the functionality in Big Data Discovery CS is offered in the Data Visualization Cloud Service, which is available as a desktop tool as well as a browser-based cloud facility.
This service allows users to import data sets – up to 50 GB – and ‘slice & dice’ them in any which way. Users can create and share reports, dashboards and narrations: visual data stories of an analysis, step by step explaining findings, conclusions and subsequent filter/aggregation/drill down or visualization actions. Simple to use and powerful to share insights.
Visualization packs are predefined for data in SaaS applications and on premises Applications Unlimited:
The Oracle Analytics Cloud – in Personal, Workgroup and Enterprise Editions (!) – provides visualization for many different data sources. I am not sure how this suite and the Data Visualization CS are related.
Oracle Machine Learning CS
Newly announced in a somewhat tentative way is Oracle Machine Learning CS, “the face of machine learning at Oracle”. My apologies for the poor visuals: these slides have not been shared after the conference so I had to include the picture I took during the session.
It is planned to provide a collaborative notebook-style user interface based on Apache Zeppelin, in the style of many data science notebook platforms such as IPython. In these notebooks, (citizen) data scientists can collaborate, combining notes and documentation with data explorations, manipulations and their results. Together, users create a narration from the data sets investigated and the insights that have been gained. Machine Learning CS (MLCS) will integrate all of Oracle’s and all open source machine learning algorithms and data sources together in a single user interface.
This means for example that in a notebook an analysis can be performed on data sets from Hadoop, NoSQL databases, DBaaS and Oracle BigData CS using analytical means in Oracle Database (SQL, PL/SQL, Data Mining, Graph, Text, Spatial), Enterprise R, Big Data Discovery CS as well as perhaps Stream[ing] Analytics and Dataflow ML, IoT CS and Flink and Tungsten.
Download the AMIS OOW16 Highlights for an overview of announcements at OOW16.