One of the relatively new features available with Oracle Autonomous Data Warehouse is Oracle Machine Learning Notebook. The description on Oracle’s tutorial site states: “An Oracle Machine Learning notebook is a web-based interface for data analysis, data discovery, and data visualization.” If you are familiar with Jupyter Notebooks (often Python based) then you may know and appreciate the Wiki like combination or markdown text and code snippets that are ideal for data lab ‘explorations’ of data sets and machine learning models. I am quite a fan myself. Especially wrangling data, juggling with Pandas Data Frames and visualizing data with Plotly is good fun and it is quite easy to accomplish meaningful and advanced results.
Oracle Database has quite a lot of machine learning inside. Ever since the Darwin Data Mining engine was first added to the Oracle Database – in release 9i R2 back in 2002 – machine learning algorithms have been available on top of data in relational tables from SQL and PL/SQL. The Advanced Analytics database option unleashes that power to organizations today. And in Autonomous Data Warehouse, this option is available, even in the free tier of Oracle Cloud.
Let’s take Oracle ML SQL Notebooks for a quick spin.
From my ADW Dashboard, I go to Service Console:
And on Service Console, I go to Oracle ML SQL Notebooks. Note: beforehand, in the Administration tab, I have already created a Oracle ML SQL Notebooks user. Creating that user in the this web based consoled resulted in the creation of a database user in my ADW instance.
Logging in with previously created user account:
And creating a new notebook:
The Notebook executes in the context of the database schema of the currently connected user. the dataman schema was created when the user account was created for Oracle ML SQL Notebooks. I now also create a table in that database schema, using SQL Developer:
Now I can work with the data in this table inside the notebook.
Here is our initial view at the empty notebook:
The notebook currently contains a single, empty cell. I can add either markdown text (%md), a SQL statement (%sql) or a PL/SQL script (%script) in the cell.
Here are two examples: a rich text cell and a SQL cell:
The SQL statement can be more interesting – any valid SQL statement can be used in a cell. Here is a more complex SQL statement, the result of which is visualized in a line chart:
The line chart allows me to zoom in on a specific area. It does not allow me to define the format used for the labels displayed on the horizontal axis. I am not sure how I managed to have the the y-axis not show the entire value range (0-481.1). In the next chart – bar chart – I am not able to only show the tip of the bars – to highlight there differences:
The meaning by the way of this chart is the difference in average number of deaths between week days. Sunday has far fewer deaths than Friday – but it is not as obvious from the chart as it would be if the y-axis would start at 350 or thereabouts.
By crudely subtracting 350 from each queried value, I get more or less the visual effect I wanted to show:
But it is not elegant.
I have tried next to show in a scatter plot (the line chart did not give me a meaningful result for this query) how the number of deaths per weekday evolves throughout the quarters. In the query, I calculate the index of a weekday’s ‘performance’ in a quarter against the average daily death count for that quarter for all weekdays. SQL is versatile – creating the query is fairly straightforward. The scatterplot does a nice job. Still it feels that line chart would have been better.
Note how deadly the Friday turns out to be, quite consistently. The Sunday is the least deadly day, consistently showing up for each quarter as the bottom scoring day of the week.
Here is the result I get for the line chart – not at all useful I am afraid:
In addition to SQL, we can also use PL/SQL in the cells of the notebook. A simple example:
Using PL/SQL, we can unleash the machine learning capabilities of the ADW – as well as perform many other data preparation and manipulation. Using SQL and the Notebooks visualization features, we can see the results of those operations.
The ML SQL Notebooks come with a number of samples. These show off what the notebooks can do – and even more what Machine Learning in the Oracle Database can do. The examples show Regression, Classification, Anomaly Detection and a number of other machine learning models in SQL and PL/SQL.
Here a screenshot from the Classification Prediction Model
The Oracle ML SQL Notebook is one way of leveraging not just SQL but also PL/SQL and the machine learning models that SQL and PL/SQL can leverage. In this notebook, we can use markdown styled text and SQL statements. The results from executing these statements can be visualized in different ways – bar chart, pie chart, line chart, table, area chart and scatter chart. It is a nice way to do some explorations of your data. However, coming from Jupyter Notebooks and all they have to offer, I can not be too excited about the SQL Notebooks. The features I find lacking most:
- ability to set (from query results) and use parameters (in queries); there is no ‘session state’ throughout the notebook
- ability to configure the visualization (for example the value range covered by an axis or the format used to display values)
I assume that the current state of the Oracle ML SQL Notebooks is one that will evolve quickly into a version that is richer in functionality.
Note: a few years back I saw a great demo of a similar mechanism from Oracle Labs; this Data Studio also involved a Notebook style tool, that supported SQL as well as other languages, intermingled in a single notebook. I hope to see some of the functionality shown in Data Studio included in Oracle ML SQL Notebooks:
Original announcement of the Oracle ML SQL Notebooks on Oracle Blogs: https://blogs.oracle.com/machinelearning/introducing-oracle-machine-learning-sql-notebooks-for-the-oracle-autonomous-data-warehouse-cloud
Demo of Oracle Labs Data Studio (JavaOne, 2017) – https://www.youtube.com/watch?v=mWDeEXs_CMQ