First steps with Oracle Analytics Cloud - Gather, Explore, Wrangle, Visualize image 45

First steps with Oracle Analytics Cloud – Gather, Explore, Wrangle, Visualize

Data analytics is what turns data into business value. Oracle has a long history in Data Analytics – from Oracle Discoverer and its predecessors such as Data Browser through OBI EE and Endeca to several cloud services. Oracle Analytics Cloud is the strategic offering to adopt going forward – not BI Cloud or ESSBase Cloud.

I have set up a trial account on Oracle Cloud and will now start exploring Oracle (Autonomous) Analytics Cloud. I will provision an instance, load a data set, do some data wrangling and enrichment, explore the data set and finally create some visualizations that will provide some business insights. In this article, I will share my experiences – to help anyone who is also interested get started. And to remember how I did it all, for a next time.

Provision Instance of OAC

Before starting out I had already set up an Oracle Public Cloud Trial Account.

From the dashboard in the trial identity domain – amis2019 – I navigate to Analytics:

image

In order to provision a fresh instance of Oracle Analytics Cloud (OAC), I click on the obvious button.

image

I provide a name (I had to take out the dash) and accept most default settings. Note that I do not need to also provision or provide a database or any other cloud service.image

After pressing Next, I have a chance to review. Of course I like what I see, and I move forward : by pressing Create I instruct the Oracle Public Cloud to provision my OAC instance.

 image

The request is received and will be processed:image

After about 10 minutes I receive an email that notifies me of the instance that has been prepared for me.image

I can now enter the OAC instance and start performing administration activities (such as granting access to other users) or – more interesting – start adding data sets. The URL to my OAC instance is found from the drop down menu in the OAC Service Console page.

image

Loading and Preparing Data

When I started to investigate how to load data into the OAC instance, I ran into a bit of a disappointment: OAC cannot handle JSON data (or XML for that matter).

image

My dataset was prepared in a Jupyter Notebook – a Pandas Dataframe as saved as JSON file. It was not too hard to revisit my Notebook and save the data frame as a CSV file – leaving out all attributes with complex values. However, isn’t OAC supposed to be the enterprise alternative to Jupyter Notebooks? I am unpleasantly surprised by this limitation.

The dataset that I will explore, wrangle and try to extract some business insights from is the session catalog from Oracle OpenWorld and CodeOne 2018. Details on some 2000 sessions – including title and abstract, duration, speakers and their companies, tracks, rooms, levels – offer a nice set of data to make some sense of. The Excel worksheet created from the CSV export from Pandas Data Frame in Jupyter Notebook can be downloaded here.

image

Login to Oracle Analytics Cloud. Click on Create and select Data Set:

image

Drag the Excel file to the icon and drop it

image

Uploading and processing will start automatically.

image

When the upload is complete and the data has been loaded, I get an overview of my fresh data set:

image

I can inspect columns – and if so desired perform refinement on them. For example: I can set the correct data type for column length – from text to number:

image

By pressing the Add button, I save the data set and ensure it will stay around for further explorations.

If I now click on the ‘hamburger menu’

image

I can go no Projects:

image

And create a new project to which I can then add my data set.

image


The first step in creating the new project is adding a data set to the project:image

I am taken to the central data exploration page – one of three tabs in a project: Prepare, Visualize, Narrate. Roughly translated as wrangle the data into a shape fit to visualize, explore the data through visualizations and prepare a (visual) report for your target audience.

image

Let’s switch (back) to the Preparation tab for one moment. I want to add a few additional columns to the data set.

Prepare Data: Enrich Data Set

The data preparation tab shows a list of recommendations on the far right. These recommendations are actions on the data set that OAC suggests based on profiling the data and smart algorithm baked into OAC. It offers to create new features (aka columns or attribute) from existing columns – in order to make visualization easier later on. For those of you who are trained to normalize data: this is the opposite! It is quite common in data analytics to enrich a data set with data derived from that very same data set. This may feel as not adding any value at all. However, once you start creating visualizations and answering business questions, you will be glad with those smarter, more convenient features.

image

I think I may want to explore the length of the session title. A maximum of 80 characters is allowed, and I am usually struggling to submit session proposals that fit. I am wondering if others are struggling just as much. And if there are perhaps ways around that limitation: did anyone manage to get in titles of more than 80 characters?

I click on the little add icon in the upper left hand corner, next to Preparation Script.

image

The Create Column panel is shown. I enter the name of the new column – title_length – and look for an operator to use under String. The operator I need is called char_length. I add it to my column definition. Once I start typing the expression inside the parentheses, a dropdown list appears of functions and columns. I select title.

image

Next I press Add Step.

image

and the column is added to my data set.

Slightly more complex: the column Track contains an array (JSON style notation) with strings. OAC just sees one long string – it cannot interpret individual string values. Now perhaps this is something I should have resolved in my Jupyter Notebook, when I had the chance. But here we are.

image

Can I extract columns primary_track and secondary_track from this array definition?

Hint: here is the documentation on the functions available for use in the expression editor: https://docs.oracle.com/en/cloud/paas/analytics-cloud/acabi/functions.html#GUID-BBA975C7-B2C5-4C94-A007-28775680F6A5 

To be honest, it was not easy to get this expression figured out. I prefer a little notebook style Pandas wrangling. The help offered by the expression editor is limited. Note: char(39) resolves to apostrophe ‘ – I could find no other way to insert the apostrophe into the expression.

CASE WHEN POSITION(char(39) IN track)=0 THEN ‘ ‘ ELSE SUBSTRING(track FROM 1+Locate(char(39), track) FOR Locate(char(39) , track,3)-3) END

image

Press Add Step to commit my new derived column.

In a similar way I can create a secondary_track column:

CASE WHEN POSITION(‘,’ IN track)=0 THEN ‘ ‘ ELSE SUBSTRING(track FROM 3+Locate(‘,’, track, 3) FOR (Locate(char(39), track, Locate(‘,’, track, 3)+3) – Locate(‘,’, track, 3) – 3)) END

(first find the comma that separates the first from the second track, then find the second apostrophe after the comma)

SNAGHTML2565bcf

Note: in the remainder of the exploration, it seems that these derived columns cause errors:

imageand for now I see no alternative but to delete them.

Maybe now is a good time to some exploration and visualization.

Click on Save – to save the project:

image

Explore/Visualize Data Set

Switch to the Visualize tab.

Right click on the Length data element and select Explain length  from the dropdown menu.

image

Explaining a data element results in an interesting data exploration report for a specific data element. Let’s have a look.

image

The values for (session) length vary from 15 to 240 (that is a session that lasts for 6 hours!). Most sessions clearly last for up to 50 minutes.

Let’s explain the data element day:

image

and the derived attribute title_length

image

That is surprising: the title length has to be no more than 80 characters. How can there be 85 unique values? Some speakers apparently are more equal than others. The most common title length is 79 characters.

Looking at Key Drivers for data element oracle_speaker (an element that indicates whether a session has a speaker associated with it who works for Oracle), OAC provides some insights: if a speaker has a Java Champion associated with it, the chances are much smaller than normal that oracle_speaker has the value ‘Y’.

image

Using the data element explanation feature, we can get a fair understanding of the data, its values and the correlations between features. Note: correlation does not mean causality!

Now let’s quickly do a real visualization.

Drag day to the main panel:

image

and drop as Columns.

Drag RowCount to the panel and drop as Values:

image

The first visualization is presented:

image

Let’s add a little more insight by using the hour component of the (start) time to the Rows:

image

Switch to different graph style

image

A simple bar chart:

image

Let’s visualize the numbers of sessions per conference:

image

And the number of sessions per session type, per conference:

image

I have always wondered how many sessions at these conferences have at least one speaker employed by Oracle. This treemap answers that question visually:

image 

Green means Oracle Speaker in a session, blue is no Oracle speaker. The size of the box corresponds to the number of sessions. OpenWorld and CodeOne are shown on the left respectively the right side.

Not an obvious visualization – but quite cool and easy to produce: a Sankey diagram that shows how many of the Oracle OpenWorld or CodeOne sessions have N or Y an Oraxole speaker associated with them.

image 

In visualizations, the true power of OAC shines. The challenge becomes for the data analyst to find the right visualization for the relevant data elements for a specific question. OAC provides some help with that too, by exposing a best visualization option for a data element:

image

I have selected elements title_length, oracle_speaker and row_count. From the context menu I now select Create Best Visualization. Let’s see.

image

An horizontal stack bar that shows the number of occurrences for each title length value – in green for sessions with an Oracle speaker and in blue for non Oracle associated sessions. It was my assumption that only the Oracle sessions could have titles of more than 80 characters, but apparently that is not the case. Perhaps sponsors have special privileges as well?!

I am impressed with the speakers who manage with titles of just 10 or 14 characters. How do they do it?

A last visualization:  a tag cloud on session type – where the size of the tag represents the number of sessions of that particular type. The color indicates the conference. Creating this tag cloud took about 7 seconds of dragging a few data elements. Really nice.

image

Conclusion

Oracle Analytics Clouds offers great visualizations.

image

This tremendous list of visualizations and the ease with which they can be applied to the data set is surely one of the USPs of Oracle Analytics Cloud. Once the data is in good shape, we have a powerful set at our fingertips.

Its data profiling functionality helps to quickly get going. Data preparation is not quite that simple. Creating new, derived columns using expressions is hard going. I hope that will improve. I have not yet even scratched the surface of the visualizations, nor have I touched on machine learning and the narrations functionality, where reports are composed to present the findings.

image

I have also not yet explored the Data Flow capability in OAC. From the documentation: “Data flows let you use one or multiple data sets, combine the data sets, rename columns, change attributes of the data, and integrate the data sets to produce a curated data set that you can use to create visualizations.” Perhaps there I will have a better experience than in the data preparation tab. In a future article I hope to address some of these topics.

Note: most of what was shown in this article can also be done with Oracle Data Visualization Desktop – a Windows or Mac desktop tool, freely downloadable from Oracle eDelivery: https://www.oracle.com/technetwork/middleware/oracle-data-visualization/downloads/oracle-data-visualization-desktop-2938957.html

Resources

The Excel document with the Oracle OpenWorld and CodeOne 2018 session catalog – https://github.com/lucasjellema/Oracle-OpenWorld-CodeOne-2018-SessionCatalog/blob/master/oow2018-sessions-clean.xlsx 

YouTube – How To Start Your Oracle Analytics Cloud Trial – https://www.youtube.com/watch?v=h6NsAEwVaDo

Documentation: User Guide for Oracle Data Visualization Desktop   https://docs.oracle.com/en/middleware/bi/data-visualization-desktop/index.html

Documentation: Analytics Cloud – https://docs.oracle.com/en/cloud/paas/analytics-cloud/index.html