Oracle Analytics Cloud - Data Flow to produce a Date Value and Timeline to Visualize the time related data image 100

Oracle Analytics Cloud – Data Flow to produce a Date Value and Timeline to Visualize the time related data

In two earlier articles on Oracle Analytics Cloud I have introduced the Oracle OpenWorld 2018 Session Catalog Data Set (First steps with Oracle Analytics Cloud – Gather, Explore, Wrangle, Visualize and Creating a Data Flow in Oracle Analytics Cloud to enriching with Geo Encoding to Map visualization of data) . A set with over 2000 session records, describing presentations at Oracle OpenWorld and CodeOne 2018. Each session has a title, a duration, is associated with one or more tracks, has a certain level, takes place in a specific room and is scheduled on a specific day and time. In the earlier articles, I have visualized some of these qualities, such as the geographical spread of the sessions visualized on a map of downtown San Francisco.

In this article I will take a look at some of the date and time aspects of this data set. I want to find out what the busiest times are throughout the day and throughout the week – in terms of the number of concurrent sessions. I want to find out about evening sessions at CodeOne and I want to present a timeline of the events as the week unfolds.

Enriching the Data Set with a Session Date and Timestamp

To show a timeline, the records need a Timestamp. However, they do not have one. The sessions have a day column with values Monday, Tuesday, etc. We know that Oracle OpenWorld 2018 took place from Sunday 21st of October until Thursday 25th of October. image

We know therefore how to convert the day to a real session_date.

There are at least three ways to get a new value available for visualization:

  • add a calculation on the visualization tab image
  • add step (and a new column) in the data preparation tab for a data setimage
  • add a step in the data flow and create a new column into the target data set

The last option seems to me to be most robust – I had some exceptions with the two other approaches – so I will apply that one.

Open the Data Flow GeoVenueEnrichSessions – that also adds venues and geo coordinates to all sessions. Add a step Add Columns.

image

Configure this step: the new column is called session_date and it is to contain the date of the session. The value is composed as a string that is subsequently converted to a real date attribute using the To_DateTime function:

To_DateTime (concat(concat(‘2018-10-‘, case when day=’Sunday’ then ’21’ when day=’Monday’ then ’22’ when day=’Tuesday’ then ’23’ when day=’Wednesday’ THEN ’24’ when day=’Thursday’ THEN ’25’  ELSE ’25’ END),concat(concat(‘ ‘,cast(hour(time) as char(2))), concat(‘:’,cast(minute(time) as char(2))))), ‘yyyy-mm-dd hh:mi’)

image

Press Save and Run Data Flow. This will add the session_date column to the data set and calculate the value for each session.

Timeline to Show the Number of Sessions per Day per Event Throughout the Week

Using the new brand new session_date data element I can now create a timeline. Session_date is the category element of the timeline. Event and Row_Count are added to the picture. Now I can easily tell how many CodeOne sessions take place on October 24th: 116 sessions.

image

Sessions throughout the Day

Using the the Hour component of the time data element in the data set, I can easily get an idea of when the sessions start – throughout the day. The peak is from 11AM to 1PM and at night there are quite a few sessions too:

image

I run into issues with processing the minutes part of the session time data element. Rather than trying to fix this issue, I decide to stick to the hour component. I want to add to each session a session_time attribute that indicates the hour [slot] in which the session started.

In this case I will add session_time in the visualization tab by simply adding a calculation. The expression I have used:

TimeStampAdd(SQL_TSI_hour,(hour(time)), session_date)

(this takes the session_date – which has a time component of midnight of the date – and add as many hour to that timestamp as is the value of the hour part of the time attribute

image

Using the session_time timestamp I can create a timeline that shows for each hour of each day how many sessions start in that hour:

image

Timeline with Filtering

Here I have added a filter on session type to the timeline. This results in an easy insight in when Birds of a Feather and Ignite sessions take place: at night. On Monday and Tuesday. And I feel a special connection with this finding: I presented on both Monday Night and Tuesday Night. On one occasion for an audience of 2 (yes, that is two) – of whom one was asleep for most of the session…

image

Heatmap of Number of Sessions throughout the Day and the Week

This heatmap shows what I am after – the darker blue the cell, the higher the number of sessions:

image

The busiest time of the week turns out to be:

image

Tuesday, 12PM – with 98 sessions starting in that one hour slot. As well as lunch.

Part of the Day – Morning, Afternoon, Evening

In the Data Flow, create a new column called session_day_part using the following case expression that divides the rows into three groups: morning, afternoon and evening.

case when hour(time)<12 then ‘morning’ when hour(time)>17 then ‘evening’  else ‘afternoon’ end

image

Press Save and Run Data Flow to modify the data flow and create the new column.

Two quick visualizations – indicative of the numbers of sessions in the morning, the afternoon and the evening:

image

and

image

Clearly the afternoons are busiest, Monday is the busiest day.