In this article, I will show how I have created a Data Flow in Oracle Analytics Cloud to enrich a data set with geocoding data from a different data set, in order to be able to create a map based visualization of data. More specifically: in a previous article I introduced a data set with the Oracle OpenWorld and CodeOne 2018 session catalog. I demonstrated how I got going on Oracle Analytics Cloud with this data set, and how I created my first visualizations. The session catalog contains the room in which each session takes place – and the first part of the room indicates the venue. These venues can be plotted on a map. My objective in this article is to associate the geocoordinates of the venues with each session record and subsequently use that encoding for visualizing the sessions on their actual geographical location on the city map of downtown San Francisco.
I will use a Data Flow in OAC to perform two actions. First I will create a new column called venue that is derived from the first ten characters of the room column. This will derive one of eight venue values for each session. Then I will join the sessions data set with a data set that contains the geo locations for each venue. This makes the longitude and latitude of the venue location part of each session, and allows me to create a map visualization that indicates the locations of selected sessions.
Steps:
- Extract unique venues from room values
- Create data set with venues and their geo-coordinates
- Join sessions data set with venues data set – in order to add geo coordinates to each session
- Create map visualization to show sessions on map [of downtown San Francisco]
next steps: create custom map with floorplans for each of the venues
Starting from the data set prepared in the previous article – downloadable from https://github.com/lucasjellema/Oracle-OpenWorld-CodeOne-2018-SessionCatalog/blob/master/oow2018-sessions-clean.xlsx.
Extract unique venues from room values
I have opened the data set and explore the room column. The venue is included as first part of the string of the room column.
I can get a feel for the different values by taking a substring of the room string values. It turns out that a substring of the first 12 characters of room give me the unique values representing the venues. I create a new column venue, using an expression to derive that substring of room:
After Adding the Step and Applying the Script, I can turn to the visualize tab:
And get a list of the unique venue values.
I can copy the values of the venues to the clipboard and from there to a text file or an Excel sheet.
Create data set with venues and their geo-coordinates
Next I need to extend this text file with the geocoordinates. I use this website for getting the exact longitude and latitude value: https://www.findlatitudeandlongitude.com/?loc=Moscone+Center+west%252C+San+Francisco%252C+CA&id=2190619#.XEwbXFxKhPZ
After getting all coordinates, this is what the file looks like:
it also includes the names of the venues in full.
Next I add this data set to my OAC instance.
In the Create Data Set dialog, drag the CSV file with the venue data:
The file is uploaded and processed and the result is presented. Click on Add to save the Data Set to OAC. It can now be used in Data Flows – or be visualized in its own right:
Join sessions data set with venues data set – in order to add geo coordinates to each session
Data Flows are the data preparation pipelines in OAC that can take one or more data sets, perform one more operations on them – including adding and removing, merging and splitting columns, filtering rows, joining data sets, aggregating data sets, deriving values using machine learning models and more. I now want a data flow to add the geo coordinates and the proper venue name to each session record.
A Data Flows always starts from the driving data set. So specify that set:
Since I want to join the data set to another data set, I have to add an Add Data step to the data flow:
And indicate which Data Set to join with:
Next, I have to specify the join condition – inner, left or right or full outer – and the columns on which to match. In this case, venue in source set is joined with venue_short in the geoenrichment data set:
Add Save Data step – to record the output from the data flow as a (new) data set.
Configure the Save Data step. Then, press Save and Run Data Flow respectively.
The new data set is generated by the data flow [execution]
Unfortunately, we need to do one more thing: the columns lat and long in this data set should be treated as Attribute and not as Measure – otherwise we cannot use them in Map visualization.
Create map visualization to show sessions on map [of downtown San Francisco]
Open the project, add a new canvas. Click on Visualizations and click Create Visualization in the context menu:
A Map is added to the canvas.
Select the lat and long data elements and drag them to the Category cell for Layer 1 for the Map.
This should already make the outlines visible of downtown San Francisco, with blue points for each of the venues.
Drag Row Count to Size (Bubble) and set Color to venue_1. The result looks like this:
When I click on the map, the property palette in the lower left hand corner changes into the map properties. I can now specify if the zoom controls should be shown, what the title is for the map, if the scale is in miles or km and more.
Using Trellis Columns by event I can easily compare the venues used for CodeOne vs Oracle OpenWorld:
Clearly, CodeOne was far less spread out – using only Moscone West and North.
Note: OAC does something undesirable: the color used for Moscone North is not the same in the two maps (green vs red). In the next example this happens again.
In this final example, I have added a filter by day and am currently looking at only Sunday and Monday. The number of sessions in the Marriott hotel on these two days is 69.
Resources
Data files are in GitHub repo: https://github.com/lucasjellema/Oracle-OpenWorld-CodeOne-2018-SessionCatalog (specifically https://github.com/lucasjellema/Oracle-OpenWorld-CodeOne-2018-SessionCatalog/blob/master/oow2018-sessions-clean.xlsx and https://github.com/lucasjellema/Oracle-OpenWorld-CodeOne-2018-SessionCatalog/blob/master/oow2018_geoenrichment.csv.
findlatitudeandlongitude Website for getting fine grained longitude and latitude: https://www.findlatitudeandlongitude.com/?loc=Moscone+Center+west%252C+San+Francisco%252C+CA&id=2190619#.XEs64VxKhPY
OAC Docs – manage map data – for adding custom map layers and backgrounds – https://docs.oracle.com/en/cloud/paas/analytics-cloud/acabi/manage-map-information-analyses.html#GUID-F5C83299-9020-4B58-B011-19C239E4A194
DV Demo with custom maps – https://www.youtube.com/watch?v=iIsVuUW2nmQ – using GeoJson shapes
QGis – Open Source tool for creating custom GeoJson files for floorplans, fantasy maps and more – https://qgis.org/en/site/
Introduction to GeoJSON: https://macwright.org/2016/06/05/falsehoods-developers-believe-about-geojson.html
Article Digital Cartography: Making Maps in The Age Of the Internet – http://sketchbook.nclud.com/digital-cartography-making-maps-age-internet/
Building Map for Game of Thrones – using GeoJSON, PostGIS and more – https://github.com/triestpa/Atlas-Of-Thrones