Apache Superset is an open source platform for data exploration and visualization. It can be described as an open, free alternative to Microsoft Power BI, Tableau, Qlik and Oracle Analytics Desktop. Superset connects (through the SQLAlchemy framework) to dozens of SQL-compliant databases and can work with CSV and JSON data sets. This article very briefly introduces Superset and then invites you to start working with it right away in a Gitpod workspace: a free, cloud-based, ephemeral quickstart environment. You click on the link and the workspace opens up in the cloud, ready for you to start working with a freshly installed Superset instance.
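Because connections go through SQLAlchemy, a database is identified by a SQLAlchemy-style URI of the form dialect+driver://user:password@host:port/database. The following is a minimal sketch of how such a URI could be assembled with only the Python standard library. The helper name build_sqlalchemy_uri and all connection values are hypothetical and for illustration only.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, user, password, host, port, database, driver=None):
    """Assemble a SQLAlchemy-style database URI from its parts."""
    # Dialect and optional driver are joined with "+", e.g. "postgresql+psycopg2".
    scheme = f"{dialect}+{driver}" if driver else dialect
    # Special characters in credentials (such as "@" or "/") must be URL-encoded.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"{scheme}://{credentials}@{host}:{port}/{database}"


# Hypothetical values, for illustration only.
uri = build_sqlalchemy_uri("postgresql", "analyst", "p@ss/word", "localhost", 5432, "examples")
print(uri)  # postgresql://analyst:p%40ss%2Fword@localhost:5432/examples
```

A string like this is what you would paste into the SQLAlchemy URI field when adding a database connection.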
Superset provides a SQL IDE for preparing data for visualization (defining calculated attributes and setting formats and other characteristics for columns), including a rich metadata browser. Note: Superset queries data from its source and holds the results in memory for analysis and presentation. A live connection to the source is therefore required, because that is where the queries are executed. Superset has a lightweight semantic layer that lets data analysts quickly define custom dimensions and metrics. Superset has its own datastore for definitions of datasets, charts, dashboards and additional metadata. This datastore is shared across all users who have access to a specific Superset instance.
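To make the idea of the semantic layer a little more concrete: a calculated column is essentially a row-level SQL expression and a metric is an aggregate SQL expression, both of which end up in the query that is sent to the source database. The sketch below illustrates that idea with an in-memory SQLite database. It is not Superset's own implementation, and the table, column names and rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for a source table.
rows = [
    ("Classic Cars", 2, 100.0),
    ("Classic Cars", 1, 250.0),
    ("Motorcycles", 3, 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_line TEXT, quantity INTEGER, unit_price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# A calculated column is a row-level SQL expression; a metric is an aggregate.
calculated_column = "quantity * unit_price"
metric = f"SUM({calculated_column})"

# The expressions are plugged into a query that runs on the source database.
query = f"""
    SELECT product_line, {metric} AS total_revenue
    FROM sales
    GROUP BY product_line
    ORDER BY product_line
"""
result = conn.execute(query).fetchall()
print(result)  # [('Classic Cars', 450.0), ('Motorcycles', 150.0)]
```

Defining the expression once and reusing it across charts is what keeps the layer lightweight: the database still does the actual work.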
Superset makes it easy to assemble rich visualizations, offering many dozens of chart types. Charts can be annotated, for example to point out specific events that help clarify the data or to describe conclusions drawn from the data. Charts can be collected in dashboards. Dashboards can be published, with role-based access control determining who is allowed to see which dashboard.
Gitpod Workspace for Apache Superset
Gitpod is an open source project and a cloud service that provides ephemeral development environments. You can host Gitpod yourself or use the cloud service, which offers 50 hours of free workspace usage. I have written about Gitpod in an earlier article (listed under Resources). The Gitpod workspace I have prepared for Superset is available at this URL: https://gitpod.io/#https://github.com/lucasjellema/gitpod-apache-superset . Simply click on the URL and a workspace will open with Superset installed and running. Open port 8088 to enter the Superset web UI. Log in with user admin and password admin.
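The same credentials can also be used to log in programmatically through Superset's REST API, which exposes a login endpoint at /api/v1/security/login. Below is a minimal sketch using only the Python standard library. The base URL http://localhost:8088 is an assumption (in Gitpod, use the URL that the workspace shows for port 8088), and the function names are hypothetical.

```python
import json
import urllib.request


def build_login_request(base_url, username="admin", password="admin"):
    """Build (but do not send) a login request for Superset's REST API."""
    payload = {
        "username": username,
        "password": password,
        "provider": "db",  # authenticate against Superset's own user database
        "refresh": True,   # also ask for a refresh token
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(base_url, username="admin", password="admin"):
    """Send the login request and return the access token from the response."""
    request = build_login_request(base_url, username, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


# Assumed base URL; nothing is sent over the network here.
request = build_login_request("http://localhost:8088")
print(request.get_method(), request.full_url)
```

Calling login(...) against a running instance should return a token that can be passed as a Bearer token in the Authorization header of subsequent API calls.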
You will enter the main page where you can start adding database connections, datasets, charts:
A Postgres database, named examples, is included and pre-configured in Superset for you.
Quite a few data sets – derived from this examples database – are predefined in the workspace environment:
To quickly create a visualization for one of these datasets, start the Explore workflow from the Datasets tab by clicking the name of the dataset that will power your chart, in this case cleaned_sales_data:
This is the no-code, drag-and-drop visualization editor that you will see:
By clicking, dragging and dropping the dataset fields, you can quickly compose the following stacked bar chart visualization:
Click on the link View all charts to select the desired chart type:
Then select the Stacked category and the Time Series Bar Chart.
Drag sales to the Metrics field and select SUM as the aggregate to apply.
Set Time Grain to Quarter:
Then click Customize and check the box for Stack Series.
The chart will roughly look like this:
The chart can be added to one or more dashboards. The result can also be exported, both the data summary and the image.
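For readers who want to see what the chart settings above boil down to, here is a minimal sketch of the computation behind a stacked bar chart with SUM(sales) at a quarterly time grain. The rows and the series dimension are hypothetical (the actual cleaned_sales_data dataset has its own columns), and this is plain Python for illustration rather than what Superset executes.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (order_date, series, sales).
rows = [
    (date(2003, 1, 15), "Classic Cars", 1000.0),
    (date(2003, 2, 20), "Motorcycles", 400.0),
    (date(2003, 3, 5), "Classic Cars", 500.0),
    (date(2003, 5, 9), "Classic Cars", 700.0),
]


def quarter_start(day):
    """Truncate a date to the first day of its quarter (the Quarter time grain)."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


# SUM(sales) per quarter and per series; stacking piles the series values
# for the same quarter on top of each other.
totals = defaultdict(lambda: defaultdict(float))
for order_date, series, sales in rows:
    totals[quarter_start(order_date)][series] += sales

for quarter in sorted(totals):
    stack_height = sum(totals[quarter].values())
    print(quarter, dict(totals[quarter]), "stack height:", stack_height)
```

Each bar in the chart corresponds to one quarter, and each colored segment within a bar to one series total.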
Note: you can easily add datasets by uploading CSV or Excel files:
The Gitpod workspace ships with many sample datasets, charts and dashboards to give you a taste of what can be done with Superset. One such dashboard is shown in the next figure:
Gitpod Workspace composition
The Gitpod workspace uses docker-compose and contains the installation described in the Superset documentation. In addition to Superset, Redis and PostgreSQL are installed, the latter to provide the metadata store (which can double as a database for data sets to analyze and visualize).
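Superset reads its settings from a Python file, superset_config.py, which is where the metadata store and Redis are typically wired in. The fragment below is a sketch of what such a configuration could look like. The service names (db, redis), credentials and defaults are assumptions and not necessarily the values used in this workspace; SQLALCHEMY_DATABASE_URI and CACHE_CONFIG are the standard Superset setting names.

```python
import os

# Assumed service names and credentials; the actual values used by the
# workspace's docker-compose setup may differ.
DATABASE_USER = os.environ.get("DATABASE_USER", "superset")
DATABASE_PASSWORD = os.environ.get("DATABASE_PASSWORD", "superset")
DATABASE_HOST = os.environ.get("DATABASE_HOST", "db")
DATABASE_PORT = os.environ.get("DATABASE_PORT", "5432")
DATABASE_DB = os.environ.get("DATABASE_DB", "superset")

REDIS_HOST = os.environ.get("REDIS_HOST", "redis")
REDIS_PORT = os.environ.get("REDIS_PORT", "6379")

# Metadata store: where Superset keeps dataset, chart and dashboard definitions.
SQLALCHEMY_DATABASE_URI = (
    f"postgresql://{DATABASE_USER}:{DATABASE_PASSWORD}"
    f"@{DATABASE_HOST}:{DATABASE_PORT}/{DATABASE_DB}"
)

# Redis-backed cache (Flask-Caching settings).
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_HOST": REDIS_HOST,
    "CACHE_REDIS_PORT": REDIS_PORT,
    "CACHE_REDIS_DB": 1,
}
```

Reading the values from environment variables keeps the file itself free of secrets and lets docker-compose supply them per environment.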
Resources
Apache Superset homepage: https://superset.apache.org/
Apache Superset GitHub: https://github.com/apache/superset
The Superset documentation is a little sparse. However, Preset provides rich documentation about its managed Superset cloud service that is by and large applicable to any Superset environment: https://docs.preset.io/v1/en (Preset is a cloud-hosted data exploration and visualization platform built on top of the popular open source project, Apache Superset.)
Article: 9 new chart types in Superset: https://preset.io/blog/2021-6-14-superset-nine-new-charts/
Tutorial for starting to work with Superset: https://censius.ai/blogs/apache-superset-tutorial#blogpost-toc-6
Article: Introducing Gitpod: https://lucasjellema.medium.com/first-steps-with-gitpod-great-for-try-out-quick-open-source-contributions-and-for-workshops-9590c322c18e