Platys is a tool created by Guido Schmutz, architect and data specialist at Accenture and frequent teacher at various universities. He created Platys to be able to quickly create an environment to investigate a tool, build a proof of concept with a specific combination of technologies, or provide all his students with the hands-on setup they need for his classes’ labs. Platys generates a tailor-made docker-compose.yml with all the configuration for a selected combination of technologies: you tell Platys which technologies you want in your environment and it generates the docker-compose.yml file that will deliver exactly that environment.
This picture shows the high-level overview:
- Configure the technologies you want to include in the environment – by editing the config.yml file
- Run the Platys generator from the CLI (Platys itself runs in a container)
- Platys produces a docker-compose.yml file with all required containers, their configuration and the docker network
- Run docker-compose up for this file
- After pulling the images and starting the containers with the required configuration, your custom data platform environment is ready for action
Platys already knows about 150+ tools and technologies, from data stores, event brokers, message platforms to data pipelines, data processing, job scheduling and dashboarding & data visualization. This picture gives you an idea of the areas and technologies that are currently supported:
The config.yml is basically a long list of toggles: each of the supported technologies is represented by a toggle that is off by default. Switch on the toggles for the technologies you want to combine in a platform by changing false to true for the relevant entries in this file. Note: the file contains some other settings that you can play with; the documentation explains the options and their meanings.
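As an illustration, a fragment of such a config.yml might look like the following. The exact toggle names depend on the Platys stack and version you use – consult the config.yml that Platys generates for the actual keys; the names below are only indicative:

```yaml
# config.yml fragment – toggle names are illustrative,
# check your generated config.yml for the exact keys
KAFKA_enable: true          # switched on
KAFKA_AKHQ_enable: true     # Kafka UI
TRINO_enable: true          # SQL query engine
POSTGRESQL_enable: false    # left at the default: off
```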
Quick Demonstration of Using Platys – for a Kafka and Trino Platform
You can very easily try out Platys yourself. Installing it, initializing and configuring a platform is straightforward, and generating the [docker-compose.yml file for the] platform is really easy. Running the platform is too. I have created a Gitpod workspace for myself to use Platys (it comes with Docker, Docker Compose, Platys and instructions for using it) – but you can easily install it wherever you want (as long as Docker and Docker Compose are available).
Let me go through the steps of running a platform with Apache Kafka and Trino – as described also in Guido’s cookbook.
I want the following selection from the Platys menu:
And when the platform is running, this is what I expect:
I have started the Gitpod workspace with Platys installed.
In this workspace, I initialize the platform with a list of the services I desire. This saves me from having to switch on the toggles in the config.yml file.
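The initialization command looks roughly like this. The flags and service names follow the pattern used in Guido’s cookbook, but they may differ slightly between Platys versions – check platys init --help for the options your version supports:

```bash
# initialize a new platform, enabling services on the command line
# (flags and service names may vary per Platys version)
platys init --enable-services KAFKA,KAFKA_AKHQ,TRINO \
    -s trivadis/platys-modern-data-platform -w 1.16.0
```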
After a few seconds, this is the output:
I feel ready to run the Platys generator that will produce the docker-compose.yml:
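The generator is a single command; it reads config.yml and writes the docker-compose.yml (Platys itself runs in a container, so only Docker is needed):

```bash
# generate the docker-compose.yml from config.yml
platys gen
```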
The file is created after several dozen seconds:
At this point I can inspect the docker-compose.yml file. Guido strongly recommends not changing it! If you want to start additional services that Platys does not currently support, you can create a file called docker-compose.override.yml to manually add/override services (see https://github.com/TrivadisPF/platys/blob/master/documentation/docker-compose-override.md). This way you can always add new components to a running platform: just enable another service, re-generate and run “docker-compose up -d”, which will only start the new service, leaving the ones already running untouched.
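A minimal docker-compose.override.yml could look like this – Docker Compose merges it automatically with the generated docker-compose.yml. The service name and image below are made-up examples:

```yaml
# docker-compose.override.yml – merged by Docker Compose with the
# generated docker-compose.yml; service and image are hypothetical
version: '3.5'
services:
  my-extra-service:
    image: my-org/my-extra-service:latest
    ports:
      - "28080:8080"
```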
I can also just run docker-compose up:
and after several minutes, my own tailored data platform is running, ready for me to start explorations, demonstrations, hands-on labs, etc.
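Starting and checking the platform is plain Docker Compose – nothing Platys-specific is needed at this point:

```bash
# start the whole platform in the background
docker-compose up -d

# check which containers are up and healthy
docker-compose ps
```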
The current state of the platform can easily be verified using the auto-generated status page, which lists the services, their endpoints and relevant configuration details such as username and password. Here is a screenshot of what that looks like for a different platform environment (click on any service to get an overview of its details):
Guido’s cookbook provides instructions for generating Kafka topics and messages on these topics using the tpch-kafka loader from Maven Central:
Upon inspection in AKHQ – part of the platform – we can see the topics that have been created and the messages that were produced:
Trino is a platform that presents data from many different types of data sources through a relational, SQL interface. One example is Trino’s support for Kafka: we can query data through Trino that actually resides on a Kafka Topic.
Kafka does not understand SQL – but Trino can translate our SQL instructions into the required interactions with Kafka, allowing us to treat a Kafka topic almost like a database table. See here an example in my freshly minted data platform: using Trino I inspect the “tables” (actually: the topics) and the structure of the messages on the topic.
I can also query the messages:
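A Trino session for this could look roughly as follows. The catalog, schema and table names are illustrative – they depend on how the Kafka connector is configured in the generated platform:

```sql
-- names are illustrative; they depend on the Kafka connector
-- configuration in the generated platform
SHOW TABLES FROM kafka.default;

-- inspect the structure of the messages on the topic
DESCRIBE kafka.default.customers;

-- query the messages as if they were rows in a table
SELECT * FROM kafka.default.customers LIMIT 10;
```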
Note: in order for Trino to understand the structure of the messages – and to handle queries against the properties of customers as if they were columns in a table – I need to provide a simple JSON file in the configuration folder for Trino in my new platform directory structure (this folder is mapped into the Trino container). This file describes the JSON structure of the messages on the customers topic.
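Such a topic description file follows the format of Trino’s Kafka connector. A minimal sketch for a hypothetical customers topic could look like this – the field names and types are made up for illustration:

```json
{
  "tableName": "customers",
  "schemaName": "default",
  "topicName": "customers",
  "message": {
    "dataFormat": "json",
    "fields": [
      { "name": "id",   "mapping": "id",   "type": "BIGINT" },
      { "name": "name", "mapping": "name", "type": "VARCHAR" },
      { "name": "city", "mapping": "city", "type": "VARCHAR" }
    ]
  }
}
```

Each entry in fields maps a property in the JSON message (mapping) to a column (name) with a Trino type, which is what allows the connector to expose the topic as a queryable table.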
After providing that file and restarting the Trino service, Trino has knowledge of the structure of the “customers table” and can handle these instructions:
Platys already supports an impressive number of technologies – and more are underway. Guido is working on the 1.17 release and these are the features he has assembled for that release. I guess he can use some help with that – so feel invited to participate.
Use Platys to quickly assemble an R&D or workshop environment – for exploration & inspiration. Platys can pull together many popular technologies and knows how to bind them together in a working combination – something that can sometimes take quite some effort to realize on your own (or at least on my own).
Platys runs anywhere – CLI and container. Platys generates a Docker Compose configuration for the selected data platform components & configuration; you then run this new data platform simply with docker-compose up. Platys and Gitpod go together well – it is my favorite way of working. Guido indicated that he typically works on AWS Lightsail – see this doc.
Platys GitHub Repository https://github.com/TrivadisPF/platys-modern-data-platform
Platys Cookbooks – dozens of step by step instructions for specific data platforms created with Platys – https://github.com/TrivadisPF/platys-modern-data-platform/tree/master/cookbooks
Picture of all supported services: https://github.com/TrivadisPF/platys-modern-data-platform/blob/master/documentation/images/modern-data-platform-overview.png
My Gitpod workspace to use Platys in