
Platys – generate a customized container powered Data Platform environment

Platys is a tool created by Guido Schmutz, architect and data specialist at Accenture and frequent teacher at various universities. To be able to quickly create an environment for investigating a tool, to build a proof of concept with a specific combination of technologies, or to provide all his students with the hands-on setup they need for his class labs, Guido created Platys: a tool that generates a tailor-made docker-compose.yml with all the configuration for a selected combination of technologies. You tell Platys which technologies you want in your environment and Platys generates the docker-compose.yml file that delivers exactly that environment.

This picture shows the high level overview:

image

After installing Platys and initializing a new platform environment, these are the steps:

  1. Configure the technologies you want to include in the environment – by editing the config.yml file
  2. Run the Platys generator from the CLI (Platys itself runs in a container)
  3. Platys produces a docker-compose.yml file with all required containers, their configuration and the docker network
  4. Run docker-compose up for this file
  5. After pulling the images and starting the containers with the required configuration, your custom data platform environment is ready for action
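On the command line, the steps above look roughly like this (stack name and version are illustrative – check the Platys documentation for the exact flags and current stack version):

```shell
# 1. Initialize a new platform in an empty directory
mkdir my-platform && cd my-platform
platys init -n my-platform \
  --stack trivadis/platys-modern-data-platform --stack-version 1.16.0

# 2. Edit config.yml and switch the toggles for the services you want to true

# 3. Generate the docker-compose.yml (Platys itself runs in a container)
platys gen

# 4. Start the platform; Docker pulls the images and starts the containers
docker-compose up -d
```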

Platys already knows about 150+ tools and technologies, ranging from data stores, event brokers and messaging platforms to data pipelines, data processing, job scheduling and dashboarding & data visualization. This picture gives you an idea of the areas and technologies that are currently supported:

image

The config.yml file is basically a long list of toggles: each supported technology is represented by a toggle that is off by default. Switch on the toggles for the technologies you want to combine in a platform by changing false to true for the relevant entries in this file. Note: the file contains some other settings that you can play with; the documentation explains the options and their meanings.
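Flipping a toggle is a one-word edit. A small sketch, using an illustrative excerpt of a config.yml (the `*_enable` key names follow the pattern Platys uses, but check your generated config.yml for the exact keys):

```shell
# Create a minimal excerpt of a config.yml, for illustration only
cat > config.yml <<'EOF'
      KAFKA_enable: false
      TRINO_enable: false
      AKHQ_enable: false
EOF

# Switch on Kafka and Trino by changing false to true for their toggles
sed -i 's/KAFKA_enable: false/KAFKA_enable: true/' config.yml
sed -i 's/TRINO_enable: false/TRINO_enable: true/' config.yml

# Show which services are now enabled
grep 'enable: true' config.yml
```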

Quick Demonstration of Using Platys – for a Kafka and Trino Platform

You can very easily try out Platys yourself. Installing it, initializing and configuring a platform is straightforward, and generating the docker-compose.yml file for the platform is very easy. Running the platform is too. I have created a Gitpod workspace for myself to use Platys (it comes with Docker, Docker Compose, Platys and instructions for using it) – but you can easily install it wherever you want (as long as Docker and Docker Compose are available).

Let me go through the steps of running a platform with Apache Kafka and Trino – as described also in Guido’s cookbook.

I want the following selection from the Platys menu:

image

And when the platform is running, this is what I expect:

image

I have started the Gitpod workspace with Platys installed.

Next, I create a new directory and initialize a fresh platform in that directory using the Platys CLI:

image

In this command, I initialize the platform with a list of the services I desire. This saves me from having to switch on the toggles in the config.yml file.
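An initialization command with a service list might look roughly like this (the platform name is mine; the `--enable-services` flag and the exact service identifiers should be verified against the Platys documentation):

```shell
# Initialize a platform with Kafka, the AKHQ Kafka UI and Trino enabled
platys init -n kafka-trino-platform \
  --stack trivadis/platys-modern-data-platform --stack-version 1.16.0 \
  --enable-services KAFKA,AKHQ,TRINO
```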

After a few seconds, this is the output:

image


I feel ready to run the Platys generator that will produce the docker-compose.yml:

image

image

The file is created after a few dozen seconds:

image

At this point I can inspect the docker-compose.yml file. Guido strongly recommends not changing it! If you want to start additional services that Platys does not currently support, you can create a file called docker-compose.override.yml to manually add or overwrite services (see https://github.com/TrivadisPF/platys/blob/master/documentation/docker-compose-override.md). This way you can always add new components to a running platform: enable another service, re-generate and run “docker-compose up -d”, which will only start the new service and leave the already running ones untouched.
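A minimal override file could be created like this (the adminer service here is purely an illustrative extra, not something this platform needs; Docker Compose merges this file with the generated docker-compose.yml automatically):

```shell
# Add a service Platys does not manage via an override file,
# instead of editing the generated docker-compose.yml
cat > docker-compose.override.yml <<'EOF'
services:
  # hypothetical extra service, merged into the generated platform
  adminer:
    image: adminer:latest
    ports:
      - "8090:8080"
EOF
```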

I can also just run docker-compose up:

image

and after several minutes, my own tailored data platform is running, ready for me to start explorations, demonstrations, hands-on labs, etc.

image

image

The current state of the platform can easily be verified using the auto-generated status page that lists the services, their endpoints and relevant configuration details such as username and password. Here is a screenshot of what that looks like for a different platform environment (click on any service to get an overview of its details):

image

Guido’s cookbook provides instructions for generating Kafka topics and messages on these topics using the tpch-kafka loader from Maven central:

image

Upon inspection in AKHQ – part of the platform – we can see the topics that have been created and the messages that were produced:

image

Trino is a query engine that presents data from many different types of data sources through a relational SQL interface. One example is Trino’s support for Kafka: we can query data through Trino that actually resides on a Kafka topic.

image

Kafka does not understand SQL – but Trino can translate our SQL instructions into the required interactions with Kafka – allowing us to treat a Kafka topic almost like a database table. See here an example in my freshly minted data platform: using Trino I inspect the “tables” (actually: the topics) and the structure of the messages on the topic.

image

I can also query the messages:

image
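In shell form, such interactions might look like this (the container name `trino`, catalog `kafka` and schema `logistics` are assumptions based on a typical Platys setup; the cookbook gives the exact names):

```shell
# List the Kafka topics exposed as tables, inspect one, then query it
docker exec -it trino trino --execute "SHOW TABLES FROM kafka.logistics"
docker exec -it trino trino --execute "DESCRIBE kafka.logistics.customers"
docker exec -it trino trino --execute \
  "SELECT * FROM kafka.logistics.customers LIMIT 10"
```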

Note: in order for Trino to understand the structure of the messages – and to handle queries against the properties of customers as if they were columns in a table – I need to provide a simple JSON file in the configuration folder for Trino in my new platform directory structure (this folder is mapped into the Trino container). This file describes the JSON structure of the messages on the customers topic.
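Such a topic description file follows the Trino Kafka connector’s format; a sketch (the folder path, schema name and field list are illustrative – the actual message schema of the customers topic determines the real entries):

```shell
# Describe the customers topic so Trino can map message
# properties to table columns
mkdir -p etc/trino/kafka
cat > etc/trino/kafka/customers.json <<'EOF'
{
  "tableName": "customers",
  "schemaName": "logistics",
  "topicName": "customers",
  "message": {
    "dataFormat": "json",
    "fields": [
      { "name": "id",   "mapping": "id",   "type": "BIGINT"  },
      { "name": "name", "mapping": "name", "type": "VARCHAR" }
    ]
  }
}
EOF
```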

After providing that file and restarting the Trino service, Trino has knowledge of the structure of the “customers table” and can handle these instructions:

image

Platys Roadmap

Platys already supports an impressive number of technologies – and more are underway. Guido is working on the 1.17 release; these are the features he has assembled for that release. I guess he could use some help with that – so feel invited to participate.

image

Conclusion

Use Platys to quickly assemble an R&D or workshop environment – for exploration & inspiration. Platys can pull together many popular technologies and knows how to bind them together in a working combination – something that can sometimes take quite some effort to realize on your own (or at least on my own).

Platys runs anywhere – CLI and container. Platys generates a Docker Compose configuration for the selected Data Platform components & configuration; then you run this new Data Platform simply with docker-compose up. Platys and Gitpod go together well – it is my favorite way of working. Guido indicated that he typically works on AWS Lightsail – see this doc.

Resources

Platys GitHub Repository https://github.com/TrivadisPF/platys-modern-data-platform

Platys Cookbooks – dozens of step-by-step instructions for specific data platforms created with Platys – https://github.com/TrivadisPF/platys-modern-data-platform/tree/master/cookbooks


Picture of all supported services: https://github.com/TrivadisPF/platys-modern-data-platform/blob/master/documentation/images/modern-data-platform-overview.png 

My Gitpod workspace to use Platys in
