Last week I presented on Apache Kafka – twice. Once to a group of over 100 students, once to 30+ colleagues. In both instances, I invited attendees to take part in a workshop with hands-on labs to get acquainted with Apache Kafka. I had prepared a Docker Compose-based Kafka platform (building on work by Guido Schmutz) that participants could install locally on their laptop. However, there was an alternative approach that did not involve any local installation, was actually quicker, and was just as free: CloudKarafka, a free Apache Kafka cloud service.
In this article, I will show how you can quickly get going on CloudKarafka.
CloudKarafka offers managed Kafka Clusters in the cloud and, as part of their offering, makes a free plan available – the Developer Duck plan.
Go to the CloudKarafka website and scroll down to the Developer Duck plan. Click on Try a Developer Duck.
You will be asked to log in, either with your GitHub account, a Google account or a new account that you create via Sign Up.
Specify a name for the instance – for example Online Meetup Apache Kafka (however, Butterfly will also do nicely).
The Developer Duck plan (FREE!) should already be selected.
Click on Select Region.
The default region – and the one that seems most stable – is US-East-1 on AWS. You might as well stick to that option.
Press Review.
You will get an overview of the choices you have made. You can review and decide to go back to revise. But why would you? Go ahead, and press Create Instance.
After a little while, you will be notified that your new instance has been provisioned for you.
Click on the name of the instance. It is a hyperlink, and it will take you to a page with details regarding your new instance.
You will need various details on this page later on when you start to programmatically access Kafka Topics:
- username
- password
- Topic Prefix
- Servers (endpoints of the Kafka brokers in your new cluster instance)
You can always return to this page to look up these details.
Try out the new CloudKarafka Developer Duck Plan Instance
Go to the Topic tab. You will see one topic listed: a default topic that was created along with your Kafka instance. Note that under the Developer Duck plan you can create up to 5 topics; you can also delete them.
Select and copy the name of the default topic to the clipboard. Then open the Browser tab.
On this page, you can publish messages to your topic or consume messages from your topic. Paste the name of the topic into both Topic fields. Then click on Consume. A consumer is created and the messages it consumes are pushed to the browser.
Enter a message in the Producer area and click on Produce. The message is published to the topic. Next, it is consumed from the topic, pushed to the browser and displayed in the page. An indication of the partition from which the message is consumed is also provided.
Feel free to play around with different types of messages.
Note: the more advanced tools available from CloudKarafka are, unfortunately though not unexpectedly, not part of the free plan.
Programmatic Interaction with CloudKarafka’s Free Developer Duck Plan from Node applications
The hello world of Kafka clients in Node (JS) – that is the ambition in this section, nothing more. The sources are available from GitHub: https://github.com/AMIS-Services/online-meetups-introduction-of-kafka/tree/master/lab2-programmatic-consume-and-produce.
The NPM module repository returns over 550 modules when searched for the keyword kafka. Not all of them are libraries that facilitate interaction between your Node application and Apache Kafka clusters – but over a dozen are. In this lab, we will work with the node-rdkafka NPM module; see node-rdkafka on GitHub for details on this particular library and the Reference Docs for the API specification. The node-rdkafka library is a high-performance NodeJS client for Apache Kafka that wraps the native (C-based) librdkafka library. All the complexity of balancing writes across partitions and managing (possibly ever-changing) brokers should be encapsulated in the library.
The application has a simple enough package.json file – with the dependency on node-rdkafka as the most important element.
{ "name": "nodejs-kafka-example", "version": "1.0.0", "description": "", "main": "index.js", "scripts": { "test": "echo \"Error: no test specified\" && exit 1" }, "author": "", "license": "ISC", "dependencies": { "node-rdkafka": "^2.2.0" } }
The config.js file contains the configuration of the Kafka Cluster. It is this file that you need to update based on the connection details of the CloudKarafka instance:
const topic = "kjynvuby-default" // set the correct topic name, especially when you are using CloudKarafka const kafkaConfig = { // Specify the endpoints of the CloudKarafka Servers for your instance found under Connection Details on the Instance Details Page // this looks like this: moped-01.srvs.cloudkafka.com:9094,moped-02.srvs.cloudkafka.com:9094,moped-03.srvs.cloudkafka.com:9094" "metadata.broker.list": "moped-01.srvs.cloudkafka.com:9094,moped-02.srvs.cloudkafka.com:9094,moped-03.srvs.cloudkafka.com:9094" , "security.protocol": "SASL_SSL", "sasl.mechanisms": "SCRAM-SHA-256", "sasl.username": "kjynvuby", "sasl.password": "D0_sMX2ICVfuOfYjpZE8VdAnMlrknXSd" }; module.exports = { kafkaConfig, topic };
Producing
The simplest Node client for producing messages to a Kafka Topic looks like this – no dependencies on your specific Kafka Cluster (it leverages config.js for that):
const Kafka = require("node-rdkafka"); // read the KAFKA Brokers and KAFKA_TOPIC values from the local file config.js const externalConfig = require('./config'); // function to generate a message const generateMessage = i => new Buffer.from(`Generated a happy message - number ${i}`); function generateAndProduceMessages(arg) { for (var i = 0; i < messageBatchSize; i++) { producer.produce(topic, -1, generateMessage(i), i) } console.log(`producer ${arg.name} is done producing messages to Kafka Topic ${topic}.`) } // construct a Kafka Configuration object understood by the node-rdkafka library // merge the configuration as defined in config.js with additional properties defined here const kafkaConf = {...externalConfig.kafkaConfig , ...{ "socket.keepalive.enable": true, "debug": "generic,broker,security"} }; const messageBatchSize = 3; // number of messages to publish in one burst const topic = externalConfig.topic; // create a Kafka Producer - connected to the KAFKA_BROKERS defined in config.js const producer = new Kafka.Producer(kafkaConf); prepareProducer(producer) // initialize the connection of the Producer to the Kafka Cluster producer.connect(); function prepareProducer(producer) { // event handler attached to the Kafka Producer to handle the ready event that is emitted when the Producer has connected sucessfully to the Kafka Cluster producer.on("ready", function (arg) { console.log(`Producer connection to Kafka Cluster is ready; message production starts now`) generateAndProduceMessages(arg); // after 10 seconds, disconnect the producer from the Kafka Cluster setTimeout(() => producer.disconnect(), 10000); }); producer.on("disconnected", function (arg) { process.exit(); }); producer.on('event.error', function (err) { console.error(err); process.exit(1); }); // This event handler is triggered whenever the event.log event is emitted, which is quite often producer.on('event.log', function (log) { // uncomment the next line if you want to see a log message every step of the way //console.log(log); }); }
Consuming
And finally the simplest Node message consumer is shown below. This consume.js module also depends on config.js for the configuration details of the actual [CloudKarafka] Kafka Cluster.
const Kafka = require("node-rdkafka"); // see: https://github.com/blizzard/node-rdkafka const externalConfig = require('./config'); const CONSUMER_GROUP_ID = "node-consumer-2" // construct a Kafka Configuration object understood by the node-rdkafka library // merge the configuration as defined in config.js with additional properties defined here const kafkaConf = {...externalConfig.kafkaConfig , ...{ "group.id": CONSUMER_GROUP_ID, "socket.keepalive.enable": true, "debug": "generic,broker,security"} }; const topics = [externalConfig.topic] var stream = new Kafka.KafkaConsumer.createReadStream(kafkaConf, { "auto.offset.reset": "earliest" }, { topics: topics }); stream.on('data', function (message) { console.log(`Consumed message on Stream: ${message.value.toString()}`); // the structure of the messages is as follows: // { // value: Buffer.from('hi'), // message contents as a Buffer // size: 2, // size of the message, in bytes // topic: 'librdtesting-01', // topic the message comes from // offset: 1337, // offset the message was read from // partition: 1, // partition the message was on // key: 'someKey', // key of the message if present // timestamp: 1510325354780 // timestamp of message creation // } }); console.log(`Stream consumer created to consume from topic ${topics}`); stream.consumer.on("disconnected", function (arg) { console.log(`The stream consumer has been disconnected`) process.exit(); }); // automatically disconnect the consumer after 30 seconds setTimeout(function () { stream.consumer.disconnect(); }, 30000)
Running the application is very straightforward – once npm install has downloaded the node-rdkafka NPM module and other dependencies. With a simple node produce.js you can start producing three simple, generated messages to the CloudKarafka topic.
Using the consumer in the Browser tab of the CloudKarafka console, we can verify that these messages have arrived.
When you run consume.js, a connection is created to the same Kafka Topic that the messages were just produced to, and the messages are consumed:
In this case, both Node applications probably executed on your laptop. However, you could just as easily run the Producer somewhere in a cloud environment and the Consumer on a laptop in a faraway country. Because the Kafka Cluster runs in the cloud and can be accessed from anywhere, many options open up.
Summary
For a quick introduction to Apache Kafka, CloudKarafka's Developer Duck plan is hard to beat. In just a few minutes, the Apache Kafka Cluster is ready. Publishing and consuming messages to and from the cloud can start very rapidly. Workshops, demonstrations, and group assignments will all benefit from the Developer Duck. I know my students did.