A Free Apache Kafka Cloud Service – and how to quickly get started with it

Lucas Jellema

Last week I presented on Apache Kafka – twice. Once to a group of over 100 students, once to 30+ colleagues. In both instances, I invited attendees to partake in a workshop with hands-on labs to get acquainted with Apache Kafka. I had prepared a Docker Compose based Kafka platform (aided by the work of Guido Schmutz) that participants could install locally on their laptops. However, there was an alternative that required no local installation, was actually quicker and was just as free: the free Apache Kafka cloud service CloudKarafka.

In this article, I will show how you can quickly get going on CloudKarafka.

CloudKarafka offers managed Kafka clusters in the cloud, and as part of their offering they make a free plan available: the Developer Duck plan.

Go to the CloudKarafka website and scroll down to the Developer Duck plan. Click on Try a Developer Duck.

You will be asked to log in, either with your GitHub account, a Google account or a new account that you create via Sign Up.

Specify a name for the instance – for example Online Meetup Apache Kafka (however, Butterfly will also do nicely).

 

The Developer Duck plan (FREE!) should already be selected.

Click on Select Region.

The default region – and the one that seems most stable – is US-East-1 on AWS. You might as well stick to that option.

Press Review.

You will get an overview of the choices you have made. You can review them and go back to revise if needed. But why would you? Go ahead and press Create Instance.

After a little while, you will be notified that your new instance has been provisioned.

Click on the name of the instance. It is a hyperlink that takes you to a page with details about your new instance.

You will need various details from this page later on, when you start to access Kafka topics programmatically:

  • username
  • password
  • Topic Prefix
  • Servers (endpoints of the Kafka brokers in your new cluster instance)

You can always return to this page to look up these details.

Try out the new CloudKarafka Developer Duck Plan Instance

Go to the Topic tab. You will see one topic listed: a default topic that was created along with your Kafka instance. Note that under the Developer Duck plan you can create up to 5 topics, and you can also delete them.

Select and copy the name of the default topic to the clipboard. Then open the Browser tab.

On this page, you can publish messages to your topic or consume messages from your topic. Paste the name of the topic into both Topic fields, then click on Consume. A consumer is created and the messages it consumes are pushed to the browser.

Enter a message in the Producer area and click on Produce. The message is published to the topic. Next, it is consumed from the topic, pushed to the browser and displayed in the page. An indication of the partition from which the message is consumed is also provided.

Feel free to play around with different types of messages.

Note: most of the advanced tools available from CloudKarafka are, unfortunately though not unexpectedly, not part of the free plan.

 

Programmatic Interaction with CloudKarafka’s Free Developer Duck Plan from Node applications

The hello world of Kafka clients in Node (JS) – that is the ambition in this section, nothing more. The sources are available from GitHub: https://github.com/AMIS-Services/online-meetups-introduction-of-kafka/tree/master/lab2-programmatic-consume-and-produce.

The NPM module repository returns over 550 modules when searched for the keyword kafka. Not all of them are libraries that facilitate the interaction between your Node application and Apache Kafka clusters, but over a dozen are. In this lab, we will work with the node-rdkafka NPM module (see node-rdkafka on GitHub for details on this particular library, and its Reference Docs for the API specification). The node-rdkafka library is a high-performance NodeJS client for Apache Kafka that wraps the native (C based) librdkafka library. All the complexity of balancing writes across partitions and managing (possibly ever-changing) brokers should be encapsulated in the library.
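To give an idea of what "balancing writes across partitions" involves: when a message has a key, librdkafka's default partitioner (`consistent_random`) computes the CRC-32 hash of the key and takes it modulo the number of partitions, so messages with the same key always land on the same partition; messages without a key are spread randomly. The sketch below mimics that rule in plain JavaScript for illustration only. It is not part of node-rdkafka's API, and `crc32` and `choosePartition` are hypothetical names.

```javascript
// Illustrative re-implementation of the rule applied by librdkafka's default
// "consistent_random" partitioner; NOT part of node-rdkafka's API.

// Lookup table for the standard (IEEE, reflected) CRC-32 algorithm
const CRC32_TABLE = (() => {
    const table = new Uint32Array(256);
    for (let n = 0; n < 256; n++) {
        let c = n;
        for (let k = 0; k < 8; k++) {
            c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
        }
        table[n] = c >>> 0;
    }
    return table;
})();

// CRC-32 checksum of a Buffer, returned as an unsigned 32-bit number
function crc32(buffer) {
    let crc = 0xffffffff;
    for (const byte of buffer) {
        crc = CRC32_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
    }
    return (crc ^ 0xffffffff) >>> 0;
}

// Keyed message: CRC-32 of the key modulo the partition count.
// Missing or empty key: a random partition.
function choosePartition(key, partitionCount) {
    if (key === null || key === undefined || key.length === 0) {
        return Math.floor(Math.random() * partitionCount);
    }
    const keyBytes = Buffer.isBuffer(key) ? key : Buffer.from(String(key));
    return crc32(keyBytes) % partitionCount;
}

module.exports = { crc32, choosePartition };
```

Because the mapping depends only on the key and the partition count, messages sharing a key keep their relative order within one partition. Be aware that the Java client hashes keys with murmur2 by default, so a Java producer and a librdkafka-based producer can send the same key to different partitions unless librdkafka is configured with its murmur2 partitioner.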

The application has a simple enough package.json file – with the dependency on node-rdkafka as the most important element.

{
  "name": "nodejs-kafka-example",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "node-rdkafka": "^2.2.0"
  }
}


The config.js file contains the configuration of the Kafka Cluster. It is this file that you need to update based on the connection details of the CloudKarafka instance:

const topic = "kjynvuby-default"; // set the correct topic name (including the Topic Prefix when you are using CloudKarafka)

const kafkaConfig = {
    // Specify the endpoints of the CloudKarafka Servers for your instance, found under Connection Details on the Instance Details page.
    // They look like this: moped-01.srvs.cloudkarafka.com:9094,moped-02.srvs.cloudkarafka.com:9094,moped-03.srvs.cloudkarafka.com:9094
    "metadata.broker.list": "moped-01.srvs.cloudkarafka.com:9094,moped-02.srvs.cloudkarafka.com:9094,moped-03.srvs.cloudkarafka.com:9094",
    // authenticate with SASL (SCRAM-SHA-256) over an SSL/TLS connection
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "SCRAM-SHA-256",
    // replace with the username and password from your own Instance Details page
    "sasl.username": "kjynvuby",
    "sasl.password": "D0_sMX2ICVfuOfYjpZE8VdAnMlrknXSd"
};

module.exports = { kafkaConfig, topic };
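Hard-coding the password in config.js makes it easy to leak it, for example through a Git repository. A minimal alternative sketch reads the same details from environment variables. The variable names CLOUDKARAFKA_BROKERS, CLOUDKARAFKA_USERNAME, CLOUDKARAFKA_PASSWORD and CLOUDKARAFKA_TOPIC_PREFIX and the function `loadConfig` are hypothetical, and the topic name is assumed to be the Topic Prefix followed by a name such as default, as the default topic kjynvuby-default suggests; check your own Instance Details page.

```javascript
// Hypothetical variant of config.js: build the node-rdkafka configuration from
// environment variables instead of hard-coding credentials in the source.
function loadConfig(env = process.env, topicName = "default") {
    const required = [
        "CLOUDKARAFKA_BROKERS",      // comma-separated host:port list ("Servers")
        "CLOUDKARAFKA_USERNAME",
        "CLOUDKARAFKA_PASSWORD",
        "CLOUDKARAFKA_TOPIC_PREFIX", // assumed to end with "-", e.g. "kjynvuby-"
    ];
    const missing = required.filter((name) => !env[name]);
    if (missing.length > 0) {
        throw new Error(`Missing environment variables: ${missing.join(", ")}`);
    }

    // same properties as the hard-coded config.js
    const kafkaConfig = {
        "metadata.broker.list": env.CLOUDKARAFKA_BROKERS,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "SCRAM-SHA-256",
        "sasl.username": env.CLOUDKARAFKA_USERNAME,
        "sasl.password": env.CLOUDKARAFKA_PASSWORD,
    };
    const topic = `${env.CLOUDKARAFKA_TOPIC_PREFIX}${topicName}`;
    return { kafkaConfig, topic };
}

module.exports = { loadConfig };
```

With this variant, produce.js and consume.js would call `require('./config').loadConfig()` and use the `kafkaConfig` and `topic` properties of the returned object.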

Producing

The simplest Node client for producing messages to a Kafka Topic looks like this – no dependencies on your specific Kafka Cluster (it leverages config.js for that):

const Kafka = require("node-rdkafka");
// read the Kafka broker list and topic name from the local file config.js
const externalConfig = require('./config');


// function to generate a message value; node-rdkafka expects a Buffer
const generateMessage = i => Buffer.from(`Generated a happy message - number ${i}`);

function generateAndProduceMessages(arg) {
    for (let i = 0; i < messageBatchSize; i++) {
        // partition -1: let librdkafka's partitioner pick the partition
        // the message key is passed as a string
        producer.produce(topic, -1, generateMessage(i), String(i));
    }
    console.log(`producer ${arg.name} is done producing messages to Kafka Topic ${topic}.`);
}

// construct a Kafka Configuration object understood by the node-rdkafka library
// merge the configuration as defined in config.js with additional properties defined here
const kafkaConf = {
    ...externalConfig.kafkaConfig,
    "socket.keepalive.enable": true,
    "debug": "generic,broker,security"
};
const messageBatchSize = 3; // number of messages to publish in one burst
const topic = externalConfig.topic;

// create a Kafka Producer, connected to the brokers defined in config.js
const producer = new Kafka.Producer(kafkaConf);
prepareProducer(producer);
// initialize the connection of the Producer to the Kafka Cluster
producer.connect();

function prepareProducer(producer) {
    // event handler for the ready event, emitted when the Producer has connected successfully to the Kafka Cluster
    producer.on("ready", function (arg) {
        console.log(`Producer connection to Kafka Cluster is ready; message production starts now`);
        generateAndProduceMessages(arg);
        // after 10 seconds, disconnect the producer from the Kafka Cluster
        setTimeout(() => producer.disconnect(), 10000);
    });

    producer.on("disconnected", function (arg) {
        process.exit();
    });

    producer.on('event.error', function (err) {
        console.error(err);
        process.exit(1);
    });
    // This event handler is triggered whenever the event.log event is emitted, which is quite often
    producer.on('event.log', function (log) {
        // uncomment the next line if you want to see a log message every step of the way
        //console.log(log);
    });
}
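The producer above sends plain-text values. Kafka itself treats a message value as opaque bytes, so any structure is up to the application. A common choice (an assumption here, not something the original sample does) is to serialise a JavaScript object as JSON into the Buffer passed to `producer.produce`, and to parse it again on the consuming side. A minimal sketch with the hypothetical helpers `encodeJsonMessage`, `decodeJsonMessage` and `generateJsonMessage`:

```javascript
// Hypothetical helpers for sending JavaScript objects as JSON-encoded Kafka
// message values; node-rdkafka expects the value as a Buffer.
function encodeJsonMessage(payload) {
    return Buffer.from(JSON.stringify(payload), "utf8");
}

// Turn a consumed message value (a Buffer) back into an object. Returns null
// instead of throwing when the value is not valid JSON, for example a
// plain-text message typed into the CloudKarafka Browser tab.
function decodeJsonMessage(value) {
    try {
        return JSON.parse(value.toString("utf8"));
    } catch (err) {
        return null;
    }
}

// A structured alternative to generateMessage(i)
const generateJsonMessage = (i) =>
    encodeJsonMessage({ number: i, text: "Generated a happy message", createdAt: new Date().toISOString() });

module.exports = { encodeJsonMessage, decodeJsonMessage, generateJsonMessage };
```

In produce.js, `generateJsonMessage(i)` could replace `generateMessage(i)` in the `producer.produce` call, and a consumer would call `decodeJsonMessage(message.value)` instead of `message.value.toString()`.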

Consuming

And finally, the simplest Node message consumer is shown below. This consume.js module also depends on config.js for the configuration details of the actual [CloudKarafka] Kafka Cluster.

const Kafka = require("node-rdkafka"); // see: https://github.com/blizzard/node-rdkafka
const externalConfig = require('./config');

const CONSUMER_GROUP_ID = "node-consumer-2";
// construct a Kafka Configuration object understood by the node-rdkafka library
// merge the configuration as defined in config.js with additional properties defined here
const kafkaConf = {
    ...externalConfig.kafkaConfig,
    "group.id": CONSUMER_GROUP_ID,
    "socket.keepalive.enable": true,
    "debug": "generic,broker,security"
};

const topics = [externalConfig.topic];

// createReadStream is a static factory method, so it is called without new;
// the second argument is the topic configuration
const stream = Kafka.KafkaConsumer.createReadStream(kafkaConf, { "auto.offset.reset": "earliest" }, {
    topics: topics
});

stream.on('data', function (message) {
    console.log(`Consumed message on Stream: ${message.value.toString()}`);
    // the structure of the messages is as follows:
    //   {
    //     value: Buffer.from('hi'), // message contents as a Buffer
    //     size: 2, // size of the message, in bytes
    //     topic: 'librdtesting-01', // topic the message comes from
    //     offset: 1337, // offset the message was read from
    //     partition: 1, // partition the message was on
    //     key: 'someKey', // key of the message if present
    //     timestamp: 1510325354780 // timestamp of message creation
    //   }
});

// report connection or consumption errors and stop
stream.on('error', function (err) {
    console.error(err);
    process.exit(1);
});

console.log(`Stream consumer created to consume from topic ${topics}`);

stream.consumer.on("disconnected", function (arg) {
    console.log(`The stream consumer has been disconnected`);
    process.exit();
});

// automatically disconnect the consumer after 30 seconds
setTimeout(function () {
    stream.consumer.disconnect();
}, 30000);
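The `"auto.offset.reset": "earliest"` setting only applies when the consumer group (here node-consumer-2) has no usable committed offset for a partition. Because librdkafka commits offsets automatically by default (`enable.auto.commit` is true), a second run of consume.js with the same group id resumes after the messages it already read instead of replaying the topic; changing CONSUMER_GROUP_ID makes it start from the beginning again. The sketch below models that decision for one partition. It is a simplification for illustration, and `startingOffset` and its parameter names are hypothetical.

```javascript
// Simplified, hypothetical model of where a group consumer starts reading one partition.
//   committed:      offset committed by the group (the next message to read), or null
//   logStartOffset: oldest offset still retained in the partition
//   highWatermark:  offset that the next produced message will get
//   reset:          value of auto.offset.reset ("earliest" or "latest")
function startingOffset({ committed, logStartOffset, highWatermark, reset = "latest" }) {
    const usable = committed !== null && committed >= logStartOffset && committed <= highWatermark;
    if (usable) {
        return committed; // resume where the group left off
    }
    // no committed offset, or it points outside the retained log: apply auto.offset.reset
    if (reset === "earliest") return logStartOffset;
    if (reset === "latest") return highWatermark;
    throw new Error(`Unsupported auto.offset.reset value: ${reset}`);
}

module.exports = { startingOffset };
```

With `reset: "latest"` (librdkafka's default), a brand-new group would only see messages produced after it joined, which is why consume.js sets `earliest` explicitly.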

Running the application is very straightforward. First run npm install to download the NPM module node-rdkafka and its other dependencies. Then, with a simple node produce.js, you start the production of three simple, generated messages to the CloudKarafka topic.


Using consume from topic in the Browser tab of the CloudKarafka console, we can check the arrival of these messages.

When you run consume.js, a connection is created to the same Kafka topic that the messages were just produced to, and the messages are consumed.

In this case, both Node applications probably executed on your laptop. However, you could just as easily run the Producer somewhere in a cloud environment and the Consumer on a laptop in a faraway country. Because the Kafka Cluster runs in the cloud and can be accessed from anywhere, many options open up.

 

Summary

For a quick introduction to Apache Kafka, CloudKarafka's Developer Duck plan is hard to beat. In just a few minutes, the Apache Kafka cluster is ready, and publishing and consuming messages to and from the cloud can start right away. Workshops, demonstrations and group assignments can all benefit from the Developer Duck. I know my students did.

 

 

 

 

 
