Events play an important role in modern IT architecture, at various levels of the stack. Events can represent many different things – from IoT measurements and logistics updates to website activities, business transactions, and operational IT metrics and incidents. Events can be produced at peak speeds and in high volumes – and may require rapid and robust processing. Apache Kafka has grown into something of a de facto foundation for event processing. Kafka handles events at high volume and high speed with fault tolerance. It is a unified, high-throughput, low-latency platform for handling real-time data feeds. In its essence, it is a “massively scalable pub/sub message queue architected as a distributed transaction log”, making it highly valuable for enterprise infrastructures that process streaming data (see the Apache Kafka website for more details: https://kafka.apache.org/ ).
Very simply put: events are sent to Kafka by producers. Every event is associated with a specific topic. Events are stored in Kafka in partitions; partitions are kept in transaction log files that are replicated across multiple nodes. Consumers can read events from the partitions in the order in which they were originally received – for as long as Kafka retains them. Batch-wise consumption is available for lower-overhead processing of events. The next picture visualizes Kafka at a very high level; a minimal producer sketch follows the illustration.
illustration source: https://www.cloudera.com/documentation/kafka/1-2-x/topics/kafka.html
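To make the producer side of that picture concrete, here is a minimal sketch using Kafka's Java client. The topic name orders, the broker address localhost:9092 and the JSON payloads are assumptions for illustration only; the point is that the record key determines the partition an event lands in, so events with the same key keep their relative order.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a locally running broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // both events share the key "customer-42", so they go to the same partition
            // and will be read back by consumers in this order
            producer.send(new ProducerRecord<>("orders", "customer-42", "{\"orderId\":1001,\"status\":\"CREATED\"}"));
            producer.send(new ProducerRecord<>("orders", "customer-42", "{\"orderId\":1001,\"status\":\"SHIPPED\"}"));
        }
    }
}
```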
In various keynote sessions during Oracle OpenWorld 2016, Apache Kafka made an appearance, and there was an indication of a Kafka-powered Event Hub as part of the Oracle PaaS fabric or family of cloud services. Kafka would make it feasible for the Oracle PaaS Cloud to handle fast data and real-time events in a well-organized, structured manner – and to make these events readily available to multiple consumers in a uniform way.
Oracle Functions – jobs executed in a serverless architecture – can, for example, be triggered by an event on the Event Hub as well as publish such an event themselves – see the next screenshots.
Not surprisingly, Kafka plays a role in the Big Data and Streaming (Fast Data) platform – see the next screenshot:
Kafka is the perfect landing pad for large event volumes – to capture them in a safe manner and make them available for processing by multiple parallel consumers.
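As a sketch of that parallel-consumer pattern, the following Java consumer subscribes to the (assumed) orders topic as a member of a consumer group. Each poll returns a batch of records, and running several instances of this class with the same group.id makes Kafka divide the topic's partitions among them, so events are processed in parallel while ordering is preserved within each partition. Broker address, topic and group name are illustrative assumptions.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OrderEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: a locally running broker
        props.put("group.id", "order-processors");          // all parallel instances share this group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                // poll returns a batch of records from the partitions assigned to this instance
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : batch) {
                    System.out.printf("partition %d, offset %d: %s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```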
Machine Learning – one of the popular themes – is typically driven by large data volumes that arrive in real time and have to be captured and analyzed in order to derive models from them. Once the models have been derived, real-time data – events – is analyzed and fed into the models in order to predict outcomes and/or derive recommendations. In both stages, Kafka can play a role, as illustrated by the next screenshot.
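A very rough sketch of the second stage – scoring incoming events against a previously trained model – could look like the following, here using the Kafka Streams API (an assumption; the screenshot does not prescribe a particular technology). Topic names, the threshold and the predict method are placeholders for a real model.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class ScoringPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "scoring-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // read raw events, score each one, publish the prediction to a downstream topic
        KStream<String, String> events = builder.stream("sensor-readings");
        events.mapValues(ScoringPipeline::predict).to("predictions");

        new KafkaStreams(builder.build(), props).start();
    }

    // placeholder for a real, previously trained model (hypothetical scoring logic)
    private static String predict(String reading) {
        double value = Double.parseDouble(reading);
        return value > 75.0 ? "ALERT" : "OK";
    }
}
```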
The Oracle Stream Analytics product (fka Stream Explorer, fka Oracle Event Processor and CEP) also works with Kafka, using it as a source for events to process – and as a target as well. Read https://guidoschmutz.wordpress.com/2016/05/05/oracle-stream-analytics-osa-the-new-oracle-stream-explorer/ (by Guido Schmutz) for more details or check out http://www.rittmanmead.com/blog/2016/07/stream-analytics-processing-kafka-oracle-stream-analytics/ (by Robin Moffatt) for even more background.
This screenshot is from Oracle Stream Analytics: