This article is part three in a series about Oracle Stream Explorer (SX), very recently released by Oracle and available from OTN. With Oracle Stream Explorer, business users and citizen developers as well as hard core developers can create explorations on top of live streaming data to filter, enrich, aggregate and inspect events and to detect patterns. Stream Explorer provides a business-user-friendly, browser-based user interface in which streams are configured, explorations are composed, enrichment is set up and publication to external consumers is defined. In the previous two articles I showed how to create a simple exploration based on a stream of events read from a CSV file and how to derive aggregate values from these events. The second article introduced enrichment based on data in a database table, adding more meaning to the aggregation performed on the stream.
In this article, we will look at some of the built-in patterns that we can easily include in our explorations, either to detect situations of interest or to feed further aggregation and interpretation.
The use case is: we are organizing a small conference. In three rooms, sessions take place simultaneously. Our attendees are free to decide which session to attend. We would like to know at virtually any moment how many people are in each room. We have set up simple detectors at the doors of the rooms that produce a signal whenever someone enters or leaves the room. This signal consists of the room identifier (1,2 or 3) and an IN/OUT flag (values +1 or -1).
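To make the shape of these signals concrete, here is a minimal Python sketch of the head count they imply. It is not how Stream Explorer computes it; the function name room_counts and the sample signals are made up for illustration.

```python
from collections import defaultdict


def room_counts(signals):
    """Sum the +1/-1 flags per room to get the current head count."""
    counts = defaultdict(int)
    for room_id, flag in signals:
        counts[room_id] += flag
    return dict(counts)


# Hypothetical door signals: (room identifier, IN/OUT flag)
signals = [(1, +1), (2, +1), (1, +1), (1, -1), (3, +1)]
counts = room_counts(signals)
print(counts)
```

Because every entry adds 1 and every exit subtracts 1, the running sum per room is the number of people currently inside.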
In article one, we simply determined the number of people per room – updated every 10 seconds with the latest events. In the second article we enriched these updates with details about the rooms – name of the room as well as its capacity. I am still looking for a way to calculate the occupancy percentage (number of attendees divided by room capacity) in order to alert for rooms at higher than 90% occupancy. (If SX does not support these calculations, I could add a virtual column to the ROOMS table that returns the number of attendees equal to 90% capacity and work from there.)
In the article you are currently reading I will make use of some built-in patterns:
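The occupancy calculation itself is simple arithmetic. The sketch below shows it outside of SX, purely to pin down what I am after; the function name rooms_over_threshold and the counts and capacities are invented sample values, not data from the ROOMS table.

```python
def rooms_over_threshold(counts, capacities, threshold=0.9):
    """Return {room_id: occupancy} for rooms whose occupancy exceeds the threshold."""
    result = {}
    for room_id, attendees in counts.items():
        capacity = capacities.get(room_id)
        if not capacity:
            # Unknown or zero capacity: occupancy is undefined, so skip the room.
            continue
        occupancy = attendees / capacity
        if occupancy > threshold:
            result[room_id] = occupancy
    return result


# Hypothetical attendee counts and room capacities
counts = {1: 46, 2: 20, 3: 30}
capacities = {1: 50, 2: 40, 3: 30}
crowded = rooms_over_threshold(counts, capacities)
print(crowded)
```

With these sample numbers, rooms 1 and 3 are above 90% occupancy and room 2 is not.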
- Top N – to report every 30 seconds what is the room with the highest number of people in it
- Detect Missing Event – identify event sources that have stopped publishing events (for example to find a room with a jammed door, a broken detector or no human activity)
- Eliminate Duplicates – do not report a specific event type (such as missing events because of jammed door) for the same room again if it was already reported in the last 30 seconds
Using one of the predefined patterns is very simple: instead of creating a generic new exploration, we create a pattern-based exploration and provide the parameters associated with the pattern.
Top N: report every 30 seconds what is the room with the highest number of people in it
In the Catalog, create a new item and select Pattern as the type for the new item. A list of supported pattern types is shown. Pick the Top N entry:
The pattern template page appears – an exploration is configured, based on the template for the pattern.
The source for this exploration is EnrichedRoomFlow. We have to specify that we want to report every 20 seconds (slide 20) about the room that had the largest number of people in it (order by AttendeeCount). We want to look at the attendee count over the past 40 seconds, so as not to call a room the most popular too easily. And since we are only interested in the single most popular room (instead of the top 2 or 3), we set N to 1.
Next we can configure the name and description of the exploration. Additionally, we can define a target – such as a CSV file – where the findings from the exploration are written to.
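The interplay of range (40 seconds), slide (20 seconds) and N can be illustrated with a small Python sketch. This is my own reading of the parameters, not the query SX generates: I assume that each room is ranked by its highest AttendeeCount inside the window, and the function names and sample events are made up.

```python
def top_n_rooms(events, now, range_s=40, n=1):
    """Rank rooms by their highest attendee count within (now - range_s, now]."""
    best = {}
    for ts, room_id, attendee_count in events:
        if now - range_s < ts <= now:
            best[room_id] = max(best.get(room_id, attendee_count), attendee_count)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def report(events, until, slide_s=20, range_s=40, n=1):
    """Evaluate the top N at every slide boundary up to `until` seconds."""
    return {
        t: top_n_rooms(events, t, range_s, n)
        for t in range(slide_s, until + 1, slide_s)
    }


# Hypothetical events: (timestamp in seconds, RoomId, AttendeeCount)
events = [(5, 1, 10), (12, 2, 14), (25, 3, 9), (38, 1, 16), (55, 2, 11)]
reports = report(events, until=60)
for t, top in reports.items():
    print(t, top)
```

Because the range is twice the slide, every event takes part in two consecutive reports, which is what keeps a short spike from immediately changing the most popular room.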
Click finish and publish the exploration.
The target file is created and before too long, the popular rooms are written to it:
Detect Missing Event
Create a new item of type Pattern. Select Detect Missing Event as the pattern.
We have to specify the source in which we want to detect the missing event. Select EnrichedRoomFlow – although NetRoomFlow would have done just as nicely. Specify the Tracking Field: which field(s) are we watching to check whether the event is reported or not? In this case, we are looking for rooms that do not report signals anymore. Therefore, the tracking field is the RoomId. We consider a room non-responding if we have not received any signal for 1 minute; that is our heartbeat interval.
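Conceptually, the pattern remembers the last time each value of the tracking field was seen and flags the ones that have been silent for longer than the heartbeat interval. A minimal Python sketch of that idea follows; it is an illustration under my own assumptions, not the OEP implementation, and the function name and sample events are hypothetical.

```python
def non_responsive_rooms(events, now, heartbeat_s=60):
    """Return rooms whose most recent signal is older than the heartbeat interval."""
    last_seen = {}
    for ts, room_id in events:
        if ts <= now:
            last_seen[room_id] = max(last_seen.get(room_id, ts), ts)
    return sorted(
        room_id for room_id, ts in last_seen.items() if now - ts > heartbeat_s
    )


# Hypothetical signals: (timestamp in seconds, RoomId)
events = [(0, 4), (10, 1), (15, 4), (70, 1), (75, 2), (130, 2)]
print(non_responsive_rooms(events, now=100))
print(non_responsive_rooms(events, now=140))
```

Note that a room can only be reported as missing after it has been seen at least once; a room that never sends a signal is invisible to this sketch.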
The exploration can be configured in terms of its name and description:
Additionally, we can configure a target – a csv file to collect all reported events – all notifications of a non-responsive room:
Then, when the configuration is done, we can publish the exploration to have the OEP application deployed to the OEP Server. The target file is created, and events start being written to it.
In order to test whether this exploration is working correctly, I have added a fourth room in the database table and included some entry and exit events for room #4:
The target file will contain – after a few minutes have passed – signals for room 4 – as expected. It turns out that room 3 is eventually reported as well. Apparently, my artificial set of events has a section where only rooms 1 and 2 are reported:
Eliminate Duplicates
Another pattern we can easily leverage is Eliminate Duplicates. It allows us to ensure that over a specified period of time an event of a certain type is not published more than once for a certain combination of property values. For example, the non-responsive rooms discussed above are reported repeatedly. Room number 4 is non-responsive and the exploration keeps on telling us that. Now we may not need to get that information for the same room more often than, say, once per 5 minutes. This can easily be taken care of with the Eliminate Duplicates pattern.
Create a new item of type Pattern and select the Eliminate Duplicates pattern:
Set as the source the output from the Detect Missing Event exploration that reports non-responsive rooms. Select RoomId as the key to de-duplicate on and set the Window to 5 minutes, as we want to prevent duplicate reports within a five-minute period.
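The effect of the key and the window can be sketched in a few lines of Python. I am assuming here that a report is suppressed when the same RoomId was already emitted less than one window ago; SX may define the window differently. The function name and the sample alerts are hypothetical.

```python
def eliminate_duplicates(events, window_s=300):
    """Drop an event if the same key was already emitted within the window.

    Assumes `events` is ordered by timestamp.
    """
    last_emitted = {}
    output = []
    for ts, room_id in events:
        previous = last_emitted.get(room_id)
        if previous is None or ts - previous >= window_s:
            output.append((ts, room_id))
            last_emitted[room_id] = ts
    return output


# Hypothetical non-responsive room alerts: (timestamp in seconds, RoomId)
alerts = [(60, 4), (120, 4), (180, 4), (200, 3), (360, 4), (420, 4)]
deduplicated = eliminate_duplicates(alerts)
print(deduplicated)
```

Room 4 is reported at 60 seconds and then not again until 360 seconds, while the single alert for room 3 passes through untouched because de-duplication is done per key.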
Configure a target file, to test the output from the exploration:
The file is created and lo and behold: non responsive rooms are reported, but far less frequently than is done by the DetectJammedDoorsFailedDetectorsNonResponsiveRooms exploration in the nonResponsiveRooms