One of the intriguing subjects at last weeks Oracle Open World conference for me was Complex Event Processing. The ability to process streams of real time events, and up to 1000s per second of them, opens up many new avenues of interesting opportunities. Detecting patterns and anomalies, aggregating and comparing – reporting only the noteworthy events or findings, thereby making sense (information) of enormous volumes of data is both great fun and tremendously useful in specific situations. The SQL like CQL language – Continuous Query Language – provides us with a syntax to analyze data streams in various ways – detecting trends and patterns, variations and aberrations. There are several key characteristics in the CEP architecture: handling large volumes of events, applying this special language for event analysis and executing the queries continuously. That last bit means that the query is registered with a stream of events, not executed at a single moment in time against the data as it is at that moment but looking for query results continously as the events keep streaming in. Query results can be reported for as long as the query registration remains active and the events keep streaming in.
Normal SQL is definitely not capable of being executed continuously nor is it particularly suited for dealing with large volumes of real time events that we typically do not want to store in a database table. The special pattern analyzing functionaliy of CQL also is not available in SQL. However – SQL can do more in this area than you may suspect. In this article I will give a small and hardly useful demonstration of those capabilities. We will investigate a stream of Car Sighting events. A Car Sighting event consists of a timestamp and the color of the car.
The analysis of the Car Sighting data stream we will do should answer the question: report all car sightings that are followed within five cars by a car of the same color.
As SQL does not deal with data streams, we first need to create a table for the Car Sightings:
create table car_sightings ( passby_time timestamp , car_color varchar2(20) )
Note: the Oracle CEP will be able to access data in a table as if were a stream of events. Therefore the pattern detection etc. capabilities of CQL can also be applied to data that is stored in a regular table. The outcome of a CQL query can be further processed, dealing with it as if it were a stream or a relational result set. A stream can also be joined with a relational table.
Then we generate a load of random Car Sightings events – picking car colors from a list of nine colors:
insert into car_sightings ( passby_time, car_color) ( select systimestamp + 1/(24*60*60) * (rownum + 0.5 - 0.5 * dbms_random.value) , color from ( select colors.column_value color from table(string_table('red', 'yellow','green','brown','purple', 'silver', 'white','blue', 'orange')) colors ) colors cross join ( select 'x' from dual connect by level < 100 ) order by dbms_random.value )
The spread in timestamp is not very sophisticated – but it will serve our purpose.
Having the data set up, we need to write a query that fill find each car that satisfies the condition that within 5 following cars there is one with the same color. Or expressed slightly differently: a car for which holds true that the passby_time of the next car with the same color is earlier than the passby_time of the sixth following car. Stated like that, the Analytical LEAD function springs to mind: with LEAD used on the next car sighting in the partition of car sightings with the same color we get the passby_time of the next car of the same color; with LEAD used on the sixth following car we get the passby_time of that car. When we compare these two values, we can select all records where the first passby_time is earlier than second. That gives us all cars followed within five cars by a car with the same color. Quite simple once written down.
select car_color , passby_time , next_time_car_same_color , time_fifth_successor from ( select car_color , passby_time , lead(passby_time,6) over ( order by passby_time ) time_sixth_successor , lead(passby_time) over ( partition by car_color order by passby_time ) next_time_car_same_color from car_sightings ) where next_time_car_same_color < time_sixth_successor order by passby_time
The query above gives us the query results. The outer query selects all rows from the inner query where the next_time_car_same_color <= time_sixth_successor. The inner query adds to every record from CAR_SIGHTINGS the passby_time from the car sighting six rows down the line and the passby_time of the next car sighting of a car of the same color (within the partition based on car_color).