Simple Event Processing – Using SQL with Analytical Functions to detect traffic patterns
One of the intriguing subjects at last weeks Oracle Open World conference for me was Complex Event Processing. The ability to process streams of real time events, and up to 1000s per second of them, opens up many new avenues of interesting opportunities. Detecting patterns and anomalies, aggregating and comparing – reporting only the noteworthy events or findings, thereby making sense (information) of enormous volumes of data is both great fun and tremendously useful in specific situations. The SQL like CQL language – Continuous Query Language – provides us with a syntax to analyze data streams in various ways – detecting trends and patterns, variations and aberrations. There are several key characteristics in the CEP architecture: handling large volumes of events, applying this special language for event analysis and executing the queries continuously. That last bit means that the query is registered with a stream of events, not executed at a single moment in time against the data as it is at that moment but looking for query results continously as the events keep streaming in. Query results can be reported for as long as the query registration remains active and the events keep streaming in.
Normal SQL is definitely not capable of being executed continuously nor is it particularly suited for dealing with large volumes of real time events that we typically do not want to store in a database table. The special pattern analyzing functionaliy of CQL also is not available in SQL. However – SQL can do more in this area than you may suspect. In this article I will give a small and hardly useful demonstration of those capabilities. We will investigate a stream of Car Sighting events. A Car Sighting event consists of a timestamp and the color of the car.
The analysis of the Car Sighting data stream we will do should answer the question: report all car sightings that are followed within five cars by a car of the same color.
As SQL does not deal with data streams, we first need to create a table for the Car Sightings:
create table car_sightings ( passby_time timestamp , car_color varchar2(20) )
Note: the Oracle CEP will be able to access data in a table as if were a stream of events. Therefore the pattern detection etc. capabilities of CQL can also be applied to data that is stored in a regular table. The outcome of a CQL query can be further processed, dealing with it as if it were a stream or a relational result set. A stream can also be joined with a relational table.
Then we generate a load of random Car Sightings events – picking car colors from a list of nine colors:
insert into car_sightings ( passby_time, car_color) ( select systimestamp + 1/(24*60*60) * (rownum + 0.5 - 0.5 * dbms_random.value) , color from ( select colors.column_value color from table(string_table('red', 'yellow','green','brown','purple', 'silver', 'white','blue', 'orange')) colors ) colors cross join ( select 'x' from dual connect by level < 100 ) order by dbms_random.value )
The spread in timestamp is not very sophisticated – but it will serve our purpose.
Having the data set up, we need to write a query that fill find each car that satisfies the condition that within 5 following cars there is one with the same color. Or expressed slightly differently: a car for which holds true that the passby_time of the next car with the same color is earlier than the passby_time of the sixth following car. Stated like that, the Analytical LEAD function springs to mind: with LEAD used on the next car sighting in the partition of car sightings with the same color we get the passby_time of the next car of the same color; with LEAD used on the sixth following car we get the passby_time of that car. When we compare these two values, we can select all records where the first passby_time is earlier than second. That gives us all cars followed within five cars by a car with the same color. Quite simple once written down.
select car_color , passby_time , next_time_car_same_color , time_fifth_successor from ( select car_color , passby_time , lead(passby_time,6) over ( order by passby_time ) time_sixth_successor , lead(passby_time) over ( partition by car_color order by passby_time ) next_time_car_same_color from car_sightings ) where next_time_car_same_color < time_sixth_successor order by passby_time
The query above gives us the query results. The outer query selects all rows from the inner query where the next_time_car_same_color <= time_sixth_successor. The inner query adds to every record from CAR_SIGHTINGS the passby_time from the car sighting six rows down the line and the passby_time of the next car sighting of a car of the same color (within the partition based on car_color).
- Putting Analytical Functions to good use – find tables with multiple foreign keys to the same referenced table
- Introduction to Analytical Functions – Use of Analytics as alternative for Simple Aggregations
- SQL Question: Find overlap in periods – using Analytical Functions (LAG and LEAD)
- The Next Value – How much higher is my salary than the next in (Dense) Rank – Oracle Analytical Functions
- Analytical SQL Functions – theory and examples – Part 1 on the Partition Clause
- ADF: (re-)Introducing Contextual Events in several simple steps
- Oracle SQL – Finding free adjacent seats in an airplane, using Subquery Factoring, Analytical Functions (LISTAGG and LEAD), Outer Join, PIVOT and good old INSTR
- Oracle SQL – spotting combinations that occur and those that do not – demonstrating Analytical Functions, Outer Join and SubQuery Factoring
- Oracle SQL: Using subquery factoring in an INSERT statement
- OTN Yathra 2013 – Spreading the story of Oracle across India – (Half time)
- ADF DVT Speed Date: Present Metrics per Year, Quarter and Month using a zoom-enabled ADF DVT Resource Utilization Gantt and ADF BC
- Using TRUNC in SQL to get the first date in a period
- ADF DVT: Visualizing valid periods using Project and Scheduling Gantt Charts
- Advanced SQL to find valid periods – juggling with outer joins, running totals and analytical functions
- The Very Very Latest in Database Development – slides from the Expertezed presentation