Oracle Database Release 12c (12.1) introduced a new operator that we can use in SQL queries. The MATCH_RECOGNIZE operator allows us to detect patterns in our relational data. Specifically: it allows us to identify records that mark the beginning of a set of records that together form a pattern. In this set, each record satisifies a certain condition. The fact that in a set of records these conditions are met – in a certain order – is proof of the pattern.
The condition a record adheres to can be as simple as: value in column is higher than value in previous record. And a pattern can then be as simple as: for at least three subsequent records is the value increasing. Or the condition is: “value in a certain column is equal to a constant” and the pattern equals “at least 6 and no more than 9 subsequent records all have that constant’s value”. And of course the conditions can be far more variant and the patterns much more complex.
The conditions could be:
* S = …
* O = —
With the filter: Find the pattern SOS – or: find three subsequent records for which the 1st and last matches condition S and the 2nd one satisfies condition O.
One common element among all pattern matching in relational data is that the data needs to be ordered in some specific order. This can be by time (very common) but also along other axes – such as distance from a certain position, altitude or price [range] or age [category].
In the Oracle OpenWorld 2016 session UGF 2630 Database ACES Share Their Favorite Database Things: Part I and II I have presented on this SQL Match Recognize feature. I had all of 5 minutes to tell my story (that was the nature of this session). Which was a bit of a challenge. You will find my slides below.
My main example was based on a well known painting:
It is called: who is afraid of Red, Yellow and Blue. It inspired me to the following challenge “Art in the Carpark”: do we have this pattern (red, yellow, blue) in our carpark? Do we have a sequence of three cars that have exactly these colors – red, yellow and blue?
I work with (made up) relational data in the following structure:
This particular challenge can just as easily be addressed using analytical functions (LAG or LEAD) as with MATCH_RECOGNIZE. The straightforward solution with LEAD:
and the corresponding solution using MATCH_RECOGNIZE – that certainly does not look any simpler at first glance. And frankly – it is not simpler. However, as we will soon see, it much more versatile and rich.
Suppose the rules for the query slightly change. For example: We would also recognize the pattern if there were two or three or more red cars followed by one ore more yellow cars followed by at least one and possibly more blue cars. And let’s also assume that white cars do not count. There can be any number of white cars between the red and the yellow or between the yellow and the blue cars, without disturbing the pattern. The following car sequence will still fit the pattern:
At this moment, we can start to appreciate the power of MATCH_RECOGNIZE.Extending the LEAD based solution to handle the extended rules for the RED YELLOW BLUE pattern is far from trivial. I have not given it a try – and using some aggregation partition by car_color I suspect is not extremely hard. But using MATCH_RECOGNIZE it is far easier to create a solution that is close to the requirement. By expressing the pattern definition as a regular expression: RED+ WHITE* YELLOW+ WHITE* BLUE+ (which is just the REGEXP equivalent of the rules stated before) we can use MATCH_RECOGNIZE to find the art in the car park like this:
This is hardly any more complex than the previous query – two small changes were all that was required to adapt the query to the updated requirement. The query returns from my demo data the following result:
The slides shown below provide more details on how to construct the regular expressions and which constructs and functions are available to create the conditions – including aggregation, cross row lookups. Even more details are in the documentation (http://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956 ) and in many excellent blog articles by Stew Ashton (https://stewashton.wordpress.com/category/match_recognize/ ).
with look_ahead_cars as ( SELECT c.* -- for each car, find next and next after that , lead(car_color,1) over (order by parking_space) next_color , lead(car_color,2) over (order by parking_space) sec_nxt_color FROM parked_cars c ) select parking_space from look_ahead_cars where car_color ='red' –- for each red car and next_color ='yellow' -- check if next is yellow and sec_nxt_color='blue' –- and the one after that is blue ; SELECT * -- produces columns from parked_cars and from match_recognize FROM parked_cars -- record set to find pattern in MATCH_RECOGNIZE ( ORDER BY parking_space -- specify ordering of records for pattern MEASURES RED.parking_space AS red_space -- results to be produced , MATCH_NUMBER() AS match_num -- umptieth match ALL ROWS PER MATCH –- all records in pattern or only the first PATTERN (RED YELLOW BLUE) –- the pattern to locate DEFINE –- row conditions to be used in pattern RED AS RED.car_color = 'red', –- identify row with red car YELLOW AS YELLOW.car_color = 'yellow', –- locate row with yellow car BLUE AS BLUE.car_color = 'blue' –- record with blue car ) MR ORDER BY MR.red_space , MR.parking_space ; -- more advanced requirement SELECT * -- produces columns from parked_cars and from match_recognize FROM parked_cars -- record set to find pattern in MATCH_RECOGNIZE ( ORDER BY parking_space -- specify ordering of records for pattern MEASURES RED.parking_space AS red_space -- results to be produced , MATCH_NUMBER() AS match_num -- umptieth match ALL ROWS PER MATCH –- all records in pattern or only the first PATTERN (RED+ WHITE* YELLOW+ WHITE* BLUE+) DEFINE –- row conditions to be used in pattern RED AS RED.car_color ='red', –- match on row with red car YELLOW AS YELLOW.car_color ='yellow', –- match on yellow car BLUE AS BLUE.car_color ='blue'–- match on blue car WHITE AS WHITE.car_color ='white'–- match on white car ) MR ORDER BY MR.red_space , MR.parking_space
The slides from this presentation (embedded from SlideShare):