Oracle SQL Pattern Recognition – introducing the Match Recognize operator

0

imageOracle Database Release 12c (12.1) introduced a new operator that we can use in SQL queries. The MATCH_RECOGNIZE operator allows us to detect patterns in our relational data. Specifically: it allows us to identify records that mark the beginning of a set of records that together form a pattern. In this set, each record satisifies a certain condition. The fact that in a set of records these conditions are met – in a certain order – is proof of the pattern.

The condition a record adheres to can be as simple as: value in column is higher than value in previous record. And a pattern can then be as simple as: for at least three subsequent records is the value increasing. Or the condition is: “value in a certain column is equal to a constant” and the pattern equals “at least 6 and no more than 9 subsequent records all have that constant’s value”. And of course the conditions can be far more variant and the patterns much more complex.

The conditions could be:

* S = …

* O = —

With the filter: Find the pattern SOS – or: find three subsequent records for which the 1st and last matches condition S and the 2nd one satisfies condition O.

One common element among all pattern matching in relational data is that the data needs to be ordered in some specific order. This can be by time (very common) but also along other axes – such as distance from a certain position, altitude or price [range] or age [category].

image

In the Oracle OpenWorld 2016 session UGF 2630 Database ACES Share Their Favorite Database Things: Part I and II I have presented on this SQL Match Recognize feature. I had all of 5 minutes to tell my story (that was the nature of this session). Which was a bit of a challenge. You will find my slides below.

My main example was based on a well known painting:

image

 

It is called: who is afraid of Red, Yellow and Blue. It inspired me to the following challenge “Art in the Carpark”: do we have this pattern (red, yellow, blue) in our carpark? Do we have a sequence of three cars that have exactly these colors – red, yellow and blue?

I work with (made up) relational data in the following structure:

image

 

This particular challenge can just as easily be addressed using analytical functions (LAG or LEAD) as with MATCH_RECOGNIZE. The straightforward solution with LEAD:

image

and the corresponding solution using MATCH_RECOGNIZE – that certainly does not look any simpler at first glance. And frankly – it is not simpler. However, as we will soon see, it much more versatile and rich.

image

Suppose the rules for the query slightly change. For example: We would also recognize the pattern if there were two or three or more red cars followed by one ore more yellow cars followed by at least one and possibly more blue cars. And let’s also assume that white cars do not count. There can be any number of white cars between the red and the yellow or between the yellow and the blue cars, without disturbing the pattern. The following car sequence will still fit the pattern:

image

At this moment, we can start to appreciate the power of MATCH_RECOGNIZE.Extending the LEAD based solution to handle the extended rules for the RED YELLOW BLUE pattern is far from trivial. I have not given it a try – and using some aggregation partition by car_color I suspect is not extremely hard. But using MATCH_RECOGNIZE it is far easier to create a solution that is close to the requirement. By expressing the pattern definition as a regular expression: RED+ WHITE* YELLOW+ WHITE* BLUE+ (which is just the REGEXP equivalent of the rules stated before) we can use MATCH_RECOGNIZE to find the art in the car park like this:

image

This is hardly any more complex than the previous query – two small changes were all that was required to adapt the query to the updated requirement. The query returns from my demo data the following result:

image

 

The slides shown below provide more details on how to construct the regular expressions and which constructs and functions are available to create the conditions – including aggregation, cross row lookups. Even more details are in the documentation (http://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956 ) and in many excellent blog articles by Stew Ashton (https://stewashton.wordpress.com/category/match_recognize/ ).

Code Samples:

with look_ahead_cars as
( SELECT c.* -- for each car, find next and next after that
  ,      lead(car_color,1) over (order by parking_space) next_color
  ,      lead(car_color,2) over (order by parking_space) sec_nxt_color
  FROM   parked_cars  c
)
select  parking_space
from    look_ahead_cars
where   car_color ='red'     –- for each red car
and     next_color ='yellow' -- check if next is yellow
and     sec_nxt_color='blue' –- and the one after that is blue
;

SELECT * -- produces columns from parked_cars and from match_recognize
FROM   parked_cars -- record set to find pattern in
MATCH_RECOGNIZE 
(
  ORDER BY parking_space -- specify ordering of records for pattern
  MEASURES RED.parking_space AS red_space -- results to be produced
  ,        MATCH_NUMBER() AS match_num -- umptieth match
  ALL ROWS PER MATCH –- all records in pattern or only the first
  PATTERN (RED YELLOW BLUE) –- the pattern to locate
  DEFINE –- row conditions to be used in pattern
  RED    AS RED.car_color    = 'red', –- identify row with red car
  YELLOW AS YELLOW.car_color = 'yellow', –- locate row with yellow car
  BLUE   AS BLUE.car_color   = 'blue' –- record with blue car
) MR
ORDER 
BY     MR.red_space
,      MR.parking_space
;

-- more advanced requirement
SELECT * -- produces columns from parked_cars and from match_recognize
FROM   parked_cars -- record set to find pattern in
MATCH_RECOGNIZE 
(
  ORDER BY parking_space -- specify ordering of records for pattern
  MEASURES RED.parking_space AS red_space -- results to be produced
  ,        MATCH_NUMBER() AS match_num -- umptieth match
  ALL ROWS PER MATCH –- all records in pattern or only the first
  PATTERN (RED+ WHITE* YELLOW+ WHITE* BLUE+)
  DEFINE –- row conditions to be used in pattern
  RED    AS RED.car_color    ='red', –- match on row with red car
  YELLOW AS YELLOW.car_color ='yellow', –- match on yellow car
  BLUE   AS BLUE.car_color   ='blue'–- match on blue car 
  WHITE  AS WHITE.car_color  ='white'–- match on white car
) MR
ORDER 
BY     MR.red_space
,      MR.parking_space



The slides from this presentation (embedded from SlideShare):

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.