One of the very nice new features in SQL in Oracle Database 12c is the MATCH_RECOGNIZE operator. MATCH_RECOGNIZE adds a facility for spotting patterns in records. This allows us to locate records that are part of some kind of pattern relative to other records. It does sound similar to what Analytic Functions – most notably LAG and LEAD can do – but it is different. LAG and LEAD allow you to calculate the result of a record based on other records in the result set – but you are very limited in the ways in which you can indicate which other records in the result set are to be referenced by LAG and LEAD. Typically, it will be a fixed number of records before or after the record itself.
MATCH_RECOGNIZE allows us to have the database find a pattern – a regular expression expressed in terms of row conditions – in a far more flexible, dynamic and almost fuzzy way.
In this article, I will use this functionality to find the most valuable player in a football team (US readers: I mean the sports that you may refer to as Soccer). My definition of the MVP is the player who is most frequently part of a period of uninterrupted ball possession ending with a goal. Whether the play scores the goal, presents the assist or is involved earlier on does not matter: if he was part of the play leading up to the goal – we count the contribution. However, the ball possession ends when a player from the other team has possession of the ball.
Note: the same information can also be retrieved using a recursive query (start from action = G and recurse until a different team is encountered). I will provide that example in a different article.
I am also sure that some elaborate analytics with smart window definitions would probably get you there too.
We use two simple tables to record the matches and the match_events:
The action column holds values P (for pass – including throw in, penalty kick and corner kick), C (for causing a corner), O (for playing the ball outside the pitch) and G (for goal). Team is either A or B – A for the home team, and B for the team playing away. Player is the number on the player’s jersey, uniquely identifying the player within the team.
A few matches are created:
And some match events are loaded:
The first piece of SQL we create will return all actions by all players, identified by match, team, player_id and sequence (of the event within the match):
select me.match_id , me.seq , me.player , case me.team when 'A' then m.team_a when 'B' then m.team_b end team , me.action from match_events me join matches m on (me.match_id = m.id)
Now time for the interesting part, the match_recognize:
the player actions are fed into the match_recognize. Here they are partitioned by match. Within each match, we try to find all records that are part of a pattern that consists of 0, 1 or more possession instances – defined as a row where the same team makes the action as in the first row in the pattern – followed by exactly one goal instance. Goal is defined as an action of type G performed by the same team that did the possessions.
This match_recognize query returns records like this:
The last step is to remove duplicates – if a player participates multiple times a string of actions leading up to a goal, this will still count as one assist – and to aggregate. The query finally becomes:
with player_actions as ( select me.match_id , me.seq , me.player , case me.team when 'A' then m.team_a when 'B' then m.team_b end team , me.action from match_events me join matches m on (me.match_id = m.id) ) select team , player , count(distinct match_id||'XX'||match_num) assists_and_goals from ( select mr.match_id , mr.team , mr.player , match_num from player_actions match_recognize ( partition by match_id ORDER BY seq MEASURES MATCH_NUMBER() AS match_num , seq as seq ALL ROWS PER MATCH AFTER MATCH SKIP TO LAST GOAL PATTERN (POSSESSION+ GOAL) DEFINE POSSESSION AS POSSESSION.team = FIRST(POSSESSION.team), GOAL AS GOAL.action='G' and GOAL.team = POSSESSION.team ) MR ) player_goal_contributions group by team , player order by assists_and_goals desc
And the results are in: player number 7 from the Dutch team is the MVP based on the limited set of data available for our analysis:
Resources
Download SQL Scripts with all statements described in this article: mostValuablePlayer.
Some considerations with regard to measuring ball possession in football: http://leastthing.blogspot.nl/2012/02/how-ball-possession-is-measured-in.html .
Hi Lucas,
I think your use case for MATCH_RECOGNIZE is totally awesome. Great job. I have passed this link on to my pattern matching development team!
I was wondering if you would like to do a joint presentation at this year’s OOW (2014 call for papers is coming will open shortly) on this topic? We could get data from this year’s world cup and create a really fun session around this idea of “most valuable player”. Would really like to work on this with you if you are interested.
You can contact me directly at keith.laker@oracle.com.
regards
Keith
Keith Laker | Senior Principal Product Manager | Data Warehousing and SQL Analytics
Oracle UK Scotscroft, Towers Business Park, Wilmslow Road, Didsbury. M20 2RY
I think there might be a bug in the SQL….because I’ve never seen a dutch player be most valuable
hee hee hee
If you cannot beat the data, fiddle with the query…. 😉
Lucas