Oracle Database 12c: Find most valuable player using MATCH_RECOGNIZE in SQL

3

One of the very nice new features in SQL in Oracle Database 12c is the MATCH_RECOGNIZE operator. MATCH_RECOGNIZE adds a facility for spotting patterns in records. This allows us to locate records that are part of some kind of pattern relative to other records. It does sound similar to what Analytic Functions – most notably LAG and LEAD can do – but it is different. LAG and LEAD allow you to calculate the result of a record based on other records in the result set – but you are very limited in the ways in which you can indicate which other records in the result set are to be referenced by LAG and LEAD. Typically, it will be a fixed number of records before or after the record itself.

MATCH_RECOGNIZE allows us to have the database find a pattern – a regular expression expressed in terms of row conditions – in a far more flexible, dynamic and almost fuzzy way.

In this article, I will use this functionality to find the most valuable player in a football team (US readers: I mean the sports that you may refer to as Soccer). My definition of the MVP is the player who is most frequently part of a period of uninterrupted ball possession ending with a goal. Whether the play scores the goal, presents the assist or is involved earlier on does not matter: if he was part of the play leading up to the goal – we count the contribution. However, the ball possession ends when a player from the other team has possession of the ball.

image

Note: the same information can also be retrieved using a recursive query (start from action = G and recurse until a different team is encountered). I will provide that example in a different article.

I am also sure that some elaborate analytics with smart window definitions would probably get you there too.

We use two simple tables to record the matches and the match_events:

image

The action column holds values P (for pass – including throw in, penalty kick and corner kick), C (for causing a corner), O (for playing the ball outside the pitch) and G (for goal). Team is either A or B – A for the home team, and B for the team playing away. Player is the number on the player’s jersey, uniquely identifying the player within the team.

A few matches are created:

image

And some match events are loaded:

image

The first piece of SQL we create will return all actions by all players, identified by match, team, player_id and sequence (of the event within the match):

select me.match_id
, me.seq
, me.player
, case me.team
when 'A' then m.team_a
when 'B' then m.team_b
end team
, me.action
from match_events me
join
matches m
on (me.match_id = m.id)

image

Now time for the interesting part, the match_recognize:

image

the player actions are fed into the match_recognize. Here they are partitioned by match. Within each match, we try to find all records that are part of a pattern that consists of 0, 1 or more possession instances – defined as a row where the same team makes the action as in the first row in the pattern – followed by exactly one goal instance. Goal is defined as an action of type G performed by the same team that did the possessions.

This match_recognize query returns records like this:

image

The last step is to remove duplicates – if a player participates multiple times a string of actions leading up to a goal, this will still count as one assist – and to aggregate. The query finally becomes:

with player_actions as
( select me.match_id
, me.seq
, me.player
, case me.team
when 'A' then m.team_a
when 'B' then m.team_b
end team
, me.action
from match_events me
join
matches m
on (me.match_id = m.id)
)
select team
, player
, count(distinct match_id||'XX'||match_num) assists_and_goals
from ( select mr.match_id
, mr.team
, mr.player
, match_num
from player_actions
match_recognize
(
partition by match_id
ORDER BY seq
MEASURES MATCH_NUMBER() AS match_num
, seq as seq
ALL ROWS PER MATCH
AFTER MATCH SKIP TO LAST GOAL
PATTERN (POSSESSION+ GOAL)
DEFINE
POSSESSION AS POSSESSION.team = FIRST(POSSESSION.team),
GOAL AS GOAL.action='G' and GOAL.team = POSSESSION.team
) MR
) player_goal_contributions
group
by team
, player
order
by assists_and_goals desc

And the results are in: player number 7 from the Dutch team is the MVP based on the limited set of data available for our analysis:

image

Resources

Download SQL Scripts with all statements described in this article: mostValuablePlayer.

Some considerations with regard to measuring ball possession in football: http://leastthing.blogspot.nl/2012/02/how-ball-possession-is-measured-in.html .

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

3 Comments

  1. Hi Lucas,

    I think your use case for MATCH_RECOGNIZE is totally awesome. Great job. I have passed this link on to my pattern matching development team!

    I was wondering if you would like to do a joint presentation at this year’s OOW (2014 call for papers is coming will open shortly) on this topic? We could get data from this year’s world cup and create a really fun session around this idea of “most valuable player”. Would really like to work on this with you if you are interested.

    You can contact me directly at keith.laker@oracle.com.

    regards

    Keith
    Keith Laker | Senior Principal Product Manager | Data Warehousing and SQL Analytics
    Oracle UK Scotscroft, Towers Business Park, Wilmslow Road, Didsbury. M20 2RY

Leave a Reply