Oracle Database 12c: Find most valuable player using MATCH_RECOGNIZE in SQL image46

Oracle Database 12c: Find most valuable player using MATCH_RECOGNIZE in SQL

One of the very nice new features in SQL in Oracle Database 12c is the MATCH_RECOGNIZE operator. MATCH_RECOGNIZE adds a facility for spotting patterns in records. This allows us to locate records that are part of some kind of pattern relative to other records. It does sound similar to what Analytic Functions – most notably LAG and LEAD can do – but it is different. LAG and LEAD allow you to calculate the result of a record based on other records in the result set – but you are very limited in the ways in which you can indicate which other records in the result set are to be referenced by LAG and LEAD. Typically, it will be a fixed number of records before or after the record itself.

MATCH_RECOGNIZE allows us to have the database find a pattern – a regular expression expressed in terms of row conditions – in a far more flexible, dynamic and almost fuzzy way.

In this article, I will use this functionality to find the most valuable player in a football team (US readers: I mean the sports that you may refer to as Soccer). My definition of the MVP is the player who is most frequently part of a period of uninterrupted ball possession ending with a goal. Whether the play scores the goal, presents the assist or is involved earlier on does not matter: if he was part of the play leading up to the goal – we count the contribution. However, the ball possession ends when a player from the other team has possession of the ball.

image

Note: the same information can also be retrieved using a recursive query (start from action = G and recurse until a different team is encountered). I will provide that example in a different article.

I am also sure that some elaborate analytics with smart window definitions would probably get you there too.

We use two simple tables to record the matches and the match_events:

image

The action column holds values P (for pass – including throw in, penalty kick and corner kick), C (for causing a corner), O (for playing the ball outside the pitch) and G (for goal). Team is either A or B – A for the home team, and B for the team playing away. Player is the number on the player’s jersey, uniquely identifying the player within the team.

A few matches are created:

image

And some match events are loaded:

image

The first piece of SQL we create will return all actions by all players, identified by match, team, player_id and sequence (of the event within the match):

select me.match_id
, me.seq
, me.player
, case me.team
when 'A' then m.team_a
when 'B' then m.team_b
end team
, me.action
from match_events me
join
matches m
on (me.match_id = m.id)

image

Now time for the interesting part, the match_recognize:

image

the player actions are fed into the match_recognize. Here they are partitioned by match. Within each match, we try to find all records that are part of a pattern that consists of 0, 1 or more possession instances – defined as a row where the same team makes the action as in the first row in the pattern – followed by exactly one goal instance. Goal is defined as an action of type G performed by the same team that did the possessions.

This match_recognize query returns records like this:

image

The last step is to remove duplicates – if a player participates multiple times a string of actions leading up to a goal, this will still count as one assist – and to aggregate. The query finally becomes:

with player_actions as
( select me.match_id
, me.seq
, me.player
, case me.team
when 'A' then m.team_a
when 'B' then m.team_b
end team
, me.action
from match_events me
join
matches m
on (me.match_id = m.id)
)
select team
, player
, count(distinct match_id||'XX'||match_num) assists_and_goals
from ( select mr.match_id
, mr.team
, mr.player
, match_num
from player_actions
match_recognize
(
partition by match_id
ORDER BY seq
MEASURES MATCH_NUMBER() AS match_num
, seq as seq
ALL ROWS PER MATCH
AFTER MATCH SKIP TO LAST GOAL
PATTERN (POSSESSION+ GOAL)
DEFINE
POSSESSION AS POSSESSION.team = FIRST(POSSESSION.team),
GOAL AS GOAL.action='G' and GOAL.team = POSSESSION.team
) MR
) player_goal_contributions
group
by team
, player
order
by assists_and_goals desc

And the results are in: player number 7 from the Dutch team is the MVP based on the limited set of data available for our analysis:

image

Resources

Download SQL Scripts with all statements described in this article: mostValuablePlayer.

Some considerations with regard to measuring ball possession in football: http://leastthing.blogspot.nl/2012/02/how-ball-possession-is-measured-in.html .

3 Comments

  1. Keith Laker March 14, 2014
  2. connormcdonald July 25, 2013
    • Lucas Jellema July 25, 2013