Few things lend itself so well to SQL explorations as sport related data. The imminent World Cup Football (or Soccer in some deprived parts of the world)2014 in Brazil is no exception. In this article, I will show and example of how to query the results for the matches played thusfar in each group and derive from these results the current standing in the group, using all the rules laid down by FIFA (as documented here).
After creating a table with all competing teams and assigning them to groups, we will create a query to produce all matches to be played – according to a simple algorithm. Next we create table to record the match results. Using this table, we can construct a query to retrieve the current standings in each group. The result will look something like this (note: using imaginary match results):
First create the table with the teams and their group assignment (which group and which seat within the group):
create table teams ( country varchar2(3) , group1 varchar2(1) , seq_in_group number(1) ); insert into teams (country, group1, seq_in_group) values ('BRA','A',1); insert into teams (country, group1, seq_in_group) values ('CRO','A',2); insert into teams (country, group1, seq_in_group) values ('MEX','A',3); insert into teams (country, group1, seq_in_group) values ('CMR','A',4); ...
Next, the table that describes the matches to be played in each group (the sequence of the matches):
create table match_schedule ( seq number(1) , home number(1) , away number(1) ); insert into match_schedule (seq, home, away) values (1,1,2); insert into match_schedule (seq, home, away) values (2,3,4); insert into match_schedule (seq, home, away) values (3,1,3); insert into match_schedule (seq, home, away) values (4,4,2); insert into match_schedule (seq, home, away) values (5,4,1); insert into match_schedule (seq, home, away) values (6,2,3);
From this data, we can query the matches to be played – per group and in the proper sequence (note that the matches of Spain vs Chile and The Netherlands vs Australia seem to be out of order considering the pattern that occurs in all other groups).
select ms.seq , t1.group1 , t1.country , t2.country from match_schedule ms join teams t1 on (ms.home = t1.seq_in_group) join teams t2 on ( ms.away= t2.seq_in_group and t1.group1 = t2.group1 ) order by group1 , seq
This is a straightforward join from the match_schedule table to the teams table (twice) – ensuring that both home and away teams are from the same group.
The result from this query looks like this:
Next up is the table MATCH_RESULTS where the outcomes of the matches are recorded:
create table match_results ( group1 varchar2(1) , home number(1) , away number(1) , home_goals number(2) , away_goals number(2) );
I then insert some made up records (at the time of writing, these matches have not been played yet) in to this table.
insert into match_results ( group1 , home, away, home_goals, away_goals) values ('A',1,2,2,1); insert into match_results ( group1 , home, away, home_goals, away_goals) values ('A',3,4,1,3); insert into match_results ( group1 , home, away, home_goals, away_goals) values ('A',1,3,0,0); insert into match_results ( group1 , home, away, home_goals, away_goals) values ('A',4,2,3,1); insert into match_results ( group1 , home, away, home_goals, away_goals) values ('B',1,2,1,1); insert into match_results ( group1 , home, away, home_goals, away_goals) values ('B',3,4,2,0); insert into match_results ( group1 , home, away, home_goals, away_goals) values ('B',1,3,3,1); insert into match_results ( group1 , home, away, home_goals, away_goals) values ('B',4,2,0,3);
These results can be queried with this SQL statement:
select mr.group1 "Group" , t1.country||'-'||t2.country match , mr.home_goals||'-'||away_goals score from match_results mr join teams t1 on (mr.home = t1.seq_in_group and mr.group1 = t1.group1) join teams t2 on ( mr.away= t2.seq_in_group and t1.group1 = t2.group1 ) join match_schedule ms on (ms.home = mr.home and ms.away = mr.away ) order by mr.group1 , ms.seq
and displayed like this:
Finally, taking the match results, it is possible to query the group standings. We know that when a team has scored more goals than the opposing side, it will add three points to its total and when it plays a draw, it will add a single point. When two teams have the same number of match points, their tie is resolved based on goal balance (difference between goals scored and goals conceded) and if that does not resolve the tie the total number of goals scored decides. There are rules beyond this – to resolve still further ties. I have not worked them into the following query:
with results as ( select mr.group1 , mr.home , mr.home_goals , mr.away , mr.away_goals from match_results mr union all select mr.group1 , mr.away , mr.away_goals , mr.home , mr.home_goals from match_results mr ) , team_results as ( select group1 , home team , count(*) over (partition by group1, home) games_played , sum( case when home_goals > away_goals then 3 when away_goals > home_goals then 0 else 1 end ) over (partition by group1, home) points , sum( home_goals) over (partition by group1, home) goals_scored , sum( away_goals) over (partition by group1, home) goals_conceded from results ) select distinct tr.group1 , t.country , games_played , points , goals_scored-goals_conceded goal_difference , goals_scored from team_results tr join teams t on (tr.team = t.seq_in_group and tr.group1 = t.group1 ) order by tr.group1 , points desc , goal_difference desc , goals_scored desc
The results from this query (again, quite imaginary):
The next step in my quest for … I do not quite know what could be the exploration of the matches that still have to be played and the possible [what-if] scenarios that can play out given the current results.
Resources
The scripts for this article: WORLDCUP2014_queryStandings.
The rules laid down by FIFA for determining the group ranking : documented here.
Dimitri Gielis’ APEX application for World Cup 2014 Challenge: http://dgielis.blogspot.nl/2014/05/world-cup-2014-apex-betting-app-online.html
Once the World Cup starts, the match results should be available somewhere in XML format (RSS feeds maybe). We could then write a dbms_scheduler job which could get the match results through the utl_http calls, parse the resulting XML and populate the match_results table. Voila, a real-time updated scoreboard!!