SQL Challenge: Find World Cup Football matches with a comeback image300

SQL Challenge: Find World Cup Football matches with a comeback

As I recently said in another blog post, the World Cup Football is such a wonderful source of data that begs to be digested using SQL in many different ways. While working on a tag cloud producer – topic for a subsequent article – I ran into a special challenge. It is easy to describe, yet not so easy to resolve. Or at least, it took me some time to come up with this approach. I would like to invite any reader who has a suggestion as to how this can be resolved.

For my tag cloud regarding the matches in the World Cup tournament, I want to identify those matches that has a comeback in them. That is: matches where the winner has comeback from trailing the opponent . For example: The Netherland were 0-1 behind against Spain (after the penalty kick) and went on to win the game. So this match Spain-The Netherlands has a comeback.

I have made use of a compact notation of the way the scoring process went,  using a string with 0 and 1 characters. For example: ‘011111’ describes the Spain-Netherlands match. The zero indicates a goal for the home team (Spain), each 1 represents 1 goal for the visitors (The Netherlands). ‘1000’ is the scoring process in Brazil vs Croatia – with Croatia coming into the lead before conceding the match.

To find the matches with a comeback in them, I have to find the strings that somewhere midway have more ones than zeros but in the end have more zeros than ones – or vice versa.

The raw data looks like this:

image

with the Spain-Netherlands match highlighted. It reads_ match number 3, group B, number 1 in the group playing at home vs the number two in the group playing away.

My approach hinges on two things:

  • use of recursive subquery factoring to create records for each of the intermediate scores in each of the matches (so Spain vs Netherlands produces six rows for 1-0 or ‘0’, 1-1 or ‘01’, 1-2 or ‘011’, 1-3 or ‘0111’, 1-4 or ‘01111’ and 1-5 or ‘011111’respectively)
  • use of translate and length to count the number of 1 and 0 characters in a string (note: translate(‘01011’, ‘10’,’1’) results in ‘111’ and translate(‘01011’, ‘01’,’0’) results in ‘00’ )

The first step in creating the SQL query that finds all matches with a comeback in them:

using a recursive subquery, this query retrieves all substrings – and therefore all intermediate scores – from the final score

with scoring_process (match_id, intermediate_score, iteration) as
( select id, substr(scoring_process,1,1), 1
  from   match_results
  union all
  select id,substr(scoring_process,1,iteration+1), iteration+1
  from   match_results mr
         join
         scoring_process sp
         on (sp.match_id = mr.id and length(scoring_process)>= iteration + 1)
)
select *
from   scoring_process

image

The next step calculates the running_score: the difference between the number of home and away goals – using the translate operator that replaces every occurrence in the source string of the indicated character with its indicated replacement. When no replacement is specified for a character, it is simply not included in the resulting string. Note: the leading ‘0’ and ‘1’ are added because length(translate(‘000’,’10’, ‘1’)) results in NULL. I could have used NVL instead of adding the extra character.

with scoring_process (match_id, intermediate_score, iteration) as
( select id, substr(scoring_process,1,1), 1
  from   match_results
  union all
  select id,substr(scoring_process,1,iteration+1), iteration+1
  from   match_results mr
         join
         scoring_process sp
         on (sp.match_id = mr.id and length(scoring_process)>= iteration + 1)
)
, match_stages as
( select sp.*
  ,      length(translate('1'||intermediate_score, '10','1')) - length(translate('0'||intermediate_score, '01','0')) running_score
  from   scoring_process sp
)         
select *
from   match_stages

image

To find the matches with a true comeback in them, I need to identify the match_id’s that at one point in time had a negative running_score and at some other moment a positive running_score.

Using simple aggregating (MAX and MIN) along with HAVING I can find the match_id that have running_scores on opposite sides of zero.

with scoring_process (match_id, intermediate_score, iteration) as
( select id, substr(scoring_process,1,1), 1
  from   match_results
  union all
  select id,substr(scoring_process,1,iteration+1), iteration+1
  from   match_results mr
         join
         scoring_process sp
         on (sp.match_id = mr.id and length(scoring_process)>= iteration + 1)
)
, match_stages as
( select sp.*
  ,      length(translate('1'||intermediate_score, '10','1')) - length(translate('0'||intermediate_score, '01','0')) running_score
  from   scoring_process sp
)         
select ms.match_id
from   match_stages ms
having max(running_score)> 0 and min(running_score) < 0
group
by     ms.match_id
&#91;/code&#93;

<a href="https://technology.amis.nl/wp-content/uploads/2014/06/image303.png"><img style="background-image: none; padding-top: 0px; padding-left: 0px; display: inline; padding-right: 0px; border: 0px;" title="image" src="https://technology.amis.nl/wp-content/uploads/2014/06/image_thumb312.png" alt="image" width="86" height="97" border="0" /></a>

(note: I have only used the first 4 matches in Groups A and B. Surprising that just these 8 matches have three comebacks in them.)

The next level of challenge would be to find ‘dramatic comebacks’ where a team was two goals behind at some stage and yet managed to win the game. Any suggestions for resolving that one?

The script to create the table and the data:


[code language='SQL']
CREATE TABLE "MATCH_RESULTS"
   (    "GROUP1" VARCHAR2(1 BYTE),
    "HOME" NUMBER(1,0),
    "AWAY" NUMBER(1,0),
    "HOME_GOALS" NUMBER(2,0),
    "AWAY_GOALS" NUMBER(2,0),
    "SDM_ID" NUMBER(2,0),
    "LOCAL_START_TIME" DATE,
    "SCORING_PROCESS" VARCHAR2(20 BYTE),
    "WEATHER_CATEGORY" VARCHAR2(20 BYTE),
    "ID" NUMBER(2,0)
   );

Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('A',1,2,3,1,5,to_timestamp('12-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'1000',null,1);
Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('A',3,4,1,0,3,to_timestamp('13-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'0',null,2);
Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('A',1,3,0,0,6,to_timestamp('17-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'0',null,17);
Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('A',4,2,0,4,2,to_timestamp('18-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'1111',null,18);
Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('B',1,2,1,5,12,to_timestamp('13-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'011111',null,3);
Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('B',3,4,3,1,7,to_timestamp('13-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'0010',null,4);
Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('B',1,3,0,2,1,to_timestamp('18-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'11',null,19);
Insert into MATCH_RESULTS (GROUP1,HOME,AWAY,HOME_GOALS,AWAY_GOALS,SDM_ID,LOCAL_START_TIME,SCORING_PROCESS,WEATHER_CATEGORY,ID) values ('B',4,2,2,3,10,to_timestamp('18-JUN-14','DD-MON-RR HH.MI.SSXFF AM'),'10011',null,20);

4 Comments

  1. Vinod Gowda March 10, 2017
  2. stewashton June 27, 2014
  3. stew ashton June 23, 2014
  4. Anton Scheffer June 23, 2014