SQL Challenge: Dynamically producing a tag cloud for World Cup Football matches

0

Tag clouds are a special way to present information about a set of records. Tags are descriptive labels that are associated with specific records. A tag cloud visualizes the various labels that have been applied as well as the relative occurrence frequency of these labels. The tag cloud can be used as a filter mechanism to quickly drill down to a selected set of records: by selecting one or more tags, the set of filtered records is reduced to just the records that have all selected tags associated with them.

In this article, we look at SQL queries that help establish the tags that are associated with records of a specific nature. In this case, the records are matches played for the World Cup Football 2014 in Brazil.

Tags can be explicitly associated with records – for example by human users – and recorded in tables like TAGS and TAG_ASSOCIATIONS. Querying these associations is fairly straightforward. More interesting – as we will discuss in this article – is the dynamic derivation of tags that are to be associated with the matches. By defining conditions under which certain tag values are applicable, we can calculate tags based on the data content. If for example we determine that any match with more than four goals is tagged with the label exciting then we can derive this tag from the data describing the matches. Other tags that we can derive automatically include “comeback” (for teams that win after having been trailing their opponent), “intercontinental” (for matches between teams from different continents), “early” (for matches played at 1PM) and “north” (for matches played north of 10 degrees South lattitude).

The final result of querying the tags for the first four matches in groups A and B is shown here:

image

The tag cloud for these matches is queried as:

image

There are many components that can visualize such data. This article is not about such components. We will simply focus on retrieving the tags from the database.

The tables used in this article are shown below:

image

The DDL scripts for this table as well as the DML statements to load the relevant data can  be downloaded at the end of this article.

The central table is MATCH_RESULTS. It contains details about matches have been played – and references the TEAMS table (and indirectly the FOOTBALL_REGIONS table) as well as the STADIUMS table (and indirectly the CITIES table).

The initial data set in MATCH_RESULTS is:

image

In addition to the pure football details, there two tables to hold the explicit tag associations: TAGS and MATCH_TAGS.

image

The first query for retrieving MATCH_DETAILS and the associated tags is defined as follows:

with matches as
( select mr.id
, mr.home_goals
, mr.away_goals
, t1.country home_country
, t2.country away_country
, t1.fifa_rank home_fifa_rank
, t2.fifa_rank away_fifa_rank
, r1.name home_region
, r1.continent home_continent
, r2.name away_region
, r2.continent away_continent
, mr.local_start_time
, mr.scoring_process
, cty.lattitude
from match_results mr
join
stadiums sdm
on (mr.sdm_id = sdm.id)
join
cities cty
on (sdm.cty_id = cty.id)
join
teams t1
on ( mr.home = t1.seq_in_group
and
mr.group1 = t1.group1
)
join
teams t2
on ( mr.away= t2.seq_in_group
and
t1.group1 = t2.group1
)
join
football_regions r1
on (t1.rgn_id = r1.id)
join
football_regions r2
on (t2.rgn_id = r2.id)
)
, tagged as
(select t.tag
, mt.match_id
from match_tags mt
join
tags t
on (mt.tag_id = t.id)
)
select substr(tag, 1, 25) tag
, home_country
, away_country
from tagged
join
matches m
on (m.id = tagged.match_id)

The results from this query – with the current data set – is as follows:

image

These tags are explicitly associated with the matches. Let’s now add some tags that are dynamically derived.

First of all, we leverage an earlier article: https://technology.amis.nl/2014/06/22/sql-challenge-find-world-cup-football-matches-with-a-comeback/. In this article, I have shown how we can find matches in which a comeback took place. A comeback is defined as a team that was behind at some point in the match and then went on to win the match. The query to find such matches is discussed in this article. We can use that query together with the previous one to find both the explicitly associated tags as well as the dynamically derived “comeback” tags.

3

The result from this query is:

image

with matches as
( select mr.id
, mr.home_goals
, mr.away_goals
, t1.country home_country
, t2.country away_country
, t1.fifa_rank home_fifa_rank
, t2.fifa_rank away_fifa_rank
, r1.name home_region
, r1.continent home_continent
, r2.name away_region
, r2.continent away_continent
, mr.local_start_time
, mr.scoring_process
, cty.lattitude
from match_results mr
join
stadiums sdm
on (mr.sdm_id = sdm.id)
join
cities cty
on (sdm.cty_id = cty.id)
join
teams t1
on ( mr.home = t1.seq_in_group
and
mr.group1 = t1.group1
)
join
teams t2
on ( mr.away= t2.seq_in_group
and
t1.group1 = t2.group1
)
join
football_regions r1
on (t1.rgn_id = r1.id)
join
football_regions r2
on (t2.rgn_id = r2.id)
)
, scoring_process (match_id, intermediate_score, iteration) as
( select id, substr(scoring_process,1,1), 1
from match_results
union all
select id,substr(scoring_process,1,iteration+1), iteration+1
from match_results mr
join
scoring_process sp
on (sp.match_id = mr.id and length(scoring_process)>= iteration + 1)
)
, tagged as
(select t.tag
, mt.match_id
from match_tags mt
join
tags t
on (mt.tag_id = t.id)
union all
select 'comeback' tag
, match_id
from ( select ms.match_id
from ( select sp.*
, length(translate('1'||intermediate_score, '10','1')) - length(translate('0'||intermediate_score, '01','0')) running_score
from scoring_process sp
) ms
having max(running_score)> 0 and min(running_score) < 0
group
by ms.match_id
)
)
select substr(tag, 1, 25) tag
, home_country
, away_country
from tagged
join
matches m
on (m.id = tagged.match_id)

With this query as starting point, it is easy to add for example matches tagged with the tag surprise. We define as surprise any match where a team wins against an opponent that is ranked on the FIFA rank at least five positions higher or an team ties against an opponent that is at least ten positions higher. The additional subquery looks like this:

union all
select 'Surprise' tag
, m.id match_id
from matches m
where (m.home_goals > m.away_goals and m.home_fifa_rank - m.away_fifa_rank > 5)
or
(m.home_goals < m.away_goals and m.home_fifa_rank - m.away_fifa_rank < -5) or (m.home_goals = m.away_goals and abs(m.home_fifa_rank - m.away_fifa_rank) > 10)

The result:

image

Another tag we derive is recent which is assigned to the most recent five matches in the system:

union all
select 'recent' tag
, m.id match_id
from ( select m.*
, row_number() over (order by m.local_start_time desc) rnk
from matches m
) m
where rnk < 6
...

image

The result from the complete query that finds and calculates tags:

image

with matches as
( select mr.id
, mr.home_goals
, mr.away_goals
, t1.country home_country
, t2.country away_country
, t1.fifa_rank home_fifa_rank
, t2.fifa_rank away_fifa_rank
, r1.name home_region
, r1.continent home_continent
, r2.name away_region
, r2.continent away_continent
, mr.local_start_time
, mr.scoring_process
, cty.lattitude
from match_results mr
join
stadiums sdm
on (mr.sdm_id = sdm.id)
join
cities cty
on (sdm.cty_id = cty.id)
join
teams t1
on ( mr.home = t1.seq_in_group
and
mr.group1 = t1.group1
)
join
teams t2
on ( mr.away= t2.seq_in_group
and
t1.group1 = t2.group1
)
join
football_regions r1
on (t1.rgn_id = r1.id)
join
football_regions r2
on (t2.rgn_id = r2.id)
)
, scoring_process (match_id, intermediate_score, iteration) as
( select id, substr(scoring_process,1,1), 1
from match_results
union all
select id,substr(scoring_process,1,iteration+1), iteration+1
from match_results mr
join
scoring_process sp
on (sp.match_id = mr.id and length(scoring_process)>= iteration + 1)
)
, tagged as
(select t.tag
, mt.match_id
from match_tags mt
join
tags t
on (mt.tag_id = t.id)
union all
select 'Surprise' tag
, m.id match_id
from matches m
where (m.home_goals > m.away_goals and m.home_fifa_rank - m.away_fifa_rank > 5)
or
(m.home_goals < m.away_goals and m.home_fifa_rank - m.away_fifa_rank < -5) or (m.home_goals = m.away_goals and abs(m.home_fifa_rank - m.away_fifa_rank) > 10)
union all
select 'Intercontinental' tag
, m.id match_id
from matches m
where (m.home_continent != m.away_continent)
union all
select 'Derby' tag
, m.id match_id
from matches m
where (m.home_region = m.away_region)
union all
select case
when lattitude < 10 then 'north' when lattitude > 23 then 'south'
else 'central' end tag
, m.id match_id
from matches m
union all
select case to_char(m.LOCAL_START_TIME, 'HH24')
when '13' then 'early'
when '22' then 'late'
end tag
, m.id match_id
from matches m
where to_char(m.LOCAL_START_TIME, 'HH24') in ('13','22')
union all
select 'goalless tie' tag
, m.id match_id
from matches m
where home_goals + away_goals=0
union all
select 'recent' tag
, m.id match_id
from ( select m.*
, row_number() over (order by m.local_start_time desc) rnk
from matches m
) m
where rnk < 6 union all select 'exciting' tag , m.id match_id from matches m where home_goals + away_goals > 4
union all
select 'comeback' tag
, match_id
from ( select ms.match_id
from ( select sp.*
, length(translate('1'||intermediate_score, '10','1')) - length(translate('0'||intermediate_score, '01','0')) running_score
from scoring_process sp
) ms
having max(running_score)> 0 and min(running_score) < 0
group
by ms.match_id
)
)
select substr(tag, 1, 25) tag
, home_country
, away_country
from tagged
join
matches m
on (m.id = tagged.match_id)

The tag cloud is a list of tags and their occurrences. It looks like this:

image

and is derived with the following addition to the query:

...
select substr(tag, 1, 25) tag
, count(match_id) occurrences
from tagged
group
by substr(tag, 1, 25)

Also of interest is the set of tags that applies to individual matches. This query is used to derive these values per match , using the LISTAGG operator:

...
select home_country ||'-'||away_country lineup
, home_goals ||'-'||away_goals score
, tags
from ( select match_id
, listagg(lower(tag),',') within group (order by tag) tags
from tagged
group
by match_id
) mt
join
matches m
on (m.id = mt.match_id)

The result is:

image

Resource

Download the DDL plus DML file to create the tables and data described in this article:worldcup2014_tags .

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.

Comments are closed.