Tag clouds are a special way to present information about a set of records. Tags are descriptive labels that are associated with specific records. A tag cloud visualizes the various labels that have been applied as well as the relative occurrence frequency of these labels. The tag cloud can be used as a filter mechanism to quickly drill down to a selected set of records: by selecting one or more tags, the set of filtered records is reduced to just the records that have all selected tags associated with them.
In this article, we look at SQL queries that help establish the tags that are associated with records of a specific nature. In this case, the records are matches played for the World Cup Football 2014 in Brazil.
Tags can be explicitly associated with records – for example by human users – and recorded in tables like TAGS and TAG_ASSOCIATIONS. Querying these associations is fairly straightforward. More interesting – as we will discuss in this article – is the dynamic derivation of tags that are to be associated with the matches. By defining conditions under which certain tag values are applicable, we can calculate tags based on the data content. If for example we determine that any match with more than four goals is tagged with the label exciting then we can derive this tag from the data describing the matches. Other tags that we can derive automatically include “comeback” (for teams that win after having been trailing their opponent), “intercontinental” (for matches between teams from different continents), “early” (for matches played at 1PM) and “north” (for matches played north of 10 degrees South lattitude).
The final result of querying the tags for the first four matches in groups A and B is shown here:
The tag cloud for these matches is queried as:
There are many components that can visualize such data. This article is not about such components. We will simply focus on retrieving the tags from the database.
The tables used in this article are shown below:
The DDL scripts for this table as well as the DML statements to load the relevant data can be downloaded at the end of this article.
The central table is MATCH_RESULTS. It contains details about matches have been played – and references the TEAMS table (and indirectly the FOOTBALL_REGIONS table) as well as the STADIUMS table (and indirectly the CITIES table).
The initial data set in MATCH_RESULTS is:
In addition to the pure football details, there two tables to hold the explicit tag associations: TAGS and MATCH_TAGS.
The first query for retrieving MATCH_DETAILS and the associated tags is defined as follows:
with matches as ( select mr.id , mr.home_goals , mr.away_goals , t1.country home_country , t2.country away_country , t1.fifa_rank home_fifa_rank , t2.fifa_rank away_fifa_rank , r1.name home_region , r1.continent home_continent , r2.name away_region , r2.continent away_continent , mr.local_start_time , mr.scoring_process , cty.lattitude from match_results mr join stadiums sdm on (mr.sdm_id = sdm.id) join cities cty on (sdm.cty_id = cty.id) join teams t1 on ( mr.home = t1.seq_in_group and mr.group1 = t1.group1 ) join teams t2 on ( mr.away= t2.seq_in_group and t1.group1 = t2.group1 ) join football_regions r1 on (t1.rgn_id = r1.id) join football_regions r2 on (t2.rgn_id = r2.id) ) , tagged as (select t.tag , mt.match_id from match_tags mt join tags t on (mt.tag_id = t.id) ) select substr(tag, 1, 25) tag , home_country , away_country from tagged join matches m on (m.id = tagged.match_id)
The results from this query – with the current data set – is as follows:
These tags are explicitly associated with the matches. Let’s now add some tags that are dynamically derived.
First of all, we leverage an earlier article: https://technology.amis.nl/2014/06/22/sql-challenge-find-world-cup-football-matches-with-a-comeback/. In this article, I have shown how we can find matches in which a comeback took place. A comeback is defined as a team that was behind at some point in the match and then went on to win the match. The query to find such matches is discussed in this article. We can use that query together with the previous one to find both the explicitly associated tags as well as the dynamically derived “comeback” tags.
3
The result from this query is:
with matches as ( select mr.id , mr.home_goals , mr.away_goals , t1.country home_country , t2.country away_country , t1.fifa_rank home_fifa_rank , t2.fifa_rank away_fifa_rank , r1.name home_region , r1.continent home_continent , r2.name away_region , r2.continent away_continent , mr.local_start_time , mr.scoring_process , cty.lattitude from match_results mr join stadiums sdm on (mr.sdm_id = sdm.id) join cities cty on (sdm.cty_id = cty.id) join teams t1 on ( mr.home = t1.seq_in_group and mr.group1 = t1.group1 ) join teams t2 on ( mr.away= t2.seq_in_group and t1.group1 = t2.group1 ) join football_regions r1 on (t1.rgn_id = r1.id) join football_regions r2 on (t2.rgn_id = r2.id) ) , scoring_process (match_id, intermediate_score, iteration) as ( select id, substr(scoring_process,1,1), 1 from match_results union all select id,substr(scoring_process,1,iteration+1), iteration+1 from match_results mr join scoring_process sp on (sp.match_id = mr.id and length(scoring_process)>= iteration + 1) ) , tagged as (select t.tag , mt.match_id from match_tags mt join tags t on (mt.tag_id = t.id) union all select 'comeback' tag , match_id from ( select ms.match_id from ( select sp.* , length(translate('1'||intermediate_score, '10','1')) - length(translate('0'||intermediate_score, '01','0')) running_score from scoring_process sp ) ms having max(running_score)> 0 and min(running_score) < 0 group by ms.match_id ) ) select substr(tag, 1, 25) tag , home_country , away_country from tagged join matches m on (m.id = tagged.match_id)
With this query as starting point, it is easy to add for example matches tagged with the tag surprise. We define as surprise any match where a team wins against an opponent that is ranked on the FIFA rank at least five positions higher or an team ties against an opponent that is at least ten positions higher. The additional subquery looks like this:
union all select 'Surprise' tag , m.id match_id from matches m where (m.home_goals > m.away_goals and m.home_fifa_rank - m.away_fifa_rank > 5) or (m.home_goals < m.away_goals and m.home_fifa_rank - m.away_fifa_rank < -5) or (m.home_goals = m.away_goals and abs(m.home_fifa_rank - m.away_fifa_rank) > 10)
The result:
Another tag we derive is recent which is assigned to the most recent five matches in the system:
union all select 'recent' tag , m.id match_id from ( select m.* , row_number() over (order by m.local_start_time desc) rnk from matches m ) m where rnk < 6 ...
The result from the complete query that finds and calculates tags:
with matches as ( select mr.id , mr.home_goals , mr.away_goals , t1.country home_country , t2.country away_country , t1.fifa_rank home_fifa_rank , t2.fifa_rank away_fifa_rank , r1.name home_region , r1.continent home_continent , r2.name away_region , r2.continent away_continent , mr.local_start_time , mr.scoring_process , cty.lattitude from match_results mr join stadiums sdm on (mr.sdm_id = sdm.id) join cities cty on (sdm.cty_id = cty.id) join teams t1 on ( mr.home = t1.seq_in_group and mr.group1 = t1.group1 ) join teams t2 on ( mr.away= t2.seq_in_group and t1.group1 = t2.group1 ) join football_regions r1 on (t1.rgn_id = r1.id) join football_regions r2 on (t2.rgn_id = r2.id) ) , scoring_process (match_id, intermediate_score, iteration) as ( select id, substr(scoring_process,1,1), 1 from match_results union all select id,substr(scoring_process,1,iteration+1), iteration+1 from match_results mr join scoring_process sp on (sp.match_id = mr.id and length(scoring_process)>= iteration + 1) ) , tagged as (select t.tag , mt.match_id from match_tags mt join tags t on (mt.tag_id = t.id) union all select 'Surprise' tag , m.id match_id from matches m where (m.home_goals > m.away_goals and m.home_fifa_rank - m.away_fifa_rank > 5) or (m.home_goals < m.away_goals and m.home_fifa_rank - m.away_fifa_rank < -5) or (m.home_goals = m.away_goals and abs(m.home_fifa_rank - m.away_fifa_rank) > 10) union all select 'Intercontinental' tag , m.id match_id from matches m where (m.home_continent != m.away_continent) union all select 'Derby' tag , m.id match_id from matches m where (m.home_region = m.away_region) union all select case when lattitude < 10 then 'north' when lattitude > 23 then 'south' else 'central' end tag , m.id match_id from matches m union all select case to_char(m.LOCAL_START_TIME, 'HH24') when '13' then 'early' when '22' then 'late' end tag , m.id match_id from matches m where to_char(m.LOCAL_START_TIME, 'HH24') in ('13','22') union all select 'goalless tie' tag , m.id match_id from matches m where home_goals + away_goals=0 union all select 'recent' tag , m.id match_id from ( select m.* , row_number() over (order by m.local_start_time desc) rnk from matches m ) m where rnk < 6 union all select 'exciting' tag , m.id match_id from matches m where home_goals + away_goals > 4 union all select 'comeback' tag , match_id from ( select ms.match_id from ( select sp.* , length(translate('1'||intermediate_score, '10','1')) - length(translate('0'||intermediate_score, '01','0')) running_score from scoring_process sp ) ms having max(running_score)> 0 and min(running_score) < 0 group by ms.match_id ) ) select substr(tag, 1, 25) tag , home_country , away_country from tagged join matches m on (m.id = tagged.match_id)
The tag cloud is a list of tags and their occurrences. It looks like this:
and is derived with the following addition to the query:
... select substr(tag, 1, 25) tag , count(match_id) occurrences from tagged group by substr(tag, 1, 25)
Also of interest is the set of tags that applies to individual matches. This query is used to derive these values per match , using the LISTAGG operator:
... select home_country ||'-'||away_country lineup , home_goals ||'-'||away_goals score , tags from ( select match_id , listagg(lower(tag),',') within group (order by tag) tags from tagged group by match_id ) mt join matches m on (m.id = mt.match_id)
The result is:
Resource
Download the DDL plus DML file to create the tables and data described in this article:worldcup2014_tags .