SQL Challenge: Drilling down into World Cup Football Tag Cloud image316

SQL Challenge: Drilling down into World Cup Football Tag Cloud

imageIn a previous article, I have introduced the World Cup Football tag cloud (https://technology.amis.nl/2014/06/22/sql-challenge-dynamically-producing-a-tag-cloud-for-world-cup-football-matches/). I have shown how tags can be associated with football matches, both explicitly and implicitly – the latter derived automatically, based on the structured information about the matches.

The tag cloud can be used as a filter mechanism to quickly drill down to a selected set of records: by selecting one or more tags, the set of filtered records is reduced to just the records that have all selected tags associated with them. The tag cloud itself is then also readjusted: to present the tags and their relative frequencies for the remaining set of records.

This article discusses how in SQL we can implement the drill down for these two requirements:

  • only retain the records that have all selected tags associated with them
  • recalculate the tag cloud for the remaining set of records

We will not concern ourselves with the visual representation of the tag cloud, the selected tags, the remaining records or the drill down interaction.

To make life easy, I will use a simple collection type in my queries, defined as:

create type string_table as table of

With this collection type, I can easily create a set of selected tag values – and work with these tag values in SQL

with tag_filter as
( select string_tbl('recent','comeback') selected_tags
from dual
)
, filter_tags as
( select column_value filter_tag
from table((select selected_tags from tag_filter))
)
select *
from filter_tags

FILTER_TAG
--------------------------------------------------
recent
comeback

Now the moment has come to integrate the selected_tags into the query that returns all matches – and for all matches their tag cloud. The query – introduced in the previous article https://technology.amis.nl/2014/06/22/sql-challenge-dynamically-producing-a-tag-cloud-for-world-cup-football-matches/ – that returns the matches with their tags is now extended, using the tag filter and filter_tags inline views and using a little collection juggling. In inline view match_tag_sets, I determine the tagset for each match, as a string table, using the COLLECT aggregation operator that aggregates multiple strings into a collection.  Inline view selected_matches subsequently selects only the matches that have all the selected tags (from tag_filter) in their collection of tags. This is determined using the SUBMULTISET OF. The final query queries all selected matches and their tags – skipping the selected tags from tag_filter (using not member of against the tag_filter.selected_tags collection.

with tag_filter as
( select string_table('recent','comeback') selected_tags
from dual
)
, filter_tags as
( select column_value filter_tag
from table((select selected_tags from tag_filter))
)
, matches as
( select mr.id
...
...
, match_tag_sets as
( select match_id
, cast(collect(substr(tag,1,25)) as string_table) tagset
from tagged
join
matches m
on (m.id = tagged.match_id)
group
by match_id
)
, selected_matches as
( select match_id
from match_tag_sets mts
cross join
tag_filter tf
where tf.selected_tags SUBMULTISET mts.tagset
)
select home_country ||'-'||away_country lineup
, home_goals ||'-'||away_goals score
, tags
from ( select match_id
, listagg(lower(tag),',') within group (order by tag) tags
from tagged
join
selected_matches
using (match_id)
cross join
tag_filter tf
where tag not member of tf.selected_tags
group
by match_id
) mt
join
matches m
on (m.id = mt.match_id)

The outcome of this query in this case for the selected tags as

image

is

image

The tag cloud that remains after one or more tags have been selected contains the tags that apply to the matches that remain after applying the selected tags. That means in this example that we need the tag cloud to be derived from the six matches listed in the figure above – matches that all have tags Intercontinental and central.

The last part of the query that derives the tag cloud is this:

, match_tag_sets as
( select match_id
, cast(collect(substr(tag,1,25)) as string_table) tagset
from tagged
join
matches m
on (m.id = tagged.match_id)
group
by match_id
)
, selected_matches as
( select match_id
from match_tag_sets mts
cross join
tag_filter tf
where tf.selected_tags SUBMULTISET mts.tagset
)
select substr(tag, 1, 25) tag
, count(match_id) occurrences
from tagged
join
selected_matches
using (match_id)
cross join
tag_filter tf
where tag not member of tf.selected_tags
group
by substr(tag, 1, 25)
order
by occurrences desc

The remaining tag cloud:

image

Resources

Download DDL plus DML plus queries: worldcupfootball2014_ddl_dml.