Another article on analyzing and visualizing the results from the Tour de France 2011 using the ADF DVT components. This article uses the same set of data already discussed in several previous articles – including the standings per stage as well as the overall standings after each stage. This article will focus on using the combination graph, the bubble graph and the high/low (aka stock) chart for taking a closer look at speed, length and withdrawals (and any connection there can be between these aspects).
Some of the pretty pictures created in this article:
Distance – per stage and overall using a Combination Graph
The distance varies quite substantially between stages. A combination graph is a good way to visualize both the distance per stage as well as the overall distance in the Tour up until a certain stage.
The SQL Query that underlies the ViewObject that underpins this graph is shown here:
select stagesequence , totaldistance - nvl(lag(totaldistance) over (order by stagesequence), 0) stagedistance , totaldistance from stage e join tour t on (e.tour_id = t.id) where t.year = 2011
The collection on the Data Control based on the ViewObject based on this query can be dragged to the page and droppped as a Graph. Select Combination graph this time:
Configure the graph:
Note how the two data points are Stagedistance (for the bar chart against the Y1 axis) and Totaldistance (for the line against the Y2 axis).
Run the page to show the following overview:
Withdrawals per Stage – using Combination and Pareto
The withdrawals – riders who do not appear in the overall ranking from a certain stage onwards – can be analyzed in various ways. One way is also through the use of a Combination Chart that shows both the number of withdrawals in a stage (bar) as well as the total number of withdrawals until a certain stage (using a line). This is configured using the following query:
with stages as ( select e.stagesequence , e.totaldistance - lag(e.totaldistance) over (order by e.stagesequence) stagedistance , e.totaldistance , (select count(s.id) from standing s where s.stage_id = e.id and s.type_of_standing='ITG') rider_count from stage e join tour t on (e.tour_id = t.id) where t.year = 2011 ) , stages_and_withdrawals as ( select stagesequence , stagedistance , totaldistance , rider_count , lag(rider_count) over (order by stagesequence) - rider_count stage_withdrawals from stages ) select stagesequence , stagedistance , totaldistance , rider_count , stage_withdrawals , sum(stage_withdrawals) over (order by stagesequence) cummulative_withdrawals from stages_and_withdrawals
Based on this query (using a ViewObject in the Application Module) the following Combination Graph is created:
And the when the page is run, it shows up like:
One other, similar way of visualizing the number of withdrawals per stage, especially in order to find the stages that were particularly rough for some reason, is through the use of a Pareto chart. It is easily created, using only the stage sequence and the number of withdrawals in a particular stage for input. Drag the same collection as before to the page, drop it as Graph and choose the Pareto chart:
Configure the chart:
and run the page – giving the following overview of withdrawals:
Clearly stage 9 was a bastard of stage, accounting for close to 30% of all withdrawals all by itself!
Speed per Stage – High, Low and Median using a Stock Chart
The Stock Chart (aka High/Low chart) displays an indicator per date. The indicator is constructed using three values: the maximum and minimum for a day and a third value that for real stocks indicates the close value and will be used for the median speed in my visualization. The stock chart can be used with additional values: a fourth one for the opening value and optionally a fifth one to indicate the stock volume. I will not be using those 4th and 5th parameters – initially at least.
The following query returns the highest, lowest and average speed for each stage of the 2011 Tour de France:
with stages as ( select e.stagesequence , nvl(e.totaldistance - lag(e.totaldistance) over (order by e.stagesequence), e.totaldistance) stagedistance , e.totaldistance , (select count(s.id) from standing s where s.stage_id = e.id and s.type_of_standing='ITG') rider_count , (select min(s.totaltime) from standing s where s.stage_id = e.id and s.type_of_standing='ITE' ) winner_time , (select max(s.totaltime) from standing s where s.stage_id = e.id and s.type_of_standing='ITE' ) last_time , (select median(s.totaltime) from standing s where s.stage_id = e.id and s.type_of_standing='ITE' ) median_time , to_date('30-06-2011','DD-MM-YYYY')+ stagesequence date_of_stage from stage e join tour t on (e.tour_id = t.id) where t.year = 2011 ) , stages_and_withdrawals as ( select stagesequence , stagedistance , totaldistance , rider_count , lag(rider_count) over (order by stagesequence) - rider_count stage_withdrawals , winner_time , last_time , median_time , stagedistance/(extract(hour from winner_time) + 1/60* extract(minute from winner_time) + 1/3600 * extract(second from winner_time)) winner_speed , stagedistance/(extract(hour from last_time) + 1/60* extract(minute from last_time) + 1/3600 * extract(second from last_time)) last_speed , stagedistance/(extract(hour from median_time) + 1/60* extract(minute from median_time) + 1/3600 * extract(second from median_time)) median_speed , date_of_stage from stages ) select stagesequence , stagedistance , totaldistance , rider_count , stage_withdrawals , sum(stage_withdrawals) over (order by stagesequence) cummulative_withdrawals , winner_time , last_time , median_time , winner_speed , last_speed , median_speed , date_of_stage , 'Speed' metric from stages_and_withdrawals where median_speed is not null
Note: I had some errors regarding lacking data when I used the true date for the stages – because of rest-days the days are not consecutive and the stock chart had a problem dealing with that. So I have manipulated the query in creating a time wise consecutive set of data.
After creating a ViewObject based on query, adding it to the Application Module, creating a new page and refreshing the Data Control, I am ready to create the data bound stock chart.
Drag the collection to the page, drop it as Graph:
Select Stock Chart:
Configure the stock chart using the attribute from the ViewObject:
Make some small stilistic changes and run the page.
So here is the chart that indicates the speed for each stage. The top end of the marker indicates the highest speed – or the speed of the winner of the day – expressed in kilometer per hour. The low end of the indicator represents the speed for the last rider to finish in a stage. The horizontal bar indicates the median speed – that is the speed in the middle of all speed values in sorted order.
When winner and median speed are the same: the stage ended with a group sprint – such as the first stage. When the indicator is nothing but a horizontal bar – such as for the 21st stage in Paris – all riders finished in one group. When the horizontal bar is at the bottom, equal to the last rider’s speed, there was a group or a single rider ahead with a compact group behind it. Obviously the length of the marker indicates the spread in speed or the gap in time between the winner and the rider to come last. The contre-le-montre (speed time trial) on Saturday (20th stage) has the largest difference between winner and last rider.
Bubble Chart to visualize the relation between Speed, Distance and the number of Withdrawals
I have wondered whether there is a relation between the number of withdrawals in a certain stage and the (combination of) speed and distance in the stage. Such a association can be visualized using a Bubble Chart – where bubbles are plotted for each stage against both distance (X-axis) and speed (Y-axis). The radius (size) of the bubble visualizes the number of withdrawals in the stage.
The Bubble Chart is configured with attributes for X-axis, Y-axis and Bubble Size and optionally for bubble color as well.
Well, I am not sure too many conclusion can be drawn from this chart:
The fastest stage was of in-between length and has two withdrawals, the shortest have no witdrawals but the slowest all have at least one withdrawal. The largest number of withdrawals – the biggest bubble – is fairly long, of average speed. But the longest – of almost the same speed – has not withdrawals at all. I am not ready to declare a strong correlation I am afraid. Slippery roads, carelss car drivers or wreckless riders may also have something to do with this.
Download JDeveloper 11.1.2 Application: TourDeFranceBubbleParetoStockPlusExport.
Tour de France 2011 â€“ the official website â€“Â http://www.letour.fr/us/index.html
Building Java Object Graph with Tour de France results â€“ using screen scraping, java.util.Parser and assorted facilities â€“http://technology.amis.nl/blog/12786/building-java-object-graph-with-tour-de-france-results-using-screen-scraping-java-util-parser-and-assorted-facilities
Using JPA to persist the Tour de France Java Object Graph to relational database tablesÂ http://technology.amis.nl/blog/12796/using-jpa-to-persist-the-tour-de-france-java-object-graph-to-relational-database-tables
Tour de France â€“ Analysis using ADF DVT GraphsÂ -Â http://technology.amis.nl/blog/12956/tour-de-france-analysis-using-adf-dvt-graphs
Tour de France 2011 â€“ Analysis using ADF DVT Graphs â€“ Part 2 â€“ Gap with Cadel for Top 5 throughout the stages Â http://technology.amis.nl/blog/12972/tour-de-france-2011-analysis-using-adf-dvt-graphs-part-2-gap-with-cadel-for-top-5-throughout-the-stages