Analyzing the 2019 Tour de France in depth using Strava performance data from Race Riders


This year’s Tour de France was quite a spectacle. Great performances, exciting stages, unexpected events: it had it all. I am keen to analyze the race events as they unfolded during the stages of this year’s Tour. With Jupyter Notebooks, Python and Pandas, and Plotly for visualization, I am sure I can extract more detailed stories from raw race data. The starting point for such analysis is… the data.

However, I have not been able to find public sources for detailed data, such as time series data with the GPS location of riders or even groups during the stages of the TdF. So it felt like a dead end before I had even gotten started. Then I remembered Strava. Strava (Swedish for strive) is a platform for tracking performance and deep diving into the collected data.

image 

Strava collects data from athletes regarding their activities – such as running, cycling, walking and hiking. Members can upload data – and tens of millions do so, including some well known cyclists – see this blog article for a list of over 40 Tour de France contenders who publish [some of] their data on Strava: https://blog.strava.com/tour-de-france-riders-to-follow-18148/.

Strava data can be retrieved using an official API (https://developers.strava.com/), for which Python libraries have been developed (https://github.com/hozn/stravalib). However, only your own personal data can be retrieved this way. For use of data from other members, “you will have to make an application and request that athletes sign in with Strava, and grant your application certain permissions using OAuth 2.0.”

Data from public athletes can be looked up on the Strava website (https://www.strava.com/athletes/search). For each athlete, an overview is provided of the activities for which data was uploaded.
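To make the OAuth 2.0 requirement a little more concrete, here is a minimal sketch of the first step: composing the URL an athlete would visit to grant an application access. The authorization endpoint and parameter names follow my reading of the Strava developer documentation; the client id, redirect URI and scope value are placeholders, not values used in this article.

```python
from urllib.parse import urlencode

# Authorization endpoint as described in the Strava developer documentation.
AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scope="activity:read"):
    """Return the URL an athlete would open to grant the application access."""
    params = {
        "client_id": client_id,        # placeholder: issued when registering an app
        "redirect_uri": redirect_uri,  # placeholder: where Strava sends the athlete back
        "response_type": "code",
        "scope": scope,                # assumed scope for reading activities
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


authorize_url = build_authorize_url("12345", "http://localhost/exchange_token")
print(authorize_url)
```

After the athlete approves, the application would still have to exchange the returned code for an access token. Since the riders in this article have not signed in to any application of mine, this route was not an option here.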

For example, Steven Kruijswijk (Team Jumbo-Visma, third in the final classification):

image

For each of these activities, details can be inspected on the website; here for example TdF Stage 14 for Steven Kruijswijk:

image

And some of the on-screen statistics:

image

It is not hard to find out which requests the web application makes to retrieve the data it presents: the network request inspection in the browser Developer Tools shows them.

image

It turns out that the responses to these requests are pure JSON documents that can easily be interpreted.

image

The URL used by the Strava web app to retrieve the data uses the activity identifier as its primary key, and adds request parameters to instruct the API backend about the information elements to return. The URL is composed like this:

https://www.strava.com/activities/2548396565/streams?stream_types%5B%5D=time&stream_types%5B%5D=velocity_smooth&stream_types%5B%5D=watts_calc&stream_types%5B%5D=altitude&stream_types%5B%5D=heartrate&stream_types%5B%5D=cadence&stream_types%5B%5D=temp&stream_types%5B%5D=distance&stream_types%5B%5D=latlng&stream_types%5B%5D=grade_smooth&_=1565273406470

The value 2548396565 is the activity id for Steven Kruijswijk’s data set for the TdF 2019 Stage 14 performance recording.
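The way this URL is put together can be sketched in a few lines of Python. This only composes the string, it does not send a request; the function name `build_streams_url` is my own, and the list of stream types is simply copied from the URL above.

```python
from urllib.parse import urlencode

# Stream types as they appear in the URL captured from the browser.
STREAM_TYPES = [
    "time", "velocity_smooth", "watts_calc", "altitude", "heartrate",
    "cadence", "temp", "distance", "latlng", "grade_smooth",
]


def build_streams_url(activity_id, stream_types=STREAM_TYPES):
    """Compose the streams URL for one activity (hypothetical helper)."""
    # Repeating the key "stream_types[]" once per value yields the
    # stream_types%5B%5D=... pairs seen in the captured URL.
    query = urlencode([("stream_types[]", name) for name in stream_types])
    return f"https://www.strava.com/activities/{activity_id}/streams?{query}"


streams_url = build_streams_url(2548396565)
print(streams_url)
```

Apart from the trailing `_` parameter, the result matches the URL shown above; changing the activity id gives the corresponding URL for another rider or stage.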

I do not yet know the meaning or even the relevance of the last parameter (`_`).
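One observation, offered as a guess rather than a confirmed explanation: the value of that last parameter reads as a Unix timestamp in milliseconds. Web front ends commonly append such a value to make each request unique and bypass caching, in which case it would carry no meaning for the data itself. A quick check of what the number decodes to:

```python
from datetime import datetime, timezone

# Value of the "_" parameter in the captured URL.
raw_value = 1565273406470

# Assumption: the value is milliseconds since the Unix epoch.
request_time = datetime.fromtimestamp(raw_value // 1000, tz=timezone.utc)
print(request_time.isoformat())  # 2019-08-08T14:10:06+00:00
```

That date falls shortly after the 2019 Tour, which fits the moment the request was captured in the browser.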

It is important to realize that this URL can only be accessed from an authenticated browser session (a browser signed in with a valid Strava account). I have not used this URL to collect data programmatically and repeatedly from a computer program; I have only used it in the browser, copying and pasting the response into a JSON text file. For Stage 14 in the 2019 Tour de France, I have collected data for a number of riders, including the stage winner (Thibaut Pinot), an early front runner (Marco Haller), one of the most active riders in this year’s TdF (Thomas De Gendt) and of course Steven Kruijswijk.
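As a rough sketch of how such a saved JSON document could be prepared for analysis, the snippet below turns it into one record per sample. It assumes the document is a single object that maps each stream name to an equally long list of values; that shape is an assumption on my part, and the sample values are made up for illustration.

```python
import json


def streams_to_rows(streams):
    """Turn {"time": [...], "distance": [...], ...} into one dict per sample."""
    names = list(streams)
    # Guard against streams of unequal length by stopping at the shortest one.
    length = min(len(streams[name]) for name in names)
    return [{name: streams[name][i] for name in names} for i in range(length)]


# Made-up sample in the assumed shape; a real file would be read with
# json.load(open(...)) instead of json.loads on a literal string.
sample_text = (
    '{"time": [0, 1, 2], "distance": [0.0, 8.5, 17.2], '
    '"altitude": [168.0, 168.2, 168.6]}'
)
rows = streams_to_rows(json.loads(sample_text))
print(rows[1])
```

A list of dictionaries like `rows` can be passed straight to `pandas.DataFrame(rows)` to get one column per stream.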

image

In future articles, I will show you some of the analysis of this detailed data using Python, Pandas and Plotly in a Jupyter Notebook. One early glimpse:

image

This chart shows the time gap with the stage winner (Thibaut Pinot) for each of three riders at each distance during the stage. The black dotted line is the altitude profile as recorded by Thibaut’s tracking device. The official stage profile as published by the Tour de France organization is shown below, and should correspond with this dotted line.
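A time gap per distance can be derived from the time and distance streams along these lines. This is a minimal sketch and not necessarily the calculation behind the chart: it assumes both recordings started at the same moment, that the distance values never decrease, and it uses made-up numbers rather than real rider data.

```python
from bisect import bisect_left


def time_at_distance(distances, times, target):
    """Linearly interpolate the elapsed time at which `target` metres was reached."""
    if target <= distances[0]:
        return float(times[0])
    if target >= distances[-1]:
        return float(times[-1])
    i = bisect_left(distances, target)
    d0, d1 = distances[i - 1], distances[i]
    t0, t1 = times[i - 1], times[i]
    if d1 == d0:  # rider standing still between two samples
        return float(t1)
    return t0 + (t1 - t0) * (target - d0) / (d1 - d0)


def gap_to_winner(rider, winner, marks):
    """Seconds behind (positive) or ahead of (negative) the winner at each mark."""
    return [
        time_at_distance(rider["distance"], rider["time"], mark)
        - time_at_distance(winner["distance"], winner["time"], mark)
        for mark in marks
    ]


# Made-up samples: distance in metres, time in seconds since the start.
winner = {"distance": [0, 1000, 2000, 3000], "time": [0, 90, 200, 300]}
rider = {"distance": [0, 900, 2100, 3000], "time": [0, 90, 210, 330]}

gaps = gap_to_winner(rider, winner, [1000, 2000, 3000])
print(gaps)  # [10.0, 0.0, 30.0]
```

Interpolation is needed because two devices rarely record a sample at exactly the same distance, so the elapsed times have to be estimated at common distance marks before they can be subtracted.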

image


About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.
