Building Java Object Graph with Tour de France results – using screen scraping, java.util.Parser and assorted facilities
Last Saturday, the Tour de France 2011 departed. For people like me, who enjoy sports and work on data visualizations on the one hand and far-fetched uses of SQL on the other, the Tour de France offers a wealth of data to work with: rankings for each stage in various categories, nationalities and teams to group by, distances and speeds, years to compare with one another, and the like. So it has been my intention for some time to get hold of that data in a format I could work with.
Today I finally found some time to get it done: to locate the statistics for the Tour de France editions of the last few years and get them onto my laptop and into my database. This article describes the first part of that journey: how to get the stage results from a source on the internet into my locally running Java program in an appropriate object structure.
My starting point is the official Tour de France website:
This website goes back to 2007 and also has the latest (2011) results. It presents the results in a format pleasing to the human eye – based on an HTML structure that is fairly pleasing to my groping Java code as well.
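To give an idea of what "HTML that is pleasing to Java code" can mean, here is a minimal sketch of turning ranking rows into objects. It is only an illustration under stated assumptions: the row layout (one table row with three cells for rank, rider and team), the class names StageResultSketch and Standing, and the placeholder riders and teams are all made up for this example and are not taken from the actual website. It uses a plain regular expression rather than any particular parser facility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StageResultSketch {

    /** One line of a stage ranking (hypothetical structure). */
    static final class Standing {
        final int rank;
        final String rider;
        final String team;

        Standing(int rank, String rider, String team) {
            this.rank = rank;
            this.rider = rider;
            this.team = team;
        }

        @Override
        public String toString() {
            return rank + ". " + rider + " (" + team + ")";
        }
    }

    // Assumed row layout: <tr><td>rank</td><td>rider</td><td>team</td></tr>.
    // The real pages will differ; adapt the pattern to the actual markup.
    private static final Pattern ROW = Pattern.compile(
            "<tr>\\s*<td>(\\d+)\\.?</td>\\s*<td>([^<]+)</td>\\s*<td>([^<]+)</td>\\s*</tr>");

    /** Extracts every row matching the assumed layout, in document order. */
    static List<Standing> parse(String html) {
        List<Standing> standings = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            standings.add(new Standing(
                    Integer.parseInt(m.group(1)),
                    m.group(2).trim(),
                    m.group(3).trim()));
        }
        return standings;
    }

    public static void main(String[] args) {
        // Placeholder data, not real results.
        String sample = "<table>"
                + "<tr><td>1.</td><td>Rider A</td><td>Team X</td></tr>"
                + "<tr><td>2.</td><td> Rider B </td><td>Team Y</td></tr>"
                + "</table>";
        for (Standing s : parse(sample)) {
            System.out.println(s);
        }
    }
}
```

A regular expression like this is brittle against nested tags or attributes on the cells, so it only works when the markup is as regular as assumed here; for messier pages a real HTML parser is the safer choice.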