In a previous blog post, Lucas used JSoup to collect data from a web page. In this post I’ll show a declarative way to screen-scrape data with the help of Google Documents.
The web page http://www.databaseolympics.com/games/gamesyear.htm?g=26 contains the Olympic data I would like to import:
- Open a new Google spreadsheet document.
- Paste the following formula into cell A1:
=ImportHtml("http://www.databaseolympics.com/games/gamesyear.htm?g=26";"table"; 3)
- Press Enter 🙂
The ImportHtml function instructs Google Documents to retrieve the third table on the web page. There are other import functions as well; see http://docs.google.com/support/bin/answer.py?answer=75507 for more information.
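
For comparison, here is a rough idea of what "retrieve the third table" amounts to when written out by hand. This is only a sketch in Python using the standard library, not how Google implements ImportHtml. The names `TableExtractor` and `import_html_table` and the sample markup are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in a document, in page order."""

    def __init__(self):
        super().__init__()
        self.tables = []       # each table is a list of rows, each row a list of cells
        self._in_cell = False  # True while we are between <td>/<th> and its end tag
        self._cell = []        # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def import_html_table(html, index):
    """Return table number `index` (counting from 1, as ImportHtml does)."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# Hypothetical sample page; NOT the real markup of databaseolympics.com.
SAMPLE = """
<table><tr><td>navigation</td></tr></table>
<table><tr><td>banner</td></tr></table>
<table>
  <tr><th>Event</th><th>Medal</th></tr>
  <tr><td>100m</td><td>GOLD</td></tr>
</table>
"""

rows = import_html_table(SAMPLE, 3)
print(rows)
```

To try this against a real page you would first download the HTML (for example with `urllib.request.urlopen`) and pass the decoded text to `import_html_table`. The spreadsheet formula does all of this in a single cell, which is what makes the declarative approach attractive.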