Screen scraping using Google Documents in a minute or less…

In a previous blog post, Lucas used JSoup to collect data from a web page. In this post I’ll show a declarative way to screen scrape data with the help of Google Documents.

The web page http://www.databaseolympics.com/games/gamesyear.htm?g=26 contains the Olympic data I would like to import:

  1. Open a new Google spreadsheet document.
  2. Paste the following formula into cell A1:
    =ImportHtml("http://www.databaseolympics.com/games/gamesyear.htm?g=26";"table"; 3)
  3. Press Enter 🙂

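The ImportHtml function instructs Google Documents to retrieve the third table on the web page; the last argument is the table's position, counting from 1. Depending on your spreadsheet's locale, you may need commas instead of semicolons between the arguments. There are other import functions as well; see http://docs.google.com/support/bin/answer.py?answer=75507 for more information.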
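To make "the third table on the page" a bit more concrete, here is a minimal Python sketch of the same idea using only the standard library. This is my own illustration, not how Google implements ImportHtml. The names `TableCollector`, `import_table` and `SAMPLE` are hypothetical, the sample HTML and its values are made up, and the parser assumes well-formed, non-nested tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects the cell text of every <table>, in document order.

    Assumption: tables are well formed and not nested inside each other.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Store the finished cell in the current row of the current table.
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_table(html, index):
    """Return the rows of the index-th table, counting from 1."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return collector.tables[index - 1]


# Made-up page: two layout tables followed by the data table we want.
SAMPLE = """
<table><tr><td>navigation</td></tr></table>
<table><tr><td>banner</td></tr></table>
<table>
  <tr><th>Country</th><th>Gold</th></tr>
  <tr><td>Exampleland</td><td>3</td></tr>
</table>
"""

rows = import_table(SAMPLE, 3)
```

Here `rows` holds the header row and the single data row of the third table. To try this against a real page you would first have to download the HTML yourself, which is exactly the step the spreadsheet formula does for you.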

[Screenshot: the spreadsheet after screen scraping]