Screen scraping using Google Documents in a minute or less…

Jorrit Nijssen

In a previous blog post, Lucas used JSoup to collect data from a web page. In this post I'll show a declarative way to screen scrape data with the help of Google Documents.

The web page http://www.databaseolympics.com/games/gamesyear.htm?g=26 contains the Olympic data I would like to import.

  1. Open a new Google spreadsheet document.
  2. Paste the following formula into cell A1:
    =ImportHtml("http://www.databaseolympics.com/games/gamesyear.htm?g=26";"table"; 3)
  3. Press enter 🙂

The ImportHtml function instructs Google Documents to retrieve the third table on the web page (the index starts at 1). There are other import functions as well; see http://docs.google.com/support/bin/answer.py?answer=75507 for more information.
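
For comparison, here is a rough sketch of what "retrieve the third table" could look like when done by hand in Python, using only the standard library. It is not how Google implements ImportHtml. The names `TableExtractor`, `import_html_table` and the `SAMPLE` markup are made up for this illustration, and the sketch assumes well-formed HTML with closed `tr`/`td` tags and no nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the cell text of every <table>, in document order.

    Hypothetical helper for illustration; assumes closed tags and
    no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index):
    """Return the rows of the index-th table (1-based, like ImportHtml)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


# Placeholder markup, not the real Olympic page.
SAMPLE = """
<table><tr><td>first</td></tr></table>
<table><tr><td>second</td></tr></table>
<table>
  <tr><th>Country</th><th>Gold</th></tr>
  <tr><td>AAA</td><td>1</td></tr>
</table>
"""

rows = import_html_table(SAMPLE, 3)
for row in rows:
    print(row)
```

To run it against a live page, the HTML string could be fetched first, for example with `urllib.request.urlopen`. Compared with this, the one-line spreadsheet formula is clearly the quicker route.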

Spreadsheet after screenscraping
