In a recent article I discussed how to programmatically fetch a JSON document with information about sessions at Oracle OpenWorld and JavaOne 2017. Yesterday, slidedecks for these sessions started to become available. I have analyzed how the link to these downloads were included in the JSON data returned by the API. Then I created simple Node programs to tweet about each of the sessions for which the download became available
and to download the file to my local file system.
I added provisions to space out the tweets and the download activity over time – as to not burden the backend of the web site and to not be kicked off Twitter for being a robot.
The code I crafted is not particularly ingenuous – it was created rather hastily in order to share with the OOW17 and JavaOne communities the links for downloading slide decks from presentations at both conferences. I used npm modules twit and download. This code can be found on GitHub: https://github.com/lucasjellema/scrape-oow17.
The documents javaone2017-sessions-catalog.json and oow2017-sessions-catalog.json contain details on all sessions – including the URLs for downloading slides.