Retrieve song lyrics in Java using Screenscraping with JSoup


Last year I wrote about JSoup, a Java library that helps with screenscraping: Screenscraping from Java using jsoup – effective data gathering from websites (https://technology.amis.nl/blog/13121/screenscraping-from-java-using-jsoup-effective-data-gathering-from-websites). Last month I had another opportunity to use JSoup, this time to gather the lyrics for the songs on a CD. The context was the internal SOA for Java Professionals training program at AMIS. To complete the second block of this three-part program, the students did an assignment that required them to implement a Web Service producing the booklet for a given CD, returned as a PDF document with an illustration, the song titles and the song lyrics. One of the resources we made available to the students was a Java class that returned song lyrics. Their challenge was to integrate this class properly into their application (whether based on PL/SQL, SOA Suite 11g or OSB).

The LyricsGatherer class is easily constructed using JSoup and the website http://www.songlyrics.com/ (which unfortunately suffers from periodic loss of service).
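Because the site is not always available, a caller may want to retry a failed request a few times before giving up. The following is a minimal sketch of a generic retry wrapper with exponential backoff; the class RetryingFetch and its parameters are hypothetical and not part of the original LyricsGatherer source.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the original SongLyricsProvider source.
public class RetryingFetch {

    // Runs the task and retries it when it throws an IOException, up to
    // maxAttempts attempts in total. The delay doubles after every failure
    // (simple exponential backoff). Other exceptions are not retried.
    public static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last; // every attempt failed: rethrow the last IOException
    }
}
```

A call could then look like `RetryingFetch.withRetry(() -> LyricsGatherer.getSongLyrics("U2", "With or Without You"), 3, 500)`. Note that JSoup reports HTTP error statuses with an HttpStatusException, which extends IOException, so a 404 for a misspelled song title would be retried as well.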


Drilling down from the search results brings us to the page with the actual song lyrics.
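The LyricsGatherer class does not go through the search page; it connects directly to the lyrics page, whose URL it derives from the band name and the song title. A sketch of that URL construction as a separate helper could look like this. The names LyricsUrls, slug and lyricsPageUrl are hypothetical; the sketch assumes, as the original code does, that spaces become hyphens and everything is lower case. How the site treats punctuation such as apostrophes is not covered.

```java
import java.util.Locale;

// Hypothetical helper mirroring the URL pattern used in LyricsGatherer.
public class LyricsUrls {

    private static final String BASE = "http://www.songlyrics.com";

    // "  Tori   Amos " -> "tori-amos": trims, turns each run of whitespace into
    // a single hyphen and lowercases with Locale.ROOT, so the result does not
    // depend on the default locale of the JVM.
    public static String slug(String s) {
        return s.trim().replaceAll("\\s+", "-").toLowerCase(Locale.ROOT);
    }

    // Pattern assumed from the original code: <base>/<band>/<title>-lyrics/
    public static String lyricsPageUrl(String band, String songTitle) {
        return BASE + "/" + slug(band) + "/" + slug(songTitle) + "-lyrics/";
    }
}
```

Using Locale.ROOT avoids surprises such as the Turkish locale lowercasing "I" to a dotless "ı", which would produce a URL the site does not know.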


And if a browser can do this, so can a Java program (generally speaking and definitely true in this case).

The Java code that uses JSoup to retrieve the song lyrics looks like this:

package nl.amis.music.lyrics;

import java.io.IOException;

import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LyricsGatherer {

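   private static final String SONG_LYRICS_URL = "http://www.songlyrics.com";

   public static List<String> getSongLyrics(String band, String songTitle) throws IOException {
     List<String> lyrics = new ArrayList<String>();

     // The lyrics page URL follows the pattern <site>/<band>/<song-title>-lyrics/,
     // with spaces replaced by hyphens and everything in lower case
     String url = SONG_LYRICS_URL + "/" + band.replace(" ", "-").toLowerCase()
         + "/" + songTitle.replace(" ", "-").toLowerCase() + "-lyrics/";
     Document doc = Jsoup.connect(url).get();
     System.out.println(doc.title());

     // The lyrics are in the paragraph with class songLyricsV14;
     // first() returns null instead of throwing when no such element exists
     Element p = doc.select("p.songLyricsV14").first();
     if (p == null) {
       return lyrics; // e.g. the site has changed its page layout
     }
     // Keep only the text nodes; child elements such as line breaks are skipped
     for (Node node : p.childNodes()) {
       if (node instanceof TextNode) {
         lyrics.add(((TextNode) node).getWholeText());
       }
     }
     return lyrics;
   }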

   public static void main(String[] args) throws IOException {
     System.out.println(LyricsGatherer.getSongLyrics("U2", "With or Without You"));
     System.out.println(LyricsGatherer.getSongLyrics("Billy Joel", "Allentown"));
     System.out.println(LyricsGatherer.getSongLyrics("Tori Amos", "Winter"));
   }
}
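JSoup's TextNode.getWholeText() returns the text including any newlines and spaces present in the original HTML, so the raw list can contain padded or whitespace-only entries. A small post-processing step, sketched here as a hypothetical LyricsCleaner helper that is not part of the downloadable source, trims each entry and drops the blank ones:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing helper for the list returned by getSongLyrics.
public class LyricsCleaner {

    // Trims every entry and leaves out entries that are empty after trimming.
    public static List<String> clean(List<String> rawLines) {
        List<String> cleaned = new ArrayList<>();
        for (String line : rawLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return cleaned;
    }
}
```

For example, `LyricsCleaner.clean(LyricsGatherer.getSongLyrics("Billy Joel", "Allentown"))`. Dropping blank entries also removes any empty lines that might mark verse boundaries; if the booklet layout needs stanzas, keep those entries instead.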

The resulting lists of lyric lines can easily be returned in Web Service style.
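In the SOA training context the lyrics ended up in a service response. As a hedged sketch, assuming an XML payload is wanted, the list returned by getSongLyrics could be wrapped using only the JDK's DOM and Transformer APIs. The class LyricsXml and the element names song and line are hypothetical, not taken from the original assignment.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical serializer; the XML structure is illustrative, not a published schema.
public class LyricsXml {

    // Produces <song band="..." title="..."><line>...</line>...</song>
    public static String toXml(String band, String title, List<String> lines) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element song = doc.createElement("song");
        song.setAttribute("band", band);
        song.setAttribute("title", title);
        doc.appendChild(song);
        for (String line : lines) {
            Element lineElement = doc.createElement("line");
            lineElement.setTextContent(line); // the serializer escapes characters such as & and <
            song.appendChild(lineElement);
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Building the payload with the DOM API rather than by string concatenation means characters like & in a lyric line are escaped correctly, which matters when the XML is consumed by SOA Suite or OSB.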

Resources

Download the source discussed in this article: SongLyricsProvider.zip.


About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PL/SQL), Service Oriented Architecture, BPM, ADF, JavaScript, Java in various shapes and forms and many other things. Author of the Oracle Press books: Oracle SOA Suite 11g Handbook and Oracle SOA Suite 12c Handbook. Frequent presenter at conferences such as JavaOne and Oracle OpenWorld. Presenter for Oracle University Celebrity specials.
