Last year I wrote about JSoup, a Java library that helps with screenscraping: Screenscraping from Java using jsoup – effective data gathering from websites (https://technology.amis.nl/blog/13121/screenscraping-from-java-using-jsoup-effective-data-gathering-from-websites). Last month I had another opportunity to use JSoup, this time to gather the lyrics for the songs on a CD. The context was the internal SOA for Java Professionals training program at AMIS. To complete the second block of this three-part program, the students had to implement a Web Service that produces the CD booklet for a given CD, returned as a PDF document with an illustration, song titles and song lyrics. One of the resources we made available to the students was a Java class that returns song lyrics. Their challenge was to integrate this class properly into their application (be it PL/SQL, SOA Suite 11g or OSB based).
The LyricsGatherer is easily constructed using JSoup and the website http://www.songlyrics.com/ (which unfortunately suffers from periodic loss of service).
Drilling down from the site's search results brings us to the page with the actual song lyrics.
And if a browser can do this, so can a Java program (generally speaking and definitely true in this case).
The Java code – leveraging JSoup – to retrieve song lyrics looks like this:
package nl.amis.music.lyrics;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LyricsGatherer {

    private static final String songLyricsURL = "http://www.songlyrics.com";

    public static List<String> getSongLyrics(String band, String songTitle) throws IOException {
        List<String> lyrics = new ArrayList<String>();

        // The lyrics page URL has the form /<band>/<song-title>-lyrics/,
        // in lower case and with spaces replaced by hyphens.
        Document doc = Jsoup.connect(songLyricsURL + "/"
                + band.replace(" ", "-").toLowerCase() + "/"
                + songTitle.replace(" ", "-").toLowerCase() + "-lyrics/").get();
        String title = doc.title();
        System.out.println(title);

        // The lyrics are in the first <p class="songLyricsV14"> element;
        // get(0) throws an exception if the page contains no such element.
        Element p = doc.select("p.songLyricsV14").get(0);

        // Collect only the text nodes, skipping child elements such as <br>.
        for (Node e : p.childNodes()) {
            if (e instanceof TextNode) {
                lyrics.add(((TextNode) e).getWholeText());
            }
        }
        return lyrics;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(LyricsGatherer.getSongLyrics("U2", "With or Without You"));
        System.out.println(LyricsGatherer.getSongLyrics("Billy Joel", "Allentown"));
        System.out.println(LyricsGatherer.getSongLyrics("Tori Amos", "Winter"));
    }
}
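To see which page the code above actually requests, the URL construction can be pulled out into a small helper and run without any network access. This is only a sketch that mirrors the string handling in getSongLyrics; the class name LyricsUrlBuilder and the method names slug and lyricsUrl are my own, and the use of Locale.ROOT for lower-casing is an addition that is not in the original code. It makes no attempt to handle punctuation or other special characters in band names or titles, since the original does not either.

```java
import java.util.Locale;

public class LyricsUrlBuilder {

    // Base URL taken from the LyricsGatherer class above.
    private static final String BASE_URL = "http://www.songlyrics.com";

    // Replace spaces with hyphens and lower-case the name, as getSongLyrics does.
    // Locale.ROOT is an addition to keep lower-casing independent of the default locale.
    public static String slug(String name) {
        return name.replace(" ", "-").toLowerCase(Locale.ROOT);
    }

    // Build the lyrics page URL: <base>/<band>/<song-title>-lyrics/
    public static String lyricsUrl(String band, String songTitle) {
        return BASE_URL + "/" + slug(band) + "/" + slug(songTitle) + "-lyrics/";
    }

    public static void main(String[] args) {
        // prints http://www.songlyrics.com/u2/with-or-without-you-lyrics/
        System.out.println(lyricsUrl("U2", "With or Without You"));
        // prints http://www.songlyrics.com/billy-joel/allentown-lyrics/
        System.out.println(lyricsUrl("Billy Joel", "Allentown"));
    }
}
```

Keeping the URL logic separate like this also makes it easy to check a generated address in a browser before pointing JSoup at it.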
The results can easily be returned in a Web Service style fashion.
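As one possible illustration of what such a response payload might look like, the sketch below turns the List<String> returned by getSongLyrics into a small XML document. Everything here is an assumption made for the example: the class name LyricsResponse, the element and attribute names (songLyrics, band, title, line) and the decision to drop whitespace-only lines are not part of the original source, and the sample lines in main are placeholders rather than real lyrics. A real service would more likely rely on JAXB or the SOA tooling to do the marshalling.

```java
import java.util.Arrays;
import java.util.List;

public class LyricsResponse {

    // Escape the characters that are special in XML text and attribute values.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build a small XML document; element and attribute names are hypothetical.
    public static String toXml(String band, String songTitle, List<String> lines) {
        StringBuilder xml = new StringBuilder();
        xml.append("<songLyrics band=\"").append(escape(band))
           .append("\" title=\"").append(escape(songTitle)).append("\">\n");
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue; // skip whitespace-only text nodes
            }
            xml.append("  <line>").append(escape(text)).append("</line>\n");
        }
        xml.append("</songLyrics>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Placeholder lines standing in for the output of getSongLyrics.
        List<String> lines = Arrays.asList("First line", "\n", "Rock & roll");
        System.out.print(toXml("Some Band", "Some Song", lines));
    }
}
```

Trimming each line and skipping empty ones is a design choice for readability of the payload: the text nodes collected by JSoup typically carry the line breaks and indentation of the original HTML.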
Resources
Download the source discussed in this article: SongLyricsProvider.zip.