Retrieve song lyrics in Java using Screenscraping with JSoup

Lucas Jellema
0 0
Read Time:2 Minute, 13 Second

Last year I wrote about JSoup, a Java library that helps with screenscraping: Screenscraping from Java using jsoup – effective data gathering from websites (https://technology.amis.nl/blog/13121/screenscraping-from-java-using-jsoup-effective-data-gathering-from-websites). Last month I had another opportunity for using JSoup, this time to gather song lyrics for the songs on a CD. The context in this case was the internal SOA for Java Professionals training program at AMIS. The students did an assignment to complete the second block in this three-piece program. Their assignment required them to implement a Web Service that produced the CD Booklet for a certain CD – returned as PDF document with illustration, song titles and song lyrics. One of the resources we made available to the students was a Java Class that returned song lyrics. It was their challenge to integrate this class in a proper way in their application (be it PL/SQL, SOA Suite 11g or OSB based).

The LyricsGatherer is easily constructed using JSoup and the website http://www.songlyrics.com/ (that suffers from periodic and unfortunate loss of service) :

Image

Downdrilling on the search results brings us to the actual song lyrics:

Image

 

And if a browser can do this, so can a Java program (generally speaking and definitely true in this case).

The Java code – leveraging JSoup – to retrieve song lyrics looks like this:

package nl.amis.music.lyrics;

import java.io.IOException;

import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LyricsGatherer {

   private final static String songLyricsURL = "http://www.songlyrics.com";


   public static List<String> getSongLyrics( String band, String songTitle) throws IOException {
     List<String> lyrics= new ArrayList<String>();

     Document doc = Jsoup.connect(songLyricsURL+ "/"+band.replace(" ", "-").toLowerCase()+"/"+songTitle.replace(" ", "-").toLowerCase()+"-lyrics/").get();
     String title = doc.title();
     System.out.println(title);
     Element p = doc.select("p.songLyricsV14").get(0);
      for (Node e: p.childNodes()) {
          if (e instanceof TextNode) {
            lyrics.add(((TextNode)e).getWholeText());
          }
      }
     return lyrics;
   }

   public static void main(String[] args) throws IOException {
      System.out.println(LyricsGatherer.getSongLyrics("U2", "With or Without You"));
      System.out.println(LyricsGatherer.getSongLyrics("Billy Joel", "Allentown"));
      System.out.println(LyricsGatherer.getSongLyrics("Tori Amos", "Winter"));
    }
}

The results

Image

can easily be returned in a Web Service style fashion.

Resources

Download the source discussed in this article:SongLyricsProvider.zip .

About Post Author

Lucas Jellema

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.
Happy
Happy
0 %
Sad
Sad
0 %
Excited
Excited
0 %
Sleepy
Sleepy
0 %
Angry
Angry
0 %
Surprise
Surprise
0 %
Next Post

Using custom functions in EL expressions in JSF 1.x

EL expressions are one of the main driving forces for JavaServer Faces. Most dynamic characteristics of pages and widgets are governed by EL expressions. In JSF 1.x, there are some limitations for EL expressions that can at times be a little frustrating. One of the limitations is the fact that […]
%d bloggers like this: