Retrieve song lyrics in Java using Screenscraping with JSoup


Last year I wrote about JSoup, a Java library that helps with screenscraping, in the article "Screenscraping from Java using jsoup – effective data gathering from websites". Last month I had another opportunity to use JSoup, this time to gather the lyrics for the songs on a CD. The context was the internal SOA for Java Professionals training program at AMIS. To complete the second block of this three-part program, the students did an assignment that required them to implement a Web Service that produces the CD booklet for a given CD, returned as a PDF document with an illustration, song titles and song lyrics. One of the resources we made available to the students was a Java class that returns song lyrics. Their challenge was to integrate this class properly into their application (be it PL/SQL, SOA Suite 11g or OSB based).

The LyricsGatherer is easily constructed using JSoup and a lyrics website (one that suffers from periodic and unfortunate loss of service).


Drilling down into the search results brings us to the actual song lyrics.



And if a browser can do this, so can a Java program (generally speaking and definitely true in this case).

The Java code – leveraging JSoup – to retrieve song lyrics looks like this:



import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class LyricsGatherer {

   // base URL of the lyrics website; empty in this listing, set it before running
   private final static String songLyricsURL = "";

   public static List<String> getSongLyrics( String band, String songTitle) throws IOException {
     List<String> lyrics= new ArrayList<String>();

     // page address: <base>/<band>/<song-title>-lyrics/, lowercased, spaces replaced by hyphens
     Document doc = Jsoup.connect(songLyricsURL+ "/"+band.replace(" ", "-").toLowerCase()+"/"+songTitle.replace(" ", "-").toLowerCase()+"-lyrics/").get();
     String title = doc.title();
     // the lyrics live in the first paragraph with CSS class songLyricsV14
     Element p = doc.select("p.songLyricsV14").get(0);
      for (Node e: p.childNodes()) {
          if (e instanceof TextNode) {
              // assumed loop body: collect the text of each text node as one line of lyrics
              lyrics.add(((TextNode) e).text());
          }
      }
     return lyrics;
   }

   public static void main(String[] args) throws IOException {
      System.out.println(LyricsGatherer.getSongLyrics("U2", "With or Without You"));
      System.out.println(LyricsGatherer.getSongLyrics("Billy Joel", "Allentown"));
      System.out.println(LyricsGatherer.getSongLyrics("Tori Amos", "Winter"));
   }
}
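
The address of a lyrics page is derived from the band name and the song title: both are lowercased, spaces become hyphens, and "-lyrics/" is appended. A minimal sketch of just that step, without any network access, could look like the code below. The class LyricsUrlSketch and the methods slug and lyricsPath are hypothetical names introduced for illustration, and Locale.ROOT is my own addition to keep the lowercasing independent of the platform locale. The base URL is left out on purpose.

```java
import java.util.Locale;

public class LyricsUrlSketch {

    // Hypothetical helper: "With or Without You" becomes "with-or-without-you".
    public static String slug(String text) {
        return text.replace(" ", "-").toLowerCase(Locale.ROOT);
    }

    // Builds only the path part of the lyrics page address; the base URL is not included.
    public static String lyricsPath(String band, String songTitle) {
        return "/" + slug(band) + "/" + slug(songTitle) + "-lyrics/";
    }

    public static void main(String[] args) {
        System.out.println(lyricsPath("U2", "With or Without You"));   // /u2/with-or-without-you-lyrics/
        System.out.println(lyricsPath("Billy Joel", "Allentown"));     // /billy-joel/allentown-lyrics/
    }
}
```

Note that this simple replacement only handles spaces. Titles with apostrophes, ampersands or other punctuation would probably need extra treatment, depending on how the website builds its addresses.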

The results can easily be returned in a Web Service style fashion.
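
As an illustration of what "Web Service style" could mean here, the sketch below wraps a list of lyric lines in a small XML document. This is only one possible shape, not the format used in the training assignment: the class LyricsResponseSketch, the methods escapeXml and toXml, and the element names songLyrics, band, title and line are all assumptions made for this example. The sample data is placeholder text rather than real lyrics.

```java
import java.util.Arrays;
import java.util.List;

public class LyricsResponseSketch {

    // Escapes the characters that are special in XML text content.
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps every lyric line in a <line> element; all element names are hypothetical.
    public static String toXml(String band, String songTitle, List<String> lyrics) {
        StringBuilder xml = new StringBuilder();
        xml.append("<songLyrics>");
        xml.append("<band>").append(escapeXml(band)).append("</band>");
        xml.append("<title>").append(escapeXml(songTitle)).append("</title>");
        for (String line : lyrics) {
            xml.append("<line>").append(escapeXml(line)).append("</line>");
        }
        xml.append("</songLyrics>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Placeholder lines stand in for the list returned by a lyrics lookup.
        List<String> sample = Arrays.asList("first line", "second line & more");
        System.out.println(toXml("Some Band", "Some Song", sample));
    }
}
```

In a real service the list passed to toXml would come from LyricsGatherer.getSongLyrics, and the XML would more likely be produced by the web service framework from an annotated class than by string concatenation.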


Download the source discussed in this article.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.
