Parsing RSS and Atom with ROME, easier is impossible

ROME is a Java library for working with RSS and Atom feeds: reading, writing,
merging and converting them. Because RSS and Atom are standardized formats,
ROME converts every entry in a feed, whatever its format, to one common object
(SyndEntry). In this article I’ll describe how easy it is to parse an RSS feed
with ROME. Some of you have probably written your own simple RSS parser, but
believe me, this one is easier and can do more.

 

Parsing an RSS feed shouldn’t be too difficult. The last time I did it, I
wanted to read Apple’s RSS feed to see what new movie trailers they had to
offer. I added dependencies on JDOM 1.0, Jaxen and SAXPath in Maven2 and made a
Trailer object (a simple bean with fields for title, link, description and
date).
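
For readers who want to follow along, here is a minimal sketch of what such a
Trailer bean could look like. Only the four properties are given in the text;
the field types and the exact accessor names are assumptions chosen to match
the setters used in the parsing code.

```java
import java.util.Date;

// Hypothetical reconstruction of the Trailer bean: the article only states
// that it holds a title, a link, a description and a date.
class Trailer {
    private String title;
    private String link;
    private String description;
    private Date date;

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getLink() { return link; }
    public void setLink(String link) { this.link = link; }

    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    public Date getDate() { return date; }
    public void setDate(Date date) { this.date = date; }
}
```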

Then I wrote this piece of code:

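    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;
    import org.jdom.Document;
    import org.jdom.Element;
    import org.jdom.input.SAXBuilder;
    import org.jdom.xpath.XPath;

    // Fetch the feed and parse it into a JDOM document
    SAXBuilder builder = new SAXBuilder();
    ArrayList<Trailer> list = new ArrayList<Trailer>();
    Document doc = builder.build(new URL("http://images.apple.com/trailers/rss/newtrailers.rss"));

    // Select every <item> element in the feed
    XPath xpath = XPath.newInstance("//item");
    List<Element> results = xpath.selectNodes(doc);

    // Copy the fields of each item into a Trailer bean
    for (Element e : results) {
        Trailer t = new Trailer();
        t.setTitle(e.getChild("title").getText());
        t.setLink(e.getChild("link").getText());
        t.setDescription(e.getChild("description").getText().replaceAll("\n", ""));
        String dateString = e.getChild("pubDate").getText();
        t.setDate(Utils.parseDate(dateString));
        list.add(t);
    }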
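
The snippet relies on Utils.parseDate, a helper that is not shown in this
article. As a rough sketch of what such a helper might do, assuming the feed
uses the RFC 822 style dates that RSS 2.0 prescribes for pubDate: the pattern,
the English locale and the choice to return null on unparseable input are all
assumptions, not the original implementation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical stand-in for the Utils class used in the snippet above.
final class Utils {
    // RFC 822 style date, e.g. "Tue, 02 May 2006 10:15:00 +0000"
    private static final String RSS_DATE_PATTERN = "EEE, dd MMM yyyy HH:mm:ss Z";

    private Utils() {}

    // Returns the parsed date, or null when the text does not match the pattern.
    static Date parseDate(String dateString) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        // Locale.ENGLISH keeps day and month names independent of the system locale.
        SimpleDateFormat format = new SimpleDateFormat(RSS_DATE_PATTERN, Locale.ENGLISH);
        try {
            return format.parse(dateString.trim());
        } catch (ParseException e) {
            return null;
        }
    }
}
```

Returning null keeps the loop going when a single item has a malformed date;
throwing an exception would be an equally reasonable choice.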

That looks quite simple already, but I didn’t like having to populate the
Trailer object by hand and use XPath to get the items out of the feed. I was
sure someone had already done this and made a nice wrapper for it. I started
searching and found ROME (https://rome.dev.java.net/).

Parsing with ROME

My way of researching a new library is to get something on my screen as soon
as possible and understand the details later. So that’s what I’ll show you.

Start a new project in your favorite IDE and include the jars for JDOM and
ROME (see the end of this article for the files to download or the Maven2
dependency to include).

Now start a new JUnit test or make a simple application and
put the following snippet somewhere where it will be executed:

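    import java.net.URL;
    import java.util.List;
    import com.sun.syndication.feed.synd.SyndEntry;
    import com.sun.syndication.feed.synd.SyndFeed;
    import com.sun.syndication.io.SyndFeedInput;
    import com.sun.syndication.io.XmlReader;

    // Read the feed, let ROME parse it and print the title of every entry
    SyndFeedInput sfi = new SyndFeedInput();
    URL url = new URL("http://images.apple.com/trailers/rss/newtrailers.rss");
    SyndFeed feed = sfi.build(new XmlReader(url));
    List<SyndEntry> entries = feed.getEntries();
    for (SyndEntry entry : entries) {
        System.out.println(entry.getTitle());
    }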

That’s not much, is it? For people who like to show off how few lines of code
they need, it would look like this:

    List<SyndEntry> entries = new SyndFeedInput().build(new XmlReader(new URL("http://images.apple.com/trailers/rss/newtrailers.rss"))).getEntries();

Only one line to get the feed, parse it and put it in a
list!

Details

Now it’s time for the details. SyndFeedInput can handle all types of RSS and
Atom 0.3 feeds. Its input can be a W3C XML Document, a JDOM Document, a File,
a SAX InputSource or a java.io.Reader.

The XmlReader does some voodoo, according to the documentation. This voodoo
consists of trying to figure out the character set of the XML. With my old
parser I didn’t take into account that my streams could contain unusual
characters, so when I wanted to subscribe to a Japanese feed I ran into
trouble: no output at all! With the voodoo of ROME I didn’t have to worry
about that anymore:

[Screenshot: output of the Japanese feed parsed with ROME]
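
To give a feel for the kind of detection involved, here is a deliberately
simplified sketch that looks at the first bytes of a document for a byte order
mark or an encoding declaration in the XML prolog. The class and method names
are hypothetical and this is not ROME’s actual algorithm; the real XmlReader
is more thorough than this.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: guesses the encoding of an XML
// document from its first bytes.
final class EncodingSniffer {
    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern ENCODING_ATTR =
            Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    private EncodingSniffer() {}

    static String guessEncoding(byte[] head) {
        // Step 1: a byte order mark settles the question.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // Step 2: the XML declaration is plain ASCII in most encodings, so it
        // can be read byte for byte as ISO-8859-1 before the real charset is known.
        String prolog = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING_ATTR.matcher(prolog);
        if (m.find()) return m.group(1);

        // Step 3: without any hint, XML defaults to UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would read the first few hundred bytes of the stream, pass them to
guessEncoding, and then wrap the stream in an InputStreamReader using the
returned name.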

I still don’t understand Japanese, but at least the characters are right. If
you get question marks the first time, you probably forgot to set the output
encoding to UTF-8. On Windows you may also have trouble getting the characters
right; the solution is to create a servlet with a JSP that writes HTML to your
browser. Include <%@ page contentType="text/html; charset=UTF-8" %> in your
JSP and it should work.
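
For a plain console application, one way to set the output encoding is to wrap
the output stream in a PrintStream that encodes as UTF-8. This is a small
sketch with a hypothetical helper name; whether the characters then show up
correctly still depends on the terminal or IDE console displaying UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper: wraps any OutputStream so that text is written as UTF-8.
final class Utf8Console {
    private Utf8Console() {}

    static PrintStream utf8(OutputStream target) {
        try {
            // autoflush = true, so println output appears immediately
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }
}
```

In the parsing loop you would create the stream once with
Utf8Console.utf8(System.out) and print the entry titles through it instead of
through System.out directly.
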
Another nice detail is that the publishing date is converted to a java.util.Date object.

But wait, there is more…

Reading feeds and converting them to SyndEntry objects is not the only thing
ROME can do. It’s also possible to convert one format to another, create your
own feeds, merge feeds and write a feed to an output stream (such as a file or
a servlet response).

The quality of the ROME wiki is very good. There are some nice tutorials and
links to articles about ROME. With small projects like this the documentation
is usually poor and you have to figure out a lot yourself, but with ROME even
the Javadoc is quite elaborate.

If you knew something like ROME had to exist but didn’t know where to find it,
now you know.

Files needed

rome-0.8.jar and jdom-1.0.jar

 
Or a Maven2 dependency:

<dependency>
  <groupId>rome</groupId>
  <artifactId>rome</artifactId>
  <version>0.8</version>
</dependency>

 

Sources

ROME Project site

ROME Wiki

ROME in a Day: Parse and Publish Feeds in Java (an article on xml.com)
