Querying RSS feeds in SQL – contortionism with Oracle XML DB in 10 lines of SQL

6

There are times when a challenge presents itself. Often for no logical reason and without any direct benefit. Like doing a crosswords. I have this unconscious part of me that keeps sending me inane challenges, that I cannot seem to shake myself free of. Yesterday morning – very early morning I might add – I felt this urge to write a SQL query that can directly query RSS feeds pulled from the internet into the database with UTL_HTTP. So at 5.30AM, I was sitting at the kitchen table with my laptop, facing this challenge. And it turned out to be fairly simple and rather elegant. So I will share it with you.

My objective was to write a query that allows me to find all Articles in a number of RSS feeds that discuss a certain topic and order them by publication date. Something like:

select title
,      author
,      publication_date
,      url
,      feedname
from   rss_collection
where  instr(title, 'TOPIC') > 0
order
by     publication_date desc

And of course the rss_collection is not a table in my database but some sort of table-like construct that goes out to the internet to fetch the rss-feed data real time. Using XMLTable, this is quite doable, as I will demonstrate.....

The first small step was to create a PL/SQL function to retrieve an entire RSS feed with one call. This function is get_rss(p_url in varchar2) shown here.

create or replace
function get_rss(p_url in varchar2)
return clob
is
  l_part utl_http.html_pieces;
  l_feed clob:='';
begin
    l_part := utl_http.request_pieces(p_url);
    for i in 1 .. l_part.count loop
       l_feed:= l_feed ||l_part(i);
    end loop;
    return l_feed;
end get_rss;

It uses utl_http to retrieve the RSS Feed based on its URL from the internet and concatenates the feed contents in a CLOB.

RSS is in a specific XML format, so the CLOB returned by this function can be instantiated as an XMLType variable: XMLType( get_rss(url)).

Note: Patrick Wolf wrote a comment to the original version of this article to point out that such a PL/SQL function is not necessary, as Oracle provides a function called HTTPURITYPE that will do something similar much more easily:

"The Oracle build-in HTTPURITYPE is really powerful, it will retrieve the content of an URL and if you specify getXML, automatically convert the XML output into an Oracle XMLTYPE. As soon as it is a XMLTYPE you can use all the XML build-ins to extract the data from the XML stream."

That means that instead of  XMLType( get_rss(url)) we can simply use: HTTPURITYPE(url).getXML()

The first step:

select xmlquery( 'for $i in //item
                  return <Article>{$i/title}<timestamp>{xs:string($i/pubDate)}</timestamp>{$i/link}</Article>
                 '
                 passing by value feed
                 returning content
                )
from   ( select httpuritype('http://technology.amis.nl/blog/?feed=rss2').getXML() feed
         from   dual
       )

The big challenge is to turn this chunk of XML data into a relational resultset. And that is where XMLTable comes in – a relative of the TABLE operator that can turn a Nested Table into a queryable result set. XMLTable does exactly that:

select title
,      link
,      author
,      to_date(substr(publication_date,6,20),'dd mon yyyy hh24:mi:ss') timest
from  xmltable(XMLNAMESPACES('http://purl.org/dc/elements/1.1/' AS "dc"),'for $i in //item
                  return <Article>{$i/title}{$i/dc:creator}{$i/pubDate}{$i/link}</Article>
                 '
                 passing httpuritype('http://technology.amis.nl/blog/?feed=rss2').getXML()
                  COLUMNS
                  title     varchar2(100) path 'title'
                  , link     varchar2(100) path 'link'
                  , author     varchar2(100) path 'dc:creator'
                  , publication_date     varchar2(100) path 'pubDate'
               )

The results:

Combining multiple feeds in a single query now becomes a piece of that famous cake:

select feed_name
,      title
,      link
,      author
,      to_date(substr(publication_date,6,20),'dd mon yyyy hh24:mi:ss') timest
from  (select 'AMIS Weblog' feed_name
       ,      httpuritype('http://technology.amis.nl/blog/?feed=rss2').getXML() feed
       from   dual
       union all
       select 'OTN Headlines' feed_name
       ,      xmltype(get_rss('http://www.oracle.com/technology/syndication/rss_otn_news.xml')) feed
       from   dual
       union all
       select 'OraBlogs' feed_name
       ,      xmltype(get_rss('http://blogs.oracle.com/readingLists/oracleblogs.xml')) feed
       from   dual
      ) feeds
,     xmltable( XMLNAMESPACES('http://purl.org/dc/elements/1.1/' AS "dc")
               , 'for $i in //item
                  return <Article>{$i/title}{$i/dc:creator}<timestamp>{xs:string($i/pubDate)}</timestamp>{$i/link}</Article>
                 '
                 passing feeds.feed
                  COLUMNS
                    title     varchar2(100) path 'title'
                  , link      varchar2(100) path 'link'
                  , author    varchar2(100) path 'dc:creator'
                  , publication_date  varchar2(100) path 'timestamp'
               )
order
by    timest desc  

 

 

Resources

 Text document will all queries from this article: xmlquery_rssfeed.txt . Note: this demo was created against an Oracle 11g – 11.1.0.6 Database. I am not sure if it will work against earlier versions.
 

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

6 Comments

  1. Anton,

    That one completely slipped my mind – both the conscious and the unconscious parts. Apparently this new article is nothing but a summary of what information we had gathered before. Well, that is not overly useful, is it? I will try to find really new things then to write about. One thing we can conclude is that finally I have arrived at code that is much more compact and elegant than what I once proposed – back in April.

    Best Regards,
    Lucas