Querying RSS feeds in SQL – contortionism with Oracle XML DB in 10 lines of SQL

There are times when a challenge presents itself. Often for no logical reason and without any direct benefit. Like doing a crosswords. I have this unconscious part of me that keeps sending me inane challenges, that I cannot seem to shake myself free of. Yesterday morning – very early morning I might add – I felt this urge to write a SQL query that can directly query RSS feeds pulled from the internet into the database with UTL_HTTP. So at 5.30AM, I was sitting at the kitchen table with my laptop, facing this challenge. And it turned out to be fairly simple and rather elegant. So I will share it with you.

My objective was to write a query that allows me to find all Articles in a number of RSS feeds that discuss a certain topic and order them by publication date. Something like:

select title
,      author
,      publication_date
,      url
,      feedname
from   rss_collection
where  instr(title, 'TOPIC') > 0
order
by     publication_date desc

And of course the rss_collection is not a table in my database but some sort of table-like construct that goes out to the internet to fetch the rss-feed data real time. Using XMLTable, this is quite doable, as I will demonstrate.

The first small step was to create a PL/SQL function to retrieve an entire RSS feed with one call. This function is get_rss(p_url in varchar2) shown here.

create or replace
function get_rss(p_url in varchar2)
return clob
is
  l_part utl_http.html_pieces;
  l_feed clob:='';
begin
    l_part := utl_http.request_pieces(p_url);
    for i in 1 .. l_part.count loop
       l_feed:= l_feed ||l_part(i);
    end loop;
    return l_feed;
end get_rss;

It uses utl_http to retrieve the RSS Feed based on its URL from the internet and concatenates the feed contents in a CLOB.

RSS is in a specific XML format, so the CLOB returned by this function can be instantiated as an XMLType variable: XMLType( get_rss(url)).

Note: Patrick Wolf wrote a comment to the original version of this article to point out that such a PL/SQL function is not necessary, as Oracle provides a function called HTTPURITYPE that will do something similar much more easily:

The Oracle build-in HTTPURITYPE is really powerful, it will retrieve the content of an URL and if you specify getXML, automatically convert the XML output into an Oracle XMLTYPE. As soon as it is a XMLTYPE you can use all the XML build-ins to extract the data from the XML stream.”

That means that instead of  XMLType( get_rss(url)) we can simply use: HTTPURITYPE(url).getXML()

The first step:

select xmlquery( 'for $i in //item
                  return <Article>{$i/title}<timestamp>{xs:string($i/pubDate)}</timestamp>{$i/link}</Article>
                 '
                 passing by value feed
                 returning content
                )
from   ( select httpuritype('https://technology.amis.nl/blog/?feed=rss2').getXML() feed
         from   dual
       )

Querying RSS feeds in SQL - contortionism with Oracle XML DB in 10 lines of SQL

The big challenge is to turn this chunk of XML data into a relational resultset. And that is where XMLTable comes in – a relative of the TABLE operator that can turn a Nested Table into a queryable result set. XMLTable does exactly that:

select title
,      link
,      author
,      to_date(substr(publication_date,6,20),'dd mon yyyy hh24:mi:ss') timest
from  xmltable(XMLNAMESPACES('http://purl.org/dc/elements/1.1/' AS "dc"),'for $i in //item
                  return <Article>{$i/title}{$i/dc:creator}{$i/pubDate}{$i/link}</Article>
                 '
                 passing httpuritype('https://technology.amis.nl/blog/?feed=rss2').getXML()
                  COLUMNS
                  title     varchar2(100) path 'title'
                  , link     varchar2(100) path 'link'
                  , author     varchar2(100) path 'dc:creator'
                  , publication_date     varchar2(100) path 'pubDate'
               )

The results:

Querying RSS feeds in SQL - contortionism with Oracle XML DB in 10 lines of SQL

Combining multiple feeds in a single query now becomes a piece of that famous cake:

select feed_name
,      title
,      link
,      author
,      to_date(substr(publication_date,6,20),'dd mon yyyy hh24:mi:ss') timest
from  (select 'AMIS Weblog' feed_name
       ,      httpuritype('https://technology.amis.nl/blog/?feed=rss2').getXML() feed
       from   dual
       union all
       select 'OTN Headlines' feed_name
       ,      xmltype(get_rss('http://www.oracle.com/technology/syndication/rss_otn_news.xml')) feed
       from   dual
       union all
       select 'OraBlogs' feed_name
       ,      xmltype(get_rss('http://blogs.oracle.com/readingLists/oracleblogs.xml')) feed
       from   dual
      ) feeds
,     xmltable( XMLNAMESPACES('http://purl.org/dc/elements/1.1/' AS "dc")
               , 'for $i in //item
                  return <Article>{$i/title}{$i/dc:creator}<timestamp>{xs:string($i/pubDate)}</timestamp>{$i/link}</Article>
                 '
                 passing feeds.feed
                  COLUMNS
                    title     varchar2(100) path 'title'
                  , link      varchar2(100) path 'link'
                  , author    varchar2(100) path 'dc:creator'
                  , publication_date  varchar2(100) path 'timestamp'
               )
order
by    timest desc  

Querying RSS feeds in SQL - contortionism with Oracle XML DB in 10 lines of SQL

 

Resources

Text document will all queries from this article: xmlquery_rssfeed.txt . Note: this demo was created against an Oracle 11g – 11.1.0.6 Database. I am not sure if it will work against earlier versions.

6 Comments

  1. Husnu Sensoy November 6, 2007
  2. Lucas Jellema November 3, 2007
  3. anton November 3, 2007
  4. Lucas Jellema November 2, 2007
  5. Gareth Roberts November 2, 2007
  6. Patrick Wolf November 2, 2007