There are times when a challenge presents itself. Often for no logical reason and without any direct benefit. Like doing a crosswords. I have this unconscious part of me that keeps sending me inane challenges, that I cannot seem to shake myself free of. Yesterday morning – very early morning I might add – I felt this urge to write a SQL query that can directly query RSS feeds pulled from the internet into the database with UTL_HTTP. So at 5.30AM, I was sitting at the kitchen table with my laptop, facing this challenge. And it turned out to be fairly simple and rather elegant. So I will share it with you.
My objective was to write a query that allows me to find all Articles in a number of RSS feeds that discuss a certain topic and order them by publication date. Something like:
select title , author , publication_date , url , feedname from rss_collection where instr(title, 'TOPIC') > 0 order by publication_date desc
And of course the rss_collection is not a table in my database but some sort of table-like construct that goes out to the internet to fetch the rss-feed data real time. Using XMLTable, this is quite doable, as I will demonstrate.
The first small step was to create a PL/SQL function to retrieve an entire RSS feed with one call. This function is get_rss(p_url in varchar2) shown here.
create or replace function get_rss(p_url in varchar2) return clob is l_part utl_http.html_pieces; l_feed clob:=''; begin l_part := utl_http.request_pieces(p_url); for i in 1 .. l_part.count loop l_feed:= l_feed ||l_part(i); end loop; return l_feed; end get_rss;
It uses utl_http to retrieve the RSS Feed based on its URL from the internet and concatenates the feed contents in a CLOB.
RSS is in a specific XML format, so the CLOB returned by this function can be instantiated as an XMLType variable: XMLType( get_rss(url)).
Note: Patrick Wolf wrote a comment to the original version of this article to point out that such a PL/SQL function is not necessary, as Oracle provides a function called HTTPURITYPE that will do something similar much more easily:
“The Oracle build-in HTTPURITYPE is really powerful, it will retrieve the content of an URL and if you specify getXML, automatically convert the XML output into an Oracle XMLTYPE. As soon as it is a XMLTYPE you can use all the XML build-ins to extract the data from the XML stream.”
That means that instead of XMLType( get_rss(url)) we can simply use: HTTPURITYPE(url).getXML()
The first step:
select xmlquery( 'for $i in //item return <Article>{$i/title}<timestamp>{xs:string($i/pubDate)}</timestamp>{$i/link}</Article> ' passing by value feed returning content ) from ( select httpuritype('https://technology.amis.nl/blog/?feed=rss2').getXML() feed from dual )
The big challenge is to turn this chunk of XML data into a relational resultset. And that is where XMLTable comes in – a relative of the TABLE operator that can turn a Nested Table into a queryable result set. XMLTable does exactly that:
select title , link , author , to_date(substr(publication_date,6,20),'dd mon yyyy hh24:mi:ss') timest from xmltable(XMLNAMESPACES('http://purl.org/dc/elements/1.1/' AS "dc"),'for $i in //item return <Article>{$i/title}{$i/dc:creator}{$i/pubDate}{$i/link}</Article> ' passing httpuritype('https://technology.amis.nl/blog/?feed=rss2').getXML() COLUMNS title varchar2(100) path 'title' , link varchar2(100) path 'link' , author varchar2(100) path 'dc:creator' , publication_date varchar2(100) path 'pubDate' )
The results:
Combining multiple feeds in a single query now becomes a piece of that famous cake:
select feed_name , title , link , author , to_date(substr(publication_date,6,20),'dd mon yyyy hh24:mi:ss') timest from (select 'AMIS Weblog' feed_name , httpuritype('https://technology.amis.nl/blog/?feed=rss2').getXML() feed from dual union all select 'OTN Headlines' feed_name , xmltype(get_rss('http://www.oracle.com/technology/syndication/rss_otn_news.xml')) feed from dual union all select 'OraBlogs' feed_name , xmltype(get_rss('http://blogs.oracle.com/readingLists/oracleblogs.xml')) feed from dual ) feeds , xmltable( XMLNAMESPACES('http://purl.org/dc/elements/1.1/' AS "dc") , 'for $i in //item return <Article>{$i/title}{$i/dc:creator}<timestamp>{xs:string($i/pubDate)}</timestamp>{$i/link}</Article> ' passing feeds.feed COLUMNS title varchar2(100) path 'title' , link varchar2(100) path 'link' , author varchar2(100) path 'dc:creator' , publication_date varchar2(100) path 'timestamp' ) order by timest desc
Resources
Text document will all queries from this article: xmlquery_rssfeed.txt . Note: this demo was created against an Oracle 11g – 11.1.0.6 Database. I am not sure if it will work against earlier versions.
A sample Oracle API written by me on source-forge a time ago.
O-RSS: http://sourceforge.net/projects/o-rss/
Anton,
That one completely slipped my mind – both the conscious and the unconscious parts. Apparently this new article is nothing but a summary of what information we had gathered before. Well, that is not overly useful, is it? I will try to find really new things then to write about. One thing we can conclude is that finally I have arrived at code that is much more compact and elegant than what I once proposed – back in April.
Best Regards,
Lucas
Lucas, did you ever read this blog entry (and all the comments): http://technology.amis.nl/blog/?p=1173
Patrick, Thanks for the tip! I was not aware of that function. That is really useful. Lucas
Lucas and Patrick, you’re on the ball as usual, great stuff!
Gareth
Lucas,
you can also use HTTPURITYPE, then you do not even need the PL/SQL function. See my posting
http://inside-apex.blogspot.com/2007/02/integrating-yahoo-pipes-into-apex.html
Patrick