This is the fourth installment in a series of four on the concept of
Post Loading Resources into an HTML document after the document itself
has been loaded. Using AJAX techniques and a simple JavaScript library –
the PLR framework – we can very easily get the browser to fetch
additional content after the document has been loaded. This content
can come from a variety of sources, local or remote (using a Proxy
Servlet), and can be processed in many ways. We have seen in the second
installment how we simply paste the post loaded content into the
innerHTML property of container elements like DIV or TD or the value
property of Form elements. Part 3 discussed a more advanced version of
PLR that can handle custom PLR processors as well as dependencies
between different PLR objects. That part also discussed the concept of
HTML scraping and demonstrated how post loaded resources can be
JavaScript libraries that are dynamically added to the document. We
used that functionality to dynamically create SELECT elements and
subsequently populate these lists with post loaded data.
In this part we will build upon the previous articles and add
functionality to define a refresh interval for Post Load Resource
objects. The PLR framework will re-load the resource every time the
refresh interval expires. Thus we can include news-headlines sections
in portlet-like frames that are periodically refreshed. This article
also discusses client-side XSLT transformations, used for processing
external RSS-feeds as well as processing XML fragments that are the
result of (X)HTML scraping.
The previous installments are listed here:
- Part 1 – Ajax-based Post Loading of resources in HTML pages – for reuse of resources and fast user feedback
- Part 2 – HTML Post Loading Resources Framework (AJAX Based) – Part 2 – Loading and pasting simple content
- Part 3 – HTML Post Loading and Processing Resources using AJAX – Part 3: multiple, dependent resources and custom processing
Client Side XSLT transformation using AJAXSLT (powered by Google)
Since many of the resources we end up Post Loading are in fact XML documents
– (scraped) XHTML, plain XML, RSS, SOAP – it will be a common
requirement to do XSLT transformations as part of the processing of
these resources. Most modern browsers have built-in support for XSLT
transformations – though the implementations differ across browsers. I
chose to make use of the AJAXSLT library from Google for doing the
client side transformations. I have discussed AJAXSLT in an earlier
article: Introducing AJAXSLT – library for client side, JavaScript, XSLT transformations (good for RIA and AJAX). That article explains where to get AJAXSLT, how to set it up and how to use it.
For the Post Load Resource library, nothing really changes. We have to
include the AJAXSLT JS-libraries in our HTML document – or load them
as PLRs themselves, which is also an option:
<script src="misc.js" type="text/javascript"></script>
<script src="dom.js" type="text/javascript"></script>
<script src="xpath.js" type="text/javascript"></script>
<script src="xslt.js" type="text/javascript"></script>
The XSLT used for the transformation is one PLR and the XML content is
another PLR that depends on the XSLT. This is very easily specified through
the PLR framework:
<h3>Show the contents of a local XML document - XSLT transformed in the client</h3>
<DIV style="background-color :yellow;height:150px">
  <pre id="PL1">
    <Script language="JavaScript">
      var deptXsltPLR = addPostLoadResource(null, 'dept2html.xsl', false, 'XSLT_FOR_DEPT_LOADER');
      addPostLoadResource('PL1', 'dept.xml', false, 'POSTLOAD_DEPARTMENTS', customXMLXSLTrocessPostLoad, new Array(deptXsltPLR));
    </Script>
  </pre>
</DIV>
Here we set up a DIV and we specify the two PLR objects: first the
deptXsltPLR, which post loads the dept2html.xsl stylesheet. This PLR is
not linked to a custom processor or a DOM element – which is logical as
we will not process this XSLT document itself. Next we set up the PLR
for the dept.xml resource; this PLR depends on the deptXsltPLR and it
uses the customXMLXSLTrocessPostLoad function for processing the XML
PLR.
The XML document in this case looks like:
<?xml version="1.0"?>
<DEPT>
  <ROW>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>NEW YORK</LOC>
  </ROW>
  <ROW>
    <DEPTNO>20</DEPTNO>
    <DNAME>RESEARCH</DNAME>
    <LOC>DALLAS</LOC>
  </ROW>
  <ROW>
    <DEPTNO>30</DEPTNO>
    <DNAME>SALES</DNAME>
    <LOC>CHICAGO</LOC>
  </ROW>
  <ROW>
    <DEPTNO>40</DEPTNO>
    <DNAME>OPERATIONS</DNAME>
    <LOC>BOSTON</LOC>
  </ROW>
</DEPT>
And the XSLT stylesheet we use for transforming it is this one:
<?xml version="1.0" encoding="windows-1252" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Root template -->
  <xsl:template match="/">
    <H2>Departments</H2>
    <ul>
      <xsl:for-each select="DEPT/ROW">
        <li>
          <xsl:value-of select="DNAME" />
          <xsl:text> - located in </xsl:text>
          <xsl:value-of select="LOC" />
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
The function that will do the processing:
function customXMLXSLTrocessPostLoad() {
  var postLoadResource = this;
  var xml = postLoadResource.req.responseXML;
  var xslt = xmlParse(postLoadResource.dependsOn[0].req.responseText);
  var html = xsltProcess(xml, xslt);
  el(this.elementId).innerHTML = html;
}
This function implements a more or less generic way of processing an
XML/XSLT couple of Post Load Resource objects. It assumes that the
context object (this) is the XML PLR and that the first of its
dependencies, dependsOn[0], refers to the XSLT stylesheet. It then uses
the AJAXSLT library for parsing the XSLT document and performing the
transformation. The result is pasted as an HTML fragment into the
innerHTML property of the target DOM element. The result is shown below:
Combining HTML Scraping with XSLT Transformation
If we use HTML scraping – use a web-page as external resource and extract
a fragment from it – we could use XSLT for the post-load processing: if
the HTML fragment is in fact an XHTML or valid XML document, we can
apply the same type of processing used above. In our next example, we
will take the AMIS Technology Weblog as our external source. This fine
blog with lots of interesting articles on fantastic subjects like AJAX,
EJB 3.0, SOA and BPEL, and so much more, provides an RSS feed as well,
but for some reason we want to scrape the homepage:
What we are interested in is the list of recent posts. This is found in the
HTML page on the left hand side, probably inside a DIV and presented as
an unordered list. Close inspection of the page source gives us the
clues needed to scrape that recent posts content:
function customProcessAMISBLOG() {
  // get hold of new posts - that is the UL element following the string Recent Posts
  var postLoadResource = this;
  var pos = postLoadResource.req.responseText.indexOf('Recent Posts');
  pos = postLoadResource.req.responseText.indexOf('<ul>', pos);
  // now the end of the <ul> demarcates the end of the list of Recent Posts
  var endPos = postLoadResource.req.responseText.indexOf('</ul>', pos);
  // store in string, parse as xml
  var xmlString = "<recentPosts>" + postLoadResource.req.responseText.substring(pos, endPos+5) + "</recentPosts>";
  // transform
  var xml = xmlParse(xmlString);
  var xslt = xmlParse(postLoadResource.dependsOn[0].req.responseText);
  var html = xsltProcess(xml, xslt);
  el(postLoadResource.elementId).innerHTML = html;
}
Inside the HTML document returned from https://technology.amis.nl/blog/index.php
we find the string Recent Posts. We locate the first <ul> element
after that, as well as the first closing </ul> tag that follows it. The
entire fragment between these two tags, tags included, is wrapped in a
<recentPosts> element and parsed as an XML document. Subsequently,
it is transformed using the following XSLT:
<?xml version='1.0' encoding='windows-1252'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <div style="color:green">
      <xsl:value-of select="'AMIS Technology Weblog - Recent Posts'"/>
      <ul>
        <xsl:for-each select="recentPosts/ul/li">
          <li>
            <xsl:copy-of select="a" />
          </li>
        </xsl:for-each>
      </ul>
    </div>
  </xsl:template>
</xsl:stylesheet>
The result of this transformation looks like:
The snippet used in the HTML document to set up this AMIS Blog scraper is the following:
<h3>Download the frontpage of the AMIS Technology Weblog, scrape the list of recent posts and do an XSLT transformation</h3>
<DIV class="left" id="PL7" > </DIV>
<Script language="JavaScript">
  var xsltblogPLR = addPostLoadResource(null, 'amisblog2html.xsl', false, 'XSLT_FOR_AMISBLOG_LOADER');
  addPostLoadResource('PL7', 'technology.amis.nl/blog/index.php', true, 'POSTLOAD_AMISBLOG_LOADER', customProcessAMISBLOG, new Array(xsltblogPLR));
</Script>
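The scraping step in customProcessAMISBLOG() assumes that the marker string and both list tags are always present in the page. A minimal sketch of a more defensive extraction could look like the following. The helper name extractListAfterMarker is hypothetical and not part of the PLR framework; it returns null when the page layout has changed, instead of producing a broken XML string.

```javascript
// Hypothetical helper, not part of the PLR framework: returns the first
// <ul>...</ul> fragment that follows the marker text, or null if any of
// the three landmarks (marker, <ul>, </ul>) is missing.
function extractListAfterMarker(html, marker) {
  var markerPos = html.indexOf(marker);
  if (markerPos === -1) { return null; }
  var startPos = html.indexOf('<ul>', markerPos);
  if (startPos === -1) { return null; }
  var endPos = html.indexOf('</ul>', startPos);
  if (endPos === -1) { return null; }
  return html.substring(startPos, endPos + '</ul>'.length);
}

// Example with a tiny stand-in for the downloaded page
var samplePage = '<div><h2>Recent Posts</h2><ul><li><a href="/a">A</a></li></ul></div>';
var fragment = extractListAfterMarker(samplePage, 'Recent Posts');
var xmlString = fragment === null ? null : '<recentPosts>' + fragment + '</recentPosts>';
```

Like the original function, this sketch stops at the first </ul> it finds, so a nested list inside the Recent Posts block would be cut short.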
Post Loading and Processing RSS Feeds
A far more organized approach than HTML scraping for getting headlines
from other websites is the use of RSS feeds. RSS feeds are
simple XML based documents that can very easily be transformed. In this
next example, we will Post Load the RSS feed for the BBC News Headlines
– this feed is found at http://news.bbc.co.uk/rss/newsonline_world_edition/front_page/rss091.xml
– and transform it using a simple XSLT stylesheet. We can use the generic function
customXMLXSLTrocessPostLoad() to handle the XML and XSLT resources. The
specification of this RSS Reader in the HTML document is the following
snippet:
<h3>Postload the BBC News RSS feed, an XSLT stylesheet and transform to HTML (refresh the newsfeed every 2 minutes) </h3>
<Script language="JavaScript">
  var xsltPLR = addPostLoadResource('xml', 'rss2html.xsl', false, 'XSLT_FOR_RSS_LOADER');
  addPostLoadResource('PL4', 'news.bbc.co.uk/rss/newsonline_world_edition/front_page/rss091.xml', true, 'POSTLOAD_BBCRSS_LOADER', customXMLXSLTrocessPostLoad, new Array(xsltPLR), 120);
</Script>
<DIV class="right" id="PL4" style="width:60%"> </DIV>
The XSLT stylesheet used for transforming the RSS feed from Auntie Beeb to
pretty HTML (well, you can read it, can’t you?) is the following:
<?xml version='1.0' encoding='windows-1252'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <div style="color:green">
      <h3><xsl:value-of select="rss/channel/title/."/></h3>
      <ul>
        <xsl:for-each select="rss/channel/item">
          <li>
            <xsl:element name="a">
              <xsl:attribute name="target">
                <xsl:text>_blank</xsl:text>
              </xsl:attribute>
              <xsl:attribute name="href">
                <xsl:value-of select="link"/>
              </xsl:attribute>
              <xsl:value-of select="title"/>
            </xsl:element>
          </li>
        </xsl:for-each>
      </ul>
    </div>
  </xsl:template>
</xsl:stylesheet>
The feed itself looks like this:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/shared/bsp/xsl/rss/nolsol.xsl"?>
<rss version="2.0">
  <channel>
    <title>BBC News | News Front Page | World Edition</title>
    <link>http://news.bbc.co.uk/go/rss/-/2/hi/default.stm</link>
    ...
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/2/hi/help/rss/4498287.stm for terms and conditions of reuse</copyright>
    <docs>http://www.bbc.co.uk/syndication/</docs>
    <ttl>15</ttl>
    <image>
      <title>BBC News</title>
      <url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
      <link>http://news.bbc.co.uk</link>
    </image>
    <item>
      <title>Bush vows to face dangers head on</title>
      <description>The US president warns of "danger and decline" if the US fails to face down threats, in his State of the Union speech.</description>
      <link>http://news.bbc.co.uk/go/rss/-/2/hi/americas/4665758.stm</link>
      <guid isPermaLink="false">http://news.bbc.co.uk/2/hi/americas/4665758.stm</guid>
      <pubDate>Wed, 01 Feb 2006 03:54:34 GMT</pubDate>
      <category>Americas</category>
    </item>
    <item>
      <title>Nepal elections 'will go ahead'</title>
      <description>King Gyanendra of Nepal says local elections will go ahead next week, speaking a year after he took direct power.</description>
      <link>http://news.bbc.co.uk/go/rss/-/2/hi/south_asia/4668646.stm</link>
      <guid isPermaLink="false">http://news.bbc.co.uk/2/hi/south_asia/4668646.stm</guid>
      <pubDate>Wed, 01 Feb 2006 04:40:03 GMT</pubDate>
      <category>South Asia</category>
    </item>
    ....
You can see it would be easy to extend the RSS portlet a little with a
summary of the news or an indication of the age of the item.
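As an illustration of the "age of the item" idea, here is a small sketch of a helper that turns a pubDate value from the feed into a rough age description. The function name describeItemAge and the wording of the labels are my own assumptions and not part of the PLR framework; a custom processor could append the result to each headline.

```javascript
// Hypothetical helper: describes how old a feed item is, based on its
// pubDate string (the date format used in the BBC feed shown above).
// 'now' is passed in as a Date so the function can be tried with fixed dates.
function describeItemAge(pubDate, now) {
  var published = Date.parse(pubDate);
  if (isNaN(published)) { return 'age unknown'; }
  var minutes = Math.floor((now.getTime() - published) / 60000);
  if (minutes < 1) { return 'just now'; }
  if (minutes < 60) { return minutes + ' min ago'; }
  var hours = Math.floor(minutes / 60);
  if (hours < 24) { return hours + ' h ago'; }
  return Math.floor(hours / 24) + ' d ago';
}

// The first item of the sample feed, seen at the time the second item was published
var age = describeItemAge('Wed, 01 Feb 2006 03:54:34 GMT',
                          new Date(Date.UTC(2006, 1, 1, 4, 40, 3)));
```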
The Refresh Interval
It is quite simple to add a Refresh Interval to the PLR framework. We add
it as a new property to the PLR object. Of course, we have to support
that new property in the functions called to add new Post Load Resource
objects:
/*--- content loader object for cross-browser requests ---*/
amis.PostLoadResource = function
  ( elementId
  , url
  , requireProxy // boolean indicating whether or not the resource must be acquired through a proxy from a remote domain
  , label
  , processor    // function reference of function that will process the resource when received
  , dependsOn    // other PostLoadResources this PLR may depend on, such as an XSLT that an XML depends on for being processed
  , refreshTime  // time interval in seconds after which the resource should be reloaded and reprocessed
  ) {
  ...
  this.processor = processor;
  this.refreshTime = refreshTime;
  this.dependsOn = dependsOn;
  ...
}

// create a new PLR object and add it to the array of PLR objects to be dealt with when the page has loaded
function addPostLoadResource
  ( elementId
  , url
  , requireProxy // boolean indicating whether or not the resource must be acquired through a proxy from a remote domain
  , label
  , processor    // function reference of function that will process the resource when received
  , dependsOn
  , refreshTime  // in seconds
  ) {
  var plr = new amis.PostLoadResource(elementId, url, requireProxy, label, processor, dependsOn, refreshTime);
  var size = postLoadResources.push( plr ); // add a new PostLoadResource object to the array
  plr.id = size - 1; // ensure the plr object knows where it sits in the postLoadResources array
  return plr;
}
So now we have added a refreshTime property. We have to make use of that
property once we have first processed the resource. If at that time we
find that a refresh time was specified, we have to schedule the next
load and execution of the PLR object. That is done in the
startProcessing() function:
function startProcessing(postLoadResource) {
  // check if the postLoadResource depends on other plrs that are not yet processed
  // if so, it goes into the waiting room - state= amis.STATE_LOADED_AND_WAITING
  ...
  postLoadResource.state = amis.STATE_PROCESSED;
  if (postLoadResource.refreshTime) {
    setTimeout("handleTimeOut('"+postLoadResource.id+"')", postLoadResource.refreshTime*1000);
  }
}// startProcessing
After processing is done, we look at the refreshTime property. If it is set,
we call the built-in JavaScript function setTimeout. We call it with
two parameters: a String holding the code to be executed (here a function
call) and the time after which that code should run. We always call the
same function – handleTimeOut(). Since we cannot pass an object in a
string, we pass the id of the PLR object. This id property was set in
the addPostLoadResource() function and corresponds with the index of
the PLR object in the postLoadResources array. Since the refreshTime
property on the PLR is set in seconds and the setTimeout function takes
the timeout interval in milliseconds, we multiply refreshTime by 1000.
function handleTimeOut(id) {
  postLoadSingleResource(postLoadResources[id]);
}
The function handleTimeOut is a generic function that deals with PLRs that
need refreshing. The function gets passed in an id parameter that
refers to a PLR object in the postLoadResources array. This object is
retrieved from the array using the id value and subsequently passed to
postLoadSingleResource. Processing takes place in the same way as when
the document was first loaded. Since that processing ends in
startProcessing() again, a new timeout is scheduled after every reload,
which is what makes the refresh periodic.
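setTimeout also accepts a function reference instead of a string of code, which avoids the detour through the id. A minimal sketch of that variant follows. The name scheduleRefresh and the injectable timer argument are hypothetical and only there to make the sketch easy to try out; reload stands in for postLoadSingleResource.

```javascript
// Hypothetical alternative to the string-based setTimeout call.
// plr    : a PLR object with an optional refreshTime (in seconds)
// reload : function that reloads one PLR (postLoadSingleResource in the framework)
// timer  : scheduling function, defaults to the built-in setTimeout
function scheduleRefresh(plr, reload, timer) {
  if (!plr.refreshTime) { return false; }      // no interval: nothing to schedule
  var schedule = timer || setTimeout;
  schedule(function () { reload(plr); },       // the closure keeps the object itself
           plr.refreshTime * 1000);            // seconds to milliseconds
  return true;
}

// Try it out with a fake timer that only records what was scheduled
var scheduled = [];
var reloaded = [];
var fakeTimer = function (callback, delay) {
  scheduled.push({ callback: callback, delay: delay });
};
var newsPlr = { id: 0, refreshTime: 120 };
scheduleRefresh(newsPlr, function (p) { reloaded.push(p); }, fakeTimer);
scheduled[0].callback();   // simulate the timer firing
```

With this variant the PLR object travels inside the closure, so the lookup in the postLoadResources array is no longer needed for refreshing.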
Note: the mechanism currently does
not refresh the PLR objects that have a dependency on the PLR that is
refreshed. If for example we specify a refresh rate for the XSLT
stylesheet that we use to transform the BBC News Headlines RSS, then we
probably should refresh the dependent Headlines PLR itself as well,
since otherwise we see no effect of the refreshed XSLT whatsoever.
Clearly this is easy to add to the framework.
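One way to add this, sketched under the assumption that every PLR keeps its dependencies in a dependsOn array as shown earlier: after a PLR has been refreshed, look up the PLRs that list it as a dependency and reprocess those. The function name findDependents and the stand-in objects are hypothetical.

```javascript
// Hypothetical helper: returns all PLR objects that directly depend on 'refreshed'.
function findDependents(allResources, refreshed) {
  var dependents = [];
  for (var i = 0; i < allResources.length; i++) {
    var deps = allResources[i].dependsOn || [];   // dependsOn may be undefined
    for (var j = 0; j < deps.length; j++) {
      if (deps[j] === refreshed) {
        dependents.push(allResources[i]);
        break;                                    // list each dependent only once
      }
    }
  }
  return dependents;
}

// Stand-ins for the XSLT PLR and the RSS PLR that depends on it
var xsltResource = { id: 0, label: 'XSLT_FOR_RSS_LOADER' };
var rssResource = { id: 1, label: 'POSTLOAD_BBCRSS_LOADER', dependsOn: [xsltResource] };
var otherResource = { id: 2, label: 'POSTLOAD_DEPARTMENTS' };
var toReprocess = findDependents([xsltResource, rssResource, otherResource], xsltResource);
```

In the framework, the processor of each PLR in toReprocess could be run again once the refreshed resource has been processed. Only direct dependents are covered here; chains of dependencies would need a recursive walk.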
To be complete in this matter, the call that creates the PLR with auto-refresh from the HTML document looks something like:
addPostLoadResource('PL4', 'news.bbc.co.uk/rss/newsonline_world_edition/front_page/rss091.xml', true, 'POSTLOAD_BBCRSS_LOADER', customXMLXSLTrocessPostLoad, new Array(xsltPLR), 120);
It is the parameter 120 that indicates that this particular PLR should be
refreshed every 120 seconds. We will now have the BBC News Headlines
refreshed every 2 minutes – which turns out to be far more frequent
than the BBC itself refreshes the feed…
Resources
Download the Sources for this article: PostLoadResourceDemo3.zip.
Related articles
Proxy Servlet for AJAX requests to multiple, remote servers