Performance guidelines for XSL-T development


XSL-T transformation for Oracle developers.

When transforming XML with XSL-T stylesheets, you should consider the performance of your transformation. Just as you tune an Oracle database, you can and should tune XSL-T transformations to improve both speed and memory usage. In most of the transformations we perform, i.e. transforming data-oriented XML documents (relatively small result sets queried from a database) to HTML, CSV, SVG or PDF, parsing the stylesheet accounts for approximately 60 to 75% of the total processing time.

The cost of transforming an XML document can be measured by, among other things, the amount of memory consumed by the transformer process and the parse/transform ratio. You can minimize the parse/transform ratio by reusing the same parsed stylesheet for multiple XML documents. In Java, instead of instantiating a new Transformer object each time you want to perform a transformation, you can use the javax.xml.transform.Templates interface to parse the stylesheet once and then create a Transformer from the Templates object (Templates.newTransformer()) whenever a transformation is needed. In PL/SQL, using the supplied DBMS_XSLPROCESSOR package, call newStylesheet() once and then make repeated calls to processXSL() within a stored procedure.
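To make the Templates approach concrete, here is a minimal Java sketch, assuming the JDK's built-in JAXP processor; the stylesheet, the sample documents and the helper names compile() and transform() are hypothetical. The stylesheet is parsed once into a Templates object, and each document only gets a new, lightweight Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesReuse {

    // Hypothetical stylesheet: lists the @name of every <row>, comma-separated.
    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='rows/row'>"
        + "<xsl:value-of select='@name'/><xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Parse and compile the stylesheet ONCE; a Templates object is thread-safe and reusable.
    public static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet does not compile", e);
        }
    }

    // Per document: obtain a cheap Transformer from the already compiled Templates.
    public static String transform(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile(XSL); // the stylesheet is parsed only here...
        System.out.println(transform(templates, "<rows><row name='a'/><row name='b'/></rows>"));
        System.out.println(transform(templates, "<rows><row name='c'/></rows>")); // ...and reused here
    }
}
```

A Templates object is thread-safe, so it can be cached (for example in a static field) and shared between requests; a Transformer is not, so create one per transformation or per thread.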

Below is a short list of performance guidelines for XSL-T transformations and stylesheet development. Examples will follow shortly.

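1. Minimize unnecessary traversal of the node tree.
Select or match a nested node by its full path instead of using a wildcard search or, even worse, a double slash (//), which looks for the node at any level of the tree and may force the processor to scan the whole document. Do: <xsl:template match='/a/b/c/d'> instead of <xsl:template match='//d'>, and <xsl:apply-templates select='/a/b/c/d'/> instead of <xsl:apply-templates select='//d'/>.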
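The following Java sketch (hypothetical document and class, JDK's built-in XSLT 1.0 processor assumed) runs the same loop once with //d and once with the explicit path. Both produce [1][2]; the difference is how much of the tree the processor may have to visit to find the nodes, and how large that difference is in practice depends on the processor and the document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ExplicitPath {

    // Hypothetical source: the <d> nodes live under /a/b/c; <x> is unrelated content.
    public static final String XML =
        "<a><b><c><d>1</d><d>2</d></c></b><x><y>noise</y></x></a>";

    // Builds a stylesheet that prints every node selected by the given XPath as [value].
    public static String stylesheet(String path) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='text'/>"
             + "<xsl:template match='/'>"
             + "<xsl:for-each select='" + path + "'>[<xsl:value-of select='.'/>]</xsl:for-each>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(stylesheet("//d"), XML));      // searches every level of the tree
        System.out.println(run(stylesheet("/a/b/c/d"), XML)); // walks one known path
    }
}
```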

2. Minimize unnecessary reliance on the built-in template rules.
Nodes for which your stylesheet has no matching template are handled by XSL-T's built-in template rules: for elements (and the root) these simply do <xsl:apply-templates/> on the children, and for text nodes they copy the text to the output. When the node tree is deeply nested, walking it this way adds overhead, and any text in the unmatched nodes ends up in the result. In such cases it is better to select or match the nested node directly.
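A small Java sketch of the effect, using a hypothetical document and the JDK's built-in processor: the first stylesheet relies on the built-in rules to reach <d> and also copies the unrelated text "noise" into the output, while the second selects the nested node directly.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BuiltInRules {

    static final String HEAD =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>";

    // Only <d> has a template; the built-in rules walk a, b and c and copy their text.
    public static final String IMPLICIT = HEAD
        + "<xsl:template match='d'>[<xsl:value-of select='.'/>]</xsl:template>"
        + "</xsl:stylesheet>";

    // Jumps from the root straight to the nested node; nothing else is visited.
    public static final String DIRECT = HEAD
        + "<xsl:template match='/'><xsl:apply-templates select='/a/c/d'/></xsl:template>"
        + "<xsl:template match='d'>[<xsl:value-of select='.'/>]</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical source document.
    public static final String XML = "<a><b>noise</b><c><d>x</d></c></a>";

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(IMPLICIT, XML)); // noise[x]
        System.out.println(run(DIRECT, XML));   // [x]
    }
}
```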

3. Prefer selecting or matching nodes over filtering them.
In most cases it is preferable to select or match exactly the nodes you need (for example with a predicate such as item[@status='open']) rather than selecting all nodes (via <xsl:for-each>) and filtering out the ones you are looking for (using <xsl:if test…>).
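In the following sketch (hypothetical data and class names, JDK's built-in processor assumed), both stylesheets print [A][C]; the second one states the condition once in the select expression and lets a template rule handle the matching items, instead of testing every item inside a loop.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SelectVsFilter {

    static final String HEAD =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>";

    // Selects every <item> and filters inside the loop.
    public static final String FILTER = HEAD
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='items/item'>"
        + "<xsl:if test=\"@status='open'\">[<xsl:value-of select='.'/>]</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Selects only the wanted items with a predicate and matches them with a template.
    public static final String SELECT = HEAD
        + "<xsl:template match='/'>"
        + "<xsl:apply-templates select=\"items/item[@status='open']\"/>"
        + "</xsl:template>"
        + "<xsl:template match='item'>[<xsl:value-of select='.'/>]</xsl:template>"
        + "</xsl:stylesheet>";

    public static final String XML = "<items><item status='open'>A</item>"
        + "<item status='closed'>B</item><item status='open'>C</item></items>";

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(FILTER, XML)); // [A][C]
        System.out.println(run(SELECT, XML)); // [A][C]
    }
}
```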

4. Avoid sorting when you can.
Sorting in the stylesheet is expensive. Because in most of our cases (see above) the data source is a database, let the database do the sorting: a database, and the Oracle database in particular, is far better at sorting data. If sorting during the transformation is unavoidable, consider caching the sorted result in a variable for reuse. Note that in XSL-T 1.0 such a variable holds a result tree fragment; to process it again as a node-set you need an extension function such as exsl:node-set().
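A minimal sketch of sorting once and reusing the cached result, with hypothetical data and the JDK's built-in processor. Because this is XSL-T 1.0, the variable holds a result tree fragment; the sketch only reuses its string value, which needs no extension function.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SortOnce {

    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        // Sort once and keep the sorted output in a variable...
        + "<xsl:variable name='sorted'>"
        + "<xsl:for-each select='rows/row'>"
        + "<xsl:sort select='@name'/>"
        + "<xsl:value-of select='@name'/><xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:variable>"
        // ...then reuse it (e.g. in a header and a footer) without sorting again.
        + "<xsl:value-of select='$sorted'/>|<xsl:value-of select='$sorted'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical, unsorted input rows.
    public static final String XML =
        "<rows><row name='carol'/><row name='alice'/><row name='bob'/></rows>";

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(XSL, XML)); // alice,bob,carol|alice,bob,carol
    }
}
```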

5. Use indexes (keys) for frequently selected node-sets.
XSL-T keys (<xsl:key> and the key() function) are the equivalent of indexes in the Oracle database: when used properly, they speed up selecting or matching the indexed nodes.
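The sketch below (hypothetical data, JDK's built-in processor) resolves each employee's department twice: once through an <xsl:key> index and once with a predicate that scans all <dept> elements. The output is the same; the key lets the processor build the index once and reuse it for every lookup. Whether the scan is noticeably slower depends on the processor and on the data volume.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class KeyLookup {

    // Index lookup through the key declared in the stylesheet.
    public static final String KEYED = "key('deptById', @dept)/@name";
    // Equivalent predicate that scans every <dept> for each employee.
    public static final String SCAN = "/db/dept[@id = current()/@dept]/@name";

    public static final String XML = "<db><dept id='10' name='Sales'/><dept id='20' name='IT'/>"
        + "<emp name='Ann' dept='20'/><emp name='Bob' dept='10'/></db>";

    // Builds a stylesheet that prints name=department; for every employee.
    public static String stylesheet(String lookup) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='text'/>"
             + "<xsl:key name='deptById' match='dept' use='@id'/>"
             + "<xsl:template match='/'>"
             + "<xsl:for-each select='db/emp'>"
             + "<xsl:value-of select='@name'/>=<xsl:value-of select=\"" + lookup + "\"/>;"
             + "</xsl:for-each>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(stylesheet(KEYED), XML)); // Ann=IT;Bob=Sales;
        System.out.println(run(stylesheet(SCAN), XML));  // Ann=IT;Bob=Sales;
    }
}
```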

6. Choose deliberately between the push and pull models.
Use the pull model when the structure of the result document drives the transformation. For example, when generating an XSL-FO document or an HTML page, the structure of that page largely determines the output: you insert (pull) the values from the XML source into the XSL-FO or HTML. Such stylesheets typically contain fewer template rules.

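Use the push model when the structure of the XML source drives the transformation. The result document is then built up from the output of many matched templates. I would use the push model when processing sibling nodes: <xsl:apply-templates> pushes each sibling to its matching template, whereas <xsl:for-each> is the pull-style way to iterate over them.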
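To illustrate the two models, here are two hypothetical stylesheets that produce the same HTML list, run with the JDK's built-in processor: the pull version iterates with <xsl:for-each> inside a fixed output skeleton, the push version lets template rules handle each source node.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PullVsPush {

    static final String HEAD =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>";

    // Pull: the output skeleton is fixed and values are pulled in with for-each/value-of.
    public static final String PULL = HEAD
        + "<xsl:template match='/'>"
        + "<ul><xsl:for-each select='order/line'><li><xsl:value-of select='.'/></li></xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Push: each source node is handed to the template rule that matches it.
    public static final String PUSH = HEAD
        + "<xsl:template match='order'><ul><xsl:apply-templates select='line'/></ul></xsl:template>"
        + "<xsl:template match='line'><li><xsl:value-of select='.'/></li></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical source document.
    public static final String XML = "<order><line>pen</line><line>ink</line></order>";

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(PULL, XML)); // <ul><li>pen</li><li>ink</li></ul>
        System.out.println(run(PUSH, XML)); // same output
    }
}
```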

7. Consider using named templates for reusable functionality.
Named templates can be seen as the equivalent of PL/SQL functions or Java methods: they take parameters (declared with <xsl:param>, passed with <xsl:with-param>) as input and produce a result tree fragment as output, which makes them very suitable for reusable functionality. Named templates are ideal for rendering recurring fragments in the result tree, for instance to give HTML tables a specific look and feel.
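A sketch of a named template used as a reusable "function" for table cells. The template name cell, its parameter value and the class attribute data are hypothetical look-and-feel details, and the JDK's built-in processor is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NamedTemplate {

    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Reusable "function": renders one cell with a fixed look and feel.
        + "<xsl:template name='cell'>"
        + "<xsl:param name='value'/>"
        + "<td class='data'><xsl:value-of select='$value'/></td>"
        + "</xsl:template>"
        // Caller: passes each column as the parameter.
        + "<xsl:template match='/'>"
        + "<tr><xsl:for-each select='row/col'>"
        + "<xsl:call-template name='cell'><xsl:with-param name='value' select='.'/></xsl:call-template>"
        + "</xsl:for-each></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static final String XML = "<row><col>1</col><col>2</col></row>";

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(XSL, XML));
    }
}
```

Changing the cell markup now means editing one template instead of every place that renders a cell.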

8. Use variables for frequently used node-sets.
Instead of repeatedly re-selecting or re-matching the same node-set, cache it in a variable and refer to the variable. A variable costs a little extra memory, but the more often you use the variable instead of re-selecting, the larger the benefit becomes.
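A minimal sketch with hypothetical data and the JDK's built-in processor: the open orders are selected once into the variable $open, which is then used for both the count and the sum instead of evaluating the same expression twice.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class VariableCache {

    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        // Select the open orders once...
        + "<xsl:variable name='open' select=\"orders/order[@status='open']\"/>"
        // ...and reuse the cached node-set instead of re-selecting it.
        + "<xsl:value-of select='count($open)'/>:<xsl:value-of select='sum($open/@amount)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static final String XML = "<orders><order status='open' amount='10'/>"
        + "<order status='closed' amount='5'/><order status='open' amount='7'/></orders>";

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(XSL, XML)); // 2:17
    }
}
```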

9. Debug your code.
Debugging is simply done with <xsl:message>. Remove your debug messages before going into production: they consume memory, and the messages are written to unexpected places (client browser, console, …).
As an intermediate solution, you can filter out or switch off debug messages, similar to a log4J (or log4PLSQL) solution:

    <xsl:variable name='debugOn' select='true()'/>
    ... xsl fragment ...
    <xsl:if test='$debugOn'>
      <xsl:message terminate='no'>hello from the debugger</xsl:message>
    </xsl:if>
    ... more xslt ...

Before going into production, change the variable debugOn to false() and you’re done.
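A variation on the debugOn switch, sketched in Java with the JDK's built-in processor: declaring the flag as a top-level <xsl:param> instead of an <xsl:variable> lets you switch debugging on or off from the calling code without editing the stylesheet. The parameter name debug and its 'yes'/'no' values are assumptions; a string is compared because a Java value passed as a parameter is not guaranteed to arrive as an XPath boolean. Where the message text ends up depends on the processor (the JDK's default writes it to the console); it never appears in the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DebugSwitch {

    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Hypothetical switch, off by default; can be overridden by the caller.
        + "<xsl:param name='debug' select=\"'no'\"/>"
        + "<xsl:template match='/'>"
        + "<xsl:if test=\"$debug = 'yes'\">"
        + "<xsl:message terminate='no'>rows found: <xsl:value-of select='count(rows/row)'/></xsl:message>"
        + "</xsl:if>"
        + "<xsl:value-of select='count(rows/row)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet with debugging switched on or off from Java.
    public static String run(String xml, boolean debug) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("debug", debug ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rows><row/><row/></rows>";
        System.out.println(run(xml, true));  // result 2, plus a debug message on the console
        System.out.println(run(xml, false)); // result 2, no message
    }
}
```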

10. Make your code more robust by adding error messages.
<xsl:message terminate='yes'> can be used to report (fatal) errors; terminate='yes' ends the transformation process.
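A sketch of a fatal check, assuming the JDK's built-in processor and a hypothetical order document that must contain a <customer>: when it is missing, <xsl:message terminate='yes'> aborts the transformation, which the processor reports to the caller as an exception. The exact exception type may differ between processors, hence the broad catch; the helper name transformOrNull() is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TerminateOnError {

    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        // Fatal check: stop the transformation when the mandatory element is missing.
        + "<xsl:if test='not(order/customer)'>"
        + "<xsl:message terminate='yes'>ERROR: order without customer</xsl:message>"
        + "</xsl:if>"
        + "<xsl:value-of select='order/customer'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Returns the result, or null when the stylesheet terminated the transformation.
    public static String transformOrNull(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException | RuntimeException e) {
            System.err.println("transformation aborted: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(transformOrNull("<order><customer>ACME</customer></order>")); // ACME
        System.out.println(transformOrNull("<order/>"));                                 // null
    }
}
```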



2 Comments

  1. Indeed, the Muenchian method for grouping data is very useful when you want to group nodes in XSLT 1.0. Also, notice how he makes use of the key element and function to improve the performance of the stylesheet (point 5 in my blog).

    XSLT 2.0 adds a new grouping element to the language:

    <xsl:for-each-group select='an/XPath/expression' group-by='.'>
      <xsl:apply-templates select='current-group()'/>
    </xsl:for-each-group>

    or

    <xsl:for-each-group select='an/XPath/expression' group-starting-with='a/pattern'>
      <xsl:apply-templates select='current-group()'/>
    </xsl:for-each-group>

    If you compare this method with the Muenchian method (see the link in Lucas’ comment), you’ll see that it requires much less complex code.

    XSLT 2.0 also gives you new grouping functions: current-group() returns the items of the group currently being processed by xsl:for-each-group, and current-grouping-key() returns the key of that group.

    In fact, I plan to elaborate on this new feature during a KC Server Dev. & Prog. Lang. meeting in the coming months.

  2. Tips for sorting and grouping data in XSLT: Steve Muench of Oracle Corporation conceived a very elegant, initially somewhat complex, but very efficient way of sorting and grouping data in XML using xsl:key. An example and introduction to this ‘Muenchian’ method can be found here. An alternative approach is described in this tutorial from King’s College, London.
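The Muenchian method can be sketched in XSLT 1.0 as follows, with hypothetical data and the JDK's built-in processor: the key groups employees by department, the outer loop picks the first employee of each group by comparing generate-id() values, and the inner loop uses the same key to fetch the rest of the group.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class MuenchianGrouping {

    public static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Index the employees by department.
        + "<xsl:key name='byDept' match='emp' use='@dept'/>"
        + "<xsl:template match='/'>"
        // Outer loop: only the first employee of each department (one node per group).
        + "<xsl:for-each select=\"emps/emp[generate-id() = generate-id(key('byDept', @dept)[1])]\">"
        + "<xsl:value-of select='@dept'/>:"
        // Inner loop: all members of the current group, fetched through the same key.
        + "<xsl:for-each select=\"key('byDept', @dept)\">"
        + "<xsl:value-of select='@name'/><xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each>;"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static final String XML = "<emps><emp dept='IT' name='Ann'/>"
        + "<emp dept='Sales' name='Bob'/><emp dept='IT' name='Cid'/></emps>";

    // Compiles and runs the stylesheet (compiled on every call to keep the sketch short).
    public static String run(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(XSL, XML)); // IT:Ann,Cid;Sales:Bob;
    }
}
```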