Extending XML Document Validation with Schematron

XML documents are in common use, and so is XML Schema (XSD) for validating them. Validation is often needed to ensure that the structure, the content and the relations between elements are correct. However, an XSD covers only part of that: it describes the basic structure of the document (which elements are allowed, and in what order) and some basic content checks on a single node. Schematron can be used to cover the remaining part of XML validation, such as:

  • Advanced structure validation
    e.g. element A must have either attribute X or attribute Y: always one of them, never both
  • Structure depending on content
    e.g. when attribute A of element B has value ‘x’, element B must have a child element C
  • Content validation on multiple nodes
    e.g. the sum of all percentage elements must be 100
  • Relations between elements
    e.g. for each employee element with a manager attribute there must be another employee element whose id attribute has the same value (meaning the employee's manager must exist)


A Schematron schema, just like an XSD, is itself an XML document. Each validation rule is defined by a rule element, whose context attribute specifies the node (or nodes) of the target XML the rule applies to. The context is written as an XPath expression. For example, take an XML document containing one or more Department elements, for which we want a rule that applies to every Department element:

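A minimal sketch of such a rule, assuming the element is named Department exactly as in the text. The body is only a placeholder comment here; a usable rule needs at least one assert or report element inside it.

```xml
<!-- Applies to every Department element in the target document -->
<rule context="Department">
  <!-- assert and/or report elements go here -->
</rule>
```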

A rule element has one or more report and/or assert elements. Both carry a test attribute holding the actual validation check. The only difference between them is that a report element produces an error message when its test evaluates to true, whereas an assert element produces an error message when its test evaluates to false.
The test attribute is also written as an XPath expression.
Let’s say our Department element has two attributes, “name” and “abbr” (abbreviation), and two business rules apply:

  1. abbr should contain at least two characters
  2. abbr should contain fewer characters than name

Defining these rules with Schematron results in the following (in XML, the < character inside an attribute value is written as &lt;):


  Abbreviation too short
  Abbreviation too long
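A sketch of how the two rules could be written, using the messages shown above. Whether each check is expressed as a report or as an assert is an assumption made here to show both forms side by side; the XPath tests are one possible formulation.

```xml
<rule context="Department">
  <!-- Rule 1 as a report: the message appears when the test is TRUE,
       i.e. when abbr has fewer than two characters -->
  <report test="string-length(@abbr) &lt; 2">Abbreviation too short</report>
  <!-- Rule 2 as an assert: the message appears when the test is FALSE,
       i.e. when abbr is not shorter than name -->
  <assert test="string-length(@abbr) &lt; string-length(@name)">Abbreviation too long</assert>
</rule>
```

For a Department with name "Sales" and abbr "S" only the first message is produced; with abbr "SalesDept" only the second.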

To complete the Schematron XML document: a rule element is a child of a pattern element. The pattern element groups rules and provides a name for the group. It mainly serves readability, with one technical consequence: within a single pattern, a node is checked only against the first rule whose context matches it, so rules that must both fire for the same node belong in separate patterns. With the root element schema we can finish the Schematron document.
The complete Schematron document of our little example:

<?xml version="1.0" encoding="UTF-8"?>

  
    
      Abbreviation too short
      Abbreviation too long
    
  
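A sketch of what the complete document could look like. The namespace is the ISO Schematron one, matching the ISO skeleton stylesheets mentioned in this post; the pattern id and title, and the choice between report and assert, are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- A pattern groups related rules; id and title are hypothetical -->
  <pattern id="department-abbreviation">
    <title>Department abbreviation checks</title>
    <rule context="Department">
      <!-- Fires when abbr has fewer than two characters -->
      <report test="string-length(@abbr) &lt; 2">Abbreviation too short</report>
      <!-- Fires when abbr is not shorter than name -->
      <assert test="string-length(@abbr) &lt; string-length(@name)">Abbreviation too long</assert>
    </rule>
  </pattern>
</schema>
```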

Below is another example, with the rule that the sum of all Percent elements anywhere within a Total element (children or deeper descendants) must be 100.

<?xml version="1.0" encoding="UTF-8"?>

     
          
               Sum is not 100%.
          
     
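One way to express this rule, assuming the elements are named Total and Percent as in the text and that Percent holds a plain number; the pattern id is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="percentages">
    <rule context="Total">
      <!-- .//Percent selects Percent descendants of Total at any depth -->
      <assert test="sum(.//Percent) = 100">Sum is not 100%.</assert>
    </rule>
  </pattern>
</schema>
```

XPath 1.0 numbers are floating point, so with fractional percentages an exact comparison against 100 may fail on rounding; whole numbers are not affected.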

Before continuing with a more complex example involving relations between elements: how do we get this to work?
In fact, that’s quite easy. You only need to be able to run XSLT transformations!
The beauty of Schematron is that it is not a new technology, just a clever use of XSLT. No new language is needed, and you don’t even need to learn XSLT: basic knowledge of XPath and XML is sufficient.

The trick is to transform the Schematron XML containing your validation rules into an XSLT stylesheet that implements them, and then to validate XML documents by running that generated stylesheet over them. And how do you generate the rules stylesheet from your Schematron document? Also with an XSLT transformation! The rules stylesheet is the result of transforming your Schematron XML with a provided Schematron stylesheet (iso_schematron_skeleton_for_xslt1.xsl or iso_schematron_skeleton_for_xslt2.xsl, downloadable from schematron.com).
So it is a two-step approach. First you transform your Schematron rules XML with the Schematron stylesheet, which yields a new stylesheet containing your rules. Then you transform the XML document to be validated with that generated stylesheet. This final transformation outputs your error messages, or nothing at all when validation succeeds.

Figure: Schematron two-step validation process

In a production environment the rules are usually predefined or change only rarely, so the generated XSLT can be stored (or cached) and reused.
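To give an idea of what such a generated stylesheet does, here is a hand-written and heavily simplified XSLT 1.0 equivalent of the two Department abbreviation rules. It is not the literal output of the Schematron skeleton, which is more elaborate; it only illustrates the principle that a rule becomes a template and each check becomes a conditional that emits a message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- One template per rule; the match pattern is the rule context -->
  <xsl:template match="Department">
    <!-- report-style check: emit the message when the test is true -->
    <xsl:if test="string-length(@abbr) &lt; 2">
      <xsl:text>Abbreviation too short&#10;</xsl:text>
    </xsl:if>
    <!-- assert-style check: emit the message when the test is false -->
    <xsl:if test="not(string-length(@abbr) &lt; string-length(@name))">
      <xsl:text>Abbreviation too long&#10;</xsl:text>
    </xsl:if>
    <!-- Continue with nested elements -->
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Suppress the built-in copying of text nodes, so a valid
       document produces no output at all -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Any XSLT 1.0 processor can run this against a target document; an empty result means no rule was violated.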

To show what Schematron validation can do, I conclude this post with the promised complex example involving relations between elements.
Let’s start with the target XML, i.e. the data that has to be validated. With example data at hand, the Schematron rules are easier to understand.

<?xml version="1.0" encoding="UTF-8"?>

  
    
      
        J. Jansen
        1000
      
      
        P. Klaasen
        1100
      
    
  
  
    
      
        M. A. Neger
        1700
      
      
        L.E. Ader
        1500
      
      
        P.R. Esident
        2500
      
    
  
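A sketch of how this data could be marked up. The names and salaries are the ones listed above, and the department names "The Floor" and "Managers" come from the business rules in this post. Everything else is an assumption made for illustration: the element names Company, Employees, Employee, Name and Salary, the abbr values, and all id and manager values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical structure: element names and id/manager values are assumed -->
<Company>
  <Department name="The Floor" abbr="TF">
    <Employees>
      <Employee id="1" manager="3">
        <Name>J. Jansen</Name>
        <Salary>1000</Salary>
      </Employee>
      <Employee id="2" manager="4">
        <Name>P. Klaasen</Name>
        <Salary>1100</Salary>
      </Employee>
    </Employees>
  </Department>
  <Department name="Managers" abbr="MGR">
    <Employees>
      <Employee id="3" manager="5">
        <Name>M. A. Neger</Name>
        <Salary>1700</Salary>
      </Employee>
      <Employee id="4" manager="5">
        <Name>L.E. Ader</Name>
        <Salary>1500</Salary>
      </Employee>
      <!-- No manager attribute: this is the president -->
      <Employee id="5">
        <Name>P.R. Esident</Name>
        <Salary>2500</Salary>
      </Employee>
    </Employees>
  </Department>
</Company>
```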

We want to implement the following business rules:

  • Every employee of department “The Floor” must earn less than every manager (= employee in department “Managers”).
  • An employee may not be their own manager.
  • There is only one manager without a manager (only one president).
  • The relation between manager and employee is valid, so the manager of an employee must exist. This means that for each employee with a manager attribute there must be a manager whose id attribute has the same value.

In Schematron XML these rules result in:

<?xml version="1.0" encoding="UTF-8"?>

  
    
      Too much
    
  
  
    
      Own manager
    
  
  
    
      More than one president
    
  
  
    
      Not a valid manager
    
  
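A sketch of the four rules with the messages shown above. It assumes a hypothetical target structure: Department elements with a name attribute, containing Employee elements that carry an id attribute, an optional manager attribute and a Salary child. The pattern ids, the XPath tests and the choice between report and assert are one possible formulation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="salary">
    <rule context="Department[@name='The Floor']//Employee">
      <!-- Comparing against a node-set is true if ANY manager salary
           matches, so not(...) means: below every manager salary -->
      <assert test="not(Salary >= //Department[@name='Managers']//Employee/Salary)">Too much</assert>
    </rule>
  </pattern>
  <pattern id="own-manager">
    <rule context="Employee[@manager]">
      <report test="@manager = @id">Own manager</report>
    </rule>
  </pattern>
  <pattern id="president">
    <!-- /* is the document element, so this check runs exactly once -->
    <rule context="/*">
      <report test="count(//Department[@name='Managers']//Employee[not(@manager)]) > 1">More than one president</report>
    </rule>
  </pattern>
  <pattern id="manager-exists">
    <rule context="Employee[@manager]">
      <!-- True if at least one manager id equals this manager attribute -->
      <assert test="@manager = //Department[@name='Managers']//Employee/@id">Not a valid manager</assert>
    </rule>
  </pattern>
</schema>
```

The second and fourth rules share the same context, which is why they sit in separate patterns: within one pattern a node is only checked against the first rule that matches it.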

More information can be found at schematron.com.
An easy step by step Schematron tutorial can be found here.
