This blog is about XML namespace standards. Primarily about using them in a Canonical Data Model (CDM), but it is also interesting for anyone who has to define XML data by creating XML Schema files (XSD). This blogpost is the second part of a trilogy about my experiences in using and developing a CDM. The first blogpost is about naming & structure standards and the third blogpost is about dependency management & interface tailoring.
XML Namespace Standards
A very important part of an XML model is its namespace. With a namespace you bind an XML model to a specific domain; it can represent a company, a business domain, a system, a service or even a single component or layer within a service. For a CDM this means choices have to be made: do you use one namespace or more? How do you deal with newer versions of the CDM? Etc.
Two approaches: one generic namespace vs component specific namespaces
Basically I’ve come across two approaches to defining a namespace in a CDM. Both can be a good approach, but you have to choose one based on the characteristics of your specific project.
- The first approach is to use one generic, fixed namespace for the entire CDM. This may also be the ‘empty’ namespace, which makes it look like there is no namespace at all. This approach of one generic fixed namespace is useful when you have a central CDM that is available at run time and all services refer to this central CDM. When you go for this approach, use one namespace only; do not use different namespaces within the CDM.
For maintenance and to keep the CDM manageable, it can be useful to split the CDM up into multiple definition files (XSDs), each one representing a different group (domain) of entities. My advice, however, is to still use the same namespace in all of these definition files. The reason is that the CDM will change over time and you may want to move entities from one group to another, or split a group up. If each group had its own namespace, you would get a problem with backward compatibility, because an element that moves from one group to another would change its namespace.
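To illustrate, here is a minimal sketch of such a split-up CDM. The file names, the entity and the namespace URI are just assumptions for this example: a master schema includes the group schemas, and all files declare the same target namespace (note that xs:include only works because the included file has the same target namespace):

```xml
<!-- CDM_Customer.xsd: one group (domain) of entities -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/cdm"
           xmlns="http://example.com/cdm">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

<!-- CDM.xsd: master schema pulling the groups together, same namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/cdm">
  <xs:include schemaLocation="CDM_Customer.xsd"/>
</xs:schema>
```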
When at a certain moment you have a huge amount of changes that also impact the running software, you can create a new version of the CDM. Examples of such situations are connecting a new external system or replacing an important system with another one. When you have multiple versions of the CDM, each version must have its own namespace, with the version number being part of the namespace name. New functionality can then be developed with the new version of the CDM. When it uses existing functionality (e.g. calling an existing service), it has to transform the data from the new version of the CDM to the old version (and vice versa).
- The second approach is that each software component (e.g. a SOAP web service) has its own specific namespace. This specific namespace is used as the namespace of a copy of the CDM, and the software component uses this copy: you can consider it as its ‘own’ copy of the CDM. A central runtime CDM is not needed any more. This means that the software components have no runtime dependencies on the CDM! The result is that the software components can be deployed and run independently of the current version of the CDM. This is the most important advantage!
The way to achieve this is to have a central CDM without a namespace (or with a dummy namespace like ‘xxx’) that is only available as an offline library at design time. So there is not even a runtime CDM to reference!
Developers create a hard-coded copy of the CDM for the software component they are building and apply a namespace to that copy. The name of this namespace is specific to the software component and typically includes the name (and version) of the software component itself. Because the software component is the ‘owner’ of this copy, the parts (entities) of the CDM that are not used by the software component can be removed from the copy.
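A minimal sketch of what this might look like (the service name, namespace URI and entity are assumptions for the example): the design time CDM declares no target namespace at all, and the component copy adds its own:

```xml
<!-- Design time CDM (offline library): no target namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

<!-- Copy owned by a hypothetical CustomerService 1.0: the same
     definitions, now with a component specific target namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/services/CustomerService/1.0"
           xmlns="http://example.com/services/CustomerService/1.0">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```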
In part III, my last blogpost, about runtime dependencies and interface tailoring, I will advise when to use the first and when to use the second approach. But first some words about XML patterns and their usage in these two namespace approaches.
XML Patterns
XML patterns are design patterns applicable to the design of XML. Because the design of XML is defined in XML Schema (XSD) files, these XML patterns are actually XML Schema (XSD) patterns. These design patterns describe a specific way of modeling XML. Different ways of modeling can result in the same XML, but may differ in terms of maintenance, flexibility, ease of extension, etc.
As far as I know, there are four XML patterns: “Russian Doll”, “Salami Slice”, “Venetian Blind” and “Garden of Eden”. I’m not going to describe these patterns, because that has already been done by others. For a good description of the first three, see http://www.xfront.com/GlobalVersusLocal.html, and http://www.oracle.com/technetwork/java/design-patterns-142138.html gives a brief summary of all four. I advise you to read and understand them when you want to set up an XML-based CDM.
I’ve described two approaches to using a CDM above: a central, runtime referenced CDM and a design time only CDM. So the question is: which XML design pattern matches best with each approach?
When you go for the first approach, a central runtime referenced CDM, no translations are necessary when passing (a part of) an XML payload from one service to another. This is easier compared to the second approach, where each service has a different namespace. Because no translations are necessary and the services need to reference parts of entities as well as entire entity elements, it’s advisable to use the “Salami Slice” or the “Garden of Eden” pattern. Both have all elements defined globally, so it’s easy to reuse them. With the “Garden of Eden” pattern, types are defined globally as well and are thus reusable, providing more flexibility and freedom to designers and developers. The downside is that you end up with a very scattered and verbose CDM.
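For reference, a minimal “Garden of Eden” style sketch (the entity names and namespace are assumptions): every element and every type is defined globally in the schema root, so both can be reused, at the price of many small global declarations:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/cdm"
           xmlns:cdm="http://example.com/cdm">
  <!-- Global type: reusable by other types and by services -->
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element ref="cdm:Name"/>
    </xs:sequence>
  </xs:complexType>
  <!-- Global elements: reusable everywhere via ref="..." -->
  <xs:element name="Customer" type="cdm:CustomerType"/>
  <xs:element name="Name" type="xs:string"/>
</xs:schema>
```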
To solve this disadvantage, you can go for the “Venetian Blind” pattern, set the schema attribute “elementFormDefault” to “unqualified” and not include any element definitions in the root of the schemas (XSDs) that make up the CDM. This means there are only XML type definitions in the root of the schema(s), so the CDM is defined by types. The software components, e.g. a web service, do have their own namespace. In this way a software component defines a namespace (through its XSD or WSDL) for the root element of the payload (in the SOAP body), while all child elements below this root remain ‘namespace-less’.
This makes the life of a developer easier, as there are no namespaces and thus no prefixes needed in the payloads of the messages. Not having to deal with namespaces in all the transformation, validation and processing software that works with those messages makes programming code (e.g. XSLT) less complicated, and thus less error prone.
This leads to my advice:
The “Venetian Blind” pattern, with the schema attribute “elementFormDefault” set to “unqualified” and no elements in the root of the schemas, is the best XML pattern for the approach of a central runtime referenced CDM.
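A minimal sketch of this setup, under assumed names and namespaces: the CDM schema contains only type definitions with “elementFormDefault” set to “unqualified”, and a service schema defines the root element of the payload in its own namespace:

```xml
<!-- CDM schema: only type definitions in the root, local elements unqualified -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/cdm"
           elementFormDefault="unqualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

<!-- Service schema (e.g. inside a WSDL): defines the root element of the
     payload in the service's own namespace, typed by a CDM type -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cdm="http://example.com/cdm"
           targetNamespace="http://example.com/services/CustomerService">
  <xs:import namespace="http://example.com/cdm" schemaLocation="CDM.xsd"/>
  <xs:element name="GetCustomerResponse" type="cdm:CustomerType"/>
</xs:schema>
```

A resulting payload then has a qualified root, e.g. `<svc:GetCustomerResponse xmlns:svc="http://example.com/services/CustomerService">`, while child elements such as `<Name>` carry no namespace at all.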
When you go for the second option, no runtime CDM but only a design time CDM, you shouldn’t use a model which results in payloads (or parts of payloads) of different services having exactly the same namespace. So you cannot use the “Venetian Blind” pattern with “elementFormDefault” set to “unqualified”, which I have just explained. You can still use the “Salami Slice” or “Garden of Eden” pattern, but the disadvantages of a large, scattered and verbose CDM remain.
The reason that you cannot have the same namespace for the payloads of services with this approach is that the services have their own copy (‘version’) of the CDM. When (parts of) payloads of different services contain the same element with the same namespace (or the empty namespace), the XML structure of both is considered to be exactly equal, while that need not be the case! When they are not the same, you have a problem when services need to call each other and payloads are passed on. They can already differ at design time, but then it’s quite obvious.
Much more dangerous is that they can even become different later in time without anybody noticing! To explain this, assume that at a certain time two software components were developed. They used the same CDM version, so the XML structure was the same. But what if one of them changes later in time and the changes are considered backwards compatible (resulting in a new minor version)? The design time CDM has changed, so the newer version of this service uses the newer CDM version. The other service did not change and now receives a payload from the changed service with elements of a newer version of the CDM. Hopefully this unchanged service can handle the new CDM format correctly, but it might not! Another problem is that it might break its own contract (WSDL) when it copies the new CDM entities (or parts of them) into the response to its caller, thus breaking its own contract while the service itself has not changed! Keep in mind that its WSDL still uses the old CDM definitions of the entities in the payload.
Graphically explained:
Service B calls Service A and retrieves (part of) the payload, entity X, from Service A. Service B uses this entity and returns it to its consumers as (part of) its payload. This is all nice and correct according to its service contract (WSDL).
Later in time, Service A is updated to version 1.1 and the newer version of the CDM is used in this updated version. In the newer CDM version, entity X has also been updated, to X’. Now this X’ entity is passed from Service A to Service B, and Service B returns this new entity X’ to its consumers, while they expect the original X entity. So Service B returns an invalid response and breaks its own contract!
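To make this concrete, a hypothetical example (the element names and the added field are assumptions): suppose entity X gains an extra child element in the new minor CDM version. Both payloads carry the same (here: empty) namespace, so nothing forces a translation, yet Service B’s consumers still validate against the old definition:

```xml
<!-- Entity X as defined in the old CDM version, and in Service B's WSDL -->
<Customer>
  <Name>John Doe</Name>
</Customer>

<!-- Entity X' as returned by Service A 1.1: backwards compatible for A,
     but invalid against Service B's unchanged contract -->
<Customer>
  <Name>John Doe</Name>
  <Email>john@example.com</Email>
</Customer>
```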
You can imagine what happens when there is a chain of services, and probably there are more consumers of Service A. Such an update can spread through the entire integration layer (SOA environment) like ripples on water!
You don’t want to update all the services in the chains affected by such a little update.
I’m aware that a service should not do this. Theoretically a service is fully responsible for always complying with its own contract (WSDL), but this is very difficult to guarantee when developing lots of services. When there is a mapping in a service this is quite clear, but all mappings would have to be checked. Moreover, an XML entity is often used as a variable (e.g. in BPEL) in some processing code and can be passed on to a caller unnoticed.
The only solution is to avoid passing complete entities (container elements) through: all data fields (data elements) have to be mapped individually (in a so-called transformation) for all incoming and outgoing data of the service.
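As an illustration, a minimal XSLT sketch (the element names are assumptions) of mapping fields individually instead of copying the container element as a whole:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- Do NOT pass the container through: <xsl:copy-of select="Customer"/> -->
    <!-- Map every data element individually instead: -->
    <Customer>
      <Name><xsl:value-of select="Customer/Name"/></Name>
    </Customer>
  </xsl:template>
</xsl:stylesheet>
```

With a copy-of, any new child element added to the source entity silently flows through; with individual mappings, new elements simply do not appear until the service is deliberately updated.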
The problem is that you cannot enforce this in software, so it must become a rule, a standard, for software developers.
Everyone who has been in software development for some years knows this is not going to work. There will always be a software developer (now, or maybe in the future during maintenance) who doesn’t know or understand this standard.
The best way to prevent this problem is to give each service its own namespace. Then entities (container elements) cannot be copied and passed through in their entirety, and developers have to map the data elements individually.
This is why, for the approach of a design time only CDM, I advise to also use the “Venetian Blind” pattern, but now with the schema attribute “elementFormDefault” set to “qualified” (a sketch follows below this list). This results in a CDM of which:
- it is easy to copy the elements that are needed, including child elements and the necessary types, from the design time CDM to the runtime constituents of the software component being developed. Do not forget to apply the component specific target namespace to this copy;
- it is possible to reuse type definitions within the CDM itself, preventing multiple definitions of the same entity.
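A minimal sketch of this (the service name, namespace and entities are again assumptions): the copied schema uses “elementFormDefault” set to “qualified”, so every element in the payload is bound to the component specific namespace and cannot silently be mistaken for another service’s version of the same entity:

```xml
<!-- Copy of the needed CDM parts, owned by a hypothetical CustomerService;
     all elements become qualified with its own namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/services/CustomerService/1.0"
           xmlns:svc="http://example.com/services/CustomerService/1.0"
           elementFormDefault="qualified">
  <!-- Reusable type, copied from the design time CDM -->
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Customer" type="svc:CustomerType"/>
</xs:schema>
```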
In my next blogpost, part III, about runtime dependencies and interface tailoring, I explain why in most cases you should go for a design time CDM and not a central runtime CDM.