Abstract
W3C XML Schema seems to be on its way to becoming the most popular schema language for describing XML formats today. It is still a relatively new technology, however. While there have been some precursors to W3C XML Schema - notably XML Data Reduced and SOX - these were never widely adopted. Consequently, there has been relatively little experience with versioning schema-based XML component libraries. W3C XML Schema provides a variety of new capabilities that impact the way in which a schema library can be versioned to produce clean backward compatibility.
In discussing this issue, the UBL Technical Committee looked at existing versioning schemes, and considered many approaches for which examples could not be found in the XML world. These include such examples as the versioning of Microsoft's COM objects and other object-oriented schemes, which are mirrored in the capabilities of XML expressed with W3C XML Schema. Their investigations also included brain-storming around the requirements of some on-going production implementations, taking into account the new features of schema-based XML.
Some of the issues included:
whether or not namespaces should be used as packaging for versioning,
how polymorphic processing and extension/refinement of schemas impacts the versioning scheme,
the importance of simplicity in managing versions,
how major and minor versions should be distinguished,
the relationship of the versioning of a schema library to the versioning of the applications supporting it, and
how a component library intended to be extended by its users could best version its core libraries, when changes would directly impact external users of the library.
The lessons learned in the UBL investigation provide a valuable overview of the issues surrounding this topic, and reveal some of the power inherent in W3C XML Schema when it comes to devising a versioning scheme.
W3C's XML Schema Definition language [1] - "XSD" - is becoming the de facto standard for describing XML[2] formats. While there have been (and there are) other schema languages (simpler, arguably better, or just very different in approach), none of them is being embraced to the same extent as XSD. Many XML applications based on the use of DTDs in the definition of their formats are now migrating to XSD, while the majority of new XML formats are being described for the first time in XSD.
Because of the newness of schema languages, little thought has been given to the problems of versioning subsequent releases of a particular XML vocabulary or format. The XSD specification itself does not discuss versioning in great detail, leaving this decision up to the designer of a specific schema, who presumably has a better grasp of the requirements of the application. Despite this, XSD provides us with a great deal of power in determining how best to version our XML vocabularies. As with many aspects of XSD, this is accompanied by a fair degree of complexity, however.
The Universal Business Library [3] (UBL) is a new vocabulary intended for use as a global e-business standard. Its first release is imminent. It is based on many sources, among which is Commerce One's XML Common Business Library (xCBL)[4], version 3.0. xCBL was one of the first vocabularies to be based on an XML Schema, being expressed in Schema for Object-Oriented XML (SOX)[5] and XML Data Reduced (XDR)[6] from the 2.0 version onward, before XSD was published. Consequently, UBL has benefited from the xCBL experience in understanding how XML schema languages impact versioning.
It is hoped that this presentation of the considerations within the UBL Naming and Design Rules Technical Committee will serve as a good introduction to the concepts and mechanisms in versioning XML-Schema vocabularies, and will illustrate some of the design choices and their anticipated consequences.
UBL is an initiative that has its roots in the ebXML initiative, particularly in the Core Components effort within ebXML[7], being based on the Core Components Technical Specification now available from the TBG group within UN/CEFACT. UBL is an OASIS Technical Committee, with a version 1.0 release of the library that should be available soon, as it goes through the OASIS standardization process.
The focus of UBL is on business-to-business e-commerce, and the initial release contains core e-business documents such as purchase orders and responses, invoices, despatch advices/advance ship notices, etc. It is also designed and intended to serve as a library of core XML types for use within other, non-UBL documents. An inherent part of the UBL design is the assumption that users will need to extend it to meet their own application needs within industry verticals, specific countries, etc.
UBL will, if successful, probably grow to encompass dozens of basic document types. While this is not the case with the first version, a lot of attention is being paid at this early stage to versioning because a bad or ambiguous versioning strategy may impede future growth.
UBL is divided into two primary areas: the "library content" area and the "naming and design rules" area. (While there are many different sub-committees, this is the major division of the TC, and one that is reflected in how the group organizes its face-to-face work.) The versioning strategy is seen as an aspect of the library design, while specific content changes would be the result of work going on within the library content area. Consequently, the initial discussions about how UBL releases should be versioned have taken place within the naming and design rules area, in anticipation of the needs of library content in the future.
The scope of an XML application - like any other application - has important consequences on versioning requirements. The more contained an application is, the less difficult the versioning problems are. For a small, single-enterprise application, in which the implementers and maintainers have complete control over what versions are supported, a strategy that requires simultaneous upgrades in order to support a new version of a vocabulary is feasible. However, even within large, single-enterprise applications this type of version-migration can cause tremendous difficulties, and can be very expensive. This is magnified many times over in the case of multiple-enterprise applications. If we look at the degree of planning and expenditure required to upgrade ERP systems, for example, we can get some sense of how difficult this can be. (And it is worth noting that most ERP software vendors place strong emphasis on the ease of migration between application versions.) Difficulties with application upgrades are a primary reason for the popularity of "thin clients" that can be downloaded at the time of use.
UBL as an XML application has a vast scope when considered in this light. It has the inherent problem of being an inter-enterprise standard to begin with, which makes the existence and simultaneous use of several versions an inevitability. There is no effective way to coordinate the point at which different users will upgrade to the same version, because users need to respond to the dynamics of their own trading community, and their own needs, resource levels, and restrictions.
At a strictly "vocabulary" level, these problems are not intractable. It is possible to have several public releases available at any one time, and to use public registry/repositories to make sure that users can always find the historical version they are looking for. While such an approach does have implications for how a versioning strategy operates (see below), it is not very problematic.
What makes the scale of UBL versioning difficult is the requirement to support trading partner interoperability to the greatest possible extent. (Remember that UBL as such has no control over the trading-partner applications themselves, and thus cannot resort to a "thin client" strategy.) Given the capabilities of the developing e-commerce infrastructure, it is not unreasonable to foresee systems capable of "match-making" between the known capabilities of registered applications, and finding a version-compatibility if such exists. Ideally, however, user applications should be able to support a very few versions of any given document type, and interoperate with the widest possible number of trading partners.
The trend within the e-commerce world is for trading-partner agreements to be reached in less and less time, with the ultimate goal of computer-moderated negotiation always in the back of one's mind. Unlike current e-commerce systems, where the agreement on technical standards and their implementations can take several months, people today are talking about "plug and play e-commerce." The idea is that a trading partner could be located in a central registry, and that all the information needed to automatically integrate systems could be downloaded. Applications could self-configure at run-time to support the needed interfaces, and the entire cycle from discovery through ordering to delivery and payment could be automated, or at least require a minimum of human intervention. Even if we accept that this is a future possibility, and not a present reality, the trend is still clear: protracted implementations as part of trading partner adoption should not be required. Applications should be required only to support a reasonable number of versions of a particular standard document type to have a reasonable expectation that they will interoperate with new-found trading partners.
This vision is very much embedded in the ebXML family of standards from which UBL has emerged, and it accounts for much of the interest in "web services" architectures that can potentially make such a vision a reality. When the global scope of UBL is taken into account, however, it can be understood that versioning is a critical element if this vision is ever to be made a reality.
If we look at existing DTD based e-commerce XML vocabularies, or non-XML syntaxes such as those used in EDI, we will find that a pattern of "simple" versioning exists: wholesale replacement. If we look at EDIFACT, for example, we will see that twice yearly the structural definitions are updated, with no guarantee of compatibility between versions. Code-lists, an integral part of the EDI architecture, are also updated with a similar frequency, but are packaged separately. (They also are managed with an implicit guarantee that existing coded values will not be removed from version to version, and so are backward-compatible. It is not uncommon to see users update to current versions of code-lists, while staying with an older version of the structure definitions.)
This same "replacement" approach is also used with some XML vocabularies. In these cases, an instance simply states which specific version of a DTD it requires, and the parser will load it and proceed. There is no assertion of a particular relationship between any two given versions of the vocabularies. While mechanisms such as parameter entities may make the expression of changes easier to implement, they do not provide any guarantees about the changes themselves, or how similar two versions are.
While this approach has been made to work fairly well in the past, it is not, perhaps, enough to support easy, quick trading-partner integrations. Further, it begins to pose questions about the identity of the versioned entity: what is a version other than a wholly-different-but-similar DTD or structure? (Clearly, such questions can be answered, but these answers are application-specific, and often inconsistent in practice.) If a version-relationship provides no useful guarantee regarding interoperability, then what use is it? Obviously, this is a bit extreme, but aside from issues of marketing and perception, the "simple" versioning approach has not much to recommend it (other than its simplicity, of course).
Some vocabularies have adopted a variation of the "simple" approach, in which some version changes are characterized as "major" and some as "minor". This is based on the practice within the software industry, where a certain degree of backward-compatibility can reasonably be expected of minor versions. Historically, however, there has not always been a strict definition about what backward-compatibility actually is - it almost seems as if this ambiguity has been leveraged by some software vendors to promote sales of software upgrades. A good example from the non-XML world is RTF ("rich text format"), in which most - but not all - of the format's features behave the same across product versions.
Generally speaking, however, the idea behind major and minor versions is a sound one, if only to help users understand the extent to which a given vocabulary differs from the preceding version, according to its own definitions of minor and major version relationships. (Or, as a cynic might say, it helps users predict how many new bugs they'll encounter in the new version.)
It is helpful to characterize the types of changes that are commonly demanded in the e-commerce arena across versions.
Addition of Document Types - As noted above, UBL in its first release will consist of a small number of "core" document types, with the understanding that this will expand to several times the size of the initial release in subsequent versions. This is typical of nearly all e-commerce vocabularies.
Addition of Functionality to Existing Document Types - These changes typically involve the updating of existing document types to reflect changes in the way business is done (for example, the addition of data to allow use of new payment mechanisms, such as P-Cards). Typically these changes are the result of user demand.
Bug Fixes - Examples include the addition of overlooked structures to allow supported functionality; the removal of unintended redundant data; the correction of typos or incorrectly-formulated names; etc.
Codelists Extensions - in any vocabulary that uses enumerated data types, enumerations must often be updated to reflect changes in the real world, and to accommodate user requests.
These types of changes can be considered "major" or "minor," depending on the definition of these terms, and depending on the actual impact that they have on the instance. The way in which UBL will handle each category of change is discussed below.
One of the core aspects of versioning is the identity of the entity being versioned. The increasing use of XML namespaces raises the issue of how they should be used in any versioning scheme.
The scope of UBL, and the potential for having several versions of the same schema extant at the same time, places emphasis on some particular aspects of versioning. First, the manageability of a versioned "package" becomes very important, because there is little or no control over the applications that use it. Users must be able to easily distinguish one version from another, and applications should be able to do the same, preferably without having to implement any functionality that is specific to one particular version. (It should be pointed out that, while XSD does offer us a "version" attribute on the schema element, there is no standard on how that field is to be used. Any versioning approach that relies on it, or similar other "version" attributes applied to various elements, is in essence proprietary.)
There are several candidates for packaging versions, of which namespaces is only one.
The most obvious strategy, perhaps, is the file. With the use of strict naming conventions ("UBLCoreV1_1.xsd", for example) we could package a library into one or several files. Within the names of the files, and perhaps inside the files, we could indicate the version, using a versioning field of some sort - an attribute or element.
However, when compared to the use of namespaces as modules, the disadvantages of this approach are apparent. While namespaces are understood natively by most XML software as representing a domain of limited extent, owned and maintained by a specific agency, the necessity for customized code when using filenames as version packages becomes problematic. An XML instance must specify exactly which namespace it is structured to validate against; the same is not true of a file. There is no native mechanism for expressing, in an instance, which particular file must be used in a way that would differentiate between versions.
What results is a single namespace resolving to whatever file happens to be associated with that namespace. Nothing in the instance guarantees the use of any particular version. Having the contents of namespaces change from version to version when multiple versions need to be simultaneously supported is clearly messy and difficult to manage.
A popular XSD design approach, known as the "Chameleon" approach, relies on the late-binding of structures into namespaces. This type of design is incompatible with the use of namespaces as a versioning mechanism if it is to be useful in any way - the late-binding can be made in the schema code, but only one such binding can be allowed. This defeats the major reason for using the "Chameleon" design. (Arguably, in an application with the scope of UBL, the Chameleon design is problematic in any event. What is needed to manage successfully a component library on this scale is complete clarity about versions. Any strategy that involves changing or concealing the contents of a namespace after it has been published makes the management of the library difficult, if not impossible.)
It is worth remembering that UBL is designed to be extended and reused. If a user external to UBL imports a schema module, and makes extensions to it, and then the contents of that namespace change, the user's instances themselves might break. This is not acceptable, given that the user has no control over the applications themselves.
An often overlooked factor about namespaces is that XSD also allows importing and including namespaces inside one another. This functionality is another advantage of the use of namespaces as the versioned packages inside libraries: namespaces can be used not just as top-level packaging mechanisms, but as mechanisms for introducing modularity into the schema library.
As a result of this analysis - and for some other reasons, as will be shown later - UBL has chosen to use namespaces as the versioned package, with a rule that specifies unambiguously that once published, the contents of a namespace will never change. This rule avoids the late-binding effects that make external extensions problematic, and militate against the easy use of namespaces as "modules" within the library. However, it does have some less-desirable effects as well: it requires the establishment of some other conventional rules within the modular scheme of the library (see discussion below).
One of the design goals of XSD was to make XML become more aligned with object-oriented technologies such as Java, C++, UML, etc. While the match will never be exact, given the different objectives of a markup language on the one hand and tools more focused on application functionality on the other, it is clear that XML is becoming a very common serialization format for the exchange of data between object-oriented applications. One could argue that it already has, or will soon, become the dominant paradigm within the e-commerce space - and with the advent of technologies such as JAXB, this is a defensible position.
Given this alignment, it is important to learn as much as possible from object-oriented versioning, especially as it concerns the use of XML as a serialization format. One of XSD's predecessors was SOX, a schema language that was explicitly designed to align with object-oriented systems. In xCBL's use of SOX, the versioning strategy was (not surprisingly) deliberately based on an object-oriented model, that of COM objects.
COM objects have very strict rules about versioning, which can be summarized as follows:
Minor versions: Any calls or data supported by a previous version must be supported by all subsequent versions. Without this rule, minor versions of COM objects could not replace one another. This replacement feature is used so that a newer minor version of a COM object can be installed on a machine at the same time as a piece of software that relies on the features added in the minor-version upgrade, without breaking any piece of software that relies on the functionality of previous minor versions. Commonly known as "backward compatibility," this is an obvious rule for software components such as COM objects.
Major versions: Previous major versions of COM objects are not replaced by newer major versions, but simply installed alongside. Major versions provide no guarantee of backward compatibility, but merely a sameness of overall functionality. xCBL adopted this exact model in its versioning scheme. It has some big advantages when applied to an XML component library. The XML-Java binding that SOX supported (very similar to JAXB, in fact) meant that the elements and attributes in an XML structure corresponded 1-for-1 to similar data members in the Java classes. A lack of backward compatibility in the XML would result in a similar lack of backward compatibility in the product versions that used it.
COM is not the only system using this strategy. Another example can be found in SunOS, the operating system that is one of the components of the Solaris Operating Environment. Here too, minor versions are guaranteed to be backward compatible, while major versions make no such promise (although a major breakdown of backward compatibility is not in the cards). One further example, showing how common this versioning strategy has become even outside object orientation and software in general, is DocBook, probably the most widespread schema for the documentation of computer software (though it is not used exclusively for that purpose). DocBook again promises backward compatibility between minor versions, and reserves major versions for those changes that break it.
As an aside, it should be remembered that "minor" and "major" in this discussion only refer to some characteristics of change, not to any quantitative measure. In other words, just one change is enough to qualify a version change as "major", if that single change breaks backward compatibility; similarly, massive changes may result in a "minor" version change, if backward compatibility is not breached.
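The qualitative rule in this aside can be sketched in a few lines of Python. This is only an illustration: the function name `next_version` is hypothetical, and resetting the minor number to 0 on a major change is an assumption, not something the schemes discussed here prescribe.

```python
from typing import Iterable, Tuple


def next_version(current: Tuple[int, int], breaking_flags: Iterable[bool]) -> Tuple[int, int]:
    """Return the next (major, minor) pair for a set of changes.

    breaking_flags holds one boolean per change: True if that change
    breaks backward compatibility. The number of changes is irrelevant;
    only the presence of a breaking change matters.
    """
    major, minor = current
    if any(breaking_flags):
        # A single breaking change is enough to force a major version.
        # Assumption: the minor number restarts at 0.
        return (major + 1, 0)
    # Any number of compatible changes still yields a minor version.
    return (major, minor + 1)


# Fifty compatible changes: still a minor version.
print(next_version((1, 0), [False] * 50))
# One breaking change among compatible ones: a major version.
print(next_version((1, 3), [False, True]))
```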
Another object-oriented feature of XSD (and before that, SOX) is its capability to support type-aware processing. XSD's type-awareness is based on object-oriented inheritance, and it includes what is known as "polymorphic processing". Polymorphism is the ability for an element to be known by more than one name - in essence, the names of all of its ancestors in an object-oriented inheritance sense: given a type X, and given an extension or refinement of it, type X1, then an element of type X1 can be substituted in any document where an element of type X is required. This effect carries on down the chain of inheritance, and is reflected in XSD's extension and refinement mechanisms.
This mechanism has great power when we consider backward-compatibility. It will not have escaped notice that the versioning rules for minor versions of COM objects have a direct parallel in XSD's extension and refinement rules: XSD does not permit an extension or restriction that would make the derived type backward-incompatible with its base. Because this idea of type-inheritance operates the same in XSD as it does in object-oriented languages, it makes sense that we would want to use it in capturing minor-version relationships. With polymorphic processing, we have a tool that guarantees backward compatibility between objects in a single chain of inheritance.
This subject, although a bit complex, is worth exploring - it provides perhaps the strongest argument for using namespaces as the package for versioning, although in a non-obvious fashion. The basic proposition is this: by allowing, between minor versions, only changes permitted by XSD extension and restriction, we can ensure that all minor versions of an element are backward-compatible.
This brings us to the question of what kinds of changes are typically made in minor versions of schema libraries. We presented four cases above:
Addition of Document Types - There is no impact: the new document type will have no earlier versions to be compatible with.
Addition of Functionality to Existing Document Types - There are two cases:
where the new functionality is so different that the type is given a new or modified name, and
where the new functionality serves the same basic purpose, and therefore carries the same name.
Bug Fixes - it is usually the case that the new minor version of the type will carry the same name as its predecessor, and
Codelists Extensions - XSD permits only restriction of enumerations, so minor versions are somewhat problematic if only XSD extension and restriction are permitted.
In the majority of cases, the "new" type in a minor version will have the same name as the old type, because it carries the same basic semantics and function. This is where the use of namespaces as packaging modules becomes important. If we capture each minor (and major) version in its own namespace, then we can assign the same local name to the type, and simply make whatever extensions or restrictions are called for.
As an example: In the version 1.0 namespace, there is an element X that has a sequence of children A, B, and C. To provide for extra functionality in version 1.1, D is added to the content model of type X. So, in the namespace with version 1.1, we can declare a new type X that extends the version 1.0 type X by adding D.
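The example can be made concrete with a small sketch. The two schema documents below are hypothetical: the namespace URNs (`urn:example:lib:1:0`, `urn:example:lib:1:1`), the type name `XType`, the file name `lib-1.0.xsd`, and the use of `xs:string` for the children are all assumptions made for illustration, not UBL's actual schemas. The Python around them only parses the 1.1 schema and reports which type it extends and which elements it adds; it does not perform XSD validation.

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical version 1.0 module: type X is a sequence of A, B and C.
V1_0 = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="urn:example:lib:1:0"
           targetNamespace="urn:example:lib:1:0">
  <xs:complexType name="XType">
    <xs:sequence>
      <xs:element name="A" type="xs:string"/>
      <xs:element name="B" type="xs:string"/>
      <xs:element name="C" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="X" type="XType"/>
</xs:schema>
"""

# Hypothetical version 1.1 module: same local name, new namespace,
# derived from the 1.0 type by extension (D is appended).
V1_1 = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="urn:example:lib:1:1"
           xmlns:v10="urn:example:lib:1:0"
           targetNamespace="urn:example:lib:1:1">
  <xs:import namespace="urn:example:lib:1:0" schemaLocation="lib-1.0.xsd"/>
  <xs:complexType name="XType">
    <xs:complexContent>
      <xs:extension base="v10:XType">
        <xs:sequence>
          <xs:element name="D" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="X" type="XType"/>
</xs:schema>
"""


def prefix_map(xsd_text):
    """Collect the prefix -> namespace URI declarations of a schema document."""
    events = ET.iterparse(io.BytesIO(xsd_text.encode("utf-8")), events=("start-ns",))
    return {prefix: uri for _event, (prefix, uri) in events}


def describe_extension(xsd_text, type_name):
    """Return ((base namespace, base local name), [added element names])."""
    root = ET.fromstring(xsd_text)
    ctype = root.find(f"{{{XS}}}complexType[@name='{type_name}']")
    ext = ctype.find(f"{{{XS}}}complexContent/{{{XS}}}extension")
    prefix, _, local = ext.get("base").rpartition(":")
    added = [el.get("name") for el in ext.iter(f"{{{XS}}}element")]
    return (prefix_map(xsd_text)[prefix], local), added


base, added = describe_extension(V1_1, "XType")
# The base type lives in the namespace that the 1.0 module declares.
assert ET.fromstring(V1_0).get("targetNamespace") == base[0]
print(base, added)
```

Note that the local name `XType` is identical in both modules; only the namespace distinguishes the two versions.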
The result of this is quite impressive when we consider the power of polymorphic processing: anyone who wishes to process an instance of version 1.1 X with an application that only understands the data from X in version 1.0 will be able to do so cleanly, since X is backward compatible. The fact that these semantically and functionally identical types have the same name is powerful, since there is no need to come up with contrived names for bug-fixed or slightly-altered minor versions.
This is especially true in a library such as UBL, where the naming rules are based on a modelling methodology such as the ebXML and UN/CEFACT Core Components. Because the naming rules in the modelling methodology are precise (and based on ISO 11179, naturally) it may well be the case that there is only one *correct* name for the component in any event.
This leveraging of the object-oriented features of XSD gives us the ability to guarantee functional backward-compatibility for all minor versions of a schema-library namespace. Any type-aware application built to understand version 1.0 will automatically understand 1.1, 1.2, 1.3, etc.
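One way to picture what "type-aware" means for an application is a handler lookup that walks the chain of inheritance. The sketch below is a simplified model, not a real schema processor: the derivation table `BASE_OF`, the namespace URNs, and the handler function are all hypothetical, and a real processor would obtain the derivation information from the schemas themselves.

```python
from typing import Callable, Dict, Optional, Tuple

QName = Tuple[str, str]  # (namespace, local type name)

# Hypothetical derivation chain: each minor version's XType extends
# the XType of the immediately preceding minor version.
BASE_OF: Dict[QName, QName] = {
    ("urn:example:lib:1:1", "XType"): ("urn:example:lib:1:0", "XType"),
    ("urn:example:lib:1:2", "XType"): ("urn:example:lib:1:1", "XType"),
}


def find_handler(type_name: QName, handlers: Dict[QName, Callable]) -> Optional[Callable]:
    """Walk from the instance's type up its ancestors until a known type is found."""
    current: Optional[QName] = type_name
    while current is not None:
        if current in handlers:
            return handlers[current]
        current = BASE_OF.get(current)
    return None


def handle_x_1_0(fields: dict) -> str:
    """A version 1.0 handler: it reads A, B and C and ignores anything added later."""
    return "/".join(fields[name] for name in ("A", "B", "C"))


# The application was written against version 1.0 only.
HANDLERS = {("urn:example:lib:1:0", "XType"): handle_x_1_0}

# An instance typed as the 1.2 XType is still processed by the 1.0 handler.
handler = find_handler(("urn:example:lib:1:2", "XType"), HANDLERS)
print(handler({"A": "a", "B": "b", "C": "c", "D": "d"}))
```

A type from a different major version (for example a hypothetical `urn:example:lib:2:0`) has no entry in the chain, so the lookup returns `None`, which mirrors the absence of any compatibility guarantee across major versions.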
Given the scope of UBL, and the need to support multiple versions simultaneously, this is a tremendous advantage. (It may be of interest to note that type-aware processors promise to become common - XPath 2.0 will be type-aware, making the standard technologies that rely on it similarly type-aware. One can be sure that the developers who write the processors for these standards in object-oriented languages such as Java and C++ are aware of the benefits of this increase in object-orientation on the part of XPath.)
It should be clear at this point what path UBL has chosen, but it is worth examining the specific versioning scheme, both to serve as an example of what we are describing, and to illustrate some of the side-effects.
UBL uses namespaces as modules, with the rule mentioned above that once published, a namespace will never change. The URN syntax adopted by UBL for its namespaces has major and minor version information embedded in it - each major and minor version of a module is assigned a URN. These modules themselves are arranged in layers of dependency, to reflect the probable patterns of reuse.
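To show how an application might read version information out of such a namespace name without any version-specific logic, here is a minimal sketch. The URN layout `urn:example:ubl:<module>:<major>:<minor>` is invented for this illustration and is not UBL's actual URN syntax.

```python
import re
from typing import NamedTuple


class ModuleVersion(NamedTuple):
    module: str
    major: int
    minor: int


# Assumed layout for illustration only: urn:example:ubl:<module>:<major>:<minor>
_URN = re.compile(r"^urn:example:ubl:(?P<module>[A-Za-z]+):(?P<major>\d+):(?P<minor>\d+)$")


def parse_namespace(urn: str) -> ModuleVersion:
    """Split a namespace URN into its module name and version numbers."""
    match = _URN.match(urn)
    if match is None:
        raise ValueError(f"not a recognised module namespace: {urn!r}")
    return ModuleVersion(match["module"], int(match["major"]), int(match["minor"]))


def same_major_line(a: str, b: str) -> bool:
    """True if both namespaces are versions of one module within one major version."""
    va, vb = parse_namespace(a), parse_namespace(b)
    return (va.module, va.major) == (vb.module, vb.major)


print(parse_namespace("urn:example:ubl:Order:1:2"))
print(same_major_line("urn:example:ubl:Order:1:0", "urn:example:ubl:Order:1:2"))
```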
The central "layer" is a set of namespaces that contain those constructs that are commonly used throughout the library. One namespace contains the "core component types" which are the leaf-level constructs from Core Components modelled in XSD. Another set of custom simple types are contained in the "common leaf types" namespace, and a set of commonly used complex types is contained in the "common aggregate types" namespace.
These namespaces are imported by the second-layer namespaces, which reflect functional areas. Each second-layer namespace has a set of declarations that are specific to a business process or functional area: the documents used for ordering, for example, are contained in a single namespace, largely because any application that supports that process will need to have the entire set available (where it might not need the schemas specific to another process such as shipment notification). Given the eventual size of an e-commerce library, memory management for parsing can become an issue, and this helps to solve the problem. Typically, components designed to support a process are re-used in different document types throughout that process, so these namespace groupings reflect the patterns of reuse in real-world applications.
There is an anticipated third layer as well, which will consist of namespaces maintained by external users, full of extensions to the UBL library. It is assumed that these extensions will be made to existing document types, so a set of industry- or domain-specific namespaces would grow up around each of the functional areas. (External users wishing to extend UBL must do so in their own namespaces.)
It should be noted that XSD imports have a little-known property: they are not transitive. In plain language, this means that I must directly import any namespace containing types that I wish to extend. In UBL, this is reflected in the rule that any minor version must import the corresponding namespace of the immediately preceding minor version, and may import other namespaces as needed. Major versions have no such requirement, because there is no guarantee of backward compatibility.
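The import rule lends itself to a mechanical check. The following sketch assumes a hypothetical namespace layout in which the last two colon-separated fields are the major and minor numbers, and assumes that minor version 0 is the first of a major version and therefore has no predecessor to import. It inspects only the direct `xs:import` elements of a schema document, which is precisely the point: an import made by an imported schema does not count.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XS = "http://www.w3.org/2001/XMLSchema"


def predecessor(namespace: str) -> Optional[str]:
    """Namespace of the preceding minor version, or None for minor version 0."""
    head, minor = namespace.rsplit(":", 1)
    return None if int(minor) == 0 else f"{head}:{int(minor) - 1}"


def imports_predecessor(xsd_text: str) -> bool:
    """Check that a schema directly imports its immediately preceding minor version."""
    root = ET.fromstring(xsd_text)
    required = predecessor(root.get("targetNamespace"))
    imported = {imp.get("namespace") for imp in root.findall(f"{{{XS}}}import")}
    return required is None or required in imported


# Follows the rule: version 1.2 imports version 1.1 directly.
GOOD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:lib:1:2">
  <xs:import namespace="urn:example:lib:1:1"/>
</xs:schema>
"""

# Breaks the rule: version 1.2 imports only 1.0 and skips 1.1.
BAD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:lib:1:2">
  <xs:import namespace="urn:example:lib:1:0"/>
</xs:schema>
"""

print(imports_predecessor(GOOD), imports_predecessor(BAD))
```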
The arrangement of namespace modules within UBL results in some necessary rules: if the central, or "core" layer of namespaces is versioned, then all second-layer namespaces must also be versioned if they are to leverage the new common constructs. Second-layer namespaces, however, can be versioned independently, since other second-layer namespaces have no necessary dependency on them. Core namespaces, of course, do not import second-layer ones. This means that UBL as a whole has no actual version - it is merely a group of interdependent modules that are versioned according to the rules given here. (Of course, the marketing TC will give it a single version anyway, for branding purposes, but that is not a technical reality, merely a marketing one!)
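The consequences of these rules can be illustrated with a small dependency table. The module names below are loosely based on the layers described earlier, but the exact import relationships, and the external extension module `AcmeOrderExtensions`, are hypothetical. The function computes which modules would need a new version in order to take advantage of a newly versioned module.

```python
from typing import Dict, Set

# Hypothetical direct imports: module -> modules it imports.
IMPORTS: Dict[str, Set[str]] = {
    # Core layer
    "CoreComponentTypes": set(),
    "CommonLeafTypes": set(),
    "CommonAggregateTypes": set(),
    # Second layer: functional areas importing the core layer
    "Order": {"CoreComponentTypes", "CommonLeafTypes", "CommonAggregateTypes"},
    "Invoice": {"CoreComponentTypes", "CommonLeafTypes", "CommonAggregateTypes"},
    # Third layer: an external user's extension of one functional area
    "AcmeOrderExtensions": {"Order"},
}


def affected_by(changed: str, imports: Dict[str, Set[str]]) -> Set[str]:
    """Modules that import `changed`, directly or through other modules."""
    affected: Set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for module, deps in imports.items():
            if current in deps and module not in affected:
                affected.add(module)
                frontier.append(module)
    return affected


# Versioning a core module ripples out to every functional area.
print(sorted(affected_by("CommonAggregateTypes", IMPORTS)))
# Versioning one functional area touches only its own extensions.
print(sorted(affected_by("Order", IMPORTS)))
```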
The major benefit of this modular approach is that functional areas can be managed separately without having any impact outside of themselves and the people who choose to extend them for specific domains (the likely source of requirements for improvements and bug fixes in any event). A problem in the second-layer shipment notification namespace will have no effect on the invoicing namespace, and they can be managed and released to meet the needs of their users. Otherwise, the management burden across an ever-wider set of business processes would become intolerable.
As noted before, minor versions import their immediately preceding version and make only those changes allowed by XSD extensions and refinement, thus guaranteeing backward compatibility. Further, the name of a type cannot be changed unless the intent of the change produces new semantics, making the library easier to understand across versions, and significant changes easier to spot. For major versions, everything starts over - the preceding major version is not imported, if only to make sure that the chain of inheritance does not itself become too huge, and all types are declared as if for the first time. (The modelling methodology guarantees a certain level of continuity here that might otherwise be worrying.)
It should be noted that UBL's decision to not capture codelists as native enumerations, but to provide an external binding mechanism leaves codelist maintainers free to version according to the needs of their code lists. (This is too long a subject-matter to go into here, but information is available at the UBL TC site on http://oasis-open.org/committees/ubl/ ).
UBL has a broad scope, and an immense scale as an application. This places tough requirements on it in terms of versioning. E-commerce applications are themselves very forward-looking, and do much to increase the difficulty of meeting versioning needs, by demanding "plug-and-play" interoperability. XSD contains object-oriented features that help us meet the challenge of these requirements, however: inheritance, as expressed through extension and refinement, and the potential to support type-aware, polymorphic processing. The models of COM and Solaris demonstrate to us how software strategies have solved the versioning problem, and we can leverage this experience directly. In XML namespaces, we are given a native mechanism that provides the perfect packaging and modularity mechanism for an object-oriented versioning scheme, and one that lets us keep our type names consistent and meaningful.
UBL serves as a demonstration of how the complexities of XSD can be leveraged to help promote cross-version interoperability, rather than simply producing confusion, fear, and pain. It is hoped that, with the increasing number of XSD-based XML vocabularies, this type of a versioning scheme can be used to help other standard schema libraries facing similar challenges.