Abstract
This paper addresses the interplay between complex business requirements and technical design to maximize interoperability by leveraging the XSD type-derivation hierarchy. It provides a brief background of the OASIS UBL TC, the work of the Context Methodology subcommittee, and the concept of processable business context description as inherited from the ebXML Core Components work. The focus is on the difficulties of modelling complex business relationships within the limitations imposed by XSD type derivation mechanisms. For example, how can the perceived need to delete required components be made to work with XSD's type derivation constraints? While very much aligned with the spirit of the XSD specification, the methodology described results in a powerful and non-obvious design philosophy that impacts both document design and management.
UBL places emphasis on interoperability within business-to-business e-commerce, both within and across verticals. The range of business requirements is vast, and a number of standards organizations have evolved to try to provide interoperable specifications. The UBL approach to extension and derivation of XML business documents leverages this existing hierarchy, and utilizes the type-derivation and polymorphic features of XSD to allow a greater degree of inheritance and interoperability than has historically been possible. Ultimately, XSD is used to model the relationships existing among the players in the business world, defining and limiting how type derivation may occur within a UBL-compliant universe.
This approach resolves the tensions between the XSD mechanisms and complex real-world business requirements. The paper provides a unique view of how UBL will function in the realm of standardization, as well as exemplifying a design philosophy that has implications for application design and the general use of XSD to promote data interoperability.
This presentation covers the approach currently being considered within the OASIS Universal Business Language TC for providing an extension mechanism that produces the maximum degree of interoperability. While W3C XML Schema provides a number of features that allow XML to behave in ways aligned with the capabilities of object-oriented languages to leverage type hierarchies (such as the ability to extend and refine types, and to capture the structural relationships between them), these capabilities do not easily match the business requirements for extension within the realm of document standardization for e-commerce.
UBL has found a way to use the OO-like features of W3C XML Schema to meet these business requirements, by exploiting the type hierarchy to support polymorphic processing of XML. Ultimately, this provides a degree of interoperability that is otherwise difficult to achieve.
The key to this approach is to understand the business domain levels at which standards bodies function, and to realize a mechanism that reflects this understanding. Business documents are standardized in a general way at a high level, and become increasingly specialized as they are standardized at more specific levels by industry groups and trading communities. While the solution UBL has come up with is not obvious, it allows us to model this real-world specialization in a way that is consistent with the extension and refinement mechanisms of W3C XML Schema. Ultimately, polymorphic processing of business documents exploits this set of specialization relationships to deliver a core of data that remains interoperable at every level of specialization.
The scope of UBL is extremely broad, but addresses a problem space that has been explored before: interoperability between business applications conducting e-commerce. The role of extension methodologies in producing interoperability is new to schema-based XML solutions, but the role of standardization itself predates the technology. Looking at the approach used by the major EDI standards helps us to understand the problem space.
X12 and UN/EDIFACT are the major EDI standards. They are maintained by "top-level" organizations: in the case of EDIFACT, the organization is UN/CEFACT, and it has a global scope (hence its association with the UN). X12 is maintained at a national (US) level by ASC X12, a committee accredited by ANSI. In neither case are these standards easy to use (nor intended for use) "out of the box" - they both take what we will characterize as a "kitchen sink" approach. That is, they define all of the possible standard uses of their syntax. While it is possible to produce applications that cover every possible use of data within the standard document definitions, this is not typically how EDI systems are implemented by trading partners.
Instead, business communities - such as an industry-vertical organization or a national one - will take the standard as given, and identify which subset of the possible data elements will be used for their purposes. This activity involves both specifying a given data element and specifying exactly how it will be used within that community. Because the EDI standards themselves allow many ways of expressing the same data, the creation of these "implementation guides" is critical in producing interoperability between trading partners. Examples of such implementation guides for X12 include those produced by industry groups such as AIAG (US Automotive), AIA (US Aerospace), PIDX-CIDX (Petroleum and Chemicals), and EIDX (Electronics). These bodies are formed for the purposes of standardizing business process and data exchange within their industries. Often, they have a national affiliation or focus. These groups are concerned not only with EDI implementations, but also with XML-based ones.
Having an implementation guide at this level is helpful: it speeds implementation and reduces the cost of e-business by narrowing the scope of data that a particular application needs to support. Typically, however, a given pair of trading partners will take this at least one step further: their trading partner agreement will define a subset of the relevant industry standard implementation guide. In some cases, this subsetting is taken down to the departmental level within large enterprises.
In its simplest form, we can understand this as a simple subsetting process: the EDI standard is at the top of the chain, which is subset by an industry vertical implementation of that standard, which is in turn subset by a trading-partner-agreement-specific implementation of the industry subset, and so on. Naturally, the real world is not so simple...
The problem is that it is impossible to know ahead of time the needs of every trading partner for inclusion in the top-level standard. As a result, specific additions to what is provided by the parent standard must be made at the industry or trading-partner level. In the EDI world, the way this was officially done was to take whatever changes were required to support the needed data and to submit a change request to the maintaining body, which resulted in - hopefully - a change to the standard concerned. While there was some provision for "custom" codes and data to be included in these documents, this produced no real gain in interoperability beyond the support that trading partners built into their systems when they agreed on the customization - you could not realistically expect a new trading partner to support your customizations without doing further integration, even if they supported the core standard. Many XML business-to-business vocabularies operate with similar mechanisms.
Someone who has never participated in a standards process cannot appreciate the time required to make and approve even a simple change. Even non-controversial changes are subject to the maintenance cycles of the standard bodies, and controversial changes can result in something very different from what the submitter intended, a long time after the submission. Naturally, trading partners did not always wait for the official bodies to approve their requests - they implemented according to the needs of their business, and let the standards follow on at their own pace, with any deviation from their request handled afterward, if at all.
As an extension mechanism, this was far from optimal, and it tended to result in similar-but-non-interoperable implementations based on a common standard, rather than true interoperability. Trading-partner integrations were lengthy and expensive, and this slowed the rate of adoption of the standards and the degree to which e-commerce replaced paper commerce. The goal of UBL is essentially the same as that of the EDI standards bodies - to produce interoperability between trading partners at a global level. Ideally, however, the "extension mechanism" will produce something that functions more effectively, and which produces a higher degree of interoperability.
UBL accepts that the relationship of standards bodies within the business world is a reality, and a beneficial one. Standardization is needed at many levels, and the success achieved by international, national, and industry-vertical standards groups should not be ignored, discarded or replaced. What UBL hopes to achieve, however, is to use a different approach toward accommodating customizations. This approach dictates that the UBL standard itself, at the highest level, becomes very different from the EDI standards (and, in fact, different from many XML-based standard business vocabularies).
UBL aims to provide a useful core set of XML documents and reusable components - the "80% case" which provides a useful set of data structures without claiming to be fully complete - with the intention that this core would be taken by various other groups and extended and refined to reflect the needs of different types of business. This phenomenon would be the same as in the EDI world - industry-vertical or national groups would take the UBL standard, and make whatever changes are needed, and trading partners would take the industry-vertical or national standard and further extend and refine it. Because the core library does not pretend to cover every possibility, it relies on a much richer and more fully controlled extension mechanism. Moreover, with schema-based XML we have a set of tools that will allow us to realize a much greater degree of interoperability, and in a much more efficient fashion.
When we look at the end goal - interoperability - we have to recognize that data variation in e-commerce systems is often driven by real-world needs within an industry or trading community. Equally, however, much of the variation between vocabularies is the result of other factors, and could be avoided. What we are aiming for is to have the same data called the same thing by everyone who uses it, with variation only where required.
Before proceeding, it might be useful to understand where the UBL initiative sits in relation to other related e-business standards activities, and to have some other background about the direction UBL is taking. (For a more in-depth understanding of UBL generally, readers are directed to the presentations by Mark Crawford, Jon Bosak, Eve Maler and others in this forum.)
UBL came about as a result of ebXML, a joint effort between UN/CEFACT and OASIS. This initiative was limited to 18 months in duration, and while it produced a great deal of interesting work, it did not attempt to define standard XML schemas for e-business. Instead, it focused on producing semantic models that would serve as the basis for e-business documents defined in any syntax. UBL arose out of the short-term need for XML document definitions, based as much as possible on the work of ebXML and on the Core Components Technical Specification, work on which was continued within UN/CEFACT.
UBL is an on-going effort in OASIS, with a 1.0 version just released, but with a considerable amount of follow-on work before it realizes its goals. Among the decisions it has made is that the W3C Schema definition of its XML documents will be the official, canonical version. This decision becomes important when the extension methodology is considered, because of the mismatch between the extension capabilities of W3C XML Schema and the business requirements in this area.
As mentioned earlier, UBL adheres as closely as possible to the ebXML (now UN/CEFACT) Core Components Technical Specification. One of the provisions within this specification is known as the "context mechanism," which has a strong bearing on the UBL extension methodology.
The idea of context is fairly simple: business situations are uniquely describable at a meaningful level using a set of computer-processable classifications. This is done using a set of eight attributes, each of which functions as an axis in an eight-dimensional matrix. This is informally known within UBL as "eight-space." These attributes can be understood as follows:
Business Process
Geopolitical
Trading-Partner Role
Third-Party Role
Industry
Product/Service
Legal Constraints
System Capabilities
Each of these axes is supported by a classification, and for each attribute a single value, a range of values, or a list of values (or any combination of these) can be supplied. As a result, each business transaction can be described with this system, and differentiated from any other type of business transaction. This is useful for UBL because it allows the people extending and refining UBL schemas to identify the business situation that necessitates those extensions and refinements, and for which they are appropriate. The Core Components Technical Specification allows for a constraints language that ties changes to a base semantic model to the business situation for which they are appropriate.
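As a rough illustration of the context description above, the following Python sketch models a context as a mapping from the eight axes to the values allowed on each, where an allowed value may be a single value, a list of values, or a range. The axis identifiers, the Range class and the numeric classification codes are our own assumptions for illustration; CCTS does not define this API.

```python
from dataclasses import dataclass, field

# The eight context axes listed above (identifier spellings are our own).
AXES = (
    "business_process", "geopolitical", "trading_partner_role", "third_party_role",
    "industry", "product_service", "legal_constraints", "system_capabilities",
)


@dataclass(frozen=True)
class Range:
    """An inclusive range of (hypothetical) numeric classification codes."""
    low: int
    high: int

    def contains(self, value) -> bool:
        return isinstance(value, int) and self.low <= value <= self.high


@dataclass
class Context:
    """Allowed values per axis; an axis that is not listed is unconstrained."""
    axes: dict = field(default_factory=dict)

    def __post_init__(self):
        unknown = set(self.axes) - set(AXES)
        if unknown:
            raise ValueError(f"unknown context axes: {sorted(unknown)}")

    def matches(self, situation: dict) -> bool:
        """True if a concrete situation (one value per axis) lies inside this context."""
        for axis, allowed in self.axes.items():
            value = situation.get(axis)
            if not any(a.contains(value) if isinstance(a, Range) else a == value
                       for a in allowed):
                return False
        return True


# A list of values, a single value and a range on three of the eight axes.
ordering = Context({
    "business_process": ["Ordering", "Delivery"],
    "geopolitical": ["International Waters"],
    "industry": [Range(100, 199)],
})
print(ordering.matches({"business_process": "Delivery",
                        "geopolitical": "International Waters", "industry": 150}))  # True
```

Because every axis value is machine-comparable, two descriptions of the same business situation can be matched without human interpretation, which is what the constraints language builds on.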
To illustrate what has been described in the previous section, look at the following (admittedly extreme) example. (We will revisit this example later to illustrate the UBL approach to solving this problem.)
A mobile oil-drilling platform operating in the North Sea needs to order spare parts via a satellite hookup. As part of the business data needed to accomplish this transaction, they need to specify the location to which the spare parts are to be delivered. Let's say (for the sake of illustration) that the standard "80%" UBL library has been modified by an Oil & Gas industry group. The address component is used in the Oil & Gas UBL profile to specify where goods are to be delivered. This Address requires such fields as Street, City, State/Province, and Country. It does not contain GPS Coordinates.
For our oil-drilling trading partner, however, this is exactly wrong. GPS coordinates need to be mandatory, and the typically required address fields need to be completely disallowed, since none of this information is relevant to an oil-drilling platform situated in international waters.
In customizing (that is, extending and refining) the core UBL schema, the oil-drilling platform could describe its context as follows:
Business Process = Ordering, Delivery
Geopolitical = International Waters
Trading-Partner Role = Buyer, Recipient
Third-Party Role = None
Industry = Oil Drilling Platform
Product/Service = Spare parts
Legal Constraints = None
System Capabilities = None
The business situation described is supplying a delivery location in a spare parts order, with a Buyer/Recipient drilling for oil in international waters. Having thus identified our business situation, we can then tie the needed schema modifications to this situation: remove Street, City, and State/Province from our Address, and insert GPS Coordinates as required.
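A minimal sketch of tying modifications to a context, assuming a hypothetical ContextRule structure (not part of CCTS or UBL). Axes given as "None" in the example are simply left unconstrained here, and the content model is reduced to an ordered list of (name, minOccurs, maxOccurs) tuples.

```python
from dataclasses import dataclass

# The Oil & Gas Address from the example: (element name, minOccurs, maxOccurs).
OIL_GAS_ADDRESS = [("Street", 1, 1), ("City", 1, 1), ("StateProvince", 1, 1), ("Country", 1, 1)]


@dataclass
class ContextRule:
    """Hypothetical constraint rule: a context plus the schema changes it justifies."""
    context: dict        # axis -> set of values the situation must fall within
    remove: tuple = ()   # element names to disallow
    add: tuple = ()      # (name, minOccurs, maxOccurs) particles to append

    def applies_to(self, situation: dict) -> bool:
        return all(situation.get(axis) in values for axis, values in self.context.items())

    def apply(self, model):
        kept = [p for p in model if p[0] not in self.remove]
        return kept + list(self.add)


oil_rig_rule = ContextRule(
    context={
        "business_process": {"Ordering", "Delivery"},
        "geopolitical": {"International Waters"},
        "trading_partner_role": {"Buyer", "Recipient"},
        "industry": {"Oil Drilling Platform"},
        "product_service": {"Spare parts"},
    },
    remove=("Street", "City", "StateProvince"),
    add=(("GPSCoordinates", 1, 1),),
)

situation = {
    "business_process": "Ordering", "geopolitical": "International Waters",
    "trading_partner_role": "Buyer", "industry": "Oil Drilling Platform",
    "product_service": "Spare parts",
}
if oil_rig_rule.applies_to(situation):
    print(oil_rig_rule.apply(OIL_GAS_ADDRESS))  # [('Country', 1, 1), ('GPSCoordinates', 1, 1)]
```

The resulting Address drops elements that the Oil & Gas profile requires, which is exactly the kind of change the sections below show to be awkward in XSD.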
W3C XML Schema is a very useful and complex standard. Among its achievements is the ability to align XML data with object-oriented systems. (This is not coincidental: the creators of SOX - the Schema for Object-Oriented XML - were members of the group that created W3C XML Schema.) But W3C XML Schema also provides for many other applications, and it is not always easy to choose which capabilities are useful in a given application.
We will clarify here the parts of the W3C XML Schema specification that are important to the extension methodology, and describe how these might figure in an implementation. The object-oriented aspects of W3C Schema are obviously of great importance here.
W3C Schema allows us to describe types - in the case of UBL, global types - that can be re-used, extended, and refined. In UBL, the anticipated use of this construct will be to have elements referencing an existing type. So, if we want to have a BuyerAddress, we can simply declare it and then reference the AddressType to describe its structure.
Extensions (both additive and subtractive) allow us to take a UBL AddressType, and extend it in a type declaration of our own - MyAddressType. We simply reference the UBL type, and then declare whatever changes against that type we need to make. If we want to then create an element that is of our extended type, we simply reference our extended type from our element declaration: MyBuyerAddress, for example.
The significant part of this mechanism is the explicit relationship between the UBL base type and our extended type. Because this relationship is explicit, it is available to any application that processes documents which use elements based on these types. This is how W3C XML Schema allows us to describe OO-like type hierarchies: our extension can be referenced by someone else and further extended, resulting in chains of inheritance.
Further, we have a provision for declaring abstract types that can only serve as the basis for other types: an element in a document instance can never be validated directly against an abstract type. This becomes significant in the UBL approach, as will be explained below.
We now need to look at exactly how additive extension and subtractive refinement function, since this is the mechanism UBL plans to use to express its extension relationships.
Additive extension allows a type extension to add attributes to its base type, and to add elements at the end of a sequence or an implied sequence (e.g. a single-element content model). It is possible to add a group containing a choice at the end of a sequence, as well as a single element. Additions may have whatever cardinality (required, optional, etc.) is needed.
From a business perspective this is quite limiting: I cannot insert elements into the middle of a sequence, nor at the beginning, which might seem natural places to insert data for some business cases.
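The following Python sketch models the positional constraint of additive extension, treating a content model as an ordered list of (name, minOccurs, maxOccurs) particles. It ignores attributes and nested groups, and the function names are hypothetical.

```python
# A content model is an ordered list of (element name, minOccurs, maxOccurs) particles.
ADDRESS = [("Street", 1, 1), ("City", 1, 1), ("StateProvince", 1, 1), ("Country", 1, 1)]


def extend(base, additions):
    """Derivation by extension: new particles can only be appended after the base content."""
    return list(base) + list(additions)


def is_expressible_as_extension(base, derived):
    """True if `derived` keeps the whole base sequence, unchanged, as its prefix."""
    return list(derived[:len(base)]) == list(base)


gps = ("GPSCoordinates", 1, 1)          # added elements may have any cardinality
appended = extend(ADDRESS, [gps])
prepended = [gps] + ADDRESS             # what a business user might prefer
print(is_expressible_as_extension(ADDRESS, appended))   # True
print(is_expressible_as_extension(ADDRESS, prepended))  # False
```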
Subtractive refinement is somewhat more limited, and for a very good reason: when a child type is derived from a base type, any instance of the child is assumed to contain everything needed to produce a valid instance of its parent. This is, in essence, what the type hierarchy asserts against our elements. It does not restrict the inclusion of additional information - it merely requires that the child have all data needed to produce a valid instance of the parent.
In the OO world, this mechanism ensures backward compatibility, and the XML mechanism is based on exactly the same principles.
The capabilities of W3C XML Schema's subtractive refinement are a reflection of this: you can change the cardinality of an optional element to zero, thus disallowing its use; and you can require the inclusion of an element in the child that is optional in the parent (although you cannot require the inclusion of more than the maximum number available to the parent). What you cannot do is disallow the inclusion of an element that was required in the parent.
This is quite inconvenient, if not downright frustrating, if we consider some business requirements. Consider the case of our North-Sea oil platform. The requirement was to add some elements to Address to cover GPS coordinates, and make them mandatory - this was no problem, as discussed in the section above, although they have to be added at the end. The other need was to disallow some required elements (Street, City, State/Province), but this cannot be done!
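To make the cardinality rules concrete, here is a simplified checker for derivation by restriction. It is a sketch under stated assumptions: each element's cardinality is a (minOccurs, maxOccurs) pair, an element omitted from the derived type is treated as prohibited, and ordering and nested particles are ignored, so it is far looser than the full XSD particle-restriction rules. PostalCode is a hypothetical optional element added for illustration.

```python
UNBOUNDED = None  # stands for maxOccurs="unbounded"


def max_within(derived_max, base_max):
    """derived_max <= base_max, treating None as unbounded."""
    if base_max is UNBOUNDED:
        return True
    return derived_max is not UNBOUNDED and derived_max <= base_max


def restriction_problems(base, derived):
    """List violations of the cardinality rules for restriction (empty list = allowed).

    base, derived: dict of element name -> (minOccurs, maxOccurs).
    An element omitted from `derived` is treated as prohibited, i.e. (0, 0).
    """
    problems = [f"{name}: not in the base type" for name in derived if name not in base]
    for name, (base_min, base_max) in base.items():
        d_min, d_max = derived.get(name, (0, 0))
        if d_min < base_min:
            problems.append(f"{name}: minOccurs {d_min} below base minimum {base_min}")
        if not max_within(d_max, base_max):
            problems.append(f"{name}: maxOccurs {d_max} exceeds base maximum {base_max}")
    return problems


# Oil & Gas Address from the example; PostalCode is a hypothetical optional element.
ADDRESS = {"Street": (1, 1), "City": (1, 1), "StateProvince": (1, 1),
           "Country": (1, 1), "PostalCode": (0, 1)}

require_postal = dict(ADDRESS, PostalCode=(1, 1))                          # allowed
prohibit_postal = {k: v for k, v in ADDRESS.items() if k != "PostalCode"}  # allowed
oil_rig = {"Country": (1, 1), "GPSCoordinates": (1, 1)}                   # not allowed

for label, candidate in [("require", require_postal), ("prohibit", prohibit_postal),
                         ("oil rig", oil_rig)]:
    print(label, restriction_problems(ADDRESS, candidate) or "OK")
```

The oil rig candidate fails on Street, City and StateProvince (required elements cannot be removed) and on GPSCoordinates (restriction cannot add elements; that needs a separate extension step).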
So far, we have been discussing extension and refinement of complex types. For simple types, the mechanism makes some different provisions. When refining simple types, it is only possible to restrict the set of data that is valid within that type. For example, we can have a string data type that allows up to 20 characters. A refinement of this simple type could be a string that allows only 14 characters - it is still a valid instance of its parent. But we would not be allowed to have a refined type that permitted 25-character strings, since this would allow the child simple type to have instances of more than 20 characters.
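The simple-type rule can be sketched with a single maxLength facet. StringType and restrict_max_length are hypothetical names; a real XSD processor enforces this through its own facet-validation rules rather than through any such API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StringType:
    """A string simple type with an optional maxLength facet (None means no limit)."""
    name: str
    max_length: Optional[int] = None

    def accepts(self, value: str) -> bool:
        return self.max_length is None or len(value) <= self.max_length


def restrict_max_length(base: StringType, name: str, max_length: int) -> StringType:
    """Derive by restriction: the new facet may narrow, but never widen, the value space."""
    if base.max_length is not None and max_length > base.max_length:
        raise ValueError(f"{name}: maxLength {max_length} would widen {base.name} "
                         f"(maxLength {base.max_length})")
    return StringType(name, max_length)


text20 = StringType("Text20", 20)
text14 = restrict_max_length(text20, "Text14", 14)  # every Text14 value is a valid Text20
try:
    restrict_max_length(text20, "Text25", 25)        # would admit 21- to 25-character strings
except ValueError as err:
    print(err)  # Text25: maxLength 25 would widen Text20 (maxLength 20)
```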
For UBL, this is not terribly difficult from a business perspective, for a variety of reasons. Partly, this is because UBL will not generally be very restrictive about its simple types (a document design consideration rather than an aspect of W3C XML Schema). Also, enumerated simple types - typically code lists - are handled through an entirely different mechanism that allows inclusion of external code lists, thus avoiding what is generally recognized as a potentially huge problem in terms of business requirements (see Eve Maler's presentation in this forum regarding the UBL Naming and Design Rules SC.)
Perhaps the main reason for UBL's use of W3C's XML Schema extension mechanism to express extensions and restrictions is that it allows the polymorphic processing of business documents. (This is certainly not the only reason - generic tools support is another major reason, for example.) Leveraging polymorphic processing is not an inherent aspect of a W3C XML Schema-based XML application, but it relies on the features of the schema language that we have just discussed.
Polymorphic processing allows an element whose type is derived from the type of an element required in a content model to be substituted for that element. This means that if document type A requires the UBL Address element, then a valid instance of document type A can instead use MyAddress (assuming it is based on MyAddressType, an extension of the UBL AddressType on which the UBL Address element was based). A validating W3C XML Schema processor will accept the document instance with the derived type in it! (In XSD terms, the substitution must be allowed by the schema: either MyAddress is declared as a member of the substitution group headed by Address, or the element keeps the name Address and names MyAddressType through xsi:type.)
This is not a simple concept the first time it is encountered, but it does provide UBL with a level of interoperability that would not otherwise be available. Basically, it allows us to extend and refine the types of the UBL library and then pass our extended documents to applications that natively understand only the core UBL library. What happens with the extended data is up to the application (typically it is buffered and a user notified that there is some additional information to be considered), but the core UBL data - common to both applications - will still be valid.
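The substitution and the core-only processing described above can be sketched as follows. This is not an XML processor: the hypothetical TYPES table stands in for the schemas that must be available at processing time, and parsing, substitution groups and xsi:type are abstracted away. Building is a hypothetical added field.

```python
# Hypothetical type table: type name -> (base type name or None, fields the type declares).
TYPES = {
    "AddressType": (None, ["Street", "City", "Country"]),
    "MyAddressType": ("AddressType", ["Building"]),  # extension appends Building
}


def derives_from(type_name, ancestor):
    """Walk the explicit base-type chain recorded in the schemas."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = TYPES[type_name][0]
    return False


def all_fields(type_name):
    """Fields of a type including inherited ones, base fields first (extension order)."""
    base, own = TYPES[type_name]
    return (all_fields(base) if base else []) + own


def process_address(instance_type, data, declared="AddressType"):
    """A core-only application: accept any type derived from the declared one,
    use the fields it understands, and buffer the rest for a person to review."""
    if not derives_from(instance_type, declared):
        raise TypeError(f"{instance_type} is not derived from {declared}")
    known = set(all_fields(declared))
    core = {k: v for k, v in data.items() if k in known}
    buffered = {k: v for k, v in data.items() if k not in known}
    return core, buffered


core, buffered = process_address(
    "MyAddressType",
    {"Street": "1 Main St", "City": "Leeds", "Country": "GB", "Building": "B2"},
)
print(core)      # {'Street': '1 Main St', 'City': 'Leeds', 'Country': 'GB'}
print(buffered)  # {'Building': 'B2'}
```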
Remember that our goal in terms of interoperability is to have the same data called by the same name, wherever it is actually the same. Polymorphic processing allows us to achieve that level of interoperability.
Note that polymorphic processing relies on the explicit expression of the type hierarchy in the XML schema - the schemas must be available to the parser at the time the instance is processed. Further, consider the fact that document types themselves can exhibit this behavior. If we take your document type, and extend it, you can still process our different document type as if it were one of your own.
UBL has not made final decisions about its extension methodology yet, and it is obvious that this approach has an impact on many other aspects of the overall UBL Library's design. The initial UBL release in this area will not cover everything discussed here, since much is seen by the UBL TC as a "Phase II" effort. However, what is presented here will be very similar to the end deliverables. (Remember - it is a standards effort, so nothing is cast in stone!)
Despite this, it should be clear by now that W3C Schema and the Core Components Technical Specification Context Mechanism offer some powerful tools for promoting interoperability between application customizations.
UBL faces a dilemma - the raw capabilities of W3C XML Schema do not provide enough extension and refinement to completely meet the needs of the business community. Some things are easy - adding elements at the end of existing sequences, and requiring optional elements, for example. But others - as in the case of our North Sea oil rig example, which specifically disallows the use of non-relevant data - are not.
There is a further gap, too - W3C XML Schema makes no attempt at capturing the business purpose behind an extension. While this is not germane at a technical level - extensions function fine without this information - at a business level, it is critical. If we cannot model the correct set of business relationships - that is, specializations of standard business data - in our type hierarchy, our ability to use polymorphic processing to provide interoperability breaks down. The danger here is that two trading partners might make different extensions to meet the same business needs, creating an inappropriate branch in our type hierarchy, and preventing interoperation further down the chain. This is a very real danger, given that many parallel industry groups exist, but are separated by their national affiliations (take the automotive industry, for example - we have ODETTE in Europe, and AIAG in the US).
Another aspect of the same problem is the danger presented by infinitely nestable contexts: for instance, the ability to make an extension based on industry, extend that in turn on the basis of geography, extend it further for another industry, and yet again on the basis of a different geographical consideration. The question here is: how much is enough?
Solutions to these three problems are being considered by the UBL effort. For the first, the answer comes from OO technology: create a set of "Ur-types" from which all other types are descended, including the UBL Library itself. For the second problem, we have the Context Mechanism with which to measure the relationships in a business sense, such that commonalities between disparate national groups could be identified in a systematic fashion. As for the third problem, a clear definition of UBL compatibility, to which applications will have to adhere, may be enough.
If we look at our North-Sea oil rig example, we have a case where the needed customization cannot be expressed in W3C XML Schema with the correct, explicit expression of the type hierarchy. The UBL solution to this provides an "escape hatch" to cover this situation, in the form of a set of abstract types, or "Ur-types", from which all extensions are descended, including the UBL Library itself. These Ur-types contain everything that is found in the standard UBL Library, but everything has a cardinality of 0..*.
The Ur-types are abstract - that is, they can never be used directly as the type of an element in a document instance. The Ur-types are only used as the basis for a set of trivial extensions that simply re-assign the types to a new namespace, and provide the appropriate cardinalities: components that UBL requires are made required, and every other component is given the cardinality UBL needs for it.
These trivial extensions will certainly be made throughout the UBL namespaces. They also provide a way for people who need to perform an "illegal" customization to a UBL type (that is, one that cannot correctly be expressed in W3C XML Schema and cannot therefore be considered UBL compliant) to make their change without losing their ties to the type hierarchy, but at the cost of not being able to use the UBL namespace. In other words, schemas strictly derived through W3C XML Schema derivation from the Ur-library inhabit the UBL namespace; schemas derived through some other means do not.
Take our North-Sea oil rig example. We cannot extend the Address element directly, because we wish to remove some elements that are required. Instead, we extend the abstract type on which UBL Address is based. Because it has no required elements, we can remove whatever we need to remove. In our business document, we can then substitute our extended Address for the UBL Address, and it will still be polymorphically valid, because it is descended from a common ancestor - the UrAddress!
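The Ur-type escape hatch can be checked with a small model, under the same simplifying assumptions as a cardinality-only restriction check (omitted elements are prohibited; ordering is ignored). All type names and the UBL cardinalities are illustrative. Note that, in XSD terms, narrowing a 0..* cardinality is a derivation by restriction, so the oil rig type is modelled as a restriction of the Ur-type followed by an extension that appends GPSCoordinates.

```python
UNBOUNDED = None  # maxOccurs="unbounded"


def within(d_min, d_max, b_min, b_max):
    """Is the derived cardinality [d_min, d_max] inside the base range [b_min, b_max]?"""
    max_ok = b_max is UNBOUNDED or (d_max is not UNBOUNDED and d_max <= b_max)
    return d_min >= b_min and max_ok


def valid_restriction(base, derived):
    """No new elements, and every base element keeps a cardinality within its base range."""
    return set(derived) <= set(base) and all(
        within(*derived.get(name, (0, 0)), *card) for name, card in base.items()
    )


# Ur-type: every component of the Address, each with cardinality 0..*.
UR_ADDRESS = {n: (0, UNBOUNDED) for n in ("Street", "City", "StateProvince", "Country")}
# Illustrative Address as in the example: the trivial derivation makes these required.
UBL_ADDRESS = {"Street": (1, 1), "City": (1, 1), "StateProvince": (1, 1), "Country": (1, 1)}
# Oil rig: restrict the Ur-type (street-level fields prohibited), then extend with GPS.
OIL_RIG_RESTRICTED = {"Country": (0, 1)}
OIL_RIG_ADDRESS = {**OIL_RIG_RESTRICTED, "GPSCoordinates": (1, 1)}

# Derivation links: type -> (base type, derivation method).
HIERARCHY = {
    "UrAddressType": (None, None),
    "AddressType": ("UrAddressType", "restriction"),
    "OilRigRestrictedAddressType": ("UrAddressType", "restriction"),
    "OilRigAddressType": ("OilRigRestrictedAddressType", "extension"),
}


def ancestors(type_name):
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = HIERARCHY[type_name][0]
    return chain


print(valid_restriction(UR_ADDRESS, UBL_ADDRESS))          # True
print(valid_restriction(UBL_ADDRESS, OIL_RIG_RESTRICTED))  # False: required fields removed
print(valid_restriction(UR_ADDRESS, OIL_RIG_RESTRICTED))   # True: nothing is required in the Ur-type
print(set(ancestors("AddressType")) & set(ancestors("OilRigAddressType")))  # {'UrAddressType'}
```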
Note that this mechanism would only be used in cases where an "illegal" extension was needed, since it does impact the scope in which applications need to function. (This issue is addressed below).
Take the second part of our dilemma - that the type hierarchy needs to accurately reflect the business needs that produce customizations, and that W3C XML Schema gives us no way of handling this problem. Here, it is the CCTS Context Mechanism that allows us to avert disaster.
The Context Mechanism provides us with a way to identify our business situation within an "eight-space" as described above. It also gives us a constraints language - a way of formally expressing how our customizations are tied to our business situation (that is, which extensions and refinements are needed for which business situation, or "context"). Because the context itself and the constraints language are both machine-processable, it is possible (presumably through an ebXML registry/repository or similar) to identify when a new customization is needed, and when a customization already exists to fulfill the expressed need. By providing the standards bodies at all levels with this mechanism, we can retain the integrity of our type hierarchy, such that the modelling of our schema extensions matches the specializations required for business. Our polymorphic capability, which relies on the correctness of the type hierarchy to provide a meaningful level of interoperability, still works correctly.
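The registry lookup described above might look like the following sketch. Registry and Customization are hypothetical in-memory stand-ins, not the ebXML Registry API, and "covering" a situation is simplified to set membership on each constrained axis; when several customizations match, the one constraining the most axes is preferred.

```python
from dataclasses import dataclass, field


@dataclass
class Customization:
    name: str
    base_type: str
    context: dict  # axis -> frozenset of allowed values; an unlisted axis matches anything

    def covers(self, situation: dict) -> bool:
        return all(situation.get(axis) in values for axis, values in self.context.items())


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for a registry/repository (not the ebXML API)."""
    entries: list = field(default_factory=list)

    def find(self, situation: dict, base_type: str):
        """Existing customizations of `base_type` whose context covers the situation."""
        return [c for c in self.entries if c.base_type == base_type and c.covers(situation)]

    def find_or_register(self, situation: dict, base_type: str, proposed: Customization):
        """Reuse a matching customization if one exists; otherwise register the proposal."""
        matches = self.find(situation, base_type)
        if matches:
            # Prefer the most specific match (the one constraining the most axes).
            return max(matches, key=lambda c: len(c.context)), False
        self.entries.append(proposed)
        return proposed, True


registry = Registry()
auto_ctx = {"industry": frozenset({"Automotive"}), "business_process": frozenset({"Shipping"})}
first, created = registry.find_or_register(
    {"industry": "Automotive", "business_process": "Shipping", "geopolitical": "US"},
    "AddressType", Customization("AutoShipToAddress", "AddressType", auto_ctx))
second, created2 = registry.find_or_register(
    {"industry": "Automotive", "business_process": "Shipping", "geopolitical": "EU"},
    "AddressType", Customization("EuroAutoShipToAddress", "AddressType", auto_ctx))
print(created, created2, second.name)  # True False AutoShipToAddress
```

The second group reuses the first group's customization instead of creating a parallel branch, which is the outcome the text argues keeps the type hierarchy aligned with business needs.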
It should also be noted that because the constraints language is machine-processable, applications should be able to determine whether a particular derivation violates the nestable context constraints, thus ensuring UBL compatibility.
One result of this approach to customization is that application developers will need to choose the level of the hierarchy at which they support UBL or a UBL-derived library. There is really no negative impact here - enterprise application developers can still do exactly what they do today, which is to directly support a single customized version of a document type - and they will lose no capability. What changes is the ability to write applications that leverage polymorphic processing to provide greater levels of interoperability.
Let's examine this in detail. We are given the Ur-types and the UBL Library. The UBL Library has been extended by industry groups, and those industry groups have trading partners that themselves have made extensions. As an implementor, you can choose to support the UBL Library directly, which means that your application will have at least a minimum level of interoperability with any application that supports UBL, or any of the customizations made by the industry groups or their trading partners. If you choose to support an industry-customized library, then only applications supporting that industry customization, or the trading-partner extensions of it, will interoperate with yours.
This levelling effect is desirable, because it allows applications to determine the scope within which they need to interoperate, and to provide the minimum level of support to achieve that goal. Generic applications would probably choose to support the Ur-types, since this produces the maximum degree of interoperability between customizations, and the needs of specific businesses and industry cannot be foreknown. Industry-specific applications can rule out any "non-backward-compatible" customizations by not providing support for the Ur-types, but providing support only at the level of the UBL library itself.
Ultimately, the rule of thumb for applications is to provide support at the level above which the application is intended to function, so that polymorphic processing will allow interoperation among customizations at that level. Having an explicit type hierarchy allows applications to detect when they are handed a customization which they do not support.
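The rule of thumb can be sketched as a walk up the explicit type hierarchy. The chain of types below (an industry customization, a trading-partner customization, and an oil-rig type derived via the Ur-type) is hypothetical.

```python
# Hypothetical derivation chain: type -> base type.
BASE_OF = {
    "UrAddressType": None,
    "AddressType": "UrAddressType",               # UBL Library
    "AutoAddressType": "AddressType",             # industry customization
    "PartnerAutoAddressType": "AutoAddressType",  # trading-partner customization
    "OilRigAddressType": "UrAddressType",         # "illegal" change made via the Ur-type
}


def lineage(type_name):
    """The explicit type hierarchy from a type up to its root."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = BASE_OF[type_name]
    return chain


def supports(app_level, incoming_type):
    """An application written against `app_level` can process any type descended from it."""
    return app_level in lineage(incoming_type)


print(supports("AddressType", "PartnerAutoAddressType"))  # True
print(supports("AddressType", "OilRigAddressType"))       # False: detected as unsupported
print(supports("UrAddressType", "OilRigAddressType"))     # True
```

An application written against AddressType therefore accepts every customization derived from the UBL Library and detects the oil-rig type as one it does not support; only an application written against the Ur-type accepts both.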
The UBL extension methodology ultimately builds on the relationships between standards bodies and trading partners that were created to support EDI standardization. Without such organizations, there would be no way to create a viable type hierarchy of business data - the model of free-form customization would be unmanageable. UBL is designing its extension methodology so that standards groups at any level can create UBL-based vocabularies that meet the needs of their specific communities, without sacrificing a high degree of interoperability. Better yet, at least a minimum level of this interoperability is available at run time, without any integration needed between trading partners.
What the UBL approach realizes is the full capability of W3C XML Schema to help promote interoperability among disparate standards. This has not been achieved, or even posited, by any earlier XML vocabulary. It clearly builds on the semantic and technical work of the ebXML Core Components Technical Specification, and on the efforts of many other standards groups (ebXML Registry/Repository, etc.).
Design & Development by deepX Ltd. 2002