XML 2003 logo

Using Standards to Fast Track Enterprise Messaging for Finance

Abstract

XML messaging continues to grow as a solution for integrating enterprise systems. The problem, however, is that available standards rarely cover an enterprise's complete needs, while developing proprietary vocabularies is expensive and time-consuming. This presentation reviews some recent finance projects where XML standards were used to jump start development of an enterprise messaging and integration solution, but were not allowed to constrain the solution. The presentation reviews the trade-offs, the positives, the negatives, and some strategies.

Table of Contents

1. Introduction
2. Why Standards?
3. XML Standards
4. Financial XML Standards
5. Reusing Standards in Proprietary XML Formats
6. Integrating XML Standards
7. Practical Results
8. Alternative Models for Proprietary XML Formats
9. Conclusion
Bibliography
Glossary
Biography

1. Introduction

Although a common aim of XML standards is to allow process interoperability between companies, many financial enterprises are spending more effort on deploying XML standards internally, with proprietary extensions. This paper discusses some of the reasons why, and discusses practical issues that are involved in taking external standards and making them work as an internal solution.

2. Why Standards?

Why standards? This may seem to be an odd question to pose at an XML conference, but since this paper discusses how and when to use standards to fast track enterprise development, it is important to understand the business cases for standards in the enterprise technology stack. Standards are sometimes promoted with almost religious fervour by technologists, but the "standards versus proprietary" argument is never a simple choice for the business decision makers who ultimately have to sign the cheques.

For business decision makers, the question is always one of what value any technology, standards-based or proprietary, brings to the business equation. Does it deliver immediate cost savings? Does it provide genuine leverage in negotiating deals with vendors? Some technologies offer theoretical leverage via a "levelling of the playing field", but that only works if there really are two or more vendors that the enterprise is prepared to deal with. In judging vendors, business stability and continuity are as important as, and sometimes more important than, whether a vendor produces standards-based products. If there is only one vendor that satisfies an enterprise's non-technical selection criteria, then the question of "standards versus proprietary" becomes moot, because there is no such choice.

That said, for the many cases where there is a choice, where does the business value of standards come from? Maximum value comes when you can install a new piece of hardware or software, and it just runs in concert with your existing system, without any integration effort. Even better if it also requires zero training effort for your staff, because it follows user interface standards that your staff are already familiar with. Note that this (idealised) scenario does not distinguish between open/public standards and proprietary "vendor" standards. The bottom line is that enterprises need to manage their total expenditure, and that includes total expended effort. Some freedom of vendor choice may be a nominal business requirement, but that cannot be at the expense of too much integration effort. There is a tendency now for enterprises to develop strategic partnerships with particular vendors in order to manage total cost of ownership at a direct business level, rather than placing faith in an expectation that standards and vendor competition together will guarantee lowest (actual) cost of ownership.

3. XML Standards

For data standards, e.g. XML standards, the strongest business case is when a standard enables the use of off-the-shelf applications at a cost that is lower than the cost of developing internal equivalents (and remember that internal applications may not require the full functionality of the off-the-shelf applications). You also want those applications to be maintained and supported by the producing companies (or perhaps by 3rd party support specialists), rather than by your own support staff.

XML itself is well supported. Related specifications like SAX, DOM, XSLT, W3C XML Schema, and SOAP are also well supported. XQuery and XForms should also be well supported when they become full W3C recommendations. So in terms of generic "horizontal" XML, there are enough off-the-shelf applications to justify enterprise use of XML.

Some specifications serve a vertical market, but a large and broad vertical market. This is the case with Electronic Business using eXtensible Markup Language (ebXML) and Universal Business Language (UBL) serving the needs of general commerce. Although commerce is a vertical market, these specifications address horizontal needs of the commerce community, and so they are likely to receive good application support, particularly given their UN/CEFACT backing.

4. Financial XML Standards

Finance is a vertical market which overlaps with commerce. It comprises all of the activities that companies and governments undertake to raise investment capital. This includes business loans, government and corporate bonds, and financial markets. Finance is an area that spends billions of dollars/euros/pounds on IT each year. There are a number of financial XML specifications, including:

ISO 15022 version 2

International Organization for Standardization (ISO) 15022 version 1 is a major non-XML specification that has contributed to a number of the existing financial XML specifications. It provides a standard set of (>10k) data fields for financial information and (~100) messages for financial transactions. The data dictionary and catalogue of messages are maintained on ISO's behalf by the Society for Worldwide Interbank Financial Telecommunication (SWIFT), a banking industry co-operative.

Version 2 of ISO 15022 is not an incremental development, but a complete re-engineering. The version 2 standard, which is expected to pass the ISO voting process in December 2003 or January 2004, defines a repository into which nominated financial industry groups (including some of the existing financial ML producers) can contribute business models. These are converted to process/message/data models expressed in Unified Modeling Language (UML). The ISO 15022 version 2 model is a single unified model. New contributions will be integrated with the existing definitions that are already in the repository, so that duplication is avoided. XML Schemas for the messages underlying the processes are generated automatically from the UML model. Registration and standards management groups should be in place by mid-2004, at which point the repository will be ready to accept its first financial models. Note that the existing ISO 15022 version 1 definitions will not be used to seed the version 2 repository. So version 2 data dictionary definitions may be different to their version 1 equivalents. It will depend on which groups submit their business models first, since the first submitted definition (for each particular item) is the definition that will be used thereafter.

The metamodel underlying ISO 15022 version 2 is a deliberately simple one. ISO 15022 UML models are restricted to a subset of UML constructs, and ISO 15022 Schemas are restricted to a subset of W3C XML Schema features. The only W3C XML Schema features that they use (beyond what DTDs provide) are namespaces, datatypes, and local element definitions.

Financial products Markup Language (FpML)

FpML is a financial specification which initially focussed on transactions of over-the-counter financial instruments. These are financial products which are not traded via an exchange, but directly between two financial institutions, such as banks. Traditionally, over-the-counter deals are agreed by telephone, where only the most important details are discussed. After this, each party faxes and couriers the full details to the other, and the details are then compared to make sure that both parties were actually agreeing to the same thing. Time pressures mean that the full details cannot be agreed during the telephone phase of the negotiations. As such, the process of confirmation, comparing what both parties believed they had agreed to, is tedious when done manually, and this makes it a good candidate for automation using XML. FpML messages each have a known and predefined set of defaults which are overridden explicitly by either party as required. This makes the process of not specifying something explicitly a well-defined one, and allows the confirmation to be done in a fast semi-automatic fashion where only mismatched information is brought to the attention of humans.
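The matching step described above can be illustrated with a small sketch. It assumes that each party's trade details have already been parsed into a flat mapping of field names to values; the field names and default values are hypothetical and are not taken from the FpML specification.

```python
# Hypothetical field names and defaults, for illustration only; these are
# not taken from the FpML specification.
DEFAULTS = {
    "businessDayConvention": "MODFOLLOWING",
    "dayCountFraction": "ACT/360",
}


def effective_terms(explicit: dict) -> dict:
    """Overlay a party's explicitly stated terms on the predefined defaults."""
    terms = dict(DEFAULTS)
    terms.update(explicit)
    return terms


def mismatches(party_a: dict, party_b: dict) -> dict:
    """Return only the fields on which the two parties disagree."""
    a = effective_terms(party_a)
    b = effective_terms(party_b)
    return {
        field: (a.get(field), b.get(field))
        for field in sorted(a.keys() | b.keys())
        if a.get(field) != b.get(field)
    }


# Party A overrides one default; party B relies on the default.
bank_a = {"notional": "10000000", "currency": "EUR", "dayCountFraction": "ACT/365"}
bank_b = {"notional": "10000000", "currency": "EUR"}

# Only the disagreement is reported for human attention.
print(mismatches(bank_a, bank_b))
```

Because unspecified fields resolve to known defaults before comparison, silence from one party is never ambiguous, and only genuine disagreements reach a human.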

FpML 4.0 (in beta at the time of writing) covers interest rate derivatives (swaps and forward rate agreements), equity derivatives, energy derivatives, foreign exchange (FX) spots, and FX derivatives (forwards, swaps, non-deliverable forwards (NDF), simple options, and option strategies). FpML 4.0 introduces messaging constructs to FpML (request/response and notification), but does not use SOAP. Work is also being done within FpML on how to extend document validation beyond what can be checked using a W3C XML Schema.

FpML 1.0, 2.0, and 3.0 had DTDs, but 4.0 does away with the DTD in favour of W3C XML Schema. Most (but not all) element definitions remain global, but FpML 4.0 uses substitution groups, which some popular Schema "compilers" cannot yet cope with. FpML 4.0 also makes use of type substitution, in particular so that all product Schemas share the same <FpML> root element.

FpML 1.0 introduced the concept of enumeration "schemes", where enumerated vocabularies (like countries and currencies) are not embedded in the Schemas, but can be selected within instance documents using identifying URIs (where no explicit scheme is selected, a default applies). This continues in FpML 4.0, but now some enumerations (those which are unchanging for all practical purposes) have been moved into the Schemas as normal enumerations. This makes application development easier, since there is no need to code for the possibility that every enumeration could be different for every instance document.
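As a rough sketch of what scheme handling means for application code, the following resolves the scheme that governs an enumerated value, falling back to a default when none is selected. The attribute name "scheme", the element names, and the URIs are all assumptions made for illustration; they are not the names FpML actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical default scheme URIs and attribute name, for illustration only.
DEFAULT_SCHEMES = {"currency": "http://example.org/schemes/currency-default"}
SCHEME_ATTRIBUTE = "scheme"


def scheme_for(element: ET.Element) -> str:
    """Return the explicitly selected scheme URI, or the default for this element."""
    explicit = element.get(SCHEME_ATTRIBUTE)
    if explicit is not None:
        return explicit
    return DEFAULT_SCHEMES[element.tag]


doc = ET.fromstring(
    "<trade>"
    "<currency>EUR</currency>"
    '<currency scheme="http://example.org/schemes/currency-inhouse">XEU</currency>'
    "</trade>"
)

# Each value can only be interpreted once its governing scheme is known.
resolved = [(c.text, scheme_for(c)) for c in doc.iter("currency")]
print(resolved)
```

Every enumerated value needs this lookup before it can be validated, which is the coding burden that moving stable enumerations into the Schemas removes.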

Market Data Definition Language (MDDL)

MDDL is a financial information specification produced by the Financial Information Services Division (FISD) of the Software and Information Industry Association (SIIA). MDDL 1.0 was released at the start of November 2001, and supports the publication of snapshots and historical time-series of equity prices, financial indices, and mutual fund data. MDDL 2.0 added support for setup information (reference data) for bonds. Reference data has become a hot topic in the financial world, as inconsistent reference data is now seen as a key impediment to straight through processing (STP). MDDL wants to be the reference data format of choice, and will be augmenting the 1.0 definitions and adding setup information to the pricing information. Corporate actions are also on the roadmap. The London Stock Exchange is using MDDL for its new reference data service which will go live in 2004.

MDDL has features which can be used to help reduce its bandwidth requirements. Commonly used fragments can be written once in the <references> section at the start of the document, and then referenced wherever required in the remainder of the document. Most MDDL elements can have property elements as children, but property values are also inherited from ancestors, which helps reduce the need to inline the same values multiple times. Another MDDL feature is an optional <other> element which provides extension points throughout MDDL documents, so that there is plenty of scope for augmentation of the standard MDDL information.
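To make the inheritance idea concrete, here is a minimal sketch of how a reader might resolve a property that is either stated on an element or inherited from an ancestor. The element names are hypothetical and do not reflect the real MDDL vocabulary, and the lookup rule (nearest ancestor wins) is an assumption about how such inheritance would typically work.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def inherited_property(root: ET.Element, target: ET.Element, name: str) -> Optional[str]:
    """Look for a child property on target, then on each ancestor in turn."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        prop = node.find(name)
        if prop is not None:
            return prop.text
        node = parents.get(node)
    return None


# Hypothetical document: the currency is stated once at the top and
# overridden only where it differs.
snapshot = ET.fromstring(
    "<snapshot>"
    "<currency>GBP</currency>"
    "<instrument><name>ABC</name><last>101.5</last></instrument>"
    "<instrument><name>XYZ</name><currency>USD</currency><last>20.1</last></instrument>"
    "</snapshot>"
)
first, second = snapshot.findall("instrument")
print(inherited_property(snapshot, first, "currency"))
print(inherited_property(snapshot, second, "currency"))
```

The saving in document size is paid for by the reader, which must walk the ancestor chain for every property it needs.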

MDDL's special features, particularly inheritance, are simple in concept, but tedious to implement using W3C XML Schema. For version 1.0, a base Schema without special features was edited by hand, and an XSLT script was used to process the Schema and add the special features. The major difficulty with this approach was that the same construct (as viewed in an instance document) can be represented many ways in W3C XML Schema, so if there are multiple Schema authors, multiple constructs are used. This makes it difficult to write the XSLT script to augment the Schema, since so many alternatives need to be supported.

So, since version 2.0, the MDDL data model has been edited using an XML document with a restrictive Schema that allows only the minimum number of constructs necessary to produce MDDL. This reduces the choices available to MDDL editors, and greatly improves the quality control. This data model in XML is then processed using XSLT to produce the Schema. The data model is also used to generate visual models for MDDL, and to drive code generators for MDDL applications.

eXtensible Business Reporting Language (XBRL)

XBRL is a financial specification which initially focussed on company filings and reports. On an international level, the major complexity with company reports at present is that each country has its own accounting standard. In the USA, the US generally accepted accounting principles (GAAP) is used, while in the UK it is the UK GAAP, Australia has an Australian GAAP, etc. Each accounting standard requires a different XBRL "taxonomy". What differs in each case is the list of defined accounting items and the rules on how lower level items are added/subtracted/multiplied to give higher level items.

XBRL allows companies to add their own items by extending their local XBRL taxonomy. This allows those items which are important to a company's understanding of its own business to be directly related to the standard accounting terms required in its annual filing and report.

Since version 2.0, XBRL has had a very flat format that does not make use of the normal hierarchical structure of XML. Instead, structure is imposed on the information using XLink linkbases. This allows XBRL to support multiple views of the same underlying data, each view corresponding to a different linkbase. These views include the rules on how to add/subtract/multiply line items in order to produce the subtotals. So an XBRL "document" is really a combination of both the underlying data file and the associated XLink linkbase files. XBRL taxonomies, which define the allowed data elements in an XBRL data file, are W3C XML Schemas.
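The separation of flat data from externally imposed structure can be sketched as follows. This is an illustration of the idea only: the item names are hypothetical, and the calculation view is modelled as a plain mapping of weighted parent/child relationships rather than as real XLink linkbase markup.

```python
# Flat reported facts with no hierarchy (hypothetical item names).
facts = {"Sales": 900.0, "OtherIncome": 100.0, "CostOfSales": 600.0}

# One "view" over the facts: each higher level item is defined as a
# weighted sum of lower level items (hypothetical rules). A weight of
# -1.0 means the child is subtracted.
calculation_view = {
    "Revenue": [("Sales", 1.0), ("OtherIncome", 1.0)],
    "GrossProfit": [("Revenue", 1.0), ("CostOfSales", -1.0)],
}


def value_of(item: str, facts: dict, view: dict) -> float:
    """Return a reported fact, or derive the item from the calculation view."""
    if item in facts:
        return facts[item]
    # Raises KeyError if the item is neither reported nor defined in the view.
    return sum(weight * value_of(child, facts, view) for child, weight in view[item])


print(value_of("GrossProfit", facts, calculation_view))
```

A different view over the same facts is simply another mapping, which is how a flat data file can support several presentations and calculation rule sets at once.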

Treasury Workstation Integration Standards Team (TWIST)

TWIST is a financial specification that specifically targets STP. Initially focussing on FX, TWIST covers the lifecycle of a trade from identification of trading partners through to final settlement of a transaction. The component of TWIST covering trade details overlaps with the FX component of FpML, and the good news is that TWIST and FpML work together so that their XML vocabularies are aligned for the area where they overlap. TWIST is also working on commercial payments (one company pays another), an area that is simple in theory, but has numerous potential complications in practice. In general, TWIST is driven by the requirements of large corporate enterprises in dealing with their banks.

FIXML

Financial Information eXchange (FIX) is a non-XML financial transaction protocol in the pre-trade/trade area which aims to be vendor-neutral. The FIX consortium is composed of a group of banking and financial institutions who view themselves as clients rather than vendors. FIXML was announced as the XML-isation of the existing FIX protocol (binary messages). However, FIXML has not taken much market share from the binary FIX protocol, as it offers no benefits to offset the cost of migration. FIX has been working with ISO 15022 to prepare a pre-trade/trade business model for the ISO 15022 version 2 repository, and they are the only group likely to be ready to add content to the repository when it becomes available in mid-2004. Although ISO 15022 is a possible future migration path for FIX, the FIX organisation continues to develop both binary FIX and FIXML, looking to extend the format to cover settlement as well. So FIX is hedging its bets on what the future will be, although the problem with this approach is that FIX does not have the resources required to advance its binary FIX, FIXML, and ISO 15022 efforts simultaneously, so work on one comes at the expense of the others.

Research Information Exchange Markup Language (RIXML)

RIXML is an investment research specification which focusses on metadata rather than on the way that research reports are structured. The intention of RIXML is not to provide a way of writing investment research content using XML, but to provide a standard attachment that can be used with any media type to indicate the nature of the content. One of the key uses of XML in financial specifications is to allow metadata to be associated with content so that the best possible filtering and ranking of the available information can be done. RIXML is unusual in completely externalising the metadata from the content, but it effectively delivers its users' major value-add (filtering) without requiring any change to the content formats they use, so it does not interrupt existing workflows.

The curious thing is that to date, none of these XML standards has the level of off-the-shelf application support required to automatically make the value case for its use. Why is that? To some extent, it is historical. Finance enterprises are using XML standards internally, but still building their own systems for doing so. Finance has a history (from the 1980s and 1990s) of creating its own IT infrastructure — some argue that banks are just large IT firms that also have branches and automatic tellers — in spite of the cost. That came about because margins were good, and so the cost could be sustained. It is hard to say whether the cost was ever justified, but as banking and finance are service industries, it can be difficult to quantify a finance organisation's differentiators, its unique selling points. This has led to a culture of secrecy, where many finance organisations feel they can justify building their own IT systems on the grounds that it helps stop their competitors from copying the service(s) they provide.

However, the financial markets entered a downturn in mid-2000, and this was compounded by the tragedy of the World Trade Center in 2001. The consequent reduction in profits has forced many companies to rethink their IT approach, and look for ways of reducing recurrent costs. What is happening so far is not a complete change of philosophy, but a reduction in the scale of internal IT development.

The timing of the downturn was unfortunate, because 2000 was also the year when the finance community really started to take an interest in XML, and numerous groups started work on XML specifications to cover particular specialities within finance. So by the time the specifications were ready for implementation, the money available to develop applications had dried up. Companies were busy planning survival strategies, and application development with new XML data formats was out of the question.

The situation has improved in 2003, but it is still obvious that major players in the finance area are not all jumping to support XML standards in the products they supply to customers. The reasons for this are of a business/cultural nature, rather than of a technical nature. The biggest player (the incumbent) in any market typically has nothing to gain from standards (applied to their product delivery), and a lot to lose. The number two player also may not have a lot to gain, particularly if the other players are much smaller. So it is usually number three or four players who will be the first to implement a standard in a product offering, on the basis that what they lose to other players of the same size (or smaller) should be less than what they gain from the larger players. The larger players move only after their larger customers move (or threaten to move) to their smaller but standards-based competitors.

It is important to realise that big players in a market won't always be forced to apply standards to their products. If they can supply sufficient valuable functionality and reliability, at a suitable cost point, that can be enough for their clients to decide that alternative vendors are not an immediate priority, and hence that standards are also not an immediate priority.

The upshot is that neither off-the-shelf applications, nor financial product offerings, are driving the use of XML in finance at present. If these are not drivers, what is? A large part of the (perceived) value of a finance organisation is its knowledge of the financial markets. Such knowledge is a differentiator, because there is no strict agreement on how financial markets behave, and hence no agreement on what the best investment strategies are. Finance organisations maintain large data models which encode their particular view of the markets and how they work: what the data is, what the relationships between data items are, and what the processes are which underlie the lifecycles of financial transactions. These models are expensive to maintain, perhaps too expensive given the downturn and lower profit margins in the finance industry. Further, there are reasonable grounds to believe that the finance industry is not seeing a hiccup, but a sea change. Margins are unlikely to creep back up to where they once were, so companies instead have to find new strategies to allow them to live within their means, and still show decent profitability.

It is in the creation and maintenance of these data models, and the applications which implement them, that XML standards are attractive. Although companies have distinct proprietary aspects to their data models, they are modelling the same financial markets as their competitors are, so the majority of items in the competing models are more-or-less the same. Public standards, to which a company and/or its competitors have contributed, will tend to contain these majority common items, as those are items that the participants will have been able to agree upon. So the definitions contained within a public XML specification can be a good starting point onto which to build a proprietary enterprise model.

Do companies need proprietary models, in this age of standards? Yes. Companies have certain data that they need to store and/or model, and for large enterprises, it is unlikely that any set of standards will adequately cover the full range and depth of data. Companies cannot afford to limit their ambitions just because of the lack of suitable external data standards. So the need for proprietary models will always be with us.

5. Reusing Standards in Proprietary XML Formats

What can you take from an external XML standard? Firstly, you can take the vocabulary (element/attribute names and associated datatypes). Just having a set of names is an important starting point, because naming is still the most difficult problem in IT, and it consumes an enormous amount of (expensive) meeting time. Starting with a large, consistent, and agreed-upon set of names gives an enormous head start compared to starting from scratch. Datatypes are also very important, because transforming element/attribute names is easy, but the same is not true for the values between the start and end tags, so having the right datatypes is also a big advantage. From there, you can take some or all of the structures as well. The closer the structures are to the needs of your company's data model, the more you save. Knowing your requirements for vocabulary and structures beforehand allows you to then review the available standards and find the content that you need to seed your data model.

Financial XML standards tend to focus on a particular niche. If they tried to cover the whole of finance in one hit, they would never produce anything within a reasonable amount of time. This means that no one standard is likely to cover your enterprise needs. So what are the strategies you can use?

  • You can take the one standard that most closely matches your enterprise's needs, and then modify/extend it as you require.

  • You can take a number of standards that collectively cover a large portion of your enterprise's needs. Then, as well as modification and extension, you need to consider integration of the different standards.

Integration of XML formats is easier than integration of textual and/or binary formats, because of the commonality that XML imposes. However, that doesn't make integration trivial, just easier. In particular, XML formats that use different schema languages (most likely DTD & W3C XML Schema) can only be integrated if you convert to a common schema language and then integrate. Even when using the same schema language, the formats can vary greatly in the way they use the schema language, as demonstrated by the financial MLs mentioned earlier (Section 4, “Financial XML Standards”).

As an aside, arguably ISO's Document Schema Definition Languages (DSDL) would allow integration without merging of the schema languages, but it doesn't yet have a prime-time slot in enterprise production environments. Maybe it will one day, if Apache implements it in Xerces at some point, and if developer tools then support it as well. That said, context-sensitive editors and forms generators for multi-schema DSDL documents have not yet appeared on the market, and may never, and that may be reason enough not to take the DSDL approach.

6. Integrating XML Standards

If you do your integration of financial schemas by hand, then when the next version of one of the schemas is released, you could have a large maintenance and re-integration job, assuming the updated version is worth moving to. The sheer cost might lead you to avoid updating, even if there are some worthwhile improvements. For schema conversion, if you use a tool to do the work (e.g. a DTD to W3C XML Schema converter), that helps a lot. You still have the re-integration effort, but at least the schema conversion task is reduced to a zero-cost push-button item.

As a practical example, in a project where the FpML 3.0 DTD was used to provide the starting point for a proprietary W3C XML Schema, Tibco's TurboXML was used to do the DTD to Schema conversion. It did a good job, converting the DTD into a consistently structured Schema. However, in the FpML DTD, datatypes were included as attribute values of the elements, and hence were converted into attributes in the Schema. Luckily, because the generated Schema was consistent in structure, it was a short exercise to write an XSLT script that, when applied to the generated Schema, set each datatype correctly from the value of the datatype attribute, and then deleted the datatype attribute. These are the kinds of tasks you need to do when integrating XML standards. XSLT is a very useful tool for manipulating W3C XML Schemas (particularly if their structure is consistent). DTDs are harder to manipulate, unless you are skilled with Perl. For finance, W3C XML Schema is the normal choice of schema language, because of the datatypes it provides.
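The kind of clean-up described above can be sketched in a few lines. The project itself used XSLT; this version uses Python's standard ElementTree library instead, purely for illustration. The shape of the generated Schema is an assumption: it supposes that each datatype arrived as a fixed-value attribute declaration named "datatype" inside an extension of xs:string, which may not match what the converter actually emitted.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
NS = {"xs": XS}
ET.register_namespace("xs", XS)

# Assumed shape of the converter output (hypothetical): the datatype is
# carried as a fixed-value attribute rather than as the element's type.
GENERATED = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="amount">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="datatype" fixed="decimal"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
""".strip()


def apply_datatypes(schema_root: ET.Element) -> ET.Element:
    """Set each extension base from its 'datatype' attribute, then drop the attribute."""
    # Take a list first so that removing children does not disturb iteration.
    for ext in list(schema_root.iter(f"{{{XS}}}extension")):
        for attr in ext.findall("xs:attribute", NS):
            if attr.get("name") == "datatype" and attr.get("fixed"):
                ext.set("base", "xs:" + attr.get("fixed"))
                ext.remove(attr)
    return schema_root


schema = apply_datatypes(ET.fromstring(GENERATED))
print(ET.tostring(schema, encoding="unicode"))
```

A script this short is only possible because the generated Schema is consistent: every datatype appears in the same place, so one rule covers every element.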

The thing about integrating two or more formats is that you can do it, but with each new version that you want to upgrade to, you have to redo the integration. Although you can do it by hand once, and it may seem worth the effort, you might not be so happy to repeat that same effort every time an improved version of either/any format is released. So, once you have a grip on how you want the integration to work, and what you want the result to look like (this usually implies starting at least some of the integration by hand), there is benefit in automating the integration process. You won't see that benefit the first time you integrate (you may only see an extra cost), but the effort can pay for itself from the first time you have an update to any of the component standards.

Version management is easy to ignore the first time you do an integration, and it is an item that project managers are all too likely to cut out of the project plan (to reduce the inevitable overrun). It gets cut because it doesn't affect the delivery of the first integration deliverables. Unfortunately, it isn't necessarily possible to include proper version management for later deliverables, because the project manager then would have to explain why something wasn't done about it during the first delivery cycle. This can be a vicious circle. There isn't a simple or easy solution, either. You just have to be aware that this issue exists, and ignoring it during your first integration may not save you money in the long term. Still, modern businesses often don't aim to save money in the long term, as that is too speculative, and those gains might never be realised if the company goes out of business first. So businesses save money in the short or medium term of the current delivery cycle. This is a valid business choice, but be aware of the limitations of delivering the first cycle as cheaply as possible.

Even once you settle on a single schema language, there will be integration issues based on the differing "styles" of the schemas. Every XML format seems to have its own style, and there always seems to be a good reason why each new format must have its own style.

When integrating multiple XML standards/specifications, one option is to integrate the formats without integrating the styles. This requires the least effort, but can make the final format hard to use, because users (particularly developers writing generators or readers for the format) will constantly have to refer to documentation in order to know what the convention is for each different part of the integrated format. So, this approach saves money in early development, but at a continuing cost to users (e.g. application developers) thereafter.

If one format covers a significantly larger proportion of your needs than any of the others, it can be useful to copy the style of that majority part for the whole integrated format. Integration then involves reworking the minority formats into the style of the majority format. This is more work in integration, but the pay-off comes for users, who get a consistent and predictable format to work with.

For example, there is no agreed standard on how to construct XML element and attribute names. This is at least in part because the W3C itself has never been able to agree on such naming rules. Some specifications have "element-names-delimited-by-hyphens", some have "elementNamesInLowerCamelCase", some have "ElementNamesInUpperCamelCase", etc. Although this sounds like a small problem, naming remains the hardest problem in IT, and integrating specifications whose element/attribute names have different case conventions can be very tedious. If you don't align the cases to a single style, the integrated specification is difficult to use, and doesn't project any sense of quality. Writing algorithmic rules to modify case conventions is often difficult or impossible, as picking the word boundaries 100% correctly can be unreliable. The best you can do is to set up a manually maintained translation table (e.g. in XML), either for all names or as an override for names that are not correctly converted using an automatic algorithm, and then drive your element/attribute renaming code from that table.
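A minimal sketch of the table-driven approach follows, assuming the target style is lower camel case. The override entries are hypothetical names chosen to show cases where the automatic rule picks the wrong word boundaries; in practice the table might be kept in an XML file rather than in code.

```python
import re

# Manually maintained overrides for names that the automatic rule gets
# wrong (hypothetical entries, for illustration only).
OVERRIDES = {
    "FXRate": "fxRate",       # automatic rule would give "fXRate"
    "ISIN-code": "isinCode",  # automatic rule would give "iSINCode"
}


def to_lower_camel(name: str) -> str:
    """Rename to lowerCamelCase, consulting the override table first."""
    if name in OVERRIDES:
        return OVERRIDES[name]
    # Split on the delimiters used by hyphenated or dotted naming styles.
    words = [w for w in re.split(r"[-_.]", name) if w]
    first, rest = words[0], words[1:]
    return first[:1].lower() + first[1:] + "".join(w[:1].upper() + w[1:] for w in rest)


for original in ["element-names-delimited-by-hyphens", "ElementNamesInUpperCamelCase", "FXRate"]:
    print(original, "->", to_lower_camel(original))
```

The automatic rule handles the bulk of the names, while acronyms and other awkward cases are settled once in the table and then applied identically on every re-integration.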

Another major issue is global vs. local element scope. Some XML specifications make full use of W3C XML Schema's support for locally scoped element names, so that element names can be re-used in different contexts with different content models. Other specifications continue to use the DTD-style approach of having globally unique element names. If you have to integrate a locally-named Schema so that the result is globally-named, then you again will need to have a renaming table.

Another major stylistic choice in schemas is whether to use element content or attribute content. FpML and MDDL both have a policy of using element content almost exclusively, with attribute content reserved for attributes that are globally applicable to elements. An alternative design policy is to use element content for content that can be sensibly displayed to (at least some) humans, and attribute content for content that is only of interest for automatic processing. However, the "no or little attribute content" rule seems to be winning for data-oriented financial XML, and it is not a difficult XSLT programming task to convert attributes in an XML Schema into equivalent child elements.
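The paper's own tooling for this was XSLT. Purely as an illustration of the transformation, here is a minimal Python sketch using the standard library's ElementTree. The function name is hypothetical, and the sketch makes several simplifying assumptions: it only touches complex types whose content model is a direct xs:sequence, it skips attribute references, and it copies `type` values verbatim, so the namespace prefixes used in those values must still be declared when the result is written out.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def attributes_to_elements(schema):
    """Move each named xs:attribute into its type's xs:sequence as an xs:element."""
    for ctype in list(schema.iter(f"{{{XS}}}complexType")):
        sequence = ctype.find(f"{{{XS}}}sequence")
        if sequence is None:
            continue  # simple content, xs:all etc. are left untouched here
        for attribute in ctype.findall(f"{{{XS}}}attribute"):
            name = attribute.get("name")
            if name is None:
                continue  # attribute references are not handled in this sketch
            element = ET.SubElement(sequence, f"{{{XS}}}element", {"name": name})
            if attribute.get("type"):
                element.set("type", attribute.get("type"))
            if attribute.get("use") != "required":
                element.set("minOccurs", "0")  # optional attribute, optional element
            ctype.remove(attribute)
    return schema
```

Instance documents written against the original Schema would of course need the matching conversion.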

You also may not require certain features for your enterprise format. For example, you may only want data dictionary definitions from an XBRL taxonomy Schema, and not need any of the XLink information. You may want to re-use market data definitions from MDDL, but not need the shorthands like MDDL inheritance which can reduce the size of instance documents, but at the expense of complicating the Schema. In that case, the best thing is to come up with a process (manual or automatic) for stripping out the features you do not want before you perform your integration. For an enterprise format, there is little point in including features that have no enterprise buy-in. The designers of XML standards are in a different position, because they have to think about the requirements of all user firms. An enterprise format does not need to be inclusive, and unused features can be removed.

Some financial W3C XML Schemas are now making use of substitution groups, as an alternative to DTD-style choice groups. The biggest practical problem with substitution groups is that many application developers have made themselves dependent on Schema "compilers", particularly Sun's JAXB, and substitution groups are one of the W3C XML Schema features that are not supported by JAXB. There is a historical reason for this. Originally, Java provided a binary serialisation format for JavaBeans. This was intended for short term serialisation, e.g. for network transmission of objects. However, people started using it for long term serialisation as well, storing their serialised JavaBeans as binary large objects (BLOBs) in relational databases. The problem was that there was no provision for version management, so if you updated your JavaBean class, you would break serialised instances of that class. The problem was so bad that one of Sun's senior JavaBean developers once apologised to me for Java serialisation. He also apologised for applets, but that is another story.

Sun's Swing team flirted with using XML for serialisation, but some of the Java folks in Sun realised that there would always be problems in serialising complex object graphs no matter what format you choose. Instead, they realised that if you reverse the process, and create Java classes from one or more XML schemas, you end up with a set of classes which are inherently serialisable. The upshot is that JAXB compiles XML schemas, but only enough to support Sun's object serialisation requirements. Some W3C XML Schema features are not supported, and may never be supported. This has become a support headache for anyone who dares to produce an XML Schema that JAXB cannot compile (even if it was created before JAXB existed), as developers complain about the lack of JAXB compatibility, without understanding what the broader issues are, nor why their lowest-common-denominator tool is now causing as many problems as it solves.

The practical implication of this is that it is now common for enterprises to replace substitution groups with choice groups when integrating external W3C XML Schemas into their proprietary Schemas, to make it possible to use JAXB and similar tools. It is not an insignificant change to make, nor to keep updated, but it is a task that can be automated, and taking this particular pain during schema integration will make life easier for some application developers.
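As a rough illustration of how such a replacement can be automated, the following Python sketch rewrites each reference to a substitution group head as an xs:choice over the head and its members. It is a sketch under stated assumptions, not a complete tool: the function names are hypothetical, all elements are assumed to be global declarations in one Schema document that share one namespace prefix, and cases such as blocked substitutions or references inside xs:all are ignored.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ELEMENT = f"{{{XS}}}element"


def local(qname):
    """Strip any namespace prefix from a QName such as 'fx:product'."""
    return qname.split(":")[-1]


def substitution_to_choice(schema):
    """Replace references to substitution group heads with xs:choice groups."""
    members, abstract = {}, set()
    for declaration in schema.findall(ELEMENT):
        head = declaration.get("substitutionGroup")
        if head:
            members.setdefault(local(head), []).append(declaration.get("name"))
        if declaration.get("abstract") == "true":
            abstract.add(declaration.get("name"))

    def substitutes(head, seen):
        """All direct and indirect members of a head, without repeats."""
        found = []
        for name in members.get(head, []):
            if name not in seen:
                seen.add(name)
                found.append(name)
                found.extend(substitutes(name, seen))
        return found

    for parent in list(schema.iter()):
        for index, child in enumerate(list(parent)):
            ref = child.get("ref") if child.tag == ELEMENT else None
            if not ref or local(ref) not in members:
                continue
            head = local(ref)
            prefix = ref[: len(ref) - len(head)]  # re-use the original prefix
            choice = ET.Element(f"{{{XS}}}choice")
            for occurs in ("minOccurs", "maxOccurs"):
                if child.get(occurs) is not None:
                    choice.set(occurs, child.get(occurs))
            # An abstract head cannot appear in instances, so it is left out.
            names = ([] if head in abstract else [head]) + substitutes(head, {head})
            for name in names:
                ET.SubElement(choice, ELEMENT, {"ref": prefix + name})
            choice.tail = child.tail
            parent[index] = choice  # one-for-one swap keeps the indexes valid

    for declaration in schema.findall(ELEMENT):
        declaration.attrib.pop("substitutionGroup", None)
    return schema
```

Because the members are discovered from the donor Schema each time, re-running the script against a new release picks up any newly added members.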

7. Practical Results

This integration approach was used to create a proprietary messaging format. Approximately half of the content came from FpML 3.0 (and this is currently being upgraded to 4.0). The DTD was converted into W3C XML Schema using Tibco's TurboXML and a custom XSLT post-processing script. A quarter of the content was manually translated from ISO 15022 version 1 into XML Schema. The remaining quarter of the content was proprietary.

For the project, a set of messages was required. Although one approach was to have a single Schema covering all messages, a better approach was to have a separate Schema for each message, so that incorrectly delivered messages are rejected at validation time. This led to the question of how to create the subset Schema corresponding to each message. Cutting and pasting from the main integrated Schema was tried initially, but it required too much manual effort and was too error-prone.

The solution was to once again use XSLT. The process was as follows:

  1. For each message, a simple XML file contained a list of the elements which must be in the message Schema. Often this was just the root element for the message.

  2. An XSLT script was applied to the main Schema, using the list of required elements as a secondary control input, to create an output file listing all the definitions on which the required elements were dependent. This turns out to be fairly easy to do with typical XML Schemas. From any element definition, you follow the dependency trail. The element's complex or simple type (if any) is a dependency to follow, as are any types they are derived from. Any elements referenced within an element or type definition are dependencies to follow, and references to model or attribute groups also have to be followed. This kind of recursion is fairly easy to do in XSLT. The output file may be larger than the original Schema, particularly as some dependencies may be multiply counted, but that is OK. The main thing to be careful of is recursive/circular dependencies, which require a script that maintains a stack of all the elements it has followed, and stops when the dependency trail becomes circular.

  3. The dependency file generated in the previous step was filtered using XSLT to remove redundant multiple listings of dependencies.

  4. A final XSLT was applied to the main Schema, using the dependency file as a secondary control input. Only the required elements and dependent items are copied into the output message Schema.
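The project used XSLT for each of these steps. To show the underlying idea in a compact form, the Python sketch below collapses steps 2 to 4 into a single function. It is an illustration only, with several assumptions: the function names are hypothetical, namespace prefixes are ignored when matching names, only the `type`, `base`, `ref` and `itemType` attributes are followed, and unnamed top-level items such as imports and annotations are dropped. Instead of a stack, it keeps a set of definitions already visited, which also stops circular dependencies from looping.

```python
import copy
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ELEMENT = f"{{{XS}}}element"
TYPE_TAGS = (f"{{{XS}}}complexType", f"{{{XS}}}simpleType")
REFERENCE_ATTRIBUTES = ("type", "base", "ref", "itemType")


def local(qname):
    """Strip any namespace prefix from a QName."""
    return qname.split(":")[-1]


def definition_key(definition):
    """Types share one symbol space; other definitions are keyed by their tag."""
    kind = "type" if definition.tag in TYPE_TAGS else definition.tag
    return kind, definition.get("name")


def reference_key(node, attribute):
    """A ref points at a global definition with the same tag; the rest at types."""
    kind = node.tag if attribute == "ref" else "type"
    return kind, local(node.get(attribute))


def subset_schema(schema, required_elements):
    """Copy the required global elements and everything they depend on."""
    index = {definition_key(d): d for d in schema if d.get("name")}
    seen = set()
    pending = [(ELEMENT, name) for name in required_elements]
    while pending:
        key = pending.pop()
        if key in seen or key not in index:
            continue  # already followed, or a built-in such as xs:string
        seen.add(key)
        for node in index[key].iter():
            for attribute in REFERENCE_ATTRIBUTES:
                if node.get(attribute):
                    pending.append(reference_key(node, attribute))
    subset = ET.Element(schema.tag, schema.attrib)
    for definition in schema:  # keep the original document order
        if definition.get("name") and definition_key(definition) in seen:
            subset.append(copy.deepcopy(definition))
    return subset
```

Calling `subset_schema(main_schema, ["trade"])`, where `main_schema` is the parsed integrated Schema and "trade" is a hypothetical root element name, would return a new schema element containing only the definitions that message needs.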

This approach of using one XSLT to build up a picture of what needs to be done, before using another XSLT script to perform the required edits, works very well in manipulating data-oriented W3C XML Schemas. Trying to do everything in a single XSLT script is rarely helpful, and just leads to complicated and unmaintainable scripts. That may be less true of XSLT 2.0 than of XSLT 1.0, but since Schema manipulation of this kind is done as an off-line process, not a real-time process, there is nothing to be gained from trying to do everything in a single pass.

Although this process is workable, it requires continuing in-house XSLT expertise, as some modifications to the integration process will be required whenever updated schemas introduce new issues. For the project in question, which did not have any permanent XSLT developers, 3rd party data modelling tools are now being evaluated as an alternative for maintaining the proprietary messaging format. Nonetheless, the initial XSLT scripts were invaluable for quickly establishing what could be done, and for understanding what 3rd party tools would need to do to genuinely provide a long term solution.

8. Alternative Models for Proprietary XML Formats

An alternative to generating your enterprise XML format from external XML formats is instead to generate it from a data model. Such models are available commercially for finance, and are interesting because they can be used as the basis for generating database schemas and application code as well as XML schemas. Particularly interesting will be when the ISO 15022 version 2 repository has a useful amount of content, as the UML model will be available as well as the Schemas (although there is expected to be a charge for the UML model). This may make it attractive for enterprises to use ISO 15022 as the basis for a consistent data model that spans the messaging schemas through to the application objects through to the database schemas.

9. Conclusion

Starting with an external standard or two is a good way to kick start development of internal XML data or messaging formats. However, it doesn't come for free. It has to be managed properly, if you want your internal format to be able to benefit from new versions of your donor standards. The difficulties of integration and management, however, are far less than the difficulties of building a winning enterprise data/messaging format from scratch. There is often a compromise between standards and proprietary development that gives the best return for your enterprise development dollar.

Bibliography

[DSDL] DSDL, Document Schema Definition Languages, ISO/IEC 19757, http://dsdl.org/

[ebXML] ebXML, Electronic Business Using Extensible Markup Language, http://www.ebxml.org/

[FISD] FISD, Financial Information Services Division of the SIIA, http://www.fisd.net/

[FIX] FIX, Financial Information Exchange, http://www.fixprotocol.org/

[FpML] FpML, Financial Products Markup Language, http://www.fpml.org/

[ISO15022] ISO 15022, http://www.iso15022.org/

[JAXB] JAXB, Java Architecture for XML Binding, http://java.sun.com/xml/jaxb/

[MDDL] MDDL, Market Data Definition Language, http://www.mddl.org/

[RIXML] RIXML, Research Information Exchange Markup Language, http://www.rixml.org/

[SIIA] SIIA, Software and Information Industry Association, http://www.siia.net/

[SWIFT] SWIFT, Society For Worldwide Interbank Financial Telecommunication, http://www.swift.com/

[TWIST] TWIST, Treasury Workstation Integration Standards Team, http://www.twiststandards.org/

[UML] UML, Unified Modeling Language, http://www.uml.org/

[XBRL] XBRL, Extensible Business Reporting Language, http://www.xbrl.org/

Glossary

BLOBs

binary large objects

DSDL

document schema definition languages

ebXML

electronic business using extensible markup language

FISD

financial information services division

FIX

financial information exchange

FpML

financial products markup language

FX

foreign exchange

GAAP

generally accepted accounting principles

ISO

international organization for standardization

MDDL

market data definition language

RIXML

research information exchange markup language

SIIA

software and information industry association

STP

straight through processing

SWIFT

society for worldwide interbank financial telecommunication

TWIST

treasury workstation integration standards team

UBL

universal business language

UML

unified modeling language

XBRL

extensible business reporting language

Biography

Anthony B. Coates (Tony) specialises in information management and integration solutions for financial and corporate clients. Tony is Chief Architect for London Market Systems, and previously was Leader of XML Architecture and Design in Reuters Chief Technology Office in London. Tony has worked as an architect for a number of financial XML initiatives, including MDDL (Market Data Definition Language, http://www.mddl.org/) and FpML (Financial Products Markup Language). Tony is also an editor of the MDDL specification. Tony's background includes developing software for technical analysis and financial graphics, developing multimedia and Web applications, and theoretical and experimental physics. He has worked with XML since 1998, and Java since 1996. Tony is a past secretary of the Australian Java Users Group.