XML 2003

Validating FpML

Abstract

Complex languages built on XML require support for validation that goes far beyond what traditional schema languages like XML Schema can provide. In this paper we report on our experience in the FpML Validation Working Group, which is specifying standard validation rules for the Financial Products Markup Language (FpML), a cross-industry standard for financial derivatives trades.

The issues raised here are likely to be of interest to any organisation or standards body dealing with data that exhibits complex structure or semantics. We discuss the process the working group followed for gathering validation rules, and the issues it has faced. We then look at the two reference implementation languages, Schematron and CLIX, that we used to express the validation rules formally, and at some of the requirements we identified: validation of documents against external data sources such as relational databases, expressiveness, and support for complex data types.


Table of Contents

1. Introduction
2. Background about FpML
2.1. Motivation for Semantic Validation
3. Defining Validity
3.1. Issues
4. Reference Implementations
5. Summary
Acknowledgements
Bibliography
Biography

1. Introduction

The Financial Products Markup Language [FpML] is an industry standard for storing information about financial derivative instruments. It is being specified by a group of financial institutions, tool vendors and service providers under the auspices of the International Swaps and Derivatives Association [ISDA]. The goal of FpML is to specify a standardised format for the most commonly traded types of derivatives, and thereby to increase automation in trade negotiation, confirmation and related activities, eliminating error-prone and time-consuming manual work.

FpML is a complex language that captures a large amount of rich semantic information about its application domain. Validating an FpML document is not straightforward, and it was realised early on that schema-based validation alone would be insufficient. While XML Schema [XML Schema] has been used successfully to capture the structure of FpML, it lacks the expressiveness, the feature set and the flexibility needed to constrain the semantic domain. Therefore, while a schema-valid FpML document is structurally correct, it is not necessarily meaningful. Figure 1 below captures this notion graphically:


Figure 1. Levels of validity

ISDA set up the Validation Working Group to address this problem, specifically:

  1. To capture semantic validation rules about the different types of financial instruments, expressed in English.

  2. To express the validation constraints in a suitable formal constraint language, to eliminate ambiguities in the language, surface mistakes in reasoning, and enable later automation of validation.

The following is an account of the experiences of the Validation Working Group. It gives an overview of the validation rule gathering process, supplemented with a number of real examples. It then takes a look at the major process issues and pitfalls - such as constrained access to business expertise - that we encountered. From a technical point of view, it discusses the requirements placed on the constraint languages and gives examples that compare and contrast the current reference constraint languages, CLIX and Schematron.

2. Background about FpML

The Financial Products Markup Language [FpML] is a markup language, expressed in XML, for financial derivatives transactions. The goal of FpML is to eliminate time-consuming and error-prone manual processes by providing a standardised vocabulary for capturing trade information and enabling communication between systems. FpML is used both in vertical message flows inside organisations, for example to relay information from front office trade capture systems, where deals are entered, to middle office for risk management, and in horizontal flows between institutions. Horizontal flows, which are used in bilateral "over-the-counter" (OTC) trading, include trade negotiation and, once a deal has been recorded, matching and confirmation. More information about these activities can be found on the FpML web site.

Derivative trades are complex financial contracts. The most widely known example of a derivative is probably an option, which gives its buyer the right, but not the obligation, to buy (a call option) or to sell (a put option) an underlying asset such as equity at a given price. Derivatives are thus higher-order contracts whose value is derived from that of other financial products.

FpML has been growing rapidly since its inception to cover an ever-expanding set of derivative products: the first version supported mainly interest rate derivatives, version 2.0 added many different types of options and version 4.0 has a greatly expanded product set that covers FX, equity and credit derivatives. Version 4.0 also adds a set of header elements to assist messaging transport and workflow. The following statistics about the latest version are a good indicator of the schema complexity:

  • There is one main schema and 9 included schemas

  • The schemas hold 360 complex type definitions

  • There are 967 local and global element definitions

The fragment below is a sample portion of an interest rate swap taken from the FpML specification; several elements and attributes have been removed for clarity:

<FpML version="4-0" xsi:type="DataDocument" ...>
  <trade>
    <swap>
      <swapStream>
        <payerPartyReference href="CHASE" />
        <receiverPartyReference href="BARCLAYS" />
        <calculationPeriodDates id="floatingCalcPeriodDates">
          <effectiveDate>
            <unadjustedDate>1994-12-14</unadjustedDate>
          </effectiveDate>
          <terminationDate>
            <unadjustedDate>1999-12-14</unadjustedDate>
          </terminationDate>
        </calculationPeriodDates>
        ..
      </swapStream>
    </swap>
  </trade>
</FpML>

The block of XML shows some of the information that is captured by the schema: the root element identifies this document as a "data document", as opposed to a message, by using xsi:type to override the element's declared type with the required data type. The remaining part of the fragment shows some of the economic data included with the trade: it identifies the payer and receiver parties, and specifies the start date ("effective date") and end date ("termination date") of the trade.

More examples are published with the latest FpML recommendation and the working drafts, which can be found at the FpML web site.

2.1. Motivation for Semantic Validation

The need to introduce additional semantic validation for FpML arises for a number of reasons, which we will discuss in this section:

  1. The need to further constrain documents where XML Schema's expressive power is insufficient.

  2. Tailoring of the schema, which includes a lot of optionality, for particular business purposes.

  3. A desire to capture the business meaning of elements, and especially the relationship between elements. This has two motivations:

    • To facilitate implementation and eliminate scope for ambiguity.

    • To standardise the meaning for bilateral communication between counterparties.

  4. Validation against reference data or other external data sources.

We will look at a few examples of how these issues surface in practice. The first example below shows a schema-valid fragment of an FpML document that violates a basic constraint: the effective date, being the start date of a trade, must not be later than the termination date. Since this constraint relates the values of two different elements, it cannot be expressed in XML Schema, which offers no mechanism for such co-occurrence constraints.

<calculationPeriodDates id="floatingCalcPeriodDates">
  <effectiveDate>
    <unadjustedDate>2000-12-14</unadjustedDate>
  </effectiveDate>
  <terminationDate>
    <unadjustedDate>1999-12-14</unadjustedDate>
  </terminationDate>
</calculationPeriodDates>
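To make the missing constraint concrete, here is a minimal Python sketch of the check that a separate validation step would have to perform on this fragment. It assumes un-namespaced element names (real FpML 4.0 documents declare a namespace, which would have to be added to the paths), and the function name `check_date_order` is hypothetical. It uses the strict comparison that the working group eventually settled on, as discussed below.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The schema-valid but semantically invalid fragment from the example above.
FRAGMENT = """
<calculationPeriodDates id="floatingCalcPeriodDates">
  <effectiveDate>
    <unadjustedDate>2000-12-14</unadjustedDate>
  </effectiveDate>
  <terminationDate>
    <unadjustedDate>1999-12-14</unadjustedDate>
  </terminationDate>
</calculationPeriodDates>
"""

def check_date_order(cpd):
    """Return an error message if the termination date is not strictly
    after the effective date, otherwise None (un-namespaced elements assumed)."""
    effective = date.fromisoformat(cpd.findtext("effectiveDate/unadjustedDate"))
    termination = date.fromisoformat(cpd.findtext("terminationDate/unadjustedDate"))
    if termination <= effective:
        return (f"{cpd.get('id')}: termination date {termination} "
                f"must be after effective date {effective}")
    return None

print(check_date_order(ET.fromstring(FRAGMENT.strip())))
```

A schema validator accepts this fragment; the sketch rejects it, as well as any fragment in which the two dates coincide.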

Tailoring of the schema becomes necessary because of the large amount of optionality in FpML. This optionality is necessary to enable the reuse of elements in different types of products, where the semantics of the elements may be slightly different depending on the context. Optionality is also necessary because different types of information need to be captured at different stages of the document life cycle. For example, the initial trade negotiation stages may populate fewer elements than later stages.

Elimination of ambiguity in the element definitions is best explained using the previous example about start and end dates: the Validation Working Group debated for a number of weeks whether the termination date of a trade should be allowed to fall on the effective date. The consensus reached in the end was that it should not. This decision was captured as a validation rule for the calculationPeriodDates element, which contains the two date elements; the rule states explicitly that the termination date must be later than the effective date, rather than later than or equal to it. When in doubt, implementers can thus refer to the validation rules to disambiguate the meaning of these elements.

<businessCenters id="primaryBusinessCenters">
  <businessCenter>DEFR</businessCenter>
</businessCenters>

Finally, when FpML is put into operation, the need to validate against reference data arises. The fragment of FpML above contains a businessCenter element that identifies Frankfurt in Germany as one of the business centers for the trade. The schema does not specify a legal set of values for this element; it allows unconstrained strings. This is because it is not for the FpML standard to determine which business centers counterparties are prepared to accept for a deal. Instead, business center tables are often stored and maintained internally by firms, in files or reference databases. Documents thus have to be validated against such external data sources to ensure correctness. Since schema languages have no mechanism for such access, a separate validation step is needed.
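A separate validation step of this kind might look like the following minimal sketch. It assumes, hypothetically, that a firm keeps its accepted business center codes in a CSV file; the file contents are inlined as a string, and the table entries and helper names are illustrative only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical firm-maintained reference file of accepted business centers.
REFERENCE_CSV = """code,description
GBLO,London
USNY,New York
"""

FRAGMENT = """
<businessCenters id="primaryBusinessCenters">
  <businessCenter>DEFR</businessCenter>
</businessCenters>
"""

def load_business_centers(text):
    """Read the set of accepted business center codes from CSV text."""
    return {row["code"] for row in csv.DictReader(io.StringIO(text))}

def unknown_business_centers(doc, accepted):
    """Return the businessCenter values in doc that are not in the accepted set."""
    codes = ((bc.text or "").strip() for bc in doc.iter("businessCenter"))
    return [code for code in codes if code not in accepted]

accepted = load_business_centers(REFERENCE_CSV)
print(unknown_business_centers(ET.fromstring(FRAGMENT.strip()), accepted))
```

Because this particular table has no entry for Frankfurt, DEFR is reported; the same document could pass against another firm's table, which is exactly why the value set cannot live in the schema.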

3. Defining Validity

The Validation Working Group was assigned the task of investigating and addressing the semantic validation problem in a systematic manner. Our approach encompassed two steps:

  • Rule gathering to come up with a list of constraints.

  • Formalisation into a suitable constraint language to enable execution.

In the context of a standardisation process, there is of course a desire to provide quality guarantees and to anticipate likely pitfalls. It was clear from the charter of the working group that two issues had to be addressed from the outset: the need to explain the completeness of the rules (what it means for a rule to be present in the rule set, and what it means for a rule to be absent), and the need to avoid errors and ambiguities in translation from natural language.

It is not possible in general to provide formal guarantees about the completeness of validation rules, just as it is impossible to guarantee the completeness of a requirements specification. Rule gathering involves interaction with human domain experts, so completeness is a process and control problem rather than a formal issue. Since FpML provided no semantic validation at all before this work, it was felt that any initial set of rules provided by the working group would already be an improvement. Instead of guaranteeing completeness, we defined a simple and transparent process whose quality is open to inspection and scrutiny:


Figure 2. Validation Working Group Process

Work starts with feedback from previous releases. This may include natural language rules that are suggested by FpML users. The set of rules is then expanded through a schema analysis: experts on products like interest rate derivatives work together with a delegate from the working group to perform a structure walkthrough of every element in a schema, similar to a code review. In this process, we try to identify constraints by looking for the most common constraint sources:

  • Dates - a large number of rules can naturally be expected to concern the dates in a trade document. Almost two-thirds of the current set of rules deal with dates.

  • Cross-references - FpML uses an id/href style cross-referencing system. The allowed values in references frequently have to be further constrained to particular portions of the document.

  • Value constraints - frequently the values of elements have to be further constrained depending on where they appear in the document, a consequence of the element reuse philosophy of FpML.
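As an illustration of the cross-reference category, the sketch below checks that every party reference's href points at an element with a matching id, and that this element is actually a party: a reference can be resolvable yet point at the wrong kind of element. It assumes un-namespaced names and a simplified document with top-level `party` elements; `check_party_references` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified document: the receiver reference is resolvable but points
# at the wrong kind of element.
DOC = """
<FpML>
  <trade>
    <swap>
      <swapStream>
        <payerPartyReference href="CHASE"/>
        <receiverPartyReference href="floatingCalcPeriodDates"/>
        <calculationPeriodDates id="floatingCalcPeriodDates"/>
      </swapStream>
    </swap>
  </trade>
  <party id="CHASE"/>
  <party id="BARCLAYS"/>
</FpML>
"""

def check_party_references(root):
    """Check that every *PartyReference href names the id of a party element."""
    # Index every element carrying an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    errors = []
    for ref in root.iter():
        if not ref.tag.endswith("PartyReference"):
            continue
        href = ref.get("href")
        target = by_id.get(href)
        if target is None:
            errors.append(f"{ref.tag}: no element has id {href!r}")
        elif target.tag != "party":
            errors.append(f"{ref.tag}: {href!r} is a {target.tag}, not a party")
    return errors

for error in check_party_references(ET.fromstring(DOC.strip())):
    print(error)
```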

The rules identified in the schema analysis are captured in English, but already include some structure. Element references, for example, are expressed as XPath fragments, as in rule ird-54: [The dates in step/stepDate must be unadjusted calculation period dates in ../../calculationPeriodDates]

The next stage uses the rules to construct valid and invalid test cases. FpML already publishes a number of sample documents with every specification. These are used as input and modified to violate a number of boundary cases for each rule. This process has been very useful in identifying rules that were either ambiguous or, on second consideration, not valid. At the end of this stage, we had a number of files that were guaranteed to be valid or invalid with respect to one particular rule, though no guarantees were made about interference with other rules.
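Parts of this mutation step lend themselves to automation. The sketch below is a hypothetical helper, not the working group's tooling (its cases were edited by hand): it takes a valid sample, moves the termination date to boundary values around the effective date, and pairs each generated document with the verdict expected under the strict date-ordering rule. Un-namespaced elements are assumed.

```python
import copy
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# A valid sample document used as the seed for generated test cases.
VALID_SAMPLE = """
<calculationPeriodDates id="floatingCalcPeriodDates">
  <effectiveDate><unadjustedDate>1994-12-14</unadjustedDate></effectiveDate>
  <terminationDate><unadjustedDate>1999-12-14</unadjustedDate></terminationDate>
</calculationPeriodDates>
"""

def termination_after_effective(cpd):
    """Rule under test: the termination date is strictly after the effective date."""
    eff = date.fromisoformat(cpd.findtext("effectiveDate/unadjustedDate"))
    term = date.fromisoformat(cpd.findtext("terminationDate/unadjustedDate"))
    return term > eff

def boundary_cases(sample):
    """Yield (label, document, expected_valid) for termination dates one day
    before, on, and one day after the effective date."""
    base = ET.fromstring(sample.strip())
    effective = date.fromisoformat(base.findtext("effectiveDate/unadjustedDate"))
    for label, offset, expected_valid in [("day before", -1, False),
                                          ("same day", 0, False),
                                          ("day after", 1, True)]:
        doc = copy.deepcopy(base)  # leave the seed document untouched
        doc.find("terminationDate/unadjustedDate").text = (
            effective + timedelta(days=offset)).isoformat()
        yield label, doc, expected_valid

for label, doc, expected in boundary_cases(VALID_SAMPLE):
    verdict = termination_after_effective(doc)
    print(f"{label}: expected valid={expected}, rule says valid={verdict}")
```

Each generated document could be serialised with `ET.tostring` and stored alongside its expected verdict. Like the hand-edited files, such cases only target one rule at a time.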

The final pre-release stage in the process used the natural language description to codify the rules in suitable validation constraint languages. These are the reference implementations of the rules, which are designed to show how the rules are actually expressed over instance documents, with all the necessary details such as namespace prefixes and precise element locations. They are intended as guides to users who wish to implement their own validation components.

When the rules are finally released, they are allocated a unique URI that captures the release date. Rule release is currently decoupled from the standards process, so that rules can evolve more quickly than the schema and the working group can work through its backlog.

3.1. Issues

There were a number of issues we faced during the rule gathering and formalisation process that are likely to recur in any organisation embarking on a similar exercise:

Access to experts - the validation rules represent domain expertise that is held by a small number of individuals whose time is very constrained. Therefore, the tasks of writing up and correcting rules, and of writing test cases had to be allocated to volunteers in the working group. A bottleneck is created mainly in the initial rule gathering process, and when clarifications are necessary. While the separation of roles between business experts and volunteer technology experts has helped in sharing the workload, such bottlenecks need to be taken into account as they cause difficulties in planning.

Tacit knowledge - some knowledge about the business meaning becomes so ingrained and "obvious" to experts that it is hard to extract. The involvement of non-experts who ask unexpected questions can help to surface such issues.

Vulnerability to schema change - during the development of the validation rules, the schema for FpML was still under development. The semantic rules rely on a stable document structure, and some schema changes broke or invalidated rules. This caused tedious rework, as the schema changes had to be inspected manually and the affected rules rewritten or deleted.

Test case generation and maintenance - the test cases for FpML are currently generated manually. For the current set of about 50 rules, this means 200 hand-edited and annotated files. Not only are these test cases time-consuming to create and maintain, since they are affected by schema changes just like the rules, but there is also no coverage guarantee, because it would be prohibitive to investigate the space of counterexamples manually.

4. Reference Implementations

There are currently two reference implementations for semantic validation of FpML: the first expresses rules in Schematron [Schematron] and the other in CLIX [CLIX]. We will have a look at how natural language rules are captured in these languages in the course of a short description of both.

Schematron is an XPath [XPath] based constraint language. It uses paths to select elements from documents and boolean expressions to make assertions about those elements. When assertions are violated, a custom error message, which can be a full formatted XML fragment, is written to the output. Schematron is frequently executed using XSLT, but some programming language implementations are also available. Most implementations of Schematron are made available as open source products.

The following is an example constraint taken from the FpML rules, rule ird-13, which reads in natural language: [If rollConvention is neither NONE nor SFE then the period must be M or Y.]

<pattern name="ird-13">
  <rule context="//calculationPeriodFrequency[not(rollConvention 
                                       = 'NONE' or rollConvention = 'SFE')]">
    <assert test="period[. = 'M' or . = 'Y']">
        In calculationPeriodFrequency: if rollConvention is 
        neither 'NONE' nor 'SFE' 
        then the period must be 'M' or 'Y'.  
    </assert> 
  </rule>
</pattern>
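To make the evaluation model concrete, the following Python sketch mimics what a Schematron processor does with ird-13: the rule context selects the calculationPeriodFrequency elements whose rollConvention is neither NONE nor SFE, and the assertion is tested once per selected node, with the message emitted when it fails. This illustrates the semantics only and is not a Schematron implementation; un-namespaced elements are assumed, and because ElementTree supports only a limited XPath subset, the context predicate is written in Python.

```python
import xml.etree.ElementTree as ET

DOC = """
<swap>
  <calculationPeriodFrequency>
    <periodMultiplier>6</periodMultiplier>
    <period>M</period>
    <rollConvention>14</rollConvention>
  </calculationPeriodFrequency>
  <calculationPeriodFrequency>
    <periodMultiplier>28</periodMultiplier>
    <period>D</period>
    <rollConvention>14</rollConvention>
  </calculationPeriodFrequency>
  <calculationPeriodFrequency>
    <periodMultiplier>1</periodMultiplier>
    <period>T</period>
    <rollConvention>NONE</rollConvention>
  </calculationPeriodFrequency>
</swap>
"""

MESSAGE = ("In calculationPeriodFrequency: if rollConvention is neither "
           "'NONE' nor 'SFE' then the period must be 'M' or 'Y'.")

def ird_13(root):
    """Emulate the <rule context=...> / <assert test=...> pair of ird-13."""
    failures = []
    # rule context: //calculationPeriodFrequency[not(rollConvention = 'NONE'
    #                                               or rollConvention = 'SFE')]
    context = [f for f in root.iter("calculationPeriodFrequency")
               if f.findtext("rollConvention") not in ("NONE", "SFE")]
    for node in context:
        # assert test: period[. = 'M' or . = 'Y'] is true if such a child exists
        if not any(p.text in ("M", "Y") for p in node.findall("period")):
            failures.append(MESSAGE)
    return failures

for message in ird_13(ET.fromstring(DOC.strip())):
    print(message)
```

Only the second frequency fails: the first satisfies the assertion, and the third is never selected by the context, so its period is not checked.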
  

CLIX is a constraint language based on first order logic and XPath. It was specified by Systemwire as the input language to the xlinkit check engine and is currently undergoing a process of open publication and decoupling from the product. CLIX combines a powerful and expressive language with reporting features that include reporting in XML and HTML and, significantly, is able to express constraints between multiple documents. This means that CLIX can be used to validate against reference data and legacy systems. The above constraint is handled by CLIX as follows:

<forall var="frequency" in="//calculationPeriodFrequency">
  <implies>
    <and>
      <notequal op1="$frequency/rollConvention" op2="'NONE'"/>
      <notequal op1="$frequency/rollConvention" op2="'SFE'"/>
    </and>
    <or>
       <equal op1="$frequency/period" op2="'M'"/>
       <equal op1="$frequency/period" op2="'Y'"/>
    </or>
  </implies>
</forall>

This can also be automatically rendered in a standard infix form for web publication, using XSLT, as:

forall frequency in //calculationPeriodFrequency (
  ($frequency/rollConvention!='NONE' and $frequency/rollConvention!='SFE')
      implies
  ($frequency/period='M' or $frequency/period='Y'))

For about 70% of the rules, the differences between Schematron and CLIX are not very pronounced, and lie mainly in the way the languages capture the business logic: in Schematron, it is captured as part of the XPath expressions, whereas in CLIX the XPath expressions are used for element access only and the logical language around them provides the business logic. We leave it to the reader to judge which style is easier to use, although we believe that exposing a structured language, instead of capturing a rule in a single string, aids readability. Separating the logical language from XPath also has the advantage that CLIX can be applied to data other than XML if necessary.
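The separation between element access and logic can be imitated in a few lines of Python. In the hypothetical sketch below, formulas are built from small combinators mirroring CLIX's forall, implies, and, or, equal and notequal, while element access is a plain path lookup on a bound variable. This is not the xlinkit engine or its API: the combinator names are ours, ElementTree paths stand in for XPath, and comparisons are existential over node-sets as in XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Logical layer: a formula maps a variable environment (dict) to True/False.
def forall(var, nodes, body):
    return lambda env: all(body({**env, var: n}) for n in nodes(env))

def implies(p, q):
    return lambda env: (not p(env)) or q(env)

def and_(*formulas):
    return lambda env: all(f(env) for f in formulas)

def or_(*formulas):
    return lambda env: any(f(env) for f in formulas)

# Access layer: a path relative to a bound variable. As in XPath 1.0,
# comparing a node-set with a string is true if *some* node matches.
def equal(var, path, value):
    return lambda env: any(n.text == value for n in env[var].findall(path))

def notequal(var, path, value):
    return lambda env: any(n.text != value for n in env[var].findall(path))

def ird_13(root):
    """forall frequency in //calculationPeriodFrequency (...), as in the CLIX rule."""
    return forall(
        "frequency", lambda env: root.iter("calculationPeriodFrequency"),
        implies(and_(notequal("frequency", "rollConvention", "NONE"),
                     notequal("frequency", "rollConvention", "SFE")),
                or_(equal("frequency", "period", "M"),
                    equal("frequency", "period", "Y"))))

doc = ET.fromstring(
    "<swap><calculationPeriodFrequency><period>D</period>"
    "<rollConvention>14</rollConvention></calculationPeriodFrequency></swap>")
print(ird_13(doc)({}))  # False: rollConvention 14 requires period M or Y
```

The structure of `ird_13` follows the CLIX rule element for element, which is the readability argument made above: the business logic is visible as a tree rather than buried in a single path string.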

For more complicated rules, the difference becomes more pronounced: Schematron, which builds on boolean logic, does not feature iteration constructs. Therefore, it cannot handle rules in which several quantifiers have to be nested, for example rules of the form [forall elements A, there is an element B such that for all elements C ...]. In these cases, the Schematron rules normally fall back to a JavaScript implementation, whereas CLIX retains its declarative character. This is beneficial because the rules keep their documentary value, can be shown to business analysts, and can be used for static analysis tasks such as automatic test case generation.
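To illustrate the nested pattern, consider a hypothetical rule (not one of the published FpML rules) relating the two legs of a swap: for every swapStream there must exist another swapStream whose payer is the first stream's receiver and whose receiver is its payer. Declaratively this is a forall with an exists nested inside it; the Python sketch below keeps that shape using `all()` and `any()`, assuming un-namespaced elements.

```python
import xml.etree.ElementTree as ET

DOC = """
<swap>
  <swapStream>
    <payerPartyReference href="CHASE"/>
    <receiverPartyReference href="BARCLAYS"/>
  </swapStream>
  <swapStream>
    <payerPartyReference href="BARCLAYS"/>
    <receiverPartyReference href="CHASE"/>
  </swapStream>
</swap>
"""

def payer(stream):
    return stream.find("payerPartyReference").get("href")

def receiver(stream):
    return stream.find("receiverPartyReference").get("href")

def legs_are_mirrored(swap):
    """forall s in swapStream: exists t in swapStream, t != s,
    such that payer(t) == receiver(s) and receiver(t) == payer(s)."""
    streams = swap.findall("swapStream")
    return all(
        any(t is not s and payer(t) == receiver(s) and receiver(t) == payer(s)
            for t in streams)
        for s in streams)

print(legs_are_mirrored(ET.fromstring(DOC.strip())))  # True
```

The quantifier structure stays visible in the code, much as it would in a declarative rule, whereas a procedural fallback would typically express the same check as nested loops with flags.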

The remaining differentiator has to do with reference data, which is less important for a standards body like FpML (FpML is not in the business of standardising reference data) but becomes important when deploying a validation solution in practice. At that point, currencies have to be checked against currency tables, business centres against business centre tables, and so on. It would be unfortunate for an organisation that has invested in a declarative approach for its advantages to have to fall back to hard-coding at that point. CLIX avoids this problem because its constraints naturally span multiple documents, which can include documents fetched from databases.
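A minimal sketch of checking a trade together with database-held reference data follows. It assumes the currency table lives in a relational database; an in-memory SQLite database with a hypothetical table `currency` stands in for it, and the trade structure is simplified rather than actual FpML. This does not show how CLIX or xlinkit access databases; it only illustrates the trade document and the reference data being checked together.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical reference database; in practice an existing firm-wide table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE currency (code TEXT PRIMARY KEY)")
db.executemany("INSERT INTO currency VALUES (?)", [("EUR",), ("USD",), ("GBP",)])

# Simplified trade fragment (not the real FpML element structure).
TRADE = """
<trade>
  <notionalAmount><currency>EUR</currency><amount>50000000</amount></notionalAmount>
  <paymentAmount><currency>DEM</currency><amount>1000</amount></paymentAmount>
</trade>
"""

def unknown_currencies(trade, conn):
    """Return currency codes used in the trade that the reference table lacks."""
    known = {code for (code,) in conn.execute("SELECT code FROM currency")}
    codes = [(c.text or "").strip() for c in trade.iter("currency")]
    return [code for code in codes if code not in known]

print(unknown_currencies(ET.fromstring(TRADE.strip()), db))  # ['DEM']
```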

The remaining requirements for the validation language relate to performance, reporting and ease of use. Performance has not yet been fully addressed by the group, although preliminary tests with CLIX show checks on trades to lie in the low 100 ms range, which seems acceptable. It is too early to report on ease of use, since the people involved in providing the reference implementations were generally already experts in the respective languages. Reporting requirements were met by both Schematron and xlinkit, both of which handle HTML and XML reporting natively.

At the time of writing, both reference implementations are about to be released with the next FpML Working Draft. There will also be a free validation service, together with further information and background on implementation, which we encourage readers to try out at http://www.systemwire.com.

5. Summary

The issues that have surfaced in trying to validate FpML are likely to resurface with any organisation or standards body that has to manage data with complex structures and semantics. There is an understanding in the finance industry, which to some extent reflects its maturity, that surfacing the business meaning of elements in validation rules is beneficial because it contributes to knowledge management practices, decreases the implementation effort, eases testing, and eliminates ambiguities.

We expect the rule gathering process to mature in the next few months as we learn from our initial experiences and find new approaches to addressing the issues outlined above, and we hope to report again in the future. In the meantime, we invite readers to visit the FpML web site, as well as the validation service and the FpML rules page at http://www.systemwire.com.

Acknowledgements

I would like to thank all members of the FpML Validation Working Group, and their sponsoring companies, for donating their time and effort.

Bibliography

[CLIX] The CLIX constraint language, web site http://www.clixml.org

[FpML] The Financial Products Markup Language, web site http://www.fpml.org/

[ISDA] The International Swaps and Derivatives Association, web site http://www.isda.org/

[Schematron] The Schematron Assertion Language, web site http://www.ascc.net/xml/resource/schematron/schematron.html

[XML Schema] XML Schema, web site http://www.w3.org/XML/Schema

[XPath] XPath 1.0, web site http://www.w3.org/TR/xpath

Biography

Christian did his PhD research at University College London on Managing the Consistency of Distributed Documents. In 2002 he set up Systemwire together with his advisors Anthony Finkelstein and Wolfgang Emmerich to commercialise the results of this research, the validation engine xlinkit. He is currently Technical Director of Systemwire. Christian is also chairman of the Validation Working Group of the Financial Products Markup Language (FpML), an industry standard for financial derivatives trading.