MDDL Market Data Definition Language
ABSTRACT
MDDL is the product of an industry-wide consortium of major players in the financial market industry for disseminating quotes and related data for various financial instruments. This presentation discusses MDDL and the use of XML mechanisms that work for this highly focused industry segment.
The financial industry has long been a focus of daily activity for people from all walks of life. From periodicals to news bulletins, the media report the performance and specifics of the stock markets to billions globally. Individuals are increasingly active in monitoring "the market", enabling them to contribute directly to portfolio investment decisions previously handled only by professionals.
Under the auspices of the FISD of the SIIA, MDDL has been developed by an industry-wide consortium including the largest players in the financial market for the distribution and interchange, via XML, of the financial data that comprise the basis for this activity. Launched on 02 November 2001, the specification initially covers snapshot quotes and data fields for instruments like equities and indices, but the framework is developing to include all types of financial instruments and related data. MDDL seeks to provide a common vocabulary and common data model definitions to facilitate the interchange of market data from source through vendors to the end-users or consumers.
Market data is often delivered on dedicated circuits using highly specific compression schemes involving a large base refresh or "snapshot" of the data followed by subsequent smaller "streaming" updates. Other market information is broadcast or can be requested as needed over these same circuits. Several XML constructs are employed to represent the model and data consistent with the prevailing method of "snapshot" data dissemination. Though not likely to match the performance of dedicated systems, the specification has some features permitting extensions to enable "streaming" updates as well.
As MDDL is focused on the specific issue of market data, it has a necessary interrelationship with other financial reporting schemes including news, public disclosures, and research.
The presentation offers scenarios and examples illustrating the focus of MDDL and its conceived implementation. Specific discussion will focus on the references and properties concepts inherent to MDDL, as well as extensibility, and the function XML constructs like these play in a developing standard.
Reference information can be found at http://www.fisd.net/ or http://www.mddl.org/
1. Introduction of the Space
Globally, there are more than 250 markets (exchanges) where more than two million financial instruments and securities are traded by buying and selling various forms of stocks in a generally closed, membership only, environment. When a trader, acting as a buyer or seller on that exchange, quotes a price (bid or ask respectively), or when two traders come to an agreement and trade a stock, an event is generated by the exchange indicating the positioning or intended sale of a security. The exchange will also generate events reporting administrative changes or corrections in instruments covered by the exchange or transactions reported during the day. Note that the reporting of a trade of a stock in any market is not actually the legal contract but is a commitment that often can take three days to close before all the legal paperwork and funds are exchanged.
In addition to the formal exchanges, there are many other pricing desks and related organizations responsible for setting the rate or price on non-exchange traded securities. These desks, effectively, create instruments that are also quoted and sold and report this data out in a similar manner to an exchange.
The data provider, whether exchange or pricing desk, then distributes the market data to vendors, or others who have purchased the distribution service, in a format specific to that provider using nomenclature unique to that provider. The provider may choose to enhance the quality of the raw data or augment the data provided from the exchange with custom data. For example, the exchanges may provide trade information as well as a company's listed name while they may not provide historical performance references nor will they provide other data related to a company's financial status.
MDDL seeks to focus on the aspect of financial market data used to analyze, account, or trade within a market. This may include volume and pricing data directly from exchanges or pricing desks as well as ancillary data derived from multiple sources. The intent is that a single set of definitions for market data terms, and thus a single Schema, be adopted to facilitate the recipient's processing of MDDL. By providing a standard mechanism to interchange a comprehensive collection of market data it is easier for users of MDDL to receive multiple sources of market data thus streamlining processing.
MDDL is an XML based interchange format and common data dictionary of the fields needed to describe 1) financial instruments, 2) corporate events affecting value and tradability, and 3) market-related, economic, and industrial indicators. There is no question that financial instruments are prominent in the intent of MDDL. Corporate events are relevant to this focus especially when an instrument stops trading, merges, or is discontinued. Indicators provide a basis for comparing various instruments in a market with averages or ranges of the market(s) as a whole. All of these objectives are stated with the overwhelming consideration to the primary focus of analyzing, accounting, and trading within a market.
2. Introduction of the Technology
Most market data events are generated by the exchanges and delivered over dedicated circuits to market data vendors or, in some cases, directly to end-users. The vendors, in turn, decode the transaction, normalize the data, update relevant databases, and redistribute the processed data to clients via a data feed. This data feed (generally) utilizes a proprietary protocol over a private network to vendor provided servers at the client facility. In many cases, the vendor provides a workstation or workstation software that accepts the feed and provides displays for the end-user.
In other facilities, users write feed handlers using an Application Programming Interface (API) provided by the vendor(s) of choice, process the data into an internal infrastructure, and redistribute it via the user's own network to private applications at end-user desktops. The redistributor may derive new data fields based on proprietary algorithms, or enhance that market data with data from proprietary instruments. For example, a mutual fund manager may take in a data feed from a vendor to understand the fund's constituents, convert it to a proprietary format for internal processing, add the corporation's official mutual fund instruments to that data stream, and redistribute the combined data to other clients or end-user applications. Further, there are many instruments that are not exchange traded and thus are quoted directly by firms specializing in that type of product.
In a professional or trading environment, the end-user application is often a dedicated workstation connected to a high capacity network receiving all desired data as it becomes available - conceivably in less than three-fourths of a second after the actual event delivered from the stock exchange. While not all professionals require this constantly updating data, these same workstations and applications are also used off-trading-floor as the application is readily available. Many market data corporations are recognizing that end-users do not require up-to-the-millisecond data and that their needs can be satisfied in a request-response environment facilitated by a browser and a website on a public or private network.
Most of the data about a financial vehicle is static throughout a business day, week, or even month. However, a significant amount of the data is very volatile throughout a trading session. Consider the onslaught of activity that begins a United States market trading day (tens of thousands of vehicles and derivatives all trading individually). The vendors and redistributors that listen to all of the exchanges may receive more than 20,000 transactions per second for a sustained period over an hour. In this environment, it is not feasible to provide all of the data about an instrument every time it is modified. As such, when a user requests information about a vehicle for the first time, the user receives a refresh that the application is expected to keep. When modifications to the instrument take place, an update is sent to the user indicating a particular modification to that refresh. These updates, when delivered from the vendor to its server, are compressed using a highly proprietary scheme but are often translated at the server into a licensed format the user can receive and process into a private application.
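The refresh-then-update pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's protocol: the cache layout, the function names, and the field names ("name", "last", "volume") are all hypothetical.

```python
from typing import Dict

# Hypothetical in-memory cache: instrument symbol -> latest known field values.
cache: Dict[str, Dict[str, object]] = {}


def apply_refresh(symbol: str, fields: Dict[str, object]) -> None:
    """Store a full refresh, replacing anything previously held for the symbol."""
    cache[symbol] = dict(fields)


def apply_update(symbol: str, changes: Dict[str, object]) -> None:
    """Merge a small update into the stored refresh; untouched fields are kept."""
    if symbol not in cache:
        raise KeyError(f"no refresh received yet for {symbol}")
    cache[symbol].update(changes)


# One large refresh when the instrument is first requested ...
apply_refresh("XYZ", {"name": "XYZ Corp.", "currency": "USD", "last": 638.0, "volume": 1000})
# ... followed by small updates carrying only the fields that changed.
apply_update("XYZ", {"last": 638.5, "volume": 1200})
```

The static fields (name, currency) travel once in the refresh, while each update carries only the volatile fields, which is what keeps the sustained message rate manageable.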
3. Where MDDL Fits
MDDL seeks to define a common format for market data utilizing an accepted neutral vehicle (XML) as well as a normalized understanding of the terms for the data being delivered in the various market data environments - exchange to vendor, vendor to client, or client to end-user.
MDDL has been a volunteer effort of member corporations of the sponsoring organization. FISD provides general management but several workgroups, comprised of members, were defined to handle the separate functions of developing the specifications:
- The Technical Committee focuses on the angle brackets, elements, and attributes of the XML specification.
- The Vocabulary Committee seeks to define the common terms that are the cornerstone of MDDL delivered market data.
- The Liaison Committee is responsible for interaction with other specifications and standards organizations.
MDDL does not specify a wire protocol for exchanging the XML payload but does define the format of the instance documents delivered. From the MDDL perspective, it is the purview of the market data provider (whether it be exchange, vendor, or redistributor) to determine the physical method for data delivery.
Version 1.0 of MDDL delivers the initial products of common equities, mutual funds, and indexes. These products are considered sufficient to spark the interest of a large number of parties (equities) while providing some diversity to lay the foundation for the remainder of the specification. With the primary work of the technical form of the specification completed, additional products can now be added as soon as the appropriate working groups define them.
Version 1.0 implements the first two constructs, snap and timeseries. The snap is a one-time snapshot of data at a particular moment in time or, perhaps, after market trading closes. A timeseries is used for reporting multiple sets of data where the principal differentiation is time. For example, a snap may be provided during the middle of a trading day to show the current value of a stock, or at the end of the day to show the closing values of the stock during the trading day, while a timeseries would be used to provide the closing values of a particular stock over the range of a month or year.
MDDL Version "1.0-final" was formally announced and released on 02 November 2001 at the Fifth World Financial Information Conference sponsored by the FISD/SIIA (http://www.fisd.net/ and http://www.siia.net/). The specification can be viewed at http://www.mddl.org/. There are several files (besides the prose documentation) involved in defining the specification:
- mddl-<version>.xsd is the Schema used to specify the XML specification.
- mddl-<version>.dtd is the DTD related to that Schema.
- mddl-<version>.saf is a Schema Adjunct File (SAF) containing the common definitions, potentially in multiple languages.
- Many <controlled vocabulary>.xml files contain the enumerations and definitions of the controlled vocabularies.
4. Issues and Solutions
The MDDL committees identified various requirements for the specification that presented challenges for the technical committee to solve. It is likely that some of these issues are common to other XML specifications defined or in progress. These issues and proposed solutions are described below.
4.1. Revisions of the Specification Led to Attribute version
From the broad variety of market data available, only a small number of products were selected as initial targets to help formulate the foundation. As such, MDDL may be revised with new products and constructs in subsequent revisions faster than recipients can update systems to process them. Therefore, a very subtle yet powerful convention has been adopted - the main mddl element contains a required attribute, version, that must specify the formal name of the version of the specification used to generate the document.
This version attribute does not take up much space but does permit the receiver to quickly determine if the document might experience processing problems. It quickly identifies which version of the Schema or DTD should be used to validate the document if so desired. If each subsequent revision of the specification identifies a lexically larger version number, a receiver can make a simple test to determine if the instance falls within the scope of the receiving system's ability to handle the document.
Example: A minimal MDDL document:
<mddl version="1.0-final"> </mddl>
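The "simple test" a receiver might make can be sketched as follows. This is a minimal illustration, assuming (as the text above does) that every later revision carries a lexically larger version name; the function name and the SUPPORTED_MAX_VERSION constant are hypothetical and not part of MDDL.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the newest specification version this receiver was built for.
SUPPORTED_MAX_VERSION = "1.0-final"


def can_process(document: str, max_version: str = SUPPORTED_MAX_VERSION) -> bool:
    """Return True if the document's version is not newer than max_version.

    Relies on plain string comparison, which is only valid if version names
    are chosen so that they sort lexically in release order.
    """
    root = ET.fromstring(document)
    if root.tag != "mddl":
        raise ValueError("root element is not mddl")
    version = root.get("version")
    if version is None:
        raise ValueError("required attribute 'version' is missing")
    return version <= max_version


print(can_process('<mddl version="1.0-final"> </mddl>'))
print(can_process('<mddl version="1.1-draft"> </mddl>'))
```

Note that a purely lexical test would misorder names such as "1.10" and "1.9", so the convention only works if version names are assigned with that comparison in mind.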
4.2. List of Known Sets Led to Controlled Vocabularies and Scheme
Most of the vocabulary terms within MDDL are represented as elements in the specification although certain lists of values are not consistent amongst all providers of data (either vendors or redistributors). Often, each corporation has its private lists for the same data element. For example, while many may agree on the ISO Currency Code abbreviations, not everyone may agree on whether to use the ISO 2 or 3 character country codes. Or, it is possible that the ISO lists do not contain a particular value or abbreviation so it must be extended. Further, some lists of values are so long that no specification should rightly contain an element (or even an attribute value enumeration) of each. Therefore, a convention called controlled vocabularies has been implemented whereby an external file can contain the abbreviations and definitions of all the possible values of an element. Elements where this applies have a scheme attribute identifying the URI of the list (in many cases a URL to the actual file). Receivers can validate the value against the file (or a copy of it!) as well as extract the definitions of the values from the file. In this way, if a company decides to have its own list of values or needs to specify a file with additional values for a controlled vocabulary element, it can change the scheme name from the default and place its own value in the list.
Example: The following is a reference to a controlled vocabulary:
<currency scheme="http://www.mddl.org/mddl/2001/scheme/isoCurrency3.xml"> USD </currency>
which can be shortened to:
<currency>USD</currency>
if the default vocabulary is used for a particular property.
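A receiver-side check against a controlled vocabulary might look like the sketch below. The layout of the vocabulary files is not described here, so the sketch sidesteps it: VOCABULARIES stands in for locally cached copies of the lists, reduced to sets of allowed values, and holds only a small illustrative excerpt. DEFAULT_SCHEMES and is_valid are hypothetical names; the default URI for currency is taken from the example above.

```python
import xml.etree.ElementTree as ET

ISO_CURRENCY = "http://www.mddl.org/mddl/2001/scheme/isoCurrency3.xml"

# Hypothetical table: element name -> scheme assumed when no scheme attribute is given.
DEFAULT_SCHEMES = {"currency": ISO_CURRENCY}

# Hypothetical local copies of vocabularies (excerpt only): scheme URI -> allowed values.
VOCABULARIES = {ISO_CURRENCY: {"USD", "GBP", "EUR", "JPY"}}


def is_valid(element_xml: str) -> bool:
    """Check an element's value against the vocabulary named by its scheme."""
    element = ET.fromstring(element_xml)
    scheme = element.get("scheme", DEFAULT_SCHEMES.get(element.tag))
    allowed = VOCABULARIES.get(scheme)
    if allowed is None:
        raise LookupError(f"no local copy of vocabulary {scheme!r}")
    # Surrounding whitespace, as in the long form of the example, is ignored.
    return (element.text or "").strip() in allowed


print(is_valid(f'<currency scheme="{ISO_CURRENCY}"> USD </currency>'))
print(is_valid("<currency>USD</currency>"))
```

A company using its own list would add an entry to the lookup table under its own scheme URI, leaving the default vocabulary untouched.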
4.3. Proprietary Constructs, Products, and Properties Placed in Element other
A requirement of MDDL is that a producer can have its own proprietary constructs, products, or properties. While XML allows this natively, these extensions are clearly delineated within MDDL so a receiver can easily identify the extensions. Each element has an other child where namespaced content can be placed. As almost every piece of data is defined as an element, a producer can include its own constructs or products but may also enhance an existing property with its own interpretation of that property. The receiver is clearly aware that such extensions have been added.
Example: Any valid namespaced XML is added to the other tag:
<source>
<mdString>Vendor, Inc.</mdString>
<other><vendorML:slogan>Providing you a world of market data</vendorML:slogan></other>
</source>
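To illustrate how a receiver can separate standard content from extensions, the following sketch parses a fragment like the one above and collects whatever sits inside other. The namespace URI bound to vendorML and the function name are hypothetical; a namespace declaration is added only because a namespace-aware parser requires one.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the vendor's own markup.
VENDOR_NS = "http://www.vendor.com/XML/vendorML"

fragment = f"""
<source xmlns:vendorML="{VENDOR_NS}">
  <mdString>Vendor, Inc.</mdString>
  <other><vendorML:slogan>Providing you a world of market data</vendorML:slogan></other>
</source>
"""


def split_extensions(element):
    """Return (standard children, extension elements found inside 'other')."""
    standard = [child for child in element if child.tag != "other"]
    extensions = [ext for other in element.findall("other") for ext in other]
    return standard, extensions


root = ET.fromstring(fragment)
standard, extensions = split_extensions(root)
for ext in extensions:
    # ElementTree reports namespaced tags as {namespace-uri}localname.
    print(ext.tag, "->", ext.text)
```

A receiver that does not understand the vendor's namespace can simply ignore the extensions list and still process the standard children.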
4.4. Common Data within/without a Document Included in References Using XLink
When a large MDDL document is created, or a smaller document is desired, it is possible to include common sets of repeating data in a special references section within the same document or in another location completely. For example, suppose a set of snaps is provided for an instrument and its derivatives, along with a timeseries for those vehicles over a month, spread across a number of instance documents. A large portion of the identifying features of the company, including the formal name, address, and other vital statistics, would then need to be repeated unnecessarily. By placing this common data within a single instance (perhaps in the references section of a static file on a website), MDDL can refer to this information in a way that indicates to the receiver that the data, if imported into the document, can be placed inline. This is intended for use with information that is ancillary to the primary market data but can be used with any tree of MDDL data to reduce document size or, in some cases, provide additional clarity when linking multiple quotes together.
Example: This references section:
<references>
<exchangeIdentifier id = "lse">
<code scheme = "http://www.vendor.com/XML/scheme/exchange">NYSE</code>
</exchangeIdentifier>
</references>
can be used in this manner:
<ref:exchangeIdentifier xlink:href="id('lse')"/>
Note that the xlink:href can make any valid XLink reference including another document.
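As a rough sketch of how a receiver might resolve such a link within the same document, the code below looks up the referenced element by its id. Several details are assumptions made for illustration: the namespace URI bound to the ref prefix is hypothetical, the placement of references directly under mddl is simplified, and only the id('...') form shown above is handled (links to other documents are not).

```python
import re
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
REF_NS = "http://www.example.com/hypothetical/mddl-ref"  # hypothetical URI
ID_PATTERN = re.compile(r"^id\('([^']+)'\)$")

document = f"""
<mddl version="1.0-final"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:ref="{REF_NS}">
  <references>
    <exchangeIdentifier id="lse">
      <code scheme="http://www.vendor.com/XML/scheme/exchange">NYSE</code>
    </exchangeIdentifier>
  </references>
  <ref:exchangeIdentifier xlink:href="id('lse')"/>
</mddl>
"""


def resolve(root, link):
    """Return the element in the references section that the link points to."""
    match = ID_PATTERN.match(link.get(XLINK_HREF, ""))
    if match is None:
        raise ValueError("only same-document id('...') links are handled here")
    target_id = match.group(1)
    for candidate in root.findall("./references/*"):
        if candidate.get("id") == target_id:
            return candidate
    raise LookupError(f"no referenced element with id {target_id!r}")


root = ET.fromstring(document)
link = root.find(f"{{{REF_NS}}}exchangeIdentifier")
target = resolve(root, link)
print(target.tag, target.find("code").text)
```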
4.5. Common Properties across Data Fields Lead to Inheritance Rule and a Shorthand
Many of the data fields (like last, which is a price) share a common set of properties with other similar data fields. For example, any of the prices may be modified by currency, dataDateTime, valuationType, or multiplier, etc. Often these values are the same across all of the data fields in the hierarchy. For example, all of the data in a snap is probably in the same currency. A convention has been adopted whereby any of these properties can be defined at a higher level and all of the children inherit that value unless specifically overridden. In this way, the producer can define the currency at the snap level and imply that all of the values provided in that tree are of that same currency.
As an extension consider that if all the common properties are elevated up the hierarchy, a property may have no children other than the identifying base type (for example mdDecimal for a price type). When this is the case, the element that is the property can simply contain the value of the property rather than the child elements. A receiver can identify the base type from the schema or the SAF glossary.
Example: The following abbreviation:
<last>638</last>
is equivalent to:
<last> <mdDecimal>638</mdDecimal> </last>
given that last is a decimal.
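Both conventions, inheritance and the shorthand, can be sketched from the receiver's side as below. The document structure is deliberately simplified and partly hypothetical: placing last directly under snap and the element name close are assumptions made for illustration, and the base type is passed in by hand rather than looked up in the schema or SAF.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

snap_xml = """
<snap>
  <currency>USD</currency>
  <last>638</last>
  <close>
    <mdDecimal>630</mdDecimal>
    <currency>GBP</currency>
  </close>
</snap>
"""


def inherited(root, target, prop):
    """Return the nearest definition of prop, looking at target and then its ancestors."""
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        found = node.find(prop)
        if found is not None:
            return (found.text or "").strip()
        node = parents.get(node)
    return None


def value_of(field, base_type="mdDecimal"):
    """Read a price given either the full form or the shorthand form."""
    typed = field.find(base_type)
    text = typed.text if typed is not None else field.text
    return Decimal((text or "").strip())


root = ET.fromstring(snap_xml)
last, close = root.find("last"), root.find("close")
print(value_of(last), inherited(root, last, "currency"))    # shorthand, inherits from snap
print(value_of(close), inherited(root, close, "currency"))  # full form, overrides locally
```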
4.6. Schema and DTD support Require Conversion Scripts and Functional Specification
MDDL supports both XML Schema and DTD formats. Initial discussions led to this decision given the availability of tools and the comfort of organizations in processing DTDs. However, the more recent advantages of Schema have not been overlooked. The specification is developed in Schema and XSLT is used to convert the Schema to a DTD. Eventually, extra value and parameter checking can be added to the Schema for processors that want that extra assistance, while still providing a DTD without the overhead of editing and maintaining two copies. It must be noted that many of the advantages currently available in the Schema format cannot be used as there is no counterpart in the DTD, but this limitation is deemed acceptable as most of these features would not likely be used in MDDL.
Extract of make file for MDDL:
xslt mddlbase-1.0-draft.xsd mddl-1.0-draft-schema2ref.xsl mddlref-1.0-draft.xsd
xslt mddlbase-1.0-draft.xsd mddl-1.0-draft-addbaseref.xsl mddl-1.0-draft.xsd
xslt mddlbase-1.0-draft.xsd mddl-1.0-draft-schema2dtd.xsl mddl-1.0-draft.dtd
xslt mddl-1.0-draft.saf mddl-1.0-draft-saf.xsl mddl-1.0-draft-saf.htm
xslt mddlbase-1.0-draft.xsd mddl-1.0-draft-schema2saf.xsl mddl-1.0-draft-index.saf
5. Other XML Issues to Be Considered
5.1. Method of Query
MDDL 1.0 provides a useful way for delivering market data from a producer to a recipient but currently makes no provision for the recipient to specify what data is to be provided. The user may wish to specify which instrument is desired as well as which data fields the user would like to receive (all data fields or particular ones or particular subsets). Therefore, a method of query needs to be identified. The developments of XQuery are encouraging - it provides more features than are necessary for MDDL but does add some features that XPointer/XPath/XLink do not.
5.2. Compression
MDDL, as with most XML specifications, is more verbose than traditional methods of data delivery simply because it is string based and tagged. Some individuals worry that MDDL is too verbose for the bandwidth-consuming market data business segment even though XML, and thus MDDL, can be compressed very well. MDDL, through references, provides ways to minimize repetition of common data. However, general Schema/DTD sensitive schemes could be more effective at compression. Investigation and application of an appropriate compression scheme may be necessary for wholesale adoption of specifications like MDDL beyond web and query/response environments.
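The claim that tagged, repetitive XML compresses well is easy to try with a general-purpose compressor. The sketch below uses zlib, which is not Schema/DTD sensitive, on a synthetic document; the element layout (including the instrument element) is hypothetical and the data is made up, so the resulting ratio says nothing about real feeds.

```python
import zlib

# Synthetic, repetitive document: 500 snapshots sharing the same tag structure.
quotes = "".join(
    f"<snap><instrument>XYZ{i}</instrument><currency>USD</currency>"
    f"<last>{100 + i}</last></snap>"
    for i in range(500)
)
document = f'<mddl version="1.0-final">{quotes}</mddl>'.encode("utf-8")

compressed = zlib.compress(document, 9)  # 9 = highest compression level
ratio = len(compressed) / len(document)

print(f"original:   {len(document)} bytes")
print(f"compressed: {len(compressed)} bytes ({ratio:.1%} of original)")
```

A scheme that knows the Schema or DTD in advance could in principle avoid transmitting tag names at all, which is the direction the paragraph above suggests investigating.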
5.3. Method of Updates
As previously mentioned, MDDL was originally targeted at snapshots or dumps of market data but with the intent that realtime or updating information could be encapsulated. The MDDL references concept, in conjunction with a sequence number property, can permit publishing an initial refresh document with the bulk of the data referenced in subsequent updates, consistent with currently prevailing practices, by specifying the reference and sequence number. However, there may be a more efficient way common to all XML standards...

