XML 2003

Use Cases for Native XML Servers

Abstract

Native XML Servers that manage XML through direct support for XML Schema and XML Query languages have been marketed and sold for several years, but there is still controversy about the situations in which they offer a better solution than a filesystem or a relational database. Because of the value of XML to organizations and the rapidly increasing amount of XML in use, understanding when to use which tool is especially important for those responsible for company IT systems, software product development, and system integration projects. Determining when a Native XML Server is appropriate requires understanding the requirements of different types of applications that must manage, query, transform, or store XML. This presentation outlines some principal use cases in which Native XML Server technology has proved its value, drawing on Software AG's work with over 500 Tamino XML Server customers.

This paper begins with a review of the core value proposition of XML and the need for technology to assist with the persistence and exchange of XML. Specific technical requirements for managing XML are presented that show the issues that led to the creation of a system designed for persisting XML, the Native XML Server. A set of use cases that drive the need for Native XML Servers is then explained. The use cases fall into two distinct yet related areas: content-oriented applications where data management is of primary concern and integration applications where the exchange of information is of primary concern. This paper explains the issues that make XML valuable in these use cases and specifically how a Native XML Server is important to their implementation.


Table of Contents

1. XML Value Proposition
2. Management of XML and the XML Server
3. XML Application Use Cases
3.1. Content Use Cases
3.1.1. Web Content Management
3.1.2. Specialized Document Management
3.1.3. Business Documents (eForms)
3.2. Integration Use Cases
3.2.1. Service-Oriented Integration (SOI)
3.2.2. Metadata Repository
4. Native XML Server as Operational Data Store
5. Summary
Acknowledgements
Bibliography
Biography

1. XML Value Proposition

XML has changed the way we work with data. This is no small feat for a technology that is just over 5 years old and doesn't specify a particular vocabulary or semantics, but is primarily a meta-language for describing a vocabulary. XML has been rapidly adopted as a representation for documents, metadata, and data for integration with other systems. Just about every system of record, be it a relational database or a packaged enterprise application, now supports XML. New data is increasingly created as XML as well; leading content-creation products such as Adobe's PDF and Microsoft's Office now support XML natively.

XML has become a relevant technology so quickly because of five key attributes:

  • XML is a standard approach to describing data formats. XML can be used to create new data formats using a core set of conventions as a foundation. Instead of writing a new text format for a particular type of data and inventing a parser to process it, developers can use XML and standard XML parsers for this task. This results in simpler implementations in less time [XML].

  • XML is flexible. An XML document can contain hierarchies, tables, and graphs of related and even recursive structure. This flexibility makes it possible to model a much wider range of data than previous approaches allowed, such as the row-and-column approach to data required by SQL.

  • XML is human-readable and available in any language. Anyone can look at an XML document and see what is being exchanged and the overall structure of the document, as opposed to binary formats of data exchange and representation. Common text encoding problems with multiple languages and character sets are also resolved within the XML framework.

  • XML is self-describing. An XML document can be packaged with an XML Schema or DTD, describing the structure and format of the XML document. XML differs from a CSV file or SQL model in that the data model for the document can be expressed in a well-defined form and exchanged between systems along with the data itself [XML Schema].

  • XML enables the information content of a document to be separated from presentation details. XML is useful as an intermediary format that can be transformed into many other kinds of documents, such as HTML, Word documents, PDFs, pictures (SVG), and formats for mobile devices.
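
Several of these attributes can be seen in a few lines of code. The following is a minimal sketch in Python using the standard library parser; the document, its element names, and the helper functions are hypothetical and chosen only for illustration. It shows a standard parser reading a document with multilingual text, a recursive structure being traversed, and one possible presentation being derived from the content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the vocabulary is invented for illustration.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <product id="p1">
    <name xml:lang="en">Pump housing</name>
    <name xml:lang="de">Pumpengehäuse</name>
    <part><name>Cover</name><part><name>Seal</name></part></part>
  </product>
</catalog>"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A standard parser replaces a hand-written one for a custom text format.
root = ET.fromstring(DOC.encode("utf-8"))
product = root.find("product")


def names_by_language(element):
    """Map language code to the text of each direct <name> child."""
    return {n.get(XML_LANG): n.text for n in element.findall("name")}


def part_depth(element):
    """Depth of the recursive <part> nesting below an element."""
    children = element.findall("part")
    return 0 if not children else 1 + max(part_depth(c) for c in children)


def to_html(element):
    """One presentation of the content; others can be derived from the same data."""
    return f"<li>{names_by_language(element)['en']}</li>"
```

The same `product` element could equally be rendered as plain text or another format, since nothing in the document itself prescribes a presentation.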

These advantages are the drivers of XML adoption. For more information on XML and evidence of its adoption in particular applications and vertical markets, please refer to [XML Backgrounder].

2. Management of XML and the XML Server

Originally, XML was designed as a document format but without much thought to storage and management. Content applications that began to use XML needed to solve this issue so that the XML documents could be kept in a secure repository. XML is also being used to facilitate integration between systems. XML documents are used to communicate data that must be exchanged between two different systems in a business process, for example a supplier notifying a purchaser of product availability. In these situations, XML must be stored in order to audit the integration process and managed for the duration of the process.

Developers have looked to manage XML in essentially two different ways:

  • One approach is to store XML in a filesystem. This approach works well for a few XML files (such as configuration files). However, it does not scale well to situations in which thousands of XML documents must be managed. The filesystem does not offer good tools for search queries. Additionally, search queries cannot directly access the XML data model; they can only query the XML as a plain text file, ignoring the structure of the document. The filesystem also does not offer the support for transactions, performance, or scalability that is necessary for effective data management.

  • The second approach is to store the XML in a relational SQL database, such as the Oracle Database, IBM DB2 and Microsoft SQL Server. However, relational databases were not originally designed for the storage of XML, as XML was not around when they were developed. Relational databases were designed around the relational model and the standard query language called SQL.
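
The limitation of plain-text search over files can be illustrated with a small sketch. The two documents below are hypothetical and stand in for files in a directory; the function names are likewise invented. A text search for a customer name also matches a document that merely mentions the name in a note, while a query against the document structure does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents standing in for XML files in a directory.
FILES = {
    "order1.xml": "<order><customer>Smith</customer><note>call Jones</note></order>",
    "order2.xml": "<order><customer>Jones</customer><note>rush</note></order>",
}


def text_search(term):
    """Plain-text search, as a filesystem tool would do: structure is ignored."""
    return sorted(name for name, text in FILES.items() if term in text)


def structural_search(term):
    """Match only documents whose <customer> element holds the term."""
    hits = []
    for name, text in FILES.items():
        root = ET.fromstring(text)
        if root.findtext("customer") == term:
            hits.append(name)
    return sorted(hits)


text_hits = text_search("Jones")              # both files match
structural_hits = structural_search("Jones")  # only order2.xml matches
```

Note that this sketch reparses every document on each query, which is exactly the kind of cost that indexing in a database is meant to avoid.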

Application developers often turn to an SQL database to manage XML because they need several features that SQL databases have traditionally implemented. List 1 below summarizes the requirements that developers look for in a DBMS. For a detailed discussion, see [XML Repositories].

  • Transactions: XML management requires ACID properties, and support of standard transaction isolation levels. In some cases, two-phase commit and distributed transactions are also necessary.

  • Standard CRUD operations: create, read, update, and delete.

  • Support for high performance search and query operations over large datasets.

  • Overall stability and the ability to run for long periods of time without interruption.

  • Enterprise readiness in the form of high availability, scalability (up and out), replication, backup, and restore.

Another reason many developers use an SQL database to manage XML, and a quite important one, is that it is a familiar tool for the purpose of data management. Because of the widespread demand for XML support, existing vendors of relational databases have added features to their products to assist with working with XML. This support for XML usually comes in the form of tools to assist with the mapping of XML to an SQL data model and extensions to SQL for the evaluation of XML Query statements, usually XPath [XPath]. These tools and extensions are very valuable and they provide methods for SQL databases to expose XML interfaces. For many situations, these tools are sufficient to work with XML. The main objective in using XML is often to connect with other XML applications and these tools provide a way to do it.

However, not all applications are best served by these XML extensions. There are many situations where it is too difficult to do the mapping or the mapping does not solve the key issues around the application. SQL and XML are very different languages and the differences create challenges for developers working with both.
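
To make the mapping issue concrete, the following minimal sketch shreds a small XML document into two relational tables and then reassembles the same information with a join. The document, table names, and column names are hypothetical, and SQLite is used only because it ships with Python; it is not one of the databases discussed here. Even for this trivial document the hierarchy must be split across tables and rejoined, whereas the XML can be queried in place.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document with a repeating child element.
DOC = """<order id="o1">
  <customer>Smith</customer>
  <item sku="A1"><qty>2</qty></item>
  <item sku="B7"><qty>1</qty></item>
</order>"""
root = ET.fromstring(DOC)

# Relational mapping: one table per level of the hierarchy.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE items (order_id TEXT, sku TEXT, qty INTEGER)")
db.execute("INSERT INTO orders VALUES (?, ?)",
           (root.get("id"), root.findtext("customer")))
db.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(root.get("id"), i.get("sku"), int(i.findtext("qty")))
     for i in root.findall("item")],
)

# Reading the order back requires a join across the shredded tables.
sql_rows = db.execute(
    "SELECT o.customer, i.sku, i.qty FROM orders o "
    "JOIN items i ON i.order_id = o.id ORDER BY i.sku"
).fetchall()

# The same answer read directly from the document structure.
xml_rows = sorted(
    (root.findtext("customer"), i.get("sku"), int(i.findtext("qty")))
    for i in root.findall("item")
)
```

Each additional level of nesting or optional element in the document adds another table or nullable column to a mapping of this kind.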

| Feature | XML | SQL |
|---|---|---|
| Data Model | Hierarchical | Tabular |
| Atoms of Data | Documents, Elements, and Attributes | Tables, Rows, and Columns |
| Data Modeling Language | DTDs and XML Schema | SQL DDL |
| Query Languages | XPath and XQuery | SQL |
| Manipulation and Transformation Tools | XQuery and XSLT | No direct comparison |
| Interfaces and APIs | HTTP, WebDAV, and APIs for C, Java, .NET, etc. | ODBC and JDBC and APIs for C, Java, .NET, etc. |

Table 1. Comparison of XML and SQL

When a use case requires deep XML support, the requirements for a transactional DBMS from List 1 are necessary, as are the capabilities in the XML column of Table 1. There is a product that provides the features of a DBMS and deep support for XML: the Native XML Server. Details on the evolution of this product category are outside the scope of this paper, but Michael Kay provides an informed perspective on this issue from work with Software AG's Tamino XML Server [XML Databases]. The XML tools provided by relational database vendors simply do not solve all of the issues created by the fact that XML and SQL are inherently different technologies created with very different goals and purposes. The need for technology designed for XML is clear in the discussion of specific XML applications and use cases.

3. XML Application Use Cases

XML is useful in both content and integration applications. XML's capability to separate information content from presentation details renders it very useful for content applications, because XML is a neutral document representation. This same capability is useful for integration because of the ability to query the XML, transform it, and route it according to the information content. Content applications can use XML to provide a consistent model to represent the document's content for computer processing needs and apply whatever presentation is appropriate for a particular application. The benefits of a structured format that could be parsed easily also made XML useful for integration between different computer systems. XML has already been widely adopted into both scenarios. Software AG has seen the relevance of these use cases and others through the use of Tamino by over 500 customers.

Because of XML's document oriented nature and its use in data exchange, XML applications are primarily based around two conceptions of XML: "XML as a document" and "XML as a transient message." Content use cases are centered on applications where an XML document is the focus of activity. The primary point of a content application is to manage the documents and the execution of a workflow related to the processing of the documents. Three use cases in this area are Web Content Management, Specialized Document Management, and Business Documents (eForms).

XML also provides value in integration use cases. Although XML is not the only technology used in integration, it provides value because it is based on standards and is widely adopted as an interface to other systems. The primary point of an XML integration application is to manage the process whereby systems are connected together and how XML is routed from system to system. Two use cases for integration with XML are Service-Oriented Integration and the Metadata Repository use case.

3.1. Content Use Cases

3.1.1. Web Content Management

Web Content Management (WCM) applications are used to manage the content of a web site. XML is useful to WCM because it provides a way to define the structure of content to be displayed. Moreover XML can be transformed to fit multiple delivery channels such as PDF and mobile devices. An XML structure can be defined with a template that specifies a DTD or a Schema of required elements for the content. This assists with editorial and workflow requirements because it enforces a particular structure of the content. The submission process for XML2003 is a great example of this. The XML content needs to be stored reliably and exchanged with other systems. Moreover, it is useful to be able to query the contents of the XML documents to support the web site.
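
The two ideas in this use case, a template that enforces required elements and one content source feeding several delivery channels, can be sketched as follows. This is a simplified stand-in, not DTD or XML Schema validation: the required-element list, the article vocabulary, and the function names are all hypothetical, and the Python standard library is used only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical template: elements every article must contain.
# A real system would express this as a DTD or an XML Schema.
REQUIRED = ("title", "author", "body")


def missing_elements(article):
    """Return the required elements that an article lacks."""
    return [tag for tag in REQUIRED if article.find(tag) is None]


def to_html(article):
    """Web channel: render the content as HTML."""
    title = escape(article.findtext("title"))
    body = escape(article.findtext("body"))
    return f"<h1>{title}</h1><p>{body}</p>"


def to_text(article):
    """Text channel: render the same content without markup."""
    return f"{article.findtext('title').upper()}\n{article.findtext('body')}"


good = ET.fromstring(
    "<article><title>Use Cases</title><author>B.</author>"
    "<body>XML &amp; content</body></article>"
)
draft = ET.fromstring("<article><title>Draft</title></article>")
```

An editorial workflow could refuse to publish `draft` until `missing_elements(draft)` returns an empty list, while `good` can be sent to either channel unchanged.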

Stellent is an example of a company with an evolved XML strategy. Stellent markets and sells a Universal Content Management server that uses Tamino XML Server as a repository for XML content. Stellent gains these benefits from an XML Server: highly streamlined content exchange, enhanced search and retrieval capabilities and enterprise-level scalability. A Native XML Server makes it simpler for Stellent to implement XML capabilities than doing everything in SQL.

3.1.2. Specialized Document Management

XML can be used to model complex document structures such as technical manuals, contracts, records, and financial statements. All of these documents have very strict structures that need to be enforced for legal and usability purposes. Moreover, they contain a large amount of complex and inter-related information. Efforts to model these documents using SQL generally require months of time because the documents do not fit within the relational model very well. Customers have seen that a relational approach can take 5 times as long as an XML approach because the relational model is not a good fit for documents.

An example of specialized document management is Single-Source Content Management. This approach is relevant when it is desired to reuse content at a fine-grained level and manage the workflow of content to several different channels. It is an especially useful approach for managing Interactive Electronic Technical Manuals (IETMs). An IETM is used to present information for the use and maintenance of complex products, such as airplanes and turbines. X.Systems has built a Single-Source Content Management system around Tamino XML Server called GemT. GemT uses Tamino as the repository for all content and as a means to integrate content with other systems of record. GemT uses XML to represent the complex information of the IETM because of XML's flexibility for modeling complex document data and for its ability to represent linked and hierarchical information, such as links between information about specific machinery pieces and instructions to maintain them.
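
The combination of hierarchy and links described for an IETM can be sketched in a few lines. The manual below, its element and attribute names, and both functions are hypothetical and are not taken from GemT or Tamino; the sketch only assumes that procedures refer to components by an identifier.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual: nested components plus procedures that link to them.
MANUAL = ET.fromstring("""<manual>
  <component id="c1"><name>Fuel pump</name>
    <component id="c2"><name>Filter</name></component>
  </component>
  <procedure id="p1" appliesTo="c2"><title>Replace filter</title></procedure>
  <procedure id="p2" appliesTo="c1"><title>Inspect pump</title></procedure>
</manual>""")


def procedures_for(component_id):
    """Follow the link from a component to its maintenance procedures."""
    path = f"procedure[@appliesTo='{component_id}']"
    return [p.findtext("title") for p in MANUAL.findall(path)]


def component_path(component_id):
    """Names from the top-level component down to the requested one."""
    def walk(node, trail):
        for child in node.findall("component"):
            here = trail + [child.findtext("name")]
            if child.get("id") == component_id:
                return here
            found = walk(child, here)
            if found:
                return found
        return None
    return walk(MANUAL, [])
```

For example, `procedures_for("c2")` resolves the link to the filter's procedure, and `component_path("c2")` recovers where the filter sits in the assembly hierarchy.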

Contracts are another example of a complex document that must be modeled and managed using a flexible data model. Nextance has developed a contract management system on top of Tamino that can model and manage contracts using XML. Nextance is able to implement a Contract Management System for their clients much more quickly than it could using a relational model. Modeling a complex contract as a series of tables can take 30-40 or more tables. The same contract can be modeled in XML as a single document whose content and structure are easier to understand because the structure is human-readable. Instead of querying through a complex set of joined tables, it is possible to directly query the document structure. Tamino is integrated with existing back-end enterprise applications and relational databases in Nextance customer implementations.

3.1.3. Business Documents (eForms)

There is an increasing need to manage the business documents that define business transactions, chiefly forms. Insurance forms, application forms, time sheet forms, and mortgage documents are all examples of business documents. Vertical industries based around documents have defined schemas for their business processes. There are several examples from the financial services industry such as ACORD for insurance, MISMO for mortgage, FpML for financial products, and XBRL for reporting. Moreover, technology companies such as Adobe, Microsoft, and PureEdge have defined eForm solutions that rely on XML infrastructure to model the data of the form.

These companies all offer products to connect an XML document and an XML schema, such as one of the schemas of a vertical industry, to an electronic form. The electronic form provides specific presentation information and can embed business logic relevant to the application that assists with automatic calculation and validation. When the form is filled out it can be submitted to a server application that will process the XML. This enables the automation of paper-based workflow processes and the better integration of existing applications. Many scenarios involving eForms will require the data to be reflected back into a system of record, often a relational database. In these scenarios, the XML Server can act as an XML warehouse or an operational data store for the processing of the form along an integration workflow before the data is committed to the system of record.

With an XML Server, it is immediately possible to store the XML document of the form. The XML form can be stored with or without a defined schema. Once the data is stored in the XML Server, it is possible to query the information, generate reports, and provide web service interfaces to other systems. The XML Server is a specialized application server for storing the XML, processing it, keeping an audit log, and then updating other backend systems as necessary.
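
The store, query, and report steps described above can be sketched with an in-memory stand-in for the XML Server. The `FormStore` class, the timesheet vocabulary, and the report function are hypothetical and do not represent the Tamino API; the forms are stored without any schema, and only well-formedness is checked.

```python
import xml.etree.ElementTree as ET


class FormStore:
    """Hypothetical in-memory stand-in for an XML Server collection."""

    def __init__(self):
        self._docs = {}
        self.audit_log = []

    def store(self, doc_id, xml_text):
        # Parsing rejects ill-formed XML; no schema is required.
        self._docs[doc_id] = ET.fromstring(xml_text)
        self.audit_log.append(("store", doc_id))

    def query(self, path):
        """Return (doc_id, text) for every element matching the path."""
        return [(doc_id, el.text)
                for doc_id, root in self._docs.items()
                for el in root.findall(path)]

    def documents(self):
        return list(self._docs.values())


def hours_report(store):
    """Total hours per employee across all stored timesheet forms."""
    totals = {}
    for root in store.documents():
        name = root.findtext("employee")
        hours = sum(float(h.text) for h in root.findall("entry/hours"))
        totals[name] = totals.get(name, 0.0) + hours
    return totals


store = FormStore()
store.store("t1", "<timesheet><employee>ann</employee>"
                  "<entry><hours>8</hours></entry>"
                  "<entry><hours>4</hours></entry></timesheet>")
store.store("t2", "<timesheet><employee>bob</employee>"
                  "<entry><hours>7.5</hours></entry></timesheet>")
report = hours_report(store)
```

In a real deployment the audit log and the documents would be persisted transactionally; the list and dictionary here only mark where those responsibilities sit.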

3.2. Integration Use Cases

3.2.1. Service-Oriented Integration (SOI)

A new form of integration architecture is rapidly evolving: Service-Oriented Architecture (SOA). SOA is an architecture for enterprise applications centered around the idea of software applications providing a service and that service providing a standard interface based on an XML document. SOA relies on XML as the data format, Web Services as a transport protocol, and supporting XML technologies to model an XML integration. At the core of SOA is the ability to flexibly integrate systems through a loosely-coupled model. Service-Oriented Integration (SOI) applies the concepts of SOA to integration. SOI is useful when it is necessary to integrate several heterogeneous systems without replacing those systems. A common requirement is to provide real-time access to data from several systems, customers, and partners.

Implementing SOI requires several integration tools for transporting and transforming XML data. XML must be retrieved from multiple sources, often by mapping the XML from a different interface (such as a CICS transaction or a packaged enterprise application). The processing of the XML for the integration may require complex business logic involving a series of transformations as well as intermediate persistence for the purpose of executing the process. An XML Server is a core requirement for this as an intermediate store of the data for the purposes of auditing, logging, and querying. Software AG has implemented this architecture for numerous customers.
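
The flow described here (map a legacy interface to XML, transform it, and persist each intermediate result for auditing) can be sketched as a small pipeline. Everything in the sketch is an assumption made for illustration: the fixed-width record layout, the element names, and the in-memory list that stands in for the intermediate XML store. It does not represent EntireX or Tamino interfaces.

```python
import xml.etree.ElementTree as ET

audit_store = []  # hypothetical stand-in for the intermediate XML store


def persist(step, element):
    """Keep a serialized copy of the XML at each step for later auditing."""
    audit_store.append((step, ET.tostring(element, encoding="unicode")))


def from_legacy(record):
    """Map a fixed-width legacy record (hypothetical layout) to XML."""
    shipment = ET.Element("shipment", id=record[0:4].strip())
    ET.SubElement(shipment, "status").text = record[4:].strip()
    return shipment


def to_partner_format(shipment):
    """Transform the internal document into a partner-facing one."""
    notice = ET.Element("statusNotice")
    ET.SubElement(notice, "ref").text = shipment.get("id")
    ET.SubElement(notice, "state").text = shipment.findtext("status").lower()
    return notice


def run(record):
    shipment = from_legacy(record)
    persist("received", shipment)
    notice = to_partner_format(shipment)
    persist("transformed", notice)
    return notice


result = ET.tostring(run("S042DELIVERED "), encoding="unicode")
```

Because every intermediate document is retained, a failed or disputed exchange can be examined after the fact by querying the stored XML rather than reconstructing it from the source systems.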

A major logistics company, North American Van Lines, was able to use service-oriented integration to expand the use of its EDI connections with its partners. In addition to an XML Server, XML integration middleware called EntireX was used to provide XML interfaces to its legacy systems. Tamino XML Server was used as an operational data store for XML to be exchanged with customers via a website. Tamino is used to support a new XML application for the auditing of the XML transactions. XML is being used instead of EDI for new partner connections because of its ability to enable real-time access to information through multiple channels, especially the web and mobile phones, and because it is cost-effective to implement.

3.2.2. Metadata Repository

Many companies have tens or sometimes hundreds of different systems of record for different purposes. It is very difficult to determine what data is stored where. One system may store product data, another stores customer data. The data is also fragmented across different data stores. Making sense of all of this different data requires a metadata repository that can catalogue the different data stores and different semantic meanings of the data.

An XML server is a natural fit for a metadata repository because of XML's ability to model complex hierarchical data and links between different pieces of information. Because the structure of data to be managed often involves hierarchies and links, the hierarchical data model of XML is very effective. In this use case, the XML Server stores all of the metadata and provides the ability to map the elements of different data stores into a coherent set of documents for query and analysis. The XML Server can also act as a warehouse to store the data in an integrated form and can be used to query and analyze all of the different data sources.
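
A metadata catalogue of this kind can be sketched as a single XML document that records which field in which data store carries a given business concept. The catalogue vocabulary, the store and field names, and the `locate` function are hypothetical and serve only to show how hierarchical metadata can answer the question of what data is stored where.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: stores contain entities, entities contain fields,
# and each field is tagged with the business concept it represents.
CATALOG = ET.fromstring("""<catalog>
  <store name="crm" type="relational">
    <entity name="customer">
      <field name="email" concept="customer.email"/>
      <field name="cust_no" concept="customer.id"/>
    </entity>
  </store>
  <store name="billing" type="mainframe">
    <entity name="ACCOUNT">
      <field name="CUSTID" concept="customer.id"/>
    </entity>
  </store>
</catalog>""")


def locate(concept):
    """Return (store, entity, field) for every place a concept is stored."""
    hits = []
    for store in CATALOG.findall("store"):
        for entity in store.findall("entity"):
            for field in entity.findall(f"field[@concept='{concept}']"):
                hits.append((store.get("name"), entity.get("name"),
                             field.get("name")))
    return hits
```

Calling `locate("customer.id")` shows that the customer identifier lives under different names in two systems, which is the kind of mapping a metadata repository exists to expose.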

4. Native XML Server as Operational Data Store

The above use cases show a clear need for XML to be supported deeply. It is clear from these examples that a Native XML Server can be useful in combination with a relational database. In such situations the Native XML Server is an operational data store and not the system of record. A system of record is the primary data store for an application. An example is an online transaction processing database for a retail web site. An operational data store exists to complement a system of record by storing data that is specific to an application or functional need. An example is a data warehouse or business intelligence database that enables the analysis of data in the retail website by region, product, etc.

This distinction is not new and has been part of database architecture for some time. Native XML servers can function as the system of record, particularly if the application is very document-centric. However, if a system of record is in place it makes sense to continue using it. A Native XML Server is useful in this situation as an operational data store that leverages the investment of the system of record by adding deeper support for XML than the system of record provides by itself. This scenario requires additional product features for the Native XML Server to be effective, as shown in List 2 below.

  1. Little to no administration required. The Native XML Server should not require a full-time DBA to manage the system but should be easy to manage.

  2. Small footprint. The Native XML Server should not require as much CPU and memory as a full-fledged database. The system should still be able to scale to a large amount of data, but it should not require undue amounts of disk space and memory because it is a focused application.

  3. Support for integration capabilities. The Native XML Server should provide basic tools to integrate with other systems especially other relational databases.

  4. Ability to serve as an intermediary. The purpose of the Native XML Server in this use case is not to store all of the back-end transactional data, but to store XML for a specific purpose, such as archiving or a temporary store before the XML is transferred to other systems. It therefore must be straightforward to load and extract XML and transfer it to other systems. High throughput on loading is also important if the Native XML Server will archive data from many different systems.
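
The intermediary role in the last item can be sketched as a staging area that accepts XML, then extracts it and commits it to a relational system of record. The claim documents, the table layout, and the function names are hypothetical; a dictionary stands in for the Native XML Server and SQLite stands in for the system of record, purely so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

staged = {}  # doc_id -> XML text held in the operational data store (stand-in)

# SQLite stands in for the relational system of record.
record_db = sqlite3.connect(":memory:")
record_db.execute("CREATE TABLE claims (id TEXT PRIMARY KEY, amount REAL)")


def load(doc_id, xml_text):
    """Stage a document; ill-formed XML is rejected by the parser."""
    ET.fromstring(xml_text)
    staged[doc_id] = xml_text


def transfer_all():
    """Extract staged documents and commit them in a single transaction."""
    rows = []
    for text in staged.values():
        root = ET.fromstring(text)
        rows.append((root.get("id"), float(root.findtext("amount"))))
    with record_db:  # commits on success, rolls back on error
        record_db.executemany("INSERT INTO claims VALUES (?, ?)", rows)
    staged.clear()
    return len(rows)


load("a", '<claim id="c1"><amount>120.50</amount></claim>')
load("b", '<claim id="c2"><amount>80</amount></claim>')
moved = transfer_all()
committed = record_db.execute(
    "SELECT id, amount FROM claims ORDER BY id").fetchall()
```

Here the staged XML is cleared after transfer; an archiving deployment would instead retain the documents and record that they had been committed.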

The combination of the requirements from List 1, List 2, and the XML column from Table 1 is what makes the Native XML Server a unique product. For use cases where the ability to work with XML deeply and in a rapid manner is important, a Native XML Server is suited to the task as a system of record or as an operational data store.

5. Summary

There are numerous use cases for a Native XML Server in both content and integration scenarios. The requirements for a Native XML Server have several points in common with SQL databases, but the requirements for support of XML capabilities make Native XML Servers a unique product category. The Native XML Server is designed to meet specific requirements for managing XML data and to integrate with existing systems of record, including relational databases. The value of a Native XML Server is clear from seeing how it enables different content and integration use cases.

Acknowledgements

I thank Michael Champion, Klaus Fittges, Trevor Ford, Sebastian Holst, and David Vap for thoughtful discussions on this subject and for their comments.

Bibliography

[XML] Bray, T., et al. (2000), Extensible Markup Language (XML) 1.0. http://www.w3.org/TR/REC-xml

[XML Databases] Kay, Michael (2003), XML Databases. http://www.xmlstarterkit.com/xmlzone/WP_XML_Databases_E.pdf

[XML Repositories] Holst, Sebastian (2003), XML Repositories: An Idea Whose Time Has Finally Come. http://www.gilbane.com/whitepapers.pl?view=10

[XML Schema] Thompson, et al. (2001), XML Schema Part 1: Structures. http://www.w3.org/TR/xmlschema-1/

[XPath] Clark, Jim (1999), XML Path Language (XPath). http://www.w3.org/TR/xpath/

[XQuery] Boag, et al. (2003), XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery/

Biography

As Strategy Manager for Business Development at Software AG, Bryan drives the go-to-market strategy for Software AG's XML products for partners. He focuses on understanding the business and technical benefits of Software AG's Tamino XML Server and EntireX to ISVs and System Integrators. Prior to joining Software AG, Bryan worked as a lead developer for ArsDigita, a pioneering collaborative commerce and content management vendor acquired by Red Hat, and Artesia Technologies, the leading provider of Digital Asset Management software. Bryan has several years of experience in relational database development and XML, and was an early adopter of XML in content and integration applications. Bryan graduated from Northwestern University with a degree in Computer Science.