XML 2001

Programming to XML - Data Binding Silver Bullet

Approaches and Alternatives

Subrahmanyam Allamaraju, Ph.D. <subbu@bea.com>

ABSTRACT

With the proliferation of XML across a broader range of applications than originally imagined, XML programming requirements have changed over the past couple of years. Of the various approaches and APIs available for creating and processing XML, data binding is one of the key technologies that addresses certain XML programming needs. The purpose of data binding is to provide programming language bindings for XML documents. The goal of this paper is to discuss XML programming needs in general, and to identify a broader range of data binding needs. Some of the questions discussed in this paper include: What is data binding? What are the types of data binding? What kinds of problems does it address? What kinds of problems does it not address? What are the assumptions and limitations of static data binding? Are there alternatives? In answering these questions, this paper goes back to the fundamentals of XML and XML programming needs in various application areas, and explores possible alternative data binding needs and approaches.


1. XML Programming

Unlike most papers that start by stating that XML is ubiquitous because it is simple and eXtensible, I would like to start on a contrary note. It is true that XML is omnipresent today. However, this is not because XML is simple, self-describing, and eXtensible, but because it is more practical to use XML as a data format than other possible proprietary formats. In fact, a quick scan of the various domain-specific XML standardization initiatives for information exchange and information processing confirms this assertion. Nonetheless, there is a rich domain of problems that the self-describing and extensible nature of XML can help solve. There are certain areas of enterprise applications that benefit tremendously from the extensible nature of XML.

Today, there are a variety of technologies to create and process XML. On one end of the spectrum, we have standard APIs like SAX and DOM, and standards such as XSLT and XPath. There are several implementations of these APIs and specifications. On the other end of the spectrum, we have emerging technologies such as schema compilers and data binding. In between, there are several homegrown XML programming APIs (often built on top of SAX and DOM) that simplify specific application needs.

This paper is about programming to XML and the role of data binding technologies. In this paper, I do not intend to discuss specific APIs or vendor products. Instead, my aim is to discuss what kind of data binding framework an application developer needs to create and process XML efficiently and productively in enterprise applications that exchange XML data.

In this paper, my focus is on applications that use XML as an interoperable data interchange format. In this category of applications, XML is used as a data format: applications (or components thereof) create XML and exchange it with other applications (or components) to implement business processes. This is one of the most commonly used approaches for achieving application integration, within and across enterprises. What are the programming requirements for such applications?

These applications are typically built using object-oriented programming languages, where information is often modelled as domain objects. These objects encapsulate business information, and various components manipulate these objects while implementing business use cases. How should such applications be built when the information is expressed in the form of XML? This question is similar to the question of how to build objects when information is stored in and retrieved from relational databases. That question led to the so-called object-to-relational (O-R) mapping problem, which is known to be hard. Despite the availability of several mapping tools and environments, and the awareness of several design patterns, O-R mapping remains complex because it attempts to map between two sets of incompatible concepts. Most applications still deal with SQL directly, using programming language level APIs such as JDBC.

The situation in the case of XML programming is similar. At the heart of the XML processing tools we have implementations of SAX and DOM. The rest is sweat-and-blood programming. We create XML documents manually, or sometimes read them from files or network resources. We use SAX and/or DOM to extract information from XML documents, and the DOM API to manipulate XML. Despite the versatility of these APIs, they are not the best possible tools. SAX and DOM follow very "generic" approaches, and have very focused goals. For instance, the purpose of SAX is to provide a syntactical view of XML: it reports parsing events, such as the start and end of each element and runs of character data, as they are encountered in a document. DOM, on the other hand, provides a structural view of XML: it constructs a tree that we can traverse to reach nodes of interest. These APIs have different purposes, and neither makes it easy and productive to process XML. As an application developer, what you're interested in is not a syntactical or structural view of XML, but data. What you need is a mechanism to "get" useful information out of XML documents, "set" some information in XML documents, "create" XML documents, and so on. These quoted operations are what you are interested in, and for performing them, SAX and DOM do not come to our rescue. The typeless nature of XML, coupled with these APIs, makes XML programming irksome and error-prone.
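As a minimal sketch of the "sweat and blood" involved, the following Java program computes an order total with the standard DOM API (javax.xml.parsers and org.w3c.dom). The order document, its element names, and the class DomOrderReader are hypothetical and exist only for illustration. Note how navigation relies on unchecked string names and every value has to be converted from text by hand.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomOrderReader {
    // Hypothetical purchase order, used only for illustration.
    static final String ORDER_XML =
          "<order id=\"A-17\">"
        + "<item sku=\"X1\"><quantity>3</quantity><price>9.50</price></item>"
        + "<item sku=\"Y2\"><quantity>1</quantity><price>20.00</price></item>"
        + "</order>";

    /** Computes the order total by walking the DOM tree by hand. */
    static double orderTotal(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse order", e);
        }

        double total = 0.0;
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // Element names are plain strings the compiler cannot check: a typo
            // such as "quantty" makes item(0) return null and fails only at run time.
            String qty = item.getElementsByTagName("quantity").item(0).getTextContent();
            String price = item.getElementsByTagName("price").item(0).getTextContent();
            // Every value arrives as untyped text and must be converted by hand.
            total += Integer.parseInt(qty.trim()) * Double.parseDouble(price.trim());
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(orderTotal(ORDER_XML)); // prints 48.5
    }
}
```

None of this code is about the business question (the total); nearly all of it is about locating nodes and converting strings.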

2. Data Binding

The grumblings of early object-oriented programmers developing database applications led to the notion of object-to-relational mapping. In later years, several vendors attempted (and continue to attempt) a variety of O-R mapping solutions. Today, O-R mapping is a viable solution for a certain class of problems, but it cannot handle all possible mappings between objects and relational databases. In the same manner, the adoption of XML as a data format, and the need to create and process XML in object-oriented programming languages, has led to certain data binding ideas. Information expressed as XML demands alternative approaches that enable more efficient XML processing without sacrificing programmer productivity. The idea of data binding is the result of identifying this need.

At a very fundamental level, data binding is a way of binding objects with XML, thereby providing an object-view over XML. With data binding, you can "somehow" create a class hierarchy corresponding to an XML document. Once such classes are created or made available, XML programming gets simpler. The generated classes mirror the structure of the underlying XML, and therefore manipulation of XML resembles usual object-oriented programming. Instead of parsing or traversing, you can now "get" and "set" attributes and elements as though you're dealing with first-class objects. This promise is similar to the promises made by O-R mapping advocates. Before examining the viability of this promise, let's ask more fundamental questions.

How should data binding classes be designed and implemented? The idea is fundamentally simple. XML schemas such as DTDs and XSDs define the structure of XML documents. These schemas specify what the root element is, what its children (elements and attributes) are, and so on. In addition, the schemas specify certain constraints, such as the number of child elements of a given type that an element may have, the order of their occurrence, etc. XSDs add more formal constraints, such as data types and ranges of allowed values. So, it is tempting to consider such schemas as defining a "type system" for the XML documents that adhere to them.
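To make the "type system" idea concrete, here is a minimal sketch that validates documents against a small XSD using the standard javax.xml.validation API. The order/item/quantity vocabulary and the class OrderSchemaCheck are hypothetical. The schema states three constraints of the kinds described above: an order must contain at least one item, each item must carry a sku attribute, and quantity must be an xs:positiveInteger.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OrderSchemaCheck {
    // Hypothetical XSD: 1..n items, a required sku attribute, positive quantities.
    static final String ORDER_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"item\" minOccurs=\"1\" maxOccurs=\"unbounded\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"quantity\" type=\"xs:positiveInteger\"/>"
        + "</xs:sequence>"
        + "<xs:attribute name=\"sku\" type=\"xs:string\" use=\"required\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compile the schema once; a broken schema is a programming error.
    private static final Schema SCHEMA = compile(ORDER_XSD);

    private static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("invalid schema", e);
        }
    }

    /** Returns true if the document satisfies every constraint in ORDER_XSD. */
    static boolean isValid(String xml) {
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><item sku=\"X1\"><quantity>3</quantity></item></order>")); // true
        System.out.println(isValid("<order><item sku=\"X1\"><quantity>0</quantity></item></order>")); // false: not positive
        System.out.println(isValid("<order/>"));                                                     // false: no item
    }
}
```

Validation only accepts or rejects a document; it does not yet give the programmer typed objects. That is the gap a schema compiler tries to fill.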

Once there is a type system defining a class of XML documents, the next temptation is to try to "map" the types into a set of classes with equivalent associations and constraints. Once an equivalence between schema types and programming language classes has been established, it is quite simple to automate the mapping process. In simple terms, this is how data binding can be done. The following are the typical steps involved. The steps may vary depending on the specific tool or product you're using to generate data binding classes.

  1. Create a schema for the XML document that you're attempting to bind to.

  2. Use a schema compiler to generate classes. This step gives you a hierarchy of classes, with the root class corresponding to the root element of the XML document.

  3. If you're interested in obtaining an object-view of XML, feed the XML to an instance of the root class.

  4. If you're interested in creating an XML document, create instances of the generated classes.

  5. In either case, you can manipulate these objects in the usual manner.

  6. At any time, you can recreate the XML by simply serializing these instances. The generated classes provide methods to do so.
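Steps 3 through 6 can be sketched with classes written by hand in the style a schema compiler might produce for the order documents used earlier. The classes Order and Item and the methods unmarshal and marshal are hypothetical names chosen for this illustration; they are not the API of JAXB, Castor, or any other product listed below. Internally the sketch uses DOM for parsing and a javax.xml.transform identity transform for serialization.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical bound class for <item sku="..."><quantity>n</quantity></item>. */
final class Item {
    private final String sku;
    private int quantity;

    Item(String sku, int quantity) { this.sku = sku; setQuantity(quantity); }

    String getSku() { return sku; }
    int getQuantity() { return quantity; }

    void setQuantity(int quantity) {
        // Mirrors an xs:positiveInteger constraint assumed in the schema.
        if (quantity < 1) throw new IllegalArgumentException("quantity must be positive");
        this.quantity = quantity;
    }
}

/** Hypothetical root class corresponding to the <order> element. */
final class Order {
    private final List<Item> items = new ArrayList<>();

    List<Item> getItems() { return items; }

    /** Step 3: feed XML to the root class to obtain an object view. */
    static Order unmarshal(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot bind document", e);
        }
        Order order = new Order();
        NodeList nodes = doc.getDocumentElement().getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String qty = e.getElementsByTagName("quantity").item(0).getTextContent();
            order.items.add(new Item(e.getAttribute("sku"), Integer.parseInt(qty.trim())));
        }
        return order;
    }

    /** Step 6: serialize the objects back to XML. */
    String marshal() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("order");
            doc.appendChild(root);
            for (Item item : items) {
                Element e = doc.createElement("item");
                e.setAttribute("sku", item.getSku());
                Element q = doc.createElement("quantity");
                q.setTextContent(Integer.toString(item.getQuantity()));
                e.appendChild(q);
                root.appendChild(e);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("cannot serialize order", e);
        }
    }

    public static void main(String[] args) {
        Order order = Order.unmarshal(
            "<order><item sku=\"X1\"><quantity>3</quantity></item></order>"); // step 3
        order.getItems().get(0).setQuantity(5);                          // step 5
        order.getItems().add(new Item("Y2", 1));                         // steps 4 and 5
        System.out.println(order.marshal());                             // step 6
    }
}
```

Application code in main only sees typed getters and setters; the node traversal and string conversion are confined to unmarshal and marshal, which is the part a schema compiler would generate.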

The figure below summarizes this process.

The following are some of the tools available to perform such data binding:

  1. Java Architecture for XML Binding (JAXB). JAXB is DTD-driven. Given DTDs and a binding schema (which specifies additional binding information), the JAXB schema compiler generates Java classes.

  2. Microsoft XSD compiler. This schema compiler generates C# and Visual Basic classes from XSD schemas.

  3. Castor XML Source Code Generator. This is another schema compiler, freely available from Castor.org. It generates Java classes from XSD schemas.

  4. Breeze XML Studio. This is yet another schema compiler, which generates Java classes from both DTDs and XSDs.

Refer to the bibliography section for further information and hyperlinks to these technologies.

Data binding is sought after because it provides a "typed" and more convenient programming model for processing XML. While SAX and DOM are very generic, data binding gives you very specialized classes for processing XML. Besides being more convenient, typed programming is very effective, as it makes it easier to design, build, and maintain enterprise applications using classes and interfaces. Given these advantages, we may expect to see more and more products over the next year.

3. Yet Another Silver Bullet?

Data binding is essential, but as an application developer, you should be aware of its intended purpose, its limits, and when not to use schema compilers to provide data binding.

  1. Extensibility: As mentioned at the beginning of this paper, most applications today do not exploit the self-describing nature of XML. However, there are several scenarios in the enterprise that can benefit tremendously from the metadata contained in XML documents. One such area is enterprise application integration. For application integration to be successful and maintainable, it is important to keep the applications loosely coupled: the coupling between applications should be limited to the information needs of each application. The same applies to the various components within a given application. Classes generated by schema compilers, however, do not help meet this goal. When you use DTDs or XSDs for code generation, any change in the XML requires regenerating all the data binding classes and redeploying them in all the applications. This affects every application, including those that are in no way interested in the changes. Extending this argument further, effective enterprise application integration requires a less rigid interpretation of schemas.

  2. XML without DTDs or XSDs: A variety of applications use XML in a semi-structured fashion. This can happen, for instance, when each application uses a slightly different XML variant, although conceptually the documents are equivalent. Due to the "all-or-none" nature of validating XML parsers, which accept or reject a document as a whole, it is difficult to define a unifying DTD or XSD for such XML documents, thereby ruling out schema-compiler-driven code generation.

  3. Conceptual mismatch: In addition to the above two, there are fundamental problems with schema compilation approaches, because neither DTDs nor XSDs map one-to-one onto object-oriented programming language concepts. While it is easy to map simple concepts, such as parent-child associations, to aggregations, concepts such as derivation by restriction have no counterparts in programming languages. Similarly, what does a processing instruction mean in a programming language? Therefore, a "perfect" and "complete" round trip between XML and objects is not possible with current programming languages.
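To illustrate the coupling problem in the first two points, consider a hypothetical consumer that needs only one piece of information, the shipping city. The sketch below reads it with an XPath expression (javax.xml.xpath) instead of a full set of bound classes, so it keeps working when unrelated elements are added or the document is rearranged, whereas classes generated from a schema for the first variant would, as argued above, have to be regenerated. The class ShippingConsumer and both document variants are assumptions made for this example; it illustrates a less rigid reading of XML, not a solution proposed in this paper.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ShippingConsumer {
    /**
     * Hypothetical consumer that needs only the shipping city. The expression
     * //shipTo/city matches wherever shipTo appears, so elements this consumer
     * does not care about can be added or rearranged without breaking it.
     */
    static String shippingCity(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Returns "" when nothing matches (string value of an empty node-set).
            return xpath.evaluate("//shipTo/city", new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot read document", e);
        }
    }

    public static void main(String[] args) {
        // Original variant of the document.
        String v1 = "<order><shipTo><city>Austin</city></shipTo></order>";
        // Later variant: a new element and an extra wrapper have been introduced.
        String v2 = "<order><priority>high</priority><delivery><shipTo>"
                  + "<city>Austin</city><zip>78701</zip></shipTo></delivery></order>";
        System.out.println(shippingCity(v1)); // prints Austin
        System.out.println(shippingCity(v2)); // prints Austin
    }
}
```

The trade-off is that the consumer gives up the typed object view that data binding offers; the value comes back as a string, just as with DOM.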

These situations limit the possible range of applications that may benefit from data binding. I would like to conclude this paper with two remarks:

One possible alternative is to rethink how XML and schemas are interpreted. Current notions of XML schemas encourage a global and rigid view of XML, thus making it strongly typed. Can we revisit these views and derive alternatives?

Bibliography

[1] Java Architecture for XML Binding (JAXB), http://java.sun.com/xml/jaxb/index.html.
[2] XSD Compiler, .NET Development, http://msdn.microsoft.com.
[3] Breeze XML Studio, http://www.breezefactor.com
[4] Castor Source Code Generator, http://www.castor.org/sourcegen.html

Biography

Subrahmanyam Allamaraju, Ph.D.
Senior Engineer
BEA Systems Inc.
U.S.A.
Email: subbu@bea.com

Subrahmanyam Allamaraju is a Senior Engineer with BEA Systems Inc. Over the past year at BEA, he has been exploring and developing XML-based technologies for solving certain fundamental enterprise programming requirements. He has coauthored three books and several technical papers on J2EE and XML. His current focus is on evolving enterprise application architectures. He holds a Ph.D. in Electrical Engineering from the Indian Institute of Technology. His technical interests include distributed technologies and XML. You can find more about him at his web site: http://www.Subrahmanyam.com.