
Programming to XML: Databinding Silver Bullet
Featured Paper from XML 2001 Conference Proceedings
By: Subrahmanyam Allamaraju, Ph.D.
Senior Engineer, BEA Systems Inc., U.S.A.
Email: subbu@bea.com
Subrahmanyam Allamaraju is a Senior Engineer with BEA Systems
Inc. At BEA, over the last year, he has been exploring and developing
XML-based technologies for solving certain fundamental enterprise programming
requirements. He has coauthored three books and several technical papers on
J2EE and XML. His current focus is on evolving enterprise application
architectures. He holds a Ph.D. in Electrical Engineering from the Indian
Institute of Technology. His technical interests include distributed
technologies and XML. You can find more about him at his web site:
http://www.Subrahmanyam.com.
XML Programming
Unlike most papers that start by stating that XML is ubiquitous
because it is simple and eXtensible, I would like to start on a
contrarian note. It is true that XML is omnipresent today. However, this is
not because XML is simple, self-describing, and eXtensible, but because it is
more practical to use XML as a data format than other possible proprietary
formats. A quick scan of the various domain-specific XML
standardization initiatives for information exchange and information processing
supports this assertion. Nonetheless, there is a rich domain of problems
that the self-describing and extensible nature of XML can solve. Certain
areas of enterprise applications benefit tremendously from the
extensible nature of XML.
Today, there is a variety of technologies for creating and processing
XML. At one end of the spectrum, we have standard APIs like SAX
and DOM, and standards such as XSLT and XPath, with several
implementations of these APIs and specifications. At the other end of the
spectrum, we have emerging technologies such as schema compilers and data binding.
In between are several homegrown XML programming APIs (often built on top
of SAX and DOM) that simplify specific application needs.
This paper is about programming to XML and the role of data binding
technologies. I do not intend to discuss specific APIs or vendor
products. Instead, my aim is to discuss what kind of data binding framework
an application developer needs to create and process XML efficiently and
productively in enterprise applications that exchange XML data.
My focus is on applications that use XML as an
interoperable data interchange format. In this category of applications,
applications (or components thereof) create XML and
exchange it with other applications (or components) to implement business
processes. This is one of the most commonly used approaches for achieving
application integration, both within and across enterprises. What are the
programming requirements for such applications?
These applications are typically built using object-oriented
programming languages, where information is often modelled as domain objects.
These objects encapsulate business information, and various components
manipulate these objects while implementing business use cases.
How do we build such applications when the information is expressed as
XML? This question is similar to the question of how to build objects when
information is stored in and retrieved from relational databases, which
led to the so-called object-to-relational (O-R) mapping problem. The O-R
mapping problem is known to be complex. Despite the availability of several
mapping tools and environments, and the awareness of several design patterns,
O-R mapping remains hard because it attempts to map between two sets of
incompatible concepts. Most applications still deal with SQL directly using
programming-language-level APIs such as JDBC.
The situation in XML programming is similar. At the
heart of the XML processing tools we have implementations of SAX and DOM. The
rest is sweat-and-blood programming. We create XML documents manually, or
sometimes read them from files or network resources. We use SAX and/or DOM to
extract information from XML documents, and we use the DOM API to manipulate
XML.
Despite the versatility of these APIs, they are not the best
possible tools. SAX and DOM follow very "generic" approaches and have narrowly
focused goals. The purpose of SAX is to provide a syntactic
view of XML: it reports markup and character data as they are encountered in a
document. DOM, on the other hand, provides a structural view of XML: it
constructs a tree-like model that we can traverse to reach nodes of interest.
These APIs have different purposes, and neither makes it easy or
productive to process XML. As an application developer, what you are interested
in is not a syntactic or structural view of XML, but data. What you need is a
mechanism to "get" useful information out of XML documents, "set" some
information in XML documents, "create" XML documents, and so on. These
(the words in quotes) are the operations you are interested in, and for
performing them SAX and DOM do not come to your rescue. The typeless nature of
XML, coupled with these APIs, makes XML programming irksome and error-prone.
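To make the point concrete, here is a minimal sketch of what "getting" a single piece of data looks like with the DOM API. The <order> vocabulary, the class name DomOrderReader, and the sample document are all invented for this illustration; only the JAXP and DOM calls are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: the <order> vocabulary is invented for illustration.
final class DomOrderReader {

    static final String SAMPLE =
        "<order id=\"42\">"
      + "<customer>Acme</customer>"
      + "<item sku=\"A1\" quantity=\"2\"/>"
      + "<item sku=\"B7\" quantity=\"5\"/>"
      + "</order>";

    // Sums the quantity attributes of all <item> elements in the document.
    static int totalQuantity(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Element names are plain strings; a typo here fails only at run time.
            NodeList items = doc.getDocumentElement().getElementsByTagName("item");
            int total = 0;
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Every value comes back as a string; conversion is our job.
                total += Integer.parseInt(item.getAttribute("quantity"));
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read order", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(totalQuantity(SAMPLE));
    }
}
```

Nothing in this code knows that an order has items or that a quantity is a number. The names, the casts, and the string-to-integer conversion are all left to the programmer, which is where the errors tend to creep in.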
Data Binding
The grumblings of the early object-oriented programmers developing
database applications led to the notion of object-to-relational mapping. In the
years that followed, several vendors attempted (and continue to attempt) a variety of
O-R mapping solutions. Today, O-R mapping is a viable solution for a certain
class of problems, but it cannot handle all possible mappings between objects
and relational databases. In the same manner, the adoption of XML as a data
format, and the need to create and process XML in object-oriented programming
languages, has led to certain data binding ideas. Information, when expressed
as XML, demands alternative approaches to enable more efficient XML processing
without sacrificing programmer productivity. The idea of data binding is a
result of identifying this need.
At a very fundamental level, data binding is a way of binding
objects with XML, thereby providing an object-view over XML. With data binding,
you can "somehow" create a class hierarchy corresponding to an XML document.
Once such classes are created or made available, XML programming gets simpler.
The generated classes mirror the structure of the underlying XML, and therefore
manipulation of XML resembles usual object-oriented programming. Instead of
parsing or traversing, you can now "get" and "set" attributes and elements as
though you're dealing with first-class objects. This promise is similar to the
promises made by O-R mapping advocates. Before examining the viability of this
promise, let's ask more fundamental questions.
How do we design data binding classes, and how do we implement them?
The idea is fundamentally simple. XML schemas such as DTDs and XSDs define the
structure of XML documents. These schemas specify what the root element is,
what its children (elements and attributes) are, and so on. In addition,
schemas specify certain constraints, such as the number of child elements
of a given type that an element may have, the order of their occurrence, etc.
XSDs add more formal constraints on the types of data, the possible range of
values, etc. So, it is tempting to consider such schemas as defining a "type
system" for the XML documents that adhere to those schemas.
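As a small illustration of a schema acting like a type system, the sketch below checks documents against an XSD that constrains element order, occurrence counts, and a data type. The schema and the class name OrderSchemaCheck are hypothetical. The code uses the standard javax.xml.validation API, which was added to Java well after this paper was written.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class OrderSchemaCheck {

    // Hypothetical schema: an <order> holds one <customer> followed by one to
    // three <item> elements, each with a positive integer quantity.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"customer\" type=\"xs:string\"/>"
      + "<xs:element name=\"item\" maxOccurs=\"3\"><xs:complexType>"
      + "<xs:attribute name=\"quantity\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the document satisfies the schema, false otherwise.
    static boolean conforms(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is invalid", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // structure, order, or data type violated the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A document with the children in the wrong order, or with a quantity of -1, is rejected in much the same way a compiler rejects an ill-typed program. This is the analogy that makes mapping schemas to classes tempting.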
Once there is a type system defining a class of XML documents, the
next temptation is to try to "map" the types into a set of classes with
equivalent associations and constraints. Once a level of equivalency of schemas
and programming language classes has been established, it is quite simple to
automate the mapping process. In simple terms, this is how data binding can be
done. The following are the typical steps involved. The steps may vary
depending on what specific tool/product you're using to generate data binding
classes.
- Create a schema for the XML document that you're attempting to
  bind to.
- Use a schema compiler to generate classes. This step gives you
  a tree of classes, with the root-level class corresponding to the root element
  of the XML document.
- If you're interested in obtaining an object-view of XML, feed
  the XML to an instance of the root class.
- If you're interested in creating an XML document, create
  instances of the generated classes.
- In either case, you can manipulate these objects in the usual
  manner.
- At any time, you can recreate the XML by simply serializing
  these instances. The generated classes provide methods to do so.
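The steps above can be sketched with a hand-written stand-in for a generated class. This is not the output of any real schema compiler. The <order> document, the Order and Item classes, and the method names fromXml and toXml are assumptions made for illustration; actual tools generate different class shapes and method names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hand-written stand-in for what a schema compiler might generate for a
// hypothetical <order> document; not the output of any real tool.
final class Order {

    static final class Item {
        private final String sku;
        private int quantity;
        Item(String sku, int quantity) { this.sku = sku; this.quantity = quantity; }
        String getSku() { return sku; }
        int getQuantity() { return quantity; }
        void setQuantity(int quantity) { this.quantity = quantity; }
    }

    private int id;
    private String customer = "";
    private final List<Item> items = new ArrayList<>();

    int getId() { return id; }
    void setId(int id) { this.id = id; }
    String getCustomer() { return customer; }
    void setCustomer(String customer) { this.customer = customer; }
    List<Item> getItems() { return items; }

    // Builds the object tree from a document ("feed the XML to the root class").
    static Order fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Order order = new Order();
            order.setId(Integer.parseInt(root.getAttribute("id")));
            order.setCustomer(
                root.getElementsByTagName("customer").item(0).getTextContent());
            NodeList nodes = root.getElementsByTagName("item");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                order.getItems().add(new Item(
                    el.getAttribute("sku"), Integer.parseInt(el.getAttribute("quantity"))));
            }
            return order;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid order document", e);
        }
    }

    // Serializes the object tree back to XML.
    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<order id=\"").append(id).append("\">");
        sb.append("<customer>").append(escape(customer)).append("</customer>");
        for (Item item : items) {
            sb.append("<item sku=\"").append(escape(item.getSku()))
              .append("\" quantity=\"").append(item.getQuantity()).append("\"/>");
        }
        return sb.append("</order>").toString();
    }

    // Escapes the characters that are special in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Order order = Order.fromXml(
            "<order id=\"42\"><customer>Acme</customer><item sku=\"A1\" quantity=\"2\"/></order>");
        order.getItems().get(0).setQuantity(3); // ordinary object manipulation
        System.out.println(order.toXml());
    }
}
```

Client code now works with getItems() and setQuantity(int) rather than node lists and strings, so the compiler can catch a misspelled name or a wrongly typed value. The parsing and serialization details are confined to the binding class.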
The figure below summarizes this process.
The following are some of the tools available to perform such data
binding:
- Java Architecture for XML Binding (JAXB). JAXB is DTD-driven.
  By taking DTDs and a binding schema (that specifies additional information),
  the JAXB schema compiler generates Java classes.
- Microsoft XSD compiler. This schema compiler generates classes
  in C# and Visual Basic from schemas described using XSD.
- Castor XML Source Code Generator. This is another schema
  compiler, freely available from Castor.org. This compiler generates Java classes
  from schemas described using XSD.
- Breeze XML Studio. This is yet another schema compiler, which
  generates Java classes from both DTDs and XSDs.
Refer to the bibliography section for further information and
hyperlinks to these technologies.
Data binding is sought after because it provides a
"typed" and more convenient programming model for processing XML. While SAX and DOM
are very generic, data binding gives you specialized classes for
processing XML. Besides being more convenient, typed programming is very
effective because it lets you design, build, and maintain enterprise applications using
classes and interfaces. Given these advantages, we may expect to see more and
more such products over the next year.
Yet Another Silver Bullet?
Data binding is essential, but as application developers, you
should be aware of what its intended purpose is, what its limits are, and when
not to use schema compilers to provide data binding.
- Extensibility: As mentioned at the beginning of this paper,
  most applications today do not exploit the self-describing nature of XML.
  However, there are several scenarios in the enterprise that can benefit
  tremendously from the metadata contained in XML documents. One such area is
  enterprise application integration. For application integration to be
  successful and maintainable, it is important to keep the applications loosely
  coupled. The coupling between applications should be limited to the information
  needs of each application. The same applies to the various components within a
  given application. However, classes generated by a schema compiler do not
  help meet this goal. When you use DTDs or XSDs for code generation, any change in
  the XML requires regenerating all the data binding classes and redeploying them
  on all the applications. This affects every application, including those
  that are in no way interested in the changes. Extending this argument
  further, effective enterprise application integration requires a less rigid
  interpretation of schemas.
- XML without DTDs or XSDs: There are a variety of applications
  that use XML in a semi-structured fashion. One scenario in which this can
  happen is when each application uses a slightly different XML variant, although
  conceptually such documents are equivalent. Due to the "all-or-none" nature of
  XML parsers, it is difficult to define a unifying DTD or XSD for such XML
  documents, which rules out any schema-driven code generation.
- Conceptual mismatch: In addition to the above two, there are
  fundamental fallacies in schema compilation approaches. This is because
  neither DTDs nor XSDs have a complete one-to-one mapping with
  object-oriented programming language concepts. While it is easy to map simple
  concepts such as parent-child associations to aggregations, concepts such as
  derivation by restriction have no counterparts in programming languages.
  Similarly, what does a processing instruction mean in a programming language?
  Therefore, a "perfect" and "complete" round-trip between XML and objects is not
  possible with the current programming languages.
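As one illustration of the loose coupling discussed under extensibility, the sketch below shows a consumer that reads only the data it needs and ignores the rest of the document. It is one possible reading of a "less rigid interpretation", not a technique proposed in this paper. The <order> vocabulary and the class name CustomerNameReader are hypothetical; the XPath calls are the standard javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical consumer that depends only on /order/customer.
final class CustomerNameReader {

    // Returns the customer name; any other content in the document is ignored.
    static String customerOf(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/order/customer",
                                  new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read order", e);
        }
    }

    public static void main(String[] args) {
        String v1 = "<order><customer>Acme</customer></order>";
        // A later variant adds elements this consumer knows nothing about.
        String v2 = "<order><priority>high</priority><customer>Acme</customer>"
                  + "<shipping carrier=\"air\"/></order>";
        System.out.println(customerOf(v1));
        System.out.println(customerOf(v2));
    }
}
```

Both variants yield the same result without regenerating or redeploying anything, because this consumer is coupled only to the one path it reads. The price is that it gives up the compile-time typing that generated classes provide.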
These situations limit the possible range of applications that may
benefit from data binding. I would like to conclude this paper with two
remarks:
One possible alternative is to rethink how XML and
schemas are interpreted. Current notions of XML schemas encourage a global and
rigid view of XML, thus making it strongly typed. Can we revisit these views
and derive alternatives?