Abstract
The power of Java as a universal programming language, combined with XML as the universal data format, today forms the basis for many highly productive software applications. The standard APIs typically used here are DOM and SAX. The presentation shows how these well-known interfaces can be extended just a little to take them to new levels of applicability. Where mission-critical XML/SGML content is managed within a database system, added value such as access control, locking, check-in/check-out and versioning of entire documents or parts of documents must be achieved, and can be achieved without leaving the "common sense" of DOM and SAX too far behind.
Following an explanation of the chosen approach and a discussion of its pros and cons, a live demo will show how using standard interfaces also allows seamless integration with available open source systems like "Cocoon" for dynamic and flexible publishing from database content.
Virtually everybody who needs to write a program that processes XML-formatted data today (and who doesn't?) uses the Document Object Model (DOM) and/or the Simple API for XML (SAX) programming interfaces to access and manipulate the data. DOM is an official recommendation of the World Wide Web Consortium (W3C) (see http://www.w3.org/DOM), whereas SAX is a "de facto" standard, originally specified as a Java-only API but since ported to many other programming environments (see http://www.saxproject.org).
Because these two APIs follow very different approaches to representing XML content, together they form the perfect "toolbox" for virtually any task involving XML-formatted data. DOM presents the data to a program as "Nodes" of different types and thus provides an object-oriented view of the data structure, which the program works on in its main memory. SAX, on the other hand, is better suited to event-based processing of large data sets that can be handled "one by one", e.g. in transformation tasks.
Because DOM and SAX have such distinct fields of application, for Java they were both bundled into the so-called Java API for XML Processing (JAXP), which has been around for some time and has been an official part of Sun's Java 2 Standard Edition (J2SE) since version 1.4.
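The examples in this paper concentrate on DOM. For contrast, here is a minimal sketch of the event-based SAX style, using only the JAXP classes bundled with the JDK; the class name SaxElementCounter and the sample markup are illustrative assumptions. The handler sees each start tag exactly once and keeps only running totals, so the document itself is never held in memory as a tree.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: counts element occurrences in a single SAX pass. */
final class SaxElementCounter {

    static Map<String, Integer> countElements(String xml) {
        // TreeMap keeps element names sorted for predictable output
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // called once per start tag; only the totals are kept
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // wraps ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException("could not parse input", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<play><act><scene/><scene/></act><act><scene/></act></play>";
        System.out.println(countElements(xml)); // {act=2, play=1, scene=3}
    }
}
```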
Following is an example of how a Java program might access an XML document through the DOM API:
Figure 1.
import org.w3c.dom.*;
import javax.xml.parsers.*;
public class Example1
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        DocumentBuilder aBuilder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document aDocument = aBuilder.parse(
            new java.io.File("Shakespeare.xml"));
        // print name and value of each direct child of the root element
        NodeList fTopNodes = aDocument.getDocumentElement().
            getChildNodes();
        for (int i = 0; (i < fTopNodes.getLength()); ++i) {
            Node aNode = fTopNodes.item(i);
            System.out.println(aNode.getNodeName() +
                " - " + aNode.getNodeValue());
        }
    }
}
Example 1: DOM programming
For classical client-server access from a Java program to a relational database there is, of course, the Java Database Connectivity (JDBC) API. By providing generic, standardized access methods, JDBC allows for as much database-vendor independence as possible when a Java program retrieves data from, or writes data to, a relational database. However, JDBC still allows vendor-proprietary SQL statements to be passed to the database, and thus proprietary database features to be used, for better or for worse.
JDBC is now at version 3.0 and is also an integral part of Sun's J2SE. Following is an example of what JDBC access from a Java program might look like:
Figure 2.
import java.sql.*;
public class Example2
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        // assumes the system property 'jdbc.drivers' names a JDBC
        // driver suited to the run-time environment
        Connection aConnection = DriverManager.getConnection(
            "jdbc:mysql://localhost/Shakespeare", "scott", "tiger");
        ResultSet aResultSet = aConnection.createStatement().
            executeQuery("SELECT * FROM play");
        while (aResultSet.next()) {
            System.out.println(aResultSet.getString("TITLE") +
                ", " + aResultSet.getString("PREMIERE"));
        }
        aConnection.close();
    }
}
Example 2: JDBC programming
With the growing acceptance of XML formats for mission-critical data, the need for secure and reliable storage management of that data becomes more and more apparent. With its "Content Management Suite" (see http://cms.sorman.com), Sörman Information (the company the author works for) provides such a system, combining the virtues of XML (as well as SGML) with the well-established Atomicity, Consistency, Isolation, Durability (ACID) properties of a database management system.
This screenshot (Figure 3) of the so-called "ContentClient" connected to that database might give an idea of the logical database structure:
The tree view on the left shows the overall structure of the database content. The stored documents are organized in a customized folder hierarchy that may resemble a directory structure in the file system. Unlike in the file system, however, the transition from the folder structure into the structure inherent to each document (as defined by the document's DTD) is transparent. The user can traverse down to any component of every document she has access rights to and simply view its content (as shown in the bottom pane), or invoke an action such as check-out/check-in to update only this single document part. The list view on the right shows the last actions performed on the children of the currently selected node (but don't worry: we do not really rewrite Shakespeare's work at Sörman).
When we at Sörman set out to build a new Java API for this system, the general path to follow was quite obvious given the existing standards shown above: quite naturally, the right API for a system that combines XML formatting with database technology would combine JAXP with JDBC. Here is an example of the result, again as it might be used from a Java program:
Figure 4.
import org.w3c.dom.*;
import com.sorman.cms.dom.*;
public class Example3
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        // the connection object is itself a DOMImplementation
        ExtDOMImplementation aImpl =
            CMSManager.getConnection(
                "poet://localhost/demobase", "scott",
                "tiger");
        // here the qualifiedName argument carries the document's database path
        Document aDocument = aImpl.createDocument(null,
            "//XML/Shakespeare.2.0/all_well.xml", null);
        NodeList fTopNodes = aDocument.getDocumentElement().
            getChildNodes();
        for (int i = 0; (i < fTopNodes.getLength()); ++i) {
            Node aNode = fTopNodes.item(i);
            System.out.println(aNode.getNodeName() +
                " - " + aNode.getNodeValue());
        }
        // release the connection resources, as with JDBC
        aImpl.close();
    }
}
Example 3: basic "JAXDB" programming
Comparing this code to Examples 1 and 2 shows that it really resembles the "union" of JAXP and JDBC. The database connection is opened by passing a database URL, user name and password to the getConnection() method of the CMSManager (which resembles the JDBC DriverManager). However, the connection object returned implements the DOMImplementation interface (or, more precisely, its sub-interface ExtDOMImplementation) and thus directly provides a DOM on this database connection. This way, programming the XML database is a seamless task that needs no special database transformation logic: the data retrieval and processing code does not differ from the simple XML file processing already familiar to most Java programmers.
Not found in the standard DOMImplementation interface, however, is the final close() call shown in Example 3. It reminds us that we really are dealing with a database connection, which must be closed to release its resources, again resembling the close() method of the JDBC Connection interface.
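The "connection is a DOMImplementation" idea can be sketched with standard JDK types alone. In the following hypothetical RepositoryConnection (not Sörman's actual class), an in-memory map stands in for the database, createDocument() reads its qualifiedName argument as a repository path the way Example 3 does, the remaining DOMImplementation methods delegate to the JDK's own implementation, and close() is the one addition. Implementing AutoCloseable is a modern-Java convenience that postdates this paper.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

/** Hypothetical sketch: a "connection" that is a DOMImplementation plus close(). */
final class RepositoryConnection implements DOMImplementation, AutoCloseable {
    private final Map<String, String> store; // stand-in database: path -> XML text
    private final DocumentBuilder builder;
    private boolean open = true;

    RepositoryConnection(Map<String, String> store) {
        this.store = store;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** As in Example 3, the qualified name is interpreted as a repository path. */
    @Override
    public Document createDocument(String namespaceURI, String path, DocumentType doctype) {
        checkOpen();
        String xml = store.get(path);
        if (xml == null) {
            throw new IllegalArgumentException("no document at " + path);
        }
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("stored document is not well-formed", e);
        }
    }

    // The other DOMImplementation methods simply delegate to the JDK's implementation.
    @Override
    public boolean hasFeature(String feature, String version) {
        return builder.getDOMImplementation().hasFeature(feature, version);
    }

    @Override
    public DocumentType createDocumentType(String qName, String publicId, String systemId) {
        return builder.getDOMImplementation().createDocumentType(qName, publicId, systemId);
    }

    @Override
    public Object getFeature(String feature, String version) {
        return builder.getDOMImplementation().getFeature(feature, version);
    }

    /** The extension: release the (simulated) connection, like JDBC's Connection.close(). */
    @Override
    public void close() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }

    private void checkOpen() {
        if (!open) {
            throw new IllegalStateException("connection is closed");
        }
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("//XML/demo.xml", "<play><title>All's Well</title></play>");
        try (RepositoryConnection conn = new RepositoryConnection(db)) {
            Document doc = conn.createDocument(null, "//XML/demo.xml", null);
            System.out.println(doc.getDocumentElement().getFirstChild().getTextContent()); // All's Well
        }
    }
}
```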
Exercise 1: at this point, to bring the code to life, we'll try a little "live programming" and use the API shown above from Jython (see http://www.jython.org).
A second look at the logical database structure in the screenshot above (Figure 3) reveals that being able to traverse a single document, as defined by the standard DOM interfaces, is not sufficient for working on a complete database that contains multiple documents stored in an arbitrary folder structure.
Fortunately, the design of the DOM allows for easy extension to cover these new levels of applicability. We achieve this by simply introducing a new node type, called "DocumentContainer", which is a DOM node that may contain one or more Document nodes and thereby represents a "folder" for documents in our logical database structure. All the generic methods of the Node interface as defined by the DOM can be applied to this new type of node as well, providing a seamless transition from the single-document level to the "multiple document" tree structure within the database.
The single root node of the database (represented as the root of the tree view in the screenshot above) is modeled as yet another new node type, named "DocumentRepository". A DocumentRepository is a special kind of DocumentContainer and is therefore defined as a sub-interface of DocumentContainer in our "extended DOM".
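Because DocumentContainer and DocumentRepository are just further Node types, traversal code written purely against org.w3c.dom.Node should work on them without change. The sketch below (the class GenericNodeWalk is hypothetical) uses only getNodeType(), getFirstChild() and getNextSibling(); since the CMS classes are not available here, it is demonstrated on an ordinary parsed document. It uses String.repeat(), so it needs Java 11 or later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Depth-first walk that relies only on methods every org.w3c.dom.Node provides. */
final class GenericNodeWalk {

    /** Returns one line per element, indented two spaces per nesting level. */
    static List<String> outline(Node start) {
        List<String> lines = new ArrayList<>();
        walk(start, 0, lines);
        return lines;
    }

    private static void walk(Node node, int depth, List<String> lines) {
        int childDepth = depth;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            lines.add("  ".repeat(depth) + node.getNodeName());
            childDepth = depth + 1;
        }
        // getFirstChild()/getNextSibling() exist on every Node, so the loop
        // does not care which concrete node type it is descending through.
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, childDepth, lines);
        }
    }

    /** Parses an XML string (demo helper only). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<play><act><scene/><scene/></act><act/></play>");
        for (String line : outline(doc)) {
            System.out.println(line);
        }
    }
}
```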
Figure 5.
import org.w3c.dom.*;
import com.sorman.cms.dom.*;
public class Example4
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        ExtDOMImplementation aImpl = CMSManager.getConnection(
            "poet://localhost/demobase", "scott",
            "tiger");
        // the one and only root node of the database
        ExtDocumentRepository aRoot = aImpl.createDocumentRepository();
        // a DocumentContainer ("folder"), reached with a plain DOM call
        Node aFolder = aRoot.getLastChild();
        NodeList fDocNodes = aFolder.getChildNodes();
        for (int i = 0; (i < fDocNodes.getLength()); ++i) {
            Node aNode = fDocNodes.item(i);
            System.out.println(aNode.getNodeName() +
                " - " + aNode.getNodeValue());
        }
        aImpl.close();
    }
}
Example 4: using additional node-types
Given the database content shown in the screenshot above (Figure 3), running this program would print the titles of all documents contained in the "XML" folder (which happens to be the last child of the root node). No special qualifier is needed to access the database content, as the one and only root node is always available through the createDocumentRepository() call of the ExtDOMImplementation interface.[1]
Exercise 2: let's give it a try and retrieve the DocumentRepository node and its children through Jython.
Any database management system that deserves the name must provide some kind of query mechanism that lets the user retrieve data selectively based on search criteria. In our extended DOM interface we provide this functionality by adding the method selectByQuery([queryExpression]) to the generic Node interface. This call retrieves a NodeList of all descendants of that Node that meet the search criteria defined by the queryExpression. Invoking selectByQuery() on the DocumentRepository node (the root) makes the query cover the complete database.
Let's see what this looks like in our example code:
Figure 6.
import org.w3c.dom.*;
import com.sorman.cms.dom.*;
public class Example5
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        ExtDOMImplementation aImpl = CMSManager.
            getConnection("poet://localhost/demobase",
                "scott", "tiger");
        ExtDocumentRepository aRoot = aImpl.createDocumentRepository();
        // an OQL query issued on the root node covers the whole database
        NodeList fResultNodes = aRoot.selectByQuery(
            "SELECT x FROM x in PSDbComponentExtent WHERE " +
            "x.sName LIKE 'play%'");
        for (int i = 0; (i < fResultNodes.getLength()); ++i) {
            Node aNode = fResultNodes.item(i);
            System.out.println(aNode.getNodeName() +
                " - " + aNode.getNodeValue());
        }
        aImpl.close();
    }
}
Example 5: issuing a database query
One might argue that mixing the clean, object-oriented approach of the DOM with a declarative query language (the Object Query Language (OQL) in this example) feels a bit awkward. While this may be true, in practice the mechanism proves extremely useful: moving from DOM nodes to a query over some subset, and then using the results directly as DOM nodes again, is very convenient for the programmer and improves productivity.
Another important aspect of this interface design is that it does not depend on the actual query language used. Emerging developments in this area, such as XPath and XQuery, could be adopted without changing the generic, extended Node interface.
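To illustrate that the query language can be swapped without touching the interface, here is a hypothetical selectByQuery() helper with the same shape (a context Node and a query string in, a NodeList out) but backed by the JDK's javax.xml.xpath API instead of OQL. The XPath expression in main() is only a rough analogue of the LIKE 'play%' query in Example 5, assuming sName holds a component's element name; the actual meaning of sName is not specified here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for selectByQuery(): XPath instead of OQL, same NodeList result. */
final class XPathQuery {

    static NodeList selectByQuery(Node context, String expression) {
        try {
            // NODESET makes evaluate() return a standard org.w3c.dom.NodeList
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad query: " + expression, e);
        }
    }

    /** Parses an XML string (demo helper only). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<collection><play/><playwright/><poem/></collection>");
        // all elements whose name starts with "play", in document order
        NodeList hits = selectByQuery(doc, "//*[starts-with(name(), 'play')]");
        for (int i = 0; i < hits.getLength(); ++i) {
            System.out.println(hits.item(i).getNodeName()); // play, then playwright
        }
    }
}
```

The results are ordinary DOM nodes again, so they can be fed straight back into DOM navigation, which is the convenience argued for above.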
Exercise 3: querying the database from Jython
One central aspect of the CMS the API was designed for is the management of multiple versions of the stored content. Again referring to the screenshot above (Figure 3), any component of a document can be checked out separately by an authorized user, edited and checked back in, thereby generating a new version of this document component. In use, the mechanism is quite similar to a source-code control system, with the important distinction that here we are not working on complete files but possibly on discrete components as marked up by the XML (or SGML). This allows for improved parallelism in teams (of people or processes) that work on content together.
This results in a "third dimension" in the stored content, as Figure 7 might suggest.
Figure 7.
Versioned database content
When the database content is traversed using the extended DOM interface, only the topmost layer, i.e. the latest version of every node, is read. To explore the "versioning dimension" of a node, another method, getVersionList(), is added to the generic Node interface. It returns a list of all versions stored for this node. A different, earlier version can be taken from this list and assigned to the current node with the method setVersion([versionObject]). From that point on, DOM traversal has entered a new version level, and all subsequent navigation inside the DOM tree works on the selected version.
Here is the code example for this versioning API:
Figure 8.
import org.w3c.dom.*;
import com.sorman.cms.dom.*;
public class Example6
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        ExtDOMImplementation aImpl = CMSManager.
            getConnection("poet://localhost/demobase",
                "scott", "tiger");
        // creates a node directly by its unique (primary) key
        ExtNode aNode = aImpl.createNode("(0-0-155-178)");
        // retrieve a list of all versions of this node,
        // in descending order
        // (latest version first, earliest version last in the list)
        VersionList fVersions = aNode.getVersionList();
        // get the object that represents the earliest
        // version of the node from the list
        Version aFirstVersion = fVersions.get(fVersions.size() - 1);
        // switch the version to use on our node
        aNode.setVersion(aFirstVersion);
        System.out.println("The node value in version " +
            aFirstVersion.getNumber() + " was: " +
            aNode.getNodeValue());
        aImpl.close();
    }
}
Example 6: travelling multiple version levels
Through this simple mechanism, versioning information on the content is available to version-aware clients while remaining totally transparent to clients that are unaware of multiple versions and simply use the standard DOM interfaces. In the code above, after switching the version on aNode, all further navigation within the DOM tree works on the content as it was stored in the selected (earlier) version.
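The versioning behaviour described above can be modeled in a few lines of plain Java. The following VersionedContent class is a hypothetical, in-memory sketch, not the CMS implementation: getVersionList() returns versions latest first, setVersion() pins an earlier one, and a client that never calls setVersion() simply reads the latest content, which is what keeps the mechanism transparent to version-unaware code. It uses a Java record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of one node's version history. */
final class VersionedContent {

    /** One stored version; numbers start at 1. */
    record Version(int number, String value) { }

    private final List<Version> history = new ArrayList<>(); // oldest first internally
    private Version selected; // null means "follow the latest version"

    /** A check-in appends a new version. */
    void checkIn(String value) {
        history.add(new Version(history.size() + 1, value));
    }

    /** All versions, latest first and earliest last, as described for Example 6. */
    List<Version> getVersionList() {
        List<Version> copy = new ArrayList<>(history);
        Collections.reverse(copy);
        return Collections.unmodifiableList(copy);
    }

    /** Pins the given version; later reads return its content. */
    void setVersion(Version v) {
        if (!history.contains(v)) {
            throw new IllegalArgumentException("unknown version " + v);
        }
        selected = v;
    }

    /** What any client reads: the selected version if one was set, else the latest. */
    String getValue() {
        if (history.isEmpty()) {
            return null;
        }
        return selected != null ? selected.value() : history.get(history.size() - 1).value();
    }

    public static void main(String[] args) {
        VersionedContent node = new VersionedContent();
        node.checkIn("draft");
        node.checkIn("revised");
        node.checkIn("final");
        System.out.println(node.getValue()); // final
        List<Version> versions = node.getVersionList();
        Version first = versions.get(versions.size() - 1);
        node.setVersion(first);
        System.out.println("version " + first.number() + ": " + node.getValue()); // version 1: draft
    }
}
```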
Exercise 4: switch the DOM tree version from Jython
Besides the stored content itself, a reasonably complex CMS must manage all kinds of additional information about the overall system workflow, component status, node identification and the like. The CMS API should provide access to this type of information, which is obviously not covered by the standard DOM, ideally without adding so much complexity to the interfaces that the basic interface design becomes "obfuscated".
At this point, an existing part of the standard DOM again came in handy: for working on element attributes, the standard DOM provides the NamedNodeMap interface, which handles the attributes as a set of name-value pairs.
Java programmers, on the other hand, are used to the simple but effective mechanism of Java properties for handling name-value pairs. So, quite naturally, the method getProperties() was added to the Node interface; it returns a NamedNodeMap of all CMS-specific attributes of the node, in the same way that the standard getAttributes() method returns the "content-specific" node attributes.
Again, a small example should make this clear:
Figure 9.
import org.w3c.dom.*;
import com.sorman.cms.dom.*;
public class Example7
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        ExtDOMImplementation aImpl = CMSManager.
            getConnection("poet://localhost/demobase",
                "scott", "tiger");
        // creates a node directly by its unique (primary) key
        ExtNode aNode = aImpl.createNode("(0-0-155-178)");
        // CMS-specific properties, as opposed to getAttributes()
        NamedNodeMap aMap = aNode.getProperties();
        System.out.println("last action on node was:\t" +
            aMap.getNamedItem(ExtProperty.NAME_ACTION_NAME).getNodeValue());
        System.out.println("by user:\t" +
            aMap.getNamedItem(ExtProperty.NAME_ACTION_USER).getNodeValue());
        aImpl.close();
    }
}
Example 7: accessing CMS specific node properties
As the code above shows, a predefined set of property names is available through the ExtProperty interface (which also extends the standard Node interface and is thus our third additional node type in the extended DOM).
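Since getProperties() hands back a plain NamedNodeMap, generic code written against that standard interface should work on CMS properties and on element attributes alike. As a sketch (the class NamedNodeMaps is hypothetical), the following copies any NamedNodeMap into a Java Map; it is demonstrated with getAttributes() because the ExtProperty classes are not available here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helpers that rely only on the standard NamedNodeMap interface. */
final class NamedNodeMaps {

    /** Copies every item of a NamedNodeMap into a name -> value Java map. */
    static Map<String, String> toMap(NamedNodeMap nodes) {
        // LinkedHashMap keeps whatever order the NamedNodeMap reports
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); ++i) {
            Node item = nodes.item(i);
            result.put(item.getNodeName(), item.getNodeValue());
        }
        return result;
    }

    /** Parses an XML string and returns its root element (demo helper only). */
    static Element parseElement(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element play = parseElement("<play title='Hamlet' author='Shakespeare'/>");
        // by the reasoning above, the map from getProperties() should work here too
        System.out.println(toMap(play.getAttributes()));
    }
}
```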
Last but not least of our "small but powerful" extensions to the standard DOM is the notion of meta-data, i.e. a mechanism for adding customized, supplemental information to certain nodes. This requirement is met by almost the same mechanism we just saw for CMS-specific properties. For meta-data, the respective method on the extended Node interface is called getMetaData([infoName]). The given infoName is an arbitrary name for the requested section of meta-data, for which, again, a NamedNodeMap of the assigned meta-data names and values is returned.
In practice this looks something like this:
Figure 10.
import org.w3c.dom.*;
import com.sorman.cms.dom.*;
public class Example8
{
    public static void main(String[] args)
        throws Exception // exception handling omitted for clarity
    {
        ExtDOMImplementation aImpl = CMSManager.
            getConnection("poet://localhost/demobase",
                "scott", "tiger");
        // creates a node directly by its unique (primary) key
        ExtNode aNode = aImpl.createNode("(0-0-155-178)");
        // the "Workflow" section of this node's meta-data
        ExtPropertyMap aMap = aNode.getMetaData("Workflow");
        System.out.println("workflow status is:\t" +
            aMap.getProperty("status").getNodeValue());
        // changing workflow status in meta-data:
        aMap.setProperty("status", "review");
        aImpl.close();
    }
}
Example 8: getting and setting meta-data
Close inspection of this code reveals that getMetaData() in fact returns a map implementing the interface ExtPropertyMap, which extends the standard NamedNodeMap. This interface adds the convenience methods getProperty() and setProperty(), which map directly to the standard getNamedItem() and setNamedItem() but feel a bit more "Java-like". Note that getProperty() still returns an ExtProperty node, on which we must call getNodeValue() to get the property value itself.
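The ExtPropertyMap convenience layer can be approximated on top of any standard NamedNodeMap. In this hypothetical PropertyMapAdapter, the meta-data section is represented by an ordinary element's attributes (an assumption made only so the code runs without the CMS). getProperty() delegates to getNamedItem() and, as with ExtPropertyMap, returns the node rather than its value, while setProperty() builds an Attr and hands it to setNamedItem().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical adapter: Java-style accessors over the standard NamedNodeMap calls. */
final class PropertyMapAdapter {
    private final Element owner;      // supplies the live map and creates new nodes
    private final NamedNodeMap map;

    PropertyMapAdapter(Element owner) {
        this.owner = owner;
        this.map = owner.getAttributes(); // live view: changes show up immediately
    }

    /** Returns the property node (not its value), or null if absent. */
    Node getProperty(String name) {
        return map.getNamedItem(name);
    }

    /** Creates or replaces a property through the standard setNamedItem(). */
    void setProperty(String name, String value) {
        Attr attr = owner.getOwnerDocument().createAttribute(name);
        attr.setValue(value);
        map.setNamedItem(attr);
    }

    /** Parses an XML string and returns its root element (demo helper only). */
    static Element parseElement(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        PropertyMapAdapter props = new PropertyMapAdapter(parseElement("<workflow status='draft'/>"));
        System.out.println(props.getProperty("status").getNodeValue()); // draft
        props.setProperty("status", "review");
        System.out.println(props.getProperty("status").getNodeValue()); // review
    }
}
```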
Exercise 5: let's set some meta-data through Jython
While the extensions to the standard DOM shown above make perfect sense for our CMS-specific projects (and are in fact already in routine use at some of our customers' installations), the real benefit of basing the API on open standards shines when it lets us connect the CMS to any tool that's available "somewhere out there".
"Apache Cocoon" is such a tool that serves as a publishing framework based on XML and XSLT technologies (please see http://xml.apache.org/cocoon/ for details).
Choosing Cocoon to prove the compatibility of our "JAXDB" interfaces also lets us cover an area that our practical examples have mostly neglected until now: support for SAX, the other part of JAXP besides DOM. In short, Cocoon provides its functionality by pipelining multiple SAX streams, which can be controlled dynamically through business rules, to form a flexible yet powerful environment.
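Cocoon's own pipeline components are not shown here, but the underlying idea of chaining SAX stages can be sketched with JAXP alone: a parser feeds an XMLFilterImpl subclass, whose output is serialized by an identity Transformer. The filter below, a hypothetical DropElementFilter, removes every element with a given name together with its content, standing in for a rule-driven pipeline stage.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical pipeline stage: drops every element with a given name, plus its content. */
final class DropElementFilter extends XMLFilterImpl {
    private final String dropped;
    private int skipDepth = 0; // > 0 while inside a dropped element

    DropElementFilter(XMLReader parent, String dropped) {
        super(parent);
        this.dropped = dropped;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || localName.equals(dropped)) {
            skipDepth++; // swallow the event instead of passing it downstream
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parser -> filter -> serializer, wired together through standard JAXP. */
    static String run(String xml, String dropped) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // ensures localName is populated
            XMLReader reader = spf.newSAXParser().getXMLReader();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                    new SAXSource(new DropElementFilter(reader, dropped),
                                  new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<play><title>Hamlet</title><note>draft</note><act>I</act></play>", "note"));
    }
}
```

Further stages can be stacked by passing one filter as the parent of the next. In this sketch XMLFilterImpl forwards the lexical-handler property to its parent, so comments bypass the filter and are not removed.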
Exercise 6: we'll use Cocoon to publish some content based on a query for the meta-data settings we made through Jython in Exercise 5.
Back from practice to theory for a short closing discussion: in the introductory sections we saw how our "JAXDB" resembles JDBC in establishing database connectivity. In J2EE application-server environments with high scalability demands, optimizing technologies for JDBC connections, such as connection pooling and distributed transactions, are provided by the application server (the "container" in J2EE parlance) and thus minimize project-specific effort in many respects.
For "JAXDB" connections, which are similar but not identical to JDBC connections, these technologies are not provided "out of the box". Luckily, however, this demand for "container-managed" connectivity to back-end systems is shared by many so-called Enterprise Information Systems (EIS) and has been addressed by the Java community through the J2EE Connector Architecture (JCA).
For an insight into JCA, please refer to http://java.sun.com/j2ee/connector/. A detailed discussion of JCA is beyond the scope of this presentation. Here it should only be noted that this architecture lays out a general path for further enabling our "JAXDB" for application-server environments. This would include providing a CMS "Resource Adapter" that implements a JCA "Common Client Interface", to ultimately form the "Java API for XML Enterprising - JAXE". So please stay tuned ...
[1] In a way, this provides a kind of reflection mechanism, as any structural information about the content can be determined from the content itself once the database connection is available. After all, that's what markup languages are all about ...
Design & Development by deepX Ltd. 2002