Case Study: XML Schemas & Rich-Client Applications—The United Nations' Biosafety Clearing-House

Keywords: XML Schema, Rich Client, GUI, SWT

Jeff Lawson
Chief Software Architect
Cogent Logic Corporation
Toronto
Ontario
Canada
jeff@cogentlogic.com

Biography

Jeff Lawson is a software developer with twenty years' experience developing COTS products and bespoke software. He spent the eighties coding Z80 and MC68000 and the early nineties developing C++/Win32/COM software. In recent years he has built up a good deal of experience working exclusively with XML/Java. Jeff was responsible for the development of the GUI/XML/DBMS inter-operability product XchainJ, which first went into production in February 2002 and is now an Eclipse plug-in. Jeff is frequently called upon to develop software for government projects that use complex XML and also has experience delivering training courses and conference presentations.


Abstract


The Biosafety Clearing-House (BCH) was established by the Cartagena Protocol on Biosafety, a supplementary agreement to the United Nations Convention on Biological Diversity. This case study describes the rich-client software that was developed for the Canadian Node of the Biosafety Clearing-House (CNBCH). The software enables a variety of XML instance documents, belonging to a non-trivial, multi-document XML Schema, to be created, viewed and edited. The system architecture de-couples the rich-client from (i) the web services used to store/retrieve/update XML instance documents and (ii) the web services used to access controlled content, such as the ITIS taxonomy. This presentation describes the problems encountered in mapping the XML Schemas to a GUI and how they were solved.


Table of Contents


1. Introduction
2. The Approach
3. Dealing with XML Schemas
     3.1 Restrictions and Extensions
     3.2 Natural Language Support
     3.3 Controlled Vocabularies
4. The Mapping Process in the Development Tool
     4.1 XML Schema Import
     4.2 Automatic XML Fragment Generation
     4.3 User Customization
5. Reading and Writing XML in the Rich-Client Application
6. Conclusion
7. References

1. Introduction

The Biosafety Clearing-House (BCH) is an information exchange mechanism established by the Cartagena Protocol on Biosafety to assist the Parties to the agreement to implement its provisions and to facilitate sharing of information on genetically modified living organisms. The core holdings of the BCH are metadata records that facilitate access to decisions that have been made by various countries regarding the import or use of these types of organisms, as well as information related to their national regulatory frameworks and regulatory agencies for these organisms. The Canadian Government has developed its own interoperable national Canadian Node of the BCH (CNBCH). All documents are realized as XML and functional inter-operability is achieved through web services. It is this software that is the focus of the current case study. The BCH and the CNBCH use different XML Schemas, the latter describing the same content as the former but with additional data to support extra Canadian requirements that pre-date Cartagena. The software was developed in a way that enables any XML Schema to be used, so that it can be readily customized by other countries.

2. The Approach

The CNBCH work is typical of software that needs to present XML instance documents to end-users in a human-usable form: people must be able to create new XML documents and to open existing documents for viewing or editing. Because most people are not comfortable with raw XML, an end-user (rich-client) application was created by mapping the XML Schema to form controls. Initiatives with similar requirements are often implemented using HTML or XForms but neither approach works well for non-trivial schemas, such as FGDC (~400 elements) or the CNBCH schema. HTML and XForms provide platform independence but they are primitive and inflexible: imagine using these technologies for the OpenGIS Consortium’s Sensor Collection Service, which is an application of 53 XML Schema documents spanning 15 namespaces!

At the CNBCH requirements-analysis stage it was anticipated that the software would ultimately be made available worldwide. Since the Least Developed Countries (LDCs) have very little Internet bandwidth, a client-server web solution that minimized data transfer was preferable. HTML and XForms solutions rely intrinsically upon web page downloads and incur frequent round-trips to the web server. By contrast, the flexibility of a rich-client approach enables the end-user application to be supplied on CD-ROM and to connect to a web server only for the transfer of target XML documents and controlled content, and the controlled content too can be supplied on CD-ROM.

Since XML provides platform-independence, it was natural to want the same for the rich-client. Java technology was the obvious choice but the standard Java Graphical User Interface (GUI), commonly known as Swing, is notoriously slow, being a generic Java system built upon graphics primitives. A better GUI technology is provided by the Standard Widget Toolkit (SWT) which forms part of the Eclipse Project, an open source software development project that was started by IBM but which is now controlled by the independent Eclipse Foundation. SWT is Java-based but it is implemented as a thin layer on top of each platform’s native GUI. The result is a GUI that is identical in performance to that of native applications.

Although Eclipse was originally envisaged as a host environment for software tools, so many people found that SWT is useful for stand-alone applications that Eclipse 3 (released June 2004) provided explicit support for such applications in the form of the Rich Client Platform (RCP). Cognizant of the need to support a variety of XML schemas (XML Schema and/or DTDs), the CNBCH software developer (Cogent Logic Corporation) made the decision early on to develop an Eclipse-hosted software tool (plug-in) that would take any XML Schema or DTD and automatically generate SWT forms and mappings, from specified root elements, for use in a generic RCP application designed to process such data.

3. Dealing with XML Schemas

XML instance documents contain elements and attributes, not complex types, sequences, etc. So, although the infrastructure of XML Schema is powerful in prescribing the content of XML instance documents, for the most part it is absent from the documents themselves, except for namespace, element and attribute names and for fixed and default values. Building a GUI that maps to an XML Schema requires identifying all elements and attributes and their relationship to each other (their locations in DOM trees) and optionally providing controlled content from fixed and default values plus content constraints from facets, the latter potentially being left to a later revision of the software.

To a large extent then, the following reserved names in XML Schema constitute ‘noise’: simpleType, complexType, simpleContent, complexContent, group, attributeGroup, all, choice, sequence, restriction, extension, list, union—but not entirely so. Whereas sequence, for example, has no useful part whatsoever in a GUI, choice indicates that only one of the contained elements may appear so the GUI must support user-selection of just one such element. There are several consequences that arise from the need to map an XML Schema to a GUI. The most compelling will now be considered.

3.1 Restrictions and Extensions

An XML Schema restriction or extension is similar to a sub-class in Object-Oriented Design (OOD), i.e. a specialization of a simple or complex data type such that the specialized type(s) can be used in place of the more general type. Since simple and complex types are only ever used to describe the content of named XML elements, we have to deal with scenarios where an element is readily identified from its name but its data type must be determined from supplemental data. That data is the xsi:type attribute (xsi references the namespace http://www.w3.org/2001/XMLSchema-instance). All elements in an XML instance document that can have different types should disambiguate with xsi:type. In the CNBCH application there is just one element that requires an xsi:type and this happens to be the one and only root element for all documents, CNBCHMetadata:

CNBCHMetadata_xsd.gif

Each CNBCH document type extends the common content in metadataRecordType, e.g. for LMO (living modified organism) documents the schema begins with:

LMO_xsd.gif

LMO instance documents start with:

LMO_xml.gif

For an application GUI, the software developer will typically create a panel for each data type and use a special selection control to enable the user to specify the desired type. The CNBCH application did not use a form-based selection control because the data type choice was available only for root elements. Instead, the user is given the choice when they create a new document. When opening an existing document, the application software must choose form panels based upon xsi:type values contained within the document. So, the RCP application presents a list of named document types and details such as element names and data types do not appear. The development tool, however, does present a choice of elements and, where necessary, data types:

CNBCH_FragmentChoices.gif

As another example, consider the choices available from the GML 2 sample application, city.xsd:

GML2_FragmentChoices.gif

Here, MultiGeometry, featureMember and geometryProperty must be disambiguated when chosen as the source for populating a panel.

Naturally, when designing the forms, attributes such as xsi:type and xml:lang (see below) must not be shown to the user. The development tool used for the CNBCH application automatically detects such occurrences and uses them to trigger the use of corresponding controls.
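The xsi:type lookup described in this section can be sketched as follows. This is a minimal illustration rather than the CNBCH code: it uses the JDK's built-in DOM parser, and the type names (lmoType, decisionType) and panel labels are hypothetical, since the actual CNBCH type names are not reproduced in the text. For brevity the sketch strips the namespace prefix from the xsi:type value instead of resolving it.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XsiTypeDispatch {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Hypothetical type-to-panel table; real names would come from the schema.
    static final Map<String, String> PANELS = Map.of(
            "lmoType", "LMO form panel",
            "decisionType", "Decision form panel");

    // Parses a document and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getAttributeNS
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the local part of the xsi:type value, or null if absent.
    static String localTypeName(Element element) {
        String qname = element.getAttributeNS(XSI_NS, "type");
        if (qname.isEmpty()) {
            return null;
        }
        return qname.substring(qname.indexOf(':') + 1);
    }

    // Chooses a panel for an existing document from its root xsi:type.
    static String panelFor(Element root) {
        String type = localTypeName(root);
        return type == null ? "unknown" : PANELS.getOrDefault(type, "unknown");
    }

    public static void main(String[] args) {
        Element root = parseRoot(
                "<CNBCHMetadata xmlns:xsi=\"" + XSI_NS + "\" xsi:type=\"lmoType\"/>");
        System.out.println(panelFor(root));
    }
}
```

A production version would resolve the prefix of the xsi:type value against the in-scope namespace declarations before comparing type names.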

3.2 Natural Language Support

Canada has two official languages, English and French. Consequently, the CNBCH XML instance documents must support both languages. This is seen in the dc.title and dc.description elements in the fragment from an LMO instance document shown above. Notice that these elements have a maxOccurs of 2 and are declared to be of type textContentType in the schema. This type is defined as:

textContentType.gif

Hence, xml:lang associates user-readable text with a language:

xml_lang.gif

The designer of the end-user application must decide how to present multiple languages, and there are two aspects to this. First, will the application itself be internationalized so that users can select their preferred language? Second, how will the language-specific data be presented? In the CNBCH application, a user can select English or French for the GUI, after which all forms appear with English or French captions. When an XML document is loaded or created, it appears in a view with three tabs: English, French and XML. Someone selecting English for the GUI will see English captions with the English data items on the English tab, and English captions with the French data items on the French tab. Furthermore, as a convenience, each language-specific control is able to switch to a mode that enables content for both languages to be seen on the same tab.
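As a rough sketch of how repeated elements can be split by xml:lang to feed per-language tabs, consider the following. It uses the JDK's DOM parser; the class name, method name and sample document are illustrative assumptions, not taken from the CNBCH software.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LanguageTabs {
    // Maps each xml:lang value to the text of the matching child element.
    static Map<String, String> textByLanguage(String xml, String childName) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so xml:lang is visible by namespace
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && childName.equals(n.getLocalName())) {
                Element child = (Element) n;
                // The xml prefix is always bound to XMLConstants.XML_NS_URI.
                String lang = child.getAttributeNS(XMLConstants.XML_NS_URI, "lang");
                result.put(lang, child.getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<CNBCHMetadata>"
                + "<dc.title xml:lang=\"en\">Title</dc.title>"
                + "<dc.title xml:lang=\"fr\">Titre</dc.title>"
                + "</CNBCHMetadata>";
        // Each entry would populate the corresponding language tab.
        System.out.println(textByLanguage(xml, "dc.title"));
    }
}
```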

3.3 Controlled Vocabularies

Many XML documents contain data that derives from known sources. The Integrated Taxonomic Information System (ITIS) is one such example; it provides taxa. Another is the Government of Canada’s Core Subject Thesaurus. It is convenient for users to be able to look up controlled vocabularies and insert content directly into the forms they are working on. The CNBCH application uses special form controls that are able to connect to a web service to retrieve such data and to cache the data locally.
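The retrieve-and-cache behaviour of such a control might look like the sketch below. The web service call is represented by a caller-supplied function, because the actual CNBCH service interfaces are not described here; the class name and the key normalization are assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CachedVocabulary {
    // Stand-in for the remote web service call (hypothetical signature).
    private final Function<String, List<String>> remoteLookup;
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    CachedVocabulary(Function<String, List<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Returns cached results when available; otherwise queries the service once.
    List<String> lookup(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        return cache.computeIfAbsent(key, remoteLookup);
    }

    public static void main(String[] args) {
        CachedVocabulary vocabulary =
                new CachedVocabulary(term -> List.of(term + " (remote result)"));
        System.out.println(vocabulary.lookup("Brassica"));
        System.out.println(vocabulary.lookup("brassica")); // served from the cache
    }
}
```

Keeping the service behind a function makes it easy to substitute content supplied on CD-ROM for users with limited bandwidth.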

4. The Mapping Process in the Development Tool

Mapping an XML Schema to form controls occurs in three stages: XML Schema import, automatic XML Fragment generation, and user customization.

An XML Fragment corresponds to a single form and encapsulates a non-trivial branch of the schema. Each root element maps to an XML Fragment that is flagged as a document. Other fragments correspond to elements that are reused in several places in the schema or elements that must be disambiguated by specifying each of their possible types.

4.1 XML Schema Import

A user creates a project and imports one or more XML Schemas and/or DTDs. An individual XML Schema is a tree of XML Schema documents which is traversed in its entirety during a single import operation. Each schema can be treated as a single data structure but it is important to realize that the schema tree is a tree of XML Schema nodes and not a tree that directly depicts XML instance documents! For instance, consider the declaration of the CNBCHMetadata element above. The element declarations for CNBCHMetadata, dc.title, dc.date, etc. are all sub-elements of the schema element even though in the XML instance documents dc.title and dc.date are sub-elements of CNBCHMetadata. It is possible to traverse the target XML tree by de-referencing element and type references.
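The de-referencing step can be illustrated with a small sketch that walks from a global element declaration, through its named type, to the elements it references. The sample schema is a hypothetical reduction (the real CNBCH schema is not reproduced here), and the code handles only global declarations, named complex types and ref/name attributes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaWalker {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: all elements are declared directly under xs:schema.
    static final String SAMPLE_SCHEMA =
            "<xs:schema xmlns:xs=\"" + XS + "\">"
            + "<xs:element name=\"CNBCHMetadata\" type=\"metadataRecordType\"/>"
            + "<xs:element name=\"dc.title\" type=\"xs:string\"/>"
            + "<xs:element name=\"dc.date\" type=\"xs:date\"/>"
            + "<xs:complexType name=\"metadataRecordType\"><xs:sequence>"
            + "<xs:element ref=\"dc.title\" maxOccurs=\"2\"/>"
            + "<xs:element ref=\"dc.date\"/>"
            + "</xs:sequence></xs:complexType></xs:schema>";

    private final Map<String, Element> globalElements = new HashMap<>();
    private final Map<String, Element> globalTypes = new HashMap<>();

    SchemaWalker(String schemaXml) {
        Element schema;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            schema = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema", e);
        }
        // Index the global declarations, i.e. the children of xs:schema.
        for (Node n = schema.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !XS.equals(n.getNamespaceURI())) continue;
            Element e = (Element) n;
            if ("element".equals(e.getLocalName())) globalElements.put(e.getAttribute("name"), e);
            if ("complexType".equals(e.getLocalName())) globalTypes.put(e.getAttribute("name"), e);
        }
    }

    // Returns the instance-document tree as indented element names.
    List<String> instanceTree(String rootName) {
        List<String> out = new ArrayList<>();
        walk(rootName, 0, new ArrayDeque<>(), out);
        return out;
    }

    private void walk(String name, int depth, Deque<String> path, List<String> out) {
        out.add("  ".repeat(depth) + name);
        Element decl = globalElements.get(name);
        if (decl == null || path.contains(name)) return; // leaf or recursive type
        Element type = globalTypes.get(local(decl.getAttribute("type")));
        if (type == null) return; // built-in or unknown type: no children
        path.push(name);
        NodeList children = type.getElementsByTagNameNS(XS, "element");
        for (int i = 0; i < children.getLength(); i++) {
            Element child = (Element) children.item(i);
            String ref = child.hasAttribute("ref") ? child.getAttribute("ref") : child.getAttribute("name");
            walk(local(ref), depth + 1, path, out);
        }
        path.pop();
    }

    private static String local(String qname) {
        return qname.substring(qname.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        new SchemaWalker(SAMPLE_SCHEMA).instanceTree("CNBCHMetadata").forEach(System.out::println);
    }
}
```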

4.2 Automatic XML Fragment Generation

When a user wishes to create a new XML Fragment, they are prompted to specify (i) a top-most element, (ii) a data type (if there is a choice of data types) and (iii) a style template. Style templates contain information about the form controls to use with each element and attribute (by id, XPath, name and default) plus associated information such as fonts, colors and layout managers. The software then traverses the implied tree of the selected element/type and generates the form. The ‘noise’ items in the XML Schema are typically assigned ‘null controls’.
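One way to picture a style template is as a layered lookup. The sketch below assumes that rules are tried in the order id, XPath, name, default; the text lists these four keys but does not state their precedence, so that ordering, the class name and the control names are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class StyleTemplate {
    // Rule tables keyed by id, XPath and element/attribute name.
    final Map<String, String> byId = new HashMap<>();
    final Map<String, String> byXPath = new HashMap<>();
    final Map<String, String> byName = new HashMap<>();
    final String defaultControl;

    StyleTemplate(String defaultControl) {
        this.defaultControl = defaultControl;
    }

    // Tries the most specific rule first (assumed order), then falls back.
    String controlFor(String id, String xpath, String name) {
        if (id != null && byId.containsKey(id)) return byId.get(id);
        if (xpath != null && byXPath.containsKey(xpath)) return byXPath.get(xpath);
        if (name != null && byName.containsKey(name)) return byName.get(name);
        return defaultControl;
    }

    public static void main(String[] args) {
        StyleTemplate template = new StyleTemplate("TextField");
        // 'Noise' schema constructs map to a control that renders nothing.
        template.byName.put("sequence", "NullControl");
        template.byName.put("dc.date", "DatePicker");
        template.byXPath.put("/CNBCHMetadata/dc.description", "MultiLineText");
        System.out.println(template.controlFor(null, "/CNBCHMetadata/dc.date", "dc.date"));
    }
}
```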

For each XML Fragment, the development tool displays a view containing the target XML tree because the overall picture is not always obvious to users:

SchemaTree.gif

4.3 User Customization

Once an XML Fragment has been created, the user can:

5. Reading and Writing XML in the Rich-Client Application

Each form control that manages XML content is capable of (i) reading XML, using the content to populate its fields and (ii) generating XML as a prelude to storage. These processes are performed on a per-element basis: upon creation, each control is configured with the element or attribute XPath name that it is responsible for. When reading XML, the application logic provides the control that maps to the root element with a JDOM element and the control extracts the content that it recognizes, passing on sub-elements to child controls, where appropriate. During XML generation, the control that maps to the root is asked to generate its element. It does this by instantiating a JDOM element and passing it to its child controls before returning the fully populated element to the calling application logic. Control trees are traversed iteratively ensuring that all controls are visited in both reading and writing XML operations.
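The read/write contract described above can be sketched as follows. To keep the example free of external dependencies it substitutes the JDK's org.w3c.dom API for JDOM, uses a plain string field in place of an SWT widget, and recurses for brevity; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FormControl {
    final String elementName;   // the element this control is responsible for
    String text = "";           // stands in for the content of a GUI field
    final List<FormControl> children = new ArrayList<>();

    FormControl(String elementName, FormControl... children) {
        this.elementName = elementName;
        this.children.addAll(Arrays.asList(children));
    }

    // Populates this control, passing matching sub-elements to child controls.
    void read(Element element) {
        if (children.isEmpty()) {
            text = element.getTextContent();
            return;
        }
        for (FormControl child : children) {
            for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && child.elementName.equals(n.getNodeName())) {
                    child.read((Element) n);
                    break;
                }
            }
        }
    }

    // Generates this control's element, including those of its children.
    Element write(Document document) {
        Element element = document.createElement(elementName);
        if (children.isEmpty()) {
            element.setTextContent(text);
        }
        for (FormControl child : children) {
            element.appendChild(child.write(document));
        }
        return element;
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        FormControl title = new FormControl("dc.title");
        FormControl root = new FormControl("CNBCHMetadata", title);
        root.read(parseRoot("<CNBCHMetadata><dc.title>Sample</dc.title></CNBCHMetadata>"));
        title.text = title.text + " (edited)"; // simulates a user edit
        Element out = root.write(newDocument());
        System.out.println(out.getFirstChild().getTextContent());
    }
}
```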

6. Conclusion

For a number of years now, scientific and technical professionals have been coordinating their efforts within their specialist fields to share information using XML. Some of these initiatives are well established, others are emergent; all are ongoing. The XML schemas produced can be of mind-boggling complexity, yet much of this information is not intended for computer-to-computer inter-operability but ultimately for end-user digestion. The successful CNBCH initiative demonstrates how rich-client architectures can provide good end-user experiences and this paper highlights the XML Schema related issues that need to be resolved when developing such systems.

7. References

NOTE: For more on the Biosafety Clearing-House, see http://bch.biodiv.org/.

NOTE: For more on the CNBCH, see http://www.bch.gc.ca/.

NOTE: For more on Eclipse, see http://www.eclipse.org/.

NOTE: For more on Cogent Logic Corporation, see http://cogentlogic.com/.

NOTE: For more on xsi:type, see http://www.w3.org/TR/xmlschema-1/#xsi_type.

NOTE: For more on xml:lang, see http://www.w3.org/TR/REC-xml/#sec-lang-tag.

NOTE: For more on ITIS, see http://www.cbif.gc.ca/pls/itisca/.

NOTE: For more on the Core Subject Thesaurus, see http://www.thesaurus.gc.ca/.

NOTE: For more on JDOM, see http://jdom.org/.
