Keywords: XML Schema, Rich Client, GUI, SWT
Biography
Jeff Lawson is a software developer with twenty years' experience developing COTS products and bespoke software. He spent the eighties programming the Z80 and MC68000 and the early nineties developing C++/Win32/COM software. In recent years he has worked exclusively with XML and Java. Jeff was responsible for the development of XchainJ, a GUI/XML/DBMS inter-operability product that first went into production in February 2002 and is now an Eclipse plug-in. Jeff is frequently called upon to develop software for government projects that use complex XML, and he also has experience delivering training courses and conference presentations.
The Biosafety Clearing-House (BCH) was established by the Cartagena Protocol on Biosafety, a subsidiary agreement to the United Nations Convention on Biological Diversity. This case study describes the rich-client software that was developed for the Canadian Node of the Biosafety Clearing-House (CNBCH). The software enables a variety of XML instance documents, belonging to a non-trivial, multi-document XML Schema, to be created, viewed and edited. The system architecture de-couples the rich-client from (i) the web services used to store, retrieve and update XML instance documents and (ii) the web services used to access controlled content, such as the ITIS taxonomy. This presentation describes the problems encountered in mapping the XML Schemas to a GUI and how they were solved.
1. Introduction
2. The Approach
3. Dealing with XML Schemas
3.1 Restrictions and Extensions
3.2 Natural Language Support
3.3 Controlled Vocabularies
4. The Mapping Process in the Development Tool
4.1 XML Schema Import
4.2 Automatic XML Fragment Generation
4.3 User Customization
5. Reading and Writing XML in the Rich-Client Application
6. Conclusion
7. References
The Biosafety Clearing-House (BCH) is an information exchange mechanism established by the Cartagena Protocol on Biosafety to assist the Parties to the agreement in implementing its provisions and to facilitate the sharing of information on genetically modified living organisms. The core holdings of the BCH are metadata records that facilitate access to decisions made by various countries regarding the import or use of these types of organisms, as well as information about their national regulatory frameworks and regulatory agencies for these organisms. The Canadian Government has developed its own interoperable national Canadian Node of the BCH (CNBCH). All documents are realized as XML and functional inter-operability is achieved through web services. It is the CNBCH software that is the focus of this case study. The BCH and the CNBCH use different XML Schemas: the latter describes the same content as the former, plus additional data to support Canadian requirements that pre-date the Cartagena Protocol. The software was developed so that any XML Schema can be used, which means it can readily be customized by other countries.
The CNBCH work is typical of software that needs to present XML instance documents to end-users in a human-usable form, i.e. people need to create new XML documents and to open existing documents for viewing or editing. Because most people are not comfortable with raw XML, an end-user (rich-client) application was created by mapping the XML Schema to form controls. Initiatives with similar requirements are often implemented using HTML or XForms, but neither approach works well for non-trivial schemas such as FGDC (~400 elements) or the CNBCH schema. HTML and XForms provide platform independence, but they are primitive and inflexible: imagine using these technologies for the OpenGIS Consortium's Sensor Collection Service, which is an application of 53 XML Schema documents spanning 15 namespaces!
At the CNBCH requirements-analysis stage it was anticipated that the software would ultimately be made available worldwide. Since the Least Developed Countries (LDCs) have very little Internet bandwidth, a client-server web solution that minimized data transfer was preferable. Whereas HTML and XForms solutions rely intrinsically upon web page downloads and incur frequent round-trips to the web server, a rich-client approach is flexible enough for the end-user application to be supplied on CD-ROM and to connect to a web server for the transfer of target XML documents and controlled content, though the latter too can be supplied on CD-ROM.
Since XML provides platform-independence, it was natural to want the same for the rich-client. Java technology was the obvious choice, but the standard Java Graphical User Interface (GUI) toolkit, commonly known as Swing, is notoriously slow, being a generic Java system built upon graphics primitives. A better GUI technology is provided by the Standard Widget Toolkit (SWT), which forms part of the Eclipse Project, an open source software development project that was started by IBM but is now controlled by the independent Eclipse Foundation. SWT is Java-based but is implemented as a thin layer on top of each platform's native GUI. The result is a GUI whose look, feel and performance are essentially those of native applications.
Although Eclipse was originally envisaged as a host environment for software tools, so many people found SWT useful for stand-alone applications that Eclipse 3 (released June 2004) provided explicit support for such applications in the form of the Rich Client Platform (RCP). Cognizant of the need to support a variety of schemas (W3C XML Schemas and/or DTDs), the CNBCH software developer (Cogent Logic Corporation) decided early on to develop an Eclipse-hosted software tool (plug-in) that would take any XML Schema or DTD and automatically generate SWT forms and mappings, from specified root elements, for use in a generic RCP application designed to process such data.
XML instance documents contain elements and attributes, not complex types, sequences, etc. So, although the infrastructure of XML Schema is powerful in prescribing the content of XML instance documents, for the most part it is absent from the documents themselves, except for namespace, element and attribute names and for fixed and default values. Building a GUI that maps to an XML Schema requires identifying all elements and attributes and their relationship to each other (their locations in DOM trees) and optionally providing controlled content from fixed and default values plus content constraints from facets, the latter potentially being left to a later revision of the software.
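As a rough illustration of the first step, identifying the elements and attributes, the sketch below uses the JDK's W3C DOM API to list the element and attribute declarations in a single XML Schema document. The DOM-based approach and the class name SchemaDeclarationScanner are assumptions for illustration, not the CNBCH tool's implementation; the sketch deliberately ignores the relationships between declarations, as well as xs:include and xs:import.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists the element and attribute declarations in one schema document. */
class SchemaDeclarationScanner {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Parses XML with namespace support, which getElementsByTagNameNS relies on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse schema", e);
        }
    }

    /** Returns "element:NAME" or "attribute:NAME" for each named declaration, in document order. */
    static List<String> declarations(Document schema) {
        List<String> found = new ArrayList<>();
        NodeList all = schema.getElementsByTagNameNS(XS, "*"); // every xs:* node, depth-first
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String kind = e.getLocalName();
            // Only element and attribute names survive into instance documents; complexType,
            // sequence, etc. are infrastructure. ref= uses are skipped because they point at a
            // named declaration that is listed in its own right.
            if ((kind.equals("element") || kind.equals("attribute")) && e.hasAttribute("name")) {
                found.add(kind + ":" + e.getAttribute("name"));
            }
        }
        return found;
    }
}
```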
To a large extent, then, the following reserved names in XML Schema constitute ‘noise’: simpleType, complexType, simpleContent, complexContent, group, attributeGroup, all, choice, sequence, restriction, extension, list, union. This is not entirely so, however. Whereas sequence, for example, plays no part whatsoever in a GUI, choice indicates that only one of the contained elements may appear, so the GUI must let the user select just one such element. Mapping an XML Schema to a GUI has several consequences; the most important are considered below.
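The distinction between pure ‘noise’ and choice can be encoded as a simple lookup, assuming a generator that assigns one kind of control per XML Schema construct. In this sketch the ControlKind values and the class name are hypothetical; the noise names come from the list above, with choice and the two declaration kinds handled separately, and any other construct is rejected rather than guessed at.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical control kinds a form generator might assign to schema constructs. */
enum ControlKind { NULL_CONTROL, CHOICE_SELECTOR, ELEMENT_CONTROL, ATTRIBUTE_CONTROL }

class SchemaConstructClassifier {
    // Structural constructs that never appear in instance documents.
    private static final Set<String> NOISE = new HashSet<>(Arrays.asList(
        "simpleType", "complexType", "simpleContent", "complexContent", "group",
        "attributeGroup", "all", "sequence", "restriction", "extension", "list", "union"));

    /** Maps the local name of an xs:* construct to the kind of control it needs. */
    static ControlKind classify(String xsLocalName) {
        switch (xsLocalName) {
            case "element":   return ControlKind.ELEMENT_CONTROL;
            case "attribute": return ControlKind.ATTRIBUTE_CONTROL;
            // choice is the exception among the 'noise': the user must pick one alternative.
            case "choice":    return ControlKind.CHOICE_SELECTOR;
            default:
                if (NOISE.contains(xsLocalName)) return ControlKind.NULL_CONTROL;
                throw new IllegalArgumentException("unhandled construct: " + xsLocalName);
        }
    }
}
```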
An XML Schema restriction or extension is similar to a sub-class in Object-Oriented Design (OOD), i.e. a specialization of a simple or complex data type such that the specialized type(s) can be used in place of the more general type. Since simple and complex types are only ever used to describe the content of named XML elements, we have to deal with scenarios where an element is readily identified from its name but its data type must be determined from supplemental data. That data is the xsi:type attribute (xsi references the namespace http://www.w3.org/2001/XMLSchema-instance). Any element in an XML instance document whose content uses a type other than its declared type must identify that type with xsi:type. In the CNBCH application there is just one element that requires an xsi:type, and this happens to be the one and only root element for all documents, CNBCHMetadata:

Each CNBCH document type extends the common content in metadataRecordType,
e.g. for LMO (living modified organism) documents the schema
begins with:

LMO instance documents start with:

For an application GUI, the software developer will typically create
a panel for each data type and use a special selection control to enable the
user to specify the desired type. The CNBCH application did not use a form-based
selection control because the data type choice was available only for root
elements. Instead, the user is given the choice when they create a new document.
When opening an existing document, the application software must choose form
panels based upon xsi:type values contained within the document.
So the RCP application presents a list of named document types, and details such as element names and data types do not appear. The development tool, however, does present a choice of elements and, where necessary, data types:

As another example, consider the choices available from the GML 2 sample
application, city.xsd:

Here, MultiGeometry, featureMember and geometryProperty must
be disambiguated when chosen as the source for populating a panel.
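When a document is opened, the panel can be chosen from the xsi:type value, falling back to the panel for the declared type when the attribute is absent. The sketch below assumes a simple registry keyed by type name; PanelSelector, the panel ids and the type name LMOType used in the usage note are hypothetical, and QName prefixes are not resolved to namespaces.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical selector: chooses a form panel, honouring an xsi:type override if present. */
class PanelSelector {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    private final Map<String, String> panelsByType = new HashMap<>(); // type local name -> panel id
    private final String declaredTypePanel;                           // panel for the declared type

    PanelSelector(String declaredTypePanel) { this.declaredTypePanel = declaredTypePanel; }

    PanelSelector register(String typeLocalName, String panelId) {
        panelsByType.put(typeLocalName, panelId);
        return this;
    }

    String panelFor(Element element) {
        String xsiType = element.getAttributeNS(XSI, "type");
        if (xsiType == null || xsiType.isEmpty()) return declaredTypePanel;
        // xsi:type holds a QName such as "p:SomeType"; this sketch ignores the prefix and
        // matches on the local name only (a real tool would resolve the namespace as well).
        String local = xsiType.substring(xsiType.indexOf(':') + 1);
        String panel = panelsByType.get(local);
        if (panel == null) throw new IllegalArgumentException("no panel registered for " + xsiType);
        return panel;
    }

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getAttributeNS
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }
}
```

For example, registering "LMOType" against "lmoPanel" makes a root element carrying xsi:type="p:LMOType" open in that panel, while a root element without xsi:type opens in the default panel.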
Naturally, when designing the forms, attributes such as xsi:type and xml:lang (see below) must not be shown to the user. The development tool used for the CNBCH application automatically detects such occurrences and uses them to trigger the use of corresponding controls.
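The idea can be sketched at run time on an instance element, assuming namespace-aware W3C DOM parsing: attributes in the XML Schema instance and XML namespaces are routed to dedicated controls, namespace declarations are ignored, and everything else becomes an ordinary field. The AttributeRole names and the class name are hypothetical; the paper describes the development tool doing this detection, which is not reproduced here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

/** Hypothetical roles an attribute can play on a generated form. */
enum AttributeRole { FIELD, TYPE_SELECTOR, LANGUAGE_SELECTOR, NAMESPACE_DECLARATION }

class AttributeClassifier {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    static final String XML = "http://www.w3.org/XML/1998/namespace";
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    static AttributeRole roleOf(Attr a) {
        String ns = a.getNamespaceURI();
        if (XSI.equals(ns) && "type".equals(a.getLocalName())) return AttributeRole.TYPE_SELECTOR;
        if (XML.equals(ns) && "lang".equals(a.getLocalName())) return AttributeRole.LANGUAGE_SELECTOR;
        if (XMLNS.equals(ns)) return AttributeRole.NAMESPACE_DECLARATION;
        return AttributeRole.FIELD; // only these are shown as visible, editable fields
    }

    /** Maps each attribute's qualified name to the role it plays on the form. */
    static Map<String, AttributeRole> roles(Element e) {
        Map<String, AttributeRole> out = new LinkedHashMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            out.put(a.getName(), roleOf(a));
        }
        return out;
    }

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // gives xsi:, xml: and xmlns: attributes their namespace URIs
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }
}
```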
Canada has two official languages, English and French. Consequently,
the CNBCH XML instance documents must support both languages. This is seen
in the dc.title and dc.description elements in the
fragment from an LMO instance document shown above. Notice that these elements
have a maxOccurs of 2 and are declared to be of
type textContentType in the schema. This type is defined as:

Hence, xml:lang associates user-readable
text with a language:

The designer of the end-user application must decide how to present multiple languages, and there are two aspects to this. First, will the application itself be internationalized so that users can select their preferred language? Second, how will the language-specific data be presented? In the CNBCH application, a user can select English or French for the GUI, and all forms then appear with English or French captions. When an XML document is loaded or created, it appears in a view with three tabs: English, French and XML. Someone selecting English for the GUI will see English captions with the English data items on the English tab, and English captions with the French data items on the French tab. Furthermore, as a convenience, each language-specific control can switch to a mode in which content for both languages is seen on the same tab.
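On the data side, each language tab needs the value whose xml:lang matches that tab. A minimal sketch, assuming namespace-aware W3C DOM parsing and exact matching of language codes such as "en" and "fr" (regional subtags such as "en-CA" are not handled); BilingualText and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads repeated, language-tagged children into per-tab values. */
class BilingualText {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    /** Text of each child with the given local name, keyed by its xml:lang value. */
    static Map<String, String> byLanguage(Element parent, String childName) {
        Map<String, String> texts = new LinkedHashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && childName.equals(n.getLocalName())) {
                Element child = (Element) n;
                texts.put(child.getAttributeNS(XML_NS, "lang"), child.getTextContent().trim());
            }
        }
        return texts;
    }

    /** Value shown on one language tab; empty if nothing has been entered for that language. */
    static String forTab(Element parent, String childName, String lang) {
        return byLanguage(parent, childName).getOrDefault(lang, "");
    }

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // xml:lang is then bound to the XML namespace
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }
}
```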
Many XML documents contain data that derives from known sources. The Integrated Taxonomic Information System (ITIS) is one such example; it provides taxa. Another is the Government of Canada's Core Subject Thesaurus. It is convenient for users to be able to look up controlled vocabularies and insert content directly into the forms they are working on. The CNBCH application uses special form controls that can connect to a web service to retrieve such data and cache it locally.
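The caching behaviour of such a control can be sketched as a thin wrapper around a lookup function. Here the remote lookup is modelled as a plain java.util.function.Function standing in for the web-service call, whose interface the paper does not describe, and the cache lives in memory only, whereas a real control might persist it to disk. CachedVocabulary is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper: consults a local cache before calling a (remote) vocabulary source. */
class CachedVocabulary {
    private final Function<String, List<String>> remote; // stand-in for a web-service call
    private final Map<String, List<String>> cache = new HashMap<>();
    private int remoteCalls = 0;

    CachedVocabulary(Function<String, List<String>> remote) { this.remote = remote; }

    /** Returns terms for the query, contacting the remote source only on a cache miss. */
    List<String> lookup(String query) {
        String key = query.trim().toLowerCase(Locale.ROOT); // normalise so trivial variants share an entry
        List<String> hit = cache.get(key);
        if (hit != null) return hit;
        remoteCalls++;
        List<String> terms = Collections.unmodifiableList(new ArrayList<>(remote.apply(key)));
        cache.put(key, terms);
        return terms;
    }

    int remoteCalls() { return remoteCalls; }
}
```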
Mapping an XML Schema to form controls occurs in three stages: XML Schema import, automatic XML Fragment generation and user customization.
An XML Fragment corresponds to a single form and encapsulates a non-trivial branch of the schema. Each root element maps to an XML Fragment that is flagged as a document. Other fragments correspond to elements that are reused in several places in the schema, or to elements that must be disambiguated by specifying one of several possible types.
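The "reused in several places" criterion can be approximated by counting the ref= uses of each global element declaration. This is a sketch under simplifying assumptions (a single schema document, prefixes ignored, a caller-chosen threshold); FragmentCandidates is a hypothetical name, and the selection of document fragments and type-disambiguation fragments is left out.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: finds global elements referenced often enough to deserve their own fragment. */
class FragmentCandidates {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    static Set<String> reusedElements(Document schema, int minUses) {
        Map<String, Integer> uses = new TreeMap<>();
        NodeList decls = schema.getElementsByTagNameNS(XS, "element");
        for (int i = 0; i < decls.getLength(); i++) {
            Element e = (Element) decls.item(i);
            if (e.hasAttribute("ref")) {
                String ref = e.getAttribute("ref");
                uses.merge(ref.substring(ref.indexOf(':') + 1), 1, Integer::sum); // prefix ignored
            }
        }
        Set<String> candidates = new TreeSet<>();
        uses.forEach((name, count) -> { if (count >= minUses) candidates.add(name); });
        return candidates;
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse schema", e);
        }
    }
}
```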
A user creates a project and imports one or more XML Schemas and/or DTDs. An individual XML Schema is a tree of XML Schema documents, which is traversed in its entirety during a single import operation. Each schema can be treated as a single data structure, but it is important to realize that the schema tree is a tree of XML Schema nodes and not a tree that directly depicts XML instance documents! For instance, consider the declaration of the CNBCHMetadata element above. The element declarations for CNBCHMetadata, dc.title, dc.date, etc. are all children of the schema element, even though in XML instance documents dc.title and dc.date are sub-elements of CNBCHMetadata. The target XML tree can be traversed by de-referencing element and type references.
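The de-referencing step can be sketched as a recursive walk that follows ref= to global element declarations, type= to named complex types and extension base= to the types they extend, while passing straight through the ‘noise’ constructs. Assumptions: one schema document, no xs:include, xs:import, xs:group references or substitution groups, and QName prefixes ignored; TargetTreeBuilder is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: derives the instance-document tree implied by a global element. */
class TargetTreeBuilder {
    static final String XS = "http://www.w3.org/2001/XMLSchema";
    private final Map<String, Element> globalElements = new HashMap<>();
    private final Map<String, Element> complexTypes = new HashMap<>();

    TargetTreeBuilder(Document schema) {
        // Index top-level declarations by name.
        for (Element e : xsChildren(schema.getDocumentElement())) {
            if (e.getLocalName().equals("element")) globalElements.put(e.getAttribute("name"), e);
            if (e.getLocalName().equals("complexType")) complexTypes.put(e.getAttribute("name"), e);
        }
    }

    /** One line per element of the target tree, indented two spaces per level. */
    List<String> tree(String rootName) {
        List<String> out = new ArrayList<>();
        Set<Element> path = Collections.newSetFromMap(new IdentityHashMap<>());
        visit(globalElements.get(rootName), 0, out, path);
        return out;
    }

    private void visit(Element decl, int depth, List<String> out, Set<Element> path) {
        if (decl.hasAttribute("ref")) decl = globalElements.get(local(decl.getAttribute("ref")));
        String indent = String.join("", Collections.nCopies(depth, "  "));
        // Stop when a declaration is already on the current path (recursive content model).
        if (!path.add(decl)) { out.add(indent + decl.getAttribute("name") + " (recursive)"); return; }
        out.add(indent + decl.getAttribute("name"));
        Element type = complexTypes.get(local(decl.getAttribute("type"))); // named complex type
        if (type == null) type = firstXsChild(decl, "complexType");        // anonymous complex type
        if (type != null) collect(type, depth + 1, out, path);
        path.remove(decl);
    }

    /** Walks through sequence, choice, complexContent, etc. to reach element declarations. */
    private void collect(Element node, int depth, List<String> out, Set<Element> path) {
        for (Element e : xsChildren(node)) {
            if (e.getLocalName().equals("element")) {
                visit(e, depth, out, path);
            } else {
                if (e.getLocalName().equals("extension")) {
                    Element base = complexTypes.get(local(e.getAttribute("base")));
                    if (base != null) collect(base, depth, out, path); // inherited content comes first
                }
                collect(e, depth, out, path);
            }
        }
    }

    private static List<Element> xsChildren(Element parent) {
        List<Element> kids = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling())
            if (n.getNodeType() == Node.ELEMENT_NODE && XS.equals(n.getNamespaceURI())) kids.add((Element) n);
        return kids;
    }

    private static Element firstXsChild(Element parent, String localName) {
        for (Element e : xsChildren(parent)) if (e.getLocalName().equals(localName)) return e;
        return null;
    }

    private static String local(String qname) { return qname.substring(qname.indexOf(':') + 1); }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse schema", e);
        }
    }
}
```

For a record element of type lmoType, where lmoType extends recordType (which declares dc.title) and adds a reference to a global organism element, the walk yields record, then dc.title, then organism, with both children indented one level.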
When a user wishes to create a new XML Fragment, they are prompted to specify (i) a top-most element, (ii) a data type (if there is a choice of data types) and (iii) a style template. Style templates contain information about the form controls to use with each element and attribute (by id, XPath, name and default) plus associated information such as fonts, colors and layout managers. The software then traverses the implied tree of the selected element/type and generates the form. The ‘noise’ items in the XML Schema are typically assigned ‘null controls’.
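The paper does not specify how the id, XPath, name and default entries of a style template interact. The sketch below assumes a most-specific-wins rule (id, then XPath, then name, then the template default); the class, its methods and the control names in the test are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical style template: picks a control class name for an element or attribute. */
class StyleTemplate {
    private final Map<String, String> byId = new HashMap<>();
    private final Map<String, String> byXPath = new HashMap<>();
    private final Map<String, String> byName = new HashMap<>();
    private final String defaultControl;

    StyleTemplate(String defaultControl) { this.defaultControl = defaultControl; }

    StyleTemplate id(String id, String control) { byId.put(id, control); return this; }
    StyleTemplate xpath(String xpath, String control) { byXPath.put(xpath, control); return this; }
    StyleTemplate name(String name, String control) { byName.put(name, control); return this; }

    /** Assumed precedence: the most specific rule wins (id, XPath, name, then default). */
    String controlFor(String id, String xpath, String name) {
        if (id != null && byId.containsKey(id)) return byId.get(id);
        if (byXPath.containsKey(xpath)) return byXPath.get(xpath);
        if (byName.containsKey(name)) return byName.get(name);
        return defaultControl;
    }
}
```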
For each XML Fragment, the development tool displays a view containing the target XML tree because the overall picture is not always obvious to users:

Once an XML Fragment has been created, the user can:
Each form control that manages XML content is capable of (i) reading XML and using the content to populate its fields and (ii) generating XML as a prelude to storage. These processes are performed on a per-element basis: upon creation, each control is configured with the XPath of the element or attribute that it is responsible for. When reading XML, the application logic provides the control that maps to the root element with a JDOM element, and the control extracts the content that it recognizes, passing sub-elements on to child controls where appropriate. During XML generation, the control that maps to the root is asked to generate its element. It does this by instantiating a JDOM element and passing it to its child controls before returning the fully populated element to the calling application logic. Control trees are traversed in full, so every control is visited during both reading and writing.
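The read/write protocol can be sketched as follows. JDOM is not part of the JDK, so this version substitutes the standard W3C DOM API, but the shape is the same: each control is bound to one element name, leaf controls hold text, and container controls delegate to their children, which append their own elements to the element passed to them. FormControl is hypothetical, elements are matched by qualified name, and repeated elements (such as the two language variants of dc.title) overwrite one another in this simplified version.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical form control bound to one element name; leaves hold text, containers hold children. */
class FormControl {
    final String elementName;
    final List<FormControl> children = new ArrayList<>();
    String value = ""; // stands in for the widget's field contents

    FormControl(String elementName) { this.elementName = elementName; }

    FormControl add(FormControl child) { children.add(child); return this; }

    /** Populates this control from its element, passing recognized sub-elements to child controls. */
    void read(Element element) {
        if (children.isEmpty()) { value = element.getTextContent(); return; }
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            for (FormControl child : children)
                if (child.elementName.equals(n.getNodeName())) child.read((Element) n);
        }
    }

    /** Root entry point: create this control's element, let the children fill it, return it. */
    Element write(Document doc) {
        Element element = doc.createElement(elementName);
        fill(element);
        return element;
    }

    /** Child entry point: the parent's element is passed in and this control appends its own. */
    void writeInto(Element parent) {
        Element element = parent.getOwnerDocument().createElement(elementName);
        fill(element);
        parent.appendChild(element);
    }

    private void fill(Element element) {
        if (children.isEmpty()) element.setTextContent(value);
        for (FormControl child : children) child.writeInto(element);
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```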
For a number of years now, scientific and technical professionals have been coordinating their efforts within their specialist fields to share information using XML. Some of these initiatives are well established, others are emergent; all are ongoing. The XML schemas produced can be of mind-boggling complexity, yet much of this information is not intended for computer-to-computer inter-operability but ultimately for end-user digestion. The successful CNBCH initiative demonstrates how rich-client architectures can provide good end-user experiences and this paper highlights the XML Schema related issues that need to be resolved when developing such systems.
NOTE: For more on the Biosafety Clearing-House, see http://bch.biodiv.org/.
NOTE: For more on the CNBCH, see http://www.bch.gc.ca/.
NOTE: For more on Eclipse, see http://www.eclipse.org/.
NOTE: For more on Cogent Logic Corporation, see http://cogentlogic.com/.
NOTE: For more on xsi:type, see http://www.w3.org/TR/xmlschema-1/#xsi_type.
NOTE: For more on xml:lang, see http://www.w3.org/TR/REC-xml/#sec-lang-tag.
NOTE: For more on ITIS, see http://www.cbif.gc.ca/pls/itisca/.
NOTE: For more on the Core Subject Thesaurus, see http://www.thesaurus.gc.ca/.
NOTE: For more on JDOM, see http://jdom.org/.