The mission was to take a traditional customer support operation for a software product and deliver it online over the internet. The product was a very mature and extremely complex application development system called Uniface, the company was Compuware Corporation, the year was 1997. The term ‘portal’ had only just been invented. No off-the-shelf infrastructure was available to fast-track the development of the site. Web Content Management was unheard of.
The legacy was our starting point:
Two documentation sets for previous versions numbering thousands of pages
The current documentation set, 5,000 pages long
A new documentation set in the making
Four equivalent sets of training and consultancy materials, each at least 5,000 pages long
FAQs going back five years covering three previous versions
Release notes
A bug-tracking application concealed from end-users
Access paths to support engineers via email, fax and telephone.
The goal of the project was to provide:
Online access to previous versions of documentation and training materials, in PDF
A single-source authoring and publishing system to drive all new publishing actions
Online training materials
An online learning environment (initially dubbed the uniface@cademy)
An online helpdesk for customer support
Online access to FAQs and knowledge centres
An automated build process that could dynamically generate the latest state of all ‘static’ content
A publishing process to provide live updates where applicable.
The constraints driving the requirements were:
The lifecycle of some information spanned multiple versions of the Uniface product
The requirements of a publishing system were constantly evolving
The system should support the assembly of new document types from existing components
Dynamic, incremental publishing was necessary
No broken links could be tolerated
All information should be single-source
Access to content should not require proprietary plug-ins
Contributors and authors should be able to directly update content.
The team developing the system comprised three architects, two publishing specialists, two outside consultants, four developers, and an army of content creators / editors / etc. In total, around 30 people were involved, of whom seven would be heavily involved in designing and creating the schemas driving the system. These seven people were geographically divided across two countries, representing four distinct departments or organizations.
As is customary, designing the schema family to comply with the above happened relatively late in the project. First stop was an analysis phase. The existing materials and support applications were very well understood. However, to satisfy the single-source requirement and the ability to assemble new document types with relative ease, a modular design was clearly going to be necessary. We launched a series of user studies that ran for some months, and resulted in an information ‘map’. [1]
The information map charted the entire domain of the new system in terms of the kind of information that any given user needed to see in any given context. The information map essentially described the structure of a modular and heavily inter-linked set of topics. A cross-reference model defined how users might navigate from topic to topic.
A pattern was quickly discernible in the map: all the information in the system, whether used for training materials or to explain the workaround for a bug, could be categorized in one of five categories:
Concept
Task
Reference
Glossary
Example.
With these categories, documents of any complexity could be assembled. Each category contained various topics for product-specific content (for example, the reference information for a Uniface function), and topics for shared information. In total, over 50 topic types were identified.
When the time came to design the content management system at the heart of the application, each topic equated neatly to one DTD. Over and above the 50+ topic DTDs, we created a number of ‘structure’ DTDs to facilitate data entry, and a number of ‘publication’ DTDs that linked the topics together into coherent publications (such as a training course). A Link Manager was designed and built using an abstraction layer that gave us an external link database. The Link Manager allowed topics to be linked together in such a way that validating a document for publication meant validating the web of links rather than the constituent documents, as all atomic documents in the content management system could be safely assumed to be valid. The Link Manager satisfied the constraint that we needed to be able to incrementally build a publication set by ensuring that every published set was governed by a valid set of links.
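The idea of validating the web of links rather than the constituent documents can be sketched as follows. This is a minimal illustration, not the Link Manager we built: the `Link` record, the topic identifiers and both function names are hypothetical, and the external link database is reduced to a plain list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One row of a hypothetical external link database."""
    source: str  # id of the topic the link starts from
    target: str  # id of the topic the link points to


def broken_links(publication, links):
    """Return every link that starts inside the publication set
    but points to a topic that is not part of it."""
    topics = set(publication)
    return [l for l in links if l.source in topics and l.target not in topics]


def can_publish(publication, links):
    """A publication set is publishable when its web of links is closed.
    The topics themselves are assumed to be valid already."""
    return not broken_links(publication, links)


# Illustrative data only.
links = [
    Link("task-install", "concept-licence"),
    Link("task-install", "ref-setup"),
]
print(can_publish(["task-install", "concept-licence", "ref-setup"], links))
print(broken_links(["task-install", "concept-licence"], links))
```

Because only the links are checked, an incremental build needs to re-validate just the links touching the topics that changed, which is what makes incremental publishing tractable in such a design.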
Existing content was subjected to a conversion process, the database was filled (we were using Astoria), authors and developers joined forces to get the first working version of the site finished, and we raced towards the deadlines imposed on us by the next release of Uniface. As we went into production, we were using 65 DTDs.
In hindsight, creating the initial set of schemas was the simplest and most efficient part of the project. In terms of software development and publishing, the challenges presented by the project were phenomenal, despite the fact that everybody on the team had extensive (direct or indirect) experience of the processes involved in developing large software applications. We had all agreed from the outset that the ‘waterfall’ method of software development was not suitable for such a project, due to the incremental rollout and addition of so many of the components; the ‘cyclical’ development model (design, implement, test, deploy, ad infinitum) was clearly preferable. However, continually cycling back into the design phase to tweak our definitions meant revisiting the design of the schemas.
Revising a schema potentially causes everything dependent on or described by that schema to break. Since schemas are basically only text files containing declarations, they are easy to break, their contents are difficult to control, and the impact of change is difficult to predict. There is no automated way of charting the dependencies between, say, an attribute and every instance of every object relying on that attribute. In a project management sense, changing the definition of the attribute means that somebody has a lot of clearing up to do and a lot of invisible budget drain happens.
Sharing responsibility for schema development across a number of developers was proving to be impossible. Only one person (and that person had to be an expert on the entire system) could control the schemas at any one time. Only one developer could drive the process of keeping the application synchronized with the modifications made to the schemas. Functional requirements became steadily more difficult to satisfy as technical problems caused us to sacrifice functionality for technical feasibility. In short, with every day it became clearer that the only perfect way to release a new system of this scale was in the ‘big bang’ way: all at once with everything working properly, and yet that meant using the waterfall method of design and implementation.
Close to the end of the project, every small change to the DTDs triggered around 40 work-days of extra effort. The authors and developers whose job it was to synchronize the application and content were demotivated and unable to do their ‘real’ jobs. Managing the scope of the project in terms of deadlines and budget became a nightmare: it was schema-driven chaos. We made it, but at an almost unquantifiable cost to the organization.
So, let us analyse this case study and ask ourselves if there is anything specific about the technology that makes shared development of schemas (and schema-driven applications) so difficult.
XML provides us (among other things) with a mechanism for applying names to objects. Complex XML-driven applications present a management problem because a name is an object in XML, when from a programmer’s perspective a name ought to be a property of an object. As such, XML declarations are not objects that can be dealt with as if they were source code. XML objects from a programmer’s perspective are schemas, transformations, and so on, all of which provide a container for multiple references to single objects, duplicated to the nth degree.
The management problem arises because of two issues:
Charting all the dependencies between objects is almost impossible
There is no suitable infrastructure for team development.
The lack of ‘source code’ for XML means that the management of team development in complex XML-driven environments is limited to the management of the XML objects produced by the developers. For example, in a system that relies on families of schemas, developers work on a schema, check it into a repository of some kind, and then other developers check it out and use it to develop associated pieces of the puzzle (java classes, transformations, and so on).
Applying version and source control is only possible at the level of the container in which the schema is placed. No semantic understanding of the actual contents is supported, which makes canonical comparisons impossible. Such a system therefore has no control over what the teams of developers build against the schema, which causes an exponential growth in the number of risks and potential errors per XML object compared with the situation where only one developer is working on the system.
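To illustrate the difference between versioning the container and understanding its contents, the sketch below compares two schema files at the level of their declarations instead of their text. It is a deliberately simplified assumption of what a content-aware comparison could look like: the function names are hypothetical, only named element and attribute declarations are considered, and declarations are keyed by name alone, so same-named local declarations in different contexts would collide.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def declarations(xsd_text):
    """Map each named element/attribute declaration to its declared type."""
    root = ET.fromstring(xsd_text)
    found = {}
    for node in root.iter():
        if node.tag in (XS + "element", XS + "attribute") and "name" in node.attrib:
            kind = node.tag[len(XS):]  # "element" or "attribute"
            found[(kind, node.get("name"))] = node.get("type", "(anonymous type)")
    return found


def changed_declarations(old_xsd, new_xsd):
    """Return {declaration: (old type, new type)} for every declaration that
    was added, removed or retyped. None marks an absent declaration."""
    old, new = declarations(old_xsd), declarations(new_xsd)
    return {
        key: (old.get(key), new.get(key))
        for key in old.keys() | new.keys()
        if old.get(key) != new.get(key)
    }
```

A line-based diff in a source control system would flag a schema that has merely been re-indented or had its attributes reordered; a comparison of this kind reports nothing in that case, and reports only the declaration that actually changed when, say, an attribute is retyped.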
In the lifecycle of any application environment where there is no source or version control, and when duplicate definitions describe much of the system, the following issues result as time goes by:
Increasing risk of failure
Decreasing reliability (‘buggy’ behaviour)
Unpredictable costs
Resource bottlenecks
Decreasing compliance with corporate or industry standards
Lengthening development cycles
Inability to change existing systems
Diminished understanding of the system as a whole
Vendor and supplier lock-in.
Most XML development environments are tools to help developers create systems, but not manage them. XML ‘out of the box’ offers no mechanism or infrastructure to handle such important management information (and we should not expect it to) — traditional methods consist of system documentation and the knowledge that has built up in the heads of the application owners.
Vodafone’s Global Product and Content Services Group is responsible for the ongoing design and development of Vodafone live! Having reached the target of a million active customers in early 2003, the group had the challenge of taking the service to the next level. Objectives were to:
Enable roll-out of the service to a growing number of worldwide Vodafone subsidiaries
Support a growing number of diverse content providers supplying very different types of content
Provide the flexibility to support deployment in new geographies
Manage new types of content without re-engineering core components.
The team had already made extensive use of XML to aggregate, transform, personalise and deliver content to their multi-access portal. Various schemas had been developed to manage the XML data, but they did not adequately support the growth and flexibility objectives for the next generation of Vodafone live! A new markup language was required. The new language, VCML (Vodafone Content Markup Language), would be designed from the ground up:
to be mobile-centric and content-centric
to address the current and future needs of all the Vodafone operating divisions
to incorporate the latest developments in XML standards to provide fine-grained access to varied content.
I am using VCML as a case study because it illustrates the processes involved in designing and developing a family of XML schemas with a very wide scope and a very modern application. VCML was designed as a markup language for the description of individual pieces of content that are produced by content-generating applications maintained by both Vodafone and third parties, and passed to the Vodafone live! portal for assembly and presentation to a target device. VCML is designed to use content-centric rather than display-centric markup. This promotes flexibility by supporting new target devices without the need to continually modify or evolve the content.
digitalML Ltd. ensured the standards compliance of the new schema by exploiting facilities from the XML family of standards where possible rather than duplicating them, and by ensuring conformance with Internationalisation (I18N) and Web Accessibility Initiative (WAI) guidelines. A modular schema design approach was taken in order to support future development of user experience. This will allow new types of content to be supported by extension of the existing schema rather than creation of an entirely new schema. The modularity also supports new types of content delivery such as VoiceML presentation. New modules can be developed without affecting the existing modules or existing applications.
The project to design and deliver VCML has been very successful. By the end of the VCML development project, the family of schemas included 21 interrelated XML Schemas. Unlike the first case study, the development work that built on those schemas was still to come.
In the first case study at Compuware, everybody agreed that the cyclical model of software development was preferable. Yet everybody also concluded that in a schema-driven application development environment, especially when there is a great deal of content involved, the cyclical method of development is extremely difficult to apply using current technologies.
In the second case study at Vodafone, the schema development was completed before application development started. The challenge facing development teams and architects is twofold (and standard fare for this kind of technology): firstly, the architects will need to evolve VCML in response to requests from the field; secondly, they will be faced with differing shelf-lives of document instances that will need to move in sync with the evolution of VCML. While the initial schema development and handover to the field has been handled in a way that is closer to the waterfall method of development, all ensuing activity as VCML evolves will demand a cyclical approach.
The cyclical approach to application development is necessary in any situation where you have documents and application logic with a shelf-life longer than the schemas, and where an existing implementation needs to stay alive throughout the process of change. In particular, it is the need to keep an application alive while evolving it that forces developers into the cyclical method, and this is precisely the approach that is so difficult to apply with XML.
When the process of schema design and implementation is spread across multiple team members in multiple locations, you need an owner, and an agreement in the field to honour a set of published schemas. Modifications are presented as change requests to the owner, who may or may not make the requested changes. When changes occur, a new cut of the schemas is published and deployed to the field.
The technology to support such a system relies heavily on processes and people. Schemas can be kept in a source control system (preferably one with strong support for versions), good programming practices can be applied to comments and case numbering, a UML model (or equivalent) should be maintained in a central place, visible to the organization, and everybody must agree the basic standards of schema design:
Which tools to use
Where and how to comment
When to use various XSD constructs as opposed to the available alternatives
When to use globally available resources, and when to ‘bake your own’
(and so on)...
The bottom line is this: without an infrastructure to give XML the objects behind the names, and thus allow us to sensibly version more than the container file of the schema itself, storing schemas in CVS is about the only applicable solution.
Object modelling provides a way out of our dilemma. Valid or well-formed XML always conforms to the definition of a document structure. The ‘model’ of an XML document is its schema. The schema also formally documents the data requirements of any process relying on the document. However, modelling an XML document actually means modelling the model of the XML document (that is, modelling the schema). The model of a schema is one level of abstraction higher than the schema.
Modelling the schema has the following advantages:
Modelling the schema allows you to generate any flavour of schema as a down-translation from a rich, syntax-independent medium
The model of a schema allows you to attach non-XML properties to a schema
If you can model the schema, you can generate the schema from the model.
If you can generate the schema, you now know where your schema physically exists in your system, and (this is very important, because the schema describes the XML flowing between two processes) you know which processes will be affected by a change to your schema.
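A small sketch may make the point concrete. It assumes a hypothetical in-memory model (`SchemaModel`, `ElementModel`) in which each element carries a non-XML property, here the list of processes that consume it, and from which a W3C XML Schema is generated as a down-translation. The class names, the `used_by` property and the process names are illustrative assumptions, and only flat, globally declared elements are handled.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


@dataclass
class ElementModel:
    name: str
    xsd_type: str
    # Non-XML property attached to the model: processes that consume this element.
    used_by: list = field(default_factory=list)


@dataclass
class SchemaModel:
    name: str
    elements: list


def generate_xsd(model):
    """Down-translate the model into an XML Schema document (as text)."""
    ET.register_namespace("xs", XS_NS)
    schema = ET.Element(f"{{{XS_NS}}}schema")
    for el in model.elements:
        ET.SubElement(schema, f"{{{XS_NS}}}element",
                      {"name": el.name, "type": el.xsd_type})
    return ET.tostring(schema, encoding="unicode")


def affected_processes(model, element_name):
    """List the processes recorded against an element in the model."""
    for el in model.elements:
        if el.name == element_name:
            return list(el.used_by)
    return []


order = SchemaModel("order", [
    ElementModel("transactionId", "xs:integer", ["billing", "audit"]),
])
print(generate_xsd(order))
print(affected_processes(order, "transactionId"))
```

The same model could be walked by a second generator to emit a DTD or another schema flavour; the information about consuming processes stays in the model and never needs to appear in the generated file.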
The following problems arise:
A model of a schema does not provide a mechanism for handling the re-use of objects; the re-use of objects in different schemas forces duplication of definitions
A change to an element still means breaking open all the models of all the schemas and finding out what is broken
Contextual behaviour, such as context-sensitive business rules, quickly leads designers to a situation in which models are duplicated.
The solution is this: If you can model the model of a schema, you can generate multiple models of multiple schemas from one model, and thus know where an element has been used (or any of its dependencies) anywhere on the system.
At this level of abstraction (you might say that we are talking about the model of the model of the model of an XML document!) the model of the objects means a high-level, object-oriented view of all the objects used in the system. The object model gives us many advantages, as follows:
Property sets, such as transformation rules, style information, Java code that processes the information, and so on can be attached to individual objects before they are deployed into physical implementations of the logic.
From a pool of single objects, you can build deployable ‘structures’ that are equivalent to schemas (or schema fragments), and publish them in the same way that a conventional software system would be built by compiling it from an identifiable code baseline. These are ‘published contexts’, meaning the contexts in which we know that any given object is actually used.
You can identify where a system will break by changing the object in the model of the objects, and this will tell you:
Which deployed structures (published contexts) are affected
Where those structures are used in the system
And therefore, which bits of your system will break.
Conversely, if you have captured enough information about the object in the first place, you can tell which parts of the system are affected but do not need to be changed. For example, if you take a transaction ID with an integer data type and add a timestamp to it, the impact on the system will only be in those parts where the contents of the transaction ID are processed, not where it is merely transferred as a packet of data.
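The impact analysis described above can be sketched with a pool of objects that records, for each object, the published contexts it is deployed in and whether each consuming component actually processes the content or merely passes it on. Everything here (`ObjectPool`, `Usage`, the `reads_content` flag, the context and component names) is a hypothetical illustration of the idea, not a description of an existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    context: str         # published context (a schema-equivalent structure)
    component: str       # part of the system that consumes that context
    reads_content: bool  # True if it interprets the value, False if it only transfers it


class ObjectPool:
    """Single definitions of objects plus a record of where each is deployed."""

    def __init__(self):
        self._usages = {}  # object name -> list of Usage

    def publish(self, obj, context, component, reads_content):
        self._usages.setdefault(obj, []).append(
            Usage(context, component, reads_content))

    def impact(self, obj):
        """Report what a change to `obj` touches, and what it will break."""
        usages = self._usages.get(obj, [])
        must_change = {u.component for u in usages if u.reads_content}
        all_components = {u.component for u in usages}
        return {
            "contexts": sorted({u.context for u in usages}),
            "must_change": sorted(must_change),
            "unaffected": sorted(all_components - must_change),
        }


pool = ObjectPool()
pool.publish("transactionId", "order-schema", "billing", reads_content=True)
pool.publish("transactionId", "order-schema", "message-router", reads_content=False)
pool.publish("transactionId", "audit-schema", "audit-log", reads_content=True)
print(pool.impact("transactionId"))
```

In the transaction ID example, adding a timestamp would put the components that parse the ID on the `must_change` list, while a component that only forwards the message would be listed as unaffected.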
The following problem still exists: you cannot version the published contexts (equivalent to schemas or related properties), because in order to version schemas you need to version the objects in the schemas.
The solution is to add another layer of abstraction and capture version information about an object before it is used to build deployable structures.
Modelling XML so that versions can be fully supported means building a model one layer of abstraction higher still. Keeping XML-based systems alive is a versioning issue. Keeping systems alive means being able to analyze the impact of change, auto-generate systems that are affected by change, and provide system owners with all the hooks and handles they need to manage change throughout the system. In an XML-based environment, this is only possible when you can version schemas. And versioning schemas, as we have seen in the ‘Model of objects’ layer, is only possible if you can version the individual objects used in the schemas.
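As a rough sketch of what versioning the individual objects might look like, the example below keeps a history per object and lets a published context pin each object at the version that was current when the context was built. The names (`VersionedPool`, `ObjectVersion`, `build_context`, `stale_objects`) and the simple integer version numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectVersion:
    name: str
    version: int
    xsd_type: str


class VersionedPool:
    """Keeps the full version history of every object declaration."""

    def __init__(self):
        self._history = {}  # object name -> list of ObjectVersion, oldest first

    def commit(self, name, xsd_type):
        """Record a new version of an object and return it."""
        versions = self._history.setdefault(name, [])
        new_version = ObjectVersion(name, len(versions) + 1, xsd_type)
        versions.append(new_version)
        return new_version

    def latest(self, name):
        return self._history[name][-1]


def build_context(pool, names):
    """Build a published context that pins each object at its current version."""
    return {name: pool.latest(name) for name in names}


def stale_objects(pool, context):
    """Names of objects that have moved on since the context was published."""
    return sorted(name for name, pinned in context.items()
                  if pool.latest(name).version > pinned.version)
```

With version information held per object, a deployed context remains reproducible from its pinned versions, and a change to one object identifies exactly which published contexts are out of date and need to be regenerated.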
Schema design is about managing evolution. Managing evolution is about managing not just versions of schemas, but more importantly the individual object declarations contained in the schemas. Organisations developing schema families in large, distributed teams must (perforce) use an approach in which technology plays second fiddle to process. In an ideal world, the two should be inextricably intertwined, with a process facilitated by a strong, supporting technology.
[1] Conceptually similar to but not the same as the results of the commercially available and copyrighted Information Mapping technique, hence the lack of initial capitalization.