Abstract
This paper explores how to use the new [DOM Validation Interfaces] as the basis for a prompted editing tool. It looks at areas where these interfaces fall short of what may be required by such applications, and ways in which these shortfalls can be made good. It also examines ways in which implementations of these interfaces could provide support for validation against a richer variety of business rules than can be expressed in standard content model definition languages such as DTDs and XML Schema.
Listen carefully next time a new XML data format is announced, and you will hear the unmistakable sound of thousands of content owners groaning in unison. New technologies and new processing systems often impose new requirements on how raw XML data is formatted, and this is as it should be: if we can make some basic assumptions about the structure of our inputs we can achieve that much more in processing them. We have numerous technologies that allow us to codify (at least some of) these assumptions and then validate data against them. But it is surprisingly easy to lose sight of the fact that it is not just the recipient of the data who needs to work with these constraints: spare a thought for the poor souls who have to key the stuff into the machine in the first place. For many, having to produce content in XML is daunting enough without the prospect of having to understand the implications of 20 pages of XML Schema.
That is, of course, an over-dramatisation of the situation. Much of the content we are talking about is already in XML, or at least a format that can readily be converted to it by automatic means. Furthermore, technologies like [Extensible Stylesheet Language] and [XQuery] have the potential to remove much of the drudgery from ensuring that one's XML content is appropriately structured for the current application. But at the end of the day, there is one thing that cannot (currently) be automated: understanding how the elements of "XML Format 1" map onto "XML Format 2". To do that, one needs to understand what those elements mean in a more abstract sense. If we are to generate valid and meaningful XML content, whether by entering it at the keyboard or by transforming existing data, the process will at some point require a human being to understand the sense and structure of the target format.
This paper looks at "prompted editing" as one way in which we can ease the pain of this process of understanding by squeezing a bit more value from those technologies used to validate XML content at the receiving end. In general terms, a tool that supports prompted editing is one that can guide users through the creation of certain types of content by presenting them with closed lists of options as they edit, by marking errors as they type, even by actually preventing them from making invalid changes at all. There are already applications that will do this for XML content; there are even some that will read an XML Schema and provide editing assistance based on the grammar it defines. But these tend to be specialist applications aimed at people editing raw XML; thus the complex code involved in dynamically interpreting the applicability of a schema to a document that is supposed to conform to it is inextricably tied to particular user interfaces. Anyone wishing to build this sort of schema-awareness into their own applications, to support their own user interfaces, is thus faced with the daunting task of having to rewrite all that core code.
On the 27th of January this year, however, the World Wide Web Consortium (W3C) [DOM Validation Interfaces] reached "Recommendation" status. These interfaces are aimed at providing exactly the sort of information required to start building schema-awareness into any application working with a [DOM] view of an XML document. This should ensure that the complex and specialist task of writing code to understand the various schema languages is encapsulated within standard [DOM] processors, leaving application developers free to concentrate on making their user interfaces as intuitive as possible. Hopefully, the end result will be a much wider variety of tools capable of providing some analgesic support to soothe the pain of getting data into complex XML structures.
In the remainder of this paper, I would like to examine in a bit more detail how these new interfaces can be used to support prompted editing. In doing so, I will highlight some areas where I believe that many tools will require more information than the interfaces provide to give truly helpful feedback to users, and I will make some suggestions of ways to fill these gaps. As part of this, I will also examine how implementors of the specification might go about adding support for other sorts of validation above and beyond the obvious standards like Document Type Definition (DTD) and XML Schema. Finally, I believe that in the course of looking at these issues, it will become clear that we might profit from a slight, but significant, change in the way in which we treat validation of XML content more generally.
There are many, many different possible types of prompted editing application; these include straightforward XML editors, high-level graphical tools, and even systems designed to create stylesheets appropriate for the generation of content according to a particular schema. For the purposes of this discussion, I am going to stick to the basic, generic case of a tool that can assist with the authoring of an instance document conforming to a particular schema; [1] the discussion should be applicable to all applications in this category, irrespective of how the data is ultimately displayed to the user.
First I will enumerate the basic operations required of such a tool; we can then look at how these might be implemented. For the purposes of drawing up a list of this sort, I want to make a slightly artificial, but nevertheless helpful distinction between functions required to assist creation of a document, and functions required to assist editing one. Inevitably, there is significant overlap between these two, but they generate different sorts of requirements that need to be addressed in different ways.
What is it that we are creating? It sounds silly, but almost the most important item of information that one can provide is a description of the purpose of the document being created. The author of a schema generally knows how the data he is describing is going to be used; he understands the semantic significance of its structural divisions; he has a pretty good idea of what a "normal" instance document will look like. The user, by contrast, may have only the schema itself for guidance; if he is very lucky it might contain useful embedded documentation, but even then, the syntactic constructs of schema languages can make it hard to see the wood for the trees. [2] Moreover, since we are talking about prompted editing tools, one would hope that the user would not have to examine the schema directly - the application should present him with all the information he needs to know. Even if our data definition is beautifully documented, we need to decide what to display to the user, and in what format.
Here is a list of possible information that might be provided through a prompted editing system, in ascending order of sophistication:
Plain text description. Eliminate the middle man - allow the author of the specification to communicate the intention of the data format directly to the user in their own language, rather than interposing the restrictions of machine-readable syntax at every stage. I would argue that this sort of information is among the most important items of editing assistance. For example, the documentation provided with the DTD for conference papers starts with a brief introduction that describes the sort of document required, the basic outline of its structure, and the intended division of content among the high-level elements. It would have taken orders of magnitude longer to gain this understanding from the DTD itself than it did to read those few lines of documentation. Would it not be even more useful if I could be sure that this description would be available at the touch of a button when I started writing a paper in accordance with the DTD?
Annotated schema view. When writing an instance document, one generally has specific information that one wants to express, but one does not always know the best way to express this in the format being used. If I want to create a series of nested sections in a DocBook document, and I come from an HTML background, I might start by using nested lists for want of knowing better. It's valid, but it probably is not the right way to do it, and context sensitive prompting will not necessarily bring the alternatives to my attention. But if I can browse the high-level structure of the schema, I will see that there are "section" elements that can be nested to an arbitrary level, which can have titles, and which are likely to appear in any table of contents for my document.
Default structure creation. Going one step further, it is worth noting that the top-level structure of XML data formats is often fairly rigid, whereas the details tend to get more flexible. At the most sophisticated level, one could have a system that actually generated outline content for all those sections of the document that were required, filled in default values, and possibly even included examples of optional sections. In effect, the schema would become the basis for generating an input template. Naturally, this is an area that has to be treated with some care: one can end up producing something that annoys users more than it assists them; nevertheless, the fundamental principle of learning by example is one that should be borne in mind.
What is common to all three of these options is that they emphasise the role of human communication. One could write a prompted editing system simply by guiding the user at every step of the way, detailing what is allowed, required and prohibited at any given point. Users of such systems would not be able to create invalid content according to the relevant schemas, but I suspect that they would find it hard going, and I am almost certain that they would create many documents that were different to what the schema author intended. The reason for this is simple: this approach fails to capitalise on the fact that people massively outperform computers when it comes to seeing and communicating the big picture. This is especially true in areas where the users understand the real-world domain that an XML format is working with: I already know far more in general about conference papers than any current computer, so if I can see the structure I have to work with, I'm likely to be able to do most of the interpretative work myself. Even where this background knowledge is not present, if an author can see a structure view annotated with natural language documentation, the rich framework of linguistic community in which we all live allows far more high-level information to be communicated than could be encoded into a series of machine-readable rules.
What am I allowed to put at a given point? This is the bread and butter of prompted editing: enumerating the content that is valid at a given point in the document by interpreting the schema; it plays to one of the most basic principles of interface design: break complex operations down into a series of smaller, more easily comprehensible choices. The simplest, most generic approach to the issue is to provide lists of valid element names, attribute names, and, where appropriate, enumerated values. This significantly reduces the complexity of the problem faced by someone trying to put together a document in an unfamiliar format.
What is the meaning of the content valid at a given point? Simply listing the names and values allowed at a given point has some limitations: in deciding which element to insert, one has only names to go on. With a little more information, the choice becomes much easier. In the case of elements and attributes, this might be a list which can be expanded to provide more detailed information about type, value limitations, and any accompanying documentation; in the case of text values one would at the very least hope to see some sort of human-readable description that helps the user understand what is expected. At the most sophisticated level, one would ideally be able to see what dependencies a given element had on others elsewhere in the document - must it be unique within a given set, is it a reference to another key value somewhere else, and so on.
Completeness checks. In addition to informing the user about what is and is not allowed, a prompted editing tool must also indicate what is required. As discussed above, such content listings can come in leaner and richer forms, but in the case of required content, there is a further possibility that has already been alluded to: adding outline content automatically. If done well, this can save the user a great deal of time and effort by adding all the boilerplate structure that is required in many XML formats.
Is it valid? The most basic requirement: indicate whether a document is valid according to the relevant schema.
Which parts are wrong? Being told that a document is not valid is, of course, not terribly helpful if one is not told which pieces of it are actually at fault. This may sound trivial, but it is worth noting that there is often a disparity between what the computer considers to be at fault, and what most end users would want to be shown. When validating an element against a [WXS] schema, for example, if it has a child that should not be present, most schema processors will report the parent element to be at fault (violation of content model); but to the uneducated eye, the problem lies with the child, which is probably what one would want to highlight in a user interface.
For parts that are wrong, why are they wrong? Again, this sounds blindingly obvious, but if a user is to be able to correct errors, the software will have to provide comprehensible error messages that detail what the problem is. This is closely related to the question of meaningful descriptions of content mentioned in item 2 of the creation requirements: if an attribute value does not satisfy a regular expression constraint, outputting a message containing a natural language description of the expected data type is going to be considerably more useful than displaying an error along the lines of "Value 'Foo' is invalid for pattern '[b]{1,2}lor*(t|T)'".
Another simple but important observation is that an error message needs to point the user in the right direction to fix the mistake. We cannot yet write telepathic software, so most of the time the program will have no idea what the user was trying to achieve, but this is no excuse for failing to provide as much information as the processor knows, rather than expecting the user to find half of it for himself. If an element has a duplicate ID, list all of the duplicates as well as the current one; if an element has an invalid child, list valid replacements, and list valid parents for the erroneous child. By presenting all the relevant information at exactly the point where it is needed, the software takes on as much of the burden of debugging as it is able, rather than posing a conundrum for the user to solve.
What if...? A frequent operation when editing a document is moving content around without changing it; ideally, a prompted editing environment would prevent you from moving a section of an XML tree to a point where it was invalid. A graphical "drag-and-drop" environment, for example, might only display a "drop" icon at those sections of the document where it was valid to insert content being moved or copied. On a slightly different note, when editing the value of an attribute or the text value of an element, the "OK" button on the editing dialog box might be greyed out until the value entered is appropriate. This sort of behaviour requires the application to play "what if" with the data that the user is working with in order to provide feedback on potential actions.
This list of requirements is not intended to be exhaustive, but I hope it covers the major areas. Although some are more significant than others, they all have the potential to make an important contribution to the usability and usefulness of a prompted editing system. It should now be clear that the slightly arbitrary distinction between creation and editing of documents has produced two rather different sets of functions. The first set focuses on the issue of obtaining information about the structure of the schema/an ideal document: lists of valid content and information about its meaning. By contrast, the editing requirements are more concerned with the fact of validity: is this content/this potential change to the content valid or not? Potentially, these two problems call for quite considerably divergent solutions, which is something I wish to look at in the next section.
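The "what if" behaviour described for drag-and-drop can be sketched in a few lines. This is only an illustration under stated assumptions: the function name valid_drop_positions, the validator title_paras_sections, and the toy rule it encodes (one title, then paragraphs, then sections) are all hypothetical and are not part of any DOM interface. The point is simply that each candidate edit is tried on a copy and the real content is never touched.

```python
import re


def valid_drop_positions(children, moved, is_valid):
    """Return every index at which `moved` could be dropped without breaking validity."""
    positions = []
    for index in range(len(children) + 1):
        # Build a trial copy; the real child list is never modified.
        trial = children[:index] + [moved] + children[index:]
        if is_valid(trial):
            positions.append(index)
    return positions


def title_paras_sections(children):
    """Hypothetical content rule: one title, then any paras, then any sections."""
    return re.fullmatch(r"title( para)*( section)*", " ".join(children)) is not None


current = ["title", "para", "section"]
drop_targets = valid_drop_positions(current, "para", title_paras_sections)
```

A user interface would show a "drop" icon only at the returned indices; an empty list means the content cannot be dropped anywhere under this parent.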
Turning to the new interfaces released by the W3C, let us examine how one could go about using them to meet the requirements set out above. Some areas are covered extremely well, to the point where an application would have to do little more than display the information made available; in others, it is possible to meet the requirements with a bit of extra work; and finally, in some cases, the interfaces just do not expose the required facts. I have summarised the levels of support in the table below; the details are provided in the remainder of this section.
| Requirement | Level of support |
| Creation 1: Top-level documentation | No direct support, limited workarounds available |
| Creation 2: Valid content at a given point | Full support, although with some caveats |
| Creation 3: Richer content details | Partial support, some information unavailable |
| Creation 4: Completeness checks | Partial support, some information unavailable |
| Editing 1: Document validity | Full support |
| Editing 2: Error location | Full support |
| Editing 3: Error causes | Minimal support |
| Editing 4: Speculative edits | Full support |
Table 1.
Seasoned W3C spectators will probably have noticed by now that this requirement (and to a lesser extent the other creation requirements as well) is very suggestive of the functionality originally intended to be made available through the Abstract Schemas interfaces for [DOM] Level 3. That specification was eventually abandoned, as I understand it, because it was too broad, too complex, and the requirements were not sufficiently clear. The Validation interfaces, which served as a partial replacement, have a much tighter scope, and deliberately do not cover the representation of the underlying schema. As I argued in setting out this requirement, however, something of this nature is extremely important for assisting authors of instance documents.
It is obviously not within the scope of this paper to define such an object model, and I am not going to suggest resuscitating Abstract Schemas because it seems to me that there were good reasons for dropping it. Before turning to the practicalities of what an application designer can do as things stand, however, I will make some suggestions that may be applicable to any future effort to create a generic mechanism.
There is a tendency to see the problem in terms of finding a logical representation of the schema itself. This can make it very difficult to abstract away from the details of the particular syntaxes for constraining XML content, and even if one succeeds, there is a real danger that one will just end up creating a new syntax in the process (a charge, incidentally, that was levelled at the Abstract Schemas specification). Furthermore, the underlying structure of schema languages is not always as straightforward as one might like: reflecting them too closely can produce an interface that makes basic operations harder than they need to be. [3] A possible alternative would be to break free from the assumption that everything about the content model should be encoded into the representation, and focus on those parts that really matter. For each element: what children can it have (irrespective of the complexities of specific types of content model such as xs:choice), what does it mean (textual description), what attributes can it have, which of the children and attributes are actually required, [4] and what sort of text content is it allowed. This is purely a high-level overview that allows someone to understand roughly how a document is structured, and I do not believe that it needs all the gory details like cardinality: those can be handled by the assistance one receives as one edits the individual sections of a document.
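To make the suggestion concrete, here is a minimal sketch of what such a high-level, per-element summary might look like as a data structure. Everything in it is an assumption made for illustration: the ElementSummary class, its field names, the outline function, and the sample "section" and "title" entries are hypothetical and do not come from any specification. Cardinality and content-model operators are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ElementSummary:
    """Hypothetical high-level view of one element declaration."""
    name: str
    description: str                                 # natural language documentation
    children: list = field(default_factory=list)     # names of permitted child elements
    attributes: list = field(default_factory=list)   # names of permitted attributes
    required: set = field(default_factory=set)       # required children and attributes
    text_type: Optional[str] = None                  # kind of text content, if any


def outline(summaries, name, depth=0, required=False, seen=()):
    """Render an indented overview, listing recursive elements once without descending again."""
    summary = summaries[name]
    label = f"{summary.name} (required)" if required else summary.name
    lines = [f"{'  ' * depth}{label}: {summary.description}"]
    if name not in seen:
        for child in summary.children:
            lines += outline(summaries, child, depth + 1,
                             child in summary.required, seen + (name,))
    return lines


SUMMARIES = {
    "section": ElementSummary("section", "A titled division that may be nested",
                              children=["title", "section"], required={"title"}),
    "title": ElementSummary("title", "Heading text", text_type="string"),
}
overview = outline(SUMMARIES, "section")
```

Printing the lines of overview gives the kind of browsable structure view discussed earlier: enough to see that sections nest and carry titles, without exposing how the content model is expressed.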
Grammar-based systems for defining XML content, like DTDs, [WXS] and [Relax-NG], lend themselves much more readily to this kind of representation than do declarative ones like [Schematron]. While there is no theoretical barrier to producing a representation that can be built up from either of these, I think that this would almost certainly be a mistake. The trouble is that declarative languages do not readily lend themselves to prompted editing, precisely because they do not obviously define a structure, but a disconnected series of rules that must be satisfied. On the other hand, I do believe that it makes sense to provide support for these sorts of rules as further restrictions on types that are already defined as part of a grammar - as I discuss in Section 4 below.
Any representation of schema content must provide simple access to human-readable documentation for both the schema as a whole and its individual components. This may require a standardisation of mechanisms used to insert such documentation into the data definitions (e.g. comment placement in DTDs). The example of systems like JavaDoc for the Java programming language should be sufficient to demonstrate the degree to which the potential value of such standard inline documentation outweighs the rigidity it imposes.
The representations of elements and attributes used in this model should be the same as those returned from the local content querying mechanisms discussed in Section 3.1.3 below, so that a consistent picture of the document structure is presented.
In the meantime, developers looking to include prompted editing capabilities in their applications have a couple of options. In the case of [WXS] schemas, there are several libraries available [5] that provide comprehensive, if sometimes rather complicated representations of schemas. Working with one of these, one could produce a simplified high-level view to display to authors embarking on a new document, but doing so will require a reasonably in-depth understanding of the way that schemas are structured. This is a shame, since the Validation interfaces have the potential to shield application developers from these complexities. Alternatively, one could attempt to build a "dummy" instance document by following the guidance supplied by the Validation interfaces, and use this as the model. It would lack documentation, and would be problematic to construct in the absence of information about which elements are required in a given content model (see Section 3.1.4), but it would be a start.
The ElementEditVAL interface provides numerous methods for querying what elements can be included at a particular location in a document. They all return a NameList, which is a list of {Namespace, LocalName} pairs, identifying the permitted elements. It should be noted that the contents of this list are only partially dependent on context: they take account of context insofar as it is necessary to determine which piece of the content model corresponds to the element being queried, but not to the extent of factoring in the effect of surrounding elements. For example, take the schema fragment:
<xs:sequence>
  <xs:element name="ElementA" minOccurs="1" maxOccurs="1"/>
  <xs:element name="ElementB" minOccurs="1" maxOccurs="1"/>
  <xs:element name="ElementC" minOccurs="1" maxOccurs="1"/>
</xs:sequence>
Corresponding to this we have the following instance fragment:
<ElementA/>
<ElementB/>
<ElementC/>
If ElementB is asked for its allowedNextSiblings, context will be taken into account to find a content model for an ElementB that follows an ElementA. But it will not be taken into account when listing the possible siblings: a list containing ElementC will be returned despite the fact that ElementC could not be inserted without invalidating the content model. This is both a blessing and a curse. In some circumstances, the direct view of the content model independent of the conditions of the instance document is very useful (see Section 3.2.2 for an example). But where one wants to display to the user a list of elements that can safely be inserted into the document as it is now, one has to create an Element object for each name in the list and call a method like canInsertBefore to check that it really is valid at the moment. For a large content model this could be very inefficient.
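The two-step filtering described here can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the DOM API: the Particle class and the functions is_valid, allowed_next_siblings and insertable_after are hypothetical stand-ins, the content model is restricted to a flat xs:sequence with distinct element names, and the "can this be inserted" check is played by re-validating a trial copy of the child list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    """One xs:element entry in a toy xs:sequence (hypothetical model)."""
    name: str
    min_occurs: int = 1
    max_occurs: int = 1


# The schema fragment above: ElementA, ElementB, ElementC, each exactly once.
MODEL = [Particle("ElementA"), Particle("ElementB"), Particle("ElementC")]


def is_valid(model, children, allow_incomplete=False):
    """Match child names against the sequence; optionally tolerate missing trailing content."""
    pos = 0
    for particle in model:
        count = 0
        while (pos < len(children) and children[pos] == particle.name
               and count < particle.max_occurs):
            pos += 1
            count += 1
        if count < particle.min_occurs:
            return allow_incomplete and pos == len(children)
    return pos == len(children)


def allowed_next_siblings(model, name):
    """What the content model permits after `name`, ignoring what the instance already holds."""
    index = [p.name for p in model].index(name)
    allowed = [name] if model[index].max_occurs > 1 else []
    for particle in model[index + 1:]:
        allowed.append(particle.name)
        if particle.min_occurs > 0:   # nothing beyond a required particle can come next
            break
    return allowed


def insertable_after(model, children, index):
    """Filter the model's answer by trying each candidate against the current children."""
    return [name for name in allowed_next_siblings(model, children[index])
            if is_valid(model, children[:index + 1] + [name] + children[index + 1:],
                        allow_incomplete=True)]
```

With the instance fragment above, allowed_next_siblings(MODEL, "ElementB") still offers ElementC, while insertable_after(MODEL, ["ElementA", "ElementB", "ElementC"], 1) is empty. The filter costs one trial validation per candidate name, which is the inefficiency noted for large content models.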
Apart from the fact that this list of bare names is sometimes not sufficient to produce a fully-featured user interface (as discussed in Section 3.1.3 below), there is one other slight complication that arises if one is working with a [WXS] schema that makes use of "wildcard" elements such as <any>. These are represented by means of special combinations of null, magic values such as ##any, and the namespaces which are referenced by the wildcard element. This means that at the point where a schema uses one of these wildcards, the application will not be able to provide any clues to the user about valid content beyond a bald list of namespaces. [6] In cases where the [DOM] implementation already has a grammar for the relevant namespace, this seems unduly restrictive, since the list of available elements could quite easily contain all the top-level elements from that namespace. [7] There is no doubt that this is complicated, and that an extension of the currently specified functionality could get decidedly messy; but I think there is definitely room for re-examination here with a view to making the treatment of wildcards very slightly more flexible.
In the meantime, the only recourse for those wishing to provide this little bit of extra assistance is much the same as that recommended for displaying a high-level representation of the schema in Section 3.1.1: pick a schema object model library, load the schema for the referenced namespace manually, and obtain a list of global elements.
The lack of an underlying object model means that it is only possible to query one level ahead using the Validation interfaces. One can find out which elements can appear at a given point, but, before creating them, not what attributes those elements have; having created an element and obtained the list of its attributes, one cannot find out their default values without in turn creating them. The actual [DOM] tree being edited is effectively the object model; but the tree is by definition incomplete while one is creating it, which means that this object model is incomplete at just the point one is interested in. Thus while most of the information is theoretically available, accessing much of it requires making changes to the document. This is good for programming interface clarity, but bad for user interface richness.
The obvious solution in an ideal world would be to have the content querying methods discussed in Section 3.1.2 above return a list of something richer than pairs of strings. As an example, consider the case of an "Insert Element Wizard". When choosing which element to insert, it would help the user to see a description of each; it might also help to be able to expand a list of attributes, and possibly even immediate children, to get a feel for how it was used. Once a choice has been made, the next page might then allow attributes to be specified. If we are using a Document-View architecture, we will not want to make changes to the actual document, since this will trigger notifications to other parts of the application, but we do want to be able to show attributes, including which are required, and any default values. With the interfaces as they stand, this would involve one of two approaches. First, we could disable notification to other parts of our application temporarily, and make speculative changes which we would subsequently have to back out if the user cancelled the element creation. Alternatively, if we were working with a [WXS] schema, we could load it into an object model and do the detective work ourselves. This could potentially be very difficult in the case of non-global elements and attributes, since it would involve doing much of the content-matching work that a fully-blown schema processor has to do. On the other hand, the chances are that an implementation of the Validation interfaces has all this data readily to hand anyway, and could easily expose it through a somewhat richer interface. An implementation of the NameList interface backed by objects from a schema representation rather than just pairs of strings could, for example, choose to implement another, more substantial interface as well, through which more details about the listed elements might be made available.
One of the key conceptual distinctions in the Validation interfaces is between two different types of validity. Clearly, it is meaningless to try and check complete validity whilst a document is being created, since it has not been finished; on the other hand, it is reasonable to check whether what has been added is correct so far as it goes. The latter is called VAL_INCOMPLETE, the former VAL_SCHEMA. Using the nodeValidity method, one can request the validity status according to either of these two schemes from any node in the tree. The definitions of these are as follows:
| VAL_INCOMPLETE | Check if the node's immediate children are those expected by the content model. This node's trailing required children could be missing. |
| VAL_SCHEMA | Check if the node's entire subtree are those [sic] expected by the content model. |
If every node in a given subtree satisfies VAL_INCOMPLETE, but the subtree does not meet the requirements for VAL_SCHEMA, then the subtree is clearly missing some required content. By further checking to find the smallest subtrees under the initial subtree that are still reported as schema invalid, it is possible to determine at what level content is actually missing.
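The search for the smallest schema-invalid subtrees can be sketched as a short recursion. The code below is a toy model and not an implementation of the Validation interfaces: only the names VAL_INCOMPLETE and VAL_SCHEMA come from the specification, while the Node class, the SCHEMA dictionary, and the functions children_ok, node_validity and find_missing_content are hypothetical. Each child is assumed to occur at most once, in a fixed order.

```python
from dataclasses import dataclass, field

VAL_INCOMPLETE, VAL_SCHEMA = "VAL_INCOMPLETE", "VAL_SCHEMA"

# Hypothetical toy schema: element name -> ordered list of (child name, minOccurs).
SCHEMA = {
    "paper": [("title", 1), ("section", 1)],
    "section": [("heading", 1), ("para", 1)],
    "title": [], "heading": [], "para": [],
}


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def children_ok(node, allow_incomplete):
    """Check immediate children; optionally tolerate missing trailing required children."""
    names = [child.name for child in node.children]
    pos = 0
    for child_name, min_occurs in SCHEMA[node.name]:
        if pos < len(names) and names[pos] == child_name:
            pos += 1
        elif min_occurs > 0:
            return allow_incomplete and pos == len(names)
    return pos == len(names)


def node_validity(node, mode):
    """Toy counterpart of nodeValidity for the two validity types."""
    if mode == VAL_INCOMPLETE:
        return children_ok(node, allow_incomplete=True)
    return children_ok(node, allow_incomplete=False) and all(
        node_validity(child, VAL_SCHEMA) for child in node.children)


def find_missing_content(node):
    """Return the deepest nodes that fail VAL_SCHEMA, i.e. where content is missing."""
    if node_validity(node, VAL_SCHEMA):
        return []
    deeper = [hit for child in node.children for hit in find_missing_content(child)]
    return deeper or [node]


doc = Node("paper", [Node("title"), Node("section", [Node("heading")])])
incomplete_at = [node.name for node in find_missing_content(doc)]
```

In this example every node satisfies VAL_INCOMPLETE, the document as a whole fails VAL_SCHEMA, and the recursion narrows the gap down to the "section" element, whose required "para" has not yet been added.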
This is fiddly, but neither especially difficult nor particularly inefficient. The difficulty is that once one has found where content is missing, one cannot determine what is missing: there is no equivalent to the getRequiredAttributes method for elements. The reason for this is presumably that element content models potentially involve the complexities of cardinality, alternation, sequences, and so on; to expose all this would take us back to the kind of schema object model that was dropped with Abstract Schemas.
While the provision of this sort of detail is probably the only way to solve this problem in a formal fashion, a reasonable solution could be provided by taking a number of smaller, simpler measures together. I suggested earlier (see Section 3.1.1) that there is a place for a "high-level" content model that provides only basic information; this included details of whether an element was unconditionally required to appear as a child of another element. This could be supplemented with natural language documentation for the parent element to cover more complex cases. Finally, one could extend the "step-by-step" approach already taken by the Validation interfaces to include a method that returned a list of possible elements to be appended to the children of a given element that would move it a stage closer to completion. [8] For example, working with the following schema fragment:
<xs:element name="ElementA" minOccurs="1">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="ElementB" minOccurs="1"/>
      <xs:element name="ElementC" minOccurs="0"/>
      <xs:element name="ElementD" minOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Corresponding to this we have the following instance fragment:
<ElementA>
  <ElementB/>
</ElementA>
If we ask ElementB for allowedNextSiblings, it will tell us about both ElementC and ElementD. But if we want to know what we need to do to close off the content model, we would like to be able to call a method called something like getRequiredNextSiblings, which would return us just ElementD in this case. Taken together with the documentation and the high-level model, I believe that this would make it possible to provide sufficient guidance to document authors on filling the gaps in their content; and none of these suggestions involve exposing the inner complexities of the underlying content models.
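The difference between the two lists can be illustrated with a minimal Python sketch. It assumes a deliberately simplified content model: a flat sequence of particles, each with maxOccurs of 1, and existing children that already form a valid prefix. The function names mirror allowedNextSiblings and the proposed getRequiredNextSiblings, but the code is a hypothetical model and not an implementation of either.

```python
from typing import List, Tuple

# (element name, minOccurs) for each particle of a simple sequence.
Particle = Tuple[str, int]

ELEMENT_A: List[Particle] = [("ElementB", 1), ("ElementC", 0), ("ElementD", 1)]


def _position_after(sequence: List[Particle], children: List[str]) -> int:
    """Index of the first particle not yet consumed by the existing children."""
    pos = 0
    for child in children:
        while pos < len(sequence) and sequence[pos][0] != child:
            pos += 1  # skip optional particles that were left out
        pos += 1      # step past the particle this child matched
    return pos


def allowed_next_siblings(sequence: List[Particle], children: List[str]) -> List[str]:
    """Optional particles up to and including the first required one."""
    allowed = []
    for name, min_occurs in sequence[_position_after(sequence, children):]:
        allowed.append(name)
        if min_occurs >= 1:
            break
    return allowed


def required_next_siblings(sequence: List[Particle], children: List[str]) -> List[str]:
    """Only those allowed siblings whose particle has minOccurs >= 1."""
    min_occurs = dict(sequence)
    return [n for n in allowed_next_siblings(sequence, children) if min_occurs[n] >= 1]


print(allowed_next_siblings(ELEMENT_A, ["ElementB"]))   # ['ElementC', 'ElementD']
print(required_next_siblings(ELEMENT_A, ["ElementB"]))  # ['ElementD']
```

A real implementation would also have to deal with choices, nested groups and repeated particles; the sketch only shows that the second list is a filtered view of the first.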
Simple and efficient in-memory validation has been conspicuously absent from most XML parsers until recently, and even now, at least one major parser only supports this in conjunction with document renormalisation. The DocumentEditVAL interface defines a validateDocument() method that will report the validity of a whole document without any normalisation. In addition, it provides a very useful feature called continuousValidityChecking; with this turned on, the DOM tree will throw an exception if an invalid change is attempted. These capabilities more than cover the first editing requirement.
Once one has ascertained that a document is invalid, finding the exact location of the errors is a little harder, since there is no central interface for locating erroneous nodes. Instead one has to work one's way down the tree comparing the return values of the nodeValidity method called to request first VAL_INCOMPLETE validity and then VAL_SCHEMA. If every node in an invalid subtree satisfies the conditions of VAL_INCOMPLETE validity, then one can be sure that the problem is an incomplete content model; the exact location of the problem can be pinpointed as described in Section 3.1.4. On the face of it, matters are much simpler if one (or more) of the nodes in the subtree is not valid according to VAL_INCOMPLETE, since it must be this node that is causing the problem.
Unfortunately, formal content-model definitions of validity do not map onto user perceptions in a straightforward way. Strictly speaking, if an element "A" has a child "B" that is not allowed, it is "A" that is invalid, because it does not satisfy its content model. The average user, however, would perceive the problem to be with "B", which is in the wrong place. To complicate matters further, "B" will actually be marked as entirely correct if it satisfies its own content model. How can one indicate to the user what is actually at fault? If the invalid child or attribute is neither defined globally for its namespace, nor defined locally to the current element, the processor will be unable to perform any validation on it, and will mark it as having VAL_UNKNOWN validity; this allows it to be identified as the source of the problem. [9] On the other hand, nodes that are incorrectly positioned but for which the processor can find a definition will be validated according to their own content models, and will be marked as valid or invalid entirely independently of their relation to their parent. The only way to locate such nodes is to step through the children of the parent marked as invalid, asking each child for allowedNextSiblings and checking that the actual next sibling has a name that appears in the list.
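The sibling-by-sibling check might look something like the following Python sketch. The allowed_next function is a hypothetical stand-in for allowedNextSiblings over a toy sequence (ElementB, optional ElementC, ElementD), and the decision to skip a misplaced child and carry on from the last accepted one is an assumption of this sketch rather than behaviour taken from the Validation interfaces.

```python
from typing import Callable, List, Sequence

# Toy content model: (element name, minOccurs), each with maxOccurs of 1.
SEQUENCE = [("ElementB", 1), ("ElementC", 0), ("ElementD", 1)]


def allowed_next(prefix: Sequence[str]) -> List[str]:
    """Stand-in for allowedNextSiblings after a valid prefix of children."""
    pos = 0
    for name in prefix:
        while SEQUENCE[pos][0] != name:
            pos += 1
        pos += 1
    allowed = []
    for name, min_occurs in SEQUENCE[pos:]:
        allowed.append(name)
        if min_occurs >= 1:
            break
    return allowed


def find_misplaced(children: Sequence[str],
                   allowed: Callable[[Sequence[str]], List[str]]) -> List[int]:
    """Return the indexes of children that are not permitted where they stand."""
    accepted: List[str] = []
    misplaced: List[int] = []
    for index, name in enumerate(children):
        if name in allowed(accepted):
            accepted.append(name)
        else:
            misplaced.append(index)  # known element, wrong position
    return misplaced


print(find_misplaced(["ElementB", "ElementD", "ElementC"], allowed_next))  # [2]
```

Here ElementC is a perfectly well-defined element, so it would be reported as valid in its own right; only the comparison against its predecessor's permitted siblings exposes it.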
This system does impose a fairly heavy processing requirement on a client of the Validation interfaces looking to find and highlight invalid nodes in a document. A potentially useful extension here would be a set of "subtree errors" on each node that allowed immediate access to the errors within a whole subtree, rather than requiring the sort of complex navigation detailed above. In addition, some more direct method of determining items that are invalidating the content model of a node would also be extremely useful. These are, however, fairly cosmetic enhancements: the existing functionality is quite adequate to meet this requirement.
Once the exact location of an error has been identified, the developer of a prompted editing application still has a problem: it is sometimes impossible to determine why particular items are invalid. Obviously, with content models, once it has been established that a given element is in the wrong place, that is all there is to know. As mentioned in the requirement, it can be helpful to provide information about valid replacement content, and valid parents for the misplaced child element. Both of these pieces of information are available through methods on the ElementEditVAL interface. With attributes and text content, on the other hand, simply telling the user that the value is not valid reveals very little. The value might be invalid because it does not satisfy a pattern; or it might be an ID that is supposed to be unique across the document; or it might be a key reference that is supposed to have a corresponding key elsewhere; and there are other possibilities across different schema languages.
It is easy to see why the Validation interfaces do not address this requirement: it is extremely difficult to solve in a generic, elegant fashion. Short of defining a general set of error codes that covered the superset of all possible schema errors (a brave undertaking if ever there was one!), I think that the best that can be hoped for is to fall back on the sort of human readable documentation for the elements and attributes that I have mentioned several times already. At least then, complex restrictions on values in an XML format can be communicated to the user in comprehensible terms, rather than leaving him to wade through the relevant schema document in search of the definition of a recalcitrant attribute. More sophisticated solutions than this will inevitably have to be tailored to individual technologies to be viable, and this runs somewhat counter to the intention of a generic [DOM] Validation interface.
This is an area where the defined interfaces excel. There are methods to determine whether it is permitted to insert, append, replace and remove elements; a similar array for attributes; and, very helpfully, a CharacterData validation interface that can operate at a granularity of single characters, making as-you-type validation of text a real possibility. The only minor pitfall to be wary of here is that the descoping of a proposed RangeEditVAL interface makes it tricky (but not impossible) to determine the possibility of editing whole sections of content without a single root element; implementing this requires temporarily changing the document in order to see the total effect, since the speculative methods only support individual nodes. I suspect that this is unlikely to be a significant issue for many people, however.
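One way to approximate a speculative check for an edit that touches several nodes at once is to try the change on a copy. The Python sketch below uses the standard xml.dom.minidom module, which does not implement the Validation interfaces. The is_valid callback and the toy_is_valid rule are hypothetical, and working on a deep clone is a substitution for the temporary modification of the live document described above.

```python
from typing import Callable, List
from xml.dom.minidom import Element, parseString


def can_replace_children(parent: Element, new_names: List[str],
                         is_valid: Callable[[Element], bool]) -> bool:
    """Report whether replacing all children of parent would leave it valid.

    The edit is applied to a deep copy, so the live tree is never touched.
    """
    trial = parent.cloneNode(True)
    while trial.firstChild is not None:
        trial.removeChild(trial.firstChild)
    for name in new_names:
        trial.appendChild(parent.ownerDocument.createElement(name))
    return is_valid(trial)


def toy_is_valid(element: Element) -> bool:
    """Hypothetical validity rule: ElementB, optional ElementC, ElementD."""
    names = [c.tagName for c in element.childNodes
             if c.nodeType == c.ELEMENT_NODE]
    return names in (["ElementB", "ElementD"],
                     ["ElementB", "ElementC", "ElementD"])


doc = parseString("<ElementA><ElementB/></ElementA>")
root = doc.documentElement
print(can_replace_children(root, ["ElementB", "ElementC", "ElementD"], toy_is_valid))  # True
print(len(root.childNodes))  # 1
```

Cloning costs memory in proportion to the size of the subtree, which is one reason an implementation might prefer to modify the tree in place and roll the change back.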
There are several conclusions we can draw from all this. First, on a general level, the Validation interfaces provide considerably more support for editing assistance than they do for creation. Using them, one could build an application that significantly eased editing pre-existing XML structures without ever really having to get one's hands dirty with the underlying schemas. The reason for this is that the interfaces excel when one is working at a fine level of detail at a particular point in a document; what they do rather less well (and this is clearly by design rather than oversight) is to facilitate displaying the bigger picture to the user. This wider view is a central part of the document creation process, and I think that developers writing tools to help users start from a blank canvas will still find themselves faced with detailed schema manipulation to implement some important types of feature. In the short term, the pain of this can be somewhat mitigated by taking advantage of one of the many schema object models now available.
In various places I have also suggested extensions that could be implemented without prejudicing the functionality already specified. This is not something to be done lightly. There are very good reasons for having broadly agreed standards, and adding proprietary extensions tends to undermine most of the benefits gained from standardisation. On the other hand there are several areas (in particular the NameList interface) where the addition of significant extra capabilities could be achieved with minimal extra development cost (and also negligible performance overhead) if done in the DOM implementation. This same functionality would cost much more to replicate in a client application. At any rate, when we were faced with this choice in implementing a prompted editing tool at DecisionSoft, there was little doubt that extension was going to be the simplest and cleanest route. Of course, other companies may not have the luxury of this option.
At the beginning of this section I said that I would draw some more general conclusions about the way we look at schema validation. I hope it has become clear that I think that the addition of a higher-level approach to the problem would be valuable. As a programmer, I have a tendency to think defensively: how can I validate that my data is correct? How can I prevent incorrect data from coming in? I also tend to spend much of my time thinking about the fiddly details: what element can appear at this point? How many times can it appear? What can the first character of its value be? These things are all important, especially when we're looking at information as the raw material for processing. But we also need to focus our attention on the interaction of human beings with the data: they are often the ones who have to read it, and even more often the ones who have to get it into the system in the first place. These interactions call for the addition of a freer, more open approach that emphasises the broader structure and semantic significance. To this end, our schema technologies must be capable of more than just analysis; they must also provide the possibility of synthesis. They should make it possible not just to take apart a corresponding document, but to build up a conceptual model. And to achieve this we may need to let go of the rigour and the formality in some (and only some) of the ways that we use these technologies.
The last area I want to examine is how the Validation interfaces might interact with a certain type of extension to normal schema technologies. I suggested earlier in this paper (see under Section 3.1.1) that grammar-based approaches to validation are much better suited to prompted editing than rule-based ones, because they provide a structure corresponding more closely to the intended structure of the documents they describe. The arguments in their favour are much the same as the arguments in favour of Object Oriented design in programming more generally: the use of techniques such as polymorphism, inheritance and, to a lesser degree in schemas, encapsulation, encourages a greater high-level clarity and improved modularity. Just as with OO design, however, certain problems do not lend themselves to simple expression with this approach: sometimes what one really wants is a simple rule rather than a complex type structure. The standard example here is the impossibility of cross-field validation in [WXS]: one cannot, for example, express the constraint that "ElementA" must have a value greater than zero if "ElementB" has a value greater than zero. It is sometimes possible to work around this by translating the restrictions into grammar-based terms: one could have a choice between two sequences, one of which contained "ElementANonZero" and "ElementBNonZero", and the other "ElementAZero" and "ElementBZero", but it is hardly pretty or readily comprehensible.
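By contrast, stated as a rule the same constraint is a single predicate. The Python sketch below is purely illustrative: the Record wrapper element and the rule_holds function are invented for the example, and it uses the standard xml.etree.ElementTree module rather than any schema technology.

```python
import xml.etree.ElementTree as ET


def rule_holds(parent: ET.Element) -> bool:
    """ElementA must be greater than zero whenever ElementB is."""
    a = float(parent.findtext("ElementA"))
    b = float(parent.findtext("ElementB"))
    return b <= 0 or a > 0


ok = ET.fromstring(
    "<Record><ElementA>5</ElementA><ElementB>2</ElementB></Record>")
bad = ET.fromstring(
    "<Record><ElementA>0</ElementA><ElementB>2</ElementB></Record>")
print(rule_holds(ok), rule_holds(bad))  # True False
```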
This limitation can be seen as an argument in favour of systems like [Schematron], which work on the basis of assertions; these are often implemented in [XPath]. One can even combine the two approaches by embedding Schematron rules in appinfo sections in a [WXS] schema. But I do not think that either approach is particularly satisfactory, since they throw away or undermine what is fundamentally valuable in grammar-based languages, namely the concept of typing. I do not have the scope in this paper to make this argument in detail; colleagues from DecisionSoft have done so at length at previous conferences, and a separate [Business Rules Validation Paper] is available on the subject. In very rough terms, what it suggests is that it would be possible to extend the schema type system to embrace the idea of embedded rules as part of the typing system itself. These rules would be expressed in [XQuery] and/or [XPath 2], and would allow a much richer sort of validation to be performed whilst remaining true to the idea of a grammar containing type definitions.
This proposal is [WXS]-specific, but what I want to take away from it are two generic points. The first is simply the value and the possibility of embedding rules into an XML grammar. The second, which I want to look at in more detail now, stems from the suggestion of using a system based on [XPath] as a means to achieving this, and is important for the possibility of validating such rules "on-the-fly" as a document is being edited.
A naive approach to implementing supplementary business rule validation would be to have a rule validator object that could determine the validity of a node (either as it actually was in the tree, or as modified by a proposed operation) in accordance with a particular rule. An implementation of the Validation interfaces would make use of such an object in order to ensure that proposed content satisfied the requirements of all the rules that were defined for the relevant section of the grammar. The problem with this approach is that it ignores the fundamental difference in scope between local element validation and general rule validation. Given an arbitrary validation rule, nodes that might affect its outcome could potentially be positioned anywhere in the document. The naive implementation would be forced to execute any rules associated with a node whenever its validity state was requested, since it could never be sure that there had not been a change somewhere which would affect their outcome. This would not only be inefficient, it would also dramatically reduce the usefulness of the prompted editing support that could be offered: since any node X might potentially impact another node Y somewhere completely different in the tree if Y had a validation rule referring to X, one could only be sure of the validity of a potential change to X by checking every other node in the tree. This is clearly not a reasonable solution.
Short of radically limiting the scope of the rules that can be expressed, which somewhat defeats the point of the exercise, the only solution that I can see is to drive this from the other end: notification of changes needs to be propagated to nodes potentially affected by them. At the crudest level, every node with a supplementary validation rule attached to it might receive updates about every change made anywhere in the tree, and decide if it was relevant, but this is still a fairly large hammer for a comparatively small nut. What we really need to be able to do is to determine how much of the tree a given rule needs to listen to in order to be sure of hearing anything relevant; and this is where [XPath] comes in. Because of the way axes work in [XPath], it is possible to determine a broad scope for any given expression without actually having to process each location step against the document. If one simply examines the axes it uses in abstract, one can identify a subtree nothing outside of which can ever have an impact on how it evaluates. [10] The root of this subtree never moves unless the starting node for evaluation is itself moved. The tentative suggestion that I am making is that this information could potentially be used to optimise rule-based validation to the point where it would be feasible to have it operating "live" on a [DOM] tree as it was being edited, contributing to the validity information exposed through the Validation interfaces. [11]
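As a rough illustration of the axis analysis, the Python sketch below takes the axes of a relative location path, already extracted into a list, and works out how many levels above the starting node the root of the scope subtree lies. Everything here is an assumption made for the example: the function name, the convention of returning None for whole-document scope, and the omission of XPath parsing, predicates and functions such as id().

```python
from typing import Iterable, Optional

# Axes that can reach arbitrarily far up or across the document.
UNBOUNDED = {"ancestor", "ancestor-or-self", "preceding", "following"}
DOWNWARD = {"child", "descendant", "attribute", "namespace"}
SAME_LEVEL = {"self", "descendant-or-self"}
SIBLING = {"preceding-sibling", "following-sibling"}


def scope_levels_up(axes: Iterable[str]) -> Optional[int]:
    """Levels above the context node at which the scope subtree is rooted.

    Returns None when the expression may touch the whole document.
    """
    depth = 0  # lowest possible depth of the current nodes (0 = context node)
    top = 0    # depth of the highest node the scope must include so far
    for axis in axes:
        if axis in UNBOUNDED:
            return None
        if axis in DOWNWARD:
            depth += 1
        elif axis == "parent":
            depth -= 1
            top = min(top, depth)
        elif axis in SIBLING:
            # Siblings live under the parent, so the parent bounds the scope.
            top = min(top, depth - 1)
        elif axis not in SAME_LEVEL:
            raise ValueError(f"unknown axis: {axis}")
    return -top


print(scope_levels_up(["parent", "child"]))    # 1
print(scope_levels_up(["child", "child"]))     # 0
print(scope_levels_up(["ancestor", "child"]))  # None
```

An expression such as ../ElementB therefore needs to listen only to the parent's subtree, whereas anything using the ancestor axis falls back to the whole document.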
In very rough outline, I would see this working as follows:
Definitions in a grammar could have rules attached to them; these rules would either have to be XPaths or have an XPath "scope" attached to them as a sort of interface contract.
The XPaths could be processed when the grammar was loaded to produce a "broadest possible scope" for the expression.
Whenever a node with a rule attached to its definition is created in a document, it is set as a listener on the root of the scope subtree for the rule as evaluated from that node. [12]
Change events are bubbled up through the [DOM] tree for all changes made; as they reach nodes that are the roots of scope subtrees for rules, the nodes that are the target of these rules are informed of the change and can take action accordingly (possibly including informing the original source of the event that the change would invalidate the target of the rule).
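The outline above can be sketched with a toy tree in Python. The Node class, attach_rule and notify_change are hypothetical names invented for this example and do not use the DOM Events system. The scope of each rule is supplied as a number of levels above the target node, and the Record, Other and ElementX elements are likewise invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Toy tree node with a parent pointer and a list of rule targets."""
    name: str
    parent: Optional["Node"] = None
    listeners: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        return Node(name, parent=self)

    def ancestor(self, levels_up: int) -> "Node":
        node = self
        for _ in range(levels_up):
            node = node.parent
        return node


def attach_rule(target: Node, levels_up: int) -> None:
    """Register target as a listener on the root of its rule's scope subtree."""
    target.ancestor(levels_up).listeners.append(target)


def notify_change(changed: Node) -> List[Node]:
    """Bubble a change upwards and collect the rule targets to revalidate."""
    affected: List[Node] = []
    node: Optional[Node] = changed
    while node is not None:
        affected.extend(node.listeners)
        node = node.parent
    return affected


root = Node("Root")
record = root.add_child("Record")
element_a = record.add_child("ElementA")
element_b = record.add_child("ElementB")
element_x = root.add_child("Other").add_child("ElementX")

attach_rule(element_a, 1)  # the rule on ElementA looks at its siblings
print([n.name for n in notify_change(element_b)])  # ['ElementA']
print([n.name for n in notify_change(element_x)])  # []
```

A change to ElementB wakes the rule on ElementA because both sit under the same scope root, while a change elsewhere in the tree passes it by.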
This is a sketchy outline of a complex area. There are many important issues that I have not even touched on, such as how the validation operation itself might actually be performed in an efficient manner. But this is the material for another (substantial) paper in its own right. My intention in introducing it in passing here has been to highlight some of the rich possibilities that could potentially be exploited in an implementation of the Validation interfaces. In addition, I hope I have lent some small weight to the case for future extensions of existing grammar-based validation technologies making use of an integrated [XPath] approach to ensure that it is tolerably "scoping-friendly".
The development work that laid the foundations for this paper was made possible by the availability of the feature-rich, free, open source Xerces XML parser from the Apache Software Foundation (http://xml.apache.org).
[DOM Validation Interfaces] This is the W3C Recommendation with which this paper is primarily concerned. It describes a series of interfaces that can be implemented to provide "live" validity information about a DOM tree as it is being edited, and also to expose a limited amount of content model information to support prompted editing. http://www.w3.org/TR/2004/REC-DOM-Level-3-Val-20040127/
[Business Rules Validation Paper] The white paper "Business rules validation - the standard the W3C forgot", which describes a possible approach to integrating business rules into XML Schemas, is available from http://www.decisionsoft.com/papers.html
[DOM] The Document Object Model is a tree-based processing system for XML data. Over the years it has expanded to include a broad range of XML manipulation functionality. http://www.w3.org/DOM/
[WXS] The XML Schema standard published by the World Wide Web Consortium describes a language that can be used to express grammars for XML documents. It has numerous advantages over DTDs, which include the fact that it can be expressed in XML, proper support for namespaces, the ability to define local content models, and a more rigorous object oriented approach. http://www.w3.org/XML/Schema
[Extensible Stylesheet Language] Extensible Stylesheet Language is a family of specifications that collectively provide a system for transforming and "styling" XML documents into different formats, including non-XML formats. http://www.w3.org/Style/XSL/
[XPath] XPath is a language used for selecting specific sections from an XML document. It is used in numerous other XML processing systems, including Schema and XSL. http://www.w3.org/TR/xpath
[XPath 2] XPath2 is the proposed update to the XPath language; it adds many new features, including support for schema data types. It is closely tied to the XQuery specification. http://www.w3.org/TR/xpath20/
[XQuery] XQuery is a draft specification for a language that might be described as the big brother of XSL. It is intended to be a generic querying language that takes on the same role for XML as Structured Query Language plays for databases. http://www.w3.org/XML/Query
[Schematron] Schematron is a declarative, rules-based validation system for XML, usually implemented in terms of XPath expressions. http://www.schematron.com/
[Relax-NG] Relax-NG is a grammar-based validation system for XML which is intended to be clearer and easier to learn than standard W3C XML Schema whilst providing much the same capabilities. http://www.relaxng.org/
[Sun XML Schema Object Model] The XML Schema Object Model is an object model developed by Sun for representing XML Schemas. http://developers.sun.com/dev/coolstuff/xml/
[Microsoft .NET Schema Object Model] This is an XML Schema object model developed by Microsoft. It allows writing and editing of schemas as well as reading. The URL points to the relevant section of the MSDN library, and was correct at the time of writing. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconXSDSchemaObjectModelSOM.asp
[Proposed XML Schema API] This is essentially a formalisation of the Xerces Post Schema Validation Infoset interfaces which has been submitted to the W3C as a member submission. Naturally the Xerces parser already implements these interfaces. http://www.w3.org/Submission/2004/SUBM-xmlschema-api-20040309/ (API) http://xml.apache.org/xerces2-j/index.html (Xerces Parser)
[org.eclipse.xsd] This is a library that allows the reading and writing of XML schemas, but not the validation of instance documents against them. It is developed as part of the Eclipse project by IBM. http://www.eclipse.org/xsd/
[XBRL] Extensible Business Reporting Language is a comparatively young standard for the expression of business data in XML, including, but not limited to, financial reports and accounts. It is based heavily on WXS and XLink. http://www.xbrl.org
[1] Throughout this paper I shall use the word "schema" in a generic sense to signify a grammar-based description of a data format. There are obviously many different technologies that meet this description, but for the purposes of this discussion, they are sufficiently similar in approach to be treated together. For reasons that will become clear, I do not believe that declarative systems like [Schematron] really lend themselves to prompted editing, despite the fact that they do some sorts of validation much better than grammars. A possibility for reconciling the two is presented in Section 4.
[2] For example, substitution groups and abstract elements in [WXS] can clarify a grammar in a technical sense, but make it surprisingly hard to find the exact definition of a particular element!
[3] The XML Schema API (a formalisation of the PSVI interfaces from the Xerces XML parser) is a good example of this. It is an excellent representation of XML Schema, but using it to produce a list of elements contained in a given parent element involves writing a fairly complex recursive function to navigate through the [WXS] jungle of particles, model groups, complex types and element definitions.
[4] The idea of a "required" element may seem muddle-headed, since standard content models can specify complex combinations and permutations, such as "at least one of elements a, b, c and d". Clearly, none of these could meaningfully be marked as required on their own. But even if we restrict this designation to elements that are always required, no matter what, I suspect that this will cover a very significant subset of "required" content, and quite possibly the majority. If taken in conjunction with other hints such as element documentation and Section 3.2.3, it should be possible to clarify all but the most tortuous content models.
[5] The ones I am currently aware of are: [Microsoft .NET Schema Object Model], IBM's [org.eclipse.xsd], the [Proposed XML Schema API] that has arisen from the Xerces XML parser, and a [Sun XML Schema Object Model].
[6] Part of the problem here is an issue that the authors of the Validation specification quite reasonably shied away from, which is how to specify schema documents to be loaded for different namespaces. This has been something of an open issue in [DOM] for some time; the new Level 3 Core interfaces go quite a long way towards addressing it.
[7] The issue is a bit more complicated than this because of the difference in behaviour implied by the processContents attribute on wildcard elements. When I was working on an implementation of this, I ended up providing a very much expanded NameList interface that allowed clients to expand namespaces explicitly and to find out whether content from those namespaces would be subjected to "strict" validation or not; this was in direct response to the requirements of the team developing the user interface. I would not recommend this as a standard approach, but it does give an indication of just how much extra information could potentially be exposed.
[8] In [WXS] terms, this would be a subset of the list returned by calling getAllowedNextSiblings on the last child containing only elements that could match an as yet unsatisfied particle, excluding all particles (complete with their subsidiary content models) with minOccurs < 1. Given that this is a step-by-step, "forward-only" sort of approach, I think it should also be compatible with systems like [Relax-NG] that allow ambiguous content models.
[9] Note that nodes with VAL_UNKNOWN validity can only be taken to be invalid if they are the direct children or attributes of an element which does not satisfy the conditions of VAL_INCOMPLETE.
[10] This is a slight simplification: there are certain things one can do in [XPath] (such as using the id() function) to "jump around" to anywhere in the tree; these would either have to be unsupported or be treated as having whole-document scope.
[11] Clearly the effectiveness of this sort of optimisation is heavily dependent on the nature of the rules. If all of the rules were expressed using XPaths starting with "//", this approach would not yield any benefit at all. On the other hand, given the approach suggested by the [Business Rules Validation Paper] whereby XPaths are attached to specific type definitions rather than applied to the document as a whole, such "global" expressions would be unnecessary in the majority of cases.
[12] This could be done using the existing [DOM] Events system. One drawback of this is that these events are not cancellable, so it would not be possible to use this system to prevent invalid edits. Another is that it would not provide any way of announcing proposed changes to support the speculative methods on the Validation interfaces.