XML Europe 2004

Refactoring XML

Abstract

The world of structured markup is characterized by a number of frequently recurring questions, ranging from the conceptually large (such as, 'what is the difference between documents and data?') to the technically detailed (such as, 'when should we model information with attributes instead of elements?'). By probing these fissures, it is possible to open up our understanding of where structured markup has got to, of where it might be going, and of how to use what we've got to better effect.

SGML was designed for use in a computing environment where the text-based console was the primary means of working. Many of its design features can be traced to the need to save labour when keying markup, and to keep it intelligible when reading it, on cramped computer screens. For XML, ease of technical implementation was a prime design consideration, yet the W3C has retrospectively made contradictory statements about XML's design objectives. Several design decisions taken in the broader XML family also seem to run counter to some of XML's stated design precepts. Again, by probing these discontinuities between design and result (for example within XSLT) it is possible to see how XML breaks down as a usable language creation tool when faced with certain classes of problem.

It is recognised that current modelling mechanisms (DTDs and schema languages) are not sufficient for modelling the full range of complexities in many classes of 'real world' documents, such as those containing overlapping structures and context-sensitive grammars, although emerging technologies like DSDL will address these issues better. Another example of this schism between markup technology and problem is the overkill associated with using XML to create serialisation formats, configuration files, and other day-to-day formats for use in software and data-centric projects.

This paper uses the topics discussed to define a spectrum of markup activities, and to place XML within this spectrum. This context for characterizing markup activities provides a framework for seeing which language features are required, and which are redundant, when a particular markup technology is considered for a particular application. Ultimately, neither SGML nor XML is well aligned to large classes of problem, and by refactoring XML (in effect, by re-profiling SGML) it is possible to envisage a larger family of markup metalanguages in which XML and SGML have better-defined places. The paper will conclude by outlining such a re-profiling and calling for participation in its standardisation.