XML 2003

Strange Creations: Prototyping XML on the Desktop

Abstract

The abstract was not available at the time the proceedings were created. Please check an updated version of the paper abstracts at the conference proceedings web site.


Table of Contents

1. Introduction
2. Scope
3. Four Classic Views
4. Less Common Views
5. Appendix: The PubXS Format
5.1. Linking Semantics
5.2. Path Expressions
Biography

1. Introduction

At [XML Europe 2003], I suggested that XML might be able to shake off the back-end doldrums and become as relevant to end users as other technologies like P2P and HTML, simply by concentrating on online data delivery.

In this presentation I move to the next stage: pure, raw R&D, prototyping the kind of GUI tools that end users might someday use to work with online XML data documents. The tools will all be Open Source, and likely Java-based, and will be available for download at the time of the presentation. This is designed to be a fun presentation that will provoke thought, debate, and discussion on how we can bring XML onto the desktop and out to the masses.

Since the actual presentation will be a live demonstration of freshly-coded software, with a running commentary, this paper obviously cannot substitute for it; however, I will give an overview of some of the types of software that will be appearing so that people have a better idea of what to expect.

2. Scope

This presentation will not deal with XML used behind the scenes in areas like content syndication, web services, and grid computing, however exciting and active these areas may happen to be; it does not deal with application-specific XML save formats like those used by Open Office; and it does not deal with rendering human-readable XML prose documents. Instead, it deals with end-user applications where the original XML structure is at least somewhat visible to the user, and where the content is primarily data (tables and trees) rather than prose text.

The Resource Description Framework (RDF) and XML Topic Maps (XTM) are two mature and robust specifications that are both fully capable of describing the kind of data used in these sample applications. However, the applications in this presentation use neither; instead, they use a very simple, immature, and untested XML data specification of my own, called [PubXS], both to avoid religious wars and for the pure joy of being contrarian. All of the demo interfaces should be usable for both of the serious XML data formats with relatively little work.

3. Four Classic Views

The first part of the presentation will demonstrate viewing (and possibly, editing) software using the four classic structured-document views:

  • the tree view;

  • the flow view;

  • the table view; and

  • the form view.

All of these have the advantage of being familiar to the user. The flow view can make an XML tool look a lot like a word processor or a Web browser, for example; the tree view can make an XML tool look like an outliner; and the table view can make an XML tool look like a spreadsheet.

4. Less Common Views

The second part of the presentation will push into some less common areas, including at least the following:

  • A file explorer view, where each object in the XML data file appears as an icon.

  • A command-line interface view, also based on the file system metaphor.

  • A scrolling view, based on a stock ticker or news scroll.

  • An Infocom view (just for fun), based on the old Infocom text adventure games like [Zork].

5. Appendix: The PubXS Format

A PubXS document is a type of XML document, generalised for data (machine-readable information), but intended to be easy for humans to create and read as well. I use it for the demo programs in this presentation, in place of better-established XML data formats like RDF and XTM.

A PubXS document represents a tree of named data nodes using a tree of XML elements. The data nodes are ordered and hierarchical, following the same arrangement as the XML element tree. Here is an example:

<?xml version="1.0"?>

<currencies xmlns:pubxs="http://pubxs.org/pubxs/" pubxs:version="0.1">

 <currency pubxs:id="EUR">
  <source pubxs:link="countries.xml#EU">Euro Member Countries</source>
  <name>Euro</name>
  <iso-code>EUR</iso-code>
 </currency>

 <currency pubxs:id="GBP">
  <source pubxs:link="countries.xml#GB">United Kingdom</source>
  <name>Pounds</name>
  <iso-code>GBP</iso-code>
 </currency>

 <currency pubxs:id="USD">
  <source pubxs:link="countries.xml#US">United States of America</source>
  <name>Dollars</name>
  <iso-code>USD</iso-code>
 </currency>

</currencies>

PubXS does not allow mixed content. Elements with element content or no content represent branch nodes in the data tree; elements with text content represent leaf nodes. The data content of PubXS leaf nodes is untyped and always represented as strings.
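
The branch/leaf rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of PubXS itself: the function names `classify` and `to_data_tree` are hypothetical, and trimming surrounding whitespace from leaf values is my own choice, since the format description does not say whether that whitespace is significant.

```python
import xml.etree.ElementTree as ET

def classify(element):
    """Return 'branch' or 'leaf' for an element; raise ValueError on mixed content."""
    children = list(element)
    text = (element.text or "").strip()
    if children:
        # Non-whitespace text beside child elements is mixed content.
        tails = [(child.tail or "").strip() for child in children]
        if text or any(tails):
            raise ValueError("mixed content in <%s>" % element.tag)
        return "branch"
    # No children: text content makes a leaf, no content makes a branch.
    return "leaf" if text else "branch"

def to_data_tree(element):
    """Convert an element to (name, value): a string for a leaf, a list for a branch."""
    if classify(element) == "leaf":
        # Assumption: surrounding whitespace is not significant.
        return (element.tag, element.text.strip())
    return (element.tag, [to_data_tree(child) for child in element])

SAMPLE = """<currencies xmlns:pubxs="http://pubxs.org/pubxs/" pubxs:version="0.1">
 <currency pubxs:id="EUR">
  <name>Euro</name>
  <iso-code>EUR</iso-code>
 </currency>
</currencies>"""

tree = to_data_tree(ET.fromstring(SAMPLE))
print(tree)
```

Every leaf value stays a plain string, matching the untyped data model; any typing would have to be layered on by the application.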

PubXS uses three special control attributes inside the XML document, all in the XML Namespace "http://pubxs.org/pubxs/" (represented in this document by the prefix "pubxs"). The attributes are as follows:

pubxs:version

The version of PubXS in use. This attribute must appear on the root element of any PubXS XML document. Currently, the value must be "0.1".

pubxs:id

A document-unique identifier for a node in the PubXS data tree, provided as a convenient anchor for linking into a tree. The value must be an XML 1.0 name.

pubxs:link

A reference to another PubXS data node, either in the current tree or in a different one. The value consists of three parts:

  1. A URI.

  2. The fragment separator '#'.

  3. A path expression pointing to a specific node in the tree.

If the URI is omitted, the link refers to another node in the same tree. If the fragment separator and path are omitted, the link refers to the root node in a PubXS tree. For the meaning of linking and the format of path expressions, see below.
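
As a rough sketch of how a tool might read these control attributes, the following Python checks pubxs:version on the root element and splits a pubxs:link value into its URI and path parts. The helper names `check_version` and `split_link` are hypothetical, and treating a trailing '#' with an empty path as "no path" is an assumption the description above does not settle.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced attribute names as "{namespace}local".
PUBXS = "{http://pubxs.org/pubxs/}"

def check_version(root):
    """Return the pubxs:version of the root element, or raise if it is not "0.1"."""
    version = root.get(PUBXS + "version")
    if version != "0.1":
        raise ValueError("unsupported or missing pubxs:version: %r" % version)
    return version

def split_link(link):
    """Split a pubxs:link value into (uri, path).

    uri is None when the link points into the same tree;
    path is None when the link points at the root node.
    """
    uri, _, path = link.partition("#")
    # Assumption: an empty path after '#' is treated like an omitted path.
    return (uri or None, path or None)

root = ET.fromstring(
    '<currencies xmlns:pubxs="http://pubxs.org/pubxs/" pubxs:version="0.1"/>'
)
print(check_version(root))
print(split_link("countries.xml#EU"))
print(split_link("#dickens"))
print(split_link("people.xml"))
```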

5.1. Linking Semantics

A PubXS link always refers to a *more canonical* version of the data represented in a node. For example, the following might appear in a PubXS document:

<author pubxs:link="people.xml#dickens">Charles Dickens</author>

At the other end of the link the following might appear:

<person pubxs:id="dickens">
 <name>
  <given-name>Charles</given-name>
  <family-name>Dickens</family-name>
 </name>
 <nationality pubxs:link="countries.xml#GB">Great Britain</nationality>
</person>

The latter is the official data for Charles Dickens. This approach allows full data normalization when desired, but still retains the natural efficiency of a hierarchical data format (it is not always necessary to retrieve the full record).
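
To make the "more canonical" relationship concrete, here is a minimal Python sketch that follows a link one step. It rests on several assumptions of mine: the `DOCUMENTS` dictionary is a hypothetical stand-in for real document retrieval, the `<people>` root element is invented to wrap the record shown above, and only links with a URI and a plain pubxs:id fragment are handled.

```python
import xml.etree.ElementTree as ET

PUBXS = "{http://pubxs.org/pubxs/}"

# Hypothetical in-memory store standing in for fetching documents by URI.
DOCUMENTS = {
    "people.xml": """<people xmlns:pubxs="http://pubxs.org/pubxs/" pubxs:version="0.1">
 <person pubxs:id="dickens">
  <name>
   <given-name>Charles</given-name>
   <family-name>Dickens</family-name>
  </name>
 </person>
</people>""",
}

def find_by_id(root, node_id):
    """Return the first element whose pubxs:id equals node_id, or None."""
    for element in root.iter():
        if element.get(PUBXS + "id") == node_id:
            return element
    return None

def canonical(element):
    """Follow pubxs:link one step; return the element itself if it has no link."""
    link = element.get(PUBXS + "link")
    if link is None:
        return element
    uri, _, node_id = link.partition("#")
    root = ET.fromstring(DOCUMENTS[uri])
    return find_by_id(root, node_id) if node_id else root

author = ET.fromstring(
    '<author xmlns:pubxs="http://pubxs.org/pubxs/" '
    'pubxs:link="people.xml#dickens">Charles Dickens</author>'
)
print(author.text)                          # local copy, no retrieval needed
record = canonical(author)
print(record.findtext("name/family-name"))  # value from the canonical record
```

A viewer that only needs a display name can stop at `author.text`; `canonical` is called only when the full record is wanted, which is the efficiency the hierarchical format is meant to keep.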

5.2. Path Expressions

In a PubXS link, a path expression follows the fragment separator '#'. A path expression may be any of the following:

  1. A single name, representing a pubxs:id in the target document:

    http://foo.com/countries.xml#IN

  2. An absolute path, consisting of one or more path components (see below):

    http://foo.com/countries.xml#/countries/country[IN]

    http://foo.com/countries.xml#/countries/*[IN]

    http://foo.com/countries.xml#/countries/country[2]

  3. A relative path, consisting of '/' followed by a path component with an ID subscript, optionally followed by further path components:

    http://foo.com/countries.xml#//country[IN]

All of the above examples are links to the entry for India in this document:

<countries xmlns:pubxs="http://pubxs.org/pubxs/" pubxs:version="0.1">

 <country pubxs:id="BD">
  <name>Bangladesh</name>
  <iso-code>BD</iso-code>
 </country>

 <country pubxs:id="CN">
  <name>China</name>
  <iso-code>CN</iso-code>
 </country>

 <country pubxs:id="IN">
  <name>India</name>
  <iso-code>IN</iso-code>
 </country>

 <country pubxs:id="PK">
  <name>Pakistan</name>
  <iso-code>PK</iso-code>
 </country>

</countries>

A path component consists of a solidus ('/') followed by an XML 1.0 name or the wildcard character '*'. A subscript may optionally follow, containing either a name (representing an identifier) or a number (representing a zero-based index).

If the subscript contains a name, it represents a specific pubxs:id in the target document.

If the subscript contains an index and the path component contains a name, the index represents the position relative to sibling nodes with the same name; otherwise, it represents the position relative to all siblings.
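
The rules above are compact enough to prototype. The Python below is a hedged sketch of one possible resolver, not a reference implementation: the names `parse_components`, `select`, and `resolve` are hypothetical, the regular expression is a loose approximation of an XML 1.0 name, and choosing the first matching sibling when a component has no subscript is my own assumption, since the text does not define that case.

```python
import re
import xml.etree.ElementTree as ET

ID_ATTR = "{http://pubxs.org/pubxs/}id"
# A solidus, a name or '*', and an optional [subscript] (loose approximation).
COMPONENT = re.compile(r"/([^/\[\]]+)(?:\[([^\[\]]+)\])?")

def parse_components(path):
    """Split '/a/b[x]' into [(name, subscript_or_None), ...]."""
    components, pos = [], 0
    while pos < len(path):
        match = COMPONENT.match(path, pos)
        if match is None:
            raise ValueError("bad path component at offset %d in %r" % (pos, path))
        components.append(match.groups())
        pos = match.end()
    return components

def select(candidates, name, subscript):
    """Pick one node from a list of candidates for a single path component."""
    if name != "*":
        candidates = [c for c in candidates if c.tag == name]
    if subscript is None:
        return candidates[0] if candidates else None  # assumption: first match
    if subscript.isdecimal():
        index = int(subscript)  # zero-based position
        return candidates[index] if index < len(candidates) else None
    for candidate in candidates:
        if candidate.get(ID_ATTR) == subscript:
            return candidate
    return None

def resolve(root, path):
    """Resolve a path expression against a parsed document root."""
    if not path.startswith("/"):
        # Form 1: a single name, looked up as a pubxs:id anywhere in the tree.
        return select(list(root.iter()), "*", path)
    if path.startswith("//"):
        # Form 3: the first component may match anywhere, but needs an ID subscript.
        components = parse_components(path[1:])
        name, subscript = components[0]
        if subscript is None or subscript.isdecimal():
            raise ValueError("relative path needs an ID subscript")
        node = select(list(root.iter()), name, subscript)
    else:
        # Form 2: an absolute path; the first component must match the root.
        components = parse_components(path)
        name, subscript = components[0]
        node = select([root], name, subscript)
    for name, subscript in components[1:]:
        if node is None:
            return None
        node = select(list(node), name, subscript)
    return node

COUNTRIES = ET.fromstring(
    """<countries xmlns:pubxs="http://pubxs.org/pubxs/" pubxs:version="0.1">
 <country pubxs:id="BD"><name>Bangladesh</name></country>
 <country pubxs:id="CN"><name>China</name></country>
 <country pubxs:id="IN"><name>India</name></country>
 <country pubxs:id="PK"><name>Pakistan</name></country>
</countries>"""
)

PATHS = ["IN", "/countries/country[IN]", "/countries/*[IN]",
         "/countries/country[2]", "//country[IN]"]
for path in PATHS:
    print(path, "->", resolve(COUNTRIES, path).findtext("name"))
```

Under these assumptions, all five example expressions from this section resolve to the same node, the entry for India.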

Biography

David Megginson, principal of Megginson Technologies, has been active within the SGML and, later, XML communities since 1991. He led the original initiative that created SAX, the Simple API for XML, which is now the most widely used streaming API for XML. David's work includes many Open Source software packages, together with the book [Structuring XML Documents] and the forthcoming book [The Art of XML], both published by Prentice-Hall. David formerly chaired the XML Information Set Working Group at the World Wide Web Consortium (W3C) and served as a member of the W3C's XML Working Group and XML Co-ordination Group.