Abstract
The UK e-Government Metadata Standard (e-GMS) [e-GMS] lays down the elements, refinements and encoding schemes to be used by government officers when creating metadata for their information resources or designing search systems for information systems. The e-GMS defines a core set of metadata elements that contain data needed for the effective retrieval and management of official information; this core is then refined for specific uses such as Electronic Records Management and public information websites. The e-GMS forms part of the UK e-Government Interoperability Framework (e-GIF).
alphaXML Ltd (now a division of HEDRA Limited) was commissioned by the UK Office of the e-Envoy in 2002 to develop a specification for representing e-GMS metadata in eXtensible Markup Language (XML). This paper discusses the nature of that requirement, including requirements for exceptionally long data and metadata lifecycles, and the design criteria used in developing the XML representation. [1]
The UK e-GMS[e-GMS] lays down the elements, refinements and encoding schemes to be used by government officers when creating metadata for their information resources or designing search systems for information systems. The e-GMS defines a core set of metadata elements that contain data needed for the effective retrieval and management of official information; this core is then refined for specific uses such as Electronic Records Management and public information websites. The e-GMS forms part of the UK e-Government Interoperability Framework.
It is unlikely that any one system will need all of the elements and refinements included in the e-GMS. "Local metadata standards" that are subsets of the e-GMS are the usual route by which the e-GMS is applied in practice. Examples of e-GMS local metadata standards include the simple metadata used in the UK Online website, and the more complex metadata specified by the UK National Archives (formerly the Public Record Office) for Electronic Records Management [PRO ERMS].
The e-GMS was developed using existing metadata standards as a basis. The set of metadata elements and refinements developed by the Dublin Core Metadata Initiative serves as the core of the e-GMS. In addition, explicit mappings are documented in the e-GMS to the following standards:
AGLS: Australian Government Locator Service
GI Gateway: Geographic Information Gateway
GILS: Government Locator Service (used in the USA)
IEEE LOM: Institute of Electrical and Electronics Engineers, Learning Object Metadata
At this point it is worth emphasizing that e-GMS metadata does not have to be in XML. The e-GMS is a technology-independent standard, and the metadata it specifies is expected to occur in a wide range of contexts, including within webpages, and in the internal data of various technology platforms supporting the delivery of Government services. However, metadata in XML is expected to be important for interoperability, and the specification of a standard XML representation for e-GMS metadata was seen as a key task for e-GMS implementation.
alphaXML Ltd (now a division of HEDRA Limited) was commissioned by the UK Office of the e-Envoy in 2002 to develop a specification for representing e-GMS metadata in XML. In addition to providing a generic representation for e-GMS metadata in XML, this specification needed to support the future development of specific XML representations for e-GMS Local Metadata Standards, and promote easy interoperation between them.
The requirements for an XML representation of e-GMS metadata turned out to be surprisingly complex. [2] In simple terms, e-GMS metadata is expected to occur in XML in the following situations:
the metadata is an integral part of the XML document it describes
the XML metadata describes an information resource that is not itself in XML
metadata is being provided (e.g. retrospectively) about an existing XML document
These contexts of use are discussed in detail in Section 2 below.
There are many possibilities for representing e-GMS metadata in XML. Because of the range of different uses envisaged for e-GMS metadata, the design of an XML representation required a number of factors and issues to be taken into account. The approach taken was the result of detailed consideration of technical issues and likely scenarios of use for e-GMS metadata in XML. The main issues were:
Long life of Government information, and consequently long life of metadata
Compatibility with Dublin Core
Constraints on metadata element values specified in the e-GMS, including interdependencies between values, and constraints depending on the context of use
Interoperability between the solution chosen, and other XML metadata technologies
Detailed discussion of these issues, the consequent requirements, and the emerging design criteria, can be found in the following sections.
This section discusses the requirements and provides an outline of the approach taken.
There will be a large number of applications using e-GMS metadata, and a wide variety of data held in XML. The adoption of XML is still unpredictable on the common business desktop; there are now strong XML based contenders to the widespread proprietary solutions, but their ability to penetrate the market is still a matter of debate. XML is seen as a highly promising technology for achieving long term sustainable utility and effective archiving of electronic records and other business data. [3]
One situation of particular concern is the preparation of a human-readable document such as a report, since experience suggests that document authors are very unlikely to enter adequate metadata unless they are working in an application context that enforces metadata entry. However, the values required for metadata elements are often present in the document content, in readily identifiable and consistently used structures. So, especially for retrospective allocation of metadata, it is desirable to allow XML elements within a document to do double duty as metadata element values, thus allowing an e-GMS/XML metadata processor to extract the metadata automatically.
The specific situations considered included:
An XML message containing metadata (for example, metadata about the report of a public inquiry, being transferred separately from the report itself).
Metadata pertaining to and embedded in UK GovTalk™ XML schemas
Metadata assigned retrospectively to an XML document (for example, a page in an archived snapshot of a website in XHTML), without changing the existing data
A report or other human-readable document prepared in XML, where all or most of the metadata values required occur as XML elements within the document content
(X)HTML "meta-tagging" as trialled in UK public sector metadata pilots.
The representation of e-GMS metadata in XML needs to be adaptable, to accommodate the range of situations noted above. Metadata used in all of these situations needs to be fully interoperable. Metadata may be embedded in an XML document, or held separately (from an XML document or non-XML resource). Metadata element values may be located within an XML resource, or given directly in the XML metadata. These approaches may be combined, and are designed to interoperate freely, providing a flexible framework within which to design XML metadata for a specific e-GMS compliant application, or for a specific e-GMS Local Metadata Standard.
The values used for metadata elements are of key importance, since interoperation of metadata depends just as much on the comparability of element values as on the standardization of the metadata elements themselves. The development of a Local Metadata Standard will in general include strong specification of permitted value sets for e-GMS elements in that context. Making these accessible to all users of the metadata will be essential to the long-term utility of e-GMS metadata, in XML or otherwise.
Most e-GMS metadata elements can have a wide range of values, though some are restricted to, for example, dates. For most e-GMS elements, the XML schema does not validate element values. Where precisely defined value sets are required, these will be specified in e-GMS Local Metadata Standards. An important conclusion during the analysis of these requirements was that in general, these value sets will need to be validated by means other than XML schema-validation. [4]
The development of an e-GMS Local Metadata Standard will frequently involve migration into XML of existing data dictionaries and term sets. Some of these are the fruit of much effort over many years within Government organizations; some are or will be adopted from third-party sources such as industry sector standards. The e-GMS will be of little practical use in achieving interoperability, unless standardized value sets and notations provide commonly understood meanings for the metadata element values, aligned with major subject areas within Government such as Taxation, Health, Justice, etc.
This requirement to harmonize element values as well as the element repertoire is general and essential for metadata, but is easily forgotten, especially in the creative enthusiasm of adopting a new technology across a diverse community. In the longer term, a mature Government framework for metadata would see standardized value sets and notations used in e-GMS metadata administered in one or more registries. The persistence and integrity of these registries will be essential for the accessibility and usability of Government information in the long term. For readers unfamiliar with the public sector, "long term" means varying numbers of decades rather than months or years, going up to centuries for archived records.
The above discussion would apply to any widely deployed metadata standard. There is also a more precise requirement related to the XML representation of e-GMS metadata. The standardized value sets and notations used in e-GMS metadata should have, or must be given, concise names suitable for use as XML names. Furthermore, these names must be persistent, that is, they must be guaranteed to retain their significance for as long as the metadata is expected to be retained (including long term archiving as a public record).
This problem has much broader scope than that of this paper. Naming using URIs can only be a temporary patch. What is required is a well-founded and not-too-overloaded persistent naming scheme that is not hostage to the buying and selling of internet domains, or to changes in corporate structure or the machinery of government.
In some e-GMS applications, it will be highly desirable to enable portions of the content of the resource itself to do double duty as metadata element values. Most interoperability contexts are unlikely to provide support for more complex XML technologies for linking and querying. So, there is a requirement to provide a simple means of identifying parts of an XML document as constituting values of e-GMS metadata elements. These metadata values may be identified within an XML document instance, or within a schema (for all documents validated by that schema).
Identifying metadata in this way is likely to be useful because document authors are reluctant to prepare separate metadata, and that fact can be a significant obstacle to the widespread implementation of metadata. However, it is important to realize that this approach will only work where the document preparation environment provides a clear and consistent document structure where metadata elements can be identified reliably within the document content. A good example of where this could be applied is archived email, where the uniform email header within an organisation can (be converted to XML and) supply a certain amount of useful metadata. Another example is the locally-standardised "document control" block frequently found in the front matter of reports.
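The "double duty" idea described above can be illustrated with a minimal sketch. It is not the e-GMS XML representation: the report structure, the element names in it, and the mapping table are all hypothetical, invented here for illustration. The sketch assumes only that the document has a consistent "document control" block, and it uses simple path lookups rather than a more complex linking or querying technology.

```python
import xml.etree.ElementTree as ET

# Hypothetical report with a locally standardised "document control" block.
REPORT = """<report>
  <front>
    <docControl>
      <docTitle>Annual Flood Defence Review</docTitle>
      <author>J. Smith</author>
      <issued>2003-01-15</issued>
    </docControl>
  </front>
  <body><p>Report text.</p></body>
</report>"""

# Hypothetical mapping: metadata element name -> simple path into the document.
MAPPING = {
    "Title": "front/docControl/docTitle",
    "Creator": "front/docControl/author",
    "Date": "front/docControl/issued",
}


def extract_metadata(xml_text, mapping):
    """Collect metadata values that already occur in the document content."""
    root = ET.fromstring(xml_text)
    found = {}
    for name, path in mapping.items():
        node = root.find(path)
        # Skip elements that are absent or empty rather than inventing values.
        if node is not None and node.text and node.text.strip():
            found[name] = node.text.strip()
    return found


metadata = extract_metadata(REPORT, MAPPING)
```

The mapping is held outside the document, so the same approach would serve for retrospective metadata about an existing document that must not be changed.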
This section discusses key design issues, and lists the design criteria for an XML representation of e-GMS metadata that arise out of each issue. In the lists of design criteria, “e-GMS-XML” is used as a short form of “an XML representation of general-purpose e-GMS metadata”; and “W3C Schema-validation” for “validation according to W3C XML Schema Recommendation 2001”.
The e-Government Metadata Standard is technology-independent. Amongst other representations, as discussed above, e-GMS metadata will certainly occur in XML. For example, this could be in application-to-application XML messages containing metadata, or in XML based records transfer between a Government department and the National Archives. Taking a more detailed architectural view, e-GMS metadata in XML is likely to occur in a number of different system contexts, including:
embedded within an "application information" (<appinfo>) element in an XML schema document (prepared according to W3C XML Schema Recommendation 2001)
embedded within various XML documents fulfilling specific functions, e.g. public records, and reports submitted for specific regulatory purposes
as part of an XML message that contains e-GMS metadata about something outside the message
within a dedicated metadata repository
as a block of descriptive metadata within a wider metadata framework such as Metadata Encoding and Transmission Standard (METS)
supplementary metadata associated with an existing XML document, for example, metadata created when a public record is selected for long term preservation, or metadata pertaining to the role of a pre-existing document within a set of documents collected for a Public Inquiry.
Design criterion: the XML representation chosen for e-GMS metadata must be able to function effectively in all these contexts.
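As an illustration of the first context listed above, the following sketch reads a metadata block out of the annotation of a W3C XML Schema document. The xs:annotation and xs:appinfo elements are part of W3C XML Schema; the Metadata wrapper element and its children are hypothetical names chosen for this example, not the names defined for e-GMS.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # W3C XML Schema namespace

# Hypothetical schema document carrying metadata about itself.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:appinfo>
      <Metadata>
        <Title>Example message schema</Title>
        <Creator>Example Department</Creator>
      </Metadata>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="Message" type="xs:string"/>
</xs:schema>"""


def schema_metadata(schema_text):
    """Return the metadata held in the schema's top-level appinfo, if any."""
    root = ET.fromstring(schema_text)
    # "Metadata" is an assumed wrapper element in no namespace.
    block = root.find(f"{{{XS}}}annotation/{{{XS}}}appinfo/Metadata")
    if block is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in block}


embedded = schema_metadata(SCHEMA)
```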
e-GMS metadata can be expected to be long-lived, and contribute to the management, discovery and utilization of electronic resources over a long lifecycle (e.g. >100 years for a digital archive of electronic public records). XML is a widely adopted industry standard, based on (and equivalent to part of) an ISO standard already over 15 years old, and so shows clear signs of being long-lived.
For XML schemas, the picture is less clear. W3C XML Schema has wide vendor support at this time. Other schema languages for XML are gaining solid support in open source software, and an ISO XML schema standard is under development, which is intended to encompass and harmonize current approaches into a long-lived stable standard. W3C XML Schema is currently mandated as the main schema language in the e-GIF. However, it is not very suitable for describing metadata usage. It is designed to provide precise modular prescriptions for XML documents, rather than supporting the kind of open, extensible schemas ideally required for applying metadata in a flexible manner.
Bearing all this in mind, it is advisable for the XML representation of general-purpose e-GMS metadata to be independent of specific features of W3C schema-validation, whilst also being compatible with the immediate e-GIF requirement to validate XML by this means.
As discussed above, XML is likely to be long-lived. However, some public sector documents have a very long projected lifetime (>100 years), and it is unlikely that XML will remain the standard of choice for interoperability over all that time. The readability, simplicity and wide adoption of XML makes it unlikely that document content in XML will become unusable, and XML viewing applications are likely to remain available in the long term. However, the principal utility of metadata is in its daily use to support integrated access to current and past information resources, so it is quite likely that metadata in XML will eventually become functionally obsolete. Because of this, e-GMS metadata in XML should be easy to convert to a successor data format.
Design criteria:
e-GMS-XML does not depend on specific features of W3C Schema-validation, but rather uses XML structures which are likely to be straightforward to validate using any future XML schema language
e-GMS-XML is compatible with the current requirement to validate XML documents and messages using W3C Schema-validation
e-GMS-XML is likely to be easy to translate into a future successor format to XML.
In particular, e-GMS-XML should make all the data semantics self-evident in the XML, so that translation to a successor format requires only tools that process well-formed XML. Specifically, translation to a successor format should not require a schema-validating processor. [5]
e-GMS-XML is likely to be easy to read and understand for a future data analyst with good understanding of metadata principles, but without intimate knowledge of current XML technologies.
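The criterion that translation should need only well-formed XML processing can be sketched as follows. This is an illustration under stated assumptions: the element names and the refinement attribute are hypothetical, and JSON merely stands in for an unknown successor format. The point is that a non-validating parser is enough when everything needed is visible in the instance.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the "refinement" attribute is an assumption
# used to show semantics carried in the instance rather than in a schema.
RECORD = """<metadata>
  <Title>Flood defence review</Title>
  <Date refinement="created">2003-01-15</Date>
</metadata>"""


def to_successor_format(xml_text):
    """Translate the record using a non-validating parser only."""
    root = ET.fromstring(xml_text)
    items = []
    for child in root:
        items.append({
            "element": child.tag,
            "refinement": child.get("refinement"),  # None when unrefined
            "value": (child.text or "").strip(),
        })
    return json.dumps(items, indent=2)


converted = json.loads(to_successor_format(RECORD))
```

If the refinement relationship were instead declared only in a schema, the same translation would need a schema-aware processor to recover it.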
The e-GMS is based on Dublin Core. An XML representation of Dublin Core metadata, together with a (W3C) XML schema, has been developed by the Dublin Core Metadata Initiative (DCMI). However, the design criteria and principal scenarios of use for metadata differ between DCMI and UK Government. This is already evident in the e-GMS itself, where some aspects depart from DCMI principles: legitimate requirements from different parts of Government call for support for both refinements and subelements within e-GMS metadata elements.
Because of this, simple adoption of the Dublin Core XML representation for e-GMS was unlikely to be appropriate, and in the course of correspondence with the DCMI schema group over Summer 2002, it became clear that this was indeed the case. However, it is highly desirable that interoperation between e-GMS metadata and generic Dublin Core metadata should be easy to achieve – if that were not so, then the main intended benefit of basing e-GMS on Dublin Core would be lost.
The concept of “dumb-down” use of metadata is important for interoperability between metadata-aware applications with different capabilities. The key point is that when any metadata processor looks at a set of metadata, it should be able to identify and use all the metadata elements which it can understand. In particular, refinements which it does not understand can be ignored, and the value of an element refinement used as if it were the unrefined element.
In general, “dumb-down” is a forgetful yet faithful metadata translation, preserving faithfully from a more expressive metadata form all and only what a less expressive metadata form can express. In the context of e-GMS, “dumb-down” metadata processing is likely to have two forms: processing metadata devised according to an e-GMS local metadata standard as if it were generic e-GMS metadata; and processing e-GMS metadata of any kind as if it were simple Dublin Core.
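A minimal sketch of "dumb-down" processing is given here. It assumes a hypothetical representation in which a refinement appears as an attribute on the element it refines; that attribute name, and the sample record, are illustrative assumptions and not the published e-GMS XML. A less capable processor keeps the elements it understands, ignores the refinement, and uses each value as if it belonged to the unrefined element.

```python
import xml.etree.ElementTree as ET

# Hypothetical record mixing unrefined and refined elements.
RECORD = """<metadata>
  <Title>Flood defence review</Title>
  <Title refinement="alternative">FDR 2003</Title>
  <Date refinement="created">2003-01-15</Date>
  <Subject>Flood defences</Subject>
</metadata>"""


def dumb_down(xml_text, understood):
    """Return (element, value) pairs for the elements this processor knows.

    Refinements are deliberately forgotten; elements outside `understood`
    are skipped rather than treated as errors.
    """
    root = ET.fromstring(xml_text)
    result = []
    for child in root:
        if child.tag in understood:
            result.append((child.tag, (child.text or "").strip()))
    return result


# A simple processor that understands only Title and Date.
simple = dumb_down(RECORD, {"Title", "Date"})
```

Here both Title values survive as plain titles and the creation date is used as a plain date, which is the "forgetful yet faithful" behaviour described above.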
However, not all aspects of the e-GMS have been able to follow the "dumb-down" principle. In particular, this applies to metadata for records management, where the subelements required do not function as refinements. e-GMS Local Metadata Standards are not expected to mix refinement and subelement approaches, and the XML representation needs to support both without being confusing, and without requiring metadata users to understand the difference.
Design criteria:
e-GMS-XML can be mapped to Dublin Core in a straightforward manner, for those metadata elements common to e-GMS and Dublin Core.
e-GMS-XML supports processing of metadata conforming to an e-GMS local metadata standard as if it were generic e-GMS metadata, in a uniform and straightforward manner.
e-GMS-XML allows Local Metadata Standards to choose between an element-refinement and an element-subelement approach.
e-GMS metadata has a variety of constraints on the optionality and interdependency of its elements. Local metadata standards based on e-GMS are likely to introduce more of these kinds of constraints, since metadata will inevitably contain values that are governed by business rules. This design issue became evident because a number of the existing constraints are not suitable for validation using W3C Schema-validation. Other schema approaches can go further; however the nature and depth of industry support for these approaches in the medium to long term is uncertain. This spurred some deeper discussion of the different levels of validation involved in XML data exchange.
XML schema validation is principally designed to validate the structure of an XML document, and the data type of XML element content. It is widely seen as a virtue that XML validation should extend as far as possible towards checking everything that is visible to a human reader regarding correctness of XML content. However, this is not necessarily a virtue when seen from the whole system perspective, since replicating "under the hood" data validation that is tried and tested, in XML validation, brings in an additional maintenance burden as well as additional potential for mistakes. It would be inappropriate to be prescriptive about a suitable boundary between XML validation and "under the hood" validation for metadata. However, evaluation of prototype preservation metadata for the Digital Archive at the UK National Archives, undertaken by the author, suggests that XML validation should be kept very simple, with its main virtue being verification that the XML received is complete and fit to be processed by the receiving application.
Design criteria:
avoid duplicating responsibility for metadata validation between XML validation and in-depth validation by the receiving system
avoid, if possible, requiring validation over and above straightforward validation of the structure and data type of the XML; where e-GMS-XML does require it, this is simple, and specified in a technology-independent manner
where more complex constraints (for example, as specified in Local Metadata Standard) are intended to be supported by specific XML technologies, then guidelines and best practice on using these should be provided
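The division of responsibility in these criteria can be sketched as a deliberately shallow check at the XML boundary. The required element set, the element names and the sample records below are hypothetical. The check confirms only that the XML is well-formed, complete and correctly typed; value sets and business rules are left to the receiving application's own validation.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical set of elements a local metadata standard might make mandatory.
REQUIRED = ("Title", "Creator", "Date")

GOOD = "<metadata><Title>T</Title><Creator>C</Creator><Date>2003-01-15</Date></metadata>"
BAD = "<metadata><Title>T</Title><Date>15/01/2003</Date></metadata>"


def structural_problems(xml_text):
    """List structure and data type problems only; no business rules."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for name in REQUIRED:
        if root.find(name) is None:
            problems.append(f"missing element: {name}")
    for node in root.findall("Date"):
        try:
            # Data type check: the value must parse as an ISO 8601 date.
            date.fromisoformat((node.text or "").strip())
        except ValueError:
            problems.append(f"bad date: {node.text!r}")
    return problems


good_result = structural_problems(GOOD)
bad_result = structural_problems(BAD)
```

An empty list means the record is fit to hand over; whether its values are acceptable remains a question for the receiving system.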
XML metadata is an area where there are a number of standards, and these standards tend to be complementary rather than competing (though they may be competitors in the context of a specific application). The picture is made more complex by the fact that these standards come from different domains only now converging through the ubiquity of Internet technology – for example, there are well-regarded standards with origins in librarianship and information science (Dublin Core), artificial intelligence (DAML/OIL), and electronic publishing (ISO 13250 Topic Maps), together with efforts to integrate the metadata domain in its own right (ISO 11179, METS), as well as the ongoing work in W3C.
Although it is desirable to have a uniform XML representation of e-GMS metadata, it is also important to enable Government organizations to choose freely between technology solutions based on different industry standards. This is particularly important since some Government organizations have close ties to specific industry sectors. An important first step has been taken by making the e-GMS standard itself technology independent.
At one extreme, fine-tuned XML representations of e-GMS metadata could be devised for each specific context, using a range of XML metadata technologies. However, this would lead to a large number of different “standard” representations, and discourage easy interoperability. Another approach would be to define a rigid “one size fits all” XML representation. Neither of these is likely to meet the practical requirements of Government organizations. The design criteria below are intended to offer a reasonable middle way.
Design criteria:
e-GMS-XML provides definitions for e-GMS metadata element values to the extent that these are specified in e-GMS itself. These will be a common resource for all e-GMS XML representations.
e-GMS-XML provides a representation designed for use in an e-GIF XML message containing metadata about something outside the message. This is the most general form of e-GMS metadata in XML, designed to accommodate any (technology independent) e-GMS local metadata standard, and thus providing a simple basis for interoperability between any e-GMS compliant systems.
e-GMS-XML provides a representation designed to sit within the context of an XML document. This could be within the XML data for a publication (e.g. a report), or within another XML context such as a METS descriptive metadata section.
e-GMS-XML provides guidelines and examples for using e-GMS with selected XML metadata technologies. The aim of these guidelines is to support, for example, easy interoperability in RDF between e-GMS compliant systems using RDF. These guidelines are expected to evolve over time, as specific XML metadata technologies gain and lose acceptance in the marketplace.
e-GMS-XML provides guidelines for designing XML representations of e-GMS local metadata standards
It will be no surprise that not all the design criteria discussed above have been met by the XML representation chosen. Simplicity and generality have won over creativity, hopefully without losing elegance and economy. Unfortunately, the examples prepared during the original work in 2002 are now obsolete and would be more misleading than helpful.
The e-GMS version 2 is out for public consultation at the time of writing and is expected to be published in April 2003. Version 2 contains important new content to support the "2002 Requirements" for Electronic Records Management published by the UK National Archives (formerly the Public Record Office). During January 2003, the author was commissioned by the UK National Archives to develop an XML representation for the metadata standard in their 2002 Requirements, which is itself a Local Metadata Standard of e-GMS version 2. Following on from this work, at the time of writing the author is preparing the more generic XML schemas to accompany the final publication of e-GMS version 2. Examples conforming to version 2 as finalized for publication are expected to be included in the conference presentation, and subsequently made available from the conference website.
[e-GMS] e-Government Metadata Standard. (Draft for consultation of version 2 available at time of writing; full publication expected in April 2003.) http://www.govtalk.gov.uk/
[PRO ERMS] UK National Archives (formerly the Public Record Office), 2002 Requirements for Electronic Records Management Systems http://www.pro.gov.uk/recordsmanagement/erecords/2002reqs/default.htm
[1] At the time of writing, the XML schemas for e-GMS metadata are being revised to conform to e-GMS version 2, due for publication in April 2003. Examples based on these revised schemas are not available at the time of writing, though they are expected to be available for presentation at the conference.
[2] The requirements analysis and review of design issues took about three times as long as was expected initially. The implementation in XML was straightforward once the design issues had been resolved. A useful feature of the development process was that there was an immediate requirement for an XML schema for a specific Local Metadata Standard: the metadata for XML schemas in the UK GovTalk collection. This served as a prototype implementation that was very helpful as a starting point for the full analysis.
[3] See the author's tutorial "Technical Options in Digital Archiving: XML is the star!" presented at this conference.
[4] The UK e-Government Interoperability Framework, version 5, mandates W3C XML schema as the primary XML schema language for the UK public sector, and therefore this discussion is framed in those terms. There is a stated future intention to progress to using the broader ISO schema framework when that has been fully defined and has gained good support from vendors.
[5] This was the deciding factor that ruled out (with regret!) using the XML schema approach developed by the Dublin Core Metadata Initiative over Summer 2002. Their approach implements the mechanism of element refinement within the schema, rather than putting it out in the open in the XML.