XML 2003

XML Unifies Content Migration

Abstract

Content and document management systems are among the earliest applications of Extensible Markup Language (XML), a natural evolution given the SGML roots of many electronic publishing systems. Content management systems drove the adoption of XML metadata to describe stored content, as well as the use of object repositories and object-relational databases to store content described in XML formats. As content management systems have been extended and optimized for content delivery to the web, web content management systems (CMS) have become familiar adjuncts to web sites, portals, and other web applications. Efforts to select and implement content management systems often emphasize XML handling capabilities for authoring, categorization through metadata tagging, and multi-purpose content delivery.

Less often addressed are capabilities to support content migration from existing sources into a CMS, along with the corresponding content conversion or transformation between formats. Many organizations quickly realize two things. First, converting unstructured content such as Word documents or even static Hypertext Markup Language (HTML) pages into a more structured format for storage in a CMS can be very challenging, since translating from little structure to a great deal of structure requires explicit guidelines and categorization rules. Second, and related to the first, effective content conversion presumes, and in fact demands, the existence of metadata standards or specified tagging structures. Too many organizations intend to adopt centralized content management architectures without having explicit information architectures or metadata standards in place.

For organizations in this situation (implementing a CMS with the intention of moving content from multiple sites, divisions, business units, and other sources into a single repository structure), available XML technologies and supporting processes can greatly facilitate the implementation. Content migration and conversion can cover such a diverse range of sources as to threaten effective enterprise-wide rollout. One approach to mitigating this challenge is the establishment and use of XML metadata standards in the context of a content "migration factory" process that covers content inventory and analysis, content categorization, content conversion and transformation, metadata tagging, and formal migration into the CMS.
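
As a rough illustration only, the stages of such a "migration factory" might be sketched as follows. This is a minimal sketch, not the process used in the case studies: the element names (`migration-batch`, `record`, `metadata`), the keyword-based `CATEGORY_RULES`, and the sample inventory are all hypothetical, and a real effort would draw them from an agreed metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical categorization rules (keyword -> category label).
# In practice these would come from an agreed taxonomy.
CATEGORY_RULES = {"permit": "licensing", "tax": "revenue"}


def categorize(text, default="uncategorized"):
    """Return the category of the first rule keyword found in the text."""
    lowered = text.lower()
    for keyword, category in CATEGORY_RULES.items():
        if keyword in lowered:
            return category
    return default


def to_record(source, title, body):
    """Convert one inventoried item into an XML record with metadata tags."""
    record = ET.Element("record")
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "source").text = source
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "category").text = categorize(title + " " + body)
    ET.SubElement(record, "body").text = body
    return record


def migrate(inventory):
    """Categorize, convert, and tag every item, collecting them in one batch."""
    batch = ET.Element("migration-batch")
    for item in inventory:
        batch.append(to_record(item["source"], item["title"], item["body"]))
    return batch


# Hypothetical content inventory gathered from two source sites.
inventory = [
    {"source": "agency-a/permits.html", "title": "Permit renewals", "body": "How to renew."},
    {"source": "agency-b/notes.doc", "title": "Meeting notes", "body": "Agenda items."},
]
batch = migrate(inventory)
print(ET.tostring(batch, encoding="unicode"))
```

The resulting batch would then be handed to whatever import mechanism the target CMS provides. The point of the sketch is that every source, whatever its original format, leaves the factory carrying the same metadata structure.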

This concept can best be explained through the use of real-world examples. This presentation will highlight two case studies from the public sector — one at the state level, one federal — involving large-scale content migration efforts during the implementation of content management systems. Attendees should expect to learn some of the major challenges involved in standing up enterprise-wide CMS solutions, and several ways that standards can facilitate the process. While not all standards need to be based on XML, relevant successful experiences leveraging XML metadata standards for information architecture, taxonomy, and tagging will demonstrate the benefits of XML in addressing this need.

Keywords