XML 2001

Real World Applications of Topic Maps

Joshua Holm <jholm@isogen.com>
Eric Freese <eric@isogen.com>

ABSTRACT

Since the adoption of the topic map ISO standard in 2000 and the release of the XML Topic Map (XTM) specification earlier this year, topic maps have gotten a great deal of attention. The concept has been billed as the be all, end all for knowledge organization and representation. Battles between topic maps and Resource Description Framework (RDF) zealots have broken out, with each touting the strengths and advantages of their individual schemes. However, through the din, a common question has arisen from users: "This sounds cool, but what can I use it for in my organization?"

Several topic map vendors and architects have developed example topic maps in order to demonstrate the power of the topic map paradigm. However, in most cases, these applications are merely toys and not of much use in a commercial enterprise. They also deal with topics (no pun intended), such as genealogy and music, which are often difficult to map to real-world applications. If potential users are unable to bridge the chasm between the examples and their own potential uses of the paradigm, they are rarely willing to risk dwindling corporate resources to field even a pilot program.

In keeping with the conference theme "What Really Works", this paper will present a real-world application of the topic map model in use at a real company. The application includes a parts management system and Interactive Electronic Technical Manual (IETM). The paper will discuss the application and highlight the successes and challenges encountered in designing and implementing the system.

1. Introduction

Topic maps are everywhere, it seems. If you examine the Idealliance conference programs over the past two years, it seems that everyone and their dog has written a paper about them. Several companies have been launched with the hopes of selling topic map-based software. Most of them are in the conference exhibition hall. It has been a year since the XTM standard came out, but how has it served us? In this paper we will demonstrate that:

  1. topic maps are useful

  2. there are real world applications for them

  3. we can prove #1 and #2

ISOGEN is under contract to design and develop an authoring system as part of a major Department of Defense weapon system program. The technical documentation for this program will eventually reside in a class 5 IETM on the weapon system. Users off the system will be able to view the same data as a class 4 IETM being used for training and maintenance purposes. The authoring system and the IETM that will be produced take advantage of several topic map concepts for managing the data. These concepts include scope, strongly-typed topics and associations, and topic occurrences. Once a delivery system (a.k.a. viewer) is developed, it is anticipated that a full-blown topic map will be used to control and direct access to the technical information and to connect all the information objects together into a tightly bundled IETM.

2. Project Overview

2.1. Interactive Electronic Technical Manuals (IETMs)

Let us begin with what an IETM actually is. There are five classes of IETMs, with Class 5 being the most advanced. Class 1 IETMs are essentially electronic displays of fixed pages where the user can do nothing more than flip from page to page (e.g., a very simple PDF file). Class 2 IETMs build on Class 1 by providing some rudimentary navigational aids such as a table of contents (e.g., a PDF file with a TOC). Class 3 IETMs continue to build by adding hypertext links between sections of the manual, but they are still very page-based. Class 4 IETMs are designed specifically for interactive delivery, including the use of dialogs and other user interface tools. The presentation of the information is no longer page-based, and the system usually employs some database concepts or applications. Class 5 IETMs build on Class 4 by adding interaction directly with the system under repair. The IETM being constructed for this particular weapon system is a Class 5 IETM. We will now cover Class 4 and Class 5 IETMs in a little more depth.

Class 4 IETMs are hierarchically structured. Information is viewed in smaller logical blocks of text with very limited use of scrolling. Interaction is through dialog boxes with user prompts. Text and graphics are displayed simultaneously in the same window or in separate windows. The data format for U.S. military IETMs is defined in MIL-D-87269, which defines a set of content tags called the “generic layer”. The generic layer is essentially a set of HyTime architectural forms. The information is authored directly into a database for interactive electronic output, and the data is managed by a Data Base Management System (DBMS). One way in which IETM authoring differs from standard paper-based authoring is that interactive features are "authored in" from the start rather than added on afterwards.

Class 5 IETMs are based essentially on the same display, authoring and data format requirements as Class 4 IETMs. The main differentiator is that Class 5 IETMs often include expert systems that allow the same display session and view system to provide simultaneous access to many differing functions (e.g., supply, training, troubleshooting). Class 5 IETMs are also integrated with the equipment they cover. This means that, instead of the user driving the presentation of the information, as in a Class 4 IETM, the system itself plays a large part in guiding the user through the information. For example, when the system detects a fault, the user will be notified and a course of action suggested. The user always has the option of not following the suggestion or deferring the action. (Source: an online article, http://www.logsa.army.mil/pubs/classes.htm)

2.2. Authoring System Goals

The authoring system being developed will allow publications personnel to create and manage all technical content for the IETM from initial development through the entire life cycle of the weapon system. The authoring system consists of a suite of standards-conforming tools that will allow technical publications personnel to work effectively and efficiently on developing content for the program's current and future IETM. The system is designed to ensure efficient and cost-effective content development and management that meets the high technical demands of the program's content creation/acquisition, management, review/approval, and production processes, including the production of Class 4 and 5 IETM deliverables. This system must remain flexible and retain a high level of efficiency, because the weapon system itself is still under development, with IETM contributors spread across several participating companies.

The authoring system consists of several Commercial-Off-The-Shelf (COTS) products that have been integrated to meet the unique requirements of authoring this technical information. The integrated system includes an authoring/editing application (ArborText Epic), a document management system (Documentum), a relationship/link management system (ISOGEN customization on top of Documentum), a redlining application (via Epic's change markup capability), technical illustrating software (IsoDraw), a workflow management application (Documentum), data transformation/translation software (OmniMark and GroveMinder, in conjunction with other transform standards, e.g., SAX) and an integrated viewer (custom development based on a web browser). By design, the system was built using COTS applications to allow the program to capitalize on ongoing improvements in standards-based publishing methodologies and software applications. Also by design, the authoring system facilitates the collaborative authoring of technical content by numerous individuals, both within the contractor's facility and at customer and/or partner facilities.

The authoring system is being used to create content for several types of technical manuals. Content for all program publications will be developed and maintained in a document management system using XML as the baseline for content markup. The program's content development approach implements an “author-once, re-use many times” concept to reduce duplicate information, manual effort, and errors.

A redlining application will be used to allow authors, editors, and quality assurance personnel to make changes to XML documents in much the same way as Microsoft Word's “Track Changes” feature works (i.e., documents are not locked against change).

Completed XML documents will undergo review by many internal and external personnel. These personnel will be notified by workflow management applications when completed documents or document components are ready for review. During reviews, the completed XML documents will be locked against change. Comments generated against the documents (or portions thereof) will be handled via annotations applied to the corresponding XML element. A web browser based annotation management system developed by ISOGEN allows technical and government reviewers to attach annotations to the technical information and check the annotations back into the repository for resolution by the authors. The workflow management application and repository will control user privileges to ensure users have the ability to make changes to documents or post annotations.

Illustrations for the program’s IETM deliverable will be developed from Pro/Engineer (Pro/E) source data. The authoring system's illustrating software will be capable of importing and manipulating illustration source data as needed to support publication needs.

The authoring system will accept data from other software applications within the contractor's technical computing environment. These software applications (e.g., SLIC-2B for Logistics Support Analysis Record (LSAR) data and Pro/E for engineering drawings) already contain significant source data needed for the program's IETMs. In the case of textual content, LSAR text will be transformed into an XML document by transformation/translation software appropriate to the data.

The authoring system's integrated viewer will allow authors, editors, and quality assurance/production personnel to verify the completeness and behavior of a set of technical information. The integrated viewer is intended to demonstrate how the information would be presented if it were on the vehicle, including interaction with vehicle systems wherever possible.

2.3. Authoring System DTDs

The authoring system DTDs were developed by examining a set of applicable military standards focussing on MIL-D-87269 and MIL-STD-40051. As mentioned previously, MIL-D-87269 defines a set of generic content elements which define the pieces of information needed to build an IETM. These generic elements include elements such as <task>, <step>, <dialog>, <alert>, <graphic>, etc. MIL-STD-40051 is the U.S. Army standard for technical manuals. This standard defines the types of manuals the Army expects for its weapon systems and what types of information should be in each different manual.

In building the DTDs, ISOGEN took the hierarchy and general content types as defined in MIL-STD-40051 to define high- and middle-level wrapper elements. Next, the lower-level elements defined in MIL-D-87269 were used to construct the content models of the items to be authored.

When the authors are developing the content, they will generally be working in the MIL-D-87269 realm of elements. However, when items go out for review and delivery, the authored chunks will be grouped into MIL-STD-40051 groups which, when compiled for delivery, make up the entire IETM.

The DTDs also define a number of linking elements, each of which is allowed to point to only certain types of elements. This is done using HyTime constructs. Several enhancements were made to Epic to assist the authors in developing the links. These are explained in section 3.4.2.

3. Topic Map Constructs in the Authoring Environment

3.1. Example Application

Because the weapon system is still in development and there are export restrictions on its data, throughout this paper we will use a more common, but very similar, application that most readers will be familiar with — the automobile.

3.2. System Hierarchy

The weapon system and all of its systems, subsystems, components and parts, are represented by a system hierarchy which is modeled in a top-down approach. The system hierarchy is the spinal cord of the entire IETM. It is through the system hierarchy that all the information in the IETM is connected.

In topic map terms, each item in the system hierarchy has a topic which represents it. There are several topic types which are defined to classify the items in the system hierarchy including:

There is one main association defined which connects the items within the system hierarchy:

By breaking the system hierarchy up into these groupings, it is possible for the viewing application to know which items in the system hierarchy might have parts breakdowns associated with them. This also allows a smaller bundled set of information to be gathered, possibly for review purposes (e.g., an IETM for the drive train).
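As an illustration only, the system hierarchy for the automobile example could be sketched in Python as typed topics joined by a single containment association. The topic type names ("system", "subsystem", "part"), the use of "is part of" as the connecting association (the name is borrowed from section 3.3.3), and every identifier below are assumptions made for this sketch, not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    name: str
    topic_type: str  # hypothetical type names: "system", "subsystem", "part"


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    # Each entry is a (child_id, parent_id) pair of the assumed "is part of" type.
    part_of: list = field(default_factory=list)

    def add_topic(self, topic_id, name, topic_type):
        self.topics[topic_id] = Topic(topic_id, name, topic_type)

    def add_part_of(self, child_id, parent_id):
        # Refuse associations whose members are not known topics.
        if child_id not in self.topics or parent_id not in self.topics:
            raise KeyError("both members must be existing topics")
        self.part_of.append((child_id, parent_id))

    def subtree(self, root_id):
        """Return the ids of root_id and everything beneath it, depth first."""
        result = [root_id]
        for child, parent in self.part_of:
            if parent == root_id:
                result.extend(self.subtree(child))
        return result


car = TopicMap()
car.add_topic("car", "Automobile", "system")
car.add_topic("drive-train", "Drive train", "subsystem")
car.add_topic("transmission", "Transmission", "subsystem")
car.add_topic("wiper-blades", "Wiper blades", "part")
car.add_part_of("drive-train", "car")
car.add_part_of("transmission", "drive-train")
car.add_part_of("wiper-blades", "car")

# Gather the smaller bundle that a drive train review would need.
drive_train_bundle = car.subtree("drive-train")
```

Walking the containment association from any item yields the bundle for that item, which is one plausible way to gather a review-sized subset of the IETM.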

3.3. Authoring in Chunks

The style for authoring in an IETM environment is very different from page-based technical manuals. The granularity of the items being authored is much smaller. In traditional technical manuals an author might work on an entire chapter or section of the manual. Also in a traditional technical manual, the information is usually presented in an order which is intended to be processed sequentially.

Within an IETM, the author might only write tasks about a specific set of assemblies, while another author is dedicated to writing more free-flowing text explaining how items are supposed to work. Also because of the highly linked nature of IETMs, the order of presentation is difficult, if not impossible, to determine at authoring time. In either case the information must be able to stand on its own, since the authors cannot know all of the contexts in which the information being authored might be used.

Within the authoring system, in addition to the system hierarchy, there are four basic types of data. These data types include:

Within the four basic information types listed above, there are currently 38 different specialized types of data that can be authored. This paper will not review each one, but will highlight a few of the more interesting in terms of topic map application. Suffice it to say that each type of data has its own unique set of characteristics.

3.3.1. Procedures and Tasks

Within a military technical manual there are a large number of procedures and tasks. These items are written about specific items within the system hierarchy. For example, the procedure or task for changing your wiper blades would be linked to the “wiper blades” item in the system hierarchy. A procedure is made up of one or more separate tasks. For example, an oil change at the local 30–minute service shop may be set up as a single procedure with several tasks.

Within the topic map, the connection of the procedural information to the system hierarchy is done using the “discusses” association. The “discusses” association has one member each of type “system” and “material”. The connection between procedures and tasks is done through another association of type “consists of” where there are members of type “procedure” and “task”.

A task is a standalone set of steps intended to accomplish a goal (e.g., drain the oil, fill the oil reservoir, or check fluid levels). The steps can have links to alert information if the user needs to be made aware of a hazardous condition involved with performing the step. At presentation, these alerts appear as if they were authored inline. Steps can also reference other procedures and tasks. These references cause the IETM processor to stop at that point in the task and perform the task being referenced. Once the referenced task is completed, processing may continue (depending on the type of reference) at the point from which the reference was made.

Within the topic map, the references from one task to another are handled through “task-reference” associations which have a “source” member and a “target” member.
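The three association types described in this section can be sketched as small records whose members are labelled with the member types named in the text. This is a minimal Python sketch under stated assumptions: the `Association` class, the `players` helper and all topic identifiers are hypothetical and exist only to show how such associations might be queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    assoc_type: str
    members: tuple  # (member_type, topic_id) pairs


# Hypothetical sample data for the oil change and wiper blade examples.
associations = [
    Association("discusses", (("system", "wiper-blades"), ("material", "proc-replace-wipers"))),
    Association("discusses", (("system", "engine"), ("material", "proc-oil-change"))),
    Association("consists of", (("procedure", "proc-oil-change"), ("task", "task-drain-oil"))),
    Association("consists of", (("procedure", "proc-oil-change"), ("task", "task-fill-oil"))),
    Association("consists of", (("procedure", "proc-oil-change"), ("task", "task-check-fluids"))),
    Association("task-reference", (("source", "task-fill-oil"), ("target", "task-check-fluids"))),
]


def players(assoc_type, known_type, known_id, wanted_type):
    """Return the topics filling wanted_type in every association of
    assoc_type in which known_id fills known_type."""
    found = []
    for assoc in associations:
        members = dict(assoc.members)
        if assoc.assoc_type == assoc_type and members.get(known_type) == known_id:
            found.append(members[wanted_type])
    return found


oil_change_tasks = players("consists of", "procedure", "proc-oil-change", "task")
engine_material = players("discusses", "system", "engine", "material")
referenced_tasks = players("task-reference", "source", "task-fill-oil", "target")
```

Because every relationship has the same shape, one lookup function serves all three association types.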

3.3.2. Descriptive Information

As mentioned earlier, descriptive information describes systems and assemblies within the overall weapon system. Within a military technical manual there are entire chapters dedicated to describing how something works or providing information that does not suit itself to procedural delivery. As is the case with the procedural information, these items are written about specific items within the system hierarchy. Within the topic map, the connection of the descriptive information to the system hierarchy is done using the “discusses” association. The “discusses” association has one member each of type “system” and “material”.

3.3.3. Parts Information

Each item within the system hierarchy can be considered a part, since each item has part information about it. There are two different types of part information. The first is general information about the part, such as part number and manufacturer. This information is the same no matter where or how many times a part is used in a vehicle or assembly. In topic map terms this information can be modelled as data occurrences with specific occurrence types defined for each type of information.

The second type of part information is the information associated with the part each time it is used, such as quantity and location code. This information exists as part of the association between a part and its containing assembly and should be modeled as data occurrences. Since associations cannot have occurrences, a topic must be declared for each instance of an “is part of” association to carry this information. In some cases, information such as a usable-on and/or applicable-to ID is used when the information is valid only on certain versions of the system. This information is modelled as scopes on the data occurrences.
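The two kinds of part information can be sketched as follows. This is a minimal Python illustration, assuming hypothetical occurrence type names, part data and version labels. The point it shows is that general data sits on the part topic, while per-use data sits on a separate topic that stands in for one “is part of” association, with scope restricting an occurrence to certain versions.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    occ_type: str
    value: str
    scope: frozenset = frozenset()  # empty scope: valid for every version


@dataclass
class Topic:
    topic_id: str
    occurrences: list = field(default_factory=list)

    def values(self, occ_type, version=None):
        """Values of occ_type that are unscoped or scoped to the given version."""
        return [o.value for o in self.occurrences
                if o.occ_type == occ_type and (not o.scope or version in o.scope)]


# General part information: the same wherever the part is used.
spark_plug = Topic("spark-plug", [
    Occurrence("part-number", "SP-1234"),            # made-up value
    Occurrence("manufacturer", "Example Ignition"),  # made-up value
])

# A topic declared for one instance of the "is part of" association
# (spark plug within engine), carrying the per-use information.
spark_plug_in_engine = Topic("spark-plug-in-engine", [
    Occurrence("quantity", "4", frozenset({"four-cylinder"})),
    Occurrence("quantity", "6", frozenset({"six-cylinder"})),
    Occurrence("location-code", "ENG-07"),           # made-up value
])

six_cylinder_quantity = spark_plug_in_engine.values("quantity", "six-cylinder")
```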

3.3.4. Graphics

Graphics are used to illustrate the work being done in a task or a concept being described. Each step in a task has a graphic associated with it. Graphics can include exploded views of an assembly or locator views to show where an assembly is located on the vehicle. In either case graphics are specifically associated with an item in the system hierarchy. Within the topic map, the connection of the graphics to the system hierarchy is done using the “illustrates” association. The “illustrates” association has one member each of type “system” and “graphic”.

3.3.5. Library Items

Library items are where the contractor gains the most from reuse. These items are used many times throughout the IETM and would be very expensive to maintain if each occurrence of them had to be maintained separately. By having a single instance of each item which is linked multiple times, a change can be made and applied to the entire database very rapidly.

All links to these items can be managed within the topic map as “uses” associations where the members are of type “source” and “target”. There was some debate as to whether these links should be modelled as special types of occurrences. Since occurrences are in essence special forms of associations, this would have been perfectly reasonable. However, since most of the other relationships between information chunks are modeled as associations, it was decided for consistency reasons to use associations here also.

3.4. Link Management

3.4.1. Link Database

During the early design of the authoring system there was a great deal of discussion about the best way to manage the linking and type information. It was determined that storing the information in a large XML file would be extremely inefficient, especially when searches of the information were required. In addition, because several authors will be working with the repository at the same time, managing the topic map as a single XML file would have been extremely difficult.

It was decided to implement a database table integrated with Documentum to store the links that would otherwise have been stored in the topic map as associations. This includes:

This allows efficient searches, such as where-used queries, to be run. It also allows very quick updates. A customization was developed that walks through the link database to ensure that:

  1. all links point to an object within the repository

  2. all links point to the correct type of object
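A where-used query and the two validation checks can be sketched with an in-memory SQLite database. This is only an illustration: the actual system stores its links in a table integrated with Documentum, and the table layout, column names, link rules and sample rows below are all assumptions.

```python
import sqlite3

# Hypothetical rule table: the object type each link type must point to.
ALLOWED_TARGET_TYPE = {
    "discusses": "system",
    "illustrates": "system",
    "uses": "library-item",
}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE objects (object_id TEXT PRIMARY KEY, object_type TEXT NOT NULL);
    CREATE TABLE links (source_id TEXT, link_type TEXT, target_id TEXT);
""")
db.executemany("INSERT INTO objects VALUES (?, ?)", [
    ("wiper-blades", "system"),
    ("task-replace-wipers", "task"),
    ("fig-wiper-locator", "graphic"),
    ("alert-battery", "library-item"),
])
db.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    ("task-replace-wipers", "discusses", "wiper-blades"),
    ("fig-wiper-locator", "illustrates", "wiper-blades"),
    ("task-replace-wipers", "uses", "alert-missing"),      # target does not exist
    ("task-replace-wipers", "uses", "fig-wiper-locator"),  # target has the wrong type
])


def where_used(target_id):
    """Return every object that links to target_id."""
    rows = db.execute(
        "SELECT source_id FROM links WHERE target_id = ? ORDER BY source_id",
        (target_id,))
    return [row[0] for row in rows]


def validate_links():
    """Walk the link table and report links that fail either check."""
    problems = []
    query = """SELECT l.source_id, l.link_type, l.target_id, o.object_type
               FROM links l LEFT JOIN objects o ON o.object_id = l.target_id
               ORDER BY l.rowid"""
    for source_id, link_type, target_id, target_type in db.execute(query):
        if target_type is None:
            problems.append((source_id, target_id, "target not in repository"))
        elif target_type != ALLOWED_TARGET_TYPE[link_type]:
            problems.append((source_id, target_id, "wrong target type"))
    return problems
```

Keeping the links in indexed rows rather than in one large XML file is what makes both the where-used lookup and the validation walk a matter of simple queries.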

3.4.2. Link Creation

Several customizations were added to the Epic tool to assist the authors in the link creation process. These tools allow the author to create links individually or en masse. The authors are allowed to make specific types of links only where the links are allowed. The authors are also presented with the valid information chunks available based on the type of link being created.
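The filtering behaviour described above might look something like the following Python sketch. It does not reflect the Epic customization itself; the rule table, the repository contents and the `candidate_targets` function are hypothetical and only show the idea of offering an author the chunks that are valid for a given link type.

```python
# Hypothetical rules: link type -> (allowed source types, required target type).
LINK_RULES = {
    "discusses": ({"procedure", "task", "description"}, "system"),
    "illustrates": ({"graphic"}, "system"),
    "task-reference": ({"task"}, "task"),
}

# Hypothetical repository contents: object id -> object type.
REPOSITORY = {
    "wiper-blades": "system",
    "engine": "system",
    "task-drain-oil": "task",
    "task-fill-oil": "task",
    "fig-engine-exploded": "graphic",
}


def candidate_targets(source_id, link_type):
    """List the chunks an author may link to from source_id with link_type.

    An empty list means the link type is not allowed from this kind of chunk.
    """
    allowed_sources, target_type = LINK_RULES[link_type]
    if REPOSITORY[source_id] not in allowed_sources:
        return []
    return sorted(obj for obj, obj_type in REPOSITORY.items()
                  if obj_type == target_type and obj != source_id)


task_reference_choices = candidate_targets("task-drain-oil", "task-reference")
```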

4. Topic Map Constructs Outside the Authoring Environment

One of the main reasons the topic map paradigm was considered for this project was to support the massive linking environment that would exist in the IETM when it was delivered. The contractor's vision was to adapt a web browser to use as a viewing mechanism. This would require a potentially huge collection of XML documents with possibly millions of links between them all. An XML topic map file will be included to specifically manage the system hierarchy and act as a hub for navigation to all the other XML files.

As the viewer continues to be developed, the full potential of the topic map scheme will be realized. The following sections will discuss the anticipated gains by implementing a topic map in the viewing environment.

4.1. Explicit Links

Systems are linked through the system hierarchy explicitly. Procedural and descriptive information is explicitly linked to the appropriate item in the system hierarchy. It is possible for a piece of procedural information to be linked to the same item in the system hierarchy as a piece of descriptive information. In cases such as this, the procedural and descriptive information are implicitly linked. The same can occur to graphics, parts lists, or other chunks of information that share anchors in the system hierarchy. The challenge was to determine the best way to communicate this to a viewing application. It was decided that by making these links explicit as part of the export process from the authoring repository, the burden of knowing what could link to what in the IETM would be removed from the viewer. This would allow the viewer software to concentrate on presentation without requiring it to process the intricacies of the relationships between the data.
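The export step that turns implicit links into explicit ones can be sketched as a grouping operation: chunks that share an anchor in the system hierarchy are paired with one another. The following Python is a minimal sketch under assumptions; the input format, the chunk identifiers and the output tuple layout are hypothetical and are not the project's export format.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (chunk_id, anchor_id) pairs as they might come from the link table.
anchored = [
    ("task-replace-wipers", "wiper-blades"),
    ("desc-wiper-operation", "wiper-blades"),
    ("fig-wiper-locator", "wiper-blades"),
    ("proc-oil-change", "engine"),
]


def explicit_links(pairs):
    """Emit one (chunk_a, chunk_b, shared_anchor) tuple for every pair of
    chunks anchored to the same item in the system hierarchy."""
    by_anchor = defaultdict(list)
    for chunk_id, anchor_id in pairs:
        by_anchor[anchor_id].append(chunk_id)
    links = []
    for anchor_id in sorted(by_anchor):
        for chunk_a, chunk_b in combinations(sorted(by_anchor[anchor_id]), 2):
            links.append((chunk_a, chunk_b, anchor_id))
    return links


exported = explicit_links(anchored)
```

With the pairs computed at export time, a viewer only has to follow the links it is given and does not need to know the rules that produced them.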

4.2. Navigation

Topic maps have often been called a semantic network layer of information which points into a set of information. The program's vision takes the semantic network view one step further by implementing an expert system on the weapon system that can use the semantic network to control when and how the technical information is presented to users. Exactly how this will be done was still in design at the time this paper was written. However, by placing the intelligence in the data rather than in software, it is anticipated that the program will realize significant long-term cost savings, since data is historically less expensive to modify and update than software.

On a slightly lower-tech level, the insertion of explicit links will allow the viewer software to create automated menus and tables of contents based on where the user is browsing within the IETM.

5. Conclusion

In conclusion, we have demonstrated that:

  1. topic maps can be used

  2. topic maps are being used in the real world in applications such as the IETM described in this paper

What we have discovered in the use of topic maps is clarification in the organization of data. This newly found clarification need not be specific to our customer's information but can eventually apply to the entire WWW. A quote from Morningside, a CBS Radio show about the web: "The web is like being in a library where someone has scattered all the books on the floor, attached them together with threads and you are in the dark." Topic maps allow us to not only organize the books onto shelves but turn the lights on and fill the card catalog as well.

The topic map paradigm is the product of a culture that wants to be able to understand, manage and organize its data. Spaces were introduced into documents and writing to add clarity and make information more understandable. However, quoting from the Sun FORTRAN reference manual: "Consistently separating words by spaces became a general custom about the tenth century A.D., and lasted until about 1957, when FORTRAN abandoned the practice." In the aforementioned application of topic maps, we have shown the value of creating document applications that re-use and redefine the "spaces in their sentences" so others may use and understand their information much more easily.

Glossary

COTS

Commercial-Off-The-Shelf

DBMS

Data Base Management System

IETM

Interactive Electronic Technical Manual

LSAR

Logistics Support Analysis Record

Pro/E

Pro/Engineer

RDF

Resource Description Framework

XTM

XML Topic Map

Biography

Joshua Holm
ISOGEN International
St. Paul
MN
U.S.A.
Email: jholm@isogen.com

Joshua Holm is a consultant/software developer at ISOGEN International. He has recently graduated from Bethel College in St. Paul, Minnesota, with a degree in Computer Science and a degree in Theatre. In the summer of 2000, Josh joined the environment of a well-established open standards-based content management consulting company. Since coming to ISOGEN, he has gained experience with several SGML/XML content management systems, working with technical manuals and authoring systems. Mentoring under the guidance of Eric Freese, also of ISOGEN, he has learned many things: the incredible benefits of topic maps, the frustrations of golf and the sadness of losing an Audi TT.

Eric Freese
ISOGEN International
St. Paul
MN
U.S.A.
Email: eric@isogen.com

Eric Freese, a senior ISOGEN International consultant, has over 13 years of experience in the area of document, information, and knowledge management with specific expertise in the development and implementation of XML technologies. His experience includes research, analysis, specification, design, development, testing, implementation, integration and management of information systems in a wide range of environments. He has significant research experience in human interface design, graphics interface development and artificial intelligence. Freese is a founding member of TopicMaps.Org, the organization that developed the XTM specification, and currently serves as the chairman of this group. He is also the chief architect and developer of SemanText, an open source application that uses topic maps to harvest and manage knowledge. Freese has recently come out of a 9–year retirement from golf (since something to do with links can't be all bad), and is still mourning the sale of his Audi TT.