XML Europe 2002

XML Topic Maps and Technical Information Systems

Abstract

At first glance, the Topic Map (TM) paradigm appears to be an attractive way to handle the complex structures in which technical information is created and used. A contemporary Technical Information System consists of database items, external files or BLOBs, and an application that knows the meaning (semantics) of all items. TMs offer a way of describing the semantics in the data itself, building up a self-explanatory knowledge store. TMs promise easy integration with TM-based knowledge bases from other sources, technical or not. They also promise generic support for terminology management, semantic search, inferencing and other useful features.

However, the challenges are plentiful. Because of the vast amount of information and the needs of the users, Technical Information Systems place high demands on the stability and performance of the underlying software. It is not obvious that TMs are suitable for storing the structures underlying technical information, and their advantages may come at the price of new disadvantages.

This paper is based on the diploma dissertation of Mario Klesse, written under the supervision of the author, which investigated the usefulness of TMs for Technical Information Systems. Both agree that TMs can be used to describe the information structures, but that the technology was, at the time the thesis was written, not yet mature enough for productive use.


Table of Contents

1. TISs
1.1. Design and functionality of a TIS
1.1.1. Functionality overview
1.1.2. Architecture Overview
1.1.3. Product Identification: the Key to Access Information
1.2. Environment
1.3. The role of XML
1.4. Challenges
1.4.1. Scope Growth 1
1.4.2. Scope Growth 2
1.4.3. Other Challenges
2. Expectations from the TM Paradigm
2.1. Integration of Knowledge Applications
2.2. Adding user groups
2.3. Separation of Ontology and Application
2.4. Use of generic tools
2.5. Serendipity
3. Questions Investigated
4. The Approach
4.1. Methodology
4.2. Modelling the TM–template
4.3. Creating the TM
5. Results and Observations
Acknowledgements
Bibliography
Glossary
Biography

1. TISs

Technical Information Systems are created to provide customer support engineers, service engineers, sales and marketing people and staff from a variety of related departments with product-related (technical) information. The information typically arises along the product life cycle, from a variety of sources and in a multitude of forms. A Technical Information System should provide all users with a unified point of access to the material supplied by all contributors. Usually, considerable effort goes into integrating the information from the source systems and presenting it in a target-specific form.

Market expectations and recent trends in legislation and standardisation put an increasing responsibility on manufacturers to provide the right information at the right time and in the right place. Product documentation is no longer an add-on; it has become part of the product itself, and the processes to create, maintain, publish and use product documentation are part of the product life cycle.

The primary target of a technical information system is the after-sales service organisation of a manufacturer, e. g. service technicians or spare parts distributors. Typically, a technical information system contains repair instructions, functional descriptions, technical data, spare parts lists, diagnostic or troubleshooting information and the like.

With Information Technologies (IT) becoming ubiquitous in office environments, and the widespread use of the Internet, the target user group has expanded and may now include most parts of a company.

Without a TIS, product-related information is often decentralised and provided on the homepages of the departments that create it, as shown in the following illustration.

[Illustration: product-related information decentralised on departmental homepages]

The long-term goal of companies is to provide an infrastructure in which all product-related information can be accessed through a single, enterprise-wide information system. This is shown in the following illustration.

[Illustration: a single, enterprise-wide information system for all product-related information]

In most manufacturing companies today, however, this “desired state” has not been reached and may remain wishful thinking for a long time. The current situation is one stage in a process of development towards that goal and may be depicted as in the following figure.

[Figure: the current situation, an intermediate stage on the way to the desired state]

1.1. Design and functionality of a TIS

1.1.1. Functionality overview

A typical TIS can provide some or all of the following functions.

[Illustration: functions of a typical TIS]
  • Product or Service Case Identification is a horizontal service used to identify, as precisely as possible, the product for which information is sought. This defines a global search scope which applies to subsequent search and retrieval operations and limits the results to information applicable to the product(s) in scope. This is a key to quality improvements in service for products which come in a large number of configurations, e. g. cars, aeroplanes or printing machines, since a common cause of errors (when using printed manuals) is taking the wrong manual from the shelf.

    Product identification is a key concept of a TIS, and it is neither trivial nor straightforward; see Section 1.1.3, “Product Identification: the Key to Access Information”.

  • Personalisation or User Authentication can be used to adapt the behaviour of the system to the skill level or general interests of the user, or to manage access to restricted information. It is also needed for personal annotation functions, for pay-per-use billing, to gather statistical information, to provide feedback to the manufacturer, or for integration with front-office and back-office IT systems.

  • Functional descriptions are an example of additional product-specific information. They explain how the product, or one of its components, works.

  • The Parts Catalogue is a specialised application or module that lists the available spare parts for a product. Typically, it lists quantities, packaging, ordering information and alternative or replacement parts (what to use if a part is no longer sold or is out of stock), and shows the location of the part in the product.

  • A Repair Manual is an electronic technical manual, typically customised on the fly from a neutral (not product-specific) manual so that it contains only information relevant to the identified product.

  • Troubleshooting Manuals can be electronic manuals, similar to repair manuals, which contain troubleshooting tables or diagrams. Troubleshooting Manuals can also be diagnostic expert systems.

  • Front- and Back-Office Integration provide interfaces to the environment of the TIS, e. g. procurement systems to order spare parts, or invoicing systems in a repair workshop.

1.1.2. Architecture Overview

The following illustration shows an overview of contemporary implementations of TIS.

[Illustration: architecture of a contemporary TIS]

Some information resources are stored in files, others in relational databases. Meta-information that is used to build up navigation structures or to manage the access to information is stored in relational databases as well. This information is integrated by a specialized application that provides the functionality.

1.1.3. Product Identification: the Key to Access Information

Product identification is not as straightforward as it may seem. Hierarchical browsing or tagging systems are not sufficient to allow efficient maintenance and use of the product-related information, because the maintenance view of the product configuration is very different from the “use” view.

A product's configuration consists of a number of configuration items; for a car, these may be engine size, body style, transmission type, drive side, type of seats, air conditioning and other options.

In the source systems (the systems used to create the information), information is linked to configuration items and not to specific configurations. This is more than a convenience for the contributor of the content; it is a necessity. If, for example, repair instructions describe an engine repair for a specific engine, and a new car model is released which uses that engine, the engine repair instructions apply to the new model as well. The source system has to reflect this without requiring all contributors to check the applicability of all their contributions whenever a new model is published. So the instructions must be tagged with the configuration item “engine type XXX”, and not with the specific existing configurations to which they apply.

[Illustration: information linked to configuration items rather than to configurations]

The information is used, however, in the context of a specific product, and the product identification identifies a configuration. The TIS must map between the configuration and the configuration items.

The real-life situation is more complex than the illustration (e. g. the applicability of a resource can be a logical expression in configuration items), and it has to be evaluated in real time, even for very large numbers of configurations with many configuration items.
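To make the mapping concrete, here is a minimal Python sketch, assuming that a configuration can be represented as a set of configuration items and an applicability rule as a Boolean expression over those items. The item names (engine:XXX, drive:left and so on) and the helper functions are hypothetical and do not come from any real TIS; a production system would additionally need indexing to evaluate such rules in real time for very large numbers of configurations.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# A configuration is the set of configuration items of one product variant.
Configuration = FrozenSet[str]
# An applicability rule is a Boolean expression over configuration items.
Rule = Callable[[Configuration], bool]


def has(item: str) -> Rule:
    """True if the configuration contains the given configuration item."""
    return lambda cfg: item in cfg


def all_of(*rules: Rule) -> Rule:
    return lambda cfg: all(rule(cfg) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    return lambda cfg: any(rule(cfg) for rule in rules)


def not_(rule: Rule) -> Rule:
    return lambda cfg: not rule(cfg)


@dataclass(frozen=True)
class Resource:
    title: str
    applies_if: Rule  # tagged with configuration items, never with whole configurations


def applicable(resources: List[Resource], cfg: Configuration) -> List[str]:
    """Map an identified configuration to the resources whose rules it satisfies."""
    return [r.title for r in resources if r.applies_if(cfg)]


resources = [
    Resource("Engine XXX: replace timing belt", has("engine:XXX")),
    Resource("Air conditioning: recharge (left-hand drive)",
             all_of(has("aircon"), has("drive:left"))),
    Resource("Manual gearbox: adjust clutch",
             any_of(has("transmission:manual5"), has("transmission:manual6"))),
    Resource("Standard seats: remove", not_(has("seats:sport"))),
]

# A newly released (hypothetical) model that reuses engine XXX.
new_model = frozenset({"engine:XXX", "body:estate", "transmission:automatic",
                       "drive:right", "aircon", "seats:standard"})
print(applicable(resources, new_model))
# prints: ['Engine XXX: replace timing belt', 'Standard seats: remove']
```

Because the new model is described only by its configuration items, the engine repair instruction applies to it without any contributor having to re-tag content.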

In some cases, it is also useful to navigate within the product information, e. g. starting with a model, going to its engine type, then seeing which other models use the same engine, then which engine oils are recommended, etc.
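Such navigation can be sketched as a walk along typed associations. The following Python sketch assumes a hypothetical in-memory store in which each association has a type and a mapping from role names to role players; the association types, roles and topics are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical associations: (association type, {role: role player}).
associations: List[Tuple[str, Dict[str, str]]] = [
    ("uses-engine", {"model": "Model A", "engine": "Engine XXX"}),
    ("uses-engine", {"model": "Model B", "engine": "Engine XXX"}),
    ("uses-engine", {"model": "Model C", "engine": "Engine YYY"}),
    ("recommended-oil", {"engine": "Engine XXX", "oil": "Oil 5W-30"}),
    ("recommended-oil", {"engine": "Engine XXX", "oil": "Oil 0W-40"}),
]

# Index: topic -> positions of the associations in which it plays a role.
index: Dict[str, List[int]] = defaultdict(list)
for position, (_, roles) in enumerate(associations):
    for player in roles.values():
        index[player].append(position)


def related(topic: str, assoc_type: str, from_role: str, to_role: str) -> List[str]:
    """Players of to_role in associations of assoc_type where topic plays from_role."""
    result = []
    for position in index.get(topic, []):
        a_type, roles = associations[position]
        if a_type == assoc_type and roles.get(from_role) == topic and to_role in roles:
            result.append(roles[to_role])
    return sorted(result)


engine = related("Model A", "uses-engine", "model", "engine")[0]   # model -> engine
siblings = related(engine, "uses-engine", "engine", "model")       # engine -> other models
oils = related(engine, "recommended-oil", "engine", "oil")         # engine -> oils
print(engine, siblings, oils)
```

The walk uses only association types and role names, so the same function serves every step of the navigation path.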

1.2. Environment

A TIS cannot exist as a standalone application, but must necessarily interface to other systems. A minimal requirement is that there are interfaces to the source systems where the information is created and maintained. Source systems include:

  • Content Management Systems where documents are maintained, released and linked to product configuration items.

  • PDM/PLC Systems which store product configuration information.

  • Parts Master Systems which are used to define the spare part information.

Other systems where interfaces are useful can be:

  • E-Procurement for spare parts.

  • In a workshop environment, Service Order Entry or Invoicing.

  • CRM-Systems.

  • Specialised Diagnostic hardware.

  • Lots more ...

1.3. The role of XML

XML plays a key role in TISs. Technical information is often created in XML format to allow cross-media publishing. Publication to print media is a necessity, either because of legal requirements or to provide information to organisations in developing countries, which may not have reliable access to electronic media or even electricity. The presentation of the information must be optimised for the target media, and XML helps to achieve single-source, cross-media publishing.

In some fields, like the automotive and aircraft industries, legal requirements or standardisation efforts lead to the use of XML.

Information re-use is another driving factor for using XML. XML is also often an intermediate step when non-document information, such as the parts catalogue, is prepared for printing.

Finally, XML is used for Enterprise Application Integration (EAI) purposes when interfacing to the environment systems.

1.4. Challenges

1.4.1. Scope Growth 1

Real-life TISs often start as applications with a limited scope of content. The scope is limited in two dimensions: the type of information provided (e. g. Repair Manuals but not Marketing Material) and the products covered (e. g. passenger cars but not commercial vehicles). Over time, the scope grows as more and more contributing departments want (or are forced) to provide their content via the same TIS. Frequently, several systems with different initial scopes co-exist, grow together and ideally have to be merged into a single system.

Such mergers are difficult because each TIS has its own implementation of the business objects. These implementations are complex (mapping the complex real world of product configuration) and specific to the initial scope of the system, so the implementations of different systems are often syntactically and semantically incompatible.

A merger changes implementation details, so at least one of the systems will have to be modified.

1.4.2. Scope Growth 2

Real-life TISs also often start as applications with a limited scope of users (e. g. service technicians). Over time, the scope grows as more and more users see a potential use for the TIS. The implementation of the business objects is biased towards the view of the initial target users, and it is difficult to extend the user scope of the system: new users can access the information only via the existing applications.

1.4.3. Other Challenges

  • Relational data modelling is not self-explanatory. In the absence of the application, the interpretation of the data and documents is ambiguous. Additional information is required to develop interfaces to environment systems and to document the relational data and the business objects.

  • The separation of product configuration information and access information (other navigation structures) as database items, and resources as files is sometimes a bit arbitrary. In reality, information may be access structure or content, depending on the point of view and the user's particular problem.

  • Integration of terminology management into content creation is often awkward and a bit artificial.

2. Expectations from the TM Paradigm

2.1. Integration of Knowledge Applications

We expected TM-based applications to be easier to merge.

  • By applying the TM merge operation, we expected the resulting TM to be already a partially integrated knowledge store on which both applications can work without changes.

  • We expected that by providing auxiliary “merging” TMs, semantic incompatibilities can be removed without having to change the application implementation.

  • TMs have a strong identity concept based on Public Subject Identifiers (PSIs).
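The merge behaviour we expected can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical model: topics are merged only when they share a subject identifier (PSI), whereas XTM 1.0 defines further merging rules (for example on subject addresses and on base names in the same scope) that are not modelled here. The PSIs under psi.example.com are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Topic:
    psis: Set[str]                                  # subject identifiers (PSIs)
    names: Set[str] = field(default_factory=set)    # base names


def merge(*topic_maps: List[Topic]) -> List[Topic]:
    """Merge topic maps: topics that share at least one PSI become one topic."""
    merged: List[Topic] = []  # invariant: no two merged topics share a PSI
    for tm in topic_maps:
        for topic in tm:
            new = Topic(set(topic.psis), set(topic.names))  # copy; inputs stay untouched
            remaining = []
            for existing in merged:
                if existing.psis & new.psis:
                    # Same subject: absorb identifiers and names into one topic.
                    new.psis |= existing.psis
                    new.names |= existing.names
                else:
                    remaining.append(existing)
            merged = remaining + [new]
    return merged


ENGINE_SERVICE = "http://psi.example.com/service/engine-xxx"    # placeholder PSI
ENGINE_MARKETING = "http://psi.example.com/marketing/ecoline"   # placeholder PSI

service_tm = [Topic({ENGINE_SERVICE}, {"Engine XXX"})]
marketing_tm = [Topic({ENGINE_MARKETING}, {"EcoLine engine"})]
# Auxiliary "merging" TM: one topic stating that both PSIs identify the same subject.
bridge_tm = [Topic({ENGINE_SERVICE, ENGINE_MARKETING})]

print(len(merge(service_tm, marketing_tm)))      # prints: 2 (no shared PSI)
combined = merge(service_tm, marketing_tm, bridge_tm)
print(len(combined), sorted(combined[0].names))  # prints: 1 ['EcoLine engine', 'Engine XXX']
```

Without the auxiliary TM the two topics stay apart; adding the one-topic "merging" TM resolves the semantic mismatch without changing either source map, and the bridge itself documents that resolution.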

2.2. Adding user groups

We expected the information stored in a TM to be immediately useful for other users. Due to the self-explanatory data model, the information can be understood and be browsed using generic TM browsing software, e. g. Ontopia's “Omnigator”. Since associations and topics are descriptive, no special application is required to interpret associations.

Using TM merging, we expected that additional information needed by the new user group can be added to the information store without having to change the original system.

Special applications or modules for the new user group can then be added gradually.

2.3. Separation of Ontology and Application

A conventional system encapsulates parts of the underlying ontology in the business object layer, the application logic or even the user interface. This part of the ontology is not accessible to other systems or other information users. The system's “universe” is spread across different parts of the system.

With TMs, the entire “universe” is in the TM. It is accessible to others.

2.4. Use of generic tools

We expected that viewing a TM in a generic tool would already give meaningful access to the stored information. Specific applications can improve the “view” of the data, but the information about topics and their associations should be clearly understandable with the generic software.

The TM can be viewed using different generic tools, for example using “Omnigator” for hypertext-based representations and using “The Brain” for interactive graphical browsing in the associations.

The data of conventional applications can also be viewed using generic tools like SQL monitors, but since the data model is not self-explanatory, an ordinary user cannot make any use of the information.

2.5. Serendipity

Due to the self-explanatory nature of TMs and their immediate usefulness when viewed with generic software, we expected more serendipitous discoveries than with conventional systems.

3. Questions Investigated

To evaluate whether XML Topic Maps (XTMs) can be the basis for the next generation of TISs, Mario Klesse investigated the following questions in his diploma dissertation.

  • Suitability

    Can existing TIS functionalities also be implemented based on XTMs as information stores?

  • Benefits

    Which benefits can be gained from an XTM–based TIS?

    Which functionalities can be implemented that are not (reasonably) possible with a conventional system?

    Which of the challenges from Section 1.4, “Challenges”, are easier to handle?

  • Stability and Completeness of Standard

    Is the XTM–Standard stable enough to allow implementing an application with it without risking incompatibility with future applications based on a later version of the standard?

    Does the standard cover all important aspects of information management?

  • Stability and Suitability of Software

    Are the available TM–Engines stable enough for an information system which needs to have high uptimes, operate reliably and process a large number of requests simultaneously?

    Is the implementation of the XTM–Standard by the available software sufficiently complete?

  • Performance, Scalability and System Sizing

    Can a TIS based on XTM show reasonable response times to the end users?

    Will such a system be scalable to arbitrary amounts of data?

    What are the sizing requirements for server and client hardware compared with a conventional TIS?

Those questions can be summarised into a single one: If we were to base our next TIS–Implementation on XTM rather than conventional technology, will we have a happy customer?

4. The Approach

To answer the questions, two prototypes were implemented, representing two typical TIS–components. This paper can only briefly outline the general approach taken to implement them.

4.1. Methodology

TMs are a young concept and at the time of creating the prototypes, no general design methodology for TM–based applications was established.

What is special about TM–based systems is that the semantic information about the data is stored in the data itself. Therefore, the creation of the ontology is a key point in the system design. The following illustration shows the general design of a TM–based application.

Figure 1. Design of a TM–application

Creating an ontology is an iterative process. Since ontology design is a key point in the system design, the iterative approach was taken for the whole system design. The design process may be described by the following illustration.

Figure 2. Design Process for a TM–application

  1. Creation of an Ontology.

  2. Creation of a TM–template from the ontology.

    Selection of a TM–engine. This step is usually done only in the first iteration.

  3. Customisation of the TM–engine and development of the application.

    Creation of transformation modules to create the TM from data in source systems.

  4. Test and use of the TM and the application. Go back to step 1 if either one does not satisfy the requirements.

4.2. Modelling the TM–template

For existing TISs, there are usually Entity-Relationship Diagrams (ERDs), Unified Modelling Language (UML) application designs, or both. There is no “notation” for modelling TM–templates, but most of the concepts of ERD or UML can be modelled in TMs.

To model the TM–template for the prototypes, Mario started with relational and object–oriented models of the real world and derived a TM–template from them using mapping mechanisms described in his thesis [Klesse 2002].

An interesting side-effect of TM–modelling is that reified associations become topics and thus, since a topic is essentially “something one can talk about”, get a name. This elevates them somewhat in status, at least in the perception of the designer and the user, and helps to avoid common design errors.
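One possible way to map an ERD onto a TM–template can be sketched as follows. This is not the mapping from Mario's thesis, which is not reproduced here, but a hypothetical illustration assuming that entities become topic types, entity attributes become occurrence types and relationships become association types. It shows where reification comes in: XTM 1.0 associations cannot carry occurrences, so a relationship with attributes has to be reified as a topic, and that topic needs a name.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    """ERD entity; mapped to a topic type."""
    name: str
    attributes: Tuple[str, ...] = ()


@dataclass
class Relationship:
    """ERD relationship between two entities; mapped to an association type."""
    name: str
    ends: Tuple[str, str]
    attributes: Tuple[str, ...] = ()


def to_tm_template(entities: List[Entity], relationships: List[Relationship]) -> List[str]:
    """Return a textual outline of the derived TM-template (hypothetical mapping)."""
    template = []
    for entity in entities:
        template.append(f"topic type {entity.name}")
        for attr in entity.attributes:
            template.append(f"  occurrence type {attr}")
    for rel in relationships:
        template.append(f"association type {rel.name} (roles: {rel.ends[0]}, {rel.ends[1]})")
        if rel.attributes:
            # XTM 1.0 associations have no occurrences: relationship attributes
            # can only be attached to a topic that reifies the association.
            template.append(f"  reified: each {rel.name} becomes a named topic")
            for attr in rel.attributes:
                template.append(f"    occurrence type {attr}")
    return template


template = to_tm_template(
    [Entity("model"), Entity("engine", ("displacement",))],
    [Relationship("fitment", ("model", "engine"), ("valid-from",))],
)
print("\n".join(template))
```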

4.3. Creating the TM

The ERD models underlying the source systems were a key starting point for the design of the TM–template. All data was stored in an RDBMS, including references to resources (files or URLs). Therefore, once the template was established, converting the source system data into a topic map was not difficult.

For the prototypes, two approaches were chosen to create the TM.

  1. For the first prototype, for each Topic Type and each Association Type an XTM file was created from the RDBMS source, using SQL queries and transforming the results to XTM using XSLT.

  2. For the second prototype, the Application Programming Interface (API) of the underlying TM–Engine was used.
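The first approach can be sketched as follows. The prototype transformed SQL query results with XSLT; the sketch below swaps XSLT for Python's xml.etree.ElementTree so that it is self-contained, and the table layout, the topic ID scheme and the assumption that role names equal topic type names are all hypothetical. The type and role topics (engine, model, uses-engine) are assumed to be defined in the TM–template.

```python
import sqlite3
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)          # serialise XTM elements without a prefix
ET.register_namespace("xlink", XLINK)


def topic_ref(parent, topic_id):
    """Append <topicRef xlink:href="#topic_id"/> to parent."""
    ET.SubElement(parent, f"{{{XTM}}}topicRef", {f"{{{XLINK}}}href": f"#{topic_id}"})


def topics_to_xtm(conn, topic_type, sql):
    """One XTM document per topic type; each result row (key, name) becomes a topic."""
    tm = ET.Element(f"{{{XTM}}}topicMap")
    for key, name in conn.execute(sql):
        topic = ET.SubElement(tm, f"{{{XTM}}}topic", {"id": f"{topic_type}-{key}"})
        topic_ref(ET.SubElement(topic, f"{{{XTM}}}instanceOf"), topic_type)
        base_name = ET.SubElement(topic, f"{{{XTM}}}baseName")
        ET.SubElement(base_name, f"{{{XTM}}}baseNameString").text = name
    return tm


def associations_to_xtm(conn, assoc_type, roles, sql):
    """One XTM document per association type; each result row holds one key per role."""
    tm = ET.Element(f"{{{XTM}}}topicMap")
    for row in conn.execute(sql):
        assoc = ET.SubElement(tm, f"{{{XTM}}}association")
        topic_ref(ET.SubElement(assoc, f"{{{XTM}}}instanceOf"), assoc_type)
        for role, key in zip(roles, row):
            member = ET.SubElement(assoc, f"{{{XTM}}}member")
            topic_ref(ET.SubElement(member, f"{{{XTM}}}roleSpec"), role)
            topic_ref(member, f"{role}-{key}")  # assumes role name == topic type name
    return tm


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE engine (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE model  (id INTEGER PRIMARY KEY, name TEXT, engine_id INTEGER);
    INSERT INTO engine VALUES (1, 'Engine XXX');
    INSERT INTO model  VALUES (10, 'Model A', 1), (11, 'Model B', 1);
""")

engines = topics_to_xtm(conn, "engine", "SELECT id, name FROM engine ORDER BY id")
models = topics_to_xtm(conn, "model", "SELECT id, name FROM model ORDER BY id")
uses_engine = associations_to_xtm(conn, "uses-engine", ("model", "engine"),
                                  "SELECT id, engine_id FROM model ORDER BY id")
print(ET.tostring(uses_engine, encoding="unicode"))
```

Writing each document to its own file, for example with ET.ElementTree(tm).write(path), would reproduce the one-file-per-type layout of the first prototype.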

5. Results and Observations

The results of the investigation are summarised below by referring to the questions asked in Section 3, “Questions Investigated”.

Note

Work on the thesis took place from June to December 2001. Statements made in this section about the XTM–standard and about commercial applications represent findings we made in that timeframe and may be outdated! In particular the product evaluation and selection took place in the first 8 weeks of the thesis work.

Implementation of the prototypes was based on the Ontopia Knowledge Suite 1.1.1. The product was selected following a series of tests which Mario designed to test those product properties that were essential to perform his investigation. Of the 3 products tested, it was the only product which offered all required properties. The 3 products tested were shortlisted from a larger selection based on product descriptions and other available information.

  • Suitability of concept

    This question can clearly be answered in the affirmative. Conceptually, a TM and a TM–Engine can be used to replace an RDBMS. Therefore, it is technically possible to implement all TIS–functionalities on a TM–Engine. (This does not imply practicality, though.)

  • Benefits

    Benefits and strong points of TMs include:

    • Easier integration of information stores. Fully automatic integration cannot be expected, but the merging of TMs can partially automate the process, and TMs provide a framework in which semantic conflicts can be resolved. In addition, the technical tool used to resolve the conflicts (the “merging TM”) also documents that resolution.

    • Content management is a very natural application for TMs, especially the management of XML content. Metadata and content can be well integrated, and powerful link management can be implemented with TMs.

    • Natural support for terminology management.

    • Knowledge contained in TMs for specialised systems is generally available and readable. XTMs can be used as vendor neutral format for storing “knowledge bases” of expert systems.

  • Stability and Completeness of Standard

    TMs are ISO standard 13250 and can be considered stable. XTM is now an appendix to ISO 13250 and can be considered reasonably stable, although this was not the case during the earlier phases of the thesis work.

    Some aspects, however, are not yet covered by standards:

    • Query languages. There are several proposed query languages (“TOLOG”, “TMQL”) which are not part of the standard and are therefore not implemented by all, or even most, TM–Engines. Standardisation is in progress.

    • Constraints or integrity conditions. The standard does not provide any means by which the designer of the TM–template can put constraints on the topic map itself. Standardisation is in progress.

    • There is no standard API to query TMs. Therefore, application implementations are product specific.

  • Stability and Suitability of Software

    The prototype development was done using Ontopia Knowledge Suite 1.1.1–1.2.3, so statements can only be made about this Engine. It is our opinion, however, based on the results of the preliminary product evaluation, that the results reflect the state of TM–technology in general and that with other products the overall impression would not have been better.

    For read-only access, the TM–Engine operated reliably and stably and gave correct results.

    However, there were problems when merging two TMs both stored in RDBMS which made the merging implementation unsuitable for the intended use (merging huge TMs). Also, the import of large XTM files was problematic. For those reasons, not all of the intended parts of the evaluation could be done.

    According to Ontopia, the first of these problems was due to a bug that has now been fixed. The second is an “out of memory” issue that is being addressed in release 1.3.1 of the OKS.

  • Performance, Scalability and System Sizing

    Response times of the demo applications were quite high and are not acceptable for a productive information system. However, the demo does not give response times under realistic operating conditions for two reasons:

    1. For the demo applications, client and server were running on the same hardware (a typical client PC). Also, application server and RDBMS were not selected for performance, but for cost.

    2. No performance tuning took place. According to Ontopia, significant performance increases can be gained from proper tuning.

    The TM–based system was considerably slower than a conventional TIS (reference application) showing similar functionality, which ran on the same hardware platform, and for which no (database-)performance tuning had taken place.

    A special problem is that navigating to topics with a large number of associations appears to take much more time than navigating to topics with few associations. Topics with large numbers of associations are typical “information hubs”, which are also used frequently. This makes the most important parts of the information the least accessible.

    According to Ontopia, using the TOLOG query language in conjunction with the RDBMS backend greatly improves performance in cases like this, since it allows the system to perform a single SQL query instead of one query for each associated topic. However, this functionality was not available at the time of our evaluation.

    The system does not scale to arbitrarily large TMs. The prototypes were based on a real-life TIS which is currently in its pilot phase and contains a limited scope of information; this system was chosen as a pattern precisely for that reason. In prototype development, however, the scope of information had to be restricted further because of capacity limitations.

  • Additional observations (TM creation)

    This observation is not specific to the selected software but applies to the market in general.

    There are only a few tools for creating TMs directly, and they are not easy to use. It appears that assigning the task of TM maintenance to a non-technical person is not feasible with the existing generic tools.
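The performance remark about topics with many associations, where navigating through an API issues one query per associated topic while a query language lets the engine issue a single SQL query, can be illustrated with a small sketch. The two-table schema below is hypothetical and is not the storage schema of the OKS or of any other TM–Engine; the sketch counts queries rather than measuring time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member (assoc_id INTEGER, topic_id INTEGER);
""")
# Hypothetical information hub: topic 1 takes part in 500 binary associations.
conn.execute("INSERT INTO topic VALUES (1, 'Engine XXX')")
for i in range(500):
    model_id = 1000 + i
    conn.execute("INSERT INTO topic VALUES (?, ?)", (model_id, f"Model {i}"))
    conn.executemany("INSERT INTO member VALUES (?, ?)", [(i, 1), (i, model_id)])

query_count = 0


def run(sql, params=()):
    """Execute one SQL statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def neighbours_navigational(topic_id):
    """Navigational access: one query for the associations, then one per association."""
    names = []
    for (assoc_id,) in run("SELECT assoc_id FROM member WHERE topic_id = ?", (topic_id,)):
        names += [n for (n,) in run(
            "SELECT t.name FROM member m JOIN topic t ON t.id = m.topic_id "
            "WHERE m.assoc_id = ? AND m.topic_id <> ?", (assoc_id, topic_id))]
    return sorted(names)


def neighbours_single_query(topic_id):
    """Set-oriented access: one self-join answers the whole request."""
    rows = run(
        "SELECT t.name FROM member hub "
        "JOIN member peer ON peer.assoc_id = hub.assoc_id AND peer.topic_id <> hub.topic_id "
        "JOIN topic t ON t.id = peer.topic_id "
        "WHERE hub.topic_id = ?", (topic_id,))
    return sorted(n for (n,) in rows)


query_count = 0
navigational = neighbours_navigational(1)
navigational_queries = query_count

query_count = 0
single = neighbours_single_query(1)
single_queries = query_count

print(navigational_queries, single_queries, navigational == single)  # prints: 501 1 True
```

The number of queries in the navigational variant grows with the number of associations, which is why hub topics suffer most; the single query returns the same result in one round trip.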

Acknowledgements

This paper is based on the diploma dissertation of Mario Klesse, which was written under the supervision of the author. Most of the observations made and results presented in this paper are Mario's work and in fact Mario should be the one who presents the results at XML Europe 2002, but at this time, we do not know whether he will be able to attend the conference.

Mario received a lot of support from the topic map community during the time he worked on the thesis, and we both wish to thank all those people, especially the people at Ontopia, for their support.

Bibliography

[Klesse 2002] Mario Klesse: XML-Topic Maps als Grundlage für Technische Informationssysteme, Diplomarbeit, Studiengang Medizinische Informatik, Universität Heidelberg/Fachhochschule Heilbronn. Not available to the public.

Glossary

API

Application Programming Interface

EAI

Enterprise Application Integration

ERD

Entity-Relationship-Diagram

IT

Information Technologies

PSI

Public Subject Identifier

TIS

Technical Information System

TM

Topic Map

UML

Unified Modelling Language

XTM

XML Topic Map

Biography

Senior Consultant
Consulting
Hewlett-Packard-Straße 1

Dr. Oliver Bonten works as a senior consultant in Hewlett-Packard Consulting's XML Competence Centre. He has been involved as a project manager, system architect and, in his younger days, technical consultant in a number of projects to implement Content Management for Technical Information Systems and Legal Publishing. Among his clients are European and Asian car manufacturers, a pharmaceutical company and a legal publisher. His special interests include publishing for print media. He has been working with SGML and later XML since 1993. He holds a Ph. D. in mathematics from RWTH Aachen.