XML Europe 2003

Topic Maps: Business Objects in Disguise

Abstract

Topic Maps are an important component of a layered information and knowledge management architecture, and their use in the enterprise continues to grow. In these systems Topic Maps play two roles. The first is as a layer over disparate data sources, allowing efficient and powerful navigation and knowledge transfer. The second is as a first-class business object, requiring the same sophisticated management as every other business object in the system: life-cycle, configuration management, transaction semantics, etc.

This paper addresses some of the additional needs that arise when Topic Maps are managed as first-class business objects. Most of these additional complexities result from the application of Topic Maps to information objects. For example, the semantics of merging are complex and specific to the business process.

One typical approach to merging is to create a new Topic Map that is completely independent of the source Topic Maps. This is insufficient for many use cases. The correct behavior may be to make merged Topic Maps depend on their inputs and to respond appropriately to life-cycle changes in those input Topic Maps. This example and others are explored to frame more clearly the infrastructure and support needed to manage Topic Map business objects in the enterprise.


Table of Contents

1. Introduction
1.1. Topic Maps
1.2. Business Objects
2. Topic Maps As Business Objects
2.1. Life-Cycle
3. Scenarios
3.1. Versioned TopicMap, Versioned Occurrences
3.2. Versioned TopicMaps, Versioned Merging
3.2.1. Do Nothing.
3.2.2. Scoped Merging
3.2.3. Maintain merging information.
4. Summary
5. Further Study
Bibliography
Glossary
Biography

1. Introduction

Topic Maps and their supporting infrastructure are quickly achieving the level of maturity needed to make them useful as part of the basic information management toolkit. With increasing vendor support, standardization activities, and interest in the field of Knowledge Representation and Interchange, it is clear that Topic Maps are here to stay. Work in the field has so far focused primarily on Topic Maps as a navigation and knowledge representation layer on top of an enterprise's information resources. This arguably provides many benefits, such as knowledge traversal, thesauri, and ontologies.

Much productive work has been done on developing an infrastructure to support these layer-based applications. These are what I will term first generation Topic Map applications: the first applications that put the Topic Map paradigm to good use. As these applications succeed, and the use of Topic Maps in the enterprise continues to grow, use cases will arise in which a layer view of Topic Maps alone is not sufficient. Topic Maps will need to be viewed as First Class Business Objects (FCBOs) themselves, with all of the management that this implies. The knowledge representation layer will need to work together with the business rules that govern the management of those Topic Maps to provide additional value to the enterprise and to satisfy these second generation use cases.

This paper will proceed in the following fashion. We will provide some brief background on Topic Maps and Business Objects in general and will examine what it means to unify these concepts. We will focus on the life-cycle aspect of business objects in particular, using several different scenarios to identify and illustrate issues and frame solutions. We will conclude with a brief summary and identify areas of further study.

Note

All Topic Map examples are given in XTM [XTM] syntax. I will also continue to use the non-compliant syntax for xlinks that the XTM specification uses.

1.1. Topic Maps

Topic Maps are a paradigm designed to address many of the information and knowledge management issues facing the enterprise today. The fundamental concepts behind Topic Maps are not terribly complex, yet the constructs of the Topic Map paradigm allow for an effective implementation of a knowledge-rich layer in the enterprise.

Topic Maps are used to address a variety of problems. Amongst them are:

Information Indexing, etc.

One of the first envisioned applications of Topic Maps was as a superpowered back-of-the-book index. The first use enumerated in the HyTime Architecture Based Syntax for Topic Maps is "To qualify the content and/or data contained in information objects as topics to enable navigation tools such as indexes, cross-references, citation systems, or glossaries."[HyTM](1)

Knowledge Management

More recent excitement over the power of Topic Maps has focused on the use of the paradigm to represent ontologies of varying complexity, and provide more knowledge-based traversal functionality, although this is discussed in HyTM as well [HyTM]. The general idea is one of enabling additional traversal and relationship information amongst disparate information resources to be captured in a syntactic form external to the resources themselves. If a document is authored in XML one can enable linking by any number of means, but capturing the relationships amongst legacy data is enabled by expressing it in a standard, machine readable syntax external to the data itself. This has the additional effect of allowing different perspectives over the same data set to be presented.

For additional information about Topic Maps and their uses, see [COVER].

1.2. Business Objects

The term Business Object is overloaded. For the purpose of this paper, we take business objects to be the essential components necessary to support a business, and we use the following definition from the International Telecommunication Union Business Object Summit [ITU-BOS]:

Business Object

  • Represent persons, places, things or concepts in a business domain.

  • Package business procedures, policy and constraints around business data.

  • Independent of applications.

  • Sharable across industries.

  • Reusable.

For the purpose of this paper we will focus primarily on Topic Maps as business objects in the sense of the first two items, with emphasis on the life-cycle issues they raise.

2. Topic Maps As Business Objects

Topic Maps have been viewed as belonging to two disjoint sets of business entities: individual entities that require business-specific management (separate from any topic-related processes), and providers of the first generation Topic Map layers. This paper seeks to identify the complexities that arise when these two usages are combined: Topic Maps that actively serve the needs of first generation applications, but that also require additional business processes as individual entities, versioning in particular.

What sort of additional processes are we talking about when we discuss Topic Maps as First Class Business Objects (FCBOs)? This paper focuses on one of the most critical: life-cycle. The key point is not that these processes are applied to Topic Maps, but that they are applied to Topic Maps while those Topic Maps fulfill their roles in first generation applications.

2.1. Life-Cycle

Versioning is nothing new to the field of information technologies, but when versioning is viewed with the jaundiced eye of one who has dealt extensively with hyperdocuments, it is recognized as a hard problem to solve. Much has been done to provide conceptual models for addressing the complexities that arise, but there are some additional complexities that are specific to the Topic Map field. A concise model for expressing versioned hyperdocuments can be found at [VHD]. This model builds upon the more foundational work found in Snapshot-Based Configuration Management [SnapCM]. SnapCM provides an abstract versioning model that is robust enough to support use cases involving hyperdocuments, including Topic Maps.

3. Scenarios

In order to facilitate the discussion of these issues, several individual use cases will be identified, the issues they raise will be enumerated, and possible solutions will be identified.

3.1. Versioned TopicMap, Versioned Occurrences

In this use-case, we are interested in the interaction between a Topic Map, as a Business Object with life-cycle, and the occurrences referenced in the Topic Map as business objects with life-cycle themselves. This scenario implies that the implementing engine is version-aware, and that both the Topic Map and Occurrences participate in the business defined life-cycle process.

Note

This does not imply that the occurrences need be managed by whatever CMS manages life-cycle, only that the implementing engine be aware of that life-cycle.

This scenario can be addressed in the same way as the versioning of any hyperdocument. The issues involved are given a thorough treatment in [VHD] and [SnapCM].

There are several different models which provide a sufficient view of Topic Map as Versioned Hyperdocument, but the simplest one is shown below.

Figure 1. 

Drawing from the legal domain for an example, assume we have the following:

  1. A set of legally binding contracts.

    These documents are in various stages of legal status. Some are in a draft stage, some have already been approved and finalized, but must be revised due to change in the business relationship.

  2. A Topic Map that provides a knowledge layer on top of the contractual agreements among other things. This is used by the legal department to provide traversal and relationship discovery. It is a business requirement that the Topic Map only reference occurrences that are part of documents that have a legally binding status.

  3. Different versions of the draft are developed on parallel branches (one per legal team, such as environmental or corporate), and are periodically synchronized and merged onto a main branch. The point is that the versioning scheme is more sophisticated than simple linear versioning.

  4. It is desirable to be able to navigate from an occurrence in a Topic Map to the information resource that it references.

At this point there are two feasible options.

  1. Encode the logic of version resolution in the Topic Map itself.

  2. Use version aware references to resolve occurrences at request time.

Example: versioning internal to Topic Map

	
	<topic id="bill1453">
	  <instanceOf><topicRef xlink:href="#bill"/></instanceOf>
	  <baseName>
	    <baseNameString>House Bill 1453</baseNameString>
	  </baseName>
	  <occurrence>
	    <scope>
	      <topicRef xlink:href="#version1"/>
	    </scope>
	    <resourceRef
	      xlink:href="http://www.isogen.com/resources/bill1.txt"/>
	  </occurrence>
	  <occurrence>
	    <scope>
	      <topicRef xlink:href="#version2"/>
	    </scope>
	    <resourceRef
	      xlink:href="http://www.isogen.com/resources/bill2.txt"/>
	  </occurrence>
	  <occurrence>
	    <scope>
	      <topicRef xlink:href="#latestversion"/>
	    </scope>
	    <resourceRef
	      xlink:href="http://www.isogen.com/resources/bill2.txt"/>
	  </occurrence>
	</topic>

or ...
	
	<topic id="bill1453">
	  <instanceOf><topicRef xlink:href="#resource"/></instanceOf>
	  <baseName>
	    <baseNameString>House Bill 1453</baseNameString>
	  </baseName>
	</topic>

	<topic id="xgf1234">
	  <instanceOf><topicRef xlink:href="#version"/></instanceOf>
	  <baseName>
	    <baseNameString>xgf1234</baseNameString>
	  </baseName>
	  <occurrence>
	    <resourceRef
	      xlink:href="http://www.isogen.com/resources/bill1.txt"/>
	  </occurrence>
	</topic>

	<topic id="xgf1235">
	  <instanceOf><topicRef xlink:href="#version"/></instanceOf>
	  <baseName>
	    <baseNameString>xgf1235</baseNameString>
	  </baseName>
	  <occurrence>
	    <resourceRef
	      xlink:href="http://www.isogen.com/resources/bill2.txt"/>
	  </occurrence>
	</topic>

	<association id="xgf1236">
	  <instanceOf>
	    <topicRef xlink:href="#version-of"/>
	  </instanceOf>
	  <member>
	    <roleSpec>
	      <topicRef xlink:href="#resource"/>
	    </roleSpec>
	    <topicRef xlink:href="#bill1453"/>
	  </member>
	  <member>
	    <roleSpec>
	      <topicRef xlink:href="#version"/>
	    </roleSpec>
	    <topicRef xlink:href="#xgf1235"/>
	  </member>
	</association>
	
	<association id="xgf1237">
	  <instanceOf>
	    <topicRef xlink:href="#next-previous"/>
	  </instanceOf>
	  <member>
	    <roleSpec>
	      <topicRef xlink:href="#previous"/>
	    </roleSpec>
	    <topicRef xlink:href="#xgf1234"/>
	  </member>
	  <member>
	    <roleSpec>
	      <topicRef xlink:href="#next"/>
	    </roleSpec>
	    <topicRef xlink:href="#xgf1235"/>
	  </member>
	</association>

	  

Example: Version ignorant Topic Map

	   
	<topic id="bill1453">
	  <instanceOf><topicRef xlink:href="#bill"/></instanceOf>
	  <baseName>
	    <baseNameString>House Bill 1453</baseNameString>
	  </baseName>
	  <occurrence>
	    <resourceRef
	      xlink:href="urn:version:link/target"/>
	  </occurrence>
	</topic>

	

Which is more suitable depends on the specifics of the use case. The second option provides far greater flexibility when viewing the Topic Map over time. It allows occurrence references to use not only predefined resolution policies, but essentially any user-defined policy. One example of a user-defined policy would be a reference that, upon traversal, causes its relative strength to be reflected in a user interface (i.e., whatever navigation interface the Topic Map traversal provides would give emphasis to the occurrences that are used most often). The policies would also be a natural place to enforce security restrictions. Due to the nature of the SnapCM model, the possibilities are limited only by your imagination.

The second option also makes the Topic Map itself far more stable. Under the first option, the Topic Map would have to change any time any of its referenced occurrences was updated. If the Topic Map provided mission-critical navigation, these updates would have to be transactionally secure, with poor performance implications.
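To make request-time resolution concrete, the following is a minimal Python sketch and not the SnapCM API. The `Version`, `VersionResolver`, `latest`, and `latest_binding` names are hypothetical, as are the "binding" and "draft" statuses assigned to the two example resources. It assumes the Topic Map stores only a stable URN, and that a separate resolver holds the version history and applies a policy when the occurrence is traversed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One version of a managed resource (all fields are illustrative)."""
    url: str
    number: int
    status: str  # assumed values: "draft" or "binding"


# A policy picks one version from a resource's history, or None.
Policy = Callable[[List[Version]], Optional[Version]]


def latest(history: List[Version]) -> Optional[Version]:
    """Resolve to the highest-numbered version, whatever its status."""
    return max(history, key=lambda v: v.number, default=None)


def latest_binding(history: List[Version]) -> Optional[Version]:
    """Resolve to the newest version that is legally binding."""
    binding = [v for v in history if v.status == "binding"]
    return max(binding, key=lambda v: v.number, default=None)


class VersionResolver:
    """Maps stable reference URNs to concrete URLs at request time."""

    def __init__(self) -> None:
        self._histories: Dict[str, List[Version]] = {}

    def register(self, urn: str, version: Version) -> None:
        self._histories.setdefault(urn, []).append(version)

    def resolve(self, urn: str, policy: Policy) -> Optional[str]:
        chosen = policy(self._histories.get(urn, []))
        return chosen.url if chosen else None


# The Topic Map only ever contains this URN; statuses are assumed.
urn = "urn:version:link/target"
resolver = VersionResolver()
resolver.register(
    urn, Version("http://www.isogen.com/resources/bill1.txt", 1, "binding"))
resolver.register(
    urn, Version("http://www.isogen.com/resources/bill2.txt", 2, "draft"))

print(resolver.resolve(urn, latest))
print(resolver.resolve(urn, latest_binding))
```

Because the policy is an ordinary function, a usage-weighting or access-control policy could be swapped in without touching the Topic Map.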

The first case does raise the possibility of using a Topic Map (most likely constrained by TMQL) as the equivalent to a database dump of all of the versioning information not only for the Topic Map itself, but for all of the resources in the system.
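As a rough illustration of reading versioning information back out of a Topic Map that encodes it internally, the Python sketch below parses a fragment modelled on the scoped-occurrence example. It makes two assumptions: the fragment is wrapped in a `topicMap` element that declares the xlink namespace, and, like the examples in this paper, it uses no default namespace (a complete XTM 1.0 document would need namespace-qualified element names). The `occurrences_by_scope` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Fully qualified name of the xlink:href attribute as ElementTree sees it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

XTM_FRAGMENT = """
<topicMap xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="bill1453">
    <occurrence>
      <scope><topicRef xlink:href="#version1"/></scope>
      <resourceRef xlink:href="http://www.isogen.com/resources/bill1.txt"/>
    </occurrence>
    <occurrence>
      <scope><topicRef xlink:href="#version2"/></scope>
      <resourceRef xlink:href="http://www.isogen.com/resources/bill2.txt"/>
    </occurrence>
    <occurrence>
      <scope><topicRef xlink:href="#latestversion"/></scope>
      <resourceRef xlink:href="http://www.isogen.com/resources/bill2.txt"/>
    </occurrence>
  </topic>
</topicMap>
"""


def occurrences_by_scope(xtm_text: str, topic_id: str) -> Dict[str, str]:
    """Return {scoping topic id: resource URL} for one topic."""
    root = ET.fromstring(xtm_text)
    result: Dict[str, str] = {}
    for topic in root.iter("topic"):
        if topic.get("id") != topic_id:
            continue
        for occurrence in topic.findall("occurrence"):
            target = occurrence.find("resourceRef")
            if target is None:
                continue  # skip occurrences without a resource reference
            for ref in occurrence.findall("scope/topicRef"):
                scope_id = ref.get(XLINK_HREF, "").lstrip("#")
                result[scope_id] = target.get(XLINK_HREF, "")
    return result


versions = occurrences_by_scope(XTM_FRAGMENT, "bill1453")
for scope_id, url in versions.items():
    print(scope_id, "->", url)
```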

Note

This example only provides for occurrences that reference a single information resource. A more robust solution would be to use Referent Tracking Documents (RTDs) to provide indirection when linking into those information resources [RTD].

3.2. Versioned TopicMaps, Versioned Merging

Focusing exclusively on merging Topic Maps raises its own unique set of issues. Merging has always been a somewhat grey area in the various Topic Map standards, which allow individual applications great flexibility in determining how things are merged. All the specification has to say is that topics which refer to the same subject are the same topic and must be merged. Beyond that, an almost frustrating amount of leeway is left to implementing applications. We shall proceed using solely the premise that things are merged.

XTM [XTM] has this to say: []

The semantics of merging make this a unique situation that versioned hyperdocuments alone cannot address. The fundamental problem is that the post-merge Topic Map space is not set-inclusive of the subordinate topic spaces: information has been lost in the merge, namely which topics were in which spaces before the merge. This is what the second scenario seeks to remedy. Through the use of a life-cycle model that can handle rich configuration management demands, the Topic Map merge can be made lossless.

Here is the canonical use case that exhibits this problem. All of the Topic Maps in this scenario are managed as business objects. The company is divided into logical units (e.g., manufacturing, marketing, etc.). Each unit maintains its own Topic Map over its corpus of information, and each of these Topic Maps uses a corporate-level ontological Topic Map to supply its foundational topics and characteristics. This also provides consistency and useful merging amongst the division-level Topic Maps. The division-level Topic Maps provide division-level scoping, with the intent that together they will provide a corporate-wide knowledge traversal and discovery layer. The desired behavior is that the corporate Topic Map represents the semantic merging, but not necessarily the physical merging, of the component Topic Maps. Yet under the current standards, physical creation of a new corporate-wide Topic Map is all that is available.
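The loss of information can be seen in a deliberately simplified Python sketch. It assumes a toy model in which a Topic Map is just a mapping from subject identifier to a set of names, and the subject identifier and names are made up for illustration. The `merge` function is a stand-in for subject-based merging, not an implementation of the XTM merging rules.

```python
from typing import Dict, Set

# Toy model (an assumption): subject identifier -> set of base names.
TopicMap = Dict[str, Set[str]]


def merge(*maps: TopicMap) -> TopicMap:
    """Merge topics that share a subject by taking the union of their names."""
    merged: TopicMap = {}
    for topic_map in maps:
        for subject, names in topic_map.items():
            merged.setdefault(subject, set()).update(names)
    return merged


WIDGET = "urn:x-example:widget"  # hypothetical subject identifier

# One possible pair of source Topic Maps ...
manufacturing = {WIDGET: {"Widget"}}
marketing = {WIDGET: {"SuperWidget"}}

# ... and a different pair in which the names come from the other side.
manufacturing_alt = {WIDGET: {"SuperWidget"}}
marketing_alt = {WIDGET: {"Widget"}}

first = merge(manufacturing, marketing)
second = merge(manufacturing_alt, marketing_alt)

# Both configurations produce the same merged map, so the merged map alone
# cannot tell us which source contributed which name.
print(first == second)
```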

Depending on the dominant business processes there are three ways we can deal with merging in the context of enterprise life-cycle.

  • Do nothing.

  • Use scoped merging.

  • Maintain merging information externally.

3.2.1. Do Nothing.

Doing nothing means accepting the simplest concept of a merge for the Topic Maps we are concerned with. For the above scenario, this implies that the desired corporate-wide Topic Map has to be recreated by a merge of all of the subordinate Topic Maps every time there is a change. As the simplest option, it has the advantage of being easy to comprehend and easy to implement. In smaller setups, using an implementation with good performance characteristics, this could be a viable option, but it would not perform well in a large deployment.

3.2.2. Scoped Merging

This involves defining a subject indicator or topicRef that scopes all of the elements from a subordinate Topic Map. This could have the effect of preventing any merging, which would most likely not be desired, but additional processing could be applied to perform a 'virtual' merge. This would be a rudimentary means of maintaining the merging information, and it would allow for simple unmerging: all of the components scoped to a given source would simply be deleted.
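A minimal Python sketch of this idea follows, under the assumption that each name carries a scope identifying the Topic Map it came from. The `ScopedName` type, the function names, and the sample subjects are all hypothetical. The physical store never merges anything. The "virtual" merge is a view computed by ignoring scope, and unmerging is a filter on scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ScopedName:
    subject: str  # subject identifier of the topic
    name: str     # a base name of that topic
    scope: str    # topic standing for the source Topic Map


def scoped_merge(sources: Dict[str, Dict[str, Set[str]]]) -> List[ScopedName]:
    """Copy every name into one store, scoped by the map it came from."""
    return [
        ScopedName(subject, name, scope)
        for scope, topic_map in sources.items()
        for subject, names in topic_map.items()
        for name in sorted(names)
    ]


def virtual_view(items: List[ScopedName]) -> Dict[str, Set[str]]:
    """Additional processing: merge by subject while ignoring scope."""
    view: Dict[str, Set[str]] = {}
    for item in items:
        view.setdefault(item.subject, set()).add(item.name)
    return view


def unmerge(items: List[ScopedName], scope: str) -> List[ScopedName]:
    """Remove one source by deleting everything carrying its scope."""
    return [item for item in items if item.scope != scope]


WIDGET = "urn:x-example:widget"      # hypothetical subject identifiers
CAMPAIGN = "urn:x-example:campaign"

store = scoped_merge({
    "manufacturing": {WIDGET: {"Widget"}},
    "marketing": {WIDGET: {"SuperWidget"}, CAMPAIGN: {"Spring Campaign"}},
})
print(virtual_view(store))
print(virtual_view(unmerge(store, "marketing")))
```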

3.2.3. Maintain merging information.

The new Topic Map is represented by an application that externally maintains information about what has gone into the merge. Ideally the merging would be done through a SnapCM Reference to the source Topic Maps, with the resultant Topic Map being a virtual Topic Map. This would allow all of the power of policies to be applied to the merge as well as to the components that go into it. Most of the complexity that this implies lies in the implementation of the virtual merge.
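One way to picture a virtual Topic Map is the Python sketch below. It is only an illustration: the `Reference` callable is a loose stand-in for a SnapCM Reference rather than its actual interface, and `VersionedSource` and `VirtualTopicMap` are hypothetical names. The merged view holds references to its sources and resolves them on each query, so a new version of a source shows up without the merged Topic Map being recreated, and the source of every name remains known.

```python
from typing import Callable, Dict, List, Set

# Toy model (an assumption): subject identifier -> set of base names.
TopicMap = Dict[str, Set[str]]

# Stand-in for a version-aware reference: returns the selected version.
Reference = Callable[[], TopicMap]


class VersionedSource:
    """A source Topic Map with a simple linear version history."""

    def __init__(self, initial: TopicMap) -> None:
        self._versions: List[TopicMap] = [initial]

    def commit(self, topic_map: TopicMap) -> None:
        self._versions.append(topic_map)

    def latest(self) -> TopicMap:
        return self._versions[-1]


class VirtualTopicMap:
    """Merged view that keeps references to its sources, not copies."""

    def __init__(self, references: Dict[str, Reference]) -> None:
        self._references = dict(references)

    def names(self, subject: str) -> Dict[str, Set[str]]:
        """Return {name: sources supplying it}, resolved at query time."""
        found: Dict[str, Set[str]] = {}
        for source, resolve in self._references.items():
            for name in resolve().get(subject, set()):
                found.setdefault(name, set()).add(source)
        return found


WIDGET = "urn:x-example:widget"  # hypothetical subject identifier

manufacturing = VersionedSource({WIDGET: {"Widget"}})
marketing = VersionedSource({WIDGET: {"SuperWidget"}})
corporate = VirtualTopicMap({
    "manufacturing": manufacturing.latest,
    "marketing": marketing.latest,
})

before = corporate.names(WIDGET)
marketing.commit({WIDGET: {"MegaWidget"}})  # life-cycle change in a source
after = corporate.names(WIDGET)
print(before)
print(after)
```

A different policy, such as "latest approved version", could be substituted for `latest` on a per-source basis.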

4. Summary

Topic Maps themselves are designed to solve a large class of problems. As they are further integrated into the enterprise, much thought must be given to how they play amongst their Business Object peers. This paper has focused on the interplay in relation to life-cycle in particular. Hopefully it has framed and stimulated some of the dialogue to come.

5. Further Study

This has not been an exhaustive study of all the complicated facets that arise from the interplay of Topic Map business objects (TMBOs) and their environment. There are additional questions about performance in the propagation of changes throughout the system. Access control in the context of merging, as well as occurrence traversal, also needs to be addressed.

Within the context of life-cycle in particular, the work this paper has begun should continue to be investigated. This will provide a more granular view of the details involved, as well as delve into the realm of implementation implications.

Bibliography

[HyTM] ISO/IEC 13250, Topic Maps (Second Edition) available online at http://www.y12.doe.gov/sgml/sc34/document/0322.htm

[ITU-BOS] International Telecommunication Union Business Object Summit Proceedings. http://www.itu.int/ITU-T/e-business/bos/index.html

[COVER] The XML Cover Pages: Topic Maps Section at http://xml.coverpages.org/topicMaps.html

[RTD] W. Eliot Kimber, Peter Newcomb, Steve Newcomb. Version Management as Hypertext Application: Referent Tracking Documents. Online from http://www.isogen.com/papers/ref-track-docs-paper.pdf.

[SnapCM] John D. Heintz, Joshua Reynolds. SnapCM: Abstract Model. Online from http://www.isogen.com/papers/snapCM.pdf

[VHD] Versioned Hyperdocuments: Abstract Model available online at http://www.idealliance.org/papers/extreme02/html/2002/Heintz01/EML2002Heintz01.html

[UML] Unified Modeling Language (UML) http://www.omg.org/uml/

[XTM] XML Topic Maps (XTM) Version 1.0, http://www.topicmaps.org/xtm/1.0/

Glossary

FCBO

First Class Business Objects

Biography

Josh has a solid background in hard-core mathematics, extensive experience addressing content management problems in a highly versioned/linked problem domain, and loves dealing the complexities that arise the savage intellectual beating they so richly deserve. When addressing any challenges, his goals are to surmount them using a solid extensible architecture and implement any solutions using test-driven development and whatever tools are appropriate. His tool-set includes UML, CORBA, Pattern Based Design, XML/SGML, Java, C++, and Python. Josh also enjoys coffee, staying fit, and flying.