Abstract
Topic Maps and their supporting infrastructure are rapidly reaching the level of maturity needed to make them part of the basic information management toolkit. With growing vendor support, ongoing standardization activities, and interest from the field of Knowledge Representation and Interchange, it seems clear that Topic Maps are here to stay. Unfortunately, none of this progress and interest eases the formidable task of authoring Topic Maps.
Our experience indicates that XSLT works well for generating Topic Maps from sets of XML resources. Markup, through its design and implementation, frequently captures a good deal of semantic information, making it a natural candidate for knowledge extraction. When those marked-up resources conform to a known schema (DTD, RELAX NG, XSD, or even just an in-house convention), there are essentially two ways of extracting that knowledge into a Topic Map. The first is hand authoring: reading the document, using human reasoning to interpret the markup and its content, and then creating the Topic Map from this information. The second is to use the schema itself: by applying knowledge extraction techniques to the schema, we can apply the same logic across an arbitrarily large set of conforming documents. Because markup is easily machine processed, capturing this reasoning in algorithmic form is clearly desirable, and since the transformation goes from markup (XML) to markup (XTM), XSLT is the prime candidate for expressing the algorithm. Topic Map merging then enables these generated XTM documents to be combined with topical information that cannot be extracted by a stylesheet. Although hand authoring allows for more precision, the schema-driven approach costs far less, both in initial effort and in maintenance, since only the stylesheet must be authored and maintained.
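The paper expresses its extraction rules in XSLT. As an illustration of the same schema-driven idea, the following minimal sketch uses Python's standard `xml.etree.ElementTree` instead. The element names (`title`, `author`), the `RULES` mapping, the generated topic ids, and the `ontology.xtm` file name are all hypothetical; the point is only that one fixed set of rules, written once against the schema, is applied unchanged to every conforming document to emit XTM 1.0 `topic` elements typed by the ontology.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)

# Hypothetical rules derived from the schema: element name -> topic type id
# defined in the hand-authored ontology map. Unlisted elements are ignored.
RULES = {"title": "document", "author": "person"}


def xtm(tag: str) -> str:
    """Qualify a tag name with the XTM 1.0 namespace."""
    return f"{{{XTM_NS}}}{tag}"


def to_xtm(doc: ET.Element) -> ET.Element:
    """Apply the same element-to-topic rules to any conforming document."""
    tm = ET.Element(xtm("topicMap"))
    for i, elem in enumerate(doc.iter()):
        topic_type = RULES.get(elem.tag)
        text = (elem.text or "").strip()
        if topic_type is None or not text:
            continue
        # One topic per matching element, typed by a reference into the ontology.
        topic = ET.SubElement(tm, xtm("topic"), id=f"t{i}")
        instance_of = ET.SubElement(topic, xtm("instanceOf"))
        ET.SubElement(instance_of, xtm("topicRef"),
                      {f"{{{XLINK_NS}}}href": f"ontology.xtm#{topic_type}"})
        # The element's text content becomes the topic's base name.
        base_name = ET.SubElement(topic, xtm("baseName"))
        ET.SubElement(base_name, xtm("baseNameString")).text = text
    return tm


if __name__ == "__main__":
    sample = ET.fromstring(
        "<paper><title>Topic Maps</title><author>Example Author</author></paper>"
    )
    print(ET.tostring(to_xtm(sample), encoding="unicode"))
```

Because the rules depend only on the schema, adding more conforming documents costs nothing beyond running the same transformation again, which is the cost advantage the abstract describes.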
This paper presents a case study illustrating how to ease the task of Topic Map creation through a multi-stage, modular process. The first stage is hand authoring a relatively invariant "ontology" Topic Map, which defines the types and associations that capture the data model of a particular subject domain. The assumption is that this ontology will remain relatively stable over time and is a good candidate for reuse. The second stage generates additional Topic Maps by applying an algorithmic process (XSLT) to XML document instances. The third stage is hand authoring whatever the first two stages do not capture: information that is not directly discernible from the markup, or that is stored in non-XML resources. Merging the resulting Topic Maps yields a Topic Map that can be as rich as one authored entirely by hand. We present the source documents and code (stylesheets) used in an exploratory implementation of this approach, and outline a more general form of the methodology. We finish by identifying possible issues with this approach, enumerating alternatives, and stating the conclusions we drew from our exploration.
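The final merge of the three stages can be sketched in the same spirit. This is a deliberately simplified model, assuming hypothetical `Topic` and `TopicMap` classes and treating topics with equal ids as the same subject. Real XTM 1.0 merging is driven by subject identity and the topic naming constraint, so the sketch shows only the shape of combining an ontology map, a generated map, and a hand-authored map.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    names: set[str] = field(default_factory=set)
    types: set[str] = field(default_factory=set)


@dataclass
class TopicMap:
    topics: dict[str, Topic] = field(default_factory=dict)
    # Each association: (association type id, frozenset of (role id, player id)).
    associations: set[tuple[str, frozenset[tuple[str, str]]]] = field(default_factory=set)


def merge(*maps: TopicMap) -> TopicMap:
    """Merge topic maps, treating equal topic ids as the same subject.

    Simplification: XTM 1.0 merging uses subject identity and the topic
    naming constraint, not ids alone.
    """
    result = TopicMap()
    for tm in maps:
        for tid, topic in tm.topics.items():
            merged = result.topics.setdefault(tid, Topic())
            merged.names |= topic.names  # union of names from every source map
            merged.types |= topic.types  # union of types from every source map
        result.associations |= tm.associations
    return result


if __name__ == "__main__":
    ontology = TopicMap(topics={"person": Topic({"Person"}), "document": Topic({"Document"})})
    generated = TopicMap(
        topics={"doc1": Topic({"Topic Maps"}, {"document"}),
                "author1": Topic({"Example Author"}, {"person"})},
        associations={("written-by", frozenset({("work", "doc1"), ("author", "author1")}))},
    )
    hand_authored = TopicMap(topics={"author1": Topic({"E. Author"})})
    full = merge(ontology, generated, hand_authored)
    print(sorted(full.topics["author1"].names))
```

Keeping each stage as a separate map means the stable ontology can be reused unchanged while the generated and hand-authored maps evolve independently.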
Design & Development by deepX Ltd. 2002