XML 2002

Using DAML+OIL as a Constraint Language for Topic Maps

Abstract

Knowledge management has become the hot new buzzword. Standards such as topic maps and Resource Description Framework (RDF) have been created to allow knowledge to be managed and interchanged. In all the hype surrounding topic maps, one of the oft-mentioned applications is the ability to create community-defined taxonomies and ontologies. Taxonomy is a fancy word meaning "the ordering of things into groups or categories." An ontology is a description of a set of concepts and the relationships that can exist between those concepts. Topic maps provide much of what is needed to define a fairly robust taxonomy. However, some capabilities needed to build robust ontologies are still not part of the topic map standard.

Over the past year or so, the World Wide Web Consortium (W3C) and DARPA have been working to create a framework to model information ontologies contained on the Web. The result of that work is known as DARPA Agent Markup Language + Ontology Inference Layer (DAML+OIL). DAML+OIL provides a rich set of constructs, using RDF, to create ontologies and mark up information to be machine readable and processable. Some of the constructs are much more powerful than what is currently enabled by the topic map model. These include:

  • The ability to define not only subclass-superclass relationships but also disjoint relationships.

  • The ability to place restrictions on when specific relationships are applicable.

  • The ability to apply cardinality to relationships.

The topic map standard provides several features that RDF cannot match, especially in its association model and the ability to define scopes for information (even though there is still discussion about how scope should really work). While certain inferences about how objects are related can be derived from a topic map, DAML+OIL has extended RDF to do things topic maps can't. The constructs mentioned above would be very useful in topic maps to allow intelligent inferences about objects (whether or not they are topics or resources) and to accurately build a knowledge base from a set of information, be it a small corporate document repository or the entire Web.

This paper will demonstrate how DAML+OIL can be used to provide additional capabilities that are currently missing from the topic map model. It will discuss possible additions to the topic map model or its companion standard, Topic Map Constraint Language (TMCL), to enable DAML+OIL to process and enhance topic maps. It will also discuss methods for using DAML+OIL in conjunction with topic maps to take advantage of the best from both worlds.


Table of Contents

1. Example Application - The Family Tree
2. Building a Knowledge Representation System Using Topic Maps and DAML+OIL
2.1. Extending XTM
2.1.1. A NOTE ABOUT THE EXAMPLES
2.2. PSI
2.3. Class Hierarchies
2.4. Assigning properties to topics
3. CONCLUSION
Bibliography
Glossary
Biography

1. Example Application - The Family Tree

For illustration purposes throughout this paper, a genealogical chart (i.e., a family tree) will be used to explain the concepts presented. Family trees are used to express relationships between people, whereas topic maps, RDF and semantic networks are used to describe relationships between data items. By examining and compiling the relationships between the nodes of any of these networks, pieces of knowledge can be inferred. For example, in the diagram below, Eric, Becky and Dawn are siblings because they share the same parents. Keri and Olivia are cousins because their parents are siblings. Cara is Carmen's grandparent because Carmen's parent is Cara's child.

In topic map terms, each item within a box can be considered a topic. The names within the boxes can be considered unique identifier values and possibly base names. The horizontal lines going between the boxes represent marriage associations. The horizontal lines connecting boxes from above represent sibling associations. The vertical lines represent parent/child associations.

Figure 1. Genealogical Chart
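
The inferences described for the family tree can be sketched in a few lines of application code. The sketch below is illustrative only. The `PARENTS` table is a hypothetical fragment of the chart: "P1" and "P2" are placeholder names for the parents of Eric, Becky and Dawn, and the assumption that Becky is Keri's parent is made purely for the example.

```python
# Hypothetical fragment of the chart: child -> set of parents.
# "P1"/"P2" are placeholders; Becky being Keri's parent is an assumption.
PARENTS = {
    "Eric": {"P1", "P2"},
    "Becky": {"P1", "P2"},
    "Dawn": {"P1", "P2"},
    "Olivia": {"Eric", "Rita"},
    "Jordan": {"Eric", "Rita"},
    "Keri": {"Becky"},
}


def siblings(a, b):
    """Two different people are siblings if they share the same known parents."""
    return a != b and bool(PARENTS.get(a)) and PARENTS.get(a) == PARENTS.get(b)


def cousins(a, b):
    """Two people are cousins if a parent of one is a sibling of a parent of the other."""
    return any(
        siblings(pa, pb)
        for pa in PARENTS.get(a, ())
        for pb in PARENTS.get(b, ())
    )


def grandparents(person):
    """Grandparents are the known parents of a person's parents."""
    return {g for p in PARENTS.get(person, ()) for g in PARENTS.get(p, ())}
```

None of the derived facts (sibling, cousin, grandparent) is stored anywhere; each is computed from the parent/child links alone, which is the kind of inference the rest of this paper tries to enable for topic maps.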

2. Building a Knowledge Representation System Using Topic Maps and DAML+OIL

The XTM specification says almost nothing about validation and consistency of the information contained within a topic map. Instead, the conformance section focuses on the understanding of the defined constructs, the interchange syntax, and import and export of topic maps. It is often left to the developer of the topic map, be it human or computer, to make sure that the information contained within the topic map is "accurate".

Many would argue that the developer of a topic map needs some degree of system support when designing and creating a map that could potentially consist of millions of topics and associations. The question of consistency within the topic map becomes a key issue, because it is nearly impossible to check a map of this size manually.

XTM does a fairly good job in modeling class-instance hierarchies. It also is well suited for associating topics together and referencing information about topics through occurrences. However, there is currently no standardized mechanism in place that allows a topic map author to:

  • define exactly what types of topics are allowed to appear as members within specific types of association

  • define the number of times a topic can occur as a member of an association

  • define properties on associations that would allow them to be processed automatically into subject-predicate-object triples

  • define inferencing rules that allow triples to be created based on information within the topic map

Notice that I said "currently no standardized mechanism". There have been several papers published providing topic map based solutions for these items, but none of them has been implemented in a standardized, interchangeable fashion. There is also work being done by ISO to define TMCL. However, it is not known when this work will be completed. This presents a problem to those wishing to develop topic maps now.

One possible solution is to use DAML+OIL to extend the XTM model in all the ways mentioned above.

DAML+OIL is a semantic markup language for Web resources. It builds on W3C standards such as RDF and RDF Schema, and extends these languages with richer modelling primitives. DAML+OIL provides modelling primitives commonly found in frame-based languages. DAML+OIL uses values from XML Schema datatypes. DAML+OIL was built from the original DAML ontology language DAML-ONT (October 2000) in an effort to combine many of the language components of OIL. At its core, the language seems to have clean and well defined semantics.

2.1. Extending XTM

One of the first challenges that must be addressed is the closed model of the XTM specification. The XTM DTD does not contain any parameter entities that enable changes to be made to the structures declared within. While that makes it easier to develop tools based on the specification, it also locks those tools into any perceived weaknesses of the model. Therefore a change to the specification is being suggested.

For it to be possible to process topic map information through DAML+OIL, two modifications need to be made:

  1. The "id" attribute on the <topic> element should be redeclared as an "rdf:ID" attribute. This will allow an RDF-enabled processor to address topics within the topic map as RDF resources. It will still provide the ability to uniquely identify topics within the topic map.

  2. The <resourceRef> element must be allowed as a valid child of the <instanceOf> element. This will allow DAML+OIL to be used to define ontological constraints on topics that represent topic types and association types. This element could also be used to signify that a topic within a topic map is being used as a resource for the purposes of the indicated reference.

Once these modifications have been made to the XTM specification, it becomes a simple matter to reference topic map structures as resources within a DAML+OIL specification.
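
As a rough illustration of the first modification, the sketch below rewrites the "id" attribute of each <topic> element as "rdf:ID" using Python's standard `xml.etree.ElementTree` module. The function name `use_rdf_ids` and the sample document are hypothetical; the paper proposes a change to the XTM specification itself, not this particular conversion step.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def use_rdf_ids(xtm_source):
    """Return the topic map with each <topic id="..."> rewritten as rdf:ID="..."."""
    # Keep readable prefixes in the serialized output.
    ET.register_namespace("", XTM_NS)
    ET.register_namespace("rdf", RDF_NS)
    root = ET.fromstring(xtm_source)
    for topic in root.iter("{%s}topic" % XTM_NS):
        topic_id = topic.attrib.pop("id", None)
        if topic_id is not None:
            topic.set("{%s}ID" % RDF_NS, topic_id)
    return ET.tostring(root, encoding="unicode")


# Hypothetical one-topic map used only to exercise the function.
SAMPLE = (
    '<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">'
    '<topic id="eric"><baseName><baseNameString>Eric</baseNameString></baseName></topic>'
    '</topicMap>'
)
converted = use_rdf_ids(SAMPLE)
```

Fragment references such as "#eric" continue to resolve after the change, since an rdf:ID value is also addressed as a fragment of the containing document.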

The following sections will discuss how topic maps and DAML+OIL can be used to address the limitations mentioned above. They will also discuss development issues when using DAML+OIL as a layer above a topic map. All markup examples assume the changes suggested above have been made to the XTM specification.

2.1.1. A NOTE ABOUT THE EXAMPLES

All examples within this paper are written using the XML syntax for both XTM and DAML+OIL. However, it is the intention of the author that the application layer actually be where the combination of XTM and DAML+OIL occurs. The XML examples are for clarity of illustration only.

2.2. PSI

Published Subject Indicators (PSI) are used to uniquely identify a subject about which any number of topics can be created. This provides a single binding point at which a set of topics from different topic maps can be merged. DAML+OIL is based on RDF and therefore relies on Uniform Resource Identifiers (URIs) to identify resources. Any statements made about any number of resources that reference the same URI are understood to be talking about the same resource. In fact, URIs are the syntax used to define PSI in the XTM specification.

As part of the development of the sample topic map, a set of PSI can be defined that will be used to assign specific semantics to the topics within the topic map:

 <topic rdf:ID="eric">
  <subjectIdentity>
   <subjectIndicatorRef xlink:href="www.semantext.com/psi/people/Eric Freese"/>
  </subjectIdentity>
  <baseName>
   <baseNameString>Eric Freese</baseNameString>
  </baseName>
 </topic>

DAML+OIL does not have a concept of PSI per se. However, based on the suggested modifications to the XTM specification, it can reference topics and thus make use of the PSI defined for the topic.

 <daml:Thing rdf:about="#eric"/>
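
The shared role of URIs as the binding point can be sketched as follows. This is a minimal illustration, not a topic map merging algorithm: the dictionary layout, the function `merge_by_subject`, the topic id "t42", and the example URI are all assumptions introduced for the example.

```python
from collections import defaultdict


def merge_by_subject(topics):
    """Group topic names by subject indicator URI, giving one entry per subject."""
    merged = defaultdict(set)
    for topic in topics:
        for uri in topic["subject_indicators"]:
            merged[uri].update(topic["names"])
    return dict(merged)


# Illustrative URI, not taken from the paper.
PSI_ERIC = "http://example.org/psi/people/eric-freese"

# Hypothetical topics drawn from two different topic maps.
topics = [
    {"id": "eric", "subject_indicators": [PSI_ERIC], "names": {"Eric Freese"}},
    {"id": "t42", "subject_indicators": [PSI_ERIC], "names": {"Freese, Eric"}},
]
merged = merge_by_subject(topics)
```

Because both topics point at the same URI, they collapse into a single subject carrying both names, which mirrors how statements about the same URI are treated as statements about the same resource in RDF.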

2.3. Class Hierarchies

Within topic maps, all topics, occurrences, and associations can be seen as instances of classes (types). The classes themselves are expressed as topics. The XTM specification defines a set of association classes for building topic hierarchies or ontologies. Class-instance is a type of association that expresses class-instance relationships between topics that play the roles of class and instance respectively. The subjects "class-instance", "class", and "instance" are all defined by PSI in the XTM specification. Superclass-subclass is a type of association that expresses superclass-subclass relationships between topics that play the roles of superclass and subclass respectively. The subjects "superclass-subclass", "superclass", and "subclass" are likewise all defined by PSI published in the XTM specification.

Within XTM, both the class-instance and superclass-subclass relationships are transitive by default. This enables inferences to be made automatically as the hierarchies are built.

<topic rdf:ID="person">
  <baseName>
   <baseNameString>Person</baseNameString>
  </baseName>
 </topic>

 <topic rdf:ID="male">
  <baseName>
   <baseNameString>Male</baseNameString>
  </baseName>
 </topic>

 <topic rdf:ID="female">
  <baseName>
   <baseNameString>Female</baseNameString>
  </baseName>
 </topic>

 <association>
  <instanceOf>
   <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass"/>
  </instanceOf>
  <member>
   <roleSpec>
    <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#superclass"/>
   </roleSpec>
   <topicRef xlink:href="#person"/>
  </member>
  <member>
   <roleSpec>
    <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#subclass"/>
   </roleSpec>
   <topicRef xlink:href="#male"/>
   <topicRef xlink:href="#female"/>
  </member>
 </association>

In the example above, topics are defined for some of the concepts or roles that can be included in a genealogy. The hierarchy is built using the "superclass-subclass" and "class-instance" relationships. The use of the <instanceOf> element within a topic definition is equivalent to defining a class-instance association for that topic. The "superclass-subclass" relationship is used to subdivide classes. For example, the "person" topic can be subdivided into "male" and "female". However, there is currently no mechanism within topic maps for stating that a "person" topic can only be subdivided into "male" and "female".
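
The kind of automatic inference that such a hierarchy supports can be sketched as a simple traversal. The tables below are assumptions made for illustration: `SUPERCLASS` mirrors the superclass-subclass association above, while the instance declarations for "eric" and "rita" are supplied here only to have something to classify.

```python
# subclass -> direct superclasses (mirrors the association above)
SUPERCLASS = {"male": {"person"}, "female": {"person"}}

# topic -> directly declared classes (assumed for this sketch)
INSTANCE_OF = {"eric": {"male"}, "rita": {"female"}}


def all_superclasses(cls):
    """Follow superclass links transitively, starting from one class."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUPERCLASS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def all_classes(topic):
    """Declared classes of a topic plus every class inferred through the hierarchy."""
    declared = INSTANCE_OF.get(topic, set())
    inferred = set(declared)
    for cls in declared:
        inferred |= all_superclasses(cls)
    return inferred
```

With these tables, a topic declared only as "male" is also reported as a "person" without that fact being stated explicitly.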

DAML+OIL can be used to define essentially the same information. The example below can be interpreted nearly identically to the topic map example above.

 <daml:Class rdf:ID="person">
  <daml:label>Person</daml:label>
 </daml:Class>

 <daml:Class rdf:ID="male">
  <daml:label>Male</daml:label>
 </daml:Class>

 <daml:Class rdf:ID="female">
  <daml:label>Female</daml:label>
 </daml:Class>

 <daml:Class rdf:about="#person">
  <daml:oneOf rdf:parseType="daml:collection">
   <daml:Thing rdf:about="#male"/> 
   <daml:Thing rdf:about="#female"/> 
  </daml:oneOf>
 </daml:Class>

One difference between the two examples is that the last <daml:Class> declaration states that persons may be either male or female. The topic map association defines no such constraint. DAML+OIL provides the ability for the ontology designer to specify other restrictions that cannot currently be defined in the topic map model. Among these is the ability to define disjoint relationships between resources. The class definition about the "person" resource could be redefined as shown below:

<daml:Class rdf:about="#person">
 <daml:disjointUnionOf rdf:parseType="daml:collection">
  <daml:Class rdf:about="#male"/> 
  <daml:Class rdf:about="#female"/> 
 </daml:disjointUnionOf>
</daml:Class>

Using the modified DAML+OIL expressions, the same class hierarchy is established as in the XTM example. However, the ontology designer can now state that a person cannot be both male and female, using the <daml:disjointUnionOf> construct within the definition of "person".

Consider the following example:

 <topic rdf:ID="hermaphrodite">
  <instanceOf>
   <resourceRef xlink:href="#male"/>
   <!--references the DAML+OIL Class definition, not the topic-->
  </instanceOf>
  <instanceOf>
   <resourceRef xlink:href="#female"/>
  </instanceOf>
  <baseName>
   <baseNameString>Hermaphrodite</baseNameString>
  </baseName>
 </topic>

Because of the extra information provided by the DAML+OIL definition, a DAML+OIL engine processing the example above would be able to detect the contradiction and report an error. A topic map engine would not be able to perform this type of validation on the data.
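
A minimal sketch of the check such an engine might perform is shown below. It is not how any particular DAML+OIL processor works. The `DISJOINT` table stands in for the disjointness declared in the "person" class definition, and the topic identifier "topic_x" is a hypothetical stand-in for a topic declared as an instance of both classes.

```python
# Pairs of classes that may not share an instance (from the disjoint union above).
DISJOINT = [frozenset({"male", "female"})]


def disjointness_violations(instance_of):
    """Return (topic, classes) pairs where a topic belongs to two disjoint classes."""
    problems = []
    for topic, classes in instance_of.items():
        for pair in DISJOINT:
            if pair <= set(classes):
                problems.append((topic, tuple(sorted(pair))))
    return problems


# Hypothetical declarations extracted from <instanceOf> elements.
declared = {
    "eric": ["male"],
    "topic_x": ["male", "female"],
}
violations = disjointness_violations(declared)
```

Only the topic typed with both classes is reported; a plain topic map processor has no declaration from which to derive the `DISJOINT` table in the first place.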

In addition to the type hierarchies described above, several other kinds of hierarchical relationship would also be useful and can be modeled in much the same way. These relationships include:

  • component-object (wing/airplane)

  • member-collection (tree/forest)

  • portion-mass (slice/loaf)

  • stuff-object (air/atmosphere)

  • feature-activity (eating/picnic)

  • place-area (city/country)

  • phase-process (assembly/manufacturing)

DAML+OIL declarations could be defined for these types of hierarchies and a DAML+OIL engine would be able to process the semantics being represented. A topic map engine would not be able to process the semantics of these other types of hierarchies.

2.4. Assigning properties to topics

At this point it is possible to infer that if the subject of a topic is a "male", the subject of that topic is also a "person". This is due to the transitivity property of the class-instance and superclass-subclass associations. So far, however, none of the individuals shown in the boxes of Figure 1 have been assigned to these classes. This is done below for one set:

 <topic rdf:ID="eric">
  <instanceOf>
   <resourceRef xlink:href="#male"/>
  </instanceOf>
  <baseName>
   <baseNameString>Eric</baseNameString>
  </baseName>
 </topic>

 <topic rdf:ID="rita">
  <instanceOf>
   <resourceRef xlink:href="#female"/>
  </instanceOf>
  <baseName>
   <baseNameString>Rita</baseNameString>
  </baseName>
 </topic>

 <topic rdf:ID="olivia">
  <instanceOf>
   <resourceRef xlink:href="#female"/>
  </instanceOf>
  <baseName>
   <baseNameString>Olivia</baseNameString>
  </baseName>
 </topic>

 <topic rdf:ID="jordan">
  <instanceOf>
   <resourceRef xlink:href="#male"/>
  </instanceOf>
  <baseName>
   <baseNameString>Jordan</baseNameString>
  </baseName>
 </topic>

One challenge in defining an ontology, whether through a topic map or through DAML+OIL, is setting the methodology to be used. It is important that a consistent method be used to ensure that the topic map structures are interpreted in the same way. One possible method is to provide as much information as possible for each topic. For instance, on the topic named "Eric", it might be possible to also add <instanceOf> elements for "husband", "father", "son", "child", etc. Another method may be to define a minimal set of types and utilize other topic map structures such as associations to define the additional information. This method would allow knowledge to be placed in the context in which it occurs.

An additional advantage to this second method within the topic map domain is that scopes can be used to control when specific information is in effect. It must be noted that there is still a great deal of debate within the topic map community about the processing of scope. It is possible that the range and domain constructs within RDFS and DAML+OIL may accurately mirror the functionality of scope in some cases.

Consider the following topic and association declarations:

 <topic rdf:ID="marriage">
  <baseName>
   <baseNameString>Marriage</baseNameString>
  </baseName>
 </topic>

 <association>
  <instanceOf>
   <topicRef xlink:href="#marriage"/>
  </instanceOf>
  <member>
   <roleSpec>  
    <topicRef xlink:href="#husband"/>
   </roleSpec>
   <topicRef xlink:href="#eric"/>
  </member>
  <member>
   <roleSpec>  
    <topicRef xlink:href="#wife"/>
   </roleSpec>
   <topicRef xlink:href="#rita"/>
  </member>
 </association> 

The example above defines an association type called "marriage" and an association of that type linking the "eric" topic and the "rita" topic. It also defines the roles that each topic plays within the association. However, the topic map model does not currently have an interchangeable way of defining this type of association such that a marriage is constrained to contain only one husband and one wife.

In DAML+OIL markup it might look like:

<daml:Class rdf:about="#marriage">
 <daml:subClassOf>
  <daml:Restriction daml:cardinality="1">
   <daml:onProperty rdf:resource="#husband"/>
  </daml:Restriction>
 </daml:subClassOf>
 <daml:subClassOf>
   <daml:Restriction>
     <daml:onProperty rdf:resource="#husband"/>
     <daml:toClass rdf:resource="#Husband"/>
   </daml:Restriction>
 </daml:subClassOf>
 <daml:subClassOf>
  <daml:Restriction daml:cardinality="1">
   <daml:onProperty rdf:resource="#wife"/>
  </daml:Restriction>
 </daml:subClassOf>
 <daml:subClassOf>
   <daml:Restriction>
     <daml:onProperty rdf:resource="#wife"/>
     <daml:toClass rdf:resource="#Wife"/>
   </daml:Restriction>
 </daml:subClassOf>
</daml:Class>

The example above uses DAML+OIL to further define the topic representing the "marriage" association type. By doing so, a DAML+OIL processor can validate that marriage associations consist of one and only one husband and one and only one wife. Notice that rather than defining a new class, we are simply using "rdf:about" to further describe the class already defined using topic map syntax. This allows a topic map processor to manage the ontology and a DAML+OIL processor to validate the assertions being made within the topic map.
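
The cardinality check can be sketched at the application layer as follows. This is a simplified, closed-world reading of the restrictions (only the members actually listed in the association are counted), which is an assumption of the sketch rather than a statement about DAML+OIL semantics. The names `MARRIAGE_RULES` and `cardinality_errors` are hypothetical.

```python
from collections import Counter

# role -> (minimum, maximum) number of players, mirroring daml:cardinality="1"
MARRIAGE_RULES = {"husband": (1, 1), "wife": (1, 1)}


def cardinality_errors(members, rules):
    """Check one association; members is a list of (role, topic) pairs."""
    counts = Counter(role for role, _ in members)
    errors = []
    for role, (low, high) in rules.items():
        found = counts[role]
        if not low <= found <= high:
            errors.append(f"{role}: expected {low}..{high}, found {found}")
    return errors


valid = [("husband", "eric"), ("wife", "rita")]
invalid = [("husband", "eric"), ("husband", "jordan")]
```

Calling `cardinality_errors(valid, MARRIAGE_RULES)` yields an empty list, while the second association is rejected both for having two husbands and for having no wife.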

The developer could go one step further and define that husbands must be males and wives must be females. This would serve to strengthen the validity of the knowledge base being constructed.

<daml:ObjectProperty rdf:ID="husband">
 <daml:domain rdf:resource="#marriage"/>
 <daml:range rdf:resource="#male"/>
</daml:ObjectProperty>

<daml:ObjectProperty rdf:ID="wife">
 <daml:domain rdf:resource="#marriage"/>
 <daml:range rdf:resource="#female"/>
</daml:ObjectProperty>

These declarations allow a DAML+OIL system to infer that when there is a female in a marriage, she is the wife, and likewise that a male is the husband. In a combined system, the <roleSpec> elements within the association would not even be necessary, since the DAML+OIL declarations would allow a system to infer which member was the husband and which was the wife.
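
The role inference described above can be sketched as a lookup from each member's class to the single property whose range matches it. The tables and the function `infer_roles` are hypothetical and cover only the two properties declared above.

```python
# property -> class required by its daml:range declaration
ROLE_RANGE = {"husband": "male", "wife": "female"}

# topic -> declared class (from the <instanceOf> elements shown earlier)
INSTANCE_OF = {"eric": "male", "rita": "female"}


def infer_roles(members):
    """Assign each member the unique role whose range matches its class."""
    roles = {}
    for topic in members:
        matches = [
            role for role, cls in ROLE_RANGE.items()
            if INSTANCE_OF.get(topic) == cls
        ]
        if len(matches) != 1:
            raise ValueError(f"cannot infer role for {topic!r}")
        roles[topic] = matches[0]
    return roles
```

The sketch refuses to guess when a member's class is unknown or matches more than one role, which is the situation in which explicit <roleSpec> elements would still be needed.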

The topic map example below defines an association modeling a family unit. One strength of the topic map model is the ability to define associations between any number of items and define roles for them within the association. A corresponding weakness is that it can be difficult to determine whether and how the members of the association interact.

<topic rdf:ID="family">
  <baseName>
   <baseNameString>Family</baseNameString>
  </baseName>
 </topic>

 <association>
  <instanceOf>
   <topicRef xlink:href="#family"/>
  </instanceOf>
  <member>
   <roleSpec>  
    <topicRef xlink:href="#parent"/>
   </roleSpec>
   <topicRef xlink:href="#eric"/>
   <topicRef xlink:href="#rita"/>
  </member>
  <member>
   <roleSpec>  
    <topicRef xlink:href="#child"/>
   </roleSpec>
   <topicRef xlink:href="#olivia"/>
   <topicRef xlink:href="#jordan"/>
  </member>
 </association> 

The association states that Eric and Rita participate in the association in the role of parent and Olivia and Jordan both as child members. Because the names of the roles are somewhat intuitive, a human reader can determine the connections between the member topics. However, a generalized topic map processor would not be able to.

<daml:Class rdf:about="#family">
 <daml:subClassOf>
  <daml:Restriction daml:maxCardinality="2">
   <daml:onProperty rdf:resource="#parent"/>
  </daml:Restriction>
 </daml:subClassOf>
 <daml:subClassOf>
  <daml:Restriction daml:minCardinality="0">
   <daml:onProperty rdf:resource="#child"/>
  </daml:Restriction>
 </daml:subClassOf>
</daml:Class>

The class defined above defines a family as zero, one, or two parents and any number of children. It is possible for a single person to constitute a family under this definition.

<daml:Class rdf:ID="son">
 <daml:intersectionOf rdf:parseType="daml:collection">
  <daml:Restriction>
   <daml:onProperty rdf:resource="#child"/>
  </daml:Restriction>
  <daml:Class rdf:about="#male"/>
 </daml:intersectionOf>
</daml:Class>

The class definition above builds upon the information contained in the family grouping by saying that any male who is a child is a son. This can also be repeated for daughters, fathers, and mothers.
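
The same pattern, repeated for daughters, fathers, and mothers, can be sketched as a table of derived classes, each defined as the intersection of a role in the family association and a gender class. The data structures and the function `classify` are assumptions made for illustration; the member and gender values are taken from the examples in this section.

```python
# Members of the family association, grouped by role.
FAMILY = {"parent": ["eric", "rita"], "child": ["olivia", "jordan"]}

# Class of each topic, from the <instanceOf> declarations.
GENDER = {"eric": "male", "rita": "female", "olivia": "female", "jordan": "male"}

# derived class -> (role played in the family, required gender class)
DERIVED = {
    "son": ("child", "male"),
    "daughter": ("child", "female"),
    "father": ("parent", "male"),
    "mother": ("parent", "female"),
}


def classify(family, gender):
    """List the topics that fall into each derived class."""
    result = {}
    for name, (role, sex) in DERIVED.items():
        result[name] = [t for t in family.get(role, []) if gender.get(t) == sex]
    return result
```

Applied to the family above, `classify(FAMILY, GENDER)` places Jordan under "son", Olivia under "daughter", Eric under "father", and Rita under "mother".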

3. CONCLUSION

This paper has shown that it should be possible to use the DAML+OIL language to provide a constraint and validation mechanism for topic map information without waiting for ISO to complete work on TMCL. Although minor modifications to the XTM specification were recommended, they do not harm currently existing topic maps. Further study into more complex representations will be done in the future.

Bibliography

[DAML01] Connolly, Dan, et al.: DAML+OIL (March 2001) Reference Description, http://www.w3.org/TR/2001/NOTE-daml+oil-reference-20011218, 2001.

[FREESE00] Freese, Eric: Topic Maps as Semantic Networks, XML Asia/Pacific 2000, Sydney, Australia, 2000.

[ISO13250] International Organization for Standardization: ISO/IEC 13250:1999 Document description and processing languages - Topic Maps, Geneva, 1999.

[OULETTE02-1] Ouellet, Roxane and Ogbuji, Uche: Introduction to DAML: Part I, http://www.xml.com/pub/a/2002/01/30/daml1.html, 2002.

[OULETTE02-2] Ouellet, Roxane and Ogbuji, Uche: Introduction to DAML: Part II, http://www.xml.com/pub/a/2002/03/13/daml.html, 2002.

[XTM01] TopicMaps.Org: XML Topic Maps (XTM) 1.0, 17 February 2001.

Glossary

DAML+OIL

DARPA Agent Markup Language + Ontology Inference Layer

PSI

Published Subject Indicators

RDF

Resource Description Framework

TMCL

Topic Map Constraint Language

URIs

Uniform Resource Identifiers

W3C

World Wide Web Consortium

XTM

XML Topic Maps

Biography

Eric Freese has 15 years of experience in the areas of document, information, and knowledge management with specific expertise in the development and implementation of XML technologies. His experience includes research, analysis, specification, design, development, testing, implementation, integration and management of information systems in a wide range of environments. He has significant research experience in human interface design, graphics interface development and artificial intelligence. Freese is a founding member of TopicMaps.Org, the organization that developed the XML Topic Maps (XTM) specification, and currently serves as the chairman of this group. He is also the chief architect and developer of SemanText, an open source application that uses topic maps to harvest and manage knowledge.