Abstract
This paper presents a concise investigation and classification of the nature of identities and the role of names in knowledge management solutions. Identities and names, while being the very foundation upon which knowledge management solutions are built, are also the least clearly explained and understood. This leads to conflicting and often inconsistent use of identity and naming in any given solution. As we move towards integrated knowledge management solutions, identity has an even greater role to play in providing consistency and forming bridges between knowledge models.
We begin this paper with a study of the different aspects of identity including 'thing identity', 'The Scope and Granularity of things to be identified', 'Assigning meaning and Published Identity', 'Identity Assignment and Resolution' and 'Identity Structure'. The next section introduces the idea of names and illustrates how they are different from identities. Throughout this paper we provide syntactic examples of the ideas discussed using several standards, including RDF and TopicMaps. We conclude this paper with a discussion about how identity and names could evolve and what future role they may expect to play in knowledge management solutions.
The idea of identities is not a new one; it has to be one of the core principles of computer science and a general principle of life. Consider parts without part numbers, or books without titles and ISBNs: what a chaotic world we'd live in. For example, Bob is a mechanic fixing an engine. Without identity, Bob would need to explain to Jill that he needed "a bolt about 3 inches long, with one of those thingies on top". With identity, Bob can just ask for part P7778-R. Bob may want to call part P7778-R a thingy-ma-jig, but we'll come to that later. For now we can see that identity is a binding force that makes meaningful communication possible. This section looks at how and why identity is such an integral part of the world and, more specifically, why identity is such a critical aspect of electronic knowledge management systems.
Identity is just another tool, and like all tools it seems a good idea to start by identifying what it is we use identity for and why there is a requirement to do so. Primarily, identity is used as an unambiguous means of differentiating between things. What these things are is not conveyed by the identity. The identity is the handle that could be used to locate a thing. But again, the identity does not prescribe how a thing is to be located or if it can be located at all.
The other critical reason for assigning identity, one that extends the idea of differentiation, is that of agreement. Different machines and people need not only to disambiguate ideas or things of which only they have knowledge, but they also want to be certain about identifying things that are produced or exposed by others. This is the cornerstone of knowledge interoperability.
In knowledge management systems we are building models of the real world. These models often have an abstraction of 'thing', and in order to make references between these things, to express relationships, it is necessary that these things have identity. It is important to note that identity assignment happens on two levels. On the first level we assign an identity value to a property called identity, which belongs to the abstraction of the thing in the computer system. On the second level we somehow indicate the relationship between a given identity value and the thing in the real world. We say 'somehow' very deliberately, as it can be quite a job to relate a string of characters unambiguously with real world things, especially when those things are not in a computer system.
To illustrate the assignment of an identity value to an abstract entity and the connection to a thing we have expressed the following:
AbstractThing := (Identity × Properties)
ThingBinding := (Identity × Thing)

An instance of the above schema is:

i ∈ Identity
t ∈ Thing
at ∈ AbstractThing
p ∈ Properties
mythingbinding = (i × t)
myabstractthing = (i × p)
We can use the idea of TopicMaps and SubjectIndicatorReferences to make this idea more concrete. In the computer system a Java class of type Topic is created; this is the AbstractThing. We assign a SubjectIndicatorReference identity to this Topic. The PSI (Published Subject Indicator) we use to define the SubjectIndicatorReference is documented and indicates the nature of what that identity means. This is the thing binding.
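The two pairs in the schema can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the TopicMaps API: the class names follow the schema, while the field names, the 'employer' property and the use of a prose string to stand in for the real-world thing are all hypothetical choices made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class IdentityBindingSketch {

    /** AbstractThing := (Identity x Properties), the abstraction held in the computer system. */
    static final class AbstractThing {
        final String identity;
        final Map<String, String> properties;

        AbstractThing(String identity, Map<String, String> properties) {
            this.identity = identity;
            this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        }
    }

    /** ThingBinding := (Identity x Thing), the documented link to the real-world thing. */
    static final class ThingBinding {
        final String identity;
        final String thing; // prose standing in for the real-world thing (assumption)

        ThingBinding(String identity, String thing) {
            this.identity = identity;
            this.thing = thing;
        }
    }

    /** The two pairs are related only through the shared identity value. */
    static boolean sameIdentity(AbstractThing at, ThingBinding binding) {
        return at.identity.equals(binding.identity);
    }

    public static void main(String[] args) {
        Map<String, String> p = new HashMap<>();
        p.put("employer", "empolis"); // illustrative property, not from any standard
        AbstractThing at = new AbstractThing("urn:person:empolis:gdm", p);
        ThingBinding binding = new ThingBinding("urn:person:empolis:gdm",
                "the concept known by the name of 'Graham Moore', who works for empolis");
        System.out.println(sameIdentity(at, binding));
    }
}
```

Note that nothing in the code ties the abstraction to the real-world thing except the shared identity string, which is exactly why the binding has to be documented and agreed upon outside the system.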
Identity can be assigned to the abstraction of a thing and related to the thing in the real world. We can then in turn locate, differentiate and make connections between the things in our knowledge management system. As we then consider the integration of knowledge management systems we see that it may be possible to have meaningful interoperation if we can define equivalence between two or more identities, or if two identities are already identical.
It may be a throwaway remark, but for all intents and purposes the scope of what can be identified is infinite. People, places, actions and time can all be assigned identity. We can also assign identity to people performing an action at a given location at a given time. Thus any combination of things, including abstract concepts, can be assigned identity.
Practically, in a knowledge management system it is not required to give everything identity in these terms. Identity can be saved for the things about which there is likely to be a need for agreement to be made on what constitutes that thing. E.g. giving identity to the thing or concept 'Person' is a reasonable thing to expect, as is creating an identity for 'name' but is it necessary to create identity for the concept of 'A Person has a Name'? Probably not. Not unless it is required that we need to be able to say things about the idea of people having names. This idea will be familiar to OO designers and developers where classes are created when it is necessary to be able to create that thing as a first class entity. This is done so that things can be said about that class, what properties it has and what relationships it has with other classes.
Identity is required when we want to say something about the thing being identified. There is more to identity, though, as we want to be able to preserve and expose it so that we can reach agreement across systems on what things are.
One of the key reasons why we assign identity is so that we can assign a semantic or meaning to the abstract entity within the computer system. Once we talk about assigning semantics, however, we have to answer the question of what a 'semantic' is: what does it mean to assign meaning?
We can begin to answer this question by looking at what a modern knowledge management system aims to provide. It should be capable of providing an abstract representation of knowledge structures that are accessed in a distributed environment by different groups of users or machines.
From this we can see that the core aspect of a 'semantic' in terms of identity assignment is the agreement of a meaning associated with an identity. Let's take a look at two examples:
Assignment 1. urn:sjdsjhfsjfksk:fen => the concept known by the name of 'Graham Moore', who works for empolis.
Assignment 2. urn:person:empolis:gdm => the concept known by the name of 'Graham Moore', who works for empolis.
These two examples are included to make the point that the structure and content of an identity in no way convey anything about the meaning of the thing being identified. What is important in making the identities above useful in a distributed, multi-user knowledge management environment is that the identities used within the domain of operation, for a given system, are agreed upon.
Agreement of identities can occur in many ways. Here are a couple of different ideas that are being pursued. We have classified these ideas as 'assigned meaning' and 'constructed meaning'. The first idea is the concept of 'published identity'. Published identity is where groups of people empowered to convey the concepts in a domain of discourse create a set of identities and descriptions, or references, such that they have common agreement about the meaning or semantic of those identities.
Thus in a discussion about concepts in TopicMaps the SC34 committee could define a set of identities that relate to the core ideas in the standard. One such definition may look like:
urn:iso:iso13250:topic
=> The idea of topic as defined in the ISO13250 standard,
TopicMaps.
=> Reference xtm1_0.xml#topicdefinition
In the above example notice that a combination of document references and English prose is used to define the meaning of the concept identified. There is no one correct way to define the identity. The key is to provide enough information that someone, or perhaps some machine, can subscribe to the notion being presented.
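A published identity of this kind can be modelled as a small record of identity, prose and reference held in a shared repository. The sketch below is one possible in-memory shape, not an existing repository API: the class and method names are hypothetical, and only the URN, the prose and the reference string are taken from the example above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class PublishedIdentitySketch {

    /** One published definition: an identity plus the prose and reference that give it meaning. */
    static final class Definition {
        final String identity;
        final String description;
        final String reference;

        Definition(String identity, String description, String reference) {
            this.identity = identity;
            this.description = description;
            this.reference = reference;
        }
    }

    /** Hypothetical in-memory repository of definitions agreed by some group. */
    static final class Repository {
        private final Map<String, Definition> definitions = new HashMap<>();

        /** Refuses to publish the same identity twice, so one identity keeps one agreed meaning. */
        void publish(Definition definition) {
            if (definitions.containsKey(definition.identity)) {
                throw new IllegalStateException("Identity already published: " + definition.identity);
            }
            definitions.put(definition.identity, definition);
        }

        Optional<Definition> lookup(String identity) {
            return Optional.ofNullable(definitions.get(identity));
        }
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        repo.publish(new Definition("urn:iso:iso13250:topic",
                "The idea of topic as defined in the ISO13250 standard, TopicMaps.",
                "xtm1_0.xml#topicdefinition"));
        System.out.println(repo.lookup("urn:iso:iso13250:topic").isPresent());
        System.out.println(repo.lookup("urn:sjdsjhfsjfksk:fen").isPresent());
    }
}
```

The lookup never inspects the structure of the identity string; meaning comes only from the published definition attached to it.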
The next issue with 'Published Identities' is how and where these definitions are made public such that developers, knowledge engineers and software can make use of them. There are a number of efforts to create definition repositories such as the OASIS efforts on TopicMap Published Subject Indicators.
However, these explicit knowledge definition repositories are not the only means by which commonly agreed identities are published. Many standards using a form of XML mark-up are inherently mixing the definition of meaning with identity. E.g. many standards use namespaces to provide a unique identity for some XML fragment of the standard. The potential danger in this is one that is inherent with XML, i.e. that
XML markup != the model of what is being represented
Thus identities used in this context are potentially liable to misinterpretation, if not at some abstract level of knowledge representation then perhaps just between the identity of the XML element and the thing it represents.
Another powerful identity assignment method, followed in many areas such as DAML, RDF Schema and TopicMaps, is to define identities in terms of other identities that have meaning, using well defined relationships.
What this means is that a new identity is created and, rather than having prose or references that help define its meaning, the meaning is defined more formally in terms of other well defined identities and meanings. This has to be in the context of well defined, understood and identified relationships; otherwise the identity is meaningless.
An example of this kind of definition can be seen in the RDF Schema documentation, 'Figure 2: Class Hierarchy for the RDF Schema', where more complex concepts are defined in terms of core terms. E.g. rdfs:Class is defined as a subclass of rdfs:Resource. In this example the notion of a resource is defined to a high degree in the RDF Model and Syntax specification, and the concept of 'subclass' is also defined. Thus rdfs:Class as a concept is defined using these two predefined ideas.
The benefit of this kind of approach is that it implies some kind of foundation and rigour to what is being defined. For certain classes of concepts it is ideal; for others it just isn't appropriate. The next section briefly discusses the notion of functional identity and non-functional identity, an idea that is closely related to the idea of constructed versus assigned meaning.
This brief section makes a distinction between identities that have functional meaning within the system and those that do not. In the previous section we distinguished between assigned meaning and constructed meaning; we note here that both assigned and constructed meaning can be functional identities.
A functional identity is one whose existence within the system adds and makes available some system behaviour. A non-functional identity is one where the concept is passive within the system. Two examples illustrate these different kinds of concepts.
Taking the concept we defined earlier of the concept named 'Graham Moore' who works for empolis, we would say that this is a non-functional concept. It is a concept that has no functional bearing on the system.
Alternatively, consider a concept identified by 'urn:operation:subclass', accompanied by a suitable definition stating what it means to be a thing identified by 'urn:operation:subclass' and what that means in terms of the knowledge management system. The existence of this concept in a system would have a functional impact.
The functional impact could be that any concepts associated by 'subclass' have an additional property of transitivity - such that getting super classes on a concept would result in recursive calls up the class hierarchy.
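The transitive behaviour described above can be sketched as follows. This is an illustrative assumption about how a system might react to the 'urn:operation:subclass' identity, not a description of any particular product: the class, its methods and the 'urn:example:...' concept identities are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FunctionalIdentitySketch {
    /** The functional identity: associations of this type are treated as transitive. */
    static final String SUBCLASS = "urn:operation:subclass";

    static final class Association {
        final String relation;
        final String from;
        final String to;

        Association(String relation, String from, String to) {
            this.relation = relation;
            this.from = from;
            this.to = to;
        }
    }

    private final List<Association> associations = new ArrayList<>();

    void associate(String relation, String from, String to) {
        associations.add(new Association(relation, from, to));
    }

    /** Returns all direct and indirect super classes by recursing up the hierarchy. */
    Set<String> superClasses(String concept) {
        Set<String> found = new LinkedHashSet<>();
        collect(concept, found);
        return found;
    }

    private void collect(String concept, Set<String> found) {
        for (Association a : associations) {
            // Only the functional identity triggers the recursion; add() returning
            // false means the class was already visited, which guards against cycles.
            if (SUBCLASS.equals(a.relation) && a.from.equals(concept) && found.add(a.to)) {
                collect(a.to, found);
            }
        }
    }

    public static void main(String[] args) {
        FunctionalIdentitySketch kb = new FunctionalIdentitySketch();
        kb.associate(SUBCLASS, "urn:example:manager", "urn:example:employee");
        kb.associate(SUBCLASS, "urn:example:employee", "urn:example:person");
        kb.associate("urn:example:knows", "urn:example:manager", "urn:example:customer");
        System.out.println(kb.superClasses("urn:example:manager"));
    }
}
```

Associations typed with any other identity, such as the hypothetical 'urn:example:knows', are stored but remain passive; this is the difference between a functional and a non-functional identity in this sketch.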
Within the classification of functional and non-functional identities it is also necessary to analyse the properties of knowledge management systems themselves. Functional identities only work if one of two conditions holds true. The first is that the behaviour associated with the identity exists within the system, such that the system can act upon the identity being present. The second is that the knowledge management system is open enough that it can dynamically bind in new functionality when required to by functional concepts, and that the identity is published along with the functional components that are required.
This second point is particularly interesting as it then supposes that the definition of a concept is more than just prose and references but something executable within a system.
This section discusses the assignment and resolution of identities to entities within a computer system. The assignment of identity is obviously a critical aspect of any knowledge management system. Failing to govern or control the assignment of identity can lead to disastrous consequences, for example two different things with the same identity or an inability to access an entity in a distributed environment.
In computer systems it is necessary to be able to resolve an identity to some handle onto the identified abstraction regardless of how the identity is assigned. Thus it is a requirement that a resolution function exists that can resolve the identity. What this means to identity assignment in the context of knowledge management systems is that there are likely to be two classes of identity resolution. These two classes can be separated into system assigned identities and human aided assignments. Examples of human aided identities are Published Subject Indicators and commonly agreed ontological identities.
With knowledge management systems it is often the case that the core idea or construct is 'Topic', 'Concept' or 'Resource' and these ideas are very similar to those in object systems. Thus access to knowledge structures can be compared with accessing an open and distributed object system.
In knowledge management systems we have two kinds of identity and resolution, and they serve different functions. Human assigned identity and resolution provide mechanisms for 'clients' to get initial access into the system. System assigned identities are typically used once a client has hooked into a system. The example below shows how a client could access a distributed knowledge management solution.
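// First contact: resolve the human assigned (published) identity.
KMClient c = new KMClient();
Concept concept = c.getConceptByIdentity("urn:people:empolis:gdm");
// Keep the system assigned identity for later use.
String id = concept.ID();
System.out.println("concept id : " + id);

// Later, possibly in another session: resolve the system assigned identity directly.
KMClient c2 = new KMClient();
Concept sameConcept = c2.getObjectByID(id);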
Notice that we do not discuss how the resource is located in the first instance - this is covered in the next section on identity structure. But from the example we can see how both kinds of identities serve useful purposes in constructing a knowledge management system.
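For readers who want to run the example, the sketch below provides a minimal in-memory stand-in for the client. It is an assumption made purely for illustration: KMClient and Concept are not a real library, the register method and the 'sys-' identity format are invented here, and a single client instance holds the store, whereas the original example implies a shared distributed store behind separate clients.

```java
import java.util.HashMap;
import java.util.Map;

class ResolutionSketch {

    static final class Concept {
        private final String systemId;
        final String publishedIdentity;

        Concept(String systemId, String publishedIdentity) {
            this.systemId = systemId;
            this.publishedIdentity = publishedIdentity;
        }

        /** The system assigned identity. */
        String ID() {
            return systemId;
        }
    }

    /** Hypothetical in-memory stand-in for the KMClient used in the example above. */
    static final class KMClient {
        private final Map<String, Concept> byPublishedIdentity = new HashMap<>();
        private final Map<String, Concept> bySystemId = new HashMap<>();
        private int nextId = 1;

        /** Stores a concept under its published identity and assigns a system identity. */
        Concept register(String publishedIdentity) {
            Concept concept = new Concept("sys-" + nextId++, publishedIdentity);
            byPublishedIdentity.put(publishedIdentity, concept);
            bySystemId.put(concept.ID(), concept);
            return concept;
        }

        /** Resolution of a human assigned identity; returns null if unknown. */
        Concept getConceptByIdentity(String identity) {
            return byPublishedIdentity.get(identity);
        }

        /** Resolution of a system assigned identity; returns null if unknown. */
        Concept getObjectByID(String id) {
            return bySystemId.get(id);
        }
    }

    public static void main(String[] args) {
        KMClient c = new KMClient();
        c.register("urn:people:empolis:gdm");
        Concept concept = c.getConceptByIdentity("urn:people:empolis:gdm");
        String id = concept.ID();
        System.out.println("concept id : " + id);
        System.out.println(c.getObjectByID(id) == concept);
    }
}
```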
Perhaps we can also add the conjecture that as systems create more and more identities they will use the unique, persistent system assigned identity as the 'human' assigned identity. This is particularly likely in situations where agent based software is involved.
Let us reiterate that the string of characters that comprises an identity conveys nothing about the meaning associated with or ascribed to the entities with that identity. However, in terms of resolution it can be of great benefit to use standard ways of forming identities. Why would we want to do this? If we use common structures for constructing identities then we should be able to use common software for resolving these identities, in a distributed and global fashion.
URN Schemes appear to be a powerful way with which to construct identities for knowledge management solutions - and this applies to both generated and assigned identity. What is going to be required is an architecture that is able to resolve these identities and some common understanding about what they resolve to.
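To illustrate how a common structure enables common resolution software, the sketch below splits a URN into its namespace identifier and namespace specific string (the 'urn:NID:NSS' form of the URN syntax) and hands the latter to a resolver registered for that namespace. The class, the registry and the string handle it returns are hypothetical; what a real resolver should return is exactly the open question raised above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class UrnResolverSketch {
    // Namespace identifier -> resolver for the namespace specific string.
    private final Map<String, Function<String, Optional<String>>> resolvers = new HashMap<>();

    void register(String nid, Function<String, Optional<String>> resolver) {
        resolvers.put(nid.toLowerCase(Locale.ROOT), resolver);
    }

    /** Splits "urn:NID:NSS" and delegates the NSS to the resolver registered for the NID. */
    Optional<String> resolve(String urn) {
        String[] parts = urn.split(":", 3);
        if (parts.length != 3 || !parts[0].equalsIgnoreCase("urn")
                || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("Not a URN: " + urn);
        }
        Function<String, Optional<String>> resolver =
                resolvers.get(parts[1].toLowerCase(Locale.ROOT));
        if (resolver == null) {
            return Optional.empty(); // no resolver known for this namespace
        }
        return resolver.apply(parts[2]);
    }

    public static void main(String[] args) {
        UrnResolverSketch resolver = new UrnResolverSketch();
        // Hypothetical resolver for the 'person' namespace used in this paper's examples.
        resolver.register("person", nss -> Optional.of("handle-for-" + nss));
        System.out.println(resolver.resolve("urn:person:empolis:gdm"));
        System.out.println(resolver.resolve("urn:iso:iso13250:topic"));
    }
}
```

The dispatcher relies only on the shared structure of the identity, never on its meaning, which keeps it consistent with the point that an identity string conveys nothing about what it identifies.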
This section on identities has introduced the idea of mapping or associating an identity with some abstract concept and with some computer abstraction of that concept. We have discussed the scope and granularity of what gets identified in a knowledge management solution. We then discussed how meaning is associated with identities so that systems can interoperate at a higher degree of understanding. We have also discussed several other aspects of identity, including functional and non-functional identity, identity resolution and finally identity structure. These ideas constitute the core abstract ideas that underlie many of the modern knowledge management models and systems such as RDF and TopicMaps. The next section discusses the role of names in knowledge systems and compares them to identities.
This section provides a short discussion on the role of names in relation to identified concepts within a knowledge management system. It first discusses concept names and then discusses the difference between Contextual Names and Contextual Concepts.
Given that knowledge management systems are used at least to some extent by humans it is only natural that some idea of naming is supported. Users of systems don't and often can't use 'identities' to hook into the system. They require a system by which they can access identified entities using names that are familiar to them.
In systems where names are used it is important to keep a few things in mind:
that a single concept can have more than one name, including multi-lingual names
that two concepts can have the same name
that it is dangerous to consider a name to be an identity
that names are just properties of identified entities
Some standards, such as TopicMaps, make names more prominent in the fundamental model than other kinds of properties. Other standards, such as RDF, use data driven functional identities, such as rdfs:label, to indicate what should commonly be treated as a name of a concept.
Regardless of how it is defined, the name property is often an indexed value that helps users locate starting points within a knowledge system. But given the four points above, great care must be taken when using names rather than identities.
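A name index that respects these points treats names as a many-to-many property of identified concepts, so a lookup by name returns a set of candidate identities rather than a single answer. The sketch below is a minimal illustration; the class, its methods and the 'urn:example:...' identities are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class NameIndexSketch {
    private final Map<String, Set<String>> identitiesByName = new HashMap<>();
    private final Map<String, Set<String>> namesByIdentity = new HashMap<>();

    /** Records a name as a property of the concept with the given identity. */
    void addName(String identity, String name) {
        identitiesByName.computeIfAbsent(name, k -> new TreeSet<>()).add(identity);
        namesByIdentity.computeIfAbsent(identity, k -> new TreeSet<>()).add(name);
    }

    /** A name is only a starting point: it may match zero, one or many concepts. */
    Set<String> findByName(String name) {
        return identitiesByName.getOrDefault(name, Collections.<String>emptySet());
    }

    /** A single concept may carry several names, for example in different languages. */
    Set<String> namesOf(String identity) {
        return namesByIdentity.getOrDefault(identity, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        NameIndexSketch index = new NameIndexSketch();
        index.addName("urn:example:planet:mercury", "Mercury");
        index.addName("urn:example:planet:mercury", "Merkur");
        index.addName("urn:example:element:mercury", "Mercury");
        System.out.println(index.findByName("Mercury"));
        System.out.println(index.namesOf("urn:example:planet:mercury"));
    }
}
```

Here a lookup for 'Mercury' yields two identities, so the caller still has to use context or identity to decide which concept is meant.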
We have stated that it is possible for two concepts that are different to share the same name. This is a common issue in the world in which we live and with the models that we build. There are two main approaches to making further distinction about a concept based on the naming structure.
One example of contextualising a name structure is the use of Scope in TopicMaps. This mechanism provides a way to associate an arbitrary set of topics in a semantically undefined way to a Topic name. This is intended to provide some context about the name with regard to its assignment to a given topic.
The other approach is not to meddle with the name structure at all but instead to allow concepts to have the same name. If two concepts are named the same then it is the context around them, i.e. their relationships with other concepts, that will aid users and systems in determining if this is the topic they require, not to mention of course the Topic identity.
We feel that the second approach is far more practical and powerful in disambiguating concepts with the same name. Thus, generally, it is advisable to create better context and relationships between concepts than to start augmenting the name property of a concept.
While it is not recommended that annotations be used for names in order to distinguish concepts, some degree of typing can aid in how names are used. Multi-lingual names, sort names and display names are all valid and useful naming types. But fundamentally they don't have any impact on what is being identified.
This kind of feature is typically either directly supported as part of the name structure or supported through the core mechanisms of a knowledge system, e.g. by treating naming as a property assignment and subsequently being able to make further statements about that assignment.
With the merging of key technologies such as the semantic web and knowledge based systems such as RDF and TopicMaps it seems that the issues of identity and naming have a greater role to play. Future work will begin to look at how identities of functional components can be used to dynamically configure a system such that it exhibits new functionality in order to process and understand new knowledge and information.
Other areas of interest are in how peer-to-peer architecture can be configured to provide robust and scalable identity resolution in an inherently non-robust environment such as a computer network.
This paper has analysed, and in some cases classified, the different aspects and features of names and identities and the roles they play in knowledge management solutions. We have discussed how identity is a crucial aspect for agreeing meaning and for reliable interchange of information, and how resolution of identity is key to a complete distributed knowledge management solution. Finally, we have presented a couple of ideas about how identities could be used as knowledge management solutions spread.