XML Europe 2003

The Story of a Topic Maps Use Case: The IRS Call Center Tax Map

Enabling Fast Access to Diverse Information

Abstract

The IRS Tax Map is an integrated navigation system built to enhance the productivity of the IRS Call Centers. It is an application of Topic Maps. This presentation describes its design principles and the steps of its construction.


Table of Contents

1. The IRS Tax Map product
1.1. Purpose
1.2. History
1.3. Description
2. Incremental Aggregation of Information Sources
3. Project Impact on Existing Practices
Biography

1. The IRS Tax Map product

1.1. Purpose

IRS Call Assistors answering taxpayers' questions by telephone must often sift through enormous volumes of complex and diverse information to find answers. Historically, the difficulty of this task has created a risk of incomplete or wrong answers being given.

The IRS Tax Map project was designed and implemented to increase the speed and accuracy of information given to the public by optimizing access to relevant information for the Call Assistors. The Tax Map is an integrated navigation system that gives access to tax-related information by topic. It also enables direct navigation between related topics, in addition to the traditional access to documents by document type and through tables of contents.

The Topic Maps paradigm was chosen as the foundation for this application because of its powerful facilities for organizing information and supporting navigation strategies.

1.2. History

The first prototype of the IRS Tax Map used eight Taxpayer Information Publications. The goal was a proof of concept, exploiting existing indexing elements (those used to produce the printed index) as topics in a topic map. The prototype was designed to optimize access, and went through several phases of development until the design was judged to meet the requirements.

As the prototype entered a new phase, more publications were added (all the small business publications). The application, built as a batch process using the Topic Map Loom technology, had no problem incorporating the additional documents. Later, all the publications for individuals were added. For the current Tax Map, other document types have been added: the Frequently Asked Questions and the Tele-Tax Topics publications. Users have asked for further document types to be integrated as well, and nothing in the design prevents this. The process is greatly facilitated by the fact that the documents are structured and contain information that can be used to create the topic map automatically. Unstructured documents can also be added, provided the topic information can be extracted from them by other processes.

1.3. Description

Topics originate from the collation of all indexed terms in the IRS Tax Information Publications, as well as from keywords used in the Frequently Asked Questions (FAQs) and the Tele-Tax Topics. The total number of topics exceeds 8,000, and having all of them in a single index would be overwhelming. The topics collated from the Frequently Asked Questions are considered the most frequently used, and are given the type "key topic". The set of key topics has been enriched "by hand" by a team of tax experts, including authors of the publications and representatives of the users (phone assistors). The key topics are accessible through an alphabetical index. Topics related to forms are also separated out; they are accessible through a specialized "Form topics" index. All other topics are present in the topic map and can be retrieved using a built-in topic-based search engine whose search domain is the full set of 8,000+ topics.
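The partitioning of topics into the key-topic index, the form-topic index and the searchable remainder can be sketched in Python. This is a minimal illustration under stated assumptions, not the Tax Map implementation: the `Topic` class, the source labels `"faq"` and `"publication"`, and the rule that a form topic is one whose name starts with "Form" followed by an identifier are all hypothetical, and the substring `search` only stands in for the built-in search engine.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule: a form topic is one whose name starts with "Form" plus an identifier.
FORM_PATTERN = re.compile(r"^form\s+\w+", re.IGNORECASE)


@dataclass
class Topic:
    name: str
    sources: set[str] = field(default_factory=set)  # e.g. {"faq", "publication"}
    hand_picked: bool = False  # promoted to key topic by the expert team


def is_key_topic(topic: Topic) -> bool:
    # FAQ keywords are key topics; experts can add others by hand.
    return "faq" in topic.sources or topic.hand_picked


def is_form_topic(topic: Topic) -> bool:
    return FORM_PATTERN.match(topic.name) is not None


def build_indexes(topics: list[Topic]) -> tuple[list[str], list[str]]:
    """Return the alphabetical key-topic index and the form-topic index."""
    key_index = sorted((t.name for t in topics if is_key_topic(t)), key=str.lower)
    form_index = sorted((t.name for t in topics if is_form_topic(t)), key=str.lower)
    return key_index, form_index


def search(topics: list[Topic], query: str) -> list[str]:
    """Case-insensitive substring search over all topic names."""
    q = query.lower()
    return sorted(t.name for t in topics if q in t.name.lower())


topics = [
    Topic("Earned income credit", {"faq", "publication"}),
    Topic("Form 1040X", {"publication"}),
    Topic("Basis", {"publication"}),
    Topic("Alimony", {"publication"}, hand_picked=True),
]
key_index, form_index = build_indexes(topics)
```

Every topic stays reachable through `search`, while the two indexes only expose the curated subsets, which is the point of keeping the key-topic index small.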

Topics occurring in more than one place are merged based on their names, subject to a number of rules defined for this application, which rely both on natural language processing techniques and on human work. Two topics are merged if they have exactly the same name, or names that natural language processing identifies as variants of each other. For example, the topics merge if one name is the plural form of the other, or if the names differ only in capitalization or punctuation. Topics are also merged if one name is a permuted form of the other: the topic name "Fair market value" and the topic name "Value, fair market" merge into the same topic. In addition, tax experts have determined that some topics with quite different names should be merged: for example, "Form 1040X" and "Amended Tax Return" are considered to be one topic with two different names.
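The name-based merging rules can be approximated by computing a normalized merge key for each name and grouping names that share a key. This is a hedged sketch, not the project's actual rules: the naive plural stripping stands in for real natural language processing (it will mishandle words such as "taxes"), the comma-inversion rule assumes permuted index entries use a single comma, and `MANUAL_SYNONYMS` is hypothetical illustrative data representing merges decided by tax experts.

```python
import re
from collections import defaultdict

# Hypothetical expert decisions: normalized name -> normalized name it merges into.
MANUAL_SYNONYMS = {
    "amended tax return": "form 1040x",
}


def singularize(word: str) -> str:
    """Naive plural stripping; a stand-in for real NLP lemmatization."""
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def un_permute(name: str) -> str:
    """Turn an inverted entry such as 'Value, fair market' into 'fair market Value'."""
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 2 and all(parts):
        return f"{parts[1]} {parts[0]}"
    return name


def merge_key(name: str) -> str:
    name = un_permute(name).lower()          # ignore capitalization
    name = re.sub(r"[^\w\s]", " ", name)     # ignore punctuation
    key = " ".join(singularize(w) for w in name.split())
    return MANUAL_SYNONYMS.get(key, key)     # apply expert merges last


def merge_topics(names: list[str]) -> dict[str, list[str]]:
    """Group topic names that share a merge key into one topic."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[merge_key(name)].append(name)
    return dict(groups)
```

Applying the expert synonyms after the automatic normalization means a single manual entry covers every spelling variant of both names.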

Each topic in the topic map has its own portal or web page, which gives access to all places it occurs (its occurrences) among the various document types. In addition, each topic is linked to related topics, enabling access from one topic page to another.

When a topic has several names, the names are usually preserved, so that the topic can be reached through each of them in the index. In some cases, such as a plural form of a name, only one name is preserved. One name is chosen as the "main name", and each of the other names is displayed as a "synonym" on the topic page. The main name is always the shortest name, which allows it to be chosen algorithmically.
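The choice of a main name can be sketched as follows, assuming a topic's names are available as a non-empty list of strings. The `TopicPage` class and the `variant_key` rule (names differing only in case or a trailing "s" count as one name) are hypothetical simplifications. Ties between names of equal length are broken alphabetically so the result is deterministic; the source does not say how ties are handled.

```python
from dataclasses import dataclass, field


@dataclass
class TopicPage:
    main_name: str
    synonyms: list[str] = field(default_factory=list)


def variant_key(name: str) -> str:
    """Hypothetical rule: names differing only in case or a trailing 's' are one name."""
    key = name.strip().lower()
    return key[:-1] if key.endswith("s") else key


def build_topic_page(names: list[str]) -> TopicPage:
    # Keep a single spelling (the shortest) for each group of trivial variants.
    kept: dict[str, str] = {}
    for name in names:
        k = variant_key(name)
        if k not in kept or len(name) < len(kept[k]):
            kept[k] = name
    # The shortest remaining name becomes the main name; the rest are synonyms.
    ordered = sorted(kept.values(), key=lambda n: (len(n), n))
    return TopicPage(main_name=ordered[0], synonyms=ordered[1:])
```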

Occurrences of a topic are represented on its topic page and are linked to the relevant locations in the publications. Their representation depends on the type of publication in which they are found. In the case of Tax Information Publications, the title of the section or chapter in which the topic occurs is displayed. In the case of Frequently Asked Questions, the question and its category are displayed. Some occurrences are identified as definitions of the topics. The occurrences are grouped (scoped) according to the type of document in which they are found, e.g. "Publications", "Questions" and "Tele-Tax Topics".
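One way to model scoped occurrences is a small record per occurrence, grouped by document type when a topic page is built. In this sketch the `Occurrence` fields, the placeholder link targets, the display order of the scopes and the choice to list definitions first within each group are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed display order of the scopes on a topic page.
SCOPE_ORDER = ["Publications", "Questions", "Tele-Tax Topics"]


@dataclass(frozen=True)
class Occurrence:
    scope: str            # document type: one of SCOPE_ORDER
    label: str            # section/chapter title, or the FAQ question text
    target: str           # link to the location (placeholder values only)
    category: str = ""    # FAQ category, shown next to the question
    is_definition: bool = False


def display_text(occ: Occurrence) -> str:
    """Publications show the section title; FAQs show the question and its category."""
    if occ.scope == "Questions" and occ.category:
        return f"{occ.label} ({occ.category})"
    return occ.label


def group_by_scope(occurrences: list[Occurrence]) -> dict[str, list[Occurrence]]:
    groups: dict[str, list[Occurrence]] = defaultdict(list)
    for occ in occurrences:
        groups[occ.scope].append(occ)
    # Definitions first within each scope; the stable sort keeps the original order otherwise.
    return {
        scope: sorted(groups[scope], key=lambda o: not o.is_definition)
        for scope in SCOPE_ORDER
        if scope in groups
    }
```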

Topics related to a given topic are listed on that topic's page. The set of related (associated) topics results from a combination of automatic and manual processing. The sources contain a hierarchy of topics, which supports a three-level index. The "subtopics" and "subsubtopics" are treated as related topics in the topic map. A subtopic has a concatenated name, for example "Fair Market Value>Replacement cost". Automatic processing is based on the rule that if a topic name is entirely contained within another (concatenated) name, the topics they represent are related. Manual processing is based on an analysis of all topics performed by tax experts.
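The automatic part of this process can be sketched as below. The sketch assumes that "entirely contained" means that a topic name equals one of the `>`-separated components of a concatenated name, or one of its ancestor paths; a plain substring test is another possible reading, but it would relate short names such as "Tax" to almost everything. Comparing names case-insensitively is also an assumption.

```python
def normalize(name: str) -> str:
    """Lower-case each '>'-separated component and drop surrounding spaces."""
    return ">".join(part.strip().lower() for part in name.split(">"))


def related_topics(topic_names: list[str]) -> set[frozenset[str]]:
    """Relate each concatenated (sub)topic name to the topics it contains."""
    index = {normalize(n): n for n in topic_names}
    related: set[frozenset[str]] = set()
    for name in topic_names:
        parts = normalize(name).split(">")
        if len(parts) < 2:
            continue  # not a subtopic name
        # Candidates: every component, plus ancestor paths such as "a>b" for "a>b>c".
        candidates = set(parts) | {">".join(parts[:i]) for i in range(2, len(parts))}
        for candidate in candidates:
            other = index.get(candidate)
            if other is not None and other != name:
                related.add(frozenset((name, other)))
    return related
```

Relations are stored as unordered pairs, since the association is navigable in both directions between topic pages.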

When a topic occurs within a publication page, navigation is enabled from there to the topic's topic page, and to the next and previous occurrences of the same topic elsewhere in the publications. This mechanism enables navigation from the middle of documents, not just from the top as is usually the case with web navigation.
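Next and previous occurrence links can be computed by keeping, for each topic, a sorted list of its positions across the publications and searching that list with binary search. In this hypothetical sketch a position is a `(document rank, offset)` pair, where the document rank fixes an assumed reading order of the publications.

```python
from bisect import bisect_left, bisect_right
from typing import Optional

Position = tuple[int, int]  # (document rank in reading order, offset in document)


def build_occurrence_index(occurrences: list[tuple[str, int, int]]) -> dict[str, list[Position]]:
    """Map each topic to the sorted positions of all its occurrences."""
    index: dict[str, list[Position]] = {}
    for topic, doc_rank, offset in occurrences:
        index.setdefault(topic, []).append((doc_rank, offset))
    for positions in index.values():
        positions.sort()
    return index


def neighbors(index: dict[str, list[Position]], topic: str,
              here: Position) -> tuple[Optional[Position], Optional[Position]]:
    """Return the previous and next occurrence of `topic` around position `here`."""
    positions = index[topic]
    i = bisect_left(positions, here)   # positions[:i] lie strictly before `here`
    j = bisect_right(positions, here)  # positions[j:] lie strictly after `here`
    previous = positions[i - 1] if i > 0 else None
    following = positions[j] if j < len(positions) else None
    return previous, following
```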

2. Incremental Aggregation of Information Sources

Semantic integration, a fundamental aspect of topic maps, reveals that even when the structure of the source documents is perfectly consistent, so that they parse successfully in the XML/SGML sense, another level of consistency is needed to make full use of topic maps. The terms used for indexing need to be valid in the proper scope, or at the appropriate scale. For example, a publication might contain an index entry "Table 1" which makes sense locally; when indexes are merged, however, this entry becomes meaningless. In the case of the IRS Tax Map, a common taxonomy is being built incrementally: instead of starting from scratch, it is being built by refining the existing indexing schemas, which are mostly bound to single documents.
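The "Table 1" problem can be illustrated with a small merge step that qualifies document-relative entries with their source before merging. This is a hypothetical refinement rule, not the IRS taxonomy process: the `LOCAL_TERM` pattern (table and figure numbers) and the `"term (document)"` naming convention are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical rule: entries such as "Table 1" or "Figure 2" only make sense locally.
LOCAL_TERM = re.compile(r"^(table|figure)\s+\d+[a-z]?$", re.IGNORECASE)


def merge_indexes(indexes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Merge per-document index terms, qualifying local-only entries by document."""
    merged: dict[str, set[str]] = defaultdict(set)
    for doc_id, terms in indexes.items():
        for term in terms:
            clean = term.strip()
            # Local entries keep their document as part of the name, so they never merge.
            key = f"{clean} ({doc_id})" if LOCAL_TERM.match(clean) else clean
            merged[key].add(doc_id)
    return dict(merged)
```

Shared terms such as "Basis" still merge across documents, while each "Table 1" stays a separate, document-specific entry.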

3. Project Impact on Existing Practices

The IRS Tax Map project was designed so as not to interfere with the pre-existing workflow. Topic Map-based navigation is regarded as yet another formatted output, in addition to print, PDF and HTML. There has been no change in the way documents are authored, nor any need to change existing applications (such as heavily customized SGML/XML editors). However, the effort to aggregate increasing amounts of information not originally intended to be connected (indexes are most commonly designed for one particular document) has revealed, not unexpectedly, slight inconsistencies in the indexing. Authors now see more information than before, and are willing to improve the consistency of the topic map by providing a more integrated perspective. Work that would improve the higher-level consistency of the indexes is now being considered.

While the Taxpayer Information Publications are written by tax experts, the Frequently Asked Questions are created by analyzing email messages from taxpayers (about 500,000 messages, for the current Tax Map). The keywords used as topics reflect the terms actually used by taxpayers, as opposed to those used by tax experts in the indexes of the publications. These document types are very different in nature. The teams that worked to produce these documents had not cooperated before the topic map project was launched.

One interesting side effect of this work has been to bring together groups of people who were not working together, and mostly did not know what the others were doing, and to have them compare their work. After a first contact, in which people wonder why, what and how the others are doing, there is a feeling of mutual enrichment and a better understanding of the other groups' perspectives. This effect is important because the users have their own needs, which are somewhat different from the needs of the authors. An author must give priority to the accuracy of the information, while an assistor needs quick access to all possible information items about a given topic, to ensure that the right answer is given to the taxpayer's question, however specific it is, and even if the taxpayer's terms are quite different from the wording of the documents being consulted.

The IRS Tax Map usability tests have shown that the assistors most enthusiastic about the Topic Map were the beginners, or those with the least experience. No questions were asked about how to navigate the topic map; everybody was able to use it immediately. Some assistors prefer to browse by document, some like to use the indexes to find a topic, and others prefer the search engine. Some assistors said they had no idea there was so much to say about a given question. It is hoped that the use of the Tax Map will give the assistors more confidence and will improve the overall quality of answers to the public. It is also envisioned that the public will benefit from using the Tax Map directly when and if it becomes available on the IRS web site.

Biography

Michel Biezunski works as a consultant for Coolheads Consulting. Michel was instrumental in the initiation of the Topic Maps paradigm, and has been involved in its development since the beginning (1992), together with Steven R. Newcomb. He is the co-editor of the ISO/IEC 13250 Topic Maps Standard. He is working to merge knowledge-based approaches with information management systems, both by designing custom applications and by fostering the development of new standards for the Web.