XML 2003 logo

Tax Map: An integrated navigation tool for the IRS Call Center Research System

Enabling Fast Access to Diverse Information

Abstract

The IRS Tax Map is an integrated navigation system built to enhance the productivity of the tax law assistance call centers and to serve as a model to demonstrate the concept of a single entry point for technical information at IRS. It is an application of Topic Maps. This presentation describes the design principles and the steps of the construction.

Table of Contents

1. The IRS Tax Map product
1.1. Purpose
1.2. History
1.3. Description
2. Incremental Aggregation of Information Sources
3. Project Impact on Existing Practices
Biography

1. The IRS Tax Map product

1.1. Purpose

IRS Tax Law Assistors must navigate a wide range of technical information to research answers to taxpayers' questions. Assistors have developed their own methods, many of them paper-based, to prioritize and speed access to technical information. These methods are time-consuming and inefficient, and they increase the risk of error.

The IRS Tax Map project was designed and implemented to increase the speed and accuracy of information given to the public by providing telephone assistors electronic access to tax-related information by subject. Tax Map also enables direct navigation between related topics and provides access by document type and drill-down through a table of contents.

The Topic Maps paradigm was chosen as the foundation for this application because of its powerful facilities for organizing information and supporting navigation strategies.

1.2. History

The first prototype of the IRS Tax Map included the eight small business Taxpayer Information Publications. The goal was to create a subject-based navigation system exploiting existing indexing elements (used to produce the printed index) as topics in a topic map. The prototype was designed to optimize access, and went through several phases of development until the design was judged to meet the requirements. The prototype was then published on the IRS Tax Products CD-ROM and the IRS Small Business CD-ROM to gather comments from the public, and many positive comments were received.

The second prototype of Tax Map included all 33 business Taxpayer Information Publications. The original application, built as a batch process using the Topic Map Loom technology, had no problem incorporating the additional documents. IRS wanted to use this technology to help the tax law telephone assistors, and the 33-publication business prototype was used to demonstrate the technology to them and gather their feedback. After receiving positive feedback from assistors, IRS proceeded with the Tax Law Assistor prototype.

The Tax Law Assistor prototype of Tax Map contains all 95 Taxpayer Information Publications plus the Frequently Asked Questions and the Tele-Tax Topics document types. (Users have asked for other document types to be integrated as well, and nothing prevents us from doing that.) The fact that the documents are structured (e.g., SGML, XML) and carry information that can be used to create the topic map automatically greatly facilitates the process. It is, however, possible to add unstructured documents, provided the topic information can be obtained from them by other processes. Usability testing of Tax Map with Tax Law Assistors in the field and in the lab has confirmed the usefulness of integrating technical information by subject and the acceptance by users of this method of research.

One of the biggest challenges of integrating information for telephone assistors is determining how many topics to give them for research. With too many topics they are overwhelmed; with too few they cannot find what they are looking for. This summer we began testing incorporation of the IRS Probe and Response Guide (P&R Guide) into Tax Map. The P&R Guide is based on topics and contains a series of probes that the assistor must ask the taxpayer. The P&R Guide in effect narrows the set of available topics and provides direct links into Tax Map.

1.3. Description

Topics originate from the collation of all headers in the IRS Taxpayer Information Publications, as well as from keywords used in the Frequently Asked Questions (FAQs) and the Tele-Tax Topics. Tax Map will also incorporate the Probe & Response Guide, which assistors use as a framework for questions and answers while talking with taxpayers on the phone, and the Instructions, which are used as a guide for filling out tax forms. The total number of topics exceeds 8,000, and presenting them all in a single index would be overwhelming. Topics are therefore separated into three categories: the "key topics", which are the most frequently used ones and are chosen by tax experts from the comprehensive list of topics; the "form topics", which relate to a form or a schedule; and all other topics. The key topics are accessible through an alphabetical index. The form topics are isolated and accessible through a specialized "Form topics" index. All other topics are present in the topic map and can be retrieved using a built-in topic-based search tool whose search domain is the full set of more than 8,000 topics.
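The three-way split described above can be sketched as a small routing function. This is only an illustration under stated assumptions: the key topics are passed in as an expert-chosen set, the regular expression used to detect form topics is a hypothetical heuristic, and giving key topics priority over form topics is our assumption, not something the Tax Map sources specify.

```python
import re

# Hypothetical heuristic: a name mentioning "Form ..." or "Schedule ..." is a form topic.
FORM_PATTERN = re.compile(r"\b(form|schedule)\s+[\w-]+", re.IGNORECASE)

def categorize(name, key_topics):
    """Return "key", "form", or "other" for a topic name.

    key_topics is the set of names chosen by tax experts (assumed input).
    Key topics take priority over form topics (an assumption).
    """
    if name in key_topics:
        return "key"
    if FORM_PATTERN.search(name):
        return "form"
    return "other"
```

Topics categorized as "other" would be reachable only through the topic-based search tool, while the other two categories feed their respective indexes.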

Topics occurring in more than one place are merged based on their names, subject to a number of rules defined for this application, which rely both on natural language processing techniques and on human work. Two topics are merged if they have exactly the same name, or variants of the name that can be found through natural language processing. For example, if one topic name is the plural form of another, or if the names differ only in capitalization or punctuation, the topics merge. If a topic has a name which is a permuted form of another topic name, the topics also are merged. For example, the topic name "Fair market value" and the topic name "Value, fair market" merge into the same topic. In addition, tax experts have determined that some topics which have quite different names should be merged: for example, "Form 1040X" and "Amended Tax Return" are considered to be one topic with two different names.
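The merging rules above can be illustrated with a minimal sketch. It is not the Tax Map implementation. It assumes a very naive plural-stripping rule in place of real natural language processing, treats any reordering of the same words as a permutation (which is more permissive than an inverted index entry and could over-merge), and uses a hypothetical `EXPERT_SYNONYMS` table to stand in for the equivalences supplied by tax experts.

```python
import re
from collections import defaultdict

# Hypothetical expert-supplied equivalences; the paper gives this one example.
EXPERT_SYNONYMS = {"Amended Tax Return": "Form 1040X"}

def singularize(word):
    """Naive plural stripping (an assumption; a real system would use a lemmatizer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def merge_key(name):
    """Reduce a name to a key that ignores case, punctuation, plurals, and word order."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(singularize(w) for w in words))

def merge_topics(names):
    """Group topic names that should become a single topic."""
    expert = {merge_key(k): merge_key(v) for k, v in EXPERT_SYNONYMS.items()}
    merged = defaultdict(list)
    for name in names:
        key = merge_key(name)
        merged[expert.get(key, key)].append(name)
    return dict(merged)
```

With this sketch, "Fair market value", "Value, fair market", and "Fair Market Values" fall into one group, and "Form 1040X" and "Amended Tax Return" fall into another through the expert table.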

Each topic in the topic map has its own portal or web page, which gives access to all places it occurs (its occurrences) among the various document types. In addition, each topic is linked to related topics, enabling access from one topic page to another.

When a topic has several names, the names are usually preserved so that each can be found under its own value in the index. In cases such as a plural form of a name, only one name is preserved. One name is chosen as the "main name" and each of the other names is displayed as a "synonym" on the topic page. The main name is always the shortest name, which lets the computer choose it algorithmically.
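The shortest-name rule is simple enough to sketch directly. The tie-breaking order used here (alphabetical among names of equal length) is our assumption; the text only says that the shortest name is chosen.

```python
def split_names(names):
    """Return (main_name, synonyms) for one merged topic.

    The main name is the shortest name; ties are broken alphabetically,
    which is an assumption made for determinism.
    """
    ordered = sorted(set(names), key=lambda n: (len(n), n.lower(), n))
    return ordered[0], ordered[1:]
```

For the topic named both "Form 1040X" and "Amended Tax Return", this picks "Form 1040X" as the main name and lists "Amended Tax Return" as a synonym.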

Occurrences of a topic are represented on its topic page and are linked to the relevant locations in the publications. Occurrences are represented differently according to the document type in which they are found. In the case of Tax Information Publications, the title of the section or chapter in which the topic occurs is displayed. In the case of Frequently Asked Questions, the question and its category are displayed. Some occurrences are identified as definitions of the topics. The occurrences are grouped (scoped) according to the type of document in which they are found, e.g. "Publications", "Questions" and "Tele-Tax Topics".
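The grouping of occurrences by scope could look like the following sketch. The `Occurrence` fields, including `href`, are hypothetical names introduced for illustration, and listing definitions first within each group is an assumption rather than documented Tax Map behavior.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    doc_type: str               # scope, e.g. "Publications", "Questions", "Tele-Tax Topics"
    label: str                  # section title, or question text
    href: str                   # hypothetical link target in the source document
    is_definition: bool = False

def group_by_scope(occurrences):
    """Group occurrences by document type, keeping first-seen order of scopes."""
    grouped = defaultdict(list)
    for occ in occurrences:
        grouped[occ.doc_type].append(occ)
    # Stable sort: definitions first (assumption), original order otherwise.
    return {scope: sorted(items, key=lambda o: not o.is_definition)
            for scope, items in grouped.items()}
```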

Topics related to a given topic are listed on that topic's page. The set of such related or associated topics results from a combination of automatic and manual processing. A hierarchy of topics exists in the sources, enabling a three-level index. The "subtopics" and "subsubtopics" are considered to be related topics in the topic map. A subtopic has a concatenated name, for example "Fair Market Value>Replacement cost". Automatic processing is based on the decision that if a topic name is entirely contained within another (concatenated) name, the topics they represent are considered to be related. The manual processing is based on an analysis of all topics performed by tax experts.
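The automatic containment rule can be sketched as follows. This is a simplified illustration: it compares names case-insensitively as plain substrings (so it may relate names on partial-word matches, which a production system would probably want to avoid), and it uses a quadratic scan that is adequate for a demonstration but not tuned for thousands of topics.

```python
def related_by_containment(names):
    """Map each topic name to the names related to it by the containment rule.

    Two topics are related when one name is entirely contained in the other.
    The relation is stored in both directions.
    """
    lowered = {n: n.lower() for n in names}
    related = {n: set() for n in names}
    for a in names:
        for b in names:
            if a != b and lowered[a] in lowered[b]:
                related[a].add(b)
                related[b].add(a)
    return related
```

Given "Fair Market Value", "Replacement cost", and the concatenated name "Fair Market Value>Replacement cost", the subtopic ends up related to both of the shorter names. Relationships identified manually by tax experts would be added on top of this result.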

When a topic occurs within a publication page, navigation is enabled from there to the topic's page, and to the next and previous occurrences of the same topic elsewhere in the publications. This mechanism enables navigation from the middle of documents, not just from the top as is usually the case with web navigation.
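One way to picture this mechanism is as a table of previous and next links computed for each occurrence of a topic. The sketch below assumes the occurrences of a topic are already available as an ordered list of location identifiers; the identifiers in the usage example are placeholders, not real Tax Map addresses.

```python
def occurrence_links(locations):
    """For each location of one topic, record its previous and next occurrence.

    locations is an ordered list of unique location identifiers (assumed input).
    The first occurrence has no previous link and the last has no next link.
    """
    links = {}
    for i, loc in enumerate(locations):
        links[loc] = {
            "previous": locations[i - 1] if i > 0 else None,
            "next": locations[i + 1] if i + 1 < len(locations) else None,
        }
    return links

# Placeholder identifiers for three occurrences of the same topic.
links = occurrence_links(["pubA#sec2", "pubB#sec7", "faq#q12"])
```

Each publication page can then render, next to the topic, a link to the topic page plus the two neighbors recorded for that location.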

2. Incremental Aggregation of Information Sources

Merging heterogeneous pieces of information is not an easy task, even if they apparently follow a similar schema, i.e. they conform to an identical SGML or XML structure. Structure can be used to gather information from similar element types, and its availability makes a big difference compared with situations where only plain text is available. But this is still not enough: lexical parsing needs to be performed in addition to structural parsing.

When information sources are created and maintained by various people, it is not always possible or desirable for everybody to conform to a given taxonomy and vocabulary, such as "authority keyword lists". The context in which these words are used varies greatly, and there are often several equally good ways to speak about the same subject. It is not always possible for groups in a big organization to constrain themselves to a finite list of authorized terms in order to work together with others. The result is a huge variety in the way information is described and displayed, which can be characterized as "missed opportunities for merging". This is where the topic map approach shines: it makes it possible to merge information "after the fact", by applying either automatic processes that can be done by computers, or human input establishing equivalences or relationships between various terms.

An annual workshop is held to maintain the consistency of the information contained in Tax Map and to facilitate navigation between terms whose equivalence only a tax specialist would know: for example, in IRS jargon, "Form 1040X" and "Amended Tax Return" are used interchangeably. It is also possible to offer several layers of access, making the complexities of tax-related information easier for beginners to approach, provided that the terms used as entry points are easily identifiable by users who have not had long exposure to the technicalities of that information.

Another aspect of semantic integration is that some automatically extracted information only makes sense in a local context and is not informative in a broader one. For example, "table 1" is not a useful piece of information in an integrated index. It is therefore important to create a list of stop words that should not be inserted as topics, because noise reduction is an important aspect of the usability of such a product.
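A stop list of this kind can be sketched as a filter applied before topics enter the topic map. Only "table 1" comes from the example above; the other patterns are hypothetical illustrations of labels that make sense only inside a single document.

```python
import re

# "table N" follows the example in the text; the other patterns are assumptions.
STOP_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"table \d+", r"figure \d+", r"introduction")
]

def is_stop_topic(name):
    """True if the whole name matches a locally meaningful label."""
    return any(p.fullmatch(name.strip()) for p in STOP_PATTERNS)

def filter_topics(names):
    """Drop candidate topic names that would only add noise to the index."""
    return [n for n in names if not is_stop_topic(n)]
```

Matching the whole name rather than a substring keeps a legitimate topic that merely mentions a table from being discarded.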

3. Project Impact on Existing Practices

The IRS Tax Map project was designed so as not to interfere with the pre-existing workflow. Topic map-based navigation is regarded as yet another formatted output, in addition to print, PDF and HTML. There has been no change in the way documents are authored, nor any need to change applications that already exist (such as heavily customized SGML/XML editors). However, as increasing amounts of information not originally intended to be connected are aggregated, authors sometimes identify parts of their sources as candidates for improvement, with a view to better integration. The important point here is that this is not mandatory, since the superimposed navigation schema can be applied without changing the sources. This feature is considered to be a great incentive for improvement, precisely because it has no mandatory character. Authors are now seeing more information than before, and are willing to improve the consistency of the topic map by providing a more integrated perspective.

While the Taxpayer Information Publications are written by tax experts, the Frequently Asked Questions are created by analyzing email messages from taxpayers (about 250,000 messages, for the current Tax Map). The keywords used as topics reflect the terms actually used by taxpayers, as opposed to those used by tax experts in the indexes of the publications. These document types are very different in nature. The teams that worked to produce these documents had not cooperated before the topic map project was launched.

One interesting side effect of this work is that groups of people who were not working together, and mostly did not know what the others were doing, have come together and compared their work. After the first contact, where people wonder what the others are doing, why, and how, there is a feeling of mutual enrichment and a better understanding of the perspectives of the other groups. This effect is important because the users have their own needs, which are somewhat different from the needs of the authors. An author needs to prioritize the accuracy of the information, while an assistor needs to quickly access all possible information items about a given topic, to ensure that the right answer is given to the question raised by the taxpayer, however specific it is, and even if the terms used by the taxpayer are quite different from the wording of the documents being consulted.

The IRS Tax Map usability tests have shown that, among the assistors, the most enthusiastic about the topic map were the beginners or those with the least experience. Few questions have been asked about how to navigate the topic map, and everybody has been able to use it after a short introduction. Some assistors prefer to browse by documents, some like to use the indexes to find a topic, and others prefer the search engine. Some assistors have said that they had no idea that there was so much to say about a given question. It is hoped that the use of Tax Map will give more confidence to the assistors and will, on the whole, improve the quality of answers to the public. It is also envisioned that the public will benefit from using Tax Map directly when and if it becomes available on the IRS web site.

The condition for such a project to be successful is to progressively increase the convergence of the original editorial guidelines and the usability of the integrated information. When it becomes possible to reveal how much of the editorial policy has been implemented, the quality of the final product increases in terms of its usability. The ability to draw a visible line between what was originally intended and what is actually achieved shows that it is possible to benefit from cooperative work while leaving as much freedom as possible to the creators of the information sources. It is our hope that over the long term the importance of semantic integration will be realized by the various committees writing publications, and that source files will then be authored in a manner that facilitates semantic integration.

Biography

Michel Biezunski works as a consultant for Coolheads Consulting. Michel was instrumental in the initiation of the Topic Maps paradigm, and has been involved in its development since the beginning (1992), together with Steven R. Newcomb. He is the co-editor of the ISO/IEC 13250 Topic Maps Standard. He is working to merge knowledge-based approaches with information management systems, both by designing custom applications and by fostering the development of new standards for the Web.