The Darwin Information Typing Architecture (DITA) is an XML-based, end-to-end architecture for authoring, producing, and delivering technical information.
This paper describes how DITA-based documentation was implemented at CEDROM-SNi, one of Canada's leading on-line news content aggregators. The project delivers documentation as diverse as user training materials and Web Services reference guides targeted to programmers. This paper focuses on the benefits, how-tos, and lessons learned.
Technical documentation has its own unique challenges. Its deliverables range from simple reference guides and educational material to complex, multilingual procedure manuals. Critical success factors of a documentation project are numerous and diverse – usability, deadlines, cost, language, delivery media (paper, online) – all of which have their own purpose and challenges. This paper discusses these issues and provides a framework for future DITA projects.
Keywords: DITA; Editing/Authoring; Publishing
One of the greatest challenges facing technical communicators is delivering multiple quality documentation products (such as procedure manuals, reference guides, training material) as soon as the product being documented is delivered.
Before getting into how implementing DITA and XML helped meet these challenges at CEDROM-SNi, we must define the specific challenges in more detail. This paper introduces some of the most important challenges met by technical communicators, and then some specific to CEDROM-SNi's project. That first part is followed by a very quick introduction to DITA and to the processes involved in implementing DITA for this specific project. Benefits and lessons learned are reviewed against the identified challenges.
It sounds obvious that documentation should focus on users' needs and that meeting these needs should be easy once you know who your audience is. Unfortunately, users are not a uniform group; they have different product knowledge, different backgrounds and they may have different reasons for using the product.
Basic audience analysis usually identifies these three subgroups:
The challenge is to address each audience's needs at its own level of product understanding, without burdening users with information they do not need or are not ready to understand. Studies show that when people don't find what they need right away, they quickly abandon the task. Other audience factors depend on specific products or industries, like the users' possible roles and goals, or in the software industry, on the level of computer literacy [Cooper, Alan, Reimann, Robert].
Users often refer to documentation after they have already tried and failed to figure things out on their own. They need to find the information quickly. Good indexes, glossaries (that include synonyms) [Cooper, Alan, Reimann, Robert] and great search capabilities usually solve this issue. A good table of contents and document structure can help too [Feldman, Susan].
Documentation teams' deliverables range from "getting started" guides to advanced user references, and they can be delivered on paper or by any electronic means.
Sometimes users read documentation from beginning to end, but most often they scan through for useful information. For example, a training manual is usually read and used differently than an API reference guide.
Time is an important factor. Documentation is at the end of the development cycle, right before translation and delivery. Yet, delivery dates are often based on product development alone, so there is very little time to integrate screenshots, if any, and last-minute changes. Making updates can be difficult when there are many changes in each document. There is a high risk of inconsistencies and human errors.
When information is found in more than one document, it needs to be translated more than once. Each update has a domino effect on translation costs. Moreover, inconsistencies in translations are frequent in large projects and can lead to both user confusion and dissatisfaction.
The goal of the CEDROM-SNi project was to document a service that offers news documents (newspaper articles, radio and TV show transcripts, etc.). The Web application is used mostly by private companies, governments, associations, libraries, schools and universities for media monitoring, e-press clipping, archives and research.
The service is offered by three different product interfaces in three different locales (en-ca, fr-ca, fr-fr). Product features are also offered through Web Services that companies can use to provide access to news documents from their own website.
Deliverables include:
The audience is made up of:
Information to include in deliverables is:
| | Online Reference Guides | Online Web Services Guide | PDF Training Guides |
| Functionalities* | X | X | X |
| Tasks* | X | X (not same as user interface) | X |
| Exercises | | | X (answers for trainer only) |
| Trainers' Notes | | | X |
| API (WSDL) | | X | |
| XML Schema documentation | | X | |
* Requires support for conditional text, since different audiences have access to the different tasks and features.
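As an illustration of the conditional text requirement, the following is a minimal Python sketch, not the project's actual build step. It assumes conditions are expressed with the DITA `audience` attribute; the topic content, the audience values ("trainer", "librarian") and the function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical task topic; the audience values are invented for this sketch.
TOPIC = """<task id="save-search">
  <title>Saving a search</title>
  <taskbody>
    <steps>
      <step><cmd>Run a search.</cmd></step>
      <step audience="trainer"><cmd>Show the expected result to the class.</cmd></step>
      <step audience="librarian trainer"><cmd>Refine the query with operators.</cmd></step>
      <step><cmd>Click Save.</cmd></step>
    </steps>
  </taskbody>
</task>"""


def filter_for_audience(root, audience):
    """Return a copy of root without the elements aimed at other audiences."""
    result = copy.deepcopy(root)
    # Materialize the iterator first because the tree is modified in the loop.
    for parent in list(result.iter()):
        for child in list(parent):
            targets = child.get("audience", "").split()
            # Elements without an audience attribute apply to everyone.
            if targets and audience not in targets:
                parent.remove(child)
    return result


def step_texts(root):
    """Collect the command text of every remaining step, in order."""
    return [cmd.text for cmd in root.iter("cmd")]


topic = ET.fromstring(TOPIC)
librarian_view = step_texts(filter_for_audience(topic, "librarian"))
trainer_view = step_texts(filter_for_audience(topic, "trainer"))
print(librarian_view)
print(trainer_view)
```

The same source topic yields a three-step procedure for the hypothetical librarian audience and a four-step procedure for trainers, so the shared steps are written and translated only once.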
Putting documentation elements in the preceding tables shows the great potential for reusing information.
There was an extra challenge to this project: the overall specification was undefined at the start of the project. Although basic blocks were defined, how they came together was only defined as the project developed. Changes were frequent as users tested the new interfaces, and the basic user navigation was not finalized until the end of the project. For example, the "theme" feature was developed in the alpha release, but how to use it for a press review was only finalized after the interface was tested by multiple user groups.
Writers prefer to guide users through user processes rather than through a list of features, but because of the timetable, the system tasks had to be documented before all components for the real user tasks were ready. This made it difficult for the writers to create a good document structure or to base the table of contents on user tasks rather than on the system features. There was an immediate need to create information as chunks that could be moved around as the project evolved.
Documentation Team: 1 person who has access to XML-knowledgeable people.
Deadlines:
This section covers the DITA features that are most relevant to the CEDROM-SNi project. You can get a fuller introduction to DITA at http://www-106.ibm.com/developerworks/xml/library/x-dita1/index.html.
Besides the generic topic type, the other proposed topic types are concept, task, and reference.
Why use topics as the base unit?
The topic is the smallest independently maintainable unit of content. Topics must be able to stand alone so that they can be understood when they are encountered out-of-context, for example when a user finds the topic through search, an index, or by following a link. [Priestley, M.]
It made sense for this project to use topics since the documentation was well-suited to chunking into the granular topics that would fit into the proposed DITA topic base. In our case, features were presented as DITA concepts, system tasks as DITA tasks and the Web Services APIs and XML schemas as DITA references.
Another advantage of working with topics is that it reinforces the ability to write independent chunks of information that can be reused in different contexts. For example, the task "saving a news article search" can be used on its own in the online reference guide for intermediate-level users who need to be reminded how to do it; it can also be used in beginners' training sessions to show why they would want to save a search they created.
Reuse and modularity have other immediate positive side effects:
Specialization is the process by which DITA lets you define your own topic types from existing ones.
Not all identified building blocks of the project fit the proposed DITA topics. The ability to create our own topic types was a very important factor, especially for those exercises in training manuals that didn't fit any of the proposed topic types, other than the generic basic topic.
DITA maps are used to identify topics to include in a project. They can also be used: to define relationships between topics; to create navigational tools; or to add metadata to topics.
Maps were very important to meet our need to present information in different orders in the different documentation products, especially for training documents, since the content differs based on each customer's particular needs. For example, training new employees in the basics of using the product, training regular users to use new features in a new version or teaching librarians to use queries for advanced searches might use different topics but might also use some of the same topics presented in a different order. Creating a different map for each documentation project is quick and easy compared to alternatives such as copying and pasting topics into each project and then updating each occurrence of the same topic in multiple manuals.
We are also using DITA maps to create "related links" at the end of each online topic that are customized to each deliverable's context.
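To make the role of maps concrete, here is a small Python sketch under stated assumptions: the two maps, their titles and the topic file names are hypothetical, and the `outline` helper is ours for illustration. It shows two deliverables that draw on the same topic files in a different order and hierarchy.

```python
import xml.etree.ElementTree as ET

# Two hypothetical maps that reuse the same topic files in different orders.
NEW_EMPLOYEE_MAP = """<map title="Training: new employees">
  <topicref href="concepts/search.dita">
    <topicref href="tasks/simple_search.dita"/>
    <topicref href="tasks/save_search.dita"/>
  </topicref>
</map>"""

LIBRARIAN_MAP = """<map title="Training: librarians">
  <topicref href="tasks/save_search.dita"/>
  <topicref href="concepts/search.dita">
    <topicref href="tasks/advanced_query.dita"/>
  </topicref>
</map>"""


def outline(map_xml):
    """Return (depth, href) pairs in document order for every topicref."""
    entries = []

    def walk(element, depth):
        for ref in element.findall("topicref"):
            entries.append((depth, ref.get("href")))
            walk(ref, depth + 1)

    walk(ET.fromstring(map_xml), 0)
    return entries


new_employee_outline = outline(NEW_EMPLOYEE_MAP)
librarian_outline = outline(LIBRARIAN_MAP)

# Topics that appear in both deliverables are written and maintained only once.
shared = sorted({href for _, href in new_employee_outline}
                & {href for _, href in librarian_outline})

for depth, href in librarian_outline:
    print("  " * depth + href)
print("shared:", shared)
```

Because each map only references topic files, changing the order or depth of a topic in one deliverable leaves the topic itself, and every other deliverable, untouched.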
DITA does define an architecture, but the DITA materials also include complete DTD components and sample XSL stylesheets for its proposed basic topic types.
Starting from an existing DTD saved us a lot of time and trouble. Moreover, being able to base our XSL on the samples provided by IBM allowed us to get started right away.
The following three sections describe the processes for each major group of deliverables. However, the content files used for the entire project are all stored together in the same document base on the same server.
The basic project elements are:
The following figure presents the major steps in producing the online help from multiple DITA topics and a DITA map.




The preceding graphic shows the search options that allow users to look for words or expressions in titles only or in the whole content.
We also index metadata in this project, and we include synonyms in the metadata so users can find topics about a subject even if the expression does not appear in the text itself. For example, in the task "Se connecter", we included the keyword "login", which is an anglicism often used by French Canadians who want information about logging into an application. If they search for "login", the application will return the task "Se connecter".
Since tasks are easy to distinguish from other information, we can return grouped search results where all generic topics, concepts and references are returned as "descriptions found" and all tasks are returned as "tasks found". It is an extra way to help the user find the right information quickly.
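The search behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the search engine used in the project: the topic bodies, the dictionary layout and the function names are assumptions, while the "Se connecter" title, the "login" keyword and the two result groups come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics; only the title "Se connecter" and the keyword "login"
# come from the paper.
TOPICS = [
    """<task id="se-connecter">
         <title>Se connecter</title>
         <prolog><metadata><keywords><keyword>login</keyword></keywords></metadata></prolog>
         <taskbody><context>Entrez votre nom d'utilisateur.</context></taskbody>
       </task>""",
    """<concept id="session">
         <title>Session</title>
         <conbody><p>Une session commence quand l'utilisateur se connecte.</p></conbody>
       </concept>""",
]


def index_topic(xml_text):
    """Index a topic's title and its full text, metadata keywords included."""
    root = ET.fromstring(xml_text)
    return {
        "type": root.tag,
        "title": root.findtext("title"),
        # itertext() also walks the prolog, so keywords become searchable.
        "text": " ".join(root.itertext()).lower(),
    }


def search(index, query, titles_only=False):
    """Group matches as the online help does: tasks apart from everything else."""
    results = {"tasks found": [], "descriptions found": []}
    needle = query.lower()
    for entry in index:
        haystack = entry["title"].lower() if titles_only else entry["text"]
        if needle in haystack:
            group = "tasks found" if entry["type"] == "task" else "descriptions found"
            results[group].append(entry["title"])
    return results


index = [index_topic(t) for t in TOPICS]
print(search(index, "login"))
print(search(index, "login", titles_only=True))
print(search(index, "session"))
```

The word "login" appears only in the metadata of the task, yet the full-text search still returns "Se connecter"; the titles-only search does not, since keywords are not part of the title.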
These processes are all implemented in script files and are performed automatically once the technical writer double-clicks a simple .bat file.
Producing the Web Services help is quite similar to producing the user online help. One extra transformation is needed because the topics are extracted from available XML files instead of being written by a technical writer. When developers create a Web Service, they provide a WSDL, which is an XML file that contains information about the Service. This information could be sufficient to document the Web Service, but it often lacks comments and is hard for people to read. Our developers create their Web Services in C#, and we agreed that comments would be added in XML and reviewed before publication. Therefore all the information necessary to document the APIs can be extracted and presented as-is.
The Web Service description is extracted for each Service, transformed into a DITA reference topic and presented at the first tree level. Each method is transformed into a DITA reference and presented under the proper Web Service. Descriptions for parameters and return values come from comments in the C# code. The tree used to create the TOC is created automatically by scripts when the reference topics are created.
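The extraction step can be pictured with the following Python sketch. It is not the project's transformation, which is not shown in this paper; it assumes the input is the XML documentation file that the C# compiler generates from XML comments, and the method name `NewsService.Search`, the sample comments and the choice of DITA `properties` markup for parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a C# XML documentation file (compiler output).
DOC_XML = """<doc>
  <members>
    <member name="M:NewsService.Search(System.String)">
      <summary>Searches the news archive.</summary>
      <param name="query">Search expression.</param>
      <returns>A list of matching documents.</returns>
    </member>
  </members>
</doc>"""


def member_to_reference(member):
    """Turn one documented method into a DITA reference topic element."""
    full_name = member.get("name")
    # "M:NewsService.Search(System.String)" -> "Search"
    method = full_name.split(":", 1)[1].split("(", 1)[0].rsplit(".", 1)[-1]

    ref = ET.Element("reference", id=method.lower())
    ET.SubElement(ref, "title").text = method
    ET.SubElement(ref, "shortdesc").text = (member.findtext("summary") or "").strip()
    body = ET.SubElement(ref, "refbody")

    params = member.findall("param")
    if params:
        props = ET.SubElement(body, "properties")
        for param in params:
            prop = ET.SubElement(props, "property")
            ET.SubElement(prop, "proptype").text = param.get("name")
            ET.SubElement(prop, "propdesc").text = (param.text or "").strip()

    returns = member.findtext("returns")
    if returns:
        section = ET.SubElement(body, "section")
        ET.SubElement(section, "title").text = "Return value"
        ET.SubElement(section, "p").text = returns.strip()
    return ref


# Member names starting with "M:" identify methods in the documentation file.
references = [
    member_to_reference(m)
    for m in ET.fromstring(DOC_XML).iter("member")
    if m.get("name", "").startswith("M:")
]
print(ET.tostring(references[0], encoding="unicode"))
```

Once the comments are in DITA reference form, they can go through the same formatting transformations as the hand-written topics.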

Using DITA to document Web Services allowed me to make use of an already-defined transformation for formatting, but also to reuse the functionality definitions created for the regular online help delivery. These definitions are grouped in a section that explains the purpose of the Web Services. This kind of reuse is enabled by DITA, but needed to be reinforced by guidelines for writing feature definitions. The guidelines specify that a definition must focus only on the purpose of the feature and how it is useful in user processes such as media monitoring or archiving. No interface-specific information was allowed in the feature definitions.
The first step in producing the PDF manual is the same as for producing online help: list the topics you want to use in a DITA map. However, the processing is slightly different: the first processing step merges the topics together into a single file. Then, in a second step, the output is produced by sending an XSL-FO stylesheet and the merged XML file to the XSL processor.
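The merge step can be sketched as follows. This is a simplified Python illustration, not the merge transformation used in the project: the topic files are replaced by an in-memory dictionary, and the file names, topic content and the `merged` wrapper element are hypothetical. The XSL-FO step itself is not shown.

```python
import xml.etree.ElementTree as ET

# In-memory stand-in for topic files on disk (hypothetical names and content).
TOPIC_FILES = {
    "search.dita": "<concept id='search'><title>Search</title>"
                   "<conbody><p>A search runs a query.</p></conbody></concept>",
    "save.dita": "<task id='save'><title>Saving a search</title>"
                 "<taskbody><context>Keep a query for later.</context></taskbody></task>",
    "login.dita": "<task id='login'><title>Logging in</title>"
                  "<taskbody><context>Enter your user name.</context></taskbody></task>",
}

MAP_XML = """<map>
  <topicref href="login.dita"/>
  <topicref href="search.dita">
    <topicref href="save.dita"/>
  </topicref>
</map>"""


def merge(map_xml, load):
    """Build one XML tree that nests topics the way the map nests topicrefs."""
    merged = ET.Element("merged")

    def walk(map_node, out_node):
        for ref in map_node.findall("topicref"):
            topic = ET.fromstring(load(ref.get("href")))
            out_node.append(topic)
            # Child topicrefs become topics nested inside their parent topic.
            walk(ref, topic)

    walk(ET.fromstring(map_xml), merged)
    return merged


merged = merge(MAP_XML, TOPIC_FILES.__getitem__)
top_level_ids = [topic.get("id") for topic in merged]
print(top_level_ids)
print(ET.tostring(merged, encoding="unicode"))
```

Keeping the map hierarchy in the merged file means a stylesheet can later derive heading levels and the table of contents from nesting depth alone.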
Our first choice of XSL formatter was "fop", which is free software. Once all our content is developed, we'll choose the tool that best renders our particular content.
Features of our PDF manuals include the table of contents, which is exactly the same hierarchy as the DITA map file, and the index at the end of the document, which is extracted from the "indexterm" element in each topic.
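The index extraction can be pictured with this short Python sketch. It is only an illustration: in the PDF the index points to page numbers, which are resolved by the formatter, so topic titles stand in for them here. The merged document, its wrapper element and the index terms are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical merged document; only the indexterm element name is from DITA.
MERGED = """<merged>
  <concept id="search"><title>Search</title>
    <conbody><p><indexterm>query</indexterm>A search runs a query.</p></conbody>
  </concept>
  <task id="save"><title>Saving a search</title>
    <taskbody><context><indexterm>query</indexterm><indexterm>archive</indexterm>Keep a query.</context></taskbody>
  </task>
</merged>"""


def build_index(root):
    """Map each index term to the titles of the topics that declare it."""
    index = defaultdict(list)
    for topic in root:
        title = topic.findtext("title")
        for term in topic.iter("indexterm"):
            key = (term.text or "").strip()
            if key and title not in index[key]:
                index[key].append(title)
    # Sort alphabetically, ignoring case, as a printed index would be.
    return dict(sorted(index.items(), key=lambda item: item[0].lower()))


back_of_book = build_index(ET.fromstring(MERGED))
for term, titles in back_of_book.items():
    print(term + ": " + ", ".join(titles))
```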
Although the first part of the project is over, there is still much to do:
The whole process of moving to XML and DITA was less tedious than first expected. However, certain issues did come up that should be considered by others who would like to walk the XML/DITA path.
Structuring the project's information with DITA was easy. The fact that there was no legacy content probably made the whole process simpler. I was able to create multiple user tasks from smaller system tasks and to document the system while user scenarios were still being defined.
Using DITA allowed for a quick start. Having access to a DTD set and sample XSL transformations saved me a lot of time. However, there was definitely an adaptation period; time was necessary to learn how to write within the information structure, use the tools and be comfortable with the tag set.
The overhead needed to adapt was worth it for this project because of the necessity to build multiple training guides and the changing nature of user tasks over the development period. If no extensive reuse had been needed, it might not have been worth the time and effort.
The first, system implementation, part of the project was a success. The next part includes a bigger human factor as other content developers will need to be able to modify topics. We do not foresee particular challenges related to DITA regarding our workflow process development.
DITA and XML allowed for a lot of automation and reuse, and processes that have been defined will serve as a solid structural foundation for future projects. In fact, we are already using what we built in a new project.
[Ament, Kurt] Single Sourcing – Building Modular Documentation, William Andrew Publishing, 2003.
[Cooper, Alan, Reimann, Robert] About Face 2.0: The Essentials of Interaction Design, Wiley Publishing, 2003.
[Coverpage's technology report] http://xml.coverpages.org/dita.html#relatedTM.
[Day, D., Priestley, M., Schell, David A.] Introduction to the Darwin Information Typing Architecture – Toward portable technical information, http://www-106.ibm.com/developerworks/xml/library/x-dita1/.
[Duffy, Tommy] Build an XML-based Tree Control with JavaScript, DevX.com, http://www.devx.com/getHelpOn/Article/11874.
[Feldman, Susan] The cost of not finding information, KMWorld, Volume 13, March 2004.
[Hackos, JoAnn] Content Management for Dynamic Web Delivery, John Wiley & Sons, February 28, 2002.
[Priestley, M.] Scenario-based and model-driven information development with XML DITA, xml.coverpages.org/PriestleyACMSIGDOC-2003-DITA.pdf.
[Rockley, Ann] The impact of single sourcing and technology, Technical communication, Volume 48, Number 2, May 2001.
[The Center for Information-Development Management] Making a business case for single-sourcing, Best Practices, Volume 3, Number 2, April 2001.