XML 2003

Making Markup Invisible

XML Legislative Drafting With No Author-Visible Tags

Abstract

The Office of the Secretary of the United States Senate is developing and implementing an XML-based system for authoring legislation that is designed to make the structured markup completely transparent to users. Authors work in a completely "tags-off" view, in an environment similar to that of a word processor, while the system maintains the markup and structure behind the scenes at all times. The system reduces the XML knowledge required of subject-matter experts to almost zero, allowing authors to concentrate on content rather than markup.

Table of Contents

1. Background
1.1. The Senate Legislative Environment
1.1.1. Creation of Legislation
1.1.2. The Congressional Legislative DTDs
2. The Users
2.1. User Requirements
2.2. Issues with other XML editing approaches
2.2.1. External coders
2.2.2. XML training for drafters
2.2.3. Translation software from word processors
3. Technical Solutions
3.1. Structured vs. non-structured editing
3.1.1. Key relaxation steps of the DTD
3.1.2. Editing - Exchange transformations
3.1.2.1. Translation to editing
3.1.2.2. Editing to Exchange translation
3.2. Automatic handling of delete and backspace
3.3. Cross container cut and paste
3.4. Structure and Style Verification
4. LEXA
Biography

1. Background

1.1. The Senate Legislative Environment

The need for a comprehensive Senate Legislative Information System (LIS) grew out of several important issues considered during the 104th Congress. Congress wanted assurance that Members and staff would have accurate, timely, and comprehensive legislative information available directly at their desktops. Congress also wanted to reduce the overlap in work done by the Library of Congress, the House of Representatives, and the Senate. The development of the LIS was mandated in Section 8 of the 1997 Legislative Branch Appropriations Act (2 U.S.C. 123e).

In 1997, the Senate Rules Committee and the House Administration Committee approved the establishment of a data standards program using the Standard Generalized Markup Language (SGML). The Senate and House agreed to hold regular coordinating committee and technical committee meetings to address policy issues and to guide the development of these data standards, which have since shifted their focus from SGML to XML.

In August 1999, the Secretary of the Senate and the Clerk of the House, with the approval of the Senate Committee on Rules and Administration and the Committee on House Administration, initiated a feasibility study on using XML for the preparation and processing of legislative documents. A joint letter invited participation from the Senate Office of the Legislative Counsel, the House Office of the Legislative Counsel, the Senate and House Enrolling Clerks, the Government Printing Office (GPO), and the Library of Congress (LOC).

The XML Feasibility Study provided valuable insights for all of the groups involved concerning the use of XML, the state of XML tools, and the application of this technology for legislative processing. The Study also confirmed that while there are fundamental business process differences between the House and Senate, XML can be relied upon as a common data standard for document exchange. On December 5, 2000, the Senate Committee on Rules and Administration and the Committee on House Administration jointly accepted XML as the primary data standard to be used for the exchange of legislative documents and information.

Efforts to establish the XML data standard for Senate-wide use began in October 2000. In 2001 the Office of the Secretary of the Senate began work on an XML editing environment for the production of legislation. Building on work done jointly and individually by the participants, particularly that of the Legislative Computer Systems group of the Office of the Clerk of the House, the Office of the Secretary began developing an environment for editing legislation in XML that was customized to the processes and needs of the Senate.

These efforts developed into the Legislative Editing in XML Application, or LEXA. LEXA is currently being user tested and is expected to be in widespread use within the next year.

1.1.1. Creation of Legislation

The overwhelming majority of legislation introduced in the Senate is drafted by the Senate Office of Legislative Counsel (SLC). The SLC provides legal assistance to Committees and Members of the Senate in preparing legislation, supplying the wording of most Federal law in a nonpartisan, non-political way. The SLC's approximately 40 attorneys and staff produce approximately 30,000 documents per year, ranging from short resolutions declaring National Cowboy Poetry Week to thousand-page bills authorizing all facets of Federal activity.

Legislation is currently authored using a customized text editor to create files marked up with an extensive set of formatting codes developed by the Government Printing Office. These codes (called "locator codes") give authors a great deal of power and flexibility, but they are entirely print-focused and difficult for untrained readers to understand. Because the locator codes are print-based, there is a high degree of variation in how they are used to produce identical output on the page. It is not uncommon for new attorneys in the SLC to spend months memorizing these codes during training.

In addition, the editing environment is entirely text-based, so a print step is required to see whether the document has been formatted correctly. An incorrectly formatted document generally cannot be printed by the composition program; instead, an obscure error code is inserted into the document itself, which the author must interpret and correct in order to complete the print process.

Once legislation has been drafted, a Senator may introduce it on the floor of the Senate. On introduction, it is sent to GPO for official printing and to the Library of Congress for archiving and reference. After passing the Senate, a bill is transmitted to the House of Representatives for consideration. After passage, other offices in the Legislative and Executive branches update the various codes, regulations, and laws that the legislation amends.

While Congress is in session over 200 days a year, the majority of legislation is passed in the final weeks and days of sessions. This race at the finish puts a great deal of stress on the offices and staffs of the Senate. The system is on occasion jury-rigged in order to make these tight deadlines, but many years of experience have smoothed out the frictions.

Using structured documents for legislation yields many benefits, including:

  • Improved reuse of content across legislation

  • Delivery in multiple formats

  • Improved search and retrieval

  • Improved accuracy and quality of drafting

  • Automatic designation of enumerators

  • Automatic generation of text as either shortcuts or standard templates

  • Improved table of contents generation

  • Automatic tracking of cross-references

1.1.2. The Congressional Legislative DTDs

Legislation is, for the most part, strictly structured. However, in the 214-year history of the Senate, styles have changed, and a great many exceptions, both intentional and accidental, have been made in the laws. In order to capture this variability, creating a schema that is both functional and simple in the common cases and flexible for the unusual cases remains an ongoing challenge.

Legislative documents are generally hierarchical, similar to an outline. In the general case, the body of a piece of legislation is organized into a named set of levels that describe how its provisions nest. These body levels (from highest to lowest) are named in the DTD as:

  • section

  • subsection

  • paragraph

  • subparagraph

  • clause

  • subclause

  • item

  • subitem

The only exception is that a paragraph may appear directly within a section when no subsection is present. Many bills consist simply of sections, but others are organized into "big levels" that group the sections. These big levels may contain different children, as described in the following table. Note that the arrangement of big levels is not unique: for example, a chapter may contain a part, or a part may contain a chapter, depending on the style of the particular law being referenced.

If a big level is...    The next level may be...
division                section, subdivision, title
title                   section, subtitle, chapter, part
chapter                 section, subchapter, part
part                    section, subpart, chapter
subtitle                section, chapter, part
subchapter              section, part
subpart                 section, chapter

Table 1. Elements allowed after "big" levels

Sometimes this grouping is done for clarity, but often it results from the practice of combining separate measures into a single measure for passage. Such composition can change all the body levels of a portion: for example, taking a measure written in sections as a stand-alone bill and placing it within a particular section of another bill forces all of its levels down one step in the hierarchy. Because legislation is drafted with named internal references such as "for the purposes of paragraph (2)", every such reference must then be updated (changing "paragraph (2)" to "subparagraph (ii)", for example), which can require extensive editing.

Each level has several parts: an enumerator, a header, and text content. Depending on the context, each of these parts may be required, optional, or not allowed. The text content contains other markup, for either style or added meaning. After the text content, sublevels (the next level down) are allowed, with flush-left text intermixed.

The most common form of additional information is a reference to another document or to another part of the same document. Both internal and external cross-reference tags are available to support hyperlinking. Because the definition of terms is often critical in law, additional tags are provided to identify defined terms.

The majority of legislation contains quoted material. Generally these quotes are themselves legislative language: changes to the U.S. Code or other laws. These legislative quotes contain the same structures as stand-alone legislation. However, the text being altered may be drafted in a different style for historical or legislative reasons. Heavily amended legislation may be written in several styles, depending on the aesthetic judgment of those writing each amendment and the styles in fashion at the time. While most style issues relate to numbering styles or indentation of text, older legislation may also differ in line breaking or in the order of the enumerator and header within each level.

2. The Users

2.1. User Requirements

An extensive set of user requirements was gathered before development of LEXA began. Beyond the many functional requirements, some vital non-functional requirements emerged. The SLC's primary concern is that a new system have as little impact as possible on the accuracy and speed with which the office meets the needs of its clients. While the legacy editing environment has many flaws, it is a system the SLC has mastered. A new system must therefore be easy for existing staff to use and easy for both existing and new staff to learn.

The degree of variability in legislation makes it important that any automation within LEXA be controlled by the drafter. Previous bad experiences with other tools have made the SLC very wary of tools that create more work than they save. While automation is considered helpful, an oft-repeated requirement was that nothing should happen without giving the drafter an opportunity to easily override the application's choices.

The amount of XML training required is a strong concern as well. The need for SLC attorneys and staff to learn locator codes is really a side effect of the office's speed and accuracy requirements and of the state of technology more than 15 years ago, when the legacy system was implemented. Given the current state of XML authoring tools, not having to understand the underlying technical details of tagging was considered a reasonable expectation. That is not the same as not understanding the document structure; drafters are experts in the way legislation is structured. But knowing the legislative structure is different from having to diagnose a misplaced angle bracket or work out which elements are required for a document to be valid XML.

An important issue uncovered during the requirements analysis was that SLC drafters work with documents in a particular way: most documents are created from parts of several previous documents and then edited into proper form. Most of the XML tools that had been reviewed were focused on creating content from scratch. When these tools were presented to the SLC as prototypes for review, the drafters immediately started rearranging the document's content, a task that structured document editors find very challenging. Only after this distinction was recognized did a solution acceptable to the SLC reviewers emerge. The SLC's working style runs somewhat counter to the highly structured nature of the documents required by the legislative DTDs and is closer to the working style of a word processor.

2.2. Issues with other XML editing approaches

Before deciding on the techniques used in LEXA, several alternatives were considered and rejected.

2.2.1. External coders

Many of the other offices that produce documents for GPO use a dedicated coder approach, in which a single person or a small group is expert in the current GPO coding system and has a background in printing. The coder is generally not a subject matter expert and has no input on the content of the documents.

The SLC rejected this approach years ago because of its quick turnaround and accuracy requirements: clients often require changes literally minutes before legislation is brought to the floor. The additional step of preparing documents for print, and the additional review of coded documents to guard against transcription errors, were deemed unacceptable.

2.2.2. XML training for drafters

The most common approach would be to train drafters in XML. This is analogous to current practice in the SLC, although modern XML editors are much easier to use than the legacy tools.

Training in XML did not offer the SLC enough benefit to justify it. The drafters wanted to move beyond having to understand the technology, rather than trade one coding system for another.

2.2.3. Translation software from word processors

Several products offer translation from word processor formats into XML. The representative products examined tended to suffer from the same set of problems, stemming from the high complexity of the DTD: it is very difficult to ensure that the translation is correct in all cases, and any repairs to the documents beyond the simplest require knowledge of the underlying tag structures. Further, the translation process was the core of each product and could not easily be customized to support the specialized needs of this application.

An additional issue is that all the products examined required a constrained way of authoring documents, generally requiring the use of defined styles. For example, in Microsoft Word there are at least four different ways to indent a paragraph to a particular position. They all look the same on the screen and on the page, yet translation software often translated the different methods differently. Given the current practice of the SLC drafters, it was felt that much of the current staff would need substantial training in Word or WordPerfect, in addition to training in XML basics, to be able to correct document translation errors.

3. Technical Solutions

The solution selected was an extensive customization of Corel's XMetaL XML editor. However, to meet the ease-of-use requirements, the decision was made to adopt a different paradigm from the traditional XML tag-insertion model. The Legislative Editing in XML Application, or LEXA, appears to authors to be a standard word processor. Common word processor actions such as backspace, delete, cut, and paste work from any portion of the document to any other portion. At no time does the editor refuse to perform an action because the document schema does not allow an intermediate editing state.

In addition to these changes, LEXA also includes extensive data entry aids and functions that, while often challenging to implement, were not beyond the usual customizations of an XML application.

3.1. Structured vs. non-structured editing

The business need for structured documents and the drafters' requirement for free-form editing have both been discussed above. To satisfy both requirements, the decision was made to create an editing DTD that selectively relaxes the most onerous requirements of the full legislative DTD (referred to here as the exchange DTD). These relaxations allow drafters to spend the overwhelming majority of their time working in a free-form environment while preserving the ability to produce fully compatible XML documents.

3.1.1. Key relaxation steps of the DTD

In order to support the requirements for free-form editing, two key changes were made to create the editing DTD:

  • Inside each body level, enumerators, headers, and text content were made repeatable and optional in any order

  • Body levels were made peers of each other instead of nested

These two changes provide all the flexibility required for editing with minimal risk of being unable to transform the document correctly back to the exchange DTD. They allow the levels of the hierarchy to be placed in order while each retains its identity, and, within a level, they allow the intermediate steps of document repair to be carried out without deactivating and reactivating the editor's schema checking.

3.1.2. Editing - Exchange transformations

A major concern about using an editing DTD was ensuring that the conversions are transparent and that no editing-DTD documents escape the SLC into the legislative process. To guarantee this, a document is transformed into the editing format when it is opened in LEXA and into the exchange format when it is saved. The editing-format document is never used outside the running LEXA process.

This does affect the performance of some functions, such as translation into HTML or printed format, but it was considered a reasonable trade-off against the risk of releasing a document that looks almost like a valid exchange document yet breaks all the tools in other offices.

3.1.2.1. Translation to editing

Translating from a strict content model to a less strict one is generally easy. Given an XML fragment[1] such as


    <subsection><enum>(b)</enum><header>REMOVAL OF CAP ON AMORTIZABLE
    BASIS</header>

        <paragraph><enum>(1)</enum><text>Section 194 of the Internal
        Revenue Code of 1986 is amended by striking subsection
        (b)</text>

            <subparagraph><enum>(A)</enum><text>Section 267 of the
            Internal Revenue Code of 1986 is amended by striking
            subsection (q)</text></subparagraph>

            <subparagraph><enum>(B)</enum><text>Section 254 of the
            Internal Revenue Code of 1986 is amended by striking
            paragraph (1)</text></subparagraph>

        </paragraph>

    </subsection>

gets translated into

    <subsection><enum>(b)</enum><header>REMOVAL OF CAP ON AMORTIZABLE
    BASIS</header></subsection>

    <paragraph><enum>(1)</enum><text>Section 194 of the Internal
    Revenue Code of 1986 is amended by striking subsection
    (b)</text></paragraph>

    <subparagraph><enum>(A)</enum><text>Section 267 of the
    Internal Revenue Code of 1986 is amended by striking
    subsection (q)</text></subparagraph>

    <subparagraph><enum>(B)</enum><text>Section 254 of the
    Internal Revenue Code of 1986 is amended by striking
    paragraph (1)</text></subparagraph>

Because the internal content of each level is unchanged, it needs no translation. Only the levels themselves change: since they are no longer nested, an end tag is generated for each level immediately after its own content, and the end tags that closed the nested levels are removed.
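As a rough illustration of this nested-to-flat direction, the following Python sketch uses the standard `xml.etree.ElementTree` module. It is not LEXA's code (which ran inside XMetaL); the function names `flatten` and `to_editing` and the container element are hypothetical, and the level names come from Section 1.1.2. The sketch assumes there is no flush-left text after sublevels, because it would move such text ahead of the sublevels.

```python
import xml.etree.ElementTree as ET

# Body level names from the legislative DTD (Section 1.1.2).
LEVELS = {"section", "subsection", "paragraph", "subparagraph",
          "clause", "subclause", "item", "subitem"}


def flatten(level, body):
    """Append `level` to `body`, followed by its nested levels as peers."""
    peer = ET.SubElement(body, level.tag, level.attrib)
    peer.text = level.text
    for child in level:
        if child.tag in LEVELS:
            flatten(child, body)   # a sublevel becomes the next peer
        else:
            peer.append(child)     # enum, header, text, ... are kept unchanged


def to_editing(exchange_root):
    """Build the flat editing form of a container of nested levels."""
    body = ET.Element(exchange_root.tag, exchange_root.attrib)
    for level in exchange_root:
        flatten(level, body)
    return body
```

Applied to the fragment above (wrapped in a container element), `to_editing` yields four peers in document order: the subsection, the paragraph, and the two subparagraphs.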

3.1.2.2. Editing to Exchange translation

Converting from the less restrictive editing DTD to the more restrictive exchange DTD is more challenging. The transformation engine must try to fit each element into the exchange document while automatically making repairs to create the proper context.

Document repairs are necessary where required markup is missing, where markup exists that should not (rogue markup), or where markup is in the wrong order relative to its siblings. Missing required markup can simply be inserted. Rogue and misplaced markup, however, cannot be removed or moved without confirmation from the author unless it has neither content nor attribute values. Instead of removing or moving such markup, additional markup is added to make the document valid. This ensures that no data is lost in the translation while still making the document valid against the exchange DTD. Repair actions are recorded in the document in a way that is invisible to an exchange DTD processor, so a repaired document may be exchanged with any other processor that understands the exchange DTD. If the document is opened again in LEXA, the repairs are reversed, but only if the affected markup has not been modified in the interim.
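The paper does not say how repairs are recorded. One plausible mechanism, assumed here purely for illustration, is a processing instruction placed before each inserted element: DTD content models do not constrain processing instructions, so the marker does not affect validity against the exchange DTD. The sketch uses Python's `xml.dom.minidom`; the PI target `lexa-repair` and all function names are hypothetical, and only the "insert missing required markup" repair is shown.

```python
from xml.dom import minidom

REPAIR_PI = "lexa-repair"  # hypothetical processing-instruction target


def insert_required(doc, parent, tag, before=None):
    """Insert an empty required element, preceded by a marker PI."""
    marker = doc.createProcessingInstruction(REPAIR_PI, "inserted")
    element = doc.createElement(tag)
    parent.insertBefore(marker, before)   # before=None appends at the end
    parent.insertBefore(element, before)
    return element


def walk(node):
    """Yield every descendant node in document order."""
    for child in node.childNodes:
        yield child
        yield from walk(child)


def reverse_repairs(doc):
    """On reopening: undo marked insertions that are still empty and bare."""
    markers = [n for n in walk(doc)
               if n.nodeType == n.PROCESSING_INSTRUCTION_NODE
               and n.target == REPAIR_PI]
    for marker in markers:
        element = marker.nextSibling
        untouched = (element is not None
                     and element.nodeType == element.ELEMENT_NODE
                     and not element.hasChildNodes()
                     and not element.hasAttributes())
        if untouched:   # edited insertions are left exactly as they are
            parent = marker.parentNode
            parent.removeChild(element)
            parent.removeChild(marker)
```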

Translation from a loose content model to a strict one can be achieved using a variety of methods. Two approaches were considered for the LEXA project. The first approach involved hierarchical processing, and the second involved sequential processing.

In a hierarchical process, the Document Object Model (DOM) is used to test and modify the source document until it conforms to the exchange DTD. Since a content model declaration concerns only a specific node and its immediate children, any node whose immediate children conform to the exchange DTD is said to be exchange-valid. It follows that if all the nodes in a document are exchange-valid, the document itself is exchange-valid. Therefore, to make a document exchange-valid, a small validity enforcement routine is applied recursively to every node in the document. The routine is applied from the leaf nodes up to the root so that content modifications have limited impact on the rest of the document tree: a repair made to a specific node affects only its siblings and its parent.
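A minimal sketch of the bottom-up enforcement routine, assuming content models written as Python regular expressions over the sequence of child element names. The three models are invented simplifications of the exchange DTD, and the only repair implemented is inserting a missing required `enum`; a real routine would also need repairs for rogue and misplaced markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified content models: regexes over
# space-terminated child tag names.
MODELS = {
    "subsection": r"enum (header )?(text )?(paragraph )*",
    "paragraph": r"enum (header )?(text )?(subparagraph )*",
    "subparagraph": r"enum (header )?(text )?",
}


def is_valid(node):
    """Check only this node's immediate children against its model."""
    model = MODELS.get(node.tag)
    if model is None:
        return True   # elements without a model here are treated as valid
    children = "".join(child.tag + " " for child in node)
    return re.fullmatch(model, children) is not None


def repair(node):
    """The single repair sketched: insert a missing required <enum> first."""
    if node.tag in MODELS and (len(node) == 0 or node[0].tag != "enum"):
        node.insert(0, ET.Element("enum"))


def enforce(node):
    """Post-order: make children exchange-valid first, then this node."""
    children_ok = all([enforce(child) for child in list(node)])
    if not is_valid(node):
        repair(node)   # touches only this node's own child list
    return children_ok and is_valid(node)
```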

In a sequential process, the document is filtered through a Simple API for XML (SAX) handler that uses a set of finite state machines (FSMs), one for each element. As content is handled, the state of the document is updated and the appropriate FSM is queried to determine what markup may appear next. If the next SAX token fails this test, the FSMs are used to determine the interim markup required (new siblings, new descendants, or the closure of ancestors) to make the next token valid.
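The sequential approach can be sketched with Python's `xml.sax` for input and an `ElementTree.TreeBuilder` for output. Here the per-element state machines are reduced to a hypothetical table of which levels may open directly inside each open level, and only one kind of interim markup, the closure of ancestors, is generated: a peer level's end tag is deferred, and open levels are closed only when the next level cannot nest inside them. Inserting new siblings or descendants for skipped levels is left out.

```python
import xml.sax
import xml.etree.ElementTree as ET

LEVELS = ["section", "subsection", "paragraph", "subparagraph",
          "clause", "subclause", "item", "subitem"]

# Hypothetical, simplified transition table: levels that may open directly
# inside each open level. Unlisted elements (such as the root) accept any level.
ALLOWED = {
    "section": {"subsection", "paragraph"},
    "subsection": {"paragraph"},
    "paragraph": {"subparagraph"},
    "subparagraph": {"clause"},
    "clause": {"subclause"},
    "subclause": {"item"},
    "item": {"subitem"},
    "subitem": set(),
}


class Nester(xml.sax.ContentHandler):
    """Rebuild nesting from peer levels by deferring and inserting end tags."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open = []    # the root and the levels still open in the output
        self.depth = 0

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 1:
            self.open.append(name)
        elif self.depth == 2 and name in LEVELS:
            # Close ancestors until the innermost open element accepts `name`.
            while name not in ALLOWED.get(self.open[-1], set(LEVELS)):
                self.builder.end(self.open.pop())
            self.open.append(name)
        self.builder.start(name, dict(attrs.items()))

    def endElement(self, name):
        if self.depth == 1:
            while self.open:          # close remaining levels, then the root
                self.builder.end(self.open.pop())
        elif not (self.depth == 2 and name in LEVELS):
            self.builder.end(name)
        # A peer level's own end tag is deferred so later peers can nest in it.
        self.depth -= 1

    def characters(self, content):
        self.builder.data(content)


def to_exchange(editing_xml):
    handler = Nester()
    xml.sax.parseString(editing_xml.encode("utf-8"), handler)
    return handler.builder.close()
```

For the flat editing fragment in Section 3.1.2.1, this restores the original nesting: the paragraph goes back inside the subsection, and both subparagraphs go inside the paragraph.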

The final conversion engine primarily uses the sequential method, but certain operations require more context than SAX provides, resulting in a hybrid approach.

3.2. Automatic handling of delete and backspace

When the drafters were shown a standard XML editor in a tags-off view, the comment heard most often was "why did it just do nothing but beep at me?" after they tried to use the delete or backspace key. The beep was generally caused by an attempt to remove a required element: for example, trying to remove an entire level from the document with backspace. The level would look like

<paragraph><enum>(2)</enum><text>A paragraph to be removed.</text></paragraph>

and the enum element is required by the DTD; the text element is optional.

The drafter would delete the text with backspace and keep pressing backspace. The editor would remove the content of the enum but not the enum element itself, leaving a blank line in the document and producing a great deal of beeping.

A similarly confusing result would occur when invisible items such as tags or multiple spaces were being deleted. The user would hit the backspace key, but nothing would change on the tags-off display. Invisibly, the editor had removed a set of tags, but that had no effect on the screen.

While these are reasonable results for a tag-visible editor, the no-tag view made it hard for the drafter to understand what the editor was complaining about. In addition, the complex structure of the DTD makes it difficult to deduce the correct solution.

Additionally, the user expectations that were set by the styled, no tag environment made this behavior particularly annoying. Since the display looked so much like a word processor, it should behave like a word processor, particularly in this most basic of tasks.

The goal became to make backspace and delete work consistently and in the way users expect, including across element boundaries. This may involve skipping over tags that cannot be deleted in order to delete the next character, transparently repositioning the insertion point, or other actions.

When a backspace or delete (a removal event) is triggered, it is expected that exactly one visible token is removed from the document. For a backspace, the removed token is to the left of the cursor, and for a delete, the token is to the right. In the case of a backspace, it is also expected that the cursor move one token to the left.

A token is defined as any visibly discernible character, a sequence of pre-generated characters, or a line break. A visibly discernible character differs from a single source character because XMetaL displays multiple white space characters as a single white space character, so a removal event must treat a sequence of white space in the source as a single token. Pre-generated CSS characters are atomic by default, because the CSS engine cannot partition a pre-generated sequence so that only part of it is rendered.
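To make the token definition concrete, here is a Python sketch that models visible content as a list of `(text, is_generated)` segments, a hypothetical stand-in for the rendered document. Adjacent ordinary segments are joined before tokenizing, so a run of white space that spans a tag boundary still counts as one token, while each pre-generated segment stays atomic. The `backspace` and `delete` helpers then remove exactly one token, moving the cursor only for backspace.

```python
import re

# One token: a line break, a run of other white space (displayed as a
# single space), or any other single character.
TOKEN = re.compile(r"\n|[^\S\n]+|.")


def tokens(segments):
    """Tokenize visible content given as (text, is_generated) segments."""
    out, buffer = [], ""
    for text, generated in segments:
        if generated:
            out.extend(TOKEN.findall(buffer))
            buffer = ""
            out.append(text)   # pre-generated CSS text is one atomic token
        else:
            buffer += text     # joining runs collapses white space across tags
    out.extend(TOKEN.findall(buffer))
    return out


def backspace(toks, cursor):
    """Remove exactly one token left of `cursor` (a token index)."""
    if cursor == 0:
        return toks, cursor
    return toks[:cursor - 1] + toks[cursor:], cursor - 1


def delete(toks, cursor):
    """Remove exactly one token right of `cursor`; the cursor stays put."""
    return toks[:cursor] + toks[cursor + 1:], cursor
```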

If the cursor is immediately adjacent to the token to be removed, the token is removed, and for a backspace event, the cursor is moved as well.

If markup exists between the current cursor position and the token to be removed, a removal event must handle that underlying XML markup. How this markup is dealt with depends upon its content, its parent content model, and its style. Markup can only be removed when all three dependencies indicate that a removal is safe. Otherwise, the markup must be retained. Any node corresponding to the intervening markup that meets any of the following conditions must be retained:

  • The node contains child nodes or descendants that have one or more tokens

  • The parent of the node requires it to be present

  • The CSS rule for the node indicates pre-generated text that is not a candidate to be removed by the event

In the case where a removal event is triggered on a selection of content, any tokens within the selection are removed. The three conditions above are then applied to any nodes corresponding to markup contained in the selection.
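The three retention conditions combine into a single predicate. In this Python sketch the `REQUIRED` and `CSS_GENERATED` tables are hypothetical stand-ins for information LEXA would take from the DTD and the style sheet, and the `removes_generated` flag represents whether the current event is removing the node's pre-generated text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for DTD and CSS information.
REQUIRED = {"paragraph": {"enum"}, "subparagraph": {"enum"}}
CSS_GENERATED = {"quote"}   # elements whose CSS rule inserts generated text


def must_retain(node, parent, removes_generated=False):
    """Apply the three retention conditions to markup crossed by a removal."""
    if "".join(node.itertext()):
        return True   # 1. the node or its descendants still hold tokens
    if node.tag in REQUIRED.get(parent.tag, set()):
        return True   # 2. the parent's content model requires the node
    if node.tag in CSS_GENERATED and not removes_generated:
        return True   # 3. its generated text is not removed by this event
    return False
```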

3.3. Cross container cut and paste

The majority of the work done by drafters involves composing new documents from selections from old ones. Often this requires taking a series of levels (such as paragraphs) and the introductory language at the end of the level above the series. A simplified example might be:

(c) CONSULTATION.— In developing the plan, the Council shall consult with—

(1) the Committee on Earth and Environmental Sciences;

(2) other appropriate Federal agencies; and

(3) members of the public.

A drafter may wish to take the language from "the Council shall consult with" onward and copy and paste it into a new subsection that deals with reporting to Congress. The editing XML for this example is:

       <subsection><enum>(c)</enum><header>
       CONSULTATION</header><text>In developing the plan, the Council
       shall consult with--</text></subsection>

       <paragraph><enum>(1)</enum><text>the Committee on Earth and
       Environmental Sciences;</text></paragraph>

       <paragraph><enum>(2)</enum><text> other appropriate Federal
       agencies; and</text></paragraph>

       <paragraph><enum>(3)</enum><text> members of the
       public. </text></paragraph>

The functionality required would be to make a selection starting from after the comma in the subsection through to the end of the third paragraph.

Because XMetaL does not allow selections that cross element boundaries, and instead automatically extends such a selection, an alternative "marker" technique is used:

  • the drafter places the cursor at one end of the desired selection,

  • selects the "mark defined text[2] start",

  • moves the cursor to the other end of the desired selection, and

  • performs the operation on the selection (cut, copy, etc.).

Once a possibly unbalanced selection has been made, two issues must be addressed to perform an arbitrary copy and paste: making sure the clipboard contents are valid in the new location, and making sure all the tags balance correctly. For a cut, the material must also be removed and the remaining tags correctly balanced.

When a cut or copy operation is invoked, the application divides the clipboard material into three sections (all optional): an unbalanced beginning, a balanced middle, and an unbalanced end. Since the use of the editing DTD makes all levels peers, the balanced middle text portion may be placed without change in any location in the document.
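Assuming the selection is available as a flat list of `(kind, value)` tokens, a hypothetical representation in which kind is `"start"`, `"end"`, or `"text"`, the three-way split can be computed in one pass. The Python sketch assumes the selection is contiguous within a well-formed document, so every end tag whose start lies before the selection precedes every start tag whose end lies after it.

```python
def split_selection(tokens):
    """Split selection tokens into unbalanced beginning, balanced middle,
    and unbalanced end (each possibly empty)."""
    unmatched_starts = []   # indices of start tags not closed in the selection
    last_orphan_end = -1    # last end tag whose start precedes the selection
    for i, (kind, _) in enumerate(tokens):
        if kind == "start":
            unmatched_starts.append(i)
        elif kind == "end":
            if unmatched_starts:
                unmatched_starts.pop()
            else:
                last_orphan_end = i
    end_at = unmatched_starts[0] if unmatched_starts else len(tokens)
    return (tokens[:last_orphan_end + 1],
            tokens[last_orphan_end + 1:end_at],
            tokens[end_at:])
```

For the example selection, the beginning ends at the orphaned `</subsection>`, the middle holds the three paragraphs, and the end is empty.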

So, given the example above, the clipboard would be:

beginning

    the Council shall consult with--</text></subsection>

middle

    <paragraph><enum>(1)</enum><text>the Committee on Earth and
    Environmental Sciences;</text></paragraph>

    <paragraph><enum>(2)</enum><text> other appropriate Federal
    agencies; and</text></paragraph>

    <paragraph><enum>(3)</enum><text> members of the
    public. </text></paragraph>

end (empty)

When this clipboard is pasted at the end of the text of

    <subparagraph><enum>(1)</enum><text>In responding to public
    comment </text></subparagraph>

the resulting structure would be

    <subparagraph><enum>(1)</enum><text>In responding to public
    comment the Council shall consult with--</text></subparagraph>

    <paragraph><enum>(1)</enum><text>the Committee on Earth and
    Environmental Sciences;</text></paragraph>

    <paragraph><enum>(2)</enum><text> other appropriate Federal
    agencies; and</text></paragraph>

    <paragraph><enum>(3)</enum><text> members of the
    public. </text></paragraph>
 

Note that the orphaned close tag of the subsection has become the close tag of the subparagraph. Note also that the paragraphs remain paragraphs, so the original hierarchy between the introductory text and the listed items is lost. This is a deliberate choice made in response to user feedback: since LEXA cannot always know the intent of the drafter's operation on the text, it makes the minimum changes to the text needed to keep the document valid.
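Modeling the flat editing body with Python's `xml.etree.ElementTree`, the paste step might look like this hypothetical sketch. The beginning's orphaned close tags are simply not copied: its text joins the target's `text` element, so the target's own close tags now end it, which matches the effect described above. The balanced middle levels are inserted unchanged as following peers, which the flat editing DTD permits.

```python
import xml.etree.ElementTree as ET


def paste(body, target, beginning_text, middle_levels):
    """Paste clipboard parts at the end of `target`'s text in a flat body."""
    text = target.find("text")
    if len(text):                 # text has inline markup: append after it
        last = text[-1]
        last.tail = (last.tail or "") + beginning_text
    else:
        text.text = (text.text or "") + beginning_text
    position = list(body).index(target) + 1
    for offset, level in enumerate(middle_levels):
        body.insert(position + offset, level)   # peers need no re-nesting
```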

3.4. Structure and Style Verification

Working in the relaxed editing environment without tags leaves open the possibility of errors in the XML document that may need attention by the drafter. These may be as simple as an empty set of tags that cannot be seen, a tag required by the exchange DTD that has been deleted, or a set of levels that are out of order and will require repair. In order to alert drafters that something needs to be reviewed, an interactive verification mechanism was created.

This component became known as the "Validation Manager," a somewhat misleading name, since the one aspect of the document it does not check is the document's validity against any of the DTDs.

The Validation Manager works much like a word processor's spell checker. The engine scans the document, or a selected fragment, identifies potential problems, and displays each one to the user along with possible solutions. The module goes through several steps to accomplish this:

  • Matching of document content to rules

  • Solution lookup and display

  • User input

  • Document correction/modification based on user input

The user interface includes a list of problems, a description of each problem, before and after views that show at a glance how a suggested fix will change the document, a set of suggested repair strategies, and options to fix the current problem, fix all similar problems, skip, or stop the validation check.
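The four steps and the per-problem options can be read as a single loop. In this hypothetical sketch, a rule is any Python function that returns `Problem` objects, a repair is a function that modifies the document in place, and `ask_user` stands in for the interactive dialog. Two simplifications are assumptions rather than documented behavior: each rule's problems are collected before any repair is applied, and "fix all similar problems" is taken to mean applying the same repair to every later problem reported by the same rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Problem:
    rule_name: str
    description: str                                   # text shown to the drafter
    repairs: List[Callable[[object], None]] = field(default_factory=list)


def validate(doc, rules, ask_user):
    """Apply each rule to `doc` and let the drafter decide on every problem.

    ask_user(problem) stands in for the dialog and returns (action, repair),
    where action is "fix", "fix_all", "skip", or "stop" and repair is an index
    into problem.repairs. The document changes only through a chosen repair.
    """
    fix_all = {}                            # rule name -> repair chosen via "fix all"
    for rule in rules:
        for problem in rule(doc):           # 1. match document content to rules
            if problem.rule_name in fix_all:
                problem.repairs[fix_all[problem.rule_name]](doc)
                continue
            action, repair = ask_user(problem)  # 2-3. show repairs, get input
            if action == "stop":
                return
            if action == "skip":
                continue
            problem.repairs[repair](doc)    # 4. correct the document
            if action == "fix_all":
                fix_all[problem.rule_name] = repair
```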

Most of the processing done by this module involves applying a set of rules to the current document and then suggesting fixes for any rules that were broken. A rule definition may locate potential problem areas in the document using one or more of three methods:

  • XPath

  • Regular expression(s)

  • Perl code
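As an illustration of an XPath-based rule, the sketch below looks for the first kind of problem mentioned above: an empty set of tags that the drafter cannot see. It uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath, so the emptiness test is done in Python rather than in an XPath predicate. The function name and the choice of `<text>` as the target element are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def find_empty(root, path=".//text"):
    """Return elements matched by `path` that have no children and no visible text.

    `path` uses ElementTree's XPath subset; empty <text> elements are only an
    example target. Whitespace-only content counts as empty.
    """
    return [el for el in root.iterfind(path)
            if len(el) == 0 and not (el.text or "").strip()]


doc = ET.fromstring(
    "<section><subsection><enum>(a)</enum><text/></subsection>"
    "<subsection><enum>(b)</enum><text>In general.</text></subsection></section>")
print(len(find_empty(doc)))  # 1
```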

The rules encapsulate four functions for each kind of issue: finding the problem, identifying it to the drafter in an understandable way, identifying possible repairs, and previewing the repair for the drafter. For Perl code rules, these functions are defined by the programmer. For XPath and regular expression rules, the Validation Manager generates the solution preview by applying and then undoing a possible repair.
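The four functions can be grouped into one object per rule. This hypothetical sketch models only a regular-expression rule (XPath and Perl code rules are not shown), and `Buffer` is a stand-in for the editor's document with an undo stack. As described above, the before and after views come from applying a repair and then undoing it. The repeated-word pattern is an arbitrary example, not one of the Senate's rules.

```python
import re
from dataclasses import dataclass
from typing import List


class Buffer:
    """Hypothetical editable document text with an undo stack."""

    def __init__(self, text):
        self.text = text
        self._history = []

    def replace(self, start, end, new):
        self._history.append(self.text)
        self.text = self.text[:start] + new + self.text[end:]

    def undo(self):
        self.text = self._history.pop()


@dataclass
class RegexRule:
    pattern: str             # 1. finds the problem
    message: str             # 2. identifies it to the drafter
    replacements: List[str]  # 3. possible repairs, as Match.expand templates

    def find(self, buf):
        return list(re.finditer(self.pattern, buf.text))

    def describe(self, match):
        return f"{self.message}: {match.group(0)!r}"

    def repair(self, buf, match, k):
        buf.replace(match.start(), match.end(), match.expand(self.replacements[k]))

    def preview(self, buf, match, k, context=20):
        """4. Return (before, after) views by applying repair k and undoing it."""
        lo = max(0, match.start() - context)
        new = match.expand(self.replacements[k])
        before = buf.text[lo:match.end() + context]
        self.repair(buf, match, k)                        # apply ...
        after = buf.text[lo:match.start() + len(new) + context]
        buf.undo()                                        # ... then undo
        return before, after


buf = Buffer("the Council shall shall consult")
rule = RegexRule(r"\b(\w+) \1\b", "Repeated word", [r"\1"])
m = rule.find(buf)[0]
print(rule.describe(m))         # Repeated word: 'shall shall'
print(rule.preview(buf, m, 0))  # ('the Council shall shall consult', 'the Council shall consult')
```

Matches returned by `find` refer to the text as it was when the rule ran; a fuller version would rescan after each repair.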

There are two types of ignore operation: ignoring a rule entirely for the current run of the Validation Manager, or ignoring a single instance of a broken rule during the current validation pass ("Skip"). A third type, planned for a later release, would allow one particular instance of a broken rule to be ignored for the rest of the editing session without ignoring the entire rule. Support for remembering which rules have been ignored across editing sessions is also planned.

The canonical example of a Validation Manager rule is the one that examines the arrangement of levels to make sure that the hierarchy is in the correct order. Recall that in the editing DTD each level is a peer of the other levels, so this check compares each level with its previous sibling. When an out-of-order level is detected, two repairs are suggested: moving the level to the same position in the hierarchy as the previous level, or moving it to one step below the previous level. Depending on how much repair is needed, a drafter may either use the validator to fix the level (which may expose the same problem on the next level) or stop the validator and correct the document manually.
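The level-order check might look like the following sketch. Two assumptions are made here: the list of level names is the section-to-subitem sequence commonly used in federal bills, included for illustration rather than taken from the Senate DTDs, and "out of order" is taken to mean a level more than one step below its previous sibling. The two names returned for each problem correspond to the two suggested repairs.

```python
# Assumed order of levels, outermost first (illustrative, not from the DTDs).
LEVELS = ["section", "subsection", "paragraph", "subparagraph",
          "clause", "subclause", "item", "subitem"]


def check_level_order(levels):
    """Yield (index, repairs) for each level that skips a step in the hierarchy.

    levels  -- element names of the flat, peer-level sequence in the editing DTD
    repairs -- [same level as the previous sibling, one step below it]
    """
    for i in range(1, len(levels)):
        prev = LEVELS.index(levels[i - 1])
        cur = LEVELS.index(levels[i])
        if cur > prev + 1:  # jumped more than one step down from the previous sibling
            yield i, [LEVELS[prev], LEVELS[prev + 1]]


print(list(check_level_order(["section", "subsection", "clause", "subclause"])))
# [(2, ['subsection', 'paragraph'])]
```

Whichever repair the drafter accepts for the clause, running the check again flags the subclause, which now sits more than one step below its previous sibling; this is the cascading effect described above.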

The same mechanism is also used to check for stylistic errors. These might include an out-of-order enumeration (which is sometimes an error and sometimes deliberate for historical purposes) or the wrong length of dash in a series of numbers. Again, every correction is presented to the drafter as a suggestion to consider; no change occurs without explicit permission.
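Stylistic rules can follow the same suggest-only pattern. The sketch assumes, for illustration only, that a numeric range should use an en dash (U+2013) rather than a hyphen or an em dash, and it handles only arabic enumerators such as "(1)". The function names are hypothetical. Both functions return suggestions and leave the text untouched, so the drafter decides whether an unusual enumeration is an error or deliberate.

```python
import re

# Assumption for illustration: a range of numbers should use an en dash
# (U+2013), so a hyphen or em dash (U+2014) between numbers is flagged.
DASH_RANGE = re.compile(r"\b(\d+)[-\u2014](\d+)\b")


def dash_suggestions(text):
    """Return (position, found, suggested) tuples; `text` is not modified."""
    return [(m.start(), m.group(0), m.group(1) + "\u2013" + m.group(2))
            for m in DASH_RANGE.finditer(text)]


def enum_suggestions(enums):
    """Flag arabic enumerators like "(1)", "(2)" that do not count up by one.

    Returns (index, found, expected) tuples. A gap may be deliberate, so
    these are suggestions only.
    """
    found = []
    for i in range(1, len(enums)):
        prev, cur = int(enums[i - 1].strip("()")), int(enums[i].strip("()"))
        if cur != prev + 1:
            found.append((i, enums[i], "(%d)" % (prev + 1)))
    return found


print(enum_suggestions(["(1)", "(2)", "(4)"]))  # [(2, '(4)', '(3)')]
```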

4. LEXA

The primary objective of the LEXA project was to create an environment that satisfies two requirements. First, it must produce structured legislation in a way that fits comfortably into the way that Senate Legislative Counsel drafters want to work. Second, given a complex and restrictive document schema, it must provide enough flexibility to edit documents in the manner that SLC needs.

After long discussions with the drafters and examination of various tools and solutions, we identified solutions to the major user complaints about the other XML approaches:

  • A relaxed editing DTD paired with a highly structured exchange DTD

  • Delete and backspace that work like a word processor's

  • Cut, copy, and paste functions that work like a word processor's

  • A user-friendly way to find structural issues in the XML without requiring users to understand XML structure

Implementing these solutions allows the drafters to focus on content rather than on XML. The LEXA environment is built to serve the needs of the drafters, not the needs of the document. This reduces the amount of XML they have to understand to a bare minimum: primarily the idea that there are "containers," such as a block quote or a header. This idea of containment matches a concept that already exists in the legacy editing environment (start codes and end codes). The editing actions are natural and fit the way the drafters want to work.

While the final result falls short of the original vision, LEXA comes very close. Instead of needing to understand the tag structure of an XML schema, drafters need to understand only the legal structure of legislation. This keeps the subject-matter expert immersed in the subject and not in the details of the technology.

The result is a new editing environment that the drafters want to use, rather than an application they are forced to use. This goodwill has helped us get past the setbacks that come with any new application deployment: even when flaws and misunderstandings arise, the drafters know that we are responding to their issues and their style of work instead of imposing a style dictated by the structure of the document.

Biography

Senior Systems Analyst
Office of the Secretary

Mr. Gelman is the Senior Systems Analyst for the Legislative Information System Augmentation Project in the Office of the Secretary of the United States Senate. He has more than a decade of experience in information technology and software systems development across the commercial, education, and government sectors. The problem domains he has worked on include: XML authoring environments, knowledge management, simulation, game theory, groupware, software development environments, statistical analysis, and educational software. He holds a B.S. in Mathematics/Computer Science from Carnegie Mellon University.

Systems Analyst
Office of the Secretary

Chris Ingrassia has been involved with XML technologies since 1999, including early work on XML transaction services and XML based distributed computing. He has been involved in software development and systems administration for the past 7 years.

Software Specialist
Office of the Sergeant at Arms

Roehl Sioson is a Software Specialist with the Legislative Systems Department of the United States Senate Office of the Sergeant at Arms. His primary role is to facilitate and support XML practices at the Senate. He has been involved with XML technologies since 1998, starting at SoftQuad Software, where he helped establish their XML Professional Services division.



[1] All of the XML examples have had extraneous attributes removed for readability.

[2] The term "defined text" matches the selection terminology used in the legacy drafting application.