XML 2001

Case Study: Web Delivery of Point-in-Time Legislation

Geoffrey J Nolan <gjn@turnkey.com.au>

1. Introduction

This paper follows the development of a Web-based system for delivery of point-in-time legislation. Such a system allows an end-user to navigate a selected Act, Code etc. exactly as it stood on any given day in the past. At the beginning of this project, we already had a body of about 30MB of XML legislation files, with dates and cross-references already marked. We also had a Folio CD product, whose functionality had to be preserved, and if possible enhanced.

The development schedule was extremely tight, only four months being allowed for the development of a full prototype, ready for beta testing and subsequent market evaluation. Because of the limited time, it was decided to use a small team of three programmers, one manager, and a legal expert. This proved to be one of the factors behind the ultimate success of the project.

The project called for some novel techniques, including the development of a new browser-based GUI, a new way of mapping XML to relational database tables, and a new type of late-binding link. It also raised some serious questions about the relative performance of traditional Relational Database Management Systems (RDBMS) and “native” XML object stores.

2. Point-in-Time: What and Why

It is an almost universal principle in Western legal systems that in both civil and criminal litigation, the law must be applied as of the date of the offense or point of dispute. For example, a contract hearing arising from an incident in October 1997 must be tried according to the law as it stood at that time, even if the hearing is held several years later and the relevant laws have changed substantially in the interval.

For this reason, lawyers must go to great lengths to ensure that they have access to the correct version of legislation at any given date in the past. Traditionally, significant Acts or Codes are reprinted annually, or after major amendments have occurred. The lawyer must then take into account any amendments applied since the last available reprint. This can involve extensive and time-consuming research, and also requires storing copies of all recent reprints and amending legislation.

A point-in-time delivery system would allow the researcher simply to key in the required date and scan the text of the legislation, correctly amended for that specific date. A number of attempts to create such a system have been made in several countries (e.g. USA, Britain, Germany). Some have had partial success, but none have come close to delivering the full functionality described above.

The only two fully functional systems which we are aware of were developed, completely independently, in Australia. One is of course the subject of this case study. The other was commissioned by the Tasmanian State Parliament and based on the Royal Melbourne Institute of Technology SIM database. This system has some very advanced features, rapid and flexible search facilities, and a capacity to automatically apply legislative amendments to the principal Acts. It does however require amending legislation to be drafted in a particular format, and does not (to our knowledge) fully integrate case law, commentary etc.

Our system was developed on behalf of and in conjunction with TimeBase Pty Ltd, a Sydney company specializing in the electronic publication of Australian legislation and case law. TimeBase also invented the Multi Access Layer Technology (MALT) concept, which is specifically designed to handle the complexity of large ever-changing XML datasets. A number of technologies have arisen out of this idea, one of which (MALTweb) is the subject of this paper.

The MALT system was designed from the start to incorporate past legislation, and this of course precludes any restriction on the drafting format. It also readily handles (say) legislation developed in one jurisdiction and adopted in another, or jurisdictions which do not wish to impose a fixed set of drafting rules. The downside is that the incorporation of amendments into principal legislation sometimes requires human editorial intervention. We have however developed some quite sophisticated tools for parsing the text of an amendment and modifying principal provisions accordingly. In jurisdictions where wording of amendments is reasonably consistent, this will automatically apply up to 95% of all amendments.

This paper concentrates mainly on the development of the web delivery system, which took place after the initial legislation had been marked up and incorporated into a Folio based CD product.

3. Project Goals and Basic Architecture

The development team was deliberately kept small, consisting of just three programmers. One of these was the TimeBase chief programmer, who also served as a liaison between ourselves (Turn-Key) and the client (TimeBase). Christoph Schnelle (the TimeBase CTO), whose background is in mathematics and who devised the initial MALT model, kept in close touch with the project and provided considerable technical and procedural guidance. TimeBase also supplied legal experts and product design specialists to ensure the end result met the needs of legal researchers and worked well alongside other TimeBase products.

The first project planning meeting was held in early July 2000, with a working prototype expected by the end of October. Time was very clearly of the essence!

In short, the aim of the project was to develop a web-based delivery system for point-in-time legislation, which would:

It was immediately apparent that this would require a three tier architecture:

Since the TimeBase web site already made use of a web server running ColdFusion applications, the choice for the web server middleware was clear. It was also quickly decided to support Internet Explorer and Netscape 4.0 and above. Alternatives such as Mozilla and Opera were also considered, but since their market share was below 1% there was little point spending much time on them during prototype development. In the end, a scripting bug in Explorer meant that only version 4.1 and above would run the system correctly, while Netscape had (and continues to have) problems rendering the required frames correctly.

The choice of repository software was less clear cut. Due to the unusual nature of the proposed GUI, an internal solution was considered. However, to develop our own data handling software would divert scarce resources away from the main thrust of the project. Moreover the solution would very likely be less flexible, less stable, and less maintainable than using a commercial product. There was also talk about using software such as Microsoft Exchange, but it was decided that this would raise major questions about performance and scalability. In the end the choice was between a classical RDBMS (Oracle, SQL Server), or an XML Object Store (Excelon, Tamino).

The advantages of a relational database solution are clear. The technology is mature, reliable and well supported, and performance would be more than adequate. There were however two significant drawbacks. Firstly, the XML source data would have to be heavily pre-processed before being loaded into SQL tables. Also, we considered the facilities offered by XML queries to be more flexible than SQL SELECTs, and better suited to both the overall structure of the data and our end-user requirements.

The native XML object stores seemed an excellent choice. Our data could in principle be loaded directly into such a system, and we had already established that XPath based queries would provide most if not all of our required functionality. Our main concern was the relative immaturity of the technology, with its associated reliability and performance issues. We accordingly conducted preliminary trials with both Excelon and Tamino. Unfortunately, Tamino was unable to load our (quite modest) DTD correctly, and support within Australia at that time was limited. We therefore concentrated on Excelon, which installed and tested reasonably well.

At this point the TimeBase CTO stepped in. While the techies were still in evaluation mode, he had decided that the sort of comprehensive tests we were aiming at would jeopardize the timing of the entire project. He therefore decreed that an Excelon solution would be attempted. This turned out to be an extremely good decision, but not quite in the way we expected.

4. Storage XML v Product XML

The next significant decision was whether we should base our product on our existing core XML, or whether we should pre-process that XML (which was designed to be easily maintainable by both human editors and programs) into a form more suitable for fast retrieval.

We already had a conversion regime for the Folio product, in which significant elements (acts, chapters, sections etc.) were converted into a flat sequence of MALT nodes (not to be confused with DOM nodes). This has the effect of converting the hierarchical XML structure into a linear set of discrete objects. The division of a dataset into discrete well-defined chunks is implicit in the overall MALT strategy, whereby any alteration in a chunk results in an updated chunk being added to the dataset.

Our tests had confirmed that the core XML could support XPath queries which would drive the end product. However, two considerations led us to decide on a substantial pre-processing regime. These were:

We therefore adopted a strategy whereby the core storage XML was converted into a flat XML structure. For example, consider the XML fragment:

<part><label>2</label>
<title>General Provisions</title>
<note>commenced 3 Oct 1997</note>

<section><label>27</label>
<title>Definition of department head</title>
<subsec><label>(1)</label>
<p>A <term>Head of Department</term>
    shall be taken to include an acting Head for
    the purposes of this Part.</p>

This fragment is converted into two nodes, one corresponding to the part and one to the section. The choice of which elements trigger the start of a new node (the node element set) is made by the product designers to reflect the “natural” structure of the data. In this case the system delivers one node at a time to the end user, and a legislative section was deemed a good sized chunk for this purpose.

Note that while in the original XML the content of the section also belongs to the part, the part node ends at the start of the first section thus:

<node id="2471" base="2471" level="part">
<ancestry><ancestor ref="1"/>
    . . . <ancestor ref="2471"/></ancestry>
<title class="part">Part 2 – General Provisions</title>
<body><p class="note">Editorial Note: commenced 3 Oct 1997</p>
</body></node>

This (simplified) node has four main segments:

The section then becomes a completely independent node, a sibling to the part node (and every other node) in the flattened XML, thus:

<node id="2472" base="2474" level="section">
<ancestry> . . . <ancestor ref="2471"/>
  <ancestor ref="2472"/></ancestry>
<title class="section">27. Definition of Department Head</title>
<body>
<p class="subsec">(1) A <span class="term">Head of
  Department</span> shall be taken to include an acting
  Head for the purposes of this Part.</p>
</body></node>
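The flattening step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production conversion: the node element set `NODE_ELEMENTS`, the record fields, and the starting id 2471 (chosen only to match the example above) are all hypothetical, and the sample input closes the tags that the fragment above leaves open. The sketch also assumes that node elements nest directly inside one another.

```python
import xml.etree.ElementTree as ET

# Hypothetical node element set: elements that start a new flat node.
NODE_ELEMENTS = {"part", "section"}

# The fragment from the text, with its open tags closed for parsing.
SOURCE = """
<part><label>2</label>
<title>General Provisions</title>
<note>commenced 3 Oct 1997</note>
<section><label>27</label>
<title>Definition of department head</title>
<subsec><label>(1)</label>
<p>A <term>Head of Department</term> shall be taken to include
an acting Head for the purposes of this Part.</p>
</subsec>
</section>
</part>
"""

def flatten(elem, ancestry, out, next_id):
    """Append one flat record per node element, in document order.

    next_id is a one-item list used as a mutable id counter.
    """
    node_id = next_id[0]
    next_id[0] += 1
    path = ancestry + [node_id]
    out.append({
        "id": node_id,
        "level": elem.tag,
        "ancestry": path,  # ids from the outermost node down to this one
        "label": elem.findtext("label"),
        "title": elem.findtext("title"),
        # Content owned by this node: children that do not start a new node.
        "own_tags": [c.tag for c in elem if c.tag not in NODE_ELEMENTS],
    })
    # Nested node elements become independent siblings in the flat list.
    for child in elem:
        if child.tag in NODE_ELEMENTS:
            flatten(child, path, out, next_id)
    return out

nodes = flatten(ET.fromstring(SOURCE), [], [], [2471])
for n in nodes:
    print(n["id"], n["level"], n["ancestry"], n["label"], n["title"])
```

The part record keeps only its own label, title and note, while the section appears as a separate record whose ancestry list still records the part it came from.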

5. GUI Design

While setting up the software to drive the system was a challenge, perhaps the most demanding task was the design of the end user interface. This was to be a browser based GUI which took advantage of normal browser functionality (e.g. Back button) where possible, and had a familiar look and feel.

In fact we established the basic GUI design quite early in the project. This was essential since we were effectively designing a new way of navigating data. Existing browsers allow horizontal (next, previous), vertical (up a level, home), and standard hyperlink navigation. We required that, from any provision, it should be possible to navigate directly to:

and to support any future links which may be required, and to do it all without the end-user becoming hopelessly lost in a multidimensional viewing space!

Additional requirements were that the GUI make full use of existing browser features, that no unfamiliar mechanisms be implemented, that the look and feel of the GUI should fit in with the style of existing TimeBase products, and that an end-user should be able to use the system without any special skills or training. As you may imagine, this was no easy task and was clearly the most difficult and contentious single issue in the project.

We soon settled on the concept of a number of distinct “viewing axes” (corresponding to the point list above) along which the user could navigate from the current provision. The current provision, in turn, is specified in three ways:

The basic navigation actions (next, previous, up a level) are assigned specific buttons. To go higher in the hierarchy, the user clicks a link in the reference frame. Other viewing axes are accessed via a set of special “links” which, when clicked, bring up a list of possible targets (e.g. versions of the current provision, related cases). The user simply selects an item in the list and is taken to that location. The browser Back button can be used to return to any previous location. This interface is powerful, flexible, and extensible, and it works! User feedback has been very positive.

Finally, higher levels such as acts and chapters display their contents as a single level table of contents, which we refer to as a miniToc. These miniTocs are generated on the fly, and provide the name of the sublevel, a link down to that level, and (if appropriate) a section range and/or search hit count.
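As a rough illustration of how a miniToc might be assembled from flat nodes, consider the Python sketch below. It is an assumption-laden example rather than the actual implementation: the `Node` fields, the sample data, and the `hits` mapping (section node id to search hit count) are invented for the purpose.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    id: int
    parent: Optional[int]  # id of the enclosing node, None for the act itself
    level: str             # "act", "part", "section", ...
    label: str
    title: str

# Invented sample data, listed in document order.
NODES = [
    Node(1, None, "act", "", "Sample Act"),
    Node(2, 1, "part", "1", "Preliminary"),
    Node(3, 2, "section", "1", "Short title"),
    Node(4, 2, "section", "2", "Commencement"),
    Node(5, 1, "part", "2", "General Provisions"),
    Node(6, 5, "section", "27", "Definition of department head"),
]

def sections_under(node_id, nodes):
    """All section nodes below node_id, in document order."""
    found = []
    for n in nodes:
        if n.parent == node_id:
            if n.level == "section":
                found.append(n)
            found.extend(sections_under(n.id, nodes))
    return found

def mini_toc(node_id, nodes, hits=None):
    """One row per direct child: name, link target, section range, hit count."""
    hits = hits or {}
    rows = []
    for child in (n for n in nodes if n.parent == node_id):
        is_section = child.level == "section"
        secs = [child] if is_section else sections_under(child.id, nodes)
        labels = [s.label for s in secs]
        if is_section or not labels:
            sec_range = None  # a range only makes sense above section level
        elif len(labels) == 1:
            sec_range = labels[0]
        else:
            sec_range = f"{labels[0]}-{labels[-1]}"
        rows.append({
            "name": f"{child.level.capitalize()} {child.label}: {child.title}",
            "target": child.id,
            "range": sec_range,
            "hits": sum(hits.get(s.id, 0) for s in secs),
        })
    return rows

toc = mini_toc(1, NODES, hits={4: 3})
for row in toc:
    print(row)
```

Because the rows are computed from whichever nodes are in scope at request time, the same routine would yield a different chapter structure for a different research date, provided the node list passed in has already been filtered by date.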

The way the design operates is best illustrated by a few screenshots from the final product. The first is the main screen of the Australian Corporations Law in July 2001.

The second shot is the same view some years earlier. Note the different chapter structure.

If we follow the miniToc links down to the section level, we see below the section title a set of links to the various available viewing axes (Amendments, Versions, Related Provisions, etc).

Finally, clicking on the Versions link brings up a choice of versions to link to. Links marked with * will force a change to the research date, as any version currently being viewed must be in scope as at the research date.

6. Late-binding Links

It became clear early on that point-in-time navigation would require “links” of an entirely new type. For example, imagine a provision which stated:

Penalties for offences under this section are set out in Schedule 3.

How do we mark the link to Schedule 3? In standard legislation, a simple hyperlink or ID/IDREF pairing would suffice. But in point-in-time markup there may be multiple versions of the Schedule in question. One solution would be to target the version in force when our provision was first set up.

This however would be wrong. The fact is that the precise research date, and hence the required version of Schedule 3 will not be known until run-time. There may indeed be several versions of the Schedule valid during the lifetime of our provision, so we have to target all of them.

We could of course resolve all such links during the construction of each content frame. But a large section could contain many such links, each of which would require a database call to resolve. Also, there is still the question of what sort of markup to use in the core XML.

The solution we came up with is the late-binding link. In the core XML, the link is marked as follows:

<link legn-id="fed/act:1989/109:sch3">

The legn-id attribute is simply a pointer to Schedule 3 of the act. Since we do not know what version of the Schedule to use, we omit any reference to a date.

When the link is processed into XHTML, it is replaced by a JavaScript function call which initiates the following actions:

But as far as the end-user is concerned, the late-binding link looks and acts exactly like a normal hyperlink.
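The essential idea, that a dateless legn-id is bound to a concrete version only once the research date is known, can be sketched as follows. This Python fragment is a hypothetical illustration and not the JavaScript used in the product: the `VERSIONS` table, its node ids, and its dates are all invented, and the real system would obtain this information from the repository.

```python
from datetime import date

# Invented version table: legn-id -> (node id, in force from, in force until).
# The end date is exclusive; None means the version is still in force.
VERSIONS = {
    "fed/act:1989/109:sch3": [
        (5001, date(1990, 1, 1), date(1995, 7, 1)),
        (5002, date(1995, 7, 1), date(1999, 3, 1)),
        (5003, date(1999, 3, 1), None),
    ],
}

def resolve(legn_id, research_date, versions=VERSIONS):
    """Return the node id of the version in force on research_date, or None."""
    for node_id, start, end in versions.get(legn_id, []):
        if start <= research_date and (end is None or research_date < end):
            return node_id
    return None

# The same link binds to different targets for different research dates.
print(resolve("fed/act:1989/109:sch3", date(1997, 10, 15)))
print(resolve("fed/act:1989/109:sch3", date(2001, 7, 1)))
```

Deferring the lookup until the link is clicked means that a section containing many such links costs no extra database calls when its content frame is built.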

7. The Final Hurdle

By the end of October 2000, we had a working prototype. There were still a number of items outstanding – the search facility was quite crude, and the GUI was to undergo a major reworking. However the basics were all in place, and the system worked nicely on our 1MB test data.

But when we loaded the whole 30MB dataset onto Excelon, we got a dramatic fall-off in performance. Queries which had taken under a second on the test data now took 10 seconds or more. And since many queries are required to assemble a single screenful of data, we found ourselves waiting up to four minutes for the system to respond to a single click.

The whole of November was spent trying to address this situation. By judicious use of indexing, and additional pre-processing, we were able to reduce the delay by two thirds. But this was still well over the two second response requirement. While Excelon searches did return the correct values, performance of complex searches on larger datasets was just not up to scratch. In fairness to Excelon, they were aware of the problem and I believe that the more recent releases are much improved in this respect.

However, we had no time to wait for upcoming releases and so were forced to abandon the XML Object technology in favor of our fallback position, classical relational databases.

At this point our decision to flatten the original hierarchical structure proved to be a life saver. The sequential node structure was readily convertible to a set of RDBMS tables, and the query structure and indexing strategy could be replicated in SQL.

Our mapping strategy, unlike classical object relational mapping techniques, produces a fixed number of tables regardless of the size or complexity of the XML being mapped. While it is quite possible to map to a single table, we opted for a four table set to maximize performance. These tables are:

The XPath based Excelon queries also mapped readily onto SQL equivalents. For example, to obtain the title of a provision (say node 2472) as it was when first enacted would (in simplified form) be:

node[@id = node[@id = 2472]/@base]/title

in Excelon, and:

SELECT title FROM nodes N, titles T
  WHERE N.base_id = T.node_id
    AND N.node_id = 2472

in SQL.
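The SQL form of the query can be tried out with a small in-memory database. The sketch below uses Python's standard sqlite3 module purely for convenience (the production system ran on a commercial RDBMS). The table and column names follow the query above, while the column types and the sample rows, including the title of the base version, are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes  (node_id INTEGER PRIMARY KEY, base_id INTEGER NOT NULL);
    CREATE TABLE titles (node_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Invented rows: node 2472 is a later version whose base is node 2474.
    INSERT INTO nodes  VALUES (2472, 2474), (2474, 2474);
    INSERT INTO titles VALUES
        (2472, '27. Definition of Department Head'),
        (2474, '27. Definition of Department Head (as first enacted)');
""")

def base_title(conn, node_id):
    """Title of the base (first enacted) version of the given node, or None."""
    row = conn.execute(
        "SELECT T.title FROM nodes N, titles T "
        "WHERE N.base_id = T.node_id AND N.node_id = ?",
        (node_id,),
    ).fetchone()
    return row[0] if row else None

print(base_title(conn, 2472))
```

Both columns used in the join are primary keys here, so the lookup is served from indexes, which is consistent with the indexing strategy the text says carried over to SQL.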

All up it took us just over a month to rework the whole system to use a SQL data store.

The performance boost was dramatic. A complex query which had taken 36 seconds under Excelon took 0.31 seconds with SQL Server, a speed-up of more than a hundredfold. The rest, as they say, is history. Today the product is on-line with over 100MB of legislation in Australia (with no fall-off in performance), and we have received expressions of interest from the international market.

8. Lessons Learned

This project provided ample scope for triumph or catastrophe, and despite its eventual success we ran into a host of problems along the way.

Firstly, we gained a greater understanding of the capabilities and limitations of XML itself:

As well as markup and programming issues, the experiences of our development team raised some more general points:

Turn-Key was an early convert to the XML cause. By the end of 1997 we had adopted XML as our preferred data format. At that time there were still many who said XML is a good place to start, and for complex operations people can always graduate to SGML. We have used XML on projects such as MALTweb through to advanced publishing applications, and never found it inadequate to the task.

We are faced with an ever increasing array of languages, software tools, and programming methodologies. But at least the choice of a data markup scheme is now clear. And this is perhaps the greatest of the many contributions that the XML drive has made to present day IT.

Glossary

MALT

Multi Access Layer Technology

RDBMS

Relational Database Management System

Biography

Geoffrey J Nolan
Senior Systems Engineer
Turn-Key Systems Pty Limited
Sydney
New South Wales
Australia
Email: gjn@turnkey.com.au Web: www.turnkey.com.au

Geoff graduated from Sydney University (Computer Science Hons 1) in 1978. After working for a Sydney software company, and in London for Shell (UK) Oil, Geoff joined Turn-Key Systems in 1984. His work since includes automating the composition of Yellow Pages phone books, co-developing one of Australia's earliest legal CD products, and designing DTDs for Australian Tax Office data. He has worked extensively in data conversion, DTD design, and utilizing XML in various publishing applications. He has also co-authored several patents related to repositories for complex XML datasets.