Case Study: Web Delivery of Point-in-Time Legislation
1. Introduction
This paper follows the development of a Web-based system for delivery of point-in-time legislation. Such a system allows an end-user to navigate a selected Act, Code etc. exactly as it stood on any given day in the past. At the beginning of this project, we already had a body of about 30MB of XML legislation files, with dates and cross-references already marked. We also had a Folio CD product, whose functionality had to be preserved, and if possible enhanced.
The development schedule was extremely tight, only four months being allowed for the development of a full prototype, ready for beta testing and subsequent market evaluation. Because of the limited time, it was decided to use a small team of three programmers, one manager, and a legal expert. This proved to be one of the factors behind the ultimate success of the project.
The project called for some novel techniques, including the development of a new browser-based GUI, a new way of mapping XML to relational database tables, and a new type of late-binding link. It also raised some serious questions about the relative performance of traditional Relational Database Management System (RDBMS) and “native” XML Object stores.
2. Point-in-Time: What and Why
It is an almost universal principle in Western legal systems that in both civil and criminal litigation, the law must be applied as of the date of the offense or point of dispute. For example, a contract hearing arising from an incident in October 1997 must be tried according to the law as it stood at that time, even if the hearing is held several years later and the relevant laws have changed substantially in the interval.
For this reason, lawyers must go to great lengths to ensure that they have access to the correct version of legislation at any given date in the past. Traditionally, significant Acts or Codes are reprinted annually, or after major amendments have occurred. The lawyer must then take into account any amendments applied since the last available reprint. This can involve extensive and time-consuming research, and also requires storing copies of all recent reprints and amending legislation.
A point-in-time delivery system would allow the researcher simply to key in the required date and scan the text of the legislation, correctly amended for that specific date. A number of attempts to create such a system have been made in several countries (e.g. USA, Britain, Germany). Some have had partial success, but none have come close to delivering the full functionality described above.
The only two fully functional systems which we are aware of were developed, completely independently, in Australia. One is of course the subject of this case study. The other was commissioned by the Tasmanian State Parliament and based on the Royal Melbourne Institute of Technology SIM database. This system has some very advanced features, rapid and flexible search facilities, and a capacity to automatically apply legislative amendments to the principal Acts. It does however require amending legislation to be drafted in a particular format, and does not (to our knowledge) fully integrate case law, commentary etc.
Our system was developed on behalf of and in conjunction with TimeBase Pty Ltd, a Sydney company specializing in the electronic publication of Australian legislation and case law. TimeBase also invented the Multi Access Layer Technology (MALT) concept, which is specifically designed to handle the complexity of large ever-changing XML datasets. A number of technologies have arisen out of this idea, one of which (MALTweb) is the subject of this paper.
The MALT system was designed from the start to incorporate past legislation, and this of course precludes any restriction on the drafting format. It also readily handles (say) legislation developed in one jurisdiction and adopted in another, or jurisdictions which do not wish to impose a fixed set of drafting rules. The downside is that the incorporation of amendments into principal legislation sometimes requires human editorial intervention. We have however developed some quite sophisticated tools for parsing the text of an amendment and modifying principal provisions accordingly. In jurisdictions where the wording of amendments is reasonably consistent, these tools can automatically apply up to 95% of all amendments.
This paper concentrates mainly on the development of the web delivery system, which took place after the initial legislation had been marked up and incorporated into a Folio based CD product.
3. Project Goals and Basic Architecture
The development team was deliberately kept small, consisting of just three programmers. One of these was the TimeBase chief programmer, who also served as a liaison between ourselves (Turn-Key) and the client (TimeBase). Christoph Schnelle (the TimeBase CTO), whose background is in mathematics and who devised the initial MALT model, kept in close touch with the project and provided considerable technical and procedural guidance. TimeBase also supplied legal experts and product design specialists to ensure the end result met the needs of legal researchers and worked well alongside other TimeBase products.
The first project planning meeting was held in early July 2000, with a working prototype expected by the end of October. Time was very clearly of the essence!
In short, the aim of the project was to develop a web-based delivery system for point-in-time legislation, which would:
- be accessible via a standard browser
- have a maximum response time of 2 seconds
- be scalable from an initial 30MB dataset up to 1GB and beyond
- deliver all the functionality of the existing Folio product
- incorporate any new features which could be developed in the time allowed
- require no special skills or training on the part of the end-user
It was immediately apparent that this would require a three tier architecture:
- a browser-based GUI to control end-user interaction with the system
- a data repository to hold the legislation in some readily accessible form
- a web server to convert user requests into appropriate database calls, format and deliver database responses, and perform admin tasks such as user validation, help screens etc.
Since the TimeBase web site already made use of a web server running ColdFusion applications, the choice for the web server middleware was clear. It was also quickly decided to support Internet Explorer and Netscape 4.0 and above. Alternatives such as Mozilla and Opera were also considered, but with a less than 1% market share there was little point spending much time on them during prototype development. In the end, a scripting bug in Explorer meant that only version 4.1 and above would run the system correctly, while Netscape had (and continues to have) problems rendering the required frames correctly.
The choice of repository software was less clear cut. Due to the unusual nature of the proposed GUI, an internal solution was considered. However, to develop our own data handling software would divert scarce resources away from the main thrust of the project. Moreover the solution would very likely be less flexible, less stable, and less maintainable than using a commercial product. There was also talk about using software such as Microsoft Exchange, but it was decided that this would raise major questions about performance and scalability. In the end the choice was between a classical RDBMS (Oracle, SQL Server), or an XML Object Store (Excelon, Tamino).
The advantages of a relational database solution are clear. The technology is mature, reliable and well supported, and performance would be more than adequate. There were however two significant drawbacks. Firstly, the XML source data would have to be heavily pre-processed before being loaded into SQL tables. Also, we considered the facilities offered by XML queries to be more flexible than SQL SELECTs, and better suited to both the overall structure of the data and our end-user requirements.
The native XML object stores seemed an excellent choice. Our data could in principle be loaded directly into such a system, and we had already established that XPath based queries would provide most if not all of our required functionality. Our main concern was the relative immaturity of the technology, with its associated reliability and performance issues. We accordingly conducted preliminary trials with both Excelon and Tamino. Unfortunately, Tamino was unable to load our (quite modest) DTD correctly, and support within Australia at that time was limited. We therefore concentrated on Excelon, which installed and tested reasonably well.
At this point the TimeBase CTO stepped in. While the techies were still in evaluation mode, he had decided that the sort of comprehensive tests we were aiming at would jeopardize the timing of the entire project. He therefore decreed that an Excelon solution would be attempted. This turned out to be an extremely good decision, but not quite in the way we expected.
4. Storage XML v Product XML
The next significant decision was whether we should base our product on our existing core XML, or whether we should pre-process that XML (which was designed to be easily maintainable by both human editors and programs) into a form more suitable for fast retrieval.
We already had a conversion regime for the Folio product, in which significant elements (acts, chapters, sections etc.) were converted into a flat sequence of MALT nodes (not to be confused with DOM nodes). This has the effect of converting the hierarchical XML structure into a linear set of discrete objects. The division of a dataset into discrete well-defined chunks is implicit in the overall MALT strategy, whereby any alteration in a chunk results in an updated chunk being added to the dataset.
Our tests had confirmed that the core XML could support XPath queries which would drive the end product. However, two considerations led us to decide on a substantial pre-processing regime. These were:
- Tests indicated that the indexing was more efficient with a flattened structure. That is, a query such as legislation/node worked better than legislation//section.
- The flat structure was much more suitable for inclusion in an RDBMS table. Thus we had a fallback position if we ran into difficulties. In addition we deemed the technology to have a wider potential use if it could drive both XML object and relational databases.
We therefore adopted a strategy whereby the core storage XML was converted into a flat XML structure. For example, consider the XML fragment:
<part><label>2</label>
<title>General Provisions</title>
<note>commenced 3 Oct 1997</note>
<section><label>27</label>
<title>Definition of department head</title>
<subsec><label>(1)</label>
<p>A <term>Head of Department</term>
shall be taken to include an acting Head for
the purposes of this Part.</p>
This fragment is converted into two nodes, one corresponding to the part and one to the section. The choice of which elements trigger the start of a new node (the node element set) is made by the product designers to reflect the “natural” structure of the data. In this case the system delivers one node at a time to the end user, and a legislative section was deemed a good sized chunk for this purpose.
Note that while in the original XML the content of the section also belongs to the part, the part node ends at the start of the first section thus:
<node id="2471" base="2471" level="part">
<ancestry><ancestor ref="1"/>
. . . <ancestor ref="2471"/></ancestry>
<title class="part">Part 2 – General Provisions</title>
<body><p class="note">Editorial Note: commenced 3 Oct 1997</p>
</body></node>
This (simplified) node has four main segments:
- The node element, which specifies a level (the original tag type), a node ID and a base ID. The node ID is a unique integer which reflects the position of the node within the document. The base ID is the ID of the original version of the provision (in time).
- The ancestry element lists the IDs of all ancestor nodes from the root to the current node. This encapsulates the hierarchical structure lost during the flattening process.
- The node title, which is required for various purposes (such as tables of contents), is stored separately.
- The body (content) of the node, which has been pre-converted into XHTML to minimize run-time processing.
The section then becomes a completely independent node, a sibling to the part node (and every other node) in the flattened XML, thus:
<node id="2472" base="2474" level="section">
<ancestry> . . . <ancestor ref="2471"/>
<ancestor ref="2472"/></ancestry>
<title class="section">27. Definition of Department Head</title>
<body><p class="subsec">(1) A <span class="term">Head of Department</span>
shall be taken to include an acting Head for the purposes of this Part.</p>
</body></node>
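The flattening step described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the conversion code used in the project: the function name flatten, the record layout, and the choice of part and section as the node element set are all hypothetical, the fragment has been closed off so that it parses, and the ancestry here starts at the part rather than at the document root.

```python
import xml.etree.ElementTree as ET

# Assumption: the node element set. The paper leaves this choice
# to the product designers; part and section are used for illustration.
NODE_ELEMENTS = {"part", "section"}

# The fragment from the text, with closing tags added so it is well-formed.
SOURCE = """<part><label>2</label>
<title>General Provisions</title>
<note>commenced 3 Oct 1997</note>
<section><label>27</label>
<title>Definition of department head</title>
<subsec><label>(1)</label>
<p>A <term>Head of Department</term> shall be taken to include
an acting Head for the purposes of this Part.</p>
</subsec></section></part>"""


def flatten(elem, ancestors, out, next_id):
    """Emit one flat record per node element, depth-first.

    Returns the next unused node ID, so that IDs follow document order.
    """
    node_id = next_id
    next_id += 1
    path = ancestors + [node_id]  # ancestry includes the node itself
    # Children that are not node elements stay with this node as its content.
    own = [child.tag for child in elem if child.tag not in NODE_ELEMENTS]
    out.append({
        "id": node_id,
        "level": elem.tag,
        "ancestry": path,
        "title": elem.findtext("title"),
        "content": own,
    })
    # Children in the node element set become independent flat nodes.
    for child in elem:
        if child.tag in NODE_ELEMENTS:
            next_id = flatten(child, path, out, next_id)
    return next_id


nodes = []
flatten(ET.fromstring(SOURCE), [], nodes, 2471)
for node in nodes:
    print(node["id"], node["level"], node["ancestry"], node["title"])
```

The part record keeps only its own label, title and note, while the section becomes a separate record whose ancestry list still records the part it came from. A real conversion would also render the content as XHTML and assign base IDs, which this sketch omits.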
5. GUI Design
While setting up the software to drive the system was a challenge, perhaps the most demanding task was the design of the end user interface. This was to be a browser based GUI which took advantage of normal browser functionality (e.g. Back button) where possible, and had a familiar look and feel.
In fact we established the basic GUI design quite early in the project. This was essential since we were effectively designing a new way of navigating data. Existing browsers allow horizontal (next, previous), vertical (up a level, home), and standard hyperlink navigation. We required that, from any provision, it should be possible to navigate directly to:
- the next or previous provision
- any ancestor provision
- any version of the current provision in time
- any provision amending, or amended by, the current provision
- any provision marked by the editors as a related provision
- any commentary describing the current provision
- any case law concerning the current provision
and to support any future links which may be required, and to do it all without the end-user becoming hopelessly lost in a multidimensional viewing space!
Additional requirements were that the GUI make full use of existing browser features, that no unfamiliar mechanisms be implemented, that the look and feel of the GUI should fit in with the style of existing TimeBase products, and that an end-user should be able to use the system without any special skills or training. As you may imagine, this was no easy task and was clearly the most difficult and contentious single issue in the project.
We soon settled on the concept of a number of distinct “viewing axes” (corresponding to the point list above) along which the user could navigate from the current provision. The current provision, in turn, is specified in three ways:
- By a verbal description such as Corporations Act 1989, s 127A.
- By a reference frame, which lists the hierarchy from the root down to the current provision.
- By a research date, which is adjustable by the user at any time, either by changing it directly, or following particular links.
The basic navigation actions (next, previous, up a level) are assigned specific buttons. To go higher in the hierarchy, the user clicks a link in the reference frame. Other viewing axes are accessed via a set of special “links” which, when clicked, bring up a list of possible targets (e.g. versions of the current provision, related cases). The user simply selects an item in the list and is taken to that location. The browser Back button can be used to return to any previous location. This interface is powerful, flexible, and extensible, and it works! User feedback has been very positive.
Finally, higher levels such as acts and chapters display their contents as a single level table of contents, which we refer to as a miniToc. These miniTocs are generated on the fly, and provide the name of the sublevel, a link down to that level, and (if appropriate) a section range and/or search hit count.
The way the design operates is best illustrated by a few screenshots from the final product. The first is the main screen of the Australian Corporations Law in July 2001.
The second shot is the same view some years earlier. Note the different chapter structure.
If we follow the miniToc links down to the section level, we see below the section title a set of links to the various available viewing axes (Amendments, Versions, Related Provisions, etc).
Finally, clicking on the Versions link brings up a choice of versions to link to. Links marked with * will force a change to the research date, as any version currently being viewed must be in scope as at the research date.
6. Late-binding Links
It became clear early on that point-in-time navigation would require “links� of an entirely new type. For example, imagine a provision which stated:
Penalties for offences under this section are set out in Schedule 3.
How do we mark the link to Schedule 3? In standard legislation, a simple hyperlink or ID/IDREF pairing would suffice. But in point-in-time markup there may be multiple versions of the Schedule in question. One solution would be to target the version in force when our provision was first set up.
This however would be wrong. The fact is that the precise research date, and hence the required version of Schedule 3 will not be known until run-time. There may indeed be several versions of the Schedule valid during the lifetime of our provision, so we have to target all of them.
We could of course resolve all such links during the construction of each content frame. But a large section could contain many such links, each of which would require a database call to resolve. Also, there is still the question of what sort of markup to use in the core XML.
The solution we came up with is the late-binding link. In the core XML, the link is marked as follows:
<link legn-id="fed/act:1989/109:sch3">
The legn-id attribute is simply a pointer to Schedule 3 of the act. Since we do not know what version of the Schedule to use, we omit any reference to a date.
When the link is processed into XHTML, it is replaced by a JavaScript function call which initiates the following actions:
- The browser sends the legn-id and the research date to the server.
- The server formulates a database request.
- The database returns the required node-id.
- The server constructs the required node and returns it to the browser.
But as far as the end-user is concerned, the late-binding link looks and acts exactly like a normal hyperlink.
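The server-side resolution step can be illustrated with a short Python sketch. It assumes that each version of a provision carries a date range during which it was in force; the paper does not say how version dates are stored, so the VERSIONS table, its node IDs and dates, and the function name resolve_link are all hypothetical.

```python
from datetime import date

# Hypothetical version records: (legn_id, node_id, in_force_from, in_force_to).
# An in_force_to of None marks the version that is still current.
# All node IDs and dates below are invented for illustration.
VERSIONS = [
    ("fed/act:1989/109:sch3", 9001, date(1990, 1, 1), date(1995, 6, 30)),
    ("fed/act:1989/109:sch3", 9002, date(1995, 7, 1), None),
]


def resolve_link(legn_id, research_date):
    """Return the node ID of the version in force on research_date, or None."""
    for version_legn_id, node_id, start, end in VERSIONS:
        if version_legn_id != legn_id:
            continue
        if start <= research_date and (end is None or research_date <= end):
            return node_id
    return None  # no version of the target was in force on that date


print(resolve_link("fed/act:1989/109:sch3", date(1993, 3, 1)))
print(resolve_link("fed/act:1989/109:sch3", date(2000, 10, 1)))
```

Because the link itself carries no date, the same markup yields a different target node whenever the research date changes, which is the sense in which the binding is late.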
7. The Final Hurdle
By the end of October 2000, we had a working prototype. There were still a number of items outstanding – the search facility was quite crude, and the GUI was to undergo a major reworking. However the basics were all in place, and the system worked nicely on our 1MB test data.
But when we loaded the whole 30MB dataset onto Excelon, we got a dramatic fall-off in performance. Queries which had taken under a second on the test data now took 10 seconds or more. And since many queries are required to assemble a single screenful of data, we found ourselves waiting up to four minutes for the system to respond to a single click.
The whole of November was spent trying to address this situation. By judicious use of indexing, and additional pre-processing, we were able to reduce the delay by two thirds. But this was still well over the two second response requirement. While Excelon searches did return the correct values, performance of complex searches on larger datasets was just not up to scratch. In fairness to Excelon, they were aware of the problem and I believe that the more recent releases are much improved in this respect.
However, we had no time to wait for upcoming releases and so were forced to abandon the XML Object technology in favor of our fallback position, classical relational databases.
At this point our decision to flatten the original hierarchical structure proved to be a life saver. The sequential node structure was readily convertible to a set of RDBMS tables, and the query structure and indexing strategy could be replicated in SQL.
Our mapping strategy, unlike classical object relational mapping techniques, produces a fixed number of tables regardless of the size or complexity of the XML being mapped. While it is quite possible to map to a single table, we opted for a four table set to maximize performance. These tables are:
- nodes: contains node-id, base-id, legn-id, level, and XHTML content.
- ancestry: preserves the hierarchy as a set of ancestor/descendant node IDs.
- titles: the title of the node. Note that some nodes (e.g. chapters) can change their title without changing their immediate content.
- targets: links an amending provision with the provision(s) being amended.
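As a rough illustration of this four table layout, the following Python sketch builds the tables in an in-memory SQLite database and uses the ancestry table to rebuild the reference frame for a section. The column names, the index, and the choice of SQLite are assumptions made for the example (the production system used SQL Server), the legn-id values are left empty because the paper does not give them for these nodes, and the content strings are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes    (node_id INTEGER PRIMARY KEY, base_id INTEGER,
                       legn_id TEXT, level TEXT, content TEXT);
CREATE TABLE ancestry (ancestor_id INTEGER, node_id INTEGER);
CREATE TABLE titles   (node_id INTEGER, title TEXT);
CREATE TABLE targets  (amending_id INTEGER, amended_id INTEGER);
CREATE INDEX ancestry_by_node ON ancestry (node_id);
""")

# The two nodes from the flattening example; content is a placeholder.
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", [
    (2471, 2471, None, "part", "<p>placeholder part body</p>"),
    (2472, 2474, None, "section", "<p>placeholder section body</p>"),
])
# One row per (ancestor, node) pair; each node is listed in its own ancestry.
conn.executemany("INSERT INTO ancestry VALUES (?, ?)", [
    (2471, 2471), (2471, 2472), (2472, 2472),
])
conn.executemany("INSERT INTO titles VALUES (?, ?)", [
    (2471, "Part 2 – General Provisions"),
    (2472, "27. Definition of Department Head"),
])

# Reference frame for node 2472: its ancestry with titles. Ordering by
# ancestor_id works because node IDs follow document order.
reference_frame = conn.execute("""
    SELECT A.ancestor_id, T.title
    FROM ancestry A JOIN titles T ON T.node_id = A.ancestor_id
    WHERE A.node_id = ?
    ORDER BY A.ancestor_id
""", (2472,)).fetchall()

for ancestor_id, title in reference_frame:
    print(ancestor_id, title)
```

The number of tables stays the same however deep or irregular the source XML is, since the hierarchy lives in ancestry rows rather than in the table structure.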
The XPath based Excelon queries also mapped readily onto SQL equivalents. For example, to obtain the title of a provision (say node 2472) as it was when first enacted would (in simplified form) be:
node[@id = node[@id = 2472]/@base]/title
in Excelon, and:
SELECT title FROM nodes N, titles T
WHERE N.base_id = T.node_id
AND N.node_id = 2472
in SQL.
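To see the SQL form of the query run, here is a minimal Python sketch using an in-memory SQLite database. The table and column names follow the query above; the choice of SQLite and the title given to the base node 2474 are hypothetical, added only so that the original and current titles differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes  (node_id INTEGER PRIMARY KEY, base_id INTEGER);
CREATE TABLE titles (node_id INTEGER, title TEXT);
""")

# Node 2472 is a later version whose base (original) version is node 2474.
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    (2474, 2474),
    (2472, 2474),
])
conn.executemany("INSERT INTO titles VALUES (?, ?)", [
    (2474, "27. Head of department"),             # hypothetical original title
    (2472, "27. Definition of Department Head"),  # title from the example node
])

# The simplified query from the text, with the node ID passed as a parameter.
original_title = conn.execute("""
    SELECT title FROM nodes N, titles T
    WHERE N.base_id = T.node_id
    AND N.node_id = ?
""", (2472,)).fetchone()[0]

print(original_title)
```

The join follows base_id from the current version back to the original node and reads that node's title, mirroring the nested predicate in the XPath form.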
All up it took us just over a month to rework the whole system to use a SQL data store.
The performance boost was dramatic. A complex query which had taken 36 seconds under Excelon took 0.31 seconds with SQL Server. The rest, as they say, is history. Today the product is on-line with over 100MB of legislation in Australia (with no fall-off in performance), and we have received expressions of interest from the international market.
8. Lessons Learned
This project provided ample scope for triumph or catastrophe, and despite its eventual success we ran into a host of problems along the way.
Firstly, we gained a greater understanding of the capabilities and limitations of XML itself:
- XML does work well with an RDBMS solution.
- Your core storage XML should be designed to support every likely end use, but not necessarily directly.
- Use XSLT or other conversion tools to create specialized XML, HTML, or even RDBMS tables as appropriate. Proper conversion strategies can hugely reduce the run-time demands on the system.
- Native XML databases work correctly in a technical sense, but (as available in 2000) lack the sheer grunt of RDBMS technologies for compute intensive applications.
As well as markup and programming issues, the experiences of our development team raised some more general points:
- Keep the development team as small as possible. Adding an extra person increases the amount of effort required to coordinate efforts, and the possibility for confusion both in project aims and areas of responsibility.
- Development of cutting edge technology requires a close working relationship between developer and client. Every member of the team (all experienced personnel) made at least one significant error of judgement during the course of the project, and at least two major lines of work had to be abandoned. You must be prepared to admit your own mistakes, and tolerate the mistakes of others, if you want to create something really new. And with all the problems we encountered, the project still ended very successfully.
- The morale of the team, the morale of the client, and the requirements of managers, marketers and product designers are better served by producing some sort of working prototype ASAP. Hours of poring through theory and design aims does not produce either the emotional impact, or the useful feedback, gained from a few minutes actually working with the system. Even a crude system with limited functionality will serve.
- Finally, always plan the interfaces first, whether GUI or API. In our case we needed to anticipate the needs of a legal researcher using a standard browser, and the rest followed. To design a data retrieval system first, and only then see what sorts of user queries it can support, will never yield as good a result.
Turn-Key was an early convert to the XML cause. By the end of 1997 we had adopted XML as our preferred data format. At that time there were still many who said XML is a good place to start, and for complex operations people can always graduate to SGML. We have used XML on projects such as MALTweb through to advanced publishing applications, and never found it inadequate to the task.
We are faced with an ever increasing array of languages, software tools, and programming methodologies. But at least the choice of a data markup scheme is now clear. And this is perhaps the greatest of the many contributions that the XML drive has made to present day IT.

