The dominant XML table models (HTML, OASIS CALS, TEI, et al.) are presentationally oriented and do not reflect the meaning of the table. Semantically, tables are complex data structures that (at a minimum) associate a two-dimensional matrix (row and column position) with a value (cell content). Nothing inherent in most table data dictates that one dimension should be expressed forever in rows and the other in columns, but most models nonetheless require that the author determine the layout of rows and columns at encoding time, as if one presentation were inherently more natural or correct than the other. For years some have argued that an information-based table model would overcome these limitations, but the early discussions were before XSLT, nearly-ubiquitous presentational transformations, and spreadsheets that “save as XML.” Is it time to reopen the table-as-data debate? The author presents a table model that bears a close relation to spreadsheet models, but that is sensitive to the needs of XML authoring.
Keywords: Office Documents; Transforming
Two-dimensional tables logically and tautologically consist of two dimensions, traditionally rendered as rows and columns, where each cell is associated with or belongs to one row and one column.1 Which of these two associations should be arrayed along the horizontal dimension (columns follow one another horizontally) and which along the vertical dimension (rows succeed one another vertically) is purely a rendering issue, and from an abstract perspective, neither of the two associations is logically horizontal or vertical. That is, neither dimension is abstractly or logically a row but not a column or vice versa.
In structured documents tables are traditionally encoded as consisting of rows, which consist of cells, without regard to the logical model described above. The row-based presentational model is found in HTML [Hypertext Markup Language][HTML], TEI [Text Encoding Initiative][TEI P5], OASIS [Organization for the Advancement of Structured Information Standards] CALS [Continuous Acquisition and Life-cycle Support][CALS], and others.2 For example, an HTML <table> element contains <tr> (table row) elements, which, in turn, contain <td> (table data) elements, or cells.
The three standards mentioned above use the following terms for their components:
| | HTML | TEI P5 | OASIS CALS |
|---|---|---|---|
| Row | <tr> | <row> | <row> |
| Cell | <td> | <cell> | <entry> |
In the table immediately above, the header row conveys to the reader the semantics of the columns (the name of the standard from which the terms in that column are taken) and the leftmost column conveys the semantics of the rows (the abstract structural element represented in the three standards). Every cell in this table has a row and a column membership, and neither is inherently more important than the other. Among other things, this means that one could express exactly the same information by swapping the rows and columns (turning the tables?):
| | Row | Cell |
|---|---|---|
| HTML | <tr> | <td> |
| TEI P5 | <row> | <cell> |
| OASIS CALS | <row> | <entry> |
Because the difference between these two tables is purely presentational and not informational, truly descriptive markup would not encode a decision in the document instance about which dimension should be rendered as rows and which as columns, and would leave that specification instead to the transformation and presentation stage.3
The underlying markup for the first of the preceding tables in the modified OASIS CALS DTD [Document Type Definition] used for the Proceedings of the Extreme Markup 2007 conference looks as follows:
<table>
<tgroup cols="4">
<thead>
<row>
<entry/>
<entry>HTML</entry>
<entry>TEI P5</entry>
<entry>OASIS CALS</entry>
</row>
</thead>
<tbody>
<row>
<entry><highlight style="bold">Row</highlight></entry>
<entry>&lt;tr&gt;</entry>
<entry>&lt;row&gt;</entry>
<entry>&lt;row&gt;</entry>
</row>
<row>
<entry><highlight style="bold">Cell</highlight></entry>
<entry>&lt;td&gt;</entry>
<entry>&lt;cell&gt;</entry>
<entry>&lt;entry&gt;</entry>
</row>
</tbody>
</tgroup>
</table>
Logical membership of a cell in a row is encoded by making the cell element (<entry> in the markup above) part of the content model of the <row> element, while logical membership of a cell in a column is not reflected in any direct way by the syntax. Two equivalent semantic relationships are thus treated very differently, which introduces a philosophically disquieting mismatch between the semantics and the syntax. Furthermore, this mismatch contradicts a fundamental principle of much of XML (and TEI): presentational information does not belong in descriptive markup.4 This mismatch engenders two practical rendering problems: 1) one must decide when encoding the document which value constitutes the row membership and which the column membership, that is, which is to be rendered along the horizontal axis and which along the vertical, and one is then locked into that decision; and 2) mapping from an encoding with the dimensions arrayed one way to a rendering with the dimensions arrayed the other way becomes fiendishly complicated.5
Finally, the data in this type of table is not easily mined for information because a particular cell content is not associated in any clear syntactic way with its row and column labels. Note in this context that column labels may be encoded explicitly as labels (by embedding them inside a <thead> element), but the row labels (often called “stubs”) are marked up no differently from data cells (except for the <highlight> element, which is plainly presentational, rather than descriptive). That is, except for position or added presentational (typographic) markup, the XPath to a stub is the same as the XPath to a data cell, and an XPath intended to address all data cells must use an explicit predicate to exclude the stubs. Aside from the practical clumsiness imposed by this requirement, the model essentially treats stubs (which contain metadata about data cells) and the contents of the data cells themselves identically, which means that it fails to use generic identifiers to distinguish metadata from data.6 This is uncomfortably redolent of the plain text files one finds in Project Gutenberg and other projects that eschew structured text markup entirely, where only position serves to identify metadata.
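The reliance on position can be made concrete with a minimal sketch in Python (standard library only). The sample is an abbreviated copy of the CALS markup above, with the <highlight> wrappers and the angle brackets in cell text dropped for brevity; all variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CALS example above (highlight markup omitted).
CALS = """<table>
  <tgroup cols="4">
    <thead>
      <row><entry/><entry>HTML</entry><entry>TEI P5</entry><entry>OASIS CALS</entry></row>
    </thead>
    <tbody>
      <row><entry>Row</entry><entry>tr</entry><entry>row</entry><entry>row</entry></row>
      <row><entry>Cell</entry><entry>td</entry><entry>cell</entry><entry>entry</entry></row>
    </tbody>
  </tgroup>
</table>"""

root = ET.fromstring(CALS)

# Column labels are at least wrapped in <thead>; the empty corner cell is skipped by index.
column_labels = [e.text for e in root.findall("./tgroup/thead/row/entry")][1:]

stubs = []
data_cells = []
for row in root.findall("./tgroup/tbody/row"):
    entries = row.findall("entry")
    # Nothing but position says that the first <entry> is a label rather than data.
    stubs.append(entries[0].text)
    data_cells.append([e.text for e in entries[1:]])

# Answering "what does TEI P5 call a cell?" means counting rows and columns by hand.
col = column_labels.index("TEI P5")
tei_cell_name = data_cells[stubs.index("Cell")][col]
print(tei_cell_name)
```

Every lookup goes through list indexes instead of names, which is the practical cost of leaving the association between a cell and its labels implicit.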
Several markup theorists have asserted or implied that tables are inherently presentational; that it is misguided, impossible, or impractical to mark them up semantically; and that they constitute an argument against attempting to maintain a strict separation of content and presentation. For example, Kimber states that “the purpose of table markup is to define the visual presentation of the information within a grid, not to model the relationships between it.”[Kimber 1993] Travis picks this up, asserting that it is a “fundamental truth” that “[t]ables are inherently format-oriented.” His argument in support of this assertion is that SGML [Standard Generalized Markup Language] document analysis normally involves identifying the semantics of presentational features, but “when the discussion gets around to tables, even the moderator joins into a debate over how to assign column widths and border sizes. Some groups give up trying to develop their own ‘table model’ and decide whether to use the CALS or AAP models.” This, clearly, is an argument not about tables (as Travis believes), but about document analysis. And although Travis has said that “[t]ables are inherently format-oriented,” later in the same article he seems to state that format is imposed on, rather than inherent in, table data: “[t]ables are a format-oriented means for publishing a set of data, just as bold, italic, and centered are. In order to keep your data as usable as possible, all elements should be described in terms of what they are, not how they look.”[Travis 1995]
Similarly, drawing on an eBook by Dorothea Salo ([Salo 2001]7), Hillesund asserts that it is difficult to encode the meaning underlying tables:
Using columns and rows, the cells of tables can at the same time show data values of two different variables. This information is literarily “shown” in a visual, two-dimensional way and the strength of table layout is that readers can easily compare values within different cells. Meanings in tables are expressed by the combination of data (words and figures) and visual layout.
There is no problem in marking up tables in XML, but the whole meaning conveyed by a table will not reveal itself before the elements are formatted and presented in a visual, two-dimensional way, either on paper or on screen. This shows that the meanings of the tables, their contents, bear on visual representation and cannot fully be captured by the structural logic of XML, irrespective of its hierarchical character. Which again proves Salo right, that some obvious structures and visual distinctions are hard, not to say impossible, to express in XML. The example of tables is a clear illustration of the fact that in visual-based publications there is no such thing as a sharp distinction between content and formatting.[Hillesund 2002]
Hillesund’s argument, like Travis’s, misses the point. Part of the meaning of tables is indeed that the cell entries are associated with values along two dimensions in a visually perspicuous way, but this does not mean that a graphic, two-dimensional grid layout is the only sensible way of representing those associations. The issue is not that tables are inherently format-oriented, but that document analysts have traditionally done a poor job of identifying the semantics underlying the presentational features of tables and modeling those semantics in clear and legible XML. As a result, they have fallen back on format-oriented models not because the data demands it, but because they were unable to conceptualize the problem apart from the rendering issues.
Kimber argues that:
[…] for modular information [in technical documentation], tables are purely a style choice applied to the information. This suggests that from this perspective, a technical information DTD would never have table elements in it, but the presentation application would have a way of applying table presentation to any elements in the DTD […]
[…] within technical documentation, tables are purely a presentation style applied to information, no different from fonts or justification or any of the other visual effects we apply to information. This approach does require more thought on the part of authors. The cognitive process of transforming information structures into tabular form is often an unconscious process, especially in well-trained writers who have spent a lot of time developing effective tabular presentations of complex data. Now they must think more about the inherent structures of their data, capture that, and then apply tabular presentation, when appropriate.[Kimber 1993a]
From this perspective, a two-column table of questions and answers could alternatively be conceptualized (and marked up) as a definition list (in the HTML sense of the term) with no loss of information, at least as long as one allows that the answer to a question is analogous to the definition of a term. A series of paragraphs could be conceptualized (and marked up) as either a one-column table (with each paragraph in a new cell) or a list (with each paragraph as a new list item). More generally, the information in most tables could alternatively be presented in running prose.[Travis 1993] And, as Kimber goes on to argue, even complex relationships can be described in a way that is independent of eventual tabular representation during rendering. This is precisely the approach advocated below, where the logical abstract table model encodes the properties of an item (cell contents) not in terms of rows and columns, but in terms of its semantic association with concepts that may eventually (but not at encoding time) be rendered as row or column labels.
If, then, by table one means a data structure in which objects have row and column membership, a table is unquestionably presentational. If, on the other hand, one means a data structure where objects are associated simultaneously with values along two (or, potentially, more) abstract dimensions, which may (or may not) be represented graphically as rows or columns, a table is an abstract and logical structure that is often, but not obligatorily, represented in a particular way. This report takes the latter approach, and argues that tables can be encoded in the document instance in a way that encodes the meaning but not the layout, and that locates layout (rendering) decisions not in the encoding process, but in the transformations intended to produce a rendered final-form representation.
That tables can be conceived in non-presentational terms, without regard to how they will be rendered on paper or on the screen, is not a new idea. For some earlier reflections on the ontology of tables, see, for example, the many articles on this topic that appeared in <Tag> in the 1990s.[Harding 1995][Peterson 1994][Peterson 1996][Travis 1993][Travis 1993a][Travis 1995][Travis 1995a][Waldt 1990][Waldt 1992] Waldt, for example, emphasizes the difference between the informational and the presentational when he writes that “[…] the fact that the number ‘3’ occurs in a column with a heading of ‘Big Things’ tells you that there are 3 Big Things. This intellectual relationship is more important than the physical representation—that is unless you have to compose the table for typesetting.”[Waldt 1990] Nonetheless, for Waldt, the question of whether the table models information or a view of information is open: “if I flip the matrix or summarize the table differently […], is it a new table or a different view of the same information?”[Waldt 1990] In keeping with the informational orientation of descriptive XML markup, the present report prioritizes the informational answer: since flipping the matrix does not change the information, it changes only the presentation (view) of the table, and not the abstract table itself.
Most of the <Tag> articles mentioned above regard tables from a flat-file (not relational, about which see below) database perspective, where table rows might be seen as corresponding to database records and the cells in table rows as corresponding to database fields. Thus, for example, Harding distinguishes a table display style from a table as a logical structure, but even in his logical structure, rows and columns have inherently different natures:
[…] a tabular structure is one that provides the same pieces of data for each of some number of “things”. For example, we might have an “employee” table that associates with each employee his or her name, social security number, address, company phone extension, etc. We generally think of adding a new “thing’s” data (data about an additional employee, in our example) as not changing the structure of the table, but adding or deleting a data item about each “thing” (such as adding the employee’s birthday) results in a different table. [Emphasis in the original][Harding 1995]
Since adding or removing a row in a table in one of the structured text standards mentioned above does not entail changes to any other existing row element, while adding or removing a column requires modifying every row element, one might interpret the preceding as suggesting a comparison of rows with records and columns with fields.
This sort of flat-file database conceptualization of tables is sensible in the database world, but most of the structured text standards listed above actually originated in a context that was concerned primarily with the eventual two-dimensional display of what are often called “document-centric” documents.8 Within this context a more appropriate analogy for a table in a structured text framework might be a spreadsheet table, rather than a database table. A spreadsheet table (such as the TEI Council assignment table or the instructor office hour examples below) treats the two dimensions equivalently and does not obligatorily distinguish records from fields. Thus, in the latter example, adding another day to the Monday-through-Friday dimension means only that instructors may have weekend office hours, while adding another time slot to the 11:00 a.m.-to-5:00 p.m. dimension means only that instructors may arrive earlier in the day and stay later. Neither of these changes is inherently more record- or field-like than the other. Such tables can be mapped onto a flat file database table, but the mapping would hold only superficially, since neither a day nor a time slot is inherently a record or a field.
If one wished to pursue the flat-file database analogy, one could observe that days may be subdivided into time slots in a way that time slots cannot be subdivided into days, which might suggest that days should be compared to records and time slots should be compared to fields.9 Note, though, that this logical relationship is not what OASIS CALS or TEI or HTML tables encode. In all of these models, one can add rows without revising existing rows, but adding a column means revising all rows, which is to say that only rows can represent the record-like dimension and only columns can represent the field-like dimension (see the quotation from Harding, above). This presentational fact is hard-wired into the markup schemas in a way that is completely unrelated to whether the user wishes to represent the records as rows or as columns. For example, in the case of the instructor office-hour schedule, such schedules are often rendered on paper with the days of the week across the top and the times running down the side primarily because printing on paper is usually oriented in portrait mode, so that the paper is longer than it is wide, and in many schedules there may be more time slots than there are days of the week. In such cases, the particular dominant spatial orientation of the printed schedule is primarily a consequence of the cultural accident that we have come to prefer long paper to wide paper.10 Meanwhile, though, because all the standards listed above prioritize rows over columns syntactically (that is, as far as the syntax is concerned, rows consist of cells, but columns do not consist of cells), the containment model in the syntax may contradict the fact that days contain time slots but time slots do not contain days. Clearly the record/field analogy is not a significant consideration in the syntactic design of these standards or the encoding of such tabular information as schedules.
Within a relational database model of the schedule of office hours, on the other hand, one might normalize the data by regarding days, time slots, and instructors as separate one-column tables, to be joined in a three-column scheduling table. The model advanced below is close to this type of normalization, but it conceptualizes the problem within an XML context.
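As a rough illustration of that normalization, the following Python sketch uses the standard sqlite3 module with an in-memory database. The table and column names are assumptions made for this sketch, and the handful of rows is a small sample from the office-hour schedule, with one row per half-hour slot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores REFERENCES unless asked
conn.executescript("""
    CREATE TABLE days (day TEXT PRIMARY KEY);
    CREATE TABLE times (slot TEXT PRIMARY KEY);
    CREATE TABLE instructors (name TEXT PRIMARY KEY);
    -- The three-column scheduling table joins the three one-column tables.
    CREATE TABLE schedule (
        day  TEXT NOT NULL REFERENCES days(day),
        slot TEXT NOT NULL REFERENCES times(slot),
        name TEXT NOT NULL REFERENCES instructors(name)
    );
""")
conn.executemany("INSERT INTO days VALUES (?)", [("Monday",), ("Wednesday",)])
conn.executemany("INSERT INTO times VALUES (?)", [("11:00",), ("11:30",), ("12:00",)])
conn.executemany("INSERT INTO instructors VALUES (?)", [("Marquette",), ("Birnbaum",)])
conn.executemany("INSERT INTO schedule VALUES (?, ?, ?)", [
    ("Monday", "11:00", "Marquette"),
    ("Monday", "11:30", "Marquette"),
    ("Wednesday", "11:00", "Birnbaum"),
])

# Neither dimension is privileged: the same table answers both questions.
monday = conn.execute(
    "SELECT slot, name FROM schedule WHERE day = ? ORDER BY slot", ("Monday",)
).fetchall()
at_eleven = conn.execute(
    "SELECT day, name FROM schedule WHERE slot = ? ORDER BY day", ("11:00",)
).fetchall()
```

Days and time slots are symmetrical here: each is simply a column of the scheduling table, and neither is a container for the other.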
Encoding a logical abstract table model without reference to presentation is straightforward. Consider the following example, where one dimension specifies some members of the TEI Council (<persons>) and the other specifies the TEI duties assigned to these people at the April 2007 Berlin Council meeting (<duties>).
<?xml version="1.0" encoding="UTF-8"?>
<root>
<persons>
<person person="David">David</person>
<person person="Matthew">Matthew</person>
</persons>
<duties>
<duty duty="Tables">Tables</duty>
<duty duty="Stemma">Stemma</duty>
</duties>
<assignments>
<assignment person="David" duty="Tables">as soon as possible</assignment>
<assignment person="David" duty="Stemma">get a draft to Matthew next week</assignment>
<assignment person="Matthew" duty="Stemma">review David's draft</assignment>
</assignments>
</root>
The steps to encode and process a table according to the logical table model are straightforward:
Order within each of the two lists of assignment targets (<persons> and <duties> in the example above) often needs to be (and is therefore assumed to be for this illustration) specified. Reordering in algorithmic ways (e.g., alphabetically) is possible during processing. Neither list is presumed to be inherently horizontal (column headings) or vertical (stubs) because their physical arrangement is not part of a logical (rather than presentational) abstract table model.
Each <assignment> specifies three features: the association with a specific member of each of the two dimensions and the content. In the accompanying example, the content of the <assignment> is a specification of how or when each Council member is expected to complete each task. During eventual rendering, either the names of the Council members may be arrayed across the top (labeling the columns) and the tasks along the left (labeling the rows) or vice versa. In either case the cells will contain the aforementioned specifications.
Note that an additional advantage of the non-presentational model is that it is also more directly suitable for data mining than the presentational models because the association of the cell content with both dimensions is encoded more directly, in a way that is more immediately accessible. In connection with the present illustration, one could, for example, construct equally simple sets of XPaths to determine what David is supposed to do (connect //assignment[@person="David"] to //assignment[@person="David"]/@duty) or who is responsible for which aspects of the stemma proposal (connect //assignment[@duty="Stemma"] to //assignment[@duty="Stemma"]/@person).
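The two lookups can be tried out with a short Python sketch that relies only on the limited XPath subset supported by the standard xml.etree.ElementTree module. The variable names are illustrative, and attribute values are matched case-sensitively.

```python
import xml.etree.ElementTree as ET

# The sample document from above, repeated so that the sketch is self-contained.
SOURCE = """<root>
  <persons>
    <person person="David">David</person>
    <person person="Matthew">Matthew</person>
  </persons>
  <duties>
    <duty duty="Tables">Tables</duty>
    <duty duty="Stemma">Stemma</duty>
  </duties>
  <assignments>
    <assignment person="David" duty="Tables">as soon as possible</assignment>
    <assignment person="David" duty="Stemma">get a draft to Matthew next week</assignment>
    <assignment person="Matthew" duty="Stemma">review David's draft</assignment>
  </assignments>
</root>"""

root = ET.fromstring(SOURCE)

# What is David supposed to do?
davids_duties = [a.get("duty") for a in root.findall(".//assignment[@person='David']")]

# Who is involved in the stemma proposal?
stemma_people = [a.get("person") for a in root.findall(".//assignment[@duty='Stemma']")]

print(davids_duties)
print(stemma_people)
```

Both queries have the same shape, and neither depends on the position of a cell within a row.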
A RelaxNG schema (compact syntax) defining the grammar of the sample document is:
start = root
root = element root { persons, duties, assignments }
persons = element persons { person+ }
person = element person { attlist.person, text }
attlist.person = attribute person { xsd:ID }
duties = element duties { duty+ }
duty = element duty { attlist.duty, text }
attlist.duty = attribute duty { xsd:ID }
assignments = element assignments { assignment+ }
assignment = element assignment { attlist.assignment, text }
attlist.assignment =
attribute person { xsd:IDREF },
attribute duty { xsd:IDREF }
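The ID/IDREF mechanism that associates assignments with persons and duties ensures that every @person and @duty value on an <assignment> matches some declared identifier, which guards against assigning a duty to a non-existent person or a non-existent duty to a person (although it cannot by itself confirm that a @person value identifies a <person> rather than a <duty>).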
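A check at processing time can enforce a typed version of this constraint. The following Python sketch is one possible approach, not part of the schema: the function name and the two tiny test documents are hypothetical, and the second document is one that ID/IDREF validation alone would accept, because "Tables" is a declared identifier.

```python
import xml.etree.ElementTree as ET

def dangling_references(root):
    """Return (attribute, value) pairs on assignments that match no label of the right kind."""
    declared = {
        "person": {p.get("person") for p in root.iter("person")},
        "duty": {d.get("duty") for d in root.iter("duty")},
    }
    problems = []
    for assignment in root.iter("assignment"):
        for attr, known in declared.items():
            if assignment.get(attr) not in known:
                problems.append((attr, assignment.get(attr)))
    return problems

# Hypothetical miniature documents, for illustration only.
good = ET.fromstring(
    '<root><persons><person person="David">David</person></persons>'
    '<duties><duty duty="Tables">Tables</duty></duties>'
    '<assignments><assignment person="David" duty="Tables">draft</assignment>'
    '</assignments></root>'
)
bad = ET.fromstring(
    '<root><persons><person person="David">David</person></persons>'
    '<duties><duty duty="Tables">Tables</duty></duties>'
    '<assignments><assignment person="Tables" duty="Tables">draft</assignment>'
    '</assignments></root>'
)

print(dangling_references(good))
print(dangling_references(bad))
```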
The following XSLT [eXtensible Stylesheet Language Transformations] stylesheet renders the underlying XML source twice, with the second rendition swapping the row and column dimensions of the first. This is intended to demonstrate that the new model is able to render the content of the source in either way with equal felicity. Note that all presentational information is introduced during transformation and rendering because presentational and rendering details are not properly part of the abstract table model.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" indent="yes" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="/">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>TEI Table Test</title>
</head>
<body>
<h1>TEI Table Test</h1>
<hr/>
<p>2007-04-27, David J. Birnbaum (<a href="mailto:djbpitt@pitt.edu">djbpitt@pitt.edu</a>)</p>
<p>Generates two HTML tables from the same XML source, swapping columns and rows</p>
<hr/>
<h2>TEI Table Test with persons as columns and duties as rows</h2>
<table border="1">
<tr>
<th> </th>
<xsl:for-each select="/root/persons/person">
<xsl:variable name="person" select="@person"/>
<th>
<xsl:value-of select="."/>
</th>
</xsl:for-each>
</tr>
<xsl:for-each select="/root/duties/duty">
<xsl:variable name="duty" select="@duty"/>
<tr>
<th>
<xsl:apply-templates/>
</th>
<xsl:for-each select="/root/persons/person">
<td>
<xsl:variable name="person" select="@person"/>
<xsl:for-each select="/root/assignments/assignment">
<xsl:if test="@person=$person and @duty=$duty">
<xsl:apply-templates/>
</xsl:if>
</xsl:for-each>
</td>
</xsl:for-each>
</tr>
</xsl:for-each>
</table>
<h2>TEI Table Test with duties as columns and persons as rows</h2>
<table border="1">
<tr>
<th> </th>
<xsl:for-each select="/root/duties/duty">
<xsl:variable name="duty" select="@duty"/>
<th>
<xsl:value-of select="."/>
</th>
</xsl:for-each>
</tr>
<xsl:for-each select="/root/persons/person">
<xsl:variable name="person" select="@person"/>
<tr>
<th>
<xsl:apply-templates/>
</th>
<xsl:for-each select="/root/duties/duty">
<td>
<xsl:variable name="duty" select="@duty"/>
<xsl:for-each select="/root/assignments/assignment">
<xsl:if test="@person=$person and @duty=$duty">
<xsl:apply-templates/>
</xsl:if>
</xsl:for-each>
</td>
</xsl:for-each>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Applying this stylesheet to the source XML produces XHTML output that includes the following tables:
| | David | Matthew |
|---|---|---|
| Tables | as soon as possible | |
| Stemma | get a draft to Matthew next week | review David's draft |

| | Tables | Stemma |
|---|---|---|
| David | as soon as possible | get a draft to Matthew next week |
| Matthew | | review David's draft |
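The same pivot can be expressed compactly outside XSLT. The Python sketch below is an illustrative analogue of the stylesheet, not a translation of it: the function name pivot and its parameters are assumptions, and the labels are taken from the attribute values, which coincide with the element text in this sample.

```python
import xml.etree.ElementTree as ET

SOURCE = """<root>
  <persons>
    <person person="David">David</person>
    <person person="Matthew">Matthew</person>
  </persons>
  <duties>
    <duty duty="Tables">Tables</duty>
    <duty duty="Stemma">Stemma</duty>
  </duties>
  <assignments>
    <assignment person="David" duty="Tables">as soon as possible</assignment>
    <assignment person="David" duty="Stemma">get a draft to Matthew next week</assignment>
    <assignment person="Matthew" duty="Stemma">review David's draft</assignment>
  </assignments>
</root>"""

def pivot(root, row_dim, col_dim):
    """Build a grid with row_dim labels down the side and col_dim labels across the top."""
    rows = [e.get(row_dim) for e in root.iter(row_dim)]
    cols = [e.get(col_dim) for e in root.iter(col_dim)]
    # Each assignment is keyed by its membership in both dimensions.
    cells = {(a.get(row_dim), a.get(col_dim)): a.text for a in root.iter("assignment")}
    grid = [[""] + cols]
    for r in rows:
        grid.append([r] + [cells.get((r, c), "") for c in cols])
    return grid

root = ET.fromstring(SOURCE)
duties_as_rows = pivot(root, "duty", "person")
persons_as_rows = pivot(root, "person", "duty")
```

The orientation is chosen by the caller through the two arguments; nothing in the source document changes between the two renderings.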
The preceding example contains no row or column spanning. Encoding HTML or TEI or OASIS CALS tables by hand (without a WYSIWYG [What You See Is What You Get] editor) is famously difficult once row spanning enters the picture because when a cell spans two rows, its contents and the fact that it spans multiple rows are recorded in the first of the two, but the contents of the second row (down into which it stretches, as it were) must also omit a cell specification to make room for the spanning. That the markup in the second row must be adjusted in conformity with markup in the first means that some rows contain fewer cells than there are columns, but nothing within the markup for that particular row identifies which columns are the locus of information that spans down from a preceding row. Keeping count of which cells to omit when encoding the information for a row can be frustratingly error-prone, particularly in large tables.
The logical (non-presentational) table model advocated here is much simpler in this respect because all information about the spanning is contained in a single place (the <assignment> element), and no adjustments are required elsewhere in the table.11
The following table excerpt represents the office hours for instructors in a team-taught course.12 One dimension records the days of the week and the other the time (in half-hour blocks). The contents of the cells consist of the names of the instructors who are available in their offices at a particular time on a particular day.13
| | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| 11:00 | Marquette | | Birnbaum | | Post |
| 11:30 | | | | | |
| 12:00 | | | Fraser | | |
| 12:30 | | | | | |
| 1:00 | | | | Jimerson | Konsko |
| 1:30 | | | | | |
| 2:00 | Birnbaum | | | | |
| 2:30 | | | | | |
| 3:00 | Konsko | | | Jimerson | |
| 3:30 | | | | | |
| 4:00 | Fraser | | | | |
| 4:30 | | | | | |
In the preceding table, note that all instructors are in their offices for more than half an hour (represented by one row) at a time, which means that each non-empty cell spans multiple rows.14 The top two data rows look as follows in the underlying OASIS CALS markup:
<row>
<entry><highlight style="bold">11:00</highlight></entry>
<entry morerows="3">Marquette</entry> <!-- Monday -->
<entry/> <!-- Tuesday -->
<entry morerows="1">Birnbaum</entry> <!-- Wednesday -->
<entry/> <!-- Thursday -->
<entry morerows="3">Post</entry> <!-- Friday -->
</row>
<row>
<entry><highlight style="bold">11:30</highlight></entry>
<!-- No cell for Monday, which is spanned from above-->
<entry/> <!-- Tuesday -->
<!-- No cell for Wednesday, which is spanned from above-->
<entry/> <!-- Thursday -->
<!-- No cell for Friday, which is spanned from above-->
</row>
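The bookkeeping that the comments above perform by hand can be sketched in Python. This is a simplified illustration that handles only the morerows attribute and ignores horizontal spanning; the function name place_entries is an assumption, and the <highlight> wrappers are omitted from the sample.

```python
import xml.etree.ElementTree as ET

# The two rows shown above, without comments or highlight markup.
ROWS = """<tbody>
  <row>
    <entry>11:00</entry>
    <entry morerows="3">Marquette</entry>
    <entry/>
    <entry morerows="1">Birnbaum</entry>
    <entry/>
    <entry morerows="3">Post</entry>
  </row>
  <row>
    <entry>11:30</entry>
    <entry/>
    <entry/>
  </row>
</tbody>"""

def place_entries(tbody, ncols):
    """Return (row, column, text) for every <entry>, honoring morerows."""
    remaining = [0] * ncols  # per column: rows still covered by a span from above
    placed = []
    for r, row in enumerate(tbody.findall("row")):
        entries = iter(row.findall("entry"))
        for c in range(ncols):
            if remaining[c] > 0:
                # This column is occupied by a cell that spans down from an earlier row.
                remaining[c] -= 1
                continue
            entry = next(entries, None)
            if entry is None:
                break
            remaining[c] = int(entry.get("morerows", "0"))
            placed.append((r, c, entry.text))
    return placed

placed = place_entries(ET.fromstring(ROWS), ncols=6)
second_row_columns = [c for r, c, _ in placed if r == 1]
print(second_row_columns)
```

The column of each cell in the second row can be determined only by replaying the spans declared in the first, which is exactly the state a human encoder has to keep in mind.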
Encoding this information in a logical abstract table model might produce something like the following:
<table>
<days>
<day day="Monday">Monday</day>
<day day="Tuesday">Tuesday</day>
<day day="Wednesday">Wednesday</day>
<day day="Thursday">Thursday</day>
<day day="Friday">Friday</day>
</days>
<times>
<time time="t1100">11:00</time>
<time time="t1130">11:30</time>
<time time="t1200">12:00</time>
<time time="t1230">12:30</time>
<time time="t0100">1:00</time>
<time time="t0130">1:30</time>
<time time="t0200">2:00</time>
<time time="t0230">2:30</time>
<time time="t0300">3:00</time>
<time time="t0330">3:30</time>
<time time="t0400">4:00</time>
<time time="t0430">4:30</time>
<time time="t0500">5:00</time>
</times>
<assignments>
<assignment day="Monday" startTime="t1100" endTime="t0100">Marquette</assignment>
<assignment day="Monday" startTime="t0200" endTime="t0300">Birnbaum</assignment>
<assignment day="Monday" startTime="t0300" endTime="t0400">Konsko</assignment>
<assignment day="Monday" startTime="t0400" endTime="t0500">Fraser</assignment>
<assignment day="Wednesday" startTime="t1100" endTime="t1200">Birnbaum</assignment>
<assignment day="Wednesday" startTime="t1200" endTime="t0100">Fraser</assignment>
<assignment day="Thursday" startTime="t0100" endTime="t0200">Jimerson</assignment>
<assignment day="Thursday" startTime="t0300" endTime="t0400">Jimerson</assignment>
<assignment day="Friday" startTime="t1100" endTime="t0100">Post</assignment>
<assignment day="Friday" startTime="t0100" endTime="t0200">Konsko</assignment>
</assignments>
</table>
As with the non-spanning example earlier, the first step is to list the members of each dimension (<days> and <times> in the present case) in order. Note that each <time> element carries an attribute to indicate when that time period begins, but not when it ends. This means that if an office hour is encoded as lasting from, say, 11:00 to 1:00, it does not actually include the time period that begins at 1:00, a fact that must be addressed during eventual transformation and rendering (see the XSLT below). The 5:00 time slot exists only because it can be the end of an office hour span; its @time attribute can be the target of the @endTime attribute of an <assignment> element, but not of its @startTime attribute. The attribute values are a bit clumsy (a value that simply listed the time, e.g., 11:00, would seem more natural) because ID/IDREF validation is possible only if IDs conform to XML naming conventions, which requires (among other things) that 1) they not begin with a digit, and 2) they not contain a colon.
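The half-open reading of a span (start slot included, end slot excluded) can be illustrated with a brief Python sketch over an excerpt of the encoding above. The helper name covered_slots is an assumption, and the <days> list is left out because it plays no role here.

```python
import xml.etree.ElementTree as ET

# Excerpt of the logical table above: five time labels and two assignments.
TABLE = """<table>
  <times>
    <time time="t1100">11:00</time>
    <time time="t1130">11:30</time>
    <time time="t1200">12:00</time>
    <time time="t1230">12:30</time>
    <time time="t0100">1:00</time>
  </times>
  <assignments>
    <assignment day="Monday" startTime="t1100" endTime="t0100">Marquette</assignment>
    <assignment day="Wednesday" startTime="t1100" endTime="t1200">Birnbaum</assignment>
  </assignments>
</table>"""

root = ET.fromstring(TABLE)

# Document order of the <time> elements supplies the sequence, so the
# twelve-hour identifiers never have to be parsed or sorted.
order = [t.get("time") for t in root.iter("time")]

def covered_slots(assignment):
    """Return the slots an assignment occupies, excluding the slot named by endTime."""
    start = order.index(assignment.get("startTime"))
    end = order.index(assignment.get("endTime"))
    return order[start:end]

spans = {a.text: covered_slots(a) for a in root.iter("assignment")}
print(spans)
```

Marquette's span covers four half-hour slots and Birnbaum's covers two, which matches the morerows values of 3 and 1 in the presentational markup.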
Because this particular table does not include assignments that span days, the <assignment> elements have only a single attribute for day (rather than separate start and end attributes), but the addition of another attribute would support spanning across the other dimension in situations where that might be desirable. In the present case it would be contrary to sense to encode spanning across days even if an instructor were present at the same time on different days, since the instructor’s presence on those two days would not constitute a single office-hour assignment in the same way as presence in consecutive time blocks on the same day would.
Time spans are indicated using the twelve-hour clock, as expected by human users in the United States, where the twenty-four-hour clock is used much less widely than in many other countries. Programming the transformation script would have been easier with a twenty-four-hour clock, but since the goals of the present model included ease of encoding and legibility, the twelve-hour clock was retained.
It is clear that the abstract data model illustrated above is much more legible (easier to encode, read, and maintain) than a presentationally oriented table model. Each <assignment> element contains in one place all of the information about a logical table entry: the day of the office hour, when it begins and ends (in a user-friendly twelve-hour clock), and who will be present in the office. This data is easily mined for reports based on any markup detail, whether it is when a particular instructor will be in the office or who will be in the office on a particular day at a particular time.15
The following RelaxNG schema (compact syntax) is capable of representing this type of table:
start = table
table = element table { days, times, assignments }
days = element days { day+ }
day = element day { attlist.day, text }
attlist.day = attribute day { xsd:ID }
times = element times { time+ }
time = element time { attlist.time, text }
attlist.time = attribute time { xsd:ID }
assignments = element assignments { assignment+ }
assignment = element assignment { attlist.assignment, text }
attlist.assignment =
  attribute day { xsd:IDREF },
  attribute startTime { xsd:IDREF },
  attribute endTime { xsd:IDREF }
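What the ID/IDREF typing buys can be approximated outside a schema validator. The Python sketch below is an assumption-laden illustration, not a RELAX NG validation: the function name `dangling_references` and the one-assignment sample are hypothetical, and the check covers only whether each reference on an <assignment> matches some declared ID.

```python
import xml.etree.ElementTree as ET


def dangling_references(table):
    """Return (attribute, value) pairs on <assignment> that match no declared ID."""
    # XML IDs share a single pool, so day and time IDs go into one set.
    declared = {d.get("day") for d in table.iter("day")}
    declared |= {t.get("time") for t in table.iter("time")}
    problems = []
    for a in table.iter("assignment"):
        for name in ("day", "startTime", "endTime"):
            if a.get(name) not in declared:
                problems.append((name, a.get(name)))
    return problems


# Hypothetical one-assignment instance; {end} is filled in below.
TEMPLATE = """<table>
  <days><day day="Wednesday">Wednesday</day></days>
  <times><time time="t1100">11:00</time><time time="t1200">12:00</time></times>
  <assignments>
    <assignment day="Wednesday" startTime="t1100" endTime="{end}">Birnbaum</assignment>
  </assignments>
</table>"""

good = ET.fromstring(TEMPLATE.format(end="t1200"))
bad = ET.fromstring(TEMPLATE.format(end="t0100"))
print(dangling_references(good))  # []
print(dangling_references(bad))   # [('endTime', 't0100')]
```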
Although the logical abstract table model is very easy to encode, read, and maintain, transforming it to a presentationally oriented structure for rendering (for example, in HTML) is complex. It is impossible to avoid this complexity entirely; one deals with it either at encoding time (the presentationally oriented standards) or during transformation (the logical abstract approach). Because one may transform an unlimited number of tables using the same stylesheets, it is clearly most economical to pay the complexity tax in designing a stylesheet once, rather than in designing and encoding each and every table that will eventually need to be rendered.
Rendering is easier with column spanning than with row spanning because with column spanning all of the information in the output presentational model is contained in a single row element (<tr> in HTML or <row> in TEI or OASIS CALS).
If we choose at processing time to treat days as rows and time slots as columns, office hours will span multiple columns. The following XSLT stylesheet converts the logical table instance into this type of HTML presentation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output indent="yes"/>
  <!-- Look up assignments by the slot in which they begin, and slots by ID -->
  <xsl:key name="cellByStartTime" match="assignment" use="@startTime"/>
  <xsl:key name="time" match="time" use="@time"/>
  <xsl:template match="/*">
    <html>
      <head>
        <title>Office Hour Schedule</title>
      </head>
      <body>
        <h1>Office Hour Schedules</h1>
        <table border="1">
          <!-- Header row: one column per slot, omitting the final slot -->
          <tr>
            <th/>
            <xsl:for-each select="//time[not(position()=last())]">
              <th>
                <xsl:value-of select="."/>
              </th>
            </xsl:for-each>
          </tr>
          <!-- One row per day; cells are built by walking the slots from the first -->
          <xsl:for-each select="//day">
            <tr>
              <td>
                <strong>
                  <xsl:value-of select="."/>
                </strong>
              </td>
              <xsl:apply-templates select="//times/time[1]">
                <xsl:with-param name="dayDay" select="."/>
              </xsl:apply-templates>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="time">
    <xsl:param name="dayDay"/>
    <xsl:variable name="assignment" select="key('cellByStartTime', @time)[@day eq $dayDay]"/>
    <xsl:choose>
      <!-- An assignment begins here: emit a spanning cell, then resume at its end slot -->
      <xsl:when test="$assignment">
        <td
          colspan="{count(//time[@time=$assignment/@endTime]/preceding-sibling::*) -
                    count(//time[@time=$assignment/@startTime]/preceding-sibling::*)}">
          <xsl:value-of select="$assignment"/>
        </td>
        <xsl:apply-templates select="key('time', $assignment/@endTime)">
          <xsl:with-param name="dayDay" select="$dayDay"/>
        </xsl:apply-templates>
      </xsl:when>
      <!-- No assignment begins here: emit an empty cell and move to the next slot -->
      <xsl:otherwise>
        <td/>
        <xsl:apply-templates select="following-sibling::time[1]">
          <xsl:with-param name="dayDay" select="$dayDay"/>
        </xsl:apply-templates>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <!-- The final slot only marks the end of a span, so it produces no cell -->
  <xsl:template match="time[position()=last()]"/>
</xsl:stylesheet>
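The output looks as follows (in the HTML rendering, each run of adjacent cells with the same name is a single cell that spans those columns):
| | 11:00 | 11:30 | 12:00 | 12:30 | 1:00 | 1:30 | 2:00 | 2:30 | 3:00 | 3:30 | 4:00 | 4:30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Monday | Marquette | Marquette | Marquette | Marquette | | | Birnbaum | Birnbaum | Konsko | Konsko | Fraser | Fraser |
| Tuesday | | | | | | | | | | | | |
| Wednesday | Birnbaum | Birnbaum | Fraser | Fraser | | | | | | | | |
| Thursday | | | | | Jimerson | Jimerson | | | Jimerson | Jimerson | | |
| Friday | Post | Post | Post | Post | Konsko | Konsko | | | | | | |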
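The column-spanning arithmetic used for such a row can be restated as a short Python sketch. The function name `row_cells` and the tuple-based input are assumptions for illustration; the span is computed, as in the colspan expression of the stylesheet, as the difference between the positions of the end and start slots.

```python
def row_cells(slot_ids, assignments):
    """Build one day's row as a list of (name_or_None, colspan) pairs.

    slot_ids: time IDs in order, including the final slot that can only end a span.
    assignments: (start_id, end_id, name) tuples for a single day.
    """
    index = {slot: i for i, slot in enumerate(slot_ids)}
    starts = {start: (end, name) for start, end, name in assignments}
    cells = []
    i = 0
    last = len(slot_ids) - 1  # the final slot never opens a cell
    while i < last:
        if slot_ids[i] in starts:
            end, name = starts[slot_ids[i]]
            span = index[end] - i
            if span < 1:
                raise ValueError(f"assignment for {name!r} does not end after it starts")
            cells.append((name, span))
            i = index[end]  # resume at the slot where the span ends
        else:
            cells.append((None, 1))
            i += 1
    return cells


SLOTS = ["t1100", "t1130", "t1200", "t1230", "t0100", "t0130", "t0200",
         "t0230", "t0300", "t0330", "t0400", "t0430", "t0500"]
friday = [("t1100", "t0100", "Post"), ("t0100", "t0200", "Konsko")]
print(row_cells(SLOTS, friday))
```

Whatever the assignments, the spans in a row add up to the number of slot columns (twelve here), which is a convenient sanity check on the output.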
If we choose at processing time to treat days as columns and time slots as rows, office hours will span multiple rows. The following XSLT stylesheet converts the logical table instance above into this type of HTML presentation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output indent="yes"/>
  <xsl:key name="cellByStartTime" match="assignment" use="@startTime"/>
  <xsl:key name="time" match="time" use="@time"/>
  <xsl:template match="/*">
    <html>
      <head>
        <title>Office Hour Schedule</title>
      </head>
      <body>
        <h1>Office Hour Schedules</h1>
        <table border="1">
          <!-- Header row: one column per day -->
          <tr>
            <th/>
            <xsl:for-each select="//day">
              <th>
                <xsl:value-of select="."/>
              </th>
            </xsl:for-each>
          </tr>
          <!-- One row per slot, omitting the final slot -->
          <xsl:for-each select="//time[position() ne last()]">
            <xsl:variable name="timeTime" select="."/>
            <tr>
              <td>
                <strong>
                  <xsl:value-of select="."/>
                </strong>
              </td>
              <xsl:for-each select="//day">
                <xsl:variable name="dayDay" select="."/>
                <xsl:choose>
                  <!-- An assignment begins in this slot on this day: emit a spanning cell -->
                  <xsl:when test="//assignment[@day=$dayDay and @startTime=$timeTime/@time]">
                    <xsl:variable name="cellCell" select="//assignment[@day=$dayDay
                      and @startTime=$timeTime/@time]"/>
                    <td
                      rowspan="{count(//time[@time=$cellCell/@endTime]/preceding-sibling::*) -
                                count(//time[@time=$cellCell/@startTime]/preceding-sibling::*)}">
                      <xsl:value-of select="$cellCell"/>
                    </td>
                  </xsl:when>
                  <!-- Otherwise emit an empty cell only if the slot is free on this day -->
                  <xsl:otherwise>
                    <xsl:variable name="free">
                      <xsl:apply-templates select="//times/time[1]">
                        <xsl:with-param name="dayDay" select="$dayDay"/>
                      </xsl:apply-templates>
                    </xsl:variable>
                    <xsl:if test="$free/slot = $timeTime">
                      <td/>
                    </xsl:if>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
  <!-- Walk the slots for one day, collecting those not covered by any assignment -->
  <xsl:template match="time">
    <xsl:param name="dayDay"/>
    <xsl:variable name="assignment" select="key('cellByStartTime', @time)[@day=$dayDay]"/>
    <xsl:choose>
      <xsl:when test="$assignment">
        <xsl:apply-templates select="key('time', $assignment/@endTime)">
          <xsl:with-param name="dayDay" select="$dayDay"/>
        </xsl:apply-templates>
      </xsl:when>
      <xsl:otherwise>
        <slot>
          <xsl:copy-of select="."/>
        </slot>
        <xsl:apply-templates select="following-sibling::time[1]">
          <xsl:with-param name="dayDay" select="$dayDay"/>
        </xsl:apply-templates>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="time[position()=last()]"/>
</xsl:stylesheet>
The table at the beginning of this section illustrates the output of such a transformation.
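Row spanning is harder because a cell opened in one row suppresses cells in later rows. The Python sketch below illustrates that decision; it uses a direct coverage test rather than the recursive walk of free slots in the stylesheet, and the function name `grid_rows` and tuple-based input are assumptions for illustration.

```python
def grid_rows(slot_ids, days, assignments):
    """Build the rows of a slots-by-days grid with row spanning.

    assignments: (day, start_id, end_id, name) tuples.
    Each row is a list of (day, name_or_None, rowspan) entries. A day has no
    entry in a row when that slot is covered by a span opened in an earlier row.
    """
    index = {slot: i for i, slot in enumerate(slot_ids)}
    rows = []
    for i in range(len(slot_ids) - 1):  # the final slot never opens a row
        row = []
        for day in days:
            todays = [(index[s], index[e], n)
                      for d, s, e, n in assignments if d == day]
            starting = [(s, e, n) for s, e, n in todays if s == i]
            covered = any(s < i < e for s, e, _ in todays)
            if starting:
                s, e, n = starting[0]
                row.append((day, n, e - s))
            elif not covered:
                row.append((day, None, 1))
        rows.append(row)
    return rows


SLOTS = ["t1100", "t1130", "t1200", "t1230", "t0100"]
DAYS = ["Tuesday", "Wednesday"]
ASSIGNMENTS = [("Wednesday", "t1100", "t1200", "Birnbaum"),
               ("Wednesday", "t1200", "t0100", "Fraser")]
for row in grid_rows(SLOTS, DAYS, ASSIGNMENTS):
    print(row)
```

In this excerpt the 11:30 and 12:30 rows contain no Wednesday entry, because the cells opened at 11:00 and 12:00 each extend over two rows.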
Some tables invite permutations other than the mere swapping of rows and columns. In the office hour schedule, for example, the two representations discussed above use days and time slots for the dimensions and instructor names for the cell contents because both are designed to facilitate finding the days and times when an instructor (any instructor) will be available for consultation. Within this particular course, students are encouraged to bring their questions to any instructor, which means that this type of table fits their immediate needs. On the other hand, there are times when students need to see a particular instructor, e.g., when they need to discuss their performance in a particular discussion section taught by that instructor. For those purposes a table that arrays instructor names as one dimension and days as the other and renders the time spans as cell contents would be more useful, since such a table would group all of an instructor’s hours in one row or column. Such a table might look as follows:16
| | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| Birnbaum | 2:00–3:00 | | 11:00–12:00 | | |
| Fraser | 4:00–5:00 | | 12:00–1:00 | | |
| Jimerson | | | | 1:00–2:00, 3:00–4:00 | |
| Konsko | 3:00–4:00 | | | | 1:00–2:00 |
| Marquette | 11:00–1:00 | | | | |
| Post | | | | | 11:00–1:00 |
Or it might swap the rows and columns, but retain the time spans as the cell contents:
| | Birnbaum | Fraser | Jimerson | Konsko | Marquette | Post |
|---|---|---|---|---|---|---|
| Monday | 2:00–3:00 | 4:00–5:00 | | 3:00–4:00 | 11:00–1:00 | |
| Tuesday | | | | | | |
| Wednesday | 11:00–12:00 | 12:00–1:00 | | | | |
| Thursday | | | 1:00–2:00, 3:00–4:00 | | | |
| Friday | | | | 1:00–2:00 | | 11:00–1:00 |
A logical table model designed to support the complete interchange of all three values (days, times, persons) might use an instance structure like the following:
<table>
  <days>
    <day day="Monday">Monday</day>
    <day day="Tuesday">Tuesday</day>
    <day day="Wednesday">Wednesday</day>
    <day day="Thursday">Thursday</day>
    <day day="Friday">Friday</day>
  </days>
  <times>
    <time time="t1100">11:00</time>
    <time time="t1130">11:30</time>
    <time time="t1200">12:00</time>
    <time time="t1230">12:30</time>
    <time time="t0100">1:00</time>
    <time time="t0130">1:30</time>
    <time time="t0200">2:00</time>
    <time time="t0230">2:30</time>
    <time time="t0300">3:00</time>
    <time time="t0330">3:30</time>
    <time time="t0400">4:00</time>
    <time time="t0430">4:30</time>
    <time time="t0500">5:00</time>
  </times>
  <persons>
    <person person="Birnbaum">Birnbaum</person>
    <person person="Fraser">Fraser</person>
    <person person="Jimerson">Jimerson</person>
    <person person="Konsko">Konsko</person>
    <person person="Marquette">Marquette</person>
    <person person="Post">Post</person>
  </persons>
  <assignments>
    <assignment day="Monday" startTime="t1100" endTime="t0100" person="Marquette"/>
    <assignment day="Monday" startTime="t0200" endTime="t0300" person="Birnbaum"/>
    <assignment day="Monday" startTime="t0300" endTime="t0400" person="Konsko"/>
    <assignment day="Monday" startTime="t0400" endTime="t0500" person="Fraser"/>
    <assignment day="Wednesday" startTime="t1100" endTime="t1200" person="Birnbaum"/>
    <assignment day="Wednesday" startTime="t1200" endTime="t0100" person="Fraser"/>
    <assignment day="Thursday" startTime="t0100" endTime="t0200" person="Jimerson"/>
    <assignment day="Thursday" startTime="t0300" endTime="t0400" person="Jimerson"/>
    <assignment day="Friday" startTime="t1100" endTime="t0100" person="Post"/>
    <assignment day="Friday" startTime="t0100" endTime="t0200" person="Konsko"/>
  </assignments>
</table>
A RelaxNG schema (compact syntax) capable of modeling this instance might look like:
start = table
table = element table { days, times, persons, assignments }
days = element days { day+ }
day = element day { attlist.day, text }
attlist.day = attribute day { xsd:ID }
times = element times { time+ }
time = element time { attlist.time, text }
attlist.time = attribute time { xsd:ID }
persons = element persons { person+ }
person = element person { attlist.person, text }
attlist.person = attribute person { xsd:ID }
assignments = element assignments { assignment+ }
assignment = element assignment { attlist.assignment }
attlist.assignment =
  attribute day { xsd:IDREF },
  attribute startTime { xsd:IDREF },
  attribute endTime { xsd:IDREF },
  attribute person { xsd:IDREF }
Note that in the tables above, there are two discrete (non-contiguous and non-overlapping) time periods associated with instructor Jimerson on Thursday: 1:00–2:00 and 3:00–4:00. A stylesheet intended to process this type of structure would need to include logic to deal with such situations.
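One way to handle several periods in one cell is to group the assignments by person and day before rendering. The Python sketch below is only an illustration of that grouping, not the stylesheet logic itself; the abridged sample instance and the function name `spans_by_person_and_day` are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

EN_DASH = "\u2013"

# Abridged sample of the three-dimension instance (an assumption).
SAMPLE = """<table>
  <days><day day="Monday">Monday</day><day day="Thursday">Thursday</day></days>
  <times>
    <time time="t0100">1:00</time><time time="t0130">1:30</time>
    <time time="t0200">2:00</time><time time="t0230">2:30</time>
    <time time="t0300">3:00</time><time time="t0330">3:30</time>
    <time time="t0400">4:00</time>
  </times>
  <persons>
    <person person="Birnbaum">Birnbaum</person>
    <person person="Jimerson">Jimerson</person>
  </persons>
  <assignments>
    <assignment day="Monday" startTime="t0200" endTime="t0300" person="Birnbaum"/>
    <assignment day="Thursday" startTime="t0300" endTime="t0400" person="Jimerson"/>
    <assignment day="Thursday" startTime="t0100" endTime="t0200" person="Jimerson"/>
  </assignments>
</table>"""


def spans_by_person_and_day(table):
    """Map (person, day) to a cell string listing every span in slot order."""
    times = list(table.iter("time"))
    label = {t.get("time"): t.text for t in times}
    order = {t.get("time"): i for i, t in enumerate(times)}
    cells = defaultdict(list)
    # Sort by start slot so that discrete periods read chronologically.
    for a in sorted(table.iter("assignment"),
                    key=lambda a: order[a.get("startTime")]):
        span = label[a.get("startTime")] + EN_DASH + label[a.get("endTime")]
        cells[(a.get("person"), a.get("day"))].append(span)
    return {key: ", ".join(spans) for key, spans in cells.items()}


cells = spans_by_person_and_day(SAMPLE and ET.fromstring(SAMPLE))
print(cells[("Jimerson", "Thursday")])  # 1:00–2:00, 3:00–4:00
```

Because the result is keyed by both person and day, either value can be chosen as the row dimension at rendering time.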
Craig R. Sampson’s SASOUT table model was introduced at the SGML ’96 conference as an alternative to the CALS model.[Sampson 1996][Sampson 1997] Although the CALS model met the developers’ needs for printed (paper) output, they found that it was not capable of providing the varied views they needed for electronic publication.17
Within the SASOUT model, a <table> contains a hierarchical (nested) structure of columns (<col>) and subcolumns (<scol>) plus a similar structure for rows (<row>) and subrows (<srow>). The rows and subrows contain <cell> elements, which, in turn, contain pointers (in attribute values) to the parent rows (@pr) and columns (@pc). Sampson explains that “[g]roups of subcolumns can be thought of as horizontal spans” (p. 258) and “[s]ubrows are a means of creating vertical spans,” supporting a structure like the one below (all SASOUT illustrations in this report are taken from or based on Sampson’s examples)[Sampson 1996]:
| Local Hosts | | MVS | CMS |
| Remote Hosts | Release | 6.07+ | 6.07+ |
| MVS | 6.07+ | APPC TCP/IP | APPC TCP/IP |
| | 6.06, 5.18 | None | None |
| CMS | 6.07+ | APP TCP/IP | APP TCP/IP |
| | 6.06, 5.18 | None | None |
In a SASOUT model of the preceding table, the MVS and CMS rows each contain one cell (the leftmost, with the content “MVS” or “CMS”) plus two subrows (one for each release of the operating system included in the table). This is different from all of the representations of spanning discussed above, which treat a span as covering multiple rows, while the SASOUT model regards the spanning cell as falling into a single row, which is then divided into subrows for subsequent columns. This is particularly convenient for the “drill-down” style of table, where the user selects a row, then a subrow, and then a column, after which the system displays the contents of the cell at that intersection. Because all cells bear pointers to their parent rows and columns, it is possible to trace the path between any cell and its row and column labels (although in the case of subrows and subcolumns that path may not be immediate). This means that the encoding of the semantic relationships does not depend purely on the containment of <cell> elements within <row> elements (as is the case with the HTML, TEI, and CALS models).
On the other hand, although in “drill-down” tables row and column spanning occurs in the upper or leftmost cells that represent column and row labels, spanning may also occur within a table, as in the instructor office hour schedule above. The SASOUT model is capable of representing such structures, but that representation reveals a peculiar clash between the membership of a <cell> within a <row> as reflected in the syntactic containment (<cell> elements are part of the content model for <row> elements) and that relationship as reflected in the @pr (parent row) attributes. For example, the following table shows that there is a double-length session in the user track from 10:00 until (apparently) 12:00: