This paper describes XTL (an XML Transformation Language), a new query and transformation language. XTL takes an approach that is both output driven and schema driven: 1) the output structure of a transformation is specified in an XML schema language (so far we have chosen DTD), and 2) well-formed input XML documents are mapped to that output structure by XPath expressions embedded in the DTD.
XTL has a simple syntax because it is declarative and adds only a few extensions. Users only have to understand the DTD and XPath specifications plus the few extensions and rules of XTL. XTL is nevertheless powerful because it has efficient operations for extracting and transforming XML data.
This paper also describes an XTL processor that translates an XTL expression into an XSLT expression. This generator is useful for XSLT users who want to transform XML, because XTL is much simpler than XSLT. They can use it as a front-end tool for XSLT.
Keywords: Querying; Transforming; XPath; XSLT
As data exchange (B2B and B2C) attracts more attention, not only the standardization of XML-based languages such as SMIL, MML, and B-XML but also transformation technology is becoming important for exchanging XML data that have different structures. XTL is a transformation language that outputs XML data by querying and transforming a collection of input XML data.
The related existing languages are:
1) XML query languages (XQL [XQL FAQ], XML-QL [A query language for XML], Quilt [Quilt: an XML Query Language])
2) XSL Transformations (XSLT [XSL Transformations (XSLT) Version 1.0] [XSL Transformations (XSLT) Version 1.1])
These languages have two problems:
1) They are not easy to learn or to use because of their syntactic complexity (XSLT and Quilt, for example).
2) Their functions are not sufficient (XQL lacks a restructuring operation and XML-QL lacks a structure-preserving query, for example).
XTL addresses these problems with an approach that is both output driven and schema driven:
1) To specify the output structure of a transformation using an XML schema language. So far, we have chosen DTD because it is the most popular XML schema language. XML Schema and RELAX would also be good candidates as the XML schema language for XTL.
2) To specify the mapping rule between well-formed input XML data and the output structure using XPath [XML Path Language (XPath) Version 1.0].
In addition, XTL's fundamental operations are based on the relational query model, as follows.
| Operation | Relational query model | XTL expression |
| Projection | SELECT clause | Projected elements or attributes are specified using DTD |
| Selection | WHERE clause | XPath's selection is used |
| Rename tag | FROM clause | Renamed element or attribute is specified using DTD |
| Set operation (union, difference, intersect) | UNION clause, - clause | Using XTL extension +, -, *, / operations for node-set specified by XPath. XPath 2.0 [XPath Language Requirement Version 2.0] will support these operations. |
| Cartesian product (join) | FROM clause (and WHERE clause, table1.column1 = table2.column2) | Using XSLT function document() |
| Sort | ORDER BY clause | Using XTL extension ORDER BY clause. |
| Eliminate duplicated node | DISTINCT or GROUP BY clause | Using XTL extension GROUP BY clause |
XTL includes several important XML query functions, as follows. The table below is based on the paper [XML Query Languages: Experiences and Exemplars], with several other important functions added.
| Function | Description | XTL expression |
| Structure preserving | A query to preserve a structure of input XML data | By specifying DTD and XPath to preserve a structure of input XML data |
| Changing structure (including flattening) | A query to change a structure of input XML data | XTL can change a structure of input XML data by using DTD and XPath. |
| Tag variable | Keeping a same tag name with input XML | Use $variable as tag in DTD |
| External function | Invoking a user-defined function | XPath selection condition can use a user-defined function |
| Specifying all of sub-structures | To extract all sub-structures of a specified tag | Use ANY in DTD |
| Recursive query | A query that is executed on a recursive structure | To define a recursive structure in DTD or recursive selection in XPath |
| Reference (data models and navigations) | Referring to a referenced tag | Referring to a referenced tag using XPath |
This section uses several examples to explain the fundamental operations and functions of XTL.
The first example projects several tags of the XML data while preserving its structure. The input XML data is shown below as bib.xml (from the XML-QL examples).
<?xml version="1.0" ?>
<bib>
<book year="1995">
<!-- A good introductory text -->
<title>An Introduction to Database Systems</title>
<author><lastname>Date</lastname></author>
<publisher><name>Addison-Wesley</name></publisher>
</book>
<book year="1998">
<title>Foundations for Object/Relational Databases</title>
<author><lastname>Date</lastname></author>
<author><lastname>Darwen</lastname></author>
<publisher><name>Addison-Wesley</name></publisher>
</book>
<book year="1999">
<title>Data on the Web: from Relations to Semistructured Data &amp; XML</title>
<author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author>
<author><firstname>Peter</firstname><lastname>Buneman</lastname></author>
<author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
<publisher><name>Morgan-Kaufman</name></publisher>
</book>
<article year="1999" type="inproceedings" month="June">
<author><firstname>Mary</firstname><lastname>Fernandez</lastname></author>
<author><firstname>Alin</firstname><lastname>Deutsch</lastname></author>
<author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
<title>Storing Semi-structured Data Using STORED</title>
<booktitle>ACM SIGMOD</booktitle>
</article>
<article year="1995" type="inproceedings" month="Jan">
<author><firstname>Norman</firstname><lastname>Ramsey</lastname></author>
<author><firstname>Mary</firstname><lastname>Fernandez</lastname></author>
<title>The New Jersey Machine-Code Toolkit</title>
<booktitle>USENIX</booktitle>
</article>
</bib>
Suppose a query (or transformation) that builds a bibliography containing only books, eliminating all articles. The following XTL expression expresses this query.
<!ELEMENT bib AS {bib} (book*)>
<!ELEMENT book (title, author+)>
<!ATTLIST book year CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
This XTL expression specifies the required elements and attributes (bib, book, and all of book's sub-structures) in the DTD. Therefore, it produces a bibliography that includes only books. To keep the XTL processor easy to implement, XTL follows the rule that any element that can be a root tag must be declared with an XPath expression (AS {XPath expression}) in its element type declaration. The XPath expression specifies which part of the input XML data is mapped to a tag in the output structure. In this example, the bib tag in the input data is mapped to the bib tag in the output structure. When the XPath expression is omitted for an element in a content model or for an attribute in an attribute list declaration, XTL uses the specified output tag name as the default XPath expression. For example, <!ELEMENT bib AS {bib} (book*)> has the same meaning as <!ELEMENT bib AS {bib} (book* AS {book})>.
However, it is a burden for users to specify all sub-structures of book when they do not transform them at all. An XTL query that uses ANY in the DTD solves this issue.
The following example with ANY produces the same result as the previous XTL expression.
<!ELEMENT bib AS {bib} (book*)>
<!ELEMENT book ANY>
<!ATTLIST book year CDATA #REQUIRED>
ANY plays the same role as specifying all the sub-structures (title, author, firstname, and lastname) of book in this XTL expression. Basically, ANY matches all sub-structures recursively unless another element type declaration matches them.
ANY is useful for changing a small part of the input XML data. The next example transforms the author name by concatenating firstname and lastname into a new author_name element while keeping the rest of the input XML unchanged.
<!ELEMENT bib AS {bib} ANY>
<!ELEMENT author AS {author} (author_name)>
<!ELEMENT author_name (#PCDATA AS {concat(firstname, ' ', lastname)})>
In addition, this example shows a different use of the AS clause. If the AS clause is specified for #PCDATA or attribute data, it maps a value to an output node. If the AS clause is specified for an element, it maps an input node to an output node.
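The value map in this example can be imitated outside XTL. The following is a minimal Python sketch, not part of XTL or its processor; the helper name rewrite_author is hypothetical. It shows what concat(firstname, ' ', lastname) yields for an author with and without a firstname.

```python
import xml.etree.ElementTree as ET

def rewrite_author(author):
    """Build <author><author_name>first last</author_name></author> (hypothetical helper)."""
    # A missing child yields "", just as an empty node-set becomes "" in XPath concat().
    first = author.findtext("firstname", default="")
    last = author.findtext("lastname", default="")
    out = ET.Element("author")
    ET.SubElement(out, "author_name").text = first + " " + last
    return out

serge = ET.fromstring(
    "<author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author>"
)
date = ET.fromstring("<author><lastname>Date</lastname></author>")

serge_xml = ET.tostring(rewrite_author(serge), encoding="unicode")
date_xml = ET.tostring(rewrite_author(date), encoding="unicode")
print(serge_xml)
print(date_xml)
```

Note that an author without a firstname keeps the separating space, so the value starts with a blank; an XPath function such as normalize-space() could remove it if that matters.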
The next example extracts tags that satisfy several conditions on elements or attributes. For example, let us make a bibliography that contains only books that were published after 1995 and whose titles contain XML. The following XTL expression expresses this query.
<!ELEMENT bib AS {bib} (book*)>
<!ELEMENT book (title AS {title[contains(.,'XML')]}, author+)>
<!ATTLIST book year AS {@year[.>1995]} CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author ANY>
The XPath expression @year[.>1995] indicates that the book's year must be larger than 1995, and the expression title[contains(.,'XML')] indicates that the book's title must contain the string XML. Because both title and the year attribute are required in the output structure, a book for which either expression returns nothing is dropped. Therefore, this XTL expression extracts the books whose year is larger than 1995 and whose title contains XML.
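The effect of the two selection conditions can be checked with a minimal Python sketch. It is an illustration only, not the XTL processor; the helper name select_books is hypothetical, and the data is an abbreviated copy of bib.xml. Python's ElementTree supports only a subset of XPath (no contains()), so the two predicates are written as plain Python tests.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of bib.xml; "&" is written as "&amp;" so the data is well-formed.
BIB_XML = """
<bib>
  <book year="1995"><title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author></book>
  <book year="1998"><title>Foundations for Object/Relational Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author></book>
  <book year="1999"><title>Data on the Web: from Relations to Semistructured Data &amp; XML</title>
    <author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author></book>
</bib>
"""

def select_books(bib):
    """Keep books with year > 1995 whose title contains 'XML' (hypothetical helper)."""
    out = ET.Element("bib")
    for book in bib.findall("book"):
        year = book.get("year")
        title = book.findtext("title", default="")
        # Mirrors @year[.>1995] and title[contains(.,'XML')]
        if year is not None and float(year) > 1995 and "XML" in title:
            out.append(book)
    return out

selected = select_books(ET.fromstring(BIB_XML))
selected_years = [book.get("year") for book in selected.findall("book")]
print(selected_years)
```

Only the 1999 book passes both tests: the 1998 book is new enough but its title does not contain XML.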
Next, consider a transformation that capitalizes all tag names.
<!ELEMENT Bib AS {bib} (book | article)*>
<!ELEMENT Book AS {book} (title, author+, publisher)>
<!ATTLIST Book
Year AS {year} CDATA #REQUIRED>
<!ELEMENT Article AS {article}
(author+, title, booktitle?,
(shortversion | longversion)?)>
<!ATTLIST Article
Type AS {type} CDATA #REQUIRED
Year AS {year} CDATA #REQUIRED
Month AS {month} CDATA #IMPLIED>
<!ELEMENT Publisher AS {publisher} (name, address?)>
<!ELEMENT Name AS {name} (#PCDATA)>
<!ELEMENT Title AS {title} (#PCDATA)>
<!ELEMENT Author AS {author} (firstname?, lastname)>
<!ELEMENT Firstname AS {firstname} (#PCDATA)>
<!ELEMENT Lastname AS {lastname} (#PCDATA)>
<!ELEMENT Booktitle AS {booktitle} (#PCDATA)>
For example, the first line of this XTL expression indicates that the bib tag is transformed into a Bib tag.
The next example changes the structure to build a list of authors.
<!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})>
<!ELEMENT author ANY>
This example specifies a collection of author elements as the child elements of the bib element; the collection is extracted with the XPath expression //author. Moreover, the GROUP BY clause eliminates duplicated author elements. The exclusion key specified with GROUP BY is {.}, which indicates the author element itself. This elimination is based on deep equality of the author structure.
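Duplicate elimination by deep equality can be sketched in Python as follows. This is an illustration under assumptions, not the XTL definition: the helpers deep_key and distinct_authors are hypothetical, and deep equality is taken here to mean equal tag, attributes, trimmed text, and children, compared recursively.

```python
import xml.etree.ElementTree as ET

AUTHORS_XML = """
<bib>
  <book><author><lastname>Date</lastname></author></book>
  <book><author><lastname>Date</lastname></author>
        <author><lastname>Darwen</lastname></author></book>
  <article><author><firstname>Dan</firstname><lastname>Suciu</lastname></author></article>
  <book><author><firstname>Dan</firstname><lastname>Suciu</lastname></author></book>
</bib>
"""

def deep_key(elem):
    """Hashable key: tag, sorted attributes, trimmed text, and child keys, recursively."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(deep_key(child) for child in elem),
    )

def distinct_authors(bib):
    """Return the first occurrence of each structurally distinct author."""
    seen = set()
    result = []
    for author in bib.iter("author"):  # document order, like //author
        key = deep_key(author)
        if key not in seen:
            seen.add(key)
            result.append(author)
    return result

authors = distinct_authors(ET.fromstring(AUTHORS_XML))
lastnames = [a.findtext("lastname") for a in authors]
print(lastnames)
```

Five author elements go in and three come out, in the order of their first occurrence.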
Let us look at a more complicated example that outputs a list of authors, each with a list of book titles. This example turns the input structure between title and author upside down.
<!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})>
<!ELEMENT author (firstname?, lastname, title* AS {//title[../author=$author]})>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title year AS {../@year} CDATA #REQUIRED>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
As in the previous example, this XTL expression specifies a collection of author elements as child elements of the bib tag. In addition, it specifies a collection of title tags as child elements of the author tag to make a title list for each author. The XPath expression //title[../author=$author] collects the title elements from the input XML data whose parent has an author element that is the same as the current author (specified by $author).
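The inversion of the title and author structure can be sketched in Python. This is only an illustration of the intended result, not the XTL processor; the helpers author_name and titles_by_author are hypothetical, authors are compared by their concatenated text, and the result is a dictionary rather than XML to keep the sketch short.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book year="1995"><title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author></book>
  <book year="1998"><title>Foundations for Object/Relational Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author></book>
</bib>
"""

def author_name(author):
    """Concatenated text of an author element, used as the comparison key."""
    return "".join(author.itertext()).strip()

def titles_by_author(bib):
    """Map each distinct author to the (year, title) pairs of the entries listing that author."""
    result = {}
    for entry in bib:  # each book or article
        title = entry.findtext("title")
        for author in entry.findall("author"):
            # Same idea as //title[../author=$author]: the title's parent has this author.
            result.setdefault(author_name(author), []).append((entry.get("year"), title))
    return result

inverted = titles_by_author(ET.fromstring(BIB_XML))
for name, entries in inverted.items():
    print(name, entries)
```

The year travels with each title, which corresponds to the year attribute that the XTL expression attaches to title through ../@year.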
Sort is an operation that reorders a collection of elements in ascending or descending order by the value of a specified key tag. Let us look at a simple example that collects an author list and sorts it by name in alphabetical order.
<!ELEMENT bib AS {bib} (author* IN {//author} GROUP BY {.} ORDER BY {.})>
<!ELEMENT author ANY>
In addition to collecting an author list, which is the same operation as in the first changing-structure example, this example adds an ORDER BY clause to sort the author list. When users omit the ascending or descending keyword, ascending order is used by default. The next example sorts in descending order.
<!ELEMENT bib AS {bib}
(author* IN {//author} GROUP BY {.} ORDER BY {.} DESC)>
<!ELEMENT author ANY>
Let us look at an example that has two sort operations in different parts. The next example sorts both the author list and each author's title list.
<!ELEMENT bib AS {bib}
(author* IN {//author} GROUP BY {.} ORDER BY {lastname})>
<!ELEMENT author AS {.} (firstname?, lastname, title* ORDER BY {.})>
<!ELEMENT title AS {//title [../author=$author]} (#PCDATA)>
<!ATTLIST title year AS {../@year} CDATA #REQUIRED>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
The first ORDER BY clause indicates that the author list is sorted using lastname as the key value. The second ORDER BY clause indicates that the title list for each author is sorted using the title itself (expressed by .) as the key value.
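The two sort levels can be sketched in Python. This is an illustration only: the helper sorted_author_index is hypothetical, and for brevity authors are grouped by lastname alone, which is a simplification of GROUP BY {.}.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book year="1998"><title>Foundations for Object/Relational Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author></book>
  <book year="1995"><title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author></book>
</bib>
"""

def sorted_author_index(bib, descending=False):
    """Return (lastname, sorted titles) pairs ordered by lastname (hypothetical helper)."""
    titles = {}
    for entry in bib:
        for author in entry.findall("author"):
            lastname = author.findtext("lastname", default="")
            titles.setdefault(lastname, []).append(entry.findtext("title"))
    # Outer sort: ORDER BY {lastname}; ascending unless descending is requested.
    ordered = sorted(titles.items(), key=lambda item: item[0], reverse=descending)
    # Inner sort: ORDER BY {.} on each author's title list.
    return [(lastname, sorted(ts)) for lastname, ts in ordered]

index = sorted_author_index(ET.fromstring(BIB_XML))
index_desc = sorted_author_index(ET.fromstring(BIB_XML), descending=True)
print(index)
```

The descending flag plays the role of the DESC keyword and affects only the author list here, not the title lists.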
The join operation is a core operation in the relational model because the relation is the unit of data and all information is divided into a collection of relations. In XTL, on the other hand, a join simply combines several XML data into one. Let us look at a join example that combines two XML data. One input is a book catalog (BookCatalogue.xml) that does not contain book prices, and the other is a bookstore list (BookCosts.xml) that includes the book price at each bookstore.
<?xml version="1.0"?>
<BookCatalogue>
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
</BookCatalogue>
<?xml version="1.0"?>
<BookCosts>
<Book>
<Title>My Life and Times</Title>
<Cost store="Walden Books">$12.95</Cost>
<Cost store="Barnes &amp; Noble">$10.95</Cost>
</Book>
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Cost store="Walden Books">$5.95</Cost>
<Cost store="Barnes &amp; Noble">$6.95</Cost>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Cost store="Walden Books">$9.95</Cost>
<Cost store="Barnes &amp; Noble">$8.95</Cost>
</Book>
</BookCosts>
The XTL expression below produces XML data that combines the two XML data above.
<!ELEMENT Bib AS {BookCatalogue} (Book*)>
<!ELEMENT Book (Title, Author+, Date, ISBN, Publisher, Cost*)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
<!ELEMENT Cost AS {document('BookCosts.xml')/BookCosts/Book[Title=$Title]/Cost} (#PCDATA)>
<!ATTLIST Cost store CDATA #REQUIRED>
The important part of this example is the XPath expression for the Cost tag, specified in the second-to-last row. The document function, which is defined in the XSLT specification, extracts part of another XML document (BookCosts.xml) and combines it with the base BookCatalogue.xml. The XPath expression /BookCosts/Book/Cost that follows the document function specifies the target tag that is mapped to the Cost tag. The predicate [Title=$Title] is a selection condition on the Book tag, meaning that the Title tag under that Book tag must be the same as $Title. $Title refers to the Title element that is a sibling of the current Cost tag and is declared on the third line of this XTL example.
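The join on Title can be sketched in Python as a nested loop over the two documents. This is a minimal illustration, not the XTL processor; the helper join_costs is hypothetical, and both inputs are abbreviated copies of BookCatalogue.xml and BookCosts.xml with "&" escaped as "&amp;".

```python
import copy
import xml.etree.ElementTree as ET

CATALOGUE_XML = """
<BookCatalogue>
  <Book><Title>My Life and Times</Title><Author>Paul McCartney</Author></Book>
  <Book><Title>The First and Last Freedom</Title><Author>J. Krishnamurti</Author></Book>
</BookCatalogue>
"""

COSTS_XML = """
<BookCosts>
  <Book><Title>My Life and Times</Title>
    <Cost store="Walden Books">$12.95</Cost>
    <Cost store="Barnes &amp; Noble">$10.95</Cost></Book>
  <Book><Title>The First and Last Freedom</Title>
    <Cost store="Walden Books">$9.95</Cost></Book>
</BookCosts>
"""

def join_costs(catalogue, costs):
    """Append to each catalogue Book the Cost elements of the cost Book with the same Title."""
    out = ET.Element("Bib")
    for book in catalogue.findall("Book"):
        title = book.findtext("Title")
        merged = ET.SubElement(out, "Book")
        merged.extend(copy.deepcopy(child) for child in book)
        for cost_book in costs.findall("Book"):
            # Same condition as Book[Title=$Title] in the XTL expression.
            if cost_book.findtext("Title") == title:
                merged.extend(copy.deepcopy(c) for c in cost_book.findall("Cost"))
    return out

joined = join_costs(ET.fromstring(CATALOGUE_XML), ET.fromstring(COSTS_XML))
first_costs = [c.text for c in joined.find("Book").findall("Cost")]
print(first_costs)
```

A nested loop is the simplest reading of the XPath predicate; indexing the cost books by title first would avoid rescanning the second document for every catalogue entry.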
The appendix shows the XTL syntax. The features of XTL and its extensions to DTD are as follows.
The DTD in XTL specifies the output structure of a transformation. Therefore, it is easy for users to understand and specify the transformation result.
An XPath clause is embedded in the DTD syntax for each element or attribute. This clause specifies a mapping rule from input XML to output XML, and there are two types of mapping rule: 1) node map and 2) value map.
It is possible to omit the XPath clause for an element or attribute. By default, the output tag name is used as the XPath expression for a node map, and text() is used as the XPath expression for a value map. For example, <!ELEMENT bib AS {bib} (book*)> has the same meaning as <!ELEMENT bib AS {bib} (book* AS {book})>, and <!ELEMENT book (#PCDATA)> has the same meaning as <!ELEMENT book (#PCDATA AS {text()})>.
To increase the flexibility of specifying the output structure, the DTD syntax is extended as follows.
In a DTD, it is possible to specify an element cardinality (*, +, ?, or none) in a content model. Because the DTD in XTL specifies the output structure of a transformation, XTL defines the cardinality of an element as a constraint on its mapping rule. For example, <!ELEMENT bib AS {bib} (book* AS {book})> indicates that the XPath expression book should return a collection of zero or more book elements, whereas <!ELEMENT bib AS {bib} (book+ AS {book})> indicates that it should return a collection of one or more book elements.
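The cardinality constraint can be written down as a small Python check. This sketch uses the strict DTD reading of the four indicators, which is an assumption; the function name cardinality_ok is hypothetical, and the XTL processor described in this paper enforces only the lower bound.

```python
def cardinality_ok(indicator, count):
    """Return True if `count` selected nodes satisfy a DTD occurrence indicator.

    indicator is "*", "+", "?" or "" (no indicator); this is the strict DTD reading.
    """
    if count < 0:
        raise ValueError("count must not be negative")
    # (minimum, maximum) per indicator; None means unbounded.
    bounds = {"*": (0, None), "+": (1, None), "?": (0, 1), "": (1, 1)}
    low, high = bounds[indicator]
    return count >= low and (high is None or count <= high)

print(cardinality_ok("*", 0))  # book* accepts an empty collection
print(cardinality_ok("+", 0))  # book+ needs at least one book
```

With this reading, a mapping rule whose XPath expression returns no node fails for + and for no indicator, and succeeds for * and ?.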
XTL also has to define the meaning of ANY and EMPTY. ANY means that the element can have any content model (structure) without any constraint. EMPTY means that the element must be an empty element. For example, <!ELEMENT author AS {author} ANY> indicates that the output author can have any content model and that all sub-elements of the input author are mapped to the transformation result. On the other hand, <!ELEMENT author AS {author} EMPTY> indicates that the input author must be an empty element.
A variable is automatically set for each element and attribute. Any embedded XPath expression can refer to these variables as long as the node assigned to the variable is reachable from the referring node via a many-to-one or one-to-one association in the DTD graph. For example, in <!ELEMENT b AS {bib} (a* AS {$b/author})>, $b refers to the input bib element. If XTL has a recursive element definition, the closest parent is the one referenced from its children.
I have implemented an XTL processor that translates an XTL expression into an XSLT expression. This generator is useful for users who want to transform XML, because XTL is much simpler than XSLT. They can use it as a front-end tool for XSLT.
This processor does not implement the XTL specification fully, because some parts of XTL are difficult to translate into XSLT owing to model gaps between XTL and XSLT. However, the generator translates most of XTL into XSLT, so users can use it for many XML transformations.
This section briefly describes how the XTL processor translates an XTL expression into an XSLT expression. The details follow in the next section.
First of all, let us recall how XSLT works. An XSLT expression is composed of several template declarations. An XSLT processor reads an XML document and checks whether any template matches an input element, attribute, or text. If a template matches, it is applied and executed. If no template matches, the default action is applied, which outputs the input text.
Basically, each element type declaration is translated to one XSLT template declaration. There are two patterns regardless of its content model:
There are two cases in which users must use the first pattern: 1) the element is a candidate for the root element, or 2) the element is a candidate for a descendant of an element declared as ANY. Otherwise, users can use either the first or the second pattern. This is a little confusing, but it reduces the number of template declarations that the XTL processor generates. An element declaration in the first pattern is translated into two template declarations, such as <xsl:template match="bib">... for case 1) and <xsl:template match="bib" mode="any">... for case 2). An element declaration in the second pattern is translated into a template declaration such as <xsl:template match="*|@*|text()" mode="bib">.... The declared element name is used as the mode name in the translated template declaration.
The XTL processor generates three additional types of template declarations:
In case a), a template declaration such as <xsl:template match="*|@*|text()" mode="author_.">... is generated for each element specified with GROUP BY; its purpose is to eliminate duplicates of the specified node. The mode name is the concatenation of the target element name, the string "_", and the key of the GROUP BY clause.
In case b), the template declaration <xsl:template match="text()"></xsl:template> is generated; it indicates that any input node (element, attribute, or text) outputs nothing if it does not match any other template declaration.
In case c), a template declaration:
<xsl:template match="*|@*|text()" mode="any">
<xsl:copy>
<xsl:apply-templates select="*|@*|text()" mode="any"/>
</xsl:copy>
</xsl:template>
I have described how the XTL processor generates each template declaration, but not what is inside the template declarations. Let us look inside them using an example. The example XTL expression is:
<!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})>
<!ELEMENT author (firstname?, lastname,
title* AS {//title[../author=$author]} ORDER BY {.})>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title year AS {../@year} CDATA #REQUIRED>
<!ELEMENT firstname ANY>
<!ELEMENT lastname EMPTY>
and the generated XSLT result is:
<?xml version="1.0" encoding="euc-jp"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" encoding="euc-jp" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
<!-- This is for bib element -->
<xsl:template match="bib">
<xsl:variable name="bib" select="."/>
<xsl:element name="bib">
<xsl:apply-templates select="(//author)" mode="author_.">
<xsl:with-param name="nodes" select="//author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
</xsl:element>
</xsl:template>
<!-- This is for bib element UNDER ANY -->
<xsl:template match="bib" mode="any">
<xsl:variable name="bib" select="."/>
<xsl:element name="bib">
<xsl:apply-templates select="(//author)" mode="author_.">
<xsl:with-param name="nodes" select="//author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
</xsl:element>
</xsl:template>
<!-- This is for author element -->
<xsl:template match="*|@*|text()" mode="author">
<xsl:variable name="author" select="."/>
<xsl:param name="bib"/>
<xsl:if test="count(lastname)>=1">
<xsl:if test="lastname=''">
<xsl:element name="author">
<xsl:apply-templates select="(firstname)[1]" mode="firstname">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
<xsl:apply-templates select="(lastname)[1]" mode="lastname">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
<xsl:apply-templates select="(//title[../author=$author])" mode="title">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
<xsl:sort select="." order="ascending"/>
</xsl:apply-templates>
</xsl:element>
</xsl:if>
</xsl:if>
</xsl:template>
<!-- This is for firstname element -->
<xsl:template match="*|@*|text()" mode="firstname">
<xsl:variable name="firstname" select="."/>
<xsl:param name="author"/>
<xsl:param name="bib"/>
<xsl:element name="firstname">
<xsl:apply-templates select="child::node()" mode="any"/>
</xsl:element>
</xsl:template>
<!-- This is for lastname element -->
<xsl:template match="*|@*|text()" mode="lastname">
<xsl:variable name="lastname" select="."/>
<xsl:param name="author"/>
<xsl:param name="bib"/>
<xsl:if test=".=''">
<xsl:element name="lastname">
</xsl:element>
</xsl:if>
</xsl:template>
<!-- This is for title element -->
<xsl:template match="*|@*|text()" mode="title">
<xsl:variable name="title" select="."/>
<xsl:param name="author"/>
<xsl:param name="bib"/>
<xsl:if test="count(../@year)>=1">
<xsl:element name="title">
<xsl:variable name="year" select="../@year"/>
<xsl:if test="count($year)>=1">
<xsl:attribute name="year">
<xsl:value-of select="$year"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:if>
</xsl:template>
<!-- This is for GROUP author element BY . -->
<xsl:template match="*|@*|text()" mode="author_.">
<xsl:param name="nodes"/>
<xsl:param name="bib"/>
<xsl:variable name="pos" select="position()"/>
<xsl:if test="count($nodes[$pos>position() and .=current()/.])=0">
<xsl:apply-templates select="." mode="author">
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
</xsl:if>
</xsl:template>
<xsl:template match="text()"></xsl:template>
<xsl:template match="*|@*|text()" mode="any">
<xsl:copy>
<xsl:apply-templates select="*|@*|text()" mode="any"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
First, there are three types of content model in an element type declaration: ANY, EMPTY, and element content. The original DTD also has mixed content, but the DTD of XTL does not, because it is extended to handle #PCDATA and elements in the same way (that is, mixed content is merged into element content).
Template declaration is generated according to the rules described in 5-1. For example, <!ELEMENT firstname ANY> is translated into a template:
<xsl:template match="*|@*|text()" mode="firstname">
<xsl:variable name="firstname" select="."/>
<xsl:param name="author"/>
<xsl:param name="bib"/>
<xsl:element name="firstname">
<xsl:apply-templates select="child::node()" mode="any"/>
</xsl:element>
</xsl:template>
<xsl:template match="*|@*|text()" mode="any">
<xsl:copy>
<xsl:apply-templates select="*|@*|text()" mode="any"/>
</xsl:copy>
</xsl:template>
<xsl:template match="bib" mode="any">
<xsl:variable name="bib" select="."/>
<xsl:element name="bib">
<xsl:apply-templates select="(//author)" mode="author_.">
<xsl:with-param name="nodes" select="//author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
</xsl:element>
</xsl:template>
This function is especially powerful when users want to transform only a small part of the structure, or only the tag name of a node, in the input XML. Users can declare the root element as ANY and specify whatever they want to transform using element type declarations or attribute list declarations.
Template declaration is generated according to the rules described in “Query feature”. For example, <!ELEMENT lastname EMPTY> is translated into a template:
<xsl:template match="*|@*|text()" mode="lastname">
<xsl:variable name="lastname" select="."/>
<xsl:param name="author"/>
<xsl:param name="bib"/>
<xsl:if test=".=''">
<xsl:element name="lastname">
</xsl:element>
</xsl:if>
</xsl:template>
Template declaration is generated according to the rules described in 5-1. For example, <!ELEMENT title (#PCDATA)> is translated into a template:
<xsl:template match="*|@*|text()" mode="title">
<xsl:variable name="title" select="."/>
<xsl:param name="author"/>
<xsl:param name="bib"/>
<xsl:if test="count(../@year)>=1">
<xsl:element name="title">
<xsl:variable name="year" select="../@year"/>
<xsl:if test="count($year)>=1">
<xsl:attribute name="year">
<xsl:value-of select="$year"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:if>
</xsl:template>
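The generated template declaration varies according to the content model of the element type declaration in the XTL expression: 1) sequence (A,B,...) or 2) choice (A|B|...). Of the two fragments below, the first is generated for the sequence (firstname?, lastname, title*) and the second for a choice (A|B|C).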
<xsl:element name="author">
<xsl:apply-templates select="(firstname)[1]" mode="firstname">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
<xsl:apply-templates select="(lastname)[1]" mode="lastname">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
<xsl:apply-templates select="(//title[../author=$author])" mode="title">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
<xsl:sort select="." order="ascending"/>
</xsl:apply-templates>
</xsl:element>
<xsl:element name="author">
<xsl:choose>
<xsl:when test="A">
<xsl:apply-templates select="(A)[1]" mode="A">
<xsl:with-param name="root" select="$root"/>
</xsl:apply-templates>
</xsl:when>
<xsl:when test="B">
<xsl:apply-templates select="(B)[1]" mode="B">
<xsl:with-param name="root" select="$root"/>
</xsl:apply-templates>
</xsl:when>
<xsl:when test="C">
<xsl:apply-templates select="(C)[1]" mode="C">
<xsl:with-param name="root" select="$root"/>
</xsl:apply-templates>
</xsl:when>
</xsl:choose>
</xsl:element>
When a content model is nested, each nested part (enclosed in parentheses) can be specified with an XPath expression and behaves as if it were declared as the content model of an element declaration, except that the nested part has no name of its own. For example, <!ELEMENT author (A|B|(C1,C2)*)> behaves like the combination of <!ELEMENT author (A|B|foo*)> and <!ELEMENT foo (C1,C2)>. To avoid generating a temporary element such as foo, the XTL processor generates <xsl:for-each> instead of <xsl:apply-templates>. For example, <!ELEMENT author (A|B|(C1,C2) AS {C})> generates:
<xsl:when test="C">
<xsl:for-each select="C">
<xsl:apply-templates select="(C1)[1]" mode="C1">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
<xsl:apply-templates select="(C2)[1]" mode="C2">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:when>
A template for an element whose content model contains a GROUP BY clause (I will call the declared element the parent element of the GROUP BY element) invokes xsl:apply-templates to apply a special template for exclusion, which I describe below. For example, the template for the bib element applies the special template with <xsl:apply-templates select="(//author)" mode="author_."> instead of directly invoking the template for the author element. In addition, it passes as a parameter the same node-set that is selected for the special template. For example, the template for the bib element passes the result of //author as the parameter nodes.
Let us look at the special template for exclusion using an example. <!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})> generates both a template for the bib element and the following template:
<xsl:template match="*|@*|text()" mode="author_.">
<xsl:param name="nodes"/>
<xsl:param name="bib"/>
<xsl:variable name="pos" select="position()"/>
<xsl:if test="count($nodes[$pos>position() and .=current()/.])=0">
<xsl:apply-templates select="." mode="author">
<xsl:with-param name="bib" select="$bib"/>
</xsl:apply-templates>
</xsl:if>
</xsl:template>
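The test in this template keeps a node only when no earlier node in $nodes has an equal string-value. The following Python sketch imitates that test; it is an illustration only, with hypothetical helper names, and it assumes that whitespace-only text has already been stripped, as xsl:strip-space does in the generated stylesheet.

```python
import xml.etree.ElementTree as ET

def string_value(elem):
    """XPath string-value of an element: the concatenation of its descendant text."""
    return "".join(elem.itertext())

def first_occurrences(nodes):
    """Keep each node that has no earlier node with an equal string-value."""
    kept = []
    for pos, node in enumerate(nodes):
        # Mirrors count($nodes[$pos > position() and . = current()]) = 0
        earlier_equal = [n for n in nodes[:pos] if string_value(n) == string_value(node)]
        if not earlier_equal:
            kept.append(node)
    return kept

root = ET.fromstring(
    "<r>"
    "<author><lastname>Date</lastname></author>"
    "<author><lastname>Darwen</lastname></author>"
    "<author><lastname>Date</lastname></author>"
    "<author><firstname>Dar</firstname><lastname>wen</lastname></author>"
    "</r>"
)
kept_values = [string_value(a) for a in first_occurrences(root.findall("author"))]
print(kept_values)
```

The last author in the sample has a different structure but the same string-value as the second one, so it is excluded as well. A comparison of string-values is therefore coarser than deep equality of the author structure.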
If an ORDER BY clause is specified for an element in a content model, xsl:sort is added to the content of the xsl:apply-templates that applies the template for that element. For example,
<!ELEMENT author (firstname?, lastname, title* AS {//title[../author=$author]} ORDER BY {.})>
is translated into:
<xsl:apply-templates select="(//title[../author=$author])" mode="title">
<xsl:with-param name="author" select="$author"/>
<xsl:with-param name="bib" select="$bib"/>
<xsl:sort select="." order="ascending"/>
</xsl:apply-templates>
The XTL processor checks a cardinality constraint (*, +, ?, and none in DTD) as follows. It checks such node should be more than one that is transitively reached via + or none cardinality in DTD. If there is * or ? cardinality, the XTL processor stops checking that its descendant nodes should be more than one. It doesn't check node should be one or zero in case of ? cardinality or should be exactly one in case of none cardinality, because XPath expression like /bib/author sometimes means a top node of a node-set /bib/author. Another implementation candidate would be that a XTL processor checks node should be one or zero in case of ? or none cardinality and users have to be responsible to express an XPath like /bib/author[1].
A declared element name is defined as a variable in its translated template declaration. For example, the current node is defined as a variable, as in <xsl:variable name="bib" select="."/>, in the template for the bib element. This variable is passed as a parameter of xsl:apply-templates so that the applied templates (those that process child elements) can refer to it. For example, <xsl:with-param name="bib" select="$bib"/> passes it as a parameter. As a result, any parent element is referable as a variable from any template declaration.
Ideally, any node that is reachable from the current node via a one-to-one or many-to-one association in the DTD graph should be referable as a variable. However, the XTL processor does not support this fully (only references from a child node to a parent node are supported), because it is difficult to map such references into XSLT expressions.
An attribute list declaration for an element is translated into XSLT expressions inside the translated template declaration for that element. For example, <!ATTLIST title year AS {../@year} CDATA #REQUIRED> is translated into:
<xsl:variable name="year" select="../@year"/>
<xsl:if test="count($year)>=1">
<xsl:attribute name="year">
<xsl:value-of select="$year"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
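As an illustration of how such a fragment could be generated, the following Python sketch emits the variable, test, and attribute instructions for one attribute definition. The function name translate_attribute is hypothetical and this is not the XTL processor's implementation; it covers only the attribute part of the listing (the final xsl:value-of for the element content is omitted) and ignores the attribute type and default declaration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def translate_attribute(name, xpath):
    """Emit the XSLT fragment for an attribute `name` mapped by `xpath`."""
    return "\n".join([
        # quoteattr adds the surrounding quotes and escapes &, < and >.
        f'<xsl:variable name="{name}" select={quoteattr(xpath)}/>',
        f'<xsl:if test="count(${name})>=1">',
        f'<xsl:attribute name="{name}">',
        f'<xsl:value-of select="${name}"/>',
        '</xsl:attribute>',
        '</xsl:if>',
    ])

fragment = translate_attribute("year", "../@year")
print(fragment)

# Sanity check: the fragment is well-formed once the xsl prefix is declared.
wrapped = (
    '<root xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    + fragment
    + "</root>"
)
parsed = ET.fromstring(wrapped)
```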
There are several limitations in this translation algorithm from XTL expressions to XSLT expressions.
You cannot use both ORDER BY clause and GROUP BY clause for the same element simultaneously. You have to divide the transformation into two XTL, one is for GROUP BY and the other is for ORDER BY.
Only parent elements are referable as variables.
Tag variable is not available.
User-defined function is not available.
A query or transformation language has two functions: querying input XML and defining an output structure.
Although the query features of both XTL and Quilt are based on XPath, XTL's querying feature is inferior to that of Quilt, because Quilt has additional functionality such as the FOR and WHERE clauses and the AFTER and BEFORE operators. These are good supplements to XPath.
These functions are probably motivated by shortcomings of XPath. I hope XPath 2.0 will resolve some of the existing issues of XPath 1.0.
XTL's feature for defining an output structure is superior to that of Quilt because XTL can define an output structure based on a grammar (extended DTD). Quilt's output structure is based on an XML instance, so it becomes rather complex to specify a choice (| in DTD) or a recursive structure. In addition, ANY in XTL is powerful for transforming a small part of the input XML data, because users have to specify only the part where a transformation is needed (in Quilt, users have to specify the whole structure from the root node, including the parts where no transformation is needed).
The following examples rewrite the XTL expressions of Section 2 (Examples of XTL expressions) in Quilt. Their purpose is to clarify the difference between XTL and Quilt.
<bib>
(
FOR $a IN document("bib.xml")/bib/book
RETURN <book>$a</book>
)
</bib>
<bib>
(
FOR $a IN document("bib.xml")/bib
(
FOR $b IN $a/book | $a/article
RETURN
<book Year=$b/year/text()>
<title>$b/title/text()</title>
(
FOR $c IN $b/author
RETURN
<author>
<author_name>concat($c/firstname/text(), ' ',$c/lastname/text())</author_name>
</author>
)
<publisher>$b/publisher/text()</publisher>
</book>
)
)
</bib>
<Bib>
(
FOR $a IN document("bib.xml")/bib
(
FOR $b IN $a/book | $a/article
RETURN
<Book Year=$b/year/text()>
<Title>$b/title/text()</Title>
(
FOR $c IN $b/author
RETURN
<Author>
<Firstname>$c/firstname/text()</Firstname>,
<Lastname>$c/lastname/text()</Lastname>
</Author>
)
<Publisher><name>$b/publisher/name/text()</name></Publisher>
</Book>
)
)
</Bib>
I describe the syntax of XTL using BNF. This BNF is based on the DTD grammar in the XML specification and extends it according to the extensions described in Section 3.1.
xtl ::= (markupdecl)*
// XTL does not support EntityDecl, NotationDecl, and PI
markupdecl ::= elementdecl | AttlistDecl | Comment
// element type declaration
elementdecl ::= '<!ELEMENT' Name XPath? contentspec '>'
// Name is extended for tag variables ('$' prefix) and to handle #PCDATA in the same way as an element.
Name ::= (Letter | '_' | ':') (NameChar)* | '$' (NameChar)* | '#PCDATA'
NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
XPath ::= 'AS' nodeSetExpr
nodeSetExpr ::=
nodeSetExpr '+' nodeSetExpr /* concatenation */
| nodeSetExpr '-' nodeSetExpr /* difference */
| nodeSetExpr '*' nodeSetExpr /* intersection */
| '(' nodeSetExpr ')'
| XPathExpr
contentspec ::= 'EMPTY' | 'ANY' | children
children ::= (choice | seq) ('?' | '*' | '+')?
choice ::= '(' cp ( '|' cp )* ')'
seq ::= '(' cp ( ',' cp )* ')'
cp ::= (Name XPath? | choice | seq) ('?' | '*' | '+')? groupby? orderby?
groupby ::= 'GROUP' 'BY' XPath+
orderby ::= 'ORDER' 'BY' XPath+
// attribute list declaration
AttlistDecl ::= '<!ATTLIST' Name AttDef* '>'
AttDef ::= Name XPath? AttType DefaultDecl
AttType ::= StringType | TokenizedType | EnumeratedType
StringType ::= 'CDATA'
TokenizedType ::= 'ID'|'IDREF'|'IDREFS'|'ENTITY'|'ENTITIES'|'NMTOKEN'|'NMTOKENS'
// XTL does not support Notation Type
EnumeratedType ::= Enumeration
Enumeration ::= '(' Nmtoken ('|' Nmtoken)* ')'
Nmtoken ::= (NameChar)+
DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED')? AttValue)
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
Reference ::= CharRef
CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
// comment declaration
Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
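The three nodeSetExpr operators in this grammar can be illustrated with a small Python sketch. It rests on assumptions that the grammar alone does not settle: nodes are compared by identity, results keep the order of the left operand, and concatenation does not remove duplicates. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def concat(a, b):
    """nodeSetExpr '+' nodeSetExpr: nodes of a followed by nodes of b."""
    return list(a) + list(b)

def difference(a, b):
    """nodeSetExpr '-' nodeSetExpr: nodes of a that are not in b."""
    return [n for n in a if not any(n is m for m in b)]

def intersection(a, b):
    """nodeSetExpr '*' nodeSetExpr: nodes of a that are also in b."""
    return [n for n in a if any(n is m for m in b)]

def titles(nodes):
    return [n.findtext("title") for n in nodes]

# Hypothetical input document (not taken from the paper).
doc = ET.fromstring(
    "<bib>"
    "<book year='1994'><title>A</title></book>"
    "<book year='2000'><title>B</title></book>"
    "<article year='2000'><title>C</title></article>"
    "</bib>"
)
books = doc.findall("book")
articles = doc.findall("article")
recent = doc.findall("*[@year='2000']")

all_items = titles(concat(books, articles))
old_books = titles(difference(books, recent))
recent_books = titles(intersection(books, recent))
print(all_items, old_books, recent_books)
```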
[A query language for XML] Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, Dan Suciu, In International World Wide Web Conference, 1999. http://www.research.att.com/~mff/files/final.html.
[Comparative Analysis of Five XML Query Language] Angela Bonifati, Stefano Ceri, SIGMOD Record, vol.29, no.1, March 2000.
[Querying XML Data] Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, David Maier, Dan Suciu, IEEE Data Engineering Bulletin vol. 22, no.3, 10-18, 1999.
[Quilt: an XML Query Language] Jonathan Robie, Don Chamberlin, Daniela Florescu, http://www.almaden.ibm.com/cs/people/chamberlin/quilt_euro.html.
[XML Path Language (XPath) Version 1.0] W3C, http://www.w3.org/TR/xpath.
[XML Query Languages: Experiences and Exemplars] Mary Fernandez, Jerome Simeon, Philip Wadler, http://www.w3.org/1999/09/ql/docs/xquery.html.
[XPath Language Requirement Version 2.0] W3C, http://www.w3.org/TR/xpath20req.
[XQL FAQ] Jonathan Robie, http://metalab.unc.edu/xql/.
[XSL Transformations (XSLT) Version 1.0] W3C, http://www.w3.org/TR/xslt.
[XSL Transformations (XSLT) Version 1.1] W3C, http://www.w3.org/TR/xslt11.