Keywords: Application architecture, Content Management , Database, Full-text, Query language, Relational database, Repository, Search , SQL, XML, XQuery
Biography
Stephen Buxton is Director of Product Management in Oracle’s Server Technologies division. He works closely with developers and customers on Oracle’s XML and Text Retrieval products, including Oracle’s XQuery implementation. Stephen is a member of the World Wide Web Consortium's XML Query Working Group, and is an editor on several of the XQuery Full-Text drafts.
XQuery is gathering momentum as a powerful, flexible, general-purpose query language for searching within and across XML documents and fragments. But on its own, it’s just a query language – it defines a transformation from one XQuery Data Model instance to another.
Relational databases and SQL have been storing, managing and searching the world’s most valuable data for decades. But SQL (before XML extensions) could not leverage the information implicit in the structure of XML.
Many see XQuery and SQL as competitive technologies: this paper describes how XQuery and SQL can be complementary in practical applications, each leveraging the strengths of the other. SQL can extend its query capabilities via XML extensions – SQL/XML and XQuery functions – while XQuery can query SQL databases and repositories. Add a Java API - XQuery API for Java™ (XQJ) - and XQuery and SQL together can meet any application’s query requirements. The paper includes an overview of XML and query technologies, and some practical advice on choosing the right mix of technologies for your application.
1. Introduction
2. Data Storage and Representation
2.1 Data Sample 1 - House Resolution 558
2.1.1 XML Representation
2.1.2 SQL Representation
2.1.3 Reasons for choosing some representation
2.2 Data Storage
2.2.1 SQL Storage
2.2.2 XML Storage
2.2.3 Reasons for choosing some storage
3. SQL XML Duality - Storage and Representation
3.1 SQL Storage, SQL Representation
3.2 SQL Storage, XML Representation
3.3 XML Storage, SQL Representation
3.4 XML Storage, XML Representation
3.5 Summary - SQL and XML Storage/Representation Duality
4. Data Query - SQL, SQL/XML, XQuery
4.1 SQL
4.2 SQL/XML
4.3 XQuery
4.4 XQuery in SQL/XML
4.5 SQL and XML Duality - Query
5. SQL/XML and XQuery - Summary
6. For Next Time ...
7. Further Reading
Bibliography
This paper discusses storing, representing and querying XML with SQL, SQL/XML and XQuery.
While SQL and SQL/XML are part of the ANSI/ISO SQL standard, the examples in this paper use Oracle's syntax, including some Oracle extensions, via an interactive SQL user interface (SQL*Plus). Oracle is an active founder-member of the SQLX Group ("SQL & XML Working Together"), a working group that authors SQL/XML proposals for presentation to the ANSI/ISO standards bodies. In this paper we use the term "SQL/XML" quite loosely, to mean SQL extensions for XML. Some are already part of the SQL standard, some have been presented as extensions for a future version of the standard, and some will probably remain Oracle extensions. For details of the SQL/XML standards effort, see [SQLX]
Data is “factual information ... used as a basis for reasoning, discussion, or calculation” (Merriam-Webster Online Dictionary). In this paper we focus on persistent data - data that needs to be stored in a reliable way so that it can be retrieved and manipulated. Non-persistent data is data that is produced, used and discarded, such as the data packets exchanged in SOAP messages. While non-persistent data is important, especially in any discussion of XML, we will leave that discussion for another paper.
Persistent data has two important attributes - storage and representation. We describe storage and representation with reference to some sample data.
Our first data sample is taken from a resolution presented to the U.S. House of Representatives. The resolution is entitled “Welcoming the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO)”. It was created 11th March 2004, sponsored by Bereuter, and it “welcomes with enthusiasm the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO)”.
The previous paragraph is a natural language representation of (some of) the data in House Resolution 558. The first thing to notice is that the representation is not the data, it is merely an abstract view of (some of) the data. Similarly, the data is not the real-world object - no matter how well my data described House Resolution 558, the data would not be the resolution. This separation of representation, data and real-world object may seem obvious, but bear them in mind as we step through some variations in representation and storage.
Example 1 shows another possible representation for House Resolution 558, an XML representation.
<resolution dms-id="42" public-private="public">
<congress>108</congress>
<session>2</session>
<legis-num>558</legis-num>
<action>
<action-date>20040311</action-date>
<action-desc>
<sponsor>Bereuter</sponsor>
<cosponsor>Wexler</cosponsor>
<cosponsor>Gillmor</cosponsor>
<cosponsor>Shimkus</cosponsor>
<committee-name>Committee on International Relations
</committee-name>
</action-desc>
</action>
<official-title>Welcoming the accession of Bulgaria, Estonia,
Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North
Atlantic Treaty Organization (NATO), and for other purposes.
</official-title>
<resolution-body>
<paragraph>welcomes with enthusiasm the accession of Bulgaria,
Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the
North Atlantic Treaty Organization (NATO);
</paragraph>
<paragraph>reaffirms that the process of NATO enlargement enhances
the security of the United States and the entire North Atlantic area;
</paragraph>
<paragraph>agrees that the process of NATO enlargement should be
open to potential membership by any interested European democracy that
meets the criteria for NATO membership as set forth in the 1995 Study
on NATO Enlargement and whose admission would further the principles
of the Washington Treaty of 1949 and would enhance security in the
North Atlantic area; and
</paragraph>
<paragraph>recommends that NATO heads of state and government
should review the enlargement process, including the applications
of Albania, Croatia, and Macedonia, at a summit meeting to be held
no later than 2007.
</paragraph>
</resolution-body>
</resolution>
|
Example 1: Sample resolution - XML
What does the XML representation give us that the informal, natural language description does not? XML gives us:
<action-date>20040311</action-date> |
<action>
<action-date>20040311</action-date>
<action-desc>
<sponsor>Bereuter</sponsor>
<cosponsor>Wexler</cosponsor>
<cosponsor>Gillmor</cosponsor>
<cosponsor>Shimkus</cosponsor>
<committee-name>Committee on International Relations
</committee-name>
</action-desc>
</action>
|
<paragraph>welcomes with enthusiasm the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO); </paragraph> <paragraph>reaffirms that the process of NATO enlargement enhances the security of the United States and the entire North Atlantic area; </paragraph> |
It is possible to represent this same data in many other useful, structured ways - as SGML, as HTML, as a comma-separated list of fields suitable for input to, say, a spreadsheet, or as a stream of codes suitable for punching on cards. For this paper we consider just one other representation, an SQL representation. This is a representation of the data as you might expect to see it produced from an Object-Relational database.
One possible SQL representation of House Resolution 558 is shown in Example 2
| id | congress | session_num | legis_num | action_date | committee_name | official_title |
| 42 | 108 | 2 | 558 | 20040311 | Committee on International Relations | Welcoming the .... |
| etc. |
Table 1: resolutions
| resolution_id | sponsor | sponsor_type |
| 42 | Bereuter | primary |
| 42 | Wexler | co-sponsor |
| 42 | Gillmor | co-sponsor |
| 42 | Shimkus | co-sponsor |
| etc. |
Table 2: resolutions_sponsors
| resolution_id | p_order | paragraph |
| 42 | 1 | welcomes with enthusiasm the accession of... |
| 42 | 2 | reaffirms that the process of NATO enlargement ... |
| 42 | 3 | agrees that the process of NATO enlargement ... |
| 42 | 4 | recommends that NATO heads of state and government ... |
| etc. |
Table 3: resolutions_paragraphs
This is a fairly naïve representation. We have not tried to either fully normalize the data for efficient disk storage, nor denormalize for performance. Perhaps more importantly, this representation does not take advantage of any of the relatively recent advances in Object-Relational modelling of hierarchical structures, such as variable-length arrays (for modelling sequences of data elements) or nested tables (for modelling unordered sets of data elements). It is sufficient to say that the same data we saw represented in natural language and in XML can also be represented in SQL (that is, Relationally or Object-Relationally).
What does the SQL representation give us that the informal, natural language description does not? SQL gives us:
There are two things that were on the XML list that are missing from the SQL list. First, data represented in SQL does not have a default ordering (except in a few types such as varrays). SQL was designed to handle structured data, where order is rarely important. As you can see from the resolutions_paragraphs table, it is easy to introduce an ordering when it is necessary. Second, people tend to think the SQL representation is less flexible than XML. We would argue that this kind of flexibility is A Bad Thing - it allows anyone to add data in any form, and reduces the value of the data.
You might choose an SQL presentation for your data so that you can manage and query it with tools that are robust, mature, and readily available. You might already own a suite of tools that work well with an SQL representation, and you probably already have the skills in-house to work with SQL. Many applications need to publish data (make data available) to a 3rd-party tool or application. SQL has been around for several decades, and there is a wealth of available tools, applications and skills.
On the other hand, you might choose an XML representation because you need to publish data to (or consume data from) a web service or a browser or an integration application.
See Table 5
Data storage is related to data representation, but it's not the same thing. Before describing how you might store your data, we must point out as an aside that, like representation, data storage is an abstraction - we rarely talk about where the bits actually sit on the disk (or how the bits are represented in the magnetic layer). For the purposes of this paper, files and tables is as concrete as we'll get.
When people think of SQL (Object-Relational) storage, they generally think of tables, with rows and columns. The table is the basic unit of SQL storage, but there are a number of ways to model data that is not the same "shape" as a table.
In the examples in the rest of this paper, where we refer to "SQL storage" we mean the storage implied by the naïve representation in Example 2. In a real-world application you have many more SQL storage options to choose from.
In Oracle database 10g there are three ways to store XML data - in a LOB, shredded, or in a native XMLType object (follows the SQL:2003 XML type).
You might choose to store data as SQL (in Object-Relational tables) for greater efficiency - efficiency of space (no need to store all those tags) and efficiency of time (SQL indexing techniques are mature and robust, so it may be faster to query data stored as SQL).
On the other hand, you might choose to store your data as XML if your application requires that the stored data exactly matches the incoming representation (including order, whitespace, etc.) for legal reasons. Or you might have XML data that rarely changes, but has to be input and/or output frequently. In such cases it may be more efficient to keep the data all in one place as a single XML "thing".
See Table 5.
In this section we show that storage and representation are indeed separate. You can store your data in SQL and produce a representation either in some other form of SQL or in XML. You can store your data in XML and produce a representation either in some other form of XML or in SQL.
We start with the simplest transformation, SQL storage to a (different) SQL representation. The classic way to achieve this is with views. Example 3 creates a simple view that joins (pulls data from) some columns in two tables, renames the columns, and restricts the data to include only some rows.
create view body_text as
select
r.congress AS congress ,
r.session_num AS congress_session ,
r.legis_num AS legislation ,
p.paragraph AS beginning
from
resolutions r ,
resolutions_paragraphs p
where
r.id = p.resolution_id and
p.p_order = 1
/ |
Example 3: Simple SQL View
The content of a view can be defined using any query. Once created, a view can be described and queried as if it were a table.
SQL> describe body_text Name Null? Type ----------------------------------------- -------- -------------- CONGRESS NUMBER CONGRESS_SESSION NUMBER LEGISLATION NUMBER BEGINNING VARCHAR2(4000) SQL> select beginning from body_text / BEGINNING ------------------------------------------------------------------ welcomes with enthusiasm the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO); SQL> create view beginning as 2 select beginning from body_text 3 / View created. SQL> select * from beginning / BEGINNING ------------------------------------------------------------------ welcomes with enthusiasm the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO); |
Example 4: Use a View Like a Table
Example 4 describes the view body_text, queries from the view, then creates and queries another view based on body_text. In all these operations, the view body_text (the representation of some SQL data) is indistinguishable from a table (the storage of SQL data).
OK, now that we've got the general idea that data can be physically stored on the disk in one way, and represented (used, manipulated, queried) as if it were stored in quite a different way, let's apply that to SQL and XML. There are a number of ways to generate an XML representation from data stored in relational or object-relational tables. Example 5 uses some of the XMLxxx functions introduced in SQL:2003 Part 14 (also known as the SQL/XML functions) to produce the XML representation illustrated in Example 1 from data stored in the tables described in Example 2
create or replace view resolutions_xml_view as
(select
r.id AS id ,
XMLElement ("resolution" ,
XMLElement ("congress" , congress) ,
XMLElement ("session" , session_num) ,
XMLElement ("legis-num" , legis_num) ,
XMLElement ("action" ,
XMLElement ("action-date", to_char(action_date, 'YYYYMMDD')),
XMLElement ("action-desc" ,
(select (XMLElement("sponsor", sponsor))
from resolutions_sponsors s
where s.resolution_id = r.id
and s.sponsor_type = 'primary'
) ,
(select (XMLAGG(XMLElement("cosponsor", sponsor)))
from resolutions_sponsors s
where s.resolution_id = r.id
and s.sponsor_type = 'cosponsor'
) ,
XMLElement ("committee-name" , committee_name)
)
) ,
XMLElement ("official-title" , official_title) ,
XMLElement ("resolution-body" ,
(select (XMLAGG(XMLElement("paragraph", paragraph)))
from resolutions_paragraphs p
where p.resolution_id = r.id
)
)
) AS resolution
from resolutions r
)
/
|
Example 5: resolutions_xml_view
With this view in place, a query such as Example 6 gives the same results as a query against a table with two columns, id (number) and resolution (CLOB, VARCHAR2 or XMLType).
select id ,
resolution
from resolutions_xml_view
/
Result:
ID RESOLUTION
---- ----------------------------------------
42 <resolution>
<congress>108</congress>
<session>2</session>
<legis-num>558</legis-num>
....
|
Example 6: Query resolutions_xml_view
We said in Section 2.2.2 that XML can be stored in at least 3 ways in an object-relational database - in a LOB (Large OBject), shredded into object-relational tables, or natively as XMLType. In this section we assume that you have chosen to store XML in an object-relational database as XMLType. In the Oracle database, XMLType is itself an abstraction which may represent several storage strategies, but we will leave that level of detail for another paper. Consider a table that has three columns, id (number), legis_num (number), and resolution (XMLType), as in Example 7.
create table resolutions_xml ( id number primary key , legis_num number , resolution XMLType ) / |
Example 7: Table resolutions_xml
We create an SQL representation of the XML in our example by using two functions - extract() and extractValue(). These functions take as arguments an XML document and an XPath, and return the result of evaluating the XPath against the XML document. extract() returns a sequence of XML nodes, extractValue() returns a single value. Note that you can apply these functions to any XML document stored either in a table in the database, or in a file on the file-system (if you have the correct access rights), or in the Oracle XML DB repository. Our example happens to use an XML document stored in a table in the database. (Note: extract() and extractValue() are Oracle extensions to SQL/XML. Oracle has proposed these extensions as part of SQL:2005)
create or replace view resolutions_view as (
select
id AS id ,
extractValue( resolution, '/resolution/congress' ) AS congress ,
to_number( extractValue( resolution, '/resolution/session' ) )
AS session_num ,
extractValue( resolution, '/resolution/legis-num') AS legis_num ,
to_date( extractValue( resolution,
'/resolution/action/action-date'), 'YYYYMMDD' ) AS action_date,
extractValue( resolution,
'/resolution/action/action-desc/committee-name')
AS committee_name ,
extractValue( resolution, '/resolution/official-title')
AS official_title
from resolutions_xml
)
/
|
Example 8: resolutions_view
A query against the view resolutions_view (Example 8) returns the same results as a query against the table resolutions - see Example 9 for an example.
select id , action_date from resolutions / select id , action_date from resolutions_view / |
Example 9: Query resolutions, resolutions_view
So far we have avoided any discussion of datatypes. A full discussion of datatypes is outside the scope of this paper. For now, notice that extractValue() in our examples returns a string (VARCHAR2(4000)) which you may coerce into some other SQL datatype using to_number, to_date, etc. In some cases it is possible to deduce the datatype to be returned - when the XML you are extracting from has an associated XML Schema, or is based on a view over SQL data. In such cases, extractValue() will return data of the deduced datatype.
The next cell in our Storage-Representation matrix has data stored as XML, with a representation that is XML in some other form. To achieve this we use the extract() function, which takes as arguments an XML document and an XPath expression, and returns the XML document (or fragment) obtained by evaluating the XPath expression over the XML document (or fragment). Again, we use an XML document stored in a table (Example 7), though we could have chosen a file in the file-system or an XML DB repository entry. And of course we could have applied a stylesheet using an XSLT engine, such as the one built into the Oracle database. But for brevity, we stick with the SQL functions.
create or replace view body_text_xml as (
select
id AS id ,
XMLElement (
"resolution" ,
extract( resolution, '/resolution/congress') ,
extract( resolution, '/resolution/session' ) ,
extract( resolution, '/resolution/legis-num' ) ,
extract( resolution,
'/resolution/resolution-body/paragraph[1]' )
) AS resolution_brief
from resolutions_xml
)
/ |
Example 10: View body_text_xml
Example 10 is a view that represents just a few of the elements of a resolution XML document, including the first paragraph. With this view in place, the query in Example 11 returns an id and a part of the XML document in Example 7. It queries an XML view over XML data, returning XML data.
select id, resolution_brief
from body_text_xml
/
Returns:
ID RESOLUTION_BRIEF
---- ----------------------------------------
42 <resolution>
<congress>108</congress>
<session>2</session>
<legis-num>558</legis-num>
<paragraph>welcomes with enthusiasm th
e accession of Bulgaria,
Estonia, Latvia, Lithuania, Romania, Slo
vakia, and Slovenia to the
North Atlantic Treaty Organization (NATO);
</paragraph>
</resolution>
|
Example 11: Query body_text_xml
| REPRESENTATION | ||||
| STORAGE | SQL | XML | Mixed | |
| Relational | Relational | XML View | Relational + XML View | |
| XML | SQL View | XML Type | SQL View + XML Type | |
| Mixed | Relational + SQL View | XML View + XML Type | SQL View + XML View + Relational + XML Type | |
Table 4: SQL and XML Storage/Representation Duality
We have established that you can store data in tables and yet manipulate it (query, publish it) as if it were XML, and you can store data as XML and yet manipulate it (query, publish it) as if it were stored in tables. And of course you can mix'n'match both kinds of storage and representation. There are a number of reasons for storing and representing data in either tables or XML - some are represented in Table 5. For many, the right answer will be Oracle's XMLType datatype which allows you to have the best of both worlds - it is an abstraction with several storage choices. A customer recently asked me "what's the best way to take all my relational data and convert it into XML?". Of course I asked, "Why ? What are you trying to achieve?" And the answer was, "We have a corporate directive that says we must be all-XML within 12 months". I recommended a good deal of thought and discussion about their storage and representation needs, rather than making a rash decision to physically convert data from one storage to another. In the short term, you can create one or more XML Views across all your relational data, and voilà - you can start using your data as if it were XML. This approach is quick, easy and flexible - if you decide you want the XML to look different, you need only recreate the XML View.
| STORAGE | REPRESENTATION | |
| SQL |
|
|
| XML |
|
|
Table 5: Choosing Some Storage, Representation
SQL is the standard language for querying Object-Relational data (which we have referred to in this paper as "SQL data"). Let's take another look at Example 7. This table has three columns. The first two - id and legis_num - are numbers extracted from the XML document that is represented in full in the third column, resolutions. Those first two columns could be dates or character strings, and they could be any relational data, not necessarily taken from resolutions. You can query the table resolutions_xml using SQL Example 12
select id, legis_num, resolution from resolutions_xml where legis_num > 100 order by legis_num asc / |
Example 12: Query resolutions - SQL
Example 12 selects the id, legis_num and the full resolution, where the legis_num is greater than 100, ordered by legis_num in ascending order. But if you only have SQL-92 you cannot look inside the resolution column - SQL-92 knows nothing about the XML structure or content. If you want to use SQL to query inside the XML data in the resolutions column, you need the SQL/XML extensions, which first start to appear in SQL:2003.
SQL:2003 is the first version of SQL that includes extensions to query XML data. The SQL standard continues to evolve in this area - for details, see the home page of the SQLX group [SQLX]. You have already seen some SQL/XML queries - for example the view creation at Example 8 and Example 10. The functions to create an XML representation - XMLElement(), XMLForest(), XMLAgg(), and others - are already part of the SQL standard. Functions to extract XML and values from XML - extract() and extractValue() - are Oracle extensions that have been proposed for a future version of SQL. Note: SQL/XML, part of the SQL standard, should not be confused with SQLXML (without the slash), which is a Microsoft term for something different.
We said that with SQL-92, when you query XML you can only treat that XML as a single thing - it's not possible to "look inside" the XML and query the structure and content of the XML. With the SQL extension functions extract() and extractValue(), you can look inside the XML and query on its structure and content. The second argument to extract() and extractValue() is an XPath expression. XPath 1.0 is a language defined by the W3C. When an XPath expression is applied to an XML document it returns a either XML (a node or sequence of nodes) or a value (a number, date, string). It's this XPath expression that gives SQL the expressive power to "look inside" the XML, and find values at a particular place in the XML tree which satisfy particular conditions. For example, Example 13:
select
id AS id ,
extract( resolution,
'/resolution[@public-private="public"]/action' ) AS action ,
extractValue( resolution,
'/resolution[congress="108"]/official-title') AS title
from resolutions_xml
/
|
Example 13: SQL/XML query - extract(), extractValue()
This query returns:
42 |
<action>
<action-date>20040311</action-date>
<action-desc>
<sponsor>Bereuter</sponsor>
<cosponsor>Wexler</cosponsor>
<cosponsor>Gillmor</cosponsor>
<cosponsor>Shimkus</cosponsor>
<committee-name>Committee on International Relations
</committee-name>
</action-desc>
</action>
|
Welcoming the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO). |
extract() and extractValue() can also be used in the where clause of an SQL query. And there is another extension function - existsNode() - to test for existence of an XPath.
select
id AS id
from resolutions_xml
where existsNode(resolution,
'/resolution/official-title[ora:contains(text(),
"welcoming")>0]', 'xmlns:ora="http://xmlns.oracle.com/xdb"') = 1
/ |
Example 14: SQL/XML query - existsNode(), ora:contains()
Example 14 is an example of an SQL query using existsNode(). This example also shows the Oracle extension function ora:contains. ora:contains does a Full-Text search (as opposed to a simple string search) over the text in official-title. Because ora:contains is an Oracle XPath extension function, it has the namespace prefix ora:, which prefix is defined in the third argument to existsNode().
XPath is good at expressing positions in an XML document and conditions, but it has some limitations when querying XML. The W3C is currently working on an even more expressive language for querying XML - XQuery 1.0 [W3C XQuery]. In the next section we briefly describe XQuery, and show how it can be used on its own or as part of SQL/XML.
While XPath describes path expressions, XQuery is a complete language for querying XML which encompasses XPath. The core expression in XQuery is the FLWOR (pronounced "flower") expression. FLWOR stands for "for ... let ... where ... order by ... return", which is the general shape of a FLWOR expression. Example 15 shows a simple XQuery FLWOR expression.
for $r in doc("/home/PUB/samples/xml2004/resolution.xml")/resolution
let $a := $r/action
where $a/action-date="20040311"
order by $r/legis-num ascending
return
<all-sponsors>
{$a/action-desc/sponsor}
{$a/action-desc/cosponsor}
</all-sponsors>
Result:
<all-sponsors>
<sponsor>Bereuter</sponsor>
<cosponsor>Wexler</cosponsor>
<cosponsor>Gillmor</cosponsor>
<cosponsor>Shimkus</cosponsor>
</all-sponsors>
|
Example 15: Simple XQuery
The XQuery in Example 15 says:
There are a couple of obvious advantages to XQuery over XPath:
To show the join capabilities of XQuery, we need to introduce a second piece of data.Example 16 is an XML document that describes the voting record for the resolution in Example 1.
<rollcall-vote>
<vote-metadata>
<congress>108</congress>
<session>2</session>
<chamber>U.S. House of Representatives</chamber>
<rollcall-num>99</rollcall-num>
<legis-num>558</legis-num>
<vote-question>On Motion to Suspend the Rules and Agree, as
Amended
</vote-question>
<vote-type>2/3 YEA-AND-NAY</vote-type>
<vote-result>Passed</vote-result>
<action-date>20040330</action-date>
<action-time>16:18</action-time>
<vote-desc>Welcoming accession of Bulgaria, Estonia, Latvia,
Lithuania, Romania, Slovakia, and Slovenia to NATO
</vote-desc>
</vote-metadata>
<vote-data>
<recorded-vote>
<legislator party="D" state="HI">Abercrombie</legislator>
<vote>Yea</vote>
</recorded-vote>
<recorded-vote>
<legislator party="D" state="NY">Ackerman</legislator>
<vote>Yea</vote>
</recorded-vote>
<recorded-vote>
<legislator party="R" state="AL">Aderholt</legislator>
<vote>Yea</vote>
</recorded-vote>
<recorded-vote>
<legislator party="R" state="MO">Akin</legislator>
<vote>Yea</vote>
</recorded-vote>
<recorded-vote>
<legislator party="D" state="LA">Alexander</legislator>
<vote>Yea</vote>
</recorded-vote>
<recorded-vote>
<legislator party="R" state="AK">Young (AK)</legislator>
<vote>Yea</vote>
</recorded-vote>
<recorded-vote>
<legislator party="R" state="FL">Young (FL)</legislator>
<vote>Yea</vote>
</recorded-vote>
</vote-data>
</rollcall-vote>
|
Example 16: Sample voting record
Example 17 is a simple XQuery that returns some information about the votes counted.
for $v in doc("/home/PUB/samples/xml2004/vote.xml")/rollcall-vote
let $d := $v/vote-data
return
<vote-count>
<total>
{count($d/recorded-vote[vote="Yea"])}
</total>
<republican>
{count($d/recorded-vote[legislator/@party="R" and vote="Yea"])}
</republican>
<democrat>
{count($d/recorded-vote[legislator/@party="D" and vote="Yea"])}
</democrat>
</vote-count>
Returns:
<vote-count>
<total>7</total>
<republican>4</republican>
<democrat>3</democrat>
</vote-count>
|
Example 17: Simple XQuery - vote count
Now that we have the resolution and the voting record, we can correlate (join) the data in the two collections. For example, Example 18 produces the title, sponsor and Yea-vote-count for each resolution for which we have a voting record. We show the query and results for a single resolution and a single voting record, but of course these queries are only interesting when applied to a large collection of resolutions and voting records.
for $r in doc("/home/PUB/samples/xml2004/resolution.xml")/resolution ,
$v in doc("/home/PUB/samples/xml2004/vote.xml")/rollcall-vote
let $d := $v/vote-data/recorded-vote[vote="Yea"]
where $r/legis-num = $v/vote-metadata/legis-num
return
<resolution>
<title>{$r/official-title/text()}</title>
<sponsor name="{$r/action/action-desc/sponsor}"/>
<yea-votes>{count($d)}</yea-votes>
</resolution>
Returns:
<resolution>
<title>Welcoming the accession of Bulgaria, Estonia, ... </title>
<sponsor name="Bereuter"></sponsor>
<yea-votes>7</yea-votes>
</resolution>
|
Example 18: Simple XQuery join
In the XQuery examples here, we have said nothing about how the data is stored or about how the data should be returned to the user. The XQuery specification says very little about input data, output data, or the host language - all these things are purposely left "implementation-defined" - you could say that XQuery is by definition "context-free". These examples could query data stored in a database, or in a file system. They could run in an SQL engine, in a Java program, or both (a Java engine that talks to a database via e.g. JDBC). In Section 4.4 we describe how XQuery might be used in the context of an SQL (Object-Relational) database with SQL/XML.
We have already looked at some of the XML extensions to SQL, collectively called SQL/XML, that are in SQL:2003. They include some functions that make use of XPath to "look inside" an XML document. The next major release of SQL/XML is expected to include a couple more functions - XMLQuery and XMLTable - that make use of XQuery.
XMLQUERY is a function that naturally fits in the SQL select clause. It takes two arguments, an XML type object and an XQuery string, and returns an XML type object. Example 19 shows the XQuery in Example 17 as an SQL/XML statement. The "select ... from dual", where dual is a "dummy table" with one row, provides a convenient SQL harness for the XQuery.
select
XMLQUERY(
'for $v in
doc("/home/PUB/samples/xml2004/vote.xml")rollcall-vote
let $d := $v/vote-data
return
<vote-count>
<total>
{count($d/recorded-vote[vote="Yea"])}
</total>
<republican>
{count($d/recorded-vote[legislator/@party="R" and vote="Yea"])}
</republican>
<democrat>
{count($d/recorded-vote[legislator/@party="D" and vote="Yea"])}
</democrat>
</vote-count>'
returning content).getStringVal() AS result from dual
/
|
Example 19: Simple XMLQUERY in SQL/XML
Example 20 is a simple SQL/XML query using the XMLTABLE function. XMLTABLE takes in an XPath or XQuery string, and returns the result as a table. This result can be included in the from clause of an SQL/XML query, or it can be used to create a SQL view.
select * from
xmltable('for $r in
doc("/home/PUB/samples/xml2004/resolution.xml")/resolution
let $a := $r/action
where $a/action-date="20040311"
order by $r/legis-num ascending
return
<all-sponsors>
{$a/action-desc/sponsor}
{$a/action-desc/cosponsor}
</all-sponsors>')
/
Result:
<all-sponsors>
<sponsor>Bereuter</sponsor>
<cosponsor>Wexler</cosponsor>
<cosponsor>Gillmor</cosponsor>
<cosponsor>Shimkus</cosponsor>
</all-sponsors>
|
Example 20: Simple XMLTABLE in SQL/XML
Example 21 uses XMLTABLE to model Table 2.
select * from xmltable('
for $r in
doc("/home/PUB/samples/xml2004/resolution.xml")/resolution
let $a := $r/action/action-desc
where $r/action/action-date="20040311"
return
(
<sponsor>
<resid>{$r/@dms-id}</resid>
<type>primary</type>
<name>{$a/sponsor/text()}</name>
</sponsor>,
for $j in $a/cosponsor
return
<sponsor>
<resid>{$r/@dms-id}</resid>
<type>cosponsor</type>
<name>{$j/text()}</name>
</sponsor>
)'
columns
resolution_id number PATH '/sponsor/resid',
sponsor varchar2(4000) PATH '/sponsor/name/text()',
sponsor_type varchar2(200) PATH '/sponsor/type'
)
/
|
Example 21: XMLTABLE in SQL/XML - resolutions_sponsors
Example 21 is a bit more complex than Example 20. First, the XQuery models a repeating field in the denormalized resolutions_sponsors table (co_sponsor), using a sub-query (starting at "for $j ..."). Second, XMLTABLE includes the "columns" keyword to describe the shape of the table. The table returned by XMLTABLE is, by default, a single-column table where the column is named "COLUMN_VALUE". Using the "columns" keyword we can name the columns of the output table, assign the value of a PATH to that column, and cast it to a SQL type. The PATH is evaluated relative to the XML returned by the XQuery in the first argument.
In this section you have seen how XQuery, and not just XPath, can be used as part of SQL/XML to query data in an XML representation and return results either in an SQL or an XML representation. The data that is queried by XMLQuery and XMLTABLE can be stored as XML Type objects in Object-Relational tables, or it can be stored as "files" in the Oracle XML DB repository. A third source of data for XMLQuery and XMLTable is relational data stored in tables. For that, Oracle has created an extension function - ora:view - that takes in a relational table (or a view) and outputs an XML representation of that table (or view).
| STORAGE/REPRESENTATION | ||||
| QUERY | Relational Tables | XML Type | Mixed | |
| SQL | SQL query | Use extract, extractValue, existsNode (with XPath argument), or XMLTABLE (with XPath or XQuery argument) | Use extract, extractValue, existsNode, or XMLTABLE function. Consider creating an SQL View. | |
| XQuery | Use XMLQUERY function over ora:view | Use XMLQUERY function over ora:view or Oracle XML DB Repository files | Use XMLQUERY function over ora:view. Consider creating an XML View. | |
| Mixed | Use XMLQUERY function over ora:view in any SQL query | Use extract, extractValue. existsNode, or XMLTABLE function + XMLQUERY | Use extract, extractValue. existsNode, or XMLTABLE function + XMLQUERY | |
Table 6: SQL and XML Storage/Representation Duality
We started this paper by describing the duality of SQL and XML storage and representation - you can store data either as SQL or as XML and then, whatever storage choice you made, represent the data as either SQL or XML. There is also a duality of SQL and XML query. It is impossible to query "inside" a piece of XML using unextended SQL. But we have shown that the recent SQL extensions in SQL/XML allow you to use SQL to query both SQL and XML data, including querying "inside" XML using XPath or XQuery. This strategy lets you choose the data storage and representation that makes most sense from a technical, business and organizational perspective, while satisfying your customers' and suppliers' requirements for data in a particular format for interchange and publishing. And you can choose the query language that best suits your needs, tools and skills set.
Let's revisit the hypothesis of this paper - that SQL/XML and XQuery are not competitive, but rather complementary, technologies. We have shown that if you need to use XQuery to query your data - perhaps queries are coming in from some application that only speaks XQuery - that data can be stored in a database either in tables or in a native XML type. And we have shown that if you want to use SQL to query your data - perhaps you have tools and available skills for SQL but not for XQuery - then you can use SQL/XML (leveraging XPath and possibly XQuery) to do the bits that need to look "inside" an XML structure.
When we discussed storage and representation, we recommended that you think carefully about your requirements before deciding between the many storage and representation options (and combinations of options) available. In the same way we recommend that you consider the query options - SQL, SQL/XML, XPath and XQuery - and the many combinations of query options available before making a decision.
In writing this paper, we have tried to lay out the options for storage, representation and query of data into some context, in a simple and straightforward way. We have glossed over some details, and missed some whole topics. These will have to wait for some future paper. We list the main ones below:
To read more about the XML publishing functions of SQL/XML (the XMLxxx functions), see http://www.oracle.com/technology/oramag/oracle/03-may/o33xml.html, by Jonathan Gennick. For more information about the SQL/XML standards effort, go to the SQL Group Home Page ([SQLX]). And for a guide to the emerging XQuery standard, see Michael Brundage's book, "XQuery: The XQuery Language". For code samples and followups to this paper, as well as the latest news, downloads, docs and papers on Oracle's XML offerings, you should bookmark [Oracle XML].
XHTML rendition made possible by SchemaSoft's Document Interpreter™ technology.