| Papers by Author | Papers by Title | Papers by Keyword |
Extreme Markup Languages 2003: Proceedings |
Proceedings Home Page |
| View Abstract/Table of Contents (in its own window) |
This paper examines the difficulty of the schema conformance problem, which is defined as follows:
Given a schema, is there an XML [eXtensible Markup Language] document that conforms to the schema?
If there is an XML document that conforms to a given schema, then we say that the schema is satisfiable.
For example, the following DTD [Document Type Definition] [ZVON02], [W3C02] is not satisfiable because each section element must contain one or more section elements:
<!ELEMENT section ( section+ ) >
One could argue that this DTD can be satisfied by an infinitely large XML document and that the XML recommendation [W3C02] does not explicitly state that a well-formed XML document must be finite. However, as a practical matter, we don’t consider an XML schema to be satisfiable if it can only be satisfied by an infinite XML document.
It would be nice if your favorite XML schema editor could tell you whether or not your schema was satisfiable. Unfortunately, as we’ll see, the schema conformance problem is intractable1 for the following XML schema languages:
Thus, for these XML schema languages, it’s not feasible to build an XML schema editor that can do either of the following for any given schema:
The remainder of this paper examines the difficulty of the schema conformance problem for the XML schema languages listed above. We assume that the reader has some familiarity with these XML schema languages. We don’t assume that the reader is familiar with concepts from computational complexity (e.g., NP-hard, undecidable, etc.). We use footnotes to provide a brief introduction to these terms.
The following is the outline of this paper:
The DTD conformance problem can be solved in linear time2 when the DTDs do not contain IDs and IDREFs. [Arenas et al.], [Fan/Libkin]
For DTDs that declare IDREF attributes that have fixed values, determining the satisfiability of the DTD can become a difficult problem.
The following is an example of a DTD that declares IDREF attributes that have fixed values:
<!-- A nyc-census document specifies the population of the five boroughs of New York City. The borough elements may appear in any order. There must be one borough element for each of the five boroughs. The order independence of the boroughs is achieved through the use of ID and IDREF attributes. --> <!ELEMENT nyc-census ( borough, borough, borough, borough, borough )> <!ATTLIST nyc-census borough-1 IDREF #FIXED "Bronx"> <!ATTLIST nyc-census borough-2 IDREF #FIXED "Brooklyn"> <!ATTLIST nyc-census borough-3 IDREF #FIXED "Manhattan"> <!ATTLIST nyc-census borough-4 IDREF #FIXED "StatenIsland"> <!ATTLIST nyc-census borough-5 IDREF #FIXED "Queens"> <!ELEMENT borough EMPTY> <!ATTLIST borough name ID #REQUIRED> <!ATTLIST borough population CDATA #REQUIRED>
This DTD specifies the schema for a class of documents that provide New York City census data. A nyc-census document specifies the population of the five boroughs of New York City. The borough elements may appear in any order. There must be one borough element for each of the five boroughs. The order independence of the boroughs is achieved through the use of ID and IDREF attributes.
The following is an XML document that conforms to the DTD above:
<?xml version="1.0"?> <!DOCTYPE nyc-census SYSTEM "fixed-idref.dtd"> <nyc-census> <borough name="Brooklyn" population="2465326"/> <borough name="Queens" population="2229379"/> <borough name="Manhattan" population="1537196"/> <borough name="Bronx" population="1332650"/> <borough name="StatenIsland" population= "443728"/> </nyc-census>
Note that a simple mistake in the DTD for the nyc-census documents can make the DTD unsatisfiable. For example, the following DTD is unsatisfiable because the nyc-census element must contain exactly four borough elements, but the IDREF attributes refer to five distinct boroughs:
<!-- Warning: the following element declaration is incorrect. The nyc-census element should have five borough subelements, but only four have been declared. --> <!ELEMENT nyc-census ( borough, borough, borough, borough )> <!ATTLIST nyc-census borough-1 IDREF #FIXED "Bronx"> <!ATTLIST nyc-census borough-2 IDREF #FIXED "Brooklyn"> <!ATTLIST nyc-census borough-3 IDREF #FIXED "Manhattan"> <!ATTLIST nyc-census borough-4 IDREF #FIXED "StatenIsland"> <!ATTLIST nyc-census borough-5 IDREF #FIXED "Queens"> <!ELEMENT borough EMPTY> <!ATTLIST borough name ID #REQUIRED> <!ATTLIST borough population CDATA #REQUIRED>
Note that an XML document is not valid if it contains an IDREF attribute value that is not declared by one of the ID attributes.
In a posting to the xml-dev mailing list [Thompson], Henry Thompson showed that the DTD conformance problem is NP-hard.3
The following is a sketch of the proof that the DTD conformance problem is NP-hard:3
Consider the following instance of the 3SAT problem:
( x1 OR x2 OR x3 ) AND ( x2 OR NOT(x3) OR NOT(x4) ) AND ( x3 OR x4 OR NOT(x1) )
The following DTD encodes this instance of the 3SAT problem:
<!ELEMENT solution ( variableAssignments, clauses )> <!ELEMENT variableAssignments ( x1Assignment, x2Assignment, x3Assignment, x4Assignment )> <!ELEMENT x1Assignment ( x1isTrue | x1isFalse )> <!ATTLIST x1Assignment value ID #REQUIRED> <!ELEMENT x2Assignment ( x2isTrue | x2isFalse )> <!ATTLIST x2Assignment value ID #REQUIRED> <!ELEMENT x3Assignment ( x3isTrue | x3isFalse )> <!ATTLIST x3Assignment value ID #REQUIRED> <!ELEMENT x4Assignment ( x4isTrue | x4isFalse )> <!ATTLIST x4Assignment value ID #REQUIRED> <!ELEMENT clauses ( ( x1isTrue | x2isTrue | x3isTrue ), ( x2isTrue | x3isFalse | x4isFalse ), ( x1isFalse | x3isTrue | x4isTrue ) )> <!ELEMENT x1isTrue EMPTY> <!ATTLIST x1isTrue value IDREF #FIXED "x1isTrue"> <!ELEMENT x1isFalse EMPTY> <!ATTLIST x1isFalse value IDREF #FIXED "x1isFalse"> <!ELEMENT x2isTrue EMPTY> <!ATTLIST x2isTrue value IDREF #FIXED "x2isTrue"> <!ELEMENT x2isFalse EMPTY> <!ATTLIST x2isFalse value IDREF #FIXED "x2isFalse"> <!ELEMENT x3isTrue EMPTY> <!ATTLIST x3isTrue value IDREF #FIXED "x3isTrue"> <!ELEMENT x3isFalse EMPTY> <!ATTLIST x3isFalse value IDREF #FIXED "x3isFalse"> <!ELEMENT x4isTrue EMPTY> <!ATTLIST x4isTrue value IDREF #FIXED "x4isTrue"> <!ELEMENT x4isFalse EMPTY> <!ATTLIST x4isFalse value IDREF #FIXED "x4isFalse">
The DTD is satisfiable if and only if the 3SAT instance has a solution. The following XML document conforms to this DTD and solves the instance of the 3SAT problem:
<?xml version="1.0"?> <!DOCTYPE solution SYSTEM "3sat-01.dtd"> <solution> <variableAssignments> <x1Assignment value="x1isTrue"> <x1isTrue/> </x1Assignment> <x2Assignment value="x2isFalse"> <x2isFalse/> </x2Assignment> <x3Assignment value="x3isFalse"> <x3isFalse/> </x3Assignment> <x4Assignment value="x4isTrue"> <x4isTrue/> </x4Assignment> </variableAssignments> <clauses> <x1isTrue/> <x3isFalse/> <x4isTrue/> </clauses> </solution>
The following XML document does not conform with the DTD and does not solve the instance of the 3SAT problem:
<?xml version="1.0"?> <!DOCTYPE solution SYSTEM "3sat-01.dtd"> <solution> <variableAssignments> <x1Assignment value="x1isTrue"> <x1isTrue/> </x1Assignment> <x2Assignment value="x2isTrue"> <x2isTrue/> </x2Assignment> <x3Assignment value="x3isFalse"> <x3isFalse/> </x3Assignment> <x4Assignment value="x4isFalse"> <x4isFalse/> </x4Assignment> </variableAssignments> <clauses> <x1isTrue/> <x3isFalse/> <x4isTrue/> </clauses> </solution>
In summary, the DTD conformance problem is solvable in linear time when the DTDs do not contain ID and IDREF attributes. [Arenas et al.], [Fan/Libkin] The ID/IDREF feature of DTDs is very useful, but the downside of this feature is that it makes the DTD conformance problem NP-hard. [Thompson]
Next we examine the schema conformance problem for RELAX NG [Regular Language description for XML Next Generation] [OASIS01], [ThaiOpen], [Cover02], which is an elegant and powerful schema language that was developed by James Clark, MURATA Makoto, and other members of the RELAX NG technical committee. At the Extreme Markup Languages 2003 conference, MURATA Makoto offered a proof that the RELAX NG conformance problem is solvable in linear time2 (when the RELAX NG schemas use only the built-in datatypes).
For RELAX NG schemas that use the W3C XML Schema datatypes [W3C01], [Cover03], [xFront], determining the satisfiability of the schema can become a difficult problem. A datatype in such schemas may specify multiple constraints (e.g., a regular expression, a maximum length, etc.), all of which must be satisfied. Specifically, the RELAX NG data element, which specifies a datatype, may contain multiple param elements, which are ANDed together; each param element specifies a constraint (e.g., a regular expression). The ability to specify multiple param elements within a data element is very useful; however, it’s possible to specify two param elements that are mutually exclusive.
The following is an example of a RELAX NG schema in which the data element contains two conflicting param elements:
<?xml version="1.0" encoding="UTF-8"?> <element name="conflicted" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary= "http://www.w3.org/2001/XMLSchema-datatypes"> <data type="string"> <param name="pattern">(aa)+</param> <param name="pattern">a(aa)+</param> </data> </element>
The first param element specifies that the conflicted element must contain an even number of “a” characters. The second param element specifies that the conflicted element must contain an odd number of “a” characters. There are no XML documents that conform to this schema because the content of the conflicted element cannot match both patterns.
For data elements that contain multiple param elements such that each param element specifies a complex regular expression, it can be very difficult to determine if the datatype is satisfiable.
It turns out that the RELAX NG conformance problem is NP-hard3 when the schemas are allowed to use W3C XML Schema Datatypes. [W3C01], [Cover03], [xFront] The following is a brief sketch of the proof:
Let’s look at a RELAX NG schema that encodes the following instance of the 3SAT problem:
( x1 OR x2 OR x3 ) AND ( x2 OR NOT(x3) OR NOT(x4) ) AND ( x3 OR x4 OR NOT(x1) )
The following RELAX NG schema encodes this 3SAT instance. Note that each of the four variables (i.e., x1, x2, x3, and x4) must be equal to either “0” (i.e., FALSE) or “1” (i.e., TRUE).
<?xml version="1.0" encoding="UTF-8"?> <element name="three_sat_solution" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary= "http://www.w3.org/2001/XMLSchema-datatypes"> <data type="string"> <param name="pattern">x1=[01],x2=[01],x3=[01],x4=[01]</param> <param name="pattern">(.*x1=1.*)|(.*x2=1.*)|(.*x3=1.*)</param> <param name="pattern">(.*x2=1.*)|(.*x3=0.*)|(.*x4=0.*)</param> <param name="pattern">(.*x3=1.*)|(.*x4=1.*)|(.*x1=0.*)</param> </data> </element>
This RELAX NG schema is satisfiable if and only if the 3SAT instance has a solution. The following XML document conforms to the RELAX NG schema and solves the instance of the 3SAT problem:
<?xml version="1.0" encoding="UTF-8"?> <three_sat_solution>x1=1,x2=0,x3=0,x4=1</three_sat_solution>
The following XML document does not conform with the RELAX NG schema and does not solve the instance of the 3SAT problem:
<?xml version="1.0" encoding="UTF-8"?> <three_sat_solution>x1=1,x2=1,x3=0,x4=0</three_sat_solution>
In summary, the RELAX NG conformance problem is solvable in linear time2 when the schemas use only the built-in datatypes. The RELAX NG conformance problem is NP-hard3 when the schemas are allowed to use W3C XML Schema Datatypes (or other datatype libraries that support regular expressions).
When I first looked at the schema conformance problem for DTDs and RELAX NG with W3C XML Schema datatypes, I didn’t suspect that these problems are intractable.1 On the other hand, when I first looked at the Schematron conformance problem, it was immediately clear to me that the problem is difficult, if not intractable. Schematron [Schematron], [Cover01], [SourceForge], [ZVON01] schemas define validity constraints using XPath 1.0 expressions [W3C03], and it can be very difficult to determine if there exists an XML document that satisfies a set of XPath expressions. For example, it’s fairly straightforward to encode an instance of the 3SAT problem as a Schematron schema, such that the 3SAT instance is solvable if and only if the schema is satisfiable.
It turns out that the Schematron conformance problem is so difficult that it is undecidable6 (i.e., unsolvable). In the article entitled The Undecidability of the Schematron Conformance Problem [Lyons], the author used the Post Correspondence Problem [Post01], [PCP], [Post02], [Post03], [Lyons] to prove that Schematron conformance problem is undecidable.6 This is true even when the problem is restricted to Schematron schemas that do not use the document and key functions of XPath.
The undecidability of the Schematron conformance problem can also be proved more easily using the undecidability of Hilbert’s Tenth Problem.7
The following is a sketch of the proof that the Schematron conformance problem is undecidable:6
The fact that the Schematron conformance problem is undecidable6 doesn’t mean that you can’t devise a specific algorithm that determines if there is an XML document that conforms to a particular schema. The undecidability of the problem does mean that it is impossible to devise a general algorithm that can make this determination for any Schematron schema.
Let’s look at a Schematron schema that encodes the following Diophantine equation:7
(2x3 - 5y2)(z2 + 1) + 20 = 0
The following Schematron schema encodes this Diophantine equation:
<?xml version="1.0"?> <schema xmlns="http://www.ascc.net/xml/schematron"> <!-- This schematron schema verifies that an XML document contains a solution to the following Diophantine equation: (2x^3 - 5y^2)(z^2 + 1) + 20 = 0 --> <pattern name="check-solution"> <rule context="/solution"> <assert test="(2*x*x*x - 5*y*y)*(z*z + 1) + 20 = 0"> The solution does not satisfy the Diophantine equation. </assert> </rule> </pattern> <pattern name="check-syntax"> <rule context="/"> <assert test="solution"> The document element must be the solution element. </assert> </rule> <rule context="/solution"> <assert test="count(x) = 1"> The solution element must contain one x subelement. </assert> <assert test="count(y) = 1"> The solution element must contain one y subelement. </assert> <assert test="count(z) = 1"> The solution element must contain one z subelement. </assert> <assert test="string(number(x)) != 'NaN' and floor(x) = x"> The value of the x element must be an integer. </assert> <assert test="string(number(y)) != 'NaN' and floor(y) = y"> The value of the y element must be an integer. </assert> <assert test="string(number(z)) != 'NaN' and floor(z) = z"> The value of the z element must be an integer. </assert> </rule> </pattern> </schema>
This Schematron schema is satisfiable if and only if the Diophantine equation has a solution among the integers. The following XML document conforms to this Schematron schema and specifies a solution among the integers to our Diophantine equation:
<?xml version="1.0"?> <solution> <x>2</x> <y>2</y> <z>2</z> </solution>
The following XML document does not conform to the Schematron schema and does not specify a solution among the integers to the Diophantine equation:
<?xml version="1.0"?> <solution> <x>1</x> <y>1</y> <z>1</z> </solution>
It should be noted that Guido Moerkotte proved the undecidability of determining whether or not an XPath expression is satisfiable. [Moerkotte] Specifically, he proved the following theorem:
Given an XPath expression P, it is undecidable whether there exist a document D and a node v in D such that evaluating P with current node v results in the empty set.
Guido Moerkotte provides several proofs for this theorem. One of them is based on the undecidability of Hilbert’s Tenth problem. Another is based on the undecidability of the Post Correspondence Problem. Of course, his theorem implies that the Schematron conformance problem is undecidable.6
Recall that the DTD conformance problem is NP-hard3 because an instance of the 3SAT problem can be encoded as a DTD, such that the 3SAT instance has a solution if and only if the DTD is satisfiable. ID and IDREF attributes were required to encode a 3SAT instance as a DTD. Similarly, we can prove that the W3C XML Schema conformance problem is NP-hard3 using the 3SAT problem, since W3C XML Schema supports ID and IDREF attributes.
W3C XML Schema [W3C01], [Cover03], [xFront] also includes key and key reference features that are more powerful and flexible than ID/IDREF attributes. W3C XML Schema supports multiple keys, globally unique keys, locally unique keys, composite keys, keys on elements and/or attributes, etc. Marcelo Arenas, Wenfei Fan and Leonid Libkin have shown that the key and key reference features of W3C XML Schema make the W3C XML Schema conformance problem undecidable6; they use Hilbert’s Tenth Problem to prove the undecidability of the W3C XML Schema conformance problem. [Arenas et al.]
The Namespace Routing Language [Clark] is a new schema language that was developed by James Clark. An NRL [Namespace Routing Language] schema maps namespaces and/or elements to subschemas. Each subschema, which may be a RELAX NG schema, a W3C XML Schema, etc., is used to validate the corresponding namespace and/or element. NRL supports concurrent validation, where a namespace and/or element must conform to multiple subschemas.
It turns out that the NRL conformance problem is NP-hard3 even when the subschemas are restricted to RELAX NG schemas that use only the built-in datatypes. The following is a brief sketch of the proof:
Let’s look at an NRL schema that encodes the following instance of the 3SAT problem:
( x1 OR x2 OR x3 ) AND ( x2 OR NOT(x3) OR NOT(x4) ) AND ( x3 OR x4 OR NOT(x1) )
The following NRL schema encodes this 3SAT instance:
<?xml version="1.0" encoding="UTF-8"?> <rules xmlns="http://www.thaiopensource.com/validate/nrl"> <namespace ns=""> <validate schema="clause_1.rnc" schemaType="application/x-rnc"/> <validate schema="clause_2.rnc" schemaType="application/x-rnc"/> <validate schema="clause_3.rnc" schemaType="application/x-rnc"/> </namespace> </rules>
This NRL schema is satisfiable if and only if the 3SAT instance has a solution. The NRL schema specifies that elements in the empty namespace must conform concurrently to three RELAX NG subschemas, which are written in the RELAX NG compact syntax. [OASIS02] Each subschema corresponds to one of the clauses of the 3SAT instance. The three subschemas include some common patterns that are defined in common.rnc.
The following is the clause_1.rnc subschema, which encodes the first clause of the 3SAT instance:
# Clause 1: ( x1 OR x2 OR x3 )
start = element solution {
( x1isTrue, x2isTrueOrFalse, x3isTrueOrFalse, x4isTrueOrFalse ) |
( x1isTrueOrFalse, x2isTrue, x3isTrueOrFalse, x4isTrueOrFalse ) |
( x1isTrueOrFalse, x2isTrueOrFalse, x3isTrue, x4isTrueOrFalse )
}
include "common.rnc"The following is the clause_2.rnc subschema, which encodes the second clause of the 3SAT instance:
# Clause 2: ( x2 OR NOT(x3) OR NOT(x4) )
start = element solution {
( x1isTrueOrFalse, x2isTrue, x3isTrueOrFalse, x4isTrueOrFalse ) |
( x1isTrueOrFalse, x2isTrueOrFalse, x3isFalse, x4isTrueOrFalse ) |
( x1isTrueOrFalse, x2isTrueOrFalse, x3isTrueOrFalse, x4isFalse )
}
include "common.rnc"The following is the clause_3.rnc subschema, which encodes the third clause of the 3SAT instance:
# Clause 3: ( x3 OR x4 OR NOT(x1) )
start = element solution {
( x1isTrueOrFalse, x2isTrueOrFalse, x3isTrue, x4isTrueOrFalse ) |
( x1isTrueOrFalse, x2isTrueOrFalse, x3isTrueOrFalse, x4isTrue ) |
( x1isFalse, x2isTrueOrFalse, x3isTrueOrFalse, x4isTrueOrFalse )
}
include "common.rnc"The following are the patterns that are defined in common.rnc:
# Common patterns used by all the subschemas.
x1isTrue = element x1isTrue { empty }
x1isFalse = element x1isFalse { empty }
x2isTrue = element x2isTrue { empty }
x2isFalse = element x2isFalse { empty }
x3isTrue = element x3isTrue { empty }
x3isFalse = element x3isFalse { empty }
x4isTrue = element x4isTrue { empty }
x4isFalse = element x4isFalse { empty }
x1isTrueOrFalse = ( x1isTrue | x1isFalse )
x2isTrueOrFalse = ( x2isTrue | x2isFalse )
x3isTrueOrFalse = ( x3isTrue | x3isFalse )
x4isTrueOrFalse = ( x4isTrue | x4isFalse )The following XML document conforms to the NRL schema and solves the instance of the 3SAT problem:
<?xml version="1.0"?> <solution> <x1isTrue/> <x2isFalse/> <x3isFalse/> <x4isTrue/> </solution>
The following XML document does not conform to the NRL schema and does not solve the instance of the 3SAT problem:
<?xml version="1.0"?> <solution> <x1isTrue/> <x2isTrue/> <x3isFalse/> <x4isFalse/> </solution>
In summary, the NRL conformance problem is NP-hard3 even when the subschemas are restricted to RELAX NG schemas.
Table 1 summarizes the difficulty of the schema conformance problems that we discussed in this paper.
The schema conformance problem can probably be solved in linear time2 for any XML schema language that is purely grammar-based.
Thus, an XML schema editor could probably do the following for any given schema:
The schema conformance problem is likely to be NP-hard3 for any XML schema language that provides one of the following features:
The schema conformance problem is likely to be undecidable6 (i.e., unsolvable) for any XML schema language that provides either of the following two features:
| Schema Language | Difficulty of the Conformance Problem |
|---|---|
| DTD (with fixed-value IDREF attributes) | NP-hard |
| DTD without IDs & IDREFs | Linear time |
| RELAX NG with W3C XML Schema Datatypes | NP-hard |
| RELAX NG (using only the built-in datatypes) | Linear time |
| Schematron | Undecidable |
| W3C XML Schema | Undecidable |
| NRL (even when restricted to RELAX NG subschemas) | NP-hard |
| 1. |
A problem is intractable [Garey/Johnson] if it can’t be solved in polynomial time. A problem can be solved in polynomial time if the number of steps required to solve the problem is bounded by a polynomial function of the size of the problem. An example of such a polynomial function is T <= aNk where T is the number of steps required to solve the problem, N is the size of the problem, and a and k are constants. |
| 2. |
A problem can be solved in linear time if the number of steps required to solve the problem is bounded by a linear function of the size of the problem. An example of such a linear function is T <= aN + b where T is the number of steps required to solve the problem, N is the size of the problem, and a and b are constants. |
| 3. |
Problems that are NP-hard [Non-deterministic Polynomial hard] [Wikipedia03], [Garey/Johnson] are at least as hard as NP-complete5 problems. Problems that are NP-complete [Wikipedia02] are decidable, but the amount of time required to solve an NP-complete problem is thought to grow exponentially as the size of the problem increases. Note that NP-hard problems also include undecidable6 problems. |
| 4. |
The 3SAT [3-Satisfiability] problem (a.k.a., the 3-CNF [3-Conjunctive Normal Form] problem) [Wikipedioa04], [Garey/Johnson] is a boolean logic problem that is NP-complete.5 An instance of the 3SAT problem is a boolean expression that includes of one or more boolean variables. (The value of a boolean variable may be TRUE or FALSE.) Also, the boolean expression in a 3SAT instance is structured as one or more clauses that are ANDed together. Each clause consists of three literals that are ORed together; each literal is either a boolean variable (e.g., x4) or the negation of a boolean variable (e.g., NOT(x2)). Each boolean variable may appear multiple times in the boolean expression. For example, the following instance of the 3SAT problem includes four boolean variables (i.e., x1, x2, x3, and x4) and 3 clauses: ( x1 OR x2 OR x3 ) AND ( x2 OR NOT(x3) OR NOT(x4) ) AND ( x3 OR x4 OR NOT(x1) ) A 3SAT instance can be solved if there is an assignment of values to the boolean variables, such that the boolean expression evaluates to TRUE. |
| 5. |
Problems that are NP-complete [Non-deterministic Polynomial complete] [Wikipedia02], [Garey/Johnson] are decidable (i.e., solvable), but the amount of time required to solve an NP-complete problem is thought to grow exponentially as the size of the problem increases. For example, the amount of time required to solve the 3SAT4 problem increases exponentially as the number of boolean variables increases. No one has found a polynomial time algorithm that solves an NP-complete problem, and it’s very unlikely that anyone ever will. |
| 6. |
A problem is undecidable [Wikipedia05], [Garey/Johnson] if it is a decision problem that cannot be solved. A decision problem is a problem that is posed as a “yes or no” question (e.g., for any given number, is it a prime number?). A decision problem is undecidable if it is impossible to devise an algorithm that solves the problem. It’s widely believed that if a problem is undecidable, then it can’t be solved using software. Perhaps the most famous undecidable problem is Alan Turing’s Halting Problem. [Wikipedia01] The halting problem asks: for any given algorithm and any given input to the algorithm, will the algorithm eventually halt (i.e., stop) on that input? |
| 7. |
Hilbert’s Tenth Problem is to find a general algorithm that determines whether or not any given Diophantine equation [Wikipedia07] has a solution among the integers. [Wikipedia06] The problem is undecidable.6 A Diophantine equation is a polynomial equation with integer coefficients. The following are examples of Diophantine equations:
|
Thanks to the peer review panel of Extreme Markup Languages 2003 for their helpful and thoughtful comments. Thanks also to John Cowan for his comments on DTD conformance.
[Arenas et al.] Arenas, M., W. Fan, and L. Libkin. What’s Hard about XML Schema Constraints? http://www.cs.toronto.edu/~marenas/publications/xsc_dexa02.pdf.
[Clark] Clark, J. Namespace Routing Language (NRL), June, 2003. http://www.thaiopensource.com/relaxng/nrl.html.
[Cover01] Cover Pages: Schematron: XML Structure Validation Language Using Patterns in Trees. http://xml.coverpages.org/schematron.html.
[Cover02] Cover Pages: RELAX NG. http://xml.coverpages.org/relax-ng.html.
[Cover03] Cover Pages: XML Schema. http://xml.coverpages.org/schemas.html.
[Fan/Libkin] Fan, W., and L. Libkin. On XML integrity constraints in the presence of DTDs. In PODS’01, pages 114-125.
[Garey/Johnson] Garey, M., and D. Johnson. Computers and Intractability; A Guide to the Theory of NP-completeness, 1979.
[Lyons] Lyons, R. The Undecidability of the Schematron Conformance Problem, Oct. 2003. http://www.unidex.com/scp/.
[Moerkotte] Moerkotte, G. Incorporating XSL Processing Into Database Engines. Technical Report 7, University of Mannheim, 2002.
[OASIS01] OASIS Technical Committee: RELAX NG. http://www.oasis-open.org/committees/relax-ng/.
[OASIS02] OASIS Technical Committee: RELAX NG Compact Syntax. http://relaxng.org/compact.html.
[PCP] PCP: A Nice Problem. http://www.cs.ualberta.ca/~zhao/PCP/intro.htm.
[Post01] Post’s Correspondence Problem. http://www.cis.ohio-state.edu/~gurari/theory-bk/theory-bk-fourse7.html.
[Post02] The Undecidability of Predicate Logic and the Post Correspondence Problem. http://www.cis.udel.edu/~chester/courses/604hw/undecidable.
[Post03] Post’s Correspondence Problem. http://www.dcs.napier.ac.uk/~andrew/pcp.html.
[Schematron] Academia Sinica Computing Centre’s Schematron Home Page. http://www.ascc.net/xml/resource/schematron/schematron.html.
[SourceForge] Schematron Project on SourceForge.net. http://sourceforge.net/projects/schematron.
[ThaiOpen] Thai Open Source: RELAX NG. http://www.thaiopensource.com/relaxng/.
[Thompson] Thompson, H. ID/IDREF makes XML generation NP-hard. http://aspn.activestate.com/ASPN/Mail/Message/xml-dev/1584981.
[W3C01] W3C. XML Schema. W3C Recommendation, May 2001. http://www.w3.org/XML/Schema.
[W3C02] W3C. Extensible Markup Language (XML) 1.0, Second Edition. W3C Recommendation, Oct. 2000. http://www.w3.org/TR/2000/REC-xml-20001006.
[W3C03] W3C. XML Path Language (XPath). W3C Recommendation, Nov. 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.
[Wikipedia01] Wikipedia. Halting problem. http://www.wikipedia.org/wiki/Halting_Problem.
[Wikipedia02] Wikipedia. NP-complete. http://www.wikipedia.org/wiki/NP-Complete.
[Wikipedia03] Wikipedia. NP-hard. http://www.wikipedia.org/wiki/NP-hard.
[Wikipedia05] Wikipedia. Decision problem. http://www.wikipedia.org/wiki/Decision_problem.
[Wikipedia06] Wikipedia. Hilbert’s 10th Problem. http://www.wikipedia.org/wiki/Hilbert's_tenth_problem.
[Wikipedia07] Wikipedia. Diophantine equation. http://www.wikipedia.org/wiki/Diophantine_equation.
[Wikipedioa04] Wikipedia. Boolean satisfiability problem. http://www.wikipedia.org/wiki/Boolean_satisfiability_problem.
[xFront] xFront.com: XML Schema Tutorial. http://www.xfront.com/.
[ZVON01] ZVON.org: Schematron Tutorial. http://www.zvon.org/xxl/SchematronTutorial/General/contents.html.
[ZVON02] ZVON.org: DTD Tutorial. http://www.zvon.org/xxl/DTDTutorial/General/book.html.
| Papers by Author | Papers by Title | Papers by Keyword |