XML Europe 2003

XML and .NET, a healthy relationship

Abstract

XML plays a key role in the .NET Framework, not only for interaction with other systems, but also for storing and transmitting data within applications and components. This paper gives an overview of the XML support in the .NET Framework and shows various methods available to work with XML.

Keywords

.NET, ASP.NET, Database, DOM, MSXML, SAX, XPath, XSLT.

Table of Contents

1. Introduction
2. Standards support
3. Manipulating XML
3.1. Working with the XmlReader classes
3.2. Using DOM with the XmlDocument class
3.3. Using the DataSet class
4. ASP.NET and XML
4.1. Data binding
Bibliography
Glossary
Biography

1. Introduction

The .NET Framework radically changes how the Microsoft platform deals with XML. Previously, Microsoft XML Core Services (MSXML) provided the XML parser and Extensible Stylesheet Language Transformations (XSLT) processor for Windows. Like many other parsers and processors, MSXML is just an add-on, without any integration with other parts of the system. In the .NET Framework, XML is not just part of the system: much of the system actually relies on XML-based mechanisms for communicating and (temporarily) storing information.

2. Standards support

The XML classes in the .NET Framework conform to the current standards, which takes away most of the worry about interoperability with other systems. Below is a list of the most important standards supported:

  • XML 1.0

  • DTD

  • XML Schema 1.0

  • XML Namespaces

  • DOM Level 2 Core

  • SOAP 1.1

  • XPath 1.0

  • XSLT 1.0

Because the XML support in the .NET Framework is extensible and completely pluggable, it is likely that the standards support will grow along with the standards that evolve from the World Wide Web Consortium (W3C), as well as collaborative projects that Microsoft is involved in. A good example of this is the technology preview of XQuery that you can download from http://xqueryservices.com. Because the framework is completely pluggable, you could also replace the existing support with third-party support if so desired.

3. Manipulating XML

The classes available to manipulate XML are all situated in the System.Xml namespace (or child namespaces thereof). There are basically two options for manipulating XML. One is through the Document Object Model (DOM), using the XmlDocument class. The other is a forward-only, non-caching, cursor-type approach using the XmlReader class. People familiar with DOM implementations, and specifically MSXML, will have no trouble working with the XmlDocument.

The XmlReader approach, however, is a completely new way of working with XML. Like Simple API for XML (SAX) implementations, it operates under the premise that you don't want to have the entire document tree in memory, because holding the entire tree takes a lot of memory, especially with large documents. SAX solves this problem by implementing an event-driven push model. System.Xml solves the same problem the other way around, by implementing a pull model based on a cursor that walks through the document. SAX is not supported by the .NET Framework, but you could easily create it on top of the XmlReader by converting the pulled data into events. The cursor-based model has a major advantage over SAX implementations: the application can skip nodes in the document tree until it reaches the next node it wants to operate on. This adds to the already formidable performance of the XmlReader classes.
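To make the remark about building SAX-style processing on top of the XmlReader a bit more concrete, here is a minimal sketch. The IContentHandler interface, the ConsoleHandler class and the Parse routine are hypothetical names made up for this illustration; they are not part of the .NET Framework and only loosely resemble the SAX callbacks.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

' Hypothetical handler interface, loosely modelled on SAX callbacks
Public Interface IContentHandler
    Sub StartElement(ByVal name As String)
    Sub EndElement(ByVal name As String)
    Sub Characters(ByVal text As String)
End Interface

' Hypothetical handler that simply prints every event it receives
Public Class ConsoleHandler
    Implements IContentHandler

    Public Sub StartElement(ByVal name As String) Implements IContentHandler.StartElement
        Console.WriteLine("start: " & name)
    End Sub

    Public Sub EndElement(ByVal name As String) Implements IContentHandler.EndElement
        Console.WriteLine("end: " & name)
    End Sub

    Public Sub Characters(ByVal text As String) Implements IContentHandler.Characters
        Console.WriteLine("text: " & text)
    End Sub
End Class

Module PushOverPull
    ' Pulls nodes from the reader and pushes them to the handler as events
    Sub Parse(ByVal reader As XmlReader, ByVal handler As IContentHandler)
        Do While reader.Read()
            Select Case reader.NodeType
                Case XmlNodeType.Element
                    Dim name As String = reader.Name
                    Dim isEmpty As Boolean = reader.IsEmptyElement
                    handler.StartElement(name)
                    ' An empty element (<note/>) never yields an EndElement node
                    If isEmpty Then handler.EndElement(name)
                Case XmlNodeType.EndElement
                    handler.EndElement(reader.Name)
                Case XmlNodeType.Text, XmlNodeType.CDATA
                    handler.Characters(reader.Value)
            End Select
        Loop
    End Sub

    Sub Main()
        ' Sample data made up for this sketch
        Dim xmlText As String = "<order><item>Milk</item><note/></order>"
        Dim reader As New XmlTextReader(New StringReader(xmlText))
        Parse(reader, New ConsoleHandler())
        reader.Close()
    End Sub
End Module
```

The sketch ignores attributes, namespaces and processing instructions, which a fuller adapter would have to forward as well.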

Another way to work with XML is through the DataSet class. The DataSet is sort of a mini in-memory database that can contain tables and relations. Under the covers this data is stored in XML, so you can actually load an XML file into a DataSet, manipulate it, and save it back to file. The only requirement is that the XML file is (relatively) structured, and contains data similar to tables and relations. An ad hoc file such as an XML-tagged article is not suitable for the DataSet approach.
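The load, manipulate and save cycle described above could look roughly like the sketch below. The sample data is made up, and the XML is read from and written to strings rather than files to keep the example self-contained; ReadXml and WriteXml also accept file names.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module DataSetRoundTrip
    Sub Main()
        ' Sample data made up for this sketch: two table-like Products elements
        Dim xmlText As String = _
            "<Catalog>" & _
            "<Products><ProductID>1001</ProductID><UnitPrice>2</UnitPrice></Products>" & _
            "<Products><ProductID>1002</ProductID><UnitPrice>3</UnitPrice></Products>" & _
            "</Catalog>"

        ' Load the XML; the table structure is inferred from the elements
        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(xmlText))

        ' Manipulate the data as rows and columns.
        ' Without a schema the inferred columns are strings.
        Dim row As DataRow = ds.Tables("Products").Rows(0)
        row("UnitPrice") = "2.50"

        ' Save the DataSet back to XML
        Dim output As New StringWriter()
        ds.WriteXml(output)
        Console.WriteLine(output.ToString())
    End Sub
End Module
```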

3.1. Working with the XmlReader classes

The XmlReader class is an abstract class, which means you can't create an XmlReader object directly. Instead you need to create an object from a class that derives from XmlReader. In the .NET Framework there are currently three such classes: XmlTextReader, XmlNodeReader and XmlValidatingReader, the last of which adds DTD and schema validation on top of an XmlTextReader. To write XML in the cursor-based model the only available implementation is the XmlTextWriter class. For the record: there is no XmlNodeWriter. Basically it isn't needed, because if you supply the XmlTextWriter with an XML structure, it will write that structure to file perfectly. When you are reading, however, you may want to read from different sources, such as a file, a stream or a node in a DOM tree. All these classes are based on abstract classes (XmlReader and XmlWriter) from which you could derive additional readers and writers. The class structure is shown in Figure 1.

Figure 1. Cursor-based classes in the .NET Framework
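As an illustration of the writer side, the sketch below writes a few elements with an XmlTextWriter and then hands it an existing DOM subtree through an XmlNodeReader, which is one way to see why a separate node writer isn't needed. The element names and values are sample data, and the output goes to a string instead of a file only to keep the example self-contained.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module WriterSketch
    Sub Main()
        Dim output As New StringWriter()
        Dim writer As New XmlTextWriter(output)
        writer.Formatting = Formatting.Indented

        ' Write elements one call at a time
        writer.WriteStartElement("Products")
        writer.WriteElementString("ProductID", "1001")

        ' Copy an existing DOM subtree into the output via an XmlNodeReader
        Dim doc As New XmlDocument()
        doc.LoadXml("<MoreProdInfo><Manufacturer>Chocolate City</Manufacturer></MoreProdInfo>")
        Dim nodeReader As New XmlNodeReader(doc.DocumentElement)
        writer.WriteNode(nodeReader, False)
        nodeReader.Close()

        writer.WriteEndElement()   ' closes Products
        writer.Close()

        Console.WriteLine(output.ToString())
    End Sub
End Module
```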

Using the readers seems easy enough, but you'll find that at first they don't do what you expect, although it makes sense once you think about it. First of all, the start and end tags of an element are treated as separate nodes within the stream, as are text values and white space. So if you simply write out every node of the XML document to a text stream, you will find all sorts of duplicate data and empty spaces. You need to dive further into the class and use methods and properties beyond the simple Read method to get more control over the input. Fortunately this is fairly easy, and the names of the methods and properties are quite intuitive, as you can see in the code below.

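' myXmlReader is an XmlReader that has already been opened, for example an XmlTextReader
Do While myXmlReader.Read()
	If myXmlReader.NodeType = XmlNodeType.Element Then
		' Check the node type first, so the reader is never moved as a side effect
		If myXmlReader.IsEmptyElement Then
			Console.WriteLine("Empty element: " & myXmlReader.Name)
		Else
			Console.WriteLine("Non-empty element: " & myXmlReader.Name)
		End If
	ElseIf myXmlReader.HasValue Then
		' Text, CDATA, comments, white space and similar nodes carry a value
		Console.WriteLine("Value: " & myXmlReader.Value)
	End If
Loop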
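The loop above visits every node. As a small sketch of the methods beyond Read, the example below uses ReadStartElement, ReadElementString and Skip to pick out simple values and jump over a subtree that isn't needed. The sample document is made up, and the loop assumes that every child it does not skip contains only text.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ReaderSkipSketch
    Sub Main()
        ' Sample data made up for this sketch
        Dim xmlText As String = _
            "<Products>" & _
            "<ProductID>1001</ProductID>" & _
            "<MoreProdInfo><Manufacturer>Chocolate City</Manufacturer></MoreProdInfo>" & _
            "<ProductName>Chocolate City Milk</ProductName>" & _
            "</Products>"

        Dim reader As New XmlTextReader(New StringReader(xmlText))
        reader.WhitespaceHandling = WhitespaceHandling.None

        reader.MoveToContent()                ' positions the cursor on <Products>
        reader.ReadStartElement("Products")   ' moves on to the first child

        Do While reader.NodeType = XmlNodeType.Element
            If reader.Name = "MoreProdInfo" Then
                ' Jump past the whole subtree without looking at its children
                reader.Skip()
            Else
                ' Read a text-only element and move past its end tag
                Dim name As String = reader.Name
                Console.WriteLine(name & " = " & reader.ReadElementString())
            End If
        Loop

        reader.Close()
    End Sub
End Module
```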

3.2. Using DOM with the XmlDocument class

The XmlDocument class is just a DOM implementation. Anybody familiar with DOM should be able to use it fairly easily. This goes even more for people familiar with MSXML, which implements some additional methods that are not part of the DOM specification. XmlDocument implements similar (in most cases the same) methods. Besides the regular XmlDocument class, there are also related classes optimized for certain operations. The XmlDataDocument class, which derives from XmlDocument, is geared towards interaction with a DataSet, enabling ADO.NET and System.Xml to interact easily. The XPathDocument class, a separate read-only store, and the related XPathNavigator class are optimized for XPath queries and for transforming XML with Extensible Stylesheet Language Transformations (XSLT). If you want to use XSLT, you'll also need the XslTransform class. Unlike MSXML, where you execute a transformation based on two DOM objects, System.Xml contains specific classes for specific purposes, each optimized for that purpose. This is a large part of what makes the XML support in the .NET Framework perform so well.
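A minimal sketch of how these classes could work together is shown below: one XPathDocument is queried through an XPathNavigator and then transformed with an XslTransform. The sample document and stylesheet are made up for this illustration. Note that XslTransform is the class available at the time of writing; later versions of the Framework mark it obsolete in favour of XslCompiledTransform.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.XPath
Imports System.Xml.Xsl

Module XPathXsltSketch
    Sub Main()
        ' Sample data and stylesheet made up for this sketch
        Dim xmlText As String = _
            "<Catalog>" & _
            "<Products><ProductName>Chocolate City Milk</ProductName><UnitPrice>2</UnitPrice></Products>" & _
            "</Catalog>"
        Dim xsltText As String = _
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" & _
            "<xsl:output method='text'/>" & _
            "<xsl:template match='/'>" & _
            "<xsl:for-each select='Catalog/Products'>" & _
            "<xsl:value-of select='ProductName'/>: <xsl:value-of select='UnitPrice'/>" & _
            "</xsl:for-each>" & _
            "</xsl:template>" & _
            "</xsl:stylesheet>"

        ' Load the source into the read-only, XPath-optimized store
        Dim doc As New XPathDocument(New StringReader(xmlText))

        ' Run an XPath query through an XPathNavigator
        Dim nav As XPathNavigator = doc.CreateNavigator()
        Dim names As XPathNodeIterator = nav.Select("/Catalog/Products/ProductName")
        Do While names.MoveNext()
            Console.WriteLine("Found: " & names.Current.Value)
        Loop

        ' Transform the same document with XSLT
        Dim xslt As New XslTransform()
        Dim styleReader As New XmlTextReader(New StringReader(xsltText))
        xslt.Load(styleReader)
        styleReader.Close()

        Dim output As New StringWriter()
        xslt.Transform(doc, Nothing, output)
        Console.WriteLine(output.ToString())
    End Sub
End Module
```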

The more complex objects in the Framework are built on the simple ones. The simple objects are highly optimized for performance. The more complex the derived class, the lower its performance. This is why it is important to get to know the XML object model in the .NET Framework very well, so you can get to what you want in the least number of steps. Figure 2 shows that the XmlDocument object uses the XmlReader and XmlWriter objects to load and save the document. Every time you load an XmlDocument, it will employ an XmlReader. This goes both for situations where you explicitly code it as such and for cases where you leave loading a file up to the XmlDocument class itself. Under the covers it will use an XmlReader anyway.

Figure 2. XmlDocument in the .NET XML object model
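The sketch below makes the relationship in Figure 2 explicit: the XmlDocument is loaded through an XmlTextReader, changed with a mix of DOM calls and an MSXML-style extension method, and saved through an XmlTextWriter. The sample data is made up, and strings are used in place of files to keep the example self-contained.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module DomSketch
    Sub Main()
        ' Sample data made up for this sketch
        Dim xmlText As String = _
            "<Products><ProductID>1001</ProductID><UnitPrice>2</UnitPrice></Products>"

        ' Load the document through an explicit XmlReader
        Dim reader As New XmlTextReader(New StringReader(xmlText))
        Dim doc As New XmlDocument()
        doc.Load(reader)
        reader.Close()

        ' SelectSingleNode is an extension beyond the DOM specification,
        ' familiar from MSXML
        Dim price As XmlNode = doc.SelectSingleNode("/Products/UnitPrice")
        price.InnerText = "3"

        ' Standard DOM calls to add a new element
        Dim maker As XmlElement = doc.CreateElement("Manufacturer")
        maker.AppendChild(doc.CreateTextNode("Chocolate City"))
        doc.DocumentElement.AppendChild(maker)

        ' Save the document through an explicit XmlWriter
        Dim output As New StringWriter()
        Dim writer As New XmlTextWriter(output)
        doc.Save(writer)
        writer.Close()

        Console.WriteLine(output.ToString())
    End Sub
End Module
```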

3.3. Using the DataSet class

The DataSet class represents an in-memory relational data store. Under the covers the data is stored as XML, which means it can be loaded from and saved to an XML document as well. How this works is described in Section 4.1, where we look at databinding ASP.NET server controls.

4. ASP.NET and XML

ASP.NET offers several web controls that can be used with XML. Most of these controls use a mechanism called databinding to manipulate and display data. With databinding you don't have to loop through data to display it; the control does that for you. You just specify the data source, and the control does the rest. The data source can be a variable or method, a database DataReader, or a DataTable or DataSet. Since a DataSet can hold XML files as mentioned earlier, this means we can (indirectly) databind XML documents.

Before we get into databinding, however, let's look at the XML control that is also part of ASP.NET. The XML control serves as a container that takes an XML source and optionally an XSLT source. If there is no XSLT source, the XML is sent to the client as is. This doesn't sound very interesting, but you can actually specify a document to show in a particular place, which can be useful when you're working with Cascading Style Sheets (CSS) or XML data islands. When you specify an XSLT source, it is used to transform the XML, and the result is sent to the browser. The normal operation for this control is to specify the files it is supposed to use. Another option is to load the files separately into an XmlDocument and an XslTransform object and bind those at run-time. The last option is to load the XML source from a string (the XSLT source doesn't support this). The syntax for this web control is extremely simple:

<asp:xml id="myxml" runat="server" DocumentSource="data.xml" TransformSource="xslt.xsl" />

The only additional attributes this control can have are those that are common to all controls. The list of methods, properties and events is also very limited. This simplicity, however, is what gives the control its power. If you want more... do it yourself. You gain more control by not specifying a source for the XML and XSLT in the tag, and loading both at run-time instead. The code for doing this is again pretty simple:

'Requires the System.Xml and System.Xml.Xsl namespaces to be imported in the page
Sub Page_Load(sender As Object, e As EventArgs)
    'Declare and create objects
    Dim oXML As XmlDocument = New XmlDocument()
    Dim oXSLT As XslTransform = New XslTransform()

    'Load objects
    oXML.Load(Server.MapPath("data.xml"))
    oXSLT.Load(Server.MapPath("transform1.xslt"))

    'Bind to XML control
    myxml.Document = oXML
    myxml.Transform = oXSLT
End Sub

The code above loads the two separate sources into different objects and binds these dynamically to the Document and Transform properties, as easy as a proverbial piece of cake. If you want to manipulate the XML before binding it to the control, you need to do more elaborate coding of course, but for seasoned MSXML programmers this shouldn't be rocket science. Even if you are not so experienced, this shouldn't be too hard to understand.

4.1. Data binding

A neat feature of ASP.NET is its databound controls. They enable you to show a data source in a control, without having to loop through the data. This enables you to show rows of data in a spreadsheet-like grid, or repeated in some template. Because a DataSet stores data as XML, independent of the data source, XML documents can also be databound to controls. To do this, you open an XML file, read it into a DataSet and bind it to a control. As long as the data is (more or less) tabular, you will get data in the form of a database table, which can be shown using a databound control. The listing below is a simple sample of the code used to bind an XML file to a DataGrid control.

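'Requires the System.IO and System.Data namespaces to be imported in the page
Sub Page_Load(Src As Object, E As EventArgs)
    Dim sFile As String = Server.MapPath("data.xml")
    Dim oFs As FileStream = New FileStream(sFile, FileMode.Open, FileAccess.Read)
    Dim oFile As StreamReader = New StreamReader(oFs)
    Dim oDs As New DataSet()

    'Read the XML into the DataSet, then release the file
    oDs.ReadXml(oFile)
    oFile.Close()   'also closes the underlying FileStream

    'Bind the first table to the grid
    grid1.DataSource = New DataView(oDs.Tables(0))
    grid1.DataBind()
End Sub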

As you see the steps required are straightforward. The XML file is opened as a standard file stream, which is read into the data set using the ReadXml method. After that it just works as if you are working with a database.

The structure of the XML file is very important. When loading the file, the DataSet analyzes the input and creates tables for data items that are on the same level and have the same structure.

<Products>
  <ProductID>1001</ProductID>
  <CategoryID>1</CategoryID>
  <ProductName>Chocolate City Milk</ProductName>
  <ProductDescription>Chocolate City Milk Description</ProductDescription>
  <UnitPrice>2</UnitPrice>
  <ImagePath>/quickstart/aspplus/images/milk5.gif</ImagePath>
  <Manufacturer>Chocolate City</Manufacturer>
</Products>

If an XML file contained (under the root node) only elements like the one shown above, one table would be created for all the Products nodes. Now look at the listing below.

<Products>
  <ProductID>1001</ProductID>
  <CategoryID>1</CategoryID>
  <ProductName>Chocolate City Milk</ProductName>
  <UnitPrice>2</UnitPrice>
  <MoreProdInfo>
    <ProductDescription>Chocolate City Milk Description</ProductDescription>
    <ImagePath>/quickstart/aspplus/images/milk5.gif</ImagePath>
    <Manufacturer>Chocolate City</Manufacturer>
  </MoreProdInfo>
</Products>

This listing has another hierarchical level in it. As a result the data set contains two tables: one with the Products, built up out of the immediate children (containing data) of the Products element, the other with all the child elements of MoreProdInfo. The former can be accessed with the same code as shown earlier, and the latter just by changing the index from 0 to 1:

Grid2.DataSource = New DataView(oDs.Tables(1))

The DataSet also contains a relation between these two 'tables', so you can refer back and forth. The relations between tables are stored in the DataRelationCollection, available through the Relations property, which contains DataRelation items. Each DataRelation exposes properties such as ChildTable and ParentTable, so getting the name of the parent table in the relation can be done like this:

oDs.Relations.Item(0).ParentTable.TableName
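To see how the tables and the relation can be used together, here is a minimal sketch that loads a nested document like the second listing, lists the tables and relations that the DataSet created, and follows the relation from a product to its MoreProdInfo row. The Catalog root element and the shortened sample data are assumptions made for this illustration.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module RelationSketch
    Sub Main()
        ' Sample data made up for this sketch, with an assumed Catalog root
        Dim xmlText As String = _
            "<Catalog><Products>" & _
            "<ProductID>1001</ProductID>" & _
            "<MoreProdInfo><Manufacturer>Chocolate City</Manufacturer></MoreProdInfo>" & _
            "</Products></Catalog>"

        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(xmlText))

        ' List the tables that were created from the XML structure
        For Each table As DataTable In ds.Tables
            Console.WriteLine("Table: " & table.TableName)
        Next

        ' List the relations between those tables
        For Each rel As DataRelation In ds.Relations
            Console.WriteLine(rel.ParentTable.TableName & " -> " & rel.ChildTable.TableName)
        Next

        ' Follow the first relation from a parent row to its child rows
        Dim product As DataRow = ds.Tables("Products").Rows(0)
        For Each info As DataRow In product.GetChildRows(ds.Relations(0))
            Console.WriteLine("Manufacturer: " & CStr(info("Manufacturer")))
        Next
    End Sub
End Module
```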

Bibliography

[WA 2001] Wahlin, Dan, 2001. XML for ASP.NET Developers, Sams Publishing, Indianapolis, IN, USA. ISBN 0-67-232039-8.

[NE 2001] Microsoft, 2001. .NET Framework Software Development Kit, Microsoft, Redmond, WA, USA.

Glossary

CSS

Cascading Style Sheets

DOM

Document Object Model

MSXML

Microsoft XML Core Services

SAX

Simple API for XML

XSLT

Extensible Stylesheet Language Transformations

Biography

Michiel van Otegem is the president and co-founder of ASPNL.com, a training and consulting firm based in the Netherlands. He is also the chairman of dotNED, the Dutch .NET User Group.