XML Support in Whidbey

Reading, writing, transforming, and querying XML

The Internet is a large, heterogeneous collection of interconnected systems. To leverage the distributed computing opportunities that the Internet offers, developers had to agree upon a way to represent data that would be exchanged over the Internet. And that way is XML.

Now, any distributed system consists of a bunch of nodes (possibly diverse ones) that create, process, or consume data. After XML was agreed upon as the data format, different distributed computing environments added support to read, represent, manipulate, and serialize (and also persist) XML data. In this article we are going to briefly look at the XML support that is built into the next version of the Microsoft .NET Framework and Visual Studio .NET, codenamed "Whidbey."

XML: As of Now
The .NET Framework 1.0 and 1.1 provide excellent support for XML-related operations, including:

  • The ability to read and write an XML document (an XML byte stream) using streaming APIs (System.Xml.XmlReader and System.Xml.XmlWriter). The XmlReader provides a forward-only, read-only, pull-based API (i.e., the client pulls XML nodes from the processor), unlike SAX (Simple API for XML), which is push-based (i.e., the processor pushes the XML nodes to the client through events). The XmlReader is the fastest and most memory-efficient way to read XML. XmlReader and XmlWriter are abstract classes that provide extension points for developers to plug in their own streaming readers and writers. The .NET Framework provides the XmlTextReader and the XmlTextWriter as implementations.

  • The ability to load an XML document into an in-memory cache and expose it via the W3C DOM Level 2 API (System.Xml.XmlDocument). You can update this in-memory cache and then persist it to disk as an XML file. Internally, the XmlDocument uses the XmlTextReader. It's the easiest API to use, but the slowest and least memory-efficient.

  • The ability to validate an XML document against a DTD or an XML Schema (System.Xml.XmlValidatingReader). The XmlValidatingReader uses an XmlReader implementation (like the XmlTextReader) to read the XML document.

  • An alternative in-memory tree representation of an XML document (System.Xml.XPath.XPathDocument). The XPathDocument class is more efficient than the XmlDocument class. Check out Don Box's MSDN TV episode called "Passing XML Data Inside the CLR."

  • The ability to navigate through an XML document using XPath (System.Xml.XPath.XPathNavigator). The XPathNavigator can be created over any XML in-memory representation that implements the IXPathNavigable (System.Xml.XPath.IXPathNavigable) interface.

  • The ability to apply an XSL transform to an XML document (System.Xml.Xsl.XslTransform).

As you can see, there is a wide variety of options available to the developer when it comes to dealing with XML data. Aaron Skonnard has written a brilliant article on the subject, titled ".NET XML Best Practices" (http://support.softartisans.com/kbview.aspx?ID=673).
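
The pull model described in the first bullet above can be sketched with the 1.x XmlTextReader. This is a minimal illustration rather than one of the article's listings: the inline sample document, its element and attribute names (books, book, isbn, title), and the class name are all assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;

class PullReaderSketch
{
    // Hypothetical sample data; the article's Books.xml (Listing 1) is not reproduced here.
    const string SampleXml =
        "<books>" +
        "<book isbn=\"0-000-00000-1\"><title>First Sample Title</title></book>" +
        "<book isbn=\"0-000-00000-2\"><title>Second Sample Title</title></book>" +
        "</books>";

    static void Main()
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(SampleXml));
        try
        {
            // Pull model: the caller asks for the next node by calling Read().
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                {
                    continue;
                }

                if (reader.Name == "book")
                {
                    Console.WriteLine("isbn: " + reader.GetAttribute("isbn"));
                }
                else if (reader.Name == "title")
                {
                    // ReadString() returns the text content of the current element.
                    Console.WriteLine("title: " + reader.ReadString());
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Because the loop only ever holds the current node, memory use stays flat regardless of document size, which is the trade-off against the in-memory APIs listed above.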

XML: Soon to Be
Whidbey brings a bunch of exciting enhancements to the party, some of which Mark Fussell discussed in his PDC session, ".NET Framework: What's new in System.Xml for Whidbey" (ARC380). Chief among these enhancements are:

  • The XmlTextReader has become about twice as fast (in the PDC preview).
  • The XPathDocument2 class has replaced the XmlDocument class as the preferred in-memory XML cache.
  • The XsltProcessor is the new XSLT processor and is based on the XQuery architecture. It is also faster than XslTransform.
  • Support for XQuery 1.0. XQuery is an XML query language just as SQL is an RDBMS query language.

Reading XML
There are essentially two ways of reading XML. The first is by using the XmlTextReader, which offers a stream-based API. The XmlTextReader in the PDC preview version of Whidbey is twice as fast as the one that ships with the .NET Framework 1.1, according to Fussell. The second way to read XML is to use an in-memory representation of XML. In Whidbey, the XPathDocument2 takes over from the XmlDocument as the preferred in-memory representation of XML data.

For our example we will be using the Books.xml file shown in Listing 1.

The code in Listing 2 prints the ISBN numbers and names of all books in the Books.xml file. Here the Books.xml file is loaded into an instance of XPathDocument2 and then traversed using XPathNavigator2.

Although this code is largely similar to what you would write using the .NET Framework 1.1 SDK, one of the differences is that the System.Xml.XPath.XPathDocument has been replaced by the System.Xml.XPathDocument2, and the System.Xml.XPath.XPathNavigator has been replaced by the System.Xml.XPathNavigator2 (notice the change in the namespace in both cases).

Also, the XPathDocument2 provides us with various typed Read methods such as ReadInt32Value, ReadDateTimeValue, and ReadBooleanValue, which are not present in the XPathDocument.
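
For comparison, the 1.1-style traversal mentioned above might look like the following sketch, which uses System.Xml.XPath.XPathDocument and XPathNavigator rather than the Whidbey preview classes. The inline document and its shape (a books root with book elements carrying an isbn attribute and a title child) are assumptions for illustration, not the article's Listing 1.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class XPathTraversalSketch
{
    // Hypothetical sample data; the real Books.xml may be shaped differently.
    const string SampleXml =
        "<books>" +
        "<book isbn=\"0-000-00000-1\"><title>First Sample Title</title></book>" +
        "<book isbn=\"0-000-00000-2\"><title>Second Sample Title</title></book>" +
        "</books>";

    static void Main()
    {
        // Load the XML into the read-only, XPath-optimized cache.
        XPathDocument document = new XPathDocument(new StringReader(SampleXml));
        XPathNavigator navigator = document.CreateNavigator();

        // Select every book element and walk the resulting node set.
        XPathNodeIterator books = navigator.Select("/books/book");
        while (books.MoveNext())
        {
            XPathNavigator book = books.Current;

            // Attributes in no namespace are looked up with an empty namespace URI.
            string isbn = book.GetAttribute("isbn", "");

            // string(title) is evaluated relative to the current book node.
            string title = (string) book.Evaluate("string(title)");

            Console.WriteLine(isbn + ": " + title);
        }
    }
}
```

Note that everything comes back as a string or an untyped object here; the typed Read methods described above are what the preview adds on top of this pattern.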

Writing XML
Whidbey introduces a new way to manipulate XML. This method uses the XPathEditor and an XmlWriter (like the XmlTextWriter) to modify an XML document that is cached in memory as an XPathDocument2 object. Listing 3 shows how to load Books.xml into an instance of XPathDocument2, add a new book element, and save the changes.
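
As a point of comparison with the XPathEditor approach, the 1.x route described earlier (update an XmlDocument in memory, then save it) can be sketched as follows. This is not Listing 3; the sample document, the new book's values, and the class name are hypothetical.

```csharp
using System;
using System.Xml;

class XmlDocumentEditSketch
{
    // Hypothetical sample data standing in for Books.xml.
    const string SampleXml =
        "<books>" +
        "<book isbn=\"0-000-00000-1\"><title>First Sample Title</title></book>" +
        "</books>";

    static void Main()
    {
        // Load the XML into the DOM cache.
        XmlDocument document = new XmlDocument();
        document.LoadXml(SampleXml);

        // Build a new book element with an isbn attribute and a title child.
        XmlElement book = document.CreateElement("book");
        book.SetAttribute("isbn", "0-000-00000-2");

        XmlElement title = document.CreateElement("title");
        title.InnerText = "Second Sample Title";
        book.AppendChild(title);

        // Attach the new element under the root and write the result out.
        document.DocumentElement.AppendChild(book);
        document.Save(Console.Out);
    }
}
```

To persist the change to disk instead of the console, Save also accepts a file name.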

Transforming XML
Whidbey also introduces a new way to transform XML data - by using the XsltProcessor class. The XsltProcessor class uses an XSLT style sheet to convert an XML source tree to an XML target tree. Its architecture is based on XQuery, and it performs better than the XslTransform class.

Listing 4 shows a simple XSLT style sheet that transforms Books.xml into an HTML table. Listing 5 shows the code that applies the XSLT style sheet to the XML.
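
Listing 5 relies on the new XsltProcessor. For readers who want a baseline, here is a minimal sketch of the same kind of transform using the 1.x XslTransform class that the article compares against. The style sheet and sample document below are hypothetical stand-ins, not Listing 4 or Listing 1, and later framework versions flag XslTransform as obsolete.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

class XslTransformSketch
{
    // Hypothetical sample data standing in for Books.xml.
    const string SampleXml =
        "<books>" +
        "<book isbn=\"0-000-00000-1\"><title>First Sample Title</title></book>" +
        "<book isbn=\"0-000-00000-2\"><title>Second Sample Title</title></book>" +
        "</books>";

    // Hypothetical style sheet that renders each book as an HTML table row.
    const string SampleXslt =
        "<xsl:stylesheet version=\"1.0\" " +
        "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"html\"/>" +
        "<xsl:template match=\"/books\">" +
        "<table><xsl:for-each select=\"book\">" +
        "<tr><td><xsl:value-of select=\"@isbn\"/></td>" +
        "<td><xsl:value-of select=\"title\"/></td></tr>" +
        "</xsl:for-each></table>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    static void Main()
    {
        // Compile the style sheet from an in-memory reader.
        XslTransform transform = new XslTransform();
        transform.Load(new XmlTextReader(new StringReader(SampleXslt)));

        // XPathDocument is the recommended input store for XSLT in 1.x.
        XPathDocument input = new XPathDocument(new StringReader(SampleXml));

        // No XSLT arguments and no resolver are needed for this sketch.
        transform.Transform(input, null, Console.Out, null);
    }
}
```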

Querying XML
Whidbey has built-in support for XQuery 1.0. XQuery allows you to query XML the same way SQL allows you to query a relational data store. XQuery has a simple but powerful syntax. Its core is the FLWOR expression, built from five clauses: for, let, where, order by, and return. To illustrate how to query an XML document loaded in an XPathDocument2 instance, imagine that we want to break the data found in the Books.xml file shown in Listing 1 into two files - Authors.xml (Listing 6) and JustBooks.xml (Listing 7).

We will now try to join these two XML documents and compose their data using XQuery. Listing 8 shows the XQuery that does exactly that. Listing 9 shows the code that applies the XQuery in Listing 8 to the documents in Listings 6 and 7.

Conclusion
This article shows how the XML support in the .NET Framework continues to evolve with Whidbey. Microsoft's policy of release early; collect, filter, and apply feedback; and then iterate has elevated the .NET Framework to the position of the best distributed programming platform ever. And a lot of that credit goes to System.Xml and its friends.

More Stories By Mujtaba Syed

Mujtaba Syed works as a software architect with Marlabs Inc. He is an MCSD (early achiever) and loves to speak about and write on Microsoft .NET. Mujtaba has been programming the Microsoft .NET Framework since its beta 1 release. His current interests are focused on Longhorn.
