Managing XML Data

The Benefits to Office-Based Applications

Last week I had lunch with the application manager of a local customer that had just completed its enterprise rollout of Office 2003. We met to discuss how his team could begin to take advantage of the deployment. As we sat down, he explained that he had been talking with his team about a project he wanted to discuss. They had a variety of independent business processes that all ran within various Microsoft Office applications, and he wanted to know whether these could be connected using XML and the features of Office 2003. He had discovered that Office natively supports XML, which got him thinking about ways his developers could take advantage of it. He hoped this would let him connect the independent processes and begin to share the data collected throughout the enterprise. In this article I will explain, as I did that day, how you can use not only XML but also associated standards such as Extensible Stylesheet Language Transformations (XSLT) and XML Schema Definition (XSD) to build and integrate Office-based applications.

It is important to understand that one of the major benefits of an XML document is that it enables the separation of application data from presentation. An XML document contains a set of self-describing structures that are used to define a vocabulary of data. The text-based nature of XML enables the easy transport of these data documents across process boundaries such as the ones the application manager had described. Always remember that XML is about describing and storing data. On its own, an XML document is only guaranteed to be well formed; nothing in XML itself requires its content to follow a consistent structure, so documents can be unpredictable. This is the reason the XSD standard was developed. Additionally, as the need emerged to convert these documents easily from one form to another, the XSLT standard was developed.

What Is XSD?
The simple answer for providing a guaranteed data structure is to create schemas. These schemas are used to describe an object and any of the interrelationships that exist within a data structure. There are many different kinds of schema definitions. For example, relational databases such as SQL Server use schemas to hold their table names and column keys and to provide a repository for triggers and stored procedures. Also, when a developer creates a class definition, he or she can define schemas to provide the object-oriented interface to properties, methods, and events. Within an XML data structure, schemas are used to describe both the object definition and the relationship of data elements and attributes. Regardless of their actual context, schemas are used to provide the data representation and serve as an abstracted layer or framework.

Just as XML is really a metalanguage used to create and describe other languages, XSD is an example of an XML-based modeling language defined by the W3C for creating XML schemas. Defined using XML, XSD is used to enforce the legal building blocks for the formatting and validation of an XML file. For example, let's examine a schema that defines an employee structure as shown in Listing 1.

This schema is by definition a well-formed XML document. At the top of an XSD file is a set of namespaces. These are an optional set of declarations that provide unique identifiers used to associate a set of XML elements and attributes. Namespaces were originally introduced by the W3C as a URI-based way to differentiate XML vocabularies. The XML Schema specification then extended them to cover schema components and not just individual elements and attributes. The identifier is a URI that doesn't have to point to a physical location; it only needs to be a unique name controlled by the schema author. The namespace is defined through two declarations: the XML schema namespace and the target namespace. The xmlns attribute uniquely defines a schema namespace and is divided into three sections.

  • xmlns keyword: Defined first and separated from the namespace prefix by a colon.
  • Prefix: Defines the abbreviated unique name of a namespace and is used when declaring all elements and attributes. Both xmlns and xml are reserved and can't be used as prefixes of your own.
  • Definition: The unique URI that identifies the namespace and is controlled by the schema author.
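
To make the three parts concrete, here is a minimal sketch that parses a schema header and reports the prefix, local name, and namespace URI of its root element. It is written in Java with the JDK's built-in XML parser purely for illustration; the NamespaceParts class and the urn:example:employee target namespace are assumptions, not part of the original listings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceParts {
    // Hypothetical schema header; the target namespace URI is made up for illustration.
    static final String SCHEMA_HEADER =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:employee'/>";

    /** Returns {prefix, localName, namespaceURI} for the root element of the given XML. */
    static String[] rootParts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixes to be resolved to URIs
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return new String[] { root.getPrefix(), root.getLocalName(), root.getNamespaceURI() };
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String[] parts = rootParts(SCHEMA_HEADER);
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

Changing the prefix from xs to xsd leaves the reported URI unchanged, which is the point: the URI identifies the namespace, while the prefix is only a local abbreviation.
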
By definition every XSD document has a single top-level element, the schema element. Beneath it are the element declarations, which are of either simple or complex type. Simple elements contain text-only information. Complex elements are grouping elements that act as a container for other elements and attributes. There are four types of complex elements: empty elements, elements that contain other elements, elements that contain only text, and elements that contain both other elements and text.

The simple types contain the individual elements or fields that describe the employee object. These are then grouped into a complex type (employeeinfo) that provides the entire object representation. This schema contains a variety of elements that describe the data that can be used to capture employee information. By using the XML adapter of InfoPath as shown in Figure 1, we can import the schema into a data source.
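
Since the exact schema depends on your data, the following is only a sketch of what such a structure might look like and how a document can be checked against it. Only the employeeinfo type name comes from the discussion above; the employee, firstname, lastname, and hiredate element names are assumptions, and the check is written in Java against the JDK's javax.xml.validation API rather than InfoPath or .NET.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class EmployeeSchemaCheck {
    // Hypothetical schema: three simple-typed fields grouped into the employeeinfo complex type.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='employee' type='employeeinfo'/>"
      + "<xs:complexType name='employeeinfo'>"
      + "<xs:sequence>"
      + "<xs:element name='firstname' type='xs:string'/>"
      + "<xs:element name='lastname' type='xs:string'/>"
      + "<xs:element name='hiredate' type='xs:date'/>"
      + "</xs:sequence>"
      + "</xs:complexType>"
      + "</xs:schema>";

    /** Returns true when the XML is well formed and conforms to the schema above. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<employee><firstname>Ann</firstname><lastname>Lee</lastname>"
                    + "<hiredate>2004-03-01</hiredate></employee>";
        String bad  = "<employee><firstname>Ann</firstname><lastname>Lee</lastname>"
                    + "<hiredate>March 1</hiredate></employee>";
        System.out.println("good: " + isValid(good));
        System.out.println("bad:  " + isValid(bad));
    }
}
```

The second document is well formed but fails because its hiredate is not an xs:date, which is exactly the kind of inconsistency a schema is meant to catch.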

By using InfoPath we can then build a data entry form as shown in Figure 2 that would allow end users to update employee information and also guarantee that their data conforms to the XSD structure defined above. The benefit of InfoPath in this example is that it shields users from having to understand the complexities and mechanics of the underlying XSD. Instead, they are able to open and complete a data entry form that results in a conforming XML document.

Additionally, depending on the specific business process, we could use the features of InfoPath to define additional business rules that enforce form-specific requirements. It is important to remember that these are additional rules and can't alter the baseline defined within the XSD.

As users create and save their forms, this data is then stored in an XML document (as shown in Listing 2) that is guaranteed to match the XSD schema defined above.

Note: Processing instructions (PIs) are optional instructions addressed to the consuming application, and they can appear at the top of an XML document. InfoPath uses them to provide a path to the solution file and version information. Within XML they always begin with "<?" and end with "?>".
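
As an illustration of how a consuming application might read these instructions, the sketch below lists the processing-instruction targets at the top of a document. It is written in Java with the JDK's DOM parser; the two instructions are modeled on the kind InfoPath writes, but the href and solutionVersion values and the ProcessingInstructionList class are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class ProcessingInstructionList {
    // Hypothetical form data; attribute values are illustrative only.
    static final String XML =
        "<?xml version=\"1.0\"?>"
      + "<?mso-infoPathSolution href=\"file:///C:/forms/employee.xsn\" solutionVersion=\"1.0.0.1\"?>"
      + "<?mso-application progid=\"InfoPath.Document\"?>"
      + "<employee/>";

    /** Returns the targets of all processing instructions that precede or follow the root element. */
    static List<String> targets(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> found = new ArrayList<>();
            for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                    found.add(((ProcessingInstruction) n).getTarget());
                }
            }
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The XML declaration on the first line is not reported as a processing instruction.
        System.out.println(targets(XML));
    }
}
```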

What Is XSLT?
Of course once this XML document is created, it contains the necessary information and associations to be opened within the InfoPath solution that we created above. Although it can be opened using Word as shown in Figure 3, we will only see the data that it contains and not the format or any additional rules we defined within InfoPath. However, by using XSLT we can change that and transform this document into a solution that can leverage the presentation capabilities provided by Microsoft Word.

XSLT is also a metalanguage that consists of an XML-based vocabulary of elements for transforming XML-based content. To locate the parts of a document that a transformation should act on, XSLT relies on a companion expression language called the XML Path Language (XPath). The combination of XSLT and XPath forms a specialized vocabulary that enables the transformation of any XML-based document into virtually any other document format. Starting with an XML-based document, the application of templates generates a new output document. The XSLT processor accepts as input the XML tree represented in a well-formed document and then produces as output a new transformed document.
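
To give a feel for the kind of expressions XPath provides, here is a small sketch that evaluates a few of them against an in-memory document. It uses Java's javax.xml.xpath API as a stand-in for whichever processor you use; the employees document and the XPathLookup class are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathLookup {
    // Hypothetical source document.
    static final String XML =
        "<employees>"
      + "<employee id='1'><lastname>Lee</lastname><dept>Sales</dept></employee>"
      + "<employee id='2'><lastname>Ortiz</lastname><dept>IT</dept></employee>"
      + "</employees>";

    /** Evaluates an XPath 1.0 expression against XML and returns the result as a string. */
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select by child element value, then by attribute value.
        System.out.println(eval("/employees/employee[dept='IT']/lastname"));
        System.out.println(eval("/employees/employee[@id='1']/lastname"));
    }
}
```

The same path and predicate syntax is what appears in the match and select attributes of an XSLT style sheet.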

The transformation process involves three documents: the source, the XSLT style sheet, and the result. The source document is simply a well-formed XML document that serves as the input of the transformation. The style sheet document is an XML document that uses the XSLT vocabulary for expressing transformation rules. Finally, the result document is a text document that is produced by applying the transformation defined in the XSLT style sheet to the source document.

A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating a set of patterns with templates. A pattern is matched against elements in a source tree. A template is instantiated to create part of the result tree. It is important to remember that the result tree is separate from the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and any type of arbitrary structure can be added.
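
The pattern-and-template idea can be sketched with a tiny style sheet that reorders two elements, renames them, and filters out a third. The example runs the transformation with Java's javax.xml.transform API, which implements XSLT 1.0; the source document, the element names, and the TemplateTransform class are all assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateTransform {
    // Hypothetical source tree.
    static final String SOURCE =
        "<employee><firstname>Ann</firstname><lastname>Lee</lastname>"
      + "<salary>50000</salary></employee>";

    // Each xsl:template pairs a match pattern with the output it instantiates.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='employee'>"
      +   "<person>"
      +     "<xsl:apply-templates select='lastname'/>"   // reordered: last name first
      +     "<xsl:apply-templates select='firstname'/>"  // salary is never selected, so it is filtered out
      +   "</person>"
      + "</xsl:template>"
      + "<xsl:template match='lastname'><family><xsl:value-of select='.'/></family></xsl:template>"
      + "<xsl:template match='firstname'><given><xsl:value-of select='.'/></given></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the style sheet to the source and returns the result document as text. */
    static String transform(String source, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(source)),
                                  new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SOURCE, STYLESHEET));
    }
}
```

Note that the source string is never modified; the result is built as a separate tree and serialized on its own.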

Making the Transformation
Within the .NET Framework the System.Xml.Xsl namespace provides support for XSLT transformations. It supports the W3C XSL Transformations (XSLT) Version 1.0 Recommendation (www.w3.org/TR/xslt). This namespace provides several classes that enable developers to transform the document created by InfoPath into other formats. For example, using the code shown in Listing 3 we can apply an XSLT transformation to the InfoPath XML document and produce a Word document.

The XML format defined within Word is based on an additional XML vocabulary called WordML. The WordML schema was designed to mirror the information found in a traditional .doc file. The root element of a WordML document is always w:wordDocument. This element contains several other elements that represent the complete Word document structure, including properties, fonts, lists, systems, and the actual document body that contains the sections and paragraphs.

Adding this namespace to an XML document lets the document carry Word's styles and formatting. XML by itself doesn't define presentation, but formatting is an important part of any Word document, and the WordML namespace allows it to be included. For example, let's start with an InfoPath form that contains a repeating table structure as shown in Figure 4.

This could be transformed using XSLT into a Word document that includes formatting as shown in Listing 4.
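
As a rough sketch of what such a transformation might look like, the style sheet below turns each row of a repeating structure into a WordML paragraph with the first run in bold. It is not the article's listing: the expenses, item, description, and amount names are assumptions, the WordMlTransform class is hypothetical, and the transformation is run with Java's javax.xml.transform API rather than System.Xml.Xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

class WordMlTransform {
    // Namespace used by Word 2003 XML documents.
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Hypothetical form data with a repeating item structure.
    static final String SOURCE = "<expenses>"
      + "<item><description>Hotel</description><amount>180.00</amount></item>"
      + "<item><description>Taxi</description><amount>24.50</amount></item>"
      + "</expenses>";

    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:w='" + W_NS + "'>"
      + "<xsl:template match='/'>"
      // Processing instruction that associates the file with Word.
      + "<xsl:processing-instruction name='mso-application'>"
      +   "progid=\"Word.Document\"</xsl:processing-instruction>"
      + "<w:wordDocument><w:body>"
      + "<xsl:for-each select='expenses/item'>"
      +   "<w:p>"
      +     "<w:r><w:rPr><w:b/></w:rPr><w:t><xsl:value-of select='description'/></w:t></w:r>"
      +     "<w:r><w:t>: <xsl:value-of select='amount'/></w:t></w:r>"
      +   "</w:p>"
      + "</xsl:for-each>"
      + "</w:body></w:wordDocument>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms the form data into WordML text. */
    static String toWordMl(String source) {
        try {
            StringWriter result = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)))
                .transform(new StreamSource(new StringReader(source)), new StreamResult(result));
            return result.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    /** Counts the w:p (paragraph) elements in a WordML document. */
    static int paragraphCount(String wordMl) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(wordMl)))
                .getElementsByTagNameNS(W_NS, "p").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse WordML", e);
        }
    }

    public static void main(String[] args) {
        String wordMl = toWordMl(SOURCE);
        System.out.println(wordMl);
        System.out.println("paragraphs: " + paragraphCount(wordMl));
    }
}
```

A real style sheet would normally emit more of the WordML structure, such as document properties and styles; this sketch keeps only the body so the mapping from repeating rows to paragraphs is easy to see.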

Once the transformation is complete, the document appears as a Word document within the File Explorer as shown in Figure 5.

.  .  .

As we stood to leave, he started smiling as he thought about the possibilities. As we shook hands and parted in the parking lot, each getting into his own car, he confided in me that he had all sorts of things he wanted to do. He couldn't wait to talk to his team and start planning. Of course, this is just a small introduction to the many things that XML, XSD, and XSLT can provide when working with Office; the rest is up to you.

More Stories By Thom Robbins

Thom Robbins is a senior technology specialist with Microsoft. He is a frequent contributor to various magazines, including .NET Developer's Journal and SOA Web Services Journal. Thom is also a frequent speaker at a variety of events that include VS Live and others. When he's not writing code and helping customers, he spends his time with his wife at their home in New Hampshire.
