Following the Tao of XML Type

XML best practices

"We cannot know the Tao itself, nor see its qualities directly, but only see by differentiation that which it manifests."
- The Tao Te Ching, Chapter 2

Confession: This is not the article I started out to write. The editors asked for ideas and I offered to write about building Common Language Runtime-based User Defined Functions (CLR UDFs). After attending Michael Rys's TechEd 2004 session on the XML Type in SQL Server 2005, though, I changed direction and decided to use the XML type instead. I was rewarded with a Zen moment for that. I would like to acknowledge all of the assistance I received from Dr. Rys in making these examples and for his valuable suggestions along the way.

In Zen, the concept of "The Way" (or Tao) is fundamental. We get into trouble when we assume that the Tao of one thing is essentially the same as that of another. The XML type has a unique Tao. Understanding its Tao is not required to make effective use of the XML type, but it is required if you want to use it efficiently.

The nature of the XML type is subtle: it is not like the Document Object Model (DOM) or the stream of nodes model approaches with which you may be familiar. In this article, I will try to guide you into following the Tao of XML type.

We Cannot Know the Tao Itself
When you first meet the XML type in Books Online, you may be immediately drawn to its methods:

  • Query: Uses XQuery to select an XML infoset from the underlying data.
  • Value: Uses XQuery to extract a single value from the underlying data as a SQL scalar type.
  • Exist: Simply tests for the presence of a sequence resulting from an XQuery.
  • Modify: Allows for the changing of the underlying data values.
What you will not see, though, is a definition of the XML type in terms of models like the DOM or XmlWriter. The definitions of Query and Value give us a subtle clue as to why. Consider the following query:

declare @greetings xml
set @greetings = '<greetings><greeting language=''en''>Hello World.</greeting><greeting language=''es''>Hola Mundo.</greeting></greetings>'
select 'query version'
select @greetings.query('/greetings/greeting[@language="en"]') as type
select 'value version'
select @greetings.value('/greetings/greeting[@language="en"]','nvarchar(max)')

This generates:

query version
<greeting language="en">Hello World.</greeting>

value version
Hello World.

Note: If you are using the Beta 1 version of "Yukon," you will need to type "@greetings::query" instead of "@greetings.query." The method designator has changed between versions.
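For completeness, the Exist and Modify methods follow the same invocation pattern. Here is a minimal sketch of both (not from the original example, and written with the "." method syntax rather than the Beta 1 "::" syntax):

declare @greetings xml
set @greetings = '<greetings><greeting language="en">Hello World.</greeting><greeting language="es">Hola Mundo.</greeting></greetings>'

-- exist() returns 1 if the XQuery yields a non-empty sequence, 0 otherwise.
select @greetings.exist('/greetings/greeting[@language="fr"]') as has_french

-- modify() changes the underlying data in place; here it inserts a new greeting element.
set @greetings.modify('insert <greeting language="fr">Bonjour.</greeting> as last into /greetings')
select @greetings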

What exactly are we seeing in the query and value results above? The first query returns the full element that matches our query, while the second returns just the inner-text value from the matching element. The really interesting question is this: is the engine extracting the matching element from an underlying XML representation in the first case and parsing the inner-text value out of that node in the second, or is it constructing an XML infoset on demand in the first case and simply retrieving a scalar value from the underlying store in the second? If it is the former, we might logically infer that the underlying store for the XML type is DOM-like in nature. If it is the latter, we must reject that conclusion.

My moment of Zen was realizing that the underlying store is neither a DOM nor a stream of nodes. The XML type is not a CLR-based type; rather, it is an SQL type, but a different kind of SQL type.

We have to think of it, rather, as a collection of scalar values. When we initialize the value of an XML type instance with a given infoset, the query processor (QP) breaks the elements and their attribute values down into scalar values and inserts them into this collection, along with enough metadata to reassemble the infoset as it was originally presented to the QP. I am going to leave any deeper exploration of the underlying storage of the XML type at this point. We really do not need to understand the Tao of the XML type so long as we follow its way faithfully.

The underlying collection is optimized for querying, not for modification. In fact, it appears that any modification of the collection results in partial regeneration of the entire collection. This nature implies three best practices of which you need to be aware when using the XML type:

  • Initialize the value of an XML type instance as fully as possible: This is important given the relative expense of making modifications against the underlying store.
  • Minimize the number of modifications made against the instance: Again, because each modification essentially regenerates the underlying storage of the data in the infoset, it is best to avoid that operation.
  • Do not use the XML type for serial construction of XML documents: Because heavy modification - which includes inserts - is to be avoided, the use of queries that build the infoset one element at a time also should be avoided.
This last practice is probably the most critical to making efficient use of the XML type, as we are about to see.

Manifest Differences
The original intent of this article was to demonstrate a CLR UDF that constructed an amortization schedule. Such a schedule illustrates how a loan is paid off over time. Amortization schedules are commonly associated with mortgages, or loans issued to purchase real estate. The typical amortization schedule shows payment due dates and amounts, the amount of each payment applied to interest on the loan, and the resulting balance of the loan.
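For reference, the payment amount in such a schedule comes from the standard annuity formula, pmt = P * r / (1 - (1 + r)^-n), where P is the principal, r the periodic interest rate, and n the number of payments. A minimal T-SQL sketch (the variable names and sample values here are illustrative, not taken from the original function):

-- Compute the fixed monthly payment for a 30-year loan at 6% annual interest.
declare @principal numeric(14,2), @annual_rate float, @years int
select @principal = 150000.00, @annual_rate = 0.06, @years = 30

declare @r float, @n int
select @r = @annual_rate / 12.0, @n = @years * 12

declare @pmt numeric(14,2)
set @pmt = @principal * @r / (1.0 - power(1.0 + @r, -@n))
select @pmt as monthly_payment   -- roughly 899.33 for these inputs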

Storing the amortization schedule in an XML format within the database makes some sense. We would make it a column in a table with other information about the mortgage, like issue date, principal loan amount, interest rate, and other loan factors. This way, the schedule can be retrieved without recalculation and possibly queried without persisting the amortization schedule in a tabular form (e.g., each payment transaction becomes a row).
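A minimal sketch of such a table (the names and column types are illustrative only, not taken from the article) might look like this:

-- Mortgage details plus the precomputed amortization schedule stored as xml.
create table dbo.Mortgage
(
    MortgageID  int identity primary key,
    IssueDate   datetime      not null,
    Principal   numeric(14,2) not null,
    AnnualRate  float         not null,
    TermYears   int           not null,
    Schedule    xml           null   -- computed once, retrieved or queried thereafter
)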

A Bad Approach
To fully understand why you should not use the XML type as a dynamic constructor, consider the code listed for dbo.udfExample1. This function was written for the Beta 1 bits of SQL Server 2005, before I understood the costs of modification. In particular, I now want to point out some of the places where I went wrong.

Almost immediately, I violated the first best practice by not initializing the XML document as fully as possible. Instead, I initialized the document with the least possible information by doing this:

set @sched = '<s />'

The next egregious violation was of the second best practice. Instead of minimizing the number of modifications made against the instance, I wrote:

set @sched::modify('insert <p /> into /s')
set @sched::modify('insert attribute due {sql:variable("@date_str")} into /s/p[last()]')
set @sched::modify('insert attribute amt {sql:variable("@pmt")} into /s/p[last()]')
set @sched::modify('insert attribute int {sql:variable("@icharge")} into /s/p[last()]')
set @sched::modify('insert attribute bal {sql:variable("@ibal")} into /s/p[last()]')

Finally - and worst of all - I was breaking the last best practice: I was using the XML type to serially construct an infoset. At this point, you might be asking yourself how much difference all of this really makes. The answer is: a great deal. On my development computer, the original function takes approximately 37 seconds to complete using the default values and the SQL Server 2005 Beta 1 bits.
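The full listing of dbo.udfExample1 is not reproduced here, but the following hypothetical sketch (my reconstruction for illustration, not the original code) shows the general shape of its payment loop and where the five modifications per payment come from:

-- Hypothetical sketch only, using the Beta 1 '::' method syntax as in the listing above.
declare @sched xml, @ibal numeric(14,2), @pmt numeric(14,2), @icharge numeric(14,2)
declare @rate float, @period int, @n int, @date_str nvarchar(64), @issue_date datetime
select @sched = '<s />', @ibal = 150000.00, @pmt = 899.33, @rate = 0.06 / 12.0,
       @period = 1, @n = 360, @issue_date = '20040101'

while @period <= @n
begin
    set @icharge = round(@ibal * @rate, 2)                -- interest for this period
    set @ibal = @ibal - (@pmt - @icharge)                 -- remaining balance
    set @date_str = convert(nvarchar(64), dateadd(month, @period, @issue_date), 120)

    -- five modifications for every single payment row
    set @sched::modify('insert <p /> into /s')
    set @sched::modify('insert attribute due {sql:variable("@date_str")} into /s/p[last()]')
    set @sched::modify('insert attribute amt {sql:variable("@pmt")} into /s/p[last()]')
    set @sched::modify('insert attribute int {sql:variable("@icharge")} into /s/p[last()]')
    set @sched::modify('insert attribute bal {sql:variable("@ibal")} into /s/p[last()]')

    set @period = @period + 1
end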

Table 1 demonstrates the inefficiencies of what I was doing. The first column shows the number of years for which the loan is made. The second column shows the number of modification operations required to generate the schedule. Note that for each periodic payment, five modifications are required. The third column shows the time as reported by the SQL Server Workbench to execute the query. The last column shows the number of modifications per second achieved.

The nature of the underlying storage of data in the XML type is hinted at here. As the amount of work performed increases, the number of modifications per second achieved decreases. What's more, the rate of degradation in performance actually increases as the amount of work increases. This suggests that the underlying store might be trying to keep the data in the physical order in which it would be serialized. That is not bad, exactly, because it likely improves the performance of serialization and queries. But it does show what a bad idea serial construction of an infoset with the XML type is.

A Better Approach
I suspect that my failure came from not following the Tao of the XML type, and from failing to appreciate the lesson with which I opened this article: "We cannot know the Tao itself, nor see its qualities directly." What I needed to do was get away from the idea of trying to construct an XML document at all. I had to compute the underlying data first, then construct the XML type all at once (holding to all three of the best practices). The first thing that made sense to do was to replace the XML instance with a table. Since a User Defined Function cannot create a normal temporary table, I used a variable typed as table.

declare @table table(
    due nvarchar(64),
    amt numeric(14,2),
    "int" numeric(14,2),
    bal numeric(14,2)) -- The data to be serialized.

Then I can simply insert the results into the table as I go along.

insert into @table values(@date_str, @pmt, @icharge,@ibal)

Finally, I can construct the XML type infoset all at once with a FOR XML query.

return (select * from (
            select 1 as tag,
                   0 as parent,
                   'elem' as "s!1!dummy!hide",
                   NULL as "p!2!due",
                   NULL as "p!2!amt",
                   NULL as "p!2!int",
                   NULL as "p!2!bal"
            union all
            select 2,
                   1,
                   'elem',
                   due,
                   amt,
                   "int",
                   bal
            from @table) tbl
        order by "s!1!dummy!hide", parent
        for xml explicit, type)

This version of the function executes in less than a second in virtually every test case I run it against. This is a significantly better approach to the problem, and it gives us a better, positive version of our last best practice: use FOR XML to generate infosets into XML type variables instead of using serial construction.
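As an aside, SQL Server 2005 also introduces the FOR XML PATH syntax, which can produce the same shape with much less ceremony than EXPLICIT. A minimal sketch using the same @table variable (assuming FOR XML PATH is available in the build you are running; it may not be present in the earliest betas) would let the function end with:

-- Same <s><p due="..." amt="..." int="..." bal="..."/></s> shape via FOR XML PATH.
return (select due   as "@due",
               amt   as "@amt",
               "int" as "@int",
               bal   as "@bal"
        from @table
        for xml path('p'), root('s'), type)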

Lesson Learned
The XML type is indeed a powerful option for working with XML in SQL Server 2005, especially for storage and query. However, you should keep in mind that making frequent changes to instances of this type can be very expensive. Such operations should be kept to a minimum. This brings us back to the three best practices:

  • Initialize the value of an XML type instance as fully as possible.
  • Minimize the number of modifications made against the instance.
  • Use FOR XML to generate infosets into XML type variables instead of using serial construction.
Following the Tao of the XML Type is simple as long as you keep the three best practices in mind.

More Stories By Kent Tegels

Kent Tegels is a Senior System Analyst for HDR, Inc. in Omaha, and an author, .NET developer, and administrator. He holds the MCSE+I, MCDBA, and MCP+SB certifications.
