Welcome!

Microsoft Cloud Authors: Kevin Benedict, Pat Romanski, Liz McMillan, Lori MacVittie, Elizabeth White

Related Topics: Microsoft Cloud

Microsoft Cloud: Article

Following the Tao of XML Type

XML best practices

We cannot know the Tao itself, nor see its qualities directly, but only see by differentiation that which it manifests."
- The Tao Te Ching, Chapter 2

Confession: This is not the article I started out to write. The editors asked for ideas and I offered to write about building Common Language Runtime-based User Defined Functions (CLR UDFs.) After attending Michael Rys's TechEd 2004 session on the XML Type in SQL Server 2005, though, I changed direction and decided to use the XML type instead. I was rewarded with a Zen moment for that. I would like to acknowledge all of the assistance I received from Dr. Rys in making these examples and for his valuable suggestions along the way.

In Zen, the concept of "The Way" (or Tao) is fundamental. We get into trouble when we assume that the Tao of one thing is essentially the same as that of another. XML type has a unique Tao. Understanding its Tao is not required to make effective use of the XML type, but it is if you want to be efficient.

The nature of the XML type is subtle: it is not like the Document Object Model (DOM) or the stream of nodes model approaches with which you may be familiar. In this article, I will try to guide you into following the Tao of XML type.

We Cannot Know the Tao Itself
When you first meet the XML type in Books On-Line, you may be immediately drawn to its methods:

  • Query: Uses XQuery to select an XML infoset from the underlying data.
  • Value: Also uses XQuery to select the value of the underlying data.
  • Exists: Simply tests for the presence of a sequence resulting from an XQuery.
  • Modify: Allows for the changing of the underlying data values.
What you will not see, though, is a definition of the XML type in terms of models like the DOM or XmlWriter. The definitions of Query and Value give us a subtle clue as to why. Consider the following query:

declare @greetings xml
set @greetings = '<greetings><greeting language=''en''>Hello
World.</greeting><greeting language=''es''>Hola
Mundo.</greeting></greetings>'
select 'query version'
select @greetings.query('/greetings/greeting[@language="en"]') as type
select 'value version'
select @greetings.value('/greetings/greeting[@language="en"]','nvarchar(max)')

This generates:

query version
<greeting language="en">Hello World.</greeting>

value version
Hello World.

Note: If you are using the Beta 1 version of "Yukon," you will need to type "@greetings::query" instead of "@greetings.query." The method designator has changed between versions.

What exactly are we seeing here? The first query returns the full element that matches our query, while the second returns just the inner-text value from the matching element. The really interesting question is - does the first query extract the element from an underlying XML representation while the second query is parsing out the inner-text value from the node, or is the first query method constructing an XML infoset in the first query and just retrieving the inner-text value from the underlying store in the second? If it is the former case, we might logically infer that the underlying store for the XML type might be DOM-like in nature. In the latter case, we must reject that conclusion.

My moment of Zen was realizing that the underlying store was neither a DOM nor a stream of nodes. The XML type is not a CLR-based type, rather it is an SQL Type. But it is a different kind of SQL Type.

Rather, we have to think of it as being some collection of scalar values. When we initialize that value of an XML type with a given infoset, the query processor (QP) breaks this collection of elements and their attribute values down to scalar values and inserts them into this collection along with enough meta-data to reassemble the infoset as it was originally presented to the QP. I am going to leave any deeper exploration of the nature of the underlying storage of the XML type at this point. We really do not need to understand the Tao of the XML type so long as we follow its way faithfully.

The underlying collection is optimized for querying, not for modification. In fact, it appears that any modification of the collection results in partial regeneration of the entire collection. This nature implies three best practices of which you need to be aware when using the XML type:

  • Initialize the value of an XML type instance as fully as possible: This is important given the relative expense of making modifications against the underlying store.
  • Minimize the number modifications made against the instance: Again, because each modification essentially regenerates the underlying storage of the data in the infoset, it is best to avoid that operation.
  • Do not use the XML type for serial construction of XML documents: Because heavy modification - which includes inserts - is to be avoided, the use of queries that build the infoset one element at a time also should be avoided.
This last practice is probably the most critical to making efficient use of the XML type, as we are about to see.

Manifest Differences
The original intent of this article was to demonstrate a CLR UDF that constructed an amortization schedule. These are used to illustrate the payoff of the loan amount. Amortization schedules are commonly associated with mortgages, or loans issued to purchase real estate. The typical amortization schedule shows payment due dates and amounts, the amount of each payment applied to interest on the loan, and the resulting balance of the loan.

Storing the amortization schedule in an XML format within the database makes some sense. We would make it a column in a table with other information about the mortgage, like issue date, principal loan amount, interest rate, and other loan factors. This way, the schedule can be retrieved without recalculation and possibly queried without persisting the amortization schedule in a tabular form (e.g., each payment transaction becomes a row).

A Bad Approach
To fully understand why you should not use the XML type as a dynamic constructor, consider the code listed for dbo.udfExample1. This query was written for the Beta 1 bits of SQL Server 2005 before I understood the costs of modification. In particular, I now want to point out some of the places where I went wrong.

Almost immediately, I violated the first best practice of not fully initializing the XML document as much as possible. Rather, I initialized the document with the least possible information by doing this:

set @sched = '<s />'

The next egregious violation was of the second best practice. Instead of minimizing the number of modifications made against the instance, I wrote:

set @sched::modify('insert <p /> into /s')
set @sched::modify('insert attribute due {sql:variable("@date_str")} into
/s/p[last()]')
set @sched::modify('insert attribute amt {sql:variable("@pmt")} into /s/p[last()]')
set @sched::modify('insert attribute int {sql:variable("@icharge")} into
/s/p[last()]')
set @sched::modify('insert attribute bal {sql:variable("@ibal")} into
/s/p[last()]')

Finally - and worst of all - I was breaking the last best practice: I was using the XML type to serially construct an infoset. At this point, you might be asking yourself how much difference it makes in writing the code. It amounts to several seconds! For example, on my development computer, the original function takes approximately 37 seconds to complete using the default values and the SQL Server 2005 Beta 1 bits.

Table 1 demonstrates the inefficiencies of what I was doing. The first column shows the number of years for which the loan is made. The second column shows the number of modification operations required to generate the schedule. Note that for each periodic payment, five modifications are required. The third column shows the time as reported by the SQL Server Workbench to execute the query. The last column shows the number of modifications per second achieved.

The nature of the underlying storage of data in the XML type is hinted at here. As the amount of work performed increases, the number of modifications per second achieved decreases. What's more, the rate of degradation in performance actually increases as the amount of work increases. This suggests that the underlying store might be trying to keep the data in the physical order in which it might be serialized. That is not bad, exactly, because it like improves the performance of serialization and queries. But it does evidence how bad of an idea the serial construction of an infoset with the XML type is.

A Better Approach
I suspect that my failure was because I had not followed the Tao of XML type, and that I had failed to appreciate the lesson with which I opened this article: "We cannot know the Tao itself, nor see its qualities directly." What I needed to do was get away from the idea of trying to construct an XML document at all. I had to compute underlying data first, then construct the XML type all at once (holding to all three of the best practices). The first thing that made sense to do was to replace the XML instance with a temporary table. You can use normal temporary tables within a User Defined Function, or you can use a variable typed as Table.

declare @table table( due nvarchar(64), amt numeric(14,2),
"int" numeric(14,2),
bal numeric(14,2)) -- The data to be serialized.

Then I can simply insert the results into the table as I go along.

insert into @table values(@date_str, @pmt, @icharge,@ibal)

Finally, I can construct the XML as type infoset with a FOR XML query.

return (select * from (
     select 1 as tag,
     0 as parent,
     'elem' as
     "s!1!dummy!hide",
     NULL as "p!2!due",
     NULL as "p!2!amt",
     NULL as "p!2!int",
     NULL as "p!2!bal"
     union all
     select 2,
     1,
     'elem',
     due,
     amt,
     int,
     bal
     from @table ) tbl
    order by "s!1!elem!hide", parent
    for xml explicit, type)

This version of the function executes in less than a second in virtually every test case I run it against. This is what I would call a significantly better approach to the problem because it gives us a better, positive version of our last best practice. Use FOR XML to generate infosets into XML type variables instead of using serial construction.

Lesson Learned
The XML type is indeed a powerful option for working with XML in SQL Server 2005, especially for storage and query. However, you should keep in mind that making frequent changes to instances of this type can be very expensive. Such operations should be kept to a minimum. This brings us back to the three best practices:

  • Initialize the value of an XML type instance as fully as possible.
  • Minimize the number modifications made against the instance.
  • Use FOR XML to generate infosets into XML type variables instead of using serial construction.
Following the Tao of the XML Type is simple as long as you keep the three best practices in mind.

More Stories By Kent Tegels

Kent Tagels is Senior System Analyst for HDR, Inc. in Omaha, and an author, .NET developer, and administrator. MCSE+I, MCDBA and MCP+SB.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


IoT & Smart Cities Stories
DXWorldEXPO | CloudEXPO are the world's most influential, independent events where Cloud Computing was coined and where technology buyers and vendors meet to experience and discuss the big picture of Digital Transformation and all of the strategies, tactics, and tools they need to realize their goals. Sponsors of DXWorldEXPO | CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities.
In this Women in Technology Power Panel at 15th Cloud Expo, moderated by Anne Plese, Senior Consultant, Cloud Product Marketing at Verizon Enterprise, Esmeralda Swartz, CMO at MetraTech; Evelyn de Souza, Data Privacy and Compliance Strategy Leader at Cisco Systems; Seema Jethani, Director of Product Management at Basho Technologies; Victoria Livschitz, CEO of Qubell Inc.; Anne Hungate, Senior Director of Software Quality at DIRECTV, discussed what path they took to find their spot within the tec...
The deluge of IoT sensor data collected from connected devices and the powerful AI required to make that data actionable are giving rise to a hybrid ecosystem in which cloud, on-prem and edge processes become interweaved. Attendees will learn how emerging composable infrastructure solutions deliver the adaptive architecture needed to manage this new data reality. Machine learning algorithms can better anticipate data storms and automate resources to support surges, including fully scalable GPU-c...
Disruption, Innovation, Artificial Intelligence and Machine Learning, Leadership and Management hear these words all day every day... lofty goals but how do we make it real? Add to that, that simply put, people don't like change. But what if we could implement and utilize these enterprise tools in a fast and "Non-Disruptive" way, enabling us to glean insights about our business, identify and reduce exposure, risk and liability, and secure business continuity?
Nicolas Fierro is CEO of MIMIR Blockchain Solutions. He is a programmer, technologist, and operations dev who has worked with Ethereum and blockchain since 2014. His knowledge in blockchain dates to when he performed dev ops services to the Ethereum Foundation as one the privileged few developers to work with the original core team in Switzerland.
DXWorldEXPO LLC announced today that Telecom Reseller has been named "Media Sponsor" of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
"Akvelon is a software development company and we also provide consultancy services to folks who are looking to scale or accelerate their engineering roadmaps," explained Jeremiah Mothersell, Marketing Manager at Akvelon, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...