Microsoft Cloud Authors: Pat Romanski, Andreas Grabner, Nick Basinger, Kevin Benedict, Liz McMillan

Related Topics: Microsoft Cloud

Microsoft Cloud: Article

Database Unit Testing with the MassDataHandler

An open source framework to test your database

After years of nearly every tradeshow, magazine, and technologist saying how wonderful unit tests are, developers know that they're supposed to write unit tests. But it's not always that easy. For example, while most developers can write tests for a trivial class library, it's much harder to write tests for the database.

There is much value in testing the database - it can contain complicated stored procedures, transactions, and functions that could easily cause errors. Preventing those errors can be worth the time it takes to write the test.

This article will explain some of the theories and difficulties behind testing the data layer, and then show how a new open source tool, the MassDataHandler, helps solve those problems, resulting in value-added unit tests. You can download it for free from Microsoft's new "CodePlex" Web site: www.codeplex.com/Wiki/View.aspx?ProjectName=MassDataHandler

The standard pattern for a deterministic test is to:

  1. Set the initial state.
  2. Run the system under test (SUT).
  3. Check the final state.
While this technique is trivial for a class library, it can be very difficult for a database because a database is a complex collection of potentially thousands of integrated objects. In a sense, it defies the very nature of what a unit test is. The big problem is how to set the initial state. Once that's done, we know that we can run the SUT with a SQL or data-access call, and then check the final state with a SQL query.

Setting the initial state requires first creating the schema, and then populating the database with test data. Each test - in order to run independent of the others - must clear the appropriate tables (to avoid interference) and then create its own data, mostly likely with a group of insert statements. However, generating these insert statements is messy. Just look at a simple example to insert a few columns in just three rows:

insert into Customer (CustomerName, Notes, LastUpdate)
values ('Bob',null, cast('03/04/2006' as varchar(30)) )
insert into Customer (CustomerName, Notes, LastUpdate)
values ('Sarah','smart employee', cast('06/03/2005' as varchar(30)) )
insert into Customer (CustomerName, Notes, LastUpdate)
values ('Simspon','show up for work on time', cast('09/17/1997' as varchar(30))


  • A huge amount of error-prone, textual overhead, like recopying the insert line for each row and calling extra functions just to create the right data type.
  • Brittle insert statements that add a non-null column to the table and break everything in the process.
  • Irrelevant non-null fields. (Say a table has 20 required fields that you don't need; you still need to insert data into all of them to create a single row.)
  • Hard coded identity values that change with each test run.
  • Visual separations in the value and the column it inserts into, making it harder to ensure the right value goes to the right column.
Creating and maintaining the base data used for database tests is so problematic that it becomes impractical to write the tests. That's where the MassDataHandler comes in - it uses an XML format and knowledge of the database schema to make it easy to insert test data. It's a tool that makes writing database unit tests worthwhile.

How the MassDataHandler Does It
The MassDataHandler framework takes a user-friendly XML snippet that contains the relevant data to insert. Because these are just XML snippets, you can store them either as literal strings in code or as physical files. Using knowledge of the database schema to do all the grunt work, it translates that snippet into SQL inserts and populates the database. Figure 1 shows a sample XML fragment that inserts data into several tables. There are three types of root elements: a variables section, an import element that dynamically includes other scripts, and a table element to insert data into the named table. The table elements have child row elements, where each node inserts a row into the database. The database column is the XML attribute; the XML value is what gets inserted into the database.

While the online tutorials go into more detail, the MassDataHandler scripts have several benefits:

  • Variables: You can use variables in attribute values to refactor out common text strings. A variable is referenced with an MSBuild-like syntax of $(…).
  • Identity Inserts: The MassDataHandler saves the identity value of every row it inserts, and lets you reference them later with the '@' operator. So say you've inserted into the customer and product tables, and then need both their primary key (identity) values to insert into the child table order. When you specify CustomerId="[email protected]", where N is a row (1, 2, 3, etc....), the framework is smart enough to check the customer table, lookup the nth row's value, and then automatically insert that.
  • Non-Null Columns: You only need to specify the columns that you care about. The MassDataHandler uses knowledge of the database schema to determine any required columns that weren't explicitly mentioned, and then inserts minimal default data into those columns so that you can still insert a row without worrying about them. For example, if the LastUpdate column is required, yet you omit it from your XML script because it's irrelevant to your test, the framework will still populate it with a default date. Likewise, if someone adds a new required column, it won't break your tests because the framework will just automatically populate it.
  • Multiple Tables: You can insert data into multiple tables with the <Table> element (by default it clears all the tables listed, helping your tests to have a fresh baseline each time).
  • Default Rows: You can specify a default row using the <Default> element that sets the values for certain columns. You can then override the default with a <Row> element that specifies specific values for specific columns. The next section explains this more.
Getting Values for Rows
The framework uses several steps to determine what column values to insert into the database table. For each <Row> element, the framework first starts with the minimal inserts for that table (i.e. the non-null, required columns). This ensures that all required columns always get populated. It then overrides that with any <Default> element, overrides that again with the specific <Row> element, and lastly evaluates the variables and identity lookups. Figure 2 illustrates this with a 5-column table. For simplicity, these columns are all varchar (string). For example, Column 2 is required, and therefore has a default value of "A." However, we override that with the <Default> element for a new value of "B." Then the given <Row> has the final say. For Column 2 it overrides everything and explicitly sets the value to "C," whereas for Column 3 it doesn't specify anything and therefore keeps the original value of "B" from the <Default>.

One of the ways that the MassDataHandler helps simplify database unit tests is by allowing you to round-trip data between SQL, XML snippets, and ADO.NET DataTables (see Figure 3). The intent is that the MassDataHandler primarily works with XML Snippets, supplying methods to convert the snippet to live SQL data. However, it also supplies methods to create an XML Snippet from a SQL select (thus round-tripping between XML and SQL Data). Furthermore, the framework lets you create a snippet from a DataTable, as well as create a DataTable from a snippet. Therefore, using the snippet as an intermediate step, you can quickly convert data between any of the three formats. This is especially useful when you need to create snippets from existing data or check a large result set to verify that your test worked correctly.

How to Install
Installing the MassDataHandler is quick. You can download it for free from Microsoft's open source Web site, CodePlex. The root directory has a batch file appropriately named "ClickHereToStart.bat". This file triggers a WinForm to collect environmental inputs (database connection info and your MSTest and MSBuild directories) (see Figure 4). With this info, the batch then kicks off an MSBuild script to compile the source code, install a test database, run the tutorial (which is really just an MSTest suite showing how to use the framework), and lastly opening up the solution. The MSBuild script takes less than 30 seconds to run; if it passes, then the framework successfully ran on your machine.

The Readme file lists the key environmental assumptions, namely that the MassDataHandler framework was created for .Net 2.0 and SQL Server 2005, that it requires MSTest, and that you've somehow created a test database that has the sufficient schema (excluding foreign keys).

The downloadable package includes an MSTest suite that shows the MassDataHandler in action. Because it can be easiest to learn from actually seeing working code, this is intended as a tutorial. At the time of this writing, it contained about a dozen tests, starting with the basic concepts, and gradually working up to the advanced ones. Users can probably find a test that is similar to what they need, and just adapt the code appropriately.

Using GUI
The MassDataHandler is only as powerful as the XML snippets that you can write. Therefore it includes a visual editing tool, both to test the scripts (see Figure 5) and to create snippets from existing data (see Figure 6), which is extremely useful as your data gets more complicated. The GUI is straightforward to begin with, and it gets compiled (from the MSBuild script running) to the "MassDataHandler.Gui\bin\Debug" directory.

Frequently Asked Questions
Q: Is it cheating to have the database created as a separate step?
A: No. Although ideally in unit tests, each test runs perfectly in isolation, without any prerequisites, the very nature of database tests is not ideal. It could take several minutes to completely recreate a database schema. It's worth having your local update script (or test run script) first upgrade a test database with the latest schema, if it means your unit tests perform more quickly. However, it's also possible to require each test to individually check for the latest database schema and only trigger an upgrade if necessary - it just may not be worth your time to implement.

Q: Why not just use mocking?
A: By its very definition, mocking "mocks out the database," preventing it from being tested, which is the exact opposite of what we want to do. Mocking is great for testing the business layer by decoupling it from the database. But our goal is to test the database.

Q: But won't your tests be brittle?
A: A little, yes. But, we're trying to solve a complicated problem, and the extra brittleness goes with the territory. However, by using the MassDataHandler's flexible XML scripts, we bring the brittleness of our database tests down to a manageable level.

Q: You can't really call this a unit test.
A: It's semantics. Technically a unit test runs only a single unit of code, and the very nature of database tests requires integration (interconnected tables, stored procs, views, functions). However, by making all the other pieces so robust, you can effectively run these database tests as if they were unit tests. They're run from within MSTest (alongside all your other unit tests), they're mostly atomic, they perform quickly, and they safeguard your advanced SQL logic. You can call them any testing buzzword you choose, but the point is they add value to your project.

Testing the database is difficult, yet beneficial. The MassDataHandler is a free tool that makes it easier to create and maintain database tests. With the importance of the data layer in today's enterprise applications, it's a good investment worth checking out.

More Stories By Timothy Stall

Tim Stall is a software developer at Paylocity, an independent provider of payroll and human resource solutions. He can be contacted at [email protected]

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

IoT & Smart Cities Stories
Moroccanoil®, the global leader in oil-infused beauty, is thrilled to announce the NEW Moroccanoil Color Depositing Masks, a collection of dual-benefit hair masks that deposit pure pigments while providing the treatment benefits of a deep conditioning mask. The collection consists of seven curated shades for commitment-free, beautifully-colored hair that looks and feels healthy.
The textured-hair category is inarguably the hottest in the haircare space today. This has been driven by the proliferation of founder brands started by curly and coily consumers and savvy consumers who increasingly want products specifically for their texture type. This trend is underscored by the latest insights from NaturallyCurly's 2018 TextureTrends report, released today. According to the 2018 TextureTrends Report, more than 80 percent of women with curly and coily hair say they purcha...
The textured-hair category is inarguably the hottest in the haircare space today. This has been driven by the proliferation of founder brands started by curly and coily consumers and savvy consumers who increasingly want products specifically for their texture type. This trend is underscored by the latest insights from NaturallyCurly's 2018 TextureTrends report, released today. According to the 2018 TextureTrends Report, more than 80 percent of women with curly and coily hair say they purcha...
We all love the many benefits of natural plant oils, used as a deap treatment before shampooing, at home or at the beach, but is there an all-in-one solution for everyday intensive nutrition and modern styling?I am passionate about the benefits of natural extracts with tried-and-tested results, which I have used to develop my own brand (lemon for its acid ph, wheat germ for its fortifying action…). I wanted a product which combined caring and styling effects, and which could be used after shampo...
The platform combines the strengths of Singtel's extensive, intelligent network capabilities with Microsoft's cloud expertise to create a unique solution that sets new standards for IoT applications," said Mr Diomedes Kastanis, Head of IoT at Singtel. "Our solution provides speed, transparency and flexibility, paving the way for a more pervasive use of IoT to accelerate enterprises' digitalisation efforts. AI-powered intelligent connectivity over Microsoft Azure will be the fastest connected pat...
There are many examples of disruption in consumer space – Uber disrupting the cab industry, Airbnb disrupting the hospitality industry and so on; but have you wondered who is disrupting support and operations? AISERA helps make businesses and customers successful by offering consumer-like user experience for support and operations. We have built the world’s first AI-driven IT / HR / Cloud / Customer Support and Operations solution.
Codete accelerates their clients growth through technological expertise and experience. Codite team works with organizations to meet the challenges that digitalization presents. Their clients include digital start-ups as well as established enterprises in the IT industry. To stay competitive in a highly innovative IT industry, strong R&D departments and bold spin-off initiatives is a must. Codete Data Science and Software Architects teams help corporate clients to stay up to date with the mod...
At CloudEXPO Silicon Valley, June 24-26, 2019, Digital Transformation (DX) is a major focus with expanded DevOpsSUMMIT and FinTechEXPO programs within the DXWorldEXPO agenda. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term. A total of 88% of Fortune 500 companies from a generation ago are now out of business. Only 12% still survive. Similar percentages are found throug...
Druva is the global leader in Cloud Data Protection and Management, delivering the industry's first data management-as-a-service solution that aggregates data from endpoints, servers and cloud applications and leverages the public cloud to offer a single pane of glass to enable data protection, governance and intelligence-dramatically increasing the availability and visibility of business critical information, while reducing the risk, cost and complexity of managing and protecting it. Druva's...
BMC has unmatched experience in IT management, supporting 92 of the Forbes Global 100, and earning recognition as an ITSM Gartner Magic Quadrant Leader for five years running. Our solutions offer speed, agility, and efficiency to tackle business challenges in the areas of service management, automation, operations, and the mainframe.