YOUR FEEDBACK
Werner Keil wrote: Java 6 update 10. If I'd be running Apple, I'd probably really drop dead...

SYS-CON.TV
TOP MICROSOFT .NET LINKS


Effective Database Change Management
Versioning, developing concurrently, deploying and upgrading

Have you ever been on a project where software development worked beautifully but developing and maintaining the database always caused unexpected problems and bugs? Do your changes constantly get overwritten by other developers, or is only one person at a time allowed to make changes? Do you find, after two or three major releases, that it's impossible to create upgrade scripts for existing production databases? After experiencing these frustrations and more, I decided to address them.

In this article, I will take a close look at the problems that many software projects face with database development, analyze the cause, and recommend best practices to work around them. The good news is that with the tools available today, you will be able to give a considerable boost to your database development process (even though there is still room for improvement). The bad news is that the techniques described in here are highly addictive!

Source Control for Databases
Source control for databases (or version control, or revision control) is the management of multiple revisions of a database. Each revision represents a number of changes that were made by developers, which include changes to the database schema, the contained data, programmability (stored procedures, user-defined functions, triggers), and permissions. Just as with source code, it is of utmost importance to keep track of these changes as the database evolves.

Some common but suboptimal approaches for database version management, often applied because of a lack of better alternatives, include:

  • Checking a large binary database backup into source control
  • Checking in one huge build script
  • Not checking in the database at all
The most prominent problem encountered in all of these approaches is that traceability of changes gets completely lost. Currently I'm not aware of any diff tool (a tool that compares two files for differences) that can compare two compressed, binary backups to determine which tables or stored procedures were changed between two revisions. This lack of traceability will soon lead to tedious and error-prone manual work when it comes to creating scripts for upgrading production databases.

When we compare the above approaches to the way we version regular source code (e.g., for a .NET assembly), we notice a big difference. The source code for a .NET assembly is broken down into many small files, each containing only one class at a time (by using partial classes, even one class could be spread across multiple files). Therefore, when developers create new features or fix bugs, only a small number of files will actually change. Subsequently, source control will pick up only these few changed files and check them in. A revision log combined with a diff tool will allow anybody to pinpoint changes in code within seconds.

We can accomplish the same for database development by increasing the granularity in source control. Rather than checking in one big chunk of data (e.g., a one-file build script), we need to break down the repository representation of the database into many small pieces. Clearly we will need tool support for this. Ideally, such a tool supports round-trip engineering, which means we can build a database right from a local checkout folder of the repository (working copy), make changes to the database using tools such as SQL Management Studio, and script out the database to the local checkout folder again in order to check in only a few modified pieces.

Figure 1 shows the file-based representation of a database as it was created by the DbGhost tool from Innovartis. Each entity in a database (such as a table, stored procedure, or login, etc.) is scripted to an individual file. Furthermore, in this screenshot you can see the overlay icons from TortoiseSVN (the UI for source control tool Subversion). At a glance, you can detect which parts of a database were changed (the red exclamation mark) or which parts are new (the blue plus).

Such a fine-grained, file-based approach provides a number of great benefits:

  • Results in a very detailed trail of changes in source control
  • Establishes a direct correlation between source code changes and database when you check in modifications from both parts at the same time
  • Enables incremental and fast updates of working copies/local checkout folders (no need to update a binary BLOB of hundreds of megabytes with every database change)
  • Allows for all the standard source control operations such as branching, merging, or tagging
Furthermore, proper database source control is the foundation of any of the advanced techniques that I discuss in the remainder of this article.

As a side note I'd like to mention that many tools provide their own proprietary storage type and format, which is not necessarily text-based like the one shown above. Rather, they use a proprietary format, such as snapshot files or a SQL database as source control repository. While this approach might be better, e.g., with regards to performance, I prefer a purely file-based approach because it tightly integrates with the same source control system used for the related software project.

Concurrent Development
I have seen many software projects rely on databases where database development was a true bottleneck, unpredictably delaying software development. Looking closer into these projects, it turns out that in these cases only one developer would be allowed to work on a database because concurrent development was not be feasible. This directly relates to the suboptimal versioning practices that I outlined in the beginning of this article because none of these projects was able to cope with concurrent changes to a database by multiple developers.

What would concurrent database development look like when using a fine-grained database representation? First of all, there are two major approaches, each with its own pros and cons:

  • Shared database: All developers work on the same development database and make their changes.
  • Sandbox approach: Each developer works on a local, isolated copy of the database, created directly from the repository.
In my opinion, the latter approach clearly wins because of the complete isolation of development environments. Once a developer completes a work package, he will script out the database, run tests, and check in his changes.

The interesting part is to see what happens in the case of merge conflicts. First of all, the chances that two developers will change the same database object are already much lower when using a fine-grained representation. Second, a good source control tool will be able to merge most of the changes automatically (e.g., both developers modified a different column). But just as when working with regular source code, an automatic merge can (and will) fail periodically, producing a merge conflict that needs to be resolved manually. Figure 2 shows how the process to resolve this merge conflict works.

For the most part, such an approach will take the guesswork out of merging conflicting changes in database development and enable concurrent development by multiple developers, effectively eliminating the typical bottleneck encountered in linear development.

UPGRADING PRODUCTION SYSTEMS
The last problem I will write about in this article is related to upgrading databases in production systems. After deploying a number of applications of, let's say, version 1.0, your customers likely start busily collecting gigabytes of valuable business data. Chances are, when version 2.0 comes out, the underlying database schema will have gone through quite a number of changes in the following areas:

  1. Programmability: Stored procedures, user-defined functions and triggers
  2. Database schema: Tables, columns, constraints, primary and foreign keys
  3. Static data: Lookup tables, categorizations, dictionaries, etc.
In the first category, the changes are fairly easy to deal with by simply dropping and re-creating these programmability objects. The real problem arises with changes of type 2 and 3.

Some changes to the database schema are simply additions, such as new columns or tables. However, other changes are the result of complex refactorings, data normalizations or denormalizations. The key to any database upgrade strategy is to transform your customers' live production data in such a way that no data gets lost and that the transformed data adheres to all constraints and is coherent with what your application expects.

In general, there are two strategies to perform such an upgrade:

  • Create an empty database of v2.0 and move all v1.0 data over while performing the necessary transformations.
  • Perform an on-the-fly upgrade by executing a sequence of small schema changes and data transformation steps to bring the v1.0 production database to v2.0.
While the first approach can be somewhat easier, the drawbacks are obviously the need for twice the disk capacity and possibly lengthy upgrades, as gigabytes of data need to be moved from one database to the other. So let's see how we can establish a predictable and reliable process to develop on-the-fly database upgrade scripts.

Again, the key to success is tool support, this time using a database synchronization tool. Such a tool takes a source and a target database, compares all database objects between the two (schema, data, programmability, and security), and then upgrades the target database by creating, modifying and dropping objects in the target database. The result of this synchronization process is two exact database copies. During the synchronization, the DB sync tool will record a SQL script with all executed commands, which will allow replaying the upgrade process later.


About Christoph Wienands
Christoph Wienands is software engineer at Siemens Corporate Research, NJ. He received his Diplom-Informatiker (FH) at the University of Applied Sciences in Furtwangen, Germany. His current research activities include software factories, model-driven development, and domain-specific languages. Due to his research activities he is frequent speaker at conferences such as UML World and SD West.

MICROSOFT .NET LATEST STORIES
We are seeing more being written about Cloud computing and cloud platforms today, and there is strong validation that the future of computing will include significant innovation and value in web/cloud platforms. Microsoft’s Cloud strategy is materializing, and as part of our overall ...
Nth Penguin has released WW.DataServices to the public and is available for immediate download at: www.nthpenguin.com. WW.DataServices, the first system of the WebWidgetry engine, removes all the work from accessing your data. You simply point it to a database location, push a button,...
Gizmox announced the release of a fully functional beta version of its Visual WebGui (VWG) with support for Microsoft Silverlight. For the first time, VWG enables Silverlight for enterprise applications by providing a RAD like Windows Forms development experience with drag & drop desig...
Google will come out from behind the Firefox browser that it’s been pumping money into – and profiting royally from – and take direct aim at Microsoft with a browser of its very own. The widgetry is called Google Chrome and Google Chrome, like all of Google’s non-search widgetr...
Many of today (and tomorrow’s) development projects lend themselves nicely to RIA application patterns. Silverlight offers a compelling RIA development experience that works on Linux, the Mac and windows as well as all major browsers. With HD video, vector based graphics and a rich s...
SUBSCRIBE TO THE WORLD'S MOST POWERFUL NEWSLETTERS
SUBSCRIBE TO OUR RSS FEEDS & GET YOUR SYS-CON NEWS LIVE!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021


SYS-CON FEATURED WHITEPAPERS

ADS BY GOOGLE
BREAKING NEWS FROM THE WIRES
DataCore Software, the leading provider of storage virtualization software, joins Microsoft's "Get V...