Welcome!

.NET Authors: Bruce Armstrong, Marek Miesiac, Jason Dolinger, Yeshim Deniz, Liz McMillan

Related Topics: .NET

.NET: Article

Best Practices for Advanced Data Persistence in Enterprise .NET

What .NET can learn from Java

The .NET framework has become the platform of choice for many enterprise applications. Not surprisingly, many .NET concepts parallel those in CORBA and J2EE. The benefit of hindsight gives .NET architects and developers the opportunity to imitate the success while avoiding the pitfalls inherent in complex distributed development. This is especially true for data persistence, which often gets less attention than it deserves.

Issues with data persistence are often not discovered until the later stages of development and testing, when the options for addressing them are limited and expensive. A well-designed data persistence layer can drastically improve the performance of the entire application, if it's addressed throughout the development process.

This article outlines best practices for developing advanced data persistence in a .NET application. It discusses the development and runtime requirements of an effective data layer and then shows the benefits of an object-oriented approach and intelligent persistent data caching, comparing these with the data persistence facilities available in the .NET framework.

Requirements for a Efficient Data Persistence Layer
A data persistence solution must address both development and runtime challenges. It must be able to support the complex data requirements of an enterprise application - without reducing developer productivity or compromising performance.

Enterprise applications have similar requirements for data persistence regardless of the programming languages they were developed with and the platforms they're deployed on. One lesson learned early on in the Java community was that the representation of relational data as objects yields huge returns. The Java world continues to focus on this object-oriented approach with EJB, JDO and more recently Hibernate. Successful .NET architects can benefit from Java's lessons.

Nobody wants to spend a lot of time writing, testing and debugging data persistence code. For the most part, this is plumbing code that "should just work" without a lot of effort. Developers want to focus their efforts on the custom business logic of their application. However, because data persistence is an important element of the overall application, it can't be ignored. Ideally a solid data persistence layer requires minimal developer intervention. Choosing good tools that assist in object-relational mapping and code generation along with the careful design of an efficient data persistence model helps meet both these requirements.

However, the ability to design and build an application quickly doesn't guarantee success at deployment. Performance and scalability are possibly the two most popular terms in enterprise development. A poorly designed data access layer can significantly impact application performance, and ultimately availability and scalability. While some approach this problem by throwing more hardware at it, caching is the most common and effective solution for performance problems.

While Web page caching is a common solution for improving response times, it doesn't address data bottlenecks for enterprise applications. Applications with complex data requirements quickly become unmanageable when data access is coded directly in Web components. Instead, business logic and data access typically reside in the server tier between the database and Web components as shown in Figure 1. Scalability and performance problems occur when the volume of requests from the server tier overwhelm the limited database resources. Server tier caching of persistent data increases the application's capability for handling many concurrent requests.

So, important success factors for a data persistence layer include:

  • Developer productivity and code maintainability
  • Built-in runtime performance and scalability
  • High-availability features for deployment
The remainder of this article describes the approaches that have been proven in successful enterprise Java applications. The availability of new third-party products for .NET development that embrace some or all of these approaches indicates that they are already gaining acceptance in the .NET community.

Advantages of an Object-Oriented Approach for Developer Productivity
For the most part, enterprise applications work under complex data requirements. For example, a banking application may perform thousands of transactions an hour. Each transaction may consist of different bits of data such as customers, accounts and regulatory information. This data is most likely stored in multiple tables in multiple different database systems. By programming to a data abstraction layer, developers are isolated from the underlying details of the data persistence layer. Instead, they are able to treat the persistent objects the same as they would any other object. Data abstraction lets developers easily manage complex data requirements. By hiding the details of the underlying data sources (through design and/or runtime activities), the data access APIs become much simpler to work with.

When Microsoft first introduced the .NET Framework, it also introduced a data layer API called ADO.NET. Derived from ADO, an earlier form of data access, ADO.NET's goal is to work (through third-party adapters) with various data sources throughout the Internet. Its status as a de facto standard, coupled with its tight integration to Visual Studio.NET, quickly made ADO.NET useful to developers who were working with relational data.

Unfortunately, ADO.NET is not an object-oriented approach. Although it facilitates access to various data sources, ADO.NET remains a relatively low-level API whose code can be generated either through Visual Studio or by third-party tools. Regardless of what tool is used, complex systems usually need some additional data persistence code, which must be written manually. In addition, ADO.NET can't directly satisfy the requirement of transparent object persistence. This is because the API is geared towards the processing of tabular result sets (the DataSet object) built from SQL queries, rather than towards the business objects used by the application.

In contrast, the O-R (Object-Relational) mapping approach takes all that's good from object-oriented programming and applies it to the data layer. Concepts such as encapsulation, abstraction and reusability make the data layer easier to manipulate and use, meaning faster, more efficient development.

The Data Persistence Object Model
At the heart of the object-oriented approach lies the data persistence object model. This model is similar in look-and-feel to any other object model in your application (such as business objects). The only difference is that unlike business objects, the objects in the data persistence object model actually represent data stored in persistent storage devices (such as a relational database). The underlying mapping details are dealt with as part of a design and/or runtime activity, meaning the application code doesn't need to contain any database-specific code.

For example, a relational table A_Table can be modeled and mapped to the C# class A.cs. Column names become property names, and the CRUD operations are replaced with methods.

Figure 2 shows two tables (Department and Employee) that have a one-to-many relationship. The foreign key (DeptID) is on the Employee table.

Suppose we want to get all Employees associated with the "sales" department. In ADO.NET the code may look like the following:

Now let's take a look at how the same operation could look using an O-R mapping system.

Notice how the O-R mapped solution uses no SQL or database-specific code. In fact, the relational data is treated the same as any ordinary .NET object. A closer look at Sample 2 shows how relationships are represented and used in the O-R approach (dept.employees). Obtaining information in related tables becomes as simple as accessing an object attribute.

Advanced Features of an Object-Oriented Approach
The previous sample showed how the O-R mapping approach improves the usability and maintainability of the data persistence code for simple tables. But what about more complex data models, those often found in enterprise applications? For example, what if Employee was simply a base type in a model that allowed for various types of Employees (see Figure 3).

Inheritance is a powerful feature of object-oriented programming. Therefore applying this capability to the object model further enhances the usability of the data persistence code. Now the developer can work with and understand the data persistence objects in a more natural way.

Products that offer this level of O-R mapping functionality usually provide various tools to define the mapping and generate code for persistent objects. Choosing a solution with a complete toolset will greatly improve productivity. For example, some products offer a tool that will read and generate object models directly from existing database schemas. This means that with very little time and effort, and object model can be created, code can be generated and the developer can enjoy the benefits the object-oriented approach brings to the data persistence layer.

Integrating Caching for Performance and Scalability
As mentioned earlier, server-tier caching is the most common and effective solution for performance problems. Caching works by moving data closer to the business processes and reducing the number of database calls.

ADO.NET can provide a form of lightweight caching by using disconnected datasets. In using disconnected datasets, DataSet information is serialized to XML files, which can then be modified "offline" in local processes. Some folks in the Microsoft world feel that disconnected datasets are the .NET way to implement data caching. However, caching wasn't the initial intent behind disconnected datasets; designers were simply looking for a way to reduce the number of open database connections. By tearing down the connection after data has been transferred to or from the database, disconnected datasets improve performance on both the database and application side (clients are no longer blocked).

Various other types of caching solutions are available. However, it's important to realize that caching strategies aren't equal. Depending on the complexity of your application, certain caching solutions may result in significant differences in performance and scalability. Some also require much more development effort to integrate and manage the cache.

Questions you need to ask when looking for a caching solution include:

  • What kind of caching is supported (static or dynamic)? How much of my data can be cached?
  • Does it leverage the object model (caching relationships)?
  • How useable is this solution? How much developer intervention is required? What kinds of management capabilities are available?
  • Will the cached data remain consistent and accurate if it's distributed across multiple applications?
Static Caching
Let's start the discussion with the most basic type of caching - static caching. A static cache (sometimes also called a read-only cache) contains data that never changes, usually reference data that is potentially used a number of times. Therefore instead of retrieving this data from the database every time it's needed, an application can simply query the cache, thus reducing the overhead of a database call.

Although a static cache can be a huge improvement over querying the database, it only solves performance problems for read-only data - which may be only a portion of the entire data model. For most enterprise systems, many elements in the data model are very dynamic (change frequently) and also need to be accessed at near real-time speeds. The old 80-20 rule applies: 80% of an application's transactions will be performed on 20% of the data. Caching that 20% can significantly improve performance, even for transactional data.

Dynamic Caching
It takes a dynamic or "intelligent" cache to fulfill the 80-20 requirement. A dynamic cache allows for data to be both read and updated in the cache. Although updates can ultimately be propagated to the database, the ability to update objects directly in the cache can increase performance as it immediately impacts users reading the updated data.

When looking at caching systems, make sure to understand the locking behavior for updates. One approach is to lock the data that's being updated (pessimistic locking). For example, if an Employee instance is getting updated, then all other threads sharing this cache will be blocked from reading (and possibly updating) this Employee until the update is complete. Although this method works, pessimistic locking tends to reduce performance for transactions on highly requested objects and negate the cache benefits.

More sophisticated caching solutions offer optimistic locking. By assigning version numbers to different revisions of the data, the cache will know if a thread is attempting to update stale data. By using a versioning mechanism, data in the cache never needs to be locked, thus increasing performance.

Relationship Caching
If a cache treats a data object as a black box, it can't understand and manage relationships between objects. An intelligent cache knows both the application object model and the database schema. It is aware, not just of object attribute values, but of the relationship between objects. By caching relationships, an intelligent cache can dramatically reduce the number of expensive join queries performed by the database, resulting in a corresponding improvement in performance. Enterprise applications, with their complex data models, can enjoy a surprisingly dramatic performance boost.

Usability
Another consideration is how much programming is required to implement the caching. If object-relational mapping classes are tightly integrated with the runtime cache, objects can be automatically instantiated in the cache whenever they are accessed by the application. Developers don't have to manually track or code which objects have to be cached, when they should be cached or when they should be cleared from the cache. Simple configuration of caching policies can optimize such intelligent caches. Other alternatives require significant manual coding and, in the worst cases, application re-architecture to insert a cache.

Distributed Caching To Increase Scalability and High Availability
To address the scalability and high-availability concerns of enterprise applications, many architects look at using distributed caches. Distributed caches are simply clusters of dynamic caches. When one cache in the cluster updates some data, that update will be sent to all participating caches in the cluster. Upon receiving the update, the participating caches can then apply the update directly to their own cache (see Figure 4).

Just as with caching, clustering technologies vary widely. Most simply invalidate objects when a change occurs to the underlying data. This means that the next request for any of the changed objects requires an expensive database call. True cache synchronization provides the updated information to all of the distributed caches, delivering the best performance.

Not only can a distributed caching system provide performance and scalability requirements of an enterprise application, it provides a safe foundation for advanced enterprise deployments such as those providing failover and load balancing. When all application caches are in sync (as in Figure 3) then if one server goes down, the other servers would easily be able to satisfy the additional failover requests with little to no performance impact on the end user. Ideally, cache synchronization complements server clustering mechanisms to provide better uptime in the case of database or network outages, as well as allowing for server-side failover.

Conclusion
Although data persistence is a critical component of enterprise applications, its importance is often overlooked. Many times architects and developers settle for de facto data persistence solutions only to experience problems later in development or at deployment. The .NET community can learn from the successes Java has had with enterprise applications, and take advantage of third-party development products that implement these techniques.

This article identified developer productivity, performance, scalability and high availability as important requirements for the data persistence layer of enterprise .NET applications.

Productivity gains can be realized by adopting an object-oriented approach and choosing code generation tools with flexible features. Choosing a good caching and distribution technology can increase performance, scalability and availability with minimal code changes. The combination of O-R mapping, caching and distribution for advanced data persistence can play a significant role in reducing risks and contributing to the success of enterprise applications.

Resources
For a detailed discussion of O-R mapping, see the following Web sites.

  • For Description of agile O-R mapping strategies: www.agiledata.org/essays/mappingObjects.html
  • For a collection of O-R mapping resources: www.objectarchitects.de/ObjectArchitects/orpatterns/index.htm
  • For a discussion of "build" versus "buy": www.objectstore.com/publications/caching_resources/index.ssp

    For more on caching, see the following Web sites.

  • For a research paper that compares different approaches: www.research.ibm.com/AEM/pubs/ssgrr01.pdf
  • For a discussion of "build" versus "buy": www.objectstore.com/publications/caching_resources/intell_caching/index.ssp
  • For products that automate the development of Data Persistence:
    www.alachisoft.com
    www.deklarit.com
    www.objectstore.com/products/edgextend/index.ssp
  • About Greg Aloi

    Greg Aloi is a product consultant for ObjectStore. He has eight years of experience designing, developing, and teaching distributed systems with a focus on the distributed data layer.

    About Tobias Grasl

    Tobias Grasl has been developing software for the past seven years, with a
    specific focus on metadata-driven infrastructure libraries and
    applications. His current interests include MDA, agile methodologies, and
    the impact of both on a team's ability to deliver quality software, fast.

    Comments (1) View Comments

    Share your thoughts on this story.

    Add your comment
    You must be signed in to add a comment. Sign-in | Register

    In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


    Most Recent Comments
    John Atrim 06/05/08 07:25:07 PM EDT

    This might be worth checking out regarding Data Persistance, interesting reading:

    http://www.icoblog.com/2008/05/new-data-layer-for-bitsdujour.html