Welcome!

.NET Authors: Bruce Armstrong, Marek Miesiac, Jason Dolinger, Yeshim Deniz, Liz McMillan

Related Topics: .NET

.NET: Article

Indexed LINQ

Optimizing the performance of LINQ queries using in-memory indexes

Language Integrated Query (LINQ) is a very powerful new technology coming to us with Visual Studio 2008. There is a great deal of innovation going on in the LINQ space, including innovative projects like LINQ-to-Flickr and LINQ-to-Amazon (among many others), in addition to the great things Microsoft is providing in LINQ-to-SQL and LINQ-to-XML.

This article dives deeper into the more mundane details about how to optimally query data you already have in memory via LINQ-to-Objects. LINQ lets you do queries on any object (typically collections or arrays) that support IEnumerable<T>. For example, if you have an online bookstore, you might decide to cache your products in memory on the Web server to improve performance. In such a case, you might have a List<Books> bookCache that you query this way when someone goes to look up a book:

var searchResult = from b in bookCache
where b.ISBN="978-1590596326"
select b;

Now, imagine your cache of books for your Web site contains, say, a million books, and you need to make this scale in a serious way. In such a case, you might be interested in optimizing how LINQ works.

How LINQ Queries in Memory Objects
When you use LINQ syntax to do a query, what really happens is that the syntax is converted to a series of method calls, such as where, join, and others related to the query "operators" that LINQ supports. Of interest to us is, for purposes of a query, how Microsoft implements "Where" in LINQ to Objects. Given that Where has to work for any IEnumerable<T>, there's very little that the default implementation can do to optimize the query. As a result, the default search over IEnumerable<T> in LINQ is a sequential search. In other words, if you are searching for the book at the bottom of a million-book pile, your search may not be terribly fast.

Indexing to the Rescue
Nearly any searching algorithm is better than sequential search. Of course, to use any of those algorithms, we have to know a little bit more about what we are searching. If your items are in order, you might use a binary search, which would help quite a bit. However, for the best possible performance, you want your search to use an index - just like every relational database worth mentioning does when it searches through a large collection of rows.

Implementation of a Generic Index
As it turns out, we need three elements to be able to do this in a generic manner:
• An ability to easily specify which fields in a class we are going to have an indexed collection of that we are going to index.
• A mechanism that allows for building and maintaining these indexes automatically.
• Modification of the implementation of Where so that it takes advantage of the index, rather than doing a sequential search.

The [Indexable()] Attribute
If we're going to have an easy-to-use indexing mechanism, we need a simple way to answer the "what" question - as in what is it we are going to build indexes on. Building indexes on every property in a class is almost certainly going to be too much - causing performance lags when you build the collection, not to mention wasting memory if we build indexes on fields that aren't searched on. To be more specific about what we are going to index, we need to add some meta-information to our class. That's exactly what Attributes are for.

Thankfully, for a simple indexing implementation, we need nothing more than to mark certain properties as indexable. The implementation of the attribute is very simple:

public class IndexableAttribute : Attribute { }

Once in place, we can do the following with any class we intend to use in an indexed collection:

   class Book
   {
     private string _isbn;
     private string _authorFirstName;
     private string _authorLastName;
     private int _id;

     [Indexable()]
     public string ISBN { get { return _isbn; } set { _isbn = value; } }

     [Indexable()]
     public int Id { get { return _id; } set { _id = value; } }

     public string AuthorFirstName {
         get { return _authorFirstName; }
         set { _authorFirstName = value; }
     }

     [Indexable()]
     public string AuthorLastName {
         get { return _authorLastName; }
         set { _authorLastName = value; }
     }
   }

Implementation of IndexableCollection<T>
The next step is to have a collection that, while based on the collection functionality that .NET gives us, lets us intercept certain operations on it that are of interest to maintaining an index. Namely, we want, at the very least, to know when something is added or removed from the collection.

To start, we'll derive from Collection<T>, from System.Collections.ObjectModel, as follows:

    public class IndexableCollection<T> : Collection<T>
    {

    }


About Aaron Erickson

Aaron Erickson is a principal consultant for Magenic. He is a ruthless advocate for concentrating on creating the most business value in the least amount of time when his clients entrust him to deliver a technical solution. Aaron has been delivering solutions on the Microsoft platform for over 14 years, and currently leads open source development efforts related to LINQ, including Indexes of Objects (i4o) and LINQ to Expressions (MetaLinq).

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.