Welcome!

.NET Authors: Liz McMillan, Yakov Werde, Matthew Pollicove , Kevin Benedict

Related Topics: .NET

.NET: Article

Indexed LINQ

Optimizing the performance of LINQ queries using in-memory indexes

Build the Index
The first thing we need is a structure to hold our indexes. An index can be defined as a sort of "dictionary of lists." The inner list holds everything that relates to a certain indexed value. The outer dictionary holds all the different indexed values and associates them with the appropriate inner list. For example, the inner list might hold everything that starts with "A." The outer list might hold 26 items, one for each letter (a list for "A," a list for "B," and so forth). When you look up an item, you would look up by its first letter, find the list to look in, and then search through the smaller list, rather than searching through everything. Our data structure for a particular index looks like this:

Dictionary<int, List<T>>

In our actual implementation, we'll look up by int rather than string for the index. We use int because int is what the GetHashCode() method returns from all objects in .NET. One of the keys to making this work is that all objects support GetHashCode() - which lets us, while not uniquely identify an object, effectively partition it into very small groups.

Of course, there may be more than one index in our collection. So in actuality, our implementation will use a Dictionary of indexes, which gives us the following structure in the end:

protected Dictionary<string, Dictionary<int, List<T>>> _indexes
= new Dictionary<string, Dictionary<int, List<T>>>();

Building the indexes on the construction of the collection is the next step. When we build a new IndexableCollection<T>, we need to reflect what we're collecting, find out which properties are going to be indexed (by looking for the presence of the IndexableAttribute), and create the index accordingly. The following routine, called from the constructor, accomplishes that goal:

   protected void BuildIndexes()
   {
     PropertyInfo[] allProps = typeof(T).GetProperties();
     foreach (PropertyInfo prop in allProps)
     {
       object[] attributes = prop.GetCustomAttributes(true);
       foreach (object attribute in attributes)
         if (attribute is IndexableAttribute)
           _indexes.Add(prop.Name, new Dictionary<int, List<T>>());
     }
   }

Intercept Collection Adds
Once indexes are built, you need a mechanism to add data to the indexes (and eventually remove it). In our implementation, we're going to redefine the Add method in our collection:

   public new void Add(T newItem)
   {
     foreach(string key in _indexes.Keys)
     {
       PropertyInfo theProp = typeof(T).GetProperty(key);
       if (theProp != null)
       {
         int hashCode = theProp.GetValue(newItem, null).GetHashCode();
         Dictionary<int, List<T>> index = _indexes[key];
         if (index.ContainsKey(hashCode))
           index[hashCode].Add(newItem);
         else
         {
           List<T> newList = new List<T>(1);
           newList.Add(newItem);
           index.Add(hashCode, newList);
         }
       }
     }
     base.Add(newItem);
   }

The mechanics of Add are to iterate through the known indexes, and using reflection, retrieve the hash code of the appropriate property value for each field on the object we're going to index. Once we have that value (hashCode above), we look in the index to see if we have an entry yet for that hash code. If we do, we simple add the item to the list of entries that have the given hash code. If not, we create a new entry in that index based on the hash code.

Once we're done, we call the Add routine in the base class so that we have the normal behavior of add implemented. We can now go ahead and add a couple of helper methods that would help anyone trying to implement an index:

    public bool PropertyHasIndex(string propName)
    {
       return _indexes.ContainsKey(propName);
    }

    public Dictionary<int, List<T>> GetIndexByProperty(string propName)
    {
       return _indexes[propName];
    }

We need to make these public since anything that might want to implement an index on this is going to need to be able to see if a property has an index and retrieve the index if a given index exists.


More Stories By Aaron Erickson

Aaron Erickson is a principal consultant for Magenic. He is a ruthless advocate for concentrating on creating the most business value in the least amount of time when his clients entrust him to deliver a technical solution. Aaron has been delivering solutions on the Microsoft platform for over 14 years, and currently leads open source development efforts related to LINQ, including Indexes of Objects (i4o) and LINQ to Expressions (MetaLinq).

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.