Welcome!

.NET Authors: Brad Abrams, Alin Irimie, Colin Walker, Maureen O'Gara, Reuven Cohen

Related Topics: .NET

.NET: Article

Iterators in C#: Nothing Beats the Foreach Loop

Neat Uses Pioneered in Python Can Easily be Applied in C#

One Fish,
Two Fish,
Red Fish,
Blue Fish!
- Dr. Seuss, 1960

My favorite construct in the C# programming language has to be the foreach loop. I get a little jolt of pleasure every time I use it. I can enumerate rows in SQL Queries, nodes in XML documents, ArrayLists, and Hashtables. You name it, I can enumerate it all without mucking around with a single loop conditional.

foreach is super convenient to use, but I don't use it as much as I should in my own programs because writing the code to support foreach is tedious and dull. Personally I'd much rather be off doing some kind of asynchronous bind to a Web service thing than grinding out yet another iterator class. That's a shame because iterators can be used for a lot more than just enumerating data structures.

So I was happy to find that in C# 2.0 the dreary work is largely a relic of the past. The introduction of iterator functions reduces the backing infrastructure for foreach to a single function. This allows me to use foreach for all kinds of new purposes while still giving me time to check out the latest installment of Grand Theft Auto.

Supporting foreach Today
As shown in Figure 1, any time you use foreach the compiler expects the expression you give it to result in an implementation of IEnumerable (or IEnumerable<T> in C# 2.0). The compiler emits code that calls GetEnumerator on the resulting object and a loop that keeps calling IEnumerator.Move Next until MoveNext returns false. Each time through the loop the Current property on the iterator is evaluated and the result is used in the body of the foreach loop.

This means that if you want to give your clients something to enumerate, you must implement an iterator class devoted to feeding items to the client. This class is often a private nested class on some larger entity. The only way for the outside world to get at it is to call GetEnumerator on the container. When they do, you create an instance of your iterator class and pass it some sort of reference to the data it will be enumerating. The iterator class keeps a reference to the "current" item to hand out in the Current property and updates its state every time someone calls MoveNext. Listing 1 shows a simple iterator that can be used to iterate simple integer arrays along with a class that uses it to implement IEnumerable.

This code doesn't look too bad - it's less than 40 lines - but an array is a pretty simple data structure. Other structures (particularly recursive ones such as trees) are difficult to iterate with IEnumerable, so iterators can grow to many hundreds of lines or more. There are various other niggling problems that tend to complicate iterating over data structures. What if the client decides to open several iterators simultaneously? What if it tries to modify the collection in between calls to MoveNext? What if you want to support multiple threads iterating simultaneously? These problems each add code and complexity.

The biggest complaint I have about IEnumerable-style iteration, however, is its lack of flexibility. Very often I would like the client to be able to specify parameters to control how iteration should proceed. Maybe the client would prefer to iterate backwards. Maybe it only wants to see every other item or only the last 10. If the client is iterating tree-like data, it might prefer pre-order, in order, or post-order traversal, depending on what it's doing. If I'm generating the items as I go, there are any number of ways I might want the client to control iteration. This code is largely boilerplate, but it multiplies rapidly because you don't get to specify any parameters to GetEnumerator. This means you have to write an IEnumerable/IEnumerator pair for every option you want to support. This is great for clients, but it's painful for me, so today I'm more likely to just ditch the paradigm entirely and do something else.

Generators
An interesting contrast to this pain can be found in the Icon and Python programming languages. These languages approach iteration with an elegant construct known as "generator functions." Generator functions are written to look like normal functions, but they behave just a bit differently. Generator functions can stop at will and "yield" a value to their caller. When they do this, their execution state is preserved somewhere and the caller gets to run. The next time the function is invoked, the generator picks up where it left off (along with its local variables) and continues running as if nothing had happened. This makes it easy to write a function that yields a series of values to the client. A simple Python generator function looks like the following:


def hello():
	yield  'hello'
	yield  'world'

This function returns an object to the client called a "generator" that can be iterated in a fashion familiar to anyone who has ever used IEnumerator (see Listing 2).

There's a definite family resemblance to IEnumerable/ IEnumerator here, but the beauty is that the generator class itself is generated by the Python compiler based on the function definition.

We can write code in C# today that behaves a lot like hello by using switch statements and gotos. Listing 3 shows a crude translation of the Python program above into C#. Ugh. It works but C# is clearly the ugly stepsister in this family.

Generator Functions in C#
C# 2.0 borrows the concept of generator functions from Python, including a syntax that is pretty much identical. Implementing generator functions in C# 2.0 requires a fairly simple set of steps:

  1. You write a function and declare the return type to be IEnumerable or IEnumerable<T>.
  2. In your function body, you use "yield return" to yield a value to the caller (while preserving your state) or "yield break" to abandon iteration.
  3. The compiler translates your code into a class resembling the code in Listing 3.
  4. Clients consume your function using foreach (or by manually using the IEnumerator returned from the function). When the function body exits (or encounters a yield break statement), MoveNext will return false and the loop will exit.
Using the new syntax in C# 2.0, we can condense the code from Listing 3 into something more closely resembling the Python example (see Listing 4; for space reasons listings 4-8 are online at www.sys-con.com/ dotnet/source.cfm).

As you can see, function SayHello returns IEnumerable<string> instead of just string. The presence of this return type, plus the fact that the method contains the yield return expression, tells the compiler not to emit an ordinary method body. Instead the compiler will create a generator class that will yield the values "hello" and "world" in the Current property after successive calls to MoveNext. If we compile the class and crack it open with Reflector.NET we see that SayHello just creates an instance of the generator class and returns it:


private static IEnumerable<string> SayHello()
{
      return new hello.<GetStrings>d__0(0);
}

The compiler assigns a peculiar name to the generated class: <GetStrings>d__0. However, remember that no one should ever create an instance of this class - they should call SayHello.

We'd expect the generated class to implement the IEnumerable<string> interface. Looking at the class definition with Reflector.NET shows us that it does (see Listing 5).

A look at MoveNext (Listing 6) reveals code that looks suspiciously like the code we wrote in Listing 3.

This is a lot easier than writing all this code by hand, and there's less opportunity for bugs to creep in. The client doesn't see things any differently than it has in the past. It just merrily consumes the items generated by SayHello with a normal foreach loop like the following:


foreach(string s in Greeter.SayHello())
   Console.WriteLine(s);

Taking it to the Next Level
SayHello may be a simple program, but it subtly illustrates the real promise of generator functions. Rather than traverse a data structure, it just generates items and returns them sequentially. This is the real power of iterator functions: actually generating sequences of data. All kinds of problems become simpler with iterator functions because you get to calculate in whatever manner is most convenient for you without the consumer having to get involved. This makes lots of programs easier to write, especially if the items are generated recursively.

For a fun example, let's look at the classic "Tower of Hanoi" puzzle (where you move the disks from one of the three pegs to another one without placing a larger disk on a smaller one). This game can be easily solved with a simple recursive algorithm (see Listing 7).

It would be nice to allow the user to say something like the following:


foreach(Move m in TowerOfHanoi.Solution(10)) ...

However, the recursive nature of the algorithm would make it difficult to implement in C# 1.0. In C# 2.0 it's no problem - we just build a generator that yields a sequence of successive moves to solve the game. Listing 8 shows the technique in action. Notice how the function recursively calls itself to generate partial solutions to the game. The client is insulated from this detail and just iterates over the results with a simple foreach loop.

Conclusion
Iterator functions are a pretty new idea in the mainstream software world, but they've been around for awhile in academic languages like Icon and Python. A lot of really interesting uses have been pioneered in Python, and it's pretty easy to apply these ideas in C#. Poking around Google I turned up everything from Abstract State Machines to lightweight Thread Scheduling. I have a feeling a lot of techniques are still waiting to be discovered, so get out there and start playing with them. I have more samples and fun things at my Web site (www.neocranium.com). I hope to see you there.

More Stories By Jason Whittington

Jason Whittington is a consultant and researcher with an irrational fascination with virtual execution environments. When he's not researching or consulting he can often be found delivering courses for DevelopMentor. His Web site can be found at http://staff.develop.com/jasonw.

Comments (6) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
Darryl 07/25/08 01:47:48 AM EDT

I'm with Dave, screw your crappy pop-ups.

dave 06/19/08 05:48:39 PM EDT

screw your crappy pop ups that wont mimimize

Connelly Barnes 11/11/04 07:03:01 PM EST

Calling Python "academic" is like calling C# "academic." It doesn't make any sense. There are lots of tradeoffs in the Python vs C# equation, but both languages are "pragmatic," not "academic."

DougHolton 11/10/04 10:17:16 AM EST

See boo ( http://boo.codehaus.org/ ), a python-like language for .NET and Mono. It has iterators and generators exactly like python, as well as anonymous methods/closures like ruby. And it is static typed so it runs as fast as C#.

I agree though, Python isn't an academic language. I think you're thinking of Lisp :)

montecristo 11/10/04 08:43:42 AM EST

Python is an "academic" language? You need to take a look at python.org, and get a grip!

gab 11/10/04 06:11:25 AM EST

you have this ugly feeling that python is an academic language. Actually it really is not, just think of Zope.

I hope the next article will consider anonymous delegates as "stolen" from ruby :) (wich, btw used them as iterators/generators for more than ten years)