MSIL: A Wizard's Tale

There are places in this universe where mortals fear to tread: dark, mysterious places replete with shadowy cliffs, hidden treasures, and rumors of spiritual powers. These places are best left to wizards, hobbits, and elves, and to those few who have an unbridled passion for adventure, challenge, and conquest.

Right now you are either intrigued by where this is going or completely flabbergasted as to what possible relationship this introduction could have to Microsoft's .NET. Well, if you observed how some developers approach learning their craft, you would think that making sense of the internal aspects of this technology is best left to programming gurus and those few crusaders who dare to tread in the .NET netherworld.

The reality is that understanding the internals of .NET isn't all that difficult or mysterious, and having good insight into things like garbage collection, memory management, threading, and Microsoft's Intermediate Language (IL) will help you become a much better programmer and system architect.

In this article we are going to delve into the netherworld of .NET internals by taking on the Microsoft Intermediate Language, officially known as MSIL, but more commonly known as IL. IL is a language, nothing more and nothing less, and it really isn't all that mysterious or complex. So here goes, let's dive in and start dismantling the beast.

The Purpose of IL
One of the challenges when Microsoft was designing .NET was how to allow developers to write in any language they choose, yet target a single runtime environment. The answer to this dilemma? All source code compilers must translate your program's source code into a single intermediate language. The purpose of that "intermediary" language is to act as a sort of assembly language that is independent of any CPU architecture. Once your code is translated by the source code compiler to IL it is then rather simple to see how this standard representation of program semantics can be translated into native code and targeted to a specific CPU. The translation from IL to native code is the job of the JIT (just-in-time) compiler (see Figure 1).
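To make the "CPU-independent assembly language" idea concrete, here is a minimal sketch of the kind of IL a compiler might emit for a two-argument add method. The assembly name AddDemo and the method names Add and Main are made up for illustration, and the instructions used are explained later in the article. Notice that nothing in it names a register or any other feature of a particular processor.

```il
.assembly extern mscorlib {}
.assembly AddDemo {}            // hypothetical assembly name

// Roughly what a compiler might produce for a source-level
// method such as: static int Add(int a, int b) { return a + b; }
.method static int32 Add(int32 a, int32 b) cil managed
{
    ldarg.0                     // push the first argument
    ldarg.1                     // push the second argument
    add                         // pop both, push their sum
    ret                         // return the value on top of the stack
}

.method static void Main() cil managed
{
    .entrypoint
    ldc.i4.2                    // push the constant 2
    ldc.i4.3                    // push the constant 3
    call int32 Add(int32, int32)
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 5
    ret
}
```

The JIT compiler decides at run time how these stack operations map onto the registers and instructions of whatever CPU the code lands on.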

The Basics
Now that you understand where IL fits into the overall .NET world, you next need to understand some of the basic language concepts. IL is a fully functioning language; it has data types, flow control, object model instructions, and everything else needed to make a program. In order to help you get a good handle on IL, we are going to start by writing, compiling, and running a small program, and then we will dissect it to give you a better understanding of how it operates.

Being a traditional sort of guy, I'll start with that favorite and time-honored workhorse of programmers everywhere: "Hello World." Try entering the code from Listing 1 into any text editor. Visual Notepad works rather nicely. (Visual Notepad is an inside joke from my days at Microsoft. It always seemed easier to write code samples in Notepad when doing presentations.)
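In case Listing 1 is not at hand, the following is a minimal sketch consistent with the walkthrough later in this article. The exact text of the original listing is an assumption; only the assembly name "Hello", the method signature, and the instructions named in the discussion are taken from the article.

```il
// Declares the assembly (and its manifest) named Hello.
.assembly Hello {}

.method static void HelloWorld()
{
    // Marks this method as the program's entry point.
    .entrypoint

    // Push the string literal onto the stack.
    ldstr "Hello World"

    // Call Console.WriteLine(string) in the mscorlib assembly;
    // it pops the string and writes it to the console.
    call void [mscorlib]System.Console::WriteLine(string)

    // Return from the method.
    ret
}
```

If your version of the assembler complains about an undeclared reference to mscorlib, adding the line ".assembly extern mscorlib {}" at the top of the file is the usual fix.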

Once you have entered the code, save your file as HelloWorld.il and open a command window. The first step is to take the IL and turn it into an executable, more correctly known as a PE file. To do this, make sure your PATH includes the directory where your copy of ILASM.EXE (the IL Assembler) resides, then type in "ILASM HelloWorld.il" and hit Enter. The assembler should generate the results shown in Figure 2.

This is good. The IL Assembler completed without errors and you should now have a spanking new HelloWorld.exe file in your directory. Execute the file and you should see those timeless words "Hello World" sparkling across your screen. So there you have it, more than likely your first program written in 100% pure IL. Not all that mysterious, right?

Breaking It Down
So now let's start dissecting what we did line by line and see if we can make some sense of it all. The first line of the program declares the assembly we are building. Note that the word "assembly" is preceded by a "dot." In IL, a dot precedes what is known as a "directive." Basically, a directive is an instruction to the assembler to carry out some unit of work. In this case the assembler will create an assembly manifest called "Hello". This is a required line, and without it the program will assemble but will generate an I/O exception when you try to run it.

The next line, ".method static void HelloWorld()", is a directive for the assembler to create a method that is named HelloWorld. Since we declared it as "void", nothing will be returned. The beginning and end of the method body are marked by curly braces, the same as would be used in a C# program. So you must be wondering why we included the "static" keyword in the method directive for HelloWorld(). If you look at the next line in the program, you will see that it has an ".entrypoint" directive. This tells the assembler that this is the program's initial point of entry, much like the "Main" method in other languages. An IL program must have exactly one entry point, and the entry point method must be declared as static.

For now let's skip over the line that contains the "ldstr" instruction and concentrate on the line that makes a call to the .NET Framework. You might notice that this line is not preceded with a dot. In IL, lines that are not preceded by a dot are considered instructions. Instructions are basically the guts of your program, whereas directives are the infrastructure and piping. The "call" instruction invokes a method, including methods that live in other assemblies. In this case we call into the framework library that allows us to write to the console.

What about that line with the "ldstr" instruction? First, let's go back and reiterate one of the concepts from the beginning of this article: IL is an intermediary between your source code and the JIT; it must not make any assumptions about the underlying operating environment. For instance, it cannot make assumptions about the CPU architecture or instruction set employed by the servers where your code executes. In order to meet this requirement, Microsoft elected to use a simple, yet elegant, approach: they made IL stack-based. Instead of naming CPU registers, instructions push their operands onto an evaluation stack and pop their results from it. The process of copying something from memory onto the stack is called loading, and the process of popping a value off the stack and writing it to memory, such as a local variable, is known as storing.

The basic IL instruction set is divided between instructions that load and those that store. Instructions that load something onto the stack begin with the letters "ld" and those that store begin with the letters "st". So "ldstr" is an instruction to load a literal string onto the stack, in our case the string "Hello World".

A Little Magic
In the next example we are going to learn how to interact with the stack and with values returned by other programs. The goal is to take the strings "hello " and "world" and concatenate them using native IL. First we push each string literal onto the stack with "ldstr", and then we save it in a local variable using the "stloc" instruction. Since "stloc" begins with "st", we know that it is going to pop the value off the stack and write it to a variable. The code block for carrying out this little feat of magic is rather straightforward:

ldstr "hello "
stloc.0

ldstr "world"
stloc.1

All this code really says is to load a string literal onto the stack and then write it to the appropriate location. In our case we are storing the string "hello " in slot 0 and the string "world" in slot 1 (slots are numbered from zero). Once we have things stored in their locations, we can turn our attention to concatenating the strings in order to create our trusty "hello world" salutation. To accomplish this we need to load things onto the stack from memory, and this is done with the "ldloc" instruction, which also takes a slot number. The code to load these two strings is again fairly straightforward:

ldloc.0
ldloc.1

Now that we have our strings we make a call out to the framework and concatenate the two strings:

call string [mscorlib]System.String::Concat(string, string)

Notice that the Concat() function returns a string, so we need to store that result, load it, and then write it out to the console. You can see a pattern emerging: we basically store, load, and manipulate through the use of IL instructions. Listing 2 shows the full program that takes the two strings and concatenates them.
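For readers without Listing 2 in front of them, the store, load, and call steps described above might be assembled into a complete program along the following lines. This is a sketch, not the original listing: the assembly name HelloConcat and the local variable names are assumptions, and the ".locals" directive it relies on is explained a little further on.

```il
.assembly extern mscorlib {}
.assembly HelloConcat {}        // hypothetical assembly name

.method static void Main() cil managed
{
    .entrypoint

    // Three string locals occupying slots 0, 1, and 2.
    .locals init (string first, string second, string result)

    ldstr "hello "              // push the literal
    stloc.0                     // pop it into slot 0

    ldstr "world"
    stloc.1                     // pop it into slot 1

    ldloc.0                     // push slot 0
    ldloc.1                     // push slot 1
    call string [mscorlib]System.String::Concat(string, string)
    stloc.2                     // store the returned string in slot 2

    ldloc.2                     // load it back onto the stack
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```

The store-then-load of slot 2 is not strictly necessary, since the result of Concat() is already on the stack, but it mirrors the pattern the article describes.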

We have been concentrating on the use of strings, and our program has been rather "top-down" in nature, without any logic. As a sort of graduation exercise we are going to work through a program that introduces some math and branching logic. The program will basically take two inputs, "Total Sales for Today" and "Total Returns for Today". We will then subtract the returns from the sales to produce our net sales. Depending on whether or not net sales is greater than 10, we will display the appropriate message on the screen.

This program uses more than four variables, so we need to look more closely at how local variables are declared. The ".locals" directive is followed by the "init" keyword, which asks the runtime to zero-initialize the variables, and then by a parenthesized list giving the slot number, type, and name of each variable.

.locals init ([0] int32 iSales, [1] int32 iReturns,
[2] int32 iNet, [3] string a, [4] string b)

The reason we cannot rely solely on the previous syntax, stloc.[slot number], is that "stloc" can address only the first four slots (0 through 3) using dot notation. So the instruction "stloc.3" is valid, but "stloc.4" is not. Once we go above four variables, we need to use the short form of the instruction and follow it with the variable's label: stloc.s "variable label". If we had declared a variable labeled "greeting" we could address it as "stloc.s greeting". This also holds true for the "ldloc" instruction, whereby we would load the "greeting" variable onto the stack using the "ldloc.s greeting" syntax.
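As a small illustration of addressing a fifth variable by label, the sketch below declares five locals and stores to and loads from the last one with the ".s" forms. The assembly name SlotDemo is an assumption, and the fifth variable is named "greeting" here to match the example in the text above.

```il
.assembly extern mscorlib {}
.assembly SlotDemo {}           // hypothetical assembly name

.method static void Main() cil managed
{
    .entrypoint
    .locals init ([0] int32 iSales, [1] int32 iReturns,
                  [2] int32 iNet, [3] string a, [4] string greeting)

    ldstr "hello world"
    stloc.s greeting            // slot 4: beyond stloc.0 through stloc.3
    ldloc.s greeting            // load it back by label
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```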

The Final Exam
Now let's look at the complete program. I think you will find it easy to follow and will probably surprise yourself with how well you are starting to understand IL. Listing 3 is your final exam.

How did you do? The programming concepts should be pretty straightforward by now. We load some strings to facilitate the calls to our WriteLine() methods; we read in and store the input from the ReadLine() method calls; and then we use the .NET Framework to parse the strings into integers and store the results of the conversion in slots 0 and 1, respectively. We then load the values in slots 0 and 1 onto the stack, subtract them using the "sub" instruction, and store the result of the calculation in slot 2. Then we load the result back to the stack using ldloc.2, and we come across something new, the instruction "ldc.i4.s 10". This is just an instruction to load a constant of type integer with a value of 10 onto the stack; the ".s" suffix marks the short form, in which the constant is encoded in a single byte. The "ldc" instruction can support a 4-byte integer (i4), an 8-byte integer (i8), a 4-byte float (r4) or an 8-byte float (r8). Aside from "ldc" and "ldstr", IL supports the loading of arguments, local variables, fields, and elements.
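To see the four "ldc" variants side by side, here is a short sketch that pushes one constant of each type and prints it. The assembly name and the particular constant values are arbitrary choices for illustration.

```il
.assembly extern mscorlib {}
.assembly ConstDemo {}          // hypothetical assembly name

.method static void Main() cil managed
{
    .entrypoint

    ldc.i4.s 10                 // 4-byte integer, short (one-byte) encoding
    call void [mscorlib]System.Console::WriteLine(int32)

    ldc.i8 10000000000          // 8-byte integer
    call void [mscorlib]System.Console::WriteLine(int64)

    ldc.r4 1.5                  // 4-byte float
    call void [mscorlib]System.Console::WriteLine(float32)

    ldc.r8 2.5                  // 8-byte float
    call void [mscorlib]System.Console::WriteLine(float64)

    ret
}
```

Each call names the WriteLine() overload whose parameter type matches the constant just pushed, which is how IL keeps the stack contents and the method signature in agreement.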

Once we have the result of our calculation and the value we want to compare against (in this case 10), we can use the IL branching logic instructions to decide whether we had a good day or not. The "ble" instruction ("branch if less than or equal to") pops the two values, compares them and, based on the result of the comparison, either falls through to the next instruction or jumps to a labeled instruction, much like a "goto" statement. Branching instructions begin with a "b" and are complemented by calling instructions.
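Putting the pieces together, a program matching the description above might look like the following sketch. It is not the original Listing 3: the assembly name, the label names SlowDay and Done, and the wording of the messages are all assumptions, while the locals declaration, the Parse and ReadLine calls, "sub", "ldc.i4.s 10", and "ble" follow the text.

```il
.assembly extern mscorlib {}
.assembly NetSales {}           // hypothetical assembly name

.method static void Main() cil managed
{
    .entrypoint
    .locals init ([0] int32 iSales, [1] int32 iReturns,
                  [2] int32 iNet, [3] string a, [4] string b)

    // Prompt for sales, read the reply, parse it into slot 0.
    ldstr "Total Sales for Today:"
    call void [mscorlib]System.Console::WriteLine(string)
    call string [mscorlib]System.Console::ReadLine()
    stloc.3
    ldloc.3
    call int32 [mscorlib]System.Int32::Parse(string)
    stloc.0

    // Prompt for returns, read the reply, parse it into slot 1.
    ldstr "Total Returns for Today:"
    call void [mscorlib]System.Console::WriteLine(string)
    call string [mscorlib]System.Console::ReadLine()
    stloc.s b
    ldloc.s b
    call int32 [mscorlib]System.Int32::Parse(string)
    stloc.1

    // iNet = iSales - iReturns
    ldloc.0
    ldloc.1
    sub
    stloc.2

    // Branch to SlowDay when iNet <= 10; otherwise fall through.
    ldloc.2
    ldc.i4.s 10
    ble SlowDay

    ldstr "Net sales are above 10: a good day."
    call void [mscorlib]System.Console::WriteLine(string)
    br Done                     // skip over the other message

SlowDay:
    ldstr "Net sales are 10 or less: a slow day."
    call void [mscorlib]System.Console::WriteLine(string)

Done:
    ret
}
```

Entering something that is not a whole number will make Int32.Parse() throw, which this sketch makes no attempt to handle.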

Between the branching and calling instruction sets, IL manages the flow of a program's execution. IL also supports the ability to manage exceptions and has instructions specific to the .NET object model. If you worked through the examples in this article, you are now familiar with the IL Assembler. Another program that you should spend some time with is the IL Disassembler, which takes a .NET executable and generates IL for you to inspect. The Disassembler is in the .NET Framework SDK/Bin folder and is named "ILDASM.exe". The tool is rather intuitive and should take you little time to understand after having read this article.

Conclusion
What do you think? Was it all that mysterious or scary? The reality is that getting to know IL, as well as other areas of .NET internals, will help you write better code, develop more robust and efficient architectures, and, most important, give you the skills and knowledge that can make all the difference in the world when things go bump in the night.

More Stories By John Gomez

John Gomez, open source editor for .NET Developer's Journal, has over 25 years of software development and architectural experience, and is considered a leader in the design of highly distributed transaction systems. His interests include chaos- and fuzzy-based systems, self-healing and self-reliant systems, and offensive security technologies, as well as artificial intelligence. John started developing software at age 9 and is currently the CTO of Eclipsys Corporation, a worldwide leader in hospital and physician information systems.
