Beyond Objects - Alternate Managed Languages

When I was in junior high school, music was one of the most important factors in life. Few things were more important than being up-to-the-minute on which bands were "cool" and which were to be eschewed. Regardless of what genre you liked, yesterday's bands sucked and today's bands ruled. So it is with the software industry. At any given moment there are the "cool" technologies (as I write this the prevailing attitude is that Web services rule) and the has-beens (COM was cool but now seems positively dowdy). I don't think this is necessarily unhealthy; experimentation is the only way we're ever going to figure out how to write software without inhuman amounts of effort.

One idea has managed to remain cool for the last decade: object-oriented development. OO has been around long enough that it's starting to look, well, mature, and it's hard to be both mature and cool. At least two popular new technologies have largely abandoned objects: XML/XSLT and Web services.

What this means to me is that the Next Big Thing will probably not be objects. This is interesting because most of the languages used on the .NET platform today are object-oriented in nature. How will the CLR cope if next year's favorite development paradigm turns out to be "Bilinear Aspect-Oriented Development"? Can the CLR accommodate nonobject languages?

The answer depends on the relative agnosticism of the .NET runtime. If the platform is well factored, then the generic services built into it (like memory and thread management) should make it easy to accommodate next year's fashionable ideas in a way that solidly integrates with today's has-beens. Since it's very hard to know what the next "cool" model will be, it is important to try to keep the runtime as free of bias as possible without trying to be all things to all people. This is the tightrope the CLR will have to walk as it evolves over the next few years. If it manages to succeed, then the managed platform might still be around even after object-oriented programming goes into the hopper with all of our old REO Speedwagon records.

One Runtime to Bind Them All
Computer languages come in an infinite variety of flavors, but underneath their syntactic differences they work in more or less the same way. All languages give their users the ability to create some kind of programmatic variables (whether C++ objects or LISP cons nodes) and a set of operations for manipulating them. Most languages additionally define a memory model and a set of libraries useful for common programming tasks.

The defining vision of the .NET platform is to distill these four factors into a single consolidated runtime so that compilers can focus on language issues instead of platform details. This vision is outlined in ISO/IEC 23271, the .NET Common Language Infrastructure (CLI) specification. This specification defines a Common Type System (CTS) so that compilers don't have to synthesize entire type systems from scratch, a Virtual Execution System (VES) to insulate compilers from the vagaries of CPU register allocation, and a garbage collector to provide automated memory management. Compilers don't have to emit API calls to take advantage of runtime services; they emit code in the .NET Intermediate Language (IL) and the runtime supplements execution with services as appropriate. Finally, the Base Class Library (BCL) provides a set of around 600 classes for doing common things like file I/O.

Porting a new language or framework to .NET will require interaction with each of these four subsystems. The two with the most immediate impact on a language's performance characteristics are the Common Type System and the garbage collector.

The Common Type System
The Common Type System (CTS) is arguably the most critical part of the CLI specification because it is the largest determinant of how well a given language will work with the underlying Execution Engine. The mapping a compiler makes between its programmatic types and the CTS has an enormous effect on execution speed as well as interaction with other code.

The Common Type System is object-oriented in nature and supports single implementation inheritance. Not coincidentally, this is exactly the model used by both C# and VB, which are designed directly around the CTS. These compilers map syntactic constructs like classes onto the CTS in an almost one-to-one manner. If you compile a C# program and open the resulting assembly with ILDASM, you'll find that its contents look practically identical to the constructs in the source code, which is not surprising given C#'s unusually close kinship with the .NET platform. Other single-inheritance languages like Java are similarly easy to map onto the CTS.

What about multiple inheritance? Eiffel is an object-oriented language that supports multiple inheritance and templates, neither of which is natively supported by the CTS. Eiffel.NET still provides the full functionality of the Eiffel language by mapping each Eiffel type onto multiple CTS types (several CTS classes are used in concert to implement a single Eiffel class). This mapping is an uninteresting compiler detail to a pure Eiffel programmer, but it comes into focus as soon as the code is distributed to developers used to C# or VB.NET. A simple Eiffel class is shown in Listing 1. All of the code for this article can be downloaded from www.sys-con.com/dotnet/sourcec.cfm. A VB programmer who fires up ILDASM and points it at an Eiffel assembly sees something like Figure 1: a thicket of CTS classes whose proper use isn't immediately obvious.

This doesn't mean that Eiffel is a poor language or that it's poorly suited to the CLI. It does mean that the more tangled the mapping between a language and its representation in the CTS, the harder that language's output is going to be to consume from other languages. Other languages (notably managed C++) can generate constructs that are even more difficult, if not impossible, for others to use.

Despite its object-oriented nature, the CTS isn't particularly biased in favor of OO languages; languages like COBOL and FORTRAN still use ints and floats after all. These compilers happily base their type system on the CTS and just ignore its more object-like constructs. The real bias of the Common Type System lies in the fact that it is statically typed: every object and object reference is bound to a type at creation, and that binding can never change. This has implications for all languages, whether they are OO or not. The CTS isn't statically typed just because the implementers happened to be in a statically typed mood on the day they designed the type system. Rather, static typing is the linchpin of code verification.

Verification
One of the most important advances of the 1990s was semantic type checking: if enough information is known about how code manipulates variables, then it is possible to make strong statements about whether or not that code is safe to execute. This process is familiar to both Java and C# programmers as verification: the runtime inspects IL prior to execution and may decline to execute it if it is unable to prove that the code can't do any harm. Code that can't be proven to be safe is called "unverifiable code" and will not be executed unless it has been granted trust.

Semantic type checking is a relatively new field, and verifiers today can prove safety only for a limited set of operations; not all safe code is verifiable. The .NET verifier can verify code only where both the IL and the object instances it manipulates are not just statically typed but strongly typed, so that the type of an object instance is never in doubt. As a result, some IL instructions (like cpblk) are not verifiable, and native code is never verifiable.
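To make the idea concrete, here is a minimal sketch of a verifier for a made-up three-instruction IL. It is written in JavaScript purely for illustration; the `verify` function, its result shape, and the rule that `add` accepts only int32 operands are my assumptions, and the real CLI verification rules are far richer. The point it shows is that the verifier tracks types rather than values, and that it rejects anything it cannot reason about, even if the code happens to be harmless.

```javascript
// Toy stack-type verifier for a made-up IL. Hypothetical illustration only;
// this is not how the CLR verifier is implemented.
function verify(program) {
  const stack = []; // holds static type names, never runtime values
  for (const [op] of program) {
    switch (op) {
      case "ldc.i4": // push a 32-bit integer constant
        stack.push("int32");
        break;
      case "ldstr": // push a string literal
        stack.push("string");
        break;
      case "add": { // in this toy, add requires two int32 operands
        const right = stack.pop();
        const left = stack.pop();
        if (left !== "int32" || right !== "int32") {
          return { ok: false, reason: `add applied to ${left} and ${right}` };
        }
        stack.push("int32");
        break;
      }
      default:
        // Instructions the verifier has no typing rule for are rejected.
        return { ok: false, reason: `cannot reason about ${op}` };
    }
  }
  return { ok: true, resultType: stack[stack.length - 1] };
}

const safe = verify([["ldc.i4", 7], ["ldc.i4", 8], ["add"]]);
const unsafe = verify([["ldc.i4", 7], ["ldstr", "8"], ["add"]]);
const unknown = verify([["cpblk"]]);
console.log(safe, unsafe, unknown);
```

The first program passes, the second is rejected because the operand types don't match, and the third is rejected simply because the toy verifier has no rule for the instruction.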

Dynamic Typing
Verification's insistence on strong typing is a major impediment for dynamically typed languages where the type of a variable can change with each assignment. In JScript, for example, the following code sequence is perfectly legal:

var a = 7; // a holds an Int32 value
var b = 8;
a = (a+b).ToString(); // a now holds a string value

Building a dynamic type system on top of a static one requires heavy runtime overhead, with lots of object conversions. The results are rarely pretty. Compiling the JScript above into IL produces the code in Listing 2.

Variables a and b are System.Object references that in this case are created via the IL box instruction. Box isn't particularly cheap and tends to create garbage every time it is used. The JScript compiler also can't use native IL instructions like add because "+" in JScript might mean something different every time it appears. Compiled JScript code instead relies on a 715K runtime support library for supplementary functions like EvaluatePlus.
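The sketch below suggests the kind of work a late-bound "+" helper has to do on every call. It is plain JavaScript rather than JScript .NET, and although the function name echoes EvaluatePlus, the body is my own assumption about what such a helper must decide, not the code in the JScript support library.

```javascript
// Hypothetical sketch of a late-bound "+" helper; not the JScript runtime's code.
// Operands arrive as untyped references, so the meaning of "+" is chosen per call.
function evaluatePlus(left, right) {
  if (typeof left === "number" && typeof right === "number") {
    return left + right; // numeric addition
  }
  if (typeof left === "string" || typeof right === "string") {
    return String(left) + String(right); // string concatenation
  }
  throw new TypeError("this sketch handles only numbers and strings");
}

let a = 7; // a holds a number
const b = 8;
a = String(evaluatePlus(a, b)); // a now holds a string
console.log(typeof a, a); // prints: string 15
```

A statically typed compiler makes this decision once, at compile time, and emits a single add instruction; here the type tests run every time the operator is evaluated.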

All of this overhead means that dynamically typed languages are second-class citizens on the .NET platform today. Dynamic typing is viable only for scripting languages where performance isn't an issue. The hostility of verification toward dynamic typing prevents the easy porting of a large class of languages.

Luckily, compilers have another option: to trade verifiability for performance.

Mixed-Mode Compilation
The execution model I've outlined so far isn't unique to .NET. The Java platform in particular provides a very similar model, including a statically typed object-oriented type system, an intermediate code representation (bytecode), garbage collection, and a standard class library. The biggest difference between these two platforms is their stance on code verifiability. The Java platform's approach to verification is like that of a strict elementary school teacher: all code must pass the verifier to run, no ifs, ands, or buts. JNI allows Java to call unmanaged methods, but it doesn't allow unmanaged types or code to mingle with their managed brethren.

The .NET platform's approach to verification is more like that of a hippie college professor: the user gets to decide which programs are required to pass verification and which are not. Compilers make a choice to either accept the performance hit of verifiable code in exchange for enhanced deployment flexibility (as exemplified by JScript and Eiffel) or sacrifice deployment flexibility in favor of performance by mixing unverifiable or unmanaged constructs into the code.

Compilers that utilize native code or unmanaged types are called mixed-mode compilers. Mixed-mode compilation enables languages like ANSI C and C++ to run on the .NET platform with reasonable performance. Mixed-mode compilers can pick and choose which services they want to delegate to the runtime and which they would rather implement themselves. This flexibility makes a lot of things possible on .NET that would be awkward or impossible on the JVM. For example, a programmer might want to represent a custom hardware device as a managed class but find that communicating with the device requires a few lines of platform-specific assembly language. A mixed-mode compiler could implement that method in native machine code: it emits metadata declaring the method as native and provides enough information for the runtime to invoke it. The runtime can then call the method directly, without forcing the programmer to go through an interop layer like P/Invoke or JNI.

Languages that don't fit neatly into the CTS's rigid type model may prefer to use a type system of their own devising but still use IL to manipulate instances. Listing 2 shows a short managed C++ program that declares managed and unmanaged versions of a class; Listing 3 shows the resulting IL (slightly abridged). Notice that the managed version of the structure contains descriptions of the fields it contains. The unmanaged structure simply contains an attribute telling the runtime how big instances of this type are, but it's enough information to allow instances to be held and manipulated in the runtime.
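The difference between the two kinds of type description can be sketched as follows. The record shapes, the type names, and the helper functions here are hypothetical and written in JavaScript for illustration; they are not CLI metadata or any real runtime API. The sketch only shows that a size is enough to hold an instance, while field-level descriptions are what allow accesses by name to be checked.

```javascript
// Hypothetical type descriptions; illustrative only, not real CLI metadata.
const managedPoint = {
  name: "ManagedPoint",
  fields: [
    { name: "x", type: "int32", offset: 0 },
    { name: "y", type: "int32", offset: 4 },
  ],
};
const nativePoint = { name: "NativePoint", size: 8 }; // size only, no field list

// Either description is enough to reserve storage for an instance.
// (Every field in this sketch is a 4-byte int32.)
function sizeOf(desc) {
  return desc.fields ? desc.fields.length * 4 : desc.size;
}
function allocate(desc) {
  return new DataView(new ArrayBuffer(sizeOf(desc)));
}

// Only the field-level description lets an access by name be checked.
function readField(desc, instance, fieldName) {
  const field = (desc.fields || []).find((f) => f.name === fieldName);
  if (!field) throw new Error(`${desc.name} exposes no field named ${fieldName}`);
  return instance.getInt32(field.offset, true);
}

const p = allocate(managedPoint);
p.setInt32(4, 42, true);
console.log(readField(managedPoint, p, "y")); // 42

const q = allocate(nativePoint);
console.log(q.byteLength); // 8
```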

Verification is not the only thing mixed-mode compilers sacrifice. Mixed-mode assemblies are likely to be tied to a particular CPU and operating system if they contain CPU-specific native code (the Mono environment today doesn't support mixed-mode assemblies compiled for Windows and probably never will). Mixed-mode assemblies that use unmanaged constructs are also harder to integrate with modules written in other languages (it's not called the Common Type System for nothing).

Memory Management
The performance of a language on any given CLI platform is going to be influenced by how well it interacts with the garbage collector chosen by the platform implementers. This is a tricky area, because garbage collectors can be optimized in several ways:

  • To use a minimum (or fixed) amount of memory
  • To use the least possible CPU (by running less often)
  • To make collection intervals as short as possible (at the expense of more total CPU)

Different types of languages place different loads on the garbage collector. Procedural and OO languages tend to create relatively few, largish objects that stick around for a fairly long time. Functional languages tend to create zillions of teeny objects (e.g., LISP cons nodes) that almost all become garbage right away. The performance of language x on managed platform y is going to depend on how garbage collector y handles the GC demands placed by x. The garbage collector currently used by the CLR is optimized to keep collection intervals short; since it expects most code to be OO/procedural in nature, it's safe to say that it's optimized for that case. The performance of other languages may vary depending on how well this pattern suits them.
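The two allocation patterns can be sketched side by side. The code below is JavaScript for illustration only, and the names `cons`, `range`, `sumList`, and `Accumulator` are mine; it says nothing about how any particular collector performs, only what kind of garbage each style produces while computing the same result.

```javascript
// Two allocation patterns, sketched in JavaScript purely for illustration.

// Functional style: every call to cons allocates a small object, and the
// whole list becomes unreachable as soon as the sum has been computed.
const cons = (head, tail) => ({ head, tail });

function range(from, to) {
  let list = null;
  for (let i = to; i >= from; i--) list = cons(i, list);
  return list;
}

function sumList(list) {
  let total = 0;
  for (let cell = list; cell !== null; cell = cell.tail) total += cell.head;
  return total;
}

// Procedural/OO style: one larger, long-lived buffer that is filled in place.
class Accumulator {
  constructor(capacity) {
    this.samples = new Float64Array(capacity); // allocated once
    this.count = 0;
  }
  add(value) {
    this.samples[this.count++] = value;
  }
  sum() {
    let total = 0;
    for (let i = 0; i < this.count; i++) total += this.samples[i];
    return total;
  }
}

const viaCons = sumList(range(1, 1000)); // 1,000 short-lived cells
const acc = new Accumulator(1000); // one long-lived object
for (let i = 1; i <= 1000; i++) acc.add(i);
const viaBuffer = acc.sum();
console.log(viaCons, viaBuffer); // prints: 500500 500500
```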

Conclusion
Regardless of what the Next Big Idea looks like, I think there's little question that a decently performing implementation can be built. Mixed-mode compilation provides a way for even the strangest of languages to run reasonably well while using the services of the runtime. What is harder to answer is (a) whether it will be able to run efficiently and still pass the verifier and (b) how easy it will be to use from other languages. The CLI platform continues to evolve and over time will probably introduce constructs for dynamically typed languages. As the field of semantic type verification advances, verifiers should become more powerful as well. With any luck the CLI will still be around not only when OO goes out of style, but also when it comes back as a "retro-cool" idea.

Further Reading

  • Gough, K.J. "Stacking them up: A Comparison of Virtual Machines." http://sky.fit.qut.edu.au/~gough/VirtualMachines.ps
  • Meijer, E. "Scripting .NET Using Mondrian." http://research.microsoft.com/~emeijer/Papers/ECOOP.pdf
  • Meijer, E., and Gough, J. "A Technical Overview of the Common Language Infrastructure." http://research.microsoft.com/~emeijer/Papers/CLR.pdf
  • Gilmore, S. "Resource-Bounded Functional Programming on the JVM and .NET." www.dcs.ed.ac.uk/home/stg/MRG/comparison/slides.pdf
  • Hanson, D. "lcc.NET: Targeting the .NET Common Intermediate Language from Standard C." http://research.microsoft.com/~drh/pubs/msr-tr-2002-112.pdf
  • Arnout, K. "Eiffel for .NET: An Introduction." www.devx.com/codemag/Article/8500/1954

About Jason Whittington
Jason Whittington is a consultant and researcher with an irrational fascination with virtual execution environments. When he's not researching or consulting he can often be found delivering courses for DevelopMentor. His Web site can be found at http://staff.develop.com/jasonw.
