
Cover Story: Garbage Collection

How many managed objects is your application really creating?

Perhaps the most commonly asked questions regarding memory management in .NET are: "How long does a garbage collection take," and "How can I control when the garbage collector runs?" Apprehensive that "pauses" caused by garbage collections will be perceived by users, application developers often search for ways to control when garbage collections occur. Not surprisingly, the standard answer from Microsoft on these issues is to leave the collector to do its thing instead of trying to control it manually. Nevertheless, concern over garbage collection timing and performance remains.

The single biggest factor in determining when a garbage collection occurs and how long it takes is the number of managed objects your application allocates. Therefore, understanding how many objects you allocate is the best way to predict how garbage collection will affect your application's performance. As shown in Figure 1, GC latency, or the time spent in garbage collection, is directly related to the number of live objects allocated by the application.

At first glance, determining how many objects your application creates appears easy: because you create the objects yourself using the new operator, it's obvious when new objects are created, right? Unfortunately, counting newly created objects isn't nearly that straightforward, because many objects are created "under the covers" on your behalf in response to certain operations. In many cases, these implicitly created objects vastly outnumber the objects you create explicitly. Furthermore, you can't always tell when objects are being created implicitly simply by reading the source code.

This article helps you understand how the garbage collector will affect your application's performance by pointing out the "not so obvious" situations where managed objects are created on your behalf. In particular, we'll look at the effects of boxing and string manipulation on the number of objects created as your application runs. Although this article is written specifically with the .NET Compact Framework in mind, many of the concepts discussed apply to the full .NET Framework as well.

Boxed Value Types
As you likely know, the .NET type system defines two different kinds of types: value types and reference types. Value types provide an efficient means to create and work with simple, frequently used types. Many of .NET's built-in types, such as Int32, as well as enums and any type defined with the struct keyword in C# (Structure in Visual Basic .NET), are value types. Reference types include any type defined with the class keyword in C# (Class in Visual Basic .NET), along with arrays, strings, and delegates; instances of them are typically created with the new operator. (C# also allows new with a value type, but that does not allocate on the heap.) As far as memory management is concerned, the most significant difference between value types and reference types is where they are stored: value types are allocated on the stack (or inline inside the object that contains them), while reference types are allocated on the garbage collector's heap.
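The difference becomes visible as soon as one variable is assigned to another. The following minimal C# sketch uses two hypothetical types, PointStruct and PointClass, invented only for this illustration: assigning a value type copies its contents, while assigning a reference type copies only a reference to the same heap object.

```csharp
// Hypothetical types, used only to contrast value and reference semantics.
public struct PointStruct
{
    public int X;
}

public class PointClass
{
    public int X;
}

public static class CopySemanticsDemo
{
    // Assigning a struct copies its contents, so changing the copy
    // leaves the original untouched. Returns the original's X.
    public static int ModifyStructCopy()
    {
        // new on a local value type initializes it in place; no GC heap allocation.
        PointStruct original = new PointStruct { X = 1 };
        PointStruct copy = original;   // field-by-field copy
        copy.X = 42;
        return original.X;             // still 1
    }

    // Assigning a class instance copies only the reference, so both
    // variables refer to the same object on the GC heap.
    public static int ModifyClassAlias()
    {
        PointClass original = new PointClass { X = 1 };  // heap allocation
        PointClass alias = original;   // same object, second reference
        alias.X = 42;
        return original.X;             // now 42
    }
}
```

ModifyStructCopy returns 1 and ModifyClassAlias returns 42, which is the behavior boxing has to preserve when a value type is handed to code expecting a reference.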

Many convenience types in the .NET class libraries are designed to work with both value types and reference types. Examples include collection classes such as ArrayList and Hashtable. To work with any type, collections like these often define methods that take an instance of System.Object as a parameter. For example, consider the definition of the Add method on System.Collections.ArrayList:

public virtual int Add(object value);

Because Object sits at the top of the .NET inheritance hierarchy, any type, regardless of whether it is a value type or a reference type, can be passed at run time.
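A short sketch of what this flexibility looks like from the caller's side: values of unrelated types go into the same ArrayList, and reading them back requires a cast from System.Object. The class and method names here are hypothetical; ArrayList and its Add method and indexer are the real System.Collections APIs.

```csharp
using System.Collections;

public static class ArrayListDemo
{
    // ArrayList.Add takes System.Object, so values of unrelated types,
    // including value types, can be stored in the same collection.
    public static ArrayList BuildMixedList()
    {
        ArrayList list = new ArrayList();
        list.Add(200);            // Int32 is a value type: boxed before the call
        list.Add("two hundred");  // string is already a reference type: no boxing
        return list;
    }

    // Items come back typed as System.Object, so the caller must cast.
    // Casting to int unboxes (copies the value back out of the heap object).
    public static int ReadNumber(ArrayList list)
    {
        return (int)list[0];
    }

    // Casting to string is a plain reference conversion; nothing is copied.
    public static string ReadText(ArrayList list)
    {
        return (string)list[1];
    }
}
```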

Loosely typed collections such as these are great for programmer productivity, but have surprising performance implications in some scenarios. When a value type is passed as a parameter to a method defined to take a reference type, the CLR will automatically convert the value type to a reference type at run time. This conversion, termed boxing, involves two steps: memory is allocated on the GC heap for the new "reference type," and the contents of the value type are copied into the newly allocated space. In Microsoft Intermediate Language (IL), boxing operations can be identified by the box instruction. For example, the following C# code:

using System.Collections;

ArrayList a = new ArrayList();
a.Add(200); // 200 is an Int32 value type, so it is boxed before the call

generates the following IL for the call to ArrayList.Add:

ldc.i4 0xc8
box [mscorlib]System.Int32
callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)

As you can see, this code boxes an instance of System.Int32 (the value 200, or 0xc8) before calling Add.
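The two steps of boxing (allocate a new heap object, then copy the value into it) have observable consequences, sketched below with hypothetical method names: a later change to the original variable is not seen through the box, and boxing the same value twice produces two distinct objects.

```csharp
public static class BoxingDemo
{
    // Boxing copies the value into a newly allocated heap object, so a later
    // change to the local variable is not visible through the box.
    public static int BoxThenChangeLocal()
    {
        int value = 200;
        object boxed = value;   // box: allocate on the GC heap + copy
        value = 300;            // changes only the local copy
        return (int)boxed;      // unbox: still 200
    }

    // Each boxing operation allocates a separate object, even for equal values.
    public static bool BoxesAreDistinctObjects()
    {
        int value = 200;
        object first = value;
        object second = value;
        return !object.ReferenceEquals(first, second);
    }
}
```

Every one of those separate objects counts toward the allocation volume that drives garbage collection.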

In applications where boxing occurs infrequently, its effect on performance is negligible. However, because the .NET Compact Framework's garbage collector initiates a collection whenever 1 MB of objects has been allocated, code that boxes heavily can cause the garbage collector to run more often than it otherwise would. Not only may the collector run more often, but each collection may also take longer, because the garbage collector has more objects to examine as it walks the object graph.
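To get a feel for how quickly boxing adds up toward an allocation-based trigger, the sketch below counts the bytes allocated while adding 100,000 integers to an ArrayList versus writing them into a preallocated int[]. It assumes a modern desktop or server .NET runtime (for example, .NET 6 or later) because it relies on GC.GetAllocatedBytesForCurrentThread, which the .NET Compact Framework does not provide; exact byte counts vary by runtime and by 32-bit versus 64-bit process. The class and method names are hypothetical.

```csharp
using System;
using System.Collections;

public static class AllocationDemo
{
    // Bytes allocated on this thread while adding 'count' ints to an ArrayList.
    // Capacity is reserved up front, so only the boxed Int32s are measured.
    public static long BytesAllocatedByBoxing(int count)
    {
        ArrayList list = new ArrayList(count);   // backing object[] allocated here
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);                          // one boxed Int32 per call
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(list);
        return after - before;
    }

    // Same loop against a strongly typed int[]: no boxing, no heap allocation.
    public static long BytesAllocatedWithoutBoxing(int count)
    {
        int[] values = new int[count];            // allocated before measuring
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
        {
            values[i] = i;
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        GC.KeepAlive(values);
        return after - before;
    }
}
```

On a 64-bit runtime a boxed Int32 typically occupies 24 bytes, so the ArrayList loop allocates on the order of 2.4 MB while the array loop allocates essentially nothing. Even assuming only 12 bytes per box, 100,000 boxes exceed a 1 MB allocation threshold on their own.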

Let's take a look at an example that demonstrates the potential effects of boxing on performance and garbage collection frequency. Credit for this sample goes to Roman Batoukov from the .NET Compact Framework team. Roman developed this sample while preparing breakout sessions for conferences, including the Mobile and Embedded Developers Conference (MEDC) and the Professional Developers Conference (PDC). Roman's sample consists of three types that form a rudimentary banking system. Two implementations of these types are provided: a strongly typed implementation that involves almost no boxing, and a loosely typed implementation that provides more flexibility but incurs the cost of several boxing operations at run time. The types involved in the sample are:

  • AccountId: Account identifiers are represented by the AccountId value type. AccountId contains an integer that holds the account number and provides an implementation of GetHashCode. In our example, the hash code is simply the account number. Because every account number is unique, every account also gets a unique hash code, which makes this a perfect hash function.
  • AccountData: Each account has both an account identifier and an account balance. The account balance is held in the AccountData value type.
  • Accounts: The Accounts reference type holds a collection of 10,000 AccountData records. Accounts also provides an index operator that allows a consumer to access the AccountData for a given AccountId.
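Based only on the descriptions above, a hypothetical sketch of the strongly typed versions of these three types might look like the following. The member names Number and Balance, the use of int for the balance, the assumption that account numbers run from 0 to 9,999, and the StronglyTyped namespace (which only keeps these names apart from a loosely typed variant) are illustrative choices, not details taken from Roman's listings.

```csharp
namespace StronglyTyped
{
    // Hypothetical reconstruction from the article's prose, not the original listing.
    public struct AccountId
    {
        public int Number;                   // assumed to run from 0 to Accounts.Count - 1

        public AccountId(int number) { Number = number; }

        // The hash code is simply the account number.
        public override int GetHashCode() { return Number; }
    }

    public struct AccountData
    {
        public int Balance;                  // the account balance
    }

    public class Accounts
    {
        public const int Count = 10000;

        // Elements are stored inline in the array: no per-element heap objects.
        private readonly AccountData[] accounts = new AccountData[Count];

        // Strongly typed index operator: AccountId in, AccountData out,
        // so calling it never boxes either value.
        public AccountData this[AccountId id]
        {
            get { return accounts[id.Number]; }
            set { accounts[id.Number] = value; }
        }
    }
}
```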
Listings 1 and 2 show the two implementations of our simple banking system, with the differences between them shown in boldface. Those differences are:
  • The base type of AccountData: The code in Listing 1 allows alternative implementations of AccountData to be provided in the future. This flexibility comes from defining an interface, IAccountData, that the AccountData value type implements. The AccountData value type in Listing 2 does not implement such an interface.
  • The type of the accounts array: Because the implementation in Listing 1 allows for alternative implementations of an account's data, the element type of the accounts array is IAccountData. In the strongly typed implementation, the element type is simply AccountData.
  • The return type and the "parameter" to the index operator: Because the element type of the accounts array in Listing 1 is a reference type (IAccountData), the index operator must also use a reference type for its return type and parameter. In our case, the index operator works with instances of System.Object.
The most important distinction between these two implementations is the use of reference types by the index operator and the accounts array in Listing 1. The flexibility provided by using reference types is what makes the implementation shown in Listing 1 loosely typed.
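A similarly hypothetical sketch of the loosely typed variant, built only from the differences listed above, shows where the boxing comes from: AccountData implements IAccountData, the array holds IAccountData elements, and the index operator takes and returns System.Object. The same illustrative assumptions apply (the Number and Balance members, int balances, account numbers 0 to 9,999), and the LooselyTyped namespace is an arbitrary name used only to keep these types separate from strongly typed versions of the same names.

```csharp
namespace LooselyTyped
{
    // Hypothetical reconstruction from the article's prose, not the original listing.
    public struct AccountId
    {
        public int Number;                   // assumed to run from 0 to Accounts.Count - 1

        public AccountId(int number) { Number = number; }

        public override int GetHashCode() { return Number; }
    }

    // Leaves room for alternative implementations of an account's data.
    public interface IAccountData
    {
        int Balance { get; set; }
    }

    public struct AccountData : IAccountData
    {
        public int Balance { get; set; }
    }

    public class Accounts
    {
        public const int Count = 10000;

        // Interface-typed elements: each AccountData stored here lives in its
        // own boxed object on the GC heap.
        private readonly IAccountData[] accounts = new IAccountData[Count];

        // Loosely typed index operator: callers must box both the AccountId
        // key and the AccountData value to pass them as System.Object.
        public object this[object id]
        {
            get { return accounts[((AccountId)id).Number]; }                 // unbox the key
            set { accounts[((AccountId)id).Number] = (IAccountData)value; }  // value arrives already boxed
        }
    }
}
```

With this version, a single read-modify-write of one account boxes the AccountId key twice and the AccountData value once.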

Now that we've looked at the differences between the two implementations, let's see how they perform. The code in Listing 3 iterates through the accounts collection, initializing each account with a balance of 100. After all accounts have been populated, we iterate back through them, deducting 10 from each account. The loop that updates the balances is timed using Environment.TickCount. I repeat the loop until the entire operation has run for at least one second (generally speaking, running a performance test for at least one second makes the results more consistent).
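The timing approach described above, repeating the work until at least one second has passed as measured by Environment.TickCount, can be sketched as a small reusable helper. The helper name MeasureForAtLeast and its signature are hypothetical; only Environment.TickCount comes from the text.

```csharp
using System;

public static class TimingHarness
{
    // Runs 'work' repeatedly until at least minMilliseconds have elapsed,
    // measured with Environment.TickCount (coarse, millisecond-or-worse resolution).
    // Returns the average time per iteration in milliseconds.
    public static double MeasureForAtLeast(Action work, int minMilliseconds, out int iterations)
    {
        iterations = 0;
        int start = Environment.TickCount;
        int elapsed;
        do
        {
            work();
            iterations++;
            // Unchecked subtraction keeps the difference correct even if the
            // signed 32-bit TickCount wraps around during the measurement.
            elapsed = unchecked(Environment.TickCount - start);
        } while (elapsed < minMilliseconds);
        return (double)elapsed / iterations;
    }
}
```

Passing a lambda that performs one pass over the accounts (deducting 10 from each) with a minimum of 1000 milliseconds would mirror the test described above; the per-pass average can then be compared between the two implementations.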

There are several places where boxing occurs when running the code in Listing 3 against the loosely typed implementation of our Accounts type. The first boxing operation occurs when we pass an instance of our AccountId value type to the index operator, which takes an instance of Object:

AccountData rec = (AccountData)ac[id];

The same boxing operation occurs two lines later in our listing when we use the index operator again:

ac[id] = rec;

Finally, that same statement must also box the modified instance of the AccountData value type when it is stored back in the collection:

ac[id] = rec;

More Stories By Steven Pratschner

Steven Pratschner is the program manager for the .NET Compact Framework Common Language Runtime at Microsoft. Before working on the Compact Framework team, Steven spent several years working on the full .NET Framework. Steven has written articles and presented at numerous conferences on a variety of topics related to .NET-based programming. He is the author of the book Customizing the Common Language Runtime from Microsoft Press.
