Marissa's Guide to the .NET Garbage Collector, part 2

When we left Marissa last month she was getting rather cranky, but now she is well rested and ready to get down to business. And that business is how Microsoft .NET manages memory through the use of what is commonly known as the garbage collector or GC. In the first article in this series [.NETDJ, Vol. 1, issue 8] we discussed some of the basic concepts behind the garbage collector. If you haven't had a chance to read that article, now would be a good time - before you look over the following explanations and advice from Marissa, my six-month-old niece, who is known far and wide as an expert in playtime etiquette; baby food; and most of all, the workings of the Microsoft .NET garbage collector.

"So Marissa, where should we start this time?" "Given that we left off with the Finalize method, maybe we should pick up from there, Uncle John," she replied. She then dove right into some of the deeper concepts related to the Finalize method. Marissa outlined some key rules regarding the Finalize method that we should keep in mind when designing applications. (I took the liberty of detailing these rules in a sidebar.) "Now that we have a handle on the Finalize method, we should move on to the Dispose method." I just nodded my head, put my feet up on the PlaySkool Talking Grill, and leaned back.

"One of the challenges of dealing with the Finalize method arises from the fact that it is neither deterministic nor a public method," Marissa stated as she pointed to some notes she'd scribbled on a chalkboard. "So it is important to implement a Dispose method whenever we utilize resources that are expensive or limited." "So if I'm right," I interjected, "the Dispose method basically allows an application to provide a means by which applications or even the original application itself can deterministically release resources?" "Yes, that's exactly right, Uncle John."

Marissa shot me a small grin and commented, "I think you might get the hang of this yet. The important thing to keep in mind, though, is that the Dispose method releases resources only; it does not release memory or help manage memory." As I listened to her point out this subtle difference - the release of resources versus the release of memory - I decided that it warranted a brief example.

Consider that a database connection, a file handle, and a network socket are all examples of system resources. By implementing the Dispose method you can call the Close or Release method of each, making them available to other applications. But although you have released the resource, you have not released the memory allocated for managing it. The object that wrapped the resource stays on the managed heap until the GC works through its cleanup process.
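To make the resource-versus-memory distinction concrete, here is a minimal C# sketch. PooledConnection and its OpenCount bookkeeping are hypothetical stand-ins invented for illustration; they are not from the article and do not wrap a real connection.

```csharp
using System;

// Hypothetical stand-in for a scarce resource such as a database connection.
public sealed class PooledConnection : IDisposable
{
    // Illustrative bookkeeping: how many connections are currently held open.
    public static int OpenCount { get; private set; }

    public bool IsOpen { get; private set; }

    public PooledConnection()
    {
        IsOpen = true;
        OpenCount++;
    }

    // Deterministically releases the resource. The object itself stays on
    // the managed heap until the GC decides to reclaim it.
    public void Dispose()
    {
        if (!IsOpen) return;   // calling Dispose twice is harmless
        IsOpen = false;
        OpenCount--;
    }
}

public static class DisposeDemo
{
    // Returns true when the resource was released at the end of the using
    // block even though the object is still reachable afterward.
    public static bool Run()
    {
        var conn = new PooledConnection();
        using (conn)
        {
            // Work with the connection here.
        }
        // Dispose has run: the resource is free, but 'conn' still refers to
        // a live object that occupies memory.
        return !conn.IsOpen && PooledConnection.OpenCount == 0;
    }
}
```

The using statement calls Dispose when the block exits, even if an exception is thrown, which is what makes the release deterministic.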

"I think we should spend just a little time talking about how threading affects the GC," Marissa said, dragging me back from my reverie. "It's important that people understand once and for all that threading has implications - many implications - and that it isn't a toy. I know I'm getting a little serious here, Uncle John, but trust me - in my six months I've seen many good applications messed up with threads. So I hope that by highlighting how they impact the GC, people will think a little harder about using them."

"It sounds as if you have strong feelings about threading there, Marissa!" I tossed out. "I do," she responded emphatically. "It's just that threads never hurt anyone, and so many people misuse them." When I finally got her calmed down, she gave me some strong insight into threading and the GC. She explained that all the aspects we've discussed and will discuss in this article work exactly as in a single-threaded environment. But when you go to a multithreaded design, the GC has to do some special things that can affect performance.

For instance, she explained, the GC will suspend all threads executing managed code before it starts a collection. This is important because the GC is rearranging things, and a running thread could end up referencing memory that belonged to an object the GC has just moved, for example when survivors are compacted and promoted to another generation. Think of the havoc that would cause! Keep in mind that the suspension of threads by the GC affects only managed threads executing in the process that the GC is cleaning up. Now you're wondering how the GC knows it can safely suspend a thread without interfering with the execution of your application's task.

It has to do with how threads are managed, and their state within the internal method table and thread stack. We'll tackle these concepts in a future article on threading, but rest assured that the GC will not suspend your threads until it has determined that they have reached a safe and happy place and no harm will be done by suspending them. The implication is that either the GC must wait for the threads, thereby prolonging the use of resources, or, once the threads are suspended, they must wait on the GC. Both situations can, and often do, affect the performance and memory utilization of your application.

Next on the agenda was reviewing the concept of weak and strong references, and Marissa did a pretty good job of explaining the difference. She started by explaining that the GC will create a graph, similar to a tree graph, of all the objects in the system. This graph helps the GC determine which objects are reachable, or in other words, referenced either directly or through a chain of other objects from a "root node" in the tree.

If an object is reachable, it is said to have a "strong" reference. So basically, if the object is being used, it is strong. To understand weak references, it helps to first understand the advantages they provide. Let's consider an application that for some reason loads a list of a thousand items into memory. For argument's sake let's just say that these items cause a large performance hit and that it is easier to keep them in memory than to re-create the list of items each time the application has to access something in the list. Now suppose that the behavior of your application is such that at some point the list is nice to have around, but the memory could "possibly" be better used elsewhere.

Wouldn't it be, as Marissa says, "ultra-cool" if you could somehow tell the GC that you are still using the object, but if worst comes to worst the GC could reclaim the memory used by the list? That is exactly what a weak reference allows you to do. A weak reference lets your application keep track of an object without keeping it alive: if the only remaining references to the object are weak, the GC is free to reclaim it, and that is ultra-cool. To let the GC know that you have a weak reference to something, examine the code in Listing 1 and modify it for your needs. Remember that you will always need to check whether the object is still alive, because once no strong reference remains the GC can reclaim the memory the object is using, thereby destroying the object, at any time.
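The thousand-item list scenario might be sketched as follows. This is not the article's Listing 1; ItemCache, BuildItems, and RebuildCount are hypothetical names made up for this illustration, and only System.WeakReference and its Target property come from the .NET class library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache that holds an expensive list through a weak reference.
public sealed class ItemCache
{
    // Starts out empty; Target is null until the list is first built.
    private readonly WeakReference _weakItems = new WeakReference(null);

    // Illustrative counter: how many times the list had to be rebuilt.
    public int RebuildCount { get; private set; }

    public List<string> GetItems()
    {
        // Copy Target into a local (strong) reference first. Testing IsAlive
        // and then reading Target would leave a gap in which the GC could
        // still reclaim the list.
        var items = _weakItems.Target as List<string>;
        if (items == null)
        {
            items = BuildItems();        // the expensive step
            _weakItems.Target = items;   // track it without keeping it alive
        }
        return items;
    }

    private List<string> BuildItems()
    {
        RebuildCount++;
        var list = new List<string>(1000);
        for (int i = 0; i < 1000; i++)
        {
            list.Add("item " + i);
        }
        return list;
    }
}
```

As long as a caller holds the returned list, repeated calls to GetItems hand back the same instance. Once every strong reference is gone, the GC may reclaim the list, and the next call simply rebuilds it.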

The last thing Marissa and I discussed was the ability to programmatically call the GC and monitor it via the exposed performance counters. Marissa explained that there are few times when you will need to programmatically interact with the GC, but in some rare instances - say, when developing a utility or debugging - there are some useful methods you should be aware of to round out your understanding of the GC. These methods are described in Table 1, which should give you a basic understanding.
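As a rough illustration of that kind of programmatic interaction, the sketch below calls a handful of System.GC methods. It is not a reproduction of Table 1; GcSnapshot and GcProbe are hypothetical helper names, and the exact numbers you see depend on the runtime and its GC configuration.

```csharp
using System;

// Hypothetical record of what the probe observed.
public sealed class GcSnapshot
{
    public int GenerationBefore;
    public int GenerationAfter;
    public int Gen0Collections;    // gen 0 collections that ran during the probe
    public long BytesAfterCollect;
}

public static class GcProbe
{
    public static GcSnapshot Run()
    {
        var snapshot = new GcSnapshot();
        var survivor = new object();

        snapshot.GenerationBefore = GC.GetGeneration(survivor);
        int collectionsBefore = GC.CollectionCount(0);

        GC.Collect();                    // force a collection of all generations
        GC.WaitForPendingFinalizers();   // block until the finalizer queue drains

        snapshot.GenerationAfter = GC.GetGeneration(survivor);
        snapshot.Gen0Collections = GC.CollectionCount(0) - collectionsBefore;
        snapshot.BytesAfterCollect = GC.GetTotalMemory(false);

        GC.KeepAlive(survivor);          // keep 'survivor' reachable up to here
        return snapshot;
    }
}
```

Forcing collections like this is best kept to utilities and debugging sessions, as Marissa suggests; in normal application code the GC usually schedules itself better than you can.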

It's also important to become familiar with the .NET and system performance counters so that your efforts at debugging problems related to memory management are more fruitful. Get to know the counters that are not part of .NET but that track overall system memory usage as well, since they can give you clues as to how your application is affecting the overall system, or tell you whether other applications are affecting yours.

Conclusion
I hope we have provided you with a practical and applicable review of how the GC works to manage memory and resources. It is my hope - and Marissa's - that you will be able to truly use the information contained here to design more robust applications and architectures.

SIDEBAR

The Finalize Method

  • The Finalize method can cause expensive or unmanaged resources to be held onto longer than required.
  • There is no public access to the Finalize method, and you have no control over when it will be called.
  • There is no guarantee as to the order in which the GC will call Finalize methods.
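
One common way to live with these rules is to pair a finalizer with a public Dispose method, so that callers can release the resource on their own schedule and the finalizer acts only as a safety net. The sketch below assumes a hypothetical UnmanagedBuffer class; Marshal.AllocHGlobal, Marshal.FreeHGlobal, and GC.SuppressFinalize are real .NET calls, but the class itself is made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
public class UnmanagedBuffer : IDisposable
{
    private IntPtr _buffer;

    public bool IsDisposed { get; private set; }

    public UnmanagedBuffer(int sizeInBytes)
    {
        _buffer = Marshal.AllocHGlobal(sizeInBytes);
    }

    // Public and deterministic: the caller decides when the memory is freed.
    public void Dispose()
    {
        Dispose(true);
        // Cleanup is done, so the GC no longer needs to run the finalizer.
        GC.SuppressFinalize(this);
    }

    // The C# destructor syntax compiles to Finalize. The GC calls it at a
    // time of its own choosing, and only if Dispose was never called.
    ~UnmanagedBuffer()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (IsDisposed) return;

        // When disposing is false we are on the finalizer thread. Finalization
        // order is not guaranteed, so other managed objects must not be
        // touched here; only the unmanaged resource is released.
        if (_buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_buffer);
            _buffer = IntPtr.Zero;
        }
        IsDisposed = true;
    }
}
```

Calling Dispose promptly means the unmanaged memory is not held any longer than required, which addresses the first rule in the list.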

More Stories By John Gomez

John Gomez, open source editor for .NET Developer's Journal, has over 25 years of software development and architectural experience, and is considered a leader in the design of highly distributed transaction systems. His interests include chaos- and fuzzy-based systems, self-healing and self-reliant systems, and offensive security technologies, as well as artificial intelligence. John started developing software at age 9 and is currently the CTO of Eclipsys Corporation, a worldwide leader in hospital and physician information systems.
