By Gabriel Torok | January 30, 2003 12:00 AM EST
Are you aware that you might be shipping your source code with your .NET dll or exe? A new tool included in Microsoft's Visual Studio .NET 2003 can help you make sure that does not happen.
The .NET platform realizes Microsoft's vision for the next paradigm in Windows computing: multiple programming languages interacting harmoniously, sharing an enriched object-based framework, and executed by a Common Language Runtime (CLR). This architecture provides an unprecedented degree of power and flexibility. Unfortunately, that flexible design inherently produces a problem for those wishing to hide their program's intellectual property. Programs in the .NET Framework are easy to reverse engineer. This is not in any way a fault in the design of .NET; it is simply a reality of modern, intermediate-compiled languages (Java suffers from this problem too). Both Java and .NET use expressive file syntax for delivery of executable code: bytecode in the case of Java, MSIL (Microsoft Intermediate Language) for .NET. Being much higher-level than binary machine code, the intermediate files are laden with identifiers and algorithms that are immediately observable and ultimately understandable. After all, it is obviously difficult to make something easy to understand, flexible, and extensible while simultaneously hiding its crucial details.
So anyone with a copy of ILDASM or, better yet, one of the commercial .NET decompilers can look at your assemblies and reverse engineer your source code. Suddenly, your software licensing code, copy protection mechanisms, and proprietary business logic are much more available for all to see, whether it's legal or not. Anyone can peruse the details of your software for whatever reason. They can search for security flaws to exploit, steal unique ideas, crack programs, and so on. This should be enough to give you pause.
All of that said, this exposure need not be considered a showstopper. Organizations concerned about putting their intellectual property on the .NET Platform need to understand that there is a solution to help thwart reverse engineering. Obfuscation is a technique that provides for seamless renaming of symbols in assemblies, as well as other tricks to foil decompilers. Properly applied, obfuscation can increase the protection against decompilation by many orders of magnitude, while leaving the application intact. Obfuscation is commonly used in Java environments and for years has helped companies feel safe about protecting their intellectual property when they release their Java-based products.
As you'd expect, Microsoft isn't passively watching as this issue develops. As of Visual Studio .NET 2003, they're including a "lite" version of PreEmptive Solutions' Dotfuscator, accessible from the toolbar. Microsoft is known for treating developers like important customers (which they are), and they're not missing the boat on this either. They are providing a solution right out of the box. This article delves into the world of .NET obfuscation. Along the way, you will develop an understanding of how obfuscation is successfully applied.
Background
Obfuscation is the technology of shrouding the facts. It's not encryption, but in the context of .NET (or Java) code, it might be better. Early in Java's life, several companies produced encrypting class loaders to fully encrypt Java classes. Decryption was done just in time, prior to execution. Although it made classes completely unreadable, this methodology suffered from a classic encryption flaw; it needed to keep the decryption key with the encrypted data. Therefore, an automated utility could be created to decrypt the code and put it out to disk. Once that happens, the fully unencrypted, unobfuscated code is in plain view.
As another illustration, you could compare encryption to locking a six-item meal into a lockbox. Only the intended diner (i.e., the Common Language Runtime) has the key, and we don't want anyone else to know what he or she is going to eat. Unfortunately, if someone can pick the lock (or find the key hidden on the bottom of the box), the food is in plain view. Obfuscation works more like putting the six-item meal into a blender and sending it to the diner in a baggie. Sure, everyone can see the food in transit, but besides a lucky pea or some beef-colored goop, they don't know what the original meal is. The diner still gets the intended delivery and the meal still provides the same nutritional value it did before (luckily, CLRs aren't picky about taste). The trick of an obfuscator is to confuse observers, while still giving CLRs the same delivery.
Without argument, obfuscation (or even encryption) is not 100% protection. Even compiled C++ can be disassembled. If a hacker is persistent enough, he or she can find the meaning of your code. Also, humans write and employ decompilers to automate decompilation algorithms that are too challenging for the mind to follow. It is safe to say that any obfuscator that confuses a decompiler will pose even more of a deterrent to a less-capable human attempting the same undertaking. The goal of obfuscation is to form a barrier that knocks out as many would-be reverse engineers as possible by creating confusion.
As confusion builds, the ability of the human mind to comprehend multifaceted intellectual concepts deteriorates. Note that this precept says nothing about altering the forward (executable) logic, only about representing it incomprehensibly. When an obfuscator goes to work on readable program instructions, a possible side effect is that the output will not only confuse a human interpreter, it will stop a decompiler. While the forward logic has been preserved, the reverse semantics have been rendered nondeterministic. As a result, any attempt to reverse engineer the instructions into a "programming dialect" like C# or VB will likely fail because the translation is ambiguous. Deep obfuscation creates a myriad of decompilation possibilities, some of which might produce incorrect logic if recompiled. The decompiler, as a computing machine, has no way of knowing which of the possibilities could be recompiled with valid semantics.
Issues
The obvious concern getting the most buzz in .NET developer circles is the threat of intellectual property theft. We hear this discussed at conferences and see it as a forum topic in online newsgroups. The developer community is concerned for good reason. They intend to produce commercial Windows software with .NET and this is a very competitive industry. The barriers to entry are low. Anyone with skill, hardware, and some basic tools can begin to create programs that have the potential to enter the competitive arena. For reasons just explained, .NET introduces the possibility that competitors can inspect your code. Even if they don't copy it outright, they can certainly glean algorithms and constructs useful to their own endeavors, leaving you holding the bag.
A less obvious effect of MSIL readability is the exhibition of confidential constructs such as your software licensing, copy protection, or encryption code. The problem here is more subtle, but equally perilous. By exposing your security logic to the public, you are giving them a roadmap to cracking your algorithms.
The third issue is that of code bloat. .NET is fully object oriented. The world has come to a place that accepts this as the programming paradigm of choice; no argument there. One of the benefits of OOP is the ability to use class libraries to quickly bypass the development of tedious "plumbing" code. Instead, developers inherit from a coordinated set of classes that have been tested and offer a rich palette of functionality. In fact, this set might be richer than we need for a given application. Where does all that extra functionality go when you compile? It goes right into your application code. As post-compilation tools, obfuscators are in the perfect position to help us with this bloat. High-end obfuscators are available that remove unused code as a by-product of their multipass analysis. This expands the role of the obfuscator to include that of code size reducer.
The Basic Solution
Today, some commercial obfuscators employ a renaming technique that applies trivial identifiers. Typically, these can be as short as a single character. As the obfuscator processes the code, it selects the next available trivial identifier for substitution. This seemingly simple renaming scheme has a huge advantage over hashing or character-set offset: it cannot be reversed. While the program logic is preserved, the names become nonsense. At this point, the obfuscator has hampered human understanding to a large degree. Faced with identifiers like a, t.bb(), ct, and 2s(e4), it is a stretch to recover the semantic purpose behind concepts like invoiceID, address.print(), userName, and deposit(amount). Nevertheless, the program logic can be reverse engineered.
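The renaming scheme described above can be sketched in a few lines. This is only an illustration in Python, not the code of any commercial obfuscator; the function names and the sample identifiers are assumptions made for the example. The point it shows is that each new name depends solely on the order in which identifiers are encountered, so nothing about the original name survives in the output.

```python
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... in order (hypothetical naming sequence)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def rename(identifiers):
    """Map each distinct identifier to the next unused trivial name."""
    names = trivial_names()
    mapping = {}
    for ident in identifiers:
        if ident not in mapping:
            mapping[ident] = next(names)
    return mapping


# Sample identifiers, chosen only for illustration.
original = ["invoiceID", "address", "print", "userName", "deposit", "amount"]
mapping = rename(original)
for old, new in mapping.items():
    print(f"{old} -> {new}")
```

Unlike a hash or a character offset, the output name carries no information derived from the input name, which is why the substitution cannot be undone without the mapping itself.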
A deeper form of obfuscation uses Overload Induction, a patented algorithm devised by PreEmptive Solutions, Inc. (this scheme is included in the Visual Studio version). Trivial renaming is used; however, a crafty twist is added. Method identifiers are maximally overloaded after an exhaustive scope analysis. Instead of substituting one new name for each old name, Overload Induction will rename as many methods as possible to the same name. After this deep obfuscation, the logic, while not destroyed, is beyond comprehension. See for yourself. The simple example shown in Listings 1 and 2 gives you some idea of the power of the Overload Induction technique.
One of the things you probably noticed about the example is that the obfuscated code is more compact. A positive side effect of renaming is size reduction. For example, if you have a name that is 20 characters long, renaming it to a() saves a lot of space (specifically 19 characters). This also saves space by conserving string heap entries. Renaming everything to "a" means that "a" is stored only once, and each method or field renamed to "a" can point to it. Overload Induction enhances this effect because the shortest identifiers are continually reused. Typically, an Overload Induced project will have up to 35% of the methods renamed to a().
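To make the name-sharing idea concrete, here is a heavily simplified Python sketch. It is not PreEmptive's patented algorithm: it looks at a single class, ignores inheritance and the exhaustive scope analysis mentioned above, and assumes that two methods may share a name whenever their parameter lists differ. The method names and types are invented for the example.

```python
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... in order."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def overload_rename(methods):
    """Give each (name, param_types) pair the shortest new name that is not
    already taken by a method with the same parameter list."""
    used = {}      # new name -> set of parameter lists already using it
    mapping = {}
    for name, params in methods:
        for candidate in trivial_names():
            taken = used.setdefault(candidate, set())
            if params not in taken:
                taken.add(params)
                mapping[(name, params)] = candidate
                break
    return mapping


# Hypothetical methods of one class: (original name, parameter types).
methods = [
    ("getInvoiceTotal", ("int",)),
    ("printAddress", ()),
    ("deposit", ("decimal",)),
    ("lookupUser", ("string",)),
    ("withdraw", ("decimal",)),
]
mapping = overload_rename(methods)

# Rough stand-in for the string heap: each distinct name is stored once.
old_heap = sum(len(n) for n in {name for name, _ in methods})
new_heap = sum(len(n) for n in set(mapping.values()))
print(mapping)
print(old_heap, "characters of names before,", new_heap, "after")
```

In this sketch only withdraw needs a second name, because deposit already occupies a(decimal); every other method collapses onto a, and the distinct names that remain take far fewer characters to store.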
Obfuscators remove debug information and nonessential metadata from an MSIL file as they process it. Aside from enhancing protection and security, this also contributes to the size reduction of MSIL files.
It is important to understand that obfuscation is a process that is applied to compiled MSIL code, not source code. Your development environment and tools will not change to accommodate renaming. Source code is never altered, or even read, in any way. Obfuscated MSIL code is functionally equivalent to traditional MSIL code and will execute on the CLR with identical results. (The reverse, however, is not true. Even if it were possible to decompile strongly obfuscated MSIL, it would have significant semantic disparities when compared to the original source code.) Figure 1 shows the flow of the classic obfuscation process.
Solution Enhancements
One of the more advanced obfuscation techniques available today is Control-Flow obfuscation. This process synthesizes branching, conditional, and iterative constructs that produce valid forward logic, but yield nondeterministic semantic results when decompilation is attempted. All of the admonishments you ever heard about maintaining spaghetti code are working in your favor when you try to protect your intellectual property using Control-Flow obfuscation. Consider trying to understand the code in Listings 3 and 4 before and after Control-Flow obfuscation. It should be obvious that after Control-Flow obfuscation the reverse engineered code is very ugly at worst and incorrect (not recompilable) at best.
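As a rough analogy for what "valid forward logic" in a tangled shape looks like, the Python sketch below rewrites a small loop as a dispatch loop over arbitrary state numbers. Real Control-Flow obfuscation operates on MSIL branch instructions rather than source code, and the functions and state numbers here are invented for illustration; the only claim is that both versions compute the same result.

```python
def total_even(n):
    """Readable original: sum the even numbers in 0..n-1."""
    total = 0
    for i in range(n):
        if i % 2 == 0:
            total += i
    return total


def total_even_flattened(n):
    """Same forward logic, expressed as a state machine with jumbled states."""
    state, i, total = 3, 0, 0
    while True:
        if state == 3:        # loop test
            state = 7 if i < n else 9
        elif state == 7:      # parity test
            state = 1 if i % 2 == 0 else 4
        elif state == 1:      # accumulate
            total += i
            state = 4
        elif state == 4:      # increment
            i += 1
            state = 3
        elif state == 9:      # exit
            return total


# Both versions agree on every input tried here.
print(all(total_even(n) == total_even_flattened(n) for n in range(50)))
```

The structured loop and condition are no longer visible in the second version, which is the property that makes recovering clean high-level source harder.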
Another technique, string encryption, applies a simple encryption algorithm to any strings in your application that you desire. As mentioned before, any encryption (or specifically decryption) done at runtime is inherently insecure. That is, a smart hacker can eventually break it, but for strings present in customer code, it is worthwhile. Let's face it; if hackers want to get into your code, they don't blindly start searching renamed types. They probably do a search for "Invalid License Key", which points right to the code where license handling is performed. Searching on strings is incredibly easy. String encryption raises the bar for the casual hacker and deters that many more nonserious hackers. The algorithm typically incurs a tiny performance penalty at runtime, so make sure the option is fully configurable.
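A minimal sketch of the idea, in Python and under loud assumptions: the single-byte XOR key and the function names are hypothetical, and no particular product is known to use this exact scheme. It shows both halves of the trade-off described above: the stored bytes no longer match a text search, yet the key travels with the program, so a determined attacker can still recover the string.

```python
KEY = 0x5A  # hypothetical single-byte key; it ships inside the program


def encrypt(text):
    """Build-time step: turn a literal into bytes that a text search misses."""
    return bytes(b ^ KEY for b in text.encode("utf-8"))


def decrypt(blob):
    """Run-time step: recover the original literal just before it is used."""
    return bytes(b ^ KEY for b in blob).decode("utf-8")


stored = encrypt("Invalid License Key")

print(b"License" in stored)   # the searchable text is gone from the stored bytes
print(decrypt(stored))        # the program still sees the original message
```

The decrypt call on every use is where the small runtime cost comes from, which is why it helps to be able to choose which strings are encrypted.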
An advanced feature called incremental obfuscation is of particular interest to enterprise development teams maintaining an integrated application environment. By generating name-mapping records during an obfuscation run, obfuscated API names can be reapplied and preserved in successive runs. A partial build can be done with the full expectation that its access points will be renamed the same as a prior build. As a result, the distributed patch files integrate into the previously deployed system without a hitch.
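The bookkeeping behind this can be sketched as follows. The Python below is an illustration only: the JSON map format, the function names, and the sample identifiers are all assumptions, not the map file format of any real tool. A second run loads the earlier map, keeps every existing assignment, and hands out new names only to identifiers it has not seen before.

```python
import json
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... in order."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def obfuscate(identifiers, previous_map=None):
    """Reuse names from previous_map; assign fresh names to new identifiers."""
    mapping = dict(previous_map or {})
    taken = set(mapping.values())
    fresh = (n for n in trivial_names() if n not in taken)
    for ident in identifiers:
        if ident not in mapping:
            mapping[ident] = next(fresh)
    return mapping


# First release: obfuscate and save the name map (kept in a string here).
release_1 = obfuscate(["Invoice", "Invoice.Total", "Customer"])
saved_map = json.dumps(release_1)

# Patch build: one new member, existing names must stay the same.
release_2 = obfuscate(
    ["Customer", "Invoice", "Invoice.Total", "Invoice.Tax"],
    previous_map=json.loads(saved_map),
)
print(release_2)
```

Because Invoice and Customer keep the names they had in the first release, a patched assembly built this way can still be called by the assemblies already deployed.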
Last, obfuscators can accomplish size reduction by analyzing your application and removing code your program is not using. It seems odd that unused-code removal can actually do anything; who writes code they don't use? Well, the answer is all of us. What's more, we all use libraries and types written by other people that were written to be reusable. Reusable code implies there is contingent code that handles many cases; however, in any given application you typically only use one or two of those many cases. An advanced obfuscator can figure that out and rip out all the unused code (from compiled MSIL, not the source). The result is that the output contains precisely the types and methods your application needs, nothing more. Amazing space reduction can be achieved, conserving computing resources and reducing instantiation times. This can be especially important for .NET Compact Framework or remotely deployed applications.
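One common way to find unused code is a reachability walk over the call graph, sketched below in Python. The call graph, the method names, and the single entry point are invented for the example, and a real tool working on MSIL would also have to account for things this sketch ignores, such as reflection and virtual dispatch.

```python
from collections import deque


def reachable(call_graph, entry_points):
    """Return every method reachable from the entry points (breadth-first)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical application: a reusable parser library with an unused XML path.
call_graph = {
    "App.Main": ["Report.Print", "Parser.ParseCsv"],
    "Report.Print": ["Formatter.Pad"],
    "Parser.ParseCsv": [],
    "Parser.ParseXml": ["Xml.Tokenize"],
    "Xml.Tokenize": [],
    "Formatter.Pad": [],
}

kept = reachable(call_graph, ["App.Main"])
removed = sorted(set(call_graph) - kept)
print("kept:", sorted(kept))
print("removed:", removed)
```

Here the XML parsing path is never called from the entry point, so it is the part a pruning pass would drop.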
Conclusion
Microsoft's .NET Framework provides one of the best software development platforms available today. Expect all Windows developers (and even some non-Windows developers) to eventually make the switch to .NET. Given this reality, the next step is to address any concerns you might have about protecting your code from reverse engineering. Obviously, this need not be considered a risk or a showstopper; the problem is solved. To get started using an obfuscator, consider downloading a free copy of Dotfuscator Community Edition at www.preemptive.com/dotfuscator or use it right from the Tools menu of Microsoft's Visual Studio .NET 2003 (see Figure 2). Should you want more powerful obfuscation and size reduction, you can upgrade to PreEmptive's Dotfuscator Professional Edition. You may never know what an obfuscator is worth until you ship without one!
Copyright © 2003 SYS-CON Media, Inc. — All Rights Reserved.
More Stories By Gabriel Torok
Gabriel Torok is a founding principal at PreEmptive Solutions, Inc. He is a book author and active national conference speaker. He is directly involved in most aspects of the business, with a primary focus on product development, and sales and marketing. In addition to company management, he remains active in teaching Java, .NET and related technologies.