Cover Story: Understanding Base64 Encoding
What it is, when to use it, and how to write custom Base64 encoding

The numBlocks variable represents the number of three-byte blocks in the input array, which is also the number of four-character blocks in the output string. The padBytes variable holds the number of "=" padding characters that I will have to place at the end of the output string to bring its size up to an exact multiple of 4. If the size of the input array is an exact multiple of 3, then the number of blocks is just the size of the input divided by 3 and there is no padding. For example, if the input has size 27 bytes then my encoder will process 27 / 3 = 9 three-byte blocks and produce a string with 9 * 4 = 36 characters. If the input size is not an exact multiple of 3 then there is one extra block and either one or two padding characters are required. I do a rudimentary validation check:

if (padBytes < 0 || padBytes > 2)
     throw new Exception("Fatal logic error in padding code");
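As a rough illustration of the arithmetic described above, here is a minimal sketch of one way the two values could be computed from the input length. The class name BlockMath and the helper names NumBlocks and PadBytes are hypothetical and exist only for this example; the article's own computation may differ.

```csharp
using System;

public static class BlockMath
{
    // Hypothetical helper: number of three-byte blocks needed to hold 'length' input bytes.
    public static int NumBlocks(int length)
    {
        return (length + 2) / 3; // integer ceiling of length / 3
    }

    // Hypothetical helper: number of '=' characters needed at the end of the output (0, 1, or 2).
    public static int PadBytes(int length)
    {
        return (3 - length % 3) % 3;
    }

    public static void Main()
    {
        foreach (int length in new[] { 27, 28, 29 })
        {
            Console.WriteLine("{0} bytes -> {1} blocks, {2} output chars, {3} pad",
                length, NumBlocks(length), NumBlocks(length) * 4, PadBytes(length));
        }
    }
}
```

For 27 input bytes the sketch reports 9 blocks, 36 output characters, and no padding, which matches the example above. For 28 and 29 bytes it reports 10 blocks with two and one padding characters respectively.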

Because this article is primarily a tutorial, I have omitted most of the error-checking code for clarity. In a production system you will need to add a lot of additional error checks. Next, I set up three arrays where most of the encoding work is done:

byte[] newValue = new byte[numBlocks * 3];
for (int i = 0; i < newValue.Length; ++i) // not really necessary
     newValue[i] = 0;
for (int i = 0; i < value.Length; ++i)
     newValue[i] = value[i];

byte[] resultBytes = new byte[numBlocks * 4];
char[] resultChars = new char[numBlocks * 4];

I declare a byte array newValue which will be a copy of the input byte array, but expanded in size up to an exact multiple of three bytes if necessary. I do this so I can process three bytes of input at a time. I explicitly zero out newValue, but this is not necessary because in C# the elements of a newly allocated array are automatically set to the default value for the element type (which is 0 for byte). Next I copy the original input bytes into the working array newValue. Then I declare a byte array named resultBytes with size four times the number of input blocks. As explained earlier, each three bytes of input produces four characters of output. The resultBytes array will hold the output characters in byte form pending their conversion to characters. The resultChars array will hold the Base64 encoded string result except that it may need padding with one or two "=" characters. The main processing loop iterates through each block of input:

for (int i = 0; i < numBlocks; i++)
{
     resultBytes[i * 4 + 0] =
         (byte)((newValue[i * 3 + 0] & 0xFC) >> 2);

     resultBytes[i * 4 + 1] =
         (byte)((newValue[i * 3 + 0] & 0x03) << 4 |
            (newValue[i * 3 + 1] & 0xF0) >> 4);

     resultBytes[i * 4 + 2] =
         (byte)((newValue[i * 3 + 1] & 0x0F) << 2 |
            (newValue[i * 3 + 2] & 0xC0) >> 6);

     resultBytes[i * 4 + 3] =
         (byte)((newValue[i * 3 + 2] & 0x3F));
}

Here is where most of the work is performed. There aren't many lines of code here but they're a bit tricky. The process is best explained with a diagram as shown in Figure 3. To obtain the first character of output, I need to extract the leftmost six bits of the first byte of the input. To do this I can mask by bitwise ANDing (in C#, with the "&" operator) with value 0xFC, which is 1111 1100 in binary. Now if I perform a right shift of two bits (using " >> 2"), I will have the leftmost six bits. The other bitwise operations are similar, and if you trace through the masking and bit-shifting code with paper and pencil you'll see how each byte of output is determined. Once I have the output in byte form, I can compute the equivalent character form using my base64Chars lookup table:

for (int i = 0; i < numBlocks * 4; ++i)
     resultChars[i] = base64Chars[resultBytes[i]];
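To make the masking and shifting concrete, the following sketch traces a single three-byte block through the same operations and then through a lookup table. The class name BlockTrace and the method SplitBlock are hypothetical, and the Base64Chars string is assumed to be the standard Base64 alphabet; the article's base64Chars table may be declared differently.

```csharp
using System;

public static class BlockTrace
{
    // Assumption: the standard Base64 alphabet (A-Z, a-z, 0-9, +, /).
    public const string Base64Chars =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Splits three input bytes into four 6-bit values using the same
    // masks and shifts as the main processing loop.
    public static byte[] SplitBlock(byte b0, byte b1, byte b2)
    {
        return new byte[]
        {
            (byte)((b0 & 0xFC) >> 2),                    // top six bits of b0
            (byte)((b0 & 0x03) << 4 | (b1 & 0xF0) >> 4), // low two of b0, top four of b1
            (byte)((b1 & 0x0F) << 2 | (b2 & 0xC0) >> 6), // low four of b1, top two of b2
            (byte)(b2 & 0x3F)                            // low six bits of b2
        };
    }

    public static void Main()
    {
        // 0x5F 0xC9 0xBF is 01011111 11001001 10111111 in binary.
        byte[] indices = SplitBlock(0x5F, 0xC9, 0xBF);
        foreach (byte index in indices)
            Console.WriteLine("{0,2} -> {1}", index, Base64Chars[index]);
    }
}
```

Regrouping the 24 input bits six at a time gives 010111, 111100, 100110, and 111111, that is, the values 23, 60, 38, and 63, which the assumed table maps to X, 8, m, and /.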

Now all that's left to do is to pad the trailing output characters with "=" where necessary:

if (padBytes == 0)
     ;
else if (padBytes == 1)
     resultChars[numBlocks * 4 - 1] = '=';
else if (padBytes == 2)
{
     resultChars[numBlocks * 4 - 1] = '=';
     resultChars[numBlocks * 4 - 2] = '=';
}

I use the padBytes value I computed earlier and add either two, one, or zero "=" characters at the end of the result char array. The null statement when padBytes has value 0 is a bit ugly and you can leave it out if you wish. I finish the encoding routine by converting the result char array to a string using the overloaded String object constructor, which accepts a character array, and then I return the result string:

string s = new string(resultChars);
return s;

With the custom encoder in place, you can write code that mirrors encoding using the .NET Framework methods. For example:

byte[] input = new byte[] { 0x5F, 0xC9, 0xBF, 0x17 };
string output = MyConverter.ToBase64String(input);
Console.WriteLine(output);
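One way to check a custom encoder is to compare its output with the .NET Framework method on the same input. The sketch below is a condensed reassembly of the steps described in this article, not the article's own listing: the class name SketchConverter, the way numBlocks and padBytes are computed, and the standard alphabet in Base64Chars are all assumptions made for illustration.

```csharp
using System;

public static class SketchConverter
{
    // Assumption: the standard Base64 alphabet.
    private const string Base64Chars =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    public static string ToBase64String(byte[] value)
    {
        int numBlocks = (value.Length + 2) / 3;     // three-byte blocks, rounded up
        int padBytes = (3 - value.Length % 3) % 3;  // 0, 1, or 2

        // Working copy padded with zero bytes up to a whole number of blocks.
        byte[] padded = new byte[numBlocks * 3];
        Array.Copy(value, padded, value.Length);

        char[] result = new char[numBlocks * 4];
        for (int i = 0; i < numBlocks; i++)
        {
            byte b0 = padded[i * 3], b1 = padded[i * 3 + 1], b2 = padded[i * 3 + 2];
            result[i * 4 + 0] = Base64Chars[(b0 & 0xFC) >> 2];
            result[i * 4 + 1] = Base64Chars[(b0 & 0x03) << 4 | (b1 & 0xF0) >> 4];
            result[i * 4 + 2] = Base64Chars[(b1 & 0x0F) << 2 | (b2 & 0xC0) >> 6];
            result[i * 4 + 3] = Base64Chars[b2 & 0x3F];
        }

        // Overwrite the trailing characters that came only from padding bytes.
        for (int p = 1; p <= padBytes; p++)
            result[result.Length - p] = '=';

        return new string(result);
    }

    public static void Main()
    {
        byte[] input = { 0x5F, 0xC9, 0xBF, 0x17 };
        Console.WriteLine(ToBase64String(input));         // X8m/Fw==
        Console.WriteLine(Convert.ToBase64String(input)); // X8m/Fw==
    }
}
```

The four input bytes fill one complete block and one block containing a single real byte, so the result is eight characters long and ends with two "=" characters.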

The Custom Base64 Decoder
In most situations a custom Base64 encoder is useless without its corresponding decoder. Listing 2 presents one way to write a Base64 decoder. Because the concepts involved in decoding are essentially the same as those for encoding, I won't go over the decoding implementation in detail.

The private ValueOf() method accepts one of the Base64 characters and returns the numeric value that corresponds to the lookup table in the ToBase64String() method. For example, if the input character is 'A', the helper method will return 'A' - 65 = 65 - 65 = 0. If you write a custom Base64 encoder with a different character set, then you'll have to modify the logic in ValueOf() accordingly.
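For readers without Listing 2 at hand, here is a minimal sketch of what such a helper might look like. It assumes the standard Base64 alphabet and is not the article's actual implementation; the class name DecoderSketch is hypothetical, and the method is made public here only so it can be called from outside the class. The "=" padding character is deliberately not handled and is assumed to be dealt with by the caller.

```csharp
using System;

public static class DecoderSketch
{
    // Maps a Base64 character back to its 6-bit value, assuming the
    // standard alphabet: A-Z = 0..25, a-z = 26..51, 0-9 = 52..61, + = 62, / = 63.
    public static int ValueOf(char c)
    {
        if (c >= 'A' && c <= 'Z') return c - 65; // 'A' is 65, so 'A' -> 0
        if (c >= 'a' && c <= 'z') return c - 71; // 'a' is 97, so 'a' -> 26
        if (c >= '0' && c <= '9') return c + 4;  // '0' is 48, so '0' -> 52
        if (c == '+') return 62;
        if (c == '/') return 63;
        throw new ArgumentException("Not a Base64 character: " + c);
    }

    public static void Main()
    {
        foreach (char c in "AZaz09+/")
            Console.WriteLine("{0} -> {1}", c, ValueOf(c));
    }
}
```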

Conclusion
The most common use of Base64 encoding is to send binary data over e-mail in MIME format. This particular type of Base64 encoding is specified in RFC 2045, and the closely related encoding used by Privacy Enhanced Mail is specified in RFC 1421. Because Base64 encoding is so often associated with MIME, it is easy to incorrectly assume that this is the only kind of Base64 encoding. If you encounter Base64 encoding in a system or specification, make sure you clearly determine what particular flavor of Base64 encoding is being used. For example, MIME Base64 encoding specifies that the encoded output stream must be represented in lines of no more than 76 characters each. However, a generic Base64 encoding scheme may not have this restriction.
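The .NET Framework can produce line-limited output of this kind through the Base64FormattingOptions.InsertLineBreaks option of Convert.ToBase64String(). The sketch below shows the effect on a 100-byte input; the class name LineBreakDemo and the method EncodeForMime are hypothetical names used only for this example.

```csharp
using System;

public static class LineBreakDemo
{
    // Encodes with a line break inserted after every 76 output characters.
    public static string EncodeForMime(byte[] data)
    {
        return Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
    }

    public static void Main()
    {
        byte[] data = new byte[100]; // 100 zero bytes encode to 136 Base64 characters

        string wrapped = EncodeForMime(data);
        string[] lines = wrapped.Split(new[] { "\r\n" }, StringSplitOptions.None);

        foreach (string line in lines)
            Console.WriteLine("{0} chars: {1}", line.Length, line);
    }
}
```

Removing the inserted line breaks gives back exactly the string that the single-argument Convert.ToBase64String() overload produces, so the two forms carry the same data.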

The .NET Framework Convert.ToBase64String() and Convert.FromBase64String() methods will meet the majority of your Base64 needs. However, knowing how to implement a custom scheme may be useful in several situations. One possible scenario is that you inherit a legacy system with a custom encoding scheme and you need to decode data from that system. Another possible use of a custom Base64 encoding scheme is to provide rudimentary obfuscation of data. If you use a custom scheme to encode data being transmitted over an open communications channel, you can scramble your data. Of course this is by no means data encryption or a security mechanism - it's just a way to deter casual inspection of your data.
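As a rough sketch of the obfuscation idea, the example below keeps the standard encoding logic but substitutes a different 64-character table. The reversed alphabet, the class name CustomAlphabetSketch, and its method names are all hypothetical choices made for illustration, and the result offers no real security.

```csharp
using System;
using System.Text;

public static class CustomAlphabetSketch
{
    private const string Standard =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical custom table: the standard alphabet in reverse order.
    private static readonly string Custom = ReverseOf(Standard);

    private static string ReverseOf(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Replaces each character of 'text' found in 'from' with the character
    // at the same position in 'to'; the '=' padding is left unchanged.
    private static string Translate(string text, string from, string to)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append(c == '=' ? c : to[from.IndexOf(c)]);
        return sb.ToString();
    }

    public static string Encode(byte[] data)
    {
        return Translate(Convert.ToBase64String(data), Standard, Custom);
    }

    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(Translate(text, Custom, Standard));
    }

    public static void Main()
    {
        byte[] input = { 0x5F, 0xC9, 0xBF, 0x17 };
        string scrambled = Encode(input);
        Console.WriteLine(scrambled);
        Console.WriteLine(BitConverter.ToString(Decode(scrambled)));
    }
}
```

A reader who assumes the standard table will decode the scrambled string to the wrong bytes, while Decode() recovers the original input.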

To summarize, Base64 encoding is a way to represent arbitrary binary data as a string composed of characters from a 64-character set. Base64 encoding is useful when you want to transmit binary data over a communication channel that is inherently text-based, such as SMTP or HTTP. Base64 encoding is more efficient in terms of encoding size than basic hexadecimal encoding: Base64 produces four characters for every three bytes of input, while hexadecimal produces two characters for every byte. The .NET Framework has simple and effective Base64 methods that will suit most of your needs. However, if you need to implement a custom Base64 scheme, you can use the custom implementation code presented in this article as a basis to get started.

About James McCaffrey
Dr. James McCaffrey works for Volt Information Sciences, Inc., where he manages technical training for software engineers working at Microsoft's Redmond, WA campus. He has worked on several Microsoft products, including Internet Explorer and MSN Search. James can be reached at jmccaffrey@volt.com or v-jammc@microsoft.com.
