Cover Story: Understanding Base64 Encoding

What it is, when to use it, and how to write custom Base64 encoding

The numBlocks variable represents the number of three-byte blocks in the input array, and also the number of four-character blocks in the output string. The padBytes variable holds the number of zero bytes that I will have to append to the input to bring its size up to an exact multiple of 3; the same number of "=" characters later replaces the tail of the output string. If the size of the input array is an exact multiple of 3, then the number of blocks is just the size of the input divided by 3 and there is no padding. For example, if the input has size 27 bytes then my encoder will process 27 / 3 = 9 three-byte blocks and produce a string with 9 * 4 = 36 characters. If the input is not an exact multiple of 3 then there is one extra block and either one or two padding bytes are required. I do a rudimentary validation check:

if (padBytes < 0 || padBytes > 2)
     throw new Exception("Fatal logic error in padding code");
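
The code that computes numBlocks and padBytes is not reproduced in this part of the article, so the following is only a minimal sketch of arithmetic that matches the description above. The class name BlockMath and the method name Compute are hypothetical, chosen for illustration.

```csharp
using System;

static class BlockMath
{
    // Hypothetical helper: returns how many three-byte blocks the encoder
    // processes and how many zero bytes must be appended to fill the last one.
    public static (int numBlocks, int padBytes) Compute(int inputLength)
    {
        if (inputLength % 3 == 0)
            return (inputLength / 3, 0);               // exact multiple of 3: no padding

        int numBlocks = inputLength / 3 + 1;           // one extra, partly filled block
        int padBytes = numBlocks * 3 - inputLength;    // always 1 or 2 on this path
        return (numBlocks, padBytes);
    }

    static void Main()
    {
        foreach (int size in new[] { 27, 28, 29 })
        {
            var (numBlocks, padBytes) = Compute(size);
            Console.WriteLine($"{size} bytes: {numBlocks} blocks, {padBytes} pad bytes, {numBlocks * 4} output characters");
        }
    }
}
```

For 27 bytes this reports 9 blocks, no padding, and 36 output characters, which agrees with the worked example in the text. Inputs of 28 and 29 bytes both need 10 blocks, with two and one padding bytes respectively.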

Because this article is primarily a tutorial, I have omitted most of the error-checking code for clarity. In a production system you will need to add a lot of additional error checks. Next, I set up three arrays where most of the encoding work is done:

byte[] newValue = new byte[numBlocks * 3];
for (int i = 0; i < newValue.Length; ++i) // not really necessary
     newValue[i] = 0;
for (int i = 0; i < value.Length; ++i)
     newValue[i] = value[i];

byte[] resultBytes = new byte[numBlocks * 4];
char[] resultChars = new char[numBlocks * 4];

I declare a byte array newValue which will be a copy of the input byte array, but expanded in size up to an exact multiple of three bytes if necessary. I do this so I can process three bytes of input at a time. I explicitly zero-out array newValue but this is not necessary, because a newly allocated array is filled with the default value of its element type (which is 0 for byte). Next I copy the original input bytes into the working array newValue. Then I declare a byte array named resultBytes with size four times the number of input blocks. As explained earlier, each three bytes of input produces four characters of output. The resultBytes array will hold the output in byte form, one six-bit value (0 to 63) per element, pending conversion to characters. The resultChars array will hold the Base64 encoded string result except that it may need padding with one or two "=" characters. The main processing loop iterates through each block of input:

for (int i = 0; i < numBlocks; i++)
{
     resultBytes[i * 4 + 0] =
          (byte)((newValue[i * 3 + 0] & 0xFC) >> 2);

     resultBytes[i * 4 + 1] =
          (byte)((newValue[i * 3 + 0] & 0x03) << 4 |
                 (newValue[i * 3 + 1] & 0xF0) >> 4);

     resultBytes[i * 4 + 2] =
          (byte)((newValue[i * 3 + 1] & 0x0F) << 2 |
                 (newValue[i * 3 + 2] & 0xC0) >> 6);

     resultBytes[i * 4 + 3] =
          (byte)(newValue[i * 3 + 2] & 0x3F);
}

Here is where most of the work is performed. There aren't many lines of code here but they're a bit tricky. The process is best explained with a diagram as shown in Figure 3. To obtain the first character of output, I need to extract the leftmost six bits of the first byte of the input. To do this I can mask by bitwise ANDing (in C#, with the "&" operator) with the value 0xFC, which is 1111 1100 in binary. Now if I shift the result right by two bits (using ">> 2"), I will have the leftmost six bits as a value between 0 and 63. The other operations are similar, and if you trace through the masking and bit shifting code with paper and pencil you'll see how each byte of output is determined. Once I have the output in byte form, I can compute the equivalent character form using my base64Chars lookup table:

for (int i = 0; i < numBlocks * 4; ++i)
     resultChars[i] = base64Chars[resultBytes[i]];
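
To make the paper-and-pencil trace concrete, here is a small sketch that applies the same masks and shifts to a single three-byte block. The class BlockTrace and its methods are hypothetical names, and the lookup table is assumed to be the standard Base64 alphabet (A-Z, a-z, 0-9, +, /), since the base64Chars definition is not shown in this part of the article.

```csharp
using System;

static class BlockTrace
{
    // Assumption: the standard Base64 alphabet.
    const string Base64Chars =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Splits three input bytes into four six-bit values (each 0..63),
    // using the same masks and shifts as the main processing loop.
    public static byte[] SplitBlock(byte b0, byte b1, byte b2)
    {
        return new byte[]
        {
            (byte)((b0 & 0xFC) >> 2),                      // top six bits of b0
            (byte)((b0 & 0x03) << 4 | (b1 & 0xF0) >> 4),   // low two of b0, top four of b1
            (byte)((b1 & 0x0F) << 2 | (b2 & 0xC0) >> 6),   // low four of b1, top two of b2
            (byte)(b2 & 0x3F)                              // low six bits of b2
        };
    }

    // Looks up the character for each six-bit value.
    public static string EncodeBlock(byte b0, byte b1, byte b2)
    {
        byte[] sixBit = SplitBlock(b0, b1, b2);
        char[] chars = new char[4];
        for (int i = 0; i < 4; ++i)
            chars[i] = Base64Chars[sixBit[i]];
        return new string(chars);
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", SplitBlock(0x5F, 0xC9, 0xBF)));  // 23, 60, 38, 63
        Console.WriteLine(EncodeBlock(0x5F, 0xC9, 0xBF));                    // X8m/
    }
}
```

The bytes 0x5F, 0xC9, 0xBF are 01011111 11001001 10111111 in binary. Regrouping those 24 bits into sixes gives 010111, 111100, 100110, 111111, which are the values 23, 60, 38, and 63.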

Now all that's left to do is to pad the trailing output characters with "=" where necessary:

if (padBytes == 0)
     ;
else if (padBytes == 1)
     resultChars[numBlocks * 4 - 1] = '=';
else if (padBytes == 2)
{
     resultChars[numBlocks * 4 - 1] = '=';
     resultChars[numBlocks * 4 - 2] = '=';
}

I use the padBytes value I computed earlier and overwrite either two, one, or zero characters at the end of the result char array with "=". The empty statement when padBytes has value 0 is a bit ugly and you can leave it out if you wish. I finish the encoding routine by converting the result char array to a string using the overloaded String object constructor, which accepts a character array, and then I return the result string:

string s = new string(resultChars);
return s;

With the custom encoder in place, you can write code that mirrors encoding using the .NET Framework methods. For example:

byte[] input = new byte[] { 0x5F, 0xC9, 0xBF, 0x17 };
string output = MyConverter.ToBase64String(input);
Console.WriteLine(output);
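
One way to check a custom encoder is to compare its output with Convert.ToBase64String() on the same input. The sketch below condenses the fragments above into a single method for that purpose; it is not the article's full listing. The block-count arithmetic and the alphabet table are assumptions (neither is shown in this part of the article), and the intermediate resultBytes array is folded into the lookup for brevity.

```csharp
using System;

static class MyConverter
{
    // Assumption: the lookup table is the standard Base64 alphabet.
    static readonly char[] base64Chars =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();

    public static string ToBase64String(byte[] value)
    {
        // Assumed arithmetic: round the input length up to whole three-byte blocks.
        int numBlocks = (value.Length + 2) / 3;
        int padBytes = numBlocks * 3 - value.Length;    // 0, 1, or 2

        byte[] newValue = new byte[numBlocks * 3];      // elements default to 0
        Array.Copy(value, newValue, value.Length);

        char[] resultChars = new char[numBlocks * 4];
        for (int i = 0; i < numBlocks; i++)
        {
            byte b0 = newValue[i * 3], b1 = newValue[i * 3 + 1], b2 = newValue[i * 3 + 2];
            resultChars[i * 4 + 0] = base64Chars[(b0 & 0xFC) >> 2];
            resultChars[i * 4 + 1] = base64Chars[(b0 & 0x03) << 4 | (b1 & 0xF0) >> 4];
            resultChars[i * 4 + 2] = base64Chars[(b1 & 0x0F) << 2 | (b2 & 0xC0) >> 6];
            resultChars[i * 4 + 3] = base64Chars[b2 & 0x3F];
        }

        // Overwrite the characters that came from padding bytes with '='.
        for (int i = 0; i < padBytes; i++)
            resultChars[resultChars.Length - 1 - i] = '=';

        return new string(resultChars);
    }

    static void Main()
    {
        byte[] input = { 0x5F, 0xC9, 0xBF, 0x17 };
        string custom = ToBase64String(input);
        Console.WriteLine(custom);                                    // X8m/Fw==
        Console.WriteLine(custom == Convert.ToBase64String(input));   // True
    }
}
```

The four input bytes fill one complete block and one third of a second block, so the output is eight characters long and ends in two "=" characters.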

The Custom Base64 Decoder
In most situations a custom Base64 encoder is useless without its corresponding decoder. Listing 2 presents one way to write a Base64 decoder. Because the concepts involved in decoding are essentially the same as those for encoding, I won't go over the decoding implementation in detail.

The private ValueOf() method accepts one of the Base64 characters and returns the numeric value that corresponds to the lookup table in the ToBase64String() method. For example, if the input character is 'A', the helper method will return 'A' - 65 = 65 - 65 = 0. If you write a custom Base64 encoder with a different character set, then you'll have to modify the logic in ValueOf() accordingly.
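
Listing 2 is not reproduced here, so the following is only a sketch of what such a helper might look like for the standard alphabet, together with the reverse bit manipulation for one four-character block. The class name DecodeSketch and the DecodeBlock method are hypothetical, and treating "=" as the value 0 is an assumption made for illustration.

```csharp
using System;

static class DecodeSketch
{
    // Maps a character of the standard Base64 alphabet back to its six-bit value.
    public static byte ValueOf(char c)
    {
        if (c >= 'A' && c <= 'Z') return (byte)(c - 'A');        // 'A' - 65 = 0
        if (c >= 'a' && c <= 'z') return (byte)(c - 'a' + 26);
        if (c >= '0' && c <= '9') return (byte)(c - '0' + 52);
        if (c == '+') return 62;
        if (c == '/') return 63;
        if (c == '=') return 0;   // assumption: padding contributes zero bits
        throw new ArgumentException("Not a Base64 character: " + c);
    }

    // Reassembles three bytes from four characters (the inverse of the encoder's
    // masks and shifts). A real decoder would also drop the bytes that
    // correspond to trailing '=' characters.
    public static byte[] DecodeBlock(string fourChars)
    {
        byte v0 = ValueOf(fourChars[0]), v1 = ValueOf(fourChars[1]);
        byte v2 = ValueOf(fourChars[2]), v3 = ValueOf(fourChars[3]);
        return new byte[]
        {
            (byte)(v0 << 2 | v1 >> 4),
            (byte)((v1 & 0x0F) << 4 | v2 >> 2),
            (byte)((v2 & 0x03) << 6 | v3)
        };
    }

    static void Main()
    {
        Console.WriteLine(ValueOf('A'));                                 // 0
        Console.WriteLine(BitConverter.ToString(DecodeBlock("X8m/")));   // 5F-C9-BF
    }
}
```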

Conclusion
The most common use of a Base64 encoding is to send binary data over e-mail in MIME format. The specifications for this particular type of Base64 encoding are contained in RFC 1421 and RFC 2045. Because Base64 encoding is so often associated with MIME, it is easy to incorrectly assume that this is the only kind of Base64 encoding. If you encounter Base64 encoding in a system or specification, make sure you clearly determine what particular flavor of Base64 encoding is being used. For example, MIME Base64 encoding specifies that the encoded output stream must be represented in lines of no more than 76 characters each. However, a generic Base64 encoding scheme may not have this restriction.
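
As an illustration of the line-length difference, the .NET Framework (version 2.0 and later) has a Convert.ToBase64String() overload that takes Base64FormattingOptions.InsertLineBreaks and breaks the output into lines of 76 characters separated by a carriage return and line feed. The sketch below covers only that one rule and does not claim full MIME conformance. The class name MimeLines and the method EncodeAsLines are hypothetical.

```csharp
using System;

static class MimeLines
{
    // Encodes with line breaks and returns the individual lines.
    public static string[] EncodeAsLines(byte[] data)
    {
        string wrapped = Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
        return wrapped.Split(new[] { "\r\n" }, StringSplitOptions.None);
    }

    static void Main()
    {
        byte[] data = new byte[100];   // 100 bytes -> 34 blocks -> 136 characters

        Console.WriteLine(Convert.ToBase64String(data).Length);   // unwrapped length
        foreach (string line in EncodeAsLines(data))
            Console.WriteLine(line.Length);                       // length of each wrapped line
    }
}
```

With 100 input bytes the unwrapped string has 136 characters, and the wrapped form splits into one line of 76 characters followed by one of 60.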

The .NET Framework Convert.ToBase64String() and Convert.FromBase64String() methods will meet the majority of your Base64 needs. However, knowing how to implement a custom scheme may be useful in several situations. One possible scenario is that you inherit a legacy system with a custom encoding scheme and you need to decode data from that system. Another possible use of a custom Base64 encoding scheme is to provide rudimentary obfuscation of data. If you use a custom scheme to encode data being transmitted over an open communications channel, you can scramble your data. Of course this is by no means data encryption or a security mechanism - it's just a way to deter casual inspection of your data.

To summarize, Base64 encoding is a way to represent arbitrary binary data as a string composed of characters from a 64-character set. Base64 encoding is useful when you want to transmit binary data over a communication channel that is inherently text-based, such as SMTP or HTTP. Base64 encoding is more efficient in terms of encoding size than basic hexadecimal encoding: it uses four characters for every three bytes, where hexadecimal uses two characters for every byte. The .NET Framework has simple and effective Base64 methods that will suit most of your needs. However, if you need to implement a custom Base64 scheme, you can use the custom implementation code presented in this article as a basis to get started.
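
The size comparison with hexadecimal can be checked with a few lines of code. This is a minimal sketch; the class name SizeComparison and its two length helpers are hypothetical, while BitConverter.ToString() and Convert.ToBase64String() are standard .NET Framework methods.

```csharp
using System;

static class SizeComparison
{
    // Hexadecimal text uses two characters per byte.
    public static int HexLength(int byteCount) => byteCount * 2;

    // Base64 uses four characters per three-byte block, rounding up to a whole block.
    public static int Base64Length(int byteCount) => (byteCount + 2) / 3 * 4;

    static void Main()
    {
        byte[] data = new byte[30];

        string hex = BitConverter.ToString(data).Replace("-", "");
        string b64 = Convert.ToBase64String(data);

        Console.WriteLine($"hex: {hex.Length} characters, Base64: {b64.Length} characters");
    }
}
```

For 30 bytes of input the hexadecimal string has 60 characters and the Base64 string has 40. Both are larger than the raw data, but Base64 grows the data by about one third while hexadecimal doubles it.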

More Stories By James McCaffrey

Dr. James McCaffrey works for Volt Information Sciences, Inc., where he manages technical training for software engineers working at Microsoft's Redmond, WA campus. He has worked on several Microsoft products, including Internet Explorer and MSN Search.


Most Recent Comments
trash_incinerator 02/19/10 01:17:00 PM EST

With all due respect to the author, your explanation for why Base64 encoding exists is wrong. The "string" of hex for your example is not comprised of 10 characters as you have indicated. Hex uses 2 digits to represent 8 bits. Four bits for the first digit and four bits for the second. There are, in fact, only 5 bytes of data there, whereas the Base64 encoded string is using 8 bytes.

Base64 is NOT a means of compressing data. In fact it makes the information being represented larger. The reason why this is sometimes necessary is because of the fact that systems assign special meanings to specific bytes or byte sequences. In XML for example, there are special bytes that are not considered valid characters in XML. In order to send information in an XML stream with characters that are not allowed, you have to replace the illegal bytes with legal ones. Hence, Base64 allows you to take arbitrary bytes and reassign them in a way that can be reversed later.

Kumanan Murugesan 04/16/08 10:07:55 AM EDT

Dr. James,
Wonderful article. I was wondering what this does and why is it required many times like other folks.

SYS-CON Belgium News Desk 03/19/06 10:04:38 AM EST

If you work in a .NET environment you have probably come across Base64 encoded data. For example, Base64 encoding is used in ASP.NET for a Web application's ViewState value, as shown in Figure 1. Base64 encoding is also used to transmit binary data over e-mail. However, if you are like most of my colleagues (and me until recently) you do not have a thorough understanding of precisely what Base64 encoding is and when Base64 encoding should be used. In this article I will explain exactly what Base64 encoding is, show you how to use the two primary .NET Framework methods that support Base64 encoding and decoding, and present a lightweight, custom C# implementation of Base64 encoding and decoding methods. This article assumes you are a .NET developer, tester, or manager and have intermediate-level C# coding skill. After reading the article you'll have a solid grasp of Base64 encoding as well as the ability to write your own custom encoding methods. I think you'll find the ability to use Base64 encoded data is a valuable addition to your skill set.
