Friday, October 4, 2013

Think of Base64 Data as Plain Text

I think there is still confusion in the minds of many developers about Base64 encoding.  It's not encryption.  It doesn't offer any protection of the data at all.  It's trivial to decode with Burp Suite, a desktop utility, or a website such as this.

When testing a web application for security, be sure to decode any Base64-encoded strings you see (often they will have one or two "=" at the end).  You may find the resulting data is gibberish, which probably means it's encrypted or hashed.  Or you may find sensitive user data developers were hoping to hide.  You also might find technical information about the deployment infrastructure or a clue that leads you to another area of the application that has a security hole.

What is Base64 encoding?  Just a way to encode ANY data so it can be represented as plain ASCII text.  This is convenient in many situations.  Email attachments are sent as Base64 data for example.  For every 3 bytes of data, encoding will get you 4 ASCII characters.  It works by first concatenating ("smooshing") the data together in binary form, then chunking it up into 6 bit sequences.  Padding with zeros is done if needed so the data is a multiple of 3 bytes. Each 6 bit sequence is converted to an ASCII character according to a standard Base64 encoding table.

Here is an example of Base64 encoding in action.

1) Start with some data. It could be Hex, Binary, ASCII, and so on.  Let's use the word "Hi".

2) Convert each byte to binary. We need to add one zero for padding in this case.
H : ascii code 72 / binary 01001000
i : ascii code 105 / binary 01101001
0 : binary 00000000

3) Smoosh the binary data together:
010010000110100100000000

4) Chunk it up into 6 bit sequences:
010010 000110 100100 000000
The decimal equivalent is:
18  6  36  0

5) Convert to Base64 using the table below. Any artificial trailing zero is represented by a "=".

The Base64 string is:  SGk=