Base64 is one of those encoding schemes that developers encounter constantly but rarely think about deeply. It shows up in data URIs, email attachments, API responses, and authentication tokens. Understanding how it actually works — not just calling btoa() and moving on — helps you make better decisions about when to use it and when to avoid it.
What Base64 Is and Why It Exists
Base64 is a binary-to-text encoding scheme that represents binary data using a set of 64 printable ASCII characters. It was designed to solve a specific problem: many transport protocols and storage systems were built to handle text, not arbitrary binary data. Email (SMTP), for example, was originally a 7-bit ASCII protocol. If you wanted to send a JPEG image through email, you needed a way to represent those binary bytes as safe, printable text characters.
The name "Base64" comes from the fact that it uses 64 distinct characters to represent data. Compare this with Base10 (decimal, 10 digits), Base16 (hexadecimal, 16 characters), or Base2 (binary, 2 digits). The higher the base, the more compact the representation. Base64 strikes a balance between compactness and using only characters that survive intact through text-based systems.
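To make the compactness claim concrete, the sketch below writes the same three bytes (the ASCII codes for "Cat") in base 2, base 16, and base 64 and compares the lengths. The variable names are illustrative only, and the snippet assumes an environment where btoa() is available as a global.

```javascript
// The three bytes of "Cat" as plain numbers.
const bytes = [67, 97, 116];

// Base 2: eight digits per byte.
const asBinary = bytes.map((b) => b.toString(2).padStart(8, "0")).join("");
// Base 16: two digits per byte.
const asHex = bytes.map((b) => b.toString(16).padStart(2, "0")).join("");
// Base 64: four characters per three bytes.
const asBase64 = btoa(String.fromCharCode(...bytes));

console.log(asBinary.length, asHex.length, asBase64.length); // 24 6 4
```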
The Algorithm Step by Step
The Base64 encoding process converts every 3 bytes (24 bits) of input into 4 characters of output. Here is the step-by-step breakdown:
- Take 3 bytes of input. Each byte is 8 bits, so 3 bytes give you 24 bits total.
- Split the 24 bits into four 6-bit groups. Since 2^6 = 64, each 6-bit group maps to exactly one of the 64 characters in the Base64 alphabet.
- Map each 6-bit value to a character using the Base64 index table.
- Repeat for the next 3 bytes until the input is exhausted.
Let us encode the string "Cat" as a concrete example. The ASCII values are: C = 67, a = 97, t = 116.
Step 1: Convert to binary (8 bits each)
C = 01000011
a = 01100001
t = 01110100
Step 2: Concatenate all 24 bits
010000110110000101110100
Step 3: Split into 6-bit groups
010000 | 110110 | 000101 | 110100
Step 4: Convert each group to decimal
16 | 54 | 5 | 52
Step 5: Look up in Base64 alphabet
Q | 2 | F | 0
Result: "Cat" encodes to "Q2F0"
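The walkthrough above can be sketched in a few lines of JavaScript. This is a minimal illustration that handles exactly one 3-byte group. The names ALPHABET and encodeTriplet are made up for this example and are not part of any standard API.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes (each 0-255) into four Base64 characters.
function encodeTriplet(b0, b1, b2) {
  // Steps 1-2: pack the three bytes into one 24-bit integer.
  const bits = (b0 << 16) | (b1 << 8) | b2;
  // Steps 3-4: slice out four 6-bit groups, most significant first.
  const indices = [
    (bits >> 18) & 0x3f,
    (bits >> 12) & 0x3f,
    (bits >> 6) & 0x3f,
    bits & 0x3f,
  ];
  // Step 5: look each index up in the alphabet.
  return indices.map((i) => ALPHABET[i]).join("");
}

console.log(encodeTriplet(67, 97, 116)); // "Q2F0"
```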
The 64-Character Alphabet
The standard Base64 alphabet (defined in RFC 4648) consists of:
- A-Z (indices 0-25): uppercase letters
- a-z (indices 26-51): lowercase letters
- 0-9 (indices 52-61): digits
- + (index 62): plus sign
- / (index 63): forward slash
The = character is used for padding (not part of the 64 encoding characters). It serves a special purpose in handling input that is not a multiple of 3 bytes.
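As a quick check of the index ranges listed above, the following sketch builds the alphabet from its ranges and derives a reverse lookup table of the kind a decoder would need. The helper range and the names alphabet and indexByChar are hypothetical, chosen only for this illustration.

```javascript
// Build a run of consecutive characters, e.g. range("A", 26) -> "ABC...Z".
function range(first, count) {
  return Array.from({ length: count }, (_, i) =>
    String.fromCharCode(first.charCodeAt(0) + i)
  ).join("");
}

// Indices 0-25, 26-51, 52-61, then 62 and 63.
const alphabet = range("A", 26) + range("a", 26) + range("0", 10) + "+/";

// Reverse table for decoding: character -> 6-bit index.
const indexByChar = new Map([...alphabet].map((ch, i) => [ch, i]));

console.log(alphabet.length);      // 64
console.log(indexByChar.get("Q")); // 16
console.log(indexByChar.get("/")); // 63
```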
Padding with the = Character
Since Base64 processes input in 3-byte chunks, what happens when the input length is not divisible by 3? Padding handles this:
- Input is 1 byte (8 bits): Pad with four zero bits to get 12 bits, which yields 2 Base64 characters. Append == to indicate 2 bytes of padding were added.
- Input is 2 bytes (16 bits): Pad with two zero bits to get 18 bits, which yields 3 Base64 characters. Append = to indicate 1 byte of padding.
- Input is 3 bytes: No padding needed. The output is exactly 4 characters.
For example, encoding "Ca" (only 2 bytes): the binary is 01000011 01100001. We add two zero bits to get three 6-bit groups: 010000 110110 000100, which maps to Q2E=. The trailing = tells the decoder that the last group only contained meaningful data in its first four bits.
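The padding rules can be folded into a small hand-written encoder. The sketch below is for illustration only (in practice you would use a built-in). The names B64, encodeBytes, and ascii are assumptions of this example, and the ascii helper is only meant for ASCII input.

```javascript
const B64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of byte values (0-255) to padded Base64.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // bytes available for this chunk
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes act as zero bits
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const bits = (b0 << 16) | (b1 << 8) | b2;

    out += B64[(bits >> 18) & 0x3f];
    out += B64[(bits >> 12) & 0x3f];
    // Groups that hold no input bits at all are written as "=".
    out += remaining > 1 ? B64[(bits >> 6) & 0x3f] : "=";
    out += remaining > 2 ? B64[bits & 0x3f] : "=";
  }
  return out;
}

// Helper for the examples: ASCII string -> array of byte values.
const ascii = (s) => Array.from(s, (ch) => ch.charCodeAt(0));

console.log(encodeBytes(ascii("C")));   // "Qw=="
console.log(encodeBytes(ascii("Ca")));  // "Q2E="
console.log(encodeBytes(ascii("Cat"))); // "Q2F0"
```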
The 33% Size Overhead
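Every 3 bytes of input become 4 characters of output, so Base64-encoded data is about 33% larger than the original binary data (slightly more for very short inputs, because padding rounds the output up to a multiple of 4 characters). The exact formula for padded output is: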
output_length = 4 * ceil(input_length / 3)
A 1 MB image becomes roughly 1.33 MB when Base64-encoded. A 10 KB file becomes about 13.3 KB. This overhead is significant and is the primary reason Base64 should not be used as a general-purpose data format. It is a trade-off: you gain compatibility with text-based systems at the cost of increased size.
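The formula translates directly into code. The function name base64Length below is hypothetical, and the sketch assumes padded output and 1 MB = 1024 * 1024 bytes.

```javascript
// Length in characters of the padded Base64 encoding of `inputBytes` bytes.
function base64Length(inputBytes) {
  return 4 * Math.ceil(inputBytes / 3);
}

console.log(base64Length(1));           // 4
console.log(base64Length(3));           // 4
console.log(base64Length(4));           // 8
console.log(base64Length(1024 * 1024)); // 1398104
```

The first line shows why the overhead is worst for tiny inputs: a single byte still costs four characters.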
Common Use Cases
Despite the size overhead, Base64 is invaluable in several scenarios:
- Data URIs: Embedding small images directly in HTML or CSS avoids an extra HTTP request. data:image/png;base64,iVBORw0KGgo... is a common pattern for icons under 2-3 KB.
- Email attachments: MIME encoding uses Base64 to attach binary files to text-based email messages.
- JSON API payloads: JSON does not support binary data natively. When an API needs to include binary content (images, files, certificates), Base64 encoding it into a JSON string is the standard approach.
- JWTs: JSON Web Tokens encode their header and payload using Base64url (a URL-safe variant) to create compact, text-safe tokens.
- Basic authentication: The HTTP Basic auth header encodes username:password as Base64. Note that this is encoding, not encryption, and it provides zero security on its own.
- Storing binary in text fields: When a database column or configuration file only accepts text, Base64 lets you store binary content like encryption keys or certificate data.
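Two of these use cases are short enough to sketch. The helper names basicAuthHeader and toDataUri are made up for this example, and the sketch assumes the credentials contain only Latin-1 characters, since it passes them straight to btoa().

```javascript
// Build the value of an HTTP Basic Authorization header.
function basicAuthHeader(username, password) {
  return "Basic " + btoa(username + ":" + password);
}

// Wrap already-encoded Base64 data in a data URI.
function toDataUri(mimeType, base64Data) {
  return "data:" + mimeType + ";base64," + base64Data;
}

console.log(basicAuthHeader("Aladdin", "open sesame"));
// "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
console.log(toDataUri("text/plain", btoa("Hello")));
// "data:text/plain;base64,SGVsbG8="
```

Anyone who sees the header can run atob() on it and read the password, which is why Basic auth is only appropriate over an encrypted connection.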
The Base64url Variant
Standard Base64 uses + and / characters, which have special meaning in URLs and file paths. The Base64url variant (also defined in RFC 4648) replaces them:
- + becomes - (hyphen)
- / becomes _ (underscore)
- Padding = is often omitted
JWTs use Base64url encoding specifically because tokens frequently appear in URLs, HTTP headers, and cookies where +, /, and = would cause problems.
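Converting between the two variants is a matter of character substitution plus padding. The following is a minimal sketch with hypothetical helper names (toBase64Url, fromBase64Url). It assumes well-formed input and does no validation.

```javascript
// Standard Base64 -> Base64url: swap the two URL-hostile characters, drop padding.
function toBase64Url(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Base64url -> standard Base64: swap back, then pad to a multiple of 4.
function fromBase64Url(b64url) {
  const b64 = b64url.replace(/-/g, "+").replace(/_/g, "/");
  return b64 + "=".repeat((4 - (b64.length % 4)) % 4);
}

// Bytes chosen so that the standard encoding contains both + and /.
const standard = btoa(String.fromCharCode(0xfb, 0xff, 0xfe));
console.log(standard);              // "+//+"
console.log(toBase64Url(standard)); // "-__-"

// Decoding a typical JWT header segment.
const headerSegment = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9";
console.log(atob(fromBase64Url(headerSegment)));
// {"alg":"HS256","typ":"JWT"}
```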
When NOT to Use Base64
Base64 is sometimes misused. Here are cases where you should avoid it:
- Large files: Embedding a 500 KB image as a data URI adds roughly 167 KB of overhead and prevents the browser from caching the image separately from the page that embeds it. Serve it as a separate file instead.
- Performance-critical paths: Encoding and decoding has a CPU cost. For high-throughput systems processing millions of messages, the overhead adds up.
- Security through obscurity: Base64 is trivially reversible. It is not encryption, not hashing, and not protection. Never use it to "hide" sensitive data.
- When binary transport is available: If your protocol supports binary data natively (WebSockets binary frames, HTTP multipart uploads, Protocol Buffers), use it directly instead of Base64-encoding into text.
JavaScript btoa() and atob() and Their Limitations
JavaScript provides built-in functions for Base64: btoa() encodes a string to Base64, and atob() decodes it. However, they have a significant limitation: they treat each character of the string as a single byte, so they only work with characters in the Latin-1 range (code points 0 to 255). Passing a string with any character outside that range to btoa() throws an error.
// This works:
btoa("Hello"); // "SGVsbG8="
// This throws "InvalidCharacterError":
btoa("Hello 😀"); // Error! Emoji is outside Latin-1
// The fix: encode to UTF-8 bytes first
function toBase64(str) {
  var bytes = new TextEncoder().encode(str);
  var binary = "";
  bytes.forEach(function(b) { binary += String.fromCharCode(b); });
  return btoa(binary);
}
function fromBase64(b64) {
  var binary = atob(b64);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder().decode(bytes);
}
The TextEncoder and TextDecoder APIs handle the UTF-8 conversion properly. This pattern ensures that any Unicode string can be safely Base64-encoded and decoded in the browser.
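One subtlety is worth illustrating: for characters that are inside Latin-1 but outside ASCII, btoa() does not throw, yet it produces a different result from the UTF-8 approach because it encodes one byte per character. The sketch below uses a hypothetical helper, utf8ToBase64, that follows the same encode-to-UTF-8-first pattern.

```javascript
// Hypothetical helper: UTF-8 encode first, then Base64.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

const eAcute = "\u00e9"; // "é", a Latin-1 character outside ASCII

console.log(btoa(eAcute));         // "6Q=="  (one byte: 0xE9)
console.log(utf8ToBase64(eAcute)); // "w6k="  (two UTF-8 bytes: 0xC3 0xA9)
```

Both outputs are valid Base64, but they decode to different bytes, so the encoder and the decoder need to agree on which convention is in use.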
Base64 is a fundamental building block of the web. It is simple, well-specified, and ubiquitous. The key is knowing when the 33% size trade-off is worth the compatibility benefits — and when it is not. For small payloads embedded in text-based contexts, Base64 is exactly the right tool. For large binary data, look for alternatives that keep the data in its native format.