Hash Functions Explained: MD5, SHA-256, bcrypt, and Why Choosing Wrong Costs Millions
A hash function transforms any input into a fixed-size fingerprint. It sounds simple, but picking the wrong algorithm has enabled state-sponsored cyberattacks, mass password leaks, and costly data breaches. Here's what every developer needs to understand about MD5, SHA-256, bcrypt, and Argon2.
Key Takeaways
- MD5 is broken — collisions computable in seconds; never use for security
- SHA-256 is excellent for data integrity but wrong for passwords — it's too fast
- bcrypt and Argon2 are designed for passwords — deliberately slow, GPU-resistant
- Git uses SHA-1 (and, optionally, SHA-256) hashes to create tamper-evident commit chains
- The birthday paradox explains why an n-bit hash offers only about n/2 bits of collision resistance, so you need 2× the bit length you'd expect
What Is a Cryptographic Hash Function?
A hash function takes any input — a single character, a 10 GB video file, or a database — and produces a fixed-size output called a hash, digest, or checksum. Three properties make it cryptographically useful:
Deterministic
Same input always produces exactly the same output. "Hello" always hashes to the same value.
One-way
You cannot reverse a hash to recover the original input — it's computationally infeasible by design.
Avalanche effect
Changing a single bit of input changes about 50% of the output bits on average. "Hello" and "hello" differ by a single bit, yet produce completely different hashes.
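Both determinism and the avalanche effect are easy to observe with Python's standard `hashlib` module. The sketch below hashes "Hello" and "hello" with SHA-256 and counts how many of the 256 output bits differ; the helper names `sha256_bits` and `bits_changed` are illustrative, not part of any library.

```python
import hashlib

def sha256_bits(text: str) -> int:
    """Return the SHA-256 digest of a UTF-8 string as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def bits_changed(a: str, b: str) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

# Deterministic: hashing the same input twice gives the same digest.
assert hashlib.sha256(b"Hello").hexdigest() == hashlib.sha256(b"Hello").hexdigest()

# "H" (0x48) and "h" (0x68) differ in exactly one bit...
assert bin(ord("H") ^ ord("h")).count("1") == 1

# ...yet roughly half of the 256 output bits flip.
print(hashlib.sha256(b"Hello").hexdigest())
print(hashlib.sha256(b"hello").hexdigest())
print(bits_changed("Hello", "hello"), "of 256 output bits differ")
```

Expect a count close to 128. The exact number is fixed for these two inputs but carries no special meaning.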
MD5: The Algorithm Everyone Still Uses (But Shouldn't)
MD5 was designed by Ron Rivest in 1991 and produces a 128-bit (32 hex character) hash. For most of the 1990s, it was the standard for data integrity and password storage. Then everything changed.
In 2004, cryptographers Xiaoyun Wang and Hongbo Yu demonstrated practical MD5 collision attacks: a way to create two different inputs that produce the same MD5 hash. In 2008, researchers went further and used an MD5 collision to create a rogue certificate authority (CA) certificate, proving that browser security could be undermined. The most damaging real-world exploit came in 2012: the "Flame" malware used an MD5 collision to forge a Microsoft code-signing certificate, letting it pose as a legitimate Windows Update in a sophisticated state-sponsored cyberattack against targets in the Middle East.
Today, MD5 collisions can be computed in seconds on a laptop, and security guidance has ruled MD5 out for security-sensitive uses for well over a decade. Yet it persists everywhere: developers who learned it in the 2000s still reach for it, old codebases rely on it, and it appears in millions of tutorials that were never updated.
Still OK to Use MD5 For:
- Checksums for file downloads over trusted channels (non-security verification)
- Non-security deduplication in databases
- Cache keys and hash maps
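For the non-security uses above, a minimal sketch of an MD5-based cache key might look like this. It assumes request parameters arrive as a JSON-serializable dict, and `cache_key` is a hypothetical helper. The `usedforsecurity=False` flag (Python 3.9+) records that MD5 is not protecting anything here.

```python
import hashlib
import json

def cache_key(params: dict) -> str:
    """Derive a compact cache key from request parameters (non-security use)."""
    # Serialize deterministically so equal dicts always produce the same bytes.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # usedforsecurity=False marks this as a non-security use; on restricted
    # (e.g. FIPS-mode) builds it may be required for MD5 to be available at all.
    return hashlib.md5(canonical.encode("utf-8"), usedforsecurity=False).hexdigest()

key_a = cache_key({"user": 42, "page": 3})
key_b = cache_key({"page": 3, "user": 42})  # same params, different order
print(key_a, key_a == key_b)
```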
Never Use MD5 For:
- Password storage
- Digital signatures or certificates
- Secure message authentication (use HMAC-SHA256 instead)
- Any security-critical verification
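Where MD5 was once used to authenticate messages, HMAC-SHA256 is the standard replacement. This sketch uses only Python's standard `hmac`, `hashlib`, and `secrets` modules; the randomly generated `SECRET_KEY` and the `sign`/`verify` helpers are illustrative assumptions, since a real service would load a long-lived key from secure storage.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret; in practice, load it from a secrets manager.
SECRET_KEY = secrets.token_bytes(32)

def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag (64 hex characters) for a message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Check a tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(message, key), tag)

tag = sign(b"amount=100&to=alice")
print(verify(b"amount=100&to=alice", tag))  # True
print(verify(b"amount=900&to=alice", tag))  # False: the message was altered
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a forged tag were correct.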
SHA-2 and SHA-3: The Modern Standards
The SHA-2 family (SHA-256, SHA-384, SHA-512) was designed by the NSA and standardized by NIST in 2001. SHA-256 produces a 256-bit hash. No collision has ever been found, and because of the birthday paradox a generic attack would still need roughly 2^128 operations (half the output length in bits), far beyond any existing or foreseeable computing power.
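The 2^128 figure comes from the birthday paradox: with an n-bit output, a collision is expected after roughly 2^(n/2) random inputs, not 2^n. The sketch below scales this down by truncating SHA-256 to 24 bits (a deliberately weakened toy hash, not a real algorithm), so a collision should turn up after a few thousand attempts, on the order of 2^12, instead of around 2^24.

```python
import hashlib
import itertools

def truncated_sha256(data: bytes, nbytes: int) -> bytes:
    """First `nbytes` bytes of SHA-256: a deliberately weakened toy hash."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 3):
    """Birthday search: hash distinct inputs until two share a truncated digest."""
    seen = {}
    for i in itertools.count():
        msg = str(i).encode()
        h = truncated_sha256(msg, nbytes)
        if h in seen:
            return seen[h], msg, i + 1  # two colliding inputs, hashes computed
        seen[h] = msg

a, b, tries = find_collision(3)  # 24-bit toy hash
print(a, b, "collide after", tries, "hashes")
```

The same square-root relationship means a 128-bit hash like MD5 could never offer more than about 2^64 collision resistance, even without its structural flaws.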
SHA-1, the predecessor, was theoretically weakened in 2005 and practically broken in 2017, when Google and CWI Amsterdam's "SHAttered" project produced the first public SHA-1 collision (about 6,500 CPU-years plus 110 GPU-years of computation). SHA-1 is now deprecated for digital signatures and certificates. SHA-256 and SHA-512 remain secure.
SHA-3 (Keccak), standardized in 2015, uses a fundamentally different construction (a "sponge") from the Merkle–Damgård construction used by SHA-2. This diversity is intentional: a weakness found in SHA-2's design would not automatically carry over to SHA-3. No practical attack on SHA-3 is known.
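Python's `hashlib` exposes both families, which makes the difference easy to see: SHA-256 and SHA3-256 produce outputs of the same length but unrelated values for the same input. A short comparison, using an arbitrary sample message:

```python
import hashlib

message = b"The quick brown fox"
for name in ("sha256", "sha512", "sha3_256", "sha3_512"):
    h = hashlib.new(name, message)
    # digest_size is in bytes; multiply by 8 for the output length in bits.
    print(f"{name:9} {h.digest_size * 8:4} bits  {h.hexdigest()[:16]}...")
```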
Password Hashing: Why SHA-256 Is the Wrong Choice
SHA-256 is designed to be fast. On a modern GPU, billions of SHA-256 hashes can be computed per second. This is perfect for data integrity and digital signatures. It is catastrophic for password storage.
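When a database of unsalted SHA-256 password hashes is stolen, attackers hash each guess once and compare it against every stolen hash at the same time, at billions of guesses per second. Checking a 10-million-account database against a dictionary of 1 billion common passwords then takes well under a minute. Per-user salts stop attackers from testing one guess against all accounts at once, but with a hash this fast, a GPU can still run a large dictionary against each individual account in about a second.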
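The arithmetic behind these numbers is short enough to check. The sketch assumes a round figure of 10 billion SHA-256 guesses per second for a single high-end GPU (an assumption, not a benchmark) and a ~100 ms password hash, in line with the table below.

```python
# Assumed attacker throughput: a round number for one high-end GPU, not a benchmark.
GUESSES_PER_SECOND = 10_000_000_000
DICTIONARY_SIZE = 1_000_000_000     # candidate passwords to try
STOLEN_HASHES = 10_000_000          # accounts in the leaked database

# Unsalted: hash each candidate once, then look it up among all stolen hashes.
unsalted_seconds = DICTIONARY_SIZE / GUESSES_PER_SECOND

# Salted: every account needs its own pass over the dictionary.
salted_seconds = DICTIONARY_SIZE * STOLEN_HASHES / GUESSES_PER_SECOND

# Assumed ~100 ms per hash (bcrypt/scrypt/Argon2 territory): ~10 guesses/s per core.
slow_seconds_per_account = DICTIONARY_SIZE / 10

print(f"unsalted SHA-256: {unsalted_seconds:.1f} s for the whole database")
print(f"salted SHA-256:   {salted_seconds / 86400:.1f} days for the whole database")
print(f"~100 ms hash:     {slow_seconds_per_account / 86400 / 365:.1f} years per account, per core")
```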
| Algorithm | Cost per hash | Good for Passwords? | Why |
|---|---|---|---|
| SHA-256 | Nanoseconds (billions per second on a GPU) | No | Too fast; enables mass brute-force |
| bcrypt | ~100 ms (tunable) | Yes | Intentionally slow, adjustable work factor |
| scrypt | ~100 ms (tunable) | Yes | Memory-hard, which makes GPU attacks especially costly |
| Argon2id | ~100 ms (tunable) | Best | Memory-hard and tunable; Argon2 won the 2015 Password Hashing Competition, and OWASP recommends Argon2id first |
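Python's standard library does not include bcrypt or Argon2, but it does ship scrypt as `hashlib.scrypt`. The sketch below stores a random per-user salt alongside the derived key; the `salt$hash` storage format, the helper names, and the cost parameters are illustrative assumptions. A production format should also record N, r, and p so they can be raised later. For Argon2id, the third-party `argon2-cffi` package provides a `PasswordHasher` class with `hash` and `verify` methods.

```python
import hashlib
import hmac
import os

# Assumed scrypt cost parameters; tune so one hash takes roughly 100 ms on your
# hardware. Memory use is about 128 * N * r bytes, i.e. 16 MiB here.
N, R, P = 2**14, 8, 1

def hash_password(password: str) -> str:
    """Return 'salt$hash' (hex) using scrypt with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt.hex() + "$" + digest.hex()

def verify_password(password: str, stored: str) -> bool:
    """Recompute scrypt with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.scrypt(password.encode("utf-8"), salt=bytes.fromhex(salt_hex),
                            n=N, r=R, p=P, dklen=32)
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))

stored = hash_password("correct horse battery staple")
print(stored)
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("Tr0ub4dor&3", stored))                   # False
```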
How Git Uses Hashing for Tamper-Evident History
Every object in a Git repository (commits, files or "blobs", directory trees, and tags) is identified by a hash of its contents, SHA-1 by default. This design creates a naturally tamper-evident data structure.
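For file contents (blobs), the ID Git computes can be reproduced outside Git: it hashes a short header, `blob <size>\0`, followed by the raw bytes. The sketch below covers only the default SHA-1 object format, not SHA-256 repositories; `git_blob_sha1` is a hypothetical helper, and you can compare its output with `git hash-object <file>`.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Object ID Git assigns to file contents in a default (SHA-1) repository."""
    # Header: the word "blob", a space, the size in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_sha1(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_sha1(b"test content\n"))
```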
A commit hash is computed from the commit's content (message, author, timestamp, tree hash) plus the hash of its parent commit. If you change any past commit, its hash changes, and because each later commit records its parent's hash, every subsequent commit hash changes too. History can still be rewritten (a rebase does exactly that), but not silently: anyone holding the old hashes will see that they no longer match.
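The chaining idea can be modeled in a few lines. This is a simplified toy, not Git's actual commit format: each hypothetical commit ID here is SHA-256 over the parent's ID plus the commit message only, whereas real Git commits also cover the tree, author, committer, and timestamps.

```python
import hashlib

def commit_id(message: str, parent_id: str) -> str:
    """Toy commit ID: SHA-256 over the parent's ID plus this commit's message."""
    payload = f"parent {parent_id}\n\n{message}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_history(messages):
    """Chain commits so each ID depends on every commit that came before it."""
    ids, parent_id = [], ""
    for message in messages:
        parent_id = commit_id(message, parent_id)
        ids.append(parent_id)
    return ids

original = build_history(["Initial commit", "Add user authentication", "Fix login bug"])
tampered = build_history(["Initial commit (edited)", "Add user authentication", "Fix login bug"])

# Editing only the first commit changes every ID after it (7-char prefixes shown).
for old, new in zip(original, tampered):
    print(old[:7], "->", new[:7])
```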
```shell
# Every Git commit has a SHA hash (--oneline shows a short prefix):
$ git log --oneline
a1b2c3d Fix login bug
e4f5a6b Add user authentication
c7d8e9f Initial commit
# Changing any commit invalidates all later hashes,
# which makes the commit history tamper-evident.
```
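Git historically used SHA-1, which has known weaknesses; since version 2.13 it has used a hardened SHA-1 implementation that detects SHAttered-style collision attacks. Git 2.29 (2020) added experimental support for SHA-256 repositories. Migration to SHA-256 by default is ongoing, and most repositories still use SHA-1 for backward compatibility.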
Try Our Hash Generator
Generate MD5, SHA-1, SHA-256, SHA-512, and other hash values instantly from any text string. Useful for data integrity checks, debugging, and learning how different algorithms handle the same input.