Hash Functions Explained: MD5, SHA-256, bcrypt, and Data Integrity

Key Takeaways

MD5 is broken — collisions computable in seconds; never use for security
SHA-256 is excellent for data integrity but wrong for passwords — it's too fast
bcrypt and Argon2 are designed for passwords — deliberately slow, GPU-resistant
Git uses SHA hashes to create tamper-evident commit chains
The birthday paradox explains why an n-bit hash offers only about n/2 bits of collision resistance, so you need twice the bit length you'd expect

What Is a Cryptographic Hash Function?

A hash function takes any input — a single character, a 10 GB video file, or a database — and produces a fixed-size output called a hash, digest, or checksum. Three properties make it cryptographically useful:

Deterministic

Same input always produces exactly the same output. "Hello" always hashes to the same value.

One-way

You cannot reverse a hash to recover the original input — it's computationally infeasible by design.

Avalanche effect

Changing a single bit of input changes ~50% of the output bits. "Hello" and "hello" produce completely different hashes.
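The three properties can be observed directly. The following is a minimal sketch using Python's standard hashlib module with SHA-256; the helper names sha256_bits and differing_bits are made up for this illustration.

```python
import hashlib


def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the output bits that differ between the digests of a and b."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


# Deterministic: hashing the same bytes twice gives the same digest.
assert hashlib.sha256(b"Hello").hexdigest() == hashlib.sha256(b"Hello").hexdigest()

# Fixed size: 64 hex characters (256 bits) regardless of input length.
print(len(hashlib.sha256(b"H").hexdigest()))
print(len(hashlib.sha256(b"H" * 1_000_000).hexdigest()))

# Avalanche: "Hello" and "hello" differ in a single input bit (0x48 vs 0x68),
# yet roughly half of the 256 output bits are expected to flip.
print(differing_bits(b"Hello", b"hello"), "of 256 bits differ")
```

The one-way property cannot be demonstrated by running code; it rests on the fact that no practical method of inverting SHA-256 is known.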

MD5: The Algorithm Everyone Still Uses (But Shouldn't)

MD5 was designed by Ron Rivest in 1991 and produces a 128-bit (32 hex character) hash. For most of the 1990s, it was the standard for data integrity and password storage. Then everything changed.

In 2004, Xiaoyun Wang and colleagues, including Hongbo Yu, demonstrated practical MD5 collision attacks: the ability to create two different inputs that produce the same MD5 hash. In 2008, researchers took this further and used MD5 collisions to create a rogue certificate authority certificate that browsers trusted, proving that HTTPS security could be undermined. The final blow to MD5's credibility came in 2012: the "Flame" malware used an MD5 collision to forge a Microsoft code-signing certificate and pose as Windows Update, enabling a sophisticated state-sponsored cyberattack against targets in the Middle East.

Today, MD5 collisions can be computed in seconds on a laptop. MD5 was never a NIST-approved hash function, and RFC 6151 (2011) advises against using it wherever collision resistance is required. Yet it persists everywhere: developers who learned it in the 2000s still use it, old codebases rely on it, and it appears in millions of tutorials that were never updated.

Still OK to Use MD5 For:

• Checksums for file downloads over trusted channels (non-security verification)
• Non-security deduplication in databases
• Cache keys and hash maps

Never Use MD5 For:

• Password storage
• Digital signatures or certificates
• Secure message authentication (use HMAC-SHA256 instead)
• Any security-critical verification
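For message authentication, the HMAC-SHA256 alternative mentioned above is available in Python's standard library. This is a minimal sketch; the sign and verify helpers, the key, and the message contents are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message under key, hex-encoded."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


key = secrets.token_bytes(32)  # hypothetical shared secret
tag = sign(key, b"amount=100&to=alice")

print(verify(key, b"amount=100&to=alice", tag))  # True
print(verify(key, b"amount=900&to=alice", tag))  # False: message was altered
```

Using hmac.compare_digest rather than == avoids leaking, through timing, how many leading characters of a forged tag were correct.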

SHA-2 and SHA-3: The Modern Standards

The SHA-2 family (SHA-256, SHA-384, SHA-512) was designed by the NSA and standardized by NIST in 2001. SHA-256 produces a 256-bit hash — no known collisions exist, and finding one would require approximately 2^128 operations, beyond any conceivable computing power.
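The 2^128 figure comes from the birthday bound: a collision in an n-bit hash is expected after roughly 2^(n/2) attempts, not 2^n. The sketch below shows the effect on a toy scale by truncating SHA-256 to 24 bits, an assumption made purely so the search finishes quickly; the function names are made up for this illustration.

```python
import hashlib


def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a toy weak hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int):
    """Hash the strings "0", "1", "2", ... until two truncated digests match."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        tag = truncated_sha256(message, n_bytes)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1


# 3 bytes = 24 bits, so there are 2^24 (about 16.8 million) possible outputs,
# yet a collision typically appears after only a few thousand attempts,
# on the order of 2^12.
first, second, attempts = find_collision(3)
print(first, second, attempts)
```

The same arithmetic applied to the full 256-bit output gives the 2^128 work factor, which is why an untruncated SHA-256 collision is out of reach.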

SHA-1, the predecessor, was theoretically weakened in 2005 and practically broken in 2017, when the "SHAttered" project by Google and CWI Amsterdam produced the first SHA-1 collision (requiring roughly 6,500 CPU-years plus 110 GPU-years of computation). SHA-1 is now deprecated for security uses such as certificates and digital signatures. SHA-256 and SHA-512 remain secure.

SHA-3 (Keccak), standardized in 2015, uses a fundamentally different mathematical construction (a "sponge function") than SHA-2. This diversity is intentional: if a weakness were found in the Merkle–Damgård construction used by SHA-2, SHA-3 would not be expected to share it. No practical attack on SHA-3 is known.

Password Hashing: Why SHA-256 Is the Wrong Choice

SHA-256 is designed to be fast. On a modern GPU, billions of SHA-256 hashes can be computed per second. This is perfect for data integrity and digital signatures. It is catastrophic for password storage.

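When a database of SHA-256-hashed passwords is stolen, attackers can try billions of password guesses per second. If the hashes are unsalted, each guess is hashed once and compared against every stolen hash at the same time, so a 10 million password database can be checked against a dictionary of 1 billion common passwords in under a minute. Per-user salts force the attacker to repeat the work for each account, but at this speed weak passwords still fall quickly.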
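To make the attack concrete, here is a toy sketch of a dictionary attack, assuming the stolen hashes are unsalted SHA-256. The usernames, passwords, and wordlist are invented for illustration. Each guess is hashed once and looked up against all accounts at the same time.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Unsalted SHA-256 of a password: the storage scheme being attacked."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked table mapping user to unsalted password hash.
leaked = {
    "alice": sha256_hex("letmein"),
    "bob": sha256_hex("correct horse battery staple"),
    "carol": sha256_hex("letmein"),
}
wordlist = ["123456", "password", "letmein", "qwerty"]

# Index the leak by hash so one hashed guess is checked against every user.
users_by_hash = {}
for user, digest in leaked.items():
    users_by_hash.setdefault(digest, []).append(user)

cracked = {}
for guess in wordlist:
    for user in users_by_hash.get(sha256_hex(guess), []):
        cracked[user] = guess

print(cracked)  # {'alice': 'letmein', 'carol': 'letmein'}
```

Identical passwords produce identical unsalted hashes, so alice and carol fall to the same guess. A real attacker would run the hashing step on GPUs against far larger wordlists.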

Algorithm	Cost per hash	Good for Passwords?	Why
SHA-256	Nanoseconds (billions/sec on a GPU)	No	Too fast; enables mass brute-force
bcrypt	Tunable, typically ~100 ms	Yes	Intentionally slow, adjustable work factor
scrypt	Tunable, typically ~100 ms	Yes	Memory-hard, which makes GPU attacks especially costly
Argon2id	Tunable, typically ~100 ms	Best	Memory-hard; Argon2 won the Password Hashing Competition; OWASP's first recommendation
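As a rough sketch of what slow, salted password hashing looks like in code, the example below uses scrypt, chosen here only because Python's standard library exposes it as hashlib.scrypt (bcrypt and Argon2 need third-party packages). The cost parameters and helper names are assumptions for illustration and should be tuned for real hardware.

```python
import hashlib
import hmac
import secrets

# Assumed cost parameters for illustration; n=2**14 with r=8 needs about 16 MiB.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; every password gets a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("letmein", salt, stored))                       # False
```

Because the salt is random, hashing the same password twice gives different digests, so an attacker cannot test one guess against every account at once.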

How Git Uses Hashing for Tamper-Evident History

Every object in a Git repository — commits, files (blobs), directory trees, and tags — is identified by a SHA hash of its contents. This design creates a naturally tamper-evident data structure.
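For a blob in a SHA-1 repository, the object ID is the SHA-1 of a short header ("blob", the content length, and a NUL byte) followed by the file contents. The sketch below reproduces that calculation; git_blob_sha1 is a name made up for this example.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the ID Git assigns to a blob in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in every SHA-1 Git repository.
print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Any change to the contents yields an unrelated ID.
print(git_blob_sha1(b"hello\n"))
print(git_blob_sha1(b"Hello\n"))
```

The result should match what `git hash-object` reports for a file with the same bytes.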

A commit hash is computed from the commit's content (message, author, committer, timestamps, tree hash) together with the hash of each parent commit. This means that if you change any past commit, its hash changes, and therefore every subsequent commit hash changes too. History can still be rewritten, but not silently: anyone holding the original hashes will see that they no longer match.

# Every Git commit has a SHA hash:
$ git log --oneline
a1b2c3d Fix login bug
e4f5a6b Add user authentication
c7d8e9f Initial commit
# Changing any commit invalidates all later hashes
# This makes the commit history tamper-evident
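The chaining effect can be imitated with a toy model. This is a simplified sketch, not Git's real commit object format: each "commit" ID here is just SHA-256 over the parent's ID and a message, and the function names are invented for illustration.

```python
import hashlib


def commit_id(parent_id: str, message: str) -> str:
    """Toy commit hash covering the parent's ID and this commit's message."""
    payload = f"parent {parent_id}\n\n{message}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def chain_ids(messages: list[str]) -> list[str]:
    """Hash a linear history, oldest first, feeding each ID into the next."""
    ids, parent = [], ""
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


original = chain_ids(["Initial commit", "Add user authentication", "Fix login bug"])
tampered = chain_ids(["Initial commit", "Add backdoor", "Fix login bug"])

print(original[0] == tampered[0])  # True: the first commit is untouched
print(original[1] == tampered[1])  # False: the edited commit gets a new ID
print(original[2] == tampered[2])  # False: so does every commit after it
```

Even though the last commit's message is identical in both histories, its ID differs because the parent ID it covers has changed.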

Git historically used SHA-1, which has known weaknesses. Git 2.29 (2020) introduced experimental support for SHA-256 repositories. Migration to SHA-256 by default is ongoing, and most repositories still use SHA-1 for backward compatibility.

Try Our Hash Generator

Generate MD5, SHA-1, SHA-256, SHA-512, and other hash values instantly from any text string. Useful for data integrity checks, debugging, and learning how different algorithms handle the same input.
