Understanding SQL Internals: ACID, WAL, MVCC, and Isolation Levels Explained
Introduction
Most engineers know that SQL databases are "ACID compliant." But what does that actually mean under the hood? And more importantly — how much of it is automatic, and how much do you have to do yourself?
I spent some time recently going deep on these internals — partly for interview prep, partly because I work on infrastructure systems where these guarantees matter in production. This post is my attempt to distil that into something genuinely useful: not just definitions, but the why behind each concept and how they connect to each other.
What Is a Transaction?
Before getting into ACID, it helps to be clear on what a transaction actually is.
A transaction is a group of database operations treated as a single unit of work — either all succeed or all fail together. The classic example is a bank transfer:
```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 'Alice';
UPDATE accounts SET balance = balance + 100 WHERE id = 'Bob';
COMMIT;
```
If the second UPDATE fails, the first is rolled back automatically. Alice doesn't lose $100 into the void. Without a transaction wrapping both operations, a crash between them leaves the database in a half-updated state.
This is the foundational insight: the database can only protect what you explicitly tell it belongs together.
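To make the all-or-nothing behavior concrete, here is the transfer as a runnable sketch, using Python's built-in sqlite3 module as a stand-in for a real database server. The `transfer` helper and the CHECK constraint are illustrative additions, not part of the SQL above:

```python
import sqlite3

# Toy schema: the CHECK constraint makes an overdraft fail, so we can
# watch the transaction roll back both updates together.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('Alice', 500), ('Bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction: commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # constraint violated: nothing was applied

transfer(conn, "Alice", "Bob", 100)   # succeeds
transfer(conn, "Alice", "Bob", 9999)  # overdraft: rolled back as a unit
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# → [('Alice', 400), ('Bob', 100)]
```

The failed transfer leaves no trace: neither account changed, which is exactly the half-updated state a bare pair of UPDATEs would not protect you from.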
ACID — What Each Property Actually Means
Atomicity — "All or nothing"
Atomicity means the database can always abandon uncommitted work. Many engines (MySQL's InnoDB, Oracle) implement this with an undo log: every change is recorded before it's applied, and a crashed transaction is rolled back by replaying that log in reverse. Postgres takes a different route: because MVCC never overwrites rows in place, a transaction's new row versions simply never become visible unless its commit record makes it to the WAL. If it crashes mid-flight, its changes are disowned rather than undone, which makes rollback nearly free.
The key insight: atomicity is not magic. It's bookkeeping that lets the database undo, or simply never acknowledge, its own unfinished work.
Consistency — "DB rules are never violated"
Consistency means the database always moves from one valid state to another. This is enforced through constraints you define:
```sql
ALTER TABLE accounts ADD CONSTRAINT no_negative CHECK (balance >= 0);
ALTER TABLE orders ADD CONSTRAINT valid_status
  CHECK (status IN ('pending', 'shipped', 'cancelled'));
```
The database enforces these rules on every transaction. But — and this is important — it only enforces rules you explicitly define. If you don't add a CHECK constraint for negative balances, the database will happily let Alice go into the red. Consistency is partly the DB's job, partly yours as the schema designer.
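Here is that enforcement in action, again using sqlite3 as a lightweight stand-in (the table and status values mirror the constraint above; the rejected status is a made-up example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'cancelled'))
    )
""")

conn.execute("INSERT INTO orders (status) VALUES ('pending')")   # passes the CHECK
try:
    conn.execute("INSERT INTO orders (status) VALUES ('teleported')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the invalid row never lands

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # → 1
```

The database rejects the bad row, but only because the rule was declared. Drop the CHECK and 'teleported' goes in without complaint.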
Isolation — "Concurrent transactions don't interfere"
Isolation is the most nuanced property. It's not binary — it's a spectrum of levels, each offering different guarantees. More on this below.
Isolation is implemented via MVCC: multiple versions of each row coexist, so readers never block writers and writers never block readers. (Two writers to the same row do still block each other.)
Durability — "Committed data survives crashes"
When you COMMIT, Postgres writes to the WAL on disk before acknowledging success. Even if the server loses power immediately after, on restart it replays the WAL and your data is there.
COMMIT → WAL flushed to disk → "OK" returned to client
The acknowledgment only comes after the disk write. That sequencing is the entire durability guarantee.
What's Automatic vs What You Have to Do
This is where most engineers get tripped up. They assume "I'm using Postgres, so I'm ACID compliant." Not quite.
Automatic — always on, you pay the cost whether you want to or not:
- WAL on every write
- MVCC row versioning
- fsync on commit
- Basic constraint enforcement
Manual — you have to do this yourself:
- Wrap related operations in BEGIN/COMMIT
- Choose the right isolation level
- Define your business rule constraints
- Write retry logic for deadlocks and serialization failures
The analogy I keep coming back to: SQL ACID is like a seatbelt. The mechanism works perfectly once buckled — but you still have to buckle it.
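The retry item on the manual list deserves a sketch, since it's the one most often skipped. This is a generic wrapper, not tied to any particular driver: `SerializationFailure` here is a placeholder exception class. In a real application you would catch your driver's equivalent (psycopg2, for example, reports serialization failures under SQLSTATE 40001 and deadlocks under 40P01):

```python
import random
import time

class SerializationFailure(Exception):
    """Placeholder for the driver-specific 'retry me' error."""
    pass

def with_retries(txn_fn, attempts=5, base_delay=0.01):
    """Run a transaction function, retrying on serialization failures
    with exponential backoff plus jitter to avoid retry stampedes."""
    for attempt in range(attempts):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

The transaction function must be safe to run more than once, which usually means re-reading any data it depends on inside the function rather than capturing it from outside.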
WAL — Write Ahead Log
WAL is the mechanism behind both atomicity and durability. The golden rule:
Write to the log first, apply to data second.
Without WAL, a crash mid-write can leave a page half-written on disk — permanently corrupted data. With WAL, the intent is logged before anything is applied. On restart, Postgres reads the log, sees what committed and what didn't, and replays or rolls back accordingly.
WAL writes are sequential appends — the fastest possible disk operation, no seeking required. This is why WAL is fast despite being on disk.
WAL also enables replication: the primary streams its WAL to replicas, which replay it to stay in sync.
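The golden rule can be modeled in a few lines. This is a deliberately toy model: real WAL records describe page-level changes and carry checksums and LSNs, while here a "record" is just a key/value write. The class and file layout are inventions for illustration:

```python
import json
import os
import tempfile

class TinyWAL:
    """Toy write-ahead log: log the intent, fsync, only then touch data."""

    def __init__(self, path):
        self.path = path
        self.data = {}

    def commit(self, txid, writes):
        with open(self.path, "a") as log:
            log.write(json.dumps({"txid": txid, "writes": writes}) + "\n")
            log.flush()
            os.fsync(log.fileno())   # durability: record on disk before we ack
        self.data.update(writes)     # apply only after logging
        return "OK"                  # the ack comes strictly after the fsync

    def recover(self):
        # After a crash, replay the log: every line was fully written and
        # synced before any ack, so replaying rebuilds the committed state.
        self.data = {}
        if os.path.exists(self.path):
            with open(self.path) as log:
                for line in log:
                    self.data.update(json.loads(line)["writes"])

path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = TinyWAL(path)
wal.commit(1, {"alice": 400})
wal.commit(2, {"bob": 100})

fresh = TinyWAL(path)  # simulate a restart with an empty in-memory state
fresh.recover()
print(fresh.data)      # → {'alice': 400, 'bob': 100}
```

Note the ordering: the fsync happens before `commit` returns "OK", which is the same sequencing the durability section described.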
MVCC — Multi-Version Concurrency Control
The problem MVCC solves: in a naive locking system, readers block writers and writers block readers. Under heavy concurrency, everyone ends up waiting on everyone else.
MVCC's solution: never update a row in place — create a new version instead.
Every row in Postgres has two hidden columns:
- xmin — which transaction created this version
- xmax — which transaction deleted this version (0 if still alive)
When Transaction 101 updates Alice's balance from $500 to $400:
| balance | xmin | xmax | |
|---------|------|------|---|
| 500 | 100 | 101 | ← old version, marked deleted |
| 400 | 101 | 0 | ← new version, currently alive |
Both versions coexist on disk. A transaction that started before 101 committed sees $500. A transaction that started after sees $400. Nobody blocks anybody.
The downside: dead row versions accumulate over time. Postgres runs VACUUM periodically to clean them up and reclaim disk space.
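The visibility rule can be expressed directly. This is a heavy simplification (real Postgres snapshots also track in-progress transaction ranges and hint bits), but the core idea fits in one function. Here a snapshot is just the set of transaction ids that had committed when it was taken:

```python
def visible(row, snapshot):
    """A row version is visible if its creator committed before the
    snapshot was taken and its deleter (if any) had not."""
    created = row["xmin"] in snapshot
    deleted = row["xmax"] != 0 and row["xmax"] in snapshot
    return created and not deleted

# The two coexisting versions from the table above.
versions = [
    {"balance": 500, "xmin": 100, "xmax": 101},  # old version, deleted by tx 101
    {"balance": 400, "xmin": 101, "xmax": 0},    # new version, created by tx 101
]

before_101 = {100}       # snapshot taken before tx 101 committed
after_101 = {100, 101}   # snapshot taken after tx 101 committed

print([v["balance"] for v in versions if visible(v, before_101)])  # → [500]
print([v["balance"] for v in versions if visible(v, after_101)])   # → [400]
```

Both snapshots scan the same two versions on disk; they simply disagree about which one counts, which is how readers and writers avoid blocking each other.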
Dirty Reads vs Stale Reads
MVCC always prevents dirty reads — you never see another transaction's uncommitted changes. But it can allow stale reads — data that was real and committed, but has since been updated.
These are fundamentally different:
| | Dirty Read | Stale Read |
|---|---|---|
| Was the data ever committed? | No | Yes |
| Could it disappear via rollback? | Yes | No |
| Prevented by MVCC? | Always | Depends on isolation level |
Isolation Levels
MVCC alone gives you dirty read prevention. Isolation levels control how much more protection you get — and at what cost.
Read Uncommitted
You can read data that hasn't been committed yet: another transaction's in-progress write is visible to you. One caveat: Postgres accepts SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED but silently treats it as Read Committed, so the dirty read below can never actually happen in Postgres. The example shows the behavior in databases that do implement this level, such as SQL Server.
```sql
-- Session A
BEGIN;
UPDATE accounts SET balance = 0 WHERE id = 1;
-- not committed yet

-- Session B
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM accounts WHERE id = 1;
-- sees 0 ← dirty read

-- Session A
ROLLBACK;
-- balance back to 500, but B already acted on 0
```
Prevents: nothing. Use case: almost never — only for analytics where approximate data is acceptable and speed is critical.
Read Committed (Postgres default)
You only see committed data. But your reads can return different values within the same transaction, because each read sees the latest committed state at that moment.
```sql
-- Session B
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE id = 1;
-- sees 500

-- Session A commits: balance = 400

SELECT balance FROM accounts WHERE id = 1;
-- sees 400 ← non-repeatable read, same query different result
COMMIT;
```
This is called a non-repeatable read. If your transaction makes a decision based on the first read and then the value changes before you write, your decision is based on stale data.
Prevents: dirty reads. Allows: non-repeatable reads, phantom reads.
Repeatable Read
Your snapshot is frozen at transaction start. The same query returns the same result every time within your transaction, regardless of what others commit.
```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;
-- sees 500, snapshot frozen

-- Session A commits: balance = 400

SELECT balance FROM accounts WHERE id = 1;
-- still sees 500 ✅ snapshot protected
COMMIT;
```
Per the SQL standard, phantom reads are still possible at this level: new rows inserted by other transactions can appear between two identical queries. (In practice, Postgres's Repeatable Read freezes the snapshot for inserts too, so phantoms don't actually occur there; the example shows what the standard permits.)
```sql
SELECT COUNT(*) FROM orders WHERE user_id = 1;
-- sees 5

-- Another transaction inserts a new order, commits

SELECT COUNT(*) FROM orders WHERE user_id = 1;
-- sees 6 ← phantom row appeared
```
One important subtlety: Repeatable Read gives you a consistent but potentially stale snapshot. Consider a balance sheet report: if you read Alice's balance before a transfer and Bob's after, Repeatable Read ensures both reads come from the same frozen snapshot. The total will be internally consistent, but the individual values may not reflect current reality.
Prevents: dirty reads, non-repeatable reads. Allows: phantom reads per the standard (Postgres's snapshot implementation happens to prevent them as well).
Serializable
The strongest level. Transactions behave as if they ran one at a time, serially. Postgres implements this via SSI — Serializable Snapshot Isolation.
SSI doesn't use traditional locking. Instead it uses SIREAD locks — lightweight read trackers that don't block anyone, but record what each transaction read. At commit time, Postgres checks whether any combination of reads and writes forms a dependency cycle. If it does, one transaction is aborted and must retry.
```sql
-- Session A
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;
-- sees 500, SIREAD lock recorded

-- Session B commits: balance = 400

-- Session A
UPDATE accounts SET balance = balance - 200 WHERE id = 1;
COMMIT;
-- ERROR: could not serialize access due to concurrent update
-- Session A must retry with fresh data
```
Critically: Postgres detects the conflict based on dependency relationships, not actual values. It doesn't compare $500 vs $400 — it tracks who read what and who wrote what, and looks for cycles. A transaction can be aborted even if the value didn't change, purely because the read/write ordering is inconsistent.
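That dependency tracking can be modeled in miniature. This is a simplification of what SSI actually does (real SSI looks for a narrower "dangerous structure" of consecutive rw-antidependencies, not every cycle, and tracks predicate reads), but it captures the shape of the check. All names here are illustrative:

```python
def has_dependency_cycle(reads, writes):
    """reads/writes: dict of txid -> set of keys each transaction touched.
    Draw an edge A -> B when A read a key that B wrote (A's read was
    made outdated by B), then look for a cycle in that graph."""
    edges = {a: {b for b in writes
                 if b != a and reads[a] & writes[b]}
             for a in reads}

    def cyclic(node, visiting, done):
        if node in visiting:
            return True            # back edge: found a cycle
        if node in done:
            return False
        visiting.add(node)
        if any(cyclic(n, visiting, done) for n in edges.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    done = set()
    return any(cyclic(t, set(), done) for t in edges)

# Classic write skew: each transaction reads the row the other one writes.
reads = {"T1": {"alice"}, "T2": {"bob"}}
writes = {"T1": {"bob"}, "T2": {"alice"}}
print(has_dependency_cycle(reads, writes))  # → True
```

Notice the inputs are only *which* keys were read and written, never the values: exactly the point made above, a transaction can be aborted even when no value it read actually changed.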
Prevents: everything. Use case: financial transfers, inventory, anything where correctness is non-negotiable.
Isolation Level Summary
| Level | Prevents Dirty Read | Prevents Non-Repeatable Read | Prevents Phantom Read |
|---|---|---|---|
| Read Uncommitted | ❌ | ❌ | ❌ |
| Read Committed | ✅ | ❌ | ❌ |
| Repeatable Read | ✅ | ✅ | ❌ |
| Serializable | ✅ | ✅ | ✅ |
Write Conflicts — How to Handle Them
Understanding isolation levels is only half the picture. The other half is knowing which mechanism to use when writes conflict.
Optimistic Concurrency — version column
Good for low-contention workloads where conflicts are rare:
```sql
UPDATE users SET name='Ray', version=version+1
WHERE id=1 AND version=5;
-- 0 rows updated = conflict, someone else wrote first → retry
```
Pessimistic Locking — SELECT FOR UPDATE
Good for high-contention rows where you need to read then write based on that read:
```sql
BEGIN;
SELECT balance FROM accounts WHERE id=1 FOR UPDATE;
-- row is locked, other writers block
UPDATE accounts SET balance = balance - 200 WHERE id=1;
COMMIT;
```
Atomic Operations
Best when you can avoid the read-modify-write pattern entirely:
```sql
UPDATE inventory SET qty = qty - 1 WHERE id=42 AND qty > 0;
-- no separate read needed, DB handles atomicity
```
When to Use What
| Scenario | Approach |
|---|---|
| Simple counter decrement | Atomic operation |
| Read then apply complex business logic | SELECT FOR UPDATE |
| Multi-table consistency, read-only report | Repeatable Read |
| Highest correctness, willing to retry | Serializable |
Why SQL Can't Match NoSQL Write Throughput
Even if you tune aggressively — lowest isolation level, no explicit locks, synchronous_commit off — SQL has architectural constraints you can't remove:
- WAL always runs. Every write goes through the log. You can make it async, but you can't eliminate it.
- MVCC overhead always runs. Every update creates a new row version. Vacuum has to clean up dead versions.
- Single primary writer. All writes funnel through one node. NoSQL distributes writes across the entire cluster natively.
Aggressive tuning might take a single SQL primary from roughly 10k to roughly 200k simple writes per second, while a Cassandra cluster can start in the hundreds of thousands and scale close to linearly by adding nodes. The exact numbers depend heavily on hardware and workload, but the gap never fully closes: it's an architectural difference, not a configuration difference.
The root cause: SQL was designed for correctness first. NoSQL was designed for scale first. ACID guarantees have a cost, and that cost is baked into the engine regardless of how you configure it.
Conclusion
The mental model I find most useful: SQL gives you the ACID infrastructure for free, but you still have to drive it correctly. The database handles crash recovery, WAL flushing, and MVCC versioning automatically. But you have to wrap related operations in transactions, choose the right isolation level, define your business rule constraints, and handle retries in application code.
For most applications, Postgres with Read Committed and well-placed FOR UPDATE locks is more than enough. Serializable is there when you need the strongest guarantee and are willing to pay with occasional retries. And if you're hitting the write throughput ceiling of a single SQL primary — that's when you start looking at distributed SQL (CockroachDB, Spanner) or NoSQL, not before.
Understanding these internals doesn't just help in interviews; it changes how you design systems. Once you know exactly which guarantees are automatic and which are your job, you start thinking about failure modes, recovery paths, and consistency requirements from day one, not after the first 3am incident.