Skip to content

Deadlocks

When two transactions each hold a lock the other needs, neither can proceed.


The Classic Example: Money Transfer

Two transfers happening simultaneously, locking rows in different order:

-- Transfer 1: Account 1 → Account 2
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- ✅ locks row 1
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;  -- ❌ BLOCKED (Transfer 2 holds row 2)

-- Transfer 2: Account 2 → Account 1  (at the same time)
BEGIN;
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;  -- ✅ locks row 2
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;  -- ❌ BLOCKED (Transfer 1 holds row 1)

Neither can proceed. This is a deadlock.

graph LR
    T1["Transfer 1"] -->|"holds lock on"| R1["Row 1"]
    T1 -->|"waiting for"| R2["Row 2"]
    T2["Transfer 2"] -->|"holds lock on"| R2
    T2 -->|"waiting for"| R1

    style T1 fill:#4a1a1a,stroke:#ff6666
    style T2 fill:#4a1a1a,stroke:#ff6666
    style R1 fill:#1a3a4a,stroke:#44aaff
    style R2 fill:#1a3a4a,stroke:#44aaff
sequenceDiagram
    participant T1 as Transfer 1
    participant DB as PostgreSQL
    participant T2 as Transfer 2

    T1->>DB: BEGIN
    T1->>DB: SELECT ... WHERE id=1 FOR UPDATE
    DB-->>T1: Row 1 locked ✅

    T2->>DB: BEGIN
    T2->>DB: SELECT ... WHERE id=2 FOR UPDATE
    DB-->>T2: Row 2 locked ✅

    T1->>DB: SELECT ... WHERE id=2 FOR UPDATE
    Note over T1,DB: BLOCKED (T2 holds row 2)

    T2->>DB: SELECT ... WHERE id=1 FOR UPDATE
    Note over T2,DB: BLOCKED (T1 holds row 1)

    Note over T1,T2: DEADLOCK!<br/>Both waiting for each other.<br/>Neither can proceed.

    DB-->>T2: ERROR: deadlock detected
    Note over T2: T2 is aborted (victim)
    Note over T1: T1 can now proceed

Prevention: Consistent Lock Ordering

The fix is simple: always lock rows in the same order regardless of the operation.

Sort by ID (or any deterministic key). Both transfers lock the lower ID first:

-- Both transfers, regardless of direction:
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- Always locks row 1 first, then row 2
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

Now if Transfer 2 needs rows 1 and 2, it locks row 1 first. If Transfer 1 already holds row 1, Transfer 2 simply waits -- and Transfer 1 proceeds to lock row 2 without anyone blocking it. No cycle, no deadlock.

sequenceDiagram
    participant T1 as Transfer 1
    participant DB as PostgreSQL
    participant T2 as Transfer 2

    T1->>DB: BEGIN
    T1->>DB: SELECT ... WHERE id IN (1,2)<br/>ORDER BY id FOR UPDATE
    DB-->>T1: Row 1 locked, Row 2 locked ✅

    T2->>DB: BEGIN
    T2->>DB: SELECT ... WHERE id IN (1,2)<br/>ORDER BY id FOR UPDATE
    Note over T2,DB: Waits for Row 1<br/>(T1 holds it, but no cycle!)

    T1->>DB: UPDATE row 1, UPDATE row 2
    T1->>DB: COMMIT
    Note over T1,DB: Locks released

    DB-->>T2: Row 1 locked, Row 2 locked ✅
    T2->>DB: UPDATE row 1, UPDATE row 2
    T2->>DB: COMMIT

Detection: PostgreSQL's Safety Net

Even with careful coding, deadlocks can slip through. PostgreSQL has a built-in deadlock detector as a safety net.

How It Works

PostgreSQL uses a wait-for graph -- a directed graph where each edge means "Transaction A is waiting for Transaction B." When deadlock_timeout expires, PostgreSQL walks this graph looking for cycles (A → B → C → A). If it finds one, it picks a victim (typically the transaction that's done the least work) and aborts it.

This is the same algorithm as cycle detection in a directed graph (DFS with a visited set) -- a classic graph algorithm problem.

Further reading: - PostgreSQL docs: Deadlocks - PostgreSQL docs: deadlock_timeout - PostgreSQL wiki: Lock Monitoring

Two configuration parameters:

Setting What It Does Default
deadlock_timeout How long to wait before checking for a deadlock cycle 1 second
lock_timeout Hard cap on any lock wait, deadlock or not 0 (wait forever)

deadlock_timeout is NOT "abort after this long." It's "wait this long before running the deadlock detector." PostgreSQL thinks: "maybe this is just normal waiting. I'll give it 1 second. If it's still stuck, I'll check for a cycle." If a cycle is found, one transaction is killed. If no cycle, waiting continues.

lock_timeout is the hard limit. Regardless of whether there's a deadlock, if a transaction has been waiting longer than this, it's killed. Useful for preventing any request from hanging indefinitely.

flowchart TD
    W["Transaction waiting for a lock..."] --> DT{"deadlock_timeout<br/>reached? (1s)"}
    DT -->|"No"| W
    DT -->|"Yes"| CHECK{"Run deadlock<br/>detector"}
    CHECK -->|"Cycle found"| KILL["Kill one transaction<br/>(the 'victim')"]
    CHECK -->|"No cycle"| WAIT{"lock_timeout<br/>reached?"}
    WAIT -->|"No"| W
    WAIT -->|"Yes"| ABORT["Abort: lock timeout exceeded"]
    WAIT -->|"No timeout set"| W2["Keep waiting forever..."]

    style KILL fill:#4a1a1a,stroke:#ff6666
    style ABORT fill:#4a3a1a,stroke:#ffaa44

Production Setup

A common configuration:

-- Check for deadlocks after 1 second of waiting
SET deadlock_timeout = '1s';

-- Never wait more than 5 seconds for any lock
SET lock_timeout = '5s';

Two Strategies

Strategy How Who Does It
Prevention Always lock rows in consistent order (sort by ID) Your app code
Detection Deadlock detector kills one transaction after timeout PostgreSQL (automatic)

In practice, you do both -- prevent where you can with consistent ordering, and rely on PostgreSQL's deadlock detector as a safety net for cases you missed.


Deadlock Detection in Application Code

Databases detect deadlocks for you. Application-level threading does not.

Python Go PostgreSQL
Detects deadlocks? No Only if ALL goroutines stuck Yes (active cycle detector)
Prevention tools lock.acquire(timeout=N) context.WithTimeout Consistent lock ordering
On deadlock Hangs silently forever Hangs (or panic if all goroutines stuck) Kills one transaction with error

Python's threading.Lock has no deadlock detection. If two threads deadlock, the program hangs silently. The only tool is acquire(timeout=5) which lets you bail out after a timeout -- but that's a timeout, not detection.

Go's runtime detects when all goroutines are blocked (fatal error: all goroutines are asleep - deadlock!), but if even one goroutine is alive (e.g., an HTTP server), the deadlock among other goroutines goes undetected.

Takeaway: Databases are the only layer that actively detects and resolves deadlocks. In application code, prevention (consistent lock ordering) is your only reliable tool.