Wide Column Stores (Cassandra)

What is a Wide Column Store?

A distributed database designed for write-heavy, high-availability workloads with simple query patterns. The canonical example is Apache Cassandra.

Key characteristics:

Columns vary from row to row — no NULLs for missing columns, rows only store what they have
Schema-flexible — different rows in the same table can have different columns
Distributed by design — data spread across a cluster via consistent hashing
Tunable consistency — choose between speed and correctness per query

Why Not Just Use Postgres?

At massive write scale, relational databases hit a wall:

Postgres (OLTP)	Cassandra
Single primary for writes	Any node accepts writes
Locks for consistency	No locks — append-only (LSM-based)
Foreign key checks on INSERT	No referential integrity checks
ACID transactions	Eventual consistency (tunable)
Vertical scaling	Horizontal scaling — add nodes

Postgres bottleneck: Every write must acquire locks, check constraints, update indexes, write WAL — all on one machine. At millions of writes/sec, the single primary falls over.

Cassandra's tradeoff: Give up JOINs, aggregations, and referential integrity → get blazing fast distributed writes and high availability.

Flexible Schema — No Wasted NULLs

In Postgres, every row has every column (NULLs for missing ones):

| user_id | name  | email       | phone | twitter | github |
|---------|-------|-------------|-------|---------|--------|
| 1       | Alice | a@mail.com  | NULL  | NULL    | NULL   |
| 2       | Bob   | NULL        | 555.. | @bob    | NULL   |

In Cassandra, rows only store columns they actually have — no NULL, no placeholder:

Row "user-1":  { name: "Alice", email: "a@mail.com" }
Row "user-2":  { name: "Bob", phone: "555...", twitter: "@bob" }

Great for heterogeneous data like log entries, user preferences, IoT sensor readings.

Data Model: Partition Key + Clustering Key

The PRIMARY KEY controls everything

CREATE TABLE orders (
    customer_id UUID,
    order_date TIMESTAMP,
    order_id UUID,
    amount DECIMAL,
    PRIMARY KEY (customer_id, order_date)
);
--          ^^^^^^^^^^^  ^^^^^^^^^^
--          partition     clustering
--          key           key

graph TD
    PK["PRIMARY KEY (customer_id, order_date)"] --> Part["Partition Key: customer_id<br/>hash → which node owns this data"]
    PK --> Clust["Clustering Key: order_date<br/>sort order within the partition"]

    style Part fill:#1a3a4a,stroke:#44aaff
    style Clust fill:#4a3a1a,stroke:#ffaa44

Partition key (customer_id):

Hashed via consistent hashing → determines which node stores this data
All rows with the same partition key live on the same node, same disk partition
This is what makes queries fast — one network hop to the right node

Clustering key (order_date):

Within a partition, rows are sorted on disk by the clustering key
Range queries and ORDER BY are free — data is pre-sorted

PRIMARY KEY patterns

PRIMARY KEY (A)        → partition=A, no clustering. One row per key (KV lookup).
PRIMARY KEY (A, B)     → partition=A, cluster by B. Many rows per A, sorted by B.
PRIMARY KEY (A, B, C)  → partition=A, cluster by B then C. Nested sort.

Example: Query flow

SELECT * FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC
LIMIT 10

Hash customer_id=42 → go to the right node (one network hop)
Data is already sorted by order_date → read last 10 entries (sequential disk read)

No index lookup. No sort step. Just seek and scan.

Query-Driven Table Design

In Cassandra, you design tables around your queries — one table per query pattern.

The problem: looking up by non-partition key

-- This is SLOW — scatter-gather across all nodes!
SELECT * FROM orders WHERE order_id = 'abc-123'

Without customer_id (the partition key), Cassandra can't hash to a specific node. It must ask every node to check → scatter-gather.

Solution: Dual write with a second table

-- Table 1: "All orders for customer X, sorted by date"
CREATE TABLE orders_by_customer (
    customer_id UUID,
    order_date TIMESTAMP,
    order_id UUID,
    amount DECIMAL,
    PRIMARY KEY (customer_id, order_date)
);

-- Table 2: "Look up one order by its ID"
CREATE TABLE orders_by_order_id (
    order_id UUID PRIMARY KEY,  -- partition key only, no clustering
    customer_id UUID,
    order_date TIMESTAMP,
    amount DECIMAL
);

Same data, two tables. App writes to both. Each is optimized for a different query.

Secondary Indexes: LSI vs GSI

An alternative to dual writes: CREATE INDEX ON orders(order_id)

LSI — Local Secondary Index (document-partitioned)

Each node only indexes its own data:

Node 1: local index on order_id → {abc-123, def-456}  (Node 1's data only)
Node 2: local index on order_id → {ghi-789, jkl-012}  (Node 2's data only)
Node 3: local index on order_id → {mno-345, pqr-678}  (Node 3's data only)

A query on order_id still hits every node (scatter-gather).

GSI — Global Secondary Index (term-partitioned)

The index itself is partitioned by the indexed column's value:

Index Node A: order_id a-m → {abc-123 → Data Node 1, ghi-789 → Data Node 2}
Index Node B: order_id n-z → {pqr-678 → Data Node 3}

Query order_id='abc-123' → hash → Index Node A → points to Data Node 1. Two hops, not scatter-gather.

LSI vs GSI Comparison

	LSI (Local / Document-Partitioned)	GSI (Global / Term-Partitioned)
Read	Scatter-gather (slow)	2 hops (fast)
Write	Fast (update local index only)	Slower (must update remote index node)
Consistency	Immediate (same node)	Eventually consistent (async update)
Cassandra	✅ Native support	❌ Not native — use dual write tables
DynamoDB	✅ Both LSI and GSI supported	✅ Both LSI and GSI supported

Which to use?

Approach	Best For
Dual write (separate table)	Hot path queries — single-node lookup, predictable
LSI (secondary index)	Rare, low-frequency queries — avoid extra table overhead
GSI	When available (DynamoDB) — fast reads on non-partition columns

Rule of thumb: Secondary indexes are fine for rare, low-frequency queries. For your hot path — use a dedicated table.

Consistency Model

Cassandra uses tunable consistency — you choose the tradeoff per operation:

Write/Read Consistency Levels:
  ONE     → ack from 1 replica   (fastest, weakest consistency)
  QUORUM  → ack from majority    (balanced)
  ALL     → ack from all replicas (slowest, strongest consistency)

graph LR
    Client["Client Write"] --> Coord["Coordinator Node"]
    Coord --> R1["Replica 1 ✅"]
    Coord --> R2["Replica 2 ✅"]
    Coord --> R3["Replica 3 ⏳"]

    Coord -->|"QUORUM: 2/3 acked<br/>→ success!"| Client

    style R1 fill:#1a4a1a,stroke:#44ff44
    style R2 fill:#1a4a1a,stroke:#44ff44
    style R3 fill:#4a3a1a,stroke:#ffaa44

"Reads may not be immediate" — a write acked by ONE replica may not be visible on another replica yet
"Writes accepted even if some nodes are down" — fault tolerance over consistency

What Cassandra Gives Up vs Gets

Gives Up	Gets
JOINs	Multi-master distributed writes
Aggregations	Horizontal scaling (add nodes)
Referential integrity	Fault tolerance (writes survive node failures)
ACID transactions	Blazing fast writes (LSM-tree, append-only)
Complex queries	High availability (no single point of failure)
Strong consistency (by default)	Tunable consistency per query

When to Use Cassandra

The ideal workload is: write-heavy, distributed, simple queries, can't afford downtime.

Use Case	Why Cassandra Fits
Message inboxes	Billions of messages/day, partition by user, sort by time
Activity feeds	"User X liked post Y" — append-heavy, query by user + time
IoT / sensor data	Millions of devices writing telemetry every second
Messaging apps	Discord uses Cassandra for message storage
Ride-sharing	Uber uses it for driver location updates
Notifications	Write once, read once, partition by user

The pattern:

✅ Write-heavy
✅ Natural partition key (user_id, device_id, driver_id)
✅ Need ordering within a partition (by timestamp)
✅ Can tolerate eventual consistency
❌ Don't need JOINs or complex queries

Real-World Wide Column Stores

Database	Notes
Apache Cassandra	The most widely used. Originally built at Facebook.
Apache HBase	Built on Hadoop/HDFS. Strong consistency (vs Cassandra's eventual).
ScyllaDB	Cassandra-compatible, rewritten in C++ for better performance.
Google Bigtable	The original wide column store (2006 paper). Cloud-managed.
DynamoDB	AWS managed. Not technically wide-column but similar partition/sort key model.

Cassandra's Evolution

Cassandra has gone through a major architectural shift:

Original Model (~2008-2012): Bigtable-Inspired

Modeled after Google's Bigtable paper. Used column families with a Thrift API:

ColumnFamily "users" {
  row_key "user-1" → {
    Column("name", "Alice", timestamp=123),
    Column("email", "a@mail.com", timestamp=124),
  }
}

Each row was a bag of name-value-timestamp triples. Dynamic columns, no schema, raw Thrift RPC.

Modern Model (~2012+): Dynamo-Inspired + CQL

Old (Bigtable-inspired)	New (Dynamo-inspired + CQL)
Column families, Thrift API	`CREATE TABLE` with typed columns
Dynamic columns, no schema	Schema with partition + clustering keys
Raw key-value operations	SQL-like syntax (`SELECT`, `INSERT`, `WHERE`)
Confusing "super columns"	Clean composite PRIMARY KEYs

Key ideas from Amazon's Dynamo paper (2007):

Consistent hashing for partitioning
Tunable consistency (ONE, QUORUM, ALL)
Leaderless replication (any node accepts writes)

Important: Dynamo vs DynamoDB

	Dynamo (paper)	Dynamo (internal)	DynamoDB (AWS)
What	Published paper (2007)	Amazon's internal system	AWS managed service (2012)
Access	Public ideas	Closely guarded	Pay-per-use cloud service
Influence	Cassandra, Riak, Voldemort all borrowed from it	Only Amazon engineers	Different internal architecture

Cassandra is often described as "Bigtable's data model + Dynamo's distribution model." The "wide column" name is a leftover from the old column family days.

Topics for Later

LSM Trees — the storage engine behind Cassandra's fast writes (see Storage Engines)
Consistent Hashing — how partition keys map to nodes (Week 06)