Skip to content

SQL Data Types

A practical reference for system design -- which types to use, how much storage they cost, and the gotchas.


Integer Types

Type Size Range Use case
TINYINT 1 byte -128 to 127 (0–255 unsigned) Boolean flags, status codes
SMALLINT 2 bytes -32K to 32K Port numbers, small counters
MEDIUMINT 3 bytes -8M to 8M Medium counters (MySQL only)
INT 4 bytes ±2.1 billion User IDs, most counters
BIGINT 8 bytes ±9.2 quintillion Epoch timestamps, snowflake IDs

Rule of thumb:

  • Counting normal things (users, products)? → INT
  • Timestamps, distributed IDs, massive scale? → BIGINT

String Types

Type Max Size Storage Use case
CHAR(n) 255 bytes Always n bytes (padded) Fixed-length codes ("US", "USD")
VARCHAR(n) 65,535 bytes* Actual length + 1–2 bytes Keys, emails, usernames
TEXT 65 KB Actual length + 2 bytes Blog posts, descriptions
MEDIUMTEXT 16 MB Actual length + 3 bytes Large documents
LONGTEXT 4 GB Actual length + 4 bytes Huge blobs (rarely used)

*VARCHAR max depends on row size limit

Key insight: VARCHAR(255) storing "hello" uses 7 bytes (5 + 2 length bytes), not 255. Only stores what it needs.

graph LR
    subgraph "CHAR(5) storing 'hi'"
        C1["h"] --- C2["i"] --- C3["_"] --- C4["_"] --- C5["_"]
    end
    subgraph "VARCHAR(5) storing 'hi'"
        V0["len=2"] --- V1["h"] --- V2["i"]
    end

    style C3 fill:#4a3a1a,stroke:#ffaa44
    style C4 fill:#4a3a1a,stroke:#ffaa44
    style C5 fill:#4a3a1a,stroke:#ffaa44
    style V0 fill:#1a3a4a,stroke:#44aaff

Rule of thumb: VARCHAR is the default. Use CHAR only for truly fixed-width data. Use TEXT when you can't predict length.


Date/Time Types

Type Size Stores Example
DATE 3 bytes Date only 2025-03-15
TIME 3 bytes Time only 14:30:00
DATETIME 8 bytes Date + time, no timezone 2025-03-15 14:30:00
TIMESTAMP 4 bytes Date + time, stored as UTC epoch 1970 – 2038

The 2038 Problem

TIMESTAMP is stored as a 32-bit integer (seconds since 1970-01-01). It overflows on January 19, 2038.

graph LR
    A["1970-01-01<br/>epoch = 0"] --> B["2025-03-15<br/>epoch = 1,741,034,400"] --> C["2038-01-19<br/>epoch = 2,147,483,647<br/>💥 OVERFLOW"]
    style C fill:#4a1a1a,stroke:#ff4444

Workaround: Store epoch as BIGINT — good for billions of years.

Getting Epoch in SQL

Database Function
MySQL UNIX_TIMESTAMP()
PostgreSQL EXTRACT(EPOCH FROM NOW())::BIGINT

NOW() returns a datetime, not an integer. Can't compare directly with a BIGINT column.


Floating Point Types

Type Size Precision Use case
FLOAT 4 bytes ~7 digits Approximate values (rarely used)
DOUBLE 8 bytes ~15 digits Scientific, coordinates
DECIMAL(m,d) Variable Exact Money! DECIMAL(10,2)99999999.99

Golden rule: Never use FLOAT/DOUBLE for money. Floating point rounding will bite you.

-- FLOAT: 0.1 + 0.2 = 0.30000000000000004  ❌
-- DECIMAL: 0.1 + 0.2 = 0.3                 ✅

Binary & Blob Types

Type Max Size Use case
BINARY(n) 255 bytes Fixed-length hashes, UUIDs
VARBINARY(n) 65 KB Variable binary data
BLOB 65 KB Small files (usually use S3 instead)

Boolean

BOOLEAN is TINYINT(1) under the hood. 0 = false, 1 = true.


Quick Decision Guide

flowchart TD
    A[What are you storing?] --> B{Numbers?}
    A --> C{Text?}
    A --> D{Date/Time?}
    A --> E{Money?}

    B -->|"Counting things"| B1["INT (4 bytes)"]
    B -->|"Timestamps/IDs"| B2["BIGINT (8 bytes)"]
    B -->|"Flags"| B3["TINYINT (1 byte)"]

    C -->|"Fixed width<br/>(country codes)"| C1["CHAR(n)"]
    C -->|"Variable, bounded<br/>(keys, emails)"| C2["VARCHAR(n)"]
    C -->|"Unbounded text"| C3["TEXT"]

    D -->|"Need timezone"| D1["BIGINT epoch<br/>(avoids 2038)"]
    D -->|"Just a date"| D2["DATE"]

    E --> E1["DECIMAL(m,d)<br/>NEVER FLOAT"]

    style E1 fill:#4a1a1a,stroke:#ff4444
    style D1 fill:#1a3a4a,stroke:#44aaff