Consistency Models — Cross-System Comparison

This note compares the consistency model offered by every database, datastore, and distributed-data system referenced in the Compute library. Each section pins the model on a single axis — linearizability ↓ eventual — then maps real systems to it. Use the tables to look up “what does System X actually promise”; use the decision tree to pick a model for a new workload.

1. The consistency hierarchy

strongest                                                              weakest
  |                                                                       |
linearizable → sequential → causal+ → causal → RYW → MR → MW → SE → eventual
  Spanner       traditional   COPS    Bayou   sessions  monotonic  CRDTs   DynamoDB
  Calvin        single-node   Eiger             reads    writes    Riak    Cassandra
  etcd/Raft     in-mem DB                                                  (default)
  ZooKeeper

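Linearizable (Herlihy-Wing 1990) — each operation appears to take effect atomically at a single point between its invocation and its response, so the resulting order respects real time. Strongest.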
Sequential (Lamport 1979) — operations appear in some total order, agreed by all processes; does not need to be real-time.
Causal+ (Lloyd-Freedman-Kaminsky-Andersen, COPS, SOSP 2011) — causal consistency + convergent conflict handling. The strongest model achievable under partition + low latency.
Causal (Hutto-Ahamad 1990) — operations that are causally related are seen in causal order; concurrent ops can be observed in different orders.
Read-your-writes (RYW) — a process always sees its own writes (session model).
Monotonic read — successive reads see ≥ previous read’s version.
Monotonic write — process’s own writes are applied in issue-order.
Strong-eventual (SEC) (Shapiro-Preguiça-Baquero-Zawirski 2011) — convergent state given any delivery order (CRDT-style).
Eventual (Vogels 2008 ACM Queue) — if no new writes, all replicas converge.
PRAM / FIFO (Pipelined RAM, Lipton-Sandberg 1988) — writes by one process are seen in issue order by all others; writes from different processes may be observed in different orders. Weaker than causal; not shown on the axis above.

2. CAP, PACELC — the framing

CAP (Brewer 2000 PODC keynote; Gilbert-Lynch 2002 proof) — under network partition, a system must choose C (consistency = linearizability) or A (availability = every request answered). Never both.

PACELC (Daniel Abadi 2010) — extends CAP. If there is a Partition: A vs C. Else (no partition): Latency vs Consistency. The L vs C trade-off is the practical one: partitions are rare, whereas the latency-vs-consistency cost is paid on every operation.

System	Partition behavior	Else behavior	Classification
Spanner	CP	EC (commit wait adds latency)	PC/EC
Cassandra (CL=ONE)	AP	EL (eventual, low latency)	PA/EL
DynamoDB (strong reads)	CP	EC	PC/EC
DynamoDB (eventual reads)	AP	EL	PA/EL
MongoDB (default w=majority)	CP-ish (configurable)	EC or EL depending on `readConcern`	PC/EC
Cosmos DB	CP or AP per consistency level	configurable (5 levels)	configurable
FoundationDB	CP (strict serializable)	EC	PC/EC
CockroachDB	CP (serializable)	EC	PC/EC
TiDB	CP (snapshot isolation by default)	EC	PC/EC
YugabyteDB	CP	EC	PC/EC
Riak	AP	EL	PA/EL
Aerospike	AP (default) or CP (strong-consistency mode)	EL or EC	PA/EL or PC/EC
Redis (single-node)	n/a	n/a (linearizable single-instance)	linearizable
Redis Cluster	AP	EL (async replication)	PA/EL
etcd (Raft)	CP	EC (linearizable reads)	PC/EC
ZooKeeper (ZAB)	CP	EC (sequentially consistent reads)	PC/EC (reads are sequential, not linearizable)

3. Database family → consistency model

Mapping every system in database-internals and database-engine-taxonomy.

3.1 Relational OLTP

System	Default isolation	Strictest available	Replication mode	Notes
PostgreSQL	Read Committed	Serializable (SSI; Cahill-Röhm-Fekete 2008, PostgreSQL implementation Ports-Grittner 2012)	sync (`synchronous_commit`) or async streaming	SSI detects dangerous read-write dependencies, including predicate reads, without blocking readers
MySQL InnoDB	REPEATABLE READ (snapshot-based; locking reads and writes see the latest committed rows, so phantoms and lost updates are possible)	SERIALIZABLE	async binlog, semi-sync, group replication (MGR, 8.0)	InnoDB’s RR is MVCC with the snapshot taken at the first consistent read
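MySQL Galera (MariaDB Galera Cluster)	REPEATABLE READ (engine default)	roughly snapshot isolation across the cluster (SERIALIZABLE is not enforced cluster-wide)	virtually synchronous, certification-based	write sets are broadcast in total order and certified at commit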
SQL Server	Read Committed (default)	Serializable	always-on availability groups (sync/async)	also has Snapshot isolation
Oracle	Read Committed	Serializable (implemented as snapshot isolation)	Data Guard / Active Data Guard (sync/async)	SERIALIZABLE is snapshot isolation rather than true serializability, so write skew is possible
SQLite	Serializable (one writer at a time)	Serializable	n/a (embedded)	writers are serialized by a database-level lock; WAL mode lets readers run concurrently with the single writer
Aurora (Postgres / MySQL)	inherits engine	inherits engine	six storage copies across three AZs; 4/6 write quorum, 3/6 read quorum	storage layer is decoupled from compute; the compute layer is single-writer (Aurora DSQL, 2024, adds multi-region active-active)

3.2 NewSQL (geo-distributed)

System	Default isolation	Replication	Mechanism
Google Spanner	external consistency (strict serializability)	Paxos across replicas	TrueTime (GPS + atomic clocks, Marzullo-style interval intersection; ε typically 1–7 ms per the 2012 paper)
Google Spanner stale reads	bounded staleness (configurable)	Paxos	TrueTime + read timestamp
CockroachDB	serializable	multi-Raft per range	HLC (hybrid logical clock, Kulkarni-Demirbas-Madappa 2014)
YugabyteDB	serializable + snapshot isolation	Raft per tablet	HLC-based
TiDB	snapshot isolation (default)	multi-Raft per region	PD-issued global timestamp via TSO oracle
FoundationDB	strict serializable	Paxos + state-machine replication	resolver-coordinated transaction read-version
Calvin (Yale, Thomson-Diamos-Weng-Ren-Shao-Abadi 2012)	strict serializable	deterministic order across replicas	sequencer pre-orders before execution
Amazon Aurora DSQL (re:Invent 2024)	strong snapshot isolation (optimistic concurrency control)	active-active multi-region	journal-based commit; conflicts are checked at commit time

3.3 Key-value / wide-column

System	Default	Strongest available	Replication
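DynamoDB	eventually consistent reads	strongly consistent reads (single-Region, per item)	replicated across three AZs per Region; `TransactWriteItems` / `TransactGetItems` for atomic multi-item operations
Cassandra	tunable (ONE/QUORUM/ALL)	QUORUM read + write (R+W>N) so reads overlap the latest acknowledged write; LWT (`SERIAL`) for linearizable compare-and-set	gossip membership; LWT is Paxos-based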
ScyllaDB	tunable (Cassandra-compatible)	QUORUM, LWT	seastar-based, low-latency Cassandra clone
Riak KV	tunable (R+W>N for “strong”)	eventual; with Dotted Version Vectors (Almeida et al 2014)	active anti-entropy; AAE merkle trees
Aerospike (default AP mode)	eventual	per-record consistency	replication factor; strong-consistency mode (SC) since 4.0 (Paxos)
Aerospike (SC mode)	session consistency for reads (default read mode)	linearizable single-record read/write	Roster-based; sync replication
etcd	linearizable reads (default) + lease-based watches	linearizable	Raft
ZooKeeper	sequential consistency	sequential + linearizable writes	ZAB
Hazelcast IMap	tunable (read-from-backup vs leader)	linearizable	CP subsystem (Raft) since 3.12
Apache HBase	strong per-row, eventual cross-row	strong per-row	HDFS-backed; HMaster + RegionServer
Bigtable	strong per-row	strong per-row + atomic	Colossus-backed
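
The R+W>N rule used by the tunable systems above (Cassandra, ScyllaDB, Riak) is simple arithmetic: a read quorum and a write quorum must share at least one replica. The sketch below only checks that overlap condition; the function names are illustrative, and overlap by itself does not make a system linearizable (conflict resolution and repair behavior still matter).

```python
def majority(n: int) -> int:
    """Size of a majority quorum for n replicas."""
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3
    # (read replicas, write replicas) for a few common level combinations
    for label, r, w in [
        ("QUORUM/QUORUM", majority(n), majority(n)),
        ("ONE/ONE", 1, 1),
        ("ONE/ALL", 1, n),
    ]:
        print(label, quorums_overlap(n, r, w))
```

With N=3, QUORUM reads plus QUORUM writes overlap, ONE plus ONE does not, and ONE reads only overlap if writes go to ALL.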

3.4 Document

System	Default consistency	Sessions / causal	Notes
MongoDB	majority writeConcern (default ≥ 5.0)	causal-consistency session w/ `afterClusterTime`	logical clocks (cluster time)
Couchbase	tunable (per-bucket durability levels: NONE/MAJORITY/MAJORITY-AND-PERSIST-ACTIVE/PERSIST-MAJORITY)	session consistency via N1QL hints	RYW per session
CouchDB	eventual	n/a	revision-tree merge resolution
AWS DocumentDB	strong (Aurora-like storage)	n/a	distinct from Mongo despite wire-compat

3.5 Time-series

System	Consistency	Notes
InfluxDB	eventual (open source)	tags + fields + retention buckets
InfluxDB Enterprise / Cloud	tunable	hinted handoff anti-entropy
TimescaleDB	inherits Postgres (Serializable available)	hypertables on Postgres
Prometheus	local-only (no replication)	scrape-based; Thanos / Cortex / Mimir add replication
Mimir / Thanos / Cortex	configurable	typically eventual via S3 backing
Druid	eventual (segment-level)	indexer + historical pattern
ClickHouse	tunable; default async; `insert_quorum` setting for quorum-acknowledged inserts	RANS / Atomic / Replicated engines

3.6 Graph

System	Consistency	Notes
Neo4j	read committed by default (single instance); causal cluster adds RYW via bookmarks	Raft for causal clustering
TigerGraph	strong per-machine, eventual cluster	partition-tolerant by design
ArangoDB	tunable	RocksDB engine; resilient single-server replication
JanusGraph	depends on backend (Cassandra → tunable; HBase → strong per-row; Bigtable → strong per-row)
Amazon Neptune	strong reads (read replicas eventual w/ option for read-after-write)	shared storage like Aurora

3.7 Vector

System	Consistency	Notes
Pinecone	strong index reads; freshness window for writes	“freshness ~30s by default”
Weaviate	tunable, similar to Cassandra (consistency-level per operation)	Raft-based metadata, async vector replication
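Milvus	bounded staleness (default); Strong, Session, and Eventually also selectable	consistency level set per collection or per request
Qdrant	eventual; configurable	Raft for cluster metadata; point writes replicated with a configurable `write_consistency_factor`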
pgvector (in Postgres)	inherits Postgres (Serializable available)	depends on cluster setup
Chroma	single-node (linearizable trivially); cluster-mode emerging

3.8 In-memory caches

System	Consistency	Notes
Redis (standalone)	linearizable single-instance	single-threaded command loop
Redis Cluster	AP / eventual	async replication; built-in replica promotion on failover (Sentinel is for non-cluster deployments)
Redis Enterprise	tunable; Active-Active with CRDT semantics	per-key CRDT; “EAC” (eventual active-active consistency)
Memcached	best-effort; client-side hashing	no replication
Hazelcast (CP subsystem)	linearizable	Raft

3.9 Streaming / log

System	Consistency	Notes
Kafka	per-partition total order; configurable producer `acks` (0, 1, or all; -1 is an alias for all)	ISR-based; `min.insync.replicas` for durability
Kafka transactional	exactly-once across partitions	producer ID + epoch; idempotent producers
Pulsar	per-partition strong; cross-partition eventual	BookKeeper-backed ledger w/ quorum write
RabbitMQ Streams	per-stream strong	replicated streams (since 3.9); quorum queues use Raft (since 3.8) and supersede classic mirroring
AWS Kinesis	per-shard order	shard iterator; trimming horizon
Redpanda	Kafka-compatible; Raft per partition	low-latency engine, single binary
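
Per-partition ordering in the table above follows from how keyed records are routed: every record with the same key lands in the same partition, and each partition is an append-only log. The sketch below is a toy model of that routing, not Kafka's implementation. It uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash), and `partition_for` and `append_all` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    """Route a key to a partition; a stand-in for a real partitioner."""
    return zlib.crc32(key) % num_partitions


def append_all(records, num_partitions):
    """Append (key, value) records to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for key, value in records:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


if __name__ == "__main__":
    records = [(b"user-1", "a"), (b"user-2", "x"), (b"user-1", "b"), (b"user-1", "c")]
    logs = append_all(records, num_partitions=4)
    for partition, log in sorted(logs.items()):
        print(partition, log)
```

Order is preserved for each key, but nothing relates the positions of records in different partitions, which is why cross-partition guarantees need transactions or an external sequencing scheme.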

4. CRDTs — the strong-eventual family

crdts-and-distributed-data-types details these. CRDTs achieve strong-eventual consistency by structuring operations to be:

Commutative — order doesn’t matter (e.g., set-union, max counter).
Associative — grouping doesn’t matter.
Idempotent — applying twice = once.

This means any delivery order converges to the same state. No coordination, no consensus — but the data type’s algebra is constrained.
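
As a concrete instance of the three properties, here is a minimal state-based G-Counter sketch. The class and method names are illustrative, not taken from any particular library. Each replica increments only its own slot, and `merge` takes the per-replica maximum, which is what makes it commutative, associative, and idempotent.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    replica_id: str
    counts: dict = field(default_factory=dict)  # replica id -> increments seen

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        """Join two states by taking the per-replica maximum."""
        merged = dict(self.counts)
        for rid, n in other.counts.items():
            merged[rid] = max(merged.get(rid, 0), n)
        return GCounter(self.replica_id, merged)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)
    b.increment(3)
    print(a.merge(b).value(), b.merge(a).value())
```

Because merging is a per-slot maximum, replaying or reordering state exchanges cannot change the converged value.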

CRDT	Use	Where it ships
G-Counter (grow-only)	counters	Riak
PN-Counter (positive + negative)	counters supporting decrement	Riak, Redis Enterprise
LWW-Register	last-write-wins single value	Cassandra columns, Cosmos DB multi-region writes (default conflict resolution)
Multi-Value Register	concurrent writes preserved	Riak’s MVRegister
OR-Set (observed-remove set)	sets w/ add + remove	Riak, Akka Distributed Data, Redis Enterprise
LWW-Element-Set	set w/ add + remove timestamp tie-break	Riak
RGA (Replicated Growable Array)	ordered list	collaborative text (Automerge; Yjs uses the related YATA algorithm)
Logoot / LSEQ	ordered list	collaborative text
Yjs Y.Doc	tree-of-CRDTs	Liveblocks, Tldraw, Tiptap, JupyterLab
Automerge	document JSON CRDT	Local-first apps
Causal-tree (Grishchenko)	text editing	early CRDT text work

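Modern collaborative apps (Figma, Linear, Notion-text, Replit, Tldraw) rely on CRDTs or closely related sync techniques for offline-first / multi-user editing; not all are pure CRDTs (Figma describes its multiplayer system as CRDT-inspired with a central server). The flip side is that CRDTs cannot enforce global invariants: you cannot guarantee “bank balance ≥ 0” in a pure CRDT without coordination.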

5. The “session” subset — practical mid-tier

Session models (Terry et al, “Session Guarantees for Weakly Consistent Replicated Data” 1994) include:

Read-your-writes (RYW) — every read sees the requester’s own previous writes.
Monotonic reads (MR) — successive reads of the same data never go backwards in time.
Monotonic writes (MW) — writes from the same session apply in issue order.
Writes-follow-reads (WFR) — a session’s write is ordered after the writes whose effects that session previously read.

Each guarantee on its own is weaker than causal consistency; taken together they come close to it from a single session’s point of view, and they are practically sufficient for most user-facing UIs: a user submits a change, sees it immediately, and still sees it after a refresh. MongoDB’s “causal consistency” (introduced in 3.6, 2017) and Cosmos DB’s “session” consistency are session models in Terry’s sense.
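
One common way to get read-your-writes and monotonic reads is for the client session to carry a version floor and refuse replicas that are behind it. The sketch below assumes a single global version number per replica, which is a simplification (real systems use cluster times, LSNs, or version vectors); `Replica`, `Session`, and `StaleReplica` are hypothetical names.

```python
class StaleReplica(Exception):
    """Raised when a replica is behind what the session has already observed."""


class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        # Replicated writes arrive in version order in this sketch.
        self.data[key] = value
        self.version = version


class Session:
    def __init__(self):
        self.write_floor = 0  # highest version this session wrote (RYW)
        self.read_floor = 0   # highest version this session read (MR)

    def record_write(self, version):
        self.write_floor = max(self.write_floor, version)

    def read(self, replica, key):
        floor = max(self.write_floor, self.read_floor)
        if replica.version < floor:
            raise StaleReplica(f"replica at {replica.version}, session needs {floor}")
        self.read_floor = replica.version
        return replica.data.get(key)


if __name__ == "__main__":
    primary, lagging = Replica(), Replica()
    session = Session()
    primary.apply(1, "name", "Ada")
    session.record_write(1)
    print(session.read(primary, "name"))
    try:
        session.read(lagging, "name")
    except StaleReplica as exc:
        print("retry elsewhere:", exc)
```

On `StaleReplica` a client would typically retry against another replica or wait for the lagging one to catch up; the guarantee comes from the floor check, not from where the read is routed.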

6. Postgres isolation levels — the SQL reality

Postgres accepts all four SQL-standard levels but implements only three, so some requests are silently upgraded:

Requested	Actually delivered
Read Uncommitted	Read Committed (Postgres has no dirty reads anyway)
Read Committed	Read Committed (default)
Repeatable Read	Snapshot Isolation
Serializable	Serializable Snapshot Isolation (SSI; Cahill-Röhm-Fekete 2008, PostgreSQL implementation Ports-Grittner 2012)

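`synchronous_commit` = off / local / remote_write / on / remote_apply tunes the durability/latency trade-off; `remote_apply` is the strongest synchronous-replication mode, because the commit returns only after the synchronous standbys have replayed it and made it visible to queries.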
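
Running at Serializable has one practical consequence for application code: a transaction can be aborted with a serialization failure (SQLSTATE 40001) and must be retried from the start. The sketch below shows the retry shape only. `SerializationFailure` is a stand-in for whatever exception your driver raises for that SQLSTATE, and `run_serializable` and `txn` are hypothetical names.

```python
import random
import time


class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""


def run_serializable(txn, max_attempts=5, base_delay=0.01):
    """Call txn() until it succeeds, retrying on serialization failures.

    txn must run the whole transaction (begin, statements, commit) each time.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Jittered exponential backoff so retries do not collide again.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())


if __name__ == "__main__":
    attempts = {"n": 0}

    def transfer():
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise SerializationFailure()
        return "committed"

    print(run_serializable(transfer), "after", attempts["n"], "attempts")
```

The retry has to re-run the reads as well as the writes, since the values the first attempt read may be exactly what made it unserializable.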

7. MySQL InnoDB isolation — the practical gotcha

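InnoDB’s “REPEATABLE READ” is snapshot-based, not lock-based: plain SELECTs read from the snapshot taken at the transaction’s first consistent read, while locking reads (SELECT…FOR UPDATE) and UPDATE/DELETE act on the latest committed rows. Mixing the two in one transaction can surface rows the snapshot never showed (phantoms), and a read-modify-write built on plain SELECTs can lose updates. The actually-serializable mode is SERIALIZABLE, which turns plain reads into shared-lock reads (high contention). Most MySQL applications run at REPEATABLE READ.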

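Galera Cluster (MariaDB Galera Cluster) provides virtually synchronous replication via certification: at commit, each transaction’s write set is broadcast to all nodes in a total order and certified against concurrent writes; if certification fails, the transaction aborts. Because only write sets are compared, the cluster-wide guarantee is closer to snapshot isolation than to serializability (write skew is not detected).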
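
Certification at commit can be pictured as a first-committer-wins check over write sets. The sketch below is a heavily simplified model under stated assumptions, not Galera's actual algorithm: every node is assumed to see write sets in the same total order, keys are plain strings, and `Certifier` is a hypothetical name.

```python
class Certifier:
    """First-committer-wins certification over write sets (toy model)."""

    def __init__(self):
        self.commit_seq = 0
        self.last_writer = {}  # key -> sequence number of last certified write

    def certify(self, snapshot_seq, write_set):
        """Return a commit sequence number, or None if the transaction must abort.

        snapshot_seq is the last commit the transaction had seen when it started.
        """
        for key in write_set:
            if self.last_writer.get(key, 0) > snapshot_seq:
                return None  # someone committed a write to this key in between
        self.commit_seq += 1
        for key in write_set:
            self.last_writer[key] = self.commit_seq
        return self.commit_seq


if __name__ == "__main__":
    c = Certifier()
    print(c.certify(0, {"row:1"}))  # first writer commits
    print(c.certify(0, {"row:1"}))  # concurrent writer to the same row aborts
    print(c.certify(0, {"row:2"}))  # concurrent writer to another row commits
```

Note that this sketch compares write sets only, so two transactions that read overlapping data but write disjoint keys both certify.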

8. Consensus protocols — the substrate

Protocol	Used by	Notes
Paxos (Lamport 1998)	Chubby, Spanner (both as Multi-Paxos); ancestor of most later protocols	hard to implement correctly
Multi-Paxos	Spanner, Chubby, Cassandra LWT	optimization over single-decree Paxos
Raft (Ongaro-Ousterhout 2014)	etcd, Consul, CockroachDB, YugabyteDB, TiKV, MongoDB replica sets (Raft-derived protocol), NATS JetStream, ScyllaDB Raft (since 5.0)	designed for understandability
ZAB (ZooKeeper Atomic Broadcast)	ZooKeeper	sequential consistency only
EPaxos (Egalitarian Paxos, Moraru-Andersen-Kaminsky 2013)	research / experimental	leaderless, lower latency in WAN
Flexible Paxos (Howard-Schwarzkopf-Madhavapeddy-Crowcroft 2017)	derivative tooling	weakens quorum size requirements
Calvin (Yale 2012)	Calvin DB	sequencer pre-orders, deterministic execution
Tendermint BFT (Buchman 2016)	Cosmos SDK chains, other blockchains	PBFT-derived Byzantine-fault-tolerant consensus
PBFT (Castro-Liskov 1999)	Hyperledger Fabric, some blockchain	f Byzantine faults tolerated out of 3f+1
HotStuff (Yin-Malkhi-Reiter-Gueta-Abraham 2019)	Diem (formerly Libra), several blockchain	linear view-change cost

consensus-protocols details each.
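
The fault-tolerance arithmetic behind the table is worth having at hand: majority-quorum protocols (Paxos, Raft, ZAB) need 2f+1 nodes to survive f crash faults, while PBFT-style protocols need 3f+1 nodes and quorums of 2f+1 to survive f Byzantine faults. A small sketch, with illustrative function names:

```python
def crash_faults_tolerated(n: int) -> int:
    """Crash faults a majority-quorum protocol survives with n nodes."""
    return (n - 1) // 2


def byzantine_faults_tolerated(n: int) -> int:
    """Byzantine faults a PBFT-style protocol survives with n nodes."""
    return (n - 1) // 3


def pbft_sizes(f: int) -> tuple:
    """(cluster size, quorum size) needed to tolerate f Byzantine faults."""
    return 3 * f + 1, 2 * f + 1


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(n, crash_faults_tolerated(n), byzantine_faults_tolerated(n))
```

This is also why even-sized Raft clusters are rarely worth it: four nodes tolerate the same single crash fault as three.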

9. The “external consistency” guarantee (Spanner)

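Spanner gives external consistency, which is strict serializability: transactions behave as if executed one at a time, and if T1 commits before T2 begins (in real time), then T1’s commit timestamp is smaller than T2’s, so T1’s effects are visible at T2’s snapshot. It is linearizability lifted from single-object operations to multi-object transactions, and it holds even when T1 and T2 come from independent clients that communicate outside the database.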

The trick: TrueTime, a globally synchronized clock API that exposes its own bounded error ε (typically 1–7 ms in Google’s datacenters per the 2012 paper, via GPS + atomic clocks). A commit waits out the uncertainty window (about 2ε) so that its timestamp is guaranteed to be in the past before the commit becomes visible. ε is the price; the win is that read-only transactions can run lock-free at a chosen timestamp, and stale reads can be served by any sufficiently caught-up replica.
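
The commit-wait rule can be sketched in a few lines. This is a simulation under stated assumptions, not Spanner's implementation: `SimulatedTrueTime` fakes the interval clock from the local monotonic clock with a fixed ε, and `commit_with_wait` is a hypothetical name. The rule it models is: pick the commit timestamp s at the top of the current uncertainty interval, then wait until the bottom of the interval has passed s.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Interval clock with a fixed uncertainty bound epsilon (seconds)."""

    def __init__(self, epsilon: float):
        self.epsilon = epsilon

    def now(self) -> TTInterval:
        t = time.monotonic()
        return TTInterval(t - self.epsilon, t + self.epsilon)


def commit_with_wait(tt: SimulatedTrueTime) -> float:
    """Choose a commit timestamp and block until it is definitely in the past."""
    s = tt.now().latest
    while tt.now().earliest <= s:
        time.sleep(max(tt.epsilon / 10, 0.0005))
    return s


if __name__ == "__main__":
    tt = SimulatedTrueTime(epsilon=0.005)
    start = time.monotonic()
    s = commit_with_wait(tt)
    print(f"waited {(time.monotonic() - start) * 1000:.1f} ms for commit at {s:.3f}")
```

The wait is roughly 2ε per commit, which is why shrinking clock uncertainty translates directly into lower write latency.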

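Spanner-style systems without TrueTime often use Hybrid Logical Clocks (HLC, Kulkarni-Demirbas-Madappa 2014), which combine physical NTP time with a Lamport-style counter as a tiebreaker. CockroachDB and YugabyteDB use HLCs, and MongoDB’s cluster time is a similar hybrid scheme; TiDB/TiKV instead takes timestamps from a central oracle (PD’s TSO).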
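
A minimal HLC sketch, following the update rules from the HLC paper as best understood here: a timestamp is a pair (l, c), where l tracks the largest physical time seen and c breaks ties when l does not advance. The class shape and the injectable `physical_clock` are assumptions for illustration; production implementations also bound how far l may run ahead of the local clock.

```python
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (l, c), compared lexicographically


class HLC:
    def __init__(self, physical_clock: Callable[[], int]):
        self._pt = physical_clock
        self.l = 0
        self.c = 0

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        prev = self.l
        self.l = max(prev, self._pt())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Timestamp a receive event, merging the sender's timestamp."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self._pt())
        if self.l == prev and self.l == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)


if __name__ == "__main__":
    fast, slow = HLC(lambda: 10), HLC(lambda: 5)
    sent = fast.now()
    received = slow.update(sent)
    print(sent, received, received > sent)
```

Even though the receiver's physical clock is behind the sender's, the receive timestamp still sorts after the send timestamp, which is the causality property the databases above depend on.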

10. Decision tree — pick a consistency model

What's the workload?
├─ Financial transactions, ledger, bank account, inventory
│    → Linearizable / external consistency
│    → Spanner, CockroachDB, FoundationDB, Aurora DSQL
│    → Trade off: 5–50 ms write latency in geo-distributed setup
├─ E-commerce cart / order placement (single region)
│    → Serializable (SSI in Postgres, lock-based in InnoDB)
│    → Postgres, MySQL InnoDB SERIALIZABLE
│    → 1–5 ms writes
├─ Session state, shopping cart (cross-region, low-latency)
│    → Read-your-writes (session model)
│    → Cosmos DB Session, MongoDB causal session, DynamoDB strong+sessioned
├─ Collaborative document editor (Figma, Notion, Linear)
│    → Strong-eventual via CRDT
│    → Yjs, Automerge, Liveblocks
│    → No global invariants — conflicts merge automatically
├─ Social-feed timeline
│    → Eventual
│    → Cassandra, DynamoDB eventual, MongoDB read=local
│    → Sub-ms reads in exchange for stale data window
├─ Time-series ingest (metrics, logs)
│    → Eventual; per-series order
│    → InfluxDB, Prometheus, ClickHouse, TimescaleDB
├─ Distributed lock / leader-election / config
│    → Linearizable
│    → etcd, ZooKeeper, Hazelcast CP subsystem, Consul
├─ Event log / event sourcing source-of-truth
│    → Per-partition total order
│    → Kafka, Pulsar, Kinesis, Redpanda
│    → Use exactly-once semantics if cross-partition matters
├─ Caching layer
│    → Eventual (stale OK by definition)
│    → Redis Cluster, Memcached
├─ Vector search (RAG, embeddings)
│    → Eventual with freshness window
│    → Pinecone, Weaviate, Milvus, Qdrant, pgvector
├─ Multi-region with strong needs
│    → External consistency, accept higher latency
│    → Spanner, CockroachDB, Aurora DSQL
└─ Multi-region with low-latency needs
     → Causal / RYW + per-region linearizable
     → MongoDB causal session, Cosmos DB Bounded Staleness
     → DynamoDB Global Tables (LWW eventual)

11. Anti-patterns — the four common mistakes

Mistaking InnoDB “REPEATABLE READ” for a fully isolated snapshot — locking reads and writes see newer committed rows, so phantoms and lost updates are possible. Use SERIALIZABLE or compensate with SELECT…FOR UPDATE.
Trusting DynamoDB “strongly consistent” reads across items or partitions — strong consistency is per-item only, within one Region. Cross-item ACID requires `TransactWriteItems` / `TransactGetItems`.
Assuming MongoDB writes are durable on default config — historically w:1 was default; only w:majority is durable across replica set. As of MongoDB 5.0, default is w:majority, but check.
Using CRDTs for invariants that need coordination — bank balance ≥ 0, unique username, exactly-once seat reservation. CRDTs converge; they do not coordinate. You need a consensus protocol on top.
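
The last anti-pattern is easy to reproduce. In the sketch below a balance is modeled as a PN-Counter held as plain dicts (an illustrative encoding, with hypothetical helper names). Two replicas each check the invariant locally, both withdrawals pass, and the merged state breaks it.

```python
def pn_value(state):
    """Current value: total increments minus total decrements."""
    return sum(state["p"].values()) - sum(state["n"].values())


def pn_merge(a, b):
    """Merge two PN-Counter states with a per-replica maximum on each side."""
    return {
        side: {
            r: max(a[side].get(r, 0), b[side].get(r, 0))
            for r in a[side].keys() | b[side].keys()
        }
        for side in ("p", "n")
    }


def withdraw(state, replica, amount):
    """Decrement on one replica after a purely local balance check."""
    if pn_value(state) < amount:
        raise ValueError("insufficient funds")
    new = {"p": dict(state["p"]), "n": dict(state["n"])}
    new["n"][replica] = new["n"].get(replica, 0) + amount
    return new


if __name__ == "__main__":
    opening = {"p": {"A": 100}, "n": {}}
    at_a = withdraw(opening, "A", 80)  # passes the local check on replica A
    at_b = withdraw(opening, "B", 80)  # passes the local check on replica B
    merged = pn_merge(at_a, at_b)
    print(pn_value(at_a), pn_value(at_b), pn_value(merged))
```

Both replicas converge to the same state, as a CRDT promises; the state they converge to is simply one that no single replica would have allowed.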

12. The 2024–2026 frontier

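Spanner-class on commodity — CockroachDB 24.x, YugabyteDB 2024+, TiDB 8.x, and FoundationDB (Apple) ship strongly consistent distributed transactions on commodity hardware without TrueTime, using HLCs (CockroachDB, YugabyteDB) or a central timestamp/sequencing service (TiDB, FoundationDB).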
Aurora DSQL (re:Invent 2024) — active-active multi-region Postgres with strong consistency, journal-based.
Distributed SQLite (rqlite, Turso/libSQL, Litestream) — embedded DB scaled out via Raft (rqlite) or log replication (Turso libSQL Server).
ScyllaDB Raft tables (5.x) — tables can now be strongly consistent (Raft-backed) alongside Cassandra-style eventual tables.
Postgres logical replication for active-active — pglogical, Bucardo, BDR (2ndQuadrant / EDB), plus the logical-replication improvements in Postgres 16.
PostgreSQL serializable snapshot isolation (SSI) — production-proven at scale (Heroku Postgres, Crunchy Bridge, etc.).
CRDTs in production — Figma’s multiplayer system, Liveblocks (Yjs as a service, ~$25M Series A 2024), Replicache (sync layer for local-first), Replit’s collaborative editor.

Adjacent

Math foundations — markov-chains-and-hmm for distributed clock theory; probability-fundamentals for failure-rate modeling underlying availability.
Cryptography — cryptography-fundamentals for the digital-signature primitives underlying BFT consensus.
Distributed systems theory — distributed-systems-fundamentals for FLP impossibility, async/sync model, consensus theorems.
Storage engines — database-internals / databases-internals-deep / database-engine-taxonomy for the engines underneath.
Service architecture — _compare_service-architectures for how service decomposition interacts with consistency boundaries.
CRDTs — crdts-and-distributed-data-types for full coverage.
Microservices — microservices-patterns for saga, event sourcing, outbox, distributed transactions across services.

When to pick what

The fastest narrowing: money / inventory → linearizable; session UI state → causal or RYW; collaborative editing → CRDT (strong-eventual); feed / cache / metrics → eventual; locks / config → linearizable consensus (etcd/ZK). Geography flips the dial: multi-region linearizable pays 50+ ms per write (Spanner, CockroachDB across continents), so regional + eventual cross-region is the most common modern pattern. The biggest hidden cost is usually the default isolation level: pick it deliberately, document it, and test under partition with chaos engineering (Jepsen-style tests; Aphyr/Kingsbury’s analyses cover many of the systems above).
