CRDTs & Distributed Data Types
Conflict-free Replicated Data Types (CRDTs) are data structures designed to be replicated across multiple nodes such that replicas can be updated independently and concurrently, without coordination, and any divergent replicas will mathematically converge to the same value once all updates have propagated. They enable strong eventual consistency (SEC) without locking, consensus, or a central coordinator — a foundational pattern for collaborative editing, offline-first apps, geo-distributed databases, and edge computing.
Definition and Origin
The canonical formalisation comes from Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski in two 2011 publications:
- INRIA Technical Report RR-7687 — “A comprehensive study of Convergent and Commutative Replicated Data Types” (January 2011) — the long-form treatment with proofs and a taxonomy of canonical types.
- SSS 2011 (Symposium on Self-Stabilising Systems) — “Conflict-free Replicated Data Types” — the short conference paper introducing the term CRDT.
The intellectual lineage runs back through:
- Bayou (Xerox PARC, 1994) — Terry, Petersen, Theimer, Spreitzer, Demers, Welch. Optimistic replication with anti-entropy, primary-commit with merge procedures and tentative writes — the first system to formalise reconciliation for mobile/disconnected operation.
- Amazon Dynamo (2007) — DeCandia, Hastorun, Jampani, Kakulapati, Lakshman, Pilchin, Sivasubramanian, Vosshall, Vogels. Vector clocks for causal tracking; merge-on-read with sibling resolution; quorum reads (R+W>N) trading consistency for availability.
- Letia, Preguiça, Shapiro (2009) — collaborative editing with WOOT, the immediate predecessor to operation-based CRDTs.
Shapiro et al.’s contribution was to give a mathematical condition — replicas form a join semilattice (idempotent, commutative, associative merge), or operations commute under causal delivery — that guarantees convergence rather than merely making it likely.
Strong Eventual Consistency
SEC is a stronger guarantee than the eventual consistency of Dynamo. The formal statement:
If all correct replicas have received the same set of updates (in any order, with any duplicates), they have equivalent state.
Compare:
- Eventual consistency (Dynamo, DNS) — replicas converge if updates stop, but conflict resolution is application-specific (sibling reads in Dynamo).
- SEC (CRDTs) — replicas converge by construction, no application logic needed. Stronger than eventual; weaker than causal+ or linearizable.
- Causal consistency (COPS, Eiger) — reads reflect a causal cut. Compatible with CRDTs; usually delivered together.
- Linearizability — there is a single total order consistent with real-time. Requires consensus; incompatible with full availability under partition.
The CAP theorem (Brewer 2000, Gilbert-Lynch 2002) says you can have at most two of Consistency, Availability, Partition-tolerance — CRDTs choose AP. PACELC (Abadi 2012) extends this: under Partition you trade A vs C; Else you trade Latency vs Consistency. CRDTs commit to AP and EL: low-latency availability everywhere, paying with weakened semantics.
Bailis et al. (2013) “ALPS” systems — Available, Low-latency, Partition-tolerant, Strong (causal+) — articulated why CRDTs + causal delivery is the practical sweet spot for global apps.
State-Based CRDTs (CvRDTs — Convergent)
Replicas exchange full state. The merge function ⊔ : S × S → S must form a join semilattice:
- Idempotent —
s ⊔ s = s - Commutative —
s₁ ⊔ s₂ = s₂ ⊔ s₁ - Associative —
(s₁ ⊔ s₂) ⊔ s₃ = s₁ ⊔ (s₂ ⊔ s₃) - Monotonically increasing — every local update produces a state that is ≥ the prior state in the lattice order
Given these properties, convergence is automatic: any gossip / anti-entropy protocol that eventually exchanges every pair of states will produce a unique join. Network may reorder, duplicate, or drop messages — merge tolerates all three.
Pro: simple, idempotent over re-delivery. Con: full state is heavy — for a 10 MB document, you ship 10 MB per sync.
Operation-Based CRDTs (CmRDTs — Commutative)
Replicas exchange operations (typed deltas). Operations must commute when applied in any order consistent with their causal history. Requires a causal broadcast layer (vector clocks, version vectors, dotted version vectors) that guarantees:
- Each operation is delivered exactly once to each replica.
- An operation is not applied before its causal dependencies.
Pro: bandwidth is proportional to update size, not full state. Con: requires reliable causal broadcast (vector clocks add metadata; exactly-once is non-trivial over lossy networks).
Causal broadcast primitives:
- Vector clocks (Fidge 1988, Mattern 1989) — one counter per replica; O(N) metadata where N = replicas.
- Version vectors — like vector clocks but pruned to known active replicas.
- Dotted version vectors (Preguiça-Baquero 2010) — efficient for client-server topologies where many clients touch the same key.
- Hybrid Logical Clocks (Kulkarni-Demirbas-Madappa-Avva-Leone 2014) — combine wall-clock + logical to give monotonic, partially ordered timestamps with bounded error vs real time. Used by CockroachDB, MongoDB, YugabyteDB.
Delta-State CRDTs
Introduced by Almeida, Shoker, Baquero (2018) — “Delta state replicated data types”. Instead of shipping full state (CvRDT) or every operation (CmRDT), ship small delta-mutators — incremental state fragments that, when joined into the receiver’s state, produce the same result as joining the full state.
Bandwidth-efficient like op-based, but tolerant of message loss like state-based (delta groups can be re-shipped, merge is idempotent). This is the model most modern production CRDT systems (AntidoteDB, recent Yjs internals, Loro) gravitate toward.
Canonical CRDT Types
Counters
- G-Counter (grow-only) — vector
[c₁, c₂, ..., cₙ], one entry per replica. Replicaiincrementscᵢ. Value isΣ cᵢ. Merge is element-wise max. Trivially a semilattice. - PN-Counter — pair of G-Counters
(P, N); value =Σ Pᵢ − Σ Nᵢ. Supports increment + decrement.
Sets
- G-Set (grow-only set) — add-only, no removal. Merge is set union.
- 2P-Set (two-phase set) — pair
(A, R)of G-Sets for adds and removes (tombstones). Element is present iff inAand not inR. Cannot re-add a removed element. - OR-Set (Observed-Remove Set, Shapiro 2011) — each add carries a unique tag (UUID or replica-counter pair). Remove only affects tags observed at the time of removal. Concurrent add and remove resolves in favour of add. The de facto set CRDT.
- LWW-Set — adds and removes both timestamped; element present iff last operation was an add. Drops concurrent writes.
Registers
- LWW-Register (Last-Writer-Wins) — value + timestamp pair; merge keeps the value with the higher timestamp. Requires a tiebreaker (replica ID) for equal timestamps. Conflict-free, at the cost of silently dropping concurrent writes.
- MV-Register (Multi-Value) — preserves all concurrent values as a set; application chooses how to display (Dynamo-style siblings).
Maps
- Composition of register/set/counter CRDTs keyed by some structure. OR-Map is the analogue of OR-Set for keys with arbitrary CRDT values.
Sequences (for collaborative text editing)
The hardest CRDT problem — preserving insertion intent in a totally-ordered sequence under concurrent edits.
- WOOT (Oster, Urso, Molli, Imine 2006) — Without Operational Transformation. Tombstone-heavy. Foundational but slow.
- Treedoc (Preguiça, Marquès, Shapiro, Letia 2009) — characters in a binary tree, position is a path. Re-balancing is hard.
- Logoot (Weiss, Urso, Molli 2010) — characters have lexicographic position identifiers in
(0, 1); insert between produces a new identifier. Identifiers can grow unbounded. - LSEQ (Nédelec, Molli, Mostefaoui, Desmontils 2013) — Logoot variant with adaptive strategy (boundary± expansion) to bound identifier growth.
- RGA (Replicated Growable Array, Roh, Jeon, Kim, Lee 2011) — operations form a tree of timestamps; linearised by a depth-first traversal. O(N) per insert in naive form; Yjs/Diamond Types use efficient indexes. The dominant modern algorithm.
- Causal Trees (Grishchenko 2014) — closely related to RGA.
Documents (JSON / nested)
- JSON CRDT (Kleppmann-Beresford 2017) — full nested object semantics with concurrent map/list/register operations; preserves user intent across moves, nested edits, deletes.
- Yjs Y.Doc — composable:
Y.Map,Y.Array,Y.Text,Y.XmlFragmentinside a single doc. By Kevin Jahns (since 2015; v13 stable). - Automerge — by Martin Kleppmann + Annette Bieniusa with collaborators; v2.0 released 2023 with a Rust core (
automerge-rs) and TypeScript bindings (@automerge/automerge). Earlier versions were JavaScript-only with significant overhead; v2 made it production-grade. - Diamond Types — by Seph Gentle (ex-Google Wave). High-performance JSON CRDT in Rust; benchmarks show 10-100× faster than Automerge on large edit traces.
- Loro (2024) — Rust CRDT library by the Loro team; competitive perf, rich type system.
Algorithmic Complexity and Garbage Collection
The cost most CRDT systems pay is metadata bloat:
- Tombstones (in 2P-Set, OR-Set, sequence CRDTs) accumulate forever in naive implementations.
- Vector clocks scale with the number of historical replicas; pruning requires consensus on which replicas are dead.
- RGA preserves all deleted nodes; Yjs compacts adjacent operations from the same client into runs.
Garbage collection is hard because: a replica that has been offline for months might still hold operations that depend on tombstones currently considered “safe to delete”. Production systems typically:
- Server-assisted GC — a central coordinator collects “I have seen up to time T” acks from clients, advances a watermark, prunes tombstones older than the watermark.
- Causal stability — define an operation as causally stable once every replica has observed it; safe to compact after that.
- Snapshotting — periodically take a flat-state snapshot, ship that as the new baseline, discard operation history below it. Used by Yjs and Automerge for cold storage.
Algorithmic complexity highlights:
- G/PN-Counter — O(N) state, O(1) op.
- OR-Set — O(adds + removes) state; without GC unbounded.
- RGA — O(N) per insert in naive form; Yjs uses a B-tree-like index to achieve O(log N) for many real-world workloads.
- Yjs document trace replay — ~1-10 ms for 100k edits on modern JS.
- Automerge v2 — comparable; pre-v2 was 100× slower.
Production Deployments
Databases
- Riak Data Types (2014, Basho) — the first major commercial CRDT database. Counters, sets, maps, registers natively. Riak’s CRDT implementation work by Sean Cribbs, Russell Brown, Sam Elliott and the Basho team. (Basho went bankrupt 2017; code preserved as open source; Riak revived by NHS / Bet365.)
- Redis Enterprise CRDB (Redis Labs, 2017+) — geo-distributed Redis with active-active CRDT replication; Counter, Set, Hash, String, Sorted-Set with CRDT semantics under the hood.
- AntidoteDB (SyncFree project, EU FP7) — academic transactional CRDT database from a consortium led by INRIA, including Marc Shapiro’s group. Adds highly-available transactions (HAT) over CRDTs.
- Cassandra / DynamoDB / ScyllaDB — LWW semantics under the hood; effectively LWW-Register at the cell level.
- ElectricSQL — sync layer over Postgres with CRDT-style conflict resolution; founded by Anna McDonald, James Arthur, Valter Balegas.
- PowerSync — SQLite-on-device synced to Postgres backend; CRDT-flavoured conflict resolution.
Collaborative Apps
- Figma — multi-user real-time design tool. Custom server-assisted CRDT-flavoured system; Evan Wallace documented their multiplayer architecture in 2019 — properties are LWW-Registers, structural ops use custom commutative semantics, server is authoritative.
- Notion — block-based editing; CRDT-style sync under the hood with server-coordinated ordering.
- Linear (Linear Sync) — fully offline-capable issue tracker; custom CRDT engine; their engineering team has published several writeups.
- Replicache + Reflect (Rocicorp, by Aaron Boodman) — sync engine over server-authoritative mutations + optimistic local replay; not pure CRDT but related lineage; Reflect (2023) adds real-time presence at the edge.
- Liveblocks — collaboration backend with CRDT primitives (
LiveMap,LiveList,LiveObject) and Yjs integration. - Convex — reactive backend; integrates with Yjs for collaboration features.
- Atlassian Confluence — collaborative editor uses Yjs since 2023.
Real-time / Chat / Presence
- SoundCloud — counters in production for likes/play counts (early Riak deployment).
- League of Legends Twitch chat — millions of concurrent users; custom CRDT-flavoured scalable chat.
- BBC Programmes, TomTom maps, Bet365 sports betting — all early Riak CRDT users.
- Cloudflare Durable Objects — building block for stateful edge apps; community libraries layer Yjs on top.
- Apple Notes (iCloud sync) — internal CRDT-flavoured replication.
CRDT vs Operational Transformation
The historical alternative is Operational Transformation (OT):
- Origins — Ellis-Gibbs 1989 “Concurrency control in groupware systems” (GROVE). Sun and Ellis 1998 is the canonical reference work on OT properties (TP1, TP2 — Transformation Properties).
- Google Wave (2010) popularised OT at scale; the Wave operation transformer was the basis for Google Docs.
- Google Docs, Office 365 Word/Excel/PowerPoint, Etherpad — all use OT, with a central authoritative server that linearises operations.
The CRDT vs OT debate (ongoing since ~2011):
- OT — well-tuned, decades of production experience, requires a central coordinator that transforms incoming ops against pending ones. Hard to prove correct in fully peer-to-peer settings; multiple published “OT” algorithms have been shown to violate TP2 or fail in edge cases.
- CRDT — provably correct by construction (the math gives convergence for free), supports peer-to-peer naturally, but historically higher metadata overhead. Modern algorithms (RGA + Yjs compaction, Diamond Types) have closed the perf gap.
The consensus by ~2024: CRDTs win for offline-first, P2P, edge, and end-to-end encrypted apps. OT remains viable and well-tuned for fully centralised editors (Google Docs, Office). Both are battle-tested.
Causal Consistency Systems
- COPS (Lloyd-Freedman-Kaminsky-Andersen 2011 SOSP) — Clusters of Order-Preserving Servers; causal+ consistency for geo-distributed key-value stores.
- Eiger (Lloyd 2013 NSDI) — extends COPS with write-only transactions and read-only transactions over column-family stores.
- Bolt-on causal consistency (Bailis 2013) — retrofits causal consistency onto any eventually-consistent store via a shim layer.
- Antidote / SyncFree — full transactional causal+ with CRDTs.
- AWS S3 Strong Consistency (2020) — uses chained replication for strong read-after-write; not quite linearizable but very close.
Theoretical Limits
- CAP theorem (Brewer 2000; Gilbert-Lynch 2002 proof) — under network partition you cannot have both availability and strong (linearizable) consistency.
- PACELC (Abadi 2012) — extends CAP: under Partition, choose A or C; Else, choose Latency or Consistency. PA/EL = CRDTs / Dynamo / Cassandra. PC/EC = HBase / spanner-style. PA/EC = MongoDB defaults.
- FLP impossibility (Fischer-Lynch-Paterson 1985) — no deterministic protocol can guarantee consensus in an asynchronous network with even one crash failure. Practical systems use randomisation (Paxos/Raft with timeouts) or assume partial synchrony.
- Quorum math — F+1 replicas tolerate F failures with majority quorum; R+W>N gives strong consistency in Dynamo-style systems if N replicas are reachable.
Hybrid Logical Clocks (HLC)
Kulkarni-Demirbas-Madappa-Avva-Leone 2014 — combine wall-clock time with a logical clock to give:
- Monotonically increasing per process
- Causally consistent across processes
- Bounded skew from real wall-clock time (typically <1 ms with NTP/PTP)
Used as the timestamp source by CockroachDB, MongoDB (4.0+), YugabyteDB. Critical infrastructure for CRDTs running over WANs where clock skew is the dominant practical concern.
Programming Models
- Functional Reactive Programming (FRP) on CRDTs — propagate changes through pure functions over CRDT inputs. Libraries: Yjs’s awareness API, Replicache reactivity, Convex.
- Eventual datalog — query over CRDTs; Bloom language (Conway et al. 2012) gave a calculus for monotonically-increasing computation that composes well with CRDTs.
- Lasp (Christopher Meiklejohn 2014) — Erlang language for distributed CRDT programming; SyncFree research output.
- Riak DT — Erlang library; the reference impl that Riak shipped.
Notable Libraries
- yjs — TypeScript/JavaScript; the production-quality leader. Sub-modules:
y-websocket,y-webrtc,y-indexeddb,y-leveldb. Used by Notion, Atlassian, JupyterLab, many others. - automerge — Rust core + TypeScript bindings; v2.0 made it competitive. Active development by Kleppmann’s group at Cambridge.
- diamond-types — Rust; Seph Gentle. Used by some experimental editors.
- rsdocs / loro — newer Rust CRDT libraries.
- akka-cluster-distdata — Scala / Akka; production-grade for JVM systems.
- lasp — Erlang.
- riak_dt — Erlang reference implementation.
- AntidoteDB — full transactional CRDT DB.
When to Use CRDTs
Good fit:
- Real-time collaborative editing (text, designs, drawings, docs)
- Offline-first mobile / desktop apps that sync on reconnect
- Geo-distributed counters and analytics
- Multi-master active-active databases
- Edge computing where coordination cost is high
Bad fit:
- Strictly ordered operations (financial transactions with overdraft rules)
- Workflows that fundamentally require consensus (e.g., uniqueness constraints, leader election)
- Domains where silently dropping concurrent writes (LWW) loses critical data