SQL vs NoSQL Design — Compute Reference
1. At a glance
This note is the design-choice companion to database-internals. Where that note covers how databases are built (storage engines, indexes, MVCC, WAL, replication, query planners), this one covers how to pick one.
Database picks are driven by a small set of forcing functions:
- Data model: relational rows, nested documents, graph traversals, time-series, vectors, free text. The shape of your data should map naturally onto the database’s primary abstraction. Fighting the abstraction (graph in SQL, joins in document) is the most common avoidable mistake.
- Consistency vs scalability: ACID transactions, strong consistency, and joins limit horizontal scaling. CAP and PACELC force tradeoffs above a certain scale.
- Read vs write patterns: read-heavy with cache → easy; write-heavy with secondary indexes → hard; mixed OLTP/OLAP → very hard and usually solved by splitting workloads.
- OLTP vs OLAP: point lookups and small transactions versus aggregations across billions of rows. Different storage layouts (row vs column), different engines.
- Latency budget: sub-millisecond (Redis), single-digit ms (Postgres LAN), tens of ms (cross-region SQL), hundreds of ms (warehouse scans).
- Schema flexibility: how often does the shape change, who controls it, can you do migrations.
- Ecosystem maturity: drivers, ORMs, ops tooling, backup, monitoring, hiring pool.
- Cost model: self-hosted vs managed, provisioned vs serverless, on-demand vs reserved.
The modern landscape is overwhelmingly polyglot. The default boring-but-effective stack for new applications in 2026 is roughly Postgres as the system of record, Redis as the cache and short-lived state store, S3-compatible object storage for blobs, optionally a specialized OLAP system (ClickHouse, Snowflake, BigQuery) once analytics outgrows Postgres, and optionally a vector store (pgvector, Pinecone, Qdrant) once you do retrieval-augmented work.
2. The “NoSQL” misnomer
The “NoSQL” label was coined for a 2009 meetup and stuck despite never describing a coherent category. Martin Fowler’s 2011 reframing as “polyglot persistence” was more accurate: applications combine multiple stores, each matched to a workload, rather than choosing a single global database.
Two trends have further blurred the boundary:
- Many NoSQL databases re-added SQL or SQL-like languages. Cassandra has CQL, Couchbase has N1QL, MongoDB has the aggregation pipeline plus SQL connectors, DynamoDB has PartiQL, Elasticsearch has SQL endpoints, ClickHouse and BigQuery are SQL-first despite being column stores.
- Most SQL databases gained first-class JSON support. PostgreSQL jsonb (since 9.4) with GIN indexing is production-grade document storage. MySQL has JSON, Oracle has JSON and a JSON-relational duality view, SQL Server has JSON path support, SQLite has json1.
By 2024 the practical question is less “SQL or NoSQL?” and more “which storage layout, which consistency model, which scaling story, which query surface?” Categorize by data model and operational profile, not by the marketing label.
It also helps to dissolve a few related dichotomies that get conflated with SQL-vs-NoSQL:
- Row vs column storage is orthogonal to SQL. Both ClickHouse (columnar) and SQLite (row) speak SQL; Cassandra (wide-column) and MongoDB (document) do not. Pick layout for the workload (OLTP vs OLAP) and query surface for the team.
- ACID vs BASE is a spectrum, not a binary. Postgres is strongly ACID, DynamoDB is configurable, Cassandra is tunable per query (
CONSISTENCY ONEtoALL), MongoDB transactions exist but cost more than single-document writes. - Schema-on-write vs schema-on-read describes where validation lives, not whether you have SQL. A jsonb column in Postgres is schema-on-read; a MongoDB collection with a strict JSON Schema validator is schema-on-write.
- OLTP vs OLAP vs streaming is a workload distinction. The same dataset often lives in multiple stores tuned for each.
3. Data model taxonomy
A working taxonomy that maps onto real products:
- Relational — rows and columns, foreign keys, normalization, declarative SQL. Best for entities with stable shape and many relationships. Examples: PostgreSQL, MySQL, SQLite, SQL Server, Oracle, DB2, CockroachDB, Spanner.
- Document — JSON or BSON, schemaless (or weakly typed), nested structures, queried by path. Best for hierarchical aggregates. Examples: MongoDB, CouchDB, Couchbase, Firestore, RavenDB, DocumentDB.
- Key-value — opaque value indexed by key, no secondary indexes (or very limited). Best for cache, session, counter. Examples: Redis, Memcached, DynamoDB (in KV mode), etcd, RocksDB.
- Columnar / OLAP — data stored column-by-column rather than row-by-row, compressed, vectorized scans. Best for aggregations across large fact tables. Examples: ClickHouse, DuckDB, Snowflake, BigQuery, Redshift.
- Wide-column / column-family — Bigtable model: rows can have arbitrary, sparse sets of columns grouped into families. Best for high-write, time-ordered, denormalized workloads. Examples: Cassandra, ScyllaDB, HBase, Bigtable, DynamoDB.
- Graph — nodes and edges with properties, traversal-oriented query language. Best for deep relationship queries (3+ hops) and pattern matching. Examples: Neo4j, Neptune, TigerGraph, Memgraph, DGraph.
- Time-series — timestamp-indexed, retention policies, downsampling, time-window aggregations. Examples: InfluxDB, TimescaleDB, Prometheus, VictoriaMetrics, QuestDB.
- Search — inverted index, tokenization, ranking, fuzzy and faceted query. Examples: Elasticsearch, OpenSearch, Solr, Meilisearch, Typesense, Vespa.
- Vector — high-dimensional similarity, ANN indexes. Examples: pgvector, Pinecone, Weaviate, Qdrant, Milvus, LanceDB (see rag-embeddings-vector-search).
- Object / blob — opaque large objects addressed by key, no query beyond list and get. Examples: S3, GCS, Azure Blob, R2, MinIO, Ceph.
- Multi-model — explicitly target several models at once. Examples: ArangoDB (document + graph + KV), Couchbase (KV + document + search), Fauna (document + relational + temporal), SurrealDB, OrientDB.
Real systems often blur categories. PostgreSQL with extensions covers relational, document (jsonb), key-value, time-series (TimescaleDB), full-text, graph (Apache AGE), and vector (pgvector). DynamoDB is sold as key-value but is really a wide-column store with strong KV ergonomics. This is fine; the taxonomy is for thinking, not for purity.
Another axis worth carrying alongside data model is consistency contract:
- Strong / linearizable — every read sees the last completed write (Spanner, etcd, CockroachDB, Postgres single-node, ZooKeeper).
- Sequential / snapshot isolation — common SQL default; transactions see a consistent snapshot but not necessarily the latest committed write across all sessions.
- Read-your-writes — a client sees its own writes; cheap to implement with sticky routing.
- Causal — operations causally related are seen in order (MongoDB causal-consistent sessions, Cosmos DB consistency level).
- Eventual — converges given time; replicas may diverge transiently (DynamoDB default, Cassandra ONE, S3 historically, DNS).
Eventual consistency is fine for caches, feeds, analytics, and most user-generated content. It is dangerous for inventory, billing, account balances, and anything where two readers must agree on a number.
4. Relational (RDBMS)
Still the default for new general-purpose applications, and for good reason: the relational model and SQL remain the most expressive, most teachable, and best-supported way to model and query structured data.
- PostgreSQL — open source, BSD-style license, the dominant choice for new builds in the last decade. MVCC concurrency, rich type system (arrays, ranges, jsonb, hstore, enums, custom domains), powerful indexing (B-tree, hash, GIN, GiST, BRIN, SP-GiST), partial and expression indexes, window functions, CTEs (recursive too), table inheritance, logical and physical replication, declarative partitioning, foreign data wrappers, stored procedures in many languages (PL/pgSQL, PL/Python, PL/V8). Extensions are the killer feature: PostGIS for geospatial, pgvector for embeddings, TimescaleDB for time-series, Citus for sharding, pg_partman, pgaudit, pg_stat_statements, plpgsql_check.
- MySQL / MariaDB — InnoDB row-store with clustered primary key, robust replication (statement, row, GTID), Group Replication for HA, very wide deployment in legacy and CMS stacks (WordPress, Drupal). MariaDB forked in 2009 over Oracle stewardship concerns and has its own column-store (ColumnStore) and storage engines (Aria, TokuDB historically). Both gained JSON columns and CTEs in 5.7 / 10.2.
- SQLite — embedded, single-file, public-domain. One of the most widely deployed pieces of software on Earth: every iOS and Android app, Firefox, Chrome bookmarks, Skype, Adobe, most aircraft avionics. Server-class for low-write workloads when combined with WAL mode and a single writer. Powering Litestream replication and turso/libsql for edge deployments.
- Microsoft SQL Server — enterprise OLTP, columnstore indexes for hybrid HTAP, in-memory OLTP (Hekaton), strong tooling (SSMS, Profiler, Query Store), tight Windows/Active Directory integration. Express is free, Standard is moderately priced, Enterprise is expensive.
- Oracle Database — historically dominant in enterprise, RAC clustering, Data Guard, Exadata, very rich SQL (analytical functions, model clause, hierarchical queries). Licensing is the primary blocker for new adoption.
- IBM DB2 — mainframe (z/OS) and distributed editions. Still the system of record at many banks and insurers.
- NewSQL distributed — engines that preserve SQL and ACID while scaling horizontally:
- CockroachDB — Postgres wire-compatible, Spanner-style with HLC timestamps, geo-partitioning.
- YugabyteDB — Postgres-compatible query layer on top of a sharded Raft-replicated storage layer.
- TiDB — MySQL-compatible, with TiKV row store and TiFlash columnar replicas for HTAP.
- Google Spanner — globally consistent via TrueTime, external consistency guarantee, the original; GoogleSQL and Postgres dialects.
- AlloyDB — GCP managed Postgres with columnar acceleration and tiered storage.
- Aurora DSQL (AWS, GA 2024) — multi-region, active-active, strong consistency Postgres-compatible, journal-based architecture.
- PlanetScale — managed Vitess sharding for MySQL, branching workflow.
- Neon — serverless Postgres with branchable storage, scale-to-zero.
- Supabase — managed Postgres plus a hosted REST/GraphQL/realtime layer and auth.
Strengths: ACID transactions, expressive declarative SQL, joins, mature ecosystem (drivers, ORMs, BI tools), 50 years of accumulated DBA knowledge.
When to choose: complex entity relationships, multi-row transactions, reporting on the same dataset, small-to-medium scale (which covers most apps until you have tens of thousands of QPS or tens of TB). The honest default for new applications is Postgres.
A few operational notes on Postgres specifically, since it is the most likely landing spot:
- Use a connection pooler (PgBouncer transaction mode, RDS Proxy, Supabase pooler). Each connection costs ~10 MB of RAM and a backend process; without a pooler, app autoscalers will exhaust
max_connections. - Set
shared_buffersto ~25% of RAM,effective_cache_sizeto ~75%, tunework_memper-query rather than globally, and useauto_explainto catch slow plans in production. - Vacuum is not optional; long-running transactions block reclamation and cause table bloat. Monitor
pg_stat_progress_vacuumandpg_stat_user_tables.n_dead_tup. - Use logical replication for zero-downtime upgrades and for fanning out CDC. Use physical replication for high-availability standbys.
- Partition large tables by time or tenant before they hit 100 GB; afterwards partitioning becomes a multi-day migration.
- Prefer
jsonboverjsonwhen storing flexible data; index with GIN on the whole document or B-tree on extracted expressions.
5. Document
Document databases store self-describing records, typically JSON or BSON, and let the application embed nested objects and arrays.
- MongoDB — by far the most popular document DB. Managed offering is Atlas (multi-cloud). Rich query language, aggregation pipeline (a Turing-complete dataflow over documents), multi-document ACID transactions since 4.0 (2018), change streams, time-series collections (5.0), vector search (7.0+ with Atlas Vector Search). Sharding by document key (hashed or ranged), replica sets for HA.
- CouchDB — Apache project, multi-master replication is the headline feature. Conflict resolution at the document level. Pairs with PouchDB, a JavaScript implementation of the same protocol, for offline-first browser and mobile apps.
- Couchbase — KV plus document plus full-text plus a SQL-like query language (N1QL, now SQL++). Memory-first architecture, eventing service, mobile sync.
- Firestore (Google Firebase) — serverless document store with real-time listeners, deep mobile/web SDK integration, offline cache. Pricing per read/write/delete. The successor to Cloud Datastore.
- Amazon DynamoDB — sold as key-value but really a wide-column-with-JSON-friendly-API. Predictable single-digit-ms latency at any scale, on-demand or provisioned capacity, point-in-time recovery, global tables, streams for CDC. The single-table design pattern (popularized by Rick Houssain) packs multiple entity types into one table using composite primary keys (partition key + sort key) plus global and local secondary indexes (GSI / LSI).
- RavenDB — .NET-leaning, ACID by default, multi-master, built-in revisions.
- MarkLogic — enterprise XML and JSON, semantic triples, used heavily in publishing and government.
- ArangoDB — multi-model (document plus graph plus KV) with a unified query language (AQL).
- DocumentDB (AWS) — MongoDB API compatibility layer over an Aurora-style storage backend.
Strengths: schema flexibility, nested data is natural and indexable, sharding by document key is simple, denormalized models read fast.
When to choose: rapidly evolving schemas (early product, lots of fields per entity), strongly hierarchical aggregates (a product with variants, a CMS document with embedded blocks), single-document atomicity is sufficient, developer velocity matters more than strict normalization. Real-time mobile sync workloads pair particularly well with Firestore.
Watch out: it is easy to lose referential integrity. Many-to-many relationships are awkward. Aggregations and ad-hoc queries are usually slower than the SQL equivalent. Schemaless does not mean schema-free; you just push the schema into application code.
Document-modeling rules of thumb (MongoDB-flavored, but apply broadly):
- Embed when the embedded child is part of the parent’s identity (order line items in an order), is small and bounded (a user’s preferences), is always read with the parent, and rarely updated independently.
- Reference when the child is large, mutates independently, is shared across many parents (a product referenced from many carts), or grows unbounded (comments on a popular post).
- Cap embedded arrays. Unbounded arrays inside documents are a common cause of slow queries, hot shards, and 16 MB document-size errors. If you can’t bound, externalize.
- Denormalize for read paths, then accept the duplication cost on write. The whole point of documents is to colocate what you read together.
- Pre-aggregate when possible. Counters, sums, and rollups stored in the document are dramatically cheaper than recomputing them on each read.
6. Key-value
Designed for the simplest possible access pattern: get and put by key. Often optimized for memory and latency.
- Redis — the de facto in-memory store. Sub-millisecond on a LAN. Far richer than pure KV: strings, lists, sets, sorted sets, hashes, streams (a log abstraction), geo, bitmaps, HyperLogLog, RedisJSON (typed JSON values), RedisSearch (full-text plus vector), RedisTimeSeries, RedisBloom (probabilistic structures). Persistence via AOF (append-only log) and RDB (point-in-time snapshot). HA via Sentinel, sharding via Cluster. Redis Stack bundles the modules. After the 2024 license change (BSL/SSPL), the project forked into Valkey (Linux Foundation) and KeyDB (Snap-acquired multithreaded fork), both BSD-licensed and broadly drop-in.
- Memcached — older, simpler, multi-threaded, pure cache, no persistence, no replication. Still preferred where you want a no-features-just-cache layer.
- etcd / Consul / ZooKeeper — coordination key-value stores, strongly consistent via Raft (etcd, Consul) or Zab (ZooKeeper). Used as the brain of Kubernetes (etcd), service discovery (Consul), and the original Hadoop ecosystem (ZooKeeper). Not designed for application data; small values, low write rate.
- DynamoDB (as KV mode) — partition-key-only tables behave as a managed distributed KV with predictable latency.
- Embedded engines — used inside larger systems:
- RocksDB (Facebook fork of LevelDB) — LSM-tree, the storage layer for Cassandra/Scylla alternatives, TiKV, CockroachDB (before Pebble), Kafka Streams state stores, MyRocks (MySQL).
- LevelDB (Google) — original LSM library.
- LMDB — memory-mapped B+tree, read-optimized, used by OpenLDAP, Monero, libmdbx.
- WiredTiger — pluggable B-tree and LSM, MongoDB’s default storage engine since 3.2.
- TiKV — the distributed KV layer underneath TiDB; can be used standalone.
- ScyllaDB — wide-column, but offers a Redis-protocol-compatible mode and is often deployed as a KV.
When to choose: cache, session store, rate-limit counters, leaderboards (Redis sorted sets), distributed locks, pub/sub, ephemeral state shared across instances, queues (Redis streams or Lists, or proper brokers).
Redis-specific patterns worth knowing:
- Cache-aside — read-through your app: check Redis, on miss read DB and SETEX with a TTL. The default for most caches.
- Write-through / write-behind — writes go to Redis first, then propagate to the DB. Higher write throughput, more complex consistency.
- Rate limiting — token bucket via Lua scripts (
INCR+EXPIRE) or the officialredis-cellmodule. Avoid round-trips by batching with pipelines. - Distributed locks — single-instance
SET key value NX PX <ttl>is the recommended approach; Redlock across multiple instances is debated (Kleppmann vs antirez) and rarely worth the complexity. For correctness-critical locks, use a real consensus store (etcd, ZooKeeper). - Idempotency keys — store request hashes with a TTL to deduplicate retried writes from clients.
- Leaderboards — sorted sets (ZADD / ZRANGEBYSCORE) give O(log N) inserts and O(log N + M) range queries.
- Pub/sub vs streams — pub/sub is fire-and-forget, no replay; streams provide a log with consumer groups, message acks, and DLQ semantics.
Newer entrants worth tracking: Dragonfly (multi-threaded Redis-compatible written in C++, BSL license), KeyDB, Valkey (Linux Foundation Redis fork), and Garnet (Microsoft Research).
7. Wide-column / column-family
Bigtable’s heritage: a sparse, distributed, persistent, multi-dimensional sorted map. Rows have a primary key; columns are grouped into families; cells can be timestamped and versioned.
- Apache Cassandra (commercial backer DataStax) — leaderless replication, tunable consistency (ONE, QUORUM, ALL, LOCAL_QUORUM for multi-DC), high write throughput (LSM-tree under the hood), CQL as the query language. Particularly strong for time-series and high-cardinality wide rows. No joins.
- HBase — Hadoop-era Bigtable clone, runs on HDFS, strongly consistent within a region. Operationally heavy. Replaced in many places by Cassandra or by managed Bigtable.
- ScyllaDB — Cassandra wire-compatible, rewritten in C++ with shard-per-core and shared-nothing architecture. Typically 5-10x faster on the same hardware, lower tail latency.
- DynamoDB — wide-column in implementation, KV-with-JSON in API. Single-table design heavily exploits the wide-column nature.
- Google Bigtable — managed, the original. Foundation for several Google products (Search, Analytics, Maps). Used directly by external customers via the Cloud Bigtable API.
When to choose: very large scale (hundreds of TB or PB), write-heavy or append-only workloads, time-series with billions of points, multi-datacenter active-active with tunable consistency, you can model around the absence of joins. Cassandra is the canonical choice when you need writes to scale linearly with nodes.
8. Columnar OLAP
Column-oriented storage with compression and vectorized execution. Designed for scans and aggregations across large fact tables.
- ClickHouse (Yandex, now ClickHouse Inc.) — open source, single-node and clustered, vectorized engine that routinely sustains TB/sec scans on commodity hardware. MergeTree storage with sparse indexes. Strong replication and sharding story. ClickHouse Cloud since 2022. Rapidly displacing Redshift and adding pressure on Snowflake for self-hosted analytics.
- DuckDB — embedded analytical database (DuckDB Labs, Mark Raasveldt and Hannes Mühleisen, first release 2019). “SQLite for OLAP.” Single binary, no server, reads Parquet and CSV directly, joins remote files over HTTP. Increasingly competitive with Pandas for in-process analytics. Powers many data tools (Mode, MotherDuck, dbt local execution).
- Snowflake — managed cloud DW, the modern leader. Separates compute (virtual warehouses) from storage (micro-partitions in cloud object storage). Auto-scales compute, near-zero ops, time travel, secure data sharing, Snowpark for Python/Scala/Java.
- Google BigQuery — serverless DW, columnar (Capacitor) storage on Colossus, Dremel-style tree execution. Pay per byte scanned or per slot-hour. Strong streaming ingestion (Storage Write API), BI Engine for sub-second BI, GoogleSQL with ML extensions.
- Amazon Redshift — MPP columnar, classic provisioned model plus Redshift Serverless. Spectrum extends queries to S3.
- Databricks SQL + Photon — Spark + a C++ vectorized engine on top of Delta Lake. Lakehouse positioning.
- Apache Iceberg + Trino / Presto + DuckDB + Athena — open table format (Iceberg, Hudi, Delta) on object storage, queried by federated engines. The “open lakehouse” stack. Iceberg has emerged as the leading format in 2024-2026.
- Apache Druid + Apache Pinot + StarRocks + Apache Doris — real-time analytics databases, low-latency ingestion plus fast slice-and-dice queries; common for user-facing analytics dashboards.
When to choose: aggregations over hundreds of millions to trillions of rows, BI dashboards, ad-hoc analysts running SQL, ETL pipelines feeding marts. Modern stack: ingest into S3-based Iceberg tables, query with ClickHouse / Snowflake / BigQuery, model with dbt, visualize with Looker / Metabase / Superset / Hex.
The lakehouse rotation in 2025-2026 is real. Iceberg has effectively won the open-table-format war over Delta and Hudi for new builds, and Snowflake, Databricks, BigQuery, ClickHouse, DuckDB, Trino, Athena, and Spark all read and write it. The implication: storage in your own object bucket (Parquet files cataloged by Iceberg metadata) becomes vendor-neutral, and compute engines compete on price and performance over the same data. This significantly weakens warehouse vendor lock-in.
dbt patterns worth following: incremental models for fact tables, snapshots for SCD type 2, tests as first-class artifacts (unique, not_null, accepted_values, relationships), exposures to document downstream consumers, sources to track lineage from raw ingestion.
9. Graph
Native graph databases optimize for relationship traversal, where each hop is O(1) rather than a join.
- Neo4j — the market leader. Cypher query language (now also standardized as GQL). Property graph model. Strong tooling, Bloom for visualization, GDS for graph algorithms (PageRank, community detection, similarity).
- Amazon Neptune — managed, supports both property graphs (Gremlin, openCypher) and RDF (SPARQL).
- JanusGraph — open source, pluggable storage (Cassandra, HBase, BerkeleyDB), Gremlin via TinkerPop.
- TigerGraph — closed source, focused on deep-link analytics, GSQL query language, parallel processing.
- DGraph — distributed, GraphQL-first.
- Memgraph — in-memory, Cypher-compatible, real-time analytics.
- ArangoDB — multi-model with first-class graph traversals.
- Apache AGE — Postgres extension that adds openCypher; useful when you want graph queries without a separate database.
- RedisGraph — discontinued in 2023, do not start new projects on it.
When to choose: queries that require 3+ hops, pattern matching (cycles, shortest path), highly connected data, recommendations, fraud rings, knowledge graphs, identity resolution, social networks. For shallow relationships, Postgres recursive CTEs or joins on a well-indexed schema are usually faster and cheaper.
Two practical cautions on graph adoption:
- The graph model is contagious. Once you have a graph store, the temptation is to put more entities into it. Resist; the operational burden, weaker tooling, smaller hiring pool, and lack of strong cross-graph transactions make it a poor system of record. Keep the graph derived from the relational source where possible.
- GraphQL is not a graph database. GraphQL is an API query language; the storage underneath is typically Postgres or DynamoDB. Conflating the two leads to nonsense architectural decisions.
10. Time-series
Optimized for an ordered stream of timestamped values with retention, downsampling, and time-window queries.
- InfluxDB — long-standing leader. Flux query language (v2), back to SQL-like InfluxQL in 3.x which is built on top of Apache Arrow and DataFusion. IOx engine. Cloud Serverless tier.
- TimescaleDB — Postgres extension that adds automatic time-based partitioning (hypertables), compression, continuous aggregates, retention policies. Lets you stay in the Postgres ecosystem.
- Prometheus — pull-based metrics scraping plus a built-in time-series database, designed for monitoring. PromQL is the de facto metrics query language. Single-node by design; long-term storage delegated.
- VictoriaMetrics, Thanos, Cortex, M3DB, Grafana Mimir — Prometheus-compatible storage for HA and long retention. Mimir and VictoriaMetrics are the most actively used in 2026.
- QuestDB — Java-based, SQL with time-series extensions, very fast ingest.
- TDengine — IoT-focused, supports a “supertable” schema per device type.
- Aerospike Time Series — built on Aerospike’s flash-optimized KV.
- ClickHouse — often used directly as a time-series store; the ReplacingMergeTree and AggregatingMergeTree variants and partitioning are well-suited.
When to choose: IoT and sensor data, infrastructure monitoring, application metrics, financial ticks, telemetry, anything with a dominant time axis and append-only ingest. For metrics specifically, Prometheus + Grafana + Mimir is the boring default.
Modeling notes that apply across time-series engines:
- Retention and downsampling. Raw 1-second points for 7 days, 1-minute aggregates for 90 days, 1-hour aggregates for years. Configure as policies in the engine; do not roll it yourself.
- High cardinality is the killer. Each unique label combination is effectively a separate series. Million-cardinality datasets explode Prometheus, InfluxDB, and TimescaleDB equally. Prune labels aggressively; do not use user IDs, request IDs, or trace IDs as labels.
- Pre-compute continuous aggregates (TimescaleDB) or materialized recording rules (Prometheus) for common dashboard queries. Avoid scanning raw data at view time.
- Backfill is hard. Plan ingest topology so late-arriving data lands in the right partition; many engines penalize out-of-order writes.
11. Search
Full-text and faceted search across documents with ranking.
- Elasticsearch (Elastic) and OpenSearch (AWS fork after the 2021 license change) — distributed Lucene-based engines. JSON queries, aggregations, geo, percolator, ML features. The standard for logs (the ELK / OpenSearch stack) and for application search.
- Apache Solr — older Lucene-based engine, still strong in enterprise content management.
- Meilisearch — modern, instant-search focused, easy to operate, typo tolerance out of the box. Targets the “Algolia-like but self-hosted” niche.
- Typesense — similar positioning, focus on developer experience and small footprint.
- Algolia — managed-only, very fast hosted instant search, expensive at scale.
- PostgreSQL FTS (tsvector + tsquery + GIN) — built in, fine for moderate volumes, no separate system to operate. With ParadeDB it gains BM25 ranking and Tantivy-based performance close to Elastic.
- Vespa (Yahoo, open source) — hybrid lexical + vector search, ranking pipelines, particularly strong for large-scale recommender and search systems. Underpowered marketing relative to capability.
- OpenSearch Serverless, Coveo, Lucidworks Fusion — managed search solutions.
When to choose: full-text relevance ranking, fuzzy matching, autocomplete, faceted filtering, log search and analytics, anything where SQL LIKE '%foo%' will scan a table and miss synonyms, stemming, and typos.
Recommendation hierarchy when adding search:
- Start in Postgres with
tsvector,tsquery, GIN indexes, andpg_trgmfor fuzzy matching. Good enough for tens of millions of documents. - Add ParadeDB if you need BM25 ranking inside Postgres without operating a separate cluster.
- Move to Meilisearch or Typesense when you need instant-search UX (sub-50ms typo-tolerant ranked results) but not log search.
- Move to OpenSearch / Elasticsearch when you need both application search and log/metric search, complex ranking, percolation, or aggregations over billions of docs.
- Move to Vespa when you need hybrid lexical + vector + learned-ranking at billion-document scale (Spotify, Yahoo Mail).
12. Vector
High-dimensional similarity, ANN indexes, the substrate for RAG and semantic search. Covered in depth in rag-embeddings-vector-search; in summary the live options in 2026 are pgvector (Postgres extension, default for most teams), Pinecone (managed serverless), Qdrant (open source, Rust), Weaviate (open source, schema-aware), Milvus (open source, large-scale), Chroma (lightweight local), LanceDB (file-format-based), and the vector capabilities in Elastic, OpenSearch, MongoDB Atlas, Redis, and ClickHouse.
13. Schemaless misconception
Every database has a schema. The question is who enforces it and when.
- Database-enforced: SQL DDL, MongoDB JSON Schema validators, DynamoDB item type definitions in tooling, GraphQL schemas.
- Application-enforced: Pydantic (Python), Zod or Joi (TypeScript), Mongoose schemas (MongoDB), Marshmallow, dataclasses, validators in ORMs (Prisma, Drizzle, SQLAlchemy, ActiveRecord).
- Convention-enforced: nothing checks, hopefully someone wrote it down.
Going “schemaless” means choosing application-enforced over database-enforced. That is sometimes correct (early product iteration, very polymorphic data) and often a mistake (you discover a year later that “user.email” sometimes contains an array, sometimes null, sometimes a string of length 4000, because nothing rejected the bad writes).
Recommendation: prefer database-enforced for systems of record; allow flexibility at the edges (jsonb columns, untyped metadata) where the cost of migrations exceeds the benefit of strictness.
14. Modeling patterns
- Relational normalization — Codd’s 1NF through 5NF; in practice aim for 3NF and denormalize judiciously for hot read paths. Foreign keys, check constraints, unique indexes, partial indexes for sparse data.
- Document embedding rules — embed for one-to-few that are accessed together (order line items inside an order). Reference for one-to-many where the many side is large or independently mutated (comments on a popular post). For many-to-many, model as a separate collection or as bidirectional reference arrays with care.
- DynamoDB single-table design — model access patterns first, then derive partition keys and sort keys. Pack multiple entity types into one table using prefix conventions on the sort key. Use GSIs to support alternative access. Read Rick Houssain’s “The DynamoDB Book” before going far.
- Star schema for OLAP (Kimball and Ross, 1996) — fact tables of measurements, surrounded by denormalized dimension tables. Wide and short, optimized for joins-then-aggregate queries. Snowflake schema normalizes dimensions further.
- Slowly-changing dimensions — SCD type 1 overwrites, type 2 keeps history with effective-from/effective-to columns, type 3 stores limited history in extra columns. Type 2 is the default for analytical models.
- Event sourcing — store the sequence of immutable events and project read models. Pairs with CQRS (separate read and write paths). Strong audit story; complex to operate.
- Polymorphic associations — one column referencing rows in multiple tables. SQL doesn’t enforce these; use sparingly and document well. Postgres jsonb with discriminator field is sometimes cleaner.
- Soft deletes —
deleted_atcolumn instead of physical delete. Common, but every query must remember to filter; consider views or partitioning by status. - Multi-tenancy — shared schema (tenant_id column with row-level security), schema-per-tenant, database-per-tenant. Tradeoffs in isolation, ops cost, and noisy-neighbor risk.
- Versioned schemas — store a
schema_versioncolumn on flexible-shape rows; let the app migrate on read. Useful for long-lived document stores where downtime migrations are unacceptable. - Bitemporal modeling — separate transaction-time (when the database knew) from valid-time (when the fact was true in the world). Standard in regulated industries (insurance, banking) and increasingly in modern systems (Crux/XTDB, Datomic, Snowflake time travel + valid-time columns).
- Wide vs tall tables — wide (one row per entity with many columns) is friendly to OLTP and BI; tall (one row per attribute, EAV-style) is friendly to dynamic schemas but slow to query. EAV is almost always the wrong default.
- Surrogate vs natural keys — surrogate keys (auto-increment ID, UUID) are immune to real-world changes; natural keys (email, SSN) are intuitive but break when reality changes. Use surrogate primary keys, expose natural keys via unique constraints.
15. Polyglot persistence patterns
The dominant production shape:
- Postgres as the system of record.
- Redis for cache, session, short-lived state, queues for non-critical work.
- Elasticsearch / OpenSearch / Meilisearch for full-text and log search.
- S3 / GCS / Azure Blob for blobs, file uploads, backups, data lake.
- ClickHouse / Snowflake / BigQuery for analytics.
- pgvector / Pinecone / Qdrant for embeddings.
- Kafka / Kinesis / Pub/Sub for streaming.
Keeping these in sync is the hard problem. Two foundational patterns:
- Outbox pattern — write the domain change and an outbox row inside the same DB transaction; a separate process publishes outbox rows to the message broker. Solves dual-write inconsistency.
- Change data capture (CDC) — read the database’s WAL (Debezium for Postgres/MySQL/SQL Server/MongoDB) and stream changes to Kafka. Downstream consumers update search indexes, warehouses, caches.
Architectural choices that follow:
- Materialized views — derived tables maintained by the database. Cheap, no extra moving parts; consistency lag depends on refresh strategy.
- CQRS — explicit separate write model and read model, often in different stores. Stronger guarantees, more ceremony.
- Read replicas — same DB, lagging copies. Easiest scaling lever; watch out for replication lag breaking read-your-writes.
- Saga pattern — long-running multi-service transactions broken into a sequence of local transactions with compensating actions on failure. Required when distributed transactions are not available; complex to implement correctly.
- Idempotent consumers — downstream services dedupe by event ID; lets you replay or retry safely. Essential for CDC pipelines.
- Schema registry (Confluent, Apicurio) — central catalog of event schemas with backward/forward compatibility checks; prevents producer changes from silently breaking consumers.
- Data contracts — formalize the producer-consumer agreement at the source-system level; check with tools like Great Expectations, dbt tests, or schema registries before data lands in the warehouse.
16. Distributed SQL tradeoffs
Distributed SQL trades latency for scale and consistency. The basic physics: a transaction that touches data on multiple nodes pays at least one network round-trip for consensus.
- Spanner / CockroachDB / YugabyteDB — strong consistency via Paxos or Raft; horizontal scale; transparent SQL. Single-region write latencies typically 5-30 ms; multi-region writes 50-100+ ms unless you pin data to a region. Best for OLTP workloads that need to outgrow a single Postgres while keeping ACID and joins.
- Vitess (PlanetScale) and Citus — shard an existing single-node engine (MySQL or Postgres). Cross-shard transactions are limited or expensive; you pick shard keys carefully.
- Aurora — single-writer with very fast reader fleet via shared storage. Aurora Global gives read replicas in other regions. Aurora DSQL (2024) introduces multi-region multi-writer with strong consistency, journal-based architecture, similar in spirit to Spanner.
- TiDB — MySQL-compatible, separates storage (TiKV) and compute (TiDB), adds columnar replicas (TiFlash) for HTAP.
Honest take: most apps will never need distributed SQL. A vertically scaled Postgres on a modern box (96 vCPU, 768 GB RAM, NVMe) handles tens of thousands of QPS and tens of TB. Move to distributed SQL when you have demonstrated the bottleneck and have ops capacity to operate it (or pay for the managed offering).
Comparison of the major distributed-SQL contenders for new projects:
- Spanner — GoogleSQL plus Postgres dialect; strong consistency via TrueTime; multi-region writes; managed only (GCP).
- CockroachDB — Postgres-compatible; strong consistency via HLC and Raft; multi-region writes with added latency; self-host or managed Cloud.
- YugabyteDB — Postgres-compatible; strong consistency via Raft; multi-region writes; self-host or managed.
- TiDB — MySQL-compatible; strong consistency via Percolator-style transactions; limited multi-region writes; self-host or TiDB Cloud.
- Aurora DSQL — Postgres-compatible; strong consistency; multi-region writes (GA 2024); managed only (AWS).
- Vitess (PlanetScale) — MySQL-compatible; eventual consistency cross-shard; multi-region reads only; self-host or PlanetScale.
- Citus — Postgres-compatible; eventual consistency cross-shard; single-region writes; self-host or Azure Cosmos for PostgreSQL.
Pricing varies wildly; for hobby and small-team workloads the managed serverless tiers (CockroachDB Serverless, TiDB Serverless, Aurora DSQL) are dramatically cheaper than self-hosted distributed clusters.
17. HTAP — hybrid transactional + analytical processing
Running OLTP and OLAP on the same data, traditionally split into operational and warehouse stores joined by nightly ETL.
- TiDB + TiFlash — row store (TiKV) plus column replicas (TiFlash) populated via Raft learner; the optimizer chooses.
- SingleStore (formerly MemSQL) — row and columnstore in one engine.
- Snowflake Hybrid Tables (2023) — managed transactional row-store alongside the analytical column-store; same SQL surface.
- PostgreSQL with columnar extensions — Citus columnar, Hydra, pg_analytics, pg_mooncake. Approaches HTAP without leaving Postgres.
- MySQL HeatWave (Oracle, on OCI and AWS) — in-memory columnar accelerator for MySQL.
- AlloyDB — Postgres-compatible with columnar engine and tiered storage on GCP.
HTAP shines when latency from OLTP to analytics matters (real-time dashboards, fraud detection, personalization). For traditional reporting, dedicated warehouses are still cheaper and faster.
18. Cloud-native serverless
Pay-per-request, scale-to-zero, no provisioned capacity to manage.
- Aurora Serverless v2 — scales between configured ACU min/max; fractional ACUs; works with most Aurora features.
- Neon — serverless Postgres, branchable storage (copy-on-write branches in seconds), bottomless storage, scale-to-zero compute.
- Supabase — managed Postgres (non-serverless compute by default) with hosted PostgREST, GoTrue auth, realtime, edge functions.
- PlanetScale — managed Vitess MySQL with database branching workflow.
- Fauna — globally distributed serverless document/relational, FQL or GraphQL.
- Firestore — Google’s serverless document DB.
- DynamoDB on-demand — pay per request, no capacity planning.
- MongoDB Atlas Serverless — pay-per-operation MongoDB.
- TiDB Serverless / Cloud — usage-based TiDB.
- Cloudflare D1 (SQLite at the edge) and Turso / libsql (distributed SQLite with replicas at the edge) — emerging edge-database category.
Serverless wins for spiky and unpredictable workloads, prototypes, branch-per-PR development environments, and multi-tenant SaaS where most tenants are idle. Loses for steady-state high-throughput where reserved capacity is cheaper.
Things that look serverless but are not quite:
- Aurora Serverless v1 was a different (now-deprecated) architecture; v2 is the supported path.
- “Serverless” Snowflake warehouses still bill per second of running compute and have a minimum suspend timeout.
- BigQuery on-demand is serverless from your perspective but provisioned slots are often cheaper above a few TB scanned per day.
- DynamoDB on-demand is true scale-to-zero on capacity, but you still pay storage and global-table replication.
19. Cost reality (2026)
Rough order-of-magnitude figures for sizing:
- Self-hosted Postgres on a $40-80/month VM handles ~100 GB and ~1k QPS comfortably; with NVMe storage and tuning, much more.
- Aurora Serverless v2: ~0.10 per GB-month storage plus 50-150/month; a busy one easily $1k+.
- DynamoDB on-demand: ~0.25 per million reads, ~$0.25 per GB-month storage. Predictable for small apps, can blow up at scale without reserved capacity or single-table design.
- Snowflake: ~23/TB-month. Costs concentrate in poorly written queries that scan more than they should.
- BigQuery: $6.25 per TB scanned on-demand (US multi-region), or flat-rate slots. Storage tiers active vs long-term. Partitioning and clustering are essential for cost control.
- Neon: free tier 0.5 GB storage + 100 hours compute; Launch tier from 69/month, plus usage above included limits.
- ClickHouse Cloud: dev services from a few dollars per month, production typically scales with active compute hours plus storage.
- MongoDB Atlas: M0 free, M10 dedicated from ~$60/month, scales with instance size and storage.
- Redis Cloud / ElastiCache: small in-memory instances $30-100/month; serverless ElastiCache priced per ECPU and GB-hour.
Self-hosted is always cheaper at steady state if you have ops capacity. Managed is almost always cheaper if you do not, once you factor in pager rotation, backup verification, version upgrades, security patching, and lost engineering time.
20. Pitfalls
- Choosing NoSQL prematurely “because it scales” when single-node Postgres would have served the entire lifetime of the product.
- Joining at the application layer because “joins don’t scale,” when the database join would be faster and consistent.
- Adopting a document database without modeling access patterns; the resulting collections require expensive scans or many indexes.
- Using DynamoDB with naive one-table-per-entity modeling, paying for GSIs and high RCU/WCU instead of designing the single-table layout.
- Storing strongly relational data in a graph database; OLTP throughput suffers and operational tooling is sparse.
- Putting time-series data into a generic OLTP table without partitioning; queries slow as the table grows, vacuum and autovacuum struggle.
- Using
LIKE '%substring%'for search; no B-tree index applies and the engine scans every row. - Letting the search index be the system of record; Elasticsearch can lose documents under certain failure modes and is not designed for primary storage.
- Storing blobs in the database as
byteaorBLOBcolumns; bloats backups, slows replication, ruins page cache. Use object storage and store the URL. - Ignoring connection pooling on Postgres (use PgBouncer or RDS Proxy); connections are expensive and the server has a hard limit.
- Letting ORMs generate N+1 queries; profile in dev and use eager loading or batching.
- Sharding too early; sharding is operationally heavy and constrains future query patterns.
- Reaching for distributed SQL when a managed read-replica fleet would have sufficed.
- Building “polyglot persistence” with five stores from day one; start with one or two and grow into the rest.
- Treating the cache as the system of record. Caches lose data on eviction; if the database is the only source, a cache miss must be cheap. If the cache holds data not in the database, you have built a fragile second source of truth.
- Mixing OLTP queries and analytical scans on the same Postgres without read replicas or a warehouse; long-running analytics block autovacuum and contend for shared buffers.
- Forgetting timezone handling. Store timestamps as
timestamptz(Postgres),TIMESTAMP WITH TIME ZONE(standard), or ISO-8601 strings with explicit offset.timestampwithout zone has bitten every team. - Choosing UUIDs as primary keys without considering index locality. Random UUIDs (v4) thrash B-tree caches; prefer UUIDv7 or ULID (time-sortable) for write-heavy tables.
- Building application-layer encryption without key management; rotation, audit, and recovery are the hard parts.
- Skipping backups, point-in-time recovery, or backup restore drills. An untested backup is not a backup.
- Trusting “managed” to mean “zero ops.” You still need to size, monitor, set alerts, manage versions, plan upgrades, and rehearse failover.
21. Decision tree
Quick answers, then validate with database-internals:
- Transactions across multiple entities, joins, expressive queries → PostgreSQL. At very large scale or multi-region writes, Spanner, CockroachDB, or YugabyteDB.
- Hierarchical or schemaless aggregates, mobile sync, rapidly evolving shape → MongoDB or Firestore. For AWS-native serverless with extreme scale, DynamoDB with single-table design.
- Sub-millisecond cache, session store, counters, leaderboards, simple shared state → Redis (or Valkey / KeyDB / Dragonfly).
- Aggregations over hundreds of millions of rows, BI dashboards, ad-hoc analyst SQL → ClickHouse for self-hosted, Snowflake or BigQuery for managed. DuckDB for single-node analytics or notebooks.
- Deep relationship traversal (3+ hops), pattern matching, recommendations, fraud rings → Neo4j or Memgraph; Apache AGE if you want to stay in Postgres.
- Time-series metrics for infrastructure monitoring → Prometheus + Grafana + Mimir. For IoT or financial ticks, TimescaleDB or InfluxDB; ClickHouse if you also want broad analytics on the same data.
- Full-text search with ranking, fuzziness, facets → Elasticsearch / OpenSearch for the rich-feature path, Meilisearch or Typesense for the simple instant-search path. Postgres FTS or ParadeDB if you want to stay in one system.
- Vector similarity for RAG and semantic search → pgvector if Postgres is already your DB, Qdrant or Pinecone for dedicated systems. See rag-embeddings-vector-search.
- Binary files, images, videos, backups, data lake → S3 or compatible (R2, GCS, Azure Blob, MinIO).
- Very high-write append-mostly workload across many nodes → Cassandra or ScyllaDB.
- Coordination, leader election, config, service discovery → etcd or Consul.
- Edge-deployed read-heavy workloads → Turso / libsql, Cloudflare D1, or Fly’s LiteFS.
- Multi-model from day one for prototyping → ArangoDB, Couchbase, or SurrealDB; otherwise stick with the boring polyglot stack.
If unsure: start with Postgres. Add Redis when you need a cache. Add object storage when you need files. Anything beyond that requires evidence.
A complementary decision framework when several options seem plausible:
- What is the access pattern? Write it out as a list of queries with expected QPS, latency budget, and result size. The database falls out of the patterns more reliably than out of “type of data.”
- What is the consistency requirement per pattern? Some queries need strong consistency (account balance), most do not (timeline). Different patterns can target different stores.
- What is the failure mode you can tolerate? A query that returns stale data is different from one that errors. A write that succeeds locally but is lost in a region failover is different from one that returns success only after replication.
- Who operates this? Managed service plus a small team beats self-hosted distributed system plus a small team, every time.
- What does the migration path look like in 2 years? A choice that locks data into a proprietary format with no export path (some early Firebase, some closed graph DBs) is a higher long-term cost than a marginally less convenient open-source equivalent.
- What is the team familiar with? Tooling and skills are real costs. Picking Postgres because every engineer knows it is a legitimate technical reason.
22. Cross-references
- database-internals — how storage engines, indexes, MVCC, WAL, and replication actually work.
- rag-embeddings-vector-search — vector databases and retrieval-augmented generation in depth.
- distributed-systems-fundamentals — CAP, PACELC, consensus, and the physics behind the distributed-SQL tradeoffs above.
- consensus-protocols — Raft, Paxos, the algorithms underneath CockroachDB, etcd, Spanner.
- microservices-patterns — outbox, CQRS, saga, and how polyglot persistence fits into service decomposition.
- observability-stack — Prometheus, metrics, logs, traces; many database choices interact with the observability stack.
- _index — the Compute domain index.
23. Citations
- Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017. The single best book on this topic; reads as a tour of the entire taxonomy with the underlying mechanisms exposed.
- Fowler, Martin. “PolyglotPersistence.” martinfowler.com, 2011. The blog post that gave the modern stack its name.
- Kimball, Ralph and Ross, Margy. The Data Warehouse Toolkit, 3rd ed. Wiley, 2013. Star schema, slowly changing dimensions, the conceptual foundation of analytical modeling.
- Codd, E.F. “A Relational Model of Data for Large Shared Data Banks.” CACM 13(6), 1970. The paper that started the field.
- Brewer, Eric. “Towards Robust Distributed Systems.” PODC keynote, 2000; “CAP Twelve Years Later” IEEE Computer, 2012. The CAP theorem and its refinement.
- Corbett, James et al. “Spanner: Google’s Globally-Distributed Database.” OSDI 2012. TrueTime, external consistency, the model many NewSQL systems followed.
- DeCandia, Giuseppe et al. “Dynamo: Amazon’s Highly Available Key-value Store.” SOSP 2007. The paper behind DynamoDB and the leaderless architecture inherited by Cassandra and Riak.
- Vogels, Werner. AWS re:Invent keynote series; in particular the DynamoDB and Aurora sessions over the last decade — the most accessible architecture explanations.
- Dageville, Benoit et al. “The Snowflake Elastic Data Warehouse.” SIGMOD 2016. Compute/storage separation explained by its architects.
- ClickHouse team. “ClickHouse: Lightning Fast Analytics for Everyone.” VLDB / SIGMOD-adjacent paper, 2024. The engine explained.
- Chang, Fay et al. “Bigtable: A Distributed Storage System for Structured Data.” OSDI 2006. The wide-column blueprint.
- Lakshman, Avinash and Malik, Prashant. “Cassandra: A Decentralized Structured Storage System.” LADIS 2009.
- Raasveldt, Mark and Mühleisen, Hannes. “DuckDB: an Embeddable Analytical Database.” SIGMOD 2019.
- Stonebraker, Michael and Cetintemel, Ugur. “‘One Size Fits All’: An Idea Whose Time Has Come and Gone.” ICDE 2005. The intellectual case for polyglot persistence.
- Helland, Pat. “Life beyond Distributed Transactions: an Apostate’s Opinion.” CIDR 2007. The case for designing around distributed transactions rather than against them.
- Houssain, Rick. The DynamoDB Book, 2020. The definitive guide to single-table design.
- Petrov, Alex. Database Internals. O’Reilly, 2019. Companion reading to Kleppmann, more focused on storage and replication mechanisms.
- Abadi, Daniel. “Consistency Tradeoffs in Modern Distributed Database System Design.” IEEE Computer, 2012. The PACELC reformulation of CAP — useful for thinking about latency in the non-partition case.
- Gilbert, Seth and Lynch, Nancy. “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web Services.” SIGACT News, 2002. The CAP proof.
- Hellerstein, Joseph and Stonebraker, Michael (eds). Readings in Database Systems (“The Red Book”), 5th ed, 2015. Free online; curated tour of the foundational and modern papers.
- Vohra, Deepak (ed). Apache Cassandra Definitive Guide, 3rd ed, O’Reilly 2020.
- Plugge, Membrey, Hawkins. MongoDB Definitive Guide, 3rd ed, O’Reilly 2019.
- Carlson, Josiah. Redis in Action. Manning, 2013. Dated in places but still the best on Redis data structures.
- Vitess project documentation, vitess.io. Long-form architectural docs on sharding MySQL at scale.
- TimescaleDB documentation and blog. Practical material on time-series modeling in a relational world.
- Apache Iceberg specification and blog. The open table format specification and the rationale behind partitioning, schema evolution, and time travel.
- Lakehouse and OneTable / XTable initiatives — track the convergence of Iceberg, Hudi, and Delta in 2025-2026.
- AWS, GCP, Azure database service documentation. Vendor docs are unusually good for managed services; read the operational sections (limits, quotas, failover semantics, backup model) before committing.
24. Working appendix: cheat sheet by workload
A condensed lookup table for common scenarios, intended as a starting point rather than a verdict:
- SaaS application backend, multi-tenant, < 100 GB per tenant → Postgres with
tenant_idcolumn and row-level security; Redis cache; S3 for uploads. - High-traffic e-commerce catalog and orders → Postgres for orders and inventory; Elasticsearch or Algolia for catalog search; Redis for cart and session; ClickHouse for analytics; S3 for product images.
- Mobile app with offline sync → Firestore or CouchDB/PouchDB; fall back to Postgres + custom sync if you outgrow it.
- IoT fleet telemetry, millions of devices → TimescaleDB or ClickHouse for the time-series; Postgres for device metadata; Kafka for ingest.
- Application logs and traces → OpenTelemetry collectors into Loki / OpenSearch / ClickHouse (logs), Tempo / Jaeger (traces), Prometheus + Mimir (metrics). See observability-stack.
- Real-time leaderboard, gaming → Redis sorted sets, periodic snapshot to Postgres for history.
- Recommendation system → Postgres or warehouse for source-of-truth events; Spark / Flink for feature engineering; Redis or DynamoDB for feature serving; vector store for candidate retrieval.
- Knowledge graph / semantic search → Neo4j or Memgraph for graph queries; vector store (pgvector, Qdrant) for embeddings; Postgres for source documents.
- Financial ledger, double-entry → Postgres with serializable isolation, append-only event tables, materialized balances; do not use eventually consistent stores for money.
- Audit log / compliance → Append-only Postgres table or dedicated immutable store (QLDB, Immudb); replicate to object storage for retention.
- Multi-region active-active → Spanner, CockroachDB, YugabyteDB, Aurora DSQL. Accept the latency tax.
- Edge-served content with low write rate → Cloudflare D1, Turso, KV at the edge; refresh from origin Postgres.
25. Operational checklist before adopting a new data store
Whenever you propose adding a new store to the stack, work through:
- Failure modes — what happens on a single-node crash, on a regional outage, on a network partition, on a disk fill? What does the application see?
- Backup and restore — how do you back it up, how long does a full restore take, have you actually restored it into a test environment in the last 90 days?
- Upgrades — major version upgrade path, downtime required, rollback story. Some engines (Postgres major upgrades, Cassandra) require careful planning; others (managed serverless) hide it from you.
- Observability — what metrics does it expose, do they integrate with your existing monitoring (Prometheus, Datadog, New Relic, OpenTelemetry), can you alert on slow queries and capacity?
- Capacity planning — at what point do you scale up, scale out, shard, or move to a different engine? What does the cost curve look like at 10x and 100x current scale?
- Access control and audit — IAM integration, fine-grained RLS, audit logging, encryption at rest and in transit, secret rotation.
- Disaster recovery — RPO (how much data can you lose?) and RTO (how long can you be down?). Different stores have very different defaults; replicas alone are not DR.
- Vendor lock-in and exit plan — can you export your data in a portable format, can you migrate to a competitor in weeks, are there proprietary APIs that constrain future moves?
- Cost ceilings — set hard caps in cloud billing alarms; serverless databases with unbounded autoscale have caused real seven-figure incidents.
- Compliance — data residency, GDPR / CCPA / HIPAA / SOC 2 requirements, deletion guarantees, right-to-be-forgotten flows.
- Team readiness — does at least one engineer understand the engine deeply enough to debug a production incident? If not, who is the escalation path?
If you cannot answer most of these, the new store is a risk regardless of how well it fits the data model.
26. Glossary of terms used above
- ACID — Atomicity, Consistency, Isolation, Durability. The classic transaction guarantee.
- BASE — Basically Available, Soft state, Eventual consistency. The looser model many NoSQL systems adopt.
- MVCC — Multi-Version Concurrency Control; readers see a snapshot while writers create new versions.
- WAL — Write-Ahead Log; durability mechanism in nearly all modern databases.
- LSM-tree — Log-Structured Merge-tree; write-optimized storage used by Cassandra, RocksDB, ScyllaDB.
- B-tree / B+tree — balanced tree structure used by most relational engines and many KVs.
- GSI / LSI — Global / Local Secondary Index in DynamoDB.
- HTAP — Hybrid Transactional / Analytical Processing.
- OLTP / OLAP — On-Line Transaction / Analytical Processing.
- CDC — Change Data Capture; stream of row-level changes from a database.
- CQRS — Command Query Responsibility Segregation; separate write and read models.
- MPP — Massively Parallel Processing; analytical engine architecture.
- CAP / PACELC — distributed-system consistency-availability tradeoff frameworks.
- HA — High Availability; replication and failover machinery.
- DR — Disaster Recovery; recovery from regional or catastrophic failure.
The takeaway: the field has matured to the point where most decisions are well-trodden. Pick the boring default unless you have a specific reason not to, and document the reason in an ADR (_index links to the broader process).
27. Final notes on choosing well
Three patterns recur across teams that pick well:
- They start with the access patterns and consistency requirements, not with a favorite database.
- They are willing to use boring technology (Postgres, Redis, S3) until a measured bottleneck justifies adding complexity.
- They write down the decision and the alternatives considered, so the next team does not relitigate the same questions in 18 months.
Three patterns recur across teams that pick poorly:
- They chase the vendor with the loudest conference keynote.
- They optimize for a hypothetical scale they will not reach for years (or ever).
- They add stores without a corresponding plan for ops, backup, monitoring, or schema evolution.
If this note saves one team from any of those, it has paid for itself.