Database Engine Taxonomy

A working catalog of the production database landscape circa 2026: the OLTP row stores, the OLAP columnar warehouses, the document stores, the key-value caches, the graph engines, the vector stores, the time-series databases, the search indexes, the embedded engines, the NewSQL distributed systems, the streaming logs, and the lakehouse formats. This is a taxonomy — it groups engines by the workload and data model they serve rather than by company, since the same vendor often ships across categories (Postgres → PostGIS → pgvector → TimescaleDB → Citus is one example of a single engine fanning out across five categories).

The selection axes that matter when picking an engine:

  1. Workload: OLTP (read+write, low-latency, point queries) vs OLAP (big scans, aggregations, analytical) vs HTAP (both) vs streaming (continuous append).
  2. Consistency: linearizable → sequential → causal → read-your-writes → eventual. Under a network partition, the CAP theorem forces a choice between consistency and availability.
  3. Data model: relational, document, key-value, wide-column, graph, columnar, vector, time-series, multi-model.
  4. Deployment: managed cloud (Snowflake, BigQuery, DynamoDB) vs self-hosted (Postgres, MySQL, ClickHouse) vs embedded (SQLite, DuckDB, RocksDB).
  5. Query language: SQL (still dominant), MQL (MongoDB), CQL (Cassandra), Cypher (Neo4j), Gremlin (Apache TinkerPop), SPARQL (RDF), GraphQL, AQL (ArangoDB), Flux/InfluxQL (Influx), KQL (Kusto/Azure), N1QL (Couchbase), PromQL (Prometheus).
  6. Wire-compatibility: the Postgres wire protocol is the single largest gravity well — CockroachDB, YugabyteDB, Neon, Supabase, Crunchy Data, AlloyDB, Aurora Postgres, Materialize, and dozens of others speak it.
  7. Licensing: Apache 2.0, MIT, and BSD remain the permissive open-source defaults; AGPL (Mongo pre-2018) is open-source copyleft. The source-available licenses associated with the 2018-2024 cloud-vendor relicensing wave are SSPL (Mongo 2018, Elastic 2021, Redis 2024), RSAL (Redis 2024), BSL (Cockroach, MariaDB), Elastic License 2.0, and Confluent Community License.
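
The axes above can be treated as filterable fields. The following is a minimal sketch, not a real tool: the `Engine` record, the tiny `CATALOG`, and the `shortlist` helper are all hypothetical names, and the entries only paraphrase classifications made elsewhere in this catalog.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Engine:
    name: str
    workload: str       # "OLTP", "OLAP", "HTAP", or "streaming"
    data_model: str
    deployment: str     # "managed", "self-hosted", or "embedded"
    wire_protocol: str

# Hypothetical, deliberately tiny catalog for illustration only.
CATALOG = [
    Engine("PostgreSQL", "OLTP", "relational", "self-hosted", "postgres"),
    Engine("CockroachDB", "OLTP", "relational", "self-hosted", "postgres"),
    Engine("SQLite", "OLTP", "relational", "embedded", "none"),
    Engine("DuckDB", "OLAP", "columnar", "embedded", "none"),
    Engine("ClickHouse", "OLAP", "columnar", "self-hosted", "clickhouse"),
    Engine("TiDB", "HTAP", "relational", "self-hosted", "mysql"),
]

def shortlist(catalog, **required):
    """Return names of engines whose fields match every required axis value."""
    return [
        e.name
        for e in catalog
        if all(getattr(e, axis) == value for axis, value in required.items())
    ]

print(shortlist(CATALOG, workload="OLAP", deployment="embedded"))
print(shortlist(CATALOG, wire_protocol="postgres"))
```

Filtering on one axis at a time tends to leave several candidates; combining two or three axes is usually what narrows the choice.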

Relational OLTP — the row-oriented ACID workhorses

The category that started commercial database software (System R, Ingres, Oracle V2 1979) and still runs ~80% of the world’s transactional workload. Row-oriented storage, ACID guarantees, SQL as lingua franca, B-tree (or LSM) indexing, MVCC for concurrency.

  • PostgreSQL — UC Berkeley 1986 (Michael Stonebraker), open source since 1996; PostgreSQL Global Development Group governance; PostgreSQL License (BSD-style). The Swiss Army knife of relational. Features: declarative partitioning, logical replication, JSONB column type, full-text search, GIN/GiST/BRIN/SP-GiST/Hash/BTree indexes, custom types, custom operators, custom aggregates, FDW (foreign data wrappers) for federation, parallel query (9.6+), generated columns, table inheritance, listen/notify, row-level security, write-ahead logging, hot standby, streaming replication. Extension ecosystem makes it irreplaceable: PostGIS (geospatial), pgvector (vector search; Andrew Kane 2021), Citus (distributed sharding; Microsoft acquired 2019), TimescaleDB (time-series hypertables; Timescale Inc.), pg_partman, pg_cron, plv8 (JS in DB), plr (R in DB), HypoPG (hypothetical indexes), pg_stat_statements. Managed offerings: AWS RDS Postgres, AWS Aurora Postgres (storage-decoupled), GCP Cloud SQL, Azure Database for Postgres, Neon (S3-backed; serverless Postgres; $80M ARR 2025), Crunchy Data (Postgres specialists; Snowflake acquired Jun 2025 $250M), EnterpriseDB / EDB (Postgres support and Oracle compat tools), Aiven (managed cloud; Helsinki), pgEdge (multi-master Postgres for edge), Tembo (Postgres “stacks”).
  • MySQL — Michael Widenius + David Axmark 1995; MySQL AB → Sun 2008 ($1B) → Oracle 2010. Default LAMP stack DB for two decades. Pluggable storage engines: InnoDB (default since 5.5; ACID), MyISAM (legacy), Aria, NDB cluster, Memory, MyRocks (Facebook 2016 RocksDB-backed). Replication: async binlog, semi-sync, group replication (8.0), MGR multi-master. Managed: Oracle MySQL HeatWave, AWS RDS MySQL, AWS Aurora MySQL-compatible (custom storage layer decoupled from compute, log-structured), PlanetScale (Vitess-based managed MySQL with branching; YC W18; cut free tier 2024), GCP Cloud SQL MySQL, Azure Database for MySQL.
  • MariaDB — community fork of MySQL after Oracle acquisition; led by original creator Widenius from Helsinki; MariaDB Foundation (server licensed GPLv2) + MariaDB Corp (formerly SkySQL, which merged with Monty Program AB in 2013); went public via SPAC Dec 2022 then delisted Aug 2024 ($26M K1 take-private). Features: Galera synchronous multi-master replication, ColumnStore (columnar analytics engine), Xpand (distributed SQL, formerly Clustrix, acquired 2018), MaxScale (proxy), Spider (sharding storage engine), Oracle PL/SQL compatibility mode, sequences, system-versioned temporal tables.
  • Oracle Database — Larry Ellison + Bob Miner + Ed Oates 1977 (Software Development Laboratories → Relational Software → Oracle Systems → Oracle Corp). V2 1979 was first commercial SQL RDBMS shipped. Features: RAC (Real Application Clusters, shared-disk multi-instance), Exadata (engineered system with smart storage offload), TimesTen (in-memory), Autonomous Database (self-tuning), Active Data Guard, GoldenGate (CDC + replication; Oracle acquired GoldenGate Software 2009), partitioning, advanced compression, RMAN backup, ASM (Automatic Storage Management), PL/SQL stored procs, materialized views. PL/SQL is the most-cloned proprietary procedure language (EDB, MariaDB, IBM DB2, Google AlloyDB Omni all implement compatibility layers).
  • Microsoft SQL Server — 1989; originated as a Sybase port to OS/2 then NT (Microsoft + Sybase + Ashton-Tate). Diverged from Sybase 1993; rewritten engine SQL Server 7.0 1998. T-SQL dialect. Features: columnstore indexes (since 2012; combined OLTP+OLAP), in-memory OLTP (Hekaton), Always On Availability Groups, Service Broker, SSIS/SSAS/SSRS BI stack, CLR integration. Azure SQL Database (managed, autoscale), Azure SQL Managed Instance, SQL Server on Linux (since 2017). Editions: Enterprise, Standard, Web, Developer, Express, LocalDB.
  • IBM DB2 — System R origin (1974 IBM San Jose); DB2 brand 1983 (z/OS first). Family: DB2 for z/OS (mainframe), DB2 LUW (Linux/Unix/Windows), DB2 for i (IBM i / AS400), Db2 Big SQL (Hadoop bridge), Db2 Warehouse (analytical), Informix (acquired 2001). Used in core banking, airline reservations, and government systems where it’s been resident for 40+ years.
  • SQLite — D. Richard Hipp 2000; written for the US Navy; public domain (one of very few). Self-contained, single-file, zero-config, embedded, serverless. Used in iOS, Android, Chrome/Firefox/Safari, Windows 10, Mac OS X, every Tesla, every Boeing 787, every Airbus A350 cockpit, BMW i-Drive, Skype legacy, Adobe products, Bloomberg, Dropbox, McAfee, Symantec, every Python install, every PHP install, every Ruby install. Estimated 1 trillion+ active deployments, which makes it the most-deployed software module in history. Features: full ACID (including WAL mode), most of SQL-92, full-text search FTS5, R-Tree spatial, JSON functions (built in by default since 3.38; binary JSONB since 3.45), generated columns, window functions, CTEs. Litestream (Ben Johnson 2021; SQLite → S3 replication), LiteFS (Fly.io 2022; distributed SQLite), Cloudflare D1 (Workers SQLite at edge), Turso (libSQL fork by ChiselStrike → Turso; libSQL is MIT-licensed; offers managed; ~$60M raised), rqlite (Raft + SQLite distributed), dqlite (Canonical).
  • CockroachDB — Cockroach Labs 2015; founders Spencer Kimball + Peter Mattis + Ben Darnell (ex-Google, ex-Square; Spencer co-wrote GIMP); licensing moved from Apache 2.0 to BSL in 2019 (each release converting to Apache 2.0 after 3 years), then to the proprietary CockroachDB Software License in Aug 2024, retiring the free Core edition. Inspired by Google Spanner paper; multi-region; Raft consensus per range; SQL layer above KV; Postgres wire protocol (drop-in for most ORMs). Multi-active-replica, follower reads, geo-partitioning. CockroachDB Serverless + Dedicated + Self-Hosted. Series F 2021 $5B valuation; layoffs 2024.
  • TiDB — PingCAP 2015 (Beijing → SF); founders Liu Qi + Cui Qiu + Huang Dongxu; Apache 2.0. MySQL wire protocol compatible. Architecture: TiDB SQL layer + TiKV distributed row KV (Raft, Rust) + TiFlash columnar replica (HTAP, ClickHouse-derived) + PD (Placement Driver coordinator). True HTAP — same data accessible row-format via TiKV and column-format via TiFlash with raft-replicated consistency. PingCAP raised Series D 2020 $270M; restructured 2023.
  • YugabyteDB — Yugabyte Inc 2016; founders Kannan Muthukkaruppan + Karthik Ranganathan + Mikhail Bautin (ex-Facebook, originally built HBase + Cassandra at FB); Apache 2.0 + commercial. Two API layers: YSQL (Postgres-wire reusing actual Postgres parser/executor) + YCQL (Cassandra CQL-wire). Distributed storage in DocDB (per-tablet RocksDB + Raft). Inspired by Spanner; sharding by hash or range; multi-region.
  • Google Cloud Spanner — Google 2012 (paper) → external GA 2017. TrueTime API uses GPS + atomic clocks in every datacenter to bound clock uncertainty; enables external consistency (linearizable). Underpins Google Ads, Play Store, Drive metadata. Adds Spanner Graph (2024) and Spanner Vector (2024). PostgreSQL interface 2023. AlloyDB (Google’s 2022 Postgres derivative) is positioned for OLTP; Spanner for global scale.
  • AWS Aurora — 2014; storage-decoupled MySQL and Postgres compat. Custom log-structured replicated storage layer (six-way across three AZs). Aurora Serverless v2 (auto-scaling). Aurora DSQL (Dec 2024 preview) — globally distributed serverless SQL with active-active multi-region.
  • Azure Cosmos DB — Microsoft 2017; multi-API (SQL/Mongo/Cassandra/Gremlin/Table); tunable consistency (five levels); Cosmos DB for Postgres (formerly Citus on Azure); turnkey global distribution.

Embedded engines — in-process, zero-server

For applications that ship a database inside themselves rather than reach across a network.
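
As a minimal sketch of what "in-process" means in practice, the example below uses Python's standard `sqlite3` module with an in-memory database. The `accounts` table and the `transfer` helper are hypothetical; the point is only that a full ACID transaction runs without any server or network hop.

```python
import sqlite3

# ":memory:" keeps the database inside this process; a file path would persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically; any constraint failure rolls back both updates."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)
failed = transfer(conn, 1, 2, 500)  # violates the CHECK; the credit is undone too
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
print(ok, failed, balances)
```

The second transfer credits account 2 before the debit fails, so the unchanged balance of account 2 afterwards is what demonstrates atomicity.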

  • SQLite — see above; the gold standard.
  • DuckDB — CWI Amsterdam research project 2018-2019; Mark Raasveldt + Hannes Mühleisen; MIT license. Columnar, vectorized, in-process OLAP. Native Parquet, CSV, JSON, Arrow IPC, Excel, Postgres, MySQL, S3, Iceberg. Window functions, complex types (struct/list/map), full PostgreSQL-flavored SQL. “SQLite for analytics.” Foundation: DuckDB Foundation (Dutch nonprofit) holds copyright. Commercial: DuckDB Labs (consulting) + MotherDuck (managed DuckDB cloud; Jordan Tigani ex-BigQuery + Tino Tereshko 2022; ~$100M Series B 2023).
  • RocksDB — Facebook 2012; fork of Google’s LevelDB (2011 Jeff Dean + Sanjay Ghemawat). LSM-tree storage engine. C++. The most-embedded storage engine: powers Kafka Streams, CockroachDB, TiKV, MyRocks (Facebook), Yugabyte DocDB, ScyllaDB (originally), Apache Cassandra (3.x option), Ceph BlueStore (originally), Jaeger, Druid, Ethereum geth client. Maintained by Meta + community.
  • LMDB — Symas / OpenLDAP 2011; Howard Chu; B+ tree, memory-mapped, ACID, single-writer multi-reader; sub-microsecond reads. Used in OpenLDAP, Bitcoin Core, Bind9, Memgraph, Mozilla, Drupal.
  • Badger — Dgraph 2017; pure-Go LSM-tree KV; designed for SSDs.
  • BoltDB — Ben Johnson 2013; pure-Go B+ tree KV; LMDB-inspired; archived 2017 but actively maintained as the bbolt fork (started at CoreOS, now under etcd-io; used by etcd, Kubernetes, ConcourseCI).
  • LevelDB — Google 2011; original LSM-tree KV; Chrome, Bitcoin Core. Maintenance mode; superseded by RocksDB.
  • TiKV — PingCAP 2016; CNCF graduated 2020; Rust; distributed transactional KV; backs TiDB but stand-alone usable.
  • KuzuDB — Waterloo / Kuzu Inc 2022; Semih Salihoglu + team; embedded property graph DB; Cypher; native columnar; Apache 2.0; “DuckDB for graphs.”
  • Datasette — Simon Willison 2017; serves SQLite databases as queryable HTTPS endpoints; Apache 2.0; the canonical “tiny data publishing” tool.
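
Several entries above (RocksDB, LevelDB, Badger) are LSM-tree engines. The toy below is a sketch of the core idea only, under simplifying assumptions: the `TinyLSM` class is hypothetical, runs stay in memory instead of on disk, and there are no tombstones, bloom filters, or levelled compaction.

```python
import bisect

class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs, newest first."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value) pairs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable as a sorted run; a real engine writes it to disk.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newest run first, so the latest write wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        merged = {}
        for run in reversed(self.runs):  # oldest first, newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

db = TinyLSM()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)
print(db.get("a"), len(db.runs))
db.compact()
print(db.runs)
```

Writes only ever append to the memtable or create a new run, which is why LSM engines favour write-heavy workloads; reads pay for it by checking several runs until compaction merges them.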

OLAP and data warehouses — columnar, MPP

Designed for analytical scans: aggregate billions of rows in seconds. Columnar storage (one file per column), late materialization, vectorized execution, MPP (massively parallel processing).
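
To make the row-versus-column distinction concrete, here is a small sketch in plain Python. The sample orders and the `to_columns` and `sum_by` helpers are hypothetical; real engines add compression, vectorized kernels, and parallelism on top of the same layout idea.

```python
# Row layout: one record per entry, as an OLTP row store would keep it.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 50},
]

def to_columns(rows):
    """Pivot row records into one contiguous list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}

def sum_by(columns, key, value):
    """GROUP BY key, SUM(value), touching only the two columns involved."""
    totals = {}
    for k, v in zip(columns[key], columns[value]):
        totals[k] = totals.get(k, 0) + v
    return totals

columns = to_columns(rows)
totals = sum_by(columns, "region", "amount")
print(totals)
```

The aggregation never reads `order_id`, which is the practical benefit of columnar storage: a scan costs I/O proportional to the columns referenced rather than to the full row width.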

  • Snowflake — 2012 Mike Speiser (Sutter Hill) + Benoit Dageville + Thierry Cruanes + Marcin Żukowski (ex-Oracle, ex-VectorWise); GA Jun 2015 with Bob Muglia (ex-Microsoft) as CEO. Separated compute and storage (compute = “virtual warehouses” that can be paused; storage = micro-partitioned columnar in S3/Azure Blob/GCS). Multi-cloud (AWS + Azure + GCP). Largest software IPO to that date (Sep 2020); market cap peaked near $120B; annual revenue ~$3.6B with net loss ~$1.3B (heavy R&D + sales). Acquisitions: Streamlit 2022 $800M, Neeva 2023 (Sridhar Ramaswamy who became CEO Feb 2024), Modin 2023, Crunchy Data Jun 2025 $250M. Features: zero-copy cloning, time travel (90d), Snowpipe streaming, Snowpark (Python/Scala/Java UDFs), Cortex AI, Native Apps, Polaris Catalog (Iceberg), Snowflake Marketplace.
  • Databricks — 2013 from UC Berkeley AMPLab; founders Ali Ghodsi + Matei Zaharia (Spark creator) + Ion Stoica + Reynold Xin + Patrick Wendell + Andy Konwinski + Arsalan Tavakoli. Originally a managed Spark service; pivoted to “lakehouse” architecture (Parquet + Delta Lake transaction log on object store + Spark/Photon engine). Acquisitions: MosaicML Jun 2023 ($1.3B), Tabular Jun 2024 ($1B+ reported; Iceberg; co-founded by Ryan Blue, who co-created Iceberg at Netflix), Mooncake Labs Jun 2025 (managed Postgres + Iceberg), Neon Jun 2025 (~$1B reported), Arcion 2023 ($100M; CDC), 8080 Labs 2021 (Bamboolib). Funding: Series J Dec 2024 $62B valuation; June 2025 secondary; ~$3B ARR Mar 2025.
  • Google BigQuery — Google Cloud 2010 GA (Dremel paper internally since 2006); serverless; pay per byte scanned (or flat-rate slots); separates compute and storage; columnar Capacitor format. BigLake (lakehouse atop GCS), BigQuery ML (in-DB ML), BigQuery Omni (cross-cloud query AWS/Azure data).
  • AWS Redshift — 2012 GA; originally based on ParAccel (Mike Stonebraker design influence); diverged. RA3 instance with managed storage (S3-backed); Redshift Serverless, Redshift Spectrum (query S3), Redshift ML.
  • Microsoft Azure Synapse Analytics — 2019 (rebrand of SQL Data Warehouse); now subsumed into Microsoft Fabric (2023) — Fabric unifies Synapse + Data Factory + Power BI + Real-Time Analytics + Data Activator on OneLake (Delta Parquet).
  • Vertica — Mike Stonebraker + Andrew Lamb + Daniel Abadi 2005 (from C-Store paper 2005); HP acquired 2011 $350M; spun back to Micro Focus 2017; OpenText acquired Micro Focus 2023; columnar MPP; Eon mode separates compute and storage.
  • Teradata — founded 1979; acquired by NCR 1991 and spun off again 2007; legacy enterprise warehouse; VantageCloud is the modern multi-cloud rebrand.
  • Greenplum — Pivotal/VMware; Postgres-derived MPP; passed to Broadcom with the VMware acquisition, and Broadcom closed the open-source repository in Apr 2024 (community fork Cloudberry is in ASF incubation).
  • Firebolt — 2019; founders ex-Sisense; SSB-style ultra-fast analytics; raised at a $1.4B valuation 2022.
  • ClickHouse — Yandex Moscow 2009 (Alexey Milovidov); open-sourced Jun 2016; Apache 2.0; spun out as ClickHouse Inc 2021 (Aaron Katz CEO, ex-Elastic). Columnar OLAP, MergeTree engine family, vectorized execution, subsecond aggregation across billions of rows, scans up to 2TB/s on commodity hardware. Used at Cloudflare (HTTP analytics ~10M req/sec), Uber, eBay, Lyft, Spotify (event analytics replacing Druid), Mux, Plausible Analytics, PostHog (entire backend), Tinybird (managed). Funding: Series C May 2025 at a $6.35B valuation.
  • Druid — Metamarkets 2011 (now Snap); Apache top-level 2018; co-creator Fangjin Yang founded Imply (commercial Druid). Real-time analytical OLAP, time-partitioned, indexed columnar. Used by Netflix, Walmart, Airbnb, Yahoo, Lyft.
  • Apache Pinot — LinkedIn 2014; Apache top-level 2019; co-creator Kishore Gopalakrishna founded StarTree (commercial Pinot; $47M Series B 2022). Real-time analytics, sub-second on petabyte-scale, used by LinkedIn (1B+ members), Uber, Stripe, Walmart.
  • Apache Doris — Baidu Palo 2017 → Apache top-level 2022; MySQL wire; HTAP. Commercial: SelectDB (raised $36M 2022).
  • StarRocks — Apache 2.0 fork of Doris; founded by Doris co-creators 2020; commercial CelerData ($45M Series B 2024); often benchmarks ahead of Doris/Druid/Pinot.
  • Trino — fork of Presto; Presto was created at Facebook by Martin Traverso + Dain Sundstrom + David Phillips + Eric Hwang; the fork began as PrestoSQL in Jan 2019 after creators left Facebook and was rebranded Trino in Dec 2020 following a trademark dispute; Trino Software Foundation. Federated query engine across Hive/Iceberg/Delta/HDFS/S3/Kafka/Cassandra/Postgres/MySQL/MongoDB/Elasticsearch/Redis/Pinot/Druid/ClickHouse; SQL only. The original Presto (PrestoDB) continues under Linux Foundation governance with Meta as a major contributor.
  • Starburst — commercial Trino (Justin Borgman, ex-Teradata; raised Series D 2022 at a $3.35B valuation; layoffs 2023-2024).
  • AWS Athena — 2016; serverless managed Trino/Presto on S3; pay per byte scanned.
  • MotherDuck — see DuckDB cloud above.
  • Sundeck — DuckDB-as-managed-service (separate from MotherDuck).
  • ksqlDB — Confluent 2019; Kafka Streams SQL frontend; deprecated → Confluent moved focus to Flink SQL 2024.
  • CelerData — commercial StarRocks.

Document databases — JSON-native

Designed when “fixed schema is the wrong default” became fashionable (~2009 NoSQL wave).

  • MongoDB — 10gen → MongoDB Inc 2007 (Dwight Merriman + Eliot Horowitz + Kevin Ryan). BSON binary JSON; aggregation pipeline ($group/$facet); sharding by hash or range; change streams (replica oplog tailing); MongoDB Atlas (managed since 2016; ~70% of MongoDB revenue 2025). Atlas Vector Search since 2023. MongoDB Query Language (MQL) + Aggregation Framework. Licensing: AGPL 2009 → SSPL Oct 2018 (Server Side Public License, copyleft for SaaS), which triggered AWS/Azure/GCP relicense events. NASDAQ IPO Oct 2017 at $2.0B. Acquisitions: Realm 2019 $39M (mobile), Voyage AI 2025 (embeddings/reranker for retrieval).
  • Couchbase — 2011 merger of CouchOne (CouchDB stewards) + Membase (Memcached-derived). Couchbase Server (KV+document hybrid; N1QL SQL-like; analytics engine; eventing; mobile sync), Capella (managed cloud). IPO Jul 2021 ~$1.5B.
  • CouchDB — Apache; multi-master eventually-consistent JSON; HTTP API; “PouchDB” client-side sync.
  • Amazon DocumentDB — 2019; managed; Mongo wire compat (partial — many Mongo features unsupported); Aurora storage backend; AWS positioning as Mongo alternative within the cloud.
  • Azure Cosmos DB Mongo API — Mongo wire over Cosmos storage.
  • RavenDB — Hibernating Rhinos; .NET-first; multi-master; document + graph + search; AGPL + commercial.
  • OrientDB — multi-model document + graph; SAP acquired Orient Technologies 2017; deprecated 2024.
  • MarkLogic — XML/JSON enterprise document DB; Progress Software acquired 2023 $355M.
  • FaunaDB — 2017 ex-Twitter (Evan Weaver + Matt Freels); distributed transactional document DB with FQL/GraphQL; shut down core service Mar 2025; AGPL OSS release.

Key-value stores

The simplest data model — get(k) / put(k, v) / delete(k) — and the foundation of most other engines internally.
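
As a sketch of how a richer model can sit on top of those three operations, the example below stores a relational-style row as one KV pair per column. The `KV` class, the `"<table>/<pk>/<column>"` key encoding, and the ordered `scan` are assumptions made for illustration; they are loosely in the spirit of SQL-over-KV designs but do not reproduce any specific engine's encoding.

```python
class KV:
    """Minimal in-memory key-value store with get / put / delete."""

    def __init__(self):
        self._data = {}

    def put(self, k, v):
        self._data[k] = v

    def get(self, k):
        return self._data.get(k)

    def delete(self, k):
        self._data.pop(k, None)

    def scan(self, prefix):
        """Ordered prefix scan (an extra assumption; plain KV APIs may lack it)."""
        return [(k, self._data[k]) for k in sorted(self._data) if k.startswith(prefix)]

def insert_row(kv, table, pk, row):
    # Hypothetical key encoding: "<table>/<pk>/<column>".
    for column, value in row.items():
        kv.put(f"{table}/{pk}/{column}", value)

def read_row(kv, table, pk):
    prefix = f"{table}/{pk}/"
    return {k[len(prefix):]: v for k, v in kv.scan(prefix)}

kv = KV()
insert_row(kv, "users", 7, {"name": "Ada", "plan": "pro"})
print(read_row(kv, "users", 7))
```

Once an ordered scan is available, secondary indexes and range queries become a matter of choosing key encodings, which is why ordered KV stores are a common substrate.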

  • Redis — Salvatore Sanfilippo (antirez) 2009; Italian developer; sponsored by VMware → Pivotal → Redis Labs → Redis Inc (Yiftach Shoolman + Ofer Bengal). In-memory; data structures (strings, hashes, lists, sets, sorted sets, streams, bitmaps, HyperLogLog, geospatial, bitfield); replication; sentinel; cluster; persistence (RDB + AOF). Modules (RediSearch, RedisJSON, RedisGraph deprecated, RedisTimeSeries, RedisBloom, RedisAI). Licensing crisis Mar 2024: BSD → dual SSPLv1 + RSAL — triggered Valkey fork led by Linux Foundation (AWS + Google + Oracle + Snap + Ericsson + Alibaba). Antirez returned to Redis Sep 2024. Redis 8 (May 2025) added AGPL track restoring open source.
  • Valkey — Linux Foundation Mar 2024 fork from Redis 7.2.4 BSD; AWS, Google, Oracle, Ericsson, Snap, Alibaba founding sponsors; drop-in Redis-compat; AWS ElastiCache + MemoryDB switched defaults to Valkey 2024.
  • Memcached — Danga Interactive / LiveJournal 2003 (Brad Fitzpatrick); pure in-memory cache; no persistence; LRU eviction; consistent-hash client-side sharding; still widely deployed for hot caches.
  • KeyDB — multi-threaded Redis fork by EQ Alpha 2019; Snap Inc acquired 2022; integrated into Snap stack.
  • Dragonfly — 2022; Roman Gershman + Oded Poncz (ex-Google); Redis-API-compatible but multi-threaded shared-nothing C++; up to 25x throughput vs Redis on a single host; raised $21M 2023; BSL licensing.
  • ScyllaDB — 2014 (KVM creators Avi Kivity + Dor Laor); C++ rewrite of Cassandra; shard-per-core (Seastar framework); Cassandra-wire-compatible; DynamoDB-wire-compatible Alternator API also; ScyllaDB Cloud managed.
  • Apache Cassandra — Facebook 2008 inbox search (Avinash Lakshman + Prashant Malik; Lakshman also co-authored the Dynamo paper); Apache 2009; CQL since 1.2; tunable consistency (ONE/QUORUM/ALL etc); ring topology gossip; LSM SSTables. Cassandra 5.0 (Sep 2024) added vector type + ANN index. Commercial: DataStax (Astra DB managed; Series E 2022 $115M; IBM announced its acquisition Feb 2025).
  • Riak — Basho 2009; Dynamo-inspired; Riak KV + TS + S2 (S3 compat); Basho bankruptcy 2017; community continuation; Bet365 large user.
  • Aerospike — 2009; founders ex-Citrusleaf; sub-millisecond hybrid memory (DRAM index + SSD data); ad-tech and financial dominant deployment; Series E 2022 $64M.
  • Amazon DynamoDB — 2012; original 2007 Dynamo paper authors at Amazon (Werner Vogels + Giuseppe DeCandia + Deniz Hastorun + others). Provisioned (read/write capacity units) vs On-Demand (per-request pricing 2018) vs Global Tables (multi-region multi-active) + Streams (CDC) + DAX (caching) + Point-in-Time Recovery. NoSQL-only key-value+document; secondary indexes (LSI/GSI); single-table-design pattern (Rick Houlihan evangelism).
  • FoundationDB — Apple acquired 2015 (price undisclosed; rumoured 8-figure); open-sourced Apr 2018 Apache 2.0; ordered transactional KV with strict serializability; “build any database on top” philosophy; powers Apple iCloud, Snowflake metadata, Wavefront, IBM Cloudant.
  • Tigris Data — open-source Apache 2.0; document + search + cache; Ericsson-founded YC W22; shut down Mar 2024 → relaunched 2025 as Tigris Object Storage (S3-compatible).
  • HiveDB — Apache; columnar over Hadoop.
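
The tunable consistency levels mentioned for Cassandra (ONE/QUORUM/ALL) follow the usual quorum arithmetic from Dynamo-style systems: with N replicas, a read is guaranteed to see the latest acknowledged write when the read and write replica counts satisfy R + W > N. The sketch below only evaluates that inequality; the helper names are hypothetical and it does not model hinted handoff, repair, or failures.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def overlaps(n, w, r):
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n

N = 3  # replication factor assumed for this example
LEVELS = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

report = {
    (write_level, read_level): overlaps(N, LEVELS[write_level], LEVELS[read_level])
    for write_level in LEVELS
    for read_level in LEVELS
}

for (write_level, read_level), strong in sorted(report.items()):
    print(f"write={write_level:<6} read={read_level:<6} overlap={strong}")
```

With a replication factor of 3, QUORUM writes paired with QUORUM reads overlap, while ONE paired with QUORUM does not, which is why mixed levels need to be checked rather than assumed.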

Wide-column / column-family

Schema-flexible row-keyed storage with column families (Cassandra, HBase, Bigtable model).

  • Apache HBase — 2008; Bigtable paper model; Hadoop ecosystem; Facebook Messages 2011 (now migrated to MyRocks/RocksDB); Adobe, Salesforce internal.
  • Google Bigtable — 2006 (paper) → 2015 managed cloud release; petabyte-scale; powers Google Search, Maps, Analytics, Gmail; Cloud Bigtable HBase-compat API; latency-stable workloads.
  • Apache Cassandra — see KV section.
  • ScyllaDB — see KV section.
  • Hypertable — discontinued.

Graph databases

Property graph (nodes + edges + properties) or RDF triple stores. Specialized for traversals and pattern matching.
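
A minimal sketch of the property-graph model and a bounded traversal follows. The toy `nodes` and `edges` data and the `reachable` helper are hypothetical; the traversal is roughly what a variable-length pattern such as `KNOWS*1..n` expresses in a graph query language, implemented here as plain breadth-first search.

```python
from collections import deque

# Hypothetical toy graph: nodes carry properties, edges carry a type.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "acme": {"label": "Company"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "WORKS_AT", "acme"),
]

def neighbors(node, edge_type):
    """Outgoing neighbours of node along edges of the given type."""
    return [dst for src, etype, dst in edges if src == node and etype == edge_type]

def reachable(start, edge_type, max_hops):
    """Breadth-first traversal up to max_hops, returning nodes in discovery order."""
    seen, frontier, found = {start}, deque([(start, 0)]), []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbors(node, edge_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, hops + 1))
    return found

print(reachable("alice", "KNOWS", 1))
print(reachable("alice", "KNOWS", 2))
```

The linear edge scan in `neighbors` is the part a native graph engine replaces with adjacency structures, so that each hop costs time proportional to a node's degree rather than to the total edge count.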

  • Neo4j — Emil Eifrem + Johan Svensson + Peter Neubauer 2007 (Malmö, Sweden → San Mateo); graph database pioneer; Cypher query language (now openCypher with multi-vendor); GraphQL Federation layer; AuraDB managed cloud; Graph Data Science library (PageRank, Louvain, node2vec). Funding: Series F Jun 2021 $2B+ valuation. Cypher influenced ISO GQL (Graph Query Language ISO/IEC 39075:2024 ratified Apr 2024).
  • ArangoDB — ArangoDB Inc 2014 (Cologne); multi-model document + graph + search; AQL query language; SmartGraphs; recovered from 2023 financial trouble.
  • JanusGraph — Apache 2017; fork of Titan (Aurelius acquired by Datastax 2015 → JanusGraph spinout); pluggable backend (Cassandra/HBase/Bigtable/Scylla/BerkeleyDB); pluggable index (Elastic/Solr/Lucene); Gremlin (Apache TinkerPop); IBM/Google contributing.
  • TigerGraph — 2017; Yu Xu (Beijing → Redwood City); parallel graph DB; GSQL query; raised Series C 2021 1B+; layoffs 2023.
  • Amazon Neptune — 2018; managed graph; Gremlin + SPARQL + openCypher; Neptune Analytics 2023 (in-memory analytical).
  • Memgraph — 2017 Zagreb; C++ in-memory Cypher; query streaming + MAGE algorithms; BSL.
  • Nebula Graph — Vesoft 2018 (Beijing); raised $35M Series A 2021; ANTV / WeBank user; nGQL query; HTAP graph.
  • Dgraph — 2016; Manish Rai Jain (ex-Google); native GraphQL+/GraphQL; Apache 2.0; Hypermode (formerly Dgraph Labs) pivoted 2024 to AI inference platform; Dgraph itself archived → community fork.
  • KuzuDB — see embedded; columnar property graph in-process.
  • FalkorDB — RedisGraph fork after Redis deprecated the module 2024; Apache 2.0; sparse matrix linear-algebra-based.
  • Stardog — knowledge graph (RDF + LPG hybrid); enterprise knowledge management.
  • Blazegraph — Wikidata backend; Apache 2.0; AWS Neptune was based on Blazegraph; mostly maintenance-only.
  • GraphDB (Ontotext) — RDF+SPARQL.
  • AnzoGraph (Cambridge Semantics) — MPP RDF.
  • TerminusDB — Git-like immutable graph.

Vector / embedding stores — the 2023-2025 wave

Approximate Nearest Neighbor (ANN) search on dense embedding vectors. Driven by LLM RAG and semantic search demand.
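
For reference, this is the exact search that ANN indexes approximate: score every stored vector against the query and keep the best k. The sketch uses pure Python and hypothetical three-dimensional embeddings; the names `cosine`, `top_k`, and the document ids are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, vectors, k):
    """Exact nearest neighbours by cosine similarity: O(n * d) work per query."""
    scored = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
vectors = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.0],
    "doc_tax": [0.0, 0.1, 0.9],
}

result = top_k([1.0, 0.0, 0.0], vectors, k=2)
print(result)
```

The linear cost per query is what becomes prohibitive at millions of vectors; ANN indexes trade a small amount of recall for sub-linear search.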

Dedicated vector DBs:

  • Pinecone — Pinecone Systems 2019 (Edo Liberty ex-AWS SageMaker); fully managed SaaS; serverless tier (2024) decoupled compute/storage S3-backed; multi-tenant. Raised $100M at a $750M valuation (2023); ~$70M ARR 2024.
  • Milvus — Zilliz Inc 2019 (Charles Xie ex-Oracle); Apache 2.0; graduated project of the LF AI & Data Foundation; HNSW + IVF + DiskANN + GPU; cluster + standalone modes. Zilliz Cloud managed; Knowhere (ANN engine library).
  • Weaviate — Bob van Luijt + Etienne Dilocker 2019 (Amsterdam, SeMI Technologies → Weaviate B.V.); BSD-3; modular vectorizer plugins; GraphQL query; hybrid BM25 + vector; multi-tenant; Weaviate Cloud Services. Series B Apr 2023 ($50M; ~$200M valuation).
  • Qdrant — Andrey Vasnetsov + Andre Zayarni 2021 (Berlin); Rust + Tonic gRPC; Apache 2.0; serverless cloud; quantization (scalar/product/binary); raised $28M Series A 2024.
  • Chroma — 2022; Jeff Huber + Anton Troynikov; Apache 2.0; Python-first embedded vector store; cloud version 2024; raised Series A $18M.
  • LanceDB — LanceDB Inc 2022; Chang She (ex-pandas) + Lei Xu; Rust + Apache Arrow + Lance columnar format optimized for ML; embedded + cloud; Apache 2.0.
  • Vespa — Yahoo internal since 2003; open source 2017; spun out from Verizon as Vespa.ai Inc Oct 2023 ($31M seed from Blossom); hybrid retrieval (BM25 + tensor + vector); ad-tech and search scale (millions of QPS).
  • Marqo — 2022; managed semantic search; tensors of arbitrary dimensions.
  • Vald — Yahoo Japan; CNCF sandbox.
  • DocArray + Jina AI — multimodal; Jina pivoted 2024 to embeddings + reranker API.

Vector capabilities in existing DBs:

  • pgvector — Andrew Kane 2021; Postgres extension; HNSW + IVFFlat; the gravitational center of vector search in 2025 because everyone already has Postgres. pgvectorscale (Timescale 2024) adds StreamingDiskANN.
  • Elasticsearch dense_vector — 7.3+; HNSW; widely deployed.
  • OpenSearch k-NN plugin — HNSW + IVF + Lucene HNSW; default for AWS RAG.
  • MongoDB Atlas Vector Search — 2023; serverless; hierarchical navigable small world.
  • Redis VSS / Redis Search — RediSearch module; HNSW + FLAT.
  • Cassandra Vector Search — 5.0+; ANN with SAI indexes; DataStax Astra DB Vector + JVector library.
  • ClickHouse vector functions — distance functions + ANN indexes 2024+.
  • DuckDB VSS — extension 2024.
  • SQLite-vss / sqlite-vec — Alex Garcia 2023; SQLite extensions.
  • Snowflake Cortex Vector — 2024.
  • BigQuery vector search — 2024.
  • Azure Cosmos DB vector indexing — 2024.
  • SingleStore — vector native columnar.
  • Oracle 23ai — AI Vector Search built-in 2024.
  • SQL Server 2025 preview — vector type.

ANN algorithms (the math under the hood):

  • HNSW (Hierarchical Navigable Small World) — Malkov + Yashunin 2018; graph-based; default in most vector DBs.
  • IVF / IVF-PQ (Inverted File Index + Product Quantization) — Jégou + Douze + Schmid 2010-2017 in FAISS (Meta).
  • ScaNN — Google 2020; anisotropic vector quantization.
  • DiskANN / Vamana — Microsoft NeurIPS 2019; disk-resident graph index; powers Bing.
  • SPANN — Microsoft 2021; memory + disk hybrid.
  • FreshDiskANN — incremental DiskANN.
  • NGT — Yahoo Japan.
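
To illustrate the inverted-file (IVF) idea from the list above, here is a deliberately small sketch without product quantization. Vectors are bucketed by nearest centroid, and a query scans only the `nprobe` closest buckets. The data and function names are hypothetical, and the centroids are fixed by hand where a real index would learn them with k-means.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every vector to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
        lists[nearest].append(vec_id)
    return lists

def search_ivf(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe closest lists; recall tends to rise with nprobe."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [vec_id for i in order[:nprobe] for vec_id in lists[i]]
    return sorted(candidates, key=lambda vec_id: sq_dist(query, vectors[vec_id]))[:k]

# Hypothetical 2-D data with hand-picked centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = {"a": (1.0, 1.0), "b": (2.0, 0.0), "c": (9.0, 9.0), "d": (6.0, 6.0)}
lists = build_ivf(vectors, centroids)

query = (4.5, 4.5)
print(search_ivf(query, vectors, centroids, lists, k=1, nprobe=1))
print(search_ivf(query, vectors, centroids, lists, k=1, nprobe=2))
```

In this toy data the true nearest neighbour of the query sits in the second bucket, so probing one list misses it and probing both finds it. That miss is the "approximate" in ANN, and `nprobe` is the knob that trades latency for recall.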

Open-source libraries:

  • FAISS — Meta FAIR 2017; the foundational library.
  • Annoy — Spotify 2015; tree-based; simple and fast for static datasets.
  • ScaNN — Google.
  • hnswlib — original HNSW reference.
  • DiskANN library — Microsoft.

Time-series databases

Optimized for append-mostly numerical timestamped data: metrics, IoT, sensor, financial ticks, logs-as-metrics.
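
A core operation across these engines is bucketing points by time and aggregating within each bucket, which is what downsampling and continuous-aggregate features compute. The sketch below shows the arithmetic in plain Python; `time_bucket` and `downsample` are hypothetical helper names and the readings are made up.

```python
from collections import defaultdict

def time_bucket(ts, width):
    """Floor a Unix timestamp in seconds to the start of its bucket."""
    return ts - ts % width

def downsample(points, width):
    """Average (timestamp, value) points per bucket, returned in time order."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[time_bucket(ts, width)].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

# Hypothetical CPU readings: (seconds since epoch, percent).
points = [(0, 10.0), (20, 20.0), (59, 30.0), (60, 40.0), (125, 50.0)]
series = downsample(points, 60)
print(series)
```

Because data arrives roughly in time order, engines can keep each bucket's points physically adjacent, which makes both this aggregation and retention by dropping old buckets cheap.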

  • InfluxDB — InfluxData 2013 (Paul Dix). Major version transitions: 1.x (TSM storage engine + InfluxQL, Go); 2.x (Flux query language, Go) — controversial reception; 3.x / InfluxDB IOx (2024 GA; Rust + Apache Arrow + DataFusion + Parquet object storage; SQL + InfluxQL) — Flux abandoned. Cloud Serverless + Dedicated + Clustered; OSS Edge.
  • TimescaleDB — Timescale Inc 2017 (Mike Freedman + Ajay Kulkarni; Princeton); Postgres extension providing hypertables (time-partitioned chunks), continuous aggregates, compression, retention policies. Licensing: Apache 2.0 for the core parts, Timescale License (TSL) for advanced features. Timescale Cloud + self-hosted; pgai + pgvectorscale extensions also from Timescale.
  • QuestDB — QuestDB Ltd 2014 (Vlad Ilyushchenko); zero-GC Java + C++; Apache 2.0; SQL with SAMPLE BY / LATEST ON time extensions; column-oriented; raised $40M Series A 2024.
  • VictoriaMetrics — Aliaksandr Valialkin 2018; Prometheus drop-in alternative; better cardinality handling; lower memory; PromQL → MetricsQL; vmagent + vmselect + vminsert + vmstorage cluster; commercial VictoriaMetrics Enterprise.
  • kdb+ / q — Kx Systems / KX 2003; Arthur Whitney (APL → A+ → K → kdb+); columnar in-memory; dominant in finance/HFT; q language; FDB 2024 (KX Foundation database); $300M+ revenue; FD Technologies parent.
  • OpenTSDB — HBase-backed; legacy.
  • Graphite — Carbon + Whisper (file-per-metric); legacy but ubiquitous in older estates.
  • Apache IoTDB — Tsinghua 2020 → Apache top-level; native time-series with TsFile format; native IoT modeling.
  • M3DB — Uber 2018 → M3 Inc; Prometheus-compatible; high-cardinality.
  • Apache Druid — see OLAP.
  • Apache Pinot — see OLAP.
  • ClickHouse for time-series — common substitution.
  • Prometheus TSDB — local on-disk; designed for ephemeral; Thanos / Cortex / Mimir / VictoriaMetrics extend to long-term.

Search engines

Full-text search and inverted indexes.
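
An inverted index maps each term to the documents that contain it, so a query intersects short posting lists instead of scanning every document. The sketch below is a toy version with whitespace tokenization and no ranking, stemming, or positions; the documents and helper names are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()

def build_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """AND query: intersect the posting lists of every query term."""
    lists = [set(index.get(term, ())) for term in tokenize(query)]
    return sorted(set.intersection(*lists)) if lists else []

docs = {
    1: "fast columnar analytics",
    2: "fast key value cache",
    3: "columnar lakehouse format",
}
index = build_index(docs)
print(search_all(index, "fast columnar"))
```

Scoring functions such as BM25 are then layered on top by storing term frequencies alongside the document ids in each posting list.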

  • Elasticsearch — Shay Banon 2010 (Israel → Amsterdam); Lucene-based; distributed; near-realtime; aggregations; ELK stack: Elasticsearch + Logstash + Kibana + Beats. Licensing: Apache 2.0 → SSPLv1 + Elastic License 2.0 (ELv2) dual Jan 2021 → triggered OpenSearch AWS fork Apr 2021. Sep 2024: Elasticsearch re-added AGPLv3 option (triple-licensed). Public since 2018; ~$1.4B annual revenue 2025; ~$13B market cap.
  • OpenSearch — AWS-led fork Apr 2021; OpenSearch Software Foundation under Linux Foundation Sep 2024 (transferred from AWS to neutral foundation); Apache 2.0; AWS OpenSearch Service managed.
  • Apache Solr — 2004 CNET / Yonik Seeley; Apache Lucene-based; predecessor to Elasticsearch; SolrCloud distributed mode; mature but waning relative to ES.
  • Apache Lucene — Doug Cutting 1999; the foundational JVM full-text library; powers Solr, Elasticsearch, OpenSearch, Crate, Nutch.
  • Algolia — 2012 Nicolas Dessaigne + Julien Lemoine (Paris); SaaS search-as-service for ecommerce/SaaS; ~$80M ARR 2024.
  • Typesense — 2020 Jason Bosco + Kishore Nallan; OSS GPLv3; managed Typesense Cloud; “Algolia alternative.”
  • Meilisearch — 2018 (Paris); Rust; typo-tolerant; MIT.
  • Vespa — see vector; also a serious search engine.
  • Quickwit — 2021 (Paris); Rust; log search; cost-optimized; acquisition by Datadog announced Jan 2025 (~$200M reported).
  • Sonic — Rust lightweight search.
  • Manticore Search — Sphinx fork; high-perf.

NewSQL / globally distributed SQL

Distributed transactional SQL with horizontal scale and strong consistency.

  • Google Spanner — see OLTP.
  • CockroachDB — see OLTP.
  • YugabyteDB — see OLTP.
  • TiDB — see OLTP.
  • MariaDB Xpand — formerly Clustrix (founded 2006 by Paul Mikesell + Aaron Passey ex-Isilon); MariaDB Corp acquired 2018; distributed MySQL.
  • VoltDB — Michael Stonebraker 2009 (H-Store paper); in-memory; was Volt Active Data → Volt 2024.
  • NuoDB — distributed SQL; Dassault Systèmes acquired 2020.
  • SingleStore (formerly MemSQL) — 2011 Eric Frenkiel + Nikita Shamgunov; in-memory rowstore + on-disk columnstore HTAP; raised Series F 2022 $1.3B valuation.
  • PlanetScale — 2018 Vitess co-creators Jiten Vaidya + Sugu Sougoumarane; managed MySQL with branching/deploy-requests; ended free hobby tier Apr 2024; pivot toward enterprise.
  • Neon — 2021 Nikita Shamgunov (ex-SingleStore/MemSQL) + Stas Kelvich + Heikki Linnakangas (Postgres committer); Postgres-on-S3 architecture (separates compute and storage; scales compute to zero); branching like PlanetScale; Databricks acquired May 2025 ~$1B.
  • Supabase — Paul Copplestone 2020; Postgres + auth (GoTrue) + realtime + storage + edge functions + vector + auto-generated REST API (PostgREST); Apache 2.0; Series D 2024 $2B valuation; ~$80M ARR 2025.
  • AlloyDB — Google 2022 managed Postgres; columnar accelerator; AlloyDB Omni self-hosted variant; AlloyDB AI with vector + Vertex AI integration.
  • CockroachDB Serverless — 2021 GA; spend caps and burst credits.
  • Crunchy Bridge — Crunchy Data managed Postgres; Snowflake acquired Jun 2025 $250M.
  • Tembo — 2023; Postgres stacks (preconfigured extension sets); Series A 2024 $14M.
  • Xata — Postgres BaaS; deprecated standalone product 2024 → pivot to Postgres branching tooling.

Multi-model

Single engine, multiple data models with shared transaction semantics.

  • ArangoDB — document + graph + search + KV; AQL.
  • OrientDB — deprecated.
  • Azure Cosmos DB — five APIs (SQL/Mongo/Cassandra/Gremlin/Table); tunable consistency.
  • FaunaDB — discontinued core service Mar 2025.
  • Couchbase — KV + document + N1QL SQL + full-text + analytics + eventing + mobile sync.
  • SurrealDB — Tobie Morgan Hitchcock + Jaime Morgan Hitchcock 2022; Rust; document + graph + relational + vector + KV; SurrealQL; BSL (changed from Apache in 2024 controversy).
  • EdgeDB — 2019 Yury Selivanov + Elvis Pranskevichus (ex-MagicStack); built on Postgres; EdgeQL graph-of-objects model; rebranded to Gel 2025.
  • TypeDB — Vaticle (Cambridge); strongly-typed entity-relation; TypeQL; “next-gen logic-programming DB.”

Streaming / event-log / commit-log databases

The append-only, immutable log paradigm — Kafka-style.

  • Apache Kafka — LinkedIn (Jay Kreps + Neha Narkhede + Jun Rao); open-sourced 2011 → Apache top-level 2012; Confluent commercial (Kreps CEO; NASDAQ IPO Jun 2021; $10B market cap 2026; ~$1B ARR FY2025). Partitioned commit log; consumer groups; KRaft (ZooKeeper-free metadata quorum; production-ready in 3.3, ZooKeeper removed in 4.0); Tiered Storage 3.6+; log compaction; exactly-once semantics; Kafka Streams (JVM library), Connect (sources/sinks), Schema Registry, ksqlDB (no longer a focus).
  • Apache Pulsar — Yahoo 2016 → Apache top-level 2018; layered architecture (broker stateless + BookKeeper storage); geo-replication built-in; multi-tenant; StreamNative commercial (Sijie Guo). Adoption: Iterable, Splunk, Tencent, Mercado Libre.
  • Redpanda — Vectorized 2019 (Alexander Gallego ex-Concord); C++; Kafka API-compatible, no JVM, no ZooKeeper; Series C Apr 2023 $500M+; up to 10x lower latency claimed.
  • AWS Kinesis (Data Streams + Firehose + Analytics + Video Streams) — 2013; managed; Lambda-friendly.
  • Azure Event Hubs — managed Kafka surface and native AMQP.
  • GCP Pub/Sub — Google’s managed pub-sub.
  • NATS / NATS JetStream — Synadia (Derek Collison ex-Apcera); lightweight; pull/push; persistence via JetStream; CNCF incubating project since 2018.
  • RabbitMQ — VMware → Pivotal → Broadcom; AMQP queues; classic queues + quorum queues + streams (since 3.9 — Kafka-like).
  • Materialize — Frank McSherry + Arjun Narayan + Nikhil Benesch 2019; differential dataflow academic work (McSherry, MSR Silicon Valley); incremental view maintenance; Postgres-wire; views over Kafka/Postgres CDC; raised Series C 2021 $60M.
  • RisingWave — Singularity Data 2021 (Yingjun Wu); Rust; streaming SQL; Postgres-wire; Apache 2.0; Series A 2023 $36M.
  • Apache Flink — TU Berlin Stratosphere research project → Apache Incubator Apr 2014 → Apache top-level Dec 2014; streaming + batch; Ververica commercial (Alibaba acquired 2019). Flink SQL is the de facto standard streaming SQL in 2025.
  • Apache Beam — Google’s Dataflow SDK open-sourced; unified batch + stream programming model.
  • Apache Samza — LinkedIn; less common today.
  • Estuary Flow — managed CDC + streams; Series A 2023 $7M.
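
The "partitioned commit log; consumer groups" model in the Kafka entry can be sketched as a toy in-memory structure. This is a simplified illustration, not Kafka's implementation: `PartitionedLog` and its methods are hypothetical names, CRC32 stands in for Kafka's actual key partitioner, and replication, retention, and rebalancing are left out.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy append-only log split into partitions, with per-group read offsets."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]
        # committed[group][partition] is the next offset that group will read.
        self.committed: dict[str, dict[int, int]] = defaultdict(
            lambda: defaultdict(int)
        )

    def partition_for(self, key: str) -> int:
        # Assumption: CRC32 as a deterministic stand-in for the real partitioner.
        return zlib.crc32(key.encode()) % len(self.partitions)

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def poll(self, group: str, partition: int, max_records: int = 10):
        """Return unread records for this group and advance its offset."""
        start = self.committed[group][partition]
        records = self.partitions[partition][start:start + max_records]
        self.committed[group][partition] = start + len(records)
        return records


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("user-1", "login")
p2, o2 = log.append("user-1", "click")

# The same key always lands in the same partition, so per-key order is kept.
assert p1 == p2 and o2 == o1 + 1

# Each consumer group tracks its own offset and reads the full history once.
print(log.poll("billing", p1))    # [('user-1', 'login'), ('user-1', 'click')]
print(log.poll("billing", p1))    # []
print(log.poll("analytics", p1))  # [('user-1', 'login'), ('user-1', 'click')]
```

Records are never modified or removed by a read; consumers only move their own offsets, which is what lets several groups replay the same log independently.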

Caching layers

  • Redis — see KV.
  • Valkey — see KV.
  • Memcached — see KV.
  • KeyDB — see KV.
  • Hazelcast — JVM IMDG; CP Subsystem (Raft); SQL since 5.x.
  • GridGain — enterprise Apache Ignite.
  • Apache Ignite — IMDG; SQL; ML; Service Grid; computing grid.
  • NCache — .NET-first; Alachisoft.
  • AWS ElastiCache — managed Redis/Valkey/Memcached.
  • AWS MemoryDB — durable Redis with multi-AZ persistence.
  • Momento — Khawaja Shams (ex-DynamoDB) 2021; serverless cache as a service; raised Series A 2022 $15M.
  • Fastly Edge Cache / Cloudflare Workers KV / Cloudflare Cache API / Cloudflare D1 — edge-distributed caching layers tied to CDNs.
  • Aerospike — see KV.
  • DragonflyDB — see KV.

HTAP and unified workloads

Engines aiming to serve both OLTP and OLAP from the same dataset.

  • TiDB / TiKV / TiFlash — best-known HTAP execution.
  • SingleStore — rowstore + columnstore on same table.
  • SAP HANA — in-memory column-store; the original aggressive HTAP push.
  • Oracle HeatWave (MySQL) — column-store accelerator on Oracle Cloud + AWS + Azure.
  • Greenplum — Postgres MPP; can be configured HTAP-ish.
  • DuckDB embedded scenarios — pair OLTP store (Postgres) with DuckDB analytical layer over same Parquet snapshots.
  • Snowflake Hybrid Tables (Unistore) — 2022 preview / 2024 GA tier; row-based OLTP table alongside columnar.
  • Databricks Lakebase — 2024-2025 OLTP-on-lakehouse positioning building on Neon acquisition.

Lakehouse / table formats

Storage formats and transactional layers atop object storage that enable lakehouse architectures.

  • Apache Parquet — Twitter + Cloudera 2013; columnar file format; the default analytical file format in 2026; row-group + column-chunk + page; dictionary encoding + RLE + bit-packing + delta + snappy/zstd compression.
  • Apache Iceberg — Netflix 2017 (Ryan Blue + Daniel Weeks); Apache top-level 2020; table format with snapshots + hidden partitioning + schema evolution + ACID + branch/tag; specs widely adopted across Snowflake, Databricks, Trino, Athena, Dremio, Starburst. Tabular (commercial company from the Iceberg co-creators) — Databricks acquired Jun 2024 $1B+. Snowflake Polaris Open Catalog 2024 → contributed to Apache Polaris incubation.
  • Delta Lake — Databricks 2017 → Linux Foundation 2019; transaction log over Parquet; Delta Lake 3.0 Universal Format (UniForm) 2023 reads/writes Iceberg + Hudi compatibility metadata; Databricks Lakehouse Platform default.
  • Apache Hudi — Uber 2017 (Vinoth Chandar); upsert-friendly; Apache top-level 2020; Onehouse commercial (Chandar; Series B 2024 $35M); copy-on-write + merge-on-read tables.
  • Apache XTable (formerly OneTable) — 2024; cross-format translation Iceberg ↔ Delta ↔ Hudi metadata.
  • Apache ORC — Hortonworks 2013; columnar; Hive-native.
  • Apache Avro — Doug Cutting 2009; row format with schema; Kafka serialization default.
  • Apache Arrow — Wes McKinney + Jacques Nadeau 2016; in-memory columnar IPC standard; Arrow Flight RPC; backbone of DuckDB, Polars, Velox, Influx 3, DataFusion, Dremio, Voltron Data.
  • Apache Polaris — open-source implementation of the Iceberg REST catalog spec, originated at Snowflake (see Iceberg above).
  • Unity Catalog — Databricks unified governance; OSS Jun 2024.
  • Nessie — Dremio; Git-like catalog for Iceberg; Apache 2.0.
  • Lakekeeper — open-source Iceberg REST catalog.
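
Two of the encodings listed in the Parquet entry, dictionary encoding and run-length encoding (RLE), can be shown on a small column. This is a conceptual sketch only: the function names are hypothetical, and Parquet's on-disk format uses a hybrid RLE/bit-packing scheme with page and row-group framing that is not modeled here.

```python
def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Replace each value with a small integer code into a dictionary."""
    dictionary: list[str] = []
    positions: dict[str, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        codes.append(positions[value])
    return dictionary, codes


def rle_encode(codes: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1] = (code, runs[-1][1] + 1)
        else:
            runs.append((code, 1))
    return runs


def decode(dictionary: list[str], runs: list[tuple[int, int]]) -> list[str]:
    """Invert both encodings to recover the original column."""
    return [dictionary[code] for code, length in runs for _ in range(length)]


column = ["US", "US", "US", "DE", "DE", "US", "FR", "FR", "FR", "FR"]
dictionary, codes = dictionary_encode(column)
runs = rle_encode(codes)

print(dictionary)  # ['US', 'DE', 'FR']
print(runs)        # [(0, 3), (1, 2), (0, 1), (2, 4)]
assert decode(dictionary, runs) == column
```

Low-cardinality, sorted, or clustered columns compress best under this pairing, which is one reason columnar layouts suit analytical scans.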

Specialized verticals

Geospatial:

  • PostGIS (Postgres ext); MySQL Spatial; SQL Server Spatial; Oracle Spatial; BigQuery GIS; DuckDB Spatial extension; Snowflake Geospatial; MongoDB GeoJSON 2dsphere; Elasticsearch geo_shape; ESRI ArcGIS / SDE / Enterprise; GeoMesa (HBase/Accumulo); Uber H3 (hexagonal hierarchical spatial index 2018); Google S2 (cell hierarchy); Geohash.
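
Of the spatial indexes named above, Geohash is simple enough to sketch directly: it interleaves longitude and latitude bisection bits and emits one base-32 character per five bits. The function name `geohash_encode` is chosen here for illustration. H3 and S2 use different cell geometries (hexagons and a projected cube hierarchy) and are not covered by this sketch.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars: list[str] = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

Because each extra character refines the same cell, a shorter hash is a prefix of a longer hash for the same point, which lets an ordinary B-tree prefix scan serve as a coarse proximity filter.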

IoT / edge:

  • InfluxDB; TimescaleDB; QuestDB; SQLite; RocksDB; EdgeDB / Gel; Apache IoTDB; CrateDB (PostgreSQL-wire columnar; Crate.io); TDengine (Taos Data); Litestream (SQLite replica to S3 for edge); Cloudflare D1; Tigris.

Ledger / immutable:

  • Amazon QLDB — append-only journal with cryptographic verification; deprecated; end-of-life Jul 2025.
  • BigchainDB, Hyperledger Fabric, Algorand, immudb (Codenotary).
  • FaunaDB versioning model — service ended.
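
The "append-only journal with cryptographic verification" idea can be sketched as a hash chain, where each entry commits to the hash of the one before it. This is a simplified illustration under stated assumptions: the `append` and `verify` helpers are hypothetical, and production ledgers generally use Merkle trees and signed digests rather than a flat chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append(journal: list[dict], payload: dict) -> None:
    """Add an entry that is chained to the current tail of the journal."""
    prev = journal[-1]["hash"] if journal else GENESIS
    journal.append(
        {"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)}
    )


def verify(journal: list[dict]) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev = GENESIS
    for entry in journal:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


journal: list[dict] = []
append(journal, {"account": "A", "amount": 100})
append(journal, {"account": "B", "amount": -40})
print(verify(journal))  # True

journal[0]["payload"]["amount"] = 999  # tamper with history
print(verify(journal))  # False
```

Publishing only the latest hash is enough for an outside party to detect later rewrites of any earlier entry.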

Streaming aggregation:

  • Materialize, RisingWave, ksqlDB, Flink SQL, Apache Pinot real-time tables, Druid real-time ingestion, ClickHouse Materialized Views.

Consistency, isolation, and CAP

ACID properties (Atomicity, Consistency, Isolation, Durability) — transaction properties laid out by Jim Gray 1981; acronym coined by Härder + Reuter 1983; the foundation guarantee of OLTP.

Isolation levels (ANSI SQL-92 + extended):

  1. Read Uncommitted — dirty reads possible.
  2. Read Committed — most-OLTP default (Postgres, Oracle).
  3. Repeatable Read — Postgres’ “Repeatable Read” is actually snapshot isolation; MySQL InnoDB default.
  4. Snapshot Isolation — MVCC-based; not in ANSI SQL but used by Postgres, Oracle, MS SQL, MySQL InnoDB.
  5. Serializable — strict; some implementations are 2PL, others SSI (Serializable Snapshot Isolation — Cahill 2009; Postgres “serializable” level).
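
The gap between Snapshot Isolation and Serializable is easiest to see with write skew. The toy model below is a sketch under stated assumptions: `Store`, `Txn`, and `go_off_call` are hypothetical, the snapshot mode checks only write-write conflicts, and the "serializable" mode uses plain optimistic read-set validation, which is not the SSI algorithm Postgres implements.

```python
class Store:
    """Committed data plus a per-key version counter."""

    def __init__(self, data: dict) -> None:
        self.data = dict(data)
        self.version = {key: 0 for key in data}


class Txn:
    """Transaction that reads from a private snapshot taken at start."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.snapshot = dict(store.data)
        self.start_versions = dict(store.version)
        self.reads: set = set()
        self.writes: dict = {}

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value) -> None:
        self.writes[key] = value

    def commit(self, serializable: bool = False) -> bool:
        # Snapshot mode validates only written keys (write-write conflicts).
        # Serializable mode here also validates every key that was read.
        to_check = set(self.writes)
        if serializable:
            to_check |= self.reads
        for key in to_check:
            if self.store.version[key] != self.start_versions[key]:
                return False  # conflict detected: abort
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] += 1
        return True


def go_off_call(txn: Txn, me: str) -> None:
    """Invariant to protect: at least one doctor stays on call."""
    if sum(txn.read(doctor) for doctor in ("alice", "bob")) >= 2:
        txn.write(me, False)


def run(serializable: bool):
    store = Store({"alice": True, "bob": True})
    t1, t2 = Txn(store), Txn(store)
    go_off_call(t1, "alice")
    go_off_call(t2, "bob")
    return t1.commit(serializable), t2.commit(serializable), store.data


print(run(serializable=False))  # (True, True, {'alice': False, 'bob': False})
print(run(serializable=True))   # (True, False, {'alice': False, 'bob': True})
```

Under snapshot isolation both transactions commit because they write different rows, leaving nobody on call. Validating reads as well forces the second transaction to abort and retry against fresh data.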

Consistency models (distributed systems):

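  • Linearizable — single-object guarantee: each operation appears to take effect atomically at some point between its invocation and its response.
  • Strict Serializable — combines linearizable + serializable across multi-object transactions; Spanner (“external consistency”), FoundationDB transactions.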
  • Sequential Consistency — single global order but possibly delayed.
  • Causal Consistency — read your causal history.
  • Read-Your-Writes — see your own writes.
  • Eventual Consistency — converge eventually; Cassandra, DynamoDB default.
  • Monotonic Read / Monotonic Write / Bounded Staleness — Cosmos DB exposes five levels explicitly (strong, bounded staleness, session, consistent prefix, eventual).

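CAP theorem (Brewer 2000, formalized Gilbert+Lynch 2002) — Consistency, Availability, Partition tolerance: when a network partition occurs, a system must give up either consistency or availability (the popular “pick two of three” phrasing is misleading, since partition tolerance is not optional in a distributed system). Modern engines blur the lines — CockroachDB and Spanner choose CP; DynamoDB chooses AP with tunable (optionally strongly consistent) reads; Cassandra chooses AP with tunable per-query consistency levels.

PACELC (Abadi 2012) — extends CAP: if Partitioned, trade Availability against Consistency; Else (no partition), trade Latency against Consistency.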


Benchmarks

  • TPC-C — Transaction Processing Performance Council; 1992; OLTP benchmark with five transactions (new order, payment, order status, delivery, stock-level); tpmC metric. The reference OLTP benchmark.
  • TPC-H — 1999; 22 ad-hoc decision-support queries on a normalized eight-table schema (SSB below is its star-schema derivative); scale factors SF1 (1GB) to SF100000 (100TB).
  • TPC-DS — 2006; 99 queries against retail schema; more realistic than TPC-H.
  • TPC-E — financial brokerage workload; more realistic than TPC-C; less commonly used.
  • YCSB — Yahoo Cloud Serving Benchmark 2010; NoSQL/KV/wide-column comparisons; six workloads (A through F) varying read/write/scan mix.
  • ClickBench — ClickHouse 2022; 43 analytical queries over a single 14GB dataset; the canonical columnar OLAP shootout.
  • TPC-DI — 2014; data integration / ETL.
  • TPCx-IoT — IoT-shaped time-series.
  • SSB — Star Schema Benchmark; simpler than TPC-H.
  • VectorDBBench — Zilliz; ANN/vector DB benchmarking.
  • ANN-Benchmarks — vector index recall/throughput.
  • HammerDB — open-source TPC-C/H runner.
  • sysbench — MySQL ecosystem standard.
  • pgbench — Postgres bundled.
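
The YCSB entry's "read/write/scan mix" can be made concrete with the core workload proportions. The table below reflects the standard YCSB core workloads A through F as commonly documented. The `generate_ops` helper is a hypothetical sketch that samples only operation types, and it ignores the key-distribution settings (zipfian, latest, uniform) that the real benchmark also varies.

```python
import random

# Operation mix per core workload (fractions sum to 1.0).
YCSB_CORE = {
    "A": {"read": 0.50, "update": 0.50},             # update heavy
    "B": {"read": 0.95, "update": 0.05},             # read mostly
    "C": {"read": 1.00},                             # read only
    "D": {"read": 0.95, "insert": 0.05},             # read latest
    "E": {"scan": 0.95, "insert": 0.05},             # short ranges
    "F": {"read": 0.50, "read_modify_write": 0.50},  # read-modify-write
}


def generate_ops(workload: str, n: int, seed: int = 0) -> list[str]:
    """Sample n operation types according to the workload's mix."""
    rng = random.Random(seed)
    mix = YCSB_CORE[workload]
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


ops = generate_ops("B", 1000)
print(ops.count("read"), ops.count("update"))
```

Feeding the same operation stream to two engines keeps the comparison about the engine rather than about the workload generator.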

Licensing

The “fauxpen-source” debate was triggered by cloud vendors monetizing open-source DBs. The licenses involved:

  • AGPLv3 — copyleft including SaaS; MongoDB pre-2018; Neo4j Community; Elasticsearch re-added Sep 2024.
  • SSPL — Server Side Public License; MongoDB Oct 2018; Elastic Jan 2021; Redis Mar 2024; explicitly designed to force SaaS providers to open-source their orchestration.
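  • BSL — Business Source License 1.1; created by MariaDB Corp; Cockroach (BSL from 2019 with conversion to Apache 2.0 after three years; Aug 2024 moved to a single proprietary CockroachDB Software License); Sentry; HashiCorp Terraform Aug 2023 (triggered the OpenTofu fork under the Linux Foundation). Redis's Mar 2024 relicensing belongs to the same wave but used dual RSALv2 + SSPLv1 rather than BSL.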
  • Elastic License 2.0 (ELv2) — Elastic Jan 2021; restricts hosted SaaS competition.
  • Apache 2.0 — most CNCF + most OSS DBs still default here.
  • MPL 2.0 — Mozilla; CrateDB.
  • Apache Polaris, Apache Iceberg, Apache Hudi, Delta Lake, DuckDB, PostgreSQL, TiDB, YugabyteDB, and most core Postgres extensions — all permissive (Apache/PostgreSQL/MIT/BSD); MariaDB Server is GPLv2 (copyleft rather than permissive, but OSI-approved).

The OpenSearch Software Foundation transfer from AWS to Linux Foundation (Sep 2024) and Valkey under Linux Foundation (Mar 2024) signal a defensive-foundation strategy by hyperscalers and large users to neutralize relicensing risk going forward.


Adjacent

  • observability-tools-catalog — the metrics, logs, traces stack that lives alongside these engines.
  • llm-landscape — vector stores are foundational to LLM retrieval and pair with engines here.
  • ml-framework-comparison — model training and inference often pulls feature data from these stores.
  • distributed-systems — consistency / CAP / consensus background for the NewSQL and globally distributed categories.
  • storage-systems — block + object + file storage substrates these engines build on.
  • streaming-systems — Kafka, Pulsar, Flink, Materialize ecosystem cross-references.