SQL — Reference (ANSI/ISO + dialects)

Source: https://www.iso.org/standard/76583.html

SQL

  • Created: 1974 (SEQUEL/SQL at IBM, Donald Chamberlin & Raymond Boyce); first ANSI standard 1986 (SQL-86), ISO standard 1987
  • Latest stable (standard): ISO/IEC 9075:2023 (“SQL:2023”) — added Part 16 (Property Graph Queries / SQL/PGQ) and a JSON data type in SQL/Foundation. (Wikipedia SQL article citing ISO/IEC 9075:2023)
  • Paradigms: Declarative, set-based, relational; with procedural extensions per dialect (PL/pgSQL, T-SQL, PL/SQL, MySQL stored programs)
  • Typing: Static, strongly typed columns; expression coercion rules vary by dialect
  • Memory: Engine-managed; buffer pools, page cache, sort spillage to disk
  • Compilation: Statements parsed → bound → planned/optimized → executed; plans typically cached/reused
  • Primary domains: Online transactional processing (OLTP), analytical processing (OLAP), data warehousing, embedded application storage, time-series, geospatial, document and graph workloads
  • Implementations: PostgreSQL (18.x, 2026), MySQL (8.4 LTS, 2026 release line) / MariaDB, SQLite (3.x, embedded), Microsoft SQL Server (2022 / vNext on Linux & Windows), Oracle Database (23ai/26ai), plus DB2, CockroachDB, YugabyteDB, Snowflake, BigQuery, Redshift, DuckDB, ClickHouse, Spanner, Aurora, Singlestore — all SQL-flavored, none fully ISO-compliant
  • Official docs:

At a glance

SQL (“Structured Query Language”) is a declarative language for relational data — you describe what set you want; the optimizer chooses how to get it. The ISO/IEC 9075 standard runs to thousands of pages across a dozen parts (Foundation, CLI, PSM, MED, OLB, Schemata, JRT, XML, etc., with SQL/PGQ added in 2023), and “virtually no implementation adheres to it fully.” Every database vendor implements a dialect — a SQL core plus extensions, plus a procedural language. The portable subset is small enough that getting cross-dialect SQL right is a real engineering problem; ORMs and DSLs (jOOQ, SQLAlchemy Core, Diesel) exist largely to hide it. (Wikipedia)

Quick dialect cheat sheet:

ConcernPostgreSQLMySQL/MariaDBSQLiteSQL Server (T-SQL)Oracle (PL/SQL)
ArchitectureServer, multi-processServer, multi-thread (per connection)In-process library, single fileServer, multi-threadServer, multi-process
Default isolationRead CommittedRepeatable ReadSerializableRead CommittedRead Committed
Concurrency modelMVCC (snapshot)InnoDB MVCCReader-writer locks (WAL mode → MVCC-ish)MVCC w/ Read Committed Snapshot opt-inMVCC
Auto-incrementGENERATED ALWAYS AS IDENTITY (or serial)AUTO_INCREMENTINTEGER PRIMARY KEY (rowid alias)IDENTITY(1,1)GENERATED BY DEFAULT AS IDENTITY (12c+)
Limit/offsetLIMIT n OFFSET mLIMIT m, n or LIMIT n OFFSET mLIMIT n OFFSET mOFFSET m ROWS FETCH NEXT n ROWS ONLYOFFSET m ROWS FETCH NEXT n ROWS ONLY (12c+)
String concat`(andconcat()`)CONCAT() (`
Quote string'literal'; double '' to escape'literal' or "literal"'literal''literal' (N'literal' for unicode)'literal'
Quote identifier"name"`name` (backticks)"name" or `[name] or "name""NAME" (uppercase canonical)
JSON typeJSON, JSONB (binary, indexable)JSON (binary internally)TEXT + json1 extensionnvarchar(max) + JSON_* functions, JSON type (2025+)JSON (21c+) + JSON Relational Duality
UPSERTINSERT … ON CONFLICT … DO UPDATEINSERT … ON DUPLICATE KEY UPDATEINSERT … ON CONFLICT … DO UPDATEMERGE (with caveats)MERGE
BooleanNative BOOLEANTINYINT(1) (no real bool)None (use INTEGER)BITNone pre-23ai (use NUMBER(1))
ProceduralPL/pgSQL (also PL/Python, PL/Perl, PL/Tcl)SQL/PSM dialectNone (call from host)T-SQLPL/SQL

Getting started

Install:

  • PostgreSQL: brew install postgresql@17, apt install postgresql, or Docker postgres:17. Connect with psql.
  • MySQL: brew install mysql, apt install mysql-server. CLI: mysql.
  • SQLite: brew install sqlite, almost always preinstalled. CLI: sqlite3 mydb.sqlite. No server.
  • SQL Server: Docker mcr.microsoft.com/mssql/server:2022-latest (cross-platform now), or Express MSI on Windows. CLI: sqlcmd.
  • Oracle: Free edition Oracle Database Free 23ai (Linux container easiest). CLI: sqlplus or modern sqlcl.

Hello world (any dialect):

SELECT 'Hello, world!';

Project layout: No standard. Migration tooling owns layout: Flyway (V1__init.sql, V2__add_users.sql), Liquibase (XML/YAML/SQL changesets), Alembic (SQLAlchemy), Prisma migrate, Knex, sqitch.

Build/package tool: None for SQL itself. Migrations are the build pipeline. Schema diff tools: pg_dump --schema-only, mysqldump --no-data, sqldef, Atlas, dbmate, Skeema (MySQL).

REPL: psql, mysql, sqlite3, sqlcmd, sqlplus/sqlcl. DBeaver (cross-dialect, free), DataGrip (JetBrains), TablePlus, pgAdmin, MySQL Workbench, SQL Server Management Studio (SSMS), Azure Data Studio as GUIs.

Basics

Types & literals. Standard scalar types: BOOLEAN, SMALLINT/INTEGER/BIGINT, NUMERIC(p,s)/DECIMAL, REAL/DOUBLE PRECISION, CHAR(n)/VARCHAR(n)/TEXT, DATE/TIME/TIMESTAMP [WITH TIME ZONE], INTERVAL, BLOB/BYTEA, UUID (Postgres), JSON/JSONB. String literals single-quoted with doubled single-quote escape: 'O''Brien'. Date: DATE '2026-05-06', TIMESTAMP '2026-05-06 12:34:56+00'. NULL is its own three-valued logic — x = NULL is never true; use x IS NULL / IS DISTINCT FROM.

DDL — schema.

CREATE TABLE users (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,    -- standard since SQL:2003; supported by Postgres, Oracle 12c+, SQL Server 2022 (T-SQL syntax differs)
    email       VARCHAR(255) NOT NULL UNIQUE,
    created_at  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    metadata    JSONB
);
CREATE INDEX users_email_lower_idx ON users (LOWER(email));
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP WITH TIME ZONE;
DROP TABLE users CASCADE;

DML — data.

INSERT INTO users (email) VALUES ('a@x'), ('b@x') RETURNING id;   -- RETURNING is Postgres/Oracle/SQL Server (OUTPUT in T-SQL); not SQLite/MySQL
UPDATE users SET deleted_at = now() WHERE email = 'a@x';
DELETE FROM users WHERE deleted_at < now() - INTERVAL '90 days';

SELECT — the query.

SELECT u.email, COUNT(*) AS post_count
  FROM users u
  LEFT JOIN posts p ON p.author_id = u.id
 WHERE u.created_at >= DATE '2026-01-01'
 GROUP BY u.email
 HAVING COUNT(*) > 10
 ORDER BY post_count DESC
 LIMIT 100;

Logical processing order (mental model — physical order is the optimizer’s choice): FROMJOINWHEREGROUP BYHAVING → window functions → SELECTDISTINCTORDER BYLIMIT. This is why you can’t reference a SELECT alias in WHERE (alias not yet bound) but you can in ORDER BY.

Joins. INNER JOIN, LEFT/RIGHT/FULL OUTER JOIN, CROSS JOIN, LATERAL JOIN / CROSS APPLY (per-row subquery, see Advanced).

Set operations. UNION / UNION ALL, INTERSECT, EXCEPT (Postgres/SQL Server) / MINUS (Oracle). ALL keeps duplicates; without it dedupes (= one extra sort).

Constraints. PRIMARY KEY, UNIQUE, FOREIGN KEY … REFERENCES … ON DELETE CASCADE | SET NULL | RESTRICT | NO ACTION, CHECK (...), NOT NULL. Defer FKs in transactions: SET CONSTRAINTS ALL DEFERRED.

Intermediate

Window functions (SQL:2003, universal except very old SQLite). Compute aggregations over a partition without collapsing rows:

SELECT
    user_id,
    order_total,
    SUM(order_total) OVER (PARTITION BY user_id ORDER BY ordered_at
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ordered_at) AS order_seq,
    LAG(order_total) OVER (PARTITION BY user_id ORDER BY ordered_at) AS prev_total
FROM orders;

Supported window fns: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, NTILE, LAG/LEAD, FIRST_VALUE/LAST_VALUE/NTH_VALUE, plus any aggregate as a window function. Frame clauses: ROWS BETWEEN n PRECEDING AND m FOLLOWING, RANGE, GROUPS (SQL:2011, Postgres+Oracle).

Common Table Expressions (CTEs).

WITH active AS (
    SELECT * FROM users WHERE deleted_at IS NULL
)
SELECT * FROM active WHERE created_at > now() - INTERVAL '7 days';

Recursive CTE for graph/tree traversal:

WITH RECURSIVE descendants AS (
    SELECT id, parent_id, name FROM categories WHERE id = 1
    UNION ALL
    SELECT c.id, c.parent_id, c.name
      FROM categories c
      JOIN descendants d ON c.parent_id = d.id
)
SELECT * FROM descendants;

Termination is by the UNION ALL halting when the right side returns no rows.

Subqueries. Scalar (SELECT (SELECT max(x) FROM …)), table (in FROM), correlated (references outer query — runs per row, often optimizer-rewritten to a join). EXISTS (subquery) is usually the right form for “any matching row” — short-circuits.

Indexes (more in Advanced). CREATE INDEX … (col) for B-tree (default everywhere). Multi-column index (a, b) accelerates WHERE a = ?, WHERE a = ? AND b = ?, but not WHERE b = ? alone. Functional/expression indexes: CREATE INDEX … ON t (LOWER(email)) (Postgres/Oracle/SQL Server, not native MySQL — workaround via generated columns).

Transactions.

BEGIN;          -- or START TRANSACTION
… statements …
COMMIT;         -- or ROLLBACK

Savepoints: SAVEPOINT s1; … ROLLBACK TO SAVEPOINT s1;. Transaction isolation set per-tx: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED.

Procedural / stored programs.

  • PostgreSQL — PL/pgSQL (also pluggable PL/Python, PL/Perl, PL/Tcl, PL/v8).
  • SQL Server — T-SQL (CREATE PROCEDURE … AS BEGIN … END).
  • Oracle — PL/SQL (CREATE OR REPLACE PROCEDURE … IS BEGIN … END;).
  • MySQL — SQL/PSM dialect (DELIMITER //; CREATE PROCEDURE …).
  • SQLite — none built-in; use the host language. json_each, json_tree table-valued functions cover much of what triggers and functions provide elsewhere.

Advanced

Memory model & on-disk format. Each engine maintains a buffer pool / page cache (PostgreSQL: shared_buffers; MySQL/InnoDB: innodb_buffer_pool_size; SQL Server: max server memory; Oracle: SGA). Pages are 8 KiB (Postgres, SQL Server default), 16 KiB (InnoDB default), or configurable (SQLite default 4 KiB; Oracle 2/4/8/16/32 KiB by tablespace). Sort/hash spills to tempdb (SQL Server), temp tablespace (Oracle), temp_tablespaces (Postgres), sort_buffer_size overflow files (MySQL).

MVCC vs lock-based. Postgres, MySQL/InnoDB, Oracle, SQL Server (with Read Committed Snapshot) all use MVCC — readers see a snapshot, writers create new row versions, vacuum/garbage collection reclaims old versions later. Pre-RCSI SQL Server and pre-WAL SQLite use shared/exclusive locks — readers block writers and vice versa. MVCC implications: long-running transactions delay vacuum (Postgres “bloat”), repeatable-read works without read locks, write conflicts are detected at commit time under SERIALIZABLE.

Transaction isolation — the actual semantics, not the names. The standard defines four levels by what anomalies they prevent:

LevelDirty readNon-repeatable readPhantom readWrite skewSerialization anomaly
READ UNCOMMITTEDpossiblepossiblepossiblepossiblepossible
READ COMMITTEDpreventedpossiblepossiblepossiblepossible
REPEATABLE READpreventedpreventedpossible (per std)possiblepossible
SERIALIZABLEpreventedpreventedpreventedpreventedprevented

But every engine implements these levels differently:

  • PostgreSQL “REPEATABLE READ” is actually snapshot isolation — also prevents phantom reads (better than the standard). Doesn’t prevent write skew. Its SERIALIZABLE uses Serializable Snapshot Isolation (SSI) — true serializability with rollback on detected anomaly. Default: READ COMMITTED.
  • MySQL/InnoDB “REPEATABLE READ” (the default) uses snapshot reads + next-key locks for inserts in range-locked queries. Mostly prevents phantoms — not always. Its SERIALIZABLE adds shared locks on every read.
  • SQL Server default READ COMMITTED uses locks unless RCSI (Read Committed Snapshot Isolation) is on. SNAPSHOT isolation is opt-in per-database. SERIALIZABLE uses range locks.
  • Oracle has READ COMMITTED (default) and SERIALIZABLE; both snapshot-based. No support for READ UNCOMMITTED or REPEATABLE READ.
  • SQLite is SERIALIZABLE always (single-writer, journal/WAL provides snapshot reads in WAL mode).

MERGE / UPSERT. Standard MERGE (SQL:2003) exists in SQL Server, Oracle, and Postgres 15+ — but its semantics under concurrency are notoriously underspecified. SQL Server’s MERGE has known race-condition bugs (Aaron Bertrand’s Use Caution with SQL Server’s MERGE Statement); many guides recommend INSERT … ON CONFLICT (Postgres/SQLite) or INSERT … ON DUPLICATE KEY UPDATE (MySQL) instead.

-- Postgres / SQLite (UPSERT)
INSERT INTO counters (key, value) VALUES ('x', 1)
ON CONFLICT (key) DO UPDATE SET value = counters.value + EXCLUDED.value;
 
-- MySQL
INSERT INTO counters (`key`, value) VALUES ('x', 1)
ON DUPLICATE KEY UPDATE value = value + VALUES(value);
 
-- Standard MERGE
MERGE INTO counters c USING (SELECT 'x' AS k, 1 AS v) s ON (c.key = s.k)
  WHEN MATCHED THEN UPDATE SET value = c.value + s.v
  WHEN NOT MATCHED THEN INSERT (key, value) VALUES (s.k, s.v);

Lateral joins. LATERAL lets the right side of a join reference columns from the left. Standardized SQL:1999. Supported in Postgres, Oracle 12c+, SQL Server (CROSS APPLY/OUTER APPLY predates lateral), MySQL 8.0.14+.

-- "Top 3 orders per user"
SELECT u.id, o.*
  FROM users u
  CROSS JOIN LATERAL (
      SELECT * FROM orders WHERE user_id = u.id ORDER BY total DESC LIMIT 3
  ) o;

JSON / XML. Per dialect:

  • Postgres: JSONB (binary, indexable, GIN); operators ->, ->>, #>, @>, ?, jsonb_path_exists, jsonb_path_query. SQL:2016 path syntax in jsonb_path_* family.
  • MySQL: JSON (binary), ->, ->>, JSON_EXTRACT, JSON_TABLE for shredding to relational form.
  • SQL Server: historically nvarchar(max) + JSON_VALUE/JSON_QUERY/OPENJSON; SQL Server 2022+ added native JSON type and ISJSON, JSON_OBJECT, JSON_ARRAY, JSON_PATH_EXISTS. (What’s new in SQL Server 2022)
  • Oracle: JSON type (21c+) and JSON Relational Duality (23ai+) — query the same data as table or document.
  • SQLite: json1 extension built in (json_extract, json_each, json_tree).
  • XML: SQL/XML standard. Postgres xpath, SQL Server .value('xpath', 'sqltype'), Oracle XQuery.

Query planner inspection.

  • Postgres: EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <query> — runs query, returns chosen plan + actual timings + I/O.
  • MySQL: EXPLAIN ANALYZE (8.0.18+) or EXPLAIN FORMAT=JSON.
  • SQL Server: SET STATISTICS XML ON; SET STATISTICS IO ON; or “Display Actual Execution Plan” in SSMS; SHOWPLAN_XML.
  • Oracle: EXPLAIN PLAN FOR <query>; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); Real-time SQL Monitoring for live exec.
  • SQLite: EXPLAIN QUERY PLAN <query>.

Index types beyond B-tree.

TypeEngine(s)Use case
B-treeAllEquality, range, prefix on ordered columns
HashPostgres (memory only pre-10), MySQL Memory engineEquality only; no range. Rare; B-tree usually better
GIN (Generalized Inverted)PostgresInverted index for JSONB, arrays, full-text (tsvector); fast @> containment
GiST (Generalized Search Tree)PostgresGeometric, full-text, range types, kNN; foundation of PostGIS
BRIN (Block Range)PostgresAppend-only / time-series tables — small index, scans block ranges
SP-GiSTPostgresSpace-partitioned trees (quad-trees, k-d trees)
BitmapOracleLow-cardinality columns, OLAP joins
ColumnstoreSQL Server, Postgres (citus, hydra), MariaDBAnalytical scans across columns; massive compression
Full-textAll major (tsvector/MATCH AGAINST/CONTAINS/Oracle Text/SQLite FTS5)Token search, ranking

Procedural extensions deep dive.

  • PL/pgSQL (Postgres): CREATE FUNCTION … LANGUAGE plpgsql AS $$ DECLARE … BEGIN … END $$;. Cursors, exceptions (EXCEPTION WHEN unique_violation THEN …), composite types.
  • T-SQL (SQL Server): BEGIN TRY/CATCH, THROW, table-valued parameters, OUTPUT clause, dynamic SQL via sp_executesql @sql, @paramdef, @p1=… (parameterized — SQL injection-safe vs EXEC()), schema-bound functions for indexed views.
  • PL/SQL (Oracle): packages (header + body), %TYPE/%ROWTYPE for column-typed vars, bulk operations (FORALL/BULK COLLECT — order-of-magnitude faster than row-by-row), pipelined table functions, autonomous transactions (PRAGMA AUTONOMOUS_TRANSACTION).
  • MySQL stored programs: procedures, functions, triggers, events. Far less mature than Oracle/Postgres; many shops avoid in favor of application-side logic.

Foreign Data Wrappers / federation.

  • Postgres: postgres_fdw, oracle_fdw, mysql_fdw, file_fdw, multicorn (Python wrappers) — query external sources as tables.
  • SQL Server: OPENROWSET, linked servers, PolyBase (S3, Hadoop, parquet — extended in SQL Server 2022).
  • Oracle: Database Links, DBMS_HS_* (Heterogeneous Services).
  • MySQL: FEDERATED storage engine (limited).

Materialized views.

  • Postgres: CREATE MATERIALIZED VIEW … WITH DATA; REFRESH MATERIALIZED VIEW [CONCURRENTLY] mv; (CONCURRENTLY requires unique index, doesn’t lock readers).
  • Oracle: rich support — REFRESH FAST (incremental via materialized view logs), REFRESH COMPLETE, REFRESH FORCE, ON COMMIT vs ON DEMAND.
  • SQL Server: indexed views — strictly more constrained but maintained synchronously.
  • MySQL: none built-in. Triggers + summary tables, or external tools (Flexviews).
  • SQLite: none.

Partitioning.

  • Postgres (declarative since 10): PARTITION BY RANGE | LIST | HASH (col); subpartitioning. Partition pruning at planning time.
  • MySQL: RANGE, LIST, HASH, KEY, COLUMNS. Partition pruning. Limited (no FK across partitions).
  • SQL Server: partition functions + partition schemes; one filegroup per partition. SWITCH PARTITION for fast bulk load.
  • Oracle: richest — RANGE, LIST, HASH, INTERVAL (auto-create on insert), REFERENCE (FK-based), composite. Partition-wise joins.
  • SQLite: none (single file).

Sharding is application-side except for purpose-built distributed SQL (CockroachDB, YugabyteDB, Spanner, Vitess for MySQL, Citus for Postgres). Add consistent hashing and you have a horizontal-scale story.

Prepared statements & plan caching. Every engine caches plans for parameterized statements. Pitfalls: parameter sniffing — the first execution’s parameters shape the plan, subsequent calls with skewed params reuse a bad plan. Mitigations: SQL Server OPTION (RECOMPILE), OPTIMIZE FOR UNKNOWN, parameter-sensitive plan optimization (2022+); Oracle OPT_PARAM hints, adaptive plans (12c+); Postgres prepared-statement plan cache hits the “generic vs custom plan” decision at execution 6.

God mode

  • Recursive CTEs for graph traversal: above pattern applies to org charts, dependency graphs, transitive closures. With cycle detection clause (SQL:2016, Postgres 14+): CYCLE id SET is_cycle USING path_array.
  • Window-function recipes: ROW_NUMBER() OVER (PARTITION BY user ORDER BY ts DESC) = 1 for “latest per group”. SUM(x) OVER () for grand total alongside detail rows. Gaps-and-islands: ROW_NUMBER() OVER (ORDER BY ts) - ROW_NUMBER() OVER (PARTITION BY status ORDER BY ts) groups consecutive same-status rows.
  • FILTER clause (SQL:2003): COUNT(*) FILTER (WHERE active) instead of COUNT(CASE WHEN active THEN 1 END). Postgres + SQLite + Oracle 23ai. Cleaner for conditional aggregates.
  • GROUPING SETS, CUBE, ROLLUP — multi-level aggregations in one query (SQL:1999).
  • TABLESAMPLESELECT * FROM t TABLESAMPLE SYSTEM (1) reads ~1% of pages. Postgres, SQL Server, Oracle.
  • MERGE with caveats: prefer ON CONFLICT (Postgres/SQLite) where available; if you must MERGE (Oracle/SQL Server), wrap in serializable txn or use HOLDLOCK in T-SQL to avoid concurrency races.
  • JSONB indexing (Postgres): CREATE INDEX ON t USING gin (data jsonb_path_ops) — operator-class-tuned GIN index, smaller and faster for @> queries than the default.
  • Generated columns (GENERATED ALWAYS AS … STORED | VIRTUAL): synthesize values from other columns; STORED lets you index them (the workaround for MySQL’s lack of expression indexes).
  • LATERAL for top-N-per-group (above).
  • Arrays (Postgres): text[], int[][], unnest(), array_agg(), array @> array, GIN-indexable. Often a JSON-replacement for typed array data.
  • Range types (Postgres): int4range, tstzrange. Exclusion constraints prevent overlapping ranges (e.g., room booking conflicts).
  • PL/pgSQL RAISE EXCEPTION USING ERRCODE = '23505' — emit specific SQLSTATE so apps can distinguish constraint violations.
  • SELECT … FOR UPDATE [SKIP LOCKED | NOWAIT] — row-level pessimistic locking; SKIP LOCKED is the canonical work-queue pattern (each consumer claims rows the others didn’t lock). Standard since SQL:2003 (SKIP LOCKED is SQL:2016).
  • SELECT … FOR SHARE — share-lock for read-modify-write to prevent concurrent updates without blocking other readers.
  • EXPLAIN ANALYZE discipline: always include BUFFERS (Postgres) — knowing whether a plan hit cache vs disk is more important than its time. Compare estimated vs actual rows; orders-of-magnitude divergence = stale stats (ANALYZE/UPDATE STATISTICS).
  • Transaction-scoped temp tables (Postgres CREATE TEMP TABLE … ON COMMIT DROP, SQL Server #temp, Oracle CREATE GLOBAL TEMPORARY TABLE … ON COMMIT DELETE ROWS) for pipeline-style intermediate results.
  • COPY (Postgres) / BULK INSERT (SQL Server) / LOAD DATA INFILE (MySQL) / SQL*Loader (Oracle) — bypass row-by-row INSERT for massive throughput on bulk loads.
  • Database-side full-text: Postgres to_tsvector/to_tsquery + GIN, MySQL MATCH … AGAINST, SQL Server CONTAINS/FREETEXT, Oracle Text, SQLite FTS5. Often good enough to skip Elasticsearch for small corpora.
  • PostgreSQL’s pg_stat_* views — runtime telemetry: pg_stat_statements (per-query stats; needs extension), pg_stat_user_indexes (find unused indexes by idx_scan = 0), pg_stat_activity (running queries).

Idioms & style

  • Naming: snake_case for tables and columns is overwhelmingly the convention (Postgres folds unquoted identifiers to lowercase; mixed case requires quoting forever). Singular vs plural table names is a holy war — pick one and stick.
  • Always alias tables in joins (FROM users u JOIN orders o ON u.id = o.user_id).
  • SELECT * is for ad-hoc only. In code, name columns — schema changes silently break consumers otherwise.
  • COUNT(*) over COUNT(1) over COUNT(col). * and 1 are equivalent (the optimizer treats them identically on every modern engine); COUNT(col) skips NULLs.
  • Use IS [NOT] DISTINCT FROM for null-safe equality (SQL:1999; Postgres, SQL Server 2022+, Oracle 23ai). MySQL has <=>.
  • Prefer EXISTS over IN (subquery) for “does any row match” — short-circuits, NULL-safe.
  • Always parameterize — never string-concatenate values into SQL. SQL injection is solved with ? / $1 placeholders + driver binding.
  • Migrations are append-only. Edit-and-rerun breaks production. Use Flyway/Liquibase/Atlas/Alembic and treat each migration as immutable history.
  • One transaction per logical unit of work. Long-running transactions hurt MVCC bloat (Postgres) and lock duration (SQL Server pre-RCSI).
  • Index the foreign keys. Most engines don’t auto-index FK columns; DELETE of a parent row scans children otherwise. (Oracle is famous for the “unindexed FK = full-table-lock during delete” trap.)
  • Linter / formatter: sqlfluff (multi-dialect, configurable rules), pgFormatter (Postgres-leaning). Editor support via DataGrip / DBeaver / VS Code SQLTools.
  • Style guides: Mozilla’s SQL style, GitLab’s data team SQL style guide, Simon Holywell’s “SQL Style Guide” (https://www.sqlstyle.guide/).

Ecosystem

  • Migration tools: Flyway, Liquibase, Alembic (SQLAlchemy), Atlas, dbmate, sqitch, Skeema (MySQL), DACPAC/sqlpackage (SQL Server), Liquibase Pro.
  • Query builders / ORMs: SQLAlchemy (Python), jOOQ (JVM — generates type-safe DSL from schema), Diesel (Rust), Prisma (Node.js), Hibernate / EF Core, Doctrine (PHP), Ecto (Elixir).
  • Connection pooling: PgBouncer (Postgres), ProxySQL (MySQL), HikariCP (JVM-side, all dialects).
  • Replication / HA: Postgres logical & streaming replication, Patroni; MySQL GTID replication, InnoDB Cluster, Group Replication; SQL Server Always On Availability Groups; Oracle Data Guard / GoldenGate; Litestream / rqlite for SQLite.
  • Backup: pg_basebackup + WAL archiving (Postgres), mysqldump/xtrabackup (MySQL), VSS-based (SQL Server), RMAN (Oracle), .backup (SQLite).
  • Observability: pg_stat_statements, performance schema (MySQL), Query Store (SQL Server 2016+, default-on in 2022), AWR / Statspack (Oracle).
  • Distributed SQL: CockroachDB, YugabyteDB, TiDB, Spanner, Citus (Postgres extension for sharding), Vitess (MySQL).
  • Analytical / OLAP: Snowflake, BigQuery, Redshift, ClickHouse, DuckDB (in-process columnar SQL — increasingly important), Databricks SQL.
  • Notable users: every business with persistent state. Postgres at Apple, Instagram, Reddit, Datadog. MySQL at Facebook (huge fork: MySQL+RocksDB), Booking.com, Wikipedia. SQLite in every iOS/Android app, every browser, Skype, Adobe products, most embedded systems with structured storage. SQL Server across Microsoft, Stack Overflow (famously). Oracle in essentially every Fortune 500 ERP.

Gotchas

  • NULL is not equal to NULL. WHERE x = NULL matches nothing. Use IS NULL / IS DISTINCT FROM.
  • Three-valued logic. NOT (x IN (1, NULL)) is NULL (not true) when x ∉ {1, NULL}. NOT EXISTS is safer.
  • COUNT(col) skips NULLs; COUNT(*) doesn’t.
  • String comparison collation differences. 'a' = 'A' depends on collation. SQL Server is case-insensitive by default; Postgres is case-sensitive by default.
  • Ordering without ORDER BY is undefined. No engine guarantees insertion order; what worked yesterday breaks after VACUUM FULL or a plan change.
  • LIMIT without ORDER BY returns “any” N rows. Same trap.
  • Implicit type coercion footguns. MySQL: WHERE id = '1abc' matches id = 1 because of implicit cast. Postgres errors. Performance: WHERE indexed_col = 'literal' may not use the index if types differ (e.g., int column vs string literal).
  • SELECT * and column reordering. Adding a column to a table changes SELECT * order — breaks INSERT INTO … SELECT * and positional clients.
  • Auto-commit by default in most CLIs — wrap in explicit BEGIN; … COMMIT; or set \set AUTOCOMMIT off (psql).
  • Postgres folds unquoted identifiers to lowercase, the standard says they should be uppercase. Quoted identifiers preserve case forever. Mixing quoted/unquoted is misery.
  • MySQL’s || is OR, not concat unless sql_mode includes PIPES_AS_CONCAT. Use CONCAT().
  • MySQL’s “REPEATABLE READ” + write skew — a lost-update can occur if you read-then-write without FOR UPDATE.
  • SQL Server’s MERGE race conditions under concurrent writes — known footgun, see Bertrand’s article.
  • Oracle’s empty string is NULL. '' IS NULL is true. Different from every other dialect.
  • Oracle’s DATE includes time of day. No separate TIME type. TIMESTAMP is the modern choice.
  • SQLite’s “type affinity” isn’t real type checking — you can insert a string into an INTEGER column unless you declare STRICT (SQLite 3.37+). Quietly stores it.
  • MERGE is not always atomic in concurrency — even where it works, multiple concurrent MERGEs can both decide “not matched” then both INSERT. Use unique constraint + ON CONFLICT or wrap in SERIALIZABLE.
  • INSERT … RETURNING isn’t in MySQL or SQLite (SQLite added in 3.35). Standardize via LAST_INSERT_ID() (MySQL), last_insert_rowid() (SQLite), OUTPUT (T-SQL), RETURNING INTO :var (PL/SQL).
  • Foreign keys not enforced in SQLite by default. PRAGMA foreign_keys = ON; per connection — and the WAL/journal default doesn’t preserve this across reopens.
  • Time zones. TIMESTAMP without time zone is wall-clock; TIMESTAMP WITH TIME ZONE (Postgres) actually stores UTC and converts on read per session TZ. SQL Server has datetimeoffset (with TZ) and datetime2 (without). Never store DATETIME and “remember it’s UTC” — you’ll forget.
  • Floating point money. Use NUMERIC(p,s) / DECIMAL. Never FLOAT/DOUBLE.
  • OFFSET for pagination scales poorly (DB scans + discards offset rows). Use keyset pagination (WHERE id > $last_id ORDER BY id LIMIT n) for large data sets.
  • N+1 query problem is an ORM/app issue but shows up in DB load. EXPLAIN per query and look at the access pattern.

Citations