Graph Theory — Math Reference

1. At a glance

A graph G = (V, E) is a set of vertices V and a set of edges E ⊆ V × V that connect them. Graphs are the universal data structure for relational information. Once you see them, you see them everywhere:

Social networks — vertices are people, edges are friendships, follows, mentions (Facebook, LinkedIn, Twitter).
Transportation — intersections + roads, airports + flights, ports + shipping lanes.
Electrical grids — generators + substations + transmission lines; flows obey Kirchhoff’s laws on the underlying graph.
Computer networks — routers + links; routing protocols (OSPF, BGP) are shortest-path algorithms on these graphs.
Dependency graphs — build systems (Make, Bazel, Cargo), package managers (npm, pip, Maven), task schedulers.
Knowledge graphs — entities + typed relations (Wikidata, Google KG, schema.org).
Neural networks — themselves graphs; modern attention is a soft fully-connected graph operator (see [[Compute/transformer-architecture]]).
Molecules — atoms as vertices, bonds as edges; underlies all of cheminformatics + drug discovery.
Pose graphs — robot poses + relative measurements; backbone of SLAM ([[Robotics/slam]]).

Modern relevance: graph neural networks (GNNs) generalize CNNs from grids to arbitrary topologies; graph databases (Neo4j, JanusGraph, TigerGraph) treat relationships as first-class citizens; network science has matured into a quantitative discipline since the late 1990s (Barabási, Watts, Strogatz, Newman).

This note is structured for use: definitions, then algorithms with complexity + originator + year, then libraries, applications, and pitfalls.

2. Definitions

Vertex (node): an element of V. Edge (link, arc): an element of E.
Order |V|; size |E|. Often written n = |V|, m = |E|.
Directed graph (digraph): edges are ordered pairs (u, v). Undirected: unordered {u, v}.
Simple graph: no self-loops, at most one edge between any pair. Multigraph: parallel edges allowed.
Weighted graph: edges carry weights w: E → ℝ. Unweighted: implicit unit weight.
Cyclic / acyclic: contains a cycle or not. A DAG (Directed Acyclic Graph) is the workhorse for scheduling, version histories, dependency resolution.
Connected (undirected): every pair of vertices joined by a path. Strongly connected (directed): path u → v and v → u for every pair.
Connected component: maximal connected subgraph.
Planar: drawable in the plane without crossings (see Kuratowski, §12).
Bipartite: V partitions into V₁ ⊎ V₂ with edges only between parts; equivalently, no odd cycle (König 1936).
Regular: every vertex has the same degree. k-regular if degree = k.
Complete graph K_n: every pair of distinct vertices connected; |E| = n(n−1)/2.
Complete bipartite K_{m,n}: every vertex in one part joined to every vertex in the other.
Complement Ḡ: same V, edges precisely the non-edges of G.
Subgraph H ⊆ G: V(H) ⊆ V(G), E(H) ⊆ E(G). Induced subgraph G[S]: take S ⊆ V and all edges of G with both endpoints in S.
Path / walk / trail / cycle: a walk allows repeats, a trail no repeated edge, a path no repeated vertex, a cycle is a closed path.
Distance d(u, v): length (or weight) of a shortest u–v path.

3. Representations

The right representation depends on density and access pattern.

Adjacency matrix A ∈ ℝ^{n×n}: A[i][j] = w(i, j) (or 1 for unweighted, 0 for absent). O(n²) space — fine for dense or small graphs; O(1) edge-existence test; matrix powers count walks. Symmetric for undirected graphs. Spectrum of A drives spectral methods (§13).
Adjacency list: for each v, a list of (neighbor, weight) pairs. O(n + m) space — the right choice for sparse graphs, which most real-world graphs are. Iteration over neighbors is O(deg(v)). Used by every serious library (NetworkX, igraph, SNAP, cuGraph).
Edge list: just an array of (u, v, w) triples. O(m) space. Natural for streaming, distributed processing (Pregel, GraphX), and external-memory algorithms. Less convenient for traversal.
CSR / CSC (Compressed Sparse Row / Column): two flat arrays — row_ptr[0..n] and col_idx[0..m], optionally val[0..m]. Adjacency list in cache-friendly layout. Standard form for GPU graph processing (cuGraph, Gunrock) and for SpMV-based algorithms (e.g. power iteration for PageRank).

Trade-off table:

Op	Adj matrix	Adj list	Edge list	CSR
Edge exists?	O(1)	O(deg)	O(m)	O(deg)
Iterate neighbors	O(n)	O(deg)	O(m)	O(deg)
Space	O(n²)	O(n+m)	O(m)	O(n+m)
GPU-friendly	yes (dense)	no	partial	yes

4. Degree

Undirected: deg(v) = number of edges incident to v.
Directed: in-degree deg⁻(v), out-degree deg⁺(v).

Handshake lemma: Σ_{v ∈ V} deg(v) = 2 |E|. Each edge contributes 1 to each endpoint. Corollary: the number of odd-degree vertices is even.

Degree distribution P(k) = fraction of vertices with degree k.

Uniform / regular: every vertex same degree. K_n, hypercubes, lattices.
Poisson / binomial: Erdős-Rényi G(n, p) — degrees ≈ Binomial(n−1, p), concentrated around the mean np.
Power-law / scale-free: P(k) ∼ k^{−γ}, typically 2 < γ < 3. Empirically observed in the WWW, citation networks, protein interaction networks, the Internet AS graph. Barabási-Albert model (Barabási + Albert 1999, Science) — preferential attachment (“rich get richer”) generates power-law graphs with γ = 3. Heavy tails mean hubs dominate, which has consequences for percolation, epidemic spreading, and PageRank.

5. Traversals

Two foundational traversals, both O(|V| + |E|) on adjacency-list representations.

BFS — Breadth-First Search

Explore layer by layer using a FIFO queue. Discovers vertices in order of unweighted distance from the source.

BFS(G, s):
  dist[s] = 0, mark s visited, queue.push(s)
  while queue not empty:
    u = queue.pop()
    for v in adj[u]:
      if not visited[v]:
        visited[v] = true
        dist[v] = dist[u] + 1
        parent[v] = u
        queue.push(v)

Properties: O(|V| + |E|); gives shortest unweighted paths; defines the BFS tree and a level structure; used in bipartiteness checking (two-color the levels), 2-SAT (implication graph), and as the inner loop of Edmonds-Karp max-flow (§8).

DFS — Depth-First Search

Go as deep as possible before backtracking. Recursion or explicit stack.

DFS(G, u):
  pre[u] = time++; visited[u] = true
  for v in adj[u]:
    if not visited[v]:
      DFS(G, v)
  post[u] = time++

Properties: O(|V| + |E|); produces pre / post-order times that classify edges as tree, back, forward, cross; detects cycles (a back edge in DFS forest = cycle); finds bridges and articulation points in one pass (Tarjan 1972 low-link); foundation for topological sort, SCC (§10), and planarity testing.

Topological sort

Linear ordering of a DAG such that every edge u → v has u before v. Two standard algorithms:

Kahn 1962 — repeatedly remove a source (in-degree 0), append to output, decrement in-degrees of neighbors. O(|V| + |E|). Natural for parallel scheduling.
DFS-based — DFS the graph, emit vertices in reverse post-order. Cycle detected if a back edge is found.

Applications: build systems (make, Bazel, Nx), spreadsheet recalculation, course prerequisites, instruction scheduling, neural-network compute graphs.

6. Shortest paths

The most thoroughly studied class of graph problem. Single-source vs all-pairs; positive vs negative weights; static vs dynamic.

Dijkstra (Dijkstra 1959, Numerische Mathematik)

Single-source shortest paths, non-negative weights.

Dijkstra(G, s):
  dist[s] = 0; dist[v] = ∞ otherwise
  PQ.push(s, 0)
  while PQ not empty:
    (u, d) = PQ.extract_min()
    if d > dist[u]: continue            # stale
    for (v, w) in adj[u]:
      if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        parent[v] = u
        PQ.push(v, dist[v])

Complexity:

Binary heap: O((|V| + |E|) log |V|).
Fibonacci heap (Fredman + Tarjan 1987): O(|E| + |V| log |V|) — better asymptotically, slower constants.
Dense graphs with array-based PQ: O(|V|²).

Correctness requires non-negative weights — Dijkstra “fixes” vertices as it extracts them; a negative weight could later make the path shorter.

Bellman-Ford (Bellman 1958; Ford 1956)

Single-source, handles negative weights. Relax all edges |V| − 1 times.

Bellman-Ford(G, s):
  dist[s] = 0; dist[v] = ∞ otherwise
  for i = 1 .. |V|-1:
    for (u, v, w) in E:
      if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
  # one more pass: any relaxation means a negative cycle is reachable

Complexity: O(|V| · |E|). Detects negative cycles reachable from s. Used in distance-vector routing (RIP) and as the reweighting subroutine of Johnson’s algorithm.

Floyd-Warshall (Floyd 1962; Warshall 1962; Roy 1959)

All-pairs shortest paths, dynamic-programming over intermediate vertices.

Floyd-Warshall(G):
  dist[i][j] = w(i, j) or ∞; dist[i][i] = 0
  for k = 1 .. n:
    for i = 1 .. n:
      for j = 1 .. n:
        if dist[i][k] + dist[k][j] < dist[i][j]:
          dist[i][j] = dist[i][k] + dist[k][j]

O(|V|³) time, O(|V|²) space. Conceptually simple; cache-friendly; trivially parallel over (i, j) for fixed k. Negative cycles detected by a negative dist[i][i].

Johnson 1977

All-pairs shortest paths on sparse graphs with negative weights. Run Bellman-Ford from a dummy source to compute potentials h(v), reweight edges w’(u, v) = w(u, v) + h(u) − h(v) (now non-negative), then Dijkstra from every vertex. Total O(|V|·|E| + |V|² log |V|) — beats Floyd-Warshall when m ≪ n².

A* (Hart + Nilsson + Raphael 1968, IEEE Trans. Sys. Sci. Cyb.)

Heuristic-guided shortest path. Same skeleton as Dijkstra, priority = g(v) + h(v) where h is an admissible (never overestimates) heuristic. Optimal if h is admissible; optimal and visits the fewest vertices among admissible algorithms if h is consistent (monotone). Workhorse of robot path planning and game AI; see [[Robotics/Tier3/path-planning-algorithms]].

Contraction Hierarchies + Hub Labeling

For road networks (millions of nodes, billions of queries) Dijkstra is too slow. Contraction Hierarchies (Geisberger + Sanders + Schultes + Delling 2008) pre-compute a hierarchy of “important” nodes and shortcut edges. Hub Labeling (Abraham + Delling + Goldberg + Werneck 2011) precomputes a small label per node so a query is a single list intersection. Used in production by OSRM, Valhalla, GraphHopper, and (under the hood) by Google Maps and Bing Maps.

7. Minimum Spanning Tree (MST)

Given a connected, undirected, weighted graph, find an acyclic subgraph that spans all vertices and minimizes total weight. Always has n − 1 edges. Cut property: the lightest edge crossing any cut belongs to some MST. Cycle property: the heaviest edge in any cycle is not in any MST.

Kruskal 1956

Sort edges by weight; greedily add each edge if it joins two different components (check via Union-Find / Disjoint Set Union with union-by-rank + path compression, near-O(1) amortized). Total O(|E| log |V|). Beautifully simple, ideal for sparse graphs and external-memory variants.

Prim 1957 (rediscovered; Jarník 1930)

Grow the tree from an arbitrary root. Maintain a min-heap keyed on the lightest edge into the current tree. O(|E| log |V|) with binary heap; O(|E| + |V| log |V|) with Fibonacci heap. Conceptually similar to Dijkstra (same skeleton, different relaxation).

Borůvka 1926

Each round, every component picks its lightest outgoing edge. All chosen edges added simultaneously, components merged. Number of components halves each round so O(log |V|) rounds, each O(|E|), total O(|E| log |V|). Parallel-friendly — the round structure maps cleanly to GPU + distributed (the basis of modern parallel MST: cuGraph, GraphBLAS).

Applications: network design (laying cable, pipelines), clustering (single-linkage = MST cut), image segmentation, approximation algorithms (TSP ≤ 2 · MST), phylogenetic trees.

8. Network flow

A flow network is a directed graph with edge capacities c(u, v) ≥ 0, source s, sink t. A flow f respects capacity (f(u, v) ≤ c(u, v)) and conservation (in = out at every non-{s,t} vertex). Maximize the total flow out of s.

Max-flow / Min-cut theorem (Ford + Fulkerson 1956)

The maximum s–t flow equals the minimum s–t cut capacity. A foundational LP-duality result, foreshadowing the more general theory.

Ford-Fulkerson 1956

Repeatedly find an augmenting path in the residual graph; augment by its bottleneck. Terminates with integer capacities. Complexity bounded only by the max flow value F: O(|E| · F) — pathological on rational/irrational capacities.

Edmonds-Karp 1972

Always pick the shortest augmenting path (BFS in the residual graph). Complexity O(|V| · |E|²) regardless of capacities. Simple and robust.

Dinic’s algorithm (Dinitz 1970)

Build a layered graph by BFS; find a blocking flow in the layered graph; repeat. O(|V|² · |E|) general; O(|E| · √|V|) on unit-capacity graphs (used for bipartite matching, §9); O(|E| · √|E|) on graphs from bipartite matching.

Push-Relabel (Goldberg + Tarjan 1986/1988)

Local operations: push excess flow along admissible edges; relabel vertex heights when stuck. O(|V|² · √|E|) with FIFO selection; the highest-label variant gives O(|V|² · √|E|) in practice the fastest general-purpose algorithm. Used in industrial-strength solvers.

Min-cost max-flow

Given costs as well as capacities, find a max flow of minimum total cost. Solved by successive shortest paths (SSP) with Bellman-Ford or Dijkstra-with-potentials; by cycle-canceling; or as a linear program. Backbone of the transportation problem and many assignment formulations.

Bipartite matching as max-flow

Add a super-source s connected to all left vertices, a super-sink t connected from all right vertices, all unit-capacity. Max flow = max matching.

Applications

Assignment (workers to jobs) via min-cost max-flow.
Transportation (supply to demand).
Project selection (Picard 1976 — max-weight closure via min-cut).
Image segmentation via graph cuts (Boykov + Kolmogorov 2004, PAMI) — pixels as nodes, smoothness + data terms encoded as capacities; the s–t min-cut gives the optimal binary segmentation. Used in foreground extraction, stereo, medical imaging.
Network reliability, baseball elimination, scheduling with deadlines.

9. Matching

A matching M ⊆ E pairs up vertices, each vertex in at most one edge. Maximum cardinality matching; maximum weight matching; perfect matching (covers every vertex).

Bipartite

Hopcroft-Karp 1973 — find multiple augmenting paths per phase. O(|E| · √|V|). Standard for bipartite max-matching.
Hungarian / Kuhn-Munkres (Kuhn 1955, Munkres 1957) — maximum-weight perfect matching in a bipartite graph. O(n³). The textbook algorithm for the assignment problem.

General (non-bipartite)

Edmonds’ blossom algorithm 1965 — handles odd cycles by contracting “blossoms”; max-cardinality in O(|V|³) (later sharpened to O(|E| · √|V|) by Micali + Vazirani 1980).
Maximum-weight general matching — also Edmonds 1965; O(|V|³) or O(|V| · |E| · α(|E|, |V|)) modern implementations.

Stable matching — Gale-Shapley 1962

Each side has a preference ranking; output a matching with no blocking pair (no two people who’d both prefer each other to their current partner). Always exists, can be found in O(n²). The proposing side gets its optimal stable matching. Roth + Shapley shared the 2012 Nobel in Economics for theory + applications (medical residency matching via NRMP since 1952; school choice in Boston, NYC; kidney exchange).

10. Connectivity

Strongly Connected Components (SCC)

Maximal subsets of a directed graph where every pair is mutually reachable.

Tarjan 1972 — one DFS pass with a stack and low-link values. O(|V| + |E|).
Kosaraju 1978 (also attributed to Sharir 1981) — DFS to compute finishing times, transpose the graph, DFS in decreasing finish-time order. O(|V| + |E|). Simpler to teach.
Path-based (Cheriyan + Mehlhorn 1996; Gabow 2000).

The condensation of G (one node per SCC) is always a DAG — useful for 2-SAT (Aspvall + Plass + Tarjan 1979): satisfiable iff no variable and its negation lie in the same SCC of the implication graph.

Biconnected components (undirected)

A graph is biconnected if it has no articulation point. Decomposition into biconnected components via Hopcroft + Tarjan 1973, one DFS pass with low-link.

Articulation point (cut vertex): removing it disconnects the graph.
Bridge (cut edge): removing it disconnects.

Used in network reliability, redundancy analysis, and as a subroutine in planarity testing.

k-connectivity + Menger’s theorem (Menger 1927)

A graph is k-vertex-connected if removing any k − 1 vertices leaves it connected. Menger 1927: max number of vertex-disjoint u–v paths = min vertex cut separating u from v. The undirected, edge analog gives min cut = max edge-disjoint paths — a precursor to Ford-Fulkerson.

11. Trees

A tree is a connected acyclic undirected graph. Equivalent characterizations: n − 1 edges + connected; n − 1 edges + acyclic; unique path between every pair. Rooted trees give parent/child structure.

Tree DP

Many problems decompose over subtrees. Pseudocode template:

dp(v, parent):
  acc = base_case(v)
  for c in children(v, parent):
    sub = dp(c, v)
    acc = combine(acc, sub)
  return acc

Computes subtree sums/sizes, diameter (longest path — two DFS passes or one tree-DP), centroid, rerooting (compute answers from every root in O(n) total).

LCA — Lowest Common Ancestor

The deepest common ancestor of u and v in a rooted tree. Algorithms:

Binary lifting — precompute up[v][k] = 2^k-th ancestor; O(n log n) prep, O(log n) per query.
Euler tour + RMQ — flatten the tree; LCA = position of min depth in the range; O(n) prep with sparse table, O(1) per query.
Tarjan 1979 offline — union-find, O((n + q) α(n)) for n nodes + q queries.

Heavy-Light Decomposition (HLD) (Sleator + Tarjan 1983, conceptually)

Partition edges into “heavy” (to largest subtree) + “light” (rest). Each root-to-leaf path crosses O(log n) light edges, so any path is the concatenation of O(log n) contiguous heavy chains. Combined with a segment tree on each chain, supports path queries / updates in O(log² n).

Euler tour + segment tree

Flatten subtree to a contiguous range via DFS; subtree update / query becomes range update / query on the array. Standard tournament-tree technique for subtree aggregates.

Centroid decomposition

Recursively split the tree at its centroid (a vertex whose removal leaves subtrees of size ≤ n/2). Recursion depth O(log n). Solves many path-counting and divide-and-conquer problems on trees in O(n log n) or O(n log² n).

12. Planar graphs

A graph is planar if it can be drawn in the plane with no edge crossings. Planar graphs have rich structure.

Kuratowski’s theorem (Kuratowski 1930)

A graph is non-planar iff it contains a subdivision of K_5 or K_{3,3}. Equivalent formulation by Wagner 1937: iff it has K_5 or K_{3,3} as a minor.

Four-color theorem (Appel + Haken 1976; Robertson + Sanders + Seymour + Thomas 1997 simpler proof)

Every planar graph can be properly vertex-colored with at most 4 colors. First major theorem to require computer assistance in its proof (1,936 reducible configurations).

Euler’s formula (Euler 1758)

For a connected planar graph drawn without crossings: V − E + F = 2, where F counts faces (including the unbounded outer face). Corollaries: in a simple planar graph, |E| ≤ 3|V| − 6 and average degree < 6; every planar graph has a vertex of degree ≤ 5 (basis of a greedy 6-coloring).

Linear-time planarity testing (Hopcroft + Tarjan 1974) and planar embedding (Boyer + Myrvold 2004) exist. Planarity enables algorithms strictly faster than the general case (separator theorem — Lipton + Tarjan 1979 — gives O(√n) separators, hence subexponential algorithms for many NP-hard problems on planar inputs).

13. Spectral graph theory

Studies graphs through eigenvalues/eigenvectors of matrices built from them.

Adjacency matrix A

Spectrum of A: eigenvalues λ₁ ≥ λ₂ ≥ … ≥ λₙ. For undirected graphs, A is symmetric → real eigenvalues + orthonormal eigenvectors.

[A^k]i counts walks of length k from i to j.
λ₁ (Perron-Frobenius) is the spectral radius; bounded by max degree; equal to it iff regular.
Strongly regular graphs have just three distinct adjacency eigenvalues.

Laplacian L

L = D − A, where D = diag(deg(v)). Properties:

Symmetric, positive semidefinite.
L · 𝟙 = 0, so 0 is always an eigenvalue.
Multiplicity of 0 = number of connected components.
Quadratic form: x^T L x = Σ_{(u,v) ∈ E} (x_u − x_v)² — a measure of “smoothness” of x on G.

Normalized Laplacian

L_norm = I − D^{−1/2} · A · D^{−1/2}. Eigenvalues in [0, 2]. Better behaved on graphs with heterogeneous degrees; underlies modern spectral clustering and many GNN constructions.

Fiedler vector (Fiedler 1973)

The eigenvector v₁ corresponding to the second-smallest eigenvalue λ₁ (the algebraic connectivity). Sign of components partitions the graph into two roughly equal pieces with few crossing edges — basis of spectral bisection and graph partitioning (METIS, Scotch).

Cheeger inequality

Relates λ₁ of the normalized Laplacian to the conductance φ(G) (min cut to volume ratio): φ(G)² / 2 ≤ λ₁ ≤ 2 φ(G). Means the spectral gap controls mixing time of random walks — small gap, slow mixing, bottlenecks.

PageRank (Page + Brin 1998 — Stanford TR)

Stationary distribution of a random walk on the web graph, with damping (teleport with probability 1 − α to a uniform vertex). Solves π = α · M · π + (1 − α) · (1/n) · 𝟙, where M is the column-stochastic adjacency matrix. Computed via power iteration; or as the dominant eigenvector of α · M + (1 − α) · (1/n) · 𝟙·𝟙^T. Foundation of Google search 1998–2010s; still used as a building block.

HITS — Hubs and Authorities (Kleinberg 1999, J. ACM)

Two scores per vertex: hub h_v (links to good authorities) and authority a_v (linked by good hubs). Coupled power iteration: a = A^T · h, h = A · a. Converges to leading singular vectors of A. Used in topic-specific search, citation analysis.

Spectral clustering (Shi + Malik 2000 normalized cuts; Ng + Jordan + Weiss 2002)

Build similarity graph (e.g. k-NN or ε-ball over data points).
Compute (normalized) Laplacian.
Take the k eigenvectors of smallest non-zero eigenvalues; stack as columns into an n × k matrix.
k-means on rows.

Works when clusters are non-convex (where k-means in the original space fails). Linked to graph partitioning, normalized cuts, and (via Cheeger) to conductance.

Graph embeddings + GNNs

Modern bridge between graphs and machine learning. Each maps vertices to vectors so that nearby (in graph) → nearby (in embedding).

DeepWalk (Perozzi + Al-Rfou + Skiena 2014, KDD) — generate random walks, treat them as “sentences”, train Skip-Gram (word2vec).
node2vec (Grover + Leskovec 2016, KDD) — biased random walks with BFS/DFS-like parameters p, q.
GraphSAGE (Hamilton + Ying + Leskovec 2017, NeurIPS) — inductive: learn an aggregator over neighbors, generalizes to unseen nodes.
GCN — Graph Convolutional Network (Kipf + Welling 2017, ICLR) — H^{(l+1)} = σ(D̃^{−1/2} · Ã · D̃^{−1/2} · H^{(l)} · W^{(l)}), a first-order localized spectral filter. The catalyst paper for the GNN era.
GAT — Graph Attention Network (Veličković et al. 2018, ICLR) — learned per-edge attention weights.
Message Passing Neural Networks (Gilmer et al. 2017, ICML) — unifying framework: edges send messages, vertices aggregate and update.

14. NP-hard graph problems

Many natural graph problems are NP-hard. Below: problem, structure, what’s known.

Hamiltonian cycle / path — visit each vertex exactly once. NP-complete (Karp 1972). Polynomial for special classes (interval graphs, planar bipartite).
Traveling Salesman Problem (TSP) — Hamiltonian cycle of minimum weight, complete weighted graph. NP-hard. Christofides 1976 gives a 3/2-approximation for metric TSP (improved to 3/2 − ε by Karlin + Klein + Oveis Gharan 2021). Concorde solver routinely handles n in the thousands optimally.
Graph coloring — minimum number of colors so adjacent vertices differ. Chromatic number χ(G) is NP-hard to compute or approximate within n^{1−ε}. Bounded by Δ + 1 (greedy); Δ for non-K_{Δ+1} graphs (Brooks 1941); 4 for planar (Four-color theorem).
Maximum clique / Maximum independent set — complementary via Ḡ. NP-hard, hard to approximate within n^{1−ε}. Tractable on perfect graphs (Grötschel + Lovász + Schrijver 1981 via the ellipsoid method).
Minimum vertex cover — cover every edge with a minimum vertex set. NP-hard but admits a simple 2-approximation (take both endpoints of a maximal matching). 2 − ε is UGC-hard (Khot + Regev 2008).
Steiner tree — minimum-weight subtree spanning a given subset of “terminals”. NP-hard; best known approximation 1.39 (Byrka et al. 2010).
Graph isomorphism (GI) — is G ≅ H? Neither known to be in P nor known to be NP-hard. Babai 2016 gave a quasi-polynomial algorithm 2^{O((log n)^c)}. Tractable in practice (nauty/bliss).
Feedback vertex set / arc set, multiway cut, densest subgraph (poly), densest k-subgraph (NP-hard), max-cut (NP-hard, 0.878-approx Goemans + Williamson 1995 via SDP).

The standard reference is Garey + Johnson “Computers and Intractability” 1979 for the original NP-hardness catalog.

15. Random graphs

Probabilistic models of graphs — a workbench for studying typical structure.

Erdős-Rényi G(n, p) (Erdős + Rényi 1959, 1960)

n labelled vertices; each of the C(n, 2) edges present independently with probability p.

Expected edges: p · C(n, 2).
Degrees: Binomial(n − 1, p) → Poisson(np) in the n → ∞, np → const regime.
Phase transition at p_c = 1/n: below, components are O(log n); above, a unique giant component of size Θ(n) emerges.
Connectivity threshold: p ~ ln n / n.
Diameter ~ ln n / ln(np) in the supercritical regime.

The cleanest model — but not a good fit for real networks (Poisson degrees, no clustering).

Configuration model (Bollobás 1980; Molloy + Reed 1995)

Specify a degree sequence (d₁, …, d_n); choose a uniformly random graph with that sequence (via half-edge matching). Lets you study what’s caused by degrees alone vs by other structure.

Watts-Strogatz small-world (Watts + Strogatz 1998, Nature)

Start with a ring lattice (high clustering); rewire each edge with probability p (introduces shortcuts → small diameter). At small p, captures the “small-world” empirical phenomenon: high clustering coexisting with short paths. Explains why six degrees of separation works.

Barabási-Albert preferential attachment (Barabási + Albert 1999, Science)

Grow the graph one vertex at a time; each new vertex attaches m edges to existing vertices, with probability proportional to current degree. Generates scale-free degree distribution P(k) ∼ k^{−3}. The model that put network science on the map.

Other notable models: stochastic block model (SBM) (Holland + Laskey + Leinhardt 1983) for community structure; exponential random graphs (ERGM) for inference; hyperbolic random graphs (Krioukov et al. 2010) reproducing degree, clustering, navigability simultaneously.

16. Graph databases + tools

The graph stack as of 2026.

Native graph databases

Neo4j (Webber + Eifrem, 2007; open source 2010) — property graph model + Cypher query language. ACID, large ecosystem, the de facto standard for transactional graph workloads. Cypher has been standardized as GQL — ISO/IEC 39075 (2024).
JanusGraph (continuation of TitanDB) — distributed property graph over Cassandra / HBase / ScyllaDB; Apache TinkerPop + Gremlin traversal language.
Amazon Neptune — managed; supports both Gremlin (property graph) and SPARQL (RDF).
ArangoDB — multi-model (document + graph + key-value), AQL query language.
Memgraph — in-memory, Cypher-compatible, real-time analytics focus.
TigerGraph — distributed, analytic, GSQL query language, deep-link analytics.
Dgraph — distributed, GraphQL-native.
NebulaGraph — open-source distributed graph DB from Vesoft.
TerminusDB — git-style versioned knowledge graph.

Triplestores / RDF

Apache Jena, Virtuoso, Blazegraph, GraphDB (Ontotext), AnzoGraph, Stardog. SPARQL queries; backbone of the semantic web, Wikidata.

In-memory algorithm libraries

NetworkX (Hagberg + Schult + Swart 2008) — Python, reference implementations, small-to-medium graphs. Algorithmically comprehensive, performance modest.
igraph (Csárdi + Nepusz 2006) — C core with R, Python, Mathematica bindings; fast, suitable for medium-large graphs.
graph-tool (Peixoto 2014) — Python interface, C++/Boost core; very fast; strong on statistical inference (SBM).
SNAP (Leskovec + Sosič, Stanford) — C++ + Python; designed for billion-edge graphs on a single machine.
NetworKit — parallel algorithms in C++/OpenMP with Python bindings.
Boost Graph Library (BGL) — C++ template-generic.
LEMON (Library for Efficient Modeling + Optimization in Networks) — C++.
GraphBLAS + SuiteSparse:GraphBLAS (Davis) — linear-algebra formulation of graph algorithms; the basis of LAGraph + RedisGraph predecessors.

Distributed + GPU

cuGraph (NVIDIA RAPIDS) — GPU-accelerated graph algorithms; PageRank, BFS, Louvain, etc.
Gunrock (UC Davis) — GPU graph processing framework.
Apache Spark GraphX / GraphFrames — distributed graph processing.
Pregel (Malewicz et al. 2010 Google) — vertex-centric BSP model; inspired Giraph, GraphLab, Flink Gelly.
TigerGraph + Neo4j Fabric — distributed query planning.

Visualization

Gephi (Bastian + Heymann + Jacomy 2009) — interactive desktop, ForceAtlas2 layout.
Cytoscape — biological networks.
yEd, Graphviz (dot / neato / fdp), vis.js / vis-network, D3.js force layout, sigma.js, Cosmograph (GPU).

17. GNN libraries

Production-grade frameworks for graph neural networks.

PyTorch Geometric (PyG) (Fey + Lenssen 2019, ICLR Workshop) — message-passing API, dozens of GNN layers (GCN, GAT, GIN, GraphSAGE, …), datasets, samplers. The dominant academic + many industrial codebases.
DGL — Deep Graph Library (Wang et al. 2019) — Amazon + NYU; PyTorch + MXNet + TensorFlow backends; strong on large-scale + heterogeneous graphs.
TensorFlow GNN (TF-GNN) (Ferludin et al. 2022, Google) — heterogeneous graphs first-class; production deployment via TF Serving.
Jraph (DeepMind 2020) — JAX-based GNN library.
Graphormer (Ying et al. 2021, Microsoft, NeurIPS) — Transformer adapted to graphs with structural/positional encodings; won OGB-LSC 2021 PCQM4M-LSC.
PyG-Lib / TorchSparse / Sparse Tensor backends — sparse-tensor primitives for scalability.

These are typically paired with the Open Graph Benchmark (OGB) (Hu et al. 2020) for standardized evaluation, and with samplers (neighbor sampling, GraphSAINT, ClusterGCN) for training on graphs that don’t fit in GPU memory.

18. Applications

A short tour, not exhaustive.

Social networks — friend / follower / mention graphs; recommendation (Facebook, LinkedIn People You May Know, Twitter Who To Follow, Pinterest Pixie, TikTok For You) all rely on graph signals.
Knowledge graphs — Google KG (~2012), Wikidata, schema.org, Microsoft Academic / Concept, DBpedia, YAGO. Power answer boxes, entity resolution, and now retrieval for grounded LLM systems.
PageRank + search ranking (§13); also link analysis in fraud, spam, and trust networks.
Routing — Google Maps, Apple Maps, Bing Maps; open stacks OSRM, GraphHopper, Valhalla. Contraction hierarchies + hub labeling under the hood.
Dependency analysis — build systems (Make, Ninja, Bazel, Nx, Turborepo), package managers (npm, pip, cargo, Maven, NuGet), language servers, security CVE propagation.
Compiler optimization — control-flow graph, call graph, data-flow lattice; dominator trees, SSA form, register allocation as graph coloring.
Network analysis — Wikipedia link graph; academic citation networks (Microsoft Academic, OpenAlex); biology protein-protein interaction (STRING-db); brain connectomes; transportation networks; financial transaction graphs (AML, fraud rings).
Recommendation — user-item bipartite graphs; collaborative filtering ↔ matrix factorization ↔ link prediction; modern systems (Pinterest PinSage, Uber Eats, Amazon) train GNNs on the interaction graph.
Molecular property prediction — molecules as graphs (atoms + bonds); GNNs predict toxicity, solubility, binding affinity. Open Catalyst Project, DeepMind GNoME (2023, materials discovery).
SLAM pose graphs — robot poses + relative measurements; nonlinear least squares on the graph ([[Robotics/slam]], [[Robotics/Tier3/slam-algorithms]]).
Circuit design — netlists are hypergraphs; placement + routing solve graph partitioning + Steiner tree problems at industrial scale (Cadence, Synopsys).
Bioinformatics — de Bruijn graphs in genome assembly; gene regulatory networks; phylogenetic trees as MSTs on distance matrices.
Operations research — vehicle routing, scheduling, supply chain — all graph optimization.

19. Pitfalls

Common bugs and traps when implementing graph algorithms.

In-degree vs out-degree confusion in directed graphs — always be explicit which adjacency you’re iterating.
Off-by-one in 0-indexed vs 1-indexed vertex numbering when mixing libraries / contest code.
Adjacency matrix on a sparse graph — O(V²) BFS instead of O(V + E). On a graph with 10⁶ vertices and 10⁷ edges, the matrix would take 10¹² entries — infeasible. Always use adjacency lists for sparse graphs.
Forgetting disconnected components — your BFS / DFS / SCC scan should loop over all vertices, not just one start.
Negative cycles silently break Dijkstra — no error, just wrong distances. Use Bellman-Ford or detect negativity upfront.
Visited check placement — adding a vertex to BFS queue but only marking visited when popping leads to duplicates; mark when pushing.
Union-Find without path compression / union by rank — degenerates to O(n) per op. Always implement both optimizations.
Heap with stale entries in Dijkstra — must guard with if d > dist[u]: continue when popping.
Recursive DFS on large graphs — Python / Node recursion limit hits at ~10⁴ depth. Use iterative DFS with an explicit stack for large graphs.
Symmetric storage for undirected — remember each edge appears in both adjacency lists; |E| stored entries = 2m.
Sub-optimal MST detection from wrong cuts — when implementing Borůvka, two components may both pick the same edge — handle ties consistently (e.g. by edge index).
GNN over-smoothing (Li + Han + Wu 2018; Oono + Suzuki 2020) — at depth, node representations collapse to a constant. Mitigations: residual / skip connections (JKNet, GCNII), normalization (PairNorm), DropEdge, attention.
Over-squashing (Alon + Yahav 2021) — exponentially many neighbors compressed into a fixed-dim vector; addressed by graph rewiring (Topping et al. 2022), virtual nodes, or graph transformers.
Random walk + DeepWalk on biased samples — random-walk-based embeddings inherit any sampling bias in the graph (popular hubs over-represented).
Centrality definitions disagree — degree, betweenness, closeness, eigenvector, PageRank each rank vertices differently. Pick the one that matches your question.

20. Cross-references

[[Math/linear-algebra-essentials]] — vectors, matrices, eigenvalues underpinning spectral methods.
[[Math/numerical-linear-algebra]] — sparse eigensolvers (Lanczos, Arnoldi) and iterative methods (CG, GMRES) used in spectral graph theory and Laplacian-based methods.
[[Math/svd-pca-spectral]] — SVD and spectral decompositions; HITS as SVD.
[[Math/convex-optimization]] — LP / SDP relaxations of max-cut, Lovász theta, vertex cover.
[[Math/_index]] — math reference family index.
[[Compute/database-internals]] — graph databases as a class.
[[Compute/transformer-architecture]] — attention as a complete graph; GNN ↔ Transformer (Joshi 2020).
[[Robotics/Tier3/path-planning-algorithms]] — A*, D*, RRT, RRT*, PRM on configuration-space graphs.
[[Robotics/slam]], [[Robotics/Tier3/slam-algorithms]] — pose graphs + factor graphs (GTSAM, g2o, Ceres).

21. Citations

Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms, 4th ed. MIT Press, 2022. (CLRS — the standard reference.)
West, D. B. Introduction to Graph Theory, 2nd ed. Prentice Hall, 2001.
Bondy, J. A. + Murty, U. S. R. Graph Theory. Springer GTM 244, 2008.
Newman, M. E. J. Networks, 2nd ed. Oxford University Press, 2018. (Network science.)
Garey, M. + Johnson, D. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.
Dijkstra, E. W. “A note on two problems in connexion with graphs.” Numerische Mathematik 1, 269–271, 1959.
Bellman, R. “On a routing problem.” Quart. Appl. Math. 16, 87–90, 1958.
Ford, L. R. + Fulkerson, D. R. “Maximal flow through a network.” Canad. J. Math. 8, 399–404, 1956.
Floyd, R. W. “Algorithm 97: Shortest path.” CACM 5(6), 345, 1962.
Hart, P. + Nilsson, N. + Raphael, B. “A formal basis for the heuristic determination of minimum cost paths.” IEEE Trans. SSC 4(2), 100–107, 1968.
Kruskal, J. B. “On the shortest spanning subtree of a graph and the traveling salesman problem.” Proc. AMS 7(1), 48–50, 1956.
Prim, R. C. “Shortest connection networks and some generalizations.” Bell Syst. Tech. J. 36(6), 1389–1401, 1957.
Borůvka, O. “O jistém problému minimálním.” Práce Mor. Přírodověd. Spol. 3, 37–58, 1926.
Hopcroft, J. + Karp, R. “An n^{5/2} algorithm for maximum matchings in bipartite graphs.” SICOMP 2(4), 225–231, 1973.
Edmonds, J. “Paths, trees, and flowers.” Canad. J. Math. 17, 449–467, 1965.
Gale, D. + Shapley, L. S. “College admissions and the stability of marriage.” Amer. Math. Monthly 69, 9–15, 1962.
Kuhn, H. W. “The Hungarian method for the assignment problem.” Naval Res. Logist. Quart. 2, 83–97, 1955.
Tarjan, R. E. “Depth-first search and linear graph algorithms.” SICOMP 1(2), 146–160, 1972.
Erdős, P. + Rényi, A. “On random graphs I.” Publ. Math. Debrecen 6, 290–297, 1959.
Watts, D. J. + Strogatz, S. H. “Collective dynamics of small-world networks.” Nature 393, 440–442, 1998.
Barabási, A.-L. + Albert, R. “Emergence of scaling in random networks.” Science 286, 509–512, 1999.
Page, L. + Brin, S. + Motwani, R. + Winograd, T. “The PageRank citation ranking: bringing order to the web.” Stanford InfoLab TR, 1998.
Kleinberg, J. M. “Authoritative sources in a hyperlinked environment.” J. ACM 46(5), 604–632, 1999.
Shi, J. + Malik, J. “Normalized cuts and image segmentation.” IEEE PAMI 22(8), 888–905, 2000.
Boykov, Y. + Kolmogorov, V. “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision.” IEEE PAMI 26(9), 1124–1137, 2004.
Perozzi, B. + Al-Rfou, R. + Skiena, S. “DeepWalk: online learning of social representations.” KDD, 2014.
Grover, A. + Leskovec, J. “node2vec: scalable feature learning for networks.” KDD, 2016.
Hamilton, W. L. + Ying, R. + Leskovec, J. “Inductive representation learning on large graphs.” NeurIPS, 2017.
Kipf, T. N. + Welling, M. “Semi-supervised classification with graph convolutional networks.” ICLR, 2017.
Veličković, P. et al. “Graph attention networks.” ICLR, 2018.
Gilmer, J. et al. “Neural message passing for quantum chemistry.” ICML, 2017.
Fey, M. + Lenssen, J. E. “Fast graph representation learning with PyTorch Geometric.” ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
Wang, M. et al. “Deep Graph Library: a graph-centric, highly-performant package for graph neural networks.” 2019; later DGL papers.
Babai, L. “Graph isomorphism in quasipolynomial time.” STOC, 2016.
Christofides, N. “Worst-case analysis of a new heuristic for the travelling salesman problem.” Carnegie Mellon TR, 1976.
Hagberg, A. + Schult, D. + Swart, P. “Exploring network structure, dynamics, and function using NetworkX.” SciPy, 2008.
Csárdi, G. + Nepusz, T. “The igraph software package for complex network research.” InterJournal Complex Systems, 2006.
NetworkX documentation, https://networkx.org.
Neo4j Cypher documentation; ISO/IEC 39075:2024 GQL.

Compendium

Explorer

Graph Theory — Math Reference

Graph Theory — Math Reference

1. At a glance

2. Definitions

3. Representations

4. Degree

5. Traversals

BFS — Breadth-First Search

DFS — Depth-First Search

Topological sort

6. Shortest paths

Dijkstra (Dijkstra 1959, Numerische Mathematik)

Bellman-Ford (Bellman 1958; Ford 1956)

Floyd-Warshall (Floyd 1962; Warshall 1962; Roy 1959)

Johnson 1977

A* (Hart + Nilsson + Raphael 1968, IEEE Trans. Sys. Sci. Cyb.)

Contraction Hierarchies + Hub Labeling

7. Minimum Spanning Tree (MST)

Kruskal 1956

Prim 1957 (rediscovered; Jarník 1930)

Borůvka 1926

8. Network flow

Max-flow / Min-cut theorem (Ford + Fulkerson 1956)

Ford-Fulkerson 1956

Edmonds-Karp 1972

Dinic’s algorithm (Dinitz 1970)

Push-Relabel (Goldberg + Tarjan 1986/1988)

Min-cost max-flow

Bipartite matching as max-flow

Applications

9. Matching

Bipartite

General (non-bipartite)

Stable matching — Gale-Shapley 1962

10. Connectivity

Strongly Connected Components (SCC)

Biconnected components (undirected)

k-connectivity + Menger’s theorem (Menger 1927)

11. Trees

Tree DP

LCA — Lowest Common Ancestor

Heavy-Light Decomposition (HLD) (Sleator + Tarjan 1983, conceptually)

Euler tour + segment tree

Centroid decomposition

12. Planar graphs

Kuratowski’s theorem (Kuratowski 1930)

Four-color theorem (Appel + Haken 1976; Robertson + Sanders + Seymour + Thomas 1997 simpler proof)

Euler’s formula (Euler 1758)

13. Spectral graph theory

Adjacency matrix A

Laplacian L

Normalized Laplacian

Fiedler vector (Fiedler 1973)

Cheeger inequality

PageRank (Page + Brin 1998 — Stanford TR)

HITS — Hubs and Authorities (Kleinberg 1999, J. ACM)

Spectral clustering (Shi + Malik 2000 normalized cuts; Ng + Jordan + Weiss 2002)

Graph embeddings + GNNs

14. NP-hard graph problems

15. Random graphs

Erdős-Rényi G(n, p) (Erdős + Rényi 1959, 1960)

Configuration model (Bollobás 1980; Molloy + Reed 1995)

Watts-Strogatz small-world (Watts + Strogatz 1998, Nature)

Barabási-Albert preferential attachment (Barabási + Albert 1999, Science)

16. Graph databases + tools

Native graph databases

Triplestores / RDF

In-memory algorithm libraries

Distributed + GPU

Visualization

17. GNN libraries

18. Applications

19. Pitfalls

20. Cross-references

21. Citations

Graph View

Table of Contents