Compiler Design — Compute Reference

1. At a glance

A compiler is a program that translates a source language into a target language while preserving semantics. The classical pipeline is a chain of well-defined transformations, each consuming a representation produced by the prior phase:

source text
    │
    ▼
[ lexer ]            ── tokens (identifiers, keywords, literals, punctuation)
    │
    ▼
[ parser ]           ── concrete syntax tree, then AST (abstract syntax tree)
    │
    ▼
[ semantic analysis ]── typed AST: names resolved, types checked, scopes built
    │
    ▼
[ IR generation ]    ── high-level intermediate representation (often SSA-based)
    │
    ▼
[ optimization ]     ── transforms on IR: constant folding, inlining, LICM, etc.
    │
    ▼
[ lowering ]         ── lower-level IR closer to target machine (e.g. LLVM MIR)
    │
    ▼
[ instruction
  selection +
  register allocation
  + scheduling ]     ── assembly / machine code / bytecode
    │
    ▼
target code (binary, bytecode, assembly, WebAssembly, GPU kernel, etc.)

Phases are conceptually separate but in production compilers (LLVM, GCC, MSVC, rustc, Hotspot C2) they overlap and feedback loops exist — for example, type checking can require evaluating constant expressions, and inlining decisions in optimization can depend on profile data collected at runtime. The separation of concerns given by the pipeline is what made modular compiler infrastructure (LLVM, MLIR, GCC) possible: a frontend writer only needs to emit a well-formed IR; everything downstream is reused across hundreds of languages.

Modern compilers run dozens to hundreds of passes. A clang -O2 build of a moderate C++ file may execute on the order of 100 LLVM passes; rustc -O adds MIR-level passes before LLVM ever sees the program; javac is short, while the JIT (Hotspot C2 or Graal) does the bulk of the work at runtime.

The pipeline divides cleanly into a frontend (lex / parse / semantic / IR-gen), a middle-end (target-independent optimization on IR), and a backend (instruction selection / scheduling / register allocation / emission). The Three Hats principle in LLVM: a frontend writer owns AST → IR; a middle-end writer owns IR → IR; a backend writer owns IR → machine code. Most engineers in the wild only ever touch one of these.

A few cross-cutting concerns thread through every phase: diagnostics + source-position propagation (a source span must survive every transformation so errors can be reported at the right place), debug info preservation (DWARF / PDB lines + variable locations must follow code as it is hoisted, inlined, or eliminated), and invariant verification (compilers in development run Verifier passes between every transform to catch broken IR).

2. Lexing (tokenization)

The lexer (a.k.a. scanner) splits a stream of characters into tokens. A token is a (kind, lexeme, source-span) triple — for example (IDENT, "factorial", 12:5-12:14).

Token categories typically include:

Identifiers — foo, _bar, αlpha (modern compilers handle Unicode per UAX #31).
Keywords — if, while, match. Some languages have contextual keywords (yield in Python, async in Rust before 2018) that are identifiers in most contexts.
Literals — integers, floats, strings (with escapes), characters, byte strings, raw strings, regex literals.
Operators + punctuation — +, ==, ->, ::, ?., ..=.
Whitespace + comments — usually skipped, but significant in Python (indentation tokens INDENT/DEDENT) and Haskell (layout rule).

The classical formal model is regular expressions → NFA → DFA, with the maximal-munch (longest match) rule for disambiguation. Tooling:

lex / flex (UNIX) — generate DFAs from regex specs; emit C code.
ANTLR (Parr 1989+) — LL(*) lexer + parser generator; widely used (Hibernate, Presto, Postgres pg_query, Solidity).
re2c — embeds DFA in user-written C with /*!re2c */ blocks.
Hand-written lexers — most production compilers (GCC, Clang, rustc, swiftc, javac) use hand-written lexers for control over error messages, performance, and incremental relexing.

Subtleties: UTF-8 multi-byte sequences (a Unicode identifier byte is not a token boundary), source-position tracking across CRLF / LF / CR, BOM handling, trigraphs / digraphs (C++), and shebang lines (#!). A common bug is treating regex . as “any byte” when it should be “any code point”.

Performance matters: a production lexer is on the hot path of every IDE keystroke. Tricks include SIMD-accelerated identifier scanning (Clang since 2015, V8 since 2017 for JSON parsing), branch-prediction-friendly token-kind layout, and string interning of identifiers into a single hash table so later phases compare by pointer rather than memcmp. The Swift compiler’s lexer interns every identifier; the JVM internalizes string constants as Symbol* in the same way.

Special-case lexing rules cause most surprises. C++ has the maximal-munch exception for >> in templates (vector<vector<int>> was a syntax error until C++11). Rust’s r#raw identifiers, r"raw strings", b"byte strings", and the 'a lifetime / character-literal ambiguity all require lookahead. Python’s layout rule emits virtual INDENT / DEDENT tokens from the lexer; Haskell’s offside rule is similar. JavaScript’s automatic semicolon insertion (ASI) is a post-lex error-recovery mechanism that has caused entire categories of subtle bugs.

3. Parsing

Parsing consumes the token stream and produces a parse tree, typically immediately collapsed into an abstract syntax tree (AST) that drops parentheses, punctuation, and operator-precedence intermediates.

Top-down

Recursive descent (LL(k)) — one function per non-terminal; the parser predicts the production from k tokens of lookahead. Easy to write by hand, easy to integrate with error recovery, the default for GCC, Clang, rustc, roslyn, javac.
Pratt parsing (Pratt 1973) — handles expressions with precedence climbing. Each operator has a binding power; the algorithm is roughly 30 lines of code. Used in V8, rustc’s expression parser, and many language servers.
LL(*) — ANTLR 4 (Parr + Harwell 2014) — uses arbitrary lookahead with DFA-based prediction; powerful, generator-only.

Bottom-up

LR(k) / LALR(1) — shift-reduce parsers with a state machine over a stack of grammar symbols. Tooling: yacc / bison (Johnson 1975, Stallman). Used in older Bash, Awk, GCC C parser (pre-3.x), MySQL, Perl 5.
GLR (Generalized LR) (Tomita 1986) — handles ambiguous and natural-language-like grammars by forking the parser stack; used in elsa (C++), and historically in some R parsers.
Earley parser (Earley 1968) — handles any CFG including ambiguous ones; O(n³) worst case, O(n²) for unambiguous grammars; used in nearley (JS) and Marpa (Perl).

PEG and combinators

PEG (Parsing Expression Grammars) (Ford 2004) — looks like CFG but with ordered choice (/) replacing alternation (|), making the grammar unambiguous by construction. Python 3.9+ adopted a PEG parser (PEP 617, Bendersky + van Rossum) replacing the LL(1) parser. Tooling: pegen (Python), pest (Rust), peg.js. Pitfall: left recursion needs special handling (Warth et al. 2008 packrat with left-recursion).
Parser combinators — small functions composed with combinators like .or_else, .and_then, .many, .optional. Haskell Parsec (Leijen 2001) and megaparsec, Rust nom (Couprie), Scala fastparse. Great for embedded DSLs; harder to give pinpoint error messages.

Error recovery + diagnostics

Modern compilers (Rust, Swift, TypeScript) treat parse errors as a first-class feature, not a fatal stop. Techniques:

Synchronization tokens — on error, skip until ; } or end-of-line, then continue.
Token insertion / deletion — heuristically insert missing ; or delete an extra }.
Tree-sitter (Brunsfeld 2018) — GLR-based; every edit produces a valid (possibly error-tolerant) tree, so editors get useful highlighting even mid-edit. Used by Neovim, GitHub code search, Atom-derived editors, Zed.
Rust + Swift — invest heavily in suggestion-quality diagnostics: “did you mean & here?”, with machine-applicable fixes.

The AST should not encode language-specific concrete syntax (no quote tokens, no positions of parens) but should carry source spans for diagnostics and language-server integration.

4. Semantic analysis

This phase turns a context-free AST into a context-sensitive typed AST.

Name resolution / scoping — bind each identifier to a definition. Static (lexical) scoping is the universal default; dynamic scoping survives only in Bash, Emacs Lisp pre-lexical-binding, and let-dynamic in Clojure.
Symbol tables — usually a stack of hash maps, one per scope, or a single map keyed by (scope_id, name). Module-level resolution can be order-independent (Rust, Java) or order-dependent (C, Python).
Type checking — verify operations are well-typed; insert implicit conversions where the language allows (C’s integer promotions, Java autoboxing, Scala implicits, Swift literal coercions).
Constant folding (AST-level) — evaluate 3 + 4 * 5 to 23 at compile time; useful as input to type checking (array sizes, const generics).
Borrow / lifetime checking (Rust) — happens after type checking on MIR.
Effect / capability checking (Koka, OCaml 5) — track which effects each function may perform.
Trait / type-class / protocol resolution — match call sites to implementations: Rust trait selection, Haskell type-class dictionary insertion, Swift protocol conformance lookup, Scala implicit search. This phase often dominates compile time on large generic codebases.
Macro expansion — Rust declarative + procedural macros, Scala 3 inline + macros, Lisp / Scheme syntax-rules + syntax-case, OCaml ppx, C/C++ preprocessor. Hygiene (Kohlbecker et al. 1986) prevents accidental identifier capture; Rust and Scheme implement it correctly, C does not.
Lints + non-fatal warnings — Clang -Wall -Wextra, rustc lints (#[warn(unused)]), clippy, ESLint, ruff. Conceptually a separate analysis but woven into semantic analysis to share the type information.

5. Type systems

Static vs dynamic

Static: types known at compile time. Java, C#, Rust, Swift, Haskell, OCaml, Go, TypeScript (mostly), C, C++, Kotlin, Scala.
Dynamic: types attached to values at runtime. Python, JavaScript, Ruby, Lua, Common Lisp, Clojure, Smalltalk, PHP, Tcl.
Gradual typing — a language allows both: TypeScript, Python (PEP 484 type hints with mypy/pyright), Hack, Dart.

Strong vs weak

Imprecise but widely used. “Strong typing” usually means: implicit coercion is rare and operations on mismatched types fail. C is statically typed but weakly so (int* to char* cast is silent); Python is dynamically but strongly typed ("5" + 3 is an error).

Hindley-Milner inference

Originated with Hindley (1958), Milner (1978), Damas + Milner (1982). The Algorithm W infers the most general type of a function from its body without annotation. Foundational for ML, OCaml, Haskell, Elm, F#, and partially adopted by Rust + Swift + TypeScript. Key insight: types are inferred by unification on a constraint set generated by walking the AST. Pitfall: full HM extensions (let-polymorphism, type classes, GADTs, higher-rank types) lose decidability or principal types in pathological corners.

System F and dependent types

System F (Girard 1972, Reynolds 1974) — adds explicit type abstraction (universal quantification); the theoretical core of polymorphic ML and Haskell.
System F-omega — adds type-level functions; foundation of Haskell type families and Scala higher-kinded types.
Dependent types — types that depend on values: Vec n a where n is a term. Used by Coq (Coquand + Huet 1984), Agda, Idris (Brady 2007), Lean (de Moura 2013); enables full machine-checked proofs, but type checking is undecidable in general.

Linear and affine types

A linear type must be used exactly once; an affine type at most once. The latter is the basis of Rust’s ownership and borrow checker (Matsakis + Klock 2014). Linear Haskell (Bernardy et al. 2018) adds linearity as a multiplicity annotation on arrows.

A refinement type is a base type with a predicate: {x: Int | x > 0}. LiquidHaskell (Rondon + Kawaguchi + Jhala 2008) and F* (Microsoft Research) embed an SMT solver to discharge proof obligations. Used in production at Microsoft for kernel and Azure code.

Effect systems

Track side effects in the type system. Koka (Leijen 2014+), Eff (Bauer + Pretnar 2010), and OCaml 5 (effect handlers, 2022) make effects first-class. The motivation: pure functions with explicit effect rows are easier to refactor and parallelize than functions with hidden globals.

Subtyping and variance

Subtyping introduces the question: if Cat <: Animal, is List<Cat> <: List<Animal>? The answer depends on variance:

Invariant — default in Java + C# + Scala for class generics.
Covariant (out T in C#, extends T in Java wildcards, +T in Scala) — List<Cat> is a List<Animal> for read-only consumers.
Contravariant (in T, super T, -T) — Action<Animal> is an Action<Cat>.

Site-variance (Java wildcards) vs declaration-variance (Scala, Kotlin, C#) is the major design split. Java arrays are covariant (an unsound legacy decision that requires runtime ArrayStoreException checks).

Higher-kinded types, GADTs, type families

Modern functional type systems support second-order constructs:

Higher-kinded types — abstract over type constructors (Functor f, Monad m). Native in Haskell and Scala; emulated awkwardly in Rust via GAT (Generic Associated Types, stable 2022).
GADTs (Generalized Algebraic Data Types) — constructors that refine type parameters; in Haskell (GADTs extension) and OCaml. Enable type-safe interpreters and well-typed AST manipulations.
Type families / associated types — type-level functions; Rust’s type Item; on traits, Swift’s associatedtype, Haskell type families.
Const generics — types parameterized by values: [T; N], Vec<T, N>. Mainstream in Rust (stable 2021), C++ non-type template parameters, Zig comptime, Swift parameter packs (5.9, 2023).

Type-inference cost

Hindley-Milner is DEXPTIME in the worst case (Mairson 1990), though in practice linear-time for typical programs. Adding type classes (Haskell), implicits (Scala), or trait selection (Rust) makes constraint resolution exponential — programmers see this as “compile times of minutes for one expression”. This is the single largest cost driver in Scala 2 + Rust + GHC compile time today.

6. Intermediate Representations (IR)

A compiler usually has multiple IRs at descending levels of abstraction.

AST

Closest to source. Carries high-level constructs (classes, pattern matches, for-in loops). Useful for refactoring tools, formatters, language servers.

Three-address code (TAC)

Each instruction has the form x = y op z. Reduces complex expressions to a sequence of binary operations. Historically the entry to backend optimization in pedagogical compilers (Aho et al. Dragon Book).

SSA — Static Single Assignment

(Cytron, Ferrante, Rosen, Wegman, Zadeck 1991 TOPLAS.) Each variable is assigned exactly once. At control-flow merge points, φ-nodes select between values from incoming edges:

    bb0: x0 = 1
        ↓     ↘
    bb1: x1 = 2  bb2: x2 = 3
        ↘     ↓
    bb3: x3 = φ(x1 from bb1, x2 from bb2)
         use(x3)

Benefits: def-use chains are direct (no need for separate analysis), most dataflow analyses become simpler, optimizations like sparse conditional constant propagation (Wegman + Zadeck 1991) are natural. SSA is the IR of LLVM IR, GCC GIMPLE, HotSpot C2 (Sea of Nodes), V8 TurboFan, Cranelift, GraalVM.

CFG — Control Flow Graph

Basic blocks (straight-line sequences with a single entry and single exit) connected by edges representing branches. The unit of analysis for most optimizations. Loops are detected via dominator trees and back-edges (Tarjan 1974).

CPS — Continuation-Passing Style

Every function takes an explicit continuation. Alternative to SSA with equivalent expressive power (Appel 1992, Kelsey 1995). Used by MLton, Chez Scheme, the Rabbit Scheme compiler (Steele 1978), and as an IR in functional-language compilers. SSA and CPS are interconvertible.

MLIR — Multi-Level IR

(Lattner et al. Google 2019, published 2021.) MLIR generalizes LLVM IR by allowing user-defined dialects that layer on top of each other. A typical ML compilation lowers through dialects like:

TF dialect  →  TOSA dialect  →  Linalg dialect  →  Affine + SCF  →  LLVM dialect  →  LLVM IR

Each lowering is a pass; rewrites are declarative. MLIR underpins TensorFlow XLA’s HLO (in some configurations), IREE, Modular Mojo, Triton 2.x, Polygeist (C/C++ → MLIR), CIRCT (hardware), and Flang (Fortran). It is the most significant compiler-infrastructure development of the late 2010s / early 2020s.

7. Optimizations

Optimizations are program transformations that preserve observable semantics (the as-if rule in C++) while improving some cost metric — speed, size, energy.

Local (within a basic block)

Constant folding — 3 + 4 → 7 at compile time.
Dead code elimination (DCE) — remove instructions whose result is never used.
Common subexpression elimination (CSE) — a*b computed twice → compute once and reuse.
Copy propagation — replace y with x after y = x.
Strength reduction — replace 2*x with x+x, x*8 with x << 3, integer division by constants with multiply-and-shift.
Peephole optimization — pattern-match short instruction sequences and rewrite (mov eax, 0 → xor eax, eax).

Loop

Loop-invariant code motion (LICM) — hoist work that doesn’t depend on the loop variable out of the loop.
Induction-variable simplification — replace complex induction expressions with cheaper recurrences.
Unrolling — replicate the loop body N times, reducing branch + induction overhead.
Fusion — merge adjacent loops over the same range to share traversal cost.
Fission — opposite: split a loop to enable other transforms.
Blocking / tiling — restructure nested loops to improve cache locality (key for GEMM and stencils).
Polyhedral optimization — model loop nests as integer polyhedra, transform via affine maps (Feautrier 1992, Bastoul Pluto 2008); used by GCC Graphite, IBM XL, isl, MLIR’s Affine dialect.

Inter-procedural

Inlining — replace a call with the callee’s body; the most important single optimization in modern compilers because it enables all other optimizations across function boundaries.
Inter-procedural DCE — remove unused functions across modules.
Devirtualization — turn obj.method() (virtual call) into a direct call when the dynamic type is known statically; central in V8, Hotspot, Graal.
Escape analysis — prove an allocation never escapes its scope; can convert heap to stack allocation (Go, Hotspot, .NET).

Profile-Guided Optimization (PGO)

Two-phase build: (1) compile with instrumentation, run the program on a representative workload, collect a profile; (2) recompile, using the profile to direct inlining, basic-block layout, branch hinting, and register allocation. Real-world wins: 5-15% on backend services (Chromium, MySQL, Postgres builds), substantially more on JITs by construction. AutoFDO (Google 2016) replaces instrumented PGO with sample-based profiles from perf.

Link-Time Optimization (LTO)

Defer optimization to link time so the compiler sees all translation units. Full LTO (slow, max benefit) vs Thin LTO (Tobler + Lattner 2015) — the latter scales to Chromium-sized builds. Standard in Clang, GCC, Swift, Rust.

Autovectorization

Generate SIMD (AVX2, AVX-512, NEON, SVE) instructions from scalar loops. LLVM’s loop vectorizer (Karrenberg + Hack 2011, polly-llvm) and slp-vectorizer (Rosen et al. 2007). Pitfalls: alignment, aliasing (in C, restrict helps), reductions, gather/scatter overhead.

When autovectorization fails, programmers reach for intrinsics (_mm256_add_ps, vaddq_f32) or, increasingly, portable SIMD libraries: std::simd (C++26, Vc-derived), core::simd (Rust, nightly since 2020), Google Highway, Intel ISPC. Highway and ISPC compile one source to multiple ISA back-ends with dynamic dispatch chosen by cpuid.

Whole-program optimization and specialization

Specialization clones a generic function for the specific types or values seen at call sites. Rust monomorphization, C++ template instantiation, .NET generics (reified). The trade-off is code size: Rust binaries are notoriously large in part for this reason.

The opposing strategy is type erasure: Java generics, Haskell’s polymorphism, OCaml functors all compile one generic body and box values uniformly. This keeps code size small at a performance cost. Many systems pick a hybrid: .NET reifies value-type generics but erases reference-type generics; Swift erases unless @inlinable is used. Project Valhalla (JEP-401 value types, in progress) aims to bring partial reification to Java.

Other notable transforms

Tail-call elimination — turn return f(...) into a jump; mandatory in Scheme (R5RS+), optional everywhere else. LLVM’s tailcc and musttail make it explicit. PowerShell, Python, and the JVM (pre Project Loom + Wisp work) historically lack guaranteed TCO.
Bounds-check elimination — prove an array access is in bounds at compile time; common in Rust, Java JITs, Go (with -gcflags=-d=ssa/check_bce/debug=1). Often guided by SCEV (Scalar Evolution analysis).
Devirtualization + speculative inlining — JIT-only: speculate the dynamic type, inline, and install a deopt guard. Hotspot, V8, Graal all rely on this for OO performance.
Polly + Pluto — polyhedral analysis built atop LLVM (Pollylabs, Grosser + Größlinger + Lengauer 2012) for cache-aware loop transformations.
Outlining + cold-section sinking — opposite of inlining; move rarely executed code to cold pages to improve i-cache locality.

8. Register allocation

The backend’s hardest classical problem: map an unbounded set of IR values to a fixed bank of machine registers, spilling to the stack when necessary.

Graph coloring (Chaitin 1981 SIGPLAN, refined by Chaitin-Briggs 1994 and Chaitin-Briggs + iterated coalescing George + Appel 1996). Build an interference graph where nodes are values and edges connect values live simultaneously; k-color the graph for k registers. NP-hard in general; heuristics work well in practice. Used in GCC for many years.
Linear scan (Poletto + Sarkar 1999) — sort live ranges by start; allocate greedily in one pass. Much faster than coloring; preferred for JITs where compile time is on the user’s critical path. Used in early HotSpot client (C1), early LLVM, .NET RyuJIT.
SSA-based register allocation (Hack 2006, Brisk 2009) — exploits the property that interference graphs of SSA programs are chordal, making coloring polynomial. LLVM moved to an SSA-aware allocator (Greedy + PBQP options) around 2010.
Spill cost minimization — when a value can’t get a register, pick the one with lowest spill cost (estimated by loop depth, use density).

Calling conventions (System V AMD64, Windows x64, AArch64 AAPCS) reserve specific registers for arguments, return values, and caller-/callee-saved roles — register allocation must respect these.

9. Code generation

Instruction selection

Map IR operations to target machine instructions. Approaches:

Tree pattern matching with BURS (Bottom-Up Rewrite Systems, Pelegri-Llopart + Graham 1988) — tile the IR DAG with machine-instruction templates having associated costs; dynamic programming finds the min-cost tiling. Used in iburg and historically GCC.
DAG covering — generalizes tree covering to DAGs (shared subexpressions).
Selection-DAG (LLVM SelectionDAG) — lower IR to a per-basic-block DAG, then tile; being replaced by GlobalISel since 2017, which works directly on a global pattern matcher.
Hand-written selectors — Cranelift uses table-driven hand-written ISel for predictable JIT compile times.

Instruction scheduling

Reorder instructions to fill pipeline slots and avoid stalls. List scheduling (basic, per-basic-block) and software pipelining (loop-aware, e.g. iterative modulo scheduling, Rau 1994) for VLIW and superscalar targets. Modern OOO x86 has loose scheduling constraints; in-order ARM and RISC-V cores benefit more.

Machine-specific lowering

x86-64 — variable-length encoding, complex addressing modes, mature backends.
AArch64 / ARM — fixed-length 32-bit instructions, large register file (31 GP + 32 SIMD).
RISC-V — open ISA, modular extensions (RV64GC base, V vector, B bit-manip).
GPU: NVPTX (LLVM target for CUDA), AMDGPU (LLVM target for ROCm/HIP), SPIR-V (Vulkan/OpenCL portable intermediate). NVIDIA’s actual ISA (SASS) is below PTX and unstable across architectures; ptxas lowers PTX → SASS.
WebAssembly — stack-machine bytecode; LLVM Wasm backend, Binaryen + wasm-opt for post-link optimization.

10. JIT vs AOT

AOT — Ahead-of-Time

Compile once, ship a binary. C, C++, Rust, Go, Swift, Zig, Nim, Crystal, OCaml.

Pros: full optimization budget, predictable startup, easier deployment, no warmup, smaller runtime.
Cons: no profile data at compile time (unless PGO), generic specialization happens up front (code bloat), recompilation needed for new CPU features.

JIT — Just-in-Time

Compile at run time. V8 (JavaScript), HotSpot + OpenJ9 + GraalVM (JVM), CoreCLR / RyuJIT (.NET), LuaJIT, PyPy, JavaScriptCore, SpiderMonkey, Dart VM.

Pros: profile-guided by construction, can speculate (e.g. assume foo.length is an integer), deoptimize when assumptions break, can recompile hot code with more aggressive optimization, optimize across deployment boundaries.
Cons: warm-up time (peak performance only after JIT compiles hot code), memory overhead for IR + caches, code-gen on the user’s CPU budget, harder to reason about latency.

Tiered JIT is the universal architecture: an interpreter or template JIT runs everything cheaply; a C1 / baseline JIT compiles fast and unoptimized; a C2 / optimizing JIT compiles slowly and optimally for the hottest methods. HotSpot’s tiered compilation (C1 + C2), V8’s Ignition → Sparkplug → Maglev → TurboFan (added 2022-2023), .NET tiered JIT, all follow this pattern.

Hybrid

GraalVM Native Image — AOT compiles a JVM application closed-world; pairs with the GraalVM JIT for development.
Dart — dart compile aot-snapshot for production, JIT mode for development hot-reload.
.NET ReadyToRun (R2R) + Crossgen2 — AOT-precompile bytecode to native, JIT supplements at run time.
Android ART — installs APKs with ahead-of-time + just-in-time + profile-guided recompile cycles (speed-profile mode).
OCaml — bytecode (ocamlc) for fast compile, native (ocamlopt) for production; same source, two backends, no JIT.

Deopt + on-stack replacement

The defining trick of optimizing JITs is deoptimization: when a speculation fails (the dynamic type wasn’t what we guessed; an unboxed value overflowed), the runtime reconstructs the interpreter state from the JIT’s stack frame and resumes in the interpreter mid-method. On-stack replacement (OSR) is the inverse: a long-running interpreted loop can be replaced mid-execution by a freshly JITted version with state migrated. HotSpot, V8, Graal, .NET RyuJIT all implement both. The book-keeping (scope descriptors, frame state mapping) is the bulk of JIT complexity.

11. Garbage collection

GC implementation is its own deep topic — see [[Compute/garbage-collection]] for tracing (mark-sweep, mark-compact, copying, generational, concurrent like G1, Shenandoah, ZGC, Go GC), reference counting (Python, Swift), and hybrid systems.

12. LLVM

LLVM (Lattner 2003 MS thesis; Lattner + Adve CGO 2004) is the dominant modular compiler infrastructure. Originally “Low-Level Virtual Machine”; the acronym was dropped in 2011 because the project outgrew the name.

Components:

LLVM IR — typed, SSA-based, three-address; both an in-memory data structure and a textual / bitcode serialization. Designed to be language-agnostic but C/C++-leaning (notion of pointer + integer types, exception handling that mirrors Itanium ABI).
Passes — 50+ analysis passes (DominatorTree, LoopInfo, ScalarEvolution, AliasAnalysis) + 100+ transformation passes (InstCombine, SROA, GVN, LoopVectorize, SLPVectorize, IPSCCP).
Backends — code generators for x86 / x86-64, AArch64, ARM, RISC-V, PowerPC, MIPS, SystemZ, WebAssembly, AMDGPU, NVPTX, Hexagon, SPARC, and several others.
Frontends — Clang (C, C++, Objective-C, OpenCL), Rustc (LLVM backend, plus MIR), Swiftc, Julia, GHC (NCG and LLVM backends), Zig (multi-backend, including own self-hosted), Flang (Fortran via MLIR), Crystal, Pony, MLIR-based frontends.

LLVM has been the de facto default backend for new languages since ~2010 because the cost of writing a competitive code generator from scratch is prohibitive.

LLVM IR details worth knowing

LLVM IR is typed (i32, i64, float, double, <4 x float>, pointer types, struct types, array types), SSA, with a small set of opcodes (around 60). Functions have basic blocks; basic blocks have instructions; the last instruction of a block is a terminator (br, ret, switch, unreachable, invoke). Memory operations (load, store, alloca) are explicit; getelementptr (GEP) computes addresses without dereferencing. Metadata is attached via !dbg, !tbaa, !range, !annotation.

The opaque-pointer transition (LLVM 15-17, 2022-2023) removed typed pointers — i32* became plain ptr. This was a years-long refactor across the whole ecosystem (Rust, Julia, Swift, GHC, every embedder) and is the most consequential breaking change LLVM has made.

LLVM ecosystem tooling

opt — run named passes on .ll / .bc files.
llc — IR to assembly.
lli — JIT-execute IR (uses MCJIT or ORC).
bugpoint / llvm-reduce — minimize failing inputs.
FileCheck — golden-file testing for IR transforms; the basis of the LLVM test suite.
LLD — fast cross-platform linker (ELF, COFF, Mach-O, Wasm).
clang-tidy + clang-format + clangd — refactoring, formatting, and LSP, all built on the Clang AST.

13. GCC

GNU Compiler Collection (Stallman 1987 — GCC 1.0). Older than LLVM and historically the only viable open-source C compiler. Architecture:

Multiple frontends — C, C++, Fortran, Ada, Go (gccgo), D (gdc), Modula-2, Objective-C.
Two IRs: GIMPLE (high-level, SSA-form, language-independent) and RTL (Register Transfer Language, low-level, machine-dependent, pre-dates GIMPLE by decades).
Optimization passes on GIMPLE → lowered to RTL → more passes → assembly.

GCC remains competitive: on many benchmarks GCC 14 and Clang 18 trade wins. GCC tends to lead on Fortran and on some auto-vectorization corners; Clang leads on compile time and diagnostic quality. The Linux kernel supports both; GCC is the default on most distros.

GCC’s plugin system is less developed than LLVM’s pass-pipeline model, but the GCC tree-SSA infrastructure (introduced in GCC 4.0, 2005) parallels SSA in LLVM. GCC also has its JIT library (libgccjit, since GCC 5) which powers Emacs Native Compilation (since Emacs 28, 2022) — Elisp is JIT-compiled to native via libgccjit, a significant speedup.

MSVC

The Microsoft Visual C++ compiler (cl.exe) is the third major C/C++ compiler. Closed-source, with its own IR (CIL for /GL whole-program-optimization, then LLVM-derived ml in modern toolchains). Distinctive features: Windows ABI conformance, structured exception handling (SEH), __declspec, MSVC-specific pragmas. Many large Windows codebases (Office, Windows itself) still build with MSVC; Clang-CL provides MSVC-compatible command-line + ABI on top of LLVM and is increasingly used (Chromium since 2018).

14. WebAssembly (Wasm)

A portable bytecode designed by all four major browsers (V8, SpiderMonkey, JavaScriptCore, ChakraCore) in 2015 and standardized at the W3C in 2017 (MVP). Stack machine; sandboxed; structured control flow (no arbitrary jumps); deterministic.

Compilation targets: C/C++ via Emscripten (Zakai 2011), Rust via wasm-pack / wasm32-unknown-unknown target, Go (limited), AssemblyScript (TypeScript-like, hand-rolled compiler), Zig.
Runtimes: V8, SpiderMonkey, JavaScriptCore in browsers; Wasmtime (Bytecode Alliance, Cranelift), Wasmer (multi-backend), WAMR (Intel, embedded), WasmEdge (CNCF, edge), Wazero (pure Go).
Wasm Component Model (W3C 2023) — interface types, WIT, cross-language composition of modules without a JS glue layer.
Wasm GC (W3C 2023) — typed managed references; lets Java, Kotlin, OCaml, Scheme target Wasm without bundling their own GC.

Wasm is increasingly used as a portable plugin / serverless / edge target (Fastly Compute@Edge, Cloudflare Workers’ V8 isolates with Wasm, Shopify Functions, Fermyon Spin).

WASI

The WebAssembly System Interface (Bytecode Alliance 2019+, WASI Preview 2 = 2024) is the standardized syscall layer for Wasm outside the browser. Capability-based: a module receives an explicit set of file descriptors, network handles, clocks; ambient authority does not exist. Preview 2 introduces the Component Model as the canonical surface, with WIT (Wasm Interface Types) replacing the prior witx files. WASI Preview 3 is targeting async semantics.

Compiling to Wasm

Each source language faces specific Wasm pitfalls:

C/C++ via Emscripten — bundles a libc (musl-derived) and a JS glue layer; uses Binaryen as a Wasm-to-Wasm optimizer.
Rust — wasm32-unknown-unknown (no libc, bare metal), wasm32-wasip1 (legacy WASI), wasm32-wasip2 (Component Model).
Go — historically large output (5+ MB hello-world) due to the runtime; tinygo produces dramatically smaller Wasm.
Languages with GC (Java, Kotlin, OCaml, Scheme) — pre-Wasm-GC, had to bundle their own GC into the module; with Wasm GC (2023), they target the runtime’s GC directly.

15. ML compilers

The deep-learning era reframed compilers as ML systems infrastructure.

TensorFlow XLA (Google 2017) — JIT compiles a TF subgraph (HLO IR) for TPU, GPU, CPU. Operator fusion, layout assignment, autotuning.
JAX (Bradbury, Frostig et al. 2018+) — jax.jit traces Python through jax.numpy to a closed graph and compiles via XLA. The combination jit + vmap + grad + pmap is the most influential composable-transform design of the era.
PyTorch 2.0 + torch.compile (2023) — four-stage pipeline:
1. TorchDynamo — Python frame-level graph capture using CPython bytecode hooks.
2. AOTAutograd — capture forward + backward as a joint graph.
3. Inductor — graph-to-kernel lowering; uses Triton for GPU code and C++ for CPU.
4. Triton — Tillet et al. OpenAI 2021 (MAPL 2019 paper); Python DSL where blocks are first-class; compiles via MLIR → LLVM → PTX / AMDGCN. Wide adoption: most published transformer attention kernels (FlashAttention v2/v3 + many derivatives) are written in Triton.
MLIR-based stacks: IREE (Google, runtime + compiler for accelerators), TensorFlow MLIR (replacing XLA HLO incrementally), Modular Mojo (Lattner 2023; superset-of-Python systems language targeting MLIR), OpenXLA / StableHLO (community-stewarded HLO dialect after the Google + community split).
Apache TVM (Chen et al. 2018) — competitor to XLA; auto-scheduling via Ansor (Zheng et al. OSDI 2020); strong on edge inference and ARM/quantized models.
CUDA + ROCm + SYCL/oneAPI + OpenCL — vendor + open standards for GPU compute. CUDA is the most mature (since 2007). SYCL (Khronos) is a single-source C++ standard that targets multiple backends; oneAPI (Intel) is a SYCL implementation plus libraries. OpenCL is the older portable standard; usage has waned in the deep-learning era.
CUTLASS (NVIDIA 2017+) — header-only C++ template library producing high-performance GEMM, conv, attention kernels at the level of cuBLAS / cuDNN but inspectable + modifiable. CUTLASS 3.x is built on CuTe, a tensor-abstraction layer.
Halide (Ragan-Kelley et al. MIT 2012) — DSL separating algorithm from schedule for image processing pipelines; predecessor to many ML-compiler ideas (separate concerns of what to compute from how to schedule it). Used in production at Adobe, Google (YouTube), Facebook (Instagram).
Tiramisu (Baghdadi et al. 2019) — polyhedral-model DSL for deep learning and dense tensor compute.
Glow (Facebook 2018) — graph compiler for hardware accelerators; merged components into PyTorch + ONNX runtime.

Training vs inference compilers

A subtle taxonomy: training compilers (XLA, torch.compile via Inductor) must support the joint forward + backward graph, with autograd, distributed sharding (FSDP, SPMD), and large batch sizes. Inference compilers (TVM, ONNX Runtime, TensorRT, OpenVINO, Apple Core ML, Apple MLX) optimize for latency, small batch sizes, quantization (INT8, INT4, FP8), and hardware-specific kernels. The compilers share IR layers but the pass pipelines and operator coverage differ substantially.

16. Tree-sitter, LL(*), and incremental parsing

Tree-sitter (Brunsfeld 2018) — parser generator producing GLR parsers that are incremental (re-parse only the edited region) and error-tolerant (always produce a tree). Default parser for syntax highlighting in Neovim, Helix, Zed, GitHub web code search. The grammar repository covers 200+ languages.
LL(*) (ANTLR 4, Parr + Harwell 2014) — practical infinite lookahead via DFA-based prediction. Powers Hibernate, Presto, Solidity.
Combinator + memoizing (packrat) — packrat parsers (Ford 2002) implement PEG with linear-time guarantees via memoization. Used in pegen (CPython), pest (Rust).

The shift from yacc/bison to recursive descent + Pratt + PEG + Tree-sitter is one of the visible craft changes in compiler frontends in the past 15 years.

17. Debugger interface

Without debug information, an optimized binary is opaque. The two dominant formats:

DWARF (DWARF Debugging Standard, current v5 2017) — used on Linux, macOS, FreeBSD, and most Unix targets. Embedded in .debug_* sections of ELF / Mach-O. Encodes types, source-line tables, frame info (CFI), variable locations (which can be register, memory, or piecewise-composed).
PDB (Program Database) — Microsoft Windows format; separate .pdb file alongside .exe / .dll. Has its own toolchain (Microsoft DIA SDK, llvm-pdbutil).

For JITs, the runtime emits debug info dynamically: gdb JIT interface (a callback registry the debugger walks), Linux perf JIT marker files (/tmp/perf-PID.map and JITDUMP for symbolicated profiles), and LLVM llvm.debug.* intrinsics for source-mapping JIT-emitted code.

Debug-info preservation through optimization is hard: when a variable is dead-code-eliminated, the debugger should still tell you “optimized out” rather than displaying garbage. LLVM models this via llvm.dbg.value / llvm.dbg.declare intrinsics that survive optimization. Rust + Swift + Clang all hand-tune debug-info handling per pass.

Source maps (originally Closure Compiler, then standardized by Mozilla 2011) are the same problem for JavaScript: map minified output back to original source. The format (Source Map v3) is JSON with base64-VLQ-encoded mappings; every JS bundler (esbuild, webpack, Vite, Parcel, Turbopack, Rspack) emits it.

18. Common pitfalls

Reinventing parsing instead of using a generator or established library — almost always pays off the first time but rots over time (operator-precedence creep, recovery rules ad-hoc, no language-server integration).
Hand-rolled lexer with regex bugs — common: greedy .* against multi-line input, missing UTF-8 boundary handling, hex literal underscores not in spec.
Forgetting Unicode + multi-byte characters — column reporting in bytes vs codepoints vs graphemes; identifier normalization (NFC vs NFKC).
Exponential blowup in type-class / template / generic resolution — C++ templates have famously triggered hours-long compiles; Rust trait resolution has worst-case exponential search; Scala implicit search has the same problem; Haskell type-family resolution can diverge.
O(n²) name resolution — using nested scopes by re-scanning instead of indexed symbol tables; visible on >100k-line files.
Order-of-passes bugs — optimizing before type checking (now the bug is in optimized form, which is harder to diagnose); running canonicalization after a pass that depends on canonical form; ordering inliner and devirtualization wrong.
Constant folding overflow — folding INT_MAX + 1 to UB-token incorrectly at compile time; LLVM has historically had latent bugs here.
Reliance on undefined behavior to enable optimization — the C/C++ “strict aliasing” + “signed overflow” optimizations are correct per the standard but routinely surprise programmers and cause subtle bugs.
Floating-point reassociation — (a + b) + c is not always equal to a + (b + c) under IEEE 754. -ffast-math (GCC/Clang) and -Ofast permit reassociation that can produce different results; subtle bug source in scientific code.
Ignoring ABI — a function attribute change (noexcept, calling-convention switch, return-type widening) silently breaks linkage and produces nm-passing but runtime-corrupting binaries.
Premature lowering — running target-specific lowering before optimization loses information that the optimizer needed; common in toy compilers that emit assembly directly from the AST.
Optimizing across a barrier you didn’t model — moving a load past an atomic / volatile / inline-asm operation; almost every C/C++ compiler has had bugs in this area.

19. Modern trends

Incremental compilation — only re-typecheck and re-codegen the modules that changed. Rust (since 2017, with --incremental), Swift, Kotlin K2 (2024 GA), C++ modules (slowly adopted). Crucial for large codebases; pairs with content-addressed caching (sccache, Bazel remote cache).
Stable ABIs — letting independent libraries evolve without recompilation against new versions. Swift has a stable ABI on Apple platforms since 5.0 (2019). Rust does not (and is unlikely to commit), but the community has stable C ABI via extern "C" and the upcoming extern "crabi" proposal. C ABI remains the lingua franca.
Self-hosted compilers — Rust (rustc since 2011), Zig (since 2022), Swift (since 2018 in parts), Go (since 1.5, 2015), Crystal, Nim, TypeScript (compiles itself). Self-hosting is a maturity signal and forces dogfooding.
Language servers (LSP) (Microsoft 2016) — one compiler frontend powers the CLI and IDE features (completion, go-to-definition, find-references, refactoring). rust-analyzer, clangd, gopls, TypeScript tsserver, Pyright / pylance, swift-lsp, HLS (Haskell Language Server), JDT-LS. The LSP design demands incremental + error-tolerant frontends.
WebAssembly as universal target — Wasm GC + Component Model unify deployment for a broad set of languages.
AI-assisted compilation — at one level, code completion (Copilot, Cursor, Claude Code) is itself a compiler-frontend-adjacent application; at another, ML for compiler heuristics (inlining decisions, register allocation, autotuning kernel parameters) is a research area: AutoTVM, Ansor, MLGO (Trofin et al. Google 2021) for LLVM inlining + register allocation, PolyBench / Tiramisu for polyhedral.
Sea-of-Nodes IR (Click 1995, used by HotSpot C2, Graal, V8 TurboFan) — fuses CFG and data-flow into one graph where node-scheduling decisions are deferred until late lowering. Trade-off: powerful for optimization, harder to read and debug.
Cranelift (Bytecode Alliance, 2018+) — fast SSA-based code generator written in Rust. Compile-time-first design (single-pass register allocator, no global optimization), targets Wasmtime, the Rust cg_clif debug backend, and Lucet. About 10x faster compile than LLVM at ~80% of the runtime performance.
Reproducible builds — deterministic compilation so the same source produces byte-identical output. Adopted by Debian, NixOS, Tor, and increasingly required by SLSA + supply-chain frameworks. Compilers must avoid timestamps, ASLR seeds, and unordered hash-map iteration in code-gen output.
Formally verified compilers — CompCert (Leroy + INRIA 2006+) is a Coq-verified C compiler used in avionics. CakeML (Kumar et al. 2014) is a verified ML implementation. Verified passes are being incrementally added to LLVM via Alive2 (Lopes et al. 2021) for peephole-correctness checking.
Just-in-time security mitigations — JITs are attractive attack surfaces (writable + executable pages). Defenses: W^X enforcement (V8 + JSC dual-mapped pages), CFI / Intel CET, JIT hardening for sandbox escape resistance.
eBPF compilers — the in-kernel BPF verifier accepts a constrained subset of LLVM IR; clang’s BPF backend (since 2014) compiles restricted C to BPF bytecode. Followed by Aya (Rust eBPF), bpftrace, Cilium’s BPF code-gen. eBPF is itself a JIT target: the kernel translates verified BPF to native at load time.
Domain-specific languages explode — modern systems include many small compilers: SQL planners (Postgres, DuckDB, Spark Catalyst), regex compilers (RE2, Hyperscan, Rust regex), shader compilers (DXIL, SPIR-V toolchains, Naga, glslang), policy compilers (OPA Rego), Datalog (Soufflé). Compiler ideas have outgrown traditional general-purpose programming languages.
Bidirectional type checking — combines inference and checking modes in one pass (Pierce + Turner 2000); used by Rust, Swift, TypeScript, Scala 3, OCaml’s type-directed disambiguation. Avoids the principal-type pitfalls of pure HM and produces better error messages.
Polymorphic recursion + impredicative polymorphism — research-leaning but increasingly mainstream in GHC + Scala 3 + Lean 4.

Build-system co-evolution

A modern compiler is one half of the build experience; the other is the build system. Bazel + Buck2 model the compile graph at file-level granularity with hermetic execution and remote caching. Nix and Guix push this further to bit-reproducible whole-system builds. Cargo + sccache, Gradle + build cache, Pants v2 all converge on the same idea: cache the input → output map of every compile unit, share it across machines, recompile only the minimum. This effectively trades disk + bandwidth for CPU at the team or organization level.

The compiler’s job is increasingly to expose its inputs and outputs honestly so the build system can trust the cache: deterministic output, declared dependencies (Rust’s --emit dep-info, Clang’s -MMD), and stable hashing of intermediate artifacts.

19a. Compiler engineering in practice — case studies

A handful of real systems illustrate how the pieces fit together.

Rust (rustc)

Frontend: hand-written lexer + recursive-descent + Pratt expression parser. Resolves into the HIR (high-level IR, preserves source structure for diagnostics), then THIR (typed HIR), then MIR (mid-level IR with explicit drops, borrows, and CFG — the substrate for borrow checking, exhaustiveness checks, and Rust-specific lints), then LLVM IR. Borrow checking happens on MIR (Polonius is the experimental NLL successor). Trait selection is a constraint solver invoked during type checking; complaints about Rust compile times are largely complaints about trait selection + monomorphization + LLVM. The cg_clif Cranelift backend exists for fast debug builds.

Swift (swiftc)

Frontend: hand-written lexer + recursive-descent + custom expression parser (handles trailing closures, builder syntax). Custom SIL (Swift Intermediate Language, SSA-form with reference-counting and existential operations) is the substrate for Swift-specific optimizations — ARC optimization, generic specialization, devirtualization, exclusivity checking — before lowering to LLVM IR. Swift’s resilient generics avoid full monomorphization across module boundaries, trading code size for slightly slower generic dispatch (witness tables). The compiler is itself heavily used for the IDE (SourceKit-LSP).

Hotspot JVM

Multi-tier: bytecode interpreter → C1 (client) JIT → C2 (server, sea-of-nodes) JIT. Profile counters drive promotion between tiers. C2 performs escape analysis, scalar replacement of allocations, aggressive devirtualization, on-stack replacement. Deopt is plumbed through every speculative optimization. GraalVM offers an alternative server compiler written in Java itself, with Truffle (interpreter framework) on top.

V8

JavaScript pipeline: parser → Ignition (bytecode interpreter) → Sparkplug (non-optimizing baseline JIT, since 2021) → Maglev (mid-tier optimizing JIT, since 2023) → TurboFan (top-tier optimizing JIT, sea-of-nodes). Wasm pipeline: Liftoff (one-pass baseline) → TurboFan. Inline caches feed type feedback to the optimizing tiers; deopts fall back to Sparkplug.

TypeScript

Single-pass-ish: lex → parse → bind (name resolution) → check (type checking). Then emit either JavaScript or .d.ts declarations. Type checking is structural with extensive narrowing; the checker is famous for accepting some non-decidable corners and bailing with a depth limit. The same compiler powers tsc, tsserver (LSP), and editors.

20. Cross-references

[[Compute/_index]] — compute pillar index.
[[Compute/concurrency-primitives]] — locks, atomics, memory models (relevant to multi-threaded codegen).
[[Compute/garbage-collection]] — GC algorithms used by runtimes.
[[Compute/os-scheduling-memory]] — page tables, memory layout, calling conventions.
[[Compute/transformer-architecture]] — what ML compilers ultimately compile.
[[Languages/_index]] — language families index.
[[Languages/Tier3/build-devops]] — build systems and toolchains adjacent to compilers.

21. Citations

Aho, Lam, Sethi, Ullman. Compilers: Principles, Techniques, and Tools, 2nd ed. Addison-Wesley, 2007. (The “Dragon Book”.)
Appel. Modern Compiler Implementation in ML / C / Java, 2nd ed. Cambridge, 2002.
Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
Cooper + Torczon. Engineering a Compiler, 3rd ed. Morgan Kaufmann, 2022.
Lattner. LLVM: An Infrastructure for Multi-Stage Optimization. MS thesis, UIUC, 2003.
Lattner + Adve. “LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation”. CGO 2004.
Cytron, Ferrante, Rosen, Wegman, Zadeck. “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph”. TOPLAS 13(4), 1991.
Chaitin. “Register Allocation & Spilling via Graph Coloring”. SIGPLAN 1981.
George + Appel. “Iterated Register Coalescing”. TOPLAS 18(3), 1996.
Poletto + Sarkar. “Linear Scan Register Allocation”. TOPLAS 21(5), 1999.
Hack. Register Allocation for Programs in SSA Form. PhD diss., U. Karlsruhe, 2006.
Click. “Combining Analyses, Combining Optimizations”. PhD diss., Rice, 1995. (Sea-of-Nodes.)
Pratt. “Top Down Operator Precedence”. POPL 1973.
Ford. “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation”. POPL 2004.
Parr + Harwell. “ALL(*): The Foundation of the ANTLR Parser Generator”. PLDI 2014.
Brunsfeld et al. “Tree-sitter: An Incremental Parsing System for Programming Tools”. 2018+.
Hindley. “The Principal Type-Scheme of an Object in Combinatory Logic”. TAMS 146, 1969.
Milner. “A Theory of Type Polymorphism in Programming”. JCSS 17, 1978.
Damas + Milner. “Principal Type-Schemes for Functional Programs”. POPL 1982.
Reynolds. “Towards a Theory of Type Structure”. 1974. (System F.)
Bernardy, Boespflug, Newton, Peyton Jones, Spiwack. “Linear Haskell: Practical Linearity in a Higher-Order Polymorphic Language”. POPL 2018.
Rondon, Kawaguchi, Jhala. “Liquid Types”. PLDI 2008.
Leijen. “Koka: Programming with Row-Polymorphic Effect Types”. MSFP 2014.
Lattner et al. “MLIR: Scaling Compiler Infrastructure for Domain Specific Computation”. CGO 2021.
Tillet, Kung, Cox. “Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations”. MAPL 2019.
Chen et al. “TVM: An Automated End-to-End Optimizing Compiler for Deep Learning”. OSDI 2018.
Zheng et al. “Ansor: Generating High-Performance Tensor Programs for Deep Learning”. OSDI 2020.
Ansel et al. “PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation”. ASPLOS 2024.
Trofin et al. “MLGO: a Machine Learning Guided Compiler Optimizations Framework”. 2021.
DWARF Debugging Information Format, Version 5. DWARF Standards Committee, 2017.
WebAssembly Core Specification 2.0. W3C, 2022. Component Model + GC proposals, W3C 2023.
Pierce + Turner. “Local Type Inference”. TOPLAS 22(1), 2000. (Bidirectional type checking.)
Mairson. “Deciding ML Typability is Complete for Deterministic Exponential Time”. POPL 1990.
Click + Paleczny. “A Simple Graph-Based Intermediate Representation”. IR Workshop 1995. (Sea of Nodes.)
Lopes, Lee, Nagarakatte, Regehr. “Alive2: Bounded Translation Validation for LLVM”. PLDI 2021.
Leroy. “Formal Verification of a Realistic Compiler”. CACM 52(7), 2009. (CompCert.)
Kumar, Myreen, Norrish, Owens. “CakeML: A Verified Implementation of ML”. POPL 2014.
Bastoul. “Code Generation in the Polyhedral Model Is Easier Than You Think”. PACT 2004.
Grosser, Größlinger, Lengauer. “Polly — Performing Polyhedral Optimizations on a Low-Level Intermediate Representation”. Parallel Processing Letters 22(4), 2012.
Ragan-Kelley et al. “Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines”. PLDI 2013.
Kohlbecker, Friedman, Felleisen, Duba. “Hygienic Macro Expansion”. LFP 1986.
Rosen, Wegman, Zadeck. “Global Value Numbers and Redundant Computations”. POPL 1988.
Wegman + Zadeck. “Constant Propagation with Conditional Branches”. TOPLAS 13(2), 1991.
Matsakis + Klock. “The Rust Language”. HILT 2014.
Bendersky, van Rossum, et al. PEP 617 — “New PEG Parser for CPython”. 2020.

Last reviewed 2026-05-16. Foundational Tier 1 reference for the Compute pillar — pairs with [[Compute/garbage-collection]] and [[Compute/os-scheduling-memory]] as the systems-language triad.

Compendium

Explorer

Compiler Design — Compute Reference

Compiler Design — Compute Reference

1. At a glance

2. Lexing (tokenization)

3. Parsing

Top-down

Bottom-up

PEG and combinators

Error recovery + diagnostics

4. Semantic analysis

5. Type systems

Static vs dynamic

Strong vs weak

Hindley-Milner inference

System F and dependent types

Linear and affine types

Refinement types

Effect systems

Subtyping and variance

Higher-kinded types, GADTs, type families

Type-inference cost

6. Intermediate Representations (IR)

AST

Three-address code (TAC)

SSA — Static Single Assignment

CFG — Control Flow Graph

CPS — Continuation-Passing Style

MLIR — Multi-Level IR

7. Optimizations

Local (within a basic block)

Loop

Inter-procedural

Profile-Guided Optimization (PGO)

Link-Time Optimization (LTO)

Autovectorization

Whole-program optimization and specialization

Other notable transforms

8. Register allocation

9. Code generation

Instruction selection

Instruction scheduling

Machine-specific lowering

10. JIT vs AOT

AOT — Ahead-of-Time

JIT — Just-in-Time

Hybrid

Deopt + on-stack replacement

11. Garbage collection

12. LLVM

LLVM IR details worth knowing

LLVM ecosystem tooling

13. GCC

MSVC

14. WebAssembly (Wasm)

WASI

Compiling to Wasm

15. ML compilers

Training vs inference compilers

16. Tree-sitter, LL(*), and incremental parsing

17. Debugger interface

18. Common pitfalls

19. Modern trends

Build-system co-evolution

19a. Compiler engineering in practice — case studies

Rust (rustc)

Swift (swiftc)

Hotspot JVM

V8

TypeScript

20. Cross-references

21. Citations

Graph View

Table of Contents