DWARF Expressions & Debug-Info Sub-DSLs Family Index


type: language-family-index
family: dwarf-expressions
languages_catalogued: 16
tags: [language-reference, family-index, dwarf-expressions, debug-info, unwind, cfi, codeview, mach-o, pdb, ehabi]

Family overview

Most engineers who work with native binaries do not realise it, but debug information is a programming language, or rather several of them, embedded in the debug and unwind sections of ELF, Mach-O, and PE files. The canonical example is DWARF expressions (DW_OP_* opcodes), the stack-machine bytecode that lives inside .debug_info, .debug_loc/.debug_loclists, and .debug_frame. A DWARF expression is what the debugger evaluates when you ask it “where is variable x at this PC?”: the answer is not a fixed address but a tiny program. Production debug info routinely embeds non-trivial programs in this language, generated automatically by GCC, Clang, and rustc.
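To make the “where is variable x at this PC?” query concrete, the following Python sketch models a location list as a set of (PC range, expression) entries and picks the expression that covers a given PC. The LocEntry type, the location_at helper, the PC ranges, and the x86-64 register choices are all illustrative assumptions; expressions are shown as disassembled text rather than encoded bytes.

```python
from typing import NamedTuple, Optional, Sequence


class LocEntry(NamedTuple):
    """One hypothetical location-list entry: [low_pc, high_pc) -> expression."""
    low_pc: int
    high_pc: int  # exclusive upper bound
    expr: str     # disassembled DWARF expression, kept as text for readability


# Assumed example: where an optimised function keeps variable x over time.
X_LOCATIONS = [
    LocEntry(0x1000, 0x1080, "DW_OP_reg0"),       # value lives in RAX
    LocEntry(0x1080, 0x10A0, "DW_OP_breg6 -24"),  # value lives in memory at RBP-24
]


def location_at(entries: Sequence[LocEntry], pc: int) -> Optional[str]:
    """Return the expression whose range covers pc, or None when no entry
    covers it (the variable has no recorded location at that PC)."""
    for entry in entries:
        if entry.low_pc <= pc < entry.high_pc:
            return entry.expr
    return None


print(location_at(X_LOCATIONS, 0x1010))  # DW_OP_reg0
print(location_at(X_LOCATIONS, 0x1090))  # DW_OP_breg6 -24
print(location_at(X_LOCATIONS, 0x2000))  # None
```

A linear scan is used deliberately: the sketch does not assume the entries are sorted or non-overlapping, and it simply returns the first match.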

DWARF itself has a long arc: DWARF 1 (1992, AT&T Bell Labs / UNIX International PLSIG) was the first attempt; DWARF 2 (1993) introduced the expression evaluator and became the widely deployed baseline; DWARF 3 (2005) and DWARF 4 (2010) added features incrementally; DWARF 5 (February 2017) is the current production standard, with the major split-DWARF / .debug_loclists / .debug_rnglists / .debug_str_offsets overhaul; and DWARF 6 is in working draft as of 2026. The most recent public snapshot is dwarf6-20250505-2228.pdf from 2025-05-05 on snapshots.sourceware.org/dwarfstd/. GCC and GDB have already begun supporting DWARF 6 features such as DW_AT_language_name / DW_AT_language_version ahead of finalisation.

The key insight, surfaced by Krister Walfridsson (2016) and earlier by the WOOT ‘11 paper “Exploiting the hard-working DWARF” (Oakley & Bratus), is that DWARF expressions are formally Turing-complete. The opcode set includes an unconditional branch (DW_OP_skip, signed 2-byte offset), a conditional branch (DW_OP_bra, which pops the top of the stack and branches if it is non-zero), full integer arithmetic, an operand stack with no depth limit imposed by the spec, and the DW_OP_call2 / DW_OP_call4 / DW_OP_call_ref opcodes, which invoke other DIEs’ expressions (across compile units in the DW_OP_call_ref case) and so make recursion possible. A maliciously crafted .debug_loc entry can drive a debugger, profiler, or crash reporter into infinite computation; weaponised DWARF has been demonstrated as a way to host trojan logic with no native executable code. This sidesteps NX/ASLR, because nothing is ever loaded into executable memory: the “VM” is the debugger or unwinder itself.
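The branching behaviour described above can be illustrated with a toy evaluator. This is a minimal Python sketch, not a conforming implementation: it handles only a handful of opcodes (with their numeric values as given in the DWARF specification), assumes little-endian operands and a 64-bit address size, and the evaluate function, its max_steps budget, and the two sample programs are hypothetical names introduced here.

```python
import struct

# Opcode values as assigned by the DWARF specification.
DW_OP_dup, DW_OP_drop, DW_OP_over, DW_OP_swap = 0x12, 0x13, 0x14, 0x16
DW_OP_minus, DW_OP_plus = 0x1C, 0x22
DW_OP_bra, DW_OP_skip = 0x28, 0x2F
DW_OP_lit0 = 0x30  # DW_OP_lit0 .. DW_OP_lit31 occupy 0x30 .. 0x4f
MASK = (1 << 64) - 1  # assumed 64-bit generic type


class StepLimitExceeded(Exception):
    """Raised when an expression runs longer than the consumer's budget."""


def evaluate(expr: bytes, max_steps: int = 10_000) -> int:
    stack, pc, steps = [], 0, 0
    while pc < len(expr):
        steps += 1
        if steps > max_steps:
            raise StepLimitExceeded(f"gave up after {max_steps} steps")
        op = expr[pc]
        pc += 1
        if DW_OP_lit0 <= op <= DW_OP_lit0 + 31:
            stack.append(op - DW_OP_lit0)
        elif op == DW_OP_dup:
            stack.append(stack[-1])
        elif op == DW_OP_drop:
            stack.pop()
        elif op == DW_OP_over:
            stack.append(stack[-2])
        elif op == DW_OP_swap:
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op in (DW_OP_plus, DW_OP_minus):
            top, second = stack.pop(), stack.pop()
            value = second + top if op == DW_OP_plus else second - top
            stack.append(value & MASK)
        elif op in (DW_OP_skip, DW_OP_bra):
            (offset,) = struct.unpack_from("<h", expr, pc)
            pc += 2  # offsets are relative to the byte after the operand
            if op == DW_OP_skip or stack.pop() != 0:
                pc += offset
            if not 0 <= pc <= len(expr):
                raise ValueError("branch target outside expression")
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack[-1]


# Loop computing 5 + 4 + 3 + 2 + 1 with the stack laid out as [acc, n].
SUM_5_TO_1 = (
    bytes([0x30, 0x35,                      # lit0, lit5
           DW_OP_swap, DW_OP_over, DW_OP_plus, DW_OP_swap,  # acc += n
           0x31, DW_OP_minus,               # n -= 1
           DW_OP_dup, DW_OP_bra])           # loop while n != 0
    + struct.pack("<h", -10)
    + bytes([DW_OP_drop])
)
# A single DW_OP_skip that jumps back onto itself: an infinite loop.
SPIN = bytes([DW_OP_skip]) + struct.pack("<h", -3)

print(evaluate(SUM_5_TO_1))  # 15
```

Calling evaluate(SPIN) raises StepLimitExceeded instead of hanging. A step budget of this kind is one plausible defence against the denial-of-service case; it is an assumption of this sketch rather than something the specification mandates.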

The same problem class produced parallel-evolution sibling languages on every other platform. Microsoft has CodeView symbol records (S_*) plus the FPO (Frame Pointer Omission) stream and the broader PDB MSF container, which have no formal public specification and are documented mainly through the LLVM project’s reverse-engineering effort. Apple has Mach-O __compact_unwind (a denser, lookup-friendly alternative to DWARF CFI, with arch-specific 32-bit encodings per function range, falling back to __eh_frame for cases the compact vocabulary cannot express). The Linux/BSD world uses .eh_frame CIE/FDE (Itanium C++ ABI lineage; same DW_CFA_* instruction set as DWARF CFI but living in a loadable section). 32-bit ARM defines its own format in EHABI. Windows x64 has .pdata / .xdata with UWOP_* codes optimised for SEH interop. WebAssembly adopted DWARF 5 with its own quirks. And LLVM IR has its own debug-info layer (!DILocation, !DICompileUnit, …) sitting above DWARF: a small DSL the compiler manipulates internally before emitting a target debug format. Security-wise, libdwarf, elfutils libdw, and binutils libbfd have all shipped CVE fixes for malformed-expression attacks; the implicit attack surface is “any program that loads debug info”: debuggers, profilers, perf, eBPF tooling, sanitisers, crash reporters.
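To show what interpreting the DW_CFA_* instruction set shared by .debug_frame and .eh_frame involves, here is a small Python sketch that replays CFI bytes for a typical x86-64 prologue (push rbp; mov rbp, rsp). It supports only a few opcodes, assumes a code alignment factor of 1 and a data alignment factor of -8, and the unwind_row helper and its dictionary layout are hypothetical. Register numbers follow the x86-64 DWARF mapping (6 = rbp, 7 = rsp, 16 = return address).

```python
DW_CFA_nop = 0x00
DW_CFA_def_cfa = 0x0C
DW_CFA_def_cfa_register = 0x0D
DW_CFA_def_cfa_offset = 0x0E
# These three keep the opcode in the top two bits and an operand in the low six.
DW_CFA_advance_loc, DW_CFA_offset, DW_CFA_restore = 0x40, 0x80, 0xC0


def read_uleb128(data: bytes, pos: int):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def _execute(instrs, state, initial_rules, stop_at, code_align, data_align):
    loc = pos = 0
    while pos < len(instrs):
        op = instrs[pos]
        pos += 1
        high, low = op & 0xC0, op & 0x3F
        if high == DW_CFA_advance_loc:
            loc += low * code_align
            if loc > stop_at:  # later instructions describe later PCs
                return
        elif high == DW_CFA_offset:
            factored, pos = read_uleb128(instrs, pos)
            state["saved"][low] = factored * data_align  # offset from CFA
        elif high == DW_CFA_restore:
            if low in initial_rules:
                state["saved"][low] = initial_rules[low]
            else:
                state["saved"].pop(low, None)
        elif op == DW_CFA_def_cfa:
            state["cfa_reg"], pos = read_uleb128(instrs, pos)
            state["cfa_off"], pos = read_uleb128(instrs, pos)
        elif op == DW_CFA_def_cfa_register:
            state["cfa_reg"], pos = read_uleb128(instrs, pos)
        elif op == DW_CFA_def_cfa_offset:
            state["cfa_off"], pos = read_uleb128(instrs, pos)
        elif op != DW_CFA_nop:
            raise ValueError(f"unsupported CFA opcode 0x{op:02x}")


def unwind_row(cie_initial, fde_instrs, pc_offset, code_align=1, data_align=-8):
    """Return the CFA rule and saved-register offsets in effect at pc_offset."""
    state = {"cfa_reg": None, "cfa_off": None, "saved": {}}
    _execute(cie_initial, state, {}, pc_offset, code_align, data_align)
    initial_rules = dict(state["saved"])
    _execute(fde_instrs, state, initial_rules, pc_offset, code_align, data_align)
    return state


# CIE: CFA = rsp + 8, return address saved at CFA - 8.
CIE_INITIAL = bytes([0x0C, 0x07, 0x08, 0x90, 0x01])
# FDE: after push rbp (1 byte) CFA = rsp + 16 and rbp is at CFA - 16;
# after mov rbp, rsp (3 more bytes) the CFA is computed from rbp.
FDE_BODY = bytes([0x41, 0x0E, 0x10, 0x86, 0x02, 0x43, 0x0D, 0x06])

for offset in (0, 1, 4):
    print(offset, unwind_row(CIE_INITIAL, FDE_BODY, offset))
```

The three printed rows correspond to function entry, the point just after the push, and the point just after the frame pointer is established. A real unwinder would also handle the remaining DW_CFA_* opcodes, including the ones that embed full DWARF expressions.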

In our deep library

None catalogued. Debug-info sub-DSLs do not have standalone deep-library notes; they are embedded sublanguages of binary container formats.

This note is a split-out from assembly-and-encoding (which catalogues the binary/object/executable formats themselves: ELF, Mach-O, PE/COFF, etc.). That note covers the containers; this one covers the embedded sub-languages within them. Cross-reference:

  • assembly-and-encoding — parent / sibling family. ELF, Mach-O, PE/COFF, DWARF-the-format itself live there.
  • notation-spec — adjacent (DWARF the spec is itself a tabular notation language).
  • c, cpp — the typical compiler frontends that emit DWARF.
  • rust — increasingly the source of pathological debug info, because generic monomorphisation produces enormous numbers of DIEs and complex location expressions for closure captures.
  • gpu-and-shaders — PTX has its own debug-info conventions; AMDGPU has DWARF extensions for heterogeneous (host+GPU) debugging.
  • wasm — WebAssembly DWARF adoption.

Tier 3 family table

| Language / DSL | First appeared | Origin | Container | Status (2026) | URL |
|---|---|---|---|---|---|
| DWARF Expressions (DW_OP_*) | 1993 (DWARF 2) | UNIX International PLSIG → DWARF Standards Committee | .debug_info, .debug_loc/.debug_loclists (ELF, Mach-O, PE/COFF) | Active; DWARF 5 (Feb 2017) is the production baseline; DWARF 6 working draft snapshot 2025-05-05 | https://dwarfstd.org/ |
| DWARF Call Frame Information (DW_CFA_*) | 1993 (DWARF 2) | UNIX International PLSIG | .debug_frame (ELF) | Active; the canonical unwind language | https://dwarfstd.org/doc/DWARF5.pdf |
| DWARF Location Lists (.debug_loclists) | DWARF 5 (2017) replaced earlier .debug_loc | DWARF Standards Committee | .debug_loclists (DWARF 5+); .debug_loc (DWARF 2–4) | Active | https://dwarfstd.org/doc/DWARF5.pdf |
| DWARF Macro Information (DW_MACRO_*) | DWARF 5 (2017), reworked from DWARF 4 .debug_macinfo | DWARF Standards Committee | .debug_macro | Active, niche; most builds disable it | https://dwarfstd.org/doc/DWARF5.pdf |
| DW_OP_call_ref + GNU extensions (DW_OP_GNU_entry_value, DW_OP_GNU_const_type, DW_OP_GNU_implicit_pointer, …) | GNU extensions ~2010+; some folded into DWARF 5 | GCC / GDB community | .debug_info and friends | Active; vendor-specific opcode subdialects | https://sourceware.org/elfutils/DwarfExtensions |
| Linker index sublanguages (.debug_aranges, .debug_pubnames, .debug_pubtypes, .debug_names) | DWARF 2 (pubnames) → DWARF 5 (debug_names accelerator table) | DWARF Standards Committee | ELF index sections | Active; .debug_names (DWARF 5) supersedes .gdb_index and the older pub* sections | https://dwarfstd.org/doc/DWARF5.pdf |
| CodeView Symbol Records (S_*) | ~1989 (Microsoft C 6.0); reverse-engineered ~2015+ by LLVM | Microsoft | .debug$S (object) / DBI module streams (PDB) | Active; sole MSVC debug format. Officially undocumented, only LLVM’s reverse-engineered docs exist | https://llvm.org/docs/PDB/CodeViewSymbols.html |
| PDB Stream Format (MSF) + FPO stream | PDB ~1990s; FPO (FPO_DATA) for x86 frame-pointer-omitted unwind | Microsoft | .pdb files (separate from PE image) | Active; FPO is x86-32-only and largely legacy | https://llvm.org/docs/PDB/index.html |
| Mach-O __compact_unwind | ~2010 (Snow Leopard / LLVM libunwind era) | Apple | __TEXT,__unwind_info (Mach-O) | Active; primary unwind format on Darwin, falls back to __eh_frame for hard cases | https://faultlore.com/blah/compact-unwinding/ |
| .eh_frame CIE/FDE | ~1999 (Itanium C++ ABI / LSB) | HP/Intel/Itanium ABI group | .eh_frame / .eh_frame_hdr (loadable ELF section) | Active; same DW_CFA_* instructions as DWARF CFI but loaded at runtime for C++ EH and _Unwind_Backtrace | https://itanium-cxx-abi.github.io/cxx-abi/exceptions.pdf |
| ARM EHABI exception tables | 2007 (first ratified) | Arm Ltd. | .ARM.exidx + .ARM.extab (ELF, AArch32) | Active for AArch32 only; AArch64 uses standard DWARF/.eh_frame | https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst |
| Windows x64 unwind codes (UWOP_*) | 2003 (Windows Server 2003 x64 / AMD64 ABI) | Microsoft | .pdata (RUNTIME_FUNCTION) + .xdata (UNWIND_INFO) in PE32+ | Active; SEH-integrated; UNWIND_INFO v2 added UWOP_EPILOG | https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64 |
| ARM64 / AArch64 Windows unwind codes | ~2017 (Windows on ARM64) | Microsoft | .pdata / .xdata (PE) | Active | https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling |
| WebAssembly DWARF | ~2019 | WebAssembly CG + LLVM | DWARF 5 sections inside Wasm modules; offsets resolved against the code section | Active; Chrome DevTools and Emscripten use it | https://yurydelendik.github.io/webassembly-dwarf/ |
| LLVM IR debug-info metadata (!DILocation, !DISubprogram, !DICompileUnit, …) | 2014 (the metadata rewrite, replacing MDNode debug nodes) | LLVM project (Adrian Prantl et al.) | LLVM IR .ll / bitcode | Active; the IR-side debug DSL upstream of every target backend | https://llvm.org/docs/SourceLevelDebugging.html |
| AMDGPU / heterogeneous DWARF extensions | LLVM 10+ (~2020) | AMD + LLVM | DWARF in ROCm/HIP/HSA targets; adds segment-aware location descriptions | Active, vendor extension trying to upstream into DWARF 6 | https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html |

Notable threads

  • DWARF expressions are formally Turing-complete. The DW_OP_bra (conditional branch on stack-top non-zero) plus DW_OP_skip (unconditional branch, signed 2-byte offset) opcodes provide branching; the DWARF spec places no upper bound on the operand stack; and DW_OP_call2 / DW_OP_call4 / DW_OP_call_ref allow one DIE’s expression to invoke another DIE’s expression — across compile units in the call_ref case — which is sufficient for recursion. Krister Walfridsson and the WOOT ‘11 paper “Exploiting the hard-working DWARF” (Oakley & Bratus) both demonstrated that real malicious .debug_loc or .eh_frame expressions can drive a debugger or unwinder into arbitrary computation. This is not theoretical: weaponised DWARF lets you host trojan logic with no native executable code, evading NX/ASLR because the “VM” is the debugger itself.

  • The location-list pattern is a mini-program-per-variable. A single C variable does not have a single location for its lifetime: across an optimised function, variable x may live in register RAX over PC range 0x1000–0x1080, then in stack slot [RBP-24] over 0x1080–0x10A0, then be split between the low half of XMM0 and EBX (split storage) over 0x10A0–0x10F0. DWARF 5 .debug_loclists is the table of (PC range, expression) pairs that encodes this. Each expression itself is a mini program. The volume of these expressions in a -O2 Rust binary with heavy generics is one reason debug info now routinely outweighs the code section.

  • CFI is its own assembly language. .debug_frame (DWARF) and .eh_frame (loadable, Itanium ABI) hold a stream of DW_CFA_* instructions executed by the unwinder to virtually reconstruct register state at any PC: DW_CFA_advance_loc (advance the simulated PC), DW_CFA_def_cfa_offset (change the offset at which the canonical frame address sits relative to its base register), DW_CFA_offset (record where a callee-saved register was spilled), DW_CFA_restore (revert a register’s rule to the one set by the CIE’s initial instructions). Stack unwinding, whether for C++ exceptions, panics, profiler back-traces, or eBPF stack walks, is literally interpretation of this instruction stream.

  • Mach-O __compact_unwind is Apple’s denser alternative. Instead of streaming CFI instructions, compact unwind stores a fixed-size 32-bit opcode per PC range in a two-level page table indexed by function offset. Most functions resolve to one opcode for the whole function (the lookup key in LLVM’s structures is literally a function offset). The vocabulary is closed and arch-specific (ARM64 has a particularly tight opcode set because the AArch64 ABI is stricter), and any function whose unwind shape does not fit falls back to a full DWARF FDE in __eh_frame. The two sections are complementary, not duplicate.

  • Windows x64 UWOP_* is yet a third design, optimised for SEH. The .pdata section is an array of RUNTIME_FUNCTION (BeginAddress / EndAddress / UnwindInfoAddress RVAs); .xdata holds the UNWIND_INFO structs. Each UNWIND_CODE (e.g. UWOP_PUSH_NONVOL, UWOP_ALLOC_LARGE, UWOP_SAVE_NONVOL, UWOP_EPILOG in v2) describes one prolog/epilog operation. The whole format is built around making Structured Exception Handling dispatch fast — the unwinder must run during RaiseException synchronously, with no debugger present, so simplicity wins over expressiveness.

  • DWARF 5 → DWARF 6. DWARF 5 (Feb 2017) was the major modern overhaul: split DWARF (.dwo / .dwp), .debug_loclists, .debug_rnglists, .debug_str_offsets, the .debug_names accelerator table, plus standardised versions of opcodes that had previously been GNU extensions, such as DW_OP_implicit_pointer and DW_OP_entry_value. DWARF 6 is in working draft: the public snapshot is dwarf6-20250505-2228.pdf (2025-05-05) on snapshots.sourceware.org, and the live spec lives in git at git.dwarfstd.org/dwarf-spec. DWARF 6 is mostly cleanups, bug fixes, and modest features (e.g. DW_AT_language_name / DW_AT_language_version decoupled from the old monolithic DW_LANG_* enum, which GCC has already started emitting). Major adoption still waits on the next round of toolchains. libdwarf 2.3.1 (2026-03-04) is the current consumer-side reference implementation.

  • Security: the attack surface is “any program that loads debug info.” libdwarf, elfutils libdw, and binutils libbfd have all shipped CVE fixes from malformed expressions and corrupt abbreviation lists. A 2025 libdwarf change made the library refuse rather than loop on duplicate-attribute abbreviation tables — a previous DoS vector. The systemic issue is that debug parsers run before anyone authenticates the binary (debuggers attach, profilers ingest, crash-reporters symbolicate, eBPF tools symbolicate kernel addresses), which means a malicious crash dump or core file can attack the analyst’s machine. The “Exploiting the hard-working DWARF” paper showed this years before it became a CVE pattern.

  • LLVM IR debug-info metadata is a DSL above DWARF. LLVM IR carries debug info as metadata nodes (!DILocation(line: 42, column: 7, scope: !12), !DICompileUnit(language: DW_LANG_C_plus_plus_14, ...), !DISubprogram(...), !DILocalVariable(...), etc.) — a small structured DSL the optimiser must preserve as it transforms IR. Backends then lower this metadata to DWARF, CodeView, or compact-unwind as appropriate for the target. So in modern toolchains there are at least two debug-info languages stacked: LLVM IR metadata at the compiler-internal layer, and DWARF/CodeView/etc. at the binary layer. Bugs in either (“optimised-out variables”, missing line info, wrong inlined-frame attribution) are notoriously hard because they require thinking in both languages simultaneously.
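
For contrast with the DWARF-style interpreters, the Windows x64 UNWIND_CODE array mentioned in the notes above can be decoded with almost no state. The Python sketch below assumes the documented two-byte slot layout (a prolog offset byte, then UnwindOp in the low nibble and OpInfo in the high nibble) and handles only three operations; decode_unwind_codes and the sample byte strings are hypothetical names and data for illustration.

```python
import struct

UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_LARGE = 1
UWOP_ALLOC_SMALL = 2

# Register numbering used by OpInfo for UWOP_PUSH_NONVOL.
X64_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
            "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_unwind_codes(raw: bytes):
    """Decode a subset of UNWIND_CODE slots into (prolog_offset, action, arg)."""
    ops = []
    pos = 0
    while pos < len(raw):
        code_offset, op_byte = raw[pos], raw[pos + 1]
        unwind_op, op_info = op_byte & 0x0F, op_byte >> 4
        pos += 2
        if unwind_op == UWOP_PUSH_NONVOL:
            ops.append((code_offset, "push", X64_REGS[op_info]))
        elif unwind_op == UWOP_ALLOC_SMALL:
            # Allocation of 8..128 bytes, encoded as (size - 8) / 8.
            ops.append((code_offset, "alloc", op_info * 8 + 8))
        elif unwind_op == UWOP_ALLOC_LARGE:
            if op_info == 0:
                # Next slot holds size / 8 as a 16-bit value.
                (scaled,) = struct.unpack_from("<H", raw, pos)
                size, pos = scaled * 8, pos + 2
            else:
                # Next two slots hold the unscaled 32-bit size.
                (size,) = struct.unpack_from("<I", raw, pos)
                pos += 4
            ops.append((code_offset, "alloc", size))
        else:
            raise NotImplementedError(f"UWOP code {unwind_op} not in this sketch")
    return ops


# Assumed prolog: push rbp (ends at offset 1); sub rsp, 0x20 (ends at offset 5).
# Codes are stored in reverse order of the prolog, latest operation first.
PROLOG_CODES = bytes([0x05, 0x32, 0x01, 0x50])
print(decode_unwind_codes(PROLOG_CODES))
# [(5, 'alloc', 32), (1, 'push', 'rbp')]
```

Because each slot names one prolog operation, undoing the prolog is a single pass over this list rather than the execution of a general instruction stream, which matches the simplicity-over-expressiveness trade-off described for this format.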

Citations