DWARF Expressions & Debug-Info Sub-DSLs Family Index
type: language-family-index
family: dwarf-expressions
languages_catalogued: 16
tags: [language-reference, family-index, dwarf-expressions, debug-info, unwind, cfi, codeview, mach-o, pdb, ehabi]
Family overview
Most engineers who work with native binaries do not realise it, but debug information is a programming language (several of them, in fact), embedded in dedicated sections of ELF, Mach-O, and PE/COFF files or in side files such as Windows PDBs. The canonical example is DWARF expressions (DW_OP_* opcodes), the stack-machine bytecode that lives inside .debug_info, .debug_loc/.debug_loclists, and .debug_frame. A DWARF expression is what the debugger evaluates when you ask it “where is variable x at this PC?”: the answer is not a fixed address but a tiny program. Production debug info generated by GCC, Clang, and rustc routinely contains non-trivial programs in this language.
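To make “a tiny program” concrete, here is a minimal sketch of a DW_OP_* evaluator in Python. It assumes registers arrive as a dict keyed by DWARF register number and that the frame base has already been computed (a real debugger would evaluate DW_AT_frame_base for that); only a handful of opcodes are handled, and values are unbounded Python ints rather than target-width addresses. Opcode numbers follow the DWARF 5 encoding tables; the function names are illustrative, not any library's API.

```python
from __future__ import annotations

# Minimal DW_OP_* evaluator sketch (illustrative subset, not a real debugger API).
# Opcode values follow the DWARF 5 encoding tables.
DW_OP_minus = 0x1C
DW_OP_plus = 0x22
DW_OP_plus_uconst = 0x23
DW_OP_lit0 = 0x30          # DW_OP_lit0..DW_OP_lit31 occupy 0x30..0x4f
DW_OP_breg0 = 0x70         # DW_OP_breg0..DW_OP_breg31 occupy 0x70..0x8f
DW_OP_fbreg = 0x91
DW_OP_stack_value = 0x9F


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode unsigned LEB128 at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_sleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode signed LEB128 at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:                  # sign bit of the final 7-bit group
                result -= 1 << shift
            return result, pos


def evaluate(expr: bytes, regs: dict[int, int], frame_base: int) -> tuple[int, bool]:
    """Evaluate a location expression; return (result, is_value).

    is_value is False when the result is a memory address, and True when a
    DW_OP_stack_value says the result is the variable's value itself.
    """
    stack: list[int] = []
    pos, is_value = 0, False
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if DW_OP_lit0 <= op <= DW_OP_lit0 + 31:
            stack.append(op - DW_OP_lit0)                   # push small literal
        elif DW_OP_breg0 <= op <= DW_OP_breg0 + 31:
            offset, pos = read_sleb128(expr, pos)
            stack.append(regs[op - DW_OP_breg0] + offset)   # register + offset
        elif op == DW_OP_fbreg:
            offset, pos = read_sleb128(expr, pos)
            stack.append(frame_base + offset)               # frame base + offset
        elif op == DW_OP_plus_uconst:
            addend, pos = read_uleb128(expr, pos)
            stack.append(stack.pop() + addend)
        elif op == DW_OP_plus:
            top = stack.pop()
            stack.append(stack.pop() + top)
        elif op == DW_OP_minus:
            top = stack.pop()
            stack.append(stack.pop() - top)                 # second entry minus top
        elif op == DW_OP_stack_value:
            is_value = True
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} is outside this sketch")
    return stack[-1], is_value


# DW_OP_fbreg -24 ("x lives 24 bytes below the frame base"); SLEB128(-24) = 0x68
print(hex(evaluate(bytes([DW_OP_fbreg, 0x68]), {}, 0x7FFE0000)[0]))  # prints 0x7ffdffe8
```

The result is an address, not a value: the debugger still has to read memory there. Appending DW_OP_stack_value flips that interpretation, which is how optimised code describes a variable that exists only as a computed value.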
DWARF itself has a long arc: DWARF 1 (1992, AT&T Bell Labs / UNIX International PLSIG) was the first attempt; DWARF 2 (1993) generalised DWARF 1's small set of location atoms into the DW_OP_* stack-machine evaluator and became the widely deployed baseline; DWARF 3 (2005) and 4 (2010) added incrementally; DWARF 5 (February 2017) is the current production standard, with the major split-DWARF / .debug_loclists / .debug_rnglists / .debug_str_offsets overhaul; and DWARF 6 is in working draft as of 2026, with the most recent public snapshot being dwarf6-20250505-2228.pdf (2025-05-05) on snapshots.sourceware.org/dwarfstd/. GCC has already begun emitting DWARF 6 features such as DW_AT_language_name / DW_AT_language_version ahead of finalisation.
The key insight, surfaced by Krister Walfridsson (2016) and earlier by the WOOT ’11 paper “Exploiting the hard-working DWARF” (Oakley & Bratus), is that DWARF expressions are formally Turing-complete. The opcode set includes an unconditional branch (DW_OP_skip, signed 2-byte offset), a conditional branch (DW_OP_bra, which pops the stack top and branches if it is non-zero), full integer arithmetic, an operand stack the spec never bounds, and DW_OP_call2 / DW_OP_call4 / DW_OP_call_ref, which evaluate another DIE’s location expression (across compilation units in the call_ref case) and so make recursion possible. A maliciously crafted .debug_loc or .eh_frame entry can drive a debugger, profiler, unwinder, or crash reporter into unbounded computation, and weaponised DWARF has been demonstrated as a way to host trojan logic with no native executable code. Nothing is ever loaded into executable memory, so NX-style defences against injected code never come into play: the “VM” is the unwinder or debugger itself.
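The branching half of the Turing-completeness argument fits in a few dozen lines. The sketch below (Python, a tiny opcode subset, little-endian branch operands assumed) runs a hand-assembled DW_OP_bra / DW_OP_skip loop that sums 1..n, and charges every executed opcode against a step budget, which is one plausible way a consumer could refuse to spin forever on hostile input. The budget, the FuelExhausted exception, and the sum_to helper are hypothetical illustrations, not taken from any real debugger.

```python
from __future__ import annotations

import struct

# Opcode values from the DWARF 5 encoding tables (a tiny subset).
DW_OP_const1u = 0x08
DW_OP_over = 0x14
DW_OP_swap = 0x16
DW_OP_minus = 0x1C
DW_OP_plus = 0x22
DW_OP_bra = 0x28
DW_OP_skip = 0x2F
DW_OP_lit0 = 0x30


class FuelExhausted(RuntimeError):
    """Hypothetical error: the expression ran past its step budget."""


def run(expr: bytes, fuel: int = 10_000) -> int:
    """Evaluate expr, charging one unit of fuel per executed opcode."""
    stack: list[int] = []
    pos = 0
    while pos < len(expr):
        if fuel == 0:
            raise FuelExhausted(f"step budget exhausted at offset {pos}")
        fuel -= 1
        op = expr[pos]
        pos += 1
        if DW_OP_lit0 <= op <= DW_OP_lit0 + 31:
            stack.append(op - DW_OP_lit0)
        elif op == DW_OP_const1u:
            stack.append(expr[pos])                  # 1-byte unsigned operand
            pos += 1
        elif op == DW_OP_over:
            stack.append(stack[-2])                  # copy second entry to top
        elif op == DW_OP_swap:
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == DW_OP_plus:
            top = stack.pop()
            stack.append(stack.pop() + top)
        elif op == DW_OP_minus:
            top = stack.pop()
            stack.append(stack.pop() - top)
        elif op in (DW_OP_skip, DW_OP_bra):
            (offset,) = struct.unpack_from("<h", expr, pos)  # signed 2-byte operand
            pos += 2                                 # offset counts from here
            if op == DW_OP_skip:
                pos += offset
            elif stack.pop() != 0:                   # DW_OP_bra always pops
                pos += offset
            if not 0 <= pos <= len(expr):
                raise ValueError("branch target outside the expression")
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} is outside this sketch")
    return stack[-1]


def sum_to(n: int) -> bytes:
    """Hand-assembled loop computing 1 + 2 + ... + n, for 0 <= n <= 255."""
    def branch(op: int, offset: int) -> bytes:
        return bytes([op]) + struct.pack("<h", offset)

    return (
        bytes([DW_OP_const1u, n, DW_OP_lit0])    # 0:  stack = [n, acc]
        + bytes([DW_OP_over])                    # 3:  loop: push copy of n
        + branch(DW_OP_bra, 3)                   # 4:  n != 0 -> body at 10
        + branch(DW_OP_skip, 9)                  # 7:  n == 0 -> end at 19
        + bytes([DW_OP_over, DW_OP_plus,         # 10: acc += n
                 DW_OP_swap, DW_OP_lit0 + 1,     #     then n -= 1,
                 DW_OP_minus, DW_OP_swap])       #     leaving [n - 1, acc]
        + branch(DW_OP_skip, -16)                # 16: back to loop at 3
    )                                            # 19: end, acc on top


print(run(sum_to(5)))  # prints 15
```

sum_to(5) finishes after 50 opcodes, whereas a one-instruction self-loop (DW_OP_skip with offset -3) would run forever without the budget. Nothing in the DWARF spec mandates such a limit, so whether a given consumer has one is an implementation choice.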
The same problem class produced independently evolved sibling languages on every other platform. Microsoft has CodeView symbol records (S_*) plus the FPO (Frame Pointer Omission) stream, both carried in the broader PDB/MSF container; Microsoft never published a formal specification, and LLVM’s documentation (built from reverse engineering and Microsoft’s partial microsoft-pdb source release) is the de facto reference. Apple has Mach-O compact unwind (__compact_unwind in object files, __unwind_info in linked images), a denser, lookup-friendly alternative to DWARF CFI that uses architecture-specific 32-bit encodings and falls back to __eh_frame for cases the compact vocabulary cannot express. The Linux/BSD world uses .eh_frame CIE/FDE records (Itanium C++ ABI lineage): the same DW_CFA_* instruction set as DWARF CFI, but living in a loadable section. 32-bit ARM defines its own format in EHABI. Windows x64 has .pdata / .xdata with UWOP_* codes optimised for SEH interop. WebAssembly adopted DWARF 5 with its own quirks. And LLVM IR has its own debug-info layer (!DILocation, !DICompileUnit, …) sitting above DWARF: a small DSL the compiler manipulates internally before emitting a target debug format. Security-wise, libdwarf, elfutils libdw, and binutils libbfd have all shipped CVE fixes for malformed-expression attacks; the implicit attack surface is “any program that loads debug info”, which includes debuggers, profilers, perf, eBPF tooling, sanitisers, and crash reporters.
In our deep library
None catalogued. Debug-info sub-DSLs do not have standalone deep-library notes; they are embedded sublanguages of binary container formats.
This note is a split-out from assembly-and-encoding (which catalogues the binary/object/executable formats themselves: ELF, Mach-O, PE/COFF, etc.). That note covers the containers; this one covers the embedded sub-languages within them. Cross-reference:
- assembly-and-encoding — parent / sibling family. ELF, Mach-O, PE/COFF, DWARF-the-format itself live there.
- notation-spec — adjacent (DWARF the spec is itself a tabular notation language).
- c, cpp — the typical compiler frontends that emit DWARF.
- rust — increasingly the source of pathological debug info, because generic monomorphisation produces enormous numbers of DIEs and complex location expressions for closure captures.
- gpu-and-shaders — PTX has its own debug-info conventions; AMDGPU has DWARF extensions for heterogeneous (host+GPU) debugging.
- wasm — WebAssembly DWARF adoption.
Tier 3 family table
| Language / DSL | First appeared | Origin | Container | Status (2026) | URL |
|---|---|---|---|---|---|
| DWARF Expressions (DW_OP_*) | 1993 (DWARF 2) | UNIX International PLSIG → DWARF Standards Committee | .debug_info, .debug_loc/.debug_loclists (ELF, Mach-O, PE/COFF) | Active; DWARF 5 (Feb 2017) is the production baseline; DWARF 6 working-draft snapshot 2025-05-05 | https://dwarfstd.org/ |
| DWARF Call Frame Information (DW_CFA_*) | 1993 (DWARF 2) | UNIX International PLSIG | .debug_frame (ELF) | Active; the canonical unwind language | https://dwarfstd.org/doc/DWARF5.pdf |
| DWARF Location Lists (.debug_loc / .debug_loclists) | 1993 (DWARF 2, as .debug_loc); replaced by .debug_loclists in DWARF 5 (2017) | DWARF Standards Committee | .debug_loclists (DWARF 5+); .debug_loc (DWARF 2–4) | Active | https://dwarfstd.org/doc/DWARF5.pdf |
| DWARF Macro Information (DW_MACRO_*) | DWARF 5 (2017), superseding .debug_macinfo (DWARF 2–4) | DWARF Standards Committee | .debug_macro | Active but niche; most builds omit it (GCC emits it only at -g3) | https://dwarfstd.org/doc/DWARF5.pdf |
| DW_OP_call_ref (standard since DWARF 3) + GNU extensions (DW_OP_GNU_entry_value, DW_OP_GNU_const_type, DW_OP_GNU_implicit_pointer, …) | GNU extensions ~2010+; several standardised in DWARF 5 (DW_OP_entry_value, DW_OP_const_type, DW_OP_implicit_pointer) | GCC / GDB community | .debug_info and friends | Active; vendor-specific opcode subdialects | https://sourceware.org/elfutils/DwarfExtensions |
| Lookup / accelerator index tables (.debug_aranges, .debug_pubnames, .debug_pubtypes, .debug_names) | DWARF 2 (aranges, pubnames) → DWARF 5 (.debug_names accelerator table) | DWARF Standards Committee | ELF index sections | Active; .debug_names (DWARF 5) supersedes .gdb_index and the older pub* sections | https://dwarfstd.org/doc/DWARF5.pdf |
| CodeView Symbol Records (S_*) | ~1989 (Microsoft C 6.0); reverse-engineered ~2015+ by LLVM | Microsoft | .debug$S (object) / DBI module streams (PDB) | Active; sole MSVC debug format. No formal Microsoft specification; LLVM’s docs are the de facto reference | https://llvm.org/docs/PDB/CodeViewSymbols.html |
| PDB Stream Format (MSF) + FPO stream | PDB ~1990s; FPO (FPO_DATA) for x86 frame-pointer-omitted unwind | Microsoft | .pdb files (separate from PE image) | Active; FPO is x86-32-only and largely legacy | https://llvm.org/docs/PDB/index.html |
| Mach-O __compact_unwind | ~2009 (Mac OS X 10.6 Snow Leopard era) | Apple | __LD,__compact_unwind (object files) → __TEXT,__unwind_info (linked images) | Active; primary unwind format on Darwin, falls back to __eh_frame for hard cases | https://faultlore.com/blah/compact-unwinding/ |
| .eh_frame CIE/FDE | ~1999 (Itanium C++ ABI / LSB) | HP/Intel/Itanium ABI group | .eh_frame / .eh_frame_hdr (loadable ELF section) | Active; same DW_CFA_* instructions as DWARF CFI but loaded at runtime for C++ EH and _Unwind_Backtrace | https://itanium-cxx-abi.github.io/cxx-abi/exceptions.pdf |
| ARM EHABI exception tables | 2007 (first ratified) | Arm Ltd. | .ARM.exidx + .ARM.extab (ELF, AArch32) | Active for AArch32 only; AArch64 uses standard DWARF/.eh_frame | https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst |
| Windows x64 unwind codes (UWOP_*) | ~2003–2005 (AMD64 ABI; shipped with the x64 editions of Windows XP / Server 2003 in 2005) | Microsoft | .pdata (RUNTIME_FUNCTION) + .xdata (UNWIND_INFO) in PE32+ | Active; SEH-integrated; UNWIND_INFO v2 added UWOP_EPILOG | https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64 |
| ARM64 / AArch64 Windows unwind codes | ~2017 (Windows on ARM64) | Microsoft | .pdata / .xdata (PE) | Active | https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling |
| WebAssembly DWARF | ~2019 | WebAssembly CG + LLVM | DWARF 5 sections inside Wasm modules; offsets resolved against the code section | Active; Chrome DevTools and Emscripten use it | https://yurydelendik.github.io/webassembly-dwarf/ |
| LLVM IR debug-info metadata (!DILocation, !DISubprogram, !DICompileUnit, …) | 2014–2015 (the metadata rewrite that replaced generic MDNode tuples with specialised DI* node types) | LLVM project (Adrian Prantl et al.) | LLVM IR .ll / bitcode | Active; the IR-side debug DSL upstream of every target backend | https://llvm.org/docs/SourceLevelDebugging.html |
| AMDGPU / heterogeneous DWARF extensions | LLVM 10+ (~2020) | AMD + LLVM | DWARF in ROCm/HIP/HSA targets; adds segment-aware location descriptions | Active, vendor extension trying to upstream into DWARF 6 | https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html |
Notable threads
- DWARF expressions are formally Turing-complete. The DW_OP_bra (conditional branch on stack-top non-zero) plus DW_OP_skip (unconditional branch, signed 2-byte offset) opcodes provide branching; the DWARF spec places no upper bound on the operand stack; and DW_OP_call2 / DW_OP_call4 / DW_OP_call_ref allow one DIE’s expression to invoke another DIE’s expression (across compilation units in the call_ref case), which is sufficient for recursion. Krister Walfridsson and the WOOT ’11 paper “Exploiting the hard-working DWARF” (Oakley & Bratus) both documented this; Oakley & Bratus went further and showed that crafted .eh_frame exception-handling data can drive the runtime unwinder into arbitrary computation, and crafted .debug_loc expressions pose the same risk to debuggers. This is not theoretical: weaponised DWARF lets you host trojan logic with no native executable code, sidestepping NX because the “VM” is the unwinder or debugger itself.
- The location-list pattern is a mini-program-per-variable. A single C variable does not have a single location for its lifetime: across an optimised function, variable x may live in register RAX over PC range 0x1000–0x1080, then in stack slot [RBP-24] over 0x1080–0x10A0, then be split between the low half of XMM0 and EBX over 0x10A0–0x10F0. DWARF 5 .debug_loclists is the table of (PC range, expression) pairs that encodes this, and each expression is itself a mini program. The volume of these expressions in an optimised (-C opt-level=2) Rust binary with heavy generics is one reason debug info now routinely outweighs the code section.
- CFI is its own assembly language. .debug_frame (DWARF) and .eh_frame (loadable, Itanium ABI) hold a stream of DW_CFA_* instructions that the unwinder executes to reconstruct register state virtually at any PC: DW_CFA_advance_loc (advance the simulated PC), DW_CFA_def_cfa_offset (change the canonical frame address's offset from its base register), DW_CFA_offset (record where a callee-saved register was spilled, relative to the CFA), DW_CFA_restore (revert a register to its CIE-defined rule). Stack unwinding for C++ exceptions, panics, profiler back-traces, and eBPF stack walks is literally interpretation of this instruction stream.
- Mach-O __compact_unwind is Apple’s denser alternative. Instead of streaming CFI instructions, compact unwind stores a fixed-size 32-bit encoding per function (or PC range) in a two-level page table indexed by function offset; object files carry __LD,__compact_unwind entries, which the linker packs into __TEXT,__unwind_info. Most functions resolve to a single encoding for the whole function (the first-level index entries in Apple’s compact_unwind_encoding.h are keyed by a functionOffset field). The vocabulary is closed and architecture-specific (ARM64 has a particularly tight set because the AArch64 ABI constrains frame layout more strictly), and any function whose unwind shape does not fit falls back to a full DWARF FDE in __eh_frame. The two sections are complementary, not duplicates.
- Windows x64 UWOP_* is yet a third design, optimised for SEH. The .pdata section is an array of RUNTIME_FUNCTION entries (BeginAddress / EndAddress / UnwindInfoAddress RVAs); .xdata holds the UNWIND_INFO structs. Each UNWIND_CODE (e.g. UWOP_PUSH_NONVOL, UWOP_ALLOC_LARGE, UWOP_SAVE_NONVOL, or UWOP_EPILOG in version 2) describes one prolog or epilog operation. The whole format is built around making Structured Exception Handling dispatch fast: the unwinder must run synchronously during RaiseException, with no debugger present, so simplicity wins over expressiveness.
- DWARF 5 → DWARF 6. DWARF 5 (Feb 2017) was the major modern overhaul: split DWARF (.dwo/.dwp), .debug_loclists, .debug_rnglists, .debug_str_offsets, the .debug_names accelerator table, plus standardised forms of former GNU opcodes such as DW_OP_implicit_pointer and DW_OP_entry_value (DW_OP_implicit_value and DW_OP_stack_value already arrived in DWARF 4). DWARF 6 is in working draft: the public snapshot dwarf6-20250505-2228.pdf (2025-05-05) is on snapshots.sourceware.org, and the live spec lives in git at git.dwarfstd.org/dwarf-spec. DWARF 6 is mostly cleanups, bug fixes, and modest features (e.g. DW_AT_language_name / DW_AT_language_version, decoupled from the old monolithic DW_LANG_* enum, which GCC has already started emitting). Major adoption still waits on the next round of toolchains. On the consumer side, libdwarf, a widely used reader library, is at 2.3.1 (2026-03-04).
- Security: the attack surface is “any program that loads debug info.” libdwarf, elfutils libdw, and binutils libbfd have all shipped CVE fixes for malformed expressions and corrupt abbreviation lists. A 2025 libdwarf change made the library reject, rather than loop on, abbreviation tables with duplicate attributes, closing a previous DoS vector. The systemic issue is that debug parsers run before anyone authenticates the binary (debuggers attach, profilers ingest, crash reporters symbolicate, eBPF tools symbolicate kernel addresses), so a malicious crash dump or core file can attack the analyst’s machine. The “Exploiting the hard-working DWARF” paper showed DWARF to be an attack vehicle years before malformed debug info became a routine CVE pattern.
- LLVM IR debug-info metadata is a DSL above DWARF. LLVM IR carries debug info as metadata nodes (!DILocation(line: 42, column: 7, scope: !12), !DICompileUnit(language: DW_LANG_C_plus_plus_14, ...), !DISubprogram(...), !DILocalVariable(...), etc.), a small structured DSL the optimiser must preserve as it transforms IR. Backends then lower this metadata to DWARF or CodeView as appropriate for the target; unwind tables such as compact unwind are derived from frame lowering rather than from this metadata. So in modern toolchains there are at least two debug-info languages stacked: LLVM IR metadata at the compiler-internal layer, and DWARF/CodeView/etc. at the binary layer. Bugs in either (“optimised-out variables”, missing line info, wrong inlined-frame attribution) are notoriously hard because they require thinking in both languages simultaneously.
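The location-list thread above can be sketched as a decoder for a subset of the DWARF 5 .debug_loclists entry kinds (DW_LLE_base_address, DW_LLE_offset_pair, DW_LLE_end_of_list) plus a PC lookup. The sample list mirrors the variable-x example, assuming x86-64 DWARF register numbering (RAX = 0, RBX = 3, RBP = 6, XMM0 = 17), an 8-byte address size, and little-endian data; the helper names are illustrative, and real lists also use the indexed (…x) entry kinds this sketch rejects.

```python
from __future__ import annotations

# Sketch: decode a DWARF 5 location list (subset of DW_LLE_* kinds), then look up a PC.
DW_LLE_end_of_list = 0x00
DW_LLE_offset_pair = 0x04
DW_LLE_base_address = 0x06


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | (0x80 if value else 0))   # set continuation bit if more follows
        if not value:
            return bytes(out)


def parse_loclist(buf: bytes, addr_size: int = 8) -> list[tuple[int, int, bytes]]:
    """Return [(low_pc, high_pc, expression), ...]; ranges are half-open."""
    entries: list[tuple[int, int, bytes]] = []
    base, pos = 0, 0
    while True:
        kind = buf[pos]
        pos += 1
        if kind == DW_LLE_end_of_list:
            return entries
        if kind == DW_LLE_base_address:
            base = int.from_bytes(buf[pos:pos + addr_size], "little")
            pos += addr_size
        elif kind == DW_LLE_offset_pair:
            start, pos = read_uleb128(buf, pos)     # offsets from the base address
            end, pos = read_uleb128(buf, pos)
            length, pos = read_uleb128(buf, pos)    # counted location description
            entries.append((base + start, base + end, buf[pos:pos + length]))
            pos += length
        else:
            raise NotImplementedError(f"DW_LLE kind 0x{kind:02x} is outside this sketch")


def lookup(entries: list[tuple[int, int, bytes]], pc: int) -> bytes | None:
    """Expression describing the variable at pc, or None ("optimised out")."""
    for low, high, expr in entries:
        if low <= pc < high:
            return expr
    return None


def offset_pair(start: int, end: int, expr: bytes) -> bytes:
    return (bytes([DW_LLE_offset_pair]) + encode_uleb128(start)
            + encode_uleb128(end) + encode_uleb128(len(expr)) + expr)


# x in RAX, then at [RBP-24], then split across XMM0 (8 bytes) and EBX (4 bytes).
LOCLIST = (
    bytes([DW_LLE_base_address]) + (0x1000).to_bytes(8, "little")
    + offset_pair(0x00, 0x80, bytes([0x50]))               # DW_OP_reg0 (RAX)
    + offset_pair(0x80, 0xA0, bytes([0x76, 0x68]))         # DW_OP_breg6 (RBP), -24
    + offset_pair(0xA0, 0xF0, bytes([0x61, 0x93, 0x08,     # DW_OP_reg17 (XMM0), DW_OP_piece 8
                                     0x53, 0x93, 0x04]))   # DW_OP_reg3 (RBX), DW_OP_piece 4
    + bytes([DW_LLE_end_of_list])
)

entries = parse_loclist(LOCLIST)
print(lookup(entries, 0x1090).hex())  # prints 7668
```

Each returned byte string is itself a DW_OP_* program, so a debugger's "print x" is a two-stage interpretation: pick the entry covering the PC, then evaluate its expression.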
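The CFI thread can likewise be sketched as a small interpreter that computes the unwind row for one PC. This Python version handles only a few DW_CFA_* opcodes and the “register saved at CFA + N” rule, and assumes x86-64 conventions (code alignment factor 1, data alignment factor -8, DWARF register 7 = RSP, 6 = RBP, 16 = return address). The CIE/FDE byte strings are hand-written to model what a compiler typically emits for push rbp; mov rbp, rsp, not dumped from a real binary; real unwinders also handle DW_CFA_remember_state, DW_CFA_expression, and more.

```python
from __future__ import annotations

# Sketch: compute the CFI row covering one PC (tiny DW_CFA_* subset).
DW_CFA_advance_loc = 0x40        # high two bits 01; low six bits = delta
DW_CFA_offset = 0x80             # high two bits 10; low six bits = register
DW_CFA_restore = 0xC0            # high two bits 11; low six bits = register
DW_CFA_nop = 0x00
DW_CFA_def_cfa = 0x0C
DW_CFA_def_cfa_register = 0x0D
DW_CFA_def_cfa_offset = 0x0E


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def unwind_row(cie: bytes, fde: bytes, start_pc: int, pc: int,
               code_align: int = 1, data_align: int = -8) -> tuple[int, int, dict[int, int]]:
    """Return (cfa_register, cfa_offset, saved) for the row covering pc.

    saved maps a DWARF register number to its spill slot as an offset from
    the CFA; other register rules are out of scope for this sketch.
    """
    cfa_reg, cfa_off, loc = 0, 0, start_pc
    saved: dict[int, int] = {}
    initial: dict[int, int] = {}

    def execute(insns: bytes) -> None:
        nonlocal cfa_reg, cfa_off, loc
        pos = 0
        while pos < len(insns):
            op = insns[pos]
            pos += 1
            high, low = op & 0xC0, op & 0x3F
            if high == DW_CFA_advance_loc:
                loc += low * code_align
                if loc > pc:
                    return                          # later rows start after pc
            elif high == DW_CFA_offset:
                factored, pos = read_uleb128(insns, pos)
                saved[low] = factored * data_align  # offset is factored
            elif high == DW_CFA_restore:
                if low in initial:
                    saved[low] = initial[low]       # back to the CIE's rule
                else:
                    saved.pop(low, None)
            elif op == DW_CFA_def_cfa:
                cfa_reg, pos = read_uleb128(insns, pos)
                cfa_off, pos = read_uleb128(insns, pos)
            elif op == DW_CFA_def_cfa_register:
                cfa_reg, pos = read_uleb128(insns, pos)
            elif op == DW_CFA_def_cfa_offset:
                cfa_off, pos = read_uleb128(insns, pos)
            elif op != DW_CFA_nop:
                raise NotImplementedError(f"DW_CFA opcode 0x{op:02x} is outside this sketch")

    execute(cie)                 # CIE initial instructions
    initial.update(saved)        # remembered for DW_CFA_restore
    execute(fde)
    return cfa_reg, cfa_off, dict(saved)


# Hand-written x86-64 CFI for: push rbp (1 byte); mov rbp, rsp (3 bytes).
CIE = bytes([DW_CFA_def_cfa, 7, 8,          # CFA = RSP + 8
             DW_CFA_offset | 16, 1])        # return address at CFA - 8
FDE = bytes([DW_CFA_advance_loc | 1,
             DW_CFA_def_cfa_offset, 16,     # after push: CFA = RSP + 16
             DW_CFA_offset | 6, 2,          # RBP saved at CFA - 16
             DW_CFA_advance_loc | 3,
             DW_CFA_def_cfa_register, 6])   # after mov: CFA = RBP + 16

print(unwind_row(CIE, FDE, 0x1000, 0x1010))  # prints (6, 16, {16: -8, 6: -16})
```

Given the row, one unwind step is arithmetic: compute the CFA from the named register plus offset, then load each saved register (including the return address) from its CFA-relative slot.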
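For the compact unwind thread, a classifier for the arm64 32-bit encoding is short. The mask values below are transcribed from Apple's compact_unwind_encoding.h as we understand it; treat them as an assumption to verify against that header. The sketch only decodes the fields (mode, saved register pairs, frameless stack size, fallback FDE offset) and does not perform an unwind.

```python
from __future__ import annotations

# Sketch: classify an arm64 compact unwind encoding. Mask values transcribed
# from Apple's compact_unwind_encoding.h (verify against the header).
UNWIND_HAS_LSDA = 0x40000000
UNWIND_PERSONALITY_MASK = 0x30000000
UNWIND_ARM64_MODE_MASK = 0x0F000000
UNWIND_ARM64_MODE_FRAMELESS = 0x02000000
UNWIND_ARM64_MODE_DWARF = 0x03000000
UNWIND_ARM64_MODE_FRAME = 0x04000000
UNWIND_ARM64_FRAMELESS_STACK_SIZE_MASK = 0x00FFF000
UNWIND_ARM64_DWARF_SECTION_OFFSET_MASK = 0x00FFFFFF

# Callee-saved register pairs, one flag bit each.
ARM64_PAIRS = [
    (0x001, "x19/x20"), (0x002, "x21/x22"), (0x004, "x23/x24"),
    (0x008, "x25/x26"), (0x010, "x27/x28"),
    (0x100, "d8/d9"), (0x200, "d10/d11"), (0x400, "d12/d13"), (0x800, "d14/d15"),
]


def decode_arm64(encoding: int) -> dict:
    info = {
        "has_lsda": bool(encoding & UNWIND_HAS_LSDA),
        "personality_index": (encoding & UNWIND_PERSONALITY_MASK) >> 28,
    }
    mode = encoding & UNWIND_ARM64_MODE_MASK
    if mode == UNWIND_ARM64_MODE_DWARF:
        # Shape too irregular for the compact vocabulary: use an FDE in __eh_frame.
        info["mode"] = "dwarf"
        info["eh_frame_offset"] = encoding & UNWIND_ARM64_DWARF_SECTION_OFFSET_MASK
    elif mode in (UNWIND_ARM64_MODE_FRAME, UNWIND_ARM64_MODE_FRAMELESS):
        info["saved_pairs"] = [name for bit, name in ARM64_PAIRS if encoding & bit]
        if mode == UNWIND_ARM64_MODE_FRAME:
            info["mode"] = "frame"          # fp/lr saved as a pair; fp anchors the frame
        else:
            info["mode"] = "frameless"      # leaf-style: return address stays in lr
            units = (encoding & UNWIND_ARM64_FRAMELESS_STACK_SIZE_MASK) >> 12
            info["stack_size"] = units * 16  # stored in 16-byte units
    else:
        info["mode"] = "none"
    return info


print(decode_arm64(0x04000001)["saved_pairs"])  # prints ['x19/x20']
```

The closed vocabulary is the point: every field is a fixed bit range, so a lookup costs a table search plus a few masks, and anything that does not fit is punted to the DWARF FDE named in the low 24 bits.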
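The Windows x64 thread can be illustrated by decoding an UNWIND_INFO header and its UNWIND_CODE slots. This sketch follows the layout described in Microsoft's x64 exception-handling documentation but handles only a subset of operations (no chained unwind info, no handler data, no XMM saves, no version-2 UWOP_EPILOG); it renders each code as the prolog instruction it describes. The sample bytes are hand-built for push rbp; sub rsp, 32.

```python
from __future__ import annotations

import struct

# UNWIND_CODE operation numbers (subset), per Microsoft's x64 documentation.
UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_LARGE = 1
UWOP_ALLOC_SMALL = 2
UWOP_SET_FPREG = 3
UWOP_SAVE_NONVOL = 4

# x64 register numbering used by OpInfo and FrameRegister.
GPR = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
       "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_unwind_info(buf: bytes) -> dict:
    """Decode an UNWIND_INFO header and its UNWIND_CODE array (subset)."""
    frame_reg, frame_off = buf[3] & 0x0F, (buf[3] >> 4) * 16
    count = buf[2]                                   # number of 2-byte slots

    def slot16(i: int) -> int:                       # raw 16-bit value of slot i
        return struct.unpack_from("<H", buf, 4 + 2 * i)[0]

    ops: list[tuple[int, str]] = []
    i = 0
    while i < count:
        code_offset, op_byte = buf[4 + 2 * i], buf[5 + 2 * i]
        op, op_info = op_byte & 0x0F, op_byte >> 4
        i += 1
        if op == UWOP_PUSH_NONVOL:
            text = f"push {GPR[op_info]}"
        elif op == UWOP_ALLOC_SMALL:
            text = f"sub rsp, {op_info * 8 + 8}"
        elif op == UWOP_ALLOC_LARGE and op_info == 0:
            text = f"sub rsp, {slot16(i) * 8}"       # next slot holds size / 8
            i += 1
        elif op == UWOP_ALLOC_LARGE and op_info == 1:
            size = struct.unpack_from("<I", buf, 4 + 2 * i)[0]  # unscaled 32-bit size
            text = f"sub rsp, {size}"
            i += 2
        elif op == UWOP_SET_FPREG:
            text = f"set frame register {GPR[frame_reg]} = rsp + {frame_off}"
        elif op == UWOP_SAVE_NONVOL:
            text = f"mov [rsp+{slot16(i) * 8}], {GPR[op_info]}"  # next slot: offset / 8
            i += 1
        else:
            raise NotImplementedError(f"UWOP {op} is outside this sketch")
        ops.append((code_offset, text))
    return {
        "version": buf[0] & 0x07,
        "flags": buf[0] >> 3,
        "prolog_size": buf[1],
        "ops": ops,            # stored in reverse prolog order
    }


# push rbp (ends at prolog offset 1); sub rsp, 32 (ends at prolog offset 5)
SAMPLE = bytes([0x01, 0x05, 0x02, 0x00,     # version 1, 5-byte prolog, 2 slots
                0x05, 0x32,                 # offset 5: UWOP_ALLOC_SMALL, info 3
                0x01, 0x50])                # offset 1: UWOP_PUSH_NONVOL, rbp
print(decode_unwind_info(SAMPLE)["ops"])  # prints [(5, 'sub rsp, 32'), (1, 'push rbp')]
```

Because the codes are stored newest-first, undoing a prolog is a single forward walk over the array, with no expression evaluation at all, which is the simplicity-over-expressiveness trade the thread describes.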
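One visible piece of the DWARF 5 overhaul is the .debug_info unit header itself: DWARF 2–4 put debug_abbrev_offset before address_size, while DWARF 5 inserts a DW_UT_* unit type (which is how skeleton and split units from split DWARF are told apart) and reorders the fields. The sketch below parses both layouts for 32- and 64-bit DWARF, assumes little-endian data, and stops before unit-type-specific fields such as a skeleton unit's dwo_id.

```python
from __future__ import annotations

import struct

DW_UT_NAMES = {0x01: "compile", 0x02: "type", 0x03: "partial",
               0x04: "skeleton", 0x05: "split_compile", 0x06: "split_type"}


def parse_unit_header(buf: bytes, pos: int = 0) -> dict:
    """Parse the common prefix of a .debug_info unit header (little-endian)."""
    start = pos
    (unit_length,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    offset_size = 4
    if unit_length == 0xFFFFFFFF:                    # escape value: 64-bit DWARF
        (unit_length,) = struct.unpack_from("<Q", buf, pos)
        pos += 8
        offset_size = 8
    off_fmt = "<I" if offset_size == 4 else "<Q"
    (version,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    if version >= 5:
        # DWARF 5: unit_type, address_size, then debug_abbrev_offset.
        unit_type = DW_UT_NAMES.get(buf[pos], f"0x{buf[pos]:02x}")
        address_size = buf[pos + 1]
        pos += 2
        (abbrev_offset,) = struct.unpack_from(off_fmt, buf, pos)
        pos += offset_size
    else:
        # DWARF 2-4: debug_abbrev_offset, then address_size; no unit_type field.
        unit_type = None
        (abbrev_offset,) = struct.unpack_from(off_fmt, buf, pos)
        pos += offset_size
        address_size = buf[pos]
        pos += 1
    return {
        "unit_length": unit_length, "offset_size": offset_size,
        "version": version, "unit_type": unit_type,
        "address_size": address_size, "abbrev_offset": abbrev_offset,
        "header_size": pos - start,      # excludes type-specific fields (e.g. dwo_id)
    }


v4 = struct.pack("<IHIB", 0x2A, 4, 0x10, 8)
v5 = struct.pack("<IHBBI", 0x2A, 5, 0x04, 8, 0x10)
print(parse_unit_header(v4)["header_size"], parse_unit_header(v5)["unit_type"])  # prints 11 skeleton
```

A consumer therefore has to branch on the version before it can even find the abbreviation table, which is one reason DWARF 5 support rolled out tool by tool rather than all at once.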
Citations
- DWARF Standards Committee homepage: https://dwarfstd.org/
- DWARF 5 standard (Feb 2017): https://dwarfstd.org/doc/DWARF5.pdf
- DWARF 6 working draft snapshots (latest 2025-05-05): https://snapshots.sourceware.org/dwarfstd/dwarf-spec/latest/
- DWARF 6 working draft PDF (2025-05-05): https://snapshots.sourceware.org/dwarfstd/dwarf-spec/2025-05-05_22-29_1746484141/dwarf6-20250505-2228.pdf
- DWARF 6 issues list: https://dwarfstd.org/issues.html
- DWARF 6 language codes: https://dwarfstd.org/languages-v6.html
- libdwarf consumer library (davea42, latest 2.3.1, 2026-03-04): https://github.com/davea42/libdwarf-code
- elfutils DWARF extensions index: https://sourceware.org/elfutils/DwarfExtensions
- LLVM Source-Level Debugging (IR metadata reference): https://llvm.org/docs/SourceLevelDebugging.html
- LLVM PDB File Format (the de facto CodeView/PDB documentation): https://llvm.org/docs/PDB/index.html
- LLVM CodeView Symbol Records: https://llvm.org/docs/PDB/CodeViewSymbols.html
- LLVM AMDGPU heterogeneous DWARF extensions: https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html
- Apple Compact Unwinding Format (Faultlore reverse-engineering): https://faultlore.com/blah/compact-unwinding/
- macho-unwind-info parser (mstange): https://github.com/mstange/macho-unwind-info
- Microsoft x64 exception handling reference: https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64
- Microsoft ARM64 exception handling reference: https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling
- ARM EHABI specification (abi-aa repository): https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst
- Itanium C++ ABI exception handling (the .eh_frame lineage): https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html
- Krister Walfridsson, “More Turing-completeness in surprising places” (DWARF expressions): https://kristerw.blogspot.com/2016/01/more-turing-completeness-in-surprising.html
- Oakley & Bratus, “Exploiting the hard-working DWARF” (WOOT ’11): https://www.usenix.org/legacy/events/woot11/tech/final_files/Oakley.pdf
- WebAssembly DWARF: https://yurydelendik.github.io/webassembly-dwarf/
- MaskRay, stack unwinding survey: https://maskray.me/blog/2020-11-08-stack-unwinding