Assembly & Low-Level Encoding Languages Family Index
type: language-family-index
family: assembly-and-encoding
languages_catalogued: 35
tags: [language-reference, family-index, assembly, isa, x86, arm, risc-v, wasm, llvm-ir, elf, pe-coff, mach-o, dwarf, ebpf]
Assembly & Low-Level Encoding — Family Index
Family overview
“Assembly” is not one language but a cross-product: an instruction-set architecture (ISA) supplies the semantic vocabulary (registers, mnemonics, encodings), and an assembler plus syntax dialect supplies the surface grammar that maps human-readable text to machine code. Picking “x86 assembly” is therefore underspecified: you must also pick a syntax (Intel vs AT&T) and an assembler (GAS, MASM, NASM, FASM, YASM). The dialects are not interchangeable: `mov eax, [ebx+4]` (Intel) and `movl 4(%ebx), %eax` (AT&T) produce identical bytes but disagree on operand order, sigils (`%` register prefix), size suffixes (`b`/`w`/`l`/`q`), and addressing-mode syntax. Linux and BSD toolchains historically defaulted to AT&T because GAS inherited the syntax of the Bell Labs `as` lineage; the vendor manuals (Intel SDM, AMD APM), all Microsoft tools, and most assembly textbooks use Intel syntax. Modern GAS accepts `.intel_syntax noprefix` to switch dialects.
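The “identical bytes” claim can be checked by hand-encoding the instruction. The sketch below is illustrative only: `encode_load`, `intel`, and `att` are hypothetical helper names, it assumes 32-bit mode, and it covers just the `8B /r` load form with an 8-bit displacement and no SIB byte.

```python
# Hand-encode `mov r32, [base + disp8]` (opcode 8B /r) for 32-bit mode.
# Register numbers follow the standard x86 ModRM numbering.
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
        "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_load(dst: str, base: str, disp: int) -> bytes:
    """Return the machine code for a 32-bit load with an 8-bit displacement."""
    if base == "esp":
        raise ValueError("esp as a base needs a SIB byte; not handled in this sketch")
    if not -128 <= disp <= 127:
        raise ValueError("this sketch only handles 8-bit displacements")
    # ModRM: mod=01 selects [reg + disp8], reg field = destination, r/m = base.
    modrm = (0b01 << 6) | (REGS[dst] << 3) | REGS[base]
    return bytes([0x8B, modrm, disp & 0xFF])


def intel(dst: str, base: str, disp: int) -> str:
    """Render the load in Intel syntax: destination first, bracketed memory operand."""
    return f"mov {dst}, [{base}{disp:+d}]"


def att(dst: str, base: str, disp: int) -> str:
    """Render the same load in AT&T syntax: source first, % sigils, size suffix."""
    return f"movl {disp}(%{base}), %{dst}"


if __name__ == "__main__":
    args = ("eax", "ebx", 4)
    print(intel(*args), "|", att(*args), "|", encode_load(*args).hex(" "))
```

Both spellings are rendered from the same `(dst, base, disp)` triple and map to the single byte sequence that `encode_load` produces, which is the sense in which the dialect choice is purely a surface-grammar decision.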
The 2020s have brought the largest ISA changes in a generation. RISC-V, an open, royalty-free, modular ISA, ratified its base integer ISAs (RV32I/RV64I) plus extensions M (mul/div), A (atomics), F/D (single/double float), and C (compressed) by 2019; the V vector extension (“RVV 1.0”) was finalised in November 2021; the B bit-manipulation umbrella (Zba+Zbb+Zbs) was ratified by 2024; and the RVA23 application profile, ratified on 2024-10-21, made the vector and hypervisor extensions mandatory for application-class chips, the most consequential RISC-V milestone since the base ISA. Intel APX (Advanced Performance Extensions, spec at v8 as of 2025-07) doubles the x86-64 GPRs from 16 to 32 and adds REX2-prefixed and EVEX-promoted instruction forms; NASM 3.00 (2025-10) ships APX support, and Linux kernel patches were still landing through H2 2025. Arm’s A-profile reached ARMv9.6-A (announced October 2024) with SME structured-sparsity improvements.
Two “portable assemblies” don’t target physical silicon. WebAssembly (text format `.wat`, binary `.wasm`) is a stack-machine virtual ISA whose 3.0 release in September 2025 standardised GC, native exception handling (`exnref`), 64-bit memory, tail calls, and multi-memory, most of which had already shipped in the major browser engines. LLVM IR is the compiler-internal SSA-form intermediate language exposed as `.ll` text or `.bc` bitcode; LLVM’s developer policy guarantees that bitcode produced by 3.0+ remains loadable, but the textual IR is explicitly not stable across major versions. eBPF is a third, stranger creature: a 64-bit register VM whose programs must pass an in-kernel verifier (range analysis, bounded loops, memory safety) at load time. The verifier is effectively the type system, and formal-verification work (Agni, the eBPF Foundation’s 2025 grant program for ePASS) is ongoing.
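To make the “64-bit register VM” concrete, here is a minimal sketch of how a basic eBPF instruction is laid out, following the published BPF instruction-set standard rather than anything stated in this note. The helper names `insn` and `decode` are hypothetical, and the sketch assumes the little-endian layout (destination register in the low nibble).

```python
import struct

# Opcode bytes are built as class | operation | source, per the BPF ISA standard.
BPF_ALU64_MOV_K = 0x07 | 0xB0 | 0x00  # BPF_ALU64 | BPF_MOV | BPF_K  -> 0xb7
BPF_JMP_EXIT = 0x05 | 0x90            # BPF_JMP | BPF_EXIT           -> 0x95


def insn(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 8-byte instruction: opcode, regs, signed 16-bit offset, signed 32-bit imm."""
    regs = (src << 4) | dst  # little-endian hosts keep dst in the low nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


def decode(raw: bytes) -> dict:
    """Unpack one 8-byte instruction back into its fields."""
    opcode, regs, off, imm = struct.unpack("<BBhi", raw)
    return {"opcode": opcode, "dst": regs & 0x0F, "src": regs >> 4, "off": off, "imm": imm}


# The smallest useful program: set the return register r0 to 0, then exit.
PROGRAM = insn(BPF_ALU64_MOV_K, dst=0, imm=0) + insn(BPF_JMP_EXIT)

if __name__ == "__main__":
    for i in range(0, len(PROGRAM), 8):
        print(PROGRAM[i:i + 8].hex(" "), decode(PROGRAM[i:i + 8]))
```

This only shows the encoding. Whether the kernel accepts a program is decided by the verifier at load time, which this sketch does not model.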
A parallel grammar family encodes the output of assembly: ELF (Linux/BSD), PE/COFF (Windows), Mach-O (Apple) for executables; Intel HEX and Motorola SREC for firmware images; DWARF (cross-platform debug) and CodeView/PDB (Microsoft debug) for symbol/line/type information. DWARF in particular is itself a programming language — DW_OP_* location expressions form a Turing-complete stack machine for describing where a variable lives at any PC. DWARF 6 is still a working draft as of May 2025, with the most recent snapshot dated 2025-05-05; DWARF 5 (2017) remains the production version.
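Each of these container formats announces itself in its first few bytes, which gives a quick way to tell them apart. The sketch below is a rough illustration: `sniff_format` is a hypothetical helper, the magic values come from the respective format specifications rather than from this note, and it handles only little-endian Mach-O and skips fat binaries.

```python
import struct

# Mach-O magics as read with a little-endian 32-bit load (assumption: LE files only).
MACHO_MAGICS = {0xFEEDFACE: "Mach-O 32-bit", 0xFEEDFACF: "Mach-O 64-bit"}


def sniff_format(data: bytes) -> str:
    """Guess the container format from leading magic bytes."""
    if data.startswith(b"\x7fELF") and len(data) >= 6:
        cls = {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown-class")       # EI_CLASS
        order = {1: "little-endian", 2: "big-endian"}.get(data[5], "unknown-endian")  # EI_DATA
        return f"ELF {cls} {order}"
    if data.startswith(b"\x00asm"):
        return "WebAssembly binary"
    if data.startswith(b"MZ") and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_off,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_off:pe_off + 4] == b"PE\x00\x00":
            return "PE/COFF image"
        return "MZ executable (no PE signature)"
    if len(data) >= 4:
        (magic,) = struct.unpack_from("<I", data, 0)
        if magic in MACHO_MAGICS:
            return MACHO_MAGICS[magic]
    if data[:1] == b":":
        return "Intel HEX (text)"
    if data[:1] == b"S":
        return "Motorola SREC (text)"
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(b"\x7fELF\x02\x01\x01" + bytes(9)))
```

The text-format checks at the end are deliberately weak (a single leading character), so a real tool would validate at least one full record before trusting them.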
In our deep library
None of the assembly dialects, assemblers, or binary-format grammars have standalone deep-library notes — assembly is intentionally absent from Tier 1/2 because no individual ISA dialect rises to the importance of a general-purpose source language for our purposes.
Cross-link adjacent notes:
- c: the language whose ABI assembly implements; `objdump -d` of any C binary is the de facto Rosetta stone for ISA assembly.
- cpp: same ABI inheritance plus name-mangling rules that show up in `nm` output.
- rust: the `core::arch::asm!` macro reimagined GCC’s inline-asm constraint mini-language with named operands.
- zig: inline assembly is first-class in the language, not a macro.
- go: the Plan 9 assembly dialect is its own oddity (pseudo-registers `SB`, `FP`, `SP`).
- embedded-firmware: where AVR / Z80 / 8051 assembly actually gets written today.
- gpu-and-shaders: NVIDIA PTX is a “GPU assembly” cousin.
- hdl: Verilog/VHDL sit one level below assembly, generating the silicon that interprets it.
- notation-spec: overlap with binary-format grammars (ASN.1, protobuf wire format).
- esoteric: Brainfuck and other “minimal assembly” cousins.
Tier 3 family table — ISA assembly dialects
| Dialect | First appeared | Origin | Word size | Status (2026) | URL |
|---|---|---|---|---|---|
| x86 / x86-64 Intel syntax | 1978 (8086) | Intel | 16/32/64-bit | Foundational; Intel SDM is canonical reference; Intel APX spec at v8 (2025-07) doubles GPRs to 32 | https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html |
| x86 / x86-64 AT&T syntax | ~1978 (Unix port) | AT&T Bell Labs / GNU | 16/32/64-bit | GAS/GCC default; dominant in Linux/BSD toolchains | https://sourceware.org/binutils/docs/as/i386_002dSyntax.html |
| ARM AArch32 (A32) | 1985 (ARM1) | Acorn → Arm Holdings | 32-bit | Legacy; A-profile dropped AArch32 EL0 support starting Cortex-A715 (2022) | https://developer.arm.com/documentation/ddi0487/latest/ |
| ARM AArch64 (A64) | 2011 (ARMv8-A) | Arm Holdings | 64-bit | Dominant mobile/server ISA; ARMv9.6-A announced 2024-10 with SME enhancements | https://developer.arm.com/documentation/109697/2025_12/ |
| ARM Thumb / Thumb-2 | 1994 / 2003 | Arm Holdings | 16/32-bit (mixed) | Active in Cortex-M (Thumb-2 only); Thumb is the only encoding on M-profile | https://developer.arm.com/documentation/ddi0403/latest/ |
| RISC-V (RV32I/RV64I) | 2010 (UC Berkeley) | Krste Asanović et al. | 32/64-bit | Base ratified 2019; RVA23 profile ratified 2024-10-21; vector + hypervisor now mandatory | https://riscv.org/technical/specifications/ |
| MIPS (MIPS32/MIPS64) | 1985 (MIPS R2000) | Stanford / MIPS Computer Systems | 32/64-bit | Effectively legacy; Wave Computing/MIPS emerged from Chapter 11 in 2021-03, pivoted to RISC-V cores | https://mips.com/ |
| Power ISA / PowerPC | 1990 (POWER1) → 1994 (PowerPC) | IBM / OpenPOWER Foundation | 32/64-bit | Active; Power ISA v3.1B in IBM Power11 (released 2025-07) | https://openpowerfoundation.org/specifications/isa/ |
| SPARC V9 | 1993 | Sun → Oracle | 64-bit | Legacy; nominally maintained by Oracle but no new server hardware after SPARC M8 (2017) | https://sparc.org/technical-documents/ |
| IBM z/Architecture | 2000 (z900) | IBM | 64-bit CISC | Active; IBM z17 GA 2025-06-18 with Telum II processor | https://www.ibm.com/docs/en/zos/3.1.0?topic=zarchitecture |
| 6502 | 1975 | MOS Technology | 8-bit | Retro/educational; still in production as WDC 65C02 derivatives | http://www.6502.org/tutorials/ |
| Z80 | 1976 | Zilog | 8-bit | Retro; Zilog announced Z80 EOL 2024-04, last orders mid-2024 | https://www.zilog.com/docs/z80/um0080.pdf |
| AVR | 1996 | Atmel → Microchip | 8-bit | Active in low-end MCUs (Arduino Uno R3 ATmega328P); see embedded-firmware | https://onlinedocs.microchip.com/oxy/GUID-0B644D8F-67E7-49E6-82C9-1A2B7600BAD3-en-US-2/index.html |
| Intel 8051 / MCS-51 | 1980 | Intel | 8-bit | Legacy but still shipped as soft IP and in many SoCs (e.g. Bluetooth controllers) | https://www.keil.com/dd/docs/datashts/intel/ism51.pdf |
| WebAssembly text format (WAT) | 2017 (MVP) | W3C / WebAssembly CG | 32/64-bit (Memory64) | Active; WebAssembly 3.0 (Sep 2025) ships GC, exception handling, Memory64, tail calls | https://webassembly.github.io/spec/core/text/index.html |
| LLVM IR (textual .ll) | 2003 | Chris Lattner / UIUC → LLVM Foundation | typed SSA, target-parametric | Active; unstable across major versions; bitcode 3.0+ guaranteed loadable | https://llvm.org/docs/LangRef.html |
| eBPF assembly | 2014 (Linux 3.18 cBPF→eBPF) | Alexei Starovoitov / Linux kernel | 64-bit register VM | Active; verifier-gated; eBPF Foundation funded 2025 formal-verification grants (Agni, ePASS) | https://docs.kernel.org/bpf/standardization/instruction-set.html |
Tier 3 family table — Assembler syntaxes / toolchains
| Assembler | First appeared | Default syntax | Targets | Status (2026) | URL |
|---|---|---|---|---|---|
| GAS (GNU as) | 1986 | AT&T (Intel via .intel_syntax noprefix) | x86/64, ARM, AArch64, RISC-V, MIPS, PowerPC, SPARC, ~30 architectures | Active; ships in binutils; default assembler for GCC (Clang uses LLVM’s integrated assembler by default) | https://sourceware.org/binutils/docs/as/ |
| MASM (ml64.exe) | 1981 | Intel | x86, x86-64 (ml64.exe) | Active; bundled with MSVC build tools (no standalone download since VS 2015) | https://learn.microsoft.com/en-us/cpp/assembler/masm/ |
| NASM | 1996 | Intel | x86, x86-64 | Very active; NASM 3.00 (2025-10) added Intel APX + AVX10; 3.01 (2025-10) followed a week later | https://www.nasm.us/ |
| FASM (flat assembler) | 1999 | Intel (FASM-specific dialect) | x86, x86-64; FASMg (2020+) adds AArch64/Thumb headers | Active; self-hosting since v1.0; FASMg multi-arch successor under active development | https://flatassembler.net/ |
| YASM | 2001 | Intel (NASM-compatible) + GAS-compatible | x86, x86-64 | Maintenance mode; commits sparse since 2014; many projects (FFmpeg, x264) reverted to NASM | https://yasm.tortall.net/ |
| TASM (Turbo Assembler) | 1988 | Intel (with IDEAL mode extension) | x86 (16/32-bit) | Discontinued; last release TASM 5.0 (1996); abandoned with Borland C++ Builder | https://winworldpc.com/product/turbo-assembler/ |
| HLA (High-Level Assembly) | 1996 | Hyde-specific high-level syntax | x86, x86-64 (compiles to MASM/NASM/GAS) | Pedagogical/legacy; Randall Hyde’s “Art of Assembly” 2nd ed. teaching tool; little production use | https://artofasm.randallhyde.com/ |
| GNU inline asm | 1987 (GCC) | AT&T-syntax string + constraint mini-language | wherever GCC/Clang runs | Active; the "=r"(out) : "r"(in) : "memory" constraint syntax is a small DSL of its own | https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html |
Tier 3 family table — Binary / object / debug encoding
| Format | First appeared | Origin | Domain | Status (2026) | URL |
|---|---|---|---|---|---|
| ELF (Executable and Linkable Format) | 1989 | UNIX System V (USL) | Linux, BSD, Solaris, illumos, Haiku | Universal Unix object format; gABI + processor-specific psABI supplements | https://refspecs.linuxfoundation.org/elf/gabi4+/contents.html |
| PE / COFF (Portable Executable) | 1993 (Windows NT) | Microsoft (extending Unix System V COFF) | Windows, UEFI firmware | Universal Windows format; spec at PE/COFF v11 (2024 revision) | https://learn.microsoft.com/en-us/windows/win32/debug/pe-format |
| Mach-O | 1989 (NeXTSTEP) | NeXT → Apple | macOS, iOS, watchOS, tvOS, visionOS | Universal Apple format; segment/section two-level layering | https://github.com/aidansteele/osx-abi-macho-file-format-reference |
| WebAssembly binary .wasm | 2017 | W3C / WebAssembly CG | Browsers + standalone runtimes (Wasmtime, WAMR, Wasmer) | Active; Wasm 3.0 (2025-09) added GC types, exnref, Memory64 layout | https://webassembly.github.io/spec/core/binary/index.html |
| Intel HEX | 1973 | Intel (Intellec MDS) | Microcontroller flashing, EEPROM programming | Active in firmware tooling (avr-objcopy, OpenOCD); ASCII line-record format | https://web.archive.org/web/20240228033310/https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an015.pdf |
| Motorola SREC (S19/S28/S37) | 1970s (Motorola 6800) | Motorola | Firmware, bootloaders, Freescale/NXP toolchains | Active; functionally equivalent to Intel HEX with Motorola lineage | https://en.wikipedia.org/wiki/SREC_(file_format) |
| DWARF | 1988 | UNIX International / Free Standards Group → DWARF Standards Committee | Cross-platform debug info | DWARF 5 (2017) is current production; DWARF 6 working draft snapshot 2025-05-05 | https://dwarfstd.org/ |
| CodeView / PDB | ~1985 (CodeView) / 1994 (PDB) | Microsoft | Windows debug info | Active; PDB format remains semi-documented; LLVM emits PDB via lld-link | https://llvm.org/docs/PDB/ |
| llvm-objdump / objdump notation | 1990s (binutils) / 2010s (llvm) | GNU binutils / LLVM project | Disassembly output convention | Active; competing intel/att flag (-M intel, --x86-asm-syntax=intel) | https://llvm.org/docs/CommandGuide/llvm-objdump.html |
| IDA / Ghidra disassembly | 1991 (IDA) / 2019 (Ghidra OSS) | Hex-Rays / NSA | Reverse-engineering UI conventions | Active; Ghidra’s P-code IR is itself a disassembly-target IR | https://ghidra-sre.org/ |
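The Intel HEX row describes an ASCII line-record format; a minimal sketch of one record may help show what that means. The helpers `make_record` and `parse_record` are hypothetical names, and the layout used (byte count, 16-bit address, record type, data, two’s-complement checksum) is taken from the format’s published description, not from this note.

```python
def make_record(address: int, rectype: int, data: bytes = b"") -> str:
    """Build one Intel HEX line: ':' + length + address + type + data + checksum."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rectype]) + data
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def parse_record(line: str):
    """Validate one Intel HEX line and return (address, record type, data)."""
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    if sum(raw) & 0xFF != 0:  # all bytes including the checksum sum to 0 mod 256
        raise ValueError("checksum mismatch")
    length, addr_hi, addr_lo, rectype = raw[:4]
    data = raw[4:-1]
    if len(data) != length:
        raise ValueError("length field does not match payload")
    return (addr_hi << 8) | addr_lo, rectype, data


if __name__ == "__main__":
    print(make_record(0x0100, 0x00, b"\x01\x02"))  # data record
    print(make_record(0x0000, 0x01))               # end-of-file record
```

Motorola SREC follows the same idea with a different framing and a one’s-complement checksum, which is why the two are usually treated as interchangeable by firmware tooling.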
Notable threads
- The Intel-vs-AT&T syntax schism is purely cultural at this point. The bytes are identical; the disagreement is entirely about operand order (`dst, src` in Intel vs `src, dst` in AT&T), register sigils (bare `eax` vs `%eax`), size suffixes (`mov eax, ...` vs `movl ...`), and bracket style. GCC/GAS defaulted to AT&T because the Unix `as` lineage predated the Intel manuals’ standardisation; anyone learning x86 from the Intel SDM, the AMD APM, or most modern textbooks learns Intel. Modern GAS supports `.intel_syntax noprefix`, and Clang honours `--x86-asm-syntax=intel` when it is passed through `-mllvm`, so the schism is now a one-line preamble rather than a rewrite.
- RISC-V crossed the application-class threshold in 2024. Until RVA23 ratification on 2024-10-21, RISC-V chips that ran general-purpose Linux distributions varied wildly in which extensions they supported, forcing distros to ship multiple kernel/userspace builds or to target the lowest common denominator (RVA20). RVA23 mandates V (vector), the B umbrella (Zba/Zbb/Zbs bit-manipulation), C (compressed), and the hypervisor extension. This is the milestone that makes a single RISC-V Linux binary feasible the way a single arm64 binary is, and it’s why Ubuntu 25.10 was the first Ubuntu release to ship a default RISC-V build targeting RVA23.
- Intel APX is the biggest x86 architectural change since x86-64. The APX spec (v8 as of 2025-07) doubles general-purpose registers from 16 to 32, adds three-operand non-destructive forms (NDD), and adds a flag-suppression mode (NF) that lets an instruction skip updating the status flags. NASM 3.00 (2025-10) added APX support; Linux kernel patches were still being revised through H2 2025; first silicon is expected in Intel’s next-next generation, not the immediately upcoming one. Intel’s claim is “10% fewer loads, 20%+ fewer stores” from register-allocation relief.
- WebAssembly as portable assembly is no longer a stretch. With Wasm 3.0 shipping in September 2025, the standard delivered native GC types (`structref`, `arrayref`), an `exnref` exception type, 64-bit memory addressing, tail calls, and multi-memory: the feature set that makes Wasm a credible compile target for managed languages (Java, Kotlin, OCaml, Scheme) without bringing your own GC. The WAT text format is not a register-machine assembly; it is stack-machine s-expression syntax, as in `(i32.add (local.get 0) (i32.const 1))`, but it functions as the assembly-language interchange of the web.
- LLVM IR is a textual assembly that is intentionally not stable across versions. The LLVM Developer Policy guarantees that bitcode produced by version 3.0 onward remains loadable, but the textual IR (`.ll`) is explicitly permitted to break across major releases. This is why `clang -emit-llvm -S` output from LLVM 17 won’t necessarily round-trip through LLVM 21’s `llc`, and why long-term IR archives are a footgun. MLIR (the newer multi-level IR) is attempting a stronger versioning story via a bytecode format with explicit op upgrade rules.
- eBPF: the verifier is the language. Unlike every other ISA in this catalog, an eBPF program is rejected at load time unless an in-kernel static analyser proves it terminates (no unbounded loops without bounded helpers), bounds-checks every memory access, respects the helper-function calling convention, and never reads uninitialised stack. The verifier’s range analysis was the subject of formal-verification work (Agni at CAV 2023, an NCC Group code review in late 2024, and the eBPF Foundation’s 2025 grant of $100k including the ePASS in-kernel LLVM-like framework). The verifier is the type system, the borrow checker, and the linker rolled into one, and getting it wrong is a kernel-privilege escape vector.
- Mach-O’s segment/section two-level model is more layered than ELF’s. ELF has program headers and sections, where program headers control loading and sections control linking; Mach-O introduces segments (the loader’s unit) which contain sections (the linker’s unit), with explicit `__TEXT`/`__DATA`/`__LINKEDIT` segments. PE/COFF descends from Unix System V COFF as reworked by the Windows NT team, much of which came from DEC’s VMS group, rather than from the ELF lineage, which is why “section characteristics” bit-flags look so different from ELF’s `sh_flags`. Each format encodes the same conceptual content (code, data, symbols, relocations, debug info) with mutually incompatible structural rules; there is no useful “lossless converter” between any pair.
- Inline-asm constraint mini-languages are themselves DSLs. GCC’s `asm volatile ("syscall" : "=a"(ret) : "a"(SYS_write), "D"(fd), "S"(buf), "d"(len) : "rcx", "r11", "memory")` is parsed by GCC as a constraint expression: `"=a"` means “output, must go in `rax`”; `"D"`/`"S"`/`"d"` mean “input, must go in `rdi`/`rsi`/`rdx`”; the third colon lists clobbers. This is a small declarative language for “tell the compiler what registers/memory this asm uses.” Rust’s `core::arch::asm!` macro reimagined it with named operands (`asm!("syscall", in("rax") SYS_write, in("rdi") fd, ...)`) and stricter validation. Zig made inline asm part of the surface language entirely.
- DWARF expressions are Turing-complete. The `DW_OP_*` opcodes form a stack machine for describing where a variable lives at any program counter, and because the opcode set includes branches that can form loops (`DW_OP_bra`, `DW_OP_skip`), conditionals, and arbitrary arithmetic, DWARF location expressions are formally Turing-complete. Section `.debug_loc` (DWARF 4) / `.debug_loclists` (DWARF 5) contains compiled DWARF programs that gdb/lldb interpret to answer “where is variable `x` right now?” The DWARF 6 working draft (current snapshot 2025-05-05) is folding in clarifications around `DW_OP_LLVM_*` extensions and split-DWARF refinements but is not yet a published standard.
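
The loop-forming role of `DW_OP_bra` can be sketched with a toy interpreter. This is an illustration under stated assumptions, not how gdb or lldb implement it: `eval_expr` is a hypothetical name, only a handful of opcodes are handled, branch operands are assumed little-endian, and values are plain Python integers rather than target-sized words. Opcode numbers are taken from the DWARF standard’s encoding table.

```python
import struct

# A small subset of DWARF expression opcodes (values from the DWARF standard).
DW_OP_dup, DW_OP_minus, DW_OP_plus = 0x12, 0x1C, 0x22
DW_OP_bra, DW_OP_skip = 0x28, 0x2F
DW_OP_lit0 = 0x30  # DW_OP_lit0 .. DW_OP_lit31 occupy 0x30 .. 0x4f


def eval_expr(expr: bytes, max_steps: int = 10_000):
    """Run a DWARF expression; return (final stack, number of opcodes executed)."""
    stack, pc, steps = [], 0, 0
    while pc < len(expr):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded (possible infinite loop)")
        op = expr[pc]
        pc += 1
        if DW_OP_lit0 <= op <= DW_OP_lit0 + 31:
            stack.append(op - DW_OP_lit0)
        elif op == DW_OP_dup:
            stack.append(stack[-1])
        elif op in (DW_OP_minus, DW_OP_plus):
            top, second = stack.pop(), stack.pop()
            stack.append(second - top if op == DW_OP_minus else second + top)
        elif op in (DW_OP_bra, DW_OP_skip):
            (offset,) = struct.unpack_from("<h", expr, pc)  # signed 16-bit operand
            pc += 2
            # skip always jumps; bra pops the top entry and jumps if it is nonzero.
            if op == DW_OP_skip or stack.pop() != 0:
                pc += offset
        else:
            raise ValueError(f"opcode {op:#x} not handled in this sketch")
    return stack, steps


# Countdown loop: push 3, then repeat "subtract 1" until the value reaches 0.
COUNTDOWN = bytes([DW_OP_lit0 + 3,            # offset 0: push 3
                   DW_OP_lit0 + 1,            # offset 1: push 1   <- loop target
                   DW_OP_minus,               # offset 2: value - 1
                   DW_OP_dup,                 # offset 3: copy for the test
                   DW_OP_bra]) + struct.pack("<h", -6)  # back to offset 1 if nonzero

if __name__ == "__main__":
    print(eval_expr(COUNTDOWN))
```

The step limit is there because nothing in the expression language itself bounds execution, which is the practical consequence of the Turing-completeness noted above.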
Citations
- Intel 64 and IA-32 Architectures Software Developer’s Manual (SDM): https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
- Intel Advanced Performance Extensions (APX) Architecture Specification: https://cdrdv2-public.intel.com/861610/355828-007-intel-apx-spec.pdf
- AMD64 Architecture Programmer’s Manual: https://www.amd.com/system/files/TechDocs/40332.pdf
- ARM Architecture Reference Manual for A-profile (DDI 0487): https://developer.arm.com/documentation/ddi0487/latest/
- ARMv9.6-A architecture extension: https://developer.arm.com/documentation/109697/2025_12/Feature-descriptions/The-Armv9-6-architecture-extension
- RISC-V Specifications hub: https://riscv.org/technical/specifications/
- RISC-V RVA23 profile ratification (2024-10-21): https://riscv.org/blog/risc-v-announces-ratification-of-the-rva23-profile-standard/
- RISC-V Bit-manipulation extensions (Zba/Zbb/Zbs): https://docs.riscv.org/reference/isa/unpriv/b-st-ext.html
- Power ISA v3.1B: https://wiki.raptorcs.com/w/images/d/d3/OPF_PowerISA_v3.1B.pdf
- IBM z17 announcement (GA 2025-06-18): https://newsroom.ibm.com/z17
- z/Architecture Principles of Operation: https://www.ibm.com/docs/en/zos/3.1.0?topic=zarchitecture
- GNU as documentation: https://sourceware.org/binutils/docs/as/
- NASM 3.00 release (Intel APX + AVX10): https://www.phoronix.com/news/NASM-3.00-APX-AVX10
- MASM (ml64.exe) reference: https://learn.microsoft.com/en-us/cpp/assembler/masm/
- FASM (flat assembler): https://flatassembler.net/docs.php
- ELF specification (System V gABI): https://refspecs.linuxfoundation.org/elf/gabi4+/contents.html
- Microsoft PE/COFF specification: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
- Mach-O file format reference: https://github.com/aidansteele/osx-abi-macho-file-format-reference
- WebAssembly Core specification: https://webassembly.github.io/spec/core/
- WebAssembly 3.0 release notes (2025-09): https://www.x-cmd.com/blog/250924/
- LLVM Language Reference Manual: https://llvm.org/docs/LangRef.html
- LLVM Developer Policy (bitcode compatibility): https://llvm.org/docs/DeveloperPolicy.html
- DWARF Standard home (Committee site): https://dwarfstd.org/
- DWARF 6 working draft snapshot (2025-05-05): https://snapshots.sourceware.org/dwarfstd/dwarf-spec/2025-05-05_22-29_1746484141/dwarf6-20250505-2228.pdf
- eBPF instruction set standard: https://docs.kernel.org/bpf/standardization/instruction-set.html
- eBPF verifier documentation: https://docs.kernel.org/bpf/verifier.html
- eBPF Foundation 2025 Year in Review: https://ebpf.foundation/the-ebpf-foundations-2025-year-in-review/
- MIPS post-bankruptcy rebrand (2021-03): https://mips.com/press-releases/restructured-wave-computing-mips-business-moves-ahead-as-mips/
Caveats
- Latest MASM version number is not publicly versioned. Microsoft ships `ml.exe` / `ml64.exe` with each Visual Studio release but does not publish a separate MASM version stream; the most recent reference page is on Microsoft Learn for VS 2022 (`view=msvc-170`). Treat “MASM version” as “the one in your installed VS toolchain.”
- FASM vs FASMg vs FASM 2 status is murky. The original FASM 1 lineage continues at flatassembler.net under Tomasz Grysztar; FASMg (2020+) is a multi-architecture rewrite; FASM 2 was discussed in early-2025 forum threads but I could not pin a definitive 2025 release identifier.
- DWARF 6 final ratification date is not yet announced. Working drafts have been published since at least 2023-12; the most recent is 2025-05-05; a final standard date had not been announced as of this writing.
- PE/COFF spec revision number (“v11”, 2024) is from the Microsoft Learn page metadata and may not match a formal versioned spec document.
- Intel HEX upstream URL is not stably hosted by Intel anymore; the linked Wayback snapshot is the most-cited canonical reference (Intel Application Note 015 / “AN 015”).
- MIPS ISA ownership in 2026: the trade name MIPS continues at mips.com under Tallwood VC, but the company has publicly pivoted to RISC-V cores; MIPS32/MIPS64 are no longer being extended. Treat MIPS in 2026 as a historical reference architecture with a residual installed base in network-router silicon.
- eBPF as “assembly” is a stretched definition — the BPF instruction set is real and there is a published assembly syntax, but most programmers write eBPF in restricted C or Rust (aya-rs) and never touch the textual asm. Included here because the BPF ISA is a first-class language target and because the verifier-as-type-system story is unique.