Assembly & Low-Level Encoding Languages Family Index


type: language-family-index
family: assembly-and-encoding
languages_catalogued: 35
tags: [language-reference, family-index, assembly, isa, x86, arm, risc-v, wasm, llvm-ir, elf, pe-coff, mach-o, dwarf, ebpf]

Assembly & Low-Level Encoding — Family Index

Family overview

“Assembly” is not one language but a cross-product: an instruction-set architecture (ISA) supplies the semantic vocabulary (registers, mnemonics, encodings), and an assembler plus its syntax dialect supplies the surface grammar that maps human-readable text to machine code. Picking “x86 assembly” is therefore underspecified: you must also pick a syntax (Intel vs AT&T) and an assembler (GAS, MASM, NASM, FASM, YASM). The dialects are not interchangeable: mov eax, [ebx+4] (Intel) and movl 4(%ebx), %eax (AT&T) produce identical bytes but disagree on operand order, sigils (the % register prefix), size suffixes (b/w/l/q), and addressing-mode syntax. Linux and BSD toolchains historically defaulted to AT&T because GAS inherited its syntax from the Bell Labs Unix assembler, as; the Intel SDM, the AMD APM, Microsoft’s tools, and most textbooks use Intel syntax. Modern GAS accepts .intel_syntax noprefix as an opt-out.
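
The claim that both spellings produce identical bytes can be checked by hand. The following is a minimal sketch, assuming 32-bit mode and the standard ModRM register numbering; the helper names (encode_load, intel, att) are hypothetical and not taken from any assembler.

```python
# Sketch: one x86 load, two spellings, identical machine code.
# Assumes 32-bit mode. Register numbers follow the ModRM numbering; esp is
# left out because using it as a base register requires an extra SIB byte.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "ebp": 5, "esi": 6, "edi": 7}


def encode_load(dst: str, base: str, disp8: int) -> bytes:
    """Encode `mov dst, [base+disp8]` (opcode 8B /r with an 8-bit displacement)."""
    if not -128 <= disp8 <= 127:
        raise ValueError("displacement must fit in a signed byte")
    # mod=01 means "one displacement byte follows"; reg=dst; rm=base.
    modrm = (0b01 << 6) | (REG32[dst] << 3) | REG32[base]
    return bytes([0x8B, modrm, disp8 & 0xFF])


def intel(dst: str, base: str, disp8: int) -> str:
    """Intel spelling: destination first, brackets for memory."""
    return f"mov {dst}, [{base}{disp8:+d}]"


def att(dst: str, base: str, disp8: int) -> str:
    """AT&T spelling: source first, % sigils, l suffix for 32-bit."""
    return f"movl {disp8}(%{base}), %{dst}"


if __name__ == "__main__":
    print(intel("eax", "ebx", 4))
    print(att("eax", "ebx", 4))
    print(encode_load("eax", "ebx", 4).hex())
```

Both renderers describe the same three bytes; only the text differs.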

The 2020s have produced the first genuinely novel ISAs in a generation. RISC-V, an open royalty-free modular ISA, ratified its base integer profiles (RV32I/RV64I) plus extensions M (mul/div), A (atomics), F/D (single/double float), C (compressed) by 2019; the V vector extension (“RVV 1.0”) finalised in November 2021; the B bit-manipulation umbrella (Zba+Zbb+Zbs) ratified by 2024; and the RVA23 application profile ratified on 2024-10-21 made vector and hypervisor extensions mandatory for application-class chips, the most consequential RISC-V milestone since the base ISA. Intel APX (Advanced Performance Extensions, spec at v8 as of 2025-07) doubles x86-64 GPRs from 16 to 32 and adds REX2/EVEX-promoted forms; NASM 3.00 (2025-10) ships APX support and Linux kernel patches were still landing through H2 2025. ARM’s A-profile reached ARMv9.6-A (announced October 2024) with SME structured-sparsity improvements.

Two “portable assemblies” don’t target physical silicon. WebAssembly (text format .wat, binary .wasm) is a stack-machine virtual ISA whose 3.0 release in September 2025 finally shipped GC, native exception handling (exnref), 64-bit memory, tail calls, and multi-memory across all major browsers. LLVM IR is the compiler-internal SSA-form intermediate language exposed as .ll text or .bc bitcode; LLVM’s developer policy guarantees bitcode produced by 3.0+ remains loadable, but the textual IR is explicitly not stable across major versions. eBPF is a third, stranger creature: a 64-bit register VM whose programs must pass an in-kernel verifier (range analysis, bounded loops, memory-safety) at load time. The verifier is effectively the type system, and formal-verification work (Agni, the eBPF Foundation’s 2025 grant program for ePASS) is ongoing.
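
To make “stack-machine virtual ISA” concrete, here is a small sketch that takes the folded WAT expression (i32.add (local.get 0) (i32.const 1)), writes it in the linear order a stack machine consumes, encodes it to binary, and evaluates it. The three opcode bytes come from the WebAssembly core binary format; the tuple-based instruction list and the helper names are assumptions made for illustration.

```python
# Sketch: linearise, encode, and run (i32.add (local.get 0) (i32.const 1)).
OPCODES = {"local.get": 0x20, "i32.const": 0x41, "i32.add": 0x6A}


def uleb128(value: int) -> bytes:
    """Unsigned LEB128, used for local indices."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        if value == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def sleb128(value: int) -> bytes:
    """Signed LEB128, used for the i32.const immediate."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def encode(body) -> bytes:
    """Encode an instruction list into WebAssembly binary opcodes."""
    out = bytearray()
    for op, *imm in body:
        out.append(OPCODES[op])
        if op == "local.get":
            out += uleb128(imm[0])
        elif op == "i32.const":
            out += sleb128(imm[0])
    return bytes(out)


def run(body, local_values) -> int:
    """Evaluate the instruction list on an operand stack."""
    stack = []
    for op, *imm in body:
        if op == "local.get":
            stack.append(local_values[imm[0]])
        elif op == "i32.const":
            stack.append(imm[0] & 0xFFFFFFFF)
        elif op == "i32.add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs + rhs) & 0xFFFFFFFF)  # i32 arithmetic wraps
    return stack.pop()


# Operands first, operator last: the order a stack machine consumes them.
BODY = [("local.get", 0), ("i32.const", 1), ("i32.add",)]

if __name__ == "__main__":
    print(encode(BODY).hex(), run(BODY, [41]))
```

The folded s-expression is only a notational convenience; the binary format stores the linear form.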

A parallel grammar family encodes the output of assembly: ELF (Linux/BSD), PE/COFF (Windows), Mach-O (Apple) for executables; Intel HEX and Motorola SREC for firmware images; DWARF (cross-platform debug) and CodeView/PDB (Microsoft debug) for symbol/line/type information. DWARF in particular is itself a programming language — DW_OP_* location expressions form a Turing-complete stack machine for describing where a variable lives at any PC. DWARF 6 is still a working draft as of May 2025, with the most recent snapshot dated 2025-05-05; DWARF 5 (2017) remains the production version.
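
Each of these container formats announces itself in its first few bytes, which gives a quick way to tell them apart. The sketch below is a best-effort sniffer under stated assumptions: it only recognises little-endian thin Mach-O files, it does not validate anything beyond the signature, and sniff_format is a hypothetical helper name.

```python
import struct


def sniff_format(data: bytes) -> str:
    """Guess a container format from its leading bytes (best-effort sketch)."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the file offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "PE/COFF"
    if data[:4] in (b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe"):
        # MH_MAGIC_64 (0xfeedfacf) and MH_MAGIC (0xfeedface), little-endian.
        return "Mach-O"
    if data[:1] == b":":
        return "Intel HEX"
    if data[:1] == b"S" and data[1:2].isdigit():
        return "Motorola SREC"
    return "unknown"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(sniff_format(handle.read(4096)))
```

Run against any local binary, for example `python sniff.py /bin/ls`, where sniff.py is whatever file name the sketch is saved under.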

In our deep library

None of the assembly dialects, assemblers, or binary-format grammars have standalone deep-library notes — assembly is intentionally absent from Tier 1/2 because no individual ISA dialect rises to the importance of a general-purpose source language for our purposes.

Cross-link adjacent notes:

  • c — the language whose ABI assembly implements; objdump -d of any C binary is the de-facto Rosetta stone for ISA assembly.
  • cpp — same ABI inheritance plus name-mangling rules that show up in nm output.
  • rust — the core::arch::asm! macro reimagined the GCC inline-asm constraint mini-language with named operands.
  • zig — inline assembly is first-class in the language, not a macro.
  • go — Plan 9 assembly dialect is its own oddity (pseudo-registers SB, FP, SP).
  • embedded-firmware — where AVR / Z80 / 8051 assembly actually gets written today.
  • gpu-and-shaders — NVIDIA PTX is a “GPU assembly” cousin.
  • hdl — Verilog/VHDL sit one layer below assembly, generating the silicon that interprets it.
  • notation-spec — overlap with binary-format grammars (ASN.1, protobuf wire format).
  • esoteric — Brainfuck and other “minimal assembly” cousins.

Tier 3 family table — ISA assembly dialects

| Dialect | First appeared | Origin | Word size | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| x86 / x86-64 Intel syntax | 1978 (8086) | Intel | 16/32/64-bit | Foundational; Intel SDM is canonical reference; Intel APX spec at v8 (2025-07) doubles GPRs to 32 | https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html |
| x86 / x86-64 AT&T syntax | ~1978 (Unix port) | AT&T Bell Labs / GNU | 16/32/64-bit | GAS/GCC default; dominant in Linux/BSD toolchains | https://sourceware.org/binutils/docs/as/i386_002dSyntax.html |
| ARM AArch32 (A32) | 1985 (ARM1) | Acorn → Arm Holdings | 32-bit | Legacy; A-profile dropped AArch32 EL0 support starting Cortex-A715 (2022) | https://developer.arm.com/documentation/ddi0487/latest/ |
| ARM AArch64 (A64) | 2011 (ARMv8-A) | Arm Holdings | 64-bit | Dominant mobile/server ISA; ARMv9.6-A announced 2024-10 with SME enhancements | https://developer.arm.com/documentation/109697/2025_12/ |
| ARM Thumb / Thumb-2 | 1994 / 2003 | Arm Holdings | 16/32-bit (mixed) | Active in Cortex-M (Thumb-2 only); Thumb is the only encoding on M-profile | https://developer.arm.com/documentation/ddi0403/latest/ |
| RISC-V (RV32I/RV64I) | 2010 (UC Berkeley) | Krste Asanović et al. | 32/64-bit | Base ratified 2019; RVA23 profile ratified 2024-10-21; vector + hypervisor now mandatory | https://riscv.org/technical/specifications/ |
| MIPS (MIPS32/MIPS64) | 1985 (MIPS R2000) | Stanford / MIPS Computer Systems | 32/64-bit | Effectively legacy; Wave Computing/MIPS emerged from Chapter 11 in 2021-03, pivoted to RISC-V cores | https://mips.com/ |
| Power ISA / PowerPC | 1990 (POWER1) → 1994 (PowerPC) | IBM / OpenPOWER Foundation | 32/64-bit | Active; Power ISA v3.1B in IBM Power11 (released 2025-07) | https://openpowerfoundation.org/specifications/isa/ |
| SPARC V9 | 1993 | Sun → Oracle | 64-bit | Legacy; nominally maintained by Oracle but no new server hardware after SPARC M8 (2017) | https://sparc.org/technical-documents/ |
| IBM z/Architecture | 2000 (z900) | IBM | 64-bit CISC | Active; IBM z17 GA 2025-06-18 with Telum II processor | https://www.ibm.com/docs/en/zos/3.1.0?topic=zarchitecture |
| 6502 | 1975 | MOS Technology | 8-bit | Retro/educational; still in production as WDC 65C02 derivatives | http://www.6502.org/tutorials/ |
| Z80 | 1976 | Zilog | 8-bit | Retro; Zilog announced Z80 EOL 2024-04, last orders mid-2024 | https://www.zilog.com/docs/z80/um0080.pdf |
| AVR | 1996 | Atmel → Microchip | 8-bit | Active in low-end MCUs (Arduino Uno R3 ATmega328P); see embedded-firmware | https://onlinedocs.microchip.com/oxy/GUID-0B644D8F-67E7-49E6-82C9-1A2B7600BAD3-en-US-2/index.html |
| Intel 8051 / MCS-51 | 1980 | Intel | 8-bit | Legacy but still shipped as soft IP and in many SoCs (e.g. Bluetooth controllers) | https://www.keil.com/dd/docs/datashts/intel/ism51.pdf |
| WebAssembly text format (WAT) | 2017 (MVP) | W3C / WebAssembly CG | 32/64-bit (Memory64) | Active; WebAssembly 3.0 (Sep 2025) ships GC, exception handling, Memory64, tail calls | https://webassembly.github.io/spec/core/text/index.html |
| LLVM IR (textual .ll) | 2003 | Chris Lattner / UIUC → LLVM Foundation | typed SSA, target-parametric | Active; unstable across major versions; bitcode 3.0+ guaranteed loadable | https://llvm.org/docs/LangRef.html |
| eBPF assembly | 2014 (Linux 3.18 cBPF→eBPF) | Alexei Starovoitov / Linux kernel | 64-bit register VM | Active; verifier-gated; eBPF Foundation funded 2025 formal-verification grants (Agni, ePASS) | https://docs.kernel.org/bpf/standardization/instruction-set.html |

Tier 3 family table — Assembler syntaxes / toolchains

| Assembler | First appeared | Default syntax | Targets | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| GAS (GNU as) | 1986 | AT&T (Intel via .intel_syntax noprefix) | x86/64, ARM, AArch64, RISC-V, MIPS, PowerPC, SPARC, ~30 architectures | Active; ships in binutils; default assembler for GCC (Clang normally uses its own integrated assembler) | https://sourceware.org/binutils/docs/as/ |
| MASM (ml64.exe) | 1981 | Intel | x86, x86-64 (ml64.exe) | Active; bundled with MSVC build tools (no standalone download since VS 2015) | https://learn.microsoft.com/en-us/cpp/assembler/masm/ |
| NASM | 1996 | Intel | x86, x86-64 | Very active; NASM 3.00 (2025-10) added Intel APX + AVX10; 3.01 (2025-10) followed a week later | https://www.nasm.us/ |
| FASM (flat assembler) | 1999 | Intel (FASM-specific dialect) | x86, x86-64; FASMg (2020+) adds AArch64/Thumb headers | Active; self-hosting since v1.0; FASMg multi-arch successor under active development | https://flatassembler.net/ |
| YASM | 2001 | Intel (NASM-compatible) + GAS-compatible | x86, x86-64 | Maintenance mode; commits sparse since 2014; many projects (FFmpeg, x264) reverted to NASM | https://yasm.tortall.net/ |
| TASM (Turbo Assembler) | 1988 | Intel (with IDEAL mode extension) | x86 (16/32-bit) | Discontinued; last release TASM 5.0 (1996); abandoned with Borland C++ Builder | https://winworldpc.com/product/turbo-assembler/ |
| HLA (High-Level Assembly) | 1996 | Hyde-specific high-level syntax | x86, x86-64 (compiles to MASM/NASM/GAS) | Pedagogical/legacy; Randall Hyde’s “Art of Assembly” 2nd ed. teaching tool; little production use | https://artofasm.randallhyde.com/ |
| GNU inline asm | 1987 (GCC) | AT&T-syntax string + constraint mini-language | wherever GCC/Clang runs | Active; the "=r"(out) : "r"(in) : "memory" constraint syntax is a small DSL of its own | https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html |
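
The constraint strings in the GNU inline asm row can be read mechanically. As a rough sketch, the decoder below covers only a small, well-known subset of GCC’s x86 constraint letters and two modifiers; the tables and the describe helper are hypothetical and not part of GCC, and register names are shown in their 64-bit spelling even though the letter selects the register family regardless of operand width.

```python
# Sketch: decode a few GCC x86 operand constraints into plain English.
MODIFIERS = {"=": "write-only output", "+": "read-write operand"}
X86_LETTERS = {
    "a": "rax", "b": "rbx", "c": "rcx", "d": "rdx",
    "S": "rsi", "D": "rdi",
    "r": "any general-purpose register",
    "m": "a memory operand",
}


def describe(constraint: str) -> str:
    """Describe one constraint string such as "=a", "D", or "rm"."""
    role = "input"
    if constraint[:1] in MODIFIERS:
        role = MODIFIERS[constraint[0]]
        constraint = constraint[1:]
    # Several letters in one string mean "any of these alternatives".
    places = [X86_LETTERS.get(ch, f"unrecognised '{ch}'") for ch in constraint]
    return f"{role} in " + " or ".join(places)


if __name__ == "__main__":
    for text in ("=a", "a", "D", "S", "d", "=r", "rm"):
        print(f'"{text}": {describe(text)}')
```

Clobbers such as "rcx", "r11", and "memory" are not constraints in this sense; they form a separate list after the third colon and are deliberately left out of the sketch.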

Tier 3 family table — Binary / object / debug encoding

| Format | First appeared | Origin | Domain | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| ELF (Executable and Linkable Format) | 1989 | UNIX System V (USL) | Linux, BSD, Solaris, illumos, Haiku | Universal Unix object format; gABI + processor-specific psABI supplements | https://refspecs.linuxfoundation.org/elf/gabi4+/contents.html |
| PE / COFF (Portable Executable) | 1993 (Windows NT) | Microsoft (extending AT&T’s System V COFF) | Windows, UEFI firmware | Universal Windows format; spec at PE/COFF v11 (2024 revision) | https://learn.microsoft.com/en-us/windows/win32/debug/pe-format |
| Mach-O | 1989 (NeXTSTEP) | NeXT → Apple | macOS, iOS, watchOS, tvOS, visionOS | Universal Apple format; segment/section two-level layering | https://github.com/aidansteele/osx-abi-macho-file-format-reference |
| WebAssembly binary .wasm | 2017 | W3C / WebAssembly CG | Browsers + standalone runtimes (Wasmtime, WAMR, Wasmer) | Active; Wasm 3.0 (2025-09) added GC types, exnref, Memory64 layout | https://webassembly.github.io/spec/core/binary/index.html |
| Intel HEX | 1973 | Intel (Intellec MDS) | Microcontroller flashing, EEPROM programming | Active in firmware tooling (avr-objcopy, OpenOCD); ASCII line-record format | https://web.archive.org/web/20240228033310/https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an015.pdf |
| Motorola SREC (S19/S28/S37) | 1970s (Motorola 6800) | Motorola | Firmware, bootloaders, Freescale/NXP toolchains | Active; functionally equivalent to Intel HEX with Motorola lineage | https://en.wikipedia.org/wiki/SREC_(file_format) |
| DWARF | 1988 | UNIX International / Free Standards Group → DWARF Standards Committee | Cross-platform debug info | DWARF 5 (2017) is current production; DWARF 6 working draft snapshot 2025-05-05 | https://dwarfstd.org/ |
| CodeView / PDB | ~1985 (CodeView) / 1994 (PDB) | Microsoft | Windows debug info | Active; PDB format remains semi-documented; LLVM emits PDB via lld-link | https://llvm.org/docs/PDB/ |
| llvm-objdump / objdump notation | 1990s (binutils) / 2010s (llvm) | GNU binutils / LLVM project | Disassembly output convention | Active; competing intel/att flag (-M intel, --x86-asm-syntax=intel) | https://llvm.org/docs/CommandGuide/llvm-objdump.html |
| IDA / Ghidra disassembly | 1991 (IDA) / 2019 (Ghidra OSS) | Hex-Rays / NSA | Reverse-engineering UI conventions | Active; Ghidra’s P-code IR is itself a disassembly-target IR | https://ghidra-sre.org/ |
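
The “ASCII line-record format” noted for Intel HEX is simple enough to parse in a few lines. Each record is a colon followed by hex pairs: a byte count, a 16-bit big-endian address, a record type, the data, and a checksum chosen so that all bytes sum to zero modulo 256. The sketch below handles one record at a time; parse_ihex_record is a hypothetical helper, and extended-address record types are returned as-is rather than interpreted.

```python
def parse_ihex_record(line: str):
    """Parse one Intel HEX record and verify its checksum.

    Returns (address, record_type, data). Raises ValueError on a malformed line.
    """
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])  # raises ValueError on non-hex input
    if len(raw) < 5:
        raise ValueError("record too short")
    count = raw[0]
    address = int.from_bytes(raw[1:3], "big")
    record_type = raw[3]
    if len(raw) != count + 5:
        raise ValueError("byte count does not match record length")
    # All bytes, including the trailing checksum, must sum to 0 modulo 256.
    if (sum(raw) & 0xFF) != 0:
        raise ValueError("checksum mismatch")
    return address, record_type, raw[4:-1]


if __name__ == "__main__":
    sample = ":10010000214601360121470136007EFE09D2190140"
    address, record_type, data = parse_ihex_record(sample)
    print(hex(address), record_type, data.hex())
```

Record type 0 carries data and type 1 marks end of file; a full loader would also track the extended address records to build addresses wider than 16 bits.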

Notable threads

  • The Intel-vs-AT&T syntax schism is purely cultural at this point. The bytes are identical; the disagreement is entirely about operand order (dst, src in Intel vs src, dst in AT&T), register sigils (bare eax vs %eax), size suffixes (mov eax, ... vs movl ...), and bracket style. GCC/GAS defaulted to AT&T because the Unix as lineage predated the Intel manuals’ standardisation; anyone learning x86 from the Intel SDM, the AMD APM, or most modern textbooks learns Intel. Modern GAS supports .intel_syntax noprefix, Clang accepts -masm=intel, and LLVM tools such as llvm-objdump take --x86-asm-syntax=intel, so the schism is now a one-line preamble or flag rather than a rewrite.

  • RISC-V crossed the application-class threshold in 2024. Until RVA23 ratification on 2024-10-21, RISC-V chips that ran general-purpose Linux distributions varied wildly in which extensions they supported, forcing distros to ship multiple kernel/userspace builds or to target the lowest common denominator (RVA20). RVA23 mandates the V (vector) extension, the B bit-manipulation umbrella (Zba/Zbb/Zbs), and the hypervisor extension, alongside C (compressed). This is the milestone that makes a single RISC-V Linux binary feasible the way a single arm64 binary is, and it’s why Ubuntu 25.10 was the first Ubuntu release to ship a default RISC-V build targeting RVA23.

  • Intel APX is the biggest x86 architectural change since x86-64. The APX spec (v8 as of 2025-07) doubles general-purpose registers from 16 to 32, adds three-operand non-destructive forms (NDD), and adds a flag-suppression option (NF) that lets an instruction leave the status flags untouched. NASM 3.00 (2025-10) is the first major assembler with full APX support; Linux kernel patches were still being revised through H2 2025; first silicon is expected in Intel’s next-next generation, not the immediately upcoming one. Intel’s claim is “10% fewer loads, 20%+ fewer stores” from regalloc relief.

  • WebAssembly as portable assembly is no longer a stretch. With Wasm 3.0 shipping in September 2025, the runtime delivered native GC types (structref, arrayref), an exnref exception type, 64-bit memory addressing, tail calls, and multi-memory — the feature set that makes Wasm a credible compile target for managed languages (Java, Kotlin, OCaml, Scheme) without bringing your own GC. The WAT text format is not a register-machine assembly — it’s stack-machine s-expression syntax ((i32.add (local.get 0) (i32.const 1))) — but it functions as the assembly-language interchange of the web.

  • LLVM IR is a textual assembly that is intentionally not stable across versions. The LLVM Developer Policy guarantees bitcode produced by version 3.0 onward remains loadable, but the textual IR (.ll) is explicitly permitted to break across major releases. This is why clang -emit-llvm -S output from LLVM 17 won’t necessarily round-trip through LLVM 21’s llc — and why long-term IR archives are a footgun. MLIR (the newer multi-level IR) is attempting a stronger versioning story via a bytecode format with explicit op upgrade rules.

  • eBPF: the verifier is the language. Unlike every other ISA in this catalog, an eBPF program is rejected at load time unless an in-kernel static analyser proves it terminates (no unbounded loops without bounded helpers), bounds-checks every memory access, respects the helper-function calling convention, and never reads uninitialised stack. The verifier’s range analysis was the subject of formal-verification work (Agni at CAV 2023, an NCC Group code review in late 2024, and the eBPF Foundation’s 2025 grant of $100k including the ePASS in-kernel LLVM-like framework). The verifier is the type system, the borrow checker, and the linker rolled into one — and getting it wrong is a kernel-privilege escape vector.

  • Mach-O’s segment/section two-level model is more layered than ELF’s. ELF has program headers and sections, where program headers control loading and sections control linking; Mach-O introduces segments (the loader’s unit) which contain sections (the linker’s unit), with explicit __TEXT/__DATA/__LINKEDIT segments. PE/COFF descends from AT&T’s System V COFF as reworked by the Windows NT team (many of them ex-DEC VMS engineers) rather than from ELF, which is why its “section characteristics” bit-flags look so different from ELF’s sh_flags. Each format encodes the same conceptual content (code, data, symbols, relocations, debug info) with mutually incompatible structural rules, and there is no useful “lossless converter” between any pair.

  • Inline-asm constraint mini-languages are themselves DSLs. GCC’s asm volatile ("syscall" : "=a"(ret) : "a"(SYS_write), "D"(fd), "S"(buf), "d"(len) : "rcx", "r11", "memory") is parsed by GCC as a constraint expression: =a means “output, must go in rax”; "D"/"S"/"d" mean “input, must go in rdi/rsi/rdx”; the third colon lists clobbers. This is a small declarative language for “tell the compiler what registers/memory this asm uses.” Rust’s core::arch::asm! macro reimagined it with named operands (asm!("syscall", in("rax") SYS_write, in("rdi") fd, ...)) and stricter validation. Zig made inline asm part of the surface language entirely.

  • DWARF expressions are Turing-complete. The DW_OP_* opcodes form a stack machine for describing where a variable lives at any program counter. Because the opcode set includes branches (DW_OP_bra, DW_OP_skip) that can jump backwards to form loops, along with comparisons and arbitrary arithmetic, DWARF location expressions are formally Turing-complete. The .debug_loc (DWARF 4) and .debug_loclists (DWARF 5) sections contain compiled DWARF programs that gdb/lldb interpret to answer “where is variable x right now?” The DWARF 6 working draft (current snapshot 2025-05-05) is folding in clarifications around DW_OP_LLVM_* extensions and split-DWARF refinements but is not yet a published standard.
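
The looping behaviour described in the DWARF item above can be demonstrated with a toy evaluator. This is a sketch only: it implements a handful of DW_OP_* opcodes using the values from the DWARF opcode table, assumes little-endian operands, ignores address-size wraparound, and adds a step limit that real debuggers may or may not impose. The evaluate function and the SUM_TO_FIVE program are illustrative names.

```python
# Sketch: a toy evaluator for a few DW_OP_* opcodes, enough to run a loop.
DW_OP_dup, DW_OP_drop, DW_OP_over, DW_OP_swap = 0x12, 0x13, 0x14, 0x16
DW_OP_minus, DW_OP_plus = 0x1C, 0x22
DW_OP_bra, DW_OP_skip = 0x28, 0x2F
DW_OP_lit0 = 0x30  # DW_OP_lit0..DW_OP_lit31 occupy 0x30..0x4F


def evaluate(expr: bytes, max_steps: int = 10_000) -> int:
    """Run a DWARF expression and return the value left on top of the stack."""
    stack, pc = [], 0
    for _ in range(max_steps):
        if pc >= len(expr):
            return stack[-1]
        op = expr[pc]
        pc += 1
        if DW_OP_lit0 <= op <= DW_OP_lit0 + 31:
            stack.append(op - DW_OP_lit0)
        elif op == DW_OP_dup:
            stack.append(stack[-1])
        elif op == DW_OP_drop:
            stack.pop()
        elif op == DW_OP_over:
            stack.append(stack[-2])
        elif op == DW_OP_swap:
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == DW_OP_plus:
            top = stack.pop()
            stack[-1] += top
        elif op == DW_OP_minus:
            top = stack.pop()
            stack[-1] -= top  # second entry minus former top
        elif op in (DW_OP_bra, DW_OP_skip):
            # Signed 16-bit offset, relative to the byte after the operand.
            offset = int.from_bytes(expr[pc:pc + 2], "little", signed=True)
            pc += 2
            if op == DW_OP_skip or stack.pop() != 0:
                pc += offset
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    raise RuntimeError("step limit reached; expression may not terminate")


# Sums 5+4+3+2+1 with a backward DW_OP_bra; the stack layout is [acc, n].
SUM_TO_FIVE = bytes([
    0x30, 0x35,              # lit0 (acc), lit5 (n)
    0x16, 0x14, 0x22, 0x16,  # swap, over, plus, swap: acc += n
    0x31, 0x1C,              # lit1, minus: n -= 1
    0x12,                    # dup
    0x28, 0xF6, 0xFF,        # bra -10: loop while n != 0
    0x13,                    # drop n, leaving acc
])

if __name__ == "__main__":
    print(evaluate(SUM_TO_FIVE))
```

The step limit is the practical consequence of Turing-completeness: a consumer cannot know in advance whether an arbitrary expression halts, so it has to bound the work it is willing to do.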

Caveats

  • Latest MASM version number is not publicly versioned. Microsoft ships ml.exe / ml64.exe with each Visual Studio release but does not publish a separate MASM version stream; the most recent reference page is on Microsoft Learn for VS 2022 (view=msvc-170). Treat “MASM version” as “the one in your installed VS toolchain.”
  • FASM vs FASMg vs FASM 2 status is murky. The original FASM 1 lineage continues at flatassembler.net under Tomasz Grysztar; FASMg (2020+) is a multi-architecture rewrite; FASM 2 was discussed in early-2025 forum threads but I could not pin a definitive 2025 release identifier.
  • DWARF 6 final ratification date is not yet announced. Working drafts have been published since at least 2023-12; the most recent is 2025-05-05; a final standard date had not been announced as of this writing.
  • PE/COFF spec revision number (“v11”, 2024) is from the Microsoft Learn page metadata and may not match a formal versioned spec document.
  • Intel HEX upstream URL is not stably hosted by Intel anymore; the linked Wayback snapshot is the most-cited canonical reference (Intel Application Note 015 / “AN 015”).
  • MIPS ISA ownership in 2026: the trade name MIPS continues at mips.com under Tallwood VC, but the company has publicly pivoted to RISC-V cores; MIPS32/MIPS64 are no longer being extended. Treat MIPS in 2026 as a historical reference architecture with a residual installed base in network-router silicon.
  • eBPF as “assembly” is a stretched definition — the BPF instruction set is real and there is a published assembly syntax, but most programmers write eBPF in restricted C or Rust (aya-rs) and never touch the textual asm. Included here because the BPF ISA is a first-class language target and because the verifier-as-type-system story is unique.
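
To show that the BPF instruction set is a concrete encoding and not just a C-compiler target, here is a minimal sketch that hand-assembles a two-instruction program. It assumes the little-endian layout of the basic 64-bit instruction (8-bit opcode, 4-bit destination and source registers, 16-bit offset, 32-bit immediate); bpf_insn is a hypothetical helper, and the bytes are not wrapped in an ELF object or loaded into a kernel here.

```python
import struct

BPF_MOV64_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K
BPF_EXIT = 0x95       # BPF_JMP | BPF_EXIT


def bpf_insn(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 64-bit eBPF instruction in little-endian layout."""
    regs = (src << 4) | dst  # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


# Equivalent to the textual assembly:  r0 = 42;  exit
PROGRAM = bpf_insn(BPF_MOV64_IMM, dst=0, imm=42) + bpf_insn(BPF_EXIT)

if __name__ == "__main__":
    for index in range(0, len(PROGRAM), 8):
        print(PROGRAM[index:index + 8].hex(" "))
```

A program this small would pass the verifier trivially, since it sets the return register and exits; the interesting cases are the ones where the verifier has to reason about pointers and loop bounds.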