Codec / DSP / Signal-Processing DSLs Family Index


type: language-family-index
family: codec-and-dsp
languages_catalogued: 17
tags: [language-reference, family-index, codec-and-dsp, dsp, signal-processing, audio, video, ffmpeg, gstreamer, cmajor, soul, codec]


Family overview

This family covers the engineering side of audio/video DSP languages — the textual notations that describe digital-signal-processing graphs, codec parameter sets, filter chains, and real-time pipelines deployed inside encoders, broadcast plants, telecoms stacks, and embedded MCUs. Its sibling family music-audio covers the creative side: Csound, SuperCollider, ChucK, Faust-as-instrument, and the live-coding hosts. The line between them is fuzzy (Faust legitimately belongs to both, and Cmajor is rapidly crossing it as plug-in vendors adopt it), but the centre of mass is clear: the music family is about expressing a piece, this family is about expressing a processor.

The defining tension in DSP-language design is determinism. A general-purpose language guarantees nothing about timing, allocation, or bit-exactness, and those gaps are fatal when 48 kHz audio with a 128-sample buffer leaves the CPU about 2.67 ms to compute every block and a single missed deadline is an audible click. SOUL (SOUnd Language; Julian Storer / ROLI / JUCE, public beta 2019) was the first serious attempt at “JavaScript for DSP”: a high-level language with formal guarantees about no GC, no allocation in the audio thread, deterministic processing, and bit-identical output across platforms. SOUL was deprecated and its repository archived in 2022, after ROLI’s financial collapse in 2021 unwound the team; Storer reincarnated the design as Cmajor at SoundStacks, which has been in active development since 2021 and is now the de facto modern DSP language for plug-in authoring (JIT-compilable, runnable in the browser, with a JUCE exporter). Faust (GRAME/INRIA, 2002+) covers similar ground via a different route, a pure functional block-diagram algebra that compiles to C++/LLVM/Wasm/Web Audio, and is dual-listed in music-audio.
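The 2.67 ms figure is simply block size divided by sample rate. A minimal sketch of that arithmetic follows; the function name and the two extra configurations are illustrative assumptions, not part of any library.

```python
def block_budget_ms(block_size: int, sample_rate_hz: int) -> float:
    """Wall-clock time available to compute one audio block, in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


if __name__ == "__main__":
    # Only the 128-sample / 48 kHz case comes from the text; the others are
    # illustrative configurations chosen to show how the budget scales.
    for block_size, rate in [(128, 48_000), (128, 44_100), (64, 96_000)]:
        budget = block_budget_ms(block_size, rate)
        print(f"{block_size} samples @ {rate} Hz -> {budget:.2f} ms per block")
```

Halving the block size or doubling the sample rate halves the budget, which is why low-latency configurations are the ones where a stray allocation or GC pause becomes audible.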

The most-used DSP DSLs on Earth are not language-design exercises but filter-graph mini-languages embedded in command-line tools. FFmpeg’s filter-graph syntax ([in1][in2]filter=arg1=v1:arg2=v2[out], with filters in a chain separated by commas, chains separated by semicolons, and pad labels in [brackets]) has a formal grammar, implemented in libavfilter/graphparser.c, and probably appears in more shell scripts and CI pipelines than any music-DSP language has total users. GStreamer’s gst-launch-1.0 pipeline syntax does the same job for the GNOME / freedesktop stack: videotestsrc ! videoconvert ! autovideosink reads as a Unix-pipe analogue but is parsed by gst_parse_launch() into a real element graph. Microsoft’s legacy DirectShow filter-graph notation played the same role for Windows multimedia from the late 1990s to the mid-2010s and still surfaces in legacy capture stacks.
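To make the shape of the FFmpeg syntax concrete, here is a deliberately simplified splitter for filter-graph strings. It is a sketch under stated assumptions, not a port of graphparser.c: it ignores quoting, escaping, and whitespace subtleties that the real parser handles, and the function names are hypothetical.

```python
import re

LEADING_LABEL = re.compile(r"\[([^\]]+)\]")
TRAILING_LABEL = re.compile(r"\[([^\]]+)\]$")


def parse_filter(text: str) -> dict:
    """Split one filter spec into input labels, name, raw args, output labels."""
    text = text.strip()
    inputs, outputs = [], []
    # Labels before the filter name are inputs.
    while (m := LEADING_LABEL.match(text)):
        inputs.append(m.group(1))
        text = text[m.end():].strip()
    # Labels after the arguments are outputs; peel them off right to left.
    while (m := TRAILING_LABEL.search(text)):
        outputs.insert(0, m.group(1))
        text = text[:m.start()].strip()
    name, _, arg_string = text.partition("=")
    args = arg_string.split(":") if arg_string else []
    return {"inputs": inputs, "name": name, "args": args, "outputs": outputs}


def parse_filtergraph(graph: str) -> list:
    """Return a list of chains; each chain is a list of parsed filters."""
    # Simplification: assumes no ';' or ',' appears inside quoted arguments.
    return [[parse_filter(f) for f in chain.split(",")]
            for chain in graph.split(";")]


if __name__ == "__main__":
    for chain in parse_filtergraph("[in1][in2]filter=arg1=v1:arg2=v2[out]"):
        for spec in chain:
            print(spec)
```

The nesting of the result (graph, then chains, then filters) mirrors the separators: semicolons at the outer level, commas inside a chain, colons inside an argument list.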

Below those user-facing DSLs sits a layer of engineering-spec languages that almost no one outside the standards bodies sees but that govern what every audio chip does: CMSIS-DSP’s vocabulary of fixed-point Q-format conventions and arm_<op>_q15/q31/f32 function families on Cortex-M; VST3’s parameter-tree description schema; the W3C AudioWorklet processor-descriptor JS interface; the Bluetooth codec-config DSLs (SBC, aptX, LDAC, LC3) that decide what your headphones actually do; and the standards pseudocode of the ITU-R BS.1770 / EBU R 128 loudness algorithms (the K-weighting filter, momentary/short-term/integrated time constants, absolute and relative gating), written in a half-formal English-plus-equations dialect that every loudness-meter vendor implements from spec. Finally, speech-recognition stacks (Kaldi nnet3 xconfig, Vosk model.conf, Whisper model parameters) introduce a small constrained DSL for layer composition and chunked inference, the bridge between classical DSP and modern ML.
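As an illustration of how that loudness pseudocode turns into code, the sketch below implements only the two-stage gating step of integrated loudness. It assumes the caller already has one mean-square energy per 400 ms gating block, K-weighted and summed across channels as the recommendation describes; the filter itself and the block overlap are out of scope. The −0.691 offset is the constant from BS.1770; the function names are hypothetical.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0  # blocks quieter than this are always discarded
RELATIVE_GATE_LU = -10.0    # second gate, relative to the stage-1 loudness


def energy_to_lufs(mean_square: float) -> float:
    """Loudness of a K-weighted, channel-summed mean-square energy."""
    return -0.691 + 10.0 * math.log10(mean_square)


def lufs_to_energy(lufs: float) -> float:
    """Inverse of energy_to_lufs."""
    return 10.0 ** ((lufs + 0.691) / 10.0)


def integrated_loudness(block_energies) -> float:
    """Gated integrated loudness from per-block mean-square energies."""
    # Stage 1: absolute gate.
    abs_threshold = lufs_to_energy(ABSOLUTE_GATE_LUFS)
    stage1 = [z for z in block_energies if z > abs_threshold]
    if not stage1:
        return float("-inf")  # nothing but silence
    # Stage 2: relative gate, 10 LU below the loudness of the stage-1 mean.
    stage1_lufs = energy_to_lufs(sum(stage1) / len(stage1))
    rel_threshold = lufs_to_energy(stage1_lufs + RELATIVE_GATE_LU)
    # stage2 cannot be empty: the loudest stage-1 block is at least the mean.
    stage2 = [z for z in stage1 if z > rel_threshold]
    return energy_to_lufs(sum(stage2) / len(stage2))
```

The effect of the gates is that silence and quiet passages do not drag the programme loudness down: a signal that alternates between −23 LUFS speech and −40 LUFS room tone still measures at roughly the speech level.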

DSP languages, taken as a class, must speak fluently about: deterministic timing (no GC, bounded latency, sample-accurate parameter automation), fixed-point vs floating-point arithmetic (Q-formats on MCUs, IEEE-754 elsewhere), sample rate as a type (44.1 vs 48 vs 96 kHz, plus integer vs fractional resampling), channel count and layout (mono, stereo, 5.1, Atmos object beds), denormal handling (flush-to-zero modes on x86/ARM SIMD), and block size (the 128-sample AudioWorklet quantum is a hard W3C constant). A general-purpose language provides exactly none of this, and that gap is the entire reason this family exists.
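The fixed-point item is the least familiar to people coming from desktop audio, so here is a small sketch of the Q15 convention (1 sign bit, 15 fractional bits, representable range −1.0 to just under +1.0). The helper names are hypothetical and this is not the CMSIS-DSP API; it only models the arithmetic convention with plain Python integers.

```python
Q15_ONE = 1 << 15          # 32768, the scale factor for 15 fractional bits
Q15_MAX = Q15_ONE - 1      # 32767, largest representable value
Q15_MIN = -Q15_ONE         # -32768, smallest representable value


def saturate_q15(value: int) -> int:
    """Clamp an integer into the signed 16-bit range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, value))


def float_to_q15(x: float) -> int:
    """Scale a float in roughly [-1, 1) to Q15, saturating out-of-range input."""
    return saturate_q15(round(x * Q15_ONE))


def q15_to_float(q: int) -> float:
    """Interpret a Q15 integer as a real number."""
    return q / Q15_ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: the Q30 product is shifted back to Q15."""
    return saturate_q15((a * b) >> 15)


if __name__ == "__main__":
    half = float_to_q15(0.5)
    print(half, q15_to_float(q15_mul(half, half)))
```

Saturation matters at the edges: +1.0 is not representable, so it clamps to 32767, and −1.0 × −1.0 would overflow to +1.0 and is clamped the same way.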

In our deep library

None of these DSLs has a standalone deep-library note; they sit on top of host languages or are command-line tools.

Cross-reference:

  • music-audio: the creative-music sibling. Faust, Csound, SuperCollider, ChucK, Pd, Max/MSP, Reaktor Core, Kyma all live there. Cross-listed: Faust (compiler-for-portable-DSP) appears in both because it serves both communities.
  • visual-dataflow: Simulink, LabVIEW, Pure Data, Reaktor, Max/MSP, TouchDesigner. The visual DSP languages overlap with this family wherever the patcher targets DSP graphs.
  • gpu-and-shaders: OpenCL kernels are the de facto DSL for GPU-accelerated DSP (FFTs, image filters, ML preprocessing); CUDA cuFFT and Metal Performance Shaders Graph are siblings.
  • embedded-firmware: CMSIS-DSP targets Cortex-M; ESP-DSP targets Xtensa LX6/LX7; RP2350 has its own SIMD-DSP intrinsics. The embedded-firmware family hosts the language tooling.
  • hdl: Verilog-AMS and SPICE for analog/mixed-signal modelling are the lowest layer of the DSP stack; they describe the silicon that runs the algorithms.
  • scientific: NumPy/SciPy scipy.signal, MATLAB, Julia DSP.jl share authoring patterns with the engineering-DSP DSLs but compile/run differently.
  • cpp: C and C++ together host nearly every production DSP library (FFmpeg/libavfilter, GStreamer/GLib, JUCE, VST3 SDK, CMSIS-DSP, libsamplerate, FFTW, Speex, Opus).
  • javascript: the host of AudioWorklet processors and the runtime target of Cmajor’s web export and Faust’s faust2webaudio.
  • python: host of librosa, torchaudio, Kaldi-IO, Vosk Python bindings; the analysis layer above this family.

Tier 3 family table

| Language / DSL | First appeared | Origin | Domain | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| Cmajor | 2021 | SoundStacks (Julian Storer, Cesare Ferrari, ex-ROLI) | audio-DSP / plug-in authoring | Active; SoundStacks publishes regular releases on GitHub; JUCE exporter, VS Code extension, Cmajor JIT for browser/embedded use | https://cmajor.dev/ |
| SOUL | 2018 (announced ADC2018), 1.0 beta 2019 | ROLI / JUCE (Julian Storer, Cesare Ferrari) | audio-DSP / plug-in authoring | Deprecated 2022; repo soul-lang/SOUL archived after ROLI’s collapse; spiritual successor is Cmajor | https://github.com/soul-lang/SOUL |
| Faust | 2002 | GRAME / INRIA (Yann Orlarey, Stéphane Letz, Dominique Fober) | audio-DSP / portable signal-processor source | Very active; cross-listed with music-audio; compiles to C++/LLVM/Wasm/JUCE/VST/AU/Web Audio | https://faust.grame.fr/ |
| FFmpeg filter-graph syntax | ~2008 (libavfilter merged into FFmpeg trunk) | FFmpeg project (Fabrice Bellard origins, 2000) | video-pipeline / audio-DSP / filter-chain | Very active; FFmpeg 8.0/8.1 stable in 2026 (8.1.1 released 2026-05-04); 7.1 LTS branch still maintained | https://ffmpeg.org/ffmpeg-filters.html |
| GStreamer gst-launch pipeline syntax | 2001 (GStreamer 0.1.0); current syntax stable from ~0.10 era | freedesktop / GNOME (Wim Taymans, Erik Walthinsen) | video-pipeline / audio-DSP / general-multimedia | Active; GStreamer 1.28 stable (1.28.1 released 2026-02), 1.29 unstable dev, 1.30 next stable | https://gstreamer.freedesktop.org/documentation/tools/gst-launch.html |
| DirectShow filter-graph notation | 1996 (ActiveMovie → DirectShow) | Microsoft | video-pipeline / Windows-multimedia (legacy) | Legacy / maintenance; superseded by Media Foundation but still used in capture, broadcast, and legacy DV/MPEG-2 stacks | https://learn.microsoft.com/en-us/windows/win32/directshow/directshow |
| AudioWorklet processor descriptors | 2018 (Chrome 66, Web Audio API) | W3C Web Audio WG | audio-DSP / browser | Active; Web Audio API 1.1 published; 128-sample fixed quantum, JS-on-render-thread; AudioWorkletProcessor.process() is the universal browser-DSP entry point | https://www.w3.org/TR/webaudio-1.1/ |
| VST3 SDK parameter description | 2017 (VST 3.6.7), MIT-relicensed 2024 | Steinberg | audio-DSP / plug-in parameter & preset DSL | Active; current VST3 SDK on GitHub under MIT (GPLv3 + Steinberg proprietary licences withdrawn); parameters are normalized doubles in [0.0, 1.0] arranged in a tree | https://github.com/steinbergmedia/vst3sdk |
| CMSIS-DSP API conventions | 2010 (CMSIS 2.0) | Arm | general-DSP / embedded (Cortex-M, Cortex-A) | Very active; standalone CMSIS-DSP pack on GitHub, Apache 2.0, separated from CMSIS-Core in 2021; arm_<op>_q7/q15/q31/f16/f32 naming + Q-format parameter conventions are an embedded-DSP vocabulary | https://github.com/ARM-software/CMSIS-DSP |
| TensorFlow Lite Micro signal layers | ~2019 | Google / TFLM working group | speech-recognition / embedded-ML / DSP-ML convergence | Active; signal/ ops library covering FFT, framing, filterbank, energy, PCAN; the bridge between classical DSP and embedded TinyML | https://github.com/tensorflow/tflite-micro |
| Simulink + DSP System Toolbox | 1990 (Simulink), DSP Blockset 1995 → DSP System Toolbox 2010 | MathWorks | general-DSP / mixed-signal / industrial | Very active; R2026a released April 2026; Filter Designer/Analyzer apps refreshed; Simulink Copilot added | https://www.mathworks.com/products/dsp-system.html |
| MATLAB Signal Processing Toolbox | 1988 | MathWorks | general-DSP / scientific | Very active; R2026a includes new Filter Designer and Filter Analyzer apps, Signal Labeler, Signal Feature Extractor | https://www.mathworks.com/products/signal.html |
| LabVIEW + DSP modules | 1986 (LabVIEW), DSP modules 1990s+ | National Instruments | mixed-signal / instrumentation | Active; cross-link visual-dataflow; G-language is a true visual-dataflow DSL | https://www.ni.com/en-us/shop/labview.html |
| Verilog-AMS / SPICE netlists | SPICE 1973 (UC Berkeley); Verilog-AMS 1.0 2000 (Accellera) | UCB / Accellera | mixed-signal / analog modelling | Active (SPICE variants ubiquitous; Verilog-AMS niche); cross-link hdl | https://www.accellera.org/downloads/standards/v-ams |
| Bluetooth codec param-config DSLs (SBC / aptX / aptX HD / aptX Adaptive / aptX Lossless / LDAC / LC3) | SBC 1999 (A2DP), aptX 1988 (DTS/Qualcomm), LDAC 2015 (Sony), LC3 2020 (Bluetooth SIG) | Bluetooth SIG, Qualcomm, Sony, Fraunhofer/Ericsson (LC3) | codec-config / wireless audio | Active; LC3 mainstream in 2026 with Bluetooth LE Audio (Pixel 9 / Galaxy S25); aptX Lossless ~1.2 Mbps on Snapdragon 8 Gen 3+; LDAC adaptive 330/660/990 kbps | https://www.bluetooth.com/specifications/specs/lc3-codec-test-suite/ |
| ITU-R BS.1770 / EBU R 128 loudness pseudocode | BS.1770 first published 2006, BS.1770-4 2015; EBU R 128 first published 2010 | ITU-R / EBU | audio-DSP / broadcast spec | Active; the legal basis for broadcast loudness compliance worldwide (CALM Act in US, AGCOM 219/09 in IT, etc.); BS.1770-5 in revision; algorithm is K-weighting + momentary 400 ms / short-term 3 s / integrated, absolute gate −70 LUFS, relative gate −10 LU | https://tech.ebu.ch/docs/r/r128.pdf |
| Kaldi nnet3 xconfig / Vosk model.conf | Kaldi 2014 (Povey et al., Johns Hopkins), nnet3 ~2015, Vosk 2019 | JHU CLSP / Alpha Cephei | speech-recognition / model-description | Active for Vosk (alphacephei.com); Kaldi mostly maintenance after the rise of end-to-end ESPnet/k2/icefall, but xconfig is still the canonical compositional DSL for layered acoustic-model definitions | https://kaldi-asr.org/doc/dnn3.html |

Notable threads

  • The SOUL → Cmajor pivot. SOUL was Julian Storer’s serious attempt to make audio DSP authorable by people who weren’t C++ experts. The 2018 ADC announcement and 2019 1.0-beta release got real traction, with a “SOUL patch” format and a JUCE-hosted runtime that gave you no-allocation, deterministic, sample-accurate execution. Then ROLI hit financial trouble in 2021 and the SOUL team was unwound; the soul-lang/SOUL repo was archived in 2022 and the language was officially abandoned. Storer and Cesare Ferrari restarted the project as Cmajor under SoundStacks in 2021, with the same design ethos (deterministic, JIT-compilable, no allocation in the audio thread) but a more pragmatic language surface (closer to a constrained C-family syntax than SOUL’s slightly more idiosyncratic dialect), a built-in cmaj CLI runner, a VS Code extension, and a JUCE/VST exporter that makes “write a plug-in in this DSL and ship it as a real binary” actually work. As of 2026 Cmajor is the strongest candidate for “the modern DSP language.”

  • FFmpeg filter-graph as the most-used DSP DSL on Earth. No music language (Csound, SuperCollider, Faust, Pd, Max) is in the same order of magnitude as FFmpeg’s filter-graph syntax for total lines of DSL written in production. Every video CDN, every transcoding farm, every CI pipeline that touches media, every OBS scene, every browser-side ffmpeg.wasm app is emitting strings like [0:v]scale=1920:1080,fps=30[v];[0:a]aresample=48000[a] and feeding them to avfilter_graph_parse_ptr(). The grammar is small (semicolons separate chains, commas separate filters within a chain, [] brackets hold pad labels, and filter arguments follow = as a colon-separated list of values or key=value pairs), but it’s a real DSL with a real parser, and the fact that almost no one calls it a “language” is itself a sociological fact about what gets canonized.

  • GStreamer pipeline syntax: the GNOME analogue. gst-launch-1.0 v4l2src ! videoconvert ! x264enc ! mp4mux ! filesink location=out.mp4 is the same idea: a textual pipeline DSL parsed (here by gst_parse_launch()) into a runtime element graph. The ! is a Unix-pipe analogue chosen to make the syntax read like a shell pipeline; element properties go inline (x264enc bitrate=2000); an element can be given a name with name= and then referenced later in the pipeline as that name followed by a dot, which is how branches and specific pads are linked explicitly. GStreamer 1.28 (2026-02) added a Whisper-based STT element, an AV1 V4L2 stateful decoder, and Vulkan Video AV1/VP9 decode, all addressable from this same pipeline syntax. It’s the dominant Linux-side multimedia DSL.

  • Determinism is what you’re paying for. A general-purpose language gives you Turing-complete expressiveness for free and no latency bound at any price. DSP DSLs reverse the trade: you give up convenience (you can’t malloc in the audio thread, you can’t iterate an unknown-length container, sometimes you can’t even branch) in exchange for the engine guaranteeing that your code completes in N samples, that the same input produces bit-identical output across CPUs, that there will be no GC pause, no priority inversion, no denormal slowdown. Cmajor and SOUL spent enormous design effort on this. AudioWorklet’s 128-sample render quantum is a hard W3C constant for the same reason. Faust’s pure-functional algebra confines state to delays and explicit ~ recursion specifically because that lets the compiler prove things about timing.

  • The codec-config underbelly. When a Pixel 9 talks to a pair of LE Audio earbuds, what’s actually happening is a negotiation through a parameter-config DSL almost no one outside the Bluetooth SIG sees: SBC’s bitpool / sub-band / block-length values; LC3’s frame-duration (7.5 ms vs 10 ms) and bitrate-per-channel; aptX Adaptive’s variable-rate target between 280 kbps and 420 kbps; aptX Lossless’s switching threshold to ~1.2 Mbps when RF conditions allow; LDAC’s three modes (330 / 660 / 990 kbps). These are all small but real DSLs — encoded as TLV blobs over GATT or as enum/integer parameter sets in standards prose — and the entire Bluetooth-audio quality wars (LDAC vs aptX HD vs aptX Adaptive vs LC3) play out at this layer. iPhones still don’t speak LC3 as of early 2026, which is itself a strategic choice about which codec-config DSL the platform supports.

  • Speech-recognition’s nnet3 config — a small but real DSL. Kaldi’s nnet3 framework lets you describe a neural acoustic model not by writing PyTorch but by composing layers in a .xconfig file: input dim=40 name=input / relu-batchnorm-layer name=tdnn1 dim=625 / output-layer name=output dim=$num_targets and so on, parsed by steps/nnet3/xconfig_to_configs.py into a lower-level Kaldi config that defines the actual DAG of Component objects glued by named edges. It’s a constrained model-description language — not Turing complete, but expressive enough for TDNN, TDNN-F, LSTM, BLSTM, attention layers — and Vosk’s model.conf plus mfcc.conf are the deployment-time companions describing decoding beams, silence phones, feature pipelines (MFCC vs MFCC-hires, optional pitch features). The whole thing is a workable answer to “how do you describe a layered acoustic model declaratively without writing C++?”

  • Where DSP meets ML. TensorFlow Lite Micro’s signal/ ops library (FFT, framing, filterbank, energy, PCAN) was added precisely because the embedded ML community kept reinventing the same DSP front end (mel-spectrogram, log-mel features) in slightly incompatible ways. The ops are now part of the TFLM standard library, schema-described in the TFLM model format, and the line between “classical DSP filter” and “learned filter layer” is dissolving. You can compose a wake-word detector whose audio front end is CMSIS-DSP filtering and whose classifier is a tiny CNN run through TFLM, and the parameter description for both lives in the same model file. Cmajor and Faust will likely follow this convergence: Faust already has a faust2tflite path and Cmajor’s roadmap includes ONNX-style learned-component import.
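To show how little machinery a line-oriented model-description DSL needs, here is a sketch that tokenizes xconfig-style lines of the kind quoted in the nnet3 thread above (a layer type followed by key=value pairs). It is not Kaldi's parser: the function name is hypothetical, the # comment handling is an assumption, and real xconfig processing also resolves variables, validates layer types, and expands descriptors.

```python
def parse_xconfig_line(line: str):
    """Split one xconfig-style line into (layer_type, {key: value}).

    Returns None for blank lines and comment-only lines.
    """
    # Assumption: '#' starts a comment that runs to the end of the line.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    layer_type, *pairs = line.split()
    params = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value  # values stay as strings; no type inference here
    return layer_type, params


if __name__ == "__main__":
    # The three lines are the examples quoted in the text.
    config = """
    input dim=40 name=input
    relu-batchnorm-layer name=tdnn1 dim=625
    output-layer name=output dim=$num_targets
    """
    for raw in config.splitlines():
        parsed = parse_xconfig_line(raw)
        if parsed is not None:
            print(parsed)
```

Keeping values as uninterpreted strings is deliberate in this sketch: a placeholder such as $num_targets is substituted by the surrounding recipe, so the tokenizer should not try to coerce it to an integer.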

Citations