AI / Agent / Prompt Languages Family Index

type: language-family-index
family: ai-prompt-languages
languages_catalogued: 18
tags: [language-reference, family-index, ai, llm, prompt-engineering, agentic]

Type: Family index (Tier 3)
Family: AI / Agent / Prompt Languages
Languages catalogued: 18
Last updated: 2026-05-07

Family overview

Between roughly 2022 (release of ChatGPT) and 2026, an entire sub-discipline of “languages for talking to large language models” has materialised. The trajectory is clear: raw f-string prompts → structured function-style declarations where prompts become typed Python/TypeScript functions (DSPy, BAML, Mirascope, ell, Marvin) → constrained generation that forces token-level conformance to a regex, JSON Schema, or context-free grammar (Outlines, llama-cpp GBNF, Guidance) → optimizer-driven prompt search where the prompt itself is a parameter compiled by a meta-program (DSPy-2 with BootstrapFewShot/MIPRO, TextGrad’s textual gradients). What started as “prompt engineering” has converged on something closer to probabilistic programming.
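
The step from raw f-string prompts to typed function-style declarations can be illustrated with a minimal standard-library sketch. It is not the API of DSPy, BAML, Mirascope, ell, or Marvin; `typed_prompt`, `Sentiment`, and `fake_model` are hypothetical names, and the model is a stub so the example runs offline.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class Sentiment:
    """Output type the caller expects back from the model."""
    label: str
    confidence: float


def typed_prompt(instruction: str, output_type: type, model: Callable[[str], str]):
    """Wrap an instruction so it behaves like a typed function: str -> output_type."""
    schema = {f.name: f.type.__name__ for f in fields(output_type)}

    def call(text: str):
        prompt = (
            f"{instruction}\n"
            f"Reply with JSON matching {json.dumps(schema)}.\n"
            f"Input: {text}"
        )
        raw = json.loads(model(prompt))
        # Enforce the signature: reject missing or extra keys, then coerce types.
        if set(raw) != set(schema):
            raise ValueError(f"expected keys {sorted(schema)}, got {sorted(raw)}")
        return output_type(**{f.name: f.type(raw[f.name]) for f in fields(output_type)})

    return call


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: it always returns this JSON)."""
    return '{"label": "positive", "confidence": "0.9"}'


classify = typed_prompt("Classify the sentiment of the input.", Sentiment, fake_model)
result = classify("I love this library")
print(result)
```

The caller sees an ordinary function returning a `Sentiment`; the prompt text and the output parsing are hidden behind the signature, which is the property the function-style frameworks build on.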

Three rough sub-families are useful to keep distinct. Constrained-output DSLs (Outlines, llama-cpp GBNF, Guidance, partly BAML) intervene in the decoding loop itself, masking logits to enforce a grammar or schema; they are the layer closest to the model. Program-style DSLs (DSPy, LMQL, BAML, ell, Mirascope, Marvin, LangChain LCEL, Semantic Kernel planners) treat LLM calls as composable typed functions, with the host language doing routing, retries, and tool dispatch. Eval / harness DSLs (Promptfoo, lm-evaluation-harness, BIG-bench) describe what good looks like — a YAML or Python config that runs a prompt across a matrix of models, datasets, and assertions, becoming the “unit-test layer” of the stack.
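
To show what "masking logits to enforce a grammar" means at the decoding loop, here is a toy sketch. It assumes a four-token vocabulary, a hand-written finite-state machine, and a stub `toy_logits` function in place of a real model; none of this is the Outlines, GBNF, or Guidance implementation.

```python
import math

VOCAB = ["yes", "no", "maybe", "<eos>"]

# Finite-state machine for the constraint "exactly 'yes' or 'no', then stop":
# state -> {allowed token: next state}.
FSM = {
    0: {"yes": 1, "no": 1},
    1: {"<eos>": 2},
}
FINAL_STATE = 2


def toy_logits(prefix):
    """Hypothetical model scores: it prefers 'maybe', which the grammar forbids."""
    return dict(zip(VOCAB, [0.5, 1.0, 3.0, 0.1]))


def constrained_greedy_decode(logits_fn, fsm, start=0, final=FINAL_STATE):
    """Greedy decoding where tokens outside the FSM's allowed set are masked out."""
    state, output = start, []
    while state != final:
        logits = logits_fn(output)
        allowed = fsm[state]
        # Mask: disallowed tokens get -inf so they can never be selected.
        masked = {
            tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()
        }
        token = max(masked, key=masked.get)
        output.append(token)
        state = allowed[token]
    return output


decoded = constrained_greedy_decode(toy_logits, FSM)
print(decoded)
```

Unconstrained, the stub model would emit "maybe"; with the mask applied, the highest-scoring token that the grammar permits is chosen at each step.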

The field is also notable for what it is not: there is no agreed-upon standard. Anthropic’s tool-use format (originally prompted through XML-style tags, now exposed as JSON tool definitions), OpenAI’s JSON function-calling schema, and Google’s Gemini structured-output spec all coexist; frameworks like LangChain attempt to abstract over them and routinely break when providers change wire formats. Treat anything in this index as a moving target.
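
To make the wire-format divergence concrete, the sketch below renders one neutral tool description into the request shapes used by the `tools` parameter of the OpenAI Chat Completions API and of the Anthropic Messages API, as understood at the time of writing. At the API level both are JSON; XML-style tags are a prompt-level convention. `ToolSpec`, `to_openai`, and `to_anthropic` are hypothetical helper names, and the field layouts should be checked against current provider documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Provider-neutral tool description (hypothetical type for this sketch)."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)  # JSON Schema for the arguments


def to_openai(tool: ToolSpec) -> dict:
    """Shape of one entry in the Chat Completions `tools` list."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.parameters,
        },
    }


def to_anthropic(tool: ToolSpec) -> dict:
    """Shape of one entry in the Messages API `tools` list."""
    return {
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.parameters,
    }


weather = ToolSpec(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

openai_tool = to_openai(weather)
anthropic_tool = to_anthropic(weather)
print(openai_tool)
print(anthropic_tool)
```

The same JSON Schema ends up under `function.parameters` in one format and under `input_schema` in the other. Small differences like this are what a multi-provider abstraction has to track every time a provider revises its format.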

In our deep library

None of these have Tier 1 or Tier 2 deep notes — the family is too new and too volatile for the canonical “core syntax + idioms” treatment most language notes get. Cross-reference:

Python — host for DSPy, LMQL, Outlines, ell, Marvin, Guidance, lm-evaluation-harness, TextGrad
TypeScript — host for Genaiscript, BAML clients, LangChain JS, Mirascope-TS
config-and-dsl — Promptfoo configs sit alongside other YAML/HCL eval harnesses
query languages — KQL-for-Copilot and MSGraph queries are the substrate AI assistants increasingly sit on top of

Tier 3 family table

Language / DSL	First appeared	Origin	Status (2026)	URL
DSPy	2023	Stanford NLP (Omar Khattab)	Active, 2.x; adopted by enterprise	https://dspy.ai/
DSPy-2 / Optimizers (BootstrapFewShot, MIPRO, MIPROv2)	2024	Stanford NLP	Active; the “compile-your-prompt” thesis	https://dspy.ai/learn/optimization/optimizers/
LMQL	2022	ETH Zurich (SRI Lab)	Maintenance; influence absorbed into DSPy/Guidance	https://lmql.ai/
Guidance	2023	Microsoft Research	Active 0.2+; constrained generation + token healing	https://github.com/guidance-ai/guidance
Outlines	2023	.txt / Normal Computing (Rémi Louf)	Active; FSM-based regex/JSON Schema constrained decoding	https://github.com/dottxt-ai/outlines
llama-cpp grammars (GBNF)	2023	ggml / llama.cpp project	Active; de-facto grammar standard for local inference	https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md
Microsoft Semantic Kernel (planner DSL)	2023	Microsoft	Active 1.x; planner deprecated in favour of agent + function-calling APIs	https://learn.microsoft.com/en-us/semantic-kernel/
LangChain LCEL	2023	LangChain Inc.	Active but de-emphasised in v0.3+ in favour of LangGraph	https://python.langchain.com/docs/concepts/lcel/
ell	2024	William Guss	Active; “language model programs as versioned Python functions”	https://github.com/MadcowD/ell
Marvin	2023	Prefect	Active 3.x; declarative LM type-coercion + agents	https://github.com/PrefectHQ/marvin
Promptfoo	2023	Ian Webster	Active; YAML/JS prompt-eval harness, broadly adopted in CI	https://www.promptfoo.dev/
Genaiscript	2024	Microsoft (DevDiv)	Active; JS/TS-based prompt scripts with VS Code integration	https://microsoft.github.io/genaiscript/
Anthropic XML tool-call style	2023	Anthropic	Convention, not a language — the prompting style for tool use in Claude prompts	https://docs.claude.com/en/docs/build-with-claude/tool-use/overview
Mirascope	2024	Mirascope (William Bakst)	Active; typed prompt library, Pydantic-native	https://mirascope.com/
BAML	2024	Boundary	Active; standalone DSL with Rust compiler, generates Py/TS clients	https://docs.boundaryml.com/
TextGrad	2024	Stanford (James Zou lab)	Active research; “textual gradients” for prompt optimisation	https://github.com/zou-group/textgrad
BIG-bench / BIG-Bench Hard	2022 / 2023	Google + 450-author collab; Suzgun et al. for BBH	Reference benchmark; eval DSL via JSON task specs	https://github.com/google/BIG-bench
lm-evaluation-harness	2021	EleutherAI	Active; the de-facto eval-DSL for open-weights LLMs (Hugging Face Open LLM Leaderboard substrate)	https://github.com/EleutherAI/lm-evaluation-harness

Adjacent: AI assistants on top of query DSLs

Not languages of their own, but worth noting because AI agents increasingly generate them:

Surface	Underlying DSL	Notes
Microsoft Copilot for Security / Sentinel	KQL (Kusto Query Language)	Copilot translates NL questions into KQL over Defender/Sentinel logs
MSGraph natural-language queries	MSGraph query syntax (OData $filter, $select)	M365 Copilot lowers NL into MSGraph calls; sits on top of `query` family
Text-to-SQL agents (Vanna, sqlcoder, OpenAI assistants)	SQL	The original text-to-DSL pattern; cross-ref query
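
As an illustration of the "lowering" step in the table above, the following sketch turns a structured intent (what an assistant might extract from a natural-language question) into a Microsoft Graph request URL using the OData `$filter` and `$select` query options. The `build_graph_url` helper and the example intent are hypothetical; how any particular Copilot performs this step is not documented here.

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_graph_url(resource: str, filter_expr: str, select: list[str]) -> str:
    """Assemble a Graph URL from a resource, an OData filter, and selected fields."""
    params = {"$filter": filter_expr, "$select": ",".join(select)}
    # Keep OData punctuation readable; everything else is percent-encoded.
    query = urlencode(params, safe="$(),'", quote_via=quote)
    return f"{GRAPH_BASE}/{resource}?{query}"


# Hypothetical intent for: "Show name and mail of users whose name starts with A".
url = build_graph_url(
    resource="users",
    filter_expr="startswith(displayName,'A')",
    select=["displayName", "mail"],
)
print(url)
```

Generating a small structured intent and building the query deterministically, rather than letting the model write the full URL, keeps the model's output easy to validate before anything is sent.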

Notable threads

Prompts → typed function signatures. DSPy, BAML, Mirascope, and ell all converged on the same idea: a prompt is the body of a function whose signature (input/output types) the framework should enforce. This made few-shot example bootstrapping mechanical: pick examples that match the signature, run them through the model, score, keep what works.
The DSPy “compile your prompt” thesis. DSPy-2’s optimizers (BootstrapFewShot, COPRO, MIPRO, MIPROv2) treat the prompt (instructions and demonstrations) as parameters to be searched. Compile a program against a small labelled set and a metric, and get back optimised prompts. By 2026 this is one of the most cited methodologies for production prompt tuning that doesn’t require fine-tuning weights.
Constrained decoding as a hardware-friendly invariant. Outlines, llama-cpp GBNF, and Guidance all manipulate logit masks to force grammar conformance. Because much of the constraint can be precomputed (an FSM index over the vocabulary for Outlines, incremental parser state for GBNF), the per-token overhead is small and it composes with batched inference. This is now standard in vLLM, TGI, and llama.cpp servers.
Eval harnesses became the unit-test layer. Promptfoo’s YAML configs and lm-evaluation-harness’s task specs are the closest the LLM ecosystem has to pytest. Promptfoo CI gates are common in 2026; “regression on eval suite” is now a blocking PR check at multiple AI-first companies.
Framework abstraction vs. direct API tension. LangChain LCEL and Semantic Kernel planners promised provider-agnostic abstractions, but provider feature drift (function calling, structured outputs, prompt caching, vision, computer-use, MCP) has made thin wrappers (BAML, Mirascope, ell, Genaiscript) more popular. The pattern: bind tightly to one or two providers, expose a single typed entry point per LLM call, ditch the multi-provider abstraction.
AI-IDE-native DSLs. BAML ships a Rust compiler with VS Code integration that gives prompts the ergonomics of a real language: type errors at edit time, jump-to-definition, generated client SDKs. Genaiscript leans further: prompts are first-class JS/TS files that import like modules. The bet is that prompts deserve their own file type, not just strings inside Python.
Anthropic XML tool-call convention as folk-DSL. Claude prompts using <tool_name>...</tool_name>, <thinking>, <example> tags are not a formal language but operate as one in practice: Anthropic’s documentation recommends XML tags for structuring prompts, although it prescribes no canonical tag set, and tool calls themselves travel through the API as JSON tool-use blocks rather than as parsed tags. Worth tracking as a convention that became infrastructure.
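
The optimizer and eval-harness threads above share one mechanism: run candidate prompts over a labelled set, score them with a metric, and act on the score. The sketch below shows that loop in its simplest form, assuming a stub model and exhaustive search over two hand-written candidates. It is not DSPy's optimizer API or Promptfoo's config format, and all names (`stub_model`, `pick_best`, `THRESHOLD`) are hypothetical.

```python
# Small labelled set: (input, expected output).
DEVSET = [("2+2", "4"), ("3+5", "8"), ("10-7", "3")]
ANSWERS = dict(DEVSET)


def stub_model(prompt: str) -> str:
    """Hypothetical model: terse only when the instruction asks for it."""
    question = prompt.rsplit("\n", 1)[-1]
    if "number only" in prompt:
        return ANSWERS[question]
    return f"The answer is {ANSWERS[question]}."


def exact_match(output: str, expected: str) -> bool:
    """Metric: the assertion each eval case must satisfy."""
    return output.strip() == expected


def score(instruction, model, devset, metric) -> float:
    """Fraction of dev-set cases that pass the metric for one instruction."""
    hits = sum(metric(model(f"{instruction}\n{x}"), y) for x, y in devset)
    return hits / len(devset)


def pick_best(candidates, model, devset, metric):
    """Treat the instruction as a parameter: score every candidate, keep the best."""
    scored = {c: score(c, model, devset, metric) for c in candidates}
    best = max(scored, key=scored.get)
    return best, scored


CANDIDATES = [
    "Solve the problem.",
    "Solve the problem. Reply with the number only.",
]
best, scores = pick_best(CANDIDATES, stub_model, DEVSET, exact_match)

# A CI-style regression gate: fail the build if the chosen prompt scores too low.
THRESHOLD = 0.9
assert scores[best] >= THRESHOLD, "eval regression"
print(best, scores)
```

Real optimizers replace the hand-written candidate list with generated instructions and bootstrapped demonstrations, and real harnesses replace the dev set with a matrix of models, datasets, and assertions. The score-and-select structure stays the same.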
