AI / Agent / Prompt Languages Family Index
type: language-family-index family: ai-prompt-languages languages_catalogued: 18 tags: [language-reference, family-index, ai, llm, prompt-engineering, agentic]
AI / Agent / Prompt Languages — Family Index
- Type: Family index (Tier 3)
- Family: AI / Agent / Prompt Languages
- Languages catalogued: 18
- Last updated: 2026-05-07
Family overview
Between roughly 2022 (release of ChatGPT) and 2026, an entire sub-discipline of “languages for talking to large language models” has materialised. The trajectory is clear: raw f-string prompts → structured function-style declarations where prompts become typed Python/TypeScript functions (DSPy, BAML, Mirascope, ell, Marvin) → constrained generation that forces token-level conformance to a regex, JSON Schema, or context-free grammar (Outlines, llama-cpp GBNF, Guidance) → optimizer-driven prompt search where the prompt itself is a parameter compiled by a meta-program (DSPy-2 with BootstrapFewShot/MIPRO, TextGrad’s textual gradients). What started as “prompt engineering” has converged on something closer to probabilistic programming.
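The first step of that trajectory, from a raw f-string to a typed function, can be illustrated with a minimal sketch. It does not use the API of DSPy, BAML, Mirascope, ell, or Marvin; `Sentiment`, `classify`, `ModelFn`, and `fake_model` are hypothetical names introduced only for illustration, and the model is a canned stub so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model client: prompt text in, completion text out.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class Sentiment:
    label: str         # expected to be "positive" or "negative"
    confidence: float  # expected to lie in [0, 1]


def raw_prompt(review: str) -> str:
    # Stage 1: a bare f-string; whatever comes back is untyped text.
    return f"Classify the sentiment of this review: {review}"


def classify(review: str, model: ModelFn) -> Sentiment:
    # Stage 2: the prompt is the body of a typed function, and the reply
    # is parsed and validated against the declared return type.
    prompt = (
        "Classify the sentiment of the review. "
        'Reply with JSON like {"label": "positive", "confidence": 0.9}.\n'
        f"Review: {review}"
    )
    data = json.loads(model(prompt))
    result = Sentiment(label=str(data["label"]), confidence=float(data["confidence"]))
    if result.label not in {"positive", "negative"}:
        raise ValueError(f"unexpected label: {result.label!r}")
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    return result


def fake_model(prompt: str) -> str:
    # Canned reply (an assumption for the demo); a real client would call a provider.
    return '{"label": "positive", "confidence": 0.87}'


result = classify("Great battery life.", fake_model)
print(result)
```

The design point is that callers depend on the signature `classify(review) -> Sentiment` rather than on the prompt text, so the prompt can change without touching call sites.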
Three rough sub-families are useful to keep distinct. Constrained-output DSLs (Outlines, llama-cpp GBNF, Guidance, partly BAML) intervene in the decoding loop itself, masking logits to enforce a grammar or schema; they are the layer closest to the model. Program-style DSLs (DSPy, LMQL, BAML, ell, Mirascope, Marvin, LangChain LCEL, Semantic Kernel planners) treat LLM calls as composable typed functions, with the host language doing routing, retries, and tool dispatch. Eval / harness DSLs (Promptfoo, lm-evaluation-harness, BIG-bench) describe what good looks like — a YAML or Python config that runs a prompt across a matrix of models, datasets, and assertions, becoming the “unit-test layer” of the stack.
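To make the logit-masking idea behind the constrained-output sub-family concrete, here is a toy sketch. It is not how Outlines, GBNF, or Guidance are implemented; the five-token vocabulary, the hand-written state machine, and `fake_logits` are all assumptions chosen so the example stays small and runs without a model.

```python
import math

# Toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["maybe", "yes", "no", "the", "<eos>"]

# Hand-written state machine for the grammar ("yes" | "no") "<eos>".
# state -> {token: next_state}; state 2 is the accepting state.
FSM = {0: {"yes": 1, "no": 1}, 1: {"<eos>": 2}}
ACCEPT = 2

# Precompute the allowed token ids per state once, before decoding starts.
ALLOWED = {
    state: {VOCAB.index(tok) for tok in edges} for state, edges in FSM.items()
}


def fake_logits(prefix):
    # Stand-in for a model forward pass: it always prefers "maybe", then "the".
    return [3.0, 1.0, 0.5, 2.0, 0.1]


def constrained_greedy(logits_fn, max_steps=8):
    state, out = 0, []
    for _ in range(max_steps):
        if state == ACCEPT:
            break
        logits = logits_fn(out)
        # Mask every token the grammar forbids in the current state.
        masked = [
            score if i in ALLOWED[state] else -math.inf
            for i, score in enumerate(logits)
        ]
        tok_id = max(range(len(masked)), key=masked.__getitem__)
        token = VOCAB[tok_id]
        out.append(token)
        state = FSM[state][token]
    return out


print(constrained_greedy(fake_logits))
```

Without the mask, greedy decoding over these logits would emit `maybe`; with it, the output can only be a string the state machine accepts.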
The field is also notable for what it is not: there is no agreed-upon standard. Anthropic’s XML-style tool-call convention, OpenAI’s JSON function-calling schema, and Google’s Gemini structured-output spec all coexist; frameworks like LangChain attempt to abstract over them and routinely break when providers change wire formats. Treat anything in this index as a moving target.
In our deep library
None of these have Tier 1 or Tier 2 deep notes — the family is too new and too volatile for the canonical “core syntax + idioms” treatment most language notes get. Cross-reference:
- Python — host for DSPy, LMQL, Outlines, ell, Marvin, Guidance, lm-evaluation-harness, TextGrad
- TypeScript — host for Genaiscript, BAML clients, LangChain JS, Mirascope-TS
- config-and-dsl — Promptfoo configs sit alongside other YAML/HCL eval harnesses
- query languages — KQL-for-Copilot and MSGraph queries are the substrate AI assistants increasingly sit on top of
Tier 3 family table
| Language / DSL | First appeared | Origin | Status (2026) | URL |
|---|---|---|---|---|
| DSPy | 2023 | Stanford NLP (Omar Khattab) | Active, 2.x; adopted by enterprise | https://dspy.ai/ |
| DSPy-2 / Optimizers (BootstrapFewShot, MIPRO, MIPROv2) | 2024 | Stanford NLP | Active; the “compile-your-prompt” thesis | https://dspy.ai/learn/optimization/optimizers/ |
| LMQL | 2022 | ETH Zurich (SRI Lab) | Maintenance; influence absorbed into DSPy/Guidance | https://lmql.ai/ |
| Guidance | 2023 | Microsoft Research | Active 0.2+; constrained generation + token healing | https://github.com/guidance-ai/guidance |
| Outlines | 2023 | .txt / Normal Computing (Rémi Louf) | Active; FSM-based regex/JSON Schema constrained decoding | https://github.com/dottxt-ai/outlines |
| llama-cpp grammars (GBNF) | 2023 | ggml / llama.cpp project | Active; de-facto grammar standard for local inference | https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md |
| Microsoft Semantic Kernel (planner DSL) | 2023 | Microsoft | Active 1.x; planner deprecated in favour of agent + function-calling APIs | https://learn.microsoft.com/en-us/semantic-kernel/ |
| LangChain LCEL | 2023 | LangChain Inc. | Active but de-emphasised in v0.3+ in favour of LangGraph | https://python.langchain.com/docs/concepts/lcel/ |
| ell | 2024 | William Guss | Active; “language model programs as versioned Python functions” | https://github.com/MadcowD/ell |
| Marvin | 2023 | Prefect | Active 3.x; declarative LM type-coercion + agents | https://github.com/PrefectHQ/marvin |
| Promptfoo | 2023 | Ian Webster | Active; YAML/JS prompt-eval harness, broadly adopted in CI | https://www.promptfoo.dev/ |
| Genaiscript | 2024 | Microsoft (DevDiv) | Active; JS/TS-based prompt scripts with VS Code integration | https://microsoft.github.io/genaiscript/ |
| Anthropic XML tool-call style | 2023 | Anthropic | Convention, not a language — the prompting style for tool use in Claude prompts | https://docs.claude.com/en/docs/build-with-claude/tool-use/overview |
| Mirascope | 2024 | Mirascope (William Bakst) | Active; typed prompt library, Pydantic-native | https://mirascope.com/ |
| BAML | 2024 | Boundary | Active; standalone DSL with Rust compiler, generates Py/TS clients | https://docs.boundaryml.com/ |
| TextGrad | 2024 | Stanford (James Zou lab) | Active research; “textual gradients” for prompt optimisation | https://github.com/zou-group/textgrad |
| BIG-bench / BIG-Bench Hard | 2022 / 2023 | Google + 450-author collab; Suzgun et al. for BBH | Reference benchmark; eval DSL via JSON task specs | https://github.com/google/BIG-bench |
| lm-evaluation-harness | 2021 | EleutherAI | Active; the de-facto eval-DSL for open-weights LLMs (Hugging Face Open LLM Leaderboard substrate) | https://github.com/EleutherAI/lm-evaluation-harness |
Adjacent: AI assistants on top of query DSLs
Not languages of their own, but worth noting because AI agents increasingly generate them:
| Surface | Underlying DSL | Notes |
|---|---|---|
| Microsoft Copilot for Security / Sentinel | KQL (Kusto Query Language) | Copilot translates NL questions into KQL over Defender/Sentinel logs |
| MSGraph natural-language queries | Microsoft Graph query syntax (OData query parameters such as `$select` and `$filter`) | M365 Copilot lowers NL into MSGraph calls; sits on top of the query languages family |
| Text-to-SQL agents (Vanna, sqlcoder, OpenAI assistants) | SQL | The original text-to-DSL pattern; cross-reference the query languages family |
Notable threads
- Prompts → typed function signatures. DSPy, BAML, Mirascope, and ell all converged on the same idea: a prompt is the body of a function whose signature (input/output types) the framework should enforce. This made few-shot example bootstrapping mechanical: pick examples that match the signature, run them through the model, score, keep what works.
- The DSPy “compile your prompt” thesis. DSPy-2’s optimizers (BootstrapFewShot, COPRO, MIPRO, MIPROv2) treat the prompt, meaning its instructions and demonstrations, as parameters to be searched. Compile a program against a small labelled set and a metric, and get back optimised prompts. By 2026 this is among the most cited methodologies for production prompt tuning that does not require fine-tuning weights.
- Constrained decoding as a hardware-friendly invariant. Outlines, llama-cpp GBNF, and Guidance all manipulate logit masks to force grammar conformance. Because the constraint can be precomputed (an FSM for Outlines, parser state for GBNF), the per-token runtime cost is small and the technique composes with batched inference. Constrained decoding of this kind is now standard in vLLM, TGI, and llama.cpp servers.
- Eval harnesses became the unit-test layer. Promptfoo’s YAML configs and lm-evaluation-harness’s task specs are the closest the LLM ecosystem has to `pytest`. Promptfoo CI gates are common in 2026; “regression on eval suite” is now a blocking PR check at multiple AI-first companies.
- Framework abstraction vs. direct API tension. LangChain LCEL and Semantic Kernel planners promised provider-agnostic abstractions, but provider feature drift (function calling, structured outputs, prompt caching, vision, computer-use, MCP) has made thin wrappers (BAML, Mirascope, ell, Genaiscript) more popular. The pattern: bind tightly to one or two providers, expose a single typed entry point per LLM call, ditch the multi-provider abstraction.
- AI-IDE-native DSLs. BAML ships a Rust compiler with VS Code integration that gives prompts the ergonomics of a real language: type errors at edit time, jump-to-definition, generated client SDKs. Genaiscript leans further: prompts are first-class JS/TS files that import like modules. The bet is that prompts deserve their own file type, not just strings inside Python.
- Anthropic XML tool-call convention as folk-DSL. Claude prompts using `<tool_name>...</tool_name>`, `<thinking>`, and `<example>` tags are not a formal language but operate as one in practice: there is a documented grammar, IDE support (Claude Code, MCP servers), and frameworks like Anthropic’s official SDKs parse these tags. Worth tracking as a convention that became infrastructure.
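The “compile your prompt” thread above can be sketched as a search over candidate instructions scored against a small labelled set. This is a deliberately simplified brute-force search, not BootstrapFewShot or MIPRO and not the DSPy API; `stub_model`, `build_prompt`, `score`, `compile_prompt`, and the data are hypothetical, and the stub stands in for a real model so the example is deterministic.

```python
from typing import Callable, Sequence

# Small labelled set of (input, expected output) pairs (illustrative data).
TRAINSET = [("France", "Paris"), ("Japan", "Tokyo"), ("Kenya", "Nairobi")]
CAPITALS = dict(TRAINSET)


def stub_model(prompt: str) -> str:
    # Stand-in for an LLM: it answers tersely only when told to use one word.
    country = prompt.rsplit("Country: ", 1)[1].strip()
    answer = CAPITALS[country]
    if "one word" in prompt.lower():
        return answer
    return f"The capital is {answer}."


def build_prompt(instruction: str, country: str) -> str:
    return f"{instruction}\nCountry: {country}"


def score(instruction: str, model: Callable[[str], str], dataset) -> float:
    # Metric: exact-match accuracy over the labelled set.
    hits = sum(model(build_prompt(instruction, x)) == y for x, y in dataset)
    return hits / len(dataset)


def compile_prompt(candidates: Sequence[str], model, dataset) -> str:
    # Treat the instruction as a parameter: keep the highest-scoring candidate.
    return max(candidates, key=lambda c: score(c, model, dataset))


CANDIDATES = [
    "Name the capital city.",
    "Name the capital city. Answer in one word.",
]

best = compile_prompt(CANDIDATES, stub_model, TRAINSET)
print(best, score(best, stub_model, TRAINSET))
```

The same `score` function doubles as a regression check: rerunning it after a prompt or model change is the unit-test pattern the eval-harness thread describes.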
Citations
- https://dspy.ai/
- https://dspy.ai/learn/optimization/optimizers/
- https://lmql.ai/
- https://github.com/guidance-ai/guidance
- https://github.com/dottxt-ai/outlines
- https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md
- https://learn.microsoft.com/en-us/semantic-kernel/
- https://python.langchain.com/docs/concepts/lcel/
- https://github.com/MadcowD/ell
- https://github.com/PrefectHQ/marvin
- https://www.promptfoo.dev/
- https://microsoft.github.io/genaiscript/
- https://docs.claude.com/en/docs/build-with-claude/tool-use/overview
- https://mirascope.com/
- https://docs.boundaryml.com/
- https://github.com/zou-group/textgrad
- https://github.com/google/BIG-bench
- https://github.com/EleutherAI/lm-evaluation-harness
- https://learn.microsoft.com/en-us/kusto/query/
- https://learn.microsoft.com/en-us/graph/query-parameters