AI / Agent / Prompt Languages Family Index


type: language-family-index
family: ai-prompt-languages
languages_catalogued: 18
tags: [language-reference, family-index, ai, llm, prompt-engineering, agentic]

AI / Agent / Prompt Languages — Family Index

  • Type: Family index (Tier 3)
  • Family: AI / Agent / Prompt Languages
  • Languages catalogued: 18
  • Last updated: 2026-05-07

Family overview

Between roughly 2022 (release of ChatGPT) and 2026, an entire sub-discipline of “languages for talking to large language models” has materialised. The trajectory is clear: raw f-string prompts → structured function-style declarations where prompts become typed Python/TypeScript functions (DSPy, BAML, Mirascope, ell, Marvin) → constrained generation that forces token-level conformance to a regex, JSON Schema, or context-free grammar (Outlines, llama-cpp GBNF, Guidance) → optimizer-driven prompt search where the prompt itself is a parameter compiled by a meta-program (DSPy-2 with BootstrapFewShot/MIPRO, TextGrad’s textual gradients). What started as “prompt engineering” has converged on something closer to probabilistic programming.
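The step from raw f-strings to typed prompt functions can be sketched in plain Python. This is a minimal illustration, not the API of DSPy, BAML, Mirascope, ell, or Marvin: the names `Sentiment`, `render_prompt`, `parse_sentiment`, `classify`, and `fake_model` are all hypothetical, and the model is a stub so the sketch runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sentiment:
    """Output type of the hypothetical classify() prompt function."""
    label: str
    confidence: float


ALLOWED_LABELS = {"positive", "negative", "neutral"}


def render_prompt(review: str) -> str:
    """The prompt is the function body; the signature is str -> Sentiment."""
    return (
        "Classify the sentiment of the review.\n"
        'Reply with JSON: {"label": "positive|negative|neutral", '
        '"confidence": 0.0-1.0}\n'
        f"Review: {review}"
    )


def parse_sentiment(raw: str) -> Sentiment:
    """Enforce the output type; raise ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    label = data["label"]
    confidence = float(data["confidence"])
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Sentiment(label=label, confidence=confidence)


def classify(review: str, complete: Callable[[str], str], retries: int = 2) -> Sentiment:
    """Call the model and retry while the reply fails validation."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return parse_sentiment(complete(render_prompt(review)))
        except (ValueError, KeyError, TypeError) as exc:
            last_error = exc
    raise RuntimeError(f"no valid reply after {retries + 1} attempts") from last_error


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM client (assumption: any str -> str callable)."""
    return '{"label": "positive", "confidence": 0.9}'


if __name__ == "__main__":
    print(classify("Great phone, battery lasts for days.", fake_model))
```

The design point is that callers see only `classify(review) -> Sentiment`; the prompt text, parsing, and retry policy are implementation details that a framework (or an optimizer) can change without touching call sites.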

Three rough sub-families are useful to keep distinct. Constrained-output DSLs (Outlines, llama-cpp GBNF, Guidance, partly BAML) intervene in the decoding loop itself, masking logits to enforce a grammar or schema; they are the layer closest to the model. Program-style DSLs (DSPy, LMQL, BAML, ell, Mirascope, Marvin, LangChain LCEL, Semantic Kernel planners) treat LLM calls as composable typed functions, with the host language doing routing, retries, and tool dispatch. Eval / harness DSLs (Promptfoo, lm-evaluation-harness, BIG-bench) describe what good looks like — a YAML or Python config that runs a prompt across a matrix of models, datasets, and assertions, becoming the “unit-test layer” of the stack.
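The logit-masking idea behind the constrained-output sub-family can be shown with a toy decoder. The sketch below is an assumption-laden illustration, not how Outlines, Guidance, or llama.cpp are implemented: the seven-token vocabulary, the hand-written DFA, and the `fake_logits` stub are all hypothetical. It only demonstrates the core move, which is setting the logits of grammar-violating tokens to negative infinity before picking the next token.

```python
import math

# Hypothetical token-level vocabulary.
VOCAB = ["{", "}", '"ok"', ":", "true", "false", "hello"]

# DFA for the tiny language {"ok":true} | {"ok":false}.
# Maps state -> {allowed token: next state}.
TRANSITIONS = {
    0: {"{": 1},
    1: {'"ok"': 2},
    2: {":": 3},
    3: {"true": 4, "false": 4},
    4: {"}": 5},
}
ACCEPT_STATE = 5


def fake_logits(prefix):
    """Stand-in for a model: it always prefers 'hello', then 'false'."""
    scores = {"hello": 5.0, "false": 2.0, "true": 1.5}
    return [scores.get(token, 1.0) for token in VOCAB]


def mask_logits(logits, state):
    """Keep logits for tokens the DFA allows in this state; mask the rest."""
    allowed = TRANSITIONS[state]
    return [
        logit if token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]


def constrained_greedy_decode(max_steps=10):
    """Greedy decoding where every step is restricted by the DFA."""
    state, output = 0, []
    for _ in range(max_steps):
        if state == ACCEPT_STATE:
            break
        masked = mask_logits(fake_logits(output), state)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        output.append(token)
        state = TRANSITIONS[state][token]
    return "".join(output)


if __name__ == "__main__":
    print(constrained_greedy_decode())  # {"ok":false}
```

Without the mask, the stub model would emit `hello` at every step; with it, the output is always a member of the grammar, and the model's preferences only decide between the alternatives the grammar leaves open (here `true` versus `false`).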

The field is also notable for what it is not: there is no agreed-upon standard. Anthropic’s tool-use format and XML-tag prompting convention, OpenAI’s JSON function-calling schema, and Google’s Gemini structured-output spec all coexist; frameworks like LangChain attempt to abstract over them and tend to break when providers change wire formats. Treat anything in this index as a moving target.
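To make the lack of a shared standard concrete, here is a small sketch that lowers one provider-neutral tool description into two provider-specific shapes. `ToolSpec` and both converter functions are hypothetical names. The dictionary layouts reflect the OpenAI Chat Completions `tools` entry and the Anthropic Messages `tools` entry as understood at the time of writing; treat them as assumptions and check the current provider documentation before relying on them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    """Provider-neutral tool description (hypothetical, for illustration)."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)  # a JSON Schema object


def to_openai_chat_tool(spec: ToolSpec) -> dict:
    """Assumed OpenAI Chat Completions shape: schema nested under 'function'."""
    return {
        "type": "function",
        "function": {
            "name": spec.name,
            "description": spec.description,
            "parameters": spec.parameters,
        },
    }


def to_anthropic_tool(spec: ToolSpec) -> dict:
    """Assumed Anthropic Messages shape: flat dict with 'input_schema'."""
    return {
        "name": spec.name,
        "description": spec.description,
        "input_schema": spec.parameters,
    }


WEATHER = ToolSpec(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

if __name__ == "__main__":
    print(to_openai_chat_tool(WEATHER))
    print(to_anthropic_tool(WEATHER))
```

The JSON Schema payload is identical in both cases; only the envelope differs. That envelope is the part an abstraction layer has to track, and the part that breaks when a provider revises its wire format.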

In our deep library

None of these have Tier 1 or Tier 2 deep notes — the family is too new and too volatile for the canonical “core syntax + idioms” treatment most language notes get. Cross-reference:

  • Python — host for DSPy, LMQL, Outlines, ell, Marvin, Guidance, lm-evaluation-harness, TextGrad
  • TypeScript — host for GenAIScript, BAML clients, LangChain JS, Mirascope-TS
  • config-and-dsl — Promptfoo configs sit alongside other YAML/HCL eval harnesses
  • query languages — KQL-for-Copilot and MSGraph queries are the substrate AI assistants increasingly sit on top of

Tier 3 family table

Language / DSL | First appeared | Origin | Status (2026) | URL
DSPy | 2023 | Stanford NLP (Omar Khattab) | Active, 2.x; adopted by enterprise | https://dspy.ai/
DSPy-2 / Optimizers (BootstrapFewShot, MIPRO, MIPROv2) | 2024 | Stanford NLP | Active; the “compile-your-prompt” thesis | https://dspy.ai/learn/optimization/optimizers/
LMQL | 2022 | ETH Zurich (SRI Lab) | Maintenance; influence absorbed into DSPy/Guidance | https://lmql.ai/
Guidance | 2023 | Microsoft Research | Active 0.2+; constrained generation + token healing | https://github.com/guidance-ai/guidance
Outlines | 2023 | .txt / Normal Computing (Rémi Louf) | Active; FSM-based regex/JSON Schema constrained decoding | https://github.com/dottxt-ai/outlines
llama-cpp grammars (GBNF) | 2023 | ggml / llama.cpp project | Active; de-facto grammar standard for local inference | https://github.com/ggml-org/llama.cpp/blob/master/grammars/README.md
Microsoft Semantic Kernel (planner DSL) | 2023 | Microsoft | Active 1.x; planner deprecated in favour of agent + function-calling APIs | https://learn.microsoft.com/en-us/semantic-kernel/
LangChain LCEL | 2023 | LangChain Inc. | Active but de-emphasised in v0.3+ in favour of LangGraph | https://python.langchain.com/docs/concepts/lcel/
ell | 2024 | William Guss | Active; “language model programs as versioned Python functions” | https://github.com/MadcowD/ell
Marvin | 2023 | Prefect | Active 3.x; declarative LM type-coercion + agents | https://github.com/PrefectHQ/marvin
Promptfoo | 2023 | Ian Webster | Active; YAML/JS prompt-eval harness, broadly adopted in CI | https://www.promptfoo.dev/
GenAIScript | 2024 | Microsoft (DevDiv) | Active; JS/TS-based prompt scripts with VS Code integration | https://microsoft.github.io/genaiscript/
Anthropic XML tool-call style | 2023 | Anthropic | Convention, not a language: the prompting style for tool use in Claude prompts | https://docs.claude.com/en/docs/build-with-claude/tool-use/overview
Mirascope | 2024 | Mirascope (William Bakst) | Active; typed prompt library, Pydantic-native | https://mirascope.com/
BAML | 2024 | Boundary | Active; standalone DSL with Rust compiler, generates Py/TS clients | https://docs.boundaryml.com/
TextGrad | 2024 | Stanford (James Zou lab) | Active research; “textual gradients” for prompt optimisation | https://github.com/zou-group/textgrad
BIG-bench / BIG-Bench Hard | 2022 / 2023 | Google + 450-author collab; Suzgun et al. for BBH | Reference benchmark; eval DSL via JSON task specs | https://github.com/google/BIG-bench
lm-evaluation-harness | 2021 | EleutherAI | Active; the de-facto eval-DSL for open-weights LLMs (Hugging Face Open LLM Leaderboard substrate) | https://github.com/EleutherAI/lm-evaluation-harness

Adjacent: AI assistants on top of query DSLs

Not languages of their own, but worth noting because AI agents increasingly generate them:

Surface | Underlying DSL | Notes
Microsoft Copilot for Security / Sentinel | KQL (Kusto Query Language) | Copilot translates NL questions into KQL over Defender/Sentinel logs
MSGraph natural-language queries | MSGraph query syntax (OData select) | M365 Copilot lowers NL into MSGraph calls; sits on top of query family
Text-to-SQL agents (Vanna, sqlcoder, OpenAI assistants) | SQL | The original text-to-DSL pattern; cross-ref query

Notable threads

  • Prompts → typed function signatures. DSPy, BAML, Mirascope, and ell all converged on the same idea: a prompt is the body of a function whose signature (input/output types) the framework should enforce. This made few-shot example bootstrapping mechanical: pick examples that match the signature, run them through the model, score, keep what works.
  • The DSPy “compile your prompt” thesis. DSPy-2’s optimizers (BootstrapFewShot, COPRO, MIPRO, MIPROv2) treat the prompt (instructions and demonstrations) as parameters to be searched. Compile a program against a small labelled set, get back optimised prompts. By 2026 this is one of the most frequently cited methodologies for production prompt tuning that doesn’t require fine-tuning weights.
  • Constrained decoding as a hardware-friendly invariant. Outlines, llama-cpp GBNF, and Guidance all manipulate logit masks to force grammar conformance. Because much of the constraint can be precomputed or tracked incrementally (an FSM index for Outlines, parser state for GBNF), the per-token cost is small and the technique composes with batched inference. Structured-output support of this kind now ships in vLLM, TGI, and the llama.cpp server.
  • Eval harnesses became the unit-test layer. Promptfoo’s YAML configs and lm-evaluation-harness’s task specs are the closest the LLM ecosystem has to pytest. Promptfoo CI gates are common in 2026; “regression on eval suite” is now a blocking PR check at multiple AI-first companies.
  • Framework abstraction vs. direct API tension. LangChain LCEL and Semantic Kernel planners promised provider-agnostic abstractions, but provider feature drift (function calling, structured outputs, prompt caching, vision, computer-use, MCP) has made thin wrappers (BAML, Mirascope, ell, GenAIScript) more popular. The pattern: bind tightly to one or two providers, expose a single typed entry point per LLM call, ditch the multi-provider abstraction.
  • AI-IDE-native DSLs. BAML ships a Rust compiler with VS Code integration that gives prompts the ergonomics of a real language: type errors at edit time, jump-to-definition, generated client SDKs. GenAIScript leans further: prompts are first-class JS/TS files that import like modules. The bet is that prompts deserve their own file type, not just strings inside Python.
  • Anthropic XML tool-call convention as folk-DSL. Claude prompts using <tool_name>...</tool_name>, <thinking>, <example> tags are not a formal language but operate as one in practice: Anthropic’s prompting documentation recommends XML tags for structuring prompts, although it defines no canonical tag set or formal grammar. Structured tool use itself goes through the API’s JSON tool definitions and tool_use / tool_result content blocks, not through tag parsing in the SDKs. Worth tracking as a convention that became infrastructure.
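The bootstrapping and eval-gate threads above can be combined into one toy sketch. It is written in the spirit of few-shot bootstrapping and is not DSPy’s implementation or API: `bootstrap_demos`, `evaluate`, `build_prompt`, and `toy_model` are hypothetical names, the data is invented for illustration, and the stub model only knows addition so that the filtering step has something to reject.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)

# Invented toy data, for illustration only.
TRAIN: List[Example] = [("2+2", "4"), ("10-7", "3"), ("3+5", "8")]
DEV: List[Example] = [("1+1", "2"), ("4+4", "8")]


def exact_match(prediction: str, gold: str) -> bool:
    return prediction.strip() == gold


def build_prompt(instruction: str, demos: List[Example], question: str) -> str:
    """Instruction, then demonstrations, then the open question."""
    lines = [instruction]
    for demo_q, demo_a in demos:
        lines.append(f"Q: {demo_q}\nA: {demo_a}")
    lines.append(f"Q: {question}\nA:")
    return "\n".join(lines)


def bootstrap_demos(instruction: str, program: Callable[[str], str],
                    trainset: List[Example], metric, max_demos: int = 3) -> List[Example]:
    """Run the program on training inputs; keep outputs the metric accepts."""
    demos: List[Example] = []
    for question, gold in trainset:
        prediction = program(build_prompt(instruction, demos, question))
        if metric(prediction, gold):
            demos.append((question, prediction))
        if len(demos) >= max_demos:
            break
    return demos


def evaluate(instruction: str, demos: List[Example], program: Callable[[str], str],
             devset: List[Example], metric) -> float:
    """Fraction of dev examples the prompted program gets right."""
    hits = sum(
        metric(program(build_prompt(instruction, demos, question)), gold)
        for question, gold in devset
    )
    return hits / len(devset)


def toy_model(prompt: str) -> str:
    """Stand-in model: answers additions, returns 'unknown' otherwise."""
    question = prompt.rsplit("Q: ", 1)[1].split("\n", 1)[0]
    if "+" in question:
        left, right = question.split("+")
        return str(int(left) + int(right))
    return "unknown"


if __name__ == "__main__":
    instruction = "Answer with a single number."
    demos = bootstrap_demos(instruction, toy_model, TRAIN, exact_match)
    score = evaluate(instruction, demos, toy_model, DEV, exact_match)
    print(demos, score)
    # Eval-gate pattern: fail the build if quality drops below a threshold.
    assert score >= 0.9, "regression on eval suite"
```

The final assertion plays the role of a blocking CI check: the same `evaluate` call that scores a candidate prompt during optimisation doubles as the regression test once a prompt is shipped.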

Citations