Bibliographic Citation & Scholarly Metadata DSLs Family Index


type: language-family-index
family: citation-formats
languages_catalogued: 24
tags: [language-reference, family-index, citation-formats, bibtex, biblatex, csl, marc, jats, crossref, datacite, openalex, scholarly-metadata]

Family overview

The citation-and-metadata family is unusually old by computing standards. Its earliest member, MARC (Machine-Readable Cataloging), was specified by Henriette Avram at the Library of Congress in 1968, about six years before SQL and four years before C, and it is still the format that library catalogues around the world ingest in 2026. The lineage that academics actually type into manuscripts, by contrast, starts with Oren Patashnik’s BibTeX in 1985, which inherited TeX’s stylistic conservatism: a record is @article{key, field = {value}, ...}, and fourteen standard entry types cover the world. BibLaTeX (Philipp Lehman, first stable release 2006; CTAN package last touched 2026-05-09) was a wholesale modernisation that kept the .bib syntax but extended the data model with dozens of new entry types (@online, @software, @dataset, @thesis variants), Unicode-native names, and a Perl-based backend (biber). Together, BibTeX and BibLaTeX still anchor the TeX side of academic publishing.
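The record shape described above, @type{key, field = {value}, ...}, is simple enough to sketch a parser for. The following is a minimal illustration only, assuming a single entry whose values are all brace-delimited (no quoted values, @string macros, or # concatenation). The function name parse_bibtex_entry and the sample record are hypothetical.

```python
def parse_bibtex_entry(text):
    """Parse one BibTeX entry whose field values are wrapped in braces.

    Simplifying assumptions (not full BibTeX): a single entry, {...} values
    only, no @string macros, no "quoted" values, no '#' concatenation.
    """
    text = text.strip()
    if not text.startswith("@"):
        raise ValueError("entry must start with '@'")
    open_brace = text.index("{")
    entry_type = text[1:open_brace].strip().lower()
    body = text[open_brace + 1 : text.rindex("}")]
    key, _, rest = body.partition(",")

    fields = {}
    i = 0
    while i < len(rest):
        eq = rest.find("=", i)
        if eq == -1:
            break
        name = rest[i:eq].strip(" ,\n\t").lower()
        start = rest.index("{", eq)
        depth, j = 0, start
        # Walk forward to the brace that balances the opening one,
        # so nested groups such as {E} stay inside the value.
        while True:
            if rest[j] == "{":
                depth += 1
            elif rest[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        fields[name] = rest[start + 1 : j]
        i = j + 1
    return {"type": entry_type, "key": key.strip(), "fields": fields}


# Hypothetical sample record, for illustration only.
SAMPLE = """@article{doe2024,
  author = {Jane Doe},
  title = {An {E}xample Title},
  year = {2024}
}"""

entry = parse_bibtex_entry(SAMPLE)
print(entry)
```

The nested-brace walk is the only non-trivial step; everything else is string slicing, which is part of why hand-rolled .bib readers are so common.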

In parallel, the library-science world built record formats for catalogues rather than footnotes: MARC 21 (the 1999 unification of USMARC and CAN/MARC; latest update No. 41 in December 2025), its XML transcription MARCXML, the Library of Congress’s modern alternative MODS, and the lightweight cross-domain Dublin Core (DCMI Metadata Terms, 1995+). These languages think in cataloguer’s units — leader, indicators, subfields, controlled vocabularies — not in @inproceedings. Reference managers (EndNote, Reference Manager, ProCite) built a separate ecosystem in the 1980s around the tag-per-line RIS format (TY - JOUR / AU - ... / PY - 2024), which became the de facto export format for PubMed, Scopus, and Web of Science.
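The tag-per-line shape of RIS can be sketched in a few lines. This is a minimal reader, assuming the conventional layout of a two-character tag, whitespace, a hyphen, and the value, with ER closing each record. The function name parse_ris and the sample record are hypothetical, and real exports have quirks this does not handle.

```python
import re

# Two-character tag, whitespace, hyphen, optional space, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text):
    """Return a list of records; each record maps a tag to a list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a tag line
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            records.append(current)
            current = {}
        else:
            # Tags such as AU repeat, so every tag accumulates a list.
            current.setdefault(tag, []).append(value)
    return records


# Hypothetical sample record, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - An example title
PY  - 2024
ER  -
"""

records = parse_ris(SAMPLE)
print(records)
```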

The modern era is CSL-driven. Citation Style Language splits the problem into two languages: CSL JSON (citeproc-json, the data interchange format that Zotero, Mendeley, Pandoc, Hugo, Quarto, and academic blogging engines all ingest) and CSL XML (the style language, an XML DSL for describing how citations should be formatted). CSL 1.0.2 (released 22 October 2021, current in 2026) governs both; the Zotero Style Repository hosts roughly 10,000 journal-specific styles, all written in CSL XML. Scholarly publishing is layered on JATS (Journal Article Tag Suite, ANSI/NISO Z39.96-2024 v1.4, published 31 October 2024), the format of every PubMed Central article, and the DOI infrastructure is layered on Crossref (deposit schema 5.4.0) and DataCite (kernel 4.7, released 3 March 2026) metadata.
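The data/style split can be illustrated with a small sketch. The item below uses standard CSL JSON field names (type, author with family/given, issued with date-parts), while the author_year function is only a toy stand-in for what a CSL XML style plus a citeproc processor would do. The record contents, the placeholder DOI, and the function name are hypothetical.

```python
import json

# A CSL JSON item: the data half of the split. Values are made up.
item = {
    "id": "doe2024",
    "type": "article-journal",
    "title": "An example title",
    "author": [
        {"family": "Doe", "given": "Jane"},
        {"family": "Roe", "given": "Richard"},
    ],
    "container-title": "Journal of Examples",
    "issued": {"date-parts": [[2024, 5, 9]]},
    "DOI": "10.1234/example",  # placeholder, not a registered DOI
}


def author_year(csl_item):
    """Toy in-text formatter; a real pipeline would run a CSL style instead."""
    families = [a["family"] for a in csl_item.get("author", [])]
    year = csl_item["issued"]["date-parts"][0][0]
    if len(families) > 2:
        names = f"{families[0]} et al."
    else:
        names = " & ".join(families)
    return f"({names}, {year})"


# CSL JSON files are conventionally a JSON array of items.
serialized = json.dumps([item], indent=2)
citation = author_year(json.loads(serialized)[0])
print(citation)
```

The point of the sketch is that the item never changes when the output format does; only the formatting half is swapped.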

The 2020s added two more strata. CFF (Citation File Format, current 1.2.0) put citation metadata on GitHub via a CITATION.cff YAML file, closing the long-standing gap that BibTeX and RIS could not model “the artifact is a Git repository at version v1.2.3.” And the open citation graphs turned the previously paywalled citation network into queryable RDF/JSON-LD data: OpenAlex (launched January 2022 as the successor to the terminated Microsoft Academic Graph; ~245M works, CC0, as of early 2026), OpenCitations / COCI (latest release 2026-02-17, ~450M DOI-to-DOI citation links), and Wikidata’s bibliographic items.
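As a rough illustration of what a CITATION.cff file carries, the sketch below renders a small one from Python using only the standard library. The keys are CFF 1.2.0 field names; the function name render_citation_cff, the project name, the author, and the repository URL are hypothetical. Strings are quoted with json.dumps on the assumption that a JSON string literal is also a valid YAML double-quoted scalar.

```python
import json


def render_citation_cff(title, version, authors, repository_code):
    """Render a minimal CITATION.cff document as text.

    authors is a list of (family_names, given_names) tuples.
    """
    q = json.dumps  # quote scalars so values like "1.2" are not read as numbers
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        "type: software",
        f"title: {q(title)}",
        f"version: {q(version)}",
        f"repository-code: {q(repository_code)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {q(family)}")
        lines.append(f"    given-names: {q(given)}")
    return "\n".join(lines) + "\n"


# Hypothetical project metadata, for illustration only.
cff_text = render_citation_cff(
    title="example-tool",
    version="v1.2.3",
    authors=[("Doe", "Jane")],
    repository_code="https://example.org/example-tool.git",
)
print(cff_text)
```

Saving the returned text as CITATION.cff at the repository root is what makes the version and repository part of the citable record.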

In our deep library

None of these formats has its own deep-library Tier 1/Tier 2 note; they are all data DSLs with fixed grammars rather than general-purpose languages.

Cross-reference:

  • document-typesetting: BibTeX and BibLaTeX live inside the TeX/LaTeX ecosystem; \cite{...} is the user-facing surface and bibtex / biber are the post-processors.
  • api-description: DataCite, Crossref, OpenAlex, and ROR all publish JSON Schema / OpenAPI for their REST APIs; the metadata schema and the API description overlap.
  • notation-spec: MARC 21 has a formal binary record grammar (ISO 2709: leader, directory, variable fields, end-of-record 0x1D); JATS, MODS, MARCXML, and DataCite are all XML-Schema-defined.
  • math-notation: citations to math papers travel in BibTeX with TeX math markup in titles; JATS embeds MathML directly.
  • query: OpenAlex is queryable as graph data over REST, while OpenCitations and Wikidata expose SPARQL endpoints.

Tier 3 family table

| Format | First appeared | Origin | Type (data / style / classification / identifier / article-markup) | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| BibTeX | 1985 | Oren Patashnik (Stanford), as companion to TeX | record-level data format (.bib) | Universal in TeX-ecosystem academic publishing; the format itself is frozen and that is the point | https://www.bibtex.org/Format/ |
| BibLaTeX | 2006 (first stable); CTAN package updated 2026-05-09 | Philipp Lehman, then Philip Kime, Audrey Boruvka, Joseph Wright | record-level data format + LaTeX style package; superset of BibTeX | Active, dominant choice for new LaTeX projects; uses biber as backend | https://ctan.org/pkg/biblatex |
| RIS | 1980s | Research Information Systems / Reference Manager (later Clarivate) | record-level data format (tag-per-line: TY/AU/JA/PY/EP/ER) | Active, the de facto export format from PubMed, Scopus, Web of Science | https://en.wikipedia.org/wiki/RIS_(file_format) |
| EndNote XML / .enw | 1988 (EndNote 1.0) | Niles & Associates → Thomson Reuters → Clarivate | record-level data format (XML and tagged-text variants) | Active, proprietary but ubiquitous via EndNote installs | https://support.clarivate.com/Endnote/ |
| CSL JSON (citeproc-json) | ~2009 | Frank Bennett, Rintze Zelle, Bruce D’Arcus (CSL project) | citation-data interchange format (JSON); canonical input to citeproc | Universal: Zotero, Mendeley, Pandoc, Hugo, Quarto, Hypothesis all use it | https://github.com/citation-style-language/schema |
| CSL XML (style language) | 2004 (v0.x); 1.0 in 2010; 1.0.2 (22 Oct 2021), current in 2026 | Bruce D’Arcus, Simon Kornblith, Frank Bennett | citation-style language (XML DSL describing how citations are formatted) | Active, ~10,000 styles in Zotero Style Repository | https://docs.citationstyles.org/en/stable/specification.html |
| CFF (Citation File Format) | 2017; 1.2.0 current in 2026 | Stephan Druskat et al. | record-level data format (YAML) for software/dataset citation; surfaced by GitHub’s “Cite this repository” button | Active, growing rapidly with software-citation push | https://citation-file-format.github.io/ |
| DataCite Metadata Schema | 2011 (v2.0); kernel 4.7 released 3 March 2026 (4.6 released 5 Dec 2024) | DataCite consortium (DOI registration agency for research data) | record-level XML schema for research-output DOIs | Active, mandatory for DataCite DOI registration | https://schema.datacite.org/ |
| CrossRef Deposit Schema | 2000 | Crossref (DOI registration agency for scholarly publications) | record-level XML schema for DOI registration; schema 5.4.0 current in 2026 | Active, mandatory for Crossref DOI registration | https://www.crossref.org/documentation/schema-library/ |
| MARC 21 (binary, ISO 2709) | 1999 (USMARC + CAN/MARC unification); ancestry to MARC 1968; Update No. 41 (Dec 2025), twice-yearly cadence | Library of Congress, Network Development and MARC Standards Office | record-level binary data format (leader + directory + variable fields) | Active, the substrate of every library catalogue ILS in the world | https://www.loc.gov/marc/bibliographic/ |
| MARCXML | 2002 | Library of Congress | XML transcription of MARC 21 | Active, current preferred wire form for MARC interchange | https://www.loc.gov/standards/marcxml/ |
| MODS (Metadata Object Description Schema) | 2002 | Library of Congress, as MARC’s modern XML cousin | record-level XML metadata schema | Active, MODS 3.8 current; common in digital-library systems | https://www.loc.gov/standards/mods/ |
| Dublin Core (DCMI Metadata Terms) | 1995 | Dublin Core Metadata Initiative (workshop in Dublin, OH) | cross-domain metadata vocabulary (15 simple core elements + DCMI Terms refinements) | Active, the lowest-common-denominator metadata vocabulary | https://www.dublincore.org/specifications/dublin-core/dcmi-terms/ |
| BibJSON | 2010 | OKFN / BibServer project | record-level data format (JSON encoding of BibTeX-equivalent records) | Niche / legacy-active, partial uptake; CSL JSON eclipsed it | https://okfnlabs.org/bibjson/ |
| JATS (Journal Article Tag Suite) | 2003 (NLM DTD origins); ANSI/NISO Z39.96 since 2012; v1.4 published 31 Oct 2024 (Z39.96-2024) | NCBI / NLM → NISO | journal-article XML markup language (article-level full-text with metadata) | Active, the silent giant: every PubMed Central article is JATS | https://jats.nlm.nih.gov/ |
| NLM Journal Archiving Tag Suite | 2003 (NLM DTD 1.0) | NCBI / NLM | predecessor/sibling DTD that became JATS | Superseded by JATS but still in archives | https://dtd.nlm.nih.gov/ |
| OpenURL ContextObject (Z39.88) | 2004 (NISO Z39.88-2004; reaffirmed 2010) | NISO; concept by Herbert Van de Sompel | referrer/link-resolver DSL (KEV and XML serialisations) | Active, every academic library’s link-resolver speaks it | https://www.niso.org/publications/ansiniso-z3988-2004-r2010 |
| OpenAlex (JSON-LD) | January 2022 (succeeded MAG, terminated 31 Dec 2021) | OurResearch (Heather Piwowar, Jason Priem) | open-data citation graph; JSON / JSON-LD over REST | Active, ~245M works under CC0; usage-based pricing introduced Feb 2026 | https://docs.openalex.org/ |
| OpenCitations / COCI | 2017 (COCI launch); latest dataset release 2026-02-17 | OpenCitations (Silvio Peroni, David Shotton) | open-data citation index; CSV / N-Triples / SCHOLIX / SPARQL | Active, ~450M DOI-to-DOI citation links | https://opencitations.net/index/coci |
| Wikidata bibliographic items (P-properties + JSON-LD) | 2014+ (WikiCite initiative) | Wikimedia Foundation; WikiCite community | open-data citation graph as Wikidata items | Active, ~40M+ scholarly works | https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData |
| LCC / DDC / UDC | LCC 1897+, DDC 1876+, UDC 1905+ | Library of Congress / Melvil Dewey / Paul Otlet & Henri La Fontaine | classification schemes (hierarchical subject codes) | Active, used by virtually every library globally | https://www.loc.gov/catdir/cpso/lcco/ |
| ORCID iD + ORCID schema | 2012 | ORCID Inc. (community non-profit) | identifier system for researchers (16-digit iD + ORCID record schema in JSON / XML) | Active, ~20M registered iDs; required by many journals and funders | https://info.orcid.org/documentation/ |
| ROR (Research Organization Registry) | 2019 | ROR community (Crossref / DataCite / California Digital Library + others); now hosted by Crossref | identifier system for research organisations (ROR ID + JSON schema, v2 current in 2026) | Active, 120,000+ organisations | https://ror.org/ |
| BIBFRAME (Bibliographic Framework) | 2012 (initiative); BIBFRAME 2.0 in 2016 | Library of Congress | linked-data successor concept to MARC; RDF vocabulary | Active (in pilot at LC and major research libraries); slow long-term replacement of MARC | https://www.loc.gov/bibframe/ |

Notable threads

  • BibTeX’s longevity is structural, not nostalgic. A 1985 format remains the dominant academic citation standard in 2026 because the constraints that drove its design have not changed: TeX still dominates the publishing pipelines of mathematics, physics, computer science, and most of engineering; the data model (a flat record with typed fields) is dead-simple to parse with a hand-rolled lexer; and BibLaTeX (2006+) extended the field model — adding @online, @software, @dataset, multi-language entries, Unicode names — without breaking the .bib file syntax. The format’s calcification is its feature: the universe of .bib files written between 1985 and 2026 still parses with the same grammar.

  • CSL XML is a Turing-complete-adjacent style language hidden in plain sight. A CSL style file is an XML program: it has variable lookups (<text variable="title"/>), conditionals (<choose><if .../><else-if .../><else/></choose>), iteration over name lists (<names><name .../></names>), reusable macros (<macro>, called through <text macro="..."/>), grouping through <group>, locale-aware string lookup, and per-language pluralisation. The Zotero Style Repository hosts roughly 10,000 journal-specific styles, all written in this XML: Nature, Science, Cell, every IEEE/ACM/APA/Chicago variant. CSL 1.0.2 (October 2021) is the stable target; CSL 1.1 development work has been running for years, but the released spec has been deliberately kept stable so that the 10,000 styles stay parseable.

  • The split between citation-data languages and bibliographic-record languages. BibTeX, RIS, CSL JSON, and CFF are citation-data languages: written by authors and reference managers, optimised for “I want to cite this thing in my paper.” MARC 21, MODS, MARCXML, JATS, and the DataCite/Crossref schemas are bibliographic-record languages: written by cataloguers, publishers, and DOI agencies, optimised for “I want to describe this thing in a catalogue or registry.” The two camps overlap (a journal article appears in both), but their field models, controlled vocabularies, and audiences differ substantially. Reference managers straddle the divide by ingesting both; Zotero, for example, projects CSL JSON outward.

  • MARC 21 as one of the oldest still-used data formats. MARC predates SQL by about six years (Avram’s first format went into pilot in 1968; SQL was specified at IBM in 1974). It was originally distributed on punched cards and 9-track magnetic tape, encoded in the record structure later standardised as ISO 2709, with three control characters: field terminator 0x1E, record terminator 0x1D, and subfield delimiter 0x1F. The 2026 binary form is essentially the same as the 1968 binary form. BIBFRAME (initiated 2012, BIBFRAME 2.0 in 2016) is the long-term linked-data successor, but adoption has been glacial: most ILS vendors still ship MARC-native code paths because every existing record in every existing catalogue is MARC.

  • JATS as the silent giant of scholarly publishing. Authors don’t see it; reviewers don’t see it; even copy-editors usually don’t see it. But every journal article that ends up in PubMed Central is JATS XML, every modern journal-publishing pipeline (eLife’s Continuum, PKP’s OJS, Hindawi/Wiley/Elsevier internal pipelines) produces it, and every preservation system (Portico, CLOCKSS) ingests it. Z39.96-2024 (JATS 1.4, published 31 October 2024) is the current standard, replacing the 2021 version 1.3. JATS is the format in which the world’s scholarly literature is actually stored.

  • The DOI infrastructure rests on two metadata schemas. Crossref (founded 2000, ~150M registered DOIs) handles scholarly publications; DataCite (founded 2009, ~80M registered DOIs) handles research data and software. Crossref’s deposit schema 5.4.0 and DataCite’s kernel 4.7 (released 3 March 2026) are the XML/JSON contracts that every DOI registration must satisfy. Both publish complementary REST APIs and bulk public data files (Crossref’s 2026 public data file released early 2026; DataCite’s quarterly dumps), which together feed every modern citation tool.

  • CFF closes the software-citation gap. Through 2017, citing a piece of software in a paper meant either inventing a @misc BibTeX entry or citing an associated paper, neither of which captured “this is the artifact, at this version, at this DOI/URL/Git ref.” CFF (Citation File Format, current 1.2.0) — a YAML file named CITATION.cff at the root of a Git repository — solves this directly. GitHub renders a “Cite this repository” button when it sees one, and Zenodo + the Software Heritage archive consume it. CFF 1.2.0 added preferred-citation, multiple authors with ORCID iDs, and software-version semantics.

  • The open-data wave: OpenAlex, OpenCitations, Wikidata. Until ~2020, the citation network was effectively private — owned by Web of Science (Clarivate) and Scopus (Elsevier), behind expensive subscriptions. Microsoft Academic Graph (MAG, 2015–2021) was the first large open alternative but Microsoft terminated it on 31 December 2021. OpenAlex (OurResearch, January 2022) inherited the role with ~245M works under CC0 by 2026; OpenCitations / COCI (latest release 2026-02-17) provides ~450M open DOI-to-DOI citation links; Wikidata’s WikiCite project models scholarly works as Wikidata items queryable via SPARQL. These three together are reshaping bibliometrics — what counts as a citation is now a public, queryable graph rather than a vendor’s database.
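The ISO 2709 layout mentioned in the MARC 21 thread above (a 24-byte leader, a directory of 12-byte entries, then variable fields separated by the three control characters) can be sketched as a round trip. This is an illustration under simplifying assumptions, not a cataloguing tool: the function names, the sample fields, and the leader values are hypothetical, and a production system would use an established MARC library.

```python
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field, record, subfield delimiters


def build_iso2709(fields):
    """Assemble a record from (tag, payload_bytes) pairs."""
    directory, body = b"", b""
    for tag, payload in fields:
        data = payload + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FT
    base = 24 + len(directory)          # base address of the data area
    total = base + len(body) + 1        # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RT


def parse_iso2709(record):
    """Split a record into its leader and a list of parsed fields."""
    leader = record[:24]
    base = int(leader[12:17])
    directory = record[24 : base - 1]   # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i : i + 12]
        tag = entry[0:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        data = record[base + start : base + start + length - 1]
        if tag < "010":                 # control fields: no indicators/subfields
            fields.append((tag, data.decode("utf-8")))
        else:
            indicators = data[:2].decode("ascii")
            subfields = [
                (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                for chunk in data[2:].split(SF)[1:]
            ]
            fields.append((tag, indicators, subfields))
    return leader.decode("ascii"), fields


# Hypothetical two-field record, for illustration only.
record = build_iso2709([
    ("001", b"ex0001"),
    ("245", b"10" + SF + b"aAn example title" + SF + b"cJane Doe"),
])
leader, parsed = parse_iso2709(record)
print(leader, parsed)
```

Because every offset is stored as fixed-width ASCII digits, a reader can seek straight to a field from the directory without scanning the data area.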

Citations