Bibliographic Citation & Scholarly Metadata DSLs Family Index
type: language-family-index
family: citation-formats
languages_catalogued: 24
tags: [language-reference, family-index, citation-formats, bibtex, biblatex, csl, marc, jats, crossref, datacite, openalex, scholarly-metadata]
Bibliographic Citation & Scholarly Metadata — Family Index
Family overview
The citation-and-metadata family is unusually old by computing standards. Its earliest member, MARC (Machine-Readable Cataloging), was specified by Henriette Avram at the Library of Congress in 1968, about six years before SQL and four years before C, and it is still the format that library catalogues worldwide ingest in 2026. The lineage that academics actually type into manuscripts, by contrast, starts with Oren Patashnik’s BibTeX in 1985, which inherited TeX’s stylistic conservatism: a record is `@article{key, field = {value}, ...}`, and fourteen standard entry types cover the world. BibLaTeX (Philipp Lehman, first released 2006; CTAN package last updated 2026-05-09) was a wholesale modernisation that kept the `.bib` syntax but extended the field model with dozens of new entry types (`@online`, `@software`, `@dataset`, `@thesis` variants), Unicode-native names, and a Perl-based backend (Biber). Together, BibTeX and BibLaTeX still anchor the TeX side of academic publishing.
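The flat `@type{key, field = {value}}` shape is what makes hand-rolled BibTeX readers practical. The sketch below is illustrative and is not BibTeX's or Biber's actual parser: it assumes braced, quoted, or bare values and does not handle `@string` macros, `#` concatenation, `@comment`/`@preamble` blocks, or error recovery. `parse_bib_entries`, `read_value`, and the sample entry are hypothetical.

```python
import re

# Patterns for the small BibTeX subset this sketch handles.
HEADER = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")     # "@article{key,"
FIELD_NAME = re.compile(r"\s*([A-Za-z][\w-]*)\s*=\s*")  # "title = "
BARE_VALUE = re.compile(r"[^,}\s]+")                    # e.g. year = 2024
SEPARATOR = re.compile(r"\s*,?")                        # optional comma between fields


def read_value(text, i):
    """Return (value, next_index) for the field value starting at text[i]."""
    if text[i] == "{":
        depth, start = 0, i
        while True:  # walk forward until the opening brace is balanced
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    return text[start + 1:i], i + 1
            i += 1
    if text[i] == '"':
        end = text.index('"', i + 1)
        return text[i + 1:end], end + 1
    m = BARE_VALUE.match(text, i)
    return m.group(0), m.end()


def parse_bib_entries(text):
    """Parse simple entries into (entry_type, key, fields) tuples."""
    entries, pos = [], 0
    while (m := HEADER.search(text, pos)) is not None:
        entry_type, key = m.group(1).lower(), m.group(2)
        fields, i = {}, m.end()
        while (fm := FIELD_NAME.match(text, i)) is not None:
            value, i = read_value(text, fm.end())
            fields[fm.group(1).lower()] = value
            i = SEPARATOR.match(text, i).end()
        entries.append((entry_type, key, fields))
        pos = i
    return entries


# A made-up entry, used only to exercise the parser.
BIB_SAMPLE = """
@article{doe2024,
  author  = {Doe, Jane and Roe, Richard},
  title   = {A {Flat} Record},
  year    = 2024,
  journal = "Journal of Examples"
}
"""
print(parse_bib_entries(BIB_SAMPLE))
```

Nested braces are kept verbatim in the value (`A {Flat} Record`): in BibTeX they protect capitalisation, and it is the style, not the parser, that decides what to do with them.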
In parallel, the library-science world built record formats for catalogues rather than footnotes: MARC 21 (the 1999 unification of USMARC and CAN/MARC; latest update No. 41 in December 2025), its XML transcription MARCXML, the Library of Congress’s modern alternative MODS, and the lightweight cross-domain Dublin Core (DCMI Metadata Terms, 1995+). These languages think in cataloguers’ units (leader, indicators, subfields, controlled vocabularies), not in `@inproceedings`. Reference managers (EndNote, Reference Manager, ProCite) built a separate ecosystem in the 1980s around the tag-per-line RIS format (`TY  - JOUR` / `AU  - ...` / `PY  - 2024`), which became the de facto export format for PubMed, Scopus, and Web of Science.
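RIS's tag-per-line layout can be read with a simple line scanner. A minimal sketch, assuming the common `TAG  - value` layout (two-character tag, two spaces, hyphen); it ignores wrapped continuation lines and vendor-specific tag meanings, and `parse_ris` is a hypothetical helper rather than any library's API.

```python
import re

# "TY  - JOUR": a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(?: (.*))?$")


def parse_ris(text):
    """Split RIS text into records, each mapping tag -> list of values."""
    records, current = [], None
    for raw in text.splitlines():
        m = RIS_LINE.match(raw.rstrip())
        if m is None:
            continue  # blank or wrapped continuation line: ignored in this sketch
        tag, value = m.group(1), (m.group(2) or "").strip()
        if tag == "TY":
            current = {}  # TY opens a new record
        if current is None:
            continue  # stray tag outside TY ... ER
        if tag == "ER":
            records.append(current)  # ER closes the record
            current = None
            continue
        current.setdefault(tag, []).append(value)  # tags such as AU repeat
    return records


RIS_SAMPLE = """TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - A Flat Record
PY  - 2024
ER  -
"""
print(parse_ris(RIS_SAMPLE))
```

Values are collected into lists because tags such as `AU` repeat once per author.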
The modern era is CSL-driven. Citation Style Language splits the problem into two languages: CSL JSON (citeproc-json, the data interchange format — what Zotero, Mendeley, Pandoc, Hugo, Quarto, and academic blogging engines all ingest) and CSL XML (the style language, an XML DSL for describing how citations should be formatted). CSL 1.0.2 (released 22 October 2021, current in 2026) governs both; the Zotero Style Repository hosts roughly 10,000 journal-specific styles, all written in CSL XML. Scholarly publishing layered on JATS (Journal Article Tag Suite, ANSI/NISO Z39.96-2024 v1.4, published 31 October 2024) — every PubMed Central article is JATS XML — and the DOI infrastructure layered on CrossRef (deposit schema 5.4.0) and DataCite (kernel 4.7, released 3 March 2026) metadata.
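To make the data/style split concrete, here is a toy evaluator that applies a tiny CSL XML fragment to a CSL JSON item. It is a sketch under stated assumptions: it supports only `text`, `names`, `group`, and `choose`/`if`/`else-if`/`else` with the `type` test, renders names as plain "family, given", and approximates group suppression by dropping empty children. Real processors such as citeproc-js implement the full CSL 1.0.2 specification; `render`, `condition_holds`, the layout, and the item are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up layout using a tiny subset of CSL 1.0.2 elements (not a real style).
LAYOUT = """
<layout xmlns="http://purl.org/net/xbiblio/csl">
  <group delimiter=". ">
    <names variable="author"/>
    <text variable="title"/>
    <choose>
      <if type="article-journal">
        <text variable="container-title"/>
      </if>
      <else-if type="book chapter" match="any">
        <text variable="publisher"/>
      </else-if>
      <else>
        <text value="n.p."/>
      </else>
    </choose>
  </group>
</layout>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]


def condition_holds(branch, item):
    """Evaluate only the `type` test; CSL's default match mode is "all"."""
    hits = [item.get("type") == t for t in branch.get("type", "").split()]
    mode = branch.get("match", "all")
    if mode == "any":
        return any(hits)
    if mode == "none":
        return not any(hits)
    return all(hits)


def render(node, item):
    """Render one element of the subset against a CSL JSON item (a dict)."""
    name = local(node.tag)
    if name == "text":
        if "variable" in node.attrib:
            return str(item.get(node.get("variable"), ""))
        return node.get("value", "")
    if name == "names":
        people = item.get(node.get("variable"), [])
        return " & ".join(f"{p['family']}, {p.get('given', '')}" for p in people)
    if name in ("layout", "group"):
        # Dropping empty children roughly mimics CSL's group suppression rule.
        parts = [r for r in (render(child, item) for child in node) if r]
        return node.get("delimiter", "").join(parts)
    if name == "choose":
        for branch in node:  # first matching if / else-if, otherwise the else
            if local(branch.tag) == "else" or condition_holds(branch, item):
                return "".join(render(child, item) for child in branch)
        return ""
    raise ValueError(f"unsupported element in this sketch: {name}")


# A CSL JSON item; "issued" shows the date shape but is not rendered here.
ITEM = {
    "id": "doe2024",
    "type": "article-journal",
    "title": "A Flat Record",
    "container-title": "Journal of Examples",
    "author": [{"family": "Doe", "given": "Jane"}, {"family": "Roe", "given": "Richard"}],
    "issued": {"date-parts": [[2024]]},
}
print(render(ET.fromstring(LAYOUT.strip()), ITEM))
```

Note the `match="any"` on the two-type branch: CSL's default `match` mode is `all`, so without it an item would have to be both a book and a chapter for the branch to fire.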
The 2020s added two more strata. CFF (Citation File Format, current 1.2.0) put citation metadata on GitHub via a CITATION.cff YAML file — closing the long-standing gap that BibTeX and RIS could not model “the artifact is a Git repository at version v1.2.3.” And the open citation graphs — OpenAlex (launched January 2022 as the successor to the terminated Microsoft Academic Graph; ~245M works, CC0, as of early 2026), OpenCitations / COCI (latest release 2026-02-17, ~450M DOI-to-DOI citation links), and Wikidata’s bibliographic items — turned the previously paywalled citation network into queryable RDF/JSON-LD data.
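A `CITATION.cff` is small enough to generate directly. A minimal sketch, assuming only the four top-level keys CFF 1.2.0 requires (`cff-version`, `message`, `title`, `authors`) plus an optional `version`; it performs no schema validation, and `to_cff` and the repository metadata are hypothetical. Real projects would typically check the result with a validator such as cffconvert.

```python
import json

# Keys that CFF 1.2.0 requires at the top level.
REQUIRED = ("cff-version", "message", "title", "authors")


def to_cff(meta):
    """Render a metadata dict as minimal CITATION.cff text.

    Values are written with json.dumps: JSON is (essentially) a subset of
    YAML 1.2, so a JSON string is also a valid double-quoted YAML scalar.
    """
    missing = [k for k in REQUIRED if k not in meta]
    if missing:
        raise ValueError(f"missing required CFF keys: {missing}")
    lines = []
    for key, value in meta.items():
        if key == "authors":
            lines.append("authors:")
            for person in value:
                for n, (pkey, pval) in enumerate(person.items()):
                    prefix = "  - " if n == 0 else "    "  # first key opens the list item
                    lines.append(f"{prefix}{pkey}: {json.dumps(pval, ensure_ascii=False)}")
        else:
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    return "\n".join(lines) + "\n"


# Hypothetical repository metadata: the artifact at version 1.2.3.
META = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "title": "example-tool",
    "version": "1.2.3",
    "authors": [{"family-names": "Doe", "given-names": "Jane"}],
}
print(to_cff(META), end="")
```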
In our deep library
None of these formats has its own deep-library Tier 1/Tier 2 note; they are all data DSLs with fixed grammars rather than general-purpose languages.
Cross-reference:
- document-typesetting: BibTeX and BibLaTeX live inside the TeX/LaTeX ecosystem; `\cite{...}` is the user-facing surface and `bibtex`/`biber` are the post-processors.
- api-description: DataCite, Crossref, OpenAlex, and ROR all publish JSON Schema / OpenAPI for their REST APIs; the metadata schema and the API description overlap.
- notation-spec: MARC 21 has a formal binary record grammar (ISO 2709: leader, directory, variable fields, end-of-record `0x1D`); JATS, MODS, MARCXML, and DataCite are all XML-Schema-defined.
- math-notation: citations to math papers travel in BibTeX with TeX math (`$...$`) in titles; JATS embeds MathML directly.
- query: OpenAlex, OpenCitations, and Wikidata are queryable as graph data: OpenAlex via REST, Wikidata via SPARQL, and OpenCitations via both.
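The ISO 2709 grammar behind MARC is easy to see in code: a 24-byte leader, a directory of 12-byte entries (tag, length, offset), then the field data. A sketch assuming a well-formed UTF-8 record (leader position 09 = `a`) and no validation; `build_record`, `parse_record`, and the record content are hypothetical, and production code would normally use a library such as pymarc, which also handles MARC-8.

```python
FT, RT, SD = b"\x1e", b"\x1d", b"\x1f"  # field terminator, record terminator, subfield delimiter


def build_record(fields):
    """Assemble an ISO 2709 record from (tag, body) pairs; bodies exclude FT."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FT
        # Directory entry: 3-byte tag, 4-digit length, 5-digit offset from the base address.
        directory += tag.encode("ascii") + b"%04d%05d" % (len(body), len(data))
        data += body
    directory += FT
    base = 24 + len(directory)  # the leader is always 24 bytes
    length = base + len(data) + 1  # +1 for the record terminator
    # Leader: length, "nam a22" (status, type, level, control, UTF-8, indicator and
    # subfield-code counts), base address, three blank positions, "4500" entry map.
    leader = b"%05dnam a22%05d   4500" % (length, base)
    return leader + directory + data + RT


def parse_record(raw):
    """Parse one record into {tag: [field, ...]}; assumes UTF-8 (leader/09 = 'a')."""
    base = int(raw[12:17])  # leader positions 12-16: base address of data
    directory = raw[24:base - 1]  # drop the directory's trailing FT
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # strip FT
        if tag < "010":  # control fields 001-009: no indicators or subfields
            value = body.decode("utf-8")
        else:
            indicators, *chunks = body.split(SD)
            value = {
                "indicators": indicators.decode("utf-8"),
                "subfields": [(c[:1].decode("ascii"), c[1:].decode("utf-8")) for c in chunks],
            }
        fields.setdefault(tag, []).append(value)
    return fields


# A made-up two-field record: a control number and a title statement.
record = build_record([
    ("001", b"hyp0001"),
    ("245", b"10" + SD + b"aA flat record /" + SD + b"cJane Doe."),
])
print(parse_record(record))
```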
Tier 3 family table
| Format | First appeared | Origin | Type (data / style / classification / identifier / article-markup) | Status (2026) | URL |
|---|---|---|---|---|---|
| BibTeX | 1985 | Oren Patashnik (Stanford), as companion to TeX | record-level data format (.bib) | Universal in TeX-ecosystem academic publishing; the format itself is frozen and that is the point | https://www.bibtex.org/Format/ |
| BibLaTeX | 2006 (first release; 1.0 in 2010); CTAN package updated 2026-05-09 | Philipp Lehman, then Philip Kime, Audrey Boruvka, Joseph Wright | record-level data format + LaTeX style package; superset of BibTeX | Active, dominant choice for new LaTeX projects; uses Biber as backend | https://ctan.org/pkg/biblatex |
| RIS | 1980s | Research Information Systems / Reference Manager (later Clarivate) | record-level data format (tag-per-line: TY/AU/JA/PY/EP/ER) | Active, the de facto export format from PubMed, Scopus, Web of Science | https://en.wikipedia.org/wiki/RIS_(file_format) |
| EndNote XML / .enw | 1988 (EndNote 1.0) | Niles & Associates → Thomson Reuters → Clarivate | record-level data format (XML and tagged-text variants) | Active, proprietary but ubiquitous via EndNote installs | https://support.clarivate.com/Endnote/ |
| CSL JSON (citeproc-json) | ~2009 | Frank Bennett, Rintze Zelle, Bruce D’Arcus (CSL project) | citation-data interchange format (JSON); canonical input to citeproc | Universal — Zotero, Mendeley, Pandoc, Hugo, Quarto, Hypothesis all use it | https://github.com/citation-style-language/schema |
| CSL XML (style language) | 2004 (v0.x); 1.0 in 2010; 1.0.2 (22 Oct 2021), current in 2026 | Bruce D’Arcus, Simon Kornblith, Frank Bennett | citation-style language (XML DSL describing how citations are formatted) | Active, ~10,000 styles in Zotero Style Repository | https://docs.citationstyles.org/en/stable/specification.html |
| CFF (Citation File Format) | 2017; 1.2.0 current in 2026 | Stephan Druskat et al. | record-level data format (YAML) for software/dataset citation; surfaced by GitHub’s “Cite this repository” button | Active, growing rapidly with software-citation push | https://citation-file-format.github.io/ |
| DataCite Metadata Schema | 2011 (v2.0); kernel 4.7 released 3 March 2026 (4.6 released 5 Dec 2024) | DataCite consortium (DOI registration agency for research data) | record-level XML schema for research-output DOIs | Active, mandatory for DataCite DOI registration | https://schema.datacite.org/ |
| CrossRef Deposit Schema | 2000 | Crossref (DOI registration agency for scholarly publications) | record-level XML schema for DOI registration; schema 5.4.0 current in 2026 | Active, mandatory for Crossref DOI registration | https://www.crossref.org/documentation/schema-library/ |
| MARC 21 (binary, ISO 2709) | 1999 (USMARC + CAN/MARC unification); ancestry to MARC 1968; Update No. 41 (Dec 2025), twice-yearly cadence | Library of Congress, Network Development and MARC Standards Office | record-level binary data format (leader + directory + variable fields) | Active, the substrate of every library catalogue ILS in the world | https://www.loc.gov/marc/bibliographic/ |
| MARCXML | 2002 | Library of Congress | XML transcription of MARC 21 | Active, current preferred wire form for MARC interchange | https://www.loc.gov/standards/marcxml/ |
| MODS (Metadata Object Description Schema) | 2002 | Library of Congress, as MARC’s modern XML cousin | record-level XML metadata schema | Active, MODS 3.8 current; common in digital-library systems | https://www.loc.gov/standards/mods/ |
| Dublin Core (DCMI Metadata Terms) | 1995 | Dublin Core Metadata Initiative (workshop in Dublin, OH) | cross-domain metadata vocabulary (15 simple core elements + DCMI Terms refinements) | Active, the lowest-common-denominator metadata vocabulary | https://www.dublincore.org/specifications/dublin-core/dcmi-terms/ |
| BibJSON | 2010 | OKFN / BibServer project | record-level data format (JSON encoding of BibTeX-equivalent records) | Niche / legacy-active, partial uptake; CSL JSON eclipsed it | https://okfnlabs.org/bibjson/ |
| JATS (Journal Article Tag Suite) | 2003 (NLM DTD origins); ANSI/NISO Z39.96 since 2012; v1.4 published 31 Oct 2024 (Z39.96-2024) | NCBI / NLM → NISO | journal-article XML markup language (article-level full-text with metadata) | Active, the silent giant — every PubMed Central article is JATS | https://jats.nlm.nih.gov/ |
| NLM Journal Archiving and Interchange Tag Suite | 2003 (NLM DTD 1.0) | NCBI / NLM | predecessor/sibling DTD that became JATS | Superseded by JATS but still in archives | https://dtd.nlm.nih.gov/ |
| OpenURL ContextObject (Z39.88) | 2004 (NISO Z39.88-2004; reaffirmed 2010) | NISO; concept by Herbert Van de Sompel | referrer/link-resolver DSL (KEV and XML serialisations) | Active, every academic library’s link-resolver speaks it | https://www.niso.org/publications/ansiniso-z3988-2004-r2010 |
| OpenAlex (JSON-LD) | January 2022 (succeeded MAG, terminated 31 Dec 2021) | OurResearch (Heather Piwowar, Jason Priem) | open-data citation graph; JSON / JSON-LD over REST | Active, ~245M works under CC0; usage-based pricing introduced Feb 2026 | https://docs.openalex.org/ |
| OpenCitations / COCI | 2017 (COCI launch); latest dataset release 2026-02-17 | OpenCitations (Silvio Peroni, David Shotton) | open-data citation index; CSV / N-Triples / SCHOLIX / SPARQL | Active, ~450M DOI-to-DOI citation links | https://opencitations.net/index/coci |
| Wikidata bibliographic items (P-properties + JSON-LD) | 2014+ (WikiCite initiative) | Wikimedia Foundation; WikiCite community | open-data citation graph as Wikidata items | Active, ~40M+ scholarly works | https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData |
| LCC / DDC / UDC | LCC 1897+, DDC 1876+, UDC 1905+ | Library of Congress / Melvil Dewey / Paul Otlet & Henri La Fontaine | classification schemes (hierarchical subject codes) | Active, used by virtually every library globally | https://www.loc.gov/catdir/cpso/lcco/ |
| ORCID iD + ORCID schema | 2012 | ORCID Inc. (community non-profit) | identifier system for researchers (16-character iD whose final ISO 7064 MOD 11-2 check character may be X, + ORCID record schema in JSON / XML) | Active, ~20M registered iDs; required by many journals and funders | https://info.orcid.org/documentation/ |
| ROR (Research Organization Registry) | 2019 | ROR community (Crossref / DataCite / California Digital Library + others); now hosted by Crossref | identifier system for research organisations (ROR ID + JSON schema, v2 current in 2026) | Active, 120,000+ organisations | https://ror.org/ |
| BIBFRAME (Bibliographic Framework) | 2012 (initiative); BIBFRAME 2.0 in 2016 | Library of Congress | linked-data successor concept to MARC; RDF vocabulary | Active (in pilot at LC and major research libraries); slow long-term replacement of MARC | https://www.loc.gov/bibframe/ |
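The ORCID iD's last character is a check character computed with ISO 7064 MOD 11-2 over the first 15 digits, which is why it can be `X` (standing for the value 10). A minimal sketch of that checksum; `orcid_check_char` and `is_valid_orcid` are hypothetical helper names, and `0000-0002-1825-0097` is the sample iD used in ORCID's own documentation.

```python
import re

# Layout: four hyphen-separated groups of four; only the final character may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the iD has the hyphenated layout and a correct check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_char(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```

The check only catches transcription errors; it says nothing about whether an iD is actually registered.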
Notable threads
- BibTeX’s longevity is structural, not nostalgic. A 1985 format remains the dominant academic citation standard in 2026 because the constraints that drove its design have not changed: TeX still dominates the publishing pipelines of mathematics, physics, computer science, and most of engineering; the data model (a flat record with typed fields) is dead-simple to parse with a hand-rolled lexer; and BibLaTeX (2006+) extended the field model, adding `@online`, `@software`, `@dataset`, multi-language entries, and Unicode names, without breaking the `.bib` file syntax. The format’s calcification is its feature: the universe of `.bib` files written between 1985 and 2026 still parses with the same grammar.
- CSL XML is a Turing-complete-adjacent style language hidden in plain sight. A CSL style file is an XML program: it has variables (`<text variable="title"/>`), conditionals (`<choose><if .../><else-if .../><else/></choose>`), iteration over name lists (`<names><name .../></names>`), nesting through `<group>` and reuse through `<macro>` definitions, locale-aware string lookup, and per-language pluralisation. The Zotero Style Repository hosts roughly 10,000 journal-specific styles, all written in this XML: Nature, Science, Cell, and every IEEE/ACM/APA/Chicago variant. CSL 1.0.2 (October 2021) is the stable target; development work toward CSL 1.1 has been running for years, but the spec has been deliberately held stable so that the 10,000 existing styles keep parsing.
- The split between citation-data languages and bibliographic-record languages. BibTeX, RIS, CSL JSON, and CFF are citation-data languages, written by authors and reference managers and optimised for “I want to cite this thing in my paper.” MARC 21, MODS, MARCXML, JATS, and the DataCite/Crossref schemas are bibliographic-record languages, written by cataloguers, publishers, and DOI agencies and optimised for “I want to describe this thing in a catalogue or registry.” The two camps overlap (a journal article appears in both), but their field models, controlled vocabularies, and audiences differ substantially. Reference managers such as Zotero straddle the divide by importing both kinds of record and exporting CSL JSON.
- MARC 21 as one of the oldest still-used data formats. MARC predates SQL by about six years (Avram’s first format went into pilot in 1968; SQL was specified at IBM in 1974). It was originally distributed on punched cards and 9-track magnetic tape, encoded as ISO 2709 with three control characters: field terminator `0x1E`, record terminator `0x1D`, and subfield delimiter `0x1F`. The 2026 binary form is structurally the same as the 1968 one. BIBFRAME (initiated 2012, BIBFRAME 2.0 in 2016) is the long-term linked-data successor, but adoption has been glacial: most ILS vendors still ship MARC-native code paths because nearly every existing record in every existing catalogue is MARC.
- JATS as the silent giant of scholarly publishing. Authors don’t see it; reviewers don’t see it; even copy-editors usually don’t see it. But every journal article that ends up in PubMed Central is JATS XML, many modern journal-publishing pipelines (eLife’s Continuum, PKP’s OJS, Hindawi/Wiley/Elsevier internal pipelines) produce it, and preservation services such as Portico and CLOCKSS ingest it. Z39.96-2024 (JATS 1.4, published 31 October 2024) is the current standard, replacing Z39.96-2021 (JATS 1.3). JATS is the format in which much of the world’s scholarly literature is actually stored.
- The DOI infrastructure rests mainly on two metadata schemas. Crossref (founded 2000, ~150M registered DOIs) handles scholarly publications; DataCite (founded 2009, ~80M registered DOIs) handles research data and software. Crossref’s deposit schema 5.4.0 and DataCite’s kernel 4.7 (released 3 March 2026) are the XML/JSON contracts that every DOI registration with those agencies must satisfy. Both publish complementary REST APIs and bulk public data files (Crossref’s 2026 public data file, released early 2026; DataCite’s quarterly dumps), which together feed most modern citation tools.
- CFF closes the software-citation gap. Before CFF appeared in 2017, citing a piece of software in a paper meant either inventing a `@misc` BibTeX entry or citing an associated paper, neither of which captured “this is the artifact, at this version, at this DOI/URL/Git ref.” CFF (Citation File Format, current 1.2.0), a YAML file named `CITATION.cff` at the root of a Git repository, solves this directly. GitHub renders a “Cite this repository” button when it sees one, and Zenodo and the Software Heritage archive consume it. CFF 1.2.0 supports a preferred citation, multiple authors with ORCID iDs, and software-version metadata.
- The open-data wave: OpenAlex, OpenCitations, Wikidata. Until ~2020, the citation network was effectively private, owned by Web of Science (Clarivate) and Scopus (Elsevier) behind expensive subscriptions. Microsoft Academic Graph (MAG, 2015–2021) was the first large open alternative, but Microsoft terminated it on 31 December 2021. OpenAlex (OurResearch, January 2022) inherited the role, with ~245M works under CC0 by 2026; OpenCitations / COCI (latest release 2026-02-17) provides ~450M open DOI-to-DOI citation links; and Wikidata’s WikiCite project models scholarly works as Wikidata items queryable via SPARQL. Together, these three are reshaping bibliometrics: what counts as a citation is now a public, queryable graph rather than a vendor’s database.
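Because JATS is ordinary XML, the front matter that PubMed Central and preservation services rely on can be read with standard tooling. A minimal sketch, assuming JATS 1.x element paths (`front/article-meta`, `title-group/article-title`, `contrib-group/contrib`, `article-id[@pub-id-type='doi']`) and a document without DTD-defined entities; `jats_front_matter` and the sample article, including its DOI, are made up.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily trimmed JATS article; real files carry far more metadata.
SAMPLE_JATS = """<article article-type="research-article">
  <front>
    <journal-meta>
      <journal-title-group><journal-title>Journal of Examples</journal-title></journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.1</article-id>
      <title-group><article-title>A <italic>Flat</italic> Record</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def jats_front_matter(xml_text):
    """Pull journal title, DOI, article title, and author names from <front>."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    doi = meta.find("article-id[@pub-id-type='doi']")
    title = meta.find("title-group/article-title")
    return {
        "journal": root.findtext("front/journal-meta/journal-title-group/journal-title"),
        "doi": doi.text if doi is not None else None,
        # itertext() flattens inline markup such as <italic> inside the title.
        "title": "".join(title.itertext()) if title is not None else None,
        "authors": [
            (c.findtext("name/surname"), c.findtext("name/given-names"))
            for c in meta.findall("contrib-group/contrib[@contrib-type='author']")
        ],
    }


print(jats_front_matter(SAMPLE_JATS))
```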
Citations
- BibTeX format reference (Patashnik): https://www.bibtex.org/Format/
- BibLaTeX on CTAN (current package): https://ctan.org/pkg/biblatex
- BibLaTeX upstream (Philip Kime): https://github.com/plk/biblatex
- CSL 1.0.2 specification (October 2021): https://docs.citationstyles.org/en/stable/specification.html
- Citation Style Language project: https://citationstyles.org/
- Zotero Style Repository (~10,000 CSL styles): https://www.zotero.org/styles
- CFF (Citation File Format) home: https://citation-file-format.github.io/
- CFF 1.2.0 schema guide: https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md
- MARC 21 Format for Bibliographic Data (Library of Congress): https://www.loc.gov/marc/bibliographic/
- MARC Standards (Library of Congress, NDMSO): https://www.loc.gov/marc/
- MARCXML schema: https://www.loc.gov/standards/marcxml/
- MODS schema (Library of Congress): https://www.loc.gov/standards/mods/
- BIBFRAME 2.0 (Library of Congress): https://www.loc.gov/bibframe/
- Dublin Core (DCMI Metadata Terms): https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
- JATS Journal Article Tag Suite (NCBI / NLM): https://jats.nlm.nih.gov/
- ANSI/NISO Z39.96-2024 (JATS v1.4, October 2024): https://www.niso.org/publications/z3996-2024-jats
- Crossref schema library (5.4.0): https://www.crossref.org/documentation/schema-library/
- Crossref 5.4.0 deposit schema announcement: https://www.crossref.org/blog/version-5.4.0-metadata-schema-update-now-available/
- DataCite Metadata Schema (kernel 4.7, March 2026): https://schema.datacite.org/
- DataCite 4.6 announcement (Dec 2024): https://datacite.org/blog/announcing-datacite-metadata-schema-4-6/
- OpenURL Z39.88-2004 (NISO): https://www.niso.org/publications/ansiniso-z3988-2004-r2010
- OpenAlex documentation: https://docs.openalex.org/
- OpenCitations / COCI: https://opencitations.net/index/coci
- ORCID documentation: https://info.orcid.org/documentation/
- ROR (Research Organization Registry): https://ror.org/
- ROR data structure (schema v2): https://ror.readme.io/docs/ror-data-structure
- RIS format reference: https://en.wikipedia.org/wiki/RIS_(file_format)
- BibJSON: https://okfnlabs.org/bibjson/