Patent / IP / Standards-Document DSLs Family Index


type: language-family-index
family: patent-ip-standards
languages_catalogued: 26
tags: [language-reference, family-index, patent-ip-standards, wipo-st96, st26, uspto-xml, epo-xml, niso-sts, rfc-xml-v3, ipc, cpc]

Family overview

Patent / IP / standards-document DSLs are the set of XML vocabularies, classification grammars, and authoring markup languages used to encode the legal-technical artefacts of the intellectual-property and standards-development worlds. The dominant gravity well is WIPO Standard ST.96 — an XML schema family for industrial-property data covering patents, trademarks, designs, geographical indications, and copyright — currently at version 8.0, released October 2024, and explicitly not backwards-compatible with v7.0/v7.1. ST.96 is the slow-converging global standard intended to supersede the older WIPO ST.36 (patents), ST.66 (trademarks), and ST.86 (designs) format families. ST.96 v8.0 organises its components into Common, Patent, Trademark, Design, Geographical Indication, and Copyright sub-schemas under Annex III.

Running on a parallel track since 2022 is WIPO Standard ST.26, the XML format for biological sequence listings in patent applications. ST.26 went live with a coordinated international “big-bang” cutover on 2022-07-01, replacing the legacy plain-text ST.25 standard. Any patent application (including continuations and divisionals) with a filing date on or after that cutover must submit nucleotide and amino-acid sequence disclosures as a single ST.26 XML file — a hard procedural rule with large practical impact on biotech filings.
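The date rule above can be sketched as a small check. This is a minimal illustration only: the helper name `required_sequence_format` is hypothetical, and the sketch encodes nothing beyond the filing-date comparison described in the text, so office-specific edge cases are out of scope.

```python
from datetime import date

# Cutover date as stated in the text above.
ST26_CUTOVER = date(2022, 7, 1)


def required_sequence_format(filing_date: date) -> str:
    """Return the sequence-listing standard implied by the filing date.

    Hypothetical helper: it applies only the date comparison and ignores
    any office-specific exceptions.
    """
    return "ST.26" if filing_date >= ST26_CUTOVER else "ST.25"


# A divisional filed after the cutover needs ST.26 even when its parent
# application was filed with an ST.25 listing.
parent = required_sequence_format(date(2021, 3, 15))
divisional = required_sequence_format(date(2023, 9, 1))
print(parent, divisional)  # prints: ST.25 ST.26
```

The comparison uses the filing date of the application itself, not that of its parent, which is why continuations and divisionals fall under the new rule.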

Each major patent office overlays ST.96 with local extensions. Since the non-DOCX surcharge took effect on 2024-01-17, the USPTO has effectively mandated DOCX-based filing, with the downstream XML conversion conforming to ST.96. The legacy PatFT/AppFT bulk-text formats and the PAUS-XML grants based on the ICE DTD persist as adjacent data products on data.uspto.gov. The EPO maintains its own EBD (EPO Bibliographic Data) XML stream plus the worldwide DOCDB bibliographic database and the INPADOC family-extended view; the JPO, CNIPA, and KIPO each publish ST.96-aligned XML for their national caseload. The international routing systems (the PCT for patents, Madrid for trademarks, Hague for industrial designs) are administered by WIPO on top of the same ST.96 schema family.
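A consumer of such office data typically has to tell shared components apart from office-specific extensions. The sketch below shows one way to do that by namespace, using only the Python standard library. Everything in the sample is an assumption for illustration: the two namespace URIs are placeholders, and the element names (`ApplicationNumber`, `FilingDate`, `LocalProcedureCode`) are invented rather than taken from any published schema.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are placeholders for illustration, not real schema URIs.
COMMON_NS = "urn:example:st96-common"
OFFICE_NS = "urn:example:office-extension"

# Hypothetical record mixing shared fields with one office-specific field.
SAMPLE = f"""
<record xmlns:com="{COMMON_NS}" xmlns:ext="{OFFICE_NS}">
  <com:ApplicationNumber>12345678</com:ApplicationNumber>
  <com:FilingDate>2024-02-01</com:FilingDate>
  <ext:LocalProcedureCode>X1</ext:LocalProcedureCode>
</record>
"""


def split_by_namespace(xml_text: str) -> tuple[dict, dict]:
    """Separate shared-namespace fields from office-extension fields."""
    common, extension = {}, {}
    for child in ET.fromstring(xml_text):
        # ElementTree renders qualified names as "{namespace-uri}localname".
        uri, _, local = child.tag[1:].partition("}")
        if uri == COMMON_NS:
            common[local] = child.text
        elif uri == OFFICE_NS:
            extension[local] = child.text
    return common, extension


common, extension = split_by_namespace(SAMPLE)
print(common)
print(extension)
```

Keeping the extension fields in their own dictionary lets a normalising pipeline map the shared fields once and handle each office's additions separately.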

The IP world also runs on a small constellation of classification languages, controlled vocabularies expressed as hierarchical codes: IPC (International Patent Classification, WIPO), CPC (Cooperative Patent Classification, joint EPO/USPTO, currently CPC 2026.05 in force as of 2026-04-07), NICE (trademark goods/services, currently 13th edition NCL 13-2026 since 2026-01-01), Vienna (trademark figurative elements), and Locarno (industrial-design products, currently LOC15 v2026 since 2026-01-01).

Separate from the IP universe but procedurally adjacent is the standards-document authoring world: the IETF’s RFC XML v3 vocabulary (RFC 7991, published 2016, with the grammar frozen at xml2rfc 3.0.0 and the bis document slowly catching up to real-world usage), NISO STS 1.2 (ANSI/NISO Z39.102-2022, used by ISO/IEC and national SDOs as the canonical XML for standards), and the W3C-ecosystem markdown-ish authoring layers ReSpec and Bikeshed.

In our deep library

No standalone deep-library notes — this entire family is composed of XML vocabularies, controlled classification codes, and lightweight markdown wrappers rather than general-purpose languages.

Cross-reference:

  • document-typesetting — overlapping territory: OASIS DocBook for standards documents, XML-based authoring pipelines, the broader DITA / DocBook / TEI lineage.
  • citation-formats — IP filings cite prior art with their own conventions (USPTO 102/103 references, EPO X/Y/A category codes); ISO standards use ISO 690 citations.
  • api-description — ST.96, ST.26, NISO STS, and RFC XML v3 are all XML-Schema-described vocabularies; the schema-as-contract pattern is the same shape as OpenAPI/JSON-Schema.
  • notation-spec — formal-grammar overlap for the schema definitions themselves; CPC/IPC/NICE/Locarno/Vienna are essentially controlled-vocabulary mini-languages with hierarchical code grammars.
  • government-civictech — patent and trademark filings live at the legal/regulatory boundary; the procedural-XML pattern is shared with court e-filing (LegalXML / OASIS LegalDocML).
  • bio-fileformats — direct overlap: ST.26 biosequence listings encode the same nucleotide/amino-acid alphabets as FASTA/GenBank but inside a constrained XML envelope with mandatory feature qualifiers and INSDC-aligned vocabularies.

Tier 3 family table — WIPO standards

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| WIPO ST.96 | 2009 (v1.0), v8.0 in 2024-10 | WIPO Committee on WIPO Standards (CWS), XML4IP Task Force | XML Schema family for all industrial property (patents, trademarks, designs, GIs, copyright) | Current: v8.0 released Oct 2024; explicitly not backward-compatible with v7.x | https://www.wipo.int/standards/en/st96/v8-0/ |
| WIPO ST.36 | 2003 | WIPO SDWG → CWS | Older patent-only XML vocabulary | Legacy: succeeded by ST.96; still in some bulk-data products and historical archives | https://www.wipo.int/en/web/standards/standards |
| WIPO ST.66 | 2008 | WIPO | Older trademark-only XML vocabulary | Legacy: succeeded by ST.96 trademark components | https://www.wipo.int/en/web/standards/standards |
| WIPO ST.86 | 2008 | WIPO | Older industrial-design XML vocabulary | Legacy: succeeded by ST.96 design components | https://www.wipo.int/en/web/standards/standards |
| WIPO ST.26 | 2018 (recommendation), 2022-07-01 mandate | WIPO SEQL Task Force | XML for biosequence listings in patent applications (nucleotide + amino-acid disclosures) | Mandatory: replaced ST.25 on 2022-07-01 “big-bang” date for all new filings worldwide | https://www.wipo.int/en/web/standards/sequence |
| WIPO ST.25 | 1998 | WIPO | Plain-text sequence-listing standard | Retired: replaced by ST.26 on 2022-07-01 for new filings | https://www.wipo.int/en/web/standards/sequence/faq |
| WIPO ST.97 | 2020 | WIPO CWS | XML for general IP-related documents (annexes, correspondence) | Active: narrower scope than ST.96 | https://www.wipo.int/en/web/standards/standards |
| WIPO ST.50-series | 1970s–present | WIPO | Bibliographic-data recommendations (ST.50 patent corrections, ST.9 bibliographic data, ST.8 IPC) | Active: referenced by EPO EBD and most patent-office XML | https://www.wipo.int/en/web/standards/standards |

Tier 3 family table — Patent office XML

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| USPTO Patent XML (ICE DTD) / PAUS-XML | ~2002 | USPTO | DTD-based XML for grants and applications via the ICE (International Common Element) DTD | Active: bulk data on data.uspto.gov; ST.96-aligned conversion now flows from DOCX filing | https://developer.uspto.gov/product/patent-grant-full-text-dataxml |
| USPTO DOCX filing | 2021 voluntary, 2024-01-17 surcharge for non-DOCX | USPTO | OOXML (.docx) specification + claims + abstract; converted to ST.96 XML downstream | Mandatory in practice for new utility nonprovisional applications since 2024-01-17 | https://www.uspto.gov/patents/apply/patent-center |
| USPTO PatFT / AppFT bulk text | 1970s archives, web 2001 | USPTO | Legacy plain-text-with-tags bulk patent / application data | Legacy: Patent Public Search replaced the PatFT/AppFT web UIs in 2022; bulk text archives still distributed | https://www.uspto.gov/sites/default/files/documents/Search-field-conversion-PatFT-AppFT-QRG-Patent-Public-Search.pdf |
| EPO EBD (Bibliographic Data XML) | early 2000s | European Patent Office | Weekly XML feed of new EP applications/specifications and amendments | Active: ST.8/ST.9-conformant XML; pre-2005 backfile may still be SGML | http://docs.epoline.org/ebd/xmlinfo.htm |
| EPO DOCDB | 1968 (paper), XML form post-2000 | EPO | Worldwide patent bibliographic master database, distributed in XML | Active: backbone of EPO/Espacenet and many third-party patent-search products | https://www.epo.org/en/searching-for-patents/data |
| INPADOC | 1972 | EPO (originally International Patent Documentation Centre, Vienna) | International patent legal-status + family data, distributed as XML | Active: INPADOC families extend DOCDB families by shared-priority linkage | https://www.epo.org/en/searching-for-patents/data |
| EPO Patent Information Export Format | 2010s | EPO | Bulk-export wrapper formats around EBD/DOCDB | Active: for licensed bulk subscribers | https://www.epo.org/en/searching-for-patents/data |
| JPO XML | 2000s | Japan Patent Office | National patent/utility-model/design/trademark XML; progressively migrating to ST.96 | Active | https://www.jpo.go.jp/e/system/laws/sesaku/data/datapolicy.html |
| CNIPA XML | 2010s | China National Intellectual Property Administration | Chinese patent and trademark XML; ST.96 alignment in progress | Active: largest filing volume in the world | https://english.cnipa.gov.cn/ |
| KIPO XML | 2000s | Korean Intellectual Property Office | Korean patent XML; ST.96 alignment in progress | Active | https://www.kipo.go.kr/en/ |
| PCT XML (Patent Cooperation Treaty) | 2000s | WIPO (PCT Operations) | XML for international PCT applications under WIPO administration | Active: built on ST.96 patent components | https://www.wipo.int/pct/en/ |

Tier 3 family table — IP classification systems

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| IPC (International Patent Classification) | 1971 (Strasbourg Agreement) | WIPO | Hierarchical patent classification (sections A–H → ~80,000 entries) | Active: IPC 2026.01 in force from 2026-01-01 | https://www.wipo.int/classifications/ipc/en/ |
| CPC (Cooperative Patent Classification) | 2013 | EPO + USPTO (jointly maintained) | Extension of IPC with much finer granularity (~260,000 entries) | Active: CPC 2026.05 entered force 2026-04-07; rolling quarterly-ish releases | https://www.cooperativepatentclassification.org/ |
| NICE Classification | 1957 (Nice Agreement) | WIPO | International classification of goods and services for trademark registration (45 classes) | Active: 13th edition NCL 13-2026 in force from 2026-01-01, replacing NCL 12-2025 | https://www.wipo.int/en/web/classification-nice |
| Vienna Classification | 1973 (Vienna Agreement) | WIPO | Classification of the figurative elements of marks | Active: used alongside NICE for figurative-mark searches | https://www.wipo.int/classifications/vienna/en/ |
| Locarno Classification | 1968 (Locarno Agreement) | WIPO | International classification for industrial designs (32 classes) | Active: LOC15 version 2026 product list since 2026-01-01 | https://www.wipo.int/en/web/classification-locarno |
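
The hierarchical code grammar of IPC/CPC symbols lends itself to a small parser. The sketch below splits a symbol such as H04L 9/00 into section, class, subclass, main group, and subgroup. The regular expression is an approximation written for this note, not an official grammar: the digit-count limits and the acceptance of section Y are assumptions, and the names `CpcSymbol` and `parse_cpc` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Approximate symbol layout: section letter, two-digit class, subclass
# letter, main group, "/", subgroup. Digit limits are assumptions.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class CpcSymbol(NamedTuple):
    section: str
    class_: str      # two-digit class number within the section
    subclass: str    # subclass letter within the class
    main_group: str
    subgroup: str


def parse_cpc(symbol: str) -> Optional[CpcSymbol]:
    """Return the positional components, or None if the text does not match."""
    match = CPC_PATTERN.match(symbol.strip().upper())
    return CpcSymbol(*match.groups()) if match else None


parsed = parse_cpc("H04L 9/00")
print(parsed)
print(parse_cpc("not a code"))  # prints: None
```

A parser like this only checks the shape of a symbol. Whether the symbol exists in a given release of the scheme still has to be checked against that release's published list.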

Tier 3 family table — International registration systems

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| Madrid System XML | 1891 treaty, modern XML 2000s | WIPO (Madrid Registry) | XML for international trademark filings under the Madrid Protocol; eMadrid + Madrid e-Filing pipelines | Active: 115 members covering 131 countries as of May 2025 | https://www.wipo.int/en/web/madrid-system |
| Hague System XML | 1925 treaty, modern XML 2000s | WIPO (Hague Registry) | XML for international industrial-design filings under the Hague Agreement | Active | https://www.wipo.int/en/web/hague-system |
| TRIPS Agreement (adjacent) | 1995 | WTO | Treaty framework (not a data format): minimum substantive IP standards binding on WTO members | Active: legal substrate for all the above | https://www.wto.org/english/tratop_e/trips_e/trips_e.htm |
| EPC Article 7 / Implementing Regulations (adjacent) | 1973 (EPC), 2000 revision | EPO contracting states | Treaty + implementing regulations governing European patents (not a data format) | Active: legal substrate for EPO procedure | https://www.epo.org/en/legal/epc |

Tier 3 family table — Standards-document authoring

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| NISO STS 1.2 | 2017 (STS 1.0), 2022 (STS 1.2 as ANSI/NISO Z39.102-2022) | NISO Standards Tag Suite WG | XML for full text + metadata of standards documents; derived from JATS (Z39.96) | Current: STS 1.2 fully backward-compatible with 1.0; used by ISO/IEC and national SDOs | https://www.niso.org/standards-committees/sts |
| IETF RFC XML v3 / xml2rfc v3 | 2016 (RFC 7991) | IETF (Paul Hoffman et al.) | XML vocabulary for authoring RFCs; replaced the long-running v2 vocabulary | Active but in maintenance: xml2rfc 3.0.0 froze the grammar; RFC 7991bis catching up to real-world dialect | https://datatracker.ietf.org/doc/html/rfc7991 |
| xml2rfc (tool) | early 2000s (v1/v2), 2018+ (v3) | IETF Tools | Reference processor for RFC XML; converts to PDF/HTML/text RFC outputs | Active: Python package, used by RFC Production Center | https://github.com/ietf-tools/xml2rfc |
| Kramdown-RFC | 2010s | Carsten Bormann (IETF community) | Markdown + extensions that compile to RFC XML v3 | Active: the de facto path for non-XML-native IETF authors | https://github.com/cabo/kramdown-rfc |
| W3C ReSpec | 2009+ | Robin Berjon → W3C community | HTML+JS in-browser preprocessor that renders into a W3C/WHATWG-style spec | Very active: widely used for W3C TR documents and many community specs | https://respec.org/docs/ |
| W3C Bikeshed | 2014+ | Tab Atkins (CSS WG) | Source-to-spec preprocessor (Python); markdown-ish input → HTML spec with boilerplate, bibliography, indexes | Very active: used by CSS specs, many other W3C WGs, WHATWG, and the C++ standards committee | https://github.com/speced/bikeshed |
| OASIS DocBook (for standards) | 1991 | HaL Computer Systems + O’Reilly → OASIS | General technical-documentation XML; sometimes used for standards | Active: overlaps with document-typesetting | https://docbook.org/ |
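
For orientation, the RFC XML v3 vocabulary listed above has roughly the shape shown in the sample below, which a few lines of standard-library Python can walk to produce a section outline. The sample is hand-written for illustration and is far from a complete, valid RFC source (it omits authors, abstract, and back matter). The `outline` helper and the draft name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written sample; not a complete or valid RFC source file.
RFCXML = """
<rfc version="3" docName="draft-example-00" category="info">
  <front>
    <title>An Example Document</title>
  </front>
  <middle>
    <section>
      <name>Introduction</name>
      <t>First paragraph.</t>
      <section>
        <name>Terminology</name>
        <t>Nested section.</t>
      </section>
    </section>
  </middle>
</rfc>
"""


def outline(xml_text: str):
    """Return the document title and a list of (depth, section name) pairs."""
    root = ET.fromstring(xml_text)

    def walk(node, depth):
        # findall("section") matches direct children only, so nesting
        # depth is tracked by the recursion itself.
        for section in node.findall("section"):
            yield depth, section.findtext("name", default="")
            yield from walk(section, depth + 1)

    return root.findtext("front/title"), list(walk(root.find("middle"), 0))


title, sections = outline(RFCXML)
print(title)
for depth, name in sections:
    print("  " * depth + name)
```

Authors using Kramdown-RFC never write this XML by hand, but the generated output follows the same front/middle/section nesting.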

Notable threads

  • WIPO ST.96 is the slowly-converging global standard. ST.96 v8.0 (Oct 2024) is the eighth major revision in fifteen years, and version cadence reflects the fundamental tension of any inter-office XML standard: every national office runs ST.96 with local extensions (PAUS at USPTO, EPO bibliographic profiles, JPO/CNIPA/KIPO national wrappers) and every breaking schema change forces coordinated office-side migration. The explicit “not backward-compatible with v7” warning on v8.0 is telling — schema simplifications and namespace cleanups still ship as breaking changes a decade and a half into the standard’s life. ST.96 effectively eats ST.36 (patents), ST.66 (trademarks), and ST.86 (designs) on a slow trajectory.

  • The ST.26 biosequence cutover (2022-07-01) was the largest mandatory IP-format migration in decades. ST.25 had been the plain-text status quo since 1998 and was unable to represent branched sequences, D-amino acids, nucleotide analogues, or many of the modifications routine in modern molecular biology. The “big-bang” date was coordinated across WIPO, USPTO, EPO, JPO, CNIPA, KIPO, and PCT-receiving offices simultaneously — a rare instance of a hard global cutover in IP procedure. A subtle rule: any continuation or divisional filed on or after 2022-07-01 must use ST.26 even if the parent was an ST.25 filing, which has caused real procedural pain in the biotech prosecution world. ST.26 is also a rare ST.96-adjacent standard with its own dedicated authoring tool ecosystem (WIPO Sequence, WIPO Sequence Validator).

  • The patent-office XML federation overlays ST.96 with local extensions. The reference architecture is “ST.96 common components + office-specific extension schemas in a separate namespace.” USPTO’s PAUS-XML layer carries 35 U.S.C. statutory metadata; EPO EBD encodes EPC procedural events; JPO/CNIPA/KIPO carry national bibliographic identifiers. Bulk-data consumers (EPO PATSTAT, Google Patents, lens.org, IPlytics, etc.) generally normalise these into a common internal model rather than processing each office’s dialect raw. Pre-2005 EPO data may still arrive in SGML rather than XML, a reminder that “the international patent corpus” is decades-deep and format-heterogeneous.

  • IP classification systems are controlled-vocabulary mini-languages with hierarchical code grammars. The family comprises IPC (~80k entries), CPC (~260k entries, maintained jointly by the EPO and USPTO), NICE (45 trademark classes), Vienna (figurative elements), and Locarno (32 design classes). They are not “programming languages”, but they are formal languages: each code has a strict positional grammar (e.g., CPC H04L 9/00 decomposes into section H, class 04, subclass L, main group 9, and subgroup 00), and revisions ship on calendar schedules. IPC 2026.01, CPC 2026.05, NICE 13-2026, and Locarno LOC15-2026 are all in force as of mid-2026. Tooling around these classifications (concordance tables, IPC↔CPC mappings, classification-prediction ML models) is its own active ecosystem.

  • IETF RFC XML v3 (RFC 7991, 2016) replaced the long-running v2 vocabulary but the migration has been bumpy. The xml2rfc reference processor diverged from the published RFC 7991 grammar during deployment, the grammar was frozen at xml2rfc 3.0.0 to stop the bleeding, and RFC 7991bis is slowly catching the spec up to the dialect that real RFCs actually use — work that paused during the RSWG/RFC 9280 governance overhaul and is resuming in 2025–2026 under the new RFC change-management team. In practice most authors now write Kramdown-RFC (Carsten Bormann’s Markdown-with-extensions front end) and let it emit RFC XML v3 for them.

  • W3C ReSpec and Bikeshed displaced raw HTML for spec authoring. Both took over from the older XMLSpec / pure-HTML-with-stylesheets approach in the early 2010s. ReSpec is HTML+JS that runs in the browser (the spec file is the spec, transformed on load); Bikeshed is a Python preprocessor with a markdown-ish source format. Bikeshed dominates in CSS, WHATWG, and the C++ committee; ReSpec dominates in the broader W3C TR pipeline. The W3C spec-prod GitHub Action standardises CI/CD for both — build, validate, publish to GH Pages and/or w3.org via Echidna.

  • NISO STS is the JATS-for-standards story. STS 1.2 (ANSI/NISO Z39.102-2022) is derived from JATS (Z39.96, the dominant scholarly-article XML), extended with normative-content structures and adoption/translation patterns. ISO and IEC use STS as the basis of their XML production pipelines; national SDOs (ANSI, BSI, DIN, JISC) follow. STS sits in the same conceptual niche as ST.96 but for normative standards rather than IP rights — and they don’t overlap in scope, despite both being XML-Schema-described technical-document vocabularies maintained by international bodies.

  • The slow XML→JSON drift is real but partial. Some IP systems (search APIs, bulk-data overlays, modern e-filing UIs) increasingly expose JSON alongside XML, but the canonical-document format remains XML in essentially every patent office and every standards body. The reason is regulatory: XML Schema is the validation language that procedural law has been written to assume, and the migration cost of switching every implementing regulation to reference a JSON Schema is greater than the developer-ergonomic upside. Expect XML primacy in this domain through at least the late 2020s.
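
The XML-canonical, JSON-alongside pattern from the last thread can be sketched as a one-way projection: the XML record stays the source of truth and a JSON view is derived from it for API consumers. The record layout, element names, and the `to_json_view` helper below are all invented for illustration and do not correspond to any office's actual schema or API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical bibliographic record; element names are illustrative only.
SAMPLE = """
<BibliographicData>
  <ApplicationNumber>12345678</ApplicationNumber>
  <Classification scheme="CPC">H04L 9/00</Classification>
  <Classification scheme="IPC">H04L 9/00</Classification>
</BibliographicData>
"""


def to_json_view(xml_text: str) -> str:
    """Derive a JSON view from the XML record without modifying the XML."""
    root = ET.fromstring(xml_text)
    view = {
        "applicationNumber": root.findtext("ApplicationNumber"),
        "classifications": [
            {"scheme": item.get("scheme"), "symbol": item.text}
            for item in root.findall("Classification")
        ],
    }
    return json.dumps(view, indent=2)


print(to_json_view(SAMPLE))
```

Because the projection runs in one direction only, validation against the schema still happens on the XML, which matches the regulatory reasoning given above.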

Citations