Linguistic-Resource Publishing / Lexicon DSLs Family Index

type: language-family-index
family: linguistic-resources
languages_catalogued: 28
tags: [language-reference, family-index, linguistic-resources, ontolex-lemon, lmf, tei-lex0, lift, wn-lmf, cldf, dbnary, glottolog, olac, bcp47]

Linguistic-Resource Publishing / Lexicon / Dictionary — Family Index

Family overview

Linguistic-resource publishing DSLs are the textual and structured vocabularies used to encode digital dictionaries, terminology bases, wordnets, morpheme inventories, and language-documentation archives. They sit at the intersection of three different traditions that historically did not talk to each other: the lexicographic-publishing tradition (Oxford, Merriam-Webster, Brill — born from typesetting and now mostly XML/JSON), the field-linguistics tradition (SIL Toolbox SFM, FieldWorks FLEx, LIFT — born from missionary linguistics in the 1980s and still anchoring endangered-language documentation), and the computational-lexicon tradition (WordNet, OntoLex-Lemon, ISO LMF — born from NLP and the Semantic Web). The 2020s have seen a partial convergence around two attractors: OntoLex-Lemon on the linked-data side and TEI Lex-0 on the document-encoding side, with ISO LMF 24613 as the formal-standards reference.

OntoLex-Lemon is the de facto standard model for lexicons as linked data. It was published by the W3C Ontology-Lexica Community Group as a Final Community Group Report in May 2016, with the Lexicography module (lexicog) added as a separate Final CG Report in September 2019. It is not a W3C Recommendation (the Community Group track does not produce Recommendations), but it functions as the modern standard: every major linked-data lexicon (DBnary, Apertium-RDF, BabelNet, the European Language Resources Coordination’s outputs) publishes in OntoLex. Its modular structure separates an ontolex core (forms, lexical entries, senses, references to ontology concepts) from lime (lexicon metadata), vartrans (translation and variation), decomp (morphological decomposition), morph (inflection), synsem (syntax–semantics interface, including syntactic frames), lexicog (lexicographic structure), and the newer frac (frequency, attestation, corpus), which is still a working draft as of 2026 and not yet a Final CG Report.
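
To make the core/module split concrete, here is a minimal Python sketch (stdlib only) that renders one entry using only core classes and properties (`ontolex:LexicalEntry`, `ontolex:Form`, `ontolex:LexicalSense`, `ontolex:canonicalForm`, `ontolex:writtenRep`, `ontolex:sense`, `ontolex:reference`) plus the LexInfo part-of-speech vocabulary. The `ex:` namespace, the entry IDs, and the use of a DBpedia IRI as the ontology concept are illustrative assumptions; a production pipeline would more likely use an RDF library than string templates.

```python
# Minimal sketch: one OntoLex-Lemon core entry rendered as Turtle text.
# Assumption: the ex: namespace and all entry IDs are hypothetical; only the
# ontolex: and lexinfo: terms come from the published vocabularies.
PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .\n"
    "@prefix ex: <http://example.org/lexicon/> .\n"
)


def turtle_literal(text: str, lang: str) -> str:
    """Quote a string as a language-tagged Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def ontolex_entry(entry_id: str, lemma: str, lang: str, pos: str, concept_iri: str) -> str:
    """Entry -> canonical form (written representation); one sense -> ontology concept."""
    return (
        PREFIXES
        + f"\nex:{entry_id} a ontolex:LexicalEntry, ontolex:Word ;\n"
        + f"    lexinfo:partOfSpeech lexinfo:{pos} ;\n"
        + f"    ontolex:canonicalForm ex:{entry_id}_form ;\n"
        + f"    ontolex:sense ex:{entry_id}_sense1 .\n\n"
        + f"ex:{entry_id}_form a ontolex:Form ;\n"
        + f"    ontolex:writtenRep {turtle_literal(lemma, lang)} .\n\n"
        + f"ex:{entry_id}_sense1 a ontolex:LexicalSense ;\n"
        + f"    ontolex:reference <{concept_iri}> .\n"
    )


print(ontolex_entry("cat_n", "cat", "en", "noun", "http://dbpedia.org/resource/Cat"))
```

Adding a module is then a matter of declaring its prefix and adding its triples next to the core ones, which is the à la carte design the modular structure allows.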

ISO LMF (Lexical Markup Framework, ISO 24613) went through a long modular split between 2019 and 2024: the legacy 2008 monolithic standard was retired in favor of six parts — Part 1 Core model (ISO 24613-1:2024), Part 2 Machine-Readable Dictionary (24613-2:2020), Part 3 Etymological Extension (24613-3:2021), Part 4 TEI Serialisation (24613-4:2021), Part 5 Lexical Base Exchange / LBX (24613-5:2022), and Part 6 Syntax and Semantics / SynSem (24613-6:2024). The TEI-serialisation part (24613-4) is the formal bridge from LMF to TEI Lex-0, and the LBX serialisation (24613-5) is the practical XML exchange format used by terminology vendors and EU-funded projects (ELEXIS, European Lexicographic Infrastructure).
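
The Part 1 core model is essentially a containment tree (LexicalResource → Lexicon → LexicalEntry → Sense). The sketch below models that nesting with Python dataclasses and writes it out as generic XML. It is a simplification under stated assumptions: it leaves out most of the metamodel (word forms, global information, relations), and its element and attribute names merely mirror the class names; they are not the normative Part 4 (TEI) or Part 5 (LBX) serialisations.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

# Simplified sketch of the LMF core containment tree (not the full ISO 24613-1
# metamodel). Element names in to_xml() mirror the class names and are NOT the
# normative Part 4 (TEI) or Part 5 (LBX) serialisations.


@dataclass
class Sense:
    id: str
    definition: str


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    part_of_speech: str
    senses: list[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str  # e.g. an ISO 639 / BCP 47 code
    entries: list[LexicalEntry] = field(default_factory=list)


@dataclass
class LexicalResource:
    lexicons: list[Lexicon] = field(default_factory=list)


def to_xml(resource: LexicalResource) -> str:
    """Walk the containment tree top-down and emit one element per level."""
    root = ET.Element("LexicalResource")
    for lexicon in resource.lexicons:
        lex_el = ET.SubElement(root, "Lexicon", language=lexicon.language)
        for entry in lexicon.entries:
            entry_el = ET.SubElement(lex_el, "LexicalEntry", id=entry.id)
            ET.SubElement(entry_el, "Lemma", writtenForm=entry.lemma,
                          partOfSpeech=entry.part_of_speech)
            for sense in entry.senses:
                sense_el = ET.SubElement(entry_el, "Sense", id=sense.id)
                ET.SubElement(sense_el, "Definition").text = sense.definition
    return ET.tostring(root, encoding="unicode")


resource = LexicalResource([
    Lexicon("fr", [LexicalEntry("chat-n", "chat", "noun",
                                [Sense("chat-n-1", "petit félin domestique")])])
])
print(to_xml(resource))
```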

The SIL legacy stack persists in field linguistics despite the linked-data wave: SFM (Standard Format Markers, 1980s line-based \lx … \ge … markers) feeds Toolbox; FieldWorks FLEx (.NET application, current 9.x) is the modern descendant; and LIFT (Lexicon Interchange Format), currently at v0.13 and maintained via the SIL.Lift NuGet package, is the cross-tool XML exchange format between FLEx, WeSay, Lexique Pro, and Dictionary App Builder.

Parallel to all of this, Wikibase Lexemes (Wikidata’s Lexeme/Form/Sense entity model, with dedicated wikibase-lexeme, wikibase-form, and wikibase-sense datatypes) have become the largest crowdsourced structured-lexicon corpus, and DBnary (Gilles Sérasset, LIG/Grenoble) republishes Wiktionary as OntoLex-Lemon RDF twice a month across 26+ language editions.

The wordnet world has consolidated around the Global WordNet Association’s WN-LMF XML schema (current 1.4 as of 2024), with the Open Multilingual Wordnet (OMW) as the canonical multi-language distribution. CLDF (Cross-Linguistic Data Formats), currently at 1.3, has become the lingua franca for typological and comparative-linguistics datasets (WALS, PHOIBLE, Glottolog itself).
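
Because CLDF datasets are CSVW tables with standard column names, a cross-dataset query can often be a plain CSV join. The sketch below groups a FormTable by concept (`Parameter_ID`) and attaches each language's Glottocode from the LanguageTable. It assumes the default column names and uses small inline sample data; real datasets declare their tables and any renamed columns in the JSON-LD metadata file, which is what pycldf reads first, so this stdlib version is only an approximation.

```python
import csv
import io
from collections import defaultdict

# Assumption: default CLDF Wordlist column names (LanguageTable: ID, Name,
# Glottocode; FormTable: ID, Language_ID, Parameter_ID, Form). Real datasets may
# rename columns in their JSON-LD metadata; this sketch skips reading it.
LANGUAGES_CSV = """ID,Name,Glottocode
english,English,stan1293
spanish,Spanish,stan1288
"""

FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
english-hand,english,hand,hand
spanish-hand,spanish,hand,mano
spanish-water,spanish,water,agua
"""


def forms_by_concept(languages_csv: str, forms_csv: str) -> dict:
    """Group forms by Parameter_ID (concept), attaching each language's Glottocode."""
    glottocodes = {
        row["ID"]: row.get("Glottocode", "")
        for row in csv.DictReader(io.StringIO(languages_csv))
    }
    table = defaultdict(list)
    for row in csv.DictReader(io.StringIO(forms_csv)):
        lang = row["Language_ID"]
        table[row["Parameter_ID"]].append((lang, glottocodes.get(lang, ""), row["Form"]))
    return dict(table)


print(forms_by_concept(LANGUAGES_CSV, FORMS_CSV)["hand"])
```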

In our deep library

None of the formats in this family have standalone Tier-1/2 deep-library notes — they are XML/RDF/TSV exchange vocabularies hosted in general-purpose carrier languages.

Cross-reference:

nlp-corpus — sibling family for corpus and annotation formats (TEI body, CoNLL-U, Universal Dependencies, ELAN, PropBank/FrameNet). The line is fuzzy: ELAN .eaf, CHILDES CHAT, and LMF itself appear in both indexes because dictionaries and corpora share serialisation.
voice-phonetics — pronunciation-lexicon overlap (CMUdict, IPA in <pron> elements of TEI Lex-0 and OntoLex lexinfo:pronunciation).
semantic-web — OntoLex-Lemon is built on RDF/OWL/SKOS; MMoOn, GOLD, and CIDOC CRM are all OWL ontologies and live conceptually inside the Semantic Web stack.
i18n-locale — TBX (terminology) and BCP 47 language tags overlap heavily; the i18n index covers them from the software-localisation angle, this index from the dictionary/terminology-publishing angle.
citation-formats — MARC for library cataloging; OAI-PMH harvest protocol used by OLAC for language-archive metadata.
api-description — XML-schema and JSON-schema underpinnings (LIFT XSD, WN-LMF DTD, OntoLex Turtle).
notation-spec — formal-grammar adjacent for morphological-rule formalisms inside LMF Morphology / MMoOn.

Tier 3 family table — Linked-data lexicon (RDF / OntoLex-Lemon)

Format	First appeared	Origin	Type	Status (2026)	URL
OntoLex-Lemon core	2016 (Final CG Report, May 2016)	W3C Ontology-Lexica Community Group (John P. McCrae et al.)	RDF/OWL model for lexicons; classes `ontolex:LexicalEntry`, `ontolex:Form`, `ontolex:LexicalSense`, `ontolex:LexicalConcept`	De facto standard; the modern linked-data lexicon model. Not a W3C Recommendation (CG track) but cited as the reference everywhere	https://www.w3.org/2016/05/ontolex/
OntoLex lime (Lexicon Metadata)	2016	W3C OntoLex CG	Metadata module (`lime:Lexicon`, `lime:entries`, `lime:language`)	Active, Final CG Report	https://www.w3.org/2016/05/ontolex/#metadata-lime
OntoLex vartrans (translation/variation)	2016	W3C OntoLex CG	Translation, term-variation, and lexical-relation module	Active	https://www.w3.org/2016/05/ontolex/#variation-translation-vartrans
OntoLex decomp + morph	2016 (decomp) / 2019+ (morph)	W3C OntoLex CG	Morphological decomposition and inflection paradigms	Active; morph is the newer of the two	https://www.w3.org/2016/05/ontolex/#morphology
OntoLex synsem	2016	W3C OntoLex CG	Syntax–semantics interface for predicates	Active	https://www.w3.org/2016/05/ontolex/#syntax-and-semantics-synsem
OntoLex lexicog (Lexicography module)	2019 (Final CG Report, Sept 2019)	W3C OntoLex CG	Lexicographic-structure module — entries, sub-entries, ordering, dictionary metadata; the canonical “linked-data dictionary” layer	Active, Final CG Report	https://www.w3.org/2019/09/lexicog/
OntoLex FrAC (Frequency, Attestation, Corpus)	2018+ (in development), still Working Draft as of 2026	W3C OntoLex CG (Christian Chiarcos et al.)	Corpus-derived frequencies, attestations, embedding pointers	Working draft, not yet Final CG Report	https://ontolex.github.io/frequency-attestation-corpus-information/
DBnary	2012 (Sérasset, LIG/Grenoble)	INRIA/Université Grenoble Alpes	Wiktionary republished as OntoLex-Lemon RDF; twice-monthly dumps synced to the Wikimedia Wiktionary dumps; 26+ language editions as of 2024	Very active	http://kaiko.getalp.org/about-dbnary/
Wikibase Lexeme (Wikidata)	2018 (Lexeme namespace launched on Wikidata)	Wikimedia Deutschland	First-class Wikibase entity types `Lexeme` (L-IDs), `Form` (F-IDs), `Sense` (S-IDs); dedicated datatypes `wikibase-lexeme`, `wikibase-form`, `wikibase-sense`	Very active; the largest crowdsourced structured-lexicon corpus	https://www.wikidata.org/wiki/Wikidata:Lexicographical_data
MMoOn Core (Multilingual Morpheme Ontology)	2016 (initial) / 2021 (Semantic Web journal publication)	AKSW Leipzig (Bettina Klimek et al.)	OWL ontology for morpheme-level inventories; sub-word-level analogue to OntoLex	Active research-grade; not standardised	https://mmoon.org/
GOLD (General Ontology for Linguistic Description)	2003	LinguistList / U. Arizona (Farrar, Langendoen)	OWL ontology for descriptive-linguistics categories (parts of speech, grammatical features); designed for endangered-language fieldwork	Stable/legacy reference; widely cited but not actively versioned	https://linguistics-ontology.org/
CIDOC CRM (ISO 21127:2023)	1996 (CIDOC), ISO 21127:2006 → 2014 → 2023; current community version 7.1.3	International Council of Museums (ICOM)	OWL/RDFS ontology for cultural-heritage records; 81 classes, 160 properties; not lexicon-specific but used for archive metadata around dictionaries and manuscripts	Active; ISO 21127:2023 is current	https://cidoc-crm.org/
Lexvo.org / lexinfo.net	2008+ (Lexvo.org, Gerard de Melo); LexInfo (Cimiano, Buitelaar, McCrae et al.)	Lexvo.org: Gerard de Melo; LexInfo: DERI Galway / INSIGHT Centre	Companion vocabularies: Lexvo (URIs for languages, scripts, terms) and LexInfo (linguistic-category vocabulary used inside OntoLex)	Active, low-velocity maintenance	https://www.lexinfo.net/
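
The Lexeme/Form/Sense ID scheme in the Wikibase Lexeme row is visible in the entity JSON. The following sketch walks an abridged, hand-written record whose field names (`lemmas`, `lexicalCategory`, `language`, `forms`, `representations`, `senses`, `glosses`) are assumed to follow Wikidata's Lexeme JSON format; `L99999` is a placeholder rather than a real Wikidata lexeme, and statements (`claims`) are omitted.

```python
# Hand-written, abridged record shaped like Wikidata's Lexeme JSON (claims
# omitted). L99999 and its sub-entities are placeholders; the language and
# lexicalCategory values are intended to be English (Q1860) and verb (Q24905).
LEXEME = {
    "type": "lexeme",
    "id": "L99999",
    "lemmas": {"en": {"language": "en", "value": "run"}},
    "lexicalCategory": "Q24905",
    "language": "Q1860",
    "forms": [
        {"id": "L99999-F1", "representations": {"en": {"language": "en", "value": "runs"}},
         "grammaticalFeatures": []},
        {"id": "L99999-F2", "representations": {"en": {"language": "en", "value": "ran"}},
         "grammaticalFeatures": []},
    ],
    "senses": [
        {"id": "L99999-S1", "glosses": {"en": {"language": "en", "value": "to move swiftly on foot"}}},
    ],
}


def monolingual(values: dict, lang: str) -> str:
    """Pick one language's value from a {lang: {"language": ..., "value": ...}} map."""
    return values.get(lang, {}).get("value", "")


def parent_lexeme(sub_id: str) -> str:
    """Form and Sense IDs embed their parent L-ID, e.g. 'L99999-F2' -> 'L99999'."""
    return sub_id.split("-", 1)[0]


def summarise(lexeme: dict, lang: str = "en") -> dict:
    """Flatten a Lexeme into lemma, {F-ID: form} and {S-ID: gloss} for one language."""
    return {
        "id": lexeme["id"],
        "lemma": monolingual(lexeme["lemmas"], lang),
        "forms": {f["id"]: monolingual(f["representations"], lang) for f in lexeme.get("forms", [])},
        "senses": {s["id"]: monolingual(s["glosses"], lang) for s in lexeme.get("senses", [])},
    }


print(summarise(LEXEME))
```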

Tier 3 family table — TEI / XML dictionary

Format	First appeared	Origin	Type	Status (2026)	URL
TEI Lex-0	2018 (DARIAH Working Group on Lexical Resources)	DARIAH-EU, Toma Tasovac et al.	Constrained TEI subset for dictionary encoding — fixes the underspecified parts of TEI Chapter 9; canonical baseline for born-digital + retro-digitised dictionaries	Active; current release July 2025 (versioned, ODD-driven schema generation)	https://lex-0.org/
TEI P5 dictionary module (Chapter 9)	1990 (TEI P1) → P5 (2007) → 4.10.0 (Aug 2025) → 4.x ongoing	Text Encoding Initiative Consortium	Full TEI dictionary vocabulary: `<entry>`, `<form>`, `<gramGrp>`, `<sense>`, `<cit>`, `<def>` — flexible but underspecified, hence the need for Lex-0	Active; the parent standard; current Guidelines 4.10.0	https://tei-c.org/release/doc/tei-p5-doc/en/html/DI.html
LMF Part 1 — Core model (ISO 24613-1:2024)	2008 (original ISO 24613), revised 2024	ISO TC 37 / SC 4	UML metamodel for lexical resources; containment hierarchy LexicalResource → Lexicon → LexicalEntry → Sense	Current, published 2024	https://www.iso.org/standard/82014.html
LMF Part 2 — Machine-Readable Dictionary / MRD (ISO 24613-2:2020)	2020	ISO TC 37 / SC 4	Specialisation for general-purpose dictionaries	Current	https://www.iso.org/standard/72100.html
LMF Part 3 — Etymological Extension (ISO 24613-3:2021)	2021	ISO TC 37 / SC 4	Etymology, cognates, borrowing chains	Current	https://www.iso.org/standard/72101.html
LMF Part 4 — TEI Serialisation (ISO 24613-4:2021)	2021	ISO TC 37 / SC 4 + TEI liaison	Normative TEI serialisation of LMF (bridge to TEI Lex-0)	Current	https://www.iso.org/standard/72102.html
LMF Part 5 — LBX Lexical Base Exchange (ISO 24613-5:2022)	2022	ISO TC 37 / SC 4	XML exchange-format serialisation; the practical interchange syntax	Current	https://www.iso.org/standard/72099.html
LMF Part 6 — Syntax and Semantics / SynSem (ISO 24613-6:2024)	2024	ISO TC 37 / SC 4	Predicate–argument structures, semantic frames	Current, newest part	https://www.iso.org/standard/83180.html
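
As an illustration of the Lex-0 baseline, the sketch below builds one entry with Python's `xml.etree.ElementTree` using the TEI elements named in the table (`<entry>`, `<form>`, `<gramGrp>`, `<sense>`, `<def>`) and reads it back into a plain dictionary. Only a handful of elements are used, and the `type="lemma"` / `type="pos"` values reflect Lex-0's typed-element conventions as assumed here; real entries should be validated against the published Lex-0 schema.

```python
import xml.etree.ElementTree as ET

# Minimal sketch of a TEI Lex-0-style entry. Assumption: only entry,
# form[@type='lemma']/orth, gramGrp/gram[@type='pos'] and sense/def are used;
# real Lex-0 entries carry far more and should be schema-validated.
TEI = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI)  # serialise TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a local name with the TEI namespace (ElementTree {uri}name form)."""
    return f"{{{TEI}}}{tag}"


def tei_entry(entry_id, lang, lemma, pos, definitions):
    entry = ET.Element(tei("entry"), {f"{{{XML_NS}}}id": entry_id, f"{{{XML_NS}}}lang": lang})
    form = ET.SubElement(entry, tei("form"), type="lemma")
    ET.SubElement(form, tei("orth")).text = lemma
    gram_grp = ET.SubElement(entry, tei("gramGrp"))
    ET.SubElement(gram_grp, tei("gram"), type="pos").text = pos
    for n, text in enumerate(definitions, start=1):
        sense = ET.SubElement(entry, tei("sense"), {f"{{{XML_NS}}}id": f"{entry_id}.{n}"})
        ET.SubElement(sense, tei("def")).text = text
    return entry


def flatten_entry(entry):
    """Read back lemma, POS and (sense xml:id, definition) pairs from a TEI entry."""
    ns = {"tei": TEI}
    return {
        "lemma": entry.findtext("tei:form[@type='lemma']/tei:orth", namespaces=ns),
        "pos": entry.findtext("tei:gramGrp/tei:gram[@type='pos']", namespaces=ns),
        "senses": [(s.get(f"{{{XML_NS}}}id"), s.findtext("tei:def", namespaces=ns))
                   for s in entry.findall("tei:sense", ns)],
    }


xml_text = ET.tostring(tei_entry("cat_n", "en", "cat", "noun", ["small domesticated feline"]),
                       encoding="unicode")
print(xml_text)
```

The flattened dictionary is the kind of neutral intermediate that a converter could then map onto OntoLex lexicog or an LMF serialisation; that mapping is not shown here.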

Tier 3 family table — SIL legacy / FLEx / LIFT / wordlist

Format	First appeared	Origin	Type	Status (2026)	URL
SFM (Standard Format Markers)	early 1980s	SIL International (Shoebox/Toolbox lineage)	Line-based `\marker value` plaintext; configurable marker hierarchies; the substrate of Toolbox	Legacy but widely-deployed in field linguistics	https://software.sil.org/toolbox/
MDF (Multi-Dictionary Formatter)	1990s	SIL (Toolbox shipping standard)	A standardised SFM dialect: an agreed marker set (`\lx`, `\ps`, `\ge`, `\dt`) for consistent dictionary structure across field projects	Legacy/maintenance	https://software.sil.org/toolbox/
FieldWorks FLEx XML	2007+ (FLEx 1.0), current 9.x	SIL International	Native FLEx data format (XML; project files); the modern successor to Toolbox	Active; FLEx remains the dominant field-linguistics workbench	https://software.sil.org/fieldworks/
LIFT (Lexicon Interchange Format)	2007, current v0.13	SIL International	XML cross-tool dictionary-exchange format; used by FLEx, WeSay, Lexique Pro, Dictionary App Builder; `SIL.Lift` NuGet package is current implementation	Active, the de facto SIL-ecosystem interchange	https://github.com/sillsdev/lift-standard
OpenDictionary / SIL Toolbox project format	1990s	SIL	Project file bundles (.prj + SFM data) for Toolbox	Legacy	https://software.sil.org/toolbox/
ELAN .eaf	2002+, current ELAN 7.x (April 2026)	Max Planck Institute for Psycholinguistics, Nijmegen	XML time-aligned multimedia transcription/annotation; tier hierarchy with constraints; cross-listed in nlp-corpus	Active, the workhorse for endangered-language documentation	https://archive.mpi.nl/tla/elan
CHILDES CHAT	1984+	TalkBank / Carnegie Mellon (Brian MacWhinney)	Plain-text transcription convention for child-language acquisition; cross-listed in nlp-corpus	Active, central in CHILDES/TalkBank archives	https://talkbank.org/manuals/CHAT.html
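
The SFM-to-LIFT migration path in this table can be sketched in a few dozen lines. The parser below splits MDF-style text into records at each `\lx` marker and folds continuation lines into the preceding field; the converter then emits a minimal LIFT 0.13 document. The marker mapping (`\lx` → `lexical-unit`, `\ps` → `grammatical-info`, `\ge` → English `gloss`, everything else dropped), the generated entry IDs, and the sample words are assumptions for illustration; `qaa` is a private-use language code. A real conversion handles many more MDF markers, senses, and examples.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: MDF-style data where every record starts with \lx; lines without a
# leading backslash continue the previous field's value.
MARKER = re.compile(r"^\\(\w+)\s?(.*)$")

SAMPLE_SFM = r"""\_sh v3.0 MDF
\lx kaba
\ps n
\ge house
\dt 01/Jan/2020

\lx tolo
\ps v
\ge go
quickly
"""


def parse_sfm(text, record_marker="lx"):
    """Split SFM text into records, each a list of (marker, value) pairs."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        match = MARKER.match(line)
        if match:
            marker, value = match.group(1), match.group(2)
            if marker == record_marker:
                current = []
                records.append(current)
            if current is not None:  # markers before the first \lx (e.g. \_sh) are skipped
                current.append((marker, value))
        elif current:  # continuation line: append to the previous field
            marker, value = current[-1]
            current[-1] = (marker, f"{value} {line.strip()}")
    return records


def records_to_lift(records, vernacular_lang, gloss_lang="en"):
    # Hypothetical minimal mapping: \lx -> lexical-unit, \ps -> grammatical-info,
    # \ge -> gloss; all other markers (e.g. \dt) are dropped.
    lift = ET.Element("lift", version="0.13")
    for n, record in enumerate(records, start=1):
        entry = ET.SubElement(lift, "entry", id=f"entry_{n}")
        sense = None
        for marker, value in record:
            if marker == "lx":
                form = ET.SubElement(ET.SubElement(entry, "lexical-unit"), "form", lang=vernacular_lang)
                ET.SubElement(form, "text").text = value
            elif marker in ("ps", "ge"):
                if sense is None:
                    sense = ET.SubElement(entry, "sense")
                if marker == "ps":
                    ET.SubElement(sense, "grammatical-info", value=value)
                else:
                    gloss = ET.SubElement(sense, "gloss", lang=gloss_lang)
                    ET.SubElement(gloss, "text").text = value
    return ET.tostring(lift, encoding="unicode")


print(records_to_lift(parse_sfm(SAMPLE_SFM), "qaa"))
```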

Tier 3 family table — Wordnet / terminology / Japanese-dictionary

Format	First appeared	Origin	Type	Status (2026)	URL
WN-LMF (Global Wordnet LMF)	2013 (WN-LMF 1.0), current WN-LMF 1.4 (2024)	Global WordNet Association	DTD-defined XML schema for wordnets — synsets, senses, ILI (Inter-Lingual Index) linking	Active; 1.4 current; cross-listed in nlp-corpus	https://globalwordnet.github.io/schemas/
Princeton WordNet native database	1985+ (Miller et al.), 3.1 (2011, last canonical release)	Princeton CSL	Native flat-file format (`data.noun`, `index.noun`, etc.); the original WordNet exchange substrate	Frozen at 3.1; Open English WordNet now extends it	https://wordnet.princeton.edu/
Open English WordNet	2019+, ongoing yearly releases	Global WordNet Association (John P. McCrae et al.)	The actively-maintained successor to Princeton WN 3.1; published as WN-LMF + JSON + RDF	Active, the canonical English wordnet today	https://en-word.net/
OMW-EN / OMW JSON	2010+ (OMW), JSON form 2018+	NTU + GWA (Francis Bond et al.)	Open Multilingual Wordnet — JSON and WN-LMF distributions across 150+ wordnets	Active	https://omwn.org/
TBX (TermBase eXchange, ISO 30042:2019)	2002 (LISA), ISO 2008 → ISO 30042:2019 v3, revision in progress (ISO/AWI 30042)	ISO TC 37 / LISA legacy	XML terminology-exchange standard; concept-oriented; cross-listed in i18n-locale	Active; v3 current; new revision in drafting	https://www.iso.org/standard/62510.html
DatCatInfo (Data Category Repository)	2019 (successor to ISOcat)	LTAC Global / TerminOrgs, ISO TC 37 liaison	Web-accessible repository of standardised data categories (POS values, gender, number, etc.) per ISO 12620	Active; replaces retired ISOcat	https://datcatinfo.net/
ISOcat (legacy)	2009 → frozen 2014	ISO TC 37	Original Data Category Registry per ISO 12620:2009; categories migrated to DatCatInfo and CLARIN Concept Registry	Retired; consult successors	https://www.clarin.eu/news/concept-revival-isocat-clarin-concept-registry
JMdict XML	1999 (Jim Breen, EDRDG)	Electronic Dictionary Research and Development Group	UTF-8 XML Japanese-English multilingual dictionary; daily releases; multiple kanji + readings + glosses per entry	Very active; the canonical OSS Japanese dictionary	https://www.edrdg.org/jmdict/edict.html
EDICT / EDICT2	1991 (EDICT), 2003 (EDICT2)	Jim Breen / EDRDG	Plain-text EUC-JP Japanese-English dictionary files (EDICT, and the extended EDICT2 with multiple headwords and readings per entry); legacy formats that predate JMdict and are now generated from it	Legacy (provided for older apps); JMdict XML is canonical	https://www.edrdg.org/jmdict/edict.html
KANJIDIC / KANJIDIC2	1991+, current KANJIDIC2 XML	EDRDG	Per-kanji metadata (readings, meanings, stroke counts, JIS/Unicode codepoints, dictionary cross-references)	Active	https://www.edrdg.org/wiki/KANJIDIC_Project
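
WN-LMF's entry/sense/synset split, with the ILI as the cross-language key, is easiest to see in a small file. The sketch below parses a hand-written WN-LMF-shaped sample with `xml.etree.ElementTree` and indexes each lemma by its synsets, ILI identifiers, and definitions. Element and attribute names (`Lexicon`, `LexicalEntry`, `Lemma/@writtenForm`, `Sense/@synset`, `Synset/@ili`, `Definition`) are assumed from the WN-LMF schema; the IDs are hypothetical and `i00000` is a placeholder rather than a real ILI identifier.

```python
import xml.etree.ElementTree as ET

# Hand-written WN-LMF-shaped sample (DOCTYPE omitted). IDs are hypothetical and
# ili="i00000" is a placeholder, not a real Inter-Lingual Index identifier.
WN_SAMPLE = """<LexicalResource>
  <Lexicon id="ex-en" label="Example English Wordnet" language="en"
           email="lexicon@example.org"
           license="https://creativecommons.org/licenses/by/4.0/" version="0.1">
    <LexicalEntry id="ex-en-dog-n">
      <Lemma writtenForm="dog" partOfSpeech="n"/>
      <Sense id="ex-en-dog-n-1" synset="ex-en-00000001-n"/>
    </LexicalEntry>
    <Synset id="ex-en-00000001-n" ili="i00000" partOfSpeech="n">
      <Definition>a domesticated carnivorous mammal</Definition>
    </Synset>
  </Lexicon>
</LexicalResource>"""


def lemma_index(xml_text: str) -> dict:
    """Map 'lemma.pos' to a list of (synset id, ILI id, definition), one per sense."""
    root = ET.fromstring(xml_text)
    synsets = {s.get("id"): s for s in root.iter("Synset")}
    index = {}
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        key = f"{lemma.get('writtenForm')}.{lemma.get('partOfSpeech')}"
        for sense in entry.findall("Sense"):
            synset = synsets.get(sense.get("synset"))
            if synset is None:  # dangling synset reference; a validator would flag this
                continue
            index.setdefault(key, []).append(
                (synset.get("id"), synset.get("ili", ""), synset.findtext("Definition", default=""))
            )
    return index


print(lemma_index(WN_SAMPLE))
```

Joining two wordnets on the `ili` value, rather than on their own synset IDs, is what lets a lemma in one language be matched to its counterparts in another.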

Tier 3 family table — Language codes / archives / comparative

Format	First appeared	Origin	Type	Status (2026)	URL
ISO 639-1 / 639-2 / 639-3 / 639-5	1967 (original ISO 639, reissued as 639-1:2002) / 1998 (639-2) / 2007 (639-3) / 2008 (639-5)	ISO TC 37 / SC 2; SIL is RA for 639-3	2-letter (639-1), 3-letter (639-2, -3), and family and group codes (639-5); 639-3 covers ~7,800 individual languages	Active; Q1 2026 SIL change requests applied; 639-3 is the workhorse	https://iso639-3.sil.org/
BCP 47 / RFC 5646 language tags	RFC 5646 (Sept 2009), still current; companion RFC 4647 for matching	IETF (Phillips, Davis)	Composition of ISO 639 language + ISO 15924 script + ISO 3166-1 or UN M.49 region + registered variants + extensions + private use; the Web/HTML/XML lang-tag standard	Active; the de facto language identifier on the Web	https://www.rfc-editor.org/rfc/bcp/bcp47.txt
Glottocode (Glottolog languoid ID)	2011+, current Glottolog 5.3 (2026)	MPI EVA Leipzig (Hammarström, Forkel et al.)	8-character ID (e.g. `stan1288`) for every languoid (family, language, dialect); fills gaps in ISO 639-3 (covers extinct, undocumented, and unclassified varieties); available as CLDF + JSON + RDF	Active; complement to 639-3, not replacement	https://glottolog.org/
OLAC metadata	2000+	Open Language Archives Community (Bird, Simons)	XML metadata format extending Dublin Core (all 15 DC elements + community qualifiers); harvested via OAI-PMH; integrated with Linguistic Linked Open Data Cloud (2016)	Active; central to language-archive interop	http://www.language-archives.org/OLAC/metadata.html
OAI-PMH (harvest protocol)	2002 (OAI 2.0)	Open Archives Initiative	HTTP harvest protocol — OLAC archives expose metadata via OAI-PMH endpoints; cross-listed adjacent to citation-formats	Active	https://www.openarchives.org/OAI/openarchivesprotocol.html
CLDF (Cross-Linguistic Data Formats)	2018 (Forkel et al., Scientific Data); current CLDF 1.3	Glottobank consortium (MPI SHH / EVA, ERC CALC)	CSV-on-the-Web (CSVW) profile for comparative-linguistics data — wordlists, structure datasets (WALS), phoneme inventories (PHOIBLE), Glottolog itself; JSON-LD metadata + CSV tables	Active; the lingua franca of comparative linguistics	https://cldf.clld.org/
Wiktionary template syntax	2002+ (Wiktionary)	Wikimedia Foundation	MediaWiki templates inside Wiktionary entries (`{{lb}}`, `{{l}}`, `{{m}}`, `{{tt}}`, language-specific headers); the substrate of the world’s largest free dictionary	Active; complemented by Wikibase Lexemes for structured data	https://en.wiktionary.org/wiki/Wiktionary:Templates
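
The composition rules in the BCP 47 row (language, optional extlang, script, region, variants, extensions, private use) can be illustrated with a simplified parser. This sketch only checks the shape of each subtag against the RFC 5646 `langtag` grammar; it does not consult the IANA Language Subtag Registry, and it rejects grandfathered and private-use-only tags rather than handling them. The `is_glottocode` helper assumes the usual glottocode shape (four lowercase letters or digits followed by four digits, as in `stan1288`).

```python
import re
from dataclasses import dataclass, field

# Simplified RFC 5646 "langtag" parser: checks subtag shapes only. It does not
# consult the IANA registry and rejects grandfathered / private-use-only tags.


@dataclass
class LangTag:
    language: str
    extlang: list[str] = field(default_factory=list)
    script: str = ""
    region: str = ""
    variants: list[str] = field(default_factory=list)
    extensions: list[str] = field(default_factory=list)
    private_use: list[str] = field(default_factory=list)


def _is(pattern: str, subtag: str) -> bool:
    return re.fullmatch(pattern, subtag) is not None


def parse_bcp47(tag: str) -> LangTag:
    parts = tag.split("-")
    if any(not p for p in parts) or not _is(r"[A-Za-z]{2,8}", parts[0]):
        raise ValueError(f"malformed tag: {tag!r}")
    result = LangTag(language=parts[0].lower())
    i = 1
    # Up to three 3-letter extlang subtags may follow a 2-3 letter language.
    while (i < len(parts) and len(parts[0]) <= 3 and len(result.extlang) < 3
           and _is(r"[A-Za-z]{3}", parts[i])):
        result.extlang.append(parts[i].lower())
        i += 1
    if i < len(parts) and _is(r"[A-Za-z]{4}", parts[i]):           # script (ISO 15924)
        result.script = parts[i].title()
        i += 1
    if i < len(parts) and _is(r"[A-Za-z]{2}|[0-9]{3}", parts[i]):  # region (ISO 3166-1 / UN M.49)
        result.region = parts[i].upper()
        i += 1
    while i < len(parts) and _is(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", parts[i]):
        result.variants.append(parts[i].lower())
        i += 1
    while i < len(parts) and _is(r"[0-9A-WYZa-wyz]", parts[i]):    # extension singleton (not x)
        start = i + 1
        i = start
        while i < len(parts) and _is(r"[A-Za-z0-9]{2,8}", parts[i]):
            i += 1
        if i == start:
            raise ValueError(f"empty extension in {tag!r}")
        result.extensions.append("-".join(p.lower() for p in parts[start - 1:i]))
    if i < len(parts) and parts[i].lower() == "x":                  # private use
        start = i + 1
        i = start
        while i < len(parts) and _is(r"[A-Za-z0-9]{1,8}", parts[i]):
            i += 1
        if i == start:
            raise ValueError(f"empty private-use section in {tag!r}")
        result.private_use = [p.lower() for p in parts[start:i]]
    if i != len(parts):
        raise ValueError(f"unexpected subtags {parts[i:]} in {tag!r}")
    return result


def is_glottocode(code: str) -> bool:
    # Assumed shape: four lowercase letters/digits followed by four digits.
    return _is(r"[a-z0-9]{4}[0-9]{4}", code)


print(parse_bcp47("en-GB-oxendict"))
```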

Notable threads

OntoLex-Lemon as the de facto linked-data lexicon standard. Although it never advanced to W3C Recommendation (the Community Group track does not produce Recs), OntoLex-Lemon’s Final Community Group Report (May 2016) plus the lexicog extension (September 2019) function as the modern standard. Every major linked-data lexicography project — DBnary, Apertium-RDF, BabelNet, the European Language Resources Coordination outputs, Wikidata’s own data export — publishes in OntoLex. The trick that made it dominant was modularity: a thin core for forms/senses/concepts, with optional lime, vartrans, decomp, morph, synsem, and lexicog modules picked à la carte. The FrAC module (frequency, attestation, corpus) remains a Working Draft as of May 2026 — the unfinished frontier is corpus-derived evidence.
The long shadow of SIL Toolbox SFM (1980s) in field linguistics. SFM’s \marker value line-based format predates XML (1998) by well over a decade and remains the substrate of an enormous installed base of field-linguistic data. MDF standardised the marker set, FLEx replaced the Toolbox application, and LIFT became the XML export format, but large legacy SFM corpora still exist in researcher and missionary archives. The persistence is partly cultural (linguistics PhDs trained on Toolbox keep using it) and partly technical (SFM is human-readable in a text editor, which matters in low-connectivity field settings where FLEx project files are fragile). LIFT 0.13 is the cross-tool migration path for everyone who has finally moved off SFM.
LIFT as the cross-tool interchange that mostly succeeded. Where OntoLex-Lemon won the linked-data world, LIFT won the SIL-ecosystem world: it is what flows between FLEx (the editor), WeSay (the lightweight tablet/laptop entry tool), Lexique Pro (the publication tool), Dictionary App Builder (the mobile-app generator), and Webonary (the web-publishing tool). It does not aspire to round-trip every FLEx feature (FLEx’s native XML is richer), but it carries the dictionary “Send/Receive” workflow that is the actual collaboration model for distributed field projects with intermittent connectivity. v0.13 has been stable for many years, and the SIL.Lift NuGet package is the canonical implementation.
Wikibase Lexemes as Wikidata’s growing structured-dictionary layer. The Lexeme/Form/Sense entity types launched in 2018 added a third Wikibase entity dimension alongside Items (Q-IDs) and Properties (P-IDs). Each Lexeme has an L-ID, with sub-entities for Forms (F-IDs, one per inflected surface form) and Senses (S-IDs). The three dedicated datatypes (wikibase-lexeme, wikibase-form, wikibase-sense) let properties on Items link to lexical data, and vice versa. This has produced the largest crowdsourced multilingual structured lexicon ever, growing fastest in languages underserved by commercial dictionaries. Wikibase Lexemes are SPARQL-queryable on the Wikidata Query Service and exportable to OntoLex via the Lexicographical data ontology mapping.
CLDF as the typological-database lingua franca. Before CLDF (2018), every comparative-linguistics project (WALS, AUTOTYP, ASJP, Glottolog itself) used its own bespoke CSV/SQLite layout, so cross-project queries required custom ETL. CLDF profiled CSVW (CSV on the Web) for cross-linguistic data, defining standard column names (Form, Cognateset_ID, Parameter_ID) and standard component tables (LanguageTable, ParameterTable, FormTable, CognateTable). Today WALS, PHOIBLE, Glottolog, Concepticon, NorthEuraLex, IELex, and dozens of family-specific datasets all ship CLDF, and pycldf is the canonical Python access library. CLDF 1.3 is current.
WN-LMF unifying the wordnet world. Before WN-LMF, each wordnet (Princeton, EuroWordNet, IndoWordNet, BalkaNet, OMW component wordnets) shipped its own format. WN-LMF 1.0 (2013) and the current 1.4 (2024) defined a single XML schema validated by a DTD, with the GWA’s Inter-Lingual Index (ILI) as a glue layer: it assigns stable cross-language concept IDs to which each wordnet’s synsets link. The Open English WordNet now actively maintains the Princeton lineage (Princeton WN 3.1 has been frozen since 2011), and OMW redistributes 150+ wordnets in WN-LMF and JSON.
ISO 639 / BCP 47 / Glottolog as overlapping language-ID systems with subtly different goals. ISO 639-3 (SIL as RA) is the workhorse 3-letter individual-language identifier (~7,800 codes, Q1 2026 list current); ISO 639-1 (2-letter) and 639-2 (3-letter, with bibliographic and terminology variants) cover smaller subsets for older systems. BCP 47 / RFC 5646 composes 639 codes with ISO 15924 scripts (Hans/Hant), ISO 3166-1 regions (US/GB), registered variant subtags (en-GB-oxendict), and private-use subtags (tr-x-icu); it is the Web/HTML/XML standard. Glottolog (currently 5.3) assigns its own 8-character glottocodes that cover much of what 639-3 misses (extinct languoids, unclassified varieties, dialects). Modern best practice: use BCP 47 in user-facing contexts (HTML lang, CLDR locale), ISO 639-3 for individual-language identification, and Glottolog glottocodes for typological research and endangered-language documentation.
The TEI Lex-0 / LMF Part 4 / OntoLex lexicog tripod. All three are 2016–2024-era responses to the same problem: the TEI dictionary chapter (P5 Ch 9) is too permissive, and serious dictionary projects need a constrained baseline. TEI Lex-0 is the constrained TEI subset (community recommendations plus an ODD-driven schema); ISO 24613-4 is the normative TEI serialisation of LMF; OntoLex lexicog is the RDF counterpart for the same concepts. The three are designed to be largely interconvertible: DARIAH’s Lexical Resources Working Group, the OntoLex CG, and ISO TC 37 SC 4 share overlapping membership, which helps keep them aligned. Modern projects (ELEXIS, DigiLex, Dictionaria) target all three formats from a single source.

Citations

OntoLex-Lemon Final Community Group Report (May 2016): https://www.w3.org/2016/05/ontolex/
OntoLex Lexicography module lexicog (Sept 2019 Final CG Report): https://www.w3.org/2019/09/lexicog/
OntoLex FrAC (Frequency, Attestation, Corpus — working draft): https://ontolex.github.io/frequency-attestation-corpus-information/
ISO 24613-1:2024 LMF Part 1 Core model: https://www.iso.org/standard/82014.html
ISO 24613-5:2022 LMF Part 5 LBX: https://www.iso.org/standard/72099.html
ISO 24613-6:2024 LMF Part 6 SynSem: https://www.iso.org/standard/83180.html
ISO 30042:2019 TBX v3: https://www.iso.org/standard/62510.html
TEI Lex-0 portal (current July 2025 release): https://lex-0.org/
TEI Guidelines 4.10.0 release notes (Aug 2025): https://tei-c.org/news/2025/08/15/new-release-tei-guidelines-4-10-0-stylesheets-7-59-0/
LIFT standard: https://github.com/sillsdev/lift-standard
Global Wordnet schemas (WN-LMF 1.4): https://globalwordnet.github.io/schemas/
CLDF specification (current 1.3): https://cldf.clld.org/
CLDF Scientific Data paper (Forkel et al., 2018): https://www.nature.com/articles/sdata2018205
Glottolog 5.3: https://glottolog.org/
Wikidata Lexicographical data documentation: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation
WikibaseLexeme data model: https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model
DBnary project: http://kaiko.getalp.org/about-dbnary/
OLAC metadata: http://www.language-archives.org/OLAC/metadata.html
ISO 639-3 (SIL Registration Authority): https://iso639-3.sil.org/
BCP 47 / RFC 5646: https://www.rfc-editor.org/rfc/bcp/bcp47.txt
MMoOn Multilingual Morpheme Ontology: https://mmoon.org/
DatCatInfo (post-ISOcat Data Category Repository): https://datcatinfo.net/
CIDOC CRM (ISO 21127:2023): https://cidoc-crm.org/
JMdict/EDICT/KANJIDIC (EDRDG): https://www.edrdg.org/
