Linguistic-Resource Publishing / Lexicon DSLs Family Index


type: language-family-index
family: linguistic-resources
languages_catalogued: 28
tags: [language-reference, family-index, linguistic-resources, ontolex-lemon, lmf, tei-lex0, lift, wn-lmf, cldf, dbnary, glottolog, olac, bcp47]

Linguistic-Resource Publishing / Lexicon / Dictionary — Family Index

Family overview

Linguistic-resource publishing DSLs are the textual and structured vocabularies used to encode digital dictionaries, terminology bases, wordnets, morpheme inventories, and language-documentation archives. They sit at the intersection of three different traditions that historically did not talk to each other: the lexicographic-publishing tradition (Oxford, Merriam-Webster, Brill — born from typesetting and now mostly XML/JSON), the field-linguistics tradition (SIL Toolbox SFM, FieldWorks FLEx, LIFT — born from missionary linguistics in the 1980s and still anchoring endangered-language documentation), and the computational-lexicon tradition (WordNet, OntoLex-Lemon, ISO LMF — born from NLP and the Semantic Web). The 2020s have seen a partial convergence around two attractors: OntoLex-Lemon on the linked-data side and TEI Lex-0 on the document-encoding side, with ISO LMF 24613 as the formal-standards reference.

OntoLex-Lemon is the de facto W3C model for lexicons as linked data, published as a W3C Final Community Group Report in May 2016 (with the Lexicography module lexicog added as a separate Final CG Report in September 2019). It is not a W3C Recommendation (the Community Group track does not produce Recommendations), but it functions as the modern standard: the major linked-data lexicons (DBnary, Apertium-RDF, BabelNet, the European Language Resources Coordination’s outputs) publish in OntoLex. Its modular structure separates an ontolex core (forms, lexical entries, senses, references to ontology concepts) from lime (lexicon metadata), vartrans (translation and variation), decomp (morphological decomposition), morph (inflection), synsem (syntactic frames and their mapping to semantics), lexicog (lexicographic structure), and the newer frac (frequency, attestation, corpus; still a working draft as of 2026, not yet a Final CG Report).
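To make the core model concrete, here is a minimal sketch that writes one entry as Turtle using only the ontolex core terms named above (LexicalEntry, Form, LexicalSense, plus the canonicalForm, writtenRep, sense, and reference properties). The `Entry` record, the `to_turtle` helper, the `http://example.org/lexicon/` base IRI, and the `_form` / `_senseN` naming scheme are assumptions made for illustration; a real pipeline would more likely use an RDF library and would escape literals properly.

```python
from dataclasses import dataclass, field

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"


@dataclass
class Entry:
    """Hypothetical in-memory record for one lexical entry."""
    local_id: str       # local name under the base IRI
    written_rep: str    # canonical written form
    lang: str           # language tag attached to the literal
    references: list = field(default_factory=list)  # ontology concept IRIs


def to_turtle(entry: Entry, base: str = "http://example.org/lexicon/") -> str:
    """Serialise one entry with ontolex core classes and properties.

    Assumption: written_rep contains no quotes or backslashes, so no
    literal escaping is attempted in this sketch.
    """
    subj = f":{entry.local_id}"
    senses = [f"{subj}_sense{i}" for i in range(1, len(entry.references) + 1)]

    # The entry node points at its canonical form and at one sense per reference.
    head = [f"{subj} a ontolex:LexicalEntry",
            f"    ontolex:canonicalForm {subj}_form"]
    head += [f"    ontolex:sense {s}" for s in senses]

    blocks = [
        f"@prefix ontolex: <{ONTOLEX}> .\n@prefix : <{base}> .",
        " ;\n".join(head) + " .",
        f"{subj}_form a ontolex:Form ;\n"
        f'    ontolex:writtenRep "{entry.written_rep}"@{entry.lang} .',
    ]
    # Each sense links the entry to one ontology concept.
    for sense, ref in zip(senses, entry.references):
        blocks.append(f"{sense} a ontolex:LexicalSense ;\n"
                      f"    ontolex:reference <{ref}> .")
    return "\n\n".join(blocks) + "\n"


cat = Entry("cat", "cat", "en", ["http://dbpedia.org/resource/Cat"])
print(to_turtle(cat))
```

The point of the sketch is the shape: the entry, its form, and each sense are separate nodes, which is what lets the optional modules attach metadata, translations, or lexicographic ordering without touching the core.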

ISO LMF (Lexical Markup Framework, ISO 24613) went through a long modular split between 2019 and 2024: the legacy 2008 monolithic standard was retired in favor of six parts — Part 1 Core model (ISO 24613-1:2024), Part 2 Machine-Readable Dictionary (24613-2:2020), Part 3 Etymological Extension (24613-3:2021), Part 4 TEI Serialisation (24613-4:2021), Part 5 Lexical Base Exchange / LBX (24613-5:2022), and Part 6 Syntax and Semantics / SynSem (24613-6:2024). The TEI-serialisation part (24613-4) is the formal bridge from LMF to TEI Lex-0, and the LBX serialisation (24613-5) is the practical XML exchange format used by terminology vendors and EU-funded projects (ELEXIS, European Lexicographic Infrastructure).

The SIL legacy stack persists in field linguistics despite the linked-data wave: SFM (Standard Format Markers, 1980s line-based \lx … \ge … markers) feeds Toolbox; FieldWorks FLEx (.NET application, current 9.x) is the modern descendant; LIFT (Lexicon Interchange Format), currently at v0.13 and maintained via the SIL.Lift NuGet package, is the cross-tool XML exchange format between FLEx, WeSay, Lexique Pro, and Dictionary App Builder. Parallel to all of this, Wikibase Lexemes (Wikidata’s Lexeme/Form/Sense entity model, with dedicated wikibase-lexeme, wikibase-form, wikibase-sense datatypes) have become the largest crowdsourced structured-lexicon corpus, and DBnary (Gilles Sérasset, LIG/Grenoble) republishes Wiktionary as OntoLex-Lemon RDF twice a month across 26+ language editions. The wordnet world has consolidated around the Global WordNet Association’s WN-LMF XML schema (current 1.4 as of 2024), with the Open Multilingual Wordnet (OMW) as the canonical multi-language distribution, and CLDF (Cross-Linguistic Data Formats) — current 1.3 — has become the lingua franca for typological and comparative-linguistics datasets (WALS, PHOIBLE, Glottolog itself).
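As an illustration of the line-based SFM format mentioned above, the following sketch parses `\marker value` lines into records, starting a new record at each `\lx`. It is a simplified reading of SFM, not Toolbox's own parser: the function name `parse_sfm`, the continuation-line rule, and the sample words are assumptions invented for the example.

```python
def parse_sfm(text: str, record_marker: str = "lx") -> list:
    """Split SFM text into records, each a list of (marker, value) pairs.

    Assumptions: a record starts at `record_marker`; a line that does not
    begin with a backslash continues the previous field's value.
    """
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            parts = line[1:].split(None, 1)
            if not parts:          # a lone backslash carries no marker
                continue
            marker = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if marker == record_marker:
                current = []
                records.append(current)
            if current is not None:  # ignore fields before the first record
                current.append((marker, value))
        elif current:
            marker, value = current[-1]
            current[-1] = (marker, f"{value} {line}".strip())
    return records


# Invented sample data using MDF-style markers; the words are made up.
SAMPLE = r"""
\lx kaba
\ps n
\ge house
\dt 12/Mar/2004

\lx tamu
\ps v
\ge eat
"""

for record in parse_sfm(SAMPLE):
    print(dict(record))
```

Keeping each record as an ordered list of pairs rather than a dict preserves repeated markers and field order, both of which carry meaning in SFM marker hierarchies.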

In our deep library

None of the formats in this family have standalone Tier-1/2 deep-library notes — they are XML/RDF/TSV exchange vocabularies hosted in general-purpose carrier languages.

Cross-reference:

  • nlp-corpus — sibling family for corpus and annotation formats (TEI body, CoNLL-U, Universal Dependencies, ELAN, PropBank/FrameNet). The line is fuzzy: ELAN .eaf, CHILDES CHAT, and LMF itself appear in both indexes because dictionaries and corpora share serialisation.
  • voice-phonetics — pronunciation-lexicon overlap (CMUdict, IPA in <pron> elements of TEI Lex-0 and OntoLex lexinfo:pronunciation).
  • semantic-web — OntoLex-Lemon is built on RDF/OWL/SKOS; MMoOn, GOLD, and CIDOC CRM are all OWL ontologies and live conceptually inside the Semantic Web stack.
  • i18n-locale — TBX (terminology) and BCP 47 language tags overlap heavily; the i18n index covers them from the software-localisation angle, this index from the dictionary/terminology-publishing angle.
  • citation-formats — MARC for library cataloging; OAI-PMH harvest protocol used by OLAC for language-archive metadata.
  • api-description — XML-schema and JSON-schema underpinnings (LIFT XSD, WN-LMF DTD, OntoLex Turtle).
  • notation-spec — formal-grammar adjacent for morphological-rule formalisms inside LMF Morphology / MMoOn.

Tier 3 family table — Linked-data lexicon (RDF / OntoLex-Lemon)

| Format | First appeared | Origin | Type | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| OntoLex-Lemon core | 2016 (Final CG Report, May 2016) | W3C Ontology-Lexica Community Group (John P. McCrae et al.) | RDF/OWL model for lexicons; classes ontolex:LexicalEntry, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalConcept | De facto standard; the modern linked-data lexicon model. Not a W3C Recommendation (CG track) but cited as the reference everywhere | https://www.w3.org/2016/05/ontolex/ |
| OntoLex lime (Lexicon Metadata) | 2016 | W3C OntoLex CG | Metadata module (lime:Lexicon, lime:entry, lime:language) | Active, Final CG Report | https://www.w3.org/2016/05/ontolex/#metadata-lime |
| OntoLex vartrans (translation/variation) | 2016 | W3C OntoLex CG | Translation, term-variation, and lexical-relation module | Active | https://www.w3.org/2016/05/ontolex/#variation-translation-vartrans |
| OntoLex decomp + morph | 2016 (decomp) / 2019+ (morph) | W3C OntoLex CG | Morphological decomposition and inflection paradigms | Active; morph is the newer of the two | https://www.w3.org/2016/05/ontolex/#morphology |
| OntoLex synsem | 2016 | W3C OntoLex CG | Syntax–semantics interface for predicates | Active | https://www.w3.org/2016/05/ontolex/#syntax-and-semantics-synsem |
| OntoLex lexicog (Lexicography module) | 2019 (Final CG Report, Sept 2019) | W3C OntoLex CG | Lexicographic-structure module: entries, sub-entries, ordering, dictionary metadata; the canonical “linked-data dictionary” layer | Active, Final CG Report | https://www.w3.org/2019/09/lexicog/ |
| OntoLex FrAC (Frequency, Attestation, Corpus) | 2018+ (in development), still Working Draft as of 2026 | W3C OntoLex CG (Christian Chiarcos et al.) | Corpus-derived frequencies, attestations, embedding pointers | Working draft, not yet Final CG Report | https://ontolex.github.io/frequency-attestation-corpus-information/ |
| DBnary | 2012 (Sérasset, LIG/Grenoble) | INRIA/Université Grenoble Alpes | Wiktionary republished as OntoLex-Lemon RDF; twice-monthly dumps synced to Wikimedia Wiktionary dumps; 26+ language editions as of 2024 | Very active | http://kaiko.getalp.org/about-dbnary/ |
| Wikibase Lexeme (Wikidata) | 2018 (Lexeme namespace launched on Wikidata) | Wikimedia Deutschland | First-class Wikibase entity types Lexeme (L-IDs), Form (F-IDs), Sense (S-IDs); dedicated datatypes wikibase-lexeme, wikibase-form, wikibase-sense | Very active; the largest crowdsourced structured-lexicon corpus | https://www.wikidata.org/wiki/Wikidata:Lexicographical_data |
| MMoOn Core (Multilingual Morpheme Ontology) | 2016 (initial) / 2021 (Semantic Web journal publication) | AKSW Leipzig (Bettina Klimek et al.) | OWL ontology for morpheme-level inventories; sub-word-level analogue to OntoLex | Active research-grade; not standardised | https://mmoon.org/ |
| GOLD (General Ontology for Linguistic Description) | 2003 | LinguistList / U. Arizona (Farrar, Langendoen) | OWL ontology for descriptive-linguistics categories (parts of speech, grammatical features); designed for endangered-language fieldwork | Stable/legacy reference; widely cited but not actively versioned | https://linguistics-ontology.org/ |
| CIDOC CRM (ISO 21127:2023) | 1996 (CIDOC), ISO 21127:2006 → 2014 → 2023; current community version 7.1.3 | International Council of Museums (ICOM) | OWL/RDFS ontology for cultural-heritage records; 81 classes, 160 properties; not lexicon-specific but used for archive metadata around dictionaries and manuscripts | Active; ISO 21127:2023 is current | https://cidoc-crm.org/ |
| Lexvo.org / lexinfo.net | 2008+ (Lexvo, McCrae) | DERI Galway / INSIGHT Centre | Companion vocabularies: Lexvo (URIs for languages, scripts, terms) and LexInfo (linguistic-category vocabulary used inside OntoLex) | Active, low-velocity maintenance | https://www.lexinfo.net/ |
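The Wikibase Lexeme row above distinguishes L-IDs, F-IDs, and S-IDs. In practice Form and Sense identifiers are written as a suffix on their parent Lexeme (for example `L42-F1` or `L42-S2`). The sketch below classifies such identifiers with a regular expression; the `LexemeRef` tuple and `parse_lexeme_id` function are hypothetical names introduced here, not part of any Wikibase client library.

```python
import re
from typing import NamedTuple, Optional

# L<number>, optionally followed by -F<number> (form) or -S<number> (sense).
_ID = re.compile(r"L([1-9][0-9]*)(?:-([FS])([1-9][0-9]*))?")


class LexemeRef(NamedTuple):
    kind: str               # "lexeme", "form", or "sense"
    lexeme: str             # parent Lexeme ID, e.g. "L42"
    sub_id: Optional[str]   # "F1", "S2", or None for a bare Lexeme


def parse_lexeme_id(value: str) -> LexemeRef:
    """Classify a Wikibase lexicographical entity ID."""
    match = _ID.fullmatch(value)
    if match is None:
        raise ValueError(f"not a Lexeme, Form, or Sense ID: {value!r}")
    lexeme = f"L{match.group(1)}"
    if match.group(2) is None:
        return LexemeRef("lexeme", lexeme, None)
    kind = "form" if match.group(2) == "F" else "sense"
    return LexemeRef(kind, lexeme, f"{match.group(2)}{match.group(3)}")


for example in ("L42", "L42-F1", "L42-S2"):
    print(example, parse_lexeme_id(example))
```

Because every Form and Sense ID embeds its Lexeme ID, a consumer can group sub-entities under their parent without an extra lookup.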

Tier 3 family table — TEI / XML dictionary

| Format | First appeared | Origin | Type | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| TEI Lex-0 | 2018 (DARIAH Working Group on Lexical Resources) | DARIAH-EU, Toma Tasovac et al. | Constrained TEI subset for dictionary encoding; fixes the underspecified parts of TEI Chapter 9; canonical baseline for born-digital + retro-digitised dictionaries | Active; current release July 2025 (versioned, ODD-driven schema generation) | https://lex-0.org/ |
| TEI P5 dictionary module (Chapter 9) | 1990 (TEI P1) → P5 (2007) → 4.10.0 (Aug 2025) → 4.x ongoing | Text Encoding Initiative Consortium | Full TEI dictionary vocabulary: <entry>, <form>, <gramGrp>, <sense>, <cit>, <def>; flexible but underspecified, hence the need for Lex-0 | Active; the parent standard; current Guidelines 4.10.0 | https://tei-c.org/release/doc/tei-p5-doc/en/html/DI.html |
| LMF Part 1: Core model (ISO 24613-1:2024) | 2008 (original ISO 24613), revised 2024 | ISO TC 37 / SC 4 | UML metamodel for lexical resources; class hierarchy LexicalResource → Lexicon → LexicalEntry → Sense | Current, published 2024 | https://www.iso.org/standard/82014.html |
| LMF Part 2: Machine-Readable Dictionary / MRD (ISO 24613-2:2020) | 2020 | ISO TC 37 / SC 4 | Specialisation for general-purpose dictionaries | Current | https://www.iso.org/standard/72100.html |
| LMF Part 3: Etymological Extension (ISO 24613-3:2021) | 2021 | ISO TC 37 / SC 4 | Etymology, cognates, borrowing chains | Current | https://www.iso.org/standard/72101.html |
| LMF Part 4: TEI Serialisation (ISO 24613-4:2021) | 2021 | ISO TC 37 / SC 4 + TEI liaison | Normative TEI serialisation of LMF (bridge to TEI Lex-0) | Current | https://www.iso.org/standard/72102.html |
| LMF Part 5: LBX Lexical Base Exchange (ISO 24613-5:2022) | 2022 | ISO TC 37 / SC 4 | XML exchange-format serialisation; the practical interchange syntax | Current | https://www.iso.org/standard/72099.html |
| LMF Part 6: Syntax and Semantics / SynSem (ISO 24613-6:2024) | 2024 | ISO TC 37 / SC 4 | Predicate–argument structures, semantic frames | Current, newest part | https://www.iso.org/standard/83180.html |
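The TEI dictionary vocabulary listed in the table (<entry>, <form>, <gramGrp>, <sense>, <def>) can be read with any namespace-aware XML parser. The following is a minimal sketch using the standard library; the sample entry, the `summarise_entry` helper, and the choice of `<orth>` and `<gram type="pos">` as the fields to extract are illustrative assumptions, and the sample has not been validated against the TEI Lex-0 schema.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative entry only; not checked against the Lex-0 ODD.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="cat" xml:lang="en">
  <form type="lemma"><orth>cat</orth></form>
  <gramGrp><gram type="pos">noun</gram></gramGrp>
  <sense xml:id="cat.1"><def>A small domesticated feline.</def></sense>
  <sense xml:id="cat.2"><def>Any member of the family Felidae.</def></sense>
</entry>
"""


def summarise_entry(xml_text: str) -> dict:
    """Pull the lemma, part of speech, and definitions out of one <entry>."""
    entry = ET.fromstring(xml_text)
    return {
        "id": entry.get(XML_ID),
        "lemma": entry.findtext("tei:form/tei:orth", namespaces=TEI),
        "pos": entry.findtext("tei:gramGrp/tei:gram", namespaces=TEI),
        "definitions": [d.text for d in entry.findall("tei:sense/tei:def", TEI)],
    }


print(summarise_entry(SAMPLE_ENTRY))
```

Fixed paths like these only work when the encoding is predictable, which is the practical argument for a constrained subset over the permissive parent chapter.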

Tier 3 family table — SIL legacy / FLEx / LIFT / wordlist

| Format | First appeared | Origin | Type | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| SFM (Standard Format Markers) | early 1980s | SIL International (Shoebox/Toolbox lineage) | Line-based \marker value plaintext; configurable marker hierarchies; the substrate of Toolbox | Legacy but widely deployed in field linguistics | https://software.sil.org/toolbox/ |
| MDF (Multi-Dictionary Formatter) | 1990s | SIL (Toolbox shipping standard) | A standardised SFM dialect: agreed marker set (\lx, \ps, \ge, \dt) for typological consistency across field projects | Legacy/maintenance | https://software.sil.org/toolbox/ |
| FieldWorks FLEx XML | 2007+ (FLEx 1.0), current 9.x | SIL International | Native FLEx data format (XML; project files); the modern successor to Toolbox | Active; FLEx remains the dominant field-linguistics workbench | https://software.sil.org/fieldworks/ |
| LIFT (Lexicon Interchange Format) | 2007, current v0.13 | SIL International | XML cross-tool dictionary-exchange format; used by FLEx, WeSay, Lexique Pro, Dictionary App Builder; SIL.Lift NuGet package is current implementation | Active, the de facto SIL-ecosystem interchange | https://github.com/sillsdev/lift-standard |
| OpenDictionary / SIL Toolbox project format | 1990s | SIL | Project file bundles (.prj + SFM data) for Toolbox | Legacy | https://software.sil.org/toolbox/ |
| ELAN .eaf | 2002+, current ELAN 7.x (April 2026) | Max Planck Institute for Psycholinguistics, Nijmegen | XML time-aligned multimedia transcription/annotation; tier hierarchy with constraints; cross-listed in nlp-corpus | Active, the workhorse for endangered-language documentation | https://archive.mpi.nl/tla/elan |
| CHILDES CHAT | 1984+ | TalkBank / Carnegie Mellon (Brian MacWhinney) | Plain-text transcription convention for child-language acquisition; cross-listed in nlp-corpus | Active, central in CHILDES/TalkBank archives | https://talkbank.org/manuals/CHAT.html |
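For a sense of what the LIFT interchange in the table carries, here is a small sketch that reads headwords, parts of speech, and glosses from a LIFT-shaped document with the standard library. The sample uses only the basic elements (`lexical-unit`, `sense`, `grammatical-info`, `gloss`); the entry content is invented, `qaa` is a private-use language code standing in for a real one, and `read_lift` is a hypothetical helper rather than the SIL.Lift API.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of a LIFT 0.13 file; real exports carry
# more attributes (GUIDs, dates) and many more element types.
SAMPLE_LIFT = """\
<lift version="0.13">
  <entry id="kaba_1">
    <lexical-unit>
      <form lang="qaa"><text>kaba</text></form>
    </lexical-unit>
    <sense id="kaba_1_s1">
      <grammatical-info value="Noun"/>
      <gloss lang="en"><text>house</text></gloss>
      <gloss lang="fr"><text>maison</text></gloss>
    </sense>
  </entry>
</lift>
"""


def read_lift(xml_text: str, gloss_lang: str = "en") -> list:
    """Return (headword, part of speech, glosses) for every sense."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        headword = entry.findtext("lexical-unit/form/text", default="")
        for sense in entry.findall("sense"):
            info = sense.find("grammatical-info")
            pos = info.get("value") if info is not None else None
            glosses = [
                gloss.findtext("text", default="")
                for gloss in sense.findall("gloss")
                if gloss.get("lang") == gloss_lang
            ]
            rows.append((headword, pos, glosses))
    return rows


print(read_lift(SAMPLE_LIFT))
print(read_lift(SAMPLE_LIFT, gloss_lang="fr"))
```

Every text value sits inside a language-tagged wrapper, which is how one file can hold vernacular forms alongside glosses in several analysis languages.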

Tier 3 family table — Wordnet / terminology / Japanese-dictionary

| Format | First appeared | Origin | Type | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| WN-LMF (Global Wordnet LMF) | 2013 (WN-LMF 1.0), current WN-LMF 1.4 (2024) | Global WordNet Association | DTD-defined XML schema for wordnets: synsets, senses, ILI (Inter-Lingual Index) linking | Active; 1.4 current; cross-listed in nlp-corpus | https://globalwordnet.github.io/schemas/ |
| Princeton WordNet native database | 1985+ (Miller et al.), 3.1 (2011, last canonical release) | Princeton CSL | Native flat-file format (data.noun, index.noun, etc.); the original WordNet exchange substrate | Frozen at 3.1; Open English WordNet now extends it | https://wordnet.princeton.edu/ |
| Open English WordNet | 2019+, ongoing yearly releases | Global WordNet Association (John P. McCrae et al.) | The actively maintained successor to Princeton WN 3.1; published as WN-LMF + JSON + RDF | Active, the canonical English wordnet today | https://en-word.net/ |
| OMW-EN / OMW JSON | 2010+ (OMW), JSON form 2018+ | NTU + GWA (Francis Bond et al.) | Open Multilingual Wordnet: JSON and WN-LMF distributions across 150+ wordnets | Active | https://omwn.org/ |
| TBX (TermBase eXchange, ISO 30042:2019) | 2002 (LISA), ISO 2008 → ISO 30042:2019 v3, revision in progress (ISO/AWI 30042) | ISO TC 37 / LISA legacy | XML terminology-exchange standard; concept-oriented; cross-listed in i18n-locale | Active; v3 current; new revision in drafting | https://www.iso.org/standard/62510.html |
| DatCatInfo (Data Category Repository) | 2019 (successor to ISOcat) | LTAC Global / TerminOrgs, ISO TC 37 liaison | Web-accessible repository of standardised data categories (POS values, gender, number, etc.) per ISO 12620 | Active; replaces retired ISOcat | https://datcatinfo.net/ |
| ISOcat (legacy) | 2009 → frozen 2014 | ISO TC 37 | Original Data Category Registry per ISO 12620:2009; categories migrated to DatCatInfo and CLARIN Concept Registry | Retired; consult successors | https://www.clarin.eu/news/concept-revival-isocat-clarin-concept-registry |
| JMdict XML | 1999 (Jim Breen, EDRDG) | Electronic Dictionary Research and Development Group | UTF-8 XML Japanese-English multilingual dictionary; daily releases; multiple kanji + readings + glosses per entry | Very active; the canonical OSS Japanese dictionary | https://www.edrdg.org/jmdict/edict.html |
| EDICT / EDICT2 | 1991 (EDICT), 2003 (EDICT2) | Jim Breen / EDRDG | Plain-text EUC-JP (EDICT) / enhanced text (EDICT2) Japanese-English dictionary; legacy format derived from JMdict | Legacy (provided for older apps); JMdict XML is canonical | https://www.edrdg.org/jmdict/edict.html |
| KANJIDIC / KANJIDIC2 | 1991+, current KANJIDIC2 XML | EDRDG | Per-kanji metadata (readings, meanings, stroke counts, JIS/Unicode codepoints, dictionary cross-references) | Active | https://www.edrdg.org/wiki/KANJIDIC_Project |
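The WN-LMF row describes synsets, senses, and ILI linking. A rough sketch of how those pieces join up: each `LexicalEntry` carries a `Lemma` and one or more `Sense` elements, and each `Sense` points at a `Synset` by ID. The sample below is an invented two-lemma wordnet that omits several attributes a valid file needs (and leaves `ili` empty as a placeholder); `lemmas_by_synset` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented miniature wordnet in the WN-LMF element layout; it is not
# DTD-valid as written (required Lexicon metadata is left out).
SAMPLE_WN = """\
<LexicalResource>
  <Lexicon id="demo" label="Demo wordnet" language="en" version="0.1">
    <LexicalEntry id="demo-cat-n">
      <Lemma writtenForm="cat" partOfSpeech="n"/>
      <Sense id="demo-cat-n-1" synset="demo-0001-n"/>
    </LexicalEntry>
    <LexicalEntry id="demo-housecat-n">
      <Lemma writtenForm="house cat" partOfSpeech="n"/>
      <Sense id="demo-housecat-n-1" synset="demo-0001-n"/>
    </LexicalEntry>
    <Synset id="demo-0001-n" ili="" partOfSpeech="n">
      <Definition>small domesticated feline</Definition>
    </Synset>
  </Lexicon>
</LexicalResource>
"""


def lemmas_by_synset(xml_text: str) -> dict:
    """Map each synset ID to its definition and the lemmas that express it."""
    root = ET.fromstring(xml_text)
    synsets = {
        synset.get("id"): {
            "definition": synset.findtext("Definition", default=""),
            "lemmas": [],
        }
        for synset in root.iter("Synset")
    }
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma").get("writtenForm")
        for sense in entry.findall("Sense"):
            target = synsets.get(sense.get("synset"))
            if target is not None:   # skip senses pointing outside this file
                target["lemmas"].append(lemma)
    return synsets


print(lemmas_by_synset(SAMPLE_WN))
```

The sense-to-synset indirection is what allows several lemmas, and via the ILI several languages, to share one concept node.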

Tier 3 family table — Language codes / archives / comparative

| Format | First appeared | Origin | Type | Status (2026) | URL |
| --- | --- | --- | --- | --- | --- |
| ISO 639-1 / 639-2 / 639-3 / 639-5 | 1967 (ISO 639, later 639-1) / 1998 (639-2) / 2007 (639-3) / 2008 (639-5) | ISO TC 37 / SC 2; SIL is RA for 639-3 | 2-letter (639-1), 3-letter (639-2, -3), and family codes (639-5); 639-3 covers ~7,800 individual languages | Active; Q1 2026 SIL change requests applied; 639-3 is the workhorse | https://iso639-3.sil.org/ |
| BCP 47 / RFC 5646 language tags | RFC 5646 (Sept 2009), still current; companion RFC 4647 for matching | IETF (Phillips, Davis) | Composition of ISO 639 + ISO 15924 script + ISO 3166-1 region + private use + variants; the Web/HTML/XML lang-tag standard | Active; the de facto language identifier on the Web | https://www.rfc-editor.org/rfc/bcp/bcp47.txt |
| Glottocode (Glottolog languoid ID) | 2011+, current Glottolog 5.3 (2026) | MPI EVA Leipzig (Hammarström, Forkel et al.) | 8-character ID (e.g. stan1288) for every languoid (family, language, dialect); fills gaps in ISO 639-3 (covers extinct, undocumented, and unclassified varieties); available as CLDF + JSON + RDF | Active; complement to 639-3, not replacement | https://glottolog.org/ |
| OLAC metadata | 2000+ | Open Language Archives Community (Bird, Simons) | XML metadata format extending Dublin Core (all 15 DC elements + community qualifiers); harvested via OAI-PMH; integrated with Linguistic Linked Open Data Cloud (2016) | Active; central to language-archive interop | http://www.language-archives.org/OLAC/metadata.html |
| OAI-PMH (harvest protocol) | 2002 (OAI 2.0) | Open Archives Initiative | HTTP harvest protocol; OLAC archives expose metadata via OAI-PMH endpoints; cross-listed adjacent to citation-formats | Active | https://www.openarchives.org/OAI/openarchivesprotocol.html |
| CLDF (Cross-Linguistic Data Formats) | 2018 (Forkel et al., Scientific Data); current CLDF 1.3 | Glottobank consortium (MPI SHH / EVA, ERC CALC) | CSV-on-the-Web (CSVW) profile for comparative-linguistics data: wordlists, structure datasets (WALS), phoneme inventories (PHOIBLE), Glottolog itself; JSON-LD metadata + CSV tables | Active; the lingua franca of comparative linguistics | https://cldf.clld.org/ |
| Wiktionary template syntax | 2002+ (Wiktionary) | Wikimedia Foundation | MediaWiki templates inside Wiktionary entries ({{lb}}, {{l}}, {{m}}, {{tt}}, language-specific headers); the substrate of the world’s largest free dictionary | Active; complemented by Wikibase Lexemes for structured data | https://en.wiktionary.org/wiki/Wiktionary:Templates |
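The identifier systems in this table differ mainly in shape. As a sketch, the code below splits a BCP 47 tag into language, script, region, variants, and private-use subtags, and checks whether a string has the form of a glottocode (four alphanumerics followed by four digits). This is a deliberately simplified reading of RFC 5646: it ignores extlang subtags, extensions, and grandfathered tags, and it does not consult the IANA subtag registry. `LangTag`, `parse_tag`, and `is_glottocode` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


@dataclass
class LangTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: list = field(default_factory=list)
    private_use: list = field(default_factory=list)


def parse_tag(tag: str) -> LangTag:
    """Split a tag by subtag shape only (no registry validation)."""
    subtags = tag.split("-")
    language = subtags[0]
    if not (language.isalpha() and 2 <= len(language) <= 3):
        raise ValueError(f"unsupported primary language subtag: {language!r}")
    result = LangTag(language=language.lower())
    for index, sub in enumerate(subtags[1:], start=1):
        if sub.lower() == "x":
            # Everything after the singleton "x" is private use.
            result.private_use = [s.lower() for s in subtags[index + 1:]]
            break
        before_region = result.region is None and not result.variants
        if len(sub) == 4 and sub.isalpha() and result.script is None and before_region:
            result.script = sub.title()        # e.g. Hant
        elif before_region and ((len(sub) == 2 and sub.isalpha())
                                or (len(sub) == 3 and sub.isdigit())):
            result.region = sub.upper()        # e.g. TW or 419
        elif sub.isalnum() and (5 <= len(sub) <= 8
                                or (len(sub) == 4 and sub[0].isdigit())):
            result.variants.append(sub.lower())
        else:
            raise ValueError(f"unrecognised subtag {sub!r} in {tag!r}")
    return result


def is_glottocode(value: str) -> bool:
    """Shape check only; it does not confirm the code exists in Glottolog."""
    return GLOTTOCODE.fullmatch(value) is not None


print(parse_tag("zh-Hant-TW"))
print(parse_tag("en-GB-oxendict"))
print(is_glottocode("stan1288"))
```

A shape check is enough to route an identifier to the right lookup; confirming that the identifier actually exists still requires the IANA registry or the Glottolog data.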

Notable threads

  • OntoLex-Lemon as the de facto linked-data lexicon standard. Although it never advanced to W3C Recommendation (the Community Group track does not produce Recs), OntoLex-Lemon’s Final Community Group Report (May 2016) plus the lexicog extension (September 2019) function as the modern standard. Every major linked-data lexicography project — DBnary, Apertium-RDF, BabelNet, the European Language Resources Coordination outputs, Wikidata’s own data export — publishes in OntoLex. The trick that made it dominant was modularity: a thin core for forms/senses/concepts, with optional lime, vartrans, decomp, morph, synsem, and lexicog modules picked à la carte. The FrAC module (frequency, attestation, corpus) remains a Working Draft as of May 2026 — the unfinished frontier is corpus-derived evidence.

  • The long shadow of SIL Toolbox SFM (1980s) in field linguistics. SFM’s \marker value line-based format predates XML by more than a decade and remains the substrate of an enormous installed base of field-linguistic data. MDF standardised the marker set, FLEx replaced the Toolbox application, and LIFT became the XML interchange format, but huge legacy SFM corpora still exist in researcher and missionary archives. The persistence is partly cultural (linguists trained on Toolbox keep using it) and partly technical (SFM is human-readable in any text editor, which matters in low-connectivity field settings where large FLEx project files are harder to inspect and repair by hand). LIFT 0.13 is the cross-tool migration path for projects moving off SFM.

  • LIFT as the cross-tool interchange that mostly succeeded. Where OntoLex-Lemon won the linked-data world, LIFT won the SIL-ecosystem world: it is what flows between FLEx (the editor), WeSay (the lightweight tablet/laptop entry tool), Lexique Pro (the publication tool), Dictionary App Builder (mobile-app generator), and Webonary (the web-publishing tool). It does not aspire to round-trip every FLEx feature — FLEx’s native XML is richer — but it carries the dictionary “Send/Receive” workflow that is the actual collaboration model for distributed-field projects with intermittent connectivity. v0.13 has been stable since the late 2010s; the SIL.Lift NuGet package is the canonical implementation.

  • Wikibase Lexemes as Wikidata’s growing structured-dictionary layer. The Lexeme/Form/Sense entity types launched in 2018 added a third Wikibase entity dimension alongside Items (Q-IDs) and Properties (P-IDs). Each Lexeme has an L-ID, with sub-entities for Forms (F-IDs, one per inflected surface form) and Senses (S-IDs). The three dedicated datatypes (wikibase-lexeme, wikibase-form, wikibase-sense) let properties on Items link to lexical data, and vice versa. This has produced the largest crowdsourced multilingual structured lexicon ever, growing fastest in languages underserved by commercial dictionaries. Wikibase Lexemes are SPARQL-queryable on the Wikidata Query Service and exportable to OntoLex via the Lexicographical data ontology mapping.

  • CLDF as the typological-database lingua franca. Before CLDF (2018), every comparative-linguistics project (WALS, AUTOTYP, ASJP, Glottolog itself) used its own bespoke CSV/SQLite layout, so cross-project queries required custom ETL. CLDF profiled CSVW (CSV on the Web) for cross-linguistic data, defining standard column names (Form, Cognateset_ID, Parameter_ID) and standard component tables (LanguageTable, ParameterTable, FormTable, CognateTable). Today WALS, PHOIBLE, Glottolog, Concepticon, NorthEuraLex, IELex, and dozens of family-specific datasets all ship CLDF, and pycldf is the canonical Python access library. CLDF 1.3 is current.

  • WN-LMF unifying the wordnet world. Before WN-LMF, each wordnet (Princeton, EuroWordNet, IndoWordNet, BalkaNet, OMW component wordnets) shipped its own format. WN-LMF 1.0 (2013) and the current 1.4 (2024) defined a single XML schema validated by DTD, with the GWA’s Inter-Lingual Index (ILI) as a glue layer assigning stable cross-language synset IDs. The Open English WordNet now actively maintains the Princeton lineage (Princeton WN 3.1 has been frozen since 2011), and OMW redistributes 150+ wordnets in WN-LMF + JSON.

  • ISO 639 / BCP 47 / Glottolog as overlapping language-ID systems with subtly different goals. ISO 639-3 (SIL as RA) is the workhorse 3-letter individual-language identifier (~7,800 codes, Q1 2026 list current); ISO 639-1 (2-letter) and -2 (3-letter bibliographic) cover smaller subsets for older systems. BCP 47 / RFC 5646 composes 639 codes with ISO 15924 scripts (Hans/Hant), ISO 3166-1 regions (US/GB), registered variants (en-GB-oxendict), and private-use subtags (tr-x-icu); it is the Web/HTML/XML standard. Glottolog (currently 5.3) assigns its own 8-character glottocodes that cover what 639-3 misses (extinct languoids, unclassified varieties, dialects). Modern best practice: use BCP 47 in user-facing contexts (HTML lang, CLDR locale), ISO 639-3 for individual-language identification, and Glottolog glottocodes for typological research and endangered-language documentation.

  • The TEI Lex-0 / LMF Part 4 / OntoLex lexicog tripod. All three are 2016–2024-era responses to the same problem: the TEI dictionary chapter (P5 Ch 9) is too permissive, and serious dictionary projects need a constrained baseline. TEI Lex-0 is the constrained TEI subset (community recommendations + ODD-driven schema); ISO 24613-4 is the normative TEI serialisation of LMF; OntoLex lexicog is the RDF version of the same concepts. The three are interconvertible by design — DARIAH’s Lexical Resources Working Group, the OntoLex CG, and ISO TC 37 SC 4 share overlapping membership precisely to keep them aligned. Modern projects (ELEXIS, DigiLex, Dictionaria) target all three formats from a single source.
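The CLDF thread above describes a JSON metadata file that points at CSV component tables with standard column names. The sketch below imitates that layout with two in-memory "files" and groups forms by Parameter_ID. It is a simplification: it locates the FormTable through `dc:conformsTo` and then trusts the default column names instead of resolving each column's `propertyUrl`, so real datasets are better read with pycldf. The dataset, the language IDs, and the forms are invented.

```python
import csv
import io
import json

FORM_TABLE = "http://cldf.clld.org/v1.0/terms.rdf#FormTable"

# Invented miniature wordlist: one metadata file plus one forms table.
FILES = {
    "Wordlist-metadata.json": json.dumps({
        "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
        "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Wordlist",
        "tables": [{"url": "forms.csv", "dc:conformsTo": FORM_TABLE}],
    }),
    "forms.csv": (
        "ID,Language_ID,Parameter_ID,Form\n"
        "1,lang_a,water,ako\n"
        "2,lang_b,water,aku\n"
        "3,lang_a,fire,pira\n"
    ),
}


def form_table_url(metadata: dict) -> str:
    """Find the file that the metadata declares as the FormTable."""
    for table in metadata["tables"]:
        if table.get("dc:conformsTo") == FORM_TABLE:
            return table["url"]
    raise LookupError("metadata declares no FormTable")


def forms_by_parameter(files: dict, metadata_name: str) -> dict:
    """Group forms as {Parameter_ID: {Language_ID: Form}}.

    Assumption: the table uses the default CLDF column names.
    """
    metadata = json.loads(files[metadata_name])
    reader = csv.DictReader(io.StringIO(files[form_table_url(metadata)]))
    grouped = {}
    for row in reader:
        grouped.setdefault(row["Parameter_ID"], {})[row["Language_ID"]] = row["Form"]
    return grouped


print(forms_by_parameter(FILES, "Wordlist-metadata.json"))
```

Because the metadata names the table's role rather than its filename, the same grouping code would work on any wordlist that follows the defaults, which is the cross-project reuse the format was designed for.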

Citations