i18n / Localization / Locale-Data DSLs Family Index
type: language-family-index
family: i18n-locale
languages_catalogued: 24
tags: [language-reference, family-index, i18n-locale, cldr, icu, messageformat, gettext, fluent, xliff, tmx, tbx, plurals, bidi]
i18n / Localization / Locale-Data — Family Index
Family overview
Internationalization-and-localization DSLs are the textual languages that describe locale data (how a culture writes dates, sorts strings, pluralises nouns, formats numbers, lays out bidi text), message templates (how a translatable string with placeholders is authored, plural-selected, gender-selected, and rendered), and interchange formats (how translations move between developers, translators, and translation-management systems). The family sits in three concentric rings: a foundational locale-data layer (Unicode + CLDR), a message-formatting layer (gettext, ICU MessageFormat 1.x → MessageFormat 2.0, Mozilla Fluent, Rails I18n YAML, FormatJS), and a platform-specific resource-file zoo (Android strings.xml, iOS Localizable.xcstrings, .NET .resx, Java .properties, Qt .ts).
The foundational layer is dominated by CLDR. The Unicode Common Locale Data Repository, expressed in LDML (UTS #35), is the canonical source of locale data (calendars, plural rules, collation, transforms, currency formatting, and more) for every modern OS, browser, and runtime. CLDR ships twice a year (nominally April and October, although CLDR 47 shipped in March 2025); the current line is CLDR 48 (October 2025, paired with Unicode 17 and ICU 78), with CLDR 49 in General Submission since April 2026. Because every JS Intl, Java ICU, .NET globalisation feature, Windows National Language Support, glibc locale, and macOS/iOS/Android locale traces back to CLDR, version-pinning matters for reproducibility.
The message-formatting layer is in the middle of a generational shift. ICU MessageFormat 1.x, which grew out of ICU’s port of Java’s java.text.MessageFormat in the early 2000s and is used today through ICU4J, ICU4C, and FormatJS / intl-messageformat, had real ergonomic problems: nested plurals were near-unreadable, the function vocabulary (number, plural, select, selectordinal) was non-extensible, and translator UX suffered. MessageFormat 2.0 advanced from Final Candidate to Stable in CLDR 47 (March 2025) and is now a stable part of LDML and Unicode’s recommended successor; CLDR 47 and 48 carry the specification, and the Unicode message-format-wg drove formal approval. In parallel, XLIFF 2.2 became an OASIS Specification on March 13, 2025, adding a Plural/Gender/Select Module aligned conceptually with MF2. Mozilla Fluent (FTL) remains the heterodox alternative, used heavily across Firefox and sister projects (60% more Fluent strings landed in Firefox during 2025), with deliberately translator-first ergonomics and asymmetric branching.
The platform zoo is messier and stickier. Apple introduced Localizable.xcstrings (a JSON-based “String Catalog”) in Xcode 15 (2023), unifying the legacy .strings + .stringsdict + XLIFF flow into a single source of truth; later releases (Xcode 16 in 2024, Xcode 26 in 2025) default to xcstrings for new projects. Android remains on res/values/strings.xml for backward-compatibility and tooling-lock-in reasons. Java’s .properties is still alive in 2026 despite its long-standing Unicode-escape pathology. .NET’s .resx/.resw continue as ResourceManager’s interchange. The interchange layer (XLIFF, TMX, TBX) is the silent plumbing under translation-management systems: XLIFF for in-flight bilingual files, TMX for translation-memory exchange (still on the 2005 1.4b spec, never updated after LISA’s 2011 dissolution), and TBX for terminology (now ISO 30042:2019 / TBX 3.0).
In our deep library
None catalogued as standalone Tier 1/2 notes — i18n DSLs are auxiliary to the host platforms they live on.
Cross-reference:
- api-description: XLIFF/TMX/TBX are XML schemas with formal XSDs, like SOAP/WSDL/OpenAPI.
- notation-spec: CLDR plural-rules (“i = 1 and v = 0”) and Unicode bidi controls are small formal languages.
- document-typesetting: TeX babel/polyglossia packages are the typeset-document parallel of these DSLs.
- citation-formats: MARC 880 fields handle alternate-graphic-representation for bibliographic localization.
- javascript: Intl, FormatJS / intl-messageformat, i18next, react-intl, LinguiJS all live here.
- java: ResourceBundle, java.text.MessageFormat, ICU4J.
- csharp: .resx/.resw + System.Resources.ResourceManager + System.Globalization.
- python: gettext stdlib, Babel (Python), flask-babel, django.utils.translation.
- ruby: Rails I18n YAML.
- swift: Localizable.xcstrings + String.LocalizationValue.
- kotlin: Android strings.xml.
- cpp: ICU4C, gettext, Qt Linguist.
Tier 3 family table — Master locale data
| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| CLDR LDML (Locale Data Markup Language, UTS #35) | 2003 (CLDR 1.0) | Unicode Consortium (Mark Davis et al.) | XML schema describing every locale’s dates, numbers, calendars, collation, plurals, transforms | Very active — current is CLDR 48 (Oct 2025, paired w/ Unicode 17), CLDR 49 in Survey-Tool General Submission since 2026-04-29; twice-yearly cadence | https://cldr.unicode.org/ |
| CLDR Plural Rules DSL | 2007 (CLDR 1.6) | Unicode CLDR-TC | Tiny embedded rule language (i = 1 and v = 0, n % 10 = 0..2) — operands n, i, v, w, f, t, c, e | Stable, normative inside LDML Part 3 (Numbers); spec stable across CLDR 47/48 | https://www.unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules |
| CLDR Transforms / ICU rule-based transliteration | 2005 (CLDR 1.3) | Unicode CLDR + ICU | Rule DSL for transliteration (e.g. Cyrl-Latn, Han-Latn, Any-NFC) — left-to-right rewrite rules with context | Active; CLDR 48 added Han→Latin and Gujarati→Latin updates for Unicode 17 | https://www.unicode.org/reports/tr35/tr35-general.html#Transforms |
| Unicode Bidi Algorithm controls | 2001 (UAX #9) | Unicode Consortium | Tiny “language” of bidi-control codepoints: LRM, RLM, ALM, LRE/RLE/PDF (deprecated), LRI/RLI/FSI/PDI | Active; UAX #9 updates with each Unicode release (Unicode 17 in Sept 2025) | https://www.unicode.org/reports/tr9/ |
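
The plural-rules row above describes a rule language over the operands n, i, v, w, f, t. As a rough illustration of how those operands are derived from a decimal string and then fed to rules, here is a minimal Python sketch. The names `Operands`, `operands`, `plural_en`, and `plural_ru` are hypothetical helpers written for this note, not part of any library; the compact-exponent operands c and e are left out, and the English and Russian rules are hand-transcribed from the CLDR cardinal rules as I understand them, so check them against the pinned CLDR release before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operands:
    n: float  # absolute value of the source number
    i: int    # integer digits of n
    v: int    # number of visible fraction digits, with trailing zeros
    w: int    # number of visible fraction digits, without trailing zeros
    f: int    # visible fraction digits as an integer, with trailing zeros
    t: int    # visible fraction digits as an integer, without trailing zeros


def operands(source: str) -> Operands:
    """Derive plural operands from a plain decimal string such as "1", "1.0", "2.50"."""
    text = source.lstrip("+-")
    int_part, _, frac = text.partition(".")
    trimmed = frac.rstrip("0")
    return Operands(
        n=abs(float(source)),
        i=int(int_part or "0"),
        v=len(frac),
        w=len(trimmed),
        f=int(frac or "0"),
        t=int(trimmed or "0"),
    )


def plural_en(source: str) -> str:
    """English cardinals: 'one' when i = 1 and v = 0, otherwise 'other'."""
    op = operands(source)
    return "one" if op.i == 1 and op.v == 0 else "other"


def plural_ru(source: str) -> str:
    """Russian cardinals: one / few / many for integers, 'other' for fractions."""
    op = operands(source)
    if op.v != 0:
        return "other"
    if op.i % 10 == 1 and op.i % 100 != 11:
        return "one"
    if op.i % 10 in (2, 3, 4) and op.i % 100 not in (12, 13, 14):
        return "few"
    return "many"  # i % 10 = 0, i % 10 = 5..9, or i % 100 = 11..14
```

The input is a string rather than a float on purpose: "1" and "1.0" are the same number but differ in v, and English selects different categories for them.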
Tier 3 family table — Message format / placeholder DSLs
| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| GNU gettext PO/POT | 1995 | Sun → Free Software Foundation (Ulrich Drepper et al.) | msgid/msgstr text format with Plural-Forms header expression; .po sources, .mo binary, .pot template | Very active — still the de facto UNIX/Linux i18n format; underpins Python, PHP, Perl, Ruby, GNOME, KDE | https://www.gnu.org/software/gettext/manual/gettext.html |
| ICU MessageFormat 1.x | ~2000 (ICU MessageFormat), Java port 1.4 (2002) | IBM ICU project | Brace-delimited DSL with {var, plural, ...}, {var, select, ...}, {var, selectordinal, ...} | Active but superseded — still ubiquitous via Java MessageFormat, ICU4C, FormatJS, intl-messageformat | https://unicode-org.github.io/icu/userguide/format_parse/messages/ |
| MessageFormat 2.0 (LDML Part 9) | Draft 2020+, Stable in CLDR 47 (Mar 2025) | Unicode message-format-wg (Mihai Niță, Addison Phillips, Eemeli Aro et al.) | New syntax with declarations (.input, .local), matchers (.match), extensible function registry; designed for nested plural+gender | Recommended — CLDR-blessed successor to MF1; stable as of LDML 47, refined in LDML 48; implementations in ICU4J 76+, ICU4C 76+, JS reference impl | https://www.unicode.org/reports/tr35/tr35-messageFormat.html |
| Mozilla Fluent (FTL) | 2017 (Project Fluent at Mozilla) | Mozilla (Stas Małolepszy, Zibi Braniecki) | Declarative .ftl files; asymmetric branching via { $var -> selectors; no printf-style; first-class translator workflow | Active — heavy use in Firefox + sister projects; +60% Fluent strings in Firefox during 2025; spec stable, tooling under continued investment via moz-l10n | https://projectfluent.org/ |
| Java MessageFormat (java.text) | JDK 1.1 (1997) | Sun → Oracle / OpenJDK | {0, choice, ...}, {0, number, ...}, {0, date, ...}; limited dialect of ICU MF | Active, in JDK; ICU4J’s com.ibm.icu.text.MessageFormat is the richer alternative | https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/text/MessageFormat.html |
| Rails I18n / Ruby YAML locale | 2008 (Rails 2.2) | David Heinemeier Hansson + Sven Fuchs et al. | YAML files keyed by locale → key tree; interpolation %{name}, :count plural keys via CLDR rules | Very active, default Rails localization layer | https://guides.rubyonrails.org/i18n.html |
| i18next JSON | 2011 | i18next project (Jan Mühlemann) | JSON tree with _plural / _zero / _one / _other keys; or i18next-icu plugin for ICU MF | Very active — dominant in JS/React/Vue/Angular ecosystems alongside FormatJS | https://www.i18next.com/ |
| FormatJS / intl-messageformat | 2014 (Yahoo) | Yahoo → OpenJS Foundation | ICU MessageFormat 1.x parser/runtime + React react-intl bindings; pursuing MF2 alignment | Very active, the canonical strict-ICU JS option | https://formatjs.github.io/ |
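
The gettext row above mentions the Plural-Forms header expression. The sketch below shows, under stated assumptions, how such a header maps a count to a msgstr index. It relies on `gettext.c2py`, the helper Python's standard library uses internally to compile the C-style expression; that function is not prominently documented, so depending on it directly is an assumption of this example. `parse_plural_forms`, `pick_form`, and the sample strings are hypothetical names introduced for illustration.

```python
import gettext

# Plural-Forms value commonly seen in Russian .po files (three forms).
RUSSIAN_PLURAL_FORMS = (
    "nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : "
    "n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);"
)

# msgstr[0], msgstr[1], msgstr[2] for a hypothetical "file" / "files" entry.
RU_FORMS = ["файл", "файла", "файлов"]


def parse_plural_forms(header_value: str):
    """Split a Plural-Forms value into (nplurals, index function)."""
    fields = {}
    for part in header_value.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    # c2py turns the C expression into a Python callable n -> int.
    return int(fields["nplurals"]), gettext.c2py(fields["plural"])


def pick_form(forms, n, index_of):
    """Select the msgstr for count n using the compiled plural expression."""
    return forms[index_of(n)]


nplurals, index_of = parse_plural_forms(RUSSIAN_PLURAL_FORMS)
```

Note that gettext yields a numeric index, not a CLDR category name; which index corresponds to one, few, or many is a convention of each language's header.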
Tier 3 family table — Platform-specific resource files
| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| Java .properties ResourceBundle | JDK 1.1 (1997) | Sun → Oracle | key=value ISO-8859-1 (with \uXXXX escapes) or UTF-8 (since JDK 9 default) | Active, ubiquitous on JVM; native2ascii legacy still bites pre-JDK 9 codebases | https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/ResourceBundle.html |
| .NET .resx / .resw | .NET FX 1.0 (2002) | Microsoft | XML key/value with strongly-typed designer codegen (Resources.Designer.cs); .resw is the WinRT/UWP variant | Active, the canonical .NET 8/9 resource format; consumed by System.Resources.ResourceManager | https://learn.microsoft.com/en-us/dotnet/core/extensions/work-with-resx-files-programmatically |
| Android res/values/strings.xml | Android 1.0 (2008) | Google | XML with <string>, <plurals>, <string-array>; per-locale dirs (values-fr/, values-ar/); CDATA + Java/Kotlin String.format placeholders | Very active: backwards-compatibility lock-in keeps it the canonical Android format despite calls for replacement | https://developer.android.com/guide/topics/resources/string-resource |
| iOS legacy .strings + .stringsdict | NeXTSTEP era → Mac OS X 10.0 (2001) | Apple | Plain-text key/value "hello" = "Hola"; plus XML plist .stringsdict for plurals | Legacy: still widely used but Apple is steering everyone to xcstrings; Xcode auto-converts on demand | https://developer.apple.com/documentation/foundation/optimizing-your-app-s-localization-for-language-direction |
| Apple Localizable.xcstrings (String Catalog) | Xcode 15 (2023) | Apple | JSON-based unified format combining .strings + .stringsdict + XLIFF metadata; per-language translation-state tracking; device variations; substitutions | Active and default: Xcode 15+ default for new strings; tooling matured in later releases (Xcode 16 in 2024, Xcode 26 in 2025); “Convert to String Catalog” migrates legacy projects | https://developer.apple.com/documentation/xcode/localizing-and-varying-text-with-a-string-catalog |
| Qt Linguist .ts (XML) + .qm (binary) | Qt 1.x era (~1996), modernised in Qt 4 (2005) | Trolltech → Qt Company | XML translation source (.ts) compiled by lrelease to binary .qm; lupdate extracts from C++/QML | Active, current in Qt 6.7+ (2025–2026); also accepts XLIFF as alternative | https://doc.qt.io/qt-6/linguist-ts-file-format.html |
| Mozilla DTD / .properties / .lang (legacy) | 2000s (XUL Firefox) | Mozilla | Legacy XUL-era DTD entity files plus Java-style .properties; superseded by Fluent inside Firefox | Legacy: most strings migrated to Fluent by 2024; residual files remain in older add-on ecosystems | https://firefox-source-docs.mozilla.org/l10n/fluent/migrations.html |
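
The Java .properties row above refers to \uXXXX escapes for ISO-8859-1 files. As a minimal sketch of what that escaping involves, including the surrogate-pair detail that makes it awkward for characters outside the Basic Multilingual Plane, the following Python helpers convert text to and from the escaped form. `to_properties_escapes` and `from_properties_escapes` are hypothetical names; the sketch escapes every non-ASCII character, does not handle the format's other escaping rules (such as `=`, `:`, `#`, or a literal backslash before `u`), and is not a replacement for native2ascii.

```python
import re


def to_properties_escapes(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes (UTF-16 code units)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Characters above U+FFFF need a high and a low surrogate.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


def from_properties_escapes(text: str) -> str:
    """Decode \\uXXXX escapes, rejoining surrogate pairs into single characters."""
    units = re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )
    # Round-trip through UTF-16 so adjacent surrogates become one code point.
    return units.encode("utf-16", "surrogatepass").decode("utf-16")
```

For example, `to_properties_escapes("café")` yields `caf\u00E9`, and an emoji such as U+1F600 becomes the pair `\uD83D\uDE00`.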
Tier 3 family table — Translation interchange / TMS
| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| XLIFF 2.2 | XLIFF 1.0 (2002), 2.0 (2014), 2.1 (2018, ISO 21720:2024 in July 2024), 2.2 OASIS Specification 2025-03-13 | OASIS XLIFF TC | XML interchange of bilingual translatable content; modules for glossary, change-tracking, ITS, Plural/Gender/Select (new in 2.2) | Current is 2.2 (March 2025); 2.1 also widely deployed; 1.2 still in legacy tooling | https://docs.oasis-open.org/xliff/xliff-core/v2.2/xliff-core-v2.2-part1.html |
| TMX (Translation Memory eXchange) 1.4b | 1997 (TMX 1.0), 1.4b in 2005 | LISA OSCAR (now defunct) | XML for translation-memory interchange across CAT tools | Frozen at 1.4b — LISA dissolved 2011; TMX 2.0 was drafted but never finalised; still ubiquitous | https://www.gala-global.org/lisa-oscar-standards |
| TBX (TermBase eXchange) / ISO 30042:2019 | TBX 1.0 (2002), ISO 30042:2008, ISO 30042:2019 (TBX 3.0) | LISA → ISO TC 37 | XML for terminology databases with concept-oriented structure | Active — TBX 3.0 / ISO 30042:2019 is current; managed under ISO since LISA’s 2011 dissolution | https://www.iso.org/standard/62510.html |
| iOS XLIFF export (Apple flavour) | Xcode 6 (2014) | Apple | XLIFF 1.2 dialect used by xcodebuild -exportLocalizations | Legacy / declining — being supplanted by xcstrings as the source of truth; Xcode still exports XLIFF for vendor handoff | https://developer.apple.com/documentation/xcode/exporting-localizations |
| MARC 880 alternate-graphic-representation | MARC 21 (1999, building on MARC 1968) | Library of Congress / OCLC | Bibliographic-record subfield 880 holding parallel script versions of fields (e.g. CJK + Latin transliteration) | Active in library cataloguing — OCLC, LC, national libraries; cross-link citation-formats | https://www.loc.gov/marc/bibliographic/bd880.html |
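
To make the TMX row above more concrete, here is a small sketch that reads translation units out of a TMX 1.4 document with Python's standard-library ElementTree. The sample document, its header attribute values, and the function name `translation_pairs` are made up for illustration; real TMX files usually carry inline markup inside `<seg>` and extra metadata that this sketch flattens or ignores.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-language sample; header values are placeholders.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello</seg></tuv>
      <tuv xml:lang="es"><seg>Hola</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Goodbye</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def translation_pairs(tmx_text: str, source_lang: str, target_lang: str):
    """Return (source, target) segment pairs for units that contain both languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for unit in root.iter("tu"):
        segments = {}
        for variant in unit.findall("tuv"):
            seg = variant.find("seg")
            if seg is not None:
                # itertext() flattens any inline elements inside <seg>.
                segments[variant.get(XML_LANG)] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs
```

Calling `translation_pairs(SAMPLE_TMX, "en", "es")` returns only the first unit, since the second has no Spanish variant.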
Notable threads
- CLDR is the silent foundation under everything. Every modern OS (Windows NLS, macOS/iOS, Android, ChromeOS, glibc) bundles CLDR-derived locale data; every browser Intl API, Java/ICU4J, .NET globalisation feature, Python Babel, and PHP Intl extension traces back to CLDR XML. The twice-yearly cadence (April/October) and version-pinning matter for reproducibility: a test suite that asserts a specific date format may break across CLDR versions because, say, French formal-form spacing rules changed. CLDR 48 (Oct 2025, paired with Unicode 17 + ICU 78) is the current stable line, with CLDR 49 in Survey-Tool General Submission since 2026-04-29.
- MessageFormat 2.0 is the long-awaited fix to MF1’s ergonomic problems. ICU MessageFormat 1.x suffered from three real defects: nested {count, plural, ...} inside {gender, select, ...} was painful to author and hard to translate; the function vocabulary was hard-coded (no third-party {var, currency, ...}); and the translator UX assumed the translator understood ICU syntax. MF2 (LDML Part 9, stable in CLDR 47 March 2025, refined in CLDR 48) introduces declarations (.input, .local), an explicit .match matcher whose variant keys can cover several selectors at once for combined plural+gender, and an extensible function registry, so translators can work with structurally tagged messages rather than nested braces. ICU4J 76+ and ICU4C 76+ ship reference implementations; FormatJS, i18next-icu, and Fluent-adjacent tooling are all moving toward MF2 alignment.
- Fluent’s translator-first bet on grammatical-gender + asymmetry. Mozilla designed Fluent specifically around the observation that languages do not have isomorphic grammar: a single English source string may need 8 distinct Polish forms (gender × number × case) and zero distinct Japanese forms. Fluent makes asymmetric branching ({ $userGender -> [feminine] ... [masculine] ... *[other] ... }) idiomatic, lets translators add or remove branches without round-tripping with developers, and refuses to be printf-shaped. Adoption is heavy inside Mozilla (Firefox, Thunderbird, MDN, Pontoon) and growing slowly elsewhere; outside the Mozilla orbit, MF2 is the consensus path forward.
- Apple’s xcstrings (Xcode 15, 2023) is a category move. Before xcstrings, an Apple-platform localisation pipeline meant juggling .strings (regular keys), .stringsdict (plurals only, in plist XML), per-target Info.plist localisation, and XLIFF round-trips with vendors: four format dialects per app. Localizable.xcstrings is a single JSON file with first-class translation-state tracking (new, translated, needs_review, stale), device variations (iphone, ipad, mac, watch), and substitution placeholders. Xcode 15 made it the default for new strings; later releases (Xcode 16 in 2024, Xcode 26 in 2025) matured the tooling. The trade-off: xcstrings is Apple-only with no formal interchange spec, so vendor TMS integrations route through XLIFF export.
- The XLIFF / TMX / TBX standards are unevenly maintained. XLIFF is the live one: 2.0 in 2014, 2.1 in 2018 (became ISO 21720:2024 in July 2024), and 2.2, which became an OASIS Specification on 2025-03-13 with a new Plural/Gender/Select Module conceptually aligned with MF2. TMX is frozen at 1.4b (2005): LISA, the original maintainer, dissolved in 2011, TMX 2.0 was drafted but never finalised, and the format is now governed by GALA under Creative Commons; despite that, every CAT tool still imports/exports TMX 1.4b. TBX transitioned cleanly to ISO TC 37 stewardship and is now ISO 30042:2019 (TBX 3.0).
- The plural-rules problem nobody escapes. English has 2 plural categories (one, other); Russian has 4 (one, few, many, other); Arabic has 6 (zero, one, two, few, many, other), as does Welsh; some languages have only the single category other and so no distinct plural forms (Japanese, Chinese, Korean). CLDR’s plural-rules DSL, made of small expressions over the operands n, i, v, w, f, t, c, e (absolute value, integer digits, visible fraction digits, and so on), is a tiny but normative formal language embedded inside LDML Part 3 (Numbers). Every i18n stack (gettext Plural-Forms, ICU plural selector, MF2 :number matcher, Fluent’s plural functions, Rails’s :count rule) ultimately defers to CLDR plural categories. Getting plurals wrong is the single most common visible-localization defect.
- The bidi/RTL story is harder than it looks. Unicode’s bidi algorithm (UAX #9) is itself a small declarative “language”: embedding levels, paragraph direction inference, isolate-vs-embed semantics (the isolates LRI/RLI/FSI/PDI, introduced in Unicode 6.3, replaced the deprecated LRE/RLE/PDF embeddings), and the bidi-control codepoints (LRM, RLM, ALM). Mishandled bidi shows up as garbled phone numbers, broken filenames, mis-aligned Arabic punctuation, and the famous Trojan-Source attack-class (CVE-2021-42574). When user-supplied content mixes scripts, the only safe pattern is to wrap with FSI…PDI (“first-strong isolate … pop directional isolate”).
- Why Android is still on strings.xml in 2026. Backward compatibility and tooling lock-in. Android Studio’s resource manager, Lint rules (MissingTranslation, Typos, ImpliedQuantity), AAPT2 string interning, and the entire Resources.getString() runtime are calibrated for strings.xml + <plurals> + values-<locale>/ directories. Google has discussed a more structured replacement multiple times but has never shipped one; the migration cost across the Play Store is too high. The format limps along with <xliff:g> placeholder annotations as its only real concession to modern i18n thinking.
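
The bidi thread above recommends wrapping user-supplied content in FSI…PDI. A minimal Python sketch of that pattern follows, together with a simple scan for explicit directional controls of the kind involved in Trojan-Source-style tricks. `isolate` and `find_bidi_controls` are hypothetical helper names; the sketch does not balance or strip controls already present in the input, so text containing a stray PDI could still end the isolate early.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Explicit embedding, override, and isolate controls defined by UAX #9.
BIDI_CONTROLS = {
    "\u202A": "LRE",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u202D": "LRO",
    "\u202E": "RLO",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}


def isolate(user_text: str) -> str:
    """Wrap text so its direction is resolved independently of the surrounding message."""
    return f"{FSI}{user_text}{PDI}"


def find_bidi_controls(text: str):
    """List (index, name) for each explicit directional control found in text."""
    return [(idx, BIDI_CONTROLS[ch]) for idx, ch in enumerate(text) if ch in BIDI_CONTROLS]


# Usage: interpolate an isolated user name into a left-to-right template.
greeting = "Welcome, {name}!".format(name=isolate("שלום"))
```

Running `find_bidi_controls` over source files or user input before display is one cheap way to surface unexpected overrides for review.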
Citations
- CLDR project home: https://cldr.unicode.org/
- CLDR 48 Release Note: https://cldr.unicode.org/downloads/cldr-48
- CLDR 47 Release Note (MF2 Stable): https://cldr.unicode.org/downloads/cldr-47
- Unicode LDML Part 9: MessageFormat 2.0: https://www.unicode.org/reports/tr35/tr35-messageFormat.html
- Unicode message-format-wg: https://github.com/unicode-org/message-format-wg
- ICU 78 Release Notes: https://unicode-org.github.io/icu/download/78.html
- ICU 76 (initial MF2 ref impl) Release Notes: https://unicode-org.github.io/icu/download/76.html
- GNU gettext Manual: https://www.gnu.org/software/gettext/manual/gettext.html
- Project Fluent: https://projectfluent.org/
- Mozilla L10n 2025 Recap: https://blog.mozilla.org/l10n/2026/01/07/mozilla-localization-in-2025/
- Apple String Catalog (xcstrings) Documentation: https://developer.apple.com/documentation/xcode/localizing-and-varying-text-with-a-string-catalog
- Android string resources: https://developer.android.com/guide/topics/resources/string-resource
- Java MessageFormat: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/text/MessageFormat.html
- .NET .resx files: https://learn.microsoft.com/en-us/dotnet/core/extensions/work-with-resx-files-programmatically
- XLIFF 2.2 Part 1 (Core): https://docs.oasis-open.org/xliff/xliff-core/v2.2/xliff-core-v2.2-part1.html
- XLIFF 2.1 (ISO 21720:2024): https://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html
- TBX / ISO 30042:2019: https://www.iso.org/standard/62510.html
- TMX 1.4b (GALA): https://www.gala-global.org/lisa-oscar-standards
- Unicode UAX #9 (Bidi Algorithm): https://www.unicode.org/reports/tr9/
- Qt Linguist TS file format: https://doc.qt.io/qt-6/linguist-ts-file-format.html
- Rails I18n Guide: https://guides.rubyonrails.org/i18n.html
- i18next: https://www.i18next.com/
- FormatJS: https://formatjs.github.io/
- MARC 880 field: https://www.loc.gov/marc/bibliographic/bd880.html