Languages library audit
TL;DR
- README massively understates the library. It claims “51 deep notes + family catalogs across ~310 languages”. Reality: 51 deep notes + 85 family catalogs + 6 cross-cutting comparison notes, covering ~2163 languages per the Tier3 `_index.md`. The ~310 figure was accurate on 2026-05-07, but the library went through 17 further expansion passes (18 in total) through 2026-05-14 and the README was never updated.
- T1 coverage of mainstream production languages is solid: nearly every major language in the audit prompt has a standalone `.md` (Python, JS, TS, Java, C, C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, Scala, Haskell, Erlang, Elixir, Clojure, F#, OCaml, R, Julia, Lua, Perl, Bash, PowerShell, SQL, Dart, Zig, Nim, Crystal). Missing standalone notes: MATLAB and Mojo (catalog-only), HTML and CSS (intentionally excluded).
- Tier 3 is genuinely encyclopedic. Sampled catalogs (lisps 30, scientific ~27, esoteric 22, hdl 22, smart-contract 27) match their per-index claims within ±1 row, except hdl, whose table has ~5 more rows than claimed because it also lists ecosystem tools. 85 catalogs averaging ~25 langs/family support the ~2163 aggregate.
- Recommendation: Update the README one-liner to reflect the real shape (`51 deep notes + 6 cross-cutting + 85 family catalogs across ~2163 languages`). No content gaps are urgent enough to block; consider 3-4 targeted fills (MATLAB, mainline web stack notes if desired, Mojo upgrade).
Inventory
Counted via Glob + wc -l against d:\Vault\Compendium\Math-and-Compute\Languages\.
| Bucket | Count | Notes |
|---|---|---|
| Root `.md` files | 60 | 51 language notes + 9 meta files |
| Meta files (root) | 9 | _index, _schema, _roadmap, _learn_next, 5 _compare_* |
| T1+T2 deep language notes | 51 | matches README/_index claim exactly |
| Tier3 `.md` files | 86 | 85 family catalogs + 1 _index.md |
| Grand total `.md` | 146 | vs _index self-claim of “89 files” (stale by 3 expansion passes' worth) |
| Cross-cutting comparison notes | 6 | _compare_memory_models, _compare_type_systems, _compare_concurrency, _compare_metaprogramming, _compare_build_tools, _learn_next |
| Languages catalogued (per Tier3 _index) | ~2163 | across 85 family catalogs |
The _index.md at root tells the consistent story (Tier 1 + Tier 2 = 51 deep notes; Tier 3 grew through 18 passes). The README at d:\Vault\Compendium\README.md is the only stale surface.
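The counts in the table above could be reproduced with a short script along these lines. This is a minimal sketch, not the procedure the audit actually ran (that was Glob + wc -l). It assumes that meta files are the root-level notes whose names start with `_`, and that the family catalogs sit in a subdirectory whose name is passed as `tier3_dirname`. Both the function name `inventory` and the default directory name `Tier3` are hypothetical.

```python
from pathlib import Path


def inventory(root: Path, tier3_dirname: str = "Tier3") -> dict[str, int]:
    """Count .md files per bucket under a Languages library root.

    Assumptions (not confirmed by the audit): meta files are root-level
    notes whose names start with "_", and the family catalogs live in a
    subdirectory called `tier3_dirname`.
    """
    root_md = [p for p in root.glob("*.md") if p.is_file()]
    meta = [p for p in root_md if p.name.startswith("_")]
    tier3_md = [p for p in (root / tier3_dirname).glob("*.md") if p.is_file()]
    return {
        "root_md": len(root_md),
        "meta": len(meta),
        "deep_notes": len(root_md) - len(meta),
        "tier3_md": len(tier3_md),
        # Every Tier3 file except its own _index.md is a family catalog.
        "tier3_catalogs": sum(1 for p in tier3_md if p.name != "_index.md"),
        "grand_total_md": len(root_md) + len(tier3_md),
    }
```

If the layout matches these assumptions, the result should line up with the table: 60 root files, 9 meta, 51 deep notes, 86 Tier3 files, 146 in total.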
T1 coverage — major production languages
Standalone .md confirmed for: python, javascript, typescript, java, c, cpp, csharp, go, rust, ruby, php, swift, kotlin, scala, haskell, erlang, elixir, clojure, fsharp, ocaml, r, julia, lua, perl, bash, powershell, sql, dart, zig, nim, crystal.
| Language | Standalone? | Status |
|---|---|---|
| Python, JS, TS, Java, C, C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, Scala | yes | Tier 1 |
| Haskell, Erlang, Elixir, Clojure, F#, OCaml, R, Julia, Lua, Perl, Bash, PowerShell, SQL, Dart, Zig, Nim, Crystal | yes | Tier 1 or Tier 2 |
| MATLAB | no | only in Tier3 (likely scientific.md/statistical-software.md) |
| HTML / CSS | no | intentionally excluded (“Markup-only formats … not programming languages”) per Tier3 _index.md |
| Mojo | no | catalog-only (Tier3 gpu-and-shaders + experimental-and-cross-host) |
Spot-check python.md (249 lines): full schema present — frontmatter, At a glance, Getting started, Basics, Intermediate, Advanced, God mode, Idioms & style, Ecosystem, Gotchas, Citations. Substantive content (Python 3.14, free-threaded build PEP 703/779, uv version manager, etc.). The “~500-1000 lines each” Tier-1 convention in the README is aspirational rather than enforced — actual range from wc -l is 118 (cpp.md) to 369 (sql.md) with median ~210. They are concise reference notes, not exhaustive tutorials.
T3 catalog coverage — per-catalog spot-check
Sampled 5 of 85 to validate the per-index counts:
| Catalog | Lines | Claimed langs | Verified rows in table | Match? |
|---|---|---|---|---|
| `lisps.md` | 89 | ~30 (22 + 8 Scheme) | 22 + 8 = 30 (verified by read) | exact |
| `scientific.md` | 98 | ~27 | 28 (incl header) → ~27 data rows | within 1 |
| `esoteric.md` | 93 | ~22 | 22 data rows | exact |
| `hdl.md` | 88 | ~17 | 22 data rows (incl ecosystem tools) | table exceeds claim by ~5 (consistent w/ “+ ecosystem tools” in _index) |
| `smart-contract.md` | 121 | ~26 | 27 data rows | within 1 |
Four of the five sampled catalogs are within ±1 of their claimed counts; hdl.md has ~5 more rows than claimed because its table also lists ecosystem tools. Aggregate ~2163 is a credible ballpark.
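The row counts in the table above were verified by reading each catalog. For a wider sample, the same check could be scripted roughly as follows. The sketch assumes each catalog lists one language per row in a pipe table, which holds for the sampled files but is not confirmed for all 85. The names `count_data_rows` and `match_label` are hypothetical.

```python
import re

# A delimiter row such as |---|---| or | :--- | ---: |
DELIMITER_ROW = re.compile(r"^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$")


def count_data_rows(markdown: str) -> int:
    """Count data rows across all pipe tables in a Markdown string.

    Header and delimiter rows are not counted.
    """
    lines = [line.strip() for line in markdown.splitlines()]
    rows = 0
    in_table = False
    for i, line in enumerate(lines):
        is_delimiter = "|" in line and DELIMITER_ROW.match(line) is not None
        if is_delimiter and i > 0 and "|" in lines[i - 1]:
            in_table = True  # the previous line was the header row
            continue
        if in_table:
            if "|" in line:
                rows += 1
            else:
                in_table = False  # a blank line or prose ends the table
    return rows


def match_label(claimed: int, actual: int, tolerance: int = 1) -> str:
    """Describe how the actual row count compares with the claimed count."""
    diff = actual - claimed
    if diff == 0:
        return "exact"
    if abs(diff) <= tolerance:
        return f"within {tolerance}"
    return f"off by {diff:+d}"
```

With the figures from the table, `match_label(17, 22)` flags hdl as `off by +5`, while `match_label(26, 27)` reports smart-contract as `within 1`.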
Catalog size distribution (lines, from wc -l):
- Short (~80-100 lines): lisps, hdl, esoteric, ml-and-fp, mainframe-legacy, gpu-and-shaders, logic-and-constraint, forth-and-concatenative, scientific, query, visual-dataflow, spreadsheet, bio-workflow, robotics-control, music-audio, game-scripting
- Medium (~100-140 lines): config-and-dsl, embedded-firmware, ai-prompt-languages, notation-spec, regex-flavors, math-notation, codec-and-dsp, dwarf-expressions, exam-pseudocode, chatbot-intent-dsls, citation-formats, i18n-locale, agriculture-farming, legal-contract, voice-phonetics, theorem-prover-dsls, game-engine-scripting, document-typesetting, smart-contract, game-data-narrative, process-modelling, learning-content, sport-fitness, genealogy, real-estate-property, insurance-actuarial, retail-supplychain, statistical-software, climate-carbon
- Long (~140-160 lines): most modern/vertical catalogs — healthcare-clinical, financial-regulatory, government-civictech, aerospace-defence, semantic-web, network-protocol-dsls, identity-auth-policy, tax-forms, automotive-onvehicle, print-page-description, music-notation, accessibility-aria, oci-cloud-native, cryptography-keys, construction-bim, energy-power, maritime-port, ros2-robotics-config, nlp-corpus, forensic-evidence, cybersec-deep, 3d-scene, linguistic-resources, genai-llm-runtime
Total Tier3 = 11,252 lines / 86 files = ~131 lines per file on average (the 86 files include the Tier3 _index.md).
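The banding and the average above can be sketched as follows. The band edges are an assumption: the audit gives overlapping approximate ranges (~80-100, ~100-140, ~140-160), so this sketch treats them as half-open intervals split at 100 and 140. The function names `size_band` and `summarize` are hypothetical, and the sample input reuses the five line counts from the spot-check table.

```python
def size_band(line_count: int) -> str:
    """Map a line count onto the audit's approximate size bands.

    Assumed cut-offs: below 100 is short, 100-139 is medium, 140+ is long.
    """
    if line_count < 100:
        return "short"
    if line_count < 140:
        return "medium"
    return "long"


def summarize(line_counts: dict[str, int]) -> dict:
    """Group catalogs by size band and compute total and mean line counts."""
    if not line_counts:
        raise ValueError("no catalogs to summarize")
    bands: dict[str, list[str]] = {"short": [], "medium": [], "long": []}
    for name in sorted(line_counts):
        bands[size_band(line_counts[name])].append(name)
    total = sum(line_counts.values())
    return {
        "bands": bands,
        "total_lines": total,
        "mean_lines": round(total / len(line_counts)),
    }


# Line counts of the five sampled catalogs, from the spot-check table.
sampled = {"lisps": 89, "scientific": 98, "esoteric": 93, "hdl": 88, "smart-contract": 121}
summary = summarize(sampled)

# The audit's own aggregate: 11,252 lines over 86 Tier3 files.
tier3_mean = round(11_252 / 86)
```

Here `tier3_mean` comes out at 131, matching the reported average. The five sampled catalogs average lower because four of them fall in the short band.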
Total distinct languages — rough estimate
Tier3 _index.md self-reports ~2163. T1+T2 contributes 51 deep notes (a subset already cross-referenced from family notes — Lisps catalog explicitly cross-links to common-lisp, scheme, racket, clojure). De-duplicated total stays ~2163 — the deep notes are not net-additional, they are the “deep-treatment subset” of langs that also appear in their family catalog.
The README’s ~310 languages is frozen at the 2026-05-07 snapshot (first Tier-3 pass: 13 families, ~310 langs). It missed 17 subsequent expansion passes that grew Tier 3 by 7× in language count and 6.5× in family count.
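As a quick check on the growth multiples quoted above, the ratios can be recomputed from the snapshot figures (13 families and ~310 languages at the first pass, 85 families and ~2163 languages now). The helper name `growth_factor` is hypothetical.

```python
def growth_factor(before: int, after: int) -> float:
    """Return after/before rounded to one decimal place."""
    return round(after / before, 1)


language_growth = growth_factor(310, 2163)  # languages: first pass vs. current
family_growth = growth_factor(13, 85)       # families: first pass vs. current
```

Both values agree with the rounded 7× and 6.5× figures in the text.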
Recommended fills (highest-value additions)
- MATLAB: promote from catalog mention to standalone Tier 1 (huge install base in engineering / academia; explicitly listed as a major production language in the audit prompt).
- Mojo: promote from Tier3 catalog mention to Tier 2 (Modular/Lattner, Python-superset for AI; growing rapidly).
- WebAssembly text (WAT): could be its own Tier 2 or live in `assembly-and-encoding`; useful given the deep WASM/component-model momentum.
- HTML/CSS: intentionally excluded as “not programming languages”, but the call is debatable. CSS combined with HTML is arguably Turing-complete (Rule 110 can be encoded with Selectors Level 3 sibling combinators, given user input to drive each step), and a README that excludes the whole web stack leaves a real gap for an “encyclopedic” library. Either (a) write a single combined Tier 3 family `web-markup-and-styling.md`, or (b) document the exclusion more visibly.
- Carbon, Hare, Vale: three newer systems languages with active 2024-2026 development; currently in `experimental-and-cross-host` but increasingly relevant.
- CUDA C / OpenCL C: present in `gpu-and-shaders` but not as deep notes; deep treatment would be useful given AI workload prevalence.
- A `_compare_error_handling.md` cross-cutting note: Result/Option vs exceptions vs panic-and-recover vs conditions vs result-pattern. The 5 existing `_compare_*` notes cover memory/types/concurrency/metaprog/build; error handling is the obvious next one (the 7th cross-cutting note, counting `_learn_next`) and is referenced piecemeal across many deep notes.
- Refresh / version-pin pass: `_index.md` already notes the next scheduled refresh is “~2026-11” (six-month deep-note refresh). The deep notes contain version numbers that drift (e.g., Python 3.14, PHP 8.5). Worth keeping that ratchet.
Recommended deletions / consolidations
- No deletions warranted. The library is well-organized; even the smaller Tier3 catalogs (forth-and-concatenative at 78 lines, ml-and-fp at 82) are domain-appropriate sizes.
- Possible consolidation: `industrial-automation.md` + `automotive-onvehicle.md` + `ros2-robotics-config.md` overlap in CAN bus / fieldbus territory, but each has a distinct vertical focus and unique standards stack. Keep separate.
- Cross-family duplication is acknowledged in the Tier3 `_index.md` “Cross-family overlap” section (Mercury/Curry, Datalog, Mojo, PureScript/Elm). Good hygiene already in place.
- One stale claim to fix in `_index.md`: it says “Total files: 89” in the “Library final shape” block (line ~189); the actual count is 146. The 18-pass expansion ratcheted the family count, but the file-count summary at the top of root `_index.md` was never updated. The Tier3 `_index.md` numbers (85 families / ~2163 langs / “eighteenth pass”) are current and accurate.
Action items (priority order)
- README one-line fix at `d:\Vault\Compendium\README.md` line 23: replace “51 deep per-language notes + family catalogs across ~310 languages” with “51 deep per-language notes + 6 cross-cutting comparisons + 85 family catalogs across ~2163 languages”.
- Root `_index.md` summary fix: update the “Total files: 89” line to “Total files: 146” and bump the catalogued-languages number to ~2163 (Tier3 `_index.md` already has the correct figure).
- Decide on MATLAB / Mojo standalone promotion (1-2 hours each, low risk).
- Defer: web stack (HTML/CSS) decision, error-handling cross-cut, version refresh. All are reasonable but not urgent.
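The README one-line fix could be applied with a small script such as the sketch below. It assumes the README contains the old phrase verbatim, as quoted in this audit; if the actual line differs in whitespace or punctuation, the function finds zero occurrences and changes nothing. The function name `fix_stale_claim` is hypothetical, and `dry_run` defaults to `True` so a first run only reports.

```python
from pathlib import Path

OLD = "51 deep per-language notes + family catalogs across ~310 languages"
NEW = (
    "51 deep per-language notes + 6 cross-cutting comparisons "
    "+ 85 family catalogs across ~2163 languages"
)


def fix_stale_claim(path: Path, old: str = OLD, new: str = NEW, dry_run: bool = True) -> int:
    """Replace `old` with `new` in `path` and return the occurrences found.

    With dry_run=True (the default) the file is left untouched.
    """
    text = path.read_text(encoding="utf-8")
    hits = text.count(old)
    if hits and not dry_run:
        path.write_text(text.replace(old, new), encoding="utf-8")
    return hits
```

Calling `fix_stale_claim(Path(r"d:\Vault\Compendium\README.md"))` first as a dry run shows whether the quoted phrase is present before anything is written. The same helper could handle the “Total files: 89” line in the root `_index.md` by passing different `old` and `new` strings.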