Languages library audit

TL;DR

  • The README massively understates the library. It claims “51 deep notes + family catalogs across ~310 languages”. Reality: 51 deep notes + 85 family catalogs + 6 cross-cutting comparison notes, covering ~2163 languages per the Tier3 _index.md. The ~310 figure was accurate on 2026-05-07, but the library has been through 18 expansion passes in total (17 after that snapshot) as of 2026-05-14 and the README was never updated.
  • T1 coverage of mainstream production languages is solid: nearly every major language in the audit prompt has a standalone .md (Python, JS, TS, Java, C, C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, Scala, Haskell, Erlang, Elixir, Clojure, F#, OCaml, R, Julia, Lua, Perl, Bash, PowerShell, SQL, Dart, Zig, Nim, Crystal). Missing standalone notes: MATLAB and Mojo (catalog-only), plus HTML and CSS (intentionally excluded).
  • Tier 3 is genuinely encyclopedic. Four of the five sampled catalogs (lisps 30, scientific ~27, esoteric 22, smart-contract 27) match their per-index claims within ±1 row; hdl has 22 rows against a claim of ~17 because its table also lists ecosystem tools. 85 catalogs averaging ~25 langs/family support the ~2163 aggregate.
  • Recommendation: Update the README one-liner to reflect the real shape (51 deep notes + 6 cross-cutting + 85 family catalogs across ~2163 languages). No content gaps urgent enough to block; consider 3-4 targeted fills (MATLAB, mainline web stack notes if desired, Mojo upgrade).

Inventory

Counted via Glob + wc -l against d:\Vault\Compendium\Math-and-Compute\Languages\.

| Bucket | Count | Notes |
|---|---|---|
| Root .md files | 60 | 51 language notes + 9 meta files |
| Meta files (root) | 9 | _index, _schema, _roadmap, _learn_next, 5 _compare_* |
| T1+T2 deep language notes | 51 | matches README/_index claim exactly |
| Tier3 .md files | 86 | 85 family catalogs + 1 _index.md |
| Grand total .md | 146 | vs _index self-claim of “89 files” (stale by 3 expansion passes worth) |
| Cross-cutting comparison notes | 6 | _compare_memory_models, _compare_type_systems, _compare_concurrency, _compare_metaprogramming, _compare_build_tools, _learn_next |
| Languages catalogued (per Tier3 _index) | ~2163 | across 85 family catalogs |

The _index.md at root tells a mostly consistent story (Tier 1 + Tier 2 = 51 deep notes; Tier 3 grew through 18 passes), apart from its stale “89 files” total. The README at d:\Vault\Compendium\README.md is the main stale surface.
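
The Glob + wc -l count above can be reproduced with a short script. This is a minimal sketch, not the command actually used for the audit: the subfolder name `Tier3` and the rule "a root file starting with `_` is a meta file" are assumptions inferred from the file names listed in the table, and `count_md` / `inventory` are hypothetical helper names.

```python
from pathlib import Path


def count_md(folder: Path) -> tuple[int, int]:
    """Return (number of .md files, total line count) for one folder, non-recursively."""
    files = sorted(folder.glob("*.md"))
    # splitlines() counts a final line without a trailing newline,
    # so totals can differ from `wc -l` by at most one per file.
    total_lines = sum(
        len(f.read_text(encoding="utf-8", errors="replace").splitlines())
        for f in files
    )
    return len(files), total_lines


def inventory(root: Path, tier3_dirname: str = "Tier3") -> dict[str, int]:
    """Summarise the library layout; `tier3_dirname` is an assumed folder name."""
    root_files, _ = count_md(root)
    tier3_files, tier3_lines = count_md(root / tier3_dirname)
    meta_files = len(list(root.glob("_*.md")))  # assumed convention: meta files start with "_"
    return {
        "root_md": root_files,
        "root_meta": meta_files,
        "root_language_notes": root_files - meta_files,
        "tier3_md": tier3_files,
        "grand_total_md": root_files + tier3_files,
        "tier3_lines": tier3_lines,
    }
```

Calling `inventory(Path(...))` on the Languages folder should give numbers comparable to the table, provided the folder layout matches the assumptions above.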

T1 coverage — major production languages

Standalone .md confirmed for: python, javascript, typescript, java, c, cpp, csharp, go, rust, ruby, php, swift, kotlin, scala, haskell, erlang, elixir, clojure, fsharp, ocaml, r, julia, lua, perl, bash, powershell, sql, dart, zig, nim, crystal.

| Language | Standalone? | Status |
|---|---|---|
| Python, JS, TS, Java, C, C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, Scala | yes | Tier 1 |
| Haskell, Erlang, Elixir, Clojure, F#, OCaml, R, Julia, Lua, Perl, Bash, PowerShell, SQL, Dart, Zig, Nim, Crystal | yes | Tier 1 or Tier 2 |
| MATLAB | no | only in Tier3 (likely scientific.md/statistical-software.md) |
| HTML / CSS | no | intentionally excluded (“Markup-only formats … not programming languages”) per Tier3 _index.md |
| Mojo | no | catalog-only (Tier3 gpu-and-shaders + experimental-and-cross-host) |

Spot-check python.md (249 lines): full schema present — frontmatter, At a glance, Getting started, Basics, Intermediate, Advanced, God mode, Idioms & style, Ecosystem, Gotchas, Citations. Substantive content (Python 3.14, free-threaded build PEP 703/779, uv version manager, etc.). The “~500-1000 lines each” Tier-1 convention in the README is aspirational rather than enforced — actual range from wc -l is 118 (cpp.md) to 369 (sql.md) with median ~210. They are concise reference notes, not exhaustive tutorials.

T3 catalog coverage — per-catalog spot-check

Sampled 5 of 85 to validate the per-index counts:

| Catalog | Lines | Claimed langs | Verified rows in table | Match? |
|---|---|---|---|---|
| lisps.md | 89 | ~30 (22 + 8 Scheme) | 22 + 8 = 30 (verified by read) | exact |
| scientific.md | 98 | ~27 | 28 (incl header) → ~27 data rows | within 1 |
| esoteric.md | 93 | ~22 | 22 data rows | exact |
| hdl.md | 88 | ~17 | 22 data rows (incl ecosystem tools) | table exceeds claim by ~5 (consistent with “+ ecosystem tools” in _index) |
| smart-contract.md | 121 | ~26 | 27 data rows | within 1 |

Four of the five per-catalog claims are within ±1 of the actual table rows; hdl.md exceeds its claim by ~5 because its table also counts ecosystem tools. The aggregate ~2163 is a credible ballpark.
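
The row check above was done by reading the files; it could be automated along these lines. This is a sketch under stated assumptions: it assumes the catalogs use pipe-style Markdown tables with a leading `|`, and both `count_table_data_rows` and `compare_claim` are hypothetical helpers, with the tolerance of 2 chosen only for illustration.

```python
import re

# Matches a table separator row such as |---|---| or | :--- | ---: |
SEPARATOR = re.compile(r"^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$")


def count_table_data_rows(markdown: str) -> int:
    """Count pipe-table rows across all tables, skipping header and separator rows."""
    lines = [ln.strip() for ln in markdown.splitlines()]
    rows = 0
    for i, line in enumerate(lines):
        if not line.startswith("|"):
            continue  # not a table row
        if SEPARATOR.match(line):
            continue  # the |---|---| line itself
        if i + 1 < len(lines) and SEPARATOR.match(lines[i + 1]):
            continue  # header row: directly followed by a separator
        rows += 1
    return rows


def compare_claim(claimed: int, actual: int, tolerance: int = 2) -> str:
    """Describe how the verified row count relates to the count claimed in the index."""
    diff = actual - claimed
    if diff == 0:
        return "exact"
    if abs(diff) <= tolerance:
        return f"within {abs(diff)}"
    return f"off by {diff:+d}"
```

With the figures from the table, `compare_claim(17, 22)` flags hdl.md as `off by +5`, while `compare_claim(26, 27)` reports `within 1` for smart-contract.md.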

Catalog size distribution (lines, from wc -l):

  • Short (~80-100 lines): lisps, hdl, esoteric, ml-and-fp, mainframe-legacy, gpu-and-shaders, logic-and-constraint, forth-and-concatenative, scientific, query, visual-dataflow, spreadsheet, bio-workflow, robotics-control, music-audio, game-scripting
  • Medium (~100-140 lines): config-and-dsl, embedded-firmware, ai-prompt-languages, notation-spec, regex-flavors, math-notation, codec-and-dsp, dwarf-expressions, exam-pseudocode, chatbot-intent-dsls, citation-formats, i18n-locale, agriculture-farming, legal-contract, voice-phonetics, theorem-prover-dsls, game-engine-scripting, document-typesetting, smart-contract, game-data-narrative, process-modelling, learning-content, sport-fitness, genealogy, real-estate-property, insurance-actuarial, retail-supplychain, statistical-software, climate-carbon
  • Long (~140-160 lines): most modern/vertical catalogs — healthcare-clinical, financial-regulatory, government-civictech, aerospace-defence, semantic-web, network-protocol-dsls, identity-auth-policy, tax-forms, automotive-onvehicle, print-page-description, music-notation, accessibility-aria, oci-cloud-native, cryptography-keys, construction-bim, energy-power, maritime-port, ros2-robotics-config, nlp-corpus, forensic-evidence, cybersec-deep, 3d-scene, linguistic-resources, genai-llm-runtime

Total Tier3 = 11,252 lines / 86 files = ~131 lines per file on average (the 86 files include _index.md).
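
The bucketing and the average can be checked with a few lines of Python. This is an illustrative sketch: the line counts are the ones quoted in this audit, while the cut points (100 and 140 lines) are an assumption, since the buckets above are only given as approximate ranges.

```python
from collections import Counter

# Line counts quoted in this audit for a handful of catalogs.
LINE_COUNTS = {
    "lisps": 89,
    "scientific": 98,
    "esoteric": 93,
    "hdl": 88,
    "smart-contract": 121,
    "forth-and-concatenative": 78,
    "ml-and-fp": 82,
}


def size_bucket(line_count: int, short_max: int = 100, medium_max: int = 140) -> str:
    """Classify a catalog by length; the cut points are assumed, not taken from the library."""
    if line_count < short_max:
        return "short"
    if line_count < medium_max:
        return "medium"
    return "long"


distribution = Counter(size_bucket(n) for n in LINE_COUNTS.values())

# Tier3 total lines divided by Tier3 .md files (both figures from the audit).
average_lines_per_file = 11_252 / 86
```

Under these cut points smart-contract lands in the medium bucket and the other six listed catalogs in short, which agrees with the lists above.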

Total distinct languages — rough estimate

Tier3 _index.md self-reports ~2163. T1+T2 contributes 51 deep notes (a subset already cross-referenced from family notes — Lisps catalog explicitly cross-links to common-lisp, scheme, racket, clojure). De-duplicated total stays ~2163 — the deep notes are not net-additional, they are the “deep-treatment subset” of langs that also appear in their family catalog.

The README’s ~310 languages is frozen at the 2026-05-07 snapshot (first Tier-3 pass: 13 families, ~310 langs). It missed 17 subsequent expansion passes that grew Tier 3 by 7× in language count and 6.5× in family count.
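
The growth factors quoted above follow directly from the four counts in this report; a quick check, using only figures already stated here:

```python
# Figures from the audit: README snapshot (first Tier-3 pass) vs current Tier3 _index.md.
snapshot_langs, current_langs = 310, 2163
snapshot_families, current_families = 13, 85

lang_growth = current_langs / snapshot_langs            # roughly 7x
family_growth = current_families / snapshot_families    # roughly 6.5x
langs_per_family = current_langs / current_families     # roughly 25 per catalog

print(f"languages x{lang_growth:.1f}, families x{family_growth:.1f}, "
      f"{langs_per_family:.1f} langs/family")
```

The last ratio is the same ~25 langs/family average used in the TL;DR to sanity-check the ~2163 aggregate.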

Gaps and candidate additions

  1. MATLAB: promote from catalog mention to standalone Tier 1 (huge install base in engineering / academia; explicitly listed as a major production language in the audit prompt).
  2. Mojo — promote from Tier3 catalog mention to Tier 2 (Modular/Lattner, Python-superset for AI; growing rapidly).
  3. WebAssembly text (WAT) — could be its own Tier 2 or live in assembly-and-encoding; useful given the deep WASM/component-model momentum.
  4. HTML/CSS: intentionally excluded as “not programming languages”, but the call is debatable. CSS is arguably Turing-complete in combination with HTML and user input (the known Rule 110 construction relies on Selectors Level 3 features such as sibling combinators), and excluding the whole web stack is a real gap for an “encyclopedic” library. Either (a) write a single combined Tier 3 family web-markup-and-styling.md, or (b) document the exclusion more visibly.
  5. Carbon, Hare, Vale — three newer systems languages with active 2024-2026 development; currently in experimental-and-cross-host but increasingly relevant.
  6. CUDA C / OpenCL C — present in gpu-and-shaders but not as deep notes; deep treatment would be useful given AI workload prevalence.
  7. A _compare_error_handling.md cross-cutting note: Result/Option vs exceptions vs panic-and-recover vs conditions vs result-pattern. The 5 existing _compare_* notes cover memory/types/concurrency/metaprog/build; counting _learn_next as the sixth cross-cutting note, error handling is the obvious 7th, and it is referenced piecemeal across many deep notes.
  8. Refresh / version-pin pass: _index.md already notes the next scheduled refresh is “~2026-11” (six-month deep-note refresh). The deep notes contain version numbers that drift (e.g., Python 3.14, PHP 8.5). Worth keeping that ratchet.

Cleanup and hygiene

  • No deletions warranted. The library is well-organized; even the smaller Tier3 catalogs (forth-and-concatenative at 78 lines, ml-and-fp at 82) are domain-appropriate sizes.
  • Possible consolidation: industrial-automation.md + automotive-onvehicle.md + ros2-robotics-config.md overlap in CAN bus / fieldbus territory but each has a distinct vertical focus and unique standards stack — keep separate.
  • Cross-family duplication is acknowledged in Tier3 _index.md “Cross-family overlap” section (Mercury/Curry, Datalog, Mojo, PureScript/Elm). Good hygiene already in place.
  • One stale claim to fix in root _index.md: it says “Total files: 89” in the “Library final shape” block (line ~189), but the actual count is 146. The 18-pass expansion ratcheted the family count, but the file-count summary in root _index.md was never updated. The Tier3 _index.md numbers (85 families / ~2163 langs / “eighteenth pass”) are current and accurate.

Action items (priority order)

  1. README one-line fix at d:\Vault\Compendium\README.md line 23: replace “51 deep per-language notes + family catalogs across ~310 languages” with “51 deep per-language notes + 6 cross-cutting comparisons + 85 family catalogs across ~2163 languages”.
  2. Root _index.md summary fix: update the “Total files: 89” line to “Total files: 146” and bump the catalogued-languages number to ~2163 (Tier3 _index.md already has the correct figure).
  3. Decide on MATLAB / Mojo standalone promotion (1-2 hours each, low risk).
  4. Defer: web stack (HTML/CSS) decision, error-handling cross-cut, version refresh — all reasonable but not urgent.
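
Action items 1 and 2 are exact string replacements, so they can be applied with a guard against editing the wrong text. A minimal sketch, assuming the quoted strings appear verbatim and exactly once in each file; `replace_once` and `patch_file` are hypothetical helper names, not existing tooling in the vault.

```python
from pathlib import Path

# Exact strings quoted in the action items above.
README_OLD = "51 deep per-language notes + family catalogs across ~310 languages"
README_NEW = (
    "51 deep per-language notes + 6 cross-cutting comparisons "
    "+ 85 family catalogs across ~2163 languages"
)
INDEX_OLD = "Total files: 89"
INDEX_NEW = "Total files: 146"


def replace_once(text: str, old: str, new: str) -> str:
    """Replace `old` with `new`, refusing to act unless `old` occurs exactly once."""
    found = text.count(old)
    if found != 1:
        raise ValueError(f"expected exactly 1 occurrence of {old!r}, found {found}")
    return text.replace(old, new)


def patch_file(path: Path, old: str, new: str) -> None:
    """Hypothetical helper: rewrite one file in place (UTF-8 encoding assumed)."""
    original = path.read_text(encoding="utf-8")
    path.write_text(replace_once(original, old, new), encoding="utf-8")
```

Because `replace_once` raises when the old string is missing, running the patch a second time fails loudly instead of silently doing nothing, which makes it easy to tell whether the fix has already been applied.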