Programming Language Ecosystems for ML

A working catalog of the language ecosystems used for machine learning circa 2026. Three layers exist in every healthy ML language stack: (1) research and model definition — where humans write the math; (2) training infrastructure — distributed compute, data loading, checkpointing; (3) inference and serving — production deployment at low latency. Different languages dominate different layers — Python owns 1 and 2; C++/CUDA/Rust dominate the kernel layer of 2; Rust/Go/C++ are increasingly the choice for 3.

The selection axes that matter:

  1. Library breadth — does the framework you need have idiomatic bindings? Python has 90%+ of published research; everything else is catching up.
  2. GPU / accelerator support — first-class CUDA + ROCm + MPS + TPU bindings vs FFI-via-libtorch vs nothing.
  3. Performance — JIT/AOT compilation, memory model, parallelism primitives. Python relies on calling out to C/CUDA; Rust + Mojo + Julia generate native code; JS runs in V8/SpiderMonkey or WebGPU.
  4. Deployment shape — embedded edge inference (Swift, Rust, C++, C), in-browser (JS/WASM), in-database (Python UDFs, PL/Python), large-scale serving (Python+Triton, Rust, Go, C++), notebooks (Python, Julia, R).
  5. Interop — easy bridges (PyTorch C++ frontend, PythonCall, ONNX, MLIR/StableHLO) reduce the lock-in penalty.
  6. Tooling maturity — packaging (pip/uv/conda/Poetry/cargo/Pkg), debugging, testing, profiling, formatters, type checkers.

Python — the dominant ML language

The default for nearly everything in ML: roughly 90% of published ML research, and nearly all fine-tuning pipelines and teaching material. The reasons are historical (NumPy 2006, Theano 2008, scikit-learn 2010, TF 2015, PyTorch 2016) and self-reinforcing: every new model ships first as a Python module.

Numerical foundations

  • NumPy — Travis Oliphant 2006 (merger of Numeric + numarray); BSD-3-Clause; the canonical ndarray; underlies essentially every Python ML library. NumPy 2.0 (Jun 2024) was the first major version since 2006: an ABI break, a cleaned-up public API, new type-promotion rules (NEP 50), a 64-bit default integer on 64-bit Windows (previously 32-bit), and better typing.
  • SciPy — sparse matrices, statistics, optimization, signal processing, image processing; 1.13+ in 2024.
  • pandas — Wes McKinney 2008; DataFrames. pandas 2.x (2023+) — PyArrow-backed dtypes (pd.ArrowDtype) optionally replacing the legacy NumPy-backed columns; 3.0 in progress with Copy-on-Write as default.
  • Polars — Ritchie Vink 2020; Rust core, Python bindings; lazy + eager execution; multi-threaded by default; Apache Arrow native. Polars 1.0 (Jul 2024) stabilized the API. Often 5-30× faster than pandas on grouped aggregations and joins. Polars Cloud announced 2024 for distributed Polars on cloud compute.
  • Dask — Matthew Rocklin 2015; out-of-core + distributed pandas/NumPy/scikit-learn parallelism; declining in favor of Polars + Ray, but still common in Anaconda-shop pipelines.
  • modin — RISELab at UC Berkeley → Ponder → Snowflake (which acquired Ponder in 2023); drop-in pandas replacement backed by Ray or Dask.
  • Vaex — out-of-core DataFrame with memory-mapped HDF5; declining in favor of Polars.
  • Ray — RISE Lab Berkeley → Anyscale 2018; distributed Python; Ray Core (actors + tasks), Ray Data, Ray Train, Ray Tune, Ray Serve, RLlib. Anyscale’s pivot to LLM serving and RLHF infrastructure 2024. Used internally at OpenAI, Cohere, Uber, Spotify, Pinterest.
  • cuDF + RAPIDS — NVIDIA 2018; GPU-accelerated pandas + scikit-learn; cuDF (DataFrames), cuML (ML), cuGraph (graph), cuSpatial (geospatial), cuxfilter (interactive viz). cudf.pandas accelerator mode (2023): %load_ext cudf.pandas transparently runs pandas on GPU. Used heavily for ETL on H100/A100 clusters.
  • Ibis — Wes McKinney (post-pandas) 2015; portable DataFrame DSL that compiles to SQL on 20+ backends (DuckDB, Snowflake, BigQuery, Polars, PySpark, Trino, ClickHouse). Ibis 9.0 2024.

ML frameworks

  • scikit-learn — INRIA + Saclay 2010; BSD-3; the classical ML library (linear/logistic regression, trees, SVMs, clustering, dimensionality reduction). Still dominant for tabular and small data; v1.5 (2024).
  • PyTorch — Meta AI Research 2016 (Soumith Chintala et al., evolution of Lua Torch); BSD-style. Donated to Linux Foundation as PyTorch Foundation Sep 2022. The dominant deep-learning framework by 2020+ (~75% of new papers). PyTorch 2.0 (Mar 2023) introduced torch.compile (TorchDynamo + TorchInductor backends; AOTAutograd + PrimTorch lowering); 2.1, 2.2, 2.3 (2023-2024) brought DTensor + FSDP2 + AOTInductor; 2.4 added MPS Metal stability and FX-based dynamic shapes; 2.5/2.6 (late 2024-early 2025) brought torch.export AOT compilation maturity, compiled FSDP2, NJT nested ragged tensors.
  • TensorFlow — Google Brain 2015 (evolution of DistBelief); Apache 2.0. TF 2.x (2019+) refactored around Keras and eager execution. Now largely Google-internal + production legacy; declining in research share. Keras 3 (released Nov 2023) is multi-backend: the same model code runs on TF, JAX, or PyTorch.
  • JAX — Google Brain → Google DeepMind 2018 (Roy Frostig, Matthew James Johnson, Chris Leary); Apache 2.0. Functional + composable transformations (grad, jit, vmap, pmap, pjit, shard_map). Built on XLA. Underlies Gemini training at Google. Surrounding ecosystem: Flax (neural networks), Optax (optimizers), Orbax (checkpointing), Equinox (PyTree-native NN), Penzai (DeepMind 2024; new NN style), MaxText (Google’s reference LLM training in JAX).
  • MLX — Apple Machine Learning Research Dec 2023; MIT license. Apple Silicon-optimized array framework; lazy compute, unified memory across CPU/GPU on M-series chips. NumPy-like API, PyTorch-like NN layer. MLX Swift (2024) — Swift bindings for embedding in iOS / macOS apps. Used by mlx-lm and mlx-vlm packages for on-device LLM inference on Mac.
  • PyTorch Lightning — William Falcon 2019 (Lightning AI); training-loop abstraction on top of PyTorch. Lightning Studio (managed cloud notebooks + jobs).
  • Hugging Face Transformers — Hugging Face 2018 (originally pytorch-transformers); Apache 2.0. The dominant model-zoo + tokenizer + training-loop library. Supports PyTorch + TF + JAX/Flax models. Auto* classes (AutoModel, AutoTokenizer) for model-agnostic loading.
  • Hugging Face datasets — Apache 2.0; Arrow-backed dataset loading + streaming; integrated with HF Hub.
  • Hugging Face accelerate — distributed training abstraction across single-GPU, multi-GPU, multi-node, FSDP, DeepSpeed; the “use any backend” wrapper.
  • Hugging Face peft — parameter-efficient fine-tuning library (LoRA, QLoRA, AdaLoRA, IA³, prefix tuning, prompt tuning).
  • Hugging Face trl — Transformer Reinforcement Learning; SFT + DPO + GRPO + PPO + ORPO for post-training.
  • DeepSpeed — Microsoft 2020; ZeRO memory partitioning (Stage 1/2/3), ZeRO-Infinity (NVMe offload), Mixture-of-Experts. Used at Microsoft for Megatron-Turing NLG; still the backbone of many large-model training runs.
  • Megatron-LM — NVIDIA; pipeline + tensor parallelism reference implementation; Megatron-Core (2024) is the modular library extracted from the original training scripts.
  • LitGPT (Lightning AI; formerly Lit-GPT) / TorchTitan (Meta) — modern, simple PyTorch-native LLM training references.
  • Mosaic / Composer — MosaicML (acquired by Databricks Jun 2023 $1.3B); training-throughput optimizations + benchmarks.
  • vLLM — UC Berkeley Sky Computing Lab 2023 (Woosuk Kwon et al.); PagedAttention memory management; the dominant open LLM-inference engine. vLLM v0.6+ (late 2024) with structured output, prefix cache, speculative decoding mature.
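
The ZeRO stages named in the DeepSpeed entry trade communication for memory by partitioning training state across data-parallel ranks. The arithmetic can be sketched as below, assuming mixed-precision Adam at 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state), which is the accounting commonly used to explain ZeRO. Activations, temporary buffers, and fragmentation are ignored, and `zero_model_state_gb` is an illustrative name, not a DeepSpeed API.

```python
def zero_model_state_gb(params: float, gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory in GB (10**9 bytes)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    weights = 2.0 * params   # fp16 parameters
    grads = 2.0 * params     # fp16 gradients
    optim = 12.0 * params    # fp32 master weights + Adam momentum + variance
    if stage >= 1:           # Stage 1: partition optimizer state
        optim /= gpus
    if stage >= 2:           # Stage 2: also partition gradients
        grads /= gpus
    if stage >= 3:           # Stage 3: also partition the parameters
        weights /= gpus
    return (weights + grads + optim) / 1e9


if __name__ == "__main__":
    # A 7.5B-parameter model on 64 data-parallel GPUs (assumed sizes).
    for stage in range(4):
        print(f"stage {stage}: {zero_model_state_gb(7.5e9, 64, stage):.1f} GB per GPU")
```

Under these assumptions the model state shrinks from 120 GB per GPU with plain data parallelism to under 2 GB with Stage 3, which is why Stage 3 (plus offload) is what makes very large models trainable on commodity-sized accelerators.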

Experiment tracking and ML platforms

  • MLflow — Databricks 2018; open-source experiment tracking + model registry; MLflow 3.0 (2025) is GenAI-native with deep traces and prompt versioning.
  • Weights & Biases — Lukas Biewald 2018; hosted experiment tracking (SaaS-first, with self-managed deployment options). W&B Weave (2024) is the LLM trace + eval tool.
  • ClearML — Allegro AI; open-source full stack.
  • Hydra — Facebook 2019; hierarchical YAML config management; the de facto standard for ML config in research.
  • Optuna — Preferred Networks 2018; Bayesian hyperparameter optimization.
  • DVC (Data Version Control) — Iterative AI; git-style versioning for datasets + model artifacts.
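
Hierarchical config management of the kind Hydra provides comes down to two operations: layering one config tree over another, and applying dotted command-line overrides. The sketch below shows that idea with the standard library only. It is not Hydra's API; `deep_merge`, `apply_override`, and the config keys are hypothetical names chosen for illustration.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested keys in `override` win over `base`."""
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


def apply_override(cfg: dict, assignment: str) -> dict:
    """Apply one 'a.b.c=value' override; values parse as int, then float, else str."""
    path, raw = assignment.split("=", 1)
    value = raw
    for cast in (int, float):
        try:
            value = cast(raw)
            break
        except ValueError:
            continue
    out = copy.deepcopy(cfg)
    node = out
    *parents, leaf = path.split(".")
    for name in parents:
        node = node.setdefault(name, {})   # create intermediate groups on demand
    node[leaf] = value
    return out


if __name__ == "__main__":
    defaults = {"model": {"name": "small", "layers": 4},
                "optimizer": {"name": "adam", "lr": 0.01}}
    experiment = {"model": {"layers": 12}}          # an experiment-specific layer
    cfg = apply_override(deep_merge(defaults, experiment), "optimizer.lr=0.001")
    print(cfg)
```

Both helpers return new dictionaries rather than mutating their inputs, so a base config can be reused safely across many sweep trials.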

Inference serving

  • Triton Inference Server (NVIDIA) — multi-framework production server; covered in Tier 2 inference notes.
  • TorchServe — PyTorch Foundation’s serving; declining as torch.export + AOTInductor mature.
  • TensorFlow Serving — Google; legacy.
  • BentoML — BentoML 2020; Python-native packaging + serving + autoscaling.
  • Ray Serve — Anyscale; Ray-backed serving with composable deployments.
  • Streamlit — Streamlit Inc. → Snowflake (acquired Mar 2022, $800M); rapid Python data apps. Snowflake Streamlit-in-Snowflake (2023+).
  • Gradio — Hugging Face (acquired 2021); rapid demo UIs for ML models. Gradio 5.0 (2024) brought significant UX improvements.
  • FastAPI — Sebastián Ramírez 2018; the default Python web framework for ML inference services. Used by Hugging Face Inference Endpoints, OpenAI, Anthropic, Cohere internally.

Packaging and tooling

  • uv — Astral 2024; Rust-written Python package manager; 10-100× faster than pip; manages venvs + Python versions; the rising default.
  • Poetry — declarative + lock-file Python packaging; still common in libraries.
  • conda + mamba + micromamba — Anaconda’s package manager; mamba is the C++ rewrite of conda for speed; micromamba is the standalone smaller binary. pixi (Prefix.dev 2023) is the newer conda-compatible package manager in Rust.
  • pip + pip-tools — venerable; still works.
  • rye — Armin Ronacher 2023 → handed to Astral 2024; now folded into uv.

Type checking and testing

  • mypy — Jukka Lehtosalo (long developed at Dropbox); the original Python type checker.
  • pyright — Microsoft; faster, more accurate; powers Pylance in VS Code.
  • ruff — Astral 2022; Rust-written linter + formatter; replacing flake8 + isort + black + pyupgrade.
  • pytest — pytest-dev 2009; the de facto test framework.
  • hypothesis — David MacIver 2013; property-based testing for Python.
  • pytest-xdist — parallel test execution.
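
Property-based testing, as offered by hypothesis, replaces hand-picked test cases with an invariant that is checked against many generated inputs. The sketch below hand-rolls that loop with the standard library to show the idea on a numerically stable softmax. It does not use the Hypothesis API and omits the input shrinking that the real library performs; `check_property` and `random_vector` are hypothetical helper names.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_vector(rng, max_len=8, scale=50.0):
    """Generate a non-empty list of floats in [-scale, scale]."""
    n = rng.randint(1, max_len)
    return [rng.uniform(-scale, scale) for _ in range(n)]


def check_property(prop, trials=200, seed=0):
    """Return the first generated input that violates `prop`, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = random_vector(rng)
        if not prop(xs):
            return xs
    return None


def sums_to_one(xs):
    return math.isclose(sum(softmax(xs)), 1.0, rel_tol=1e-9)


def shift_invariant(xs):
    # Adding a constant to every logit must not change the probabilities.
    shifted = softmax([x + 3.0 for x in xs])
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
               for a, b in zip(softmax(xs), shifted))


if __name__ == "__main__":
    print(check_property(sums_to_one), check_property(shift_invariant))
```

A failing property returns the offending input, which is the piece of information a property-based framework then tries to minimize before reporting.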

Rust — the rising ML systems language

Rust’s adoption in ML accelerated dramatically 2022-2026. The pattern: keep Python on top for the user-facing API, but rewrite the hot path in Rust. Tokenizers (Hugging Face, Rust core, Python binding), Polars (Rust core), PyO3 (Rust↔Python bindings), and maturin (build tool) made this approach standard.

  • candle — Hugging Face 2023; minimalist tensor library written in Rust by Laurent Mazaré. No-JVM, no-Python, no-CUDA-required deployment. Supports CUDA, Metal (Apple), CPU. Apache 2.0. Used in production for Hugging Face’s Inference Endpoints and many edge-inference deployments. Supports F16, BF16, F32, quantized GGUF.
  • burn-rs (Burn) — Tracel AI 2023 (Nathaniel Simard); PyTorch-like API but pure Rust; pluggable backends (Candle, LibTorch, NdArray, WGPU). Apache 2.0. Burn 0.14 (2024).
  • dfdx — Corey Lowman 2022; deep-learning library with strong type-level shape checking — tensor shapes are part of the compile-time type, eliminating an entire class of runtime errors.
  • linfa — Luca Palmieri (Mainmatter) 2019+; scikit-learn-equivalent for classical ML in pure Rust.
  • polars-rs — see Python section; the Rust crate is itself a popular DataFrame library.
  • nalgebra + ndarray — Sébastien Crozet’s linear algebra (nalgebra) and the NumPy-equivalent (ndarray); foundational.
  • plotters — pure-Rust plotting.
  • rust-bert — Guillaume Becquin; pre-trained BERT/T5/GPT-2/RoBERTa wrappers via tch-rs.
  • tch-rs — LibTorch FFI bindings for Rust (the same C++ PyTorch backend, accessible from Rust).
  • Mistral.rs — Eric Buehler 2024; vLLM-equivalent in Rust; supports Mistral, Llama, Phi, Gemma, Qwen, etc.; PagedAttention; ISQ in-situ quantization; OpenAI-compatible API. MIT license. Notable for very low memory overhead vs Python-based vLLM.
  • tokio + axum + actix-web — the async runtimes and web frameworks Rust uses for serving ML inference.
  • cargo + crates.io — Rust’s package manager and registry; widely admired for ergonomics.
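
Both vLLM and Mistral.rs are described above as using PagedAttention, which manages the KV cache in fixed-size blocks, much like pages in an operating system, instead of reserving one contiguous buffer per sequence. The sketch below models only the bookkeeping (a free list plus a per-sequence block table). Real engines store key/value tensors in the blocks and add sharing and preemption; `PagedKVCache` and its methods are hypothetical names, not either project's API.

```python
class PagedKVCache:
    """Bookkeeping for a block-allocated KV cache (no tensors stored)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}                      # seq_id -> list of physical block ids
        self.lengths = {}                           # seq_id -> number of tokens stored

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:           # first token, or last block is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks; a real scheduler would preempt")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=2)
    print([cache.append_token("a") for _ in range(3)])
    cache.free("a")
    print(len(cache.free_blocks))
```

Because a sequence only ever wastes the unused tail of its last block, memory that a contiguous allocator would reserve for the maximum sequence length stays available for other requests.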

Julia — scientific computing’s modern try

Julia (Jeff Bezanson, Stefan Karpinski, Viral Shah, Alan Edelman; first release 2012; 1.0 in 2018) was designed from day one for high-performance numerical computing. MIT-licensed. The promise was “looks like Python, runs like C” via LLVM JIT. Julia’s adoption in ML never reached Python scale but it is deeply entrenched in scientific computing, differential equations, control theory, computational physics — where its multiple dispatch + JIT + first-class arrays earn the keep.

  • Flux.jl — Mike Innes (originally at Julia Computing) 2018; pure Julia neural network library; MIT. Differentiated by Zygote.jl (source-to-source automatic differentiation; differentiate any Julia function).
  • Lux.jl — Avik Pal 2022+; more explicit / functional alternative to Flux; better for distributed training. Becoming preferred for new Julia ML work.
  • SciML ecosystem — Chris Rackauckas + community; DifferentialEquations.jl (the most comprehensive ODE/SDE/DAE/PDE solver suite in any language), ModelingToolkit.jl (symbolic-numeric modeling), Catalyst.jl (chemical reaction networks). Used at MIT, Caltech, Nvidia (in some research), pharma R&D.
  • Symbolics.jl — symbolic computation in pure Julia; competitive with SymPy + Mathematica for many tasks.
  • MLJ.jl — Anthony Blaom (Alan Turing Institute); scikit-learn-equivalent meta-framework.
  • Turing.jl — Cambridge; probabilistic programming; comparable to Stan + PyMC.
  • CUDA.jl — Tim Besard; idiomatic GPU programming in pure Julia (compile Julia kernels to PTX directly); no C++ required.
  • Knet.jl — Deniz Yuret; legacy framework; declining.
  • Pluto.jl — Fons van der Plas; reactive notebooks (cells re-run automatically as dependencies change). Often preferred over IJulia for teaching.
  • IJulia — Jupyter kernel.
  • PythonCall + JuliaCall — bidirectional interop with Python; widely used.
  • PyCall — older Julia-from-Python and Python-from-Julia bridge.
  • RCall, JavaCall — bridges to R, Java.
  • Julia 1.10 (Dec 2023) — multithreaded garbage collection, a new default parser (JuliaSyntax.jl), package precompilation improvements, ~2× compile-time reduction.
  • Julia 1.11 (Oct 2024) — Memory{T} lower-level memory primitive, more compile time wins.
  • Commercial: Julia Computing → JuliaHub (2018+); cloud service for Julia workloads; HPC partnerships; pivoted to scientific simulation + digital twins market in 2024 rather than competing head-on with Python ML.
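
The reactive behavior noted for Pluto.jl (cells re-run automatically as dependencies change) is at heart a scheduling problem: find every cell downstream of the edited one and run them in dependency order. The sketch below shows that step using Python's standard `graphlib`. Pluto infers the dependency graph from the variables each cell defines and uses; here the graph is written by hand, and `cells_to_rerun` and the cell names are hypothetical.

```python
from graphlib import TopologicalSorter


def cells_to_rerun(deps: dict[str, set[str]], edited: str) -> list[str]:
    """deps maps each cell to the cells it reads from.

    Returns the edited cell plus everything downstream of it,
    in an order where every cell runs after its dependencies.
    """
    stale = {edited}
    changed = True
    while changed:                       # propagate staleness until a fixed point
        changed = False
        for cell, parents in deps.items():
            if cell not in stale and parents & stale:
                stale.add(cell)
                changed = True
    # Sort only the stale cells, keeping the edges among them.
    graph = {cell: deps.get(cell, set()) & stale for cell in stale}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    notebook = {
        "load": set(),
        "clean": {"load"},
        "plot": {"clean"},
        "stats": {"clean"},
        "notes": set(),
    }
    print(cells_to_rerun(notebook, "clean"))
```

Cells that do not depend on the edit (here `load` and `notes`) are left alone, which is what keeps a reactive notebook responsive as it grows.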

Swift — Apple’s edge/on-device ML stack

Swift’s role in ML is almost entirely on-device Apple inference. The earlier “Swift for TensorFlow” (S4TF) attempt (Chris Lattner et al. at Google, 2018-2021) was archived in February 2021; Lattner had left Google in 2020 and co-founded Modular in 2022. Apple kept investing in Core ML and now MLX.

  • Core ML — Apple 2017+; the on-device inference framework for iOS, iPadOS, macOS, watchOS, tvOS, visionOS. Models are converted from PyTorch or TensorFlow via coremltools. The default path for apps that run a model locally on Apple devices.
  • Core ML Tools (coremltools) — Python library, maintained by Apple, that converts PyTorch / TF models to Core ML’s .mlmodel and newer .mlpackage formats; the direct ONNX converter was removed in recent releases.
  • CreateML — Swift framework + macOS app for training simple models (classification, regression, sound, image) without writing Python.
  • MLX — covered above; Apple’s pure-Apple-Silicon array framework (Dec 2023).
  • MLX Swift — 2024; Swift bindings to MLX, suitable for shipping MLX-powered inference in iOS / macOS apps. Lets developers run Llama / Mistral / Phi on-device with a Swift API.
  • Swift for TensorFlow (S4TF) — Chris Lattner et al. 2018-2021; archived February 2021, about a year before Lattner co-founded Modular. The dream of “differentiable Swift” lives on in concepts but not in code.
  • TensorFlow Lite Swift — Google’s edge-TF Swift bindings; declining vs Core ML for iOS.

Mojo — Python’s compiled descendant

Mojo (Modular Inc.; Chris Lattner + Tim Davis 2023) is the most ambitious new language in ML circa 2024-2026. The pitch: a superset of Python that compiles to native code via MLIR, with strong static typing, value semantics, ownership (Rust-like), and SIMD primitives built in. Modular’s launch material claimed speedups of up to 35,000× over pure Python on compiled paths (a Mandelbrot benchmark from the Modular blog, not a typical end-to-end workload).

  • Public preview May 2023; open-sourced in stages 2024 (stdlib Apache 2.0). Compiler still closed-source as of late 2024 but on a public-roadmap path.
  • MAX (Modular Accelerated Xecution) Platform — Modular’s commercial inference platform; runs PyTorch + ONNX models with Mojo-compiled kernels; competitive with TensorRT and vLLM on benchmarks.
  • Position: not a replacement for Python at the user-facing API level, but a way to write kernels and hot-paths in Python-like syntax that compile to GPU/CPU directly without dropping into CUDA C++ or Triton.
  • Adoption is still small (low thousands of practitioners) but the ML compiler community is paying close attention given Lattner’s track record (LLVM, Clang, Swift, MLIR).
  • Modular Inc. funding: $100M round in Aug 2023; $250M round in Sep 2025 at a $1.6B valuation.

Scala / JVM — Spark, H2O, and the legacy DL4J era

  • Apache Spark + MLlib — Apache 2.0; written in Scala and run on the JVM. Spark itself (RDD + DataFrame + Dataset APIs) is essential for large-scale batch ETL and feature engineering; MLlib is the bundled Scala/Java ML library for distributed classical ML (linear / tree / clustering / collaborative filtering). Mostly used for tabular and recommendation-system feature engineering rather than deep learning.
  • H2O.ai — H2O.ai Inc. 2012 (Sri Ambati + Cliff Click). Open-source H2O-3 (Java/Scala/Python/R distributed ML platform) + Driverless AI (AutoML) + H2O Wave (Python apps) + h2oGPT (open-source LLMs, hosted) + H2O Document AI (commercial 2023+). Erin LeDell led much of the H2O-3 work. Used in financial-services and insurance.
  • BigDL (now IPEX-LLM) — Intel 2017; distributed DL on Spark + Intel hardware; rebranded to focus on Intel oneAPI/OpenVINO integration.
  • DeepLearning4J (DL4J) — Eclipse Foundation; Skymind founded 2014, dissolved 2019; the project moved to Konduit AI. Declining but still used in some Java-shop production environments. ND4J was the underlying tensor library.
  • Smile (Statistical Machine Intelligence and Learning Engine) — Haifeng Li 2014; Scala/Java; classical ML.
  • ScalaPy — Scala-Python interop; lets Scala code call Python ML libraries.
  • Apache Beam Scala SDK (SCIO) — Spotify-led Scala wrapper around Apache Beam; widely used for streaming pipelines on Dataflow.
  • Akka + Pekko + Kafka Streams — JVM streaming primitives often paired with ML inference in production microservice deployments.

Kotlin — JetBrains and decline

  • KotlinDL — JetBrains 2020 launch; TensorFlow-backed; deprecated 2024 by JetBrains. The bet on Kotlin-native deep learning didn’t materialize.
  • KMath — Kotlin Multiplatform mathematics library; small but actively developed.
  • ND4J + DL4J Kotlin — via Java interop; same decline as DL4J.
  • Kotlin’s stronger ML role is in Android on-device inference (TensorFlow Lite Kotlin bindings + MediaPipe).

Go — primarily a serving language

Go’s ML library scene is small; its role in ML is deployment of inference services rather than training.

  • Gorgonia — Chewxy (Xuanyi Chew) 2016-onwards; PyTorch-equivalent symbolic graph + autodiff in pure Go. Apache 2.0. Sparsely maintained but still works.
  • Gonum — Go numerical libraries (linear algebra, statistics, optimization, plotting). Solid but small ML community.
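  • onnx-go — Go library that imports ONNX models into a Go computation graph (for example with Gorgonia as the backend); not a binding to ONNX Runtime.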
  • Gorgonia-Lab — research extensions to Gorgonia.
  • BentoML’s Go runtime — production serving wrapper.
  • Inferno, Triton Go clients, TF Lite Go — the bigger story: Go is widely used as the serving language wrapping Triton / TF Serving / ONNX Runtime for production inference because of its concurrency model and easy deployment.

C++ — kernels, frameworks’ guts, and edge inference

C++ is the second most important ML language by deployed-binary count, even if almost no one writes new ML research in it. Under the hood, PyTorch, TF, JAX (via XLA), Triton, vLLM, llama.cpp, MLX, ONNX Runtime, and TensorRT are all largely C++ (with CUDA where GPUs are involved).

  • LibTorch — PyTorch C++ frontend; identical model semantics to Python PyTorch; for embedding PyTorch in C++ services.
  • Eigen — Gaël Guennebaud + Benoît Jacob 2009; the header-only template C++ linear algebra library. Used inside TF, Ceres, ROS, many others.
  • Armadillo — Conrad Sanderson (NICTA, now Data61/CSIRO); research-oriented; high-level linear algebra.
  • dlib — Davis King; toolkit including face recognition, classification, regression, optimization.
  • ONNX Runtime — Microsoft; the cross-framework C++ runtime; supports CPU, CUDA, TensorRT, OpenVINO, ROCm, CoreML, NNAPI, WebGPU execution providers. Wide adoption in production.
  • TensorRT — NVIDIA; optimizing inference compiler for NVIDIA GPUs; TensorRT-LLM (2023+) for transformer inference; in-flight batching, paged KV cache, FP8 quantization.
  • MNN — Alibaba; mobile / edge inference; competitive with TFLite.
  • ncnn — Tencent 2017; ARM-mobile-optimized; used in Tencent apps.
  • cppflow — Serizba; lightweight TF C API wrapper for C++.
  • TNN — Tencent’s other inference framework; for production cross-platform.
  • OpenCV — Intel 1999 → OpenCV.org Foundation; computer vision library with DNN module supporting ONNX, TF, Caffe, Darknet, Torch.
  • mlpack — Ryan Curtin et al.; header-only modern C++ ML.
  • Caffe + Caffe2 — Berkeley (Yangqing Jia 2013) + Facebook (2017); merged into PyTorch 2018; legacy but still seen in some embedded deployments.
  • Triton Inference Server — NVIDIA; multi-backend inference server in C++/Python.
  • TorchScript — PyTorch’s earlier serializable IR (mostly superseded by torch.export + AOTInductor in 2024).

C — the lowest-level layer, ggml and llama.cpp

C’s modern resurgence in ML is single-handedly driven by Georgi Gerganov and the ggml ecosystem.

  • ggml — Georgi Gerganov 2022; tensor library in pure C. Supports CPU (AVX, AVX-512, NEON), CUDA, Metal, ROCm, Vulkan (natively and, historically, via Kompute), OpenCL, SYCL. The foundation of:
  • llama.cpp — Gerganov 2023; runs Llama and ~50 other open LLMs on a laptop / phone / Raspberry Pi via quantization (Q4/Q5/Q6/Q8 + the K-quants). MIT license. GGUF file format (replacing GGML format in 2023) is the lingua franca for shipping quantized models — used by LM Studio, Ollama, Jan, GPT4All, oobabooga, every consumer LLM app.
  • whisper.cpp — Gerganov 2022; OpenAI Whisper speech recognition in C. Runs on iPhone.
  • bark.cpp, stable-diffusion.cpp — ggml ports of Bark TTS and Stable Diffusion image generation.
  • ANN libraries with C interfaces — FAISS C API, Annoy C bindings; widely embedded.
  • ONNX Runtime C API — for C-only embedded inference.
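
The quantized formats mentioned in the llama.cpp entry store weights in small blocks that share one scale factor. The sketch below shows a simplified symmetric 4-bit block scheme with 32 weights per block, to make the storage cost and the rounding error concrete. It is an illustration under stated assumptions, not ggml's exact Q4_0 bit layout or rounding rule, and the K-quants are more elaborate; all function names are hypothetical.

```python
def quantize_block(weights: list[float]) -> tuple[float, list[int]]:
    """Map one block of floats to a shared scale plus signed 4-bit ints in [-8, 7]."""
    amax = max(abs(w) for w in weights)
    scale = amax / 7.0 if amax > 0 else 1.0        # avoid dividing by zero
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q


def dequantize_block(scale: float, q: list[int]) -> list[float]:
    """Reconstruct approximate floats from a quantized block."""
    return [scale * v for v in q]


def quantize(weights: list[float], block_size: int = 32):
    """Quantize a flat weight list block by block."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def bits_per_weight(block_size: int = 32, scale_bits: int = 16, value_bits: int = 4) -> float:
    """Storage cost: 4-bit values plus one 16-bit scale amortized over the block."""
    return value_bits + scale_bits / block_size


if __name__ == "__main__":
    block = [(i - 16) / 16 for i in range(32)]      # synthetic weights in [-1, 1)
    scale, q = quantize_block(block)
    worst = max(abs(w - d) for w, d in zip(block, dequantize_block(scale, q)))
    print(bits_per_weight(), worst <= scale / 2 + 1e-12)
```

With a 16-bit scale per 32 weights, the cost works out to 4.5 bits per weight, and the reconstruction error per weight is bounded by half the block's scale.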

Haskell — research only

  • Hasktorch — Hasktorch organization; PyTorch (libtorch) bindings for Haskell. Active but small community.
  • Backprop, Determinant — pedagogical research libraries for autodiff in Haskell.
  • The Haskell-for-ML community is small but produces interesting type-level work (shape-checked tensors via dependent types, etc.).

F# / .NET — Microsoft’s ML.NET stack

  • ML.NET — Microsoft 2018; C#/F# cross-platform classical-ML library with AutoML. Used in some enterprise .NET shops; the bridge for “we have a giant C# codebase and want some ML.”
  • ONNX Runtime C# bindings — first-class .NET inference.
  • TorchSharp — official .NET binding to libtorch; close API parity with PyTorch Python.
  • Microsoft Semantic Kernel — C# (and Python and Java) LLM orchestration library; Microsoft’s answer to LangChain.
  • Azure ML SDK for .NET — managed training and deployment via Azure ML.

JavaScript / TypeScript — browser, edge, and Node serving

JS-for-ML has exploded 2022-2026 driven by browser-side inference, transformers.js, and the WebGPU rollout.

  • TensorFlow.js — Google 2018; runs TensorFlow models in browser (WebGL backend, WebGPU backend since 2023, WASM backend) and Node.js. Apache 2.0. Used at Twitter (image cropping), Pinterest (visual search inline), and many product-website “try-on” demos.
  • ONNX Runtime Web — Microsoft; ONNX models in the browser; WebGPU EP since 2023; WebGL fallback. Often faster than TF.js for the same model.
  • transformers.js — Hugging Face (Xenova) 2023; Hugging Face Transformers in the browser. Models load from HF Hub, run in-browser via ONNX Runtime Web + WebGPU. Supports embeddings, text classification, summarization, translation, ASR (Whisper-tiny in-browser), TTS, image-classification, zero-shot. v3 (2024) brought WebGPU support and broader model coverage.
  • MediaPipe Solutions — Google; vision + audio + ML solutions for the web (Tasks API: face landmarker, hand landmarker, object detector, image segmenter, gesture recognizer, audio classifier, LLM inference experimental). Native iOS / Android / Web; WebAssembly + WebGPU.
  • Brain.js — JS-native simple neural networks; declining; mostly used in coding tutorials.
  • Synaptic — older JS NN library; abandoned.
  • PyScript — Anaconda 2022; runs Python (via Pyodide WebAssembly) in the browser including NumPy + scikit-learn; for educational and notebook-like web apps.
  • LangChain.js — JS port of LangChain; integrates with Pinecone, Qdrant, Weaviate, OpenAI, Anthropic, etc.
  • LlamaIndex.ts — JS port of LlamaIndex.
  • Vercel AI SDK — Vercel 2023; TypeScript-native LLM orchestration + streaming + tool use; the rising default for Next.js + AI app developers.
  • Modelfusion — TS-native multi-provider AI library.
  • Mastra, Vellum, CrewAI.js — agent frameworks in TS.

R — statistics first, ML second

R remains the language of academic statistics, biostatistics, social science, and finance quant research. ML role is secondary but real.

  • caret — Max Kuhn 2007; the long-standing ML meta-package; superseded by:
  • tidymodels — Max Kuhn + Hadley Wickham (Posit); modular framework: parsnip (model interface), recipes (preprocessing), rsample (resampling), workflows (pipelines), tune (hyperparameter), yardstick (metrics). The modern R ML stack.
  • torch for R — Posit; R interface to libtorch; lets R users access PyTorch’s neural networks without leaving R.
  • keras + tensorflow R packages — Posit; R bindings to Keras/TF.
  • mlr3 — Bernd Bischl et al. (LMU Munich); the OOP-style alternative to tidymodels.
  • ranger, xgboost R, lightgbm R, catboost R — the tree ensembles, all available in R.
  • tidyverse + data.table — data manipulation; data.table for performance, tidyverse for ergonomics.
  • Quarto — Posit 2022; reproducible scientific publishing; supports R + Python + Julia + Observable; the successor to R Markdown.
  • shiny — Posit; interactive R web apps; commonly used to ship ML model results internally.

MATLAB and Simulink

  • MATLAB — MathWorks; proprietary; expensive site licenses standard at universities + EE/control companies.
  • Deep Learning Toolbox — MathWorks’ DL framework; supports ONNX import/export, GPU training.
  • Simulink — block-diagram modeling environment; dominant in aerospace + automotive control systems.
  • ML role is primarily research + EE/control + legacy bridge to production (e.g., automotive engineers train networks in MATLAB then export to C/C++ for ECU deployment via MATLAB Coder + Simulink Embedded Coder).

GPU programming languages and shading

The “language” of GPU kernels itself is a category that crosses every ecosystem above.

  • CUDA C/C++ — NVIDIA 2007; the dominant GPU programming language for ML; nvcc compiles to PTX.
  • cuDNN, cuBLAS, NCCL, CUTLASS, cuFFT, cuSPARSE — NVIDIA’s domain libraries; everyone uses them.
  • Triton — OpenAI 2021 (Philippe Tillet); Python DSL for writing GPU kernels that compile down to PTX via MLIR. Used inside PyTorch’s torch.compile for the GPU code-gen path (TorchInductor → Triton → PTX). MIT license. Significantly simpler than raw CUDA C++ for the typical fused-kernel use case.
  • CUDA Python (Numba CUDA) — Anaconda; @cuda.jit decorators for Python functions to compile to PTX.
  • OpenCL — Apple + Khronos 2009; vendor-agnostic GPU API; mostly displaced by CUDA on NVIDIA and Vulkan/Metal elsewhere.
  • HIP — AMD’s CUDA-portability layer; hipify-perl translates CUDA C++ → HIP nearly automatically; runs on AMD MI300X + NVIDIA via HIP-NVIDIA backend.
  • SYCL — Khronos 2014; cross-platform single-source C++ for accelerators; Intel’s oneAPI DPC++ is the leading implementation.
  • Metal Performance Shaders (MPS) — Apple’s GPU compute API; PyTorch MPS backend uses this for Mac GPU acceleration.
  • WebGPU + WGSL — W3C (GPU for the Web working group); the browser GPU API shipped in Chrome 113 (2023), with Firefox and Safari following in 2025. Underlies WebGPU backends for TF.js, ONNX Runtime Web, MediaPipe.
  • Vulkan compute — for cross-vendor GPU compute outside the browser.

ML compilers and intermediate representations

The IR layer is converging on a small number of standards driven mostly by Chris Lattner’s career arc (LLVM → Clang → Swift → MLIR → Mojo).

  • MLIR (Multi-Level Intermediate Representation) — Chris Lattner + Google 2019; donated to LLVM Foundation. Extensible IR with multiple dialects (Linalg, Tensor, SCF, GPU, SPIRV). Foundation of all modern ML compilers.
  • XLA (Accelerated Linear Algebra) — Google; original TF + JAX compiler; powers TPU compilation and a lot of GPU codegen.
  • StableHLO — successor to MHLO/HLO; the portable version of XLA’s HLO IR; now a separate Apache 2.0 specification.
  • OpenXLA — collaboration (Google + Meta + AMD + NVIDIA + Apple + Hugging Face + others) 2023; the multi-stakeholder home for XLA + StableHLO + IREE.
  • IREE (Intermediate Representation Execution Environment) — Google ML compiler; lowers StableHLO to CPU/GPU/Vulkan/CUDA targets via MLIR.
  • Triton — see above.
  • Apache TVM + TVM Unity — Apache 2.0; UWash + community ML compiler; lowers ONNX/TF/PyTorch models to optimized kernels. TVM Unity (2023+) is the modern dataflow-IR redesign.
  • TensorRT — NVIDIA-specific; see above.
  • OpenVINO — Intel; CPU + iGPU + VPU.
  • ONNX Runtime EP — execution providers (CUDA, TensorRT, ROCm, OpenVINO, CoreML, NNAPI, WebGPU) — turn ONNX Runtime into an inference router over all the above compilers.
  • Others — AITemplate (Meta; 2022 inference compiler; quieter in 2024), Hidet (CentML; NVIDIA’s acquisition of CentML was reported in 2025).

Hardware-vendor ML ecosystems

  • NVIDIA: CUDA + cuDNN + NCCL + cuBLAS + CUTLASS + TensorRT + TensorRT-LLM + Triton Inference Server + NIM (NVIDIA Inference Microservices) + NeMo training framework + Megatron-Core + RAPIDS (cuDF, cuML, cuGraph). End-to-end stack.
  • AMD: ROCm (CUDA-equivalent runtime) + MIGraphX (graph compiler) + HIP (portability) + RCCL (NCCL-equivalent). MI300X has displaced some H100 deployments in 2024-2026 inference workloads.
  • Intel: oneAPI (DPC++/SYCL + libraries) + OpenVINO + IPEX (Intel PyTorch Extension) + Gaudi 3 + Tiber developer cloud.
  • Apple: MLX + Core ML + Core ML Tools + Neural Engine (ANE) hardware + MPS.
  • Qualcomm: AI Hub (model zoo + benchmark + on-device deployment) + AI Engine + Hexagon NPU.
  • Google: TPU + XLA + JAX + Pallas (TPU/GPU kernel DSL on top of JAX, 2024+).
  • Cerebras: WSE-3 + Cerebras Inference (very-fast LLM serving via huge on-chip SRAM).
  • Groq: LPU + GroqCloud (high-throughput LLM inference).
  • SambaNova: SN40L + DataScale + Samba-1 trillion-param model.
  • Tenstorrent: Wormhole/Grayskull + TT-Metal + Buda compiler.

Cross-ecosystem trends

  1. Python remains ~90% of new ML projects despite real challengers; the network effect of NumPy + PyTorch + Hugging Face is decisive.
  2. Rust rising for ML systems — Mistral.rs, candle, Polars, tokenizers, uv, ruff. Pattern: Python UX layer + Rust core. Pure Rust ML frameworks (burn, dfdx) remain a small slice.
  3. Mojo positioning as Python-fast-path — the only language that explicitly attacks the “Python is slow” problem at the language level rather than via bindings. Adoption still small but watch.
  4. Julia steady in scientific computing — has not broken into mainstream ML and seems unlikely to, but has a durable niche in DiffEq + control + scientific machine learning.
  5. Swift + MLX for Apple ecosystem — Apple Silicon’s unified memory makes on-device inference of 7B-70B models genuinely usable; Swift APIs make integration into apps trivial.
  6. JS for browser and edge inference exploding — transformers.js + WebGPU make zero-server ML demos commonplace; “your data never leaves the browser” privacy story is real.
  7. ML compilers consolidating around MLIR + StableHLO — OpenXLA, Triton, IREE, and Mojo all build on MLIR, while TVM Unity keeps its own IR (Relax + TensorIR); the IR-fragmentation era is ending.
  8. CUDA monopoly weakening at the edges — AMD ROCm + Intel Gaudi + Apple MLX + Cerebras + Groq + SambaNova all gaining serious inference share, even if NVIDIA still owns training overwhelmingly.
  9. Inference frameworks proliferating — vLLM (Python), mistral.rs (Rust), llama.cpp (C/C++), MLX (Swift/Python), ExecuTorch (PyTorch mobile), MLC-LLM (TVM), MAX (Modular), TensorRT-LLM (NVIDIA), GroqCloud, Cerebras Inference. The pattern: each accelerator vendor ships its own optimized runtime, and ONNX Runtime + Triton remain the cross-vendor gateways.
  10. Tooling renaissance in Python — uv, ruff, polars, and pixi are collectively rewriting the Python toolchain in Rust at 10-100× speed, with pyright (written in TypeScript) covering type checking. Python development in 2026 is dramatically more pleasant than in 2020.
  11. R holds in statistics + biotech; Quarto + tidymodels keep the platform fresh.
  12. Kotlin and Scala ML losing share to Python everywhere except where the JVM is non-negotiable (banks, finance, big-data shops with Spark).

Language-by-deployment-target summary

Deployment target | Default language stack
LLM training (large clusters) | Python + PyTorch + DeepSpeed/Megatron + CUDA
LLM training (Google TPU) | Python + JAX + Flax + XLA
LLM inference server (cloud) | Python + vLLM or Rust + mistral.rs or C++ + TensorRT-LLM
LLM inference (laptop / desktop) | C/C++ + llama.cpp (GGUF) or Python + MLX
LLM inference (iOS / macOS) | Swift + MLX or Swift + Core ML
LLM inference (Android) | Kotlin/Java + TF Lite + MediaPipe LLM
LLM inference (browser) | JS + transformers.js + WebGPU
Embeddings + RAG pipeline | Python + sentence-transformers / Voyage / Cohere SDK
Classical ML (tabular) | Python + scikit-learn + xgboost / lightgbm / catboost
Distributed feature ETL | Python + Polars or PySpark / Scala + Spark
Real-time inference (low-latency) | Go + Triton/ONNX Runtime or Rust + candle
Computer vision (research) | Python + PyTorch + timm + OpenCV
Computer vision (edge) | C++ + OpenCV + ONNX Runtime + TensorRT
Scientific computing + ODE/PDE | Julia + SciML or Python + JAX
Statistical analysis + biostatistics | R + tidymodels + Quarto
Control / EE / aerospace | MATLAB + Simulink + (export to C/C++)
Robotics | C++ + Python + ROS 2 (rcl + rclpy + rclcpp)
Reinforcement learning (research) | Python + JAX + Brax or PyTorch + Stable-Baselines3
Reinforcement learning (industrial) | Python + Ray RLlib
Recommender systems | Python + PyTorch + TorchRec or Spark MLlib
Time-series forecasting | Python + statsforecast + neuralforecast or R + fpp3
AutoML | Python + AutoGluon / H2O / FLAML / TPOT
Notebooks + reproducible research | Python (Jupyter / Quarto) or Julia (Pluto / IJulia) or R (Quarto / RMarkdown)

Notebook environments

The notebook is the universal ML scratchpad regardless of language.

  • Jupyter — Project Jupyter (Fernando Pérez; evolved from IPython 2001+); BSD; supports 40+ kernels via the kernel protocol. JupyterLab (the modern IDE-style interface) + Jupyter Notebook (classic) + JupyterHub (multi-user) + Jupyter Book (publishing) + nbconvert + Voilà.
  • VS Code Notebooks — Microsoft 2021+; the Jupyter protocol implemented in VS Code; rapidly displacing standalone JupyterLab for many users because it bundles the rest of the IDE.
  • Google Colab — Google 2017; free Jupyter-in-the-cloud with GPU access; the entry point for millions of ML learners. Colab Pro / Pro+ / Enterprise tiers.
  • Kaggle Notebooks — Kaggle (Google-owned); competition + community-driven; free GPU access; tightly integrated with Kaggle datasets.
  • Deepnote — Deepnote 2019; collaborative notebooks; rich version control.
  • Hex — Hex Technologies 2019; notebook-app hybrid; data-analyst-focused.
  • Observable — Observable Inc. (Mike Bostock, D3.js creator); reactive JS-native notebooks; the “spreadsheet of dataflow.”
  • Pluto.jl — see Julia section; reactive notebooks for Julia.
  • Marimo — Marimo 2023; reactive Python notebooks (Pluto-style); rising alternative to Jupyter for reproducible work.
  • Posit Workbench (formerly RStudio Server Pro) + Posit Cloud — R + Python + Quarto + Shiny hosting.
  • Databricks Notebooks — Databricks’ notebook environment; SQL + Python + Scala + R cells; backed by Spark.
  • AWS SageMaker Studio / Studio Lab — AWS managed; Jupyter-based with SageMaker training/inference integrations. Studio Lab is the free tier.
  • Azure ML Notebooks — Azure-managed Jupyter.
  • Vertex AI Workbench — GCP-managed Jupyter on managed VMs.
  • Modal Notebooks — Modal Labs; serverless GPU notebooks.
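
Most of the Jupyter-based environments above share one on-disk format: an `.ipynb` file is a JSON document whose `kernelspec` metadata names the kernel to launch. The sketch below builds a minimal nbformat-4 notebook with the standard library only. The helper `make_notebook` is hypothetical, and the kernel name `python3` is assumed to be the usual default for the IPython kernel; other languages would substitute their own kernel name.

```python
import json
import uuid


def make_notebook(kernel_name: str, language: str, source: str) -> dict:
    """Build a minimal nbformat-4 notebook dict with a single code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            # The kernelspec tells the front end which kernel to start.
            "kernelspec": {
                "name": kernel_name,
                "display_name": kernel_name,
                "language": language,
            }
        },
        "cells": [
            {
                "cell_type": "code",
                "id": uuid.uuid4().hex[:8],  # per-cell ids exist since nbformat 4.5
                "metadata": {},
                "execution_count": None,     # the cell has not been run yet
                "outputs": [],
                "source": source,
            }
        ],
    }


nb = make_notebook("python3", "python", "print('hello')")
text = json.dumps(nb, indent=1)  # this string is what would be saved as .ipynb
print(text[:60])
```

Because the file is plain JSON, any language with a JSON library can generate or inspect notebooks; in practice the `nbformat` package adds schema validation on top of this structure.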

Cross-language inference protocols

The bridge layer that lets a Python-trained model run from any language.

  • ONNX (Open Neural Network Exchange) — Microsoft + Facebook 2017; protobuf-based model serialization format. Nearly every ML framework can export to it, and ONNX Runtime is the most widely used runtime for loading it. The operator set evolves through versioned opsets; opset 21 (2024) covers most modern transformer ops. ONNX Runtime bindings: Python, C, C++, C#, Java, JavaScript (Web + Node), Swift, Objective-C, plus largely community-maintained Rust and Go wrappers.
  • OpenVINO IR — Intel; alternative IR optimized for Intel CPU / iGPU / VPU.
  • TFLite FlatBuffers — Google; mobile / edge TF.
  • Core ML mlmodel / mlpackage — Apple; iOS / macOS-only.
  • GGUF — Gerganov; quantized weight container for the llama.cpp ecosystem.
  • safetensors — Hugging Face 2022; safe (no arbitrary code execution) weight format; replacing pickle as the default in HF Hub.
  • NVIDIA TensorRT engine — vendor-locked but heavily optimized; built from ONNX or, via Torch-TensorRT, from PyTorch models.
  • PyTorch Edge / ExecuTorch — PyTorch Foundation 2024; on-device deployment of PyTorch models; competes with TF Lite + Core ML for cross-platform mobile inference.
  • MLC-LLM — TVM project’s universal LLM compiler; targets WebGPU, iOS Metal, Android Vulkan, CUDA, ROCm, Mac MPS.
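
The safety claim for safetensors follows from its layout: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then a flat byte buffer. Loading therefore involves parsing JSON and slicing bytes, with no unpickling step that could run code. The sketch below writes and reads that layout using only the standard library. It is an illustration under simplifying assumptions (float32 only, no `__metadata__` entry, no alignment padding), and `pack` and `unpack` are hypothetical names, not the `safetensors` library API.

```python
import json
import struct
from typing import Dict, List, Tuple

Tensor = Tuple[List[int], List[float]]  # (shape, flat values)


def pack(tensors: Dict[str, Tensor]) -> bytes:
    """Serialize {name: (shape, values)} in a safetensors-style layout."""
    header, chunks, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            # Offsets are relative to the start of the byte buffer.
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    head = json.dumps(header).encode("utf-8")
    # 8-byte unsigned little-endian header length, then header, then data.
    return struct.pack("<Q", len(head)) + head + b"".join(chunks)


def unpack(blob: bytes) -> Dict[str, Tensor]:
    """Parse the layout written by pack(); only JSON and byte slicing are used."""
    (n,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + n].decode("utf-8"))
    data = blob[8 + n:]
    out = {}
    for name, meta in header.items():
        begin, end = meta["data_offsets"]
        count = (end - begin) // 4  # 4 bytes per float32
        values = list(struct.unpack_from(f"<{count}f", data, begin))
        out[name] = (meta["shape"], values)
    return out


blob = pack({"w": ([2, 2], [1.0, 2.0, 0.5, -3.0])})
restored = unpack(blob)
print(restored)
```

Contrast this with pickle-based checkpoints, where deserialization can invoke arbitrary Python callables; here the worst a malformed file can do is fail to parse.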

Common language interop bridges

The “two-language problem” (research in slow language, production in fast language) is mitigated by bridges:

  • PyO3 + maturin — Rust-from-Python (and Python-from-Rust); the dominant Rust extension toolkit.
  • pybind11 + nanobind — C++-from-Python; nanobind (Wenzel Jakob 2022) is the modern replacement.
  • Cython — Python-with-C-speed; the venerable approach.
  • CFFI — Pythonic C foreign function interface; less invasive than ctypes.
  • JNI (Java Native Interface) + JNA (Java Native Access) — the JVM's routes to native code; messy but standard.
  • WASM + Wasmer / wasmtime — sandboxed bytecode bridging; growing as a deployment target.
  • gRPC + Apache Arrow Flight — language-agnostic RPC for shipping tensors / dataframes between processes in different languages.
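  • TorchScript + torch.export — PyTorch’s serializable program formats; TorchScript modules run from C++ via LibTorch, while torch.export (PyTorch 2.x) captures a full graph for ahead-of-time compilation and is the intended successor.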
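
What all of these bridges automate is the same basic step: declaring a foreign function's signature so values can cross the language boundary. The standard-library `ctypes` module, the baseline that the CFFI bullet compares against, shows the step in its rawest form. This is a minimal sketch, assuming a Linux or macOS host where `ctypes.CDLL(None)` exposes the C standard library already loaded into the process; it is not portable to Windows as written, and `c_strlen` is a hypothetical wrapper name.

```python
import ctypes

# CDLL(None) gives access to symbols already loaded into the current
# process, which on Linux and macOS includes the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature by hand: size_t strlen(const char *s);
# Toolkits such as pybind11 or PyO3 generate the equivalent glue at build time.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call C's strlen on a Python bytes object (must not contain NUL bytes)."""
    return libc.strlen(data)


print(c_strlen(b"tensor"))
```

Getting `argtypes` or `restype` wrong here fails silently or crashes the interpreter, which is a large part of why compiled binding generators are preferred for anything beyond a handful of calls.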

Education and certifications by language

A practical note for choosing a stack to learn or hire for:

  • Python ML: Andrew Ng’s Coursera courses, fast.ai (Jeremy Howard), Hugging Face NLP Course, Karpathy’s “Neural Networks: Zero to Hero” YouTube series, Stanford CS224N / CS231N / CS229. By far the deepest education content.
  • Julia: MIT 18.S191 / 6.S083, Julia Academy. Smaller but high-quality.
  • R: “R for Data Science” (Hadley Wickham), Posit’s certifications. Solid for statistics-first practitioners.
  • Rust ML: emerging; “Rust ML Book” (community), Tracel.ai burn-book.
  • Mojo: Modular’s official docs + their YouTube series; community still small.

Adjacent