GPU & Shader Languages — Tier 3 Index
- Type: Family index (Tier 3)
- Family: GPU compute and graphics shading languages
- Languages catalogued: 18
- Last updated: 2026-05-07
Family overview
GPU programming has two intertwined histories. Graphics shading languages (GLSL, HLSL, MSL, WGSL, Slang) descend from RenderMan and Cg and target the rasterization pipeline, while general-purpose compute languages (CUDA, HIP, OpenCL, SYCL, Triton, Mojo) emerged once GPUs became programmable enough to host arbitrary workloads. CUDA, announced in 2006 and first released in 2007, combined with NVIDIA’s sustained library investment (cuBLAS, cuDNN, NCCL) to give NVIDIA a near-monopoly on HPC and ML training that Khronos’s OpenCL (2009) and AMD’s HIP (2016) have not broken. Since Triton’s 2021 release, a new layer has formed above CUDA: kernel DSLs (Triton, Mojo, Pallas) that let researchers write GPU kernels in Python-shaped syntax while compiling through MLIR/LLVM down to PTX or AMDGPU code.
In our deep library
No dedicated deep notes for GPU languages yet. Adjacent material:
- python — Triton, CUDA-via-CuPy/Numba, JAX/Pallas, Mojo all live in or near Python
- cpp — host language for CUDA C++, HIP, SYCL, ISPC, Halide, Kokkos, and RAJA
- rust — emerging GPU work via wgpu/rust-gpu (cross-link only)
- fortran — OpenACC and OpenMP target offload, the pragma-based route used by most Fortran GPU codes
Tier 3 — the family
| Language | First release | Status 2026 | Niche | Why it matters | Source URL |
|---|---|---|---|---|---|
| CUDA C/C++ | 2007 | Dominant | NVIDIA GPU compute | The de facto standard for HPC and ML training; ecosystem moat (cuDNN, cuBLAS, NCCL, Thrust) | https://developer.nvidia.com/cuda-toolkit |
| HIP | 2016 | Active (AMD) | AMD GPU compute | Source-compatible with CUDA; hipify translates kernels; primary path for ROCm | https://rocm.docs.amd.com/projects/HIP/ |
| OpenCL C / OpenCL C++ | 2009 | Maintenance, declining | Cross-vendor compute | Khronos cross-vendor compute; lost momentum to CUDA + SYCL/Vulkan compute | https://www.khronos.org/opencl/ |
| GLSL | 2004 | Standard | OpenGL/Vulkan shaders | The OpenGL Shading Language; via SPIR-V is a primary Vulkan input | https://www.khronos.org/opengl/wiki/OpenGL_Shading_Language |
| HLSL | 2002 | Standard | Direct3D / Vulkan shaders | Microsoft’s shading language; with DXC compiler now also a first-class Vulkan/SPIR-V source | https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/ |
| MSL (Metal Shading Language) | 2014 | Active (Apple) | Apple GPU shaders + compute | C++14-based; only path to Apple Silicon GPUs natively | https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf |
| WGSL | 2023 (W3C CR) | Standardising | WebGPU shaders | The shading language for WebGPU; designed for safety and portability across Vulkan/Metal/D3D backends | https://www.w3.org/TR/WGSL/ |
| Slang | 2018 | Active (Khronos-hosted since 2024) | Modular shading | Modular, generic shading language compiling to HLSL/GLSL/SPIR-V/CUDA/Metal; originated at NVIDIA and moved under Khronos governance in 2024 | https://shader-slang.com/ |
| ISPC | 2011 | Active (Intel/community) | CPU SIMD | SPMD model for CPU vector lanes; widely used in rendering (Embree, OSPRay) | https://ispc.github.io/ |
| Halide | 2012 | Active | Image-processing pipelines | DSL embedded in C++; separates the algorithm from its schedule; used in production at Adobe and Google, notably for computational photography | https://halide-lang.org/ |
| SYCL / DPC++ | 2014 / 2019 | Active (Khronos / Intel) | Cross-vendor heterogeneous C++ | Single-source C++17 for CPU/GPU/FPGA; oneAPI’s DPC++ is Intel’s flagship implementation | https://www.khronos.org/sycl/ |
| Triton | 2021 | Dominant for ML kernels | Python DSL for GPU kernels | OpenAI/PyTorch; the way most 2024+ ML researchers write custom kernels; underpins much of torch.compile | https://triton-lang.org/ |
| Mojo | 2023 | Active (Modular) | Python+systems hybrid | Compiles through MLIR; targets CPU/GPU; positioned as Python-superset for AI infra | https://www.modular.com/mojo |
| JAX Pallas | 2024 | Growing | JAX kernel DSL | Google’s Triton-equivalent inside JAX; targets both GPU and TPU | https://jax.readthedocs.io/en/latest/pallas/ |
| Cg | 2002 | Deprecated 2012 | Historical cross-API shading | NVIDIA’s cross-API (OpenGL and Direct3D) shading language, developed in collaboration with Microsoft and closely related to HLSL; deprecated once HLSL and GLSL became the standards | https://en.wikipedia.org/wiki/Cg_(programming_language) |
| RenderMan Shading Language (RSL/OSL) | 1988 / 2010 | Niche (film VFX) | Offline rendering | Pixar’s RSL pioneered programmable shading; OSL (Open Shading Language), originally from Sony Pictures Imageworks, is its modern open successor, used in Arnold, Cycles, and V-Ray | https://github.com/AcademySoftwareFoundation/OpenShadingLanguage |
| OpenACC | 2012 | Maintenance | Pragma-based GPU offload | Directive-based offload for Fortran/C/C++; popular in legacy HPC, eclipsed by OpenMP target | https://www.openacc.org/ |
| OpenMP target offload | 2013 (4.0) | Standard | Pragma-based heterogeneous | The target directives added in OpenMP 4.0 extend OpenMP to GPU offload; now supported by GCC, Clang, and NVHPC | https://www.openmp.org/ |
| Kokkos / RAJA | 2014 / 2014 | Active (DOE labs) | C++ portable HPC | C++ template libraries for portable parallelism across CPUs/GPUs; stewarded by Sandia (Kokkos) and LLNL (RAJA); backbone of US Exascale codes | https://kokkos.org/ |
Notable threads
The CUDA moat. NVIDIA’s 2006 announcement of CUDA was a strategic gamble: instead of competing on raw graphics performance alone, the company invested for more than fifteen years in a software stack (cuBLAS, cuDNN, cuSPARSE, NCCL, TensorRT) that no competitor has matched. By the time AMD shipped HIP and Intel shipped oneAPI/SYCL, most of the ML training and serving stack (PyTorch, JAX, TensorFlow, vLLM, the model libraries) had been written with CUDA as the primary GPU target. As of 2026 the most straightforward path to running a frontier model is still CUDA on H100/B200. AMD’s MI300X has made real inroads in inference (where memory bandwidth matters more than ecosystem), but frontier-scale training remains effectively NVIDIA-only among GPU vendors.
Triton as the new CUDA. Around 2023 a clear pattern emerged: most ML researchers don’t actually want to write CUDA C++. They want to write tile-based kernels in something Python-shaped. Triton (originally Philippe Tillet’s PhD work, then developed at OpenAI, and now the GPU code generator behind PyTorch’s torch.compile) hit that sweet spot, and as of 2026 a substantial fraction of the custom kernels powering top-tier inference engines (FlashAttention variants, paged attention, MoE kernels) are written in Triton. JAX shipped Pallas as the in-tree equivalent; Mojo bets on a more ambitious Python-superset systems language. The thesis across all three: MLIR is the right intermediate representation, Python is the right surface syntax, and CUDA C++ is too low-level for the iteration speed researchers need.
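To make "tile-based" concrete, here is a minimal sketch of the programming model in plain Python. It is an emulation for illustration only, not the Triton API: the names `BLOCK_SIZE`, `add_kernel`, and `launch` are hypothetical, and the sequential loop stands in for program instances that a GPU would run in parallel. Each instance owns one tile of indices and masks off the lanes that fall past the end of the data.

```python
# Pure-Python emulation of a tile-based vector add.
# NOT the Triton API: BLOCK_SIZE, add_kernel, and launch are
# hypothetical names chosen for this illustration.

BLOCK_SIZE = 4  # elements handled by one program instance (assumed value)


def add_kernel(x, y, out, n, pid):
    """One program instance: process the tile of indices owned by `pid`."""
    start = pid * BLOCK_SIZE
    offsets = [start + i for i in range(BLOCK_SIZE)]
    # Mask off lanes past the end of the data so the final, partial
    # tile never reads or writes out of bounds.
    mask = [off < n for off in offsets]
    for off, active in zip(offsets, mask):
        if active:
            out[off] = x[off] + y[off]


def launch(x, y):
    """Run one program instance per tile (sequentially here)."""
    n = len(x)
    out = [0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    for pid in range(num_programs):
        add_kernel(x, y, out, n, pid)
    return out


if __name__ == "__main__":
    # 10 elements with BLOCK_SIZE = 4 gives 3 tiles; the last is partial.
    print(launch(list(range(10)), [10] * 10))
```

The point of the sketch is the division of labour: the kernel author reasons about one tile and a bounds mask, while the launcher decides how many tiles exist. In a real kernel DSL the per-tile loop would be a vectorised operation and the scheduling would be left to the compiler and hardware.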
Shader languages converged on SPIR-V. Once Vulkan shipped in 2016 with SPIR-V as its bytecode IR, the question of “which shading language do I write in” became a compiler-frontend question rather than an API-lock-in question. DXC compiles HLSL to SPIR-V. glslang compiles GLSL to SPIR-V. Slang compiles to SPIR-V (and HLSL/MSL/CUDA). WGSL compiles to SPIR-V on Vulkan backends. The practical effect: in 2026 you pick a shading language by team preference and tooling, not by target API.
Directive-based offload’s quiet success. OpenMP target offload didn’t win mindshare the way CUDA did, but it quietly became the path for legacy Fortran/C++ HPC codes (climate models, CFD, lattice QCD) to run on GPUs without a full rewrite. The DOE Exascale projects standardised on Kokkos/RAJA + OpenMP target as their portability story, and Frontier (AMD), Aurora (Intel), and El Capitan (AMD) all run production workloads through that stack.
Citations
- CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit
- HIP / ROCm: https://rocm.docs.amd.com/projects/HIP/
- OpenCL: https://www.khronos.org/opencl/
- GLSL: https://www.khronos.org/opengl/wiki/OpenGL_Shading_Language
- HLSL: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/
- WGSL spec: https://www.w3.org/TR/WGSL/
- Slang: https://shader-slang.com/
- ISPC: https://ispc.github.io/
- Halide: https://halide-lang.org/
- SYCL: https://www.khronos.org/sycl/
- Triton: https://triton-lang.org/
- Mojo: https://www.modular.com/mojo
- JAX Pallas: https://jax.readthedocs.io/en/latest/pallas/
- OSL: https://github.com/AcademySoftwareFoundation/OpenShadingLanguage
- OpenACC: https://www.openacc.org/
- OpenMP: https://www.openmp.org/
- Kokkos: https://kokkos.org/
- RAJA: https://github.com/LLNL/RAJA