GPU & Shader Languages — Tier 3 Index

Type: Family index (Tier 3)
Family: GPU compute and graphics shading languages
Languages catalogued: 19
Last updated: 2026-05-07

Family overview

GPU programming has two intertwined histories. Graphics shading languages (GLSL, HLSL, MSL, WGSL, Slang) trace their lineage to RenderMan and Cg and target the rasterization pipeline. General-purpose compute languages (CUDA, HIP, OpenCL, SYCL, Triton, Mojo) emerged once GPUs became programmable enough to host arbitrary workloads. CUDA was announced in 2006 and first released publicly in 2007. Together with NVIDIA’s relentless library investment (cuBLAS, cuDNN, NCCL), it gave NVIDIA a near-monopoly on HPC and ML training that AMD’s HIP and Khronos’s OpenCL have spent more than a decade failing to break. From 2021 onward, a new layer appeared above CUDA: kernel DSLs (Triton, Mojo, Pallas). These let researchers write GPU kernels in Python-shaped syntax while compiling through MLIR/LLVM down to PTX or AMDGPU code, and in Pallas's case to TPU code as well.

In our deep library

No dedicated deep notes for GPU languages yet. Adjacent material:

python — Triton, CUDA-via-CuPy/Numba, JAX/Pallas, Mojo all live in or near Python
cpp — host language for CUDA C++, HIP, SYCL, ISPC, Halide, Kokkos, and RAJA
rust — emerging GPU work via wgpu/rust-gpu (cross-link only)
fortran — OpenACC and OpenMP target offload, whose most pragmatic users are Fortran HPC codes

Tier 3 — the family

Language	First release	Status 2026	Niche	Why it matters	Source URL
CUDA C/C++	2007	Dominant	NVIDIA GPU compute	The de facto standard for HPC and ML training; ecosystem moat (cuDNN, cuBLAS, NCCL, Thrust)	https://developer.nvidia.com/cuda-toolkit
HIP	2016	Active (AMD)	AMD GPU compute	Closely mirrors the CUDA runtime API; `hipify` tools translate CUDA source to HIP; primary programming path for ROCm	https://rocm.docs.amd.com/projects/HIP/
OpenCL C / OpenCL C++	2009	Maintenance, declining	Cross-vendor compute	Khronos cross-vendor compute; lost momentum to CUDA + SYCL/Vulkan compute	https://www.khronos.org/opencl/
GLSL	2004	Standard	OpenGL/Vulkan shaders	The OpenGL Shading Language; compiled to SPIR-V, it is a primary Vulkan input	https://www.khronos.org/opengl/wiki/OpenGL_Shading_Language
HLSL	2002	Standard	Direct3D / Vulkan shaders	Microsoft’s shading language; through the DXC compiler, now also a first-class Vulkan/SPIR-V source	https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/
MSL (Metal Shading Language)	2014	Active (Apple)	Apple GPU shaders + compute	C++14-based; the only native path to Apple Silicon GPUs	https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
WGSL	2023 (W3C CR)	Standardising	WebGPU shaders	The shading language for WebGPU; designed for safety and portability across Vulkan/Metal/D3D backends	https://www.w3.org/TR/WGSL/
Slang	2018	Active (Khronos-governed since 2024)	Modular shading	Modular, generic shading language compiling to HLSL/GLSL/SPIR-V/CUDA/Metal; originated at NVIDIA, under Khronos governance since 2024	https://shader-slang.com/
ISPC	2011	Active (Intel/community)	CPU SIMD	SPMD model for CPU vector lanes; widely used in rendering (Embree, OSPRay)	https://ispc.github.io/
Halide	2012	Active	Image-processing pipelines	DSL embedded in C++; separates an algorithm from its schedule; used at Adobe and Google and widely across computational photography	https://halide-lang.org/
SYCL / DPC++	2014 / 2019	Active (Khronos / Intel)	Cross-vendor heterogeneous C++	Single-source C++17 for CPU/GPU/FPGA; oneAPI’s DPC++ is Intel’s flagship implementation	https://www.khronos.org/sycl/
Triton	2021	Dominant for ML kernels	Python DSL for GPU kernels	OpenAI-led and heavily used by PyTorch; the way most 2024+ ML researchers write custom kernels; TorchInductor, the default `torch.compile` backend, generates Triton kernels for GPUs	https://triton-lang.org/
Mojo	2023	Active (Modular)	Python+systems hybrid	Compiles through MLIR; targets CPU/GPU; positioned as a Python superset for AI infrastructure	https://www.modular.com/mojo
JAX Pallas	2024	Growing	JAX kernel DSL	Google’s Triton-equivalent inside JAX; targets both GPU and TPU	https://jax.readthedocs.io/en/latest/pallas/
Cg	2002	Deprecated 2012	Historical cross-API shading	NVIDIA’s cross-API shading language, developed in parallel with Microsoft’s HLSL; deprecated once HLSL and GLSL won; shares most of its syntax with HLSL	https://en.wikipedia.org/wiki/Cg_(programming_language)
RenderMan Shading Language (RSL/OSL)	1988 / 2010	Niche (film VFX)	Offline rendering	Pixar’s RSL pioneered programmable shading; OSL (Open Shading Language, from Sony Pictures Imageworks) is its modern open-source successor, used in Arnold, Cycles, and V-Ray	https://github.com/AcademySoftwareFoundation/OpenShadingLanguage
OpenACC	2012	Maintenance	Pragma-based GPU offload	Directive-based offload for Fortran/C/C++; popular in legacy HPC, eclipsed by OpenMP target	https://www.openacc.org/
OpenMP target offload	2013 (4.0)	Standard	Pragma-based heterogeneous	The `target` directives extended OpenMP to GPU offload; now widely supported by GCC, Clang, and NVHPC	https://www.openmp.org/
Kokkos / RAJA	2014 / 2014	Active (DOE labs)	C++ portable HPC	C++ template libraries for performance-portable parallelism across CPUs/GPUs; developed at Sandia (Kokkos) and LLNL (RAJA); used across many US Exascale Computing Project codes	https://kokkos.org/

Notable threads

The CUDA moat. NVIDIA’s 2006 announcement of CUDA was a strategic gamble. Instead of competing on raw graphics performance alone, the company spent fifteen years building a software stack (cuBLAS, cuDNN, cuSPARSE, NCCL, TensorRT) that nobody else could match. By the time AMD’s HIP and Intel’s oneAPI/SYCL had matured, the ML stack (PyTorch, JAX, TensorFlow, and later vLLM and the model libraries) had been written assuming CUDA. As of 2026, the lowest-friction path to running any frontier model is still CUDA on H100/B200. AMD’s MI300X has made real inroads in inference, where memory capacity and bandwidth matter more than ecosystem. Frontier training on GPUs, however, remains effectively NVIDIA-only, with Google’s TPUs the main non-GPU exception.

Triton as the new CUDA. Around 2023 a clear pattern emerged: most ML researchers don’t actually want to write CUDA C++. They want to write tile-based kernels in something Python-shaped. Triton hit that sweet spot. It began as Phil Tillet’s PhD work, was then developed at OpenAI, and is now a core dependency of PyTorch. As of 2026, a substantial fraction of the custom kernels in top-tier inference engines are written in Triton, including Triton versions of FlashAttention-style attention, paged attention, and MoE kernels. JAX shipped Pallas as its in-tree equivalent, while Mojo bets on a more ambitious Python-superset systems language. The shared thesis is that MLIR is the right intermediate representation, Python is the right surface syntax, and CUDA C++ is too low-level for the iteration speed researchers need.

Shader languages converged on SPIR-V. Once Vulkan shipped in 2016 with SPIR-V as its bytecode IR, the question of “which shading language do I write in” became a compiler-frontend question rather than an API-lock-in question. DXC compiles HLSL to SPIR-V, and glslang compiles GLSL to SPIR-V. Slang compiles to SPIR-V as well as to HLSL, MSL, and CUDA. WGSL is compiled to SPIR-V on Vulkan backends. The practical effect is that in 2026 you mostly pick a shading language by team preference and tooling rather than by target API. The main exception is browser WebGPU, which accepts only WGSL.

Directive-based offload’s quiet success. OpenMP target offload didn’t win mindshare the way CUDA did. It did, however, quietly become the path for legacy Fortran/C++ HPC codes (climate models, CFD, lattice QCD) to run on GPUs without a full rewrite. Many DOE Exascale Computing Project codes adopted Kokkos/RAJA or OpenMP target as their portability layer, and Frontier (AMD), Aurora (Intel), and El Capitan (AMD) all run production workloads through that stack.
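The following is a minimal C++ sketch of an OpenMP target-offloaded SAXPY that shows what "without a full rewrite" looks like in practice. The function name `saxpy_offload` is hypothetical and comes from no library. The point is that the loop body stays ordinary serial code and a single directive requests offload. Compiled without OpenMP, the pragma is ignored. Compiled with OpenMP but with no usable device, the target region falls back to the host by default. Real GPU offload requires a toolchain configured for it, and the exact flags differ by compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y = a*x + y. The loop is plain serial C++; the directive is the
// only offload-specific line. Without -fopenmp the pragma is ignored, and with
// OpenMP but no available device the region executes on the host.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = y.size();
    const float* xp = x.data();  // map clauses take pointer-based array sections
    float* yp = y.data();

    // map(to: ...)     copies x to the device before the loop runs.
    // map(tofrom: ...) copies y to the device and copies the results back after.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The OpenACC spelling of the same loop would be `#pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])`. This one-directive-per-loop style is what lets large legacy codes move to GPUs incrementally, a hot loop at a time.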

Citations

CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit
HIP / ROCm: https://rocm.docs.amd.com/projects/HIP/
OpenCL: https://www.khronos.org/opencl/
GLSL: https://www.khronos.org/opengl/wiki/OpenGL_Shading_Language
HLSL: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/
WGSL spec: https://www.w3.org/TR/WGSL/
Slang: https://shader-slang.com/
ISPC: https://ispc.github.io/
Halide: https://halide-lang.org/
SYCL: https://www.khronos.org/sycl/
Triton: https://triton-lang.org/
Mojo: https://www.modular.com/mojo
JAX Pallas: https://jax.readthedocs.io/en/latest/pallas/
OSL: https://github.com/AcademySoftwareFoundation/OpenShadingLanguage
OpenACC: https://www.openacc.org/
OpenMP: https://www.openmp.org/
Kokkos: https://kokkos.org/
RAJA: https://github.com/LLNL/RAJA
