GPU & Shader Languages — Tier 3 Index

  • Type: Family index (Tier 3)
  • Family: GPU compute and graphics shading languages
  • Languages catalogued: 19
  • Last updated: 2026-05-07

Family overview

GPU programming has two intertwined histories: graphics shading languages (GLSL, HLSL, MSL, WGSL, Slang) descend from RenderMan and Cg and target the rasterization pipeline, while general-purpose compute languages (CUDA, HIP, OpenCL, SYCL, Triton, Mojo) emerged once GPUs became programmable enough to host arbitrary workloads. CUDA was announced in 2006 and first released in 2007; together with NVIDIA’s sustained library investment (cuBLAS, cuDNN, NCCL), it gave NVIDIA a near-monopoly on HPC and ML training that Khronos’s OpenCL (2009) and AMD’s HIP (2016) have so far failed to break. The 2020s added a new layer above CUDA: kernel DSLs (Triton, Mojo, Pallas) that let researchers write GPU kernels in Python-shaped syntax while compiling through MLIR/LLVM down to PTX or AMDGPU code.

In our deep library

No dedicated deep notes for GPU languages yet. Adjacent material:

  • python — Triton, CUDA-via-CuPy/Numba, JAX/Pallas, Mojo all live in or near Python
  • cpp — CUDA C++, HIP, SYCL, ISPC, Halide, Kokkos, RAJA host language
  • rust — emerging GPU work via wgpu/rust-gpu (cross-link only)
  • fortran — OpenACC and OpenMP target offload for pragmatic users

Tier 3 — the family

| Language | First release | Status 2026 | Niche | Why it matters | Source URL |
| --- | --- | --- | --- | --- | --- |
| CUDA C/C++ | 2007 | Dominant | NVIDIA GPU compute | The de facto standard for HPC and ML training; ecosystem moat (cuDNN, cuBLAS, NCCL, Thrust) | https://developer.nvidia.com/cuda-toolkit |
| HIP | 2016 | Active (AMD) | AMD GPU compute | Source-compatible with CUDA; hipify translates kernels; primary path for ROCm | https://rocm.docs.amd.com/projects/HIP/ |
| OpenCL C / OpenCL C++ | 2009 | Maintenance, declining | Cross-vendor compute | Khronos cross-vendor compute; lost momentum to CUDA + SYCL/Vulkan compute | https://www.khronos.org/opencl/ |
| GLSL | 2004 | Standard | OpenGL/Vulkan shaders | The OpenGL Shading Language; compiled to SPIR-V, it is a primary Vulkan input | https://www.khronos.org/opengl/wiki/OpenGL_Shading_Language |
| HLSL | 2002 | Standard | Direct3D / Vulkan shaders | Microsoft’s shading language; with the DXC compiler, now also a first-class Vulkan/SPIR-V source | https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/ |
| MSL (Metal Shading Language) | 2014 | Active (Apple) | Apple GPU shaders + compute | C++14-based; the only native path to Apple Silicon GPUs | https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf |
| WGSL | 2023 (W3C CR) | Standardising | WebGPU shaders | The shading language for WebGPU; designed for safety and portability across Vulkan/Metal/D3D backends | https://www.w3.org/TR/WGSL/ |
| Slang | 2018 | Active (Khronos-hosted since 2024) | Modular shading | Modular, generic shading language compiling to HLSL/GLSL/SPIR-V/CUDA/Metal; developed at NVIDIA, now under Khronos governance | https://shader-slang.com/ |
| ISPC | 2011 | Active (Intel/community) | CPU SIMD | SPMD model for CPU vector lanes; widely used in rendering (Embree, OSPRay) | https://ispc.github.io/ |
| Halide | 2012 | Active | Image-processing pipelines | DSL embedded in C++; algorithm/schedule separation; used at Adobe and Google and across much of computational photography | https://halide-lang.org/ |
| SYCL / DPC++ | 2014 / 2019 | Active (Khronos / Intel) | Cross-vendor heterogeneous C++ | Single-source C++17 for CPU/GPU/FPGA; oneAPI’s DPC++ is Intel’s flagship implementation | https://www.khronos.org/sycl/ |
| Triton | 2021 | Dominant for ML kernels | Python DSL for GPU kernels | OpenAI/PyTorch; the way most 2024+ ML researchers write custom kernels; underpins much of torch.compile | https://triton-lang.org/ |
| Mojo | 2023 | Active (Modular) | Python+systems hybrid | Compiles through MLIR; targets CPU/GPU; positioned as a Python superset for AI infra | https://www.modular.com/mojo |
| JAX Pallas | 2024 | Growing | JAX kernel DSL | Google’s Triton-equivalent inside JAX; targets both GPU and TPU | https://jax.readthedocs.io/en/latest/pallas/ |
| Cg | 2002 | Deprecated 2012 | Historical cross-API shading | NVIDIA’s pre-HLSL/GLSL unifier; deprecated when those two won; influenced HLSL syntax | https://en.wikipedia.org/wiki/Cg_(programming_language) |
| RenderMan Shading Language (RSL/OSL) | 1988 / 2010 | Niche (film VFX) | Offline rendering | Pixar’s RSL pioneered programmable shading; OSL (Open Shading Language) is its modern open successor, used in Arnold, Cycles, V-Ray | https://github.com/AcademySoftwareFoundation/OpenShadingLanguage |
| OpenACC | 2012 | Maintenance | Pragma-based GPU offload | Directive-based offload for Fortran/C/C++; popular in legacy HPC, eclipsed by OpenMP target | https://www.openacc.org/ |
| OpenMP target offload | 2013 (4.0) | Standard | Pragma-based heterogeneous | The target directives extended OpenMP to GPU offload; now widely supported by GCC/Clang/NVHPC | https://www.openmp.org/ |
| Kokkos / RAJA | 2014 / 2014 | Active (DOE labs) | C++ portable HPC | C++ template libraries for portable parallelism across CPUs/GPUs; Sandia (Kokkos) and LLNL (RAJA) stewardship; backbone of US Exascale codes | https://kokkos.org/ |

Notable threads

The CUDA moat. NVIDIA’s 2006 launch of CUDA was a strategic gamble: instead of competing on raw graphics performance alone, the company spent the next fifteen years building a software stack (cuBLAS, cuDNN, cuSPARSE, NCCL, TensorRT) that nobody else could match. By the time AMD shipped HIP and Intel shipped oneAPI/SYCL, the entire ML training stack (PyTorch, JAX, TensorFlow, vLLM, the model libraries) had been written assuming CUDA. As of 2026 the path of least engineering effort for running any frontier model is still CUDA on H100/B200. AMD’s MI300X has made real inroads in inference, where memory bandwidth matters more than ecosystem, but training remains effectively NVIDIA-only at the frontier.

Triton as the new CUDA. Around 2023 a clear pattern emerged: most ML researchers don’t actually want to write CUDA C++. They want to write tile-based kernels in something Python-shaped. Triton (originally Phil Tillet’s PhD work, then developed at OpenAI, now underpinning much of torch.compile) hit that sweet spot, and as of 2026 a substantial fraction of the custom kernels powering top-tier inference engines (FlashAttention variants, paged attention, MoE kernels) are written in Triton. JAX shipped Pallas as the in-tree equivalent; Mojo bets on a more ambitious Python-superset systems language. The thesis across all three: MLIR is the right intermediate representation, Python is the right surface syntax, and CUDA C++ is too low-level for the iteration speed researchers need.
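
For readers unfamiliar with the term, "tile-based" can be sketched as follows. This is not Triton code: it is a plain C++ emulation on the CPU of the programming model, in which each program instance owns one fixed-size tile and masks off lanes that fall past the end of the data. The names BLOCK, add_tile, and launch_add are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tile width: each "program instance" handles BLOCK contiguous elements.
constexpr std::size_t BLOCK = 4;

// One program instance: load a tile, compute, store.
// Lanes that fall past n are masked off, as in the last tile of a real kernel.
void add_tile(std::size_t pid, const float* x, const float* y,
              float* out, std::size_t n) {
    const std::size_t start = pid * BLOCK;
    for (std::size_t lane = 0; lane < BLOCK; ++lane) {
        const std::size_t i = start + lane;
        if (i < n) {  // mask: skip out-of-range lanes
            out[i] = x[i] + y[i];
        }
    }
}

// The "launch": a 1-D grid of ceil(n / BLOCK) program instances.
// On a GPU these would run in parallel; here they run one after another.
std::vector<float> launch_add(const std::vector<float>& x,
                              const std::vector<float>& y) {
    const std::size_t n = std::min(x.size(), y.size());
    std::vector<float> out(n);
    const std::size_t grid = (n + BLOCK - 1) / BLOCK;
    for (std::size_t pid = 0; pid < grid; ++pid) {
        add_tile(pid, x.data(), y.data(), out.data(), n);
    }
    return out;
}
```

The point of the model is that the author reasons about whole tiles and a mask, while decisions about threads, vector lanes, and memory coalescing inside a tile are left to the compiler.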

Shader languages converged on SPIR-V. Once Vulkan shipped in 2016 with SPIR-V as its bytecode IR, the question of “which shading language do I write in” became a compiler-frontend question rather than an API-lock-in question. DXC compiles HLSL to SPIR-V. glslang compiles GLSL to SPIR-V. Slang compiles to SPIR-V (and HLSL/MSL/CUDA). WGSL compiles to SPIR-V on Vulkan backends. The practical effect: in 2026 you pick a shading language by team preference and tooling, not by target API.

Directive-based offload’s quiet success. OpenMP target offload didn’t win mindshare the way CUDA did, but it quietly became the path for legacy Fortran/C++ HPC codes (climate models, CFD, lattice QCD) to run on GPUs without a full rewrite. The DOE Exascale projects standardised on Kokkos/RAJA + OpenMP target as their portability story, and Frontier (AMD), Aurora (Intel), and El Capitan (AMD) all run production workloads through that stack.
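
A minimal sketch of why this route avoids a full rewrite, assuming a compiler with OpenMP 4.0+ target support: the loop body stays ordinary C++, and a single directive asks for it to run on a device. The function name saxpy and its signature are illustrative and not taken from any of the codes mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i].
// With an offload-enabled OpenMP compiler the loop may run on a GPU;
// without OpenMP support the pragma is ignored and the loop runs on the host.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = std::min(x.size(), y.size());
    const float* xp = x.data();
    float* yp = y.data();

    // map(to: ...) copies x to the device;
    // map(tofrom: ...) copies y to the device and back to the host afterwards.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Because the directive is a pragma, the same source still builds and produces the same result with a compiler that has no offload support, which is much of the appeal for long-lived HPC codes.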

Citations