Mojo — Reference
Source: https://docs.modular.com/mojo/
Mojo
- Created: announced 2023-05-02 by Modular Inc. (Chris Lattner, Tim Davis); first public beta (cloud playground only) 2023-05; local installation generally available 2023-09-07; open-sourced standard library 2024-03-28; language and compiler open-sourced 2025-05 (Apache 2.0 with LLVM exception, with some hardware-specific accelerated kernel packages remaining commercial under the MAX platform license).
- Latest stable: v25.4 (Modular uses calendar-style versioning of the form {YY}.{release}; v25.4 released 2026-04, current as of 2026-05). v1.0 (the “stable language” milestone with frozen syntax + ABI) is still expected late 2026; the team is presently focused on ergonomics + GPU kernel libraries.
- Owner / license: Modular Inc. (Mountain View, CA). Apache 2.0 with LLVM exception on the language, compiler, stdlib, and most kernels (since v25.x). MAX platform commercial license for the inference runtime + hardware-specific accelerated kernel packages.
- Paradigms: systems + scientific. Python-family syntax with explicit memory ownership (Rust-like), value semantics, parametric metaprogramming at compile time, and SIMD as a first-class type.
- Typing: strict static typing in fn mode; gradual / dynamic in def mode (Python-compat). Generics via parametric structs and traits. Compile-time values (alias declarations, bracketed parameters) make Mojo a metaprogramming language as well.
- Memory: ownership + borrow semantics (move/borrow checker similar to Rust); read/mut/owned/ref argument conventions; explicit __copyinit__/__moveinit__/__del__ lifetime hooks. No GC by default; reference counting available via ArcPointer[T].
- Compilation: MLIR-based (Mojo is the flagship language for MLIR, the same compiler infrastructure used by TensorFlow XLA, IREE, and PyTorch 2.0). Lowers through MLIR dialects to LLVM IR, with target-specific lowering for NVIDIA SM_*, AMD CDNA/RDNA, Apple Silicon, x86, ARM.
- Primary domains: AI inference + training kernels, high-performance numeric libraries, GPU programming as a successor/companion to CUDA C++, replacing the C++/Python “two-language” stack used by PyTorch / JAX / TensorFlow contributors.
- Official docs: https://docs.modular.com/mojo/
At a glance
Mojo is Chris Lattner’s third major language project (after Swift and the LLVM/Clang stack) and his attempt to give the AI community a single language that is Python-syntactic on the surface, systems-fast underneath. The pitch: take Python (which most ML researchers already write), add a real type system, ownership-based memory, SIMD primitives, and an MLIR backend, and you can write kernels for GPUs, TPUs, and custom accelerators in the same file as your model code, with no C++/CUDA bridge, no JIT compile penalty, and no two-language gap. Modular has published headline claims of up to 35,000x speedups over CPython on hand-tuned numeric kernels (https://www.modular.com/blog/the-worlds-fastest-unified-matrix-multiplication) and uses Mojo internally to ship the MAX inference engine that runs Llama / Mistral / DeepSeek models. The language is evolving fast: syntax breaks between nightly releases are still common, and the library ecosystem is nascent.
Getting started
Install: pixi (a Conda/PyPI-compatible package manager from prefix.dev, https://pixi.sh) is the recommended path in recent releases:
curl -fsSL https://pixi.sh/install.sh | sh
# Modular's conda channel must be listed so pixi can resolve the mojo package
pixi init my-mojo-project -c https://conda.modular.com/max -c conda-forge
cd my-mojo-project
pixi add mojo
pixi run mojo --version
The legacy modular CLI is being retired in favour of pixi. Mojo also ships in the MAX SDK (pixi add modular) for those building inference apps.
Hello world:
def main():
    print("Hello, world!")
Run: mojo hello.mojo (JIT-execute) or mojo build hello.mojo -o hello (AOT, produces a native binary). REPL: mojo with no args.
Project layout:
my-project/
  pixi.toml          # deps + tasks
  pixi.lock          # lockfile
  src/
    main.mojo
    mypkg/
      __init__.mojo
      core.mojo
  tests/
    test_core.mojo
Package manager / build tool: pixi for environment + deps (handles Mojo + Python + native libs in one lockfile). The Mojo compiler itself does build + test + format. Native Mojo package registry is still in design — current ecosystem is dominated by Modular-published libraries shipped via pixi/Conda channels (https://conda.modular.com).
Editor support: the Mojo VS Code extension (Modular, official) provides LSP-backed completion, type info, diagnostics, debugger. JetBrains support is community-driven. A Jupyter kernel ships with the SDK (mojo jupyter).
Basics
Primitives: Int (platform-sized signed int), Int8/16/32/64, UInt*, Float16/32/64, BFloat16, Bool, String, StringLiteral (compile-time string), StringSlice (borrowed view; replaces the older StringRef), DType (compile-time tag describing a numeric type), SIMD[dtype, width] (the foundational vector type: every fixed-width numeric scalar such as Int32 or Float64 is an alias for SIMD[dtype, 1]).
var x: Int = 42
var y: Float64 = 3.14
var v: SIMD[DType.float32, 4] = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
Variables / scope: var (mutable), alias (compile-time constant). Top-level statements live in fn main() or def main(). Lexical scoping; no let (it was removed in v24.1; use var, and pass values with the read argument convention when they must not be mutated).
Control flow: if/elif/else, for x in xs, while, break, continue, return, raise. try / except / finally for exceptions (Mojo errors are values; Error is the canonical type).
Functions — two flavours:
- def: Python-style, dynamic, allows untyped args, raises by default. The “scripty” mode for top-level code and Python-compat work.
- fn: strict, all args typed, must declare raises if it can throw, and expects explicit ownership conventions. The “systems” mode for hot paths and library code.
fn dot[width: Int](a: SIMD[DType.float32, width],
                   b: SIMD[DType.float32, width]) -> Float32:
    return (a * b).reduce_add()
def quick(x):  # Python-style, x is dynamic
    return x * 2
The bracketed [width: Int] parameters are compile-time: Mojo specialises the function per width. This is where the SIMD-portable, hardware-tuned kernels come from.
Strings: String (heap, mutable, owned), StringLiteral (compile-time, no allocation), StringSlice (borrowed slice; the older StringRef has been retired). Slicing: s[0:3]. Concat: "a" + "b". f-strings (f"{x:.2f}") were added in v25.1 and are still gaining features. Unicode handling is byte-oriented in the underlying storage; codepoint iteration is opt-in.
Built-in collections: List[T] (mutable, owned, dynamic-array), Tuple[T1, T2, ...] (heterogeneous, fixed-size), Dict[K, V] (open-addressed hash table), Set[T]. Comprehensions: [x*x for x in range(10)] (added v24.3).
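As a quick illustration of the collection types listed above, here is a minimal sketch using List and Dict. It assumes a 25.x toolchain; the variable names are made up for the example, and an index loop is used because the element type yielded by list iteration has changed between releases.

```mojo
from collections import Dict


def main():
    # List is a growable, owned array.
    var squares = List[Int]()
    for i in range(5):
        squares.append(i * i)

    # Sum by index to stay independent of iterator details.
    var total = 0
    for i in range(len(squares)):
        total += squares[i]
    print("sum of squares:", total)  # 0 + 1 + 4 + 9 + 16 = 30

    # Dict lookups can raise on a missing key, which is fine inside a def.
    var ages = Dict[String, Int]()
    ages["ada"] = 36
    ages["alan"] = 41
    print("entries:", len(ages))
    print("ada:", ages["ada"])
```

Because the lookup ages["ada"] may raise, the same code inside an fn would need a raises annotation or a try block.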
Intermediate
Type system depth:
- Structs (struct) are the only user-defined product type: value-semantic, sized at compile time. No classes yet (class is reserved for future Python-compat OO).
- Traits (trait, since v0.6) are interface contracts: a struct implements a trait by providing the listed methods. Compile-time checked.
- Parametric structs: struct Matrix[type: DType, rows: Int, cols: Int]: .... Specialised per parameter combination.
- The @register_passable struct decorator means the struct is small enough to live in registers (like a __m256 SIMD register or a 64-bit scalar); it is used for SIMD, Int, pointer types.
- Variants via Variant[T1, T2, ...] (tagged union, like Rust’s enum). Pattern-match-style discrimination is still under design; current API is v.isa[T]() + v[T].
Ownership conventions (parameter prefixes on fn):
- read (default for immutable args, replaces older borrowed): immutable borrow, like Rust &T.
- mut (replaces older inout): mutable borrow, like Rust &mut T.
- owned: the function takes ownership; the caller must transfer with ^ or pass a copy. Like passing T by value in Rust.
- ref: generalised reference with an explicit origin (lifetime) parameter (used in library code).
fn add_one(mut x: Int):
    x += 1
fn consume(owned s: String):
    print(s)  # s is dropped at end of scope
The ^ transfer operator moves ownership: consume(s^) ends s’s lifetime in the caller.
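To see the conventions from the caller's side, the following sketch calls one function per convention on the same String. The function names are hypothetical, and the owned/mut spellings assume a 25.x toolchain in which they are accepted.

```mojo
fn length_of(s: String) -> Int:
    # No prefix means read: an immutable borrow, no copy is made.
    return len(s)


fn shout(mut s: String):
    # mut: the caller's value is modified in place.
    s += "!"


fn consume(owned s: String):
    # owned: this function now owns s and destroys it on return.
    print(s)


def main():
    var msg = String("hello")
    print(length_of(msg))  # msg is only borrowed here
    shout(msg)             # msg now ends with "!"
    consume(msg^)          # ownership moves; msg is unusable afterwards
    # print(msg)           # uncommenting this should be rejected at compile time
```

Leaving the last line commented out keeps the sketch compilable while still showing where the use-after-move error would appear.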
Modules / packages: every .mojo (or .🔥) file is a module. Directories with an __init__.mojo are packages. from mypkg.core import foo. Compiled packages can be distributed as .mojopkg files (mojo package mypkg -o mypkg.mojopkg).
Error handling: functions that can raise must be marked raises. try/except/finally works as in Python. Errors are values of type Error; the unified error type carries a message. No exception hierarchy yet (under design).
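A minimal sketch of the raise / try / except / finally flow described above; safe_div is a hypothetical helper written for this example, not a stdlib function.

```mojo
fn safe_div(a: Int, b: Int) raises -> Int:
    # An fn must declare raises before it is allowed to raise.
    if b == 0:
        raise Error("division by zero")
    return a // b


def main():
    try:
        print(safe_div(10, 2))
        print(safe_div(1, 0))  # raises, so control jumps to except
    except e:
        # e is an Error value carrying the message.
        print("caught:", e)
    finally:
        print("done")
```

Since there is no exception hierarchy yet, code that needs to distinguish failure kinds typically has to inspect the message or return a Variant instead.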
Concurrency / parallelism:
- parallelize[fn](num_work_items, num_workers): fork-join primitive over a function specialised to a work-item index.
- vectorize[fn, simd_width]: emits a SIMD-vectorised inner loop, with peeled scalar tail.
- tile[fn, *tile_sizes]: tiles a loop nest for cache locality and codegen.
- async fn + coroutine.await: async/await (preview since v25.1, evolving). The runtime is cooperative + work-stealing.
- GPU programming (since v24.5): @parameter decorators + DeviceContext API let the same Mojo source compile to a CPU loop or a CUDA / ROCm kernel. The gpu stdlib module exposes block_idx, thread_idx, barrier(), shared memory.
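The fork-join primitive from the list above can be sketched as follows. This assumes the algorithm.parallelize signature of the 25.x standard library; the buffer name and sizes are illustrative only.

```mojo
from algorithm import parallelize
from memory import UnsafePointer


def main():
    alias n = 8
    var results = UnsafePointer[Int].alloc(n)

    # Each work item writes its own slot, so no synchronisation is needed.
    @parameter
    fn square(i: Int):
        results[i] = i * i

    # Run n work items across up to 4 workers; returns once all have finished.
    parallelize[square](n, 4)

    for i in range(n):
        print(i, "->", results[i])
    results.free()
```

Giving every work item a disjoint output slot is the simplest way to stay race-free; shared accumulators would need atomics or a reduction step.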
Python interop:
- from python import Python then np = Python.import_module("numpy").
- Call any pip-installed Python lib; conversions via PythonObject (boxed value crossing the boundary).
- The boundary has a measurable cost (refcount + GIL acquisition); good for the “Mojo as a fast inner loop calling Python for I/O / plotting / preprocessing” pattern.
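A short sketch of the interop pattern above, assuming numpy is installed in the same environment as Mojo:

```mojo
from python import Python


def main():
    # import_module can raise (for example if numpy is missing),
    # which is why this lives in a def.
    var np = Python.import_module("numpy")

    # Every value here is a PythonObject living on the Python side.
    var arr = np.arange(6).reshape(2, 3)
    print(arr)
    print(arr.sum())  # 0 + 1 + 2 + 3 + 4 + 5 = 15
```

Each attribute access and call in this snippet crosses the Mojo/Python boundary, so it belongs in setup or reporting code rather than an inner loop.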
I/O / stdlib: print, open, read, write. pathlib.Path (similar to Python). The standard library is still small — many Python-style utilities (json, re, subprocess) are not yet in stdlib; the recommended path is Python interop until native versions land.
Advanced
Memory model: value semantics + ownership. A type opts into copying, moving, and cleanup by defining __copyinit__ (copy construct), __moveinit__ (move construct), and __del__ (destructor). The compiler enforces:
- An owned value has exactly one owner.
- Borrows (read, mut) cannot outlive the owner.
- Moves (^) end the source’s lifetime; using it afterwards is a compile error.
from memory import UnsafePointer
struct Buffer:
    var data: UnsafePointer[Float32]
    var size: Int
    fn __init__(out self, size: Int):
        self.size = size
        self.data = UnsafePointer[Float32].alloc(size)  # uninitialised heap storage
    fn __moveinit__(out self, owned existing: Self):
        # Take over the pointer; the moved-from value is not destroyed again.
        self.data = existing.data
        self.size = existing.size
    fn __del__(owned self):
        self.data.free()
Marking a struct @register_passable("trivial") declares it bitwise-copyable (no destructor, no special copy semantics); this is used for Int, Bool, SIMD, pointers. Such values are passed in registers and don’t need ownership tracking.
SIMD + autovectorisation:
- SIMD[DType.float32, simdwidthof[DType.float32]()]() is the portable max-width vector for the target.
- simdwidthof[T]() is compile-time: for f32 it is 4 on 128-bit SSE/NEON, 8 on AVX2, and 16 on AVX-512; GPU targets report their own widths.
- vectorize[fn, w] rolls a scalar inner loop into width-w SIMD with a remainder.
- vectorize_unroll, unroll, tile are loop transformations as library functions (parametric).
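The width query and the vectorize transformation can be combined as in the sketch below. It assumes the 25.x spellings (simdwidthof in sys, vectorize in algorithm); later releases rename some of these, so treat the imports as version-dependent.

```mojo
from algorithm import vectorize
from memory import UnsafePointer
from sys import simdwidthof


def main():
    alias n = 10  # deliberately not a multiple of typical SIMD widths
    alias width = simdwidthof[DType.float32]()

    var data = UnsafePointer[Float32].alloc(n)
    for i in range(n):
        data[i] = Float32(i)

    # Called with w == width for full chunks and a smaller w for the tail.
    @parameter
    fn double[w: Int](i: Int):
        data.store(i, data.load[width=w](i) * 2.0)

    vectorize[double, width](n)

    print("native f32 width:", width)
    print("last element:", data[n - 1])  # 9 doubled
    data.free()
```

Because the closure is parameterised on w, the same body serves both the full-width iterations and the remainder without a hand-written cleanup loop.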
GPU kernels:
from gpu import block_idx, thread_idx, barrier
from gpu.host import DeviceContext
from memory import UnsafePointer
fn vector_add(a: UnsafePointer[Float32],
              b: UnsafePointer[Float32],
              c: UnsafePointer[Float32], n: Int):
    # One thread per element; 256 must match block_dim at the launch site.
    var i = block_idx.x * 256 + thread_idx.x
    if i < n:  # guard: the last block may run past the end
        c[i] = a[i] + b[i]
def main():
    var ctx = DeviceContext()
    # a_ptr, b_ptr, c_ptr (device pointers) and n (element count) are assumed
    # to have been set up beforehand; their allocation is not shown here.
    ctx.enqueue_function[vector_add](a_ptr, b_ptr, c_ptr, n,
                                     grid_dim=(n // 256 + 1,),
                                     block_dim=(256,))
    ctx.synchronize()
The same fn body can target NVIDIA SM_80+ (Ampere/Hopper/Blackwell), AMD CDNA3, and (preview) Apple Metal; selection is by DeviceContext runtime config + MLIR target lowering.
MAX integration: Mojo is the kernel language for the MAX inference engine. Models loaded via the MAX Python API (max.engine.InferenceSession) execute through a graph of Mojo kernels that the Modular team has tuned for each target. Benchmarks: Llama 3.1 8B at FP8 on H100 reportedly outperforms vLLM 0.6 by 20–40% on throughput (Modular’s published numbers, 2025-Q4).
FFI / interop:
- C ABI: Mojo can call C functions via external_call["c_function_name", ReturnType](args...). No header parsing; declare types manually.
- Python: as above, via the python stdlib module.
- MLIR: Mojo can emit raw MLIR via the __mlir_op builtin; this is how kernel libraries hand-tune to specific dialects (e.g., nvgpu, gpu, vector).
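As a concrete instance of the C ABI route, this sketch calls the POSIX getpid function. It assumes a Linux or macOS host and that external_call is importable from sys.ffi, as in the 25.x standard library.

```mojo
from sys.ffi import external_call


def main():
    # The return type is declared by hand: getpid returns a C int (pid_t),
    # modelled here as Int32. There is no header to check it against.
    var pid = external_call["getpid", Int32]()
    print("process id:", pid)
```

A wrong return or argument type is not diagnosed by the compiler, so declarations like this are worth isolating in one small wrapper module.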
Reflection: compile-time reflection via parametric structs and the @parameter decorator lets you inspect a type’s traits, field count, and member types at compile time. Runtime reflection is intentionally minimal (kept lean for performance).
Performance tools:
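- benchmark module in the stdlib (import benchmark; benchmark.run[fn]() returns a timing report): wraps timing with repeated runs and summary statistics.
- Nsight Systems (nsys) / Nsight Compute (ncu) for GPU profiling: Mojo emits standard PTX for NVIDIA targets, so vendor tools work. The older nvprof does not support the SM_80+ GPUs Mojo targets.
- The MLIR pass pipeline can be dumped: mojo build --emit=mlir foo.mojo shows each lowering stage.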
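For the stdlib timing helper, a minimal sketch looks like this. It follows the benchmark.run usage from the standard library docs; the sleeper workload is a stand-in for real code.

```mojo
import benchmark
from time import sleep


fn sleeper():
    # Stand-in workload: roughly 10 ms per call.
    sleep(0.01)


def main():
    # run calls sleeper repeatedly and collects timings into a report.
    var report = benchmark.run[sleeper]()
    print("mean seconds per iteration:", report.mean())
    report.print()
```

For real kernels, make sure the result of the measured function is actually consumed; otherwise the optimiser may remove the work being timed.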
God mode
Parametric metaprogramming: Mojo’s distinguishing feature is that types and values are uniformly handled at compile time. Function and struct parameters ([T: AnyType, N: Int]) are evaluated by the compiler before codegen, letting you write hardware-portable libraries.
from memory import UnsafePointer
fn matmul[
    M: Int, N: Int, K: Int,
    dtype: DType = DType.float32,
](
    c: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
):
    alias tile_m = 32
    alias tile_n = 32
    alias tile_k = 8
    # tile + vectorize + unroll the kernel here
Each unique (M, N, K, dtype) instantiation gets its own specialised code object.
@parameter decorator: applied to a closure or for-loop, asks the compiler to evaluate at compile time / unroll fully:
@parameter
for i in range(4):
    print("compile-time unroll iter", i)
Traits + conformance:
trait Drawable:
    fn draw(self): ...
struct Circle(Drawable):
    var r: Float64
    fn draw(self):
        print("circle r=", self.r)
Conformance is declared, not inferred: you must opt in by listing the trait, and the compiler then checks that every required method exists with the correct signature.
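Traits become useful once a function is written against the trait rather than a concrete struct. The sketch below adds a hypothetical Square type and a generic render function; the out self constructor spelling assumes a 24.6 or later toolchain.

```mojo
trait Drawable:
    fn draw(self): ...


struct Circle(Drawable):
    var r: Float64

    fn __init__(out self, r: Float64):
        self.r = r

    fn draw(self):
        print("circle r=", self.r)


struct Square(Drawable):
    var side: Float64

    fn __init__(out self, side: Float64):
        self.side = side

    fn draw(self):
        print("square side=", self.side)


# T is resolved at compile time, so each call is a direct, specialised call.
fn render[T: Drawable](shape: T):
    shape.draw()


def main():
    render(Circle(1.5))
    render(Square(2.0))
```

Passing a type that does not list Drawable is rejected when render is instantiated, not at run time.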
__mlir_attr / __mlir_op / __mlir_type: raw MLIR escape hatches for library authors to define new builtins, target-specific intrinsics, or attach attributes that downstream MLIR passes recognise. This is how SIMD, UnsafePointer, and the GPU intrinsics are implemented.
ABI / linkage: mark fn with @export to expose with a C-compatible symbol. @always_inline, @noinline, @register_passable are tuning knobs from the LLVM/Swift heritage.
Variant / sum types: Variant[Int, String, Float64]. Until Mojo gets pattern matching, dispatch is via if v.isa[Int](): .... The roadmap includes match syntax (target v26).
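The isa-based dispatch can be sketched as below, closely following the example in the utils.variant documentation. The String(Int) conversion assumes a 25.x toolchain; older releases used str() instead.

```mojo
from utils import Variant

alias IntOrString = Variant[Int, String]


fn to_string(mut x: IntOrString) -> String:
    # isa checks the active alternative; x[T] then reads it as that type.
    if x.isa[String]():
        return x[String]
    return String(x[Int])


def main():
    var an_int = IntOrString(4)
    var a_string = IntOrString(String("I'm a string!"))
    print(to_string(an_int))
    print(to_string(a_string))
```

Reading x[T] for an alternative that is not active is not checked for you in all build modes, so every access should be guarded by the matching isa test.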
Idioms & style
- fn for hot paths, def for glue. Library code uses fn with full type signatures; scripts and notebooks lean on def.
- SIMD-first kernels: prefer SIMD[T, simdwidthof[T]()] over scalar loops; combine with vectorize over a scalar element accessor.
- Parametric over generic when shapes are known at compile time: e.g., Matrix[DType.float32, 128, 64] outperforms a runtime-shape matrix because the compiler can fully tile + unroll.
- Own + move, don’t copy. Use ^ to transfer; structs that hold large buffers should implement __moveinit__ but not __copyinit__ to forbid implicit copies.
- Naming: snake_case for functions and variables (Python heritage), PascalCase for structs and traits, SCREAMING_SNAKE for alias constants, _leading_underscore for private.
- Formatter: mojo format (built-in, since v24.4); opinionated, ruff-fast.
- Linter: the compiler itself surfaces most issues; no separate lint tool yet.
- Doc strings: triple-quoted under fn/struct/trait; rendered by mojo doc (since v24.6, generates Markdown).
- What reviewers flag: def in performance code (forces dynamic dispatch + raises), missing ^ on owned args (causes silent copies), String in tight loops (heap allocation; use StringSlice or StringLiteral), Python interop in a hot path, unparametric kernels where parameters are statically known, unsafe pointer use without ownership invariants.
Ecosystem
The Mojo ecosystem in 2026-Q2 is early but accelerating post open-source. Key components:
| Area | Project / Library | Notes |
|---|---|---|
| Build / env | pixi | Official package + env manager |
| Inference runtime | MAX | Modular’s commercial inference engine; runs Llama, Mistral, DeepSeek, Gemma, Qwen with Mojo kernels |
| LLM serving | MAX Serve | OpenAI-compatible REST endpoint built on MAX |
| GPU kernels | MAX Driver Kernels (open-source since 2025-05) | Hand-tuned matmul, attention, layernorm, GEMM for NVIDIA + AMD |
| Numerics | stdlib.builtin, tensor, algorithm, math | Built-in modules; growing each release |
| Notebooks | Jupyter kernel (Modular) | mojo jupyter |
| Editor | VS Code extension (Modular) | LSP, debugger, formatter |
| Community | community.modular.com, Discord (~25k members), github.com/modular/max and github.com/modular/mojo | |
| Python bridge | python stdlib module | CPython interop with Python.import_module |
| Comparison points | Julia, Python + NumPy/Numba, Rust, C++ + CUDA | |
Notable users / adopters (2025): Modular itself (MAX inference engine in production) and early adopters in AI infrastructure teams; Modular blog posts publicly reference a few companies running custom LLM inference. Adoption beyond Modular is still small but growing fast since the May 2025 open-source release.
Gotchas
- Syntax churn: Mojo is pre-1.0. Breaking changes between minor releases are normal. Pin your pixi lockfile and re-test on each upgrade. Examples of past breaks: let removed v24.1; borrowed→read, inout→mut v24.6; ownership-decorator conventions tightened twice in 2025.
- Stdlib is thin. No mature json, re, http, csv yet. The interim pattern is “call Python for non-perf-critical I/O.” That means pip install + Python.import_module(...).
- Python interop has overhead. Every value crossing the boundary boxes through PythonObject; the GIL is acquired. Don’t put a Python call inside a hot inner loop.
- No stable ABI. Mojo libraries must be recompiled against each compiler release. There is no plan for a stable binary interface before v1.0.
- Ownership errors are compile errors, not runtime panics. Good for safety; the error messages have improved significantly through 2025 but can still be intimidating for Python users new to ownership.
- def vs fn confusion: def quietly raises and uses dynamic dispatch; fn requires types and a raises declaration. Mixing them in a callgraph can produce surprising error-propagation paths.
- Reference counting (ArcPointer[T]) is opt-in, not default. Coming from Swift, this trips people up: Mojo’s default is move semantics, not refcounting.
- GPU support is NVIDIA-first. AMD support landed v25.x but lags in kernel coverage; Apple Metal preview is limited. Don’t assume a Mojo program will magically run on every accelerator.
- String is not Python’s str. Mojo String is byte-storage; codepoint operations require explicit calls. Better Unicode is on the roadmap.
- Compile times can be slow on parametric-heavy code (matmul kernels with many specialisations). Use mojo build --no-debug-info -O3 for release builds; expect 5–60s for medium libraries.
- Pre-1.0 means no semver guarantee. Production deployments should plan for upgrade churn through 2026.
- License nuance: the language and stdlib are Apache 2.0; the MAX inference engine commercial features (multi-replica serving, enterprise support, certain hand-tuned kernel packages) are still under Modular’s commercial license. Reading kernel source is fine; redistributing the MAX runtime as part of your product may require an agreement.
Citations
- Official documentation: https://docs.modular.com/mojo/
- Mojo manual: https://docs.modular.com/mojo/manual/
- Mojo standard library reference: https://docs.modular.com/mojo/lib
- Mojo changelog: https://docs.modular.com/mojo/changelog
- Modular blog (release announcements + benchmarks): https://www.modular.com/blog
- Open-source announcement (2025-05): https://www.modular.com/blog/modular-25-3-introducing-the-open-source-max-and-mojo-foundation
- Mojo language repo (Apache 2.0): https://github.com/modular/mojo
- MAX platform repo: https://github.com/modular/max
- pixi (package manager): https://pixi.sh/
- “Mojo language overview” (Chris Lattner talk, LLVM Dev Meeting 2023): https://www.youtube.com/watch?v=SEwTjZvy8vw
- “Democratizing AI Compute” (Lattner keynote 2024): https://www.modular.com/podcast
- Modular community: https://community.modular.com/
- Comparison: Mojo vs Python performance (matmul case study): https://www.modular.com/blog/the-worlds-fastest-unified-matrix-multiplication
- Wikipedia: https://en.wikipedia.org/wiki/Mojo_(programming_language)