Mojo — Reference

Source: https://docs.modular.com/mojo/

Mojo

  • Created: announced 2023-05-02 by Modular Inc. (Chris Lattner, Tim Davis); first public beta (cloud playground only) 2023-05; local installation generally available 2023-09-07; open-sourced standard library 2024-03-28; language and compiler open-sourced 2025-05 (Apache 2.0 with LLVM exception, with hardware-acceleration kernels remaining commercial under the MAX platform license).
  • Latest stable: v25.4 (Modular adopts calendar versioning 25.{quarter}; v25.4 released 2026-04, current as of 2026-05). v1.0 (the “stable language” milestone with frozen syntax + ABI) is still expected late 2026; the team is presently focused on ergonomics + GPU kernel libraries.
  • Owner / license: Modular Inc. (Mountain View, CA). Apache 2.0 with LLVM exception on the language, compiler, stdlib, and most kernels (since v25.x). MAX platform commercial license for the inference runtime + hardware-specific accelerated kernel packages.
  • Paradigms: systems + scientific. Python-family syntax with explicit memory ownership (Rust-like), value semantics, parametric metaprogramming at compile time, and SIMD as a first-class type.
  • Typing: strict static typing in fn mode; gradual / dynamic in def mode (Python-compat). Generics via parametric structs and traits. Compile-time values (alias, parameter) make Mojo a metaprogramming language as well.
  • Memory: ownership + borrow semantics (move/borrow checker similar to Rust); read / mut / owned / ref argument conventions; explicit __copyinit__ / __moveinit__ / __del__ lifetime hooks. No garbage collector; reference counting is available via ArcPointer[T].
  • Compilation: MLIR-based (Mojo is the flagship language for MLIR — the same compiler infrastructure used by TensorFlow XLA, IREE, and PyTorch 2.0). Lowers through MLIR dialects to LLVM IR, with target-specific lowering for NVIDIA SM_*, AMD CDNA/RDNA, Apple Silicon, x86, ARM.
  • Primary domains: AI inference + training kernels, high-performance numeric libraries, GPU programming as a successor/companion to CUDA C++, replacing the C++/Python “two-language” stack used by PyTorch / JAX / TensorFlow contributors.
  • Official docs: https://docs.modular.com/mojo/

At a glance

Mojo is Chris Lattner’s third major compiler project (after LLVM/Clang and Swift) and his attempt to give the AI community a single language that is Python-syntactic on the surface and systems-fast underneath. The pitch: take Python (which most ML researchers already write), add a real type system, ownership-based memory, SIMD primitives, and an MLIR backend, and you can write kernels for GPUs and other accelerators in the same file as your model code, with no C++/CUDA bridge and no two-language gap. Modular has published vendor benchmarks claiming speedups of up to 35,000x over CPython on selected compute-bound kernels (see, for example, https://www.modular.com/blog/the-worlds-fastest-unified-matrix-multiplication); such figures compare tuned Mojo against unoptimised pure-Python baselines. Modular uses Mojo internally to ship the MAX inference engine that runs Llama / Mistral / DeepSeek models. The language is evolving fast: syntax breaks between nightly releases are still common, and the library ecosystem is nascent.

Getting started

Install: pixi (a Conda/PyPI-compatible package manager from prefix.dev, https://pixi.sh) is the recommended path since v24.6:

curl -fsSL https://pixi.sh/install.sh | sh
pixi init my-mojo-project -c https://conda.modular.com/max -c conda-forge
cd my-mojo-project
pixi add mojo
pixi run mojo --version

Legacy modular CLI is being retired in favour of pixi. Mojo also ships in the MAX SDK (pixi add modular) for those building inference apps.

Hello world:

def main():
    print("Hello, world!")

Run: mojo hello.mojo (JIT-execute) or mojo build hello.mojo -o hello (AOT, produces a native binary). REPL: mojo with no args.

Project layout:

my-project/
  pixi.toml              # deps + tasks
  pixi.lock              # lockfile
  src/
    main.mojo
    mypkg/
      __init__.mojo
      core.mojo
  tests/
    test_core.mojo

Package manager / build tool: pixi for environment + deps (handles Mojo + Python + native libs in one lockfile). The Mojo compiler itself does build + test + format. Native Mojo package registry is still in design — current ecosystem is dominated by Modular-published libraries shipped via pixi/Conda channels (https://conda.modular.com).

Editor support: the Mojo VS Code extension (Modular, official) provides LSP-backed completion, type info, diagnostics, debugger. JetBrains support is community-driven. A Jupyter kernel ships with the SDK (mojo jupyter).

Basics

Primitives: Int (platform-sized signed int), Int8/16/32/64, UInt*, Float16/32/64, BFloat16, Bool, String, StringLiteral (compile-time string), StringRef (borrowed view), DType (compile-time tag describing a numeric type), SIMD[dtype, width] (the foundational vector type — every numeric scalar is SIMD[T, 1]).

var x: Int = 42
var y: Float64 = 3.14
var v: SIMD[DType.float32, 4] = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)

Variables / scope: var (mutable), alias (compile-time constant). Executable code lives inside fn main() or def main() rather than at module top level. Lexical scoping; there is no let (removed in v24.4): declare locals with var and rely on the read argument convention for immutable access.

Control flow: if/elif/else, for x in xs, while, break, continue, return, raise. try / except / finally for exceptions (Mojo errors are values; Error is the canonical type).

Functions — two flavours:

  • def — Python-style, dynamic, allows untyped args, raises by default. The “scripty” mode for top-level code and Python-compat work.
  • fn — strict, all args typed, must declare raises if it can throw, prefers ownership conventions explicit. The “systems” mode for hot paths and library code.

fn dot[width: Int](a: SIMD[DType.float32, width],
                   b: SIMD[DType.float32, width]) -> Float32:
    return (a * b).reduce_add()
 
def quick(x):              # Python-style, x is dynamic
    return x * 2

The bracketed [width: Int] parameters are compile-time — Mojo specialises the function per width. This is where the SIMD-portable, hardware-tuned kernels come from.
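
As a minimal sketch of that specialisation, the snippet below calls dot at two widths. The main function and the input values are illustrative assumptions, and the code assumes a 25.x-era toolchain; the width is inferred from the argument types in the first call and given explicitly in the second.

```mojo
fn dot[width: Int](a: SIMD[DType.float32, width],
                   b: SIMD[DType.float32, width]) -> Float32:
    # Element-wise multiply, then a horizontal sum across the lanes.
    return (a * b).reduce_add()

def main():
    var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var b = SIMD[DType.float32, 4](5.0, 6.0, 7.0, 8.0)
    # width=4 is inferred from the arguments: 1*5 + 2*6 + 3*7 + 4*8 = 70
    print(dot(a, b))

    # A single-argument constructor splats the value into all 8 lanes.
    var ones = SIMD[DType.float32, 8](1.0)
    print(dot[8](ones, ones))  # sum of eight 1.0 * 1.0 products
```

Each distinct width produces its own compiled copy of dot, so there is no runtime branch on the vector length.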

Strings: String (heap, mutable, owned), StringLiteral (compile-time, no allocation), StringRef (borrowed slice). Slicing: s[0:3]. Concat: "a" + "b". f-strings (f"{x:.2f}") — added in v25.1, still gaining features. Unicode handling is byte-oriented in the underlying storage; codepoint iteration is opt-in.

Built-in collections: List[T] (mutable, owned, dynamic-array), Tuple[T1, T2, ...] (heterogeneous, fixed-size), Dict[K, V] (open-addressed hash table), Set[T]. Comprehensions: [x*x for x in range(10)] (added v24.3).
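
A short sketch of List and Dict in use, assuming a 25.x-era standard library. The helper sum_of_squares and the sample keys are hypothetical names chosen for illustration; index-based loops are used because the element type yielded by direct iteration has changed between releases.

```mojo
from collections import Dict

fn sum_of_squares(n: Int) -> Int:
    # Build a List[Int] of squares, then fold it with an index loop.
    var squares = List[Int]()
    for i in range(n):
        squares.append(i * i)
    var total = 0
    for i in range(len(squares)):
        total += squares[i]
    return total

fn main() raises:
    print(sum_of_squares(5))  # 0 + 1 + 4 + 9 + 16 = 30

    var ages = Dict[String, Int]()
    ages["ada"] = 36
    ages["alan"] = 41
    # Indexing a Dict can raise on a missing key, hence `raises` on main.
    print(ages["ada"], len(ages))
```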

Intermediate

Type system depth:

  • Structs (struct) are the only user-defined product type — value-semantic, sized at compile time. No classes (yet — class is reserved for future Python-compat OO).
  • Traits (trait, since v24.2) are interface contracts: a struct implements a trait by providing the listed methods. Compile-time checked.
  • Parametric structs: struct Matrix[type: DType, rows: Int, cols: Int]: .... Specialised per parameter combination.
  • @register_passable struct decorator means the struct is small enough to live in registers (like a __m256 SIMD register or a 64-bit scalar) — used for SIMD, Int, pointer types.
  • Variants via Variant[T1, T2, ...] (tagged union, like Rust’s enum). Pattern-match-style discrimination is still under design; current API is v.isa[T]() + v[T].

Ownership conventions (parameter prefixes on fn):

  • read (default for immutable args, replaces older borrowed) — immutable borrow, like Rust &T.
  • mut (replaces older inout) — mutable borrow, like Rust &mut T.
  • owned transfers ownership to the function; the caller must either move the value in with ^ or let the compiler pass a copy. Comparable to passing T by value in Rust.
  • ref — generalised reference with explicit lifetime parameter (used in library code).

fn add_one(mut x: Int):
    x += 1
 
fn consume(owned s: String):
    print(s)  # s is dropped at end of scope

The ^ transfer operator moves ownership: consume(s^) ends s’s lifetime in the caller.
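
The two functions above can be exercised as follows. This is a sketch that assumes the mut / owned spellings used in this document; the main function and its values are illustrative.

```mojo
fn add_one(mut x: Int):
    x += 1  # mutates the caller's variable in place

fn consume(owned s: String):
    print(s)  # s is destroyed when consume returns

def main():
    var n = 41
    add_one(n)
    print(n)  # n was incremented through the mutable borrow

    var s = String("hello")
    consume(s^)  # ownership moves into consume
    # print(s)   # would be rejected by the compiler: s was moved above
```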

Modules / packages: every .mojo (or .🔥) file is a module. Directories with an __init__.mojo are packages. from mypkg.core import foo. Compiled packages can be distributed as .mojopkg files (mojo package mypkg -o mypkg.mojopkg).

Error handling: functions that can raise must be marked raises. try/except/finally works as in Python. Errors are values of type Error; the unified error type carries a message. No exception hierarchy yet (under design).
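
A minimal sketch of the raises / try / except / finally flow described above. The function checked_div is a hypothetical helper, not part of the standard library.

```mojo
fn checked_div(a: Int, b: Int) raises -> Int:
    # An fn must declare `raises` before it is allowed to raise.
    if b == 0:
        raise Error("division by zero")
    return a // b

def main():
    try:
        print(checked_div(10, 2))
        print(checked_div(1, 0))  # raises, so control jumps to except
    except e:
        print("caught:", e)
    finally:
        print("done")  # runs whether or not an error was raised
```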

Concurrency / parallelism:

  • parallelize[fn](num_work_items, num_workers) — fork-join primitive over a function specialised to a work-item index.
  • vectorize[fn, simd_width] — emits a SIMD-vectorised inner loop, with peeled scalar tail.
  • tile[fn, *tile_sizes] — tiles a loop nest for cache locality and codegen.
  • async fn + coroutine.await — async/await (preview since v25.1, evolving). The runtime is cooperative + work-stealing.
  • GPU programming (since v24.5): @parameter decorators + DeviceContext API let the same Mojo source compile to a CPU loop or a CUDA / ROCm kernel. The gpu stdlib module exposes block_idx, thread_idx, barrier(), shared memory.
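
As a sketch of the vectorize primitive from the list above, the snippet below doubles a buffer in place. It assumes the 25.x-era spellings used in this document (simdwidthof from sys, UnsafePointer.load / store); the names double_in_place and step are hypothetical, and newer toolchains may spell these APIs differently.

```mojo
from algorithm import vectorize
from memory import UnsafePointer
from sys import simdwidthof

alias width = simdwidthof[DType.float32]()

fn double_in_place(p: UnsafePointer[Float32], n: Int):
    @parameter
    fn step[w: Int](i: Int):
        # Load w lanes starting at element i, double them, store them back.
        p.store(i, p.load[width=w](i) * 2)

    # Calls step at `width` lanes, then covers the leftover elements.
    vectorize[step, width](n)

def main():
    alias n = 10
    var p = UnsafePointer[Float32].alloc(n)
    for i in range(n):
        p[i] = Float32(i)
    double_in_place(p, n)
    print(p[9])  # element 9 held 9.0 before doubling
    p.free()
```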

Python interop:

  • from python import Python then np = Python.import_module("numpy").
  • Call any pip-installed Python lib; conversions via PythonObject (boxed value crossing the boundary).
  • The boundary has a measurable cost (refcount + GIL acquisition); good for “Mojo as a fast inner loop calling Python for I/O / plotting / preprocessing” pattern.
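
A minimal interop sketch, assuming numpy is installed in the same pixi/Conda environment as Mojo. Every value returned from the Python side is a PythonObject.

```mojo
from python import Python

def main():
    # Assumption: numpy is importable from the active environment.
    var np = Python.import_module("numpy")
    var arr = np.arange(6).reshape(2, 3)  # a PythonObject wrapping an ndarray
    print(arr.sum())  # NumPy computes 0 + 1 + 2 + 3 + 4 + 5
```

Keeping calls like these outside inner loops avoids paying the boundary cost per element.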

I/O / stdlib: print, open, read, write. pathlib.Path (similar to Python). The standard library is still small — many Python-style utilities (json, re, subprocess) are not yet in stdlib; the recommended path is Python interop until native versions land.

Advanced

Memory model: value semantics + ownership. A struct controls its lifecycle through __copyinit__ (copy constructor), __moveinit__ (move constructor), and __del__ (destructor); omitting __copyinit__ makes the type non-copyable. The compiler enforces:

  • An owned value has exactly one owner.
  • Borrows (read, mut) cannot outlive the owner.
  • Moves (^) end the source’s lifetime — using it after is a compile error.

from memory import UnsafePointer

struct Buffer:
    var data: UnsafePointer[Float32]
    var size: Int
 
    fn __init__(out self, size: Int):
        self.size = size
        self.data = UnsafePointer[Float32].alloc(size)
 
    fn __moveinit__(out self, owned existing: Self):
        self.data = existing.data
        self.size = existing.size
 
    fn __del__(owned self):
        self.data.free()

@register_passable("trivial") marks a struct as bitwise-copyable (no destructor, no special copy semantics) — used for Int, Bool, SIMD, pointers. They’re passed in registers and don’t need ownership tracking.

SIMD + autovectorisation:

  • SIMD[DType.float32, simdwidthof[DType.float32]()]() is the portable max-width vector for the target.
  • simdwidthof[T]() is compile-time: for f32 it is 4 on 128-bit SSE/NEON, 8 on AVX2, and 16 on AVX-512; GPU targets report their own widths.
  • vectorize[fn, w] rolls a scalar inner loop into width-w SIMD with a remainder.
  • vectorize_unroll, unroll, tile are loop transformations as library functions (parametric).

GPU kernels:

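from gpu import block_idx, thread_idx
from gpu.host import DeviceContext
from memory import UnsafePointer

fn vector_add(a: UnsafePointer[Float32],
              b: UnsafePointer[Float32],
              c: UnsafePointer[Float32], n: Int):
    # One thread per element; 256 must match block_dim below.
    var i = block_idx.x * 256 + thread_idx.x
    if i < n:  # guard: the last block may run past n
        c[i] = a[i] + b[i]

def main():
    # a_ptr, b_ptr, c_ptr (device pointers) and n (element count) are
    # placeholders; allocation and host-to-device copies are omitted here.
    var ctx = DeviceContext()
    ctx.enqueue_function[vector_add](a_ptr, b_ptr, c_ptr, n,
                                     grid_dim=(n // 256 + 1,),
                                     block_dim=(256,))
    ctx.synchronize()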

The same fn body can target NVIDIA SM_80+ (Ampere/Hopper/Blackwell), AMD CDNA3, and (preview) Apple Metal — selection is by DeviceContext runtime config + MLIR target lowering.

MAX integration: Mojo is the kernel language for the MAX inference engine. Models loaded via the MAX Python API (max.engine.InferenceSession) execute through a graph of Mojo kernels that the Modular team has tuned for each target. Benchmarks: Llama 3.1 8B at FP8 on H100 reportedly outperforms vLLM 0.6 by 20–40% on throughput (Modular’s published numbers, 2025-Q4).

FFI / interop:

  • C ABI: Mojo can call C functions via external_call["c_function_name", ReturnType](args...). No header parsing — declare types manually.
  • Python: as above, via the python stdlib module.
  • MLIR: Mojo can emit raw MLIR via the __mlir_op builtin — this is how kernel libraries hand-tune to specific dialects (e.g., nvgpu, gpu, vector).
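
A small sketch of the C ABI path, assuming a POSIX system where getpid is available in the process's C library. The return type is declared by hand because no header is parsed.

```mojo
from sys import external_call

def main():
    # getpid() takes no arguments and returns a C int (pid_t on POSIX).
    var pid = external_call["getpid", Int32]()
    print("pid:", pid)
```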

Reflection: compile-time reflection via parametric structs and the @parameter decorator lets you inspect a type’s traits, field count, and member types at compile time. Runtime reflection is intentionally minimal (kept lean for performance).

Performance tools:

  • benchmark module in the stdlib (import benchmark, then benchmark.run[fn]() returns a timing report): wraps timing with statistical filtering.
  • nsys / ncu (Nsight Systems / Nsight Compute) for GPU profiling: Mojo emits standard PTX, so NVIDIA’s tools work. The older nvprof does not support the SM_80+ parts targeted here.
  • The MLIR pass pipeline can be dumped: mojo build --emit=mlir foo.mojo shows each lowering stage.
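
A hedged sketch of timing a function with the stdlib benchmark module, assuming the module-level benchmark.run and benchmark.keep entry points are available in your toolchain. The function work is a hypothetical workload.

```mojo
import benchmark

fn work():
    var total = 0
    for i in range(1000000):
        total += i
    # Ask the compiler to treat the result as used, so the loop is kept.
    benchmark.keep(total)

def main():
    # run repeats `work` and collects timings into a report.
    var report = benchmark.run[work]()
    print("mean seconds:", report.mean())
```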

God mode

Parametric metaprogramming: Mojo’s distinguishing feature is that types and values are uniformly handled at compile time. Function and struct parameters ([T: AnyType, N: Int]) are evaluated by the compiler before codegen, letting you write hardware-portable libraries.

fn matmul[
    M: Int, N: Int, K: Int,
    dtype: DType = DType.float32,
](
    c: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
):
    alias tile_m = 32
    alias tile_n = 32
    alias tile_k = 8
    # tile + vectorize + unroll the kernel here

Each unique (M, N, K, dtype) instantiation gets its own specialised code object.

@parameter decorator: applied to a closure or for-loop, asks the compiler to evaluate at compile time / unroll fully:

@parameter
for i in range(4):
    print("compile-time unroll iter", i)
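
Putting compile-time parameters and @parameter together, here is a minimal sketch in which the loop bound is a parameter and the whole call can be folded into a constant. The function sum_first is a hypothetical example, not a library function.

```mojo
fn sum_first[n: Int]() -> Int:
    var total = 0

    # n is a compile-time parameter, so this loop is fully unrolled.
    @parameter
    for i in range(n):
        total += i
    return total

def main():
    # Binding the result to an alias forces evaluation at compile time.
    alias s = sum_first[5]()
    print(s)  # 0 + 1 + 2 + 3 + 4 = 10
```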

Traits + conformance:

trait Drawable:
    fn draw(self): ...
 
struct Circle(Drawable):
    var r: Float64
    fn draw(self): print("circle r=", self.r)

Conformance is nominal: a struct opts in by listing the trait (here Drawable), and the compiler checks that every required method exists with a matching signature.
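
Traits become useful once a function is generic over them. The sketch below adds a second conforming struct and a generic function; Square and render are hypothetical names introduced for illustration, and explicit __init__ methods are written out to avoid depending on version-specific decorators.

```mojo
trait Drawable:
    fn draw(self): ...

struct Circle(Drawable):
    var r: Float64

    fn __init__(out self, r: Float64):
        self.r = r

    fn draw(self):
        print("circle r=", self.r)

struct Square(Drawable):
    var side: Float64

    fn __init__(out self, side: Float64):
        self.side = side

    fn draw(self):
        print("square side=", self.side)

# T is resolved at compile time, so each call is statically dispatched.
fn render[T: Drawable](shape: T):
    shape.draw()

def main():
    render(Circle(1.5))
    render(Square(2.0))
```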

__mlir_attr / __mlir_op / __mlir_type: raw MLIR escape hatches for library authors to define new builtins, target-specific intrinsics, or attach attributes that downstream MLIR passes recognise. This is how SIMD, UnsafePointer, and the GPU intrinsics are implemented.

ABI / linkage: mark fn with @export to expose with a C-compatible symbol. @always_inline, @no_inline, @register_passable are tuning knobs from the LLVM/Swift heritage.

Variant / sum types: Variant[Int, String, Float64]. Until Mojo gets pattern matching, dispatch is via if v.isa[Int](): .... The roadmap includes match syntax (target v26).
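
A minimal sketch of isa-based dispatch, assuming Variant is importable from the utils package as in 25.x-era releases. The alias IntOrString and the function describe are hypothetical names.

```mojo
from utils import Variant

alias IntOrString = Variant[Int, String]

fn describe(v: IntOrString):
    # isa[T]() checks the active alternative; v[T] then reads it.
    if v.isa[Int]():
        print("int:", v[Int])
    else:
        print("string:", v[String])

def main():
    describe(IntOrString(42))
    describe(IntOrString(String("hi")))
```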

Idioms & style

  • fn for hot paths, def for glue. Library code uses fn with full type signatures; scripts and notebooks lean on def.
  • SIMD-first kernels: prefer SIMD[T, simdwidthof[T]()] over scalar loops; combine with vectorize over a scalar element accessor.
  • Parametric over generic when shapes are known at compile time: for example, Matrix[DType.float32, 128, 64] can outperform a runtime-shape matrix because the compiler can fully tile + unroll.
  • Own + move, don’t copy. Use ^ to transfer; structs that hold large buffers should implement __moveinit__ but not __copyinit__ to forbid implicit copies.
  • Naming: snake_case for functions and variables (Python heritage), PascalCase for structs and traits, SCREAMING_SNAKE for alias constants, _leading_underscore for private.
  • Formatter: mojo format (built-in, since v24.4) — opinionated, ruff-fast.
  • Linter: the compiler itself surfaces most issues; no separate lint tool yet.
  • Doc strings: triple-quoted under fn/struct/trait; rendered by mojo doc (since v24.6, generates Markdown).
  • What reviewers flag: def in performance code (forces dynamic dispatch + raises), missing ^ on owned args (causes silent copies), String in tight loops (heap allocation — use StringRef or StringLiteral), Python interop in a hot path, unparametric kernels where parameters are statically known, unsafe pointer use without ownership invariants.

Ecosystem

The Mojo ecosystem in 2026-Q2 is early but accelerating post open-source. Key components:

Area | Project / Library | Notes
Build / env | pixi | Official package + env manager
Inference runtime | MAX | Modular’s commercial inference engine; runs Llama, Mistral, DeepSeek, Gemma, Qwen with Mojo kernels
LLM serving | MAX Serve | OpenAI-compatible REST endpoint built on MAX
GPU kernels | MAX Driver Kernels (open-source since 2025-05) | Hand-tuned matmul, attention, layernorm, GEMM for NVIDIA + AMD
Numerics | stdlib.builtin, tensor, algorithm, math | Built-in modules; growing each release
Notebooks | Jupyter kernel (Modular) | mojo jupyter
Editor | VS Code extension (Modular) | LSP, debugger, formatter
Community | community.modular.com, Discord (~25k members), github.com/modular/max and github.com/modular/mojo |
Python bridge | python stdlib module | CPython interop with Python.import_module
Comparison points | julia, python + NumPy/Numba, rust, cpp + CUDA |

Notable users / adopters (2025): Modular itself (MAX inference engine in production) and early adopters in AI infrastructure teams; Modular blog posts publicly reference a few companies running custom LLM inference. Adoption beyond Modular is still small but growing fast since the May 2025 open-source release.

Gotchas

  • Syntax churn: Mojo is pre-1.0. Breaking changes between minor releases are normal. Pin your pixi lockfile and re-test on each upgrade. Examples of recent breaks: let removed v24.4; borrowed → read, inout → mut v25.x; ownership-decorator conventions tightened twice in 2025.
  • Stdlib is thin. No mature json, re, http, csv yet. The interim pattern is “call Python for non-perf-critical I/O.” That means pip install + Python.import_module(...).
  • Python interop has overhead. Every value crossing the boundary boxes through PythonObject; the GIL is acquired. Don’t put a Python call inside a hot inner loop.
  • No stable ABI. Mojo libraries must be recompiled against each compiler release. There is no plan for a stable binary interface before v1.0.
  • Ownership errors are compile errors, not runtime panics. Good for safety; the error messages have improved significantly through 2025 but can still be intimidating for Python users new to ownership.
  • def vs fn confusion: def quietly raises and uses dynamic dispatch; fn requires types and raises declaration. Mixing them in a callgraph can produce surprising error-propagation paths.
  • Reference counting (ArcPointer[T]) is opt-in, not default. Coming from Swift, this trips people — Mojo’s default is move semantics, not refcounting.
  • GPU support is NVIDIA-first. AMD support landed v25.x but lags in kernel coverage; Apple Metal preview is limited. Don’t assume a Mojo program will magically run on every accelerator.
  • String is not Python’s str. Mojo String is byte-storage; codepoint operations require explicit calls. Better Unicode is on the roadmap.
  • Compile times can be slow on parametric-heavy code (matmul kernels with many specialisations). Use mojo build --no-debug-info -O3 for release builds; expect 5–60s for medium libraries.
  • Pre-1.0 means no semver guarantee. Production deployments should plan for upgrade churn through 2026.
  • License nuance: the language and stdlib are Apache 2.0; the MAX inference engine commercial features (multi-replica serving, enterprise support, certain hand-tuned kernel packages) are still under Modular’s commercial license. Reading kernel source is fine; redistributing the MAX runtime as part of your product may require an agreement.

Citations