OCaml — Reference

Source: https://ocaml.org/docs

OCaml

  • Created: 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, and Ascánder Suárez at INRIA. Descends from Caml (1985), itself from ML (Robin Milner, late 1970s) (Wikipedia).
  • Latest stable: OCaml 5.4.1 (ocaml.org/docs).
  • OCaml 5.0: 2022. Multicore runtime (effect handlers + parallel domains); the single global runtime lock no longer serializes all OCaml code.
  • Owner: OCaml team at Inria; OCaml Software Foundation (OCSF, since 2018) provides governance and funding. License: LGPL 2.1 with linking exception (compiler under specific terms; libraries effectively permissive).
  • Paradigms: functional (primary), imperative, object-oriented, modular.
  • Typing: static, strong, Hindley-Milner type inference (you almost never write a type), parametric polymorphism, GADTs, polymorphic variants, structurally-typed objects, modules + functors as a higher-order type system.
  • Memory: tracing, generational, incremental garbage collector. Multicore (OCaml 5+): parallel minor + major collectors, per-domain minor heap.
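  • Compilation: ocamlc produces bytecode (portable, fast startup), ocamlopt produces native code (fast; the optional Flambda optimizer applies only when the compiler was configured with it, and Flambda2 is developed in Jane Street’s compiler fork). js_of_ocaml and melange target JavaScript.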
  • Primary domains: compilers (Hack, Flow, ReScript, Coq, F*), formal methods, financial systems (Jane Street), static analysis (Infer), language tooling.
  • Official docs: https://ocaml.org/docs and https://v2.ocaml.org/manual/

At a glance

OCaml is the most production-deployed ML-family language. Hindley-Milner inference catches whole classes of bugs (type mismatches, missing match cases, unhandled None) at compile time without annotation noise. The module system, with functors (modules parameterized by other modules), is its most distinctive feature, used as a structural mechanism for code reuse and dependency injection beyond what most languages offer. OCaml 5 added effect handlers and shared-memory parallelism, a major capability shift after 25 years of single-domain-only execution. Jane Street’s open-source contributions (Base, Core, Async, ppxes) define modern OCaml style.

Getting started

Install: use opam (the package manager and compiler installer):

# Linux/macOS
bash -c "sh <(curl -fsSL https://opam.ocaml.org/install.sh)"
opam init        # bootstraps a switch
opam switch create 5.4.1
eval $(opam env)

Or brew install opam, apt install opam. Windows: native opam supported since opam 2.2 (no more cygwin/wsl required).

Hello world (hello.ml):

let () = print_endline "Hello, world!"

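Compile + run: ocaml hello.ml (script-style interpreter), or ocamlfind ocamlopt -package str -linkpkg hello.ml -o hello && ./hello. The -package str -linkpkg flags only illustrate linking a findlib package; hello.ml needs none, so ocamlopt hello.ml -o hello works too. With dune (the modern way): inside a dune project whose dune file contains (executable (name hello)), run dune exec ./hello.exe.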

REPL: ocaml (top-level) or the much-improved utop (community standard, syntax highlighting + completion + history): opam install utop then utop. Use ;; to terminate top-level expressions.

Project layout (a dune project):

myproj/
  dune-project          # project metadata
  myproj.opam           # opam package metadata (often dune-generated)
  bin/
    dune                # (executable (name main))
    main.ml
  lib/
    dune                # (library (name myproj))
    util.ml
    types.ml
  test/
    dune                # (test (name test_myproj))
    test_myproj.ml

Package/build tool: opam for packages (https://opam.ocaml.org/packages/); dune for builds (dune build, dune test, dune exec). dune files are S-expression descriptions of targets. Together, opam + dune is the canonical modern stack.

Basics

Types and literals:

  • int (63-bit on 64-bit platforms — one bit reserved for GC tagging), int32, int64, nativeint, float, bool, char (single byte, not Unicode), string (immutable since 4.06), bytes (mutable byte string), unit (()), 'a list, 'a array, 'a option (= None | Some of 'a), ('a, 'b) result (= Ok of 'a | Error of 'b), tuples (1, "a"), records { x : int; y : int }, variants type t = A | B of int | C of string * int, 'a ref (mutable cell).
  • Arithmetic operators are monomorphic: + is int, +. is float, ^ is string concat. (No overloading without modular implicits or ppxes.)

Variables/scoping: let x = expr binds, let x = e1 in e2 is local. All bindings are immutable by default; mutation via ref (let r = ref 0 in r := !r + 1) or mutable record fields ({ mutable x : int }, updated with p.x <- v). Shadowing is normal: let x = 1 in let x = x + 1 in x.
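
A minimal sketch of the three mechanisms above (shadowing, ref cells, mutable record fields). The names shadowed, counter, point, and p are illustrative only and do not come from any library.

```ocaml
(* Shadowing: the inner x is a new immutable binding, not a mutation. *)
let shadowed = let x = 1 in let x = x + 1 in x

(* A ref is a mutable cell: := assigns, ! dereferences. *)
let counter = ref 0
let () = counter := !counter + 1

(* A mutable record field is updated with <-. The point type is hypothetical. *)
type point = { mutable x : int; y : int }
let p = { x = 0; y = 5 }
let () = p.x <- p.x + 10
```

Only the field marked mutable can be assigned; p.y stays fixed for the lifetime of the record.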

Control flow: if e1 then e2 else e3. Pattern match — the workhorse:

match shape with
| Circle r -> 3.14 *. r *. r
| Rectangle (w, h) -> w *. h
| Triangle (a, b, c) -> ... (* compiler warns if non-exhaustive *)

for i = 1 to 10 do ... done, while cond do ... done. No break/continue; use exceptions or recursion.
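
A self-contained sketch of the two points above: an exhaustive match over a variant, and leaving a for loop early with an exception since there is no break. The shape type is assumed here (the source only shows the match), the Triangle payload is assumed to hold three side lengths, and Found and first_negative are hypothetical names.

```ocaml
type shape =
  | Circle of float
  | Rectangle of float * float
  | Triangle of float * float * float   (* assumed: three side lengths *)

let area shape =
  match shape with
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h
  | Triangle (a, b, c) ->
    (* Heron's formula from the three side lengths *)
    let s = (a +. b +. c) /. 2.0 in
    sqrt (s *. (s -. a) *. (s -. b) *. (s -. c))

(* No break: leave the loop early by raising a local exception. *)
exception Found of int

let first_negative arr =
  try
    for i = 0 to Array.length arr - 1 do
      if arr.(i) < 0 then raise (Found i)
    done;
    None
  with Found i -> Some i
```

Removing any case from area makes the compiler report the missing constructor, which is the exhaustiveness check the comment in the match above refers to.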

Functions: everything is curried by default.

let add x y = x + y          (* val add : int -> int -> int *)
let inc = add 1              (* partial application *)
let plus = fun x y -> x + y  (* lambda *)
 
(* labeled args *)
let div ~num ~den = num / den
let five = div ~num:10 ~den:2            (* 5 *)

(* optional args *)
let greet ?(greeting="Hello") name = greeting ^ ", " ^ name
let g1 = greet "world"                   (* "Hello, world" *)
let g2 = greet ~greeting:"Hi" "world"    (* "Hi, world" *)

Strings: immutable since 4.06. String.length, String.sub s pos len, String.concat ", " ["a"; "b"]. UTF-8 not native — string is bytes; use Uucp/Uutf for Unicode. Sprintf-style: Printf.sprintf "x = %d" 42.
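
The string functions named above, as a short sketch using only the standard library. The binding names are illustrative.

```ocaml
let s = "hello, world"
let len = String.length s                           (* 12 *)
let hello = String.sub s 0 5                        (* "hello" *)
let joined = String.concat ", " ["a"; "b"; "c"]     (* "a, b, c" *)
let msg = Printf.sprintf "x = %d, y = %.2f" 42 1.5  (* "x = 42, y = 1.50" *)

(* A string is a byte sequence: U+00E9 encoded as UTF-8 takes two bytes. *)
let e_acute_bytes = String.length "\xc3\xa9"
```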

Collections: 'a list (singly-linked, immutable: [1; 2; 3], 1 :: [2; 3]); 'a array (mutable, fixed-length: [|1; 2; 3|], a.(i)); Hashtbl (mutable); Map.Make(K) and Set.Make(K) (functor-built immutable maps/sets); Queue, Stack, Buffer. Modern: Jane Street’s Base.Map, Base.Hashtbl.
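
A brief tour of the standard-library collections listed above. StringMap, ages, and the other binding names are assumptions made for this sketch.

```ocaml
(* Lists: immutable, prepend with :: *)
let xs = 0 :: [1; 2; 3]
let total = List.fold_left ( + ) 0 xs

(* Arrays: mutable, fixed length, constant-time indexing *)
let a = [| 1; 2; 3 |]
let () = a.(0) <- 10

(* Hashtbl: mutable hash table *)
let ages : (string, int) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace ages "ada" 36
let ada = Hashtbl.find_opt ages "ada"

(* Map.Make: immutable map built by applying a functor to a key module *)
module StringMap = Map.Make (String)
let m = StringMap.(empty |> add "one" 1 |> add "two" 2)
let two = StringMap.find_opt "two" m
let missing = StringMap.find_opt "three" m
```

StringMap.add returns a new map and leaves the old one untouched, whereas Hashtbl.replace mutates in place.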

Intermediate

Type system depth: this is OCaml’s heart.

  • Type inference (Hindley-Milner): most code needs no annotations. Polymorphism is inferred: let id x = x has type 'a -> 'a.
  • Algebraic data types (ADTs): type 'a tree = Leaf | Node of 'a * 'a tree * 'a tree. Pattern match exhaustively or compiler warns.
  • Polymorphic variants: [ `Apple | `Banana of int ]. Open variant types, no declaration needed; useful for extensible APIs (heavy use in js_of_ocaml).
  • Records: nominal, can have mutable fields. Field-based polymorphism via row types in objects.
  • Objects: structurally typed (< x : int; y : int > is an object type). Method dispatch is by name; subtyping is structural. Used less than ML core.
  • GADTs (Generalized Algebraic Data Types): type _ expr = IntL : int -> int expr | Add : int expr * int expr -> int expr | Eq : 'a expr * 'a expr -> bool expr. Enable type-safe interpreters, length-indexed vectors, etc.
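
To make the "type-safe interpreter" claim concrete, here is a sketch of an evaluator for the expr GADT from the last bullet. The eval function is an illustration written for this reference, not taken from a library.

```ocaml
type _ expr =
  | IntL : int -> int expr
  | Add : int expr * int expr -> int expr
  | Eq : 'a expr * 'a expr -> bool expr

(* The annotation "type a." makes a locally abstract, so each branch may
   refine it: int for IntL and Add, bool for Eq. *)
let rec eval : type a. a expr -> a = function
  | IntL n -> n
  | Add (x, y) -> eval x + eval y
  | Eq (x, y) -> eval x = eval y
```

Ill-typed terms such as Add (IntL 1, Eq (IntL 1, IntL 1)) are rejected by the type checker, so eval needs no runtime tag checks.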

Modules are a separate language layered on top:

module Stack = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
  let pop = function [] -> None | x :: s -> Some (x, s)
end
 
(* Module signatures (interfaces): *)
module type STACK = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
end
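
A sketch of how a signature is applied to an implementation: sealing with : STACK makes 'a t abstract outside the module. ListStack and top are hypothetical names used only for this example.

```ocaml
module type STACK = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
end

(* Sealing hides the fact that the representation is a list. *)
module ListStack : STACK = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
  let pop = function [] -> None | x :: s -> Some (x, s)
end

let top =
  match ListStack.(pop (push 2 (push 1 empty))) with
  | Some (x, _) -> Some x
  | None -> None
```

Outside ListStack, an expression such as List.length (ListStack.push 1 ListStack.empty) no longer type-checks, because 'a ListStack.t is not known to be a list.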

Functors are modules parameterized by modules:

module MakeSet (Ord : sig type t val compare : t -> t -> int end) = struct
  type elt = Ord.t
  type t = elt list
  (* ... *)
end
module IntSet = MakeSet(struct type t = int let compare = compare end)

Modules are the dependency-injection mechanism of OCaml.
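
A fuller sketch of the MakeSet functor, with the elided body filled in by one possible implementation (a sorted list without duplicates). The ORDERED signature name and the add, mem, and elements functions are assumptions for illustration; the source leaves the body unspecified.

```ocaml
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

module MakeSet (Ord : ORDERED) = struct
  type elt = Ord.t
  type t = elt list            (* invariant: sorted, no duplicates *)

  let empty : t = []

  let rec add x = function
    | [] -> [x]
    | y :: rest as s ->
      let c = Ord.compare x y in
      if c = 0 then s                 (* already present *)
      else if c < 0 then x :: s       (* insert before the first larger element *)
      else y :: add x rest

  let mem x s = List.exists (fun y -> Ord.compare x y = 0) s
  let elements (s : t) : elt list = s
end

(* The same code, injected with two different orderings. *)
module IntSet = MakeSet (Int)
module StringSet = MakeSet (String)
```

Swapping the argument module changes the ordering without touching MakeSet, which is the dependency-injection use described above.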

Error handling: two camps.

  1. Exceptions: raise (Failure "msg"), try expr with Failure m -> .... Cheap to raise and catch.
  2. option / result: typed errors (Ok v | Error e). Modern style (Jane Street’s Base/Core, the result library) prefers result. Define let ( let* ) = Result.bind, then write let* x = e in ... for pipelines.
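
Both camps side by side, as a sketch. parse_int, safe_div, divide_strings, and divide_exn are hypothetical helpers; binding operators such as let* need OCaml 4.08 or later.

```ocaml
(* let* x = e in body  is sugar for  Result.bind e (fun x -> body) *)
let ( let* ) = Result.bind

let parse_int s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

let safe_div a b = if b = 0 then Error "division by zero" else Ok (a / b)

(* result style: the first Error short-circuits the rest of the pipeline. *)
let divide_strings a b =
  let* x = parse_int a in
  let* y = parse_int b in
  safe_div x y

(* exception style, converted to a result at the boundary *)
let divide_exn a b =
  try Ok (int_of_string a / int_of_string b) with
  | Failure msg -> Error msg
  | Division_by_zero -> Error "division by zero"
```

The result version makes every failure mode visible in the type (int, string) result; the exception version relies on the caller knowing which exceptions can escape.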

Concurrency primitives:

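  • Pre-OCaml 5: no parallelism inside one process. Concurrency is cooperative via Lwt (lightweight threads, monadic) or Async (Jane Street, monadic), both promise-based and single-threaded. The Thread module offers preemptive system threads, but a master runtime lock lets only one of them run OCaml code at a time.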
  • OCaml 5+: Domains — true OS threads with parallel GC. Domain.spawn (fun () -> ...). Effect handlers for structured concurrency / control flow inversion. Eio is the modern effect-based I/O library; domainslib for parallelism patterns.
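
A minimal Domain.spawn sketch, assuming OCaml 5 or later: half of a sum runs in a second domain while the calling domain computes the other half. sum_range and parallel_sum are hypothetical names.

```ocaml
(* Requires OCaml 5+. *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi do acc := !acc + i done;
  !acc

let parallel_sum n =
  let mid = n / 2 in
  (* The spawned domain runs in parallel on its own OS thread. *)
  let left = Domain.spawn (fun () -> sum_range 1 mid) in
  let right = sum_range (mid + 1) n in
  (* Domain.join waits for the domain and returns its result. *)
  Domain.join left + right
```

Each domain here works on its own local ref, so nothing mutable is shared; that is the simplest way to stay free of data races.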

I/O: print_string, print_endline, print_int, Printf.printf. Files: open_in/open_out, input_line, output_string, close_in. Modern: Bos (filesystem ops), Eio (effect-based concurrent I/O), Lwt_io / Async.Reader/Writer (older monadic).
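
A sketch of the channel functions above for writing and reading a text file. write_lines and read_lines are hypothetical helpers; Fun.protect (OCaml 4.08+) ensures the channel is closed even if an exception is raised.

```ocaml
let write_lines path lines =
  let oc = open_out path in
  Fun.protect ~finally:(fun () -> close_out oc) (fun () ->
    List.iter (fun l -> output_string oc (l ^ "\n")) lines)

let read_lines path =
  let ic = open_in path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
    (* input_line raises End_of_file at the end of the channel. *)
    let rec loop acc =
      match input_line ic with
      | line -> loop (line :: acc)
      | exception End_of_file -> List.rev acc
    in
    loop [])
```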

Stdlib highlights: List, Array, String, Bytes, Buffer, Hashtbl, Map.Make, Set.Make, Queue, Stack, Printf/Scanf/Format, Random, Sys (env, argv), Filename, Marshal (serialization, unsafe), Atomic (4.12+), Domain (5+), Effect (5+), Mutex/Condition/Semaphore (in Stdlib since 5.0; previously in the threads library). Shipped with the compiler as separate libraries: Unix (POSIX) and threads (Thread, Event).

Advanced

Memory / GC:

  • Generational, incremental tracing GC. Two heaps: minor (per-domain in 5+, very fast bump allocation) and major (mark-and-sweep with optional compaction).
  • Pre-5: a single global runtime lock (the master lock) prevented true parallelism.
  • 5+: multicore — each Domain has its own minor heap; major heap is shared with parallel marking. OCAMLRUNPARAM=v=0x400 to dump GC stats.
  • Obj.magic and Marshal bypass the type system — never use except in compiler/library internals.

Concurrency deep dive:

  • Domains = OS threads with isolated minor heaps. Use for CPU-parallel work.
  • Fibers / Effects: OCaml 5 added algebraic effects. Define an effect, perform it, handle it elsewhere — the basis of Eio. Replaces continuation-passing for async I/O.
  • Eio is the modern effect-based concurrency library: structured concurrency, switches, fibers, capabilities. https://github.com/ocaml-multicore/eio.
  • Lwt and Async (cooperative, single-domain) remain the most-deployed today; they will coexist with Eio for years.
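
The "define an effect, perform it, handle it elsewhere" cycle, as a minimal sketch using the Effect module of OCaml 5. Ask, computation, and run_with are hypothetical names; libraries such as Eio build far richer schedulers on the same primitive.

```ocaml
(* Requires OCaml 5+. *)
open Effect
open Effect.Deep

(* 1. Define an effect whose result, when performed, is an int. *)
type _ Effect.t += Ask : int Effect.t

(* 2. Perform it: this code does not know who will answer. *)
let computation () = perform Ask + perform Ask

(* 3. Handle it elsewhere: resume the suspended computation with a value. *)
let run_with (answer : int) =
  try_with computation ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask -> Some (fun (k : (a, _) continuation) -> continue k answer)
        | _ -> None) }
```

The continuation k is the rest of computation after perform Ask; an I/O scheduler would store k and resume it once the awaited event completes instead of continuing immediately.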

FFI:

  • C interop is mature: write external my_fn : int -> int = "caml_my_fn" in OCaml; implement value caml_my_fn(value n) { ... } in C using OCaml’s runtime macros (Val_int, Int_val, caml_alloc_*, CAMLparam*/CAMLreturn to register roots).
  • dune integrates C compilation: (foreign_stubs (language c) (names my_stubs)).
  • ctypes library for safer, declarative bindings without writing C glue.
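  • ctypes stub generation (the Cstubs module, which dune can drive through its ctypes support) produces the C stubs from the OCaml binding description.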

Reflection: minimal at runtime — runtime type info is erased after compilation. Compile-time reflection is via PPX preprocessors (see God mode).

Performance tools:

  • ocamlopt -O3 (uses Flambda if compiler was built with --enable-flambda).
  • OCAMLRUNPARAM=v=0x400 for GC trace.
  • perf record ./prog (Linux) — OCaml emits good DWARF.
  • landmarks library: instrument with [@landmark] attributes; produces flame graphs.
  • magic-trace (Jane Street, Linux, Intel-only) for nanosecond-resolution traces.
  • ocaml-memtrace + memtrace_viewer (Jane Street) for allocation profiling.
  • core_bench for microbenchmarks.

God mode

PPX preprocessor extensions: OCaml’s metaprogramming. PPXes rewrite the AST after parsing but before type-check. Built on ppxlib. Examples:

  • [@@deriving show, eq, yojson] auto-generates printers, equality, and JSON serializers.
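  • [%mel.obj { x = 1 }] (Melange; spelled [%bs.obj ...] in BuckleScript) builds a JS object literal; js_of_ocaml’s ppx uses object%js ... end instead.
  • let%lwt x = e in ... (lwt_ppx) and let%bind / let%map (ppx_let): monadic syntax.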
  • Write your own with ppxlib: define a transformation from Parsetree to Parsetree, register it, and dune’s (preprocess (pps your_ppx)) runs it.

GADTs: type-indexed term language; lets you encode invariants in the type.

type _ ty = Int : int ty | Bool : bool ty
let cast : type a. a ty -> int -> a = function
  | Int -> fun n -> n
  | Bool -> fun n -> n <> 0

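Pattern-matching on a GADT constructor refines the type index: in the Int branch a is known to be int, in the Bool branch bool, so the two branches can return values of different types.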

First-class modules: pack a module into a value. let m = (module IntSet : SET with type elt = int); unpack with let module M = (val m) in .... Enables runtime module selection — dependency injection without functors.
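
A sketch of packing and unpacking, assuming a small hypothetical SHOW signature (the SET signature mentioned above is not defined in this reference). ShowInt, ShowBool, and show_list are illustrative names.

```ocaml
module type SHOW = sig
  type t
  val show : t -> string
end

module ShowInt = struct type t = int let show = string_of_int end
module ShowBool = struct type t = bool let show = string_of_bool end

(* Pack modules into ordinary values. *)
let show_int = (module ShowInt : SHOW with type t = int)
let show_bool = (module ShowBool : SHOW with type t = bool)

(* Accept a packed module as an argument and unpack it with (val ...). *)
let show_list (type a) (m : (module SHOW with type t = a)) (xs : a list) =
  let module M = (val m) in
  "[" ^ String.concat "; " (List.map M.show xs) ^ "]"
```

Because show_int and show_bool are plain values, they can be stored in data structures or chosen at runtime, which a functor application cannot do.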

Functors: parameterize a module by a module signature. The compiler checks at each application that the argument satisfies the signature. Building blocks for whole architectures (the standard library’s Map.Make, Set.Make, and Hashtbl.Make are functors, and Jane Street’s Core offers functor-built maps and sets as well).

compiler-libs: the OCaml compiler exposes its parser, type-checker, and interpreter as a library. Build code editors, refactoring tools, custom REPLs.

Bytecode vs native: ocamlc emits .cmo bytecode, runnable on any platform with ocamlrun; great for portability and very fast startup. ocamlopt emits native object code (x86_64, aarch64, riscv64, etc. backends); much faster at runtime, larger binaries.

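Flambda / Flambda2: optional optimization middle-end. It must be built into the compiler: for example opam switch create 5.4.1+flambda ocaml-variants.5.4.1+options ocaml-option-flambda (the switch name is arbitrary). Then pass -O3 to ocamlopt, e.g. (ocamlopt_flags (:standard -O3)) in a dune file; without Flambda the flag is accepted and ignored. Inlining, specialization, simplification; speedups in the 5-30% range are reported on idiomatic code, so measure on your own workload.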

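Multicore (OCaml 5+): effect handlers as a low-level concurrency primitive. Eio builds structured concurrency on top. Domain.spawn for true parallelism, but avoid sharing mutable state: plain ref cells, arrays, and mutable fields are not synchronized. 'a Atomic.t can hold a value of any type (compare_and_set compares by physical equality); guard larger structures with Mutex.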
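
A sketch of safe sharing across domains with Atomic, assuming OCaml 5 or later. count_in_parallel and push_atomic are hypothetical helpers written for this reference.

```ocaml
(* Requires OCaml 5+. *)
let count_in_parallel ~domains ~per_domain =
  let counter = Atomic.make 0 in
  let worker () =
    for _i = 1 to per_domain do Atomic.incr counter done
  in
  let ds = List.init domains (fun _ -> Domain.spawn worker) in
  List.iter Domain.join ds;
  Atomic.get counter

(* 'a Atomic.t is not limited to ints: here it holds an immutable list.
   compare_and_set succeeds only if the cell still holds the value read. *)
let push_atomic (stack : 'a list Atomic.t) (x : 'a) =
  let rec retry () =
    let old = Atomic.get stack in
    if not (Atomic.compare_and_set stack old (x :: old)) then retry ()
  in
  retry ()
```

Replacing the Atomic counter with a plain int ref would compile but could lose increments under contention, since the read-modify-write on a ref is not atomic.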

js_of_ocaml: compiles bytecode to JavaScript — runs essentially any OCaml program in the browser/Node. Used by Flow, Coq’s jsCoq, ReScript’s predecessor. Melange (fork of BuckleScript): compiles typed AST directly to ergonomic JS with great FFI; used in React-like frontend stacks. WasmGC backend (OCaml 5+): emerging.

MetaOCaml: a research dialect adding multi-stage programming: .<expr>. quotes code, .~e splices (antiquotes), Runcode.run e evaluates. Generate specialized code at runtime, then compile and run it.

Idioms & style

  • Naming: snake_case for values, functions, and types (exn, option); Capitalized for modules, constructors (Some, Error), and exceptions (Not_found). File foo.ml defines a module Foo automatically.
  • Formatter: ocamlformat (configurable; the de facto standard, integrates with dune via dune build @fmt).
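  • Linter: no single dominant standalone linter; compiler warnings are the main lint, surfaced in editors by merlin and by ocaml-lsp-server (the LSP server built on merlin). Jane Street style leans on ppx_jane and ppx_expect.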
  • Idiomatic patterns:
    • Use match over if for ADTs; let the compiler check exhaustiveness.
    • Use option/result over exceptions for expected errors.
    • Use let* and let+ (binding operators) for monadic chains in Result/Lwt/Eio.
    • Use Map.Make/Set.Make/Hashtbl.Make (functor instantiations) over polymorphic Hashtbl for type safety.
    • One module per file; signatures in .mli files (interface) hide internals.
    • Avoid Obj.magic, Marshal, mutable globals.
  • Expert review focus: non-exhaustive pattern matches (warn-as-error in CI), needless ref/mutable, exception leakage from libraries (document raised exceptions with @raise in (** ... *) doc comments), GADT escape into refutation cases, ppx over-use (slow build), interface (.mli) accidentally widening type abstractions.

Ecosystem

  • Web: Dream (modern, async; recommended for new projects), Opium (Sinatra-style), Cohttp (HTTP client/server), Lwt + Cohttp, Eio + Cohttp.
  • Frontend: js_of_ocaml (any bytecode → JS), Melange (modern, React-friendly), Brr (Browser bindings).
  • Concurrency: Lwt (mature, monadic, single-threaded), Async (Jane Street, monadic, single-threaded), Eio (effect-based, multicore-aware, modern).
  • Data / parsing: Yojson, ppx_yojson_conv, Sexplib, Angstrom (parser combinators), Re (regex), Csv, Owl (numerical computing — NumPy-like).
  • Database: Caqti (interface), Petrol (typed SQL builder), Sqlite3-OCaml, pgx/postgresql-ocaml.
  • Testing: alcotest (lightweight, popular), ppx_inline_test + ppx_expect (Jane Street; tests live next to code; expect tests diff output).
  • Coverage: bisect_ppx.
  • Docs: odoc (the standard); doc comments use (** ... *). dune build @doc generates HTML.
  • Notable users: Jane Street (massive — entire trading system; major opam contributor), Meta/Facebook (Hack, Flow, Infer all in OCaml), Microsoft (F* verifier, parts of Azure SDK gen), Bloomberg, Tezos (blockchain in OCaml), Coq/Rocq (proof assistant), MirageOS (unikernels), Citrix XenServer.

Gotchas

  • int is 63-bit on 64-bit systems (one bit for GC tag). Use Int64.t for bit-exact 64-bit arithmetic.
  • Monomorphic operators: 1 + 2 (int), 1.0 +. 2.0 (float), "a" ^ "b" (string). Mixing without conversion is a type error; convert explicitly, e.g. float_of_int n +. 1.0. Modern style: open Base and use +, +., ^ consistently.
  • Polymorphic comparison =/compare/< uses runtime structural equality; works on most things but blows up on cycles, functions, and abstract types. Use type-specific comparators.
  • == is physical (pointer) equality, not value equality. You almost always want =.
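  • unit vs (): wrap top-level side effects in let () = ...; a bare expression after a definition is parsed as further arguments to it unless separated by ;;. In a sequence e1; e2, a non-unit e1 triggers a warning; discard it deliberately with ignore e1 or let _ = e1.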
  • Imperative loops with ref in functional code is a code smell — usually a List.fold_left or recursion is clearer.
  • Marshal is unsafe: the byte stream carries no type information and is not guaranteed stable across versions, so reading a value back (Marshal.from_string, Marshal.from_channel) at the wrong type is UB.
  • Open variants leak: `Apple can accidentally match another module’s polymorphic variant of the same name.
  • GADT exhaustiveness: pattern compiler may need explicit type annotations to prove a case is impossible (function _ -> .).
  • PPX-induced compile times: heavy [@@deriving] use slows builds significantly. Profile with dune build --verbose.
  • Lwt vs Async vs Eio fragmentation: each has incompatible types and bindings; choosing one locks you in for that codebase.
  • Obj.magic exists and a panicked junior may reach for it; always reject in review.
  • OCaml 5 multicore caveats: data-race-freedom is not statically guaranteed; you must design for it. Atomic.t is the only built-in safe sharing primitive for arbitrary mutable cells.
  • opam switches are heavy: each switch is a full toolchain + library set. Use opam switch create . 5.4.1 (local) to scope to a project.
  • .mli interfaces hide types: a too-restrictive .mli can make types opaque to consumers; balance abstraction vs. usability.
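
Several of the gotchas above can be checked directly. This sketch uses only the standard library; all binding names are illustrative.

```ocaml
(* = is structural, == is physical (pointer) equality. *)
let s1 = String.make 3 'a'
let s2 = String.make 3 'a'
let structurally_equal = s1 = s2      (* same contents *)
let physically_equal = s1 == s2       (* two separate allocations *)

(* Native int is narrower than 64 bits and wraps silently on overflow. *)
let int_bits = Sys.int_size           (* 63 on 64-bit platforms *)
let wrapped = max_int + 1 = min_int

(* Int64 gives full 64-bit arithmetic (also wrapping). *)
let wrapped64 = Int64.add Int64.max_int 1L = Int64.min_int

(* Monomorphic operators: convert before mixing int and float. *)
let mean a b = float_of_int (a + b) /. 2.0

(* Polymorphic equality raises Invalid_argument on functional values. *)
let compare_functions_fails =
  try ignore ((fun x -> x + 1) = (fun x -> x + 1)); false
  with Invalid_argument _ -> true
```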

Citations