OCaml — Reference
Source: https://ocaml.org/docs
OCaml
- Created: 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, and Ascánder Suárez at INRIA. Descends from Caml (1985), itself from ML (Robin Milner, late 1970s) (Wikipedia).
- Latest stable: OCaml 5.4.1 (ocaml.org/docs).
- OCaml 5.0: December 2022. Introduced the multicore runtime (effect handlers + parallel domains) and removed the global runtime lock.
- Owner: OCaml team at Inria; OCaml Software Foundation (OCSF, since 2018) provides governance and funding. License: LGPL 2.1 with linking exception (compiler under specific terms; libraries effectively permissive).
- Paradigms: functional (primary), imperative, object-oriented, modular.
- Typing: static, strong, Hindley-Milner type inference (you almost never write a type), parametric polymorphism, GADTs, polymorphic variants, structurally-typed objects, modules + functors as a higher-order type system.
- Memory: tracing, generational, incremental garbage collector. Multicore (OCaml 5+): parallel minor + major collectors, per-domain minor heap.
- Compilation: ocamlc produces bytecode (portable, fast startup); ocamlopt produces native code (fast, optimized via Flambda/Flambda2). js_of_ocaml and melange target JavaScript.
- Primary domains: compilers (Hack, Flow, ReScript, Coq, F*), formal methods, financial systems (Jane Street), static analysis (Infer), language tooling.
- Official docs: https://ocaml.org/docs and https://v2.ocaml.org/manual/
At a glance
OCaml is the most production-deployed ML-family language. Hindley-Milner inference catches astonishing numbers of bugs at compile time without annotation noise; the module system (with functors — modules parameterized by other modules) is its most distinctive feature, used as a structural mechanism for code reuse and dependency injection beyond what most languages offer. OCaml 5 added effect handlers and shared-memory parallelism — a major capability shift after 25 years of single-domain-only execution. Jane Street’s open-source contributions (Base, Core, Async, ppxes) define modern OCaml style.
Getting started
Install: use opam (the package manager and compiler installer):
# Linux/macOS
bash -c "sh <(curl -fsSL https://opam.ocaml.org/install.sh)"
opam init # bootstraps a switch
opam switch create 5.4.1
eval $(opam env)
Or brew install opam, apt install opam. Windows: native opam supported since opam 2.2 (no more cygwin/wsl required).
Hello world (hello.ml):
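let () = print_endline "Hello, world!"
Compile + run: ocaml hello.ml (runs the file as a script), or ocamlfind ocamlopt -package str -linkpkg hello.ml -o hello && ./hello (the -package str part is only needed if the program uses the Str library). With dune (the modern way): dune exec ./hello.exe, given a dune file containing (executable (name hello)).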
REPL: ocaml (top-level) or the much-improved utop (community standard, syntax highlighting + completion + history): opam install utop then utop. Use ;; to terminate top-level expressions.
Project layout (a dune project):
myproj/
dune-project # project metadata
myproj.opam # opam package metadata (often dune-generated)
bin/
dune # (executable (name main))
main.ml
lib/
dune # (library (name myproj))
util.ml
types.ml
test/
dune # (test (name test_myproj))
test_myproj.ml
Package/build tool: opam for packages (https://opam.ocaml.org/packages/); dune for builds (dune build, dune test, dune exec). dune files are S-expression descriptions of targets. Together, opam + dune is the canonical modern stack.
Basics
Types and literals:
- int (63-bit on 64-bit platforms; one bit reserved for GC tagging), int32, int64, nativeint, float, bool, char (single byte, not Unicode), string (immutable since 4.06), bytes (mutable byte string), unit (()), 'a list, 'a array, 'a option (= None | Some of 'a), ('a, 'b) result (= Ok of 'a | Error of 'b), tuples (1, "a"), records { x : int; y : int }, variants type t = A | B of int | C of string * int, 'a ref (mutable cell).
- Arithmetic operators are monomorphic: + is int, +. is float, ^ is string concat. (No overloading without modular implicits or ppxes.)
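As a small illustration of the types and monomorphic operators listed above, the following sketch uses only the standard library. The names point, describe, and parsed are made up for the example.

```ocaml
(* Integer and float arithmetic use different operators. *)
let int_sum = 1 + 2                       (* int *)
let float_sum = 1.5 +. 2.25               (* float *)
let mixed = float_of_int int_sum +. 0.5   (* conversion must be explicit *)
let greeting = "Hello, " ^ "OCaml"        (* string concatenation *)

(* A record and a variant, following the shapes listed above. *)
type point = { x : int; y : int }
type t = A | B of int | C of string * int

let describe = function
  | A -> "A"
  | B n -> "B " ^ string_of_int n
  | C (s, n) -> Printf.sprintf "C (%s, %d)" s n

let origin = { x = 0; y = 0 }

(* option and result are ordinary variants from the standard library. *)
let maybe_first : int option =
  match [ 10; 20 ] with [] -> None | h :: _ -> Some h

let parsed : (int, string) result =
  match int_of_string_opt "42" with
  | Some n -> Ok n
  | None -> Error "not an int"
```

Writing 1 + 2.0 or 1 +. 2 is rejected by the type checker; the conversion through float_of_int has to be spelled out.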
Variables/scoping: let x = expr binds at top level; let x = e1 in e2 binds locally. All bindings are immutable by default; mutation goes through ref cells (let r = ref 0 in r := !r + 1) or mutable record fields ({ mutable x : int }, updated with p.x <- 1). Shadowing is normal: let x = 1 in let x = x + 1 in x.
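A minimal sketch of the three mechanisms just described (shadowing, ref cells, mutable fields). The account record and its data are hypothetical.

```ocaml
(* Shadowing: each [x] is a new immutable binding, not an assignment. *)
let shadowed =
  let x = 1 in
  let x = x + 1 in
  x

(* A ref cell: [!] reads, [:=] writes, [incr] adds one. *)
let counter = ref 0
let () = counter := !counter + 1
let () = incr counter

(* A record with one mutable field, updated in place with [<-]. *)
type account = { owner : string; mutable balance : int }

let acct = { owner = "alice"; balance = 100 }
let () = acct.balance <- acct.balance - 30
```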
Control flow: if e1 then e2 else e3. Pattern match — the workhorse:
match shape with
| Circle r -> 3.14 *. r *. r
| Rectangle (w, h) -> w *. h
| Triangle (a, b, c) -> ... (* compiler warns if non-exhaustive *)
Loops: for i = 1 to 10 do ... done, while cond do ... done. No break/continue; use exceptions or recursion.
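The match above leaves the shape type and the Triangle case elided. One possible completion is sketched here, assuming Triangle carries three side lengths (so Heron's formula applies), together with a way to leave a for loop early through a local exception. find_index is a name chosen for the example.

```ocaml
type shape =
  | Circle of float
  | Rectangle of float * float
  | Triangle of float * float * float   (* assumed: three side lengths *)

let area = function
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h
  | Triangle (a, b, c) ->
    (* Heron's formula *)
    let s = (a +. b +. c) /. 2.0 in
    sqrt (s *. (s -. a) *. (s -. b) *. (s -. c))

(* There is no break: raise a local exception to leave the loop early. *)
let find_index (p : 'a -> bool) (arr : 'a array) : int option =
  let exception Found of int in
  try
    for i = 0 to Array.length arr - 1 do
      if p arr.(i) then raise (Found i)
    done;
    None
  with Found i -> Some i
```

Removing any case from area makes the compiler report a non-exhaustive match and name the missing constructor.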
Functions: everything is curried by default.
let add x y = x + y (* val add : int -> int -> int *)
let inc = add 1 (* partial application *)
let plus = fun x y -> x + y (* lambda *)
(* labeled args *)
let div ~num ~den = num / den
div ~num:10 ~den:2
(* optional args *)
let greet ?(greeting="Hello") name = greeting ^ ", " ^ name
greet "world"
greet ~greeting:"Hi" "world"
Strings: immutable since 4.06. String.length, String.sub s pos len, String.concat ", " ["a"; "b"]. UTF-8 is not native: a string is a sequence of bytes; use Uucp/Uutf for Unicode. Sprintf-style: Printf.sprintf "x = %d" 42.
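The string functions mentioned above can be tried out as follows. This sketch uses only the standard library; the escaped bytes \xc3\xa9 are the UTF-8 encoding of "é", written out so the example does not depend on the source file's encoding.

```ocaml
let s = "h\xc3\xa9llo"                 (* "héllo" encoded as UTF-8 *)
let byte_len = String.length s         (* counts bytes, not characters *)
let prefix = String.sub "functional" 0 3
let joined = String.concat ", " [ "a"; "b"; "c" ]
let formatted = Printf.sprintf "x = %d, y = %.2f" 42 3.14159
let upper = String.uppercase_ascii "ocaml"
let parts = String.split_on_char ',' "k1,k2,k3"
```

Here byte_len is 6 although the text has five characters, which is the practical consequence of strings being byte sequences.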
Collections: 'a list (singly-linked, immutable: [1; 2; 3], 1 :: [2; 3]); 'a array (mutable, fixed-length: [|1; 2; 3|], a.(i)); Hashtbl (mutable); Map.Make(K) and Set.Make(K) (functor-built immutable maps/sets); Queue, Stack, Buffer. Modern: Jane Street’s Base.Map, Base.Hashtbl.
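A short tour of the standard-library collections named above, as a sketch. StringMap is a name chosen here for the functor instance.

```ocaml
module StringMap = Map.Make (String)

(* Immutable singly-linked list *)
let lst = 1 :: [ 2; 3 ]
let doubled = List.map (fun x -> x * 2) lst
let total = List.fold_left ( + ) 0 lst

(* Mutable fixed-length array *)
let arr = [| 1; 2; 3 |]
let () = arr.(0) <- 10

(* Mutable hash table *)
let tbl : (string, int) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace tbl "one" 1
let () = Hashtbl.replace tbl "two" 2

(* Immutable map: add returns a new map and leaves the old one intact *)
let m = StringMap.empty |> StringMap.add "a" 1 |> StringMap.add "b" 2
let m' = StringMap.add "a" 100 m
```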
Intermediate
Type system depth: this is OCaml’s heart.
- Type inference (Hindley-Milner): most code needs no annotations. Polymorphism is inferred: let id x = x has type 'a -> 'a.
- Algebraic data types (ADTs): type 'a tree = Leaf | Node of 'a * 'a tree * 'a tree. Pattern match exhaustively or the compiler warns.
- Polymorphic variants: [ `Apple | `Banana of int ]. Open variant types, no declaration needed; useful for extensible APIs (heavy use in js_of_ocaml).
- Records: nominal, can have mutable fields. Field-based polymorphism via row types in objects.
- Objects: structurally typed (< x : int; y : int > is an object type). Method dispatch is by name; subtyping is structural. Used less than the ML core.
- GADTs (Generalized Algebraic Data Types): type _ expr = IntL : int -> int expr | Add : int expr * int expr -> int expr | Eq : 'a expr * 'a expr -> bool expr. Enable type-safe interpreters, length-indexed vectors, etc.
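To make the type definitions above concrete, here is a sketch that puts them to work: a size function over the tree ADT, a function over polymorphic variants, and a type-safe evaluator for the expr GADT. The functions size, fruit_name, and eval are illustrative names, not part of any library.

```ocaml
type 'a tree = Leaf | Node of 'a * 'a tree * 'a tree

let rec size = function
  | Leaf -> 0
  | Node (_, l, r) -> 1 + size l + size r

(* Polymorphic variants need no prior type declaration. *)
let fruit_name = function
  | `Apple -> "apple"
  | `Banana n -> "banana x" ^ string_of_int n

type _ expr =
  | IntL : int -> int expr
  | Add : int expr * int expr -> int expr
  | Eq : 'a expr * 'a expr -> bool expr

(* The result type follows the index of the expression:
   an int expr evaluates to an int, a bool expr to a bool. *)
let rec eval : type a. a expr -> a = function
  | IntL n -> n
  | Add (a, b) -> eval a + eval b
  | Eq (a, b) -> eval a = eval b
```

An ill-typed term such as Add (IntL 1, Eq (IntL 1, IntL 1)) is rejected at compile time, so eval needs no runtime type checks.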
Modules are a separate language layered on top:
module Stack = struct
type 'a t = 'a list
let empty = []
let push x s = x :: s
let pop = function [] -> None | x :: s -> Some (x, s)
end
(* Module signatures (interfaces): *)
module type STACK = sig
type 'a t
val empty : 'a t
val push : 'a -> 'a t -> 'a t
val pop : 'a t -> ('a * 'a t) option
end
Functors are modules parameterized by modules:
module MakeSet (Ord : sig type t val compare : t -> t -> int end) = struct
type elt = Ord.t
type t = elt list
(* ... *)
end
module IntSet = MakeSet(struct type t = int let compare = compare end)
Modules are the dependency-injection mechanism of OCaml.
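The MakeSet functor above leaves its body elided. One way to fill it in is sketched below, assuming a sorted-list representation chosen for brevity (production code would normally use Set.Make). The signatures ORDERED and SET are names introduced for this example.

```ocaml
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

module type SET = sig
  type elt
  type t                      (* abstract: callers cannot see the list *)
  val empty : t
  val add : elt -> t -> t
  val mem : elt -> t -> bool
  val elements : t -> elt list
end

module MakeSet (Ord : ORDERED) : SET with type elt = Ord.t = struct
  type elt = Ord.t
  type t = elt list           (* invariant: sorted, no duplicates *)

  let empty = []

  let rec add x = function
    | [] -> [ x ]
    | y :: rest as s ->
      let c = Ord.compare x y in
      if c = 0 then s
      else if c < 0 then x :: s
      else y :: add x rest

  let mem x s = List.exists (fun y -> Ord.compare x y = 0) s
  let elements s = s
end

(* The stdlib modules Int and String already satisfy ORDERED. *)
module IntSet = MakeSet (Int)
module StringSet = MakeSet (String)
```

Sealing the result with SET with type elt = Ord.t keeps the element type visible while hiding the list representation, so the sortedness invariant cannot be broken from outside.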
Error handling: two camps.
- Exceptions: raise (Failure "msg"), try expr with Failure m -> .... Cheap.
- option/result: typed errors (Ok v | Error e). Modern style (Jane Street's Base/Core, the result library) prefers result. Define let ( let* ) = Result.bind to chain fallible steps as let* x = e1 in e2.
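Both camps can be compared side by side in a sketch like the following, which uses only the standard library. parse_int, safe_div, and the divide_* functions are hypothetical helpers.

```ocaml
(* Binding operator: [let*] is Result.bind under another name. *)
let ( let* ) = Result.bind

let parse_int (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (Printf.sprintf "not an integer: %S" s)

let safe_div (num : int) (den : int) : (int, string) result =
  if den = 0 then Error "division by zero" else Ok (num / den)

(* Result style: the chain stops at the first Error. *)
let divide_strings a b =
  let* x = parse_int a in
  let* y = parse_int b in
  safe_div x y

(* Exception style: failures are invisible in the type... *)
let divide_exn a b = int_of_string a / int_of_string b

(* ...so they are converted to a result at the API boundary. *)
let divide_caught a b =
  try Ok (divide_exn a b) with
  | Failure msg -> Error msg
  | Division_by_zero -> Error "division by zero"
```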
Concurrency primitives:
- Pre-OCaml 5: no parallelism inside one process. System threads (Thread) are serialized by a global runtime lock, so most concurrent code is cooperative, via Lwt (lightweight threads, monadic) or Async (Jane Street, monadic). Both are based on promises and run single-threaded.
- OCaml 5+: Domains, which map to OS threads and run in parallel with a parallel GC: Domain.spawn (fun () -> ...). Effect handlers for structured concurrency / control flow inversion. Eio is the modern effect-based I/O library; domainslib for parallelism patterns.
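A minimal sketch of Domain.spawn and Domain.join, assuming OCaml 5 or later and nothing beyond the standard library. sum_range and parallel_sum are illustrative names.

```ocaml
(* Requires OCaml 5+. *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi do
    acc := !acc + i
  done;
  !acc

let parallel_sum n =
  let mid = n / 2 in
  (* Each domain mutates only its own local ref, so nothing mutable is shared. *)
  let left = Domain.spawn (fun () -> sum_range 1 mid) in
  let right = Domain.spawn (fun () -> sum_range (mid + 1) n) in
  (* join blocks until the domain finishes and returns its result. *)
  Domain.join left + Domain.join right
```

Spawning a domain is far more expensive than a function call, so this split only pays off for large ranges; libraries such as domainslib provide task pools for finer-grained work.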
I/O: print_string, print_endline, print_int, Printf.printf. Files: open_in/open_out, input_line, output_string, close_in. Modern: Bos (filesystem ops), Eio (effect-based concurrent I/O), Lwt_io / Async.Reader/Writer (older monadic).
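The classic channel functions listed above can be combined as in this sketch, which writes lines to a temporary file and reads them back. write_lines and read_lines are hypothetical helpers; Fun.protect (standard library, 4.08+) makes sure channels are closed even if an exception is raised.

```ocaml
let write_lines (path : string) (lines : string list) : unit =
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () ->
      List.iter (fun l -> output_string oc l; output_char oc '\n') lines)

let read_lines (path : string) : string list =
  let ic = open_in path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let rec loop acc =
        match input_line ic with
        | line -> loop (line :: acc)
        | exception End_of_file -> List.rev acc   (* input_line signals EOF this way *)
      in
      loop [])

let () =
  let path = Filename.temp_file "demo" ".txt" in
  write_lines path [ "first"; "second" ];
  List.iter print_endline (read_lines path);
  Sys.remove path
```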
Stdlib highlights: List, Array, String, Bytes, Buffer, Hashtbl, Map.Make, Set.Make, Queue, Stack, Printf/Scanf/Format, Random, Unix (POSIX), Sys (env, argv), Filename, Marshal (serialization, unsafe), Domain (5+), Effect (5+), Atomic, Mutex/Condition/Semaphore (part of the standard library proper since 5.0; previously in the threads library), Event (threads library).
Advanced
Memory / GC:
- Generational, incremental tracing GC. Two heaps: minor (per-domain in 5+, very fast bump allocation) and major (mark-and-sweep with optional compaction).
- Pre-5: a single global runtime lock prevented true parallelism.
- 5+: multicore. Each Domain has its own minor heap; the major heap is shared, with parallel marking. Set OCAMLRUNPARAM=v=0x400 to print GC statistics at exit.
- Obj.magic and Marshal bypass the type system; never use them except in compiler/library internals.
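GC behaviour can also be inspected from inside a program through the standard Gc module. The sketch below measures how many words a function allocates on the minor heap; minor_words_during is a hypothetical helper, and the exact figures depend on compiler version and optimization settings.

```ocaml
(* Run [f] and report how many words it allocated on the minor heap. *)
let minor_words_during (f : unit -> 'a) : 'a * float =
  let before = Gc.minor_words () in
  let result = f () in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  let lst, words = minor_words_during (fun () -> List.init 10_000 (fun i -> i)) in
  Printf.printf "built %d elements, about %.0f words allocated\n"
    (List.length lst) words;
  Gc.full_major ();   (* force a complete major collection *)
  let st = Gc.quick_stat () in
  Printf.printf "minor collections: %d, major collections: %d\n"
    st.Gc.minor_collections st.Gc.major_collections
```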
Concurrency deep dive:
- Domains = OS threads with isolated minor heaps. Use for CPU-parallel work.
- Fibers / Effects: OCaml 5 added algebraic effects. Define an effect, perform it, handle it elsewhere: this is the basis of Eio. Replaces continuation-passing for async I/O.
- Eio is the modern effect-based concurrency library: structured concurrency, switches, fibers, capabilities. https://github.com/ocaml-multicore/eio.
- Lwt and Async (cooperative, single-domain) remain the most-deployed today; they will coexist with Eio for years.
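The define / perform / handle cycle described above can be sketched with the standard Effect module, assuming OCaml 5 or later. Ask is a hypothetical effect invented for this example; real libraries such as Eio define their own effects for I/O and scheduling.

```ocaml
(* Requires OCaml 5+. *)
open Effect
open Effect.Deep

(* 1. Define an effect: performing Ask yields an int. *)
type _ Effect.t += Ask : int Effect.t

(* 2. Perform it, without knowing who will answer. *)
let computation () = perform Ask + perform Ask

(* 3. Handle it elsewhere: answer every Ask with [value] and resume. *)
let run_with (value : int) (f : unit -> int) : int =
  try_with f ()
    { effc =
        (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Ask -> Some (fun (k : (a, _) continuation) -> continue k value)
          | _ -> None)   (* not ours: let an outer handler deal with it *)
    }
```

The handler receives the suspended computation as the continuation k; a scheduler would store k and resume it later instead of continuing immediately, which is how fibers are built on top of effects.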
FFI:
- C interop is mature: write external my_fn : int -> int = "caml_my_fn" in OCaml; implement value caml_my_fn(value n) { ... } in C using OCaml's runtime macros (Val_int, Int_val, caml_alloc_*, CAMLparam*/CAMLreturn to register roots).
- dune integrates C compilation: (foreign_stubs (language c) (names my_stubs)).
- ctypes library for safer, declarative bindings without writing C glue.
- ctypes stub generation (Cstubs), which dune can drive, produces the C bindings from a description written in an OCaml DSL.
Reflection: minimal at runtime — runtime type info is erased after compilation. Compile-time reflection is via PPX preprocessors (see God mode).
Performance tools:
- ocamlopt -O3 (uses Flambda if compiler was built with --enable-flambda).
- OCAMLRUNPARAM=v=0x400 for GC trace.
- perf record ./prog (Linux): OCaml emits good DWARF.
- landmarks library: instrument with [@landmark] attributes; produces flame graphs.
- magic-trace (Jane Street, Linux, Intel-only) for nanosecond-resolution traces.
- ocaml-memtrace + memtrace_viewer (Jane Street) for allocation profiling.
- core_bench for microbenchmarks.
God mode
PPX preprocessor extensions: OCaml's metaprogramming. PPXes rewrite the AST after parsing but before type-checking. Built on ppxlib. Examples:
- [@@deriving show, eq, yojson] auto-generates printers, equality, and JSON serializers.
- [%bs.obj { x = 1 }] for JS object literals in BuckleScript-style code (Melange spells it [%mel.obj ...]; js_of_ocaml has its own object%js syntax).
- let%lwt x = e in ... (lwt_ppx) and let%bind x = e in ... (ppx_let): monadic syntax.
- Write your own with ppxlib: define a transformation from Parsetree to Parsetree, register it, and dune's (preprocess (pps your_ppx)) runs it.
GADTs: type-indexed term language; lets you encode invariants in the type.
type _ ty = Int : int ty | Bool : bool ty
let cast : type a. a ty -> int -> a = function
| Int -> fun n -> n
| Bool -> fun n -> n <> 0
Pattern-matching on a GADT constructor refines the type index: in the Int branch the checker knows a = int, in the Bool branch a = bool.
First-class modules: pack a module into a value. let m = (module IntSet : SET with type elt = int); unpack with let module M = (val m) in .... Enables runtime module selection — dependency injection without functors.
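Runtime module selection can be sketched as follows. GREETER, English, and French are hypothetical names used only to show packing with (module ...) and unpacking with (val ...).

```ocaml
module type GREETER = sig
  val greet : string -> string
end

module English : GREETER = struct
  let greet name = "Hello, " ^ name
end

module French : GREETER = struct
  let greet name = "Bonjour, " ^ name
end

(* Pack: choose an implementation at runtime and return it as a value. *)
let greeter_for (lang : string) : (module GREETER) =
  match lang with
  | "fr" -> (module French : GREETER)
  | _ -> (module English : GREETER)

(* Unpack: bind the value back to a module name and use it. *)
let greet_in (lang : string) (name : string) : string =
  let module G = (val greeter_for lang) in
  G.greet name
```

Packed modules are ordinary values, so they can be stored in lists or hash tables, for example as a plugin registry keyed by name.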
Functors: parameterize a module by a module signature. The compiler checks at every application that the argument satisfies the signature. Building blocks for whole architectures (the standard library's Map.Make, Set.Make, and Hashtbl.Make are the canonical examples).
compiler-libs: the OCaml compiler exposes its parser, type-checker, and interpreter as a library. Build code editors, refactoring tools, custom REPLs.
Bytecode vs native: ocamlc emits .cmo bytecode, runnable on any platform with ocamlrun; great for portability and very fast startup. ocamlopt emits native object code (uses x86_64, aarch64, riscv64, etc. backends); much faster runtime, larger binaries.
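Flambda / Flambda2: optional optimization middle-end. It requires a compiler configured with flambda (for example an opam switch built from ocaml-variants.5.4.1+options with the ocaml-option-flambda package); then pass -O3 to ocamlopt, e.g. via (ocamlopt_flags (:standard -O3)) in a dune file. Performs inlining, specialization, and simplification; gives roughly 5-30% speedups on idiomatic code, depending on the workload.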
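Multicore (OCaml 5+): effect handlers as a low-level concurrency primitive. Eio builds structured concurrency on top. Domain.spawn for true parallelism, but avoid sharing unsynchronized mutable state: plain ref cells and mutable fields are racy across domains, so share through 'a Atomic.t (which can hold a value of any type) or protect data with Mutex.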
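A sketch of safe sharing between domains with the standard Atomic module, assuming OCaml 5 or later. count_with_atomic and push are illustrative names.

```ocaml
(* Requires OCaml 5+. *)

(* Several domains increment one shared counter without losing updates. *)
let count_with_atomic ~domains ~increments =
  let counter = Atomic.make 0 in
  let worker () =
    for _ = 1 to increments do
      Atomic.incr counter
    done
  in
  let ds = List.init domains (fun _ -> Domain.spawn worker) in
  List.iter Domain.join ds;
  Atomic.get counter

(* Atomic.t is polymorphic: this cell holds an immutable list and is
   updated with a compare-and-set retry loop. *)
let push (stack : int list Atomic.t) (x : int) : unit =
  let rec retry () =
    let old = Atomic.get stack in
    (* compare_and_set uses physical equality on the value read above *)
    if not (Atomic.compare_and_set stack old (x :: old)) then retry ()
  in
  retry ()
```

Replacing the Atomic.t with a plain int ref would compile, but concurrent increments could then be lost; the type system does not flag this.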
js_of_ocaml: compiles bytecode to JavaScript — runs essentially any OCaml program in the browser/Node. Used by Flow, Coq’s jsCoq, ReScript’s predecessor.
Melange (fork of BuckleScript): compiles typed AST directly to ergonomic JS with great FFI; used in React-like frontend stacks.
WasmGC backend (OCaml 5+): emerging.
MetaOCaml: a research dialect adding multi-stage programming: .<expr>. quotes code, .~e splices (antiquotes) it, Runcode.run e evaluates. Generate specialized code at runtime, then compile and run it.
Idioms & style
- Naming: snake_case for values, functions, and types (exn); Capitalized for modules, constructors (Some, Error), and exceptions. File foo.ml defines a module Foo automatically.
- Formatter: ocamlformat (configurable; the de facto standard, integrates with dune via dune build @fmt).
- Linter: merlin (the backend of ocaml-lsp-server) surfaces compiler warnings in the editor; ppx_jane/ppx_expect for Jane Street style.
- Idiomatic patterns:
  - Use match over if for ADTs; let the compiler check exhaustiveness.
  - Use option/result over exceptions for expected errors.
  - Use let* and let+ (binding operators) for monadic chains in Result/Lwt/Eio.
  - Use Map.Make/Set.Make/Hashtbl.Make (functor instantiations) over polymorphic Hashtbl for type safety.
  - One module per file; signatures in .mli files (interface) hide internals.
  - Avoid Obj.magic, Marshal, mutable globals.
- Expert review focus: non-exhaustive pattern matches (warn-as-error in CI), needless ref/mutable, exception leakage from libraries (document raised exceptions in (** ... *) doc comments), GADT escape into refutation cases, ppx over-use (slow build), interface (.mli) accidentally widening type abstractions.
Ecosystem
- Web: Dream (modern, async; recommended for new projects), Opium (Sinatra-style), Cohttp (HTTP client/server), Lwt + Cohttp, Eio + Cohttp.
- Frontend: js_of_ocaml (any bytecode → JS), Melange (modern, React-friendly), Brr (Browser bindings).
- Concurrency: Lwt (mature, monadic, single-threaded), Async (Jane Street, monadic, single-threaded), Eio (effect-based, multicore-aware, modern).
- Data / parsing: Yojson, ppx_yojson_conv, Sexplib, Angstrom (parser combinators), Re (regex), Csv, Owl (numerical computing, NumPy-like).
- Database: Caqti (interface), Petrol (typed SQL builder), Sqlite3-OCaml, pgx/postgresql-ocaml.
- Testing: alcotest (lightweight, popular), ppx_inline_test + ppx_expect (Jane Street; tests live next to code; expect tests diff output).
- Coverage: bisect_ppx.
- Docs: odoc (the standard); doc comments use (** ... *). dune build @doc generates HTML.
- Notable users: Jane Street (massive: entire trading system; major opam contributor), Meta/Facebook (Hack, Flow, Infer all in OCaml), Microsoft (F* verifier, parts of Azure SDK gen), Bloomberg, Tezos (blockchain in OCaml), Coq/Rocq (proof assistant), MirageOS (unikernels), Citrix XenServer.
Gotchas
- int is 63-bit on 64-bit systems (one bit for the GC tag). Use Int64.t for bit-exact 64-bit arithmetic.
- Monomorphic operators: 1 + 2 (int), 1.0 +. 2.0 (float), "a" ^ "b" (string). Mixing without conversion is a type error. Modern style: open Base and use +, +., ^ consistently.
- Polymorphic comparison =/compare/< is structural and works on the runtime representation: it raises on functions, may not terminate on cyclic values, and ignores the intended ordering of abstract types. Use type-specific comparators.
- == is physical (pointer) equality, not value equality. You almost always want =.
- unit vs (): run top-level side effects as let () = ... so the compiler checks that the expression really has type unit; a non-unit result discarded in a sequence triggers a warning.
- Imperative loops with ref in functional code are a code smell; usually List.fold_left or recursion is clearer.
- Marshal.to_string is unsafe: no type information is stored, so reading a marshalled value back at the wrong type (for example after the type changed between program versions) is undefined behaviour.
- Open variants leak: `Apple can accidentally unify with another module's polymorphic variant of the same name.
- GADT exhaustiveness: the checker may need explicit type annotations to prove a case is impossible; a refutation case (| _ -> .) asks it to verify this.
- PPX-induced compile times: heavy [@@deriving] use slows builds significantly. Profile with dune build --verbose.
- Lwt vs Async vs Eio fragmentation: each has incompatible types and bindings; choosing one locks you in for that codebase.
- Obj.magic exists and a panicked junior may reach for it; always reject in review.
- OCaml 5 multicore caveats: data-race-freedom is not statically guaranteed; you must design for it. Atomic.t is the built-in lock-free primitive for sharing a mutable cell of any type; use Mutex for larger critical sections.
- opam switches are heavy: each switch is a full toolchain + library set. Use opam switch create . 5.4.1 (local) to scope to a project.
- .mli interfaces hide types: a too-restrictive .mli can make types opaque to consumers; balance abstraction vs. usability.
Citations
- OCaml docs hub: https://ocaml.org/docs
- OCaml Manual (canonical reference): https://v2.ocaml.org/manual/
- The OCaml Manual (5.4 / latest): https://ocaml.org/manual
- “Real World OCaml” 2nd ed. (free online): https://dev.realworldocaml.org/
- opam: https://opam.ocaml.org/
- dune build system: https://dune.readthedocs.io/
- ppxlib (PPX framework): https://ocaml-ppx.github.io/ppxlib/ppxlib/
- OCaml 5 multicore guide: https://v2.ocaml.org/manual/parallelism.html
- Effect handlers tutorial: https://github.com/ocaml-multicore/ocaml-effects-tutorial
- Eio (effect-based concurrency): https://github.com/ocaml-multicore/eio
- ocaml-lsp / merlin: https://github.com/ocaml/ocaml-lsp
- ocamlformat: https://github.com/ocaml-ppx/ocamlformat
- odoc: https://ocaml.github.io/odoc/
- OCaml Software Foundation: https://ocaml-sf.org/
- Jane Street open-source (Base, Core, Async): https://opensource.janestreet.com/
- Wikipedia (history, creators, license): https://en.wikipedia.org/wiki/OCaml