SLAM (Simultaneous Localization & Mapping) — Robotics Reference

See also (Tier 3 family index): SLAM Algorithm Zoo

1. At a glance

SLAM answers a chicken-and-egg question: a robot moving through an unknown environment must build a map of the environment and localise itself within that map, simultaneously, from sensor data alone. Both estimates depend on each other; neither can be solved without the other. SLAM is the algorithmic layer that consumes the exteroceptive and proprioceptive sensors covered in [[Robotics/sensors-perception]] and [[Robotics/sensors-pose-motion]] and emits a coherent (pose, map) tuple suitable for the planner and controller layers above it.

Three modern paradigms dominate, plus emerging neural variants:

Filter-based SLAM — EKF-SLAM (Smith / Self / Cheeseman 1986), particle filter (FastSLAM, Montemerlo 2002), and Multi-State Constraint Kalman Filter (MSCKF, Mourikis 2007). One recursive Bayesian state that grows as landmarks are added. Mostly historical for mapping; MSCKF persists in tightly-coupled VIO.
Optimisation / graph SLAM — Lu / Milios 1997 onward; the dominant paradigm. The trajectory and landmarks are nodes in a factor graph; sensor measurements and odometry are edges (factors); a non-linear least-squares solver (Gauss-Newton, Levenberg-Marquardt, or incremental iSAM2) finds the maximum-a-posteriori estimate. GTSAM, g2o, Ceres are the production libraries.
Tightly-coupled visual-inertial / LiDAR-inertial — VIO (VINS-Mono, ORB-SLAM3 inertial mode, OKVIS, ROVIO, Kimera) and LIO (LIO-SAM, FAST-LIO2, Point-LIO) fuse IMU pre-integration (Forster 2017) directly into the optimisation. The default for mobile robotics in 2026.
Neural / implicit SLAM — NICE-SLAM (Zhu 2022), iMAP, Gaussian-Splatting SLAM (Matsuki 2024). Replace the discrete map with a learned implicit field (NeRF) or 3D Gaussian primitives. Photometrically accurate, dense, GPU-bound; emerging into AR / robotics demos but not yet production-ready for safety-critical autonomy.

Every modern SLAM system splits into a front-end (sensor processing, feature extraction, data association, place recognition) and a back-end (graph optimisation, loop-closure integration, marginalisation). Most front-end errors are local and recoverable; the exception is a false loop closure, which originates in the front-end but is propagated by the back-end through the whole graph and can corrupt the entire map globally.

Questions to ask before applying SLAM:

What are the sensors (mono / stereo / RGB-D / 2D-LiDAR / 3D-LiDAR / IMU / GPS), and what are their rates and synchronisation guarantees?
What is the environment — indoor structured, outdoor open, mixed, underground, dynamic?
What is the trajectory length — 10 m corridor, 1 km warehouse loop, 100 km road network?
What is the output — a 2D occupancy grid for nav2, a 3D point cloud for visualisation, a TSDF for collision, a textured mesh for AR?
What is the compute budget — Jetson Orin Nano (15 W), Jetson AGX Orin (60 W), workstation, or cloud?
Loop closure — required for bounded drift; how reliable does the detector need to be?

2. First principles

The SLAM probability problem

The full SLAM problem is to estimate the joint posterior

p(x_{1:t}, m | z_{1:t}, u_{1:t})

over the robot trajectory x_{1:t} = (x_1, …, x_t), the map m (landmarks, occupancy grid, or surfels), conditioned on all sensor observations z_{1:t} and control inputs u_{1:t}. This is intractable in general; practical SLAM solves one of three reductions:

Online SLAM — marginalise out past poses, keep p(x_t, m | z_{1:t}, u_{1:t}). Filter-based methods (EKF, particle) live here.
Full SLAM — keep the whole trajectory and map; optimise over everything. Graph SLAM methods live here.
Sliding-window SLAM — keep only the last N keyframes; marginalise older states into a prior. Most modern VO / VIO systems live here.

Factor graphs

A factor graph (Dellaert & Kaess 2017) is a bipartite graph with two node types: variables (poses, landmarks, calibrations, biases) and factors (probabilistic measurement constraints). The joint posterior factorises as

p(X | Z) ∝ ∏_i  φ_i(X_i)

where each factor φ_i is typically Gaussian: φ(x) = exp(−½ · ‖e(x)‖²_Ω). Solving for the MAP estimate reduces to non-linear weighted least squares:

X* = argmin_X  Σ_i  e_i(X_i)^T · Ω_i · e_i(X_i)
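To make the weighted least-squares form concrete, here is a minimal sketch on a toy 1D pose chain. Everything in it is an illustrative assumption: four scalar poses, a prior on the first, three odometry factors that over-read each 1.0 m step as 1.1 m, and one accurate relative constraint between the first and last pose. Because every factor is linear, a single solve of the normal equations gives the MAP estimate.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def solve_map(factors, n):
    """Each factor is (coeffs, z, sigma) with error e = sum(c_k * x_k) - z."""
    H = [[0.0] * n for _ in range(n)]   # information matrix, sum of J^T * Omega * J
    g = [0.0] * n                       # right-hand side, sum of J^T * Omega * z
    for coeffs, z, sigma in factors:
        omega = 1.0 / sigma ** 2        # scalar information (inverse variance)
        for i, ci in coeffs.items():
            g[i] += ci * omega * z
            for j, cj in coeffs.items():
                H[i][j] += ci * omega * cj
    return solve(H, g)


# Toy factors (all values are made up for illustration).
factors = [
    ({0: 1.0}, 0.0, 0.01),              # prior: x0 = 0
    ({0: -1.0, 1: 1.0}, 1.1, 0.10),     # odometry x1 - x0
    ({1: -1.0, 2: 1.0}, 1.1, 0.10),     # odometry x2 - x1
    ({2: -1.0, 3: 1.0}, 1.1, 0.10),     # odometry x3 - x2
    ({0: -1.0, 3: 1.0}, 3.0, 0.05),     # accurate relative constraint x3 - x0
]
x_map = solve_map(factors, 4)
print([round(v, 3) for v in x_map])
```

The tighter sigma on the last factor pulls the end pose from the dead-reckoned 3.3 m most of the way back towards 3.0 m, and the correction is spread evenly over the three odometry steps.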

Common factor types:

Odometry factor: e = log(T_ij_meas⁻¹ · T_i⁻¹ · T_j) ∈ se(3)
Loop-closure factor: same form, but T_ij from place-recognition + relative-pose estimation
Landmark observation factor: e = z − π(T_i, l_j), where π is the camera-projection model
IMU pre-integration factor (Forster 2017): a compact summary of all IMU measurements between two keyframes, accounting for bias and gravity
Prior factor: fixes the first pose at the origin to remove the gauge freedom of SE(3)
GPS factor, wheel odometry factor, plane factor, switchable constraint factor, etc.
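As a rough illustration of the odometry / loop-closure residual above, the sketch below evaluates it in SE(2) rather than SE(3) to keep the code short. Two simplifications are assumptions of this sketch, not part of the formula in the text: poses are (x, y, theta) tuples, and the error transform is returned directly as (x, y, theta) instead of taking a true Lie-group logarithm.

```python
import math


def wrap(angle):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def compose(a, b):
    """Pose composition a * b for SE(2) poses given as (x, y, theta)."""
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * b[0] - s * b[1], a[1] + s * b[0] + c * b[1], wrap(a[2] + b[2]))


def inverse(a):
    """Inverse pose: rotation -theta, translation -R^T t."""
    c, s = math.cos(a[2]), math.sin(a[2])
    return (-(c * a[0] + s * a[1]), -(-s * a[0] + c * a[1]), -a[2])


def between_residual(T_i, T_j, T_ij_meas):
    """Error transform T_ij_meas^-1 * T_i^-1 * T_j, returned as (x, y, theta)."""
    predicted = compose(inverse(T_i), T_j)
    return compose(inverse(T_ij_meas), predicted)


T_i = (1.0, 2.0, 0.3)
T_ij_meas = (1.0, 0.0, 0.1)

# Consistent estimate: residual vanishes.
e_zero = between_residual(T_i, compose(T_i, T_ij_meas), T_ij_meas)

# Estimate that overshoots the measured step by 5 cm: residual has 5 cm norm.
e_off = between_residual(T_i, compose(T_i, (1.05, 0.0, 0.1)), T_ij_meas)
print(e_zero, e_off)
```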

A pose graph is the special case where landmarks have been marginalised out (Schur complement) or were never modelled; only poses and pose-pose constraints remain. Pose-graph optimisation is much cheaper than full bundle adjustment and is the back-end of choice for LiDAR SLAM.

Front-end vs back-end split

Stage	Job	Typical components
Front-end	Per-frame sensor processing, feature / scan extraction, data association, motion estimation, keyframe selection, loop-closure detection	ORB / SIFT / SuperPoint, LK tracker, RANSAC, ICP, scan-context
Back-end	Maintain the factor graph; solve non-linear LS; marginalise old states; handle loop-closure insertion	GTSAM / g2o / Ceres; iSAM2 incremental solver; robust kernels

This separation matters operationally: the front-end runs at sensor rate (30–200 Hz), the back-end runs at keyframe rate (1–10 Hz) or asynchronously when loop closures arrive. Most failures (wrong matches, false loops) are front-end failures that the back-end inherits.

Loop closure: why it dominates

Without loop closure, every SLAM system accumulates drift roughly linearly in trajectory length — O(d) where d is distance travelled. With a working loop-closure detector and the resulting global optimisation, drift becomes O(1) bounded by the size of the explored region. The pivotal moment in any SLAM run is the first revisit of a previously-mapped place. A true positive loop closure injects a global constraint that pulls drift out of the trajectory; a false positive loop closure injects a contradictory constraint that the back-end will try to satisfy, and the entire map can collapse onto itself (“kidnap” failure).

Robust back-ends mitigate this with switchable constraints (Sünderhauf & Protzel 2012), max-mixtures (Olson 2013), or graduated non-convexity (Yang 2020) — methods that let the optimiser down-weight or reject suspect constraints during optimisation rather than committing to them up-front.
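A compact way to see the down-weighting idea is Dynamic Covariance Scaling (Agarwal 2013), a closed-form relative of switchable constraints. It is not one of the three methods named above, so treat this as a swapped-in illustration: each loop-closure factor's information matrix is multiplied by s², where s depends on that factor's current chi-square error. The free parameter `phi` is a tuning value; 1.0 here is only a placeholder.

```python
def dcs_scale(chi2, phi=1.0):
    """Scaling factor s in (0, 1]; the factor's information matrix is multiplied by s**2."""
    return min(1.0, 2.0 * phi / (phi + chi2))


for chi2 in (0.5, 1.0, 10.0, 99.0):
    s = dcs_scale(chi2)
    # Effective cost after scaling: s^2 * chi2 stays bounded for gross outliers.
    print(chi2, round(s, 4), round(s * s * chi2, 4))
```

Constraints that agree with the rest of the graph (chi-square at or below `phi`) keep full weight, while a grossly inconsistent closure is scaled down so its pull on the trajectory becomes negligible.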

Marginalisation and information loss

Sliding-window VIO maintains a bounded state by marginalising out the oldest keyframe at each step. Marginalisation converts the joint posterior over (X_old, X_keep) into a prior over X_keep alone — analytically a Schur complement that fills in across all variables connected to X_old. This makes the resulting information matrix dense (fill-in), increasing solve cost. First-Estimate Jacobians (FEJ, Huang 2010) and observability-constrained variants address consistency issues that pure linearisation around the latest estimate introduces.
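The fill-in effect can be seen on a three-variable toy problem. The sketch below assumes scalar variables and a hand-picked information matrix in which the old state (index 0) is connected to two kept states that are not connected to each other. Marginalising index 0 with the Schur complement creates a new off-diagonal entry between the two kept states.

```python
def marginalise_first(L):
    """Schur complement: remove scalar variable 0 from information matrix L."""
    n = len(L)
    return [[L[i][j] - L[i][0] * L[0][j] / L[0][0] for j in range(1, n)]
            for i in range(1, n)]


# Rows / columns: [x_old, x_a, x_b]. x_a and x_b share no factor (entry is 0).
info = [
    [2.0, -1.0, -1.0],
    [-1.0, 2.0, 0.0],
    [-1.0, 0.0, 2.0],
]
prior = marginalise_first(info)
print(prior)   # the (x_a, x_b) entry is now non-zero: fill-in
```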

3. Practical math and worked examples

Example A — Pose-graph optimisation in GTSAM

A small mobile robot drives a 10-pose closed loop (a regular nonagon: nine 1 m steps, each followed by a 40° turn) with wheel-odometry edges and a single closure when it returns to the start.

import gtsam
import numpy as np

graph = gtsam.NonlinearFactorGraph()
init = gtsam.Values()

# GTSAM orders the Pose3 tangent vector rotation-first:
# [rot_x, rot_y, rot_z, t_x, t_y, t_z], so radians come before metres.
# Odometry noise: 1 deg rotation, 5 cm translation
odo_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([np.deg2rad(1)]*3 + [0.05]*3))
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([np.deg2rad(2)]*3 + [0.10]*3))
# Tight prior on pose 0 fixes the gauge
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([1e-3]*6))

# One odometry step: 1 m forward plus 40 deg of yaw (9 x 40 deg = 360 deg)
delta = gtsam.Pose3(gtsam.Rot3.RzRyRx(0, 0, np.deg2rad(40)), gtsam.Point3(1.0, 0, 0))

graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), prior_noise))
pose = gtsam.Pose3()
init.insert(0, pose)
for i in range(9):
    graph.add(gtsam.BetweenFactorPose3(i, i + 1, delta, odo_noise))
    pose = pose.compose(delta)  # dead-reckoned initial guess instead of identity
    init.insert(i + 1, pose)

# Loop closure: pose 9 back to pose 0
graph.add(gtsam.BetweenFactorPose3(9, 0, gtsam.Pose3(), loop_noise))

params = gtsam.LevenbergMarquardtParams()
result = gtsam.LevenbergMarquardtOptimizer(graph, init, params).optimize()

On a 2024 x86 laptop this 10-node graph solves in ~20 ms with batch Levenberg-Marquardt (the optimiser used above) and in ~5 ms per update with iSAM2. Scaling to a 10 000-pose graph: iSAM2 stays in the tens-of-ms range per update (its Bayes tree re-factorises only the part of the graph affected by new factors); batch LM grows to seconds.

Units: poses are in SE(3) (metres + radians). GTSAM orders the Pose3 tangent vector rotation-first, so it reads the first three noise sigmas as radians and the last three as metres. The BetweenFactor’s error lives in the tangent space se(3); weighting each component by its sigma makes the resulting cost dimensionless.
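A quick planar sanity check, independent of GTSAM, confirms that the nine odometry steps used in the example (1 m forward, 40° of yaw) bring the robot back to its start, which is why an identity loop-closure measurement between pose 9 and pose 0 is consistent. The SE(2) `compose` helper below is written for this sketch only.

```python
import math


def compose(a, b):
    """Pose composition a * b for planar poses (x, y, theta)."""
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * b[0] - s * b[1], a[1] + s * b[0] + c * b[1], a[2] + b[2])


step = (1.0, 0.0, math.radians(40.0))   # same relative motion as each odometry edge
poses = [(0.0, 0.0, 0.0)]
for _ in range(9):
    poses.append(compose(poses[-1], step))

gap = math.hypot(poses[-1][0], poses[-1][1])                               # distance from start
heading_error = math.atan2(math.sin(poses[-1][2]), math.cos(poses[-1][2]))  # wrapped yaw
print(len(poses), gap, heading_error)
```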

Example B — ORB feature matching for V-SLAM

ORB-SLAM3 extracts up to 1000 ORB features per frame at 30 Hz. ORB descriptors are 256-bit binary vectors; match cost is Hamming distance (population-count of XOR), which takes ~3 ns per pair on AVX2 hardware.

Naïve matching of two frames is 1000 × 1000 = 10⁶ pair-wise distance computations ≈ 3 ms, which is fast enough on CPU. The raw nearest-neighbour matches still contain many outliers, so two filters prune them before pose estimation:

Lowe’s ratio test — for each query feature, find the best and second-best match in the train set; accept only if best/second-best < 0.75. This rejects ambiguous correspondences that would generate outliers.
Geometric verification with RANSAC + 5-point essential matrix (Nistér 2004) — fit E from a random minimal sample of 5 correspondences; count inliers under the epipolar constraint; iterate 100–500 times; refine on the inlier set. With ratio-tested matches, typical inlier counts are 100–300 out of ~500 candidates.
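The Hamming-distance matching and ratio test described above can be sketched in a few lines. This is a brute-force toy, not ORB-SLAM3's matcher: descriptors are plain Python integers (8-bit here for readability, but 256-bit values work the same way), and the 0.75 ratio is the value quoted in the text.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def ratio_test_matches(query, train, ratio=0.75):
    """Return (query_idx, train_idx, distance) for matches passing the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        (best, ti), (second, _) = dists[0], dists[1]   # needs at least two train descriptors
        if best < ratio * second:
            matches.append((qi, ti, best))
    return matches


train = [0b00000000, 0b11111111, 0b00001111]
query = [
    0b00000001,   # clearly closest to train[0]: distances 1, 7, 3
    0b00000011,   # ambiguous: distance 2 to both train[0] and train[2]
]
matches = ratio_test_matches(query, train)
print(matches)
```

The second query descriptor is equally close to two train descriptors, so the ratio test discards it rather than passing an ambiguous correspondence on to RANSAC.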

The full pipeline (extract → ratio → RANSAC → refine) runs at 15–30 Hz on a Jetson Orin Nano for 1280×720 input.
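The RANSAC iteration budget can be related to the inlier ratio with the standard formula N = log(1 − p) / log(1 − wˢ), where w is the inlier ratio, s the minimal sample size (5 for the essential matrix), and p the desired probability of drawing at least one all-inlier sample. The 0.99 confidence below is an assumed default, not a value from the text.

```python
import math


def ransac_iterations(inlier_ratio, sample_size=5, confidence=0.99):
    """Iterations needed to draw at least one all-inlier sample with given confidence."""
    p_good = inlier_ratio ** sample_size          # chance one random sample is all inliers
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


for w in (0.6, 0.5, 0.4):
    print(w, ransac_iterations(w))
```

With inlier ratios between roughly 40 % and 60 % the budget lands in the tens to hundreds of iterations; it grows steeply as the ratio drops, which is one reason the ratio test runs first.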

Example C — LIO-SAM benchmark on KITTI sequence 00

Hardware: Velodyne HDL-64E (10 Hz, ~1.3 M points/s) + OXTS RT3003 GPS/INS (raw IMU at 100 Hz). Sequence 00 is 3724 m of urban driving with multiple revisits.

Pipeline:

Per-scan de-skewing using IMU integration over the 100 ms scan period.
Feature extraction: planar + edge points via LOAM-style curvature filtering, ~3000 features per scan.
Scan-to-map registration via point-to-plane ICP at keyframes.
IMU pre-integration factors between keyframes (Forster 2017): one factor summarises 10 IMU samples.
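ScanContext loop-closure detection (Kim 2018) — a polar matrix of 20 rings × 60 sectors holding the maximum point height per cell; matches via column-shift-invariant cosine distance (a yaw rotation becomes a circular shift of the sector columns). Detector fires at ~5 places along the loop.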
iSAM2 incremental back-end on Jetson Xavier NX (15 W mode).

Result: end-to-end Absolute Trajectory Error (ATE) ≈ 0.5 % of distance travelled, i.e. roughly 19 m over 3.7 km, vs ~1.5 % for LOAM-only (lidar-only, no IMU), vs ~0.3 % for state-of-the-art LiDAR-VIO (KISS-ICP + camera). Frame-to-frame latency is ~50 ms; loop-closure-triggered global optimisation completes in 200–400 ms.
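The ScanContext step in the pipeline can be sketched as follows. This is a simplified illustration, not the reference implementation: rings are stored as rows and sectors as columns, heights are assumed non-negative (already offset by sensor height), the 80 m maximum range is an assumed setting, and the toy cloud is synthetic. With this layout a yaw rotation of the sensor is a circular shift along the sector axis, so the distance is minimised over all shifts.

```python
import math
import random

N_RINGS, N_SECTORS, MAX_RANGE = 20, 60, 80.0


def scan_context(points):
    """Max point height per (ring, sector) cell of a polar grid around the sensor."""
    desc = [[0.0] * N_SECTORS for _ in range(N_RINGS)]
    for x, y, z in points:
        r = math.hypot(x, y)
        if r >= MAX_RANGE:
            continue
        ring = int(r / MAX_RANGE * N_RINGS)
        theta = math.atan2(y, x) % (2.0 * math.pi)
        sector = int(theta / (2.0 * math.pi) * N_SECTORS) % N_SECTORS
        desc[ring][sector] = max(desc[ring][sector], z)
    return desc


def sc_distance(a, b):
    """Minimum over sector shifts of the mean column-wise cosine distance."""
    cols_a = [[a[i][j] for i in range(N_RINGS)] for j in range(N_SECTORS)]
    cols_b = [[b[i][j] for i in range(N_RINGS)] for j in range(N_SECTORS)]
    best = (float("inf"), 0)
    for shift in range(N_SECTORS):
        total, used = 0.0, 0
        for j in range(N_SECTORS):
            ca, cb = cols_a[j], cols_b[(j + shift) % N_SECTORS]
            na = math.sqrt(sum(v * v for v in ca))
            nb = math.sqrt(sum(v * v for v in cb))
            if na == 0.0 or nb == 0.0:
                continue                      # skip empty sectors
            total += 1.0 - sum(u * v for u, v in zip(ca, cb)) / (na * nb)
            used += 1
        if used and total / used < best[0]:
            best = (total / used, shift)
    return best


def to_points(cells, sector_offset=0):
    """Place one point at the centre of each (ring, sector) cell; offset rotates the cloud."""
    pts = []
    for ring, sector, z in cells:
        r = (ring + 0.5) * MAX_RANGE / N_RINGS
        th = (sector + sector_offset + 0.5) * 2.0 * math.pi / N_SECTORS
        pts.append((r * math.cos(th), r * math.sin(th), z))
    return pts


rng = random.Random(0)
cells = [(rng.randrange(N_RINGS), s, rng.uniform(0.5, 5.0))
         for s in range(N_SECTORS) for _ in range(3)]
dist, shift = sc_distance(scan_context(to_points(cells)),
                          scan_context(to_points(cells, sector_offset=7)))
print(dist, shift)   # near-zero distance at the 7-sector (42 degree) yaw offset
```

The recovered shift doubles as a coarse yaw estimate that can seed the ICP refinement of the loop-closure constraint.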

Example D — Point-to-plane ICP for scan registration

LiDAR scan registration uses point-to-plane ICP (Chen & Medioni 1991) rather than point-to-point because the latter is biased for sparse scans against dense surfaces. Given a source point cloud P = {p_i} and a target cloud Q = {q_j} with surface normals n_j, the cost over the rigid transform T is:

E(T) = Σ_i  ((T·p_i − q_j)^T · n_j)²

linearised around small angle α and translation t, the per-residual Jacobian is the 1×6 row:

J_i = [ (p_i × n_j)^T   n_j^T ]

Each iteration solves the 6×6 normal-equation H·δ = b with H = Σ J^T·J and b = −Σ J^T·r. For a 50 000-point scan with k-d-tree nearest-neighbour lookups (~3 μs per query on a Jetson Orin Nano), one iteration costs ~150 ms naïve but drops to ~25 ms with multi-threading and to ~5 ms on GPU (cuICP, ZED SDK). Typical convergence: 10–20 iterations to a final residual under 1 cm.
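One Gauss-Newton step of this scheme is sketched below in plain Python. To stay short it makes assumptions a real ICP would not: correspondences are given (no k-d tree search), the toy target consists of points on three orthogonal planes, and the source is the target shifted by a small pure translation, so a single linearised step recovers the offset exactly.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def point_to_plane_step(src, tgt, normals):
    """Return delta = [alpha_x, alpha_y, alpha_z, t_x, t_y, t_z] from H * delta = b."""
    H = [[0.0] * 6 for _ in range(6)]
    b = [0.0] * 6
    for p, q, n in zip(src, tgt, normals):
        r = sum((p[k] - q[k]) * n[k] for k in range(3))   # signed point-to-plane distance
        J = list(cross(p, n)) + list(n)                   # 1x6 row [(p x n)^T  n^T]
        for i in range(6):
            b[i] -= J[i] * r
            for j in range(6):
                H[i][j] += J[i] * J[j]
    return solve(H, b)


# Toy target: points on the planes z = 0, x = 0 and y = 0 (all values illustrative).
tgt = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0),
       (0, 1, 1), (0, 2, 1), (0, 1, 2),
       (1, 0, 1), (2, 0, 1), (1, 0, 2)]
normals = [(0, 0, 1)] * 4 + [(1, 0, 0)] * 3 + [(0, 1, 0)] * 3
offset = (0.01, -0.02, 0.03)
src = [tuple(q[k] + offset[k] for k in range(3)) for q in tgt]

delta = point_to_plane_step(src, tgt, normals)
print([round(v, 6) for v in delta])   # rotation part ~0, translation ~ -offset
```

With only one plane orientation in the target, H loses rank and the solve fails or returns arbitrary motion along the unconstrained directions, which is the corridor degeneracy in numerical form.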

Common failure: ICP returns a local minimum if the initial pose is more than half the scan’s correlation length off true. Mitigations: use IMU prediction as the initial pose; use multi-resolution voxel grids (coarse-to-fine); use NDT (Normal Distributions Transform, Biber 2003) which has a wider basin of convergence.

Example E — IMU pre-integration factor cost

Between two keyframes at times t_i and t_j, the IMU pre-integrated factor (Forster 2017) summarises N raw IMU samples into three quantities — Δp_ij, Δv_ij, ΔR_ij — and their 9×9 covariance Σ_ij. The factor’s residual is a 15-vector (the three pre-integration deltas plus accel + gyro bias deltas):

r_p  = R_i^T · (p_j − p_i − v_i·Δt − ½·g·Δt²)  − Δp_ij(b_a, b_g)
r_v  = R_i^T · (v_j − v_i − g·Δt)              − Δv_ij(b_a, b_g)
r_R  = log( ΔR_ij(b_g)^T · R_i^T · R_j )
r_ba = b_a_j − b_a_i
r_bg = b_g_j − b_g_i

The key insight: pre-integration moves the integration into a frame that does not depend on R_i, so if R_i changes during optimisation, the pre-integrated quantities don’t need to be re-integrated — only bias-corrected with first-order updates. A 100 Hz IMU between 10 Hz keyframes contributes 10 samples per factor; cost evaluation is < 10 μs.
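The independence from R_i can be checked numerically with a planar simplification. The sketch below is an assumption-laden toy, not Forster's full formulation: rotation is yaw only, gravity is perpendicular to the plane and therefore dropped, biases are ignored, and integration is simple Euler at the 100 Hz / 10-sample setting mentioned above. The deltas are integrated once in the frame of keyframe i, and the residuals stay at zero for any starting heading or velocity.

```python
import math


def rot(theta, v):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def preintegrate(samples, dt):
    """samples: (ax, ay, yaw_rate) in the body frame. Returns (d_theta, d_v, d_p, T)."""
    d_theta, d_v, d_p = 0.0, (0.0, 0.0), (0.0, 0.0)
    for ax, ay, wz in samples:
        a = rot(d_theta, (ax, ay))                 # accel in the frame of keyframe i
        d_p = (d_p[0] + d_v[0] * dt + 0.5 * a[0] * dt * dt,
               d_p[1] + d_v[1] * dt + 0.5 * a[1] * dt * dt)
        d_v = (d_v[0] + a[0] * dt, d_v[1] + a[1] * dt)
        d_theta += wz * dt
    return d_theta, d_v, d_p, len(samples) * dt


def propagate(state, samples, dt):
    """World-frame dead reckoning from state = (theta, v, p); used as ground truth."""
    theta, v, p = state
    for ax, ay, wz in samples:
        a = rot(theta, (ax, ay))
        p = (p[0] + v[0] * dt + 0.5 * a[0] * dt * dt, p[1] + v[1] * dt + 0.5 * a[1] * dt * dt)
        v = (v[0] + a[0] * dt, v[1] + a[1] * dt)
        theta += wz * dt
    return theta, v, p


def residuals(state_i, state_j, pre):
    (th_i, v_i, p_i), (th_j, v_j, p_j) = state_i, state_j
    d_theta, d_v, d_p, T = pre
    rp = rot(-th_i, (p_j[0] - p_i[0] - v_i[0] * T, p_j[1] - p_i[1] - v_i[1] * T))
    rv = rot(-th_i, (v_j[0] - v_i[0], v_j[1] - v_i[1]))
    err = th_j - th_i - d_theta
    r_theta = math.atan2(math.sin(err), math.cos(err))
    return (rp[0] - d_p[0], rp[1] - d_p[1]), (rv[0] - d_v[0], rv[1] - d_v[1]), r_theta


samples, dt = [(0.5, 0.1, 0.3)] * 10, 0.01      # 10 IMU samples at 100 Hz
pre = preintegrate(samples, dt)                  # integrated once, reused below

worst = 0.0
for state_i in [(0.0, (0.0, 0.0), (0.0, 0.0)), (0.7, (1.0, -0.5), (2.0, 3.0))]:
    r_p, r_v, r_th = residuals(state_i, propagate(state_i, samples, dt), pre)
    worst = max(worst, abs(r_p[0]), abs(r_p[1]), abs(r_v[0]), abs(r_v[1]), abs(r_th))
print(worst)
```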

Bias modelling: accel and gyro biases are typically modelled as random walks (ḃ = w_b, w_b white noise) with hand-tuned process noise — order 10⁻³ m/s³/√Hz for accel, 10⁻⁵ rad/s²/√Hz for gyro on a consumer MEMS IMU like Bosch BMI088 or InvenSense ICM-42688.

4. Design heuristics

Sensor stack by platform

Platform	Sensors	Recommended algorithm
Indoor AMR / warehouse robot	2D LiDAR + wheel-odom + IMU	`slam_toolbox`, Cartographer 2D, gmapping
Outdoor AGV / lawn robot	3D LiDAR + IMU + GNSS	LIO-SAM, FAST-LIO2, Cartographer 3D
AR / VR headset	Stereo + IMU	VIO (proprietary Apple ARKit, Google ARCore; open VINS-Fusion, ORB-SLAM3)
Consumer drone	Stereo + IMU + downward-facing	VINS-Fusion, Kimera, proprietary DJI/Skydio
Self-driving car	Multi-LiDAR + cameras + radars + IMU + GNSS + HD-map	Proprietary; Cartographer or Autoware research stacks
Surgical / medical	Stereo endoscope + IMU + tool encoders	Proprietary; ORB-SLAM3 / SuperPoint-based
Quadruped (Spot, ANYmal, Unitree)	Stereo / RGB-D + LiDAR + IMU + leg-odom	VILENS (Wisth 2022), Pronto, proprietary
Underwater (AUV)	DVL + IMU + sonar + occasional GPS	Acoustic SLAM, factor-graph fusion (GTSAM)

Loop-closure detector choice

Detector	Sensor	Pros	Cons
DBoW2 / DBoW3 (Gálvez-López 2012)	Visual	Fast (~5 ms), proven, ORB-friendly	Lighting / season sensitive
NetVLAD (Arandjelović 2016)	Visual	Robust to illumination	~50 ms inference; needs GPU
HF-Net / SuperPoint+SuperGlue	Visual	SOTA accuracy	Heavy compute
ScanContext (Kim 2018)	3D LiDAR	Compact (20 × 60 = 1200-cell descriptor), rotation invariant	2D projection loses 3D detail
ScanContext++ (Kim 2022)	3D LiDAR	Adds robustness to lateral offsets (e.g. lane changes)	Slightly more compute
BoW3D (Cui 2022)	3D LiDAR	Real-time, LinK3D-feature based	Newer, less battle-tested
Iris (Wang 2020)	3D LiDAR	Binary signature, fast	Less robust than ScanContext
Place-NeRF / OverlapNet	LiDAR / visual	Learned, robust	GPU-bound

Setting the threshold — every detector has a similarity threshold T. T high → fewer false positives but missed closures (drift grows). T low → catches all true closures but admits false positives that destroy the map. Production stacks add a geometric verification step: after detector fires, run ICP or 5-point essential matrix to check the relative-pose hypothesis is geometrically consistent. Reject if inlier count or fitness score is below a second threshold.
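The trade-off and the two-stage gate can be illustrated with a toy sweep. The candidate scores, labels, and both thresholds below are invented for illustration; in practice they come from a labelled validation run of the chosen detector.

```python
def sweep(candidates, thresholds):
    """candidates: (similarity, is_true_loop). Returns {T: (false_positives, missed)}."""
    out = {}
    for T in thresholds:
        false_pos = sum(1 for s, ok in candidates if s >= T and not ok)
        missed = sum(1 for s, ok in candidates if s < T and ok)
        out[T] = (false_pos, missed)
    return out


def accept_loop(similarity, inlier_ratio, T=0.75, min_inlier_ratio=0.3):
    """Two-stage gate: detector score first, then geometric verification."""
    return similarity >= T and inlier_ratio >= min_inlier_ratio


candidates = [(0.95, True), (0.90, True), (0.85, False), (0.80, True),
              (0.70, False), (0.60, True), (0.55, False)]
table = sweep(candidates, [0.9, 0.75, 0.5])
print(table)
```

A moderate detector threshold followed by a strict geometric check usually dominates either extreme: the detector keeps recall high and the verification step removes the look-alike places it lets through.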

Map representations

Representation	2D / 3D	Use case	Library
Occupancy grid (probabilistic)	2D / 3D	Path planning, navigation	nav2 costmap, OctoMap
Point cloud	3D	Visualisation, registration	PCL, Open3D
Surfel cloud	3D	Dense surface, deformable	ElasticFusion, SuMa (Behley 2018)
TSDF (Truncated SDF)	3D	Collision, gradient-based planning, dense	KinectFusion, Voxblox, nvblox
ESDF (Euclidean SDF)	3D	Sampling-based planning, drone obstacle avoidance	Voxblox, Voxfield
Triangle mesh	3D	Visualisation, photogrammetry export	Open3D Poisson, AliceVision
NDT (Normal Distributions Transform)	2D / 3D	Probabilistic registration	autoware.universe
NeRF / implicit	3D	Photometric novel-view, AR	NICE-SLAM, Instant-NGP
3D Gaussian splats	3D	Dense + fast render, AR	Gaussian Splatting SLAM / MonoGS (Matsuki 2024), GS-SLAM (Yan 2024)
Topological / pose graph	abstract	Long-term, multi-session	Maplab, RTAB-Map graph mode

Compute budget reality check

Platform	TDP	What runs real-time
Jetson Orin Nano (8 GB)	7–15 W	ORB-SLAM3 mono/stereo at 15 fps (1280×720), FAST-LIO2 at 10 Hz (VLP-16), slam_toolbox 2D
Jetson Orin NX 16 GB	10–25 W	ORB-SLAM3 stereo-inertial at 30 fps, FAST-LIO2 with 64-channel, Kimera-VIO
Jetson AGX Orin 64 GB	15–60 W	All of the above plus dense neural SLAM (NICE-SLAM at low resolution)
RTX 4090 workstation	450 W	Gaussian-splatting SLAM at interactive rates; 8K bundle adjustment
Apple M2 / M3	15–30 W	ARKit production VIO at 60 Hz
Qualcomm RB6 (Snapdragon 8550)	5–15 W	Hexagon-DSP VIO for drones at 100 Hz with low jitter

Time-sync requirements

Software-only timestamping (e.g. node arrival time) drifts 5–50 ms relative to physical capture, with non-deterministic latency. Cross-modal fusion requires better than 1 ms alignment between an IMU and a camera or LiDAR. Achievable via:

Hardware trigger from a common PPS (GNSS) or FPGA pulse.
PTP / IEEE 1588 on all sensors with PHY-level timestamping (Basler, FLIR, Ouster, Velodyne all support PTP).
Per-point LiDAR timestamping — every Velodyne / Ouster point carries an offset within the rotation; combine with IMU to de-skew.
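Per-point de-skewing can be sketched in the plane under a constant-velocity assumption. The motion model here (straight-line translation `v` and constant yaw rate `omega` over the scan, both expressed in the scan-start frame) is a simplification chosen for this example; a real pipeline would interpolate the IMU-integrated trajectory instead.

```python
import math


def deskew(points, v, omega):
    """points: (x, y, t_offset) in the sensor frame at capture time.
    Returns (x, y) re-expressed in the sensor frame at scan start."""
    out = []
    for x, y, t in points:
        th = omega * t                               # heading change since scan start
        c, s = math.cos(th), math.sin(th)
        out.append((c * x - s * y + v[0] * t, s * x + c * y + v[1] * t))
    return out


# A static landmark seen three times during one 100 ms sweep from a moving sensor.
landmark, v, omega = (10.0, 2.0), (5.0, 0.0), 0.2    # m, m/s, rad/s (illustrative)
raw = []
for t in (0.0, 0.05, 0.1):
    dx, dy = landmark[0] - v[0] * t, landmark[1] - v[1] * t
    c, s = math.cos(-omega * t), math.sin(-omega * t)
    raw.append((c * dx - s * dy, s * dx + c * dy, t))

fixed = deskew(raw, v, omega)
print(raw[2][:2], fixed[2])   # raw return is smeared by ~0.5 m; de-skewed one is not
```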

Initialisation gotchas

System	Initialisation requirement
Mono SLAM	Need parallax motion ≥ 0.1 m over ≥ 0.5 s
Stereo SLAM	Works from frame 1 (metric scale from baseline)
RGB-D SLAM	Works from frame 1
LiDAR SLAM	Needs feature-rich first scan; avoid blank corridor at start
Mono-inertial VIO	Need motion to observe accel bias; sit-still + accelerate-forward is the canonical init pattern
Stereo-inertial VIO	Can initialise from gravity vector + stereo depth even when static (some implementations)

Degeneracy detection

SLAM does not gracefully degrade; it fails. Watch for:

Long featureless corridor — visual SLAM loses tracking; LiDAR sees parallel walls and slides along them.
Smoke / fog / dust — LiDAR sees a wall of noise; vision sees nothing.
Glass / mirrors — LiDAR sees through (no return) or reflects (false geometry); vision tracks the reflection rather than the surface.
Dynamic scenes (crowds, traffic) — features attach to moving objects → drift. Filter with semantic segmentation (DynaSLAM, Bescos 2018) or motion masks.
Repetitive structure — empty parking lots, warehouse aisles. Loop-closure detector fires false positives between identical-looking cells.

Production stacks add a failure-detection layer that watches feature count, ICP residual, IMU-vs-vision disagreement, and falls back to dead-reckoning (IMU-only or wheel-odom) with growing uncertainty until tracking recovers.
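One common heuristic for the failure-detection layer is to inspect the eigenvalues of the registration information matrix: a direction with a near-zero eigenvalue is unconstrained. The planar sketch below looks only at the translation block built from matched surface normals; the 0.05 ratio threshold is a hypothetical tuning value, not a recommendation from any particular stack.

```python
import math


def translation_eigs(normals):
    """Eigenvalues (min, max) of the 2x2 matrix sum(n * n^T) over matched normals."""
    a = sum(nx * nx for nx, ny in normals)
    b = sum(nx * ny for nx, ny in normals)
    d = sum(ny * ny for nx, ny in normals)
    mean = 0.5 * (a + d)
    diff = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean - diff, mean + diff


def is_degenerate(normals, min_ratio=0.05):
    """True when one translation direction is (almost) unconstrained."""
    lo, hi = translation_eigs(normals)
    return hi == 0.0 or lo / hi < min_ratio


corridor = [(0.0, 1.0)] * 50 + [(0.0, -1.0)] * 50      # two parallel walls only
room = corridor + [(1.0, 0.0)] * 40                    # plus an end wall
print(is_degenerate(corridor), is_degenerate(room))
```

In the corridor case the scan matcher cannot observe motion along the walls, so the estimate in that direction should come from wheel odometry or the IMU rather than from ICP.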

5. Components and sourcing

Open-source visual SLAM

Project	Sensors	License	Notes
ORB-SLAM3 (Campos 2021)	Mono / stereo / RGB-D + IMU	GPLv3	The single most-cited open-source SLAM since 2015; multi-map ATLAS subsystem
VINS-Fusion (Qin 2019)	Mono / stereo + IMU + (GPS)	GPLv3	HKUST; battle-tested on drones
SVO 2.0 (Forster 2017)	Mono / stereo	GPLv3 (commercial license available)	Direct/sparse, ultra-fast
DSO / LDSO (Engel 2017)	Mono	GPLv3	Photometric direct VO; demanding calibration
OpenVSLAM / stella_vslam	Mono / stereo / RGB-D + fisheye	2-clause BSD (after re-release)	Best fisheye support among open systems
Kimera (Rosinol 2020)	Stereo + IMU	BSD	Metric-semantic; mesh + 3D scene graph
Maplab 2.0 (ETH ASL)	Multi-session VIO	Apache 2.0	Designed for long-term multi-session mapping
OKVIS / OKVIS2 (Leutenegger)	Stereo + IMU	BSD	Reference implementation of keyframe-based VIO
ROVIO (Bloesch 2017)	Mono + IMU	BSD	Iterated EKF; runs on tiny CPUs
BASALT (TUM)	Stereo + IMU	BSD	Modern marginalisation; clean codebase

Open-source LiDAR SLAM

Project	Sensors	License	Notes
LOAM (Zhang & Singh 2014)	3D LiDAR	non-commercial	The progenitor; many derivatives
LeGO-LOAM (Shan 2018)	3D LiDAR (ground)	BSD	Ground-segmentation optimisation
LIO-SAM (Shan 2020)	3D LiDAR + IMU + (GPS)	BSD-3-Clause	The mainstream LIO baseline
FAST-LIO / FAST-LIO2 (Xu 2022)	3D LiDAR + IMU	GPLv2	Iterated-EKF, very low latency
Point-LIO (He 2023)	3D LiDAR + IMU	GPLv2	Per-point processing; high-vibration robust
KISS-ICP (Vizzo 2023)	3D LiDAR	MIT	LiDAR-only; minimal, drop-in odometry
Cartographer (Google, Hess 2016)	2D / 3D LiDAR + IMU + (odom)	Apache 2.0	2D rocks; 3D more situational
GLIM (Koide 2024)	3D LiDAR + IMU	MIT	GPU-accelerated factor-graph LiDAR SLAM
HDL-SLAM	3D LiDAR	BSD	Older but stable
A-LOAM / F-LOAM	3D LiDAR	BSD	Cleaned-up LOAM rewrites

Frameworks and ROS 2 integration

Package	Function
`slam_toolbox`	2D LiDAR SLAM, the nav2 default in ROS 2; serialisable maps
`cartographer_ros`	Google Cartographer ROS 2 bindings
`rtabmap_ros`	RTAB-Map (Labbé 2019); RGB-D + LiDAR; multi-session
`nav2` + `amcl`	Adaptive Monte Carlo Localization against a known map
`lio_sam` / `fast_lio`	ROS 2 ports
`kiss_icp`	Standalone ROS 2 node
`rviz2`	Visualisation
`octomap_server`	OctoMap 3D occupancy

Optimisation libraries

Library	Best for	License
GTSAM (Dellaert 2012)	iSAM2 incremental; factor graphs with semantic factors; Python bindings	BSD
g2o (Kümmerle 2011)	Batch pose-graph and bundle adjustment; simple API	BSD
Ceres Solver (Google)	General non-linear LS; auto-diff; SfM, calibration	New BSD
SymForce (Skydio 2022)	Symbolic factor generation, optimised codegen	Apache 2.0
MOLA	Modular SLAM framework (José-Luis Blanco)	GPLv3
SuiteSparse / CHOLMOD	Sparse linear-algebra backbone	LGPL

Commercial / proprietary

Vendor	Product	Stack
Apple	ARKit	Custom VIO + LiDAR fusion (Pro models)
Google	ARCore	Custom VIO; Cloud Anchors for multi-session
Microsoft	HoloLens 2 / Mesh	4-cam VIO + ToF
Meta	Quest 3 / Quest Pro	Inside-out 4-cam VIO + colour passthrough
Magic Leap	ML2	Stereo VIO + IR depth
Tesla	FSD	Vision-only HydraNet + lane-graph
Waymo	Driver	Multi-sensor proprietary fusion
Zoox	Vehicle	Custom multi-sensor stack
SLAMtec	Mapper M-series	2D / 3D LiDAR + SLAM for indoor
Ouster	Gemini	VLS-128 + camera + radar reference stack
Stereolabs	ZED SDK Spatial Mapping	Stereo + IMU; mesh / point-cloud output
Intel RealSense	T265 (EoL)	Dual-fisheye + IMU VIO module running on an Intel Movidius Myriad 2 VPU
NVIDIA	Isaac Nova Carter / Isaac Perceptor	Reference perception stack on Jetson

Standard datasets

Dataset	Sensors	Scenario
KITTI (Geiger 2012)	Stereo + Velodyne HDL-64E + GPS + IMU	Urban driving (Karlsruhe)
KITTI-360 (Liao 2022)	Same + fisheye + 360° camera	Karlsruhe with dense semantic GT
EuRoC MAV (Burri 2016)	Stereo (20 Hz) + IMU (200 Hz) + Vicon GT	Indoor MAV, machine-hall + Vicon room
TUM RGB-D (Sturm 2012)	Kinect v1 + motion-capture GT	Indoor handheld
TUM Mono VO	Mono with photometric calib	Direct-VO benchmark
TUM VI (Schubert 2018)	Stereo + IMU (factory-calibrated)	Visual-inertial
M2DGR (Yin 2022)	Multi-modal ground robot	Diverse motion patterns
MulRan (Kim 2020)	3D LiDAR + Navtech radar	Long-term place recognition
Newer College (Ramezani 2020)	OS1-64 + RealSense + IMU	Oxford handheld
FusionPortable (Jiao 2022)	Handheld + quadruped + UGV	Diverse platforms
Hilti SLAM Challenge	Multi-modal	Construction-site benchmarking
nuScenes / Waymo Open	AV-grade multi-sensor	Self-driving
TartanAir (Wang 2020)	Simulated stereo + IMU + depth	Challenging photoreal sequences

6. Reference data

Algorithm × sensor × output matrix

Algorithm	Sensors	Map output	Loop closure	License
ORB-SLAM3	Mono / stereo / RGB-D ± IMU	Sparse landmarks + keyframes	DBoW2	GPLv3
VINS-Fusion	Mono/stereo + IMU + (GPS)	Sparse keyframes	DBoW2 (optional)	GPLv3
Kimera	Stereo + IMU	Mesh + 3D scene graph	DBoW2	BSD
SVO 2.0	Mono / stereo	Sparse	(separate)	GPL
RTAB-Map	RGB-D / stereo / LiDAR	Dense + graph	Bayesian filter + BoW	BSD
Cartographer 2D	2D LiDAR + IMU + odom	Occupancy grid	Branch-and-bound submap match	Apache 2.0
Cartographer 3D	3D LiDAR + IMU	Voxel map	Same	Apache 2.0
LIO-SAM	3D LiDAR + IMU + (GPS)	Point-cloud submap	Radius search / ScanContext	BSD-3-Clause
FAST-LIO2	3D LiDAR + IMU	ikd-Tree point cloud	none built-in	GPLv2
KISS-ICP	3D LiDAR	Voxel map	none	MIT
GLIM	3D LiDAR + IMU	GICP point cloud	DSC, optional	MIT
NICE-SLAM	RGB-D	Implicit neural	none	Apache 2.0
MonoGS (Gaussian Splatting SLAM)	RGB-D / mono	3D Gaussians	none built-in	MIT
slam_toolbox	2D LiDAR + odom	Occupancy grid	Pose-graph + scan match	BSD

KITTI Odometry Benchmark — representative top performers (May 2026)

Translation error is averaged % over sub-sequences of length 100–800 m on the held-out KITTI test sequences (11–21). Numbers are approximate snapshots; the leaderboard is updated continuously.

Rank tier	Method	Sensors	Trans. err.	Rot. err. (°/m)
Top	CT-ICP / KISS-ICP++ derivatives	3D LiDAR	0.45–0.55 %	0.0014
Top	SuMa++ (Chen 2019)	3D LiDAR + semantic	0.65 %	0.0019
Top	LIO-SAM tuned	3D LiDAR + IMU	0.60 %	0.0024
Mid	LOAM	3D LiDAR	0.85 %	0.0030
Mid	F-LOAM / LeGO-LOAM	3D LiDAR	0.90–1.0 %	0.0030
Mid	ORB-SLAM3 stereo-inertial	Stereo + IMU	1.0–1.5 %	0.0040
Mid	VINS-Fusion stereo	Stereo + IMU	1.5 %	0.0050
Mid	DSO mono	Mono	1.7 %	0.0080
Tail	Pure mono VIO	Mono + IMU	2.5–4 %	0.010

LiDAR + IMU systems dominate at sub-1 %; pure visual systems sit around 1–2 %; pure monocular systems sit at 2–4 %.

Optimisation library function map

Function	GTSAM	g2o	Ceres
Non-linear LS solver	LM, GN, Dogleg, iSAM2	LM, GN	LM, GN, Dogleg, trust-region
Incremental	iSAM2 (Bayes tree)	—	—
Robust kernels	Huber, Cauchy, Tukey, Geman-McClure	Huber, Cauchy, etc.	Huber, Cauchy, Tukey, arctan, soft-L1
SE(3) on-manifold	`Pose3` + tangent space	`SE3Quat`	manual `Manifold` (formerly `LocalParameterization`)
Auto-differentiation	numerical / templated	numerical / hand	automatic (dual-number Jets via templates), numeric
Marginalisation	iSAM2 keys + `marginalize`	Schur complement	manual
Python bindings	yes (`gtsam`)	yes (`g2o-python`)	yes (`pyceres`)
Built for SLAM	yes (originated there)	yes	general; SLAM via factor templates

Common indoor + outdoor dataset facts

Dataset	Length	Sensors	Ground truth
EuRoC MH (machine hall)	5 seqs, ~0.5 km cumulative (~0.9 km across all 11 seqs including the Vicon room)	Stereo + IMU	Leica total station
EuRoC V (Vicon room)	6 seqs, 30–130 s	Stereo + IMU	Vicon mocap
TUM VI	28 seqs, 142 min total	Stereo + IMU	partial mocap
KITTI Odometry	22 sequences, 39 km total	Stereo + Velodyne HDL-64E + GPS/INS	dGPS RTK
Newer College	2.2 km + 8 km extensions	Ouster OS1-64 + RealSense + IMU	survey-grade laser scanner
Hilti 2022	9 sequences	Hesai PandarXT-32 + Alphasense + IMU	total station

7. Failure modes and debugging

Drift accumulates in featureless areas — long blank corridor, foggy outdoor area, smoke. Visual front-end loses feature tracks; LiDAR ICP slides ambiguously along parallel walls. Fix: add fiducials (AprilTag, ArUco) every 5–10 m; sprinkle in retroreflectors for LiDAR; in outdoor with sky view, weight in GPS factors.
Wrong loop closure (“kidnapping” the map) — visual detector matches two similar-looking but distinct places (warehouse aisles, parking-lot rows). Symptom: trajectory abruptly folds onto itself. Fix: tighten detector threshold; add geometric verification (ICP / 5-point); enable switchable constraints (Sünderhauf 2012) so the back-end can disable suspect edges during optimisation.
IMU bias drift — accelerometer or gyro bias wanders with temperature or supply voltage; integrated nav-frame velocity drifts. Fix: model bias as a slowly-evolving state inside the optimisation; reinitialise on detected stillness (zero-velocity update / ZUPT).
Vision loss in dark or motion blur — feature detector returns nothing. Fix: fall back to IMU dead-reckoning + LiDAR; raise gain temporarily and accept noise; for predictable lighting, switch to global-shutter sensor with shorter exposure plus higher gain ([[Robotics/sensors-perception]]).
LiDAR motion distortion — at vehicle speeds, a 10 Hz scan covers 100 ms of motion; a single “frame” is from many ego-poses. Fix: de-skew every scan using IMU-integrated trajectory at IMU rate.
Map memory explosion on long trajectory — naive point-cloud accumulation grows without bound. Fix: keyframe pruning, voxel downsampling at insert time, ikd-Tree (FAST-LIO2), or marginalisation in sliding-window VIO.
Loop-closure latency — global optimisation after a closure can take 100 ms to seconds; the front-end cannot stall. Fix: run back-end async in a dedicated thread; the front-end uses the latest available estimate; merge the corrected map atomically when ready.
Multi-session merging — robot is shut down, restarted in same building. Need to relocalise into the prior map. Fix: place-recognition (DBoW, NetVLAD, ScanContext) at startup; ORB-SLAM3 ATLAS does this for visual; Maplab and RTAB-Map for multi-modal.
Timestamp drift across sensors — software-timestamped USB cameras and Ethernet LiDAR may disagree by tens of ms. Symptom: VIO residuals grow systematically; stereo disparity slope across the image. Fix: PTP everywhere; hardware trigger; per-point LiDAR timestamps; estimate the offset online (td in VINS-Fusion).
Monocular VIO refusing to initialise — metric scale and accelerometer bias are unobservable without sufficient motion. Fix: enforce ≥ 0.1 m of translation with parallax over ≥ 0.5 s during init; reject init attempts that don’t have enough excitation.
NaN in the factor graph — bad initial guess, sensor outlier, or singular Jacobian. Fix: validate sensor messages (range > 0, no NaN, timestamps monotonic) at the front-end before insertion; use M-estimator-style robust kernels (Huber, Cauchy) on all edges.
Pose-graph divergence — back-end optimisation step blows up. Fix: bound the trust-region size (LM lambda) more aggressively; add robust kernels; check that the prior factor on pose 0 is in place to fix the gauge.
Real-time miss — VIO can’t keep up with sensor rate; queues back up; latency grows unboundedly. Fix: drop frames cleverly (skip every other when behind); downsample LiDAR; switch to lighter front-end (KISS-ICP for LiDAR-only fast mode); accept frame-skip in the back-end while front-end keeps running.
Indoor / outdoor transition — GPS confidence collapses as you enter a building. Switch fusion weights smoothly with a GPS-quality factor (HDOP, fix type) rather than a hard switch that introduces a step in the trajectory.
Dynamic objects polluting features — cars, pedestrians, crowds attach features that move with them. Fix: semantic segmentation mask (DynaSLAM, Bescos 2018, masks COCO classes “person”, “car”, etc.); RANSAC with geometric motion model rejects movers as outliers; in LiDAR, motion-cluster detection.
Calibration drift over time — camera intrinsics, camera-IMU extrinsics, LiDAR-IMU extrinsics shift with temperature, mechanical shock. Fix: online self-calibration (VINS-Fusion, ORB-SLAM3 do this for time offset and extrinsics); periodic re-calibration with a known target.
Photometric drift in direct-VO — DSO needs photometric calibration (vignette, response function) to be stable; auto-exposure changes corrupt the photometric model. Fix: capture in fixed-exposure mode if possible; pre-compute photometric response.
Map orientation gravity-aligned vs north-aligned — VIO knows gravity but not north; the map’s yaw is undetermined unless a magnetometer or GPS gives a heading prior. Symptom: rebuilt map is rotated relative to the prior map. Fix: magnetometer factor on the first keyframe; or a GPS heading derived from the baseline between two position fixes.
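The front-end validation recommended in the NaN item above can be as small as the sketch below. The function name, the 100 m range cap, and the choice to drop individual bad returns while rejecting the whole message on a bad timestamp are all assumptions of this example, not the behaviour of any specific driver or SLAM package.

```python
import math


def clean_scan(ranges, stamp, last_stamp, max_range=100.0):
    """Return indices of usable returns, or None if the whole message must be dropped."""
    if not math.isfinite(stamp) or (last_stamp is not None and stamp <= last_stamp):
        return None                                  # invalid or non-monotonic timestamp
    return [i for i, r in enumerate(ranges)
            if math.isfinite(r) and 0.0 < r <= max_range]


ranges = [1.0, float("nan"), -0.5, 0.0, float("inf"), 42.0]
kept = clean_scan(ranges, stamp=10.0, last_stamp=9.9)
stale = clean_scan([1.0], stamp=9.9, last_stamp=10.0)
print(kept, stale)
```

Filtering at this point keeps NaN and infinite values out of the residuals, where a single bad number would otherwise poison the whole linear solve.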

8. Case studies

Cartographer (Google, 2016)

Hess, Kohler, Rapp & Andor’s “Real-Time Loop Closure in 2D LIDAR SLAM” (ICRA 2016) introduced what is now Google’s open-source Cartographer system. The key contribution is a branch-and-bound submap-matching loop-closure detector that prunes the search far below exhaustive cost and is guaranteed to find the optimal scan-to-submap match within a configurable search window (at the resolution of the search grid). The architecture has two levels: a local SLAM (per-trajectory) that uses scan-to-submap matching + IMU integration for smooth local estimates, and a global SLAM that runs sparse pose-graph optimisation across all submaps using detected loop closures.

Cartographer powers Google’s Cartographer floor-scanner backpack (unveiled 2014; used internally for Street View indoor mapping) and is the canonical 2D LiDAR SLAM baseline in ROS 2 alongside slam_toolbox. Its 3D variant works but is less widely adopted than the LiDAR-inertial competitors (LIO-SAM, FAST-LIO2), which arrived later with tighter IMU coupling.

ORB-SLAM3 (Campos et al. 2021)

Campos, Elvira, Gómez Rodríguez, Montiel & Tardós’s ORB-SLAM3 (IEEE Transactions on Robotics, 2021) is a widely used open-source reference system for visual and visual-inertial SLAM. The authors present it as the first system to cover monocular, stereo, and RGB-D cameras, each with or without an IMU, in a single codebase, together with multi-map operation through its Atlas: when tracking is lost, ORB-SLAM3 starts a new map and merges it into the existing map once the robot revisits a known place.

The system uses ORB features + DBoW2 place recognition + g2o pose-graph + custom bundle adjustment + IMU pre-integration in MAP estimation. The paper reports an average absolute trajectory error of about 3.5 cm RMSE across the EuRoC sequences in stereo-inertial mode (best-in-class for 2021). The code (UZ-SLAMLab on GitHub) is GPLv3, which has limited its direct commercial adoption; many companies write clean-room re-implementations of the same algorithms.
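An RMSE figure like the one quoted above is normally an absolute trajectory error (ATE): the estimated positions are first aligned to ground truth with a least-squares rigid transform, then the root-mean-square of the residual distances is taken. The sketch below is a minimal version of that computation using the Kabsch / Horn closed form. It assumes the two trajectories are already time-associated as N x 3 arrays, and the synthetic data is invented for the example. Monocular evaluations usually also estimate a scale factor, which is omitted here.

```python
import numpy as np


def align_rigid(est, gt):
    """Least-squares rotation R and translation t mapping est onto gt (Kabsch / Horn)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t


def ate_rmse(est, gt):
    """RMSE of position residuals after rigid alignment, in the units of the inputs."""
    R, t = align_rigid(est, gt)
    residual = est @ R.T + t - gt
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(-5.0, 5.0, size=(200, 3))            # synthetic ground truth, metres
    yaw = np.deg2rad(30.0)
    Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                   [np.sin(yaw),  np.cos(yaw), 0.0],
                   [0.0,          0.0,         1.0]])
    est = gt @ Rz.T + np.array([1.0, -2.0, 0.5])          # same path in another frame
    noisy = est + rng.normal(0.0, 0.02, size=est.shape)   # 2 cm per-axis noise
    print(ate_rmse(est, gt), ate_rmse(noisy, gt))
```

The alignment step matters because a SLAM system reports poses in its own arbitrary start frame; without it the frame offset would dominate the error.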

Tesla FSD — vision-only SLAM at scale

Tesla’s Full-Self-Driving stack (HW3 from 2019, HW4 from 2023) does no LiDAR SLAM. The 8-camera perception stack runs a HydraNet multi-head CNN that emits a bird’s-eye-view (BEV) occupancy grid, lane-line graph, and object detections at 36 fps per camera. The “SLAM” function is split: short-term ego-motion comes from visual odometry + wheel-odom + IMU; long-term localisation against the world uses lane-graph matching to crowdsourced fleet-mapped HD lane data (a sparse map, not a dense LiDAR point cloud). This is a deliberate architectural choice: Tesla’s bet is that cameras + neural networks generalise where LiDAR’s geometric truth becomes a crutch. The trade-off shows in heavy rain and dense fog, where vision degrades faster than LiDAR / radar.

Apple ARKit — production VIO at iPhone scale

Apple’s ARKit (released 2017) runs a proprietary monocular-inertial VIO at 60 Hz on every iPhone since 6s. On iPhone Pro models (12 Pro onward) and iPad Pro (2020 onward), a custom Sony SPAD-based dToF LiDAR at 12×12 / 24×24 zones fuses with the VIO to add metric depth and improve initialisation. Apple has not published the algorithm; reverse-engineered behaviour (anchor stability, plane detection, re-localisation latency) matches a sliding-window VIO with online camera-IMU calibration and a learned plane detector. Performance is good enough to support ARKit’s industrial use-cases (Lowe’s room measurement, IKEA Place, GE turbine inspection) without LiDAR-grade error budgets.

DJI / Skydio — drone VIO at edge SWaP

DJI Mavic 3 and Phantom 4 Pro use a proprietary 6-camera + downward-stereo + GPS stack for indoor + outdoor flight. Skydio (X10, 2023) doubles down on this with six wide-baseline cameras providing a full 360° depth bubble; obstacle avoidance runs at 10 Hz with sub-100 ms latency on a Jetson Orin NX. Both architectures use VIO for ego-motion + a sparse-keyframe map for short-term re-localisation. Long-term mapping is not the goal: drones rely on GPS for the global frame and trade SLAM completeness for low SWaP (a Livox Mid-360 costs about 265 g and 6 W, unacceptable on a 1.4 kg drone).
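The SWaP argument above is simple arithmetic, sketched here with the mass figures from the text. The function name is hypothetical, and the calculation assumes the 1.4 kg figure is the take-off mass before the LiDAR is added.

```python
def payload_fraction(payload_g, vehicle_g):
    """Payload mass as a fraction of the vehicle's take-off mass."""
    return payload_g / vehicle_g


if __name__ == "__main__":
    lidar_mass_g = 265.0     # Livox Mid-360 mass quoted in the text
    drone_mass_g = 1400.0    # 1.4 kg airframe quoted in the text
    print(f"{payload_fraction(lidar_mass_g, drone_mass_g):.1%}")
```

A single sensor adding close to a fifth of the airframe mass, before mounts, cabling, and the 6 W draw are counted, is the reason camera-based VIO dominates at this size class.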

9. Cross-references

[[Robotics/sensors-perception]] — the exteroceptive sensors (RGB, stereo, RGB-D, LiDAR, radar, event cameras) feeding the SLAM front-end.
[[Robotics/sensors-pose-motion]] — the proprioceptive sensors (encoders, IMUs) supplying odometry and ego-motion constraints, including the strapdown navigation equations integrated by VIO.
[[Robotics/sensors-force-tactile]] — companion sensor reference for completeness; less relevant to SLAM specifically.
[[Robotics/kinematics-dh]] — SE(3) machinery, frame conventions, and transform composition used throughout factor-graph optimisation.
[[Robotics/dynamics-rigid-body]] — the rigid-body model VIO pre-integration depends on for predicting state evolution from IMU samples.
[[Robotics/bayesian-estimation]] (planned) — Kalman / EKF / particle filter foundations underlying both filter-based SLAM and the smoothing back-ends.
[[Robotics/computer-vision-robotics]] (planned) — feature extraction (ORB, SuperPoint), data association, semantic segmentation, depth networks that the SLAM front-end consumes.
[[Robotics/path-planning]] (planned) — the consumer of the SLAM map output; nav2 costmap and OctoMap interfaces.
[[Robotics/mobile-base-wheeled]] (planned) — wheel-odometry models and platform-level integration with SLAM.
[[Robotics/multirotor-design]] (planned) — drone-specific VIO + GPS fusion, low-SWaP compute integration.
[[Engineering/electromagnetics-engineering]] — LiDAR optical physics, wavelength choice (905 vs 1550 nm), FMCW chirp design.
[[Engineering/semiconductor-devices]] — SPAD physics for dToF LiDAR; CMOS image sensors.
[[Languages/Tier3/ros2-robotics-config]] — ROS 2 / DDS message types (sensor_msgs, nav_msgs, geometry_msgs/TransformStamped) used to plumb SLAM nodes.
[[Languages/Tier3/3d-scene]] — point-cloud and mesh formats (PLY, LAS, E57, USD, glTF) used to export and exchange SLAM maps.
[[Languages/Tier3/robotics-control]] (planned) — DSL coverage for navigation behaviour-trees, costmap configuration, lifecycle nodes.

10. Citations

Smith, R., Self, M. & Cheeseman, P. (1986). “Estimating Uncertain Spatial Relationships in Robotics.” Reprinted in Autonomous Robot Vehicles, Cox & Wilfong eds. Springer, 1990. The canonical origin of EKF-SLAM.
Lu, F. & Milios, E. (1997). “Globally Consistent Range Scan Alignment for Environment Mapping.” Autonomous Robots 4, 333–349. The origin of graph-based / pose-graph SLAM.
Thrun, S., Burgard, W. & Fox, D. (2005). Probabilistic Robotics. MIT Press. The canonical textbook for SLAM, Kalman/particle filters, occupancy grids.
Dellaert, F. & Kaess, M. (2017). “Factor Graphs for Robot Perception.” Foundations and Trends in Robotics 6(1–2), 1–139. The reference for factor-graph SLAM.
Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. & Dellaert, F. (2012). “iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree.” International Journal of Robotics Research 31(2), 216–235. The incremental solver behind GTSAM.
Cadena, C. et al. (2016). “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age.” IEEE Transactions on Robotics 32(6), 1309–1332. arXiv:1606.05830. The canonical decade-survey.
Campos, C., Elvira, R., Gómez Rodríguez, J. J., Montiel, J. M. M. & Tardós, J. D. (2021). “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM.” IEEE Transactions on Robotics 37(6), 1874–1890. DOI 10.1109/TRO.2021.3075644.
Mur-Artal, R. & Tardós, J. D. (2017). “ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras.” IEEE Transactions on Robotics 33(5), 1255–1262.
Qin, T., Li, P. & Shen, S. (2018). “VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator.” IEEE Transactions on Robotics 34(4), 1004–1020. arXiv:1708.03852.
Qin, T., Pan, J., Cao, S. & Shen, S. (2019). “A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors.” arXiv:1901.03638. (VINS-Fusion.)
Forster, C., Carlone, L., Dellaert, F. & Scaramuzza, D. (2017). “On-Manifold Preintegration for Real-Time Visual-Inertial Odometry.” IEEE Transactions on Robotics 33(1), 1–21. The pre-integration formulation used by most modern optimisation-based VIO systems.
Hess, W., Kohler, D., Rapp, H. & Andor, D. (2016). “Real-Time Loop Closure in 2D LIDAR SLAM.” ICRA 2016, 1271–1278. (Cartographer.)
Zhang, J. & Singh, S. (2014). “LOAM: Lidar Odometry and Mapping in Real-time.” Robotics: Science and Systems 2014.
Shan, T. & Englot, B. (2018). “LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain.” IROS 2018, 4758–4765.
Shan, T., Englot, B., Meyers, D., Wang, W., Ratti, C. & Rus, D. (2020). “LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping.” IROS 2020. arXiv:2007.00258.
Xu, W., Cai, Y., He, D., Lin, J. & Zhang, F. (2022). “FAST-LIO2: Fast Direct LiDAR-Inertial Odometry.” IEEE Transactions on Robotics 38(4), 2053–2073.
He, D., Xu, W., Chen, N., Kong, F., Yuan, C. & Zhang, F. (2023). “Point-LIO: Robust High-Bandwidth LiDAR-Inertial Odometry.” Advanced Intelligent Systems 5(7), 2200459.
Vizzo, I., Guadagnino, T., Mersch, B., Wiesmann, L., Behley, J. & Stachniss, C. (2023). “KISS-ICP: In Defense of Point-to-Point ICP — Simple, Accurate, and Robust Registration If Done the Right Way.” IEEE Robotics and Automation Letters 8(2), 1029–1036.
Kim, G. & Kim, A. (2018). “Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map.” IROS 2018, 4802–4809.
Mourikis, A. & Roumeliotis, S. (2007). “A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation.” ICRA 2007, 3565–3572. The original MSCKF.
Bloesch, M., Burri, M., Omari, S., Hutter, M. & Siegwart, R. (2017). “Iterated Extended Kalman Filter Based Visual-Inertial Odometry Using Direct Photometric Feedback.” International Journal of Robotics Research 36(10), 1053–1072. (ROVIO.)
Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R. & Furgale, P. (2015). “Keyframe-Based Visual-Inertial Odometry Using Nonlinear Optimization.” International Journal of Robotics Research 34(3), 314–334. (OKVIS.)
Rosinol, A., Abate, M., Chang, Y. & Carlone, L. (2020). “Kimera: an Open-Source Library for Real-Time Metric-Semantic Localization and Mapping.” ICRA 2020. arXiv:1910.02490.
Newcombe, R. et al. (2011). “KinectFusion: Real-Time Dense Surface Mapping and Tracking.” ISMAR 2011, 127–136.
Whelan, T., Salas-Moreno, R., Glocker, B., Davison, A. & Leutenegger, S. (2016). “ElasticFusion: Real-Time Dense SLAM and Light Source Estimation.” International Journal of Robotics Research 35(14), 1697–1716.
Labbé, M. & Michaud, F. (2019). “RTAB-Map as an Open-Source Lidar and Visual Simultaneous Localization and Mapping Library for Large-Scale and Long-Term Online Operation.” Journal of Field Robotics 36(2), 416–446.
Zhu, Z., Peng, S., Larsson, V., Xu, W., Bao, H., Cui, Z., Oswald, M. R. & Pollefeys, M. (2022). “NICE-SLAM: Neural Implicit Scalable Encoding for SLAM.” CVPR 2022. arXiv:2112.12130.
Matsuki, H., Murai, R., Kelly, P. H. J. & Davison, A. J. (2024). “Gaussian Splatting SLAM.” CVPR 2024. arXiv:2312.06741.
Sünderhauf, N. & Protzel, P. (2012). “Switchable Constraints for Robust Pose Graph SLAM.” IROS 2012, 1879–1884.
Olson, E. & Agarwal, P. (2013). “Inference on Networks of Mixtures for Robust Robot Mapping.” International Journal of Robotics Research 32(7), 826–840. (Max-mixture robust optimisation.)
Yang, H., Antonante, P., Tzoumas, V. & Carlone, L. (2020). “Graduated Non-Convexity for Robust Spatial Perception.” IEEE Robotics and Automation Letters 5(2), 1127–1134.
Huang, G., Mourikis, A. & Roumeliotis, S. (2010). “Observability-based Rules for Designing Consistent EKF SLAM Estimators.” International Journal of Robotics Research 29(5), 502–528. (First-Estimate Jacobians.)
Bescos, B., Fácil, J. M., Civera, J. & Neira, J. (2018). “DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes.” IEEE Robotics and Automation Letters 3(4), 4076–4083.
Geiger, A., Lenz, P., Stiller, C. & Urtasun, R. (2013). “Vision meets Robotics: The KITTI Dataset.” International Journal of Robotics Research 32(11), 1231–1237.
Burri, M. et al. (2016). “The EuRoC Micro Aerial Vehicle Datasets.” International Journal of Robotics Research 35(10), 1157–1163.
Sturm, J. et al. (2012). “A Benchmark for the Evaluation of RGB-D SLAM Systems.” IROS 2012, 573–580.
Wisth, D., Camurri, M. & Fallon, M. (2023). “VILENS: Visual, Inertial, Lidar, and Leg Odometry for All-Terrain Legged Robots.” IEEE Transactions on Robotics 39(1), 309–326.
Titterton, D. H. & Weston, J. L. (2004). Strapdown Inertial Navigation Technology (2nd ed.). IET. The reference for IMU integration mechanisation underlying VIO pre-integration.
GTSAM documentation (Borglab, Georgia Tech). https://gtsam.org
g2o documentation (Kümmerle et al. 2011). https://github.com/RainerKuemmerle/g2o
Ceres Solver documentation (Google). https://ceres-solver.org
UZ-SLAMLab. ORB-SLAM3 repository. https://github.com/UZ-SLAMLab/ORB_SLAM3
cartographer-project. Cartographer repository. https://github.com/cartographer-project/cartographer
