SLAM (Simultaneous Localization & Mapping) — Robotics Reference
See also (Tier 3 family index): SLAM Algorithm Zoo
1. At a glance
SLAM answers a chicken-and-egg question: a robot moving through an unknown environment must build a map of the environment and localise itself within that map, simultaneously, from sensor data alone. Both estimates depend on each other; neither can be solved without the other. SLAM is the algorithmic layer that consumes the exteroceptive and proprioceptive sensors covered in [[Robotics/sensors-perception]] and [[Robotics/sensors-pose-motion]] and emits a coherent (pose, map) tuple suitable for the planner and controller layers above it.
Three modern paradigms dominate, plus emerging neural variants:
- Filter-based SLAM — EKF-SLAM (Smith / Self / Cheeseman 1986), particle filter (FastSLAM, Montemerlo 2002), and Multi-State Constraint Kalman Filter (MSCKF, Mourikis 2007). A single recursive Bayesian estimate is updated as each measurement arrives; in EKF-SLAM the state vector (and its quadratic-size covariance) grows with every landmark added. Mostly historical for mapping; MSCKF persists in tightly-coupled VIO.
- Optimisation / graph SLAM — Lu / Milios 1997 onward; the dominant paradigm. The trajectory and landmarks are nodes in a factor graph; sensor measurements and odometry are edges (factors); a non-linear least-squares solver (Gauss-Newton, Levenberg-Marquardt, or incremental iSAM2) finds the maximum-a-posteriori estimate. GTSAM, g2o, Ceres are the production libraries.
- Tightly-coupled visual-inertial / LiDAR-inertial — VIO (VINS-Mono, ORB-SLAM3 inertial mode, OKVIS, ROVIO, Kimera) and LIO (LIO-SAM, FAST-LIO2, Point-LIO) fuse IMU pre-integration (Forster 2017) directly into the optimisation. The default for mobile robotics in 2026.
- Neural / implicit SLAM — NICE-SLAM (Zhu 2022), iMAP, Gaussian-Splatting SLAM (Matsuki 2024). Replace the discrete map with a learned implicit field (NeRF) or 3D Gaussian primitives. Photometrically accurate, dense, GPU-bound; emerging into AR / robotics demos but not yet production-ready for safety-critical autonomy.
Every modern SLAM system splits into a front-end (sensor processing, feature extraction, data association, place recognition) and a back-end (graph optimisation, loop-closure integration, marginalisation). Front-end errors are local and recoverable; back-end errors (especially false loop closures) corrupt the entire map globally.
Questions to ask before choosing a SLAM approach:
- What are the sensors (mono / stereo / RGB-D / 2D-LiDAR / 3D-LiDAR / IMU / GPS), and what are their rates and synchronisation guarantees?
- What is the environment — indoor structured, outdoor open, mixed, underground, dynamic?
- What is the trajectory length — 10 m corridor, 1 km warehouse loop, 100 km road network?
- What is the output — a 2D occupancy grid for nav2, a 3D point cloud for visualisation, a TSDF for collision, a textured mesh for AR?
- What is the compute budget — Jetson Orin Nano (15 W), Orin AGX (60 W), workstation, or cloud?
- Loop closure — required for bounded drift; how reliable does the detector need to be?
2. First principles
The SLAM probability problem
The full SLAM problem is to estimate the joint posterior
p(x_{1:t}, m | z_{1:t}, u_{1:t})
over the robot trajectory x_{1:t} = (x_1, …, x_t), the map m (landmarks, occupancy grid, or surfels), conditioned on all sensor observations z_{1:t} and control inputs u_{1:t}. This is intractable in general; practical SLAM solves one of three reductions:
- Online SLAM — marginalise out past poses, keep p(x_t, m | z_{1:t}, u_{1:t}). Filter-based methods (EKF, particle) live here.
- Full SLAM — keep the whole trajectory and map; optimise over everything. Graph SLAM methods live here.
- Sliding-window SLAM — keep only the last N keyframes; marginalise older states into a prior. Most modern VO and VIO systems live here.
Factor graphs
A factor graph (Dellaert & Kaess 2017) is a bipartite graph with two node types: variables (poses, landmarks, calibrations, biases) and factors (probabilistic measurement constraints). The joint posterior factorises as
p(X | Z) ∝ ∏_i φ_i(X_i)
where each factor φ_i is typically Gaussian: φ(x) = exp(−½ · ‖e(x)‖²_Ω). Solving for the MAP estimate reduces to non-linear weighted least squares:
X* = argmin_X Σ_i e_i(X_i)^T · Ω_i · e_i(X_i)
Common factor types:
- Odometry factor: e = log(T_ij_meas⁻¹ · T_i⁻¹ · T_j) ∈ se(3)
- Loop-closure factor: same form, but T_ij from place-recognition + relative-pose estimation
- Landmark observation factor: e = z − π(T_i, l_j), where π is the camera-projection model
- IMU pre-integration factor (Forster 2017): a compact summary of all IMU measurements between two keyframes, accounting for bias and gravity
- Prior factor: fixes the first pose at the origin to remove the gauge freedom of SE(3)
- GPS factor, wheel odometry factor, plane factor, switchable constraint factor, etc.
A pose graph is the special case where landmarks have been marginalised out (Schur complement) or were never modelled; only poses and pose-pose constraints remain. Pose-graph optimisation is much cheaper than full bundle adjustment and is the back-end of choice for LiDAR SLAM.
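The weighted-least-squares view is easiest to see when every factor is linear. Below is a minimal sketch, assuming a 1D robot (scalar positions, no rotation) so that a single Gauss-Newton step is exact. `solve_pose_graph_1d` and its factor-tuple layout are hypothetical names for illustration, not a library API; real SE(3) graphs need the iterative relinearisation that GTSAM, g2o, or Ceres provide.

```python
import numpy as np

def solve_pose_graph_1d(n_poses, factors):
    """Weighted least squares for a linear 1D pose graph.

    factors: list of (i, j, z, sigma). Each encodes x_j - x_i = z with
    standard deviation sigma; i=None turns it into a prior x_j = z.
    """
    A = np.zeros((len(factors), n_poses))
    b = np.zeros(len(factors))
    w = np.zeros(len(factors))
    for row, (i, j, z, sigma) in enumerate(factors):
        A[row, j] = 1.0
        if i is not None:
            A[row, i] = -1.0
        b[row] = z
        w[row] = 1.0 / sigma**2          # information Omega = 1 / sigma^2
    # All factors are linear, so one Gauss-Newton step is exact:
    # normal equations (A^T Omega A) x = A^T Omega b
    H = A.T @ (w[:, None] * A)
    g = A.T @ (w * b)
    return np.linalg.solve(H, g)

prior = [(None, 0, 0.0, 1e-3)]                        # anchor x0 (removes the gauge freedom)
odometry = [(k, k + 1, 1.1, 0.1) for k in range(4)]   # each step over-reads by 0.1 m
loop = [(0, 4, 4.0, 0.01)]                            # loop-closure-style constraint x4 - x0 = 4.0
x_odo = solve_pose_graph_1d(5, prior + odometry)
x_all = solve_pose_graph_1d(5, prior + odometry + loop)
print(x_odo[4], x_all[4])  # ~4.4 (dead reckoning) vs ~4.001 (after loop closure)
```

Odometry alone places x4 at 4.4 m. Adding the tightly weighted loop constraint spreads the 0.4 m disagreement evenly over the four odometry edges, because they all carry the same information.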
Front-end vs back-end split
| Stage | Job | Typical components |
|---|---|---|
| Front-end | Per-frame sensor processing, feature / scan extraction, data association, motion estimation, keyframe selection, loop-closure detection | ORB / SIFT / SuperPoint, LK tracker, RANSAC, ICP, scan-context |
| Back-end | Maintain the factor graph; solve non-linear LS; marginalise old states; handle loop-closure insertion | GTSAM / g2o / Ceres; iSAM2 incremental solver; robust kernels |
This separation matters operationally: the front-end runs at sensor rate (typically 10–30 Hz for cameras and LiDAR, with IMU samples arriving at 100 Hz or more), while the back-end runs at keyframe rate (1–10 Hz) or asynchronously when loop closures arrive. Most failures (wrong matches, false loops) originate in the front-end and are inherited by the back-end.
Loop closure: why it dominates
Without loop closure, every SLAM system accumulates drift roughly linearly in trajectory length — O(d) where d is distance travelled. With a working loop-closure detector and the resulting global optimisation, drift becomes O(1) bounded by the size of the explored region. The pivotal moment in any SLAM run is the first revisit of a previously-mapped place. A true positive loop closure injects a global constraint that pulls drift out of the trajectory; a false positive loop closure injects a contradictory constraint that the back-end will try to satisfy, and the entire map can collapse onto itself (“kidnap” failure).
Robust back-ends mitigate this with switchable constraints (Sünderhauf & Protzel 2012), max-mixtures (Olson 2013), or graduated non-convexity (Yang 2020) — methods that let the optimiser down-weight or reject suspect constraints during optimisation rather than committing to them up-front.
Marginalisation and information loss
Sliding-window VIO maintains a bounded state by marginalising out the oldest keyframe at each step. Marginalisation converts the joint posterior over (X_old, X_keep) into a prior over X_keep alone — analytically a Schur complement that fills in across all variables connected to X_old. This makes the resulting information matrix dense (fill-in), increasing solve cost. First-Estimate Jacobians (FEJ, Huang 2010) and observability-constrained variants address consistency issues that pure linearisation around the latest estimate introduces.
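Marginalisation can be checked numerically on a toy information matrix. The sketch below assumes scalar states and made-up unit weights; `marginalise` and `add_edge` are hypothetical helpers. It removes the oldest state x0 with the Schur complement and shows two things: the resulting prior on (x1, x2, x3) yields the same estimate as solving the full system, and x1 and x3, which had no direct factor between them, become coupled (fill-in).

```python
import numpy as np

def marginalise(H, b, keep, drop):
    """Marginalise `drop` out of the information form H x = b (Schur complement).

    H_prior = H_kk - H_kd H_dd^-1 H_dk
    b_prior = b_k  - H_kd H_dd^-1 b_d
    """
    H_kk = H[np.ix_(keep, keep)]
    H_kd = H[np.ix_(keep, drop)]
    H_dd = H[np.ix_(drop, drop)]
    H_dd_inv = np.linalg.inv(H_dd)
    H_prior = H_kk - H_kd @ H_dd_inv @ H_kd.T
    b_prior = b[keep] - H_kd @ H_dd_inv @ b[drop]
    return H_prior, b_prior

def add_edge(H, i, j, weight):
    """Add a scalar relative-position factor between x_i and x_j to H."""
    H[i, i] += weight
    H[j, j] += weight
    H[i, j] -= weight
    H[j, i] -= weight

# Four scalar states; x0 is the oldest keyframe, about to leave the window.
H = np.zeros((4, 4))
H[0, 0] += 1.0            # prior on x0
add_edge(H, 0, 1, 1.0)    # x0-x1
add_edge(H, 1, 2, 1.0)    # x1-x2
add_edge(H, 2, 3, 1.0)    # x2-x3
add_edge(H, 0, 3, 1.0)    # direct x0-x3 constraint
b = np.array([0.5, 1.0, -0.3, 2.0])

keep, drop = [1, 2, 3], [0]
H_prior, b_prior = marginalise(H, b, keep, drop)
print(H_prior)  # the (x1, x3) entry is now -1/3: fill-in
```

In a real sliding window the same operation runs on multi-dimensional pose, velocity, and bias blocks, which is why the dense fill-in described above raises solve cost.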
3. Practical math and worked examples
Example A — Pose-graph optimisation in GTSAM
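A small mobile robot drives a closed 10-pose loop (nine 1 m legs, yawing 40° after each, so pose 9 returns to the start) with wheel-odometry edges and a single loop closure back to pose 0.

```python
import gtsam
import numpy as np

graph = gtsam.NonlinearFactorGraph()
init = gtsam.Values()

# GTSAM orders Pose3 tangent vectors as (rx, ry, rz, tx, ty, tz):
# rotation sigmas [rad] first, translation sigmas [m] last.
# Odometry noise: 1 deg rotation, 5 cm translation
odo_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([np.deg2rad(1)] * 3 + [0.05] * 3))
# Loop-closure noise: 2 deg rotation, 10 cm translation
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([np.deg2rad(2)] * 3 + [0.10] * 3))

# Prior on pose 0 removes the SE(3) gauge freedom
graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), odo_noise))

# Each odometry edge: drive 1 m forward, then yaw 40 deg (9 edges = 360 deg)
delta = gtsam.Pose3(gtsam.Rot3.RzRyRx(0, 0, np.deg2rad(40)), gtsam.Point3(1.0, 0, 0))
# Deliberately perturbed initial guess (42 deg per step) so the optimiser has error to remove
guess_step = gtsam.Pose3(gtsam.Rot3.RzRyRx(0, 0, np.deg2rad(42)), gtsam.Point3(1.0, 0, 0))
guess = gtsam.Pose3()
init.insert(0, guess)
for i in range(9):
    graph.add(gtsam.BetweenFactorPose3(i, i + 1, delta, odo_noise))
    guess = guess.compose(guess_step)
    init.insert(i + 1, guess)

# Loop closure: pose 9 coincides with pose 0, so the measured relative pose is identity
graph.add(gtsam.BetweenFactorPose3(9, 0, gtsam.Pose3(), loop_noise))

params = gtsam.LevenbergMarquardtParams()
result = gtsam.LevenbergMarquardtOptimizer(graph, init, params).optimize()
print(result.atPose3(9).translation())  # close to (0, 0, 0) after optimisation
```

On a 2024 x86 laptop this 10-node graph solves in ~5 ms with iSAM2 incremental updates and ~20 ms with batch Levenberg-Marquardt. Scaling to a 10 000-pose graph: iSAM2 stays in the tens-of-ms range (its Bayes-tree factorisation is updated incrementally rather than recomputed); batch LM grows to seconds.
Units: poses in SE(3) (meters + radians). GTSAM orders the Pose3 tangent vector as (rotation, translation), so the noise sigmas are radians for the first three entries and meters for the last three; the BetweenFactor's error lives in the tangent space se(3), so the resulting cost is dimensionally consistent.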
Example B — ORB feature matching for V-SLAM
ORB-SLAM3 extracts up to 1000 ORB features per frame at 30 Hz. ORB descriptors are 256-bit binary vectors; match cost is Hamming distance (population-count of XOR), which takes ~3 ns per pair on AVX2 hardware.
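Naïve matching of two frames is 1000 × 1000 = 10⁶ pair-wise distance computations ≈ 3 ms, which is fast enough on CPU. Systems such as ORB-SLAM3 shrink the candidate set further by searching only near each feature's position predicted from a motion model, or among features in the same bag-of-words vocabulary node. Two filters then reject bad correspondences: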
- Lowe’s ratio test — for each query feature, find the best and second-best match in the train set; accept only if best/second-best < 0.75. This rejects ambiguous correspondences that would generate outliers.
- Geometric verification with RANSAC + 5-point essential matrix (Nistér 2004) — fit E from a random minimal sample of 5 correspondences; count inliers under the epipolar constraint; iterate 100–500 times; refine on the inlier set. With ratio-tested matches, typical inlier counts are 100–300 out of ~500 candidates.
The full pipeline (extract → ratio → RANSAC → refine) runs at 15–30 Hz on a Jetson Orin Nano for 1280×720 input.
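The matching arithmetic above can be sketched in NumPy. This assumes 256-bit descriptors stored as 32 `uint8` bytes (the layout OpenCV uses for ORB) and a brute-force distance matrix; `hamming_matrix`, `ratio_test_matches`, and `ransac_iterations` are hypothetical helpers, not ORB-SLAM3 code. The last function is the standard RANSAC iteration-count formula N = log(1 − p) / log(1 − wˢ) for confidence p, inlier ratio w, and minimal sample size s.

```python
import math
import numpy as np

# Number of set bits for every byte value, used as a popcount lookup table
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)

def hamming_matrix(desc_a, desc_b):
    """Pairwise Hamming distances between (N, 32) and (M, 32) uint8 descriptor arrays."""
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])  # (N, M, 32) differing bits
    return POPCOUNT[xor].sum(axis=-1, dtype=np.int32)              # (N, M) distances

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Lowe's ratio test: keep (i, j) only if best distance < ratio * second-best distance."""
    d = hamming_matrix(desc_a, desc_b)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    keep = d[rows, best] < ratio * d[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]

def ransac_iterations(inlier_ratio, sample_size=5, confidence=0.99):
    """Iterations needed so that, with probability `confidence`, at least one
    minimal sample contains only inliers: N = log(1 - p) / log(1 - w^s)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - inlier_ratio ** sample_size))

print(ransac_iterations(0.5))  # 146 for 50 % inliers and 99 % confidence
```

With a 50 % inlier ratio the 5-point sampler needs about 146 iterations, inside the 100–500 range quoted above; at 80 % inliers it needs only 12. A feature with two equally good candidates fails the ratio test, which is exactly the ambiguous case the test is meant to reject.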
Example C — LIO-SAM benchmark on KITTI sequence 00
Hardware (KITTI recording platform): Velodyne HDL-64E (10 Hz, ~1.3 M points/s) + OXTS RT3003 GPS/IMU (100 Hz in the raw recordings). Sequence 00 is 3724 m of urban driving with multiple revisits.
Pipeline:
- Per-scan de-skewing using IMU integration over the 100 ms scan period.
- Feature extraction: planar + edge points via LOAM-style curvature filtering, ~3000 features per scan.
- Scan-to-map registration via point-to-plane ICP at keyframes.
- IMU pre-integration factors between keyframes (Forster 2017): one factor summarises 10 IMU samples.
- ScanContext loop-closure detection (Kim 2018): a polar grid of 20 rings × 60 sectors storing the maximum point height in each bin, matched by column-wise cosine distance minimised over column (yaw) shifts. Detector fires at ~5 places along the loop.
- iSAM2 incremental back-end on Jetson Xavier NX (15 W mode).
Result: end-to-end translational drift ≈ 0.5 % of distance travelled (≈ 18 m over 3.7 km), vs ~1.5 % for LOAM-only (LiDAR-only, no IMU) and ~0.3 % for state-of-the-art LiDAR-visual-inertial fusion. Frame-to-frame latency is ~50 ms; loop-closure-triggered global optimisation completes in 200–400 ms.
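The ScanContext step can be sketched as follows. This is a simplified version, assuming points in the sensor frame with heights already non-negative (the original method offsets by the sensor mounting height) and an 80 m maximum range; `scan_context` and `sc_distance` are hypothetical names, and the ring-key retrieval stage of the original paper is omitted.

```python
import numpy as np

def scan_context(points, num_rings=20, num_sectors=60, max_range=80.0):
    """Ring x sector grid holding the maximum point height (z) in each polar bin."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    valid = r < max_range
    ring = np.minimum((r[valid] / max_range * num_rings).astype(int), num_rings - 1)
    sector = np.minimum((theta[valid] / (2 * np.pi) * num_sectors).astype(int), num_sectors - 1)
    desc = np.zeros((num_rings, num_sectors))
    # Empty bins stay 0; assumes heights are >= 0 so the max is meaningful
    np.maximum.at(desc, (ring, sector), z[valid])
    return desc

def sc_distance(query, candidate):
    """Mean column-wise cosine distance, minimised over column (yaw) shifts.

    Returns (distance, best_shift); best_shift is in sectors.
    """
    best = (np.inf, 0)
    for shift in range(query.shape[1]):
        shifted = np.roll(candidate, shift, axis=1)
        num = (query * shifted).sum(axis=0)
        den = np.linalg.norm(query, axis=0) * np.linalg.norm(shifted, axis=0)
        valid = den > 0                      # skip columns that are empty in either scan
        if not valid.any():
            continue
        dist = 1.0 - np.mean(num[valid] / den[valid])
        if dist < best[0]:
            best = (dist, shift)
    return best
```

Because a yaw rotation of the sensor only shifts sector columns, the best shift also yields a coarse yaw estimate (6° per sector with 60 sectors) that can seed ICP during geometric verification.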
Example D — Point-to-plane ICP for scan registration
LiDAR scan registration uses point-to-plane ICP (Chen & Medioni 1991) rather than point-to-point because the latter is biased for sparse scans against dense surfaces. Given a source point cloud P, a target cloud Q with points q_j and surface normals n_j, and a rigid transform T, the cost is:
E(T) = Σ_i ((T·p_i − q_j)^T · n_j)²
Linearised around a small rotation α and translation t, the per-residual Jacobian is the 1×6 row (rotation block first; p_i here is the source point under the current transform estimate):
J_i = [ (p_i × n_j)^T   n_j^T ]
Each iteration solves the 6×6 normal equations H·δ = b with H = Σ J^T·J and b = −Σ J^T·r. For a 50 000-point scan with k-d-tree nearest-neighbour lookups (~3 µs per query on a Jetson Orin Nano), one iteration costs ~150 ms naïve but drops to ~25 ms with multi-threading and to ~5 ms on GPU (cuICP, ZED SDK). Typical convergence: 10–20 iterations to a final residual under 1 cm.
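One Gauss-Newton iteration of the point-to-plane cost can be written directly from the Jacobian row above. This sketch assumes the correspondences (a target point and its normal for every source point) have already been found, and applies the step as a left-multiplied small rotation plus translation; `point_to_plane_step`, `exp_so3`, and `skew` are hypothetical helpers.

```python
import numpy as np

def skew(v):
    """3x3 cross-product matrix: skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def point_to_plane_step(R, t, src, dst, normals):
    """One Gauss-Newton step of point-to-plane ICP with fixed correspondences.

    src[i] is matched to dst[i], whose surface normal is normals[i].
    Returns the updated (R, t) and the RMS point-to-plane residual before the step.
    """
    p = src @ R.T + t                                 # source points under the current transform
    r = np.einsum("ij,ij->i", p - dst, normals)       # signed distances to the target planes
    J = np.hstack([np.cross(p, normals), normals])    # (N, 6) rows [(p x n)^T, n^T]
    H = J.T @ J
    b = -J.T @ r
    delta = np.linalg.solve(H, b)                     # [rotation (3), translation (3)]
    dR = exp_so3(delta[:3])
    # Left-multiplied update p -> dR p + delta_t, matching the linearisation
    return dR @ R, dR @ t + delta[3:], np.sqrt(np.mean(r**2))
```

In a full ICP loop, nearest-neighbour correspondences (e.g. from a k-d tree) are re-found between calls; the sketch leaves that search out so the linear-algebra core stays visible. The normals must span all three directions, otherwise H is singular along the unconstrained direction, which is the corridor degeneracy described later.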
Common failure: ICP returns a local minimum if the initial pose is more than half the scan’s correlation length off true. Mitigations: use IMU prediction as the initial pose; use multi-resolution voxel grids (coarse-to-fine); use NDT (Normal Distributions Transform, Biber 2003) which has a wider basin of convergence.
Example E — IMU pre-integration factor cost
Between two keyframes at times t_i and t_j, the IMU pre-integrated factor (Forster 2017) summarises N raw IMU samples into three quantities — Δp_ij, Δv_ij, ΔR_ij — and their 9×9 covariance Σ_ij. The factor’s residual is a 15-vector (the three pre-integration deltas plus accel + gyro bias deltas):
r_p = R_i^T · (p_j − p_i − v_i·Δt − ½·g·Δt²) − Δp_ij(b_a, b_g)
r_v = R_i^T · (v_j − v_i − g·Δt) − Δv_ij(b_a, b_g)
r_R = log( ΔR_ij(b_g)^T · R_i^T · R_j )
r_ba = b_a_j − b_a_i
r_bg = b_g_j − b_g_i
The key insight: pre-integration moves the integration into a frame that does not depend on R_i, so if R_i changes during optimisation, the pre-integrated quantities don't need to be re-integrated, only bias-corrected with first-order updates. A 100 Hz IMU between 10 Hz keyframes contributes 10 samples per factor; cost evaluation is < 10 µs.
Bias modelling: accel and gyro biases are typically modelled as random walks (ḃ = w_b, w_b white noise) with hand-tuned process noise — order 10⁻³ m/s³/√Hz for accel, 10⁻⁵ rad/s²/√Hz for gyro on a consumer MEMS IMU like Bosch BMI088 or InvenSense ICM-42688.
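A sketch of the accumulation behind Δp_ij, Δv_ij, and ΔR_ij, assuming simple Euler integration at a fixed sample period. It omits the covariance Σ_ij and the bias Jacobians that a production implementation (for example GTSAM's `PreintegratedImuMeasurements`) also propagates; `preintegrate`, `exp_so3`, and `skew` are hypothetical helpers.

```python
import numpy as np

def skew(v):
    """3x3 cross-product matrix."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def preintegrate(gyro, accel, dt, bg, ba):
    """Accumulate (Delta R, Delta v, Delta p) between two keyframes.

    gyro, accel: (N, 3) body-frame samples; dt: sample period [s];
    bg, ba: current gyro / accel bias estimates.
    The deltas live in the body frame at keyframe i, so they do not
    depend on that keyframe's global rotation, velocity, or position.
    """
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(gyro, accel):
        a_corr = a - ba                                   # bias-corrected specific force
        dp = dp + dv * dt + 0.5 * (dR @ a_corr) * dt**2   # uses dv, dR from the previous step
        dv = dv + (dR @ a_corr) * dt
        dR = dR @ exp_so3((w - bg) * dt)
    return dR, dv, dp
```

As a sanity check: an IMU at rest reading +9.81 m/s² on z for one second gives Δv = (0, 0, 9.81) and Δp = (0, 0, 4.905), which makes r_v and r_p vanish with g = (0, 0, −9.81), v_i = v_j = 0, and p_i = p_j.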
4. Design heuristics
Sensor stack by platform
| Platform | Sensors | Recommended algorithm |
|---|---|---|
| Indoor AMR / warehouse robot | 2D LiDAR + wheel-odom + IMU | slam_toolbox, Cartographer 2D, gmapping |
| Outdoor AGV / lawn robot | 3D LiDAR + IMU + GNSS | LIO-SAM, FAST-LIO2, Cartographer 3D |
| AR / VR headset | Stereo + IMU | VIO (proprietary Apple ARKit, Google ARCore; open VINS-Fusion, ORB-SLAM3) |
| Consumer drone | Stereo + IMU + downward-facing sensors | VINS-Fusion, Kimera, proprietary DJI/Skydio |
| Self-driving car | Multi-LiDAR + cameras + radars + IMU + GNSS + HD-map | Proprietary; Cartographer or Autoware research stacks |
| Surgical / medical | Stereo endoscope + IMU + tool encoders | Proprietary; ORB-SLAM3 / SuperPoint-based |
| Quadruped (Spot, ANYmal, Unitree) | Stereo / RGB-D + LiDAR + IMU + leg-odom | VILENS (Wisth 2022), Pronto, proprietary |
| Underwater (AUV) | DVL + IMU + sonar + occasional GPS | Acoustic SLAM, factor-graph fusion (GTSAM) |
Loop-closure detector choice
| Detector | Sensor | Pros | Cons |
|---|---|---|---|
| DBoW2 / DBoW3 (Gálvez-López 2012) | Visual | Fast (~5 ms), proven, ORB-friendly | Lighting / season sensitive |
| NetVLAD (Arandjelović 2016) | Visual | Robust to illumination | ~50 ms inference; needs GPU |
| HF-Net / SuperPoint+SuperGlue | Visual | SOTA accuracy | Heavy compute |
| ScanContext (Kim 2018) | 3D LiDAR | Compact (20 × 60 = 1200-cell descriptor), rotation invariant | 2D projection loses 3D detail |
| ScanContext++ (Kim 2022) | 3D LiDAR | Adds robustness to lateral offsets | Slightly more compute |
| BoW3D (Cui 2022) | 3D LiDAR | Real-time, LinK3D-feature based | Newer, less battle-tested |
| Iris (Wang 2020) | 3D LiDAR | Binary signature, fast | Less robust than ScanContext |
| Place-NeRF / OverlapNet | LiDAR / visual | Learned, robust | GPU-bound |
Setting the threshold — every detector has a similarity threshold T. T high → fewer false positives but missed closures (drift grows). T low → catches all true closures but admits false positives that destroy the map. Production stacks add a geometric verification step: after detector fires, run ICP or 5-point essential matrix to check the relative-pose hypothesis is geometrically consistent. Reject if inlier count or fitness score is below a second threshold.
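Choosing T is easiest against a labelled validation run. The sketch below assumes per-candidate similarity scores and ground-truth loop labels are available (both function names are hypothetical). It computes precision and recall at a given threshold, plus the lowest threshold that still admits zero false positives, which corresponds to the "recall at 100 % precision" figure commonly reported for place recognition.

```python
import numpy as np

def precision_recall(scores, is_true_loop, threshold):
    """Precision / recall of a detector that fires when similarity >= threshold.

    scores: (N,) similarity per candidate pair; is_true_loop: (N,) bool labels.
    """
    fired = scores >= threshold
    tp = np.sum(fired & is_true_loop)
    fp = np.sum(fired & ~is_true_loop)
    fn = np.sum(~fired & is_true_loop)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

def threshold_at_full_precision(scores, is_true_loop):
    """Lowest threshold that rejects every negative; returns (threshold, (precision, recall))."""
    negatives = scores[~is_true_loop]
    if negatives.size:
        threshold = np.nextafter(negatives.max(), np.inf)  # just above the worst negative
    else:
        threshold = scores.min()
    return threshold, precision_recall(scores, is_true_loop, threshold)
```

Because a single false loop closure can corrupt the whole map, the threshold is usually set at or near full precision on validation data, and geometric verification then recovers some of the recall lost there.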
Map representations
| Representation | 2D / 3D | Use case | Library |
|---|---|---|---|
| Occupancy grid (probabilistic) | 2D / 3D | Path planning, navigation | nav2 costmap, OctoMap |
| Point cloud | 3D | Visualisation, registration | PCL, Open3D |
| Surfel cloud | 3D | Dense surface, deformable | ElasticFusion, SuMa (Behley 2018) |
| TSDF (Truncated SDF) | 3D | Dense surface reconstruction, near-surface collision | KinectFusion, Voxblox, nvblox |
| ESDF (Euclidean SDF) | 3D | Gradient-based and sampling-based planning, drone obstacle avoidance | Voxblox, Voxfield |
| Triangle mesh | 3D | Visualisation, photogrammetry export | Open3D Poisson, AliceVision |
| NDT (Normal Distributions Transform) | 2D / 3D | Probabilistic registration | autoware.universe |
| NeRF / implicit | 3D | Photometric novel-view, AR | NICE-SLAM, Instant-NGP |
| 3D Gaussian splats | 3D | Dense + fast render, AR | Gaussian Splatting SLAM / MonoGS (Matsuki 2024), GS-SLAM (Yan 2024) |
| Topological / pose graph | abstract | Long-term, multi-session | Maplab, RTAB-Map graph mode |
Compute budget reality check
| Platform | TDP | What runs real-time |
|---|---|---|
| Jetson Orin Nano (8 GB) | 7–15 W | ORB-SLAM3 mono/stereo at 15 fps (1280×720), FAST-LIO2 at 10 Hz (VLP-16), slam_toolbox 2D |
| Jetson Orin NX 16 GB | 10–25 W | ORB-SLAM3 stereo-inertial at 30 fps, FAST-LIO2 with 64-channel, Kimera-VIO |
| Jetson AGX Orin 64 GB | 15–60 W | All of the above plus dense neural SLAM (NICE-SLAM at low resolution) |
| RTX 4090 workstation | 450 W | Gaussian-splatting SLAM at interactive rates; 8K bundle adjustment |
| Apple M2 / M3 | 15–30 W | ARKit production VIO at 60 Hz |
| Qualcomm RB6 (Snapdragon 8550) | 5–15 W | Hexagon-DSP VIO for drones at 100 Hz with low jitter |
Time-sync requirements
Software-only timestamping (e.g. node arrival time) drifts 5–50 ms relative to physical capture, with non-deterministic latency. Cross-modal fusion requires better than 1 ms alignment between an IMU and a camera or LiDAR. Achievable via:
- Hardware trigger from a common PPS (GNSS) or FPGA pulse.
- PTP / IEEE 1588 on all sensors with PHY-level timestamping (Basler, FLIR, Ouster, Velodyne all support PTP).
- Per-point LiDAR timestamping — every Velodyne / Ouster point carries an offset within the rotation; combine with IMU to de-skew.
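When hardware synchronisation is unavailable, a coarse constant offset between two streams can be estimated offline by cross-correlating a signal both observe, for example the gyro angular-rate magnitude and the rotation rate recovered from camera tracking. This is a swapped-in technique, not how any particular SLAM package estimates the offset. The sketch assumes both signals are already resampled onto a common time grid; `estimate_time_offset` is hypothetical and only resolves whole-sample lags.

```python
import numpy as np

def estimate_time_offset(rate_a, rate_b, dt, max_lag_s=0.1):
    """Delay of signal b relative to signal a [s], via normalised cross-correlation.

    rate_a, rate_b: equal-length signals on a common time grid with spacing dt.
    A positive result means b lags a (b shows events later than a).
    """
    a = (rate_a - rate_a.mean()) / rate_a.std()
    b = (rate_b - rate_b.mean()) / rate_b.std()
    max_lag = int(round(max_lag_s / dt))
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            corr = np.mean(a[: len(a) - lag] * b[lag:])   # pairs a[k] with b[k + lag]
        else:
            corr = np.mean(a[-lag:] * b[: len(b) + lag])  # pairs a[k - lag] with b[k]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag * dt
```

At 200 Hz the resolution is 5 ms, so this is only a starting point; refining below one sample (e.g. by interpolating around the correlation peak) is needed to reach the sub-millisecond alignment quoted above.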
Initialisation gotchas
| System | Initialisation requirement |
|---|---|
| Mono SLAM | Need parallax motion ≥ 0.1 m over ≥ 0.5 s |
| Stereo SLAM | Works from frame 1 (metric scale from baseline) |
| RGB-D SLAM | Works from frame 1 |
| LiDAR SLAM | Needs feature-rich first scan; avoid blank corridor at start |
| Mono-inertial VIO | Need motion to observe accel bias; sit-still + accelerate-forward is the canonical init pattern |
| Stereo-inertial VIO | Can initialise from gravity vector + stereo depth even when static (some implementations) |
Degeneracy detection
SLAM does not gracefully degrade; it fails. Watch for:
- Long featureless corridor — visual SLAM loses tracking; LiDAR sees parallel walls and slides along them.
- Smoke / fog / dust — LiDAR sees a wall of noise; vision sees nothing.
- Glass / mirrors — LiDAR sees through (no return) or reflects (false geometry); vision tracks the reflection rather than the surface.
- Dynamic scenes (crowds, traffic) — features attach to moving objects → drift. Filter with semantic segmentation (DynaSLAM, Bescos 2018) or motion masks.
- Repetitive structure — empty parking lots, warehouse aisles. Loop-closure detector fires false positives between identical-looking cells.
Production stacks add a failure-detection layer that watches feature count, ICP residual, IMU-vs-vision disagreement, and falls back to dead-reckoning (IMU-only or wheel-odom) with growing uncertainty until tracking recovers.
5. Components and sourcing
Open-source visual SLAM
| Project | Sensors | License | Notes |
|---|---|---|---|
| ORB-SLAM3 (Campos 2021) | Mono / stereo / RGB-D + IMU | GPLv3 | The single most-cited open-source SLAM since 2015; multi-map ATLAS subsystem |
| VINS-Fusion (Qin 2019) | Mono / stereo + IMU + (GPS) | GPLv3 | HKUST; battle-tested on drones |
| SVO 2.0 (Forster 2017) | Mono / stereo | GPLv3 (commercial license available) | Direct/sparse, ultra-fast |
| DSO / LDSO (Engel 2017) | Mono | GPLv3 | Photometric direct VO; demanding calibration |
| OpenVSLAM / stella_vslam | Mono / stereo / RGB-D + fisheye | 2-clause BSD (after re-release) | Best fisheye support among open systems |
| Kimera (Rosinol 2020) | Stereo + IMU | BSD | Metric-semantic; mesh + 3D scene graph |
| Maplab 2.0 (ETH ASL) | Multi-session VIO | Apache 2.0 | Designed for long-term multi-session mapping |
| OKVIS / OKVIS2 (Leutenegger) | Stereo + IMU | BSD | Reference implementation of keyframe-based VIO |
| ROVIO (Bloesch 2017) | Mono + IMU | BSD | Iterated EKF; runs on tiny CPUs |
| BASALT (TUM) | Stereo + IMU | BSD | Modern marginalisation; clean codebase |
Open-source LiDAR SLAM
| Project | Sensors | License | Notes |
|---|---|---|---|
| LOAM (Zhang & Singh 2014) | 3D LiDAR | non-commercial | The progenitor; many derivatives |
| LeGO-LOAM (Shan 2018) | 3D LiDAR (ground) | BSD | Ground-segmentation optimisation |
| LIO-SAM (Shan 2020) | 3D LiDAR + IMU + (GPS) | MIT | The mainstream LIO baseline |
| FAST-LIO / FAST-LIO2 (Xu 2022) | 3D LiDAR + IMU | GPLv2 | Iterated-EKF, very low latency |
| Point-LIO (He 2023) | 3D LiDAR + IMU | GPLv2 | Per-point processing; high-vibration robust |
| KISS-ICP (Vizzo 2023) | 3D LiDAR | MIT | LiDAR-only; minimal, drop-in odometry |
| Cartographer (Google, Hess 2016) | 2D / 3D LiDAR + IMU + (odom) | Apache 2.0 | Strong in 2D; 3D more situational |
| GLIM (Koide 2024) | 3D LiDAR + IMU | MIT | GPU-accelerated factor-graph LiDAR SLAM |
| HDL-SLAM | 3D LiDAR | BSD | Older but stable |
| A-LOAM / F-LOAM | 3D LiDAR | BSD | Cleaned-up LOAM rewrites |
Frameworks and ROS 2 integration
| Package | Function |
|---|---|
| slam_toolbox | 2D LiDAR SLAM, the nav2 default in ROS 2; serialisable maps |
| cartographer_ros | Google Cartographer ROS 2 bindings |
| rtabmap_ros | RTAB-Map (Labbé 2019); RGB-D + LiDAR; multi-session |
| nav2 + amcl | Adaptive Monte Carlo Localization against a known map |
| lio_sam / fast_lio | ROS 2 ports |
| kiss_icp | Standalone ROS 2 node |
| rviz2 | Visualisation |
| octomap_server | OctoMap 3D occupancy |
Optimisation libraries
| Library | Best for | License |
|---|---|---|
| GTSAM (Dellaert 2012) | iSAM2 incremental; factor graphs with semantic factors; Python bindings | BSD |
| g2o (Kümmerle 2011) | Batch pose-graph and bundle adjustment; simple API | BSD |
| Ceres Solver (Google) | General non-linear LS; auto-diff; SfM, calibration | New BSD |
| SymForce (Skydio 2022) | Symbolic factor generation, optimised codegen | Apache 2.0 |
| MOLA | Modular SLAM framework (José-Luis Blanco) | GPLv3 |
| SuiteSparse / CHOLMOD | Sparse linear-algebra backbone | LGPL |
Commercial / proprietary
| Vendor | Product | Stack |
|---|---|---|
| Apple | ARKit | Custom VIO + LiDAR fusion (Pro models) |
| Google | ARCore | Custom VIO; Cloud Anchors for multi-session |
| Microsoft | HoloLens 2 / Mesh | 4-cam VIO + ToF |
| Meta | Quest 3 / Quest Pro | Inside-out 4-cam VIO + colour passthrough |
| Magic Leap | ML2 | Stereo VIO + IR depth |
| Tesla | FSD | Vision-only HydraNet + lane-graph |
| Waymo | Driver | Multi-sensor proprietary fusion |
| Zoox | Vehicle | Custom multi-sensor stack |
| SLAMtec | Mapper M-series | 2D / 3D LiDAR + SLAM for indoor |
| Ouster | Gemini | VLS-128 + camera + radar reference stack |
| Stereolabs | ZED SDK Spatial Mapping | Stereo + IMU; mesh / point-cloud output |
| Intel RealSense | T265 (EoL) | Stereo fisheye + IMU VIO on an on-board Movidius Myriad 2 VPU |
| NVIDIA | Isaac Nova Carter / Isaac Perceptor | Reference perception stack on Jetson |
Standard datasets
| Dataset | Sensors | Scenario |
|---|---|---|
| KITTI (Geiger 2012) | Stereo + Velodyne HDL-64E + GPS + IMU | Urban driving (Karlsruhe) |
| KITTI-360 (Liao 2022) | Same + fisheye + 360° camera | Karlsruhe with dense semantic GT |
| EuRoC MAV (Burri 2016) | Stereo (20 Hz) + IMU (200 Hz) + Vicon GT | Indoor MAV, machine-hall + Vicon room |
| TUM RGB-D (Sturm 2012) | Kinect v1 + motion-capture GT | Indoor handheld |
| TUM Mono VO | Mono with photometric calib | Direct-VO benchmark |
| TUM VI (Schubert 2018) | Stereo + IMU (factory-calibrated) | Visual-inertial |
| M2DGR (Yin 2022) | Multi-modal ground robot | Diverse motion patterns |
| MulRan (Kim 2020) | 3D LiDAR + Navtech radar | Long-term place recognition |
| Newer College (Ramezani 2020) | OS1-64 + RealSense + IMU | Oxford handheld |
| FusionPortable (Jiao 2022) | Handheld + quadruped + UGV | Diverse platforms |
| Hilti SLAM Challenge | Multi-modal | Construction-site benchmarking |
| nuScenes / Waymo Open | AV-grade multi-sensor | Self-driving |
| TartanAir (Wang 2020) | Simulated stereo + IMU + depth | Challenging photoreal sequences |
6. Reference data
Algorithm × sensor × output matrix
| Algorithm | Sensors | Map output | Loop closure | License |
|---|---|---|---|---|
| ORB-SLAM3 | Mono / stereo / RGB-D ± IMU | Sparse landmarks + keyframes | DBoW2 | GPLv3 |
| VINS-Fusion | Mono/stereo + IMU + (GPS) | Sparse keyframes | DBoW2 (optional) | GPLv3 |
| Kimera | Stereo + IMU | Mesh + 3D scene graph | DBoW2 | BSD |
| SVO 2.0 | Mono / stereo | Sparse | (separate) | GPL |
| RTAB-Map | RGB-D / stereo / LiDAR | Dense + graph | Bayesian filter + BoW | BSD |
| Cartographer 2D | 2D LiDAR + IMU + odom | Occupancy grid | Branch-and-bound submap match | Apache 2.0 |
| Cartographer 3D | 3D LiDAR + IMU | Voxel map | Same | Apache 2.0 |
| LIO-SAM | 3D LiDAR + IMU + (GPS) | Point-cloud submap | Radius search / ScanContext | MIT |
| FAST-LIO2 | 3D LiDAR + IMU | ikd-Tree point cloud | none built-in | GPLv2 |
| KISS-ICP | 3D LiDAR | Voxel map | none | MIT |
| GLIM | 3D LiDAR + IMU | GICP point cloud | DSC, optional | MIT |
| NICE-SLAM | RGB-D | Implicit neural | none | Apache 2.0 |
| GS-SLAM | RGB-D / mono | 3D Gaussians | DBoW2 | MIT |
| slam_toolbox | 2D LiDAR + odom | Occupancy grid | Pose-graph + scan match | BSD |
KITTI Odometry Benchmark — representative top performers (May 2026)
Translation error is averaged % over sub-sequences of length 100–800 m on the held-out KITTI test sequences (11–21). Numbers are approximate snapshots; the leaderboard is updated continuously.
| Rank tier | Method | Sensors | Trans. err. | Rot. err. (°/m) |
|---|---|---|---|---|
| Top | CT-ICP / KISS-ICP++ derivatives | 3D LiDAR | 0.45–0.55 % | 0.0014 |
| Top | SuMa++ (Chen 2019) | 3D LiDAR + semantic | 0.65 % | 0.0019 |
| Top | LIO-SAM tuned | 3D LiDAR + IMU | 0.60 % | 0.0024 |
| Mid | LOAM | 3D LiDAR | 0.85 % | 0.0030 |
| Mid | F-LOAM / LeGO-LOAM | 3D LiDAR | 0.90–1.0 % | 0.0030 |
| Mid | ORB-SLAM3 stereo-inertial | Stereo + IMU | 1.0–1.5 % | 0.0040 |
| Mid | VINS-Fusion stereo | Stereo + IMU | 1.5 % | 0.0050 |
| Mid | DSO mono | Mono | 1.7 % | 0.0080 |
| Tail | Pure mono VIO | Mono + IMU | 2.5–4 % | 0.010 |
LiDAR + IMU systems dominate at sub-1 %; pure visual systems sit around 1–2 %; pure monocular systems sit at 2–4 %.
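The translational-error column follows the KITTI odometry metric: relative-motion error over path segments of fixed length, divided by that length. The sketch below is a simplified re-implementation, assuming 4×4 homogeneous poses for ground truth and estimate; the 100–800 m segment lengths match the description above, the 10-frame start step is an assumption modelled on the KITTI devkit, and `relative_translation_error` is a hypothetical name. Use the official devkit for reportable numbers (it also reports rotational error).

```python
import numpy as np

def path_distances(poses):
    """Cumulative distance travelled along an (N, 4, 4) pose sequence."""
    steps = np.linalg.norm(np.diff(poses[:, :3, 3], axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])

def relative_translation_error(gt, est, lengths=(100, 200, 300, 400, 500, 600, 700, 800), step=10):
    """Mean translational error [%] over ground-truth segments of the given lengths [m]."""
    dist = path_distances(gt)
    errors = []
    for first in range(0, len(gt), step):
        for length in lengths:
            # First frame at least `length` metres further along the ground-truth path
            last = int(np.searchsorted(dist, dist[first] + length))
            if last >= len(gt):
                continue
            rel_gt = np.linalg.inv(gt[first]) @ gt[last]
            rel_est = np.linalg.inv(est[first]) @ est[last]
            err = np.linalg.inv(rel_est) @ rel_gt        # residual relative motion
            errors.append(np.linalg.norm(err[:3, 3]) / length)
    return 100.0 * float(np.mean(errors)) if errors else float("nan")
```

A pure 1 % scale error on a straight trajectory comes out as exactly 1 %, which is why monocular systems with scale drift land in the 2–4 % tail of the table.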
Optimisation library function map
| Function | GTSAM | g2o | Ceres |
|---|---|---|---|
| Non-linear LS solver | LM, GN, Dogleg, iSAM2 | LM, GN | LM, GN, Dogleg, trust-region |
| Incremental | iSAM2 (Bayes tree) | — | — |
| Robust kernels | Huber, Cauchy, Tukey, Geman-McClure | Huber, Cauchy, etc. | Huber, Cauchy, Tukey, arctan, soft-L1 |
| SE(3) on-manifold | Pose3 + tangent space | SE3Quat | manual Manifold (formerly LocalParameterization) |
| Auto-differentiation | numerical / templated | numerical / hand | automatic via templated dual numbers (Jets); numeric also available |
| Marginalisation | iSAM2 keys + marginalize | Schur complement | manual |
| Python bindings | yes (gtsam) | yes (g2o-python) | yes (pyceres) |
| Built for SLAM | yes (originated there) | yes | general; SLAM via factor templates |
Common indoor + outdoor dataset facts
| Dataset | Length | Sensors | Ground truth |
|---|---|---|---|
| EuRoC MH (machine hall) | 5 seqs, ~0.5 km total | Stereo + IMU | Leica MS50 laser tracker |
| EuRoC V (Vicon room) | 6 seqs, 30–130 s | Stereo + IMU | Vicon mocap |
| TUM VI | 28 seqs, 142 min total | Stereo + IMU | partial mocap |
| KITTI Odometry | 22 sequences, 39 km total | Stereo + Velodyne HDL-64E + GPS/INS | dGPS RTK |
| Newer College | 2.2 km + 8 km extensions | Ouster OS1-64 + RealSense + IMU | survey-grade laser scanner |
| Hilti 2022 | 9 sequences | Hesai PandarXT-32 + Alphasense + IMU | total station |
7. Failure modes and debugging
- Drift accumulates in featureless areas — long blank corridor, foggy outdoor area, smoke. Visual front-end loses feature tracks; LiDAR ICP slides ambiguously along parallel walls. Fix: add fiducials (AprilTag, ArUco) every 5–10 m; sprinkle in retroreflectors for LiDAR; in outdoor with sky view, weight in GPS factors.
- Wrong loop closure (“kidnapping” the map) — visual detector matches two similar-looking but distinct places (warehouse aisles, parking-lot rows). Symptom: trajectory abruptly folds onto itself. Fix: tighten detector threshold; add geometric verification (ICP / 5-point); enable switchable constraints (Sünderhauf 2012) so the back-end can disable suspect edges during optimisation.
- IMU bias drift — accelerometer or gyro bias wanders with temperature or supply voltage; integrated nav-frame velocity drifts. Fix: model bias as a slowly-evolving state inside the optimisation; reinitialise on detected stillness (zero-velocity update / ZUPT).
- Vision loss in dark or motion blur — feature detector returns nothing. Fix: fall back to IMU dead-reckoning + LiDAR; raise gain temporarily and accept noise; for predictable lighting, switch to a global-shutter sensor with shorter exposure plus higher gain ([[Robotics/sensors-perception]]).
- LiDAR motion distortion — at vehicle speeds, a 10 Hz scan covers 100 ms of motion; a single “frame” is from many ego-poses. Fix: de-skew every scan using IMU-integrated trajectory at IMU rate.
- Map memory explosion on long trajectory — naive point-cloud accumulation grows without bound. Fix: keyframe pruning, voxel downsampling at insert time, ikd-Tree (FAST-LIO2), or marginalisation in sliding-window VIO.
- Loop-closure latency — global optimisation after a closure can take 100 ms to seconds; the front-end cannot stall. Fix: run back-end async in a dedicated thread; the front-end uses the latest available estimate; merge the corrected map atomically when ready.
- Multi-session merging — robot is shut down, restarted in same building. Need to relocalise into the prior map. Fix: place-recognition (DBoW, NetVLAD, ScanContext) at startup; ORB-SLAM3 ATLAS does this for visual; Maplab and RTAB-Map for multi-modal.
- Timestamp drift across sensors — software-timestamped USB cameras and Ethernet LiDAR may disagree by tens of ms. Symptom: VIO residuals grow systematically; stereo disparity slope across the image. Fix: PTP everywhere; hardware trigger; per-point LiDAR timestamps; estimate the offset online (td in VINS-Fusion).
- Monocular VIO refusing to initialise — IMU bias is unobservable without motion. Fix: enforce ≥ 0.1 m of translation over ≥ 0.5 s during init; reject init attempts that don’t have enough excitation.
- NaN in the factor graph — bad initial guess, sensor outlier, or singular Jacobian. Fix: validate sensor messages (range > 0, no NaN, timestamps monotonic) at the front-end before insertion; use M-estimator-style robust kernels (Huber, Cauchy) on all edges.
- Pose-graph divergence — back-end optimisation step blows up. Fix: bound the trust-region size (LM lambda) more aggressively; add robust kernels; check that the prior factor on pose 0 is in place to fix the gauge.
- Real-time miss — VIO can’t keep up with sensor rate; queues back up; latency grows unboundedly. Fix: drop frames cleverly (skip every other when behind); downsample LiDAR; switch to lighter front-end (KISS-ICP for LiDAR-only fast mode); accept frame-skip in the back-end while front-end keeps running.
- Indoor / outdoor transition — GPS confidence collapses as you enter a building. Switch fusion weights smoothly with a GPS-quality factor (HDOP, fix type) rather than a hard switch that introduces a step in the trajectory.
- Dynamic objects polluting features — cars, pedestrians, crowds attach features that move with them. Fix: semantic segmentation mask (DynaSLAM, Bescos 2018, masks COCO classes “person”, “car”, etc.); RANSAC with geometric motion model rejects movers as outliers; in LiDAR, motion-cluster detection.
- Calibration drift over time — camera intrinsics, camera-IMU extrinsics, and LiDAR-IMU extrinsics shift with temperature and mechanical shock. Fix: online self-calibration (VINS-Fusion can estimate camera-IMU extrinsics and the time offset online); periodic re-calibration with a known target.
- Photometric drift in direct-VO — DSO needs photometric calibration (vignette, response function) to be stable; auto-exposure changes corrupt the photometric model. Fix: capture in fixed-exposure mode if possible; pre-compute photometric response.
- Map orientation gravity-aligned vs north-aligned — VIO observes gravity (roll and pitch) but not north; the map’s yaw is unobservable unless a magnetometer or GPS provides a heading prior. Symptom: a rebuilt map is rotated about the vertical axis relative to the prior map. Fix: a magnetometer factor on the first keyframe, or a GPS heading taken from the bearing between two well-separated position fixes.
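For the timestamp item: VINS-Fusion estimates td inside its sliding-window optimisation, which the sketch below does not reproduce. A simpler technique, shown here as an assumption rather than any library's method, is a coarse offline check: cross-correlate the gyroscope's angular-rate magnitude with the angular-rate magnitude recovered from the camera (for instance from frame-to-frame relative rotations) over a grid of candidate offsets. The function name, the sign convention, and the search parameters are illustrative.

```python
import numpy as np


def estimate_time_offset(t_imu, w_imu, t_cam, w_cam, max_offset=0.1, step=0.001):
    """Grid-search the offset d (s) maximising the normalised cross-correlation between
    camera-derived angular-rate magnitude w_cam(t_cam) and gyro magnitude w_imu(t_cam + d).

    Convention (an assumption of this sketch): a camera frame stamped t_cam was actually
    captured at IMU time t_cam + d.
    """
    b = w_cam - w_cam.mean()
    best_d, best_corr = 0.0, -np.inf
    for d in np.arange(-max_offset, max_offset + step / 2, step):
        a = np.interp(t_cam + d, t_imu, w_imu)  # gyro magnitude resampled at shifted stamps
        a = a - a.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0.0:
            continue  # flat signal: no timing information at this offset
        corr = float(a @ b) / denom
        if corr > best_corr:
            best_d, best_corr = float(d), corr
    return best_d, best_corr


def rate(t):
    """Synthetic angular-rate magnitude (rad/s) used only for this example."""
    return 1.0 + np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 1.7 * t)


t_imu = np.arange(0.0, 20.0, 0.005)  # 200 Hz gyro
t_cam = np.arange(2.0, 18.0, 0.05)   # 20 Hz camera, stamps 30 ms early relative to the IMU clock
d, corr = estimate_time_offset(t_imu, rate(t_imu), t_cam, rate(t_cam + 0.03))
```

The search needs rotational excitation: with the sensor held still, both signals are flat and the correlation carries no timing information.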
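The NaN and divergence items both come down to keeping bad data and gross outliers from dominating the least-squares cost. A minimal sketch, assuming whitened scalar residuals (already divided by their standard deviation): the iteratively-reweighted-least-squares (IRLS) weights for the Huber and Cauchy kernels with their common 95%-efficiency tuning constants, a toy IRLS loop, and a front-end gate for range messages. All function names are hypothetical; GTSAM, g2o, and Ceres ship their own robust-kernel implementations, which is what a real back-end would use. The gate rejects the whole message, as the bullet describes; since many drivers report 0 or inf for no-return beams, a pipeline might instead drop individual invalid returns.

```python
import numpy as np


def huber_weight(r, k=1.345):
    """IRLS weight psi(r)/r for the Huber loss: quadratic inside |r| <= k, linear outside."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def cauchy_weight(r, c=2.3849):
    """IRLS weight for the Cauchy loss rho(r) = (c^2 / 2) * log(1 + (r / c)^2)."""
    return 1.0 / (1.0 + (r / c) ** 2)


def irls_location(z, weight_fn, iters=20):
    """Robust 1-D estimate minimising sum_i rho(x - z_i) by iteratively reweighted least squares."""
    z = np.asarray(z, dtype=float)
    x = float(np.median(z))  # robust initial guess
    for _ in range(iters):
        w = np.array([weight_fn(x - zi) for zi in z])
        x = float(np.sum(w * z) / np.sum(w))  # weighted least-squares update
    return x


def accept_range_scan(stamp, last_stamp, ranges):
    """Front-end gate: timestamp strictly increasing, every range finite and > 0."""
    if last_stamp is not None and stamp <= last_stamp:
        return False
    r = np.asarray(ranges, dtype=float)
    return bool(np.all(np.isfinite(r)) and np.all(r > 0.0))


z = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]   # five consistent measurements, one gross outlier
print(np.mean(z))                         # the outlier drags a plain mean far off
print(irls_location(z, huber_weight))     # Huber bounds the outlier's pull
print(irls_location(z, cauchy_weight))    # Cauchy almost ignores it
```

The Huber kernel keeps convexity but still lets each outlier pull with a bounded force k; the Cauchy kernel redescends, so it suppresses outliers more strongly at the price of a non-convex cost that depends more on a good initial guess.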
8. Case studies
Cartographer (Google, 2016)
Hess, Kohler, Rapp & Andor’s “Real-Time Loop Closure in 2D LIDAR SLAM” (ICRA 2016) introduced what is now Google’s open-source Cartographer system. The key contribution is a branch-and-bound scan-to-submap matcher for loop-closure detection: precomputed multi-resolution grids give an upper bound on the match score for whole blocks of candidate poses, so most of the search window is pruned without exhaustive scoring, yet the result is guaranteed to be the best-scoring match within a configurable window. The architecture has two levels: a local SLAM (per-trajectory) that uses scan-to-submap matching + IMU integration for smooth local estimates, and a global SLAM that runs sparse pose-graph optimisation across all submaps using detected loop closures.
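A minimal sketch of the branch-and-bound idea, assuming translation only (the real matcher also searches over rotation), an occupancy-probability grid with values in [0, 1] indexed in cells, scan points already expressed in nonnegative cell coordinates, and a search window of [0, 2**H)² cell offsets. The score of a candidate is the sum of grid probabilities under the shifted scan points. The level-h grid stores the maximum over each 2^h × 2^h window, so scoring against it upper-bounds every translation in a block. Function names are hypothetical, not Cartographer's API.

```python
import numpy as np


def _shift(a, di, dj):
    """b[i, j] = a[i + di, j + dj], with zeros where the source index is out of bounds."""
    b = np.zeros_like(a)
    n, m = a.shape
    b[:max(n - di, 0), :max(m - dj, 0)] = a[di:, dj:]
    return b


def precompute_max_grids(grid, max_height):
    """grids[h][i, j] = max of grid over the 2**h x 2**h window starting at cell (i, j)."""
    grids = [np.asarray(grid, dtype=float)]
    for h in range(1, max_height + 1):
        prev, s = grids[-1], 2 ** (h - 1)
        # Four level-(h-1) windows of side s tile one level-h window of side 2s.
        grids.append(np.maximum.reduce(
            [prev, _shift(prev, s, 0), _shift(prev, 0, s), _shift(prev, s, s)]))
    return grids


def score(level_grid, pts, tx, ty):
    """Sum of level_grid values under the scan points shifted by (tx, ty); off-grid points score 0."""
    r, c = pts[:, 0] + tx, pts[:, 1] + ty
    ok = (r >= 0) & (r < level_grid.shape[0]) & (c >= 0) & (c < level_grid.shape[1])
    return float(level_grid[r[ok], c[ok]].sum())


def branch_and_bound_match(grid, pts, max_height):
    """Best translation in [0, 2**max_height)^2 by depth-first branch and bound."""
    grids = precompute_max_grids(grid, max_height)
    best_score, best_t = -np.inf, None
    # Each node (bound, h, tx, ty) covers translations tx..tx+2**h-1 by ty..ty+2**h-1.
    stack = [(score(grids[max_height], pts, 0, 0), max_height, 0, 0)]
    while stack:
        bound, h, tx, ty = stack.pop()
        if bound <= best_score:
            continue  # no translation in this block can beat the incumbent
        if h == 0:
            best_score, best_t = bound, (tx, ty)  # at h = 0 the bound is the exact score
            continue
        s = 2 ** (h - 1)
        children = [(score(grids[h - 1], pts, tx + a, ty + b), h - 1, tx + a, ty + b)
                    for a in (0, s) for b in (0, s)]
        children.sort(key=lambda node: node[0])  # highest bound pushed last, popped first
        stack.extend(children)
    return best_t, best_score


rng = np.random.default_rng(0)
grid = np.full((64, 64), 0.1)               # low probability everywhere...
cells = rng.integers(0, 48, size=(40, 2))   # ...except at 40 random "occupied" cells
grid[cells[:, 0], cells[:, 1]] = 0.9
scan = cells[(cells >= [5, 9]).all(axis=1)] - [5, 9]  # same cells seen from a pose offset by (5, 9)
t, best = branch_and_bound_match(grid, scan, max_height=4)
# best equals 0.9 per scan point, the maximum possible, which the true offset (5, 9) attains
```

Visiting the highest-bound child first tends to find a strong incumbent early, so later blocks fail the bound test and are never expanded.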
Cartographer grew out of Google’s backpack-mounted indoor floor-scanning rig (shown in 2014 and used internally for indoor mapping) and is the canonical 2D LiDAR SLAM baseline in ROS 2 alongside slam_toolbox. Its 3D variant works but is less widely adopted than the LiDAR-inertial systems that arrived later with tighter IMU coupling (LIO-SAM, FAST-LIO2).
ORB-SLAM3 (Campos et al. 2021)
Campos, Elvira, Gómez Rodríguez, Montiel & Tardós’s ORB-SLAM3 (IEEE Transactions on Robotics, 2021) is the de facto standard open-source visual / visual-inertial SLAM system. It was the first system to perform visual, visual-inertial, and multi-map SLAM with monocular, stereo, and RGB-D cameras in a single codebase, handling multiple maps through its ATLAS: when tracking is lost, ORB-SLAM3 starts a new “map” and merges it with the earlier one once place recognition detects that the robot has revisited a known area.
The system uses ORB features + DBoW2 place recognition + g2o pose-graph + custom bundle adjustment + IMU pre-integration in MAP estimation. Reported accuracy on EuRoC machine-hall MH_05 is 3.6 cm RMS absolute trajectory error (ATE) in stereo-inertial mode (best-in-class for 2021). The code (UZ-SLAMLab on GitHub) is GPLv3, which has limited its direct commercial adoption; many companies write clean-room re-implementations of the same algorithms.
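Figures like the EuRoC number above are RMS position errors computed after aligning the estimated trajectory to ground truth. A minimal sketch of that metric, assuming positions already associated by timestamp and a rigid (rotation plus translation) alignment via the Kabsch/Umeyama SVD solution without scale; monocular-only results are normally aligned with a similarity transform instead, which this sketch omits. Function names are hypothetical; tools such as evo implement the full evaluation pipeline.

```python
import numpy as np


def align_rigid(est, gt):
    """R, t minimising sum_i ||gt_i - (R est_i + t)||^2 (Kabsch/Umeyama, no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    U, _, Vt = np.linalg.svd((gt - mu_g).T @ (est - mu_e))  # 3x3 cross-covariance
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # force a proper rotation (det = +1), not a reflection
    R = U @ S @ Vt
    return R, mu_g - R @ mu_e


def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of position residuals after rigid alignment.

    est, gt: (N, 3) positions already associated by timestamp.
    """
    est, gt = np.asarray(est, dtype=float), np.asarray(gt, dtype=float)
    R, t = align_rigid(est, gt)
    residuals = gt - (est @ R.T + t)
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))


rng = np.random.default_rng(1)
gt = np.cumsum(rng.normal(scale=0.1, size=(200, 3)), axis=0)  # random-walk ground truth
th = 0.7
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th), np.cos(th), 0.0],
               [0.0, 0.0, 1.0]])
est = gt @ Rz.T + np.array([1.0, -2.0, 0.5])  # same trajectory expressed in another world frame
print(ate_rmse(est, gt))  # near zero: a pure change of world frame is not trajectory error
```

Aligning before measuring matters because SLAM fixes its own gauge (the prior on pose 0), so the estimate and the ground truth generally live in different world frames.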
Tesla FSD — vision-only SLAM at scale
Tesla’s Full-Self-Driving stack (HW3 from 2019, HW4 from 2023) does no LiDAR SLAM. The 8-camera perception stack runs a HydraNet multi-head CNN that emits a bird’s-eye-view (BEV) occupancy grid, lane-line graph, and object detections at 36 fps per camera. The “SLAM” function is split: short-term ego-motion comes from visual odometry + wheel-odom + IMU; long-term localisation against the world uses lane-graph matching to crowdsourced fleet-mapped HD lane data (a sparse map, not a dense LiDAR point cloud). This is a deliberate architectural choice: Tesla’s bet that cameras plus neural networks generalise where LiDAR’s geometric truth becomes a crutch. The trade-off shows in heavy rain and dense fog, where vision degrades faster than LiDAR / radar.
Apple ARKit — production VIO at iPhone scale
Apple’s ARKit (released 2017) runs a proprietary monocular-inertial VIO at 60 Hz on every iPhone since the 6s. On iPhone Pro models (12 Pro onward) and iPad Pro (2020 onward), a custom Sony SPAD-based dToF LiDAR at 12×12 / 24×24 zones fuses with the VIO to add metric depth and improve initialisation. Apple has not published the algorithm; externally observed behaviour (anchor stability, plane detection, re-localisation latency) is consistent with a sliding-window VIO with online camera-IMU calibration and a learned plane detector. Performance is good enough for industrial and retail use-cases that do not need LiDAR-grade error budgets (Lowe’s room measurement, IKEA Place, GE turbine inspection).
DJI / Skydio — drone VIO at edge SWaP
DJI Mavic 3 and Phantom 4 Pro use a proprietary 6-camera + downward-stereo + GPS stack for indoor + outdoor flight. Skydio (X10, 2023) doubles down on this with six wide-baseline cameras providing a full 360° depth bubble; obstacle avoidance runs at 10 Hz with sub-100 ms latency on a Jetson Orin NX. Both architectures use VIO for ego-motion plus a sparse-keyframe map for short-term re-localisation; long-term mapping is not the goal. Drones rely on GPS for the global frame and trade SLAM completeness for low SWaP: a Livox Mid-360 alone weighs 265 g and draws about 6 W, an unacceptable payload and power budget on a 1.4 kg drone.
9. Cross-references
- [[Robotics/sensors-perception]] — the exteroceptive sensors (RGB, stereo, RGB-D, LiDAR, radar, event cameras) feeding the SLAM front-end.
- [[Robotics/sensors-pose-motion]] — the proprioceptive sensors (encoders, IMUs) supplying odometry and ego-motion constraints, including the strapdown navigation equations integrated by VIO.
- [[Robotics/sensors-force-tactile]] — companion sensor reference for completeness; less relevant to SLAM specifically.
- [[Robotics/kinematics-dh]] — SE(3) machinery, frame conventions, and transform composition used throughout factor-graph optimisation.
- [[Robotics/dynamics-rigid-body]] — the rigid-body model VIO pre-integration depends on for predicting state evolution from IMU samples.
- [[Robotics/bayesian-estimation]] (planned) — Kalman / EKF / particle filter foundations underlying both filter-based SLAM and the smoothing back-ends.
- [[Robotics/computer-vision-robotics]] (planned) — feature extraction (ORB, SuperPoint), data association, semantic segmentation, depth networks that the SLAM front-end consumes.
- [[Robotics/path-planning]] (planned) — the consumer of the SLAM map output; nav2 costmap and OctoMap interfaces.
- [[Robotics/mobile-base-wheeled]] (planned) — wheel-odometry models and platform-level integration with SLAM.
- [[Robotics/multirotor-design]] (planned) — drone-specific VIO + GPS fusion, low-SWaP compute integration.
- [[Engineering/electromagnetics-engineering]] — LiDAR optical physics, wavelength choice (905 vs 1550 nm), FMCW chirp design.
- [[Engineering/semiconductor-devices]] — SPAD physics for dToF LiDAR; CMOS image sensors.
- [[Languages/Tier3/ros2-robotics-config]] — ROS 2 / DDS message types (sensor_msgs, nav_msgs, geometry_msgs/TransformStamped) used to plumb SLAM nodes.
- [[Languages/Tier3/3d-scene]] — point-cloud and mesh formats (PLY, LAS, E57, USD, glTF) used to export and exchange SLAM maps.
- [[Languages/Tier3/robotics-control]] (planned) — DSL coverage for navigation behaviour-trees, costmap configuration, lifecycle nodes.
10. Citations
- Smith, R., Self, M. & Cheeseman, P. (1986). “Estimating Uncertain Spatial Relationships in Robotics.” Presented at the Conference on Uncertainty in Artificial Intelligence (UAI) 1986; reprinted in Autonomous Robot Vehicles, Cox & Wilfong eds., Springer, 1990. The canonical origin of EKF-SLAM.
- Lu, F. & Milios, E. (1997). “Globally Consistent Range Scan Alignment for Environment Mapping.” Autonomous Robots 4, 333–349. The origin of graph-based / pose-graph SLAM.
- Thrun, S., Burgard, W. & Fox, D. (2005). Probabilistic Robotics. MIT Press. The canonical textbook for SLAM, Kalman/particle filters, occupancy grids.
- Dellaert, F. & Kaess, M. (2017). “Factor Graphs for Robot Perception.” Foundations and Trends in Robotics 6(1–2), 1–139. The reference for factor-graph SLAM.
- Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J. & Dellaert, F. (2012). “iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree.” International Journal of Robotics Research 31(2), 216–235. The incremental solver behind GTSAM.
- Cadena, C. et al. (2016). “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age.” IEEE Transactions on Robotics 32(6), 1309–1332. arXiv:1606.05830. The canonical decade-survey.
- Campos, C., Elvira, R., Gómez Rodríguez, J. J., Montiel, J. M. M. & Tardós, J. D. (2021). “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM.” IEEE Transactions on Robotics 37(6), 1874–1890. DOI 10.1109/TRO.2021.3075644.
- Mur-Artal, R. & Tardós, J. D. (2017). “ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras.” IEEE Transactions on Robotics 33(5), 1255–1262.
- Qin, T., Li, P. & Shen, S. (2018). “VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator.” IEEE Transactions on Robotics 34(4), 1004–1020. arXiv:1708.03852.
- Qin, T., Pan, J., Cao, S. & Shen, S. (2019). “A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors.” arXiv:1901.03638. (VINS-Fusion.)
- Forster, C., Carlone, L., Dellaert, F. & Scaramuzza, D. (2017). “On-Manifold Preintegration for Real-Time Visual-Inertial Odometry.” IEEE Transactions on Robotics 33(1), 1–21. The pre-integration formulation used by every modern VIO.
- Hess, W., Kohler, D., Rapp, H. & Andor, D. (2016). “Real-Time Loop Closure in 2D LIDAR SLAM.” ICRA 2016, 1271–1278. (Cartographer.)
- Zhang, J. & Singh, S. (2014). “LOAM: Lidar Odometry and Mapping in Real-time.” Robotics: Science and Systems 2014.
- Shan, T. & Englot, B. (2018). “LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain.” IROS 2018, 4758–4765.
- Shan, T., Englot, B., Meyers, D., Wang, W., Ratti, C. & Rus, D. (2020). “LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping.” IROS 2020. arXiv:2007.00258.
- Xu, W., Cai, Y., He, D., Lin, J. & Zhang, F. (2022). “FAST-LIO2: Fast Direct LiDAR-Inertial Odometry.” IEEE Transactions on Robotics 38(4), 2053–2073.
- He, D., Xu, W., Chen, N., Kong, F., Yuan, C. & Zhang, F. (2023). “Point-LIO: Robust High-Bandwidth LiDAR-Inertial Odometry.” Advanced Intelligent Systems 5(7), 2200459.
- Vizzo, I., Guadagnino, T., Mersch, B., Wiesmann, L., Behley, J. & Stachniss, C. (2023). “KISS-ICP: In Defense of Point-to-Point ICP — Simple, Accurate, and Robust Registration If Done the Right Way.” IEEE Robotics and Automation Letters 8(2), 1029–1036.
- Kim, G. & Kim, A. (2018). “Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map.” IROS 2018, 4802–4809.
- Mourikis, A. & Roumeliotis, S. (2007). “A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation.” ICRA 2007, 3565–3572. The original MSCKF.
- Bloesch, M., Burri, M., Omari, S., Hutter, M. & Siegwart, R. (2017). “Iterated Extended Kalman Filter Based Visual-Inertial Odometry Using Direct Photometric Feedback.” International Journal of Robotics Research 36(10), 1053–1072. (ROVIO.)
- Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R. & Furgale, P. (2015). “Keyframe-Based Visual-Inertial Odometry Using Nonlinear Optimization.” International Journal of Robotics Research 34(3), 314–334. (OKVIS.)
- Rosinol, A., Abate, M., Chang, Y. & Carlone, L. (2020). “Kimera: an Open-Source Library for Real-Time Metric-Semantic Localization and Mapping.” ICRA 2020. arXiv:1910.02490.
- Newcombe, R. et al. (2011). “KinectFusion: Real-Time Dense Surface Mapping and Tracking.” ISMAR 2011, 127–136.
- Whelan, T., Salas-Moreno, R., Glocker, B., Davison, A. & Leutenegger, S. (2016). “ElasticFusion: Real-Time Dense SLAM and Light Source Estimation.” International Journal of Robotics Research 35(14), 1697–1716.
- Labbé, M. & Michaud, F. (2019). “RTAB-Map as an Open-Source Lidar and Visual Simultaneous Localization and Mapping Library for Large-Scale and Long-Term Online Operation.” Journal of Field Robotics 36(2), 416–446.
- Zhu, Z., Peng, S., Larsson, V., Xu, W., Bao, H., Cui, Z., Oswald, M. R. & Pollefeys, M. (2022). “NICE-SLAM: Neural Implicit Scalable Encoding for SLAM.” CVPR 2022. arXiv:2112.12130.
- Matsuki, H., Murai, R., Kelly, P. H. J. & Davison, A. J. (2024). “Gaussian Splatting SLAM.” CVPR 2024. arXiv:2312.06741.
- Sünderhauf, N. & Protzel, P. (2012). “Switchable Constraints for Robust Pose Graph SLAM.” IROS 2012, 1879–1884.
- Olson, E. & Agarwal, P. (2013). “Inference on Networks of Mixtures for Robust Robot Mapping.” International Journal of Robotics Research 32(7), 826–840. (Max-mixture robust optimisation.)
- Yang, H., Antonante, P., Tzoumas, V. & Carlone, L. (2020). “Graduated Non-Convexity for Robust Spatial Perception.” IEEE Robotics and Automation Letters 5(2), 1127–1134.
- Huang, G., Mourikis, A. & Roumeliotis, S. (2010). “Observability-based Rules for Designing Consistent EKF SLAM Estimators.” International Journal of Robotics Research 29(5), 502–528. (First-Estimate Jacobians.)
- Bescos, B., Fácil, J. M., Civera, J. & Neira, J. (2018). “DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes.” IEEE Robotics and Automation Letters 3(4), 4076–4083.
- Geiger, A., Lenz, P., Stiller, C. & Urtasun, R. (2013). “Vision meets Robotics: The KITTI Dataset.” International Journal of Robotics Research 32(11), 1231–1237.
- Burri, M. et al. (2016). “The EuRoC Micro Aerial Vehicle Datasets.” International Journal of Robotics Research 35(10), 1157–1163.
- Sturm, J. et al. (2012). “A Benchmark for the Evaluation of RGB-D SLAM Systems.” IROS 2012, 573–580.
- Wisth, D., Camurri, M. & Fallon, M. (2023). “VILENS: Visual, Inertial, Lidar, and Leg Odometry for All-Terrain Legged Robots.” IEEE Transactions on Robotics 39(1), 309–326.
- Titterton, D. H. & Weston, J. L. (2004). Strapdown Inertial Navigation Technology (2nd ed.). IET. The reference for IMU integration mechanisation underlying VIO pre-integration.
- GTSAM documentation (Borglab, Georgia Tech). https://gtsam.org
- g2o documentation (Kümmerle et al. 2011). https://github.com/RainerKuemmerle/g2o
- Ceres Solver documentation (Google). https://ceres-solver.org
- UZ-SLAMLab. ORB-SLAM3 repository. https://github.com/UZ-SLAMLab/ORB_SLAM3
- cartographer-project. Cartographer repository. https://github.com/cartographer-project/cartographer