SLAM Algorithms — Family Index
Simultaneous Localization And Mapping is the joint inference problem of estimating both robot trajectory and a map of the environment from sensor data, when neither is known a priori. After three decades the field has fragmented into a zoo of algorithms differentiated along three axes — sensor modality, map representation, and back-end estimator. This index catalogs the canonical members and notes which combinations are deployed in practice.
1. At a glance — taxonomy axes
SLAM systems decompose along three largely orthogonal axes:
- Sensor modality: monocular camera, stereo, RGB-D (Kinect, RealSense, Tango), 2D LiDAR, 3D LiDAR (Velodyne, Ouster, Livox), event camera (DVS / DAVIS), radar (FMCW), sonar (underwater), IMU (always auxiliary, never alone). Visual-inertial (VIO) and LiDAR-inertial (LIO) are the dominant tightly-coupled multi-sensor combinations.
- Map representation: sparse landmark (3D points + descriptors), semi-dense (high-gradient pixels), dense (every pixel / every voxel), volumetric (TSDF grid, occupancy grid, ESDF), surfel (oriented disks), mesh, neural-implicit (MLP weights), 3D Gaussian (Gaussian Splatting).
- Estimator back-end: Extended Kalman Filter (EKF), Unscented KF (UKF), particle filter (Rao-Blackwellized), pose-graph (poses as nodes, relative-motion measurements as edges), factor-graph (poses + landmarks + IMU pre-integration as factors), batch bundle adjustment, incremental smoothing (iSAM2 Bayes tree), learned end-to-end.
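The filtering era (roughly 1990 to 2007) relied on EKF, UKF, and particle filters. EKF-SLAM costs O(n²) per update in the number of landmarks n because it maintains a dense joint covariance, which caps practical map size. The “graph SLAM” insight (Lu + Milios 1997; Dellaert 2005, square-root SAM) recast the problem as sparse nonlinear least-squares on a graph. Exploiting that sparsity through Cholesky / Bayes-tree factorization keeps incremental updates cheap, since most updates touch only a small part of the graph, and this unlocked modern scalable SLAM.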
A practical SLAM system also splits responsibility along a front-end / back-end axis: the front-end is the per-frame sensor-driven step (feature extraction, data association, scan matching, photometric tracking, IMU pre-integration) that emits constraints, and the back-end is the global optimizer (filter, smoother, graph optimizer) that fuses them. Loop closure spans both: the front-end proposes candidates (appearance-based with DBoW2/NetVLAD, or geometry-based with scan-context / branch-and-bound) and the back-end accepts, rejects, or down-weights them (RANSAC, PCM, GNC, switchable constraints). Most failure modes in real-world SLAM trace to data-association errors at the front-end propagating into the back-end as false-positive loop closures, motivating the heavy robust-back-end research of the last decade.
2. Filtering era (Bayesian recursive estimation)
Before pose-graph SLAM matured, the dominant paradigm was a single Gaussian (EKF) or particle distribution over the joint robot+map state, updated recursively per sensor frame.
- EKF-SLAM: Smith, Self, Cheeseman 1990 (“Estimating Uncertain Spatial Relationships in Robotics”, the stochastic map). The joint robot-plus-landmark covariance is updated by linearizing both the motion and observation models. Cost is O(n²) per update in the number of landmarks, which in practice limits maps to a few hundred landmarks. MonoSLAM (Davison 2007 PAMI) was the first real-time monocular EKF-SLAM, running at 30 Hz on a desktop CPU with ~100 tracked features, and was a milestone for visual SLAM. Inverse-depth parameterization (Civera, Davison, Montiel 2008) extended EKF-SLAM to low-parallax points.
- UKF-SLAM: the unscented transform avoids Jacobian computation and handles stronger nonlinearity than the EKF, but it is rarely used in practice because pose-graph methods had become dominant by the time UKF-SLAM matured.
- FastSLAM — Montemerlo, Thrun, Koller, Wegbreit 2002 (“FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem”). Rao-Blackwellized particle filter: each particle is a trajectory hypothesis, and conditional on a trajectory, landmark posteriors decouple into independent low-dimensional EKFs. Scales to thousands of landmarks. FastSLAM 2.0 (Montemerlo + Thrun 2003) added improved proposal distribution using current measurement.
- GMapping — Grisetti, Stachniss, Burgard 2007 (“Improved Techniques for Grid Mapping with Rao-Blackwellized Particle Filters”, IEEE T-RO). Rao-Blackwellized particle filter for 2D occupancy-grid LiDAR. The standard ROS-1 mobile-robot SLAM baseline 2008-2016. Limitations: particle depletion in long corridors, no native loop closure.
- Hector SLAM — Kohlbrecher et al. 2011 — scan-matching-only 2D LiDAR SLAM, no odometry required, common on quadrotors.
Filtering-era systems are still preferred where computational budget is fixed and small (microcontroller-grade) or where the state stays small (camera-mounted AR with bounded workspace).
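The joint-covariance behaviour described for EKF-SLAM above can be seen in a minimal sketch. It assumes a toy 1-D world with one robot coordinate and one landmark coordinate, and a linear relative measurement `z = landmark - robot`, so no linearization is needed. The function names and noise values are illustrative, not taken from any cited system.

```python
def predict(x, P, u, q):
    """Move the robot by u (motion-noise variance q); the landmark is static."""
    return [x[0] + u, x[1]], [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1]]]


def update(x, P, z, r_var):
    """Fuse a relative measurement z = landmark - robot + noise (variance r_var)."""
    H = [-1.0, 1.0]                                # Jacobian of z w.r.t. [robot, landmark]
    PHt = [P[i][0] * H[0] + P[i][1] * H[1] for i in range(2)]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var      # innovation variance
    K = [PHt[0] / S, PHt[1] / S]                   # Kalman gain
    innovation = z - (x[1] - x[0])
    x_new = [x[i] + K[i] * innovation for i in range(2)]
    P_new = [[P[i][j] - K[i] * S * K[j] for j in range(2)] for i in range(2)]
    return x_new, P_new


# Robot starts at a known position; the landmark is almost unknown.
x, P = [0.0, 0.0], [[0.0, 0.0], [0.0, 100.0]]
x, P = update(x, P, z=5.0, r_var=0.25)   # first sighting initializes the landmark
x, P = predict(x, P, u=1.0, q=0.1)       # driving inflates robot uncertainty
landmark_var_before = P[1][1]
x, P = update(x, P, z=4.0, r_var=0.25)   # re-observation shrinks both variances
print("state:", x)
print("covariance:", P)
```

After the second update the off-diagonal term is non-zero: robot and landmark errors are now correlated. With n landmarks every update touches the full (n+1)×(n+1) matrix, which is where the O(n²) cost comes from.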
3. Pose-graph and factor-graph back-ends (modern)
The graph-SLAM paradigm formulates the MAP estimate as a sparse nonlinear least-squares problem on a graph. Pose-graphs (Lu + Milios 1997) keep only poses, connected by relative-pose edges; factor-graphs (Dellaert + Kaess 2006) admit arbitrary factor types (IMU pre-integration, GPS, range, landmark observations). The objective is the negative log-posterior, which under Gaussian noise is a weighted nonlinear least-squares cost.
- TORO — Grisetti, Stachniss, Grzonka, Burgard 2007 — Tree-based netwORk Optimizer; stochastic gradient descent on a spanning-tree-parameterized graph.
- HOG-Man — Grisetti, Kümmerle, Stachniss, Frese, Hertzberg 2010 — Hierarchical Optimization for pose Graphs on Manifolds.
- g2o: Kümmerle, Grisetti, Strasdat, Konolige, Burgard 2011 ICRA (“g2o: A General Framework for Graph Optimization”). C++ library offering Gauss-Newton and Levenberg-Marquardt with sparse Cholesky or PCG linear solvers. Used in ORB-SLAM, RGB-D-SLAM, and LSD-SLAM. Cartographer uses Ceres instead (see §8).
- GTSAM — Dellaert (Georgia Tech, 2012-present) — factor-graph library with the iSAM2 incremental smoother backed by a Bayes tree data structure (Kaess et al. 2012 IJRR). iSAM (2008) was the original incremental smoothing-and-mapping algorithm. GTSAM is the back-end of choice for modern factor-graph SLAM: Kimera, LIO-SAM, multi-robot extensions, DOOR-SLAM.
- Ceres Solver — Google 2010+. General-purpose nonlinear least-squares with auto-differentiation. Powers VINS-Mono / VINS-Fusion, many academic VO pipelines, and Google’s own Tango.
- SE-Sync — Rosen, Carlone, Bandeira, Leonard 2017 — certifiably-correct synchronization on SE(d), addresses local-minima problem of pose-graph SLAM.
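To make the least-squares formulation above concrete, here is a minimal sketch of a pose graph in one dimension. It assumes scalar poses, relative-displacement edges `(i, j, z, w)` with information weight `w`, and pose 0 held fixed at the origin to remove the gauge freedom. Because the toy problem is linear, a single solve of the normal equations is enough; real SE(2)/SE(3) graphs are nonlinear and iterate Gauss-Newton or Levenberg-Marquardt with sparse solvers. All names are illustrative.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def optimize_pose_graph_1d(num_poses, edges):
    """Minimize sum of w * (x_j - x_i - z)^2 over all edges, with x_0 fixed at 0."""
    n = num_poses - 1                      # unknowns are x_1 .. x_{num_poses-1}
    H = [[0.0] * n for _ in range(n)]      # information matrix J^T W J
    g = [0.0] * n                          # right-hand side J^T W z
    for i, j, z, w in edges:
        for a, sign_a in ((i, -1.0), (j, 1.0)):
            if a == 0:
                continue                   # the anchored pose is not a variable
            g[a - 1] += sign_a * w * z
            for b, sign_b in ((i, -1.0), (j, 1.0)):
                if b != 0:
                    H[a - 1][b - 1] += sign_a * sign_b * w
    return [0.0] + solve_linear(H, g)


edges = [
    (0, 1, 1.0, 1.0),   # odometry
    (1, 2, 1.0, 1.0),   # odometry
    (2, 3, 1.0, 1.0),   # odometry
    (0, 3, 2.7, 1.0),   # loop closure disagreeing with summed odometry (3.0)
]
estimate = optimize_pose_graph_1d(4, edges)
print(estimate)
```

With equal weights the 0.3 m disagreement is spread evenly over the four edges, so each odometry step shrinks to 0.925 m. Raising the weight of an edge makes the solution honor it more closely.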
4. Feature-based visual SLAM
Track sparse keypoints across frames, triangulate them as 3D landmarks, optimize via bundle adjustment.
- MonoSLAM — Davison 2007 PAMI; EKF-based.
- PTAM — Klein + Murray 2007 ISMAR (“Parallel Tracking And Mapping for Small AR Workspaces”). Split tracking (per-frame) from mapping (background bundle adjustment) into separate threads — became the architectural template for nearly every modern visual SLAM system. Designed for AR.
- ORB-SLAM — Mur-Artal, Montiel, Tardós 2015 T-RO (“ORB-SLAM: A Versatile and Accurate Monocular SLAM System”). ORB feature (Rublee 2011), three threads (tracking / local mapping / loop closing), DBoW2 vocabulary tree for place recognition.
- ORB-SLAM2 — Mur-Artal + Tardós 2017 T-RO — mono / stereo / RGB-D unified.
- ORB-SLAM3 — Campos, Elvira, Gómez Rodríguez, Montiel, Tardós 2021 T-RO — adds tight visual-inertial fusion, multi-map system (“Atlas”), pinhole + fisheye support. The current reference open-source visual-(inertial-)SLAM.
- VINS-Mono — Qin, Li, Shen (HKUST 2018, IEEE T-RO) — tightly-coupled monocular VIO with sliding-window optimization in Ceres, IMU pre-integration. Robust on aerial vehicles.
- VINS-Fusion — Qin et al. 2019 — VINS-Mono extended with stereo and GPS.
- OKVIS — Leutenegger, Lynen, Bosse, Siegwart, Furgale 2015 IJRR — tightly-coupled keyframe visual-inertial, foundational; spawned OKVIS2.
- Maplab: Schneider, Dymczyk, Fehr, Egger, Lynen, Gilitschenski, Siegwart 2018 RA-L (ETH ASL). Open multi-session mapping research platform.
- RTABMap — Labbé + Michaud 2013 — appearance-based loop closure on top of RGB-D / stereo VO, the default ROS mid-sized RGB-D SLAM.
5. Direct visual SLAM (photometric error)
Instead of matching features, minimize the per-pixel photometric error directly. Avoids feature-extraction failure modes in low-texture or blur and uses more image information per frame.
- DTAM — Newcombe, Lovegrove, Davison 2011 ICCV (“Dense Tracking And Mapping in Real-Time”). First real-time dense monocular SLAM on GPU; per-pixel inverse depth optimization.
- LSD-SLAM — Engel, Schöps, Cremers 2014 ECCV (“LSD-SLAM: Large-Scale Direct Monocular SLAM”). Semi-dense — operates only on high-gradient pixels. CPU-only.
- DSO — Engel, Koltun, Cremers 2017 PAMI (“Direct Sparse Odometry”). Photometric bundle adjustment on a small set of carefully chosen sparse points with full photometric calibration (vignette, response, exposure). Strong odometry, no loop closure in canonical form.
- VI-DSO / DSO-VI — Stumberg, Usenko, Cremers 2018 — visual-inertial extension.
- REMODE — Pizzoli, Forster, Scaramuzza 2014 ICRA (“REgularized MOnocular Depth Estimation”) — per-pixel Bayesian depth fusion for dense monocular reconstruction; coupled with SVO for tracking.
- CNN-SLAM — Tateno et al. 2017 — early hybrid using CNN depth predictions inside an LSD-SLAM-style framework.
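The photometric objective that the direct methods above minimize can be illustrated with a deliberately small sketch. It assumes a 1-D intensity profile and an integer pixel shift as the only unknown, and it searches candidate shifts exhaustively. Real direct systems minimize the same kind of residual over SE(3) pose and per-pixel depth with Gauss-Newton and sub-pixel interpolation. The function names and sample intensities are hypothetical.

```python
def photometric_cost(ref, cur, shift):
    """Mean squared intensity difference between ref[i] and cur[i + shift]."""
    total, count = 0.0, 0
    for i, r in enumerate(ref):
        j = i + shift
        if 0 <= j < len(cur):              # only pixels visible in both images
            total += (r - cur[j]) ** 2
            count += 1
    return total / count if count else float("inf")


def best_shift(ref, cur, max_shift):
    """Exhaustively pick the shift with the lowest photometric cost."""
    candidates = range(-max_shift, max_shift + 1)
    return min(candidates, key=lambda s: photometric_cost(ref, cur, s))


ref = [10, 10, 12, 40, 90, 40, 12, 10, 10, 10]
cur = [10, 10, 10, 10, 12, 40, 90, 40, 12, 10]   # same profile moved right by 2 px
estimated = best_shift(ref, cur, max_shift=3)
print(estimated, photometric_cost(ref, cur, estimated))
```

No keypoints or descriptors are involved: every pixel with gradient contributes to the cost. That is why direct methods keep working in low-texture scenes, and also why they are sensitive to exposure changes unless those are calibrated, as DSO does.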
6. Dense / volumetric SLAM
Dense map representations (TSDF, surfel, mesh) — typically RGB-D-sensor-driven, GPU-bound.
- KinectFusion — Newcombe, Izadi, Hilliges, Molyneaux, Kim, Davison, Kohli, Shotton, Hodges, Fitzgibbon 2011 ISMAR. TSDF volumetric integration with frame-to-model ICP tracking. Foundational dense RGB-D SLAM. Confined to a fixed cubic volume.
- Kintinuous — Whelan, Kaess, Fallon, Johannsson, Leonard, McDonald 2012 — KinectFusion with a shifting volume (cyclic buffer) to map larger environments.
- ElasticFusion — Whelan, Salas-Moreno, Glocker, Davison, Leutenegger 2015 (“ElasticFusion: Dense SLAM Without A Pose Graph”, RSS / IJRR). Surfel-based deformable model with non-rigid deformation graph for loop closure; no explicit pose-graph.
- InfiniTAM — Kähler, Prisacariu, Ren, Sun, Torr, Murray 2015 (and Nießner et al. 2013 introduced voxel hashing) — sparse-voxel TSDF on GPU, much higher memory-efficiency than dense grid.
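- BundleFusion: Dai, Nießner, Zollhöfer, Izadi, Theobalt 2017 ACM TOG (presented at SIGGRAPH). Global pose optimization over all frames with on-the-fly surface re-integration; largely removes the drift that frame-to-model trackers such as KinectFusion accumulate on indoor RGB-D sequences.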
- Voxblox — Oleynikova, Taylor, Fehr, Siegwart, Nieto (ETH ASL) 2017 IROS — TSDF + Euclidean Signed Distance Field (ESDF) on CPU; primary mapping back-end for planning on UAVs.
- Voxgraph — Reijgwart, Millane, Oleynikova, Siegwart, Cadena, Nieto 2020 (ETH ASL) — submap-based TSDF SLAM with pose-graph back-end.
- SLAM++ (dense) / Stereo Bundle Mapping pipelines feed dense surfel or volumetric reconstructions off-line.
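The TSDF integration used by the KinectFusion family above can be sketched along a single camera ray. This is a simplified illustration that assumes voxel centers measured as distance along the ray, a unit weight per observation, and a capped running average. The truncation distance, weight cap, and function names are assumptions made for the example.

```python
def integrate(tsdf, weights, centers, depth, trunc, max_weight=64.0):
    """Fuse one depth reading along a ray into per-voxel truncated signed distances."""
    for k, c in enumerate(centers):
        sdf = depth - c                 # > 0 in front of the surface, < 0 behind it
        if sdf < -trunc:
            continue                    # far behind the surface: occluded, leave untouched
        d = min(1.0, sdf / trunc)       # truncate and normalize to [-1, 1]
        w = weights[k]
        tsdf[k] = (w * tsdf[k] + d) / (w + 1.0)   # running weighted average
        weights[k] = min(w + 1.0, max_weight)


def surface_position(tsdf, weights, centers):
    """Locate the zero crossing by linear interpolation between observed voxels."""
    for k in range(len(centers) - 1):
        a, b = tsdf[k], tsdf[k + 1]
        if weights[k] > 0 and weights[k + 1] > 0 and a > 0.0 >= b:
            t = a / (a - b)
            return centers[k] + t * (centers[k + 1] - centers[k])
    return None


centers = [0.1 * k for k in range(11)]       # voxel centers from 0.0 m to 1.0 m
tsdf = [1.0] * len(centers)                  # initialized as free space
weights = [0.0] * len(centers)               # weight 0 means "never observed"
for depth in (0.52, 0.48, 0.50):             # three noisy readings of a wall near 0.5 m
    integrate(tsdf, weights, centers, depth, trunc=0.3)
surface = surface_position(tsdf, weights, centers)
print(surface)
```

Averaging signed distances across frames is what suppresses depth noise: the zero crossing lands at the mean of the readings even though no single reading equals it. Voxel hashing (InfiniTAM) and submaps (Voxgraph) change how the voxels are stored, not this update rule.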
7. LiDAR SLAM
3D LiDAR (Velodyne HDL/VLP, Ouster OS, Livox MID/Avia) drove a separate algorithm lineage rooted in LOAM.
- LOAM — Zhang + Singh 2014 RSS (“LOAM: Lidar Odometry and Mapping in Real-time”). Decompose into high-rate odometry (edge + planar features) and low-rate mapping. Long the #1 on KITTI odometry. Closed-source canonical implementation; many forks.
- A-LOAM — Tong Qin, simplified Ceres-based reimplementation of LOAM, widely used as a starting point.
- LeGO-LOAM — Shan + Englot 2018 IROS (“Lightweight and Ground-Optimized LiDAR Odometry and Mapping on Variable Terrain”). Ground-plane segmentation, two-step LM optimization, loop closure via ICP — for ground vehicles.
- LIO-SAM — Shan, Englot, Meyers, Wang, Ratti, Rus 2020 IROS (“LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping”). LOAM features + IMU pre-integration in GTSAM factor-graph + GPS factor + loop closure. One of the most widely-deployed open-source LIO systems.
- LIO-Mapping: Ye, Chen, Liu 2019; LINS: Qin, Ye, Pranata, Han, Zhang, Liu 2020. Both are earlier tightly-coupled LIO systems.
- FAST-LIO — Xu + Zhang (HKU MARS Lab) 2021 — iterated error-state Kalman filter LIO; very low latency.
- FAST-LIO2 — Xu, Cai, He, Lin, Zhang 2022 T-RO — ikd-Tree incremental k-d tree for fast nearest-neighbor; significant speedup. Open-source de facto baseline for solid-state LiDAR (Livox).
- Point-LIO: He, Xu, Chen, Kong, Yuan, Zhang 2023 (Advanced Intelligent Systems). Point-by-point LIO without frame batching; handles very aggressive motion.
- FAST-LIO-MULTI — multi-LiDAR extension.
- NDT-Mapping — Magnusson 2009 — Normal Distributions Transform scan matching; basis of Autoware’s mapping/localization stack.
- KISS-ICP — Vizzo, Guadagnino, Mersch, Wiesmann, Behley, Stachniss 2023 RA-L (“KISS-ICP: In Defense of Point-to-Point ICP — Simple, Accurate, and Robust Registration If Done the Right Way”). Pure point-to-point ICP LiDAR odometry, no IMU, no features — surprisingly competitive.
- SuMa / SuMa++ — Behley + Stachniss 2018-2019 — surfel-based LiDAR SLAM with semantics.
- MULLS, HDL-SLAM, BLAM — other LiDAR pipelines.
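The edge / planar feature split that LOAM and its descendants rely on comes from a per-point smoothness score computed along each scan line. The sketch below follows the form given in the LOAM paper, c_i = ‖Σ_j (X_i − X_j)‖ / (|S| · ‖X_i‖), but it assumes 2-D points and a fixed neighbor window for brevity. The sample scan and the window size are made up for illustration.

```python
import math


def loam_smoothness(points, half_window=2):
    """Return c_i for each interior point of one scan line (None near the ends)."""
    scores = [None] * len(points)
    for i in range(half_window, len(points) - half_window):
        xi, yi = points[i]
        sx = sy = 0.0
        for j in range(i - half_window, i + half_window + 1):
            if j != i:
                sx += xi - points[j][0]
                sy += yi - points[j][1]
        n_neighbors = 2 * half_window
        scores[i] = math.hypot(sx, sy) / (n_neighbors * math.hypot(xi, yi))
    return scores


# One scan line sweeping along a wall (y = 2) that turns a corner at (0, 2).
scan = [(-2.0, 2.0), (-1.5, 2.0), (-1.0, 2.0), (-0.5, 2.0), (0.0, 2.0),
        (0.0, 2.5), (0.0, 3.0), (0.0, 3.5), (0.0, 4.0)]
scores = loam_smoothness(scan)
valid = [i for i, s in enumerate(scores) if s is not None]
edge_index = max(valid, key=lambda i: scores[i])     # sharpest point: edge candidate
planar_index = min(valid, key=lambda i: scores[i])   # flattest point: planar candidate
print(edge_index, scores[edge_index], planar_index, scores[planar_index])
```

On a straight wall the neighbor differences cancel and the score is zero, while at the corner they add up. LOAM-style front-ends pick the highest-scoring points per sector as edge features and the lowest as planar features, then match them with point-to-line and point-to-plane residuals.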
8. Cartographer (Google)
- Cartographer: Hess, Kohler, Rapp, Andor 2016 ICRA (“Real-Time Loop Closure in 2D LIDAR SLAM”). 2D + 3D LiDAR with submap-based scan matching and branch-and-bound loop-closure search; pose-graph optimization in Ceres. ROS 1 and ROS 2 integrations are available through cartographer_ros. Heavily deployed in warehouse AGVs and indoor robotics.
9. Visual-Inertial Odometry (loosely or tightly coupled)
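VIO is the usual remedy wherever camera-only methods fail (low texture, motion blur, rolling shutter, fast motion), and the IMU also makes metric scale observable for monocular rigs. Filter-based systems (MSCKF, ROVIO) propagate the IMU directly in the prediction step, while optimization-based systems (OKVIS, VINS-Mono) typically use IMU pre-integration.
- MSCKF: Mourikis + Roumeliotis 2007 ICRA (“A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation”). Landmarks never enter the state: each feature track is projected onto the left null space of its landmark Jacobian, leaving constraints among a sliding window of past camera poses. Widely reported as the basis of Project Tango and Magic Leap tracking.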
- ROVIO — Bloesch, Burri, Omari, Hutter, Siegwart 2015 IROS (“Robust Visual Inertial Odometry Using a Direct EKF-Based Approach”). Robocentric (state in body frame) iterated EKF with direct photometric updates.
- OpenVINS — Geneva, Eckenhoff, Lee, Yang, Huang 2020 ICRA — open-source MSCKF-style filter VIO from RPNG, well-maintained.
- OKVIS / OKVIS2 — see §4.
- VINS-Mono / VINS-Fusion — see §4.
- SVO — Forster, Pizzoli, Scaramuzza 2014 ICRA (“SVO: Fast Semi-Direct Monocular Visual Odometry”). Semi-direct: feature-based + photometric refinement, very fast on UAVs.
- SVO 2.0 — Forster, Zhang, Gassner, Werlberger, Scaramuzza 2017 T-RO — multi-camera, edgelet support.
- Basalt: Usenko, Demmel, Schubert, Stückler, Cremers 2020 (TUM). Visual-inertial mapping with non-linear factor recovery.
10. Event-camera SLAM
Dynamic Vision Sensors (DVS, DAVIS) output per-pixel asynchronous brightness-change events at microsecond latency. They are well suited to very-high-speed motion (drones, FPV racing) and high-dynamic-range scenes, where conventional cameras suffer motion blur or saturation.
- EVO — Rebecq, Horstschaefer, Gallego, Scaramuzza 2017 RA-L (“EVO: A Geometric Approach to Event-based 6-DOF Parallel Tracking and Mapping in Real-Time”). PTAM-style for events.
- DEVO — depth + event; ESVO — Zhou, Gallego, Shen 2021 (event stereo).
- Ultimate SLAM — Rosinol Vidal, Rebecq, Horstschaefer, Scaramuzza 2018 RA-L — events + frames + IMU on UAV (paper title “Ultimate SLAM? Combining Events, Images, and IMU”). Strong performance in HDR/blur regimes.
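The event stream these systems consume follows a simple idealized model: a pixel fires an event whenever its log-intensity has changed by a contrast threshold since the last event it fired. The sketch below assumes that ideal model for one pixel with a hypothetical threshold of 0.2. It stamps events at the sample time rather than interpolating, and it ignores noise and refractory periods.

```python
import math


def events_for_pixel(samples, contrast=0.2):
    """samples: list of (timestamp, intensity > 0). Returns [(timestamp, polarity)]."""
    events = []
    ref = math.log(samples[0][1])          # log-intensity at the last event
    for t, intensity in samples[1:]:
        log_i = math.log(intensity)
        # A large change produces several events, one per threshold crossing.
        while abs(log_i - ref) >= contrast:
            polarity = 1 if log_i > ref else -1
            ref += polarity * contrast
            events.append((t, polarity))
    return events


samples = [(0, 100.0), (1, 100.0), (2, 150.0), (3, 150.0), (4, 60.0)]
events = events_for_pixel(samples)
print(events)
```

A static pixel produces no output at all, which is why event front-ends need different tracking and mapping machinery than frame-based ones. Because the model works on log-intensity, the same relative change triggers events in dark and bright regions alike, which underlies the HDR robustness noted above.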
11. Semantic and object-level SLAM
Integrate semantic segmentation or object-level reconstruction into the SLAM estimate.
- SLAM++ — Salas-Moreno, Newcombe, Strasdat, Kelly, Davison 2013 CVPR. Pre-scanned 3D object instances are detected, posed and integrated as objects (not points) in an RGB-D SLAM graph.
- MaskFusion — Rünz, Buffier, Agapito 2018 ISMAR — per-object surfel reconstruction with Mask-RCNN; MID-Fusion, EM-Fusion are contemporaneous.
- Kimera: Rosinol, Abate, Chang, Carlone (MIT 2020-2022). Comprises Kimera-VIO, Kimera-Mesher (3D mesh), Kimera-Semantics (semantically-labeled mesh), and Kimera-RPGO (Robust Pose Graph Optimization using Pairwise Consistent Measurement set maximization, PCM). Kimera-Multi is the multi-robot extension.
- Hydra — Hughes, Chang, Carlone 2022 RSS — real-time 3D Scene Graph construction (buildings → rooms → places → objects) on top of Kimera. Foundational for embodied-AI tasks.
- Voxblox-plus-plus and PanopticFusion continue the dense-semantic line (alongside MaskFusion above).
12. Learned and neural-implicit SLAM
End-to-end differentiable bundle adjustment and neural-implicit map representations.
- DROID-SLAM: Teed + Deng 2021 NeurIPS (“DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras”). RAFT-style flow + differentiable dense bundle adjustment layer (“DBA”); jointly optimizes poses + per-pixel inverse-depth. State-of-the-art accuracy at publication on TUM-RGBD, EuRoC, TartanAir.
- iMAP — Sucar, Liu, Ortiz, Davison 2021 ICCV (“iMAP: Implicit Mapping and Positioning in Real-Time”). Single multilayer perceptron stores the scene as a continuous occupancy/color field; tracking by gradient descent against the MLP.
- NICE-SLAM — Zhu, Peng, Larsson, Xu, Bao, Cui, Oswald, Pollefeys 2022 CVPR (“NICE-SLAM: Neural Implicit Scalable Encoding for SLAM”). Hierarchical voxel-grid features + small decoder MLPs; scalable to room-sized scenes.
- Nicer-SLAM — Zhu et al. 2023 — monocular variant.
- NeRF-SLAM: Rosinol, Leonard, Carlone 2022. DROID-SLAM-based dense monocular tracking (poses, depth, and uncertainty) feeding an Instant-NGP NeRF for dense reconstruction.
- OrbeezSLAM — Chung, Tseng, Hsieh et al. 2022 — Instant-NGP-based real-time NeRF-SLAM with ORB front-end.
- Co-SLAM, ESLAM, Point-SLAM, GO-SLAM — recent academic NeRF-SLAM variants.
- MonoGS / Gaussian Splatting SLAM — Matsuki, Murai, Kelly, Davison 2024 CVPR (“Gaussian Splatting SLAM”). 3D Gaussian primitives optimized photometrically; very high rendering quality and competitive accuracy. The 2024-25 frontier.
- Photo-SLAM, SplaTAM, Gaussian-SLAM: other members of the 3DGS SLAM family, which has proliferated since 2024.
- Pearl, NeuralRecon, VolSDF-SLAM — neural surface SLAMs.
13. Multi-robot collaborative SLAM
- CCM-SLAM: Schmuck + Chli 2017-2019 (ETH Vision for Robotics Lab). Centralized Collaborative Monocular SLAM with a server-and-agent architecture.
- DOOR-SLAM: Lajoie, Ramtoula, Chang, Carlone, Beltrame 2020 RA-L. Distributed, Online, and Outlier Resilient SLAM; combines distributed pose-graph optimization with PCM (Pairwise Consistent Measurement set maximization) to reject outlier inter-robot loop closures.
- Kimera-Multi — Chang, Tian, How, Carlone 2021 — multi-robot Kimera with Kimera-Distributed.
- Swarm-SLAM — Lajoie + Beltrame 2024 RA-L — open-source decentralized C-SLAM framework for fleets.
- Maplab Multi — ETH ASL.
- DCL-SLAM, D-Lio-SAM — distributed LiDAR variants.
14. Comparison table
| Algorithm | Sensor | Back-end | Map | Year | Typical use | Lib |
|---|---|---|---|---|---|---|
| EKF-SLAM (canonical) | range-bearing (generic) | EKF | sparse landmarks | 1990 | small indoor | custom |
| MonoSLAM | mono cam | EKF | sparse | 2007 | AR research | open |
| GMapping | 2D LiDAR | RBPF | 2D occupancy | 2007 | mobile-base ROS-1 | open |
| FastSLAM 2.0 | range/cam | RBPF | landmarks | 2003 | offroad | open |
| PTAM | mono cam | bundle adj | sparse | 2007 | AR | open |
| ORB-SLAM3 | mono / stereo / RGB-D + IMU | g2o BA | sparse | 2021 | drones, AR | open |
| VINS-Mono | mono cam + IMU | Ceres slid-win | sparse | 2018 | UAV | open |
| VINS-Fusion | stereo + IMU + GPS | Ceres | sparse | 2019 | UGV/UAV | open |
| OKVIS | stereo + IMU | keyframe BA | sparse | 2015 | research | open |
| ROVIO | mono + IMU | iEKF | sparse direct | 2015 | UAV | open |
| OpenVINS | mono / stereo + IMU | MSCKF | sparse | 2020 | research baseline | open |
| MSCKF | mono + IMU | EKF (null-space) | sparse | 2007 | Tango, MagicLeap | proprietary |
| SVO 2.0 | mono / multi-cam + IMU | semi-direct BA | sparse | 2017 | UAV | open |
| LSD-SLAM | mono cam | pose-graph | semi-dense | 2014 | research | open |
| DSO | mono cam | photo BA | sparse direct | 2017 | research | open |
| DTAM | mono cam | dense opt | dense | 2011 | research/AR | research |
| KinectFusion | RGB-D | ICP frame-to-model | TSDF | 2011 | indoor scan | open |
| ElasticFusion | RGB-D | non-rigid deform | surfel | 2015 | indoor | open |
| BundleFusion | RGB-D | global BA | TSDF | 2017 | indoor scan | open |
| Voxblox | depth | TSDF + ESDF | volumetric | 2017 | UAV planning map | open |
| RTABMap | stereo / RGB-D | pose-graph + bag-of-words | dense optional | 2013 | ROS RGB-D | open |
| LOAM | 3D LiDAR | feature scan-match | point cloud | 2014 | KITTI / AV | closed |
| LeGO-LOAM | 3D LiDAR | feature + ICP | point cloud | 2018 | UGV | open |
| LIO-SAM | 3D LiDAR + IMU | GTSAM factor graph | point cloud | 2020 | AV / UGV | open |
| FAST-LIO2 | LiDAR (Livox) + IMU | iEKF + ikd-tree | point cloud | 2022 | drone / fast UGV | open |
| Cartographer | 2D / 3D LiDAR | Ceres branch-bound | submaps + grid | 2016 | warehouse AGV | open |
| KISS-ICP | 3D LiDAR | point-to-point ICP | local voxel map | 2023 | LiDAR odometry baseline | open |
| Kimera | stereo + IMU | GTSAM + RPGO | semantic mesh | 2020 | semantic mapping | open |
| Hydra | stereo + IMU | Kimera + scene graph | 3D scene graph | 2022 | embodied AI | open |
| DROID-SLAM | mono / stereo / RGB-D | dense diff BA | per-pixel depth | 2021 | learned VO | open |
| iMAP | RGB-D | MLP gradient descent | neural-implicit | 2021 | research | research |
| NICE-SLAM | RGB-D | hierarchical NeRF | neural-implicit | 2022 | research | open |
| Gaussian Splatting SLAM (MonoGS) | mono / RGB-D | 3DGS photo BA | 3D Gaussians | 2024 | research frontier | open |
15. Front-end techniques (shared building blocks)
- Feature extractors — Harris (1988), Shi-Tomasi (1994); SIFT (Lowe 2004); SURF (Bay 2008); FAST (Rosten 2006); ORB (Rublee, Rabaud, Konolige, Bradski 2011); BRISK (Leutenegger 2011). Learned: SuperPoint (DeTone, Malisiewicz, Rabinovich 2018), R2D2 (Revaud 2019), KP2D, DISK (Tyszkiewicz 2020), ALIKE / ALIKED (2022/23).
- Descriptors and matching: BRIEF (Calonder 2010) and FREAK (Alahi 2012) are binary descriptors matched by Hamming distance. Learned matchers: SuperGlue (Sarlin, DeTone, Malisiewicz, Rabinovich 2020) and LightGlue (Lindenberger, Sarlin, Pollefeys 2023 ICCV), which is much faster than SuperGlue.
- Dense feature matching: LoFTR (Sun, Shen, Wang, Bao, Zhou 2021 CVPR), detector-free transformer correspondences.
- Photometric / direct tracking — KLT (Lucas-Kanade 1981), inverse compositional Lucas-Kanade (Baker + Matthews 2001), direct image alignment (Engel et al.).
- Loop closure — DBoW2 vocabulary tree (Gálvez-López + Tardós 2012); NetVLAD (Arandjelović, Gronat, Torii, Pajdla, Sivic 2016) — learned place recognition; CosPlace / MixVPR more recent. Geometric verification by PnP + RANSAC or essential matrix decomposition.
- Outlier rejection — RANSAC (Fischler + Bolles 1981); MAGSAC++ (Barath 2020); GNC (Graduated Non-Convexity, Yang + Carlone 2020) — applied at the graph optimization level.
- IMU pre-integration: Forster, Carlone, Dellaert, Scaramuzza 2017 T-RO (“On-Manifold Preintegration for Real-Time Visual-Inertial Odometry”), the canonical formulation used by most optimization-based tightly-coupled VIO and LIO systems. It summarizes the IMU samples between two keyframes as relative rotation, velocity, and position increments (on SO(3)×R^3×R^3) expressed in the first keyframe's body frame. Changes in the gyro and accelerometer bias estimates are absorbed through a first-order correction, so the back-end can re-linearize without re-integrating raw IMU samples.
- Robust kernels: Huber, Cauchy, Tukey, DCS (Dynamic Covariance Scaling, Agarwal 2013), switchable constraints (Sünderhauf + Protzel 2012). All of these reduce the influence of false-positive loop closures inside the standard NLS optimization.
- Marginalization — Schur complement to remove old states while preserving information; foundational for sliding-window VIO (VINS-Mono, OKVIS, OpenVINS) where memory must be bounded.
- Sparse Cholesky / QR — SuiteSparse CHOLMOD and SPQR (Davis); the numerical workhorses underneath Ceres / g2o / GTSAM linear-solver back-ends.
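The robust kernels listed above are commonly applied through iteratively reweighted least squares (IRLS), where each residual gets a weight derived from the kernel. The sketch below gives the standard weight functions for Huber, Cauchy, and Tukey, and applies Huber to a toy scalar problem. The measurement values and the tuning constant 0.5 are arbitrary choices for illustration; in a SLAM back-end the same weights would scale loop-closure factors instead.

```python
def huber_weight(r, k):
    """Full weight inside |r| <= k, then decays as k / |r|."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def cauchy_weight(r, c):
    """Smoothly decaying weight; never exactly zero."""
    return 1.0 / (1.0 + (r / c) ** 2)


def tukey_weight(r, c):
    """Redescending kernel: residuals beyond c are ignored entirely."""
    if abs(r) > c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def robust_mean(values, weight_fn, scale, iterations=20):
    """IRLS estimate of a scalar from measurements that may contain outliers."""
    estimate = sum(values) / len(values)
    for _ in range(iterations):
        w = [weight_fn(v - estimate, scale) for v in values]
        total = sum(w)
        if total == 0.0:
            break                       # every measurement rejected; keep last estimate
        estimate = sum(wi * v for wi, v in zip(w, values)) / total
    return estimate


measurements = [1.0, 1.1, 0.9, 1.05, 8.0]     # the last value is a gross outlier
plain = sum(measurements) / len(measurements)
robust = robust_mean(measurements, huber_weight, scale=0.5)
print(plain, robust)
```

The plain mean is dragged to about 2.41 by the outlier, while the Huber estimate stays near the inliers. Huber only bounds the outlier's pull; Tukey would remove it completely, but it needs a good initial guess because it can also discard valid data.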
16. Selection heuristics
- Indoor drone, monocular, AR target → ORB-SLAM3 (mono-inertial) or VINS-Mono.
- Outdoor autonomous vehicle mapping → LIO-SAM or FAST-LIO2 + Cartographer 3D, fused with cameras for semantics; post-process with g2o or GTSAM for HD-map deliverable.
- Warehouse AGV with 2D LiDAR → Cartographer 2D (ROS 2) — the default deployment.
- ROS 2 mobile base with RGB-D → RTABMap (most mature ROS RGB-D pipeline) or Cartographer (3D).
- Visual-inertial AR / handheld → proprietary stacks (ARKit, ARCore; internals unpublished, commonly assumed to be filter-based VIO) or open-source ROVIO / OpenVINS / VINS-Fusion.
- High-speed UAV / FPV racing → FAST-LIO2 (if LiDAR) or SVO 2.0 + event camera (Ultimate SLAM) for blur regimes.
- HD-map for self-driving → LIO-SAM + Cartographer + offline g2o/GTSAM batch optimization; manual loop-closure curation.
- UAV photogrammetry, no real-time → COLMAP offline SfM (Schönberger 2016) — not SLAM but related.
- Bin-picking / dense reconstruction → KinectFusion / ElasticFusion / BundleFusion (RGB-D), or for research-grade NICE-SLAM / Gaussian Splatting SLAM.
- Humanoid robot perception subsystem → Kimera VIO + Hydra scene graph for high-level semantics.
- Quadruped (Spot, ANYmal) → typically proprietary onboard with ICP + visual; open-source equivalent is LIO-SAM + Kimera.
- Race-car high-speed autonomous → FAST-LIO2 + Cartographer with custom high-frequency loop-closure logic.
- Surgical endoscope / minimally-invasive → DROID-SLAM or specialized learned monocular SLAM (textureless tissue).
- Underwater (sonar/visual) → Pose-graph SLAM with sonar registration (DIDSON, Sonar-SLAM); often Kalman-filter-based with DVL/INS aiding.
- Multi-robot fleet → Kimera-Multi or Swarm-SLAM or DOOR-SLAM; require communication-aware design.
- Resource-constrained microcontroller / embedded → EKF-SLAM with hand-tuned landmark count, or scan-matching-only (Hector SLAM) with no global optimization.
- GPS-denied long-range (subterranean / DARPA SubT-style) → LIO-SAM or FAST-LIO2 fused with thermal / radar; require resilience to dust, smoke, dynamic obstacles, and degraded LiDAR geometry (long featureless tunnels). Teams in the 2021 DARPA SubT Final generally ran custom multi-sensor LiDAR-inertial pipelines with explicit degeneracy detection rather than stock open-source systems.
- Dynamic / non-rigid scene → DynaSLAM (Bescos 2018), DS-SLAM, VDO-SLAM. DynaSLAM and DS-SLAM extend ORB-SLAM2 with semantic masking or motion segmentation to ignore moving objects; VDO-SLAM additionally tracks the moving objects themselves. Pure NeRF / 3DGS SLAM still struggles with dynamics.
- Low-cost ground robot (Roomba-class) → wheel-odometry + 2D LiDAR with GMapping or Cartographer 2D; or pure visual with monocular ORB-SLAM3.
17. Datasets and benchmarks
Algorithm comparisons depend on a small set of standard datasets that have become the de facto evaluation harnesses:
- KITTI (Geiger, Lenz, Stiller, Urtasun 2013 IJRR) — automotive stereo + Velodyne HDL-64 + GPS/INS ground truth; the canonical autonomous-driving SLAM benchmark. KITTI-360 (2022) extends with 360° panoramas.
- EuRoC MAV (Burri et al. 2016 IJRR) — drone stereo + IMU + Vicon ground truth; the canonical visual-inertial benchmark for UAV-scale motion.
- TUM RGB-D (Sturm, Engelhard, Endres, Burgard, Cremers 2012 IROS) — handheld RGB-D + mocap; the canonical indoor RGB-D evaluation.
- TUM VI (Schubert, Goll, Demmel, Usenko, Stueckler, Cremers 2018) — visual-inertial handheld, fisheye stereo.
- TartanAir (Wang et al. 2020) — large-scale photorealistic simulation, diverse environments and motion; used as the DROID-SLAM training corpus.
- ScanNet (Dai 2017) and Replica (Straub 2019) — indoor RGB-D scene datasets used by neural-implicit SLAMs (iMAP, NICE-SLAM).
- Newer College (Ramezani 2020) — Oxford handheld 3D LiDAR + IMU + cameras + survey-grade ground truth; LIO benchmark.
- NCLT (Carlevaris-Bianco 2016, U-Michigan) — Segway long-term outdoor (15-month) LiDAR + camera; long-term SLAM benchmark.
- Hilti SLAM Challenge (2021-2023) — industrial / construction environment, multi-sensor; informs LIO research.
Metrics: Absolute Trajectory Error (ATE), reported as RMSE in meters after rigid SE(3) alignment (Sim(3) for scale-ambiguous monocular systems), is the dominant single-number metric (Sturm 2012); Relative Pose Error (RPE) measures drift over fixed time or distance intervals. For 3D map quality: Chamfer distance to ground-truth mesh, F-score at a threshold.
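The ATE computation can be sketched in a few lines. To stay dependency-free this version assumes 2-D positions that are already time-associated, and it uses the closed-form least-squares SE(2) alignment. Benchmark tooling does the same in 3-D with an SVD-based (Horn / Umeyama) alignment. The trajectories below are synthetic.

```python
import math


def align_se2(est, gt):
    """Closed-form least-squares rotation + translation mapping est onto gt (2-D)."""
    n = len(est)
    ex = sum(p[0] for p in est) / n
    ey = sum(p[1] for p in est) / n
    gx = sum(q[0] for q in gt) / n
    gy = sum(q[1] for q in gt) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(est, gt):
        px, py, qx, qy = px - ex, py - ey, qx - gx, qy - gy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)          # rotation maximizing the correlation
    c, s = math.cos(theta), math.sin(theta)
    tx = gx - (c * ex - s * ey)             # translation mapping centroid onto centroid
    ty = gy - (s * ex + c * ey)
    return [(c * px - s * py + tx, s * px + c * py + ty) for px, py in est]


def ate_rmse(est, gt):
    """Root-mean-square position error after rigid alignment."""
    aligned = align_se2(est, gt)
    sq = [(ax - qx) ** 2 + (ay - qy) ** 2 for (ax, ay), (qx, qy) in zip(aligned, gt)]
    return math.sqrt(sum(sq) / len(sq))


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
# Same path expressed in a frame rotated by 90 degrees and shifted by (5, -3).
est_clean = [(5.0, -3.0), (5.0, -2.0), (5.0, -1.0), (4.0, -1.0)]
# Identical except that the final pose has drifted by 0.3 m.
est_drift = est_clean[:-1] + [(4.3, -1.0)]
ate_clean = ate_rmse(est_clean, gt)
ate_drift = ate_rmse(est_drift, gt)
print(ate_clean, ate_drift)
```

The alignment step is what makes ATE independent of the arbitrary global frame each system starts in. It also means ATE can hide local errors that a rigid fit absorbs, which is one reason RPE is usually reported alongside it.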
18. Theoretical underpinnings (brief)
- Observability — Visual-inertial systems have four unobservable directions (global position xyz, yaw); pitch and roll are observable thanks to gravity. Visual-only mono SLAM additionally has scale unobservable (7 unobservable degrees: SE(3) gauge + scale). LiDAR-inertial: same four unobservable as VIO. Consistency-aware filtering (First-Estimates Jacobian, OC-EKF — Huang, Mourikis, Roumeliotis 2010) enforces these.
- Information form vs covariance form — equivalent dual representations of the same Gaussian; information form is sparse when factor graphs are sparse, hence the dominance of information-form smoothers.
- Gauge freedom — SLAM is defined up to a rigid transform of the global frame (and scale for monocular); the back-end fixes this by anchoring the first pose or by computing on the manifold modulo the gauge group.
- Maximum-A-Posteriori (MAP): the unifying view of modern SLAM. Minimize the negative log-posterior over poses + landmarks, given Gaussian (or robust-kernel) measurement and motion models. Under Gaussian noise this is a weighted nonlinear least-squares problem, and it becomes linear least-squares only when the models are also linear. The factor graph encodes the factorization of that posterior: one factor per measurement or prior, connected to the variables it involves.
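The step from MAP estimation to least squares can be written out explicitly. The notation here is an assumption introduced for illustration: X is the set of all poses and landmarks, z_k is the k-th measurement with model h_k acting on the subset X_k of variables it involves, and Σ_k is its noise covariance.

```latex
\begin{aligned}
\hat{X} &= \arg\max_{X}\; p(X \mid Z)
         = \arg\max_{X}\; p(X) \prod_{k} p(z_k \mid X_k), \\
p(z_k \mid X_k) &\propto \exp\!\Big(-\tfrac{1}{2}\,\lVert h_k(X_k) - z_k \rVert^{2}_{\Sigma_k}\Big), \\
\hat{X} &= \arg\min_{X}\; \Big( -\log p(X) + \tfrac{1}{2}\sum_{k} \lVert h_k(X_k) - z_k \rVert^{2}_{\Sigma_k} \Big),
\qquad \lVert e \rVert^{2}_{\Sigma} = e^{\top}\Sigma^{-1}e .
\end{aligned}
```

Each term in the sum is one factor of the graph, and each factor touches only the few variables in X_k, which is the source of the sparsity the back-ends exploit. With a Gaussian prior the first term is itself a squared Mahalanobis distance, and replacing the squared norm by a robust kernel ρ(·) gives the outlier-tolerant variants.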
19. Cross-references
- slam — overview note in Tier 1.
- computer-vision-robotics — visual front-end fundamentals shared with non-SLAM CV.
- perception-sensors — sensor characteristics that constrain algorithm choice.
- bayesian-estimation — EKF / UKF / particle filter / factor-graph theory underpinning every back-end.
- sensors-pose-motion — IMU pre-integration, GNSS fusion, sensor models.
- path-planning-algorithms — consumer of SLAM maps; ESDF / occupancy / mesh feed into planners.
20. Citations (primary)
- Cadena, Carlone, Carrillo, Latif, Scaramuzza, Neira, Reid, Leonard. “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age.” IEEE T-RO 32(6), 2016.
- Davison, Reid, Molton, Stasse. “MonoSLAM: Real-Time Single Camera SLAM.” IEEE T-PAMI 29(6), 2007.
- Klein, Murray. “Parallel Tracking and Mapping for Small AR Workspaces.” ISMAR 2007.
- Engel, Koltun, Cremers. “Direct Sparse Odometry.” IEEE T-PAMI 40(3), 2018.
- Mur-Artal, Tardós. “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras.” IEEE T-RO 33(5), 2017.
- Campos, Elvira, Gómez Rodríguez, Montiel, Tardós. “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM.” IEEE T-RO 37(6), 2021.
- Hess, Kohler, Rapp, Andor. “Real-Time Loop Closure in 2D LIDAR SLAM.” ICRA 2016.
- Shan, Englot, Meyers, Wang, Ratti, Rus. “LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping.” IROS 2020.
- Xu, Cai, He, Lin, Zhang. “FAST-LIO2: Fast Direct LiDAR-Inertial Odometry.” IEEE T-RO 38(4), 2022.
- Qin, Li, Shen. “VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator.” IEEE T-RO 34(4), 2018.
- Newcombe, Izadi, Hilliges, Molyneaux, Kim, Davison, Kohli, Shotton, Hodges, Fitzgibbon. “KinectFusion: Real-time Dense Surface Mapping and Tracking.” ISMAR 2011.
- Whelan, Salas-Moreno, Glocker, Davison, Leutenegger. “ElasticFusion: Real-time Dense SLAM and Light Source Estimation.” IJRR 35(14), 2016.
- Teed, Deng. “DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras.” NeurIPS 2021.
- Matsuki, Murai, Kelly, Davison. “Gaussian Splatting SLAM.” CVPR 2024.
- Rosinol, Abate, Chang, Carlone. “Kimera: an Open-Source Library for Real-Time Metric-Semantic Localization and Mapping.” ICRA 2020.
- Forster, Carlone, Dellaert, Scaramuzza. “On-Manifold Preintegration for Real-Time Visual-Inertial Odometry.” IEEE T-RO 33(1), 2017.
- Grisetti, Stachniss, Burgard. “Improved Techniques for Grid Mapping with Rao-Blackwellized Particle Filters.” IEEE T-RO 23(1), 2007.
- Smith, Self, Cheeseman. “Estimating Uncertain Spatial Relationships in Robotics.” In Autonomous Robot Vehicles, Springer, 1990.
- Montemerlo, Thrun, Koller, Wegbreit. “FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem.” AAAI 2002.
- Mourikis, Roumeliotis. “A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation.” ICRA 2007.
- Kümmerle, Grisetti, Strasdat, Konolige, Burgard. “g2o: A General Framework for Graph Optimization.” ICRA 2011.
- Kaess, Johannsson, Roberts, Ila, Leonard, Dellaert. “iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree.” IJRR 31(2), 2012.
- Dellaert, Kaess. “Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing.” IJRR 25(12), 2006.
- Lu, Milios. “Globally Consistent Range Scan Alignment for Environment Mapping.” Autonomous Robots 4(4), 1997.
- Bloesch, Burri, Omari, Hutter, Siegwart. “Iterated Extended Kalman Filter Based Visual-Inertial Odometry Using Direct Photometric Feedback.” IJRR 36(10), 2017.
- Sucar, Liu, Ortiz, Davison. “iMAP: Implicit Mapping and Positioning in Real-Time.” ICCV 2021.
- Zhu, Peng, Larsson, Xu, Bao, Cui, Oswald, Pollefeys. “NICE-SLAM: Neural Implicit Scalable Encoding for SLAM.” CVPR 2022.
- Vizzo, Guadagnino, Mersch, Wiesmann, Behley, Stachniss. “KISS-ICP: In Defense of Point-to-Point ICP — Simple, Accurate, and Robust Registration If Done the Right Way.” IEEE RA-L 8(2), 2023.
- Hughes, Chang, Carlone. “Hydra: A Real-Time Spatial Perception System for 3D Scene Graph Construction and Optimization.” RSS 2022.
- Sarlin, DeTone, Malisiewicz, Rabinovich. “SuperGlue: Learning Feature Matching with Graph Neural Networks.” CVPR 2020.
- Lindenberger, Sarlin, Pollefeys. “LightGlue: Local Feature Matching at Light Speed.” ICCV 2023.
- Gálvez-López, Tardós. “Bags of Binary Words for Fast Place Recognition in Image Sequences.” IEEE T-RO 28(5), 2012.