SLAM Algorithms — Family Index

Simultaneous Localization And Mapping is the joint inference problem of estimating both robot trajectory and a map of the environment from sensor data, when neither is known a priori. After three decades the field has fragmented into a zoo of algorithms differentiated along three axes — sensor modality, map representation, and back-end estimator. This index catalogs the canonical members and notes which combinations are deployed in practice.

1. At a glance — taxonomy axes

SLAM systems decompose along three largely orthogonal axes:

  • Sensor modality: monocular camera, stereo, RGB-D (Kinect, RealSense, Tango), 2D LiDAR, 3D LiDAR (Velodyne, Ouster, Livox), event camera (DVS / DAVIS), radar (FMCW), sonar (underwater), IMU (always auxiliary, never alone). Visual-inertial (VIO) and LiDAR-inertial (LIO) are the dominant tightly-coupled multi-sensor combinations.
  • Map representation: sparse landmark (3D points + descriptors), semi-dense (high-gradient pixels), dense (every pixel / every voxel), volumetric (TSDF grid, occupancy grid, ESDF), surfel (oriented disks), mesh, neural-implicit (MLP weights), 3D Gaussian (Gaussian Splatting).
  • Estimator back-end: Extended Kalman Filter (EKF), Unscented KF (UKF), particle filter (Rao-Blackwellized), pose-graph (poses as nodes, relative-motion measurements as edges), factor-graph (poses + landmarks + IMU pre-integration as factors), batch bundle adjustment, incremental smoothing (iSAM2 Bayes tree), learned end-to-end.

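The filtering era (1990s to about 2007) relied on EKF/UKF estimators, whose dense joint covariance costs O(n²) per update in landmark count, and on particle filters, which avoid that cost but suffer particle depletion on long trajectories. The “graph SLAM” insight (Lu + Milios 1997, Dellaert 2005 square-root SAM) re-cast the problem as sparse nonlinear least-squares on a graph. Sparse Cholesky / QR factorization, and later incremental Bayes-tree updates, typically touch only a small part of the factorization between loop closures instead of a full dense covariance, which unlocked modern scalable SLAM.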

A practical SLAM system also splits responsibility along a front-end / back-end axis: the front-end is the per-frame sensor-driven step (feature extraction, data association, scan matching, photometric tracking, IMU pre-integration) that emits constraints, and the back-end is the global optimizer (filter, smoother, graph optimizer) that fuses them. Loop closure spans both: the front-end proposes candidates (appearance-based with DBoW2/NetVLAD, or geometry-based with scan-context / branch-and-bound) and the back-end accepts, rejects, or down-weights them (RANSAC, PCM, GNC, switchable constraints). Most failure modes in real-world SLAM trace to data-association errors at the front-end propagating into the back-end as false-positive loop closures, motivating the heavy robust-back-end research of the last decade.

2. Filtering era (Bayesian recursive estimation)

Before pose-graph SLAM matured, the dominant paradigm was a single Gaussian (EKF) or particle distribution over the joint robot+map state, updated recursively per sensor frame.

  • EKF-SLAM — Smith, Self, Cheeseman 1990 (“Estimating Uncertain Spatial Relationships in Robotics”, the “stochastic map”). The joint covariance matrix is updated by linearizing both motion and observation models. Computational complexity is O(n²) per update in number of landmarks, which in practice limits real-time use to a few hundred landmarks. MonoSLAM (Davison 2007 PAMI) was the first real-time monocular EKF-SLAM at 30 Hz on a desktop CPU, ~100 features tracked, a milestone for visual SLAM. Inverse-depth parameterization (Civera, Davison, Montiel 2008) extended EKF-SLAM to handle low-parallax points.
  • UKF-SLAM — Unscented transform avoids Jacobian computation and handles stronger nonlinearity than EKF, but rarely used in practice because the pose-graph approach dominated by the time UKF-SLAM matured.
  • FastSLAM — Montemerlo, Thrun, Koller, Wegbreit 2002 (“FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem”). Rao-Blackwellized particle filter: each particle is a trajectory hypothesis, and conditional on a trajectory, landmark posteriors decouple into independent low-dimensional EKFs. Scales to thousands of landmarks. FastSLAM 2.0 (Montemerlo + Thrun 2003) added improved proposal distribution using current measurement.
  • GMapping — Grisetti, Stachniss, Burgard 2007 (“Improved Techniques for Grid Mapping with Rao-Blackwellized Particle Filters”, IEEE T-RO). Rao-Blackwellized particle filter for 2D occupancy-grid LiDAR. The standard ROS-1 mobile-robot SLAM baseline 2008-2016. Limitations: particle depletion in long corridors, no native loop closure.
  • Hector SLAM — Kohlbrecher et al. 2011 — scan-matching-only 2D LiDAR SLAM, no odometry required, common on quadrotors.

Filtering-era systems are still preferred where computational budget is fixed and small (microcontroller-grade) or where the state stays small (camera-mounted AR with bounded workspace).
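The O(n²) cost quoted for EKF-SLAM above can be seen in a toy measurement update. The following is a minimal sketch, not any published implementation: it assumes a 1D world with a linear measurement model, and the names ekf_update, mu, Sigma, H and R are chosen here purely for illustration.

```python
import numpy as np

def ekf_update(mu, Sigma, z, H, R):
    """One Kalman measurement update on the joint robot+map state."""
    S = H @ Sigma @ H.T + R                # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)     # Kalman gain, shape (n, m)
    mu_new = mu + K @ (z - H @ mu)         # corrected mean
    # The covariance correction is a dense n x n product: every entry of
    # the joint covariance may change, hence O(n^2) work per update.
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma
    return mu_new, Sigma_new

# Toy state: [robot_x, landmark_1, landmark_2] on a line (illustrative values).
mu = np.array([0.0, 5.0, 9.0])
Sigma = np.diag([0.01, 4.0, 4.0])
# The robot measures the offset to landmark 1: z = l1 - x + noise.
H = np.array([[-1.0, 1.0, 0.0]])
R = np.array([[0.25]])
z = np.array([5.6])

mu_new, Sigma_new = ekf_update(mu, Sigma, z, H, R)
```

After the update the landmark-1 variance shrinks and a robot/landmark cross-covariance appears. Those off-diagonal terms are what make the joint covariance dense as the map grows.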

3. Pose-graph and factor-graph back-ends (modern)

The graph-SLAM paradigm formulates the MAP estimate as a sparse nonlinear least-squares problem on a graph. Pose-graphs (Lu + Milios 1997) keep only poses with relative-pose edges; factor-graphs (Dellaert + Kaess 2006) admit arbitrary factor types (IMU pre-integration, GPS, range, landmark observations). Minimization is of the negative log-likelihood — equivalent to weighted least-squares under Gaussian noise.

  • TORO — Grisetti, Stachniss, Grzonka, Burgard 2007 — Tree-based netwORk Optimizer; stochastic gradient descent on a spanning-tree-parameterized graph.
  • HOG-Man — Grisetti, Kümmerle, Stachniss, Frese, Hertzberg 2010 — Hierarchical Optimization for pose Graphs on Manifolds.
  • g2o — Kümmerle, Grisetti, Strasdat, Konolige, Burgard 2011 ICRA (“g2o: A General Framework for Graph Optimization”). C++ library, sparse Cholesky / PCG / Levenberg-Marquardt. Used in ORB-SLAM and RGB-D-SLAM; Cartographer, by contrast, optimizes its pose graph with Ceres (see §8).
  • GTSAM — Dellaert (Georgia Tech, 2012-present) — factor-graph library with the iSAM2 incremental smoother backed by a Bayes tree data structure (Kaess et al. 2012 IJRR). iSAM (2008) was the original incremental smoothing-and-mapping algorithm. GTSAM is the back-end of choice for modern factor-graph SLAM: Kimera, LIO-SAM, multi-robot extensions, DOOR-SLAM.
  • Ceres Solver — Google 2010+. General-purpose nonlinear least-squares with auto-differentiation. Powers VINS-Mono / VINS-Fusion, many academic VO pipelines, and Google’s own Tango.
  • SE-Sync — Rosen, Carlone, Bandeira, Leonard 2017 — certifiably-correct synchronization on SE(d), addresses local-minima problem of pose-graph SLAM.
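To make the least-squares formulation at the top of this section concrete, here is a minimal sketch of a 1D pose graph solved through its normal equations. It assumes scalar poses and linear relative-position edges, so a single linear solve suffices; real back-ends such as g2o or GTSAM iterate on manifolds. The function name solve_pose_graph_1d and the edge tuple layout are illustrative assumptions.

```python
import numpy as np

def solve_pose_graph_1d(n_poses, edges, prior=(0, 0.0, 1e6)):
    """Solve a 1D pose graph.

    edges: list of (i, j, z, w) where z measures x_j - x_i with weight w.
    prior: (index, value, weight) anchoring one pose to fix the gauge.
    """
    H = np.zeros((n_poses, n_poses))   # information matrix J^T W J
    b = np.zeros(n_poses)              # information vector J^T W z
    for i, j, z, w in edges:
        # Residual r = (x_j - x_i) - z has Jacobian -1 at i and +1 at j.
        H[i, i] += w
        H[j, j] += w
        H[i, j] -= w
        H[j, i] -= w
        b[i] -= w * z
        b[j] += w * z
    k, x0, w0 = prior
    H[k, k] += w0
    b[k] += w0 * x0
    return np.linalg.solve(H, b)

# Three odometry edges of 1.0 each, plus a loop closure claiming that
# pose 3 is only 2.7 away from pose 0 (illustrative numbers).
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (2, 3, 1.0, 1.0), (0, 3, 2.7, 1.0)]
x = solve_pose_graph_1d(4, edges)
# x is approximately [0, 0.925, 1.85, 2.775]
```

With equal weights the 0.3 disagreement is spread evenly over the four edges. Changing the weights shifts the correction toward the less trusted measurements.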

4. Feature-based visual SLAM

Track sparse keypoints across frames, triangulate them as 3D landmarks, optimize via bundle adjustment.

  • MonoSLAM — Davison 2007 PAMI; EKF-based.
  • PTAM — Klein + Murray 2007 ISMAR (“Parallel Tracking And Mapping for Small AR Workspaces”). Split tracking (per-frame) from mapping (background bundle adjustment) into separate threads — became the architectural template for nearly every modern visual SLAM system. Designed for AR.
  • ORB-SLAM — Mur-Artal, Montiel, Tardós 2015 T-RO (“ORB-SLAM: A Versatile and Accurate Monocular SLAM System”). ORB feature (Rublee 2011), three threads (tracking / local mapping / loop closing), DBoW2 vocabulary tree for place recognition.
  • ORB-SLAM2 — Mur-Artal + Tardós 2017 T-RO — mono / stereo / RGB-D unified.
  • ORB-SLAM3 — Campos, Elvira, Gómez Rodríguez, Montiel, Tardós 2021 T-RO — adds tight visual-inertial fusion, multi-map system (“Atlas”), pinhole + fisheye support. The current reference open-source visual-(inertial-)SLAM.
  • VINS-Mono — Qin, Li, Shen (HKUST 2018, IEEE T-RO) — tightly-coupled monocular VIO with sliding-window optimization in Ceres, IMU pre-integration. Robust on aerial vehicles.
  • VINS-Fusion — Qin et al. 2019 — VINS-Mono extended with stereo and GPS.
  • OKVIS — Leutenegger, Lynen, Bosse, Siegwart, Furgale 2015 IJRR — tightly-coupled keyframe visual-inertial, foundational; spawned OKVIS2.
  • Maplab — Schneider, Dymczyk, Fehr, Egger, Lynen, Gilitschenski, Siegwart 2018 RA-L (ETH ASL) — open multi-session mapping research platform.
  • RTABMap — Labbé + Michaud 2013 — appearance-based loop closure on top of RGB-D / stereo VO, the default ROS mid-sized RGB-D SLAM.

5. Direct visual SLAM (photometric error)

Instead of matching features, minimize the per-pixel photometric error directly. Avoids feature-extraction failure modes in low-texture or blur and uses more image information per frame.

  • DTAM — Newcombe, Lovegrove, Davison 2011 ICCV (“Dense Tracking And Mapping in Real-Time”). First real-time dense monocular SLAM on GPU; per-pixel inverse depth optimization.
  • LSD-SLAM — Engel, Schöps, Cremers 2014 ECCV (“LSD-SLAM: Large-Scale Direct Monocular SLAM”). Semi-dense — operates only on high-gradient pixels. CPU-only.
  • DSO — Engel, Koltun, Cremers 2017 PAMI (“Direct Sparse Odometry”). Photometric bundle adjustment on a small set of carefully chosen sparse points with full photometric calibration (vignette, response, exposure). Strong odometry, no loop closure in canonical form.
  • VI-DSO / DSO-VI — Stumberg, Usenko, Cremers 2018 — visual-inertial extension.
  • REMODE — Pizzoli, Forster, Scaramuzza 2014 ICRA (“REgularized MOnocular Depth Estimation”) — per-pixel Bayesian depth fusion for dense monocular reconstruction; coupled with SVO for tracking.
  • CNN-SLAM — Tateno et al. 2017 — early hybrid using CNN depth predictions inside an LSD-SLAM-style framework.
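The photometric-error idea that opens this section can be illustrated in one dimension. The sketch below is a toy under stated assumptions, not the DSO or LSD-SLAM pipeline: it treats two 1D intensity profiles as "images", models the motion as a single shift t, and runs Gauss-Newton on the sum of squared intensity differences. The function name align_1d and the Gaussian test signal are illustrative.

```python
import numpy as np

def align_1d(ref, cur, n_iters=20):
    """Estimate the shift t that minimizes sum((cur(x + t) - ref(x))^2)."""
    x = np.arange(len(ref), dtype=float)
    grad_cur = np.gradient(cur)                 # image gradient of cur
    t = 0.0
    for _ in range(n_iters):
        warped = np.interp(x + t, x, cur)       # cur sampled at warped pixels
        g = np.interp(x + t, x, grad_cur)       # Jacobian d(warped)/dt
        r = warped - ref                        # photometric residual
        step = -(g @ r) / (g @ g)               # scalar Gauss-Newton step
        t += step
        if abs(step) < 1e-9:
            break
    return t

# Reference profile and a copy shifted by 3.5 pixels (illustrative data).
x = np.arange(200.0)
ref = np.exp(-0.5 * ((x - 100.0) / 12.0) ** 2)
cur = np.exp(-0.5 * ((x - 103.5) / 12.0) ** 2)

t_est = align_1d(ref, cur)
```

No keypoints or descriptors are involved: every pixel with non-zero gradient contributes to the estimate. This is also why direct methods depend on good photometric calibration and a reasonable initial guess.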

6. Dense / volumetric SLAM

Dense map representations (TSDF, surfel, mesh) — typically RGB-D-sensor-driven, GPU-bound.

  • KinectFusion — Newcombe, Izadi, Hilliges, Molyneaux, Kim, Davison, Kohli, Shotton, Hodges, Fitzgibbon 2011 ISMAR. TSDF volumetric integration with frame-to-model ICP tracking. Foundational dense RGB-D SLAM. Confined to a fixed cubic volume.
  • Kintinuous — Whelan, Kaess, Fallon, Johannsson, Leonard, McDonald 2012 — KinectFusion with a shifting volume (cyclic buffer) to map larger environments.
  • ElasticFusion — Whelan, Salas-Moreno, Glocker, Davison, Leutenegger 2015 (“ElasticFusion: Dense SLAM Without A Pose Graph”, RSS / IJRR). Surfel-based deformable model with non-rigid deformation graph for loop closure; no explicit pose-graph.
  • InfiniTAM — Kähler, Prisacariu, Ren, Sun, Torr, Murray 2015 (and Nießner et al. 2013 introduced voxel hashing) — sparse-voxel TSDF on GPU, much higher memory-efficiency than dense grid.
  • BundleFusion — Dai, Nießner, Zollhöfer, Izadi, Theobalt 2017 SIGGRAPH. Global per-frame pose optimization at scale with on-the-fly re-integration of the TSDF, which largely removes the accumulated drift seen with KinectFusion on indoor RGB-D sequences.
  • Voxblox — Oleynikova, Taylor, Fehr, Siegwart, Nieto (ETH ASL) 2017 IROS — TSDF + Euclidean Signed Distance Field (ESDF) on CPU; primary mapping back-end for planning on UAVs.
  • Voxgraph — Reijgwart, Millane, Oleynikova, Siegwart, Cadena, Nieto 2020 (ETH ASL) — submap-based TSDF SLAM with pose-graph back-end.
  • SLAM++ (dense) / Stereo Bundle Mapping pipelines feed dense surfel or volumetric reconstructions off-line.
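The TSDF integration used by the KinectFusion family is, at its core, a per-voxel weighted running average of truncated signed distances. The sketch below shows that update along a single camera ray. It is a simplified illustration under stated assumptions (unit weight per observation, voxels indexed by depth along the ray), and the names integrate_tsdf, surface_depth, trunc and max_weight are hypothetical.

```python
import numpy as np

def integrate_tsdf(tsdf, weight, voxel_depths, measured_depth,
                   trunc=0.1, max_weight=50.0):
    """Fuse one depth measurement into the voxels along a ray."""
    sdf = measured_depth - voxel_depths       # positive in front of the surface
    valid = sdf > -trunc                      # ignore voxels far behind it
    d = np.clip(sdf / trunc, -1.0, 1.0)       # truncated, normalized distance
    w_new = np.where(valid, 1.0, 0.0)
    total = weight + w_new
    fused = np.where(total > 0,
                     (tsdf * weight + d * w_new) / np.maximum(total, 1e-12),
                     tsdf)
    return fused, np.minimum(total, max_weight)

def surface_depth(tsdf, weight, voxel_depths):
    """Locate the zero crossing by linear interpolation between voxels."""
    for i in range(len(tsdf) - 1):
        if weight[i] > 0 and weight[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            a = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + a * (voxel_depths[i + 1] - voxel_depths[i])
    return None

voxel_depths = np.linspace(0.0, 2.0, 101)     # 2 cm voxels along the ray
tsdf = np.ones_like(voxel_depths)             # initialized as free space
weight = np.zeros_like(voxel_depths)
for depth in (1.00, 1.02, 0.98):              # three noisy depth readings
    tsdf, weight = integrate_tsdf(tsdf, weight, voxel_depths, depth)

fused_surface = surface_depth(tsdf, weight, voxel_depths)
```

Averaging in the distance field, not on the raw depth values, is what lets the fused surface sit between voxel centers and smooths sensor noise over many frames.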

7. LiDAR SLAM

3D LiDAR (Velodyne HDL/VLP, Ouster OS, Livox MID/Avia) drove a separate algorithm lineage rooted in LOAM.

  • LOAM — Zhang + Singh 2014 RSS (“LOAM: Lidar Odometry and Mapping in Real-time”). Decompose into high-rate odometry (edge + planar features) and low-rate mapping. Long the #1 on KITTI odometry. Closed-source canonical implementation; many forks.
  • A-LOAM — Tong Qin, simplified Ceres-based reimplementation of LOAM, widely used as a starting point.
  • LeGO-LOAM — Shan + Englot 2018 IROS (“Lightweight and Ground-Optimized LiDAR Odometry and Mapping on Variable Terrain”). Ground-plane segmentation, two-step LM optimization, loop closure via ICP — for ground vehicles.
  • LIO-SAM — Shan, Englot, Meyers, Wang, Ratti, Rus 2020 IROS (“LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping”). LOAM features + IMU pre-integration in GTSAM factor-graph + GPS factor + loop closure. One of the most widely-deployed open-source LIO systems.
  • LIO-Mapping — Ye, Chen, Liu 2019 ICRA; LINS — Qin, Ye, Pranata, Han, Zhang, Liu 2020 ICRA — earlier tightly-coupled LIO.
  • FAST-LIO — Xu + Zhang (HKU MARS Lab) 2021 — iterated error-state Kalman filter LIO; very low latency.
  • FAST-LIO2 — Xu, Cai, He, Lin, Zhang 2022 T-RO — ikd-Tree incremental k-d tree for fast nearest-neighbor; significant speedup. Open-source de facto baseline for solid-state LiDAR (Livox).
  • Point-LIO — He, Xu, Chen, Kong, Yuan, Zhang 2023 (Advanced Intelligent Systems) — point-by-point LIO without frame batching; handles very aggressive motion.
  • FAST-LIO-MULTI — multi-LiDAR extension.
  • NDT-Mapping — Magnusson 2009 — Normal Distributions Transform scan matching; basis of Autoware’s mapping/localization stack.
  • KISS-ICP — Vizzo, Guadagnino, Mersch, Wiesmann, Behley, Stachniss 2023 RA-L (“KISS-ICP: In Defense of Point-to-Point ICP — Simple, Accurate, and Robust Registration If Done the Right Way”). Pure point-to-point ICP LiDAR odometry, no IMU, no features — surprisingly competitive.
  • SuMa / SuMa++ — Behley + Stachniss 2018-2019 — surfel-based LiDAR SLAM with semantics.
  • MULLS, HDL-SLAM, BLAM — other LiDAR pipelines.
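Several entries above (LeGO-LOAM loop closure, KISS-ICP) rest on point-to-point ICP. The sketch below shows the basic loop under simplifying assumptions: brute-force nearest neighbors, no outlier rejection, no motion compensation, and a small initial misalignment. It is not the KISS-ICP implementation; the names best_rigid_transform and icp are illustrative.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~ src @ R.T + t (Kabsch via SVD)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the SVD solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def icp(src, dst, n_iters=20, tol=1e-10):
    """Point-to-point ICP with brute-force nearest-neighbor association."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(n_iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)                 # data association
        err = d2[np.arange(len(cur)), nn].mean()
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total

# Synthetic scan: a 4x4x4 grid, moved by a 2 degree yaw and a small shift.
g = np.arange(4.0)
src = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T
yaw = np.deg2rad(2.0)
R_true = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                   [np.sin(yaw),  np.cos(yaw), 0.0],
                   [0.0,          0.0,         1.0]])
t_true = np.array([0.10, -0.05, 0.08])
dst = src @ R_true.T + t_true

R_est, t_est = icp(src, dst)
```

The example converges immediately because the motion is smaller than half the point spacing, so every nearest neighbor is the correct match. With larger motions the association step fails first, which is why practical systems add a motion prior, adaptive correspondence thresholds, or feature selection.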

8. Cartographer (Google)

  • Cartographer — Hess, Kohler, Rapp, Andor 2016 ICRA (“Real-Time Loop Closure in 2D LIDAR SLAM”). 2D + 3D LiDAR with submap-based scan matching and branch-and-bound loop-closure search; pose-graph optimization in Ceres. Native ROS 2 support. Heavily deployed in warehouse AGVs and indoor robotics.

9. Visual-Inertial Odometry (loosely or tightly coupled)

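VIO fuses camera and IMU measurements, either in a filter (MSCKF, ROVIO) or in an optimizer with explicit IMU pre-integration (OKVIS, VINS-Mono). It is the usual choice wherever camera-only methods fail (low texture, motion blur, rolling shutter, fast motion) and wherever metric scale is needed from a single camera.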

  • MSCKF — Mourikis + Roumeliotis 2007 ICRA (“A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation”). Marginalize landmarks in nullspace, keep only a sliding window of past poses in state. The basis of Project Tango and Magic Leap’s tracking.
  • ROVIO — Bloesch, Burri, Omari, Hutter, Siegwart 2015 IROS (“Robust Visual Inertial Odometry Using a Direct EKF-Based Approach”). Robocentric (state in body frame) iterated EKF with direct photometric updates.
  • OpenVINS — Geneva, Eckenhoff, Lee, Yang, Huang 2020 ICRA — open-source MSCKF-style filter VIO from RPNG, well-maintained.
  • OKVIS / OKVIS2 — see §4.
  • VINS-Mono / VINS-Fusion — see §4.
  • SVO — Forster, Pizzoli, Scaramuzza 2014 ICRA (“SVO: Fast Semi-Direct Monocular Visual Odometry”). Semi-direct: feature-based + photometric refinement, very fast on UAVs.
  • SVO 2.0 — Forster, Zhang, Gassner, Werlberger, Scaramuzza 2017 T-RO — multi-camera, edgelet support.
  • Basalt — Usenko, Demmel, Schubert, Stückler, Cremers 2020 RA-L (TUM) — visual-inertial mapping with non-linear factor recovery.

10. Event-camera SLAM

Dynamic Vision Sensors (DVS, DAVIS) output per-pixel asynchronous brightness-change events at microsecond latency. They are well suited to very-high-speed motion (drones, FPV racing) and high-dynamic-range scenes, where conventional cameras suffer motion blur or saturation.

  • EVO — Rebecq, Horstschaefer, Gallego, Scaramuzza 2017 RA-L (“EVO: A Geometric Approach to Event-based 6-DOF Parallel Tracking and Mapping in Real-Time”). PTAM-style for events.
  • DEVO — depth + event; ESVO — Zhou, Gallego, Shen 2021 (event stereo).
  • Ultimate SLAM — Rosinol Vidal, Rebecq, Horstschaefer, Scaramuzza 2018 RA-L — events + frames + IMU on UAV (paper title “Ultimate SLAM? Combining Events, Images, and IMU”). Strong performance in HDR/blur regimes.

11. Semantic and object-level SLAM

Integrate semantic segmentation or object-level reconstruction into the SLAM estimate.

  • SLAM++ — Salas-Moreno, Newcombe, Strasdat, Kelly, Davison 2013 CVPR. Pre-scanned 3D object instances are detected, posed and integrated as objects (not points) in an RGB-D SLAM graph.
  • MaskFusion — Rünz, Buffier, Agapito 2018 ISMAR — per-object surfel reconstruction with Mask-RCNN; MID-Fusion, EM-Fusion are contemporaneous.
  • Kimera — Rosinol, Abate, Chang, Carlone (MIT 2020-2022) — Kimera-VIO + Kimera-Mesher (3D mesh) + Kimera-Semantics (semantically-labeled mesh) + Kimera-RPGO (Robust Pose Graph Optimization using PCM, Pairwise Consistent Measurement set maximization). Kimera-Multi is the multi-robot extension.
  • Hydra — Hughes, Chang, Carlone 2022 RSS — real-time 3D Scene Graph construction (buildings → rooms → places → objects) on top of Kimera. Foundational for embodied-AI tasks.
  • Voxblox-plus-plus, PanopticFusion, MaskFusion continue the dense-semantic line.

12. Learned and neural-implicit SLAM

End-to-end differentiable bundle adjustment and neural-implicit map representations.

  • DROID-SLAM — Teed + Deng 2021 NeurIPS (“DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras”). RAFT-style flow + differentiable dense bundle adjustment layer (“DBA”); jointly optimizes poses + per-pixel inverse-depth. State-of-the-art accuracy at publication on TUM-RGBD, EuRoC, TartanAir.
  • iMAP — Sucar, Liu, Ortiz, Davison 2021 ICCV (“iMAP: Implicit Mapping and Positioning in Real-Time”). Single multilayer perceptron stores the scene as a continuous occupancy/color field; tracking by gradient descent against the MLP.
  • NICE-SLAM — Zhu, Peng, Larsson, Xu, Bao, Cui, Oswald, Pollefeys 2022 CVPR (“NICE-SLAM: Neural Implicit Scalable Encoding for SLAM”). Hierarchical voxel-grid features + small decoder MLPs; scalable to room-sized scenes.
  • Nicer-SLAM — Zhu et al. 2023 — monocular variant.
  • NeRF-SLAM — Rosinol, Leonard, Carlone 2022 — DROID-SLAM-based dense monocular tracking with depth uncertainty + Instant-NGP NeRF for dense reconstruction.
  • OrbeezSLAM — Chung, Tseng, Hsieh et al. 2022 — Instant-NGP-based real-time NeRF-SLAM with ORB front-end.
  • Co-SLAM, ESLAM, Point-SLAM, GO-SLAM — recent academic NeRF-SLAM variants.
  • MonoGS / Gaussian Splatting SLAM — Matsuki, Murai, Kelly, Davison 2024 CVPR (“Gaussian Splatting SLAM”). 3D Gaussian primitives optimized photometrically; very high rendering quality and competitive accuracy. The 2024-25 frontier.
  • Photo-SLAM, SplaTAM, Gaussian-SLAM — further members of the 3DGS SLAM family proliferating since 2024.
  • Pearl, NeuralRecon, VolSDF-SLAM — neural surface SLAMs.

13. Multi-robot collaborative SLAM

  • CCM-SLAM — Schmuck + Chli 2017-2019 (ETH Vision for Robotics Lab) — Centralized Collaborative Monocular SLAM with a server-and-agent architecture.
  • DOOR-SLAM — Lajoie, Ramtoula, Chang, Carlone, Beltrame 2019 RA-L — Distributed Outlier-Robust SLAM; combines distributed pose-graph optimization with PCM (Pairwise Consistent Measurement set maximization) for outlier-robust inter-robot loop closures.
  • Kimera-Multi — Chang, Tian, How, Carlone 2021 — multi-robot Kimera with Kimera-Distributed.
  • Swarm-SLAM — Lajoie + Beltrame 2024 RA-L — open-source decentralized C-SLAM framework for fleets.
  • Maplab Multi — ETH ASL.
  • DCL-SLAM, D-Lio-SAM — distributed LiDAR variants.

14. Comparison table

Algorithm | Sensor | Back-end | Map | Year | Typical use | Code
EKF-SLAM (canonical) | mono cam | EKF | sparse landmarks | 1990 | small indoor | custom
MonoSLAM | mono cam | EKF | sparse | 2007 | AR research | open
GMapping | 2D LiDAR | RBPF | 2D occupancy | 2007 | mobile-base ROS-1 | open
FastSLAM 2.0 | range/cam | RBPF | landmarks | 2003 | offroad | open
PTAM | mono cam | bundle adj | sparse | 2007 | AR | open
ORB-SLAM3 | mono / stereo / RGB-D + IMU | g2o BA | sparse | 2021 | drones, AR | open
VINS-Mono | mono cam + IMU | Ceres slid-win | sparse | 2018 | UAV | open
VINS-Fusion | stereo + IMU + GPS | Ceres | sparse | 2019 | UGV/UAV | open
OKVIS | stereo + IMU | keyframe BA | sparse | 2015 | research | open
ROVIO | mono + IMU | iEKF | sparse direct | 2015 | UAV | open
OpenVINS | mono / stereo + IMU | MSCKF | sparse | 2020 | research baseline | open
MSCKF | mono + IMU | EKF (null-space) | sparse | 2007 | Tango, MagicLeap | proprietary
SVO 2.0 | mono / multi-cam + IMU | semi-direct BA | semi-dense | 2017 | UAV | open
LSD-SLAM | mono cam | pose-graph | semi-dense | 2014 | research | open
DSO | mono cam | photo BA | sparse direct | 2017 | research | open
DTAM | mono cam | dense opt | dense | 2011 | research/AR | research
KinectFusion | RGB-D | ICP frame-to-model | TSDF | 2011 | indoor scan | open
ElasticFusion | RGB-D | non-rigid deform | surfel | 2015 | indoor | open
BundleFusion | RGB-D | global BA | TSDF | 2017 | indoor scan | open
Voxblox | depth | TSDF + ESDF | volumetric | 2017 | UAV planning map | open
RTABMap | stereo / RGB-D | pose-graph + DBoW2 | dense optional | 2013 | ROS RGB-D | open
LOAM | 3D LiDAR | feature scan-match | point cloud | 2014 | KITTI / AV | closed
LeGO-LOAM | 3D LiDAR | feature + ICP | point cloud | 2018 | UGV | open
LIO-SAM | 3D LiDAR + IMU | GTSAM factor graph | point cloud | 2020 | AV / UGV | open
FAST-LIO2 | LiDAR (Livox) + IMU | iEKF + ikd-tree | point cloud | 2022 | drone / fast UGV | open
Cartographer | 2D / 3D LiDAR | Ceres branch-bound | submaps + grid | 2016 | warehouse AGV | open
KISS-ICP | 3D LiDAR | point-to-point ICP | none | 2023 | LiDAR odometry baseline | open
Kimera | stereo + IMU | GTSAM + RPGO | semantic mesh | 2020 | semantic mapping | open
Hydra | stereo + IMU | Kimera + scene graph | 3D scene graph | 2022 | embodied AI | open
DROID-SLAM | mono / stereo / RGB-D | dense diff BA | per-pixel depth | 2021 | learned VO | open
iMAP | RGB-D | MLP gradient descent | neural-implicit | 2021 | research | research
NICE-SLAM | RGB-D | hierarchical NeRF | neural-implicit | 2022 | research | open
Gaussian Splatting SLAM (MonoGS) | mono / RGB-D | 3DGS photo BA | 3D Gaussians | 2024 | research frontier | open

15. Front-end techniques (shared building blocks)

  • Feature extractors — Harris (1988), Shi-Tomasi (1994); SIFT (Lowe 2004); SURF (Bay 2008); FAST (Rosten 2006); ORB (Rublee, Rabaud, Konolige, Bradski 2011); BRISK (Leutenegger 2011). Learned: SuperPoint (DeTone, Malisiewicz, Rabinovich 2018), R2D2 (Revaud 2019), KP2D, DISK (Tyszkiewicz 2020), ALIKE / ALIKED (2022/23).
  • Descriptors and matching — BRIEF (Calonder 2010), FREAK (Alahi 2012). Learned matchers: SuperGlue (Sarlin, DeTone, Malisiewicz, Rabinovich 2020), LightGlue (Lindenberger, Sarlin, Pollefeys 2023 ICCV), which is much faster than SuperGlue.
  • Dense feature matching — LoFTR (Sun, Shen, Wang, Bao, Zhou 2021 CVPR) — detector-free transformer correspondences.
  • Photometric / direct tracking — KLT (Lucas-Kanade 1981), inverse compositional Lucas-Kanade (Baker + Matthews 2001), direct image alignment (Engel et al.).
  • Loop closure — DBoW2 vocabulary tree (Gálvez-López + Tardós 2012); NetVLAD (Arandjelović, Gronat, Torii, Pajdla, Sivic 2016) — learned place recognition; CosPlace / MixVPR more recent. Geometric verification by PnP + RANSAC or essential matrix decomposition.
  • Outlier rejection — RANSAC (Fischler + Bolles 1981); MAGSAC++ (Barath 2020); GNC (Graduated Non-Convexity, Yang + Carlone 2020) — applied at the graph optimization level.
  • IMU pre-integration — Forster, Carlone, Dellaert, Scaramuzza 2017 T-RO (“On-Manifold Preintegration for Real-Time Visual-Inertial Odometry”) — the canonical formulation used by most modern optimization-based tightly-coupled VIO. Composes IMU integrals between two keyframes on SO(3)×R^3 in body frame so that the residual at the back-end depends linearly on first-order corrections to gyro and accel biases — making it feasible to re-linearize without re-integrating raw IMU samples.
  • Robust kernels — Huber, Cauchy, Tukey, DCS (Dynamic Covariance Scaling, Agarwal 2013), switchable constraints (Sünderhauf + Protzel 2012). All of these reduce the influence of false-positive loop closures inside the standard NLS optimization.
  • Marginalization — Schur complement to remove old states while preserving information; foundational for sliding-window VIO (VINS-Mono, OKVIS, OpenVINS) where memory must be bounded.
  • Sparse Cholesky / QR — SuiteSparse CHOLMOD and SPQR (Davis); the numerical workhorses underneath Ceres / g2o / GTSAM linear-solver back-ends.
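The marginalization bullet above can be made concrete on a small linear system. The sketch below is a minimal illustration assuming a Gaussian in information form (H x = b) that has already been linearized; the function name marginalize and the 3-variable example are hypothetical and not taken from any of the listed systems.

```python
import numpy as np

def marginalize(H, b, keep, drop):
    """Remove the `drop` variables from H x = b via the Schur complement."""
    Hkk = H[np.ix_(keep, keep)]
    Hkd = H[np.ix_(keep, drop)]
    Hdd = H[np.ix_(drop, drop)]
    Hdd_inv = np.linalg.inv(Hdd)
    H_marg = Hkk - Hkd @ Hdd_inv @ Hkd.T      # information on the kept states
    b_marg = b[keep] - Hkd @ Hdd_inv @ b[drop]
    return H_marg, b_marg

# A 3-pose chain in information form (symmetric positive definite).
H = np.array([[ 3.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
b = np.array([0.0, 0.0, 1.0])

# Drop the oldest pose (index 0) and keep the other two.
H_marg, b_marg = marginalize(H, b, keep=[1, 2], drop=[0])

x_full = np.linalg.solve(H, b)
x_marg = np.linalg.solve(H_marg, b_marg)
```

The solution for the kept poses is unchanged, which is the sense in which information is preserved. In a real sliding window the Schur complement also couples every state that shared a factor with the removed one (fill-in), and the resulting prior is fixed at the linearization point used when marginalizing.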

16. Selection heuristics

  • Indoor drone, monocular, AR target → ORB-SLAM3 (mono-inertial) or VINS-Mono.
  • Outdoor autonomous vehicle mapping → LIO-SAM or FAST-LIO2 + Cartographer 3D, fused with cameras for semantics; post-process with g2o or GTSAM for HD-map deliverable.
  • Warehouse AGV with 2D LiDAR → Cartographer 2D (ROS 2) — the default deployment.
  • ROS 2 mobile base with RGB-D → RTABMap (most mature ROS RGB-D pipeline) or Cartographer (3D).
  • Visual-inertial AR / handheld → proprietary stacks (ARKit, ARCore; internals unpublished, generally described as filter-based VIO in the MSCKF tradition) or open-source ROVIO / OpenVINS / VINS-Fusion.
  • High-speed UAV / FPV racing → FAST-LIO2 (if LiDAR) or SVO 2.0 + event camera (Ultimate SLAM) for blur regimes.
  • HD-map for self-driving → LIO-SAM + Cartographer + offline g2o/GTSAM batch optimization; manual loop-closure curation.
  • UAV photogrammetry, no real-time → COLMAP offline SfM (Schönberger 2016) — not SLAM but related.
  • Bin-picking / dense reconstruction → KinectFusion / ElasticFusion / BundleFusion (RGB-D), or for research-grade NICE-SLAM / Gaussian Splatting SLAM.
  • Humanoid robot perception subsystem → Kimera VIO + Hydra scene graph for high-level semantics.
  • Quadruped (Spot, ANYmal) → typically proprietary onboard with ICP + visual; open-source equivalent is LIO-SAM + Kimera.
  • Race-car high-speed autonomous → FAST-LIO2 + Cartographer with custom high-frequency loop-closure logic.
  • Surgical endoscope / minimally-invasive → DROID-SLAM or specialized learned monocular SLAM (textureless tissue).
  • Underwater (sonar/visual) → Pose-graph SLAM with sonar registration (DIDSON, Sonar-SLAM); often Kalman-filter-based with DVL/INS aiding.
  • Multi-robot fleet → Kimera-Multi or Swarm-SLAM or DOOR-SLAM; require communication-aware design.
  • Resource-constrained microcontroller / embedded → EKF-SLAM with hand-tuned landmark count, or scan-matching-only (Hector SLAM) with no global optimization.
  • GPS-denied long-range (subterranean / DARPA SubT-style) → LIO-SAM or FAST-LIO2 fused with thermal / radar; require resilience to dust, smoke, dynamic obstacles, and degraded LiDAR geometry (long featureless tunnels). Teams at the 2021 DARPA SubT Final ran LiDAR-centric multi-modal pipelines with explicit degeneracy handling (for example CERBERUS's CompSLAM, CSIRO's Wildcat, CoSTAR's LOCUS / LAMP).
  • Dynamic / non-rigid scene → DynaSLAM (Bescos 2018), DS-SLAM, VDO-SLAM — all extend ORB-SLAM with semantic masking or motion segmentation to ignore moving objects. Pure NeRF / 3DGS SLAM still struggles with dynamics.
  • Low-cost ground robot (Roomba-class) → wheel-odometry + 2D LiDAR with GMapping or Cartographer 2D; or pure visual with monocular ORB-SLAM3.

17. Datasets and benchmarks

Algorithm comparisons depend on a small set of standard datasets that have become the de facto evaluation harnesses:

  • KITTI (Geiger, Lenz, Stiller, Urtasun 2013 IJRR) — automotive stereo + Velodyne HDL-64 + GPS/INS ground truth; the canonical autonomous-driving SLAM benchmark. KITTI-360 (2022) extends with 360° panoramas.
  • EuRoC MAV (Burri et al. 2016 IJRR) — drone stereo + IMU + Vicon ground truth; the canonical visual-inertial benchmark for UAV-scale motion.
  • TUM RGB-D (Sturm, Engelhard, Endres, Burgard, Cremers 2012 IROS) — handheld RGB-D + mocap; the canonical indoor RGB-D evaluation.
  • TUM VI (Schubert, Goll, Demmel, Usenko, Stueckler, Cremers 2018) — visual-inertial handheld, fisheye stereo.
  • TartanAir (Wang et al. 2020) — large-scale photorealistic simulation, diverse environments and motion; used as the DROID-SLAM training corpus.
  • ScanNet (Dai 2017) and Replica (Straub 2019) — indoor RGB-D scene datasets used by neural-implicit SLAMs (iMAP, NICE-SLAM).
  • Newer College (Ramezani 2020) — Oxford handheld 3D LiDAR + IMU + cameras + survey-grade ground truth; LIO benchmark.
  • NCLT (Carlevaris-Bianco 2016, U-Michigan) — Segway long-term outdoor (15-month) LiDAR + camera; long-term SLAM benchmark.
  • Hilti SLAM Challenge (2021-2023) — industrial / construction environment, multi-sensor; informs LIO research.

Metrics: Absolute Trajectory Error (ATE), reported as RMSE in meters after rigid SE(3) alignment (Sim(3) alignment for monocular systems, whose scale is unobservable), is the dominant single-number metric (Sturm 2012); Relative Pose Error (RPE) measures drift over fixed-distance windows. For 3D map quality: Chamfer distance to ground-truth mesh, F-score at a threshold.
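As a rough illustration of the ATE computation, the sketch below aligns an estimated trajectory to ground truth with the closed-form Umeyama solution and reports the RMSE of the remaining position error. It assumes the two trajectories are already time-associated N×3 position arrays; the function names umeyama and ate_rmse are illustrative and do not come from any particular evaluation toolkit.

```python
import numpy as np

def umeyama(est, gt, with_scale):
    """Closed-form s, R, t minimizing sum ||gt - (s * R @ est + t)||^2."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / len(est)                  # 3x3 cross-covariance
    U, d, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # avoid a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / len(est)
    s = np.trace(np.diag(d) @ S) / var_e if with_scale else 1.0
    t = mu_g - s * R @ mu_e
    return s, R, t

def ate_rmse(est, gt, with_scale=False):
    """ATE RMSE after SE(3) alignment, or Sim(3) if with_scale is True."""
    s, R, t = umeyama(est, gt, with_scale)
    aligned = s * est @ R.T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))

# Ground truth: a helix. Estimate: the same path at half scale, in a
# rotated and shifted frame, as a monocular system might produce.
a = np.linspace(0.0, 4.0 * np.pi, 50)
gt = np.stack([np.cos(a), np.sin(a), 0.1 * a], axis=1)
yaw = np.deg2rad(30.0)
Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
               [np.sin(yaw),  np.cos(yaw), 0.0],
               [0.0,          0.0,         1.0]])
est = 0.5 * gt @ Rz.T + np.array([1.0, 2.0, 3.0])

ate_sim3 = ate_rmse(est, gt, with_scale=True)
ate_se3 = ate_rmse(est, gt, with_scale=False)
```

Here the Sim(3) error is essentially zero while the SE(3) error is large, which shows why the choice of alignment must be stated alongside any reported ATE.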

18. Theoretical underpinnings (brief)

  • Observability — Visual-inertial systems have four unobservable directions (global position xyz, yaw); pitch and roll are observable thanks to gravity. Visual-only mono SLAM additionally has scale unobservable (7 unobservable degrees: SE(3) gauge + scale). LiDAR-inertial: same four unobservable as VIO. Consistency-aware filtering (First-Estimates Jacobian, OC-EKF — Huang, Mourikis, Roumeliotis 2010) enforces these.
  • Information form vs covariance form — equivalent dual representations of the same Gaussian; information form is sparse when factor graphs are sparse, hence the dominance of information-form smoothers.
  • Gauge freedom — SLAM is defined up to a rigid transform of the global frame (and scale for monocular); the back-end fixes this by anchoring the first pose or by computing on the manifold modulo the gauge group.
  • Maximum-A-Posteriori (MAP) — the unifying view of modern SLAM: minimize the negative log-posterior over poses+landmarks, where each measurement, motion constraint, or prior contributes a Gaussian (or robust-kernel) term. With Gaussian noise this is a nonlinear weighted least-squares problem, and it becomes linear least-squares when the models are also linear. The factor graph encodes the factorization of that posterior; variable elimination turns it into a Bayes net (or, in iSAM2, a Bayes tree).
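The MAP view in the last bullet can be written out as a short derivation. The notation here is an assumption made for illustration: X is the set of all poses and landmarks, Z the set of measurements z_k, h_k the measurement model of factor k acting on the subset X_k, and Σ_k its noise covariance. Priors are treated as additional factors.

```latex
\begin{aligned}
X^{\star} &= \arg\max_{X} \; p(X \mid Z)
           = \arg\max_{X} \; \prod_{k} p(z_k \mid X_k) \\
          &= \arg\min_{X} \; \sum_{k} -\log p(z_k \mid X_k) \\
          &= \arg\min_{X} \; \sum_{k} \bigl\| h_k(X_k) - z_k \bigr\|^{2}_{\Sigma_k},
\qquad \text{where } \|e\|^{2}_{\Sigma} = e^{\top} \Sigma^{-1} e .
\end{aligned}
```

The last step holds for zero-mean Gaussian noise, z_k = h_k(X_k) + N(0, Σ_k). A robust kernel ρ replaces the squared norm with ρ(‖·‖) in the final line.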

19. Citations (primary)

  • Cadena, Carlone, Carrillo, Latif, Scaramuzza, Neira, Reid, Leonard. “Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age.” IEEE T-RO 32(6), 2016.
  • Davison, Reid, Molton, Stasse. “MonoSLAM: Real-Time Single Camera SLAM.” IEEE T-PAMI 29(6), 2007.
  • Klein, Murray. “Parallel Tracking and Mapping for Small AR Workspaces.” ISMAR 2007.
  • Engel, Koltun, Cremers. “Direct Sparse Odometry.” IEEE T-PAMI 40(3), 2018.
  • Mur-Artal, Tardós. “ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras.” IEEE T-RO 33(5), 2017.
  • Campos, Elvira, Gómez Rodríguez, Montiel, Tardós. “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM.” IEEE T-RO 37(6), 2021.
  • Hess, Kohler, Rapp, Andor. “Real-Time Loop Closure in 2D LIDAR SLAM.” ICRA 2016.
  • Shan, Englot, Meyers, Wang, Ratti, Rus. “LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping.” IROS 2020.
  • Xu, Cai, He, Lin, Zhang. “FAST-LIO2: Fast Direct LiDAR-Inertial Odometry.” IEEE T-RO 38(4), 2022.
  • Qin, Li, Shen. “VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator.” IEEE T-RO 34(4), 2018.
  • Newcombe, Izadi, Hilliges, Molyneaux, Kim, Davison, Kohli, Shotton, Hodges, Fitzgibbon. “KinectFusion: Real-time Dense Surface Mapping and Tracking.” ISMAR 2011.
  • Whelan, Salas-Moreno, Glocker, Davison, Leutenegger. “ElasticFusion: Real-time Dense SLAM and Light Source Estimation.” IJRR 35(14), 2016.
  • Teed, Deng. “DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras.” NeurIPS 2021.
  • Matsuki, Murai, Kelly, Davison. “Gaussian Splatting SLAM.” CVPR 2024.
  • Rosinol, Abate, Chang, Carlone. “Kimera: an Open-Source Library for Real-Time Metric-Semantic Localization and Mapping.” ICRA 2020.
  • Forster, Carlone, Dellaert, Scaramuzza. “On-Manifold Preintegration for Real-Time Visual-Inertial Odometry.” IEEE T-RO 33(1), 2017.
  • Grisetti, Stachniss, Burgard. “Improved Techniques for Grid Mapping with Rao-Blackwellized Particle Filters.” IEEE T-RO 23(1), 2007.
  • Smith, Self, Cheeseman. “Estimating Uncertain Spatial Relationships in Robotics.” In Autonomous Robot Vehicles, Springer, 1990.
  • Montemerlo, Thrun, Koller, Wegbreit. “FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem.” AAAI 2002.
  • Mourikis, Roumeliotis. “A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation.” ICRA 2007.
  • Kümmerle, Grisetti, Strasdat, Konolige, Burgard. “g2o: A General Framework for Graph Optimization.” ICRA 2011.
  • Kaess, Johannsson, Roberts, Ila, Leonard, Dellaert. “iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree.” IJRR 31(2), 2012.
  • Dellaert, Kaess. “Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing.” IJRR 25(12), 2006.
  • Lu, Milios. “Globally Consistent Range Scan Alignment for Environment Mapping.” Autonomous Robots 4(4), 1997.
  • Bloesch, Burri, Omari, Hutter, Siegwart. “Iterated Extended Kalman Filter Based Visual-Inertial Odometry Using Direct Photometric Feedback.” IJRR 36(10), 2017.
  • Sucar, Liu, Ortiz, Davison. “iMAP: Implicit Mapping and Positioning in Real-Time.” ICCV 2021.
  • Zhu, Peng, Larsson, Xu, Bao, Cui, Oswald, Pollefeys. “NICE-SLAM: Neural Implicit Scalable Encoding for SLAM.” CVPR 2022.
  • Vizzo, Guadagnino, Mersch, Wiesmann, Behley, Stachniss. “KISS-ICP: In Defense of Point-to-Point ICP — Simple, Accurate, and Robust Registration If Done the Right Way.” IEEE RA-L 8(2), 2023.
  • Hughes, Chang, Carlone. “Hydra: A Real-Time Spatial Perception System for 3D Scene Graph Construction and Optimization.” RSS 2022.
  • Sarlin, DeTone, Malisiewicz, Rabinovich. “SuperGlue: Learning Feature Matching with Graph Neural Networks.” CVPR 2020.
  • Lindenberger, Sarlin, Pollefeys. “LightGlue: Local Feature Matching at Light Speed.” ICCV 2023.
  • Gálvez-López, Tardós. “Bags of Binary Words for Fast Place Recognition in Image Sequences.” IEEE T-RO 28(5), 2012.