LiDAR, Depth Cameras & RGB — Robotics Exteroception

See also (Tier 3 family index): Perception Sensors

1. At a glance

Where [[Robotics/sensors-pose-motion]] answers “where am I?”, this note answers “what’s outside me?“. Exteroception — the sensing layer that maps the environment — is what separates a controllable machine from an autonomous one. Without it, a robot can servo joints to commanded angles but cannot avoid a wall, pick a part off a moving conveyor, or stop for a pedestrian.

Four sensor families dominate exteroceptive perception in 2026, plus three emerging:

  • RGB cameras — pinhole-projection imagers, 1–60+ Mpx, 30–240 fps. Passive (need ambient light), texture-rich, no native depth. Bayer-filter CMOS sensors are the universal substrate (Sony IMX series, ON Semi AR series, OmniVision OV series).
  • Depth cameras — RGB-D devices that emit their own illumination and recover Z per pixel:
    • Stereo (passive triangulation between two RGB cameras) — ZED, RealSense D435/D455.
    • Structured light (project a known IR pattern, observe deformation) — Kinect v1, RealSense D435 (assisted stereo).
    • Time-of-Flight (ToF) — measure photon round-trip time directly (dToF: Kinect Azure, iPhone Lidar, Lucid Helios) or AMCW phase shift (iToF: Intel L515 (EoL), Microsoft AzureKinect-class).
  • LiDAR — scanning or flash laser ranging at 200 m+:
    • Mechanical spinning — 360° horizontal, fixed vertical lines (Velodyne HDL/VLP, Ouster OS series, Hesai Pandar). 5–25 Hz frame rate; 16 to 128+ channels.
    • Solid-state MEMS or OPA — fixed FoV, microelectromechanical or optical-phased-array beam steering (Innoviz, Luminar Iris+, Hesai AT128). No moving parts at the macro scale; automotive-qualified.
    • FMCW LiDAR — coherent detection, simultaneous range and radial velocity (Aeva Aeries II, Mobileye Lumotive). Immune to interference from other LiDARs.
    • Flash LiDAR — single-shot wide-FoV illumination, no scanning; range-limited.
  • mm-wave radar — 24, 60, 77, 79 GHz FMCW radar producing range–Doppler–azimuth tensors. Robust through fog, rain, dust, and snow. Continental ARS548, Bosch LRR, TI AWR2944. Lower angular resolution than camera or LiDAR; resolves radial velocity directly.

Emerging:

  • Event cameras — per-pixel temporal-contrast (DVS / DAVIS / Metavision) with µs-class latency and HDR > 120 dB; Prophesee, iniVation. Sparse async output.
  • Polarization cameras — Sony IMX264MZR / IMX250MZR per-pixel micropolarizer; specular vs diffuse separation, glare suppression, transparent-object handling.
  • Hyperspectral / NIR — line-scan or snapshot tens-to-hundreds of bands; Headwall, Specim, Cubert. Material classification in agriculture, recycling, food sorting.

First ask before selecting:

  1. Range — 0.1 m (cobot fingertip ToF), 5 m (indoor mobile robot), 100 m (campus AGV), 250 m (highway autonomy)?
  2. Refresh rate — 10 Hz mapping, 30 Hz tracking, 100 Hz reactive avoidance, 1 kHz VR?
  3. Illumination & weather — controlled indoors, daylight outdoors, fog/rain/snow, night?
  4. Reflectance — black tyre vs white wall vs glass vs sky?
  5. Bandwidth & compute — what does the downstream perception stack ingest (and where does it run)?
  6. Cost, SWaP, certification80 k automotive LiDAR; cobot-class IP67 to military MIL-STD-810.

The dominant trend through 2026 is sensor fusion as the default architecture: there is no single sensor that wins on every axis, so production autonomy stacks combine cameras (semantics, density), LiDAR (geometric truth, range), radar (velocity, weather), and IMU (ego-motion, see [[Robotics/sensors-pose-motion]]).

2. First principles

RGB camera — projection model

A pinhole camera maps a 3D point P = (X, Y, Z) in the camera frame to a 2D pixel p = (u, v) via the intrinsic matrix K:

[u]       [fx  0  cx] [X/Z]
[v]  =    [ 0 fy  cy] [Y/Z]
[1]       [ 0  0   1] [ 1 ]

with focal lengths f_x, f_y in pixels (= focal-length-mm × pixels-per-mm), principal point (c_x, c_y) at the optical axis, and the standard OpenCV convention +Z forward, +X right, +Y down. Real lenses add distortion — the radial-tangential Brown-Conrady model is the default in OpenCV calibration:

x' = x·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x·y + p2·(r² + 2·x²)
y' = y·(1 + k1·r² + k2·r⁴ + k3·r⁶) + p1·(r² + 2·y²) + 2·p2·x·y

with r² = x² + y² in normalised coordinates. Wide-angle lenses (FoV > 100°) need a higher-order model — Kannala-Brandt fisheye (4 parameters), MEI omnidirectional, or division model.

The image sensor itself is a Bayer-pattern CMOS array: 2×2 RGGB mosaic over photodiodes; the missing channels at each pixel are interpolated (“demosaiced”, typically AHD or Malvar-He-Cutler). Image-signal processor (ISP) on the sensor or host applies black-level correction, lens shading correction, white balance, demosaic, denoise, sharpen, gamma, tone mapping. For computer vision, capture RAW Bayer or 12–16 bit linear before the ISP destroys photometric linearity.

Stereo depth — triangulation

Two rectified cameras with parallel optical axes, baseline B, focal length f (pixels):

Z = f · B / d

with disparity d = u_L − u_R the horizontal pixel offset of corresponding features. Disparity is computed by stereo matching: block-match (BM), semi-global match (SGBM, Hirschmüller 2008), or learned (PSMNet, RAFT-Stereo, FoundationStereo).

Differentiating gives the depth uncertainty:

σ_Z = (Z² / (f · B)) · σ_d

Depth accuracy degrades quadratically with range. Doubling baseline halves the σ_Z at any given range, but increases the minimum range (sub-baseline parallax becomes a vanishing disparity) and shrinks the stereo-overlap FoV.

Structured light

Project a known pseudo-random IR pattern (Kinect v1 used a fixed-position diffractive optical element); a single IR camera observes the pattern’s deformation in 3D and triangulates each pattern element against its design position. Effective baseline ≈ projector-to-sensor distance. Works well indoors at 0.5–5 m with controlled IR exposure; saturates under direct sunlight because solar IR overwhelms the projected pattern. Intel RealSense D-series uses active stereo (two IR cameras + IR pattern projector) — the pattern adds texture to featureless surfaces but disparity is still computed by stereo matching; the projector is effectively a textured “fill light”.

Time-of-flight

  • Direct ToF (dToF) — emit a short laser pulse (ns) and time the photon arrival at a single-photon avalanche diode (SPAD) array. Range = c · Δt / 2. Sony IMX516, ams VL53L8, Apple LiDAR (Sony custom). Excellent accuracy (mm-class), wide dynamic range, sun-tolerant.
  • Indirect ToF (iToF) — amplitude-modulate the illumination at frequency f_m (10–100 MHz), measure phase shift φ between emitted and reflected; Z = (c · φ) / (4π · f_m). Cheaper sensor (CMOS PMD pixel) but suffers range aliasing at d_max = c / (2 · f_m), multi-path interference, and sun saturation.

The fundamental ToF precision floor is photon-shot-noise limited:

σ_Z ≈ (c / (4π · f_m · SNR)) · √(BW)    [for iToF]
σ_Z ≈ (c · σ_t / 2)                     [for dToF, σ_t = SPAD jitter]

A typical iToF at 80 MHz modulation with SNR = 100 (well-lit scene) achieves σ_Z ≈ 3 mm at 1 m. dToF with 200 ps SPAD jitter achieves σ_Z ≈ 30 mm independent of range (until photon-count drops).

LiDAR — scanning principles

Each laser pulse measures one range sample. Frame rate × points-per-frame = point rate (~10⁵ to 10⁷ points/s). Three optical-architecture families:

  1. Mechanical spinning — a stack of N laser-detector pairs rotates 360°. Vertical resolution = N (fixed, set by laser stack). Horizontal resolution = pulses-per-rotation. Velodyne VLP-16/32, Ouster OS0/1/2 (64–128 channels using VCSEL arrays + SPAD focal-plane), Hesai Pandar.
  2. Solid-state MEMS — a single (or few) laser-detector pair sweeps across a fixed FoV via a vibrating mirror (~1 mm Si MEMS). No 360° coverage but no macro moving parts — automotive-grade reliability. Innoviz Two, Luminar Iris+, Hesai AT128.
  3. Optical phased array (OPA) — array of phase-controlled emitters steers a beam electronically with zero moving parts. Mobileye/Lumotive in 2024–2026 ramp. Promising but immature compared to MEMS.

FMCW is orthogonal to scanning architecture: instead of a pulse, emit a frequency-chirped continuous wave; the beat between the transmitted and returned signal gives range and radial velocity simultaneously. Coherent detection at the photodiode rejects all incoherent light — sun-insensitive and immune to cross-talk from other LiDARs at the same wavelength. Aeva Aeries II, Mobileye, Scantinel. Cost has historically been the barrier.

mm-wave radar

77 GHz automotive radars are FMCW: emit a linear frequency ramp (“chirp”) of duration T_c and bandwidth B_w. The beat frequency f_b = (2 · R · B_w) / (c · T_c) gives range R; Doppler shift across multiple chirps gives radial velocity; an antenna array (MIMO virtual array) gives azimuth and elevation by digital beamforming.

range resolution    ΔR = c / (2 · B_w)
velocity resolution Δv = λ / (2 · T_frame)
angular resolution  Δθ ≈ λ / (N · d_ant)

A TI AWR2944 with B_w = 4 GHz achieves ΔR = 3.75 cm at 250 m max range; a 4 Tx × 4 Rx MIMO array gives 16 virtual antennas → ~3.5° azimuth resolution. Weather robustness comes from the cm-class wavelength (λ = 3.9 mm at 77 GHz) being much larger than typical fog droplets and raindrops; attenuation is ~0.1 dB/km in heavy rain vs ~30 dB/km for 905 nm LiDAR.

Event cameras

A DVS / DAVIS pixel asynchronously emits a “+1” or “−1” spike when its log-intensity changes by ≥ a contrast threshold C (typically 10–25 %). No frame rate; output is a stream of (x, y, t, polarity) events at µs timestamps. Dynamic range > 120 dB (vs ~70 dB for a standard CMOS); native temporal resolution 1 µs. Prophesee Genx320 / EVK4, iniVation DVXplorer. The event stream demands a different algorithmic stack (asynchronous CNNs, spiking networks, event-by-event SLAM) — see [[Robotics/computer-vision-robotics]].

3. Practical math / worked examples

Worked example A — Stereo depth accuracy at range

A Stereolabs ZED 2i has baseline B = 12 cm and rectified focal length f = 1400 px (at 1080p / HD mode, 110° HFoV lens). Block-matching disparity precision on a well-textured surface is σ_d ≈ 0.5 px (good lighting, sub-pixel refinement enabled).

At Z = 1 m :  σ_Z = (1² · 0.5) / (1400 · 0.12)        = 0.003 m = 3 mm
At Z = 5 m :  σ_Z = (5² · 0.5) / (1400 · 0.12)        = 0.074 m = 7.4 cm
At Z = 10 m:  σ_Z = (10² · 0.5) / (1400 · 0.12)       = 0.30 m  = 30 cm
At Z = 20 m:  σ_Z = (20² · 0.5) / (1400 · 0.12)       = 1.19 m

Implication. Indoor mobile robot at 5 m: stereo is fine. Curbside autonomous vehicle at 50 m: stereo is useless — for the ZED 2i, σ_Z at 50 m is 7.4 m. Need LiDAR, or a stereo rig with B = 1 m (e.g. truck-roof-mounted), or fuse with monocular metric-depth networks (DepthAnything-V2, Metric3D, ZoeDepth) that exploit perspective cues to constrain absolute scale.

Baseline-tuning rule of thumb: pick B ≈ Z_max / 100 for ~1 % depth precision at maximum range. For Z_max = 20 m, B ≈ 20 cm; for Z_max = 100 m, B ≈ 1 m.

Worked example B — LiDAR point density on a wall

Velodyne VLP-16 specs: 16 lasers across 30° vertical FoV (so 2.0° vertical spacing), 0.1–0.4° horizontal angular resolution (configurable), 10 Hz spin rate, 600 000 points/s total firing rate (single-return mode).

At 30 m range, the lateral spacing of returns on a wall normal to the LiDAR is:

horizontal:  30 m · tan(0.2°) = 0.105 m ≈ 10.5 cm
vertical:    30 m · tan(2.0°) = 1.05 m

So one full sweep at 30 m gives a roughly 10 cm × 105 cm grid — very anisotropic, “rakes” with big vertical gaps. A pedestrian at 30 m (1.7 m tall, 0.4 m wide) gets hit by ~2 vertical lines × 4 horizontal columns = ~8 returns total. Detection works but classification is hard.

Compare Ouster OS1-128 at 30 m:

horizontal:  30 m · tan(0.18°)  = 0.094 m
vertical:    30 m · tan(45°/128) = 0.184 m

Same pedestrian: ~9 vertical × 4 horizontal = ~36 returns, sufficient for class label from PointPillars / CenterPoint. Modern 128-channel sensors are 5–10× denser at any given range, which is why automotive LiDAR moved decisively from 16/32 channels in 2018 to 128–300+ channels in 2024.

Worked example C — Camera intrinsics calibration (OpenCV)

Standard pinhole calibration procedure with cv2.calibrateCamera:

  1. Print a 9×6 checkerboard with 25 mm square side. Mount flat on a rigid backing.
  2. Capture 20–30 views at varied angles and translations across the FoV. Ensure the board covers all four image quadrants in at least 5 views.
  3. Detect inner corners with cv2.findChessboardCornersSB (better sub-pixel than the legacy version).
  4. Call cv2.calibrateCamera(objectPoints, imagePoints, imageSize, None, None) → returns K (intrinsics 3×3), dist (5 distortion coefficients: k1, k2, p1, p2, k3), per-view rotation/translation, and per-view reprojection error.

Expected reprojection error:

RMS reproj errorDiagnosis
< 0.2 pxExcellent — research-grade
0.2–0.5 pxGood — production-fine
0.5–1.0 pxMarginal — recapture with more views
> 1.0 pxBad — likely motion blur, bad pattern detection, or wrong distortion model

For wide-FoV lenses (> 100°), switch to cv2.fisheye.calibrate (Kannala-Brandt, 4 coeffs). For multi-camera rigs (stereo, multi-cam), calibrate intrinsics first, then extrinsics with cv2.stereoCalibrate — fix the intrinsics by passing flags=CALIB_FIX_INTRINSIC. Re-calibrate annually or after any mechanical shock; thermal drift of plastic lens housings can shift principal point by 1–3 px over 50 °C, enough to matter for SLAM loop closure.

Worked example D — Camera bandwidth at 4K @ 60 fps

A 4K UHD (3840×2160) sensor at 60 fps, 12-bit RAW:

Pixels/frame  = 3840 · 2160                = 8.29 Mpx
Bits/frame    = 8.29 Mpx · 12              = 99.5 Mb
Bytes/frame   = 99.5 / 8                   = 12.4 MB
Bytes/s (raw) = 12.4 MB · 60 fps           = 746 MB/s

Transport options:

InterfaceBandwidthNotes
USB 3.2 Gen 21.25 GB/sOK; uncompressed possible at 4K30 only without ISP compression
USB 3.2 Gen 2×22.5 GB/sRare on host side
10 GbE1.0 GB/s usableIndustrial standard; GigE Vision protocol
MIPI CSI-2 D-PHY 4-lane9 Gbps (4×2.5 Gbps)Direct sensor → SoC on embedded boards
MIPI CSI-2 C-PHY 3-trioup to 22 GbpsNewer Jetson Orin, Apple
GMSL2 (Maxim)6 GbpsAutomotive, single coax up to 15 m
FPD-Link III (TI)4.16 GbpsAutomotive, same role as GMSL
CoaXPress 1212.5 GbpsIndustrial; one coax per lane

For 8 cameras at 4K60 (a typical autonomous-vehicle perimeter), aggregate raw bandwidth is ~6 GB/s — necessitating either on-sensor JPEG/H.265 compression, ISP-side downsampling, or sustained Jetson AGX Orin / TI TDA4VM-class compute. Lossy compression destroys photometric linearity; ML inference is robust to this, but stereo matching and photometric SLAM (DSO, LSD-SLAM) suffer.

Worked example E — Radar range, velocity, angular resolution

TI AWR2944 cascaded MIMO radar at 77 GHz: B_w = 4 GHz, T_c = 25 µs chirp, T_frame = 50 ms, 4 Tx × 4 Rx → 16 virtual antennas with d_ant = λ/2.

ΔR  = c / (2 · B_w)         = 3·10⁸ / (2 · 4·10⁹)       = 0.0375 m = 3.75 cm
Δv  = λ / (2 · T_frame)     = 0.0039 / (2 · 0.05)       = 0.039 m/s ≈ 0.14 km/h
Δθ  ≈ λ / (N · d_ant)        = 0.0039 / (16 · 0.00195)   = 0.125 rad ≈ 7.2° (uniform)

With non-uniform MIMO and super-resolution (MUSIC, ESPRIT), angular resolution drops to ~1–2° — competitive with low-end LiDAR. Range × resolution trade-off is fundamental: doubling chirp duration halves range resolution but doubles unambiguous velocity range.

4. Design heuristics

Range vs sensor-family mapping

RangeRecommended primaryBackup
0–0.5 mToF (VL53L8) or capacitiveVision proximity
0.5–2 mStructured light (RealSense D435), dToF (Lucid Helios)RGB monocular
2–10 mActive stereo (RealSense D455), passive stereo (ZED 2i)Indoor LiDAR (Livox Mid-360)
10–100 mSpinning LiDAR (Ouster OS1, Hesai QT128)Stereo with B = 0.5 m
50–300 mSolid-state LiDAR (Luminar Iris+, Innoviz Two) + mm-wave radarLong-FL telephoto camera
> 300 mmm-wave radar (Continental ARS548 LRR)Adaptive long-range LiDAR

Indoor vs outdoor

EnvironmentWorksMarginalFails
Office, lab, warehouseRGB, stereo, ToF, structured light, LiDARMagnetometer
Outdoor daylightRGB, stereo, LiDAR, radariToF (sun saturation)Structured light (Kinect pattern washes out)
Outdoor nightLiDAR, radar, NIR-illuminated cameraRGB without illuminationPassive stereo
Direct sunlight on lensLiDAR (905 nm narrow-band filter), radarRGB (lens flare)iToF, sunlit IR ToF
Fog / mistRadar, FMCW LiDAR (partial)Direct-detection LiDARRGB, stereo
Heavy rainRadarLiDAR (degraded ~30–50 %)RGB (water on lens)
Snow / falling snowRadarLiDAR (snowflakes = false returns)RGB
Dust stormRadarLiDAR (saturated)RGB

Update rate vs application

ApplicationRequired rate
Static map building5–10 Hz
Mobile robot navigation (≤ 2 m/s)10–20 Hz
Autonomous vehicle (highway)10–25 Hz LiDAR, 20–30 Hz radar, 30–60 Hz camera
Cobot collision avoidance30–60 Hz
Drone obstacle avoidance (10 m/s)60–120 Hz
VR / AR head-tracking90+ Hz camera + 1 kHz IMU
High-speed picking240+ Hz event camera or strobed RGB
Autonomous racing (Indy AC)100+ Hz LiDAR, 200+ Hz camera

Sensor synchronisation

Software timestamps over Ethernet/USB drift by 1–50 ms with non-deterministic latency. For any multi-sensor stack:

  • Hardware trigger all sensors that support it from a common pulse — typically a 1 PPS signal from a GNSS receiver or an FPGA-generated sync line.
  • PTP / IEEE 1588 ([[Languages/Tier3/ros2-robotics-config]]) on every sensor’s network interface. Modern industrial cameras (Basler ace2, FLIR Blackfly S Mono GigE) support PTP timestamps directly.
  • Hardware timestamping at the capture buffer: e.g. NVIDIA Jetson MIPI CSI-2 has a per-frame hardware timestamp captured at start-of-frame, exposed through Argus/V4L2.
  • For LiDAR, treat each point (not each frame) as having its own timestamp — points within a single 100 ms scan span 100 ms of ego-motion, and motion-deskewing with IMU integration is essential at any vehicle speed.

Camera selection ladder

RequirementPick
Hobby / prototypingRaspberry Pi Camera v3 (Sony IMX708), Arducam
Industrial machine visionBasler ace2, FLIR Blackfly S, Allied Vision Alvium
Outdoor automotiveSony IMX490, Sony IMX728, ON Semi AR0820AT (LFM, HDR, AEC-Q100)
Drone obstacle avoidGlobal-shutter rolling-replacement (Sony IMX296, OmniVision OV9281)
HDR scene (tunnel exits, welding)LFM-equipped sensors (Sony IMX390, IMX490)
High-speedSony IMX250 (5 Mpx, 163 fps), Vision Datum LU series
Low-light / NIRSony Starvis IMX462, IMX485; intensified Onsemi KAE
PolarizationSony IMX250MZR / IMX264MZR (mono polar)
EventProphesee Genx320 (320×320, automotive), IMX636 (1280×720)

Global vs rolling shutter. Rolling-shutter sensors expose row-by-row over typically 5–30 ms. On any moving robot or any sensor moving through space, this introduces rolling-shutter skew — straight lines bow, fast objects shear. SLAM and VIO algorithms break down or require explicit rolling-shutter compensation (Hedborg 2012). For drones, AGVs, automotive: pay the BoM premium for global shutter (~2× cost).

Stereo geometry tuning

  • Baseline rule: B ≈ Z_max / 100 for ~1 % depth precision at range.
  • Focal length trades FoV for angular pixel resolution; a 50° HFoV at 1080p ≈ 2000 px/rad ≈ 0.029°/pixel.
  • Disparity search range d_max = f · B / Z_min sets compute cost; SGBM/RAFT cost scales linearly with disparity range. Limit by setting a sane minimum range (e.g. Z_min = 0.5 m for indoor robot, 5 m for highway).
  • Texture is everything. Stereo matching collapses on blank walls, glass, ceilings. Hence “active stereo” with IR pattern projector (RealSense D435), or accept holes and fuse with other modalities.

GPU vs FPGA vs DSP

PipelineBest fitWhy
Prototyping CNN inferenceGPU (Jetson Orin, RTX)Toolchain, batch size, flexibility
Production CNN inference on cost-sensitive embeddedNPU / accelerator (Hailo-8, Coral Edge TPU, Qualcomm AI engine)TOPS/W
Safety-rated fixed-latency visionFPGA (Xilinx Kria K26, Lattice CrossLink)Deterministic latency, ISO 26262
Stereo SGM at 4K60FPGA or fixed-function ISP blockMemory bandwidth, parallel disparity sweep
Event-camera processingFPGA / neuromorphic (Loihi 2, Akida)Asynchronous spike processing
Low-power VIO on droneDSP + tightly-coupled IMU (Qualcomm Hexagon, Snapdragon Flight)mW-scale power

5. Components & sourcing — real parts

RGB cameras (industrial / robotics)

VendorFamilySensor formatNotes
Baslerace 2, boost, dartup to 24 MpxGigE Vision, USB3 Vision, MIPI-CSI variants
FLIR / TeledyneBlackfly S, Oryx, Forgeup to 65 MpxHigh SNR; industrial standard
Allied VisionAlvium 1500/1800, Mako, Mantaup to 31 MpxMIPI-CSI for embedded, GigE for stationary
Lucid VisionTriton, Atlasup to 65 MpxIP67-rated outdoor
IDSuEye+, EnsensobroadGerman machine vision incumbent
VieworksVC, VP, VTup to 250 MpxInspection, high-resolution
The Imaging SourceDFK / DMK5–20 MpxLower cost, USB3
ArducamPi HQ, Orin moduleembeddedMIPI-CSI for Jetson, Pi
Leopard ImagingLI-IMX477, GMSL2 modulesvariedAutomotive over GMSL
e-con SystemsSee3CAM, NileCAMvariedMIPI for Jetson / RB5

Underlying CMOS sensor families (sourced separately on production boards): Sony IMX477 (12 Mpx, 60 fps, used in Pi HQ Camera), IMX585 (8 Mpx Starvis-2 low light), IMX490 (5.4 Mpx 120 dB LED-flicker-mitigated automotive), IMX661 (127 Mpx APS-H, machine vision); ON Semi AR0820AT (8 Mpx HDR automotive); OmniVision OV9281 (1 Mpx mono global shutter, drone VIO).

Stereo and RGB-D cameras

VendorModelTechRangeNotes
StereolabsZED 2iPassive stereo, B = 12 cm0.3–20 m1080p120, IP66, integrated IMU
StereolabsZED X / X MiniActive stereo + GMSL20.2–20 mAutomotive/AGV; sun-tolerant
Intel RealSenseD435i / D435fActive stereo + IR projector + IMU0.3–10 mEoL announced; still in field
Intel RealSenseD455 / D456Active stereo, B = 9.5 cm0.6–20 mD456 is IP65 / IP67
Intel RealSenseD457GMSL2 industrial0.6–20 mDesigned for AGV / fixed installs
OrbbecFemto Bolt, Femto MegadToF + RGB0.25–5.46 mAzure Kinect successor
OrbbecGemini 2 / 335Structured light + RGB0.15–10 mIndoor robotics
Lucid VisionHelios 2dToF (Sony IMX556)0.3–8.3 mIP67, GigE; sun-tolerant
PhotoneoPhoXi 3D Scanner, MotionCam 3DStructured light0.5–6 mHigh-precision bin picking
ZividZivid 2, Zivid 2+Structured light0.3–3 mSub-mm bin picking
Mech-MindMech-Eye Pro M / SStructured light0.3–3 mChinese bin-picking standard
PickitPickit MStructured light0.4–2.5 mIndustrial bin pick
MicrosoftAzure Kinect (EoL 2023)dToF + RGB + 7-mic0.5–5.5 mStill in research use

LiDAR — spinning

VendorModelChannelsRangeNotes
Velodyne (now Ouster)VLP-16 (Puck)16100 mLegacy 2014 standard; cheap on eBay
Velodyne (now Ouster)HDL-32E / HDL-64E32 / 64100 / 120 mDARPA-era; obsolete in new builds
OusterOS0-64 / OS0-12864 / 12835 m90° vertical FoV; near-field
OusterOS1-32 / OS1-12832 / 128200 m / 120 mMainstream AGV/AMR sensor
OusterOS2-64 / OS2-12864 / 128240 mLong-range autonomy
HesaiPandar QT64 / QT12864 / 12830 / 50 mShort-range automotive perception
HesaiPandar 40P / 64 / 12840 / 64 / 128200 mRobotaxi standard in China
HesaiPandar XT32M2X32300 mLong-range single-return
RoboSenseRS-LiDAR-16 / 32 / 8016 / 32 / 80150–250 mLower-cost mechanical
LivoxMid-360 / HAP / Avianon-repetitive40–450 mPetal-pattern, low-cost
CeptonVista-X90 / Novasolid-state MMT150–200 mFrictionless mirror tech

LiDAR — solid-state and FMCW

VendorModelArchitectureRangeNotes
InnovizInnoviz One, Innoviz TwoMEMS solid-state250–300 mTier-1 automotive (BMW, VW)
LuminarIris, Iris+1550 nm fiber laser + MEMS300–500 mVolvo EX90, Mercedes
HesaiAT128 / AT512 / ATXMEMS-like 1D scanning200–300 mHighest-volume L3 automotive
AevaAeries IIFMCW MEMS500 mVolkswagen, Daimler Truck
MobileyeLumotive OPAOPA solid-statevaries2025–2026 ramp
ValeoScala 3mechanical scanning200 mFirst-mass-production AD LiDAR (Audi A8 2018 onward)
ContinentalHRL131flash + scanning hybrid150 mTier-1 automotive
Sense PhotonicsOsprey / Sense Oneflash dToF30–100 mIndustrial short-range
CeptonNovaMMT solid-state30 mCompact, near-field

mm-wave automotive radar

VendorModelBandNotes
ContinentalARS548 LRR77 GHz4D imaging radar, 300 m
ContinentalSRR520 SRR77 GHzShort range, blind-spot
BoschLRR4 / LRR577 GHzMainstream ACC radar
AptivRACam (radar + camera)77 GHzIntegrated front sensor
TIAWR1843 / AWR2243 / AWR294477 GHzReference SoC / cascaded chips
NXPTEF82xx / S32R4577 GHzRFCMOS + processor stack
ArbePhoenix79 GHz4D imaging, 2304 virtual channels
UhnderF1 (digital-code mod)77 GHzFirst DCM (CDMA) automotive radar
SmartmicroUMRR-9677 GHzAGV / industrial

Event cameras and emerging

VendorModelResolutionNotes
PropheseeEVK4 (IMX636)1280×720Industrial dev kit
PropheseeGenx320320×320Automotive, ultra-low power
iniVationDVXplorer / DAVIS346640×480 / 346×260Research
SonyIMX636 / IMX637up to HDDirect Sony silicon (also in EVK4)

Compute pairing

PlatformTOPS / NPUCamera lanesNotes
NVIDIA Jetson Orin Nano (8 GB)40 TOPS4 × MIPI CSI-2Hobby + prosumer robotics
NVIDIA Jetson Orin NX 16 GB100 TOPS8 × MIPI CSI-2Drone / AMR
NVIDIA Jetson AGX Orin 64 GB275 TOPS16 × MIPI CSI-2Autonomous vehicle research
NVIDIA DRIVE Orin / Thor254 / 1000 TOPSmanyProduction automotive
Qualcomm RB5 / RB615 / 30+ TOPS7 × MIPI CSI-2Snapdragon Flight drone
Intel NUC + Arc A380~10 TFLOPS GPUUSB / PCIeIndoor robotics
Hailo-8 / Hailo-1026 / 40 TOPSPCIe addonEdge inference accelerator
AMD/Xilinx Kria K26 / KV260~1.2 TOPS (Zynq DPU)MIPI CSI-2, nativeFPGA fixed-latency pipelines
Google Coral Edge TPU4 TOPSUSB/PCIeQuantised int8 inference
TI TDA4VM / TDA4AEN8 TOPS + DSP4 × CSI-2Automotive ECU
Mobileye EyeQ6H34 TOPSmanyRobotaxi-grade
Apple M2 / M4 SoC16–38 TOPS NPUshared LPDDRARKit, iPhone Lidar pipeline

6. Reference data

Sensor-family comparison (quick reference)

FamilyRangeAccuracyRefreshCostLight requiredWeather
RGB monocularinfinite (no depth)n/a depth30–240 fps5kyespoor
Passive stereo0.3–30 m1 cm @ 3 m, 30 cm @ 10 m30–120 fps2kyespoor
Active stereo (IR)0.2–10 m1 cm @ 1 m, 10 cm @ 5 m30–90 fps2ksun-degradesindoor-fine
Structured light0.15–5 m1 mm–1 cm5–30 fps$1–25ksun-degradesindoor-only
iToF0.3–8 m3 mm–3 cm15–60 fps3ksun-saturatesindoor-fine
dToF0.1–10 m3 mm–3 cm15–60 fps5ksun-tolerantindoor + outdoor day
Spinning LiDAR0.5–300 m2–5 cm5–25 Hz80knorain ~30 % loss
Solid-state LiDAR1–500 m3–10 cm10–25 Hz20knorain ~30 % loss
FMCW LiDAR1–500 m3 cm + 0.1 m/s10–20 Hz30knorain ~20 % loss
mm-wave radar0.2–300 m4 cm range, 0.1 m/s10–50 Hz500noexcellent
Event cameran/a depth1 µs latencyasync5kno (HDR)excellent (wide DR)

LiDAR vendor / model table (2026)

Vendor / ModelChannelsMax range (10 % refl)FoV (H×V)Points/sFrame rateWavelength
Velodyne VLP-1616100 m360°×30°600 k5–20 Hz903 nm
Ouster OS1-128128200 m (at 10 %)360°×45°5.2 M10–20 Hz865 nm
Ouster OS2-128128240 m360°×22.5°5.2 M10 Hz865 nm
Hesai Pandar 128128200 m360°×40°3.5 M10–20 Hz905 nm
Hesai AT128128200 m120°×25.4°1.5 M10 Hz905 nm
Hesai QT12812850 m360°×104.2°3.5 M10 Hz905 nm
Livox Mid-360non-repeat40 m360°×59°200 k10 Hz905 nm
Innoviz Twon/a (solid)300 m120°×40°varies10–20 Hz905 nm
Luminar Iris+n/a500 m120°×26°varies10–30 Hz1550 nm
Aeva Aeries IIFMCW500 m120°×30°1.6 M10–20 Hz1550 nm
RoboSense M1solid200 m120°×25°750 k10 Hz905 nm
Valeo Scala 3mech200 m133°×11°varies25 Hz905 nm
Continental HRL131hybrid150 m120°×28°varies25 Hz905 nm

RGB-D camera comparison

ModelTechDepth resolutionFrame rateMin/Max rangeIMUCost (USD)
RealSense D435iActive stereo1280×72090 fps0.3–3 m (typ)yes$300
RealSense D455Active stereo1280×72090 fps0.6–6 myes$400
RealSense D457Active stereo (GMSL)1280×72090 fps0.6–20 myes$700
ZED 2iPassive stereo2208×1242100 fps0.3–20 myes$500
ZED XActive+GMSL21920×120060 fps0.2–20 myes$1500
Orbbec Femto BoltdToF1024×102430 fps0.25–2.88 myes$400
Orbbec Femto MegadToF1024×102430 fps0.25–5.46 myes$600
Lucid Helios 2dToF640×48030 fps0.3–8.3 mno$1700
Photoneo PhoXi MStructured light2064×15440.25 fps (full)0.5–2 mno$13k
Zivid 2+ M60Structured light1944×12005 fps0.6–2 mno$14k
Microsoft Azure Kinect (EoL)dToF1024×102430 fps0.5–5.46 myes$400 used

Camera RAW bandwidth (uncompressed)

ResolutionfpsBit depthBandwidth
VGA 640×48060817.6 MB/s
720p 1280×72030826.4 MB/s
1080p 1920×108030859.3 MB/s
1080p 1920×10806012178 MB/s
4K 3840×21603012373 MB/s
4K 3840×21606012746 MB/s
8K 7680×432030121.49 GB/s
12 Mpx60121.08 GB/s
24 Mpx30121.08 GB/s
50 Mpx30122.25 GB/s

GPU / accelerator inference rates (typical)

PlatformYOLOv8-S @ 640PointPillars (LiDAR)DepthAnything-V2 Small
Jetson Orin Nano (8 GB)~60 fps INT8~10 Hz~15 fps
Jetson Orin NX 16 GB~140 fps INT8~25 Hz~35 fps
Jetson AGX Orin 64 GB~330 fps INT8~70 Hz~80 fps
Hailo-8~150 fps INT8n/a (no point-net)~25 fps
Coral Edge TPU~30 fps INT8 (limited model size)n/an/a
RTX 4090~1500 fps FP16~250 Hz~400 fps

(Numbers vary substantially with batch size, INT8/FP16 quantisation, and TensorRT vs ONNX runtime. Treat as order-of-magnitude.)

Weather robustness matrix

SensorClearLight rainHeavy rainFog 50 mSnowDustDirect sun
RGB camera★★★★★★★★★★ (flare)
Stereo★★★★★★★★★★
Structured light★★★★★n/a outdoorn/a outdoorn/a outdoorn/a outdoorn/a outdoor
ToF (iToF)★★★★★★★
ToF (dToF, narrowband)★★★★★★★★★★★★★★★★★★★
LiDAR (905 nm)★★★★★★★★★★★★★★★★★★★★★★
LiDAR (1550 nm)★★★★★★★★★★★★★★★★★★★★★★★★★
FMCW LiDAR (1550 nm)★★★★★★★★★★★★★★★★★★★★★★★★★
mm-wave radar★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
Event camera★★★★★★★★★★★★★★★★★★ (HDR)

7. Failure modes & debugging

  • Lens flare and ghosting — sun, headlights, or bright LEDs in or near frame produce internal reflections off lens elements. Causes phantom features and false detections downstream. Mitigation: lens hood / baffle, anti-reflection multi-coating on the lens, LFM-equipped sensors for periodic LEDs.
  • Rolling-shutter skew — straight vertical edges bow into S-curves on moving platforms; fast objects shear. Detection: stationary check pattern with the robot moving — should be straight. Fix: switch to global-shutter sensor (Sony IMX296, IMX392, IMX422; ON Semi AR0234) for any platform faster than ~1 m/s.
  • Auto-exposure hunting — sun briefly behind a building, then in frame, sends AE chasing back and forth on a 100 ms scale, momentarily over/under-exposing every frame. Disable AE in mission mode; fix exposure from a calibrated lookup keyed to ambient lux, or use HDR sensors with 120+ dB intra-frame dynamic range.
  • Depth holes — low texture (stereo) — blank walls, ceilings, glass. Stereo matchers return no correspondences. Fix: assisted stereo (IR pattern projector), fuse with ToF/LiDAR.
  • Depth holes — specular reflections (ToF) — mirrors, polished metal, water. Photons bounce away from the sensor. Fix: polarization camera variant; LiDAR; flag-and-discard region.
  • Depth holes — transparent surfaces — glass walls and windows transmit IR. Fix: same as specular; or treat glass as “ground truth wall” via secondary cues (CAD model, frame edges).
  • LiDAR motion distortion (skewing) — a spinning LiDAR rotating at 10 Hz spans 100 ms; the robot moves several cm during that interval. Returns within a single “frame” come from different ego-positions. Mitigation: per-point timestamp + IMU-integrated ego-pose deskewing (KISS-ICP, FAST-LIO2 do this internally).
  • iToF multi-path — light bounces off two surfaces before returning, giving a phase reading that is the sum of two distances. Manifests as ghost-depth halos behind objects or curved-looking corners. Fix: use multiple modulation frequencies + AI denoising; or switch to dToF.
  • iToF sun saturation — outdoor sunlight contains broad-spectrum IR that overwhelms the photodetector before the modulated signal can be demodulated. Fix: indoor only, or use narrow-band 1550 nm dToF/LiDAR.
  • Rain droplets on LiDAR window — falling drops generate spurious near-field returns; standing drops scatter the outgoing beam. Mitigation: hydrophobic coating, heated window, water-shedding geometry, dual-return filtering (drop = near return, scene = far return).
  • Cross-talk between LiDARs at the same wavelength — two LiDARs of the same brand within line-of-sight may interpret each other’s pulses as their own returns. Manifests as random “ghost” points in the air. Fix: per-unit pulse coding (Innoviz, modern Hesai support this), FMCW (intrinsically immune), or wavelength separation.
  • Bandwidth saturation — USB 3.0 host controller can sustain ~3.2 Gb/s real-world; running two D455s + a ZED 2i on the same USB3 root hub WILL drop frames intermittently. Diagnose with dmesg | grep usb and per-frame timestamps. Fix: separate USB host controllers (PCIe-USB cards), GigE Vision over wired Ethernet, MIPI CSI for embedded.
  • Time-sync drift in software-timestamped multi-camera rigs — observed as growing residuals in stereo / VIO over minutes. Fix: hardware trigger from a common source; PTP across Ethernet links; use hardware capture timestamps (V4L2, Argus, GigE Vision PTP).
  • Lens calibration drift — plastic-mount lenses can shift 1–3 px in principal point over a 50 °C swing or a single 5 g impact. Manifests as SLAM map drift and degraded reprojection error. Mitigation: metal mounts; periodic recalibration; online self-calibration in VIO (VINS-Fusion, ORB-SLAM3 do this).
  • GPU memory leaks in long-running perception pipelines — typical for ad-hoc Python/PyTorch wrappers that allocate tensors per frame without reuse. Monitor with nvidia-smi, nsys. Fix: pre-allocate buffers, use torch.cuda.empty_cache() between scenes, prefer C++/TensorRT for production.
  • Frame stale in pipeline — perception consumes a 100 ms-old image because the queue backed up. Fix: drop-on-write queues with timestamp gating; reject any frame older than 2× expected period.
  • Coordinate-frame mistakes — the OpenCV camera convention is +Z forward, +X right, +Y down; ROS / REP-103 uses +Z up, +X forward, +Y left; OpenGL / Blender uses +Z toward viewer, +Y up; Unreal is +Z up, +X forward. Mismatched conventions produce flipped axes that are subtle bugs — features look “almost right” until you add a second sensor. Fix: nail down the frame_id of every published topic; static_transform_publisher diagrams; rviz to inspect.
  • Stereo disparity bias on tilted floors — block-matchers assume horizontal epipolar lines; a misrectified rig produces a depth slope across the image. Fix: re-verify rectification after any shock; calibrate periodically.
  • Radar ghost targets — multi-bounce returns (radar → guardrail → vehicle → radar) appear as targets that don’t exist. Fix: micro-Doppler signature filtering; map-based suppression of static reflectors.
  • Radar angular ambiguity — sparse MIMO arrays alias; one true target can appear at multiple azimuths. Fix: super-resolution algorithms (MUSIC, IAA, compressed sensing); larger virtual array.
  • Photometric vs geometric calibration drift — VIO drifts because feature descriptors degrade (sun angle, exposure changes), not because IMU is bad. Fix: photometric VO (DSO) is fragile here; ORB / SuperPoint features are more robust.
  • Sensor bracket vibration resonance — a 6 mm 3D-printed bracket has its first mode at 30–80 Hz; a motor harmonic at 50 Hz excites it, smearing the camera image. Fix: stiffen the mount (FEM), add silicone damping, or move the sensor to a stiffer location.

8. Case studies

Tesla Autopilot HW3 / HW4 — vision-only autonomy

Tesla’s production fleet from 2018 onward runs an 8-camera vision-only perception stack (radar deprecated mid-2021 on most models, ultrasonics phased out in 2022). The eight cameras are three forward (narrow, main, wide), two side-pillar, two repeater (B-pillar), one rear, each based on a Sony IMX490-class 1.2 Mpx HDR automotive CMOS sensor with 120 dB dynamic range, LED flicker mitigation, and AEC-Q100 qualification. Capture is 36 fps per camera.

The HW3 compute board uses Tesla’s custom FSD Chip (two redundant NPUs, ~72 TOPS each, GF 14 nm) running a shared “Hydranet” multi-head CNN that consumes all 8 video streams in a unified bird’s-eye-view space (the occupancy network architecture from AI Day 2022). HW4 (HW3 successor, launched late 2023) roughly triples the compute and ups camera resolution to 5 Mpx. Tesla’s design thesis — controversial but consistent — is that LiDAR is unnecessary if camera-only neural networks can match the geometric inference; the company has staked its full-autonomy roadmap on this. The trade-off: vision-only stacks struggle in heavy rain, dense fog, and at night without lit infrastructure, where Waymo / Cruise / Zoox LiDAR stacks maintain performance.

Waymo Driver Gen-5 — multi-modal autonomy

The 5th-generation Waymo Driver (deployed in I-Pace and Jaguar test fleets, 2020 onward) carries one of the most sensor-rich production autonomy stacks of any commercial vehicle:

  • 4 LiDARs — one long-range “Honeycomb” top-mounted spinning unit with 360° / 200+ m / ~1 cm range precision; one rooftop “Laser Bear” perimeter unit; two short-range corner LiDARs for the near-field around the vehicle.
  • 29 cameras spanning long-range telephotos (for traffic-light reading at 250 m), perimeter cameras, and a high-resolution “peripheral vision” mast for cross-traffic.
  • 6 radars (mostly 77 GHz FMCW) for adverse-weather redundancy and long-range Doppler.
  • A custom Waymo compute platform fusing the lot.

The Honeycomb LiDAR achieves ~5 cm range accuracy at 100 m; classification accuracy on pedestrians at 100 m sits above 99 % in clear daylight, dropping to ~90 % in heavy rain. The Waymo Open Dataset (released 2019, expanded 2023–2024) is the publicly-traceable view into this stack’s performance.

Skydio X10 — autonomous drone obstacle avoidance

Skydio’s flagship enterprise drone (2023 release, succeeding X2D) is built around six Sony IMX-class CMOS sensors arranged as a true 360° hex-camera array — three on the top, three on the bottom — providing full hemispherical depth coverage with no blind spot. Each pair has a wide-baseline overlap (~10 cm), and the on-device visual SLAM stack runs on an NVIDIA Jetson Orin NX with a custom Skydio runtime, fusing the six-camera optical flow + a Bosch BMI088-class IMU + GNSS for sub-second-latency obstacle avoidance at 10 m/s flight speeds.

No LiDAR, no radar, no ultrasonic — pure vision. The architectural bet (similar to Tesla but for very different reasons) is that flight envelopes are bounded enough that vision-only depth at 5–25 m suffices, and the SWaP cost of even a small Livox Mid-360 LiDAR (265 g, 6 W) is unjustifiable on a 1.4 kg drone. The result is reliable autonomous flight through forests, bridges, and complex industrial sites — used by US DOT, fire departments, and military inspection units. Performance degrades at night (visible-light cameras with no on-board illumination); in 2026 Skydio introduced a NightSense IR mode using LWIR + active illumination.

Boston Dynamics Spot — robust outdoor perception

Each Spot platform carries 5 stereo camera assemblies (front, sides, rear) using PMD ToF + RGB sensors plus integrated IMU — Boston Dynamics’ “Stereo Camera” modules each combine a structured-light projector, two grayscale stereo cameras, and a color RGB camera. The combined 360° depth perception runs at 15 Hz and feeds the proprioceptive whole-body controller. For longer-range mapping in Spot Enterprise (mining, oil & gas), Boston Dynamics offers an optional Velodyne VLP-16 or Ouster OS0 spinning LiDAR via the EAP (Enhanced Autonomy Package) payload bay — this enables persistent map building over multi-kilometre patrol routes and integrates with the company’s Autowalk feature.

The Spot CORE compute is an Intel x86 module (initially NUC-class, upgraded in 2023); perception runs on-device with no off-board dependency in basic operation. Operators commonly add a Jetson Orin NX in the EAP payload bay for custom ML perception on top of the BD stack.

Ouster Gemini — robotaxi reference architecture

Ouster’s Gemini perception software stack (acquired from Velodyne, integrated 2024) is the de-facto reference for retrofitting commercial vehicles into AV testbeds. A typical Gemini deployment uses 2 × OS1-128 rooftop spinning LiDARs in a “X” configuration (eliminates blind spots from the LiDAR’s vertical FoV gap), 1 × OS0-128 front bumper short-range, 6–8 ARS548 radars at 77 GHz around the perimeter, and 6–10 Sony IMX490 / OmniVision OX08AT cameras. Compute is split between an NVIDIA DRIVE Orin platform (perception) and a separate planning ECU. This is the reference for shuttle / mining / agriculture autonomy programs, and is what most Tier-2 system integrators build atop.

9. Cross-references

  • [[Robotics/sensors-pose-motion]] — companion proprioceptive sensors (encoders, resolvers, IMUs); VIO and SLAM tightly couple these with the exteroceptive sensors here.
  • [[Robotics/sensors-force-tactile]] — force, torque, and tactile sensing (planned); the third major exteroceptive modality not covered here.
  • [[Robotics/slam]] — Simultaneous Localisation and Mapping (planned); the algorithmic layer that fuses these sensors into a coherent map and pose estimate.
  • [[Robotics/computer-vision-robotics]] — feature extraction, object detection, semantic segmentation, depth networks (planned); consumes the RGB and depth streams from this note.
  • [[Robotics/bayesian-estimation]] — Kalman / EKF / particle filter fusion (planned); the mathematics of combining multi-rate, multi-modal sensors with appropriate noise models.
  • [[Robotics/kinematics-dh]] — extrinsic calibration of camera-to-body and camera-to-camera transforms uses the same SE(3) machinery.
  • [[Engineering/op-amps]] — analog front-end for SPAD readout, photodiode transimpedance amplifiers in LiDAR, automotive radar IF stages.
  • [[Engineering/electromagnetics-engineering]] — radar physics (FMCW chirp design, antenna arrays, beamforming); LiDAR optical-system trade-offs.
  • [[Engineering/semiconductor-devices]] — CMOS image sensor physics, SPAD avalanche statistics, VCSEL laser-diode operation.
  • [[Languages/Tier3/robotics-control]] — ROS 2 / DDS topic conventions for sensor_msgs/PointCloud2, sensor_msgs/Image, sensor_msgs/CameraInfo.
  • [[Languages/Tier3/3d-scene]] — point-cloud and 3D-mesh formats (planned, USD/glTF/PLY/LAS/E57); the file-format layer for stored map data.

10. Citations

  • Szeliski, R. (2022). Computer Vision: Algorithms and Applications (2nd ed.). Springer. Free PDF at szeliski.org/Book. The standard practitioner reference; chapters 11–12 cover stereo, structure-from-motion, depth.
  • Hartley, R. & Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. The canonical reference for projective geometry, epipolar geometry, camera calibration.
  • Forsyth, D. A. & Ponce, J. (2011). Computer Vision: A Modern Approach (2nd ed.). Pearson.
  • Hirschmüller, H. (2008). “Stereo Processing by Semiglobal Matching and Mutual Information.” IEEE TPAMI 30(2), 328–341. DOI 10.1109/TPAMI.2007.1166. The SGBM/SGM algorithm at the heart of nearly all production stereo.
  • Beraldin, J.-A., Blais, F. & Lohr, U. (2010). “Laser Scanning Technology.” In Airborne and Terrestrial Laser Scanning. Whittles Publishing.
  • Geiger, A., Lenz, P. & Urtasun, R. (2012). “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite.” CVPR. The foundational AV perception benchmark using Velodyne HDL-64E.
  • Sun, P. et al. (2020). “Scalability in Perception for Autonomous Driving: Waymo Open Dataset.” CVPR. arXiv:1912.04838.
  • Caesar, H. et al. (2020). “nuScenes: A multimodal dataset for autonomous driving.” CVPR. The 6-camera, 1-LiDAR, 5-radar reference dataset.
  • Hedborg, J. et al. (2012). “Rolling Shutter Bundle Adjustment.” CVPR. The standard rolling-shutter compensation for SfM.
  • Gallego, G. et al. (2022). “Event-Based Vision: A Survey.” IEEE TPAMI 44(1). arXiv:1904.08405. Definitive event-camera reference.
  • Velodyne (2019). VLP-16 User Manual and Programming Guide, rev D.
  • Velodyne (2017). HDL-64E S3 User’s Manual rev J.
  • Ouster (2024). OS1 / OS2 Sensor Hardware User Manual rev 7.
  • Hesai (2023). Pandar XT32 / QT128 User Manual.
  • Innoviz (2023). Innoviz Two Technical Brief.
  • Luminar Technologies (2023). Iris+ LiDAR Datasheet.
  • Aeva (2023). Aeries II FMCW LiDAR Datasheet.
  • Intel (2024). Intel RealSense Product Family Datasheet, rev 26.
  • Stereolabs (2024). ZED 2i Camera Specification.
  • Prophesee (2023). Metavision EVK4 Datasheet (IMX636 sensor).
  • Continental Engineering Services (2023). ARS548 RDI Premium Radar Datasheet.
  • Texas Instruments (2024). AWR2944 Single-Chip 77-GHz FMCW Radar Sensor datasheet, SWRS257A.
  • Bosch (2024). SMI230 Inertial Measurement Unit for ADAS product brief.
  • Sony Semiconductor Solutions (2023). IMX490 / IMX490AAQR Diagonal 6.46 mm 5.4 Mpx HDR Image Sensor datasheet.
  • NVIDIA (2024). Jetson Orin Series Datasheet — Jetson Orin Nano / NX / AGX Orin.
  • Apple (2020). iPad Pro 2020 LiDAR Scanner — ARKit Documentation.
  • Tesla (2022). AI Day 2022 presentation transcripts; (2024). We, Robot event materials.
  • Waymo (2023). Waymo Driver — 5th Generation Technical Overview (blog series).
  • Skydio (2023). Skydio X10 Product Brief.
  • Boston Dynamics (2023). Spot Robot Hardware Specification, rev 4.0.
  • ISO 13855:2024. Safety of machinery — Positioning of safeguards with respect to the approach speeds of parts of the human body. Sets minimum-distance requirements that drive sensor-range and refresh-rate spec for safety-rated perception.
  • IEC 61496-3:2018. Safety of machinery — Electro-sensitive protective equipment — Particular requirements for active opto-electronic protective devices responsive to diffuse reflection (AOPDDR). The standard for safety-rated 2D LiDAR scanners (SICK S300, Keyence SZ-V).