LiDAR, Depth Cameras & RGB — Robotics Exteroception

See also (Tier 3 family index): Perception Sensors

1. At a glance

Where [[Robotics/sensors-pose-motion]] answers “where am I?”, this note answers “what’s outside me?“. Exteroception — the sensing layer that maps the environment — is what separates a controllable machine from an autonomous one. Without it, a robot can servo joints to commanded angles but cannot avoid a wall, pick a part off a moving conveyor, or stop for a pedestrian.

Four sensor families dominate exteroceptive perception in 2026, plus three emerging:

RGB cameras — pinhole-projection imagers, 1–60+ Mpx, 30–240 fps. Passive (need ambient light), texture-rich, no native depth. Bayer-filter CMOS sensors are the universal substrate (Sony IMX series, ON Semi AR series, OmniVision OV series).
Depth cameras — RGB-D devices that emit their own illumination and recover Z per pixel:
- Stereo (passive triangulation between two RGB cameras) — ZED, RealSense D435/D455.
- Structured light (project a known IR pattern, observe deformation) — Kinect v1, RealSense D435 (assisted stereo).
- Time-of-Flight (ToF) — measure photon round-trip time directly (dToF: Kinect Azure, iPhone Lidar, Lucid Helios) or AMCW phase shift (iToF: Intel L515 (EoL), Microsoft AzureKinect-class).
LiDAR — scanning or flash laser ranging at 200 m+:
- Mechanical spinning — 360° horizontal, fixed vertical lines (Velodyne HDL/VLP, Ouster OS series, Hesai Pandar). 5–25 Hz frame rate; 16 to 128+ channels.
- Solid-state MEMS or OPA — fixed FoV, microelectromechanical or optical-phased-array beam steering (Innoviz, Luminar Iris+, Hesai AT128). No moving parts at the macro scale; automotive-qualified.
- FMCW LiDAR — coherent detection, simultaneous range and radial velocity (Aeva Aeries II, Mobileye Lumotive). Immune to interference from other LiDARs.
- Flash LiDAR — single-shot wide-FoV illumination, no scanning; range-limited.
mm-wave radar — 24, 60, 77, 79 GHz FMCW radar producing range–Doppler–azimuth tensors. Robust through fog, rain, dust, and snow. Continental ARS548, Bosch LRR, TI AWR2944. Lower angular resolution than camera or LiDAR; resolves radial velocity directly.

Emerging:

Event cameras — per-pixel temporal-contrast (DVS / DAVIS / Metavision) with μs-class latency and HDR > 120 dB; Prophesee, iniVation. Sparse async output.
Polarization cameras — Sony IMX264MZR / IMX250MZR per-pixel micropolarizer; specular vs diffuse separation, glare suppression, transparent-object handling.
Hyperspectral / NIR — line-scan or snapshot tens-to-hundreds of bands; Headwall, Specim, Cubert. Material classification in agriculture, recycling, food sorting.

First ask before selecting:

Range — 0.1 m (cobot fingertip ToF), 5 m (indoor mobile robot), 100 m (campus AGV), 250 m (highway autonomy)?
Refresh rate — 10 Hz mapping, 30 Hz tracking, 100 Hz reactive avoidance, 1 kHz VR?
Illumination & weather — controlled indoors, daylight outdoors, fog/rain/snow, night?
Reflectance — black tyre vs white wall vs glass vs sky?
Bandwidth & compute — what does the downstream perception stack ingest (and where does it run)?
Cost, SWaP, certification — $30 w e b c am t o$ 80 k automotive LiDAR; cobot-class IP67 to military MIL-STD-810.

The dominant trend through 2026 is sensor fusion as the default architecture: there is no single sensor that wins on every axis, so production autonomy stacks combine cameras (semantics, density), LiDAR (geometric truth, range), radar (velocity, weather), and IMU (ego-motion, see [[Robotics/sensors-pose-motion]]).

2. First principles

RGB camera — projection model

A pinhole camera maps a 3D point P = (X, Y, Z) in the camera frame to a 2D pixel p = (u, v) via the intrinsic matrix K:

[u]       [fx  0  cx] [X/Z]
[v]  =    [ 0 fy  cy] [Y/Z]
[1]       [ 0  0   1] [ 1 ]

with focal lengths f_x, f_y in pixels (= focal-length-mm × pixels-per-mm), principal point (c_x, c_y) at the optical axis, and the standard OpenCV convention +Z forward, +X right, +Y down. Real lenses add distortion — the radial-tangential Brown-Conrady model is the default in OpenCV calibration:

x' = x·(1 + k1·r² + k2·r⁴ + k3·r⁶) + 2·p1·x·y + p2·(r² + 2·x²)
y' = y·(1 + k1·r² + k2·r⁴ + k3·r⁶) + p1·(r² + 2·y²) + 2·p2·x·y

with r² = x² + y² in normalised coordinates. Wide-angle lenses (FoV > 100°) need a higher-order model — Kannala-Brandt fisheye (4 parameters), MEI omnidirectional, or division model.

The image sensor itself is a Bayer-pattern CMOS array: 2×2 RGGB mosaic over photodiodes; the missing channels at each pixel are interpolated (“demosaiced”, typically AHD or Malvar-He-Cutler). Image-signal processor (ISP) on the sensor or host applies black-level correction, lens shading correction, white balance, demosaic, denoise, sharpen, gamma, tone mapping. For computer vision, capture RAW Bayer or 12–16 bit linear before the ISP destroys photometric linearity.

Stereo depth — triangulation

Two rectified cameras with parallel optical axes, baseline B, focal length f (pixels):

Z = f · B / d

with disparity d = u_L − u_R the horizontal pixel offset of corresponding features. Disparity is computed by stereo matching: block-match (BM), semi-global match (SGBM, Hirschmüller 2008), or learned (PSMNet, RAFT-Stereo, FoundationStereo).

Differentiating gives the depth uncertainty:

σ_Z = (Z² / (f · B)) · σ_d

Depth accuracy degrades quadratically with range. Doubling baseline halves the σ_Z at any given range, but increases the minimum range (sub-baseline parallax becomes a vanishing disparity) and shrinks the stereo-overlap FoV.

Structured light

Project a known pseudo-random IR pattern (Kinect v1 used a fixed-position diffractive optical element); a single IR camera observes the pattern’s deformation in 3D and triangulates each pattern element against its design position. Effective baseline ≈ projector-to-sensor distance. Works well indoors at 0.5–5 m with controlled IR exposure; saturates under direct sunlight because solar IR overwhelms the projected pattern. Intel RealSense D-series uses active stereo (two IR cameras + IR pattern projector) — the pattern adds texture to featureless surfaces but disparity is still computed by stereo matching; the projector is effectively a textured “fill light”.

Time-of-flight

Direct ToF (dToF) — emit a short laser pulse (ns) and time the photon arrival at a single-photon avalanche diode (SPAD) array. Range = c · Δt / 2. Sony IMX516, ams VL53L8, Apple LiDAR (Sony custom). Excellent accuracy (mm-class), wide dynamic range, sun-tolerant.
Indirect ToF (iToF) — amplitude-modulate the illumination at frequency f_m (10–100 MHz), measure phase shift φ between emitted and reflected; Z = (c · φ) / (4π · f_m). Cheaper sensor (CMOS PMD pixel) but suffers range aliasing at d_max = c / (2 · f_m), multi-path interference, and sun saturation.

The fundamental ToF precision floor is photon-shot-noise limited:

σ_Z ≈ (c / (4π · f_m · SNR)) · √(BW)    [for iToF]
σ_Z ≈ (c · σ_t / 2)                     [for dToF, σ_t = SPAD jitter]

A typical iToF at 80 MHz modulation with SNR = 100 (well-lit scene) achieves σ_Z ≈ 3 mm at 1 m. dToF with 200 ps SPAD jitter achieves σ_Z ≈ 30 mm independent of range (until photon-count drops).

LiDAR — scanning principles

Each laser pulse measures one range sample. Frame rate × points-per-frame = point rate (~10⁵ to 10⁷ points/s). Three optical-architecture families:

Mechanical spinning — a stack of N laser-detector pairs rotates 360°. Vertical resolution = N (fixed, set by laser stack). Horizontal resolution = pulses-per-rotation. Velodyne VLP-16/32, Ouster OS0/1/2 (64–128 channels using VCSEL arrays + SPAD focal-plane), Hesai Pandar.
Solid-state MEMS — a single (or few) laser-detector pair sweeps across a fixed FoV via a vibrating mirror (~1 mm Si MEMS). No 360° coverage but no macro moving parts — automotive-grade reliability. Innoviz Two, Luminar Iris+, Hesai AT128.
Optical phased array (OPA) — array of phase-controlled emitters steers a beam electronically with zero moving parts. Mobileye/Lumotive in 2024–2026 ramp. Promising but immature compared to MEMS.

FMCW is orthogonal to scanning architecture: instead of a pulse, emit a frequency-chirped continuous wave; the beat between the transmitted and returned signal gives range and radial velocity simultaneously. Coherent detection at the photodiode rejects all incoherent light — sun-insensitive and immune to cross-talk from other LiDARs at the same wavelength. Aeva Aeries II, Mobileye, Scantinel. Cost has historically been the barrier.

mm-wave radar

77 GHz automotive radars are FMCW: emit a linear frequency ramp (“chirp”) of duration T_c and bandwidth B_w. The beat frequency f_b = (2 · R · B_w) / (c · T_c) gives range R; Doppler shift across multiple chirps gives radial velocity; an antenna array (MIMO virtual array) gives azimuth and elevation by digital beamforming.

range resolution    ΔR = c / (2 · B_w)
velocity resolution Δv = λ / (2 · T_frame)
angular resolution  Δθ ≈ λ / (N · d_ant)

A TI AWR2944 with B_w = 4 GHz achieves ΔR = 3.75 cm at 250 m max range; a 4 Tx × 4 Rx MIMO array gives 16 virtual antennas → ~3.5° azimuth resolution. Weather robustness comes from the cm-class wavelength (λ = 3.9 mm at 77 GHz) being much larger than typical fog droplets and raindrops; attenuation is ~0.1 dB/km in heavy rain vs ~30 dB/km for 905 nm LiDAR.

Event cameras

A DVS / DAVIS pixel asynchronously emits a “+1” or “−1” spike when its log-intensity changes by ≥ a contrast threshold C (typically 10–25 %). No frame rate; output is a stream of (x, y, t, polarity) events at μs timestamps. Dynamic range > 120 dB (vs ~70 dB for a standard CMOS); native temporal resolution 1 μs. Prophesee Genx320 / EVK4, iniVation DVXplorer. The event stream demands a different algorithmic stack (asynchronous CNNs, spiking networks, event-by-event SLAM) — see [[Robotics/computer-vision-robotics]].

3. Practical math / worked examples

Worked example A — Stereo depth accuracy at range

A Stereolabs ZED 2i has baseline B = 12 cm and rectified focal length f = 1400 px (at 1080p / HD mode, 110° HFoV lens). Block-matching disparity precision on a well-textured surface is σ_d ≈ 0.5 px (good lighting, sub-pixel refinement enabled).

At Z = 1 m :  σ_Z = (1² · 0.5) / (1400 · 0.12)        = 0.003 m = 3 mm
At Z = 5 m :  σ_Z = (5² · 0.5) / (1400 · 0.12)        = 0.074 m = 7.4 cm
At Z = 10 m:  σ_Z = (10² · 0.5) / (1400 · 0.12)       = 0.30 m  = 30 cm
At Z = 20 m:  σ_Z = (20² · 0.5) / (1400 · 0.12)       = 1.19 m

Implication. Indoor mobile robot at 5 m: stereo is fine. Curbside autonomous vehicle at 50 m: stereo is useless — for the ZED 2i, σ_Z at 50 m is 7.4 m. Need LiDAR, or a stereo rig with B = 1 m (e.g. truck-roof-mounted), or fuse with monocular metric-depth networks (DepthAnything-V2, Metric3D, ZoeDepth) that exploit perspective cues to constrain absolute scale.

Baseline-tuning rule of thumb: pick B ≈ Z_max / 100 for ~1 % depth precision at maximum range. For Z_max = 20 m, B ≈ 20 cm; for Z_max = 100 m, B ≈ 1 m.

Worked example B — LiDAR point density on a wall

Velodyne VLP-16 specs: 16 lasers across 30° vertical FoV (so 2.0° vertical spacing), 0.1–0.4° horizontal angular resolution (configurable), 10 Hz spin rate, 600 000 points/s total firing rate (single-return mode).

At 30 m range, the lateral spacing of returns on a wall normal to the LiDAR is:

horizontal:  30 m · tan(0.2°) = 0.105 m ≈ 10.5 cm
vertical:    30 m · tan(2.0°) = 1.05 m

So one full sweep at 30 m gives a roughly 10 cm × 105 cm grid — very anisotropic, “rakes” with big vertical gaps. A pedestrian at 30 m (1.7 m tall, 0.4 m wide) gets hit by ~2 vertical lines × 4 horizontal columns = ~8 returns total. Detection works but classification is hard.

Compare Ouster OS1-128 at 30 m:

horizontal:  30 m · tan(0.18°)  = 0.094 m
vertical:    30 m · tan(45°/128) = 0.184 m

Same pedestrian: ~9 vertical × 4 horizontal = ~36 returns, sufficient for class label from PointPillars / CenterPoint. Modern 128-channel sensors are 5–10× denser at any given range, which is why automotive LiDAR moved decisively from 16/32 channels in 2018 to 128–300+ channels in 2024.

Worked example C — Camera intrinsics calibration (OpenCV)

Standard pinhole calibration procedure with cv2.calibrateCamera:

Print a 9×6 checkerboard with 25 mm square side. Mount flat on a rigid backing.
Capture 20–30 views at varied angles and translations across the FoV. Ensure the board covers all four image quadrants in at least 5 views.
Detect inner corners with cv2.findChessboardCornersSB (better sub-pixel than the legacy version).
Call cv2.calibrateCamera(objectPoints, imagePoints, imageSize, None, None) → returns K (intrinsics 3×3), dist (5 distortion coefficients: k1, k2, p1, p2, k3), per-view rotation/translation, and per-view reprojection error.

Expected reprojection error:

RMS reproj error	Diagnosis
< 0.2 px	Excellent — research-grade
0.2–0.5 px	Good — production-fine
0.5–1.0 px	Marginal — recapture with more views
> 1.0 px	Bad — likely motion blur, bad pattern detection, or wrong distortion model

For wide-FoV lenses (> 100°), switch to cv2.fisheye.calibrate (Kannala-Brandt, 4 coeffs). For multi-camera rigs (stereo, multi-cam), calibrate intrinsics first, then extrinsics with cv2.stereoCalibrate — fix the intrinsics by passing flags=CALIB_FIX_INTRINSIC. Re-calibrate annually or after any mechanical shock; thermal drift of plastic lens housings can shift principal point by 1–3 px over 50 °C, enough to matter for SLAM loop closure.

Worked example D — Camera bandwidth at 4K @ 60 fps

A 4K UHD (3840×2160) sensor at 60 fps, 12-bit RAW:

Pixels/frame  = 3840 · 2160                = 8.29 Mpx
Bits/frame    = 8.29 Mpx · 12              = 99.5 Mb
Bytes/frame   = 99.5 / 8                   = 12.4 MB
Bytes/s (raw) = 12.4 MB · 60 fps           = 746 MB/s

Transport options:

Interface	Bandwidth	Notes
USB 3.2 Gen 2	1.25 GB/s	OK; uncompressed possible at 4K30 only without ISP compression
USB 3.2 Gen 2×2	2.5 GB/s	Rare on host side
10 GbE	1.0 GB/s usable	Industrial standard; GigE Vision protocol
MIPI CSI-2 D-PHY 4-lane	9 Gbps (4×2.5 Gbps)	Direct sensor → SoC on embedded boards
MIPI CSI-2 C-PHY 3-trio	up to 22 Gbps	Newer Jetson Orin, Apple
GMSL2 (Maxim)	6 Gbps	Automotive, single coax up to 15 m
FPD-Link III (TI)	4.16 Gbps	Automotive, same role as GMSL
CoaXPress 12	12.5 Gbps	Industrial; one coax per lane

For 8 cameras at 4K60 (a typical autonomous-vehicle perimeter), aggregate raw bandwidth is ~6 GB/s — necessitating either on-sensor JPEG/H.265 compression, ISP-side downsampling, or sustained Jetson AGX Orin / TI TDA4VM-class compute. Lossy compression destroys photometric linearity; ML inference is robust to this, but stereo matching and photometric SLAM (DSO, LSD-SLAM) suffer.

Worked example E — Radar range, velocity, angular resolution

TI AWR2944 cascaded MIMO radar at 77 GHz: B_w = 4 GHz, T_c = 25 μs chirp, T_frame = 50 ms, 4 Tx × 4 Rx → 16 virtual antennas with d_ant = λ/2.

ΔR  = c / (2 · B_w)         = 3·10⁸ / (2 · 4·10⁹)       = 0.0375 m = 3.75 cm
Δv  = λ / (2 · T_frame)     = 0.0039 / (2 · 0.05)       = 0.039 m/s ≈ 0.14 km/h
Δθ  ≈ λ / (N · d_ant)        = 0.0039 / (16 · 0.00195)   = 0.125 rad ≈ 7.2° (uniform)

With non-uniform MIMO and super-resolution (MUSIC, ESPRIT), angular resolution drops to ~1–2° — competitive with low-end LiDAR. Range × resolution trade-off is fundamental: doubling chirp duration halves range resolution but doubles unambiguous velocity range.

4. Design heuristics

Range vs sensor-family mapping

Range	Recommended primary	Backup
0–0.5 m	ToF (VL53L8) or capacitive	Vision proximity
0.5–2 m	Structured light (RealSense D435), dToF (Lucid Helios)	RGB monocular
2–10 m	Active stereo (RealSense D455), passive stereo (ZED 2i)	Indoor LiDAR (Livox Mid-360)
10–100 m	Spinning LiDAR (Ouster OS1, Hesai QT128)	Stereo with B = 0.5 m
50–300 m	Solid-state LiDAR (Luminar Iris+, Innoviz Two) + mm-wave radar	Long-FL telephoto camera
> 300 m	mm-wave radar (Continental ARS548 LRR)	Adaptive long-range LiDAR

Indoor vs outdoor

Environment	Works	Marginal	Fails
Office, lab, warehouse	RGB, stereo, ToF, structured light, LiDAR	Magnetometer	—
Outdoor daylight	RGB, stereo, LiDAR, radar	iToF (sun saturation)	Structured light (Kinect pattern washes out)
Outdoor night	LiDAR, radar, NIR-illuminated camera	RGB without illumination	Passive stereo
Direct sunlight on lens	LiDAR (905 nm narrow-band filter), radar	RGB (lens flare)	iToF, sunlit IR ToF
Fog / mist	Radar, FMCW LiDAR (partial)	Direct-detection LiDAR	RGB, stereo
Heavy rain	Radar	LiDAR (degraded ~30–50 %)	RGB (water on lens)
Snow / falling snow	Radar	LiDAR (snowflakes = false returns)	RGB
Dust storm	Radar	LiDAR (saturated)	RGB

Update rate vs application

Application	Required rate
Static map building	5–10 Hz
Mobile robot navigation (≤ 2 m/s)	10–20 Hz
Autonomous vehicle (highway)	10–25 Hz LiDAR, 20–30 Hz radar, 30–60 Hz camera
Cobot collision avoidance	30–60 Hz
Drone obstacle avoidance (10 m/s)	60–120 Hz
VR / AR head-tracking	90+ Hz camera + 1 kHz IMU
High-speed picking	240+ Hz event camera or strobed RGB
Autonomous racing (Indy AC)	100+ Hz LiDAR, 200+ Hz camera

Sensor synchronisation

Software timestamps over Ethernet/USB drift by 1–50 ms with non-deterministic latency. For any multi-sensor stack:

Hardware trigger all sensors that support it from a common pulse — typically a 1 PPS signal from a GNSS receiver or an FPGA-generated sync line.
PTP / IEEE 1588 ([[Languages/Tier3/ros2-robotics-config]]) on every sensor’s network interface. Modern industrial cameras (Basler ace2, FLIR Blackfly S Mono GigE) support PTP timestamps directly.
Hardware timestamping at the capture buffer: e.g. NVIDIA Jetson MIPI CSI-2 has a per-frame hardware timestamp captured at start-of-frame, exposed through Argus/V4L2.
For LiDAR, treat each point (not each frame) as having its own timestamp — points within a single 100 ms scan span 100 ms of ego-motion, and motion-deskewing with IMU integration is essential at any vehicle speed.

Camera selection ladder

Requirement	Pick
Hobby / prototyping	Raspberry Pi Camera v3 (Sony IMX708), Arducam
Industrial machine vision	Basler ace2, FLIR Blackfly S, Allied Vision Alvium
Outdoor automotive	Sony IMX490, Sony IMX728, ON Semi AR0820AT (LFM, HDR, AEC-Q100)
Drone obstacle avoid	Global-shutter rolling-replacement (Sony IMX296, OmniVision OV9281)
HDR scene (tunnel exits, welding)	LFM-equipped sensors (Sony IMX390, IMX490)
High-speed	Sony IMX250 (5 Mpx, 163 fps), Vision Datum LU series
Low-light / NIR	Sony Starvis IMX462, IMX485; intensified Onsemi KAE
Polarization	Sony IMX250MZR / IMX264MZR (mono polar)
Event	Prophesee Genx320 (320×320, automotive), IMX636 (1280×720)

Global vs rolling shutter. Rolling-shutter sensors expose row-by-row over typically 5–30 ms. On any moving robot or any sensor moving through space, this introduces rolling-shutter skew — straight lines bow, fast objects shear. SLAM and VIO algorithms break down or require explicit rolling-shutter compensation (Hedborg 2012). For drones, AGVs, automotive: pay the BoM premium for global shutter (~2× cost).

Stereo geometry tuning

Baseline rule: B ≈ Z_max / 100 for ~1 % depth precision at range.
Focal length trades FoV for angular pixel resolution; a 50° HFoV at 1080p ≈ 2000 px/rad ≈ 0.029°/pixel.
Disparity search range d_max = f · B / Z_min sets compute cost; SGBM/RAFT cost scales linearly with disparity range. Limit by setting a sane minimum range (e.g. Z_min = 0.5 m for indoor robot, 5 m for highway).
Texture is everything. Stereo matching collapses on blank walls, glass, ceilings. Hence “active stereo” with IR pattern projector (RealSense D435), or accept holes and fuse with other modalities.

GPU vs FPGA vs DSP

Pipeline	Best fit	Why
Prototyping CNN inference	GPU (Jetson Orin, RTX)	Toolchain, batch size, flexibility
Production CNN inference on cost-sensitive embedded	NPU / accelerator (Hailo-8, Coral Edge TPU, Qualcomm AI engine)	TOPS/W
Safety-rated fixed-latency vision	FPGA (Xilinx Kria K26, Lattice CrossLink)	Deterministic latency, ISO 26262
Stereo SGM at 4K60	FPGA or fixed-function ISP block	Memory bandwidth, parallel disparity sweep
Event-camera processing	FPGA / neuromorphic (Loihi 2, Akida)	Asynchronous spike processing
Low-power VIO on drone	DSP + tightly-coupled IMU (Qualcomm Hexagon, Snapdragon Flight)	mW-scale power

5. Components & sourcing — real parts

RGB cameras (industrial / robotics)

Vendor	Family	Sensor format	Notes
Basler	ace 2, boost, dart	up to 24 Mpx	GigE Vision, USB3 Vision, MIPI-CSI variants
FLIR / Teledyne	Blackfly S, Oryx, Forge	up to 65 Mpx	High SNR; industrial standard
Allied Vision	Alvium 1500/1800, Mako, Manta	up to 31 Mpx	MIPI-CSI for embedded, GigE for stationary
Lucid Vision	Triton, Atlas	up to 65 Mpx	IP67-rated outdoor
IDS	uEye+, Ensenso	broad	German machine vision incumbent
Vieworks	VC, VP, VT	up to 250 Mpx	Inspection, high-resolution
The Imaging Source	DFK / DMK	5–20 Mpx	Lower cost, USB3
Arducam	Pi HQ, Orin module	embedded	MIPI-CSI for Jetson, Pi
Leopard Imaging	LI-IMX477, GMSL2 modules	varied	Automotive over GMSL
e-con Systems	See3CAM, NileCAM	varied	MIPI for Jetson / RB5

Underlying CMOS sensor families (sourced separately on production boards): Sony IMX477 (12 Mpx, 60 fps, used in Pi HQ Camera), IMX585 (8 Mpx Starvis-2 low light), IMX490 (5.4 Mpx 120 dB LED-flicker-mitigated automotive), IMX661 (127 Mpx APS-H, machine vision); ON Semi AR0820AT (8 Mpx HDR automotive); OmniVision OV9281 (1 Mpx mono global shutter, drone VIO).

Stereo and RGB-D cameras

Vendor	Model	Tech	Range	Notes
Stereolabs	ZED 2i	Passive stereo, B = 12 cm	0.3–20 m	1080p120, IP66, integrated IMU
Stereolabs	ZED X / X Mini	Active stereo + GMSL2	0.2–20 m	Automotive/AGV; sun-tolerant
Intel RealSense	D435i / D435f	Active stereo + IR projector + IMU	0.3–10 m	EoL announced; still in field
Intel RealSense	D455 / D456	Active stereo, B = 9.5 cm	0.6–20 m	D456 is IP65 / IP67
Intel RealSense	D457	GMSL2 industrial	0.6–20 m	Designed for AGV / fixed installs
Orbbec	Femto Bolt, Femto Mega	dToF + RGB	0.25–5.46 m	Azure Kinect successor
Orbbec	Gemini 2 / 335	Structured light + RGB	0.15–10 m	Indoor robotics
Lucid Vision	Helios 2	dToF (Sony IMX556)	0.3–8.3 m	IP67, GigE; sun-tolerant
Photoneo	PhoXi 3D Scanner, MotionCam 3D	Structured light	0.5–6 m	High-precision bin picking
Zivid	Zivid 2, Zivid 2+	Structured light	0.3–3 m	Sub-mm bin picking
Mech-Mind	Mech-Eye Pro M / S	Structured light	0.3–3 m	Chinese bin-picking standard
Pickit	Pickit M	Structured light	0.4–2.5 m	Industrial bin pick
Microsoft	Azure Kinect (EoL 2023)	dToF + RGB + 7-mic	0.5–5.5 m	Still in research use

LiDAR — spinning

Vendor	Model	Channels	Range	Notes
Velodyne (now Ouster)	VLP-16 (Puck)	16	100 m	Legacy 2014 standard; cheap on eBay
Velodyne (now Ouster)	HDL-32E / HDL-64E	32 / 64	100 / 120 m	DARPA-era; obsolete in new builds
Ouster	OS0-64 / OS0-128	64 / 128	35 m	90° vertical FoV; near-field
Ouster	OS1-32 / OS1-128	32 / 128	200 m / 120 m	Mainstream AGV/AMR sensor
Ouster	OS2-64 / OS2-128	64 / 128	240 m	Long-range autonomy
Hesai	Pandar QT64 / QT128	64 / 128	30 / 50 m	Short-range automotive perception
Hesai	Pandar 40P / 64 / 128	40 / 64 / 128	200 m	Robotaxi standard in China
Hesai	Pandar XT32M2X	32	300 m	Long-range single-return
RoboSense	RS-LiDAR-16 / 32 / 80	16 / 32 / 80	150–250 m	Lower-cost mechanical
Livox	Mid-360 / HAP / Avia	non-repetitive	40–450 m	Petal-pattern, low-cost
Cepton	Vista-X90 / Nova	solid-state MMT	150–200 m	Frictionless mirror tech

LiDAR — solid-state and FMCW

Vendor	Model	Architecture	Range	Notes
Innoviz	Innoviz One, Innoviz Two	MEMS solid-state	250–300 m	Tier-1 automotive (BMW, VW)
Luminar	Iris, Iris+	1550 nm fiber laser + MEMS	300–500 m	Volvo EX90, Mercedes
Hesai	AT128 / AT512 / ATX	MEMS-like 1D scanning	200–300 m	Highest-volume L3 automotive
Aeva	Aeries II	FMCW MEMS	500 m	Volkswagen, Daimler Truck
Mobileye	Lumotive OPA	OPA solid-state	varies	2025–2026 ramp
Valeo	Scala 3	mechanical scanning	200 m	First-mass-production AD LiDAR (Audi A8 2018 onward)
Continental	HRL131	flash + scanning hybrid	150 m	Tier-1 automotive
Sense Photonics	Osprey / Sense One	flash dToF	30–100 m	Industrial short-range
Cepton	Nova	MMT solid-state	30 m	Compact, near-field

mm-wave automotive radar

Vendor	Model	Band	Notes
Continental	ARS548 LRR	77 GHz	4D imaging radar, 300 m
Continental	SRR520 SRR	77 GHz	Short range, blind-spot
Bosch	LRR4 / LRR5	77 GHz	Mainstream ACC radar
Aptiv	RACam (radar + camera)	77 GHz	Integrated front sensor
TI	AWR1843 / AWR2243 / AWR2944	77 GHz	Reference SoC / cascaded chips
NXP	TEF82xx / S32R45	77 GHz	RFCMOS + processor stack
Arbe	Phoenix	79 GHz	4D imaging, 2304 virtual channels
Uhnder	F1 (digital-code mod)	77 GHz	First DCM (CDMA) automotive radar
Smartmicro	UMRR-96	77 GHz	AGV / industrial

Event cameras and emerging

Vendor	Model	Resolution	Notes
Prophesee	EVK4 (IMX636)	1280×720	Industrial dev kit
Prophesee	Genx320	320×320	Automotive, ultra-low power
iniVation	DVXplorer / DAVIS346	640×480 / 346×260	Research
Sony	IMX636 / IMX637	up to HD	Direct Sony silicon (also in EVK4)

Compute pairing

Platform	TOPS / NPU	Camera lanes	Notes
NVIDIA Jetson Orin Nano (8 GB)	40 TOPS	4 × MIPI CSI-2	Hobby + prosumer robotics
NVIDIA Jetson Orin NX 16 GB	100 TOPS	8 × MIPI CSI-2	Drone / AMR
NVIDIA Jetson AGX Orin 64 GB	275 TOPS	16 × MIPI CSI-2	Autonomous vehicle research
NVIDIA DRIVE Orin / Thor	254 / 1000 TOPS	many	Production automotive
Qualcomm RB5 / RB6	15 / 30+ TOPS	7 × MIPI CSI-2	Snapdragon Flight drone
Intel NUC + Arc A380	~10 TFLOPS GPU	USB / PCIe	Indoor robotics
Hailo-8 / Hailo-10	26 / 40 TOPS	PCIe addon	Edge inference accelerator
AMD/Xilinx Kria K26 / KV260	~1.2 TOPS (Zynq DPU)	MIPI CSI-2, native	FPGA fixed-latency pipelines
Google Coral Edge TPU	4 TOPS	USB/PCIe	Quantised int8 inference
TI TDA4VM / TDA4AEN	8 TOPS + DSP	4 × CSI-2	Automotive ECU
Mobileye EyeQ6H	34 TOPS	many	Robotaxi-grade
Apple M2 / M4 SoC	16–38 TOPS NPU	shared LPDDR	ARKit, iPhone Lidar pipeline

6. Reference data

Sensor-family comparison (quick reference)

Family	Range	Accuracy	Refresh	Cost	Light required	Weather
RGB monocular	infinite (no depth)	n/a depth	30–240 fps	$30-$ 5k	yes	poor
Passive stereo	0.3–30 m	1 cm @ 3 m, 30 cm @ 10 m	30–120 fps	$400-$ 2k	yes	poor
Active stereo (IR)	0.2–10 m	1 cm @ 1 m, 10 cm @ 5 m	30–90 fps	$300-$ 2k	sun-degrades	indoor-fine
Structured light	0.15–5 m	1 mm–1 cm	5–30 fps	$1–25k	sun-degrades	indoor-only
iToF	0.3–8 m	3 mm–3 cm	15–60 fps	$200-$ 3k	sun-saturates	indoor-fine
dToF	0.1–10 m	3 mm–3 cm	15–60 fps	$300-$ 5k	sun-tolerant	indoor + outdoor day
Spinning LiDAR	0.5–300 m	2–5 cm	5–25 Hz	$4 k -$ 80k	no	rain ~30 % loss
Solid-state LiDAR	1–500 m	3–10 cm	10–25 Hz	$700-$ 20k	no	rain ~30 % loss
FMCW LiDAR	1–500 m	3 cm + 0.1 m/s	10–20 Hz	$5 k -$ 30k	no	rain ~20 % loss
mm-wave radar	0.2–300 m	4 cm range, 0.1 m/s	10–50 Hz	$50-$ 500	no	excellent
Event camera	n/a depth	1 μs latency	async	$500-$ 5k	no (HDR)	excellent (wide DR)

LiDAR vendor / model table (2026)

Vendor / Model	Channels	Max range (10 % refl)	FoV (H×V)	Points/s	Frame rate	Wavelength
Velodyne VLP-16	16	100 m	360°×30°	600 k	5–20 Hz	903 nm
Ouster OS1-128	128	200 m (at 10 %)	360°×45°	5.2 M	10–20 Hz	865 nm
Ouster OS2-128	128	240 m	360°×22.5°	5.2 M	10 Hz	865 nm
Hesai Pandar 128	128	200 m	360°×40°	3.5 M	10–20 Hz	905 nm
Hesai AT128	128	200 m	120°×25.4°	1.5 M	10 Hz	905 nm
Hesai QT128	128	50 m	360°×104.2°	3.5 M	10 Hz	905 nm
Livox Mid-360	non-repeat	40 m	360°×59°	200 k	10 Hz	905 nm
Innoviz Two	n/a (solid)	300 m	120°×40°	varies	10–20 Hz	905 nm
Luminar Iris+	n/a	500 m	120°×26°	varies	10–30 Hz	1550 nm
Aeva Aeries II	FMCW	500 m	120°×30°	1.6 M	10–20 Hz	1550 nm
RoboSense M1	solid	200 m	120°×25°	750 k	10 Hz	905 nm
Valeo Scala 3	mech	200 m	133°×11°	varies	25 Hz	905 nm
Continental HRL131	hybrid	150 m	120°×28°	varies	25 Hz	905 nm

RGB-D camera comparison

Model	Tech	Depth resolution	Frame rate	Min/Max range	IMU	Cost (USD)
RealSense D435i	Active stereo	1280×720	90 fps	0.3–3 m (typ)	yes	$300
RealSense D455	Active stereo	1280×720	90 fps	0.6–6 m	yes	$400
RealSense D457	Active stereo (GMSL)	1280×720	90 fps	0.6–20 m	yes	$700
ZED 2i	Passive stereo	2208×1242	100 fps	0.3–20 m	yes	$500
ZED X	Active+GMSL2	1920×1200	60 fps	0.2–20 m	yes	$1500
Orbbec Femto Bolt	dToF	1024×1024	30 fps	0.25–2.88 m	yes	$400
Orbbec Femto Mega	dToF	1024×1024	30 fps	0.25–5.46 m	yes	$600
Lucid Helios 2	dToF	640×480	30 fps	0.3–8.3 m	no	$1700
Photoneo PhoXi M	Structured light	2064×1544	0.25 fps (full)	0.5–2 m	no	$13k
Zivid 2+ M60	Structured light	1944×1200	5 fps	0.6–2 m	no	$14k
Microsoft Azure Kinect (EoL)	dToF	1024×1024	30 fps	0.5–5.46 m	yes	$400 used

Camera RAW bandwidth (uncompressed)

Resolution	fps	Bit depth	Bandwidth
VGA 640×480	60	8	17.6 MB/s
720p 1280×720	30	8	26.4 MB/s
1080p 1920×1080	30	8	59.3 MB/s
1080p 1920×1080	60	12	178 MB/s
4K 3840×2160	30	12	373 MB/s
4K 3840×2160	60	12	746 MB/s
8K 7680×4320	30	12	1.49 GB/s
12 Mpx	60	12	1.08 GB/s
24 Mpx	30	12	1.08 GB/s
50 Mpx	30	12	2.25 GB/s

GPU / accelerator inference rates (typical)

Platform	YOLOv8-S @ 640	PointPillars (LiDAR)	DepthAnything-V2 Small
Jetson Orin Nano (8 GB)	~60 fps INT8	~10 Hz	~15 fps
Jetson Orin NX 16 GB	~140 fps INT8	~25 Hz	~35 fps
Jetson AGX Orin 64 GB	~330 fps INT8	~70 Hz	~80 fps
Hailo-8	~150 fps INT8	n/a (no point-net)	~25 fps
Coral Edge TPU	~30 fps INT8 (limited model size)	n/a	n/a
RTX 4090	~1500 fps FP16	~250 Hz	~400 fps

(Numbers vary substantially with batch size, INT8/FP16 quantisation, and TensorRT vs ONNX runtime. Treat as order-of-magnitude.)

Weather robustness matrix

Sensor	Clear	Light rain	Heavy rain	Fog 50 m	Snow	Dust	Direct sun
RGB camera	★★★★★	★★★	★	★	★	★	★★ (flare)
Stereo	★★★★★	★★★	★	★	★	★	★★
Structured light	★★★★★	n/a outdoor	n/a outdoor	n/a outdoor	n/a outdoor	n/a outdoor	✗
ToF (iToF)	★★★★★	★★	★	★	★	★	✗
ToF (dToF, narrowband)	★★★★★	★★★	★★	★★	★★	★★	★★★
LiDAR (905 nm)	★★★★★	★★★★	★★★	★★	★★	★★	★★★★
LiDAR (1550 nm)	★★★★★	★★★★	★★★	★★★	★★★	★★★	★★★★
FMCW LiDAR (1550 nm)	★★★★★	★★★★	★★★	★★★	★★★	★★★	★★★★
mm-wave radar	★★★★	★★★★★	★★★★★	★★★★★	★★★★★	★★★★★	★★★★★
Event camera	★★★★	★★★	★★	★	★★	★★	★★★★★ (HDR)

7. Failure modes & debugging

Lens flare and ghosting — sun, headlights, or bright LEDs in or near frame produce internal reflections off lens elements. Causes phantom features and false detections downstream. Mitigation: lens hood / baffle, anti-reflection multi-coating on the lens, LFM-equipped sensors for periodic LEDs.
Rolling-shutter skew — straight vertical edges bow into S-curves on moving platforms; fast objects shear. Detection: stationary check pattern with the robot moving — should be straight. Fix: switch to global-shutter sensor (Sony IMX296, IMX392, IMX422; ON Semi AR0234) for any platform faster than ~1 m/s.
Auto-exposure hunting — sun briefly behind a building, then in frame, sends AE chasing back and forth on a 100 ms scale, momentarily over/under-exposing every frame. Disable AE in mission mode; fix exposure from a calibrated lookup keyed to ambient lux, or use HDR sensors with 120+ dB intra-frame dynamic range.
Depth holes — low texture (stereo) — blank walls, ceilings, glass. Stereo matchers return no correspondences. Fix: assisted stereo (IR pattern projector), fuse with ToF/LiDAR.
Depth holes — specular reflections (ToF) — mirrors, polished metal, water. Photons bounce away from the sensor. Fix: polarization camera variant; LiDAR; flag-and-discard region.
Depth holes — transparent surfaces — glass walls and windows transmit IR. Fix: same as specular; or treat glass as “ground truth wall” via secondary cues (CAD model, frame edges).
LiDAR motion distortion (skewing) — a spinning LiDAR rotating at 10 Hz spans 100 ms; the robot moves several cm during that interval. Returns within a single “frame” come from different ego-positions. Mitigation: per-point timestamp + IMU-integrated ego-pose deskewing (KISS-ICP, FAST-LIO2 do this internally).
iToF multi-path — light bounces off two surfaces before returning, giving a phase reading that is the sum of two distances. Manifests as ghost-depth halos behind objects or curved-looking corners. Fix: use multiple modulation frequencies + AI denoising; or switch to dToF.
iToF sun saturation — outdoor sunlight contains broad-spectrum IR that overwhelms the photodetector before the modulated signal can be demodulated. Fix: indoor only, or use narrow-band 1550 nm dToF/LiDAR.
Rain droplets on LiDAR window — falling drops generate spurious near-field returns; standing drops scatter the outgoing beam. Mitigation: hydrophobic coating, heated window, water-shedding geometry, dual-return filtering (drop = near return, scene = far return).
Cross-talk between LiDARs at the same wavelength — two LiDARs of the same brand within line-of-sight may interpret each other’s pulses as their own returns. Manifests as random “ghost” points in the air. Fix: per-unit pulse coding (Innoviz, modern Hesai support this), FMCW (intrinsically immune), or wavelength separation.
Bandwidth saturation — USB 3.0 host controller can sustain ~3.2 Gb/s real-world; running two D455s + a ZED 2i on the same USB3 root hub WILL drop frames intermittently. Diagnose with dmesg | grep usb and per-frame timestamps. Fix: separate USB host controllers (PCIe-USB cards), GigE Vision over wired Ethernet, MIPI CSI for embedded.
Time-sync drift in software-timestamped multi-camera rigs — observed as growing residuals in stereo / VIO over minutes. Fix: hardware trigger from a common source; PTP across Ethernet links; use hardware capture timestamps (V4L2, Argus, GigE Vision PTP).
Lens calibration drift — plastic-mount lenses can shift 1–3 px in principal point over a 50 °C swing or a single 5 g impact. Manifests as SLAM map drift and degraded reprojection error. Mitigation: metal mounts; periodic recalibration; online self-calibration in VIO (VINS-Fusion, ORB-SLAM3 do this).
GPU memory leaks in long-running perception pipelines — typical for ad-hoc Python/PyTorch wrappers that allocate tensors per frame without reuse. Monitor with nvidia-smi, nsys. Fix: pre-allocate buffers, use torch.cuda.empty_cache() between scenes, prefer C++/TensorRT for production.
Frame stale in pipeline — perception consumes a 100 ms-old image because the queue backed up. Fix: drop-on-write queues with timestamp gating; reject any frame older than 2× expected period.
Coordinate-frame mistakes — the OpenCV camera convention is +Z forward, +X right, +Y down; ROS / REP-103 uses +Z up, +X forward, +Y left; OpenGL / Blender uses +Z toward viewer, +Y up; Unreal is +Z up, +X forward. Mismatched conventions produce flipped axes that are subtle bugs — features look “almost right” until you add a second sensor. Fix: nail down the frame_id of every published topic; static_transform_publisher diagrams; rviz to inspect.
Stereo disparity bias on tilted floors — block-matchers assume horizontal epipolar lines; a misrectified rig produces a depth slope across the image. Fix: re-verify rectification after any shock; calibrate periodically.
Radar ghost targets — multi-bounce returns (radar → guardrail → vehicle → radar) appear as targets that don’t exist. Fix: micro-Doppler signature filtering; map-based suppression of static reflectors.
Radar angular ambiguity — sparse MIMO arrays alias; one true target can appear at multiple azimuths. Fix: super-resolution algorithms (MUSIC, IAA, compressed sensing); larger virtual array.
Photometric vs geometric calibration drift — VIO drifts because feature descriptors degrade (sun angle, exposure changes), not because IMU is bad. Fix: photometric VO (DSO) is fragile here; ORB / SuperPoint features are more robust.
Sensor bracket vibration resonance — a 6 mm 3D-printed bracket has its first mode at 30–80 Hz; a motor harmonic at 50 Hz excites it, smearing the camera image. Fix: stiffen the mount (FEM), add silicone damping, or move the sensor to a stiffer location.

8. Case studies

Tesla Autopilot HW3 / HW4 — vision-only autonomy

Tesla’s production fleet from 2018 onward runs an 8-camera vision-only perception stack (radar deprecated mid-2021 on most models, ultrasonics phased out in 2022). The eight cameras are three forward (narrow, main, wide), two side-pillar, two repeater (B-pillar), one rear, each based on a Sony IMX490-class 1.2 Mpx HDR automotive CMOS sensor with 120 dB dynamic range, LED flicker mitigation, and AEC-Q100 qualification. Capture is 36 fps per camera.

The HW3 compute board uses Tesla’s custom FSD Chip (two redundant NPUs, ~72 TOPS each, GF 14 nm) running a shared “Hydranet” multi-head CNN that consumes all 8 video streams in a unified bird’s-eye-view space (the occupancy network architecture from AI Day 2022). HW4 (HW3 successor, launched late 2023) roughly triples the compute and ups camera resolution to 5 Mpx. Tesla’s design thesis — controversial but consistent — is that LiDAR is unnecessary if camera-only neural networks can match the geometric inference; the company has staked its full-autonomy roadmap on this. The trade-off: vision-only stacks struggle in heavy rain, dense fog, and at night without lit infrastructure, where Waymo / Cruise / Zoox LiDAR stacks maintain performance.

The 5th-generation Waymo Driver (deployed in I-Pace and Jaguar test fleets, 2020 onward) carries one of the most sensor-rich production autonomy stacks of any commercial vehicle:

4 LiDARs — one long-range “Honeycomb” top-mounted spinning unit with 360° / 200+ m / ~1 cm range precision; one rooftop “Laser Bear” perimeter unit; two short-range corner LiDARs for the near-field around the vehicle.
29 cameras spanning long-range telephotos (for traffic-light reading at 250 m), perimeter cameras, and a high-resolution “peripheral vision” mast for cross-traffic.
6 radars (mostly 77 GHz FMCW) for adverse-weather redundancy and long-range Doppler.
A custom Waymo compute platform fusing the lot.

The Honeycomb LiDAR achieves ~5 cm range accuracy at 100 m; classification accuracy on pedestrians at 100 m sits above 99 % in clear daylight, dropping to ~90 % in heavy rain. The Waymo Open Dataset (released 2019, expanded 2023–2024) is the publicly-traceable view into this stack’s performance.

Skydio X10 — autonomous drone obstacle avoidance

Skydio’s flagship enterprise drone (2023 release, succeeding X2D) is built around six Sony IMX-class CMOS sensors arranged as a true 360° hex-camera array — three on the top, three on the bottom — providing full hemispherical depth coverage with no blind spot. Each pair has a wide-baseline overlap (~10 cm), and the on-device visual SLAM stack runs on an NVIDIA Jetson Orin NX with a custom Skydio runtime, fusing the six-camera optical flow + a Bosch BMI088-class IMU + GNSS for sub-second-latency obstacle avoidance at 10 m/s flight speeds.

No LiDAR, no radar, no ultrasonic — pure vision. The architectural bet (similar to Tesla but for very different reasons) is that flight envelopes are bounded enough that vision-only depth at 5–25 m suffices, and the SWaP cost of even a small Livox Mid-360 LiDAR (265 g, 6 W) is unjustifiable on a 1.4 kg drone. The result is reliable autonomous flight through forests, bridges, and complex industrial sites — used by US DOT, fire departments, and military inspection units. Performance degrades at night (visible-light cameras with no on-board illumination); in 2026 Skydio introduced a NightSense IR mode using LWIR + active illumination.

Boston Dynamics Spot — robust outdoor perception

Each Spot platform carries 5 stereo camera assemblies (front, sides, rear) using PMD ToF + RGB sensors plus integrated IMU — Boston Dynamics’ “Stereo Camera” modules each combine a structured-light projector, two grayscale stereo cameras, and a color RGB camera. The combined 360° depth perception runs at 15 Hz and feeds the proprioceptive whole-body controller. For longer-range mapping in Spot Enterprise (mining, oil & gas), Boston Dynamics offers an optional Velodyne VLP-16 or Ouster OS0 spinning LiDAR via the EAP (Enhanced Autonomy Package) payload bay — this enables persistent map building over multi-kilometre patrol routes and integrates with the company’s Autowalk feature.

The Spot CORE compute is an Intel x86 module (initially NUC-class, upgraded in 2023); perception runs on-device with no off-board dependency in basic operation. Operators commonly add a Jetson Orin NX in the EAP payload bay for custom ML perception on top of the BD stack.

Ouster Gemini — robotaxi reference architecture

Ouster’s Gemini perception software stack (acquired from Velodyne, integrated 2024) is the de-facto reference for retrofitting commercial vehicles into AV testbeds. A typical Gemini deployment uses 2 × OS1-128 rooftop spinning LiDARs in a “X” configuration (eliminates blind spots from the LiDAR’s vertical FoV gap), 1 × OS0-128 front bumper short-range, 6–8 ARS548 radars at 77 GHz around the perimeter, and 6–10 Sony IMX490 / OmniVision OX08AT cameras. Compute is split between an NVIDIA DRIVE Orin platform (perception) and a separate planning ECU. This is the reference for shuttle / mining / agriculture autonomy programs, and is what most Tier-2 system integrators build atop.

9. Cross-references

[[Robotics/sensors-pose-motion]] — companion proprioceptive sensors (encoders, resolvers, IMUs); VIO and SLAM tightly couple these with the exteroceptive sensors here.
[[Robotics/sensors-force-tactile]] — force, torque, and tactile sensing (planned); the third major exteroceptive modality not covered here.
[[Robotics/slam]] — Simultaneous Localisation and Mapping (planned); the algorithmic layer that fuses these sensors into a coherent map and pose estimate.
[[Robotics/computer-vision-robotics]] — feature extraction, object detection, semantic segmentation, depth networks (planned); consumes the RGB and depth streams from this note.
[[Robotics/bayesian-estimation]] — Kalman / EKF / particle filter fusion (planned); the mathematics of combining multi-rate, multi-modal sensors with appropriate noise models.
[[Robotics/kinematics-dh]] — extrinsic calibration of camera-to-body and camera-to-camera transforms uses the same SE(3) machinery.
[[Engineering/op-amps]] — analog front-end for SPAD readout, photodiode transimpedance amplifiers in LiDAR, automotive radar IF stages.
[[Engineering/electromagnetics-engineering]] — radar physics (FMCW chirp design, antenna arrays, beamforming); LiDAR optical-system trade-offs.
[[Engineering/semiconductor-devices]] — CMOS image sensor physics, SPAD avalanche statistics, VCSEL laser-diode operation.
[[Languages/Tier3/robotics-control]] — ROS 2 / DDS topic conventions for sensor_msgs/PointCloud2, sensor_msgs/Image, sensor_msgs/CameraInfo.
[[Languages/Tier3/3d-scene]] — point-cloud and 3D-mesh formats (planned, USD/glTF/PLY/LAS/E57); the file-format layer for stored map data.

10. Citations

Szeliski, R. (2022). Computer Vision: Algorithms and Applications (2nd ed.). Springer. Free PDF at szeliski.org/Book. The standard practitioner reference; chapters 11–12 cover stereo, structure-from-motion, depth.
Hartley, R. & Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge University Press. The canonical reference for projective geometry, epipolar geometry, camera calibration.
Forsyth, D. A. & Ponce, J. (2011). Computer Vision: A Modern Approach (2nd ed.). Pearson.
Hirschmüller, H. (2008). “Stereo Processing by Semiglobal Matching and Mutual Information.” IEEE TPAMI 30(2), 328–341. DOI 10.1109/TPAMI.2007.1166. The SGBM/SGM algorithm at the heart of nearly all production stereo.
Beraldin, J.-A., Blais, F. & Lohr, U. (2010). “Laser Scanning Technology.” In Airborne and Terrestrial Laser Scanning. Whittles Publishing.
Geiger, A., Lenz, P. & Urtasun, R. (2012). “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite.” CVPR. The foundational AV perception benchmark using Velodyne HDL-64E.
Sun, P. et al. (2020). “Scalability in Perception for Autonomous Driving: Waymo Open Dataset.” CVPR. arXiv:1912.04838.
Caesar, H. et al. (2020). “nuScenes: A multimodal dataset for autonomous driving.” CVPR. The 6-camera, 1-LiDAR, 5-radar reference dataset.
Hedborg, J. et al. (2012). “Rolling Shutter Bundle Adjustment.” CVPR. The standard rolling-shutter compensation for SfM.
Gallego, G. et al. (2022). “Event-Based Vision: A Survey.” IEEE TPAMI 44(1). arXiv:1904.08405. Definitive event-camera reference.
Velodyne (2019). VLP-16 User Manual and Programming Guide, rev D.
Velodyne (2017). HDL-64E S3 User’s Manual rev J.
Ouster (2024). OS1 / OS2 Sensor Hardware User Manual rev 7.
Hesai (2023). Pandar XT32 / QT128 User Manual.
Innoviz (2023). Innoviz Two Technical Brief.
Luminar Technologies (2023). Iris+ LiDAR Datasheet.
Aeva (2023). Aeries II FMCW LiDAR Datasheet.
Intel (2024). Intel RealSense Product Family Datasheet, rev 26.
Stereolabs (2024). ZED 2i Camera Specification.
Prophesee (2023). Metavision EVK4 Datasheet (IMX636 sensor).
Continental Engineering Services (2023). ARS548 RDI Premium Radar Datasheet.
Texas Instruments (2024). AWR2944 Single-Chip 77-GHz FMCW Radar Sensor datasheet, SWRS257A.
Bosch (2024). SMI230 Inertial Measurement Unit for ADAS product brief.
Sony Semiconductor Solutions (2023). IMX490 / IMX490AAQR Diagonal 6.46 mm 5.4 Mpx HDR Image Sensor datasheet.
NVIDIA (2024). Jetson Orin Series Datasheet — Jetson Orin Nano / NX / AGX Orin.
Apple (2020). iPad Pro 2020 LiDAR Scanner — ARKit Documentation.
Tesla (2022). AI Day 2022 presentation transcripts; (2024). We, Robot event materials.
Waymo (2023). Waymo Driver — 5th Generation Technical Overview (blog series).
Skydio (2023). Skydio X10 Product Brief.
Boston Dynamics (2023). Spot Robot Hardware Specification, rev 4.0.
ISO 13855:2024. Safety of machinery — Positioning of safeguards with respect to the approach speeds of parts of the human body. Sets minimum-distance requirements that drive sensor-range and refresh-rate spec for safety-rated perception.
IEC 61496-3:2018. Safety of machinery — Electro-sensitive protective equipment — Particular requirements for active opto-electronic protective devices responsive to diffuse reflection (AOPDDR). The standard for safety-rated 2D LiDAR scanners (SICK S300, Keyence SZ-V).

Compendium

Explorer

LiDAR, Depth Cameras & RGB — Robotics Exteroception Reference