Autonomous Driving — Perception, Prediction, Planning, Control Stack

The full self-driving stack as it exists in production fleets as of 2026: Waymo (5th-Gen Driver on I-Pace, 6th-Gen on Zeekr), Cruise (paused since 2023), Tesla FSD (v12/v13 end-to-end), Mobileye SuperVision, Wayve AV2.0, Baidu Apollo, Pony.ai, Aurora Driver, Nuro R3, Zoox bidirectional pod. The stack: HD-map vs mapless, sensor fusion (camera-only vs lidar-fused), prediction-conditioned planning, MPC + behavior trees + IDM for control, and the safety-case + ODD definitions that determine what level the system is graded at under SAE J3016.

1. At a glance

An autonomous vehicle (AV) is a wheeled robot operating on a road network at automotive scale — 30–150 km/h, 1500–3000 kg mass, with the constraint that any failure is potentially lethal to occupants or other road users. The SAE J3016 levels frame the conversation:

L0: no automation; AEB only.
L1: single-axis assist; adaptive cruise OR lane-keep.
L2: partial automation; driver responsible. Tesla AP/FSD-Beta, GM Super Cruise (hands-free in ODD), Ford BlueCruise, Mercedes Drive Assist.
L3: conditional, eyes-off in ODD; driver takes back control on a takeover request. Mercedes Drive Pilot (Germany ≤ 95 km/h; CA/NV ≤ 40 mph), Honda Sensing Elite (JP), Audi A8 (cancelled 2020).
L4: high automation, no driver in ODD; Minimum Risk Maneuver (MRM) on fault. Waymo, Cruise (paused since 2023), Apollo Go, Pony.ai, Zoox, Nuro, Aurora trucking.
L5: full automation, no ODD limit. No deployed product; no credible roadmap.

The economically significant target is robust L4 for ride-hail and delivery; L3 is an in-between that automakers have largely de-prioritised in 2024–2025 (Audi cancelled L3; Tesla rejected J3016 framing entirely and pushes “FSD Supervised” as a marketing category).

The canonical architecture in 2026 splits across two camps:

Modular stack (Waymo, Cruise, Aurora, Mobileye, Apollo). Independent ML-trained modules for sensing → perception → prediction → planning → control with explicit interfaces (sensor frames, detections, tracks, trajectories). Decades of automotive-grade verification methods (SOTIF ISO 21448, ISO 26262 functional safety) wrapped around each.
End-to-end stack (Tesla FSD v12/v13, Wayve AV2.0, comma.ai openpilot post-2023, Helm.ai, Ghost Autonomy 2.0). A single neural network maps pixels (and sometimes radar/IMU) to motor torques + steering rates.

Both approaches need explicit safety filters; both make the same hard problems hard.

Where this sits:

non-holonomic mobile base supplies the kinematics.
radar supplies the raw measurements.
SLAM supplies localization (HD-map + LiDAR or visual-INS).
Computer vision supplies detection / segmentation.
Bayesian estimation supplies multi-object tracking.
Sampling + lattice planners supply behavior + trajectory.
MPC supplies the low-level controller.
Functional safety is the wrapper: ISO 26262 + ISO 21448 + UL 4600.

First ask:

What’s the ODD? Highway-only L3 differs qualitatively from urban L4 — different sensor suite (less LiDAR, more long-range radar), different mapping strategy, different prediction problem (lane-following vs unprotected left turns).
Maps or mapless? Waymo / Mobileye REM / Tesla pre-v12 use HD vector maps; Wayve / Tesla v12+ run mapless. Map-heavy works but limits scaling to new cities.
What’s the redundancy story? L4 needs dual-redundant compute, steering, braking, power, sensing — single-fault tolerance is the table stake.
Who carries liability on failure? This drives the entire system design — L2 = driver; L3 = OEM during ODD; L4 = operator.
What’s the verification approach? Modular fits ISO 26262 module-level proofs; end-to-end relies on “operational metrics” (interventions per mile, simulated miles). Regulators are still catching up to E2E.

2. First principles

2.1 ODD, ODD exit, and the MRM

The Operational Design Domain (SAE J3016) enumerates: roadway types (highway, urban, parking), geographic area (specific cities, geo-fenced polygons), time of day, weather (rain rate, fog visibility), traffic density, road condition. The vehicle must monitor the ODD in real time. When ODD exit is predicted (sensor failure, weather degradation, geo-fence approach, internal fault), the system starts its fallback. An L3 system issues a takeover request to the driver (about 10 s of lead time is typical) and performs a Minimum Risk Maneuver only if the driver does not respond; an L4 system performs the MRM itself: a safe stop, ideally pulling out of the active travel lane.

2.2 Coordinate frames

Standard AV frames:

Vehicle frame (ISO 8855): x forward, y left, z up; origin at center of rear axle.
Sensor frames: each camera/LiDAR/radar has a calibrated extrinsic (R, t) to vehicle frame.
Global frame: UTM zone-local x/y, or WGS84 lat/lon. Heading θ = vehicle x-axis relative to UTM north.
HD-map frame: typically WGS84 or local-Cartesian aligned to a Mercator projection in the operating region; lane-level geometry as polylines with semantic attributes.

Time synchronization across sensors uses PTP (IEEE 1588) to < 1 ms; vision + LiDAR fusion needs hardware-trigger sync or motion-compensated reprojection.
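The frame chain above (sensor → vehicle → global) can be sketched in 2D as two rigid transforms. This is a minimal illustration, not any stack's API: `transform_2d` and `sensor_to_global` are hypothetical names, the mounting offset is a made-up number, and heading is assumed to be measured counter-clockwise from the global x-axis.

```python
import math

def transform_2d(point, yaw, translation):
    """Rotate a 2D point by yaw (rad, counter-clockwise), then translate it."""
    x, y = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + translation[0], s * x + c * y + translation[1])

def sensor_to_global(p_sensor, sensor_yaw, sensor_offset, vehicle_pose):
    """Chain sensor -> vehicle (extrinsic) and vehicle -> global (pose).

    vehicle_pose = (x, y, theta): rear-axle position and heading, with theta
    measured counter-clockwise from the global x-axis (assumed convention).
    """
    p_vehicle = transform_2d(p_sensor, sensor_yaw, sensor_offset)
    px, py, theta = vehicle_pose
    return transform_2d(p_vehicle, theta, (px, py))

# Hypothetical setup: forward radar mounted 3.6 m ahead of the rear axle,
# vehicle at (100, 200) heading along +y, target 10 m ahead of the radar.
target = sensor_to_global((10.0, 0.0), 0.0, (3.6, 0.0),
                          (100.0, 200.0, math.pi / 2))
```

A real pipeline would use full 3D extrinsics (R, t) and timestamp-aligned poses; the 2D case only shows the order of composition.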

2.3 Bicycle model

Standard kinematic model for trajectory tracking:

ẋ = v cos(θ)
ẏ = v sin(θ)
θ̇ = (v / L) tan(δ)

with L = wheelbase, δ = front-wheel steering angle, v = velocity. The dynamic counterpart adds tire forces (Pacejka magic-formula tire model):

m v̇ = F_xf cos(δ) - F_yf sin(δ) + F_xr - F_drag
m v (β̇ + ψ̇) = F_xf sin(δ) + F_yf cos(δ) + F_yr
I_z ψ̈ = a (F_xf sin(δ) + F_yf cos(δ)) - b F_yr

with β = sideslip, ψ = yaw, a/b = front/rear axle distances from CoG. Used at higher speeds where slip matters; below ~5 m/s the kinematic model suffices.
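As a rough illustration of the kinematic model, the sketch below integrates it with forward Euler. The function names, the step size, and the integrator choice are assumptions made for the example; production code would typically use a higher-order or exact arc integrator.

```python
import math

def bicycle_step(state, v, delta, wheelbase, dt):
    """One forward-Euler step of the kinematic bicycle model.

    state = (x, y, theta); v = speed (m/s); delta = front steering angle (rad).
    """
    x, y, theta = state
    return (
        x + v * math.cos(theta) * dt,
        y + v * math.sin(theta) * dt,
        theta + (v / wheelbase) * math.tan(delta) * dt,
    )

def simulate(state, v, delta, wheelbase, dt, steps):
    """Roll the model forward with constant speed and steering."""
    for _ in range(steps):
        state = bicycle_step(state, v, delta, wheelbase, dt)
    return state

# Hypothetical run: 1 s at 5 m/s with 0.1 rad of steering, 2.8 m wheelbase.
final_state = simulate((0.0, 0.0, 0.0), 5.0, 0.1, 2.8, 0.01, 100)
```

With constant inputs the yaw rate is constant, so the heading after time t is simply t · (v / L) · tan(δ); the position traces an arc of radius L / tan(δ).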

2.4 Pacejka tire model

Tire lateral force F_y = D sin(C arctan(B α - E (B α - arctan(B α))))

with α = slip angle, B/C/D/E fitted per tire (Pacejka 1993, “Magic Formula”). Linear-region cornering stiffness C_α = B·C·D ≈ 30–80 kN/rad for a passenger tire. Peak μ ≈ 0.7–1.1 dry, 0.3–0.6 wet, 0.05–0.2 ice. Below the peak the response is near-linear and predictable; beyond it the lateral force saturates and falls off, which risks instability.
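A small sketch of the lateral Magic Formula makes the two regimes visible. The coefficients below are illustrative assumptions (chosen so that B·C·D lands in the 30–80 kN/rad range quoted above), not fitted values for any real tire.

```python
import math

def pacejka_lateral_force(alpha, B, C, D, E):
    """Lateral force F_y (N) for slip angle alpha (rad), Magic Formula form."""
    ba = B * alpha
    return D * math.sin(C * math.atan(ba - E * (ba - math.atan(ba))))

# Hypothetical coefficients: D ~ mu * F_z = 0.9 * 5000 N.
B, C, D, E = 10.0, 1.3, 4500.0, -0.5

# Slope near zero slip approximates the cornering stiffness B*C*D.
small_slip_stiffness = pacejka_lateral_force(0.001, B, C, D, E) / 0.001

# Sweep slip angle from 0 to 0.5 rad to locate the peak force.
forces = [pacejka_lateral_force(i * 0.001, B, C, D, E) for i in range(501)]
peak_force = max(forces)
```

The sweep shows the force climbing almost linearly, peaking close to D, then dropping as slip keeps growing, which is the region a conservative planner stays out of.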

2.5 Friction circle

Combined longitudinal + lateral force is bounded by the friction circle:

(F_x / (μ F_z))² + (F_y / (μ F_z))² ≤ 1

i.e., you can’t simultaneously brake hard and corner hard. Sequential planners exploit this: brake-then-turn-then-accelerate rather than do both. Race drivers blend the boundary — production AVs respect it conservatively.
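The friction-circle bound can be turned into a quick budget check: given the longitudinal force already committed, how much lateral force is left. The helper names and the numbers are hypothetical.

```python
import math

def friction_utilization(fx, fy, mu, fz):
    """Left-hand side of the friction-circle inequality (<= 1 is feasible)."""
    limit = mu * fz
    return (fx / limit) ** 2 + (fy / limit) ** 2

def max_lateral_force(fx, mu, fz):
    """Lateral force still available once fx is committed (0 if saturated)."""
    limit = mu * fz
    if abs(fx) >= limit:
        return 0.0
    return math.sqrt(limit ** 2 - fx ** 2)

# Hypothetical wheel: mu = 0.9, F_z = 4000 N, braking at 60% of the limit.
remaining = max_lateral_force(2160.0, 0.9, 4000.0)
```

Braking at 60% of the limit leaves 80% of the limit for cornering, not 40%, because the budget is quadratic rather than linear.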

2.6 IDM (Intelligent Driver Model, Treiber 2000)

Standard car-following model used in both behavior planning and simulation:

s* = s_0 + max(0, v T + v Δv / (2 √(a b)))
v̇ = a [1 - (v / v₀)^4 - (s* / s)²]

with s = gap to leader, s_0 = jam gap, T = time headway (1.0–1.5 s typical), v₀ = desired speed, Δv = v - v_leader (closing speed), a = maximum acceleration, b = comfortable deceleration. Produces realistic follow-and-merge behavior. Combined with MOBIL lane-changing (Kesting 2007) for multi-lane traffic.
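The IDM acceleration law is compact enough to sketch directly. The default parameter values are illustrative assumptions, and `dv` is taken as ego speed minus leader speed (positive when closing).

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, s0=2.0, a_max=1.0, b=2.0):
    """IDM acceleration (m/s^2).

    v = ego speed, gap = bumper-to-bumper distance to the leader,
    dv = v - v_leader. Defaults are illustrative, not calibrated.
    """
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** 4 - (s_star / gap) ** 2)

# Equilibrium gap at 20 m/s with no speed difference: acceleration is ~0 there.
v = 20.0
s_star_eq = 2.0 + v * 1.5
equilibrium_gap = s_star_eq / math.sqrt(1.0 - (v / 30.0) ** 4)
```

Far from any leader the model accelerates toward v₀; when closing quickly on a short gap the interaction term dominates and the output becomes strongly negative.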

2.7 Time-to-collision (TTC)

TTC = (s - s_safe) / Δv for closing speed Δv > 0; TTC = ∞ when Δv ≤ 0 (the gap is not closing)

Below TTC = 1.5 s, brake-now is invoked. TTC of 2.5–4 s is a comfortable safety envelope. Used in both AEB regulations (FMVSS 127) and AV planners.
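A hedged sketch of the TTC check, treating a non-closing gap as infinite TTC so the division is always defined. The function names are hypothetical; the 1.5 s threshold is the one quoted above.

```python
import math

def time_to_collision(gap, closing_speed, safe_gap=0.0):
    """TTC in seconds; infinite when the gap is not closing."""
    if closing_speed <= 0.0:
        return math.inf
    return max(0.0, gap - safe_gap) / closing_speed

def brake_now(ttc, threshold=1.5):
    """True when TTC has dropped below the brake-now threshold."""
    return ttc < threshold

# Hypothetical case: 30 m gap, 5 m safety margin, closing at 10 m/s.
ttc = time_to_collision(30.0, 10.0, safe_gap=5.0)
```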

2.8 Responsibility-Sensitive Safety (RSS, Shalev-Shwartz 2017, Mobileye)

Formal safe-distance constraints that, if respected, mean the AV is “not at fault” in any collision. RSS defines longitudinal and lateral safe distances:

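d_lon,safe = max(0, v_r ρ + 0.5 a_max_accel ρ² + (v_r + ρ a_max_accel)² / (2 a_min_brake) - v_f² / (2 a_max_brake_f))

with ρ = reaction delay, v_r = rear (ego) velocity, v_f = front (lead) velocity, a_max_accel = the most ego may accelerate during ρ, a_min_brake = the braking ego is guaranteed to apply afterwards, a_max_brake_f = the hardest braking assumed of the lead. If the gap is at least this large, ego can stop without hitting the lead even if the lead brakes at a_max_brake_f (within these bounded assumptions). Proposed by Mobileye; influenced UL 4600 and ISO 21448.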
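The longitudinal RSS distance is a direct formula evaluation. The sketch below follows the expression above and clamps the result at zero, as in the RSS paper; the parameter values in the example are assumptions chosen for round numbers.

```python
def rss_safe_longitudinal_distance(v_rear, v_front, rho,
                                   a_max_accel, a_min_brake, a_max_brake_front):
    """Minimum safe following gap (m) under the RSS longitudinal rule.

    Worst case: ego accelerates at a_max_accel for rho seconds, then brakes
    at a_min_brake, while the lead brakes at a_max_brake_front immediately.
    """
    v_after_delay = v_rear + rho * a_max_accel
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2
         + v_after_delay ** 2 / (2.0 * a_min_brake)
         - v_front ** 2 / (2.0 * a_max_brake_front))
    return max(0.0, d)

# Hypothetical values: both cars at 20 m/s, 0.5 s reaction delay.
d_safe = rss_safe_longitudinal_distance(20.0, 20.0, 0.5, 2.0, 4.0, 8.0)
```

With these numbers the required gap is 40.375 m, roughly a 2 s headway at 20 m/s; a lead that is much faster than ego yields a negative raw value, which the clamp turns into zero.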

3. Practical math — perception → prediction → planning → control

3.1 Multi-object tracking pipeline

Per frame:

Detection. Camera: BEVFormer, PETR, StreamPETR, SparseDrive (2024+) running on Nvidia Orin / DRIVE Thor at 10–30 Hz. LiDAR: PointPillars, CenterPoint, PV-RCNN (2019–2021 baselines), modern TransFusion / BEVFusion / DSVT (2022+). Output: 3D bounding boxes (x, y, z, l, w, h, θ, class) with confidence.
Sensor fusion. Either early-fusion (concatenate raw lidar + camera features in BEV space — BEVFusion 2022, TransFusion 2022) or late-fusion (per-sensor detect, associate, then fuse track-level — older, easier to certify).
Association. Detections to existing tracks via Hungarian / greedy matching on IoU + Mahalanobis distance. For each unmatched detection: spawn new track. For each unmatched track: increment lost-count; delete after N misses.
Tracking. Per-object Kalman filter (constant-velocity or constant-acceleration model); state = (x, y, vx, vy, θ, ω). Predict + update at 10 Hz.
Classification + intent. Refine class via temporal consistency (track-level classifier). Compute aggregate features (vehicle/pedestrian/cyclist), aggregate dynamics (stop/move/turning).
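The association and track-lifecycle steps above can be sketched in a few lines. This is a deliberately simplified stand-in: it uses greedy matching on Euclidean distance with a fixed gate instead of Hungarian matching on IoU + Mahalanobis distance, and it overwrites the track position instead of running a Kalman update. `Track` and `associate_and_update` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    x: float
    y: float
    misses: int = 0

def associate_and_update(tracks, detections, gate=2.0, max_misses=3, next_id=0):
    """Greedy nearest-neighbour association with spawn/delete bookkeeping.

    detections: list of (x, y). Returns (surviving_tracks, next_id).
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.hypot(t.x - dx, t.y - dy), ti, di)
        for ti, t in enumerate(tracks)
        for di, (dx, dy) in enumerate(detections)
    )
    used_tracks, used_dets = set(), set()
    for dist, ti, di in pairs:
        if dist > gate:
            break  # remaining pairs are even farther apart
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        tracks[ti].x, tracks[ti].y = detections[di]  # stand-in for a KF update
        tracks[ti].misses = 0
    # Unmatched tracks: increment lost-count, delete after max_misses.
    for ti, t in enumerate(tracks):
        if ti not in used_tracks:
            t.misses += 1
    survivors = [t for t in tracks if t.misses < max_misses]
    # Unmatched detections: spawn new tracks.
    for di, (dx, dy) in enumerate(detections):
        if di not in used_dets:
            survivors.append(Track(next_id, dx, dy))
            next_id += 1
    return survivors, next_id
```

Greedy matching is easy to reason about but can pick a locally good pair that blocks a better global assignment; that is the usual argument for the Hungarian algorithm in crowded scenes.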

3.2 Prediction

Given a tracked agent at time t, predict trajectories for the next 3–8 seconds. Modern stack uses Transformer-based multi-modal predictors:

VectorNet (Gao 2020): polyline representation of lanes + agent histories → graph attention.
MultiPath (Chai 2019): fixed K=64 trajectory anchors + per-anchor offsets and probabilities.
DenseTNT (Gu 2021): goal-set prediction + trajectory completion.
Wayformer (Nayakanti 2022): self/cross-attention on scene + agent context.
MTR / MTR++ (Shi 2022/2023): motion-transformer with intention queries; Waymo Open Motion Prediction Challenge winner 2022.
MotionLM (Seff 2023, Waymo): autoregressive language-model formulation of joint multi-agent prediction.

Output: K=6 trajectory modes per agent, each with probability + waypoints at 0.5 s spacing. Used by planner to evaluate cost.

3.3 Behavior planning

Discrete decision layer over the next 3–10 s. Approaches:

Hand-engineered state machines / decision trees — Waymo’s “Cruise Control” + lane-change automaton (2009–2020 era).
POMDPs — formally optimal under uncertainty; intractable except for narrow problems (intersection-crossing, Bai 2015).
MCTS over predicted futures — Hubmann 2018; informally used by several stacks.
Learned policies — ChauffeurNet (Bansal 2019), ALVINN-descendants, scenario-conditioned end-to-end nets.

3.4 Trajectory planning

Continuous-time planner over 3–8 s. Frenet-frame planners parameterize over arc-length along a reference path (Werling 2010): generate K quintic polynomials in (s, l) space, score by curvature + jerk + clearance, pick best. Optimization-based planners (CHOMP, STOMP, TrajOpt for AVs; Apollo’s EM planner) optimize a cost over a spline parameterization. Sampling-based: RRT* + smoothing for offroad/parking.
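A stripped-down sketch of the Frenet sampling idea: generate lateral quintic candidates over a grid of end offsets and durations, score each, and keep the cheapest. It assumes rest-to-rest boundary conditions (zero lateral velocity and acceleration at both ends), for which the quintic and its integrated squared jerk have closed forms; the cost weights are arbitrary assumptions, and curvature and clearance terms are left out.

```python
def quintic_offset(d0, d1, T, t):
    """Lateral offset at time t on a rest-to-rest quintic from d0 to d1."""
    tau = min(max(t / T, 0.0), 1.0)
    return d0 + (d1 - d0) * (10 * tau ** 3 - 15 * tau ** 4 + 6 * tau ** 5)

def jerk_integral(d0, d1, T):
    """Integral of squared jerk over [0, T] for the rest-to-rest quintic."""
    return 720.0 * (d1 - d0) ** 2 / T ** 5

def best_candidate(d0, d_ref, end_offsets, durations,
                   k_jerk=1.0, k_time=1.0, k_offset=10.0):
    """Pick the (end_offset, duration) pair with the lowest weighted cost."""
    def cost(candidate):
        d1, T = candidate
        return (k_jerk * jerk_integral(d0, d1, T)
                + k_time * T
                + k_offset * (d1 - d_ref) ** 2)
    candidates = [(d1, T) for d1 in end_offsets for T in durations]
    return min(candidates, key=cost)

# Hypothetical lane change: target offset 3.5 m, durations 3 to 8 s.
choice = best_candidate(0.0, 3.5, [3.0, 3.5, 4.0],
                        [3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

The jerk term favours long manoeuvres and the time term favours short ones; with these assumed weights the balance falls at a 6 s change ending exactly on the 3.5 m target.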

3.5 Control (MPC)

Standard production controller: linearized model-predictive control with N = 20–50 step horizon, dt = 50–100 ms. State: (x, y, θ, v, β). Inputs: (a, δ̇). Quadratic cost on tracking error + control effort + jerk. Constraints: tire friction, steering rate limit, max acceleration. Solver: OSQP / qpOASES at 50–100 Hz on automotive CPU (Aurora Driver, Cruise, Apollo). Tesla FSD uses a learned end-to-end policy after v12 — but a low-level torque-vectoring + steering rate controller still lives behind the neural net.

3.6 Lateral controller — pure pursuit and Stanley

For lower-bandwidth lateral control or fallback:

Pure pursuit (Coulter 1992): δ = arctan(2L sin(α) / l_d) with l_d = look-ahead distance, α = angle to look-ahead point.

Stanley (Stanford’s 2005 DARPA Grand Challenge vehicle; Hoffmann 2007): δ = ψ_e + arctan(k e_fa / v) with ψ_e = heading error, e_fa = cross-track error at front axle, k = gain ~2.5.

Both still see use as fallbacks in production stacks.
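Both steering laws fit in a few lines. In this sketch the sign conventions (positive angles and errors steer left) and the low-speed floor `v_min` in the Stanley term are assumptions added to keep the example well-defined at standstill; they are not part of the formulas above.

```python
import math

def pure_pursuit_steer(alpha, lookahead, wheelbase):
    """Steering angle (rad) toward a look-ahead point at bearing alpha."""
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)

def stanley_steer(heading_error, cross_track_error, v, k=2.5, v_min=0.1):
    """Stanley law: heading error plus a speed-scaled cross-track term."""
    return heading_error + math.atan2(k * cross_track_error, max(v, v_min))

# Hypothetical inputs: 2.8 m wheelbase, 10 m look-ahead, target 90 deg off-axis.
delta_pp = pure_pursuit_steer(math.pi / 2, 10.0, 2.8)
# Aligned heading, 1 m cross-track error at 2.5 m/s.
delta_st = stanley_steer(0.0, 1.0, 2.5)
```

Pure pursuit depends only on geometry, so its aggressiveness is tuned through the look-ahead distance; Stanley softens the cross-track correction as speed rises through the 1/v term.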

3.7 Sized example: highway lane change

For a 1500 kg sedan at 30 m/s (108 km/h) executing a lane change:

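Lateral distance: 3.5 m (about one lane width; US interstate lanes are 3.66 m).
Comfortable lateral acceleration: 1.5 m/s² peak.
Time required: ~6 s for a jerk-bounded sigmoid (a minimum-jerk quintic over 6 s peaks near 0.56 m/s², well inside the comfort limit); ~3.1 s for square-wave acceleration held at the 1.5 m/s² limit (uncomfortable).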
Longitudinal distance covered: 180 m at 30 m/s × 6 s.
Detection-to-target gap: minimum ~50 m before initiation; full lookahead ~250 m.

This sets minimum sensor range requirement at highway speed: ~300 m forward, 150 m side/rear for blind-spot.
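The lane-change timing can be checked with closed-form relations between lateral distance d, duration T, and peak lateral acceleration. The two profile shapes here are assumptions for the sketch: a bang-bang (square-wave) profile, where d = a·T²/4, and a rest-to-rest minimum-jerk quintic, whose peak acceleration is (10/√3)·d/T².

```python
import math

QUINTIC_PEAK_FACTOR = 10.0 / math.sqrt(3.0)  # ~5.77, peak accel = factor * d / T^2

def bang_bang_duration(d, a_peak):
    """Shortest lane change: full +a for T/2, then full -a for T/2."""
    return math.sqrt(4.0 * d / a_peak)

def quintic_duration(d, a_peak):
    """Duration of a minimum-jerk quintic that just reaches a_peak."""
    return math.sqrt(QUINTIC_PEAK_FACTOR * d / a_peak)

def quintic_peak_accel(d, T):
    """Peak lateral acceleration of a minimum-jerk quintic of duration T."""
    return QUINTIC_PEAK_FACTOR * d / T ** 2

d, a_comfort, speed = 3.5, 1.5, 30.0
t_square = bang_bang_duration(d, a_comfort)   # about 3.1 s
t_smooth = quintic_duration(d, a_comfort)     # about 3.7 s
a_at_6s = quintic_peak_accel(d, 6.0)          # about 0.56 m/s^2
distance_at_6s = speed * 6.0                  # 180 m travelled longitudinally
```

Under these assumptions a 6 s lane change uses well under half of the 1.5 m/s² comfort budget, so the 6 s figure reflects typical driving style rather than the acceleration limit.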

3.8 Stopping distance

At 30 m/s on dry asphalt (μ ≈ 0.9): minimum braking distance = v² / (2 μ g) = 900 / (2 × 0.9 × 9.81) ≈ 51 m. Add 1.5 s perception+decision latency: 45 m. Total: ~96 m. Wet asphalt halves μ, which doubles the braking distance: 102 + 45 ≈ 147 m. Snow: 250+ m.
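The stopping-distance arithmetic as a small helper, assuming constant deceleration at μ·g and constant speed during the latency window. The function name and the wet-road μ of 0.45 (half the dry value) are choices made for this sketch.

```python
G = 9.81  # m/s^2

def stopping_distance(v, mu, latency_s):
    """Return (reaction_m, braking_m, total_m) for speed v in m/s."""
    reaction = v * latency_s           # distance covered before braking starts
    braking = v * v / (2.0 * mu * G)   # constant deceleration at mu * g
    return reaction, braking, reaction + braking

dry = stopping_distance(30.0, 0.9, 1.5)
wet = stopping_distance(30.0, 0.45, 1.5)
```

Braking distance scales with 1/μ, so halving friction doubles the braking term while the latency term stays at 45 m.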

3.9 Worked example: prediction confidence threshold

Pedestrian detected at crosswalk; predictor outputs 6 trajectories with probabilities. Planner uses minimum-risk decision:

If max P(intent = cross within 2s) > 0.3 → yield (stop / slow).
If max P(intent = cross within 2s) ≤ 0.05 → continue at planned speed.
Else: reduce speed; re-evaluate at next frame.

Choice of threshold is the difference between Cruise’s “yields too much” (lower threshold) and Tesla’s “phantoms” (less yield). It is a safety-versus-throughput tradeoff.
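The three-way rule above maps onto a small decision function. This sketch assumes the "max" is taken over the crossing probabilities of all pedestrians near the crosswalk; the function name and the string labels are hypothetical.

```python
def crosswalk_decision(cross_probs, yield_above=0.3, continue_below=0.05):
    """Map crossing probabilities (within 2 s) to a planner action.

    cross_probs: one probability per pedestrian near the crosswalk.
    Returns "yield", "continue", or "slow" (reduce speed, re-evaluate).
    """
    p_max = max(cross_probs, default=0.0)
    if p_max > yield_above:
        return "yield"
    if p_max <= continue_below:
        return "continue"
    return "slow"

decisions = [
    crosswalk_decision([0.02, 0.45]),  # one likely crosser dominates
    crosswalk_decision([0.01, 0.04]),  # everyone unlikely to cross
    crosswalk_decision([0.10]),        # ambiguous: slow and re-check
]
```

Lowering `yield_above` moves ambiguous cases from "slow" into "yield", which is exactly the over-yielding versus throughput trade described above.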

4. Design heuristics

HD maps trade scaling cost for problem simplification. Waymo’s centimeter-accurate maps reduce perception load (you know where the curbs are) but require map-update operations in every city. Mapless approaches (Wayve, Tesla v12) scale faster but solve a harder perception problem.
LiDAR is not optional above L2. Every L3+ deployment except Tesla uses LiDAR. The reason: redundant range measurement against camera + radar, with much lower domain-shift sensitivity than vision-only. The cost has dropped 10× in 2018–2026 (Innoviz, Hesai, Luminar, Aeva, Cepton); Hesai AT128 is ~$500–1000 in volume.
The long tail eats you. Highway driving is 99.9% well-behaved; the remaining 0.1% (debris, construction, emergency vehicles, weather, deer) is most of the engineering effort. Datasets must be curated for tail events; Waymo Open Dataset, nuScenes, KITTI-360, Argoverse-2, Lyft Level 5 / Woven all sample tails.
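Simulation budget exceeds road budget. Waymo’s public figures put simulated miles about three orders of magnitude above road miles (tens of billions simulated versus tens of millions driven, circa 2020–2021). CARLA, Nvidia DRIVE Sim (built on Omniverse), Applied Intuition, Waymo Carcraft, Cognata, Foretellix Foretify: pick your stack.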
Latency is in the safety case. End-to-end latency from photon-to-actuator must be < 100 ms for highway speeds. Each pipeline stage burns budget; perception alone often 30–50 ms, prediction 10–30, planning 10–30, control 5–10. The compute-platform pick (Orin → Thor → custom-ASIC) directly determines achievable horizon.
Don’t trust monocular depth. Monocular depth networks (DPT, Marigold) are useful for far-field scene understanding but not for distance-to-lead-vehicle. Use lidar or radar for the safety-critical longitudinal channel.
Calibration drifts. Camera + LiDAR extrinsics shift with thermal cycling and vibration; either online calibration (Levinson 2013 for camera + LiDAR; Schneider 2017 for radar) or weekly garage recalibration is mandatory.
The pedestrian problem is not solved. Detection works; intent prediction at crosswalks remains the dominant failure mode in urban deployments (Cruise’s 2023 incident chain). Conservative behavior (yielding always when uncertain) trades capacity for safety.

5. Components & sourcing

Subsystem	Common parts
Compute	Nvidia DRIVE Orin (254 TOPS, in Volvo EX90/Polestar 3/XPeng G9), DRIVE Thor (2000 TFLOPS, Blackwell-generation GPU, in 2025+ AVs), Tesla HW4 (2 SoC, ~144 TOPS), Mobileye EyeQ6 (34 TOPS), Qualcomm Snapdragon Ride 9540
LiDAR (long-range)	Hesai AT128 (200 m, $500–1k vol), Luminar Iris (250 m, deployed Volvo EX90), Innoviz Two (300 m, BMW iX), Velodyne (acquired by Ouster), Cepton Vista-X, RoboSense MX
LiDAR (360°)	Ouster OS1 / OS2, Hesai Pandar64/128, Velodyne VLS-128 (legacy)
Radar	Continental ARS540 (4D imaging, 300 m), Bosch front radar gen5, Aptiv RACam, Arbe Phoenix (4D imaging, 0.5° angular, 300 m), Aurora FirstLight (FMCW, in-house), Mobileye EyeQ Radar
Camera	Sony IMX490/IMX728 (8 MP automotive HDR), OnSemi AR0820, OmniVision OX08B/OX08D, Foxconn / Leopard / Forvia modules
GNSS+IMU	Novatel PwrPak7-E1, Applanix POS LV, OXTS RT3000v3, NextNav, ublox NEO/F9
HD Maps	TomTom HD Map, HERE HD Live Map, Mobileye REM (crowdsourced), DeepMap (acquired by Nvidia), Waymo internal, Ushr (acquired by Dynamic Map Platform)
Safety controller	Infineon AURIX TC3xx/TC4xx (lockstep TriCore), NXP S32N, Renesas RH850
Brake-by-wire	Continental MK C1/C2, Bosch IPB-i, ZF IBC, BWI Group
Steer-by-wire	Nexteer, JTEKT, ZF (production in 2025+ Toyota bZ4X/Lexus RZ, Tesla Cybertruck)

6. Reference data

6.1 SAE J3016 levels

Level	Name	Driver role	Examples (2026)
0	No automation	Full	AEB-only
1	Driver Assist	Full	Adaptive cruise OR lane-keep
2	Partial	Full, hands-on capable	Tesla AP/FSD-Beta, GM Super Cruise (hands-free in ODD), Ford BlueCruise
3	Conditional	Eyes off in ODD, ready to take back	Mercedes Drive Pilot (Germany ≤ 95 km/h; CA/NV ≤ 40 mph), Honda Sensing Elite (JP)
4	High	None in ODD	Waymo One, Apollo Go, Pony.ai, Zoox, Cruise (paused), Nuro
5	Full	None ever	No deployed product

6.2 Notable AV operators (2026 status)

Operator	Status	Notes
Waymo	L4 commercial, paid rides in Phoenix, SF, LA, Austin	~250k paid rides/wk Q1 2026
Cruise (GM)	Paused after Oct 2023 SF incident; GM ended robotaxi funding Dec 2024	Team folded into GM’s Super Cruise work
Tesla FSD	L2 supervised, public beta on ~5 M vehicles	v13 end-to-end mid-2025; robotaxi pilot Austin 2025
Mobileye	Drive (L4) + Chauffeur (L3 EOL); SuperVision (L2+) in Polestar/Zeekr	Chauffeur cancelled 2024
Aurora	Autonomous trucking L4; Dallas–Houston driverless commercial launch spring 2025 (slipped from an end-2024 target)	Aurora Driver on Volvo VNL Autonomous + Peterbilt 579; Fort Worth–El Paso next
Wayve	UK + US L4 development; mapless end-to-end	AV2.0 $1B Series C 2024 (Microsoft, Nvidia, SoftBank)
Pony.ai	Robotaxi China (Beijing, Guangzhou)	Nasdaq IPO Nov 2024
Baidu Apollo Go	Robotaxi China (Wuhan, Beijing, Chongqing, Chengdu)	~10 M cumulative rides by 2026
Zoox (Amazon)	LA/Foster City/Las Vegas; bidirectional pod, no steering wheel	Pre-revenue commercial
Nuro	L4 delivery (no occupant); R3 unveiled 2024	Pilot Mountain View, Houston
Kodiak Robotics	I-20/I-10 trucking; Atlas Energy Solutions off-road haul (Permian Basin)	Pre-revenue driverless
Plus.ai / Embark / TuSimple	Wound down 2023–2024	Trucking AV consolidation

6.3 Standards landscape

Standard	Scope
ISO 26262	Functional safety (E/E systems); ASIL A/B/C/D
ISO 21448 (SOTIF)	Safety of intended functionality (perception/planning misbehavior)
ISO/SAE 21434	Cybersecurity engineering
ISO/PAS 22737	LSAD (Low-Speed Automated Driving)
UL 4600	Safety case for autonomous products
SAE J3016	Driving automation levels
UNECE WP.29 R157	ALKS (Automated Lane Keeping System; L3 framework, EU regulation)
FMVSS 127 (US)	AEB requirement
Euro NCAP Vision 2030	AEB / lane-keep / driver-monitoring rating protocols

6.4 Sensor suite by deployment tier

L2 ADAS (mass market):
  - 1 forward camera + 1 forward radar
  - Optional rear + side blind-spot radars
  - GPS + IMU (no centimeter accuracy)
  - $200–500 BOM sensor cost
  - Example: Tesla AP standard, Ford BlueCruise

L2+ (enhanced ADAS):
  - 1 forward camera trifocal + 4-8 surround cameras
  - 5+ radars
  - GPS + IMU + wheel-speed dead-reckoning
  - Optional: short-range LiDAR for parking
  - $500–2000 BOM
  - Example: Mobileye SuperVision (no LiDAR)

L3 (highway autonomy):
  - 1-2 long-range LiDAR (Luminar Iris, Innoviz Two)
  - 5+ radars including 4D imaging radar
  - 8+ cameras (HDR automotive)
  - Centimeter-accurate GPS-RTK + IMU
  - $2000–5000 BOM
  - Example: Mercedes Drive Pilot (Valeo SCALA LiDAR)

L4 robotaxi (urban):
  - 5+ LiDARs (mix of long + short range)
  - 5-10 radars
  - 15-20+ cameras
  - HD map service + centimeter-accurate GPS+IMU
  - Dual redundant compute
  - $10k-50k BOM (Gen 5 Waymo ~$50k; Gen 6 targets ~$15k)
  - Example: Waymo Driver Gen 5/6, Cruise Origin, Zoox

L4 trucking:
  - 2-4 long-range LiDARs (Aurora FirstLight, Luminar)
  - 5+ radars
  - 6-8 cameras
  - Centimeter-accurate GPS + Tactical-grade IMU
  - Specialized for highway ODD
  - Example: Aurora Driver, Kodiak Driver

7. Failure modes & debugging

Phantom braking — Tesla AP/FSD braking for shadows or overhead signs. Cause: monocular vision confusing 2D pattern for 3D obstacle; insufficient radar gating after radar removal (Tesla 2021). NHTSA preliminary evaluation PE22-002 opened Feb 2022.
Pedestrian dragging — Cruise SF October 2, 2023: pedestrian struck by another vehicle was deposited under Cruise robotaxi, which then performed a pullover maneuver dragging the pedestrian ~20 ft. Root cause chain: failed to detect that the object under the vehicle was a person, executed standard “clear the lane” routine. DMV suspended Cruise’s permit October 24; CEO resigned November.
Roundabout deadlock — Mobileye + Waymo both observed: AV waits indefinitely for “safe” gap that never materializes in dense traffic. Fix: tunable gap-acceptance thresholds + assertive merge after timeout.
Construction zone misclassification — orange cones at sufficient density cause path-planning to give up. Fix: explicit construction-zone semantic class + temporary lane-map overrides.
HD-map staleness — Waymo + Cruise: new construction not in map → vehicle attempts illegal turn. Fix: real-time discrepancy detection + automatic map invalidation.
Emergency vehicle interaction — Tesla, Cruise, Waymo all logged incidents at active fire/police scenes. Fix: dedicated emergency-vehicle perception model + behavior tree branch.
GPS multipath in urban canyons — Manhattan, SF Financial District: lane-level localization fails. Fix: lidar-to-map matching (NDT, ICP) gives sub-meter independent of GPS.
Rain on the lidar dome — Velodyne / Hesai lose 30–50% of returns above 25 mm/hr. Fix: heated covers + multi-bounce filtering + radar-priority fusion in heavy rain.

8. Case studies

8.1 Waymo’s Driver: 6 generations of sensor evolution

Gen 1 (2009): Toyota Prius + roof-mounted Velodyne HDL-64E ($75k LiDAR), one camera. Stanford DARPA team’s legacy.
Gen 2-3 (2012-2014): Lexus RX450h, custom LiDAR + cameras + radar.
Gen 4 (2017): Chrysler Pacifica + first in-house LiDAR (short, medium, long range).
Gen 5 (2019): Jaguar I-PACE; in-house long-range LiDAR claimed ~300 m, 29 cameras, 6 radars. Deployed in Phoenix commercial 2020.
Gen 6 (2024): Zeekr RT custom platform (Geely partnership); reduced sensor count via better fusion + ML; cost down ~10× vs Gen 5. Deployed 2025+ as Waymo One scales.

Stack: modular, written largely in C++ with PyTorch perception. Carcraft simulator + ChauffeurNet for behavior training. Closed-loop ODD: Phoenix Metro, SF, LA, Austin. Operations: ~250k paid rides/wk by Q1 2026, on a fleet in the low thousands of vehicles (Waymo reported ~1,500 in mid-2025).

8.2 Tesla FSD v12-v13: end-to-end neural net

Pre-v12 (2016–2023): “Stack of nine networks” + heuristic C++ planner. Failed left turns, awkward roundabouts, jerky stops.

v12 (Mar 2024): single neural network from cameras-only to motor torques + steering rates + brake pressures. Trained on > 1 B miles of human-driven Tesla telemetry (selectively curated). ~50 M parameters fit in HW3, ~200 M in HW4. Wayve-style approach but at Tesla scale.

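v13 (late 2024 / 2025): expanded context window, smoother behavior, parking included. Tesla claims “10× safer than human” via internal metrics; external regulators have not validated. NHTSA closed engineering analysis EA22-002 in April 2024 but followed it with a recall query on the Autopilot remedy and, in October 2024, a separate preliminary evaluation of FSD in reduced visibility.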

Key tension: end-to-end is easier to scale + improve via more data, but harder to certify under ISO 26262 / 21448 — no module-level safety case. Tesla’s response: “operational metrics” (interventions per mile) replace ISO process.

8.3 Mobileye SuperVision in Zeekr 001 / Polestar

Eight cameras (one front-trifocal + four side + two rear-side + one rear), one EyeQ5H SoC pair (lockstep), no LiDAR for SuperVision (LiDAR was reserved for Chauffeur, since cancelled). REM crowdsourced HD maps. Highway hands-free in EU/CN. Roughly the same sensor hardware as Tesla HW4, with a different software philosophy (modular + REM-anchored vs. mapless E2E).

8.4 Aurora Driver: autonomous trucking

Class-8 Volvo VNL Autonomous + Peterbilt 579 platform. FirstLight FMCW LiDAR (1 km range, Doppler velocity directly), Hesai supplemental, three cameras + radar suite. Operations: driverless commercial launch on Dallas–Houston in spring 2025 (slipped from an end-2024 target), with Fort Worth–El Paso next. Stack: modular with strong safety case (UL 4600 + ISO 26262). Driverless requires Operations Center monitoring + chase vehicle on initial routes; transition to true unsupervised in 2025–2026.

8.5 Mercedes Drive Pilot — first L3 on US roads

Mercedes-Benz S-Class + EQS, certified L3 by California DMV (May 2023) and Nevada (Jan 2023), following Germany (Dec 2021). ODD: highway only, ≤ 95 km/h in Germany and ≤ 40 mph (64 km/h) in CA/NV, daytime, dry weather, no construction zones, geo-fenced. Driver may legally watch movies or work email during automated driving. Hardware: Valeo SCALA LiDAR (short-range), 1 stereo camera, 1 long-range camera, 4 surround cameras, 5 radars (1 long, 4 short), in-car driver-monitoring camera. Software: heavily modular, ISO 26262 ASIL-D certified for the autonomy supervisor. ODD handback warning: ~10 s before requiring driver attention. Liability: Mercedes accepts during ODD. Adoption: ~$2500/yr subscription added on top of base Drive Assist; uptake in US has been low (< 1% of S-Class buyers as of 2025).

8.6 Cruise pre-shutdown architecture

Bolt EV base, dual redundant compute, ~$150k of sensors per vehicle. Custom Cruise compute box with two Nvidia Pegasus + safety MCU + lockstep verifier. 5 LiDARs, 14 cameras, 21 radars. Stack: C++ modular, in-house simulator Cruisecraft, fleet learning loop. Why it failed: scaling problems + the October 2023 incident chain + GM patience-loss. GM stopped funding the robotaxi program in December 2024 and folded the remaining team into its driver-assistance work. Lessons: fleet ops + remote-assistance design is half the safety story.

8.7 Aurora long-haul trucking — Dallas to Houston

Aurora Innovation founded 2017 by Chris Urmson (Google self-driving lead), Sterling Anderson (Tesla Autopilot lead), Drew Bagnell (CMU). Strategy: focus on trucking first (highway ODD simpler, freight economics support BOM cost). Hardware: in-house FirstLight FMCW LiDAR (1 km range, direct Doppler), Hesai supplemental short-range, custom long-range radar, automotive cameras. Compute: in-house ML-accelerator + Nvidia. Spring 2025: began driverless commercial hauls on the Dallas–Houston route (slipped from an end-2024 target); extensions toward El Paso and Phoenix come next. Stack: modular with strong safety case under UL 4600. Customer signups: Schneider, Werner, Hirschbach Motor Lines. Aurora’s bet: trucking margins (~5%) require capital efficiency on the AV cost; FirstLight is the LiDAR moat.

8.8 Apollo Go — China’s largest robotaxi fleet

Baidu’s Apollo Go: ~10 M cumulative rides by Q1 2026 across Wuhan, Beijing, Chongqing, Chengdu, Shenzhen. Wuhan operates ~500+ Apollo RT6 vehicles (in-house Chinese platform, ~$30k unit cost, claimed cheapest L4 vehicle in production). Pricing: ~50% lower than typical Didi ride; subsidized as land-grab. Stack: modular, BEV-based perception, in-house HD maps for all operating cities. The Chinese regulatory regime (no federal NHTSA; cities issue permits; Wuhan offered Apollo aggressive concessions) enables faster expansion than US peers. Apollo Go’s 2025 IPO consideration was paused; the unit remains within Baidu.

8.9 Comma.ai openpilot — open-source aftermarket L2

George Hotz’s company. openpilot: open-source software (MIT-licensed) for L2 driver-assist running on a $1500 “comma 3X” device + per-car wiring harness. Supports ~250 car models (mostly Toyota, Honda, Hyundai/Kia, GM, Subaru). Approach: vision-only (no radar, no LiDAR) end-to-end neural net (Tinygrad-trained); on the order of 10⁴ devices in the field. Notable: comma’s bet is that aftermarket L2 reaches more drivers than OEM L2; the data-collection from > 100M miles of openpilot operation feeds the next model. Regulatory status: officially “not a self-driving system”; it is advertised explicitly as L2. Has remained legal across all jurisdictions.

8.10 Zoox bidirectional pod (Amazon, 2024+)

Zoox (Amazon-owned): no steering wheel, no driver seat, bidirectional drive — vehicle has no “front.” Designed from scratch as L4-only.

4 wheels with independent steering (full 360° turning, parallel-park anywhere).
Battery: 133 kWh, 16 hr operating shift.
4-passenger pod, face-to-face seating.
Sensors: 8 cameras + LiDAR + radar at each corner.
ODD: Foster City CA (HQ campus), Las Vegas Strip (paid pilot 2024+), Austin (2025).
Stack: in-house modular.
Manufacturing: 10k-unit/year plant in Fremont CA (Amazon-funded).
Status (2026): public commercial pilot in Las Vegas; pre-revenue at scale.

The bet: a purpose-built L4 vehicle outperforms a retrofit L4 vehicle in unit economics + passenger experience.

8.11 Nuro R3 — driverless delivery (2024)

Nuro’s R3 (revealed 2024): no passengers, designed for last-mile + delivery. Small footprint (1.4 m wide, vs ~1.85 m for a sedan). Trunk-only design.

LiDAR: in-house “Nuro Driver” plus auxiliary.
Range: ~40 mile per charge.
Top speed: ~25 mph (last-mile only).
Customers: Domino’s, FedEx, Walmart, Kroger pilot programs since 2020 (R2, R3 supersedes).
Status: pilot only in Mountain View CA, Houston TX (paused 2023, resumed late 2024).
ODD: residential + suburban arterials, daytime, dry.

Nuro’s bet: no-passenger architecture removes the biggest safety variable (passenger injury) + lets the vehicle be simpler/smaller.

Common architectural patterns

Modular stack (Waymo, Aurora, Cruise, Apollo):
  Sensing → Perception → Prediction → Planning → Control
    Each stage has independent ML training + verification
    Inter-stage interfaces explicit (sensor frames, detections, tracks)
    Maps to ISO 26262 ASIL-D module-level safety case
    Easier regulatory + audit story
    Slower to ship new behavior (each module change cascades)

End-to-end stack (Tesla v12+, Wayve, comma):
  Pixels → (single neural net) → Control
    One network, one training dataset
    Faster to scale
    Harder to verify; no module-level safety case
    Relies on operational metrics (interventions/mile, simulated miles)
    Regulatory framework still catching up

Adjacent

Deployment ecosystem snapshot (2026)

Robotaxi operators (commercial, paid rides):
  - Waymo (US: Phoenix, SF, LA, Austin, Atlanta) — >250k rides/wk
  - Apollo Go (CN: Wuhan, Beijing, Chongqing, Chengdu, Shenzhen)
  - Pony.ai (CN: Beijing, Guangzhou, Hong Kong)
  - WeRide (CN multi-city) — robotaxi + robobus + robosweeper
  - AutoX (CN: Shenzhen, Wuhu) — Pacifica fleet

Robotaxi operators (pilot, restricted access):
  - Zoox (Las Vegas, Foster City)
  - May Mobility (Tokyo, Detroit, Tempe AZ)
  - Motional (Las Vegas with Uber; Hyundai Ioniq 5)

Truck operators (driverless on highway):
  - Aurora (Dallas-Houston, Fort Worth-El Paso, Phoenix)
  - Kodiak (I-20/I-10 + Atlas Energy Solutions)
  - Plus.ai-derivative (some partner programs)
  - Inceptio (CN long-haul)

Delivery (no occupant):
  - Nuro R3 (Mountain View, Houston)
  - Coco (LA, sidewalk)
  - Starship (campuses worldwide, sidewalk)
  - Avride (Yandex spin-out)

L3 production cars (eyes-off in ODD):
  - Mercedes Drive Pilot (S-Class, EQS in CA/NV/DE)
  - Honda Sensing Elite (Legend in JP)
  - BMW Personal Pilot L3 (7 Series in DE only)
  - Stellantis L3 (announced 2025, undeployed)

L2+ production cars (eyes-on in ODD, hands-off):
  - Tesla AP/FSD (all models)
  - GM Super Cruise (Bolt EUV, Cadillac, etc.)
  - Ford BlueCruise (F-150, Mach-E, etc.)
  - Mobileye SuperVision (Polestar 3, Zeekr 001, NIO ET7)
  - XPeng XNGP (CN multi-model)
  - Huawei ADS 2.0/3.0 (Aito, Avatr)
  - Mercedes Drive Assist (most models)

Citations

Shalev-Shwartz S., Shammah S., Shashua A. (2017) “On a Formal Model of Safe and Scalable Self-Driving Cars” (RSS). arXiv:1708.06374.
Treiber M., Hennecke A., Helbing D. (2000) “Congested Traffic States in Empirical Observations and Microscopic Simulations” (IDM). Phys Rev E 62.
Kesting A., Treiber M., Helbing D. (2007) “General Lane-Changing Model MOBIL for Car-Following Models.” Transp Res Rec 1999.
Werling M., Ziegler J., Kammel S., Thrun S. (2010) “Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenét Frame.” ICRA.
Pacejka H. (1993) “The Magic Formula Tyre Model.” Tire Mechanics Handbook.
Gao J., Sun C., Zhao H., et al. (2020) “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation.” CVPR.
Chai Y., Sapp B., Bansal M., Anguelov D. (2019) “MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses.” CoRL.
Shi S., Jiang L., Dai D., Schiele B. (2022) “Motion Transformer with Global Intention Localization and Local Movement Refinement” (MTR). NeurIPS.
Bansal M., Krizhevsky A., Ogale A. (2019) “ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst.” RSS.
Levinson J., Thrun S. (2013) “Automatic Online Calibration of Cameras and Lasers.” RSS.
Liu Z., Tang H., Amini A., et al. (2022) “BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation.” ICRA 2023.
Bai X., Hu Z., Zhu X., et al. (2022) “TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers.” CVPR.
Coulter R.C. (1992) “Implementation of the Pure Pursuit Path Tracking Algorithm.” CMU-RI-TR-92-01.
Hoffmann G., Tomlin C., Montemerlo M., Thrun S. (2007) “Autonomous Automobile Trajectory Tracking for Off-Road Driving” (Stanley controller). ACC.
ISO 26262 (2018 + 2023) Road vehicles — Functional safety.
ISO 21448 (2022) Road vehicles — Safety of the intended functionality (SOTIF).
ISO/SAE 21434 (2021) Road vehicles — Cybersecurity engineering.
UNECE Regulation No. 157 (2020, amended 2023) Uniform provisions for Automated Lane Keeping Systems.
UL 4600 (2020 / 2022 / 2024) Standard for Safety for the Evaluation of Autonomous Products.
SAE J3016 (2021) Taxonomy and Definitions for Terms Related to Driving Automation Systems.
Waymo Safety Report 2024.
NHTSA EA22-002 Tesla Autopilot/FSD investigation.
California DMV Order of Suspension to Cruise LLC, October 24, 2023.
