MPC for Robots
Model Predictive Control (MPC) for robots is the practice of repeatedly solving
a finite-horizon optimal-control problem (OCP) on-line, applying the first
control to the plant, then re-solving from the new measured state. The textbook
recipe is identical to process-industry MPC (see [[Engineering/mpc-control]]),
but the implementation is different in kind: robot dynamics are nonlinear and
underactuated, state-dim is high (40+ for humanoids), the loop closes at
100-1000 Hz, and intermittent contact makes the dynamics hybrid. This note
collects the robot-specific design, solver, and deployment knowledge that the
process-MPC literature does not cover.
1. At a glance
Why robot MPC is its own subfield, not just “MPC with a different plant”:
- Nonlinear rigid-body dynamics. The model ẋ = f(x,u) comes from
Newton-Euler / Lagrangian mechanics (see
[[Robotics/dynamics-rigid-body]]). Linear MPC formulations (Camacho 1995; Maciejowski 2002) used in process control do not generalise without re-linearisation every step, which is expensive when the Jacobian is 40×40. - High state-dim. A 7-DOF arm has 14 states; a quadruped (floating base
- 12 joints) has 36 states; a humanoid (floating base + 28-32 joints) sits at 60-70 states. Naive QP scales O(N³·n³) with horizon N and state-dim n.
- Fast update rate. Manipulation MPC: 50-200 Hz. Legged locomotion MPC: 100-500 Hz inner / 30-100 Hz outer. Drone NMPC: 100-1000 Hz. A control cycle of 5 ms leaves no time for general-purpose IPM that costs 50 ms.
- Hybrid dynamics from contact. Each foot independently makes/breaks contact; force is constrained to a friction cone in stance and is zero in swing. This is mixed-integer in the strict reading; practical robotics MPC uses (a) prescribed contact schedule (gait), (b) complementarity constraints (Posa, Cantu, Tedrake 2014), or (c) smoothed contact models (Howell, Le Cleach’, Manchester 2022).
Two solver families dominate:
- DDP / iLQR family (Mayne 1966; Tassa, Erez, Todorov 2012). Exploits the Bellman recursion to factor the Hessian; cost is O(N·n³) per iteration instead of O((N·n)³). Unconstrained at heart; constraints handled by penalty, augmented Lagrangian, or projected line-search. Toolchains: Crocoddyl (LAAS+Edinburgh 2020), OCS2 (ETH 2017), Drake’s TrajectoryOpt.
- Direct-collocation + SQP / IPM (Bock 1984; Hager 2000). Treats both x and u as decision variables; dynamics become equality constraints. Hard inequality constraints are first-class. Real-time-iteration (RTI) variant (Diehl, Bock, Schlöder 2005) runs one SQP step per control cycle and warm-starts from the previous solution — this is what makes embedded NMPC tractable. Toolchains: acados (Verschueren, Frey, Kirches 2018-2021), FORCES Pro (Domahidi, Jerez 2014), CasADi + IPOPT (Andersson 2019; Wächter & Biegler 2006) for offline.
The choice is mostly cultural: legged-robotics labs in Europe (LAAS, ETH, Edinburgh, IIT) lean DDP via Crocoddyl/OCS2; aerial-robotics labs and many manipulation groups use acados; offline trajectory studies use CasADi.
2. First principles
A finite-horizon continuous-time OCP looks like:
minimise ∫₀^T L(x(t), u(t)) dt + φ(x(T))
x(·),u(·)
subject to ẋ(t) = f(x(t), u(t)), t ∈ [0,T]
x(0) = x̂ (current measured state)
g(x(t), u(t)) ≤ 0 (inequality, e.g. friction cone, joint lim)
h(x(t), u(t)) = 0 (equality, e.g. contact-foot-fixed)
x(T) ∈ X_f (terminal set, often {x_goal})
L is the stage cost, φ the terminal cost, f the dynamics. For receding-
horizon MPC, this OCP is re-solved at every control step from the latest x̂
and the first control u*(0) is applied. Discretisation converts the
continuous-time problem to an N-stage NLP.
Three discretisation flavours are used in robotics:
Single shooting. Decision variables are only u₀, …, u_{N-1}; states are integrated by a numerical scheme (RK4 typical for robots — symplectic integrators preferred for energy-conserving long horizons). Hessian becomes dense and ill-conditioned for long horizons; rarely used in practice for N > 30.
Multiple shooting (Bock & Plitt 1984). Discretise the horizon into N
intervals; both x_k and u_k are decision variables, with shooting-gap
equality constraints x_{k+1} = F(x_k, u_k) (RK4 step). Parallelisable across
intervals, much better-conditioned, and the structure exploitable by Riccati-
type linear solvers. acados, FORCES Pro, and Crocoddyl’s MPC mode all use
multiple shooting.
Direct collocation. State and control are represented by polynomials within each interval; the dynamics ẋ = f(x,u) is enforced at collocation points (Legendre-Gauss-Radau pseudospectral — Hager 2000; Hermite-Simpson — Hargraves & Paris 1987). State and control become joint decision variables. Used by TrajOpt (Schulman 2013), PSOPT, Drake. Higher accuracy per node but larger NLP; better for offline trajectory optimisation than for inner-loop MPC.
DDP / iLQR. A dynamic-programming method that linearises f and quadratises L around a trial trajectory and computes a feedback policy δu_k = K_k δx_k + d_k via a backward Riccati pass, then a forward roll-out with line-search. iLQR (Li & Todorov 2004; Tassa 2012) drops the second-order dynamics term of true DDP (Mayne 1966) — cheaper, almost the same convergence in practice. Variants:
- SLQ (Sideris & Bobrow 2005, used in ETH’s OCS2) — continuous-time DDP.
- FDDP (Mastalli et al. 2020, in Crocoddyl) — “feasibility-driven” DDP that does not require dynamic feasibility at every iterate; converges from arbitrary initial guesses including non-rolled-out ones.
- iLQG (Todorov & Li 2005) — DDP with control noise.
- AL-iLQR (Howell, Jackson, Manchester 2019, ALTRO) — augmented- Lagrangian wrapping iLQR for hard inequality constraints.
Real-time iteration (RTI). Diehl, Bock, Schlöder 2005 observed that solving the NLP to convergence every cycle is wasteful when the previous cycle’s solution is a good warm-start. RTI runs one full SQP step (linearise + solve QP + take step) per control cycle, exploiting the fact that the warm- started QP is well-conditioned. This is the bedrock of all real-time NMPC on robots; “MPC at 100 Hz” essentially means “one RTI step per 10 ms”.
3. Worked examples
(a) Cart-pole swing-up via iLQR
The canonical demo (running example in Tassa, Erez, Todorov 2012). State
x = [x, θ, ẋ, θ̇] (cart position [m], pole angle [rad], cart velocity [m/s],
pole angular velocity [rad/s]); control u is force on cart [N]. Goal: invert
the pole from θ=π (hanging) to θ=0 (up) and stabilise.
Cost design:
- Horizon N = 100, dt = 0.02 s → T = 2 s.
- Stage cost:
L(x,u) = x' Q x + u' R uwith Q = diag(1, 10, 0.1, 0.1), R = 0.001. - Terminal:
φ(x_N) = x_N' Q_f x_Nwith Q_f = 100·I₄.
One iLQR iteration:
- Forward pass. Roll out from x̂ with current control sequence u^{(i)}; record (x_k, u_k) and cost-to-go.
- Linearise & quadratise. Compute A_k = ∂f/∂x, B_k = ∂f/∂u via finite- difference (or analytic — see Pinocchio’s algorithmic derivatives for the rigid-body case). Compute first/second derivatives of L.
- Backward pass. Riccati recursion for V_k = optimal cost-to-go quadratic approximation; compute feedback K_k and feedforward d_k.
- Forward roll-out with line-search. Try α = 1 first; if cost increases,
halve to α = 0.5, 0.25, …, with min α ≈ 2⁻¹⁰. Updates:
u_k = u_k^{(i)} + α d_k + K_k (x_k - x_k^{(i)}). - Convergence test. Stop when
|dV_α=1| < 10⁻⁶or||d_k||_∞ < 10⁻⁴.
Empirically converges in ~15 iterations from u ≡ 0 initial guess. Tassa’s implementation runs ~50 ms in MATLAB; modern C++ Crocoddyl does it in <1 ms.
(b) Quadruped trot via SRBD MPC
The Di Carlo & Wensing 2018 IROS convex MPC, deployed on MIT Cheetah 3 and nearly every academic quadruped since.
Model — Single Rigid Body Dynamics (SRBD). Replace the full 18-DOF floating-base + 12-joint system with the centroidal inertia of the trunk and the ground-reaction-forces (GRFs) at the feet. State x ∈ R¹² = [p, Θ, ṗ, ω], where p ∈ R³ is base position [m], Θ ∈ R³ is roll-pitch-yaw [rad], ṗ ∈ R³ is COM linear velocity [m/s], ω ∈ R³ is base angular velocity [rad/s]. Control u ∈ R¹² = [f₁, f₂, f₃, f₄], one 3D force per foot [N].
Dynamics (linearised, key trick — Di Carlo 2018):
m p̈ = Σᵢ fᵢ - m g
I_world ω̇ = Σᵢ (rᵢ - p) × fᵢ
Θ̇ = T(Θ) ω (Euler-rate map)
Linearise the rotational inertia by holding I_world fixed at the current orientation each MPC step — this turns the dynamics affine in u and produces a QP rather than an NLP.
Constraints.
- Friction cone (linearised pyramid):
|fᵢ_x| ≤ μ fᵢ_z,|fᵢ_y| ≤ μ fᵢ_z, with μ ≈ 0.4-0.6 depending on surface. - Stance/swing schedule from a prescribed gait clock:
fᵢ_z = 0when foot i is in swing;fᵢ_z ≥ f_min ≈ 5 Nwhen in stance. - Force magnitude:
fᵢ_z ≤ f_max ≈ 250 N(Cheetah 3 actuator limit).
Horizon & rate. N = 50 nodes, dt = 0.01 s → 0.5 s lookahead. Re-solve at 100 Hz. Solve time on Intel i7 NUC: 3-8 ms with qpOASES warm-started; ANYmal deployments report 5-15 ms.
The MPC output is reference foot forces. A lower-loop whole-body QP at 500-
1000 Hz tracks these forces while satisfying the full kinematic/dynamic
manipulator constraints (see [[Robotics/legged-robotics]] for the WBC).
(c) 7-DOF manipulator collision-avoidance MPC
Setup: Franka Emika Panda (7 revolute joints). State x ∈ R¹⁴ = [q; q̇], joint positions [rad] and velocities [rad/s]. Control u ∈ R⁷ is joint torque [N·m] or joint acceleration depending on whether the inner loop is torque or velocity (Franka exposes both via libfranka).
Model. Full rigid-body dynamics M(q) q̈ + C(q,q̇) q̇ + g(q) = τ, with M,
C, g computed by Pinocchio’s CRBA/RNEA (Featherstone 2008 algorithms).
Cost.
- Tracking:
||p_ee(q) - p_des(t)||²weighted ~100 in [m²]. - Posture:
||q - q_rest||² · w_posturewith w_posture = 0.01 (regulariser). - Effort:
||τ||² · 10⁻⁴. - Terminal: tracking weight ×10.
Constraints.
- Joint limits
q_min ≤ q ≤ q_max(Panda: q1∈[-2.9, 2.9], q4∈[-3.07, -0.07], etc., from the official URDF). - Velocity limits
|q̇| ≤ q̇_max(Panda: 2.17 rad/s on q1). - Torque limits
|τ| ≤ τ_max(Panda: 87 N·m on q1-4, 12 N·m on q5-7). - Signed-distance ≥ d_safe ≈ 0.03 m to a list of spherical obstacle primitives. SDF linearised around current iterate (Schulman’s TrajOpt approach 2013); use HPP-FCL or libccd for the SD computation.
Horizon & rate. N = 20, dt = 0.05 s → 1 s lookahead. 50 Hz outer. acados with RTI + partial condensing solves in 2-5 ms on a standard i7 CPU.
4. Solver families & toolchains
| Toolchain | Author(s) / year | Language | Method | Typical deployment |
|---|---|---|---|---|
| Crocoddyl | Mastalli et al., LAAS-CNRS + Edinburgh, 2020 (ICRA) | C++ / Python | DDP, FDDP, AL | Talos humanoid, ANYmal, MIT-Cheetah replicas, HRP-2, SOLO |
| acados | Verschueren, Frey, Kirches et al., 2018-2021 | C (codegen) | RTI-SQP + HPIPM/qpOASES | TIAGo, Jackal UGV, autonomous racing, drones, Crazyflie |
| OCS2 | Farshidian et al., ETH RSL, 2017 | C++ | SLQ (cts-time DDP) | ANYmal C / D, ANYmal-on-wheels, ballbot |
| CasADi + IPOPT | Andersson, Gillis, Horn, Rawlings, Diehl 2019; Wächter & Biegler 2006 | Python / MATLAB / C++ | Direct collocation + IPM | Offline trajopt, MHE for SLAM, research prototypes |
| TrajOpt | Schulman, Ho, Lee, Awwal, Bradlow, Abbeel 2013 (IJRR) | C++ / Python | Sequential convex programming | Kinematic trajectory optimisation, motion planning |
| FORCES Pro | Domahidi & Jerez 2014, embotech | C codegen, commercial | RTI-SQP + interior-point | Embedded automotive, drones; ABB / Mercedes deployments |
| drake/MathematicalProgram | Tedrake et al., MIT, 2014+ | C++ / Python | Multi-backend (SNOPT, IPOPT, Gurobi, MOSEK) | Research; Atlas / Spot at TRI; benchmark studies |
| ALTRO | Howell, Jackson, Manchester 2019 | Julia / C++ | AL-iLQR | Aerospace research, REx Lab |
| HPIPM | Frison, Diehl 2020 | C | Riccati QP / IPM | Internal solver inside acados |
| PSOPT | Becerra 2010 | C++ | Pseudospectral collocation | Aerospace, offline studies |
A few practical notes:
- Crocoddyl is the dominant choice when you have analytical rigid-body derivatives via Pinocchio (Carpentier et al. 2019); the two libraries share authors and integrate tightly. FDDP convergence from infeasible warm-starts is genuinely useful in MPC because the previous-step solution may violate the new shooting gaps after a state jump.
- acados is the most-deployed embedded NMPC framework as of 2024-2025. Codegens C with no runtime dependencies; HPIPM exploits the multiple- shooting block-tridiagonal Hessian structure. The Python interface is excellent for development; the generated C runs anywhere.
- OCS2 has a small footprint but is older and less actively maintained than Crocoddyl; ANYbotics inherited it via Marco Hutter’s lab.
- CasADi is the right answer when “off-line” — designing controllers, identifying parameters, computing reference trajectories — and the wrong answer for inner-loop MPC because IPOPT is too slow per iteration.
5. Design heuristics
Real-time MPC succeeds or fails on a handful of pragmatic choices:
- Keep the horizon short. T = 0.3-1.0 s is normal for legged and aerial robots. Manipulation: 0.5-2 s. Process MPC’s “few minutes” horizons are irrelevant — robot dynamics are stiff at the second-scale.
- dt ≥ 5 ms. RK4 with sub-millisecond dt explodes the NLP size with little benefit. If you need fine integration, run a finer integrator inside each shooting interval (multiple-shooting “implicit RK”).
- Use RTI, not full SQP-to-convergence. Diehl 2005’s result is that one SQP step from a warm-start is nearly optimal when the plant is stable; multiple steps are wasteful when the next solve is 10 ms away.
- Warm-start always. Initialise (x, u) with the shifted previous
solution:
u_k ← u_{k+1}^prev,x_k ← x_{k+1}^prev, then re-roll the dynamics. The first cold solve is allowed to take 10-100× longer. - If the QP is > 50 % of the period, you’ll miss deadlines. Profile your worst-case, not your mean. Use HPIPM partial-condensing N₂ parameter to trade between dense-QP size and Riccati passes.
- Cost tuning is empirical. Start with Q on positions only; add velocity damping when oscillating; add control rate (Δu) cost when chattering. Bryson’s rule (Q_ii = 1/x_max²) gives a starting point, never a final one.
- For contact-rich tasks, prefer contact-implicit MPC (Posa, Cantu, Tedrake 2014 IJRR; Howell, Le Cleach’, Manchester 2022) over hand-tuned mode switching. The complementarity formulation is harder to solve but handles unanticipated contacts.
- SRBD is the right fidelity for legged locomotion MPC. Whole-body MPC at 100 Hz is barely tractable on a flagship CPU (10-30 ms solve), so reserve whole-body for either (a) offline trajectory generation or (b) a slower outer loop (20-50 Hz) feeding an SRBD inner loop.
- Outer-loop MPC + inner-loop tracking is the default architecture. MPC produces a reference (forces, accelerations, or joint trajectories); a faster (500-1 kHz) operational-space or whole-body QP tracks it. The inner loop sees current state and reacts to disturbances the MPC missed.
- Use augmented Lagrangian, not penalty. Penalty methods need ever-growing weights; AL converges with bounded multipliers and is what Crocoddyl/ALTRO use.
6. Numerical conditioning
Robot MPC NLPs are routinely ill-conditioned, for understood reasons:
- Large state-dim with rotational coordinates. Quaternions (4 components, unit-norm) and rotation matrices (9 components, orthogonal) are over-parameterised. The KKT matrix becomes structurally singular along the manifold tangent. Tassa 2012 §III.C documents the symptom: backward- pass Hessian Q_uu becomes indefinite at iterates with large angle errors.
- Standard fix: regularisation. Add λI to Q_uu (and recursively the Riccati matrices) until positive-definite. λ adapted Levenberg-Marquardt style: increase 10× on non-PD or non-descending step; decrease 10× on successful step. Crocoddyl, OCS2, and FORCES all do this internally.
- Quaternion drift. Integrating ̇q = ½ Ω(ω) q with RK4 drifts off the unit-norm manifold within a few hundred steps. Two remedies: (a) after every integration step, normalise q ← q / ||q||; (b) use the error-state formulation — represent state as nominal quaternion plus tangent-space error δθ ∈ R³, integrate δθ linearly, occasionally fold into the nominal (Forster, Carlone, Dellaert, Scaramuzza 2015 — the IMU-preintegration paper introduced this notation to robotics).
- SO(3) Jacobians. The tangent-space derivative of a rotation w.r.t. ω is not constant; use the right-Jacobian Jr(θ) (Sola 2018 micro-Lie tutorial) to map between body and world rates. Crocoddyl and Pinocchio implement these.
- Floating-base coordinates. A free-floating rigid-body has 6 unactuated DOFs. The mass matrix M(q) couples joints to base; its inverse is dense. Use CRBA (Composite Rigid Body Algorithm, Walker & Orin 1982) for M and RNEA (Featherstone 2008) for the bias term. Pinocchio’s spatial-algebra implementation gives both with analytic derivatives at near-Eigen speed.
- Scaling. Position (m), velocity (m/s), angle (rad), torque (N·m) have
hugely different magnitudes. Hessian condition number explodes without
pre-scaling. Either (a) rescale states by
D_x = diag(1/x_typ)before passing to the solver, or (b) tune Q and R until well-conditioned — practical heuristic: aim for condition number < 10⁴. - Inequality slacks. Hard inequalities with active sets near the iterate produce abrupt Hessian rank changes. Relax with slack variables penalised quadratically; gives a smoother NLP at the cost of small constraint violation (Hannemann-Tamás & Marquardt 2010).
7. Failure modes & debugging
- QP solver diverges or returns infeasible. Almost always indefinite Hessian or a too-tight constraint. Crank regularisation; sanity-check that Q_uu is PD by eigenvalue; if not, increase trust-region radius or back off line-search.
- Solution looks smooth in sim, unstable on hardware. Model mismatch.
Either (a) add online system ID (see
[[Engineering/system-identification]]— recursive least-squares on inertia or friction parameters), or (b) wrap the MPC with a robust inner loop (H∞ — see[[Engineering/h-infinity-robust]], or[[Engineering/sliding-mode-control]]for the actuator-level disturbance rejection). - Foot slips during trot/walk. Friction-cone constraint in MPC is tight but the low-level WBC commands forces outside the cone because the MPC reference is on the boundary. Tighten the low-level cone (μ_lowlevel = 0.8 μ_mpc) so disturbances do not push it over.
- Late deadline misses (jitter). Profile worst-case, not mean. Options in order of severity: (i) decrease N or increase dt; (ii) switch from full SQP to RTI (one step); (iii) drop hard inequalities to slacked-soft; (iv) switch from interior-point to active-set QP if the active set is small and stable.
- Local-minima trap on hard manoeuvres (humanoid stand-up from prone; parkour vault; tight squeeze). DDP is a local method. Remedies: (a) multi-start from several initial guesses; (b) seed with a sampling-based planner (CHOMP — Ratliff 2009, STOMP — Kalakrishnan 2011, RRT-Connect for geometric seeding); (c) use a global trajectory optimiser offline (Mixed-Integer Convex — Deits & Tedrake 2014 for footstep planning).
- NaN propagation. Almost always integrator instability after a divergent SQP step. Cap the step length and add NaN guards in the forward pass; Crocoddyl does this by default.
- Cost blows up far from goal. Quadratic terminal cost is unbounded.
Either (a) saturate the cost (
min(cost, cost_max)), or (b) use a log-barrier or Huber-loss terminal.
8. Hardware platforms & deployments
- ANYmal C / D (ANYbotics, Hutter lab spin-out, 2019+) — OCS2-based MPC at 100 Hz with whole-body QP tracking at 400 Hz. Stair-climbing demonstrations 2020-2023 ran the full Crocoddyl whole-body pipeline at 50 Hz with Pinocchio analytical derivatives.
- MIT Cheetah 3 & Mini Cheetah (Kim, Wensing, Bledt, Katz, Di Carlo, 2018-2019) — convex SRBD MPC at 30-40 Hz on an Intel NUC; enabled backflips, stair-climb without prior maps, and the open-hardware Mini Cheetah platform that spawned dozens of academic clones.
- Boston Dynamics Spot (commercial 2020+) — proprietary stack believed to be MPC-based; engineers have publicly described an SRBD outer loop and whole-body QP inner loop. Atlas parkour (2018-2022 videos) uses a multi-fidelity stack with a high-level footstep planner described in Kuindersma et al. 2016 (Autonomous Robots) — the DRC Atlas paper — and newer work cited in BD blog posts.
- Unitree Go1 / Go2 / B1 (2022+) — hybrid MPC + RL. The locomotion policy is RL-trained but inherits the SRBD MPC structure as either an initial-condition supervisor or a residual reference.
- PAL Robotics TIAGo — whole-body Crocoddyl-based MPC (Mastalli, Merkt, Marti-Saumell, Stouraitis, Goepp, et al. 2022) for mobile manipulation; runs at 50 Hz on the onboard CPU.
- Skydio R1 / 2 / X10 drones (2018+) — onboard NMPC for trajectory tracking and obstacle avoidance; published details are sparse but the architecture is widely described in talks as RTI-SQP at ~200 Hz.
- Crazyflie / RotorS / PX4 community — acados-based NMPC controllers (Foehn et al. 2021 — time-optimal planning for quadrotor waypoint flight) run at 100-200 Hz on Cortex-M4 / Pi-Zero-class hardware.
- Tesla Optimus (2022+) — Tesla’s publicly shown talks reference SQP-MPC for locomotion; specifics undisclosed.
- Apptronik Apollo, Figure 01/02, 1X Neo, Agility Digit — humanoid commercial fleet (2024-2025); locomotion stacks are believed to be a mixture of MPC + learned policies, similar to Unitree’s pattern.
9. Case studies
(a) MIT Cheetah 3 convex MPC (Di Carlo & Wensing, IROS 2018)
The crisp insight: the rotational part of rigid-body dynamics is what makes quadruped dynamics nonlinear, and you can dodge it by linearising the inertia tensor around the current orientation at every MPC step.
I_world(t) ≈ R(Θ̂(t₀)) I_body R(Θ̂(t₀))ᵀ (frozen each MPC step)
With I_world held constant over the horizon, both linear and angular dynamics become linear in the GRFs, and the OCP collapses to a QP. The QP has 12·N decision variables (N=10-50) and friction-cone inequalities; qpOASES solves it in <1 ms on the Cheetah 3 embedded Intel NUC. Re-solve at 30-40 Hz.
Consequences for capability:
- Backflip from a standing start. The MPC is replanned every cycle, so there is no offline trajectory; the gait scheduler triggers the airborne phase by switching all four feet to swing, and the MPC handles the ballistic dynamics with the same code path.
- Blind stair-climb. No terrain map; the MPC’s foot-position references come from the gait clock + body-velocity command, and contact events are detected on-board.
The paper became the de-facto reference architecture; ANYmal, Spot, Unitree all use variants. The main known limitation: the linearised inertia tensor introduces ~5° tilt-tracking error at aggressive turns; whole-body MPC recovers this at higher computational cost.
(b) ANYmal stair-climbing with whole-body MPC (Mastalli et al. 2023)
Crocoddyl FDDP solving the full floating-base + 12-joint dynamics:
- State dim: 6 (base pose) + 12 (joints) + 6 (base twist) + 12 (joint velocities) = 36; with the unit-quaternion this becomes a 37-dim ambient representation, 36-dim tangent.
- Control dim: 12 joint torques.
- Horizon: 0.5 s at 20 ms ⇒ N = 25.
- Pinocchio’s analytic derivatives of CRBA, RNEA, and contact Jacobians feed Crocoddyl’s backward pass — no finite differences.
- Solves at 8-20 ms on an i7-class CPU; deployed at 50 Hz with the previous- cycle solution warm-starting.
The lesson: with analytic derivatives + FDDP + warm-start, whole-body MPC is just tractable at 50 Hz on present hardware. Without any of those three levers it isn’t.
(c) Atlas parkour (Boston Dynamics, 2018-2022 videos)
Specifics are proprietary, but the architecture is publicly described in Kuindersma, Permenter, Tedrake (Autonomous Robots 2016, “Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot”), the canonical DRC Atlas paper, and updated in BD’s 2021 blog post “Picking Up Momentum”:
- High-level: footstep planner over discrete contact sequences (mixed- integer convex; Deits & Tedrake 2014 IROS).
- Middle-level: SRBD MPC at 100 Hz producing a reference centroidal trajectory and contact wrenches.
- Low-level: whole-body QP at 1 kHz tracking the wrench reference while satisfying joint, torque, and contact-cone constraints (Kuindersma 2016 §V).
- State estimation via factor-graph EKF using IMU + joint encoders + vision.
The 2021 parkour videos add learned components — most plausibly to choose contact strategies or to seed the trajectory optimisation — but the runtime controller remains MPC-based.
10. Cross-references
[[Engineering/mpc-control]]— general (process-industry, linear) MPC.[[Robotics/state-space-lqr]]— LQR is the one-step / infinite-horizon linear special case of MPC; the Riccati machinery in iLQR is identical.[[Robotics/dynamics-rigid-body]]— the rigid-body modelM(q) q̈ + C q̇ + g = τthat every robotics MPC formulation depends on.[[Robotics/legged-robotics]]— SRBD, whole-body QP, gait scheduling that wrap legged-robot MPC.[[Robotics/humanoid-balance]]— capture-point / DCM control as a simplified MPC.[[Robotics/multirotor-design]]— NMPC on drones is mature; acados is the go-to.[[Robotics/rl-for-control]]— hybrid MPC + RL (Wang, Levine, Abbeel 2018; ETH ANYmal-RL stack) is the active frontier.[[Robotics/trajectory-generation]]— offline trajectory optimisation (CasADi, TrajOpt) is the close cousin.[[Robotics/impedance-control]]— common as the low-level layer beneath MPC.[[Engineering/sliding-mode-control]]— robust inner loop wrap for MPC.[[Engineering/system-identification]]— online ID closes the model- mismatch loop.[[Engineering/h-infinity-robust]]— alternative robust inner loop.[[Engineering/adaptive-control]]— for model-parameter drift.
11. Citations
Founding methods
- Mayne (1966). “A second-order gradient method.” Int. J. Control 3(1):85-95. — Original DDP.
- Bock & Plitt (1984). “A multiple shooting algorithm for direct solution of OCPs.” IFAC World Congress, Budapest.
- Hargraves & Paris (1987). “Direct trajectory optimization using NLP and collocation.” JGCD 10(4):338-342.
- Hager (2000). “Runge-Kutta methods in optimal control and the transformed adjoint system.” Numer. Math. 87:247-282.
- Li & Todorov (2004). “Iterative LQR for nonlinear biological movement systems.” ICINCO.
- Todorov & Li (2005). “Generalized iLQG for locally-optimal feedback of constrained nonlinear stochastic systems.” ACC.
- Diehl, Bock, Schlöder (2005). “A real-time iteration scheme for NLP in optimal feedback control.” SIAM J. Control Optim. 43(5):1714-1736. — RTI.
Modern robotics MPC
- Tassa, Erez, Todorov (2012). “Synthesis and stabilization of complex behaviors through online trajectory optimization.” IROS. — Canonical iLQR-for-robotics.
- Schulman et al. (2013). “Finding locally optimal, collision-free trajectories with sequential convex optimization.” IJRR. — TrajOpt.
- Posa, Cantu, Tedrake (2014). “A direct method for trajectory optimization of rigid bodies through contact.” IJRR 33(1):69-81. — Contact-implicit.
- Deits & Tedrake (2014). “Footstep planning on uneven terrain with mixed-integer convex optimization.” Humanoids.
- Kuindersma et al. (2016). “Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid.” Autonomous Robots 40:429-455.
- Di Carlo, Wensing, Katz, Bledt, Kim (2018). “Dynamic locomotion in MIT Cheetah 3 through convex MPC.” IROS.
- Bledt, Wensing, Kim (2017). “Policy-regularized MPC for MIT Cheetah.” IROS.
- Howell, Jackson, Manchester (2019). “ALTRO: a fast solver for constrained trajectory optimization.” IROS.
- Mastalli et al. (2020). “Crocoddyl: an efficient and versatile framework for multi-contact optimal control.” ICRA.
- Mastalli et al. (2022). “A direct-indirect hybridization approach to control-limited DDP.” Autonomous Robots.
- Howell, Le Cleach’, Manchester (2022). “Trajectory optimization with optimization-based dynamics.” RA-L.
- Foehn, Romero, Scaramuzza (2021). “Time-optimal planning for quadrotor waypoint flight.” Science Robotics 6(56).
- Sleiman, Farshidian, Minniti, Hutter (2021). “A unified MPC framework for whole-body dynamic locomotion and manipulation.” RA-L.
- Farshidian et al. (2017). “An efficient optimal planning and control framework for quadrupedal locomotion.” ICRA. — OCS2.
Solver infrastructure
- Wächter & Biegler (2006). “Implementation of an interior-point filter line-search algorithm.” Math. Program. 106(1):25-57. — IPOPT.
- Domahidi & Jerez (2014). “FORCES Pro: code generation for embedded optimization.” embotech.
- Andersson, Gillis, Horn, Rawlings, Diehl (2019). “CasADi: a framework for NLP and OC.” Math. Program. Comput. 11:1-36.
- Frison & Diehl (2020). “HPIPM: a high-performance QP framework for MPC.” IFAC World Congress.
- Verschueren et al. (2021). “acados — a modular open-source framework for fast embedded OC.” Math. Program. Comput. 14:147-183.
Rigid-body machinery
- Walker & Orin (1982). “Efficient dynamic computer simulation of robotic mechanisms.” ASME J. Dyn. Syst. 104(3):205-211. — CRBA.
- Featherstone (2008). Rigid Body Dynamics Algorithms. Springer. — CRBA, RNEA, ABA.
- Forster, Carlone, Dellaert, Scaramuzza (2015). “IMU preintegration on manifold for VIO.” RSS. — Error-state formulation.
- Sola, Deray, Atchuthan (2018). “A micro Lie theory for state estimation in robotics.” arXiv:1812.01537. — SO(3) / SE(3) notation.
- Carpentier et al. (2019). “Pinocchio — fast rigid-body dynamics with analytical derivatives.” SII.
Process-MPC roots (historical bridge)
- Camacho & Bordons (1995). Model Predictive Control in the Process Industry. Springer.
- Maciejowski (2002). Predictive Control with Constraints. Prentice Hall.
- Rawlings, Mayne, Diehl (2017). Model Predictive Control: Theory, Computation, and Design, 2nd ed. Nob Hill.