Humanoid Balance Control (ZMP, Capture Point, MPC) — Robotics Reference

Scope. Bipedal balance is among the hardest classical control problems in robotics: the plant is floating-base, underactuated, hybrid, unilaterally constrained, and statically unstable. Two feet (or one, in single support) cannot hold the centre of mass without active control; falling is the default. This note covers the engineering recipe for keeping a 1.5 m tall, 60–90 kg humanoid upright while walking, pushed, perturbed, or carrying payload. Quadruped balance lives in [[Robotics/legged-robotics]]; manipulation while standing lives in [[Robotics/mpc-for-robots]]. Here we focus on the CoM ↔ ZMP ↔ contact-wrench triad that every humanoid controller, from Honda ASIMO (2000) to Atlas Electric (2024), must solve.

1. At a glance

A humanoid balances on contact patches whose combined area (≈ 0.04 m² for two adult-sized feet) is tiny relative to the body height (≈ 1.7 m). The CoM hovers about 0.95 m above the floor. The system is therefore an inverted pendulum on a flat foot: unstable, with characteristic divergence time 1/ω = √(h/g) ≈ 0.31 s for h = 0.95 m. An uncorrected balance error grows by a factor of e roughly every third of a second, so within about a second a centimetre-scale error becomes a fall.

Why it is hard:

Under-actuation. Six DoF of the floating base (3 translation + 3 rotation) cannot be commanded directly. The only authority comes through the unilateral contact wrench at the feet, which is itself bounded by the friction cone and the support polygon.
Hybrid dynamics. Each foot strike or lift-off is an instantaneous structural change in the equations of motion (see [[Robotics/legged-robotics]] §2.1). Walking at 1 m/s with 0.5 m steps means four contact transitions (two touchdowns, two lift-offs) every metre, i.e. every second.
Real-time. Capture-point recovery requires updating footstep targets every 2–10 ms; whole-body torque commands at 1–5 kHz; current loops at 10–30 kHz.
Robustness. Friction varies (μ ≈ 0.3 on tile, 0.9 on rubber), payload moves the CoM, perception is noisy, motors saturate.

Three control paradigms dominate the 2026 humanoid stack:

ZMP / preview control (Honda ASIMO, HRP-2/3/4, JAXA): pre-compute a CoM trajectory whose induced ZMP tracks a desired reference inside the support polygon. Flat-ground, statically posed, slow (0.4–0.8 m/s) but reliable.
Capture-point / DCM step planning (Pratt 2006, Englsberger 2015 — IHMC Atlas, DLR TORO): place the swing foot at the instantaneous capture point to bring the CoM to rest in one step. Naturally handles disturbances and uneven terrain.
Whole-body MPC (Di Carlo 2018 SRBD; Mastalli 2020 Crocoddyl; Dafarra 2024): solve an optimal-control problem over centroidal dynamics + contact wrenches at 50–200 Hz (a convex QP in the SRBD formulation, DDP-style nonlinear optimisation in Crocoddyl); project to joint torques through a hierarchical task QP at 500–1000 Hz. The dominant 2024–2026 method.
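To make the ZMP paradigm concrete, here is a minimal receding-horizon ZMP tracker on the cart-table model (the LIP driven by CoM jerk). It is a sketch under stated assumptions, not ASIMO's or HRP's controller: it uses the unconstrained least-squares formulation of Wieber (2006), a close relative of Kajita's preview control, it omits the support-polygon constraints a real QP would enforce, and the sample time `T`, horizon `N`, jerk weight `R_JERK` and the step-shaped `zmp_reference` are illustrative, hypothetical choices.

```python
import numpy as np

G, H = 9.81, 0.95        # gravity [m/s^2], CoM height [m]
T, N = 0.02, 80          # sample time [s], horizon steps (1.6 s of preview)
R_JERK = 1e-6            # jerk weight relative to the ZMP-tracking weight (assumed)

# Cart-table model: state s = (x, xdot, xddot), input u = CoM jerk.
A = np.array([[1.0, T, T**2 / 2],
              [0.0, 1.0, T],
              [0.0, 0.0, 1.0]])
B = np.array([T**3 / 6, T**2 / 2, T])
C = np.array([1.0, 0.0, -H / G])          # ZMP = x - (h/g) * xddot

# Predicted ZMPs over the horizon: Z = Px @ s + Pu @ U.
Px = np.zeros((N, 3))
impulse = np.zeros(N)                     # impulse[m] = C A^m B
Am = np.eye(3)
for k in range(N):
    impulse[k] = C @ Am @ B
    Am = A @ Am                           # now A^(k+1)
    Px[k] = C @ Am
Pu = np.zeros((N, N))
for k in range(N):
    Pu[k, :k + 1] = impulse[k::-1]        # lower-triangular Toeplitz

# Unconstrained least squares: U* = K @ (Zref - Px @ s).
K = np.linalg.solve(Pu.T @ Pu + R_JERK * np.eye(N), Pu.T)

def zmp_reference(t):
    """Hypothetical reference: the ZMP shifts 10 cm forward at t = 1 s."""
    return np.where(t < 1.0, 0.0, 0.1)

s = np.zeros(3)
for i in range(150):                      # 3 s of simulated time
    t_future = (i + 1 + np.arange(N)) * T  # times of the N predicted ZMPs
    u = K[0] @ (zmp_reference(t_future) - Px @ s)  # apply only the first jerk
    s = A @ s + B * u
zmp = C @ s
print(f"t = 3 s: CoM {s[0]:.4f} m, ZMP {zmp:.4f} m")
```

Because the reference is known 1.6 s ahead, the CoM starts moving before the ZMP step, which is the essential feature of preview control; replacing the linear solve with a QP that bounds the ZMP inside the foot gives the constrained version.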

Current humanoid platforms (2026):

Platform	Vendor	Height/Mass	Actuation	Balance method
Atlas Electric	Boston Dynamics (2024)	1.5 m / 89 kg	Electric QDD + planetary	Capture point + WBC + learned residual
Figure 02	Figure AI (2024)	1.7 m / 70 kg	Custom BLDC + cycloidal	Capture-point + neural policy
1X Neo Gamma	1X Technologies (2024)	1.65 m / 30 kg	Tendon-gear (soft drive)	WBC + learned compliance
Optimus Gen 3	Tesla (2025)	1.73 m / 57 kg	In-house BLDC + planetary	WBC + RL
Apollo	Apptronik (2024)	1.73 m / 73 kg	SEA + harmonic	Capture-point + ZMP fallback
Unitree H1	Unitree (2024)	1.8 m / 47 kg	Parallel-mechanism QDD	RL policy (Isaac Lab)
Unitree G1	Unitree (2024)	1.32 m / 35 kg	QDD	RL policy
Digit	Agility Robotics (2023+)	1.75 m / 45 kg	SEA + tendon springs	HZD + capture point

Where it sits in the design stack: floating-base dynamics supplies the model; QDD or SEA actuation supplies the torque; admittance supplies contact compliance; invariant EKF supplies the CoM/orientation estimate; MPC supplies the regulator. This note glues them together at the balance layer.

Questions to ask before applying any of this:

Single or double support? Single → strict capture-point or DCM control; double → keeping the ZMP inside the support polygon will do.
Flat ground or uneven terrain? Flat → LIP + preview control is an adequate model; uneven → centroidal MPC.
RL or model-based? RL policies train in Isaac Lab / MuJoCo MJX and are robust to terrain noise; model-based controllers are interpretable and certifiable. Modern systems are hybrid: model-based MPC plus a learned residual.

2. First principles

2.1 The state and the unstable mode

The configuration is split between base and joints exactly as in [[Robotics/legged-robotics]] §2.1:

$q = (p_b, R_b, q_j) \in SE(3) \times \mathbb{R}^{n_a}, \qquad n_a \in [25, 35] \text{ for a full humanoid}$

The CoM dynamics, ignoring leg mass, reduce to a point of mass m (the robot total) at height h_{\text{CoM}}. The only horizontal force at the foot–floor contact is the friction-limited component of the ground reaction wrench. In single-support, the system is a planar inverted pendulum of natural frequency ω = √(g/h_{\text{CoM}}). For h = 0.95 m, g = 9.81 m/s²:

$\omega = \sqrt{9.81/0.95} \approx 3.21\ \text{rad/s}, \qquad 1/\omega \approx 0.31\ \text{s}$

Any horizontal position error grows like e^{ωt}. A 5 mm tracking error doubles in 0.22 s (ln 2/ω); a 1 cm error grows fivefold, to roughly the half-width of a humanoid foot, in about half a second (ln 5/ω ≈ 0.50 s). This timescale governs every other choice in the stack: ZMP MPC must replan with a period well below 1/ω; capture-point recovery must replan within ~100 ms; impedance loops must close within milliseconds.
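The figures above follow from ω = √(g/h) alone. A quick check, assuming the error lives purely in the divergent LIP mode so that it grows exactly as e^{ωt}; `time_to_grow` is a hypothetical helper.

```python
import math

G = 9.81        # gravity [m/s^2]
H = 0.95        # CoM height [m], as in the text

omega = math.sqrt(G / H)               # LIP natural frequency [rad/s]
tau = 1.0 / omega                      # e-folding (divergence) time [s]
t_double = math.log(2.0) / omega       # time for an error to double [s]

def time_to_grow(e0, e1, omega=omega):
    """Time for an uncorrected error e0 to reach e1 under pure e^(omega t) growth."""
    return math.log(e1 / e0) / omega

print(f"omega = {omega:.2f} rad/s, 1/omega = {tau:.2f} s, doubling time = {t_double:.2f} s")
print(f"1 cm -> 5 cm: {time_to_grow(0.01, 0.05):.2f} s; "
      f"1 cm -> 10 cm: {time_to_grow(0.01, 0.10):.2f} s")
```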

2.2 Zero-Moment Point (ZMP) — Vukobratović 1972

Vukobratović defined the ZMP as the point on the support surface where the net horizontal moment of the ground-reaction wrench is zero. On flat ground it coincides with the Centre of Pressure (CoP); on inclines or stairs the two differ by the inclination geometry.

For a humanoid with feet contacting at r_1, r_2, \dots, each carrying vertical force F_{i,z}:

$p_{\text{ZMP}} = \frac{\sum_i r_i F_{i,z}}{\sum_i F_{i,z}} - \frac{1}{m g} \left[ (\dot{L}_{\text{CoM}})_{xy} \right] \times \hat{z}$

(the second term is the angular-momentum correction — Sardain & Bessonnet 2004 IEEE T-SMC). In practice, with arms locked, the correction is small and ZMP ≈ CoP.
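A direct transcription of the formula, as a sketch: each foot is reduced to a single contact point (a real foot would contribute its corners or its measured CoP), the vertical forces would come from foot force/torque sensors, and the centroidal angular-momentum rate `Ldot` is assumed to be supplied by the state estimator. `zmp_from_contacts` is a hypothetical helper, and the m g denominator assumes the total vertical force is close to the robot's weight.

```python
import numpy as np

def zmp_from_contacts(r, fz, m, Ldot, g=9.81):
    """ZMP per the formula above (hypothetical helper).

    r    : (k, 2) horizontal contact-point positions [m]
    fz   : (k,) vertical contact forces [N], each >= 0 (unilateral contact)
    m    : total robot mass [kg]
    Ldot : (3,) rate of change of centroidal angular momentum [N m]
    """
    r = np.asarray(r, dtype=float)
    fz = np.asarray(fz, dtype=float)
    Ldot = np.asarray(Ldot, dtype=float)
    if np.any(fz < 0) or fz.sum() <= 0:
        raise ValueError("vertical forces must be non-negative with a positive sum")
    cop = (fz[:, None] * r).sum(axis=0) / fz.sum()     # force-weighted centroid = CoP
    # [(Ldot)_xy] x z_hat = (Ldot_y, -Ldot_x) in the horizontal plane
    correction = np.array([Ldot[1], -Ldot[0]]) / (m * g)
    return cop - correction

feet = [[0.0, 0.1], [0.0, -0.1]]          # left/right contact points, 20 cm apart
print(zmp_from_contacts(feet, [400.0, 200.0], m=60.0, Ldot=[0.0, 0.0, 0.0]))
print(zmp_from_contacts(feet, [400.0, 200.0], m=60.0, Ldot=[0.0, 30.0, 0.0]))
```

With 400 N on the left foot and 200 N on the right, the CoP lies a third of the way from the midpoint toward the left foot (y ≈ 0.033 m); a pitch-axis Ldot of 30 N m shifts the ZMP about 5 cm backward in this sketch.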

The Vukobratović balance theorem: the robot does not tip iff p_{\text{ZMP}} \in \mathrm{ConvHull}(\text{contact patches}). If the ZMP migrates outside the support polygon, the foot edge becomes a pivot and the robot rotates about it. The job of every balance controller is to keep the ZMP inside the polygon — with margin, since perception and actuation are imperfect.
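The tipping test can be sketched in a few lines: build the support polygon as the convex hull of the foot-corner points (Andrew's monotone-chain algorithm, a standard choice rather than anything the theorem prescribes) and report the ZMP's stability margin as its distance to the nearest polygon edge. The helper names, the rectangular 0.2 m × 0.1 m feet and the 2 cm required margin are assumptions for illustration.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone-chain convex hull, vertices returned counter-clockwise."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def zmp_margin(zmp, hull):
    """Smallest signed distance from the ZMP to the hull's edge lines.

    Positive inside the support polygon (equal to the distance to the nearest
    edge), negative outside. Assumes a CCW hull with at least three vertices.
    """
    p = np.asarray(zmp, dtype=float)
    v = np.asarray(hull, dtype=float)
    margins = []
    for a, b in zip(v, np.roll(v, -1, axis=0)):
        edge = b - a
        inward = np.array([-edge[1], edge[0]]) / np.linalg.norm(edge)  # left normal
        margins.append(float(inward @ (p - a)))
    return min(margins)

# Double support: two 0.2 m x 0.1 m feet centred at y = +/-0.1 m (assumed).
corners = [(-0.1, 0.05), (0.1, 0.05), (0.1, 0.15), (-0.1, 0.15),
           (-0.1, -0.15), (0.1, -0.15), (0.1, -0.05), (-0.1, -0.05)]
hull = convex_hull(corners)
REQUIRED_MARGIN = 0.02                     # assumed safety margin [m]
for zmp in [(0.02, 0.0), (0.09, 0.0), (0.12, 0.0)]:
    m = zmp_margin(zmp, hull)
    print(zmp, f"margin {m:+.3f} m", "OK" if m >= REQUIRED_MARGIN else "insufficient margin")
```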

2.3 Linear Inverted Pendulum (LIP) — Kajita 2001

Assume (i) CoM at constant height h_{\text{CoM}}, (ii) all mass at the CoM, (iii) massless telescopic leg, (iv) one foot in contact at x_{\text{ZMP}}. The horizontal CoM dynamics become a linear unstable second-order ODE:

$\ddot{x}_{\text{CoM}} = \frac{g}{h_{\text{CoM}}} (x_{\text{CoM}} - x_{\text{ZMP}}) = \omega^2 (x_{\text{CoM}} - x_{\text{ZMP}})$

Eigenvalues ±ω — one stable, one anti-stable. The diverging mode is the failure mode. Closed-form orbital energy:

$E_{\text{LIP}} = \frac{1}{2} \dot{x}_{\text{CoM}}^2 - \frac{1}{2} \omega^2 (x_{\text{CoM}} - x_{\text{ZMP}})^2$

is conserved during single support as long as the ZMP stays fixed, so the orbit in the (x, \dot x) plane is a hyperbola centred at (x_{\text{ZMP}}, 0) with asymptotes of slope ±ω. This is the fundamental analytical tool for biped footstep planning.
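The closed-form LIP solution makes the planning use of E_{\text{LIP}} concrete. With the ZMP fixed at p the energy stays constant along the trajectory, and for a CoM behind the stance foot and moving toward it, its sign says whether the CoM passes over the foot (E > 0), stops above it (E = 0) or falls back (E < 0). A sketch with h = 0.95 m as above; the helper names and initial states are illustrative assumptions.

```python
import math

G, H = 9.81, 0.95
OMEGA = math.sqrt(G / H)        # LIP natural frequency [rad/s]

def lip_propagate(x, xd, p, t, omega=OMEGA):
    """Closed-form LIP solution with the ZMP held fixed at p for time t."""
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    x_t = p + (x - p) * c + (xd / omega) * s
    xd_t = omega * (x - p) * s + xd * c
    return x_t, xd_t

def orbital_energy(x, xd, p, omega=OMEGA):
    """E_LIP = 0.5 * xd^2 - 0.5 * omega^2 * (x - p)^2."""
    return 0.5 * xd**2 - 0.5 * omega**2 * (x - p)**2

def passes_over_foot(x, xd, p):
    """Valid for a CoM behind the stance ZMP and moving toward it (x < p, xd > 0)."""
    return orbital_energy(x, xd, p) > 0.0

# CoM 10 cm behind the stance foot at two approach speeds (assumed values).
for xd0 in (0.40, 0.25):
    E0 = orbital_energy(-0.1, xd0, 0.0)
    x1, xd1 = lip_propagate(-0.1, xd0, 0.0, 0.3)
    print(f"xd0 = {xd0:.2f} m/s: E = {E0:+.4f}, E after 0.3 s = {orbital_energy(x1, xd1, 0.0):+.4f},"
          f" passes over: {passes_over_foot(-0.1, xd0, 0.0)}")
```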

2.4 Capture point (Pratt 2006)

The instantaneous capture point is the location at which the next foot must land — now — to bring the CoM to rest in one step under the LIP dynamics:

$x_{\text{CP}} = x_{\text{CoM}} + \frac{\dot{x}_{\text{CoM}}}{\omega}$

(For the 2D case, see Pratt, Carff, Drakunov & Goswami 2006, IEEE-RAS Humanoids; for 3D, Koolen et al. 2012, IJRR.) The capture point is the projection of the CoM forward along the diverging eigenvector. If the swing foot lands exactly on the CP and the CoP is held there, the orbital energy E_{\text{LIP}} about the new foot is exactly zero: the state lies on the stable manifold, and the natural dynamics carry the CoM to rest above the foot (asymptotically) with no further effort needed.
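A short sketch of what the capture point buys: after a push, hold the CoP exactly on x_{\text{CP}}, 2 cm short of it, or 2 cm beyond it, and propagate the exact LIP solution. Only the on-CP placement settles. The push speed, offsets, horizon and helper names are illustrative assumptions, and the model ignores step timing and foot size.

```python
import math

G, H = 9.81, 0.95
OMEGA = math.sqrt(G / H)    # LIP natural frequency [rad/s]

def capture_point(x, xd, omega=OMEGA):
    """Instantaneous capture point x_CP = x + xd / omega (LIP model)."""
    return x + xd / omega

def lip_after(x, xd, p, t, omega=OMEGA):
    """Exact LIP state after time t with the CoP held at p."""
    c, s = math.cosh(omega * t), math.sinh(omega * t)
    return (p + (x - p) * c + xd / omega * s,
            omega * (x - p) * s + xd * c)

# A push leaves the CoM above the old foot (x = 0) moving at 0.3 m/s.
x0, xd0 = 0.0, 0.3
cp = capture_point(x0, xd0)                  # about 0.093 m ahead
for offset in (-0.02, 0.0, 0.02):            # land 2 cm short, on the CP, 2 cm long
    p = cp + offset
    x, xd = lip_after(x0, xd0, p, 1.5)
    print(f"foot at CP{offset:+.2f} m -> after 1.5 s: x - p = {x - p:+.4f} m, xd = {xd:+.3f} m/s")
```

Landing short leaves positive orbital energy, so the CoM carries over the foot and another step is needed; landing long leaves negative energy and the CoM falls back.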

The locus of N-step capture points (foot placements from which the robot can come to rest in at most N steps, each at a feasible location) forms the N-step capture region, a bounded, computable set. A disturbance that leaves the robot no longer 1-step capturable but still 2-step capturable can be recovered with one extra step; if no finite number of steps suffices, the robot must fall.
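For the simplest model the capture regions have a closed form. A sketch in the spirit of Koolen et al. (2012), assuming a point foot (the CoP cannot move within the stance foot), a maximum step length `l_max` and a minimum step duration `dt_step`: because the capture point runs away from the stance foot as e^{ωt}, stepping as early and as far toward it as possible is optimal, which yields the recursion in the docstring. The 0.6 m reach, 0.4 s step time and the helper name are assumed values.

```python
import math

G, H = 9.81, 0.95
OMEGA = math.sqrt(G / H)    # LIP natural frequency [rad/s]

def capture_region_radii(l_max, dt_step, n_max, omega=OMEGA):
    """Radii d_N of the N-step capture regions for a point-foot LIP (hypothetical helper).

    A state is N-step capturable iff |x_CP - p_stance| <= d_N, where
    d_0 = 0 and d_N = (l_max + d_{N-1}) * exp(-omega * dt_step).
    """
    decay = math.exp(-omega * dt_step)
    radii = [0.0]
    for _ in range(n_max):
        radii.append((l_max + radii[-1]) * decay)
    return radii

d = capture_region_radii(l_max=0.6, dt_step=0.4, n_max=6)
decay = math.exp(-OMEGA * 0.4)
d_inf = 0.6 * decay / (1.0 - decay)          # fixed point of the recursion (N -> infinity)
for n, dn in enumerate(d):
    print(f"N = {n}: |x_CP - p| <= {dn:.3f} m")
print(f"limit N -> infinity: {d_inf:.3f} m")
```

The radii converge geometrically toward `d_inf`, which is why the capture region stays bounded and each extra step buys less than the one before.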

2.5 Divergent Component of Motion (DCM) — Englsberger 2015

Englsberger, Ott & Albu-Schäffer (DLR, 2015 IEEE T-RO) generalised the capture point to a continuous state. Define:

$\xi = x_{\text{CoM}} + \frac{\dot{x}_{\text{CoM}}}{\omega}$

The LIP dynamics in (x_{\text{CoM}}, \xi) coordinates separate into a cascade:

$\dot{\xi} = \omega (\xi - x_{\text{ZMP}}) \qquad \text{(unstable, controlled by the ZMP)}$

$\dot{x}_{\text{CoM}} = \omega (\xi - x_{\text{CoM}}) \qquad \text{(stable, follows the DCM)}$

The unstable mode is now an explicit, scalar, controllable variable, and the stable mode tracks it autonomously. DCM control is the basis of the IHMC Atlas walking controller used in the DARPA Robotics Challenge and of the DLR TORO humanoid.
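A minimal DCM regulator shows why the decomposition is useful: choose the ZMP so that the DCM error decays at a chosen rate k, clip the ZMP to the foot, and the CoM follows on its own. This is a sketch of a generic proportional DCM law, not the IHMC or DLR controller; the gain k = 3 s⁻¹, the foot extent (5 cm heel, 10 cm toe), the push speeds and the helper names are assumptions.

```python
import math

G, H = 9.81, 0.95
OMEGA = math.sqrt(G / H)        # LIP natural frequency [rad/s]

def dcm_zmp_command(xi, xi_ref, xi_ref_dot, k=3.0, omega=OMEGA):
    """ZMP that makes the DCM error decay at rate k [1/s].

    From xi_dot = omega * (xi - p), choosing
    p = xi - (xi_ref_dot - k * (xi - xi_ref)) / omega
    gives xi_dot = xi_ref_dot - k * (xi - xi_ref).
    """
    return xi - (xi_ref_dot - k * (xi - xi_ref)) / omega

def push_recovery(xd0, foot=(-0.05, 0.10), t_end=4.0, dt=1e-3, omega=OMEGA):
    """Stand on one foot (ankle at x = 0) after a push giving CoM speed xd0.

    Regulates the DCM to xi_ref = 0 with the ZMP clipped to the foot's
    heel/toe extent; forward-Euler integration of the (x, xi) cascade.
    Returns the final (x, xi).
    """
    x, xi = 0.0, xd0 / omega                 # the push shows up as a DCM offset
    for _ in range(int(round(t_end / dt))):
        p = dcm_zmp_command(xi, 0.0, 0.0, omega=omega)
        p = min(max(p, foot[0]), foot[1])    # ZMP must stay under the foot
        xi += omega * (xi - p) * dt          # unstable mode, driven by the ZMP
        x += omega * (xi - x) * dt           # stable mode, follows the DCM
    return x, xi

for push in (0.2, 0.4):
    x_end, xi_end = push_recovery(push)
    print(f"push {push:.1f} m/s: DCM starts at {push / OMEGA:.3f} m, final x = {x_end:.3g} m")
```

With a 0.4 m/s push the DCM starts beyond the toe, so no admissible ZMP can pull it back and the sketch diverges. This is the capture-point criterion seen from the DCM side: the robot must step.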

2.6 Single Rigid Body Dynamics (SRBD) — Di Carlo 2018

For dynamic motion where the LIP assumption breaks (vertical CoM excursions, fast manoeuvres), approximate the robot as a single rigid body with mass m, body inertia I_B, and massless legs locating the contact wrenches F_i. State x = (r_{\text{CoM}}, \Theta_B, v_{\text{CoM}}, \omega_B) \in \mathbb{R}^{12} and input u = \mathrm{stack}(F_i):
