Mobile Manipulation — Robotics Reference

Scope. A mobile manipulator is one or more articulated arms mounted on a locomoting base — wheeled, tracked, legged, or floating. The control problem is what distinguishes it from “arm bolted to a table” and “AMR with a flat top”: the base and the arm share inertia, share workspace constraints, and share a single task. Arm design lives in [[Robotics/manipulator-design]]; wheeled chassis kinematics live in [[Robotics/mobile-base-wheeled]]; floating-base dynamics live in [[Robotics/legged-robotics]]; this note ties them together. The hard problems are: (a) deciding when to move the base vs the arm, (b) coordinating them when both must move at once, (c) keeping the base from tipping while the arm extends, (d) preserving end-effector accuracy while the base wobbles, and (e) running all of the above at servo rate on a battery-powered platform.

1. At a glance

A mobile manipulator is a redundant kinematic system whose end-effector pose is the product of three transforms:

$T_{ee}^{world} = T_{base}^{world} \cdot T_{arm}^{base} \cdot T_{ee}^{arm}$

with $T_{base}^{world}$ supplied by the locomotion subsystem (3 DOF for a planar base, 6 DOF for a floating legged base) and the arm-side transforms by the arm mount and the manipulator joints (typically 6 or 7 DOF). Total configuration-space DOF: 9 (diff-drive + 6-arm), 10 (diff-drive + 7-arm), 13 (humanoid). End-effector tasks live in $SE(3)$ (6 DOF), so every mobile manipulator is redundant by 3–7 DOF. That redundancy is the lever the controller pulls on to keep the base happy, the arm dexterous, the centre of gravity inside the wheelbase, and the obstacles avoided.
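The transform chain can be sketched with plain homogeneous matrices. This is a minimal illustration assuming a planar base; the helper names and all numeric offsets (arm mount height, end-effector offset) are hypothetical placeholders, not taken from any platform.

```python
import numpy as np

def planar_base_transform(x, y, theta):
    """4x4 homogeneous transform of a planar base pose (x, y, yaw) in the world frame."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [c,  -s,  0.0, x],
        [s,   c,  0.0, y],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])

def translation(dx, dy, dz):
    """4x4 homogeneous transform for a pure translation."""
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]
    return T

# Hypothetical numbers: base at (1, 2) facing +y, arm mounted 0.4 m above the base,
# end-effector 0.5 m ahead of the arm mount along the base's forward axis.
T_world_base = planar_base_transform(1.0, 2.0, np.pi / 2)
T_base_arm = translation(0.0, 0.0, 0.4)
T_arm_ee = translation(0.5, 0.0, 0.0)

# Chain the three transforms, left to right, exactly as in the product above.
T_world_ee = T_world_base @ T_base_arm @ T_arm_ee
ee_position = T_world_ee[:3, 3]
```

Because the base faces $+y$, the 0.5 m forward offset shows up along world $y$, which is the point of keeping the base transform leftmost in the product.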

Two operating regimes, picked by the planner per task:

Decoupled (navigate-then-manipulate). Drive to a fixed standoff, brake, then plan an arm trajectory with the base treated as a rigid pedestal. 90 % of warehouse and home-service tasks (pick a box from a shelf, set a plate on a counter, plug a charger) tolerate this — and it keeps the software simple: Nav2 → arrive → MoveIt 2 → place.
Coordinated (whole-body). Base and arm move simultaneously, coupled by a single optimisation. Required when (i) the workspace ball doesn’t cover the task (long reach down an aisle), (ii) the task constraints force the base to move (opening a drawer pulls the gripper toward you — something has to back away), (iii) cycle time matters more than control simplicity (Mobile ALOHA bimanual tasks), or (iv) the platform is a legged floating-base humanoid (Atlas, Apollo, Optimus) where the “base” is not even a rigid body.

2026 platform wave. The market split that has hardened by 2026:

Class	Example platforms	Base	Arms	Payload
Warehouse single-arm	Boston Dynamics Stretch, Berkshire Grey, Pickle	Tracked / wheeled	1× 7-DOF + suction	20–25 kg
Logistics fleet	Locus LocusBot, Fetch (Zebra) Roller-Top, GreyOrange	Diff / swerve	Conveyor or 6-DOF	15–80 kg
Bimanual research / kitchen	Mobile ALOHA, Toyota T-HR3, RB-Vogui XL+iiwa×2	Cart / omni / swerve	2× 6-DOF or 2× 7-DOF	2–5 kg/arm
Hospital / hospitality	TIAGo (PAL), HSR (Toyota), Stretch RE3 (Hello)	Diff	1× 5–7 DOF	2–5 kg
Quadruped + arm	Spot + Spot Arm, ANYmal + Z1 (Unitree), Vision 60 + arm	Quadruped	1× 6-DOF	4–14 kg
Humanoid 2025–26	Figure 02, 1X Neo Gamma, Optimus Gen 3, Apollo, Unitree H1+, Atlas Electric	Biped floating	2× 7-DOF	9–25 kg total upper-body

Where it sits in the design stack. A mobile manipulator is the integration target of every other Robotics note. Above it: task and motion planning (TAMP), perception, fleet management. Below it: joint-level control on each arm joint, wheel/joint velocity loops on the base. Between them — and this is where mobile-manipulation engineering actually happens — sits the whole-body controller: a single function that takes a task description (e.g. “end-effector at pose X, base pointing forward, CoM over support polygon, no collisions”) and returns commands for every actuator on the robot, every 1–10 ms, while respecting kinematic, dynamic, and safety constraints.

First ask before you write a line of code:

Decoupled or coordinated? If the task tolerates “drive then manipulate,” do that. Coordinated control quintuples engineering effort.
Holonomic base? If yes (mecanum, swerve, omni, legged) the controller can independently command $(v_{x}, v_{y}, ω)$ ; if no (diff, Ackermann) the base contributes only 2 instantaneous DOFs to the combined Jacobian.
Is the arm payload comparable to the base mass? If yes (Stretch arm extended with a 23 kg box on a sub-200 kg platform) tip-over is the dominant constraint and dictates everything else. Compute the static support polygon margin before sizing the arm.
Real-time torque control or position only? Whole-body QP needs joint-level torque control (Franka, iiwa, ANYmal, Spot+Arm, Atlas, Apollo); on a position-only stack (UR + AMR) you are limited to admittance/impedance approximations with worse contact behaviour.
Floating base or planar base? Floating-base humanoid + manipulation is a research frontier (Sentis 2007, Kanoun 2011, BD Atlas 2024). Planar wheeled + manipulation is comparatively mature.
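The static support-polygon margin mentioned in the payload question can be estimated in a few lines. This is a rough sketch under stated assumptions: the 180 kg base mass, 0.8 m × 0.6 m wheel footprint and 1.5 m payload reach are hypothetical numbers, the payload and base are treated as point masses, and `support_margin` is an illustrative helper rather than a library function.

```python
import math

def support_margin(com_xy, polygon):
    """Smallest signed distance from a point to the edge lines of a convex polygon.

    `polygon` lists vertices counter-clockwise. The result is positive when the
    point is inside (and then equals the true margin), negative when it is outside.
    """
    margin = math.inf
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        length = math.hypot(ex, ey)
        # For a counter-clockwise polygon the inward normal of an edge is (-ey, ex).
        dist = ((com_xy[0] - x1) * (-ey) + (com_xy[1] - y1) * ex) / length
        margin = min(margin, dist)
    return margin

def combined_com_x(m_base, x_base, m_load, x_load):
    """x-coordinate of the combined centre of mass of two point masses."""
    return (m_base * x_base + m_load * x_load) / (m_base + m_load)

# Hypothetical platform: rectangular wheel footprint centred on the base origin.
footprint = [(-0.4, -0.3), (0.4, -0.3), (0.4, 0.3), (-0.4, 0.3)]
com_x = combined_com_x(m_base=180.0, x_base=0.0, m_load=23.0, x_load=1.5)
margin = support_margin((com_x, 0.0), footprint)
```

With these placeholder numbers the CoM shifts about 0.17 m forward and leaves roughly 0.23 m of static margin; dynamic effects such as braking reduce it further and are not modelled here.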

2. First principles

2.1 Combined kinematics

Let the arm have $n_{a}$ joints with configuration $q_{a} \in R^{n_{a}}$ and the base have $n_{b}$ DOFs with configuration $q_{b}$ (planar: $n_{b} = 3$ , $(x_{b}, y_{b}, θ_{b})$ ; floating: $n_{b} = 6$ ). Full configuration:

$q = \begin{bmatrix} q_{b} \\ q_{a} \end{bmatrix} \in R^{n_{b} + n_{a}}$

The end-effector pose is the chained transform above. Its world-frame velocity is:

$\begin{bmatrix} v_{ee} \\ \omega_{ee} \end{bmatrix} = J_{full} (q) \dot{q} = \begin{bmatrix} J_{b} (q) & J_{a} (q) \end{bmatrix} \begin{bmatrix} \dot{q}_{b} \\ \dot{q}_{a} \end{bmatrix}$

with $J_{full} \in R^{6 \times (n_{b} + n_{a})}$ . The arm Jacobian $J_{a}$ is the usual fixed-base body Jacobian expressed in the world frame; the base Jacobian $J_{b}$ is the rigid-body adjoint mapping body-frame base velocity into end-effector twist:

$J_{b} = Ad_{T_{base}^{world}} \cdot S_{b}$

where $S_{b}$ is the base actuation matrix, whose columns select the commandable components of the 6-D base twist (linear components first, then angular): $S_{b} = [e_{1}\ e_{2}\ e_{6}]$ for an omni base (full $v_{x}, v_{y}, \omega$ commandable), and $S_{b} = [e_{1}\ e_{6}]$ (forward + yaw only) for a differential-drive base. The Pfaffian non-holonomic constraint of a diff-drive base ([[Robotics/mobile-base-wheeled]]) is encoded in $S_{b}$: you cannot command the body-frame lateral velocity $\dot{y}_{b}$ instantaneously; the planner integrates yaw to swing the base before any lateral base translation appears at the end-effector.
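To make the role of $S_{b}$ concrete, here is a reduced planar sketch: a base with body-frame velocity $(v_{x}, v_{y}, \omega)$ carrying a two-link planar arm, tracking end-effector position only. The link lengths and function names are assumptions for illustration, and the twist is cut down from 6-D to the three planar components, so the selection matrix here is 3×3 or 3×2.

```python
import numpy as np

def rot2(theta):
    """2x2 planar rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def ee_position(q_b, q_a, l1=0.4, l2=0.3):
    """World-frame end-effector position for base pose (x, y, yaw) and two arm joints."""
    x, y, theta = q_b
    r = np.array([l1 * np.cos(q_a[0]) + l2 * np.cos(q_a[0] + q_a[1]),
                  l1 * np.sin(q_a[0]) + l2 * np.sin(q_a[0] + q_a[1])])
    return np.array([x, y]) + rot2(theta) @ r

def full_jacobian(q_b, q_a, base="omni", l1=0.4, l2=0.3):
    """Position Jacobian [J_b J_a] mapping (base body velocity, arm joint rates) to EE velocity."""
    R = rot2(q_b[2])
    s1, c1 = np.sin(q_a[0]), np.cos(q_a[0])
    s12, c12 = np.sin(q_a[0] + q_a[1]), np.cos(q_a[0] + q_a[1])
    r = np.array([l1 * c1 + l2 * c12, l1 * s1 + l2 * s12])
    J_arm_local = np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                            [ l1 * c1 + l2 * c12,  l2 * c12]])
    Rr = R @ r
    # Columns: body-frame forward velocity, body-frame lateral velocity, yaw rate.
    J_b_all = np.column_stack([R[:, 0], R[:, 1], [-Rr[1], Rr[0]]])
    # Selection matrix: omni keeps all three inputs, diff-drive drops the lateral one.
    S_b = np.eye(3) if base == "omni" else np.eye(3)[:, [0, 2]]
    return np.hstack([J_b_all @ S_b, R @ J_arm_local])

q_b = np.array([0.2, -0.1, 0.7])
q_a = np.array([0.3, -0.5])
J_omni = full_jacobian(q_b, q_a, "omni")   # 2 x 5
J_diff = full_jacobian(q_b, q_a, "diff")   # 2 x 4
```

The diff-drive Jacobian is the omni Jacobian with the lateral column removed, which is all the non-holonomic constraint amounts to at the velocity level.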

2.2 Redundancy and the null space

With $J_{full} \in R^{6 \times (n_{b} + n_{a})}$ and $n_{b} + n_{a} \geq 7$ , the system is kinematically redundant with respect to a 6-DOF SE(3) task. The general velocity solution is:

$\dot{q} = J^{+} \xi_{ee} + (I - J^{+} J) \dot{q}_{0}$

where $J^{+}$ is the Moore-Penrose pseudoinverse (damped near singularities: $J^{\top} (J J^{\top} + \lambda^{2} I)^{-1}$), $\xi_{ee}$ is the desired end-effector twist, and $\dot{q}_{0}$ is an arbitrary secondary-task velocity. The null-space projector $(I - J^{+} J)$ keeps only the component of $\dot{q}_{0}$ that produces no end-effector motion, so the secondary task never disturbs the primary one. The classical secondary tasks for a mobile manipulator:

Keep the arm centred in its workspace — $\dot{q}_{0} = - k_{q} (q_{a} - q_{a}^{rest})$ pulls joints toward a manipulable rest pose; the base drives to maintain end-effector pose.
Preserve manipulability — $\dot{q}_{0} = k_{m} \nabla_{q} w (q)$ with $w (q) = \sqrt{\det (J_{a} J_{a}^{\top})}$ (Yoshikawa 1985); pushes the configuration away from arm singularities by walking the base.
Keep CoM over support polygon — for legged or tip-prone wheeled bases.
Avoid joint limits — $\dot{q}_{0} = - k_{l} \nabla_{q} H (q)$ with a barrier function $H$ .
Avoid collisions — gradient of distance to nearest obstacle, computed against a swept volume ([[Robotics/path-planning]]).
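The null-space resolution above can be sketched in NumPy. The Jacobian below is a random stand-in for a 6×10 mobile-manipulator Jacobian (an assumption, not a robot model), and `damped_pinv` and `redundant_velocity` are hypothetical helper names.

```python
import numpy as np

def damped_pinv(J, lam):
    """Damped least-squares pseudoinverse J^T (J J^T + lam^2 I)^-1."""
    m = J.shape[0]
    return J.T @ np.linalg.inv(J @ J.T + lam**2 * np.eye(m))

def redundant_velocity(J, xi_ee, qdot0, lam=0.0):
    """Primary task via the pseudoinverse, secondary task projected into the null space."""
    J_pinv = np.linalg.pinv(J) if lam == 0.0 else damped_pinv(J, lam)
    N = np.eye(J.shape[1]) - J_pinv @ J      # null-space projector
    return J_pinv @ xi_ee + N @ qdot0

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 10))             # stand-in for J_full
xi_ee = rng.standard_normal(6)               # desired end-effector twist
q = rng.standard_normal(10)
q_rest = np.zeros(10)
qdot0 = -2.0 * (q - q_rest)                  # "pull toward rest pose" secondary task

qdot = redundant_velocity(J, xi_ee, qdot0)
twist_error = np.linalg.norm(J @ qdot - xi_ee)
```

With the exact pseudoinverse the end-effector twist is met regardless of `qdot0`; with `lam > 0` the tracking becomes approximate in exchange for bounded joint rates near singularities.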

Hierarchical task-priority IK (Nakamura 1987, Siciliano-Slotine 1991) stacks an arbitrary number of these:

$\dot{q} = J_{1}^{+} \xi_{1} + N_{1} (J_{2} N_{1})^{+} (\xi_{2} - J_{2} J_{1}^{+} \xi_{1}) + N_{12} (\cdot) (\xi_{3} - \dots) + \dots$

with $N_{i}$ the null-space projector of the augmented Jacobian up through task $i$ . Used as written this works but is brittle — every modern stack solves the same problem as a single QP (see §2.4).
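A two-level instance of the task-priority recursion, as a sketch. The Jacobians are random stand-ins chosen so the two tasks are compatible (an assumption), and `two_level_priority` is a hypothetical name.

```python
import numpy as np

def two_level_priority(J1, xi1, J2, xi2):
    """Task 1 is met exactly; task 2 is met as well as possible inside task 1's null space."""
    J1_pinv = np.linalg.pinv(J1)
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1          # null-space projector of task 1
    qdot1 = J1_pinv @ xi1                            # primary solution
    # Secondary correction: only the part of task 2 not already produced by qdot1,
    # resolved through the Jacobian restricted to the null space of task 1.
    correction = N1 @ np.linalg.pinv(J2 @ N1) @ (xi2 - J2 @ qdot1)
    return qdot1 + correction

rng = np.random.default_rng(1)
J1 = rng.standard_normal((3, 8))   # e.g. end-effector position task
J2 = rng.standard_normal((2, 8))   # e.g. a lower-priority posture or CoM task
xi1 = rng.standard_normal(3)
xi2 = rng.standard_normal(2)

qdot = two_level_priority(J1, xi1, J2, xi2)
```

When the tasks conflict, the product `J2 @ N1` loses rank and the plain pseudoinverse becomes ill-conditioned; that brittleness is one reason production stacks prefer a QP with explicit regularisation.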

2.3 Coupled dynamics — vehicle-arm interaction

The combined equations of motion partition base and arm DOFs:

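$\begin{bmatrix} M_{bb} & M_{ab} \\ M_{ba} & M_{aa} \end{bmatrix} \begin{bmatrix} \ddot{q}_{b} \\ \ddot{q}_{a} \end{bmatrix} + \begin{bmatrix} h_{b} \\ h_{a} \end{bmatrix} = \begin{bmatrix} S_{b}^{\top} u_{b} \\ \tau_{a} \end{bmatrix} + J_{ext}^{\top} F_{ext}$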

The off-diagonal $M_{ba} = M_{ab}^{\top}$ block is the cross-inertia: arm acceleration generates a reaction wrench on the base, and base acceleration generates inertial torques on the arm joints. The magnitude scales as $m_{arm} \cdot L_{arm}$. On a Spot + 6-DOF Spot Arm (arm mass ~8 kg, max reach 1 m), the reaction at 10 g end-effector acceleration is enough to noticeably yaw the body; the planner must either back off the acceleration or pre-compensate with a counter-yaw on the legs.

Reaction-null space (Yoshida 1996). For a free-flying base, or a wheeled base on soft suspension, you can find arm motions that lie in the null space of the base-arm coupling inertia, so the arm moves without disturbing the base. Used on satellite manipulators (ETS-VII, ROKVISS) and emerging on lightweight household humanoids where any base disturbance ruins end-effector accuracy.

2.4 Whole-Body Control (Sentis-Khatib 2005, Khatib 1987 operational space)

The dominant 2026 stack for tightly-coupled mobile manipulation is a hierarchical task-priority QP solved at 100–1000 Hz. At each control tick, solve:

$\min_{\ddot{q}, \tau, \lambda} \sum_{i} w_{i} \left\| J_{i} (q) \ddot{q} + \dot{J}_{i} \dot{q} - \ddot{x}_{i}^{des} \right\|^{2}$

subject to:

Dynamics: $M \ddot{q} + h = S^{\top} \tau + J_{c}^{\top} \lambda$
Contact constraints (legged base): $J_{c} \ddot{q} + \dot{J}_{c} \dot{q} = 0$ on stance feet
Friction cones: $\| \lambda_{xy} \| \leq \mu \lambda_{z}$ , $\lambda_{z} \geq 0$
Joint torque limits: $\tau_{min} \leq \tau \leq \tau_{max}$
Velocity / acceleration limits: $\dot{q}_{min} \leq \dot{q} + \ddot{q} \Delta t \leq \dot{q}_{max}$
Tip-over / ZMP for wheeled with high arm: $p_{ZMP} \in \mathrm{conv}(\text{wheel contacts})$

Tasks are stacked by weight (soft) or by null-space projection (strict). A 2026 humanoid manipulation controller typically runs five priority levels:

Hard: dynamics consistency, contact non-penetration, torque limits, self-collision.
Strict: balance / ZMP / CoM position.
Strict: end-effector pose tracking.
Soft: posture regularisation (keep arm near rest, base level).
Soft: joint-velocity minimisation, smoothness.

Solver of choice in 2024–2026: ProxQP (INRIA, Bambade 2022) or HPIPM (University of Freiburg, Frison 2020) for the inequality-constrained QP; OSQP for the soft-stack only. Pinocchio (Carpentier 2019) supplies $M, h, J$ at ~10 μs per evaluation on a modern CPU for a 36-DOF humanoid.
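To show the shape of one control tick without pulling in a QP solver, the sketch below drops every inequality and the contact forces, which leaves an equality-constrained least-squares problem that a single KKT solve handles. This is a deliberate simplification, not how a production WBC is written: the dynamics terms are random stand-ins, joint 0 is left unactuated as a toy substitute for a floating DOF, and `wbc_tick_equality_only` is a hypothetical name.

```python
import numpy as np

def wbc_tick_equality_only(M, h, S, J, Jdot_qdot, xdd_des, reg=1e-8):
    """Minimise ||J qdd + Jdot_qdot - xdd_des||^2 subject to M qdd + h = S^T tau."""
    n = M.shape[0]                       # generalised velocities
    m = S.shape[0]                       # actuated joints
    # Stack decision variables as z = [qdd, tau].
    A = np.hstack([J, np.zeros((J.shape[0], m))])
    b = xdd_des - Jdot_qdot
    C = np.hstack([M, -S.T])             # dynamics: C z = -h
    d = -h
    H = 2.0 * (A.T @ A + reg * np.eye(n + m))   # small regulariser keeps H positive definite
    g = 2.0 * A.T @ b
    k = C.shape[0]
    KKT = np.block([[H, C.T], [C, np.zeros((k, k))]])
    sol = np.linalg.solve(KKT, np.concatenate([g, d]))
    return sol[:n], sol[n:n + m]

rng = np.random.default_rng(0)
n, m = 4, 3
B = rng.standard_normal((n, n))
M = B @ B.T + 4.0 * np.eye(n)            # random symmetric positive-definite mass matrix
h = rng.standard_normal(n)               # stand-in for Coriolis + gravity terms
S = np.eye(n)[1:]                        # selects the 3 actuated joints
J = rng.standard_normal((2, n))          # stand-in task Jacobian
Jdot_qdot = rng.standard_normal(2)
xdd_des = np.array([0.3, -0.1])

qdd, tau = wbc_tick_equality_only(M, h, S, J, Jdot_qdot, xdd_des)
dynamics_residual = M @ qdd + h - S.T @ tau
task_residual = J @ qdd + Jdot_qdot - xdd_des
```

Adding friction cones, torque limits and the ZMP constraint turns this into a genuine inequality-constrained QP, at which point a solver such as those named above takes over from the direct KKT solve.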

3. Practical math + worked examples

Example A — Coupled inverse kinematics on a diff-drive base + 7-DOF arm

Setup. A TIAGo-style mobile base (diff-drive, wheel separation $L = 0.5$ m, wheel radius $r = 0.1$ m) carrying a 7-DOF cobot arm with reach 0.9 m mounted on a 0.4 m torso. Target: end-effector at $(x, y, z) = (3.0, 2.0, 1.0)$ m world, gripper pointing along $+ x_{world}$ , robot starts at the origin facing $+ x$ .

Step 1 — feasibility. The arm’s reachable sphere around its shoulder has radius 0.9 m. The shoulder sits at $(x_{b}, y_{b}, 0.4)$ when the base is at $(x_{b}, y_{b}, \theta_{b})$ . For a target height of 1.0 m the vertical offset is 0.6 m, so the available radius in the horizontal plane is $\sqrt{0.9^{2} - 0.6^{2}} = \sqrt{0.45} \approx 0.67$ m. The base must therefore end within a 0.67 m circle around the target’s $(x, y) = (3.0, 2.0)$ projection. The starting position is $\sqrt{3.0^{2} + 2.0^{2}} \approx 3.6$ m from that projection, far outside the circle, so the base must move.
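The feasibility check in Step 1 is a one-line application of Pythagoras. A small sketch, using the numbers from the setup; `horizontal_reach` is a hypothetical helper name and the arm workspace is idealised as a full sphere.

```python
import math

def horizontal_reach(arm_reach, shoulder_height, target_height):
    """Radius of the horizontal circle in which the base may stand and still reach the target."""
    dz = target_height - shoulder_height
    if abs(dz) > arm_reach:
        return 0.0                        # target height is out of reach from any base position
    return math.sqrt(arm_reach**2 - dz**2)

radius = horizontal_reach(arm_reach=0.9, shoulder_height=0.4, target_height=1.0)
start_to_target = math.hypot(3.0, 2.0)   # planar distance from the origin to the target projection
base_must_move = start_to_target > radius
```

A real arm loses part of this sphere to joint limits and self-collision, so the true standoff region is smaller than the circle computed here.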

Step 2 — pick a base goal. A reasonable secondary objective is to maximise arm manipulability. Pointing the base along the reach direction (heading $\theta_{b} = 0$ rad, since the arm extends along $+ x$ from the torso) at a standoff of 0.45 m gives:

$q_{b}^{goal} = (x_{b}, y_{b}, θ_{b}) = (2.55, 2.0, 0)$

Step 3 — drive a non-holonomic path to the standoff. A diff-drive base cannot translate sideways; the planner ([[Robotics/path-planning]]) generates a Dubins / Reeds-Shepp curve from $(0, 0, 0)$ to $(2.55, 2.0, 0)$ . With minimum turning radius $\rho_{min} = v / \omega_{max} = 0.5/1.2 \approx 0.42$ m, the left-straight-right Dubins candidate is a ~42° left arc, a ~2.7 m straight, and a ~42° right arc: total length ~3.3 m, traversed at 0.5 m/s in ~6.6 s.
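The path length in Step 3 can be checked by hand for one Dubins candidate. The sketch below computes only the left-straight-right (LSR) word for the special case where start and goal headings are both zero and the goal lies ahead and to the left; a full Dubins planner compares all six words, which this does not attempt. The function name is hypothetical.

```python
import math

def dubins_lsr_length(goal_x, goal_y, rho):
    """Length and arc sweep of the LSR path from (0, 0, 0) to (goal_x, goal_y, 0).

    Assumes zero start and goal headings and a goal ahead and to the left,
    so that both arcs sweep the same positive angle.
    """
    # Left-turn circle centre at the start is (0, rho); right-turn circle
    # centre at the goal is (goal_x, goal_y - rho).
    dx = goal_x
    dy = (goal_y - rho) - rho
    d = math.hypot(dx, dy)
    if d < 2.0 * rho:
        raise ValueError("LSR candidate does not exist: turning circles overlap")
    straight = math.sqrt(d**2 - (2.0 * rho)**2)          # inner tangent between the circles
    sweep = math.atan2(dy, dx) + math.asin(2.0 * rho / d)  # heading of the straight segment
    arcs = 2.0 * rho * sweep                               # turn left by sweep, then right by sweep
    return straight + arcs, math.degrees(sweep)

rho = 0.5 / 1.2                                            # v / omega_max from the setup
length, sweep_deg = dubins_lsr_length(2.55, 2.0, rho)
travel_time = length / 0.5                                 # seconds at 0.5 m/s
```

Travel time ignores acceleration and deceleration ramps, so a real base takes somewhat longer than `travel_time`.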

Step 4 — solve arm IK at the new standoff. With the base now at $(2.55, 2.0, 0)$ , the shoulder is at $(2.55, 2.0, 0.4)$ ; the gripper target $(3.0, 2.0, 1.0)$ is at relative position $(0.45, 0, 0.6)$ , magnitude 0.75 m — inside the 0.9 m arm sphere. Solve 7-DOF IK with the redundancy resolved by minimising joint motion from a comfortable elbow-up configuration. With Pieper-decomposable Franka-style geometry the solution converges in ~20 Newton iterations / 0.5 ms (Beeson-Ames 2015 TRAC-IK).

Step 5 — decoupled execution. Run the base trajectory in Nav2; on arrived event hand off to MoveIt 2 which executes the arm trajectory. Total task time ~9 s.

Step 6 — coordinated alternative. A whole-body controller would interleave base and arm motion: as soon as the target enters the arm’s reachable workspace ball (around $t = 5$ s, when the base passes $(2.0, 1.6)$ ), the arm starts moving. Total task time drops to ~6.5 s at the cost of needing a coupled QP solver and a planner that can emit base+arm trajectories simultaneously. Used by Mobile ALOHA, Stretch, modern humanoid manipulation.

Example B — Whole-body QP control tick

Setup. Quadruped + 6-DOF arm (Spot + Spot Arm class). DOFs: 18 (12 leg + 6 arm) + 6 floating-base = 24 generalised coordinates, 18 actuated. Control rate: 500 Hz.

Tasks at this tick (top priority first):

Stance feet zero acceleration (4 legs × 3 = 12 contact constraint dimensions).
Friction-cone constraints on each stance foot: $\| \lambda_{xy} \| \leq 0.7 \lambda_{z}$ .
CoM tracks reference at $(\ddot{x}, \ddot{y}, \ddot{z})^{des} = (0, 0, 0)$ m/s² (stationary stance).
End-effector tracks reference: $\ddot{x}_{ee}^{des}$ from a 50 Hz outer Cartesian PD loop, current value $(0.3, 0, - 0.1)$ m/s² (reaching toward a doorknob).
Posture regulariser: $\ddot{q}_{a}^{des} = - k_{p} (q_{a} - q_{a}^{rest}) - k_{d} \dot{q}_{a}$ on each non-task joint, $k_{p} = 50$ , $k_{d} = 5$ .

Assemble QP. Decision variables: $(\ddot{q}, \tau, \lambda) \in R^{24 + 18 + 12} = R^{54}$ . Equality: dynamics ( $24$ rows) plus the stance-foot contact constraint ( $12$ rows). Inequality: friction cones ( $4 \times 4 = 16$ rows, linearised), torque limits ( $2 \times 18 = 36$ rows), contact normal force ≥ 0 ( $4$ rows).
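The row counts above follow mechanically from the robot's DOF budget. The sketch below reproduces the bookkeeping and shows one common four-row friction linearisation (a box pyramid, $|\lambda_{x}| \leq \mu \lambda_{z}$ and $|\lambda_{y}| \leq \mu \lambda_{z}$); that particular linearisation is an assumption, since the text only says "linearised", and an inscribed pyramid would use $\mu / \sqrt{2}$ instead.

```python
N_FLOAT, N_LEG, N_ARM, N_FEET = 6, 12, 6, 4

n_qdd = N_FLOAT + N_LEG + N_ARM        # generalised accelerations
n_tau = N_LEG + N_ARM                  # actuated joint torques
n_lambda = 3 * N_FEET                  # one 3-D contact force per stance foot
n_vars = n_qdd + n_tau + n_lambda

rows_dynamics = n_qdd                  # one equation of motion per generalised velocity
rows_contact = 3 * N_FEET              # zero stance-foot acceleration
rows_friction = 4 * N_FEET             # four pyramid faces per foot
rows_torque = 2 * n_tau                # lower and upper bound per actuator
rows_normal = N_FEET                   # lambda_z >= 0 per foot
n_ineq = rows_friction + rows_torque + rows_normal

def friction_pyramid_rows(mu):
    """Rows a with a . (lx, ly, lz) <= 0, encoding |lx| <= mu*lz and |ly| <= mu*lz."""
    return [
        ( 1.0,  0.0, -mu),
        (-1.0,  0.0, -mu),
        ( 0.0,  1.0, -mu),
        ( 0.0, -1.0, -mu),
    ]

def inside_pyramid(force, mu):
    """True if a contact force satisfies all four linearised friction rows."""
    return all(a[0] * force[0] + a[1] * force[1] + a[2] * force[2] <= 0.0
               for a in friction_pyramid_rows(mu))
```

For example, `inside_pyramid((0.5, 0.0, 1.0), 0.7)` holds while `inside_pyramid((0.8, 0.0, 1.0), 0.7)` does not, because the tangential force exceeds 0.7 times the normal force.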

Solve. ProxQP on this problem on a modern Ryzen 7 6800U: ~0.3 ms warm-started, ~0.8 ms cold. Well within the 2 ms budget at 500 Hz.

Output. Joint torques $τ$ go straight to the motor drivers. The legs sit firm, the arm reaches forward, and the base posture stays level even as the arm pulls forward on the body.

Example C — Coordinated drawer-opening

Setup. Holonomic mobile base (mecanum, KUKA omniMove or RB-Vogui) + 7-DOF arm grasping a drawer handle that opens along $+ x_{world}$ . The drawer travel is 0.5 m, but the usable arm stroke (reach minus the safe operating envelope) is only 0.4 m, so the base must move during the open.

Constraint formulation. Once the gripper has the handle, the end-effector velocity in the world frame is constrained to $\dot{x}_{ee} = v_{open}$ , $\dot{y}_{ee} = \dot{z}_{ee} = 0$ , with $v_{open}$ the desired drawer-pull rate (~0.05 m/s, conservative). The arm wrist orientation is locked to match the drawer face.

Two ways to split it. With the redundancy of (3 base + 7 arm) = 10 DOFs against a 6-DOF task, 4 DOFs are free. Choose:

Arm leads, base trails. Arm starts the pull; the base follows at $v_{x, b} = v_{open}$ once the elbow approaches a singular configuration (Yoshikawa manipulability $w < w_{min}$ ). Equivalent to a hand-off; simple but jerky.
Coordinated weighted velocity. Distribute $v_{open}$ between base and arm in proportion to a stiffness ratio: $v_{x, b} = α \cdot v_{open}$ , $v_{x, arm rel base} = (1 - α) \cdot v_{open}$ . Tuning $α$ continuously based on manipulability gives a smooth pull from start to finish.

A typical whole-body QP just writes the constraint $J_{full} \dot{q} = [v_{open}, 0, 0, 0, 0, 0]^{\top}$ and lets the regulariser settle the split. With a null-space cost that penalises arm motion more heavily than base motion ( $w_{a} > w_{b}$ ), the base ends up doing the bulk of the displacement, which is also the desired behaviour, since arm joint motion costs more energy than wheel motion on most platforms.
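How the regulariser settles the split is easiest to see on the pull axis alone. Collapsing the problem to two scalars (a simplifying assumption), minimise $w_{b} v_{b}^{2} + w_{a} v_{a}^{2}$ subject to $v_{b} + v_{a} = v_{open}$; the stationarity condition $w_{b} v_{b} = w_{a} v_{a}$ gives $\alpha = w_{a} / (w_{a} + w_{b})$. The weights below are hypothetical.

```python
def split_open_velocity(v_open, w_base, w_arm):
    """Closed-form minimiser of w_base*v_b^2 + w_arm*v_a^2 subject to v_b + v_a = v_open."""
    alpha = w_arm / (w_arm + w_base)     # share of the pull taken by the base
    v_base = alpha * v_open
    v_arm = (1.0 - alpha) * v_open       # arm velocity relative to the base
    return v_base, v_arm, alpha

# Hypothetical weights: arm motion penalised four times as heavily as base motion.
v_base, v_arm, alpha = split_open_velocity(v_open=0.05, w_base=1.0, w_arm=4.0)
```

With these weights the base takes 80 % of the 0.05 m/s pull; scheduling the weights on manipulability would reproduce the continuously tuned $\alpha$ described above.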

4. Architectures

Five architectural patterns dominate the 2026 production and research landscape:

Architecture	Coupling	Typical stack	Used by
Decoupled state machine	None — switch by phase	`Nav2` + `MoveIt 2` + behaviour tree	Most warehouse / hospital service robots (Fetch, MiR+UR, TIAGo default)
Loosely-coupled supervisor	Sequential with hand-offs	TAMP planner emits navigation + manipulation segments	OK-Robot, HomeRobot baseline
Tightly-coupled whole-body	Single optimisation	Pinocchio + ProxQP / TSID / ocs2 + Crocoddyl	Atlas, Spot+Arm, ANYmal+Z1, Apollo, Optimus
End-to-end learned	Implicit	VLA models (RT-2, π_0, OpenVLA), diffusion policies	Mobile ALOHA, BC-Z, RT-2-X, Physical Intelligence π_0
Hybrid: model + residual policy	Model-based core, learned correction	WBC + learned correction net	Margolis 2024 (quadruped), Cheng 2024 (humanoid manipulation)

The decoupled stack is overwhelmingly the most-deployed in 2026 because the vast majority of mobile-manipulation tasks tolerate a 0.5–2 s base-arm hand-off pause. The whole-body stack appears whenever the task requires base-arm simultaneity (dishwasher unloading, drawer pulling, doorway traversal with payload), and is mandatory on legged platforms where stance management itself is a whole-body problem before the arm even moves.

End-to-end learned controllers gained ground rapidly in 2023–2026: Mobile ALOHA (Fu-Zhao-Finn 2024) showed bimanual cooking on a $30 k cart-mounted setup using behaviour cloning from 50 human demonstrations per task. RT-2 (Brohan et al 2023) and OpenVLA (Kim 2024) demonstrated language-conditioned manipulation generalising across embodiments via Open X-Embodiment data. The 2026 frontier: π_0 (Physical Intelligence 2024) and Helix (Figure 2025), pretrained vision-language-action models running on humanoid platforms at 30–200 Hz.

5. Key platforms

Platform	Base	Arm(s)	Reach	Payload	Released
PR2 (Willow Garage)	Holonomic 4-caster omni	2× 7-DOF	0.9 m	1.8 kg/arm	2010 (legacy)
Fetch / Freight (now Zebra)	Diff + caster	1× 7-DOF	0.92 m	6 kg	2015
TIAGo (PAL Robotics)	Diff	1× 7-DOF + torso lift	0.87 m	3 kg	2016, ongoing
HSR (Toyota Human Support)	Cylindrical omni	1× 5-DOF + telescoping	0.6 m	0.5 kg	2017
Stretch RE1/RE2/RE3 (Hello Robot)	Diff	1-DOF arm + 1-DOF wrist + telescoping mast	~0.5 m	1.5 kg	2020, RE3 2024
Spot + Spot Arm (Boston Dynamics)	Quadruped	1× 6-DOF	0.99 m	11 kg static / 4 kg dynamic	Arm released 2021
BD Stretch (commercial)	Wheeled omni	1× 7-DOF + vacuum	~2 m (extended)	23 kg	2022
Husky + UR5 (Clearpath)	Skid-steer 4WD	1× 6-DOF cobot	0.85 m	5 kg	Modular
RB-Vogui XL (Robotnik)	Swerve	1× or 2× cobot (UR, Franka, iiwa)	varies	50 kg cart + 5 kg/arm	2022
Mobile ALOHA (Stanford 2024)	Wheeled cart (manual)	2× ViperX-300 6-DOF + 2× WidowX-250 leader	0.75 m/arm	0.75 kg/arm	2024
ANYmal + Z1	Quadruped	1× 6-DOF (Unitree Z1)	0.74 m	2 kg	2023+
Digit (Agility Robotics)	Biped	2× 4-DOF	0.8 m	16 kg (lift)	V4 2024, V5 2025
Figure 02	Biped	2× 7-DOF	~0.8 m	25 kg total	2024
Apptronik Apollo	Biped	2× 7-DOF	~0.85 m	25 kg total	2024
1X Neo Beta / Gamma	Biped	2× 7-DOF (tendon-driven)	~0.7 m	20 kg total	2024–25
Tesla Optimus Gen 2 / Gen 3	Biped	2× 7-DOF + 11-DOF hand	~0.8 m	~9 kg/arm	2024, Gen 3 announced
Unitree H1 / G1	Biped	2× 4-DOF (H1) / 7-DOF (G1)	varies	~3 kg/arm	2023–24
BD Atlas Electric	Biped	2× 7-DOF	~0.9 m	~25 kg	2024

The 2026 humanoid wave converges on a remarkably similar specification: 1.5–1.8 m tall, 50–90 kg mass, 28–40 actuated DOFs total, 2 × 7-DOF arms with anthropomorphic hands, batteries supporting 1–5 h walking. Where they differ is in the actuator philosophy — Optimus uses planetary-geared frameless servos with cross-roller bearings; Apollo uses HEBI-style modular smart joints; Figure uses custom strain-wave drives; 1X uses tendon-driven Series Elastic Actuators (SEA) for soft contact behaviour.

6. Tasks + benchmarks

Benchmark	Scope	Year	Notable
DARPA Robotics Challenge	Disaster response: drive vehicle, open door, valve, drill, walk over rubble, climb stairs	2015	KAIST DRC-Hubo win
RoboCup@Home	Domestic service (greeting, dishes, groceries)	Annual since 2006	RoboCup@Home OPL / DSPL
NIST ARM	Manipulation primitives (peg-in-hole, knot-tying, assembly)	Ongoing	NIST Robotic Grasping benchmarks
HomeRobot OVMM (Meta + Stanford 2023)	Open-vocabulary mobile manipulation in 60 photo-real homes	2023	Habitat-Sim 3.0
Behavior-1K (Stanford 2024)	1000 household tasks; iGibson 2.0 / OmniGibson sim	2024	Largest household task suite
OK-Robot (NYU + Meta 2024)	Zero-shot pick-and-place in unseen homes	2024	Stretch RE2 + LLM
Open X-Embodiment (Google + 33 labs 2023)	Cross-platform manipulation dataset, 1M+ episodes, 22 embodiments	2023	RT-X model release
RT-2 / RT-X / π_0 evaluations	Real-world VLA performance	2023–24	Generalisation across robots
ManiSkill 3 (UCSD 2024)	Sim2real benchmark, 100+ manipulation tasks	2024	Built on SAPIEN
Habitat 3.0 (Meta 2024)	Photorealistic human-robot social nav + manip	2024	Used for Spot RL training

The shift from individual-task benchmarks (peg-in-hole) to fleet-scale household evaluation (Behavior-1K, OVMM) and cross-platform foundation models (Open X-Embodiment, RT-X) is the defining 2023–2026 trend. Productionisation is starting to track those numbers: 1X reports its Neo policies trained on Open X data; Figure 02 ships with Helix VLA.

7. Software stacks

Stack	Layer	Maintained by	Notes
ROS 2 Jazzy / Kilted	Middleware	Open Robotics / OSRF	Default 2025–26 mobile-manip middleware
Nav2	Navigation (planning + control + recovery)	Open Navigation	Bound to ROS 2
MoveIt 2	Arm motion planning (OMPL, CHOMP, STOMP, Pilz)	PickNik Robotics	Bound to ROS 2
Pinocchio	Rigid-body dynamics (RNEA, ABA, Jacobians, derivatives)	INRIA / LAAS	C++ + Python; ~10 μs per 30-DOF eval
Crocoddyl	Optimal control / DDP / iLQR for whole-body MPC	LAAS-CNRS	Used in TALOS, Solo, ANYmal
TSID	Task-Space Inverse Dynamics QP (whole-body controller)	LAAS-CNRS	Used in TALOS
ocs2	Optimal-control C++ framework (SQP/SLQ/DDP)	ETH RSL	Used in ANYmal locomotion + manipulation
ProxQP / OSQP / HPIPM	QP solvers	INRIA / Oxford + Stanford / Freiburg	Solver layer underneath whole-body QP stacks
Drake	Multi-body + optimisation + sim	Toyota Research / MIT	Used in Mobile ALOHA, TRI
MuJoCo MJX	GPU-accelerated dynamics simulator	DeepMind (open-source 2021+)	Default RL sim
Isaac Sim + Isaac Manipulator / Isaac Lab	NVIDIA stack — high-fidelity sim, RL training	NVIDIA	GPU-parallel, used by Cheetah robotics, Boston Dynamics for some RL
Habitat 3.0	Photorealistic indoor sim, embodied AI	Meta FAIR	Used for home robot RL
Open X-Embodiment / RT-X	Cross-platform dataset + model	Google + 33 labs	Foundation model pretraining

A typical 2026 production mobile-manipulator integrator picks ROS 2 + Nav2 + MoveIt 2 + Pinocchio as the baseline, then layers in ocs2 or Crocoddyl if whole-body MPC is needed and Isaac Sim for RL-policy training. The split between “ROS-style decoupled” and “C++ whole-body real-time” is the canonical software architecture decision; both ends of the spectrum talk over a ROS 2 topic at 100–500 Hz at the seam.

8. Control challenges

Base disturbance during manipulation. Light bases (under ~30 kg) get pushed around by 5 kg cobot motion; arm jerk transmits to base sway, which feeds back into end-effector error. Mitigations: (a) heavier base / lower CoM, (b) suspension stiffening during manipulation, (c) reaction-null-space arm trajectories (Yoshida 1996), (d) feedforward base-thrust compensation (commanded $v_{b}$ that cancels the arm reaction).
TAMP — Task and Motion Planning. Selecting where to place the base relative to the task is a discrete-continuous optimisation. Plaks-Lozano-Kaelbling (2010), Garrett-Lozano-Pérez-Kaelbling (2020) frame it as a symbolic-planner outer loop with continuous IK feasibility inner solves. PDDLStream is the current research reference; OK-Robot uses an LLM as the symbolic planner.
Whole-body collision avoidance. Arm-to-base, arm-to-arm (bimanual), arm-to-environment, base-to-environment: all must be checked at servo rate. Signed-distance fields (Voxblox) and primitive-pair distance libraries (FCL, HPP-FCL), evaluated against environment + URDF, give 1–10 μs distance queries; the gradient feeds into the WBC null-space term.
State estimation under base motion. Visual or LiDAR SLAM running on the base must publish a pose update at 50–200 Hz that the arm controller treats as ground truth. Drift during a long task → arm misses its target by the drift amount. Invariant EKF (Barrau-Bonnabel 2017), legged-specific factor-graph estimators ([[Robotics/slam]]), and visual-inertial odometry are standard.
Real-time MPC on humanoid. Centroidal-dynamics MPC at 100–200 Hz for a 30+ DOF humanoid is solvable in 2026 on a single CPU thanks to ProxQP / HPIPM warm-starts and Pinocchio’s analytic derivatives. Reaching beyond 200 Hz on the full-body model is still a research effort.
Failure recovery. Dropped object, slipped grasp, collision — controllers must detect and re-plan within ~50 ms. Modern stacks layer: low-level safety stop (joint torque or F/T threshold) → mid-level recovery behaviour (retract arm, re-localise) → high-level re-plan (TAMP).
Battery + compute budget. A mobile manipulator that runs whole-body MPC on a 12-core CPU plus a 200 W NVIDIA GPU for perception draws ~250 W from compute alone; locomotion adds 100–500 W on a humanoid; arm motion 50–200 W. A 1 kWh battery yields 1–3 h continuous operation. Cycle-time-aware scheduling that idles the GPU when not perceiving and lowers MPC rate during decoupled phases is now a standard optimisation.
Mismatched control rates. Base at 50 Hz (Nav2 controller), arm at 1 kHz (MoveIt FollowJointTrajectory), low-level torque at 4 kHz (drive firmware) — passing whole-body QP commands across this hierarchy without losing real-time guarantees requires careful ROS 2 QoS settings (SensorDataQoS, BEST_EFFORT) and often a separate Cyclone DDS configuration.
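The battery figures in the list above can be sanity-checked with a simple power sum. A minimal sketch, assuming constant draw and a fully usable 1 kWh pack (both optimistic), with `runtime_hours` as a hypothetical helper name.

```python
def runtime_hours(battery_wh, compute_w, locomotion_w, arm_w):
    """Continuous runtime in hours for a constant total power draw."""
    return battery_wh / (compute_w + locomotion_w + arm_w)

# Light duty: low end of the locomotion and arm ranges quoted above.
best_case = runtime_hours(battery_wh=1000.0, compute_w=250.0, locomotion_w=100.0, arm_w=50.0)
# Heavy duty: high end of both ranges.
worst_case = runtime_hours(battery_wh=1000.0, compute_w=250.0, locomotion_w=500.0, arm_w=200.0)
```

The two cases bracket roughly 1 to 2.5 h, in line with the quoted 1–3 h range; compute is a fixed 250 W in both, which is why idling the GPU matters most at light duty.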

9. Edge cases + failure modes

Object slip during base motion. Pulling a heavy box on a diff-drive base with low CoM; cornering accelerations exceed grip on the gripper. Detect via tactile/force sensors; cap base lateral acceleration. Stretch caps cornering at 0.5 m/s² with a payload.
Cable management. An arm cable harness routed externally can snag on the base, the load, or doorways. Internal routing or strain-relief loops at every joint. Spot+Arm carries the arm wiring through a slip-ring at the base joint.
Whole-body collision during transition. As the arm folds for transport, it can clip the base or another platform. URDF self-collision pairs must include base ↔ arm; transitional poses should be verified in sim.
Localisation drift on long task. SLAM drift accumulates with distance travelled: for example, 1 % drift over 3 m of base travel during a 5-minute task leaves a 3 cm end-effector offset by task end. Fix: reset localisation at known fiducials (AprilTags, ChArUco) at each base-stop, or feed task-completion residuals back into the SLAM estimator.
Base vibration transmits to end-effector. Especially on tracked platforms or unsuspended wheeled bases; gripper position oscillates at ~5–20 Hz. Fix: notch filter in the Cartesian controller, or stiffen the torso, or add a wrist isolator.
Tip-over. Arm fully extended laterally + heavy payload + sudden base brake → CoG exits the support polygon. Enforce a real-time tip-over margin constraint: $\mathrm{dist}(\text{CoM projection}, \text{support polygon edge}) \geq d_{min}$ (typically 5–10 cm). Stretch + Spot have hardware tip-over interlocks.
Mismatched arm and base control rates. Base controller at 50 Hz × arm at 1 kHz; if the base pose is stale when it reaches the arm controller, the arm Jacobian uses a wrong base transform → end-effector tracks the wrong target. Fix: timestamp every transform; reject transforms older than ~20 ms.
Stuck in clutter. Wheeled base in narrow aisle with arm extended cannot back out without arm collision. Recovery behaviours: tuck arm to home before unstuck attempt, then re-deploy.
Door opening + push/pull coordination. Pulling a door inward: gripper applies force, door arcs about its hinge, robot must reverse to follow the door arc. Pushing outward: robot drives forward as gripper pushes. Specialised controllers (Niemeyer-Slotine 1989 impedance + Karayiannidis 2013 hinge identification) measure the door axis at runtime.
Stairs and obstacle traversal with manipulation gear. Legged platforms (Spot, Atlas) can climb stairs while carrying payload, but arm motion during step-up can disturb balance — most production stacks force the arm to a “carry” pose during stair sequences and unfreeze only on flat ground.
Battery sag mid-task. Voltage drops during high-power moves (driving + arm acceleration simultaneously) cause joint torque limits to shrink. Whole-body QP must respect a time-varying $\tau_{max}$ from the BMS.
Floor-coupling and slip. Wet floors, wax, casters on grout lines — base odometry diverges from true pose during arm interaction. Visual or LiDAR-anchored fusion mandatory in real deployments.
Soft floors and rugs. Diff-drive wheels sink unevenly into carpet, changing effective wheel radius by ~5 %; odometry skew → end-effector misses target by centimetres. Fix: trust visual-pose estimate over wheel odometry near manipulation targets.
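The timestamp-and-reject fix for mismatched control rates in the list above can be sketched as a small guard in front of the arm controller. The data class, field names and the choice to also reject future-dated stamps are illustrative assumptions; the 20 ms threshold comes from the text and corresponds to one period of a 50 Hz base loop.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AGE_S = 0.020   # ~20 ms, one 50 Hz base-controller period; tune per platform

@dataclass
class StampedBasePose:
    stamp_s: float   # time at which the pose was measured, in seconds
    x: float
    y: float
    yaw: float

def fresh_base_pose(pose: Optional[StampedBasePose], now_s: float,
                    max_age_s: float = MAX_AGE_S) -> Optional[StampedBasePose]:
    """Return the pose if it is recent enough for the arm controller, else None."""
    if pose is None:
        return None
    age = now_s - pose.stamp_s
    # Reject both stale poses and poses stamped in the future (clock mismatch).
    if age < 0.0 or age > max_age_s:
        return None
    return pose

pose = StampedBasePose(stamp_s=1.000, x=2.55, y=2.0, yaw=0.0)
accepted = fresh_base_pose(pose, now_s=1.010)   # 10 ms old
rejected = fresh_base_pose(pose, now_s=1.030)   # 30 ms old
```

What the arm controller does on a `None` (hold the last command, decelerate, or trigger a safety stop) is a platform decision this sketch leaves open.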

10. Cross-references

manipulator-design — arm topology, payload sizing, harmonic-drive vs cycloidal, cobot vs industrial.
mobile-base-wheeled — diff-drive, mecanum, swerve kinematics; the base half of the combined Jacobian.
legged-robotics — floating-base dynamics + contact-mode-switching; the base layer of every humanoid mobile manipulator.
path-planning — Dubins/Reeds-Shepp for the base, RRT-Connect/CHOMP/STOMP for the arm, TAMP for the combined task.
slam — pose estimation that the WBC consumes as ground truth.
end-effectors — grippers, suction cups, magnetic tools; the business end of mobile manipulation.
impedance-control — Cartesian impedance + admittance; the standard contact-control layer above the WBC.
state-space-lqr — LQR / MPC formulations that the whole-body optimisation reduces to.
computer-vision-robotics — pose estimation of target objects; the perception input to the manipulation task.
rl-for-control — RL and imitation policies for end-to-end mobile manipulation (RT-2, π_0, Mobile ALOHA, Helix).
humanoid-balance — planned: bipedal balance during manipulation.
dynamics-rigid-body — RNEA, ABA; the dynamics primitives Pinocchio implements.
microcontrollers — power, harness, thermal integration of a mobile manipulator.
robotics-control — controller languages (URScript, KRL, RAPID) on the arm side.

11. Citations

Khatib, O. “A Unified Approach for Motion and Force Control of Robot Manipulators: The Operational Space Formulation.” IEEE J. Robotics and Automation, 3(1):43–53, 1987. The foundational paper for task-space control on which whole-body controllers are built.
Sentis, L. & Khatib, O. “Synthesis of Whole-Body Behaviors through Hierarchical Control of Behavioral Primitives.” International Journal of Humanoid Robotics, 2(4):505–518, 2005. The canonical formulation of hierarchical WBC.
Featherstone, R. Rigid Body Dynamics Algorithms, Springer, 2008. ISBN 978-0-387-74314-1. RNEA + ABA, the engine of every WBC stack.
Mistry, M., Buchli, J. & Schaal, S. “Inverse Dynamics Control of Floating Base Systems Using Orthogonal Decomposition.” Proc. IEEE ICRA, 2010. Floating-base inverse dynamics that handles base under-actuation cleanly.
Hutter, M., Gehring, C., Jud, D., Lauber, A., Bellicoso, C.D., Tsounis, V. et al. “ANYmal — A Highly Mobile and Dynamic Quadrupedal Robot.” Proc. IEEE/RSJ IROS, 2016. ANYmal hardware + WBC.
Bouyarmane, K. & Kheddar, A. “Humanoid Robot Locomotion and Manipulation Step Planning.” Advanced Robotics, 26(10):1099–1126, 2012. Combined locomotion + manipulation TAMP.
Garrett, C., Chitnis, R., Holladay, R., Kim, B., Silver, T., Kaelbling, L.P. & Lozano-Pérez, T. “Integrated Task and Motion Planning.” Annual Review of Control, Robotics, and Autonomous Systems, 4:265–293, 2021. TAMP review.
Driess, D., Xia, F., Sajjadi, M., Lynch, C., Chowdhery, A., Ichter, B. et al. “PaLM-E: An Embodied Multimodal Language Model.” Proc. ICML, 2023. arXiv:2303.03378.
Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Chen, X., Choromanski, K. et al. “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.” Proc. CoRL, 2023. arXiv:2307.15818.
Open X-Embodiment Collaboration et al. “Open X-Embodiment: Robotic Learning Datasets and RT-X Models.” Proc. ICRA, 2024. arXiv:2310.08864.
Zhao, T.Z., Kumar, V., Levine, S. & Finn, C. “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.” Proc. RSS, 2023 (ALOHA / ACT).
Fu, Z., Zhao, T.Z. & Finn, C. “Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation.” Proc. CoRL, 2024. arXiv:2401.02117.
Liu, P., Orru, Y., Vakil, J., Paxton, C., Shafiullah, N.M.M. & Pinto, L. “OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics.” NYU / Meta AI, 2024. arXiv:2401.12202.
Yoshida, K., Hashizume, K. & Abiko, S. “Zero Reaction Maneuver: Flight Validation with ETS-VII Space Robot and Extension to Kinematically Redundant Arm.” Proc. IEEE ICRA, 2001. Reaction-null space; relevant to base-disturbance management.
Nakamura, Y., Hanafusa, H. & Yoshikawa, T. “Task-Priority Based Redundancy Control of Robot Manipulators.” International Journal of Robotics Research, 6(2):3–15, 1987. Hierarchical IK.
Siciliano, B. & Slotine, J.-J. “A General Framework for Managing Multiple Tasks in Highly Redundant Robotic Systems.” Proc. ICAR, 1991. The canonical hierarchical-task-priority formulation.
Yoshikawa, T. “Manipulability of Robotic Mechanisms.” International Journal of Robotics Research, 4(2):3–9, 1985. The manipulability measure used as a null-space objective.
Carpentier, J., Saurel, G., Buondonno, G., Mirabel, J., Lamiraux, F., Stasse, O. & Mansard, N. “The Pinocchio C++ library — A Fast and Flexible Implementation of Rigid Body Dynamics Algorithms and their Analytical Derivatives.” Proc. SII, 2019.
Bambade, A., El-Kazdadi, S., Taylor, A. & Carpentier, J. “PROX-QP: Yet Another Quadratic Programming Solver for Robotics and Beyond.” Proc. RSS, 2022.
Frison, G. & Diehl, M. “HPIPM: a high-performance quadratic programming framework for model predictive control.” Proc. IFAC World Congress, 2020.
Di Carlo, J., Wensing, P.M., Katz, B., Bledt, G. & Kim, S. “Dynamic Locomotion in the MIT Cheetah 3 Through Convex Model-Predictive Control.” Proc. IEEE/RSJ IROS, 2018. The convex-MPC formulation widely adopted on quadruped + arm.
Boston Dynamics — Stretch and Spot Arm product documentation, 2022–2025. https://www.bostondynamics.com
Hello Robot — Stretch RE2/RE3 specification and user guide, 2022–2024. https://hello-robot.com
PAL Robotics — TIAGo product reference, 2024. https://pal-robotics.com
Robotnik — RB-Vogui XL product reference, 2024. https://robotnik.eu
Puig, X. et al. “Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots.” Proc. ICLR, 2024. arXiv:2310.13724.
Li, C., Zhang, R., Wong, J., Gokmen, C., Srivastava, S., Martín-Martín, R. et al. “BEHAVIOR-1K: A Benchmark for Embodied AI with 1000 Everyday Activities and Realistic Simulation.” Proc. CoRL, 2023.
Black, K., Brown, N., Driess, D. et al. “π_0: A Vision-Language-Action Flow Model for General Robot Control.” Physical Intelligence, 2024. arXiv:2410.24164.
Figure AI — “Helix” technical post, 2025. https://www.figure.ai
Margolis, G., Yang, G., Paigwar, K., Chen, T. & Agrawal, P. “Rapid Locomotion via Reinforcement Learning.” Int. J. Robotics Research, 43(4), 2024.

