Sim-to-Real — Robotics Simulators, Domain Randomization, Transfer

Training robot policies in simulation and deploying on physical hardware. The simulator zoo (MuJoCo, Isaac Sim, Gazebo, Drake, Genesis, PyBullet, CoppeliaSim, Webots), the physics-engine choices that determine what transfers, the domain-randomization recipes that close the reality gap, the system-identification techniques that shrink it from the other end. Sim-to-real is the single most important capability that made the 2018–2026 wave of learned robot policies (legged locomotion, dexterous manipulation, humanoid control) economically viable; without it each robot would need millions of physical interactions.

1. At a glance

A robotics simulator is a software environment that approximates the physics, sensors, and actuators of a real robot well enough that a controller (hand-written or learned) trained or tested in it transfers to hardware without catastrophic failure. “Well enough” is the entire problem: physics engines simplify contact, friction, deformation, and aerodynamics in different ways, and the gap between simulated and real behaviour — the reality gap — destroys naive transfer.

The strategy that works in 2026 has four pillars:

  1. Pick the right physics engine for the task. Rigid-body contact-rich (manipulation, locomotion) → MuJoCo or Isaac Sim. Soft-body / deformable → SOFA, FleX, Genesis. Aerodynamics / multirotor → AirSim / Flightmare / Aerial Gym / RotorPy. Underwater → Stonefish or HoloOcean. Massively parallel RL → Isaac Lab (GPU-resident, 1k–100k parallel envs) or Brax.
  2. Randomize what you don’t know. Domain Randomization (Tobin 2017, Peng 2018) varies friction, mass, sensor noise, latency, motor saturation, lighting, textures during training. The policy learns the family of dynamics, not the specific one.
  3. System-identify what you can. Measure joint friction, motor torque constants, link masses, latency offline. Tighten the simulation to match. The smaller the gap, the less randomization you need.
  4. Use real data where it matters. Real2sim2real (Chebotar 2019), residual policy learning, online adaptation (RMA — Rapid Motor Adaptation, Kumar 2021). Don’t make sim do the impossible.

This stack is what moved Boston Dynamics’ Spot onto production RL controllers (2024), got Cassie through a 5 km run (Crowley 2022), trained OpenAI’s dexterous Rubik’s Cube hand (2019), and now underpins the humanoid RL policies shown by Figure, 1X, Apptronik, and Unitree.

Where this sits. RL uses the simulator as its environment; rigid-body dynamics is what the physics engine integrates; calibration is the manual sysid pre-step; vision uses the renderer for synthetic training data; ROS 2 integrates via ros2_control / gz_ros2_control / Isaac ROS so the same controller can swap between sim and real; legged and humanoid policies are the largest current consumer.

First ask:

  • What’s the contact problem? If you have rich contact (grasping, walking, sliding), you need a hard-constraint contact solver (MuJoCo, Drake) or a softened one used with care (PhysX and Bullet are softer than reality unless tuned).
  • Single environment or 10k parallel? Single → MuJoCo / Drake / Gazebo. 10k+ → Isaac Lab / Brax / Genesis (GPU-resident).
  • What needs to render photorealistically? Vision in the loop → Isaac Sim (PhysX + Omniverse RTX), Unreal-based AirSim, Habitat 3.0 (indoor), CARLA (urban driving).
  • Are you doing soft-body? Genesis (2024), SOFA, MuJoCo MJX with FEM (2024+), FleX. Standard rigid-body engines don’t help.

2. First principles

2.1 The reality gap

The reality gap decomposes into:

  • Modeling gap. Physics simplifications: rigid bodies instead of compliant, ideal motors instead of torque-ripple-and-cogging, ideal sensors instead of biased + delayed + noisy.
  • Parameter gap. Even with the right model: mass, friction, inertia, compliance values differ from reality. Friction in particular is notoriously hard to identify and varies with surface condition.
  • Perception gap. Synthetic images differ from real (lighting, sensor noise, lens distortion, motion blur, depth-camera artifacts).
  • Latency gap. Sim runs at perfect timing; real systems have variable network/bus delays.
  • Compliance gap. Real joints flex; harmonic drives have backlash + lost motion; cables stretch; gears wear.

2.2 Domain randomization (DR)

Train across many randomly-parameterized environments so the learned policy generalizes to some unseen real environment. The key idea (Tobin 2017): “if the model sees enough variation in simulation, the real world will look like just another variation.”

Standard randomized quantities for locomotion / manipulation:

  • Link masses: ±20-30% multiplicative
  • Inertia tensors: ±20-30%
  • Joint friction (Coulomb + viscous): 0.5–2× nominal
  • Joint damping: 0.5–2× nominal
  • Motor torque limit: 0.7–1.3× nominal
  • Motor PD gains: 0.7–1.3× nominal
  • Actuator latency: 0–30 ms uniform
  • IMU bias / noise: ±0.05 m/s² accel, ±0.01 rad/s gyro
  • Ground friction coefficient: 0.4–1.2
  • Ground restitution: 0–0.3
  • External pushes: ±20 N every 5 s
  • Terrain height field: ±2–5 cm noise

Visual DR (Tobin 2017, more aggressive than physics DR):

  • Object textures: random from 1000+ texture library
  • Lighting: 1–4 point lights, random position + intensity + colour
  • Camera position: ±5 cm position, ±2° orientation jitter
  • Distractor objects: 0–8 random shapes in background
  • Image noise: Gaussian + speckle

2.3 Automatic domain randomization (ADR)

OpenAI 2019 (Rubik’s Cube) introduced Automatic Domain Randomization: the randomization range expands automatically when the policy succeeds and shrinks when it fails. This avoids the manual tuning problem (too wide → slow training; too narrow → bad transfer). A related approach is Robust Adversarial Reinforcement Learning (RARL, Pinto 2017), where an adversary perturbs the dynamics.
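
A minimal sketch of the expand/shrink rule for one randomized parameter. It is a simplification, not OpenAI's implementation: both boundaries move together, and the names (`AdrRange`, `adr_update`), the step size, and the 0.5 / 0.3 thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdrRange:
    """Sampling interval for one randomized parameter (hypothetical helper)."""
    low: float
    high: float
    step: float             # how far each boundary moves per update (assumed)
    min_width: float = 0.0  # never shrink below this width


def adr_update(r: AdrRange, success_rate: float,
               expand_above: float = 0.5, shrink_below: float = 0.3) -> AdrRange:
    """Widen the range when the policy succeeds, narrow it when it fails."""
    if success_rate > expand_above:
        return AdrRange(r.low - r.step, r.high + r.step, r.step, r.min_width)
    if success_rate < shrink_below:
        low, high = r.low + r.step, r.high - r.step
        if high - low < r.min_width:
            return r  # already at the minimum width; leave unchanged
        return AdrRange(low, high, r.step, r.min_width)
    return r  # between thresholds: keep the current range


friction = AdrRange(low=0.6, high=0.8, step=0.05)
friction = adr_update(friction, success_rate=0.8)  # policy doing well: widen
print(friction)
```

In practice the range would also be clamped to physically meaningful values (for example, friction must stay positive).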

2.4 System identification

Inverse problem: find physics parameters θ that minimize ||sim(θ) − real||. Two flavors:

Offline sysid: collect real trajectories of known controls; fit dynamics parameters (mass, inertia, friction, motor constants) by gradient descent or evolutionary search. Tools: Pinocchio Identification, Drake SystemIdentification, custom MuJoCo + autograd / JAX.

Online sysid (adaptive control): estimate parameters live during operation. RLS (recursive least squares), L1 adaptive, MIAC. Often combined with a fast inner loop that uses the estimate.
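
A toy offline-sysid sketch under strong assumptions: a single joint with known inertia, a constant commanded torque, and viscous damping as the only unknown. The "real" trajectory here is synthetic, and the golden-section search stands in for the gradient or evolutionary optimizers mentioned above. All names and values are illustrative.

```python
def simulate(damping, inertia=0.01, torque=0.2, dt=0.001, steps=500):
    """Joint velocity under constant torque with viscous damping (explicit Euler)."""
    vel, traj = 0.0, []
    for _ in range(steps):
        acc = (torque - damping * vel) / inertia
        vel += acc * dt
        traj.append(vel)
    return traj


def loss(damping, real_traj):
    """Squared trajectory error ||sim(theta) - real||^2 for theta = damping."""
    sim_traj = simulate(damping, steps=len(real_traj))
    return sum((s - r) ** 2 for s, r in zip(sim_traj, real_traj))


def fit_damping(real_traj, lo=0.0, hi=1.0, iters=60):
    """Golden-section search for the damping value that minimizes the loss."""
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if loss(c, real_traj) < loss(d, real_traj):
            b = d
        else:
            a = c
    return (a + b) / 2


real = simulate(damping=0.05)  # stand-in for logged hardware data
print(fit_damping(real))       # recovers a value close to 0.05
```

With real logs the fit is never exact; the residual error is a useful guide to how wide the DR range around the fitted value should be.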

2.5 Rapid Motor Adaptation (RMA, Kumar 2021)

Train two components in sequence:

  1. Privileged teacher: the policy receives an encoding z of the ground-truth dynamics parameters (friction, mass, etc.) as an extra input. Trained with PPO + DR.
  2. Adaptation module: predicts a latent z̃ from the recent history (last 50 timesteps of state-action). Trained by regression to the teacher’s z.

At deployment the policy uses z̃ instead of the true z. The adaptation module runs online without gradient steps: pure feedforward inference. RMA and the closely related teacher–student distillation (Lee 2020) were the key tricks that took Mini-Cheetah, A1, and ANYmal RL controllers from “works on flat ground” to “works on grass, gravel, sand, ice.”

2.6 Real2Sim2Real

Chebotar 2019, Allevato 2020. Iteratively:

  1. Roll out a policy on real hardware; log trajectories.
  2. In simulation, find dynamics parameters θ that best reproduce the real trajectories (sysid).
  3. Train a new policy with the updated simulator + (often) DR around θ.
  4. Deploy; collect new data; iterate.

Closes the loop between offline sysid and policy training.

2.7 Residual policy learning

Train a learned policy that adds a correction to an analytic / model-based controller. The base controller handles 90% of the behavior; the learned residual closes the gap. Used in ANYmal navigation policy (Bjelonic 2022), in surgical-robot deployments (Korkmaz 2023), and increasingly in mobile manipulation.
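
A minimal sketch of how a residual is typically composed with a base controller. The proportional law, the gains, and the limits are placeholders, and `learned_residual` stands for whatever trained network is used.

```python
def base_controller(q, q_target, kp=20.0):
    """Stand-in model-based controller: proportional law on joint error."""
    return kp * (q_target - q)


def residual_action(q, q_target, learned_residual,
                    residual_limit=2.0, torque_limit=30.0):
    """Base command plus a bounded learned correction, clipped to actuator limits."""
    base = base_controller(q, q_target)
    # Bound the correction so the base controller stays dominant.
    corr = learned_residual(q, q_target)
    corr = max(-residual_limit, min(residual_limit, corr))
    return max(-torque_limit, min(torque_limit, base + corr))


# A residual that asks for far too much is clipped to +2.0 before being added.
print(residual_action(0.0, 0.5, lambda q, qt: 100.0))  # 12.0
```

Bounding the residual is a design choice: it limits how far a mis-trained network can push the system away from the base controller's behavior.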

2.8 The contact-solver problem

The hardest physics for sim-to-real is contact, where discrete events (impacts, friction transitions) happen inside a continuous integration step. Four main approaches:

  • Linear Complementarity Problem (LCP) solvers (Stewart-Trinkle). Hard constraints, no inter-penetration, no “soft springiness.” Used by ODE (legacy), Vortex.
  • Soft-contact / penalty. Springs and dampers approximate contact. Easier to tune for stability, but adds artificial compliance. Used by Bullet, PhysX (default).
  • Convex contact. MuJoCo’s pyramidal friction cone + Newton solver. Stable, fast, but pyramidal-cone friction is a 4-sided approximation to the true Coulomb cone.
  • Hard contact with smoothing. Drake’s TAMSI (Transition-Aware Semi-Implicit method); MuJoCo MJX’s analytic gradient through contact.

The choice cascades to what transfers. PhysX-trained policies often learn to exploit soft-contact compliance that doesn’t exist in reality. MuJoCo-trained policies are tighter but slower per step (until MJX vectorized it on GPU in 2023).
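
To make the "artificial compliance" of penalty contact concrete, here is a small sketch: a point mass resting on a spring-damper ground sinks to a static depth of m·g/k, so a softer contact stiffness means more penetration. The stiffness and damping values are illustrative, not defaults of any engine.

```python
def settle_depth(mass, stiffness, damping, g=9.81, dt=1e-4, steps=20_000):
    """Let a point mass settle on a spring-damper ground; return penetration in m."""
    z, v = 0.0, 0.0  # height above the ground plane and vertical velocity
    for _ in range(steps):
        f = -mass * g
        if z < 0.0:  # the penalty force acts only while penetrating
            f += -stiffness * z - damping * v
        v += f / mass * dt  # semi-implicit Euler: update velocity first
        z += v * dt
    return -z


stiff = settle_depth(mass=1.0, stiffness=1e4, damping=200.0)
soft = settle_depth(mass=1.0, stiffness=1e3, damping=63.0)
print(stiff, soft)  # roughly 1 mm versus 1 cm of penetration
```

A policy trained against the softer ground can learn to rely on that centimetre of give, which is the kind of exploit that does not survive transfer.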

3. Practical math — DR ranges and sysid examples

3.1 Worked example: quadruped friction DR range

For a quadruped expected to operate on indoor concrete (μ ≈ 0.8), outdoor asphalt (μ ≈ 0.7), grass (μ ≈ 0.6), wet tile (μ ≈ 0.3):

  • Training distribution: U[0.3, 1.0] for ground friction.
  • Bottom of range covers wet tile worst case.
  • Top covers dry concrete + some margin.

After training, the policy should be robust across the sampled range. Degradation shows up out of distribution: wet tile on a downhill slope demands more traction than μ = 0.3 can supply.
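
A small sketch of the range choice, plus the textbook static-friction bound that explains the wet-tile-downhill case: a contact with coefficient μ cannot hold an incline steeper than atan(μ). The 0.2 top margin is an assumption chosen to reproduce the range above.

```python
import math


def dr_range(surface_mu, top_margin=0.2):
    """Cover the lowest expected friction and add margin above the highest."""
    lo = min(surface_mu.values())
    hi = max(surface_mu.values()) + top_margin
    return lo, hi


def max_holdable_slope_deg(mu):
    """Steepest incline a static contact can hold: tan(theta) = mu."""
    return math.degrees(math.atan(mu))


surfaces = {"concrete": 0.8, "asphalt": 0.7, "grass": 0.6, "wet tile": 0.3}
print(dr_range(surfaces))            # lower bound 0.3, upper bound about 1.0
print(max_holdable_slope_deg(0.3))   # about 16.7 degrees
```

Any wet-tile slope beyond that angle is outside what friction alone can hold, regardless of how well the policy was trained.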

3.2 Worked example: motor constant identification

Goal: identify motor torque constant K_t.

  1. Lock the joint output mechanically (kinematic constraint).
  2. Command a known current i.
  3. Measure resulting torque τ on a load cell.
  4. K_t = τ / i.

Typical result: K_t = 0.5 N·m/A ± 5% across motors in a batch. Use the per-motor value in sim if practical; otherwise use the batch mean + DR ±5%.
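
With several (current, torque) pairs instead of one, K_t can be fitted as a least-squares slope through the origin. A minimal sketch with made-up sample data:

```python
def fit_torque_constant(currents, torques):
    """Least-squares slope through the origin for tau = K_t * i."""
    num = sum(i * t for i, t in zip(currents, torques))
    den = sum(i * i for i in currents)
    if den == 0.0:
        raise ValueError("need at least one non-zero current sample")
    return num / den


# Hypothetical load-cell readings (A, N·m) from a locked-rotor test.
currents = [1.0, 2.0, 4.0]
torques = [0.5, 1.0, 2.0]
print(fit_torque_constant(currents, torques))  # 0.5
```

Sweeping several current levels also exposes saturation: if the higher-current points fall below the fitted line, the linear model no longer holds there.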

3.3 Worked example: latency calibration

Goal: identify the round-trip command-to-encoder delay.

  1. Command a step torque at t=0.
  2. Log the encoder velocity response.
  3. Find the onset of the velocity response (the first sample where velocity departs from zero). Subtract any known delay that is not transport latency (sensor low-pass filter, gear backlash take-up).
  4. Result: typically 1–5 ms for a real-time EtherCAT system, 10–30 ms for ROS over Ethernet.

Bake this into sim as a fixed offset, then add DR ±50% on top.
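
One common way to bake latency into a simulator is a FIFO on the action path, with the delay length resampled per episode. A sketch, assuming the delay is expressed in whole control ticks; the class and function names are illustrative.

```python
import random
from collections import deque


class DelayedActions:
    """FIFO that releases each action `delay_steps` control ticks after it was issued."""

    def __init__(self, delay_steps, initial_action=0.0):
        self.buf = deque([initial_action] * delay_steps)

    def step(self, action):
        self.buf.append(action)
        return self.buf.popleft()


def sample_delay_steps(nominal_ms, control_dt_ms, spread=0.5, rng=random):
    """Draw a per-episode latency of nominal ± spread and convert it to ticks."""
    latency_ms = rng.uniform(nominal_ms * (1 - spread), nominal_ms * (1 + spread))
    return round(latency_ms / control_dt_ms)


delay = DelayedActions(delay_steps=2)
print([delay.step(a) for a in [1, 2, 3, 4]])  # [0.0, 0.0, 1, 2]
```

The same buffer can be placed on the observation path to model sensor delay separately from actuation delay.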

3.4 GPU-resident training throughput

Isaac Lab on H100 (4096 environments × 1 kHz physics × shared policy network):

  • Step time: ~4 ms per batched step (all 4096 in parallel).
  • Throughput: ~1.0 M env-steps/s.
  • 1B-step training: ~17 min wall-clock.

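Versus single-threaded MuJoCo at ~20k steps/s, the GPU pipeline delivers about 50× the aggregate throughput. The gain comes from batching thousands of environments on one device, not from stepping any single environment faster. This is what made humanoid RL viable in 2023+.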
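
The arithmetic behind these figures, as a quick check. The inputs are the numbers quoted in this subsection; the variable names are illustrative.

```python
num_envs = 4096
batched_step_s = 4e-3        # wall-clock time for one batched step
cpu_steps_per_s = 20_000     # single-threaded MuJoCo figure quoted above

env_steps_per_s = num_envs / batched_step_s        # about 1.02 M env-steps/s
minutes_for_1b = 1e9 / env_steps_per_s / 60        # about 16.3 min
aggregate_speedup = env_steps_per_s / cpu_steps_per_s  # about 51x
per_env_steps_per_s = 1 / batched_step_s           # 250 steps/s per environment

print(env_steps_per_s, minutes_for_1b, aggregate_speedup, per_env_steps_per_s)
```

Each individual environment advances at only about 250 steps/s of wall-clock time; the speedup is entirely a batching effect.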

3.5 Common DR ranges (literature consensus)

mass_link_i        ~ U[0.7, 1.3] * nominal
inertia_link_i     ~ U[0.7, 1.3] * nominal
com_link_i         + N(0, 0.01 m) per axis
joint_friction     ~ U[0.5, 2.0] * nominal
joint_damping      ~ U[0.5, 2.0] * nominal
motor_kp           ~ U[0.7, 1.3] * nominal
motor_kd           ~ U[0.7, 1.3] * nominal
ground_friction    ~ U[0.3, 1.0]
ground_restitution ~ U[0.0, 0.3]
ext_force_push     ~ N(0, 10 N), every 5 s
encoder_noise      ~ N(0, 0.001 rad)
imu_accel_bias     ~ N(0, 0.05 m/s²)
imu_gyro_bias      ~ N(0, 0.01 rad/s)
action_latency     ~ U[0, 30 ms]
obs_latency        ~ U[0, 30 ms]
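
A sketch of drawing one episode's parameters from a subset of the ranges above. The dictionary keys are illustrative; mapping them onto a specific simulator's model fields is left out because it is engine-specific.

```python
import random


def sample_dr(rng):
    """Draw one episode's dynamics parameters (scales are relative to nominal)."""
    return {
        "mass_scale": rng.uniform(0.7, 1.3),
        "joint_friction_scale": rng.uniform(0.5, 2.0),
        "motor_kp_scale": rng.uniform(0.7, 1.3),
        "ground_friction": rng.uniform(0.3, 1.0),
        "ground_restitution": rng.uniform(0.0, 0.3),
        "com_offset_m": [rng.gauss(0.0, 0.01) for _ in range(3)],
        "imu_accel_bias": rng.gauss(0.0, 0.05),
        "action_latency_ms": rng.uniform(0.0, 30.0),
    }


rng = random.Random(0)  # seed so a failing episode can be reproduced
params = sample_dr(rng)
print(params["ground_friction"], params["action_latency_ms"])
```

Logging the sampled parameters alongside each episode's return makes it easy to see which corner of the distribution the policy fails in.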

4. Design heuristics

  • Sysid before DR. A 10% mass error eats DR budget you could spend on real uncertainty. Weigh your robot. Measure each motor. CAD-mass is wrong by ~10–30% routinely.
  • Latency is the killer. Sim-to-real for fast dynamics (legged, multirotor) is dominated by latency mismatch. Identify it offline and bake it in.
  • Episodes matter. DR helps within an episode-length window. If your real episode is 5 minutes but your sim trains on 5-second episodes, behaviors that drift over minutes (battery sag, motor heat) won’t be captured.
  • Watch for sim-only exploits. Common: policy “vibrates” the motors at the integrator’s Nyquist, exploiting numerical instabilities to produce non-physical thrust. Fix: action smoothness penalty + lower control rate.
  • Train at the deployment control rate. If your real system commands at 50 Hz, the simulated control loop should match (the physics can substep faster underneath). A higher control rate in sim looks better in metrics but produces policies that fail on hardware.
  • Render variety > render fidelity for vision policies. Tobin 2017: a bag of unrealistic textures generalizes better than one realistic texture.
  • Test transfer with the same firmware you deploy. Different filter cut-offs / different deadbands ≠ same robot. Hardware-in-the-loop sim is the cheapest catch.
  • Always have a model-based fallback. A learned controller failing is a black-box failure; you can’t tune it. Keep a model-based emergency-stop / posture-holder that takes over below a safety threshold.
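
The two fixes named for sim-only exploits, an action-rate clip and a smoothness penalty, can be sketched as below. The per-step limit and penalty weight are placeholders to be tuned per robot.

```python
def clip_action_rate(prev, new, max_delta):
    """Limit how far each action component may move in one control step."""
    return [p + max(-max_delta, min(max_delta, n - p)) for p, n in zip(prev, new)]


def smoothness_penalty(prev, new, weight=0.01):
    """Reward term that penalizes squared change between consecutive actions."""
    return -weight * sum((n - p) ** 2 for p, n in zip(prev, new))


prev = [0.0, 0.0]
raw = [1.0, -0.05]
print(clip_action_rate(prev, raw, max_delta=0.1))  # [0.1, -0.05]
print(smoothness_penalty(prev, raw))
```

The clip is a hard guarantee applied at deployment too, while the penalty only shapes training; using both is common.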

5. Components & sourcing — the simulator zoo

5.1 Physics-engine-focused

Simulator             | Engine                                  | License                           | Best for                     | Notes
MuJoCo                | Custom convex                           | Apache 2.0 (since 2022)           | Contact-rich, fast           | Acquired by DeepMind 2021, open-sourced 2022; MJX GPU port 2023
MuJoCo MJX            | MuJoCo on JAX                           | Apache 2.0                        | GPU RL                       | Vectorized; 1000s parallel envs
Isaac Sim + Isaac Lab | PhysX 5 + Omniverse                     | Nvidia EULA, free for research    | Photorealistic + RL at scale | Replaces Isaac Gym 2023; GPU-resident
Drake                 | TAMSI / custom                          | BSD-3                             | High-fidelity, certified     | MIT/TRI; differentiable; symbolic; manipulation focus
Bullet / PyBullet     | Bullet 3                                | Zlib                              | Quick prototyping            | Older, slower than MuJoCo for contact
Genesis               | Custom GPU + diff                       | Apache 2.0                        | Unified rigid + soft + fluid | 2024 release; combines MPM + FEM + rigid
Gazebo (Ignition)     | ODE / Bullet / Dart / Simbody plugins   | Apache 2.0                        | ROS 2 integration            | Open Robotics; gz-sim default in ROS 2 Humble+
CoppeliaSim (V-REP)   | Bullet / ODE / MuJoCo / Newton / Vortex | Free educational, paid commercial | Multi-engine, scriptable     | Coppelia Robotics
Webots                | ODE-derived                             | Apache 2.0                        | Education + multi-robot      | Cyberbotics
Brax                  | Custom JAX                              | Apache 2.0                        | GPU/TPU RL only              | DeepMind; no contact gradients pre-2023
DART                  | Featherstone + LCP                      | BSD-2                             | Open research                | Slower than MuJoCo
NVIDIA FleX           | Position-based dynamics                 | NVIDIA EULA                       | Soft + fluid                 | Legacy; mostly subsumed by Isaac Sim PhysX

5.2 Application-focused

Simulator            | Focus                           | License
AirSim               | Multirotor + cars on Unreal     | MIT (Microsoft, archived 2022)
Flightmare           | Quadrotor (RL focus)            | MIT
Aerial Gym (RotorPy) | Quadrotor learning at GPU scale | MIT
CARLA                | Self-driving urban              | MIT
NVIDIA DRIVE Sim     | AV; ties to Omniverse           | Nvidia
Cognata / Foretellix / Applied Intuition / Parallel Domain | AV simulation commercial | Commercial
HoloOcean            | Underwater                      | MIT
Stonefish            | Underwater + maritime           | GPL-3
Habitat 3.0          | Indoor navigation + interaction | MIT (Meta)
iGibson 2.0          | Indoor with realistic physics   | MIT (Stanford)
ManiSkill / SAPIEN   | Manipulation benchmarks         | MIT (UCSD)
Robosuite            | Manipulation on MuJoCo          | MIT

5.3 Differentiable simulators

Simulator             | Diff method                               | Use
Brax                  | Automatic via JAX                         | Policy gradient + meta-learning
MuJoCo MJX            | Automatic via JAX                         | Same
Dojo (Stanford)       | Implicit-function theorem through contact | Trajectory optimization through contact
Nimble (Mehrabi 2021) | Featherstone with diff                    | Sysid + trajectory opt
Drake (AutoDiffXd)    | Forward-mode                              | Trajectory opt + sysid
DiSECt                | Differentiable cutting                    | Cutting / surgical sim
Warp (NVIDIA)         | Differentiable Python kernels             | Soft / fluid / mesh sim

6. Reference data

6.1 Physics engine comparison (rigid-body contact)

Engine        | Contact model            | Friction         | Speed (rel. MuJoCo) | Differentiable          | Notes
MuJoCo        | Convex hull + soft       | Pyramidal cone   | 1.0×                | Yes (MJX)               | Best for legged + manipulation
PhysX (Isaac) | TGS + soft               | Coulomb (approx) | 0.7×                | No (sample-based grads) | GPU resident; great for scale
Bullet        | Sequential impulse       | Coulomb (approx) | 0.5×                | No                      | Legacy choice
Drake         | TAMSI / TAMSI-LSAP       | Coulomb          | 0.3×                | Yes                     | High fidelity
ODE           | Sequential impulse + LCP | Pyramidal        | 0.4×                | No                      | Gazebo default
Dart          | LCP                      | Coulomb          | 0.4×                | Yes (limited)           | Slower
Simbody       | Featherstone + soft      | Soft             | 0.4×                | No                      | Drake’s predecessor

6.2 Notable sim-to-real successes

Year      | Project                              | Sim            | Tricks                            | Result
2018–2019 | OpenAI Dactyl (Rubik’s Cube hand)    | MuJoCo         | DR + ADR + LSTM memory            | Solved cube in ~50 attempts on Shadow Hand
2019      | Hwangbo et al. ANYmal                | RaiSim         | DR + actuator nets                | First quadruped RL controller deployed
2020      | Lee et al. ANYmal (rough terrain)    | RaiSim         | Teacher–student + privileged info | Walked rough terrain reliably
2021      | Kumar et al. (RMA, A1)               | RaiSim         | RMA adaptation module             | Mini Cheetah / A1 across terrains
2022      | Crowley et al. Cassie                | MuJoCo + Isaac | DR + reward shaping               | 5 km outdoor run
2023      | Margolis et al. Mini-Cheetah parkour | Isaac Gym      | Periodic gait + DR + curriculum   | Stairs, gaps, vertical leaps
2024      | NVIDIA H1 humanoid (OmniH2O)         | Isaac Lab      | Whole-body teleop + RL            | Unitree H1 retargeting from human MoCap
2024      | Stanford HumanPlus                   | Isaac Gym      | Imitation + RL                    | Humanoid whole-body skills from human video
2024      | LocoMan / Extreme Parkour            | Isaac Gym      | Two-stage curriculum + DR         | Quad gym + parkour

6.3 Software-stack cross-reference

Need                                   | Pick
Open-source, contact-rich manipulation | MuJoCo + Robosuite
Open-source GPU-scale RL               | Isaac Lab (free for research) or MuJoCo MJX
Photorealistic vision in the loop      | Isaac Sim or Habitat 3.0
ROS 2 integration                      | Gazebo (Ignition) or Isaac Sim with Isaac ROS bridge
Manipulation benchmark suite           | ManiSkill / Robosuite / RLBench
Quadrotor RL                           | Aerial Gym + Isaac Lab
Differentiable for sysid / trajopt     | Drake or MuJoCo MJX or Dojo
Soft body / cloth / fluid              | Genesis / MuJoCo MJX (FEM) / FleX
Self-driving                           | CARLA (open) or DRIVE Sim / Foretify (commercial)

6.4 Software-version pinning practice

Sim-to-real is brittle to upstream changes — a physics-engine update can change contact behavior. Standard hygiene:

  • Lock physics engine version (mujoco==3.1.4, gymnasium==0.29.1, isaaclab==0.4.0).
  • Lock Python + CUDA + driver triple.
  • Container the entire training environment (Docker image hash recorded).
  • Git-tag the URDF + policy network + reward + hyper-params at every published checkpoint.
  • Re-validate sim-to-real transfer on every dependency bump.

The “I retrained today and the policy mysteriously got worse” failure is almost always a silent upstream change.
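
One lightweight way to catch such changes is to record an environment manifest next to every checkpoint and diff it between runs. A sketch using only the standard library; the function name and manifest layout are assumptions, and the package list is whatever your training stack depends on.

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_manifest(packages):
    """Record installed versions of the given packages plus a hash of the result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed; recorded explicitly
    manifest = {"python": platform.python_version(), "packages": versions}
    blob = json.dumps(manifest, sort_keys=True).encode()
    manifest["sha256"] = hashlib.sha256(blob).hexdigest()
    return manifest


print(json.dumps(environment_manifest(["mujoco", "gymnasium"]), indent=2))
```

Comparing the `sha256` field of two checkpoints gives a one-line answer to "did anything in the environment change between these runs?" for the packages listed. It does not cover the CUDA or driver versions, which need their own entry.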

6.5 Data flow in a typical sim-to-real loop

[CAD URDF / SDF / MJCF] ────────┐
[Calibration data (mass,        │
 friction, motor consts)]───────┼─→ [Sim model.xml]
                                │        │
[Real telemetry logs] ──────────┘        │ 4096 parallel
                                          ▼
                                    [Isaac Lab / MJX / Brax]
                                          │
                                          ▼
                                  [PPO/SAC trainer]
                                          │
                                          ▼
                                  [Policy + adaptation module]
                                          │
                                          ▼
                          [ONNX / TorchScript export]
                                          │
                                          ▼
                          [Onboard inference (Orin / Jetson / x86)]
                                          │
                                          ▼
                                  [Robot hardware]
                                          │
                                          ▼
                          [Telemetry → cycle]

6.6 Cost-benefit per simulator

Simulator        | Setup difficulty | Throughput   | Vision realism | Differentiable
MuJoCo (CPU)     | Low              | Medium       | Low            | Limited
MuJoCo MJX (GPU) | Medium           | Very high    | Low            | Yes
Isaac Lab        | High             | Very high    | High           | Limited
Drake            | High             | Low          | Low            | Yes
Gazebo           | Medium           | Low          | Medium         | No
PyBullet         | Very low         | Low          | Low            | No
Genesis          | Medium (new)     | Very high    | Medium         | Yes

7. Failure modes & debugging

  • The “sim exploit” gait — policy discovers a vibration / hopping pattern that exploits integrator instabilities. Debug: lower simulation step size; add action smoothness penalty; observe motor commands directly. Fix: clip action rate; train at deployment frequency.
  • Latency-blind oscillation — policy works in sim, oscillates in real because the real loop has 20 ms more latency than simulated. Fix: measure real latency; add to sim; retrain.
  • Friction-cone collapse — policy slips because real friction is lower than the lowest DR sample. Fix: extend DR range downward; add explicit slip-detection input.
  • Encoder quantization error — sim has 32-bit floats; real encoder is 13-bit (8192 counts/rev). Policy uses high-frequency components that don’t exist. Fix: quantize sim observations; or add encoder noise > quantization.
  • Backlash in harmonic drives — sim assumes rigid; real has 0.5–2 arcmin backlash. Fix: model backlash explicitly (Drake supports; MuJoCo needs custom joint), or train with random small dead-zones.
  • Thermal motor degradation: sim runs at constant motor performance; the real motor torque limit drops 20% after 30 s of hard use. Fix: add a motor thermal model to the sim, or match training episode length to deployment.
  • Sensor synchronization — sim updates all sensors at the same instant; real has staggered (camera 30 Hz, IMU 1 kHz, encoder 10 kHz). Fix: model sensor-specific update rates; or align everything to lowest common rate.
  • Camera-noise gap — synthetic images look perfect; real images have rolling shutter, motion blur, lens distortion. Fix: aggressive image augmentation (color jitter, noise, blur, JPEG compression artifacts), or sim2real with style-transfer (CycleGAN, but rarely used post-2022).
  • CoM-uncertainty — payload changes shift CoM; policy untrained for it. Fix: include CoM as DR variable, or use RMA to adapt online.
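
As an example of one of these fixes, encoder quantization can be applied to simulated joint angles before they reach the policy. A sketch for the 13-bit case mentioned above; the function name is illustrative.

```python
import math


def quantize_angle(angle_rad, bits=13):
    """Round an angle to the nearest encoder count and convert back to radians."""
    counts_per_rev = 2 ** bits               # 8192 counts/rev for 13 bits
    lsb = 2 * math.pi / counts_per_rev       # about 0.77 mrad per count
    return round(angle_rad / lsb) * lsb


true_angle = 1.0
observed = quantize_angle(true_angle)
print(observed, abs(observed - true_angle))  # error is at most half a count
```

Velocity estimates obtained by differencing these quantized angles become noticeably steppy at low speed, which is worth reproducing in sim if the real robot computes velocity that way.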

8. Case studies

8.1 OpenAI Dactyl + Rubik’s Cube (2018-2019)

Robot: Shadow Robot Hand (24 DoF anthropomorphic). Task: pick up + manipulate a Rubik’s Cube into solved configuration. Sim: MuJoCo with 23 randomized parameters. Tricks:

  • LSTM policy memory to infer parameters online.
  • Automatic Domain Randomization (ADR) — randomization range expands when policy succeeds at p > 0.5, shrinks below p < 0.3.
  • 13,000 years of simulated experience (~10⁹ env-steps × 60× wall-clock acceleration).

Result: 60% Cube success on hardware; cited as the canonical proof that sim-to-real for dexterous manipulation works at all. Subsequent critique (Kalashnikov 2018): solved a narrow distribution; non-Cube objects fail. But the technique worked and propagated everywhere.

8.2 ANYmal: from sim to construction site

ANYbotics’ ANYmal-D (24 kg, 12 DoF SEA actuators) shipped in 2020 with hand-coded MPC + reactive controller; in 2022+ added RL policies trained in RaiSim.

Sim: RaiSim (ETH Zurich’s in-house Featherstone + soft-contact engine). Training: PPO with privileged-teacher → student distillation (Lee 2020 “Learning quadrupedal locomotion over challenging terrain”). DR over friction, mass, motor delay, terrain. Result: deployed on ABB substation inspection, BP Mad Dog oil rig (with Boston Dynamics Spot), ÖBB railway inspection. The RL policy handles terrain (gravel, stairs, ladders); the MPC handles posture + manipulation. Hybrid stack.

8.3 Cassie outdoor run (Crowley 2022)

Oregon State University’s Cassie (built by Agility Robotics): 10 DoF biped, no upper body. Sim: MuJoCo and Isaac Gym. Training: PPO over a single neural-net policy mapping IMU + joint state → joint targets. DR: friction 0.5–1.5, link mass ±20%, motor delay ±10 ms, ground inclination ±10°. Reward: forward velocity tracking + foot-clearance + minimum-jerk + survival bonus. Result: a 5 km outdoor run on the Oregon State University campus in about 53 minutes (~1.6 m/s average). First demonstration that a single end-to-end RL policy could handle the full locomotion stack for a biped, including transient pushes.

8.4 NVIDIA Isaac Lab humanoid training (2024)

Target: Unitree H1, Apptronik Apollo, etc. Stack: Isaac Lab (built on Isaac Sim PhysX 5 + Omniverse). 4096 parallel environments on a single H100 GPU. Throughput: ~1 M env-steps/s; PPO training to a working locomotion policy in ~6 hours wall-clock. Output: open-source baselines for major humanoids. The separate HumanoidBench suite (Sferrazza 2024) offers a standardized benchmark.

8.5 BD Spot RL policy (Boston Dynamics 2024)

Boston Dynamics released production RL controllers for Spot in 2024 as a software update. Earlier Spot locomotion software was hand-engineered and MPC-based. The RL controller:

  • Trained in BD-internal simulator (MuJoCo + custom contact model).
  • DR over friction, mass, payload position, motor saturation, joint damping.
  • 5000+ parallel envs on multi-GPU cluster.
  • Distilled from a privileged-teacher to an observation-history student.
  • Deployed to ~1500+ Spots in service worldwide.
  • Improvement: faster reactive behavior, smoother trajectories on terrain.

Significance: first major commercial deployment of sim-to-real RL on a production fleet (vs. research demos). Demonstrated that RL+DR is reliable enough for paid-customer use, not just lab.

8.6 Tesla Optimus walking policy (claimed)

Tesla has claimed that the Optimus walking policy is RL-trained in MuJoCo plus a Tesla in-house simulator. As of mid-2025, public demos show the humanoid walking outdoors, picking up objects, and performing basic teleoperated tasks. Tesla’s sim-to-real pipeline reportedly leverages the same vision dataset infrastructure as FSD (curate-and-reweight pipelines). Skepticism remains about the scope of generalization; the verified list of public tasks is limited.

8.7 Unitree G1 humanoid RL policies (open community)

Unitree G1 (~100M+ humanoid program to $16k DIY.

8.8 Genesis universal simulator (Dec 2024)

CMU + multi-institutional release. Combines: rigid-body, soft-body (FEM), MPM (Material Point Method for granular / fluid), cloth, and rendering, all GPU-native and differentiable.

  • Speed: claimed 43× MuJoCo on rigid quadruped sims, ~5× Isaac Lab for some benchmarks.
  • Coverage: 30+ robotic embodiments preloaded.
  • Differentiable through contact for trajectory optimization.

As of 2026 it is still maturing; production sim-to-real deployments remain dominated by the MuJoCo + Isaac Lab pair. Watch this space.

8.9 OmniH2O — humanoid whole-body teleop (NVIDIA 2024)

NVIDIA + CMU collaboration. Train a Unitree H1 humanoid to perform whole-body manipulations by retargeting human motion-capture data:

  • Input: human MoCap from CMU dataset → retargeted to H1’s 23-DoF embodiment.
  • Sim: Isaac Lab with 4096 parallel envs.
  • Reward: imitation distance + balance + survival bonus.
  • DR over friction, base disturbance, motor delay, payload mass.
  • Output: H1 performs dance, hand-clapping, balance-on-one-foot, and reaches for objects, all sim-to-real-transferred.

Demonstrated at the Nvidia GTC 2024 keynote; informed the later GR00T work.

8.10 ANYmal mobile inspection — sim-trained mobility on real customer sites

ANYbotics’ production deployments at BP Mad Dog, Shell Nyhamna, Statkraft hydroelectric, ÖBB rail. All deployed RL policies for locomotion were trained in RaiSim with:

  • ~100 sim hours of training (RL).
  • DR: friction 0.4–1.2, body mass ±15%, motor delays ±10 ms.
  • Terrain: heightfield generation with rocks, slopes 0–25°, stairs.
  • Field deployment: > 99% session-completion rate across customer sites.

A meaningful proof-of-concept for sim-to-real RL in OOD industrial environments (oil rigs, snow, hot surfaces).

Reference workflow — starting a new sim-to-real project

1. Define the task envelope.
   - What real-world variability must the policy handle?
   - What metrics define success?

2. Build a sim model.
   - CAD → URDF / MJCF / SDF (export tools: Onshape, Solidworks, OpenSCAD, Blender + plugin).
   - Verify visually in the simulator GUI.
   - Compare to real photos (sanity check geometry).

3. Identify physical parameters.
   - Weigh links; measure motor constants; measure friction (joint friction test rig).
   - Identify latencies (timing test).
   - Plug measurements into sim model.

4. Pick the right simulator.
   - Use the “First ask” questions in §1 and the cross-reference in §6.3.

5. Define the DR distribution.
   - Start with ±20% on key parameters.
   - Add randomization to anything you don't measure.

6. Train.
   - Verify in sim that policy reaches target metrics.
   - Inspect for sim exploits (look at action profiles).

7. Deploy to real (carefully).
   - Start in a safe area; have the e-stop ready.
   - Compare real-vs-sim trajectories visually.
   - Note systematic biases.

8. Iterate.
   - Sysid → DR tightening → retrain → redeploy.

Citations

  • Tobin J., Fong R., Ray A., et al. (2017) “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.” IROS.
  • Peng X.B., Andrychowicz M., Zaremba W., Abbeel P. (2018) “Sim-to-Real Transfer of Robotic Control with Dynamics Randomization.” ICRA.
  • OpenAI, Andrychowicz M., et al. (2019) “Solving Rubik’s Cube with a Robot Hand.” arXiv:1910.07113.
  • Hwangbo J., Lee J., Dosovitskiy A., et al. (2019) “Learning Agile and Dynamic Motor Skills for Legged Robots.” Science Robotics 4.
  • Lee J., Hwangbo J., Wellhausen L., Koltun V., Hutter M. (2020) “Learning Quadrupedal Locomotion over Challenging Terrain.” Science Robotics 5.
  • Kumar A., Fu Z., Pathak D., Malik J. (2021) “RMA: Rapid Motor Adaptation for Legged Robots.” RSS.
  • Chebotar Y., Handa A., Makoviychuk V., et al. (2019) “Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience.” ICRA.
  • Pinto L., Davidson J., Sukthankar R., Gupta A. (2017) “Robust Adversarial Reinforcement Learning.” ICML.
  • Crowley D., Dao H., Hurst J., et al. (2022) “5K Outdoor Run by a Bipedal Robot Cassie.” ICRA.
  • Todorov E., Erez T., Tassa Y. (2012) “MuJoCo: A physics engine for model-based control.” IROS.
  • Makoviychuk V., Wawrzyniak L., Guo Y., et al. (2021) “Isaac Gym: High Performance GPU-Based Physics Simulation for Robot Learning.” NeurIPS Datasets and Benchmarks.
  • Mittal M., Yu C., Yu Q., Liu J., Rudin N., et al. (2023) “Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments.” (predecessor to Isaac Lab)
  • Howell T., Le Lidec Q., Cleach S.L., et al. (2022) “Dojo: A Differentiable Physics Engine for Robotics.” arXiv:2203.00806.
  • Stewart D., Trinkle J. (1996) “An Implicit Time-Stepping Scheme for Rigid Body Dynamics with Coulomb Friction.” Int J Numer Meth Eng 39.
  • Anitescu M. (2006) “Optimization-based simulation of nonsmooth rigid multibody dynamics.” Math Prog.
  • Genesis Authors (2024) “Genesis: A Universal and Generative Physics Engine for Robotics and Beyond.” genesis-world.readthedocs.io.
  • Margolis G.B., Yang G., Paigwar K., et al. (2024) “Rapid Locomotion via Reinforcement Learning.” IJRR.
  • Sferrazza C., Huang D.M., Lin X., Lee Y., Abbeel P. (2024) “HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation.” arXiv:2403.10506.
  • Drake project authors (2014–) drake.mit.edu.
  • MuJoCo project authors (since 2012, DeepMind since 2021) mujoco.org.