Sim-to-Real — Robotics Simulators, Domain Randomization, Transfer

Training robot policies in simulation and deploying on physical hardware. The simulator zoo (MuJoCo, Isaac Sim, Gazebo, Drake, Genesis, PyBullet, CoppeliaSim, Webots), the physics-engine choices that determine what transfers, the domain-randomization recipes that close the reality gap, the system-identification techniques that shrink it from the other end. Sim-to-real is the single most important capability that made the 2018–2026 wave of learned robot policies (legged locomotion, dexterous manipulation, humanoid control) economically viable; without it each robot would need millions of physical interactions.

1. At a glance

A robotics simulator is a software environment that approximates the physics, sensors, and actuators of a real robot well enough that a controller (hand-written or learned) trained or tested in it transfers to hardware without catastrophic failure. “Well enough” is the entire problem: physics engines simplify contact, friction, deformation, and aerodynamics in different ways, and the gap between simulated and real behaviour — the reality gap — destroys naive transfer.

The strategy that works in 2026 has four pillars:

Pick the right physics engine for the task. Rigid-body contact-rich (manipulation, locomotion) → MuJoCo or Isaac Sim. Soft-body / deformable → SOFA, FleX, Genesis. Aerodynamics / multirotor → AirSim / Flightmare / Aerial Gym / RotorPy. Underwater → Stonefish or HoloOcean. Massively parallel RL → Isaac Lab (GPU-resident, 1k–100k parallel envs) or Brax.
Randomize what you don’t know. Domain Randomization (Tobin 2017, Peng 2018) varies friction, mass, sensor noise, latency, motor saturation, lighting, textures during training. The policy learns the family of dynamics, not the specific one.
System-identify what you can. Measure joint friction, motor torque constants, link masses, latency offline. Tighten the simulation to match. The smaller the gap, the less randomization you need.
Use real data where it matters. Real2sim2real (Chebotar 2019), residual policy learning, online adaptation (RMA — Rapid Motor Adaptation, Kumar 2021). Don’t make sim do the impossible.

This stack is what put RL controllers into production on Boston Dynamics quadrupeds (2024, see §8.5), got Cassie running 5 km (Crowley 2022), trained OpenAI’s dexterous Rubik’s Cube hand (2019), and now drives most publicly described humanoid RL policies (Figure, 1X, Apptronik, Unitree).

Where this sits. RL uses the simulator as its environment; rigid-body dynamics is what the physics engine integrates; calibration is the manual sysid pre-step; vision uses the renderer for synthetic training data; ROS 2 integrates via ros_control / gz_ros2 / Isaac ROS for sim-real-swap; legged and humanoid policies are the largest current consumer.

First ask. What’s the contact problem? If you have rich contact (grasping, walking, sliding), you need a hard-constraint contact solver (MuJoCo, Drake) or a softened one with care (PhysX, Bullet — softer than reality unless tuned). Single environment or 10k parallel? Single → MuJoCo / Drake / Gazebo. 10k+ → Isaac Lab / Brax / Genesis (GPU-resident). What needs to render photorealistically? Vision in the loop → Isaac Sim (PhysX + Omniverse RTX), Unreal-based AirSim, Habitat 3.0 (indoor), CARLA (urban driving). Are you doing soft-body? Genesis (2024), SOFA, MuJoCo MJX with FEM (2024+), FleX. Standard rigid-body engines don’t help.

2. First principles

2.1 The reality gap

Decomposes into:

Modeling gap. Physics simplifications: rigid bodies instead of compliant, ideal motors instead of torque-ripple-and-cogging, ideal sensors instead of biased + delayed + noisy.
Parameter gap. Even with the right model: mass, friction, inertia, compliance values differ from reality. Friction in particular is notoriously hard to identify and varies with surface condition.
Perception gap. Synthetic images differ from real (lighting, sensor noise, lens distortion, motion blur, depth-camera artifacts).
Latency gap. Sim runs at perfect timing; real systems have variable network/bus delays.
Compliance gap. Real joints flex; harmonic drives have backlash + lost motion; cables stretch; gears wear.

2.2 Domain randomization (DR)

Train across many randomly-parameterized environments so the learned policy generalizes to some unseen real environment. The key idea (Tobin 2017): “With enough variability in the simulator, the real world may appear to the model as just another variation.”

Standard randomized quantities for locomotion / manipulation:

Link masses: ±20-30% multiplicative
Inertia tensors: ±20-30%
Joint friction (Coulomb + viscous): 0.5–2× nominal
Joint damping: 0.5–2× nominal
Motor torque limit: 0.7–1.3× nominal
Motor PD gains: 0.7–1.3× nominal
Actuator latency: 0–30 ms uniform
IMU bias / noise: ±0.05 m/s² accel, ±0.01 rad/s gyro
Ground friction coefficient: 0.4–1.2
Ground restitution: 0–0.3
External pushes: ±20 N every 5 s
Terrain height field: ±2–5 cm noise

Visual DR (Tobin 2017, more aggressive than physics DR):

Object textures: random from 1000+ texture library
Lighting: 1–4 point lights, random position + intensity + colour
Camera position: ±5 cm position, ±2° orientation jitter
Distractor objects: 0–8 random shapes in background
Image noise: Gaussian + speckle

2.3 Automatic domain randomization (ADR)

OpenAI 2019 (Rubik’s Cube) introduced Automatic Domain Randomization: the randomization range expands automatically when the policy succeeds and shrinks when it fails. This avoids the manual tuning problem (too wide → slow training; too narrow → bad transfer). A related approach is Robust Adversarial Reinforcement Learning (RARL, Pinto 2017), where an adversary perturbs the dynamics.
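The expand-on-success, shrink-on-failure rule can be sketched for a single randomized parameter. This is a simplified illustration, not OpenAI's implementation: the `AdrRange` class, the `adr_update` function, the step size, and the two thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class AdrRange:
    low: float
    high: float
    step: float             # how far each boundary moves per update (assumed)
    min_width: float = 0.0  # never shrink below this width

def adr_update(r, success_rate, expand_above=0.5, shrink_below=0.3):
    """Widen the range when the policy succeeds, narrow it when it fails."""
    if success_rate > expand_above:
        return AdrRange(r.low - r.step, r.high + r.step, r.step, r.min_width)
    if success_rate < shrink_below:
        low, high = r.low + r.step, r.high - r.step
        if high - low < r.min_width:
            return r  # already as narrow as allowed
        return AdrRange(low, high, r.step, r.min_width)
    return r  # performance in the dead band: leave the range alone

# Example: a mass-scale range that starts narrow and widens after a good evaluation.
mass_scale = AdrRange(low=0.9, high=1.1, step=0.05)
mass_scale = adr_update(mass_scale, success_rate=0.8)
```

The dead band between the two thresholds keeps the range from oscillating when the success rate hovers near a single cut-off.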

2.4 System identification

Inverse problem: find physics parameters θ that minimize ||sim(θ) − real||. Two flavors:

Offline sysid: collect real trajectories of known controls; fit dynamics parameters (mass, inertia, friction, motor constants) by gradient descent or evolutionary search. Tools: Pinocchio Identification, Drake SystemIdentification, custom MuJoCo + autograd / JAX.

Online sysid (adaptive control): estimate parameters live during operation. RLS (recursive least squares), L1 adaptive, MIAC. Often combined with a fast inner loop that uses the estimate.
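The offline flavor can be illustrated on a toy problem: fit a single Coulomb friction coefficient so that a simulated sliding block reproduces a logged velocity trace. Everything here is an assumption made for the sketch: the one-parameter block model, the function names, and the "real" log, which is synthetic data generated with μ = 0.6 rather than a hardware measurement. A derivative-free golden-section search stands in for the gradient or evolutionary search mentioned above.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def simulate_slide(mu, v0, dt, steps):
    """Velocity trace of a block sliding on flat ground under Coulomb friction."""
    v, trace = v0, []
    for _ in range(steps):
        trace.append(v)
        v = max(0.0, v - mu * G * dt)  # decelerate until the block stops
    return trace

def loss(mu, real, v0, dt):
    """Squared error between simulated and logged velocities: ||sim(mu) - real||^2."""
    sim = simulate_slide(mu, v0, dt, len(real))
    return sum((s - r) ** 2 for s, r in zip(sim, real))

def fit_mu(real, v0, dt, lo=0.05, hi=1.5, iters=60):
    """Golden-section search; assumes the loss is unimodal on [lo, hi]."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if loss(c, real, v0, dt) < loss(d, real, v0, dt):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Synthetic stand-in for a logged hardware trajectory (true mu = 0.6).
real_log = simulate_slide(0.6, v0=2.0, dt=0.01, steps=50)
mu_hat = fit_mu(real_log, v0=2.0, dt=0.01)
```

With real logs the loss is noisy and multi-parameter, which is why the tools listed above rely on gradients or population-based search instead of a 1-D line search.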

2.5 Rapid Motor Adaptation (RMA, Kumar 2021)

Train in two phases:

Privileged teacher (phase 1): the base policy takes an encoding z of the ground-truth dynamics parameters (friction, mass, etc.) as input. Trained with PPO + DR.
Adaptation module (phase 2): predicts a latent z̃ from the recent history (last 50 timesteps of state-action). Trained by supervised regression onto the phase-1 latent z.

At deployment the policy uses z̃ instead of the true z. The adaptation module needs no gradient steps online; it is pure feedforward inference. RMA, demonstrated on the Unitree A1, together with the closely related teacher-student schemes used on ANYmal and Mini-Cheetah, took RL controllers from “works on flat ground” to “works on grass, gravel, sand, ice.”
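A minimal sketch of the deployment-time loop, assuming the two trained networks are available as plain callables. The class name `RmaRunner` and the toy stand-in functions are hypothetical, and running the adaptation module on every control tick is a simplification of this sketch. The point is only that adaptation is a forward pass over a rolling 50-step history, with no optimizer involved.

```python
from collections import deque

HISTORY_LEN = 50  # state-action pairs fed to the adaptation module

class RmaRunner:
    def __init__(self, base_policy, adaptation_module, state_dim, action_dim):
        self.base_policy = base_policy              # (state, z_hat) -> action
        self.adaptation_module = adaptation_module  # history -> z_hat
        blank = [0.0] * (state_dim + action_dim)
        # Zero-padded history so the module can run from the very first step.
        self.history = deque([blank] * HISTORY_LEN, maxlen=HISTORY_LEN)

    def step(self, state):
        z_hat = self.adaptation_module(list(self.history))  # forward pass only
        action = self.base_policy(state, z_hat)
        self.history.append(list(state) + list(action))     # oldest entry drops out
        return action

# Toy stand-ins so the sketch runs; real ones would be trained networks.
def toy_adaptation(history):
    return [sum(row[0] for row in history) / len(history)]

def toy_policy(state, z_hat):
    return [-0.5 * state[0] - z_hat[0]]

runner = RmaRunner(toy_policy, toy_adaptation, state_dim=2, action_dim=1)
actions = [runner.step([1.0, 0.0]) for _ in range(60)]
```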

2.6 Real2Sim2Real

Chebotar 2019, Allevato 2020. Iteratively:

Roll out a policy on real hardware; log trajectories.
In simulation, find dynamics parameters θ that best reproduce the real trajectories (sysid).
Train a new policy with the updated simulator + (often) DR around θ.
Deploy; collect new data; iterate.

Closes the loop between offline sysid and policy training.

2.7 Residual policy learning

Train a learned policy that adds a correction to an analytic / model-based controller. The base controller handles 90% of the behavior; the learned residual closes the gap. Used in ANYmal navigation policy (Bjelonic 2022), in surgical-robot deployments (Korkmaz 2023), and increasingly in mobile manipulation.
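As a sketch of the structure, the commanded torque is the base controller's output plus a small, bounded learned correction. The PD base controller, the gains, the `scale` factor, and the clamp `limit` are illustrative assumptions; `residual_fn` stands in for whatever trained network produces the correction.

```python
def pd_controller(q, qd, q_target, kp=40.0, kd=2.0):
    """Analytic base controller: joint-space PD torque."""
    return kp * (q_target - q) - kd * qd

def residual_action(q, qd, q_target, residual_fn, scale=0.1, limit=5.0):
    """Base torque plus a scaled, clamped learned correction."""
    base = pd_controller(q, qd, q_target)
    correction = scale * residual_fn(q, qd, q_target)
    correction = max(-limit, min(limit, correction))  # bound the learned part
    return base + correction

# With a zero residual the output is exactly the base controller's torque.
tau = residual_action(0.1, 0.0, 0.5, residual_fn=lambda q, qd, qt: 0.0)
```

Clamping the residual keeps the learned term from overriding the base controller, so a badly trained residual degrades performance rather than taking over the joint.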

2.8 The contact-solver problem

The hardest physics for sim-to-real is contact, where discrete events (impacts, friction transitions) happen inside a continuous integration step. Four main approaches:

Linear Complementarity Problem (LCP) solvers: Stewart-Trinkle time-stepping. Hard constraints, no inter-penetration, no “soft springiness.” Used by ODE (legacy), Vortex.
Soft-contact / penalty: springs and dampers approximate contact. Easier to tune for stability, but adds artificial compliance. Used by Bullet, PhysX (default).
Convex contact: MuJoCo’s convex formulation with a Newton solver and, by default, a pyramidal friction cone. Stable and fast, but the pyramidal cone is a 4-sided approximation to the true Coulomb cone (an elliptic-cone option also exists).
Hard contact with smoothing: Drake’s TAMSI (Transition-Aware Semi-Implicit method); MuJoCo MJX’s gradients through contact via JAX autodiff.

The choice cascades to what transfers. PhysX-trained policies often learn to exploit soft-contact compliance that doesn’t exist in reality. MuJoCo-trained policies are tighter but slower per step (until MJX vectorized it on GPU in 2023).
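The "artificial compliance" of a penalty contact can be made concrete with a point mass resting on a spring-damper ground: at rest it sinks by m·g/k, so a softer contact spring means a deeper, non-physical penetration. This is a 1-D toy with assumed stiffness and damping values, not the contact model of any particular engine.

```python
def settle_on_penalty_ground(mass, stiffness, damping, dt=1e-4, steps=50000, g=9.81):
    """Semi-implicit Euler for a point mass resting on a spring-damper ground at z = 0.

    Returns the steady-state penetration depth in metres.
    """
    z, v = 0.0, 0.0
    for _ in range(steps):
        depth = max(0.0, -z)
        if depth > 0.0:
            # Penalty force: spring on penetration depth, damper on velocity.
            f_contact = max(0.0, stiffness * depth - damping * v)  # push only, never pull
        else:
            f_contact = 0.0
        a = f_contact / mass - g
        v += a * dt
        z += v * dt
    return -z

# A 1 kg mass on a soft and a 10x stiffer penalty ground (critically damped).
soft = settle_on_penalty_ground(mass=1.0, stiffness=1e4, damping=200.0)
stiff = settle_on_penalty_ground(mass=1.0, stiffness=1e5, damping=632.0)
```

The stiffer ground penetrates ten times less but demands a smaller time step for stability, which is the tuning trade-off behind soft-contact engines.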

3. Practical math — DR ranges and sysid examples

3.1 Worked example: quadruped friction DR range

For a quadruped expected to operate on indoor concrete (μ ≈ 0.8), outdoor asphalt (μ ≈ 0.7), grass (μ ≈ 0.6), wet tile (μ ≈ 0.3):

Training distribution: U[0.3, 1.0] for ground friction.
Bottom of range covers wet tile worst case.
Top covers dry concrete + some margin.

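After training, the policy should be robust across the sampled range. Degradation is most likely outside it, for example wet tile combined with a downhill slope, where the effective traction is lower than anything seen in training.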

3.2 Worked example: motor constant identification

Goal: identify motor torque constant K_t.

Lock the joint output mechanically (kinematic constraint).
Command a known current i.
Measure resulting torque τ on a load cell.
K_t = τ / i.

Typical result: K_t = 0.5 N·m/A ± 5% across motors in a batch. Use the per-motor value in sim if practical; otherwise use the batch mean + DR ±5%.
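With several current levels instead of one, K_t is better estimated as the least-squares slope of a line through the origin, K_t = Σ(i·τ) / Σ(i²). The sketch below assumes this zero-offset linear model; the current and torque readings are made-up illustrative numbers, not measurements.

```python
def fit_torque_constant(currents_a, torques_nm):
    """Least-squares slope of tau = K_t * i, constrained through the origin."""
    num = sum(i * tau for i, tau in zip(currents_a, torques_nm))
    den = sum(i * i for i in currents_a)
    return num / den

# Hypothetical locked-rotor readings: commanded current (A), load-cell torque (N*m).
currents = [1.0, 2.0, 3.0, 4.0]
torques = [0.51, 0.99, 1.52, 1.98]
k_t = fit_torque_constant(currents, torques)  # roughly 0.5 N*m/A for this data
```

Averaging over several current levels also exposes nonlinearity: if the per-point ratio τ/i drifts with current, the motor is saturating and a single constant is the wrong model.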

3.3 Worked example: latency calibration

Goal: identify the round-trip command-to-encoder delay.

Command a step torque at t=0.
Log the encoder velocity response.
Find the onset of the velocity response (the first sample where velocity departs from its noise floor). Subtract any expected mechanical delay (low-pass filter, gear backlash).
Result: typically 1–5 ms for a real-time EtherCAT system, 10–30 ms for ROS over Ethernet.

Bake this into sim as a fixed offset, then add DR ±50% on top.
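The procedure above can be sketched in two parts: detect the response onset in a logged step test, then replay that delay in simulation with a FIFO buffer whose length is re-sampled per episode. The threshold-based onset detector, the class `DelayedActuator`, and the helper names are assumptions for illustration; the ±50% spread follows the text.

```python
import random
from collections import deque

def estimate_delay(times, velocities, threshold):
    """Time of the first sample whose |velocity| exceeds threshold (command at t = 0)."""
    for t, v in zip(times, velocities):
        if abs(v) > threshold:
            return t
    return None  # no response detected in the log

class DelayedActuator:
    """FIFO that releases each command `delay_steps` control ticks after it was issued."""
    def __init__(self, delay_steps, initial=0.0):
        self.buf = deque([initial] * delay_steps)

    def push(self, command):
        self.buf.append(command)
        return self.buf.popleft()

def sample_delay_steps(nominal_s, dt, rng, spread=0.5):
    """Nominal delay with uniform +/- spread, rounded to whole control ticks."""
    delay = rng.uniform((1 - spread) * nominal_s, (1 + spread) * nominal_s)
    return max(0, round(delay / dt))

# Per-episode use: draw a delay around a measured 4 ms, then route commands through it.
rng = random.Random(0)
actuator = DelayedActuator(sample_delay_steps(0.004, dt=0.001, rng=rng))
```

Set the threshold a few standard deviations above the encoder-velocity noise floor; too low a value triggers on noise and underestimates the delay.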

3.4 GPU-resident training throughput

Isaac Lab on H100 (4096 environments × 1 kHz physics × shared policy network):

Step time: ~4 ms per batched step (all 4096 in parallel).
Throughput: ~1.0 M env-steps/s.
1B-step training: ~17 min wall-clock.

Vs MuJoCo single-thread: ~20k steps/s. The GPU pipeline delivers roughly 50× the aggregate throughput: each individual environment advances at only ~250 steps/s, but thousands run at once. This is what made humanoid RL viable in 2023+.
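The arithmetic behind these figures, written out using only the numbers quoted in this section (variable names are illustrative):

```python
num_envs = 4096
batched_step_s = 0.004           # ~4 ms per batched step
cpu_steps_per_s = 20_000         # single-thread MuJoCo figure quoted above

throughput = num_envs / batched_step_s        # env-steps per second, about 1.02e6
train_minutes = 1e9 / throughput / 60         # wall-clock for 1B steps, about 16-17 min
aggregate_speedup = throughput / cpu_steps_per_s  # about 51x
per_env_rate = 1 / batched_step_s             # each env advances ~250 steps/s
```

The speedup comes from batch width, not from stepping any single environment faster.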

3.5 Common DR ranges (literature consensus)

mass_link_i        ~ U[0.7, 1.3] * nominal
inertia_link_i     ~ U[0.7, 1.3] * nominal
com_link_i         + N(0, 0.01 m) per axis
joint_friction     ~ U[0.5, 2.0] * nominal
joint_damping      ~ U[0.5, 2.0] * nominal
motor_kp           ~ U[0.7, 1.3] * nominal
motor_kd           ~ U[0.7, 1.3] * nominal
ground_friction    ~ U[0.3, 1.0]
ground_restitution ~ U[0.0, 0.3]
ext_force_push     ~ N(0, 10 N), every 5 s
encoder_noise      ~ N(0, 0.001 rad)
imu_accel_bias     ~ N(0, 0.05 m/s²)
imu_gyro_bias      ~ N(0, 0.01 rad/s)
action_latency     ~ U[0, 30 ms]
obs_latency        ~ U[0, 30 ms]
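A sketch of how the ranges above might be drawn once per episode. The function name and the dictionary layout are assumptions; a real pipeline would write these values into the simulator's model before each reset, which is simulator-specific and not shown.

```python
import random

def sample_dr_params(rng, n_links):
    """Draw one episode's randomized parameters from the ranges listed above."""
    return {
        "mass_scale": [rng.uniform(0.7, 1.3) for _ in range(n_links)],
        "inertia_scale": [rng.uniform(0.7, 1.3) for _ in range(n_links)],
        "com_offset_m": [[rng.gauss(0.0, 0.01) for _ in range(3)]
                         for _ in range(n_links)],
        "joint_friction_scale": rng.uniform(0.5, 2.0),
        "joint_damping_scale": rng.uniform(0.5, 2.0),
        "motor_kp_scale": rng.uniform(0.7, 1.3),
        "motor_kd_scale": rng.uniform(0.7, 1.3),
        "ground_friction": rng.uniform(0.3, 1.0),
        "ground_restitution": rng.uniform(0.0, 0.3),
        "ext_force_push_n": rng.gauss(0.0, 10.0),      # applied every 5 s
        "encoder_noise_std_rad": 0.001,                # per-step noise, drawn at runtime
        "imu_accel_bias": rng.gauss(0.0, 0.05),        # m/s^2, fixed for the episode
        "imu_gyro_bias": rng.gauss(0.0, 0.01),         # rad/s, fixed for the episode
        "action_latency_s": rng.uniform(0.0, 0.030),
        "obs_latency_s": rng.uniform(0.0, 0.030),
    }

# Seeded generator so a failing episode can be reproduced exactly.
params = sample_dr_params(random.Random(42), n_links=12)
```

Biases are sampled once per episode while noise is re-drawn every step; mixing the two up makes the sensor model either too easy or too jittery.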

4. Design heuristics

Sysid before DR. A 10% mass error eats DR budget you could spend on real uncertainty. Weigh your robot. Measure each motor. CAD-mass is wrong by ~10–30% routinely.
Latency is the killer. Sim-to-real for fast dynamics (legged, multirotor) is dominated by latency mismatch. Identify it offline and bake it in.
Episodes matter. DR helps within an episode-length window. If your real episode is 5 minutes but your sim trains on 5-second episodes, behaviors that drift over minutes (battery sag, motor heat) won’t be captured.
Watch for sim-only exploits. Common: policy “vibrates” the motors at the integrator’s Nyquist, exploiting numerical instabilities to produce non-physical thrust. Fix: action smoothness penalty + lower control rate.
Train at the deployment control rate. If your real system commands at 50 Hz, the policy in sim should be stepped at 50 Hz too (the physics can still substep faster). A higher control rate looks better in sim metrics but produces policies that fail on hardware.
Render variety > render fidelity for vision policies. Tobin 2017: a bag of unrealistic textures generalizes better than one realistic texture.
Test transfer with the same firmware you deploy. Different filter cut-offs / different deadbands ≠ same robot. Hardware-in-the-loop sim is the cheapest catch.
Always have a model-based fallback. A learned controller failing is a black-box failure; you can’t tune it. Keep a model-based emergency-stop / posture-holder that takes over below a safety threshold.

5. Components & sourcing — the simulator zoo

5.1 Physics-engine-focused

Simulator	Engine	License	Best for	Notes
MuJoCo	Custom convex	Apache 2.0 (since 2022)	Contact-rich, fast	Acquired by DeepMind 2021, open-sourced 2022; MJX GPU port 2023
MuJoCo MJX	MuJoCo on JAX	Apache 2.0	GPU RL	Vectorized; 1000s parallel envs
Isaac Sim + Isaac Lab	PhysX 5 + Omniverse	Nvidia EULA, free for research	Photorealistic + RL at scale	Replaces Isaac Gym 2023; GPU-resident
Drake	TAMSI / custom	BSD-3	High-fidelity, certified	MIT/TRI; differentiable; symbolic; manipulation focus
Bullet / PyBullet	Bullet 3	Zlib	Quick prototyping	Older, slower than MuJoCo for contact
Genesis	Custom GPU + diff	Apache 2.0	Unified rigid + soft + fluid	2024 release; combines MPM + FEM + rigid
Gazebo (Ignition)	ODE / Bullet / Dart / Simbody plugins	Apache 2.0	ROS 2 integration	Open Robotics; gz-sim default in ROS 2 Humble+
CoppeliaSim (V-REP)	Bullet / ODE / MuJoCo / Newton / Vortex	Free educational, paid commercial	Multi-engine, scriptable	Coppelia Robotics
Webots	ODE-derived	Apache 2.0	Education + multi-robot	Cyberbotics
Brax	Custom JAX	Apache 2.0	GPU/TPU RL only	Google; no contact gradients pre-2023
DART	Featherstone + LCP	BSD-2	Open research	Slower than MuJoCo
NVIDIA FleX	Position-based dynamics	NVIDIA EULA	Soft + fluid	Legacy; mostly subsumed by Isaac Sim PhysX

5.2 Application-focused

Simulator	Focus	License
AirSim	Multirotor + cars on Unreal	MIT (Microsoft, archived 2022)
Flightmare	Quadrotor (RL focus)	MIT
Aerial Gym / RotorPy	Quadrotor learning at GPU scale	MIT
CARLA	Self-driving urban	MIT
NVIDIA DRIVE Sim	AV; ties to Omniverse	Nvidia
Cognata / Foretellix / Applied Intuition / Parallel Domain	AV simulation commercial	Commercial
HoloOcean	Underwater	MIT
Stonefish	Underwater + maritime	GPL-3
Habitat 3.0	Indoor navigation + interaction	MIT (Meta)
iGibson 2.0	Indoor with realistic physics	MIT (Stanford)
ManiSkill / SAPIEN	Manipulation benchmarks	MIT (UCSD)
Robosuite	Manipulation on MuJoCo	MIT

5.3 Differentiable simulators

Simulator	Diff method	Use
Brax	Automatic via JAX	Policy gradient + meta-learning
MuJoCo MJX	Automatic via JAX	Same
Dojo (Stanford)	Implicit-function theorem through contact	Trajectory optimization through contact
Nimble (Werling 2021)	Featherstone with diff	Sysid + trajectory opt
Drake (`AutoDiffXd`)	Forward-mode	Trajectory opt + sysid
DiSECt	Differentiable cutting	Cutting / surgical sim
Warp (NVIDIA)	Differentiable Python kernels	Soft / fluid / mesh sim

6. Reference data

6.1 Physics engine comparison (rigid-body contact)

Engine	Contact model	Friction	Speed (rel. MuJoCo)	Differentiable	Notes
MuJoCo	Convex hull + soft	Pyramidal cone	1.0×	Yes (MJX)	Best for legged + manipulation
PhysX (Isaac)	TGS + soft	Coulomb (approx)	0.7×	No (sample-based grads)	GPU resident; great for scale
Bullet	Sequential impulse	Coulomb (approx)	0.5×	No	Legacy choice
Drake	TAMSI / TAMSI-LSAP	Coulomb	0.3×	Yes	High fidelity
ODE	Sequential impulse + LCP	Pyramidal	0.4×	No	Gazebo Classic default
Dart	LCP	Coulomb	0.4×	Yes (limited)	Slower
Simbody	Featherstone + soft	Soft	0.4×	No	Drake’s predecessor

6.2 Notable sim-to-real successes

Year	Project	Sim	Tricks	Result
2018	OpenAI Dactyl (Rubik’s Cube hand)	MuJoCo	DR + ADR + LSTM memory	Solved cube in ~50 attempts on Shadow Hand
2018	Hwangbo et al. ANYmal	RaiSim	DR + actuator nets	First quadruped RL controller deployed
2020	Lee et al. ANYmal (rough terrain)	RaiSim	Teacher–student + privileged info	Walked rough terrain reliably
2021	Kumar et al. (RMA, A1)	RaiSim	RMA adaptation module	Unitree A1 across terrains
2022	Crowley et al. Cassie	MuJoCo + Isaac	DR + reward shaping	5 km outdoor run
2023	Margolis et al. Mini-Cheetah parkour	Isaac Gym	Periodic gait + DR + curriculum	Stairs, gaps, vertical leaps
2024	NVIDIA H1 humanoid (OmniH2O)	Isaac Lab	Whole-body teleop + RL	Unitree H1 retargeting from human MoCap
2024	Berkeley HumanPlus	Isaac Gym	Imitation + RL	Humanoid whole-body skills from human video
2024	LocoMan / Extreme Parkour	Isaac Gym	Two-stage curriculum + DR	Quad gym + parkour

6.3 Software-stack cross-reference

Need	Pick
Open-source, contact-rich manipulation	MuJoCo + Robosuite
Open-source GPU-scale RL	Isaac Lab (free for research) or MuJoCo MJX
Photorealistic vision in the loop	Isaac Sim or Habitat 3.0
ROS 2 integration	Gazebo (Ignition) or Isaac Sim with Isaac ROS bridge
Manipulation benchmark suite	ManiSkill / Robosuite / RLBench
Quadrotor RL	Aerial Gym + Isaac Lab
Differentiable for sysid / trajopt	Drake or MuJoCo MJX or Dojo
Soft body / cloth / fluid	Genesis / MuJoCo MJX (FEM) / FleX
Self-driving	CARLA (open) or DRIVE Sim / Foretify (commercial)

6.4 Software-version pinning practice

Sim-to-real is brittle to upstream changes — a physics-engine update can change contact behavior. Standard hygiene:

Lock physics engine version (mujoco==3.1.4, gymnasium==0.29.1, isaaclab==0.4.0).
Lock Python + CUDA + driver triple.
Container the entire training environment (Docker image hash recorded).
Git-tag the URDF + policy network + reward + hyper-params at every published checkpoint.
Re-validate sim-to-real transfer on every dependency bump.

The “I retrained today and the policy mysteriously got worse” failure is almost always a silent upstream change.
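One lightweight way to make silent upstream changes visible is to write a manifest of the resolved environment next to every checkpoint. The sketch below uses only the Python standard library; the function name, the manifest layout, and the choice of packages are assumptions, and it does not capture CUDA or driver versions, which need tool-specific queries.

```python
import hashlib
import json
import platform
from importlib import metadata

def environment_manifest(packages, model_path=None):
    """Record interpreter, platform, package versions, and an optional model-file hash."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    manifest = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }
    if model_path is not None:
        with open(model_path, "rb") as f:
            manifest["model_sha256"] = hashlib.sha256(f.read()).hexdigest()
    return manifest

# Save alongside the checkpoint; diff two manifests when a retrain behaves differently.
manifest_json = json.dumps(environment_manifest(["mujoco", "gymnasium"]), indent=2)
```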

6.5 Data flow in a typical sim-to-real loop

[CAD URDF / SDF / MJCF] ────────┐
[Calibration data (mass,        │
 friction, motor consts)]───────┼─→ [Sim model.xml]
                                │        │
[Real telemetry logs] ──────────┘        │ 4096 parallel
                                          ▼
                                    [Isaac Lab / MJX / Brax]
                                          │
                                          ▼
                                  [PPO/SAC trainer]
                                          │
                                          ▼
                                  [Policy + adaptation module]
                                          │
                                          ▼
                          [ONNX / TorchScript export]
                                          │
                                          ▼
                          [Onboard inference (Orin / Jetson / x86)]
                                          │
                                          ▼
                                  [Robot hardware]
                                          │
                                          ▼
                          [Telemetry → cycle]

6.6 Cost-benefit per simulator

Simulator        | Setup difficulty | Throughput   | Vision realism | Differentiable
MuJoCo (CPU)     | Low              | Medium       | Low            | Limited
MuJoCo MJX (GPU) | Medium           | Very high    | Low            | Yes
Isaac Lab        | High             | Very high    | High           | Limited
Drake            | High             | Low          | Low            | Yes
Gazebo           | Medium           | Low          | Medium         | No
PyBullet         | Very low         | Low          | Low            | No
Genesis          | Medium (new)     | Very high    | Medium         | Yes

7. Failure modes & debugging

The “sim exploit” gait: policy discovers a vibration / hopping pattern that exploits integrator instabilities. Debug: lower simulation step size; add action smoothness penalty; observe motor commands directly. Fix: clip action rate; train at deployment frequency.
Latency-blind oscillation: policy works in sim but oscillates on hardware because the real loop has 20 ms more latency than simulated. Fix: measure real latency; add to sim; retrain.
Friction-cone collapse: policy slips because real friction is lower than the lowest DR sample. Fix: extend DR range downward; add explicit slip-detection input.
Encoder quantization error: sim observations are 32-bit floats; the real encoder is 13-bit (8192 counts/rev). Policy relies on fine-grained signal components that don’t exist on hardware. Fix: quantize sim observations, or add encoder noise larger than the quantization step.
Backlash in harmonic drives: sim assumes rigid; real has 0.5–2 arcmin backlash. Fix: model backlash explicitly (Drake supports; MuJoCo needs custom joint), or train with random small dead-zones.
Thermal motor degradation: sim runs at constant motor performance; real motor torque limit can drop ~20% after 30 s of hard use. Fix: add a thermal model to the sim, or match episode length to deployment.
Sensor synchronization: sim updates all sensors at the same instant; real sensors are staggered (camera 30 Hz, IMU 1 kHz, encoder 10 kHz). Fix: model sensor-specific update rates, or align everything to the lowest common rate.
Camera-noise gap: synthetic images look perfect; real images have rolling shutter, motion blur, lens distortion. Fix: aggressive image augmentation (color jitter, noise, blur, JPEG compression artifacts), or sim2real with style-transfer (CycleGAN, but rarely used post-2022).
CoM uncertainty: payload changes shift CoM; policy untrained for it. Fix: include CoM as DR variable, or use RMA to adapt online.
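Two of the fixes above are small enough to sketch directly: quantizing simulated joint angles to the encoder's resolution, and clipping the per-step change in action. The function names and the 0.1 rate limit are illustrative assumptions; the 13-bit resolution matches the example in the list.

```python
import math

def quantize_angle(angle_rad, bits=13):
    """Round an angle to the nearest encoder count (2**bits counts per revolution)."""
    lsb = 2 * math.pi / (2 ** bits)  # angle represented by one count
    return round(angle_rad / lsb) * lsb

def clip_action_rate(prev_action, new_action, max_delta):
    """Limit how far each action component may move in one control step."""
    return [p + max(-max_delta, min(max_delta, n - p))
            for p, n in zip(prev_action, new_action)]

# Apply to simulated observations and policy outputs before they reach the physics.
obs_angle = quantize_angle(1.0)
safe_action = clip_action_rate([0.0, 0.0], [1.0, -0.05], max_delta=0.1)
```

Applying the same rate clip on the robot as in training matters as much as the clip itself; otherwise it becomes one more sim-real mismatch.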

8. Case studies

8.1 OpenAI Dactyl + Rubik’s Cube (2018-2019)

Robot: Shadow Robot Hand (24 DoF anthropomorphic). Task: pick up + manipulate a Rubik’s Cube into solved configuration. Sim: MuJoCo with 23 randomized parameters. Tricks:

LSTM policy memory to infer parameters online.
Automatic Domain Randomization (ADR) — randomization range expands when policy succeeds at p > 0.5, shrinks below p < 0.3.
13,000 years of simulated experience (~10⁹ env-steps × 60× wall-clock acceleration).

Result: 60% Cube success on hardware; cited as the canonical proof that sim-to-real for dexterous manipulation works at all. Subsequent critique (Kalashnikov 2018): solved a narrow distribution; non-Cube objects fail. But the technique worked and propagated everywhere.

8.2 ANYmal: from sim to construction site

ANYbotics’ ANYmal-D (24 kg, 12 DoF SEA actuators) shipped in 2020 with hand-coded MPC + reactive controller; in 2022+ added RL policies trained in RaiSim.

Sim: RaiSim (ETH Zurich’s in-house Featherstone + soft-contact engine). Training: PPO with privileged-teacher → student distillation (Lee 2020 “Learning quadrupedal locomotion over challenging terrain”). DR over friction, mass, motor delay, terrain. Result: deployed on ABB substation inspection, BP Mad Dog oil rig (with Boston Dynamics Spot), ÖBB railway inspection. The RL policy handles terrain (gravel, stairs, ladders); the MPC handles posture + manipulation. Hybrid stack.

8.3 Cassie outdoor run (Crowley 2022)

Oregon State University’s Cassie (built by Agility Robotics): 10 DoF biped, no upper body. Sim: MuJoCo and Isaac Gym. Training: PPO over a single neural-net policy mapping IMU + joint state → joint targets. DR: friction 0.5–1.5, link mass ±20%, motor delay ±10 ms, ground inclination ±10°. Reward: forward velocity tracking + foot-clearance + minimum-jerk + survival bonus. Result: a 5 km outdoor run on the OSU campus in about 53 minutes (~1.6 m/s average). An early demonstration that a single end-to-end RL policy could handle the full locomotion stack for a biped, including transient pushes.

8.4 NVIDIA Isaac Lab humanoid training (2024)

Target: Unitree H1, Apptronik Apollo, etc. Stack: Isaac Lab (built on Isaac Sim PhysX 5 + Omniverse). 4096 parallel environments on a single H100 GPU. Throughput: ~1 M env-steps/s; PPO training to a working locomotion policy in ~6 hours wall-clock. Output: open-source baselines for major humanoids. The separate HumanoidBench suite (Sferrazza 2024) provides a standardized humanoid benchmark.

8.5 BD Spot RL policy (Boston Dynamics 2024)

Boston Dynamics released production RL controllers for Spot in 2024 as a software update. Earlier Spot software was MPC-based hand-coded. The RL controller:

Trained in BD-internal simulator (MuJoCo + custom contact model).
DR over friction, mass, payload position, motor saturation, joint damping.
5000+ parallel envs on multi-GPU cluster.
Distilled from a privileged-teacher to an observation-history student.
Deployed to ~1500+ Spots in service worldwide.
Improvement: faster reactive behavior, smoother trajectories on terrain.

Significance: first major commercial deployment of sim-to-real RL on a production fleet (vs. research demos). Demonstrated that RL+DR is reliable enough for paid-customer use, not just lab.

8.6 Tesla Optimus walking policy (claimed)

Tesla has claimed Optimus walking policy is RL-trained in MuJoCo + Tesla in-house simulator. As of mid-2025 public demos show humanoid walking outdoors, picking up objects, basic teleop. Tesla’s sim-to-real pipeline reportedly leverages the same vision dataset infrastructure as FSD (curate-and-reweight pipelines). Skepticism remains about scope of generalization; verified public-task list is limited.

8.7 Unitree G1 humanoid RL policies (open community)

Unitree G1 (~$16k consumer humanoid, 35 DoF) shipped with hand-coded MPC default controllers + open-source RL policy stack. Community-trained policies (RL_warp, Isaac Gym Unitree, etc.) reach commercial-grade walking on grass + indoor surfaces. The cycle of training is publicly visible (open Discord channels, weekly published checkpoints) and represents the first community where end-users routinely train + deploy their own RL policies on a production humanoid. Significance: lowers the entry barrier from $100M+ humanoid program to $16k DIY.

8.8 Genesis universal simulator (Dec 2024)

CMU + multi-institutional release. Combines: rigid-body, soft-body (FEM), MPM (Material Point Method for granular / fluid), cloth, and rendering, all GPU-native and differentiable.

Speed: claimed 43× MuJoCo on rigid quadruped sims, ~5× Isaac Lab for some benchmarks.
Coverage: 30+ robotic embodiments preloaded.
Differentiable through contact for trajectory optimization.

As of 2026 still maturing; production sim-to-real deployments remain dominated by MuJoCo + Isaac Lab pair. Watch this space.

8.9 OmniH2O — humanoid whole-body teleop (NVIDIA 2024)

NVIDIA + CMU collaboration. Train a Unitree H1 humanoid to perform whole-body manipulations by retargeting human motion-capture data:

Input: human MoCap from CMU dataset → retargeted to H1’s 23-DoF embodiment.
Sim: Isaac Lab with 4096 parallel envs.
Reward: imitation distance + balance + survival bonus.
DR over friction, base disturbance, motor delay, payload mass.
Output: H1 performs dance, hand-clapping, balance-on-one-foot, and reaches for objects, all sim-to-real-transferred.

Demonstrated at Nvidia GTC 2024 keynote; informed later work GR00T.

8.10 ANYmal mobile inspection — sim-trained mobility on real customer sites

ANYbotics’ production deployments at BP Mad Dog, Shell Nyhamna, Statkraft hydroelectric, ÖBB rail. All deployed RL policies for locomotion were trained in RaiSim with:

~100 sim hours of training (RL).
DR: friction 0.4–1.2, body mass ±15%, motor delays ±10 ms.
Terrain: heightfield generation with rocks, slopes 0–25°, stairs.
Field deployment: > 99% session-completion rate across customer sites.

A meaningful proof-of-concept for sim-to-real RL in OOD industrial environments (oil rigs, snow, hot surfaces).

Reference workflow — starting a new sim-to-real project

1. Define the task envelope.
   - What real-world variability must the policy handle?
   - What metrics define success?

2. Build a sim model.
   - CAD → URDF / MJCF / SDF (export tools: Onshape, Solidworks, OpenSCAD, Blender + plugin).
   - Verify visually in the simulator GUI.
   - Compare to real photos (sanity check geometry).

3. Identify physical parameters.
   - Weigh links; measure motor constants; measure friction (joint friction test rig).
   - Identify latencies (timing test).
   - Plug measurements into sim model.

4. Pick the right simulator.
   - Use the cross-reference table in §6.3 (and the “First ask” questions in §1).

5. Define the DR distribution.
   - Start with ±20% on key parameters.
   - Add randomization to anything you don't measure.

6. Train.
   - Verify in sim that policy reaches target metrics.
   - Inspect for sim exploits (look at action profiles).

7. Deploy to real (carefully).
   - Start in a safe area; have the e-stop ready.
   - Compare real-vs-sim trajectories visually.
   - Note systematic biases.

8. Iterate.
   - Sysid → DR tightening → retrain → redeploy.

Citations

Tobin J., Fong R., Ray A., et al. (2017) “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.” IROS.
Peng X.B., Andrychowicz M., Zaremba W., Abbeel P. (2018) “Sim-to-Real Transfer of Robotic Control with Dynamics Randomization.” ICRA.
OpenAI, Andrychowicz M., et al. (2019) “Solving Rubik’s Cube with a Robot Hand.” arXiv:1910.07113.
Hwangbo J., Lee J., Dosovitskiy A., et al. (2019) “Learning Agile and Dynamic Motor Skills for Legged Robots.” Science Robotics 4.
Lee J., Hwangbo J., Wellhausen L., Koltun V., Hutter M. (2020) “Learning Quadrupedal Locomotion over Challenging Terrain.” Science Robotics 5.
Kumar A., Fu Z., Pathak D., Malik J. (2021) “RMA: Rapid Motor Adaptation for Legged Robots.” RSS.
Chebotar Y., Handa A., Makoviychuk V., et al. (2019) “Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience.” ICRA.
Pinto L., Davidson J., Sukthankar R., Gupta A. (2017) “Robust Adversarial Reinforcement Learning.” ICML.
Crowley D., Dao H., Hurst J., et al. (2022) “5K Outdoor Run by a Bipedal Robot Cassie.” ICRA.
Todorov E., Erez T., Tassa Y. (2012) “MuJoCo: A physics engine for model-based control.” IROS.
Makoviychuk V., Wawrzyniak L., Guo Y., et al. (2021) “Isaac Gym: High Performance GPU-Based Physics Simulation for Robot Learning.” NeurIPS Datasets and Benchmarks.
Mittal M., Yu C., Yu Q., Liu J., Rudin N., et al. (2023) “Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments.” (predecessor to Isaac Lab)
Howell T., Le Cleac’h S., Kolter J.Z., Schwager M., Manchester Z., et al. (2022) “Dojo: A Differentiable Physics Engine for Robotics.” arXiv:2203.00806.
Stewart D., Trinkle J. (1996) “An Implicit Time-Stepping Scheme for Rigid Body Dynamics with Coulomb Friction.” Int J Numer Meth Eng 39.
Anitescu M. (2006) “Optimization-based simulation of nonsmooth rigid multibody dynamics.” Math Prog.
Genesis Authors (2024) “Genesis: A Universal and Generative Physics Engine for Robotics and Beyond.” genesis-world.readthedocs.io.
Margolis G.B., Yang G., Paigwar K., et al. (2024) “Rapid Locomotion via Reinforcement Learning.” IJRR.
Sferrazza C., Huang D.M., Lin X., Lee Y., Abbeel P. (2024) “HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation.” arXiv:2403.10506.
Drake project authors (2014–) drake.mit.edu.
MuJoCo project authors (since 2012, DeepMind since 2021) mujoco.org.
