# G1 Humanoid Motion Imitation (AMP) - Step 2 Physics Calibration
A reinforcement learning policy for the Unitree G1 humanoid robot that imitates 494 human motion-capture sequences from the AMASS dataset, trained with Adversarial Motion Priors (AMP) in Isaac Lab.

This is the Step 2 (Physics Calibration) checkpoint: pure tracking mode with curriculum domain randomization, before style injection.
## Model Details
| Parameter | Value |
|---|---|
| Robot | Unitree G1 (37 DOFs, 23 active via DOF mask) |
| Algorithm | AMP (Adversarial Motion Priors) via skrl |
| Framework | Isaac Lab 2.3.0 / Isaac Sim 5.1.0 |
| Motion Dataset | 494 AMASS motions (~113 min, 196,642 frames) |
| Training Mode | Pure tracking (tracking=1.0, style=0.0, discriminator OFF) |
| Training Hardware | NVIDIA RTX 4080 SUPER (16GB VRAM) |
| Training Duration | |
## Performance (Best Checkpoint)
| Metric | Value |
|---|---|
| Total Reward (mean) | 160.29 |
| Total Reward (max) | 285.12 |
| Episode Length (mean) | 396.9 / 400 steps |
| Best Checkpoint Step | 3,850,240 |
### Tracking Reward
The tracking reward is an exponential kernel, `exp(-error / 0.2)`, over a weighted combination of pose errors. It ranges from 0.0 (poor) to 1.0 (perfect match).
| Metric | At Best Checkpoint | Peak (all time) |
|---|---|---|
| Instantaneous Reward (mean) | 0.400 | 0.402 (step 5.5M) |
| Instantaneous Reward (max) | 0.596 | 0.673 (step 8.8M) |
| Tracking Reward (mean) | 0.317 | 0.576 (step 0, easy motions) |
| Tracking Reward (max) | 0.585 | 0.623 (step 12M) |
An instantaneous reward mean of ~0.40 indicates the policy tracks motions with reasonable average fidelity across all 494 diverse sequences; the max of ~0.60 shows strong tracking on the easier motions.
## Architecture
- Policy: Gaussian MLP (1024 → 512 → 23), fixed log_std = -2.9
- Value: Deterministic MLP (1024 → 512 → 1)
- Discriminator: Deterministic MLP (1024 → 512 → 1) with ELU (disabled in Step 2)
- Observation Space: 216 dimensions (joint pos/vel, root state, future reference targets)
- Action Space: 23 dimensions (joint position targets, scaled by 0.5)
## Training Configuration
- Environments: 1024 parallel
- Rollouts: 16 steps
- Learning Rate: 2.5e-5
- Discount Factor: 0.99
- GAE Lambda: 0.95
- Mini-batches: 2
- Learning Epochs: 6
- PPO Clip: 0.2
- Physics dt: 0.005 s (200 Hz), decimation = 4 (0.02 s per control step, i.e. 50 Hz control)
## Domain Randomization (Curriculum)
Linearly interpolated from initial to target ranges over 240k iterations:
| Parameter | Initial Range | Target Range |
|---|---|---|
| Mass | (0.95, 1.05) | (0.8, 1.2) |
| Friction | (0.9, 1.1) | (0.6, 1.4) |
| PD Gains | (0.9, 1.1) | (0.7, 1.3) |
| Action Delay | (0, 1) steps | (0, 2) steps |
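The linear curriculum above can be sketched as follows. This is a minimal illustration; the function name and the exact schedule variable are assumptions, not the actual Isaac Lab configuration keys.

```python
# Sketch of the linear curriculum schedule described above (illustrative;
# `curriculum_range` is a hypothetical helper, not an Isaac Lab API).

def curriculum_range(step, initial, target, total_steps=240_000):
    """Linearly interpolate a (low, high) randomization range over the curriculum."""
    t = min(step / total_steps, 1.0)  # clamp progress to [0, 1]
    lo = initial[0] + t * (target[0] - initial[0])
    hi = initial[1] + t * (target[1] - initial[1])
    return (lo, hi)

# Mass scale range halfway through the curriculum:
print(curriculum_range(120_000, (0.95, 1.05), (0.8, 1.2)))
# → approximately (0.875, 1.125)
```

After `total_steps` the range is clamped at the target, so randomization stops widening once the curriculum completes.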
## Reward Weights
| Component | Weight |
|---|---|
| Tracking | 1.0 |
| Action Rate Penalty | 0.01 |
| Termination (height < 0.6m) | -200.0 |
| Style (discriminator) | 0.0 (disabled) |
### Tracking Metric Weights
| Component | Weight |
|---|---|
| Root Rotation | 0.4 |
| End Effector Position | 0.3 |
| Joint Position | 0.2 |
| Root Position XY | 0.1 |
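Putting the metric weights together with the exponential kernel from the Tracking Reward section gives a simple picture of how per-component errors become a scalar reward. A minimal sketch, where the dict keys and error values are illustrative placeholders, not the actual implementation:

```python
import math

# Weighted tracking error combined with the exp(-error / 0.2) kernel
# described above. Weights come from the table; error values are made up.

WEIGHTS = {
    "root_rotation": 0.4,
    "end_effector_pos": 0.3,
    "joint_pos": 0.2,
    "root_pos_xy": 0.1,
}

def tracking_reward(errors, temperature=0.2):
    """Map a dict of per-component errors to a reward in (0, 1]."""
    weighted = sum(WEIGHTS[k] * errors[k] for k in WEIGHTS)
    return math.exp(-weighted / temperature)

# Perfect tracking gives 1.0; larger weighted error decays toward 0.
print(tracking_reward({k: 0.0 for k in WEIGHTS}))  # 1.0
```

The temperature of 0.2 means a weighted error of 0.2 already cuts the reward to `exp(-1) ≈ 0.37`, which is roughly where the reported mean of ~0.32-0.40 sits.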
## Usage
### Evaluation

```bash
# Inside Isaac Lab Docker container
cd /workspace/isaaclab
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
  --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
  --algorithm AMP --num_envs 16 \
  --checkpoint /path/to/best_agent.pt
```
### Record Video

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
  --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
  --algorithm AMP --num_envs 16 \
  --checkpoint /path/to/best_agent.pt \
  --video --video_length 500
```
### Resume Training

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/train.py \
  --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
  --algorithm AMP --num_envs 1024 --headless \
  --checkpoint /path/to/best_agent.pt
```
## Files

```
├── best_agent.pt     # Full checkpoint: policy + value + discriminator + optimizer (25 MB)
├── policy_jit.pt     # JIT-traced policy only, for deployment/inference (2.9 MB)
├── params/
│   ├── agent.yaml    # skrl agent configuration
│   └── env.yaml      # Environment configuration
└── README.md         # This model card
```
### JIT Model
`policy_jit.pt` is a TorchScript-traced policy network (input: 216-dim observation, output: 23-dim joint targets). It runs without skrl or Isaac Lab dependencies:
```python
import torch

model = torch.jit.load("policy_jit.pt")
model.eval()

obs = torch.randn(1, 216)       # [batch, obs_dim]
with torch.no_grad():           # inference only; no gradients needed
    actions = model(obs)        # [batch, 23] joint position targets (scale by 0.5)
```
Use `best_agent.pt` to resume training. Use `policy_jit.pt` for deployment or sim-to-real transfer.
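For the "scale by 0.5" note on the action space, a common Isaac Lab convention is `target = default_joint_pos + scale * action`. The sketch below assumes that convention and uses a zero placeholder for the default pose; verify the actual offset and scale against `env.yaml` before deploying.

```python
import torch

# Hypothetical post-processing of the raw policy output, assuming the
# common Isaac Lab convention target = default_joint_pos + scale * action.
# `default_joint_pos` here is a placeholder, not the real G1 default pose.

ACTION_SCALE = 0.5
default_joint_pos = torch.zeros(23)  # placeholder default pose

def actions_to_targets(actions):
    """Convert raw policy actions to absolute joint position targets."""
    return default_joint_pos + ACTION_SCALE * actions

targets = actions_to_targets(torch.zeros(1, 23))
print(targets.shape)  # torch.Size([1, 23])
```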
## Three-Step Training Strategy
This checkpoint is from Step 2 of a three-step curriculum:
| Step | Goal | Discriminator | Status |
|---|---|---|---|
| 1. Verification | Physics check (50 easy motions) | OFF | Complete |
| 2. Physics Calibration | Master all 494 motions | OFF | This checkpoint |
| 3. Style Injection | Add natural motion style | ON | Pending |
## Citation

```bibtex
@misc{pathonai2026g1imitate,
  title={G1 Humanoid Motion Imitation with AMP in Isaac Lab},
  author={PathOn-AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/PathOn-AI/g1-imitate-isaaclab-amp}
}
```
## License
MIT