# G1 Humanoid Motion Imitation (AMP) - Step 2 Physics Calibration
A reinforcement learning policy for the Unitree G1 humanoid robot that imitates 494 human motion-capture sequences from the AMASS dataset, trained with Adversarial Motion Priors (AMP) in Isaac Lab.

This is the Step 2 (Physics Calibration) checkpoint: pure tracking mode with curriculum domain randomization, before style injection.
## Model Details
| Parameter | Value |
|---|---|
| Robot | Unitree G1 (37 DOFs, 23 active via DOF mask) |
| Algorithm | AMP (Adversarial Motion Priors) via skrl |
| Framework | Isaac Lab 2.3.0 / Isaac Sim 5.1.0 |
| Motion Dataset | 494 AMASS motions (~113 min, 196,642 frames) |
| Training Mode | Pure tracking (tracking=1.0, style=0.0, discriminator OFF) |
| Training Hardware | NVIDIA RTX 4080 SUPER (16GB VRAM) |
| Training Duration | |
## Performance (Best Checkpoint)
| Metric | Value |
|---|---|
| Total Reward (mean) | 160.29 |
| Total Reward (max) | 285.12 |
| Episode Length (mean) | 396.9 / 400 steps |
| Best Checkpoint Step | 3,850,240 |
### Tracking Reward
The tracking reward is an exponential kernel, `exp(-error / 0.2)`, over a weighted combination of pose errors. It ranges from 0.0 (poor) to 1.0 (perfect match).
| Metric | At Best Checkpoint | Peak (all time) |
|---|---|---|
| Instantaneous Reward (mean) | 0.400 | 0.402 (step 5.5M) |
| Instantaneous Reward (max) | 0.596 | 0.673 (step 8.8M) |
| Tracking Reward (mean) | 0.317 | 0.576 (step 0, easy motions) |
| Tracking Reward (max) | 0.585 | 0.623 (step 12M) |
An instantaneous reward mean of ~0.40 indicates the policy tracks motions with reasonable average fidelity across all 494 diverse sequences; the max of ~0.60 shows strong tracking on the easier motions.
## Architecture
- Policy: Gaussian MLP (1024 → 512 → 23), fixed log_std = -2.9
- Value: Deterministic MLP (1024 → 512 → 1)
- Discriminator: Deterministic MLP (1024 → 512 → 1) with ELU (disabled in Step 2)
- Observation Space: 216 dimensions (joint pos/vel, root state, future reference targets)
- Action Space: 23 dimensions (joint position targets, scaled by 0.5)
## Training Configuration
- Environments: 1024 parallel
- Rollouts: 16 steps
- Learning Rate: 2.5e-5
- Discount Factor: 0.99
- GAE Lambda: 0.95
- Mini-batches: 2
- Learning Epochs: 6
- PPO Clip: 0.2
- Physics dt: 0.005 s (200 Hz), decimation = 4 (0.02 s per control step, i.e. 50 Hz control)
## Domain Randomization (Curriculum)
Linearly interpolated from initial to target ranges over 240k iterations:
| Parameter | Initial Range | Target Range |
|---|---|---|
| Mass | (0.95, 1.05) | (0.8, 1.2) |
| Friction | (0.9, 1.1) | (0.6, 1.4) |
| PD Gains | (0.9, 1.1) | (0.7, 1.3) |
| Action Delay | (0, 1) steps | (0, 2) steps |
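The linear curriculum above can be sketched as follows. This is a minimal illustration; the function name and the exact schedule variable are assumptions, not the actual Isaac Lab configuration keys.

```python
# Sketch of the linear curriculum schedule described above (illustrative;
# `curriculum_range` is a hypothetical helper, not an Isaac Lab API).

def curriculum_range(step, initial, target, total_steps=240_000):
    """Linearly interpolate a (low, high) randomization range over the curriculum."""
    t = min(step / total_steps, 1.0)  # clamp progress to [0, 1]
    lo = initial[0] + t * (target[0] - initial[0])
    hi = initial[1] + t * (target[1] - initial[1])
    return (lo, hi)

# Mass scale range halfway through the curriculum:
print(curriculum_range(120_000, (0.95, 1.05), (0.8, 1.2)))
# → approximately (0.875, 1.125)
```

After `total_steps` the range is clamped at the target, so randomization stops widening once the curriculum completes.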
## Reward Weights
| Component | Weight |
|---|---|
| Tracking | 1.0 |
| Action Rate Penalty | 0.01 |
| Termination (height < 0.6m) | -200.0 |
| Style (discriminator) | 0.0 (disabled) |
### Tracking Metric Weights
| Component | Weight |
|---|---|
| Root Rotation | 0.4 |
| End Effector Position | 0.3 |
| Joint Position | 0.2 |
| Root Position XY | 0.1 |
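Putting the metric weights together with the exponential kernel from the Tracking Reward section gives a simple picture of how per-component errors become a scalar reward. A minimal sketch, where the dict keys and error values are illustrative placeholders, not the actual implementation:

```python
import math

# Weighted tracking error combined with the exp(-error / 0.2) kernel
# described above. Weights come from the table; error values are made up.

WEIGHTS = {
    "root_rotation": 0.4,
    "end_effector_pos": 0.3,
    "joint_pos": 0.2,
    "root_pos_xy": 0.1,
}

def tracking_reward(errors, temperature=0.2):
    """Map a dict of per-component errors to a reward in (0, 1]."""
    weighted = sum(WEIGHTS[k] * errors[k] for k in WEIGHTS)
    return math.exp(-weighted / temperature)

# Perfect tracking gives 1.0; larger weighted error decays toward 0.
print(tracking_reward({k: 0.0 for k in WEIGHTS}))  # 1.0
```

The temperature of 0.2 means a weighted error of 0.2 already cuts the reward to `exp(-1) ≈ 0.37`, which is roughly where the reported mean of ~0.32-0.40 sits.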
## Usage
### Evaluation

```bash
# Inside Isaac Lab Docker container
cd /workspace/isaaclab
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
  --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
  --algorithm AMP --num_envs 16 \
  --checkpoint /path/to/best_agent.pt
```
### Record Video

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
  --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
  --algorithm AMP --num_envs 16 \
  --checkpoint /path/to/best_agent.pt \
  --video --video_length 500
```
### Resume Training

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/train.py \
  --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
  --algorithm AMP --num_envs 1024 --headless \
  --checkpoint /path/to/best_agent.pt
```
## Files

```
├── best_agent.pt     # Full checkpoint: policy + value + discriminator + optimizer (25 MB)
├── policy_jit.pt     # JIT-traced policy only, for deployment/inference (2.9 MB)
├── params/
│   ├── agent.yaml    # skrl agent configuration
│   └── env.yaml      # Environment configuration
└── README.md         # This model card
```
### JIT Model
`policy_jit.pt` is a TorchScript-traced policy network (input: 216-dim observation, output: 23-dim joint targets). It runs without skrl or Isaac Lab dependencies:
```python
import torch

model = torch.jit.load("policy_jit.pt")
model.eval()

obs = torch.randn(1, 216)       # [batch, obs_dim]
with torch.no_grad():           # inference only; no gradients needed
    actions = model(obs)        # [batch, 23] joint position targets (scale by 0.5)
```
Use `best_agent.pt` to resume training. Use `policy_jit.pt` for deployment or sim-to-real transfer.
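For the "scale by 0.5" note on the action space, a common Isaac Lab convention is `target = default_joint_pos + scale * action`. The sketch below assumes that convention and uses a zero placeholder for the default pose; verify the actual offset and scale against `env.yaml` before deploying.

```python
import torch

# Hypothetical post-processing of the raw policy output, assuming the
# common Isaac Lab convention target = default_joint_pos + scale * action.
# `default_joint_pos` here is a placeholder, not the real G1 default pose.

ACTION_SCALE = 0.5
default_joint_pos = torch.zeros(23)  # placeholder default pose

def actions_to_targets(actions):
    """Convert raw policy actions to absolute joint position targets."""
    return default_joint_pos + ACTION_SCALE * actions

targets = actions_to_targets(torch.zeros(1, 23))
print(targets.shape)  # torch.Size([1, 23])
```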
## Three-Step Training Strategy
This checkpoint is from Step 2 of a three-step curriculum:
| Step | Goal | Discriminator | Status |
|---|---|---|---|
| 1. Verification | Physics check (50 easy motions) | OFF | Complete |
| 2. Physics Calibration | Master all 494 motions | OFF | This checkpoint |
| 3. Style Injection | Add natural motion style | ON | Pending |
## Citation

```bibtex
@misc{pathonai2026g1imitate,
  title={G1 Humanoid Motion Imitation with AMP in Isaac Lab},
  author={PathOn-AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/PathOn-AI/g1-imitate-isaaclab-amp}
}
```
## License
MIT