# G1 Humanoid Motion Imitation (AMP) - Step 2 Physics Calibration

A reinforcement learning policy for the Unitree G1 humanoid robot that imitates 494 human motion-capture sequences from the AMASS dataset, trained with Adversarial Motion Priors (AMP) in Isaac Lab.

This is the Step 2 (Physics Calibration) checkpoint: pure tracking mode with curriculum domain randomization, before style injection.

## Model Details

| Parameter | Value |
|---|---|
| Robot | Unitree G1 (37 DOFs, 23 active via DOF mask) |
| Algorithm | AMP (Adversarial Motion Priors) via skrl |
| Framework | Isaac Lab 2.3.0 / Isaac Sim 5.1.0 |
| Motion Dataset | 494 AMASS motions (~113 min, 196,642 frames) |
| Training Mode | Pure tracking (tracking=1.0, style=0.0, discriminator OFF) |
| Training Hardware | NVIDIA RTX 4080 SUPER (16 GB VRAM) |
| Training Duration | 5.5 days (12.4M timesteps total, best at 3.85M) |

## Performance (Best Checkpoint)

| Metric | Value |
|---|---|
| Total Reward (mean) | 160.29 |
| Total Reward (max) | 285.12 |
| Episode Length (mean) | 396.9 / 400 steps |
| Best Checkpoint Step | 3,850,240 |

### Tracking Reward

The tracking reward is an exponential kernel (`exp(-error / 0.2)`) over a weighted combination of pose errors, ranging from 0.0 (poor) to 1.0 (perfect match).

| Metric | At Best Checkpoint | Peak (All Time) |
|---|---|---|
| Instantaneous Reward (mean) | 0.400 | 0.402 (step 5.5M) |
| Instantaneous Reward (max) | 0.596 | 0.673 (step 8.8M) |
| Tracking Reward (mean) | 0.317 | 0.576 (step 0, easy motions) |
| Tracking Reward (max) | 0.585 | 0.623 (step 12M) |

The instantaneous reward mean of ~0.40 indicates that the policy tracks the reference with reasonable fidelity on average across all 494 diverse motions, while the max of ~0.60 shows strong tracking on the easier ones.
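The kernel described above is straightforward to sketch; the function and argument names below are illustrative, not the Isaac Lab implementation:

```python
import math

def tracking_reward(error: float, temperature: float = 0.2) -> float:
    """Exponential tracking kernel: exp(-error / 0.2).

    Maps a non-negative weighted pose error onto (0, 1]; a perfect
    match (zero error) yields exactly 1.0.
    """
    return math.exp(-error / temperature)

print(tracking_reward(0.0))            # perfect match -> 1.0
print(round(tracking_reward(0.2), 4))  # error equal to the temperature -> e^-1
```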

## Architecture

- **Policy**: Gaussian MLP (1024 → 512 → 23), fixed log_std = -2.9
- **Value**: Deterministic MLP (1024 → 512 → 1)
- **Discriminator**: Deterministic MLP (1024 → 512 → 1) with ELU (disabled in Step 2)
- **Observation Space**: 216 dimensions (joint pos/vel, root state, future reference targets)
- **Action Space**: 23 dimensions (joint position targets, scaled by 0.5)
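A minimal PyTorch sketch of the policy network described above. The hidden activation is an assumption (the card only states ELU for the discriminator), and the class name is illustrative rather than the skrl implementation:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 216, 23  # observation and action sizes from the card

class GaussianPolicy(nn.Module):
    """1024 -> 512 -> 23 MLP producing mean joint targets; log_std fixed at -2.9."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 1024), nn.ELU(),  # ELU assumed for hidden layers
            nn.Linear(1024, 512), nn.ELU(),
            nn.Linear(512, ACT_DIM),
        )
        # Fixed (non-learned) log standard deviation, as listed above.
        self.log_std = nn.Parameter(torch.full((ACT_DIM,), -2.9), requires_grad=False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # mean of the Gaussian; std = exp(log_std)

policy = GaussianPolicy()
mean = policy(torch.randn(4, OBS_DIM))
print(mean.shape)  # torch.Size([4, 23])
```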

## Training Configuration

- **Environments**: 1024 parallel
- **Rollouts**: 16 steps
- **Learning Rate**: 2.5e-5
- **Discount Factor**: 0.99
- **GAE Lambda**: 0.95
- **Mini-batches**: 2
- **Learning Epochs**: 6
- **PPO Clip**: 0.2
- **Physics dt**: 0.005 s (200 Hz), decimation = 4 (50 Hz control)
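The timing settings imply a simple relationship between simulation and control rates; combined with the 400-step episode cap from the performance table, each episode spans 8 seconds of simulated time:

```python
# Illustrative arithmetic from the settings above.
physics_dt = 0.005                    # 200 Hz physics step
decimation = 4                        # policy acts once per 4 physics steps
control_dt = physics_dt * decimation  # 0.02 s between actions
control_hz = 1.0 / control_dt         # 50 Hz control

episode_seconds = 400 * control_dt    # 400-step episode cap -> 8 s of sim time
print(control_hz, episode_seconds)
```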

## Domain Randomization (Curriculum)

Ranges are linearly interpolated from initial to target over the first 240k iterations:

| Parameter | Initial Range | Target Range |
|---|---|---|
| Mass | (0.95, 1.05) | (0.8, 1.2) |
| Friction | (0.9, 1.1) | (0.6, 1.4) |
| PD Gains | (0.9, 1.1) | (0.7, 1.3) |
| Action Delay | (0, 1) steps | (0, 2) steps |
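The curriculum itself is a straightforward linear interpolation; the helper below is an illustrative sketch, not Isaac Lab's randomization API:

```python
def curriculum_range(initial, target, iteration, total_iterations=240_000):
    """Linearly interpolate a (low, high) randomization range toward its target."""
    t = min(iteration / total_iterations, 1.0)  # clamp once the curriculum ends
    low = initial[0] + t * (target[0] - initial[0])
    high = initial[1] + t * (target[1] - initial[1])
    return (low, high)

# Mass scaling halfway through the curriculum -> roughly (0.875, 1.125)
print(curriculum_range((0.95, 1.05), (0.8, 1.2), 120_000))
```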

## Reward Weights

| Component | Weight |
|---|---|
| Tracking | 1.0 |
| Action Rate Penalty | 0.01 |
| Termination (height < 0.6 m) | -200.0 |
| Style (discriminator) | 0.0 (disabled) |

## Tracking Metric Weights

| Component | Weight |
|---|---|
| Root Rotation | 0.4 |
| End Effector Position | 0.3 |
| Joint Position | 0.2 |
| Root Position XY | 0.1 |
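Together with the exponential kernel described under Tracking Reward, these weights combine per-term pose errors into a single scalar reward. The sketch below is illustrative; the error-term keys are paraphrased from the table, not the environment's identifiers:

```python
import math

# Weights from the table above.
TRACKING_WEIGHTS = {
    "root_rotation": 0.4,
    "end_effector_position": 0.3,
    "joint_position": 0.2,
    "root_position_xy": 0.1,
}

def combined_tracking_reward(errors: dict, temperature: float = 0.2) -> float:
    """Weighted sum of pose errors pushed through exp(-error / 0.2)."""
    weighted = sum(w * errors[name] for name, w in TRACKING_WEIGHTS.items())
    return math.exp(-weighted / temperature)

perfect = {name: 0.0 for name in TRACKING_WEIGHTS}
print(combined_tracking_reward(perfect))  # 1.0
```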

## Usage

### Evaluation

```bash
# Inside the Isaac Lab Docker container
cd /workspace/isaaclab
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt
```

### Record Video

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/play.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 16 \
    --checkpoint /path/to/best_agent.pt \
    --video --video_length 500
```

### Resume Training

```bash
/isaac-sim/python.sh scripts/reinforcement_learning/skrl/train.py \
    --task Isaac-Humanoid-Amass-Step2-PhysicsCalibration-v0 \
    --algorithm AMP --num_envs 1024 --headless \
    --checkpoint /path/to/best_agent.pt
```

## Files

```
├── best_agent.pt          # Full checkpoint: policy + value + discriminator + optimizer (25 MB)
├── policy_jit.pt          # JIT-traced policy only, for deployment/inference (2.9 MB)
├── params/
│   ├── agent.yaml         # skrl agent configuration
│   └── env.yaml           # Environment configuration
└── README.md              # This model card
```

## JIT Model

`policy_jit.pt` is a TorchScript-traced policy network (input: 216-dim observation; output: 23-dim joint targets). It runs without skrl or Isaac Lab dependencies:

```python
import torch

model = torch.jit.load("policy_jit.pt")
model.eval()

obs = torch.randn(1, 216)      # [batch, obs_dim]
with torch.inference_mode():
    actions = model(obs)       # [batch, 23] joint position targets (scale by 0.5)
```

Use `best_agent.pt` to resume training; use `policy_jit.pt` for deployment or sim-to-real transfer.

## Three-Step Training Strategy

This checkpoint is from Step 2 of a three-step curriculum:

| Step | Goal | Discriminator | Status |
|---|---|---|---|
| 1. Verification | Physics check (50 easy motions) | OFF | Complete |
| 2. Physics Calibration | Master all 494 motions | OFF | **This checkpoint** |
| 3. Style Injection | Add natural motion style | ON | Pending |

## Citation

```bibtex
@misc{pathonai2026g1imitate,
  title={G1 Humanoid Motion Imitation with AMP in Isaac Lab},
  author={PathOn-AI},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/PathOn-AI/g1-imitate-isaaclab-amp}
}
```

## License

MIT
