G1 Navigation Policy (DAgger Distillation)

Vision-based navigation policies for the Unitree G1 humanoid robot, trained using teacher-student DAgger distillation.

Model Description

This repository contains two PyTorch TorchScript models:

| Model | File | Input Dim | Output Dim | Description |
|---|---|---|---|---|
| Student | student_policy.pt | 82 | 3 | Deployable policy using depth rays from monocular depth estimation |
| Teacher | teacher_policy.pt | 34 | 3 | Privileged policy with ground-truth obstacle positions |

Architecture

Both policies are MLPs with three hidden layers (256, 128, 64) and ELU activations:

  • Student: [82] → 256 → 128 → 64 → [3]
  • Teacher: [34] → 256 → 128 → 64 → [3]
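
The layer sizes above can be sketched in PyTorch as follows. This is a minimal reconstruction for illustration, not the training code: the helper name `build_mlp` is hypothetical, and leaving the output layer without an activation is an assumption, since the exported TorchScript files may differ in such details.

```python
import torch
from torch import nn


def build_mlp(input_dim: int, output_dim: int = 3,
              hidden_dims: tuple = (256, 128, 64)) -> nn.Sequential:
    """Stack Linear + ELU blocks, followed by a final Linear output layer."""
    layers = []
    prev = input_dim
    for width in hidden_dims:
        layers.append(nn.Linear(prev, width))
        layers.append(nn.ELU())
        prev = width
    # Assumption: no activation after the output layer.
    layers.append(nn.Linear(prev, output_dim))
    return nn.Sequential(*layers)


student = build_mlp(82)  # depth-ray student
teacher = build_mlp(34)  # privileged teacher

with torch.no_grad():
    student_out = student(torch.zeros(1, 82))
    teacher_out = teacher(torch.zeros(1, 34))
```

With these sizes the student has 62,595 trainable parameters and the teacher 50,307, so both are small enough for CPU inference at the policy rate.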

Observation Spaces

Student (82-dim):

  • Depth rays: 72 dims (±70° FOV, corrupted with noise/dropout)
  • Robot velocity (vx, vy, ω): 3 dims
  • Goal relative position: 2 dims
  • Goal distance & angle: 2 dims
  • Previous action: 3 dims
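
As a rough illustration, the 82-dim student observation could be assembled as below. The field order follows the comment in the Usage example; the function name `build_student_obs`, the robot-frame goal convention, and the use of `atan2` for the goal angle are assumptions. Any depth-ray normalization used in training is not stated here and is left out.

```python
import math


def build_student_obs(depth_rays, velocity, goal_rel, prev_action):
    """Concatenate student inputs: rays(72), vel(3), goal_rel(2), dist/angle(2), prev_action(3)."""
    if len(depth_rays) != 72:
        raise ValueError("expected 72 depth rays")
    if len(velocity) != 3 or len(goal_rel) != 2 or len(prev_action) != 3:
        raise ValueError("expected velocity(3), goal_rel(2), prev_action(3)")
    gx, gy = goal_rel
    goal_dist = math.hypot(gx, gy)
    # Assumption: angle is the goal bearing relative to the robot's forward (x) axis.
    goal_angle = math.atan2(gy, gx)
    return [*depth_rays, *velocity, gx, gy, goal_dist, goal_angle, *prev_action]


obs = build_student_obs(
    depth_rays=[5.0] * 72,
    velocity=(0.0, 0.0, 0.0),
    goal_rel=(3.0, 4.0),
    prev_action=(0.0, 0.0, 0.0),
)
```

The resulting list can be wrapped with `torch.tensor([obs])` to get the `(1, 82)` input the student expects.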

Teacher (34-dim):

  • Nearest obstacles: 8 × 3 = 24 dims (x, y, distance)
  • Robot velocity: 3 dims
  • Goal relative + distance/angle: 4 dims
  • Previous action: 3 dims
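
A comparable sketch for the 34-dim teacher observation is shown below. It assumes obstacle positions are already expressed in the robot frame, that the eight nearest are sorted by distance, and that missing slots are padded with a far-away placeholder. The padding value, the field order, and the name `build_teacher_obs` are hypothetical choices not confirmed by this card.

```python
import math


def build_teacher_obs(obstacles_xy, velocity, goal_rel, prev_action,
                      k=8, pad_distance=10.0):
    """Build obstacles(k*3), vel(3), goal(4), prev_action(3) from robot-frame inputs."""
    # Attach the distance to each obstacle, sort by it, and keep the k nearest.
    nearest = sorted(
        ((x, y, math.hypot(x, y)) for x, y in obstacles_xy),
        key=lambda o: o[2],
    )[:k]
    # Hypothetical padding: fill empty slots with a far-away obstacle straight ahead.
    while len(nearest) < k:
        nearest.append((pad_distance, 0.0, pad_distance))
    obstacle_feats = [value for obstacle in nearest for value in obstacle]
    gx, gy = goal_rel
    goal_feats = [gx, gy, math.hypot(gx, gy), math.atan2(gy, gx)]
    return [*obstacle_feats, *velocity, *goal_feats, *prev_action]


teacher_obs = build_teacher_obs(
    obstacles_xy=[(float(i), 0.0) for i in range(10, 0, -1)],
    velocity=(0.0, 0.0, 0.0),
    goal_rel=(1.0, 0.0),
    prev_action=(0.0, 0.0, 0.0),
)
```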

Action Space

Velocity commands: [vx, vy, ω]

  • vx ∈ [-0.6, 1.0] m/s (forward/backward)
  • vy ∈ [-0.5, 0.5] m/s (lateral)
  • ω ∈ [-1.57, 1.57] rad/s (yaw rate)
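
Whether the exported policy already bounds its outputs is not stated here, so a deployment wrapper may want to clamp commands to the ranges above before sending them to the robot. A minimal sketch, with `ACTION_BOUNDS` and `clip_action` as illustrative names:

```python
# (low, high) bounds for vx [m/s], vy [m/s], omega [rad/s], taken from the list above.
ACTION_BOUNDS = ((-0.6, 1.0), (-0.5, 0.5), (-1.57, 1.57))


def clip_action(action):
    """Clamp each velocity component to its documented range."""
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(action, ACTION_BOUNDS)]


clipped = clip_action([1.4, -0.7, 0.3])
# clipped == [1.0, -0.5, 0.3]
```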

Training Details

Two-Stage Pipeline

  1. Stage 1: Teacher PPO - Train privileged teacher with ground-truth obstacles using PPO (2000 iterations)
  2. Stage 2: DAgger Distillation - Distill teacher to student using Dataset Aggregation with 70% → 20% teacher mixing decay
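
The mixing decay in Stage 2 can be pictured with the sketch below. In DAgger the teacher always provides the training labels; the mixing probability only decides whose action is executed during rollouts. The linear shape of the decay, the iteration count, and both function names are assumptions, since only the 70% and 20% endpoints are given.

```python
import random


def teacher_mix_prob(iteration, total_iterations, start=0.70, end=0.20):
    """Probability of executing the teacher's action, decayed linearly (assumed shape)."""
    if total_iterations <= 1:
        return end
    frac = min(max(iteration / (total_iterations - 1), 0.0), 1.0)
    return start + (end - start) * frac


def choose_executed_action(teacher_action, student_action, beta, rng):
    """Execute the teacher's action with probability beta, otherwise the student's.

    The teacher's action is still stored as the supervised label in both cases.
    """
    return teacher_action if rng.random() < beta else student_action


rng = random.Random(0)
beta_mid = teacher_mix_prob(50, 101)  # halfway through a hypothetical 101-iteration run
executed = choose_executed_action("teacher", "student", beta_mid, rng)
```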

Key Innovations

  • FOV Randomization: fov_keep_ratio ∈ [0.35, 1.0] prevents sensor overfitting
  • Hardcase Curriculum: Mine failure trajectories, retrain with 35% hardcase resets
  • Symmetry Augmentation: 50% mirror transform eliminates left/right bias
  • Runtime Safety Layer: Distance-based velocity scaling for collision avoidance
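
Two of these ideas are easy to illustrate. The sketch below mirrors a student sample left/right and applies a simple distance-based velocity scale. It assumes the observation layout from the Usage example and depth rays spaced symmetrically about the forward axis. The stop and slow-down distances, the choice to leave yaw rate unscaled, and the function names are hypothetical and not taken from the released code.

```python
def mirror_sample(obs, action):
    """Reflect an (observation, action) pair across the robot's forward axis."""
    rays = list(obs[:72])[::-1]           # swap left and right rays
    vx, vy, w = obs[72:75]
    gx, gy = obs[75:77]
    dist, angle = obs[77:79]
    pvx, pvy, pw = obs[79:82]
    # Lateral and rotational quantities change sign; forward ones do not.
    mirrored_obs = [*rays, vx, -vy, -w, gx, -gy, dist, -angle, pvx, -pvy, -pw]
    ax, ay, aw = action
    return mirrored_obs, [ax, -ay, -aw]


def scale_velocity(action, min_distance, stop_dist=0.3, slow_dist=1.0):
    """Scale linear velocity from 1 at slow_dist down to 0 at stop_dist (hypothetical thresholds)."""
    scale = (min_distance - stop_dist) / (slow_dist - stop_dist)
    scale = min(max(scale, 0.0), 1.0)
    vx, vy, w = action
    # Assumption: yaw rate is left unscaled so the robot can still turn away.
    return [vx * scale, vy * scale, w]


sample_obs = [float(i) for i in range(82)]
mirrored_obs, mirrored_action = mirror_sample(sample_obs, [0.5, 0.2, -0.3])
safe_action = scale_velocity([1.0, 0.5, 0.3], min_distance=0.2)
```

For the 50% mirror transform, each training sample would be passed through `mirror_sample` with probability 0.5. Applying the mirror twice returns the original sample, which is a convenient sanity check.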

Performance

| Scenario | Success Rate | Collision Rate |
|---|---|---|
| Deploy (mild noise) | 75.0% | 25.0% |
| Stress (heavy noise) | 75.2% | 24.8% |
| Wide-FOV Clean | 74.4% | 25.6% |

Real Robot Validation

| Direction | Target | Final Distance | Result |
|---|---|---|---|
| Forward | 2.0 m | 0.26 m | ✅ SUCCESS |
| Backward | -1.5 m | 0.26 m | ✅ SUCCESS |
| Left | 1.5 m | 0.31 m | ✅ SUCCESS |
| Right | -2.0 m | 0.26 m | ✅ SUCCESS |
| Diagonal | (1.5, 1.5) m | 0.30 m | ✅ SUCCESS |

Usage

import torch

# Load student policy (for deployment)
student = torch.jit.load("student_policy.pt")
student.eval()

# Prepare observation (82-dim)
obs = torch.zeros(1, 82)  # [depth_rays(72), vel(3), goal_rel(2), goal_dist_angle(2), prev_action(3)]

# Get action
with torch.no_grad():
    action = student(obs)  # [vx, vy, omega]

Training Environment

  • Simulator: NVIDIA IsaacLab (Isaac Sim 4.5)
  • Arena: 8 m × 8 m with 24-32 cylindrical obstacles
  • Control Rate: 10 Hz policy / 50 Hz physics
  • Robot: Unitree G1 (capsule proxy in sim, full robot for real deployment)
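
The two rates imply that each policy action is held for 50 / 10 = 5 physics steps (a policy period of 0.1 s over a physics step of 0.02 s). A simulator-agnostic sketch of that loop is shown below. `policy_fn` and `physics_step_fn` are hypothetical callables standing in for the policy and the simulator, not IsaacLab APIs.

```python
POLICY_HZ = 10
PHYSICS_HZ = 50
DECIMATION = PHYSICS_HZ // POLICY_HZ  # physics steps per policy action


def run_policy_steps(policy_fn, physics_step_fn, obs, n_policy_steps):
    """Query the policy once, then hold its action for DECIMATION physics steps."""
    for _ in range(n_policy_steps):
        action = policy_fn(obs)
        for _ in range(DECIMATION):
            obs = physics_step_fn(action)  # returns the latest observation
    return obs


# Tiny stand-ins that only count how often each callable runs.
counts = {"policy": 0, "physics": 0}


def _dummy_policy(obs):
    counts["policy"] += 1
    return [0.0, 0.0, 0.0]


def _dummy_physics(action):
    counts["physics"] += 1
    return counts["physics"]


final_obs = run_policy_steps(_dummy_policy, _dummy_physics, obs=0, n_policy_steps=3)
```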

Citation

If you use this model, please cite:

@misc{g1-navigation-dagger-2026,
  title={Teacher-Student Distillation via DAgger for Sim-to-Real Navigation on the Unitree G1},
  author={Adjimavo},
  year={2026},
  url={https://huggingface.co/Adjimavo/g1-navigation-dagger}
}

License

Apache 2.0
