G1 Navigation Policy (DAgger Distillation)

Vision-based navigation policies for the Unitree G1 humanoid robot, trained using teacher-student DAgger distillation.

Model Description

This repository contains two PyTorch TorchScript models:

Model	File	Input Dim	Output Dim	Description
Student	`student_policy.pt`	82	3	Deployable policy using depth rays from monocular depth estimation
Teacher	`teacher_policy.pt`	34	3	Privileged policy with ground-truth obstacle positions

Architecture

Both policies are MLPs with three hidden layers (256, 128, 64 units) and ELU activations:

Student: [82] → 256 → 128 → 64 → [3]
Teacher: [34] → 256 → 128 → 64 → [3]
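The layer widths above fix the size of each network. The sketch below gives a rough size check. It assumes every arrow is a plain fully connected layer with bias (for example `torch.nn.Linear`) and that there are no normalization layers or extra heads; this card confirms neither assumption. It also shows the ELU activation with alpha = 1.0, which is PyTorch's default.

```python
import math

# Layer widths from the architecture listing above.
STUDENT_DIMS = [82, 256, 128, 64, 3]
TEACHER_DIMS = [34, 256, 128, 64, 3]


def count_params(dims: list[int]) -> int:
    """Weights + biases, assuming each arrow is a plain Linear layer with bias."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


print("student params:", count_params(STUDENT_DIMS))  # 62595
print("teacher params:", count_params(TEACHER_DIMS))  # 50307
```

Under these assumptions both networks are small (tens of thousands of parameters), which suits a 10 Hz policy loop.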

Observation Spaces

Student (82-dim):

Depth rays: 72 dims (±70° FOV, corrupted with noise/dropout)
Robot velocity (vx, vy, ω): 3 dims
Goal relative position: 2 dims
Goal distance & angle: 2 dims
Previous action: 3 dims
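The sketch below assembles the 82-dim student observation in the same order as the Usage example. The card does not say how goal distance and angle are encoded. This sketch therefore assumes the Euclidean distance and `atan2` bearing of the robot-frame goal position. It also assumes the 72 rays are evenly spaced across ±70°, with both edges included. The helpers `build_student_obs` and `ray_angles_deg` are hypothetical.

```python
import math

NUM_RAYS = 72  # depth rays spanning the ±70° FOV


def ray_angles_deg(num_rays=NUM_RAYS, half_fov_deg=70.0):
    """Assumed layout: rays evenly spaced from -70° to +70°, both edges included."""
    step = 2.0 * half_fov_deg / (num_rays - 1)
    return [-half_fov_deg + i * step for i in range(num_rays)]


def build_student_obs(depth_rays, vel, goal_rel, prev_action):
    """Flatten student inputs into the 82-dim order used in the Usage example:
    depth_rays(72), vel(3), goal_rel(2), goal_dist_angle(2), prev_action(3).
    """
    if len(depth_rays) != NUM_RAYS:
        raise ValueError(f"expected {NUM_RAYS} depth rays, got {len(depth_rays)}")
    if len(vel) != 3 or len(goal_rel) != 2 or len(prev_action) != 3:
        raise ValueError("vel and prev_action need 3 values; goal_rel needs 2")

    gx, gy = goal_rel
    # Assumed encoding: Euclidean distance and bearing of the goal in the robot frame.
    goal_dist = math.hypot(gx, gy)
    goal_angle = math.atan2(gy, gx)

    obs = [*depth_rays, *vel, gx, gy, goal_dist, goal_angle, *prev_action]
    assert len(obs) == 82
    return obs


obs = build_student_obs([3.0] * NUM_RAYS, [0.5, 0.0, 0.1], [3.0, 4.0], [0.0, 0.0, 0.0])
print(len(obs), obs[75:79])
```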

Teacher (34-dim):

Nearest obstacles: 8 × 3 = 24 dims (x, y, distance)
Robot velocity: 3 dims
Goal relative position, distance & angle: 4 dims
Previous action: 3 dims
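The teacher observation can be sketched the same way, with the ordering taken from the list above. The card does not document how the 8 nearest obstacles are chosen or what happens when fewer than 8 exist. Sorting by distance and padding with a far-away dummy obstacle are therefore assumptions. `pad_dist` and both helper names are hypothetical.

```python
import math

NUM_OBSTACLES = 8


def nearest_obstacle_features(obstacles_xy, k=NUM_OBSTACLES, pad_dist=10.0):
    """Flatten the k nearest obstacles into k * (x, y, distance) values.

    Positions are assumed to be in the robot frame. Sorting by distance and padding
    with a dummy obstacle at (pad_dist, 0) when fewer than k exist are assumptions.
    """
    ranked = sorted(((x, y, math.hypot(x, y)) for x, y in obstacles_xy), key=lambda o: o[2])
    nearest = ranked[:k]
    while len(nearest) < k:
        nearest.append((pad_dist, 0.0, pad_dist))
    return [value for obstacle in nearest for value in obstacle]


def build_teacher_obs(obstacles_xy, vel, goal_rel, prev_action):
    """34 dims: obstacles (24), velocity (3), goal x, y, distance, angle (4), prev action (3)."""
    gx, gy = goal_rel
    obs = (
        nearest_obstacle_features(obstacles_xy)
        + list(vel)
        + [gx, gy, math.hypot(gx, gy), math.atan2(gy, gx)]
        + list(prev_action)
    )
    assert len(obs) == 34
    return obs


obs = build_teacher_obs([(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)], [0.3, 0.0, 0.0], [2.0, 1.0], [0.3, 0.0, 0.0])
print(obs[:9])
```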

Action Space

Velocity commands: [vx, vy, ω]

vx ∈ [-0.6, 1.0] m/s (forward/backward)
vy ∈ [-0.5, 0.5] m/s (lateral)
ω ∈ [-1.57, 1.57] rad/s (yaw rate)
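The card does not state whether the TorchScript output is already bounded to these ranges. If it is not, a simple safeguard is to clamp each command before sending it to the robot. `clip_action` is a hypothetical helper and is not part of the released models.

```python
# Bounds from the Action Space section: vx [m/s], vy [m/s], omega [rad/s].
ACTION_LOW = (-0.6, -0.5, -1.57)
ACTION_HIGH = (1.0, 0.5, 1.57)


def clip_action(action):
    """Clamp a raw [vx, vy, omega] command into the documented ranges."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, ACTION_LOW, ACTION_HIGH)]


print(clip_action([2.0, -1.0, 0.3]))  # [1.0, -0.5, 0.3]
```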

Training Details

Two-Stage Pipeline

Stage 1: Teacher PPO - Train the privileged teacher on ground-truth obstacle observations using PPO (2000 iterations)
Stage 2: DAgger Distillation - Distill the teacher into the student with DAgger (Dataset Aggregation), decaying the teacher-action mixing ratio from 70% to 20%
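The DAgger stage can be sketched as follows. In DAgger, the teacher labels every state the student visits, and the student is trained by supervised regression on the growing (aggregated) dataset. The action that actually drives the simulation is mixed: with probability beta the teacher acts, otherwise the student does. Only the 70% and 20% endpoints come from this card. The linear decay shape and the helper names `teacher_mix_prob` and `dagger_step` are assumptions for illustration.

```python
import random


def teacher_mix_prob(iteration, num_iterations, start=0.7, end=0.2):
    """Probability of executing the teacher's action at a given DAgger iteration.

    Only the 0.7 -> 0.2 endpoints are documented; the linear shape is an assumption.
    """
    if num_iterations <= 1:
        return end
    t = min(max(iteration / (num_iterations - 1), 0.0), 1.0)
    return start * (1.0 - t) + end * t


def dagger_step(student_obs, teacher_action, student_action, beta, dataset, rng=random):
    """One DAgger transition.

    The teacher's action is always stored as the label for the student's observation
    (dataset aggregation). The action sent to the simulator is the teacher's with
    probability beta, otherwise the student's, so the student increasingly visits
    states produced by its own mistakes.
    """
    dataset.append((student_obs, teacher_action))
    return teacher_action if rng.random() < beta else student_action


# Example: the assumed schedule over a hypothetical 100-iteration run.
for it in (0, 50, 99):
    print(it, round(teacher_mix_prob(it, 100), 3))
```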

Key Innovations

FOV Randomization: sampling fov_keep_ratio ∈ [0.35, 1.0] varies the visible field of view and reduces overfitting to a single sensor configuration
Hardcase Curriculum: mine failed trajectories, then retrain with 35% of episode resets drawn from these hard cases
Symmetry Augmentation: a left/right mirror transform applied to 50% of samples reduces left/right bias
Runtime Safety Layer: distance-based velocity scaling for collision avoidance
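Three of the techniques above (FOV randomization, symmetry augmentation, and the runtime safety layer) can be sketched as small list transforms. The hardcase curriculum is a data-selection policy and is not shown. Every detail here is an assumption rather than the released implementation:

- a centered FOV window, with a fill value for hidden rays;
- rays ordered from one edge of the FOV to the other;
- sign flips for lateral quantities under mirroring;
- the hypothetical `stop_dist` and `slow_dist` thresholds of the safety layer.

```python
import random

NUM_RAYS = 72


def randomize_fov(rays, keep_ratio, fill_value):
    """Keep the central `keep_ratio` fraction of rays and overwrite the rest.

    Assumption: the kept window is centered and hidden rays take `fill_value`.
    """
    n = len(rays)
    keep = max(1, round(n * keep_ratio))
    start = (n - keep) // 2
    return [r if start <= i < start + keep else fill_value for i, r in enumerate(rays)]


def mirror_student_sample(obs, action):
    """Mirror an 82-dim student observation and its [vx, vy, omega] label left/right.

    Assumes rays run from one FOV edge to the other, and that vy, omega, goal y and
    goal angle change sign under a left/right flip while vx, goal x and distance do not.
    """
    rays = obs[:NUM_RAYS][::-1]
    vx, vy, w = obs[72:75]
    gx, gy, gdist, gang = obs[75:79]
    pvx, pvy, pw = obs[79:82]
    new_obs = rays + [vx, -vy, -w, gx, -gy, gdist, -gang, pvx, -pvy, -pw]
    new_action = [action[0], -action[1], -action[2]]
    return new_obs, new_action


def safety_scale(action, min_dist, stop_dist=0.3, slow_dist=1.0):
    """Scale vx and vy linearly from 0 at stop_dist to 1 at slow_dist; omega is unchanged.

    stop_dist and slow_dist are hypothetical thresholds in meters.
    """
    scale = min(max((min_dist - stop_dist) / (slow_dist - stop_dist), 0.0), 1.0)
    return [action[0] * scale, action[1] * scale, action[2]]


# Per-sample augmentation during training (illustrative values).
rng = random.Random(0)
obs = [2.0] * NUM_RAYS + [0.4, 0.1, 0.2, 1.0, 0.5, 1.118, 0.4636, 0.4, 0.1, 0.2]
obs = randomize_fov(obs[:NUM_RAYS], rng.uniform(0.35, 1.0), fill_value=0.0) + obs[NUM_RAYS:]
action = [0.5, 0.1, 0.2]
if rng.random() < 0.5:  # mirror half of the samples
    obs, action = mirror_student_sample(obs, action)
```

The mirror is its own inverse, so applying it twice returns the original sample. This is a quick sanity check that the sign conventions are consistent.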

Performance

Scenario	Success Rate	Collision Rate
Deploy (mild noise)	75.0%	25.0%
Stress (heavy noise)	75.2%	24.8%
Wide-FOV Clean	74.4%	25.6%

Real Robot Validation

Direction	Target Offset (m)	Final Distance to Goal (m)	Result
Forward	2.0	0.26	✅ SUCCESS
Backward	-1.5	0.26	✅ SUCCESS
Left	1.5	0.31	✅ SUCCESS
Right	-2.0	0.26	✅ SUCCESS
Diagonal	(1.5, 1.5)	0.30	✅ SUCCESS

Usage

import torch

# Load student policy (for deployment)
student = torch.jit.load("student_policy.pt")
student.eval()

# Prepare observation (82-dim)
obs = torch.zeros(1, 82)  # [depth_rays(72), vel(3), goal_rel(2), goal_dist_angle(2), prev_action(3)]

# Get action
with torch.no_grad():
    action = student(obs)  # [vx, vy, omega]

Training Environment

Simulator: NVIDIA IsaacLab (Isaac Sim 4.5)
Arena: 8m × 8m with 24-32 cylindrical obstacles
Control Rate: 10 Hz policy / 50 Hz physics
Robot: Unitree G1 (capsule proxy in sim, full robot for real deployment)
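The two control rates imply that each policy action is held for 50 / 10 = 5 physics steps, i.e. 0.1 s per policy step. IsaacLab environment configs express this ratio through a `decimation` setting. The loop below is a framework-agnostic sketch; its callbacks (`policy`, `step_physics`, `get_obs`) are hypothetical.

```python
POLICY_HZ = 10
PHYSICS_HZ = 50
DECIMATION = PHYSICS_HZ // POLICY_HZ  # physics steps per policy step (5)


def run_episode(policy, step_physics, get_obs, num_policy_steps):
    """Query the policy at 10 Hz and hold each action for DECIMATION physics steps."""
    physics_steps = 0
    for _ in range(num_policy_steps):
        action = policy(get_obs())
        for _ in range(DECIMATION):
            step_physics(action)
            physics_steps += 1
    return physics_steps
```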

Citation

If you use this model, please cite:

@misc{g1-navigation-dagger-2026,
  title={Teacher-Student Distillation via DAgger for Sim-to-Real Navigation on the Unitree G1},
  author={Adjimavo},
  year={2026},
  url={https://huggingface.co/Adjimavo/g1-navigation-dagger}
}

License

Apache 2.0
