---
license: apache-2.0
tags:
- robotics
- navigation
- reinforcement-learning
- imitation-learning
- dagger
- humanoid
- unitree-g1
- isaaclab
- sim-to-real
language:
- en
pipeline_tag: reinforcement-learning
---
# G1 Navigation Policy (DAgger Distillation)
Vision-based navigation policies for the **Unitree G1 humanoid robot**, trained using teacher-student DAgger distillation.
## Model Description
This repository contains two PyTorch TorchScript models:
| Model | File | Input Dim | Output Dim | Description |
|-------|------|-----------|------------|-------------|
| **Student** | `student_policy.pt` | 82 | 3 | Deployable policy using depth rays from monocular depth estimation |
| **Teacher** | `teacher_policy.pt` | 34 | 3 | Privileged policy with ground-truth obstacle positions |
### Architecture
Both policies are MLPs with three hidden layers and ELU activations:
- **Student**: [82] → 256 → 128 → 64 → [3]
- **Teacher**: [34] → 256 → 128 → 64 → [3]
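
As a rough illustration of the layer sizes above, the network could be written in PyTorch as follows. This is a sketch, not the released training code: the helper name `build_mlp` is hypothetical, and applying ELU after each hidden layer with a plain linear output layer is an assumption. The TorchScript files remain the authoritative definition.

```python
import torch
import torch.nn as nn


def build_mlp(input_dim: int, output_dim: int = 3) -> nn.Sequential:
    """Hypothetical helper: MLP with hidden sizes 256 -> 128 -> 64 and ELU."""
    hidden_sizes = (256, 128, 64)
    layers = []
    prev = input_dim
    for size in hidden_sizes:
        layers.append(nn.Linear(prev, size))
        layers.append(nn.ELU())
        prev = size
    # Assumption: the output layer is linear (no activation or squashing).
    layers.append(nn.Linear(prev, output_dim))
    return nn.Sequential(*layers)


student_net = build_mlp(82)  # student: 82-dim observation
teacher_net = build_mlp(34)  # teacher: 34-dim observation

with torch.no_grad():
    out = student_net(torch.zeros(1, 82))
print(tuple(out.shape))
```

Under these assumptions the student network has 62,595 trainable parameters, so it is small enough to run comfortably at the 10 Hz policy rate.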
### Observation Spaces
**Student (82-dim):**
- Depth rays: 72 dims (±70° FOV, corrupted with noise/dropout)
- Robot velocity (vx, vy, ω): 3 dims
- Goal relative position: 2 dims
- Goal distance & angle: 2 dims
- Previous action: 3 dims
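
To make the layout concrete, the sketch below packs the five components in the order given by the comment in the Usage section. The helper `build_student_obs` is hypothetical. Deriving the goal distance and angle from the relative goal position with `hypot` and `atan2` is an assumption, as are the placeholder ray values; this card does not state units, frames, or any input normalization.

```python
import math
from typing import List, Sequence


def build_student_obs(
    depth_rays: Sequence[float],
    velocity: Sequence[float],
    goal_xy: Sequence[float],
    prev_action: Sequence[float],
) -> List[float]:
    """Hypothetical helper: concatenate the student inputs into 82 values."""
    if len(depth_rays) != 72:
        raise ValueError("expected 72 depth rays")
    if len(velocity) != 3 or len(goal_xy) != 2 or len(prev_action) != 3:
        raise ValueError("unexpected component size")
    gx, gy = goal_xy
    # Assumption: distance and angle are computed from the relative goal position.
    goal_dist = math.hypot(gx, gy)
    goal_angle = math.atan2(gy, gx)
    obs = [*depth_rays, *velocity, gx, gy, goal_dist, goal_angle, *prev_action]
    assert len(obs) == 82
    return obs


# Placeholder inputs: uniform rays, robot at rest, goal at (3, 4) relative to the robot.
obs = build_student_obs([5.0] * 72, [0.0, 0.0, 0.0], [3.0, 4.0], [0.0, 0.0, 0.0])
print(len(obs), obs[77])
```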
**Teacher (34-dim):**
- Nearest obstacles: 8 × 3 = 24 dims (x, y, distance)
- Robot velocity: 3 dims
- Goal relative + distance/angle: 4 dims
- Previous action: 3 dims
### Action Space
Velocity commands: `[vx, vy, ω]`
- `vx ∈ [-0.6, 1.0]` m/s (forward/backward)
- `vy ∈ [-0.5, 0.5]` m/s (lateral)
- `ω ∈ [-1.57, 1.57]` rad/s (yaw rate)
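
This card does not say whether the exported policy already bounds its outputs, so a deployment script may want to clip commands to the ranges above before sending them to the robot. A minimal sketch, with `clamp_action` as a hypothetical helper:

```python
from typing import List, Sequence, Tuple

# Bounds from the action-space description: (min, max) for vx, vy, omega.
ACTION_BOUNDS: Tuple[Tuple[float, float], ...] = (
    (-0.6, 1.0),
    (-0.5, 0.5),
    (-1.57, 1.57),
)


def clamp_action(action: Sequence[float]) -> List[float]:
    """Hypothetical helper: clip [vx, vy, omega] to the documented ranges."""
    if len(action) != len(ACTION_BOUNDS):
        raise ValueError("expected 3 action components")
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(action, ACTION_BOUNDS)]


print(clamp_action([1.4, -0.7, 0.3]))  # -> [1.0, -0.5, 0.3]
```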
## Training Details
### Two-Stage Pipeline
1. **Stage 1: Teacher PPO** - Train privileged teacher with ground-truth obstacles using PPO (2000 iterations)
2. **Stage 2: DAgger Distillation** - Distill teacher to student using Dataset Aggregation with 70% → 20% teacher mixing decay
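
The teacher-mixing decay in Stage 2 can be sketched as below. The card gives only the endpoints (70% and 20%), so the linear shape of the schedule, the iteration count, and both function names are assumptions made for illustration.

```python
import random


def teacher_mix_prob(iteration: int, total_iterations: int,
                     start: float = 0.7, end: float = 0.2) -> float:
    """Assumed linear decay of the teacher-mixing probability."""
    if total_iterations <= 1:
        return end
    frac = min(max(iteration / (total_iterations - 1), 0.0), 1.0)
    return start + (end - start) * frac


def choose_executed_action(teacher_action, student_action, beta: float,
                           rng: random.Random):
    """With probability beta execute the teacher's action, else the student's.

    In DAgger the visited state is labeled with the teacher's action
    regardless of which action was executed.
    """
    return teacher_action if rng.random() < beta else student_action


rng = random.Random(0)
total = 100  # hypothetical number of DAgger iterations
for it in (0, 50, 99):
    print(it, round(teacher_mix_prob(it, total), 3))
```

Early rollouts are mostly driven by the teacher, which keeps the student out of unrecoverable states. Later rollouts are mostly driven by the student, so the aggregated dataset covers the states the student actually reaches.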
### Key Innovations
- **FOV Randomization**: `fov_keep_ratio ∈ [0.35, 1.0]` prevents sensor overfitting
- **Hardcase Curriculum**: Mine failure trajectories, retrain with 35% hardcase resets
- **Symmetry Augmentation**: 50% mirror transform eliminates left/right bias
- **Runtime Safety Layer**: Distance-based velocity scaling for collision avoidance
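
As one example of the items above, symmetry augmentation can be sketched as a left/right mirror applied to half of the training samples. The index layout follows the 82-dim student observation described earlier. The ray ordering (one side of the FOV to the other), the sign conventions, and all function names are assumptions, not details confirmed by this card.

```python
import random
from typing import List, Sequence, Tuple

NUM_RAYS = 72


def mirror_observation(obs: Sequence[float]) -> List[float]:
    """Left/right mirror of an 82-dim student observation (assumed layout)."""
    # Assumption: rays are ordered across the FOV, so mirroring reverses them.
    rays = list(obs[:NUM_RAYS])[::-1]
    vx, vy, omega = obs[72:75]
    gx, gy = obs[75:77]
    dist, angle = obs[77:79]
    pvx, pvy, pomega = obs[79:82]
    # Lateral and angular quantities change sign; forward ones do not.
    return [*rays, vx, -vy, -omega, gx, -gy, dist, -angle, pvx, -pvy, -pomega]


def mirror_action(action: Sequence[float]) -> List[float]:
    """Mirror a [vx, vy, omega] command."""
    vx, vy, omega = action
    return [vx, -vy, -omega]


def maybe_mirror(obs: Sequence[float], action: Sequence[float],
                 rng: random.Random, prob: float = 0.5
                 ) -> Tuple[List[float], List[float]]:
    """Mirror an (observation, action) pair with probability prob."""
    if rng.random() < prob:
        return mirror_observation(obs), mirror_action(action)
    return list(obs), list(action)


rng = random.Random(0)
sample_obs = [0.0] * 82
sample_obs[73] = 0.2  # lateral velocity vy
aug_obs, aug_action = maybe_mirror(sample_obs, [0.5, 0.2, -0.1], rng)
print(len(aug_obs), aug_action)
```

Mirroring the observation and the action label together keeps each augmented pair consistent, which is what lets the augmentation cancel a left/right preference.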
### Performance
| Scenario | Success Rate | Collision Rate |
|----------|-------------|----------------|
| Deploy (mild noise) | 75.0% | 25.0% |
| Stress (heavy noise) | 75.2% | 24.8% |
| Wide-FOV Clean | 74.4% | 25.6% |
### Real Robot Validation
| Direction | Target | Final Distance | Result |
|-----------|--------|----------------|--------|
| Forward | 2.0m | 0.26m | ✅ SUCCESS |
| Backward | -1.5m | 0.26m | ✅ SUCCESS |
| Left | 1.5m | 0.31m | ✅ SUCCESS |
| Right | -2.0m | 0.26m | ✅ SUCCESS |
| Diagonal | (1.5, 1.5)m | 0.30m | ✅ SUCCESS |
## Usage
```python
import torch

# Load student policy (for deployment)
student = torch.jit.load("student_policy.pt")
student.eval()

# Prepare observation (82-dim)
obs = torch.zeros(1, 82)  # [depth_rays(72), vel(3), goal_rel(2), goal_dist_angle(2), prev_action(3)]

# Get action
with torch.no_grad():
    action = student(obs)  # [vx, vy, omega]
```
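
This card does not specify how the runtime safety layer listed under Key Innovations is implemented. A minimal sketch of distance-based velocity scaling applied to the policy output might look like the following. The function name, the thresholds `stop_dist` and `slow_dist`, and the choice to leave the yaw rate unscaled are all assumptions.

```python
from typing import List, Sequence


def safety_scale(action: Sequence[float], min_obstacle_dist: float,
                 stop_dist: float = 0.3, slow_dist: float = 1.0) -> List[float]:
    """Scale translational velocity by proximity to the nearest obstacle.

    stop_dist and slow_dist are hypothetical thresholds in meters.
    """
    if slow_dist <= stop_dist:
        raise ValueError("slow_dist must exceed stop_dist")
    # 0 at or below stop_dist, 1 at or above slow_dist, linear in between.
    scale = (min_obstacle_dist - stop_dist) / (slow_dist - stop_dist)
    scale = min(max(scale, 0.0), 1.0)
    vx, vy, omega = action
    # Assumption: yaw rate is left unscaled so the robot can still turn away.
    return [vx * scale, vy * scale, omega]


print(safety_scale([1.0, 0.0, 0.5], min_obstacle_dist=0.65))
```

In a deployment loop this would run after the policy call, with `min_obstacle_dist` taken from, for example, the smallest valid depth ray.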
## Training Environment
- **Simulator**: NVIDIA IsaacLab (Isaac Sim 4.5)
- **Arena**: 8m × 8m with 24-32 cylindrical obstacles
- **Control Rate**: 10 Hz policy / 50 Hz physics
- **Robot**: Unitree G1 (capsule proxy in sim, full robot for real deployment)
## Citation
If you use this model, please cite:
```bibtex
@misc{g1-navigation-dagger-2026,
title={Teacher-Student Distillation via DAgger for Sim-to-Real Navigation on the Unitree G1},
author={Adjimavo},
year={2026},
url={https://huggingface.co/Adjimavo/g1-navigation-dagger}
}
```
## License
Apache 2.0