+---
+license: apache-2.0
+tags:
+  - robotics
+  - navigation
+  - reinforcement-learning
+  - imitation-learning
+  - dagger
+  - humanoid
+  - unitree-g1
+  - isaaclab
+  - sim-to-real
+language:
+  - en
+pipeline_tag: reinforcement-learning
+---
+# G1 Navigation Policy (DAgger Distillation)
+Vision-based navigation policies for the **Unitree G1 humanoid robot**, trained using teacher-student DAgger distillation.
+## Model Description
+This repository contains two PyTorch TorchScript models:
+| Model | File | Input Dim | Output Dim | Description |
+|-------|------|-----------|------------|-------------|
+| **Student** | `student_policy.pt` | 82 | 3 | Deployable policy using depth rays from monocular depth estimation |
+| **Teacher** | `teacher_policy.pt` | 34 | 3 | Privileged policy with ground-truth obstacle positions |
+### Architecture
+Both policies are MLPs with three hidden layers and ELU activations:
+- **Student**: [82] → 256 → 128 → 64 → [3]
+- **Teacher**: [34] → 256 → 128 → 64 → [3]
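As a rough size check, the layer widths above can be turned into parameter counts. This is a minimal sketch that assumes plain fully connected layers with bias terms and no normalization layers, which the model card does not confirm; `mlp_param_count` is a hypothetical helper, not part of the repository.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected MLP (assumes biased linear layers)."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector per layer
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Layer widths taken from the architecture list above.
student_params = mlp_param_count([82, 256, 128, 64, 3])
teacher_params = mlp_param_count([34, 256, 128, 64, 3])
print(student_params, teacher_params)
```

Under these assumptions both networks stay well under 100k parameters, which is consistent with running the student at the 10 Hz policy rate on modest onboard hardware.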
+### Observation Spaces
+**Student (82-dim):**
+- Depth rays: 72 dims (±70° FOV, corrupted with noise/dropout)
+- Robot velocity (vx, vy, ω): 3 dims
+- Goal relative position: 2 dims
+- Goal distance & angle: 2 dims
+- Previous action: 3 dims
+**Teacher (34-dim):**
+- Nearest obstacles: 8 × 3 = 24 dims (x, y, distance)
+- Robot velocity: 3 dims
+- Goal relative + distance/angle: 4 dims
+- Previous action: 3 dims
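The student layout above can be assembled as follows. This is a sketch under stated assumptions: the goal is expressed in the robot frame, the goal angle is computed with `atan2`, and no normalization is applied; none of these details are specified in the model card. `build_student_obs` is a hypothetical helper name.

```python
import math

N_RAYS = 72  # number of depth rays in the student observation


def build_student_obs(depth_rays, velocity, goal_rel, prev_action):
    """Concatenate the student inputs in the order listed in the model card."""
    if len(depth_rays) != N_RAYS:
        raise ValueError(f"expected {N_RAYS} depth rays, got {len(depth_rays)}")
    gx, gy = goal_rel
    goal_dist = math.hypot(gx, gy)   # assumed: Euclidean distance to goal
    goal_angle = math.atan2(gy, gx)  # assumed: bearing to goal in the robot frame
    obs = (
        list(depth_rays)                      # 72 dims
        + list(velocity)                      # 3 dims: vx, vy, omega
        + [gx, gy, goal_dist, goal_angle]     # 2 + 2 dims
        + list(prev_action)                   # 3 dims
    )
    assert len(obs) == 82
    return obs


example_obs = build_student_obs([1.0] * N_RAYS, (0.0, 0.0, 0.0), (3.0, 4.0), (0.0, 0.0, 0.0))
```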
+### Action Space
+Velocity commands: `[vx, vy, ω]`
+- `vx ∈ [-0.6, 1.0]` m/s (forward/backward)
+- `vy ∈ [-0.5, 0.5]` m/s (lateral)
+- `ω ∈ [-1.57, 1.57]` rad/s (yaw rate)
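The model card does not say whether the network output is already bounded, so a deployment wrapper may want to clamp commands to the ranges above before sending them to the robot. A minimal sketch, with `clip_action` as a hypothetical helper:

```python
# Per-axis limits copied from the action space above: (vx, vy, omega).
ACTION_BOUNDS = ((-0.6, 1.0), (-0.5, 0.5), (-1.57, 1.57))


def clip_action(action):
    """Clamp each velocity component to its documented range."""
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(action, ACTION_BOUNDS)]


clipped = clip_action([2.0, -1.0, 0.3])
```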
+## Training Details
+### Two-Stage Pipeline
+1. **Stage 1: Teacher PPO** - Train the privileged teacher on ground-truth obstacle positions using PPO (2000 iterations).
+2. **Stage 2: DAgger Distillation** - Distill the teacher into the student using Dataset Aggregation (DAgger), decaying the teacher mixing ratio from 70% to 20%.
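The 70% → 20% decay in Stage 2 can be illustrated with a simple schedule. This sketch assumes a linear decay and the standard DAgger rollout rule, where the mixing ratio is the probability that the teacher's action is executed while the teacher's action is always stored as the training label. The actual schedule shape is not stated in the model card, and both function names are hypothetical.

```python
import random


def teacher_mix_ratio(iteration, total_iterations, start=0.7, end=0.2):
    """Linearly interpolate the teacher mixing ratio (assumed schedule shape)."""
    if total_iterations <= 1:
        return end
    frac = min(max(iteration / (total_iterations - 1), 0.0), 1.0)
    return start + (end - start) * frac


def choose_rollout_action(teacher_action, student_action, beta, rng=random):
    """Execute the teacher's action with probability beta, else the student's."""
    return teacher_action if rng.random() < beta else student_action
```

Lowering the ratio over time exposes the student to more of its own state distribution, which is the reason DAgger aggregates data across iterations rather than cloning teacher rollouts once.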
+### Key Innovations
+- **FOV Randomization**: Sampling `fov_keep_ratio ∈ [0.35, 1.0]` during training reduces overfitting to a single sensor field of view.
+- **Hardcase Curriculum**: Failure trajectories are mined, and the policy is retrained with 35% of resets drawn from these hard cases.
+- **Symmetry Augmentation**: A mirror transform applied to 50% of samples counteracts left/right bias.
+- **Runtime Safety Layer**: Distance-based velocity scaling at runtime adds a collision-avoidance safeguard.
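The symmetry augmentation can be sketched for the 82-dim student observation as follows. The sketch assumes a frame with x forward and y to the left, depth rays ordered from one side of the field of view to the other, and the observation layout listed under Observation Spaces. These conventions are assumptions, and `mirror_obs`, `mirror_action`, and `maybe_mirror` are hypothetical names.

```python
import random

N_RAYS = 72


def mirror_obs(obs):
    """Reflect a student observation across the robot's forward axis."""
    obs = list(obs)
    rays = obs[:N_RAYS][::-1]            # reverse ray order: left and right swap
    vx, vy, w = obs[72:75]
    gx, gy, dist, ang = obs[75:79]
    pvx, pvy, pw = obs[79:82]
    # Lateral and angular quantities change sign; forward ones and distance do not.
    return rays + [vx, -vy, -w, gx, -gy, dist, -ang, pvx, -pvy, -pw]


def mirror_action(action):
    """Reflect a [vx, vy, omega] command the same way."""
    vx, vy, w = action
    return [vx, -vy, -w]


def maybe_mirror(obs, action, p=0.5, rng=random):
    """Apply the mirror transform to a sample with probability p."""
    if rng.random() < p:
        return mirror_obs(obs), mirror_action(action)
    return list(obs), list(action)
```

Applying the transform twice returns the original sample, which is a quick sanity check that the sign conventions are self-consistent.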
+### Performance
+| Scenario | Success Rate | Collision Rate |
+|----------|-------------|----------------|
+| Deploy (mild noise) | 75.0% | 25.0% |
+| Stress (heavy noise) | 75.2% | 24.8% |
+| Wide-FOV Clean | 74.4% | 25.6% |
+### Real Robot Validation
+| Direction | Target | Final Distance | Result |
+|-----------|--------|----------------|--------|
+| Forward | 2.0m | 0.26m | ✅ SUCCESS |
+| Backward | -1.5m | 0.26m | ✅ SUCCESS |
+| Left | 1.5m | 0.31m | ✅ SUCCESS |
+| Right | -2.0m | 0.26m | ✅ SUCCESS |
+| Diagonal | (1.5, 1.5)m | 0.30m | ✅ SUCCESS |
+## Usage
+```python
+import torch
+# Load student policy (for deployment)
+student = torch.jit.load("student_policy.pt")
+student.eval()
+# Prepare observation (82-dim)
+obs = torch.zeros(1, 82)  # [depth_rays(72), vel(3), goal_rel(2), goal_dist_angle(2), prev_action(3)]
+# Get action
+with torch.no_grad():
+    action = student(obs)  # shape (1, 3): [vx, vy, omega]
+```
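The runtime safety layer listed under Key Innovations is described only as distance-based velocity scaling. One way such a layer could post-process the policy output is sketched below. The thresholds `stop_dist` and `slow_dist`, the linear ramp, and the choice to leave the yaw rate unscaled are all assumptions for illustration and are not taken from the repository.

```python
def safety_scale(action, min_obstacle_dist, stop_dist=0.3, slow_dist=1.0):
    """Scale linear velocity by proximity to the nearest obstacle (hypothetical thresholds)."""
    if min_obstacle_dist <= stop_dist:
        scale = 0.0                      # too close: stop translating
    elif min_obstacle_dist >= slow_dist:
        scale = 1.0                      # far enough: pass the command through
    else:
        scale = (min_obstacle_dist - stop_dist) / (slow_dist - stop_dist)
    vx, vy, omega = action
    # Yaw is left unscaled in this sketch so the robot can still turn away.
    return [vx * scale, vy * scale, omega]


safe_far = safety_scale([1.0, 0.5, 0.2], min_obstacle_dist=2.0)
safe_near = safety_scale([1.0, 0.5, 0.2], min_obstacle_dist=0.1)
```

In a deployment loop, `min_obstacle_dist` could be taken as the minimum of the 72 depth rays, and `action` as the three values read from the policy output tensor.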
+## Training Environment
+- **Simulator**: NVIDIA IsaacLab (Isaac Sim 4.5)
+- **Arena**: 8m × 8m with 24-32 cylindrical obstacles
+- **Control Rate**: 10 Hz policy / 50 Hz physics
+- **Robot**: Unitree G1 (capsule proxy in sim, full robot for real deployment)
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{g1-navigation-dagger-2026,
+  title={Teacher-Student Distillation via DAgger for Sim-to-Real Navigation on the Unitree G1},
+  author={Adjimavo},
+  year={2026},
+  url={https://huggingface.co/Adjimavo/g1-navigation-dagger}
+}
+```
+## License
+Apache 2.0