Adjimavo committed
Commit d53bd47 · verified · 1 Parent(s): 3b27f78

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +130 -0
README.md ADDED
@@ -0,0 +1,130 @@
+ ---
+ license: apache-2.0
+ tags:
+ - robotics
+ - navigation
+ - reinforcement-learning
+ - imitation-learning
+ - dagger
+ - humanoid
+ - unitree-g1
+ - isaaclab
+ - sim-to-real
+ language:
+ - en
+ pipeline_tag: reinforcement-learning
+ ---
+
+ # G1 Navigation Policy (DAgger Distillation)
+
+ Vision-based navigation policies for the **Unitree G1 humanoid robot**, trained using teacher-student DAgger distillation.
+
+ ## Model Description
+
+ This repository contains two TorchScript (PyTorch) models:
+
+ | Model | File | Input Dim | Output Dim | Description |
+ |-------|------|-----------|------------|-------------|
+ | **Student** | `student_policy.pt` | 82 | 3 | Deployable policy using depth rays from monocular depth estimation |
+ | **Teacher** | `teacher_policy.pt` | 34 | 3 | Privileged policy with ground-truth obstacle positions |
+
+ ### Architecture
+
+ Both policies are MLPs with three hidden layers (256, 128, and 64 units) and ELU activations:
+ - **Student**: [82] → 256 → 128 → 64 → [3]
+ - **Teacher**: [34] → 256 → 128 → 64 → [3]
+
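As a quick sanity check on the layer sizes above, the parameter count of each network can be worked out by hand. The sketch below assumes plain fully connected layers with a bias term on every layer and no normalization layers, which the README does not state; `mlp_param_count` is a hypothetical helper, not part of the released code.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    Assumes every layer is a dense layer with a bias term (an assumption,
    not something the README confirms).
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


STUDENT_LAYERS = [82, 256, 128, 64, 3]
TEACHER_LAYERS = [34, 256, 128, 64, 3]

student_params = mlp_param_count(STUDENT_LAYERS)
teacher_params = mlp_param_count(TEACHER_LAYERS)
print(student_params, teacher_params)
```

Under these assumptions the student has 62,595 parameters and the teacher 50,307. The difference comes entirely from the first layer, since the hidden and output layers are identical.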
+ ### Observation Spaces
+
+ **Student (82-dim):**
+ - Depth rays: 72 dims (±70° FOV, corrupted with noise/dropout)
+ - Robot velocity (vx, vy, ω): 3 dims
+ - Goal relative position: 2 dims
+ - Goal distance & angle: 2 dims
+ - Previous action: 3 dims
+
+ **Teacher (34-dim):**
+ - Nearest obstacles: 8 × 3 = 24 dims (x, y, distance)
+ - Robot velocity: 3 dims
+ - Goal relative position, distance, and angle: 4 dims
+ - Previous action: 3 dims
+
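The student layout above can be assembled into a flat vector as sketched below. The field order follows the comment in the README's usage snippet; computing goal distance and angle from the relative position with `hypot` and `atan2` is an assumption, as are the units and any normalization of the depth rays. `build_student_obs` is a hypothetical helper.

```python
import math

N_RAYS = 72  # number of depth rays listed above


def build_student_obs(depth_rays, velocity, goal_xy, prev_action):
    """Concatenate the student inputs into one flat 82-element list.

    Order: depth rays, velocity, goal relative position,
    goal distance/angle, previous action.
    """
    if len(depth_rays) != N_RAYS:
        raise ValueError(f"expected {N_RAYS} depth rays, got {len(depth_rays)}")
    if len(velocity) != 3 or len(prev_action) != 3 or len(goal_xy) != 2:
        raise ValueError("velocity and prev_action need 3 values, goal_xy needs 2")
    gx, gy = goal_xy
    goal_dist = math.hypot(gx, gy)   # assumption: Euclidean distance to goal
    goal_angle = math.atan2(gy, gx)  # assumption: bearing in the robot frame
    return (list(depth_rays) + list(velocity) + [gx, gy]
            + [goal_dist, goal_angle] + list(prev_action))


obs = build_student_obs([5.0] * N_RAYS, [0.0, 0.0, 0.0], (3.0, 4.0), [0.0, 0.0, 0.0])
print(len(obs), obs[77], obs[78])
```

The length check is the useful part in practice: 72 + 3 + 2 + 2 + 3 = 82 matches the student input dimension, and the teacher layout sums the same way to 24 + 3 + 4 + 3 = 34.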
+ ### Action Space
+
+ Velocity commands: `[vx, vy, ω]`
+ - `vx ∈ [-0.6, 1.0]` m/s (forward/backward)
+ - `vy ∈ [-0.5, 0.5]` m/s (lateral)
+ - `ω ∈ [-1.57, 1.57]` rad/s (yaw rate)
+
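The README does not say whether the network output is already bounded to these ranges, so a defensive clamp before sending commands to the robot is a reasonable precaution. A minimal sketch, with `clip_action` as a hypothetical helper:

```python
# Per-axis limits copied from the action space listed above.
ACTION_LIMITS = (
    (-0.6, 1.0),    # vx, m/s
    (-0.5, 0.5),    # vy, m/s
    (-1.57, 1.57),  # omega, rad/s
)


def clip_action(action):
    """Clamp a [vx, vy, omega] command to the documented ranges."""
    if len(action) != len(ACTION_LIMITS):
        raise ValueError("action must have exactly 3 elements")
    return [min(max(value, low), high)
            for value, (low, high) in zip(action, ACTION_LIMITS)]


clipped = clip_action([1.4, -0.2, -3.0])
print(clipped)  # [1.0, -0.2, -1.57]
```

Note that the forward range is asymmetric: the robot may drive forward at up to 1.0 m/s but backward at only 0.6 m/s.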
+ ## Training Details
+
+ ### Two-Stage Pipeline
+
+ 1. **Stage 1: Teacher PPO** - Train the privileged teacher on ground-truth obstacle positions using PPO (2000 iterations)
+ 2. **Stage 2: DAgger Distillation** - Distill the teacher into the student using Dataset Aggregation (DAgger), with the teacher mixing ratio decayed from 70% to 20%
+
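The teacher mixing decay in Stage 2 can be pictured as a schedule for the probability of executing the teacher's action during data collection. The sketch below assumes a linear decay and per-step mixing; the README only gives the two endpoints (70% and 20%), so the shape of the schedule, the function names, and the six-iteration example are all hypothetical.

```python
import random


def teacher_mix_prob(iteration, total_iterations, start=0.7, end=0.2):
    """Probability of executing the teacher's action at a DAgger iteration.

    Linear interpolation from `start` to `end`; the linear shape is an
    assumption, only the two endpoints come from the README.
    """
    if total_iterations <= 1:
        return end
    frac = min(max(iteration / (total_iterations - 1), 0.0), 1.0)
    return start * (1.0 - frac) + end * frac


def choose_action(teacher_action, student_action, beta, rng=random):
    """Execute the teacher's action with probability beta, else the student's.

    In DAgger the visited state is labeled with the teacher's action
    regardless of which action was executed.
    """
    return teacher_action if rng.random() < beta else student_action


schedule = [teacher_mix_prob(i, 6) for i in range(6)]
print([round(p, 2) for p in schedule])
```

Lowering the mixing probability over time exposes the student to more of its own state distribution, which is the point of DAgger compared with plain behavior cloning.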
+ ### Key Innovations
+
+ - **FOV Randomization**: Sampling `fov_keep_ratio ∈ [0.35, 1.0]` reduces overfitting to a single sensor field of view
+ - **Hardcase Curriculum**: Mine failure trajectories, then retrain with 35% of resets drawn from these hard cases
+ - **Symmetry Augmentation**: A mirror transform applied with 50% probability removes left/right bias
+ - **Runtime Safety Layer**: Distance-based velocity scaling for collision avoidance
+
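Two of the augmentations above can be sketched on the 82-dim student observation. This is an illustration under several assumptions that the README does not confirm: the rays are ordered symmetrically across the field of view, the robot frame has x forward and y to the side, FOV randomization keeps the central fraction of rays, and dropped rays are filled with a maximum-range value (`max_range=5.0` is a made-up number). All function names are hypothetical.

```python
import random

N_RAYS = 72


def mirror_student_sample(obs, action):
    """Reflect an (observation, action) pair across the robot's forward axis."""
    rays = list(obs[:N_RAYS])[::-1]  # swap left and right rays
    vx, vy, wz = obs[72:75]
    gx, gy = obs[75:77]
    dist, angle = obs[77:79]
    pvx, pvy, pwz = obs[79:82]
    # Lateral and rotational quantities change sign; the rest are unchanged.
    mirrored_obs = rays + [vx, -vy, -wz, gx, -gy, dist, -angle, pvx, -pvy, -pwz]
    mirrored_action = [action[0], -action[1], -action[2]]
    return mirrored_obs, mirrored_action


def apply_fov_keep_ratio(rays, keep_ratio, fill_value):
    """Keep the central fraction of rays and overwrite the rest with fill_value."""
    n = len(rays)
    keep = max(1, round(n * keep_ratio))
    start = (n - keep) // 2
    return [r if start <= i < start + keep else fill_value
            for i, r in enumerate(rays)]


def augment(obs, action, rng=random, max_range=5.0):
    """Randomly narrow the FOV, then mirror the sample half of the time."""
    ratio = rng.uniform(0.35, 1.0)
    obs = apply_fov_keep_ratio(obs[:N_RAYS], ratio, max_range) + list(obs[N_RAYS:])
    if rng.random() < 0.5:
        obs, action = mirror_student_sample(obs, action)
    return obs, list(action)
```

Mirroring the action label together with the observation matters: flipping only the observation would teach the student to steer toward obstacles on the reflected side.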
+ ### Performance
+
+ | Scenario | Success Rate | Collision Rate |
+ |----------|-------------|----------------|
+ | Deploy (mild noise) | 75.0% | 25.0% |
+ | Stress (heavy noise) | 75.2% | 24.8% |
+ | Wide-FOV Clean | 74.4% | 25.6% |
+
+ ### Real Robot Validation
+
+ | Direction | Target | Final Distance | Result |
+ |-----------|--------|----------------|--------|
+ | Forward | 2.0 m | 0.26 m | ✅ SUCCESS |
+ | Backward | -1.5 m | 0.26 m | ✅ SUCCESS |
+ | Left | 1.5 m | 0.31 m | ✅ SUCCESS |
+ | Right | -2.0 m | 0.26 m | ✅ SUCCESS |
+ | Diagonal | (1.5, 1.5) m | 0.30 m | ✅ SUCCESS |
+
+ ## Usage
+
+ ```python
+ import torch
+
+ # Load student policy (for deployment)
+ student = torch.jit.load("student_policy.pt")
+ student.eval()
+
+ # Prepare observation (82-dim)
+ obs = torch.zeros(1, 82)  # [depth_rays(72), vel(3), goal_rel(2), goal_dist_angle(2), prev_action(3)]
+
+ # Get action (no gradients needed at inference time)
+ with torch.no_grad():
+     action = student(obs)  # shape (1, 3): [vx, vy, omega]
+ ```
+
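The runtime safety layer listed under Key Innovations is described only as distance-based velocity scaling. One way such a post-processing step could look is sketched below: the linear velocity is ramped down as the nearest obstacle gets closer, while the yaw rate is left untouched. The thresholds `stop_dist` and `slow_dist`, the linear ramp, the choice to leave yaw unscaled, and the assumption that depth rays are in metres are all hypothetical and not taken from the released code.

```python
def safety_scale(action, min_obstacle_dist, stop_dist=0.3, slow_dist=1.0):
    """Scale linear velocity by proximity to the nearest obstacle.

    Returns the command unchanged beyond slow_dist, zero linear velocity
    inside stop_dist, and a linear ramp in between. Yaw rate is left
    untouched so the robot can still turn away. Both thresholds are
    hypothetical values.
    """
    if slow_dist <= stop_dist:
        raise ValueError("slow_dist must be greater than stop_dist")
    ramp = (min_obstacle_dist - stop_dist) / (slow_dist - stop_dist)
    scale = min(max(ramp, 0.0), 1.0)
    vx, vy, omega = action
    return [vx * scale, vy * scale, omega]


depth_rays = [2.0] * 70 + [0.65, 0.8]  # placeholder ray distances in metres
raw_action = [0.8, 0.1, 0.4]           # stand-in for the policy output
safe_action = safety_scale(raw_action, min(depth_rays))
print(safe_action)
```

With the placeholder values, the nearest ray at 0.65 m sits halfway between the two thresholds, so the linear velocity is roughly halved.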
+ ## Training Environment
+
+ - **Simulator**: NVIDIA IsaacLab (Isaac Sim 4.5)
+ - **Arena**: 8 m × 8 m with 24-32 cylindrical obstacles
+ - **Control Rate**: 10 Hz policy / 50 Hz physics
+ - **Robot**: Unitree G1 (capsule proxy in sim, full robot for real deployment)
+
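The control rates above imply that each policy action spans five physics steps. The loop below illustrates that relationship, assuming the action is held constant between policy updates (a zero-order hold, which the README does not state). `run_episode`, `policy_fn`, and `physics_step_fn` are hypothetical stand-ins, not IsaacLab API.

```python
PHYSICS_HZ = 50
POLICY_HZ = 10
DECIMATION = PHYSICS_HZ // POLICY_HZ  # physics steps per policy action


def run_episode(policy_fn, physics_step_fn, n_policy_steps):
    """Hold each policy action constant for DECIMATION physics steps."""
    physics_steps = 0
    for _ in range(n_policy_steps):
        action = policy_fn()
        for _ in range(DECIMATION):
            physics_step_fn(action)
            physics_steps += 1
    return physics_steps


applied = []
total = run_episode(lambda: [0.5, 0.0, 0.0], applied.append, n_policy_steps=20)
print(DECIMATION, total, total / PHYSICS_HZ)  # 5 100 2.0
```

Twenty policy steps therefore cover 100 physics steps, or 2 seconds of simulated time.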
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{g1-navigation-dagger-2026,
+   title={Teacher-Student Distillation via DAgger for Sim-to-Real Navigation on the Unitree G1},
+   author={Adjimavo},
+   year={2026},
+   url={https://huggingface.co/Adjimavo/g1-navigation-dagger}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0