	---
	license: apache-2.0
	tags:
	- robotics
	- navigation
	- reinforcement-learning
	- imitation-learning
	- dagger
	- humanoid
	- unitree-g1
	- isaaclab
	- sim-to-real
	language:
	- en
	pipeline_tag: reinforcement-learning
	---

	# G1 Navigation Policy (DAgger Distillation)

	Vision-based navigation policies for the Unitree G1 humanoid robot, trained using teacher-student DAgger distillation.

	## Model Description

	This repository contains two PyTorch TorchScript models:

	| Model | File | Input Dim | Output Dim | Description |
	|-------|------|-----------|------------|-------------|
	| Student | `student_policy.pt` | 82 | 3 | Deployable policy using depth rays from monocular depth estimation |
	| Teacher | `teacher_policy.pt` | 34 | 3 | Privileged policy with ground-truth obstacle positions |

	### Architecture

	Both policies are MLPs with three hidden layers (256, 128, and 64 units) and ELU activations:
	- Student: [82] → 256 → 128 → 64 → [3]
	- Teacher: [34] → 256 → 128 → 64 → [3]
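
As a rough size check, the layer widths above imply the parameter counts computed below. This is a minimal sketch that assumes plain fully connected layers with a bias term on every layer, which the model card does not state; `mlp_param_count` is a hypothetical helper, not part of the released code.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected MLP.

    Assumption: every layer has a bias term (not confirmed by the model card).
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


STUDENT_LAYERS = [82, 256, 128, 64, 3]
TEACHER_LAYERS = [34, 256, 128, 64, 3]

student_params = mlp_param_count(STUDENT_LAYERS)
teacher_params = mlp_param_count(TEACHER_LAYERS)
print(student_params, teacher_params)  # 62595 50307
```

Under that assumption both networks stay well below 100k parameters, and the only difference between them is the width of the first layer.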

	### Observation Spaces

	Student (82-dim):
	- Depth rays: 72 dims (±70° FOV, corrupted with noise/dropout)
	- Robot velocity (vx, vy, ω): 3 dims
	- Goal relative position: 2 dims
	- Goal distance & angle: 2 dims
	- Previous action: 3 dims

	Teacher (34-dim):
	- Nearest obstacles: 8 × 3 = 24 dims (x, y, distance)
	- Robot velocity: 3 dims
	- Goal relative + distance/angle: 4 dims
	- Previous action: 3 dims
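
The two layouts above can be assembled as in the following sketch. The student ordering follows the comment in the usage example; the teacher ordering simply follows the list above. The robot-frame convention, the use of `atan2` for the goal angle, and the `pad_xy` placeholder for scenes with fewer than 8 obstacles are assumptions, and all function names are hypothetical.

```python
import math

NUM_RAYS = 72       # depth rays in the student observation
NUM_OBSTACLES = 8   # nearest obstacles in the teacher observation


def goal_features(goal_xy):
    """Goal relative position plus distance and bearing (4 values)."""
    gx, gy = goal_xy
    return [gx, gy, math.hypot(gx, gy), math.atan2(gy, gx)]


def build_student_obs(depth_rays, velocity, goal_xy, prev_action):
    """Concatenate rays, velocity, goal features, previous action (82 values)."""
    assert len(depth_rays) == NUM_RAYS
    obs = (list(depth_rays) + list(velocity)
           + goal_features(goal_xy) + list(prev_action))
    assert len(obs) == 82
    return obs


def build_teacher_obs(obstacles_xy, velocity, goal_xy, prev_action,
                      pad_xy=(10.0, 10.0)):
    """Encode the 8 nearest obstacles as (x, y, distance) triples (34 values).

    pad_xy is a hypothetical far-away placeholder, not a documented value.
    """
    nearest = sorted(obstacles_xy, key=lambda p: math.hypot(p[0], p[1]))
    nearest = nearest[:NUM_OBSTACLES]
    nearest += [pad_xy] * (NUM_OBSTACLES - len(nearest))
    obstacle_part = [v for (x, y) in nearest for v in (x, y, math.hypot(x, y))]
    obs = (obstacle_part + list(velocity)
           + goal_features(goal_xy) + list(prev_action))
    assert len(obs) == 34
    return obs
```

The length assertions catch layout mistakes early, since a TorchScript model would otherwise fail only at inference time with a shape error.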

	### Action Space

	Velocity commands: `[vx, vy, ω]`
	- `vx ∈ [-0.6, 1.0]` m/s (forward/backward)
	- `vy ∈ [-0.5, 0.5]` m/s (lateral)
	- `ω ∈ [-1.57, 1.57]` rad/s (yaw rate)
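
The model card does not say whether the exported policies already bound their outputs, so a deployment script may want to clamp commands to the ranges above before sending them to the robot. A minimal sketch, with `clip_action` as a hypothetical helper:

```python
# Bounds copied from the action space listed above.
ACTION_BOUNDS = (
    (-0.6, 1.0),    # vx, m/s
    (-0.5, 0.5),    # vy, m/s
    (-1.57, 1.57),  # omega, rad/s
)


def clip_action(action):
    """Clamp [vx, vy, omega] component-wise to ACTION_BOUNDS."""
    return [min(max(value, low), high)
            for value, (low, high) in zip(action, ACTION_BOUNDS)]


print(clip_action([2.0, -1.0, 0.3]))  # [1.0, -0.5, 0.3]
```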

	## Training Details

	### Two-Stage Pipeline

	1. Stage 1: Teacher PPO. Train the privileged teacher on ground-truth obstacle positions with PPO (2000 iterations).
	2. Stage 2: DAgger Distillation. Distill the teacher into the student with Dataset Aggregation (DAgger), decaying the teacher mixing ratio from 70% to 20%.
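
The mixing decay in Stage 2 can be sketched as below. This assumes the ratio is the probability of executing the teacher's action during a rollout (the usual DAgger β) and that it decays linearly; the model card gives only the 70% and 20% endpoints, so the schedule shape and both function names are hypothetical.

```python
import random

BETA_START = 0.70  # teacher mixing ratio at the first iteration
BETA_END = 0.20    # teacher mixing ratio at the last iteration


def teacher_mix_ratio(iteration, total_iterations):
    """Linearly interpolate from BETA_START to BETA_END (assumed shape)."""
    progress = iteration / max(total_iterations - 1, 1)
    progress = min(max(progress, 0.0), 1.0)
    return BETA_START + (BETA_END - BETA_START) * progress


def select_rollout_action(teacher_action, student_action, beta, rng=random):
    """Execute the teacher action with probability beta, else the student's.

    In DAgger the teacher action is still stored as the supervised label
    for every visited state, regardless of which action was executed.
    """
    return teacher_action if rng.random() < beta else student_action
```

Early iterations therefore visit mostly teacher-driven states, while later ones expose the student to the states its own actions lead to.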

	### Key Innovations

	- FOV Randomization: `fov_keep_ratio ∈ [0.35, 1.0]` prevents sensor overfitting
	- Hardcase Curriculum: Mine failure trajectories, retrain with 35% hardcase resets
	- Symmetry Augmentation: 50% mirror transform eliminates left/right bias
	- Runtime Safety Layer: Distance-based velocity scaling for collision avoidance
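
Three of the techniques above can be illustrated as follows. This is a sketch under several assumptions not confirmed by the model card: the 72 rays are ordered across a FOV that is symmetric about the forward axis, masked rays are overwritten with a hypothetical `fill_value`, and the safety thresholds `stop_distance` and `slow_distance` are placeholder values. All function names are hypothetical.

```python
def apply_fov_keep_ratio(depth_rays, keep_ratio, fill_value=0.0):
    """Keep the central fraction of rays and overwrite the rest."""
    n = len(depth_rays)
    keep = max(1, round(n * keep_ratio))
    start = (n - keep) // 2
    return [d if start <= i < start + keep else fill_value
            for i, d in enumerate(depth_rays)]


def mirror_student_obs(obs):
    """Reflect an 82-dim student observation about the forward axis."""
    rays, rest = list(obs[:72]), obs[72:]
    vx, vy, wz, gx, gy, dist, angle, avx, avy, awz = rest
    # Reverse the ray order and flip the sign of every lateral/yaw quantity.
    return rays[::-1] + [vx, -vy, -wz, gx, -gy, dist, -angle, avx, -avy, -awz]


def mirror_action(action):
    """Mirror [vx, vy, omega] to match a mirrored observation."""
    vx, vy, wz = action
    return [vx, -vy, -wz]


def safety_scale(min_distance, stop_distance=0.3, slow_distance=1.0):
    """Velocity scale in [0, 1]: 0 at stop_distance, 1 at slow_distance."""
    ratio = (min_distance - stop_distance) / (slow_distance - stop_distance)
    return min(max(ratio, 0.0), 1.0)
```

With 50% mirroring, roughly half of the training samples would pass through `mirror_student_obs` together with `mirror_action` on the label, so each left turn in the data has a matching right turn.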

	### Performance

	| Scenario | Success Rate | Collision Rate |
	|----------|--------------|----------------|
	| Deploy (mild noise) | 75.0% | 25.0% |
	| Stress (heavy noise) | 75.2% | 24.8% |
	| Wide-FOV Clean | 74.4% | 25.6% |

	### Real Robot Validation

	| Direction | Target | Final Distance | Result |
	|-----------|--------|----------------|--------|
	| Forward | 2.0 m | 0.26 m | ✅ SUCCESS |
	| Backward | -1.5 m | 0.26 m | ✅ SUCCESS |
	| Left | 1.5 m | 0.31 m | ✅ SUCCESS |
	| Right | -2.0 m | 0.26 m | ✅ SUCCESS |
	| Diagonal | (1.5, 1.5) m | 0.30 m | ✅ SUCCESS |

	## Usage

	```python
	import torch

	# Load student policy (for deployment)
	student = torch.jit.load("student_policy.pt")
	student.eval()

	# Prepare observation (82-dim)
	obs = torch.zeros(1, 82) # [depth_rays(72), vel(3), goal_rel(2), goal_dist_angle(2), prev_action(3)]

	# Get action
	with torch.no_grad():
	    action = student(obs)  # shape (1, 3): [vx, vy, omega]
	```

	## Training Environment

	- Simulator: NVIDIA IsaacLab (Isaac Sim 4.5)
	- Arena: 8m × 8m with 24-32 cylindrical obstacles
	- Control Rate: 10 Hz policy / 50 Hz physics
	- Robot: Unitree G1 (capsule proxy in sim, full robot for real deployment)
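
The control rates above imply five physics steps per policy query. The loop below sketches that timing, assuming each action is held constant between policy queries (the model card does not state this); `policy`, `observe`, and `step_physics` are hypothetical callables standing in for the model, the sensor pipeline, and the simulator.

```python
POLICY_HZ = 10
PHYSICS_HZ = 50
DECIMATION = PHYSICS_HZ // POLICY_HZ  # physics steps per policy action


def run_episode(policy, observe, step_physics, num_policy_steps):
    """Query the policy at 10 Hz and hold each action for DECIMATION steps."""
    prev_action = [0.0, 0.0, 0.0]
    for _policy_step in range(num_policy_steps):
        # The previous action is part of the observation, so feed it back in.
        action = policy(observe(prev_action))
        for _physics_step in range(DECIMATION):
            step_physics(action, 1.0 / PHYSICS_HZ)
        prev_action = action
    return prev_action
```

For example, a one-second rollout corresponds to `num_policy_steps=10` and 50 calls to `step_physics`.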

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{g1-navigation-dagger-2026,
	  title={Teacher-Student Distillation via DAgger for Sim-to-Real Navigation on the Unitree G1},
	  author={Adjimavo},
	  year={2026},
	  url={https://huggingface.co/Adjimavo/g1-navigation-dagger}
	}
	```

	## License

	Apache 2.0