---
license: other
license_name: nvidia-open-model-license
license_link: https://developer.nvidia.com/open-model-license
language:
- en
library_name: transformers
tags:
- robotics
- vision-language-action
- manipulation
- gr00t
- nvidia
- physical-ai
- humanoid
- reachy2
- lerobot
datasets:
- ganatrask/NOVA
base_model:
- nvidia/GR00T-N1.6-3B
pipeline_tag: robotics
---
# NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2
<p align="center">
<img src="https://img.shields.io/badge/NVIDIA-GR00T%20N1.6-76B900?style=for-the-badge&logo=nvidia" alt="GR00T N1.6"/>
<img src="https://img.shields.io/badge/Robot-Reachy%202-0066CC?style=for-the-badge" alt="Reachy 2"/>
<img src="https://img.shields.io/badge/Task-Pick%20%26%20Place-green?style=for-the-badge" alt="Pick & Place"/>
</p>
**NOVA** (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for [Pollen Robotics' Reachy 2](https://www.pollen-robotics.com/reachy/) humanoid robot.
## Model Description
This model is part of an end-to-end Physical AI pipeline that combines:
- **Voice Input**: Parakeet CTC 0.6B for speech-to-text
- **Scene Reasoning**: Cosmos Reason 2 for object detection and spatial understanding
- **Action Policy**: This fine-tuned GR00T N1.6 model for manipulation
### Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) |
| **Parameters** | ~3B |
| **Embodiment** | Reachy 2 (custom embodiment tag) |
| **Action Space** | 8-DOF (7 arm joints + gripper) |
| **Training Steps** | 30,000 |
| **Final Loss** | ~0.008-0.01 |
### Action Space
```python
action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       # -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     # -45° to 45°
    wrist_yaw,       # -30° to 30°
    gripper,         # 0 (closed) to 1 (open)
]
```
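The ranges above can be used to sanity-check or clamp a predicted action before it is sent to the robot. The following is a minimal sketch, assuming the action is an 8-element sequence in the order listed, with joint values in degrees. `ACTION_LIMITS` and `clamp_action` are hypothetical names for illustration; they are not part of GR00T or the Reachy 2 SDK.

```python
# Limits copied from the action-space listing above.
# Joint values are assumed to be in degrees; the gripper is unitless.
ACTION_LIMITS = [
    ("shoulder_pitch", -180.0, 90.0),
    ("shoulder_roll", -180.0, 10.0),
    ("elbow_yaw", -90.0, 90.0),
    ("elbow_pitch", -125.0, 0.0),
    ("wrist_roll", -100.0, 100.0),
    ("wrist_pitch", -45.0, 45.0),
    ("wrist_yaw", -30.0, 30.0),
    ("gripper", 0.0, 1.0),
]


def clamp_action(action):
    """Clamp each component of an 8-element action to its listed range."""
    if len(action) != len(ACTION_LIMITS):
        raise ValueError(f"expected {len(ACTION_LIMITS)} values, got {len(action)}")
    return [
        min(max(float(value), low), high)
        for value, (_, low, high) in zip(action, ACTION_LIMITS)
    ]


# Example: out-of-range values are pulled back to the nearest limit.
clamped = clamp_action([100, 20, 0, -130, 0, 50, -40, 1.5])
print(clamped)
```

Clamping is only one possible policy; rejecting out-of-range actions outright may be safer on real hardware.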
## Intended Use
This model is designed for:
- **Pick-and-place manipulation** tasks on the Reachy 2 robot
- **Language-conditioned control** ("Pick up the red cube")
- **Research** in vision-language-action models and robotic manipulation
### Supported Tasks
- Pick up objects (cube, cylinder, capsule, rectangular box)
- Place objects in target locations
- Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)
## Training
### Training Data
Trained on the [ganatrask/NOVA dataset](https://huggingface.co/datasets/ganatrask/NOVA):
- **100 episodes** of expert demonstrations
- **32 task variations** (4 objects × 8 colors)
- Domain randomization (position, lighting, camera jitter)
- LeRobot v2.1 format
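The 32 task variations are the cross product of the four object shapes and eight colors listed under Supported Tasks. As a rough sketch, they can be enumerated as below. The prompt template is an assumption modeled on the "Pick up the red cube" example; the dataset's actual task descriptions may be worded differently.

```python
from itertools import product

# Object shapes and colors as listed in this model card.
OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = ["red", "green", "blue", "yellow", "cyan", "magenta", "orange", "purple"]


def task_prompts():
    """Return one language instruction per (object, color) pair."""
    # Hypothetical template; not confirmed against the dataset annotations.
    return [f"Pick up the {color} {obj}" for obj, color in product(OBJECTS, COLORS)]


prompts = task_prompts()
print(len(prompts))  # 4 objects x 8 colors = 32
```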
### Training Configuration
| Parameter | Value |
|-----------|-------|
| GPU Model | NVIDIA A100-SXM4-80GB |
| GPU Count | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |
### Training Command
```bash
python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
```
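For a quick sense of what these settings imply, the arithmetic below derives the checkpoint schedule and the number of training samples processed. It is a sketch under two assumptions not confirmed by this card: that a checkpoint is written every `save_steps` steps, including the final step, and that `--batch_size` is the global batch size rather than a per-GPU value.

```python
# Values taken from the training command above.
MAX_STEPS = 30_000
SAVE_STEPS = 3_000
BATCH_SIZE = 64  # assumed to be the global batch size

# Steps at which a checkpoint would be saved under the stated assumption.
checkpoint_steps = list(range(SAVE_STEPS, MAX_STEPS + 1, SAVE_STEPS))

# Total samples drawn over the whole run (with repetition across epochs).
samples_seen = MAX_STEPS * BATCH_SIZE

print(len(checkpoint_steps), checkpoint_steps[0], checkpoint_steps[-1])
print(samples_seen)
```

With only 100 episodes in the dataset, this sample count implies many passes over the same demonstrations.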
## Usage
### Prerequisites
You need to apply a patch to Isaac-GR00T to add the Reachy 2 embodiment tag:
```bash
cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch
```
### Inference
```python
import importlib.util

from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy

# Load modality config first
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py",
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load policy
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or local checkpoint path
    device="cuda",
    strict=True,
)

# Run inference.
# `image` is an (H, W, 3) front-camera frame and `joints` is a length-7 array
# of arm joint positions; both must be supplied by the caller.
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
    "state": {"arm_joints": joints[None, None, :]},  # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
```
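The observation dictionary above expects a leading batch axis and a time axis on both the image and the joint state. A small helper can add those axes and catch shape mistakes early. This is a minimal sketch: `build_observation` is a hypothetical name, the dictionary keys are taken from the snippet above, and the `uint8` image / `float32` state dtypes are assumptions rather than documented requirements.

```python
import numpy as np


def build_observation(image, joints, instruction):
    """Wrap a single frame and joint reading into the (1, 1, ...) layout."""
    image = np.asarray(image)
    joints = np.asarray(joints, dtype=np.float32)  # dtype is an assumption
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"expected an (H, W, 3) image, got shape {image.shape}")
    if joints.shape != (7,):
        raise ValueError(f"expected 7 arm joints, got shape {joints.shape}")
    return {
        "video": {"front_cam": image[None, None, ...]},  # (1, 1, H, W, 3)
        "state": {"arm_joints": joints[None, None, :]},  # (1, 1, 7)
        "language": {"annotation.human.task_description": [[instruction]]},
    }


# Example with placeholder data standing in for a real camera frame and joint state.
obs = build_observation(
    np.zeros((224, 224, 3), dtype=np.uint8),
    np.zeros(7),
    "Pick up the red cube",
)
print(obs["video"]["front_cam"].shape, obs["state"]["arm_joints"].shape)
```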
## Performance
| Metric | Value |
|--------|-------|
| Inference Speed | ~40 ms/step (A100) |
| VRAM Usage | ~44 GB of 80 GB |
| Training Time | ~6 hours (30K steps) |
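These figures translate into a policy query rate and an average training step time with simple arithmetic, sketched below. The numbers are the approximate values from the table, so the results are rough estimates. The rate also counts one policy call per step, which is an assumption; if each call returns a chunk of several actions, the achievable control rate would be higher.

```python
# Approximate values from the performance table above.
INFERENCE_MS_PER_STEP = 40
TRAINING_HOURS = 6
TRAINING_STEPS = 30_000

# Policy calls per second on an A100.
policy_rate_hz = 1000 / INFERENCE_MS_PER_STEP

# Average wall-clock time per optimizer step during training.
seconds_per_training_step = TRAINING_HOURS * 3600 / TRAINING_STEPS

print(f"{policy_rate_hz:.0f} Hz, {seconds_per_training_step:.2f} s/step")
```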
## Limitations
- **Simulation-trained**: Primarily trained on MuJoCo simulation data
- **Single-arm**: Currently supports right-arm manipulation only
- **Fixed camera setup**: Expects front camera input at 224×224 resolution
- **Task scope**: Optimized for pick-and-place; may not generalize to other manipulation tasks
## Ethical Considerations
- This model is intended for research use
- Human supervision is recommended for real-robot deployment
- Not intended for safety-critical applications without extensive testing
## Citation
If you use this model, please cite:
```bibtex
@misc{nova2025,
title={NOVA: Neural Open Vision Actions},
author={ganatrask},
year={2025},
publisher={HuggingFace},
url={https://huggingface.co/ganatrask/NOVA}
}
```
## Acknowledgments
- **[NVIDIA](https://developer.nvidia.com/)** - GR00T N1.6 base model
- **[Pollen Robotics](https://www.pollen-robotics.com/)** - Reachy 2 robot
- **[HuggingFace](https://huggingface.co/)** - LeRobot framework
- **[VESSL AI](https://vessl.ai/)** - GPU compute for training
## License
This model inherits the [NVIDIA Open Model License](https://developer.nvidia.com/open-model-license) from the base GR00T N1.6 model.
## Links
- **GitHub**: [ganatrask/NOVA](https://github.com/ganatrask/NOVA)
- **Dataset**: [ganatrask/NOVA](https://huggingface.co/datasets/ganatrask/NOVA)
- **Base Model**: [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B)