---
license: other
license_name: nvidia-open-model-license
license_link: https://developer.nvidia.com/open-model-license
language:
- en
library_name: transformers
tags:
- robotics
- vision-language-action
- manipulation
- gr00t
- nvidia
- physical-ai
- humanoid
- reachy2
- lerobot
datasets:
- ganatrask/NOVA
base_model:
- nvidia/GR00T-N1.6-3B
pipeline_tag: robotics
---
# NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2
**NOVA** (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for [Pollen Robotics' Reachy 2](https://www.pollen-robotics.com/reachy/) humanoid robot.
## Model Description
This model is part of an end-to-end Physical AI pipeline that combines:
- **Voice Input**: Parakeet CTC 0.6B for speech-to-text
- **Scene Reasoning**: Cosmos Reason 2 for object detection and spatial understanding
- **Action Policy**: This fine-tuned GR00T N1.6 model for manipulation
### Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) |
| **Parameters** | ~3B |
| **Embodiment** | Reachy 2 (custom embodiment tag) |
| **Action Space** | 8-DOF (7 arm joints + gripper) |
| **Training Steps** | 30,000 |
| **Final Loss** | ~0.008-0.01 |
### Action Space
```python
action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       # -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     # -45° to 45°
    wrist_yaw,       # -30° to 30°
    gripper,         # 0 (closed) to 1 (open)
]
```
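When sending predicted actions to hardware, it is worth clamping each dimension to the limits above first. A minimal sketch, assuming the joint order listed here; the `clip_action` helper and limit table are illustrative and not part of the released code:

```python
import numpy as np

# Joint limits in degrees, in the same order as the 8-DOF action vector.
# The gripper dimension is normalized to [0, 1] rather than degrees.
JOINT_LIMITS = np.array([
    [-180.0,  90.0],   # shoulder_pitch
    [-180.0,  10.0],   # shoulder_roll
    [ -90.0,  90.0],   # elbow_yaw
    [-125.0,   0.0],   # elbow_pitch
    [-100.0, 100.0],   # wrist_roll
    [ -45.0,  45.0],   # wrist_pitch
    [ -30.0,  30.0],   # wrist_yaw
    [   0.0,   1.0],   # gripper (normalized)
])

def clip_action(action: np.ndarray) -> np.ndarray:
    """Clamp a raw 8-DOF action into the robot's valid range."""
    return np.clip(action, JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1])
```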
## Intended Use
This model is designed for:
- **Pick-and-place manipulation** tasks on the Reachy 2 robot
- **Language-conditioned control** ("Pick up the red cube")
- **Research** in vision-language-action models and robotic manipulation
### Supported Tasks
- Pick up objects (cube, cylinder, capsule, rectangular box)
- Place objects in target locations
- Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)
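The 32 task variations are the cross product of the 4 object shapes and 8 colors. A small illustrative helper for enumerating them (the instruction template follows the example phrasing above; the exact wording used in the dataset annotations may differ):

```python
# 4 objects x 8 colors = 32 task variations.
OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = ["red", "green", "blue", "yellow", "cyan", "magenta", "orange", "purple"]

def task_instructions() -> list[str]:
    """Generate a language instruction for every object/color combination."""
    return [f"Pick up the {color} {obj}" for obj in OBJECTS for color in COLORS]
```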
## Training
### Training Data
Trained on the [ganatrask/NOVA dataset](https://huggingface.co/datasets/ganatrask/NOVA):
- **100 episodes** of expert demonstrations
- **32 task variations** (4 objects × 8 colors)
- Domain randomization (position, lighting, camera jitter)
- LeRobot v2.1 format
### Training Configuration
| Parameter | Value |
|-----------|-------|
| GPU | NVIDIA A100-SXM4-80GB |
| GPUs | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |
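Some back-of-envelope numbers implied by this configuration, assuming the batch size of 64 is the global batch across both GPUs:

```python
# Training accounting derived from the configuration table above.
max_steps = 30_000
batch_size = 64      # assumed global batch size
save_steps = 3_000

samples_seen = max_steps * batch_size      # total samples drawn over the run
num_checkpoints = max_steps // save_steps  # checkpoints written at save intervals

print(samples_seen, num_checkpoints)  # 1920000 10
```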
### Training Command
```bash
python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
```
## Usage
### Prerequisites
Before loading the policy, apply the provided patch to Isaac-GR00T so that the Reachy 2 embodiment tag is registered:
```bash
cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch
```
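Conceptually, the patch registers a new entry in Isaac-GR00T's `EmbodimentTag` enum. A rough sketch of the idea only; member names other than `REACHY2` are illustrative, and the real patch may touch additional files:

```python
from enum import Enum

# Sketch of the enum extension the patch performs in
# gr00t/data/embodiment_tags.py (actual file contents may differ).
class EmbodimentTag(str, Enum):
    GR1 = "gr1"          # illustrative pre-existing tag
    REACHY2 = "reachy2"  # new tag added by the patch
```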
### Inference
```python
import importlib.util

from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy

# Load the Reachy 2 modality config first; executing the module registers
# the custom embodiment's modality layout with GR00T.
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py",
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load the fine-tuned policy
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or a local checkpoint path
    device="cuda",
    strict=True,
)

# Run inference on a single observation
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3) RGB
    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
```
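The observation layout above can be wrapped in a small helper that also validates the expected shapes before inference. This `make_obs` function is an illustrative sketch, not part of the released code; the dictionary keys follow the example above:

```python
import numpy as np

def make_obs(image: np.ndarray, joints: np.ndarray, task: str) -> dict:
    """Wrap one camera frame and one joint reading into the batched
    (batch=1, time=1, ...) layout the policy expects."""
    assert image.shape == (224, 224, 3), "front camera frames are 224x224 RGB"
    assert joints.shape == (7,), "seven right-arm joint positions"
    return {
        "video": {"front_cam": image[None, None]},    # (1, 1, 224, 224, 3)
        "state": {"arm_joints": joints[None, None]},  # (1, 1, 7)
        "language": {"annotation.human.task_description": [[task]]},
    }
```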
## Performance
| Metric | Value |
|--------|-------|
| Inference Speed | ~40ms/step (A100) |
| VRAM Usage | ~44GB / 80GB |
| Training Time | ~6 hours (30K steps) |
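Assuming policy inference is the bottleneck, the ~40 ms/step latency implies a closed-loop control rate of roughly 25 Hz:

```python
# Control-rate estimate from the measured inference latency above.
latency_s = 0.040           # ~40 ms per inference step on an A100
control_hz = 1 / latency_s  # achievable closed-loop control frequency

print(control_hz)  # 25.0
```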
## Limitations
- **Simulation-trained**: Primarily trained on MuJoCo simulation data
- **Single-arm**: Currently supports right arm manipulation only
- **Fixed camera setup**: Expects front camera input at 224×224 resolution
- **Task scope**: Optimized for pick-and-place; may not generalize to other manipulation tasks
## Ethical Considerations
- This model is intended for research use
- Human supervision is recommended when deploying on a real robot
- Not intended for safety-critical applications without extensive validation
## Citation
If you use this model, please cite:
```bibtex
@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}
```
## Acknowledgments
- **[NVIDIA](https://developer.nvidia.com/)** - GR00T N1.6 base model
- **[Pollen Robotics](https://www.pollen-robotics.com/)** - Reachy 2 robot
- **[HuggingFace](https://huggingface.co/)** - LeRobot framework
- **[VESSL AI](https://vessl.ai/)** - GPU compute for training
## License
This model inherits the [NVIDIA Open Model License](https://developer.nvidia.com/open-model-license) from the base GR00T N1.6 model.
## Links
- **GitHub**: [ganatrask/NOVA](https://github.com/ganatrask/NOVA)
- **Dataset**: [ganatrask/NOVA](https://huggingface.co/datasets/ganatrask/NOVA)
- **Base Model**: [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B)