---
license: other
license_name: nvidia-open-model-license
license_link: https://developer.nvidia.com/open-model-license
language:
- en
library_name: transformers
tags:
- robotics
- vision-language-action
- manipulation
- gr00t
- nvidia
- physical-ai
- humanoid
- reachy2
- lerobot
datasets:
- ganatrask/NOVA
base_model:
- nvidia/GR00T-N1.6-3B
pipeline_tag: robotics
---
# NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2
**NOVA** (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for [Pollen Robotics' Reachy 2](https://www.pollen-robotics.com/reachy/) humanoid robot.
## Model Description
This model is part of an end-to-end Physical AI pipeline that combines:
- **Voice Input**: Parakeet CTC 0.6B for speech-to-text
- **Scene Reasoning**: Cosmos Reason 2 for object detection and spatial understanding
- **Action Policy**: This fine-tuned GR00T N1.6 model for manipulation
### Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) |
| **Parameters** | ~3B |
| **Embodiment** | Reachy 2 (custom embodiment tag) |
| **Action Space** | 8-DOF (7 arm joints + gripper) |
| **Training Steps** | 30,000 |
| **Final Loss** | ~0.008-0.01 |
### Action Space
```python
action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       # -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     # -45° to 45°
    wrist_yaw,       # -30° to 30°
    gripper,         # 0 (closed) to 1 (open)
]
```
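When sending predicted actions to hardware, it is worth clamping each dimension to the limits above first. A minimal sketch, assuming the joint order listed here; the `clip_action` helper and limit table are illustrative and not part of the released code:

```python
import numpy as np

# Joint limits in degrees, in the same order as the 8-DOF action vector.
# The gripper dimension is normalized to [0, 1] rather than degrees.
JOINT_LIMITS = np.array([
    [-180.0,  90.0],   # shoulder_pitch
    [-180.0,  10.0],   # shoulder_roll
    [ -90.0,  90.0],   # elbow_yaw
    [-125.0,   0.0],   # elbow_pitch
    [-100.0, 100.0],   # wrist_roll
    [ -45.0,  45.0],   # wrist_pitch
    [ -30.0,  30.0],   # wrist_yaw
    [   0.0,   1.0],   # gripper (normalized)
])

def clip_action(action: np.ndarray) -> np.ndarray:
    """Clamp a raw 8-DOF action into the robot's valid range."""
    return np.clip(action, JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1])
```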
## Intended Use
This model is designed for:
- **Pick-and-place manipulation** tasks on the Reachy 2 robot
- **Language-conditioned control** ("Pick up the red cube")
- **Research** in vision-language-action models and robotic manipulation
### Supported Tasks
- Pick up objects (cube, cylinder, capsule, rectangular box)
- Place objects in target locations
- Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)
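The 32 task variations are the cross product of the 4 object shapes and 8 colors. A small illustrative helper for enumerating them (the instruction template follows the example phrasing above; the exact wording used in the dataset annotations may differ):

```python
# 4 objects x 8 colors = 32 task variations.
OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = ["red", "green", "blue", "yellow", "cyan", "magenta", "orange", "purple"]

def task_instructions() -> list[str]:
    """Generate a language instruction for every object/color combination."""
    return [f"Pick up the {color} {obj}" for obj in OBJECTS for color in COLORS]
```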
## Training
### Training Data
Trained on the [ganatrask/NOVA dataset](https://huggingface.co/datasets/ganatrask/NOVA):
- **100 episodes** of expert demonstrations
- **32 task variations** (4 objects × 8 colors)
- Domain randomization (position, lighting, camera jitter)
- LeRobot v2.1 format
### Training Configuration
| Parameter | Value |
|-----------|-------|
| GPU | NVIDIA A100-SXM4-80GB |
| GPUs | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |
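Some back-of-envelope numbers implied by this configuration, assuming the batch size of 64 is the global batch across both GPUs:

```python
# Training accounting derived from the configuration table above.
max_steps = 30_000
batch_size = 64      # assumed global batch size
save_steps = 3_000

samples_seen = max_steps * batch_size      # total samples drawn over the run
num_checkpoints = max_steps // save_steps  # checkpoints written at save intervals

print(samples_seen, num_checkpoints)  # 1920000 10
```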
### Training Command
```bash
python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
```
## Usage
### Prerequisites
Before loading the policy, apply the provided patch to Isaac-GR00T so that the Reachy 2 embodiment tag is registered:
```bash
cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch
```
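Conceptually, the patch registers a new entry in Isaac-GR00T's `EmbodimentTag` enum. A rough sketch of the idea only; member names other than `REACHY2` are illustrative, and the real patch may touch additional files:

```python
from enum import Enum

# Sketch of the enum extension the patch performs in
# gr00t/data/embodiment_tags.py (actual file contents may differ).
class EmbodimentTag(str, Enum):
    GR1 = "gr1"          # illustrative pre-existing tag
    REACHY2 = "reachy2"  # new tag added by the patch
```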
### Inference
```python
import importlib.util

from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy

# Load the Reachy 2 modality config first; executing the module registers
# the custom embodiment's modality layout with GR00T.
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py",
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load the fine-tuned policy
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or a local checkpoint path
    device="cuda",
    strict=True,
)

# Run inference on a single observation
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3) RGB
    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
```
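The observation layout above can be wrapped in a small helper that also validates the expected shapes before inference. This `make_obs` function is an illustrative sketch, not part of the released code; the dictionary keys follow the example above:

```python
import numpy as np

def make_obs(image: np.ndarray, joints: np.ndarray, task: str) -> dict:
    """Wrap one camera frame and one joint reading into the batched
    (batch=1, time=1, ...) layout the policy expects."""
    assert image.shape == (224, 224, 3), "front camera frames are 224x224 RGB"
    assert joints.shape == (7,), "seven right-arm joint positions"
    return {
        "video": {"front_cam": image[None, None]},    # (1, 1, 224, 224, 3)
        "state": {"arm_joints": joints[None, None]},  # (1, 1, 7)
        "language": {"annotation.human.task_description": [[task]]},
    }
```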
## Performance
| Metric | Value |
|--------|-------|
| Inference Speed | ~40ms/step (A100) |
| VRAM Usage | ~44GB / 80GB |
| Training Time | ~6 hours (30K steps) |
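Assuming policy inference is the bottleneck, the ~40 ms/step latency implies a closed-loop control rate of roughly 25 Hz:

```python
# Control-rate estimate from the measured inference latency above.
latency_s = 0.040           # ~40 ms per inference step on an A100
control_hz = 1 / latency_s  # achievable closed-loop control frequency

print(control_hz)  # 25.0
```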
## Limitations
- **Simulation-trained**: Primarily trained on MuJoCo simulation data
- **Single-arm**: Currently supports right arm manipulation only
- **Fixed camera setup**: Expects front camera input at 224×224 resolution
- **Task scope**: Optimized for pick-and-place; may not generalize to other manipulation tasks
## Ethical Considerations
- This model is intended for research use
- Human supervision is recommended when deploying on a real robot
- Not intended for safety-critical applications without extensive validation
## Citation
If you use this model, please cite:
```bibtex
@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}
```
## Acknowledgments
- **[NVIDIA](https://developer.nvidia.com/)** - GR00T N1.6 base model
- **[Pollen Robotics](https://www.pollen-robotics.com/)** - Reachy 2 robot
- **[HuggingFace](https://huggingface.co/)** - LeRobot framework
- **[VESSL AI](https://vessl.ai/)** - GPU compute for training
## License
This model inherits the [NVIDIA Open Model License](https://developer.nvidia.com/open-model-license) from the base GR00T N1.6 model.
## Links
- **GitHub**: [ganatrask/NOVA](https://github.com/ganatrask/NOVA)
- **Dataset**: [ganatrask/NOVA](https://huggingface.co/datasets/ganatrask/NOVA)
- **Base Model**: [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B)