# Model Overview

### Description
GR00T-N1.6-Rheo-PushCart is a vision-language-action (VLA) model fine-tuned for surgical instrument transport in the Isaac for Healthcare Rheo workflow. Using a G1 embodiment, it performs push-cart behavior: it grasps the cart handle and moves a cart loaded with a sterilized tray to the surgical table.  
This model is ready for commercial/non-commercial use.

### License/Terms of Use
**Governing Terms:** Your usage of the GR00T-N1.6-Rheo-PushCart model is governed by the [NVIDIA License](https://developer.download.nvidia.cn/licenses/NVIDIA-OneWay-Noncommercial-License-22Mar2022.pdf?t=eyJscyI6ImdzZW8iLCJsc2QiOiJodHRwczovL3d3dy5nb29nbGUuY29tLyIsIm5jaWQiOiJzby15b3V0LTg3MTcwMS12dDQ4In0=).<br>
You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.

### Deployment Geography
Global

### Use Case
This model is intended for Rheo simulation workflows focused on surgical instruments transport (push cart with sterilized tray to the surgical table). It is not intended for real-world clinical deployment.

### Release Date
Hugging Face (03/10/2026) via https://huggingface.co/nvidia/GR00T-N1.6-Rheo-Sim-PushCart/tree/main
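
A minimal sketch of fetching the released files from the repository linked above. It assumes the `huggingface_hub` package is installed and that the repository is accessible from your account; the `download_checkpoint` helper and the local directory name are hypothetical and not part of the release.

```python
# Repository id taken from the release URL above.
REPO_ID = "nvidia/GR00T-N1.6-Rheo-Sim-PushCart"


def download_checkpoint(local_dir="./gr00t_rheo_pushcart"):
    """Download a snapshot of the model repository and return its local path.

    `local_dir` is a hypothetical location chosen for illustration.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(download_checkpoint())
```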

## Reference(s)
- [NVIDIA Isaac-GR00T N1.6](https://github.com/NVIDIA/Isaac-GR00T)
- [Isaac for Healthcare](https://github.com/isaac-for-healthcare)

## Model Architecture
**Architecture Type:** Vision Language Action model  
**Network Architecture:** GR00T N1.6  
**This model was developed based on** GR00T N1.6  
**Number of model parameters:** 3 billion

## Computational Load
**Cumulative Compute:** 6.84×10^18 FLOPs (hardware-based calculation using a single NVIDIA H100 NVL for training)

**Estimated Energy and Emissions for Model Training:** 2.58 kWh, 0.00115 tCO₂e
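
As a quick consistency check, the reported figures can be related by simple arithmetic. The sketch below derives the emission factor and the compute per unit of energy implied by the numbers above; these derived values are not stated in the card and follow only from dividing the reported quantities.

```python
# Figures reported in this section.
CUMULATIVE_FLOPS = 6.84e18
ENERGY_KWH = 2.58
EMISSIONS_TCO2E = 0.00115

# Implied grid emission factor in kg CO2e per kWh (1 tonne = 1000 kg).
implied_kg_per_kwh = EMISSIONS_TCO2E * 1000.0 / ENERGY_KWH

# Implied training compute per kWh of energy.
implied_flops_per_kwh = CUMULATIVE_FLOPS / ENERGY_KWH

print(f"implied emission factor: {implied_kg_per_kwh:.3f} kgCO2e/kWh")
print(f"implied compute per kWh: {implied_flops_per_kwh:.3e} FLOPs/kWh")
```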

## Input(s)
**Input Type(s):** Vision, State, Language Instruction  
**Input Format(s):**
- Vision: RGB images (uint8)
- State: Floating point
- Language Instruction: String

**Input Parameters:**
- Vision: Two-Dimensional (2D)
- State: One-Dimensional (1D)
- Language Instruction: One-Dimensional (1D)

**Other Properties Related to Input:**
- Vision: A single 480x640 (height x width) uint8 RGB image frame from the robot camera.
- State: 1x31 vector.
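
The input specification above can be sketched as a small validation helper. This is an illustration only: the dictionary keys, the `make_observation` function, the channel-last image layout, and the example instruction string are assumptions and do not reflect the actual GR00T data interface.

```python
import numpy as np

# Shapes from the input specification; channel-last layout is an assumption.
IMAGE_SHAPE = (480, 640, 3)  # height x width x RGB
STATE_SHAPE = (1, 31)


def make_observation(image, state, instruction):
    """Check one observation against the listed input spec and package it."""
    image = np.asarray(image)
    state = np.asarray(state, dtype=np.float32)
    if image.dtype != np.uint8 or image.shape != IMAGE_SHAPE:
        raise ValueError(f"expected uint8 image of shape {IMAGE_SHAPE}")
    if state.shape != STATE_SHAPE:
        raise ValueError(f"expected state of shape {STATE_SHAPE}")
    if not isinstance(instruction, str) or not instruction:
        raise ValueError("expected a non-empty instruction string")
    # Key names are hypothetical placeholders.
    return {"image": image, "state": state, "instruction": instruction}


obs = make_observation(
    np.zeros(IMAGE_SHAPE, dtype=np.uint8),
    np.zeros(STATE_SHAPE, dtype=np.float32),
    "push the cart to the surgical table",  # placeholder instruction
)
```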

## Output(s)
**Output Type(s):** Actions  
**Output Format(s):** Continuous-value vectors  
**Output Parameters:** Two-Dimensional (2D), 16x32 tensor  
**Other Properties Related to Output:** Continuous-value vectors correspond to different motor controls on the robot embodiment.
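
A minimal sketch of consuming the 16x32 output, assuming the first axis indexes consecutive time steps of an action chunk and the second axis holds the per-step motor commands. The card does not state the axis semantics, so this reading and the `iter_action_steps` helper are assumptions for illustration.

```python
import numpy as np

# Output shape from the specification above; axis meaning is assumed.
ACTION_SHAPE = (16, 32)  # (time steps, motor command dimensions)


def iter_action_steps(actions):
    """Yield (step_index, command_vector) pairs from one predicted chunk."""
    actions = np.asarray(actions, dtype=np.float32)
    if actions.shape != ACTION_SHAPE:
        raise ValueError(f"expected actions of shape {ACTION_SHAPE}")
    for step, command in enumerate(actions):
        yield step, command


# Stand-in for a model prediction; a real chunk would come from the policy.
chunk = np.zeros(ACTION_SHAPE, dtype=np.float32)
steps = list(iter_action_steps(chunk))
```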

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## Software Integration
**Runtime Engine(s):** PyTorch 2.8.0  

**Supported Hardware Microarchitecture Compatibility:**
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper

**Preferred/Supported Operating System(s):**
- Linux (Ubuntu 22.04/24.04 LTS)

## Model Version(s)
GR00T-N1.6-Rheo-PushCart

## Training Datasets, Testing, and Evaluation Datasets
Data was produced by manual teleoperation and Isaac Lab Mimic generation.

### Training Dataset
**Total Size:** 90 samples  
**Text Training Data Size:** Less than a Billion Tokens  
**Video Training Data Size:** Less than 10,000 Hours  
**Non-Audio, Image, Text Training Data Size:**  

Image/Video Data: RGB video frames from robot camera (640x480 pixels)  
Text Data: 90 language instruction strings by human labelling  
Action Data: 90 episodes of robot action trajectories (state observations and action sequences)  

**Data Modality:**  
- Text  
- Video  
- Action  

**Data Collection Method by dataset:** Automatic/Sensors  
**Labeling Method by dataset:** Human  

**Data Properties:**  
Quantity: 90 simulation samples  
Modalities: Multi-modal data consisting of (i) RGB video frames, (ii) text-based language instructions, (iii) robot state observations and action sequences  
Nature of Content: Data from the Isaac Sim simulation environment, collected with Isaac Lab Mimic; no personal data or copyright-protected content; data represents surgical instrument transport tasks  
Linguistic Characteristics: Language instructions describing surgical instrument transport

**Sensor(s):**  
Vision sensors: RGB cameras capturing 640x480 pixel images in simulation  
Action sensors: Motor sensors on G1 embodiment

### Testing Datasets
**Data Collection Method by dataset:** Not Applicable  
**Labeling Method by dataset:** Not Applicable  
**Data Properties:**  
Testing was performed in simulation using the Isaac for Healthcare Rheo workflow. The testing data consists of dynamically generated episodes of the push-cart task.

### Evaluation Datasets
**Data Collection Method by dataset:** Not Applicable  
**Labeling Method by dataset:** Not Applicable  
**Data Properties:**  
The evaluation was performed in simulation using the Isaac for Healthcare Rheo workflow. The evaluation data consists of dynamically generated episodes of the push-cart task.

## Inference
**Engine:** PyTorch  
**Test Hardware:** NVIDIA RTX 5880 Ada Generation  
**Inference mode / Latency / Memory:** PyTorch / 89.3 ± 1.9 ms / 8 GB
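
For planning a control loop, the reported latency can be converted into an approximate upper bound on policy queries per second. The sketch below assumes inference calls run back-to-back with no other overhead, which a real deployment would not achieve exactly.

```python
# Latency figures reported above, in milliseconds.
LATENCY_MEAN_MS = 89.3
LATENCY_SPREAD_MS = 1.9


def queries_per_second(latency_ms):
    """Convert a per-inference latency in milliseconds to a rate in Hz."""
    return 1000.0 / latency_ms


mean_rate_hz = queries_per_second(LATENCY_MEAN_MS)
# A longer latency gives a lower rate, so the bounds swap.
low_rate_hz = queries_per_second(LATENCY_MEAN_MS + LATENCY_SPREAD_MS)
high_rate_hz = queries_per_second(LATENCY_MEAN_MS - LATENCY_SPREAD_MS)

print(f"{mean_rate_hz:.1f} Hz (range {low_rate_hz:.1f} to {high_rate_hz:.1f} Hz)")
```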

## Limitations
This model was trained only on data from the Isaac for Healthcare Rheo workflow, so it is expected to perform well only in that specific simulated operating room environment. It is not expected to generalize to other robot platforms, environments, or surgical procedures outside the trained domain.

## Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.

Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).