Update README.md #1 by mingxinz · opened

README.md CHANGED
@@ -1,3 +1,172 @@

The previous README contained only the YAML frontmatter (`---` / `license: apache-2.0` / `---`); it is replaced in full by the model card below.
---
license: apache-2.0
---

# Index

- [Overview](https://docs.google.com/document/d/1qonEBUTEX3zC5PL_pKILSCkBS6WQ5dGzDq4XCJM-hRw/edit#overview)
- [Bias](https://docs.google.com/document/d/1qonEBUTEX3zC5PL_pKILSCkBS6WQ5dGzDq4XCJM-hRw/edit#bias)
- [Explainability](https://docs.google.com/document/d/1qonEBUTEX3zC5PL_pKILSCkBS6WQ5dGzDq4XCJM-hRw/edit#explainability)
- [Privacy](https://docs.google.com/document/d/1qonEBUTEX3zC5PL_pKILSCkBS6WQ5dGzDq4XCJM-hRw/edit#privacy)
- [Safety & Security](https://docs.google.com/document/d/1qonEBUTEX3zC5PL_pKILSCkBS6WQ5dGzDq4XCJM-hRw/edit#safety--security)
# Model Overview

### Description:

GR00T N1.5 for SO-ARM starter is a vision language action (VLA) model fine-tuned to perform autonomous surgical assistance tasks, specifically surgical instrument management, in the Isaac for Healthcare environment. It uses the weights and architecture of NVIDIA Isaac GR00T N1.5 and is fine-tuned with simulation and real-world data from SO-ARM101 robotic arms.

This model is ready for commercial/non-commercial use.

### License/Terms of Use

NSCL V1 License

### Deployment Geography:

Global

### Use Case:

This model is intended to be used within Isaac for Healthcare as an autonomous SO-ARM starter system that can perform essential surgical assistance duties, including surgical instrument handling and management.
## Reference(s):

- [NVIDIA Isaac GR00T N1.5](https://github.com/NVIDIA/Isaac-GR00T)
- [Isaac for Healthcare](https://github.com/isaac-for-healthcare)
## Model Architecture:

**Architecture Type:** Vision Language Action (VLA) model

**Network Architecture:** [GR00T N1.5](https://github.com/NVIDIA/Isaac-GR00T)

* This model was developed based on GR00T N1.5.
* This model has 3 billion parameters.
### **Input**

**Input Type(s):** Vision, State, Language Instruction

**Input Format:**

* Vision: Variable number of 224x224 uint8 image frames, coming from cameras
* State: Floating point (robot proprioception)
* Language Instruction: String

**Input Parameters:**

* Vision: 2D RGB image, square (224x224)
* State: 1D floating-point vector
* Language Instruction: 1D string

**Input Images:**

* Room Camera: 224x224 uint8 RGB image frames
* Wrist Camera: 224x224 uint8 RGB image frames

**Input Prompt:** Text string (language instruction)
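Camera frames must arrive as square 224x224 uint8 RGB images, so frames captured at other resolutions need resizing first. The sketch below shows a minimal nearest-neighbor resize in pure Python for illustration; a real pipeline would typically use OpenCV or PIL, and the function name here is hypothetical:

```python
def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbor resize of an HxWx3 image stored as nested lists.

    Illustrative helper only; not part of the model's actual preprocessing API.
    """
    in_h, in_w = len(img), len(img[0])
    return [
        # Map each output pixel back to its nearest source pixel.
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# A dummy 480x640 black frame, as might come from a room camera.
frame = [[[0, 0, 0] for _ in range(640)] for _ in range(480)]
resized = resize_nearest(frame)
assert len(resized) == 224 and len(resized[0]) == 224
```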
### **Output**

**Output Type(s):** Actions

**Output Format:** Continuous-value vectors

**Output Parameters:** Two-Dimensional (2D), 16x6 tensor

**Other Properties Related to Output:** The continuous-value vectors correspond to different motor controls on the robot, which depend on the degrees of freedom of the robot embodiment.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
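The input/output contract above can be sketched as a plain-Python shape check. This is a minimal illustration only: the dictionary keys, the stand-in policy function, and the 6-DoF joint count (matching a SO-ARM101-style arm) are assumptions for clarity, not the model's actual API.

```python
# Illustrative sketch of the I/O contract described above.
# Key names and the policy function are assumptions, not the real API.

ACTION_HORIZON = 16  # the model predicts the next 16 action steps
DOF = 6              # assumed SO-ARM101-style arm with 6 degrees of freedom

observation = {
    # Vision: 224x224 uint8 RGB frames, one per camera
    "room_camera":  [[[0] * 3 for _ in range(224)] for _ in range(224)],
    "wrist_camera": [[[0] * 3 for _ in range(224)] for _ in range(224)],
    # State: 1D floating-point proprioception vector
    "state": [0.0] * DOF,
    # Language instruction: free-form string
    "instruction": "Pick up the scissors and hand them over.",
}

def dummy_policy(obs):
    """Stand-in for the VLA model: returns a 16x6 action tensor."""
    return [[0.0] * DOF for _ in range(ACTION_HORIZON)]

actions = dummy_policy(observation)
# The 16x6 output: 16 future timesteps, one continuous value per motor.
assert len(actions) == 16 and all(len(step) == 6 for step in actions)
```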
## Software Integration:

**Runtime Engine(s):**

* PyTorch - 2.5.1
* TensorRT - 10.11.0.33

**Supported Hardware Microarchitecture Compatibility:**

* NVIDIA Ampere
* NVIDIA Blackwell
* NVIDIA Hopper

**Preferred/Supported Operating System(s):**

* Linux (Ubuntu 22.04/24.04 LTS)
## Model Version(s):

- GR00T N1.5 for SO-ARM starter

## Training Datasets:

**Data Collection Method by Dataset:**

* Manual teleoperation
# Inference:

**Engine:** PyTorch / TensorRT

**Test Hardware:**

* NVIDIA RTX 6000 Ada

| Inference mode | Average Latency | Memory Usage |
| :---- | :---- | :---- |
| PyTorch | 42.16 ± 0.81 ms | 5.7 GB |
| TensorRT | 26.96 ± 1.86 ms | 6.6 GB |
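Latency figures of this form (mean ± standard deviation) can be gathered with a simple timing harness. The sketch below is a generic pattern around a stand-in workload, not the benchmark actually used for the table; warm-up iterations are excluded so one-time setup costs do not skew the statistics.

```python
import time
import statistics

def benchmark(infer, n_warmup=10, n_runs=100):
    """Time a callable and report (mean, stdev) latency in milliseconds."""
    for _ in range(n_warmup):  # warm-up runs are discarded
        infer()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in for a model forward pass.
mean_ms, std_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"Average latency: {mean_ms:.2f} ± {std_ms:.2f} ms")
```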
# Limitations:

This model was trained on data from the Isaac for Healthcare SO-ARM starter workflow; it is therefore only expected to perform well in that specific surgical assistance environment. It is not expected to generalize to different robot platforms, surgical instruments, or surgical procedures outside of the trained domain.

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When this model is downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure it meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards [Insert Link to Model Card++ subcards here].

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
# Bias

| Field | Response |
| :---- | :---- |
| Participation considerations from adversely impacted groups, protected classes, in model design and testing: | Not Applicable |
| Measures taken to mitigate against unwanted bias: | Not Applicable |
# Explainability

| Field | Response |
| :---- | :---- |
| Intended Domain: | SO-ARM starter |
| Model Type: | Robot VLA model |
| Intended Users: | Isaac for Healthcare users testing the scrub nurse environment. |
| Output: | Action tensor (the next 16 actions to complete the scissors handling) |
| Describe how the model works: | Accepts vision, language, and robot observations; outputs a robot action policy. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations & Mitigation: | This model was trained on data from the Isaac for Healthcare SO-ARM starter workflow and will therefore only perform well in that single environment. It is not expected to generalize to different robot platforms, surgical instruments, or surgical procedures. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Latency, Accuracy |
| Potential Known Risks: | The model may not perfectly follow surgical protocols or handle unexpected surgical scenarios. This may happen due to unexpected surgical setups, inconsistent camera positioning, or deployment environments outside of the Isaac for Healthcare simulation environment. |
| Licensing: | NSCL V1 License |
# Privacy

| Field | Response |
| :---- | :---- |
| Generatable or reverse engineerable personal data? | None |
| Personal data used to create this model? | None |
| How often is dataset reviewed? | Before Release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
# Safety & Security

| Field | Response |
| :---- | :---- |
| Model Application(s): | SO-ARM starter |
| Model Application Field(s): | Medical Devices, Machinery and Robotics |
| Describe the life-critical impact (if present). | This model could pose significant risks if deployed on a robotic system in real surgical environments without proper validation. It has been tested with simulation data and limited real-world data using Isaac for Healthcare and may make unexpected movements if deployed in a new surgical environment. It is not expected to generalize to different environments, robot platforms, or surgical procedures. |
| Use Case Restrictions: | Abide by the NSCL V1 License |
| Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Dataset access is restricted during training, and dataset license constraints are adhered to. |