Add model card and license

Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>

Files changed (5)

EXPLAINABILITY.md +13 -0
LICENSE +61 -0
PRIVACY.md +11 -0
README.md +132 -0
SAFETY_and_SECURITY.md +6 -0

EXPLAINABILITY.md ADDED

	@@ -0,0 +1,13 @@

+Field                                                                                                  |  Response
+:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
+Intended Domain:                                                                                        |  Rheo (Isaac for Healthcare simulation workflow focused on preparation for surgical instrument handling tasks in an operating room (OR) environment).
+Model Type:                                                                                             |  Robot Vision Language Action (VLA) model
+Intended Users:                                                                                         |  Isaac for Healthcare users testing operating room environments in simulation.
+Output:                                                                                                 |  Action tensor (next 16 actions) for pick-and-place of a sterilized tray from shelf to cart.
+Describe how the model works:                                                                           |  Accepts vision frames, language instruction, and robot observation; encodes multimodal inputs and outputs continuous robot action vectors.
+Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:   |  Not Applicable
+Technical Limitations & Mitigation:                                                                     |  This model has been trained and tested with simulation data from the Isaac for Healthcare Rheo workflow and is not expected to generalize outside this environment.
+Verified to have met prescribed NVIDIA quality standards:                                                |  Yes
+Performance Metrics:                                                                                    |  Latency, Task success rate
+Potential Known Risks:                                                                                  |  This model may make unexpected movements if deployed in environments outside the Rheo simulation. The model may not fully adhere to protocols or manage unforeseen scenarios when operating outside the simulated environment.
+Licensing:                                                                                              |  NVIDIA License

LICENSE ADDED

	@@ -0,0 +1,61 @@

+NVIDIA License
+1. Definitions
+“Licensor” means any person or entity that distributes its Work.
+“Work” means (a) the original work of authorship made available under this license,
+which may include software, documentation, or other files, and (b) any additions to or
+derivative works thereof that are made available under this license.
+The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the
+meaning as provided under U.S. copyright law; provided, however, that for the purposes
+of this license, derivative works shall not include works that remain separable from, or
+merely link (or bind by name) to the interfaces of, the Work.
+Works are “made available” under this license by including in or with the Work either (a)
+a copyright notice referencing the applicability of this license to the Work, or (b) a copy
+of this license.
+2. License Grant
+2.1 Copyright Grant. Subject to the terms and conditions of this license, each
+Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free,
+copyright license to use, reproduce, prepare derivative works of, publicly display,
+publicly perform, sublicense and distribute its Work and any resulting derivative
+works in any form.
+3. Limitations
+3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so
+under this license, (b) you include a complete copy of this license with your
+distribution, and (c) you retain without modification any copyright, patent,
+trademark, or attribution notices that are present in the Work.
+3.2 Derivative Works. You may specify that additional or different terms apply to
+the use, reproduction, and distribution of your derivative works of the Work (“Your
+Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3
+applies to your derivative works, and (b) you identify the specific derivative works
+that are subject to Your Terms. Notwithstanding Your Terms, this license (including
+the redistribution requirements in Section 3.1) will continue to apply to the Work
+itself.
+3.3 Use Limitation. The Work and any derivative works thereof only may be used
+or intended for use non-commercially. Notwithstanding the foregoing, NVIDIA
+Corporation and its affiliates may use the Work and any derivative works
+commercially. As used herein, “non-commercially” means for research or
+evaluation purposes only.
+3.4 Patent Claims. If you bring or threaten to bring a patent claim against any
+Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce
+any patents that you allege are infringed by any Work, then your rights under this
+license from such Licensor (including the grant in Section 2.1) will terminate
+immediately.
+3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its
+affiliates’ names, logos, or trademarks, except as necessary to reproduce the
+notices described in this license.
+3.6 Termination. If you violate any term of this license, then your rights under this
+license (including the grant in Section 2.1) will terminate immediately.
+4. Disclaimer of Warranty.
+THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
+EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-
+INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS
+LICENSE.
+5. Limitation of Liability.
+EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL
+THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE
+SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT,
+INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR
+RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT
+NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR
+DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER DAMAGES OR LOSSES),
+EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

PRIVACY.md ADDED

	@@ -0,0 +1,11 @@

+Field                                                                                                                              |  Response
+:----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
+Generatable or reverse engineerable personal data?                                                                               |  No
+Personal data used to create this model?                                                                                         |  No
+Was consent obtained for any personal data used?                                                                                  |  Not Applicable
+How often is dataset reviewed?                                                         |  Before Release
+Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?  | No
+Is there provenance for all datasets used in training?                                                                            |  Yes
+Does data labeling (annotation, metadata) comply with privacy laws?                                                               |  Yes
+Is data compliant with data subject requests for data correction or removal, if such a request was made?                                                              |  No, not possible with externally-sourced data.
+Applicable Privacy Policy                                                                                                         |  https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

README.md ADDED

	@@ -0,0 +1,132 @@

+# Model Overview
+### Description
+GR00T-N1.6-Rheo-PickNPlace is a vision language action (VLA) model fine-tuned for surgical instrument handling preparation in the Isaac for Healthcare Rheo workflow. It performs pick-and-place of a sterilized box from a shelf to a cart using a G1 embodiment. This model is ready for commercial/non-commercial use.
+### License/Terms of Use
+NVIDIA License. Additional Information: Apache-2.0 license for http://huggingface.co/Qwen/Qwen2.5-7B-Instruct and https://huggingface.co/google/siglip2-so400m-patch16-512.
+You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.
+### Deployment Geography
+Global
+### Use Case
+This model is intended for Rheo simulation workflows focused on surgical instrument handling (sterilized box pick-and-place from shelf to cart). It is not intended for real-world clinical deployment.
+### Release Date
+Hugging Face (03/10/2026) via https://huggingface.co/nvidia/GR00T-N1.6-Rheo-PickNPlaceTray/tree/main
+## Reference(s)
+- [NVIDIA Isaac-GR00T N1.6](https://github.com/NVIDIA/Isaac-GR00T)
+- [Isaac for Healthcare](https://github.com/isaac-for-healthcare)
+## Model Architecture
+**Architecture Type:** Vision Language Action model
+**Network Architecture:** GR00T N1.6
+**This model was developed based on** GR00T N1.6
+**Number of model parameters:** 3 billion
+## Computational Load
+**Cumulative Compute:** 2.45×10^19 FLOPs (hardware-based calculation using single NVIDIA H100 NVL for training)
+**Estimated Energy and Emissions for Model Training:** 5.37 kWh, 0.00217 tCO₂e
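The two lines above can be cross-checked against each other with simple arithmetic. The sketch below uses only the three reported figures and derives the emission factor and the compute-per-energy ratio they imply. The function names are illustrative and not part of any NVIDIA tooling, and the card does not state which grid-intensity value was used, so the derived factor is an inference rather than a documented input.

```python
# Values copied from the Computational Load lines of the model card.
CUMULATIVE_FLOPS = 2.45e19
TRAINING_ENERGY_KWH = 5.37
TRAINING_EMISSIONS_TCO2E = 0.00217


def implied_emission_factor(energy_kwh: float, emissions_tco2e: float) -> float:
    """Return the emission factor implied by the card, in kg CO2e per kWh."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    # 1 tonne = 1000 kg
    return emissions_tco2e * 1000.0 / energy_kwh


def flops_per_kwh(total_flops: float, energy_kwh: float) -> float:
    """Return the reported training FLOPs per kWh of reported energy."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return total_flops / energy_kwh


if __name__ == "__main__":
    factor = implied_emission_factor(TRAINING_ENERGY_KWH, TRAINING_EMISSIONS_TCO2E)
    ratio = flops_per_kwh(CUMULATIVE_FLOPS, TRAINING_ENERGY_KWH)
    print(f"implied emission factor: {factor:.3f} kg CO2e/kWh")
    print(f"compute per energy: {ratio:.3e} FLOPs/kWh")
```

With the reported numbers this works out to roughly 0.404 kg CO2e per kWh and about 4.56×10^18 FLOPs per kWh.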
+## Input(s)
+**Input Type(s):** Vision, State, Language Instruction
+**Input Format(s):**
+- Vision: RGB images (uint8)
+- State: Floating point
+- Language Instruction: String
+**Input Parameters:**
+- Vision: Two-Dimensional (2D)
+- State: One-Dimensional (1D)
+- Language Instruction: One-Dimensional (1D)
+**Other Properties Related to Input:**
+- Vision: Raw 480x640 uint8 RGB frames from robot head camera; training preprocessing uses shortest_edge=256 with crop_fraction=0.95 (albumentations).
+- State: 1x31 vector.
+## Output(s)
+**Output Type(s):** Actions
+**Output Format(s):** Continuous-value vectors
+**Output Parameters:** Two-Dimensional (2D), 16x32 tensor
+**Other Properties Related to Output:** Continuous-value vectors correspond to different motor controls on the robot embodiment.
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
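The input and output shapes listed above can be captured as a small contract and checked before calling the policy. The following is a minimal sketch, not the GR00T API: `TensorSpec` and `check` are hypothetical helpers, the channel-last image layout `(480, 640, 3)` is an assumption, and `float32` for state and action is an assumption because the card says only "Floating point" and "Continuous-value vectors".

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Expected shape and dtype name for one model input or output."""
    name: str
    shape: Tuple[int, ...]
    dtype: str


# Shapes taken from the model card. Channel-last layout and float32 are
# assumptions made for this sketch.
HEAD_CAMERA = TensorSpec("vision", (480, 640, 3), "uint8")
STATE = TensorSpec("state", (1, 31), "float32")
ACTION_CHUNK = TensorSpec("action", (16, 32), "float32")


def check(spec: TensorSpec, shape: Sequence[int], dtype: object) -> None:
    """Raise ValueError if shape or dtype does not match the spec."""
    if tuple(shape) != spec.shape:
        raise ValueError(f"{spec.name}: expected shape {spec.shape}, got {tuple(shape)}")
    if str(dtype) != spec.dtype:
        raise ValueError(f"{spec.name}: expected dtype {spec.dtype}, got {dtype}")


def check_instruction(text: object) -> None:
    """The language instruction is a single non-empty string."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("language instruction must be a non-empty string")
```

Because `check` compares `str(dtype)`, it also accepts NumPy arrays directly, for example `check(HEAD_CAMERA, frame.shape, frame.dtype)`.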
+## Software Integration
+**Runtime Engine(s):** PyTorch 2.8.0
+**Supported Hardware Microarchitecture Compatibility:**
+- NVIDIA Ampere
+- NVIDIA Blackwell
+- NVIDIA Hopper
+**Preferred/Supported Operating System(s):**
+- Linux (Ubuntu 22.04/24.04 LTS)
+## Model Version(s)
+GR00T-N1.6-Rheo-PickNPlace
+## Training, Testing, and Evaluation Datasets
+Data sources: manual teleoperation and Isaac Lab Mimic generation.
+### Training Dataset
+**Total Size:** 120 samples
+**Text Training Data Size:** Less than a Billion Tokens
+**Video Training Data Size:** Less than 10,000 Hours
+**Non-Audio, Image, Text Training Data Size:**
+- Image/Video Data: RGB video frames from robot head camera (640x480 pixels)
+- Text Data: 120 human-labeled language instruction strings
+- Action Data: 120 episodes of robot action trajectories (state observations and action sequences)
+**Data Modality:**
+- Text
+- Video
+- Action
+**Data Collection Method by dataset:** Automatic/Sensors
+**Labeling Method by dataset:** Human
+**Data Properties:**
+Quantity: 120 simulation samples
+Modalities: Multi-modal data consisting of (i) RGB video frames, (ii) text-based language instructions, (iii) robot state observations
+Nature of Content: Data from Isaac Sim simulation environment collected in Isaac Lab mimic; no personal data or copyright-protected content; data represents surgical instrument manipulation tasks
+Linguistic Characteristics: Language instructions describing surgical instrument preparation
+**Sensor(s):**
+Vision sensors: RGB cameras (robot head-mounted) capturing 640x480 pixel images in simulation
+Action sensors: Motor sensors on G1 embodiment
+### Testing Datasets
+**Data Collection Method by dataset:** Not Applicable
+**Labeling Method by dataset:** Not Applicable
+**Data Properties:**
+The evaluation was performed in simulation using the Isaac for Healthcare Rheo workflow. The testing data consists of dynamically generated episodes of the pick-and-place task.
+### Evaluation Datasets
+**Data Collection Method by dataset:** Not Applicable
+**Labeling Method by dataset:** Not Applicable
+**Data Properties:**
+The evaluation was performed in simulation using the Isaac for Healthcare Rheo workflow. The evaluation data consists of dynamically generated episodes of the pick-and-place task.
+## Inference
+**Engine:** PyTorch
+**Test Hardware:** NVIDIA RTX 5880 Ada Generation
+**Inference mode / Latency / Memory:** PyTorch 92.4 ± 1.3 ms, 8 GB
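The latency figure above can be related to the 16-step action chunk from the Output section. The sketch below computes an upper bound on the action rate a single inference loop could sustain. It assumes that each reported latency covers one inference call returning a full 16-action chunk, that all 16 actions are executed before the next call, and that inference runs synchronously; the card states none of these, and the function name is illustrative.

```python
CHUNK_LENGTH = 16        # actions per inference call, from the 16x32 output tensor
LATENCY_MS_MEAN = 92.4   # reported PyTorch latency (mean)
LATENCY_MS_STD = 1.3     # reported PyTorch latency (spread)


def max_action_rate_hz(chunk_length: int, latency_ms: float) -> float:
    """Upper bound on actions per second if inference is the only bottleneck."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return chunk_length / (latency_ms / 1000.0)


if __name__ == "__main__":
    typical = max_action_rate_hz(CHUNK_LENGTH, LATENCY_MS_MEAN)
    slower = max_action_rate_hz(CHUNK_LENGTH, LATENCY_MS_MEAN + LATENCY_MS_STD)
    print(f"typical bound: {typical:.1f} actions/s")
    print(f"one-sigma-slower bound: {slower:.1f} actions/s")
```

Under these assumptions the bound is roughly 173 actions per second. A controller that replans before consuming the whole chunk would see a proportionally lower figure.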
+## Limitations
+This model was trained on data from the Isaac for Healthcare Rheo workflow. Therefore, the model will only perform well in that specific operating room environment. This model is not expected to generalize to different robot platforms, environments, or surgical procedures outside of the trained domain.
+## Ethical Considerations
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
+Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
+Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).

SAFETY_and_SECURITY.md ADDED

	@@ -0,0 +1,6 @@

+Field                                               |  Response
+:---------------------------------------------------|:----------------------------------
+Model Application Field(s):                         |  Healthcare, Medical Devices, Machinery and Robotics
+Describe the life critical impact (if present).     |  This model is not tested or intended for use in mission critical applications that require functional safety. The use of the model in those applications is at the user's own risk and sole responsibility, including taking the necessary steps to add needed guardrails or safety mechanisms prior to deployment.
+Use Case Restrictions:                              |  Abide by NVIDIA License
+Model and dataset restrictions:                     |  The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.