---
language:
  - en
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
tags:
  - nvidia
  - instant-nurec
  - image-to-3d
  - 3d-generation
  - gaussian-splatting
  - physical-ai
pipeline_tag: image-to-3d
---

# Instant-NuRec | Model Card

[**Code**](https://github.com/NVIDIA/instant-nurec) | [**Model**](https://huggingface.co/nvidia/instant-nurec)

## Description:

Instant-NuRec is a model that takes a series of images as input and outputs Gaussian Splats. The model uses an alternate-attention Vision Transformer encoder following the Depth-Anything-v3 (DAv3) design and is initialized from the DAv3 ViT-Base checkpoint (DINOv2-based) before being fine-tuned on NVIDIA AV data. Instant-NuRec allows users to generate Gaussian Splats in under 2 minutes. The model was trained to take up to 90 input images (5 views x 18 frames) at a resolution of 504x280.

![Instant-NuRec demo](./docs/demo.gif)

This model is ready for commercial/non-commercial use.

### License/Terms of Use:

Governing Terms: Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

Deployment Geography: Global

### Release Management:

Instant-NuRec is published as a standalone GitHub repository for code, with model weights distributed via Hugging Face.

- Release date: June 2026
- GitHub code: [https://github.com/NVIDIA/instant-nurec](https://github.com/NVIDIA/instant-nurec)
- Hugging Face model and weights: [https://huggingface.co/nvidia/instant-nurec](https://huggingface.co/nvidia/instant-nurec)

## Use Case:

Instant-NuRec is intended for Physical AI developers who want to create 3D automotive scenes for closed-loop simulation or Synthetic Data Generation (SDG).

## Known Technical Limitations:

The model is not guaranteed to perform well on scenes outside its training distribution. The model was not trained on extreme weather conditions, and night scenes are sparsely represented in the training data.

## Known Risk(s):

AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. When generation fails, the output may not accurately represent the real-world scene and should not be relied upon in safety-critical simulations.

## Reference(s):

- [Depth-Anything-v3](https://depth-anything.github.io/depth-anything-3/)
- [DINOv2](https://arxiv.org/abs/2304.07193)
- [STORM](https://research.nvidia.com/labs/avg/publication/yang.huang.etal.iclr2025/)
- [GS-LRM](https://arxiv.org/pdf/2404.19702)
- [Vision Transformer](https://arxiv.org/pdf/2010.11929)

## Model Architecture:

![Instant-NuRec architecture](./docs/architecture.png)

Instant-NuRec is built on a Vision Transformer encoder that follows the alternate-attention design of Depth-Anything-v3. The encoder is paired with several lightweight DPT-style decoder heads for the sky cubemap, camera ISP, depth and context, motion, and Gaussian Splatting attributes. These heads produce the per-pixel attributes consumed by the 3D Gaussian representation.
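
Assuming the usual meaning of alternate attention (layers that attend within each image interleaved with layers that attend across all images), the difference between the two layer types is mostly a matter of how tokens are grouped. The sketch below only does that shape bookkeeping for the 90-image, 504x280 input described above. The 14-pixel patch size (the DINOv2 convention), the omission of any camera or register tokens, and the `AlternateAttentionLayout` helper itself are assumptions for illustration, not part of the Instant-NuRec code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AlternateAttentionLayout:
    """Hypothetical token bookkeeping for an alternate-attention ViT encoder."""

    num_views: int = 5
    num_frames: int = 18
    width: int = 504
    height: int = 280
    patch_size: int = 14  # assumption: DINOv2-style 14x14 patches
    embed_dim: int = 768  # ViT-Base hidden size

    @property
    def num_images(self) -> int:
        return self.num_views * self.num_frames

    @property
    def tokens_per_image(self) -> int:
        # Both image sides must be divisible by the patch size.
        assert self.width % self.patch_size == 0
        assert self.height % self.patch_size == 0
        return (self.width // self.patch_size) * (self.height // self.patch_size)

    def frame_attention_shape(self, batch: int = 1) -> Tuple[int, int, int]:
        # Frame-wise layers: images are folded into the batch axis,
        # so each image attends only to its own patches.
        return (batch * self.num_images, self.tokens_per_image, self.embed_dim)

    def global_attention_shape(self, batch: int = 1) -> Tuple[int, int, int]:
        # Global layers: patches from all images form one sequence,
        # so every patch can attend to patches in every other image.
        return (batch, self.num_images * self.tokens_per_image, self.embed_dim)

    def attention_pairs(self) -> Tuple[int, int]:
        """Query-key pairs per layer: (frame-wise, global)."""
        n, p = self.num_images, self.tokens_per_image
        return n * p * p, (n * p) ** 2


layout = AlternateAttentionLayout()
print(layout.tokens_per_image)          # 720
print(layout.frame_attention_shape())   # (90, 720, 768)
print(layout.global_attention_shape())  # (1, 64800, 768)
frame_pairs, global_pairs = layout.attention_pairs()
print(global_pairs // frame_pairs)      # 90
```

Here a single global layer scores as many query-key pairs as 90 frame-wise layers (the ratio equals the number of images), which illustrates why restricting some layers to within-image attention saves compute on long multi-view inputs.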

Architecture Type: Transformer

Network Architecture: Other Not Listed - alternate-attention Vision Transformer (ViT-Base, DAv3 design) with DPT-style decoder heads.

This model was developed based on [Depth-Anything-v3 ViT-Base](https://huggingface.co/depth-anything/DA3-BASE), which is itself initialized from DINOv2.

Number of model parameters: 202M

## Model Input:

Input Type(s): NCoreV4 file

Input Format: Red, Green, Blue (RGB)

Input Parameters: Two-Dimensional (2D)

Other Properties Related to Input:

The NCoreV4 file packages, per scene:

- Up to 90 RGB images (5 views x 18 frames at 2-4 Hz) at a resolution of 504x280
- Camera 6-DoF pose (orientation and translation) for each image
- Camera intrinsics / field of view for each image
- Optional cuboid tracks of dynamic actors in the scene, represented as sequential 3D bounding box trajectories with fixed spatial size
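
Because every image carries a 6-DoF pose and intrinsics, per-pixel depth predictions can be placed in a shared world frame. The sketch below shows standard pinhole unprojection of a pixel and its depth into world space, which is how pixel-aligned Gaussian centers are typically formed in GS-LRM-style models. Whether Instant-NuRec places Gaussian centers exactly this way is an assumption, as are the conventions used here (OpenCV-style camera axes, camera-to-world pose, z-depth); the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def unproject_pixel(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float,
                    R_c2w: Mat3, t_c2w: Sequence[float]) -> Vec3:
    """Lift pixel (u, v) with z-depth into world coordinates.

    Assumes an OpenCV-style pinhole camera (x right, y down, z forward)
    and a camera-to-world pose given as rotation R_c2w and translation t_c2w.
    """
    # Back-project through the intrinsics into camera coordinates.
    p_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the pose: p_world = R_c2w @ p_cam + t_c2w.
    return tuple(
        sum(R_c2w[i][j] * p_cam[j] for j in range(3)) + t_c2w[i]
        for i in range(3)
    )


def unproject_depth_map(depth_map: Sequence[Sequence[float]],
                        fx: float, fy: float, cx: float, cy: float,
                        R_c2w: Mat3, t_c2w: Sequence[float]) -> List[Vec3]:
    """Return one world-space point per pixel, in row-major order."""
    return [
        unproject_pixel(u, v, d, fx, fy, cx, cy, R_c2w, t_c2w)
        for v, row in enumerate(depth_map)
        for u, d in enumerate(row)
    ]


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
# Hypothetical 504x280 camera, principal point at the image center,
# placed at (1, 2, 0) in the world with no rotation.
center = unproject_pixel(252, 140, 10.0, 250.0, 250.0, 252.0, 140.0,
                         IDENTITY, (1.0, 2.0, 0.0))
print(center)  # (1.0, 2.0, 10.0)
```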

## Model Output:

Output Type(s): One or more PLY files containing 3D Gaussian particles

Output Format: Polygon File Format (PLY)

Output Parameters: Three-Dimensional (3D)

Other Properties Related to Output:

Each output PLY file contains the following components:

- Header: Defines the file structure, including format (ASCII or binary), the vertex element, its properties (x, y, z coordinates plus Gaussian attributes), and data types such as float and int.
- Vertex Data: One entry per Gaussian. Each entry stores the Gaussian's world-space position (x, y, z).
- Custom Data: Additional per-vertex properties holding the Gaussian attributes, such as scale, rotation, color, opacity, and a semantic label indicating whether a Gaussian belongs to the road, background, or foreground.

3D Gaussian Splatting PLYs do not contain face data. The scene is represented purely as a collection of Gaussian primitives stored as vertex entries.
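
The layout above can be illustrated with a small writer. This plain-Python sketch follows the property naming popularized by the original 3D Gaussian Splatting code (`f_dc_*`, `opacity`, `scale_*`, `rot_*`). The exact property names, value encodings (log-scale, logit opacity, quaternion order), and the `semantic` label field in Instant-NuRec outputs are assumptions, and all helpers are hypothetical. Normals and higher-order spherical-harmonic terms are omitted for brevity.

```python
import struct
from typing import List, NamedTuple, Tuple


class Gaussian(NamedTuple):
    position: Tuple[float, float, float]         # world-space x, y, z
    color_dc: Tuple[float, float, float]         # degree-0 SH color coefficients
    opacity: float                               # assumed pre-sigmoid logit
    scale: Tuple[float, float, float]            # assumed log-scale per axis
    rotation: Tuple[float, float, float, float]  # assumed quaternion (w, x, y, z)
    semantic: int                                # hypothetical: 0=road, 1=background, 2=foreground


# Per-vertex float properties, in the order they are packed below.
FLOAT_PROPS = (["x", "y", "z"] + [f"f_dc_{i}" for i in range(3)] + ["opacity"]
               + [f"scale_{i}" for i in range(3)] + [f"rot_{i}" for i in range(4)])
# 14 little-endian float32 values followed by one int32 label per Gaussian.
RECORD = struct.Struct("<" + "f" * len(FLOAT_PROPS) + "i")


def ply_header(num_gaussians: int) -> bytes:
    lines = ["ply", "format binary_little_endian 1.0",
             f"element vertex {num_gaussians}"]
    lines += [f"property float {name}" for name in FLOAT_PROPS]
    lines += ["property int semantic", "end_header"]
    return ("\n".join(lines) + "\n").encode("ascii")


def encode_gaussian_ply(gaussians: List[Gaussian]) -> bytes:
    """Serialize Gaussians as vertex entries only; no face element is written."""
    body = b"".join(
        RECORD.pack(*g.position, *g.color_dc, g.opacity, *g.scale, *g.rotation, g.semantic)
        for g in gaussians
    )
    return ply_header(len(gaussians)) + body


def write_gaussian_ply(path: str, gaussians: List[Gaussian]) -> None:
    with open(path, "wb") as f:
        f.write(encode_gaussian_ply(gaussians))
```

Keeping every attribute as a property of the single `vertex` element is what lets generic PLY readers load the file even though it has no faces.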

## Software Integration:

Runtime Engine(s):

- PyTorch-based inference, distributed via a standalone GitHub repository

## Hardware Compatibility:

Supported Hardware Microarchitecture Compatibility:

- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
- NVIDIA Ada Lovelace

Preferred/Supported Operating Systems: Linux

Hardware Specific Requirements:

The model can run on a single NVIDIA GPU with CUDA Compute Capability 8.0 or higher. The following are required:

- GPU performance >= 300 TFLOPS
- GPU memory size >= 30 GB for inference / 80 GB for training
- GPU memory bandwidth >= 768 GB/s
- System RAM >= 32 GB
- System disk storage >= 100 GB
- CPU >= 16 threads at 3 GHz

NVIDIA AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA hardware and software frameworks, the model can achieve faster training and inference times compared to CPU-only solutions.
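
A quick pre-flight check against the thresholds listed above can be scripted. The sketch below is hypothetical helper code, not part of the Instant-NuRec repository. It checks only compute capability and GPU memory, which PyTorch exposes through `torch.cuda.get_device_properties`, and treats GB as 10^9 bytes; throughput, bandwidth, CPU, RAM, and disk requirements are not checked.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds taken from the hardware requirements listed above.
MIN_COMPUTE_CAPABILITY = (8, 0)
MIN_INFERENCE_MEMORY_GB = 30
MIN_TRAINING_MEMORY_GB = 80


@dataclass
class GpuSpec:
    name: str
    compute_capability: Tuple[int, int]  # (major, minor)
    memory_gb: float                     # decimal GB (10**9 bytes)


def check_gpu(spec: GpuSpec, training: bool = False) -> List[str]:
    """Return the unmet requirements; an empty list means the GPU qualifies."""
    problems = []
    if spec.compute_capability < MIN_COMPUTE_CAPABILITY:
        major, minor = spec.compute_capability
        problems.append(f"compute capability {major}.{minor} is below 8.0")
    needed = MIN_TRAINING_MEMORY_GB if training else MIN_INFERENCE_MEMORY_GB
    if spec.memory_gb < needed:
        problems.append(f"{spec.memory_gb:.1f} GB of GPU memory is below {needed} GB")
    return problems


def query_local_gpu(index: int = 0) -> GpuSpec:
    """Read a local GPU's properties via PyTorch (needs a CUDA-enabled build)."""
    import torch

    props = torch.cuda.get_device_properties(index)
    return GpuSpec(props.name, (props.major, props.minor), props.total_memory / 1e9)


if __name__ == "__main__":
    try:
        spec = query_local_gpu()
    except Exception as exc:  # no PyTorch, no CUDA, or no GPU at this index
        print(f"could not query GPU: {exc}")
    else:
        issues = check_gpu(spec)
        print(spec.name, "meets inference requirements" if not issues else issues)
```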

## Model Version:

Instant_NuRec_v1

## Inference:

Engine: PyTorch

Test Hardware:

- NVIDIA H100 (Hopper, datacenter - primary training/inference)
- NVIDIA A100 (Ampere, datacenter)
- NVIDIA RTX 5090 (Blackwell, consumer - validated for local single-GPU inference)

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy subcards below.

Please make sure you have proper rights and permissions for all input image and video content. If an image or video includes people, personal health information, or intellectual property, the generated output will not blur these subjects or maintain their proportions.

Please report model quality, risk, security vulnerabilities, or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Model Card++

### Bias

| Field | Response |
| :---- | :---- |
| Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
| Measures taken to mitigate against unwanted bias: | None |

### Explainability

| Field | Response |
| :---- | :---- |
| Intended Task/Domain: | Advanced Driver Assistance Systems |
| Model Type: | Image-to-3D Gaussians |
| Intended Users: | Autonomous vehicle developers enhancing and improving Neural Reconstruction pipelines. |
| Output: | 3D Gaussian Splats as PLY file. |
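| Describe how the model works: | The model takes a series of posed multi-view images as input. An alternate-attention Vision Transformer encoder processes the images, and DPT-style decoder heads predict per-pixel attributes that are assembled into a Gaussian Splatting scene. |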
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations & Mitigation: | The model is not guaranteed to perform well on scenes outside its training distribution. The model was not trained on extreme weather conditions, and night scenes are sparsely represented in the training data. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | PSNR (Peak Signal-to-Noise Ratio) |
| Potential Known Risks: | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. When generation fails, the output may not accurately represent the real-world scene and should not be relied upon in safety-critical simulations. |
| Licensing: | Use of this model system is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
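
The Explainability table lists PSNR as the performance metric. PSNR compares a rendered view with its ground-truth image as 10·log10(MAX²/MSE), where MAX is the largest possible pixel value and MSE is the mean squared error over all pixels; higher is better, and each 10 dB gain corresponds to a tenfold reduction in MSE. The function below is a plain-Python sketch on flattened pixel lists with values assumed to lie in [0, 1]; which views, crops, and averaging the Instant-NuRec evaluation uses is not specified here.

```python
import math
from typing import Sequence


def psnr(rendered: Sequence[float], reference: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two flattened images.

    Pixel values are assumed to lie in [0, max_value].
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(round(psnr([0.1] * 4, [0.0] * 4), 6))  # 20.0
```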

### Privacy

| Field | Response |
| :---- | :---- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | Yes |
| Was consent obtained for any personal data used? | No |
| Is a mechanism in place to honor data subject right of access or deletion of personal data? | Yes |
| If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Yes |
| If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Yes |
| If personal data was collected for the development of this AI model, was it minimized to only what was required? | Yes |
| How often is the dataset reviewed? | Before release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
| Was data from user interactions with the AI model, such as user input and prompts, used to train the model? | No |
| Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |

### Safety & Security

| Field | Response |
| :---- | :---- |
| Model Application Field(s): | 3D Asset Generation |
| Describe the life critical impact. | Not Applicable. The model is not intended for direct life-critical decision-making, and outputs should not be used as the sole basis for autonomous vehicle perception, robotics control, or operational safety decisions. Additional validation and testing should be incorporated prior to deployment in real-world production. |
| Use Case Restrictions: | Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
| Model and dataset restrictions: | The principle of least privilege (PoLP) is applied to limit access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to. |