Upload AH_multiview_diffusion_turbo.safetensors

#2
by jeanlancel - opened
.gitattributes CHANGED
@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
- *.gif filter=lfs diff=lfs merge=lfs -text
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
docs/pipeline.gif → AH_multiview_diffusion_turbo.safetensors RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9f7df1837e9eae37f572dc07bd30f96372b89d4c2ec1ba83e626f2acae7abcd8
3
- size 3645159
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3dbbc54c2db8875016234a732d18070a0705e9709dee4b8ae7ef61895f08a075
3
+ size 3345066418
AH_tokengs_lifting.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:576a4250e373547c6864cc3fa6ec310b7c66dd06b8025d609ec6681405896ff8
3
- size 1299556696
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9650e8aeeb9dbb5f42231044f6da327046043de0023f6ce64d0ea2f7c5cbdf85
3
+ size 1299556656
README.md CHANGED
@@ -1,183 +1,100 @@
1
  ---
2
  language:
3
- - en
4
  license: other
5
  license_name: nvidia-open-model-license
6
  license_link: >-
7
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
8
  tags:
9
- - nvidia
10
- - asset-harvester
11
- - image-to-3d
12
- - 3d-generation
13
- - gaussian-splatting
14
- - physical-ai
15
  pipeline_tag: image-to-3d
16
  ---
17
 
18
  # Asset Harvester | System Model Card
19
- **Paper** | **Project Page** | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
20
 
21
  ## **Description:**
22
 
23
- **Asset Harvester** is an image-to-3D model and end-to-end system that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. The model generates 3D assets from a single image or multiple images of vehicles, VRUs or other road objects extracted from autonomous driving sessions. To run Asset Harvester, please check our [**codebase**](https://github.com/NVIDIA/asset-harvester).
24
-
25
- <p align="center">
26
- <img src="docs/pipeline.gif" alt="Asset Harvester teaser" width="100%" style="border: none;">
27
- </p>
28
-
29
- **Asset Harvester** turns real-world driving logs into complete, simulation-ready 3D assets — from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
30
-
31
- Here's how the model checkpoints in this repo are used in the end-to-end system following the order in the pipeline: The [AV object Mask2former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
32
- The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
33
- and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
34
- In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
35
- Lastly, an [Object TokenGS](model_cards/Object_TokenGS.md) lifts the images to a 3D asset.
36
 
37
  This system is ready for commercial and non-commercial use.
38
 
39
- <details>
40
- <summary><big><big><strong>🚗 Example Results 🚗</strong></big></big></summary>
41
-
42
- Each row contains the input image, object mask, and a rendering of the harvested 3DGS asset.
43
-
44
- #### 1. Vehicles / Trucks / Trailers
45
-
46
- <table>
47
- <tr>
48
- <td align="center"><img src="docs/in_the_wild_examples/bus_01.jpg" width="860"></td>
49
- </tr>
50
- <tr>
51
- <td align="center"><img src="docs/in_the_wild_examples/trailer_01.jpg" width="860"></td>
52
- </tr>
53
- <tr>
54
- <td align="center"><img src="docs/in_the_wild_examples/tractor_01.jpg" width="860"></td>
55
- </tr>
56
- <tr>
57
- <td align="center"><img src="docs/in_the_wild_examples/truck_01.jpg" width="860"></td>
58
- </tr>
59
- <tr>
60
- <td align="center"><img src="docs/in_the_wild_examples/sedan_01.jpg" width="860"></td>
61
- </tr>
62
- <tr>
63
- <td align="center"><img src="docs/in_the_wild_examples/suv_01.jpg" width="860"></td>
64
- </tr>
65
- <tr>
66
- <td align="center"><img src="docs/in_the_wild_examples/suv_02.jpg" width="860"></td>
67
- </tr>
68
- <tr>
69
- <td align="center"><img src="docs/in_the_wild_examples/sedan_02.jpg" width="860"></td>
70
- </tr>
71
- </table>
72
-
73
- #### 2. VRUs
74
-
75
- <table>
76
- <tr>
77
- <td align="center"><img src="docs/in_the_wild_examples/pedestrian_01.jpg" width="860"></td>
78
- </tr>
79
- <tr>
80
- <td align="center"><img src="docs/in_the_wild_examples/pedestrian_03.jpg" width="860"></td>
81
- </tr>
82
- <tr>
83
- <td align="center"><img src="docs/in_the_wild_examples/pedestrian_04.jpg" width="860"></td>
84
- </tr>
85
- <tr>
86
- <td align="center"><img src="docs/in_the_wild_examples/pedestrian_05.jpg" width="860"></td>
87
- </tr>
88
- <tr>
89
- <td align="center"><img src="docs/in_the_wild_examples/pedestrian_06.jpg" width="860"></td>
90
- </tr>
91
- <tr>
92
- <td align="center"><img src="docs/in_the_wild_examples/cyclist_02.jpg" width="860"></td>
93
- </tr>
94
- <tr>
95
- <td align="center"><img src="docs/in_the_wild_examples/stroller_01.jpg" width="860"></td>
96
- </tr>
97
- <tr>
98
- <td align="center"><img src="docs/in_the_wild_examples/stroller_02.jpg" width="860"></td>
99
- </tr>
100
- </table>
101
-
102
- #### 3. Other
103
-
104
- <table>
105
- <tr>
106
- <td align="center"><img src="docs/in_the_wild_examples/bin_01.jpg" width="860"></td>
107
- </tr>
108
- </table>
109
-
110
- </details>
111
-
112
- ### **License/Terms of Use**:
113
-
114
- ### Governing Terms: Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) .
115
 
116
- **Deployment Geography:** Global
117
 
118
- ### **Release Management:**
119
-
120
- This system is exposed as a collection of models on [HuggingFace](https://huggingface.co/nvidia/asset-harvester) and inference scripts on [Github](https://github.com/NVIDIA/asset-harvester).
121
 
122
  ## **Automation Level:**
123
 
124
- Partial Automation
125
 
126
  ## **Use Case:**
127
 
128
- Physical AI developers who are looking to create 3D assets of vehicles or VRUs for either closed-loop simulation or Synthetic Data Generation (SDG).
129
 
130
  ## **Known Technical Limitations:**
131
 
132
- The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle can generate a poor or hallucinated 3D asset.
133
-
134
 
135
- ## Known Risk(s):
136
 
137
  AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations.
138
 
139
- ##
140
 
141
- **Reference(s):** _(coming soon)_
142
 
143
- [Asset Harvester: Turning Autonomous Driving Logs into 3D Assets for Simulation]()
144
 
145
  ## **System Architecture**
146
- System architecture details described in white paper above.
147
 
148
  ## **System Input:**
149
 
150
- **Input Type(s):** 1 or more images (up until 4\)
151
- **Input Format:** Red, Green, Blue (RGB)
152
- **Input Parameters:** Two-Dimensional (2D)
153
- **Other Properties Related to Input:**
154
 
155
- We currently accept up to 4 input images for each object. The resolution of the images are 512x512. The input images are extracted from NVIDIAs NCore data along w/ other metadata needed for downstream processing:
156
 
157
- * Camera orientation of each image
158
- * Camera distance of each image
159
- * Camera field of view of each image
160
  * Bounding box dimensions of each object
161
 
162
  ## **System Output:**
163
 
164
- **Output Type(s):** Corresponding 3D Gaussian asset to the object in input images
165
- **Output Format:** Polygon File Format (PLY)
166
- **Output Parameters:** Three-Dimensional (3D)
167
- **Other Properties Related to Output:**
168
 
169
- A PLY file (3D Gaussian Splatting, 3DGS) contains 3D object data with the following specific components:
170
 
171
- * **Header**: Defines the file structure, including format (ASCII or binary), Gaussian elements, their properties (e.g., position, appearance coefficients, opacity, scale, rotation), and data types (e.g., float, int).
172
- * **Gaussian Data**: Stores the parameters of each 3D Gaussian as vertex elements: center position (`x`, `y`, `z`), spherical harmonics DC coefficients (`f_dc_0`, `f_dc_1`, `f_dc_2`), `opacity`, anisotropic scale (`scale_0`, `scale_1`, `scale_2`), and rotation quaternion (`rot_0`, `rot_1`, `rot_2`, `rot_3`).
173
 
174
  ## **Hardware Compatibility:**
175
 
176
  **Supported Hardware Microarchitecture Compatibility:**
177
 
178
- * NVIDIA Ampere
179
- * NVIDIA Blackwell
180
- * NVIDIA Hopper
181
  * NVIDIA Lovelace
182
 
183
  **Preferred/Supported Operating Systems:** Linux
@@ -186,14 +103,14 @@ A PLY file (3D Gaussian Splatting, 3DGS) contains 3D object data with the follow
186
 
187
  The system can run on a single NVIDIA GPU with CUDA Compute Capability greater than or equal to 8.0. The following is required:
188
 
189
- * GPU performance \>= 300 Tflops
190
- * GPU memory size \>= 30GB
191
- * GPU memory bandwidth \>= 768 GB/s
192
- * System RAM \>= 32 GB
193
- * System disk storage \>= 100GB
194
  * CPU \>= 16 threads x 3GHz
195
 
196
-
197
 
198
  ## **System Version:**
199
 
@@ -201,7 +118,7 @@ Asset\_Harvester\_GA
201
 
202
  ## **Inference:**
203
 
204
- **Engine:** Pytorch
205
  **Test Hardware:** A100, H100
206
 
207
  ## **Ethical Considerations:**
@@ -214,30 +131,30 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
214
 
215
  ## Model Card++
216
 
217
- **Bias**
218
 
219
  | Field | Response |
220
  | :---- | :---- |
221
  | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
222
  | Measures taken to mitigate against unwanted bias: | None |
223
 
224
- **Explainability**
225
 
226
  | Field | Response |
227
  | :---- | :---- |
228
- | Intended Domain | Autonomous Driving Simulation |
229
  | Model Type: | Image-to-3D Asset |
230
  | Intended Users: | Autonomous Vehicles developers enhancing and improving Neural Reconstruction pipelines. |
231
  | Output | 3D Asset |
232
- | Describe how the model works | The system takes as an input one or few images, and outputs a corresponding 3D asset |
233
  | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
234
- | Technical Limitations | The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle image can generate a poor or hallucinated 3D asset |
235
  | Verified to have met prescribed NVIDIA quality standards | Yes |
236
  | Performance Metrics | PSNR (Peak Signal-to-Noise Ratio) |
237
  | Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
238
- | Licensing | Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
239
 
240
- **Privacy**
241
 
242
  | Field | Response |
243
  | :---- | :---- |
@@ -255,11 +172,17 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
255
  | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
256
  | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
257
 
258
- **Safety & Security**
259
 
260
  | Field | Response |
261
  | :---- | :---- |
262
  | Model Application(s): | 3D Asset Generation |
263
  | Describe the life critical impact (if present). | N/A \- The system should not be deployed in a vehicle to perform life-critical tasks. |
264
- | Use Case Restrictions: | Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
265
  | Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training |
 
1
  ---
2
  language:
3
+ - en
4
  license: other
5
  license_name: nvidia-open-model-license
6
  license_link: >-
7
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
8
  tags:
9
+ - nvidia
10
+ - asset-harvester
11
+ - image-to-3d
12
+ - 3d-generation
13
+ - gaussian-splatting
14
+ - physical-ai
15
  pipeline_tag: image-to-3d
16
  ---
17
 
18
  # Asset Harvester | System Model Card
19
 
20
  ## **Description:**
21
 
22
+ Asset Harvester is a system that leverages four models (see System Architecture below) to generate three-dimensional (3D) assets from a single image or multiple images of vehicles. [Mask2Former](https://docs.google.com/document/d/1OKMAhNruoLE254xLLdIWULPuwUWGNsbpg36BNUnpTSQ/edit?tab=t.0#heading=h.7axn5fq6ipu5) and [C-RADIO](https://huggingface.co/nvidia/C-RADIO) are used for view extraction from NCore data sessions; the [Multiview Diffusion (Sana-based)](https://docs.google.com/document/d/1y7qU1to8TrV07Tfz3crxJiuA_AL0Wlwwp6C-RW-NoLg/edit?tab=t.0#heading=h.g8ogslbqcx12) model is then used to generate 16 multiview images of the input vehicle; and lastly [TokenGS](https://docs.google.com/document/d/1EZWB-had-1MMmrES9bvQlJHpjXQFawR619sX3HvsVpQ/edit?usp=sharing) generates the output 3D asset.
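The four-stage flow described above can be sketched as follows. All function bodies are placeholder stubs for illustration only, not the actual Asset Harvester API:

```python
# Hedged sketch of the four-stage Asset Harvester flow; every function
# below is a placeholder stub, not the real implementation.
def segment(views):
    # Mask2Former: per-view object masks
    return [f"mask({v})" for v in views]

def encode(views, masks):
    # C-RADIO: encode masked views into image features
    return list(zip(views, masks))

def diffuse_multiview(feats):
    # Multiview Diffusion (Sana-based): 16 consistent novel views
    return [f"view_{i:02d}" for i in range(16)]

def lift_to_3d(mv_images):
    # TokenGS: lift the generated views to a 3D Gaussian asset (PLY)
    return {"format": "ply", "n_views_used": len(mv_images)}

def harvest_asset(views):
    masks = segment(views)
    feats = encode(views, masks)
    return lift_to_3d(diffuse_multiview(feats))
```

The stubs only illustrate the data handoff between stages: masks gate the encoding, 16 generated views feed the lifter regardless of how many input views were provided.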
23
 
24
  This system is ready for commercial and non-commercial use.
25
 
26
+ ### **License/Terms of Use**:
27
 
28
+ ### GOVERNING TERMS: Your use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
29
 
30
+ **Deployment Geography:** Global
31
 
32
  ## **Automation Level:**
33
 
34
+ Full Automation
35
 
36
  ## **Use Case:**
37
 
38
+ Physical AI developers who are looking to create 3D assets of vehicles for either closed-loop simulation or Synthetic Data Generation (SDG).
39
 
40
  ## **Known Technical Limitations:**
41
 
42
+ The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle can generate a poor or hallucinated 3D asset.
43
 
44
+ ## Known Risk(s):
45
 
46
  AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations.
47
 
48
+ ##
49
 
50
+ **Release Date:** Public GitHub \[03/12/2026\]
51
 
52
+ **Reference(s):** None
53
 
54
  ## **System Architecture**
55
+
56
+ **Architecture Diagram:**
57
+
58
+ The following models are used by this system:
59
+
60
+ * [Mask2Former Model Card](https://docs.google.com/document/d/1OKMAhNruoLE254xLLdIWULPuwUWGNsbpg36BNUnpTSQ/edit?tab=t.0#heading=h.7axn5fq6ipu5)
61
+ * [C-RADIO Model Card](https://huggingface.co/nvidia/C-RADIO)
62
+ * [Multiview Diffusion (Sana-based) Model Card](https://docs.google.com/document/d/1y7qU1to8TrV07Tfz3crxJiuA_AL0Wlwwp6C-RW-NoLg/edit?tab=t.0#heading=h.g8ogslbqcx12)
63
+ * [TokenGS Model Card](https://docs.google.com/document/d/1EZWB-had-1MMmrES9bvQlJHpjXQFawR619sX3HvsVpQ/edit?usp=sharing)
64
 
65
  ## **System Input:**
66
 
67
+ **Input Type(s):** 1 or more images (up to 4)
68
+ **Input Format:** Red, Green, Blue (RGB)
69
+ **Input Parameters:** Two-Dimensional (2D)
70
+ **Other Properties Related to Input:**
71
 
72
+ We currently accept up to 4 input images for each object. Each image has a resolution of 512x512. The input images are extracted from NVIDIA's NCore data along with other metadata needed for downstream processing:
73
 
74
+ * Camera orientation of each image
75
+ * Camera distance of each image
76
+ * Camera field of view of each image
77
  * Bounding box dimensions of each object
78
 
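As a rough illustration, the per-object input described above could be bundled as follows. The field names are assumptions for illustration, not the actual NCore schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ObjectViewInput:
    """Hypothetical container for one object's input views and metadata."""
    images: List[str]                 # paths to 1-4 RGB crops, 512x512 each
    camera_orientations: List[float]  # per-image camera orientation
    camera_distances: List[float]     # per-image camera-to-object distance
    camera_fovs: List[float]          # per-image field of view
    bbox_dimensions: Tuple[float, float, float]  # object length, width, height

    def __post_init__(self):
        # The system accepts between 1 and 4 input views per object.
        if not 1 <= len(self.images) <= 4:
            raise ValueError("the system accepts between 1 and 4 input images")
```

A record like this would carry everything the downstream stages need: the views themselves plus the per-view camera metadata and the object's bounding-box dimensions.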
79
  ## **System Output:**
80
 
81
+ **Output Type(s):** 3D Gaussian asset corresponding to the object in the input images
82
+ **Output Format:** Polygon File Format (PLY)
83
+ **Output Parameters:** Three-Dimensional (3D)
84
+ **Other Properties Related to Output:**
85
 
86
+ A [PLY file](https://en.wikipedia.org/wiki/PLY_(file_format)#:~:text=PLY%20is%20a%20computer%20file,dimensional%20data%20from%203D%20scanners.) (3D Gaussian Splatting, 3DGS) contains 3D object data with the following specific components:
87
 
88
+ * **Header**: Defines the file structure, including format (ASCII or binary), Gaussian elements, their properties (e.g., position, appearance coefficients, opacity, scale, rotation), and data types (e.g., float, int).
89
+ * **Gaussian Data**: Stores the parameters of each 3D Gaussian, including its center position (`x, y, z`), and optionally properties such as normals (`nx, ny, nz`), color or spherical harmonics coefficients (`f_dc_0, f_dc_1, f_dc_2`, and higher-order terms), opacity, anisotropic scale, and rotation.
90
 
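A minimal sketch of inspecting such a file's header, using standard PLY header parsing. The property names shown in the test are examples drawn from the description above:

```python
# Hedged sketch: read a 3DGS PLY header and list its per-Gaussian
# properties. Works on ASCII and binary PLY files, since the header
# itself is always plain ASCII lines terminated by "end_header".
def read_ply_header(path):
    """Return (format, vertex_count, property_names) from a PLY header."""
    props, fmt, count = [], None, 0
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="ignore").strip()
            if line.startswith("format"):
                fmt = line.split()[1]           # e.g. "binary_little_endian"
            elif line.startswith("element vertex"):
                count = int(line.split()[-1])   # number of Gaussians
            elif line.startswith("property"):
                props.append(line.split()[-1])  # e.g. "x", "f_dc_0", "opacity"
            elif line == "end_header":
                break
    return fmt, count, props
```

This is enough to sanity-check a harvested asset, e.g. that the expected position, DC color, opacity, scale, and rotation properties are present before loading the Gaussian data itself.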
91
  ## **Hardware Compatibility:**
92
 
93
  **Supported Hardware Microarchitecture Compatibility:**
94
 
95
+ * NVIDIA Ampere
96
+ * NVIDIA Blackwell
97
+ * NVIDIA Hopper
98
  * NVIDIA Lovelace
99
 
100
  **Preferred/Supported Operating Systems:** Linux
 
103
 
104
  The system can run on a single NVIDIA GPU with CUDA Compute Capability greater than or equal to 8.0. The following is required:
105
 
106
+ * GPU performance \>= 300 Tflops
107
+ * GPU memory size \>= 30GB
108
+ * GPU memory bandwidth \>= 768 GB/s
109
+ * System RAM \>= 32 GB
110
+ * System disk storage \>= 100GB
111
  * CPU \>= 16 threads x 3GHz
112
 
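The minimum requirements above can be expressed as a simple host check. The `specs` keys are illustrative names, not output of any NVIDIA tool:

```python
# Hedged sketch: check a host description against the minimum system
# requirements listed above. Key names are illustrative assumptions.
MIN_REQS = {
    "gpu_tflops": 300,   # GPU performance >= 300 Tflops
    "gpu_mem_gb": 30,    # GPU memory size >= 30 GB
    "gpu_bw_gbps": 768,  # GPU memory bandwidth >= 768 GB/s
    "ram_gb": 32,        # system RAM >= 32 GB
    "disk_gb": 100,      # system disk storage >= 100 GB
    "cpu_threads": 16,   # CPU >= 16 threads
    "cpu_ghz": 3,        # ... at >= 3 GHz
}

def meets_requirements(specs):
    """True if every listed minimum is met; missing keys count as 0."""
    return all(specs.get(k, 0) >= v for k, v in MIN_REQS.items())
```

For example, an A100-class host comfortably passes, while an under-provisioned GPU fails on the first threshold.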
113
+ ##
114
 
115
  ## **System Version:**
116
 
 
118
 
119
  ## **Inference:**
120
 
121
+ **Engine:** PyTorch
122
  **Test Hardware:** A100, H100
123
 
124
  ## **Ethical Considerations:**
 
131
 
132
  ## Model Card++
133
 
134
+ ### Bias
135
 
136
  | Field | Response |
137
  | :---- | :---- |
138
  | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
139
  | Measures taken to mitigate against unwanted bias: | None |
140
 
141
+ ### Explainability
142
 
143
  | Field | Response |
144
  | :---- | :---- |
145
+ | Intended Domain | Advanced Driver Assistance Systems |
146
  | Model Type: | Image-to-3D Asset |
147
  | Intended Users: | Autonomous Vehicles developers enhancing and improving Neural Reconstruction pipelines. |
148
  | Output | 3D Asset |
149
+ | Describe how the model works | The system takes an image as input and outputs a corresponding 3D asset |
150
  | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
151
+ | Technical Limitations | The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle can generate a poor or hallucinated 3D asset |
152
  | Verified to have met prescribed NVIDIA quality standards | Yes |
153
  | Performance Metrics | PSNR (Peak Signal-to-Noise Ratio) |
154
  | Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
155
+ | Licensing | The use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
156
 
157
+ ### Privacy
158
 
159
  | Field | Response |
160
  | :---- | :---- |
 
172
  | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
173
  | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
174
 
175
+ ### Safety & Security
176
 
177
  | Field | Response |
178
  | :---- | :---- |
179
  | Model Application(s): | 3D Asset Generation |
180
  | Describe the life critical impact (if present). | N/A \- The system should not be deployed in a vehicle to perform life-critical tasks. |
181
+ | Use Case Restrictions: | Abide by [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
182
  | Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Dataset access restrictions are enforced during training |
183
+
184
+ [image1]: images/image1.png
185
+
186
+ [image2]: images/image2.png
187
+
188
+ [image3]: images/image3.png
config.json DELETED
@@ -1,27 +0,0 @@
1
- {
2
- "format_version": 1,
3
- "name": "Asset Harvester",
4
- "description": "Bundle manifest for the Asset Harvester system model repository.",
5
- "components": [
6
- {
7
- "name": "camera_estimator",
8
- "file": "AH_camera_estimator.safetensors",
9
- "role": "camera estimation"
10
- },
11
- {
12
- "name": "multiview_diffusion",
13
- "file": "AH_multiview_diffusion.safetensors",
14
- "role": "multiview image generation"
15
- },
16
- {
17
- "name": "object_segmentation",
18
- "file": "AH_object_seg_jit.pt",
19
- "role": "object segmentation"
20
- },
21
- {
22
- "name": "tokengs_lifting",
23
- "file": "AH_tokengs_lifting.safetensors",
24
- "role": "3D lifting"
25
- }
26
- ]
27
- }
 
docs/in_the_wild_examples/bin_01.jpg DELETED
Binary file (55.4 kB)
 
docs/in_the_wild_examples/bus_01.jpg DELETED
Binary file (53.4 kB)
 
docs/in_the_wild_examples/cyclist_02.jpg DELETED
Binary file (64.5 kB)
 
docs/in_the_wild_examples/pedestrian_01.jpg DELETED
Binary file (57.3 kB)
 
docs/in_the_wild_examples/pedestrian_03.jpg DELETED
Binary file (59 kB)
 
docs/in_the_wild_examples/pedestrian_04.jpg DELETED
Binary file (37.8 kB)
 
docs/in_the_wild_examples/pedestrian_05.jpg DELETED
Binary file (52.3 kB)
 
docs/in_the_wild_examples/pedestrian_06.jpg DELETED
Binary file (48.2 kB)
 
docs/in_the_wild_examples/sedan_01.jpg DELETED
Binary file (52.2 kB)
 
docs/in_the_wild_examples/sedan_02.jpg DELETED
Binary file (50.2 kB)
 
docs/in_the_wild_examples/stroller_01.jpg DELETED
Binary file (76.2 kB)
 
docs/in_the_wild_examples/stroller_02.jpg DELETED
Binary file (72.1 kB)
 
docs/in_the_wild_examples/suv_01.jpg DELETED
Binary file (49.8 kB)
 
docs/in_the_wild_examples/suv_02.jpg DELETED
Binary file (43.8 kB)
 
docs/in_the_wild_examples/tractor_01.jpg DELETED
Binary file (62.5 kB)
 
docs/in_the_wild_examples/trailer_01.jpg DELETED
Binary file (41.4 kB)
 
docs/in_the_wild_examples/truck_01.jpg DELETED
Binary file (69.2 kB)
 
model_cards/AV_Object_Mask2former.md DELETED
@@ -1,144 +0,0 @@
1
- # Mask2Former Overview | Model Card
2
-
3
- ## **Description:**
4
-
5
- The AV Object Mask2Former is a model that performs object instance segmentation tasks. It was trained on object-centric AV images.
6
-
7
- This model is used in the Asset Harvester System.
8
-
9
- ### **License/Terms of Use:**
10
-
11
- GOVERNING TERMS: The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
12
-
13
- ### **Deployment Geography:**
14
-
15
- Global
16
-
17
- ### **Use Case:**
18
-
19
- The model can be used for segmenting object-centric AV images. Given an image cropped from AV video, it output binary mask of the object in the center of the image.
20
-
21
- ### **Release Date:**
22
-
23
- HuggingFace 03/16/26
24
-
25
- ## **Reference:**
26
-
27
- [Bowen Cheng](https://arxiv.org/search/cs?searchtype=author&query=Cheng,+B), [Ishan Misra](https://arxiv.org/search/cs?searchtype=author&query=Misra,+I), [Alexander G. Schwing](https://arxiv.org/search/cs?searchtype=author&query=Schwing,+A+G), [Alexander Kirillov](https://arxiv.org/search/cs?searchtype=author&query=Kirillov,+A), [Rohit Girdhar](https://arxiv.org/search/cs?searchtype=author&query=Girdhar,+R), Masked-attention Mask Transformer for Universal Image Segmentation, [https://arxiv.org/abs/2112.01527](https://arxiv.org/abs/2112.01527).
28
-
29
- ## **Model Architecture:**
30
-
31
- * Fully Convolutional Networks (FCNs) + Transformer
32
-
33
- ## **Input:**
34
-
35
- * **Input Type(s):** Image
36
- * **Input Format(s):** Red, Green, Blue (RGB)
37
- * **Input Parameters:** The input parameters to this model are 2D query features (X0) and 3D image features (Kl, Vl) with dimensions N x C, where N is the number of query features and C is the number of channels.
38
- * **Other Properties Related to Input:** Spatial resolution of image features: 32, 16, 8.
39
-
40
- ## **Output:**
41
-
42
- * **Output Type(s):** Image
43
- * **Output Format(s):** Binary mask
44
- * **Output Parameters:** The output parameters of this model are the predicted mask for each query, with dimensions of the input query features being N x C, where N is the number of query features and C is the number of channels.
45
- * **Other Properties Related to Output:** Resolution: H1=H=32, H2=H=16, H3=H=8 and W1=W=32, W2=W=16
46
-
47
- Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
48
-
49
- ## **Software Integration:**
50
-
51
- **Runtime Engine(s):**
52
- PyTorch
53
-
54
- **Supported Hardware Microarchitecture Compatibility:**
55
-
56
- * NVIDIA Ampere
57
- * NVIDIA Blackwell
58
- * NVIDIA Hopper
59
- * NVIDIA Lovelace
60
-
61
- **[Preferred/Supported] Operating System(s):**
62
- Linux
63
-
64
- ## **Model Version(s):**
65
-
66
- V1
67
-
68
- ## **Training, Testing, and Evaluation Datasets:**
69
-
70
- The AV Object Mask2former was trained, tested, and evaluated using NVIDIA proprietary AV dataset.
71
-
72
- | Dataset names | Size and content | Training partition | Test partition |
73
- | :---- | :---- | :---- | :---- |
74
- | Internal Nvidia AV dataset | Posed images of 278k objects | 83% (cross validation) | 17% |
75
-
76
- ### Internal NVIDIA AV dataset
77
-
78
- **Link:** N/A
79
-
80
- **Data Collection Method:** Sensors
81
-
82
- **Labeling Method by Dataset:** Automated. The labels we collected are binary masks of objects in the images.
83
-
84
- **Properties**: This dataset was collected using sensors mounted on the NVIDIA fleet and was auto-labeled using a third party tool to ensure high-quality annotations.
85
-
86
- ## **Inference:**
87
-
88
- **Engine:**
89
- PyTorch
90
-
91
- **Test Hardware:**
92
- A6000
93
-
94
- ## **Ethical Considerations:**
95
-
96
- NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
97
-
98
- For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
99
-
100
- Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
101
-
102
- **Bias**
103
-
104
- | Field | Response |
105
- | :---- | :---- |
106
- | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
107
- | Measures taken to mitigate against unwanted bias: | None |
108
-
109
- **Explainability**
110
-
111
- | Field | Response |
112
- | :---- | :---- |
113
- | Intended Domain | Advanced Driver Assistance Systems |
114
- | Model Type: | Object detection and Instance segmentation |
115
- | Intended Users: | Autonomous Vehicles developers enhancing and improving Neural Reconstruction pipelines. |
116
- | Output | Image Segmentation |
117
- | Describe how the model works | The model takes as an input an image, and outputs a segmentation mask of the image |
118
- | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
119
- | Technical Limitations | The system does not guarantee a 100% success rate. The model was trained mostly on vehicles and would not perform well on pedestrians, cyclists, or other non-vehicular objects and struggles with small objects |
120
- | Verified to have met prescribed NVIDIA quality standards | Yes |
121
- | Performance Metrics | Intersection over Union (IOU) |
122
- | Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
123
- | Licensing | The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
124
-
125
- **Privacy**
-
- | Field | Response |
- | :---- | :---- |
- | Generatable or reverse engineerable personal data? | No |
- | Personal data used to create this model? | No |
- | How often is the dataset reviewed? | Before release |
- | Is there provenance for all datasets used in training? | Yes |
- | Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
- | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
- | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
-
- **Safety & Security**
-
- | Field | Response |
- | :---- | :---- |
- | Model Application(s): | Object detection and segmentation |
- | Describe the life critical impact (if present). | N/A \- The model should not be deployed in a vehicle to perform life-critical tasks. |
- | Use Case Restrictions: | The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
- | Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied to limit access for dataset generation and model development. Dataset access is restricted during training, and dataset license constraints are adhered to. |
 
model_cards/MultiviewDiffusion.md DELETED
@@ -1,170 +0,0 @@
- # Multiview Diffusion (Sana-based) | Model Card
-
- ## **Description:**
-
- The multiview diffusion model was trained on AV object images with a SANA base model. The model is conditioned on image input and outputs images of the same object from different viewpoints. It does not support text input.
-
- This model is used as part of the Asset Harvester GA.
-
- ### **License/Terms of Use:**
-
- ### Governing Terms: Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
-
- ### **Deployment Geography:**
-
- Global
-
- ### **Use Case:**
-
- The multiview diffusion model takes a set of posed images as input and outputs 16 images of the same input vehicle from different viewpoints. Its goal is to provide the 16 output images as input for three-dimensional (3D) reconstruction to generate 3D assets.
-
- ### **Release Date:**
-
- HuggingFace
-
- ## **Reference(s):**
-
- **Asset-Harvester: Turning Autonomous Driving Logs into 3D Assets for Simulation.** *NVIDIA white paper.*
- \[later we replace it with our paper link\]
-
- ## **Model Architecture:**
-
- **Architecture Type:** Linear Diffusion Transformer
-
- **Network Architecture:** Sparse View Linear-attention Diffusion Transformer, as described in our white paper, with a Deep Compression Autoencoder (DC-AE) for efficient high-resolution image generation and C-RADIO for the image conditioning signal.
-
- ## **Input:**
-
- **Input Type(s):** Up to 4 images (adjustable via config parameter)
-
- **Input Format(s):** Red, Green, Blue (RGB)
-
- **Input Parameters:** Two-Dimensional (2D)
-
- **Other Properties Related to Input:** Camera matrices of the input images
-
- ## **Output:**
-
- **Output Type(s):** 16 Images
-
- **Output Format(s):** Red, Green, Blue (RGB)
-
- **Output Parameters:** Two-Dimensional (2D)
-
- **Other Properties Related to Output:** Camera poses of the output images
-
- Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
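The 16 output viewpoints come with camera poses. Purely as an illustration (the actual camera layout used by the model is defined in the Asset Harvester codebase, and the names below are hypothetical), a ring of 16 azimuth-spaced cameras looking at the object center could be generated like this:

```python
import math

def ring_cameras(n_views=16, radius=2.0, elevation_deg=15.0):
    """Toy camera rig: n_views positions evenly spaced in azimuth on a
    circle around the object, each with a unit forward vector pointing
    at the origin (where the object is assumed to sit). Illustrative
    only; not the model's actual camera convention."""
    elev = math.radians(elevation_deg)
    cams = []
    for i in range(n_views):
        az = 2.0 * math.pi * i / n_views
        pos = (radius * math.cos(elev) * math.cos(az),
               radius * math.cos(elev) * math.sin(az),
               radius * math.sin(elev))
        norm = math.sqrt(sum(c * c for c in pos))
        forward = tuple(-c / norm for c in pos)  # unit vector toward the origin
        cams.append({"position": pos, "forward": forward})
    return cams

rig = ring_cameras()
print(len(rig))  # 16
```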
-
- ## **Software Integration:**
-
- **Runtime Engine(s):**
- PyTorch
-
- **Supported Hardware Microarchitecture Compatibility:**
- NVIDIA Ampere
-
- **Supported Operating System(s):**
- Linux
-
- ## **Model Version(s):**
-
- v1
-
- ## **Training, Testing, and Evaluation Datasets:**
-
- The model was trained, tested, and finetuned using an Objaverse subset, internal AV data, and Omniverse 3D assets (synthetic images).
-
- | Dataset names | Size and content | Training partition | Test partition |
- | :---- | :---- | :---- | :---- |
- | NVIDIA Proprietary AV dataset | Posed images of 278k objects | 83% (cross validation) | 17% |
- | Omniverse 3D assets | 200 3D assets of objects | 100% | 0% |
- | Objaverse | 80k assets collected under commercially viable Creative Commons licenses | 100% | 0% |
-
- ### Objaverse Commercially Viable Subset under CC licenses
-
- **Link:** https://objaverse.allenai.org
- **Data Collection Method:** Synthetic 3D assets aggregated from various open-source and licensed sources
- **Labeling Method by Dataset:** Hybrid: Human and Automated
- **Properties:** This dataset consists of a diverse set of over 80,000 synthetic 3D object models spanning everyday items, animals, tools, and complex structures. Each model is rendered into multi-view 2D images with associated camera poses, materials, and mesh properties.
-
- ### NVIDIA Proprietary AV dataset
-
- **Data Collection Method:** Sensors
-
- **Labeling Method by Dataset:** Human
-
- **Properties**: This dataset was collected using sensors mounted on the NVIDIA fleet and was manually labeled by a team of human annotators to ensure high-quality annotations.
-
- ### Omniverse 3D assets
-
- **Data Collection Method:** Human
-
- **Labeling Method by Dataset:** Human
-
- **Properties**: This dataset consists of 3D assets created by human artists.
-
- ## **Inference:**
-
- **Engine:** PyTorch>=2.0.0
-
- **Test Hardware:**
- We tested on H100, A100, A6000, and RTX 4090. Inference time on 1x A100 is 7 seconds per 16 images.
-
- ## **Ethical Considerations:**
-
- NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
-
- For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
-
- Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
-
- **Bias**
-
- | Field | Response |
- | :---- | :---- |
- | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
- | Measures taken to mitigate against unwanted bias: | None |
-
- **Explainability**
-
- | Field | Response |
- | :---- | :---- |
- | Intended Domain | Advanced Driver Assistance Systems |
- | Model Type: | Multiview creation |
- | Intended Users: | Autonomous vehicle developers enhancing and improving Neural Reconstruction pipelines. |
- | Output | 16 images |
- | Describe how the model works | The model takes up to 4 images as input and outputs 16 multiview images of the vehicles detected in the original image. |
- | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
- | Technical Limitations | The system does not guarantee a 100% success rate. It cannot fully guarantee the safety and controllability of the generated image content. Additionally, challenges remain in certain complex cases, such as text rendering and the generation of faces and hands. |
- | Verified to have met prescribed NVIDIA quality standards | Yes |
- | Performance Metrics | Peak signal-to-noise ratio (PSNR), Frechet Inception Distance (FID), CLIPScore |
- | Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
- | Licensing | The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
-
- **Privacy**
-
- | Field | Response |
- | :---- | :---- |
- | Generatable or reverse engineerable personal data? | No |
- | Personal data used to create this model? | Yes |
- | Was consent obtained for any personal data used? | Yes |
- | Is a mechanism in place to honor data subject right of access or deletion of personal data? | Yes |
- | If personal data was collected for the development of the model, was it collected directly by NVIDIA? | No |
- | If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | N/A |
- | If personal data was collected for the development of this AI model, was it minimized to only what was required? | Yes |
- | Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | Yes |
- | How often is the dataset reviewed? | Before release |
- | Is there provenance for all datasets used in training? | Yes |
- | Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
- | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
- | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
-
- **Safety & Security**
-
- | Field | Response |
- | :---- | :---- |
- | Model Application(s): | Multiview creation |
- | Describe the life critical impact (if present). | N/A \- The model should not be deployed in a vehicle to perform life-critical tasks. |
- | Use Case Restrictions: | The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
- | Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied to limit access for dataset generation and model development. Dataset access is restricted during training, and dataset license constraints are adhered to. |
 
model_cards/Object_TokenGS.md DELETED
@@ -1,146 +0,0 @@
- # Object TokenGS | Model Card
-
- ## **Description:**
-
- Object TokenGS is a feed-forward neural reconstruction model that takes posed multi-view RGB images as input and predicts a 3D Gaussian Splatting (3DGS) representation of the object.
- TokenGS directly regresses 3D Gaussian centers in global coordinates and decouples the number of predicted Gaussians from the input image resolution and number of views by using learnable Gaussian tokens in an encoder-decoder Transformer.
-
- ### **License/Terms of Use:**
- The model is a submodule that follows the terms of [Asset Harvester](https://huggingface.co/nvidia/asset-harvester).
-
- ### **Deployment Geography:**
-
- Global
-
- ### **Use Case:**
-
- Object TokenGS can be used for multi-view 3D object lifting. It takes multiview images as input and converts them into 3D Gaussian assets.
-
- ### **Release Date:**
-
- This model is on [HuggingFace](https://huggingface.co/nvidia/asset-harvester) and the inference script is on [GitHub](https://github.com/NVIDIA/asset-harvester).
-
- ## **References(s):**
-
- - [Asset-Harvester: Turning Autonomous Driving Logs into 3D Assets for Simulation]()
-
- ## **Model Architecture:**
-
- System architecture details are described in the white paper above.
-
- ## **Input:**
-
- **Input Type(s):** Image
- **Input Format(s):** Red, Green, Blue (RGB) images plus camera parameters
- **Input Parameters:** Two-Dimensional (2D) images with camera intrinsics and extrinsics; optional timestamp conditioning for dynamic reconstruction
- **Other Properties Related to Input:**
-
- - Input includes camera intrinsics and camera extrinsics.
- - Images with resolution `512 x 512`
-
- ## **Output:**
-
- **Output Type(s):** 3D Gaussian Splatting primitives and rendered RGB images
- **Output Format(s):** 3DGS parameter tensors (14 attributes per Gaussian primitive) renderable to novel RGB views via a differentiable Gaussian splatting renderer
- **Output Parameters:** 14-dimensional (14D) Gaussian attributes
- **Other Properties Related to Output:**
-
- Each Gaussian includes:
-
- - Mean or center: `(x, y, z)`
- - Color: `(r, g, b)`
- - Scale: `(sx, sy, sz)`
- - Opacity: `alpha`
- - Rotation: quaternion `(qw, qx, qy, qz)`
-
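The attributes above account for all 14 dimensions per primitive: mean (3) + color (3) + scale (3) + opacity (1) + quaternion (4). A minimal sketch of unpacking one 14-attribute vector; the actual tensor layout of the checkpoint is defined by the Asset Harvester codebase, so the ordering used here is illustrative only:

```python
def unpack_gaussian(attrs):
    """Split one 14-attribute Gaussian primitive into named fields.
    The field order here is an assumption for illustration, not the
    checkpoint's authoritative layout."""
    assert len(attrs) == 14, "each primitive carries 14 attributes"
    return {
        "mean":     attrs[0:3],    # (x, y, z) center in global coordinates
        "color":    attrs[3:6],    # (r, g, b)
        "scale":    attrs[6:9],    # (sx, sy, sz)
        "opacity":  attrs[9],      # alpha
        "rotation": attrs[10:14],  # quaternion (qw, qx, qy, qz)
    }

g = unpack_gaussian([0.0, 1.0, 2.0, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.9,
                     1.0, 0.0, 0.0, 0.0])
print(g["rotation"])  # [1.0, 0.0, 0.0, 0.0] (identity quaternion)
```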
- Our AI models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA hardware and CUDA-enabled software frameworks, the model achieves faster training and inference times compared to CPU-only solutions.
-
- ## **Software Integration:**
-
- **Supported Hardware Microarchitecture Compatibility:**
-
- - NVIDIA Ampere
- - NVIDIA Blackwell
- - NVIDIA Hopper
- - NVIDIA Lovelace
-
- **Supported Operating System(s):**
-
- - Linux
-
- The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
-
- ## **Model Version:**
-
- Asset\_Harvester\_GA
-
- ## **Training, Testing, and Evaluation Datasets:**
-
- Details are described in the white paper above.
-
- ## **Inference:**
-
- **Acceleration Engine:** PyTorch
- **Test Hardware:** NVIDIA A100, H100
-
- ## **Ethical Considerations:**
-
- NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with the license terms, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
-
- For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
-
- Please make sure you have proper rights and permissions for all input image and video content; if an image or video includes people, personal health information, or intellectual property, the generated image or video will not automatically blur or maintain the proportions of the image subjects included.
-
- Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
-
- **Bias**
-
- | Field | Response |
- | :---- | :---- |
- | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
- | Measures taken to mitigate against unwanted bias: | None |
-
- **Explainability**
-
- | Field | Response |
- | :---- | :---- |
- | Intended Task/Domain: | Multi-view 3D object reconstruction. |
- | Model Type: | Transformer |
- | Intended Users: | 3D vision, simulation, graphics, and robotics or physical AI researchers and developers. |
- | Output | 3D Gaussian Splat representation and rendered novel views. |
- | Describe how the model works | An encoder-decoder Transformer with learnable Gaussian tokens directly regresses 3D Gaussian attributes from posed images; it is trained with rendering and visibility losses. |
- | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
- | Technical Limitations & Mitigation | TokenGS may miss fine-grained geometric details. Quality depends on camera pose quality and multiview coverage, so users should validate outputs and provide sufficient view diversity and accurate camera metadata. |
- | Verified to have met prescribed NVIDIA quality standards | Yes |
- | Performance Metrics | PSNR, SSIM, LPIPS; additional comparisons under view extrapolation and camera-noise robustness. |
- | Potential Known Risks | Reconstruction failures or incomplete geometry may produce misleading renderings or assets. |
- | Licensing | The use of the model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
-
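PSNR, listed under Performance Metrics above, follows the standard definition 10 · log10(MAX² / MSE). A minimal sketch over flat pixel sequences (the repository's evaluation code is the authoritative implementation):

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two equal-length pixel
    sequences with values in [0, max_val]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# MSE = 0.005, so PSNR = 10 * log10(1 / 0.005) = 10 * log10(200)
print(round(psnr([0.5, 0.5], [0.5, 0.6]), 2))  # 23.01
```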
- **Privacy**
-
- | Field | Response |
- | :---- | :---- |
- | Generatable or reverse engineerable personal data? | No |
- | Personal data used to create this model? | No |
- | Was consent obtained for any personal data used? | Not Applicable |
- | How often is the dataset reviewed? | Before release |
- | Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable |
- | If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
- | If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
- | If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
- | Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | No |
- | Is there provenance for all datasets used in training? | Yes |
- | Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
- | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
- | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
-
- **Safety & Security**
-
- | Field | Response |
- | :---- | :---- |
- | Model Application(s): | 3D object reconstruction |
- | Describe the life critical impact (if present). | Not Applicable. The model is not intended for direct life-critical decision-making, and outputs should not be used as the sole basis for autonomous vehicle perception, robotics control, or operational safety decisions. Additional validation and testing should be incorporated prior to deployment in real-world production. |
- | Use Case Restrictions: | Abide by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
- | Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied to limit access for dataset generation and model development. Dataset access is restricted during training. |