Rename model_cards/TokenGS.md to model_cards/Object_TokenGS.md
#5
by kangxuey - opened
- .gitattributes +0 -1
- docs/pipeline.gif → AH_multiview_diffusion_turbo.safetensors +2 -2
- AH_tokengs_lifting.safetensors +2 -2
- README.md +3 -87
- config.json +0 -27
- docs/in_the_wild_examples/bin_01.jpg +0 -0
- docs/in_the_wild_examples/bus_01.jpg +0 -0
- docs/in_the_wild_examples/cyclist_02.jpg +0 -0
- docs/in_the_wild_examples/pedestrian_01.jpg +0 -0
- docs/in_the_wild_examples/pedestrian_03.jpg +0 -0
- docs/in_the_wild_examples/pedestrian_04.jpg +0 -0
- docs/in_the_wild_examples/pedestrian_05.jpg +0 -0
- docs/in_the_wild_examples/pedestrian_06.jpg +0 -0
- docs/in_the_wild_examples/sedan_01.jpg +0 -0
- docs/in_the_wild_examples/sedan_02.jpg +0 -0
- docs/in_the_wild_examples/stroller_01.jpg +0 -0
- docs/in_the_wild_examples/stroller_02.jpg +0 -0
- docs/in_the_wild_examples/suv_01.jpg +0 -0
- docs/in_the_wild_examples/suv_02.jpg +0 -0
- docs/in_the_wild_examples/tractor_01.jpg +0 -0
- docs/in_the_wild_examples/trailer_01.jpg +0 -0
- docs/in_the_wild_examples/truck_01.jpg +0 -0
- model_cards/{MultiviewDiffusion.md → MultviewDiffusion.md} +4 -5
.gitattributes CHANGED

@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-*.gif filter=lfs diff=lfs merge=lfs -text
docs/pipeline.gif → AH_multiview_diffusion_turbo.safetensors RENAMED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3dbbc54c2db8875016234a732d18070a0705e9709dee4b8ae7ef61895f08a075
+size 3345066418
AH_tokengs_lifting.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:9650e8aeeb9dbb5f42231044f6da327046043de0023f6ce64d0ea2f7c5cbdf85
+size 1299556656
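The two checkpoint changes above are Git LFS pointer files: three text lines giving the spec version, a `sha256` object ID, and the blob size in bytes. A minimal sketch of parsing that format (the helper name `parse_lfs_pointer` is our own, not a real Git LFS API):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The updated AH_tokengs_lifting.safetensors pointer from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9650e8aeeb9dbb5f42231044f6da327046043de0023f6ce64d0ea2f7c5cbdf85
size 1299556656"""

info = parse_lfs_pointer(pointer)
```

So `info["oid"]` carries the content hash (prefixed `sha256:`) and `int(info["size"])` the on-disk size of the real checkpoint that LFS fetches in place of this stub.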
README.md CHANGED

@@ -16,99 +16,15 @@ pipeline_tag: image-to-3d
 ---
 
 # Asset Harvester | System Model Card
-**Paper** | **Project Page** | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 
-##
-
-**Asset Harvester** is an image-to-3D model and end-to-end system that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. The model generates 3D assets from a single image or multiple images of vehicles, VRUs or other road objects extracted from autonomous driving sessions. To run Asset Harvester, please check our [**codebase**](https://github.com/NVIDIA/asset-harvester).
-
-<p align="center">
-<img src="docs/pipeline.gif" alt="Asset Harvester teaser" width="100%" style="border: none;">
-</p>
+### [Paper (coming soon)]() | [Project Page (coming soon)](https://research.nvidia.com/labs/sil/asset-harvester) | [Code](https://github.com/NVIDIA/asset-harvester) | [Model](https://huggingface.co/nvidia/asset-harvester) | [Data](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 
-
+## **Description:**
 
-
-The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
-and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
-In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
-Lastly, an [Object TokenGS](model_cards/Object_TokenGS.md) lifts the images to a 3D asset.
+**Asset Harvester** is a system that leverages 4 models (see the white paper for architecture) to generate 3D assets from a single image or multiple images of vehicles or VRUs. The [AV object Mask2former]() instance segmentation model is used for image processing when parsing input views from NCore data sessions. The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO), and the multiview diffusion model, [SparseViewDiT](), is then used to generate 16 multiview images of the input objects, and lastly an [Object TokenGS]() lifts the images to a 3D asset.
 
 This system is ready for commercial/non-commercial use
 
-<details>
-<summary><big><big><strong>🚗 Example Results 🚗</strong></big></big></summary>
-
-Each row contains the input image, object mask, and a rendering of the harvested 3DGS asset.
-
-#### 1. Vehicles / Trucks / Trailers
-
-<table>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/bus_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/trailer_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/tractor_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/truck_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/sedan_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/suv_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/suv_02.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/sedan_02.jpg" width="860"></td>
-</tr>
-</table>
-
-#### 2. VRUs
-
-<table>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/pedestrian_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/pedestrian_03.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/pedestrian_04.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/pedestrian_05.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/pedestrian_06.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/cyclist_02.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/stroller_01.jpg" width="860"></td>
-</tr>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/stroller_02.jpg" width="860"></td>
-</tr>
-</table>
-
-#### 3. Other
-
-<table>
-<tr>
-<td align="center"><img src="docs/in_the_wild_examples/bin_01.jpg" width="860"></td>
-</tr>
-</table>
-
-</details>
-
 ### **License/Terms of Use**:
 
 ### Governing Terms: Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) .
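The rewritten description lists a four-stage flow: Mask2former segmentation, C-RADIO encoding, SparseViewDiT multiview generation (16 views), and Object TokenGS lifting. A toy sketch of that staging, purely to make the data flow concrete; every stage function here is a placeholder, not the real model APIs:

```python
def run_pipeline(images, stages):
    """Thread the input through each named stage in order, recording a trace."""
    out = images
    trace = []
    for name, fn in stages:
        out = fn(out)
        trace.append(name)
    return out, trace

# Placeholder stages mirroring the description above (names are illustrative).
stages = [
    ("mask2former_segmentation", lambda imgs: ["masked:" + i for i in imgs]),
    ("cradio_encoding", lambda imgs: ["feat:" + i for i in imgs]),
    # SparseViewDiT generates 16 multiview images per object.
    ("sparseviewdit_multiview", lambda feats: ["view_%02d" % k for k in range(16)]),
    ("tokengs_lifting", lambda views: {"asset": "3dgs", "n_views": len(views)}),
]

asset, trace = run_pipeline(["input.jpg"], stages)
```

The point of the sketch is only that each model's output is the next model's input, ending in a single 3DGS asset per object.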
config.json DELETED

@@ -1,27 +0,0 @@
-{
-  "format_version": 1,
-  "name": "Asset Harvester",
-  "description": "Bundle manifest for the Asset Harvester system model repository.",
-  "components": [
-    {
-      "name": "camera_estimator",
-      "file": "AH_camera_estimator.safetensors",
-      "role": "camera estimation"
-    },
-    {
-      "name": "multiview_diffusion",
-      "file": "AH_multiview_diffusion.safetensors",
-      "role": "multiview image generation"
-    },
-    {
-      "name": "object_segmentation",
-      "file": "AH_object_seg_jit.pt",
-      "role": "object segmentation"
-    },
-    {
-      "name": "tokengs_lifting",
-      "file": "AH_tokengs_lifting.safetensors",
-      "role": "3D lifting"
-    }
-  ]
-}
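The deleted config.json was a bundle manifest mapping component names to checkpoint files and roles. A sketch of how such a manifest could be consumed (the helper `checkpoint_for` is hypothetical, not part of the repository's code; the embedded manifest is a trimmed copy of the deleted file):

```python
import json

MANIFEST = """
{
  "format_version": 1,
  "name": "Asset Harvester",
  "components": [
    {"name": "multiview_diffusion", "file": "AH_multiview_diffusion.safetensors", "role": "multiview image generation"},
    {"name": "tokengs_lifting", "file": "AH_tokengs_lifting.safetensors", "role": "3D lifting"}
  ]
}
"""

def checkpoint_for(manifest_text, component):
    """Return the checkpoint filename registered for a component, or None."""
    manifest = json.loads(manifest_text)
    for comp in manifest["components"]:
        if comp["name"] == component:
            return comp["file"]
    return None
```

For example, `checkpoint_for(MANIFEST, "tokengs_lifting")` resolves to `AH_tokengs_lifting.safetensors`, the same LFS-tracked file updated earlier in this PR.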
docs/in_the_wild_examples/bin_01.jpg DELETED
Binary file (55.4 kB)

docs/in_the_wild_examples/bus_01.jpg DELETED
Binary file (53.4 kB)

docs/in_the_wild_examples/cyclist_02.jpg DELETED
Binary file (64.5 kB)

docs/in_the_wild_examples/pedestrian_01.jpg DELETED
Binary file (57.3 kB)

docs/in_the_wild_examples/pedestrian_03.jpg DELETED
Binary file (59 kB)

docs/in_the_wild_examples/pedestrian_04.jpg DELETED
Binary file (37.8 kB)

docs/in_the_wild_examples/pedestrian_05.jpg DELETED
Binary file (52.3 kB)

docs/in_the_wild_examples/pedestrian_06.jpg DELETED
Binary file (48.2 kB)

docs/in_the_wild_examples/sedan_01.jpg DELETED
Binary file (52.2 kB)

docs/in_the_wild_examples/sedan_02.jpg DELETED
Binary file (50.2 kB)

docs/in_the_wild_examples/stroller_01.jpg DELETED
Binary file (76.2 kB)

docs/in_the_wild_examples/stroller_02.jpg DELETED
Binary file (72.1 kB)

docs/in_the_wild_examples/suv_01.jpg DELETED
Binary file (49.8 kB)

docs/in_the_wild_examples/suv_02.jpg DELETED
Binary file (43.8 kB)

docs/in_the_wild_examples/tractor_01.jpg DELETED
Binary file (62.5 kB)

docs/in_the_wild_examples/trailer_01.jpg DELETED
Binary file (41.4 kB)

docs/in_the_wild_examples/truck_01.jpg DELETED
Binary file (69.2 kB)
model_cards/{MultiviewDiffusion.md → MultviewDiffusion.md} RENAMED

@@ -31,8 +31,7 @@ HuggingFace
 
 **Architecture Type:** Linear Diffusion Transformer
 
-**Network Architecture:**
-with a Deep Compression Autoencoder (DC-AE) for efficient high-resolution image generation. C-RADIO for image conditioning signal.
+**Network Architecture:** Linear-attention Diffusion Transformer with a Deep Compression Autoencoder (DC-AE) for efficient high-resolution image generation. C-RADIO for image conditioning signal.
 
 ## **Input:**
 
@@ -77,18 +76,18 @@ The model was trained, tested, and finetuned using an Objaverse subset internal
 
 | Dataset names | Size and content | Training partition | Test partition |
 | :---- | :---- | :---- | :---- |
-| Nvidia
+| Internal Nvidia AV dataset | Posed images of 278k objects | 83% (cross validation) | 17% |
 | Omniverse 3D assets | 200 3D assets of objects | 100% | 0% |
 | Objaverse | 80k assets collected under commercially viable Creative Commons licenses, | 100% | 0% |
 
-### Objaverse Commercially Viable Subset
+### Objaverse Commercially Viable Subset
 
 **Link:** https://objaverse.allenai.org
 **Data Collection Method:** Synthetic 3D assets aggregated from various open-source and licensed sources
 **Labeling Method by Dataset:** Hybrid: Human and Automated
 **Properties:** This dataset consists of a diverse set of over 80,000 synthetic 3D object models spanning everyday items, animals, tools, and complex structures. Each model is rendered into multi-view 2D images with associated camera poses, materials, and mesh properties.
 
-###
+### Internal NVIDIA AV dataset
 
 **Data Collection Method:** Sensors