Update README.md
#7
by kangxuey - opened
- .gitattributes +0 -1
- README.md +5 -10
- docs/pipeline.gif +0 -3
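In the file list above, each `+N -M` pair counts the lines added and removed across that file's hunks. Every hunk header `@@ -a,b +c,d @@` must also agree with its body: context lines plus removed lines equal `b`, and context lines plus added lines equal `d`. The following is a minimal checker sketch, assuming body lines keep their ` `, `-`, or `+` prefix. `check_hunk` is an illustrative name, not part of any tool. It is applied here to the `.gitattributes` hunk from this PR:

```python
import re

# Unified-diff hunk header; an omitted ",length" means a length of 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def check_hunk(lines):
    """Validate one hunk and return its (added, removed) line counts."""
    match = HUNK_HEADER.match(lines[0])
    if match is None:
        raise ValueError(f"not a hunk header: {lines[0]!r}")
    old_len = int(match.group(2) or 1)
    new_len = int(match.group(4) or 1)
    body = lines[1:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    context = sum(1 for line in body if line.startswith(" "))
    # Old side = context + removed; new side = context + added.
    if context + removed != old_len or context + added != new_len:
        raise ValueError("hunk header does not match hunk body")
    return added, removed


GITATTRIBUTES_HUNK = [
    "@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text",
    " *.zip filter=lfs diff=lfs merge=lfs -text",
    " *.zst filter=lfs diff=lfs merge=lfs -text",
    " *tfevents* filter=lfs diff=lfs merge=lfs -text",
    "-*.gif filter=lfs diff=lfs merge=lfs -text",
]

print(check_hunk(GITATTRIBUTES_HUNK))  # (0, 1), matching ".gitattributes +0 -1"
```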
.gitattributes
CHANGED
@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-*.gif filter=lfs diff=lfs merge=lfs -text
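This hunk removes the rule that routed `*.gif` files through the Git LFS filter. Unless another rule matches, GIFs committed after this change are stored as ordinary Git blobs. The following is a minimal sketch of the before/after effect on `docs/pipeline.gif`. It assumes a simplified matcher: slash-free patterns are compared against the basename, and other patterns against the full path with `fnmatch`. Git's own pattern rules (for example `**`) differ in edge cases. `parse_gitattributes` and `uses_lfs` are illustrative helpers, not part of Git or Git LFS.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_gitattributes(text):
    """Parse .gitattributes text into a list of (pattern, attrs) pairs.

    Simplified: ignores macros, quoted patterns, and "!attr" (unspecified).
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *tokens = line.split()
        attrs = {}
        for token in tokens:
            if token.startswith("-"):
                attrs[token[1:]] = False      # "-text" unsets "text"
            elif "=" in token:
                key, value = token.split("=", 1)
                attrs[key] = value            # "filter=lfs" sets a value
            else:
                attrs[token] = True           # a bare name sets the attribute
        rules.append((pattern, attrs))
    return rules


def uses_lfs(path, rules):
    """Return True if the last rule matching `path` sets filter=lfs."""
    name = PurePosixPath(path).name
    filter_value = None
    for pattern, attrs in rules:
        target = path if "/" in pattern else name
        if "filter" in attrs and fnmatchcase(target, pattern):
            filter_value = attrs["filter"]    # later lines take precedence
    return filter_value == "lfs"


BEFORE = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
"""
AFTER = "\n".join(BEFORE.splitlines()[:-1])  # drop the *.gif rule

print(uses_lfs("docs/pipeline.gif", parse_gitattributes(BEFORE)))  # True
print(uses_lfs("docs/pipeline.gif", parse_gitattributes(AFTER)))   # False
```

The last-match-wins loop mirrors the gitattributes rule that a later matching line overrides an earlier one for the same attribute.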
README.md
CHANGED
@@ -16,19 +16,14 @@ pipeline_tag: image-to-3d
 ---
 
 # Asset Harvester | System Model Card
-**Paper**
+[**Paper** (coming soon)]() | [**Project Page** (coming soon)](https://research.nvidia.com/labs/sil/asset-harvester) | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 
 ## **Description:**
 
-**Asset Harvester**
+**Asset Harvester** generates 3D assets from a single image or multiple images of vehicles or VRUs extracted from autonomous driving sessions.
 
-
-
-</p>
-
-**Asset Harvester** turns real-world driving logs into complete, simulation-ready 3D assets, from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
-
-Here's how the model checkpoints in this repo are used in the end-to-end system following the order in the pipeline: The [AV object Mask2former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
+It leverages 4 models (see the white paper for architecture) in the process.
+The [AV object Mask2former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
 The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
 and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
 In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
@@ -37,7 +32,7 @@ Lastly, an [Object TokenGS](model_cards/Object_TokenGS.md) lifts the images to a
 This system is ready for commercial/non-commercial use
 
 <details>
-<summary><big><big><strong>
+<summary><big><big><strong>In-the-Wild Examples</strong></big></big></summary>
 
 Each row contains the input image, object mask, and a rendering of the harvested 3DGS asset.
 
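The updated description lists the stages in pipeline order:

1. Mask2former segmentation.
2. C-RADIO encoding.
3. SparseViewDiT generation of 16 views, with pose estimation when camera parameters are missing.
4. Object TokenGS lifting to a 3D Gaussian splat asset.

The following is a hypothetical outline of that data flow only. Every function is a placeholder stub that records its name and returns dummy data. None of these names come from the asset-harvester code. Only the stage order, the 16 generated views, and the conditional pose estimation are taken from the model card text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_GENERATED_VIEWS = 16  # the model card states SparseViewDiT generates 16 views


@dataclass
class ObjectViews:
    """Hypothetical container for the in-the-wild views of one object."""
    images: List[str]                     # placeholder image handles
    cameras: Optional[List[str]] = None   # per-view camera parameters, if known
    trace: List[str] = field(default_factory=list)  # stages run, in order


def segment_objects(views: ObjectViews) -> List[str]:
    # Stub for AV object Mask2former: instance masks when parsing NCore sessions
    views.trace.append("Mask2former")
    return [f"mask({img})" for img in views.images]


def encode_images(views: ObjectViews, masks: List[str]) -> List[str]:
    # Stub for C-RADIO: encode each (masked) input image
    views.trace.append("C-RADIO")
    return [f"features({img}, {m})" for img, m in zip(views.images, masks)]


def estimate_cameras(views: ObjectViews) -> List[str]:
    # Stub for the pose-estimation submodule of the multiview diffusion model
    views.trace.append("pose estimation")
    return [f"camera({img})" for img in views.images]


def generate_views(features: List[str], views: ObjectViews) -> List[str]:
    # Stub for SparseViewDiT: predict cameras first only if none were given
    if views.cameras is None:
        views.cameras = estimate_cameras(views)
    views.trace.append("SparseViewDiT")
    return [f"view_{i}" for i in range(NUM_GENERATED_VIEWS)]


def lift_to_gaussians(novel_views: List[str], views: ObjectViews) -> dict:
    # Stub for Object TokenGS: lift the generated views to a 3DGS asset
    views.trace.append("Object TokenGS")
    return {"kind": "3DGS", "num_views": len(novel_views)}


def harvest(views: ObjectViews) -> dict:
    masks = segment_objects(views)
    features = encode_images(views, masks)
    novel_views = generate_views(features, views)
    return lift_to_gaussians(novel_views, views)


views = ObjectViews(images=["front.jpg"])
asset = harvest(views)
print(views.trace)
print(asset)
```

Passing known cameras (for example `ObjectViews(images=[...], cameras=[...])`) skips the pose-estimation stub. This mirrors the model card's statement that the submodule is only used when camera parameters are not provided.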
docs/pipeline.gif
DELETED
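`docs/pipeline.gif` most likely shows `+0 -3` because the removed `*.gif` rule stored it through Git LFS. Under that rule, the repository tracked a small text pointer rather than the GIF bytes. Per the Git LFS pointer specification, that pointer's three lines are `version`, `oid`, and `size`. The following is a minimal parser sketch. The example oid and size are made-up placeholders, not the real values for this file, and `parse_lfs_pointer` is an illustrative name.

```python
import re

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer file into its sha256 digest and size.

    Simplified: extension lines ("ext-...") are accepted but not returned.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("expected oid sha256:<64 hex digits>")
    return {"oid": digest, "size": int(fields["size"])}


# Hypothetical pointer: the oid and size are placeholders, not pipeline.gif's.
pointer = f"version {LFS_SPEC_V1}\noid sha256:{'ab' * 32}\nsize 1048576\n"
info = parse_lfs_pointer(pointer)
print(len(pointer.splitlines()), info["size"])  # 3 1048576
```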