Update README.md
#8
by kangxuey - opened
README.md
CHANGED
```diff
@@ -16,14 +16,19 @@ pipeline_tag: image-to-3d
 ---
 
 # Asset Harvester | System Model Card
-
+**Paper** | **Project Page** | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 
 ## **Description:**
 
-**Asset Harvester** generates 3D assets from a single image or multiple images of vehicles or
+**Asset Harvester** is an image-to-3D model and end-to-end system that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. The model generates 3D assets from a single image or multiple images of vehicles, VRUs, or other road objects extracted from autonomous driving sessions. To run Asset Harvester, please check our [**codebase**](https://github.com/NVIDIA/asset-harvester).
 
-
-
+<p align="center">
+<img src="docs/pipeline.gif" alt="Asset Harvester teaser" width="100%" style="border: none;">
+</p>
+
+**Asset Harvester** turns real-world driving logs into complete, simulation-ready 3D assets from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
+
+Here is how the model checkpoints in this repo are used in the end-to-end system, in pipeline order: the [AV object Mask2Former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
 The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
 and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
 In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
```