Update README.md
#8
by kangxuey - opened
README.md
CHANGED
```diff
@@ -16,14 +16,19 @@ pipeline_tag: image-to-3d
 ---
 
 # Asset Harvester | System Model Card
-
+**Paper** | **Project Page** | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 
 ## **Description:**
 
-**Asset Harvester** generates 3D assets from a single image or multiple images of vehicles or
+**Asset Harvester** is an image-to-3D model and end-to-end system that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. The model generates 3D assets from a single image or multiple images of vehicles, VRUs, or other road objects extracted from autonomous driving sessions. To run Asset Harvester, please check our [**codebase**](https://github.com/NVIDIA/asset-harvester).
 
-
-
+<p align="center">
+<img src="docs/pipeline.gif" alt="Asset Harvester teaser" width="100%" style="border: none;">
+</p>
+
+**Asset Harvester** turns real-world driving logs into complete, simulation-ready 3D assets from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
+
+Here is how the model checkpoints in this repo are used in the end-to-end system, in pipeline order: the [AV object Mask2Former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
 The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
 and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
 In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
```