Update README.md

by kangxuey - opened Mar 23

base: refs/heads/main

←

from: refs/pr/7

Files changed (3)

.gitattributes +0 -1
README.md +5 -10
docs/pipeline.gif +0 -3

.gitattributes CHANGED

@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-*.gif filter=lfs diff=lfs merge=lfs -text

 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -16,19 +16,14 @@ pipeline_tag: image-to-3d
 ---
 # Asset Harvester | System Model Card
-[**Paper**](https://arxiv.org/abs/2604.18468)  | [**Live Demo!**](https://huggingface.co/spaces/nvidia/asset-harvester)  | [**Project Page**](https://research.nvidia.com/labs/sil/projects/asset-harvester) | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 ## **Description:**
-**Asset Harvester** is an image-to-3D model and end-to-end system that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. The model generates 3D assets from a single image or multiple images of vehicles, VRUs or other road objects extracted from autonomous driving sessions.  To run Asset Harvester, please check our [**codebase**](https://github.com/NVIDIA/asset-harvester).
-<p align="center">
-  <img src="docs/pipeline.gif" alt="Asset Harvester teaser" width="100%" style="border: none;">
-</p>
-**Asset Harvester** turns real-world driving logs into complete, simulation-ready 3D assets — from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
-Here's how the model checkpoints in this repo are used in the end-to-end system following the order in the pipeline: The [AV object Mask2former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
 The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
 and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
 In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
@@ -37,7 +32,7 @@ Lastly, an [Object TokenGS](model_cards/Object_TokenGS.md) lifts the images to a
 This system is ready for commercial/non-commercial use
 <details>
-<summary><big><big><strong>🚗 Example Results 🚗</strong></big></big></summary>
 Each row contains the input image, object mask, and a rendering of the harvested 3DGS asset.

 ---
 # Asset Harvester | System Model Card
+[**Paper** (coming soon)]() | [**Project Page** (coming soon)](https://research.nvidia.com/labs/sil/asset-harvester) | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 ## **Description:**
+**Asset Harvester** generates 3D assets from a single image or multiple images of vehicles or VRUs extracted from autonomous driving sessions.
+It leverages 4 models (see the white paper for architecture) in the process.
+The [AV object Mask2former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
 The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
 and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
 In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
 This system is ready for commercial/non-commercial use
 <details>
+<summary><big><big><strong>🚗 In-the-Wild Examples 🚗</strong></big></big></summary>
 Each row contains the input image, object mask, and a rendering of the harvested 3DGS asset.

docs/pipeline.gif DELETED

Git LFS Details

SHA256: 9f7df1837e9eae37f572dc07bd30f96372b89d4c2ec1ba83e626f2acae7abcd8
Pointer size: 132 Bytes
Size of remote file: 3.65 MB
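
Reviewer aid (not part of the PR): the `.gitattributes` hunk drops the `*.gif` LFS rule in the same change that deletes `docs/pipeline.gif`. A minimal Python sketch for checking which paths a set of `.gitattributes` lines would still route through LFS is shown below. The helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical, the `BEFORE` text contains only the lines visible in the hunk above, and `fnmatch` is only an approximation of Git's real attribute matching; `git check-attr filter -- <path>` gives the authoritative answer in a checkout.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text):
    """Return the patterns whose attribute list includes filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximate check: slash-free patterns are tested against the basename."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatch(target, pattern):
            return True
    return False


# Only the lines shown in the hunk; the real file has more rules above them.
BEFORE = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
"""
# Simulate the PR by removing the *.gif rule.
AFTER = BEFORE.replace("*.gif filter=lfs diff=lfs merge=lfs -text\n", "")

if __name__ == "__main__":
    print(is_lfs_tracked("docs/pipeline.gif", lfs_patterns(BEFORE)))
    print(is_lfs_tracked("docs/pipeline.gif", lfs_patterns(AFTER)))
```

Under these assumptions, any `.gif` added after this change would be committed as a regular Git blob rather than an LFS pointer, which is worth keeping in mind if the teaser animation is restored later.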