Update README.md

#7
by kangxuey - opened
Files changed (3)
  1. .gitattributes +0 -1
  2. README.md +5 -10
  3. docs/pipeline.gif +0 -3
.gitattributes CHANGED
@@ -33,4 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
- *.gif filter=lfs diff=lfs merge=lfs -text
 
README.md CHANGED
@@ -16,19 +16,14 @@ pipeline_tag: image-to-3d
  ---
 
  # Asset Harvester | System Model Card
- **Paper** | **Project Page** | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
+ [**Paper** (coming soon)]() | [**Project Page** (coming soon)](https://research.nvidia.com/labs/sil/asset-harvester) | [**Code**](https://github.com/NVIDIA/asset-harvester) | [**Model**](https://huggingface.co/nvidia/asset-harvester) | [**Data**](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
 
  ## **Description:**
 
- **Asset Harvester** is an image-to-3D model and end-to-end system that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. The model generates 3D assets from a single image or multiple images of vehicles, VRUs or other road objects extracted from autonomous driving sessions. To run Asset Harvester, please check our [**codebase**](https://github.com/NVIDIA/asset-harvester).
+ **Asset Harvester** generates 3D assets from a single image or multiple images of vehicles or VRUs extracted from autonomous driving sessions.
 
- <p align="center">
- <img src="docs/pipeline.gif" alt="Asset Harvester teaser" width="100%" style="border: none;">
- </p>
-
- **Asset Harvester** turns real-world driving logs into complete, simulation-ready 3D assets — from just one or a few in-the-wild object views. It handles vehicles, pedestrians, riders, and other road objects, even under heavy occlusion, noisy calibration, and extreme viewpoint bias. A multiview diffusion model generates consistent novel viewpoints, and a feed-forward Gaussian reconstructor lifts them to full 3D in seconds. The result: high-fidelity 3D Gaussian splat assets ready for insertion into simulation environments. The pipeline plugs directly into NVIDIA NCore and NuRec for scalable data ingestion and closed-loop simulation.
-
- Here's how the model checkpoints in this repo are used in the end-to-end system following the order in the pipeline: The [AV object Mask2former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
+ It leverages 4 models (see the white paper for architecture) in the process.
+ The [AV object Mask2former](model_cards/AV_Object_Mask2former.md) instance segmentation model is used for image processing when parsing input views from NCore data sessions.
  The input images are encoded by [C-Radio](https://huggingface.co/nvidia/C-RADIO),
  and the multiview diffusion model, [SparseViewDiT](model_cards/MultiviewDiffusion.md), is then used to generate 16 multiview images of the input objects.
  In cases where camera parameters are not provided, the multiview diffusion model includes a camera pose estimation submodule that predicts camera parameters for the input images.
@@ -37,7 +32,7 @@ Lastly, an [Object TokenGS](model_cards/Object_TokenGS.md) lifts the images to a
  This system is ready for commercial/non-commercial use
 
  <details>
- <summary><big><big><strong>🚗 Example Results 🚗</strong></big></big></summary>
+ <summary><big><big><strong>🚗 In-the-Wild Examples 🚗</strong></big></big></summary>
 
  Each row contains the input image, object mask, and a rendering of the harvested 3DGS asset.
 
docs/pipeline.gif DELETED

Git LFS Details

  • SHA256: 9f7df1837e9eae37f572dc07bd30f96372b89d4c2ec1ba83e626f2acae7abcd8
  • Pointer size: 132 Bytes
  • Size of remote file: 3.65 MB
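
The `.gitattributes` hunk above drops the `*.gif` rule in the same change that deletes `docs/pipeline.gif`. As a rough way to see what that rule controlled, the sketch below lists the patterns that carry `filter=lfs` and checks whether a path would match one of them. The helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical, and `fnmatchcase` on the file's basename is only an approximation of Git's own pattern matching (it ignores patterns containing `/`, such as `saved_model/**/*`). In a real checkout, `git check-attr filter -- <path>` is the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The three context lines that remain in the hunk after this PR (copied from the diff).
GITATTRIBUTES_AFTER = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
"""

# The rule removed by this PR.
REMOVED_RULE = "*.gif filter=lfs diff=lfs merge=lfs -text\n"


def lfs_patterns(gitattributes_text):
    """Return the patterns whose attribute list includes filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        # Skip blank lines and comments.
        if not parts or parts[0].startswith("#"):
            continue
        if "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(path, gitattributes_text):
    """Approximate check: match the basename against slash-free LFS patterns."""
    name = PurePosixPath(path).name
    return any(
        fnmatchcase(name, pattern)
        for pattern in lfs_patterns(gitattributes_text)
        if "/" not in pattern  # assumption: path-scoped patterns are out of scope here
    )


if __name__ == "__main__":
    before = GITATTRIBUTES_AFTER + REMOVED_RULE
    print(is_lfs_tracked("docs/pipeline.gif", before))
    print(is_lfs_tracked("docs/pipeline.gif", GITATTRIBUTES_AFTER))
```

Under these assumptions, a `.gif` added after this change would be committed as a regular Git blob rather than as an LFS pointer, unless a rule for it is restored.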