---
license: mit
library_name: volfill
pipeline_tag: image-to-3d
tags:
- 3d-reconstruction
- amodal-completion
- single-view-reconstruction
- scene-reconstruction
- flow-matching
- diffusion-transformer
- point-cloud
---
# VolFill: Single-View Amodal 3D Scene Reconstruction with Volumetric Flow Matching
Pretrained checkpoints for **VolFill**, which recovers the **complete** 3D scene
geometry — including occluded surfaces — from a **single RGB image**, represented
as a 256³ Truncated Unsigned Distance Function (TUDF) grid.
> Authors: Tuan Duc Ngo¹, Chuang Gan¹, Evangelos Kalogerakis¹˒²
>
> ¹University of Massachusetts Amherst, ²Technical University of Crete
## Model description
VolFill is a two-stage latent generative model. A **hybrid 3D VAE** (sparse
encoder → dense bottleneck → hybrid dense-to-sparse decoder) compresses the 256³
TUDF to a compact 16³×16ch latent, and a **latent Diffusion Transformer trained
with flow matching** generates that latent — conditioned on (a) frozen MoGe-v2
image features as a global geometric prior and (b) a visible-geometry latent that
anchors the occluded regions. At inference, the model encodes the visible region,
integrates the DiT's flow for 50 Euler steps with classifier-free guidance (CFG
scale 3.0), and decodes the sampled latent to a TUDF, which is thresholded into a
point cloud or mesh.
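
The sampling step described above can be illustrated with a minimal, dependency-free sketch of Euler integration with classifier-free guidance. This is not the VolFill sampler: `euler_sample`, `cfg_velocity`, and `toy_velocity` are hypothetical names, the time convention (noise at t = 0, data at t = 1) and the guidance formula are the common ones and are assumed here, and a closed-form toy velocity stands in for the DiT.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: (x, t, conditioned) -> velocity. The real DiT takes
# image features and a visible-geometry latent instead of a boolean flag.
VelocityFn = Callable[[Vector, float, bool], Vector]


def cfg_velocity(v_cond: Vector, v_uncond: Vector, scale: float) -> Vector:
    """Classifier-free guidance: move from the unconditional velocity
    toward (and, for scale > 1, past) the conditional one."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(velocity_fn: VelocityFn, x0: Vector,
                 num_steps: int = 50, cfg_scale: float = 3.0) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v_c = velocity_fn(x, t, True)    # conditional prediction
        v_u = velocity_fn(x, t, False)   # unconditional prediction
        v = cfg_velocity(v_c, v_u, cfg_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, conditioned: bool) -> Vector:
    """Stand-in for the network: exact velocity of a straight path
    from the current point to a fixed target."""
    target = [1.0, -2.0] if conditioned else [0.0, 0.0]
    return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]


# With cfg_scale = 1.0 guidance reduces to the conditional velocity,
# so the sample ends at the conditional target.
sample = euler_sample(toy_velocity, x0=[0.5, 0.5], num_steps=50, cfg_scale=1.0)
```

Each step costs two network evaluations (conditional and unconditional), so 50 Euler steps with guidance correspond to 100 forward passes unless the two are batched together.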
## Files
| File | Description |
|---|---|
| `volfill_dit.pth` | Latent flow-matching DiT (visible-latent conditioned, 16× variant) |
| `volfill_vae.pth` | Hybrid 3D VAE (sparse encoder + hybrid decoder) |
| `inference.yaml` | Model architecture + sampler config |
| `latent_stats_16x.npy` | Latent normalization statistics (mean / std) |

The MoGe geometry prior (`Ruicheng/moge-2-vitl`, `Ruicheng/moge-2-vitl-normal`)
is downloaded automatically on first run.
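
For readers unfamiliar with latent normalization statistics such as those listed in the table above, the sketch below shows how mean / std statistics are typically applied around a latent generative model. It is an illustration only: the layout of `latent_stats_16x.npy` is not documented here, so the code builds synthetic per-channel statistics instead of loading the file, and `normalize_latent` / `denormalize_latent` are hypothetical helper names, not part of the `volfill` package.

```python
import numpy as np


def normalize_latent(z, mean, std, eps=1e-6):
    """Standardize a latent before it is modeled by the generator."""
    return (z - mean) / (std + eps)


def denormalize_latent(z_norm, mean, std, eps=1e-6):
    """Invert the standardization before decoding with the VAE."""
    return z_norm * (std + eps) + mean


# Synthetic stand-in for a 16-channel, 16^3 latent (channels first).
rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=3.0, size=(16, 16, 16, 16))

# Assumption: one mean and one std per channel, broadcast over the 16^3 grid.
mean = z.mean(axis=(1, 2, 3), keepdims=True)   # shape (16, 1, 1, 1)
std = z.std(axis=(1, 2, 3), keepdims=True)     # shape (16, 1, 1, 1)

z_norm = normalize_latent(z, mean, std)
z_back = denormalize_latent(z_norm, mean, std)
```

Standardizing latents this way keeps them on a scale comparable to the Gaussian noise the sampler starts from; the inverse transform is applied once, just before decoding.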
## Usage
Install the inference code from the [GitHub repo](https://github.com/ngoductuanlhp/VolFill)
(CUDA 13.0 / RTX 40-series). The weights, config, and latent statistics in this
model repo then download automatically:
```bash
# CLI — all weights/config/stats auto-download from this repo
python -m volfill.amodal.inference_latent_visible \
--hf_repo TuanNgo/VolFill --input_path image.jpg --output ./results/
```
```python
from PIL import Image
from volfill.amodal.inference_latent_visible import LatentTUDFVisibleInference
infer = LatentTUDFVisibleInference.from_pretrained("TuanNgo/VolFill")
result = infer(Image.open("image.jpg").convert("RGB"))
# result["tudf"]: (1, 1, 256, 256, 256) predicted TUDF in [-1, 1]
```
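
As a rough illustration of turning a TUDF grid like `result["tudf"]` into a point cloud, the sketch below keeps the centers of all voxels whose value falls below a threshold. Several things are assumed rather than taken from the VolFill code: that smaller values mean "closer to a surface", the threshold value, the axis order, and the normalization of coordinates to the unit cube. `tudf_to_points` is a hypothetical helper; the GitHub README describes the supported visualization path.

```python
import numpy as np


def tudf_to_points(tudf: np.ndarray, threshold: float) -> np.ndarray:
    """Return an (N, 3) array of voxel-center coordinates in [0, 1]^3
    for every voxel whose value is below `threshold`."""
    grid = np.asarray(tudf)
    # Drop leading singleton batch/channel axes: (1, 1, D, H, W) -> (D, H, W).
    grid = grid.reshape(grid.shape[-3:])
    idx = np.argwhere(grid < threshold)             # (N, 3) integer indices
    resolution = np.array(grid.shape, dtype=np.float32)
    return (idx.astype(np.float32) + 0.5) / resolution


# Toy 8^3 grid: "far from any surface" everywhere except one planar slab.
toy = np.ones((1, 1, 8, 8, 8), dtype=np.float32)
toy[0, 0, 4, :, :] = -1.0
points = tudf_to_points(toy, threshold=-0.5)        # threshold is an assumption
```

If the returned grid is a PyTorch tensor rather than a NumPy array (not confirmed here), it would first need to be moved to the CPU and converted, for example with `.cpu().numpy()`.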
See the GitHub README for installation, point-cloud visualization, and local /
Google-Drive checkpoint options.
## Citation
```bibtex
@article{ngo2026volfill,
title = {VolFill: Single-View Amodal 3D Scene Reconstruction with Volumetric Flow Matching},
author = {Ngo, Tuan Duc and Gan, Chuang and Kalogerakis, Evangelos},
journal = {arXiv preprint arXiv:2605.31466},
year = {2026}
}
```
## License & acknowledgements
Released under the MIT License. VolFill builds on
[LaRI](https://github.com/ruili3/LaRI), reuses sparse-conv modules from
[TRELLIS](https://github.com/microsoft/TRELLIS), and uses
[MoGe-v2](https://github.com/microsoft/MoGe) as the visible geometry prior.