---
license: mit
pipeline_tag: image-to-3d
---

# Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy

This repository contains the `Gen-3Diffusion` model, which achieves realistic image-to-3D generation by leveraging a pre-trained 2D diffusion model together with a 3D diffusion model, as presented in the paper:

[**Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy**](https://huggingface.co/papers/2412.06698)

Project Page: [https://yuxuan-xue.com/gen-3diffusion](https://yuxuan-xue.com/gen-3diffusion)

Code: [https://github.com/YuxuanSnow/Gen3Diffusion](https://github.com/YuxuanSnow/Gen3Diffusion)

## Key Insight :raised_hands:

- 2D foundation models are powerful, but their outputs lack 3D consistency!
- 3D generative models can reconstruct 3D representations, but generalize poorly!
- How to combine 2D foundation models with 3D generative models?
  - Both are diffusion-based generative models => **they can be synchronized at each diffusion step**
  - The 2D foundation model helps 3D generation => **it provides strong prior information about the 3D shape**
  - The 3D representation guides 2D diffusion sampling => **rendered outputs of the 3D reconstruction are used for reverse sampling, where 3D consistency is guaranteed**

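The synchronization idea above can be sketched as a toy sampling loop: at every reverse diffusion step, the 2D model predicts clean views, a 3D representation is fitted to them, and its renderings replace the 2D prediction before the next step. This is a minimal illustration only; `denoise_2d`, `reconstruct_3d`, and `render` are placeholder stand-ins, not the repository's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-ins for the paper's components (illustrative only):
def denoise_2d(x_t, t):
    return x_t * (1.0 - t)          # toy x0 prediction per view

def reconstruct_3d(views):
    return views.mean(axis=0)       # toy "3D representation"

def render(rep, n_views):
    # Rendering one shared representation is consistent across views.
    return np.repeat(rep[None], n_views, axis=0)

def synced_sampling(n_views=4, shape=(8, 8), steps=10):
    x_t = rng.normal(size=(n_views, *shape))      # start from pure noise
    for i in range(steps, 0, -1):
        t = i / steps
        x0_2d = denoise_2d(x_t, t)                # 2D prior: x0 per view
        rep = reconstruct_3d(x0_2d)               # lift to 3D at this step
        x0_3d = render(rep, n_views)              # re-render: 3D-consistent x0
        # The reverse step uses the *rendered* x0, so every intermediate
        # state stays consistent across views.
        t_prev = (i - 1) / steps
        x_t = x0_3d + t_prev * (x_t - x0_3d)      # simple interpolation step
    return x_t

views = synced_sampling()
```

Because the final reverse step interpolates fully toward the rendered prediction, all output views agree with one shared 3D representation by construction.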
|
## Install

This uses the same Conda environment as Human-3Diffusion. Please skip this step if you have already installed it.

```bash
# Conda environment
conda create -n gen3diffusion python=3.10
conda activate gen3diffusion
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu121

# Gaussian Opacity Fields
git clone https://github.com/YuxuanSnow/gaussian-opacity-fields.git
cd gaussian-opacity-fields && pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn/ && cd ..
export CPATH=/usr/local/cuda-12.1/targets/x86_64-linux/include:$CPATH

# Dependencies
pip install -r requirements.txt

# TSDF fusion (mesh extraction) dependencies
pip install --user numpy opencv-python scikit-image numba
pip install --user pycuda
pip install scipy==1.11
```

|
## Pretrained Weights

Our pretrained weights can be downloaded from Hugging Face.

```bash
mkdir checkpoints_obj && cd checkpoints_obj
wget https://huggingface.co/yuxuanx/gen3diffusion/resolve/main/model.safetensors
wget https://huggingface.co/yuxuanx/gen3diffusion/resolve/main/model_1.safetensors
wget https://huggingface.co/yuxuanx/gen3diffusion/resolve/main/pifuhd.pt
cd ..
```

The avatar reconstruction module is the same as in Human-3Diffusion. Please skip this step if you have already installed Human-3Diffusion.

```bash
mkdir checkpoints_avatar && cd checkpoints_avatar
wget https://huggingface.co/yuxuanx/human3diffusion/resolve/main/model.safetensors
wget https://huggingface.co/yuxuanx/human3diffusion/resolve/main/model_1.safetensors
wget https://huggingface.co/yuxuanx/human3diffusion/resolve/main/pifuhd.pt
cd ..
```

|
## Inference

```bash
# Given one image of an object, generate a 3D-GS object.
# The subject should be centered in a square image; please crop accordingly.
# Recentering plays a large role in object reconstruction; adjust the
# recentering if the reconstruction quality is poor.
python infer.py --test_imgs test_imgs_obj --output output_obj --checkpoints checkpoints_obj

# Given the generated 3D-GS, perform TSDF mesh extraction.
python infer_mesh.py --test_imgs test_imgs_obj --output output_obj --checkpoints checkpoints_obj --mesh_quality high
```

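Since cropping and recentering matter this much, here is a minimal sketch of what "crop the subject and recenter it in a square image" can look like. This is a hypothetical NumPy helper for illustration; the repository's actual preprocessing may differ, and `recenter_square` and `border_ratio` are names introduced here.

```python
import numpy as np

def recenter_square(img, mask, border_ratio=0.2):
    """Crop the subject (given its foreground mask) and paste it centered
    into a square canvas with a margin around it."""
    ys, xs = np.nonzero(mask)
    # Tight bounding box of the foreground.
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    crop = img[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    # Square side: largest crop dimension plus a border on each side.
    side = int(max(h, w) * (1.0 + 2.0 * border_ratio))
    canvas = np.zeros((side, side) + img.shape[2:], dtype=img.dtype)
    ty, tx = (side - h) // 2, (side - w) // 2
    canvas[ty:ty + h, tx:tx + w] = crop
    return canvas
```

Increasing `border_ratio` shrinks the subject relative to the frame, which is exactly the knob to turn when the reconstruction quality is poor.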
```bash
# Given one image of a human, generate a 3D-GS avatar.
# The subject should be centered in a square image; please crop accordingly.
python infer.py --test_imgs test_imgs_avatar --output output_avatar --checkpoints checkpoints_avatar

# Given the generated 3D-GS, perform TSDF mesh extraction.
python infer_mesh.py --test_imgs test_imgs_avatar --output output_avatar --checkpoints checkpoints_avatar --mesh_quality high
```

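For intuition about the TSDF fusion step behind `infer_mesh.py`: each rendered depth map is integrated into a voxel grid of truncated signed distances, and the mesh is later extracted at the zero level set. Below is a toy NumPy sketch of one fusion step under a simplified orthographic camera; `tsdf_integrate` is a hypothetical helper, not the repository's CUDA-based implementation.

```python
import numpy as np

def tsdf_integrate(tsdf, weights, depth, voxel_centers, trunc):
    """Fuse one orthographic depth map (viewed along +z) into a TSDF volume.

    tsdf, weights: (N,) running TSDF values and fusion weights per voxel.
    depth: (H, W) depth of the observed surface. voxel_centers: (N, 3).
    """
    H, W = depth.shape
    # Orthographic projection: voxel x -> column, voxel y -> row.
    cols = np.clip(np.round(voxel_centers[:, 0]).astype(int), 0, W - 1)
    rows = np.clip(np.round(voxel_centers[:, 1]).astype(int), 0, H - 1)
    d = depth[rows, cols]                  # observed surface depth per voxel
    sdf = d - voxel_centers[:, 2]          # positive in front of the surface
    valid = sdf > -trunc                   # skip voxels far behind the surface
    tsdf_obs = np.clip(sdf / trunc, -1.0, 1.0)
    # Weighted running average (each observation has weight 1).
    w_new = weights + valid
    upd = valid & (w_new > 0)
    tsdf[upd] = (tsdf[upd] * weights[upd] + tsdf_obs[upd]) / w_new[upd]
    return tsdf, w_new
```

Voxels on the observed surface end up with a TSDF value near zero, which is the isosurface the mesh extraction targets.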
## Citation :writing_hand:

```bibtex
@article{xue2024gen3diffusion,
  title   = {{Gen-3Diffusion: Realistic Image-to-3D Generation via 2D \& 3D Diffusion Synergy}},
  author  = {Xue, Yuxuan and Xie, Xianghui and Marin, Riccardo and Pons-Moll, Gerard},
  journal = {arXiv preprint arXiv:2412.06698},
  year    = {2024},
}
```