| | --- |
| | license: cc-by-nc-2.0 |
| | pipeline_tag: image-to-3d |
| | library_name: transformers |
| | --- |
| | # [ECCV 2024] VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models |
| |
|
| | [Project page](https://junlinhan.github.io/projects/vfusion3d.html), [Paper link](https://arxiv.org/abs/2403.12034) |
| |
|
| | VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation model. |
| |
|
| | [VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models](https://junlinhan.github.io/projects/vfusion3d.html)<br> |
| | [Junlin Han](https://junlinhan.github.io/), [Filippos Kokkinos](https://www.fkokkinos.com/), [Philip Torr](https://www.robots.ox.ac.uk/~phst/)<br> |
| | GenAI, Meta and TVG, University of Oxford<br> |
| | European Conference on Computer Vision (ECCV), 2024 |
| |
|
| |
|
| | ## News |
| |
|
| | - [08.08.2024] [HF Demo](https://huggingface.co/spaces/facebook/VFusion3D) is available. Big thanks to [Jade Choghari](https://github.com/jadechoghari) for helping make it possible. |
| | - [25.07.2024] Released weights and inference code for VFusion3D. |
| |
|
| |
|
| |
|
| | ## Quick Start |
| |
|
| | Getting started with VFusion3D is super easy! 🤗 Here’s how you can use the model with Hugging Face: |
| |
|
| | ### Install Dependencies (Optional) |
| |
|
| | Depending on your needs, you may want to enable specific features like mesh generation or video rendering. We've got you covered with these additional packages: |
| |
|
| | ```bash |
| | pip install --quiet "imageio[ffmpeg]" PyMCubes trimesh "rembg[gpu,cli]" kiui |
| | ``` |
| |
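Since these packages are optional, it can help to check which of them are already importable before enabling mesh or video export. The snippet below is a minimal sketch using only the standard library. The mapping from pip package names to import names is an assumption (notably that `PyMCubes` is imported as `mcubes`), and `missing_optional_packages` is a hypothetical helper, not part of VFusion3D.

```python
from importlib.util import find_spec

# Assumed mapping: pip package name -> top-level module it provides.
# PyMCubes is assumed to be imported as `mcubes`.
OPTIONAL_MODULES = {
    "imageio": "imageio",
    "PyMCubes": "mcubes",
    "trimesh": "trimesh",
    "rembg": "rembg",
    "kiui": "kiui",
}


def missing_optional_packages(modules=OPTIONAL_MODULES):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if find_spec(module_name) is None
    ]


missing = missing_optional_packages()
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All optional packages are installed.")
```

The check only confirms that a module can be located. It does not verify versions or extras such as `imageio[ffmpeg]`.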
|
| | ### Load model directly |
| | ```python |
| | import torch |
| | from transformers import AutoModel, AutoProcessor |
| | |
| | # load the model and processor |
| | model = AutoModel.from_pretrained("jadechoghari/vfusion3d", trust_remote_code=True) |
| | processor = AutoProcessor.from_pretrained("jadechoghari/vfusion3d") |
| | |
| | # download and preprocess the image |
| | import requests |
| | from PIL import Image |
| | from io import BytesIO |
| | |
| | image_url = 'https://sm.ign.com/ign_nordic/cover/a/avatar-gen/avatar-generations_prsz.jpg' |
| | response = requests.get(image_url, timeout=30) |
| | response.raise_for_status()  # fail early if the download did not succeed |
| | image = Image.open(BytesIO(response.content)) |
| | |
| | # preprocess the image and get the source camera |
| | image, source_camera = processor(image) |
| | |
| | |
| | # generate planes (default output) |
| | output_planes = model(image, source_camera) |
| | print("Planes shape:", output_planes.shape) |
| | |
| | # generate a 3D mesh |
| | output_planes, mesh_path = model(image, source_camera, export_mesh=True) |
| | print("Planes shape:", output_planes.shape) |
| | print("Mesh saved at:", mesh_path) |
| | |
| | # generate a video |
| | output_planes, video_path = model(image, source_camera, export_video=True) |
| | print("Planes shape:", output_planes.shape) |
| | print("Video saved at:", video_path) |
| | |
| | ``` |
| | - **Default (Planes):** By default, VFusion3D outputs planes—ideal for further 3D operations. |
| | - **Export Mesh:** Want a 3D mesh? Just set `export_mesh=True`, and you'll get a `.obj` file ready to roll. You can also customize the mesh resolution by adjusting the `mesh_size` parameter. |
| | - **Export Video:** Fancy a 3D video? Set `export_video=True`, and you'll receive a beautifully rendered video from multiple angles. You can tweak `render_size` and `fps` to get the video just right. |
| |
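The three output modes above can be sketched as a small helper that builds the keyword arguments for a call. This is an illustration under assumptions: `export_kwargs` is a hypothetical helper, it assumes `mesh_size`, `render_size`, and `fps` are passed as keyword arguments to the model call alongside `export_mesh` / `export_video`, and the numeric values are placeholders rather than documented defaults.

```python
def export_kwargs(mode="planes", mesh_size=None, render_size=None, fps=None):
    """Build keyword arguments for one of the three output modes."""
    if mode == "planes":
        kwargs = {}
    elif mode == "mesh":
        kwargs = {"export_mesh": True, "mesh_size": mesh_size}
    elif mode == "video":
        kwargs = {"export_video": True, "render_size": render_size, "fps": fps}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Drop options left as None so the model falls back to its own defaults.
    return {name: value for name, value in kwargs.items() if value is not None}


# Illustrative values only; they are not the model's documented defaults.
print(export_kwargs("mesh", mesh_size=256))
print(export_kwargs("video", render_size=256, fps=24))

# With `model`, `image`, and `source_camera` from the Quick Start snippet:
# output_planes, mesh_path = model(image, source_camera, **export_kwargs("mesh", mesh_size=256))
```

Leaving unset options out of the call means the model's own defaults apply without having to know them in advance.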
|
| | Check out our [demo app](https://huggingface.co/spaces/facebook/VFusion3D) to see VFusion3D in action! 🤗 |
| |
|
| | ## Results and Comparisons |
| |
|
| | ### 3D Generation Results |
| | <img src='assets/gif1.gif' width=950> |
| |
|
| | <img src='assets/gif2.gif' width=950> |
| |
|
| | ### User Study Results |
| | <img src='assets/user.png' width=950> |
| |
|
| |
|
| |
|
| | ## Acknowledgement |
| |
|
| | - The inference code of VFusion3D borrows heavily from [OpenLRM](https://github.com/3DTopia/OpenLRM). |
| |
|
| | ## Citation |
| |
|
| | If you find this work useful, please cite us: |
| |
|
| |
|
| | ``` |
| | @article{han2024vfusion3d, |
| | title={VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models}, |
| | author={Junlin Han and Filippos Kokkinos and Philip Torr}, |
| | journal={European Conference on Computer Vision (ECCV)}, |
| | year={2024} |
| | } |
| | ``` |
| |
|
| | ## License |
| |
|
| | - The majority of VFusion3D is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license. |
| | - The model weights of VFusion3D are also licensed under CC-BY-NC. |