Improve model card: Add `library_name`, abstract, links, and usage example
#1
by nielsr (HF Staff) - opened

README.md CHANGED
---
base_model:
- ashawkey/mvdream-sd2.1-diffusers
datasets:
- yosepyossi/OOD-Eval
license: cc-by-4.0
pipeline_tag: text-to-3d
tags:
- multiview
- RAG
- retrieval
- diffusion
library_name: diffusers
---

# MV-RAG: Retrieval Augmented Multiview Diffusion

| [Project Page](https://yosefdayani.github.io/MV-RAG/) | [Paper](https://huggingface.co/papers/2508.16577) | [GitHub](https://github.com/yosefdayani/MV-RAG) | [Weights](https://huggingface.co/yosepyossi/mvrag) | [Benchmark (OOD-Eval)](https://huggingface.co/datasets/yosepyossi/OOD-Eval) |

![teaser](https://yosefdayani.github.io/MV-RAG/imgs/teaser.png)

## Abstract

Text-to-3D generation approaches have advanced significantly by leveraging pretrained 2D diffusion priors, producing high-quality and 3D-consistent outputs. However, they often fail to produce out-of-domain (OOD) or rare concepts, yielding inconsistent or inaccurate results. To this end, we propose MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images from a large in-the-wild 2D database and then conditions a multiview diffusion model on these images to synthesize consistent and accurate multiview outputs. Training such a retrieval-conditioned model is achieved via a novel hybrid strategy bridging structured multiview data and diverse 2D image collections. This involves training on multiview data using augmented conditioning views that simulate retrieval variance for view-specific reconstruction, alongside training on sets of retrieved real-world 2D images using a distinctive held-out view prediction objective: the model predicts the held-out view from the other views to infer 3D consistency from 2D data. To facilitate a rigorous OOD evaluation, we introduce a new collection of challenging OOD prompts. Experiments against state-of-the-art text-to-3D, image-to-3D, and personalization baselines show that our approach significantly improves 3D consistency, photorealism, and text adherence for OOD/rare concepts, while maintaining competitive performance on standard benchmarks.

## Overview

MV-RAG is a text-to-3D generation method that retrieves 2D reference images to guide a multiview diffusion model. By conditioning on both the text prompt and multiple real-world 2D images, MV-RAG improves realism and consistency for rare, out-of-distribution, or newly emerging objects.
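The held-out view prediction objective mentioned in the abstract can be pictured as: hide one of the retrieved views and ask the model to reconstruct it from the rest. Below is a toy sketch of that data flow only; the shapes and the averaging "model" are placeholders, not MV-RAG's architecture:

```python
# Toy illustration of a held-out view objective: hide one view,
# predict it from the remaining views, and score the prediction.
import torch
import torch.nn.functional as F

num_views, c, h, w = 4, 3, 32, 32
views = torch.randn(num_views, c, h, w)            # stand-in for retrieved 2D views
held_out = torch.randint(num_views, (1,)).item()   # index of the hidden view

context = torch.cat([views[:held_out], views[held_out + 1:]])  # conditioning views
target = views[held_out]                                       # view to predict

# A real model would denoise `target` conditioned on `context`;
# this placeholder simply averages the context views.
prediction = context.mean(dim=0)
loss = F.mse_loss(prediction, target)
print(loss.item())
```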

## Installation

We recommend creating a fresh conda environment to run MV-RAG:

```bash
# Clone the repository
git clone https://github.com/yosefdayani/MV-RAG.git
cd MV-RAG

# Create a new environment
conda create -n mvrag python=3.9 -y
conda activate mvrag

# Install PyTorch (adjust the CUDA version as needed)
# Example: CUDA 12.4, PyTorch 2.5.1
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install the other dependencies
pip install -r requirements.txt
```
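
After installing, a quick optional sanity check (not part of the original instructions) to confirm the CUDA build of PyTorch is active:

```python
# Verify that PyTorch sees a CUDA device before running MV-RAG
import torch

print(torch.__version__)          # expected: 2.5.1
print(torch.cuda.is_available())  # expected: True on a GPU machine
```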

## Weights

MV-RAG weights are available on [Hugging Face](https://huggingface.co/yosepyossi/mvrag):

```bash
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/yosepyossi/mvrag
```

The model weights should then appear under MV-RAG/mvrag/...
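
If you prefer not to use git-lfs, the same repository can also be fetched with the `huggingface_hub` Python package (an alternative route; assumes `pip install huggingface_hub`):

```python
# Download the MV-RAG weights into a local "mvrag" directory
from huggingface_hub import snapshot_download

snapshot_download(repo_id="yosepyossi/mvrag", local_dir="mvrag")
```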

## Usage Example

You can prompt the model with a folder of retrieved local images:

```bash
python main.py \
    --prompt "Cadillac 341 automobile car" \
    --retriever simple \
    --folder_path "assets/Cadillac 341 automobile car" \
    --seed 0 \
    --k 4 \
    --azimuth_start 45  # or 0 for a front view
```

To see all command options, run:

```bash
python main.py --help
```
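
The `--retriever simple` option works over a local folder of candidate images. As an illustrative sketch only (this is not the repository's retriever; the CLIP model choice and scoring below are assumptions), a similar text-to-image ranking can be built as:

```python
# Rank local candidate images by CLIP similarity to the text prompt.
from pathlib import Path

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "Cadillac 341 automobile car"
folder = Path("assets/Cadillac 341 automobile car")
paths = sorted(p for p in folder.iterdir() if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
images = [Image.open(p).convert("RGB") for p in paths]

inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image.squeeze(-1)  # (num_images,)

# Keep the k best-matching images as conditioning views
k = min(4, len(images))
best = scores.topk(k).indices.tolist()
print([paths[i].name for i in best])
```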

## Acknowledgement

This repository is based on [MVDream](https://github.com/bytedance/MVDream) and adapted from [MVDream Diffusers](https://github.com/ashawkey/mvdream_diffusers). We would like to thank the authors of these works for publicly releasing their code.

## Citation

```bibtex
@misc{dayani2025mvragretrievalaugmentedmultiview,
  title={MV-RAG: Retrieval Augmented Multiview Diffusion},
  author={Yosef Dayani and Omer Benishu and Sagie Benaim},
  year={2025},
  eprint={2508.16577},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.16577},
}
```
|