Improve model card: Add `library_name`, abstract, links, and usage example

#1
opened by nielsr (HF Staff)

Files changed (1): README.md (+83 −4)
```diff
@@ -1,7 +1,9 @@
 ---
-license: cc-by-4.0
 base_model:
 - ashawkey/mvdream-sd2.1-diffusers
 pipeline_tag: text-to-3d
 tags:
 - multiview
@@ -9,6 +11,83 @@ tags:
 - RAG
 - retrieval
 - diffusion
-datasets:
-- yosepyossi/OOD-Eval
----
```
Resulting `README.md`:

---
base_model:
- ashawkey/mvdream-sd2.1-diffusers
datasets:
- yosepyossi/OOD-Eval
license: cc-by-4.0
pipeline_tag: text-to-3d
tags:
- multiview

- RAG
- retrieval
- diffusion
library_name: diffusers
---
# MV-RAG: Retrieval Augmented Multiview Diffusion

| [Project Page](https://yosefdayani.github.io/MV-RAG/) | [Paper](https://huggingface.co/papers/2508.16577) | [GitHub](https://github.com/yosefdayani/MV-RAG) | [Weights](https://huggingface.co/yosepyossi/mvrag) | [Benchmark (OOD-Eval)](https://huggingface.co/datasets/yosepyossi/OOD-Eval) |

![teaser](https://yosefdayani.github.io/MV-RAG/static/images/teaser.jpg)

## Abstract
Text-to-3D generation approaches have advanced significantly by leveraging pretrained 2D diffusion priors, producing high-quality and 3D-consistent outputs. However, they often fail to produce out-of-domain (OOD) or rare concepts, yielding inconsistent or inaccurate results. To this end, we propose MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images from a large in-the-wild 2D database and then conditions a multiview diffusion model on these images to synthesize consistent and accurate multiview outputs. Training such a retrieval-conditioned model is achieved via a novel hybrid strategy bridging structured multiview data and diverse 2D image collections. This involves training on multiview data using augmented conditioning views that simulate retrieval variance for view-specific reconstruction, alongside training on sets of retrieved real-world 2D images using a distinctive held-out view prediction objective: the model predicts the held-out view from the other views to infer 3D consistency from 2D data. To facilitate a rigorous OOD evaluation, we introduce a new collection of challenging OOD prompts. Experiments against state-of-the-art text-to-3D, image-to-3D, and personalization baselines show that our approach significantly improves 3D consistency, photorealism, and text adherence for OOD/rare concepts, while maintaining competitive performance on standard benchmarks.

## Overview
MV-RAG is a text-to-3D generation method that retrieves 2D reference images to guide a multiview diffusion model. By conditioning on both the text prompt and multiple real-world 2D images, MV-RAG improves realism and consistency for rare, out-of-distribution, or newly emerging objects.
## Installation

We recommend creating a fresh conda environment to run MV-RAG:

```bash
# Clone the repository
git clone https://github.com/yosefdayani/MV-RAG.git
cd MV-RAG

# Create and activate a new environment
conda create -n mvrag python=3.9 -y
conda activate mvrag

# Install PyTorch (adjust the CUDA version as needed)
# Example: CUDA 12.4, PyTorch 2.5.1
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install the remaining dependencies
pip install -r requirements.txt
```
## Weights

The MV-RAG weights are available on [Hugging Face](https://huggingface.co/yosepyossi/mvrag). Clone them into the repository root:

```bash
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/yosepyossi/mvrag
```

The model weights will then be available under `MV-RAG/mvrag/...`.
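If you prefer not to use git-lfs, the same snapshot can also be fetched with `huggingface_hub` — a minimal sketch, assuming the `huggingface_hub` package is installed (it ships as a dependency of `diffusers`); the helper names `weights_dir` and `download_weights` are our own, not part of the repository:

```python
# Sketch: fetch the MV-RAG weights with huggingface_hub instead of git-lfs.
# Assumes huggingface_hub is installed (a standard diffusers dependency).
from pathlib import Path


def weights_dir(repo_root: str = ".", repo_id: str = "yosepyossi/mvrag") -> Path:
    """Local directory where the snapshot is placed (<repo_root>/mvrag)."""
    return Path(repo_root) / repo_id.split("/")[-1]


def download_weights(repo_id: str = "yosepyossi/mvrag") -> Path:
    """Download the full model snapshot into the expected local folder."""
    from huggingface_hub import snapshot_download

    target = weights_dir(repo_id=repo_id)
    snapshot_download(repo_id=repo_id, local_dir=target)
    return target
```

Run `download_weights()` from inside the MV-RAG checkout so the files land in the same `mvrag/` directory that the `git clone` above would create.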
## Usage Example
You can prompt the model with your own locally retrieved images, for example:
```bash
python main.py \
  --prompt "Cadillac 341 automobile car" \
  --retriever simple \
  --folder_path "assets/Cadillac 341 automobile car" \
  --seed 0 \
  --k 4 \
  --azimuth_start 45  # or 0 for a front view
```
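For intuition only, here is a sketch of what a `simple` retriever over `--folder_path` with `--k` images might do: pick the first k images from the folder as conditioning views. This is an illustrative assumption, not the repository's implementation, and the function name and extension list are hypothetical:

```python
# Hypothetical sketch of a "simple" folder retriever: select the first k
# image files (sorted by name) from --folder_path as conditioning views.
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def retrieve_simple(folder_path: str, k: int = 4) -> list[Path]:
    files = sorted(
        p for p in Path(folder_path).iterdir() if p.suffix.lower() in IMAGE_EXTS
    )
    if len(files) < k:
        raise ValueError(f"need at least {k} images, found {len(files)}")
    return files[:k]
```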
To see all command-line options, run:
```bash
python main.py --help
```
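To illustrate `--azimuth_start`: multiview models of this kind typically render a ring of evenly spaced camera views, so the flag sets where that ring begins. The helper below is a hypothetical sketch under that assumption, not code from the repository:

```python
# Hypothetical helper: camera azimuths for an n-view ring starting at
# --azimuth_start, with views spaced 360/n degrees apart.
def view_azimuths(azimuth_start: float, n_views: int = 4) -> list[float]:
    step = 360.0 / n_views
    return [(azimuth_start + i * step) % 360.0 for i in range(n_views)]
```

With `--azimuth_start 45` and four views this yields cameras at 45, 135, 225, and 315 degrees; with `0`, the first view is front-facing.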
## Acknowledgement
This repository is based on [MVDream](https://github.com/bytedance/MVDream) and adapted from [MVDream Diffusers](https://github.com/ashawkey/mvdream_diffusers). We would like to thank the authors of these works for publicly releasing their code.
## Citation
```bibtex
@misc{dayani2025mvragretrievalaugmentedmultiview,
  title={MV-RAG: Retrieval Augmented Multiview Diffusion},
  author={Yosef Dayani and Omer Benishu and Sagie Benaim},
  year={2025},
  eprint={2508.16577},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.16577},
}
```