nielsr HF Staff committed on
Commit 0bc57f0 · verified · 1 Parent(s): 38f026b

Improve model card: Add `library_name`, abstract, links, and usage example

This PR significantly enhances the model card by:

- Adding `library_name: diffusers` to the metadata, which enables the automated "how to use" widget on the Hugging Face Hub.
- Populating the content section with a comprehensive overview from the paper abstract and GitHub README.
- Including direct links to the paper on Hugging Face, the project page, and the GitHub repository.
- Adding a practical usage example with installation steps directly from the original GitHub README.
- Including other relevant sections like Weights, Acknowledgement, and Citation.

This ensures better discoverability and provides users with essential information directly on the model page, making it easier for them to understand and utilize the model.

Files changed (1)
  1. README.md +83 -4
README.md CHANGED
@@ -1,7 +1,9 @@
  ---
- license: cc-by-4.0
  base_model:
  - ashawkey/mvdream-sd2.1-diffusers
  pipeline_tag: text-to-3d
  tags:
  - multiview
@@ -9,6 +11,83 @@ tags:
  - RAG
  - retrieval
  - diffusion
- datasets:
- - yosepyossi/OOD-Eval
- ---

  ---
  base_model:
  - ashawkey/mvdream-sd2.1-diffusers
+ datasets:
+ - yosepyossi/OOD-Eval
+ license: cc-by-4.0
  pipeline_tag: text-to-3d
  tags:
  - multiview
@@ -9,6 +11,83 @@ tags:
  - RAG
  - retrieval
  - diffusion
+ library_name: diffusers
+ ---
+
+ # MV-RAG: Retrieval Augmented Multiview Diffusion
+
+ | [Project Page](https://yosefdayani.github.io/MV-RAG/) | [Paper](https://huggingface.co/papers/2508.16577) | [GitHub](https://github.com/yosefdayani/MV-RAG) | [Weights](https://huggingface.co/yosepyossi/mvrag) | [Benchmark (OOD-Eval)](https://huggingface.co/datasets/yosepyossi/OOD-Eval) |
+
+ ![teaser](https://yosefdayani.github.io/MV-RAG/static/images/teaser.jpg)
+
+ ## Abstract
+ Text-to-3D generation approaches have advanced significantly by leveraging pretrained 2D diffusion priors, producing high-quality and 3D-consistent outputs. However, they often fail to produce out-of-domain (OOD) or rare concepts, yielding inconsistent or inaccurate results. To this end, we propose MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images from a large in-the-wild 2D database and then conditions a multiview diffusion model on these images to synthesize consistent and accurate multiview outputs. Training such a retrieval-conditioned model is achieved via a novel hybrid strategy bridging structured multiview data and diverse 2D image collections. This involves training on multiview data using augmented conditioning views that simulate retrieval variance for view-specific reconstruction, alongside training on sets of retrieved real-world 2D images using a distinctive held-out view prediction objective: the model predicts the held-out view from the other views to infer 3D consistency from 2D data. To facilitate a rigorous OOD evaluation, we introduce a new collection of challenging OOD prompts. Experiments against state-of-the-art text-to-3D, image-to-3D, and personalization baselines show that our approach significantly improves 3D consistency, photorealism, and text adherence for OOD/rare concepts, while maintaining competitive performance on standard benchmarks.
+
+ ## Overview
+ MV-RAG is a text-to-3D generation method that retrieves 2D reference images to guide a multiview diffusion model. By conditioning on both the text prompt and multiple real-world 2D images, MV-RAG improves realism and consistency for rare, out-of-distribution, or newly emerging objects.
+
+ ## Installation
+
+ We recommend creating a fresh conda environment to run MV-RAG:
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/yosefdayani/MV-RAG.git
+ cd MV-RAG
+
+ # Create new environment
+ conda create -n mvrag python=3.9 -y
+ conda activate mvrag
+
+ # Install PyTorch (adjust CUDA version as needed)
+ # Example: CUDA 12.4, PyTorch 2.5.1
+ conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
+
+ # Install other dependencies
+ pip install -r requirements.txt
+ ```
+
+ ## Weights
+
+ MV-RAG weights are available on [Hugging Face](https://huggingface.co/yosepyossi/mvrag).
+ ```bash
+ # Make sure git-lfs is installed (https://git-lfs.com)
+ git lfs install
+
+ git clone https://huggingface.co/yosepyossi/mvrag
+ ```
+
+ When cloned from inside the `MV-RAG` directory, the model weights should then appear under `MV-RAG/mvrag/...`
+
+ ## Usage Example
+ To run the model on a local folder of retrieved images:
+ ```bash
+ python main.py \
+ --prompt "Cadillac 341 automobile car" \
+ --retriever simple \
+ --folder_path "assets/Cadillac 341 automobile car" \
+ --seed 0 \
+ --k 4 \
+ --azimuth_start 45 # or 0 for front view
+ ```
+
+ To see all command-line options, run:
+ ```bash
+ python main.py --help
+ ```
+
+ ## Acknowledgement
+ This repository is based on [MVDream](https://github.com/bytedance/MVDream) and adapted from [MVDream Diffusers](https://github.com/ashawkey/mvdream_diffusers). We would like to thank the authors of these works for publicly releasing their code.
+
+ ## Citation
+ ```bibtex
+ @misc{dayani2025mvragretrievalaugmentedmultiview,
+ title={MV-RAG: Retrieval Augmented Multiview Diffusion},
+ author={Yosef Dayani and Omer Benishu and Sagie Benaim},
+ year={2025},
+ eprint={2508.16577},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2508.16577},
+ }
+ ```
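
The usage example in the README could be scripted when several prompts need to be run. The following is a minimal sketch and not part of this commit or the MV-RAG repository. It uses only the `main.py` flags shown in the usage example. The helper names `build_command` and `commands_for_assets` are hypothetical, and it assumes that each subfolder of `assets/` is named after its prompt, as in the Cadillac example.

```python
import shlex
from pathlib import Path


def build_command(prompt, folder_path, k=4, seed=0, azimuth_start=45, retriever="simple"):
    """Return the argv list for one `main.py` run.

    Only flags that appear in the README usage example are used;
    the default values mirror that example.
    """
    return [
        "python", "main.py",
        "--prompt", prompt,
        "--retriever", retriever,
        "--folder_path", str(folder_path),
        "--seed", str(seed),
        "--k", str(k),
        "--azimuth_start", str(azimuth_start),
    ]


def commands_for_assets(assets_dir="assets"):
    """Build one command per subfolder of `assets_dir`.

    Assumption: the subfolder name doubles as the text prompt.
    Returns an empty list if the directory does not exist.
    """
    root = Path(assets_dir)
    if not root.is_dir():
        return []
    return [build_command(p.name, p) for p in sorted(root.iterdir()) if p.is_dir()]


if __name__ == "__main__":
    # Dry run: print the shell-quoted commands instead of executing them.
    for cmd in commands_for_assets():
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check quoting of folder names that contain spaces. To execute a command, pass the list to `subprocess.run(cmd, check=True)` from the repository root.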