Add model card for Multi-view Pyramid Transformer (MVP)

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +81 -0
README.md ADDED
@@ -0,0 +1,81 @@
---
pipeline_tag: image-to-3d
license: apache-2.0
---

# Multi-view Pyramid Transformer: Look Coarser to See Broader

<div align="center">
<h1><span style="color:#93cf6a;">M</span>ulti-<span style="color:#93cf6a;">v</span>iew <span style="color:#93cf6a;">P</span>yramid Transformer: Look Coarser to See Broader</h1>

<a href="https://huggingface.co/papers/2512.07806"><img src="https://img.shields.io/badge/Paper-2512.07806-b31b1b" alt="Paper"></a>
<a href="https://gynjn.github.io/MVP/"><img src="https://img.shields.io/badge/Project_Page-green" alt="Project Page"></a>
<a href="https://github.com/Gynjn/MVP"><img src="https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github&" alt="GitHub Code"></a>
</div>

This repository contains the official model for the paper "[Multi-view Pyramid Transformer: Look Coarser to See Broader](https://huggingface.co/papers/2512.07806)".

Multi-view Pyramid Transformer (MVP) is a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds of images in a single forward pass. MVP is built on two core design principles:
1. **Local-to-global inter-view hierarchy**: gradually broadens the model's perspective from local views to view groups and ultimately the full scene.
2. **Fine-to-coarse intra-view hierarchy**: starts from detailed spatial representations and progressively aggregates them into compact, information-dense tokens.

This dual hierarchy achieves both computational efficiency and representational richness, enabling fast reconstruction of large and complex scenes. Coupled with 3D Gaussian Splatting as the underlying 3D representation, MVP achieves state-of-the-art generalizable reconstruction quality while remaining efficient and scalable across a wide range of view configurations.
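
The interplay of the two hierarchies can be sketched numerically. The following is an illustrative toy only: the view counts, token counts, pooling, and grouping factors are assumptions chosen for clarity, not MVP's actual implementation.

```python
import numpy as np

# Toy setup: V views, each with T spatial tokens of dimension D.
rng = np.random.default_rng(0)
V, T, D = 8, 16, 4
tokens = rng.standard_normal((V, T, D))

# Fine-to-coarse intra-view hierarchy: average-pool spatial tokens at
# each level, yielding fewer but more information-dense tokens per view.
def pool_spatial(x, factor=2):
    v, t, d = x.shape
    return x.reshape(v, t // factor, factor, d).mean(axis=2)

level1 = pool_spatial(tokens)   # (8, 8, 4)
level2 = pool_spatial(level1)   # (8, 4, 4)

# Local-to-global inter-view hierarchy: the attention scope widens from
# small neighborhoods of views, to larger groups, to the full scene.
def group_views(x, group):
    v, t, d = x.shape
    return x.reshape(v // group, group * t, d)

local   = group_views(tokens, 2)   # (4, 32, 4): view pairs, fine tokens
mid     = group_views(level1, 4)   # (2, 32, 4): view groups, coarser tokens
global_ = group_views(level2, 8)   # (1, 32, 4): full scene, coarsest tokens
```

Note that in this toy the token count per attention window stays constant (32) even as the scope grows from view pairs to the whole scene, which is the intuition behind combining the two hierarchies for efficiency.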

## Installation

To set up the environment and install dependencies:

```bash
# create and activate a conda environment
conda create -n mvp python=3.11 -y
conda activate mvp

# install dependencies (pick the PyTorch/CUDA build matching your system)
pip install -r requirements.txt
pip install git+https://github.com/nerfstudio-project/gsplat.git
```

## Checkpoints

The model checkpoint is hosted on [Hugging Face](https://huggingface.co/Gynjn/MVP) ([mvp_540x960](https://huggingface.co/Gynjn/MVP/resolve/main/mvp.pt?download=true)).

For training and evaluation, we used the DL3DV dataset after applying undistortion preprocessing with this [script](https://github.com/arthurhero/Long-LRM/blob/main/data/prosess_dl3dv.py), originally introduced in [Long-LRM](https://arthurhero.github.io/projects/llrm/index.html).

Download the DL3DV benchmark dataset from [here](https://huggingface.co/datasets/DL3DV/DL3DV-Benchmark/tree/main) and apply the same undistortion preprocessing.
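
The direct-download link above follows the Hub's standard `resolve` URL pattern, so the checkpoint URL can also be built programmatically (a convenience sketch; `huggingface_hub`'s `hf_hub_download` works as well):

```python
# Build the checkpoint's direct-download URL from its repo id and
# filename, then fetch it with curl/wget or your tool of choice.
repo_id, filename = "Gynjn/MVP", "mvp.pt"
url = f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
print(url)  # https://huggingface.co/Gynjn/MVP/resolve/main/mvp.pt
```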

## Inference

To run inference with the pretrained model:

1. Set the `inference.ckpt_path` field in `configs/inference.yaml` to the path of the downloaded checkpoint.
2. Update the entries in `data/dl3dv_eval.txt` to point to the processed dataset path.

```bash
# inference
CUDA_VISIBLE_DEVICES=0 python inference.py --config configs/inference.yaml
```
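
For orientation, a forward pass of a feed-forward 3D Gaussian Splatting reconstructor regresses Gaussian primitives for the whole scene at once. The sketch below shows only the *shape* of a pixel-aligned prediction; the channel layout and one-Gaussian-per-pixel assumption are illustrative of this model family, not a description of MVP's actual prediction head.

```python
import numpy as np

# Toy sizes (the released checkpoint targets 540x960 inputs; smaller
# numbers are used here to keep the example light).
N, H, W = 4, 135, 240            # number of input views, image size
C = 3 + 3 + 4 + 1 + 3            # mean, scale, quaternion, opacity, RGB

rng = np.random.default_rng(0)
param_map = rng.standard_normal((N, H, W, C)).astype(np.float32)

# One Gaussian per input pixel -> N*H*W primitives for the scene.
gaussians = param_map.reshape(-1, C)
means, scales, quats, opacity, rgb = np.split(
    gaussians, [3, 6, 10, 11], axis=1)
print(gaussians.shape[0])        # 129600
```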

## Citation

If you find our work useful, please cite our paper:

```bibtex
@article{kang2025multi,
  title={Multi-view Pyramid Transformer: Look Coarser to See Broader},
  author={Kang, Gyeongjin and Yang, Seungkwon and Nam, Seungtae and Lee, Younggeun and Kim, Jungwoo and Park, Eunbyung},
  journal={arXiv preprint arXiv:2512.07806},
  year={2025}
}
```

## Acknowledgements

This project builds on many amazing research works; thanks to all the authors for sharing!

- [Gaussian-Splatting](https://github.com/graphdeco-inria/gaussian-splatting) and [gsplat](https://github.com/nerfstudio-project/gsplat)
- [LVSM](https://github.com/haian-jin/LVSM)
- [Long-LRM](https://github.com/arthurhero/Long-LRM)
- [LaCT](https://github.com/a1600012888/LaCT)
- [iLRM](https://github.com/Gynjn/iLRM)
- [ProPE](https://github.com/liruilong940607/prope)
- [LVT](https://toobaimt.github.io/lvt/)