Update pipeline tag, add library name, and expand model card for GeoMVD

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +81 -6
README.md CHANGED
@@ -1,18 +1,93 @@
  ---
- license: apache-2.0
- datasets:
- - allenai/objaverse
  base_model:
  - sudo-ai/zero123plus-v1.2
- pipeline_tag: image-to-image
  ---
 
- - base_model: GeoMVD is a modified version of the zero123++ weight files.
 
- - LoRA_ckpt: The LoRA weights that we trained.

  ---
  base_model:
  - sudo-ai/zero123plus-v1.2
+ datasets:
+ - allenai/objaverse
+ license: apache-2.0
+ pipeline_tag: image-to-3d
+ library_name: diffusers
  ---
 
+ # GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2511.12204-b31b1b.svg)](https://huggingface.co/papers/2511.12204) [![Project Page](https://img.shields.io/badge/Project_Page-GeoMVD-blue)](https://sobeymil.github.io/GeoMVD.com/) [![GitHub](https://img.shields.io/badge/GitHub-Code-181717?logo=github)](https://github.com/SobeyMIL/GeoMVD)
+
+ This repository presents **GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction**, a novel approach to multi-view image generation designed for 3D reconstruction, virtual reality, and augmented reality applications. GeoMVD is a modified version of the Zero123++ weights that adds mechanisms for extracting multi-view geometric information, so that the generated images are both high-quality and consistent across views.
+
+ ![top](https://github.com/SobeyMIL/GeoMVD/raw/main/assert/top.png)
+
+ ## Abstract
+ Multi-view image generation holds significant application value in computer vision, particularly in domains like 3D reconstruction, virtual reality, and augmented reality. Most existing methods, which rely on extending single images, face notable computational challenges in maintaining cross-view consistency and generating high-resolution outputs. To address these issues, we propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information and adjusting the intensity of geometric features to generate images that are both consistent across views and rich in detail. Specifically, we design a multi-view geometry information extraction module that leverages depth maps, normal maps, and foreground segmentation masks to construct a shared geometric structure, ensuring shape and structural consistency across different views. To enhance consistency and detail restoration during generation, we develop a decoupled geometry-enhanced attention mechanism that strengthens feature focus on key geometric details, thereby improving overall image quality and detail preservation. Furthermore, we apply an adaptive learning strategy that fine-tunes the model to better capture spatial relationships and visual coherence between the generated views, ensuring realistic results. Our model also incorporates an iterative refinement process that progressively improves the output quality through multiple stages of image generation. Finally, a dynamic geometry information intensity adjustment mechanism is proposed to adaptively regulate the influence of geometric data, optimizing overall quality while ensuring the naturalness of generated images.
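The abstract names a decoupled geometry-enhanced attention mechanism and a geometry-intensity adjustment but gives no equations. As an illustration only, here is a minimal sketch of one common way to decouple attention, modeled on IP-Adapter-style cross-attention: the same queries attend separately to the ordinary conditioning tokens and to geometry tokens, each with its own key/value projections, and the geometry branch is added with a scalar weight. The names `decoupled_attention`, `geo_scale`, and the `params` keys are hypothetical; this is not GeoMVD's implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def decoupled_attention(hidden, cond_tokens, geo_tokens, params, geo_scale=1.0):
    """Hypothetical decoupled cross-attention (IP-Adapter style).

    One set of queries attends to the conditioning tokens and to the geometry
    tokens through separate key/value projections. The two results are summed,
    with the geometry branch weighted by geo_scale.
    """
    q = hidden @ params["w_q"]
    cond_out = attention(q, cond_tokens @ params["w_k_cond"], cond_tokens @ params["w_v_cond"])
    geo_out = attention(q, geo_tokens @ params["w_k_geo"], geo_tokens @ params["w_v_geo"])
    return cond_out + geo_scale * geo_out


rng = np.random.default_rng(0)
dim = 8
params = {name: rng.standard_normal((dim, dim)) / np.sqrt(dim)
          for name in ["w_q", "w_k_cond", "w_v_cond", "w_k_geo", "w_v_geo"]}
hidden = rng.standard_normal((16, dim))      # 16 latent positions (queries)
cond_tokens = rng.standard_normal((4, dim))  # stand-in for image-conditioning tokens
geo_tokens = rng.standard_normal((6, dim))   # stand-in for tokens encoded from depth/normal/mask

no_geo = decoupled_attention(hidden, cond_tokens, geo_tokens, params, geo_scale=0.0)
with_geo = decoupled_attention(hidden, cond_tokens, geo_tokens, params, geo_scale=1.0)
```

With `geo_scale=0.0` the layer reduces to plain conditioning attention, so the scalar acts as the kind of knob an intensity-adjustment mechanism would control. The abstract describes that adjustment as dynamic and adaptive, whereas here it is a fixed number.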
+
+ ## Model Details
+ * **Base Model**: GeoMVD is a modified version of the [sudo-ai/zero123plus-v1.2](https://huggingface.co/sudo-ai/zero123plus-v1.2) weight files.
+ * **LoRA Checkpoint**: This repository contains the LoRA weights that we trained.
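The card says the repository ships LoRA weights but does not state their rank, scaling factor, or target layers. For orientation only, the sketch below shows the standard LoRA merge rule for one linear layer, W' = W + (alpha / r) · (B @ A). The sizes, `alpha`, and the `merge_lora` helper are illustrative assumptions, not values or code from GeoMVD.

```python
import numpy as np


def merge_lora(weight: np.ndarray, lora_down: np.ndarray, lora_up: np.ndarray,
               alpha: float) -> np.ndarray:
    """Return W + (alpha / r) * (up @ down) for a linear layer.

    weight:    (out_features, in_features) frozen base weight W
    lora_down: (r, in_features) low-rank matrix A
    lora_up:   (out_features, r) low-rank matrix B
    """
    rank = lora_down.shape[0]
    return weight + (alpha / rank) * (lora_up @ lora_down)


rng = np.random.default_rng(0)
out_features, in_features, rank = 6, 10, 2   # illustrative sizes only
W = rng.standard_normal((out_features, in_features))
A = rng.standard_normal((rank, in_features))
B = np.zeros((out_features, rank))           # B is zero-initialized in standard LoRA

# At initialization (B = 0) merging leaves the base layer unchanged.
assert np.allclose(merge_lora(W, A, B, alpha=rank), W)

# After training, the learned update has rank at most r.
B_trained = rng.standard_normal((out_features, rank))
W_merged = merge_lora(W, A, B_trained, alpha=rank)
```

Keeping the update factored as two small matrices is what makes such a checkpoint much smaller than a full copy of the Zero123++ UNet weights.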
+
+ ## Dependencies and Installation
+
+ ```bash
+ conda create --name GeoMVD python=3.10
+ conda activate GeoMVD
+
+ # Install PyTorch 2.6.0 built for CUDA 12.4
+ pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
+
+ # Install other requirements (run from the root of the cloned GeoMVD repository)
+ pip install -r requirements.txt
+ ```
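Because the PyTorch packages above are pinned to exact versions, a quick way to confirm that the environment matches is to compare installed versions against those pins. This is a small sketch using only the standard library's `importlib.metadata`. The `PINS` dict mirrors the `pip install` line, and `find_mismatches` is a hypothetical helper, not part of the GeoMVD repository.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above.
PINS = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def find_mismatches(pins: dict[str, str]) -> dict[str, str | None]:
    """Map each package whose installed version differs from its pin to the
    installed version, or to None when the package is not installed."""
    mismatches: dict[str, str | None] = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            mismatches[name] = None
            continue
        # CUDA wheels carry a local tag such as "2.6.0+cu124"; compare only the public part.
        if installed.split("+", 1)[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    print("environment OK" if not problems else f"version problems: {problems}")
```

Stripping the `+cu124` suffix matters because wheels from the PyTorch index report their CUDA build in the version string.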
+
+ ## How to Use
+
+ ### Download the models
+
+ - [To-Hitori/GeoMVD (GeoMVD folder)](https://huggingface.co/To-Hitori/GeoMVD/tree/main/GeoMVD)
+ - [stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2)
+ - [lemonaddie/Geowizard](https://huggingface.co/lemonaddie/Geowizard)
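The three repositories above can also be fetched from Python with `snapshot_download` from the `huggingface_hub` package. This is a sketch under assumptions: the `weights/` folder layout and the `local_dir_for` and `download_all` helpers are hypothetical, and the `GeoMVD/*` filter only mirrors the subfolder that the first link points to. Whatever folders you use must then be entered in `run_geowizard.py` and `run_geomvd.py`.

```python
from __future__ import annotations

from pathlib import Path

# Repositories from the list above. allow_patterns=None downloads the whole repository;
# the filter for To-Hitori/GeoMVD follows the link's tree/main/GeoMVD path.
REPOS = [
    ("To-Hitori/GeoMVD", ["GeoMVD/*"]),
    ("stabilityai/stable-diffusion-2", None),
    ("lemonaddie/Geowizard", None),
]


def local_dir_for(repo_id: str, root: Path) -> Path:
    # Hypothetical layout: one folder per repository, e.g. weights/lemonaddie__Geowizard
    return root / repo_id.replace("/", "__")


def download_all(root: Path = Path("weights")) -> list[Path]:
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id, patterns in REPOS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=target, allow_patterns=patterns)
        paths.append(target)
    return paths
```

Calling `download_all()` pulls the full Stable Diffusion 2 and GeoWizard repositories, so expect several gigabytes of downloads.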
+
+ ### Step 1: Generate geometry images
+
+ ```bash
+ cd GIEM
+ # 1. Open run.py and set the input and output paths.
+ # Expected layout of the input folder:
+ # INPUT_PATH
+ # - image1.png
+ # - image2.png
+ # - ...
+ # OUTPUT_PATH will be created at the location you specify.
+
+ # 2. Open src/run_geowizard.py (GeoMVD/GIEM/src/run_geowizard.py) and set the paths to the downloaded weights.
+
+ python run.py
+ # The generated images are written to OUTPUT_PATH.
+ ```
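Step 1 expects `INPUT_PATH` to be a single flat folder of images. A check along these lines can catch nested folders or a missing path before `run.py` starts. The sketch assumes PNG input only, as in the layout above (whether `run.py` accepts other formats is not stated), and `list_inputs` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def list_inputs(input_path: str | Path) -> list[Path]:
    """Return the PNG files directly inside input_path, sorted by name.

    Raises ValueError if the folder is missing, contains subfolders, or has
    no PNG files, since the expected layout is one flat folder of images.
    """
    root = Path(input_path)
    if not root.is_dir():
        raise ValueError(f"{root} is not a directory")
    subdirs = sorted(p.name for p in root.iterdir() if p.is_dir())
    if subdirs:
        raise ValueError(f"expected a flat folder, found subfolders: {subdirs}")
    images = sorted(p for p in root.iterdir() if p.suffix.lower() == ".png")
    if not images:
        raise ValueError(f"no .png files found in {root}")
    return images
```

For example, `[p.name for p in list_inputs("INPUT_PATH")]` lists the images that Step 1 should pick up; files with other extensions are ignored rather than rejected.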
 
+ ### Step 2: Generate multi-view images
 
+ ```bash
+ cd ../
+ # Open run_geomvd.py and set the parameter pipeline_input to the OUTPUT_PATH from Step 1.
+ # Set ckpt_path to the location where you downloaded the weights.
+ python run_geomvd.py
+ # The results are written to the corresponding location in OUTPUT_PATH.
+ ```
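The card does not describe the layout of the images that `run_geomvd.py` writes. Zero123++ v1.2, which GeoMVD modifies, returns its six views tiled in one image as a grid of 3 rows by 2 columns of 320 × 320 tiles. Assuming GeoMVD keeps that layout (check one of your outputs first), the grid can be cut into individual views with array slicing. `split_views` is a hypothetical helper, and loading the PNG into an array (for example with Pillow) is left out.

```python
from __future__ import annotations

import numpy as np


def split_views(grid: np.ndarray, rows: int = 3, cols: int = 2) -> list[np.ndarray]:
    """Cut an (H, W, C) grid image into rows * cols tiles in row-major order.

    The 3 x 2 default follows Zero123++ v1.2's output layout; whether GeoMVD's
    outputs use the same layout is an assumption.
    """
    height, width = grid.shape[:2]
    if height % rows or width % cols:
        raise ValueError(f"{height}x{width} grid does not divide into {rows}x{cols} tiles")
    tile_h, tile_w = height // rows, width // cols
    return [
        grid[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]


# Synthetic stand-in for a 960 x 640 RGB output: each tile is filled with its index.
grid = np.zeros((960, 640, 3), dtype=np.uint8)
for i in range(6):
    r, c = divmod(i, 2)
    grid[r * 320:(r + 1) * 320, c * 320:(c + 1) * 320] = i

views = split_views(grid)
print(len(views), views[0].shape)  # 6 (320, 320, 3)
```

If your outputs use a different tiling, pass the matching `rows` and `cols`; the function rejects grids that do not divide evenly instead of silently cropping.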
 
+ ## Citation
 
+ Please cite our paper if you use our code, data, models, or results:
+ ```bibtex
+ @article{wu2025geomvd,
+   title={GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction},
+   author={Jiaqi Wu and Yaosen Chen and Shuyuan Zhu},
+   journal={arXiv preprint arXiv:2511.12204},
+   year={2025}
+ }
+ ```
 
+ ## Acknowledgements
 
+ We thank the authors of the following projects for their excellent contributions to 3D generative AI!
 
+ - [Zero123++](https://github.com/SUDO-AI-3D/zero123plus)
+ - [InstantMesh](https://github.com/TencentARC/InstantMesh)
+ - [Era3D](https://github.com/pengHTYX/Era3D)