To-Hitori/GeoMVD

@@ -1,18 +1,93 @@
 ---
-license: apache-2.0
-datasets:
-- allenai/objaverse
 base_model:
 - sudo-ai/zero123plus-v1.2
-pipeline_tag: image-to-image
 ---
-- base_model: GeoMVD is a modified version of the zero123++ weight files.
-- LoRA_ckpt: The LoRA weights that we trained.

 ---
 base_model:
 - sudo-ai/zero123plus-v1.2
+datasets:
+- allenai/objaverse
+license: apache-2.0
+pipeline_tag: image-to-3d
+library_name: diffusers
 ---
+# GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction
+[![arXiv](https://img.shields.io/badge/arXiv-2511.12204-b31b1b.svg)](https://huggingface.co/papers/2511.12204) [![Project Page](https://img.shields.io/badge/Project_Page-GeoMVD-blue)](https://sobeymil.github.io/GeoMVD.com/) [![GitHub](https://img.shields.io/badge/GitHub-Code-181717?logo=github)](https://github.com/SobeyMIL/GeoMVD)
+This repository presents **GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction**, an approach to multi-view image generation aimed at 3D reconstruction, virtual reality, and augmented reality applications. GeoMVD builds on modified Zero123++ weights and adds mechanisms for extracting multi-view geometric information, so that the generated images are high-quality and consistent across views.
+![top](https://github.com/SobeyMIL/GeoMVD/raw/main/assert/top.png)
+## Abstract
+Multi-view image generation holds significant application value in computer vision, particularly in domains like 3D reconstruction, virtual reality, and augmented reality. Most existing methods, which rely on extending single images, face notable computational challenges in maintaining cross-view consistency and generating high-resolution outputs. To address these issues, we propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information and adjusting the intensity of geometric features to generate images that are both consistent across views and rich in detail. Specifically, we design a multi-view geometry information extraction module that leverages depth maps, normal maps, and foreground segmentation masks to construct a shared geometric structure, ensuring shape and structural consistency across different views. To enhance consistency and detail restoration during generation, we develop a decoupled geometry-enhanced attention mechanism that strengthens feature focus on key geometric details, thereby improving overall image quality and detail preservation. Furthermore, we apply an adaptive learning strategy that fine-tunes the model to better capture spatial relationships and visual coherence between the generated views, ensuring realistic results. Our model also incorporates an iterative refinement process that progressively improves the output quality through multiple stages of image generation. Finally, a dynamic geometry information intensity adjustment mechanism is proposed to adaptively regulate the influence of geometric data, optimizing overall quality while ensuring the naturalness of generated images.
+## Model Details
+*   **Base Model**: The GeoMVD base model is a modified version of the [sudo-ai/zero123plus-v1.2](https://huggingface.co/sudo-ai/zero123plus-v1.2) weights.
+*   **LoRA Checkpoint**: The LoRA weights that we trained are included in this repository.
+## Dependencies and Installation
+```bash
+conda create --name GeoMVD python=3.10
+conda activate GeoMVD
+# Install PyTorch
+pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
+# Install other requirements
+pip install -r requirements.txt
+```
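After installation, it can help to confirm that the pinned versions are the ones actually visible in the active environment. The following is a minimal sketch and not part of the GeoMVD repository: `PINNED` and `find_mismatches` are illustrative names, and the pins are copied from the `pip install` command above.

```python
from importlib import metadata

# Pins copied from the pip command above; adjust them if you install other versions.
PINNED = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: installed_version_or_None} for packages that are missing or off-pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = None  # package is not installed at all
            continue
        # CUDA wheels report versions such as "2.6.0+cu124"; compare only the public part.
        if installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    if problems:
        for name, found in problems.items():
            print(f"{name}: expected {PINNED[name]}, found {found or 'nothing installed'}")
    else:
        print("All pinned packages match.")
```

The `get_version` parameter exists only so the check can be exercised without the real packages installed.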
+## How to Use
+### Download the models
+- [To-Hitori/GeoMVD](https://huggingface.co/To-Hitori/GeoMVD/tree/main/GeoMVD)
+- [stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2)
+- [lemonaddie/Geowizard](https://huggingface.co/lemonaddie/Geowizard)
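The three repositories above can also be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the `weights` target directory and the helper names `build_download_plan` and `download_all` are hypothetical choices, not something the GeoMVD code expects. The `GeoMVD/*` pattern restricts the first download to the `GeoMVD` subfolder that the link above points to.

```python
from pathlib import Path

# (repo_id, allow_patterns) pairs taken from the links above.
# allow_patterns=None means "download the whole repository".
REPOS = [
    ("To-Hitori/GeoMVD", ["GeoMVD/*"]),
    ("stabilityai/stable-diffusion-2", None),
    ("lemonaddie/Geowizard", None),
]


def build_download_plan(root):
    """Return one dict of snapshot_download keyword arguments per repository."""
    plan = []
    for repo_id, patterns in REPOS:
        plan.append(
            {
                "repo_id": repo_id,
                # Hypothetical layout: <root>/<repository name>
                "local_dir": str(Path(root) / repo_id.split("/")[-1]),
                "allow_patterns": patterns,
            }
        )
    return plan


def download_all(root="weights"):
    """Download every repository in the plan and return the local paths."""
    # Imported lazily so the plan can be inspected without the package installed.
    from huggingface_hub import snapshot_download

    return [snapshot_download(**kwargs) for kwargs in build_download_plan(root)]
```

Calling `download_all("weights")` downloads several gigabytes, so it may be worth printing `build_download_plan("weights")` first to confirm the target folders.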
+### Step 1: Generate Geometry Images
+```bash
+cd GIEM
+# 1. Open run.py and set the input and output paths.
+#    Expected layout of the input path:
+#    INPUT_PATH
+#      - image1.png
+#      - image2.png
+#      - ...
+#    OUTPUT_PATH is created at the location you specify.
+# 2. Open GeoMVD/GIEM/src/run_geowizard.py and set the paths to the relevant weights.
+python run.py
+# The output images are written to OUTPUT_PATH.
+```
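Before running step 1, the input folder can be checked against the flat layout described in the comments above. This is a minimal sketch under stated assumptions: `list_input_images` is a hypothetical helper, and the set of accepted file extensions is a guess, since the source only shows `.png` examples.

```python
from pathlib import Path

# Assumption: run.py reads common image formats; only .png appears in the instructions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_input_images(input_path):
    """Return the image files directly inside input_path, sorted by name."""
    root = Path(input_path)
    if not root.is_dir():
        raise NotADirectoryError(f"INPUT_PATH is not a directory: {root}")
    # Only top-level files count; subdirectories and non-image files are ignored.
    images = sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise FileNotFoundError(f"No images found directly inside {root}")
    return images
```

For example, `list_input_images("path/to/INPUT_PATH")` either returns the files that step 1 would be expected to process or fails early with a readable error.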
+### Step 2: Generate Multi-View Images
+```bash
+cd ../
+# Open run_geomvd.py and set the pipeline_input parameter to the OUTPUT_PATH from step 1.
+# Set ckpt_path to the location where you downloaded the weights.
+python run_geomvd.py
+# The results are written to the corresponding location in OUTPUT_PATH.
+```
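Once the paths inside `run.py`, `run_geowizard.py`, and `run_geomvd.py` have been edited, the two steps can be chained from a single script. The sketch below only mirrors the shell commands shown above (`cd GIEM`, `python run.py`, `cd ../`, `python run_geomvd.py`); `build_steps` and `run_pipeline` are hypothetical names, and the scripts are assumed to take no command-line arguments, as in the instructions.

```python
import subprocess
import sys
from pathlib import Path


def build_steps(repo_root):
    """Describe the two commands from the instructions and where each one runs."""
    root = Path(repo_root)
    return [
        # Step 1: geometry extraction, run from inside the GIEM folder.
        {"cmd": [sys.executable, "run.py"], "cwd": root / "GIEM"},
        # Step 2: multi-view generation, run from the repository root.
        {"cmd": [sys.executable, "run_geomvd.py"], "cwd": root},
    ]


def run_pipeline(repo_root="."):
    """Run both steps in order and stop at the first failure."""
    for step in build_steps(repo_root):
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run(step["cmd"], cwd=step["cwd"], check=True)
```

Using `sys.executable` keeps both steps in the same interpreter, which matters when the `GeoMVD` conda environment is active.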
+## Citation
+If you use our code, data, models, or results, please cite our paper:
+```bibtex
+@article{wu2025geomvd,
+  title={GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction},
+  author={Jiaqi Wu and Yaosen Chen and Shuyuan Zhu},
+  journal={arXiv preprint arXiv:2511.12204},
+  year={2025}
+}
+```
+## Acknowledgements
+We thank the authors of the following projects for their excellent contributions to 3D generative AI!
+- [Zero123++](https://github.com/SUDO-AI-3D/zero123plus)
+- [InstantMesh](https://github.com/TencentARC/InstantMesh)
+- [Era3D](https://github.com/pengHTYX/Era3D)