---
license: other
license_name: snap-non-commercial-license
license_link: LICENSE
datasets:
- allenai/objaverse
language:
- en
pipeline_tag: image-to-3d
---
## Model Details

GTR is a large 3D reconstruction model that takes multi-view images as input and generates high-quality meshes with faithful texture reconstruction within seconds.

## Model Description

- **Developed by:** [Snap Research](https://github.com/snap-research)
- **License:** [snap-non-commercial-license](https://huggingface.co/snap-research/gtr/blob/main/LICENSE)

## Model Sources

- **Repository:** [https://github.com/snap-research/snap_gtr/tree/main](https://github.com/snap-research/snap_gtr/tree/main)
- **Paper:** [https://arxiv.org/abs/2406.05649](https://arxiv.org/abs/2406.05649)

## How to Get Started with the Model

Use the code below to get started with the model.

### Installation

We recommend using `Python>=3.10`, `PyTorch==2.7.0`, and `CUDA>=12.4`.

```bash
conda create --name gtr python=3.10
conda activate gtr
pip install -U pip

pip install torch==2.7.0 torchvision==0.22.0 torchmetrics==1.2.1 --index-url https://download.pytorch.org/whl/cu124
pip install -U xformers --index-url https://download.pytorch.org/whl/cu124

pip install -r requirements.txt
```

### How to Use

Please download the model checkpoint from [here](https://drive.google.com/file/d/1ITVqdDLmY5EISj4vsZ2O4sN5mZv9fUfB/view?usp=sharing) and put it under the `ckpts/` directory.

We provide multi-view grid data examples under `./examples/`, generated with [Zero123++](https://github.com/SUDO-AI-3D/zero123plus). Our inference script loads the pretrained checkpoint, runs fast texture refinement, reconstructs the textured mesh from the multi-view grid data, and exports it. The output folder will contain three files: the exported mesh in `.obj` format, a rotating GIF of the mesh, and a rotating GIF of the NeRF rendering.
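The exported `.obj` is a plain-text Wavefront file, so it can be inspected without any mesh library; a minimal sketch (ours, not from the repo) that counts vertices and faces:

```python
def count_obj(path: str) -> tuple[int, int]:
    """Count vertex ('v') and face ('f') records in a Wavefront .obj file."""
    n_vertices = n_faces = 0
    with open(path) as fh:
        for line in fh:
            if line.startswith("v "):
                n_vertices += 1
            elif line.startswith("f "):
                n_faces += 1
    return n_vertices, n_faces
```

For anything beyond a quick sanity check, a library such as `trimesh` is more convenient.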

To run inference on multi-view data from other sources, simply adjust the camera parameters [here](https://github.sc-corp.net/Snapchat/GTR/blob/main/scripts/prepare_mv.py#L153-L157) to match your multi-view data.
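For orientation, Zero123++-style grids contain six fixed views; the values below are our assumption about that pose layout (evenly spaced azimuths with alternating elevations, as in Zero123++ v1.2) and should be verified against `scripts/prepare_mv.py` rather than taken as ground truth:

```python
# Hypothetical multi-view pose layout (Zero123++-style; verify against the repo
# before using these numbers with your own data).
azimuths = [30.0, 90.0, 150.0, 210.0, 270.0, 330.0]   # degrees, 60 deg apart
elevations = [20.0, -10.0, 20.0, -10.0, 20.0, -10.0]  # degrees, alternating

poses = list(zip(azimuths, elevations))  # one (azimuth, elevation) per view
```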

```bash
# Preprocessing
python3 scripts/prepare_mv.py --in_dir ./examples/cute_horse.png --out_dir ./examples/cute_horse

# Inference
python3 scripts/inference.py --ckpt_path ckpts/full_checkpoint.pth --in_dir ./examples/cute_horse --out_dir ./outputs/cute_horse
```
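To process several inputs, the two commands above can be driven in a loop; a sketch (our suggestion, assuming the example layout shown above):

```python
# Hypothetical batch driver (our sketch, not part of the GTR repo).
# Assumes the repo checkout with scripts/ and ckpts/full_checkpoint.pth.
import subprocess
from pathlib import Path

CKPT = "ckpts/full_checkpoint.pth"

for img in sorted(Path("examples").glob("*.png")):
    mv_dir = Path("examples") / img.stem   # multi-view grid from preprocessing
    out_dir = Path("outputs") / img.stem   # mesh + GIF output
    subprocess.run(["python3", "scripts/prepare_mv.py",
                    "--in_dir", str(img), "--out_dir", str(mv_dir)], check=True)
    subprocess.run(["python3", "scripts/inference.py", "--ckpt_path", CKPT,
                    "--in_dir", str(mv_dir), "--out_dir", str(out_dir)], check=True)
```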

## Citation

**BibTeX:**

```bibtex
@article{zhuang2024gtr,
  title={GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement},
  author={Zhuang, Peiye and Han, Songfang and Wang, Chaoyang and Siarohin, Aliaksandr and Zou, Jiaxu and Vasilkovsky, Michael and Shakhrai, Vladislav and Korolev, Sergey and Tulyakov, Sergey and Lee, Hsin-Ying},
  journal={arXiv preprint arXiv:2406.05649},
  year={2024}
}
```