---
license: mit
pipeline_tag: image-feature-extraction
---
# UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer

UniPR-3D is a universal visual place recognition (VPR) framework that effectively integrates information from multiple views. It supports both frame-to-frame and sequence-to-sequence matching by leveraging 3D and 2D tokens with tailored aggregation strategies.
- **Paper:** [UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer](https://huggingface.co/papers/2512.21078)
- **Repository:** [https://github.com/dtc111111/UniPR-3D](https://github.com/dtc111111/UniPR-3D)
## Description

UniPR-3D builds on a Visual Geometry Grounded Transformer (VGGT) backbone capable of encoding multi-view 3D representations. To construct its descriptor, the model jointly leverages 3D tokens and intermediate 2D tokens, using dedicated aggregation modules to capture fine-grained texture cues while reasoning across viewpoints. To further enhance generalization, it incorporates both single- and multi-frame aggregation schemes along with a variable-length sequence retrieval strategy. It achieves state-of-the-art performance on several benchmarks, including MSLS, Pittsburgh, NordLand, and SPED.
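Whether a descriptor comes from a single frame or an aggregated sequence, place recognition ultimately reduces to nearest-neighbor search in descriptor space. The sketch below illustrates that retrieval step with cosine similarity; the random arrays stand in for descriptors a real pipeline would obtain from the model, and the helper function is illustrative rather than part of the repository's API (see the linked repo for actual inference code).

```python
import numpy as np

def cosine_retrieval(query_desc: np.ndarray, db_descs: np.ndarray, top_k: int = 5):
    """Rank database descriptors by cosine similarity to a query descriptor.

    query_desc: (D,) global descriptor of the query frame or sequence.
    db_descs:   (N, D) stacked descriptors of the reference database.
    Returns the indices and similarity scores of the top-k matches.
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                      # (N,) cosine similarities
    order = np.argsort(-sims)[:top_k]  # indices sorted by descending similarity
    return order, sims[order]

# Stand-in descriptors (a real pipeline would compute these with UniPR-3D,
# one per frame or per sequence):
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 512)).astype(np.float32)       # 1000 reference places
query = db[42] + 0.1 * rng.standard_normal(512).astype(np.float32)
idx, scores = cosine_retrieval(query, db)
print(idx, scores)  # place 42 should rank first
```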
## Citation

If you find our paper and code useful, please cite us:
```bibtex
@article{deng2026_unipr3d,
  title   = {UniPR-3D: Towards Universal Visual Place Recognition with 3D Visual Geometry Grounded Transformer},
  author  = {Tianchen Deng and Xun Chen and Ziming Li and Hongming Shen and Danwei Wang and Javier Civera and Hesheng Wang},
  journal = {arXiv preprint arXiv:2512.21078},
  year    = {2026},
}
```