CroPond-7B / README.md
WangYipu2002's picture
Upload CroPond-7B
dbf5edd verified
metadata
language:
  - en
license: mit
tags:
  - cross-view
base_model: Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-to-text

CroPond-7B

arXiv GitHub HuggingFace

CroPond-7B is a vision-language model specialized in cross-view point correspondence. Built upon Qwen2.5-VL-7B-Instruct and mainly trained on the CrossPoint-378K dataset, CroPond achieves state-of-the-art performance on cross-view correspondence tasks.

Evaluation

For detailed evaluation instructions, please visit the GitHub repository.

Citation

@article{wang2025crosspoint,
  title={Towards Cross-View Point Correspondence in Vision-Language Models},
  author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
  journal={arXiv preprint arXiv:2512.04686},
  year={2025}
}