Add library_name and improve model card metadata

Hi! I'm Niels from the Hugging Face community team.

I'm opening this PR to add `library_name: transformers` to the model metadata. This tells the Hub which library loads the model, so it can show the "Use in Transformers" button. I have also added the tags `geolocation`, `reasoning`, `chain-of-thought`, and `rlhf` to help users discover your work.

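I've also replaced the "Coming soon..." placeholder in the citation section with a BibTeX entry based on the paper information provided, fixed the `infer/README` link so that it points at `blob/main`, and removed the commented-out HTML and the tracking parameters in the acknowledgment links.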
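
For reference, the resulting metadata can be sanity-checked locally. The snippet below is a minimal sketch and not part of this PR: `parse_front_matter` is a hypothetical helper that only handles the flat scalars and lists used in this card (a real YAML parser would be more robust), and `CARD` reproduces the front matter as it stands after the change.

```python
# Hypothetical helper, for illustration only: parses the flat YAML front
# matter of a model card (top-level scalars and simple "- item" lists).
def parse_front_matter(card_text: str) -> dict:
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the card
    meta = {}
    current_key = None  # key of the list currently being filled, if any
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter of the front matter
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value      # scalar entry, e.g. "license: ..."
                current_key = None
            else:
                meta[key] = []         # list entry, items follow as "- ..."
                current_key = key
    return meta


# Front matter of README.md after this PR.
CARD = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- ghost233lism/GeoSeek
language:
- en
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- image
- geolocation
- reasoning
- chain-of-thought
- rlhf
---
"""

meta = parse_front_matter(CARD)
print(meta["library_name"])  # transformers
print(meta["tags"])
```

The same check works on the downloaded `README.md` by passing its contents to `parse_front_matter` instead of `CARD`.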

Files changed (1)

README.md +19 -23

@@ -1,14 +1,19 @@
 ---
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
 language:
 - en
 license: cc-by-nc-4.0
 pipeline_tag: image-text-to-text
 tags:
 - image
-datasets:
-- ghost233lism/GeoSeek
 ---
 <div align="center">
@@ -35,11 +40,7 @@ datasets:
 </div>
-<!-- ![teaser](assets/teaser.png) -->
-**GeoAgent** is a vision-language model for **image geolocation** that reasons closely with humans and derives fine-grained address conclusions. Built upon [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), it achieves strong performance across multiple geographic grains (city, region, country, continent) while generating interpretable chain-of-thought reasoning.
 GeoAgent introduces:
@@ -52,17 +53,6 @@ We also introduce [**GeoSeek**](https://huggingface.co/datasets/ghost233lism/Geo
 - **GeoSeek-Loc** (20k): Images for RL-based finetuning, sampled via a stratified strategy considering population, land area, and highway mileage to reduce geographic bias.
 - **GeoSeek-Val** (3k): Validation benchmark with locatability scores and scene categories (manmade structures, natural landscapes, etc.) for evaluation.
-<!-- <div align="center">
-<img src="assets/depthanything-AC-video.gif" alt="video" width="100%">
-</div> -->
-<!-- ## Model Architecture -->
-<!-- ![architecture](assets/pipeline.png) -->
 ## Installation
 ### Requirements
@@ -102,7 +92,7 @@ huggingface-cli download --resume-download ghost233lism/GeoAgent --local-dir gho
 ### Quick Inference
-We provide the quick inference scripts for single/batch image input in `infer/`.  Please refer to [infer/README](https://github.com/HVision-NKU/GeoAgent/infer/README.md) for detailed information.
 ### Training
@@ -115,7 +105,14 @@ bash tools/train_grpo.sh
 ## Citation
-Coming soon...
 ## License
@@ -132,7 +129,6 @@ For commercial licensing, please contact andrewhoux[AT]gmail.com.
 ## Acknowledgments
-We sincerely thank [Yue Zhang](https://tuxun.fun/), [H.M.](https://space.bilibili.com/1655209518?spm_id_from=333.337.0.0), [Haowen He](https://space.bilibili.com/111714204?spm_id_from=333.337.0.0), [Yuke Jun](https://space.bilibili.com/93569847?spm_id_from=333.337.0.0), and other experts in geography, as well as outstanding geolocation game players, for their valuable guidance, prompt design suggestions, and data support throughout the construction of the GeoSeek dataset.
-We also thank [Zhixiang Wang](https://tuxun.fun/), [Chilin Chen](https://tuxun.fun/), [Jincheng Shi](https://tuxun.fun/), [Liupeng Zhang](https://tuxun.fun/), [Yuan Gu](https://tuxun.fun/), [Yanghang Shao](https://tuxun.fun/), [Jinhua Zhang](https://tuxun.fun/), [Jiachen Zhu](https://tuxun.fun/), [Gucheng Qiuyue](https://tuxun.fun/), [Qingyang Guo](https://tuxun.fun/), [Jingchen Yang](https://tuxun.fun/), [Weilong Kong](https://tuxun.fun/), [Xinyuan Li](https://tuxun.fun/), and [Mr. Xu](https://tuxun.fun/) (an anonymous volunteer)
-for their outstanding contributions in providing high-quality reasoning process data.

 ---
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+datasets:
+- ghost233lism/GeoSeek
 language:
 - en
 license: cc-by-nc-4.0
 pipeline_tag: image-text-to-text
+library_name: transformers
 tags:
 - image
+- geolocation
+- reasoning
+- chain-of-thought
+- rlhf
 ---
 <div align="center">
 </div>
+**GeoAgent** is a vision-language model for **image geolocation** that reasons closely with humans and derives fine-grained address conclusions. Built upon [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), it achieves strong performance across multiple geographic grains (city, region, country, continent) while generating interpretable chain-of-thought reasoning.
 GeoAgent introduces:
 - **GeoSeek-Loc** (20k): Images for RL-based finetuning, sampled via a stratified strategy considering population, land area, and highway mileage to reduce geographic bias.
 - **GeoSeek-Val** (3k): Validation benchmark with locatability scores and scene categories (manmade structures, natural landscapes, etc.) for evaluation.
 ## Installation
 ### Requirements
 ### Quick Inference
+We provide the quick inference scripts for single/batch image input in `infer/`. Please refer to [infer/README](https://github.com/HVision-NKU/GeoAgent/blob/main/infer/README.md) for detailed information.
 ### Training
 ## Citation
+```bibtex
+@article{jin2025geoagent,
+  title={GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics},
+  author={Jin, Modi and Zhang, Yiming and Sun, Boyuan and Zhang, Dingwen and Cheng, Ming-Ming and Hou, Qibin},
+  journal={arXiv preprint arXiv:2602.12617},
+  year={2025}
+}
+```
 ## License
 ## Acknowledgments
+We sincerely thank [Yue Zhang](https://tuxun.fun/), [H.M.](https://space.bilibili.com/1655209518), [Haowen He](https://space.bilibili.com/111714204), [Yuke Jun](https://space.bilibili.com/93569847), and other experts in geography, as well as outstanding geolocation game players, for their valuable guidance, prompt design suggestions, and data support throughout the construction of the GeoSeek dataset.
+We also thank [Zhixiang Wang](https://tuxun.fun/), [Chilin Chen](https://tuxun.fun/), [Jincheng Shi](https://tuxun.fun/), [Liupeng Zhang](https://tuxun.fun/), [Yuan Gu](https://tuxun.fun/), [Yanghang Shao](https://tuxun.fun/), [Jinhua Zhang](https://tuxun.fun/), [Jiachen Zhu](https://tuxun.fun/), [Gucheng Qiuyue](https://tuxun.fun/), [Qingyang Guo](https://tuxun.fun/), [Jingchen Yang](https://tuxun.fun/), [Weilong Kong](https://tuxun.fun/), [Xinyuan Li](https://tuxun.fun/), and [Mr. Xu](https://tuxun.fun/) (an anonymous volunteer) for their outstanding contributions in providing high-quality reasoning process data.