nielsr (HF Staff) committed e5d8d82 (verified) · 1 parent: bf25806

Add library_name and improve model card metadata

Hi! I'm Niels from the Hugging Face community team.

I'm opening this PR to add `library_name: transformers` to the model metadata. This lets the Hub recognize that the model is compatible with the Transformers library and display the "Use in Transformers" button. I have also added task-specific tags (`geolocation`, `reasoning`, `chain-of-thought`, and `rlhf`) to help users discover your work.

I've also updated the citation section with a BibTeX entry based on the paper information provided.
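To check that a model card's front matter actually declares fields such as `library_name`, here is a minimal sketch. It assumes the flat YAML subset this README uses (scalar `key: value` pairs and `key:` followed by `- item` lists). `parse_front_matter` and `missing_hub_fields` are hypothetical helpers, not part of the Hub or of `huggingface_hub`, and the default `required` keys are an illustrative choice, not a Hub rule. For real cards, a full YAML parser (for example through `huggingface_hub`'s `ModelCard`) is the more robust option.

```python
from typing import Dict, List, Optional, Sequence, Union

# A parsed value is either a scalar string or a list of strings.
Value = Union[str, List[str]]


def parse_front_matter(text: str) -> Dict[str, Value]:
    """Parse a flat '---'-delimited front-matter block (hypothetical helper).

    Supports only `key: value` scalars and `key:` followed by `- item`
    lines, which is the subset this model card uses.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---'")
    meta: Dict[str, Value] = {}
    current_list: Optional[List[str]] = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter reached
        if not stripped:
            continue
        if stripped.startswith("- "):
            if current_list is None:
                raise ValueError(f"list item without a key: {line!r}")
            current_list.append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"unrecognized line: {line!r}")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_list = None
        else:
            # `key:` with no value opens a list for the following `- item` lines.
            current_list = []
            meta[key] = current_list
    raise ValueError("front matter is missing its closing '---'")


def missing_hub_fields(
    meta: Dict[str, Value],
    required: Sequence[str] = ("library_name", "pipeline_tag", "license"),
) -> List[str]:
    """Return the required keys that are absent or empty, in order."""
    return [key for key in required if not meta.get(key)]


# Front matter as it reads after this PR.
CARD = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- ghost233lism/GeoSeek
language:
- en
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- image
- geolocation
- reasoning
- chain-of-thought
- rlhf
---
"""

if __name__ == "__main__":
    meta = parse_front_matter(CARD)
    print(meta["library_name"])      # transformers
    print(missing_hub_fields(meta))  # []
```

Running the same check on the pre-PR front matter would report `library_name` as missing, which is the gap this PR closes.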

Files changed (1)
  1. README.md +19 -23
README.md CHANGED
@@ -1,14 +1,19 @@
  ---
  base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
  language:
  - en
  license: cc-by-nc-4.0
  pipeline_tag: image-text-to-text
  tags:
  - image
- datasets:
- - ghost233lism/GeoSeek
  ---

  <div align="center">
@@ -35,11 +40,7 @@ datasets:

  </div>

- <!-- ![teaser](assets/teaser.png) -->
-
-
-
- **GeoAgent** is a vision-language model for **image geolocation** that reasons closely with humans and derives fine-grained address conclusions. Built upon [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), it achieves strong performance across multiple geographic grains (city, region, country, continent) while generating interpretable chain-of-thought reasoning.

  GeoAgent introduces:

@@ -52,17 +53,6 @@ We also introduce [**GeoSeek**](https://huggingface.co/datasets/ghost233lism/Geo
  - **GeoSeek-Loc** (20k): Images for RL-based finetuning, sampled via a stratified strategy considering population, land area, and highway mileage to reduce geographic bias.
  - **GeoSeek-Val** (3k): Validation benchmark with locatability scores and scene categories (manmade structures, natural landscapes, etc.) for evaluation.

-
-
- <!-- <div align="center">
- <img src="assets/depthanything-AC-video.gif" alt="video" width="100%">
- </div> -->
-
-
- <!-- ## Model Architecture -->
-
- <!-- ![architecture](assets/pipeline.png) -->
-
  ## Installation

  ### Requirements
@@ -102,7 +92,7 @@ huggingface-cli download --resume-download ghost233lism/GeoAgent --local-dir gho

  ### Quick Inference

- We provide the quick inference scripts for single/batch image input in `infer/`. Please refer to [infer/README](https://github.com/HVision-NKU/GeoAgent/infer/README.md) for detailed information.

  ### Training

@@ -115,7 +105,14 @@ bash tools/train_grpo.sh

  ## Citation

- Coming soon...


  ## License
@@ -132,7 +129,6 @@ For commercial licensing, please contact andrewhoux[AT]gmail.com.

  ## Acknowledgments

- We sincerely thank [Yue Zhang](https://tuxun.fun/), [H.M.](https://space.bilibili.com/1655209518?spm_id_from=333.337.0.0), [Haowen He](https://space.bilibili.com/111714204?spm_id_from=333.337.0.0), [Yuke Jun](https://space.bilibili.com/93569847?spm_id_from=333.337.0.0), and other experts in geography, as well as outstanding geolocation game players, for their valuable guidance, prompt design suggestions, and data support throughout the construction of the GeoSeek dataset.

- We also thank [Zhixiang Wang](https://tuxun.fun/), [Chilin Chen](https://tuxun.fun/), [Jincheng Shi](https://tuxun.fun/), [Liupeng Zhang](https://tuxun.fun/), [Yuan Gu](https://tuxun.fun/), [Yanghang Shao](https://tuxun.fun/), [Jinhua Zhang](https://tuxun.fun/), [Jiachen Zhu](https://tuxun.fun/), [Gucheng Qiuyue](https://tuxun.fun/), [Qingyang Guo](https://tuxun.fun/), [Jingchen Yang](https://tuxun.fun/), [Weilong Kong](https://tuxun.fun/), [Xinyuan Li](https://tuxun.fun/), and [Mr. Xu](https://tuxun.fun/) (an anonymous volunteer)
- for their outstanding contributions in providing high-quality reasoning process data.
 
  ---
  base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
+ datasets:
+ - ghost233lism/GeoSeek
  language:
  - en
  license: cc-by-nc-4.0
  pipeline_tag: image-text-to-text
+ library_name: transformers
  tags:
  - image
+ - geolocation
+ - reasoning
+ - chain-of-thought
+ - rlhf
  ---

  <div align="center">
 

  </div>

+ **GeoAgent** is a vision-language model for **image geolocation** that reasons closely with humans and derives fine-grained address conclusions. Built upon [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct), it achieves strong performance across multiple geographic grains (city, region, country, continent) while generating interpretable chain-of-thought reasoning.

  GeoAgent introduces:

 
  - **GeoSeek-Loc** (20k): Images for RL-based finetuning, sampled via a stratified strategy considering population, land area, and highway mileage to reduce geographic bias.
  - **GeoSeek-Val** (3k): Validation benchmark with locatability scores and scene categories (manmade structures, natural landscapes, etc.) for evaluation.

  ## Installation

  ### Requirements
 

  ### Quick Inference

+ We provide the quick inference scripts for single/batch image input in `infer/`. Please refer to [infer/README](https://github.com/HVision-NKU/GeoAgent/blob/main/infer/README.md) for detailed information.

  ### Training

 

  ## Citation

+ ```bibtex
+ @article{jin2025geoagent,
+ title={GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics},
+ author={Jin, Modi and Zhang, Yiming and Sun, Boyuan and Zhang, Dingwen and Cheng, Ming-Ming and Hou, Qibin},
+ journal={arXiv preprint arXiv:2602.12617},
+ year={2025}
+ }
+ ```


  ## License
 

  ## Acknowledgments

+ We sincerely thank [Yue Zhang](https://tuxun.fun/), [H.M.](https://space.bilibili.com/1655209518), [Haowen He](https://space.bilibili.com/111714204), [Yuke Jun](https://space.bilibili.com/93569847), and other experts in geography, as well as outstanding geolocation game players, for their valuable guidance, prompt design suggestions, and data support throughout the construction of the GeoSeek dataset.

+ We also thank [Zhixiang Wang](https://tuxun.fun/), [Chilin Chen](https://tuxun.fun/), [Jincheng Shi](https://tuxun.fun/), [Liupeng Zhang](https://tuxun.fun/), [Yuan Gu](https://tuxun.fun/), [Yanghang Shao](https://tuxun.fun/), [Jinhua Zhang](https://tuxun.fun/), [Jiachen Zhu](https://tuxun.fun/), [Gucheng Qiuyue](https://tuxun.fun/), [Qingyang Guo](https://tuxun.fun/), [Jingchen Yang](https://tuxun.fun/), [Weilong Kong](https://tuxun.fun/), [Xinyuan Li](https://tuxun.fun/), and [Mr. Xu](https://tuxun.fun/) (an anonymous volunteer) for their outstanding contributions in providing high-quality reasoning process data.
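Apart from joining the second paragraph onto a single line, the Acknowledgments change only drops the `spm_id_from=333.337.0.0` query strings from the bilibili profile links. The PR does not say how that edit was made; the following is a minimal standard-library sketch of the same cleanup, where `strip_query_param` is a hypothetical helper introduced for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_param(url: str, param: str = "spm_id_from") -> str:
    """Return `url` with every occurrence of query parameter `param` removed.

    Other query parameters are kept in their original order (re-encoded
    by urlencode); the scheme, host, path, and fragment are unchanged.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    old = "https://space.bilibili.com/1655209518?spm_id_from=333.337.0.0"
    print(strip_query_param(old))  # https://space.bilibili.com/1655209518
```

Working on the parsed query rather than a regex keeps any other parameters intact and leaves URLs without the parameter (such as the `https://tuxun.fun/` links) unchanged.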