Improve model card for GeoReasoner: Add pipeline tag, update license, link paper and code
#1, opened by nielsr (HF Staff)

README.md (changed):
---
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
---

# GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

GeoReasoner is a large vision-language model (LVLM) for geo-localization in street views, enhanced with human inference knowledge. It addresses data scarcity and quality issues by curating a new dataset of highly locatable street views and by integrating external knowledge from geo-localization games. Fine-tuned in two stages, reasoning tuning followed by location tuning, GeoReasoner outperforms existing LVLMs and StreetCLIP by more than 25% on country-level and more than 38% on city-level geo-localization tasks.

This model was presented in the paper [GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model](https://huggingface.co/papers/2406.18572).

The official code and further details can be found in the [GitHub repository](https://github.com/lingli1996/GeoReasoner).

<div align="center">



</div>

## Release

- Data
  - For Stage 1 (Reasoning Tuning Phase), we have released the SFT data at [GeoReasoner_SFT](https://huggingface.co/datasets/ling1996/GeoReasoner_SFT).
  - For Stage 2 (Location Tuning Phase), we cannot provide the corresponding data directly due to copyright issues with Google Street View images. However, you can retrieve the relevant data using the official API provided by [Google Street View](https://www.google.com/streetview).

- Code
  - `loc_clip`: the codebase for computing the locatability of street view images.
  - `GeoReasoner`: the training and inference scripts for GeoReasoner models.

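The Stage 2 data described above has to be fetched by the user. As one possible approach, street view crops can be requested over HTTP from the Google Street View Static API. Below is a minimal sketch of building such a request URL; the endpoint and parameter names follow the public Static API, while `YOUR_API_KEY`, the coordinates, and the image size are placeholders, not values from this project.

```python
from urllib.parse import urlencode

# Sketch: build a Google Street View Static API request URL.
# YOUR_API_KEY and the coordinates below are placeholders.
STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

def streetview_url(lat: float, lng: float, heading: int = 0,
                   size: str = "640x640", key: str = "YOUR_API_KEY") -> str:
    """Return the request URL for one street view crop at (lat, lng)."""
    params = {
        "size": size,               # image dimensions in pixels
        "location": f"{lat},{lng}", # latitude,longitude
        "heading": heading,         # compass direction of the camera, degrees
        "fov": 90,                  # horizontal field of view
        "key": key,
    }
    return f"{STREETVIEW_ENDPOINT}?{urlencode(params)}"

# Four headings give full panoramic coverage of one location.
urls = [streetview_url(48.8584, 2.2945, heading=h) for h in (0, 90, 180, 270)]
print(urls[0])
```

Each URL returns one image; downloading it is a plain HTTP GET with your own API key.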
## Usage and License Notices

This project uses datasets and checkpoints that are subject to their respective original licenses. In particular, the data collected from [GeoGuessr](https://www.geoguessr.com) and [Tuxun](https://tuxun.fun) cannot be used for commercial purposes.

## Description

- For computing the locatability of street view images
  - Follow the [MaskFormer instructions](https://github.com/facebookresearch/MaskFormer/blob/main/GETTING_STARTED.md) to make sure that the "Inference Demo with Pre-trained Models" works correctly.
  - Obtain the percentage of each category from the segmentation results.
  - Calculate the locatability value by referring to the example in the script `loc_clip/locatability_comput.py`.
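The idea behind the steps above can be sketched as a weighted sum over segmentation category fractions. This is an illustration only: the category weights below are hypothetical placeholders, not the values used in `loc_clip/locatability_comput.py`.

```python
# Hypothetical sketch of a locatability score: a weighted sum of the
# pixel fraction each semantic category occupies in the segmentation.
# The weights below are illustrative placeholders, NOT the official ones.
CATEGORY_WEIGHTS = {
    "building": 0.9,  # signage and architecture are strong location cues
    "road": 0.6,
    "car": 0.4,
    "tree": 0.2,
    "sky": 0.1,       # sky is rarely informative for localization
}

def locatability(category_fractions: dict[str, float]) -> float:
    """Weighted sum of category pixel fractions, clipped to [0, 1]."""
    score = sum(
        CATEGORY_WEIGHTS.get(cat, 0.0) * frac
        for cat, frac in category_fractions.items()
    )
    return min(max(score, 0.0), 1.0)

# Fractions come from the MaskFormer segmentation of one street view image.
fractions = {"building": 0.5, "road": 0.2, "sky": 0.3}
print(round(locatability(fractions), 3))
```

Images whose score exceeds a chosen threshold would be kept as "highly locatable" training data.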

- For the inference of GeoReasoner models
  - The pre-trained LVLM weights are available at [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat).
  - Our LoRA weights are available at [ling1996/GeoReasoner_Models](https://huggingface.co/ling1996/GeoReasoner_Models).
  - Inference steps
```shell
cd GeoReasoner
git clone https://github.com/QwenLM/Qwen-VL.git
cd Qwen-VL
pip install -r requirements.txt
mkdir Qwen-VL-Models
mkdir LoRA
```
  - Then download the pre-trained LVLM weights into the `Qwen-VL-Models` folder and the LoRA weights into the `LoRA` folder.
```shell
python infer.py # with the test image
# Due to the inherent randomness in LVLM generation, the generated reasons may not always be consistent.
```
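At inference time, Qwen-VL-Chat interleaves images and text in a single prompt string. As a rough illustration of that layout (this re-implements, as an assumption, the `Picture N: <img>...</img>` format that the Qwen-VL tokenizer's `from_list_format` helper produces; in practice you would call the tokenizer helper itself, and `build_prompt` is a hypothetical name):

```python
# Illustrative re-implementation of Qwen-VL's interleaved prompt layout.
# Assumption: images appear as "Picture N: <img>path</img>" lines followed
# by the text query, approximating tokenizer.from_list_format in Qwen-VL-Chat.
def build_prompt(items: list[dict]) -> str:
    parts, n_images = [], 0
    for item in items:
        if "image" in item:
            n_images += 1
            parts.append(f"Picture {n_images}: <img>{item['image']}</img>\n")
        elif "text" in item:
            parts.append(item["text"])
    return "".join(parts)

prompt = build_prompt([
    {"image": "street_view.jpg"},
    {"text": "From which country and city is this street view? Reason step by step."},
])
print(prompt)
```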
- Training steps (Reasoning Tuning Phase)
```shell
cd GeoReasoner
git clone https://github.com/QwenLM/Qwen-VL.git
cd Qwen-VL
pip install -r requirements.txt
mkdir Qwen-VL-Models
mkdir LoRA
mkdir Dataset
```
  - Then download the pre-trained LVLM weights into the `Qwen-VL-Models` folder and the SFT data into the `Dataset` folder.
```shell
mv finetune_lora_reason.sh Qwen-VL/finetune
cd Qwen-VL
sh finetune/finetune_lora_reason.sh
```

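The reasoning-tuning script above consumes the SFT data as a JSON list of conversations. A hedged sketch of what one record might look like: the field names (`id`, `conversations`, `from`, `value`) follow the fine-tuning data format documented in the Qwen-VL repository, while the image path, question, and answer are made-up placeholders rather than actual GeoReasoner_SFT content.

```python
import json

# Sketch of one SFT record in Qwen-VL's fine-tuning JSON format.
# Field names follow the format documented in the Qwen-VL repo;
# the content itself is an invented placeholder.
record = {
    "id": "identity_0",
    "conversations": [
        {
            "from": "user",
            "value": "Picture 1: <img>images/sample_street_view.jpg</img>\n"
                     "Which country and city is this street view from, and why?",
        },
        {
            "from": "assistant",
            "value": "The signage language and architectural style suggest ... "
                     "so this street view is likely from <country>, <city>.",
        },
    ],
}

# The fine-tuning script reads a JSON file containing a list of such records.
print(json.dumps([record], ensure_ascii=False, indent=2)[:60])
```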
## Acknowledgments

We are grateful for the source code and outstanding contributions of [MaskFormer](https://github.com/facebookresearch/MaskFormer), [Sentence-BERT](https://github.com/UKPLab/sentence-transformers), and [Qwen-VL](https://github.com/QwenLM/Qwen-VL).

## Citation

```bibtex
@inproceedings{li2024georeasoner,
  title={GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model},
  author={Li, Ling and Ye, Yu and Zeng, Wei},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2024}
}
```