nielsr (HF Staff) committed on
Commit 5271b3d · verified · 1 Parent(s): 78f085c

Improve model card for GeoReasoner: Add pipeline tag, update license, link paper and code


This PR significantly enhances the model card for GeoReasoner by:

* Updating the `license` from `mit` to `cc-by-nc-4.0`. This change is based on the "Usage and License Notices" in the GitHub README, which states that collected data "cannot be used for commercial purposes," making a non-commercial license like `cc-by-nc-4.0` more appropriate than the permissive `mit` license.
* Adding the `pipeline_tag: image-text-to-text` to accurately reflect the model's functionality as a Large Vision-Language Model processing images and generating text.
* Including a direct link to the associated paper: [GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model](https://huggingface.co/papers/2406.18572).
* Adding a link to the official GitHub repository: https://github.com/lingli1996/GeoReasoner.
* Providing a concise description of the model based on the paper's abstract.
* Integrating detailed "Usage and License Notices," "Description," "Acknowledgments," and "Citation" sections directly from the GitHub README for comprehensive information.

The `library_name` tag is intentionally omitted: the repository's README gives no direct code evidence that the GeoReasoner LoRA weights work out of the box with a specific library such as `transformers`, per the strict guidelines for adding this metadata. A runnable sample usage snippet is likewise not added, since the GitHub README provides setup steps and a command to run a script rather than a self-contained Python example suitable for general use on the Hugging Face Hub.

Files changed (1): README.md (+83, −3)

The previous README.md contained only the YAML front matter:

```yaml
---
license: mit
---
```

The updated README.md:
---
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
---

# GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

GeoReasoner is a large vision-language model (LVLM) for geo-localization in street views, enhanced with human inference knowledge. It addresses data scarcity and quality issues by curating a dataset of highly locatable street views and integrating external knowledge from geo-localization games. Fine-tuned in two stages (reasoning tuning and location tuning), GeoReasoner outperforms existing LVLMs and StreetCLIP by more than 25% on country-level and more than 38% on city-level geo-localization.

This model was presented in the paper [GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model](https://huggingface.co/papers/2406.18572).

The official code and further details can be found in the [GitHub repository](https://github.com/lingli1996/GeoReasoner).

<div align="center">

![GeoReasoner Overview](https://github.com/lingli1996/GeoReasoner/raw/main/figures/GeoReasoner.png)
</div>

## Release
- Data
  - For Stage 1 (Reasoning Tuning Phase), we have released the SFT data at [![Hugging Face](https://img.shields.io/badge/HuggingFace-GeoReasoner_SFT-FFD21F)](https://huggingface.co/datasets/ling1996/GeoReasoner_SFT).
  - For Stage 2 (Location Tuning Phase), copyright restrictions on Google Street View images prevent us from providing the data directly. However, you can retrieve the relevant data through the official API provided by [Google Street View](https://www.google.com/streetview).
- Code
  - `loc_clip`: the codebase for computing the locatability of street view images.
  - `GeoReasoner`: training and inference scripts for the GeoReasoner models.

## Usage and License Notices
This project utilizes datasets and checkpoints that remain subject to their respective original licenses. In particular, data collected from [GeoGuessr](https://www.geoguessr.com) and [Tuxun](https://tuxun.fun) cannot be used for commercial purposes.

## Description
- Computing the locatability of street view images
  - Follow the [MaskFormer instructions](https://github.com/facebookresearch/MaskFormer/blob/main/GETTING_STARTED.md) so that the "Inference Demo with Pre-trained Models" works correctly.
  - Obtain the percentage of each category from the segmentation results.
  - Calculate the locatability value following the example in the script `loc_clip/locatability_comput.py`.
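The script `loc_clip/locatability_comput.py` holds the actual computation. As a rough sketch of the idea, combining per-category segmentation percentages into a single score could look like the following; the category weights here are illustrative assumptions, not the repository's values:

```python
# Illustrative sketch of a locatability score: a weighted sum over the
# per-category pixel percentages from the segmentation results.
# NOTE: these weights are hypothetical, not the values used by GeoReasoner;
# see loc_clip/locatability_comput.py for the actual computation.
CATEGORY_WEIGHTS = {
    "signboard": 1.0,  # signage carries strong location cues
    "building": 0.6,
    "car": 0.3,
    "tree": 0.1,
    "sky": 0.0,        # sky is essentially unlocatable
}

def locatability(category_percentages):
    """Combine per-category percentages (fractions in [0, 1]) into one score."""
    return sum(
        CATEGORY_WEIGHTS.get(category, 0.0) * pct
        for category, pct in category_percentages.items()
    )

score = locatability({"signboard": 0.05, "building": 0.4, "sky": 0.35, "tree": 0.2})
```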

- For the inference of GeoReasoner models
  - The pre-trained LVLM weights are available at [![Hugging Face](https://img.shields.io/badge/HuggingFace-Qwen_VL_Chat-FFD21F)](https://huggingface.co/Qwen/Qwen-VL-Chat)
  - Our LoRA weights are available at [![Hugging Face](https://img.shields.io/badge/HuggingFace-GeoReasoner_Models-FFD21F)](https://huggingface.co/ling1996/GeoReasoner_Models)
  - Inference steps
    ```shell
    cd GeoReasoner
    git clone https://github.com/QwenLM/Qwen-VL.git
    cd Qwen-VL
    pip install -r requirements.txt
    mkdir Qwen-VL-Models
    mkdir LoRA
    ```
  - Then download the pre-trained LVLM weights into the `Qwen-VL-Models` folder and the LoRA weights into the `LoRA` folder, and run:
    ```shell
    python infer.py  # with the test image
    # Due to the inherent randomness of LVLM generation, the generated reasons may not always be consistent.
    ```
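The repository's `infer.py` is the authoritative entry point. As a hedged sketch of what LoRA-adapter inference on top of Qwen-VL-Chat typically involves (the local paths, prompt, and use of `peft` below are assumptions, not code from the repository):

```python
# Hypothetical sketch of LoRA inference with Qwen-VL-Chat; paths and prompt
# are placeholders, and the downloaded weights are required, so this is not
# runnable offline. See infer.py in the GeoReasoner repo for the actual script.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen-VL-Models"  # folder holding the Qwen/Qwen-VL-Chat weights
LORA = "LoRA"            # folder holding the GeoReasoner LoRA weights

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE, device_map="auto", trust_remote_code=True
).eval()
model = PeftModel.from_pretrained(model, LORA)  # attach the LoRA adapter

query = tokenizer.from_list_format([
    {"image": "figures/test.png"},  # placeholder test image
    {"text": "Which country and city is this street view from, and why?"},
])
response, _ = model.chat(tokenizer, query=query, history=None)
print(response)
```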
  - Training steps (Reasoning Tuning Phase)
    ```shell
    cd GeoReasoner
    git clone https://github.com/QwenLM/Qwen-VL.git
    cd Qwen-VL
    pip install -r requirements.txt
    mkdir Qwen-VL-Models
    mkdir LoRA
    mkdir Dataset
    ```
  - Then download the pre-trained LVLM weights into the `Qwen-VL-Models` folder and the SFT data into the `Dataset` folder, and run:
    ```shell
    mv finetune_lora_reason.sh Qwen-VL/finetune
    cd Qwen-VL
    sh finetune/finetune_lora_reason.sh
    ```
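For orientation, Qwen-VL's fine-tuning scripts consume conversation records in a JSON layout along the following lines; the image path and texts below are placeholders, and the Qwen-VL repository defines the authoritative schema:

```python
import json

# One SFT record in the conversation layout used by Qwen-VL's finetune
# scripts. Field names follow the Qwen-VL repo; the content is a placeholder.
record = {
    "id": "identity_0",
    "conversations": [
        {
            "from": "user",
            "value": "Picture 1: <img>Dataset/example.jpg</img>\n"
                     "Infer the country and city of this street view, with reasons.",
        },
        {
            "from": "assistant",
            "value": "The signage is in French and the architecture is "
                     "Haussmannian, so this is likely Paris, France.",
        },
    ],
}

# The training JSON referenced by finetune_lora_reason.sh is a list of such records.
print(json.dumps([record])[:60])
```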

## Acknowledgments
We are very grateful for the source code and outstanding contributions from [MaskFormer](https://github.com/facebookresearch/MaskFormer), [Sentence-BERT](https://github.com/UKPLab/sentence-transformers) and [Qwen-VL](https://github.com/QwenLM/Qwen-VL).

## Citation
```bibtex
@inproceedings{li2024georeasoner,
  title={GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model},
  author={Li, Ling and Ye, Yu and Zeng, Wei},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2024}
}
```