Add model card for GeoZero
Hi! Niels from the Hugging Face community science team here.
This PR adds a comprehensive model card for GeoZero. It includes:
- Relevant metadata such as `pipeline_tag`, `library_name`, and `tags` for better discoverability.
- Links to the paper ([GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes](https://huggingface.co/papers/2511.22645)) and the GitHub repository ([MiliLab/GeoZero](https://github.com/MiliLab/GeoZero)).
- A summary of the GeoZero framework, its innovative training approach, and the A$^2$GRPO method.
- A sample inference usage snippet directly from the official GitHub README.
- The BibTeX citation for the work.
README.md
ADDED
@@ -0,0 +1,55 @@
---
library_name: transformers
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- remote-sensing
- geospatial
- reasoning
---

# GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes

GeoZero is a novel framework that enables Multimodal Large Language Models (MLLMs) to perform emergent geospatial reasoning from scratch, without reliance on predefined Chain-of-Thought (CoT) supervision. This approach significantly reduces annotation costs and mitigates human biases, fostering diverse yet accurate thinking for geospatial scene understanding.

- **Paper:** [GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes](https://huggingface.co/papers/2511.22645)
- **Repository:** [GitHub - MiliLab/GeoZero](https://github.com/MiliLab/GeoZero)

## Model Description

GeoZero is built upon the Qwen3-VL-8B-Instruct backbone and employs a two-stage training strategy:
1. **GeoZero-Instruct**: The model acquires preliminary geospatial knowledge through supervised fine-tuning.
2. **GeoZero-Hard**: Deep reasoning capabilities are stimulated during a subsequent reinforcement learning stage.

A key contribution of GeoZero is **Answer-Anchored Group Relative Policy Optimization (A$^2$GRPO)**, a method that regularizes the reasoning process using the model's own answers, promoting both diversity and accuracy in its reasoning. Experiments reported in the paper show superior performance and emergent reasoning capabilities across multiple remote sensing vision-language benchmarks.
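
For orientation, the "group relative" part of the name can be illustrated with the advantage normalization used in plain GRPO, where each sampled response is scored against the other responses drawn for the same prompt. This is only a minimal sketch of standard GRPO: the answer-anchoring regularizer that distinguishes A$^2$GRPO is not specified in this card and is not reproduced here. The function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Plain GRPO-style advantages for one group of sampled responses.

    Each reward is centered on the group mean and scaled by the group
    standard deviation. `eps` (an assumed value) avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses to one prompt (1 = correct answer).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Responses that beat the group average get a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.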

## Usage

### Inference

An inference script is provided in the official GitHub repository for evaluating GeoZero on various remote sensing vision-language tasks:

```bash
python single_infer_eval_geozero_think.py \
    --model_path [model path] \
    --json_path [dataset json path] \
    --output_path [output saved path] \
    --task [task type] --batchsize 4 --gpu [gpu id] --system [whether to use the system prompt (Type1)]
```
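
When the script has to be launched from Python, for example to loop over several datasets, the command above can be assembled as an argument list. The sketch below is a hypothetical helper, not part of the GeoZero repository: the function name, the placeholder paths, and the `TASK_NAME` value are assumptions, and the accepted values for `--task`, `--gpu`, and `--system` should be checked against the repository. Only the flag names are taken from the command above.

```python
import shlex


def build_infer_command(model_path, json_path, output_path, task,
                        batchsize=4, gpu="0", system=None):
    """Assemble the argv list for single_infer_eval_geozero_think.py.

    Flag names mirror the README command. `system` is passed through
    unchanged because its accepted values are not documented here; it is
    omitted when left as None.
    """
    argv = [
        "python", "single_infer_eval_geozero_think.py",
        "--model_path", str(model_path),
        "--json_path", str(json_path),
        "--output_path", str(output_path),
        "--task", str(task),
        "--batchsize", str(batchsize),
        "--gpu", str(gpu),
    ]
    if system is not None:
        argv += ["--system", str(system)]
    return argv


# Placeholder values only; substitute real paths and a task name from the repo.
cmd = build_infer_command(
    model_path="path/to/GeoZero",
    json_path="path/to/dataset.json",
    output_path="path/to/output",
    task="TASK_NAME",
)
print(shlex.join(cmd))
# To actually run it from the repository root:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when paths contain spaces.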

## Citation

If you find GeoZero helpful in your research, please cite the following paper:

```bibtex
@article{wang2025geozero,
  title   = {GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes},
  author  = {Wang, Di and Liu, Shunyu and Jiang, Wentao and Wang, Fengxiang and Liu, Yi and Qin, Xiaolei and Luo, Zhiming and Zhou, Chaoyang and Guo, Haonan and Zhang, Jing and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
  journal = {arXiv preprint arXiv:2511.22645},
  year    = {2025}
}
```

## Acknowledgements

This project is based on [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL), [ms-swift](https://github.com/modelscope/ms-swift), and [RSEvalKit](https://github.com/fitzpchao/RSEvalKit). We thank the authors for their wonderful work!