nielsr (HF Staff) committed
Commit 778dfe4 · verified · 1 parent: 1000d8f

Improve model card: add pipeline tag, license, paper, code links and evaluation usage


This PR significantly enhances the model card for `initiacms/GeoLLaVA-8K` by:

- Adding the `pipeline_tag: image-text-to-text` to improve discoverability on the Hugging Face Hub.
- Specifying `license: cc-by-nc-4.0`, which likely matches the license of the associated GeoLLaVA-Data dataset.
- Including a direct link to the paper: [GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution](https://huggingface.co/papers/2505.21375).
- Adding a link to the official GitHub repository: https://github.com/MiliLab/GeoLLaVA-8K.
- Providing a concrete `lmms-eval` code snippet for evaluation, copied directly from the GitHub README, to guide users on model usage.
- Adding the BibTeX citation for proper attribution.
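
Taken together, the metadata bullets above amount to a YAML front-matter block along these lines (field ordering here is illustrative; the exact layout is in the diff):

```yaml
---
base_model:
- lmms-lab/LongVA-7B
language:
- en
library_name: transformers
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
---
```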

Please review and merge this PR if everything looks good.

Files changed (1): README.md (+60 −3)

````diff
@@ -1,9 +1,66 @@
 ---
-language:
-- en
 base_model:
 - lmms-lab/LongVA-7B
+language:
+- en
 library_name: transformers
+license: cc-by-nc-4.0
+pipeline_tag: image-text-to-text
 ---
 
-For model usage, you can refer to https://huggingface.co/lmms-lab/LongVA-7B#usage
+<div align="center">
+<h2><strong>GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution</strong></h2>
+<h5>
+<em>
+Fengxiang Wang<sup>1</sup>, Mingshuo Chen<sup>2</sup>, Yueying Li<sup>1</sup>, Di Wang<sup>4,5</sup>, Haotian Wang<sup>1</sup>, <br/>
+Zonghao Guo<sup>3</sup>, Zefan Wang<sup>3</sup>, Boqi Shan<sup>6</sup>, Long Lan<sup>1</sup>, Yulin Wang<sup>3&nbsp;†</sup>, <br/>
+Hongzhen Wang<sup>3&nbsp;†</sup>, Wenjing Yang<sup>1&nbsp;†</sup>, Bo Du<sup>4</sup>, Jing Zhang<sup>4&nbsp;†</sup>
+</em>
+<br/><br/>
+<sup>1</sup> National University of Defense Technology, China<br/>
+<sup>2</sup> Beijing University of Posts and Telecommunications, China<br/>
+<sup>3</sup> Tsinghua University, China, <sup>4</sup> Wuhan University, China<br/>
+<sup>5</sup> Zhongguancun Academy, China, <sup>6</sup> Beihang University, China
+</h5>
+<p>
+📃 <a href="https://arxiv.org/abs/2505.21375" target="_blank">Paper</a> |
+🤗 <a href="https://huggingface.co/initiacms/GeoLLaVA-8K" target="_blank">Model</a> |
+🤗 <a href="https://huggingface.co/datasets/initiacms/GeoLLaVA-Data" target="_blank">Dataset</a>
+</p>
+</div>
+
+GeoLLaVA-8K is the first remote-sensing-focused multimodal large language model capable of handling inputs up to 8K×8K resolution, built on the LLaVA framework. It addresses two key bottlenecks in processing ultra-high-resolution (UHR) remote-sensing imagery: (1) the limited availability of UHR training data, and (2) token explosion caused by large image sizes. To overcome these, GeoLLaVA-8K introduces novel UHR vision-language datasets (SuperRS-VQA and HighRS-VQA) and two token-reduction strategies, Background Token Pruning and Anchored Token Selection.
+
+This model was presented in the paper [GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution](https://huggingface.co/papers/2505.21375).
+Official GitHub repository: [https://github.com/MiliLab/GeoLLaVA-8K](https://github.com/MiliLab/GeoLLaVA-8K)
+
+## Usage
+
+GeoLLaVA-8K is built upon [LongVA](https://github.com/EvolvingLMMs-Lab/LongVA). For detailed installation and finetuning instructions, please refer to the [GitHub repository](https://github.com/MiliLab/GeoLLaVA-8K).
+
+For evaluation, you can use the `lmms-eval` framework as demonstrated below:
+
+```bash
+CKPT_PATH=initiacms/GeoLLaVA-8K # or a local path
+accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
+--model "longva" \
+--model_args "pretrained=${CKPT_PATH},use_flash_attention_2=True" \
+--tasks xlrs-lite \
+--batch_size 1 \
+--log_samples \
+--log_samples_suffix longva_xlrs_lite \
+--output_path ./logs/
+```
+
+## Citation
+
+If you find our work helpful, please consider citing:
+
+```bibtex
+@article{wang2025geollava8kscalingremotesensingmultimodal,
+  title={GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution},
+  author={Fengxiang Wang and Mingshuo Chen and Yueying Li and Di Wang and Haotian Wang and Zonghao Guo and Zefan Wang and Boqi Shan and Long Lan and Yulin Wang and Hongzhen Wang and Wenjing Yang and Bo Du and Jing Zhang},
+  journal={arXiv preprint arXiv:2505.21375},
+  year={2025},
+}
+```
````
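
As a rough illustration of the "token explosion" problem the model card mentions, the sketch below counts ViT-style patch tokens. The 14-pixel patch size and the 336×336 baseline are common LLaVA-family defaults, assumed here for illustration rather than taken from the paper:

```python
def num_patch_tokens(width: int, height: int, patch: int = 14) -> int:
    """Visual tokens a ViT-style encoder emits: one per non-overlapping patch."""
    return (width // patch) * (height // patch)

# A typical 336x336 LLaVA input vs. an 8K x 8K remote-sensing image.
small = num_patch_tokens(336, 336)    # 24 * 24 = 576 tokens
large = num_patch_tokens(8192, 8192)  # 585 * 585 = 342,225 tokens
print(f"{small} -> {large} tokens ({large / small:.0f}x more)")
```

Token counts at this scale are why strategies such as Background Token Pruning and Anchored Token Selection are needed before the visual sequence reaches the language model.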