RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning
Qingyun Li*, Shuran Ma*, Junwei Luo*, Yi Yu*, Yue Zhou, Fengxiang Wang, Xudong Lu, Xiaoxing Wang, Xin He, Yushi Chen, Xue Yang
If you find our work helpful, please consider giving us a star!
- ArXiv Paper: https://arxiv.org/abs/2511.21272
- Published Paper: https://www.mdpi.com/2072-4292/18/2/222
- GitHub Repo: https://github.com/VisionXLab/RSCoVLM
- HuggingFace Page: https://huggingface.co/collections/Qingyun/rscovlm
This repo hosts the checkpoint of RSCoVLM, which is based on Qwen/Qwen2.5-VL-7B-Instruct and trained on a comprehensive remote sensing data recipe.
RSCoVLM is a technical practice for fine-tuning large multimodal language models for remote sensing image understanding, ultra-high-resolution image reasoning, oriented object detection, and so on. These are the official model weights for the paper: RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning.
Downloading Guide
You can download the files with your web browser from the file page.
We recommend downloading from the terminal using hf (pip install --upgrade huggingface_hub). Refer to its documentation for more usage options.
# Set Huggingface Mirror for Chinese users (if required):
export HF_ENDPOINT=https://hf-mirror.com
# Download a certain checkpoint:
hf download Qingyun/RSCoVLM-7B-2512 --repo-type model --local-dir checkpoint/RSCoVLM-7B-2512/
# If an error (such as a network error) interrupts the download, just run the same command again; the latest hf will resume the download.
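Because re-running the same command resumes the download, the retry can be automated. The following is a minimal sketch, not part of hf itself: the `retry` function and the `RETRY_DELAY` variable are hypothetical names introduced here for illustration, and the only hf call used is the `hf download` command shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by hf): re-run a command until it
# succeeds or the given number of attempts is used up.
retry() {
  local max_attempts="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Wait before the next attempt; RETRY_DELAY is an assumed, optional override (seconds).
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage (commented out so that sourcing this file does not start a download):
# retry 5 hf download Qingyun/RSCoVLM-7B-2512 --repo-type model --local-dir checkpoint/RSCoVLM-7B-2512/
```

Each retry calls the identical command, so it relies entirely on the resume behaviour described above rather than on any extra flag.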
Cite
RSCoVLM and LMMRotate papers:
@ARTICLE{li2026rscovlm,
author={Li, Qingyun and Ma, Shuran and Luo, Junwei and Yu, Yi and Zhou, Yue and Wang, Fengxiang and Lu, Xudong and Wang, Xiaoxing and He, Xin and Chen, Yushi and Yang, Xue},
title={Co-Training Vision-Language Models for Remote Sensing Multi-Task Learning},
journal={Remote Sensing},
volume={18},
year={2026},
number={2},
article-number={222},
url={https://www.mdpi.com/2072-4292/18/2/222},
issn={2072-4292},
doi={10.3390/rs18020222}
}
@INPROCEEDINGS{11242725,
author={Li, Qingyun and He, Xin and Shu, Xinya and Yu, Yi and Chen, Dong and Chen, Yushi and Yang, Xue},
booktitle={IGARSS 2025 - 2025 IEEE International Geoscience and Remote Sensing Symposium},
title={A Simple Aerial Detection Baseline of Multimodal Language Models},
year={2025},
pages={6833-6837},
doi={10.1109/IGARSS55030.2025.11242725}
}