RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Qingyun Li* Shuran Ma* Junwei Luo* Yi Yu* Yue Zhou Fengxiang Wang Xudong Lu Xiaoxing Wang Xin He Yushi Chen Xue Yang

If you find our work helpful, please consider giving us a ⭐!

ArXiv Paper: https://arxiv.org/abs/2511.21272
Published Paper: https://www.mdpi.com/2072-4292/18/2/222
GitHub Repo: https://github.com/VisionXLab/RSCoVLM
HuggingFace Page: https://huggingface.co/collections/Qingyun/rscovlm

This repo hosts the checkpoint of RSCoVLM based on Qwen/Qwen2.5-VL-7B-Instruct trained on a comprehensive remote sensing data recipe.

RSCoVLM is a technical practice to fine-tune Large Multimodal language Models for remote sensing image understanding, ultra-high-resolution image reasoning, oriented object detection, and so on. This repo hosts the official model weight of the paper: RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning.

Downloading Guide

You can download with your web browser on the file page.

We recommand downloading in terminal using hf (pip install --upgrade huggingface_hub). You can refer to the document for more usages.

# Set Huggingface Mirror for Chinese users (if required):
export HF_ENDPOINT=https://hf-mirror.com 
# Download a certain checkpoint:
hf download Qingyun/RSCoVLM-7B-2512 --repo-type model --local-dir checkpoint/RSCoVLM-7B-2512/
# If any error (such as network error) interrupts the downloading, you just need to execute the same command, the latest hf will resume downloading.

Cite

RSCoVLM and LMMRotate paper:


@ARTICLE{li2026rscovlm,
  author={Li, Qingyun and Ma, Shuran and Luo, Junwei and Yu, Yi and Zhou, Yue and Wang, Fengxiang and Lu, Xudong and Wang, Xiaoxing and He, Xin and Chen, Yushi and Yang, Xue},
  title={Co-Training Vision-Language Models for Remote Sensing Multi-Task Learning},
  journal={Remote Sensing},
  volume={18},
  year={2026},
  number={2},
  article-number={222},
  url={https://www.mdpi.com/2072-4292/18/2/222},
  issn={2072-4292},
  doi={10.3390/rs18020222}
}

@INPROCEEDINGS{11242725,
  author={Li, Qingyun and He, Xin and Shu, Xinya and Yu, Yi and Chen, Dong and Chen, Yushi and Yang, Xue},
  booktitle={IGARSS 2025 - 2025 IEEE International Geoscience and Remote Sensing Symposium}, 
  title={A Simple Aerial Detection Baseline of Multimodal Language Models}, 
  year={2025},
  pages={6833-6837},
  doi={10.1109/IGARSS55030.2025.11242725}
}