RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Qingyun Li*  Shuran Ma*  Junwei Luo*  Yi Yu*  Yue Zhou  Fengxiang Wang  Xudong Lu  Xiaoxing Wang  Xin He  Yushi Chen  Xue Yang 

If you find our work helpful, please consider giving us a ⭐!

This repo hosts the RSCoVLM checkpoint, which is based on Qwen/Qwen2.5-VL-7B-Instruct and trained on a comprehensive remote sensing data recipe.

RSCoVLM is a technical practice for fine-tuning large multimodal language models for remote sensing image understanding, ultra-high-resolution image reasoning, oriented object detection, and related tasks. This repo hosts the official model weights for the paper: RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning.

[Figure: framework]

Downloading Guide

You can download the files with your web browser from the file page.

We recommend downloading from the terminal using hf (pip install --upgrade huggingface_hub). Refer to the documentation for more usage options.

# Set a Hugging Face mirror for users in mainland China (if required):
export HF_ENDPOINT=https://hf-mirror.com
# Download a specific checkpoint:
hf download Qingyun/RSCoVLM-7B-2512 --repo-type model --local-dir checkpoint/RSCoVLM-7B-2512/
# If an error (such as a network error) interrupts the download, rerun the same command; the latest hf resumes the download.
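
Because rerunning the same command resumes an interrupted download, the rerun can be automated with a small retry wrapper. The following is a minimal sketch, not part of the official instructions: `retry_download` and the `RETRY_DELAY` variable are hypothetical names introduced here for illustration, and the wrapper simply reruns whatever command it is given until it succeeds or the attempt limit is reached.

```shell
# Hypothetical helper (not part of hf): rerun a command until it succeeds
# or the maximum number of attempts is reached.
# Usage: retry_download MAX_ATTEMPTS COMMAND [ARGS...]
retry_download() {
  max_attempts="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the command exactly as given; stop on the first success.
    if "$@"; then
      return 0
    fi
    echo "Attempt $attempt of $max_attempts failed." >&2
    attempt=$((attempt + 1))
    # Wait before the next attempt; RETRY_DELAY is an assumed knob (seconds).
    sleep "${RETRY_DELAY:-5}"
  done
  return 1
}
```

Assuming `hf` is installed, it could be called as `retry_download 5 hf download Qingyun/RSCoVLM-7B-2512 --repo-type model --local-dir checkpoint/RSCoVLM-7B-2512/`, which retries the download command from this guide up to five times.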

Cite

RSCoVLM and LMMRotate papers:


@ARTICLE{li2026rscovlm,
  author={Li, Qingyun and Ma, Shuran and Luo, Junwei and Yu, Yi and Zhou, Yue and Wang, Fengxiang and Lu, Xudong and Wang, Xiaoxing and He, Xin and Chen, Yushi and Yang, Xue},
  title={Co-Training Vision-Language Models for Remote Sensing Multi-Task Learning},
  journal={Remote Sensing},
  volume={18},
  year={2026},
  number={2},
  article-number={222},
  url={https://www.mdpi.com/2072-4292/18/2/222},
  issn={2072-4292},
  doi={10.3390/rs18020222}
}

@INPROCEEDINGS{11242725,
  author={Li, Qingyun and He, Xin and Shu, Xinya and Yu, Yi and Chen, Dong and Chen, Yushi and Yang, Xue},
  booktitle={IGARSS 2025 - 2025 IEEE International Geoscience and Remote Sensing Symposium}, 
  title={A Simple Aerial Detection Baseline of Multimodal Language Models}, 
  year={2025},
  pages={6833-6837},
  doi={10.1109/IGARSS55030.2025.11242725}
}