metadata
license: mit
datasets:
  - Qingyun/remote-sensing-sft-data
language:
  - en
base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
tags:
  - aerial
  - geoscience
  - remote sensing

RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Qingyun Li*  Shuran Ma*  Junwei Luo*  Yi Yu*  Yue Zhou  Fengxiang Wang  Xudong Lu  Xiaoxing Wang  Xin He  Yushi Chen  Xue Yang 

If you find our work helpful, please consider giving us a ⭐!

This repo hosts the RSCoVLM checkpoint, which is based on Qwen/Qwen2.5-VL-7B-Instruct and trained on a comprehensive remote sensing data recipe.

RSCoVLM is a technical practice for fine-tuning large multimodal language models on remote sensing tasks, including image understanding, ultra-high-resolution image reasoning, and oriented object detection. This repo hosts the official model weights for the paper: RSCoVLM: Co-Training Vision Language Models for Remote Sensing Multi-task Learning.

framework

Downloading Guide

You can download the files with your web browser from the file page.

We recommend downloading from the terminal using the hf CLI (pip install --upgrade huggingface_hub). Refer to the documentation for more usage options.

# Set a Hugging Face mirror for users in China (if required):
export HF_ENDPOINT=https://hf-mirror.com
# Download a specific checkpoint:
hf download Qingyun/RSCoVLM-7B-2512 --repo-type model --local-dir checkpoint/RSCoVLM-7B-2512/
# If an error (such as a network failure) interrupts the download, just run the same command again; the latest hf will resume downloading.
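
Since an interrupted download only needs the same command to be run again, the rerun can be automated. The following is a minimal sketch, not part of the hf CLI: `retry` is a hypothetical helper function and `RETRY_DELAY` is a hypothetical environment variable (seconds to wait between attempts, defaulting to 5). It is written for POSIX sh, so its variables are global rather than local.

```shell
# Hypothetical helper (not provided by hf): rerun a command until it
# succeeds or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
  max_attempts="$1"
  shift
  attempt=1
  # "$@" is the command to run; loop while it keeps failing.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # RETRY_DELAY is an assumed, optional variable; default is 5 seconds.
    sleep "${RETRY_DELAY:-5}"
  done
}
```

With the function defined, the download above could be wrapped as `retry 5 hf download Qingyun/RSCoVLM-7B-2512 --repo-type model --local-dir checkpoint/RSCoVLM-7B-2512/`, which gives up after five failed attempts.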

Cite

RSCoVLM and LMMRotate papers:


@ARTICLE{li2026rscovlm,
  author={Li, Qingyun and Ma, Shuran and Luo, Junwei and Yu, Yi and Zhou, Yue and Wang, Fengxiang and Lu, Xudong and Wang, Xiaoxing and He, Xin and Chen, Yushi and Yang, Xue},
  title={Co-Training Vision-Language Models for Remote Sensing Multi-Task Learning},
  journal={Remote Sensing},
  volume={18},
  year={2026},
  number={2},
  article-number={222},
  url={https://www.mdpi.com/2072-4292/18/2/222},
  issn={2072-4292},
  doi={10.3390/rs18020222}
}

@INPROCEEDINGS{11242725,
  author={Li, Qingyun and He, Xin and Shu, Xinya and Yu, Yi and Chen, Dong and Chen, Yushi and Yang, Xue},
  booktitle={IGARSS 2025 - 2025 IEEE International Geoscience and Remote Sensing Symposium}, 
  title={A Simple Aerial Detection Baseline of Multimodal Language Models}, 
  year={2025},
  pages={6833-6837},
  doi={10.1109/IGARSS55030.2025.11242725}
}