base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
library_name: transformers
pipeline_tag: image-segmentation
Think2Seg-RS-3B
This repository contains the 3B prompter model for Think2Seg-RS, a decoupled framework for reasoning segmentation in remote sensing (RS) imagery.
Overview
Think2Seg-RS addresses the limitations of coupling linguistic reasoning and pixel prediction in remote sensing analysis. The framework decouples high-level semantic reasoning from low-level geometric execution: a large vision-language model (LVLM) prompter, based on Qwen2.5-VL, is trained to control a frozen Segment Anything Model 2 (SAM2) via structured geometric prompts.
Through a result-oriented reinforcement learning objective, the model learns to translate abstract semantic reasoning into spatially grounded actions, achieving state-of-the-art performance on the EarthReason dataset.
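As a rough illustration of what a result-oriented reward can look like, the sketch below scores a rollout by the IoU between the mask produced by the frozen segmenter and the ground-truth mask, and gives zero reward when no mask could be produced. This is an assumption made for illustration only: the actual reward used to train Think2Seg-RS is defined in the paper and repository, and the names `mask_iou` and `outcome_reward` are hypothetical.

```python
from typing import Optional, Sequence

# A binary mask as nested sequences of 0/1 values (rows of pixels).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of identical shape."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            inter += p and t
            union += p or t
    # Convention chosen here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def outcome_reward(pred: Optional[Mask], target: Mask) -> float:
    """Hypothetical outcome-based reward: IoU of the final mask, or 0.0
    when the prompter output could not be turned into a mask at all."""
    if pred is None:
        return 0.0
    return mask_iou(pred, target)
```

Because the reward depends only on the final mask, the prompter is free to choose any geometric prompt that leads the segmenter to a good result.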
- Paper: Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing
- Repository: Ricardo-XZ/Think2Seg-RS (GitHub)
- Base Model: Qwen2.5-VL-3B-Instruct
Key Features
- Decoupled Architecture: Separates high-level semantic reasoning from low-level geometric execution.
- Geospatial Understanding: Optimized for the complexities of remote sensing imagery and heterogeneous backgrounds.
- Zero-shot Generalization: The learned prompting policy generalizes effectively across multiple referring segmentation benchmarks.
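To make the decoupled design more concrete, here is a minimal sketch of the glue between the two stages: it takes the prompter's text output and extracts a box and optional points that could then be handed to a frozen segmenter. The output format shown (a JSON object inside `<answer>` tags with `bbox` and `points` keys) is an assumption for illustration, not the format documented for this model, and `GeometricPrompt` and `parse_prompt` are hypothetical names; consult the repository for the real prompt schema.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GeometricPrompt:
    """Box as (x1, y1, x2, y2) plus optional (x, y) points, in pixels."""
    box: Tuple[float, float, float, float]
    points: List[Tuple[float, float]] = field(default_factory=list)


# Assumed (hypothetical) wrapper around the structured part of the output.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def _clamp(value: float, upper: int) -> float:
    return min(max(value, 0.0), float(upper))


def parse_prompt(text: str, width: int, height: int) -> Optional[GeometricPrompt]:
    """Extract a geometric prompt from model text; return None if invalid."""
    match = _ANSWER_RE.search(text)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
        x1, y1, x2, y2 = (float(v) for v in payload["bbox"])
        points = [(float(x), float(y)) for x, y in payload.get("points", [])]
    except (ValueError, KeyError, TypeError):
        return None
    # Clamp to the image bounds and normalize the corner order.
    x1, x2 = sorted((_clamp(x1, width), _clamp(x2, width)))
    y1, y2 = sorted((_clamp(y1, height), _clamp(y2, height)))
    if x1 == x2 or y1 == y2:
        return None  # degenerate box carries no usable geometry
    points = [(_clamp(x, width), _clamp(y, height)) for x, y in points]
    return GeometricPrompt((x1, y1, x2, y2), points)
```

Returning `None` for malformed output keeps the segmenter call out of the failure path, so the caller can decide how to handle an unusable prompt.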
Setup and Usage
For installation, training, and evaluation scripts, see the official GitHub repository (Ricardo-XZ/Think2Seg-RS).
Citation
If you find this work helpful, please consider citing:
@article{think2seg_rs_2025,
title={Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing},
author={Luo, Junyu and Luo, Xiao and Chen, Xiusi and Xiao, Zhiping and Ju, Wei and Zhang, Ming},
journal={arXiv preprint arXiv:2512.19302},
year={2025}
}