metadata
base_model:
  - Qwen/Qwen2.5-VL-3B-Instruct
library_name: transformers
pipeline_tag: image-segmentation

Think2Seg-RS-3B

This repository contains the 3B prompter model for Think2Seg-RS, a decoupled framework for reasoning segmentation in remote sensing (RS) imagery.

Overview

Think2Seg-RS addresses the limitations of approaches that couple linguistic reasoning and pixel prediction in remote sensing analysis. The framework decouples high-level semantic reasoning from low-level geometric execution: a large vision-language model (LVLM) prompter, based on Qwen2.5-VL, is trained to control a frozen Segment Anything Model 2 (SAM2) through structured geometric prompts.
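The decoupling means the only interface between the two models is a set of geometric prompts. The sketch below shows what that hand-off could look like, assuming (hypothetically) that the prompter ends its free-text reasoning with a JSON object such as `{"bbox": [x1, y1, x2, y2], "points": [[x, y], ...]}` in pixel coordinates. The real output format is defined in the official repository; `GeometricPrompt` and `parse_prompter_output` are illustrative names, not part of any released API.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass, field


@dataclass
class GeometricPrompt:
    # Hypothetical container for the prompts handed to a frozen SAM2 model.
    box: tuple[float, float, float, float] | None = None  # (x1, y1, x2, y2), pixels
    points: list[tuple[float, float]] = field(default_factory=list)  # (x, y), pixels
    labels: list[int] = field(default_factory=list)  # 1 = foreground point


def _clip(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, float(v)))


def parse_prompter_output(text: str, width: int, height: int) -> GeometricPrompt:
    # Assumed (hypothetical) format: free-text reasoning followed by a flat JSON
    # object {"bbox": [...], "points": [[...], ...]} with no nested braces.
    candidates = re.findall(r"\{[^{}]*\}", text)
    if not candidates:
        raise ValueError("no JSON object found in prompter output")
    data = json.loads(candidates[-1])  # use the final answer, not earlier text

    prompt = GeometricPrompt()
    if "bbox" in data:
        x1, y1, x2, y2 = data["bbox"]
        # Clip to the image and reorder corners so that x1 <= x2 and y1 <= y2.
        xs = sorted((_clip(x1, 0, width - 1), _clip(x2, 0, width - 1)))
        ys = sorted((_clip(y1, 0, height - 1), _clip(y2, 0, height - 1)))
        prompt.box = (xs[0], ys[0], xs[1], ys[1])
    for x, y in data.get("points", []):
        prompt.points.append((_clip(x, 0, width - 1), _clip(y, 0, height - 1)))
        prompt.labels.append(1)  # treat every emitted point as a positive click
    return prompt


if __name__ == "__main__":
    raw = 'The runway is in the lower left. {"bbox": [12, 300, 260, 540], "points": [[130, 420]]}'
    p = parse_prompter_output(raw, width=512, height=512)
    print(p.box, p.points, p.labels)
```

Because SAM2 stays frozen, all task-specific behaviour lives in what the prompter emits; the parser only validates and clips. In the `sam2` package, prompts like these are typically passed to `SAM2ImagePredictor.predict` through its `box`, `point_coords` and `point_labels` arguments as NumPy arrays; that wiring is omitted here.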

Trained with a result-oriented reinforcement learning objective, the prompter learns to translate abstract semantic reasoning into spatially grounded actions, and the framework achieves state-of-the-art performance on the EarthReason dataset.
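The exact reward is not specified in this card. As a minimal sketch of what "result-oriented" can mean, assume the reward scores only the final mask that SAM2 returns for the prompter's prompts, here via intersection-over-union (IoU) with the ground-truth mask, and is zero when the prompter's output cannot be parsed. `mask_iou` and `result_reward` are hypothetical helpers, and the paper's reward may combine other terms.

```python
from __future__ import annotations


def mask_iou(pred: list[list[int]], gt: list[list[int]]) -> float:
    # IoU between two binary masks given as equally shaped 2-D lists of 0/1.
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            p, g = bool(p), bool(g)
            inter += int(p and g)
            union += int(p or g)
    # Convention (assumed): two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def result_reward(pred_mask: list[list[int]], gt_mask: list[list[int]], parsed_ok: bool) -> float:
    # Hypothetical result-oriented reward: only the segmentation outcome is scored;
    # the intermediate reasoning text earns nothing by itself.
    if not parsed_ok:
        return 0.0
    return mask_iou(pred_mask, gt_mask)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    gt = [[1, 0, 0], [0, 1, 1]]
    print(result_reward(pred, gt, parsed_ok=True))
```

Scoring the mask rather than the prompt coordinates lets the prompter discover whichever boxes or points make the frozen SAM2 produce the best segmentation, instead of imitating fixed prompt annotations.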

Key Features

  • Decoupled Architecture: Separates high-level semantic reasoning from low-level geometric execution.
  • Geospatial Understanding: Optimized for the complexities of remote sensing imagery and heterogeneous backgrounds.
  • Zero-shot Generalization: The learned prompting policy generalizes effectively across multiple referring segmentation benchmarks.

Setup and Usage

For installation, training, and evaluation scripts, please visit the official GitHub repository.

Citation

If you find this work helpful, please consider citing:

@article{think2seg_rs_2025,
  title={Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing},
  author={Luo, Junyu and Luo, Xiao and Chen, Xiusi and Xiao, Zhiping and Ju, Wei and Zhang, Ming},
  journal={arXiv preprint arXiv:2512.19302},
  year={2025}
}