| --- |
| license: apache-2.0 |
| pipeline_tag: image-segmentation |
| tags: |
| - referring-expression-segmentation |
| - sam |
| - gres |
| --- |
| |
| # SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation |
|
|
| <div align="center"> |
| <a href="https://arxiv.org/abs/2603.18086"><img src="https://img.shields.io/badge/arXiv-2603.18086-b31b1b?style=flat-square" alt="arXiv"></a> |
| <a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Checkpoint-yellow?style=flat-square" alt="HF Checkpoint"></a> |
| <a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Dataset-orange?style=flat-square" alt="HF Dataset"></a> |
| <img src="https://img.shields.io/badge/License-Apache--2.0-green?style=flat-square" alt="License"> |
| </div> |
|
|
| <div align="center"> |
| <a href='https://scholar.google.com/citations?user=D-27eLIAAAAJ&hl=zh-CN' target='_blank'>Wei Tang</a><sup>1</sup>  |
| <a href='https://scholar.google.com.hk/citations?hl=zh-CN&user=SVQYcYcAAAAJ' target='_blank'>Xuejing Liu</a><sup>✉,2</sup>  |
| <a href='https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN' target='_blank'>Yanpeng Sun</a><sup>3</sup>  |
| <a href='https://imag-njust.net/zechaoli/' target='_blank'>Zechao Li</a><sup>✉,1</sup> |
| </div> |
| |
| <div align="center"> |
| <sup>1</sup>Nanjing University of Science and Technology;  |
| <sup>2</sup>Institute of Computing Technology, Chinese Academy of Sciences;  |
| <sup>3</sup>NExT++ Lab, National University of Singapore |
| <br> |
| <sup>✉</sup> Corresponding Authors |
| </div> |
| |
| --- |
|
|
| ## Overview |
|
|
| This repository provides the codebase of **SSP-SAM**, a referring expression segmentation framework built on top of SAM with semantic-spatial prompts. The model is presented in the paper [SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation](https://arxiv.org/abs/2603.18086). |
|
|
| Current repo status: |
| - Training/testing/data processing scripts are available. |
| - Multiple dataset configs are provided under `configs/`. |
|
|
| ## News |
|
|
| - **17 Mar, 2026**: The open-source codebase has been organized and released. |
| - **4 Dec, 2025**: The SSP-SAM paper was accepted by IEEE TCSVT. |
|
|
| ## ToDo |
|
|
| - [X] Release final model checkpoints on Hugging Face |
| - [X] Release processed training/evaluation metadata |
| - [X] Release arXiv version |
|
|
| ## Model Zoo & Links |
|
|
| - Paper: [SSP-SAM (arXiv:2603.18086)](https://arxiv.org/abs/2603.18086) |
| - Code: [GitHub - WayneTomas/SSP-SAM](https://github.com/WayneTomas/SSP-SAM) |
| - <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Hugging Face checkpoints/datasets: [wayneicloud/SSP-SAM](https://huggingface.co/wayneicloud/SSP-SAM) |
|
|
| ## Project Structure |
|
|
| ```text |
| . |
| ├── configs/             # training/evaluation configs |
| ├── data_seg/            # data preprocessing scripts and generated anns/masks |
| ├── datasets/            # dataloader and transforms |
| ├── models/              # SSP_SAM model definitions |
| ├── segment-anything/    # modified SAM dependency (editable install) |
| ├── train.py             # training entry |
| ├── test.py              # evaluation entry |
| ├── submit_train.sh      # train launcher (with examples) |
| └── submit_test.sh       # test launcher (with examples) |
| ``` |
|
|
| ## Environment Setup |
|
|
| Recommended: conda environment on macOS/Linux. |
|
|
| ```bash |
| conda create -n ssp_sam python=3.10 -y |
| conda activate ssp_sam |
| pip install --upgrade pip |
| |
| # 1) install PyTorch (CUDA example: cu121) |
| pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121 |
| |
| # 2) install modified segment-anything first |
| cd segment-anything |
| pip install -e . |
| cd .. |
| |
| # 3) install remaining dependencies |
| pip install -r requirements.txt |
| ``` |
|
|
| > Note: the `segment-anything` code in this repository has been modified based on the original SAM implementation. |
| > Please install the local `segment-anything` in editable mode (`pip install -e .`) as shown above. |
|
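After installation, a quick import check can confirm that the three steps above took effect. The sketch below is illustrative and not part of the repository: `check_packages` is a hypothetical helper, and it assumes the modified package keeps the upstream import name `segment_anything`.

```python
import importlib.util


def check_packages(names):
    """Return {package_name: bool} telling whether each top-level package is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # `segment_anything` is the upstream SAM import name; assumed unchanged in the modified copy.
    status = check_packages(["torch", "torchvision", "segment_anything"])
    for name, ok in status.items():
        print(f"{name:<18} {'found' if ok else 'MISSING'}")
```

If `segment_anything` is reported as missing, rerun `pip install -e .` inside the `segment-anything` directory.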
|
| ## Data Preparation |
|
|
| Please check: |
| - `data_seg/README.md` |
| - `data_seg/run.sh` |
|
|
| You have two options: |
|
|
| 1. **Use our provided annotations + generate masks locally (recommended)** |
|    - <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Download `data_seg/anns/*.json` and the other prepared `data_seg` files from Hugging Face: |
|      `https://huggingface.co/wayneicloud/SSP-SAM` |
|    - You can use our `data_seg/anns/*.json` directly. |
|    - Generate the `masks` locally by running: |
|      ```bash |
|      bash data_seg/run.sh |
|      ``` |
| 
| 2. **Regenerate annotations/masks by yourself** |
|    See the collapsible section in the README of the [GitHub repository](https://github.com/WayneTomas/SSP-SAM). |
|
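Before launching training, it can help to confirm that the annotation files are where the scripts expect them. This is a minimal sketch, assuming the repository root is the working directory. `list_annotation_files` is a hypothetical helper, not part of the codebase; only the `data_seg/anns/*.json` pattern comes from the instructions above.

```python
from pathlib import Path


def list_annotation_files(repo_root="."):
    """Return the sorted paths matching data_seg/anns/*.json under repo_root."""
    anns_dir = Path(repo_root) / "data_seg" / "anns"
    return sorted(anns_dir.glob("*.json"))


if __name__ == "__main__":
    files = list_annotation_files()
    if not files:
        print("No annotation files found under data_seg/anns/")
    for path in files:
        print(path)
```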
|
| ## Training |
|
|
| Default training launcher: |
|
|
| ```bash |
| bash submit_train.sh |
| ``` |
|
|
| You can also run directly: |
|
|
| ```bash |
| torchrun --nproc_per_node=8 train.py \ |
| --config configs/SSP_SAM_CLIP_B_FT_unc.py \ |
| --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt |
| ``` |
|
|
| ### Resume Modes |
|
|
| `train.py` supports two resume modes: |
| - `--resume <ckpt>`: use this for interrupted training and continue from the previous checkpoint. |
| - `--resume_from_pretrain <ckpt>`: use this for loading pretrained weights before fine-tuning/training. |
|
|
| ## Evaluation |
|
|
| Default testing launcher: |
|
|
| ```bash |
| bash submit_test.sh |
| ``` |
|
|
| Example direct command: |
|
|
| ```bash |
| torchrun --nproc_per_node=1 --master_port=29590 test.py \ |
| --config configs/SSP_SAM_CLIP_L_FT_unc.py \ |
| --test_split testB \ |
| --clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \ |
| --checkpoint output/your_save_folder/checkpoint_best_miou.pth |
| ``` |
|
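To evaluate one checkpoint on several splits, the direct command above can be generated in a loop. The following is a sketch rather than a script shipped with the repository: `build_test_cmd` is a hypothetical helper, the flags are copied from the example command, and the split names `val`, `testA`, `testB` assume the standard RefCOCO (UNC) splits.

```python
import shlex


def build_test_cmd(split, config, clip_pretrained, checkpoint, master_port=29590):
    """Build the torchrun argument list for evaluating one split on a single process."""
    return [
        "torchrun", "--nproc_per_node=1", f"--master_port={master_port}",
        "test.py",
        "--config", config,
        "--test_split", split,
        "--clip_pretrained", clip_pretrained,
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    for split in ("val", "testA", "testB"):  # assumed split names
        cmd = build_test_cmd(
            split,
            config="configs/SSP_SAM_CLIP_L_FT_unc.py",
            clip_pretrained="pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt",
            checkpoint="output/your_save_folder/checkpoint_best_miou.pth",
        )
        # Print only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Building the command as a list keeps paths with spaces intact if it is later handed to `subprocess.run`.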
|
| ## Acknowledgements |
|
|
| This repository benefits from ideas and/or codebases of the following projects: |
| - SimREC: https://github.com/luogen1996/SimREC |
| - gRefCOCO: https://github.com/henghuiding/gRefCOCO |
| - TransVG: https://github.com/djiajunustc/TransVG |
| - Segment Anything (SAM): https://github.com/facebookresearch/segment-anything |
|
|
| ## Citation |
|
|
| If you find this repository useful, please cite our SSP-SAM paper. |
|
|
| ```bibtex |
| @article{ssp_sam_tcsvt, |
| title={SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation}, |
| author={Tang, Wei and Liu, Xuejing and Sun, Yanpeng and Li, Zechao}, |
| journal={IEEE Transactions on Circuits and Systems for Video Technology}, |
| year={2025} |
| } |
| ``` |