---
license: apache-2.0
---
# SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation
<div align="center">
<a href="https://arxiv.org/abs/xxxx.xxxxx"><img src="https://img.shields.io/badge/arXiv-Coming_Soon-b31b1b?style=flat-square" alt="arXiv"></a>
<a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Checkpoint-yellow?style=flat-square" alt="HF Checkpoint"></a>
<a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Dataset-orange?style=flat-square" alt="HF Dataset"></a>
<img src="https://img.shields.io/badge/License-Apache--2.0-green?style=flat-square" alt="License">
</div>
<div align="center">
<a href='https://scholar.google.com/citations?user=D-27eLIAAAAJ&hl=zh-CN' target='_blank'>Wei Tang</a><sup>1</sup> 
<a href='https://scholar.google.com.hk/citations?hl=zh-CN&user=SVQYcYcAAAAJ' target='_blank'>Xuejing Liu</a><sup>✉,2</sup> 
<a href='https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN' target='_blank'>Yanpeng Sun</a><sup>3</sup> 
<a href='https://imag-njust.net/zechaoli/' target='_blank'>Zechao Li</a><sup>✉,1</sup>
</div>
<div align="center">
<sup>1</sup>Nanjing University of Science and Technology; 
<sup>2</sup>Institute of Computing Technology, Chinese Academy of Sciences; 
<sup>3</sup>NExT++ Lab, National University of Singapore
<br>
<sup>✉</sup> Corresponding Authors
</div>
---
## Overview
This repository provides the codebase of **SSP-SAM**, a referring expression segmentation framework built on top of SAM with semantic-spatial prompts.
Current repo status:
- Training/testing/data processing scripts are available.
- Multiple dataset configs are provided under `configs/`.
## News
- **17 Mar, 2026**: Open-source codebase has been organized and released.
- **4 Dec, 2025**: SSP-SAM paper accepted by IEEE TCSVT.
## ToDo
- [X] Release final model checkpoints on Hugging Face
- [X] Release processed training/evaluation metadata
- [X] Release arXiv version
## Model Zoo & Links
- Paper: `https://arxiv.org/abs/xxxx.xxxxx`
- <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Hugging Face Checkpoints/datasets: `https://huggingface.co/wayneicloud/SSP-SAM`
## Project Structure
```text
.
├── configs/              # training/evaluation configs
├── data_seg/             # data preprocessing scripts and generated anns/masks
├── datasets/             # dataloader and transforms
├── models/               # SSP_SAM model definitions
├── segment-anything/     # modified SAM dependency (editable install)
├── train.py              # training entry
├── test.py               # evaluation entry
├── submit_train.sh       # train launcher (with examples)
└── submit_test.sh        # test launcher (with examples)
```
## Environment Setup
Recommended: conda environment on macOS/Linux.
```bash
conda create -n ssp_sam python=3.10 -y
conda activate ssp_sam
pip install --upgrade pip
# 1) install PyTorch (CUDA example: cu121)
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121
# 2) install modified segment-anything first
cd segment-anything
pip install -e .
cd ..
# 3) install remaining dependencies
pip install -r requirements.txt
```
> Note: the `segment-anything` code in this repository has been modified based on the original SAM implementation.
> Please install the local `segment-anything` in editable mode (`pip install -e .`) as shown above.
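
To confirm that Python picks up the local, modified copy rather than an upstream SAM install, a quick check along these lines may help. It is a minimal sketch: it assumes the modified package keeps the upstream import name `segment_anything` and that you run it from the repository root. `module_origin` and `is_under` are hypothetical helper names, not part of this repo.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_under(path: Optional[Path], root: Path) -> bool:
    """True if `path` lives somewhere inside the directory `root`."""
    if path is None:
        return False
    return root.resolve() in path.parents


if __name__ == "__main__":
    # Assumption: the import name is `segment_anything`, as in upstream SAM.
    origin = module_origin("segment_anything")
    print("segment_anything ->", origin)
    print("local editable install:", is_under(origin, Path("segment-anything")))
```

If the second line prints `False`, another copy of the package is probably shadowing the local one, and re-running `pip install -e .` inside `segment-anything/` is worth trying.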
## Data Preparation
Please check:
- `data_seg/README.md`
- `data_seg/run.sh`
You have two options:
1. **Use our provided annotations and generate masks locally (recommended)**
   - <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Download `data_seg/anns/*.json` and the other prepared `data_seg` files from Hugging Face: `https://huggingface.co/wayneicloud/SSP-SAM`
   - Use the provided `data_seg/anns/*.json` directly.
   - Generate the `masks` locally by running:
     ```bash
     bash data_seg/run.sh
     ```
2. **Regenerate annotations/masks yourself**
   See the collapsible section below.
<details>
<summary>Generate Annotations/Masks by Yourself (click to expand)</summary>
References:
- `data_seg/README.md`
- `data_seg/run.sh`
- `legacy_data_prep_simrec.md` (legacy reference for raw data preparation and sources)
Required raw annotation folders/files for generation include (examples):
- `data_seg/refcoco/`
- `data_seg/refcoco+/`
- `data_seg/refcocog/`
- `data_seg/refclef/`
Each folder should contain raw files such as `instances.json` and `refs(...).p`.
Minimal expected layout (example):
```text
data_seg/
├── refcoco/
│   ├── instances.json
│   ├── refs(unc).p
│   └── refs(google).p
├── refcoco+/
│   ├── instances.json
│   └── refs(unc).p
├── refcocog/
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(umd).p
└── refclef/
    ├── instances.json
    ├── refs(unc).p
    └── refs(berkeley).p
```
Example preprocessing command:
```bash
python ./data_seg/data_process.py \
--data_root ./data_seg \
--output_dir ./data_seg \
--dataset refcoco \
--split unc \
--generate_mask
```
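
To preprocess every raw dataset in one go, the command above can be generated per dataset/split pair. The sketch below only builds and prints the commands. It assumes that each `refs(<split>).p` file in the layout corresponds to a valid `--split` value; only `refcoco` with `unc` is confirmed by the example above. `RAW_SPLITS` and `build_commands` are hypothetical names.

```python
import shlex

# Assumption: one --split value per refs(<split>).p file in the expected layout.
RAW_SPLITS = {
    "refcoco": ["unc", "google"],
    "refcoco+": ["unc"],
    "refcocog": ["google", "umd"],
    "refclef": ["unc", "berkeley"],
}


def build_commands(data_root="./data_seg", output_dir="./data_seg", generate_mask=True):
    """Build one data_process.py argument list per (dataset, split) pair."""
    commands = []
    for dataset, splits in RAW_SPLITS.items():
        for split in splits:
            cmd = [
                "python", "./data_seg/data_process.py",
                "--data_root", data_root,
                "--output_dir", output_dir,
                "--dataset", dataset,
                "--split", split,
            ]
            if generate_mask:
                cmd.append("--generate_mask")
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print shell-quoted commands; names such as refcoco+ stay intact.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Printing instead of executing makes it easy to review the list first; each entry can then be passed to `subprocess.run(cmd, check=True)`.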
</details>
Detailed dataset path/config settings are defined in the corresponding preprocessing scripts/config files in `data_seg/`.
Please modify them according to your local environment before running.
Also check dataset/image path settings in:
- `datasets/dataset.py`
> Important: in `datasets/dataset.py`, class `VGDataset`, you should update local paths for images/annotations/masks according to your machine.
Example local data organization:
```text
your_project_root/
├── data/                       # set --data_root to this folder
│   ├── coco/
│   │   └── train2014/          # COCO images (unc/unc+/gref/gref_umd/grefcoco)
│   ├── referit/
│   │   └── images/             # ReferIt images
│   ├── VG/                     # Visual Genome images (merge pretrain path)
│   └── vg/                     # Visual Genome images (phrase_cut path, if used)
└── data_seg/                   # same level as data/
    ├── anns/
    │   ├── refcoco.json
    │   ├── refcoco+.json
    │   ├── refcocog_umd.json
    │   ├── refclef.json
    │   └── grefcoco.json
    └── masks/
        ├── refcoco/
        ├── refcoco+/
        ├── refcocog_umd/
        ├── refclef/
        └── grefcoco/
```
For training/testing, use:
- `data_seg/anns/*.json` (provided)
- `data_seg/masks/*` (generated locally via `bash data_seg/run.sh`)
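
Before launching a job, it can be useful to check that the annotation files and mask folders from the example layout are in place. The following is a minimal sketch, assuming the five dataset names shown in the example organization. `DATASETS` and `missing_paths` are hypothetical names, and the paths your configs actually read may differ.

```python
from pathlib import Path

# Assumption: dataset names taken from the example layout in this README.
DATASETS = ["refcoco", "refcoco+", "refcocog_umd", "refclef", "grefcoco"]


def missing_paths(project_root):
    """Return the expected annotation files and mask folders that do not exist yet."""
    root = Path(project_root)
    expected = []
    for name in DATASETS:
        expected.append(root / "data_seg" / "anns" / f"{name}.json")
        expected.append(root / "data_seg" / "masks" / name)
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_paths("."):
        print("missing:", path)
```

An empty output means every expected entry was found. Missing `masks/` folders usually mean `bash data_seg/run.sh` has not been run yet.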
### Required Images and Raw Data Sources
For training/evaluation, you need the corresponding image files locally (COCO/Flickr/ReferIt/VG depending on dataset split and config).
Common sources:
- RefCOCO / RefCOCO+ / RefCOCOg / RefClef annotations: http://bvisionweb1.cs.unc.edu/licheng/referit/data/
- MS COCO 2014 images: https://cocodataset.org/
- Flickr30k images: http://shannon.cs.illinois.edu/DenotationGraph/
- ReferItGame images: due to original dataset restrictions, please download them yourself from the official/authorized source.
- Visual Genome images: https://visualgenome.org/
## Training
Default training launcher:
```bash
bash submit_train.sh
```
`submit_train.sh` already includes commented examples for multiple datasets, e.g.:
- `refcoco`
- `refcoco+`
- `refcocog_umd`
- `referit`
- `grefcoco`
You can also run directly:
```bash
torchrun --nproc_per_node=8 train.py \
--config configs/SSP_SAM_CLIP_B_FT_unc.py \
--clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt
```
### Resume Modes
`train.py` supports two resume modes:
- `--resume <ckpt>`: use this to continue an interrupted training run from its previous checkpoint.
- `--resume_from_pretrain <ckpt>`: use this for loading pretrained weights before fine-tuning/training.
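
The difference between the two flags can be pictured with a small sketch. It follows a common PyTorch-style convention and is not taken from this repository's `train.py`: the checkpoint keys `model`, `optimizer`, and `epoch` and the helper `load_state` are hypothetical, and plain dicts stand in for real state dicts.

```python
def load_state(checkpoint, resume=False):
    """Return the state a training run would start from.

    resume=True  mimics --resume: weights, optimizer state, and epoch counter are restored.
    resume=False mimics --resume_from_pretrain: only weights are loaded; training starts fresh.
    """
    state = {"model": dict(checkpoint["model"]), "optimizer": None, "start_epoch": 0}
    if resume:
        # Assumption: the checkpoint stores the last finished epoch under "epoch".
        state["optimizer"] = checkpoint.get("optimizer")
        state["start_epoch"] = checkpoint.get("epoch", -1) + 1
    return state


if __name__ == "__main__":
    ckpt = {"model": {"w": 1.0}, "optimizer": {"lr": 1e-4}, "epoch": 4}
    print(load_state(ckpt, resume=True)["start_epoch"])
    print(load_state(ckpt, resume=False)["start_epoch"])
```

Under this convention, resuming picks up at the epoch after the saved one, while loading pretrained weights restarts the schedule from epoch 0.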
## Evaluation
Default testing launcher:
```bash
bash submit_test.sh
```
Example direct command:
```bash
torchrun --nproc_per_node=1 --master_port=29590 test.py \
--config configs/SSP_SAM_CLIP_L_FT_unc.py \
--test_split testB \
--clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \
--checkpoint output/your_save_folder/checkpoint_best_miou.pth
```
## Notes
- COCO image path in visualization prioritizes `data/coco/train2014`.
- Current mask prediction/evaluation path uses `512x512` mask space.
- Config files in `configs/` are set with:
  - `output_dir='outputs/your_save_folder'`
  - `batch_size=8`
  - `freeze_epochs=20`
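
The checkpoint name `checkpoint_best_miou.pth` suggests that mean IoU over fixed-size masks is the tracked metric. As an illustration only, the sketch below computes IoU for binary masks of equal size and averages it per sample. It uses nested lists to stay dependency-free, whereas the repository presumably works on tensors; `mask_iou` and `mean_iou` are hypothetical names, and scoring two empty masks as 1.0 is a choice made here, not something the source specifies.

```python
def mask_iou(pred, target):
    """Intersection over union of two binary masks given as equally sized nested lists."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Choice made for this sketch: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average the per-sample IoU over (prediction, target) pairs."""
    ious = [mask_iou(pred, target) for pred, target in pairs]
    return sum(ious) / len(ious)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(mask_iou(pred, target))                            # 0.5
    print(mean_iou([(pred, target), (target, target)]))      # 0.75
```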
## Acknowledgements
This repository benefits from ideas and/or codebases of the following projects:
- SimREC: https://github.com/luogen1996/SimREC
- gRefCOCO: https://github.com/henghuiding/gRefCOCO
- TransVG: https://github.com/djiajunustc/TransVG
- Segment Anything (SAM): https://github.com/facebookresearch/segment-anything
Thanks to the authors for their valuable open-source contributions.
## Citation
If you find this repository useful, please cite our SSP-SAM paper.
```bibtex
@article{ssp_sam_tcsvt,
title={SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation},
author={Tang, Wei and Liu, Xuejing and Sun, Yanpeng and Li, Zechao},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
year={2025}
}
```