---
license: apache-2.0
---
# SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation

<div align="center">
  <a href="https://arxiv.org/abs/xxxx.xxxxx"><img src="https://img.shields.io/badge/arXiv-Coming_Soon-b31b1b?style=flat-square" alt="arXiv"></a>
  <a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Checkpoint-yellow?style=flat-square" alt="HF Checkpoint"></a>
  <a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Dataset-orange?style=flat-square" alt="HF Dataset"></a>
  <img src="https://img.shields.io/badge/License-Apache--2.0-green?style=flat-square" alt="License">
</div>

<div align="center">
    <a href='https://scholar.google.com/citations?user=D-27eLIAAAAJ&hl=zh-CN' target='_blank'>Wei Tang</a><sup>1</sup>&emsp;
    <a href='https://scholar.google.com.hk/citations?hl=zh-CN&user=SVQYcYcAAAAJ' target='_blank'>Xuejing Liu</a><sup>&#x2709;,2</sup>&emsp;
    <a href='https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN' target='_blank'>Yanpeng Sun</a><sup>3</sup>&emsp;
    <a href='https://imag-njust.net/zechaoli/' target='_blank'>Zechao Li</a><sup>&#x2709;,1</sup>
</div>

<div align="center">
    <sup>1</sup>Nanjing University of Science and Technology;&emsp;
    <sup>2</sup>Institute of Computing Technology, Chinese Academy of Sciences;&emsp;
    <sup>3</sup>NExT++ Lab, National University of Singapore
    <br>
    <sup>&#x2709;</sup> Corresponding Authors
</div>

---

## Overview

This repository provides the codebase of **SSP-SAM**, a referring expression segmentation framework built on top of SAM with semantic-spatial prompts.

Current repo status:
- Training/testing/data processing scripts are available.
- Multiple dataset configs are provided under `configs/`.

## 💥 News

- **17 Mar, 2026**: Open-source codebase has been organized and released.
- **4 Dec, 2025**: SSP-SAM paper accepted by IEEE TCSVT.

## 📌 ToDo

- [X] Release final model checkpoints on Hugging Face
- [X] Release processed training/evaluation metadata
- [X] Release arXiv version

## 🔗 Model Zoo & Links

- Paper: `https://arxiv.org/abs/xxxx.xxxxx`
- <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Hugging Face Checkpoints/datasets: `https://huggingface.co/wayneicloud/SSP-SAM`

## 📁 Project Structure

```text
.
├── configs/                 # training/evaluation configs
├── data_seg/                # data preprocessing scripts and generated anns/masks
├── datasets/                # dataloader and transforms
├── models/                  # SSP_SAM model definitions
├── segment-anything/        # modified SAM dependency (editable install)
├── train.py                 # training entry
├── test.py                  # evaluation entry
├── submit_train.sh          # train launcher (with examples)
└── submit_test.sh           # test launcher (with examples)
```

## ⚙️ Environment Setup

Recommended: conda environment on macOS/Linux.

```bash
conda create -n ssp_sam python=3.10 -y
conda activate ssp_sam
pip install --upgrade pip

# 1) install PyTorch (CUDA example: cu121)
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121

# 2) install modified segment-anything first
cd segment-anything
pip install -e .
cd ..

# 3) install remaining dependencies
pip install -r requirements.txt
```

> Note: the `segment-anything` code in this repository has been modified based on the original SAM implementation.  
> Please install the local `segment-anything` in editable mode (`pip install -e .`) as shown above.
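
After installation, a quick import check can catch a missing or shadowed package before a run is launched. The sketch below assumes the editable install exposes the module name `segment_anything` (as the upstream SAM package does) and that `torch` and `torchvision` are the other imports of interest; `check_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Module names are assumptions; adjust them to match your environment.
    status = check_modules(["torch", "torchvision", "segment_anything"])
    for name, found in status.items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only confirms that the modules can be found on the import path; it does not verify versions or CUDA availability.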

## 🧩 Data Preparation

Please check:
- `data_seg/README.md`
- `data_seg/run.sh`

You have two options:

1. **Use our provided annotations + generate masks locally (recommended)**  
   - <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Download `data_seg/anns/*.json` and other prepared `data_seg` files from Hugging Face:  
     `https://huggingface.co/wayneicloud/SSP-SAM`
   - You can directly use our `data_seg/anns/*.json`.
   - `masks` should be generated on your side by running:
     ```bash
     bash data_seg/run.sh
     ```

2. **Regenerate annotations/masks yourself**  
   See the collapsible section below.

<details>
<summary>Generate Annotations/Masks Yourself (click to expand)</summary>

References:
- `data_seg/README.md`
- `data_seg/run.sh`
- `legacy_data_prep_simrec.md` (legacy reference for raw data preparation and sources)

Required raw annotation folders/files for generation include (examples):
- `data_seg/refcoco/`
- `data_seg/refcoco+/`
- `data_seg/refcocog/`
- `data_seg/refclef/`

Each folder should contain raw files such as `instances.json` and `refs(...).p`.

Minimal expected layout (example):

```text
data_seg/
├── refcoco/
│   ├── instances.json
│   ├── refs(unc).p
│   └── refs(google).p
├── refcoco+/
│   ├── instances.json
│   └── refs(unc).p
├── refcocog/
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(umd).p
└── refclef/
    ├── instances.json
    ├── refs(unc).p
    └── refs(berkeley).p
```
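
Before running the preprocessing script, it can help to confirm that the raw files are where the layout above expects them. The following is a minimal sketch: `EXPECTED_RAW` simply mirrors the example tree, and `missing_raw_files` is a hypothetical helper rather than something shipped in `data_seg/`.

```python
from pathlib import Path

# Mirrors the example layout above; extend or trim it for the datasets you use.
EXPECTED_RAW = {
    "refcoco": ["instances.json", "refs(unc).p", "refs(google).p"],
    "refcoco+": ["instances.json", "refs(unc).p"],
    "refcocog": ["instances.json", "refs(google).p", "refs(umd).p"],
    "refclef": ["instances.json", "refs(unc).p", "refs(berkeley).p"],
}


def missing_raw_files(data_root, expected=EXPECTED_RAW):
    """Return the sorted list of expected raw files that are absent under data_root."""
    root = Path(data_root)
    missing = []
    for dataset, files in expected.items():
        for name in files:
            if not (root / dataset / name).is_file():
                missing.append(str(Path(dataset) / name))
    return sorted(missing)


if __name__ == "__main__":
    for rel_path in missing_raw_files("./data_seg"):
        print(f"missing: {rel_path}")
```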

Example preprocessing command:

```bash
python ./data_seg/data_process.py \
  --data_root ./data_seg \
  --output_dir ./data_seg \
  --dataset refcoco \
  --split unc \
  --generate_mask
```

</details>

Detailed dataset path/config settings are defined in the corresponding preprocessing scripts/config files in `data_seg/`.  
Please modify them according to your local environment before running.
Also check dataset/image path settings in:
- `datasets/dataset.py`

> Important: in `datasets/dataset.py`, class `VGDataset`, you should update local paths for images/annotations/masks according to your machine.

Example local data organization:

```text
your_project_root/
├── data/                                        # set --data_root to this folder
│   ├── coco/
│   │   └── train2014/                           # COCO images (unc/unc+/gref/gref_umd/grefcoco)
│   ├── referit/
│   │   └── images/                              # ReferIt images
│   ├── VG/                                      # Visual Genome images (merge pretrain path)
│   └── vg/                                      # Visual Genome images (phrase_cut path, if used)
└── data_seg/                                    # same level as data/
    ├── anns/
    │   ├── refcoco.json
    │   ├── refcoco+.json
    │   ├── refcocog_umd.json
    │   ├── refclef.json
    │   └── grefcoco.json
    └── masks/
        ├── refcoco/
        ├── refcoco+/
        ├── refcocog_umd/
        ├── refclef/
        └── grefcoco/
```

For training/testing, use:
- `data_seg/anns/*.json` (provided)
- `data_seg/masks/*` (generated locally via `bash data_seg/run.sh`)
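
A small consistency check along these lines can confirm that every annotation file has a matching mask folder. It assumes the `anns/<name>.json` and `masks/<name>/` pairing shown in the example layout; `datasets_missing_masks` is a hypothetical helper.

```python
from pathlib import Path


def datasets_missing_masks(data_seg_root):
    """List dataset names that have anns/<name>.json but no non-empty masks/<name>/."""
    root = Path(data_seg_root)
    missing = []
    for ann_file in sorted((root / "anns").glob("*.json")):
        mask_dir = root / "masks" / ann_file.stem
        # A mask folder counts as present only if it exists and holds at least one entry.
        if not mask_dir.is_dir() or not any(mask_dir.iterdir()):
            missing.append(ann_file.stem)
    return missing


if __name__ == "__main__":
    for name in datasets_missing_masks("./data_seg"):
        print(f"no masks generated yet for: {name}")
```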

### Required Images and Raw Data Sources

For training/evaluation, you need the corresponding image files locally (COCO/Flickr/ReferIt/VG depending on dataset split and config).  
Common sources:
- RefCOCO / RefCOCO+ / RefCOCOg / RefClef annotations: http://bvisionweb1.cs.unc.edu/licheng/referit/data/
- MS COCO 2014 images: https://cocodataset.org/
- Flickr30k images: http://shannon.cs.illinois.edu/DenotationGraph/
- ReferItGame images: due to the original dataset's restrictions, please download them yourself from the official/authorized source.
- Visual Genome images: https://visualgenome.org/

## 🚀 Training

Default training launcher:

```bash
bash submit_train.sh
```

`submit_train.sh` already includes commented examples for multiple datasets, e.g.:
- `refcoco`
- `refcoco+`
- `refcocog_umd`
- `referit`
- `grefcoco`

You can also run directly:

```bash
torchrun --nproc_per_node=8 train.py \
  --config configs/SSP_SAM_CLIP_B_FT_unc.py \
  --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt
```

### Resume Modes

`train.py` supports two resume modes:
- `--resume <ckpt>`: use this to continue an interrupted training run from its previous checkpoint.
- `--resume_from_pretrain <ckpt>`: use this for loading pretrained weights before fine-tuning/training.
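
The practical difference between two such flags is usually which parts of a checkpoint get restored. The sketch below illustrates that distinction with plain dictionaries; the checkpoint keys (`model`, `optimizer`, `epoch`) and the function `restore_state` are assumptions for illustration and may not match what `train.py` actually stores or does.

```python
def restore_state(checkpoint, mode):
    """Pick what to restore from a checkpoint dict for a given resume mode.

    mode="resume": continue an interrupted run (weights, optimizer, next epoch).
    mode="resume_from_pretrain": load weights only and start from epoch 0.
    """
    if mode == "resume":
        return {
            "model": checkpoint["model"],
            "optimizer": checkpoint.get("optimizer"),
            "start_epoch": checkpoint.get("epoch", -1) + 1,
        }
    if mode == "resume_from_pretrain":
        # Optimizer state and epoch counter are deliberately not carried over.
        return {"model": checkpoint["model"], "optimizer": None, "start_epoch": 0}
    raise ValueError(f"unknown resume mode: {mode}")


if __name__ == "__main__":
    ckpt = {"model": {"w": 1.0}, "optimizer": {"lr": 1e-4}, "epoch": 11}
    print(restore_state(ckpt, "resume")["start_epoch"])                # 12
    print(restore_state(ckpt, "resume_from_pretrain")["start_epoch"])  # 0
```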

## 📊 Evaluation

Default testing launcher:

```bash
bash submit_test.sh
```

Example direct command:

```bash
torchrun --nproc_per_node=1 --master_port=29590 test.py \
  --config configs/SSP_SAM_CLIP_L_FT_unc.py \
  --test_split testB \
  --clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \
  --checkpoint output/your_save_folder/checkpoint_best_miou.pth
```
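
To evaluate one checkpoint on several splits, the command above can be assembled programmatically. This is a sketch only: `build_test_command` is a hypothetical helper, the `val`/`testA`/`testB` names are the standard RefCOCO (UNC) splits and are assumed to be accepted by `--test_split`, and the checkpoint path is the placeholder from the example.

```python
import shlex


def build_test_command(config, split, clip_pretrained, checkpoint, master_port=29590):
    """Return the torchrun argument list for one evaluation run."""
    return [
        "torchrun", "--nproc_per_node=1", f"--master_port={master_port}",
        "test.py",
        "--config", config,
        "--test_split", split,
        "--clip_pretrained", clip_pretrained,
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    for split in ["val", "testA", "testB"]:  # assumed split names
        cmd = build_test_command(
            config="configs/SSP_SAM_CLIP_L_FT_unc.py",
            split=split,
            clip_pretrained="pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt",
            checkpoint="output/your_save_folder/checkpoint_best_miou.pth",
        )
        # Printed for inspection; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```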

## 📝 Notes

- Visualization looks for COCO images in `data/coco/train2014` first.
- Mask prediction and evaluation currently operate in a `512x512` mask space.
- Config files in `configs/` are set with:
  - `output_dir='outputs/your_save_folder'`
  - `batch_size=8`
  - `freeze_epochs=20`
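
Because predictions and ground truth are compared in a fixed `512x512` mask space, masks of other sizes have to be resampled before scoring. The sketch below shows one way to do that, using nearest-neighbour resizing followed by IoU on binary masks. It uses plain Python lists for clarity, and both `resize_nearest` and `mask_iou` are hypothetical helpers, not the repository's evaluation code.

```python
def resize_nearest(mask, out_h=512, out_w=512):
    """Nearest-neighbour resize of a 2D binary mask given as a list of rows."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def mask_iou(pred, target):
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match by convention here.
    return inter / union if union else 1.0


if __name__ == "__main__":
    pred = resize_nearest([[1, 0], [0, 0]])    # top-left quadrant set
    target = resize_nearest([[1, 1], [0, 0]])  # top half set
    print(mask_iou(pred, target))  # 0.5
```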

## 🌈 Acknowledgements

This repository benefits from ideas and/or codebases of the following projects:

- SimREC: https://github.com/luogen1996/SimREC
- gRefCOCO: https://github.com/henghuiding/gRefCOCO
- TransVG: https://github.com/djiajunustc/TransVG
- Segment Anything (SAM): https://github.com/facebookresearch/segment-anything

Thanks to the authors for their valuable open-source contributions.

## 📚 Citation

If you find this repository useful, please cite our SSP-SAM paper.

```bibtex
@article{ssp_sam_tcsvt,
  title={SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation},
  author={Tang, Wei and Liu, Xuejing and Sun, Yanpeng and Li, Zechao},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2025}
}
```