---
license: apache-2.0
---
# SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation

<div align="center">
<a href="https://arxiv.org/abs/xxxx.xxxxx"><img src="https://img.shields.io/badge/arXiv-Coming_Soon-b31b1b?style=flat-square" alt="arXiv"></a>
<a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Checkpoint-yellow?style=flat-square" alt="HF Checkpoint"></a>
<a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Dataset-orange?style=flat-square" alt="HF Dataset"></a>
<img src="https://img.shields.io/badge/License-Apache--2.0-green?style=flat-square" alt="License">
</div>

<div align="center">
<a href='https://scholar.google.com/citations?user=D-27eLIAAAAJ&hl=zh-CN' target='_blank'>Wei Tang</a><sup>1</sup>&emsp;
<a href='https://scholar.google.com.hk/citations?hl=zh-CN&user=SVQYcYcAAAAJ' target='_blank'>Xuejing Liu</a><sup>&#x2709;,2</sup>&emsp;
<a href='https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN' target='_blank'>Yanpeng Sun</a><sup>3</sup>&emsp;
<a href='https://imag-njust.net/zechaoli/' target='_blank'>Zechao Li</a><sup>&#x2709;,1</sup>
</div>

<div align="center">
<sup>1</sup>Nanjing University of Science and Technology;&emsp;
<sup>2</sup>Institute of Computing Technology, Chinese Academy of Sciences;&emsp;
<sup>3</sup>NExT++ Lab, National University of Singapore
<br>
<sup>&#x2709;</sup> Corresponding Authors
</div>

---

## Overview

This repository provides the codebase of **SSP-SAM**, a referring expression segmentation framework that builds on SAM (Segment Anything) by adding semantic-spatial prompts.

Current repo status:
- Training, testing, and data processing scripts are available.
- Configs for multiple datasets are provided under `configs/`.

## 💥 News

- **17 Mar, 2026**: Open-source codebase organized and released.
- **4 Dec, 2025**: SSP-SAM paper accepted by IEEE TCSVT.

## 📌 ToDo

- [X] Release final model checkpoints on Hugging Face
- [X] Release processed training/evaluation metadata
- [ ] Release arXiv version

## 🔗 Model Zoo & Links

- Paper (arXiv, coming soon): `https://arxiv.org/abs/xxxx.xxxxx`
- <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Hugging Face checkpoints/datasets: `https://huggingface.co/wayneicloud/SSP-SAM`

## 📁 Project Structure

```text
.
├── configs/             # training/evaluation configs
├── data_seg/            # data preprocessing scripts and generated anns/masks
├── datasets/            # dataloader and transforms
├── models/              # SSP_SAM model definitions
├── segment-anything/    # modified SAM dependency (editable install)
├── train.py             # training entry
├── test.py              # evaluation entry
├── submit_train.sh      # train launcher (with examples)
└── submit_test.sh       # test launcher (with examples)
```

## ⚙️ Environment Setup

Recommended: a conda environment on Linux or macOS. The PyTorch command below installs CUDA 12.1 (`cu121`) wheels, which are not available for macOS; adjust the version tags and index URL to match your CUDA version, or install the default PyTorch build on macOS.

```bash
conda create -n ssp_sam python=3.10 -y
conda activate ssp_sam
pip install --upgrade pip

# 1) install PyTorch (CUDA example: cu121)
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121

# 2) install the modified segment-anything first
cd segment-anything
pip install -e .
cd ..

# 3) install the remaining dependencies
pip install -r requirements.txt
```
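To confirm the installation, you can check that the key packages resolve inside the `ssp_sam` environment. The sketch below uses only the standard library; the module names are assumptions (`segment_anything` is the upstream SAM import name, which the modified local copy is assumed to keep).

```python
import importlib.util
from typing import List, Sequence

# Assumed top-level import names; segment_anything is upstream SAM's package name.
DEFAULT_MODULES = ("torch", "torchvision", "segment_anything")


def missing_modules(modules: Sequence[str] = DEFAULT_MODULES) -> List[str]:
    """Return the top-level modules that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report(modules: Sequence[str] = DEFAULT_MODULES) -> None:
    """Print missing modules and, if torch is importable, whether CUDA is visible."""
    missing = missing_modules(modules)
    print("Missing modules:", ", ".join(missing) if missing else "none")
    if "torch" in modules and "torch" not in missing:
        import torch  # imported lazily so the check itself has no hard dependency

        print("CUDA available:", torch.cuda.is_available())
```

Calling `report()` should print `Missing modules: none`; if `segment_anything` is listed, rerun the editable install in step 2.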

> Note: the `segment-anything` code in this repository is a modified version of the original SAM implementation.
> Install this local copy in editable mode (`pip install -e .`) as shown above.

## 🧩 Data Preparation

Please check:
- `data_seg/README.md`
- `data_seg/run.sh`

You have two options:

1. **Use our provided annotations and generate masks locally (recommended)**
   - <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Download `data_seg/anns/*.json` and the other prepared `data_seg` files from Hugging Face:
     `https://huggingface.co/wayneicloud/SSP-SAM`
   - You can use our `data_seg/anns/*.json` directly.
   - Generate the `masks` locally by running:
     ```bash
     bash data_seg/run.sh
     ```

2. **Regenerate annotations and masks yourself**
   See the collapsible section below.
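
If you prefer to fetch the option 1 files from Python rather than the web page, `huggingface_hub.snapshot_download` can download a filtered subset of a Hub repository. This is a sketch that assumes the Hub repository stores these files under a `data_seg/` folder mirroring the local layout; check the repository's file list and adjust the patterns if it differs. `download_kwargs` and `fetch_data_seg` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional

# Hub repository from the links above.
REPO_ID = "wayneicloud/SSP-SAM"


def download_kwargs(local_dir: str = ".", patterns: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build snapshot_download arguments; the default pattern assumes a data_seg/ folder on the Hub."""
    return {
        "repo_id": REPO_ID,
        "allow_patterns": patterns if patterns is not None else ["data_seg/*"],
        "local_dir": local_dir,
    }


def fetch_data_seg(local_dir: str = ".", patterns: Optional[List[str]] = None) -> str:
    """Download the matching files and return the local directory path."""
    from huggingface_hub import snapshot_download  # requires: pip install huggingface_hub

    return snapshot_download(**download_kwargs(local_dir, patterns))
```

For example, `fetch_data_seg(patterns=["data_seg/anns/*.json"])` run from the project root would restrict the download to the annotation files; the masks are still generated with `bash data_seg/run.sh`.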

<details>
<summary>Generate Annotations/Masks Yourself (click to expand)</summary>

References:
- `data_seg/README.md`
- `data_seg/run.sh`
- `legacy_data_prep_simrec.md` (legacy reference for raw data preparation and sources)

The raw annotation folders required for generation include, for example:
- `data_seg/refcoco/`
- `data_seg/refcoco+/`
- `data_seg/refcocog/`
- `data_seg/refclef/`

Each folder should contain raw files such as `instances.json` and `refs(...).p`.

Minimal expected layout (example):

```text
data_seg/
├── refcoco/
│   ├── instances.json
│   ├── refs(unc).p
│   └── refs(google).p
├── refcoco+/
│   ├── instances.json
│   └── refs(unc).p
├── refcocog/
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(umd).p
└── refclef/
    ├── instances.json
    ├── refs(unc).p
    └── refs(berkeley).p
```

Example preprocessing command:

```bash
python ./data_seg/data_process.py \
    --data_root ./data_seg \
    --output_dir ./data_seg \
    --dataset refcoco \
    --split unc \
    --generate_mask
```
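Before running the preprocessing command, a quick check can confirm that the raw files from the expected layout are present. The sketch below simply encodes the example tree above and is not part of the repository's scripts; `missing_raw_files` is a hypothetical helper. Pass `data_root` the same folder you give `--data_root`, and limit `datasets` to the ones you actually plan to process.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Expected raw files per dataset folder, copied from the example layout above.
RAW_LAYOUT: Dict[str, List[str]] = {
    "refcoco": ["instances.json", "refs(unc).p", "refs(google).p"],
    "refcoco+": ["instances.json", "refs(unc).p"],
    "refcocog": ["instances.json", "refs(google).p", "refs(umd).p"],
    "refclef": ["instances.json", "refs(unc).p", "refs(berkeley).p"],
}


def missing_raw_files(data_root: str, datasets: Optional[Iterable[str]] = None) -> List[str]:
    """Return 'dataset/file' entries from RAW_LAYOUT that do not exist under data_root."""
    root = Path(data_root)
    names = list(datasets) if datasets is not None else list(RAW_LAYOUT)
    missing = []
    for name in names:
        for fname in RAW_LAYOUT[name]:
            if not (root / name / fname).is_file():
                missing.append(f"{name}/{fname}")
    return missing
```

An empty result from `missing_raw_files("./data_seg", ["refcoco"])` means the `refcoco` inputs for the command above are in place.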

</details>

Dataset path and config settings are defined in the corresponding preprocessing scripts and config files in `data_seg/`; adjust them to your local environment before running.
Also check the dataset/image path settings in:
- `datasets/dataset.py`

> Important: in `datasets/dataset.py`, update the image, annotation, and mask paths in the `VGDataset` class to match your machine.

Example local data organization:

```text
your_project_root/
├── data/                   # set --data_root to this folder
│   ├── coco/
│   │   └── train2014/      # COCO images (unc/unc+/gref/gref_umd/grefcoco)
│   ├── referit/
│   │   └── images/         # ReferIt images
│   ├── VG/                 # Visual Genome images (merge pretrain path)
│   └── vg/                 # Visual Genome images (phrase_cut path, if used)
└── data_seg/               # same level as data/
    ├── anns/
    │   ├── refcoco.json
    │   ├── refcoco+.json
    │   ├── refcocog_umd.json
    │   ├── refclef.json
    │   └── grefcoco.json
    └── masks/
        ├── refcoco/
        ├── refcoco+/
        ├── refcocog_umd/
        ├── refclef/
        └── grefcoco/
```
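Since each `anns/<name>.json` file in this layout has a matching `masks/<name>/` folder, a small helper can report which datasets still need their masks generated. This sketch assumes that naming convention holds for every dataset and only checks that the mask folder exists and is non-empty; it does not validate individual mask files. `ann_mask_status` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def ann_mask_status(data_seg: str) -> Dict[str, bool]:
    """Map each anns/<name>.json to True if masks/<name>/ exists and is non-empty."""
    root = Path(data_seg)
    status: Dict[str, bool] = {}
    for ann in sorted((root / "anns").glob("*.json")):
        mask_dir = root / "masks" / ann.stem
        status[ann.stem] = mask_dir.is_dir() and any(mask_dir.iterdir())
    return status
```

`ann_mask_status("data_seg")` marks datasets whose masks are missing as `False`; running `bash data_seg/run.sh` should fill them in.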

For training/testing, use:
- `data_seg/anns/*.json` (provided)
- `data_seg/masks/*` (generated locally via `bash data_seg/run.sh`)

### Required Images and Raw Data Sources

Training and evaluation need the corresponding image files locally (COCO, Flickr30k, ReferIt, or Visual Genome, depending on the dataset split and config).
Common sources:
- RefCOCO / RefCOCO+ / RefCOCOg / RefClef annotations: http://bvisionweb1.cs.unc.edu/licheng/referit/data/
- MS COCO 2014 images: https://cocodataset.org/
- Flickr30k images: http://shannon.cs.illinois.edu/DenotationGraph/
- ReferItGame images: because of the original dataset's restrictions, please download them yourself from the official or an authorized source.
- Visual Genome images: https://visualgenome.org/

## 🚀 Training

Default training launcher:

```bash
bash submit_train.sh
```

`submit_train.sh` includes commented-out examples for multiple datasets, e.g.:
- `refcoco`
- `refcoco+`
- `refcocog_umd`
- `referit`
- `grefcoco`

You can also launch training directly:

```bash
torchrun --nproc_per_node=8 train.py \
    --config configs/SSP_SAM_CLIP_B_FT_unc.py \
    --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt
```
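When launching several runs from Python (for example, one per config), building the same `torchrun` call as an argument list avoids shell-quoting problems. A minimal sketch that mirrors only the flags shown above; `train_command` and `launch` are hypothetical helpers, not part of the repository.

```python
import shlex
import subprocess
from typing import List


def train_command(config: str, clip_ckpt: str, nproc: int = 8) -> List[str]:
    """Mirror the direct torchrun call above as an argument list."""
    return [
        "torchrun", f"--nproc_per_node={nproc}", "train.py",
        "--config", config,
        "--clip_pretrained", clip_ckpt,
    ]


def launch(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it (from the repo root) only if dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `launch` only prints the command; `dry_run=False` runs it and raises `CalledProcessError` if training exits with an error.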

### Resume Modes

`train.py` supports two resume modes:
- `--resume <ckpt>`: continue an interrupted training run from its last checkpoint.
- `--resume_from_pretrain <ckpt>`: load pretrained weights as the starting point for fine-tuning/training.

## 📊 Evaluation

Default testing launcher:

```bash
bash submit_test.sh
```

Example direct command:

```bash
torchrun --nproc_per_node=1 --master_port=29590 test.py \
    --config configs/SSP_SAM_CLIP_L_FT_unc.py \
    --test_split testB \
    --clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \
    --checkpoint outputs/your_save_folder/checkpoint_best_miou.pth
```
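The checkpoint name `checkpoint_best_miou.pth` refers to mIoU. Referring segmentation work commonly reports mIoU (the mean of per-expression IoU) together with overall IoU (oIoU, total intersection over total union across the split). The sketch below computes both from binary masks in plain Python; it is not `test.py`'s implementation, and scoring an empty prediction on an empty target as IoU 1.0 is an assumption (relevant to gRefCOCO's no-target expressions, which `test.py` may handle differently).

```python
from typing import Iterable, List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask as nested rows of 0/1


def inter_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersection and union pixels of two same-shaped binary masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("pred and gt must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter, union


def miou_oiou(pairs: Iterable[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Return (mIoU, oIoU) over (prediction, ground truth) mask pairs."""
    ious: List[float] = []
    total_inter = total_union = 0
    for pred, gt in pairs:
        inter, union = inter_union(pred, gt)
        # Assumption: an empty prediction on an empty target counts as a perfect match.
        ious.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    if not ious:
        raise ValueError("no mask pairs given")
    oiou = total_inter / total_union if total_union else 1.0
    return sum(ious) / len(ious), oiou
```

The two metrics weight samples differently: mIoU counts every expression equally, while oIoU is dominated by large objects because it pools pixels across the whole split.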

## 📝 Notes

- For visualization, COCO images are looked up in `data/coco/train2014` first.
- Mask prediction and evaluation currently operate in a `512x512` mask space.
- Config files in `configs/` are set with:
  - `output_dir='outputs/your_save_folder'`
  - `batch_size=8`
  - `freeze_epochs=20`
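
Since predictions and evaluation live in a `512x512` mask space, masks from other resolutions have to be mapped onto that grid before they can be compared pixel by pixel. This README does not specify how the repository does the mapping (a plain resize, or SAM-style longest-side resize with padding); the sketch below shows only a plain nearest-neighbour resize, which keeps values strictly binary, and should be read as an illustration rather than the project's preprocessing. `resize_nearest` is a hypothetical helper name.

```python
from typing import List, Sequence


def resize_nearest(mask: Sequence[Sequence[int]], out_h: int = 512, out_w: int = 512) -> List[List[int]]:
    """Nearest-neighbour resize of a 2-D mask; no interpolation, so 0/1 values stay 0/1."""
    if not mask or not mask[0]:
        raise ValueError("mask must be non-empty")
    in_h, in_w = len(mask), len(mask[0])
    # Sample the input pixel under the centre of each output pixel.
    return [
        [mask[((2 * y + 1) * in_h) // (2 * out_h)][((2 * x + 1) * in_w) // (2 * out_w)]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```

Centre sampling makes integer-factor upsampling and the matching downsampling exact inverses, so a mask scaled up and back down is unchanged.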

## 🌈 Acknowledgements

This repository benefits from ideas and/or codebases of the following projects:

- SimREC: https://github.com/luogen1996/SimREC
- gRefCOCO: https://github.com/henghuiding/gRefCOCO
- TransVG: https://github.com/djiajunustc/TransVG
- Segment Anything (SAM): https://github.com/facebookresearch/segment-anything

Thanks to the authors for their valuable open-source contributions.

## 📚 Citation

If you find this repository useful, please cite our SSP-SAM paper:

```bibtex
@article{ssp_sam_tcsvt,
  title={SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation},
  author={Tang, Wei and Liu, Xuejing and Sun, Yanpeng and Li, Zechao},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2025}
}
```