---
license: apache-2.0
---

# SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation

<div align="center">
<a href="https://arxiv.org/abs/xxxx.xxxxx"><img src="https://img.shields.io/badge/arXiv-Coming_Soon-b31b1b?style=flat-square" alt="arXiv"></a>
<a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Checkpoint-yellow?style=flat-square" alt="HF Checkpoint"></a>
<a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Dataset-orange?style=flat-square" alt="HF Dataset"></a>
<img src="https://img.shields.io/badge/License-Apache--2.0-green?style=flat-square" alt="License">
</div>

<div align="center">
<a href='https://scholar.google.com/citations?user=D-27eLIAAAAJ&hl=zh-CN' target='_blank'>Wei Tang</a><sup>1</sup>
<a href='https://scholar.google.com.hk/citations?hl=zh-CN&user=SVQYcYcAAAAJ' target='_blank'>Xuejing Liu</a><sup>✉,2</sup>
<a href='https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN' target='_blank'>Yanpeng Sun</a><sup>3</sup>
<a href='https://imag-njust.net/zechaoli/' target='_blank'>Zechao Li</a><sup>✉,1</sup>
</div>

<div align="center">
<sup>1</sup>Nanjing University of Science and Technology;
<sup>2</sup>Institute of Computing Technology, Chinese Academy of Sciences;
<sup>3</sup>NExT++ Lab, National University of Singapore
<br>
<sup>✉</sup> Corresponding Authors
</div>

---

## Overview

This repository provides the codebase for **SSP-SAM**, a referring expression segmentation framework that builds on SAM and guides it with semantic-spatial prompts.

Current repo status:
- Training/testing/data processing scripts are available.
- Multiple dataset configs are provided under `configs/`.

## 🔥 News

- **Mar 17, 2026**: The open-source codebase has been organized and released.
- **Dec 4, 2025**: The SSP-SAM paper was accepted by IEEE TCSVT.

## 📝 ToDo

- [X] Release final model checkpoints on Hugging Face
- [X] Release processed training/evaluation metadata
- [X] Release arXiv version

## 🔗 Model Zoo & Links

- Paper: `https://arxiv.org/abs/xxxx.xxxxx` (arXiv link coming soon)
- <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Hugging Face checkpoints/datasets: `https://huggingface.co/wayneicloud/SSP-SAM`

## 📁 Project Structure

```text
.
├── configs/             # training/evaluation configs
├── data_seg/            # data preprocessing scripts and generated anns/masks
├── datasets/            # dataloader and transforms
├── models/              # SSP_SAM model definitions
├── segment-anything/    # modified SAM dependency (editable install)
├── train.py             # training entry
├── test.py              # evaluation entry
├── submit_train.sh      # train launcher (with examples)
└── submit_test.sh       # test launcher (with examples)
```

## ⚙️ Environment Setup

Recommended: a conda environment on macOS/Linux.

```bash
conda create -n ssp_sam python=3.10 -y
conda activate ssp_sam
pip install --upgrade pip

# 1) install PyTorch (CUDA example: cu121)
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121

# 2) install the modified segment-anything first
cd segment-anything
pip install -e .
cd ..

# 3) install the remaining dependencies
pip install -r requirements.txt
```

> Note: the `segment-anything` code in this repository is modified from the original SAM implementation.
> Please install the local `segment-anything` in editable mode (`pip install -e .`) as shown above.
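
As an optional sanity check (assuming the local package keeps the upstream import name `segment_anything`), verify that PyTorch and the modified SAM installed correctly:

```bash
# check the PyTorch build and whether CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# check that the locally installed (modified) segment-anything imports
python -c "import segment_anything; print('segment-anything OK')"
```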

## 🧩 Data Preparation

Please check:
- `data_seg/README.md`
- `data_seg/run.sh`

You have two options:

1. **Use our provided annotations and generate masks locally (recommended)**
   - <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Download `data_seg/anns/*.json` and the other prepared `data_seg` files from Hugging Face: `https://huggingface.co/wayneicloud/SSP-SAM`
   - The downloaded `data_seg/anns/*.json` files can be used directly.
   - Generate the `masks` locally by running:
     ```bash
     bash data_seg/run.sh
     ```

2. **Regenerate annotations/masks by yourself**
   See the collapsible section below.

<details>
<summary>Generate Annotations/Masks by Yourself (click to expand)</summary>

References:
- `data_seg/README.md`
- `data_seg/run.sh`
- `legacy_data_prep_simrec.md` (legacy reference for raw data preparation and sources)

Required raw annotation folders/files include, for example:
- `data_seg/refcoco/`
- `data_seg/refcoco+/`
- `data_seg/refcocog/`
- `data_seg/refclef/`

Each folder should contain raw files such as `instances.json` and `refs(...).p`.

Minimal expected layout (example):

```text
data_seg/
├── refcoco/
│   ├── instances.json
│   ├── refs(unc).p
│   └── refs(google).p
├── refcoco+/
│   ├── instances.json
│   └── refs(unc).p
├── refcocog/
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(umd).p
└── refclef/
    ├── instances.json
    ├── refs(unc).p
    └── refs(berkeley).p
```

Example preprocessing command:

```bash
python ./data_seg/data_process.py \
    --data_root ./data_seg \
    --output_dir ./data_seg \
    --dataset refcoco \
    --split unc \
    --generate_mask
```
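
To cover every dataset, the same command can be looped over dataset/split pairs. The pairs below are a sketch inferred from the `refs(...).p` files in the layout above; treat `data_seg/run.sh` as the authoritative list:

```bash
# Sketch: one preprocessing run per dataset/split pair (pairs inferred from
# the raw annotation files above; see data_seg/run.sh for the canonical list).
for pair in "refcoco unc" "refcoco+ unc" "refcocog umd" "refclef unc"; do
  set -- $pair   # $1 = dataset, $2 = split
  python ./data_seg/data_process.py \
      --data_root ./data_seg \
      --output_dir ./data_seg \
      --dataset "$1" \
      --split "$2" \
      --generate_mask
done
```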

</details>

Detailed dataset path/config settings are defined in the corresponding preprocessing scripts and config files under `data_seg/`; modify them to match your local environment before running.
Also check the dataset/image path settings in:
- `datasets/dataset.py`

> Important: in `datasets/dataset.py`, class `VGDataset`, update the local paths for images/annotations/masks to match your machine.
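
To find the spots to edit quickly, a simple search over the identifiers mentioned above works:

```bash
# list path-related settings in the dataset definition
grep -n -E "VGDataset|train2014|data_seg|referit" datasets/dataset.py
```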

Example local data organization:

```text
your_project_root/
├── data/                      # set --data_root to this folder
│   ├── coco/
│   │   └── train2014/         # COCO images (unc/unc+/gref/gref_umd/grefcoco)
│   ├── referit/
│   │   └── images/            # ReferIt images
│   ├── VG/                    # Visual Genome images (merge pretrain path)
│   └── vg/                    # Visual Genome images (phrase_cut path, if used)
└── data_seg/                  # same level as data/
    ├── anns/
    │   ├── refcoco.json
    │   ├── refcoco+.json
    │   ├── refcocog_umd.json
    │   ├── refclef.json
    │   └── grefcoco.json
    └── masks/
        ├── refcoco/
        ├── refcoco+/
        ├── refcocog_umd/
        ├── refclef/
        └── grefcoco/
```

For training/testing, use:
- `data_seg/anns/*.json` (provided)
- `data_seg/masks/*` (generated locally via `bash data_seg/run.sh`)
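
A quick check that everything is in place before launching training:

```bash
# provided annotation files should be present
ls data_seg/anns/*.json

# mask folders should exist after running `bash data_seg/run.sh`
ls data_seg/masks/
```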

### Required Images and Raw Data Sources

For training/evaluation, you need the corresponding image files locally (COCO/Flickr/ReferIt/VG, depending on the dataset split and config).
Common sources:
- RefCOCO / RefCOCO+ / RefCOCOg / RefClef annotations: http://bvisionweb1.cs.unc.edu/licheng/referit/data/
- MS COCO 2014 images: https://cocodataset.org/
- Flickr30k images: http://shannon.cs.illinois.edu/DenotationGraph/
- ReferItGame images: due to the original dataset's restrictions, please obtain the images yourself from the official/authorized source.
- Visual Genome images: https://visualgenome.org/
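
As an example, the MS COCO 2014 train images (used by the unc/unc+/gref/gref_umd/grefcoco configs) can be fetched from the standard COCO mirror and unpacked into the layout shown above; adjust paths to your setup:

```bash
# download and unpack MS COCO 2014 train images (~13 GB)
mkdir -p data/coco
wget http://images.cocodataset.org/zips/train2014.zip
unzip -q train2014.zip -d data/coco/   # yields data/coco/train2014/
```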

## 🚀 Training

Default training launcher:

```bash
bash submit_train.sh
```

`submit_train.sh` already includes commented examples for multiple datasets, e.g.:
- `refcoco`
- `refcoco+`
- `refcocog_umd`
- `referit`
- `grefcoco`

You can also run directly:

```bash
torchrun --nproc_per_node=8 train.py \
    --config configs/SSP_SAM_CLIP_B_FT_unc.py \
    --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt
```

### Resume Modes

`train.py` supports two resume modes:
- `--resume <ckpt>`: continue an interrupted training run from its previous checkpoint.
- `--resume_from_pretrain <ckpt>`: load pretrained weights as initialization before fine-tuning/training.
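
For example (the checkpoint filenames below are placeholders; substitute whatever your run produced under `output/`):

```bash
# continue an interrupted run from its last checkpoint
torchrun --nproc_per_node=8 train.py \
    --config configs/SSP_SAM_CLIP_B_FT_unc.py \
    --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt \
    --resume output/your_save_folder/checkpoint_last.pth

# start fine-tuning from pretrained weights instead
torchrun --nproc_per_node=8 train.py \
    --config configs/SSP_SAM_CLIP_B_FT_unc.py \
    --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt \
    --resume_from_pretrain pretrained_checkpoints/your_pretrain.pth
```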

## 📊 Evaluation

Default testing launcher:

```bash
bash submit_test.sh
```

Example direct command:

```bash
torchrun --nproc_per_node=1 --master_port=29590 test.py \
    --config configs/SSP_SAM_CLIP_L_FT_unc.py \
    --test_split testB \
    --clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \
    --checkpoint output/your_save_folder/checkpoint_best_miou.pth
```
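
To evaluate several splits in one go, the same command can be looped. `val`/`testA`/`testB` are the standard RefCOCO splits; check your config for the splits it actually supports:

```bash
# evaluate each split with the same checkpoint
for split in val testA testB; do
  torchrun --nproc_per_node=1 --master_port=29590 test.py \
      --config configs/SSP_SAM_CLIP_L_FT_unc.py \
      --test_split ${split} \
      --clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \
      --checkpoint output/your_save_folder/checkpoint_best_miou.pth
done
```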

## 📝 Notes

- During visualization, COCO image paths are resolved from `data/coco/train2014` first.
- Mask prediction and evaluation currently operate in a `512x512` mask space.
- Config files in `configs/` ship with the following defaults (see the snippet after this list to change them):
  - `output_dir='outputs/your_save_folder'`
  - `batch_size=8`
  - `freeze_epochs=20`
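
Since the configs are plain Python files, you can edit these fields directly or script the change. A minimal sketch (GNU `sed`; on macOS use `sed -i ''`, and `my_refcoco_run` is a placeholder name):

```bash
# point a config at your own output folder before training
sed -i "s#outputs/your_save_folder#outputs/my_refcoco_run#" configs/SSP_SAM_CLIP_B_FT_unc.py
```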

## 🙏 Acknowledgements

This repository benefits from the ideas and/or codebases of the following projects:

- SimREC: https://github.com/luogen1996/SimREC
- gRefCOCO: https://github.com/henghuiding/gRefCOCO
- TransVG: https://github.com/djiajunustc/TransVG
- Segment Anything (SAM): https://github.com/facebookresearch/segment-anything

Thanks to the authors for their valuable open-source contributions.

## 📖 Citation

If you find this repository useful, please cite our SSP-SAM paper.

```bibtex
@article{ssp_sam_tcsvt,
  title={SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation},
  author={Tang, Wei and Liu, Xuejing and Sun, Yanpeng and Li, Zechao},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2025}
}
```