---
license: apache-2.0
---
# SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation

<div align="center">
  <a href="https://arxiv.org/abs/xxxx.xxxxx"><img src="https://img.shields.io/badge/arXiv-Coming_Soon-b31b1b?style=flat-square" alt="arXiv"></a>
  <a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Checkpoint-yellow?style=flat-square" alt="HF Checkpoint"></a>
  <a href="https://huggingface.co/wayneicloud/SSP-SAM"><img src="https://img.shields.io/badge/HuggingFace-Dataset-orange?style=flat-square" alt="HF Dataset"></a>
  <img src="https://img.shields.io/badge/License-Apache--2.0-green?style=flat-square" alt="License">
</div>

<div align="center">
    <a href='https://scholar.google.com/citations?user=D-27eLIAAAAJ&hl=zh-CN' target='_blank'>Wei Tang</a><sup>1</sup>&emsp;
    <a href='https://scholar.google.com.hk/citations?hl=zh-CN&user=SVQYcYcAAAAJ' target='_blank'>Xuejing Liu</a><sup>&#x2709;,2</sup>&emsp;
    <a href='https://scholar.google.com.hk/citations?user=a3FI8c4AAAAJ&hl=zh-CN' target='_blank'>Yanpeng Sun</a><sup>3</sup>&emsp;
    <a href='https://imag-njust.net/zechaoli/' target='_blank'>Zechao Li</a><sup>&#x2709;,1</sup>
</div>

<div align="center">
    <sup>1</sup>Nanjing University of Science and Technology;&emsp;
    <sup>2</sup>Institute of Computing Technology, Chinese Academy of Sciences;&emsp;
    <sup>3</sup>NExT++ Lab, National University of Singapore
    <br>
    <sup>&#x2709;</sup> Corresponding Authors
</div>

---

## Overview

This repository provides the codebase of **SSP-SAM**, a referring expression segmentation framework built on top of SAM with semantic-spatial prompts.

Current repo status:
- Training/testing/data processing scripts are available.
- Multiple dataset configs are provided under `configs/`.

## 💥 News

- **17 Mar, 2026**: Open-source codebase has been organized and released.
- **4 Dec, 2025**: SSP-SAM paper accepted by IEEE TCSVT.

## 📌 ToDo

- [X] Release final model checkpoints on Hugging Face
- [X] Release processed training/evaluation metadata
- [X] Release arXiv version

## 🔗 Model Zoo & Links

- Paper: `https://arxiv.org/abs/xxxx.xxxxx`
- <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Hugging Face Checkpoints/datasets: `https://huggingface.co/wayneicloud/SSP-SAM`

## 📁 Project Structure

```text
.
├── configs/                 # training/evaluation configs
├── data_seg/                # data preprocessing scripts and generated anns/masks
├── datasets/                # dataloader and transforms
├── models/                  # SSP_SAM model definitions
├── segment-anything/        # modified SAM dependency (editable install)
├── train.py                 # training entry
├── test.py                  # evaluation entry
├── submit_train.sh          # train launcher (with examples)
└── submit_test.sh           # test launcher (with examples)
```

## ⚙️ Environment Setup

Recommended: conda environment on macOS/Linux.

```bash
conda create -n ssp_sam python=3.10 -y
conda activate ssp_sam
pip install --upgrade pip

# 1) install PyTorch (CUDA example: cu121)
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121

# 2) install modified segment-anything first
cd segment-anything
pip install -e .
cd ..

# 3) install remaining dependencies
pip install -r requirements.txt
```

> Note: the `segment-anything` code in this repository is a modified version of the original SAM implementation.  
> Install the local copy in editable mode (`pip install -e .`) as shown above.
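
After installation, a quick check can confirm that the key packages resolve in the active environment. The snippet below is a minimal sketch, not part of the repository. It assumes the modified package keeps the upstream import name `segment_anything`, which this README does not state; adjust `REQUIRED_MODULES` if your install differs.

```python
import importlib.util

# Import names assumed from the install steps above (hypothetical list;
# `segment_anything` is the upstream SAM import name).
REQUIRED_MODULES = ("torch", "torchvision", "segment_anything")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints the modules that are not installed, or "none" if all resolve.
print("missing:", missing_modules() or "none")
```

If `torch` resolves, `python -c "import torch; print(torch.cuda.is_available())"` additionally reports whether the CUDA build can see a GPU.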

## 🧩 Data Preparation

Please check:
- `data_seg/README.md`
- `data_seg/run.sh`

You have two options:

1. **Use our provided annotations + generate masks locally (recommended)**  
   - <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="HF" width="16"/> Download `data_seg/anns/*.json` and other prepared `data_seg` files from Hugging Face:  
     `https://huggingface.co/wayneicloud/SSP-SAM`
   - Use the downloaded `data_seg/anns/*.json` files directly.
   - Generate the masks locally by running:
     ```bash
     bash data_seg/run.sh
     ```

2. **Regenerate annotations/masks yourself**  
   See the collapsible section below.

<details>
<summary>Generate Annotations/Masks Yourself (click to expand)</summary>

References:
- `data_seg/README.md`
- `data_seg/run.sh`
- `legacy_data_prep_simrec.md` (legacy reference for raw data preparation and sources)

Generation requires raw annotation folders such as:
- `data_seg/refcoco/`
- `data_seg/refcoco+/`
- `data_seg/refcocog/`
- `data_seg/refclef/`

Each folder should contain raw files such as `instances.json` and `refs(...).p`.

Minimal expected layout (example):

```text
data_seg/
├── refcoco/
│   ├── instances.json
│   ├── refs(unc).p
│   └── refs(google).p
├── refcoco+/
│   ├── instances.json
│   └── refs(unc).p
├── refcocog/
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(umd).p
└── refclef/
    ├── instances.json
    ├── refs(unc).p
    └── refs(berkeley).p
```
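
Before running the preprocessing script, it can help to confirm that the raw files are in place. The following is a small sketch that checks the layout listed above; the helper name `missing_raw_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Expected raw files per dataset folder, copied from the layout above.
RAW_LAYOUT = {
    "refcoco": ["instances.json", "refs(unc).p", "refs(google).p"],
    "refcoco+": ["instances.json", "refs(unc).p"],
    "refcocog": ["instances.json", "refs(google).p", "refs(umd).p"],
    "refclef": ["instances.json", "refs(unc).p", "refs(berkeley).p"],
}


def missing_raw_files(data_root="data_seg", layout=RAW_LAYOUT):
    """Return 'dataset/file' entries that are absent under data_root."""
    root = Path(data_root)
    return [
        f"{dataset}/{name}"
        for dataset, names in layout.items()
        for name in names
        if not (root / dataset / name).is_file()
    ]


# Lists every expected raw file that is missing under ./data_seg.
for entry in missing_raw_files():
    print("missing:", entry)
```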

Example preprocessing command:

```bash
python ./data_seg/data_process.py \
  --data_root ./data_seg \
  --output_dir ./data_seg \
  --dataset refcoco \
  --split unc \
  --generate_mask
```

</details>

Dataset paths and related settings are defined in the preprocessing scripts and config files under `data_seg/`; adjust them to your local environment before running.  
Also check the dataset/image path settings in:
- `datasets/dataset.py`

> Important: update the local image/annotation/mask paths in class `VGDataset` (`datasets/dataset.py`) to match your machine.

Example local data organization:

```text
your_project_root/
├── data/                                        # set --data_root to this folder
│   ├── coco/
│   │   └── train2014/                           # COCO images (unc/unc+/gref/gref_umd/grefcoco)
│   ├── referit/
│   │   └── images/                              # ReferIt images
│   ├── VG/                                      # Visual Genome images (merge pretrain path)
│   └── vg/                                      # Visual Genome images (phrase_cut path, if used)
└── data_seg/                                    # same level as data/
    ├── anns/
    │   ├── refcoco.json
    │   ├── refcoco+.json
    │   ├── refcocog_umd.json
    │   ├── refclef.json
    │   └── grefcoco.json
    └── masks/
        ├── refcoco/
        ├── refcoco+/
        ├── refcocog_umd/
        ├── refclef/
        └── grefcoco/
```

For training/testing, use:
- `data_seg/anns/*.json` (provided)
- `data_seg/masks/*` (generated locally via `bash data_seg/run.sh`)
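
To confirm that both pieces are present before launching a run, a check along these lines may be useful. It is a sketch under the directory layout shown above; the function name `check_processed_data` is hypothetical, and it only tests that each mask folder is non-empty, not that the masks are complete.

```python
from pathlib import Path

# Dataset names taken from the example layout above.
DATASETS = ("refcoco", "refcoco+", "refcocog_umd", "refclef", "grefcoco")


def check_processed_data(data_seg="data_seg", datasets=DATASETS):
    """Map each dataset name to a list of problems (empty list = ready)."""
    root = Path(data_seg)
    report = {}
    for name in datasets:
        problems = []
        if not (root / "anns" / f"{name}.json").is_file():
            problems.append("annotation file missing")
        mask_dir = root / "masks" / name
        if not mask_dir.is_dir():
            problems.append("mask folder missing")
        elif next(mask_dir.iterdir(), None) is None:
            problems.append("mask folder empty")
        report[name] = problems
    return report


# Prints one status line per dataset for ./data_seg.
for dataset, problems in check_processed_data().items():
    print(f"{dataset}: {', '.join(problems) if problems else 'ok'}")
```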

### Required Images and Raw Data Sources

For training/evaluation, you need the corresponding image files locally (COCO/Flickr/ReferIt/VG depending on dataset split and config).  
Common sources:
- RefCOCO / RefCOCO+ / RefCOCOg / RefClef annotations: http://bvisionweb1.cs.unc.edu/licheng/referit/data/
- MS COCO 2014 images: https://cocodataset.org/
- Flickr30k images: http://shannon.cs.illinois.edu/DenotationGraph/
- ReferItGame images: due to the original dataset's restrictions, please download them yourself from the official/authorized source.
- Visual Genome images: https://visualgenome.org/

## 🚀 Training

Default training launcher:

```bash
bash submit_train.sh
```

`submit_train.sh` already includes commented examples for multiple datasets, e.g.:
- `refcoco`
- `refcoco+`
- `refcocog_umd`
- `referit`
- `grefcoco`

You can also run directly:

```bash
torchrun --nproc_per_node=8 train.py \
  --config configs/SSP_SAM_CLIP_B_FT_unc.py \
  --clip_pretrained pretrained_checkpoints/CS/CS-ViT-B-16.pt
```

### Resume Modes

`train.py` supports two resume modes:
- `--resume <ckpt>`: continue an interrupted training run from its previous checkpoint.
- `--resume_from_pretrain <ckpt>`: load pretrained weights before fine-tuning/training.
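
The usual distinction between the two modes can be sketched with plain dictionaries. This is an illustration only: the checkpoint keys (`model`, `optimizer`, `epoch`) and the helper `apply_checkpoint` are assumptions, and the actual logic in `train.py` may differ.

```python
import copy


def apply_checkpoint(state, checkpoint, mode):
    """Return a new training state after loading `checkpoint`.

    mode="resume":   restore weights, optimizer state, and epoch counter.
    mode="pretrain": restore weights only; optimizer and epoch start fresh.
    Key names are hypothetical and chosen for illustration.
    """
    if mode not in ("resume", "pretrain"):
        raise ValueError(f"unknown mode: {mode!r}")
    new_state = copy.deepcopy(state)
    new_state["model"] = copy.deepcopy(checkpoint["model"])
    if mode == "resume":
        new_state["optimizer"] = copy.deepcopy(checkpoint["optimizer"])
        new_state["start_epoch"] = checkpoint["epoch"] + 1
    return new_state


fresh = {"model": {"w": 0.0}, "optimizer": {"step": 0}, "start_epoch": 0}
ckpt = {"model": {"w": 1.5}, "optimizer": {"step": 900}, "epoch": 11}

print(apply_checkpoint(fresh, ckpt, "resume")["start_epoch"])    # 12
print(apply_checkpoint(fresh, ckpt, "pretrain")["start_epoch"])  # 0
```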

## 📊 Evaluation

Default testing launcher:

```bash
bash submit_test.sh
```

Example direct command:

```bash
torchrun --nproc_per_node=1 --master_port=29590 test.py \
  --config configs/SSP_SAM_CLIP_L_FT_unc.py \
  --test_split testB \
  --clip_pretrained pretrained_checkpoints/CS/CS-ViT-L-14-336px.pt \
  --checkpoint outputs/your_save_folder/checkpoint_best_miou.pth
```
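
The checkpoint name `checkpoint_best_miou.pth` suggests that checkpoints are selected by mean IoU (mIoU). For readers unfamiliar with the metric, here is a generic sketch of per-sample IoU averaged over a set of binary masks. It is not the implementation in `test.py`; the function names and the convention of scoring two empty masks as 1.0 are assumptions made for illustration.

```python
def mask_iou(pred, target):
    """IoU of two binary masks given as equally sized 2D sequences of 0/1."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average the per-sample IoU over (pred, target) mask pairs."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
    ([[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # identical masks
]
print(mean_iou(pairs))  # 0.75
```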

## 📝 Notes

- Visualization looks for COCO images in `data/coco/train2014` first.
- Mask prediction and evaluation currently operate in a `512x512` mask space.
- Config files in `configs/` ship with the following settings:
  - `output_dir='outputs/your_save_folder'`
  - `batch_size=8`
  - `freeze_epochs=20`

## 🌈 Acknowledgements

This repository benefits from ideas and/or codebases of the following projects:

- SimREC: https://github.com/luogen1996/SimREC
- gRefCOCO: https://github.com/henghuiding/gRefCOCO
- TransVG: https://github.com/djiajunustc/TransVG
- Segment Anything (SAM): https://github.com/facebookresearch/segment-anything

Thanks to the authors for their valuable open-source contributions.

## 📚 Citation

If you find this repository useful, please cite our SSP-SAM paper.

```bibtex
@article{ssp_sam_tcsvt,
  title={SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation},
  author={Tang, Wei and Liu, Xuejing and Sun, Yanpeng and Li, Zechao},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2025}
}
```