pipeline_tag: image-segmentation

InstructSAM: Segment Any Instance with Any Instructions

InstructSAM is a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. It formulates instruction-driven instance segmentation as a set-structured query prediction problem, bridging a vision-language model (VLM) and SAM3. This design equips SAM3 with high-level instruction understanding and compositional reasoning without modifying its core architecture.

Paper: InstructSAM: Segment Any Instance with Any Instructions
Repository: https://github.com/DCDmllm/InstructSAM

Usage

To use this model, please refer to the official repository for environment setup and installation.

You can run single-image inference using the provided inference script:

python3 -m instructsam.infer \
  --model_path CircleRadon/InstructSAM-2B \
  --image-path path/to/image.jpg \
  --query "Please segment the object in the image." \
  --output-dir vis

The script prints the generated text and mask scores, then writes mask overlays to vis/.
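
To run the same script over a folder of images, one option is a small Python wrapper around the command above. This is a minimal sketch, not part of the official repository: `build_command` and `run_batch` are hypothetical helper names, and it assumes the script accepts exactly the flags shown above and that the `instructsam` package is installed in the current environment.

```python
import subprocess
import sys
from pathlib import Path

MODEL_PATH = "CircleRadon/InstructSAM-2B"


def build_command(image_path, query, output_dir, model_path=MODEL_PATH):
    """Return the argv list for one call to the inference script.

    Hypothetical helper: it only mirrors the flags from the command above.
    """
    return [
        sys.executable, "-m", "instructsam.infer",
        "--model_path", model_path,
        "--image-path", str(image_path),
        "--query", query,
        "--output-dir", str(output_dir),
    ]


def run_batch(image_dir, query, output_root="vis"):
    """Run inference on every .jpg in image_dir, one output folder per image."""
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        output_dir = Path(output_root) / image_path.stem
        # Create the folder up front; whether the script does this itself
        # is not stated in this README.
        output_dir.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(image_path, query, output_dir), check=True)


# Example call (requires the instructsam package and the model weights):
# run_batch("path/to/images", "Please segment the object in the image.")
```

Each call starts a new process, so the model is presumably reloaded for every image. That is acceptable for a handful of files but slow for large batches.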

Citation

If you find this project useful, please cite using this BibTeX:

@article{yuan2026instructsam,
  title     = {InstructSAM: Segment Any Instance with Any Instructions},
  author    = {Yuqian Yuan and Wentong Li and Zhaocheng Li and Yutong Lin and Juncheng Li and Siliang Tang and Jun Xiao and Yueting Zhuang and Wenqiao Zhang},
  year      = {2026},
  journal   = {arXiv},
}