metadata
pipeline_tag: image-segmentation

InstructSAM: Segment Any Instance with Any Instructions

InstructSAM is a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. It formulates instruction-driven instance segmentation as a set-structured query prediction problem, bridging a vision-language model (VLM) and SAM3. This design equips SAM3 with high-level instruction understanding and compositional reasoning without modifying its core architecture.

Usage

To use this model, first follow the environment setup and installation instructions in the official repository.

You can run single-image inference using the provided inference script:

python3 -m instructsam.infer \
  --model_path CircleRadon/InstructSAM-2B \
  --image-path path/to/image.jpg \
  --query "Please segment the object in the image." \
  --output-dir vis

The script prints the generated text and mask scores, then writes mask overlays to vis/.
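
To process several images, one option is to wrap the command above in a small Python helper. The following is a minimal sketch, not part of the official repository: the function names `build_infer_command` and `run_on_folder` are hypothetical, the flags are copied verbatim from the command above, and using one output subfolder per image is an assumption made here to keep results separate.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(image_path, query, output_dir="vis",
                        model_path="CircleRadon/InstructSAM-2B"):
    """Build the argument list for one `instructsam.infer` call.

    Hypothetical helper: the flags mirror the documented command exactly.
    """
    return [
        sys.executable, "-m", "instructsam.infer",  # same interpreter as the caller
        "--model_path", model_path,
        "--image-path", str(image_path),
        "--query", query,
        "--output-dir", str(output_dir),
    ]


def run_on_folder(image_dir, query, output_root="vis"):
    """Run single-image inference once per .jpg file found in image_dir."""
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        # Assumption: a separate output folder per image, created up front.
        out_dir = Path(output_root) / image_path.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_infer_command(image_path, query, out_dir), check=True)
```

Calling `run_on_folder("path/to/images", "Please segment the object in the image.")` would then invoke the script once per image. Each call starts a new process and therefore reloads the model, so this approach favors simplicity over throughput.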

Citation

If you find this project useful, please cite using this BibTeX:

@article{yuan2026instructsam,
  title     = {InstructSAM: Segment Any Instance with Any Instructions},
  author    = {Yuqian Yuan and Wentong Li and Zhaocheng Li and Yutong Lin and Juncheng Li and Siliang Tang and Jun Xiao and Yueting Zhuang and Wenqiao Zhang},
  year      = {2026},
  journal   = {arXiv},
}