---
pipeline_tag: image-segmentation
---

# InstructSAM: Segment Any Instance with Any Instructions

InstructSAM is a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. It formulates instruction-driven instance segmentation as a set-structured query prediction problem, bridging a vision-language model (VLM) and SAM3. This design equips SAM3 with high-level instruction understanding and compositional reasoning without modifying its core architecture.

- **Paper:** [InstructSAM: Segment Any Instance with Any Instructions](https://huggingface.co/papers/2605.26102)
- **Repository:** [https://github.com/DCDmllm/InstructSAM](https://github.com/DCDmllm/InstructSAM)

## Usage

To use this model, please refer to the [official repository](https://github.com/DCDmllm/InstructSAM) for environment setup and installation.

You can run single-image inference using the provided inference script:

```bash
python3 -m instructsam.infer \
  --model_path CircleRadon/InstructSAM-2B \
  --image-path path/to/image.jpg \
  --query "Please segment the object in the image." \
  --output-dir vis
```

The script prints the generated text and mask scores, then writes mask overlays to `vis/`.

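To apply the same command to a folder of images, a small shell loop may be enough. The following is a minimal sketch that reuses only the flags shown above. The function name `instructsam_batch`, the `*.jpg` glob, the one-subfolder-per-image output layout, and the `DRY_RUN` switch are illustrative assumptions and are not part of the repository.

```bash
# Hypothetical batch wrapper around the single-image command above.
# Assumptions: inputs are *.jpg files, and each image gets its own
# output sub-folder. Neither convention comes from the repository.
instructsam_batch() {
  local image_dir query out_root img name
  image_dir="$1"   # folder containing the input images
  query="$2"       # instruction passed to --query
  out_root="$3"    # root folder for the per-image outputs

  for img in "$image_dir"/*.jpg; do
    # Skip the unexpanded glob when the folder has no .jpg files.
    [ -e "$img" ] || continue
    name="$(basename "$img" .jpg)"

    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command that would run, without executing it.
      printf "python3 -m instructsam.infer --model_path CircleRadon/InstructSAM-2B --image-path '%s' --query '%s' --output-dir '%s'\n" \
        "$img" "$query" "$out_root/$name"
    else
      python3 -m instructsam.infer \
        --model_path CircleRadon/InstructSAM-2B \
        --image-path "$img" \
        --query "$query" \
        --output-dir "$out_root/$name"
    fi
  done
}
```

Running `DRY_RUN=1 instructsam_batch images "Please segment the object in the image." vis` prints one command per image so the paths can be checked first; dropping `DRY_RUN=1` runs the inference script once per image.
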

## Citation

If you find this project useful, please cite it using the following BibTeX entry:

```bibtex
@article{yuan2026instructsam,
  title   = {InstructSAM: Segment Any Instance with Any Instructions},
  author  = {Yuqian Yuan and Wentong Li and Zhaocheng Li and Yutong Lin and Juncheng Li and Siliang Tang and Jun Xiao and Yueting Zhuang and Wenqiao Zhang},
  year    = {2026},
  journal = {arXiv},
}
```