---
pipeline_tag: image-segmentation
---

# InstructSAM: Segment Any Instance with Any Instructions

InstructSAM is a unified, streamlined framework for multi-instance segmentation under arbitrary instructions. It formulates instruction-driven instance segmentation as a set-structured query prediction problem, bridging a vision-language model (VLM) and SAM3. This design equips SAM3 with high-level instruction understanding and compositional reasoning without modifying its core architecture.

- **Paper:** [InstructSAM: Segment Any Instance with Any Instructions](https://huggingface.co/papers/2605.26102)
- **Repository:** [https://github.com/DCDmllm/InstructSAM](https://github.com/DCDmllm/InstructSAM)

## Usage

For environment setup and installation, refer to the [official repository](https://github.com/DCDmllm/InstructSAM).

Run single-image inference with the provided inference script:

```bash
python3 -m instructsam.infer \
    --model_path CircleRadon/InstructSAM-2B \
    --image-path path/to/image.jpg \
    --query "Please segment the object in the image." \
    --output-dir vis
```

The script prints the generated text and mask scores, then writes mask overlays to `vis/`.

## Citation

If you find this project useful, please cite it with the following BibTeX:

```bibtex
@article{yuan2026instructsam,
  title   = {InstructSAM: Segment Any Instance with Any Instructions},
  author  = {Yuqian Yuan and Wentong Li and Zhaocheng Li and Yutong Lin and Juncheng Li and Siliang Tang and Jun Xiao and Yueting Zhuang and Wenqiao Zhang},
  year    = {2026},
  journal = {arXiv},
}
```
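
## Batch Inference (Sketch)

The Usage section documents only single-image inference. The Python sketch here loops that same `instructsam.infer` command over a folder of images. It is an illustration, not part of the official repository: `build_command`, `run_batch`, `IMAGE_SUFFIXES`, and the one-output-folder-per-image layout are hypothetical, and the sketch assumes nothing about the CLI beyond the flags shown in the Usage example.

```python
"""Hypothetical batch wrapper around the documented `instructsam.infer` CLI."""
import subprocess
import sys
from pathlib import Path

# Model ID and flag names copied verbatim from the single-image Usage example.
MODEL_PATH = "CircleRadon/InstructSAM-2B"
# Assumption: the image formats you want to process; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_command(image_path: Path, query: str, output_dir: Path) -> list[str]:
    """Return the argv list for one single-image inference call."""
    return [
        sys.executable, "-m", "instructsam.infer",  # same interpreter/env as this script
        "--model_path", MODEL_PATH,
        "--image-path", str(image_path),
        "--query", query,
        "--output-dir", str(output_dir),
    ]


def run_batch(image_dir: Path, query: str, output_root: Path) -> None:
    """Run the CLI once per image, giving each image its own output folder."""
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files and sub-directories
        # One sub-folder per image so overlays from different inputs cannot collide.
        output_dir = output_root / image_path.stem
        output_dir.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_command(image_path, query, output_dir), check=True)


# Example (paths are placeholders):
# run_batch(Path("path/to/images"), "Please segment the object in the image.", Path("vis"))
```

Each call starts a fresh process, so the model is reloaded for every image. That keeps the sketch limited to the documented CLI at the cost of speed; if you need throughput, check the repository for a programmatic entry point before relying on one. Using `sys.executable` instead of a bare `python3` ensures the CLI runs in the same environment as the wrapper.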