	---
	pipeline_tag: image-segmentation
	---

	# InstructSAM: Segment Any Instance with Any Instructions

	InstructSAM is a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. It formulates instruction-driven instance segmentation as a set-structured query prediction problem, bridging a vision-language model (VLM) and SAM3. This design equips SAM3 with high-level instruction understanding and compositional reasoning without modifying its core architecture.

	- Paper: [InstructSAM: Segment Any Instance with Any Instructions](https://huggingface.co/papers/2605.26102)
	- Repository: [https://github.com/DCDmllm/InstructSAM](https://github.com/DCDmllm/InstructSAM)

	## Usage

	To use this model, please refer to the [official repository](https://github.com/DCDmllm/InstructSAM) for environment setup and installation.

	You can run single-image inference using the provided inference script:

	```bash
	python3 -m instructsam.infer \
	--model_path CircleRadon/InstructSAM-2B \
	--image-path path/to/image.jpg \
	--query "Please segment the object in the image." \
	--output-dir vis
	```

	The script prints the generated text and mask scores, then writes mask overlays to `vis/`.
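
To run the same command over a folder of images, a small wrapper along the following lines may help. This is only a sketch: it reuses the flags exactly as they appear in the command above, while the `IMAGE_DIR`, `QUERY`, and `DRY_RUN` variables, the `run_one` helper, and the per-image `vis/<name>` output layout are hypothetical choices made for illustration and are not part of the official repository.

```bash
#!/usr/bin/env bash
# Sketch: apply the single-image inference command to every .jpg in a folder.
# IMAGE_DIR, QUERY, DRY_RUN and run_one are illustrative names, not repo features.
set -euo pipefail

IMAGE_DIR="${IMAGE_DIR:-images}"
QUERY="${QUERY:-Please segment the object in the image.}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command; 0 = actually run it

run_one() {
  local image="$1"
  # Assumed layout: one output folder per image, named after the file stem.
  local out_dir
  out_dir="vis/$(basename "${image%.*}")"
  local cmd=(python3 -m instructsam.infer
    --model_path CircleRadon/InstructSAM-2B
    --image-path "$image"
    --query "$QUERY"
    --output-dir "$out_dir")
  if [ "$DRY_RUN" = "1" ]; then
    printf '%q ' "${cmd[@]}"
    printf '\n'
  else
    "${cmd[@]}"
  fi
}

for image in "$IMAGE_DIR"/*.jpg; do
  [ -e "$image" ] || continue   # skip the literal pattern when nothing matches
  run_one "$image"
done
```

With the default `DRY_RUN=1` the wrapper only prints the commands it would execute, which makes it easy to check paths before starting a long run; setting `DRY_RUN=0` executes them, provided the environment from the official repository is installed.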

	## Citation

	If you find this project useful, please cite using this BibTeX:

	```bibtex
	@article{yuan2026instructsam,
	title = {InstructSAM: Segment Any Instance with Any Instructions},
	author = {Yuqian Yuan and Wentong Li and Zhaocheng Li and Yutong Lin and Juncheng Li and Siliang Tang and Jun Xiao and Yueting Zhuang and Wenqiao Zhang},
	year = {2026},
	journal = {arXiv},
	}
	```