	---
	license: apache-2.0
	pipeline_tag: image-segmentation
	tags:
	- conversational-image-segmentation
	- lora
	---

	# ConverSeg-Net-3B

	This repository contains raw checkpoints for ConverSeg-Net-3B, introduced in the paper [Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision](https://huggingface.co/papers/2602.13195).

	ConverSeg-Net targets Conversational Image Segmentation (CIS): grounding abstract, intent-driven concepts, including those that require functional or physical reasoning, in pixel-accurate masks.

	- Project Page: [https://glab-caltech.github.io/converseg/](https://glab-caltech.github.io/converseg/)
	- Code: [https://github.com/AadSah/ConverSeg](https://github.com/AadSah/ConverSeg)
	- Paper: [arXiv:2602.13195](https://arxiv.org/abs/2602.13195)

	## Important Note

	These files cannot be loaded with Hugging Face `from_pretrained`. They are raw checkpoints and LoRA adapter files, meant to be downloaded and used with the official [ConverSeg codebase](https://github.com/AadSah/ConverSeg).

	## Download

	```bash
	git lfs install
	git clone https://huggingface.co/aadarsh99/ConverSeg-Net-3B ./checkpoints/ConverSeg-Net-3B
	```
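
If `git lfs` is not set up correctly, a clone can finish without errors yet leave small text pointer stubs in place of the real weights. The sketch below is one way to spot that case. It relies on the Git LFS convention that a pointer file starts with the line `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical and are not part of the ConverSeg codebase.

```bash
# Hypothetical helper (not part of ConverSeg): succeed if the file is an
# un-fetched Git LFS pointer stub, fail otherwise.
is_lfs_pointer() {
  # Pointer stubs are tiny text files whose first line names the LFS spec.
  [ -f "$1" ] && head -c 64 "$1" 2>/dev/null \
    | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: print every pointer stub found under a directory.
# Defaults to the clone target used in the download command above.
find_lfs_pointers() {
  local dir="${1:-./checkpoints/ConverSeg-Net-3B}"
  local f
  find "$dir" -type f -not -path '*/.git/*' | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      echo "$f"
    fi
  done
}

# Example: find_lfs_pointers ./checkpoints/ConverSeg-Net-3B
```

If the function prints any paths, running `git lfs pull` inside the cloned directory should fetch the actual files.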

	## Sample Usage

	After cloning the [ConverSeg codebase](https://github.com/AadSah/ConverSeg) and setting up the environment, you can run inference using the `demo.py` script by pointing to the downloaded checkpoint paths:

	```bash
	python demo.py \
	--final_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_sam2_90000.torch.torch \
	--plm_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_plm_90000.torch \
	--lora_ckpt ./checkpoints/ConverSeg-Net-3B/lora_plm_adapter_90000 \
	--model_cfg sam2_hiera_l.yaml \
	--base_ckpt /path/to/sam2_hiera_large.pt \
	--image /path/to/image.jpg \
	--prompt "the left-most person" \
	--device cuda \
	--out_dir ./demo_outputs
	```
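
Because the command takes several checkpoint paths, a mistyped path may only surface after model loading has started. A minimal pre-flight check, assuming nothing about `demo.py` beyond the flags shown above, is sketched here. `check_paths` is a hypothetical helper, not something the ConverSeg codebase provides. It uses `-e` so that it accepts both files and directories, since `lora_plm_adapter_90000` has no extension and may be a directory.

```bash
# Hypothetical helper (not part of ConverSeg): report every argument that
# does not exist on disk; return non-zero if at least one is missing.
check_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (paths taken from the command above; adjust to your setup):
#   CKPT_DIR=./checkpoints/ConverSeg-Net-3B
#   check_paths "$CKPT_DIR/ConverSeg-Net_plm_90000.torch" \
#               "$CKPT_DIR/lora_plm_adapter_90000" \
#               /path/to/sam2_hiera_large.pt /path/to/image.jpg \
#     && echo "all inputs found"
```

The helper lists all missing paths in one pass instead of stopping at the first, so several typos can be fixed at once.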

	## Citation

	```bibtex
	@misc{sahoo2026conversationalimagesegmentationgrounding,
	title = {Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision},
	author = {Aadarsh Sahoo and Georgia Gkioxari},
	year = {2026},
	eprint = {2602.13195},
	archivePrefix = {arXiv},
	primaryClass = {cs.CV},
	url = {https://arxiv.org/abs/2602.13195},
	}
	```