---
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- conversational-image-segmentation
- lora
---

# ConverSeg-Net-3B

This repository contains raw checkpoints for **ConverSeg-Net-3B**, introduced in the paper [Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision](https://huggingface.co/papers/2602.13195).

ConverSeg-Net is designed for Conversational Image Segmentation (CIS), which focuses on grounding abstract, intent-driven concepts, including functional and physical reasoning, into pixel-accurate masks.

- **Project Page:** [https://glab-caltech.github.io/converseg/](https://glab-caltech.github.io/converseg/)
- **Code:** [https://github.com/AadSah/ConverSeg](https://github.com/AadSah/ConverSeg)
- **Paper:** [arXiv:2602.13195](https://arxiv.org/abs/2602.13195)

## Important Note

These files **cannot** be loaded with Hugging Face `from_pretrained`. They are raw checkpoint files and LoRA adapter files, meant to be downloaded and used with the official [ConverSeg codebase](https://github.com/AadSah/ConverSeg).

## Download

```bash
git lfs install
git clone https://huggingface.co/aadarsh99/ConverSeg-Net-3B ./checkpoints/ConverSeg-Net-3B
```
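
If Git LFS was not active during the clone, large files arrive as small text pointer stubs instead of real weights. The sketch below is not part of the ConverSeg codebase: `is_lfs_pointer` and `check_ckpts` are hypothetical helper names, and the check relies only on the fact that an LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a clone; not part of ConverSeg.

# Succeeds when FILE is a Git LFS pointer stub, i.e. a regular file whose
# first bytes are the LFS spec version line.
is_lfs_pointer() {
  [ -f "$1" ] && head -c 64 "$1" | grep -aq '^version https://git-lfs.github.com/spec/v1'
}

# Prints one status line per path. Returns 1 if any path is missing or is
# still an LFS pointer, 0 otherwise. Existing directories count as ok.
check_ckpts() {
  local rc=0 path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing      $path"
      rc=1
    elif is_lfs_pointer "$path"; then
      echo "lfs-pointer  $path"
      rc=1
    else
      echo "ok           $path"
    fi
  done
  return "$rc"
}
```

For example, `check_ckpts ./checkpoints/ConverSeg-Net-3B/*` lists every entry in the clone. If any line reports `lfs-pointer`, running `git lfs pull` inside the cloned directory should fetch the actual files.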

## Sample Usage

After cloning the [ConverSeg codebase](https://github.com/AadSah/ConverSeg) and setting up its environment, run inference with the `demo.py` script, pointing it at the downloaded checkpoint paths:

```bash
python demo.py \
  --final_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_sam2_90000.torch \
  --plm_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_plm_90000.torch \
  --lora_ckpt ./checkpoints/ConverSeg-Net-3B/lora_plm_adapter_90000 \
  --model_cfg sam2_hiera_l.yaml \
  --base_ckpt /path/to/sam2_hiera_large.pt \
  --image /path/to/image.jpg \
  --prompt "the left-most person" \
  --device cuda \
  --out_dir ./demo_outputs
```
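
To try several prompts on one image, the command above can be wrapped in a small loop. This is a minimal sketch rather than a feature of the ConverSeg codebase: `run_prompts`, `DRY_RUN`, and `OUT_ROOT` are hypothetical names, the second prompt is only an illustration, and writing each prompt to its own output subdirectory is an assumption made in case `demo.py` would otherwise overwrite earlier results. Only flags that appear in the command above are used.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around demo.py; not part of the ConverSeg codebase.

# run_prompts ARGS...: reads one prompt per line from stdin and invokes
# demo.py once per non-empty line, forwarding ARGS unchanged and adding
# --prompt and a per-prompt --out_dir under $OUT_ROOT.
run_prompts() {
  local prompt i=0
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    [ -z "$prompt" ] && continue
    i=$((i + 1))
    local cmd=(python demo.py "$@" --prompt "$prompt"
               --out_dir "${OUT_ROOT:-./demo_outputs}/prompt_$i")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Print the shell-quoted command instead of running it.
      printf '%q ' "${cmd[@]}"
      echo
    else
      # Detach stdin so demo.py cannot consume the remaining prompts.
      "${cmd[@]}" < /dev/null
    fi
  done
}

# Dry run: prints the two commands without executing them. For a real run,
# drop DRY_RUN=1 and also pass the checkpoint and config flags shown in the
# command above.
printf '%s\n' "the left-most person" "something to sit on" |
  DRY_RUN=1 run_prompts --image /path/to/image.jpg --device cuda
```

Reading prompts from stdin keeps the checkpoint flags in one place, so a longer list can be supplied from a text file with `run_prompts ... < prompts.txt`.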

## Citation

```bibtex
@misc{sahoo2026conversationalimagesegmentationgrounding,
  title         = {Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision},
  author        = {Aadarsh Sahoo and Georgia Gkioxari},
  year          = {2026},
  eprint        = {2602.13195},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2602.13195},
}
```