license: apache-2.0
pipeline_tag: image-segmentation
tags:
- conversational-image-segmentation
- lora
ConverSeg-Net-3B
This repository contains raw checkpoints for ConverSeg-Net-3B, introduced in the paper Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision.
ConverSeg-Net is designed for Conversational Image Segmentation (CIS): grounding abstract, intent-driven concepts, including ones that require functional and physical reasoning, into pixel-accurate masks.
- Project Page: https://glab-caltech.github.io/converseg/
- Code: https://github.com/AadSah/ConverSeg
- Paper: arXiv:2602.13195
Important Note
These are not Hugging Face model files and cannot be loaded with from_pretrained. They are raw checkpoint files and LoRA adapter files, intended to be downloaded and used with the official ConverSeg codebase.
Download
git lfs install
git clone https://huggingface.co/aadarsh99/ConverSeg-Net-3B ./checkpoints/ConverSeg-Net-3B
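After cloning, it can help to confirm that the files referenced in the Sample Usage command are actually present, and that Git LFS fetched the real objects rather than leaving small text pointer files (which happens if git lfs install was skipped). The helper below is a hypothetical sketch, not part of the ConverSeg codebase. The file names are copied from the demo.py command in this card, and because it is not stated whether lora_plm_adapter_90000 is a file or a directory, the sketch only requires that path to exist.

```python
from pathlib import Path

# Names copied from the demo.py command in this card (90000-iteration checkpoints).
EXPECTED = [
    "ConverSeg-Net_sam2_90000.torch.torch",
    "ConverSeg-Net_plm_90000.torch",
    "lora_plm_adapter_90000",  # may be a file or a directory; only existence is checked
]

# A Git LFS pointer file begins with this line when the real object was not downloaded.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_checkpoints(root):
    """Return a list of problems found under root (hypothetical helper)."""
    root = Path(root)
    problems = []
    for name in EXPECTED:
        path = root / name
        if not path.exists():
            problems.append(f"missing: {name}")
        elif path.is_file():
            # Read only the first bytes: enough to recognise an LFS pointer stub.
            with path.open("rb") as fh:
                if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                    problems.append(f"LFS pointer only (run 'git lfs pull'): {name}")
    return problems


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "./checkpoints/ConverSeg-Net-3B"
    issues = check_checkpoints(target)
    print("\n".join(issues) if issues else "all checkpoint files present")
```

An empty result means every expected path exists and none of the regular files is an LFS pointer; it does not validate the checkpoint contents.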
Sample Usage
After cloning the ConverSeg codebase and setting up its environment, run inference with the demo.py script, pointing it to the downloaded checkpoint files and replacing the /path/to/... placeholders with your local base SAM 2 checkpoint and input image:
python demo.py \
--final_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_sam2_90000.torch.torch \
--plm_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_plm_90000.torch \
--lora_ckpt ./checkpoints/ConverSeg-Net-3B/lora_plm_adapter_90000 \
--model_cfg sam2_hiera_l.yaml \
--base_ckpt /path/to/sam2_hiera_large.pt \
--image /path/to/image.jpg \
--prompt "the left-most person" \
--device cuda \
--out_dir ./demo_outputs
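The command above segments one image for one prompt. To try several conversational prompts on the same image, a small wrapper can assemble the same invocation programmatically. This is a hypothetical sketch, not part of the ConverSeg codebase: it assumes demo.py is in the current directory and accepts exactly the flags shown above, and it keeps the /path/to/... placeholders, which you must replace. Each prompt gets its own output subdirectory because it is not documented whether demo.py names its outputs per prompt.

```python
import subprocess
import sys
from pathlib import Path

CKPT = Path("./checkpoints/ConverSeg-Net-3B")

# Flag names mirror the demo.py command in this card; the base checkpoint path is a placeholder.
BASE_ARGS = {
    "--final_ckpt": str(CKPT / "ConverSeg-Net_sam2_90000.torch.torch"),
    "--plm_ckpt": str(CKPT / "ConverSeg-Net_plm_90000.torch"),
    "--lora_ckpt": str(CKPT / "lora_plm_adapter_90000"),
    "--model_cfg": "sam2_hiera_l.yaml",
    "--base_ckpt": "/path/to/sam2_hiera_large.pt",
    "--device": "cuda",
}


def build_demo_command(image, prompt, out_dir):
    """Assemble one demo.py invocation as an argv list (hypothetical helper)."""
    cmd = [sys.executable, "demo.py"]
    for flag, value in BASE_ARGS.items():
        cmd += [flag, value]
    # Passing the prompt as its own list element avoids shell quoting problems.
    cmd += ["--image", str(image), "--prompt", prompt, "--out_dir", str(out_dir)]
    return cmd


def run_prompts(image, prompts, out_root="./demo_outputs"):
    """Run demo.py once per prompt, each writing to its own subdirectory."""
    for i, prompt in enumerate(prompts):
        out_dir = Path(out_root) / f"prompt_{i:02d}"
        # check=True raises CalledProcessError on the first failing run.
        subprocess.run(build_demo_command(image, prompt, out_dir), check=True)
```

For example, run_prompts("/path/to/image.jpg", ["the left-most person"]) reproduces the single command above, writing to ./demo_outputs/prompt_00. Launching a separate process per prompt reloads the checkpoints each time, which is simple but slow; batching inside one process would require the codebase's own Python API.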
Citation
@misc{sahoo2026conversationalimagesegmentationgrounding,
title = {Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision},
author = {Aadarsh Sahoo and Georgia Gkioxari},
year = {2026},
eprint = {2602.13195},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2602.13195},
}