metadata
license: apache-2.0
pipeline_tag: image-segmentation
tags:
  - conversational-image-segmentation
  - lora

ConverSeg-Net-3B

This repository contains the raw checkpoints for ConverSeg-Net-3B, introduced in the paper Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision (Sahoo and Gkioxari, arXiv:2602.13195).

ConverSeg-Net is designed for Conversational Image Segmentation (CIS), which focuses on grounding abstract, intent-driven concepts, including functional and physical reasoning, into pixel-accurate masks.

Important Note

These files cannot be loaded with Hugging Face from_pretrained. They are raw checkpoint files and LoRA adapter files, intended to be downloaded and used with the official ConverSeg codebase.

Download

git lfs install
git clone https://huggingface.co/aadarsh99/ConverSeg-Net-3B ./checkpoints/ConverSeg-Net-3B
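Since the download relies on Git LFS, a clone made without LFS active can contain small text pointer files in place of the actual weights. The function below is a hypothetical helper, not part of the ConverSeg codebase: check_converseg_ckpts and its variable names are introduced here for illustration, and the filenames it checks are copied from the demo.py example in this card. It reports missing files and LFS pointer stubs, and returns non-zero if anything looks wrong.

```shell
#!/bin/sh
# Hypothetical helper, not part of the ConverSeg codebase.
# Checks that the files used by the demo.py example in this card exist and are
# real downloads rather than Git LFS pointer stubs.
check_converseg_ckpts() {
  ckpt_dir="${1:-./checkpoints/ConverSeg-Net-3B}"
  ckpt_rc=0
  # Filenames copied from the demo.py example in this card.
  for ckpt_f in ConverSeg-Net_sam2_90000.torch.torch ConverSeg-Net_plm_90000.torch; do
    if [ ! -f "$ckpt_dir/$ckpt_f" ]; then
      echo "missing: $ckpt_dir/$ckpt_f"
      ckpt_rc=1
    elif head -c 200 "$ckpt_dir/$ckpt_f" | grep -q "git-lfs.github.com/spec"; then
      # A file LFS has not fetched is a short text pointer that begins with
      # "version https://git-lfs.github.com/spec/v1".
      echo "lfs pointer (try 'git lfs pull'): $ckpt_dir/$ckpt_f"
      ckpt_rc=1
    fi
  done
  # The LoRA adapter path has no extension; -e accepts a file or a directory.
  if [ ! -e "$ckpt_dir/lora_plm_adapter_90000" ]; then
    echo "missing: $ckpt_dir/lora_plm_adapter_90000"
    ckpt_rc=1
  fi
  if [ "$ckpt_rc" -eq 0 ]; then
    echo "ok: $ckpt_dir"
  fi
  return "$ckpt_rc"
}
```

After sourcing the snippet, run check_converseg_ckpts ./checkpoints/ConverSeg-Net-3B. If it reports pointer files, running git lfs pull inside the cloned directory should fetch the actual weights.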

Sample Usage

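After cloning the ConverSeg codebase and setting up its environment, run inference with the demo.py script, pointing --final_ckpt, --plm_ckpt and --lora_ckpt at the downloaded files. The --model_cfg and --base_ckpt arguments refer to the base SAM 2 Hiera-Large config and weights (sam2_hiera_l.yaml, sam2_hiera_large.pt), which the command expects at their own path rather than in this checkpoint directory: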

python demo.py \
  --final_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_sam2_90000.torch.torch \
  --plm_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_plm_90000.torch \
  --lora_ckpt ./checkpoints/ConverSeg-Net-3B/lora_plm_adapter_90000 \
  --model_cfg sam2_hiera_l.yaml \
  --base_ckpt /path/to/sam2_hiera_large.pt \
  --image /path/to/image.jpg \
  --prompt "the left-most person" \
  --device cuda \
  --out_dir ./demo_outputs
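The sample command hard-codes every path. When trying several images or prompts, it can help to derive the three ConverSeg paths from the clone directory and the shared 90000 step suffix. The wrapper below is a hypothetical sketch: run_converseg, CKPT_DIR, STEP, SAM2_BASE, DEVICE and DRY_RUN are names introduced here, not options of demo.py, and the wrapper passes only the flags that appear in the sample command.

```shell
#!/bin/sh
# Hypothetical wrapper around the demo.py example in this card. The names
# run_converseg, CKPT_DIR, STEP, SAM2_BASE, DEVICE and DRY_RUN are
# illustrative assumptions, not options defined by the ConverSeg codebase.
CKPT_DIR="${CKPT_DIR:-./checkpoints/ConverSeg-Net-3B}"   # clone location from the Download step
STEP="${STEP:-90000}"                                    # only 90000 is referenced in this card
SAM2_BASE="${SAM2_BASE:-/path/to/sam2_hiera_large.pt}"   # base SAM 2 weights; set to your own path
DEVICE="${DEVICE:-cuda}"

# Usage: run_converseg IMAGE PROMPT [OUT_DIR]
# Set DRY_RUN=1 to print the command instead of running it.
run_converseg() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_converseg IMAGE PROMPT [OUT_DIR]" >&2
    return 2
  fi
  rc_image="$1"
  rc_prompt="$2"
  rc_out="${3:-./demo_outputs}"
  # Rebuild the positional parameters as the demo.py argument list,
  # using exactly the flags from the sample command.
  set -- python demo.py \
    --final_ckpt "$CKPT_DIR/ConverSeg-Net_sam2_${STEP}.torch.torch" \
    --plm_ckpt "$CKPT_DIR/ConverSeg-Net_plm_${STEP}.torch" \
    --lora_ckpt "$CKPT_DIR/lora_plm_adapter_${STEP}" \
    --model_cfg sam2_hiera_l.yaml \
    --base_ckpt "$SAM2_BASE" \
    --image "$rc_image" \
    --prompt "$rc_prompt" \
    --device "$DEVICE" \
    --out_dir "$rc_out"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 run_converseg /path/to/image.jpg "the left-most person" prints the full command without running it. Giving each prompt its own OUT_DIR is a cautious choice, since this card does not describe how demo.py names its output files.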

Citation

@misc{sahoo2026conversationalimagesegmentationgrounding,
  title = {Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision},
  author = {Aadarsh Sahoo and Georgia Gkioxari},
  year = {2026},
  eprint = {2602.13195},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2602.13195}, 
}