---
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- conversational-image-segmentation
- lora
---
# ConverSeg-Net-3B
This repository contains raw checkpoints for **ConverSeg-Net-3B**, introduced in the paper [Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision](https://huggingface.co/papers/2602.13195).
ConverSeg-Net targets Conversational Image Segmentation (CIS): grounding abstract, intent-driven concepts, including those that call for functional and physical reasoning, into pixel-accurate masks.
- **Project Page:** [https://glab-caltech.github.io/converseg/](https://glab-caltech.github.io/converseg/)
- **Code:** [https://github.com/AadSah/ConverSeg](https://github.com/AadSah/ConverSeg)
- **Paper:** [arXiv:2602.13195](https://arxiv.org/abs/2602.13195)
## Important Note
These are **not** Hugging Face `from_pretrained` model files. They are raw checkpoint files and LoRA adapter files meant to be downloaded and used with the official [ConverSeg codebase](https://github.com/AadSah/ConverSeg).
## Download
```bash
git lfs install
git clone https://huggingface.co/aadarsh99/ConverSeg-Net-3B ./checkpoints/ConverSeg-Net-3B
```
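If `git lfs` is missing or the LFS fetch is interrupted, the clone can finish while the large checkpoint files are still small text pointer stubs. The sketch below is one way to spot that. `find_lfs_pointers` is a hypothetical helper name, not part of the ConverSeg codebase; it relies only on the fact that a Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`.

```bash
# Hypothetical helper: print every file under DIR that is still a Git LFS
# pointer stub instead of the real checkpoint data.
find_lfs_pointers() {
  find "$1" -type f ! -path '*/.git/*' -exec sh -c '
    # Only the first bytes are read, so multi-GB checkpoints are cheap to test.
    head -c 64 "$1" | grep -q "^version https://git-lfs.github.com/spec/v1" &&
      printf "%s\n" "$1"
  ' _ {} \;
}

# Example (path matches the clone command above):
# find_lfs_pointers ./checkpoints/ConverSeg-Net-3B
```

An empty result means no pointer stubs were found. If any paths are listed, running `git lfs pull` inside the cloned directory should fetch the real files.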
## Sample Usage
After cloning the [ConverSeg codebase](https://github.com/AadSah/ConverSeg) and setting up its environment, run inference with the `demo.py` script, passing the downloaded checkpoint paths. Replace the `/path/to/...` placeholders for `--base_ckpt` and `--image` with your own paths:
```bash
python demo.py \
--final_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_sam2_90000.torch.torch \
--plm_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_plm_90000.torch \
--lora_ckpt ./checkpoints/ConverSeg-Net-3B/lora_plm_adapter_90000 \
--model_cfg sam2_hiera_l.yaml \
--base_ckpt /path/to/sam2_hiera_large.pt \
--image /path/to/image.jpg \
--prompt "the left-most person" \
--device cuda \
--out_dir ./demo_outputs
```
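To try several prompts on one image, the command above can be wrapped in a small shell function. This is a minimal sketch, not part of the official codebase: `run_prompts` and the environment variable names are hypothetical, the flags and file names are copied verbatim from the command above, and each call to `demo.py` is a separate process. Whether `demo.py` overwrites earlier results in `--out_dir` is not stated here, so the sketch writes each prompt to its own subdirectory as a precaution.

```bash
# Defaults copied from the command above; override them via environment
# variables if your file names or locations differ.
CKPT_DIR="${CKPT_DIR:-./checkpoints/ConverSeg-Net-3B}"
BASE_CKPT="${BASE_CKPT:-/path/to/sam2_hiera_large.pt}"
FINAL_CKPT="${FINAL_CKPT:-$CKPT_DIR/ConverSeg-Net_sam2_90000.torch.torch}"

# Hypothetical wrapper: run_prompts IMAGE PROMPT [PROMPT ...]
run_prompts() {
  image="$1"
  shift
  i=0
  for prompt in "$@"; do
    i=$((i + 1))
    # One demo.py invocation per prompt, each with its own output directory.
    python demo.py \
      --final_ckpt "$FINAL_CKPT" \
      --plm_ckpt "$CKPT_DIR/ConverSeg-Net_plm_90000.torch" \
      --lora_ckpt "$CKPT_DIR/lora_plm_adapter_90000" \
      --model_cfg sam2_hiera_l.yaml \
      --base_ckpt "$BASE_CKPT" \
      --image "$image" \
      --prompt "$prompt" \
      --device cuda \
      --out_dir "./demo_outputs/prompt_$i"
  done
}

# Example (the second prompt is purely illustrative):
# run_prompts /path/to/image.jpg "the left-most person" "the right-most person"
```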
## Citation
```bibtex
@misc{sahoo2026conversationalimagesegmentationgrounding,
title = {Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision},
author = {Aadarsh Sahoo and Georgia Gkioxari},
year = {2026},
eprint = {2602.13195},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2602.13195},
}
```