aadarsh99/ConverSeg-Net-3B

Update model card with paper, project links, and metadata

by nielsr HF Staff - opened Feb 18

base: refs/heads/main ← from: refs/pr/1

README.md +46 -4

@@ -1,17 +1,59 @@
 ---
 license: apache-2.0
 ---
 # ConverSeg-Net-3B
-Raw checkpoints for **ConverSeg-Net**.
-These are **not** Hugging Face `from_pretrained` model files.
-They are meant to be downloaded and used with the ConverSeg codebase:
-https://github.com/AadSah/ConverSeg
 ## Download
 ```bash
 git lfs install
 git clone https://huggingface.co/aadarsh99/ConverSeg-Net-3B ./checkpoints/ConverSeg-Net-3B

 ---
 license: apache-2.0
+pipeline_tag: image-segmentation
+tags:
+- conversational-image-segmentation
+- lora
 ---
 # ConverSeg-Net-3B
+This repository contains raw checkpoints for **ConverSeg-Net-3B**, introduced in the paper [Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision](https://huggingface.co/papers/2602.13195).
+ConverSeg-Net is designed for Conversational Image Segmentation (CIS), which focuses on grounding abstract, intent-driven concepts, including functional and physical reasoning, into pixel-accurate masks.
+- **Project Page:** [https://glab-caltech.github.io/converseg/](https://glab-caltech.github.io/converseg/)
+- **Code:** [https://github.com/AadSah/ConverSeg](https://github.com/AadSah/ConverSeg)
+- **Paper:** [arXiv:2602.13195](https://arxiv.org/abs/2602.13195)
+## Important Note
+These are **not** Hugging Face `from_pretrained` model files. They are raw checkpoint files and LoRA adapter files meant to be downloaded and used with the official [ConverSeg codebase](https://github.com/AadSah/ConverSeg).
 ## Download
 ```bash
 git lfs install
 git clone https://huggingface.co/aadarsh99/ConverSeg-Net-3B ./checkpoints/ConverSeg-Net-3B
+```
+## Sample Usage
+After cloning the [ConverSeg codebase](https://github.com/AadSah/ConverSeg) and setting up the environment, you can run inference using the `demo.py` script by pointing to the downloaded checkpoint paths:
+```bash
+python demo.py \
+  --final_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_sam2_90000.torch.torch \
+  --plm_ckpt ./checkpoints/ConverSeg-Net-3B/ConverSeg-Net_plm_90000.torch \
+  --lora_ckpt ./checkpoints/ConverSeg-Net-3B/lora_plm_adapter_90000 \
+  --model_cfg sam2_hiera_l.yaml \
+  --base_ckpt /path/to/sam2_hiera_large.pt \
+  --image /path/to/image.jpg \
+  --prompt "the left-most person" \
+  --device cuda \
+  --out_dir ./demo_outputs
+```
+## Citation
+```bibtex
+@misc{sahoo2026conversationalimagesegmentationgrounding,
+  title = {Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision},
+  author = {Aadarsh Sahoo and Georgia Gkioxari},
+  year = {2026},
+  eprint = {2602.13195},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CV},
+  url = {https://arxiv.org/abs/2602.13195},
+}
+```
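
Before running the `demo.py` command from the Sample Usage section, it can help to confirm that the cloned checkpoint paths exist, since an incomplete `git lfs` download can leave files missing. The following is a minimal bash sketch of such a check. `require_paths` is a hypothetical helper written for illustration and is not part of the ConverSeg codebase; the two paths are copied from the Sample Usage command and may need adjusting to match your clone.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ConverSeg): print every path that does not
# exist to stderr and return 1 if at least one is missing, 0 otherwise.
require_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed clone location, taken from the Download section.
CKPT_DIR=./checkpoints/ConverSeg-Net-3B

# Paths as written in the Sample Usage command; add the SAM2 checkpoint
# and base checkpoint paths as they appear on your machine.
if require_paths \
    "$CKPT_DIR/ConverSeg-Net_plm_90000.torch" \
    "$CKPT_DIR/lora_plm_adapter_90000"; then
  echo "checkpoints found"
fi
```

The helper only tests for existence with `-e`, so it accepts both regular files and the LoRA adapter directory; it does not validate file contents.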