SegVol (text/point/box-promptable 3D segmentation) -- SegVol (MONAI ViT + CLIP text + SAM decoder)
Description
SegVol, ported to JAX / Equinox from the upstream PyTorch release. SegVol is a SAM-style promptable segmentation model for volumetric medical images that, alongside point and box (spatial) prompts, accepts text (semantic) prompts via a CLIP text encoder. It pairs a MONAI ViT image encoder (perceptron patch embedding) with a SAM prompt encoder + two-way transformer mask decoder whose mask logits are fused with a text-aligned similarity map. Forward: one volume plus any combination of text token ids, a point set, and a box -> mask logits at the input resolution.
Intended use
Promptable 3D segmentation of a single-channel medical volume (intensity-normalised and resampled to 32x256x256 by the upstream processor). Prompts -- any combination of: CLIP token ids (L,) for a text prompt, point coordinates (N, 3) with labels (N,) where 1 = foreground / 0 = background, and a box (6,) [x0, y0, z0, x1, y1, z1]. Returns a single foreground mask-logit volume at the input resolution. The string-to-token tokenisation (CLIP BPE) and the zoom-in/zoom-out sliding-window refinement are out-of-model preprocessing / inference concerns.
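The prompt layouts listed above can be checked before they reach the model. The sketch below is illustrative only: `check_prompts` is a hypothetical helper, not part of ilex or upstream SegVol, and it works on plain Python sequences so it can run before any conversion to JAX arrays. Two of its rules are assumptions rather than documented behaviour: that at least one prompt must be supplied, and that the box lists its minimum corner first.

```python
def check_prompts(token_ids=None, points=None, labels=None, box=None):
    """Validate prompt layouts; return the names of the prompts supplied.

    Hypothetical helper for illustration, not an ilex or SegVol API.
    """
    used = []

    if token_ids is not None:
        # Text prompt: 1-D sequence of integer CLIP token ids, shape (L,).
        if len(token_ids) == 0 or not all(isinstance(t, int) for t in token_ids):
            raise ValueError("token_ids must be a non-empty sequence of ints")
        used.append("text")

    if points is not None or labels is not None:
        # Point prompt: coordinates (N, 3) paired with labels (N,).
        if points is None or labels is None:
            raise ValueError("points and labels must be given together")
        if len(points) != len(labels):
            raise ValueError("points and labels must have the same length N")
        if any(len(p) != 3 for p in points):
            raise ValueError("each point needs exactly 3 coordinates")
        if any(l not in (0, 1) for l in labels):
            raise ValueError("labels must be 1 (foreground) or 0 (background)")
        used.append("points")

    if box is not None:
        # Box prompt: (6,) laid out as [x0, y0, z0, x1, y1, z1].
        if len(box) != 6:
            raise ValueError("box must have exactly 6 values")
        x0, y0, z0, x1, y1, z1 = box
        # Assumption: the first corner is the minimum corner.
        if not (x0 <= x1 and y0 <= y1 and z0 <= z1):
            raise ValueError("box must satisfy x0<=x1, y0<=y1, z0<=z1")
        used.append("box")

    # Assumption: a forward pass needs at least one prompt.
    if not used:
        raise ValueError("supply at least one of: text, points, box")
    return used
```

For example, `check_prompts(points=[(10, 120, 130)], labels=[1], box=(0, 100, 100, 20, 160, 160))` returns `["points", "box"]`, while a call with mismatched point and label counts raises `ValueError`.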
Usage
from ilex.models.segvol import SegVol
model = SegVol.from_pretrained('ilex-hub/segvol.1')
Authors
Du Y., Bai F., Huang T., Zhao B.
Citation
Du Y., Bai F., Huang T., Zhao B. (2024). SegVol: Universal and Interactive Volumetric Medical Image Segmentation. NeurIPS 2024. arXiv:2311.13385. Built on Kirillov A., et al. (2023), Segment Anything, ICCV 2023, arXiv:2304.02643.
References
- Du Y., Bai F., Huang T., Zhao B. (2024). SegVol: Universal and Interactive Volumetric Medical Image Segmentation. NeurIPS 2024. arXiv:2311.13385.
- Kirillov A., Mintun E., Ravi N., et al. (2023). Segment Anything. ICCV 2023. arXiv:2304.02643.
- Weights + code: https://huggingface.co/BAAI/SegVol ; https://github.com/BAAI-DCAI/SegVol
License
HF Hub license tag: mit
Effective terms: MIT (the SegVol authors, BAAI) on both the network code and the released BAAI/SegVol weights. The underlying Segment Anything design is Meta's (Apache-2.0); the text encoder is the HuggingFace transformers CLIPTextModel. No commercial restrictions; no gating required. The ilex JAX / Equinox port code is separately dual-licensed under Apache-2.0 / GPL-3.0.
Upstream license reference: https://github.com/BAAI-DCAI/SegVol/blob/main/LICENSE
Copyright
Network architecture and pretrained weights: copyright (c) the SegVol authors (BAAI), released under the MIT License. The underlying Segment Anything design is Meta's (Apache-2.0); the CLIP text encoder is the HuggingFace transformers CLIPTextModel. JAX / Equinox port: copyright (c) the ilex authors, released under the Apache-2.0 / GPL-3.0 dual license used by ilex itself.
Upstream source
Original weights / reference implementation: https://github.com/BAAI-DCAI/SegVol
Provenance
This artefact was produced by ilex's
save/load pipeline. The architecture is implemented in
ilex.models.segvol.SegVol and the weights have been converted
from their upstream format. See the upstream source above
for the canonical reference.