SegVol (text/point/box-promptable 3D segmentation) -- SegVol (MONAI ViT + CLIP text + SAM decoder)

Description

This is a JAX / Equinox port of SegVol from the upstream PyTorch release. SegVol is a SAM-style promptable segmentation model for volumetric medical images that accepts text (semantic) prompts via a CLIP text encoder in addition to point and box (spatial) prompts. It pairs a MONAI ViT image encoder (perceptron patch embedding) with a SAM prompt encoder and a two-way transformer mask decoder whose mask logits are fused with a text-aligned similarity map. Forward pass: one volume plus any combination of text token ids, a point set, and a box yields mask logits at the input resolution.
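The fusion of mask logits with a text-aligned similarity map can be illustrated with a minimal sketch. It assumes the fusion is additive and that the text embedding has already been projected to the channel width of the decoder's upscaled feature map. Neither detail is stated in this card, and the function name `fuse_text_similarity`, the argument names, and the array layouts are hypothetical. NumPy is used for portability; `jax.numpy` offers the same `einsum` call.

```python
import numpy as np


def fuse_text_similarity(mask_logits, upscaled_embedding, text_embedding):
    """Add a text-similarity map to decoder mask logits (illustrative only).

    mask_logits:        (D, H, W) logits from the mask decoder
    upscaled_embedding: (C, D, H, W) per-voxel decoder features
    text_embedding:     (C,) text feature, assumed already projected to C channels
    """
    if upscaled_embedding.shape[1:] != mask_logits.shape:
        raise ValueError("spatial shapes of features and logits must match")
    if text_embedding.shape != (upscaled_embedding.shape[0],):
        raise ValueError("text embedding must have one value per feature channel")
    # Dot product between the text feature and every voxel's feature vector.
    similarity = np.einsum("c,cdhw->dhw", text_embedding, upscaled_embedding)
    # Assumption: the similarity map is simply added to the mask logits.
    return mask_logits + similarity


# Toy shapes only; real feature maps are far larger.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 4)).astype(np.float32)
features = rng.normal(size=(8, 2, 4, 4)).astype(np.float32)
text = rng.normal(size=(8,)).astype(np.float32)
fused = fuse_text_similarity(logits, features, text)
print(fused.shape)
```

Under this assumption, voxels whose features align with the text embedding receive a higher logit, so the text prompt can steer the mask even when no spatial prompt is given.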

Intended use

Promptable 3D segmentation of a single-channel medical volume (intensity-normalised and resampled to 32x256x256 by the upstream processor). Prompts may be any combination of: CLIP token ids of shape (L,) for a text prompt; point coordinates of shape (N, 3) with labels of shape (N,), where 1 = foreground and 0 = background; and a box of shape (6,) given as [x0, y0, z0, x1, y1, z1]. The model returns a single foreground mask-logit volume at the input resolution. String-to-token tokenisation (CLIP BPE) and the zoom-in/zoom-out sliding-window refinement are preprocessing and inference concerns handled outside the model.
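The prompt shapes listed above can be checked before calling the model. The following is a minimal sketch of such a check. The helper `make_prompts` and its dictionary keys are hypothetical and not part of ilex. The requirement of at least one prompt and the ordering of box corners (minimum corner first) are assumptions, and the token ids are placeholders rather than a real CLIP tokenisation.

```python
import numpy as np


def make_prompts(token_ids=None, points=None, point_labels=None, box=None):
    """Validate prompt shapes and pack them into a dict (hypothetical helper)."""
    prompts = {}
    if token_ids is not None:
        token_ids = np.asarray(token_ids, dtype=np.int32)
        if token_ids.ndim != 1:
            raise ValueError("token_ids must have shape (L,)")
        prompts["token_ids"] = token_ids
    if points is not None:
        points = np.asarray(points, dtype=np.float32)
        if points.ndim != 2 or points.shape[1] != 3:
            raise ValueError("points must have shape (N, 3)")
        if point_labels is None:
            raise ValueError("points require point_labels of shape (N,)")
        labels = np.asarray(point_labels, dtype=np.int32)
        if labels.shape != (points.shape[0],):
            raise ValueError("point_labels must have shape (N,)")
        if not np.isin(labels, (0, 1)).all():
            raise ValueError("labels must be 1 (foreground) or 0 (background)")
        prompts["points"] = points
        prompts["point_labels"] = labels
    elif point_labels is not None:
        raise ValueError("point_labels given without points")
    if box is not None:
        box = np.asarray(box, dtype=np.float32)
        if box.shape != (6,):
            raise ValueError("box must have shape (6,): [x0, y0, z0, x1, y1, z1]")
        # Assumption: the first corner is the minimum corner.
        if not (box[:3] <= box[3:]).all():
            raise ValueError("box corners must satisfy x0<=x1, y0<=y1, z0<=z1")
        prompts["box"] = box
    if not prompts:
        # Assumption: an empty prompt set is treated as an error here.
        raise ValueError("provide at least one prompt")
    return prompts


prompts = make_prompts(
    token_ids=[49406, 1, 49407],  # placeholder ids, not a real tokenisation
    points=[[10, 20, 30], [5, 5, 5]],
    point_labels=[1, 0],
    box=[0, 0, 0, 10, 20, 30],
)
print({name: value.shape for name, value in prompts.items()})
```

How these arrays are passed to the model depends on the call signature of ilex.models.segvol.SegVol, which this card does not document.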

Usage

from ilex.models.segvol import SegVol
model = SegVol.from_pretrained('ilex-hub/segvol.1')

Authors

Du Y., Bai F., Huang T., Zhao B.

Citation

Du Y., Bai F., Huang T., Zhao B. (2024). SegVol: Universal and Interactive Volumetric Medical Image Segmentation. NeurIPS 2024. arXiv:2311.13385. Built on Kirillov A., et al. (2023), Segment Anything, ICCV 2023, arXiv:2304.02643.

License

HF Hub license tag: mit

Effective terms: MIT (the SegVol authors, BAAI) on both the network code and the released BAAI/SegVol weights. The underlying Segment Anything design is Meta's (Apache-2.0); the text encoder is the HuggingFace transformers CLIPTextModel. No commercial restrictions; no gating required. The ilex JAX / Equinox port code is separately licensed under Apache-2.0 / GPL-3.0.

Upstream license reference: https://github.com/BAAI-DCAI/SegVol/blob/main/LICENSE

Copyright

Network architecture and pretrained weights: copyright (c) the SegVol authors (BAAI), released under the MIT License. The underlying Segment Anything design is Meta's (Apache-2.0); the CLIP text encoder is the HuggingFace transformers CLIPTextModel. JAX / Equinox port: copyright (c) the ilex authors, released under the Apache-2.0 / GPL-3.0 dual license used by ilex itself.

Upstream source

Original weights / reference implementation: https://github.com/BAAI-DCAI/SegVol

Provenance

This artefact was produced by ilex's save/load pipeline. The architecture is implemented in ilex.models.segvol.SegVol and the weights have been converted from their upstream format. See the upstream source above for the canonical reference.

Model size: 0.2B params (Safetensors, tensor type F32)