| <div align="center"> |
|
|
| <h1 align="center">SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation</h1> |
|
|
| **Yujie Lu<sup>1*</sup>, Jingwen Li<sup>2*</sup>, Sibo Ju<sup>3</sup>, Yanzhou Su<sup>4</sup>, He Yao<sup>1</sup>, Yisong Liu<sup>1</sup>, Min Zhu<sup>1†</sup>, Junlong Cheng<sup>1†</sup>** |
| |
| <sup>1</sup>Sichuan University |
| <sup>2</sup>Xinjiang University |
| <sup>3</sup>Fuzhou University |
| <sup>4</sup>Alibaba DAMO Academy |
| |
| |
| **CVPR 2026 (Oral)** |
|
|
| [arXiv](https://arxiv.org/abs/2602.19213) |
| [Installation](#installation) |
| [Checkpoint](#checkpoint) |
|
|
| </div> |
|
|
| ## Abstract |
|
|
| Medical image segmentation requires robust adaptation across heterogeneous |
| modalities and anatomical structures, while pixel-level annotation remains |
| expensive. SegMoTE is an efficient adaptation framework built on the Segment |
| Anything Model (SAM). It introduces a **token-level Mixture of Experts |
| (MoTE)** mechanism that dynamically selects modality-adaptive expert tokens, |
| and a **Progressive Prompt Tokenization (PPT)** module that learns |
| feature-conditioned prompts for prompt-free segmentation on suitable |
| foreground-background tasks. Trained on the curated **MedSeg-HQ** dataset, |
| SegMoTE aims to retain the flexible prompt interface and generalization ability |
| of SAM while providing lightweight adaptation for multimodal medical image |
| segmentation. |
|
|
|
|
| ## Citation |
|
|
| The BibTeX entry will be updated after the public paper record is available: |
|
|
| ```bibtex |
| @article{lu2026segmote, |
| title={SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation}, |
| author={Lu, Yujie and Li, Jingwen and Ju, Sibo and Su, Yanzhou and Yao, He and Liu, Yisong and Zhu, Min and Cheng, Junlong}, |
| journal={arXiv preprint arXiv:2602.19213}, |
| year={2026} |
| } |
| ``` |
|
|
|
|