<div align="center">

<h1 align="center">SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation</h1>

**Yujie Lu<sup>1*</sup>, Jingwen Li<sup>2*</sup>, Sibo Ju<sup>3</sup>, Yanzhou Su<sup>4</sup>, He Yao<sup>1</sup>, Yisong Liu<sup>1</sup>, Min Zhu<sup>1&dagger;</sup>, Junlong Cheng<sup>1&dagger;</sup>**

<sup>1</sup>Sichuan University &nbsp;&nbsp;
<sup>2</sup>Xinjiang University &nbsp;&nbsp;
<sup>3</sup>Fuzhou University &nbsp;&nbsp;
<sup>4</sup>Alibaba DAMO Academy


**CVPR 2026 (Oral)**

[![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b.svg)](https://arxiv.org/abs/2602.19213)
[![Code](https://img.shields.io/badge/Code-PyTorch-blue.svg)](#installation)
[![Model](https://img.shields.io/badge/Model-Download-orange.svg)](#checkpoint)

</div>

## Abstract

Medical image segmentation requires robust adaptation across heterogeneous
modalities and anatomical structures, while pixel-level annotation remains
expensive. SegMoTE is an efficient adaptation framework built on the Segment
Anything Model (SAM). It introduces a **token-level Mixture of Experts
(MoTE)** mechanism that dynamically selects modality-adaptive expert tokens,
and a **Progressive Prompt Tokenization (PPT)** module that learns
feature-conditioned prompts for prompt-free segmentation on suitable
foreground-background tasks. Trained on the curated **MedSeg-HQ** dataset,
SegMoTE aims to retain the flexible prompt interface and generalization ability
of SAM while providing lightweight adaptation for multimodal medical image
segmentation.
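
As a rough illustration of what token-level expert selection can look like, the sketch below routes a single image token to a small pool of expert tokens and mixes the selected ones. It is not the SegMoTE implementation: the function `route_expert_tokens`, the per-expert gate vectors, and the top-k softmax gating are assumptions borrowed from standard Mixture-of-Experts routing, and the code uses plain Python rather than PyTorch so that the arithmetic stays visible.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_expert_tokens(image_token, gate_weights, expert_tokens, top_k=2):
    """Select top_k expert tokens for one image token and mix them.

    image_token:   feature vector of length d
    gate_weights:  one gate vector of length d per expert (hypothetical)
    expert_tokens: one learnable token of length d per expert (hypothetical)
    """
    # Score every expert against the image token.
    logits = [dot(image_token, w) for w in gate_weights]
    # Keep the indices of the top_k highest-scoring experts.
    selected = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Normalize the scores of the selected experts only.
    weights = softmax([logits[i] for i in selected])
    # Weighted sum of the selected expert tokens.
    d = len(image_token)
    mixed = [
        sum(w * expert_tokens[i][j] for w, i in zip(weights, selected))
        for j in range(d)
    ]
    return selected, weights, mixed


# Toy example: 3 experts, feature dimension 2 (all values are made up).
image_token = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
expert_tokens = [[1.0, 1.0], [5.0, 5.0], [3.0, 3.0]]
selected, weights, mixed = route_expert_tokens(
    image_token, gate_weights, expert_tokens, top_k=2
)
```

In this toy setup the router keeps experts 0 and 2 and ignores expert 1, so the mixed token depends only on the experts that score highest for the given input. In a trained model the gate vectors and expert tokens would be learned parameters.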


## Citation

The BibTeX entry will be updated after the public paper record is available:

```bibtex
@article{lu2026segmote,
  title={SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation},
  author={Lu, Yujie and Li, Jingwen and Ju, Sibo and Su, Yanzhou and Yao, He and Liu, Yisong and Zhu, Min and Cheng, Junlong},
  journal={arXiv preprint arXiv:2602.19213},
  year={2026}
}
```