arxiv:2602.04105

Expert Selections In MoE Models Reveal (Almost) As Much As Text

Published on Feb 4

AI-generated summary

A text-reconstruction attack on mixture-of-experts language models shows that expert-selection routing reveals substantial information about the underlying tokens, with a transformer-based sequence decoder recovering tokens at high accuracy.

Abstract

We present a text-reconstruction attack on mixture-of-experts (MoE) language models that recovers tokens from expert selections alone. In MoE models, each token is routed to a subset of expert subnetworks; we show these routing decisions leak substantially more information than previously understood. Prior work using logistic regression achieves limited reconstruction; we show that a 3-layer MLP improves this to 63.1% top-1 accuracy, and that a transformer-based sequence decoder recovers 91.2% of tokens top-1 (94.8% top-10) on 32-token sequences from OpenWebText after training on 100M tokens. These results connect MoE routing to the broader literature on embedding inversion. We outline practical leakage scenarios (e.g., distributed inference and side channels) and show that adding noise reduces but does not eliminate reconstruction. Our findings suggest that expert selections in MoE deployments should be treated as no less sensitive than the underlying text.
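
To make the attack surface concrete, here is a minimal, illustrative sketch of the simplest variant the abstract describes: treating each token's per-layer expert selections as a multi-hot feature vector and training a small MLP to map routing vectors back to token ids. This is not the paper's code; the toy vocabulary, the synthetic routing generator (`synthetic_routing`), the network sizes, and all hyperparameters below are assumptions chosen only to keep the example self-contained and runnable.

```python
# Hedged sketch (not the paper's implementation): predict token ids from
# per-token expert-selection vectors with a small MLP, using synthetic data
# as a stand-in for routing traces captured from a real MoE model.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000        # assumed toy vocabulary
NUM_LAYERS = 8           # assumed number of MoE layers
NUM_EXPERTS = 64         # assumed experts per MoE layer
TOP_K = 2                # assumed experts activated per token per layer
FEAT_DIM = NUM_LAYERS * NUM_EXPERTS  # multi-hot routing vector per token

def synthetic_routing(num_tokens):
    """Stand-in for captured routing traces: each token id maps (with a
    little noise) to a set of selected experts in every MoE layer."""
    tokens = torch.randint(0, VOCAB_SIZE, (num_tokens,))
    feats = torch.zeros(num_tokens, FEAT_DIM)
    for layer in range(NUM_LAYERS):
        # fake "router": expert choice depends on token id, layer, and noise
        base = (tokens * (layer + 3)) % NUM_EXPERTS
        for k in range(TOP_K):
            expert = (base + k + torch.randint(0, 2, (num_tokens,))) % NUM_EXPERTS
            feats[torch.arange(num_tokens), layer * NUM_EXPERTS + expert] = 1.0
    return feats, tokens

# Small MLP decoder: routing vector -> token id (analogous in spirit to the
# 3-layer MLP baseline mentioned in the abstract, not a reproduction of it).
mlp = nn.Sequential(
    nn.Linear(FEAT_DIM, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, VOCAB_SIZE),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

train_x, train_y = synthetic_routing(20000)
test_x, test_y = synthetic_routing(2000)

for step in range(200):
    idx = torch.randint(0, train_x.shape[0], (256,))
    loss = loss_fn(mlp(train_x[idx]), train_y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    top1 = (mlp(test_x).argmax(dim=-1) == test_y).float().mean()
print(f"top-1 token recovery on synthetic routing traces: {top1:.3f}")
```

A sequence-level decoder like the transformer described in the abstract would go further, conditioning on the routing vectors of neighboring tokens rather than decoding each token independently, which is what drives the reported jump from 63.1% to 91.2% top-1 accuracy.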
