MUX: Continuous Reasoning via Multiplexed Tokens

Ayhan Suleymanzade¹ Halil Alperen Gozeten² Michael Bronstein¹,³ Ismail Ilkan Ceylan¹,⁴† Jinwoo Kim⁵†

¹AITHYRA ²University of Michigan ³University of Oxford ⁴TU Wien ⁵KAIST †Equal advising

Overview

MUX compresses discrete chain-of-thought reasoning into continuous multiplexed tokens. Each latent token is trained to represent a weighted linear superposition of a span of discrete reasoning subwords via KL-divergence distillation.
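
To make the superposition objective concrete, here is a minimal pure-Python sketch. It assumes the target for one latent token is a normalized weighted mix of the one-hot distributions of the tokens in its span, and that the loss is KL(target || predicted). The card does not state the weighting scheme or the KL direction, so both are assumptions, and the function names (`superposition_target`, `softmax`, `kl_divergence`) are hypothetical rather than taken from the MUX codebase.

```python
import math


def superposition_target(span_token_ids, weights, vocab_size):
    """Mix the one-hot distributions of a span's tokens into one target distribution.

    Assumption: weights are non-negative and are normalized to sum to 1 here.
    """
    if len(span_token_ids) != len(weights):
        raise ValueError("need exactly one weight per token in the span")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    target = [0.0] * vocab_size
    for token_id, weight in zip(span_token_ids, weights):
        target[token_id] += weight / total  # repeated tokens accumulate mass
    return target


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy example: a 3-token reasoning span over a 5-token vocabulary.
vocab_size = 5
target = superposition_target([1, 3, 3], [0.5, 0.25, 0.25], vocab_size)
predicted = softmax([0.0, 2.0, 0.0, 2.0, 0.0])  # stand-in for the latent token's output logits
loss = kl_divergence(target, predicted)
print(target, loss)
```

In this toy case the span's probability mass is split evenly between tokens 1 and 3, and the loss shrinks toward zero as the predicted distribution concentrates on those two tokens.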

Available Checkpoints

File	Base Model	Dataset	Latent Tokens	GSM8K Accuracy
`mux-gpt2-gsm8k.bin`	GPT-2 (124M)	GSM8K-AUG	6	48.45%
`mux-llama1b-gsm8k.bin`	LLaMA 3.2 1B-Instruct	GSM8K-AUG	6	57.01%

Usage

# Clone the MUX codebase
git clone https://github.com/<your-org>/MUX.git
cd MUX

# Download checkpoint
# Option 1: huggingface-cli
huggingface-cli download MisakiTaro0414/MUX mux-llama1b-gsm8k.bin --local-dir .

# Option 2: direct download
wget https://huggingface.co/MisakiTaro0414/MUX/resolve/main/mux-llama1b-gsm8k.bin

# Evaluate
CKPT_DIR=. bash scripts/eval_llama1b.sh
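
The direct-download URL above follows the Hub's `resolve/<revision>/<filename>` pattern, so the same link can be built for either checkpoint in the table. The sketch below only constructs the URL and performs no network access; the helper name `checkpoint_url` is hypothetical and not part of the MUX codebase.

```python
from urllib.parse import quote

REPO_ID = "MisakiTaro0414/MUX"  # repository used in the download commands above
CHECKPOINTS = ("mux-gpt2-gsm8k.bin", "mux-llama1b-gsm8k.bin")  # files listed in the table


def checkpoint_url(filename, repo_id=REPO_ID, revision="main"):
    """Build the direct-download URL for one of the listed checkpoints."""
    if filename not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {filename!r}")
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


for name in CHECKPOINTS:
    print(checkpoint_url(name))
```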

Training Details

Training data: whynlp/gsm8k-aug (385K augmented GSM8K samples)
Optimization: LoRA (r=128, alpha=32) with cosine LR schedule
Loss: Superposition KL-divergence distillation + reference loss
Hardware: NVIDIA H100 GPUs
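
As a rough illustration of what these hyperparameters imply, the sketch below computes the LoRA scaling factor and a cosine learning-rate curve. It assumes the standard LoRA convention (update scaled by alpha / r) and a plain cosine decay without warmup. The peak learning rate, step count, and the 768-wide layer in the example are placeholders, since the card does not list them or say which modules are adapted. All function names are hypothetical.

```python
import math

LORA_RANK = 128   # r, from the model card
LORA_ALPHA = 32   # alpha, from the model card


def lora_scaling(alpha, rank):
    """Scaling applied to the low-rank update under the standard LoRA convention."""
    return alpha / rank


def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter on a d_in x d_out weight."""
    return rank * (d_in + d_out)


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


print("LoRA scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
# Placeholder layer size: a square 768 x 768 projection.
print("extra params:", lora_extra_params(768, 768, LORA_RANK))
# Placeholder schedule: 1000 steps with a peak learning rate of 1e-4.
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_lr(step, total_steps=1000, peak_lr=1e-4))
```

With alpha smaller than r, the scaling factor is below 1, so the low-rank update is damped relative to an unscaled adapter.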

Citation

@article{suleymanzade2025mux,
  title   = {MUX: Continuous Reasoning via Multiplexed Tokens},
  author  = {Suleymanzade, Ayhan and Gozeten, Halil Alperen and Bronstein, Michael and Ceylan, {\.I}smail {\.I}lkan and Kim, Jinwoo},
  journal = {arXiv preprint},
  year    = {2025}
}

License

Apache 2.0
