MUX: Continuous Reasoning via Multiplexed Tokens

Ayhan Suleymanzade1   Halil Alperen Gozeten2   Michael Bronstein1,3   Ismail Ilkan Ceylan1,4†   Jinwoo Kim5†

1AITHYRA   2University of Michigan   3University of Oxford   4TU Wien   5KAIST   †Equal advising

Overview

MUX compresses discrete chain-of-thought reasoning into continuous multiplexed tokens. Each latent token is trained to represent a weighted linear superposition of a span of discrete reasoning subwords via KL-divergence distillation.
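The superposition objective described above can be illustrated with a small, dependency-free sketch. This is not the MUX implementation: the function names, the toy vocabulary size, the span weights, and the KL direction (target relative to prediction) are all assumptions made for illustration. The idea shown is that a span of discrete tokens is turned into one target distribution (a weighted mixture of one-hot vectors), and the distribution predicted at a latent token is pulled toward it with a KL term.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def superposition_target(token_ids, weights, vocab_size):
    """Weighted mixture of one-hot distributions for the tokens in a span.

    The weights are normalized so the result sums to 1. How MUX actually
    chooses the weights is not stated in this card (assumption).
    """
    total = sum(weights)
    target = [0.0] * vocab_size
    for tok, w in zip(token_ids, weights):
        target[tok] += w / total
    return target


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), skipping entries where p is zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


# Toy example: a 3-token span over a hypothetical 8-token vocabulary.
span = [2, 5, 2]
weights = [0.5, 0.3, 0.2]
target = superposition_target(span, weights, vocab_size=8)

# Stand-in for the logits a model would predict at one latent token.
logits = [0.0] * 8
loss = kl_divergence(target, softmax(logits))
```

In this toy case the repeated token 2 accumulates weight 0.7 and token 5 gets 0.3, so a single latent position is asked to carry both subwords at once. The loss reaches zero only when the predicted distribution matches that mixture.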

Available Checkpoints

File                    Base Model               Dataset     Latent Tokens   GSM8K Accuracy
mux-gpt2-gsm8k.bin      GPT-2 (124M)             GSM8K-AUG   6               48.45%
mux-llama1b-gsm8k.bin   LLaMA 3.2 1B-Instruct    GSM8K-AUG   6               57.01%

Usage

# Clone the MUX codebase
git clone https://github.com/<your-org>/MUX.git
cd MUX

# Download checkpoint
# Option 1: huggingface-cli
huggingface-cli download MisakiTaro0414/MUX mux-llama1b-gsm8k.bin --local-dir .

# Option 2: direct download
wget https://huggingface.co/MisakiTaro0414/MUX/resolve/main/mux-llama1b-gsm8k.bin

# Evaluate
CKPT_DIR=. bash scripts/eval_llama1b.sh

Training Details

  • Training data: whynlp/gsm8k-aug (385K augmented GSM8K samples)
  • Optimization: LoRA (r=128, alpha=32) with cosine LR schedule
  • Loss: Superposition KL-divergence distillation + reference loss
  • Hardware: NVIDIA H100 GPUs
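To make the optimization settings above concrete, here is a minimal sketch of what r=128, alpha=32 and a cosine schedule imply numerically. It assumes the conventional LoRA scaling of alpha / r applied to the low-rank update and a plain cosine decay without warmup. The peak learning rate, the minimum learning rate, and the step count are hypothetical placeholders, since this card does not state them.

```python
import math

# Values taken from the training details above.
LORA_R = 128
LORA_ALPHA = 32

# Conventional LoRA scaling applied to the low-rank update (assumption:
# the standard alpha / r rule, not a rank-stabilized variant).
LORA_SCALE = LORA_ALPHA / LORA_R


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps.

    No warmup phase is modeled; whether MUX uses one is not stated.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical values for illustration only.
PEAK_LR = 1e-4
TOTAL_STEPS = 1000
schedule = [cosine_lr(s, TOTAL_STEPS, PEAK_LR) for s in (0, 500, 1000)]
```

With alpha smaller than r, the scaling factor is below 1, so the low-rank update is damped relative to its raw magnitude.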

Citation

@article{suleymanzade2025mux,
  title   = {MUX: Continuous Reasoning via Multiplexed Tokens},
  author  = {Suleymanzade, Ayhan and Gozeten, Halil Alperen and Bronstein, Michael and Ceylan, {\.I}smail {\.I}lkan and Kim, Jinwoo},
  journal = {arXiv preprint},
  year    = {2025}
}

License

Apache 2.0
