Ayhan Suleymanzade¹, Halil Alperen Gozeten², Michael Bronstein¹,³, Ismail Ilkan Ceylan¹,⁴†, Jinwoo Kim⁵†
¹AITHYRA, ²University of Michigan, ³University of Oxford, ⁴TU Wien, ⁵KAIST. †Equal advising
MUX compresses discrete chain-of-thought reasoning into continuous multiplexed tokens. Each latent token is trained to represent a weighted linear superposition of a span of discrete reasoning subwords via KL-divergence distillation.
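As a rough illustration of the training signal described above, the sketch below builds a soft target from a span of subword ids and scores a latent token's predicted distribution against it with a KL divergence. This is not the MUX implementation: the function names, the toy vocabulary, the span weights, and the direction of the KL term are all assumptions made for the example, and the superposition is modeled here as a distribution over the vocabulary rather than in embedding space.

```python
import math

def superposition_target(span_token_ids, weights, vocab_size):
    """Hypothetical helper: build a soft target over the vocabulary in which
    each subword of the span contributes its normalized weight."""
    total = sum(weights)
    target = [0.0] * vocab_size
    for tok, w in zip(span_token_ids, weights):
        target[tok] += w / total
    return target

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); entries with zero target mass contribute nothing."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0.0
    )

# Toy setup (assumed values): a 5-subword vocabulary and a 3-token reasoning span.
span = [1, 3, 3]
weights = [0.5, 0.25, 0.25]
target = superposition_target(span, weights, vocab_size=5)

# Assumed logits that the model emits for one latent token.
latent_logits = [0.0, 2.0, 0.0, 2.0, 0.0]
loss = kl_divergence(target, softmax(latent_logits))
print(f"distillation loss for this latent token: {loss:.4f}")
```

The loss drops to zero only when the latent token's distribution matches the weighted mixture exactly, which is the sense in which a single latent token can stand in for several discrete subwords.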
| File | Base Model | Dataset | Latent Tokens | GSM8K Accuracy |
|---|---|---|---|---|
| `mux-gpt2-gsm8k.bin` | GPT-2 (124M) | GSM8K-AUG | 6 | 48.45% |
| `mux-llama1b-gsm8k.bin` | LLaMA 3.2 1B-Instruct | GSM8K-AUG | 6 | 57.01% |
```bash
# Clone the MUX codebase
git clone https://github.com/<your-org>/MUX.git
cd MUX

# Download the checkpoint
# Option 1: huggingface-cli
huggingface-cli download MisakiTaro0414/MUX mux-llama1b-gsm8k.bin --local-dir .
# Option 2: direct download
wget https://huggingface.co/MisakiTaro0414/MUX/resolve/main/mux-llama1b-gsm8k.bin

# Evaluate
CKPT_DIR=. bash scripts/eval_llama1b.sh
```
```bibtex
@article{suleymanzade2025mux,
  title   = {MUX: Continuous Reasoning via Multiplexed Tokens},
  author  = {Suleymanzade, Ayhan and Gozeten, Halil Alperen and Bronstein, Michael and Ceylan, {\.I}smail {\.I}lkan and Kim, Jinwoo},
  journal = {arXiv preprint},
  year    = {2025}
}
```
License: Apache 2.0