Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers • Paper • 2601.04890 • Published 18 days ago • 41
Nested Learning: The Illusion of Deep Learning Architectures • Paper • 2512.24695 • Published 26 days ago • 40
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss • Paper • 2512.23447 • Published 28 days ago • 95
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space • Paper • 2512.24617 • Published 26 days ago • 61
Improving Recursive Transformers with Mixture of LoRAs • Paper • 2512.12880 • Published Dec 14, 2025 • 6
SPICE: Self-Play In Corpus Environments Improves Reasoning • Paper • 2510.24684 • Published Oct 28, 2025 • 18
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations • Paper • 2512.14080 • Published Dec 16, 2025 • 8
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers • Paper • 2512.16615 • Published Dec 18, 2025 • 5
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding • Paper • 2512.13586 • Published Dec 15, 2025 • 92