Redesign Mixture-of-Experts Routers with Manifold Power Iteration Paper • 2606.12397 • Published 20 days ago • 89
PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration Paper • 2502.00527 • Published Feb 1, 2025 • 3
Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings Paper • 2606.07502 • Published 25 days ago • 99
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published May 20 • 207
LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model Paper • 2603.01068 • Published Mar 1 • 22
Article • The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs • SII-xrliu • Nov 15, 2025 • 15
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published Dec 29, 2025 • 100
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Paper • 2512.13586 • Published Dec 15, 2025 • 93