Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning Paper • 2606.18831 • Published 9 days ago • 6
Rethinking the Role of Efficient Attention in Hybrid Architectures Paper • 2606.15378 • Published 13 days ago • 17
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation Paper • 2509.24663 • Published Sep 29, 2025 • 18
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 28, 2025 • 119
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published Nov 4, 2024 • 11
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published Sep 4, 2024 • 32
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9, 2024 • 26
Advancing LLM Reasoning Generalists with Preference Trees Paper • 2404.02078 • Published Apr 2, 2024 • 46