Test-Time Training with KV Binding Is Secretly Linear Attention • Paper • 2602.21204 • Published 17 days ago • 30
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking • Paper • 2602.21196 • Published 17 days ago • 5
2Mamba2Furious: Linear in Complexity, Competitive in Accuracy • Paper • 2602.17363 • Published 22 days ago • 8