Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents Paper • 2606.06036 • Published 18 days ago • 71
Skip a Layer or Loop It? Learning Program-of-Layers in LLMs Paper • 2606.06574 • Published 18 days ago • 23
Redesign Mixture-of-Experts Routers with Manifold Power Iteration Paper • 2606.12397 • Published 12 days ago • 87
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It Paper • 2606.11052 • Published 13 days ago • 16
FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention Paper • 2606.09079 • Published 14 days ago • 62
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 517 • 43
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 517
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding Paper • 2605.29707 • Published 25 days ago • 147
NITP: Next Implicit Token Prediction for LLM Pre-training Paper • 2605.24956 • Published 29 days ago • 35
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws Paper • 2605.23901 • Published about 1 month ago • 13
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention Paper • 2605.22791 • Published May 21 • 33