Where does output diversity collapse in post-training? Paper • 2604.16027 • Published 24 days ago • 22
Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation Paper • 2602.02007 • Published Feb 2 • 19
Chain of Thought Compression: A Theoretical Analysis Paper • 2601.21576 • Published Jan 29 • 20
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25, 2025 • 28 • 3
The Smol Training Playbook 📚 Featured Space • 3.16k • The secrets to building world-class LLMs
Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time Paper • 2502.19230 • Published Feb 26, 2025 • 2
EnigmaToM: Improve LLMs' Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States Paper • 2503.03340 • Published Mar 5, 2025 • 1
DARS Collection Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time • 4 items • Updated Oct 22, 2025
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published Oct 13, 2025 • 52