Chain of Thought Compression: A Theoretical Analysis Paper • 2601.21576 • Published 7 days ago • 13
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space Paper • 2511.20102 • Published Nov 25, 2025 • 27 • 3
The Smol Training Playbook • 2.95k • The secrets to building world-class LLMs
Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time Paper • 2502.19230 • Published Feb 26, 2025 • 2
EnigmaToM: Improve LLMs' Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States Paper • 2503.03340 • Published Mar 5, 2025 • 1
DARS Collection Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time • 4 items • Updated Oct 22, 2025
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published Oct 13, 2025 • 52
Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance Paper • 2510.03528 • Published Oct 3, 2025 • 19
IntrEx: A Dataset for Modeling Engagement in Educational Conversations Paper • 2509.06652 • Published Sep 8, 2025 • 26