Reasoning Shift: How Context Silently Shortens LLM Reasoning • Paper • 2604.01161 • Published 15 days ago • 31
The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning • Paper • 2604.06427 • Published 10 days ago • 11
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation • Paper • 2604.09497 • Published 7 days ago • 26
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness • Paper • 2604.12373 • Published 3 days ago • 7
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts • Paper • 2604.12978 • Published 3 days ago • 5