Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces Paper • 2604.08362 • Published 7 days ago • 15
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published 14 days ago • 470
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published 13 days ago • 354
VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Paper • 2604.02486 • Published 14 days ago • 9
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 216
Video Models Reason Early: Exploiting Plan Commitment for Maze Solving Paper • 2603.30043 • Published 16 days ago • 14