Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 1 day ago • 39
Article Ulysses Sequence Parallelism: Training with Million-Token Contexts 5 days ago • 17
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA Paper • 2504.10419 • Published Apr 14, 2025 • 3
In Case You Missed It: ARC 'Challenge' Is Not That Challenging Paper • 2412.17758 • Published Dec 23, 2024 • 17
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists Paper • 2410.23331 • Published Oct 30, 2024 • 8