VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation • Paper • 2605.16079 • Published May 15 • 29
DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis • Paper • 2604.13416 • Published 4 days ago • 28
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games • Paper • 2606.19338 • Published 5 days ago • 46
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark • Paper • 2605.12501 • Published May 12 • 16
ClawArena: Benchmarking AI Agents in Evolving Information Environments • Paper • 2604.04202 • Published Apr 5 • 37
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections • Paper • 2603.12180 • Published Mar 12 • 65
Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments • Paper • 2603.23638 • Published Mar 24 • 11
TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning • Paper • 2603.12529 • Published Mar 13 • 19
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models • Paper • 2603.27481 • Published Mar 29 • 35
✂️ Abliteration • Collection • Uncensored models using abliteration. See this article for more information: huggingface.co/blog/mlabonne/abliteration • 32 items • Updated Mar 2 • 169
🧠 Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 190