Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 6 days ago • 101
VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration Paper • 2602.04587 • Published Feb 4
CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents Paper • 2511.20216 • Published Nov 25, 2025
Team HUMANE at AVeriTeC 2025: HerO 2 for Efficient Fact Verification Paper • 2507.11004 • Published Jul 15, 2025 • 1
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published Feb 6 • 35
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published Feb 6 • 23