HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published about 20 hours ago • 14
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining Paper • 2603.11103 • Published 22 days ago • 8
AHAN: Asymmetric Hierarchical Attention Network for Identical Twin Face Verification Paper • 2602.21503 • Published Feb 25
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8, 2025 • 45
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2, 2025 • 127
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation Paper • 2410.00371 • Published Oct 1, 2024
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published Jun 16, 2025 • 43
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 35
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23, 2025 • 23
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published Dec 10, 2024 • 36