TOBench: A Task-Oriented Omni-Modal Benchmark for Real-World Tool-Using Agents Paper • 2605.16909 • Published May 16 • 9
Supercharge your OCR Pipelines with Open Models Article • merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Published Oct 21, 2025 • 315
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus Paper • 2510.03160 • Published Oct 3, 2025 • 4
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems Paper • 2508.07407 • Published Aug 10, 2025 • 99
Arch-Router: Aligning LLM Routing with Human Preferences Paper • 2506.16655 • Published Jun 19, 2025 • 18
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science Paper • 2506.10974 • Published Jun 12, 2025 • 19
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published Jun 12, 2025 • 75
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 278
Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights Paper • 2506.02865 • Published Jun 3, 2025 • 34
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published Jun 1, 2025 • 32
IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection Paper • 2506.00979 • Published Jun 1, 2025 • 12
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25, 2025 • 146