VisualClaw: A Real-Time, Personalized Agent for the Physical World Paper • 2606.16295 • Published 18 days ago • 28
Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams Paper • 2606.01770 • Published Jun 1 • 12
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents Paper • 2605.30621 • Published May 28 • 22
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning Paper • 2605.20176 • Published May 19 • 12
STAR-1: Safer Alignment of Reasoning LLMs with 1K Data Paper • 2504.01903 • Published Apr 2, 2025 • 1
Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery Paper • 2508.17380 • Published Aug 24, 2025 • 7
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph Paper • 2604.15706 • Published Apr 17 • 10
Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows Paper • 2604.20200 • Published Apr 22 • 5
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation Paper • 2604.21375 • Published Apr 23 • 19
AHELM: A Holistic Evaluation of Audio-Language Models Paper • 2508.21376 • Published Aug 29, 2025 • 9
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4, 2025 • 60
Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales Paper • 2510.10880 • Published Oct 13, 2025