Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery Paper • 2508.17380 • Published Aug 24, 2025 • 7
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph Paper • 2604.15706 • Published Apr 17 • 10
Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows Paper • 2604.20200 • Published 29 days ago • 5
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation Paper • 2604.21375 • Published 28 days ago • 17
STAR-1: Safer Alignment of Reasoning LLMs with 1K Data Paper • 2504.01903 • Published Apr 2, 2025 • 1
AHELM: A Holistic Evaluation of Audio-Language Models Paper • 2508.21376 • Published Aug 29, 2025 • 9
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4, 2025 • 60
MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution Paper • 2603.18718 • Published Mar 19 • 10
Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales Paper • 2510.10880 • Published Oct 13, 2025
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs Paper • 2311.16101 • Published Nov 27, 2023 • 1
AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation Paper • 2410.09040 • Published Oct 11, 2024
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw Paper • 2604.04759 • Published Apr 6 • 24
Unnoticeable Backdoor Attacks on Graph Neural Networks Paper • 2303.01263 • Published Feb 11, 2023 • 1
Protap: A Benchmark for Protein Modeling on Realistic Downstream Applications Paper • 2506.02052 • Published Jun 1, 2025
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation Paper • 2511.02303 • Published Nov 4, 2025 • 1
How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use Paper • 2602.00528 • Published Jan 31