Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems Paper • 2605.04018 • Published 3 days ago
Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints Paper • 2604.15664 • Published 21 days ago
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 16 days ago
Crowded in B-Space: Calibrating Shared Directions for LoRA Merging Paper • 2604.16826 • Published 20 days ago
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published 25 days ago
Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling? Paper • 2604.03619 • Published Apr 4
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning Paper • 2602.11748 • Published Feb 12
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published 29 days ago
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers Paper • 2604.02648 • Published Apr 3
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2