MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 3 days ago • 41
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios Paper • 2602.23166 • Published 15 days ago • 40
Heterogeneous Agent Collaborative Reinforcement Learning Paper • 2603.02604 • Published 10 days ago • 172
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier Paper • 2603.03756 • Published 9 days ago • 86
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 11 days ago • 57
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published 17 days ago • 94
From Perception to Action: An Interactive Benchmark for Vision Reasoning Paper • 2602.21015 • Published 17 days ago • 23
AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis Paper • 2602.09372 • Published Feb 10 • 5
Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models Paper • 2602.01849 • Published Feb 2 • 5
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models Paper • 2601.07372 • Published Jan 12 • 45
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 127
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling Paper • 2601.03111 • Published Jan 6 • 10