Brick-Composer: Using MLLMs for Assembly with Diverse Bricks Paper • 2606.05445 • Published 22 days ago • 8
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 21 days ago • 43
Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues Paper • 2606.02754 • Published 23 days ago • 13
Advancing Creative Physical Intelligence in Large Multimodal Models Paper • 2605.26396 • Published May 25 • 21
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 113
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published May 6 • 23
PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning Paper • 2601.11957 • Published Jan 28 • 3
NarrativeTrack: Evaluating Video Language Models Beyond the Frame Paper • 2601.01095 • Published Jan 3 • 8
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data Paper • 2602.21320 • Published Feb 24 • 12
AgentDoG Collection • A Diagnostic Guardrail Framework for AI Agent Safety and Security • 12 items • Updated 4 days ago • 112