When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs Paper • 2605.24202 • Published 14 days ago
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test Paper • 2605.23491 • Published 14 days ago
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents Paper • 2605.13841 • Published 23 days ago
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 28 days ago
AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark Paper • 2604.24441 • Published Apr 27
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22