Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems Paper • 2606.05985 • Published 26 days ago • 10
Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling Paper • 2606.03102 • Published 28 days ago • 14
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published May 20 • 51
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published May 8 • 70
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration Paper • 2605.05566 • Published May 7 • 38
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published Mar 10 • 54
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published Feb 4 • 80
Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing Paper • 2602.03845 • Published Feb 3 • 27
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published Dec 2, 2025 • 55
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published Nov 19, 2025 • 45
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55
D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents Paper • 2509.21799 • Published Sep 26, 2025 • 9