Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems Paper • 2606.05985 • Published 22 days ago • 10
Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling Paper • 2606.03102 • Published 24 days ago • 14
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published May 20 • 51
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published May 8 • 70
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration Paper • 2605.05566 • Published May 7 • 38