BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search Paper • 2601.11037 • Published 11 days ago
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published 14 days ago
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 14 days ago