Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards Paper • 2507.14783 • Published Jul 20, 2025
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence Paper • 2505.20325 • Published May 23, 2025