OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation Paper • 2506.02397 • Published Jun 3, 2025 • 36
ReCreate: Reasoning and Creating Domain Agents Driven by Experience Paper • 2601.11100 • Published 14 days ago • 18
Scheduling Your LLM Reinforcement Learning with Reasoning Trees Paper • 2510.24832 • Published Oct 28, 2025
GAPO: Robust Advantage Estimation for Real-World Code LLMs Paper • 2510.21830 • Published Oct 22, 2025
Multi-class Support Vector Machine with Maximizing Minimum Margin Paper • 2312.06578 • Published Dec 11, 2023
PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation Paper • 2411.00163 • Published Oct 31, 2024 • 1
Breaking the Top-K Barrier: Advancing Top-K Ranking Metrics Optimization in Recommender Systems Paper • 2508.05673 • Published Aug 4, 2025 • 1
Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective Paper • 2510.10150 • Published Oct 11, 2025 • 1