Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards Paper • 2512.21625 • Published 5 days ago
Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning Paper • 2406.14022 • Published Jun 20, 2024
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models Paper • 2305.13112 • Published May 22, 2023
Improving Conversational Recommendation Systems via Counterfactual Data Simulation Paper • 2306.02842 • Published Jun 5, 2023
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward Paper • 2509.01321 • Published Sep 1
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model Paper • 2510.18855 • Published Oct 21
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents Paper • 2509.13309 • Published Sep 16