You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published May 20 • 50
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens Paper • 2510.24940 • Published Oct 28, 2025 • 18
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning Paper • 2510.20150 • Published Oct 23, 2025 • 7
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55