ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
Paper • 2605.00380 • Published • 2
From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space