STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens Paper • 2602.15620 • Published 1 day ago • 2