f-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment • Paper • 2602.05946 • Published 6 days ago