- Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
  Paper • 2605.15980 • Published • 32
- NGRPO: Negative-enhanced Group Relative Policy Optimization
  Paper • 2509.18851 • Published • 2
- CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization
  Paper • 2605.19436 • Published • 12
- Delta Attention Residuals
  Paper • 2605.18855 • Published • 4
Vansh Kumar
Vansh2676
AI & ML interests
interested in NLP
Recent Activity
updated a collection about 11 hours ago
Reinforcement learning
updated a collection about 12 hours ago
Reinforcement learning
updated a collection about 12 hours ago
Reinforcement learning
Organizations
None yet