D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees