license: apache-2.0
pipeline_tag: text-to-image

E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

This repository contains the weights for E-GRPO, as presented in the paper E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models.

Introduction

E-GRPO (Entropy-Guided Group Relative Policy Optimization) is a reinforcement learning approach for aligning flow-matching models with human preferences. Its key insight is that high-entropy denoising steps matter most for policy optimization. The authors propose a merging-step strategy that concentrates training on these steps, which yields more efficient and effective exploration than standard SDE or ODE sampling.
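To make the two ideas in the name concrete, here is a minimal sketch, not the authors' implementation. `group_relative_advantages` shows the usual GRPO-style normalization of rewards within one group of samples generated from the same prompt. `top_entropy_steps` is a purely hypothetical helper that picks the highest-entropy denoising steps from a list of per-step entropy estimates; it is a simple top-k stand-in and does not reproduce the paper's merging-step strategy. All function names, parameters, and the example values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds the scalar rewards of samples drawn for one prompt.
    `eps` (an assumed small constant) avoids division by zero when all
    rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def top_entropy_steps(step_entropies, k):
    """Hypothetical helper: indices of the k highest-entropy steps.

    `step_entropies` is an assumed list of per-step entropy estimates.
    The result is returned in ascending step order.
    """
    ranked = sorted(
        range(len(step_entropies)),
        key=lambda i: step_entropies[i],
        reverse=True,
    )
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(group_relative_advantages([1.0, 2.0, 3.0]))
    print(top_entropy_steps([0.1, 0.9, 0.5, 0.7], k=2))
```

In this sketch the advantages of a group sum to approximately zero, so samples are rewarded only for being better than their siblings, and the step selection restricts where those advantages would be applied.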

Resources

Citation

If you find this work helpful for your research, please consider citing:

@article{zhang2025egrpo,
  title={E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models},
  author={Zhang, Shengjun and Zhang, Zhang and Dai, Chensheng and Duan, Yueqi},
  journal={arXiv preprint arXiv:2601.00423},
  year={2025}
}