metadata
language:
  - en
tags:
  - text-to-image
  - diffusion
  - flux
  - grpo
  - alignment
pipeline_tag: text-to-image
base_model: black-forest-labs/FLUX.1-dev

RealGRPO FLUX DiT Weights

This repository provides DiT (diffusion transformer) weights fine-tuned from FLUX.1-dev with GRPO (Group Relative Policy Optimization) using the RealGRPO strategy.

RealGRPO targets a common post-training issue in image generation: reward hacking (e.g., over-smoothing, over-saturation, and synthetic-looking artifacts).
Compared with vanilla FLUX and standard GRPO baselines, these weights are optimized to better preserve prompt intent while reducing reward-driven artifacts.

What Is Included

  • Fine-tuned FLUX DiT weights (GRPO post-training).
  • Weights trained with an objective based on contrastive positive/negative style guidance.
  • Compatible with the inference scripts in the RealGRPO codebase.

Method (Brief)

RealGRPO uses a large language model (LLM) to generate a prompt-specific pair of style cues:

  • positive style cues (pos_style)
  • negative style cues (neg_style)

The reward encourages similarity to the positive cues and penalizes similarity to the negative cues, which helps the model avoid artifact-prone shortcuts during alignment.
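This card does not give the exact reward formula. As a minimal sketch, assume the generated image and the two style strings have already been embedded into a shared space by some encoder (for example a CLIP-style model; the actual encoder is not specified here), and that the reward is the cosine similarity to `pos_style` minus a weighted cosine similarity to `neg_style`. The function names and the `neg_weight` parameter are hypothetical. The last function shows the standard GRPO step of normalizing rewards within the group of samples drawn for one prompt, which is how a reward like this becomes a per-sample advantage.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # small epsilon avoids division by zero


def contrastive_style_reward(img_emb: Sequence[float],
                             pos_emb: Sequence[float],
                             neg_emb: Sequence[float],
                             neg_weight: float = 1.0) -> float:
    """Hypothetical reward: reward pos_style similarity, penalize neg_style similarity."""
    return cosine(img_emb, pos_emb) - neg_weight * cosine(img_emb, neg_emb)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize rewards within one prompt's sample group.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]
```

With this form, a sample whose embedding drifts toward the negative cues (for example "over-saturated, plastic-looking") scores lower than its groupmates and receives a negative advantage, even if it matches the positive cues equally well.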

Note: This release contains only the aligned DiT weights, not a standalone pipeline package. To use them, download black-forest-labs/FLUX.1-dev and replace the contents of its transformer directory with the contents of this repository.
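The replacement step can be scripted. This is a minimal sketch, assuming both repositories are fetched with `huggingface_hub.snapshot_download` and that this repository's id is `YangZhou24/RealGRPO` (an assumption; adjust it to the actual repo id). FLUX.1-dev is a gated repository, so you may need to accept its license and log in with a Hugging Face token first. The helpers `download_all` and `replace_transformer_dir` and the `SKIP_NAMES` set are hypothetical. The sketch clears `transformer/` before copying so that no stale weight shards or index files from the original model remain, and it skips repository metadata such as `README.md` and dotfiles.

```python
import shutil
from pathlib import Path

# Repository metadata that should not end up in transformer/ (assumption).
SKIP_NAMES = {"README.md"}


def download_all(flux_dir="FLUX.1-dev", dit_dir="RealGRPO",
                 dit_repo_id="YangZhou24/RealGRPO"):  # repo id is an assumption
    """Fetch the base model and the RealGRPO DiT weights (needs network access)."""
    from huggingface_hub import snapshot_download  # imported lazily: optional dependency
    snapshot_download(repo_id="black-forest-labs/FLUX.1-dev", local_dir=flux_dir)
    snapshot_download(repo_id=dit_repo_id, local_dir=dit_dir)


def replace_transformer_dir(flux_dir, dit_dir):
    """Replace <flux_dir>/transformer with the files in dit_dir; return copied names."""
    flux_dir, dit_dir = Path(flux_dir), Path(dit_dir)
    target = flux_dir / "transformer"
    if not target.is_dir():
        raise FileNotFoundError(f"{target} not found; is {flux_dir} a FLUX.1-dev snapshot?")
    # Collect files to copy first, skipping dotfiles/dirs (.gitattributes, .cache).
    items = [p for p in sorted(dit_dir.iterdir())
             if not p.name.startswith(".") and p.name not in SKIP_NAMES]
    if not items:
        # Refuse to delete the original weights if there is nothing to replace them with.
        raise FileNotFoundError(f"no weight files found in {dit_dir}")
    # Remove the original DiT files so no stale shards or index files remain.
    shutil.rmtree(target)
    target.mkdir()
    for item in items:
        dest = target / item.name
        if item.is_dir():
            shutil.copytree(item, dest)
        else:
            shutil.copy2(item, dest)
    return [p.name for p in items]
```

Typical use is `download_all()` followed by `replace_transformer_dir("FLUX.1-dev", "RealGRPO")`. If the released weights follow the diffusers layout, the patched folder can then likely be loaded like the original snapshot, for example with diffusers' `FluxPipeline.from_pretrained("FLUX.1-dev")`; otherwise, use the RealGRPO inference scripts.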