---
language:
- en
tags:
- text-to-image
- diffusion
- flux
- grpo
- alignment
pipeline_tag: text-to-image
base_model: black-forest-labs/FLUX.1-dev
---

# RealGRPO FLUX DiT Weights

This repository provides **DiT weights** fine-tuned from **FLUX.1-dev** with **GRPO** using the **RealGRPO** strategy.

RealGRPO targets a common post-training issue in image generation: **reward hacking** (e.g., over-smoothing, over-saturation, and synthetic-looking artifacts). Compared with vanilla FLUX and standard GRPO baselines, these weights are optimized to better preserve prompt intent while reducing reward-driven artifacts.

## What Is Included

- Fine-tuned FLUX DiT weights (GRPO post-training).
- A training objective based on contrastive positive/negative style guidance.
- Compatibility with the inference scripts in the RealGRPO codebase.

## Method (Brief)

RealGRPO uses an LLM to generate prompt-specific style pairs:

- positive style cues (`pos_style`)
- negative style cues (`neg_style`)

The reward encourages similarity to the positive cues while penalizing similarity to the negative cues, helping the model avoid artifact-prone shortcuts during alignment.
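
This contrastive reward can be sketched in a few lines, assuming the image and both style cues are first mapped into a shared embedding space (e.g., by a CLIP-style encoder). The function names below are illustrative, not taken from the RealGRPO codebase:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def style_reward(img_emb: np.ndarray, pos_emb: np.ndarray, neg_emb: np.ndarray) -> float:
    """Reward similarity to the positive style cue, penalize similarity to the negative one."""
    return cosine(img_emb, pos_emb) - cosine(img_emb, neg_emb)

# An image whose embedding aligns with the positive cue scores higher
# than one that drifts toward the negative (artifact-prone) cue.
img = np.array([1.0, 0.0])
pos = np.array([1.0, 0.0])
neg = np.array([0.0, 1.0])
print(style_reward(img, pos, neg))  # 1.0
```

During GRPO post-training, a reward of this shape is what steers generations toward the prompt-specific positive style while pushing them away from reward-hacking artifacts.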

> Note: This release contains DiT alignment weights, not a standalone full pipeline package. You need to download black-forest-labs/FLUX.1-dev and replace the contents of its `transformer` directory with the contents of this repository.
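
Alternatively, with `diffusers` you can swap the transformer in at load time instead of overwriting files on disk. A minimal sketch, where `your-org/realgrpo-flux-dit` is a placeholder for this repository's id:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# Placeholder repo id for these DiT weights; substitute the actual repository id.
transformer = FluxTransformer2DModel.from_pretrained(
    "your-org/realgrpo-flux-dit", torch_dtype=torch.bfloat16
)

# Load the base FLUX.1-dev pipeline, replacing its transformer with the RealGRPO weights.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a photo of a corgi on a skateboard", num_inference_steps=28).images[0]
image.save("corgi.png")
```

This keeps the base FLUX.1-dev checkpoint intact on disk, which is convenient when comparing against the vanilla model.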