YangZhou24/RealGRPO

YangZhou24 committed on Mar 2

Commit 6ce214a (verified) · 1 Parent(s): 5cc4eb9

Create README.md

Files changed (1)

README.md ADDED +39 -0

	@@ -0,0 +1,39 @@

+---
+language:
+- en
+tags:
+- text-to-image
+- diffusion
+- flux
+- grpo
+- alignment
+pipeline_tag: text-to-image
+base_model: black-forest-labs/FLUX.1-dev
+---
+# RealGRPO FLUX DiT Weights
+This repository provides **DiT weights** fine-tuned from **FLUX.1-dev** with **GRPO** using the **RealGRPO** strategy.
+RealGRPO targets a common post-training issue in image generation: **reward hacking** (e.g., over-smoothing, over-saturation, and synthetic-looking artifacts).
+Compared with vanilla FLUX and standard GRPO baselines, these weights are optimized to better preserve prompt intent while reducing reward-driven artifacts.
+## What Is Included
+- Fine-tuned FLUX DiT weights (GRPO post-training).
+- Training objective based on contrastive positive/negative style guidance.
+- Compatibility with the RealGRPO codebase inference scripts.
+## Method (Brief)
+RealGRPO uses an LLM to generate prompt-specific style pairs:
+- positive style cues (`pos_style`)
+- negative style cues (`neg_style`)
+The reward encourages similarity to positive cues while penalizing negative cues, helping the model avoid artifact-prone shortcuts during alignment.
+> Note: This release contains DiT alignment weights, not a standalone full pipeline package. You need to download black-forest-labs/FLUX.1-dev and replace the contents of its `transformer` directory with the contents of this repository.
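
The directory swap described in the note can be sketched in a few lines of Python. This is a minimal sketch, not part of the RealGRPO codebase: the helper name `replace_transformer_dir`, its `skip` list, and the example paths are hypothetical, and it assumes both repositories have already been downloaded to local folders.

```python
import shutil
from pathlib import Path


def replace_transformer_dir(flux_dir, realgrpo_dir,
                            skip=("README.md", ".gitattributes", ".git")):
    """Replace <flux_dir>/transformer with the files from realgrpo_dir.

    Hypothetical helper. `skip` lists repository metadata that is assumed
    not to belong in the weights folder; adjust it to the actual file layout.
    """
    flux_dir, realgrpo_dir = Path(flux_dir), Path(realgrpo_dir)
    if not flux_dir.is_dir():
        raise FileNotFoundError(f"FLUX.1-dev directory not found: {flux_dir}")
    if not realgrpo_dir.is_dir():
        raise FileNotFoundError(f"RealGRPO directory not found: {realgrpo_dir}")

    target = flux_dir / "transformer"
    # Remove the original DiT weights so no stale shard is left behind.
    if target.exists():
        shutil.rmtree(target)
    # Copy the RealGRPO files in, leaving out the metadata named in `skip`.
    shutil.copytree(realgrpo_dir, target, ignore=shutil.ignore_patterns(*skip))
    return target


if __name__ == "__main__":
    # Example paths only; point these at your own local downloads.
    replace_transformer_dir("./FLUX.1-dev", "./RealGRPO")
```

Only the `transformer` folder is touched; the other FLUX.1-dev components (text encoders, VAE, scheduler) stay as downloaded. The patched folder can then be used wherever a local FLUX.1-dev checkpoint is expected, for example by the RealGRPO inference scripts.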