---
language:
- en
tags:
- text-to-image
- diffusion
- flux
- grpo
- alignment
pipeline_tag: text-to-image
base_model: black-forest-labs/FLUX.1-dev
---

# RealGRPO FLUX DiT Weights

This repository provides **DiT weights** fine-tuned from **FLUX.1-dev** with **GRPO** using the **RealGRPO** strategy.

RealGRPO targets a common post-training issue in image generation: **reward hacking** (e.g., over-smoothing, over-saturation, and synthetic-looking artifacts). Compared with vanilla FLUX and standard GRPO baselines, these weights are optimized to better preserve prompt intent while reducing reward-driven artifacts.

## What Is Included

- Fine-tuned FLUX DiT weights (GRPO post-training).
- A training objective based on contrastive positive/negative style guidance.
- Compatibility with the inference scripts in the RealGRPO codebase.

## Method (Brief)

RealGRPO uses an LLM to generate prompt-specific style pairs:

- positive style cues (`pos_style`)
- negative style cues (`neg_style`)

The reward encourages similarity to the positive cues while penalizing similarity to the negative cues, helping the model avoid artifact-prone shortcuts during alignment.
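
This contrastive reward can be sketched in a few lines, assuming the image and both style cues are first mapped into a shared embedding space (e.g., by a CLIP-style encoder). The function names below are illustrative, not taken from the RealGRPO codebase:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def style_reward(img_emb: np.ndarray, pos_emb: np.ndarray, neg_emb: np.ndarray) -> float:
    """Reward similarity to the positive style cue, penalize similarity to the negative one."""
    return cosine(img_emb, pos_emb) - cosine(img_emb, neg_emb)

# An image whose embedding aligns with the positive cue scores higher
# than one that drifts toward the negative (artifact-prone) cue.
img = np.array([1.0, 0.0])
pos = np.array([1.0, 0.0])
neg = np.array([0.0, 1.0])
print(style_reward(img, pos, neg))  # 1.0
```

During GRPO post-training, a reward of this shape is what steers generations toward the prompt-specific positive style while pushing them away from reward-hacking artifacts.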

> Note: This release contains DiT alignment weights, not a standalone full pipeline package. You need to download black-forest-labs/FLUX.1-dev and replace the contents of its `transformer` directory with the contents of this repository.
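
Alternatively, with `diffusers` you can swap the transformer in at load time instead of overwriting files on disk. A minimal sketch, where `your-org/realgrpo-flux-dit` is a placeholder for this repository's id:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# Placeholder repo id for these DiT weights; substitute the actual repository id.
transformer = FluxTransformer2DModel.from_pretrained(
    "your-org/realgrpo-flux-dit", torch_dtype=torch.bfloat16
)

# Load the base FLUX.1-dev pipeline, replacing its transformer with the RealGRPO weights.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a photo of a corgi on a skateboard", num_inference_steps=28).images[0]
image.save("corgi.png")
```

This keeps the base FLUX.1-dev checkpoint intact on disk, which is convenient when comparing against the vanilla model.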