	---
	library_name: diffusers
	pipeline_tag: text-to-image
	tags:
	- text-to-image
	- image-generation
	- flux
	- dc-gen
	- diffusers
	base_model:
	- dc-ai/dc_flux_2K4K
	- black-forest-labs/FLUX.1-Krea-dev
	---

	# blanchon/dc_flux_krea_diffusers

	Diffusers-compatible port of DC-Gen-FLUX (Krea) for efficient high-resolution text-to-image generation (2K / 4K).

	This repository repackages the original DC-Gen FLUX.1-Krea checkpoint into a 🧨 Diffusers `DiffusionPipeline`, enabling standard Diffusers workflows while preserving the behavior and performance of the upstream model.

	---

	## Model Details

	### Model Description

	FLUX.1 DC-Gen Krea [dev] is a DC-Gen–adapted FLUX.1-Krea checkpoint that replaces the original FLUX VAE with a deeply compressed DC-AE latent space.
	Using embedding alignment followed by lightweight LoRA fine-tuning, DC-Gen enables much faster native 2K / 4K image generation while preserving the base model’s realism and text-rendering quality.

	This repository does not retrain the model. It only provides a Diffusers port of the upstream checkpoint for easier inference and deployment.

	- DC-Gen method & model: NVIDIA DC-Gen team (Wenkun He, Yuchao Gu, Junyu Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai)
	- Diffusers port: @blanchon
	- Model type: Text-to-image diffusion (FLUX family, rectified flow transformer)
	- License: FLUX.1 [dev] Non-Commercial License (same as upstream)
	- Upstream checkpoint: `dc-ai/dc_flux_2K4K`
	- Base model family: `black-forest-labs/FLUX.1-Krea-dev`

	---

	## Model Sources

	- DC-Gen project: https://github.com/dc-ai-projects/DC-Gen
	- DC-Gen homepage: https://hanlab.mit.edu/projects/dc-gen
	- Paper: https://arxiv.org/abs/2509.25180
	- Upstream checkpoint: https://huggingface.co/dc-ai/dc_flux_2K4K
	- FLUX.1-Krea base model: https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev

	---

	## Uses

	### Direct Use

	- High-resolution text-to-image generation (1024 to 4096 px)
	- Diffusers-based inference, demos, and deployment
	- Research on efficient latent-space diffusion and high-resolution synthesis

	### Downstream Use

	- Further research or fine-tuning, provided it complies with the upstream license
	- Integration into non-commercial creative or research tools

	### Out-of-Scope Use

	- Commercial use (not permitted by the FLUX.1 [dev] Non-Commercial License)
	- Illegal, harmful, or deceptive content generation

	---

	## Bias, Risks, and Limitations

	- The model may reproduce societal biases present in its training data.
	- High-resolution generation is GPU- and VRAM-intensive.
	- Outputs are not guaranteed to be factual or safe without moderation.
	- This repo does not introduce new safety mechanisms beyond those of the base model.

	### Recommendations

	- Review the FLUX.1 [dev] Non-Commercial License carefully before use.
	- Apply standard content filtering and safety practices in downstream applications.
	- Expect memory usage to scale significantly with resolution.
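
To make the resolution scaling concrete, the sketch below counts pixels and latent positions for square images. The helper names and the downsampling factors 16 and 64 are illustrative assumptions, not values read from this checkpoint's configuration; the DC-Gen paper gives the actual compression ratio.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a width x height image."""
    return width * height


def latent_positions(width: int, height: int, downsample: int) -> int:
    """Spatial positions left after shrinking each side by `downsample`.

    `downsample` is a hypothetical total spatial compression factor.
    """
    if width % downsample or height % downsample:
        raise ValueError("width and height must be multiples of downsample")
    return (width // downsample) * (height // downsample)


if __name__ == "__main__":
    base = pixel_count(1024, 1024)
    for side in (1024, 2048, 4096):
        ratio = pixel_count(side, side) // base
        print(
            f"{side}x{side}: {ratio}x the pixels of 1024x1024, "
            f"{latent_positions(side, side, 16)} positions at 16x, "
            f"{latent_positions(side, side, 64)} positions at 64x"
        )
```

Doubling the side length quadruples the pixel count, so a 4096 px square image has 16 times the pixels of a 1024 px one. A larger downsampling factor reduces the number of latent positions for the same image size.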

	---

	## How to Get Started with the Model

	### Minimal Load

	```python
	import torch
	from diffusers import DiffusionPipeline

	pipe = DiffusionPipeline.from_pretrained(
	    "blanchon/dc_flux_krea_diffusers",
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	).to("cuda")
	```

	### Image Generation Example

	```python
	import torch
	from diffusers import DiffusionPipeline

	pipe = DiffusionPipeline.from_pretrained(
	    "blanchon/dc_flux_krea_diffusers",
	    trust_remote_code=True,  # allows the custom pipeline code in this repo to run
	    torch_dtype=torch.bfloat16,
	).to("cuda")

	prompt = "a tiny astronaut hatching from an egg on mars"

	# Generate one 2048x2048 image and take the first result.
	image = pipe(
	    prompt=prompt,
	    width=2048,
	    height=2048,
	    guidance_scale=4.5,
	    num_inference_steps=28,
	    output_type="pil",
	).images[0]

	image.save("dc_flux_krea.png")
	```

	For reproducible results, pass a seeded generator to the pipeline call, for example `generator=torch.Generator(device="cuda").manual_seed(0)`.
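
A minimal sketch of seeded generation follows. It assumes the pipeline accepts a `generator` argument, as standard Diffusers pipelines do; the helper names `make_generator`, `seeded_filename`, and `generate` and the file-naming scheme are hypothetical and not part of this repository.

```python
def make_generator(seed: int, device: str = "cuda"):
    """Return a torch.Generator on `device`, seeded with `seed`."""
    import torch  # imported lazily so seeded_filename works without torch

    return torch.Generator(device=device).manual_seed(seed)


def seeded_filename(stem: str, seed: int, width: int, height: int) -> str:
    """Hypothetical naming scheme that records seed and size in the file name."""
    return f"{stem}_seed{seed}_{width}x{height}.png"


def generate(pipe, prompt: str, seed: int, width: int = 2048, height: int = 2048) -> str:
    """Run `pipe` with a seeded generator, save the image, and return its path."""
    image = pipe(
        prompt=prompt,
        width=width,
        height=height,
        guidance_scale=4.5,
        num_inference_steps=28,
        generator=make_generator(seed),  # assumption: pipeline accepts `generator`
    ).images[0]
    path = seeded_filename("dc_flux_krea", seed, width, height)
    image.save(path)
    return path
```

With `pipe` loaded as in the example above, `generate(pipe, prompt, seed=0)` should give the same image on repeated runs with the same hardware and library versions. Keeping the seed in the file name makes a result easy to trace back to its settings.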

	---

	## Training Details

	### Training Data

	This repository does not introduce new training data.

	According to the DC-Gen paper, post-training uses synthetic data generated from the base model to adapt it to a deeply compressed latent space.

	### Training Procedure

	DC-Gen applies:

	1. Embedding alignment to bridge the representation gap between latent spaces
	2. LoRA fine-tuning to recover base-model quality
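
For orientation, the generic LoRA formulation keeps a pretrained weight frozen and learns a low-rank update. This is the standard form from the LoRA literature, not a description of DC-Gen's exact setup; the symbols are the conventional ones, and the ranks, scaling, and target layers DC-Gen uses are not stated here.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only $A$ and $B$ are trained while $W_0$ stays fixed, so the number of trainable parameters is small relative to the full weight matrix.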

	See the DC-Gen paper for full methodological details.

	---

	## Evaluation

	This repository does not add new evaluation results.

	All reported quality, throughput, and latency benchmarks originate from the DC-Gen technical report.

	---

	## Technical Specifications

	### Architecture

	- FLUX-family text-to-image diffusion model
	- Rectified flow transformer
	- Deeply compressed DC-AE latent space (DC-Gen)

	### Hardware Requirements

	- CUDA-capable GPU strongly recommended
	- 2K/4K generation requires substantial VRAM (≥24 GB recommended)
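
As a rough pre-flight check, the sketch below compares a GPU's total memory with the 24 GB recommendation. The helper names are hypothetical, and the threshold uses decimal gigabytes (10^9 bytes) as an assumption, since cards sold as 24 GB typically report slightly under 24 GiB.

```python
GB = 10 ** 9  # decimal gigabyte, assumed to match how GPU memory is marketed


def meets_vram_recommendation(total_bytes: int, recommended_gb: float = 24.0) -> bool:
    """True if `total_bytes` of GPU memory reaches the recommended amount."""
    return total_bytes >= recommended_gb * GB


def cuda_total_memory(device_index: int = 0) -> int:
    """Total memory of a CUDA device in bytes, or 0 if CUDA is unavailable."""
    import torch  # imported lazily; only needed for the real device query

    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_properties(device_index).total_memory


def vram_report(total_bytes: int, recommended_gb: float = 24.0) -> str:
    """One-line summary of whether the recommendation is met."""
    verdict = "meets" if meets_vram_recommendation(total_bytes, recommended_gb) else "is below"
    return f"{total_bytes / GB:.1f} GB of VRAM {verdict} the {recommended_gb:g} GB recommendation"
```

Calling `print(vram_report(cuda_total_memory()))` before loading the pipeline gives an early warning. Total memory is only a coarse signal, because actual usage depends on resolution, dtype, and other processes on the GPU.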

	---

	## Citation

	If you use this model in research, please cite:

	```bibtex
	@article{he2025dc,
	  title={DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space},
	  author={He, Wenkun and Gu, Yuchao and Chen, Junyu and Zou, Dongyun and Lin, Yujun and Zhang, Zhekai and Xi, Haocheng and Li, Muyang and Zhu, Ligeng and Yu, Jincheng and others},
	  journal={arXiv preprint arXiv:2509.25180},
	  year={2025}
	}
	```

	---

	## Model Card Authors

	- DC-Gen research & model: DC-Gen team (NVIDIA)
	- Diffusers port & model card: @blanchon

	## Model Card Contact

	- For research questions: see the DC-Gen project page
	- For Diffusers port issues: use the Hugging Face Discussions tab