---
library_name: diffusers
pipeline_tag: text-to-image
tags:
- text-to-image
- image-generation
- flux
- dc-gen
- diffusers
base_model:
- dc-ai/dc_flux_2K4K
- black-forest-labs/FLUX.1-Krea-dev
---
# blanchon/dc_flux_krea_diffusers
**Diffusers-compatible port of DC-Gen-FLUX (Krea)** for efficient high-resolution text-to-image generation (2K / 4K).

This repository repackages the original **DC-Gen FLUX.1-Krea checkpoint** into a 🧨 **Diffusers** `DiffusionPipeline`, enabling standard Diffusers workflows while preserving the behavior and performance of the upstream model.

---
## Model Details
### Model Description
**FLUX.1 DC-Gen Krea [dev]** is a DC-Gen–adapted FLUX.1-Krea checkpoint that replaces the original FLUX VAE with a **deeply compressed DC-AE latent space**.
Using **embedding alignment** followed by **lightweight LoRA fine-tuning**, DC-Gen enables much faster native **2K / 4K image generation** while preserving the base model’s realism and text-rendering quality.
This repository does **not** retrain the model. It only provides a **Diffusers port** of the upstream checkpoint for easier inference and deployment.
- **DC-Gen method & model:** NVIDIA DC-Gen team
(Wenkun He*, Yuchao Gu*, Junyu Chen*, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai)
- **Diffusers port:** @blanchon
- **Model type:** Text-to-image diffusion (FLUX family, rectified flow transformer)
- **License:** FLUX.1 [dev] **Non-Commercial License** (same as upstream)
- **Upstream checkpoint:** `dc-ai/dc_flux_2K4K`
- **Base model family:** `black-forest-labs/FLUX.1-Krea-dev`
---
## Model Sources
- **DC-Gen project:** https://github.com/dc-ai-projects/DC-Gen
- **DC-Gen homepage:** https://hanlab.mit.edu/projects/dc-gen
- **Paper:** https://arxiv.org/abs/2509.25180
- **Upstream checkpoint:** https://huggingface.co/dc-ai/dc_flux_2K4K
- **FLUX.1-Krea base model:** https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev
---
## Uses
### Direct Use
- High-resolution text-to-image generation (1024 → 4096 px)
- Diffusers-based inference, demos, and deployment
- Research on efficient latent-space diffusion and high-resolution synthesis
### Downstream Use
- Further research or finetuning **only if compliant with the upstream license**
- Integration into non-commercial creative or research tools
### Out-of-Scope Use
- Commercial usage (not permitted by the FLUX.1-dev license)
- Illegal, harmful, or deceptive content generation
---
## Bias, Risks, and Limitations
- The model may reproduce societal biases present in its training data.
- High-resolution generation is GPU- and VRAM-intensive.
- Outputs are not guaranteed to be factual or safe without moderation.
- This repo does not introduce new safety mechanisms beyond those of the base model.
### Recommendations
- Review the FLUX.1-dev non-commercial license carefully before use.
- Apply standard content filtering and safety practices in downstream applications.
- Expect memory usage to scale significantly with resolution.
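As a rough illustration of why cost grows with resolution, the sketch below counts the latent tokens the transformer has to process for a given image size. The token count grows quadratically with the image side, and attention cost grows faster still. The 8x VAE with 2x2 patching reflects the standard FLUX.1 setup; the 32x compression factor used for the DC-AE case is an illustrative assumption, not a value taken from this card, and `latent_tokens` is a hypothetical helper. It estimates sequence length only, not VRAM.

```python
def latent_tokens(width: int, height: int, spatial_compression: int, patch_size: int = 1) -> int:
    """Return the number of transformer tokens for an image of the given pixel size.

    spatial_compression: per-side downsampling factor of the autoencoder.
    patch_size: per-side patching applied to the latent before the transformer.
    """
    stride = spatial_compression * patch_size
    if width % stride or height % stride:
        raise ValueError(f"width and height must be multiples of {stride}")
    return (width // stride) * (height // stride)


if __name__ == "__main__":
    for side in (1024, 2048, 4096):
        # Standard FLUX.1 layout: 8x VAE followed by 2x2 latent patches.
        base = latent_tokens(side, side, spatial_compression=8, patch_size=2)
        # Assumed deeper compression (32x is illustrative; see the DC-Gen paper).
        deep = latent_tokens(side, side, spatial_compression=32, patch_size=1)
        print(f"{side}px: {base} tokens at 8x with 2x2 patches, {deep} tokens at an assumed 32x")
```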
---
## How to Get Started with the Model
### Minimal Load
```python
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
"blanchon/dc_flux_krea_diffusers",
trust_remote_code=True,
torch_dtype=torch.bfloat16,
).to("cuda")
```
### Image Generation Example
```python
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
"blanchon/dc_flux_krea_diffusers",
trust_remote_code=True,
torch_dtype=torch.bfloat16,
).to("cuda")
prompt = "a tiny astronaut hatching from an egg on mars"
image = pipe(
prompt=prompt,
width=2048,
height=2048,
guidance_scale=4.5,
num_inference_steps=28,
output_type="pil",
).images[0]
image.save("dc_flux_krea.png")
```
For reproducible results, pass a seeded `torch.Generator(device="cuda")` as the `generator` argument of the pipeline call (the standard Diffusers convention).
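A minimal sketch of seeded generation follows. It assumes the pipeline accepts a `generator` keyword like standard Diffusers pipelines do; `generate_seeded` is a hypothetical helper, not part of this repository.

```python
import torch


def generate_seeded(pipe, prompt, seed, device="cuda", **kwargs):
    """Call `pipe` with a torch.Generator seeded from `seed` and return the first image.

    Assumes `pipe` accepts `prompt` and `generator` keywords and returns an
    object with an `.images` list, as standard Diffusers pipelines do.
    """
    generator = torch.Generator(device=device).manual_seed(seed)
    return pipe(prompt=prompt, generator=generator, **kwargs).images[0]


# Usage, assuming `pipe` was loaded as in the example above:
# image = generate_seeded(
#     pipe,
#     "a tiny astronaut hatching from an egg on mars",
#     seed=0,
#     width=2048,
#     height=2048,
#     guidance_scale=4.5,
#     num_inference_steps=28,
# )
# image.save("dc_flux_krea_seed0.png")
```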
---
## Training Details
### Training Data
This repository does **not** introduce new training data.
According to the DC-Gen paper, post-training uses **synthetic data generated from the base model** to adapt it to a deeply compressed latent space.
### Training Procedure
DC-Gen applies:
1. **Embedding alignment** to bridge the representation gap between latent spaces
2. **LoRA fine-tuning** to recover base-model quality
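
For orientation, the LoRA step can be summarized with the generic low-rank update below. This is the standard LoRA formulation, not a description of DC-Gen's exact configuration: the rank $r$, the scaling $\alpha$, and the set of adapted layers are not stated in this card.

$$
W' = W_0 + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

Here $W_0$ is a frozen base-model weight matrix and only $A$ and $B$ are trained, which is what keeps the fine-tuning lightweight.
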
See the DC-Gen paper for full methodological details.
---
## Evaluation
This repository does not add new evaluation results.
All reported quality, throughput, and latency benchmarks originate from the DC-Gen technical report.

---
## Technical Specifications
### Architecture
* FLUX-family text-to-image diffusion model
* Rectified flow transformer
* Deeply compressed DC-AE latent space (DC-Gen)
### Hardware Requirements
* CUDA-capable GPU strongly recommended
* 2K/4K generation requires substantial VRAM (≥24 GB recommended)
---
## Citation
If you use this model in research, please cite:
```bibtex
@article{he2025dc,
title={DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space},
author={He, Wenkun and Gu, Yuchao and Chen, Junyu and Zou, Dongyun and Lin, Yujun and Zhang, Zhekai and Xi, Haocheng and Li, Muyang and Zhu, Ligeng and Yu, Jincheng and others},
journal={arXiv preprint arXiv:2509.25180},
year={2025}
}
```
---
## Model Card Authors
* **DC-Gen research & model:** DC-Gen team (NVIDIA)
* **Diffusers port & model card:** @blanchon
## Model Card Contact
* For research questions: see the DC-Gen project page
* For Diffusers port issues: use the Hugging Face Discussions tab