Paper: Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (arXiv:2406.02347)
This is a 0.5B model for few-step image generation. It uses the SD3 VAE and Qwen 0.5B as the text encoder.
The model learns from the denoised output of a larger pretrained teacher model and from the loss of a discriminator model.
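The training signal described above can be sketched as a weighted sum of a distillation term (matching the teacher's denoised prediction) and an adversarial term from the discriminator. This is a minimal illustration, not the repo's actual loss code; the function name, tensor shapes, and the `adv_weight` value are assumptions.

```python
import torch
import torch.nn.functional as F

def combined_loss(student_x0, teacher_x0, disc_logits, adv_weight=0.1):
    # Distillation term: match the teacher's denoised output.
    distill = F.mse_loss(student_x0, teacher_x0)
    # Adversarial term: non-saturating GAN loss on the discriminator's
    # logits for the student's output (hypothetical discriminator head).
    adv = F.softplus(-disc_logits).mean()
    return distill + adv_weight * adv

# Dummy latents: 16 channels, as produced by the SD3 VAE.
student = torch.randn(2, 16, 32, 32)
teacher = torch.randn(2, 16, 32, 32)
logits = torch.randn(2, 1)
loss = combined_loss(student, teacher, logits)
```

The two terms pull in different directions: the distillation term keeps the few-step student close to the teacher's many-step output, while the adversarial term sharpens details the MSE objective tends to blur.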
In its current state, it is evaluated as a refiner model in a second pass.
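The two-pass evaluation setup can be sketched as follows. Both denoisers here are stand-in functions, and the step counts are illustrative assumptions, not the repo's configuration:

```python
import torch

def base_denoiser(latent, step):
    # Placeholder for a large pretrained model's denoising step.
    return latent - 0.1 * latent

def refiner_denoiser(latent, step):
    # Placeholder for the distilled 0.5B few-step model.
    return latent - 0.05 * latent

# Dummy latent in the SD3 VAE's 16-channel latent space.
latent = torch.randn(1, 16, 32, 32)

# First pass: many steps with the base model.
for step in range(20):
    latent = base_denoiser(latent, step)

# Second pass: a few refinement steps with the distilled model.
for step in range(4):
    latent = refiner_denoiser(latent, step)
```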
Clone the inference script into the current working directory, then load the transformer:

```python
from onediffusion.models.denoiser.nextdit.modeling_nextdit import NextDiT

transformer = NextDiT.from_pretrained('twodgirl/oneplus', subfolder='transformer')
```
This repo was born out of the lack of visible progress on, and adoption of, the Lumina T2I and OneDiffusion models.
OneDiffusion has shown a better understanding of art styles and human poses than similarly sized DiT models.
The architecture has proven its worth. The training script builds on it and follows the Flash Diffusion training method.
It was also possible to finetune such a model with limited VRAM.
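A limited-VRAM finetune typically combines bf16 autocast, gradient checkpointing, and gradient accumulation. The sketch below shows the pattern on a stand-in network; it is not the repo's NextDiT training loop, and the batch size, learning rate, and accumulation count are assumptions.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Stand-in model; the real target would be the NextDiT transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 64)
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum_steps = 8  # simulate a larger batch on a small GPU

for step in range(accum_steps):
    x = torch.randn(1, 64)  # micro-batch of 1 to keep activations small
    with torch.autocast('cpu', dtype=torch.bfloat16):
        # Checkpointing recomputes activations in backward,
        # trading compute for memory.
        y = checkpoint(model, x, use_reentrant=False)
        loss = y.pow(2).mean() / accum_steps  # scale for accumulation
    loss.backward()

opt.step()
opt.zero_grad()
```

On a GPU, the autocast device would be `'cuda'`; dividing the loss by `accum_steps` keeps the accumulated gradient equivalent to a single large-batch step.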