---
license: mit
pipeline_tag: text-to-image
library_name: diffusers
---

# Guidance-Free Training (GFT): Visual Generation Without Guidance

This repository contains checkpoints and code for Guidance-Free Training (GFT), a novel approach for visual generative models presented in the paper Visual Generation Without Guidance. GFT aims to eliminate the need for Classifier-Free Guidance (CFG) during sampling, effectively halving the computational cost of inference while matching or surpassing CFG's performance.

Unlike previous distillation-based approaches, GFT enables training directly from scratch and requires minimal modifications to existing codebases. It is a universal algorithm applicable across various visual generative models, including diffusion, autoregressive, and masked-prediction architectures.

**Paper:** [Visual Generation Without Guidance](https://arxiv.org/abs/2501.15420)
**GitHub repository:** https://github.com/thu-ml/GFT

*Figure: Qualitative T2I comparison between vanilla conditional generation, GFT, and CFG on Stable Diffusion 1.5 with the prompt "Elegant crystal vase holding pink peonies, soft raindrops tracing paths down the window behind it".*

## Key Features

  • Highly Efficient: GFT reduces sampling to a single model inference, effectively halving the computational cost compared to CFG.
  • Minimal Modifications: It requires fewer than 10 lines of code changes to existing visual generative model codebases, inheriting most design choices and hyperparameters.
  • Universal Applicability: GFT is highly versatile, working across diverse visual generative models, including diffusion, flow-matching, autoregressive, and masked-prediction architectures.
  • Training from Scratch: Unlike distillation methods, GFT enables direct training of guidance-free models from scratch.
  • Performance Match: Consistently achieves comparable or even lower FID scores with similar diversity-fidelity trade-offs compared to CFG baselines.
  • Flexible Sampling: Allows adjustment of sampling temperature with only a single model.
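The efficiency claim is easy to see in code: CFG needs two network evaluations per sampling step (one conditional, one unconditional) and blends them as ε = ε_u + w·(ε_c − ε_u), while a guidance-free model needs only the conditional pass. The sketch below uses a toy stand-in for the network purely to count forward passes; it is illustrative, not the actual GFT model.

```python
import numpy as np

class ToyDenoiser:
    """Stand-in for a diffusion network; counts forward passes."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x, cond):
        self.calls += 1
        shift = 0.0 if cond is None else 1.0  # arbitrary toy behavior
        return 0.1 * x + shift

def cfg_step(model, x, cond, w):
    """CFG: eps = eps_u + w * (eps_c - eps_u) -- two passes per step."""
    eps_c = model(x, cond)   # conditional pass
    eps_u = model(x, None)   # unconditional pass
    return eps_u + w * (eps_c - eps_u)

def gft_step(model, x, cond):
    """Guidance-free sampling: a single conditional pass per step."""
    return model(x, cond)
```

Over a full sampling trajectory of T steps, this is 2T network evaluations for CFG versus T for a guidance-free model, which is where the "halved inference cost" comes from.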

### Temperature Control

GFT allows the sampling temperature of visual generation to be adjusted with only a single model.
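One way to picture this knob: if the conditional prediction is parameterized as ε_c = β·ε_sample + (1 − β)·ε_u for a pseudo-temperature β, then solving for the sampling model's prediction reproduces CFG with scale w = 1/β, computed from a single model output. The toy algebraic check below assumes that parameterization; the paper's exact notation may differ.

```python
import numpy as np

def cfg_eps(eps_c, eps_u, w):
    # Standard classifier-free guidance combination of two predictions.
    return eps_u + w * (eps_c - eps_u)

def gft_sampling_eps(eps_c, eps_u, beta):
    # Assumed parameterization: eps_c = beta * eps_sample + (1 - beta) * eps_u,
    # solved for eps_sample. beta = 1 recovers vanilla conditional prediction;
    # smaller beta corresponds to a larger effective guidance scale w = 1 / beta.
    return (eps_c - (1.0 - beta) * eps_u) / beta
```

Under this view, sweeping β at inference time trades diversity against fidelity just as sweeping the CFG scale would, without ever running an unconditional pass.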

## Usage and Pretrained Checkpoints

The project provides training code and pretrained guidance-free checkpoints for several model families. For detailed implementation, training instructions, and example usage, please refer to the respective directories within the GitHub repository.

## Citation

If you find our project helpful, please consider citing:

```bibtex
@article{chen2025visual,
  title={Visual Generation Without Guidance},
  author={Chen, Huayu and Jiang, Kai and Zheng, Kaiwen and Chen, Jianfei and Su, Hang and Zhu, Jun},
  journal={arXiv preprint arXiv:2501.15420},
  year={2025}
}
```