# NVIDIA FastGen: Fast Generation from Diffusion Models
Weili Nie • Julius Berner • Chao Liu • Arash Vahdat
FastGen is a PyTorch-based framework for building fast generative models using various distillation and acceleration techniques. It supports:
- large-scale training with ≥10B parameters.
- different tasks and modalities, including text-to-image (T2I), image-to-video (I2V), and video-to-video (V2V).
- various distillation methods, including consistency models, distribution matching distillation, self-forcing, and more.
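To give a flavor of what these distillation objectives look like, here is a minimal consistency-style training step in PyTorch. The toy network, noise schedule, and loss are illustrative stand-ins only, not FastGen's actual implementation:

```python
import torch

# Toy "denoiser": maps (noisy sample, timestep) -> sample estimate.
# Stand-in for a real diffusion backbone such as those in fastgen/networks/.
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2))

def f(x, t):
    # Condition on t by concatenating it as an extra input feature.
    return net(torch.cat([x, t[:, None]], dim=1))

x0 = torch.randn(8, 2)                 # clean data batch
t = torch.rand(8) * 0.9 + 0.05         # random timesteps in (0, 1)
dt = 0.05                              # adjacent-timestep gap
noise = torch.randn_like(x0)

x_t = x0 + t[:, None] * noise          # noisy samples at t
x_s = x0 + (t - dt)[:, None] * noise   # noisy samples at the earlier step s = t - dt

# Consistency objective: the model's outputs at adjacent timesteps on the same
# noising trajectory should agree; the earlier-step target is detached (it plays
# the teacher/EMA role in real consistency training).
loss = torch.mean((f(x_t, t) - f(x_s, t - dt).detach()) ** 2)
loss.backward()
```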
## Repository Structure

```
fastgen/
├── fastgen/
│   ├── callbacks/           # Training callbacks (EMA, profiling, etc.)
│   ├── configs/             # Configuration system
│   │   ├── experiments/     # Experiment configs
│   │   └── methods/         # Method-specific configs
│   ├── datasets/            # Dataset loaders
│   ├── methods/             # Training methods (CM, DMD2, SFT, KD, etc.)
│   ├── networks/            # Neural network architectures
│   ├── third_party/         # Third-party dependencies
│   ├── trainer.py           # Main training loop
│   └── utils/               # Utilities (distributed, checkpointing)
├── scripts/                 # Inference and evaluation scripts
├── tests/                   # Unit tests
├── Makefile                 # Development commands (lint, format, test)
└── train.py                 # Main training entry point
```
## Setup

Recommended: Use the provided Docker container for a consistent environment; see CONTRIBUTING.md for Docker setup instructions. Otherwise, create a new conda environment with `conda create -y -n fastgen python=3.12.3 pip; conda activate fastgen`.
### Installation

```bash
git clone https://github.com/NVlabs/FastGen.git
cd FastGen
pip install -e .
```
### Offline Installation

For air-gapped or isolated environments without internet access, use the pre-packaged offline bundle from HuggingFace:

```bash
# On a machine WITH internet: download the offline package (~68 GB)
pip install huggingface_hub
huggingface-cli download taohu/fastgen-offline --local-dir ./offline_package

# Transfer to the offline machine (USB, rsync, etc.)
# rsync -avz ./offline_package user@offline-machine:/path/to/
```
On the offline machine:

```bash
cd /path/to/offline_package

# Run the setup script (creates the conda env, installs packages, configures paths)
bash FastGen/scripts/setup_offline_env.sh

# Activate the environment
source activate_fastgen.sh

# Verify the installation
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "from diffusers import WanPipeline; print('Diffusers: OK')"
python -c "from fastgen.networks.Wan.network import Wan; print('FastGen: OK')"
```
Run data-free training (no video dataset required):

```bash
cd FastGen
python train.py --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py
```
The offline package includes:
- Python wheels for all dependencies (~3.3 GB)
- Wan-T2V-1.3B model from HuggingFace (~27 GB)
- Self-Forcing checkpoints (~38 GB)
- FastGen source code
To prepare your own offline package (e.g., for a different Python version or platform):

```bash
python scripts/prepare_offline_package.py --output-dir ./my_offline_package \
    --python-version 311 --platform manylinux2014_x86_64 --create-archive
```
### Credentials (Optional)

For W&B logging, get your API key and save it to `credentials/wandb_api.txt` or set the `WANDB_API_KEY` environment variable.
Without either of these, W&B will prompt for your API key interactively.
For more details, including S3 storage and other environment variables, see fastgen/configs/README.md.
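A plausible resolution helper for the two options above; the file path and variable name come from this README, while the helper function and its precedence order are our illustration:

```python
import os
from pathlib import Path

def resolve_wandb_key(cred_file="credentials/wandb_api.txt"):
    """Return the W&B API key from the credentials file or the WANDB_API_KEY
    environment variable, or None if neither is set (in which case W&B falls
    back to an interactive prompt)."""
    path = Path(cred_file)
    if path.is_file():
        return path.read_text().strip()
    return os.environ.get("WANDB_API_KEY")
```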
## Quick Start

Before running the following commands, download the CIFAR-10 dataset and pretrained EDM models:

```bash
python scripts/download_data.py --dataset cifar10
```
For other datasets and models, see fastgen/networks/README.md and fastgen/datasets/README.md.
### Basic Training

```bash
python train.py --config=fastgen/configs/experiments/EDM/config_dmd2_test.py
```
If you run out of memory, try a smaller batch size, e.g., `dataloader_train.batch_size=32`; gradient accumulation is then used automatically to match the global batch size.
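The relationship between per-step batch size, accumulation steps, and global batch size can be sketched as follows (the names and the manual loop are illustrative; FastGen handles this automatically):

```python
import torch

def accumulation_steps(global_batch_size, per_step_batch_size):
    """Number of micro-batches so that per_step * steps == global."""
    assert global_batch_size % per_step_batch_size == 0
    return global_batch_size // per_step_batch_size

# Example: global batch size 128 with per-step batch size 32 -> 4 micro-batches.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
steps = accumulation_steps(128, 32)

opt.zero_grad()
for _ in range(steps):
    x = torch.randn(32, 4)
    loss = model(x).pow(2).mean() / steps  # scale so accumulated gradients average correctly
    loss.backward()                        # gradients accumulate in .grad across micro-batches
opt.step()                                 # one optimizer step per global batch
```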
Expected Output: See the training log for a link to the run on wandb.ai. Training outputs go to `$FASTGEN_OUTPUT_ROOT/{project}/{group}/{name}/`. With default settings, outputs are organized as follows:

```
FASTGEN_OUTPUT/fastgen/cifar10/debug/
├── checkpoints/     # Model checkpoints in the format {iteration:07d}.pth
│   ├── 0001000.pth
│   └── ...
├── config.yaml      # Resolved configuration for reproducibility
├── wandb_id.txt     # W&B run ID for resuming
└── ...
```
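The path conventions above can be captured in a small helper; the directory template and checkpoint naming come from this README, while the function names are ours:

```python
import os
from pathlib import Path

def run_dir(project, group, name):
    """Build the output directory $FASTGEN_OUTPUT_ROOT/{project}/{group}/{name}/."""
    root = os.environ.get("FASTGEN_OUTPUT_ROOT", "FASTGEN_OUTPUT")
    return Path(root) / project / group / name

def checkpoint_path(project, group, name, iteration):
    """Checkpoints are named with a zero-padded iteration, e.g. 0001000.pth."""
    return run_dir(project, group, name) / "checkpoints" / f"{iteration:07d}.pth"
```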
### DDP/FSDP2 Training

For multi-GPU training, use DDP:

```bash
torchrun --nproc_per_node=8 train.py \
    --config=fastgen/configs/experiments/EDM/config_dmd2_test.py \
    - trainer.ddp=True log_config.name=test_ddp
```
For large models, use FSDP2 for model sharding by replacing `trainer.ddp=True` with `trainer.fsdp=True`.
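Conceptually, enabling DDP wraps the model so that gradients are all-reduced across ranks on each backward pass. A minimal standalone sketch, not FastGen's trainer code (single-process CPU group for illustration; under torchrun the rank and world size come from the environment):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT per process;
# we fall back to a single-process group so this snippet runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo",
                        rank=int(os.environ.get("RANK", 0)),
                        world_size=int(os.environ.get("WORLD_SIZE", 1)))

model = torch.nn.Linear(8, 2)
ddp_model = DDP(model)  # gradients are synchronized across ranks on backward

out = ddp_model(torch.randn(4, 8))
out.sum().backward()
dist.destroy_process_group()
```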
### Inference

```bash
python scripts/inference/image_model_inference.py --config fastgen/configs/experiments/EDM/config_dmd2_test.py \
    --classes=10 --prompt_file=scripts/inference/prompts/classes.txt \
    --ckpt=FASTGEN_OUTPUT/fastgen/cifar10/debug/checkpoints/0002000.pth - log_config.name=test_inference
```
For other inference modes and FID evaluation, see scripts/README.md.
### Command-Line Overrides

Override any config parameter using Hydra-style syntax (note the `-` separator):

```bash
python train.py --config=path/to/config.py - key=value nested.key=value
```
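Conceptually, each `key=value` token after the `-` separator sets a (possibly nested) field of the config. A minimal dict-based sketch of that behavior (this parser is our illustration, not FastGen's actual implementation):

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' style overrides to a nested dict config.
    Values stay as strings in this sketch; a real parser would coerce types."""
    for item in overrides:
        key, _, value = item.partition("=")
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # create missing nesting levels
        node[leaf] = value
    return config

cfg = {"trainer": {"ddp": False}}
apply_overrides(cfg, ["trainer.ddp=True", "log_config.name=test_ddp"])
```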
## Documentation
Detailed documentation is available in each component's README:
| Component | Documentation | Description |
|---|---|---|
| Methods | fastgen/methods/README.md | Training methods (sCM, MeanFlow, DMD2, Self-Forcing, etc.) |
| Networks | fastgen/networks/README.md | Network architectures (EDM, SD, SDXL, Flux, WAN, CogVideoX, Cosmos) and pretrained models |
| Configs | fastgen/configs/README.md | Configuration system, environment variables, and creating custom configs |
| Datasets | fastgen/datasets/README.md | Dataset preparation and WebDataset loaders |
| Callbacks | fastgen/callbacks/README.md | Training callbacks (EMA, logging, gradient clipping, etc.) |
| Inference | scripts/README.md | Inference modes (T2I, T2V, I2V, V2V, etc.) and FID evaluation |
| Third Party | fastgen/third_party/README.md | Third-party dependencies (Depth Anything V2, etc.) |
## Supported Methods
| Category | Methods |
|---|---|
| Consistency Models | CM, sCM, TCM, MeanFlow |
| Distribution Matching | DMD2, f-Distill, LADD, CausVid, Self-Forcing |
| Fine-Tuning | SFT, CausalSFT |
| Knowledge Distillation | KD, CausalKD |
See fastgen/methods/README.md for details.
## Supported Networks and Data

FastGen is designed to be agnostic to the network and data, so you can add your own architectures and datasets (see fastgen/networks/README.md and fastgen/datasets/README.md). For reference, we provide the following implementations:
| Data | Networks |
|---|---|
| Image | EDM, EDM2, DiT, SD 1.5, SDXL, Flux |
| Video | WAN (T2V, I2V, VACE), CogVideoX, Cosmos Predict2 |
See fastgen/networks/README.md for details. Not all combinations of methods and networks are currently supported. We provide typical use-cases in our predefined configs in fastgen/configs/experiments.
We plan to provide distilled student checkpoints for CIFAR-10 and ImageNet soon.
## Contributing
We welcome contributions! Please see CONTRIBUTING.md for details.
We thank everyone who has helped design, build, and test FastGen!
- Core contributors: Weili Nie, Julius Berner, Chao Liu
- Other contributors: James Lucas, David Pankratz, Sihyun Yu, Willis Ma, Yilun Xu, Shengqu Cai, Xinyin Ma, Yanke Song
- Collaborators: Sophia Zalewski, Wei Xiong, Christian Laforte, Sajad Norouzi, Kaiwen Zheng, Miloš Hašan, Saeed Hadadan, Gene Liu, David Dynerman, Grace Lam, Pooya Jannaty, Jan Kautz, and many more.
- Project lead: Arash Vahdat
## License
This project is licensed under the Apache License 2.0 - see LICENSE for details. Third-party licenses are documented in licenses/README.md.
## Reference

```bibtex
@article{fastgen2026,
  title={NVIDIA FastGen: Fast Generation from Diffusion Models},
  author={Nie, Weili and Berner, Julius and Liu, Chao and Vahdat, Arash},
  url={https://github.com/NVlabs/FastGen},
  year={2026},
}
```