
NVIDIA FastGen: Fast Generation from Diffusion Models


Weili Nie • Julius Berner • Chao Liu • Arash Vahdat


FastGen is a PyTorch-based framework for building fast generative models using various distillation and acceleration techniques. It supports:

  • large-scale training with ≥10B parameters.
  • different tasks and modalities, including T2I, I2V, and V2V.
  • various distillation methods, including consistency models, distribution matching distillation, self-forcing, and more.

Repository Structure

fastgen/
├── fastgen/
│   ├── callbacks/           # Training callbacks (EMA, profiling, etc.)
│   ├── configs/             # Configuration system
│   │   ├── experiments/     # Experiment configs
│   │   └── methods/         # Method-specific configs
│   ├── datasets/            # Dataset loaders
│   ├── methods/             # Training methods (CM, DMD2, SFT, KD, etc.)
│   ├── networks/            # Neural network architectures
│   ├── third_party/         # Third-party dependencies
│   ├── trainer.py           # Main training loop
│   └── utils/               # Utilities (distributed, checkpointing)
├── scripts/                 # Inference and evaluation scripts
├── tests/                   # Unit tests
├── Makefile                 # Development commands (lint, format, test)
└── train.py                 # Main training entry point

Setup

Recommended: Use the provided Docker container for a consistent environment. See CONTRIBUTING.md for Docker setup instructions. Otherwise, create a new conda environment with conda create -y -n fastgen python=3.12.3 pip; conda activate fastgen.

Installation

git clone https://github.com/NVlabs/FastGen.git
cd FastGen
pip install -e .

Offline Installation

For air-gapped or isolated environments without internet access, use the pre-packaged offline bundle from HuggingFace:

# On a machine WITH internet: download the offline package (~68 GB)
pip install huggingface_hub
huggingface-cli download taohu/fastgen-offline --local-dir ./offline_package

# Transfer to offline machine (USB, rsync, etc.)
# rsync -avz ./offline_package user@offline-machine:/path/to/

On the offline machine:

cd /path/to/offline_package

# Run the setup script (creates conda env, installs packages, configures paths)
bash FastGen/scripts/setup_offline_env.sh

# Activate the environment
source activate_fastgen.sh

# Verify installation
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "from diffusers import WanPipeline; print('Diffusers: OK')"
python -c "from fastgen.networks.Wan.network import Wan; print('FastGen: OK')"

Run data-free training (no video dataset required):

cd FastGen
python train.py --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py

The offline package includes:

  • Python wheels for all dependencies (~3.3 GB)
  • Wan-T2V-1.3B model from HuggingFace (~27 GB)
  • Self-Forcing checkpoints (~38 GB)
  • FastGen source code

To prepare your own offline package (e.g., for different Python version or platform):

python scripts/prepare_offline_package.py --output-dir ./my_offline_package \
    --python-version 311 --platform manylinux2014_x86_64 --create-archive

Credentials (Optional)

For W&B logging, get your API key and save it to credentials/wandb_api.txt or set the WANDB_API_KEY environment variable. Without either of these, W&B will prompt for your API key interactively. For more details, including S3 storage and other environment variables, see fastgen/configs/README.md.

Quick Start

Before running the following commands, download the CIFAR-10 dataset and pretrained EDM models:

python scripts/download_data.py --dataset cifar10

For other datasets and models, see fastgen/networks/README.md and fastgen/datasets/README.md.

Basic Training

python train.py --config=fastgen/configs/experiments/EDM/config_dmd2_test.py

If you run out of memory, try a smaller batch size, e.g., dataloader_train.batch_size=32; gradient accumulation is then applied automatically to match the global batch size.
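The arithmetic behind that trade-off can be illustrated as follows (a sketch only, not FastGen internals): with the global batch size held fixed, shrinking the per-device batch raises the number of accumulation steps proportionally.

```python
def accumulation_steps(global_batch_size: int, per_device_batch_size: int,
                       num_gpus: int = 1) -> int:
    """How many gradient-accumulation steps are needed so that the
    per-device batches still sum to the configured global batch."""
    per_step = per_device_batch_size * num_gpus
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size must be a multiple of the per-step batch")
    return global_batch_size // per_step


# Halving the per-device batch doubles the number of accumulation steps.
print(accumulation_steps(128, 64))  # 2
print(accumulation_steps(128, 32))  # 4
```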

Expected Output: See the training log for a link to the run on wandb.ai. Training outputs go to $FASTGEN_OUTPUT_ROOT/{project}/{group}/{name}/. With default settings, outputs are organized as follows:

FASTGEN_OUTPUT/fastgen/cifar10/debug/
├── checkpoints/    # Model checkpoints in the format {iteration:07d}.pth
│   ├── 0001000.pth
│   └── ...
├── config.yaml     # Resolved configuration for reproducibility
├── wandb_id.txt    # W&B run ID for resuming
└── ...

DDP/FSDP2 Training

For multi-GPU training, use DDP:

torchrun --nproc_per_node=8 train.py \
    --config=fastgen/configs/experiments/EDM/config_dmd2_test.py \
    - trainer.ddp=True log_config.name=test_ddp

For large models, use FSDP2 for model sharding by replacing trainer.ddp=True with trainer.fsdp=True.

Inference

python scripts/inference/image_model_inference.py --config fastgen/configs/experiments/EDM/config_dmd2_test.py \
  --classes=10 --prompt_file=scripts/inference/prompts/classes.txt --ckpt=FASTGEN_OUTPUT/fastgen/cifar10/debug/checkpoints/0002000.pth - log_config.name=test_inference

For other inference modes and FID evaluation, see scripts/README.md.

Command-Line Overrides

Override any config parameter using Hydra-style syntax (note the - separator):

python train.py --config=path/to/config.py - key=value nested.key=value
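A minimal sketch of how such dotted overrides map onto a nested config (an illustration of the syntax only, not FastGen's actual parser; values are kept as strings here):

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Write each key=value pair into a nested dict, creating
    intermediate levels for dotted keys like nested.key."""
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        node = config
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


cfg = apply_overrides({}, ["trainer.ddp=True", "log_config.name=test_ddp"])
print(cfg)  # {'trainer': {'ddp': 'True'}, 'log_config': {'name': 'test_ddp'}}
```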

Documentation

Detailed documentation is available in each component's README:

Component    Documentation                   Description
Methods      fastgen/methods/README.md       Training methods (sCM, MeanFlow, DMD2, Self-Forcing, etc.)
Networks     fastgen/networks/README.md      Network architectures (EDM, SD, SDXL, Flux, WAN, CogVideoX, Cosmos) and pretrained models
Configs      fastgen/configs/README.md       Configuration system, environment variables, and creating custom configs
Datasets     fastgen/datasets/README.md      Dataset preparation and WebDataset loaders
Callbacks    fastgen/callbacks/README.md     Training callbacks (EMA, logging, gradient clipping, etc.)
Inference    scripts/README.md               Inference modes (T2I, T2V, I2V, V2V, etc.) and FID evaluation
Third Party  fastgen/third_party/README.md   Third-party dependencies (Depth Anything V2, etc.)

Supported Methods

Category                Methods
Consistency Models      CM, sCM, TCM, MeanFlow
Distribution Matching   DMD2, f-Distill, LADD, CausVid, Self-Forcing
Fine-Tuning             SFT, CausalSFT
Knowledge Distillation  KD, CausalKD

See fastgen/methods/README.md for details.

Supported Networks and Data

FastGen is designed to be agnostic to both network and data, so you can add your own architectures and datasets (see fastgen/networks/README.md and fastgen/datasets/README.md). For reference, we provide the following implementations:

Data    Networks
Image   EDM, EDM2, DiT, SD 1.5, SDXL, Flux
Video   WAN (T2V, I2V, VACE), CogVideoX, Cosmos Predict2

See fastgen/networks/README.md for details. Not all combinations of methods and networks are currently supported. We provide typical use-cases in our predefined configs in fastgen/configs/experiments.

We plan to provide distilled student checkpoints for CIFAR-10 and ImageNet soon.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

We thank everyone who has helped design, build, and test FastGen!

  • Core contributors: Weili Nie, Julius Berner, Chao Liu
  • Other contributors: James Lucas, David Pankratz, Sihyun Yu, Willis Ma, Yilun Xu, Shengqu Cai, Xinyin Ma, Yanke Song
  • Collaborators: Sophia Zalewski, Wei Xiong, Christian Laforte, Sajad Norouzi, Kaiwen Zheng, Miloš Hašan, Saeed Hadadan, Gene Liu, David Dynerman, Grace Lam, Pooya Jannaty, Jan Kautz, and many more.
  • Project lead: Arash Vahdat

License

This project is licensed under the Apache License 2.0 - see LICENSE for details. Third-party licenses are documented in licenses/README.md.

Reference

@article{fastgen2026,
  title={NVIDIA FastGen: Fast Generation from Diffusion Models},
  author={Nie, Weili and Berner, Julius and Liu, Chao and Vahdat, Arash},
  url={https://github.com/NVlabs/FastGen},
  year={2026},
}