# NVIDIA FastGen: Fast Generation from Diffusion Models
Weili Nie • Julius Berner • Chao Liu • Arash Vahdat
FastGen is a PyTorch-based framework for building fast generative models using various distillation and acceleration techniques. It supports:
- large-scale training with ≥10B parameters.
- different tasks and modalities, including text-to-image (T2I), image-to-video (I2V), and video-to-video (V2V).
- various distillation methods, including consistency models, distribution matching distillation, self-forcing, and more.
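To give a flavor of what these distillation objectives look like, here is a minimal consistency-style training step in PyTorch. The toy network, noise schedule, and loss are illustrative stand-ins only, not FastGen's actual implementation:

```python
import torch

# Toy "denoiser": maps (noisy sample, timestep) -> sample estimate.
# Stand-in for a real diffusion backbone such as those in fastgen/networks/.
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2))

def f(x, t):
    # Condition on t by concatenating it as an extra input feature.
    return net(torch.cat([x, t[:, None]], dim=1))

x0 = torch.randn(8, 2)                 # clean data batch
t = torch.rand(8) * 0.9 + 0.05         # random timesteps in (0, 1)
dt = 0.05                              # adjacent-timestep gap
noise = torch.randn_like(x0)

x_t = x0 + t[:, None] * noise          # noisy samples at t
x_s = x0 + (t - dt)[:, None] * noise   # noisy samples at the earlier step s = t - dt

# Consistency objective: the model's outputs at adjacent timesteps on the same
# noising trajectory should agree; the earlier-step target is detached (it plays
# the teacher/EMA role in real consistency training).
loss = torch.mean((f(x_t, t) - f(x_s, t - dt).detach()) ** 2)
loss.backward()
```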
## Repository Structure

```
fastgen/
├── fastgen/
│   ├── callbacks/           # Training callbacks (EMA, profiling, etc.)
│   ├── configs/             # Configuration system
│   │   ├── experiments/     # Experiment configs
│   │   └── methods/         # Method-specific configs
│   ├── datasets/            # Dataset loaders
│   ├── methods/             # Training methods (CM, DMD2, SFT, KD, etc.)
│   ├── networks/            # Neural network architectures
│   ├── third_party/         # Third-party dependencies
│   ├── trainer.py           # Main training loop
│   └── utils/               # Utilities (distributed, checkpointing)
├── scripts/                 # Inference and evaluation scripts
├── tests/                   # Unit tests
├── Makefile                 # Development commands (lint, format, test)
└── train.py                 # Main training entry point
```
## Setup

Recommended: Use the provided Docker container for a consistent environment; see CONTRIBUTING.md for Docker setup instructions. Otherwise, create a new conda environment with `conda create -y -n fastgen python=3.12.3 pip; conda activate fastgen`.
### Installation

```bash
git clone https://github.com/NVlabs/FastGen.git
cd FastGen
pip install -e .
```
### Offline Installation

For air-gapped or isolated environments without internet access, use the pre-packaged offline bundle from HuggingFace:

```bash
# On a machine WITH internet: download the offline package (~68 GB)
pip install huggingface_hub
huggingface-cli download taohu/fastgen-offline --local-dir ./offline_package

# Transfer to the offline machine (USB, rsync, etc.)
# rsync -avz ./offline_package user@offline-machine:/path/to/
```
On the offline machine:

```bash
cd /path/to/offline_package

# Run the setup script (creates the conda env, installs packages, configures paths)
bash FastGen/scripts/setup_offline_env.sh

# Activate the environment
source activate_fastgen.sh

# Verify the installation
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "from diffusers import WanPipeline; print('Diffusers: OK')"
python -c "from fastgen.networks.Wan.network import Wan; print('FastGen: OK')"
```
Run data-free training (no video dataset required):

```bash
cd FastGen
python train.py --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py
```
The offline package includes:
- Python wheels for all dependencies (~3.3 GB)
- Wan-T2V-1.3B model from HuggingFace (~27 GB)
- Self-Forcing checkpoints (~38 GB)
- FastGen source code
To prepare your own offline package (e.g., for a different Python version or platform):

```bash
python scripts/prepare_offline_package.py --output-dir ./my_offline_package \
    --python-version 311 --platform manylinux2014_x86_64 --create-archive
```
### Credentials (Optional)

For W&B logging, get your API key and save it to `credentials/wandb_api.txt` or set the `WANDB_API_KEY` environment variable.
Without either of these, W&B will prompt for your API key interactively.
For more details, including S3 storage and other environment variables, see fastgen/configs/README.md.
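A plausible resolution helper for the two options above; the file path and variable name come from this README, while the helper function and its precedence order are our illustration:

```python
import os
from pathlib import Path

def resolve_wandb_key(cred_file="credentials/wandb_api.txt"):
    """Return the W&B API key from the credentials file or the WANDB_API_KEY
    environment variable, or None if neither is set (in which case W&B falls
    back to an interactive prompt)."""
    path = Path(cred_file)
    if path.is_file():
        return path.read_text().strip()
    return os.environ.get("WANDB_API_KEY")
```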
## Quick Start

Before running the following commands, download the CIFAR-10 dataset and pretrained EDM models:

```bash
python scripts/download_data.py --dataset cifar10
```
For other datasets and models, see fastgen/networks/README.md and fastgen/datasets/README.md.
### Basic Training

```bash
python train.py --config=fastgen/configs/experiments/EDM/config_dmd2_test.py
```
If you run out of memory, try a smaller batch size, e.g., `dataloader_train.batch_size=32`; gradient accumulation is then used automatically to match the global batch size.
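The relationship between per-step batch size, accumulation steps, and global batch size can be sketched as follows (the names and the manual loop are illustrative; FastGen handles this automatically):

```python
import torch

def accumulation_steps(global_batch_size, per_step_batch_size):
    """Number of micro-batches so that per_step * steps == global."""
    assert global_batch_size % per_step_batch_size == 0
    return global_batch_size // per_step_batch_size

# Example: global batch size 128 with per-step batch size 32 -> 4 micro-batches.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
steps = accumulation_steps(128, 32)

opt.zero_grad()
for _ in range(steps):
    x = torch.randn(32, 4)
    loss = model(x).pow(2).mean() / steps  # scale so accumulated gradients average correctly
    loss.backward()                        # gradients accumulate in .grad across micro-batches
opt.step()                                 # one optimizer step per global batch
```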
Expected Output: See the training log for a link to the run on wandb.ai. Training outputs go to `$FASTGEN_OUTPUT_ROOT/{project}/{group}/{name}/`. With default settings, outputs are organized as follows:

```
FASTGEN_OUTPUT/fastgen/cifar10/debug/
├── checkpoints/     # Model checkpoints in the format {iteration:07d}.pth
│   ├── 0001000.pth
│   └── ...
├── config.yaml      # Resolved configuration for reproducibility
├── wandb_id.txt     # W&B run ID for resuming
└── ...
```
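The path conventions above can be captured in a small helper; the directory template and checkpoint naming come from this README, while the function names are ours:

```python
import os
from pathlib import Path

def run_dir(project, group, name):
    """Build the output directory $FASTGEN_OUTPUT_ROOT/{project}/{group}/{name}/."""
    root = os.environ.get("FASTGEN_OUTPUT_ROOT", "FASTGEN_OUTPUT")
    return Path(root) / project / group / name

def checkpoint_path(project, group, name, iteration):
    """Checkpoints are named with a zero-padded iteration, e.g. 0001000.pth."""
    return run_dir(project, group, name) / "checkpoints" / f"{iteration:07d}.pth"
```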
### DDP/FSDP2 Training

For multi-GPU training, use DDP:

```bash
torchrun --nproc_per_node=8 train.py \
    --config=fastgen/configs/experiments/EDM/config_dmd2_test.py \
    - trainer.ddp=True log_config.name=test_ddp
```
For large models, use FSDP2 for model sharding by replacing `trainer.ddp=True` with `trainer.fsdp=True`.
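Conceptually, enabling DDP wraps the model so that gradients are all-reduced across ranks on each backward pass. A minimal standalone sketch, not FastGen's trainer code (single-process CPU group for illustration; under torchrun the rank and world size come from the environment):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT per process;
# we fall back to a single-process group so this snippet runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo",
                        rank=int(os.environ.get("RANK", 0)),
                        world_size=int(os.environ.get("WORLD_SIZE", 1)))

model = torch.nn.Linear(8, 2)
ddp_model = DDP(model)  # gradients are synchronized across ranks on backward

out = ddp_model(torch.randn(4, 8))
out.sum().backward()
dist.destroy_process_group()
```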
### Inference

```bash
python scripts/inference/image_model_inference.py --config fastgen/configs/experiments/EDM/config_dmd2_test.py \
    --classes=10 --prompt_file=scripts/inference/prompts/classes.txt \
    --ckpt=FASTGEN_OUTPUT/fastgen/cifar10/debug/checkpoints/0002000.pth - log_config.name=test_inference
```
For other inference modes and FID evaluation, see scripts/README.md.
### Command-Line Overrides

Override any config parameter using Hydra-style syntax (note the `-` separator):

```bash
python train.py --config=path/to/config.py - key=value nested.key=value
```
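Conceptually, each `key=value` token after the `-` separator sets a (possibly nested) field of the config. A minimal dict-based sketch of that behavior (this parser is our illustration, not FastGen's actual implementation):

```python
def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' style overrides to a nested dict config.
    Values stay as strings in this sketch; a real parser would coerce types."""
    for item in overrides:
        key, _, value = item.partition("=")
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # create missing nesting levels
        node[leaf] = value
    return config

cfg = {"trainer": {"ddp": False}}
apply_overrides(cfg, ["trainer.ddp=True", "log_config.name=test_ddp"])
```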
## Documentation
Detailed documentation is available in each component's README:
| Component | Documentation | Description |
|---|---|---|
| Methods | fastgen/methods/README.md | Training methods (sCM, MeanFlow, DMD2, Self-Forcing, etc.) |
| Networks | fastgen/networks/README.md | Network architectures (EDM, SD, SDXL, Flux, WAN, CogVideoX, Cosmos) and pretrained models |
| Configs | fastgen/configs/README.md | Configuration system, environment variables, and creating custom configs |
| Datasets | fastgen/datasets/README.md | Dataset preparation and WebDataset loaders |
| Callbacks | fastgen/callbacks/README.md | Training callbacks (EMA, logging, gradient clipping, etc.) |
| Inference | scripts/README.md | Inference modes (T2I, T2V, I2V, V2V, etc.) and FID evaluation |
| Third Party | fastgen/third_party/README.md | Third-party dependencies (Depth Anything V2, etc.) |
## Supported Methods
| Category | Methods |
|---|---|
| Consistency Models | CM, sCM, TCM, MeanFlow |
| Distribution Matching | DMD2, f-Distill, LADD, CausVid, Self-Forcing |
| Fine-Tuning | SFT, CausalSFT |
| Knowledge Distillation | KD, CausalKD |
See fastgen/methods/README.md for details.
## Supported Networks and Data

FastGen is designed to be agnostic to the network and data, so you can add your own architectures and datasets (see fastgen/networks/README.md and fastgen/datasets/README.md). For reference, we provide the following implementations:
| Data | Networks |
|---|---|
| Image | EDM, EDM2, DiT, SD 1.5, SDXL, Flux |
| Video | WAN (T2V, I2V, VACE), CogVideoX, Cosmos Predict2 |
See fastgen/networks/README.md for details. Not all combinations of methods and networks are currently supported. We provide typical use-cases in our predefined configs in fastgen/configs/experiments.
We plan to provide distilled student checkpoints for CIFAR-10 and ImageNet soon.
## Contributing
We welcome contributions! Please see CONTRIBUTING.md for details.
We thank everyone who has helped design, build, and test FastGen!
- Core contributors: Weili Nie, Julius Berner, Chao Liu
- Other contributors: James Lucas, David Pankratz, Sihyun Yu, Willis Ma, Yilun Xu, Shengqu Cai, Xinyin Ma, Yanke Song
- Collaborators: Sophia Zalewski, Wei Xiong, Christian Laforte, Sajad Norouzi, Kaiwen Zheng, Miloš Hašan, Saeed Hadadan, Gene Liu, David Dynerman, Grace Lam, Pooya Jannaty, Jan Kautz, and many more.
- Project lead: Arash Vahdat
## License
This project is licensed under the Apache License 2.0 - see LICENSE for details. Third-party licenses are documented in licenses/README.md.
## Reference

```bibtex
@article{fastgen2026,
  title={NVIDIA FastGen: Fast Generation from Diffusion Models},
  author={Nie, Weili and Berner, Julius and Liu, Chao and Vahdat, Arash},
  url={https://github.com/NVlabs/FastGen},
  year={2026},
}
```