# Self-Forcing: Quick Reference Guide

Self-Forcing is a distribution-matching distillation method for causal (autoregressive) video generation. It trains a fast student model to match a teacher model's output distribution while keeping gradients flowing through the autoregressive rollout itself.
Reference: Huang et al., 2025
## Environment Setup

```bash
conda activate sf
pip install -e .
```
## Quick Start

### 1. Download Checkpoint

```bash
huggingface-cli download gdhe17/Self-Forcing checkpoints/ode_init.pt \
  --local-dir FASTGEN_OUTPUT/MODEL/Self-Forcing
```
### 2. Run Training

#### Data-Free Training (recommended for testing)

Data-free training uses random tensors instead of real video data; the GAN discriminator is disabled.

Single GPU:

```bash
python train.py --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py
```

Single GPU (specific device):

```bash
CUDA_VISIBLE_DEVICES=0 python train.py --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py
```

Multi-GPU (8 GPUs):

```bash
torchrun --nproc_per_node=8 train.py \
  --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py \
  -- trainer.ddp=True
```
#### Training with Real Video Data

For best quality, train on real video data with the GAN discriminator enabled.

Prepare WebDataset shards (see `fastgen/datasets/README.md`):

```
/path/to/videos/
├── 00000.tar (sample_001.mp4, sample_001.txt, ...)
├── 00001.tar
└── ...
```
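If you need to pack raw videos and captions into shards yourself, the following is a minimal sketch using only the standard library. The keys and file names are illustrative; the exact sample keys the loader expects are documented in `fastgen/datasets/README.md`.

```python
import io
import tarfile

def write_shard(shard_path, samples):
    """Write (key, video_bytes, caption) triples into a WebDataset-style tar.

    Each sample becomes two tar members sharing a basename, e.g.
    sample_001.mp4 and sample_001.txt -- the shared-prefix convention
    that WebDataset loaders use to group files into one sample.
    """
    with tarfile.open(shard_path, "w") as tar:
        for key, video_bytes, caption in samples:
            for suffix, payload in ((".mp4", video_bytes),
                                    (".txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

# Hypothetical sample: real shards would contain actual encoded video.
write_shard("00000.tar", [("sample_001", b"\x00fake-mp4-bytes", "a red car")])
```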
Single GPU:

```bash
python train.py --config=fastgen/configs/experiments/WanT2V/config_sf.py \
  -- dataloader_train.datatags='["WDS:/path/to/your/videos"]'
```

Multi-GPU (8 GPUs):

```bash
torchrun --nproc_per_node=8 train.py \
  --config=fastgen/configs/experiments/WanT2V/config_sf.py \
  -- trainer.ddp=True dataloader_train.datatags='["WDS:/path/to/your/videos"]'
```
## Available Configs

| Model | Config Path | Notes |
|---|---|---|
| WAN T2V (light) | `fastgen/configs/experiments/WanT2V/config_sf_datafree_light.py` | Data-free, reduced resolution (~8-12 GB VRAM) |
| WAN T2V (data-free) | `fastgen/configs/experiments/WanT2V/config_sf_datafree.py` | Data-free, full 480p (~24 GB VRAM) |
| WAN T2V (with data) | `fastgen/configs/experiments/WanT2V/config_sf.py` | Requires video data, GAN enabled |
| VACE-WAN V2V | `fastgen/configs/experiments/WanV2V/config_sf.py` | Requires video data |
## Key Configuration Parameters

### Self-Forcing Specific

| Parameter | Default | Description |
|---|---|---|
| `enable_gradient_in_rollout` | `True` | Enable gradients at the exit step during rollout |
| `start_gradient_frame` | `0` | Frame index at which gradient tracking starts |
| `same_step_across_blocks` | `True` | Use the same exit step for all blocks |
| `last_step_only` | `False` | Exit only at the last denoising step |
| `context_noise` | `0.0` | Noise level added to cached context (0-1) |
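To make the exit-step flags concrete, here is a minimal sketch of how an exit step could be chosen per block under these settings. The function and variable names are illustrative, not the repository's actual code:

```python
import random

def pick_exit_steps(num_blocks, sample_steps,
                    same_step_across_blocks=True, last_step_only=False):
    """Choose the denoising step at which gradients are enabled
    (the "exit step") for each autoregressive block."""
    if last_step_only:
        # Gradients only at the final denoising step of every block.
        return [sample_steps - 1] * num_blocks
    if same_step_across_blocks:
        # One random exit step, shared by all blocks.
        step = random.randrange(sample_steps)
        return [step] * num_blocks
    # Independent random exit step per block.
    return [random.randrange(sample_steps) for _ in range(num_blocks)]
```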
### Training Settings

| Parameter | Default | Description |
|---|---|---|
| `student_sample_steps` | `4` | Number of denoising steps |
| `student_update_freq` | `5` | Student update frequency (in iterations) |
| `gan_loss_weight_gen` | `0.001` | GAN loss weight for the generator |
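The role of `gan_loss_weight_gen` can be read as a simple weighted sum in the student objective. This is a sketch of that combination, assuming (as is common in distribution-matching distillation) that the adversarial term is added to the VSD loss; the actual loss assembly lives in the method file listed under Key Files:

```python
def student_loss(vsd_loss, gan_loss, gan_loss_weight_gen=0.001):
    """Total student objective: the distribution-matching (VSD) loss plus
    a lightly weighted adversarial term for the generator."""
    return vsd_loss + gan_loss_weight_gen * gan_loss
```

With the default weight of 0.001, the GAN term acts as a small regularizer on top of the dominant VSD loss.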
## Custom Training Example

```bash
python train.py --config=fastgen/configs/experiments/WanT2V/config_sf.py \
  -- model.gan_loss_weight_gen=0.005 \
  model.student_sample_steps=6 \
  model.context_noise=0.1 \
  trainer.max_iter=10000 \
  dataloader_train.batch_size=2
```
## How It Works

- **Autoregressive Rollout**: processes the video chunk-by-chunk with KV-caching
- **Gradient Tracking**: enables gradients at stochastic exit steps during the rollout
- **Distribution Matching**: combines a VSD loss with GAN training
- **Alternating Updates**:
  - Student updates (VSD + GAN loss)
  - Discriminator/fake-score updates (adversarial training)
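The alternating-update schedule can be sketched as follows. This is an illustration only: it assumes `student_update_freq=5` means the student is updated once every 5 iterations, with the discriminator/fake-score networks trained on the remaining iterations; the repository's actual scheduling logic is in the method file listed below.

```python
def update_schedule(num_iters, student_update_freq=5):
    """Return which network is trained at each iteration under a simple
    alternating schedule: student once every `student_update_freq`
    iterations, critic (discriminator / fake score) otherwise."""
    roles = []
    for it in range(num_iters):
        if it % student_update_freq == 0:
            roles.append("student")  # VSD + GAN generator loss
        else:
            roles.append("critic")   # adversarial / fake-score loss
    return roles
```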
## Key Files

- Method: `fastgen/methods/distribution_matching/self_forcing.py`
- Config template: `fastgen/configs/methods/config_self_forcing.py`
- Tests: `tests/test_sfmodel.py`

## Testing

```bash
pytest tests/test_sfmodel.py -v
```