
Self-Forcing: Quick Reference Guide

Self-Forcing is a distribution matching distillation method for causal/autoregressive video generation. It trains a fast, few-step student model to match a teacher model's output distribution while preserving gradient flow through the autoregressive rollout.

Reference: Huang et al., 2025


Environment Setup

conda activate sf
pip install -e .

Quick Start

1. Download Checkpoint

huggingface-cli download gdhe17/Self-Forcing checkpoints/ode_init.pt \
  --local-dir FASTGEN_OUTPUT/MODEL/Self-Forcing

2. Run Training

Data-Free Training (Recommended for testing)

Uses random tensors in place of real video data; the GAN discriminator is disabled.

Single GPU:

python train.py --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py

Single GPU (specific device):

CUDA_VISIBLE_DEVICES=0 python train.py --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py

Multi-GPU (8 GPUs):

torchrun --nproc_per_node=8 train.py \
  --config=fastgen/configs/experiments/WanT2V/config_sf_datafree.py \
  -- trainer.ddp=True

Training with Real Video Data

For best quality, train on real video data with the GAN discriminator enabled.

Prepare WebDataset shards (see fastgen/datasets/README.md):

/path/to/videos/
β”œβ”€β”€ 00000.tar (sample_001.mp4, sample_001.txt, ...)
β”œβ”€β”€ 00001.tar
└── ...
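
Each shard pairs a video with a same-stem caption file so WebDataset can group them into one training sample. Below is a minimal sketch of packing one shard with the standard library; the file names are placeholders, and the expected sample keys are defined in fastgen/datasets/README.md, not here.

```python
import tarfile
from pathlib import Path

def write_shard(shard_path: str, samples: list[tuple[str, str]]) -> None:
    """Pack (video_path, caption_path) pairs into one WebDataset-style shard.

    Each pair must share a basename stem (e.g. sample_001.mp4 / sample_001.txt)
    so the loader can group the files into a single sample.
    """
    with tarfile.open(shard_path, "w") as tar:
        for video, caption in samples:
            tar.add(video, arcname=Path(video).name)
            tar.add(caption, arcname=Path(caption).name)

# Hypothetical usage:
# write_shard("00000.tar", [("sample_001.mp4", "sample_001.txt")])
```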

Single GPU:

python train.py --config=fastgen/configs/experiments/WanT2V/config_sf.py \
  -- dataloader_train.datatags='["WDS:/path/to/your/videos"]'

Multi-GPU (8 GPUs):

torchrun --nproc_per_node=8 train.py \
  --config=fastgen/configs/experiments/WanT2V/config_sf.py \
  -- trainer.ddp=True dataloader_train.datatags='["WDS:/path/to/your/videos"]'

Available Configs

Model                Config Path                                                      Notes
WAN T2V (light)      fastgen/configs/experiments/WanT2V/config_sf_datafree_light.py  Data-free, reduced resolution (~8-12 GB VRAM)
WAN T2V (data-free)  fastgen/configs/experiments/WanT2V/config_sf_datafree.py        Data-free, full 480p (~24 GB VRAM)
WAN T2V (with data)  fastgen/configs/experiments/WanT2V/config_sf.py                  Requires video data, GAN enabled
VACE-WAN V2V         fastgen/configs/experiments/WanV2V/config_sf.py                  Requires video data

Key Configuration Parameters

Self-Forcing Specific

Parameter                   Default  Description
enable_gradient_in_rollout  True     Enable gradients at the exit step during rollout
start_gradient_frame        0        Frame index at which gradient tracking starts
same_step_across_blocks     True     Use the same exit step for all blocks
last_step_only              False    Exit only at the last denoising step
context_noise               0.0      Noise level added to the cached context (0-1)
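
The exit-step flags above can be read as a single per-rollout selection rule. The sketch below is an illustrative interpretation of their semantics, not the actual logic in fastgen/methods/distribution_matching/self_forcing.py:

```python
import random

def pick_exit_steps(num_blocks: int, num_steps: int,
                    same_step_across_blocks: bool = True,
                    last_step_only: bool = False) -> list[int]:
    """Choose, per block, the denoising step at which gradients are enabled."""
    if last_step_only:
        # Gradients flow only through the final denoising step of every block.
        return [num_steps - 1] * num_blocks
    if same_step_across_blocks:
        # One random exit step shared by all blocks in the rollout.
        step = random.randrange(num_steps)
        return [step] * num_blocks
    # Otherwise, an independent random exit step per block.
    return [random.randrange(num_steps) for _ in range(num_blocks)]
```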

Training Settings

Parameter             Default  Description
student_sample_steps  4        Number of student denoising steps
student_update_freq   5        Student update frequency
gan_loss_weight_gen   0.001    GAN loss weight for the generator
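
The guide does not spell out the update schedule behind student_update_freq. A common convention in distribution-matching distillation (assumed here, not confirmed by this document) is that the discriminator/fake score networks update every iteration while the student updates once every student_update_freq iterations:

```python
def schedule_updates(num_iters: int, student_update_freq: int = 5) -> dict:
    """Count updates under the assumed alternating schedule:
    discriminator/fake score every iteration,
    student once every `student_update_freq` iterations."""
    counts = {"student": 0, "critic": 0}
    for it in range(num_iters):
        counts["critic"] += 1
        if it % student_update_freq == 0:
            counts["student"] += 1
    return counts
```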

Custom Training Example

python train.py --config=fastgen/configs/experiments/WanT2V/config_sf.py \
  -- model.gan_loss_weight_gen=0.005 \
    model.student_sample_steps=6 \
    model.context_noise=0.1 \
    trainer.max_iter=10000 \
    dataloader_train.batch_size=2

How It Works

  1. Autoregressive Rollout: the student generates video chunk by chunk, reusing KV caches across chunks
  2. Gradient Tracking: gradients are enabled at stochastic exit steps during the rollout
  3. Distribution Matching: a variational score distillation (VSD) loss is combined with GAN training
  4. Alternating Updates:
    • Student updates (VSD + GAN loss)
    • Discriminator/fake score updates (adversarial training)
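
Steps 1-2 can be sketched as a toy loop that counts how many denoising steps run with gradients enabled; this is an illustrative stub assuming same_step_across_blocks semantics, not the real implementation:

```python
import random

def rollout_grad_steps(num_blocks: int = 3, num_steps: int = 4,
                       same_step_across_blocks: bool = True) -> int:
    """Toy rollout: each block runs `num_steps` denoising steps, but gradients
    are tracked only at one stochastic exit step per block. Returns the number
    of gradient-enabled steps (one per block)."""
    shared = random.randrange(num_steps) if same_step_across_blocks else None
    grad_steps = 0
    for _ in range(num_blocks):
        exit_step = shared if shared is not None else random.randrange(num_steps)
        for s in range(num_steps):
            if s == exit_step:
                grad_steps += 1  # only this step feeds the VSD/GAN loss
    return grad_steps
```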

Key Files

  • Method: fastgen/methods/distribution_matching/self_forcing.py
  • Config Template: fastgen/configs/methods/config_self_forcing.py
  • Tests: tests/test_sfmodel.py

Testing

pytest tests/test_sfmodel.py -v