# FastGen Inference

Generate images and videos using pretrained or distilled models.

| Script | Modality | Modes |
|--------|----------|-------|
| [`image_model_inference.py`](inference/image_model_inference.py) | Image | Unconditional, class-conditional, T2I |
| [`video_model_inference.py`](inference/video_model_inference.py) | Video | T2V, I2V, V2V, Video2World |

## Key Arguments

| Argument | Description |
|----------|-------------|
| `--do_student_sampling` | Run the distilled student model (few-step sampling) |
| `--do_teacher_sampling` | Run the teacher model (multi-step sampling) |
| `--ckpt_path` | Path to the distilled checkpoint |
| `--num_steps` | Number of sampling steps for the teacher |
| `--classes N` | Class-conditional generation with N classes |
| `--unconditional` | Unconditional generation |
| `--input_image_file` | File listing input images for I2V |
| `--source_video_file` | File listing source videos for V2V |
| `--fps` | Output video frame rate |
| `model.guidance_scale` | CFG scale (config override) |
| `trainer.seed` | Random seed for reproducibility (config override) |
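
These flags combine in the obvious way. As a hypothetical example (the checkpoint path is a placeholder; all examples below use teacher sampling, so this sketches the student-side counterpart):

```bash
# Hypothetical few-step run of a distilled student checkpoint.
# The --ckpt_path value is a placeholder; point it at your own checkpoint.
python scripts/inference/image_model_inference.py \
    --config fastgen/configs/experiments/EDM/config_dmd2_cifar10.py \
    --do_student_sampling True \
    --ckpt_path /path/to/distilled_checkpoint.pt \
    --num_samples 16 \
    - trainer.seed=0
```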


## Example Commands

### Unconditional
```bash
python scripts/inference/image_model_inference.py \
    --config fastgen/configs/experiments/EDM/config_sft_edm_cifar10.py \
    --do_student_sampling False --unconditional --num_samples 16 --num_steps 18
```

### Class-Conditional
```bash
python scripts/inference/image_model_inference.py \
    --config fastgen/configs/experiments/DiT/config_sft_sit_xl.py \
    --do_student_sampling False --classes 1000 --num_steps 50 \
    --prompt_file scripts/inference/prompts/classes.txt \
    - model.guidance_scale=4.0
```

### Text-to-Image (T2I)
```bash
python scripts/inference/image_model_inference.py \
    --config fastgen/configs/experiments/Flux/config_sft.py \
    --do_student_sampling False --num_steps 50 \
    - model.guidance_scale=3.5
```

### Text-to-Video (T2V)
```bash
python scripts/inference/video_model_inference.py \
    --config fastgen/configs/experiments/WanT2V/config_dmd2.py \
    --do_student_sampling False --num_steps 50 --fps 16 \
    --neg_prompt_file scripts/inference/prompts/negative_prompt.txt \
    - model.guidance_scale=5.0
```

### Image-to-Video (I2V)
```bash
python scripts/inference/video_model_inference.py \
    --config fastgen/configs/experiments/WanI2V/config_dmd2_wan22_5b.py \
    --do_student_sampling False --num_steps 50 --fps 16 \
    --neg_prompt_file scripts/inference/prompts/negative_prompt.txt \
    --input_image_file scripts/inference/prompts/source_image_paths.txt \
    - model.guidance_scale=5.0
```

### Video-to-Video (V2V)
```bash
python scripts/inference/video_model_inference.py \
    --config fastgen/configs/experiments/WanV2V/config_sft.py \
    --do_student_sampling False --num_steps 50 --fps 16 \
    --neg_prompt_file scripts/inference/prompts/negative_prompt.txt \
    --source_video_file scripts/inference/prompts/source_video_paths.txt \
    - model.guidance_scale=5.0
```

### Video2World (Cosmos)
```bash
python scripts/inference/video_model_inference.py \
    --config fastgen/configs/experiments/CosmosPredict2/config_sft.py \
    --do_student_sampling False --num_steps 35 --fps 24 \
    --neg_prompt_file scripts/inference/prompts/negative_prompt_cosmos.txt \
    --input_image_file scripts/inference/prompts/source_image_paths.txt \
    --num_conditioning_frames 1 \
    - model.guidance_scale=5.0 model.net.is_video2world=True model.input_shape="[16, 24, 88, 160]"
```

### Causal / Autoregressive
Use causal configs (e.g., `config_sft_causal_wan22_5b.py`) for autoregressive generation.

```bash
python scripts/inference/video_model_inference.py \
    --config fastgen/configs/experiments/WanI2V/config_sft_causal_wan22_5b.py \
    --do_student_sampling False --num_steps 50 --fps 16 \
    --neg_prompt_file scripts/inference/prompts/negative_prompt.txt \
    --input_image_file scripts/inference/prompts/source_image_paths.txt \
    - model.guidance_scale=5.0
```

To generate longer videos via extrapolation:
- `--num_segments N`: Generate N consecutive video segments autoregressively (default: 1)
- `--overlap_frames K`: Overlap K latent frames between segments for temporal consistency (default: 0)
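
For example, extending the causal command above (the segment and overlap counts are illustrative, not recommended values):

```bash
# Illustrative: 4 autoregressive segments, overlapping 3 latent frames
# between consecutive segments for temporal consistency.
python scripts/inference/video_model_inference.py \
    --config fastgen/configs/experiments/WanI2V/config_sft_causal_wan22_5b.py \
    --do_student_sampling False --num_steps 50 --fps 16 \
    --neg_prompt_file scripts/inference/prompts/negative_prompt.txt \
    --input_image_file scripts/inference/prompts/source_image_paths.txt \
    --num_segments 4 --overlap_frames 3 \
    - model.guidance_scale=5.0
```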

---

## FID Evaluation

Compute Fréchet Inception Distance (FID) for image models using [`fid/compute_fid_from_ckpts.py`](fid/compute_fid_from_ckpts.py).

### Usage

```bash
torchrun --nproc_per_node=8 scripts/fid/compute_fid_from_ckpts.py \
    --config fastgen/configs/experiments/EDM/config_dmd2_cifar10.py
```

This script:
1. Loads checkpoints from `trainer.checkpointer.save_dir`
2. Generates `eval.num_samples` images using student sampling
3. Computes FID against reference statistics
4. Saves results to `{save_path}/{eval.samples_dir}/fid.json`


### Config Options

| Parameter | Description |
|-----------|-------------|
| `eval.num_samples` | Number of samples to generate (default: 50000) |
| `eval.min_ckpt` | Minimum checkpoint iteration to evaluate |
| `eval.max_ckpt` | Maximum checkpoint iteration to evaluate |
| `eval.samples_dir` | Subdirectory name for generated samples |
| `eval.save_images` | Save a visualization grid instead of computing FID |
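
Assuming these `eval.*` options accept the same `- key=value` command-line override syntax used for `model.*` overrides elsewhere in this guide, a quicker 10k-sample evaluation over a restricted checkpoint window might look like:

```bash
# Assumption: eval.* config options can be overridden on the command line,
# like the model.* overrides in the inference examples above.
# The checkpoint iteration bounds are placeholders.
torchrun --nproc_per_node=8 scripts/fid/compute_fid_from_ckpts.py \
    --config fastgen/configs/experiments/EDM/config_dmd2_cifar10.py \
    - eval.num_samples=10000 eval.min_ckpt=100000 eval.max_ckpt=200000
```

Note that FID computed on 10k samples is noisier than the standard 50k protocol, so use reduced sample counts only for quick checkpoint comparisons.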

### Reference Statistics

FID reference statistics are computed following the [EDM](https://github.com/NVlabs/edm) and [EDM2](https://github.com/NVlabs/edm2) repositories. Store them in `$DATA_ROOT_DIR/fid-refs/`:

| Dataset | Reference File |
|---------|----------------|
| CIFAR-10 | `fid-refs/cifar10-32x32.npz` |
| ImageNet-64 (EDM) | `fid-refs/imagenet-64x64.npz` |
| ImageNet-64 (EDM2) | `fid-refs/imagenet-64x64-edmv2.npz` |
| ImageNet-256 | `fid-refs/imagenet_256.pkl` |
| COCO-2014 | `fid-refs/coco2014_eval_30k.npz` |