# Backend snapshot for checkpoint-700

This directory is the code snapshot for the training backend used by:

`sfp4_v4_sparse09_hpo_on_ours_p_init2050_1n_interactive/checkpoint-700`

Key runtime settings:

- `FASTVIDEO_ATTENTION_BACKEND=SPARSE_FP4_OURS_P_ATTN`
- `FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1`
- `VSA_SPARSITY=0.9`
- `VSA_INIT_SPARSITY=0.9`
- `VSA_WARMUP_STEPS=0`
- tile size: `4 x 4 x 4 = 64` video tokens
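
As a rough illustration of what these settings imply, the sketch below applies the listed values as environment defaults and works out how many tiles a latent grid splits into. The helper names (`apply_settings`, `tile_budget`) are hypothetical, and two points are assumptions rather than facts taken from the backend: that partial tiles are padded up to a full tile, and that the number of key tiles kept per query tile is `ceil((1 - sparsity) * num_tiles)`.

```python
import math
import os

# Values copied from the list above; the helpers below are illustrative only.
SETTINGS = {
    "FASTVIDEO_ATTENTION_BACKEND": "SPARSE_FP4_OURS_P_ATTN",
    "FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O": "1",
    "VSA_SPARSITY": "0.9",
    "VSA_INIT_SPARSITY": "0.9",
    "VSA_WARMUP_STEPS": "0",
}
TILE = (4, 4, 4)  # 4 x 4 x 4 = 64 video tokens per tile


def apply_settings(env=os.environ):
    """Set each variable unless the caller has already exported it."""
    for name, value in SETTINGS.items():
        env.setdefault(name, value)
    return env


def tile_budget(latent_shape, sparsity, tile=TILE):
    """Return (num_tiles, kept_tiles) for a (T, H, W) latent grid.

    Assumptions (not confirmed by the snapshot): partial tiles are padded up
    to a full tile, and kept_tiles = ceil((1 - sparsity) * num_tiles).
    """
    num_tiles = math.prod(math.ceil(d / t) for d, t in zip(latent_shape, tile))
    kept_tiles = max(1, math.ceil((1.0 - sparsity) * num_tiles))
    return num_tiles, kept_tiles
```

Under those assumptions, a hypothetical `(8, 16, 16)` latent grid splits into 32 tiles, of which 4 key tiles would be kept per query tile at `VSA_SPARSITY=0.9`.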

Training attention semantics:

- Video self-attention uses `SPARSE_FP4_OURS_P_ATTN`.
- Cross-attention is neither quantized nor sparsified in this backend:
  whenever `query_length != key_length`, it falls back to dense SDPA.
- `force_dense` paths also use dense SDPA.
- Q/K/V fake quantization uses FP4 with a straight-through estimator (STE) and
  no q/k mean subtraction.
- Selected sparse tiles use group-local P quantization in the Triton kernel.
- Dropped VSA tiles use a tile-level `q_mean`/`k_mean` score plus `mean_v`
  compensation.
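
The dense-versus-sparse routing described in the list above can be summarized in a few lines. This is a minimal sketch of the stated rules only, not the backend's actual code: the function name and the `"dense_sdpa"` label are hypothetical.

```python
def select_attention_path(query_length, key_length, force_dense=False):
    """Return which attention path would handle a call, per the rules above."""
    if force_dense:
        # Explicit dense requests bypass the sparse FP4 kernel.
        return "dense_sdpa"
    if query_length != key_length:
        # Cross-attention: query and key lengths differ, so stay dense.
        return "dense_sdpa"
    # Video self-attention: equal lengths, sparse FP4 path.
    return "SPARSE_FP4_OURS_P_ATTN"
```

Keeping the length check separate from `force_dense` makes it clear that cross-attention is routed to dense SDPA by shape alone, without needing a flag.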

Important files:

- `fastvideo/attention/backends/sparse_fp4_ours_p_attn.py`: Python attention backend, Q/K/V fake quantization, top-k block map, tile mean setup.
- `fastvideo-kernel/python/fastvideo_kernel/block_sparse_attn_ours_p.py`: PyTorch custom op and autograd wrapper.
- `fastvideo-kernel/python/fastvideo_kernel/triton_kernels/block_sparse_attn_triton_ours_p.py`: Triton forward/backward kernel.
- `fastvideo-kernel/python/fastvideo_kernel/triton_kernels/nvfp4_utils.py`: FP4 quant/dequant utilities used by the kernel.
- `fastvideo-kernel/python/fastvideo_kernel/triton_kernels/quant_utils.py`: Q/K/V fake quant kernels.
- `fastvideo/attention/backends/video_sparse_attn.py`: VSA metadata and tile-size helper.
- `fastvideo/platforms/interface.py` and `fastvideo/platforms/cuda.py`: backend enum and CUDA backend selection wiring.
- `fastvideo/training/training_pipeline.py` and `fastvideo/training/wan_training_pipeline.py`: legacy SFT training path used by the launch script.
- `scripts/training/run_sparse_fp4_train_v4_1n_sparse09_hpo_on_ours_p_init2050_interactive.sh`: exact Slurm wrapper for this run.
- `scripts/training/run_sparse_fp4_train_v4_common.sh`: common SFT launch/resume script.
- `training_attention_settings.json`: structured attention/training settings
  for this checkpoint.
- `scripts/inference/run_sfp4_ours_p_checkpoint_700.sh`: inference example
  for the uploaded transformer checkpoint.
- `fastvideo/entrypoints/cli/generate.py`, `fastvideo/entrypoints/video_generator.py`,
  `fastvideo/pipelines/basic/wan/wan_pipeline.py`, and
  `fastvideo/pipelines/stages/denoising.py`: `fastvideo generate` inference
  path used by the example script.
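
To sanity-check a downloaded copy of this snapshot, a small script along these lines may help. It is a sketch under two assumptions: that `training_attention_settings.json` sits at the snapshot root, and that the paths above are relative to that root. The helper names are hypothetical, and no particular JSON schema is assumed.

```python
import json
from pathlib import Path


def load_attention_settings(snapshot_dir):
    """Load training_attention_settings.json from the snapshot root."""
    path = Path(snapshot_dir) / "training_attention_settings.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def missing_files(snapshot_dir, relative_paths):
    """Return the listed relative paths that are absent from the snapshot."""
    root = Path(snapshot_dir)
    return [p for p in relative_paths if not (root / p).is_file()]
```

For example, `missing_files("backend_snapshot", ["fastvideo/attention/backends/sparse_fp4_ours_p_attn.py"])` returns an empty list when that file is present.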

Example inference flow:

```bash
hf download yitongl/sparse_quant_exp \
  --repo-type model \
  --local-dir checkpoints/hf_download/sparse_quant_exp \
  --include 'transformer/*'

bash backend_snapshot/scripts/inference/run_sfp4_ours_p_checkpoint_700.sh
```

Source repo HEAD when staged:

`3f818d0fc532ec6494b465967d5f485150917d0c`

Note: several backend files were uncommitted or locally modified when this
snapshot was staged, so the files here, not the clean git commit alone, are
the authoritative copy for this checkpoint.
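
One way to see which files diverge from the commit is to compare this directory against a clean checkout of that HEAD. The sketch below assumes such a checkout already exists locally and mirrors the snapshot's relative paths. The function name is hypothetical.

```python
import filecmp
from pathlib import Path


def files_differing_from_checkout(snapshot_dir, checkout_dir):
    """List snapshot files that are missing from, or differ from, a checkout.

    `checkout_dir` is assumed to be a clean working tree at the HEAD above.
    """
    snapshot, checkout = Path(snapshot_dir), Path(checkout_dir)
    changed = []
    for path in sorted(snapshot.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(snapshot)
        other = checkout / rel
        # shallow=False compares file contents, not just size and mtime.
        if not other.is_file() or not filecmp.cmp(path, other, shallow=False):
            changed.append(rel.as_posix())
    return changed
```

Any path this returns is one where the snapshot copy should be preferred over the committed version.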