## Use with Diffusers

Install the dependencies:

```bash
pip install -U diffusers transformers accelerate
```

Then load the model and generate:

```python
import torch
from diffusers import DiffusionPipeline

# Switch device_map to "mps" for Apple devices.
pipe = DiffusionPipeline.from_pretrained(
    "yitongl/sparse_quant_exp",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
## Backend snapshot for checkpoint-700
This directory is the code snapshot of the training backend used by `sfp4_v4_sparse09_hpo_on_ours_p_init2050_1n_interactive/checkpoint-700`.
Key runtime settings:
- `FASTVIDEO_ATTENTION_BACKEND=SPARSE_FP4_OURS_P_ATTN`
- `FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1`
- `VSA_SPARSITY=0.9`
- `VSA_INIT_SPARSITY=0.9`
- `VSA_WARMUP_STEPS=0`
- Tile size: 4 x 4 x 4 = 64 video tokens.
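The same settings can also be exported from Python before launching. This is only a convenience sketch and assumes fastvideo picks these variables up from the environment; the Slurm launch scripts under `scripts/training/` are how this run actually set them.

```python
import os

# Convenience sketch: mirror the checkpoint's runtime settings from Python.
# Assumes fastvideo reads these from the environment at launch time.
os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "SPARSE_FP4_OURS_P_ATTN"
os.environ["FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O"] = "1"
os.environ["VSA_SPARSITY"] = "0.9"       # target sparsity
os.environ["VSA_INIT_SPARSITY"] = "0.9"  # start at the target sparsity...
os.environ["VSA_WARMUP_STEPS"] = "0"     # ...so no warmup ramp is needed
```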
Training attention semantics:
- Video self-attention uses `SPARSE_FP4_OURS_P_ATTN`.
- Cross-attention is not quantized or sparse in this backend; it falls back to dense SDPA when `query_length != key_length`. `force_dense` paths also use dense SDPA.
- Q/K/V fake quantization uses FP4 with a straight-through estimator (STE) and no q/k mean subtraction (see the sketch after this list).
- Selected sparse tiles use group-local P quantization in the Triton kernel.
- Dropped VSA tiles use a tile-level `q_mean`/`k_mean` score plus `mean_v` compensation.
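For intuition, below is a minimal, hypothetical sketch of FP4-style fake quantization with an STE. The grid values, group size, and per-group amax scaling are illustrative assumptions; the checkpoint's actual kernels live in `fastvideo-kernel/python/fastvideo_kernel/triton_kernels/quant_utils.py` and `nvfp4_utils.py` and may differ in these details.

```python
import torch

# Illustrative only: FP4 (e2m1) fake quantization with a straight-through
# estimator. Grid, group size, and scaling are assumptions for the sketch,
# not the snapshot's exact kernel behavior.
_FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_fake_quant_ste(x: torch.Tensor, group_size: int = 16) -> torch.Tensor:
    """Snap groups of values onto the signed FP4 grid; gradients use STE.

    No q/k mean subtraction is applied, matching the semantics above.
    """
    assert x.numel() % group_size == 0
    grid = _FP4_GRID.to(device=x.device, dtype=x.dtype)
    g = x.reshape(-1, group_size)
    # Per-group scale so the largest magnitude maps to the top grid value (6).
    scale = g.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12) / grid[-1]
    normed = g / scale
    # Nearest-neighbor rounding of magnitudes onto the grid, sign restored.
    idx = (normed.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
    q = (grid[idx] * normed.sign() * scale).reshape(x.shape)
    # Straight-through estimator: quantized forward, identity backward.
    return x + (q - x).detach()
```

In the backend this kind of fake quantization is applied to Q, K, and V before the sparse attention kernel; only the numerics are sketched here.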
Important files:
- `fastvideo/attention/backends/sparse_fp4_ours_p_attn.py`: Python attention backend, Q/K/V fake quantization, top-k block map, tile mean setup (a schematic sketch of the block-map step follows this list).
- `fastvideo-kernel/python/fastvideo_kernel/block_sparse_attn_ours_p.py`: PyTorch custom op and autograd wrapper.
- `fastvideo-kernel/python/fastvideo_kernel/triton_kernels/block_sparse_attn_triton_ours_p.py`: Triton forward/backward kernel.
- `fastvideo-kernel/python/fastvideo_kernel/triton_kernels/nvfp4_utils.py`: FP4 quant/dequant utilities used by the kernel.
- `fastvideo-kernel/python/fastvideo_kernel/triton_kernels/quant_utils.py`: Q/K/V fake quant kernels.
- `fastvideo/attention/backends/video_sparse_attn.py`: VSA metadata and tile-size helper.
- `fastvideo/platforms/interface.py` and `fastvideo/platforms/cuda.py`: backend enum and CUDA backend selection wiring.
- `fastvideo/training/training_pipeline.py` and `fastvideo/training/wan_training_pipeline.py`: legacy SFT training path used by the launch script.
- `scripts/training/run_sparse_fp4_train_v4_1n_sparse09_hpo_on_ours_p_init2050_interactive.sh`: exact Slurm wrapper for this run.
- `scripts/training/run_sparse_fp4_train_v4_common.sh`: common SFT launch/resume script.
- `training_attention_settings.json`: structured attention/training settings for this checkpoint.
- `scripts/inference/run_sfp4_ours_p_checkpoint_700.sh`: inference example for the uploaded transformer checkpoint.
- `fastvideo/entrypoints/cli/generate.py`, `fastvideo/entrypoints/video_generator.py`, `fastvideo/pipelines/basic/wan/wan_pipeline.py`, and `fastvideo/pipelines/stages/denoising.py`: the `fastvideo generate` inference path used by the example script.
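To make the block-map step concrete, here is a minimal, hypothetical sketch of building a top-k tile map from tile-mean Q/K scores. The function name, tensor layout, and fixed keep ratio are assumptions; the real logic in `fastvideo/attention/backends/sparse_fp4_ours_p_attn.py` may differ.

```python
import torch

# Hypothetical sketch of top-k block-map construction from tile-mean scores.
def topk_block_map(q: torch.Tensor, k: torch.Tensor,
                   tile: int = 64, keep_ratio: float = 0.1) -> torch.Tensor:
    """q, k: [batch, heads, seq_len, head_dim], seq_len divisible by tile.

    Returns a boolean [batch, heads, n_tiles, n_tiles] map that is True for
    the query-tile/key-tile pairs kept by the sparse kernel.
    """
    B, H, S, D = q.shape
    n = S // tile
    # Mean-pool queries and keys within each tile (cf. q_mean/k_mean above).
    q_mean = q.reshape(B, H, n, tile, D).mean(dim=3)
    k_mean = k.reshape(B, H, n, tile, D).mean(dim=3)
    # Tile-level similarity scores between every query tile and key tile.
    scores = q_mean @ k_mean.transpose(-1, -2)  # [B, H, n, n]
    # Keep the highest-scoring key tiles for each query tile.
    k_keep = max(1, int(round(keep_ratio * n)))
    top = scores.topk(k_keep, dim=-1).indices
    block_map = torch.zeros_like(scores, dtype=torch.bool)
    block_map.scatter_(-1, top, True)
    return block_map
```

With `VSA_SPARSITY=0.9`, roughly 10% of key tiles survive per query tile, which is what `keep_ratio=0.1` models here.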
Example inference flow:
```bash
hf download yitongl/sparse_quant_exp \
  --repo-type model \
  --local-dir checkpoints/hf_download/sparse_quant_exp \
  --include 'transformer/*'

bash backend_snapshot/scripts/inference/run_sfp4_ours_p_checkpoint_700.sh
```
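The download can equivalently be done from Python with `huggingface_hub.snapshot_download`:

```python
from huggingface_hub import snapshot_download

# Python equivalent of the `hf download` invocation above: fetch only the
# transformer/ weights into the path the inference script expects.
snapshot_download(
    repo_id="yitongl/sparse_quant_exp",
    repo_type="model",
    local_dir="checkpoints/hf_download/sparse_quant_exp",
    allow_patterns=["transformer/*"],
)
```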
Source repo HEAD when staged: `3f818d0fc532ec6494b465967d5f485150917d0c`
Note: several backend files were uncommitted or locally modified when this snapshot was staged, so the files here are the authoritative copy for this checkpoint rather than the clean git commit alone.