# Standalone Inference Helper
This folder contains a portable inference helper for:

`sfp4_v4_sparse09_hpo_on_ours_p_init2050_1n_interactive/checkpoint-700`

It is not a full vendored copy of Wan or FastVideo. It contains the sparse FP4 backend overlay and a runner that can be applied to a FastVideo checkout or installation so that the uploaded checkpoint can be used for normal inference.
## Contents

- `run_inference.py`: downloads/loads `transformer/diffusion_pytorch_model.safetensors` from `yitongl/sparse_quant_expand` and runs `VideoGenerator`.
- `run.sh`: convenience wrapper that installs the overlay into `FASTVIDEO_ROOT` and then runs `run_inference.py`.
- `install_overlay.py`: copies the bundled sparse FP4 backend files into a FastVideo checkout/install.
- `overlay_files/`: the exact runtime source files needed by `SPARSE_FP4_OURS_P_ATTN`.
- `training_attention_settings.json`: structured settings for the uploaded checkpoint.
## Expected Environment

- A working FastVideo Python environment.
- FastVideo dependencies installed, including PyTorch, Triton, safetensors, and Hugging Face Hub.
- Access to the base model `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`.
- A CUDA GPU supported by the custom Triton kernels.
## Usage

From a machine with this HF repo downloaded:

```bash
export FASTVIDEO_ROOT=/path/to/FastVideo
bash standalone_inference/run.sh \
  --output-path outputs/sfp4_checkpoint_700 \
  --seed 1000
```
The script sets:

```bash
FASTVIDEO_ATTENTION_BACKEND=SPARSE_FP4_OURS_P_ATTN
FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1
```

and downloads the uploaded checkpoint-700 transformer weights unless `--weights` is provided.
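If you drive FastVideo from your own Python script rather than through `run.sh`, the same backend selection can be made by setting these variables in the process environment (a minimal sketch; the variable names and values come from what `run.sh` sets, and the assumption is that they should be set before FastVideo initializes its attention backend):

```python
import os

# Select the sparse FP4 attention backend and high-precision output path.
# Set these before FastVideo reads its configuration, i.e. before import.
os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "SPARSE_FP4_OURS_P_ATTN"
os.environ["FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O"] = "1"
```

This is exactly what `run.sh` does for you, so the wrapper is the simpler path for ordinary use.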
To use a local safetensors file:

```bash
export FASTVIDEO_ROOT=/path/to/FastVideo
bash standalone_inference/run.sh \
  --weights /path/to/diffusion_pytorch_model.safetensors \
  --prompt "your prompt"
```
## Attention Semantics

- Self-attention uses `SPARSE_FP4_OURS_P_ATTN`.
- Q/K/V use FP4 fake quantization with STE.
- The VSA tile size is 4 x 4 x 4 = 64 tokens.
- Selected sparse tiles use group-local P quantization in the Triton kernel.
- Dropped tiles use tile-mean compensation.
- Cross-attention falls back to dense SDPA and is not sparse/FP4.
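To make "FP4 fake quantization" concrete, here is an illustrative NumPy sketch of the idea: values are scaled so the largest magnitude maps onto the top of the E2M1 (FP4) grid, snapped to the nearest representable magnitude, and rescaled. This is not the bundled Triton kernel, and the per-tensor scale is an assumption made for illustration; the real backend applies its quantization (with STE for gradients) inside the attention kernel.

```python
import numpy as np

# Non-negative magnitudes representable in the E2M1 (FP4) format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_fake_quant(x: np.ndarray) -> np.ndarray:
    """Fake-quantize x to FP4: scale so max |x| maps to 6.0, snap each
    magnitude to the nearest FP4 grid value, then rescale.

    In training, the backward pass would use a straight-through
    estimator (STE): gradients flow as if this were the identity.
    """
    scale = np.max(np.abs(x)) / FP4_GRID[-1]
    if scale == 0:
        return x.copy()
    mags = np.abs(x) / scale
    # Nearest-neighbor rounding onto the FP4 magnitude grid.
    idx = np.argmin(np.abs(mags[..., None] - FP4_GRID), axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale
```

Values that already sit on the scaled grid pass through unchanged, e.g. `fp4_fake_quant(np.array([0.5, -3.0, 6.0]))` returns the same array.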
## Checkpoint

The current HF `main` transformer file is from checkpoint-700:

`transformer/diffusion_pytorch_model.safetensors`

Local SHA256 recorded when preparing this helper:

`4595ca81ea7085c15ccf14b738aa9c0fdf2d2786641f49b55e0bc0e99bf042d2`
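To confirm that a local download matches the file this helper was prepared against, the digest can be checked with a few lines of stdlib Python (the expected hash is the one listed above; the file path is wherever your download landed):

```python
import hashlib

# SHA256 recorded above for checkpoint-700's transformer weights.
EXPECTED_SHA256 = "4595ca81ea7085c15ccf14b738aa9c0fdf2d2786641f49b55e0bc0e99bf042d2"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB safetensors fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Usage: `sha256_of("transformer/diffusion_pytorch_model.safetensors") == EXPECTED_SHA256` should be `True` for an intact download.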