Standalone Source Map
Important files:
- `fastvideo/models/dits/wanvideo.py`
  - Defines `WanTransformerBlock_VSA`.
  - Adds `to_gate_compress`.
  - Passes `gate_compress` into self-attention.
- `fastvideo/attention/backends/sparse_fp4_compress_attn.py`
  - Backend name: `SPARSE_FP4_COMPRESS_ATTN`.
  - Sparse FP4 main branch plus a high-precision block-mean compress branch.
  - Prints `FASTVIDEO_BACKEND_CONFIRM: SPARSE_FP4_COMPRESS_ATTN is running`.
- `fastvideo/attention/backends/sparse_fp4_attn.py`
  - Base sparse FP4 attention without the compress branch.
- `fastvideo/layers/nvfp4_fake_quant_linear.py`
  - `NVFP4FakeQuantReplicatedLinear`.
  - Wan replacement helpers for the normal fake-quant QAT and SVD-LoRA variants.
- `fastvideo/train/models/wan/wan.py`
  - Training-time switches for enabling fake-quant linear layers and gate quantization.
- `fastvideo/platforms/interface.py`, `fastvideo/platforms/cuda.py`
  - Registers `SPARSE_FP4_ATTN` and `SPARSE_FP4_COMPRESS_ATTN`.
- `fastvideo-kernel/python/fastvideo_kernel/...`
  - Block-sparse attention and quantization kernel source used by the backend.
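The list above says the compress backend pairs a sparse FP4 main branch with a high-precision block-mean compress branch, and that the Wan block passes a `gate_compress` tensor (from `to_gate_compress`) into self-attention. The NumPy sketch below shows one plausible way such a gated combination could work; it is not the backend's code. The additive form `main + gate_compress * compress`, the elementwise `[T, D]` gate shape, the block size, and every function name here are assumptions, and the sparse FP4 main branch is replaced by plain dense attention for readability.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    # Plain scaled dot-product attention for one head: q [Tq, D], k/v [Tk, D].
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v


def block_mean(x, block):
    # Mean-pool contiguous groups of `block` tokens: [T, D] -> [T // block, D].
    t, d = x.shape
    assert t % block == 0, "sketch assumes T is a multiple of the block size"
    return x.reshape(t // block, block, d).mean(axis=1)


def gated_compress_attention(q, k, v, gate_compress, main_branch=dense_attention, block=4):
    # Hypothetical compress branch: full-precision attention against block-mean keys/values.
    compress_out = dense_attention(q, block_mean(k, block), block_mean(v, block))
    # Hypothetical combination: main-branch output plus the gated compress output.
    # `main_branch` stands in for the sparse FP4 attention of the real backend.
    return main_branch(q, k, v) + gate_compress * compress_out


# Usage: random tensors, with a random array standing in for to_gate_compress(hidden_states).
rng = np.random.default_rng(0)
T, D = 16, 8
q, k, v = (rng.standard_normal((T, D)) for _ in range(3))
gate = rng.standard_normal((T, D))
out = gated_compress_attention(q, k, v, gate, block=4)
print(out.shape)  # (16, 8)
```

With a zero gate the sketch reduces to the main branch alone, which is one reason a learned gate is a convenient way to bolt a coarse branch onto an existing attention layer.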
The full source snapshot in `../repo_source/` is preferred when running the
included DCP export script. This directory is meant for quick inspection and
for porting into another inference stack.
2026-05-07 Fake Attention Fix
`../FAKE_ATTENTION_V_QUANT_FIX.md` documents the fake sparse-FP4 attention
update that aligns fake V quantization with the current real SA3 Vt/PV kernel:
Q/K are still quantized across D, while V now uses per-16 scale groups along
the token axis and is stored back in the original V layout.
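The two scale-group layouts described above can be illustrated with a minimal fake (quantize-dequantize) FP4 pass in NumPy. This is a sketch assuming the standard NVFP4 recipe of FP4 E2M1 element values with one scale per 16 elements; it is not the code in `nvfp4_fake_quant_linear.py` or the attention backends. Simplifications: the per-group scale stays in float instead of being rounded to FP8 E4M3, there is no per-tensor scale, and ties round toward the smaller magnitude rather than to even. `fake_quant_fp4` and `fake_quant_qkv` are hypothetical names.

```python
import numpy as np

# Non-negative magnitudes representable by FP4 E2M1 (the NVFP4 element format).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_quant_fp4(x, axis=-1, group=16):
    """Quantize-dequantize x with one scale per `group` consecutive elements along `axis`.

    Simplified sketch: float scales (no FP8 E4M3 rounding), no per-tensor scale,
    ties resolved toward the smaller grid magnitude.
    """
    x_moved = np.moveaxis(x, axis, -1)  # put the grouping axis last
    n = x_moved.shape[-1]
    assert n % group == 0, "sketch assumes the grouped axis is a multiple of `group`"
    g = x_moved.reshape(*x_moved.shape[:-1], n // group, group)
    amax = np.abs(g).max(axis=-1, keepdims=True)
    # Map each group's largest magnitude onto the largest E2M1 value (6.0).
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    mag = np.abs(g) / scale
    # Round every scaled magnitude to the nearest grid point (argmin picks the smaller on ties).
    idx = np.abs(mag[..., None] - E2M1_GRID).argmin(axis=-1)
    deq = np.sign(g) * E2M1_GRID[idx] * scale
    # Undo the grouping and move the axis back, so the output keeps x's layout.
    return np.moveaxis(deq.reshape(x_moved.shape), -1, axis)


def fake_quant_qkv(q, k, v, group=16):
    # q, k, v: [tokens, D] for a single head.
    q_fq = fake_quant_fp4(q, axis=-1, group=group)  # Q: groups across D
    k_fq = fake_quant_fp4(k, axis=-1, group=group)  # K: groups across D
    v_fq = fake_quant_fp4(v, axis=-2, group=group)  # V: groups across tokens, layout unchanged
    return q_fq, k_fq, v_fq
```

One way to read the V change: block-scaled FP4 matrix multiplies attach scales along the contraction axis, which is D for Q·K^T but the token axis for P·V. Grouping V along tokens through the moved-axis view and moving the axis back keeps the stored tensor in its original `[tokens, D]` layout, matching the note above.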