# Standalone Source Map

Important files:

- `fastvideo/models/dits/wanvideo.py`
  - Defines `WanTransformerBlock_VSA`.
  - Adds `to_gate_compress`.
  - Passes `gate_compress` into self-attention.
- `fastvideo/attention/backends/sparse_fp4_compress_attn.py`
  - Backend name: `SPARSE_FP4_COMPRESS_ATTN`.
  - Sparse FP4 main branch plus high-precision block-mean compress branch.
  - Prints `FASTVIDEO_BACKEND_CONFIRM: SPARSE_FP4_COMPRESS_ATTN is running`.
- `fastvideo/attention/backends/sparse_fp4_attn.py`
  - Base sparse FP4 attention without compress branch.
- `fastvideo/layers/nvfp4_fake_quant_linear.py`
  - `NVFP4FakeQuantReplicatedLinear`.
  - Wan replacement helpers for normal fake-quant QAT and SVD-LoRA variants.
- `fastvideo/train/models/wan/wan.py`
  - Training-time switches for enabling fake-quant linear and gate quantization.
- `fastvideo/platforms/interface.py`, `fastvideo/platforms/cuda.py`
  - Registers `SPARSE_FP4_ATTN` and `SPARSE_FP4_COMPRESS_ATTN`.
- `fastvideo-kernel/python/fastvideo_kernel/...`
  - Block-sparse attention and quantization kernel source used by the backend.
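
Since the compress backend prints a fixed confirmation line, one quick way to check that a run actually selected it is to search the captured output for that string. The following is a minimal sketch, assuming the run's stdout/stderr has been saved as text. The helper name `backend_confirmed` and the sample log lines are hypothetical; only the marker string comes from the file list above.

```python
CONFIRM_PREFIX = "FASTVIDEO_BACKEND_CONFIRM:"


def backend_confirmed(log_text: str,
                      backend: str = "SPARSE_FP4_COMPRESS_ATTN") -> bool:
    """Return True if `log_text` contains the confirmation line for `backend`.

    Only the SPARSE_FP4_COMPRESS_ATTN line is documented in the source map;
    passing another backend name assumes it prints the same pattern.
    """
    expected = f"{CONFIRM_PREFIX} {backend} is running"
    return any(expected in line for line in log_text.splitlines())


# Hypothetical captured output; only the marker line mirrors the source map.
sample_log = "\n".join([
    "loading checkpoint ...",
    "FASTVIDEO_BACKEND_CONFIRM: SPARSE_FP4_COMPRESS_ATTN is running",
    "step 0 done",
])
print(backend_confirmed(sample_log))
```

A check like this can sit at the end of a smoke test so that a silent fallback to a different attention backend is caught early.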

The full source snapshot in `../repo_source/` is preferred when running the
included DCP export script. This directory is meant for quick inspection and
porting into another inference stack.

## 2026-05-07 Fake Attention Fix

`../FAKE_ATTENTION_V_QUANT_FIX.md` documents the fake sparse-FP4 attention
update that aligns fake V quantization with the current real SA3 Vt/PV kernel:
Q/K still quantize across `D`, while V now uses token-axis per-16 scale groups
and is stored back in the original V layout.
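
The difference between the two grouping axes can be illustrated with a small NumPy sketch. This is not the repository's implementation: it assumes an FP4 E2M1 value grid, keeps the per-group scales in full precision (a simplification), and uses a hypothetical helper name `fake_quant_fp4` with made-up tensor sizes. It only shows what "per-16 scale groups along `D`" versus "per-16 scale groups along the token axis" means for a `[tokens, D]` tensor, with the output returned in the input's layout.

```python
import numpy as np

# Magnitudes representable by an FP4 E2M1 value (assumed format).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GROUP = 16  # per-16 scale groups


def fake_quant_fp4(x: np.ndarray, axis: int, group: int = GROUP) -> np.ndarray:
    """Quantize-dequantize `x` with one scale per `group` elements along `axis`.

    The result has the same shape and layout as `x`.
    """
    moved = np.moveaxis(x, axis, -1)
    if moved.shape[-1] % group != 0:
        raise ValueError("axis length must be a multiple of the group size")
    g = moved.reshape(*moved.shape[:-1], -1, group)
    # One scale per group, chosen so the group's max magnitude maps to 6.0.
    amax = np.abs(g).max(axis=-1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = g / scale
    # Snap each magnitude to the nearest grid point (ties go to the smaller
    # magnitude in this sketch), then restore the sign and the scale.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    deq = np.sign(scaled) * E2M1_GRID[idx] * scale
    # Undo the axis move so the output matches the input layout.
    return np.moveaxis(deq.reshape(moved.shape), -1, axis)


rng = np.random.default_rng(0)
tokens, head_dim = 32, 64  # hypothetical sizes, both multiples of 16
q = rng.standard_normal((tokens, head_dim))
v = rng.standard_normal((tokens, head_dim))

q_fq = fake_quant_fp4(q, axis=-1)  # Q/K: groups run across D
v_fq = fake_quant_fp4(v, axis=0)   # V: groups run along the token axis
print(q_fq.shape, v_fq.shape)
```

In this sketch the V path groups 16 consecutive tokens per channel rather than 16 consecutive channels per token, so the two calls generally produce different rounding errors on the same data even though both return a `[tokens, D]` array.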