# Standalone Source Map
Important files:
- `fastvideo/models/dits/wanvideo.py`
  - Defines `WanTransformerBlock_VSA`.
  - Adds `to_gate_compress`.
  - Passes `gate_compress` into self-attention.
- `fastvideo/attention/backends/sparse_fp4_compress_attn.py`
  - Backend name: `SPARSE_FP4_COMPRESS_ATTN`.
  - Sparse FP4 main branch plus a high-precision block-mean compress branch.
  - Prints `FASTVIDEO_BACKEND_CONFIRM: SPARSE_FP4_COMPRESS_ATTN is running`.
- `fastvideo/attention/backends/sparse_fp4_attn.py`
  - Base sparse FP4 attention without the compress branch.
- `fastvideo/layers/nvfp4_fake_quant_linear.py`
  - `NVFP4FakeQuantReplicatedLinear`.
  - Wan replacement helpers for normal fake-quant QAT and SVD-LoRA variants.
- `fastvideo/train/models/wan/wan.py`
  - Training-time switches for enabling the fake-quant linear and gate quantization.
- `fastvideo/pipelines/basic/wan/wan_dmd_pipeline.py`
  - Deployment/validation pipeline used by this checkpoint.
  - Constructs `FlowMatchEulerDiscreteScheduler` from `flow_shift`.
- `fastvideo/pipelines/stages/denoising.py`
  - Contains `DmdDenoisingStage`, the actual solver path for online validation.
  - Uses fixed DMD timesteps, predicts the clean video, and re-noises to the
    next timestep instead of running a normal UniPC/Euler `scheduler.step` loop.
- `fastvideo/models/schedulers/scheduling_flow_match_euler_discrete.py`
  - Flow-match scheduler implementation used for DMD conversion and re-noising.
- `fastvideo/platforms/interface.py`, `fastvideo/platforms/cuda.py`
  - Register the `SPARSE_FP4_ATTN` and `SPARSE_FP4_COMPRESS_ATTN` backends.
- `fastvideo-kernel/python/fastvideo_kernel/...`
  - Block-sparse attention and quantization kernel source used by the backend.
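The compress branch listed above (high-precision block-mean pooling gated by `to_gate_compress`) can be sketched roughly as follows. This is an illustrative numpy sketch, not the kernel: the block size, mean pooling, and per-token gating shown here are assumptions about how the branch works.

```python
import numpy as np

def block_mean_compress_attn(q, k, v, gate, block=64):
    """Sketch of a compress branch: keys/values are mean-pooled over
    fixed-size token blocks, full-precision attention runs against the
    short pooled sequence, and the result is modulated by a gate
    (the role played by `to_gate_compress` in the real block)."""
    seq_len, dim = k.shape
    n_blocks = seq_len // block
    # Pool K and V over blocks of `block` tokens (drop any remainder).
    k_c = k[: n_blocks * block].reshape(n_blocks, block, dim).mean(axis=1)
    v_c = v[: n_blocks * block].reshape(n_blocks, block, dim).mean(axis=1)
    # Standard softmax attention against the compressed sequence.
    scores = q @ k_c.T / np.sqrt(dim)
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Gate the compress-branch output before it is added to the main branch.
    return gate * (attn @ v_c)
```

In the actual backend this output would be summed with the sparse FP4 main branch; here it is returned on its own for clarity.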
The full source snapshot in `../repo_source/` is preferred when running the
included DCP export script. This directory is meant for quick inspection and
porting into another inference stack.
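For porting, the weight path of the fake-quant linear can be approximated like this. The 16-element block size, max-based scaling, and round-to-nearest grid snapping are assumptions for illustration; the real `NVFP4FakeQuantReplicatedLinear` may differ in scale format and rounding.

```python
import numpy as np

# Representable E2M1 (FP4) magnitudes used by the NVFP4 format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_nvfp4(w, block=16):
    """Fake-quantize a weight tensor: per block, scale values into the
    FP4 range, snap each to the nearest representable E2M1 magnitude,
    then rescale back to full precision (quantize-dequantize)."""
    orig_shape = w.shape
    w = w.reshape(-1, block)
    # One scale per block, mapping the block max onto the largest FP4 value.
    scale = np.abs(w).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)
    scaled = w / scale
    # Snap each value to the nearest signed FP4 grid point.
    cand = np.sign(scaled[..., None]) * FP4_GRID
    idx = np.abs(scaled[..., None] - cand).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return (q * scale).reshape(orig_shape)
```

A QAT forward pass would apply this to the weights and then run an ordinary matmul, so gradients flow through the dequantized values (straight-through estimator).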
Solver reproduction summary:
- Use `WanDMDPipeline`, not the vanilla diffusers scheduler in `model_index.json`.
- The copied `scheduler/scheduler_config.json` is base-model metadata
(`UniPCMultistepScheduler`, `solver_type=bh2`) and is not the solver used by
online validation.
- Matching solver: `DmdDenoisingStage` with `FlowMatchEulerDiscreteScheduler`,
`shift=8.0`, DMD timesteps `[1000, 757, 522]`, `num_inference_steps=3`,
`guidance_scale=6.0`, attention backend `SPARSE_FP4_COMPRESS_ATTN`, sparsity
`0.8`.
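Putting the summary together, the online-validation solver behaves roughly like the sketch below: predict the clean video at each fixed timestep, then re-noise it to the next timestep with fresh noise instead of integrating with `scheduler.step`. The sigma-shift formula follows the usual flow-match convention and is an assumption here, not a copy of `DmdDenoisingStage`.

```python
import numpy as np

def shifted_sigma(t, shift=8.0, num_train_timesteps=1000):
    # Map a timestep to a noise level with the flow-match "shift" applied.
    s = t / num_train_timesteps
    return shift * s / (1.0 + (shift - 1.0) * s)

def dmd_sample(model, latents, timesteps=(1000, 757, 522), shift=8.0, seed=0):
    """DMD-style sampling sketch: `model(x, t)` is assumed to predict the
    clean sample x0 directly. Each step re-noises x0 to the next fixed
    timestep; there is no Euler/UniPC step integration."""
    rng = np.random.default_rng(seed)
    x = latents
    clean = x
    for i, t in enumerate(timesteps):
        clean = model(x, t)  # predict the clean video x0
        if i + 1 < len(timesteps):
            sigma = shifted_sigma(timesteps[i + 1], shift)
            noise = rng.standard_normal(x.shape)
            x = (1.0 - sigma) * clean + sigma * noise  # re-noise to next t
    return clean
```

With `shift=8.0` the effective noise levels stay high for most of the schedule, which is why the three fixed timesteps `[1000, 757, 522]` suffice for this distilled model.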