# Standalone Inference Helper
This folder contains a portable inference helper for:
`sfp4_v4_sparse09_hpo_on_ours_p_init2050_1n_interactive/checkpoint-700`
This is not a full vendored copy of Wan or FastVideo. It contains only the
sparse FP4 backend overlay and a runner; once the overlay is applied to a
FastVideo checkout or installation, the uploaded checkpoint can be used for
normal inference.
## Contents
- `run_inference.py`: downloads/loads `transformer/diffusion_pytorch_model.safetensors` from `yitongl/sparse_quant_exp` and runs `VideoGenerator`.
- `run.sh`: convenience wrapper that installs the overlay into `FASTVIDEO_ROOT` and then runs `run_inference.py`.
- `install_overlay.py`: copies the bundled sparse FP4 backend files into a FastVideo checkout/install.
- `overlay_files/`: exact runtime source files needed by `SPARSE_FP4_OURS_P_ATTN`.
- `training_attention_settings.json`: structured settings for the uploaded checkpoint.
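As a rough illustration of what `install_overlay.py` does, an overlay installer just mirrors files from `overlay_files/` into the matching paths under the FastVideo tree. The function below is a hedged sketch, not the actual script; the exact file list and destination layout inside FastVideo are assumptions.

```python
import shutil
from pathlib import Path

def install_overlay(overlay_dir: str, fastvideo_root: str) -> list:
    """Copy every file under overlay_dir into the matching relative path
    under fastvideo_root, creating directories as needed.
    Returns the list of copied relative paths (as strings)."""
    src = Path(overlay_dir)
    dst = Path(fastvideo_root)
    copied = []
    for f in sorted(src.rglob("*")):
        if f.is_file():
            rel = f.relative_to(src)
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # preserves mtime/permissions
            copied.append(str(rel))
    return copied
```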
## Expected Environment
- A working FastVideo Python environment.
- FastVideo dependencies installed, including PyTorch, Triton, safetensors, and
Hugging Face Hub.
- Access to the base model `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`.
- A CUDA GPU supported by the custom Triton kernels.
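A quick preflight check for the Python-side prerequisites can save a failed run. This is an illustrative sketch (not part of the helper); the import names are the standard ones for these packages, and GPU/kernel support still has to be verified separately.

```python
import importlib.util

# Import names assumed for the dependencies listed above.
REQUIRED = ["torch", "triton", "safetensors", "huggingface_hub"]

def check_environment(packages=REQUIRED) -> dict:
    """Report which of the required packages are importable."""
    return {name: importlib.util.find_spec(name) is not None
            for name in packages}

missing = [n for n, ok in check_environment().items() if not ok]
if missing:
    print("Missing packages:", ", ".join(missing))
```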
## Usage
From a machine with this HF repo downloaded:
```bash
export FASTVIDEO_ROOT=/path/to/FastVideo
bash standalone_inference/run.sh \
  --output-path outputs/sfp4_checkpoint_700 \
  --seed 1000
```
The script sets:
```bash
FASTVIDEO_ATTENTION_BACKEND=SPARSE_FP4_OURS_P_ATTN
FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1
```
and downloads the uploaded checkpoint-700 transformer weights unless `--weights`
is provided.
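When driving `run_inference.py` from Python instead of `run.sh`, the same two environment variables need to be set before FastVideo is imported in the child process. A minimal sketch (only the two variable names and values come from this README; the launch command is an assumption):

```python
import os
import subprocess

def sparse_fp4_env(base=None) -> dict:
    """Return a copy of the environment with the backend settings
    that run.sh would export."""
    env = dict(base if base is not None else os.environ)
    env["FASTVIDEO_ATTENTION_BACKEND"] = "SPARSE_FP4_OURS_P_ATTN"
    env["FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O"] = "1"
    return env

# Launch the runner in a subprocess so the variables are visible
# to FastVideo at import time (command path is assumed):
# subprocess.run(["python", "standalone_inference/run_inference.py"],
#                env=sparse_fp4_env(), check=True)
```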
To use a local safetensors file:
```bash
export FASTVIDEO_ROOT=/path/to/FastVideo
bash standalone_inference/run.sh \
  --weights /path/to/diffusion_pytorch_model.safetensors \
  --prompt "your prompt"
```
## Attention Semantics
- Self-attention uses `SPARSE_FP4_OURS_P_ATTN`.
- Q/K/V use FP4 fake quantization with STE.
- VSA tile size is `4 x 4 x 4 = 64` tokens.
- Selected sparse tiles use group-local P quantization in the Triton kernel.
- Dropped tiles use tile mean compensation.
- Cross-attention falls back to dense SDPA and is not sparse/FP4.
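To make the Q/K/V step concrete: FP4 (E2M1) fake quantization scales each tensor, rounds every magnitude to the nearest representable FP4 value, and scales back, while STE makes the backward pass treat the rounding as identity during training. The sketch below uses per-tensor absmax scaling and the standard E2M1 grid; the Triton kernel's actual grouping and scaling scheme may differ.

```python
# Representable magnitudes of the FP4 E2M1 format.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fp4_fake_quant(values):
    """Fake-quantize a list of floats: scale so the absmax maps to 6.0
    (the largest FP4 magnitude), snap each magnitude to the nearest
    grid point, then rescale. With STE, gradients would pass through
    this op unchanged."""
    scale = max(abs(v) for v in values) / 6.0
    if scale == 0.0:
        return list(values)
    out = []
    for v in values:
        mag = abs(v) / scale
        nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append((1.0 if v >= 0 else -1.0) * nearest * scale)
    return out
```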
## Checkpoint
The current HF `main` transformer file is checkpoint-700:
`transformer/diffusion_pytorch_model.safetensors`
Local SHA256 used when preparing this helper:
`4595ca81ea7085c15ccf14b738aa9c0fdf2d2786641f49b55e0bc0e99bf042d2`
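To confirm a downloaded weights file matches this hash, a standard streaming integrity check with `hashlib` suffices (nothing repo-specific here):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "4595ca81ea7085c15ccf14b738aa9c0fdf2d2786641f49b55e0bc0e99bf042d2"
# assert sha256_of("transformer/diffusion_pytorch_model.safetensors") == EXPECTED
```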