Update README.md
README.md
CHANGED
@@ -44,18 +44,23 @@ fastvideo generate \
     --sp-size $num_gpus \
     --tp-size 1 \
     --num-gpus $num_gpus \
-    --
-    --
-    --
+    --dit-cpu-offload False \
+    --vae-cpu-offload False \
+    --text-encoder-cpu-offload True \
+    --pin-cpu-memory False \
+    --height 448 \
+    --width 832 \
+    --num-frames 77 \
     --num-inference-steps 50 \
     --fps 16 \
-    --guidance-scale
+    --guidance-scale 5.0 \
     --flow-shift 5.0 \
     --VSA-sparsity 0.9 \
-    --prompt
+    --prompt-txt assets/prompt.txt \
     --negative-prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
     --seed 1024 \
-    --output-path VSA-
+    --output-path outputs_Wan-VSA-14B/ \
+    --enable_torch_compile
 ```
 - Try it out on **FastVideo** — we support a wide range of GPUs from **H100** to **4090**
 - We use [FastVideo 720P Synthetic Wan dataset](https://huggingface.co/datasets/FastVideo/Wan-Syn_77x768x1280_250k) for training.
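
The updated command reads its prompt from `assets/prompt.txt` via `--prompt-txt` instead of taking it inline. The sketch below prepares such a file and checks the clip length implied by `--num-frames 77` and `--fps 16`. It assumes the file holds one prompt per line, which this diff does not confirm. The prompt text is a placeholder, and the script does not invoke `fastvideo` itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: --prompt-txt expects one prompt per line (not confirmed by the diff).
mkdir -p assets
printf '%s\n' "A placeholder prompt describing the video to generate." > assets/prompt.txt

# Values taken from the updated command.
num_frames=77
fps=16

# Clip length in milliseconds, using integer arithmetic (the fraction is truncated).
duration_ms=$(( num_frames * 1000 / fps ))

# Count the prompts in the file; tr strips the padding some wc builds emit.
prompt_count=$(wc -l < assets/prompt.txt | tr -d ' ')

echo "prompts: ${prompt_count}, clip length: ${duration_ms} ms"
```

At 16 fps, 77 frames come to roughly 4.8 seconds of video per prompt.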