Add `library_name` and sample usage

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +71 -7
README.md CHANGED
@@ -1,13 +1,14 @@
  ---
- license: apache-2.0
  base_model: Wan-AI/Wan2.1-T2V-1.3B
  tags:
  - text-to-video
  - diffusion
  - video-generation
  - turbodiffusion
  - wan2.1
- pipeline_tag: text-to-video
  ---

  <p align="center">
@@ -16,14 +17,77 @@ pipeline_tag: text-to-video

  # TurboWan2.1-T2V-1.3B-480P

- - This HuggingFace repo contains the `TurboWan2.1-T2V-1.3B-480P` model.

- - For RTX 5090, RTX 4090, or similar GPUs, please use the `TurboWan2.1-T2V-1.3B-480P-quant`. For other GPUs with a bigger GPU memory than 40GB, we recommend using `TurboWan2.1-T2V-1.3B-480P`.

- - For usage instructions, please see **https://github.com/thu-ml/TurboDiffusion**

- - Paper: [TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times](https://arxiv.org/pdf/2512.16093)


  ## Citation
  ```
@@ -81,4 +145,4 @@ pipeline_tag: text-to-video
  journal={arXiv preprint arXiv:2505.11594},
  year={2025}
  }
- ```
 
  ---
  base_model: Wan-AI/Wan2.1-T2V-1.3B
+ license: apache-2.0
+ pipeline_tag: text-to-video
  tags:
  - text-to-video
  - diffusion
  - video-generation
  - turbodiffusion
  - wan2.1
+ library_name: diffusers
  ---

  <p align="center">


  # TurboWan2.1-T2V-1.3B-480P

+ This Hugging Face repo contains the `TurboWan2.1-T2V-1.3B-480P` model, as presented in the paper [TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times](https://arxiv.org/pdf/2512.16093).
+
+ For RTX 5090, RTX 4090, or similar GPUs, please use the quantized checkpoint `TurboWan2.1-T2V-1.3B-480P-quant`. For GPUs with more than 40 GB of memory, we recommend the unquantized `TurboWan2.1-T2V-1.3B-480P`.
+
+ For more detailed usage instructions and the full codebase, please see the [TurboDiffusion GitHub repository](https://github.com/thu-ml/TurboDiffusion).

+ ## Sample Usage

+ For GPUs with more than 40 GB of memory (e.g., H100), **we recommend using the unquantized checkpoint (without `-quant`) and removing `--quant_linear` from the command.**

+ 1. Download the Wan2.1 VAE (**applicable for both Wan2.1 and Wan2.2**) and umT5 text encoder checkpoints from the official [Wan2.1](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) repository on Hugging Face:

+ ```bash
+ mkdir -p checkpoints
+ cd checkpoints
+ wget https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/Wan2.1_VAE.pth
+ wget https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
+ ```
+
+ 2. Download our finetuned checkpoint into the same `checkpoints` directory:
+ ```bash
+ wget https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P/resolve/main/TurboWan2.1-T2V-1.3B-480P.pth
+ ```
+
+ For RTX 5090, RTX 4090, or similar GPUs, please use the quantized checkpoint instead:
+
+ ```bash
+ wget https://huggingface.co/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P/resolve/main/TurboWan2.1-T2V-1.3B-480P-quant.pth
+ ```
+
+ If you also plan to run the Wan2.2-I2V model (not needed for this T2V checkpoint), download both its high-noise and low-noise checkpoints:
+ ```bash
+ wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-high-720P.pth
+ wget https://huggingface.co/TurboDiffusion/TurboWan2.2-I2V-A14B-720P/resolve/main/TurboWan2.2-I2V-A14B-low-720P.pth
+ ```
+
+ 3. Return to the TurboDiffusion repository root (the directory containing `checkpoints/`, e.g. via `cd ..`) and run the inference script for the **T2V** model:
+ ```bash
+ export PYTHONPATH=turbodiffusion
+
+ # Arguments:
+ # --dit_path           Path to the finetuned TurboDiffusion checkpoint
+ # --model              Model to use: Wan2.1-1.3B or Wan2.1-14B (default: Wan2.1-1.3B)
+ # --num_samples        Number of videos to generate (default: 1)
+ # --num_steps          Sampling steps, 1–4 (default: 4)
+ # --sigma_max          Initial sigma for rCM (default: 80); larger values (e.g., 1600) reduce diversity but may improve quality
+ # --vae_path           Path to the Wan2.1 VAE (default: checkpoints/Wan2.1_VAE.pth)
+ # --text_encoder_path  Path to the umT5 text encoder (default: checkpoints/models_t5_umt5-xxl-enc-bf16.pth)
+ # --num_frames         Number of frames to generate (default: 81)
+ # --prompt             Text prompt for video generation
+ # --resolution         Output resolution: "480p" or "720p" (default: 480p)
+ # --aspect_ratio       Aspect ratio in W:H format (default: 16:9)
+ # --seed               Random seed for reproducibility (default: 0)
+ # --save_path          Output file path including extension (default: output/generated_video.mp4)
+ # --attention_type     Attention module to use: original, sla, or sagesla (default: sagesla)
+ # --sla_topk           Top-k ratio for SLA/SageSLA attention (default: 0.1); 0.15 is recommended for better video quality
+ # --quant_linear       Enable quantization for linear layers; pass this when using a quantized checkpoint
+ # --default_norm       Use the original LayerNorm and RMSNorm of Wan models
+
+ python turbodiffusion/inference/wan2.1_t2v_infer.py \
+     --model Wan2.1-1.3B \
+     --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth \
+     --resolution 480p \
+     --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." \
+     --num_samples 1 \
+     --num_steps 4 \
+     --quant_linear \
+     --attention_type sagesla \
+     --sla_topk 0.1
+ ```

  ## Citation
  ```

  journal={arXiv preprint arXiv:2505.11594},
  year={2025}
  }
+ ```
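Steps 1 and 2 of the sample usage above can be collected into one re-runnable script. The following is a minimal sketch, assuming the files stay at the same Hugging Face `resolve/main` URLs used by the `wget` commands; `checkpoint_urls` and `download_checkpoints` are hypothetical helper names, not part of TurboDiffusion.

```bash
#!/usr/bin/env bash
# Sketch: steps 1 and 2 as one idempotent download script.
# Assumptions: files are served from the same Hugging Face "resolve/main" URLs
# as the wget commands in the README; the function names are hypothetical.
set -euo pipefail

HF_BASE="https://huggingface.co"

# Print one download URL per line.
# Argument: "quant" (default) selects the -quant DiT checkpoint; anything else
# selects the unquantized one.
checkpoint_urls() {
  local variant="${1:-quant}"
  local dit="TurboWan2.1-T2V-1.3B-480P"
  if [ "$variant" = "quant" ]; then
    dit="${dit}-quant"
  fi
  echo "${HF_BASE}/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/Wan2.1_VAE.pth"
  echo "${HF_BASE}/Wan-AI/Wan2.1-T2V-1.3B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth"
  echo "${HF_BASE}/TurboDiffusion/TurboWan2.1-T2V-1.3B-480P/resolve/main/${dit}.pth"
}

# Download every URL into ./checkpoints without changing directory.
# wget -nc (no-clobber) skips files that already exist; -P sets the target dir.
download_checkpoints() {
  mkdir -p checkpoints
  checkpoint_urls "${1:-quant}" | while read -r url; do
    wget -nc -P checkpoints "$url"
  done
}
```

Sourcing the script and calling `download_checkpoints quant` (or `download_checkpoints full` on GPUs with more than 40 GB of memory) fetches the VAE, the umT5 encoder, and the chosen DiT checkpoint into `checkpoints/`. Writing with `-P` instead of `cd checkpoints` keeps the shell in the repository root, which is where the step-3 command expects to run.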
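The GPU guidance in the model card above (unquantized checkpoint for GPUs with more than 40 GB, otherwise the `-quant` checkpoint plus `--quant_linear`) reduces to a single threshold. This sketch assumes the memory figure is given in MiB, as printed by `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, and interprets "40 GB" as 40 GiB; `select_dit_args` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: choose the DiT checkpoint and flags from total GPU memory in MiB.
# Assumption: the model card's "more than 40 GB" rule, interpreted as > 40 GiB.
set -euo pipefail

select_dit_args() {
  local mem_mib="$1"
  # Reject anything that is not a plain non-negative integer.
  case "$mem_mib" in
    ''|*[!0-9]*) echo "expected an integer MiB value, got '$mem_mib'" >&2; return 1 ;;
  esac
  if [ $(( mem_mib / 1024 )) -gt 40 ]; then
    # Large GPUs (e.g., H100): unquantized checkpoint, no --quant_linear.
    echo "--dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P.pth"
  else
    # RTX 4090/5090-class GPUs: quantized checkpoint must be paired with --quant_linear.
    echo "--dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth --quant_linear"
  fi
}
```

On a machine with an NVIDIA driver, `mem_mib="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"` followed by `read -r -a dit_args <<< "$(select_dit_args "$mem_mib")"` yields an array that can stand in for the `--dit_path` and `--quant_linear` arguments of the step-3 command.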
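Because `--quant_linear` has to match the checkpoint, the step-3 command from the sample usage above can be wrapped so that the flag is derived from the checkpoint file name. A sketch, assuming it is run from the repository root and that quantized checkpoints keep the `-quant.pth` suffix used on this repo; `build_infer_cmd` and `run_infer` are hypothetical helpers, and the remaining flags are fixed to the values in the example command.

```bash
#!/usr/bin/env bash
# Sketch: wrap the step-3 T2V command so --quant_linear tracks the checkpoint.
# Assumptions: run from the repository root; "-quant.pth" marks a quantized
# checkpoint; function names are hypothetical; flags come from the README list.
set -euo pipefail

# Usage: build_infer_cmd PROMPT DIT_PATH [NUM_STEPS]
# Fills the global array INFER_CMD (an array keeps the prompt as one argument).
build_infer_cmd() {
  local prompt="$1" dit_path="$2" num_steps="${3:-4}"
  # The README documents 1-4 sampling steps.
  if [ "$num_steps" -lt 1 ] || [ "$num_steps" -gt 4 ]; then
    echo "num_steps must be between 1 and 4" >&2
    return 1
  fi
  INFER_CMD=(python turbodiffusion/inference/wan2.1_t2v_infer.py
    --model Wan2.1-1.3B
    --dit_path "$dit_path"
    --resolution 480p
    --prompt "$prompt"
    --num_samples 1
    --num_steps "$num_steps"
    --attention_type sagesla
    --sla_topk 0.1)
  # A "-quant" checkpoint requires --quant_linear; an unquantized one must not get it.
  case "$dit_path" in
    *-quant.pth) INFER_CMD+=(--quant_linear) ;;
  esac
}

# Build the command, then run it with PYTHONPATH set as in the README.
run_infer() {
  build_infer_cmd "$@"
  PYTHONPATH=turbodiffusion "${INFER_CMD[@]}"
}
```

For example, `run_infer "A stylish woman walks down a Tokyo street." checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth 4` reproduces the example invocation with a shorter prompt, while passing `checkpoints/TurboWan2.1-T2V-1.3B-480P.pth` drops `--quant_linear` automatically.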