## Model Overview

- This model is finetuned with [VSA](https://arxiv.org/pdf/2505.13389), based on [Wan-AI/Wan2.1-T2V-14B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers).
- It achieves up to a 2.1x speedup on a single **H100** GPU.
- Our model is trained at **77×768×1280** resolution, but it supports generating videos at **any resolution** (quality may degrade).
- We set **VSA attention sparsity** to 0.9, and training runs for **1500 steps (~14 hours)**. For inference, you can tune this value between 0 and 0.9 to balance speed and quality.
- Finetuning and inference scripts are available in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository:
  - [1 Node/GPU debugging finetuning script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/finetune/finetune_v1_VSA.sh)
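To build intuition for what the sparsity knob means: a sparsity of 0.9 corresponds to each query block attending to only the top 10% of key blocks by relevance score. The sketch below is a simplified, illustrative block-mask selection in NumPy — it is **not** FastVideo's actual VSA implementation (the function name and scoring are hypothetical), just a minimal model of how a top-k sparsity budget translates into an attention mask.

```python
import numpy as np

def sparse_block_mask(block_scores: np.ndarray, sparsity: float) -> np.ndarray:
    """Keep the top (1 - sparsity) fraction of key blocks per query block.

    Illustrative sketch only: sparsity=0.9 means each query block attends
    to just the 10% of key blocks with the highest coarse relevance scores.
    """
    num_q, num_k = block_scores.shape
    # Budget of key blocks to keep; always keep at least one.
    keep = max(1, int(round(num_k * (1.0 - sparsity))))
    mask = np.zeros_like(block_scores, dtype=bool)
    # Indices of the `keep` highest-scoring key blocks for each query block.
    topk = np.argpartition(block_scores, -keep, axis=1)[:, -keep:]
    np.put_along_axis(mask, topk, True, axis=1)
    return mask

# Hypothetical coarse scores for 4 query blocks over 20 key blocks.
scores = np.random.rand(4, 20)
mask = sparse_block_mask(scores, sparsity=0.9)
print(mask.sum(axis=1))  # each query block keeps 2 of 20 key blocks
```

Lowering `sparsity` toward 0 keeps more key blocks (slower, closer to dense attention quality); raising it toward 0.9 keeps fewer (faster, with possible quality loss), which mirrors the speed/quality trade-off described above.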