|
|
--- |
|
|
base_model: |
|
|
- Wan-AI/Wan2.2-I2V-A14B |
|
|
license: apache-2.0 |
|
|
pipeline_tag: image-to-video |
|
|
--- |
|
|
|
|
|
# FastVideo CausalWan2.2-I2V-A14B-Preview-Diffusers Model |
|
|
|
|
|
|
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/6532f70333c5982a291ca909/zAB7U0da8_8fP1N0DgUS2.png" width="200"/> |
|
|
</p> |
|
|
<div> |
|
|
<div align="center"> |
|
|
<a href="https://github.com/hao-ai-lab/FastVideo" target="_blank">FastVideo Team</a>  |
|
|
</div> |
|
|
|
|
|
<div align="center"> |
|
|
<a href="https://github.com/hao-ai-lab/FastVideo">Github</a> | |
|
|
<a href="https://hao-ai-lab.github.io/FastVideo">Project Page</a> |
|
|
</div> |
|
|
</div> |
|
|
|
|
|
## Disclaimer |
|
|
Note that this is a preview model: output quality issues remain, and inference speed is not yet optimized.
|
|
|
|
|
## Introduction |
|
|
We're excited to introduce the **CausalWan2.2 I2V A14B series**, a new line of causal image-to-video models built on Wan2.2-I2V-A14B.
|
|
|
|
|
--- |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
- 8-step inference is supported. |
|
|
- Try it out on **FastVideo**: we support a wide range of GPUs from **H100** to **4090**, and also support **Mac** users!
|
|
|
|
|
## Inference code |
|
|
|
|
|
```python
from fastvideo import VideoGenerator, SamplingParam
import json

OUTPUT_PATH = "video_samples_self_forcing_causal_wan2_2_14B_i2v"

def main():
    # FastVideo will automatically use the optimal default arguments for the
    # model. If a local path is provided, FastVideo will make a best-effort
    # attempt to identify the optimal arguments.
    generator = VideoGenerator.from_pretrained(
        "FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers",
        # FastVideo will automatically handle distributed setup
        num_gpus=1,
        use_fsdp_inference=True,
        dit_cpu_offload=True,  # the DiT needs to be offloaded for MoE
        dit_precision="fp32",
        vae_cpu_offload=False,
        text_encoder_cpu_offload=True,
        dmd_denoising_steps=[1000, 850, 700, 550, 350, 275, 200, 125],
        # Set pin_cpu_memory to False if CPU RAM is limited and there are no
        # frequent CPU-GPU transfers
        pin_cpu_memory=True,
        # image_encoder_cpu_offload=False,
    )

    sampling_param = SamplingParam.from_pretrained("FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers")
    sampling_param.num_frames = 81
    sampling_param.width = 832
    sampling_param.height = 480
    sampling_param.seed = 1000

    # Each line of the .jsonl file holds one JSON object with "prompt" and
    # "image_path" keys
    with open("prompts/mixkit_i2v.jsonl", "r") as f:
        prompt_image_pairs = [json.loads(line) for line in f if line.strip()]

    for prompt_image_pair in prompt_image_pairs:
        prompt = prompt_image_pair["prompt"]
        image_path = prompt_image_pair["image_path"]
        generator.generate_video(
            prompt,
            image_path=image_path,
            output_path=OUTPUT_PATH,
            save_video=True,
            sampling_param=sampling_param,
        )

if __name__ == "__main__":
    main()
```
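
The `prompts/mixkit_i2v.jsonl` file itself is not included here; the script above only assumes that each line is a standalone JSON object with `prompt` and `image_path` fields. Here is a minimal sketch of building and reading such a file, with illustrative file name and field values (the real file's contents will differ):

```python
import json

# Illustrative prompt/image pairs; field values are made up for this example.
pairs = [
    {"prompt": "A cat walking through tall grass", "image_path": "images/cat.png"},
    {"prompt": "Waves crashing on a rocky shore", "image_path": "images/shore.png"},
]

# Write one JSON object per line (the JSON Lines convention).
with open("mixkit_i2v_example.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")

# Read it back the same way the inference script does.
with open("mixkit_i2v_example.jsonl", "r") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded[0]["prompt"])  # A cat walking through tall grass
```

Note that a `.jsonl` file is not a single JSON document, so it must be parsed line by line with `json.loads` rather than with a single `json.load` call.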