---
base_model:
- Wan-AI/Wan2.2-I2V-A14B
license: apache-2.0
pipeline_tag: image-to-video
---

# FastVideo CausalWan2.2-I2V-A14B-Preview-Diffusers Model



<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6532f70333c5982a291ca909/zAB7U0da8_8fP1N0DgUS2.png" width="200"/>
</p>
<div>
  <div align="center">
    <a href="https://github.com/hao-ai-lab/FastVideo" target="_blank">FastVideo Team</a>&emsp;
  </div>

  <div align="center">
    <a href="https://github.com/hao-ai-lab/FastVideo">Github</a> |
    <a href="https://hao-ai-lab.github.io/FastVideo">Project Page</a>
  </div>
</div>

## Disclaimer
Note that this is a preview model: quality issues remain, and inference speed is not yet optimized.

## Introduction
We're excited to introduce the **CausalWan2.2 I2V A14B series**, a new line of causal image-to-video generation models built on Wan2.2-I2V-A14B.

---

## Model Overview

- 8-step denoising inference is supported (see `dmd_denoising_steps` in the example below).
- Try it out on **FastVideo**: we support a wide range of GPUs, from **H100** to **4090**, and also support **Mac** users!

## Inference code

```python
import json

from fastvideo import VideoGenerator, SamplingParam
# On some FastVideo versions the import path is:
# from fastvideo.configs.sample import SamplingParam

OUTPUT_PATH = "video_samples_self_forcing_causal_wan2_2_14B_i2v"


def main():
    # FastVideo will automatically use the optimal default arguments for the
    # model.
    # If a local path is provided, FastVideo will make a best effort
    # attempt to identify the optimal arguments.
    generator = VideoGenerator.from_pretrained(
        "FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers",
        # FastVideo will automatically handle distributed setup
        num_gpus=1,
        use_fsdp_inference=True,
        dit_cpu_offload=True,  # the DiT needs to be offloaded for the MoE architecture
        dit_precision="fp32",
        vae_cpu_offload=False,
        text_encoder_cpu_offload=True,
        dmd_denoising_steps=[1000, 850, 700, 550, 350, 275, 200, 125],
        # Set pin_cpu_memory to False if CPU RAM is limited and there are no frequent CPU-GPU transfers.
        pin_cpu_memory=True,
        # image_encoder_cpu_offload=False,
    )

    sampling_param = SamplingParam.from_pretrained("FastVideo/SFWan2.2-I2V-A14B-Preview-Diffusers")
    sampling_param.num_frames = 81
    sampling_param.width = 832
    sampling_param.height = 480
    sampling_param.seed = 1000

    with open("prompts/mixkit_i2v.jsonl", "r") as f:
        prompt_image_pairs = json.load(f)

    for prompt_image_pair in prompt_image_pairs:
        prompt = prompt_image_pair["prompt"]
        image_path = prompt_image_pair["image_path"]
        _ = generator.generate_video(
            prompt,
            image_path=image_path,
            output_path=OUTPUT_PATH,
            save_video=True,
            sampling_param=sampling_param,
        )


if __name__ == "__main__":
    main()
```
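
The script above reads its prompts from `prompts/mixkit_i2v.jsonl`, one JSON object per line with `prompt` and `image_path` keys. Below is a minimal sketch of how such a file could be created; the prompt text and image paths are placeholders, not part of this release.

```python
import json
import os

# Placeholder prompt/image pairs; replace with your own captions and input images.
prompt_image_pairs = [
    {
        "prompt": "A sailboat drifting across a calm lake at sunset, gentle ripples on the water.",
        "image_path": "images/sailboat.png",
    },
    {
        "prompt": "A cat slowly turning its head toward the camera in warm indoor light.",
        "image_path": "images/cat.png",
    },
]

os.makedirs("prompts", exist_ok=True)
# Write one JSON object per line (JSON Lines), matching what the inference script reads.
with open("prompts/mixkit_i2v.jsonl", "w") as f:
    for pair in prompt_image_pairs:
        f.write(json.dumps(pair) + "\n")
```

Since each line is parsed independently, new prompt/image pairs can be appended to the file without touching existing entries.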