|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
tags: |
|
|
- video |
|
|
- video-generation |
|
|
- video-to-video |
|
|
- diffusers |
|
|
- wan2.2 |
|
|
--- |
|
|
# Wan2.2 Video Continuation (Demo) |
|
|
#### *The current project is still in development.*
|
|
This repo contains the code for video continuation inference using [Wan2.2](https://github.com/Wan-Video/Wan2.2). |
|
|
The main idea was taken from [LongCat-Video](https://huggingface.co/meituan-longcat/LongCat-Video). |
|
|
|
|
|
|
|
|
Demo example (only the first 32 frames are original; the rest are generated):
|
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/fPm3hJ9SlZ-29ncWZHygW.mp4"></video> |
|
|
|
|
|
## Description |
|
|
This is a simple LoRA for the Wan2.2 TI2V transformer.


First test: rank = 64, alpha = 128.


It was trained on around 10k videos, with input clips of 16–64 frames and output clips of 41–81 frames.


The attention processor is the main component modified for this approach.


See the <a href="https://github.com/TheDenk/wan2.2-video-continuation">GitHub code</a>.
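As a reminder of what those LoRA hyperparameters mean, the effective weight update is the low-rank product scaled by `alpha / rank` (here 128 / 64 = 2.0). A toy pure-Python sketch of that update, for illustration only and unrelated to the repo's actual implementation:

```python
# Toy LoRA weight update: delta_W = (alpha / rank) * (B @ A).
# Tiny dimensions for illustration; the trained LoRA uses rank 64,
# alpha 128, giving the same scaling factor alpha / rank = 2.0.
rank, alpha = 2, 4

A = [[1, 0, 0], [0, 1, 0]]    # (rank, d_in) down-projection
B = [[1, 0], [0, 1], [1, 1]]  # (d_out, rank) up-projection

scale = alpha / rank  # 2.0
delta_W = [
    [scale * sum(B[i][r] * A[r][j] for r in range(rank)) for j in range(len(A[0]))]
    for i in range(len(B))
]
print(delta_W)  # scaled low-rank product, added to the frozen weight
```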
|
|
|
|
|
### Models |
|
|
| Model | Best input frame counts | Best output frame counts | Resolution | Hugging Face Link |
|
|
|-------|:-----------:|:------------------:|:------------------:|:------------------:| |
|
|
| TI2V-5B | 24-32-40 | 49-61-81 | 704x1280| [Link](https://huggingface.co/TheDenk/wan2.2-video-continuation) | |
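Note that the recommended output frame counts (49, 61, 81) all satisfy `num_frames % 4 == 1`, the usual constraint for Wan-style pipelines in diffusers that follows from the VAE's 4x temporal compression. A small sanity-check helper (an assumption for illustration, not part of the repo):

```python
def is_valid_wan_frame_count(n: int) -> bool:
    """Wan-style pipelines typically require num_frames = 4k + 1,
    a consequence of the VAE's 4x temporal compression."""
    return n % 4 == 1

# The table's recommended output counts all satisfy the rule.
print([is_valid_wan_frame_count(n) for n in (49, 61, 81)])
```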
|
|
|
|
|
|
|
|
### How to |
|
|
Clone repo |
|
|
```bash |
|
|
git clone https://github.com/TheDenk/wan2.2-video-continuation |
|
|
cd wan2.2-video-continuation |
|
|
``` |
|
|
|
|
|
Create venv |
|
|
```bash |
|
|
python -m venv venv |
|
|
source venv/bin/activate |
|
|
``` |
|
|
|
|
|
Install requirements |
|
|
```bash |
|
|
pip install git+https://github.com/huggingface/diffusers.git |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
|
|
|
### Inference examples


#### Gradio inference


```bash
python -m inference.gradio_web_demo \
--base_model_path Wan-AI/Wan2.2-TI2V-5B-Diffusers \
--lora_path TheDenk/wan2.2-video-continuation
```


#### Simple inference with CLI


```bash
|
|
python -m inference.cli_demo \ |
|
|
--video_path "resources/ship.mp4" \ |
|
|
--num_input_frames 24 \ |
|
|
--num_output_frames 81 \ |
|
|
--prompt "Watercolor style, the wet suminagashi inks slowly spread into the shape of an island on the paper, with the edges continuously blending into delicate textural variations. A tiny paper boat floats in the direction of the water flow towards the still-wet areas, creating subtle ripples around it. Centered composition with soft natural light pouring in from the side, revealing subtle color gradations and a sense of movement." \ |
|
|
--base_model_path Wan-AI/Wan2.2-TI2V-5B-Diffusers \ |
|
|
--lora_path TheDenk/wan2.2-video-continuation |
|
|
``` |
|
|
|
|
|
|
|
|
#### Detailed Inference |
|
|
```bash |
|
|
python -m inference.cli_demo \ |
|
|
--video_path "resources/ship.mp4" \ |
|
|
--num_input_frames 24 \ |
|
|
--num_output_frames 81 \ |
|
|
--prompt "Watercolor style, the wet suminagashi inks slowly spread into the shape of an island on the paper, with the edges continuously blending into delicate textural variations. A tiny paper boat floats in the direction of the water flow towards the still-wet areas, creating subtle ripples around it. Centered composition with soft natural light pouring in from the side, revealing subtle color gradations and a sense of movement." \ |
|
|
--base_model_path Wan-AI/Wan2.2-TI2V-5B-Diffusers \ |
|
|
--lora_path TheDenk/wan2.2-video-continuation \ |
|
|
--num_inference_steps 50 \ |
|
|
--guidance_scale 5.0 \ |
|
|
--video_height 480 \ |
|
|
--video_width 832 \ |
|
|
--negative_prompt "bad quality, low quality" \ |
|
|
--seed 42 \ |
|
|
--out_fps 24 \ |
|
|
--output_path "result.mp4" \ |
|
|
--teacache_treshold 0.5 |
|
|
``` |
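The `--teacache_treshold` flag controls TeaCache step skipping: roughly, the transformer pass is skipped (and its cached residual reused) on steps where the accumulated relative change of its input stays below the threshold. A toy illustration of that scheduling idea, not the actual TeaCache code:

```python
# Toy TeaCache-style schedule: skip a denoising step when the
# accumulated relative input change since the last computed step
# stays below the threshold; otherwise compute and reset.
def teacache_schedule(input_changes, threshold):
    """input_changes[i] ~ relative change of the transformer input
    at step i. Returns a 'compute'/'skip' decision per step."""
    decisions, accumulated = [], 0.0
    for change in input_changes:
        accumulated += change
        if accumulated < threshold:
            decisions.append("skip")     # reuse cached residual
        else:
            decisions.append("compute")  # run the transformer, reset
            accumulated = 0.0
    return decisions

changes = [0.3, 0.1, 0.2, 0.1, 0.4, 0.1]
print(teacache_schedule(changes, threshold=0.5))
```

A higher threshold skips more steps, trading some fidelity for speed, which matches the intent of raising `--teacache_treshold`.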
|
|
|
|
|
|
|
|
#### Minimal code example |
|
|
```python |
|
|
import os |
|
|
os.environ['CUDA_VISIBLE_DEVICES'] = "0" |
|
|
os.environ["TOKENIZERS_PARALLELISM"] = "false" |
|
|
|
|
|
import torch |
|
|
from diffusers.utils import load_video, export_to_video |
|
|
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler |
|
|
|
|
|
from wan_continuous_transformer import WanTransformer3DModel |
|
|
from wan_continuous_pipeline import WanContinuousVideoPipeline |
|
|
|
|
|
base_model_path = "Wan-AI/Wan2.2-TI2V-5B-Diffusers" |
|
|
lora_path = "TheDenk/wan2.2-video-continuation" |
|
|
vae = AutoencoderKLWan.from_pretrained(base_model_path, subfolder="vae", torch_dtype=torch.float32) |
|
|
transformer = WanTransformer3DModel.from_pretrained(base_model_path, subfolder="transformer", torch_dtype=torch.bfloat16) |
|
|
|
|
|
pipe = WanContinuousVideoPipeline.from_pretrained( |
|
|
pretrained_model_name_or_path=base_model_path, |
|
|
transformer=transformer, |
|
|
vae=vae, |
|
|
torch_dtype=torch.bfloat16 |
|
|
) |
|
|
pipe.enable_model_cpu_offload() |
|
|
|
|
|
pipe.transformer.load_lora_adapter( |
|
|
lora_path, |
|
|
weight_name="pytorch_lora_weights.safetensors", |
|
|
adapter_name="video_continuation", |
|
|
prefix=None, |
|
|
) |
|
|
pipe.set_adapters("video_continuation", adapter_weights=1.0) |
|
|
|
|
|
img_h = 480 # 704 512 480 |
|
|
img_w = 832 # 1280 832 768 |
|
|
|
|
|
num_input_frames = 24 # 16 24 32 |
|
|
num_output_frames = 81 # 81 49 |
|
|
|
|
|
video_path = 'ship.mp4' |
|
|
previous_video = load_video(video_path)[-num_input_frames:] |
|
|
|
|
|
prompt = "Watercolor style, the wet suminagashi inks slowly spread into the shape of an island on the paper, with the edges continuously blending into delicate textural variations. A tiny paper boat floats in the direction of the water flow towards the still-wet areas, creating subtle ripples around it. Centered composition with soft natural light pouring in from the side, revealing subtle color gradations and a sense of movement." |
|
|
negative_prompt = "bad quality, low quality" |
|
|
|
|
|
output = pipe( |
|
|
previous_video=previous_video, |
|
|
prompt=prompt, |
|
|
negative_prompt=negative_prompt, |
|
|
height=img_h, |
|
|
width=img_w, |
|
|
num_frames=num_output_frames, |
|
|
guidance_scale=5, |
|
|
generator=torch.Generator(device="cuda").manual_seed(42), |
|
|
output_type="pil", |
|
|
|
|
|
teacache_treshold=0.4, |
|
|
).frames[0] |
|
|
|
|
|
export_to_video(output, "output.mp4", fps=16) |
|
|
``` |
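Longer clips can in principle be produced by chaining continuations, feeding the tail of each result back in as the next input window. A sketch of the bookkeeping, under the stated assumption (not confirmed by the repo) that each pass consumes the last `num_input` frames as context and returns `num_output` frames whose first `num_input` overlap that context:

```python
import math

def passes_needed(initial_frames, target_frames, num_input=24, num_output=81):
    """Assumption (illustrative, not from the repo): each continuation
    pass takes the last `num_input` frames as context and returns
    `num_output` frames, the first `num_input` of which overlap the
    context, so each pass adds `num_output - num_input` new frames."""
    new_per_pass = num_output - num_input
    remaining = target_frames - initial_frames
    return max(0, math.ceil(remaining / new_per_pass))

print(passes_needed(24, 300))  # passes to grow a 24-frame clip to ~300 frames
```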
|
|
|
|
|
|
|
|
## Acknowledgements |
|
|
Original code and models: [Wan2.2](https://github.com/Wan-Video/Wan2.2).


Video continuation approach: [LongCat-Video](https://huggingface.co/meituan-longcat/LongCat-Video).


Inference speed-up: [TeaCache](https://github.com/ali-vilab/TeaCache).
|
|
|
|
|
## Citations |
|
|
``` |
|
|
@misc{TheDenk, |
|
|
title={Wan2.2 Video Continuation}, |
|
|
author={Karachev Denis}, |
|
|
url={https://github.com/TheDenk/wan2.2-video-continuation}, |
|
|
publisher={Github}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Contacts |
|
|
<p>Issues should be raised directly in the repository. For professional support and recommendations, please contact <a href="mailto:welcomedenk@gmail.com">welcomedenk@gmail.com</a>.</p>