
# Wan 2.1 I2V LoRA – Character Consistency Training

Professional training script for Wan 2.1 I2V LoRA that actually works for character consistency in video generation.

## 🔑 Why This Works (vs wavespeed.ai)

| Feature | wavespeed.ai | This Script |
|---|---|---|
| Data format | Images (zip) | Video clips |
| Rank | Max 64 | 128-512 |
| I2V layers | Unknown | `add_k_proj`, `add_v_proj` (image cross-attention) |
| Training type | T2I-style on images | True I2V on video |
| Temporal consistency | ❌ None | ✅ Learned from video |
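For orientation, targeting those image cross-attention projections with PEFT might look like the minimal sketch below. Only `add_k_proj`/`add_v_proj` are confirmed by the table; the remaining `to_*` modules are a common choice for attention LoRAs, not verified against this script's actual config.

```python
from peft import LoraConfig

# Minimal sketch of a LoRA config for Wan's attention projections.
# add_k_proj / add_v_proj are the I2V image cross-attention layers named above;
# the to_* modules are an assumed, typical addition.
lora_config = LoraConfig(
    r=128,           # rank; see "Key Parameters" below
    lora_alpha=128,  # alpha = rank, per the recommendation below
    target_modules=[
        "to_q", "to_k", "to_v", "to_out.0",  # assumed attention projections
        "add_k_proj", "add_v_proj",          # image cross-attention (I2V-specific)
    ],
)
```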

πŸ“ Files

- `train_wan_i2v_lora.py` – Full training script
- `TRAINING_GUIDE.md` – Detailed setup instructions

## 🎯 Quick Start

### 1. Prepare Dataset

```bash
mkdir dataset
cp your_videos/*.mp4 dataset/

# Create captions.txt with format: video_name|SKSCHAR your prompt here
cat > dataset/captions.txt << 'EOF'
video_0|SKSCHAR woman walking in park
video_1|SKSCHAR woman talking to camera
EOF
```
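The `name|caption` format is trivial to parse. A minimal loader sketch, assuming the training script resolves each name to `<name>.mp4` inside the dataset directory (an assumption about its internals, not confirmed):

```python
from pathlib import Path

def load_captions(dataset_dir: str) -> dict[str, str]:
    """Map each video path to its caption from captions.txt (name|caption per line)."""
    pairs = {}
    for line in Path(dataset_dir, "captions.txt").read_text().splitlines():
        if not line.strip():
            continue
        name, caption = line.split("|", 1)  # split once; captions may contain '|'
        # Assumed convention: 'video_0' refers to dataset/video_0.mp4
        pairs[str(Path(dataset_dir, f"{name}.mp4"))] = caption.strip()
    return pairs

print(load_captions("dataset"))
```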

### 2. Run Training (14B model)

```bash
# Requires A100 80GB / L40S / H100 (48GB+ VRAM)
# ~$2-3/hour on RunPod/Vast.ai/Lambda Labs

pip install torch transformers diffusers accelerate peft

accelerate launch train_wan_i2v_lora.py \
    --pretrained_model Wan-AI/Wan2.1-I2V-14B-480P-Diffusers \
    --dataset_dir ./dataset \
    --output_dir ./output \
    --rank 128 \
    --lora_alpha 128 \
    --lr 1e-4 \
    --max_steps 1000 \
    --grad_accum 4 \
    --mixed_precision bf16 \
    --trigger_word SKSCHAR \
    --push_to_hub \
    --hub_model_id yourname/character-lora
```
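With `--grad_accum 4` and an assumed per-device batch size of 1, each optimizer step sees an effective batch of 4 (times the number of GPUs if you launch on several). A hedged multi-GPU sketch using standard `accelerate` flags; the script's own multi-GPU behavior has not been verified here:

```bash
# Assumed: 2 GPUs; effective batch = per_device_batch (1) x grad_accum (4) x 2 GPUs = 8
accelerate launch --num_processes 2 --mixed_precision bf16 \
    train_wan_i2v_lora.py \
    --pretrained_model Wan-AI/Wan2.1-I2V-14B-480P-Diffusers \
    --dataset_dir ./dataset --output_dir ./output \
    --rank 128 --lora_alpha 128 --lr 1e-4 \
    --max_steps 1000 --grad_accum 4 --trigger_word SKSCHAR
```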

### 3. Inference

```python
import torch
from diffusers import WanImageToVideoPipeline, AutoencoderKLWan
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel

model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"

# Keep the VAE and image encoder in fp32 for stability; the transformer runs in bf16.
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id,
    vae=AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32),
    image_encoder=CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32),
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Load the trained character LoRA and set its strength.
pipe.load_lora_weights("./output/final", adapter_name="char")
pipe.set_adapters(["char"], [0.8])

image = load_image("reference.jpg")

output = pipe(
    image=image,
    prompt="SKSCHAR woman dancing gracefully",  # include the trigger word
    height=480, width=832,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=25,
).frames[0]

export_to_video(output, "result.mp4", fps=16)
```
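At 81 frames and `fps=16`, the resulting clip runs about 5 seconds. If the 14B pipeline does not fit on your GPU at inference time, `diffusers` model CPU offload is a standard workaround (a sketch, traded against speed):

```python
# Instead of pipe.to("cuda"): stream submodules to the GPU on demand.
pipe.enable_model_cpu_offload()
```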

## 📊 Demo Dataset

Ready-to-use test dataset: [`Useravailablepls/wan-i2v-lora-demo-videos`](https://huggingface.co/datasets/Useravailablepls/wan-i2v-lora-demo-videos)

python3 -c "
import requests, os
os.makedirs('demo_dataset', exist_ok=True)
base = 'https://huggingface.co/datasets/Useravailablepls/wan-i2v-lora-demo-videos/resolve/main/'
for f in ['video_0.mp4', 'video_1.mp4', 'captions.txt']:
    r = requests.get(base + f)
    open(f'demo_dataset/{f}', 'wb').write(r.content)
    print(f'Downloaded {f}')
"

## 🔧 Key Parameters

| Parameter | Recommended | Why |
|---|---|---|
| `rank` | 128-256 (14B), 64 (1.3B) | Higher = better consistency |
| `lora_alpha` | = rank | Standard practice |
| `lr` | 1e-4 | Constant schedule |
| `max_steps` | 500-1000 | More steps risk overfitting |
| `grad_accum` | 4-8 | Sets effective batch size |
| `num_frames` | 81 | (81-1)/4+1 = 21 latent frames (see sketch below) |
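The `num_frames` row reflects Wan's 4x temporal VAE compression, which expects frame counts of the form 4k+1. A small helper to check a candidate value, with the stride of 4 taken from the formula above:

```python
def latent_frames(num_frames: int, temporal_stride: int = 4) -> int:
    """Pixel frames -> latent frames under Wan's temporal VAE compression."""
    assert (num_frames - 1) % temporal_stride == 0, "num_frames must be 4k+1"
    return (num_frames - 1) // temporal_stride + 1

print(latent_frames(81))  # 21, matching the table
```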

## 📚 Based On

- Wan 2.1 (arXiv:2503.20314)
- Pusa VTA (arXiv:2507.16116) – LoRA rank=512, alpha=1.4 recipe
- UniAnimate-DiT (arXiv:2504.11289) – Video conditioning at patchified level
- starsfriday's LoRA configs (target_modules from adapter_config.json)

## ⚠️ Hardware Requirements

| Model | VRAM | GPU Examples | Cost/hr |
|---|---|---|---|
| 14B | 48-80GB | A100, L40S, H100 | $2-6 |
| 1.3B | 16-24GB | T4, A10G, RTX 3090 | Free-$1 |

Free options for 1.3B: Google Colab (T4), Kaggle (T4)
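To confirm your GPU clears the bar before committing to a long run, a quick check using the standard `torch` API:

```python
import torch

# Total VRAM of GPU 0 in GiB; compare against the table above.
gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"{torch.cuda.get_device_name(0)}: {gib:.0f} GiB")
```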

πŸ“ Citation

```bibtex
@article{wan2025wan21,
  title={Wan 2.1: Comprehensive and Efficient Video Generation},
  author={Wan Video Team},
  journal={arXiv preprint arXiv:2503.20314},
  year={2025}
}
```

## 📜 License

Apache-2.0 (same as base Wan 2.1 model)
