CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a face-preserving video generation system built on Alibaba's Wan2.1 VACE model. It generates 512x512 videos at 24 fps by interpolating between a start frame and an end frame while preserving the identity of a reference face.

Common Commands

Setup and Installation

# Install dependencies
pip install -r requirements.txt

# Download the model (first time only, ~20GB)
python model_download.py

# Run the application
python app.py
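Because the model download is large and only needed once, a launcher may want to skip it when the files are already present. The sketch below is a hypothetical presence check. The file names in EXPECTED_FILES are assumptions about the Wan2.1-VACE-1.3B checkpoint layout, not values read from model_download.py. The check only confirms that each file exists and is non-empty; it does not verify checksums.

```python
from pathlib import Path

# Assumed checkpoint file names; confirm them against the actual download.
EXPECTED_FILES = (
    "config.json",
    "diffusion_pytorch_model.safetensors",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1_VAE.pth",
)


def model_is_downloaded(model_dir, expected=EXPECTED_FILES):
    """Return True if model_dir exists and every expected file is present and non-empty."""
    root = Path(model_dir)
    if not root.is_dir():
        return False
    return all(
        (root / name).is_file() and (root / name).stat().st_size > 0
        for name in expected
    )
```

For example, a launcher could run `python model_download.py` only when `model_is_downloaded(...)` returns False.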

Development Commands

  • No formal testing framework is configured
  • No linting tools are configured (consider adding ruff or flake8)
  • To check that the modules compile (a syntax check only; this does not start the app or import its dependencies): python -m py_compile app.py vace_integration.py config.py model_download.py
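The same syntax check can be scripted so that each failing file is listed with its error, which is useful in a pre-commit hook or CI step. This is a hypothetical sketch: check_syntax and main are not part of the repository. Like py_compile itself, it catches only syntax errors, not import or runtime failures.

```python
import py_compile
import sys

# The four project modules named in the command above.
SOURCES = ["app.py", "vace_integration.py", "config.py", "model_download.py"]


def check_syntax(paths):
    """Byte-compile each file and return a list of (path, error message) for failures."""
    failures = []
    for path in paths:
        try:
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append((path, exc.msg))
        except OSError as exc:  # e.g. the file does not exist
            failures.append((path, str(exc)))
    return failures


def main(argv):
    """Check the given files (or the project modules) and return a shell exit code."""
    problems = check_syntax(argv or SOURCES)
    for path, msg in problems:
        print(f"{path}: {msg}", file=sys.stderr)
    return 1 if problems else 0
```

Call it from a script as `sys.exit(main(sys.argv[1:]))`.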

Architecture Overview

The codebase separates concerns into four modules:

  1. app.py - Gradio web UI layer

    • Handles user interactions and file uploads
    • Manages temporary file operations
    • Calls VACEProcessor for video generation
  2. vace_integration.py - Core processing logic

    • VACEProcessor class orchestrates the entire pipeline
    • Creates template videos with gray frame interpolation
    • Invokes the Wan2.1-VACE model via subprocess
    • Handles mask generation and frame processing
  3. config.py - Centralized configuration

    • All paths, parameters, and settings
    • Model configuration (resolution, fps, frame count)
    • Directory management
  4. model_download.py - Model management

    • Downloads Wan2.1-VACE-1.3B from Hugging Face
    • Validates model integrity
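config.py is described as holding the resolution, fps, and frame count. The following is a minimal sketch of how those settings could be grouped and validated. The class name and field names are hypothetical. The default of 81 frames and the 4n + 1 frame-count rule are assumptions based on Wan2.1's commonly used 81-frame setting, not values stated in this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoConfig:
    """Hypothetical grouping of the generation settings kept in config.py."""
    width: int = 512
    height: int = 512
    fps: int = 24
    num_frames: int = 81  # assumption: Wan2.1 expects 4n + 1 frames

    def __post_init__(self):
        # Reject frame counts the model is assumed not to accept.
        if self.num_frames < 5 or self.num_frames % 4 != 1:
            raise ValueError(f"num_frames must be 4n + 1 and at least 5, got {self.num_frames}")

    @property
    def duration_seconds(self) -> float:
        """Clip length implied by the frame count and frame rate."""
        return self.num_frames / self.fps


VIDEO = VideoConfig()
```

With these defaults a clip lasts 81 / 24 = 3.375 seconds. A frozen dataclass keeps the values read-only after import, so every module sees the same settings.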

Key Technical Details

  • Model: Wan2.1-VACE-1.3B (1.3 billion parameters)
  • GPU requirements: NVIDIA A10 (24GB VRAM) or better
  • Processing Flow:
    1. User uploads 3 images (reference face, start frame, end frame)
    2. System creates a template video with gray frames between start/end
    3. VACE model interpolates frames while preserving face identity
    4. The generated video is saved to the /results/ directory
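Step 2 of the processing flow can be sketched with NumPy. The function below builds the template clip (start frame, gray filler frames, end frame) and a matching per-frame mask. The gray value of 127 and the mask convention (0 = keep the pixel, 255 = let the model generate it) are assumptions about what VACE expects. build_template is a hypothetical name, not the VACEProcessor API. The reference face is passed to the model separately and is not part of this template, and encoding the arrays to video files is omitted.

```python
import numpy as np

GRAY = 127  # assumption: mid-gray placeholder for frames the model should fill in


def build_template(start, end, num_frames, gray=GRAY):
    """Return (frames, masks) for first/last-frame interpolation.

    frames: (num_frames, H, W, 3) uint8 holding start, gray fillers, end.
    masks:  (num_frames, H, W) uint8 where 0 = keep and 255 = generate.
    """
    if start.ndim != 3 or start.shape != end.shape:
        raise ValueError("start and end must be HxWx3 arrays of the same shape")
    if num_frames < 3:
        raise ValueError("need at least one frame between start and end")
    h, w, c = start.shape
    frames = np.full((num_frames, h, w, c), gray, dtype=np.uint8)
    frames[0] = start
    frames[-1] = end
    masks = np.full((num_frames, h, w), 255, dtype=np.uint8)
    masks[0] = 0   # keep the user's start frame
    masks[-1] = 0  # keep the user's end frame
    return frames, masks
```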

Important Directories

  • /cache/ - Model and framework caches (huggingface, torch, transformers)
  • /workspace/ - Temporary processing directory
  • /results/ - Output videos
  • /examples/ - Sample images for demo
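The directory layout above could be created at startup as shown below. This sketch makes several assumptions. prepare_directories is a hypothetical helper, and the cache subfolder names are illustrative. It relies on the HF_HOME and TORCH_HOME environment variables, which Hugging Face libraries and PyTorch read to locate their caches. Set them before importing torch or huggingface_hub, because the values may be read at import time. /examples/ is not created because it ships with the repository.

```python
import os
from pathlib import Path


def prepare_directories(base="/"):
    """Create cache/, workspace/ and results/ under base and point library caches at cache/."""
    root = Path(base)
    dirs = {name: root / name for name in ("cache", "workspace", "results")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    cache = dirs["cache"]
    # setdefault keeps any value already exported by the deployment environment.
    os.environ.setdefault("HF_HOME", str(cache / "huggingface"))
    os.environ.setdefault("TORCH_HOME", str(cache / "torch"))
    return dirs
```

The helper is idempotent, so calling it on every launch is safe.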

Notes for Development

  • The project uses subprocess calls to invoke the model, not direct Python imports
  • All configuration is centralized in config.py - modify settings there
  • The Gradio interface is designed for Hugging Face Spaces deployment
  • Japanese documentation files (構築ガイド.md, 作業タスク.md) contain implementation details
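Because the model runs in a separate process, the integration layer mainly assembles a command line and checks the exit status. The sketch below shows that pattern with subprocess.run. The flag names (--task vace-1.3B, --ckpt_dir, --src_video, --src_mask, --src_ref_images, --prompt) follow the VACE usage examples published with the Wan2.1 repository's generate.py, so check them against the checkout actually used. The 512*512 size string, the script path, and both function names are assumptions, not code from vace_integration.py.

```python
import subprocess
import sys


def build_vace_command(ckpt_dir, src_video, src_mask, ref_image, prompt,
                       size="512*512", script="generate.py"):
    """Assemble an argument list for a Wan2.1 generate.py VACE run (flags assumed)."""
    return [
        sys.executable, str(script),
        "--task", "vace-1.3B",
        "--size", size,  # assumption: whether this size is accepted depends on the Wan2.1 version
        "--ckpt_dir", str(ckpt_dir),
        "--src_video", str(src_video),
        "--src_mask", str(src_mask),
        "--src_ref_images", str(ref_image),
        "--prompt", prompt,
    ]


def run_command(cmd, cwd=None, timeout=None):
    """Run cmd, return its stdout, and raise RuntimeError with the stderr tail on failure."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        tail = result.stderr.strip()[-2000:]
        raise RuntimeError(f"command failed ({result.returncode}): {tail}")
    return result.stdout
```

Raising on a non-zero exit status surfaces model failures to the Gradio layer instead of leaving the results directory silently empty.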

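Voice Notifications

Whenever you need input from the user, that is, whenever your response ends, always use the say command to speak your reply. Keep the message as short and direct as possible; to avoid timeouts, five characters or fewer is recommended.

Examples: say "完了" (Done), say "エラー" (Error), say "準備OK" (Ready)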