oKen38461

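Updated README.md with details of the Wan2.1 VACE face-preserving video generation system. It now includes the project overview, main features, technical specifications, usage, installation steps, and license information, and the SDK version was changed to 4.0.0.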

725a76b 6 months ago

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a face-preserving video generation system using Alibaba's Wan2.1 VACE model. It creates 512x512 resolution videos at 24fps by interpolating between start and end frames while maintaining the identity of a reference face.

Common Commands

Setup and Installation

# Install dependencies
pip install -r requirements.txt

# Download the model (first time only, ~20GB)
python model_download.py

# Run the application
python app.py

Development Commands

No formal testing framework is configured
No linting tools are configured (consider adding ruff or flake8)
To check that the sources compile (syntax check only; this does not launch the app): python -m py_compile app.py vace_integration.py config.py model_download.py
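
The py_compile check above can also be scripted, which is handy when no test framework is configured. This is a minimal sketch using the standard-library `py_compile` module; the helper name `check_syntax` is hypothetical and not part of the repository. It only confirms that each file compiles to bytecode.

```python
import py_compile


def check_syntax(paths):
    """Return a dict mapping each path to None (compiles) or an error message."""
    results = {}
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing to stderr
            py_compile.compile(str(path), doraise=True)
            results[str(path)] = None
        except (py_compile.PyCompileError, OSError) as exc:
            # OSError covers missing or unreadable files
            results[str(path)] = str(exc)
    return results


if __name__ == "__main__":
    files = ["app.py", "vace_integration.py", "config.py", "model_download.py"]
    for name, error in check_syntax(files).items():
        print(f"{name}: {'OK' if error is None else error}")
```

A file that is missing is reported as an error rather than aborting the whole check.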

Architecture Overview

The codebase follows a clean separation of concerns:

app.py - Gradio web UI layer
- Handles user interactions and file uploads
- Manages temporary file operations
- Calls VACEProcessor for video generation
vace_integration.py - Core processing logic
- VACEProcessor class orchestrates the entire pipeline
- Creates template videos with gray frame interpolation
- Invokes the Wan2.1-VACE model via subprocess
- Handles mask generation and frame processing
config.py - Centralized configuration
- All paths, parameters, and settings
- Model configuration (resolution, fps, frame count)
- Directory management
model_download.py - Model management
- Downloads Wan2.1-VACE-1.3B from Hugging Face
- Validates model integrity
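
To illustrate the "centralized configuration" idea, here is a minimal sketch of how the model settings might be grouped. The class name `VideoConfig` and the `num_frames` value are assumptions for illustration only; the resolution and fps come from the project overview, and the actual contents of config.py may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoConfig:
    """Hypothetical container for the model settings (not the real config.py)."""

    width: int = 512       # stated in the project overview
    height: int = 512      # stated in the project overview
    fps: int = 24          # stated in the project overview
    num_frames: int = 48   # placeholder; the real frame count is not stated here

    @property
    def duration_seconds(self) -> float:
        # Clip length follows directly from frame count and frame rate
        return self.num_frames / self.fps


config = VideoConfig()
print(config.duration_seconds)  # 2.0
```

Making the dataclass frozen keeps the settings read-only, so other modules cannot change them by accident at runtime.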

Key Technical Details

Model: Wan2.1-VACE-1.3B (1.3 billion parameters)
GPU Requirements: NVIDIA A10 or better (24GB VRAM)
Processing Flow:
1. User uploads 3 images (reference face, start frame, end frame)
2. System creates a template video with gray frames between start/end
3. VACE model interpolates frames while preserving face identity
4. Generated video is saved to /results/ directory
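
Step 2 of the flow above can be sketched with NumPy. This is an illustration under stated assumptions, not the repository's implementation: the function name `build_template`, the gray value 127, and the mask polarity (0 = keep the given frame, 255 = let the model generate) are all assumed here.

```python
import numpy as np

GRAY = 127  # assumed mid-gray placeholder for frames the model should fill in


def build_template(start, end, num_frames):
    """Return (frames, masks) for a start/end interpolation template.

    frames: uint8 array (num_frames, H, W, 3); the first and last frames are
            the inputs, every frame in between is flat gray.
    masks:  uint8 array (num_frames, H, W); 0 = keep frame, 255 = generate.
    """
    if start.shape != end.shape:
        raise ValueError("start and end frames must have the same shape")
    if num_frames < 2:
        raise ValueError("need at least two frames")
    h, w, _ = start.shape
    frames = np.full((num_frames, h, w, 3), GRAY, dtype=np.uint8)
    masks = np.full((num_frames, h, w), 255, dtype=np.uint8)
    frames[0], frames[-1] = start, end
    masks[0] = 0
    masks[-1] = 0
    return frames, masks


# Example: one second of video at 24 fps between a black and a white frame
start = np.zeros((512, 512, 3), dtype=np.uint8)
end = np.full((512, 512, 3), 255, dtype=np.uint8)
frames, masks = build_template(start, end, num_frames=24)
```

Only the two endpoint frames carry real pixels; the mask tells the model which frames it is free to synthesize.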

Important Directories

/cache/ - Model and framework caches (huggingface, torch, transformers)
/workspace/ - Temporary processing directory
/results/ - Output videos
/examples/ - Sample images for demo
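
A minimal sketch of how this directory layout could be created and wired to the framework caches. The helper names `prepare_directories` and `cache_environment` are hypothetical; `HF_HOME` and `TORCH_HOME` are the standard cache environment variables for Hugging Face and PyTorch, but whether config.py sets them this way is an assumption.

```python
from pathlib import Path


def prepare_directories(root):
    """Create the four project directories under root and return their paths."""
    root = Path(root)
    dirs = {name: root / name for name in ("cache", "workspace", "results", "examples")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return dirs


def cache_environment(cache_dir):
    """Return environment variables pointing the framework caches at cache_dir."""
    cache_dir = Path(cache_dir)
    return {
        "HF_HOME": str(cache_dir / "huggingface"),
        "TORCH_HOME": str(cache_dir / "torch"),
    }
```

The returned mapping can be merged into `os.environ` before the model libraries are imported, or passed as `env` to a subprocess.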

Notes for Development

The project uses subprocess calls to invoke the model, not direct Python imports
All configuration is centralized in config.py - modify settings there
The Gradio interface is designed for Hugging Face Spaces deployment
Japanese documentation files (構築ガイド.md, 作業タスク.md) contain implementation details
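
Because the model is invoked through subprocess calls rather than imports, failures surface as exit codes and stderr text instead of Python exceptions. The following is a minimal sketch of a wrapper that turns them back into exceptions; the function name `run_model_command` is hypothetical, and the command shown is a stand-in, since the real script name and arguments are not given here.

```python
import subprocess
import sys


def run_model_command(cmd, timeout=None):
    """Run a command in a child process and return its stdout.

    Raises RuntimeError carrying the captured stderr if the exit code is non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed ({result.returncode}): {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in command for illustration; the real call would launch the model script
output = run_model_command([sys.executable, "-c", "print('generated')"])
print(output.strip())
```

Passing the command as a list avoids shell quoting problems with uploaded file paths.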

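Voice Notifications

Whenever you ask the user for input, that is, every time your response ends, always use the say command to speak the response message. Keep the wording as short and direct as possible. To avoid timeouts, 5 characters or fewer is recommended.

Examples: say "完了" (done), say "エラー" (error), say "準備OK" (ready)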