### 🎥 WanVideo Model Suite
**Combined & Quantized Models for ComfyUI Workflows**
*Derived from `Wan-AI/Wan2.1-VACE-14B`*
---
## 📌 Overview
This repository provides optimized models for [**WanVideo**](https://github.com/kijai/ComfyUI-WanVideoWrapper), a high-fidelity video generation framework. The models are quantized to balance performance and resource efficiency while retaining visual quality, and are designed for seamless integration with ComfyUI via:
- **[WanVideo Wrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)** (third-party extension)
- Native **WanVideo nodes** in ComfyUI
---
## 🧩 Key Components
### 1. **Core Diffusion Models**
| File | Precision | Description |
|------|-----------|-------------|
| `wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors` | FP8 (e4m3fn) | Base image-to-video generation model (14B parameters, 720p). |
| `fantasytalking_fp16.safetensors` | FP16 | Specialized model for expressive dialogue animation. |

### 2. **Text & Vision Encoders**
| File | Type | Role |
|------|------|------|
| `umt5-xxl-enc-bf16.safetensors` | Text Encoder (UMT5-XXL) | BF16 precision for multilingual text understanding. |
| `clip_vision_h.safetensors` | Vision Encoder | Processes visual inputs for conditional generation. |
---
## 🛠 ComfyUI Setup Guide
Place the files in these directories within your ComfyUI installation:
```text
models/
├── diffusion_models/
│   ├── wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors
│   └── fantasytalking_fp16.safetensors
├── clip_vision/
│   └── clip_vision_h.safetensors
└── text_encoders/
    └── umt5-xxl-enc-bf16.safetensors
```
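The layout above can be prepared in advance with a short shell sketch (folder names are taken from the tree; the `.safetensors` files themselves still need to be downloaded separately):

```shell
# Create the ComfyUI model folders expected by the WanVideo workflow.
# Run from the root of your ComfyUI installation.
mkdir -p models/diffusion_models models/clip_vision models/text_encoders

# Confirm the folders exist before moving the downloaded .safetensors files in.
for d in diffusion_models clip_vision text_encoders; do
  [ -d "models/$d" ] && echo "ok: models/$d"
done
```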
---
## 🔗 Dependencies & Resources
1. **Vision Encoder**
   - Download `clip_vision_h.safetensors` from:
     [Comfy-Org/Wan_2.1_ComfyUI_repackaged](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision)
2. **FantasyTalking Model**
   - Source code & usage: [GitHub repository](https://github.com/Fantasy-AMAP/fantasy-talking)
3. **Base Model**
   - Full-precision version: [Wan-AI/Wan2.1-VACE-14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)
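The vision-encoder download above can also be scripted; a minimal sketch using `huggingface_hub` (`pip install huggingface_hub`). The repo id and file path come from the download link, while `LOCAL_DIR` is an assumed ComfyUI location to adjust for your installation:

```python
from huggingface_hub import hf_hub_download

REPO_ID = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
FILENAME = "split_files/clip_vision/clip_vision_h.safetensors"
LOCAL_DIR = "ComfyUI/models/clip_vision"  # hypothetical install path -- adjust


def fetch_clip_vision(local_dir: str = LOCAL_DIR) -> str:
    """Download clip_vision_h.safetensors and return the local file path.

    Requires network access; not called at import time.
    """
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```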
---
## 💡 Usage Notes
- **Quantization Benefits**: FP8 reduces weight VRAM usage by ~50% vs FP16, enabling 720p generation on consumer GPUs.
- **Workflow Compatibility**: Combine with `Text-to-Video`, `Image-to-Video`, or `FantasyTalking` nodes in ComfyUI.
- **Multilingual Inputs**: The UMT5-XXL encoder supports multilingual prompts (e.g., English, Chinese).
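The ~50% figure follows directly from bytes per parameter; a back-of-the-envelope sketch (weights only, ignoring activations, KV caches, and framework overhead):

```python
# Rough weight-memory estimate for a 14B-parameter model at two precisions.
PARAMS = 14e9      # 14 billion parameters
BYTES_FP16 = 2     # FP16/BF16: 2 bytes per parameter
BYTES_FP8 = 1      # FP8 (e4m3fn): 1 byte per parameter


def weight_gib(params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB."""
    return params * bytes_per_param / 2**30


fp16_gib = weight_gib(PARAMS, BYTES_FP16)  # ~26 GiB
fp8_gib = weight_gib(PARAMS, BYTES_FP8)    # ~13 GiB
print(f"FP16: {fp16_gib:.1f} GiB, FP8: {fp8_gib:.1f} GiB")
```

Actual VRAM use is higher in practice, but the halving of the weight footprint is what makes 720p generation feasible on consumer GPUs.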
---
## ⚖️ License
*Inherited from the parent models (see the [Wan-AI license](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)). Non-commercial/research use is recommended pending verification.*
---
**✨ Pro Tip**: For best results, pair with WanVideo's temporal-consistency modules to reduce frame flickering in long sequences.
---
*Model card curated by the ComfyUI community. Maintained for reproducibility and ease of deployment.*