### 🚀 WanVideo Model Suite
**Combined & Quantized Models for ComfyUI Workflows**
*Derived from `Wan-AI/Wan2.1-VACE-14B`*
---
## 📋 Overview
This repository provides optimized models for [**WanVideo**](https://github.com/kijai/ComfyUI-WanVideoWrapper), a high-fidelity video generation framework. Models are quantized to balance performance and resource efficiency while retaining visual quality. They are designed to integrate with ComfyUI via:
- **[WanVideo Wrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)** (Third-party extension)
- Native **WanVideo nodes** in ComfyUI
---
## 🔧 Key Components
### 1. **Core Diffusion Models**
| File | Precision | Description |
|------|-----------|-------------|
| `wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors` | FP8 (quantized) | Base video generation model (14B params, 720p). |
| `fantasytalking_fp16.safetensors` | FP16 | Specialized model for expressive dialogue animation. |
### 2. **Text & Vision Encoders**
| File | Type | Role |
|------|------|------|
| `umt5-xxl-enc-bf16.safetensors` | Text Encoder (UMT5-XXL) | BF16 precision for multilingual text understanding. |
| `clip_vision_h.safetensors` | Vision Encoder | Processes visual inputs for conditional generation. |
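
To confirm that a downloaded file really uses the precision listed in the tables above, its header can be read without loading any tensors. The sketch below relies only on the published `.safetensors` layout (an 8-byte little-endian header length followed by a JSON header); the function names are illustrative and not part of ComfyUI or the WanVideo Wrapper.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def dtype_counts(path):
    """Count tensors per dtype string, e.g. 'F8_E4M3', 'F16' or 'BF16'."""
    header = read_safetensors_header(path)
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )
```

Calling `dtype_counts("fantasytalking_fp16.safetensors")` should report mostly `F16` entries if the file matches its name. Quantized checkpoints often keep a few layers in higher precision, so a mixed result is not necessarily an error.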
---
## 📁 ComfyUI Setup Guide
Place files in these directories within your ComfyUI installation:
```bash
models/
├── diffusion_models/
│   ├── wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors
│   └── fantasytalking_fp16.safetensors
├── clip_vision/
│   └── clip_vision_h.safetensors
└── text_encoders/
    └── umt5-xxl-enc-bf16.safetensors
```
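
A quick way to check the layout above is to test for each expected file. This is a minimal sketch using only the standard library; `missing_model_files` is a hypothetical helper name, and the ComfyUI root path is whatever your installation uses.

```python
from pathlib import Path

# Relative locations copied from the directory tree above.
EXPECTED_FILES = [
    "diffusion_models/wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors",
    "diffusion_models/fantasytalking_fp16.safetensors",
    "clip_vision/clip_vision_h.safetensors",
    "text_encoders/umt5-xxl-enc-bf16.safetensors",
]


def missing_model_files(comfy_root):
    """Return the expected files that are absent under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [rel for rel in EXPECTED_FILES if not (models_dir / rel).is_file()]
```

For example, `missing_model_files("/path/to/ComfyUI")` returns an empty list once all four files are in place.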
---
## 🔗 Dependencies & Resources
1. **Vision Encoder Resources**
   - Download `clip_vision_h.safetensors` from [Comfy-Org/Wan_2.1_ComfyUI_repackaged](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision)
2. **FantasyTalking Model**
   - Source code & usage: [GitHub Repository](https://github.com/Fantasy-AMAP/fantasy-talking)
3. **Base Model**
   - Full precision version: [Wan-AI/Wan2.1-VACE-14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)
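
The vision encoder download can also be scripted. The sketch below assumes the `huggingface_hub` package is installed and that the file sits at `split_files/clip_vision/clip_vision_h.safetensors` in the linked repository (inferred from the folder URL above, not verified here). The helper names are illustrative.

```python
import shutil
from pathlib import Path

# Repository and folder taken from the link above; the exact file location
# inside the repository is an assumption based on that folder URL.
CLIP_VISION_REPO = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
CLIP_VISION_SUBFOLDER = "split_files/clip_vision"
CLIP_VISION_FILE = "clip_vision_h.safetensors"


def clip_vision_target(comfy_root):
    """Destination path of the vision encoder inside a ComfyUI installation."""
    return Path(comfy_root) / "models" / "clip_vision" / CLIP_VISION_FILE


def fetch_clip_vision(comfy_root):
    """Download the encoder into the Hub cache, then copy it into ComfyUI."""
    # Imported lazily so clip_vision_target() works without the package.
    from huggingface_hub import hf_hub_download

    cached = hf_hub_download(
        repo_id=CLIP_VISION_REPO,
        filename=CLIP_VISION_FILE,
        subfolder=CLIP_VISION_SUBFOLDER,
    )
    target = clip_vision_target(comfy_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(cached, target)  # copies file contents, following symlinks
    return target
```

Copying from the cache keeps the repository's `split_files/` folder structure out of the ComfyUI `models/` tree.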
---
## 💡 Usage Notes
- **Quantization Benefits**: FP8 stores each weight in 1 byte instead of the 2 bytes used by FP16, cutting weight memory by ~50% and making 720p generation more practical on consumer GPUs.
- **Workflow Compatibility**: Combine with `Text-to-Video`, `Image-to-Video`, or `FantasyTalking` nodes in ComfyUI.
- **Multilingual Prompts**: The UMT5-XXL text encoder supports multilingual prompts (e.g., English, Chinese).
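
The ~50% figure in the quantization note follows from bytes per parameter. Here is a rough back-of-the-envelope sketch that treats "14B" as exactly 14e9 parameters (an approximation) and counts only the diffusion model weights, not activations, the text and vision encoders, or other runtime buffers.

```python
# Storage cost per parameter for each precision.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params, precision):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


NUM_PARAMS = 14e9  # assumption: "14B" taken as exactly 14 billion

fp16_gb = weight_memory_gb(NUM_PARAMS, "fp16")
fp8_gb = weight_memory_gb(NUM_PARAMS, "fp8")
saving = 1 - fp8_gb / fp16_gb

print(f"FP16 weights: {fp16_gb:.1f} GB")  # 28.0 GB
print(f"FP8 weights:  {fp8_gb:.1f} GB")   # 14.0 GB
print(f"Reduction:    {saving:.0%}")      # 50%
```

Actual VRAM use during generation will be higher than the weight figure alone and depends on resolution, frame count, and offloading settings.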
---
## ⚖️ License
*Inherited from parent models ([Check Wan-AI License](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)). Non-commercial/research use recommended pending verification.*
---
**✨ Pro Tip**: For optimal results, pair with WanVideo’s temporal consistency modules to reduce frame flickering in long sequences.
---
*Model Card curated by the ComfyUI community. Maintained for reproducibility and ease of deployment.*