### 🚀 WanVideo Model Suite  
**Combined & Quantized Models for ComfyUI Workflows**  
*Derived from `Wan-AI/Wan2.1-VACE-14B`*

---

## 📋 Overview  
This repository provides optimized models for [**WanVideo**](https://github.com/kijai/ComfyUI-WanVideoWrapper), a high-fidelity video generation framework. The base diffusion model is quantized to FP8 to reduce memory use while retaining visual quality. The files are intended for use in ComfyUI through either:  
- **[WanVideo Wrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)** (Third-party extension)  
- Native **WanVideo nodes** in ComfyUI  

---

## 🔧 Key Components  

### 1. **Core Diffusion Models**  
| File | Precision | Description |  
|------|-----------|-------------|  
| `wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors` | FP8 (e4m3fn, quantized) | Image-to-video base model (14B parameters, 720p). |  
| `fantasytalking_fp16.safetensors` | FP16 | Specialized model for expressive dialogue animation. |  

### 2. **Text & Vision Encoders**  
| File | Type | Role |  
|------|------|------|  
| `umt5-xxl-enc-bf16.safetensors` | Text Encoder (UMT5-XXL) | Encodes multilingual text prompts; weights stored in BF16. |  
| `clip_vision_h.safetensors` | Vision Encoder | Processes visual inputs for conditional generation. |  

---

## 📁 ComfyUI Setup Guide  
Place files in these directories within your ComfyUI installation:  
```text
models/  
├── diffusion_models/  
│   ├── wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors  
│   └── fantasytalking_fp16.safetensors  
├── clip_vision/  
│   └── clip_vision_h.safetensors  
└── text_encoders/  
    └── umt5-xxl-enc-bf16.safetensors  
```
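
To confirm the layout above before launching ComfyUI, a small check along these lines may help. It is a minimal sketch, not part of ComfyUI or the WanVideo Wrapper: the function name `missing_files` and the `EXPECTED` mapping are hypothetical helpers, and the ComfyUI root path is whatever your installation uses.

```python
from pathlib import Path

# Expected files per subfolder of <ComfyUI>/models, copied from the tree above.
EXPECTED = {
    "diffusion_models": [
        "wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors",
        "fantasytalking_fp16.safetensors",
    ],
    "clip_vision": ["clip_vision_h.safetensors"],
    "text_encoders": ["umt5-xxl-enc-bf16.safetensors"],
}


def missing_files(comfyui_root):
    """Return 'subfolder/filename' entries that are not present on disk.

    Hypothetical helper: it only checks that each file exists, not that
    its contents are valid.
    """
    models_dir = Path(comfyui_root) / "models"
    return [
        f"{subfolder}/{name}"
        for subfolder, names in EXPECTED.items()
        for name in names
        if not (models_dir / subfolder / name).is_file()
    ]
```

Calling `missing_files("/path/to/ComfyUI")` returns an empty list when all four files are in place; any entries it returns name the files that still need to be copied.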

---

## 🔗 Dependencies & Resources  
1. **Vision Encoder Resources**  
   - Download `clip_vision_h.safetensors` from:  
     [Comfy-Org/Wan_2.1_ComfyUI_repackaged](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision)  
   
2. **FantasyTalking Model**  
   - Source code & usage: [GitHub Repository](https://github.com/Fantasy-AMAP/fantasy-talking)  

3. **Base Model**  
   - Full precision version: [Wan-AI/Wan2.1-VACE-14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)  
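
The vision encoder download in item 1 can also be scripted. The sketch below assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`) and that the file sits at `split_files/clip_vision/clip_vision_h.safetensors` in the linked repository, as the URL above suggests. The helper names `target_path` and `fetch_clip_vision` are hypothetical.

```python
import shutil
from pathlib import Path

REPO_ID = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
# Assumed in-repo path, inferred from the link in item 1.
REPO_FILE = "split_files/clip_vision/clip_vision_h.safetensors"


def target_path(comfyui_root):
    """Where ComfyUI expects the vision encoder, per the setup guide."""
    return Path(comfyui_root) / "models" / "clip_vision" / Path(REPO_FILE).name


def fetch_clip_vision(comfyui_root):
    """Download the encoder to the Hugging Face cache, then copy it into ComfyUI."""
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    cached = hf_hub_download(repo_id=REPO_ID, filename=REPO_FILE)
    dest = target_path(comfyui_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(cached, dest)  # copies file contents even if the cache entry is a symlink
    return dest
```

Copying from the cache avoids recreating the repository's `split_files/` subfolders inside `models/clip_vision/`.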

---

## 💡 Usage Notes  
- **Quantization Benefits**: FP8 stores each weight in 1 byte instead of the 2 bytes used by FP16, roughly halving the memory needed for model weights and making 720p generation more practical on consumer GPUs.  
- **Workflow Compatibility**: Combine with `Text-to-Video`, `Image-to-Video`, or `FantasyTalking` nodes in ComfyUI.  
- **Multilingual Prompts**: The UMT5-XXL text encoder accepts prompts in multiple languages (e.g., English, Chinese).  
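
The FP8 saving can be estimated with back-of-the-envelope arithmetic. The sketch below assumes a nominal 14 billion parameters (the real count may differ) and covers weights only; activations, the text and vision encoders, and framework overhead are not included. `weight_gib` is a hypothetical helper.

```python
# Storage cost per parameter for the precisions used in this repository.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8_e4m3fn": 1}


def weight_gib(num_params, dtype):
    """Approximate weight storage in GiB for a given parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NOMINAL_PARAMS = 14e9  # "14B" taken at face value; an assumption, not a measured count

fp16_gib = weight_gib(NOMINAL_PARAMS, "fp16")        # about 26.1 GiB
fp8_gib = weight_gib(NOMINAL_PARAMS, "fp8_e4m3fn")   # about 13.0 GiB
```

Under these assumptions the FP8 checkpoint needs about half the weight memory of an FP16 one. Total VRAM use during generation will be higher and depends on resolution and frame count.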

---

## ⚖️ License  
*Inherited from parent models ([Check Wan-AI License](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)). Non-commercial/research use recommended pending verification.*  

---

**✨ Pro Tip**: For optimal results, pair with WanVideo’s temporal consistency modules to reduce frame flickering in long sequences.  

---  
*Model Card curated by the ComfyUI community. Maintained for reproducibility and ease of deployment.*