### 🚀 WanVideo Model Suite
**Combined & Quantized Models for ComfyUI Workflows**  
*Derived from `Wan-AI/Wan2.1-VACE-14B`*

---

## 📋 Overview
This repository provides models prepared for [**WanVideo**](https://github.com/kijai/ComfyUI-WanVideoWrapper), a high-fidelity video generation framework. The main diffusion model is quantized to FP8 to balance performance and resource efficiency while retaining visual quality; the text encoder and FantasyTalking weights use BF16 and FP16 respectively. The files are designed for seamless integration with ComfyUI via:  
- **[WanVideo Wrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)** (Third-party extension)  
- Native **WanVideo nodes** in ComfyUI  

---

## 🔧 Key Components

### 1. **Core Diffusion Models**  
| File | Precision | Description |  
|------|------|-------------|  
| `wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors` | FP8 (quantized) | Base image-to-video generation model (14B params, 720p). |  
| `fantasytalking_fp16.safetensors` | FP16 | FantasyTalking weights for audio-driven talking-portrait animation, used together with the base model. |  

### 2. **Text & Vision Encoders**  
| File | Type | Role |  
|------|------|------|  
| `umt5-xxl-enc-bf16.safetensors` | Text Encoder (UMT5-XXL) | BF16 precision for multilingual text understanding. |  
| `clip_vision_h.safetensors` | Vision Encoder | Processes visual inputs for conditional generation. |  

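To confirm that a downloaded file matches the precision listed in the tables above, you can read its safetensors header without loading any weights. This is a minimal stdlib-only sketch that assumes the standard safetensors layout: an 8-byte little-endian header length, followed by a JSON header whose tensor entries carry a `dtype` string such as `F8_E4M3`, `F16`, or `BF16`. The helper name `safetensors_dtypes` and the script name `check_dtypes.py` are hypothetical. A quantized checkpoint may keep some tensors in higher precision, so expect a count per dtype rather than a single value. Run it as `python check_dtypes.py wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors`.

```python
from __future__ import annotations

import json
import struct
import sys
from collections import Counter
from pathlib import Path


def safetensors_dtypes(path: str | Path) -> Counter:
    """Count the dtypes declared in a .safetensors header without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned 64-bit little-endian integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: a UTF-8 JSON object keyed by tensor name.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor, so skip it.
    return Counter(
        info["dtype"] for name, info in header.items() if name != "__metadata__"
    )


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_dtypes.py a.safetensors [b.safetensors ...]
    for file_name in sys.argv[1:]:
        print(file_name, dict(safetensors_dtypes(file_name)))
```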
---

## πŸ“ ComfyUI Setup Guide  
Place files in these directories within your ComfyUI installation:  
```text
models/
├── diffusion_models/
│   ├── wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors
│   └── fantasytalking_fp16.safetensors
├── clip_vision/
│   └── clip_vision_h.safetensors
└── text_encoders/
    └── umt5-xxl-enc-bf16.safetensors
```
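
A quick way to catch misplaced files is to compare the tree above against your install. This is a minimal sketch: `EXPECTED_LAYOUT` and `missing_files` are hypothetical names, the mapping simply mirrors the tree, and the default `ComfyUI/models` path is a placeholder for wherever your ComfyUI `models/` folder actually lives.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Mirrors the directory tree above: subfolder of models/ -> expected file names.
EXPECTED_LAYOUT = {
    "diffusion_models": [
        "wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors",
        "fantasytalking_fp16.safetensors",
    ],
    "clip_vision": ["clip_vision_h.safetensors"],
    "text_encoders": ["umt5-xxl-enc-bf16.safetensors"],
}


def missing_files(models_dir: str | Path) -> list[Path]:
    """Return every expected model file that is absent under models_dir."""
    root = Path(models_dir)
    return [
        root / subfolder / name
        for subfolder, names in EXPECTED_LAYOUT.items()
        for name in names
        if not (root / subfolder / name).is_file()
    ]


if __name__ == "__main__":
    # Placeholder default; pass the path to your own models/ folder instead.
    target = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI/models"
    gaps = missing_files(target)
    for path in gaps:
        print("missing:", path)
    print("all files in place" if not gaps else f"{len(gaps)} file(s) missing")
```

Usage: `python check_layout.py /path/to/ComfyUI/models` (the script name is illustrative). Checking for exact file names keeps the sketch simple; if you rename files, update the mapping to match.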

---

## 🔗 Dependencies & Resources  
1. **Vision Encoder Resources**  
   - Download `clip_vision_h.safetensors` from:  
     [Comfy-Org/Wan_2.1_ComfyUI_repackaged](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision)  
   
2. **FantasyTalking Model**  
   - Source code & usage: [GitHub Repository](https://github.com/Fantasy-AMAP/fantasy-talking)  

3. **Base Model**  
   - Full precision version: [Wan-AI/Wan2.1-VACE-14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)  

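Files from the repositories listed above can be fetched over plain HTTPS using Hugging Face's `resolve` URL pattern (`https://huggingface.co/<repo_id>/resolve/<revision>/<path>`). This is a stdlib-only sketch: the helper names `hf_resolve_url` and `download` are hypothetical, and the in-repo path `split_files/clip_vision/clip_vision_h.safetensors` is inferred from the Comfy-Org link above. For example, `download("Comfy-Org/Wan_2.1_ComfyUI_repackaged", "split_files/clip_vision/clip_vision_h.safetensors", "ComfyUI/models/clip_vision")` would save the vision encoder into a placeholder ComfyUI `clip_vision/` folder. If you already have the official `huggingface_hub` client installed, its `hf_hub_download` function is the sturdier choice, since it handles caching and authentication.

```python
from __future__ import annotations

import urllib.request
from pathlib import Path


def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file stored in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download(repo_id: str, filename: str, dest_dir: str | Path,
             revision: str = "main") -> Path:
    """Download one file into dest_dir, keeping only its base name."""
    dest = Path(dest_dir) / Path(filename).name
    dest.parent.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    urllib.request.urlretrieve(hf_resolve_url(repo_id, filename, revision), dest)
    return dest
```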
---

## 💡 Usage Notes  
- **Quantization Benefits**: FP8 stores each weight in one byte instead of FP16's two, roughly halving the memory taken by the model weights and making 720p generation more practical on consumer GPUs.  
- **Workflow Compatibility**: Combine with `Text-to-Video`, `Image-to-Video`, or `FantasyTalking` nodes in ComfyUI.  
- **Multilingual Prompts**: The UMT5-XXL text encoder supports prompts in multiple languages (e.g., English, Chinese).  

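The ~50% figure follows from bytes per parameter: FP8 (`e4m3fn`) uses one byte per weight and FP16 uses two. The sketch below estimates weight storage only, using the nominal 14B parameter count from the file name; activations, the encoders, and the chosen resolution and frame count also affect VRAM, so treat the result as an estimate for the weights alone, not for the whole workflow. `BYTES_PER_PARAM` and `weight_gb` are illustrative names. Running it prints about 28 GB for FP16 and 14 GB for FP8.

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8_e4m3fn": 1}


def weight_gb(num_params: float, dtype: str) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes); weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 14e9  # nominal "14B" from the file name, not an exact count
    for dtype in ("fp16", "fp8_e4m3fn"):
        print(f"{dtype:>11}: ~{weight_gb(params, dtype):.0f} GB of weights")
```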
---

## ⚖️ License  
*License terms are inherited from the parent models ([check the Wan-AI license](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)). Until the terms are verified, non-commercial or research use is recommended.*  

---

**✨ Pro Tip**: For optimal results, pair with WanVideo's temporal consistency modules to reduce frame flickering in long sequences.  

---  
*Model Card curated by the ComfyUI community. Maintained for reproducibility and ease of deployment.*