### 🎥 WanVideo Model Suite
**Combined & Quantized Models for ComfyUI Workflows**
*Derived from `Wan-AI/Wan2.1-VACE-14B`*
---
## 📌 Overview
This repository provides optimized models for [**WanVideo**](https://github.com/kijai/ComfyUI-WanVideoWrapper), a high-fidelity video generation framework. The models are quantized to balance performance and resource efficiency while retaining visual quality, and are designed for seamless integration with ComfyUI via:
- **[WanVideo Wrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper)** (third-party extension)
- Native **WanVideo nodes** in ComfyUI
---
## 🧩 Key Components
### 1. **Core Diffusion Models**
| File | Precision | Description |
|------|-----------|-------------|
| `wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors` | FP8 (e4m3fn) | Base image-to-video generation model (14B parameters, 720p). |
| `fantasytalking_fp16.safetensors` | FP16 | Specialized model for expressive dialogue animation. |

### 2. **Text & Vision Encoders**
| File | Type | Role |
|------|------|------|
| `umt5-xxl-enc-bf16.safetensors` | Text Encoder (UMT5-XXL) | BF16 precision for multilingual text understanding. |
| `clip_vision_h.safetensors` | Vision Encoder | Processes visual inputs for conditional generation. |
---
## 🛠 ComfyUI Setup Guide
Place the files in these directories within your ComfyUI installation:
```text
models/
├── diffusion_models/
│   ├── wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors
│   └── fantasytalking_fp16.safetensors
├── clip_vision/
│   └── clip_vision_h.safetensors
└── text_encoders/
    └── umt5-xxl-enc-bf16.safetensors
```
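The layout above can be prepared in advance with a short shell sketch (folder names are taken from the tree; the `.safetensors` files themselves still need to be downloaded separately):

```shell
# Create the ComfyUI model folders expected by the WanVideo workflow.
# Run from the root of your ComfyUI installation.
mkdir -p models/diffusion_models models/clip_vision models/text_encoders

# Confirm the folders exist before moving the downloaded .safetensors files in.
for d in diffusion_models clip_vision text_encoders; do
  [ -d "models/$d" ] && echo "ok: models/$d"
done
```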
---
## 🔗 Dependencies & Resources
1. **Vision Encoder**
   - Download `clip_vision_h.safetensors` from:
     [Comfy-Org/Wan_2.1_ComfyUI_repackaged](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision)
2. **FantasyTalking Model**
   - Source code & usage: [GitHub repository](https://github.com/Fantasy-AMAP/fantasy-talking)
3. **Base Model**
   - Full-precision version: [Wan-AI/Wan2.1-VACE-14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)
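The vision-encoder download above can also be scripted; a minimal sketch using `huggingface_hub` (`pip install huggingface_hub`). The repo id and file path come from the download link, while `LOCAL_DIR` is an assumed ComfyUI location to adjust for your installation:

```python
from huggingface_hub import hf_hub_download

REPO_ID = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
FILENAME = "split_files/clip_vision/clip_vision_h.safetensors"
LOCAL_DIR = "ComfyUI/models/clip_vision"  # hypothetical install path -- adjust


def fetch_clip_vision(local_dir: str = LOCAL_DIR) -> str:
    """Download clip_vision_h.safetensors and return the local file path.

    Requires network access; not called at import time.
    """
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```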
---
## 💡 Usage Notes
- **Quantization Benefits**: FP8 reduces weight VRAM usage by ~50% vs FP16, enabling 720p generation on consumer GPUs.
- **Workflow Compatibility**: Combine with `Text-to-Video`, `Image-to-Video`, or `FantasyTalking` nodes in ComfyUI.
- **Multilingual Inputs**: The UMT5-XXL encoder supports multilingual prompts (e.g., English, Chinese).
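The ~50% figure follows directly from bytes per parameter; a back-of-the-envelope sketch (weights only, ignoring activations, KV caches, and framework overhead):

```python
# Rough weight-memory estimate for a 14B-parameter model at two precisions.
PARAMS = 14e9      # 14 billion parameters
BYTES_FP16 = 2     # FP16/BF16: 2 bytes per parameter
BYTES_FP8 = 1      # FP8 (e4m3fn): 1 byte per parameter


def weight_gib(params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB."""
    return params * bytes_per_param / 2**30


fp16_gib = weight_gib(PARAMS, BYTES_FP16)  # ~26 GiB
fp8_gib = weight_gib(PARAMS, BYTES_FP8)    # ~13 GiB
print(f"FP16: {fp16_gib:.1f} GiB, FP8: {fp8_gib:.1f} GiB")
```

Actual VRAM use is higher in practice, but the halving of the weight footprint is what makes 720p generation feasible on consumer GPUs.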
---
## ⚖️ License
*Inherited from the parent models (see the [Wan-AI license](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)). Non-commercial/research use is recommended pending verification.*
---
**✨ Pro Tip**: For best results, pair with WanVideo's temporal-consistency modules to reduce frame flickering in long sequences.
---
*Model card curated by the ComfyUI community. Maintained for reproducibility and ease of deployment.*