LiconStudio / VBVR-wan2.2-comfy-bf16

	---
	license: apache-2.0
	license_name: wan-ai-license
	license_link: https://github.com/Wan-Video/Wan2.2/blob/main/LICENSE.txt
	base_model: Video-Reason/VBVR-Wan2.2
	library_name: diffusers
	tags:
	- wan2.2
	- i2v
	- fp8
	- comfyui
	- video-generation
	- surgical-quant
	---
	# Wan2.2-I2V-14B: HiFi-Surgical-FP8 & BF16 (ComfyUI Optimized)

	This model follows the Wan-AI Software License Agreement. Please refer to the original repository for usage restrictions.

This repository provides two versions of Wan2.2-I2V-14B prepared for ComfyUI: a standard BF16 version and a HiFi-Surgical-FP8 mixed-precision version.

	* Original Project: [Video-Reason Wan2.2](https://video-reason.com/)
	* Original Weights: [HuggingFace - VBVR-Wan2.2](https://huggingface.co/Video-Reason/VBVR-Wan2.2)

	---

	## 💎 The HiFi-Surgical Optimization Strategy

Generic "one-click" quantization scripts convert layers indiscriminately and often cause visible degradation with Wan2.2. Our HiFi-Surgical-FP8 version instead decides layer by layer, based on measured diagnostics, which weights can be stored in FP8 and which must stay in BF16.

	### 1. Layer-Wise SNR Calibration
We analyzed all 406 linear weight tensors of the FP32 master weights. A layer was converted to FP8 only if the signal-to-noise ratio (SNR) of its FP8-quantized weights, relative to the original weights, exceeded 31.5 dB; all other layers stay in BF16. This bounds the quantization error of every layer that is stored in FP8.
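The SNR criterion can be sketched as follows. This is a minimal pure-Python illustration, not the script used to build this repository. It assumes a single per-tensor scale that maps the largest absolute weight to 448 (the largest finite `float8_e4m3fn` value), emulates E4M3 rounding in software rather than casting with PyTorch's `torch.float8_e4m3fn` dtype, and defines SNR as 10·log10(Σw² / Σ(w − ŵ)²). The function names are hypothetical.

```python
import math
import random

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest float8_e4m3fn value, saturating at +/-448.

    E4M3: 4 exponent bits (bias 7), 3 mantissa bits, smallest normal 2**-6,
    subnormal spacing 2**-9. Python's round() rounds halves to even.
    """
    if x == 0.0 or math.isnan(x):
        return x
    a = min(abs(x), FP8_E4M3_MAX)
    _, e = math.frexp(a)            # a = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, -6)       # subnormals share the 2**-6 exponent
    step = 2.0 ** (exponent - 3)    # spacing of representable values here
    q = min(round(a / step) * step, FP8_E4M3_MAX)
    return math.copysign(q, x)


def fp8_roundtrip_snr_db(weights: list[float]) -> float:
    """SNR in dB between weights and their per-tensor-scaled FP8 round trip."""
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return math.inf
    scale = amax / FP8_E4M3_MAX     # assumption: maps the largest |w| to 448
    signal = sum(w * w for w in weights)
    noise = sum((w - quantize_e4m3(w / scale) * scale) ** 2 for w in weights)
    return math.inf if noise == 0.0 else 10.0 * math.log10(signal / noise)


SNR_THRESHOLD_DB = 31.5  # threshold stated on this model card

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic Gaussian stand-in for one weight tensor (not real model data).
    w = [rng.gauss(0.0, 0.02) for _ in range(10_000)]
    snr = fp8_roundtrip_snr_db(w)
    print(f"SNR = {snr:.1f} dB -> {'FP8' if snr > SNR_THRESHOLD_DB else 'BF16'}")
```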

	### 2. High-Outlier Protection
Some Wan2.2 weight tensors are "fragile": they contain sharp numerical peaks that make them sensitive to FP8 conversion. We compute an Outlier Index for every layer (the maximum absolute weight divided by the standard deviation) and keep layers with an index above 12 in BF16. This targets the "sparkle" noise and flickering artifacts common in standard FP8 conversions.
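A sketch of the Outlier Index test follows. It assumes "Max" means the largest absolute weight and "Std" the population standard deviation over all elements of the tensor; the card does not spell out either, and the function names are hypothetical.

```python
import math

OUTLIER_THRESHOLD = 12.0  # threshold stated on this model card


def outlier_index(weights: list[float]) -> float:
    """Largest absolute weight divided by the population standard deviation."""
    n = len(weights)
    mean = sum(weights) / n
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / n)
    amax = max(abs(w) for w in weights)
    if std == 0.0:
        # Constant tensor: no spread, so nothing counts as an outlier here.
        return 0.0
    return amax / std


def keep_in_bf16_for_outliers(weights: list[float]) -> bool:
    """True if the tensor's outlier index exceeds the threshold."""
    return outlier_index(weights) > OUTLIER_THRESHOLD
```

A roughly Gaussian tensor has an index of about 4 to 5 (its maximum sits a few standard deviations out), while a single large spike among near-zero values pushes the index far above 12.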

	### 3. Structural Integrity (Blocks 30-39)
The cross-attention layers in the final blocks of the DiT architecture (blocks 30-39) are always kept in BF16, independent of the SNR and outlier checks. Keeping these layers at full precision is intended to protect prompt adherence and temporal consistency.
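The three rules can be combined into a single per-layer decision. This sketch assumes Wan-style state-dict keys of the form `blocks.<i>.cross_attn.<q|k|v|o>.weight`; that key naming and the function names are illustrative assumptions, not details taken from this repository.

```python
import re

SNR_THRESHOLD_DB = 31.5           # rule 1 (from the card)
OUTLIER_THRESHOLD = 12.0          # rule 2 (from the card)
PROTECTED_BLOCKS = range(30, 40)  # rule 3: blocks 30-39

# Assumed Wan-style key naming, e.g. "blocks.35.cross_attn.q.weight".
_CROSS_ATTN_KEY = re.compile(r"^blocks\.(\d+)\.cross_attn\.")


def is_protected_cross_attn(key: str) -> bool:
    """True for cross-attention tensors in blocks 30-39."""
    m = _CROSS_ATTN_KEY.match(key)
    return m is not None and int(m.group(1)) in PROTECTED_BLOCKS


def choose_dtype(key: str, snr_db: float, outlier_idx: float) -> str:
    """Return "fp8_e4m3fn" only if the layer passes all three rules, else "bf16"."""
    passes_all = (
        snr_db > SNR_THRESHOLD_DB
        and outlier_idx <= OUTLIER_THRESHOLD
        and not is_protected_cross_attn(key)
    )
    return "fp8_e4m3fn" if passes_all else "bf16"
```

Because each rule can only keep a layer in BF16, a layer ends up in FP8 exactly when it passes all three, so the order in which the rules are checked does not matter.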

	---

	## 📊 Comparison & Specs

| Feature | Standard BF16 | HiFi-Surgical-FP8 (Recommended) |
| :--- | :--- | :--- |
| File Size | ~27.2 GB | ~22.4 GB |
| Precision | Pure BF16 | Hybrid FP8-E4M3 / BF16 |
| VRAM Requirement | 24 GB+ | 16-24 GB |
| Visual Fidelity | Reference Grade | 99% Reference Match |
| Inference Speed | Base Speed | Accelerated on Blackwell/Hopper |
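As a rough consistency check, the two file sizes in the table imply what fraction of the parameters is stored in FP8. This back-of-the-envelope estimate assumes every parameter takes 2 bytes in BF16 and 1 byte in FP8, and ignores the per-tensor scale values and the file header; $P$ (total parameters) and $F$ (parameters stored in FP8) are symbols introduced here for illustration.

$$
S_{\text{BF16}} = 2P, \qquad S_{\text{hybrid}} = 2(P - F) + F = 2P - F
\;\Longrightarrow\;
\frac{F}{P} = \frac{2\,(S_{\text{BF16}} - S_{\text{hybrid}})}{S_{\text{BF16}}} \approx \frac{2 \times (27.2 - 22.4)}{27.2} \approx 0.35
$$

Under these assumptions roughly a third of the parameters are in FP8, which is consistent with the hybrid file being only about 18% smaller than the BF16 one.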

	---

	## 🛠️ ComfyUI Integration & Usage

	These models are specifically converted and tested for ComfyUI.

1. Native Scaling Support: Every quantized tensor is stored together with a `scale_weight` entry holding its scale factor. This lets ComfyUI loaders dequantize the weights correctly and use hardware-level FP8 scaling on NVIDIA Blackwell (RTX 50-series) and Hopper GPUs for maximum speed.
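2. How to Use:
   * Place the `.safetensors` file in your `ComfyUI/models/diffusion_models/` folder.
   * Load it with the Load Diffusion Model node (`UNETLoader`). `CheckpointLoaderSimple` only lists full checkpoints in `ComfyUI/models/checkpoints/`, so it will not see this file.
   * Make sure your ComfyUI is up to date so that it supports the `float8_e4m3fn` type.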
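Conceptually, a scaled FP8 weight is a pair: FP8 values `w_fp8 = round_e4m3(w / s)` and a per-tensor scale `s`, from which a loader reconstructs `w ≈ w_fp8 * s`. The sketch below stores `s` under a `<prefix>.scale_weight` key next to `<prefix>.weight`. That key layout, the choice `s = max|w| / 448`, and the helper names are assumptions for illustration; ComfyUI's actual loader code is not reproduced here.

```python
import math

E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def round_e4m3(x: float) -> float:
    """Software emulation of rounding to float8_e4m3fn (saturating at +/-448)."""
    if x == 0.0:
        return x
    a = min(abs(x), E4M3_MAX)
    exponent = max(math.frexp(a)[1] - 1, -6)  # subnormals share 2**-6
    step = 2.0 ** (exponent - 3)              # 3 mantissa bits
    return math.copysign(min(round(a / step) * step, E4M3_MAX), x)


def quantize_entry(key: str, weights: list[float]) -> dict:
    """Hypothetical writer: FP8 values under `key`, scale under `<prefix>.scale_weight`."""
    if not key.endswith(".weight"):
        raise ValueError(f"expected a '.weight' key, got {key!r}")
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0.0 else 1.0
    prefix = key[: -len("weight")]  # keeps the trailing dot
    return {
        key: [round_e4m3(w / scale) for w in weights],
        prefix + "scale_weight": scale,
    }


def dequantize_entry(state: dict, key: str) -> list[float]:
    """Reconstruct approximate weights: stored FP8 value times scale_weight."""
    scale = state[key[: -len("weight")] + "scale_weight"]
    return [q * scale for q in state[key]]


if __name__ == "__main__":
    sd = quantize_entry("blocks.0.ffn.0.weight", [0.3, -0.01, 0.07])
    print(sd)
    print(dequantize_entry(sd, "blocks.0.ffn.0.weight"))
```

With 3 mantissa bits, each reconstructed weight is within 1/16 of its original magnitude, which is what the SNR and outlier checks above are screening for.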

	---

	## 📝 Diagnostic Methodology

The precision of each layer in the HiFi version was chosen using the diagnostics above, with the following results:
	* Total Layers Scanned: 406
	* FP8 Layers: 184 (Non-sensitive FFN & Attention layers)
	* BF16 Layers: 222 (Sensitive Cross-Attention & Outlier-heavy layers)
	* Target Hardware: Optimized for RTX 4090, 5090, and H100/H200.
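To check layer counts like these on a downloaded file, the `.safetensors` header can be read without loading any tensor data: the format begins with an 8-byte little-endian length followed by a JSON header listing each tensor's `dtype` and `shape`. This sketch counts 2-D `*.weight` tensors by dtype as a proxy for linear layers. That proxy is an assumption, since the card does not say exactly how layers were counted, so the totals may not match the figures above one-to-one. FP8 tensors appear as `F8_E4M3` and BF16 tensors as `BF16`.

```python
import json
import struct
import sys
from collections import Counter


def parse_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header from the leading bytes of a .safetensors file."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)  # little-endian u64 length
    return json.loads(blob[8 : 8 + header_len])


def read_safetensors_header(path: str) -> dict:
    """Read only the header from disk; tensor data is never loaded."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_weight_dtypes(header: dict) -> Counter:
    """Count dtypes of 2-D '*.weight' tensors, used here as a proxy for linear layers."""
    counts = Counter()
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string metadata
            continue
        if name.endswith(".weight") and len(info["shape"]) == 2:
            counts[info["dtype"]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    print(dict(count_weight_dtypes(read_safetensors_header(sys.argv[1]))))
```

Run it as `python count_dtypes.py <path-to-model>.safetensors`; the script name is hypothetical.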