Can this model be quantized to FP8 scale?
Hi, is it possible to make an FP8 scaled version of this model? Would love to see an FP8 variant for lower VRAM usage. Thanks!
Based on my extensive SNR (signal-to-noise ratio) diagnostic testing, I have found that this model's architecture is exceptionally sensitive to quantization noise; it is essentially 'too weak' to tolerate aggressive offline FP8 quantization across all layers.
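For anyone who wants to reproduce the diagnostic, here is a minimal sketch of the per-layer check I mean, assuming PyTorch 2.1+ (which ships the float8 dtypes); the checkpoint path is a placeholder, not the actual file name:

```python
import torch

def fp8_snr_db(weight: torch.Tensor) -> float:
    """Round-trip a weight tensor through FP8 E4M3 and report the SNR in dB."""
    w = weight.float()
    # Per-tensor scale so the largest magnitude lands near the E4M3 max (448).
    scale = w.abs().max().clamp(min=1e-12) / 448.0
    w_restored = (w / scale).to(torch.float8_e4m3fn).float() * scale
    noise = w - w_restored
    return (10.0 * torch.log10(w.pow(2).mean() / noise.pow(2).mean().clamp(min=1e-20))).item()

state_dict = torch.load("checkpoint.pt", map_location="cpu")  # placeholder path
for name, w in state_dict.items():
    if w.ndim >= 2:  # weight matrices only; biases and norms stay high precision
        print(f"{name}: {fp8_snr_db(w):.1f} dB")  # low dB = quantization-sensitive layer
```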
I have determined that 21GB is the practical lower limit for a pre-quantized model if we want to preserve its original generative capabilities and motion fidelity. Any further compression via offline methods causes a significant loss of capability.
If your hardware requires an even smaller footprint, I strongly recommend Kijai's ComfyUI nodes for real-time (on-the-fly) quantization. Inference is slightly slower, but dynamic quantization handles the model's weights much more gracefully than a static offline export and yields far better visual results.
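To make the offline-vs-dynamic distinction concrete, here is a rough PyTorch sketch of the on-the-fly idea. The class is my own illustration, not Kijai's actual node code; the real nodes make finer-grained per-layer decisions:

```python
import torch

class Fp8Linear(torch.nn.Module):
    """Store weights in FP8 to save VRAM; upcast just-in-time for each matmul."""
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Map the largest weight magnitude to the E4M3 maximum (448).
        scale = weight.abs().max().float().clamp(min=1e-12) / 448.0
        self.register_buffer("w_fp8", (weight.float() / scale).to(torch.float8_e4m3fn))
        self.register_buffer("scale", scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize just-in-time: storage stays FP8, but the matmul itself
        # runs in bf16, so activations keep full precision throughout.
        w = self.w_fp8.to(torch.bfloat16) * self.scale.to(torch.bfloat16)
        return x @ w.t()
```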
Hello, is it possible to load this model with the UNet loader in ComfyUI and apply FP8 quantization there? And could it be quantized to NVFP4? I have a 50-series graphics card, and that format is faster.
Yes. Loading the model through Kijai's nodes and selecting FP8 E4M3 performs real-time quantization down to an FP8 footprint.
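For context on what E4M3 means numerically, the dtype's limits are easy to inspect; this comes straight from torch.finfo, nothing model-specific:

```python
import torch

fi = torch.finfo(torch.float8_e4m3fn)
print(fi.max, fi.min, fi.eps)  # 448.0 -448.0 0.125
# An eps of 0.125 is roughly one significant decimal digit per value,
# which is why per-tensor scaling and careful layer selection matter.
```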
In my opinion, NVFP4 quantization is impractical for video models. It does let the model run on hardware with very limited VRAM, but the loss in precision is unacceptable. That is especially true for Wan2.1, which my testing has shown to be extremely 'fragile': any level of quantization produces visible quality degradation.