Can this model be quantized to FP8 scale?
Hi, is it possible to make an FP8 scaled version of this model? Would love to see an FP8 variant for lower VRAM usage. Thanks!
Based on my extensive SNR (signal-to-noise ratio) diagnostic testing, I have found that this model's architecture is exceptionally sensitive to quantization noise; it is essentially 'too weak' to tolerate aggressive offline FP8 quantization across all layers.
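For anyone who wants to reproduce the diagnostic, here is a minimal sketch of the per-layer check I mean, assuming PyTorch 2.1+ (which ships the float8 dtypes); the checkpoint path is a placeholder, not the actual file name:

```python
import torch

def fp8_snr_db(weight: torch.Tensor) -> float:
    """Round-trip a weight tensor through FP8 E4M3 and report the SNR in dB."""
    w = weight.float()
    # Per-tensor scale so the largest magnitude lands near the E4M3 max (448).
    scale = w.abs().max().clamp(min=1e-12) / 448.0
    w_restored = (w / scale).to(torch.float8_e4m3fn).float() * scale
    noise = w - w_restored
    return (10.0 * torch.log10(w.pow(2).mean() / noise.pow(2).mean().clamp(min=1e-20))).item()

state_dict = torch.load("checkpoint.pt", map_location="cpu")  # placeholder path
for name, w in state_dict.items():
    if w.ndim >= 2:  # weight matrices only; biases and norms stay high precision
        print(f"{name}: {fp8_snr_db(w):.1f} dB")  # low dB = quantization-sensitive layer
```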
I have determined that 21GB is the practical lower limit for a pre-quantized model if we want to preserve its original generative capabilities and motion fidelity. Any further compression via offline methods causes a significant loss of capability.
If your hardware requires an even smaller footprint, I strongly recommend Kijai's ComfyUI nodes for real-time (on-the-fly) quantization. Inference is slightly slower, but dynamic quantization handles the model's weights much more gracefully than a static offline export and yields far better visual results.
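To make the offline-vs-dynamic distinction concrete, here is a rough PyTorch sketch of the on-the-fly idea. The class is my own illustration, not Kijai's actual node code; the real nodes make finer-grained per-layer decisions:

```python
import torch

class Fp8Linear(torch.nn.Module):
    """Store weights in FP8 to save VRAM; upcast just-in-time for each matmul."""
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Map the largest weight magnitude to the E4M3 maximum (448).
        scale = weight.abs().max().float().clamp(min=1e-12) / 448.0
        self.register_buffer("w_fp8", (weight.float() / scale).to(torch.float8_e4m3fn))
        self.register_buffer("scale", scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize just-in-time: storage stays FP8, but the matmul itself
        # runs in bf16, so activations keep full precision throughout.
        w = self.w_fp8.to(torch.bfloat16) * self.scale.to(torch.bfloat16)
        return x @ w.t()
```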
Hello, is it possible to load this model with the UNet loader in ComfyUI and apply FP8 quantization there? And could it be quantized to NVFP4? I have a 50-series graphics card, and that format is faster.
Yes. Loading the model through Kijai's nodes and selecting FP8 E4M3 performs real-time quantization down to an FP8 footprint.
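For context on what E4M3 means numerically, the dtype's limits are easy to inspect; this comes straight from torch.finfo, nothing model-specific:

```python
import torch

fi = torch.finfo(torch.float8_e4m3fn)
print(fi.max, fi.min, fi.eps)  # 448.0 -448.0 0.125
# An eps of 0.125 is roughly one significant decimal digit per value,
# which is why per-tensor scaling and careful layer selection matter.
```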
In my opinion, NVFP4 quantization is impractical for video models. It does let the model run on hardware with very limited VRAM, but the loss in precision is unacceptable. That is especially true for Wan2.1, which my testing has shown to be extremely 'fragile': any level of quantization produces visible quality degradation.