SFW Quantized Models

#1
by dani2191 - opened

First of all, thank you for your work. I've been using the Q3 models with my 8 GB 3060 Ti these past few days with quite good results.
Due to my hardware limitations and use case, I wanted to make my own quantized model of the SFW version of Remix, and I stumbled upon the ComfyUI-GGUF tools + llama-quantize convert scripts. So far I've only used them to quantize my own SDXL models.
I was wondering if those are enough for quantizing Wan 2.2 models (the docs only mention Wan 2.1), or if you do it another way.
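For reference, the two-step flow I used for my SDXL quants looks roughly like this; I'm assuming the same steps carry over to Wan 2.2. File names and the quant type below are just examples, not actual release names:

```python
# Rough sketch of the ComfyUI-GGUF two-step flow (paths/names are examples).
# Step 1: tools/convert.py turns the .safetensors model into a full-precision GGUF
#         (the README example produces a "<name>-F16.gguf"; check --help to confirm).
# Step 2: the patched llama-quantize binary built per the ComfyUI-GGUF instructions
#         quantizes that GGUF to the target type.
import subprocess

SRC = "wan2.2_sfw.safetensors"   # example source checkpoint
F16 = "wan2.2_sfw-F16.gguf"      # intermediate full-precision GGUF

# Step 1: convert safetensors -> GGUF
subprocess.run(["python", "tools/convert.py", "--src", SRC], check=True)

# Step 2: quantize (usage: llama-quantize <input> <output> <quant_type>)
subprocess.run(["./llama-quantize", F16, "wan2.2_sfw-Q3_K_M.gguf", "Q3_K_M"], check=True)
```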

I also used the tools from the GGUF custom node to create these; the only extra step was that I had patientx, the ZLUDA repo owner, provide scripts that work with AMD cards through ZLUDA.
I've also created a script that runs once and produces multiple quantized versions in a single pass (see the sketch below).
Just a heads-up: you'll need a total of about 85 GB of memory (I have 32 GB of RAM + a 50 GB pagefile) to convert Wan 2.2 models to GGUF.
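The batch part is basically just a loop over llama-quantize. A trimmed-down sketch of the idea (the quant list and file names are examples, not my exact script):

```python
# Trimmed-down sketch of a batch quantization script: one F16 GGUF in, several quants out.
# Assumes the patched llama-quantize binary from the ComfyUI-GGUF instructions is built;
# quant types and paths here are just examples.
import subprocess
from pathlib import Path

F16_GGUF = Path("wan2.2_sfw-F16.gguf")           # example intermediate file
QUANTS = ["Q3_K_M", "Q4_K_M", "Q5_K_M", "Q8_0"]  # example target quant types

for quant in QUANTS:
    # Derive the output name from the F16 file, e.g. wan2.2_sfw-Q4_K_M.gguf
    out = F16_GGUF.with_name(f"{F16_GGUF.stem.removesuffix('-F16')}-{quant}.gguf")
    print(f"Quantizing -> {out}")
    subprocess.run(["./llama-quantize", str(F16_GGUF), str(out), quant], check=True)
```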

Great, I think I'll manage. I can always make a Jupyter notebook for Colab if I need to.
