What was the process to quantize?

#1
by Lockout - opened

I tried to load previous chroma I had quantized with https://github.com/silveroxides/convert_to_quant and got a big fat error.

There is a big difference between int8 tensorwise quantization (fairly new, not previously implemented in convert_to_quant) and int8 blockwise, so blockwise models are not supported in my own node. No speed-up has been possible with blockwise.
This is the branch needed for tensorwise quantization: https://github.com/silveroxides/convert_to_quant/tree/feature/int8-refactor
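For anyone unclear on the distinction: a rough sketch of the two schemes (illustrative only, not the tool's actual code). Tensorwise stores a single scale for the whole weight tensor, which maps cleanly onto fast int8 matmul kernels; blockwise stores one scale per small block of values, which preserves accuracy but makes kernel-level speed-ups much harder.

```python
import numpy as np

def quantize_int8_tensorwise(w):
    # One scale for the entire tensor.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_int8_blockwise(w, block=64):
    # One scale per contiguous block of `block` values.
    w = w.reshape(-1, block)
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

w = np.random.randn(8, 64).astype(np.float32)
qt, st = quantize_int8_tensorwise(w)   # int8 weights + 1 scale
qb, sb = quantize_int8_blockwise(w)    # int8 weights + 8 scales
```

Dequantization is just `q * scale` in either case; the difference is how many scales a kernel has to juggle per matmul tile.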

I am currently requantizing the klein models for better quality and will upload a good chroma int8 model soon.

Here is the improved command that was given to me by silver:
`convert_to_quant -i ./Chroma1-HD.safetensors -o ./Chroma1-HD-int8tensormixed.safetensors --int8 --scaling_mode tensor --distillation_large --comfy_quant --save-quant-metadata --optimizer radam --calib_samples 8192 --num_iter 3000 --lr "7.1267000000029e-4" --top_p 0.2 --min_k 256 --max_k 1024 --lr_schedule plateau --lr_patience 5 --lr_factor 0.92 --lr_min "1e-9" --lr_cooldown 1 --lr_threshold 1e-8 --early-stop-lr 8e-9 --lr-shape-influence 1.5 --low-memory`

I will give it a try. I think he has merged tensorwise by now. No wonder I get NaNs when trying the node on my old model. I still have problems with torch compile, though.

edit: it needs his kitchen fork too. With the converted model it seems to edge out descaled fp8 on my Turing card, and it compiles. I think quality is slightly better.
