What was the process to quantize?
I tried to load previous chroma I had quantized with https://github.com/silveroxides/convert_to_quant and got a big fat error.
There is a big difference between int8 tensorwise (fairly new; not previously implemented in convert_to_quant) and int8 blockwise, so blockwise is not supported in my node. I have not been able to get any speed-up with blockwise.
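For context, the two schemes differ in where the quantization scale lives: tensorwise uses one scale for the whole weight tensor, blockwise one scale per small block. A minimal illustrative sketch (my own toy code, not what convert_to_quant actually does):

```python
import numpy as np

def quantize_tensorwise(w):
    # One scale for the entire tensor -> a single multiply at dequant time,
    # which is easy to fuse into a fast int8 matmul.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_blockwise(w, block=64):
    # One scale per `block` contiguous elements -> better accuracy on
    # outlier-heavy weights, but dequant needs a per-block rescale,
    # which is harder to accelerate.
    flat = w.reshape(-1, block)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid div-by-zero
    q = np.clip(np.round(flat / scales), -127, 127).astype(np.int8)
    return q.reshape(w.shape), scales
```

The stored scale layout is why a model quantized one way cannot simply be loaded by a node expecting the other.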
This is the branch needed for tensorwise quantization: https://github.com/silveroxides/convert_to_quant/tree/feature/int8-refactor
I am currently requantizing the klein models for better quality and will upload a good chroma int8 model soon.
Here is the improved command that was given to me by silver:
`convert_to_quant -i ./Chroma1-HD.safetensors -o ./Chroma1-HD-int8tensormixed.safetensors --int8 --scaling_mode tensor --distillation_large --comfy_quant --save-quant-metadata --optimizer radam --calib_samples 8192 --num_iter 3000 --lr "7.1267000000029e-4" --top_p 0.2 --min_k 256 --max_k 1024 --lr_schedule plateau --lr_patience 5 --lr_factor 0.92 --lr_min "1e-9" --lr_cooldown 1 --lr_threshold 1e-8 --early-stop-lr 8e-9 --lr-shape-influence 1.5 --low-memory`
I will give it a try. I think he has merged tensorwise by now. No wonder I got NaNs trying the node on my old model. I still have problems with torch compile, though.
edit: it needs his kitchen fork too. With the converted model it seems to edge out descaled fp8 on my Turing card, and it compiles. I think quality is slightly better.