When converting fp16/bf16 diffuser models do you prefer int8 block-wise or tensor-wise

#1
by Winnougan - opened

From my understanding, we can convert and quantize any diffuser model, like Z-Image Turbo, into INT8 with block-wise or tensor-wise scaling. Which do you prefer for performance and quality? Thanks

Tensor-wise, as it is the fastest. Block-wise ends up as slow as bf16.
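For context, the difference between the two schemes comes down to how many scale factors you store and apply. A minimal NumPy sketch (my own illustration, not the actual conversion code used by any specific tool): tensor-wise keeps one scale for the whole weight tensor, so dequantization is a single multiply, while block-wise keeps one scale per block of values, which tracks the local value range more closely but adds per-block overhead at inference time.

```python
import numpy as np

def quantize_per_tensor(w):
    # One scale for the entire tensor: cheapest to dequantize (one multiply).
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_blockwise(w, block=64):
    # One scale per block of `block` values: finer-grained, so typically
    # lower quantization error, but more scales to load and apply.
    flat = w.reshape(-1, block)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(flat / scales), -127, 127).astype(np.int8)
    return q, scales

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 64)).astype(np.float32)

qt, st = quantize_per_tensor(w)
qb, sb = quantize_blockwise(w)

# Mean absolute reconstruction error for each scheme.
err_tensor = np.abs(qt.astype(np.float32) * st - w).mean()
err_block = np.abs((qb.astype(np.float32) * sb).reshape(w.shape) - w).mean()
```

This is the usual quality-vs-speed trade-off behind the answer above: block-wise usually reconstructs the weights more accurately (smaller per-block scales mean smaller rounding steps), but every block's scale has to be fetched and applied, which is where the runtime cost comes from.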
