When converting fp16/bf16 diffuser models do you prefer int8 block-wise or tensor-wise

#1
by Winnougan - opened

From my understanding, we can convert and quantize any diffuser model, like Z-Image Turbo, into INT8 with block-wise or tensor-wise scaling. Which do you prefer for performance and quality? Thanks

Tensor-wise, as it is the fastest. Block-wise ends up as slow as bf16.
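For context, the difference between the two schemes comes down to how many scale factors you store and apply. A minimal NumPy sketch (my own illustration, not the actual conversion code used by any specific tool): tensor-wise keeps one scale for the whole weight tensor, so dequantization is a single multiply, while block-wise keeps one scale per block of values, which tracks the local value range more closely but adds per-block overhead at inference time.

```python
import numpy as np

def quantize_per_tensor(w):
    # One scale for the entire tensor: cheapest to dequantize (one multiply).
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_blockwise(w, block=64):
    # One scale per block of `block` values: finer-grained, so typically
    # lower quantization error, but more scales to load and apply.
    flat = w.reshape(-1, block)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(flat / scales), -127, 127).astype(np.int8)
    return q, scales

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 64)).astype(np.float32)

qt, st = quantize_per_tensor(w)
qb, sb = quantize_blockwise(w)

# Mean absolute reconstruction error for each scheme.
err_tensor = np.abs(qt.astype(np.float32) * st - w).mean()
err_block = np.abs((qb.astype(np.float32) * sb).reshape(w.shape) - w).mean()
```

This is the usual quality-vs-speed trade-off behind the answer above: block-wise usually reconstructs the weights more accurately (smaller per-block scales mean smaller rounding steps), but every block's scale has to be fetched and applied, which is where the runtime cost comes from.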
