FP8 Quantization?

by MoRanYue - opened Jun 3

Jun 3

10 GB may be too large for inference on low-VRAM GPUs. Will there be a mixed FP8, or even FP4, quantized version?
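For a rough sense of the potential savings, here is a back-of-the-envelope sketch. It assumes the 10 GB checkpoint stores its weights in 16 bits, which is not confirmed anywhere in this thread, and it scales size linearly with bits per weight. The names `CHECKPOINT_GB`, `ASSUMED_SOURCE_BITS`, and `estimated_size_gb` are made up for illustration.

```python
# Rough weight-memory estimate at different precisions.
# Assumption (not stated in the thread): the 10 GB checkpoint holds 16-bit weights.
CHECKPOINT_GB = 10.0
ASSUMED_SOURCE_BITS = 16  # hypothetical; the actual dtype is not confirmed here


def estimated_size_gb(target_bits: int) -> float:
    """Scale the checkpoint size linearly with the number of bits per weight."""
    return CHECKPOINT_GB * target_bits / ASSUMED_SOURCE_BITS


for name, bits in [("16-bit", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{estimated_size_gb(bits):.1f} GB")
```

This only covers the weights. Activations, per-block scaling factors, and any layers kept in higher precision in a mixed scheme would add to the real footprint, so treat the numbers as a lower bound.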

Owner Jun 4

Thanks for raising this! We haven’t tested model quantization yet, but we’ll consider providing a quantized version once its stability is confirmed.

Jun 5

@worstchan Thanks for your response. A quantized version would be great for consumer-level devices.
