Question regarding the FP8 version
#9 by thecr7guy - opened
Hello,
Great work on this amazing model. I wanted to know if there is a plan to release an FP8 version of zai-org/GLM-4.6V-Flash.
If you have enough VRAM, you don't need a separately released FP8 checkpoint: https://docs.vllm.ai/en/v0.5.4/quantization/fp8.html
Dynamic quantization of an original-precision BF16/FP16 model to FP8 can be done by vLLM itself, with no calibration data required. You can enable it by passing `--quantization="fp8"` on the command line or setting `quantization="fp8"` in the `LLM` constructor.