amd/grok-1-FP8-KV

@@ -27,7 +27,8 @@ python3 quantize_quark.py \
         --num_calib_data 128 \
         --model_export quark_safetensors \
         --multi_gpu \
-        --no_weight_matrix_merge
+        --no_weight_matrix_merge \
+        --custom_mode fp8
 ```
 ## Deployment
 Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend(vLLM-compatible).