Update README.md
README.md
````diff
@@ -33,11 +33,12 @@ You can either perform the dequantization manually using this [conversion script
 **Quantization scripts:**
 ```
 cd Quark/examples/torch/language_modeling/llm_ptq/
+
 python3 quantize_quark.py --model_dir $MODEL_DIR \
 --quant_scheme w_mxfp4_a_mxfp4 \
 --group_size 32 \
 --num_calib_data 128 \
---exclude_layers "*mlp.gate.*" "*lm_head" \
+--exclude_layers "*self_attn*" "*mlp.gate.*" "*lm_head" \
 --multi_gpu \
 --quant_algo autosmoothquant \
 --model_export hf_format \
````