Update README.md
### INT4 Packing
Every eight `int4` values are packed into a single `int32` integer following the sequence defined by `order_map = [0, 2, 4, 6, 1, 3, 5, 7]`.
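As a minimal sketch of this packing scheme (an illustrative reading, not Quark's actual implementation — it assumes `order_map[i]` names the source position of the `i`-th packed nibble, lowest bits first, and the helper name is hypothetical):

```python
def pack_int4_to_int32(vals):
    """Pack eight signed int4 values (-8..7) into one signed int32.

    Illustrative sketch only: assumes order_map[i] gives the source
    index of the i-th 4-bit field, counting from the low bits.
    """
    order_map = [0, 2, 4, 6, 1, 3, 5, 7]
    assert len(vals) == 8, "expects exactly eight int4 values"
    packed = 0
    for pos, src in enumerate(order_map):
        # Mask to a 4-bit two's-complement field, then place it.
        packed |= (vals[src] & 0xF) << (4 * pos)
    # Reinterpret the 32-bit pattern as a signed int32.
    if packed >= 1 << 31:
        packed -= 1 << 32
    return packed
```

Even/odd interleavings like this typically exist so that dequantization kernels can unpack pairs of values with cheap shifts and masks.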
## Quick Start
Follow [Quantizing Sharded Grok-1 with Quark for SGLang](https://github.com/BowenBao/sglang/blob/8939d00a41c96575971fdaf9d5bd764e28db547a/scripts/quark/README.md) to produce the quantized model using Quark.
## Deployment
Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the SGLang backend.
## Evaluation