Update README.md
### INT4 Packing
Every eight `int4` values are packed into a single `int32` integer following the sequence defined by `order_map = [0, 2, 4, 6, 1, 3, 5, 7]`.
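As a minimal sketch of this packing scheme (an illustrative reading, not Quark's actual implementation — it assumes `order_map[i]` names the source position of the `i`-th packed nibble, lowest bits first, and the helper name is hypothetical):

```python
def pack_int4_to_int32(vals):
    """Pack eight signed int4 values (-8..7) into one signed int32.

    Illustrative sketch only: assumes order_map[i] gives the source
    index of the i-th 4-bit field, counting from the low bits.
    """
    order_map = [0, 2, 4, 6, 1, 3, 5, 7]
    assert len(vals) == 8, "expects exactly eight int4 values"
    packed = 0
    for pos, src in enumerate(order_map):
        # Mask to a 4-bit two's-complement field, then place it.
        packed |= (vals[src] & 0xF) << (4 * pos)
    # Reinterpret the 32-bit pattern as a signed int32.
    if packed >= 1 << 31:
        packed -= 1 << 32
    return packed
```

Even/odd interleavings like this typically exist so that dequantization kernels can unpack pairs of values with cheap shifts and masks.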
## Quick Start
Follow [Quantizing Sharded Grok-1 with Quark for SGLang](https://github.com/BowenBao/sglang/blob/8939d00a41c96575971fdaf9d5bd764e28db547a/scripts/quark/README.md) to produce the quantized model using Quark.
## Deployment
Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the SGLang backend.
## Evaluation