Update README.md (#5)
Update README.md (1f37595e7f0ca7d27fbf9224ae97095afaedf72e)
Co-authored-by: laixinn <yuanzu@users.noreply.huggingface.co>
README.md CHANGED

@@ -16,14 +16,14 @@ In benchmarking, we observe **no accuracy loss** and up to **30\%** performance
 ## 1. Benchmarking Result (detailed in [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730)):

 | Model | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput(qps=128) | Output Throughput(bs=1) |
 |--------|--------|-------------------|----------------|------------------------------|--------------------------|
-| BF16 R1 |
-| INT8 R1 | A100\*
+| BF16 R1 | A100\*32 | 95.5 | 87.1 | 3342.29 | 37.20 |
+| INT8 R1 | (A100\*16)x2 | **95.8** | **87.1** | 4450.02 **(+33%)** | 44.18 **(+18%)** |

 ## 2. Quantization Process

 We apply INT8 quantization to the BF16 checkpoints.

-The
+The quantization scales are determined by dividing the block-wise maximum of element values by the INT8 type maximum.

 To generate this weight, run the provided script in the ``./inference`` directory:
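The scale computation added in this diff (block-wise maximum divided by the INT8 type maximum) can be sketched as follows. This is a minimal NumPy illustration, not the repository's implementation: the 128x128 block size, the symmetric absolute-max scaling, and the function names are assumptions — the actual script in ``./inference`` is authoritative.

```python
import numpy as np

INT8_MAX = 127.0
BLOCK = 128  # assumed block size; the real script defines its own

def quantize_int8_blockwise(w: np.ndarray, block: int = BLOCK):
    """Quantize a 2-D float weight to INT8 with one scale per block.

    scale = max(|block|) / INT8_MAX, so dequantization is q * scale.
    """
    rows, cols = w.shape
    q = np.empty_like(w, dtype=np.int8)
    scales = np.empty(
        (int(np.ceil(rows / block)), int(np.ceil(cols / block))),
        dtype=np.float32,
    )
    for bi, i in enumerate(range(0, rows, block)):
        for bj, j in enumerate(range(0, cols, block)):
            blk = w[i:i + block, j:j + block]
            # Block-wise maximum of element magnitudes divided by the INT8 max
            s = float(np.abs(blk).max()) / INT8_MAX
            s = s if s > 0 else 1.0  # guard all-zero blocks
            scales[bi, bj] = s
            q[i:i + block, j:j + block] = np.clip(
                np.round(blk / s), -127, 127
            ).astype(np.int8)
    return q, scales

def dequantize_int8_blockwise(q: np.ndarray, scales: np.ndarray, block: int = BLOCK):
    """Invert the quantization: multiply each block back by its scale."""
    w = q.astype(np.float32)
    for bi, i in enumerate(range(0, q.shape[0], block)):
        for bj, j in enumerate(range(0, q.shape[1], block)):
            w[i:i + block, j:j + block] *= scales[bi, bj]
    return w
```

Because each block uses its own scale, the per-element round-trip error is bounded by half that block's scale, which is what lets INT8 track the BF16 checkpoint closely enough to show no accuracy loss in the table above.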