Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ The INT8 data type is both friendly and efficient for most hardware platforms.
 
 In benchmarking, we observe **no accuracy loss** and up to **33\%** performance enhancement.
 
-[
+Thanks to our merged [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730), [SGLang](https://github.com/sgl-project/sglang/tree/main) now supports the block-wise INT8 quantization operation.
 
 ## 1. Benchmarking Result (detailed in [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730)):
 | Model | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput(qps=128) |