Commit 0936cdd
1 Parent(s): 8bc14a9
Update README.md
README.md CHANGED
@@ -29,15 +29,15 @@ This model is a statically quantized version of [optimum/distilbert-base-uncased
 It achieves the following results on the evaluation set:

 - Vanilla model: 92.5%
-- Quantized model: 92.24
+- Quantized model: 92.24%.
 => The quantized model achieves 99.72% accuracy of the fp32 model

 Latency
-Payload sequence length: 128
-Instance type: AWS c6i.xlarge
-Vanilla model: P95 latency (ms) - 86.7772593483096; Average latency (ms) - 62.55 +\- 8.66;
-Quantized model: P95 latency (ms) - 27.027633551188046; Average latency (ms) - 26.17 +\- 0.66;
-Improvement through quantization: 2.39x
+Payload sequence length: 128
+Instance type: AWS c6i.xlarge
+Vanilla model: P95 latency (ms) - 86.7772593483096; Average latency (ms) - 62.55 +\- 8.66;
+Quantized model: P95 latency (ms) - 27.027633551188046; Average latency (ms) - 26.17 +\- 0.66;
+Improvement through quantization: 2.39x

 ## How to use

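The two derived figures in this hunk (99.72% of fp32 accuracy and the 2.39x improvement) can be checked against the raw numbers in the README. The following is a minimal Python sketch that is not part of the commit. The variable names are illustrative, and it assumes the 2.39x figure is the ratio of average latencies, which the README does not state explicitly.

```python
# Figures copied from the README lines shown in this commit.
vanilla_acc = 92.5                      # fp32 accuracy, percent
quantized_acc = 92.24                   # quantized accuracy, percent
vanilla_avg_ms = 62.55                  # average latency, ms
quantized_avg_ms = 26.17
vanilla_p95_ms = 86.7772593483096       # P95 latency, ms
quantized_p95_ms = 27.027633551188046

# Share of the fp32 accuracy retained after quantization, in percent.
retained_pct = 100 * quantized_acc / vanilla_acc

# Speedup on average latency (assumed to be the source of "2.39x").
speedup_avg = vanilla_avg_ms / quantized_avg_ms

# Speedup on P95 latency, shown for comparison only.
speedup_p95 = vanilla_p95_ms / quantized_p95_ms

print(f"retained accuracy: {retained_pct:.2f}%")
print(f"average-latency speedup: {speedup_avg:.2f}x")
print(f"P95-latency speedup: {speedup_p95:.2f}x")
```

Under these assumptions the reported 2.39x matches the average-latency ratio, while the P95 ratio comes out higher, so the README's headline figure appears to be the more conservative of the two.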