philschmid committed · Commit 8bc14a9 · 1 Parent(s): 095e3a4

Create README.md

  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
+ ---
+ tags:
+ - optimum
+ datasets:
+ - banking77
+ metrics:
+ - accuracy
+ model-index:
+ - name: quantized-distilbert-banking77
+   results:
+   - task:
+       name: Text Classification
+       type: text-classification
+     dataset:
+       name: banking77
+       type: banking77
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.9224
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # Quantized-distilbert-banking77
+
+ This model is a statically quantized version of [optimum/distilbert-base-uncased-finetuned-banking77](https://huggingface.co/optimum/distilbert-base-uncased-finetuned-banking77) on the `banking77` dataset.
+ It achieves the following accuracy on the evaluation set:
+
+ - Vanilla (FP32) model: 92.5%
+ - Quantized model: 92.24%
+
+ The quantized model retains 99.72% of the accuracy of the FP32 model.
+
+ ## Latency
+
+ - Payload sequence length: 128
+ - Instance type: AWS c6i.xlarge
+ - Vanilla model: P95 latency 86.78 ms; average latency 62.55 ± 8.66 ms
+ - Quantized model: P95 latency 27.03 ms; average latency 26.17 ± 0.66 ms
+ - Improvement through quantization: 2.39x (ratio of the average latencies)
42
+ ## How to use
43
+
44
+ ```python
45
+ from optimum.onnxruntime import ORTModelForSequenceClassification
46
+ from transformers import pipeline, AutoTokenizer
47
+
48
+ model = ORTModelForSequenceClassification.from_pretrained("philschmid/quantized-distilbert-banking77")
49
+ tokenizer = AutoTokenizer.from_pretrained("philschmid/quantized-distilbert-banking77")
50
+
51
+ remote_clx = pipeline("text-classification",model=model, tokenizer=tokenizer)
52
+
53
+ remote_clx("What is the exchange rate like on this app?")
54
+ ```
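
The model card does not say how its latency figures were collected. As a rough sketch of one way to gather comparable numbers (average, standard deviation, and P95), the hypothetical helper `measure_latency` below times any zero-argument callable. The warm-up count, the number of runs, and the nearest-rank P95 are assumptions for illustration, not the author's benchmark setup.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(fn: Callable[[], object], warmup: int = 10, runs: int = 100) -> Dict[str, float]:
    """Time repeated calls to `fn` and summarize the latencies in milliseconds."""
    # Warm-up calls are executed but not recorded.
    for _ in range(warmup):
        fn()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank percentile: smallest sample with at least 95% of samples at or below it.
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "std_ms": statistics.stdev(latencies_ms) if len(latencies_ms) > 1 else 0.0,
        "p95_ms": latencies_ms[p95_index],
    }


# Stand-in workload so the sketch runs without downloading a model.
stats = measure_latency(lambda: sum(range(10_000)), warmup=2, runs=20)
print(stats)
```

To time the classifier itself, the same helper could be called as `measure_latency(lambda: remote_clx("What is the exchange rate like on this app?"))`. Matching the payload sequence length of 128 from the latency figures would additionally require an input of that tokenized length.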