# XtremeDistilTransformers for Distilling Massive Neural Networks
XtremeDistilTransformers is a distilled task-agnostic transformer model that leverages task transfer to learn a small universal model that can be applied to arbitrary tasks and languages, as outlined in the paper [XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation](https://arxiv.org/abs/2106.04563).

We leverage task transfer combined with multi-task distillation techniques from the papers XtremeDistil: Multi-stage Distillation for Massive Multilingual Models and MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, together with the accompanying GitHub code.

This l6-h384 checkpoint, with 6 layers, a hidden size of 384, and 12 attention heads, has 22 million parameters and delivers a 5.3x speedup over BERT-base.

Other available checkpoints: `xtremedistil-l6-h256-uncased` and `xtremedistil-l12-h384-uncased`.
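
Below is a minimal loading sketch, not part of the original card, showing how a checkpoint like this can be used with the Hugging Face `transformers` library; the `microsoft/xtremedistil-l6-h384-uncased` Hub id is an assumption based on the checkpoint name above.

```python
# Minimal usage sketch; the Hub id below is assumed from the checkpoint name
# in this card, not confirmed by it.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/xtremedistil-l6-h384-uncased")
model = AutoModel.from_pretrained("microsoft/xtremedistil-l6-h384-uncased")

# Encode a sentence and run a forward pass to get contextual embeddings.
inputs = tokenizer("XtremeDistil is a small universal model.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 384) for this l6-h384 model
```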
The following table shows the results on the GLUE dev set and SQuAD v2.

| Models                | #Params (M) | Speedup | MNLI | QNLI | QQP  | RTE  | SST  | MRPC | SQuAD2 | Avg  |
|-----------------------|-------------|---------|------|------|------|------|------|------|--------|------|
| BERT                  | 109         | 1x      | 84.5 | 91.7 | 91.3 | 68.6 | 93.2 | 87.3 | 76.8   | 84.8 |
| DistilBERT            | 66          | 2x      | 82.2 | 89.2 | 88.5 | 59.9 | 91.3 | 87.5 | 70.7   | 81.3 |
| TinyBERT              | 66          | 2x      | 83.5 | 90.5 | 90.6 | 72.2 | 91.6 | 88.4 | 73.1   | 84.3 |
| MiniLM                | 66          | 2x      | 84.0 | 91.0 | 91.0 | 71.5 | 92.0 | 88.4 | 76.4   | 84.9 |
| MiniLM                | 22          | 5.3x    | 82.8 | 90.3 | 90.6 | 68.9 | 91.3 | 86.6 | 72.9   | 83.3 |
| XtremeDistil-l6-h256  | 13          | 8.7x    | 83.9 | 89.5 | 90.6 | 80.1 | 91.2 | 90.0 | 74.1   | 85.6 |
| XtremeDistil-l6-h384  | 22          | 5.3x    | 85.4 | 90.3 | 91.0 | 80.9 | 92.3 | 90.0 | 76.6   | 86.6 |
| XtremeDistil-l12-h384 | 33          | 2.7x    | 87.2 | 91.9 | 91.3 | 85.6 | 93.1 | 90.4 | 80.2   | 88.5 |

Tested with `tensorflow` 2.3.1, `transformers` 4.1.1, and `torch` 1.6.0.
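
As a rough illustration of how GLUE-style fine-tuning of such a checkpoint looks with those library versions, here is a hypothetical PyTorch sketch; it is not the authors' training code, and the Hub model id, label count, and learning rate are illustrative assumptions.

```python
# Hypothetical single fine-tuning step on a binary classification task;
# hyperparameters and the Hub model id are illustrative, not from this card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "microsoft/xtremedistil-l6-h384-uncased"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**batch, labels=labels).loss  # cross-entropy over the two labels
loss.backward()
optimizer.step()
print(float(loss))
```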

If you use this checkpoint in your work, please cite:
```bibtex
@misc{mukherjee2021xtremedistiltransformers,
      title={XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation},
      author={Subhabrata Mukherjee and Ahmed Hassan Awadallah and Jianfeng Gao},
      year={2021},
      eprint={2106.04563},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```