Update README.md
README.md
CHANGED
@@ -1,27 +1,40 @@
The following table shows the results on GLUE dev set and SQuAD-v2.
-Models
-MiniLM
-XtremeDistil-l6-
-XtremeDistil-
If you use this checkpoint in your work, please cite:
@misc{mukherjee2021xtremedistiltransformers,
title={XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation},
author={Subhabrata Mukherjee and Ahmed Hassan Awadallah and Jianfeng Gao},
@@ -29,4 +42,5 @@ If you use this checkpoint in your work, please cite:
eprint={2106.04563},
archivePrefix={arXiv},
primaryClass={cs.CL}
-}
+---
+language: en
+thumbnail: https://huggingface.co/front/thumbnails/microsoft.png
+tags:
+- text-classification
+license: mit
+---

+# XtremeDistilTransformers for Distilling Massive Neural Networks

+XtremeDistilTransformers is a distilled task-agnostic transformer model that leverages task transfer for learning a small universal model that can be applied to arbitrary tasks and languages as outlined in the paper [XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation](https://arxiv.org/abs/2106.04563).

+We leverage task transfer combined with multi-task distillation techniques from the papers [XtremeDistil: Multi-stage Distillation for Massive Multilingual Models](https://www.aclweb.org/anthology/2020.acl-main.202.pdf) and [MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers](https://proceedings.neurips.cc/paper/2020/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) with the following [Github code](https://github.com/microsoft/xtreme-distil-transformers).
+
+
+This l6-h384 checkpoint with **6** layers, **384** hidden size, **12** attention heads corresponds to **22 million** parameters with **5.3x** speedup over BERT-base.
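
As a rough sanity check on the 22 million figure, the parameter count of a BERT-style encoder can be estimated from the layer count and hidden size alone. The sketch below is not taken from the README: it assumes the standard BERT uncased WordPiece vocabulary (30522 tokens), 512 positions, 2 token types, a feed-forward width of 4x the hidden size, and a pooler layer. The function name `bert_style_param_count` is hypothetical. Note that the number of attention heads does not affect the count.

```python
def bert_style_param_count(layers, hidden, intermediate=None,
                           vocab=30522, max_pos=512, type_vocab=2):
    """Estimate parameters of a BERT-style encoder (all defaults are assumptions)."""
    if intermediate is None:
        intermediate = 4 * hidden  # assumed feed-forward width

    # Word, position and token-type embeddings plus one LayerNorm (weight + bias).
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden

    # Self-attention: Q, K, V and output projections (weights + biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden

    # Feed-forward: up-projection, down-projection (weights + biases) plus LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden
                    + 2 * hidden)

    # Pooler: one dense layer over the first token's hidden state.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    for name, (n_layers, n_hidden) in {
        "l6-h384": (6, 384),
        "BERT-base": (12, 768),
    }.items():
        total = bert_style_param_count(n_layers, n_hidden)
        print(f"{name}: {total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate for 6 layers and 384 hidden units is about 22.7 million, in line with the figure quoted above, and roughly half of that sits in the embedding matrix rather than in the transformer layers.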
+
+Other available checkpoints: [xtremedistil-l6-h384-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h384-uncased) and [xtremedistil-l12-h384-uncased](https://huggingface.co/microsoft/xtremedistil-l12-h384-uncased)

The following table shows the results on GLUE dev set and SQuAD-v2.

+| Models | #Params | Speedup | MNLI | QNLI | QQP | RTE | SST | MRPC | SQUAD2 | Avg |
+|----------------|--------|---------|------|------|------|------|------|------|--------|-------|
+| BERT | 109 | 1x | 84.5 | 91.7 | 91.3 | 68.6 | 93.2 | 87.3 | 76.8 | 84.8 |
+| DistilBERT | 66 | 2x | 82.2 | 89.2 | 88.5 | 59.9 | 91.3 | 87.5 | 70.7 | 81.3 |
+| TinyBERT | 66 | 2x | 83.5 | 90.5 | 90.6 | 72.2 | 91.6 | 88.4 | 73.1 | 84.3 |
+| MiniLM | 66 | 2x | 84.0 | 91.0 | 91.0 | 71.5 | 92.0 | 88.4 | 76.4 | 84.9 |
+| MiniLM | 22 | 5.3x | 82.8 | 90.3 | 90.6 | 68.9 | 91.3 | 86.6 | 72.9 | 83.3 |
+| XtremeDistil-l6-h256 | 13 | 8.7x | 83.9 | 89.5 | 90.6 | 80.1 | 91.2 | 90.0 | 74.1 | 85.6 |
+| XtremeDistil-l6-h384 | 22 | 5.3x | 85.4 | 90.3 | 91.0 | 80.9 | 92.3 | 90.0 | 76.6 | 86.6 |
+| XtremeDistil-l12-h384 | 33 | 2.7x | 87.2 | 91.9 | 91.3 | 85.6 | 93.1 | 90.4 | 80.2 | 88.5 |
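
The `Avg` column appears to be the unweighted mean of the seven task scores (MNLI, QNLI, QQP, RTE, SST, MRPC, SQUAD2). That reading is an assumption inferred from the numbers, not something the README states. The sketch below recomputes it for each row; the names `ROWS` and `task_average` are illustrative only.

```python
# Per-model task scores and the reported Avg, copied from the table above.
# Order of scores: MNLI, QNLI, QQP, RTE, SST, MRPC, SQUAD2.
ROWS = {
    "BERT":                  ([84.5, 91.7, 91.3, 68.6, 93.2, 87.3, 76.8], 84.8),
    "DistilBERT":            ([82.2, 89.2, 88.5, 59.9, 91.3, 87.5, 70.7], 81.3),
    "TinyBERT":              ([83.5, 90.5, 90.6, 72.2, 91.6, 88.4, 73.1], 84.3),
    "MiniLM (66M)":          ([84.0, 91.0, 91.0, 71.5, 92.0, 88.4, 76.4], 84.9),
    "MiniLM (22M)":          ([82.8, 90.3, 90.6, 68.9, 91.3, 86.6, 72.9], 83.3),
    "XtremeDistil-l6-h256":  ([83.9, 89.5, 90.6, 80.1, 91.2, 90.0, 74.1], 85.6),
    "XtremeDistil-l6-h384":  ([85.4, 90.3, 91.0, 80.9, 92.3, 90.0, 76.6], 86.6),
    "XtremeDistil-l12-h384": ([87.2, 91.9, 91.3, 85.6, 93.1, 90.4, 80.2], 88.5),
}


def task_average(scores):
    """Unweighted mean over the task scores (assumed definition of Avg)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for model, (scores, reported) in ROWS.items():
        computed = task_average(scores)
        print(f"{model:24s} computed={computed:.2f} reported={reported}")
```

Every recomputed mean agrees with the reported `Avg` to one decimal place, which supports the unweighted-mean reading.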
+
+Tested with `tensorflow 2.3.1, transformers 4.1.1, torch 1.6.0`

If you use this checkpoint in your work, please cite:

+``` latex
@misc{mukherjee2021xtremedistiltransformers,
title={XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation},
author={Subhabrata Mukherjee and Ahmed Hassan Awadallah and Jianfeng Gao},
eprint={2106.04563},
archivePrefix={arXiv},
primaryClass={cs.CL}
+}
+```