Commit 6c62aa0
Parent(s): 8084f28
Update README.md

README.md CHANGED
````diff
@@ -11,13 +11,13 @@ tags:
 ## Overview
 **Language model:** deepset/tinybert-6L-768D-squad2
 **Language:** English
-**Training data:** SQuAD 2.0 training set x 20 augmented + SQuAD 2.0 training set
+**Training data:** SQuAD 2.0 training set x 20 augmented + SQuAD 2.0 training set without augmentation
 **Eval data:** SQuAD 2.0 dev set
 **Infrastructure**: 1x V100 GPU
 **Published**: Dec 8th, 2021
 
 ## Details
-- haystack's intermediate layer and prediction layer distillation features were used for training (based on [TinyBERT](https://arxiv.org/pdf/1909.10351.pdf)). deepset/bert-base-uncased-squad2 was used as the teacher model.
+- haystack's intermediate layer and prediction layer distillation features were used for training (based on [TinyBERT](https://arxiv.org/pdf/1909.10351.pdf)). deepset/bert-base-uncased-squad2 was used as the teacher model and huawei-noah/TinyBERT_General_6L_768D was used as the student model.
 
 ## Hyperparameters
 ### Intermediate layer distillation
@@ -29,7 +29,6 @@ learning_rate = 5e-5
 lr_schedule = LinearWarmup
 embeds_dropout_prob = 0.1
 temperature = 1
-distillation_loss_weight = 0.75
 ```
 ### Prediction layer distillation
 ```
@@ -40,7 +39,7 @@ learning_rate = 3e-5
 lr_schedule = LinearWarmup
 embeds_dropout_prob = 0.1
 temperature = 1
-distillation_loss_weight = 0
+distillation_loss_weight = 1.0
 ```
 ## Performance
 ```
````