Update README.md
README.md CHANGED
@@ -13,8 +13,9 @@ license: mit
 
 Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
 
+In DeBERTa V3 we replaced the MLM objective with the RTD (Replaced Token Detection) objective during pre-training, which significantly improves model performance. Please check appendix A11 in our paper [DeBERTa](https://arxiv.org/abs/2006.03654) for more details.
+
 This is the DeBERTa V3 small model with 6 layers and a hidden size of 768. It has 143M parameters in total, of which the embedding layer accounts for about 98M due to the 128k vocabulary. It is trained with 160GB of data.
-For more details of our V3 model, please check appendix A11 in our paper.
 
 #### Fine-tuning on NLU tasks
 
@@ -24,6 +25,7 @@ We present the dev results on SQuAD 1.1/2.0 and MNLI tasks.
 |-------------------|-----------|-----------|--------|
 | RoBERTa-base      | 91.5/84.6 | 83.7/80.5 | 87.6 |
 | XLNet-base        | -/-       | -/80.2    | 86.8 |
+| DeBERTa-base      | 93.1/87.2 | 86.2/83.1 | 88.8 |
 | **DeBERTa-v3-small** | 93.1/87.2 | 86.2/83.1 | 88.2 |
 | +SiFT             | -/-       | -/-       | 88.8 |
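The parameter split the README states (about 98M of the 143M total in the embedding layer) follows directly from the 128k vocabulary and the 768 hidden size. A quick arithmetic check, treating "128k" as exactly 128,000 (an approximation; the released vocabulary may differ slightly):

```python
# Parameter accounting for DeBERTa-v3-small, using the numbers stated in the
# card: 128k vocabulary, hidden size 768, 143M total parameters.
vocab_size = 128_000   # "128k vocabulary" (approximate)
hidden_size = 768

embedding_params = vocab_size * hidden_size        # token-embedding matrix
backbone_params = 143_000_000 - embedding_params   # everything else

print(f"embedding: {embedding_params / 1e6:.1f}M")  # -> embedding: 98.3M
print(f"backbone:  {backbone_params / 1e6:.1f}M")   # -> backbone:  44.7M
```

This is why the card singles out the embedding layer: roughly two thirds of the model's parameters sit in the vocabulary table, not in the 6 transformer layers.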
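The results table reports fine-tuned dev scores, but the card carries no usage snippet. Assuming the checkpoint is published on the Hugging Face Hub under `microsoft/deberta-v3-small` (the usual id for this card; the id and the MNLI `num_labels=3` choice are assumptions, not taken from the diff), a fine-tuning setup for a 3-way NLU task could start from a sketch like this:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

model_id = "microsoft/deberta-v3-small"  # assumed Hub id for this checkpoint

# Confirm the architecture the card describes: 6 layers, hidden size 768.
config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.hidden_size)

# Attach a freshly initialized 3-way classification head for MNLI
# (entailment / neutral / contradiction); the backbone weights stay pre-trained.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
print(sum(p.numel() for p in model.parameters()) // 10**6, "M parameters")
```

The tokenizer (`AutoTokenizer.from_pretrained(model_id)`) additionally requires the `sentencepiece` package, since DeBERTa V3 uses a SentencePiece vocabulary.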