Update README.md
README.md CHANGED
@@ -13,8 +13,9 @@ license: mit
 
 Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
 
+In DeBERTa V3 we replaced the MLM objective with the RTD (Replaced Token Detection) objective during pre-training, which significantly improves model performance. Please check appendix A11 in our paper [DeBERTa](https://arxiv.org/abs/2006.03654) for more details.
+
 This is the DeBERTa V3 small model with 6 layers and a hidden size of 768. It has 143M parameters in total, of which the embedding layer accounts for about 98M due to the 128k vocabulary. It is trained with 160GB of data.
-For more details of our V3 model, please check appendix A11 in our paper.
 
 #### Fine-tuning on NLU tasks
 
@@ -24,6 +25,7 @@ We present the dev results on SQuAD 1.1/2.0 and MNLI tasks.
 |-------------------|-----------|-----------|--------|
 | RoBERTa-base      | 91.5/84.6 | 83.7/80.5 | 87.6 |
 | XLNet-base        | -/-       | -/80.2    | 86.8 |
+| DeBERTa-base      | 93.1/87.2 | 86.2/83.1 | 88.8 |
 | **DeBERTa-v3-small** | 93.1/87.2 | 86.2/83.1 | 88.2 |
 | +SiFT             | -/-       | -/-       | 88.8 |
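The parameter split the README states (about 98M of the 143M total in the embedding layer) follows directly from the 128k vocabulary and the 768 hidden size. A quick arithmetic check, treating "128k" as exactly 128,000 (an approximation; the released vocabulary may differ slightly):

```python
# Parameter accounting for DeBERTa-v3-small, using the numbers stated in the
# card: 128k vocabulary, hidden size 768, 143M total parameters.
vocab_size = 128_000   # "128k vocabulary" (approximate)
hidden_size = 768

embedding_params = vocab_size * hidden_size        # token-embedding matrix
backbone_params = 143_000_000 - embedding_params   # everything else

print(f"embedding: {embedding_params / 1e6:.1f}M")  # -> embedding: 98.3M
print(f"backbone:  {backbone_params / 1e6:.1f}M")   # -> backbone:  44.7M
```

This is why the card singles out the embedding layer: roughly two thirds of the model's parameters sit in the vocabulary table, not in the 6 transformer layers.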
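The results table reports fine-tuned dev scores, but the card carries no usage snippet. Assuming the checkpoint is published on the Hugging Face Hub under `microsoft/deberta-v3-small` (the usual id for this card; the id and the MNLI `num_labels=3` choice are assumptions, not taken from the diff), a fine-tuning setup for a 3-way NLU task could start from a sketch like this:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

model_id = "microsoft/deberta-v3-small"  # assumed Hub id for this checkpoint

# Confirm the architecture the card describes: 6 layers, hidden size 768.
config = AutoConfig.from_pretrained(model_id)
print(config.num_hidden_layers, config.hidden_size)

# Attach a freshly initialized 3-way classification head for MNLI
# (entailment / neutral / contradiction); the backbone weights stay pre-trained.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
print(sum(p.numel() for p in model.parameters()) // 10**6, "M parameters")
```

The tokenizer (`AutoTokenizer.from_pretrained(model_id)`) additionally requires the `sentencepiece` package, since DeBERTa V3 uses a SentencePiece vocabulary.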