DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Paper: arXiv:2111.09543
This is a pair classification model that was trained to determine whether the given “hypothesis” logically follows from the “premise”.
This model is based on DeBERTa-v3 and was fine-tuned on 27K text pairs.
The training data consists of 27K premise-hypothesis pairs labelled as entailment or contradiction.
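For illustration, one labelled training pair might be represented as follows; the field names and example text are hypothetical, not drawn from the actual dataset.

```python
# Hypothetical example of one labelled premise-hypothesis pair.
# Field names and text are illustrative only, not taken from the actual dataset.
example = {
    "premise": "The committee approved the budget on Friday.",
    "hypothesis": "The budget was approved.",
    "label": "entailment",  # the other possible label is "contradiction"
}
```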
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- weight_decay: 0.0002
- num_epochs: 2
- overall training time: 30 minutes
- training time per epoch: 15 minutes
- model size: 1.7 GB
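As a rough guide, the hyperparameters above map onto a Transformers `TrainingArguments` configuration along the following lines; the base checkpoint name and output path are placeholders, since the exact DeBERTa-v3 variant is not specified here.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

# Placeholder checkpoint; the actual base model is a DeBERTa-v3 variant.
base_checkpoint = "microsoft/deberta-v3-base"

tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    base_checkpoint,
    num_labels=2,  # entailment vs. contradiction
)

# Mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./deberta-v3-pair-classification",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.0002,
    num_train_epochs=2,
)
```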
Evaluation was performed on a subset of the provided development set, amounting to 6.7K pairs.
- Macro precision: 0.928
- Macro recall: 0.927
- Macro F1: 0.927
- Weighted precision: 0.928
- Weighted recall: 0.928
- Weighted F1: 0.928
- MCC: 0.855
Overall, the model obtained a macro F1-score of 93% and an MCC of 86%.
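The figures above correspond to standard classification metrics; below is a minimal sketch of how they could be computed with scikit-learn from gold and predicted labels (the label arrays are illustrative placeholders).

```python
from sklearn.metrics import matthews_corrcoef, precision_recall_fscore_support

# Illustrative placeholders for the gold and predicted labels of the dev subset.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
weighted_p, weighted_r, weighted_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
mcc = matthews_corrcoef(y_true, y_pred)

print(f"Macro P/R/F1: {macro_p:.3f} / {macro_r:.3f} / {macro_f1:.3f}")
print(f"Weighted P/R/F1: {weighted_p:.3f} / {weighted_r:.3f} / {weighted_f1:.3f}")
print(f"MCC: {mcc:.3f}")
```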
- RAM: at least 16 GB
- Storage: at least 2 GB
- GPU: V100
- Transformers 4.18.0
- PyTorch 1.11.0+cu113
Any input (the concatenation of the two sequences) longer than 512 subword tokens will be truncated by the model.
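A minimal inference sketch, assuming the model is loaded through the Transformers sequence-classification API; the checkpoint path and example texts are placeholders. The premise and hypothesis are passed as a sequence pair, and anything beyond 512 subword tokens is truncated.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder for the actual model repository or local path.
checkpoint = "path/to/this-model"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

premise = "The committee approved the budget on Friday."
hypothesis = "The budget was approved."

# The two sequences are concatenated into a single input; tokens beyond
# 512 subwords are truncated, as noted above.
inputs = tokenizer(
    premise,
    hypothesis,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
```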
The hyperparameters listed above were selected empirically by experimenting with a range of values.