BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper: [arXiv:1810.04805](https://arxiv.org/abs/1810.04805) (Devlin et al., 2019)
We used the pretrained `bert-base-uncased` model and fine-tuned it on the MultiNLI (MNLI) dataset.
The training hyperparameters were kept the same as in Devlin et al., 2019: learning rate = 2e-5, training epochs = 3, max sequence length = 128, and batch size = 32.
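This setup can be sketched with the Hugging Face `transformers` Trainer. The following is a minimal sketch under the hyperparameters above, not the exact script used to train this model; the `multi_nli` dataset identifier and the Trainer-based pipeline are assumptions about tooling.

```python
# Minimal fine-tuning sketch (assumed tooling: transformers + datasets).
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # MNLI labels: entailment / neutral / contradiction
)

# MultiNLI pairs a premise with a hypothesis; tokenize the two jointly.
mnli = load_dataset("multi_nli")

def tokenize(batch):
    return tokenizer(
        batch["premise"],
        batch["hypothesis"],
        truncation=True,
        padding="max_length",
        max_length=128,  # max sequence length from Devlin et al., 2019
    )

mnli = mnli.map(tokenize, batched=True)

# Hyperparameters stated above.
args = TrainingArguments(
    output_dir="bert-base-uncased-mnli",  # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=32,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=mnli["train"],
    eval_dataset=mnli["validation_matched"],
)
trainer.train()
```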
The evaluation results are reported in the table below.
| Test Corpus | Accuracy |
|---|---|
| MNLI matched | 0.8456 |
| MNLI mismatched | 0.8484 |
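For reference, the two accuracies could be computed by continuing from the Trainer sketch above; the `evaluate` library used here for the accuracy metric is an assumption, not necessarily what produced the numbers in the table.

```python
# Evaluation sketch, continuing from the fine-tuning sketch above
# (assumed tooling: the `evaluate` library for the accuracy metric).
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

trainer.compute_metrics = compute_metrics

# MultiNLI ships separate matched and mismatched validation splits.
print(trainer.evaluate(mnli["validation_matched"]))     # matched
print(trainer.evaluate(mnli["validation_mismatched"]))  # mismatched
```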