---
license: mit
---
# Distilled-RoBERTa
This model is a distilled [RoBERTa](https://huggingface.co/deepset/roberta-base-squad2-distilled) model that was trained on the SQuAD 2.0 training set and then fine-tuned on the [NewsQA](https://huggingface.co/datasets/lucadiliello/newsqa) dataset.
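
## Usage

For inference, the model can be used with the 🤗 Transformers question-answering pipeline. The snippet below is a minimal sketch; the model identifier is a placeholder, not this repository's actual name.

```python
from transformers import pipeline

# Placeholder model ID -- replace with this repository's actual identifier.
model_name = "distilled-roberta-newsqa"

qa = pipeline("question-answering", model=model_name, tokenizer=model_name)

result = qa(
    question="What dataset was the model fine-tuned on?",
    context="The distilled RoBERTa model was fine-tuned on the NewsQA "
            "dataset for extractive question answering.",
)
print(result["answer"])
```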
## Hyperparameters
```
batch_size = 16
n_epochs = 3
max_seq_len = 512
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = LinearWarmup
weight_decay = 0.01
embeds_dropout_prob = 0.1
```
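
As a rough illustration only, these settings map onto the 🤗 Transformers `Trainer` API roughly as follows. The card does not specify the training framework, so the base checkpoint, warmup fraction, and dataset preprocessing in this sketch are assumptions.

```python
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed base checkpoint: the distilled RoBERTa SQuAD 2.0 model linked above.
checkpoint = "deepset/roberta-base-squad2-distilled"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

args = TrainingArguments(
    output_dir="distilled-roberta-newsqa",
    per_device_train_batch_size=16,  # batch_size
    num_train_epochs=3,              # n_epochs
    learning_rate=2e-5,              # learning_rate; AdamW is the Trainer default optimizer
    lr_scheduler_type="linear",      # lr_schedule = LinearWarmup
    warmup_ratio=0.1,                # assumption: the card does not state the warmup fraction
    weight_decay=0.01,               # weight_decay
)

# embeds_dropout_prob = 0.1 corresponds to dropout in the model config
# (e.g. hidden_dropout_prob for RoBERTa), not to a TrainingArguments field.
# train_dataset must be NewsQA examples tokenized to max_seq_len = 512;
# preprocessing is omitted here for brevity.
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```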