How to use nyu-mll/roberta-base-10M-1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="nyu-mll/roberta-base-10M-1")

# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("nyu-mll/roberta-base-10M-1")
model = AutoModelForMaskedLM.from_pretrained("nyu-mll/roberta-base-10M-1")
```
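Continuing from the `pipe` object above, a quick fill-mask call might look like the following (the example sentence is illustrative; `<mask>` is RoBERTa's mask token):

```python
# Print the top candidate fills for the masked position, with their scores.
for result in pipe("The capital of France is <mask>."):
    print(result["token_str"], round(result["score"], 3))
```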
RoBERTa Pretrained on Smaller Datasets
We pretrain RoBERTa on smaller datasets (1M, 10M, 100M, and 1B tokens). For each pretraining data size, we release the three models with the lowest validation perplexities out of 25 runs (10 runs in the case of 1B tokens). The pretraining data reproduces that of BERT: we combine English Wikipedia with a reproduction of BookCorpus built from Smashwords texts, in a ratio of approximately 3:1.
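Since all checkpoints are published under the `nyu-mll` namespace with the naming scheme shown in the table below, any run can be loaded by name. For example, a sketch that downloads the three 10M-token runs:

```python
from transformers import AutoModelForMaskedLM

# Checkpoint names follow roberta-{base,med-small}-{data size}-{run number};
# see the table below for the full list.
for run in (1, 2, 3):
    model = AutoModelForMaskedLM.from_pretrained(f"nyu-mll/roberta-base-10M-{run}")
    print(f"roberta-base-10M-{run}: {model.num_parameters():,} parameters")
```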
Hyperparameters and Validation Perplexity
The hyperparameters and validation perplexities corresponding to each model are as follows:
| Model Name | Training Size (tokens) | Model Size | Max Steps | Batch Size | Validation Perplexity |
|---|---|---|---|---|---|
| roberta-base-1B-1 | 1B | BASE | 100K | 512 | 3.93 |
| roberta-base-1B-2 | 1B | BASE | 31K | 1024 | 4.25 |
| roberta-base-1B-3 | 1B | BASE | 31K | 4096 | 3.84 |
| roberta-base-100M-1 | 100M | BASE | 100K | 512 | 4.99 |
| roberta-base-100M-2 | 100M | BASE | 31K | 1024 | 4.61 |
| roberta-base-100M-3 | 100M | BASE | 31K | 512 | 5.02 |
| roberta-base-10M-1 | 10M | BASE | 10K | 1024 | 11.31 |
| roberta-base-10M-2 | 10M | BASE | 10K | 512 | 10.78 |
| roberta-base-10M-3 | 10M | BASE | 31K | 512 | 11.58 |
| roberta-med-small-1M-1 | 1M | MED-SMALL | 100K | 512 | 153.38 |
| roberta-med-small-1M-2 | 1M | MED-SMALL | 10K | 512 | 134.18 |
| roberta-med-small-1M-3 | 1M | MED-SMALL | 31K | 512 | 139.39 |
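The perplexities above are validation perplexities from pretraining. As a rough, illustrative check only (this is a different metric and does not reproduce the table's protocol), a pseudo-perplexity for a single sentence can be estimated by masking each token in turn:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "nyu-mll/roberta-base-10M-1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_perplexity(text):
    """Mask each token in turn and exponentiate the average negative
    log-likelihood. Illustrative only; not the table's evaluation protocol."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, input_ids.size(0) - 1):  # skip <s> and </s>
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        nlls.append(-log_probs[input_ids[i]].item())
    return float(torch.exp(torch.tensor(nlls).mean()))

print(pseudo_perplexity("The quick brown fox jumps over the lazy dog."))
```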
The hyperparameters corresponding to the model sizes mentioned above are as follows:
| Model Size | L | AH | HS | FFN | P |
|---|---|---|---|---|---|
| BASE | 12 | 12 | 768 | 3072 | 125M |
| MED-SMALL | 6 | 8 | 512 | 2048 | 45M |
(L = number of layers; AH = number of attention heads; HS = hidden size; FFN = feed-forward network dimension; P = number of parameters.)
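These sizes can be checked against a downloaded checkpoint's configuration; the field names below are the standard Hugging Face RoBERTa config keys:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("nyu-mll/roberta-base-10M-1")
print(config.num_hidden_layers)    # L   = 12
print(config.num_attention_heads)  # AH  = 12
print(config.hidden_size)          # HS  = 768
print(config.intermediate_size)    # FFN = 3072
# P (~125M for BASE) can be confirmed with
# AutoModelForMaskedLM.from_pretrained(...).num_parameters()
```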
For the other hyperparameters, we select the following (see the code sketch after this list):
- Peak learning rate: 5e-4
- Warmup steps: 6% of max steps
- Dropout: 0.1
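As a rough sketch only, these settings could be expressed with Hugging Face's `TrainingArguments`; the released checkpoints were not necessarily trained with this stack, and the output path, per-device batch size, and scheduler choice here are assumptions:

```python
from transformers import RobertaConfig, TrainingArguments

# Dropout is a model-config setting, not a trainer setting; 0.1 is also
# the RoBERTa default.
config = RobertaConfig(hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1)

max_steps = 10_000  # e.g. the roberta-base-10M-1 row in the table above
args = TrainingArguments(
    output_dir="roberta-10M-repro",       # hypothetical output path
    max_steps=max_steps,
    per_device_train_batch_size=32,       # with gradient accumulation,
    gradient_accumulation_steps=32,       # effective batch size = 1024
    learning_rate=5e-4,                   # peak learning rate
    warmup_steps=int(0.06 * max_steps),   # 6% of max steps
)
```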