---
license: apache-2.0
---

# NB-ROBERTA Training Code

This is the current training code for the planned nb-roberta models. We are currently planning to run the following experiments:
| Name | Corpus | Pod size | Batch size | Learning rate | Number of steps |
|------|--------|----------|------------|---------------|-----------------|
| nb-roberta-base-old (C) | NbAiLab/nb_bert | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-base-ext (B) | NbAiLab/nbailab_extended | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-ext | NbAiLab/nbailab_extended | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
| nb-roberta-base-scandi | NbAiLab/scandinavian | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-scandi | NbAiLab/scandinavian | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
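Purely as an illustration of how the table's hyperparameters might be wired together, here is a minimal optax sketch for the base configurations, assuming a linear warmup/decay schedule as in the RoBERTa paper. The `warmup_steps` and `weight_decay` values below are placeholders, not values from this README:

```python
import optax

# Hyperparameters for the nb-roberta-base-* runs (from the table above).
peak_lr = 3e-4
total_steps = 250_000
warmup_steps = 10_000   # placeholder: warmup length is not specified in this README
weight_decay = 0.01     # placeholder: not specified in this README

# Linear warmup to the peak learning rate, then linear decay to zero.
schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, peak_lr, transition_steps=warmup_steps),
        optax.linear_schedule(peak_lr, 0.0, transition_steps=total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)

optimizer = optax.adamw(learning_rate=schedule, weight_decay=weight_decay)
```

The point here is only how the peak learning rate and step count from the table map onto a schedule; the actual training script may construct its optimizer differently.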
## Calculations

Some basic calculations we used when estimating the number of training steps:

* The Scandinavian corpus is 85GB
* The Scandinavian corpus contains 13B words
* With a conversion factor of 2.3 tokens per word, this is roughly 30B tokens
* 30B tokens / (512 sequence length × 3000 batch size) ≈ 20,000 steps
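For reference, the estimate can be reproduced with a few lines of Python; all numbers are taken directly from the list above:

```python
# Sanity check of the step estimate above (pure arithmetic).
corpus_words = 13e9      # words in the Scandinavian corpus
tokens_per_word = 2.3    # conversion factor from words to tokens
seq_length = 512         # sequence length
batch_size = 3000        # batch size used in the estimate above

tokens = corpus_words * tokens_per_word      # ~30B tokens
tokens_per_step = seq_length * batch_size    # tokens consumed per training step
steps = tokens / tokens_per_step             # ~19,500, i.e. roughly 20,000

print(f"{tokens:.2e} tokens -> {steps:,.0f} steps for one pass over the corpus")
```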