---
license: apache-2.0
---

# NB-ROBERTA Training Code

This is the current training code for the planned nb-roberta models. We are currently planning to run the following experiments:
| Name | Corpus | Pod size | Batch size | Learning rate | Number of steps |
|------|--------|----------|------------|---------------|-----------------|
| nb-roberta-base-old (C) | NbAiLab/nb_bert | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-base-ext (B) | NbAiLab/nbailab_extended | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-ext | NbAiLab/nbailab_extended | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
| nb-roberta-base-scandi | NbAiLab/scandinavian | v4-64 | 62 × 4 × 8 = 1984 ≈ 2k | 3e-4 (the RoBERTa paper uses 6e-4 at bs = 8k) | 250k |
| nb-roberta-large-scandi | NbAiLab/scandinavian | v4-64 | 32 × 4 × 8 = 1024 ≈ 1k | 2e-4 (the RoBERTa paper uses 4e-4 at bs = 8k) | 500k |
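Purely as an illustration of how the table's hyperparameters might be wired together, here is a minimal optax sketch for the base configurations, assuming a linear warmup/decay schedule as in the RoBERTa paper. The `warmup_steps` and `weight_decay` values below are placeholders, not values from this README:

```python
import optax

# Hyperparameters for the nb-roberta-base-* runs (from the table above).
peak_lr = 3e-4
total_steps = 250_000
warmup_steps = 10_000   # placeholder: warmup length is not specified in this README
weight_decay = 0.01     # placeholder: not specified in this README

# Linear warmup to the peak learning rate, then linear decay to zero.
schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, peak_lr, transition_steps=warmup_steps),
        optax.linear_schedule(peak_lr, 0.0, transition_steps=total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)

optimizer = optax.adamw(learning_rate=schedule, weight_decay=weight_decay)
```

The point here is only how the peak learning rate and step count from the table map onto a schedule; the actual training script may construct its optimizer differently.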
## Calculations

Some basic calculations we used when estimating the number of training steps:

* The Scandinavian corpus is 85GB
* The Scandinavian corpus contains 13B words
* With a conversion factor of 2.3 tokens per word, this is roughly 30B tokens
* 30B tokens / (512 sequence length × 3000 batch size) ≈ 20,000 steps
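For reference, the estimate can be reproduced with a few lines of Python; all numbers are taken directly from the list above:

```python
# Sanity check of the step estimate above (pure arithmetic).
corpus_words = 13e9      # words in the Scandinavian corpus
tokens_per_word = 2.3    # conversion factor from words to tokens
seq_length = 512         # sequence length
batch_size = 3000        # batch size used in the estimate above

tokens = corpus_words * tokens_per_word      # ~30B tokens
tokens_per_step = seq_length * batch_size    # tokens consumed per training step
steps = tokens / tokens_per_step             # ~19,500, i.e. roughly 20,000

print(f"{tokens:.2e} tokens -> {steps:,.0f} steps for one pass over the corpus")
```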