h-params used to pretrain from scratch
#8
by
StephennFernandes
- opened
Hi team,
great work on releasing a multilingual variant of ModernBERT.
Given that you pretrained on around 60B tokens, could you please share a full, detailed list of all the hyperparameters used to pretrain the model?
Or, ideally, the actual pretraining codebase with the exact hyperparameters?