h-params used to pretrain from scratch

#8
by StephennFernandes - opened

Hi team,
great work on releasing a multilingual variant of ModernBERT.
Given that you pretrained on around 60B tokens, could you please share a full, detailed list of all the hyperparameters used to pretrain the model?
Or, failing that, the actual pretraining codebase with the h-params?

Owner

you can take a look at this repo: https://github.com/neavo/KeywordGachaModel
