metadata
license: mit
library_name: transformers
Global batch size: 384, seq_len: 2048
Checkpoint every 500 steps,
i.e. every 393,216,000 tokens (384 × 2048 × 500, about 393M or roughly 400M tokens)
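
The token count per checkpoint follows directly from the batch size, sequence length, and checkpoint interval. A minimal sketch of that arithmetic is shown here; the constant and function names are illustrative and not taken from the training code.

```python
# Values stated in this model card.
GLOBAL_BATCH_SIZE = 384      # sequences per optimizer step
SEQ_LEN = 2048               # tokens per sequence
CHECKPOINT_INTERVAL = 500    # steps between saved checkpoints

# Tokens consumed by a single optimizer step.
TOKENS_PER_STEP = GLOBAL_BATCH_SIZE * SEQ_LEN


def tokens_at_step(step: int) -> int:
    """Return the total number of tokens seen after `step` optimizer steps."""
    return step * TOKENS_PER_STEP


# Print the exact token count for each of the first five checkpoints.
for step in range(CHECKPOINT_INTERVAL, 5 * CHECKPOINT_INTERVAL + 1, CHECKPOINT_INTERVAL):
    print(f"checkpoint-{step}: {tokens_at_step(step):,} tokens")
```
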
Current revisions available as:

| Revision | Tokens seen |
|---|---|
| checkpoint-500 | 393M |
| checkpoint-1000 | 786M |
| checkpoint-1500 | 1.18B |
| checkpoint-2000 | 1.57B |
| checkpoint-2500 | 1.96B |

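
An intermediate checkpoint can typically be selected with the `revision` argument of `from_pretrained` in `transformers`. The sketch below assumes the revisions are published as Hub branches or tags with the names listed above and that the model loads as a causal language model; neither is confirmed by this card. The repository id is a hypothetical placeholder, and the helper names are illustrative.

```python
def revision_name(step: int) -> str:
    """Build a revision name such as 'checkpoint-500' from a step count."""
    return f"checkpoint-{step}"


def load_checkpoint(repo_id: str, step: int):
    """Load the model weights saved at a given training step.

    Assumption: the checkpoint is a causal LM and the revision exists on the Hub.
    """
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(repo_id, revision=revision_name(step))


# Usage (hypothetical repository id, replace with the actual one):
# model = load_checkpoint("your-org/your-model", 1500)
```
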
max_lr : 7e-5