Questions about checkpoints

#82
by jsrozner - opened

Hi,
I am working on a research project and have a few questions regarding the checkpoints (https://huggingface.co/answerdotai/ModernBERT-base-training-checkpoints/tree/main):

  1. Is pretrain/ep0-ba4000-rank0.pt the earliest checkpoint?
  2. Is learning-rate-decay/ep0-ba10598-rank0.pt the last checkpoint?

Additionally:

  • How can I load the untrained model, i.e., after weight initialization but before any training steps?
  • If (2) is correct and that is the final checkpoint, I would expect it to match the released model, but when I load it the behavior is not identical to loading directly from "answerdotai/ModernBERT" (a rough sketch of my comparison is below). So is it actually not the final checkpoint?
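
For reference, here is roughly how I am loading and comparing. It assumes the .pt files are Composer checkpoints with the weights under state["model"], that stripping a "model." prefix is enough to line the keys up with the HF module, and that answerdotai/ModernBERT-base is the right released model to compare against (all of those are guesses on my part):

```python
import torch
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

HF_MODEL = "answerdotai/ModernBERT-base"  # assuming this is the released model meant above
CKPT = "learning-rate-decay/ep0-ba10598-rank0.pt"

# Released model, loaded the normal way.
released = AutoModelForMaskedLM.from_pretrained(HF_MODEL).eval()

# Same architecture, randomly initialized, then filled from the checkpoint.
config = AutoConfig.from_pretrained(HF_MODEL)
from_ckpt = AutoModelForMaskedLM.from_config(config).eval()

# Guess: Composer-style checkpoints keep the model weights under state["model"],
# and the keys carry a "model." wrapper prefix. The key names from the training
# codebase may also need remapping beyond a simple prefix strip.
ckpt = torch.load(CKPT, map_location="cpu", weights_only=False)
state_dict = {k.removeprefix("model."): v for k, v in ckpt["state"]["model"].items()}
missing, unexpected = from_ckpt.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)

# Compare the two models on the same input.
tok = AutoTokenizer.from_pretrained(HF_MODEL)
inputs = tok("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    same = torch.allclose(
        released(**inputs).logits, from_ckpt(**inputs).logits, atol=1e-4
    )
print("outputs match:", same)
```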

Thanks,
Josh

P.S. In my opinion the preferred approach for publishing checkpoints is what, e.g., Pythia does: branches are used for the different revisions, so the normal Hugging Face .from_pretrained(model_name, revision=...) can be used. Currently I am hackily pushing weights from a checkpoint into the model (as sketched above) and hoping that the mapping is correct. (Am I doing that wrong?)
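
To illustrate the Pythia-style setup (the revision name below is one of the published Pythia step branches; the ModernBERT equivalent is hypothetical and does not exist today, which is exactly the ask):

```python
from transformers import AutoModelForCausalLM

# Pythia publishes intermediate checkpoints as branches, so any training step
# loads through the standard API:
pythia = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-160m", revision="step3000"
)

# The hoped-for equivalent here (hypothetical revision name; no such branches
# exist on the ModernBERT repos as far as I can tell):
# from transformers import AutoModelForMaskedLM
# modernbert = AutoModelForMaskedLM.from_pretrained(
#     "answerdotai/ModernBERT-base", revision="step-ba4000"
# )
```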
