Questions about checkpoints

#82
by jsrozner - opened

Hi,
I am working on a research project and have a few questions regarding the checkpoints (https://huggingface.co/answerdotai/ModernBERT-base-training-checkpoints/tree/main):

  1. Is pretrain/ep0-ba4000-rank0.pt the earliest checkpoint?
  2. Is learning-rate-decay/ep0-ba10598-rank0.pt the last checkpoint?

Additionally:

  • How can I load the untrained model, i.e., after weight initialization but before any training steps?
  • If (2) is correct and that is the final checkpoint, I would expect it to match the released model, but when I load it the behavior is not identical to loading directly from "answerdotai/ModernBERT" (a rough sketch of my comparison is below). So is it actually not the final checkpoint?
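
For reference, here is roughly how I am loading and comparing. It assumes the .pt files are Composer checkpoints with the weights under state["model"], that stripping a "model." prefix is enough to line the keys up with the HF module, and that answerdotai/ModernBERT-base is the right released model to compare against (all of those are guesses on my part):

```python
import torch
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

HF_MODEL = "answerdotai/ModernBERT-base"  # assuming this is the released model meant above
CKPT = "learning-rate-decay/ep0-ba10598-rank0.pt"

# Released model, loaded the normal way.
released = AutoModelForMaskedLM.from_pretrained(HF_MODEL).eval()

# Same architecture, randomly initialized, then filled from the checkpoint.
config = AutoConfig.from_pretrained(HF_MODEL)
from_ckpt = AutoModelForMaskedLM.from_config(config).eval()

# Guess: Composer-style checkpoints keep the model weights under state["model"],
# and the keys carry a "model." wrapper prefix. The key names from the training
# codebase may also need remapping beyond a simple prefix strip.
ckpt = torch.load(CKPT, map_location="cpu", weights_only=False)
state_dict = {k.removeprefix("model."): v for k, v in ckpt["state"]["model"].items()}
missing, unexpected = from_ckpt.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)

# Compare the two models on the same input.
tok = AutoTokenizer.from_pretrained(HF_MODEL)
inputs = tok("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    same = torch.allclose(
        released(**inputs).logits, from_ckpt(**inputs).logits, atol=1e-4
    )
print("outputs match:", same)
```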

Thanks,
Josh

P.S. In my opinion the preferred approach for publishing checkpoints is what, e.g., Pythia does: branches are used for the different revisions, so the normal Hugging Face .from_pretrained(model_name, revision=...) can be used. Currently I am hackily pushing weights from a checkpoint into the model (as sketched above) and hoping that the mapping is correct. (Am I doing that wrong?)
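
To illustrate the Pythia-style setup (the revision name below is one of the published Pythia step branches; the ModernBERT equivalent is hypothetical and does not exist today, which is exactly the ask):

```python
from transformers import AutoModelForCausalLM

# Pythia publishes intermediate checkpoints as branches, so any training step
# loads through the standard API:
pythia = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-160m", revision="step3000"
)

# The hoped-for equivalent here (hypothetical revision name; no such branches
# exist on the ModernBERT repos as far as I can tell):
# from transformers import AutoModelForMaskedLM
# modernbert = AutoModelForMaskedLM.from_pretrained(
#     "answerdotai/ModernBERT-base", revision="step-ba4000"
# )
```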
