MLM Loss

#48

by ViacheslavBG - opened Jul 23, 2023

Good day

I've noticed that the MLM loss of BERT (bert-base-uncased) is approximately 2.5 on the Wikipedia dataset, which it was trained on. However, the original paper reported a perplexity of ~4, i.e. a loss of ~1.38.
I continued training it with the run_mlm.py script, and the MLM loss decreased to 1.8 after 10,000 steps.
Can anybody explain why this checkpoint has such a high loss?
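
For reference, here is a small sketch of how I am converting between the two figures. It assumes the reported loss is the mean cross-entropy over masked tokens in nats (natural log), so that perplexity = exp(loss). The function names are just illustrative helpers, not part of run_mlm.py.

```python
import math


def perplexity_to_loss(perplexity: float) -> float:
    """Cross-entropy loss in nats for a given perplexity (assumes natural log)."""
    return math.log(perplexity)


def loss_to_perplexity(loss: float) -> float:
    """Perplexity for a given mean cross-entropy loss in nats."""
    return math.exp(loss)


# Figures quoted in this post.
paper_perplexity = 4.0   # approximate value attributed to the paper
checkpoint_loss = 2.5    # loss observed with the released checkpoint
continued_loss = 1.8     # loss after 10,000 further steps

print(f"paper: loss = {perplexity_to_loss(paper_perplexity):.3f}")
print(f"checkpoint: perplexity = {loss_to_perplexity(checkpoint_loss):.2f}")
print(f"after further training: perplexity = {loss_to_perplexity(continued_loss):.2f}")
```

Under that assumption, a loss of 2.5 corresponds to a perplexity of about 12.2 and a loss of 1.8 to about 6.0, against the ~4 from the paper.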
