
Is the loss metric summed instead of averaged?

#2
by julbo - opened

First, thanks for this amazing model!

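I'm under the impression that the loss metrics in the training stats are summed instead of averaged: unlike with CamemBERT or CamemBERTa, when I use a validation set of 20%, the reported training-set loss is 4x higher than the validation-set loss. That factor matches the 80/20 ratio between the two splits, which is what you would expect if per-sample losses were summed rather than averaged.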

This is a bit misleading (see the screenshot), especially compared with other models, where the losses are averaged.

Screenshot from 2025-05-16 22-08-54.png
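
To illustrate the suspicion, here is a minimal sketch in plain Python. It is not the model's actual training code: the per-sample losses are made-up values and the helper names `summed_loss` and `mean_loss` are hypothetical. It only shows that with an 80/20 split, a summed loss gives a train/validation ratio near 4, while an averaged loss gives a ratio near 1.

```python
import random

def summed_loss(losses):
    # Total loss over the split: grows with the number of samples.
    return sum(losses)

def mean_loss(losses):
    # Average loss per sample: independent of the split size.
    return sum(losses) / len(losses)

random.seed(0)
# Hypothetical per-sample losses, all drawn from the same distribution,
# so both splits have roughly the same loss per sample.
per_sample = [random.uniform(1.9, 2.1) for _ in range(1000)]

# 80% training set, 20% validation set.
train, val = per_sample[:800], per_sample[800:]

sum_ratio = summed_loss(train) / summed_loss(val)
mean_ratio = mean_loss(train) / mean_loss(val)

print(f"summed:   train/val = {sum_ratio:.2f}")
print(f"averaged: train/val = {mean_ratio:.2f}")
```

If the reported numbers behave like the first ratio rather than the second, the reduction is probably a sum (or the two splits are normalized differently).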

Thanks
