Warning about this not being scaled for pretraining with google/electra-small-discriminator?

by nthngdy - opened Oct 6, 2022

Oct 6, 2022

Hello!

I have been trying to reproduce the results for some time, and the loss collapsed in every experiment. I did not check the size of the generator, as I trusted this checkpoint to be scaled properly for the corresponding google/electra-small-discriminator model.

It turns out this model is actually scaled like the ELECTRA-Small++ model, which explains the collapse.
I think a warning in the model card explaining this would benefit everyone and make the card less misleading. Could you please consider adding such a disclaimer?
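
For anyone who wants to check the relative scaling before pretraining, here is a minimal sketch of the comparison I should have done first. The helper names (`size_ratios`, `load_hub_configs`) are hypothetical, the field list is my own choice of size-related config attributes, and the numbers in the usage comment are made up for illustration, not the actual values of these checkpoints.

```python
from types import SimpleNamespace

# Size-related attributes to compare (an assumed selection, not exhaustive).
SIZE_FIELDS = (
    "hidden_size",
    "intermediate_size",
    "num_attention_heads",
    "num_hidden_layers",
)


def size_ratios(generator_cfg, discriminator_cfg, fields=SIZE_FIELDS):
    """Return the generator/discriminator ratio for each size field."""
    return {
        field: getattr(generator_cfg, field) / getattr(discriminator_cfg, field)
        for field in fields
    }


def load_hub_configs(
    generator_name="google/electra-small-generator",
    discriminator_name="google/electra-small-discriminator",
):
    """Fetch both configs from the Hub (needs `transformers` and network access)."""
    from transformers import AutoConfig  # imported lazily so size_ratios works offline

    return (
        AutoConfig.from_pretrained(generator_name),
        AutoConfig.from_pretrained(discriminator_name),
    )


# Usage with made-up configs (illustrative values only):
# gen = SimpleNamespace(hidden_size=64, intermediate_size=256,
#                       num_attention_heads=1, num_hidden_layers=12)
# disc = SimpleNamespace(hidden_size=256, intermediate_size=1024,
#                        num_attention_heads=4, num_hidden_layers=12)
# size_ratios(gen, disc)
```

Calling `size_ratios(*load_hub_configs())` prints the ratios for the real checkpoints. A ratio of 1.0 on the width fields means the generator is as wide as the discriminator, which is worth knowing before launching a run.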

Thank you again for sharing the weights here,
Nathan

lysandre

Oct 6, 2022

Hey @nthngdy, thanks for your feedback! Feel free to open a PR adding whatever you think would make the model card less misleading, and I'd be happy to merge it.

nthngdy

Oct 6, 2022

To be honest, I am still confused about why this generator is not scaled properly. Do you have any idea?
