Model outputs NaNs

#17
by oolongie - opened

I've been trying to fine-tune gemma-2-2b. Training works fine locally on CPU, but on two of the clusters I have access to, the model outputs NaNs. I'm using a Singularity container on both clusters and the same fine-tuning dataset, so the only likely difference is that the clusters run on GPUs. What could cause this behaviour?

Okay, I've been able to find the cause and a solution. The issue is caused by padding; several possible solutions are proposed here.
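For anyone landing here later, this is a minimal sketch of one common way padding can turn into NaNs. It is an assumption on my part that this mechanism applies to this case; the thread does not confirm it. If every position in an attention row is masked with `-inf` (which can happen for padding tokens), the softmax over that row becomes `nan`. The helper names `softmax` and `masked_softmax` are hypothetical and written in plain Python for illustration; they are not part of any library.

```python
import math

NEG_INF = float("-inf")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(scores, mask, fill=NEG_INF):
    """Replace masked positions (mask == 0) with `fill`, then apply softmax."""
    masked = [s if keep else fill for s, keep in zip(scores, mask)]
    return softmax(masked)


scores = [0.5, 1.0, -0.3]

# A row that attends to at least one real token is fine:
# the padded position simply gets probability 0.
normal = masked_softmax(scores, [1, 1, 0])

# A row where every position is padding: (-inf) - (-inf) is nan,
# so every probability in the row becomes nan.
fully_masked = masked_softmax(scores, [0, 0, 0])

# Using a large finite negative fill instead of -inf keeps the row finite
# (it degrades to a uniform distribution over the masked positions).
finite_fill = masked_softmax(scores, [0, 0, 0], fill=-1e9)

print(normal)
print(fully_masked)
print(finite_fill)
```

Once a single row is `nan`, it propagates through the following layers and into the loss, which would match seeing NaNs only in some setups, for example when a different attention implementation or precision is used on GPU.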

oolongie changed discussion status to closed
