Model outputs NaNs

#17

by oolongie - opened Aug 5, 2024

Aug 5, 2024

I've been trying to fine-tune gemma-2-2b, and it works fine when I train it locally on CPU. However, on two of the clusters I have available, the model outputs NaNs. I'm using a Singularity container on both clusters and the same fine-tuning dataset, so the only likely difference is that the clusters run on GPUs. What could cause this behaviour?

oolongie

Aug 5, 2024

Okay, so I've been able to find the cause and a solution. The issue is caused by padding; several possible solutions are proposed here.
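For anyone who lands here later, a minimal sketch of one common way padding can lead to NaNs. It is an assumption on my part that this is the relevant mechanism, and I have not confirmed it inside gemma-2-2b: if a row of attention scores is masked entirely with `-inf` (for example, a padding token that can only attend to other padding), the softmax over that row is undefined and comes out as NaN. The helper names (`softmax`, `mask_scores`, `has_nan`) and the fill value `-1e9` are hypothetical and chosen only for illustration; this is plain Python, not the library's code.

```python
import math

def softmax(row):
    # Numerically stable softmax: subtract the row maximum before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def mask_scores(scores, keep, fill):
    # Replace every score whose position is padding (keep == False) with `fill`.
    return [s if k else fill for s, k in zip(scores, keep)]

def has_nan(row):
    return any(math.isnan(x) for x in row)

scores = [0.5, 1.0, -0.3, 2.0]

# Hypothetical masks: True marks a real token, False marks padding.
all_padding = [False, False, False, False]
some_real = [False, False, True, True]

# Every position masked with -inf: (-inf) - (-inf) is NaN, so the whole row is NaN.
nan_row = softmax(mask_scores(scores, all_padding, float("-inf")))

# Same row with a large finite fill value (an assumed workaround): the output
# is a uniform distribution instead of NaN.
finite_row = softmax(mask_scores(scores, all_padding, -1e9))

# At least one real token: -inf masking is harmless, padded positions get 0.
normal_row = softmax(mask_scores(scores, some_real, float("-inf")))

print(has_nan(nan_row), has_nan(finite_row), has_nan(normal_row))
```

If this is what is happening in your run, a quick check is to look for NaNs in the outputs of padded positions only, and to try the opposite padding side or a finite mask value to see whether they disappear.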

oolongie changed discussion status to closed Aug 5, 2024
