Float32 training

by Iker - opened Jun 28, 2024

Jun 28, 2024

Hello!

First of all, thank you for your work; I find it very interesting and useful!

While reading through the documentation, I found that you have trained the model using float32:

"We use DeepSpeed with full float32 training."

I find this surprising, as most models are trained with bfloat16. I was curious about why you made this decision. Was the model performance better with float32 than bfloat16?

carlosep93

Jun 28, 2024

Hello!

Thank you for your interest in our model.

At the beginning of the project, we observed some training instability when using DeepSpeed with fp16. Since fp32 didn't cause any memory bottlenecks in our configuration, we decided to train in fp32.
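As background for readers comparing the formats mentioned in this thread, here is a minimal sketch of how the representable range differs between fp16, bf16, and fp32. It is not taken from the model's training code, and it does not claim to explain the specific instability described above; the helper names `max_finite` and `min_normal` are hypothetical and only for illustration. The bit layouts are the standard ones (fp16: 5 exponent bits and 10 mantissa bits; bf16: 8 and 7; fp32: 8 and 23).

```python
# Illustrative only: compute the dynamic range of common training dtypes
# from their bit layouts. Helper names are hypothetical.

def max_finite(exponent_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-754-style binary format."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent is reserved for inf/NaN, so the largest
    # usable unbiased exponent equals the bias.
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias


def min_normal(exponent_bits: int) -> float:
    """Smallest positive normal value of the format."""
    bias = 2 ** (exponent_bits - 1) - 1
    return 2.0 ** (1 - bias)


# (exponent bits, mantissa bits) for each format
FORMATS = {"fp16": (5, 10), "bf16": (8, 7), "fp32": (8, 23)}

for name, (e_bits, m_bits) in FORMATS.items():
    print(f"{name}: max={max_finite(e_bits, m_bits):.4e}, "
          f"min normal={min_normal(e_bits):.4e}, "
          f"mantissa bits={m_bits}")
```

The sketch shows that fp16 tops out at 65504, while bf16 shares the exponent range of fp32 at the cost of fewer mantissa bits. Values that exceed the fp16 range overflow, which is one commonly cited reason fp16 training relies on loss scaling.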

We continued working on the problems, and they are solved now. Future models will be trained using fp16 instead.
