Instructions for using nvidia/NV-Embed-v1 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- sentence-transformers
How to use nvidia/NV-Embed-v1 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nvidia/NV-Embed-v1", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```
(A retrieval-oriented follow-up sketch appears after the notebook links below.)
- Notebooks
- Google Colab
- Kaggle
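The snippet above stops at pairwise similarity; the same embeddings can also drive retrieval. Here is a minimal sketch using sentence_transformers.util.semantic_search, where the corpus, query, and top_k value are illustrative assumptions rather than part of the model card:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nvidia/NV-Embed-v1", trust_remote_code=True)

# Illustrative corpus and query (assumptions, not from the model card).
corpus = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
query = "What is the weather like?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```

Note that NV-Embed models are typically used with a task instruction prepended to retrieval queries; see the model card for the exact prompt format.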
Weights are in FP16 (loaded in FP32) but paper mentions BF16
#17
by AdrienC
The paper mentions that training was done in BF16 (as one would expect with a Mistral model); however, the safetensors files are in float16, and config.json loads the weights in float32. I would expect that saving BF16 weights in FP16 could lead to overflows, since FP16 has a much narrower exponent range than BF16.
Could you give us more details on how to load and potentially fine-tune this model without running into issues?
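Not an official answer, but a minimal sketch of sidestepping the FP16 range concern by forcing BF16 at load time. It assumes the checkpoint values survived the original BF16-to-FP16 save; model_kwargs is forwarded to transformers' from_pretrained:

```python
import torch
from sentence_transformers import SentenceTransformer

# Assumption: loading in BF16 restores the training dtype and avoids
# FP16 overflow during inference or fine-tuning. model_kwargs is
# passed through to transformers' from_pretrained().
model = SentenceTransformer(
    "nvidia/NV-Embed-v1",
    trust_remote_code=True,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# Sanity check: the underlying weights should now be in BF16.
print(next(model.parameters()).dtype)  # expect torch.bfloat16
```

The same effect with plain transformers would be AutoModel.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True, torch_dtype=torch.bfloat16). One caveat: any values that already overflowed to inf when the weights were saved in FP16 cannot be recovered by casting back to BF16.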