Why are the released weights in fp32?
#7 by BootsofLagrangian
The paper mentions using PyTorch AMP (bfloat16) for training, but the model is released in float32.
Is there a specific reason for this? I assume you released the master weights directly (maybe for exact reproducibility, or following OLMo's checkpointing style)?
Just curious, since most recent models are usually released as 2-byte tensors (bf16/fp16).
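In case it helps anyone else in the meantime, here's a minimal sketch of how I'm downcasting the fp32 release at load time (assuming the standard Transformers loading path; the repo id below is just a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the fp32 checkpoint but cast the weights to bf16 on the fly,
# roughly halving memory for inference.
model = AutoModelForCausalLM.from_pretrained(
    "allenai/your-model-here",  # placeholder repo id
    torch_dtype=torch.bfloat16,
)

# Optionally save a 2-byte copy locally to avoid re-downloading the fp32 shards.
model.save_pretrained("./model-bf16")
```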
P.S. Thanks for extending the max context length to 36k! I previously asked about the 4k limit in Molmo (https://huggingface.co/allenai/Molmo-72B-0924/discussions/15), so I'm really happy to see this massive upgrade.