@stas on Hugging Face: "PSA for DeepSpeed users - a long outstanding precision-related critical bug…"

PSA for DeepSpeed users - a long-standing, critical precision-related bug has been identified and fixed in https://github.com/deepspeedai/DeepSpeed/pull/8066 and a new release has been made.

The issue was that mixed precision mode downcast buffers that had to stay in fp32. With large static buffers this massively impacts correctness - e.g. RoPE in Qwen3 models at long sequence lengths (32K+).
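To get a feel for the size of the effect, here is a rough stdlib-only sketch that rounds a RoPE inverse-frequency table to bf16 and measures how far the rotation angle drifts at a long position. It does not reproduce the DeepSpeed code path: `to_bf16` is a hypothetical helper that emulates bf16 rounding, and the head dimension (128), RoPE base (1e6) and position (32768) are illustrative assumptions, not values read from any model config.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Hypothetical helper: bf16 is the top 16 bits of an fp32 value,
    so we round the fp32 bit pattern and clear the low 16 bits.
    Only meant for positive finite inputs, as used below.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias, ties to even
    bits &= 0xFFFF0000                   # drop the 16 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Illustrative assumptions, not taken from a real model config.
HEAD_DIM = 128
ROPE_BASE = 1_000_000.0
POSITION = 32_768

# Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2).
inv_freq = [ROPE_BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]

# What the buffer holds after being downcast to bf16.
inv_freq_bf16 = [to_bf16(f) for f in inv_freq]

# Rotation angle is position * inv_freq, so the buffer's rounding
# error is multiplied by the position index.
angle_err = [POSITION * abs(a - b) for a, b in zip(inv_freq, inv_freq_bf16)]
max_angle_err = max(angle_err)

print(f"max angle error at position {POSITION}: {max_angle_err:.3f} rad")
```

Since a full rotation is only about 6.28 rad, an angle error of even one radian means the affected frequency components encode a noticeably different position. Upcasting the buffer again at use time does not help, because the precision was already lost when the buffer was stored in bf16.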

Hopefully this fix brings DeepSpeed close to parity with FSDP2, which has been an issue for a long time.

You can still get the old behavior, but you now need to configure it manually - by default the model's buffers will remain in their original precision.

Please install deepspeed==0.19.2 which will do the right thing.
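If you are not sure which version an environment has, a small check along these lines may help. It is only a sketch: `release_tuple` and `has_buffer_fix` are hypothetical helper names, the comparison assumes that releases after 0.19.2 also carry the fix, and it looks only at the leading numeric part of the version string, so pre-release or local builds may need a manual check.

```python
import re
from importlib import metadata

FIXED_IN = (0, 19, 2)  # the release named in this post


def release_tuple(version: str) -> tuple:
    """Extract leading numeric components, e.g. '0.19.2+cu124' -> (0, 19, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_buffer_fix(version: str) -> bool:
    """True if the version is at or after the release with the fix (assumed)."""
    return release_tuple(version) >= FIXED_IN


try:
    installed = metadata.version("deepspeed")
    status = "has" if has_buffer_fix(installed) else "does NOT have"
    print(f"deepspeed {installed} {status} the buffer precision fix")
except metadata.PackageNotFoundError:
    print("deepspeed is not installed in this environment")
```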

Thanks to Tunji Ruwase and Claude Opus 4.8 via Cursor for identifying and fixing the problem.
