Fix For NaN Logits in HuggingFace Distribution of OpenELM

by jasonkrone - opened Sep 13, 2024

base: refs/heads/main ← from: refs/pr/3


jasonkrone • Sep 13, 2024 • edited Sep 13, 2024

I found that left-padding the inputs leads to NaN logits. The fix (credit to this thread) is to change the line `min_dtype = torch.finfo(dtype).min` to `min_dtype = torch.finfo(dtype).min / 2` in the function `_update_causal_mask`.
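For readers wondering why halving the constant can matter, here is a minimal sketch of one plausible mechanism. It is an assumption on my part, not something confirmed from the OpenELM code: if two additive mask terms that both use the dtype minimum end up summed for the same position, the result overflows to `-inf`, and a softmax over a row that is entirely `-inf` produces NaN. The sketch uses plain Python floats (float64) as a stand-in for the torch dtype, and the names `softmax`, `masked_row_probs`, and `FLOAT_MIN` are hypothetical helpers for illustration only.

```python
import math
import sys


def softmax(row):
    """Max-subtracted softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_row_probs(min_value, width=3):
    """Attention probabilities for a row in which every key is masked.

    Assumption (illustrative): two additive masks, both filled with
    min_value, are summed for each position of a left-padded row.
    """
    combined = [min_value + min_value for _ in range(width)]
    return softmax(combined)


# float64 analogue of torch.finfo(dtype).min
FLOAT_MIN = -sys.float_info.max

# min + min overflows to -inf, and (-inf) - (-inf) is NaN inside softmax.
before = masked_row_probs(FLOAT_MIN)

# min/2 + min/2 equals min, which is still finite, so softmax stays finite.
after = masked_row_probs(FLOAT_MIN / 2)

print("with min:    ", before)
print("with min / 2:", after)
```

With the halved constant the fully masked row degrades to a uniform distribution instead of NaN, which is harmless because those positions are padding.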

I presume all other OpenELM model sizes and variations require the same fix.

Note: the `if not is_tracing and torch.any(attention_mask != 1):` condition in `_update_causal_mask` appears to address the same issue. However, that mitigation is applied only when `self.config._attn_implementation == "sdpa"`, whereas the NaN logits also occur when `self.config._attn_implementation == "eager"`.

P.S. thanks for your work on OpenELM!

Fix For NaN Logits in HuggingFace Distribution of OpenELM (commit 13bd99e0)


Ready to merge

This branch is ready to get merged automatically.
