NaN outputs when running in torch.float16 with text containing specific tokens (e.g., "import")
Hello Qwen team,
I encountered a numerical instability issue when loading the model with dtype=torch.float16 using sentence-transformers.
I discovered this while processing code snippets. After bisecting the input data, I found that any text containing the word "import" causes the entire embedding output to become NaN. This suggests that an intermediate activation for this token overflows the FP16 range, and the resulting inf/NaN then propagates through the rest of the sequence.
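For context, FP16's largest finite value is 65504, while bfloat16 shares float32's exponent range, so an activation magnitude that overflows in FP16 can remain representable in BF16. A minimal illustration:

import torch

x = torch.tensor(70000.0)
print(x.to(torch.float16))               # inf -- exceeds FP16's max of 65504
print(x.to(torch.bfloat16))              # ~7.0e4, still finite
print(torch.finfo(torch.float16).max)    # 65504.0
print(torch.finfo(torch.bfloat16).max)   # ~3.39e38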
Reproduction Code:
from sentence_transformers import SentenceTransformer
import torch
# Problematic configuration: torch.float16
model = SentenceTransformer(
    "Qwen/Qwen3-Embedding-8B",
    model_kwargs={"dtype": torch.float16},
)
# Case 1: The single word fails
print("Single word:", model.encode("import"))
# Case 2: A sentence containing the word also fails (returns all NaNs)
print("Sentence:", model.encode("import numpy as np"))
Solution / Workaround:
Switching the dtype to bfloat16 resolves the issue and produces valid embeddings.
# Working configuration: torch.bfloat16
model = SentenceTransformer(
    "Qwen/Qwen3-Embedding-8B",
    model_kwargs={"dtype": torch.bfloat16},
)
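Running the same check as above (numpy imported as np), the bfloat16 embeddings come back finite:

emb = model.encode("import numpy as np")
print(np.isnan(emb).any())  # False -- valid embeddings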
Given that "import" is an extremely common keyword in code data, this issue significantly impacts the model's usability in FP16. I suggest adding a note to the Model Card recommending bfloat16 to avoid these overflow issues.
Thanks for the great model!