nvidia/llama-embed-nemotron-8b

Feature Extraction · sentence-transformers · sentence-similarity

ybabakhin committed on Nov 15, 2025

Commit 5129df8 (verified) · 1 parent: 07320ee

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -133,7 +133,7 @@ attn_implementation = "eager"  # Or "flash_attention_2"
 model = SentenceTransformer(
     "nvidia/llama-embed-nemotron-8b",
     trust_remote_code=True,
-    model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": "float32"},
+    model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": "bfloat16"},
     tokenizer_kwargs={"padding_side": "left"},
 )
@@ -152,7 +152,7 @@ document_embeddings = model.encode_document(documents)
 scores = (query_embeddings @ document_embeddings.T)
 print(scores.tolist())
-# [[0.37646484375, 0.057891845703125]]
+# [[0.3770667314529419, 0.05808388814330101]]
 ```
 Or using Hugging Face Transformers like here:
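This commit changes the `torch_dtype` passed through `model_kwargs` and updates the expected scores in the comment. The sketch below is a rough, self-contained illustration, not part of the README. It shows how numeric precision can nudge a similarity score: it emulates bfloat16 rounding in pure Python and compares a dot product (one entry of `query_embeddings @ document_embeddings.T`) before and after rounding the inputs. The helpers `to_bfloat16` and `dot` and the toy vectors are hypothetical. The sketch does not claim to explain the exact numbers in the diff, which come from the full model.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value, returned as a Python float.

    bfloat16 keeps the upper 16 bits of an IEEE 754 float32 (1 sign bit,
    8 exponent bits, 7 fraction bits). NaN and infinity are not handled.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, based on the 16 low bits being dropped.
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    # With the low 16 bits zeroed, the float32 is exactly a bfloat16 value.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product: one entry of query_embeddings @ document_embeddings.T."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy "embeddings"; real model embeddings are far longer.
query = [0.1234567, -0.7654321, 0.3333333, 0.5555555]
doc = [0.9876543, 0.1111111, -0.4444444, 0.2222222]

full = dot(query, doc)
# Only the inputs are rounded; products and the sum stay in double precision.
bf16 = dot([to_bfloat16(v) for v in query], [to_bfloat16(v) for v in doc])
print(f"full precision:  {full:.8f}")
print(f"bfloat16 inputs: {bf16:.8f}")
print(f"difference:      {abs(full - bf16):.2e}")
```

Because only the inputs are rounded here, this likely understates the drift from running every layer of a network at reduced precision.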