Instructions for using nvidia/NV-Embed-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use nvidia/NV-Embed-v1 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nvidia/NV-Embed-v1", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```

- Notebooks
- Google Colab
- Kaggle
Why do we need to hardcode self._attn_implementation = "eager"?
#35 · opened by shantanuagarwal
Thanks a lot for making the code public. Looking into the modeling_nvembed.py file, I notice two things:

- layer.self_attn.is_causal = False. This makes sense, as we want to enforce bi-directionality.
- However, what I don't understand is why we need to force the attention implementation to be "eager". Does that mean sdpa/flash_attention_2 are not supported?
What I am trying to understand is: what would need to change in BidirectionalMistralModel's forward to make it compatible with sdpa/flash_attention_2?
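For what it's worth, bi-directional attention does not inherently require the eager path: PyTorch's SDPA kernel attends over all positions when `is_causal=False` and no mask is passed. This is a minimal standalone sketch (not NV-Embed's actual code) comparing SDPA with a hand-rolled "eager" full-attention reference; the shapes and names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, T, D = 1, 2, 4, 8  # batch, heads, sequence length, head dim
q = torch.randn(B, H, T, D)
k = torch.randn(B, H, T, D)
v = torch.randn(B, H, T, D)

# SDPA with is_causal=False and no attn_mask: every position attends
# to every other position, i.e. bi-directional attention.
bi = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# Reference "eager" implementation: full softmax attention, no causal mask.
scores = q @ k.transpose(-2, -1) / D**0.5
ref = torch.softmax(scores, dim=-1) @ v

print(torch.allclose(bi, ref, atol=1e-5))
```

So the bi-directionality itself is expressible with SDPA; the open question is whether the model's forward builds its attention masks in a way the sdpa/flash_attention_2 code paths accept.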