Instructions for using nvidia/NV-Embed-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use nvidia/NV-Embed-v1 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nvidia/NV-Embed-v1", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```
- Notebooks
- Google Colab
- Kaggle
Getting "KeyError" when loading model
I have installed transformers from source using `pip install -q git+https://github.com/huggingface/transformers.git`.
When trying to load the model:

```python
model = AutoModel.from_pretrained("nvidia/NV-Embed-v1",
                                  trust_remote_code=True,
                                  token=token)
```

I get the following exception:

```
KeyError                                  Traceback (most recent call last)
Cell In[11], line 1
----> 1 model = AutoModel.from_pretrained("nvidia/NV-Embed-v1",
      2                                   trust_remote_code=True,
      3                                   token=token)

KeyError: 'NVEmbedConfig'
```
Any hints?
Thank you
Thank you for reporting the issue. Can you try upgrading your transformers package? For example, upgrade the Python packages as below:

```shell
pip uninstall -y transformer-engine
pip install torch==2.2.0
pip install transformers --upgrade
pip install flash-attn==2.2.0
```
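After upgrading, it can help to confirm which versions are actually active in the current environment. A minimal sketch using only the standard library (the package names below are the PyPI distribution names; missing packages are reported rather than raising):

```python
# Report installed versions of the packages relevant to this thread.
# Standard library only, so it runs in any Python 3.8+ environment.
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Map each distribution name to its installed version string."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    versions = report_versions(
        ["transformers", "torch", "flash-attn", "sentence-transformers"]
    )
    for name, ver in versions.items():
        print(f"{name}: {ver}")
```

Comparing this output against the versions suggested above narrows down whether the upgrade actually took effect (e.g. in a notebook that was not restarted).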
Same error: KeyError: 'NVEmbedConfig'. I have uninstalled and reinstalled the suggested libraries. I would like to use the model; any suggestions are appreciated.
For me, passing `token` triggers this issue. When I authenticate with `huggingface-cli login` instead, the issue goes away after forcing a re-download.
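For reference, one way to force a re-download is to remove the cached snapshot so the next `from_pretrained` call fetches fresh files. This is a sketch assuming the default Hugging Face cache layout (`~/.cache/huggingface/hub`, overridable via `HF_HOME`); adjust the path for your setup:

```shell
# Remove the cached copy of nvidia/NV-Embed-v1.
# Path assumes the default cache location; HF_HOME overrides it if set.
rm -rf "${HF_HOME:-$HOME/.cache/huggingface}/hub/models--nvidia--NV-Embed-v1"
```

Alternatively, `from_pretrained` accepts `force_download=True`, which re-fetches the files without touching the cache by hand.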
@nada5
Could you please post the exact versions under which it works?
I use CUDA version 11.8 (V11.8.89; this is fixed).
After the update, I have:

```
sentence-transformers==2.7.0
transformers==4.41.2
torch==2.2.0
flash-attn==2.2.0
```
but then an ImportError occurs when trying to load "nvidia/NV-Embed-v1":

```
      4 import torch.nn as nn
      6 # isort: off
      7 # We need to import the CUDA kernels after importing torch
----> 8 import flash_attn_2_cuda as flash_attn_cuda
     10 # isort: on
     13 def _get_block_size(device, head_dim, is_dropout, is_causal):
     14     # This should match the block sizes in the CUDA kernel
```
When I just try to update flash-attn with `!pip install --upgrade flash-attn --no-build-isolation` to flash-attn==2.5.9.post1, I still get the same ImportError. When I downgrade to torch==2.1.2 (which works fine with other HF models), I am back to KeyError: 'NVEmbedConfig'.
I got it to work. For me it required a newer CUDA version; it worked with cuda_12.1.r12.1.
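To summarize the environment dependency behind the ImportError: the prebuilt flash-attn binary (`flash_attn_2_cuda`) has to match the installed CUDA toolkit, and in this thread moving from 11.8 to 12.1 was the fix. A small sketch for comparing the version banners that `nvcc --version` prints (the helper name and the 12.1 threshold are illustrative, taken from this thread rather than an official requirement):

```python
# Parse CUDA version banners such as "cuda_12.1.r12.1" or "V11.8.89"
# into comparable (major, minor) tuples.
import re


def cuda_release(version_banner: str) -> tuple:
    """Extract (major, minor) from a CUDA version banner."""
    m = re.search(r"(\d+)\.(\d+)", version_banner)
    if not m:
        raise ValueError(f"no CUDA version found in {version_banner!r}")
    return (int(m.group(1)), int(m.group(2)))


print(cuda_release("cuda_12.1.r12.1"))      # (12, 1)
print(cuda_release("V11.8.89") >= (12, 1))  # False: the older toolkit that failed above
```

Tuple comparison makes the check readable: `(11, 8) >= (12, 1)` is False, matching the observation that the 11.8 toolkit produced the ImportError while 12.1 worked.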