Instructions to use nvidia/NV-Embed-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - sentence-transformers
How to use nvidia/NV-Embed-v1 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nvidia/NV-Embed-v1", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```

- Notebooks
- Google Colab
- Kaggle
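For reference, `model.similarity` in sentence-transformers defaults to cosine similarity between embedding rows. A minimal sketch of that computation, using small made-up vectors rather than real NV-Embed-v1 outputs:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an embedding matrix."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Toy 3 x 4 "embeddings" standing in for model.encode(sentences).
emb = np.array([
    [0.1, 0.9, 0.2, 0.0],
    [0.1, 0.8, 0.3, 0.1],
    [0.9, 0.1, 0.0, 0.2],
])

sims = cosine_similarity_matrix(emb)
print(sims.shape)  # (3, 3), one row/column per sentence
```

Each diagonal entry is 1.0 (every vector is maximally similar to itself), and off-diagonal entries rank how close the sentence pairs are.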
Multi-Lingual? #10
by dejanseo - opened
The tokenizer suggests a multilingual vocabulary. It would be interesting to hear more details about how much of your training data was non-English, and whether this is all just identical to the original Mistral. I will put it to the test soon on a large multilingual website, finding related pages for internal link recommendations.
I did test it, but honestly I can't tell the difference in embedding quality between NV-Embed-v1 and LaBSE. In fact, I think LaBSE is a little better at similarity mapping.