Instructions to use intfloat/multilingual-e5-large-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use intfloat/multilingual-e5-large-instruct with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```
- Transformers
How to use intfloat/multilingual-e5-large-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="intfloat/multilingual-e5-large-instruct")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large-instruct")
model = AutoModel.from_pretrained("intfloat/multilingual-e5-large-instruct")
```
- Inference
- Notebooks
- Google Colab
- Kaggle
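Note that E5-instruct models are conventionally queried with a one-line task instruction prepended to each query, while documents are encoded as-is. A minimal sketch of that convention, assuming the `Instruct: ... / Query: ...` prompt format and the `get_detailed_instruct` helper name commonly used with E5-instruct models:

```python
# Hedged sketch: E5-instruct queries carry a task instruction; documents do not.
def get_detailed_instruct(task_description: str, query: str) -> str:
    # Prepend the task description in the assumed E5-instruct prompt format.
    return f"Instruct: {task_description}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
queries = [
    get_detailed_instruct(task, "how much protein should a female eat"),
]
documents = [
    "As a general guideline, protein requirements vary with age and activity level.",
]

# Queries get the instruction prefix; documents are passed unchanged, e.g.:
# embeddings = model.encode(queries + documents, normalize_embeddings=True)
print(queries[0])
```

The single instruction string per task, rather than per query, keeps document embeddings reusable across tasks.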
Ensuring compatibility with the sentence-transformers library
I encountered the following error when attempting to load multilingual-e5-large-instruct using the sentence-transformers library:
`not found: multilingual-e5-large-instruct/sentence_xlnet_config.json`
Upon investigating, I noticed that the sentence-transformers library looks for additional model configuration files like `sentence_bert_config.json` during model loading, as shown in this code snippet:
(https://github.com/UKPLab/sentence-transformers/blob/c68bf68299a4435c6a48ea15d789fef596bf1444/sentence_transformers/models/Transformer.py#L527-L540)
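The probing behavior can be illustrated with a small sketch. The candidate list and helper below are illustrative, not the library's actual code; the real filename list lives in the linked `Transformer.py` snippet and may differ across versions. It also explains why the error message names `sentence_xlnet_config.json`: that is simply the last candidate tried.

```python
import os

# Illustrative candidate config filenames probed during model loading
# (assumed list; see the linked Transformer.py for the authoritative one).
CANDIDATES = [
    "sentence_bert_config.json",
    "sentence_roberta_config.json",
    "sentence_distilbert_config.json",
    "sentence_xlnet_config.json",
]

def find_sentence_config(model_dir: str) -> str:
    # Return the first candidate that exists in the model directory.
    for name in CANDIDATES:
        path = os.path.join(model_dir, name)
        if os.path.exists(path):
            return path
    # If none exists, the error reports the last name tried.
    raise FileNotFoundError(f"not found: {model_dir}/{CANDIDATES[-1]}")
```

A repository missing all of these files fails to load, which is exactly the situation this PR fixes.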
Additionally, other embedding models, such as bge-m3, also include this configuration file: https://huggingface.co/BAAI/bge-m3/blob/main/sentence_bert_config.json
To address this issue, I created the necessary `sentence_bert_config.json` file based on the xlm-roberta configuration.
sentence_bert_config.json:

```diff
@@ -0,0 +1,4 @@
+{
+    "max_seq_length": 512,
+    "do_lower_case": false
+}
```
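As a quick sanity check, the file created in this PR can be round-tripped through Python's `json` module to confirm it parses and carries the expected values. This is only a local verification sketch, not part of the PR itself:

```python
import json

# Recreate the four-line config added by this PR and verify it round-trips.
config = {"max_seq_length": 512, "do_lower_case": False}

with open("sentence_bert_config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)

with open("sentence_bert_config.json", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["max_seq_length"])  # 512
```

With this file present in the repository, `SentenceTransformer("intfloat/multilingual-e5-large-instruct")` should load without the "not found" error above.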