update README
README.md
CHANGED
@@ -3330,7 +3330,7 @@ model = AutoModel.from_pretrained('Salesforce/SFR-Embedding-Mistral')
 # get the embeddings
 max_length = 4096
 input_texts = queries + passages
-batch_dict = tokenizer(input_texts, max_length=max_length
+batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors="pt")
 outputs = model(**batch_dict)
 embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
 
@@ -3369,7 +3369,9 @@ print(scores.tolist())
 # [[86.71537780761719, 36.645721435546875], [35.00497055053711, 82.07388305664062]]
 ```
 
-
+### MTEB Benchmark Evaluation
+Check out [unilm/e5](https://github.com/microsoft/unilm/tree/master/e5) to reproduce evaluation results on the [BEIR](https://arxiv.org/abs/2104.08663) and [MTEB](https://arxiv.org/abs/2210.07316) benchmark.
+
 
 SFR-Embedding Team (∗indicates lead contributors).
 * Rui Meng*
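The fix above completes the tokenizer call, adding `padding=True` so batched sequences share one length. Once sequences are right-padded, the `last_token_pool` helper that the README's snippet calls must select the final non-padding position of each sequence rather than simply taking position -1. A minimal pure-Python sketch of that pooling idea (toy nested lists stand in for model tensors; the actual helper operates on PyTorch tensors and also handles left padding):

```python
def last_token_pool(last_hidden_states, attention_mask):
    """Pick the hidden state of the last non-padding token per sequence."""
    pooled = []
    for hidden, mask in zip(last_hidden_states, attention_mask):
        last = sum(mask) - 1  # index of the final non-padding token
        pooled.append(hidden[last])
    return pooled

# two sequences of length 4; the second is right-padded after 2 tokens
hidden = [
    [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]],
    [[4.0, 4.1], [5.0, 5.1], [6.0, 6.1], [7.0, 7.1]],
]
mask = [[1, 1, 1, 1], [1, 1, 0, 0]]
embeddings = last_token_pool(hidden, mask)
# → [[3.0, 3.1], [5.0, 5.1]]
```

Without the attention-mask lookup, the padded second sequence would be pooled from a padding position, which is why the tokenizer call and the pooling function have to agree on the padding scheme.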