BAAI/bge-m3

Tags: Sentence Similarity · sentence-transformers · feature-extraction · text-embeddings-inference


Shitao committed on Feb 11, 2024

Commit cfdb103 · verified · 1 Parent(s): aa47896

Update README.md

Files changed (1)

README.md +7 -6

README.md CHANGED

@@ -215,11 +215,6 @@ print(model.compute_score(sentence_pairs,
 We compare BGE-M3 with some popular methods, including BM25, openAI embedding, etc.
-We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
-To make the BM25 and BGE-M3 more comparable, in the experiment,
-BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-Roberta).
-Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
 - Multilingual (Miracl dataset)
@@ -242,6 +237,12 @@ Using the same vocabulary can also ensure that both approaches have the same ret
   - NarritiveQA:
   ![avatar](./imgs/nqa.jpg)
+- BM25
+We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+![avatar](./imgs/bm25.jpg)
 ## Training
 - Self-knowledge Distillation: combining multiple outputs from different
@@ -259,7 +260,7 @@ Refer to our [report](https://arxiv.org/pdf/2402.03216.pdf) for more details.
 ## Acknowledgement
 Thanks the authors of open-sourced datasets, including Miracl, MKQA, NarritiveQA, etc.
-Thanks the open-sourced libraries like [Tevatron](https://github.com/texttron/tevatron), [pyserial](https://github.com/pyserial/pyserial).
+Thanks the open-sourced libraries like [Tevatron](https://github.com/texttron/tevatron), [Pyserini](https://github.com/castorini/pyserini).
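The lines removed in this commit explain that the BM25 baseline was run with BGE-M3's own tokenizer (the XLM-RoBERTa tokenizer), so that the lexical and dense methods share one vocabulary. To show where such a shared tokenizer plugs in, here is a minimal pure-Python Okapi BM25 sketch that accepts any `tokenize` function. It is not Pyserini's implementation. The `BM25` class, the whitespace stand-in tokenizer, the Lucene-style IDF, and the parameter values k1=0.9 and b=0.4 are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable, List


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer for this sketch (assumption). To mirror the README's setup,
    # one could pass a subword tokenizer instead, e.g. (requires `transformers`):
    #   AutoTokenizer.from_pretrained("BAAI/bge-m3").tokenize
    return text.lower().split()


class BM25:
    """Okapi BM25 over a user-supplied tokenizer (illustrative sketch, not Pyserini)."""

    def __init__(self, docs: List[str], tokenize: Callable[[str], List[str]],
                 k1: float = 0.9, b: float = 0.4):
        self.tokenize = tokenize
        self.k1, self.b = k1, b
        # Per-document term frequencies and lengths, measured in tokenizer units.
        self.doc_tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.doc_tfs]
        self.avgdl = sum(self.doc_lens) / len(docs)
        # Document frequency: number of documents containing each token.
        df = Counter()
        for tf in self.doc_tfs:
            df.update(tf.keys())
        n = len(docs)
        # Lucene-style IDF, which stays non-negative even for very common tokens.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, idx: int) -> float:
        tf, dl = self.doc_tfs[idx], self.doc_lens[idx]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for token in self.tokenize(query):
            f = tf.get(token, 0)
            if f:
                total += self.idf[token] * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query: str) -> List[int]:
        # Document indices ordered from best to worst match.
        scores = [self.score(query, i) for i in range(len(self.doc_tfs))]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


docs = [
    "BGE-M3 supports dense sparse and multi-vector retrieval",
    "BM25 is a lexical retrieval baseline",
    "Pyserini provides reproducible BM25 runs",
]
bm25 = BM25(docs, whitespace_tokenize)
print(bm25.rank("bm25 baseline"))  # → [1, 2, 0]
```

Because the tokenizer is a parameter, swapping the whitespace stand-in for the XLM-RoBERTa tokenizer changes only the token inventory; the scoring code stays the same.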