Tags: Sentence Similarity, sentence-transformers, PyTorch, Safetensors, xlm-roberta, feature-extraction, text-embeddings-inference
How to use BAAI/bge-m3-unsupervised with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3-unsupervised")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
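`model.similarity` scores every pair of embeddings. Assuming the checkpoint uses the sentence-transformers default similarity function (cosine), the 4×4 matrix in the snippet corresponds to the NumPy sketch below. The helper name `cosine_similarity_matrix` and the toy vectors are illustrative only, not part of the library.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    # L2-normalise every row; the epsilon guards against all-zero vectors
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    # Dot products of unit vectors are cosine similarities
    return a_unit @ b_unit.T


# Toy 3-d vectors standing in for the output of model.encode(...)
emb = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]])
sims = cosine_similarity_matrix(emb, emb)
print(sims.shape)  # (3, 3)
```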
Low performance on this checkpoint (#1)
Opened by pxyu
Hi,
I am running some experiments with the BGE-M3 family of models to test the impact of unsupervised pre-training. Here are some results (Recall@100 on MIRACL, reported ×1000, e.g. 722 = 0.722):
| MODEL | DE | EN | ES |
|---|---|---|---|
| XLMR + 60M CC News data | 722 | 721 | 763 |
| BGE RETRO + 60M CC News data | 772 | 774 | 789 |
| BGE Unsupervised (this repo) | 727 | 758 | 668 |
| BGE M3 | 908 | 907 | 902 |
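For reference, Recall@100 is the fraction of a query's relevant passages that appear among the top 100 retrieved results, averaged over queries. A minimal sketch, assuming rankings and relevance judgments are held in plain dicts; the function name and data layout are hypothetical, and MIRACL's official evaluation tooling may differ in details such as how queries without judgments are handled:

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 100) -> float:
    """Mean Recall@k over queries that have at least one relevant passage."""
    per_query = []
    for qid, relevant in qrels.items():
        if not relevant:
            continue  # recall is undefined when nothing is relevant
        top_k = set(ranked.get(qid, [])[:k])  # a missing ranking counts as zero recall
        per_query.append(len(top_k & relevant) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Toy example with k=2 so the arithmetic is easy to follow
ranked = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d9"}}
score = recall_at_k(ranked, qrels, k=2)
print(score, round(score * 1000))  # 0.25 250 (the table's x1000 scale)
```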
The third row, BGE Unsupervised, is clearly an outlier: the unsupervised pre-training done on your side scores below my BGE RETRO + 60M CC News run on all three languages, and on ES it even falls well below my XLMR + 60M CC News run (668 vs. 763). I wonder whether the wrong checkpoint was uploaded, or whether I am not using or evaluating this checkpoint correctly.
Thanks.
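On the question of whether the checkpoint is being used correctly, one variable worth ruling out is pooling. The BGE-M3 paper describes its dense embedding as the normalized hidden state of the [CLS] token; assuming this unsupervised checkpoint follows the same convention (not confirmed here), evaluating it with mean pooling would produce different vectors. In sentence-transformers the pooling mode comes from the checkpoint's Pooling module configuration. The sketch below contrasts the two strategies on toy hidden states; `l2_normalise`, `cls_pool`, and `mean_pool` are hypothetical helpers, not library functions.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (epsilon avoids division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)


def cls_pool(hidden: np.ndarray) -> np.ndarray:
    """Use the first-token ([CLS]/<s>) vector of each sequence."""
    return l2_normalise(hidden[:, 0, :])


def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average the non-padding token vectors of each sequence."""
    m = mask[:, :, None].astype(hidden.dtype)  # (batch, seq, 1)
    summed = (hidden * m).sum(axis=1)
    counts = np.maximum(m.sum(axis=1), 1e-12)
    return l2_normalise(summed / counts)


# Toy hidden states: batch=2, seq_len=3, dim=2; the last token of the
# second sequence is padding and must be ignored by mean pooling
hidden = np.array([
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0], [9.0, 9.0]],
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

cls_vecs = cls_pool(hidden)
mean_vecs = mean_pool(hidden, mask)
# Cosine similarity between the two poolings of the same input
print(np.round((cls_vecs * mean_vecs).sum(axis=1), 3))  # [0.447 0.707]
```

If the two poolings diverge this much on real inputs, a mismatch between training-time and evaluation-time pooling is one plausible contributor to a low R@100; it is a candidate to check, not a diagnosis.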