Instructions to use BAAI/bge-m3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use BAAI/bge-m3 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]

embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```
- Inference
- Notebooks
- Google Colab
- Kaggle
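The `model.similarity` call in the snippet above defaults to cosine similarity over the encoded vectors. A minimal NumPy sketch of the same computation, with toy embeddings standing in for real `model.encode(...)` output:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of an embedding matrix."""
    # Normalize each row to unit length, then take dot products.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# Toy 2-D embeddings standing in for model output.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = cosine_similarity_matrix(emb)
print(sims.shape)  # (3, 3)
```

Each diagonal entry is 1.0 (a vector compared with itself), and off-diagonal entries range over [-1, 1].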
How does Chinese dense-retrieval performance compare with BGE V1.5?
#3
by TianyuLLM - opened
Could the authors share a performance comparison with BGE V1.5?
Hi, we don't have a thorough comparison yet.
That said, since BGE-M3 is a multilingual model, its dense retrieval on any single language may not be clearly stronger than BGE v1.5. BGE-M3's advantages are its generality (multilingual, long text) and its hybrid retrieval, which offers better accuracy and generalization; hybrid-retrieval results should outperform BGE V1.5.
In short, choose based on your actual needs: use whichever model works better on your specific task.
Thank you very much for the answer. One more question: when using BGE-M3 for long-document retrieval, how should the retrieved candidates be reranked with BGE-Reranker (max_length=512)?
BGE-Reranker indeed does not support very long texts; we will release an updated bge-reranker later.
For now, you can try reranking directly with BGE-M3 by taking a weighted sum of the scores from its different retrieval modes; see the compute_score function: https://huggingface.co/BAAI/bge-m3#compute-score-for-text-pairs
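The weighted-fusion idea behind compute_score can be sketched in plain Python. The mode names and weights below are illustrative assumptions, not the library's actual defaults; in practice the three scores would come from BGE-M3's dense, sparse, and multi-vector (ColBERT-style) outputs:

```python
def fuse_scores(dense: float, sparse: float, colbert: float,
                weights=(0.4, 0.2, 0.4)) -> float:
    """Weighted sum of the three BGE-M3 retrieval-mode scores.

    The weights are hypothetical; tune them on your own data.
    """
    w_dense, w_sparse, w_colbert = weights
    return w_dense * dense + w_sparse * sparse + w_colbert * colbert

# Rerank candidates by fused score, highest first (toy scores).
candidates = [
    {"doc": "doc_a", "dense": 0.82, "sparse": 0.30, "colbert": 0.88},
    {"doc": "doc_b", "dense": 0.79, "sparse": 0.55, "colbert": 0.70},
]
reranked = sorted(
    candidates,
    key=lambda c: fuse_scores(c["dense"], c["sparse"], c["colbert"]),
    reverse=True,
)
print([c["doc"] for c in reranked])  # ['doc_a', 'doc_b']
```

Because all three scores come from a single BGE-M3 forward pass per pair, this kind of fusion avoids a separate reranker model while still mixing lexical and semantic signals.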