How to use BAAI/bge-m3 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
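In recent sentence-transformers versions, `model.similarity` scores pairs with cosine similarity by default (configurable through the model's `similarity_fn_name`). The pure-Python sketch below only illustrates that computation on made-up 3-dimensional toy vectors standing in for `model.encode(...)` output. The helper `cosine_similarity_matrix` is hypothetical and is not the library's implementation.

```python
import math


def cosine_similarity_matrix(a, b):
    """Return the matrix of cosine similarities between rows of a and rows of b."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    result = []
    for u in a:
        row = []
        for v in b:
            dot = sum(x * y for x, y in zip(u, v))
            row.append(dot / (norm(u) * norm(v)))
        result.append(row)
    return result


# Toy vectors standing in for real sentence embeddings (illustrative only).
embeddings = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
sims = cosine_similarity_matrix(embeddings, embeddings)
print(len(sims), len(sims[0]))  # 3 3
print(round(sims[0][1], 3))  # 0.8
```

As with the 4 × 4 result above, comparing a set of embeddings with itself yields a square matrix whose diagonal entries are 1.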
Fine-tuning BGE-M3 for NLI?
#121
by weiminw - opened
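Is it possible to post-train BGE-M3 for NLI on datasets such as SNLI, MNLI, and FEVER, together with some synthetic NLI datasets? Or should I fine-tune starting from the reranker model instead? I need to train a model to recognize entailment relations in complex clause-based documents, for example between contract clauses or between process and policy rules. Any advice would be much appreciated. The other BERT models I have looked at either have too small a context window (512 tokens) or do not support Chinese.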
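The thread does not say how BGE-M3 itself was trained. One common recipe for training embedding models on NLI data (used, for example, in supervised SimCSE, and available in sentence-transformers as `MultipleNegativesRankingLoss`) turns NLI pairs into (premise, entailed hypothesis, contradicted hypothesis) triplets and optimizes an InfoNCE loss with in-batch negatives. The pure-Python sketch below shows only the data preparation and the loss on toy vectors. `build_triplets` and `info_nce_loss` are hypothetical helpers, the label strings are assumed to follow SNLI/MNLI conventions, and real training would compute the vectors with the model and backpropagate through a framework such as PyTorch.

```python
import math
from collections import defaultdict


def build_triplets(rows):
    """Turn (premise, hypothesis, label) rows into (anchor, positive, hard_negative) triplets.

    Assumes SNLI/MNLI-style string labels; "neutral" pairs are skipped in this sketch.
    """
    groups = defaultdict(lambda: {"entailment": [], "contradiction": []})
    for premise, hypothesis, label in rows:
        if label in ("entailment", "contradiction"):
            groups[premise][label].append(hypothesis)
    triplets = []
    for premise, by_label in groups.items():
        for positive in by_label["entailment"]:
            for negative in by_label["contradiction"]:
                triplets.append((premise, positive, negative))
    return triplets


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def info_nce_loss(anchors, positives, negatives, temperature=0.05):
    """Mean InfoNCE loss: for anchor i the target is positives[i]; every other
    positive in the batch and every hard negative act as negatives.
    temperature=0.05 corresponds to a logit scale of 20."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Made-up example rows in SNLI-like form.
rows = [
    ("The tenant pays rent monthly.", "Rent is paid every month.", "entailment"),
    ("The tenant pays rent monthly.", "The tenant never pays rent.", "contradiction"),
    ("The tenant pays rent monthly.", "The landlord owns a dog.", "neutral"),
]
triplets = build_triplets(rows)
print(triplets)

# Toy 2-d vectors standing in for model embeddings of anchors, positives, negatives.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
aligned = info_nce_loss(anchors, positives, negatives)
swapped = info_nce_loss(anchors, positives[::-1], negatives)
print(aligned < swapped)  # True
```

Note that this objective trains similarity scores, not a three-way entailment/neutral/contradiction classifier. If explicit labels are needed, a classification head on top of the encoder is a different design.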
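Using a reranker model to fine-tune the embedding model doesn't seem very feasible, because the output distributions of the two models differ quite a lot: for a completely irrelevant passage, the reranker can output a score close to 0, whereas the embedding model's similarity is almost always above 0.5. You could try distilling a smaller model from the larger Qwen-embedding-8B instead.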
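The reply does not say which distillation objective to use. The sketch below shows one common option, assuming that both teacher and student produce one similarity score per (query, candidate) pair. Instead of matching raw scores, it compares softmax distributions over a candidate list with a KL divergence. Softmax is unchanged when every score is shifted by the same constant, so a student whose cosine scores sit above 0.5 is not penalized for that offset, whereas a pointwise MSE on raw scores is. Comparing scores rather than raw vectors also avoids any mismatch in embedding dimension between a large teacher such as Qwen-embedding-8B and a smaller student. The helper names, the shared temperature of 0.05, and the scores are illustrative assumptions.

```python
import math


def softmax(scores, temperature=1.0):
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_kl(teacher_scores, student_scores, temperature=0.05):
    """KL(teacher || student) between softmax distributions over one query's candidates."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def pointwise_mse(teacher_scores, student_scores):
    """Mean squared error on raw scores, shown for comparison."""
    return sum((t - s) ** 2 for t, s in zip(teacher_scores, student_scores)) / len(teacher_scores)


# Made-up similarity scores for one query against three candidate passages.
teacher = [0.90, 0.60, 0.55]
student = [s + 0.3 for s in teacher]  # same ranking and gaps, shifted up by 0.3
reordered = [0.55, 0.60, 0.90]  # ranking reversed relative to the teacher

kl_shifted = listwise_kl(teacher, student)
mse_shifted = pointwise_mse(teacher, student)
kl_reordered = listwise_kl(teacher, reordered)
# KL is ~0 for the shifted student (shift-invariant); MSE is ~0.09 (penalizes the offset).
print(kl_shifted, mse_shifted)
print(kl_reordered > 0.01)  # True: a wrong ranking is penalized
```

This handles an additive offset but not differences in scale. The teacher and student temperatures may need to be tuned separately in practice.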