Tags: Sentence Similarity, sentence-transformers, PyTorch, ONNX, xlm-roberta, feature-extraction, Eval Results, text-embeddings-inference
How to use BAAI/bge-m3 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-m3")
sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
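For reference, a similarity matrix like the one above can be reproduced by hand. This is a minimal sketch that assumes the model's similarity function is cosine similarity (the sentence-transformers default; I have not checked whether this model's config overrides it). `cosine_similarity_matrix` is a hypothetical helper, and random tensors stand in for real 1024-dimensional bge-m3 embeddings.

```python
import torch
import torch.nn.functional as F


def cosine_similarity_matrix(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity between the rows of a (n, d) and b (m, d)."""
    # Normalize each row to unit length so that a plain dot product equals the cosine.
    a = F.normalize(a, p=2, dim=-1)
    b = F.normalize(b, p=2, dim=-1)
    return a @ b.T  # shape (n, m)


# Random stand-ins for four sentence embeddings (bge-m3 dense vectors have 1024 dimensions).
embeddings = torch.randn(4, 1024)
similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # torch.Size([4, 4])
```

On real embeddings, the values should match `model.similarity(embeddings, embeddings)` if the cosine assumption holds. The diagonal is 1 because every sentence is identical to itself.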
How to load with HF Transformers?
Discussion #17, opened by jhflow
Hi, thank you for your remarkable work! I'm really impressed by the performance of this model.
For certain reasons, I want to load this model via Hugging Face transformers (AutoModel.from_pretrained or similar) rather than via FlagEmbedding.
Can I do so?
Yes, you can load it the same way as bge-1.5: https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding#using-huggingface-transformers
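A minimal sketch of what this amounts to for the dense embedding, assuming BGE's usual recipe: take the final hidden state of the first ([CLS]) token and L2-normalize it. `cls_dense_embeddings` and `encode_dense` are hypothetical helper names. Loading the model downloads the checkpoint, so the demo at the bottom only exercises the pooling step on a random tensor.

```python
import torch
import torch.nn.functional as F


def cls_dense_embeddings(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """BGE-style dense embedding: normalized hidden state of the first ([CLS]) token."""
    cls = last_hidden_state[:, 0]          # (batch, hidden)
    return F.normalize(cls, p=2, dim=-1)   # unit length, so dot product == cosine


@torch.no_grad()
def encode_dense(sentences, model_path="BAAI/bge-m3", max_length=512):
    """Hypothetical end-to-end helper; downloads the model on first use."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path).eval()
    # bge-m3 accepts longer inputs; 512 is an arbitrary choice for this sketch.
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    outputs = model(**batch)
    return cls_dense_embeddings(outputs.last_hidden_state)


# Offline check of the pooling step: batch of 2, sequence length 5, hidden size 1024.
fake_hidden = torch.randn(2, 5, 1024)
dense = cls_dense_embeddings(fake_hidden)
print(dense.shape)  # torch.Size([2, 1024])
```

With network access, `encode_dense(["this is a test sentence"])` should return a (1, 1024) tensor. Padding does not affect the result, because only position 0 is read.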
jhflow changed discussion status to closed
Thank you!
How can I get the dense and ColBERT embeddings with transformers?
Given the following code:
from transformers import AutoModel, AutoTokenizer
import torch
model_path = 'BAAI/bge-m3'
model = AutoModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
test_sentence = ["this is a test sentence"]
# Tokenize to PyTorch tensors, padding and truncating to at most 128 tokens
batch_dict = tokenizer(test_sentence, return_tensors='pt', max_length=128, padding=True, truncation=True)
# Forward pass without gradient tracking (inference only)
with torch.no_grad():
    outputs = model(**batch_dict)
I get BaseModelOutputWithPoolingAndCrossAttentions with pooler_output and last_hidden_state keys. Is pooler_output the CLS embedding and last_hidden_state all the token embeddings?
Kindly clarify. Thank you.
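On the last question: in the Hugging Face XLM-RoBERTa implementation, `pooler_output` is not the raw [CLS] vector. It is `last_hidden_state[:, 0]` passed through an extra dense layer and a tanh. `last_hidden_state` does hold one vector per token. The sketch below derives dense and ColBERT-style multi-vector embeddings from `last_hidden_state`. It is modelled on how FlagEmbedding's BGE-M3 code appears to work: the ColBERT vectors are a linear projection of every token after [CLS], zeroed at padding and L2-normalized, and scoring uses late interaction (best-matching passage token per query token, averaged over query tokens). Plain `AutoModel` does not load that projection. The repository appears to ship it as a separate file (`colbert_linear.pt`), so here it is a randomly initialized `nn.Linear` placeholder. All function names are hypothetical, and random tensors stand in for real encoder output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN = 1024  # hidden size of the bge-m3 backbone (assumed for this demo)


def dense_embedding(last_hidden_state):
    """Dense vector: normalized hidden state of the [CLS] token (position 0)."""
    return F.normalize(last_hidden_state[:, 0], dim=-1)


def colbert_embeddings(last_hidden_state, attention_mask, colbert_linear):
    """Multi-vector output: project every token after [CLS], zero out padding, normalize."""
    vecs = colbert_linear(last_hidden_state[:, 1:])            # (batch, seq - 1, dim)
    vecs = vecs * attention_mask[:, 1:, None].to(vecs.dtype)   # padding positions -> 0
    return F.normalize(vecs, dim=-1)                           # zero rows stay zero


def colbert_score(q_vecs, p_vecs):
    """Late interaction: q_vecs (q_len, dim), p_vecs (p_len, dim) -> scalar score."""
    token_scores = q_vecs @ p_vecs.T                 # (q_len, p_len) cosine scores
    return token_scores.max(dim=-1).values.mean()    # best match per query token, averaged


torch.manual_seed(0)
# Placeholder projection; real weights would come from colbert_linear.pt (assumption).
colbert_linear = nn.Linear(HIDDEN, HIDDEN)

# Fake encoder output: 2 sequences of length 6; the second ends with 2 padding tokens.
hidden = torch.randn(2, 6, HIDDEN)
mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                     [1, 1, 1, 1, 0, 0]])

with torch.no_grad():
    dense = dense_embedding(hidden)                             # (2, 1024)
    colbert = colbert_embeddings(hidden, mask, colbert_linear)  # (2, 5, 1024)
    # Keep only the real (non-padding) token vectors before scoring.
    q = colbert[0][mask[0, 1:].bool()]  # 5 vectors
    p = colbert[1][mask[1, 1:].bool()]  # 3 vectors
    score = colbert_score(q, p)

print(dense.shape, colbert.shape, q.shape, p.shape)
```

If the projection file is available locally, something like `colbert_linear.load_state_dict(torch.load(path, map_location="cpu"))` should replace the placeholder. I have not verified the file's key names or output dimension, so treat that as an assumption as well.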