Colbert Mode Usage
#41
by pulkitchahar - opened
I want to store the ColBERT embeddings so that I can rerank dense-vector retrieval results faster. But if a document has 1024 tokens on average (truncated if longer), I will have a 1024×1024 matrix per document (1024 tokens × 1024 dimensions), which in fp16 comes to 2 MB. That sounds huge, especially when I think about scaling up. Am I doing this right, or am I missing something? Are there any ways to reduce the size while keeping performance close to the original?
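For reference, the arithmetic behind that estimate can be sketched as below. This is only a back-of-the-envelope calculation, not anything taken from the model's code: the helper name `colbert_storage_bytes` is hypothetical, the 1024-dimensional token vectors are the figure assumed in the post, and the 8-bit and 256-token cases are illustrative assumptions that say nothing about retrieval quality.

```python
def colbert_storage_bytes(num_tokens: int, dim: int = 1024, bytes_per_value: int = 2) -> int:
    """Bytes needed to store one document's multi-vector output.

    Hypothetical helper: assumes one vector of `dim` values per token and
    no per-document overhead (no index structures, no metadata).
    """
    return num_tokens * dim * bytes_per_value


MIB = 1024 * 1024

# The case described in the post: 1024 tokens, 1024 dims, fp16 (2 bytes per value).
fp16_full = colbert_storage_bytes(1024)

# Assumption for illustration: the same matrix stored with 1 byte per value.
int8_full = colbert_storage_bytes(1024, bytes_per_value=1)

# Assumption for illustration: a shorter document of 256 tokens, still fp16.
fp16_short = colbert_storage_bytes(256)

print(f"fp16, 1024 tokens: {fp16_full / MIB} MiB")
print(f"int8, 1024 tokens: {int8_full / MIB} MiB")
print(f"fp16,  256 tokens: {fp16_short / MIB} MiB")
```

The footprint scales linearly with token count, vector dimension, and bytes per value, so those are the three levers to look at; whether a lower precision or fewer stored vectors keeps reranking quality would need to be measured on your own data.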
I'm also interested.