How to use BAAI/bge-m3 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```
Evaluation for finetuned bge-m3 model
Hello BAAI team,
I've fine-tuned the BAAI/bge-m3 model for Vietnamese. But when I evaluated bge-m3 with Sentence Transformers using the Information Retrieval evaluator, I got an error saying it required 128 GB of GPU RAM (CUDA). The model is small, so I don't understand this GPU memory requirement. Could you suggest a solution for this error?
My second question: if I want to evaluate my fine-tuned model using the evaluation steps on GitHub, how can I evaluate it with Vietnamese or another language? I've only seen MS MARCO (English version) used in the evaluation config.
I am looking forward to hearing from you.
This may be due to long text inputs combined with a large batch size. The memory cost is high when you encode 8192-token text with a large batch size. You can set a smaller input length and a smaller batch size.
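As a rough back-of-the-envelope illustration of why the cost grows so quickly (a sketch, not the exact memory profile; the hidden size of 1024 is the only bge-m3-specific number assumed here), the size of a single fp32 activation tensor scales linearly with both batch size and sequence length, and a transformer keeps many such tensors alive per layer:

```python
# Size in bytes of one fp32 activation tensor of shape
# (batch_size, seq_len, hidden); hidden=1024 is bge-m3's embedding width.
def activation_bytes(batch_size: int, seq_len: int, hidden: int = 1024) -> int:
    return batch_size * seq_len * hidden * 4  # 4 bytes per float32

# Full 8192-token window at batch size 256: 8 GiB for a single tensor.
print(activation_bytes(256, 8192) / 2**30)  # 8.0 (GiB)

# Truncating to 512 tokens at batch size 16 drops it to 32 MiB.
print(activation_bytes(16, 512) / 2**20)    # 32.0 (MiB)
```

Attention score matrices additionally grow quadratically with sequence length, so shortening the input helps even more than this linear estimate suggests.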
Regarding the evaluation, you need to prepare your dataset following https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py#L187 (eval_data) and https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/finetune/eval_msmarco.py#L188 (corpus).
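In practice, preparing such files means writing JSON-lines data. The snippet below is only a sketch for a Vietnamese dataset: the field names `query`, `positive`, and `content` are assumptions, so check the two lines of eval_msmarco.py linked above for the exact schema the script loads.

```python
import json

# Hypothetical Vietnamese examples; field names are assumptions,
# verify them against eval_msmarco.py before running the evaluation.
eval_data = [
    {"query": "Hà Nội là thủ đô của nước nào?",
     "positive": ["Hà Nội là thủ đô của Việt Nam."]},
]
corpus = [
    {"content": "Hà Nội là thủ đô của Việt Nam."},
    {"content": "Hôm nay trời nắng đẹp."},
]

# Write one JSON object per line (the JSON-lines format that
# datasets.load_dataset("json", ...) reads).
with open("eval_data.jsonl", "w", encoding="utf-8") as f:
    for row in eval_data:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
with open("corpus.jsonl", "w", encoding="utf-8") as f:
    for row in corpus:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```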
How can I plug my own evaluation and corpus data into the script at those GitHub links?
