Integrate with Sentence Transformers (+ third parties like LangChain/Haystack/LlamaIndex, etc.)
Hello!
Preface
First of all, congratulations on this release! I'm very glad to see another powerful multilingual model, and I'm rather excited about your training data release. I don't believe I shared this with you yet, but I took a lot of the datasets from https://huggingface.co/datasets/Shitao/bge-m3-data/tree/main and turned them into standalone datasets published under the Sentence Transformers organization for people to use when finetuning: https://huggingface.co/datasets?other=sentence-transformers&sort=created&author=sentence-transformers
Pull Request overview
- Integrate this model with Sentence Transformers
- Add usage snippet in README + ST tag
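To give a rough idea of what such a usage snippet enables, here is a minimal sketch, not the exact README text from this PR. It assumes the model loads through `SentenceTransformer` under its Hub ID. The helper names `nearest_neighbours` and `compare_sentences` are hypothetical and exist only for this illustration.

```python
from typing import List, Sequence

MODEL_ID = "BAAI/bge-multilingual-gemma2"


def nearest_neighbours(similarities: Sequence[Sequence[float]]) -> List[int]:
    """For each row, return the index of the most similar *other* item.

    Hypothetical helper for this sketch; expects at least two rows.
    """
    nearest = []
    for i, row in enumerate(similarities):
        # Skip the diagonal, since every sentence is most similar to itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        nearest.append(max(candidates)[1])
    return nearest


def compare_sentences(sentences: List[str], model_id: str = MODEL_ID) -> List[int]:
    """Embed the sentences and return each one's nearest-neighbour index."""
    # Imported lazily so the helper above works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)  # downloads the weights on first use
    embeddings = model.encode(sentences)
    similarities = model.similarity(embeddings, embeddings)  # shape: [n, n]
    return nearest_neighbours(similarities.tolist())
```

Calling `compare_sentences(["The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium."])` would download the model and return one index per sentence. The model is large, so a GPU is likely needed for reasonable speed.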
Details
This should make the model easier to use with Sentence Transformers, as well as with third-party frameworks such as LangChain, Haystack, and LlamaIndex that integrate with it.
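As an example of the third-party side, the sketch below shows how the model might be used through LangChain once the Sentence Transformers integration is in place. It assumes the `langchain-huggingface` package and its `HuggingFaceEmbeddings` class. The helpers `build_embedder`, `cosine`, and `rank_documents` are hypothetical names introduced only for this illustration.

```python
import math
from typing import List, Sequence

MODEL_ID = "BAAI/bge-multilingual-gemma2"


def build_embedder(model_id: str = MODEL_ID):
    """Create a LangChain embeddings object backed by Sentence Transformers."""
    # Lazy import: needs `langchain-huggingface` and `sentence-transformers`.
    from langchain_huggingface import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(model_name=model_id)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query: str, documents: List[str], embedder=None) -> List[str]:
    """Return the documents sorted by similarity to the query, best first."""
    # Any object with `embed_query` / `embed_documents` works here.
    embedder = embedder if embedder is not None else build_embedder()
    query_vec = embedder.embed_query(query)
    doc_vecs = embedder.embed_documents(documents)
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, documents)]
    return [doc for _, doc in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Passing the embedder in as an argument keeps the ranking logic independent of the model, so it can be tried with a small stand-in before downloading the full weights.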
Great work on these models as always 👏
- Tom Aarsen
Thank you so much for your PR. We are also very glad that our dataset can be used by the community.