Instructions to use Alibaba-NLP/gte-multilingual-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Alibaba-NLP/gte-multilingual-base with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)

    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [3, 3]
- Transformers
How to use Alibaba-NLP/gte-multilingual-base with Transformers:
    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
Do you plan to open-source the training code?
#1
by adol01 - opened
This model is really good; it would be great if the training code could be open-sourced.
In the short term, we do not plan to open-source the training code. Our main focus remains on how to build better and more efficient models, which we will then open-source to the community.
The MLM pre-training code is adapted from the Hugging Face example script (run_mlm.py) to handle the large dataset, with few additional optimization-related modifications.
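For readers unfamiliar with what run_mlm.py does, here is a minimal sketch of the token-masking step used by the stock Hugging Face data collator: a fraction of tokens is selected, and of those roughly 80% become the mask token, 10% become a random token, and 10% are left unchanged. This is not the authors' code. The function name `mask_tokens`, the toy token IDs, and the 0.15 masking rate (the stock script's default) are assumptions for illustration; the settings actually used for this model are not stated in this thread.

```python
import random

IGNORE_INDEX = -100  # label value that the loss skips, as in the stock script


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(),
                mlm_probability=0.15, rng=None):
    """Return (inputs, labels) for masked language modeling.

    Hypothetical helper: mirrors the 80/10/10 masking rule in plain Python.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; skip positions that are not selected.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id                    # replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # replace with a random token
        # otherwise keep the original token unchanged
    return inputs, labels


# Toy example: IDs 0 and 1 stand in for special tokens, 2 for the mask token.
tokens = [0, 17, 42, 8, 99, 23, 1]
inputs, labels = mask_tokens(tokens, mask_id=2, vocab_size=100,
                             special_ids={0, 1}, rng=random.Random(0))
print(inputs)
print(labels)
```

Positions whose label is -100 are ignored by the loss, so only the selected tokens contribute to training.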
The contrastive learning code is similar to texttron/tevatron, nomic-ai/contrastors, or FlagOpen/FlagEmbedding.
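As a rough illustration of the kind of objective such contrastive training repositories typically implement, here is a dependency-free sketch of an InfoNCE loss with in-batch negatives: each query is scored against every passage in the batch, and the passage at the same index is treated as the positive. This is not the authors' training code. The function names, the toy vectors, and the temperature of 0.05 are assumptions chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss; passages[i] is the positive for queries[i].

    Every other passage in the batch acts as a negative for that query.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        # Numerically stable log-sum-exp over all passages in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching passage as the target class.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy batch: two queries with well-aligned positives.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce_loss(queries, passages))        # small when pairs are aligned
print(info_nce_loss(queries, passages[::-1]))  # large when pairs are swapped
```

In practice this would be computed on model embeddings with a tensor library and usually augmented with hard negatives; the plain-Python version only shows the shape of the objective.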