use Flash Attention
I tried to use Flash Attention but got the following error: "NewModel does not support Flash Attention 2.0 yet." Does gte-multilingual-base not support Flash Attention 2.0 yet?
Could you please paste the code for your model inference here? It would help us with debugging.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, attn_implementation="flash_attention_2")
ValueError: NewModel does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted
xformers includes a Flash Attention 2 kernel and dispatches to it when the device and data type are suitable. See https://huggingface.co/Alibaba-NLP/new-impl#recommendation-enable-unpadding-and-acceleration-with-xformers
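For reference, here is a minimal sketch of loading the model through the xformers path instead of passing attn_implementation="flash_attention_2". It assumes the `unpad_inputs` and `use_memory_efficient_attention` options described on the linked page, and it assumes that half precision on a CUDA device is what lets xformers pick its Flash Attention kernel; please check both against that page. The helper names (`xformers_available`, `build_load_kwargs`, `load_model`) are made up for this example.

```python
import importlib.util


def xformers_available() -> bool:
    """Return True if the xformers package is installed in this environment."""
    return importlib.util.find_spec("xformers") is not None


def build_load_kwargs(use_xformers: bool) -> dict:
    """Build keyword arguments for AutoModel.from_pretrained.

    `unpad_inputs` and `use_memory_efficient_attention` are options of the
    model's remote code as described on the linked page (an assumption here).
    """
    kwargs = {"trust_remote_code": True}
    if use_xformers:
        kwargs["unpad_inputs"] = True
        kwargs["use_memory_efficient_attention"] = True
    return kwargs


def load_model(model_path: str = "Alibaba-NLP/gte-multilingual-base"):
    # Heavy imports stay local so the helpers above work without torch installed.
    import torch
    from transformers import AutoModel

    on_gpu = torch.cuda.is_available()
    # Only request the xformers path when a GPU and the package are both present.
    kwargs = build_load_kwargs(use_xformers=on_gpu and xformers_available())
    if on_gpu:
        # Half precision on GPU; newer transformers releases also accept `dtype`.
        kwargs["torch_dtype"] = torch.float16
    model = AutoModel.from_pretrained(model_path, **kwargs)
    return model.to("cuda" if on_gpu else "cpu")
```

Calling `load_model()` on a machine without a GPU or without xformers falls back to the default attention implementation, so the same code runs everywhere and only speeds up where the kernel is available.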