Pooling method: mean vs last?
As the title says: which one should I choose for inference or training?
We recommend using the last-token pooling method; please refer to the example code in the model introduction.
I noticed that the original GTE paper, "Towards General Text Embeddings with Multi-stage Contrastive Learning" (Section 3.1, Model Architecture), uses mean pooling. However, gte-Qwen2-7B-instruct uses last-token pooling, as shown in the example code and config file. Is there any literature or practical experience that could be shared on this design choice? It looks like bidirectional embedding models typically use mean pooling (as in the original GTE paper with BERT), while last-token pooling is more common for decoder-only, LLM-based models.
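For readers comparing the two options, here is a minimal sketch of what each pooling method computes. It is not the model's own code: it uses NumPy arrays and toy values in place of real hidden states, and the names `mean_pool`, `last_token_pool`, `hidden`, and `mask` are illustrative assumptions. One commonly cited intuition is that under causal attention only the final token has attended to the whole sequence, which would favor last-token pooling for decoder-only models.

```python
import numpy as np

def mean_pool(hidden, mask):
    """Average the hidden states of the non-padding tokens."""
    m = mask[..., None].astype(hidden.dtype)   # (batch, seq, 1)
    summed = (hidden * m).sum(axis=1)          # (batch, dim)
    counts = np.maximum(m.sum(axis=1), 1e-9)   # (batch, 1), avoids division by zero
    return summed / counts

def last_token_pool(hidden, mask):
    """Take the hidden state of the last non-padding token of each sequence."""
    if mask[:, -1].all():
        # Every sequence ends in a real token (e.g. left padding): take the last position.
        return hidden[:, -1]
    # Right padding: index the last real token per sequence.
    last_idx = mask.sum(axis=1) - 1            # (batch,)
    return hidden[np.arange(hidden.shape[0]), last_idx]

# Toy batch (hypothetical values): 2 sequences, 3 positions, hidden size 2.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                   [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])                   # second sequence is right-padded

mean_emb = mean_pool(hidden, mask)             # [[3, 4], [3, 1]]
last_emb = last_token_pool(hidden, mask)       # [[5, 6], [4, 2]]
```

With a real model, `hidden` would be the final-layer hidden states and `mask` the tokenizer's attention mask; the padded position (`[9.0, 9.0]`) is ignored by both functions.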