Padding token for batched embedding in Transformers?
Are there any best practices or special considerations for embedding batches of documents with this model? In my own testing, I have found that extra items in a batch (if they cause any padding to occur) can change the resulting embedding compared to the case of a single-document batch.
The tokenizer in the Transformers approach always appends an eos token, but it doesn't add a bos token (which is the same token as eos), and it also uses the eos/bos token as the padding token. Is that by design?
Any tips would be much appreciated.
Whether batch inference uses padding tokens depends on whether the tokenizer's padding parameter is set to true or false. We recommend not using padding, i.e. embedding each document on its own or only batching inputs of equal token length.
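For anyone who still needs padded batches, here is a minimal sketch of why sharing one id between eos and pad is not ambiguous in itself: the attention mask, not the token id, marks which positions are real. It assumes the embedding is pooled from the last non-padding token, which is common for decoder-based embedding models but is an assumption here. The token ids, `EOS_ID`, and the helper `last_token_index` are hypothetical and only for illustration; no real tokenizer is called.

```python
from typing import List


def last_token_index(attention_mask: List[int]) -> int:
    """Return the position of the last real (non-padding) token."""
    for pos in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[pos] == 1:
            return pos
    raise ValueError("sequence contains no real tokens")


EOS_ID = 0  # hypothetical id, shared by eos and pad as described in the question
doc = [11, 12, 13, EOS_ID]  # hypothetical token ids; the tokenizer appended eos

# Right padding: pad tokens follow the document.
right_padded_ids = doc + [EOS_ID, EOS_ID]
right_padded_mask = [1, 1, 1, 1, 0, 0]

# Left padding: pad tokens precede the document.
left_padded_ids = [EOS_ID, EOS_ID] + doc
left_padded_mask = [0, 0, 1, 1, 1, 1]

# The mask locates the real eos even though pad tokens carry the same id.
right_idx = last_token_index(right_padded_mask)
left_idx = last_token_index(left_padded_mask)

print(right_idx, right_padded_ids[right_idx])
print(left_idx, left_padded_ids[left_idx])
```

With right padding, naively taking the final position of the sequence would pool a pad token instead of the real eos, so the mask has to be consulted. This only covers which position is pooled; it does not account for every numerical difference between batched and single-document runs.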