Fine-tuning Alibaba-NLP/gte-Qwen2-7B-instruct for Domain-Specific Retrieval with Query, Positive, and Hard Negatives
Hi,
I am exploring the possibility of fine-tuning the Alibaba-NLP/gte-Qwen2-7B-instruct model for a domain-specific retrieval task in Spanish, using a dataset formatted as follows:
Query: A single text input representing the search query.
Positive examples: A list of documents relevant to the query.
Hard negatives: A list of documents contextually similar to the query but explicitly non-relevant.
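To make the format concrete, one record could look like the sketch below. The field names (`query`, `positives`, `hard_negatives`) and the helper `to_triplets` are placeholders of my own, not something the model or any library requires, and the Spanish strings are made-up examples rather than real data.

```python
from itertools import product

# Hypothetical record layout; the field names are assumptions for illustration.
record = {
    "query": "¿Cuáles son los síntomas de la diabetes tipo 2?",
    "positives": [
        "La diabetes tipo 2 suele causar sed excesiva y fatiga.",
        "Entre los síntomas de la diabetes tipo 2 está la visión borrosa.",
    ],
    "hard_negatives": [
        "La diabetes tipo 1 se diagnostica con frecuencia en la infancia.",
    ],
}


def to_triplets(rec):
    """Expand one record into (query, positive, negative) triplets.

    Every positive is paired with every hard negative for the same query.
    """
    return [
        (rec["query"], pos, neg)
        for pos, neg in product(rec["positives"], rec["hard_negatives"])
    ]


triplets = to_triplets(record)
print(len(triplets))  # number of (query, positive, negative) triplets
```

Many contrastive training setups consume triplets like these, but whether a given training script expects this flattened shape or the nested one is something to check against its own documentation.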
Could you provide some examples or recommendations for configuring the model to handle this structure effectively? Additionally:
Are there specific pre-processing steps required to handle Spanish text or domain-specific terminology?
Does the model have any inherent support for Spanish, or are there additional considerations when working with non-English datasets?
Are there examples or guidelines available for fine-tuning the model on a retrieval task with this format?
I would greatly appreciate any insights, examples, or resources that could help in this process.
Hello,
Is there a fine-tuning script for this model? It would be interesting to tune it for downstream tasks.
Thanks!
Hello, you can use our open-source project to fine-tune Alibaba-NLP/gte-Qwen2-7B-instruct: https://github.com/NLPJCL/RAG-Retrieval/tree/master/rag_retrieval/train/embedding
Great, I will try this script. Thanks for your advice!