Fine-tuning Alibaba-NLP/gte-Qwen2-7B-instruct for Domain-Specific Retrieval with Query, Positive, and Hard Negatives

#41

by wilfoderek - opened Dec 5, 2024

Dec 5, 2024

Hi,

I am exploring the possibility of fine-tuning the Alibaba-NLP/gte-Qwen2-7B-instruct model for a domain-specific retrieval task in Spanish, using a dataset formatted as follows:

Query: A single text input representing the search query.
Positive examples: A list of documents relevant to the query.
Hard negatives: A list of documents contextually similar to the query but explicitly non-relevant.
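
To make the structure above concrete, here is a minimal sketch of one such record and one way it could be flattened into training examples. The field names "query", "pos", and "neg", the helper to_training_examples, and the sample texts are all made up for illustration; they are not a format required by the model or by any particular training tool.

```python
import json

# Hypothetical record layout. The keys "query", "pos", and "neg" and the
# Spanish texts are illustrative assumptions, not a required schema.
record = {
    "query": "¿Cuáles son los síntomas de la diabetes tipo 2?",
    "pos": [
        "La diabetes tipo 2 suele causar sed excesiva y fatiga.",
        "Entre los síntomas frecuentes están la visión borrosa y la micción frecuente.",
    ],
    "neg": [
        "La diabetes tipo 1 se diagnostica habitualmente en la infancia.",
        "La hipertensión arterial a menudo no presenta síntomas.",
    ],
}


def to_training_examples(rec):
    """Expand one record into one example per positive document.

    Every example keeps the same query and the full list of hard negatives.
    """
    return [
        {"query": rec["query"], "pos": positive, "neg": list(rec["neg"])}
        for positive in rec["pos"]
    ]


examples = to_training_examples(record)

# Serialize as JSON Lines; ensure_ascii=False keeps accented characters readable.
jsonl = "\n".join(json.dumps(example, ensure_ascii=False) for example in examples)
print(jsonl)
```

Pairing each positive with the full set of hard negatives is only one possible choice; sampling a fixed number of negatives per example is a common alternative.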

Could you provide some examples or recommendations for configuring the model to handle this structure effectively? Additionally:

Are there specific pre-processing steps required to handle Spanish text or domain-specific terminology?
Does the model have any inherent support for Spanish, or are there additional considerations when working with non-English datasets?
Are there examples or guidelines available for fine-tuning the model on a retrieval task with this format?
I would greatly appreciate any insights, examples, or resources that could help in this process.

lryyyy

Dec 5, 2024

Hello,
Is there a fine-tuning script for this model? It would be interesting to tune it for downstream tasks.
Thanks!

lijiacheng06

Dec 26, 2024

Hello, you can use our open-source project to fine-tune Alibaba-NLP/gte-Qwen2-7B-instruct: https://github.com/NLPJCL/RAG-Retrieval/tree/master/rag_retrieval/train/embedding
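
For readers wondering what training on a query, a positive, and hard negatives typically optimizes, below is a rough sketch of an InfoNCE-style contrastive loss in plain Python. It is not taken from the linked project; the function names, the toy 2-dimensional vectors, and the temperature of 0.05 are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Cross-entropy over [positive, *negatives], with the positive as the target.

    The temperature value is an assumed default, not a recommended setting.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, negative) / temperature for negative in negatives]
    # Numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[0]


# Toy embeddings: the loss is small when the query is close to the positive
# and large when it is closer to a hard negative.
good = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(good < bad)
```

In an actual fine-tuning run the vectors would come from the embedding model, and the loss would usually be computed in batches with a deep learning framework rather than with Python lists.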

lryyyy

Jan 2, 2025

Great, I will try this script. Thanks for your advice!
