---
tags:
- reranker
- qwen3
- information-retrieval
- product-search
base_model: Qwen/Qwen3-Reranker-0.6B
license: mit
language:
- en
pipeline_tag: text-classification
---

# Qwen3-Reranker-HomeDepot

A fine-tuned version of [Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B), trained on the Home Depot Product Search dataset for e-commerce search ranking.

## Model Description

This model is a cross-encoder reranker trained to score query-product pairs for relevance. It takes a search query and a product description as input and outputs a relevance score between 0 and 1, derived from the model's next-token preference for "yes" over "no".

- **Base Model**: Qwen/Qwen3-Reranker-0.6B
- **Training Dataset**: Home Depot Product Search
- **Training Samples**: 51,911
- **Task**: Binary relevance classification (relevant/irrelevant)

## Training Details

### Dataset

- **Total samples**: 51,911
- **Splits**: 70% train / 15% validation / 15% test
- **Splitting strategy**: Query-stratified; all products for a given query stay in the same split, which prevents data leakage between splits
- **Label threshold**: Relevance ≥ 2.33 → relevant (1), else irrelevant (0)
- **Label distribution**: ~68% relevant, ~32% irrelevant

### Training Configuration

```
Learning rate:    5e-6
Batch size:       8 × 2 = 16
Epochs:           3
Optimizer:        AdamW (weight_decay=0.01)
Scheduler:        Linear warmup + decay
Mixed precision:  BF16
```

### Hardware

- **GPU**: NVIDIA A100 80GB
- **Training time**: ~2-4 hours

## Usage

### Installation

```bash
# accelerate is required for device_map="auto"
pip install transformers torch accelerate
```

### Basic Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codefactory4791/Qwen3-Reranker-HomeDepot"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Prepare input
query = "cordless drill"
document = "DEWALT 20V MAX Cordless Drill Kit with battery and charger"

# Format prompt using the Qwen3-Reranker template
prompt = f'''<|im_start|>system
Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>
<|im_start|>user
<Instruct>: Given a web search query, retrieve relevant passages that answer the query
<Query>: {query}
<Document>: {document}<|im_end|>
<|im_start|>assistant
<think>

</think>

'''

# Tokenize and take the logits for the next token
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits[0, -1, :]

# Compare the "yes" and "no" token logits;
# sigmoid(yes - no) equals the softmax probability of "yes" over {yes, no}
token_yes = tokenizer.convert_tokens_to_ids("yes")
token_no = tokenizer.convert_tokens_to_ids("no")
score = torch.sigmoid((logits[token_yes] - logits[token_no]).float()).item()
print(f"Relevance score: {score:.4f}")
```

### Using with Ranking-Qwen Library

```python
from ranking_qwen.models import QwenReranker

# Load fine-tuned model
reranker = QwenReranker(model_name="codefactory4791/Qwen3-Reranker-HomeDepot")

# Score multiple candidates
scores = reranker.compute_scores(
    queries=["drill bits", "drill bits"],
    documents=[
        "DEWALT 14-Piece Titanium Drill Bit Set",
        "Black+Decker Screwdriver Set",
    ],
)
# Example output: [0.92, 0.31]
```

## Performance

Expected metrics on the Home Depot test split:

- **NDCG@10**: ≥ 0.80
- **MAP**: ≥ 0.75
- **MRR**: ≥ 0.85
- **AUC**: ≥ 0.90

## Limitations

- Trained specifically for Home Depot product search
- May not generalize well to other domains without further fine-tuning
- Maximum sequence length: 8192 tokens (2048 is recommended for speed)

## Citation

```bibtex
@misc{qwen3-reranker-homedepot,
  author = {Your Name},
  title = {Qwen3-Reranker Fine-tuned on Home Depot Dataset},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/codefactory4791/Qwen3-Reranker-HomeDepot}}
}
```

## License

MIT License. See the base model's license for additional details.

## Acknowledgments

- Base model: [Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B)
- Dataset: Home Depot Product Search Relevance
- Training framework: HuggingFace Transformers
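
The labeling and splitting rules listed under Training Details can be sketched in plain Python. This is a minimal sketch, not the preprocessing code used to train this model: it assumes each row is a dict with hypothetical `query` and `relevance` keys, and it reads "query-stratified" as a grouped split in which every query lands in exactly one split. `binarize` and `query_grouped_split` are hypothetical helper names.

```python
import random
from collections import defaultdict

# Threshold from the Dataset section: relevance >= 2.33 -> relevant
RELEVANCE_THRESHOLD = 2.33


def binarize(relevance: float, threshold: float = RELEVANCE_THRESHOLD) -> int:
    """Map a graded relevance score to a binary label (1 = relevant, 0 = irrelevant)."""
    return 1 if relevance >= threshold else 0


def query_grouped_split(rows, train=0.70, val=0.15, seed=42):
    """Split rows so that each query appears in exactly one split.

    `rows` is a list of dicts with at least a "query" key (hypothetical schema).
    The fractions apply to distinct queries, not rows, so row-level proportions
    are only approximately 70/15/15.
    """
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["query"]].append(row)

    # Sort before shuffling so the split is reproducible for a fixed seed
    queries = sorted(by_query)
    random.Random(seed).shuffle(queries)

    n_train = int(len(queries) * train)
    n_val = int(len(queries) * val)
    groups = {
        "train": queries[:n_train],
        "validation": queries[n_train:n_train + n_val],
        "test": queries[n_train + n_val:],
    }
    # Expand each query group back into its rows
    return {name: [r for q in qs for r in by_query[q]] for name, qs in groups.items()}
```

Splitting on queries rather than rows keeps the test split free of queries the model has already seen paired with other products, which is the leakage the Dataset section refers to.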
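
The ranking metrics listed under Performance are computed per query from the reranker's scores and the binary labels, then averaged over queries. The sketch below assumes linear-gain NDCG@10, MRR, and MAP over binary labels; it is not the evaluation script behind the expected numbers, and all function names are hypothetical. AUC is not covered; scikit-learn's `roc_auc_score` is one common way to compute it.

```python
import math


def rank_labels(scores, labels):
    """Reorder labels by descending model score (stable sort keeps input order on ties)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def ndcg_at_k(ranked, k=10):
    """NDCG@k with linear gains; `ranked` holds labels in predicted rank order."""
    def dcg(labels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))
    ideal = dcg(sorted(ranked, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for i, rel in enumerate(ranked):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0


def average_precision(ranked):
    """Mean of precision@i over the positions i of relevant items."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(ranked):
        if rel > 0:
            hits += 1
            precision_sum += hits / (i + 1)
    return precision_sum / hits if hits else 0.0


def evaluate(per_query):
    """`per_query` maps query -> (scores, labels); returns metrics averaged over queries."""
    ranked = [rank_labels(s, l) for s, l in per_query.values()]
    n = len(ranked)
    return {
        "ndcg@10": sum(ndcg_at_k(r) for r in ranked) / n,
        "mrr": sum(reciprocal_rank(r) for r in ranked) / n,
        "map": sum(average_precision(r) for r in ranked) / n,
    }
```

For example, `evaluate({"a": ([0.9, 0.2, 0.7], [1, 0, 1]), "b": ([0.1, 0.9], [1, 0])})` ranks both relevant products first for query `a` and the relevant product second for query `b`, giving an MRR and MAP of 0.75.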