--- license: apache-2.0 language: - en tags: - ecommerce - e-commerce - retail - marketplace - shopping - amazon - ebay - alibaba - google - rakuten - bestbuy - walmart - flipkart - wayfair - shein - target - etsy - shopify - taobao - asos - carrefour - costco - overstock - pretraining - encoder - language-modeling - foundation-model base_model: - thebajajra/RexBERT-mini pipeline_tag: text-ranking library_name: sentence-transformers datasets: - thebajajra/Amazebay-Relevance ---
[](https://huggingface.co/collections/thebajajra/rexreranker) [](https://huggingface.co/datasets/thebajajra/Amazebay-Relevance) [](https://huggingface.co/datasets/thebajajra/eress) [](https://github.com/bajajra/RexRerankers) [](https://huggingface.co/blog/thebajajra/rexrerankers) # RexReranker Mini State-of-the-art **e-commerce** neural reranker based on RexBERT-mini that predicts relevance scores, given a search query and product details. ## Features - **Output**: Predicts a probability score between 0.0 and 1.0 - **CrossEncoder Compatible**: Works directly with Sentence Transformers CrossEncoder - **Mean Pooling**: Uses mean pooling over all tokens for robust representations ## Installation ```bash pip install transformers sentence-transformers torch ``` ## Quick Start ### 1. Using HuggingFace Transformers ```python from transformers import AutoModelForSequenceClassification, AutoTokenizer import torch model_id = "thebajajra/RexReranker-mini" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForSequenceClassification.from_pretrained(model_id) device = "cuda" if torch.cuda.is_available() else "cpu" model = model.to(device).eval() query = "best laptop for programming" title = "MacBook Pro M3" description = "Powerful laptop with M3 chip, 16GB RAM, perfect for developers and creative professionals" inputs = tokenizer( f"Query: {query}", f"Title: {title}\nDescription: {description}", return_tensors="pt", truncation=True, max_length=min(model.config.max_position_embeddings, 7999), ).to(device) with torch.no_grad(): outputs = model(**inputs) score = outputs.logits.squeeze(-1) # shape: [batch] print(f"Relevance Score: {score[0].item():.4f}") ``` ### 2. Using Sentence Transformers CrossEncoder ```python from sentence_transformers import CrossEncoder # Load as CrossEncoder model = CrossEncoder( "thebajajra/RexReranker-mini", trust_remote_code=True ) # Single prediction query = "best laptop for programming" document = "MacBook Pro M3 - Powerful laptop with M3 chip for developers" score = model.predict([(query, document)])[0] print(f"Score: {score:.4f}") ``` ### 3. Batch Reranking with CrossEncoder ```python from sentence_transformers import CrossEncoder model = CrossEncoder("thebajajra/RexReranker-mini", trust_remote_code=True) query = "best laptop for programming" documents = [ "MacBook Pro M3 - Powerful laptop with M3 chip for developers", "Gaming Mouse RGB - High precision gaming mouse with 16000 DPI", "ThinkPad X1 Carbon - Business ultrabook with long battery life", "Mechanical Keyboard - Cherry MX switches for typing comfort", "Dell XPS 15 - Premium laptop with 4K OLED display", ] # Get scores for all documents pairs = [(query, doc) for doc in documents] scores = model.predict(pairs) # Print ranked results print(f"Query: {query}\n") for doc, score in sorted(zip(documents, scores), key=lambda x: x[1], reverse=True): print(f" {score:.4f} | {doc[:60]}") ``` ### 4. Using CrossEncoder's rank() Method ```python from sentence_transformers import CrossEncoder model = CrossEncoder("thebajajra/RexReranker-mini", trust_remote_code=True) query = "wireless headphones with noise cancellation" documents = [ "Sony WH-1000XM5 - Industry-leading noise cancellation headphones", "Apple AirPods Max - Premium over-ear headphones with spatial audio", "Bose QuietComfort 45 - Comfortable wireless noise cancelling headphones", "JBL Tune 750BTNC - Affordable wireless headphones with ANC", "Logitech Gaming Headset - Wired gaming headphones with microphone", ] # Rank documents results = model.rank(query, documents, top_k=3) print(f"Query: {query}\n") print("Top 3 Results:") for result in results: idx = result['corpus_id'] score = result['score'] print(f" {score:.4f} | {documents[idx][:60]}") ``` ## Input Format The model expects query-document pairs formatted as: | Field | Format | |-------|--------| | Text A (Query) | `Query: {your search query}` | | Text B (Document) | `Title: {document title}\nDescription: {document description}` |