---
license: apache-2.0
language:
  - en
tags:
  - ecommerce
  - e-commerce
  - retail
  - marketplace
  - shopping
  - amazon
  - ebay
  - alibaba
  - google
  - rakuten
  - bestbuy
  - walmart
  - flipkart
  - wayfair
  - shein
  - target
  - etsy
  - shopify
  - taobao
  - asos
  - carrefour
  - costco
  - overstock
  - pretraining
  - encoder
  - language-modeling
  - foundation-model
base_model:
  - thebajajra/RexBERT-micro
pipeline_tag: text-ranking
library_name: sentence-transformers
datasets:
  - thebajajra/Amazebay-Relevance
---


# RexReranker Micro

A state-of-the-art e-commerce neural reranker built on RexBERT-micro that predicts a relevance score for a given search query and product details.

## Features

- **Output**: Predicts a probability score between 0.0 and 1.0
- **CrossEncoder compatible**: Works directly with the Sentence Transformers `CrossEncoder` class
- **Mean pooling**: Uses mean pooling over all tokens for robust representations
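The mean-pooling bullet above can be sketched as follows. This is an illustrative stand-alone function, not the model's internal implementation: it averages token embeddings while masking out padding positions.

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding tokens."""
    # Broadcast the attention mask over the embedding dimension
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts
```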

## Installation

```bash
pip install transformers sentence-transformers torch
```

## Quick Start

### 1. Using Hugging Face Transformers

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "thebajajra/RexReranker-micro"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

query = "best laptop for programming"
title = "MacBook Pro M3"
description = "Powerful laptop with M3 chip, 16GB RAM, perfect for developers and creative professionals"

inputs = tokenizer(
    f"Query: {query}",
    f"Title: {title}\nDescription: {description}",
    return_tensors="pt",
    truncation=True,
    max_length=min(model.config.max_position_embeddings, 7999),
).to(device)

with torch.no_grad():
    outputs = model(**inputs)
    # Apply sigmoid to map the raw logit to a probability in [0, 1]
    score = torch.sigmoid(outputs.logits.squeeze(-1))  # shape: [batch]
    print(f"Relevance Score: {score[0].item():.4f}")
```

### 2. Using Sentence Transformers CrossEncoder

```python
from sentence_transformers import CrossEncoder

# Load as CrossEncoder
model = CrossEncoder(
    "thebajajra/RexReranker-micro",
    trust_remote_code=True,
)

# Single prediction
query = "best laptop for programming"
document = "MacBook Pro M3 - Powerful laptop with M3 chip for developers"

score = model.predict([(query, document)])[0]
print(f"Score: {score:.4f}")
```

### 3. Batch Reranking with CrossEncoder

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("thebajajra/RexReranker-micro", trust_remote_code=True)

query = "best laptop for programming"
documents = [
    "MacBook Pro M3 - Powerful laptop with M3 chip for developers",
    "Gaming Mouse RGB - High precision gaming mouse with 16000 DPI",
    "ThinkPad X1 Carbon - Business ultrabook with long battery life",
    "Mechanical Keyboard - Cherry MX switches for typing comfort",
    "Dell XPS 15 - Premium laptop with 4K OLED display",
]

# Get scores for all documents
pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs)

# Print results ranked by descending score
print(f"Query: {query}\n")
for doc, score in sorted(zip(documents, scores), key=lambda x: x[1], reverse=True):
    print(f"  {score:.4f} | {doc[:60]}")
```

### 4. Using CrossEncoder's `rank()` Method

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("thebajajra/RexReranker-micro", trust_remote_code=True)

query = "wireless headphones with noise cancellation"
documents = [
    "Sony WH-1000XM5 - Industry-leading noise cancellation headphones",
    "Apple AirPods Max - Premium over-ear headphones with spatial audio",
    "Bose QuietComfort 45 - Comfortable wireless noise cancelling headphones",
    "JBL Tune 750BTNC - Affordable wireless headphones with ANC",
    "Logitech Gaming Headset - Wired gaming headphones with microphone",
]

# Rank documents and keep the top 3
results = model.rank(query, documents, top_k=3)

print(f"Query: {query}\n")
print("Top 3 Results:")
for result in results:
    idx = result["corpus_id"]
    score = result["score"]
    print(f"  {score:.4f} | {documents[idx][:60]}")
```

## Input Format

The model expects query-document pairs formatted as:

| Field | Format |
|-------|--------|
| Text A (Query) | `Query: {your search query}` |
| Text B (Document) | `Title: {document title}\nDescription: {document description}` |
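The pair construction above can be wrapped in a small helper. `format_pair` is a hypothetical convenience function, not part of the model package; it simply builds the two input texts in the expected format:

```python
def format_pair(query: str, title: str, description: str) -> tuple[str, str]:
    """Build the (text_a, text_b) pair in the format the reranker expects."""
    text_a = f"Query: {query}"
    text_b = f"Title: {title}\nDescription: {description}"
    return text_a, text_b

# Example: pass the resulting pair to tokenizer(...) or CrossEncoder.predict([...])
pair = format_pair(
    "best laptop for programming",
    "MacBook Pro M3",
    "Powerful laptop with M3 chip",
)
```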