---
license: apache-2.0
language:
- en
tags:
- ecommerce
- e-commerce
- retail
- marketplace
- shopping
- amazon
- ebay
- alibaba
- google
- rakuten
- bestbuy
- walmart
- flipkart
- wayfair
- shein
- target
- etsy
- shopify
- taobao
- asos
- carrefour
- costco
- overstock
- pretraining
- encoder
- language-modeling
- foundation-model
base_model:
- thebajajra/RexBERT-mini
pipeline_tag: text-ranking
library_name: sentence-transformers
datasets:
- thebajajra/Amazebay-Relevance
---

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6893dd21467f7d2f5f358a95/apOIbl5PdJuRk-tQMdDc8.png" alt="RexReranker">
</p>

[![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-red)](https://huggingface.co/collections/thebajajra/rexreranker)
[![Data](https://img.shields.io/badge/🤗%20Training%20Data-AmazebayR-yellow)](https://huggingface.co/datasets/thebajajra/Amazebay-Relevance)
[![ERSS](https://img.shields.io/badge/🤗%20Evaluation%20Data-ERSS-blue)](https://huggingface.co/datasets/thebajajra/eress)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/bajajra/RexRerankers)
[![Blog](https://img.shields.io/badge/Blog-Blog-green)](https://huggingface.co/blog/thebajajra/rexrerankers)

# RexReranker Mini

A state-of-the-art **e-commerce** neural reranker built on RexBERT-mini that predicts a relevance score for a given search query and product details.

## Features

- **Output**: Predicts a probability score between 0.0 and 1.0
- **CrossEncoder Compatible**: Works directly with Sentence Transformers CrossEncoder
- **Mean Pooling**: Uses mean pooling over all tokens for robust representations
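The mean pooling in the last bullet can be sketched as a masked average over token embeddings. The tensors and function below are illustrative stand-ins, not the model's internal code:

```python
import torch

def masked_mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    # [batch, seq] -> [batch, seq, 1] so the mask broadcasts over hidden dims
    mask = attention_mask.unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts

# Illustrative: 1 sequence, 4 tokens (last is padding), hidden size 2
emb = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]])
mask = torch.tensor([[1, 1, 1, 0]])
print(masked_mean_pool(emb, mask))  # tensor([[3., 4.]])
```

Padding tokens are excluded from the average, so the pooled representation depends only on real input tokens.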

## Installation

```bash
pip install transformers sentence-transformers torch
```

## Quick Start

### 1. Using HuggingFace Transformers

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "thebajajra/RexReranker-mini"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

query = "best laptop for programming"
title = "MacBook Pro M3"
description = "Powerful laptop with M3 chip, 16GB RAM, perfect for developers and creative professionals"

inputs = tokenizer(
    f"Query: {query}",
    f"Title: {title}\nDescription: {description}",
    return_tensors="pt",
    truncation=True,
    max_length=min(model.config.max_position_embeddings, 7999),
).to(device)

with torch.no_grad():
    outputs = model(**inputs)
    # The head emits a single logit; apply sigmoid to get the 0-1 relevance probability
    score = torch.sigmoid(outputs.logits.squeeze(-1))   # shape: [batch]
    print(f"Relevance Score: {score[0].item():.4f}")
```

### 2. Using Sentence Transformers CrossEncoder

```python
from sentence_transformers import CrossEncoder

# Load as CrossEncoder
model = CrossEncoder(
    "thebajajra/RexReranker-mini",
    trust_remote_code=True
)

# Single prediction
query = "best laptop for programming"
document = "MacBook Pro M3 - Powerful laptop with M3 chip for developers"

score = model.predict([(query, document)])[0]
print(f"Score: {score:.4f}")
```

### 3. Batch Reranking with CrossEncoder

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("thebajajra/RexReranker-mini", trust_remote_code=True)

query = "best laptop for programming"
documents = [
    "MacBook Pro M3 - Powerful laptop with M3 chip for developers",
    "Gaming Mouse RGB - High precision gaming mouse with 16000 DPI",
    "ThinkPad X1 Carbon - Business ultrabook with long battery life",
    "Mechanical Keyboard - Cherry MX switches for typing comfort",
    "Dell XPS 15 - Premium laptop with 4K OLED display",
]

# Get scores for all documents
pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs)

# Print ranked results
print(f"Query: {query}\n")
for doc, score in sorted(zip(documents, scores), key=lambda x: x[1], reverse=True):
    print(f"  {score:.4f} | {doc[:60]}")
```

### 4. Using CrossEncoder's rank() Method

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("thebajajra/RexReranker-mini", trust_remote_code=True)

query = "wireless headphones with noise cancellation"
documents = [
    "Sony WH-1000XM5 - Industry-leading noise cancellation headphones",
    "Apple AirPods Max - Premium over-ear headphones with spatial audio",
    "Bose QuietComfort 45 - Comfortable wireless noise cancelling headphones",
    "JBL Tune 750BTNC - Affordable wireless headphones with ANC",
    "Logitech Gaming Headset - Wired gaming headphones with microphone",
]

# Rank documents
results = model.rank(query, documents, top_k=3)

print(f"Query: {query}\n")
print("Top 3 Results:")
for result in results:
    idx = result['corpus_id']
    score = result['score']
    print(f"  {score:.4f} | {documents[idx][:60]}")
```

## Input Format

The model expects query-document pairs formatted as:

| Field | Format |
|-------|--------|
| Text A (Query) | `Query: {your search query}` |
| Text B (Document) | `Title: {document title}\nDescription: {document description}` |
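The table above can be folded into a small helper that builds correctly formatted pairs before calling `CrossEncoder.predict`. The helper name is our own, not part of the model's API:

```python
def format_pair(query: str, title: str, description: str) -> tuple[str, str]:
    """Build a (text_a, text_b) pair in the format the model expects."""
    return (f"Query: {query}", f"Title: {title}\nDescription: {description}")

pair = format_pair(
    "best laptop for programming",
    "MacBook Pro M3",
    "Powerful laptop with M3 chip, 16GB RAM",
)
print(pair[0])  # Query: best laptop for programming
# The pair can be passed straight to the CrossEncoder, e.g.: model.predict([pair])
```

Keeping the formatting in one place avoids drift between indexing-time and query-time text construction.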