---
license: apache-2.0
language:
- en
tags:
- ecommerce
- e-commerce
- retail
- marketplace
- shopping
- amazon
- ebay
- alibaba
- google
- rakuten
- bestbuy
- walmart
- flipkart
- wayfair
- shein
- target
- etsy
- shopify
- taobao
- asos
- carrefour
- costco
- overstock
- pretraining
- encoder
- language-modeling
- foundation-model
base_model:
- thebajajra/RexBERT-base
pipeline_tag: text-ranking
library_name: sentence-transformers
datasets:
- thebajajra/Amazebay-Relevance
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6893dd21467f7d2f5f358a95/apOIbl5PdJuRk-tQMdDc8.png" alt="RexReranker">
</p>
[![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-red)](https://huggingface.co/collections/thebajajra/rexreranker)
[![Data](https://img.shields.io/badge/🤗%20Training%20Data-AmazebayR-yellow)](https://huggingface.co/datasets/thebajajra/Amazebay-Relevance)
[![ERSS](https://img.shields.io/badge/🤗%20Evaluation%20Data-ERSS-blue)](https://huggingface.co/datasets/thebajajra/eress)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/bajajra/RexRerankers)
[![Blog](https://img.shields.io/badge/Blog-Blog-green)](https://huggingface.co/blog/thebajajra/rexrerankers)
# RexReranker Base
A state-of-the-art **e-commerce** neural reranker built on RexBERT-base that predicts a relevance score for a given search query and product details.
## Features
- **Output**: Predicts a probability score between 0.0 and 1.0
- **CrossEncoder Compatible**: Works directly with Sentence Transformers CrossEncoder
- **Mean Pooling**: Uses mean pooling over all tokens for robust representations
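The mean pooling mentioned above can be illustrated with a minimal NumPy sketch: token embeddings are averaged while padding positions (attention mask = 0) are excluded. This is an illustration of the technique, not the model's actual implementation.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: [seq_len, hidden]; attention_mask: [seq_len] of 0/1.
    """
    mask = attention_mask[:, None].astype(float)    # [seq_len, 1]
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    count = np.clip(mask.sum(), 1e-9, None)         # guard against divide-by-zero
    return summed / count

# Two real tokens followed by one padding token
emb = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])
mask = np.array([1, 1, 0])
print(mean_pool(emb, mask))  # -> [2. 3.]
```

Because padding tokens are masked out, the pooled representation is stable across batches with different padding lengths.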
## Evaluation Performance
![ndcg5_comparison_chart_rex_red_family](https://cdn-uploads.huggingface.co/production/uploads/6893dd21467f7d2f5f358a95/13YPP-i1d45XWyLlUzh2r.png)
## Installation
```bash
pip install transformers sentence-transformers torch
```
## Quick Start
### 1. Using HuggingFace Transformers
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "thebajajra/RexReranker-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

query = "best laptop for programming"
title = "MacBook Pro M3"
description = "Powerful laptop with M3 chip, 16GB RAM, perfect for developers and creative professionals"

inputs = tokenizer(
    f"Query: {query}",
    f"Title: {title}\nDescription: {description}",
    return_tensors="pt",
    truncation=True,
    max_length=min(model.config.max_position_embeddings, 7999),
).to(device)

with torch.no_grad():
    outputs = model(**inputs)
    # Apply sigmoid to map the raw logit to a 0.0-1.0 relevance probability
    score = torch.sigmoid(outputs.logits.squeeze(-1))  # shape: [batch]

print(f"Relevance Score: {score[0].item():.4f}")
```
### 2. Using Sentence Transformers CrossEncoder
```python
from sentence_transformers import CrossEncoder
# Load as CrossEncoder
model = CrossEncoder(
    "thebajajra/RexReranker-base",
    trust_remote_code=True,
)

# Single prediction
query = "best laptop for programming"
document = "MacBook Pro M3 - Powerful laptop with M3 chip for developers"

score = model.predict([(query, document)])[0]
print(f"Score: {score:.4f}")
```
### 3. Batch Reranking with CrossEncoder
```python
from sentence_transformers import CrossEncoder
model = CrossEncoder("thebajajra/RexReranker-base", trust_remote_code=True)
query = "best laptop for programming"
documents = [
    "MacBook Pro M3 - Powerful laptop with M3 chip for developers",
    "Gaming Mouse RGB - High precision gaming mouse with 16000 DPI",
    "ThinkPad X1 Carbon - Business ultrabook with long battery life",
    "Mechanical Keyboard - Cherry MX switches for typing comfort",
    "Dell XPS 15 - Premium laptop with 4K OLED display",
]

# Get scores for all documents
pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs)

# Print ranked results
print(f"Query: {query}\n")
for doc, score in sorted(zip(documents, scores), key=lambda x: x[1], reverse=True):
    print(f"  {score:.4f} | {doc[:60]}")
```
### 4. Using CrossEncoder's rank() Method
```python
from sentence_transformers import CrossEncoder
model = CrossEncoder("thebajajra/RexReranker-base", trust_remote_code=True)
query = "wireless headphones with noise cancellation"
documents = [
    "Sony WH-1000XM5 - Industry-leading noise cancellation headphones",
    "Apple AirPods Max - Premium over-ear headphones with spatial audio",
    "Bose QuietComfort 45 - Comfortable wireless noise cancelling headphones",
    "JBL Tune 750BTNC - Affordable wireless headphones with ANC",
    "Logitech Gaming Headset - Wired gaming headphones with microphone",
]

# Rank documents
results = model.rank(query, documents, top_k=3)

print(f"Query: {query}\n")
print("Top 3 Results:")
for result in results:
    idx = result["corpus_id"]
    score = result["score"]
    print(f"  {score:.4f} | {documents[idx][:60]}")
```
## Input Format
The model expects query-document pairs formatted as:
| Field | Format |
|-------|--------|
| Text A (Query) | `Query: {your search query}` |
| Text B (Document) | `Title: {document title}\nDescription: {document description}` |
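As a minimal sketch, the table above can be turned into a small helper (hypothetical, not part of the model repo) that builds a correctly formatted pair before tokenization or `predict`:

```python
def format_pair(query: str, title: str, description: str) -> tuple[str, str]:
    """Build the (Text A, Text B) pair in the format the reranker expects."""
    text_a = f"Query: {query}"
    text_b = f"Title: {title}\nDescription: {description}"
    return text_a, text_b

text_a, text_b = format_pair(
    "best laptop for programming",
    "MacBook Pro M3",
    "Powerful laptop with M3 chip",
)
print(text_a)  # Query: best laptop for programming
```

The returned pair can be passed straight to the tokenizer (as two text arguments) or to `CrossEncoder.predict` as a `(text_a, text_b)` tuple.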