search-reranker-broad-v1

search-reranker-broad-v1 is a broad fine-tune of cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 for search reranking.

It is intended for reranking a small candidate set of query-document pairs after retrieval. The repo contains both the fine-tuned checkpoint (at the repo root) and a quantized ONNX Runtime CPU artifact at onnx/model.onnx.

Artifacts

  • Root files: Sentence Transformers / Transformers checkpoint for direct use with CrossEncoder or AutoModelForSequenceClassification
  • onnx/model.onnx: dynamic int8 ONNX Runtime export using per-channel weight quantization
  • Tokenizer files at repo root shared by both runtimes

Training

  • Base model: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
  • Training rows: 1,837
  • Epochs: 2
  • Batch size: 16
  • Learning rate: 2e-5
  • Max length: 128

The fine-tune used a broad weakly supervised reranking dataset built from benchmark queries and competing candidate pages over public web content.

Validation

Local validation on 11 March 2026:

| Slice | Runtime | Top-1 | Hit@5 | MRR |
| --- | --- | --- | --- | --- |
| Broad weak-label eval | fp32 checkpoint | 0.7882 | 1.0000 | 0.8769 |
| Broad weak-label eval | ONNX qint8 per-channel | 0.7765 | 1.0000 | 0.8704 |
| Office-holder canary | fp32 checkpoint | 0.8333 | 1.0000 | 0.9028 |
| Office-holder canary | ONNX qint8 per-channel | 0.8333 | 1.0000 | 0.9074 |

The ONNX artifact was selected from a small dynamic-int8 profile sweep because it had the best quality retention among the tested q8 variants.

Dataset

The training dataset is not published with this model yet.

Reasons:

  • the supervision was derived from an internal benchmark workflow
  • it still needs additional curation before it is suitable as a standalone public dataset release

Usage

Sentence Transformers

from sentence_transformers import CrossEncoder

model = CrossEncoder("temsa/search-reranker-broad-v1")
scores = model.predict([
    ["who is the taoiseach?", "Micheál Martin is the Taoiseach."],
    ["who is the taoiseach?", "Former Taoisigh list..."],
])
print(scores)

Transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("temsa/search-reranker-broad-v1")
model = AutoModelForSequenceClassification.from_pretrained("temsa/search-reranker-broad-v1")

inputs = tokenizer("who is the taoiseach?", "Micheál Martin is the Taoiseach.", return_tensors="pt")
with torch.no_grad():
    print(model(**inputs).logits)

ONNX Runtime

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
import onnxruntime as ort

model_path = hf_hub_download("temsa/search-reranker-broad-v1", "onnx/model.onnx")
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

tokenizer = AutoTokenizer.from_pretrained("temsa/search-reranker-broad-v1")
inputs = tokenizer("who is the taoiseach?", "Micheál Martin is the Taoiseach.", return_tensors="np")
feed = {i.name: inputs[i.name] for i in session.get_inputs()}
print(session.run(None, feed)[0])

Limitations

  • The broad training signal is weakly supervised, not fully manually judged.
  • This model is a candidate general reranker, not a guaranteed drop-in quality win for every domain.
  • Validate against your own canaries before replacing an existing reranker globally.
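One way to run that canary check is to score each query's candidates and compute the same Top-1, Hit@5, and MRR metrics reported above. A minimal sketch: score_fn is a stand-in for your reranker (e.g. CrossEncoder.predict), and the toy word-overlap scorer and helper names here are illustrative, not part of this repo.

```python
# Each canary is (query, candidate docs, index of the known-good doc).
def canary_metrics(canaries, score_fn, k=5):
    top1 = hit_k = rr_sum = 0
    for query, docs, gold_idx in canaries:
        scores = score_fn([(query, d) for d in docs])
        ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
        rank = ranking.index(gold_idx) + 1  # 1-based rank of the gold doc
        top1 += rank == 1
        hit_k += rank <= k
        rr_sum += 1.0 / rank
    n = len(canaries)
    return {"top1": top1 / n, f"hit@{k}": hit_k / n, "mrr": rr_sum / n}

# Toy stand-in scorer: more word overlap with the query scores higher.
def toy_scorer(pairs):
    return [len(set(q.split()) & set(d.split())) for q, d in pairs]

canaries = [
    ("who is the taoiseach", ["the taoiseach is the head of government",
                              "a list of former officeholders"], 0),
]
print(canary_metrics(canaries, toy_scorer))  # → {'top1': 1.0, 'hit@5': 1.0, 'mrr': 1.0}
```

Run this against both your current reranker and this model on the same canaries before switching.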