zerank-2-reranker-seq
A Qwen3ForSequenceClassification reranker derived from
zeroentropy/zerank-2-reranker.
The original model is a Qwen3ForCausalLM reranker that scores a (query, document)
pair using the next-token logit of a single relevance token (true_token_id = 9454,
from its 1_LogitScore sentence-transformers head). Because the model uses tied
embeddings, that logit is hidden_state · embed_tokens.weight[9454]. This conversion
copies that single embedding row into the score head of a standard
Qwen3ForSequenceClassification model, producing a num_labels=1 reranker whose
output logit is identical (by construction) to the original relevance score.
This makes the model loadable directly via AutoModelForSequenceClassification and
servable as a cross-encoder reranker (e.g. by infinity),
without the causal-LM + logit-extraction path.
Conversion method: https://github.com/michaelfeil/infinity/blob/main/docs/lm_head_to_classifier/convert_lm.py
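The equivalence described above can be checked on a toy example. The following is a minimal sketch in plain Python, not the linked conversion script: the tiny vocabulary and hidden size, and the names `lm_head_logits`, `score_head`, and `TRUE_TOKEN_ID`, are hypothetical stand-ins (the real model uses token id 9454 and a far larger embedding matrix).

```python
import random

random.seed(0)

VOCAB_SIZE, HIDDEN_SIZE = 8, 4   # toy sizes, chosen only for illustration
TRUE_TOKEN_ID = 5                # stand-in for the real true_token_id (9454)

# Toy tied embedding matrix: one row of HIDDEN_SIZE weights per vocabulary token.
embed_tokens = [
    [random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
    for _ in range(VOCAB_SIZE)
]

def lm_head_logits(hidden_state):
    """Causal-LM path: with tied embeddings, one dot product per vocabulary row."""
    return [sum(h * w for h, w in zip(hidden_state, row)) for row in embed_tokens]

# Conversion step: copy the single relevance row into a bias-free, 1-output head.
score_weight = list(embed_tokens[TRUE_TOKEN_ID])

def score_head(hidden_state):
    """Sequence-classification path: the equivalent of Linear(hidden_size, 1, bias=False)."""
    return sum(h * w for h, w in zip(hidden_state, score_weight))

hidden_state = [random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
original_logit = lm_head_logits(hidden_state)[TRUE_TOKEN_ID]
converted_logit = score_head(hidden_state)
print(original_logit == converted_logit)
```

Both paths compute the same dot product against the same row, so the converted head only skips the work of producing logits for every other token in the vocabulary.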
Details
- architectures: ["Qwen3ForSequenceClassification"]
- num_labels: 1 (single relevance logit; apply a sigmoid for a 0–1 score)
- dtype: bfloat16 (matches the source; not downcast to fp16)
- score head: Linear(hidden_size, 1, bias=False), weight = embed_tokens.weight[9454]
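Because the head emits one raw logit per (query, document) pair, turning logits into scores and ranking candidates is a small post-processing step. The following is a minimal sketch that assumes the logits have already been computed; the names `sigmoid` and `rank_by_score` and the example logit values are made up for illustration and are not model outputs.

```python
import math

def sigmoid(logit: float) -> float:
    """Map a raw relevance logit to a 0-1 score."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)

def rank_by_score(logits):
    """Return (document_index, score) pairs, most relevant first."""
    scored = [(index, sigmoid(value)) for index, value in enumerate(logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

example_logits = [-1.5, 2.0, 0.0]  # hypothetical values, one per candidate document
for index, score in rank_by_score(example_logits):
    print(index, round(score, 3))
```

The sigmoid is monotonic, so sorting by raw logit gives the same order; the sigmoid matters only when a bounded score is needed, for example for thresholding.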
Note on prompt formatting
The original model was trained with a chat template that places the query in a
system turn and the document in a user turn, followed by an assistant generation
prefix. Generic sequence-classification servers tokenize the raw (query, document)
pair and do not apply this template, which can shift scores relative to the native
sentence-transformers usage. For best fidelity, format inputs as:
<|im_start|>system
{query}<|im_end|>
<|im_start|>user
{document}<|im_end|>
<|im_start|>assistant
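The template above can be produced by a small helper so that every pair is formatted identically before tokenization. This is a sketch: `format_pair` is a hypothetical name, not part of the model's or any library's API, and the second document is an invented example.

```python
def format_pair(query: str, document: str) -> str:
    """Wrap a (query, document) pair in the chat template shown above."""
    return (
        f"<|im_start|>system\n{query}<|im_end|>\n"
        f"<|im_start|>user\n{document}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

query = "What is the capital of France?"
documents = [
    "The capital of France is Paris.",
    "Berlin is the capital of Germany.",
]

# One formatted string per candidate document, all sharing the same query.
texts = [format_pair(query, doc) for doc in documents]
print(texts[0])
```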
Usage
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
name = "baseten-admin/zerank-2-reranker-seq"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torch_dtype=torch.bfloat16).eval()
query, document = "What is the capital of France?", "The capital of France is Paris."
text = (
    f"<|im_start|>system\n{query}<|im_end|>\n"
    f"<|im_start|>user\n{document}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
with torch.no_grad():
    logit = model(**tok(text, return_tensors="pt")).logits.reshape(-1)[0]
score = torch.sigmoid(logit)
print(score.item())