---
license: apache-2.0
base_model: zeroentropy/zerank-2-reranker
pipeline_tag: text-classification
tags:
- reranker
- cross-encoder
- sequence-classification
- qwen3
---
# zerank-2-reranker-seq
A `Qwen3ForSequenceClassification` reranker derived from
[`zeroentropy/zerank-2-reranker`](https://huggingface.co/zeroentropy/zerank-2-reranker).
The original model is a `Qwen3ForCausalLM` reranker that scores a (query, document)
pair using the next-token logit of a single relevance token (`true_token_id = 9454`,
from its `1_LogitScore` sentence-transformers head). Because the model uses tied
embeddings, that logit is `hidden_state · embed_tokens.weight[9454]`, where
`hidden_state` is the final-layer hidden state at the last token. This conversion
copies that single embedding row into the `score` head of a standard
`Qwen3ForSequenceClassification` model, producing a `num_labels=1` reranker whose
output logit is identical (by construction) to the original relevance score.
This makes the model loadable directly via `AutoModelForSequenceClassification` and
servable as a cross-encoder reranker (e.g. by [infinity](https://github.com/michaelfeil/infinity)),
without the causal-LM + logit-extraction path.
Conversion method: https://github.com/michaelfeil/infinity/blob/main/docs/lm_head_to_classifier/convert_lm.py
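
The weight copy described above can be illustrated with a small, self-contained sketch. It is not the linked `convert_lm.py` script: the toy sizes, the random tensors, and the stand-in `true_token_id = 7` are assumptions chosen so the example runs without downloading any model. It only checks that a bias-free `Linear(hidden_size, 1)` head holding one embedding row reproduces the tied-LM-head logit for that token.

```python
import torch

torch.manual_seed(0)

# Toy sizes for illustration only; the real model has its own hidden and
# vocabulary sizes, and its relevance token id is 9454.
hidden_size, vocab_size, true_token_id = 16, 32, 7

# Tied embeddings: the LM head reuses embed_tokens.weight.
embed_tokens = torch.nn.Embedding(vocab_size, hidden_size)

# Stand-in for the final hidden state at the last token of one sequence.
hidden_state = torch.randn(hidden_size)

# Bias-free classification head with a single output logit.
score = torch.nn.Linear(hidden_size, 1, bias=False)

with torch.no_grad():
    # Causal-LM path: logits over the whole vocabulary, then pick one token.
    lm_logits = hidden_state @ embed_tokens.weight.T
    lm_score = lm_logits[true_token_id]

    # Sequence-classification path: copy the embedding row into the head.
    score.weight.copy_(embed_tokens.weight[true_token_id].unsqueeze(0))
    cls_score = score(hidden_state).squeeze(0)

# Both paths compute the same dot product, up to floating-point rounding.
print(torch.allclose(lm_score, cls_score, atol=1e-5))
```

The classification path skips the full-vocabulary projection, which is why the converted model needs only one weight row in its head.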
## Details
- `architectures`: `["Qwen3ForSequenceClassification"]`
- `num_labels`: 1 (single relevance logit; apply a sigmoid for a 0–1 score)
- dtype: `bfloat16` (matches the source; not downcast to fp16)
- `score` head: `Linear(hidden_size, 1, bias=False)`, weight = `embed_tokens.weight[9454]`
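
As a rough illustration of the sigmoid step mentioned above, the sketch below turns raw relevance logits into 0–1 scores and sorts documents by them. The helper names `sigmoid` and `rank_documents` and the example logits are hypothetical; the logits are made-up numbers, not outputs of this model.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rank_documents(documents, logits):
    """Pair each document with its 0-1 score and sort best-first."""
    scored = [(doc, sigmoid(logit)) for doc, logit in zip(documents, logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    "The capital of France is Paris.",
    "Bananas are yellow.",
    "France is a country in Europe.",
]
example_logits = [4.2, -3.1, 0.5]  # made-up values for illustration

ranked = rank_documents(docs, example_logits)
for doc, s in ranked:
    print(f"{s:.3f}  {doc}")
```

Because the sigmoid is monotonic, sorting by raw logit gives the same order; the sigmoid only matters when a bounded score or a threshold is needed.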
## Note on prompt formatting
The original model was trained with a chat template that places the query in a
`system` turn and the document in a `user` turn, followed by an assistant generation
prefix. Generic sequence-classification servers tokenize the raw `(query, document)`
pair and do **not** apply this template, which can shift scores relative to the native
sentence-transformers usage. For best fidelity, format inputs as:
```
<|im_start|>system
{query}<|im_end|>
<|im_start|>user
{document}<|im_end|>
<|im_start|>assistant
```
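
A small helper can build this string so the template is applied the same way everywhere. This is a minimal sketch; the function name `format_rerank_prompt` is hypothetical and not part of any library.

```python
def format_rerank_prompt(query: str, document: str) -> str:
    """Wrap a (query, document) pair in the template shown above."""
    return (
        f"<|im_start|>system\n{query}<|im_end|>\n"
        f"<|im_start|>user\n{document}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = format_rerank_prompt(
    "What is the capital of France?",
    "The capital of France is Paris.",
)
print(prompt)
```

When a server accepts only a raw `(query, document)` pair, this kind of pre-formatting is not possible from the client side, so scores may differ as described above.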
## Usage
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "baseten-admin/zerank-2-reranker-seq"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torch_dtype=torch.bfloat16).eval()

query, document = "What is the capital of France?", "The capital of France is Paris."

# Apply the chat template by hand: query in the system turn, document in the user turn.
text = (
    f"<|im_start|>system\n{query}<|im_end|>\n"
    f"<|im_start|>user\n{document}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

with torch.no_grad():
    # num_labels=1, so there is a single relevance logit per input.
    logit = model(**tok(text, return_tensors="pt")).logits.reshape(-1)[0]

# Map the raw relevance logit to a 0-1 score.
score = torch.sigmoid(logit)
print(score.item())
```