---
license: apache-2.0
base_model: zeroentropy/zerank-2-reranker
pipeline_tag: text-classification
tags:
- reranker
- cross-encoder
- sequence-classification
- qwen3
---

# zerank-2-reranker-seq

A `Qwen3ForSequenceClassification` reranker derived from
[`zeroentropy/zerank-2-reranker`](https://huggingface.co/zeroentropy/zerank-2-reranker).

The original model is a `Qwen3ForCausalLM` reranker that scores a (query, document)
pair using the next-token logit of a single relevance token (`true_token_id = 9454`,
from its `1_LogitScore` sentence-transformers head). Because the model uses tied
embeddings, that logit is `hidden_state · embed_tokens.weight[9454]`. This conversion
copies that single embedding row into the `score` head of a standard
`Qwen3ForSequenceClassification` model, producing a `num_labels=1` reranker whose
output logit is, by construction, identical to the original relevance score for the
same input tokens.
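
The equivalence can be checked on a toy example. The sketch below uses plain Python lists in place of real tensors; the 4-token vocabulary, the 3-dimensional hidden state, and the index `TRUE_TOKEN_ID = 2` are made-up stand-ins for the real vocabulary and token id 9454, and `dot` and `lm_head_logits` are illustrative helpers, not part of any library.

```python
# Toy illustration: with tied embeddings, the LM-head logit for one token
# equals a 1-output linear head whose weight is that token's embedding row.
# All sizes and values here are invented for illustration.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def lm_head_logits(hidden, embed_weight):
    """Tied LM head: one logit per vocabulary row."""
    return [dot(hidden, row) for row in embed_weight]

# Hypothetical 4-token vocabulary with hidden_size = 3.
embed_weight = [
    [0.5, -1.0, 2.0],
    [1.0, 0.0, -0.5],
    [-2.0, 1.5, 0.25],
    [0.0, 3.0, 1.0],
]
TRUE_TOKEN_ID = 2  # stands in for token id 9454 in the real model

hidden = [0.2, -0.4, 1.0]  # final hidden state at the last position

# Causal-LM path: compute every vocabulary logit, keep the relevance token's.
causal_logit = lm_head_logits(hidden, embed_weight)[TRUE_TOKEN_ID]

# Sequence-classification path: score head = copied embedding row, no bias.
score_weight = list(embed_weight[TRUE_TOKEN_ID])
classifier_logit = dot(hidden, score_weight)

print(causal_logit == classifier_logit)
```

Both paths evaluate the same inner product, so the classifier head reproduces the relevance logit while skipping the rest of the vocabulary projection.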

This makes the model loadable directly via `AutoModelForSequenceClassification` and
servable as a cross-encoder reranker (e.g. by [infinity](https://github.com/michaelfeil/infinity)),
without the causal-LM + logit-extraction path.

Conversion method: https://github.com/michaelfeil/infinity/blob/main/docs/lm_head_to_classifier/convert_lm.py

## Details

- `architectures`: `["Qwen3ForSequenceClassification"]`
- `num_labels`: 1 (single relevance logit; apply a sigmoid for a 0–1 score)
- dtype: `bfloat16` (matches the source; not downcast to fp16)
- `score` head: `Linear(hidden_size, 1, bias=False)`, weight = `embed_tokens.weight[9454]`
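
To turn raw logits into 0–1 scores and rank candidates, something like the following works. It is a minimal sketch in plain Python; the logit values are invented for illustration, and `sigmoid` and `rank_by_score` are hypothetical helper names, not functions shipped with the model.

```python
import math

def sigmoid(logit: float) -> float:
    """Numerically stable logistic function mapping a logit to [0, 1]."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)

def rank_by_score(documents, logits):
    """Return (document, score) pairs sorted from most to least relevant."""
    scored = [(doc, sigmoid(value)) for doc, value in zip(documents, logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up logits, as if produced by the model for a single query.
documents = ["doc A", "doc B", "doc C"]
logits = [-1.2, 3.4, 0.0]

ranked = rank_by_score(documents, logits)
for doc, score in ranked:
    print(f"{doc}: {score:.3f}")
```

Because the sigmoid is monotonic, sorting by the raw logit gives the same order; the sigmoid only matters when a bounded 0–1 value or a fixed threshold is needed.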

## Note on prompt formatting

The original model was trained with a chat template that places the query in a
`system` turn and the document in a `user` turn, followed by an assistant generation
prefix. Generic sequence-classification servers tokenize the raw `(query, document)`
pair and do **not** apply this template, which can shift scores relative to the native
sentence-transformers usage. For best fidelity, format inputs as:

```
<|im_start|>system
{query}<|im_end|>
<|im_start|>user
{document}<|im_end|>
<|im_start|>assistant
```
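
A small helper keeps the template in one place when scoring several documents against one query. This is a sketch: `format_pair` is a hypothetical name, and the string simply mirrors the template above rather than calling the tokenizer's own chat template.

```python
def format_pair(query: str, document: str) -> str:
    """Wrap a (query, document) pair in the chat layout shown above."""
    return (
        f"<|im_start|>system\n{query}<|im_end|>\n"
        f"<|im_start|>user\n{document}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

query = "What is the capital of France?"
documents = [
    "The capital of France is Paris.",
    "Berlin is the capital of Germany.",
]

# One formatted string per candidate document, ready to tokenize.
texts = [format_pair(query, doc) for doc in documents]
print(texts[0])
```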

## Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "baseten-admin/zerank-2-reranker-seq"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torch_dtype=torch.bfloat16).eval()

query, document = "What is the capital of France?", "The capital of France is Paris."
# Apply the chat layout by hand: query in the system turn, document in the user turn.
text = (
    f"<|im_start|>system\n{query}<|im_end|>\n"
    f"<|im_start|>user\n{document}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
with torch.no_grad():
    # num_labels=1, so the output holds a single relevance logit.
    logit = model(**tok(text, return_tensors="pt")).logits.reshape(-1)[0]
score = torch.sigmoid(logit)  # map the logit to a 0-1 relevance score
print(score.item())
```