---
license: apache-2.0
base_model: zeroentropy/zerank-2-reranker
pipeline_tag: text-classification
tags:
- reranker
- cross-encoder
- sequence-classification
- qwen3
---

# zerank-2-reranker-seq

A `Qwen3ForSequenceClassification` reranker derived from [`zeroentropy/zerank-2-reranker`](https://huggingface.co/zeroentropy/zerank-2-reranker).

The original model is a `Qwen3ForCausalLM` reranker that scores a (query, document) pair using the next-token logit of a single relevance token (`true_token_id = 9454`, from its `1_LogitScore` sentence-transformers head). Because the model uses tied embeddings, that logit is `hidden_state · embed_tokens.weight[9454]`. This conversion copies that single embedding row into the `score` head of a standard `Qwen3ForSequenceClassification` model, producing a `num_labels=1` reranker whose output logit is identical (by construction) to the original relevance score.

This makes the model loadable directly via `AutoModelForSequenceClassification` and servable as a cross-encoder reranker (e.g. by [infinity](https://github.com/michaelfeil/infinity)), without the causal-LM + logit-extraction path.

Conversion method: https://github.com/michaelfeil/infinity/blob/main/docs/lm_head_to_classifier/convert_lm.py

## Details

- `architectures`: `["Qwen3ForSequenceClassification"]`
- `num_labels`: 1 (single relevance logit; apply a sigmoid for a 0–1 score)
- dtype: `bfloat16` (matches the source; not downcast to fp16)
- `score` head: `Linear(hidden_size, 1, bias=False)`, weight = `embed_tokens.weight[9454]`

## Note on prompt formatting

The original model was trained with a chat template that places the query in a `system` turn and the document in a `user` turn, followed by an assistant generation prefix. Generic sequence-classification servers tokenize the raw `(query, document)` pair and do **not** apply this template, which can shift scores relative to the native sentence-transformers usage.

For best fidelity, format inputs as:

```
<|im_start|>system
{query}<|im_end|>
<|im_start|>user
{document}<|im_end|>
<|im_start|>assistant
```

## Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "baseten-admin/zerank-2-reranker-seq"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torch_dtype=torch.bfloat16).eval()

query, document = "What is the capital of France?", "The capital of France is Paris."
text = (
    f"<|im_start|>system\n{query}<|im_end|>\n"
    f"<|im_start|>user\n{document}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

with torch.no_grad():
    logit = model(**tok(text, return_tensors="pt")).logits.reshape(-1)[0]
score = torch.sigmoid(logit)
print(score.item())
```
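
A reranker is usually applied to several candidate documents for one query. The sketch below is one possible way to wrap the prompt template from this card in a small helper and sort candidates by score. The names `PROMPT_TEMPLATE`, `format_pair`, `sigmoid`, `rerank`, and the `logit_fn` callback are hypothetical and not part of this repository; `logit_fn` is assumed to take a list of formatted prompts and return one relevance logit per prompt (for example, by wrapping the tokenizer and model call from the Usage section).

```python
import math
from typing import Callable, List, Sequence, Tuple

# Chat template as described in this card: query in the system turn,
# document in the user turn, then an assistant generation prefix.
PROMPT_TEMPLATE = (
    "<|im_start|>system\n{query}<|im_end|>\n"
    "<|im_start|>user\n{document}<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def format_pair(query: str, document: str) -> str:
    """Render one (query, document) pair with the chat template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(query=query, document=document)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(
    query: str,
    documents: Sequence[str],
    logit_fn: Callable[[List[str]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs sorted from most to least relevant.

    `logit_fn` is an assumed callback: formatted prompts in, one logit per prompt out.
    """
    prompts = [format_pair(query, doc) for doc in documents]
    logits = [float(x) for x in logit_fn(prompts)]
    if len(logits) != len(documents):
        raise ValueError("logit_fn must return exactly one logit per document")
    # Sort on the raw logit so that saturated sigmoid values do not create ties.
    order = sorted(range(len(documents)), key=lambda i: logits[i], reverse=True)
    return [(documents[i], sigmoid(logits[i])) for i in order]
```

Keeping the model call behind `logit_fn` means the formatting and sorting logic can be checked without loading weights; how prompts are batched and padded inside that callback is left to the caller.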