baseten-admin
/

zerank-2-reranker-seq

Text Classification

sequence-classification

Model card Files Files and versions

zerank-2-reranker-seq / README.md

baseten-admin's picture

Upload folder using huggingface_hub

ac4b5f3 verified 11 days ago

|

History Blame Contribute Delete

2.77 kB

	---
	license: apache-2.0
	base_model: zeroentropy/zerank-2-reranker
	pipeline_tag: text-classification
	tags:
	- reranker
	- cross-encoder
	- sequence-classification
	- qwen3
	---

	# zerank-2-reranker-seq

	A `Qwen3ForSequenceClassification` reranker derived from
	[`zeroentropy/zerank-2-reranker`](https://huggingface.co/zeroentropy/zerank-2-reranker).

	The original model is a `Qwen3ForCausalLM` reranker that scores a (query, document)
	pair using the next-token logit of a single relevance token (`true_token_id = 9454`,
	from its `1_LogitScore` sentence-transformers head). Because the model uses tied
	embeddings, that logit is `hidden_state · embed_tokens.weight[9454]`. This conversion
	copies that single embedding row into the `score` head of a standard
	`Qwen3ForSequenceClassification` model, producing a `num_labels=1` reranker whose
	output logit is identical (by construction) to the original relevance score.

	This makes the model loadable directly via `AutoModelForSequenceClassification` and
	servable as a cross-encoder reranker (e.g. by [infinity](https://github.com/michaelfeil/infinity)),
	without the causal-LM + logit-extraction path.

	Conversion method: https://github.com/michaelfeil/infinity/blob/main/docs/lm_head_to_classifier/convert_lm.py

	## Details

	- `architectures`: `["Qwen3ForSequenceClassification"]`
	- `num_labels`: 1 (single relevance logit; apply a sigmoid for a 0–1 score)
	- dtype: `bfloat16` (matches the source; not downcast to fp16)
	- `score` head: `Linear(hidden_size, 1, bias=False)`, weight = `embed_tokens.weight[9454]`

	## Note on prompt formatting

	The original model was trained with a chat template that places the query in a
	`system` turn and the document in a `user` turn, followed by an assistant generation
	prefix. Generic sequence-classification servers tokenize the raw `(query, document)`
	pair and do not apply this template, which can shift scores relative to the native
	sentence-transformers usage. For best fidelity, format inputs as:

	```
	<\|im_start\|>system
	{query}<\|im_end\|>
	<\|im_start\|>user
	{document}<\|im_end\|>
	<\|im_start\|>assistant
	```

	## Usage

	```python
	import torch
	from transformers import AutoModelForSequenceClassification, AutoTokenizer

	name = "baseten-admin/zerank-2-reranker-seq"
	tok = AutoTokenizer.from_pretrained(name)
	model = AutoModelForSequenceClassification.from_pretrained(name, torch_dtype=torch.bfloat16).eval()

	query, document = "What is the capital of France?", "The capital of France is Paris."
	text = (
	f"<\|im_start\|>system\n{query}<\|im_end\|>\n"
	f"<\|im_start\|>user\n{document}<\|im_end\|>\n"
	f"<\|im_start\|>assistant\n"
	)
	with torch.no_grad():
	logit = model(**tok(text, return_tensors="pt")).logits.reshape(-1)[0]
	score = torch.sigmoid(logit)
	print(score.item())
	```