Text Ranking
Transformers
Safetensors
sentence-transformers
English
Chinese
multilingual
qwen3_5_text
text-generation
reranker
retrieval
rag
agentic-search
qwen3.5
Upload 2 files
- .gitattributes +1 -0
- README.md +173 -3
- model_architecture.png +3 -0
.gitattributes
CHANGED
|
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
model_architecture.png filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
|
@@ -1,3 +1,173 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
- zh
|
| 6 |
+
- multilingual
|
| 7 |
+
pipeline_tag: text-ranking
|
| 8 |
+
library_name: transformers
|
| 9 |
+
tags:
|
| 10 |
+
- reranker
|
| 11 |
+
- retrieval
|
| 12 |
+
- rag
|
| 13 |
+
- agentic-search
|
| 14 |
+
- qwen3.5
|
| 15 |
+
---
|
| 16 |
+
|
| 17 |
+
# Prism-Reranker
|
| 18 |
+
|
| 19 |
+
**Beyond Relevance Scoring — Jointly Producing Contributions and Evidence for Agentic Retrieval.**
|
| 20 |
+
|
| 21 |
+
Prism-Reranker is a reranker family. Unlike standard rerankers, which emit only a relevance score, each model returns three things from one generation call: a calibrated score, a one-sentence *contribution*, and a self-contained *evidence* passage rewritten from the query-relevant parts of the document.
|
| 22 |
+
|
| 23 |
+

|
| 24 |
+
|
| 25 |
+
## Released models
|
| 26 |
+
|
| 27 |
+
Five checkpoints are released on the Hugging Face Hub. Four are fine-tuned from the **Qwen3.5** backbone; one (`-4B-exp`) is an experimental extension built on top of **Qwen3-Reranker-4B**, demonstrating that the same recipe transfers to an existing LLM-based reranker without losing ranking quality.
|
| 28 |
+
|
| 29 |
+
| Model | Backbone | Parameters | Hugging Face |
|
| 30 |
+
|---|---|---|---|
|
| 31 |
+
| Prism-Qwen3.5-Reranker-0.8B | Qwen3.5 | 0.8B | [infgrad/Prism-Qwen3.5-Reranker-0.8B](https://huggingface.co/infgrad/Prism-Qwen3.5-Reranker-0.8B) |
|
| 32 |
+
| Prism-Qwen3.5-Reranker-2B | Qwen3.5 | 2B | [infgrad/Prism-Qwen3.5-Reranker-2B](https://huggingface.co/infgrad/Prism-Qwen3.5-Reranker-2B) |
|
| 33 |
+
| Prism-Qwen3.5-Reranker-4B | Qwen3.5 | 4B | [infgrad/Prism-Qwen3.5-Reranker-4B](https://huggingface.co/infgrad/Prism-Qwen3.5-Reranker-4B) |
|
| 34 |
+
| Prism-Qwen3.5-Reranker-9B | Qwen3.5 | 9B | [infgrad/Prism-Qwen3.5-Reranker-9B](https://huggingface.co/infgrad/Prism-Qwen3.5-Reranker-9B) |
|
| 35 |
+
| Prism-Qwen3-Reranker-4B-exp | Qwen3-Reranker-4B | 4B | [infgrad/Prism-Qwen3-Reranker-4B-exp](https://huggingface.co/infgrad/Prism-Qwen3-Reranker-4B-exp) |
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
## Why this model?
|
| 40 |
+
|
| 41 |
+
In agentic / RAG pipelines, a relevance score is rarely the end goal. After deciding a document is relevant, the agent still has to read it, denoise it, and decide what to do next. Prism-Reranker folds that work into the reranker itself:
|
| 42 |
+
|
| 43 |
+
- **Relevance score**: `s(q, d) = σ(ℓ_yes − ℓ_no) ∈ (0, 1)`, where `ℓ_yes` and `ℓ_no` are the logits of `yes` and `no` at the first generated token. Calibrated and ranking-ready.
|
| 44 |
+
- **`<contribution>`**: one sentence stating *every* core point the document contributes to the query. The agent can use it to plan its next step without re-reading the document.
|
| 45 |
+
- **`<evidence>`**: a self-contained, faithfully rephrased rewrite of the query-relevant content. It drops irrelevant background and preserves proper nouns, numbers, dates, code, and URLs verbatim. You can feed `<evidence>` directly to a downstream LLM and skip the raw document, which saves context tokens and removes web noise.
|
| 46 |
+
|
| 47 |
+
If the document is not relevant, the model outputs `no` and stops. No contribution/evidence is generated.
|
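The score formula above and the two-way normalization in the Quickstart code compute the same quantity. The sketch below checks this in plain Python with made-up logit values. The function names `score_from_logits` and `score_from_probs` are illustrative and are not part of the released code.

```python
import math


def score_from_logits(l_yes: float, l_no: float) -> float:
    """sigma(l_yes - l_no), computed from the two first-token logits."""
    return 1.0 / (1.0 + math.exp(-(l_yes - l_no)))


def score_from_probs(l_yes: float, l_no: float, other_logits=()) -> float:
    """p_yes / (p_yes + p_no), with probabilities from a full softmax."""
    logits = [l_yes, l_no, *other_logits]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    p_yes, p_no = exps[0] / z, exps[1] / z
    return p_yes / (p_yes + p_no)


# Hypothetical logits: "yes", "no", and a few other vocabulary entries.
a = score_from_logits(2.0, -1.0)
b = score_from_probs(2.0, -1.0, other_logits=[0.3, -5.0, 4.0])
print(a, b)
```

The softmax denominator cancels in the ratio, so the logits of all other vocabulary tokens have no effect on the score.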
| 48 |
+
|
| 49 |
+
## Highlights
|
| 50 |
+
|
| 51 |
+
- **Backbones**: Qwen3.5 series for the four main sizes, no architectural changes; one extension variant on top of Qwen3-Reranker-4B.
|
| 52 |
+
- **Context length**: training data capped at **10K tokens** per example, covering most real-world documents.
|
| 53 |
+
- **Multilingual**: Chinese / English primary; other languages supported but with less coverage.
|
| 54 |
+
- **Keyword-query robust**: agents often emit keyword-style queries instead of well-formed questions. ~30% of training queries were rewritten by an LLM into keyword form, so the model handles both natural and keyword queries.
|
| 55 |
+
- **Real-world data distribution**: in addition to open reranker datasets (MS MARCO, T2Ranking, MIRACL, …), training includes synthetic queries paired with real Tavily / Exa web-search results, matching what an actual agent sees at inference time.
|
| 56 |
+
- **Length × score balanced**: training data was rebalanced so that document length is not a relevance shortcut.
|
| 57 |
+
- **Training recipe**: distillation (point-wise MSE on a strong commercial reranker's scores) + SFT on `yes/no` + `<contribution>` + `<evidence>`, supervised by a 5-LLM-as-judge ensemble.
|
| 58 |
+
|
| 59 |
+
## Quickstart
|
| 60 |
+
|
| 61 |
+
```python
|
| 62 |
+
import torch
|
| 63 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 64 |
+
|
| 65 |
+
MODEL_PATH = "infgrad/Prism-Qwen3.5-Reranker-4B" # or any sibling repo above
|
| 66 |
+
|
| 67 |
+
SYSTEM_PROMPT = (
|
| 68 |
+
    "Judge whether the Document meets the requirements based on "
|
| 69 |
+
    "the Query and the Instruct provided. "
|
| 70 |
+
)
|
| 71 |
+
|
| 72 |
+
INSTRUCTION = (
|
| 73 |
+
    'Judge if the document is relevant to the query. Reply "yes" or "no".\n'
|
| 74 |
+
    'On "yes", also emit:\n'
|
| 75 |
+
    "<contribution>One sentence covering every core point the document "
|
| 76 |
+
    "contributes to the query, without elaboration.</contribution>\n"
|
| 77 |
+
    "<evidence>Self-contained rewrite of the query-relevant content. Rules:\n"
|
| 78 |
+
    "- Faithful: rephrase only; add or infer nothing.\n"
|
| 79 |
+
    "- Self-contained: evidence alone must fully answer the query.\n"
|
| 80 |
+
    "- Concise: drop query-irrelevant background.\n"
|
| 81 |
+
    "- Verbatim (no translation): proper nouns, terms, abbreviations, "
|
| 82 |
+
    "numbers, dates, code, URLs.\n"
|
| 83 |
+
    "- Output language: multilingual doc -> query's language; else doc's language."
|
| 84 |
+
    "</evidence>"
|
| 85 |
+
)
|
| 86 |
+
|
| 87 |
+
PROMPT_TEMPLATE = (
|
| 88 |
+
    "<|im_start|>system\n{system}<|im_end|>\n"
|
| 89 |
+
    "<|im_start|>user\n"
|
| 90 |
+
    "<Instruct>: {instruction}\n"
|
| 91 |
+
    "<Query>: {query}\n"
|
| 92 |
+
    "<Document>: {doc}<|im_end|>\n"
|
| 93 |
+
    "<|im_start|>assistant\n<think>\n\n</think>\n\n"
|
| 94 |
+
)
|
| 95 |
+
|
| 96 |
+
|
| 97 |
+
def build_prompt(query: str, doc: str) -> str:
|
| 98 |
+
    return PROMPT_TEMPLATE.format(
|
| 99 |
+
        system=SYSTEM_PROMPT, instruction=INSTRUCTION, query=query, doc=doc
|
| 100 |
+
    )
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
|
| 104 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 105 |
+
    MODEL_PATH,
|
| 106 |
+
    torch_dtype=torch.bfloat16,
|
| 107 |
+
    device_map="cuda",
|
| 108 |
+
    attn_implementation="sdpa",
|
| 109 |
+
).eval()
|
| 110 |
+
|
| 111 |
+
yes_id = tokenizer.encode("yes", add_special_tokens=False)[0]
|
| 112 |
+
no_id = tokenizer.encode("no", add_special_tokens=False)[0]
|
| 113 |
+
|
| 114 |
+
|
| 115 |
+
@torch.no_grad()
|
| 116 |
+
def rerank(query: str, doc: str, max_new_tokens: int = 512):
|
| 117 |
+
    prompt = build_prompt(query, doc)
|
| 118 |
+
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
|
| 119 |
+
|
| 120 |
+
    out = model.generate(
|
| 121 |
+
        input_ids=input_ids,
|
| 122 |
+
        max_new_tokens=max_new_tokens,
|
| 123 |
+
        do_sample=False,
|
| 124 |
+
        return_dict_in_generate=True,
|
| 125 |
+
        output_scores=True,
|
| 126 |
+
        pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
|
| 127 |
+
    )
|
| 128 |
+
|
| 129 |
+
    # Relevance score = softmax over {yes, no} at the first generated token.
|
| 130 |
+
    first_logprobs = torch.log_softmax(out.scores[0][0].float(), dim=-1)
|
| 131 |
+
    yes_p = first_logprobs[yes_id].exp()
|
| 132 |
+
    no_p = first_logprobs[no_id].exp()
|
| 133 |
+
    score = (yes_p / (yes_p + no_p)).item()
|
| 134 |
+
|
| 135 |
+
    # Decoded text holds yes/no plus <contribution>...</contribution><evidence>...</evidence>
|
| 136 |
+
    gen_ids = out.sequences[0, input_ids.shape[1]:]
|
| 137 |
+
    text = tokenizer.decode(gen_ids, skip_special_tokens=True)
|
| 138 |
+
    return {"score": score, "text": text}
|
| 139 |
+
|
| 140 |
+
|
| 141 |
+
example = rerank(
|
| 142 |
+
    query="What is the boiling point of water at sea level?",
|
| 143 |
+
    doc=(
|
| 144 |
+
        "Water boils at 100 C (212 F) at standard atmospheric pressure (1 atm), "
|
| 145 |
+
        "which corresponds to sea-level conditions."
|
| 146 |
+
    ),
|
| 147 |
+
)
|
| 148 |
+
print(example)
|
| 149 |
+
```
|
| 150 |
+
|
| 151 |
+
Expected shape of the output:
|
| 152 |
+
|
| 153 |
+
```text
|
| 154 |
+
{
|
| 155 |
+
"score": 0.98,
|
| 156 |
+
"text": "yes\n<contribution>...</contribution>\n<evidence>...</evidence>"
|
| 157 |
+
}
|
| 158 |
+
```
|
| 159 |
+
|
| 160 |
+
For irrelevant pairs the score is close to 0 and `text` is just `"no"`.
|
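Downstream code usually needs the verdict, contribution, and evidence as separate fields. The sketch below splits the returned `text` with a regular expression. It assumes the output follows the tag layout shown above. `parse_rerank_text` is a hypothetical helper, not part of the released code.

```python
import re

# Matches <contribution>...</contribution> and <evidence>...</evidence>,
# including bodies that span several lines.
_TAG = re.compile(r"<(contribution|evidence)>(.*?)</\1>", re.DOTALL)


def parse_rerank_text(text: str) -> dict:
    """Split the generated text into verdict, contribution, and evidence."""
    stripped = text.strip()
    first = stripped.split(maxsplit=1)[0].lower() if stripped else ""
    fields = {name: body.strip() for name, body in _TAG.findall(text)}
    return {
        "relevant": first.startswith("yes"),
        # A field is None when its tag is missing or was never closed,
        # for example when generation stopped at max_new_tokens.
        "contribution": fields.get("contribution"),
        "evidence": fields.get("evidence"),
    }


sample = (
    "yes\n<contribution>Water boils at 100 C.</contribution>\n"
    "<evidence>Water boils at 100 C (212 F) at 1 atm.</evidence>"
)
print(parse_rerank_text(sample))
print(parse_rerank_text("no"))
```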
| 161 |
+
|
| 162 |
+
|
| 163 |
+
## Notes on usage
|
| 164 |
+
|
| 165 |
+
- The first generated token is always `yes` or `no`, so the score is well-defined even if you stop generation immediately (cheap mode: `max_new_tokens=1`). Generate further only when you also want the contribution and evidence.
|
| 166 |
+
- Quality may degrade on inputs longer than 10K tokens, the cap used for training examples. Truncate the document side first.
|
| 167 |
+
- Greedy decoding is fine for ranking. For diverse evidence rephrasings, enable sampling (`do_sample=True`) with `temperature` between 0.3 and 0.5.
|
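The notes above suggest a two-stage pattern: score every candidate in cheap mode, then generate contribution and evidence only for the best few. The sketch below is one way to do it. It assumes a `rerank_fn` with the same signature as the Quickstart `rerank` function. `truncate_document` and `two_stage_rerank` are hypothetical helpers, and the tokenizer is passed in as plain `encode` / `decode` callables so the example runs without downloading a model.

```python
def truncate_document(doc, encode, decode, max_doc_tokens):
    """Keep at most max_doc_tokens tokens of the document side."""
    ids = encode(doc)
    if len(ids) <= max_doc_tokens:
        return doc
    return decode(ids[:max_doc_tokens])


def two_stage_rerank(query, docs, rerank_fn, top_k=3, max_new_tokens=512):
    """Score all docs with one generated token, then expand only the top_k."""
    # Stage 1: max_new_tokens=1 yields the relevance score and nothing else.
    scored = [
        (rerank_fn(query, doc, max_new_tokens=1)["score"], i)
        for i, doc in enumerate(docs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Stage 2: full generation (contribution and evidence) for the top_k only.
    results = []
    for rank, (score, i) in enumerate(scored):
        text = None
        if rank < top_k:
            text = rerank_fn(query, docs[i], max_new_tokens=max_new_tokens)["text"]
        results.append({"index": i, "score": score, "text": text})
    return results


# Stub standing in for the Quickstart `rerank`; its scores are invented.
def fake_rerank(query, doc, max_new_tokens=512):
    score = 0.9 if "Mars" in doc else 0.1
    return {"score": score, "text": "yes" if score > 0.5 else "no"}


docs = ["Venus is hot.", "Mars is the Red Planet.", "Saturn has rings."]
print(two_stage_rerank("red planet", docs, fake_rerank, top_k=1))
print(truncate_document("a b c d e", str.split, " ".join, 3))
```

With a Hugging Face tokenizer, `encode` could be `lambda s: tokenizer.encode(s, add_special_tokens=False)` and `decode` could be `tokenizer.decode`. Decoding a token prefix may not reproduce the original characters exactly.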
| 168 |
+
|
| 169 |
+
|
| 170 |
+
|
| 171 |
+
## Contact
|
| 172 |
+
|
| 173 |
+
Dun Zhang — `dunnzhang0@gmail.com` (independent researcher).
|
model_architecture.png
ADDED