Add complete model card for E2Rank-0.6B

#1 by nielsr (HF Staff)
---
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
---

# E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

[![Paper](https://img.shields.io/badge/Paper-2510.22733-red)](https://huggingface.co/papers/2510.22733)
[![Project Page](https://img.shields.io/badge/Project_Page-Website-blue)](https://alibaba-nlp.github.io/E2Rank/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/Alibaba-NLP/E2Rank)

## Introduction

We introduce $\textrm{E}^2\text{Rank}$, meaning **E**fficient **E**mbedding-based **Rank**ing (and also **Embedding-to-Rank**), which extends a single text embedding model to perform both high-quality retrieval and listwise reranking, thereby achieving strong effectiveness with remarkable efficiency.

E2Rank applies cosine similarity between query and document embeddings as a unified ranking function. For reranking, a listwise ranking prompt constructed from the original query and its candidate documents serves as an enhanced query, enriched with signals from the top-K documents, akin to pseudo-relevance feedback (PRF) in traditional retrieval models. This design preserves the efficiency and representational quality of the base embedding model while significantly improving its reranking performance.
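
The unified scoring idea can be sketched in a few lines; this is only an illustration with random vectors standing in for E2Rank embeddings (`query_emb` would be the embedding of either a plain query or a listwise-prompt pseudo-query), not the model itself. After L2 normalization, cosine similarity reduces to a plain dot product:

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Divide each (row) vector by its Euclidean norm
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def unified_rank(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    # Cosine similarity == dot product after L2 normalization; the same
    # function scores a plain query (retrieval) or a listwise-prompt
    # pseudo-query (reranking)
    scores = l2_normalize(doc_embs) @ l2_normalize(query_emb)
    return np.argsort(-scores)  # document indices, best first


rng = np.random.default_rng(0)
q = rng.normal(size=8)           # stand-in for a query embedding
docs = rng.normal(size=(4, 8))   # stand-ins for document embeddings
order = unified_rank(q, docs)
print(order)
```

Retrieval and reranking differ only in what text is embedded on the query side; the scoring function never changes.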

Empirically, E2Rank achieves state-of-the-art results on the BEIR reranking benchmark and demonstrates competitive performance on the reasoning-intensive BRIGHT benchmark, with very low reranking latency. We also show that the ranking training process improves embedding performance on the MTEB benchmark. Our findings indicate that a single embedding model can effectively unify retrieval and reranking, offering both computational efficiency and competitive ranking accuracy.

**Our work highlights the potential of single embedding models to serve as unified retrieval-reranking engines, offering a practical, efficient, and accurate alternative to complex multi-stage ranking systems.**

<div align="center">
<img src="https://github.com/Alibaba-NLP/E2Rank/raw/main/assets/cover.png" width="90%" height="auto" alt="Overview of E2Rank, average reranking performance on the BEIR benchmark, and reranking latency on the Covid dataset.">
<p style="width: 70%; margin-left: auto; margin-right: auto">
<b>(a)</b> Overview of E2Rank. <b>(b)</b> Average reranking performance on the BEIR benchmark, where E2Rank outperforms other baselines. <b>(c)</b> Reranking latency per query on the Covid dataset, where E2Rank achieves a several-fold speedup over RankQwen3.
</p>
</div>

## Usage

### Embedding Model

The usage of E2Rank as an embedding model is similar to [Qwen3-Embedding](https://github.com/QwenLM/Qwen3-Embedding). The only difference is that Qwen3-Embedding automatically appends an EOS token, while E2Rank requires users to manually append the special token `<|endoftext|>` to the end of each input text.

<details>
<summary><b>Transformers Usage</b></summary>

```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'


# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
input_texts = [t + "<|endoftext|>" for t in input_texts]

tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/E2Rank-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Alibaba-NLP/E2Rank-0.6B')

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)

print(scores.tolist())
# [[0.5950675010681152, 0.030417663976550102], [0.061970409005880356, 0.562691330909729]]
```
</details>
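
The `last_token_pool` helper above selects each sequence's final real token: with left padding that is simply the last position, otherwise it is indexed through the attention mask. A self-contained toy check (the helper is repeated verbatim, with dummy hidden states in place of model output) illustrates the right-padded path:

```python
import torch
from torch import Tensor


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # If every sequence's last position is a real token, the batch is left-padded
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


# Toy hidden states: batch of 2, sequence length 3, hidden size 1
hidden = torch.arange(6, dtype=torch.float32).reshape(2, 3, 1)
# Right padding: first sequence has 2 real tokens, second has 3
mask_right = torch.tensor([[1, 1, 0], [1, 1, 1]])
pooled = last_token_pool(hidden, mask_right)
print(pooled.squeeze(-1).tolist())  # [1.0, 5.0] — last real token of each sequence
```

This is why the tokenizer is loaded with `padding_side='left'`: left padding lets the fast `[:, -1]` branch apply to the whole batch.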

### Reranking

To use E2Rank as a reranker, the only extra step is on the query side: pack (part of) the documents to be reranked into the *listwise prompt*, which replaces the plain query. Everything else is identical to using the embedding model.

<details>
<summary><b>Transformers Usage</b></summary>

```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/E2Rank-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Alibaba-NLP/E2Rank-0.6B')


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_listwise_prompt(task_description: str, query: str, documents: list[str], num_input_docs: int = 20) -> str:
    input_docs = documents[:num_input_docs]
    input_docs = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(input_docs, start=1)])
    messages = [{
        "role": "user",
        "content": f'{task_description}\nDocuments:\n{input_docs}\nSearch Query:{query}'
    }]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    return text


task = 'Given a web search query and some relevant documents, rerank the documents that answer the query:'

queries = [
    'What is the capital of China?',
    'Explain gravity'
]

# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
documents = [doc + "<|endoftext|>" for doc in documents]

pseudo_queries = [
    get_listwise_prompt(task, queries[0], documents),
    get_listwise_prompt(task, queries[1], documents)
]  # no need to add the EOS token here

input_texts = pseudo_queries + documents

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)

print(scores.tolist())
# [[0.8513513207435608, 0.24268491566181183], [0.33154672384262085, 0.7923378944396973]]
```
</details>
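
The resulting score matrix gives each query's similarity to every candidate, so the final reranked order is just a descending sort per row. A minimal sketch, using a dummy score matrix in place of the one computed above:

```python
import torch

# Dummy pseudo-query/document similarities standing in for the `scores`
# matrix from the example above: rows are queries, columns are candidates
scores = torch.tensor([[0.85, 0.24],
                       [0.33, 0.79]])

# Rank candidates for each query by descending similarity
rankings = [torch.argsort(row, descending=True).tolist() for row in scores]
print(rankings)  # [[0, 1], [1, 0]]
```

Each inner list holds candidate indices, best first; mapping them back through the original document list yields the reranked documents.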

## Citation

If this work is helpful, please kindly cite it as:

```bibtex
@misc{liu2025e2rank,
      title={E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker},
      author={Qi Liu and Yanzhao Zhang and Mingxin Li and Dingkun Long and Pengjun Xie and Jiaxin Mao},
      year={2025},
      eprint={2510.22733},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.22733},
}
```