Add complete model card for E2Rank-0.6B

#1 by nielsr (HF Staff)
---
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
---

# E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

[![Paper](https://img.shields.io/badge/Paper-2510.22733-red)](https://huggingface.co/papers/2510.22733)
[![Project Page](https://img.shields.io/badge/Project_Page-Website-blue)](https://alibaba-nlp.github.io/E2Rank/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/Alibaba-NLP/E2Rank)

## Introduction

We introduce $\textrm{E}^2\text{Rank}$, meaning **E**fficient **E**mbedding-based **Rank**ing (and also **Embedding-to-Rank**), which extends a single text embedding model to perform both high-quality retrieval and listwise reranking, thereby achieving strong effectiveness with remarkable efficiency.

E2Rank applies cosine similarity between query and document embeddings as a unified ranking function. For reranking, a listwise ranking prompt constructed from the original query and its candidate documents serves as an enhanced query, enriched with signals from the top-K documents, akin to pseudo-relevance feedback (PRF) in traditional retrieval models. This design preserves the efficiency and representational quality of the base embedding model while significantly improving its reranking performance.
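
The unified scoring idea can be sketched in a few lines; this is only an illustration with random vectors standing in for E2Rank embeddings (`query_emb` would be the embedding of either a plain query or a listwise-prompt pseudo-query), not the model itself. After L2 normalization, cosine similarity reduces to a plain dot product:

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Divide each (row) vector by its Euclidean norm
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def unified_rank(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    # Cosine similarity == dot product after L2 normalization; the same
    # function scores a plain query (retrieval) or a listwise-prompt
    # pseudo-query (reranking)
    scores = l2_normalize(doc_embs) @ l2_normalize(query_emb)
    return np.argsort(-scores)  # document indices, best first


rng = np.random.default_rng(0)
q = rng.normal(size=8)           # stand-in for a query embedding
docs = rng.normal(size=(4, 8))   # stand-ins for document embeddings
order = unified_rank(q, docs)
print(order)
```

Retrieval and reranking differ only in what text is embedded on the query side; the scoring function never changes.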

Empirically, E2Rank achieves state-of-the-art results on the BEIR reranking benchmark and demonstrates competitive performance on the reasoning-intensive BRIGHT benchmark, with very low reranking latency. We also show that the ranking training process improves embedding performance on the MTEB benchmark. Our findings indicate that a single embedding model can effectively unify retrieval and reranking, offering both computational efficiency and competitive ranking accuracy.

**Our work highlights the potential of single embedding models to serve as unified retrieval-reranking engines, offering a practical, efficient, and accurate alternative to complex multi-stage ranking systems.**

<div align="center">
<img src="https://github.com/Alibaba-NLP/E2Rank/raw/main/assets/cover.png" width="90%" height="auto" alt="Overview of E2Rank, average reranking performance on the BEIR benchmark, and reranking latency on the Covid dataset.">
<p style="width: 70%; margin-left: auto; margin-right: auto">
<b>(a)</b> Overview of E2Rank. <b>(b)</b> Average reranking performance on the BEIR benchmark, where E2Rank outperforms other baselines. <b>(c)</b> Reranking latency per query on the Covid dataset, where E2Rank achieves a several-fold speedup over RankQwen3.
</p>
</div>

## Usage

### Embedding Model

The usage of E2Rank as an embedding model is similar to [Qwen3-Embedding](https://github.com/QwenLM/Qwen3-Embedding). The only difference is that Qwen3-Embedding automatically appends an EOS token, while E2Rank requires users to manually append the special token `<|endoftext|>` to the end of each input text.

<details>
<summary><b>Transformers Usage</b></summary>

```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'


# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
input_texts = [t + "<|endoftext|>" for t in input_texts]

tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/E2Rank-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Alibaba-NLP/E2Rank-0.6B')

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)

print(scores.tolist())
# [[0.5950675010681152, 0.030417663976550102], [0.061970409005880356, 0.562691330909729]]
```
</details>
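
The `last_token_pool` helper above selects each sequence's final real token: with left padding that is simply the last position, otherwise it is indexed through the attention mask. A self-contained toy check (the helper is repeated verbatim, with dummy hidden states in place of model output) illustrates the right-padded path:

```python
import torch
from torch import Tensor


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # If every sequence's last position is a real token, the batch is left-padded
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


# Toy hidden states: batch of 2, sequence length 3, hidden size 1
hidden = torch.arange(6, dtype=torch.float32).reshape(2, 3, 1)
# Right padding: first sequence has 2 real tokens, second has 3
mask_right = torch.tensor([[1, 1, 0], [1, 1, 1]])
pooled = last_token_pool(hidden, mask_right)
print(pooled.squeeze(-1).tolist())  # [1.0, 5.0] — last real token of each sequence
```

This is why the tokenizer is loaded with `padding_side='left'`: left padding lets the fast `[:, -1]` branch apply to the whole batch.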

### Reranking

To use E2Rank as a reranker, the only extra step is on the query side: pack (part of) the documents to be reranked into the *listwise prompt*, which replaces the plain query. Everything else is identical to using the embedding model.

<details>
<summary><b>Transformers Usage</b></summary>

```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/E2Rank-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Alibaba-NLP/E2Rank-0.6B')


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_listwise_prompt(task_description: str, query: str, documents: list[str], num_input_docs: int = 20) -> str:
    input_docs = documents[:num_input_docs]
    input_docs = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(input_docs, start=1)])
    messages = [{
        "role": "user",
        "content": f'{task_description}\nDocuments:\n{input_docs}\nSearch Query:{query}'
    }]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )
    return text


task = 'Given a web search query and some relevant documents, rerank the documents that answer the query:'

queries = [
    'What is the capital of China?',
    'Explain gravity'
]

# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
documents = [doc + "<|endoftext|>" for doc in documents]

pseudo_queries = [
    get_listwise_prompt(task, queries[0], documents),
    get_listwise_prompt(task, queries[1], documents)
]  # no need to add the EOS token here

input_texts = pseudo_queries + documents

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)

print(scores.tolist())
# [[0.8513513207435608, 0.24268491566181183], [0.33154672384262085, 0.7923378944396973]]
```
</details>
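
The resulting score matrix gives each query's similarity to every candidate, so the final reranked order is just a descending sort per row. A minimal sketch, using a dummy score matrix in place of the one computed above:

```python
import torch

# Dummy pseudo-query/document similarities standing in for the `scores`
# matrix from the example above: rows are queries, columns are candidates
scores = torch.tensor([[0.85, 0.24],
                       [0.33, 0.79]])

# Rank candidates for each query by descending similarity
rankings = [torch.argsort(row, descending=True).tolist() for row in scores]
print(rankings)  # [[0, 1], [1, 0]]
```

Each inner list holds candidate indices, best first; mapping them back through the original document list yields the reranked documents.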

## Citation

If this work is helpful, please kindly cite it as:

```bibtex
@misc{liu2025e2rank,
      title={E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker},
      author={Qi Liu and Yanzhao Zhang and Mingxin Li and Dingkun Long and Pengjun Xie and Jiaxin Mao},
      year={2025},
      eprint={2510.22733},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.22733},
}
```