Add comprehensive model card for E2Rank

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +139 -0
README.md ADDED
@@ -0,0 +1,139 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
---

# E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

We introduce $\textrm{E}^2\text{Rank}$, meaning **E**fficient **E**mbedding-based **Rank**ing (also meaning **Embedding-to-Rank**), which extends a single text embedding model to perform both high-quality retrieval and listwise reranking, thereby achieving strong effectiveness with remarkable efficiency.

This model is presented in the paper [$\text{E}^2\text{Rank}$: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker](https://huggingface.co/papers/2510.22733).

**Project Page**: https://alibaba-nlp.github.io/E2Rank/
**Code**: https://github.com/Alibaba-NLP/E2Rank

<div align="center">
<img src="https://github.com/Alibaba-NLP/E2Rank/raw/main/assets/cover.png" width="90%" height="auto" />
<p style="width: 70%; margin-left: auto; margin-right: auto">
<b>(a)</b> Overview of E2Rank. <b>(b)</b> Average reranking performance on the BEIR benchmark, where E2Rank outperforms other baselines. <b>(c)</b> Reranking latency per query on the Covid dataset, where E2Rank achieves a several-fold speedup over RankQwen3.
</p>
</div>

## Introduction

We introduce $\textrm{E}^2\text{Rank}$, meaning **E**fficient **E**mbedding-based **Rank**ing (also meaning **Embedding-to-Rank**), which extends a single text embedding model to perform both high-quality retrieval and listwise reranking, thereby achieving strong effectiveness with remarkable efficiency.

By applying cosine similarity between the query and document embeddings as a unified ranking function, the listwise ranking prompt, which is constructed from the original query and its candidate documents, serves as an enhanced query enriched with signals from the top-K documents, akin to pseudo-relevance feedback (PRF) in traditional retrieval models. This design preserves the efficiency and representational quality of the base embedding model while significantly improving its reranking performance.
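
As a toy illustration of this unified ranking function (the three-dimensional vectors below are made up, standing in for real embedding-model outputs), reranking reduces to sorting candidates by cosine similarity against the embedding of the enhanced query:

```python
from math import sqrt

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy vectors standing in for model embeddings: one for the listwise ranking
# prompt (the "enhanced query") and one per candidate document.
prompt_emb = [0.9, 0.1, 0.2]
doc_embs = {"doc_a": [0.8, 0.2, 0.1], "doc_b": [0.1, 0.9, 0.3]}

# The reranked order is simply the candidates sorted by similarity to the prompt.
scores = {name: cosine(prompt_emb, emb) for name, emb in doc_embs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

Because scoring is a single embedding pass plus dot products, the reranking cost stays close to that of the base embedding model.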

Empirically, E2Rank achieves state-of-the-art results on the BEIR reranking benchmark and demonstrates competitive performance on the reasoning-intensive BRIGHT benchmark, with very low reranking latency. We also show that the ranking training process improves embedding performance on the MTEB benchmark. Our findings indicate that a single embedding model can effectively unify retrieval and reranking, offering both computational efficiency and competitive ranking accuracy.

Our work highlights the potential of single embedding models to serve as unified retrieval-reranking engines, offering a practical, efficient, and accurate alternative to complex multi-stage ranking systems.

## Abstract

Text embedding models serve as a fundamental component in real-world search applications. By mapping queries and documents into a shared embedding space, they deliver competitive retrieval performance with high efficiency. However, their ranking fidelity remains limited compared to dedicated rerankers, especially recent LLM-based listwise rerankers, which capture fine-grained query-document and document-document interactions. In this paper, we propose a simple yet effective unified framework, $\text{E}^2\text{Rank}$, meaning Efficient Embedding-based Ranking (also meaning Embedding-to-Rank), which extends a single text embedding model to perform both high-quality retrieval and listwise reranking through continued training under a listwise ranking objective, thereby achieving strong effectiveness with remarkable efficiency. By applying cosine similarity between the query and document embeddings as a unified ranking function, the listwise ranking prompt, which is constructed from the original query and its candidate documents, serves as an enhanced query enriched with signals from the top-K documents, akin to pseudo-relevance feedback (PRF) in traditional retrieval models. This design preserves the efficiency and representational quality of the base embedding model while significantly improving its reranking performance. Empirically, $\textrm{E}^2\text{Rank}$ achieves state-of-the-art results on the BEIR reranking benchmark and demonstrates competitive performance on the reasoning-intensive BRIGHT benchmark, with very low reranking latency. We also show that the ranking training process improves embedding performance on the MTEB benchmark. Our findings indicate that a single embedding model can effectively unify retrieval and reranking, offering both computational efficiency and competitive ranking accuracy.

## Usage

### Embedding Model

The usage of E2Rank as an embedding model is similar to [Qwen3-Embedding](https://github.com/QwenLM/Qwen3-Embedding). The only difference is that Qwen3-Embedding automatically appends an EOS token, while E2Rank requires users to manually append the special token `<|endoftext|>` at the end of each input text.

The following code demonstrates how to use `Alibaba-NLP/E2Rank-0.6B` (or other E2Rank models) with the Hugging Face `transformers` library to obtain embeddings.

```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'


# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
# E2Rank requires the EOS token to be appended manually
input_texts = [t + "<|endoftext|>" for t in input_texts]

tokenizer = AutoTokenizer.from_pretrained('Alibaba-NLP/E2Rank-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Alibaba-NLP/E2Rank-0.6B')

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# Normalize embeddings, then score query-document pairs by cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)

print(scores.tolist())
# [[0.5950675010681152, 0.030417663976550102], [0.061970409005880356, 0.562691330909729]]
```
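
For listwise reranking, the same embedding interface is reused: the query and its top-K candidate documents are packed into a single listwise ranking prompt, which is embedded like any other input (again with `<|endoftext|>` appended manually) and scored against each candidate embedding by cosine similarity. The template below is a hypothetical sketch of that interface only; the official prompt format is defined in the [E2Rank repository](https://github.com/Alibaba-NLP/E2Rank) and should be used in practice.

```python
# Hypothetical listwise-prompt builder -- the exact template used by E2Rank
# is defined in the official repository; this only illustrates the interface.
def build_listwise_prompt(task, query, docs):
    header = f"Instruct: {task}\nQuery:{query}"
    body = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, 1))
    # As with plain embedding inputs, the EOS token is appended manually.
    return f"{header}\n{body}<|endoftext|>"

prompt = build_listwise_prompt(
    "Rank the passages by relevance to the query",
    "What is the capital of China?",
    ["The capital of China is Beijing.", "Gravity is a force that attracts two bodies."],
)
print(prompt.endswith("<|endoftext|>"))  # True
```

The resulting string would be embedded with the same tokenizer/model calls shown above, and candidates reranked by their similarity to this enhanced-query embedding.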

## Citation

If you find this work helpful, please cite:

```bibtex
@misc{liu2025e2rank,
      title={E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker},
      author={Qi Liu and Yanzhao Zhang and Mingxin Li and Dingkun Long and Pengjun Xie and Jiaxin Mao},
      year={2025},
      eprint={2510.22733},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.22733},
}
```