Update README.md
</div>

<a href="https://Alibaba-NLP.github.io/E2Rank/">π€ Website</a> |
<a href="https://arxiv.org/pdf/2510.00000">π Arxiv Paper</a> |
<a href="https://huggingface.co/collections/Alibaba-NLP/e2rank">π€ Huggingface Collection</a> |
<a href="#π©-citation">π© Citation</a>

</div>
Since E2Rank extends a single text embedding model to perform both high-quality retrieval and listwise reranking, you can use it directly to build an end-to-end search system. By reusing the embeddings computed during the retrieval stage, E2Rank only needs to compute the pseudo-query's embedding, so it can rerank the retrieved documents with minimal additional computational overhead.

Example code is coming soon.
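Until the official example lands, the two-stage flow described above can be sketched with plain NumPy. Everything here is illustrative, not E2Rank's actual API: the function names `cosine_sim` and `retrieve_then_rerank` are made up for this sketch, and the pseudo-query embedding is passed in as a precomputed vector because its construction is model-specific. The key point the sketch shows is that the rerank stage reuses the document embeddings from retrieval instead of re-encoding the documents.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between one vector `a` and each row of matrix `b`.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return b @ a

def retrieve_then_rerank(query_emb, pseudo_query_emb, doc_embs, top_k=3):
    # Stage 1 (retrieval): score all documents against the query embedding
    # and keep the indices of the top_k highest-scoring documents.
    retrieval_scores = cosine_sim(query_emb, doc_embs)
    top_idx = np.argsort(-retrieval_scores)[:top_k]
    # Stage 2 (rerank): score only the retrieved documents against the
    # pseudo-query embedding, reusing the stage-1 document embeddings.
    rerank_scores = cosine_sim(pseudo_query_emb, doc_embs[top_idx])
    order = np.argsort(-rerank_scores)
    # Return document indices in the reranked order.
    return top_idx[order].tolist()
```

The only extra encoder call the rerank stage would need in practice is the one that produces `pseudo_query_emb`, which matches the "minimal additional computational overhead" claim above.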
# π Evaluation

## Reranking Benchmark

### BEIR

| Model | Covid | NFCorpus | Touche | DBPedia | SciFact | Signal | News | Robust | Avg. |
|------------------------------------------------------------|:-----:|:--------:|:------:|:-------:|:-------:|:------:|:-----:|:------:|:---------:|
| BM25 | 59.47 | 30.75 | 44.22 | 31.80 | 67.89 | 33.05 | 39.52 | 40.70 | 43.43 |
| *Zero-shot Listwise Reranker* | | | | | | | | | |
| RankGPT-4o | 83.41 | 39.67 | 32.26 | 45.56 | 77.41 | 34.20 | 51.92 | 60.25 | 53.09 |
| RankGPT-4o-mini | 80.03 | 38.73 | 30.91 | 44.54 | 73.14 | 33.64 | 50.91 | 57.41 | 51.16 |
| RankQwen3-14B | 84.45 | 38.94 | 38.30 | 44.52 | 78.64 | 33.58 | 51.24 | 59.66 | 53.67 |
| RankQwen3-32B | 83.48 | 39.22 | 37.13 | 45.00 | 78.22 | 32.12 | 51.08 | 60.74 | 53.37 |
| *Fine-tuned Listwise Reranker based on Qwen3* | | | | | | | | | |
| RankQwen3-0.6B | 78.35 | 36.41 | 37.54 | 39.19 | 71.01 | 30.96 | 44.43 | 46.31 | 48.03 |
| RankQwen3-4B | 83.91 | 39.88 | 32.66 | 43.91 | 76.37 | 32.15 | 50.81 | 59.36 | 52.38 |
| RankQwen3-8B | 85.37 | 40.05 | 31.73 | 45.44 | 78.96 | 32.48 | 52.36 | 60.72 | 53.39 |
| *Ours* | | | | | | | | | |
| **E2Rank-0.6B** | 79.17 | 38.60 | 41.91 | 41.96 | 73.43 | 35.26 | 52.75 | 53.67 | 52.09 |
| **E2Rank-4B** | 83.30 | 39.20 | 43.16 | 42.95 | 77.19 | 34.48 | 52.71 | 60.16 | 54.14 |
| **E2Rank-8B** | 84.09 | 39.08 | 42.06 | 43.44 | 77.49 | 34.01 | 54.25 | 60.34 | **54.35** |

## Embedding Benchmark

### MTEB (Eng, v1)

| Models | Retr. | Rerank. | Clust. | PairClass. | Class. | STS | Summ. | Avg. |
|------------------------------------|:-----:|:-------:|:------:|:----------:|:------:|:-----:|:-----:|:---------:|
| Instructor-xl | 49.26 | 57.29 | 44.74 | 86.62 | 73.12 | 83.06 | 32.32 | 61.79 |
| BGE-large-en-v1.5 | 54.29 | 60.03 | 46.08 | 87.12 | 75.97 | 83.11 | 31.61 | 64.23 |
| GritLM-7B | 53.10 | 61.30 | 48.90 | 86.90 | 77.00 | 82.80 | 29.40 | 64.70 |
| E5-Mistral-7b-v1 | 52.78 | 60.38 | 47.78 | 88.47 | 76.80 | 83.77 | 31.90 | 64.56 |
| Echo-Mistral-7b-v1 | 55.52 | 58.14 | 46.32 | 87.34 | 77.43 | 82.56 | 30.73 | 64.68 |
| LLM2Vec-Mistral-7B | 55.99 | 58.42 | 45.54 | 87.99 | 76.63 | 84.09 | 29.96 | 64.80 |
| LLM2Vec-Meta-LLaMA-3-8B | 56.63 | 59.68 | 46.45 | 87.80 | 75.92 | 83.58 | 30.94 | 65.01 |
| **E2Rank-0.6B** | 51.74 | 55.97 | 40.85 | 83.93 | 73.66 | 81.41 | 30.90 | 61.25 |
| **E2Rank-4B** | 55.33 | 59.10 | 44.27 | 87.14 | 77.08 | 84.03 | 30.06 | 64.47 |
| **E2Rank-8B** | 56.89 | 59.58 | 44.75 | 86.96 | 76.81 | 84.52 | 30.23 | **65.03** |

> Note: For baselines, we compare only with models trained on public datasets.

# π© Citation