---
library_name: transformers
pipeline_tag: text-ranking
license: mit
language:
- en
- zh
base_model:
- jhu-clsp/mmBERT-base
tags:
- reranker
- modernbert
- English
- zh-tw
- zh-cn
---

# AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese

AuroraX is a lightweight yet powerful cross-lingual reranker built upon the [mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) architecture.
It is designed to bridge **Traditional Chinese**, **Simplified Chinese** and **English**, enabling high-quality semantic ranking across languages with remarkable efficiency.

Despite having only 110M non-embedding parameters, AuroraX performs comparably to state-of-the-art rerankers roughly twice its size.
Its design emphasizes both speed and language adaptability, making it ideal for real-world multilingual retrieval and re-ranking applications.

✨ Key Features:
- 🌏 **Cross-Lingual Understanding**: Trained to handle English, Traditional Chinese, and Simplified Chinese seamlessly.
- **Lightweight & Fast**: Only 110M parameters (non-embedding), optimized for latency-sensitive pipelines.
- 🎯 **SOTA-Level Accuracy**: Comparable or superior to larger rerankers on Chinese and English benchmarks.

---

## Evaluation

### Monolingual Benchmarks

| Model | Metric | CMedQAv2-reranking (ZH) | T2Reranking (ZH) | **ZH AVG** | AskUbuntuDupQuestions (EN) | HUMENews21InstructionReranking (EN) | HUMEWikipediaRerankingMultilingual (EN) | SciDocsRR (EN) | **EN AVG** | **Total AVG** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **AuroraX-Reranker-Base-v1.0**<br>*(Ours, 300M with 100M non-embed params)* | **mrr@10** | **0.8201** | **0.8554** | **0.8378** | **0.7936** | **1.0000** | **0.9778** | **0.9305** | **0.9255** | **0.8962** |
|  | **mrr@5** | 0.8145 | 0.8514 | 0.8329 | 0.7841 | 1.0000 | 0.9778 | 0.9289 | 0.9227 | 0.8928 |
| **bge-reranker-v2-m3**<br>*(600M params)* | **mrr@10** | 0.8598 | 0.8004 | 0.8301 | 0.7635 | 0.9839 | 0.8750 | 0.9211 | 0.8859 | 0.8673 |
|  | **mrr@5** | 0.8569 | 0.7954 | 0.8262 | 0.7532 | 0.9839 | 0.8750 | 0.9191 | 0.8828 | 0.8639 |
| **jina-reranker-v2-base-multilingual**<br>*(300M params)* | **mrr@10** | 0.2828 | 0.7577 | 0.5203 | 0.7420 | 1.0000 | 0.8761 | 0.9478 | 0.8915 | 0.7677 |
|  | **mrr@5** | 0.2759 | 0.7512 | 0.5136 | 0.7299 | 1.0000 | 0.8761 | 0.9467 | 0.8882 | 0.7633 |

---
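
As a reading aid for the `mrr@10` and `mrr@5` rows above, here is a minimal sketch of how MRR@k is commonly computed. The function name `mrr_at_k` and the toy relevance lists are illustrative assumptions; the benchmark harness behind the table may differ in details such as tie handling.

```python
def mrr_at_k(ranked_relevance, k=10):
    """Mean reciprocal rank, looking only at the top-k candidates per query.

    ranked_relevance: one list per query, ordered by reranker score
    (best first); each entry is True if that candidate is relevant.
    """
    total = 0.0
    for labels in ranked_relevance:
        for rank, is_relevant in enumerate(labels[:k], start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant candidate counts
    return total / len(ranked_relevance)


# Toy data, made up for illustration (not taken from any benchmark).
toy = [
    [True, False, False],   # first relevant item at rank 1
    [False, False, True],   # first relevant item at rank 3
    [False, False, False],  # no relevant item retrieved
]
print(mrr_at_k(toy, k=10))
print(mrr_at_k(toy, k=2))
```

Because a smaller cutoff can only discard hits, MRR@5 never exceeds MRR@10 for the same ranking, which is consistent with every row pair in the table.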

### Cross-Lingual (ZH ↔ EN) Results

| Model | inhouse-en2zh (HitRate@5) | inhouse-zh2en (HitRate@5) |
| --- | --- | --- |
| **AuroraX-Reranker-Base-v1.0 (Ours, 300M with 100M non-embed params)** | **0.8459** | **0.9427** |
| **bge-reranker-v2-m3 (600M params)** | 0.8179 | 0.9160 |
| **jina-reranker-v2-base-multilingual (300M params)** | 0.7815 | 0.8855 |

---
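
For readers unfamiliar with HitRate@5, the sketch below shows one common definition: the fraction of queries with at least one relevant candidate in the top 5 after reranking. The helper name `hit_rate_at_k` and the toy data are assumptions for illustration; the in-house evaluation set and its exact protocol are not described in this card.

```python
def hit_rate_at_k(ranked_relevance, k=5):
    """Fraction of queries with at least one relevant candidate in the top k.

    ranked_relevance: one list per query, ordered by reranker score
    (best first); each entry is True if that candidate is relevant.
    """
    hits = sum(1 for labels in ranked_relevance if any(labels[:k]))
    return hits / len(ranked_relevance)


# Toy data, made up for illustration (not the in-house benchmark).
toy_rankings = [
    [False, True, False, False, False, False],  # hit inside the top 5
    [False, False, False, False, False, True],  # relevant item at rank 6: miss
    [True, False, False, False, False, False],  # hit at rank 1
    [False, False, False, False, False, False], # nothing relevant: miss
]
print(hit_rate_at_k(toy_rankings, k=5))
```

Unlike MRR, this metric ignores where in the top 5 the relevant candidate lands, so it rewards a reranker only for surfacing the item at all.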
## Usage

### Sentence-Transformers

```py
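from sentence_transformers import CrossEncoder

# Load the reranker as a cross-encoder (weights are downloaded on first use).
model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")

# predict() takes a list of (query, document) pairs and returns one score per pair.
scores = model.predict([("What is Deep Learning?", "Deep learning is a subfield of ML...")])
print(scores)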
```
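
In practice a reranker scores several candidates for one query and sorts them. The sketch below shows that step with a pluggable scoring function, so it runs without downloading the model; `rerank` and `toy_overlap_score` are hypothetical helpers written for this example, and the word-overlap scorer is only a stand-in. Passing `model.predict` from the snippet above as `score_fn` should work the same way, since it accepts a list of (query, document) pairs.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return documents best-first."""
    pairs = [(query, doc) for doc in documents]
    scores = score_fn(pairs)
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked]


def toy_overlap_score(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (NOT the model): counts lowercase words shared by both texts."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


docs = [
    "The weather is nice today.",
    "Deep learning is a subfield of ML...",
]
# Swap toy_overlap_score for model.predict to use the real reranker.
for doc, score in rerank("What is Deep Learning?", docs, toy_overlap_score):
    print(f"{score:.2f}  {doc}")
```

Recent sentence-transformers releases also provide a `rank` method on `CrossEncoder` that performs the scoring and sorting in one call, which may be more convenient when no custom post-processing is needed.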

### Text Embeddings Inference (API)

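1. Launch the router (this assumes `text-embeddings-router` is already installed). The port is set explicitly so that it matches the request in step 2:

```bash
text-embeddings-router --model-id aqweteddy/AuroraX-Reranker-Base-v1.0 --port 8080
```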

2. Run via REST API:

```bash
curl 127.0.0.1:8080/rerank \
  -X POST \
  -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
  -H 'Content-Type: application/json'
```
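
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the router listens on `127.0.0.1:8080` and that `/rerank` answers with a JSON list of objects carrying `index` and `score` fields, so check the response of your deployed version before relying on it.

```python
import json
import urllib.request


def build_rerank_request(query, texts, url="http://127.0.0.1:8080/rerank"):
    """Build the POST request with the same JSON body as the curl example."""
    body = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def order_by_score(texts, results):
    """Map result entries back to their texts, highest score first.

    Assumption: each entry looks like {"index": <int>, "score": <float>}.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(texts[r["index"]], r["score"]) for r in ranked]


def rerank_via_api(query, texts, url="http://127.0.0.1:8080/rerank"):
    """Send the request to a running router and return (text, score) pairs."""
    request = build_rerank_request(query, texts, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.loads(response.read().decode("utf-8"))
    return order_by_score(texts, results)
```

With the router running, `rerank_via_api("What is Deep Learning?", ["Deep Learning is not...", "Deep learning is..."])` would return the candidate texts paired with their scores, best first.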

---

## Citation

```
@misc{aurorax2025,
  title         = {AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese},
  author        = {aqweteddy},
  year          = {2025},
  howpublished  = {\url{https://huggingface.co/aqweteddy/AuroraX-Reranker-Base-v1.0}},
  note          = {Lightweight and powerful reranker for English, Traditional Chinese, and Simplified Chinese}
}
```