---
library_name: transformers
pipeline_tag: text-ranking
license: mit
language:
- en
- zh
base_model:
- jhu-clsp/mmBERT-base
tags:
- reranker
- modernbert
- English
- zh-tw
- zh-cn
---
# AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese
AuroraX is a lightweight cross-lingual reranker built on the [mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) architecture.
It is designed to bridge **Traditional Chinese**, **Simplified Chinese**, and **English**, so queries and passages can be ranked by semantic relevance even when they are written in different languages.
Despite having only 110M non-embedding parameters, AuroraX matches or outperforms rerankers twice its size, such as bge-reranker-v2-m3 (600M parameters), on the benchmarks reported below.
Its design emphasizes both speed and language adaptability, which makes it well suited to real-world multilingual retrieval and reranking pipelines.
✨ Key Features:
- 🌏 **Cross-Lingual Understanding** — Trained to handle English, Traditional Chinese, and Simplified Chinese seamlessly.
- ⚡ **Lightweight & Fast** — Only 110M parameters (non-embedding), optimized for latency-sensitive pipelines.
- 🎯 **SOTA-Level Accuracy** — Comparable or superior to larger rerankers on Chinese and English benchmarks.
---
## Evaluation
### Monolingual Benchmarks
| Model | Metric | CMedQAv2-reranking (ZH) | T2Reranking (ZH) | **ZH AVG** | AskUbuntuDupQuestions (EN) | HUMENews21InstructionReranking (EN) | HUMEWikipediaRerankingMultilingual (EN) | SciDocsRR (EN) | **EN AVG** | **Total AVG** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **AuroraX-Reranker-Base-v1.0**<br>*(Ours, 300M with 110M non-embed params)* | **mrr@10** | **0.8201** | **0.8554** | **0.8378** | **0.7936** | **1.0000** | **0.9778** | **0.9305** | **0.9255** | **0.8962** |
| | **mrr@5** | 0.8145 | 0.8514 | 0.8329 | 0.7841 | 1.0000 | 0.9778 | 0.9289 | 0.9227 | 0.8928 |
| **bge-reranker-v2-m3**<br>*(600M params)* | **mrr@10** | 0.8598 | 0.8004 | 0.8301 | 0.7635 | 0.9839 | 0.8750 | 0.9211 | 0.8859 | 0.8673 |
| | **mrr@5** | 0.8569 | 0.7954 | 0.8262 | 0.7532 | 0.9839 | 0.8750 | 0.9191 | 0.8828 | 0.8639 |
| **jina-reranker-v2-base-multilingual**<br>*(300M params)* | **mrr@10** | 0.2828 | 0.7577 | 0.5203 | 0.7420 | 1.0000 | 0.8761 | 0.9478 | 0.8915 | 0.7677 |
| | **mrr@5** | 0.2759 | 0.7512 | 0.5136 | 0.7299 | 1.0000 | 0.8761 | 0.9467 | 0.8882 | 0.7633 |
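
The table above reports MRR@5 and MRR@10. As a reminder of what these numbers measure, here is a minimal sketch of the standard definition: for each query, take the reciprocal rank of the first relevant passage within the top k (0 if none appears), then average over queries. `mrr_at_k` and the toy document ids are hypothetical illustrations, not the evaluation code that produced the scores above.

```python
from typing import Sequence, Set


def mrr_at_k(rankings: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant item within the top-k, averaged over queries."""
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked_ids, gold in zip(rankings, relevant):
        # Walk the reranked list from rank 1 and stop at the first relevant passage.
        for position, doc_id in enumerate(ranked_ids[:k], start=1):
            if doc_id in gold:
                total += 1.0 / position
                break
    return total / len(rankings)


# Two queries: the first finds its relevant passage at rank 1, the second at rank 2.
rankings = [["d1", "d2", "d3"], ["d7", "d4", "d9"]]
relevant = [{"d1"}, {"d4"}]
print(mrr_at_k(rankings, relevant, k=5))  # 0.75
```

Under this definition MRR@10 can never be lower than MRR@5, and a score of 1.0000 means the top-ranked passage was relevant for every query.
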
---
### Cross-Lingual (ZH ↔ EN) Results
| Model | inhouse-en2zh (HitRate@5) | inhouse-zh2en (HitRate@5) |
| --- | --- | --- |
| **AuroraX-Reranker-Base-v1.0 (Ours, 300M with 110M non-embed params)** | **0.8459** | **0.9427** |
| **bge-reranker-v2-m3 (600M params)** | 0.8179 | 0.9160 |
| **jina-reranker-v2-base-multilingual (300M params)** | 0.7815 | 0.8855 |
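
The cross-lingual results use HitRate@5 on in-house data that is not published here. Assuming the common definition (the fraction of queries with at least one relevant passage among the top k reranked results), the metric can be sketched as follows; `hit_rate_at_k` and the document ids are hypothetical and only illustrate the en2zh and zh2en directions.

```python
from typing import Sequence, Set


def hit_rate_at_k(rankings: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries whose top-k reranked list contains at least one relevant passage."""
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranked_ids, gold in zip(rankings, relevant)
        if any(doc_id in gold for doc_id in ranked_ids[:k])
    )
    return hits / len(rankings)


# Query 1: English query, Chinese candidates (en2zh); the relevant passage is at rank 2.
# Query 2: Chinese query, English candidates (zh2en); the relevant passage was not retrieved.
rankings = [["zh_3", "zh_1", "zh_7"], ["en_2", "en_5", "en_8"]]
relevant = [{"zh_1"}, {"en_9"}]
print(hit_rate_at_k(rankings, relevant, k=5))  # 0.5
```
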
---
## Usage
### Sentence-Transformers
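```py
from sentence_transformers import CrossEncoder

# Download the reranker from the Hugging Face Hub
model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")

# Score (query, passage) pairs; a higher score means the passage is more relevant to the query
score = model.predict([("What is Deep Learning?", "Deep learning is a subfield of ML...")])
print(score)
```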
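
In a retrieval pipeline you usually score one query against many candidate passages and keep the best few. The sketch below assumes only that the model exposes a CrossEncoder-style `predict(pairs)` returning one score per pair; `PairScorer` and `rank_passages` are hypothetical names, and recent sentence-transformers releases also ship a built-in `CrossEncoder.rank` helper that does something similar.

```python
from typing import List, Protocol, Sequence, Tuple


class PairScorer(Protocol):
    """Anything with a CrossEncoder-style predict(pairs) returning one score per pair."""

    def predict(self, sentences: List[Tuple[str, str]]) -> Sequence[float]: ...


def rank_passages(
    model: PairScorer, query: str, passages: Sequence[str], top_k: int = 3
) -> List[Tuple[float, str]]:
    """Score every (query, passage) pair in one batch and return the top_k passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = model.predict(pairs)
    # float() also converts NumPy scalars, which CrossEncoder.predict returns.
    ranked = sorted(zip((float(s) for s in scores), passages), key=lambda item: item[0], reverse=True)
    return ranked[:top_k]
```

With `model` loaded as in the snippet above, `rank_passages(model, "What is Deep Learning?", passages)` returns `(score, passage)` tuples; because the model is cross-lingual, `passages` may mix English and Chinese text.
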
### Text Embeddings Inference (API)
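1. Install [Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference) and launch the router on port 8080:
```bash
text-embeddings-router --model-id aqweteddy/AuroraX-Reranker-Base-v1.0 --port 8080
```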
2. Send a rerank request to the REST API:
```bash
curl 127.0.0.1:8080/rerank \
-X POST \
-d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
```
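
To call the same endpoint from Python, a minimal standard-library client might look like the sketch below. It assumes the router is listening on `127.0.0.1:8080` and that `/rerank` returns a JSON array of `{"index": ..., "score": ...}` objects, as recent TEI releases do; check the response shape for your version. `rerank_via_tei` and `order_texts` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List, Sequence, Tuple


def order_texts(texts: Sequence[str], results: List[Dict]) -> List[Tuple[float, str]]:
    """Map TEI rerank results ({"index", "score"} objects) back to the input texts, best first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(float(r["score"]), texts[r["index"]]) for r in ranked]


def rerank_via_tei(
    query: str, texts: Sequence[str], url: str = "http://127.0.0.1:8080/rerank"
) -> List[Tuple[float, str]]:
    """POST the same payload as the curl example and return (score, text) pairs, best first."""
    payload = json.dumps({"query": query, "texts": list(texts)}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.loads(response.read().decode("utf-8"))
    return order_texts(texts, results)
```

Keeping the parsing in `order_texts` separate from the HTTP call makes it easy to test without a running server.
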
---
## Citation
```bibtex
@misc{aurorax2025,
title = {AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese},
author = {aqweteddy},
year = {2025},
howpublished = {\url{https://huggingface.co/aqweteddy/AuroraX-Reranker-Base-v1.0}},
note = {Lightweight and powerful reranker for English, Traditional Chinese, and Simplified Chinese}
}
```