---
library_name: transformers
pipeline_tag: text-ranking
license: mit
language:
- en
- zh
base_model:
- jhu-clsp/mmBERT-base
tags:
- reranker
- modernbert
- English
- zh-tw
- zh-cn
---

# AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese

AuroraX is a lightweight cross-lingual reranker built on [mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base).
It targets **Traditional Chinese**, **Simplified Chinese**, and **English**, and ranks query-passage pairs both within and across these languages.

With only 110M non-embedding parameters, AuroraX performs comparably to state-of-the-art rerankers roughly twice its size.
Its emphasis on speed and language coverage makes it well suited to real-world multilingual retrieval and reranking applications.

✨ Key Features:
- 🌍 **Cross-Lingual Understanding**: Trained to handle English, Traditional Chinese, and Simplified Chinese seamlessly.
- ⚡ **Lightweight & Fast**: Only 110M parameters (non-embedding), optimized for latency-sensitive pipelines.
- 🎯 **SOTA-Level Accuracy**: Comparable or superior to larger rerankers on Chinese and English benchmarks.

---

## Evaluation

### Monolingual Benchmarks

| Model | Metric | CMedQAv2-reranking (ZH) | T2Reranking (ZH) | **ZH AVG** | AskUbuntuDupQuestions (EN) | HUMENews21InstructionReranking (EN) | HUMEWikipediaRerankingMultilingual (EN) | SciDocsRR (EN) | **EN AVG** | **Total AVG** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **AuroraX-Reranker-Base-v1.0**<br>*(Ours, 300M with 110M non-embed params)* | **mrr@10** | **0.8201** | **0.8554** | **0.8378** | **0.7936** | **1.0000** | **0.9778** | **0.9305** | **0.9255** | **0.8962** |
| | **mrr@5** | 0.8145 | 0.8514 | 0.8329 | 0.7841 | 1.0000 | 0.9778 | 0.9289 | 0.9227 | 0.8928 |
| **bge-reranker-v2-m3**<br>*(600M params)* | **mrr@10** | 0.8598 | 0.8004 | 0.8301 | 0.7635 | 0.9839 | 0.8750 | 0.9211 | 0.8859 | 0.8673 |
| | **mrr@5** | 0.8569 | 0.7954 | 0.8262 | 0.7532 | 0.9839 | 0.8750 | 0.9191 | 0.8828 | 0.8639 |
| **jina-reranker-v2-base-multilingual**<br>*(300M params)* | **mrr@10** | 0.2828 | 0.7577 | 0.5203 | 0.7420 | 1.0000 | 0.8761 | 0.9478 | 0.8915 | 0.7677 |
| | **mrr@5** | 0.2759 | 0.7512 | 0.5136 | 0.7299 | 1.0000 | 0.8761 | 0.9467 | 0.8882 | 0.7633 |

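
For readers unfamiliar with the `mrr@k` columns above, here is a minimal sketch of mean reciprocal rank at a cutoff `k`, assuming the standard definition (the reciprocal rank of the first relevant item within the top `k`, averaged over queries). The function name `mrr_at_k` and the toy relevance flags are illustrative assumptions, not the evaluation harness used to produce the table.

```python
from typing import Sequence


def mrr_at_k(ranked_relevance: Sequence[Sequence[bool]], k: int) -> float:
    """Mean reciprocal rank, looking only at the top-k positions per query.

    ranked_relevance[i][j] is True when the j-th ranked candidate for
    query i is relevant (candidates are already sorted by reranker score).
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        for rank, is_relevant in enumerate(flags[:k], start=1):
            if is_relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_relevance)


# Toy data (hypothetical): three queries with ranked relevance flags.
toy = [
    [False, True, False],   # first relevant item at rank 2 -> 1/2
    [True, False],          # first relevant item at rank 1 -> 1
    [False, False, False],  # no relevant item -> 0
]
print(mrr_at_k(toy, k=10))
```

A query with no relevant item in the top `k` contributes zero, which is why `mrr@5` can never exceed `mrr@10` on the same ranking.
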
---

### Cross-Lingual (ZH ↔ EN) Results

| Model | inhouse-en2zh (HitRate@5) | inhouse-zh2en (HitRate@5) |
| --- | --- | --- |
| **AuroraX-Reranker-Base-v1.0 (Ours, 300M with 110M non-embed params)** | **0.8459** | **0.9427** |
| **bge-reranker-v2-m3 (600M params)** | 0.8179 | 0.9160 |
| **jina-reranker-v2-base-multilingual (300M params)** | 0.7815 | 0.8855 |

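
The `HitRate@5` columns above can be read with the following minimal sketch, which assumes the common definition: the fraction of queries for which at least one relevant document appears in the top `k` results. The function name `hit_rate_at_k` and the document IDs are hypothetical; the in-house datasets themselves are not described in this card.

```python
from typing import Sequence, Set


def hit_rate_at_k(
    ranked_ids: Sequence[Sequence[str]],
    relevant_ids: Sequence[Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant ID in the top-k."""
    if not ranked_ids:
        return 0.0
    hits = 0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        # A query counts as a hit if any top-k ID is in its relevant set.
        if any(doc_id in relevant for doc_id in ranking[:k]):
            hits += 1
    return hits / len(ranked_ids)


# Toy data (hypothetical IDs): two queries, one of which has a hit.
rankings = [["d3", "d1", "d2"], ["d5", "d4"]]
gold = [{"d1"}, {"d9"}]
print(hit_rate_at_k(rankings, gold, k=5))
```
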
---

## Usage

### Sentence-Transformers

```py
from sentence_transformers import CrossEncoder

# Load the reranker as a cross-encoder
model = CrossEncoder("aqweteddy/AuroraX-Reranker-Base-v1.0")

# predict() takes (query, passage) pairs and returns one score per pair
scores = model.predict([("What is Deep Learning?", "Deep learning is a subfield of ML...")])
print(scores)
```

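
In practice a reranker scores many candidates for one query and sorts them. The sketch below shows that step with a hypothetical helper, `rerank`, that accepts any `predict`-style callable; `overlap_scorer` is a toy stand-in so the example runs without downloading the model, and it says nothing about AuroraX's actual scores.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    docs: Sequence[str],
    predict: Callable[[List[Pair]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair and return docs sorted by score, best first."""
    pairs = [(query, doc) for doc in docs]
    scores = [float(s) for s in predict(pairs)]
    ranked = sorted(zip(docs, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Toy stand-in for model.predict: counts shared lowercase tokens."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


docs = [
    "Deep learning is a subfield of machine learning.",
    "Taipei is the capital of Taiwan.",
    "What is deep learning used for?",
]
top = rerank("What is Deep Learning?", docs, overlap_scorer, top_k=2)
print(top)
```

To use the real model, pass `model.predict` from the snippet above in place of `overlap_scorer`. Recent Sentence-Transformers releases also offer a built-in `CrossEncoder.rank` method that covers the same use case.
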
### Text-Embedding-Inference (API)

1. Install and launch the router:

```bash
# Bind to port 8080 so the request in step 2 reaches the server
text-embeddings-router --model-id aqweteddy/AuroraX-Reranker-Base-v1.0 --port 8080
```

2. Run via REST API:

```bash
curl 127.0.0.1:8080/rerank \
  -X POST \
  -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
  -H 'Content-Type: application/json'
```

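
The same request can be issued from Python. The sketch below uses only the standard library and assumes the `/rerank` endpoint returns a JSON list of objects with `index` and `score` fields; the helper names are hypothetical, and the sample response is made up for illustration rather than real model output.

```python
import json
import urllib.request
from typing import Dict, List, Sequence, Tuple


def build_rerank_payload(query: str, texts: Sequence[str]) -> bytes:
    """Encode the same JSON body as the curl example."""
    return json.dumps({"query": query, "texts": list(texts)}).encode("utf-8")


def order_texts(texts: Sequence[str], results: List[Dict]) -> List[Tuple[str, float]]:
    """Map {"index", "score"} entries back to their texts, best score first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(texts[r["index"]], r["score"]) for r in ranked]


def rerank_via_http(
    query: str,
    texts: Sequence[str],
    url: str = "http://127.0.0.1:8080/rerank",
) -> List[Tuple[str, float]]:
    """POST to a running router (requires the server from step 1)."""
    request = urllib.request.Request(
        url,
        data=build_rerank_payload(query, texts),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    return order_texts(texts, results)


# Offline illustration with a made-up response (not real model output).
texts = ["Deep Learning is not...", "Deep learning is..."]
sample_response = [{"index": 0, "score": 0.02}, {"index": 1, "score": 0.98}]
print(order_texts(texts, sample_response))
```

With the router running, `rerank_via_http("What is Deep Learning?", texts)` would send the request and return the texts in ranked order.
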
---

## Citation

```
@misc{aurorax2025,
  title = {AuroraX: A Fast Cross-Lingual Reranker Bridging English and Chinese},
  author = {aqweteddy},
  year = {2025},
  howpublished = {\url{https://huggingface.co/aqweteddy/AuroraX-Reranker-Base-v1.0}},
  note = {Lightweight and powerful reranker for English, Traditional Chinese, and Simplified Chinese}
}
```