becleverksh committed on
Commit 6235d35 · verified · 1 Parent(s): 52ba794

docs: add Avg column to encoder comparison

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -138,21 +138,21 @@ Trained for 3 epochs (early-stopped) with a cosine schedule; training loss decre
  | Accuracy@1 | 0.5950 | 0.6480 | +0.053 (+8.9%) |
  | MAP@10 | 0.6013 | 0.6410 | +0.040 (+6.6%) |
 
- **v4 is the released model.** Fine-tuning lifts in-domain retrieval by **~6%** over the base `Qwen3-Embedding-4B`, with the largest gains on top-rank precision (Accuracy@1, Recall@1). It also surpasses the 0.6B sibling (test NDCG@10 0.6695) by **+0.021 (+3.2%)** — a modest scale gain at ~7× the parameters, so the 0.6B remains the better pick for latency-sensitive serving.
+ **v4 is the released model.** Fine-tuning lifts in-domain retrieval by **roughly +7%** over the base `Qwen3-Embedding-4B`, with the largest gains on top-rank precision (Accuracy@1, Recall@1). It also surpasses the 0.6B sibling (test NDCG@10 0.6695) by **+0.021 (+3.2%)** — a modest scale gain at ~7× the parameters, so the 0.6B remains the better pick for latency-sensitive serving.
 
  ### Comparison with other encoders
  On the *same* in-domain test set, untuned encoders — our own `Qwen3-Embedding` base (0.6B / 4B) and public multilingual SOTA models (each run with its own native prompt format) — all fall **well below this model**: domain fine-tuning beats general-purpose scale:
 
- | Model | Params | NDCG@10 | MRR@10 | Recall@10 | Accuracy@1 | MAP@10 |
- |-------|:---:|:---:|:---:|:---:|:---:|:---:|
- | LiquidAI/LFM2.5-Embedding-350M | 0.35B | 0.5983 | 0.6166 | 0.6799 | 0.5320 | 0.5519 |
- | Qwen3-Embedding-0.6B (base) | 0.6B | 0.6186 | 0.6449 | 0.7046 | 0.5560 | 0.5652 |
- | google/embeddinggemma-300m | 0.3B | 0.6373 | 0.6664 | 0.7082 | 0.5790 | 0.5906 |
- | BAAI/bge-m3 | 0.6B | 0.6426 | 0.6660 | 0.7261 | 0.5730 | 0.5913 |
- | intfloat/multilingual-e5-large | 0.6B | 0.6476 | 0.6722 | 0.7313 | 0.5790 | 0.5958 |
- | Qwen3-Embedding-4B (base) | 4B | 0.6508 | 0.6805 | 0.7244 | 0.5950 | 0.6013 |
- | MoAI-Embedding-0.6B (sibling) | 0.6B | 0.6695 | 0.7060 | 0.7508 | 0.6190 | 0.6171 |
- | **MoAI-Embedding-4B (this model)** | 4B | **0.6906** | **0.7283** | **0.7620** | **0.6480** | **0.6410** |
+ | Model | Params | NDCG@10 | MRR@10 | Recall@10 | Accuracy@1 | MAP@10 | Avg |
+ |-------|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | LiquidAI/LFM2.5-Embedding-350M | 0.35B | 0.5983 | 0.6166 | 0.6799 | 0.5320 | 0.5519 | 0.5957 |
+ | Qwen3-Embedding-0.6B (base) | 0.6B | 0.6186 | 0.6449 | 0.7046 | 0.5560 | 0.5652 | 0.6179 |
+ | google/embeddinggemma-300m | 0.3B | 0.6373 | 0.6664 | 0.7082 | 0.5790 | 0.5906 | 0.6363 |
+ | BAAI/bge-m3 | 0.6B | 0.6426 | 0.6660 | 0.7261 | 0.5730 | 0.5913 | 0.6398 |
+ | intfloat/multilingual-e5-large | 0.6B | 0.6476 | 0.6722 | 0.7313 | 0.5790 | 0.5958 | 0.6452 |
+ | Qwen3-Embedding-4B (base) | 4B | 0.6508 | 0.6805 | 0.7244 | 0.5950 | 0.6013 | 0.6504 |
+ | MoAI-Embedding-0.6B (sibling) | 0.6B | 0.6695 | 0.7060 | 0.7508 | 0.6190 | 0.6171 | 0.6725 |
+ | **MoAI-Embedding-4B (this model)** | 4B | **0.6906** | **0.7283** | **0.7620** | **0.6480** | **0.6410** | **0.6940** |
 
  This model improves over its own `Qwen3-Embedding-4B` base by **+0.040 NDCG@10 (+6.1%)** and leads the best general-purpose baseline (e5-large) by **+0.043 NDCG@10**. Notably, the untuned **4B base (`0.6508`) trails the fine-tuned 0.6B sibling (`0.6695`)** — fine-tuning outweighs scale. _Caveat: these baselines are not tuned on BC Card data — the comparison illustrates the value of domain adaptation, not a defect in the baselines._
 
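Reviewer note: the commit does not say how the new `Avg` column is computed. The published values are consistent with an unweighted arithmetic mean of the five metric columns, rounded to four decimals, and the sketch below checks that reading for three rows of the table. The names `ROWS`, `avg_column`, and `relative_lift` are hypothetical helpers introduced here for illustration; they do not come from the repository.

```python
from statistics import fmean

# Column order assumed to match the README table.
METRICS = ("NDCG@10", "MRR@10", "Recall@10", "Accuracy@1", "MAP@10")

# Scores copied from three rows of the table in this diff.
ROWS = {
    "Qwen3-Embedding-4B (base)": (0.6508, 0.6805, 0.7244, 0.5950, 0.6013),
    "MoAI-Embedding-0.6B (sibling)": (0.6695, 0.7060, 0.7508, 0.6190, 0.6171),
    "MoAI-Embedding-4B (this model)": (0.6906, 0.7283, 0.7620, 0.6480, 0.6410),
}


def avg_column(scores):
    """Unweighted mean of the five metrics, rounded to 4 decimals (assumed definition)."""
    if len(scores) != len(METRICS):
        raise ValueError(f"expected {len(METRICS)} scores, got {len(scores)}")
    return round(fmean(scores), 4)


def relative_lift(new, old):
    """Relative improvement of `new` over `old`, as a fraction (0.067 means +6.7%)."""
    return (new - old) / old


AVG = {name: avg_column(scores) for name, scores in ROWS.items()}

LIFT_OVER_BASE = relative_lift(
    AVG["MoAI-Embedding-4B (this model)"],
    AVG["Qwen3-Embedding-4B (base)"],
)

if __name__ == "__main__":
    for name, value in AVG.items():
        print(f"{name}: {value:.4f}")
    print(f"Avg lift over 4B base: {LIFT_OVER_BASE:+.1%}")
```

Under this assumption the three computed means match the `Avg` cells in the diff (0.6504, 0.6725, 0.6940), and the lift of the fine-tuned 4B model over its base on `Avg` comes out at about +6.7%, which agrees with the reworded "roughly +7%" claim in the same commit.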