becleverksh committed on
Commit 646b3bf · verified · 1 parent: bd15a2c

docs: add external-baseline comparison (bge-m3 / e5-large / embeddinggemma)

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -140,6 +140,19 @@ Trained for 3 epochs (early-stopped) with a cosine schedule; training loss decre
 
  **v4 is the released model.** Fine-tuning lifts in-domain retrieval by **~6%** over the base `Qwen3-Embedding-4B`, with the largest gains on top-rank precision (Accuracy@1, Recall@1). It also surpasses the 0.6B sibling (test NDCG@10 0.6695) by **+0.021 (+3.2%)** — a modest scale gain at ~7× the parameters, so the 0.6B remains the better pick for latency-sensitive serving.
 
+ ### Comparison with public encoders
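+ On the *same* in-domain test set, strong public multilingual encoders (each run with its own native prompt format) all score below both fine-tuned models. Even the 0.6B sibling, at a parameter count comparable to bge-m3 and multilingual-e5-large, leads every baseline on all four metrics. This suggests that domain fine-tuning, not general-purpose scale, is the main source of the gain: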
+
+ | Model | Params | NDCG@10 | MRR@10 | Recall@10 | Accuracy@1 |
+ |-------|:---:|:---:|:---:|:---:|:---:|
+ | google/embeddinggemma-300m | 0.3B | 0.6373 | 0.6664 | 0.7082 | 0.5790 |
+ | BAAI/bge-m3 | 0.6B | 0.6426 | 0.6660 | 0.7261 | 0.5730 |
+ | intfloat/multilingual-e5-large | 0.6B | 0.6476 | 0.6722 | 0.7313 | 0.5790 |
+ | MoAI-Embedding-0.6B (sibling) | 0.6B | 0.6695 | 0.7060 | 0.7508 | 0.6190 |
+ | **MoAI-Embedding-4B (this model)** | 4B | **0.6906** | **0.7283** | **0.7620** | **0.6480** |
+
+ This model leads the best general-purpose baseline (multilingual-e5-large) by **+0.043 NDCG@10** (+6.6% relative). _Caveat: these baselines were not tuned on BC Card data, so the comparison illustrates the value of domain adaptation rather than a defect in the baselines._
+
  <br>
 
  ## 2.4. Limitations
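The comparison added in this commit states that each public baseline was run "with its own native prompt format" but does not list those formats. The following is a minimal sketch of what that could look like. The prefixes are our reading of each public model card and are assumptions that should be checked against the cards before reuse: `query: ` / `passage: ` for multilingual-e5-large, no instruction for bge-m3 dense retrieval, and the `task: search result | query: ` / `title: none | text: ` pair for embeddinggemma. `PromptFormat`, `NATIVE_PROMPTS` and `format_inputs` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptFormat:
    """Prefixes prepended to queries and documents before encoding."""
    query_prefix: str
    doc_prefix: str


# Assumed native formats (hypothetical table, based on our reading of the model cards).
NATIVE_PROMPTS = {
    "intfloat/multilingual-e5-large": PromptFormat("query: ", "passage: "),
    "BAAI/bge-m3": PromptFormat("", ""),  # dense retrieval used without an instruction
    "google/embeddinggemma-300m": PromptFormat(
        "task: search result | query: ", "title: none | text: "
    ),
}


def format_inputs(
    model: str, queries: list[str], docs: list[str]
) -> tuple[list[str], list[str]]:
    """Apply the model's native query/document prefixes to raw texts."""
    fmt = NATIVE_PROMPTS[model]
    return (
        [fmt.query_prefix + q for q in queries],
        [fmt.doc_prefix + d for d in docs],
    )


if __name__ == "__main__":
    q, d = format_inputs("intfloat/multilingual-e5-large", ["card limit"], ["Your limit is ..."])
    print(q, d)
```

With the prompt handling isolated like this, one evaluation loop can serve every baseline, and only the prefix table changes per model.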
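The comparison table reports NDCG@10, MRR@10, Recall@10 and Accuracy@1 but does not define them or name the evaluator. The following sketch uses the textbook binary-relevance definitions, computed per query and then averaged over queries. It is an assumption that the reported numbers follow these exact definitions, and every function name here is hypothetical.

```python
import math
from typing import Callable, Sequence, Set

Run = tuple[Sequence[str], Set[str]]  # (ranked doc ids, relevant doc ids) for one query


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k: DCG of the top-k list divided by the ideal DCG."""
    # rank is 0-based, so the first position is discounted by log2(2) = 1
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant hit within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k]):
        if doc_id in relevant:
            return 1.0 / (rank + 1)
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def accuracy_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 1) -> float:
    """1.0 if any relevant document is in the top k, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked[:k]) else 0.0


def mean_metric(fn: Callable[..., float], runs: Sequence[Run], **kwargs) -> float:
    """Average a per-query metric over all queries."""
    scores = [fn(ranked, relevant, **kwargs) for ranked, relevant in runs]
    return sum(scores) / len(scores)
```

Consider a single query whose only relevant document is ranked second. For that query, MRR@10 is 0.5, Recall@10 is 1.0, Accuracy@1 is 0.0, and NDCG@10 is 1/log2(3) ≈ 0.631. This shows why Accuracy@1 reacts most strongly to gains in top-rank precision.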
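The deltas quoted in the README can be rederived from the NDCG@10 column of the comparison table. These are +0.021 (+3.2%) over the 0.6B sibling, +0.043 over multilingual-e5-large, and roughly 7× the parameters. The snippet below copies the table values. The dictionary and helper names are illustrative only.

```python
# NDCG@10 values copied from the comparison table.
ndcg10 = {
    "google/embeddinggemma-300m": 0.6373,
    "BAAI/bge-m3": 0.6426,
    "intfloat/multilingual-e5-large": 0.6476,
    "MoAI-Embedding-0.6B": 0.6695,
    "MoAI-Embedding-4B": 0.6906,
}
BASELINES = ["google/embeddinggemma-300m", "BAAI/bge-m3", "intfloat/multilingual-e5-large"]


def gain(model: str, reference: str) -> tuple[float, float]:
    """Absolute and relative NDCG@10 gain of `model` over `reference`."""
    delta = ndcg10[model] - ndcg10[reference]
    return delta, delta / ndcg10[reference]


abs_vs_sibling, rel_vs_sibling = gain("MoAI-Embedding-4B", "MoAI-Embedding-0.6B")
abs_vs_e5, rel_vs_e5 = gain("MoAI-Embedding-4B", "intfloat/multilingual-e5-large")
param_ratio = 4.0 / 0.6  # nominal sizes from the Params column

print(f"vs 0.6B sibling: {abs_vs_sibling:+.3f} ({rel_vs_sibling:+.1%})")  # +0.021 (+3.2%)
print(f"vs e5-large:     {abs_vs_e5:+.3f} ({rel_vs_e5:+.1%})")  # +0.043 (+6.6%)
print(f"parameter ratio: {param_ratio:.1f}x")  # 6.7x
# The 0.6B sibling already beats every public baseline at a similar size.
print(all(ndcg10["MoAI-Embedding-0.6B"] > ndcg10[b] for b in BASELINES))  # True
```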