dragonkue committed on
Commit
d21e609
·
verified ·
1 Parent(s): d8634f9

Update README.md

Files changed (1)
  1. README.md +25 -2
README.md CHANGED
@@ -60,6 +60,17 @@ SentenceTransformer(
60
 
61
  ## Usage
62
 
63
  ### Direct Usage (Sentence Transformers)
64
 
65
  First install the Sentence Transformers library:
@@ -162,7 +173,6 @@ You can finetune this model on your own dataset.
162
  * Standard metric : NDCG@10
163
 
164
  #### Information Retrieval
165
-
166
  | Model | Size(M) | Average | XPQARetrieval | PublicHealthQA | MIRACLRetrieval | Ko-StrategyQA | BelebeleRetrieval | AutoRAGRetrieval | MrTidyRetrieval |
167
  |:------------------------------------------------------------|----------:|----------:|----------------:|-----------------:|------------------:|----------------:|--------------------:|-------------------:|------------------:|
168
  | BAAI/bge-m3 | 560 | 0.724169 | 0.36075 | 0.80412 | 0.70146 | 0.79405 | 0.93164 | 0.83008 | 0.64708 |
@@ -170,13 +180,14 @@ You can finetune this model on your own dataset.
170
  | intfloat/multilingual-e5-large | 560 | 0.721607 | 0.3571 | 0.82534 | 0.66486 | 0.80348 | 0.94499 | 0.81337 | 0.64211 |
171
  | intfloat/multilingual-e5-base | 278 | 0.689429 | 0.3607 | 0.77203 | 0.6227 | 0.76355 | 0.92868 | 0.79752 | 0.58082 |
172
  | **dragonkue/multilingual-e5-small-ko** | 118 | 0.688819 | 0.34871 | 0.79729 | 0.61113 | 0.76173 | 0.9297 | 0.86184 | 0.51133 |
173
  | intfloat/multilingual-e5-small | 118 | 0.670906 | 0.33003 | 0.73668 | 0.61238 | 0.75157 | 0.90531 | 0.80068 | 0.55969 |
174
  | ibm-granite/granite-embedding-278m-multilingual | 278 | 0.616466 | 0.23058 | 0.77668 | 0.59216 | 0.71762 | 0.83231 | 0.70226 | 0.46365 |
175
  | ibm-granite/granite-embedding-107m-multilingual | 107 | 0.599759 | 0.23058 | 0.73209 | 0.58413 | 0.70531 | 0.82063 | 0.68243 | 0.44314 |
176
  | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 118 | 0.409766 | 0.21345 | 0.67409 | 0.25676 | 0.45903 | 0.71491 | 0.42296 | 0.12716 |
177
 
178
  #### Performance Comparison by Model Size (Based on Average NDCG@10)
179
- <img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/ZgOwD9nlgVchYBqK4iXTW.png" width="1000"/>
180
 
181
  <!--
182
  ### Recommendations
@@ -417,10 +428,22 @@ For text embedding tasks like text retrieval or semantic similarity, what matter
417
  }
418
  ```
419
 
420
  ## Limitations
421
 
422
  Long texts will be truncated to at most 512 tokens.
423
 
424
  <!--
425
  ## Glossary
426
 
 
60
 
61
  ## Usage
62
 
63
+ **🪶 Lightweight Version Available**
64
+
65
+ We also introduce a lightweight variant of this model:
66
+ [`exp-models/dragonkue-KoEn-E5-Tiny`](https://huggingface.co/exp-models/dragonkue-KoEn-E5-Tiny),
67
+ which prunes the vocabulary to **Korean and English tokens only**, reducing the model from 118M to 37M parameters while keeping average NDCG@10 nearly unchanged (0.6875 vs. 0.6888).
68
+
69
+ The repository also includes a **GGUF-quantized version**, making it suitable for efficient local or on-device embedding model serving.
70
+
71
+ > 🔧 For practical deployment, we highly recommend using this **lightweight retriever** in combination with a **reranker** model — it forms a powerful and resource-efficient retrieval setup.
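
The retriever-plus-reranker setup recommended here can be sketched as a two-stage pipeline. This is a minimal illustration under stated assumptions, not code from the model card: `embed` and `rerank_score` are hypothetical callables standing in for the embedding model and whichever reranker you choose, and `top_k` / `final_k` are illustrative parameters.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    embed: Callable[[str], Vector],             # hypothetical: text -> embedding
    rerank_score: Callable[[str, str], float],  # hypothetical: (query, doc) -> score
    top_k: int = 10,
    final_k: int = 3,
) -> List[str]:
    """Stage 1: cheap embedding search. Stage 2: rerank only the top_k hits."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [doc for _, doc in scored[:top_k]]
    # The reranker is usually far more expensive per pair, so it only sees
    # the short candidate list produced by the retriever.
    candidates.sort(key=lambda doc: rerank_score(query, doc), reverse=True)
    return candidates[:final_k]
```

In practice `embed` would wrap the embedding model's encode call and document embeddings would be precomputed and indexed rather than recomputed per query; the sketch omits that for brevity.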
72
+
73
+
74
  ### Direct Usage (Sentence Transformers)
75
 
76
  First install the Sentence Transformers library:
 
173
  * Standard metric : NDCG@10
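
For readers unfamiliar with the metric, NDCG@10 can be sketched as follows. This is an illustrative implementation using linear gain and a log2 rank discount; it is an assumption that this matches the exact variant used by the benchmark tooling behind the table, so treat it as an explanation of the idea rather than a reproduction of the scores.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k results (rank 1 has discount log2(2) = 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    k: int = 10,
    all_relevances: Optional[Sequence[float]] = None,
) -> float:
    """DCG of the system ranking divided by the DCG of an ideal ranking.

    ranked_relevances: relevance labels in the order the system returned them.
    all_relevances: labels of every judged document for the query; if omitted,
    the ideal ranking is built from the returned documents only.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfect ranking scores 1.0, and a single relevant document returned at rank 2 instead of rank 1 scores `1 / log2(3)`, roughly 0.63.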
174
 
175
  #### Information Retrieval
 
176
  | Model | Size(M) | Average | XPQARetrieval | PublicHealthQA | MIRACLRetrieval | Ko-StrategyQA | BelebeleRetrieval | AutoRAGRetrieval | MrTidyRetrieval |
177
  |:------------------------------------------------------------|----------:|----------:|----------------:|-----------------:|------------------:|----------------:|--------------------:|-------------------:|------------------:|
178
  | BAAI/bge-m3 | 560 | 0.724169 | 0.36075 | 0.80412 | 0.70146 | 0.79405 | 0.93164 | 0.83008 | 0.64708 |
 
180
  | intfloat/multilingual-e5-large | 560 | 0.721607 | 0.3571 | 0.82534 | 0.66486 | 0.80348 | 0.94499 | 0.81337 | 0.64211 |
181
  | intfloat/multilingual-e5-base | 278 | 0.689429 | 0.3607 | 0.77203 | 0.6227 | 0.76355 | 0.92868 | 0.79752 | 0.58082 |
182
  | **dragonkue/multilingual-e5-small-ko** | 118 | 0.688819 | 0.34871 | 0.79729 | 0.61113 | 0.76173 | 0.9297 | 0.86184 | 0.51133 |
183
+ | **exp-models/dragonkue-KoEn-E5-Tiny** | 37 | 0.687496 | 0.34735 | 0.7925 | 0.6143 | 0.75978 | 0.93018 | 0.86503 | 0.50333 |
184
  | intfloat/multilingual-e5-small | 118 | 0.670906 | 0.33003 | 0.73668 | 0.61238 | 0.75157 | 0.90531 | 0.80068 | 0.55969 |
185
  | ibm-granite/granite-embedding-278m-multilingual | 278 | 0.616466 | 0.23058 | 0.77668 | 0.59216 | 0.71762 | 0.83231 | 0.70226 | 0.46365 |
186
  | ibm-granite/granite-embedding-107m-multilingual | 107 | 0.599759 | 0.23058 | 0.73209 | 0.58413 | 0.70531 | 0.82063 | 0.68243 | 0.44314 |
187
  | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 118 | 0.409766 | 0.21345 | 0.67409 | 0.25676 | 0.45903 | 0.71491 | 0.42296 | 0.12716 |
188
 
189
  #### Performance Comparison by Model Size (Based on Average NDCG@10)
190
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/Utunk7FbZsTDEVsOVUms1.png" width="1000"/>
191
 
192
  <!--
193
  ### Recommendations
 
428
  }
429
  ```
430
 
431
+ #### KURE
432
+ ```bibtex
433
+ @misc{KURE,
434
+ publisher = {Youngjoon Jang, Junyoung Son, Taemin Lee},
435
+ year = {2024},
436
+ url = {https://github.com/nlpai-lab/KURE}
437
+ }
438
+ ```
439
+
440
  ## Limitations
441
 
442
  Long texts will be truncated to at most 512 tokens.
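
One common way to work around the 512-token limit is to split long inputs into overlapping windows and embed each window separately. The sketch below operates on a plain list of token ids so that it stays independent of any particular tokenizer; the function name, the default of 510 tokens (an assumption that leaves room for two special tokens), and the overlap of 64 are all illustrative choices, not recommendations from the model authors.

```python
from typing import List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int],
    max_tokens: int = 510,  # assumed budget: 512 minus two special tokens
    overlap: int = 64,      # illustrative overlap between neighbouring chunks
) -> List[List[int]]:
    """Split token ids into windows of at most max_tokens, sharing `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(list(token_ids[start:start + max_tokens]))
        # Stop once a window reaches the end, to avoid a trailing chunk
        # that would contain only already-covered overlap tokens.
        if start + max_tokens >= len(token_ids):
            break
    return chunks
```

Each chunk can then be decoded back to text and embedded on its own; how the per-chunk embeddings are combined (for example, indexing them separately or averaging them) is a design decision this sketch leaves open.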
443
 
444
+ ## Acknowledgements
445
+ Special thanks to lemon-mint for their valuable contribution in optimizing and compressing this model.
446
+
447
  <!--
448
  ## Glossary
449