dragonkue/multilingual-e5-small-ko

@@ -60,6 +60,17 @@ SentenceTransformer(
 ## Usage
 ### Direct Usage (Sentence Transformers)
 First install the Sentence Transformers library:
@@ -162,7 +173,6 @@ You can finetune this model on your own dataset.
 * Standard metric : NDCG@10
 #### Information Retrieval
 | Model                                                       |   Size(M) |   Average |   XPQARetrieval |   PublicHealthQA |   MIRACLRetrieval |   Ko-StrategyQA |   BelebeleRetrieval |   AutoRAGRetrieval |   MrTidyRetrieval |
 |:------------------------------------------------------------|----------:|----------:|----------------:|-----------------:|------------------:|----------------:|--------------------:|-------------------:|------------------:|
 | BAAI/bge-m3                                                 |       560 |  0.724169 |         0.36075 |          0.80412 |           0.70146 |         0.79405 |             0.93164 |            0.83008 |           0.64708 |
@@ -170,13 +180,14 @@ You can finetune this model on your own dataset.
 | intfloat/multilingual-e5-large                              |       560 |  0.721607 |         0.3571  |          0.82534 |           0.66486 |         0.80348 |             0.94499 |            0.81337 |           0.64211 |
 | intfloat/multilingual-e5-base                               |       278 |  0.689429 |         0.3607  |          0.77203 |           0.6227  |         0.76355 |             0.92868 |            0.79752 |           0.58082 |
 | **dragonkue/multilingual-e5-small-ko**                      |       118 |  0.688819 |         0.34871 |          0.79729 |           0.61113 |         0.76173 |             0.9297  |            0.86184 |           0.51133 |
 | intfloat/multilingual-e5-small                              |       118 |  0.670906 |         0.33003 |          0.73668 |           0.61238 |         0.75157 |             0.90531 |            0.80068 |           0.55969 |
 | ibm-granite/granite-embedding-278m-multilingual             |       278 |  0.616466 |         0.23058 |          0.77668 |           0.59216 |         0.71762 |             0.83231 |            0.70226 |           0.46365 |
 | ibm-granite/granite-embedding-107m-multilingual             |       107 |  0.599759 |         0.23058 |          0.73209 |           0.58413 |         0.70531 |             0.82063 |            0.68243 |           0.44314 |
 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |       118 |  0.409766 |         0.21345 |          0.67409 |           0.25676 |         0.45903 |             0.71491 |            0.42296 |           0.12716 |
 #### Performance Comparison by Model Size (Based on Average NDCG@10)
-<img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/ZgOwD9nlgVchYBqK4iXTW.png" width="1000"/>
 <!--
 ### Recommendations
@@ -417,10 +428,22 @@ For text embedding tasks like text retrieval or semantic similarity, what matter
 }
 ```
 ## Limitations
 Long texts will be truncated to at most 512 tokens.
 <!--
 ## Glossary

 ## Usage
+**🪶 Lightweight Version Available**
+We also introduce a lightweight variant of this model:
+[`exp-models/dragonkue-KoEn-E5-Tiny`](https://huggingface.co/exp-models/dragonkue-KoEn-E5-Tiny),
+which removes all tokens **except Korean and English** to reduce model size (37M versus 118M in the table below) while keeping average NDCG@10 nearly unchanged (0.6875 versus 0.6888).
+The repository also includes a **GGUF-quantized version**, making it suitable for efficient local or on-device embedding model serving.
+> 🔧 For practical deployment, we recommend pairing this **lightweight retriever** with a **reranker** model: the retriever narrows the corpus cheaply and the reranker rescores only the top candidates, which keeps the overall setup resource-efficient.
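
The retriever-plus-reranker setup recommended above can be sketched in plain Python. This is a minimal illustration, not the model's API: the document vectors are toy values that would normally come from the embedding model, and `toy_overlap_score` is a hypothetical stand-in for a real reranker.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, top_k):
    """Stage 1: rank every document by embedding similarity, keep top_k ids."""
    scored = sorted(((cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)),
                    reverse=True)
    return [i for _, i in scored[:top_k]]

def rerank(query, docs, candidate_ids, score_fn, top_n):
    """Stage 2: rescore only the retrieved candidates with a costlier scorer."""
    ordered = sorted(candidate_ids, key=lambda i: score_fn(query, docs[i]),
                     reverse=True)
    return ordered[:top_n]

def toy_overlap_score(query, doc):
    """Hypothetical reranker stand-in: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

# Toy corpus and made-up 3-dimensional embeddings (assumptions for illustration).
docs = [
    "서울은 한국의 수도이다",
    "Seoul is the capital of Korea",
    "Kimchi is a fermented dish",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
query, query_vec = "capital of Korea", [1.0, 0.0, 0.0]

candidates = retrieve(query_vec, doc_vecs, top_k=2)   # cheap pass over all docs
final = rerank(query, docs, candidates, toy_overlap_score, top_n=1)
print(candidates, final)
```

The point of the split is cost: the embedding comparison touches every document, while the more expensive scorer only sees the `top_k` survivors.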
 ### Direct Usage (Sentence Transformers)
 First install the Sentence Transformers library:
 * Standard metric : NDCG@10
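
For readers unfamiliar with the metric, NDCG@10 can be sketched as follows. This is a minimal version using linear gain and a log2 rank discount; the helper names are illustrative, and the evaluation tooling behind the scores reported here may differ in details such as tie handling.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents exist for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

# Toy query: two relevant documents, returned at ranks 1 and 3.
ranked = [1, 0, 1, 0, 0]
score = ndcg_at_k(ranked, all_relevances=ranked, k=10)
print(round(score, 4))
```

A ranking that places every relevant document first scores 1.0; pushing relevant documents further down lowers the score logarithmically.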
 #### Information Retrieval
 | Model                                                       |   Size(M) |   Average |   XPQARetrieval |   PublicHealthQA |   MIRACLRetrieval |   Ko-StrategyQA |   BelebeleRetrieval |   AutoRAGRetrieval |   MrTidyRetrieval |
 |:------------------------------------------------------------|----------:|----------:|----------------:|-----------------:|------------------:|----------------:|--------------------:|-------------------:|------------------:|
 | BAAI/bge-m3                                                 |       560 |  0.724169 |         0.36075 |          0.80412 |           0.70146 |         0.79405 |             0.93164 |            0.83008 |           0.64708 |
 | intfloat/multilingual-e5-large                              |       560 |  0.721607 |         0.3571  |          0.82534 |           0.66486 |         0.80348 |             0.94499 |            0.81337 |           0.64211 |
 | intfloat/multilingual-e5-base                               |       278 |  0.689429 |         0.3607  |          0.77203 |           0.6227  |         0.76355 |             0.92868 |            0.79752 |           0.58082 |
 | **dragonkue/multilingual-e5-small-ko**                      |       118 |  0.688819 |         0.34871 |          0.79729 |           0.61113 |         0.76173 |             0.9297  |            0.86184 |           0.51133 |
+| **exp-models/dragonkue-KoEn-E5-Tiny**                       |        37 |  0.687496 |         0.34735 |          0.7925  |           0.6143  |         0.75978 |             0.93018 |            0.86503 |           0.50333 |
 | intfloat/multilingual-e5-small                              |       118 |  0.670906 |         0.33003 |          0.73668 |           0.61238 |         0.75157 |             0.90531 |            0.80068 |           0.55969 |
 | ibm-granite/granite-embedding-278m-multilingual             |       278 |  0.616466 |         0.23058 |          0.77668 |           0.59216 |         0.71762 |             0.83231 |            0.70226 |           0.46365 |
 | ibm-granite/granite-embedding-107m-multilingual             |       107 |  0.599759 |         0.23058 |          0.73209 |           0.58413 |         0.70531 |             0.82063 |            0.68243 |           0.44314 |
 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |       118 |  0.409766 |         0.21345 |          0.67409 |           0.25676 |         0.45903 |             0.71491 |            0.42296 |           0.12716 |
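
As a quick check on the table, the Average column is the unweighted mean of the seven per-task scores. The sketch below recomputes it for the two dragonkue-based models and compares size against quality; the scores and sizes are copied from the rows above, and the variable names are only illustrative.

```python
# Per-task NDCG@10 scores in header order (XPQARetrieval ... MrTidyRetrieval).
scores = {
    "dragonkue/multilingual-e5-small-ko":
        [0.34871, 0.79729, 0.61113, 0.76173, 0.9297, 0.86184, 0.51133],
    "exp-models/dragonkue-KoEn-E5-Tiny":
        [0.34735, 0.7925, 0.6143, 0.75978, 0.93018, 0.86503, 0.50333],
}
# Size(M) column from the same rows.
size_m = {
    "dragonkue/multilingual-e5-small-ko": 118,
    "exp-models/dragonkue-KoEn-E5-Tiny": 37,
}

# Unweighted mean over the seven tasks.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

small = "dragonkue/multilingual-e5-small-ko"
tiny = "exp-models/dragonkue-KoEn-E5-Tiny"
size_ratio = size_m[tiny] / size_m[small]        # fraction of the original size
quality_ratio = averages[tiny] / averages[small]  # fraction of the average score

for name, avg in averages.items():
    print(f"{name}: {avg:.6f}")
print(f"size ratio: {size_ratio:.3f}, quality ratio: {quality_ratio:.4f}")
```

On these numbers the Tiny variant keeps over 99% of the average score at under a third of the listed size.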
 #### Performance Comparison by Model Size (Based on Average NDCG@10)
+<img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/Utunk7FbZsTDEVsOVUms1.png" width="1000"/>
 <!--
 ### Recommendations
 }
 ```
+#### KURE
+```bibtex
+@misc{KURE,
+  publisher = {Youngjoon Jang, Junyoung Son, Taemin Lee},
+  year = {2024},
+  url = {https://github.com/nlpai-lab/KURE}
+}
+```
 ## Limitations
 Long texts will be truncated to at most 512 tokens.
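
One common way to work around the truncation is to split long inputs into overlapping windows and embed each window separately. The sketch below is a minimal illustration under stated assumptions: `split_into_windows` is a hypothetical helper, the `overlap` of 64 is an arbitrary choice, and the token list is a placeholder for the output of the model's real tokenizer, which also adds special tokens that count toward the 512 limit.

```python
def split_into_windows(tokens, max_tokens=512, overlap=64):
    """Split a token list into overlapping windows of at most max_tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the text
    return windows

# Placeholder tokens; in practice use the model tokenizer and leave headroom
# for special tokens by choosing max_tokens slightly below 512.
tokens = [f"tok{i}" for i in range(1200)]
windows = split_into_windows(tokens)
print([len(w) for w in windows])
```

Each window can then be embedded on its own, with the per-window vectors either indexed separately or pooled, depending on the retrieval setup.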
+## Acknowledgements
+Special thanks to lemon-mint for their valuable contribution to optimizing and compressing this model.
 <!--
 ## Glossary