Update README.md
README.md
CHANGED
|
@@ -60,6 +60,17 @@ SentenceTransformer(
|
|
| 60 |
|
| 61 |
## Usage
|
| 62 |
|
| 63 |
### Direct Usage (Sentence Transformers)
|
| 64 |
|
| 65 |
First install the Sentence Transformers library:
|
|
@@ -162,7 +173,6 @@ You can finetune this model on your own dataset.
|
|
| 162 |
* Standard metric: NDCG@10
|
| 163 |
|
| 164 |
#### Information Retrieval
|
| 165 |
-
|
| 166 |
| Model | Size(M) | Average | XPQARetrieval | PublicHealthQA | MIRACLRetrieval | Ko-StrategyQA | BelebeleRetrieval | AutoRAGRetrieval | MrTidyRetrieval |
|
| 167 |
|:------------------------------------------------------------|----------:|----------:|----------------:|-----------------:|------------------:|----------------:|--------------------:|-------------------:|------------------:|
|
| 168 |
| BAAI/bge-m3 | 560 | 0.724169 | 0.36075 | 0.80412 | 0.70146 | 0.79405 | 0.93164 | 0.83008 | 0.64708 |
|
|
@@ -170,13 +180,14 @@ You can finetune this model on your own dataset.
|
|
| 170 |
| intfloat/multilingual-e5-large | 560 | 0.721607 | 0.3571 | 0.82534 | 0.66486 | 0.80348 | 0.94499 | 0.81337 | 0.64211 |
|
| 171 |
| intfloat/multilingual-e5-base | 278 | 0.689429 | 0.3607 | 0.77203 | 0.6227 | 0.76355 | 0.92868 | 0.79752 | 0.58082 |
|
| 172 |
| **dragonkue/multilingual-e5-small-ko** | 118 | 0.688819 | 0.34871 | 0.79729 | 0.61113 | 0.76173 | 0.9297 | 0.86184 | 0.51133 |
|
|
|
|
| 173 |
| intfloat/multilingual-e5-small | 118 | 0.670906 | 0.33003 | 0.73668 | 0.61238 | 0.75157 | 0.90531 | 0.80068 | 0.55969 |
|
| 174 |
| ibm-granite/granite-embedding-278m-multilingual | 278 | 0.616466 | 0.23058 | 0.77668 | 0.59216 | 0.71762 | 0.83231 | 0.70226 | 0.46365 |
|
| 175 |
| ibm-granite/granite-embedding-107m-multilingual | 107 | 0.599759 | 0.23058 | 0.73209 | 0.58413 | 0.70531 | 0.82063 | 0.68243 | 0.44314 |
|
| 176 |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 118 | 0.409766 | 0.21345 | 0.67409 | 0.25676 | 0.45903 | 0.71491 | 0.42296 | 0.12716 |
|
| 177 |
|
| 178 |
#### Performance Comparison by Model Size (Based on Average NDCG@10)
|
| 179 |
-
<img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/
|
| 180 |
|
| 181 |
<!--
|
| 182 |
### Recommendations
|
|
@@ -417,10 +428,22 @@ For text embedding tasks like text retrieval or semantic similarity, what matter
|
|
| 417 |
}
|
| 418 |
```
|
| 419 |
|
| 420 |
## Limitations
|
| 421 |
|
| 422 |
Long texts will be truncated to at most 512 tokens.
|
| 423 |
|
| 424 |
<!--
|
| 425 |
## Glossary
|
| 426 |
|
|
|
|
| 60 |
|
| 61 |
## Usage
|
| 62 |
|
| 63 |
+
**🪶 Lightweight Version Available**
|
| 64 |
+
|
| 65 |
+
We also introduce a lightweight variant of this model:
|
| 66 |
+
[`exp-models/dragonkue-KoEn-E5-Tiny`](https://huggingface.co/exp-models/dragonkue-KoEn-E5-Tiny),
|
| 67 |
+
which removes all tokens **except Korean and English** to reduce model size (from 118M to 37M in the evaluation table) while keeping the average NDCG@10 nearly unchanged (0.6875 vs. 0.6888).
|
| 68 |
+
|
| 69 |
+
The repository also includes a **GGUF-quantized version**, which makes it suitable for serving embeddings efficiently on a local machine or on-device.
|
| 70 |
+
|
| 71 |
+
> 🔧 For practical deployment, we highly recommend pairing this **lightweight retriever** with a **reranker** model: the retriever cheaply narrows the candidate set and the reranker re-scores only those candidates, which gives a powerful and resource-efficient retrieval setup.
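
The retriever-plus-reranker recommendation can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy vectors stand in for embeddings produced by the retriever, and `rerank_score` is a hypothetical callable standing in for a real reranker model.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    rerank_score: Callable[[int], float],  # hypothetical reranker: doc index -> score
    top_k: int = 3,
) -> List[int]:
    """Return document indices after a two-stage retrieve-then-rerank pass."""
    # Stage 1: rank every document by embedding similarity, keep the top_k.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    # Stage 2: re-order only the surviving candidates with the reranker.
    return sorted(candidates, key=rerank_score, reverse=True)


# Toy data: three "documents" in a 2-dimensional embedding space.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_scores = {0: 0.2, 1: 0.9, 2: 1.0}
ranking = retrieve_then_rerank([1.0, 0.0], docs, toy_scores.__getitem__, top_k=2)
print(ranking)
```

The reranker only sees the `top_k` candidates, so its cost is bounded no matter how large the corpus is. That is the main reason the combination stays resource-efficient.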
|
| 72 |
+
|
| 73 |
+
|
| 74 |
### Direct Usage (Sentence Transformers)
|
| 75 |
|
| 76 |
First install the Sentence Transformers library:
|
|
|
|
| 173 |
* Standard metric: NDCG@10
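
For readers unfamiliar with the metric, here is a minimal sketch of NDCG@10 for a single query. It assumes linear gain (the relevance label itself); some implementations use `2**rel - 1` instead, and the benchmark's exact evaluation code is not shown in this README. The function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k ranked relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    judged_relevances: Sequence[float],
    k: int = 10,
) -> float:
    """NDCG@k: DCG of the system ranking divided by the DCG of an ideal ranking.

    ranked_relevances: labels of the returned documents, in ranked order.
    judged_relevances: labels of all judged documents for the query.
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# A relevant document at rank 1, an irrelevant one at rank 2, a relevant one at rank 3.
print(ndcg_at_k([1, 0, 1], [1, 1, 0]))
```

A benchmark score is typically this value averaged over all queries in the dataset.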
|
| 174 |
|
| 175 |
#### Information Retrieval
|
|
|
|
| 176 |
| Model | Size(M) | Average | XPQARetrieval | PublicHealthQA | MIRACLRetrieval | Ko-StrategyQA | BelebeleRetrieval | AutoRAGRetrieval | MrTidyRetrieval |
|
| 177 |
|:------------------------------------------------------------|----------:|----------:|----------------:|-----------------:|------------------:|----------------:|--------------------:|-------------------:|------------------:|
|
| 178 |
| BAAI/bge-m3 | 560 | 0.724169 | 0.36075 | 0.80412 | 0.70146 | 0.79405 | 0.93164 | 0.83008 | 0.64708 |
|
|
|
|
| 180 |
| intfloat/multilingual-e5-large | 560 | 0.721607 | 0.3571 | 0.82534 | 0.66486 | 0.80348 | 0.94499 | 0.81337 | 0.64211 |
|
| 181 |
| intfloat/multilingual-e5-base | 278 | 0.689429 | 0.3607 | 0.77203 | 0.6227 | 0.76355 | 0.92868 | 0.79752 | 0.58082 |
|
| 182 |
| **dragonkue/multilingual-e5-small-ko** | 118 | 0.688819 | 0.34871 | 0.79729 | 0.61113 | 0.76173 | 0.9297 | 0.86184 | 0.51133 |
|
| 183 |
+
| **exp-models/dragonkue-KoEn-E5-Tiny** | 37 | 0.687496 | 0.34735 | 0.7925 | 0.6143 | 0.75978 | 0.93018 | 0.86503 | 0.50333 |
|
| 184 |
| intfloat/multilingual-e5-small | 118 | 0.670906 | 0.33003 | 0.73668 | 0.61238 | 0.75157 | 0.90531 | 0.80068 | 0.55969 |
|
| 185 |
| ibm-granite/granite-embedding-278m-multilingual | 278 | 0.616466 | 0.23058 | 0.77668 | 0.59216 | 0.71762 | 0.83231 | 0.70226 | 0.46365 |
|
| 186 |
| ibm-granite/granite-embedding-107m-multilingual | 107 | 0.599759 | 0.23058 | 0.73209 | 0.58413 | 0.70531 | 0.82063 | 0.68243 | 0.44314 |
|
| 187 |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 118 | 0.409766 | 0.21345 | 0.67409 | 0.25676 | 0.45903 | 0.71491 | 0.42296 | 0.12716 |
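
The `Average` column appears to be the unweighted mean of the seven per-task scores. The short check below recomputes it for two rows, with the values copied from the table above; the variable names are illustrative.

```python
from statistics import mean

# Per-task NDCG@10 scores, in table column order (XPQARetrieval ... MrTidyRetrieval).
task_scores = {
    "dragonkue/multilingual-e5-small-ko": [
        0.34871, 0.79729, 0.61113, 0.76173, 0.9297, 0.86184, 0.51133,
    ],
    "exp-models/dragonkue-KoEn-E5-Tiny": [
        0.34735, 0.7925, 0.6143, 0.75978, 0.93018, 0.86503, 0.50333,
    ],
}

# Unweighted mean over tasks, rounded to the table's six decimal places.
averages = {model: round(mean(scores), 6) for model, scores in task_scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Both recomputed values agree with the reported averages to six decimal places, which supports reading `Average` as a plain mean over tasks.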
|
| 188 |
|
| 189 |
#### Performance Comparison by Model Size (Based on Average NDCG@10)
|
| 190 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/Utunk7FbZsTDEVsOVUms1.png" width="1000"/>
|
| 191 |
|
| 192 |
<!--
|
| 193 |
### Recommendations
|
|
|
|
| 428 |
}
|
| 429 |
```
|
| 430 |
|
| 431 |
+
#### KURE
|
| 432 |
+
```bibtex
|
| 433 |
+
@misc{KURE,
|
| 434 |
+
publisher = {Youngjoon Jang, Junyoung Son, Taemin Lee},
|
| 435 |
+
year = {2024},
|
| 436 |
+
url = {https://github.com/nlpai-lab/KURE}
|
| 437 |
+
}
|
| 438 |
+
```
|
| 439 |
+
|
| 440 |
## Limitations
|
| 441 |
|
| 442 |
Long texts will be truncated to at most 512 tokens.
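
One common way to work around the 512-token limit is to split a long token sequence into overlapping windows and embed each window separately. The sketch below is an assumption rather than part of this model's API: `chunk_tokens`, `max_len`, and `overlap` are hypothetical names, and the input is assumed to be already tokenized by the model's tokenizer.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], max_len: int = 512, overlap: int = 64) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that text near a boundary
    appears in both neighbouring chunks.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the current window already reaches the end of the input
    return chunks


# Stand-in token ids for a 1000-token document.
windows = chunk_tokens(list(range(1000)))
print([len(w) for w in windows])
```

If the 512-token limit also counts special tokens added by the tokenizer, `max_len` would need to be set slightly lower. Combining the per-chunk embeddings (for example by averaging, or by taking the best-scoring chunk) is a separate design decision.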
|
| 443 |
|
| 444 |
+
## Acknowledgements
|
| 445 |
+
Special thanks to lemon-mint for their valuable contributions to optimizing and compressing this model.
|
| 446 |
+
|
| 447 |
<!--
|
| 448 |
## Glossary
|
| 449 |
|