dragonkue committed 78fa19f (verified) · 1 parent: 5267538

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED

@@ -177,7 +177,7 @@ You can finetune this model on your own dataset.
 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 118 | 0.456867 | 0.21345 | 0.67409 | 0.25676 | 0.45903 | 0.71491 | 0.42296 | nan |

 #### Performance Comparison by Model Size (Based on Average NDCG@10)
-<img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/Ba2bVpPlB7egF80USITJ5.png" width="800"/>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/642b0c2fecec03b4464a1d9b/Ba2bVpPlB7egF80USITJ5.png" width="1000"/>


 <!--
@@ -364,7 +364,7 @@ pip install -U sentence-transformers
 - Tokenizers: 0.21.1

 ## FAQ
-1. Do I need to add the prefix "query: " and "passage: " to input texts?
+**1. Do I need to add the prefix "query: " and "passage: " to input texts?**

 Yes, this is how the model is trained, otherwise you will see a performance degradation.

@@ -376,7 +376,7 @@ Use "query: " prefix for symmetric tasks such as semantic similarity, bitext min

 Use "query: " prefix if you want to use embeddings as features, such as linear probing classification, clustering.

-2. Why does the cosine similarity scores distribute around 0.7 to 1.0?
+**2. Why does the cosine similarity scores distribute around 0.7 to 1.0?**

 This is a known and expected behavior as we use a low temperature 0.01 for InfoNCE contrastive loss.
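As context for FAQ 1 in the diff above, here is a minimal sketch of applying the "query: " / "passage: " prefixes with the sentence-transformers API. This example is not part of the commit; the FAQ text matches the multilingual-e5 model family, so `intfloat/multilingual-e5-small` is used below as a stand-in for the actual repository ID.

```python
# Minimal sketch (not from this commit): encoding with the "query: " and
# "passage: " prefixes that FAQ 1 says the model was trained with.
# Assumption: intfloat/multilingual-e5-small stands in for the real model ID.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Asymmetric retrieval: queries and passages get different prefixes.
queries = ["query: how to finetune a sentence-transformers model"]
passages = ["passage: You can finetune this model on your own dataset."]

query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

# With L2-normalized embeddings, cosine similarity is a plain dot product.
scores = query_emb @ passage_emb.T
print(scores)  # higher score = more relevant passage
```

Per the FAQ text in the hunk at line 376, symmetric tasks (semantic similarity, bitext mining) and feature extraction instead use the "query: " prefix on both inputs.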
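And for FAQ 2, a sketch of the InfoNCE objective the answer refers to, with the stated temperature τ = 0.01; the notation below is generic rather than taken from the commit.

```latex
% InfoNCE contrastive loss with temperature \tau (the README states \tau = 0.01).
% s(q, p^{+}) is the cosine similarity of a query with its positive passage;
% the sum runs over the positive and the in-batch negative passages p_i.
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\log \frac{\exp\bigl(s(q, p^{+}) / \tau\bigr)}
               {\sum_{i} \exp\bigl(s(q, p_{i}) / \tau\bigr)}
```

Dividing similarities by a small τ sharpens the softmax, so training only widens the relative gap between positives and negatives; the absolute cosine values end up compressed into a narrow high band (here roughly 0.7 to 1.0), and only the ranking of scores is meaningful.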