dangvantuan/vietnamese-document-embedding

@@ -17,7 +17,7 @@ metrics:
 - spearmanr
 ---
 ## Model Description:
-[**vietnamese-embedding-LongContext**](https://huggingface.co/dangvantuan/vietnamese-embedding-LongContext) is the Embedding Model for Vietnamese language  with context length up to 8096 tokens. This model is a specialized sentence-embedding trained specifically for the Vietnamese language, which is built upon [gte-multilingual](Alibaba-NLP/gte-multilingual-base) and trained using the Multi-Negative Ranking Loss, Matryoshka2dLoss and SimilarityLoss.
+[**vietnamese-embedding-LongContext**](https://huggingface.co/dangvantuan/vietnamese-embedding-LongContext) is the Embedding Model for Vietnamese language  with context length up to 8096 tokens. This model is a specialized text-embedding trained specifically for the Vietnamese language, which is built upon [gte-multilingual](Alibaba-NLP/gte-multilingual-base) and trained using the Multi-Negative Ranking Loss, Matryoshka2dLoss and SimilarityLoss.
 ## Full Model Architecture
 ```
@@ -100,7 +100,7 @@ test_evaluator(model, output_path="./")
 **Spearman score**
 | Model                                                                                                               | [STSB]   | [STS12]| [STS13] | [STS14] | [STS15] |    [STS16] | [SICK] | Mean |
 |-----------------------------------------------------------|---------|----------|----------|----------|----------|----------|---------|--------|
-| [dangvantuan/vietnamese-embedding](https://huggingface.co/dangvantuan/vietnamese-embedding)                                                 |**84.84**|	**79.04**|	**85.30**|	**81.38**|	**87.06**|	**79.95**|	**79.58**|	**82.45**|
+| [dangvantuan/vietnamese-embedding](https://huggingface.co/dangvantuan/vietnamese-embedding)                                                 |84.84|	79.04|	85.30|	81.38|	87.06|	79.95|	79.58|	82.45|
 | [dangvantuan/vietnamese-embedding-LongContext](https://huggingface.co/dangvantuan/vietnamese-embedding-LongContext)  |85.25|	75.77|	83.82|	81.69|	88.48|	81.5|	78.2|	82.10|
 ## Citation
@@ -120,7 +120,7 @@ test_evaluator(model, output_path="./")
 	   journal={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
 	   year={2020}
 	}
-    @article{thakur2020augmented,
+  @article{thakur2020augmented,
       title={Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks},
       author={Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna},
       journal={arXiv e-prints},
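The Spearman scores in the model card's table presumably follow the common STS recipe. Each sentence pair is embedded, the cosine similarity of the two embeddings is taken, and those similarities are rank-correlated with the gold scores. The sketch below shows that recipe in plain Python, assuming the embeddings are already computed. The helper names (`cosine`, `average_ranks`, `sts_spearman`) are hypothetical and do not come from the model card or any library. The ×100 scaling is an assumption chosen to match the table's number format, and the card's own `test_evaluator` may differ in details.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_spearman(emb1, emb2, gold) -> float:
    """Hypothetical STS scorer: cosine per pair vs. gold labels, scaled by 100."""
    if not (len(emb1) == len(emb2) == len(gold)):
        raise ValueError("emb1, emb2 and gold must have the same length")
    preds = [cosine(a, b) for a, b in zip(emb1, emb2)]
    return 100 * spearman(preds, gold)


# Toy 2-d "embeddings" standing in for real model outputs.
emb1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
emb2 = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
gold = [5.0, 3.0, 0.5]
print(round(sts_spearman(emb1, emb2, gold), 2))  # prints 100.0
```

Only the ordering matters here, so any monotone rescaling of the gold labels leaves the score unchanged. In practice, `scipy.stats.spearmanr` computes the same coefficient and also assigns average ranks to ties.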