Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ metrics:
 - spearmanr
 ---
 ## Model Description:
-[**vietnamese-embedding-LongContext**](https://huggingface.co/dangvantuan/vietnamese-embedding-LongContext) is the Embedding Model for Vietnamese language with context length up to 8096 tokens. This model is a specialized
+[**vietnamese-embedding-LongContext**](https://huggingface.co/dangvantuan/vietnamese-embedding-LongContext) is the Embedding Model for Vietnamese language with context length up to 8096 tokens. This model is a specialized text-embedding trained specifically for the Vietnamese language, which is built upon [gte-multilingual](Alibaba-NLP/gte-multilingual-base) and trained using the Multi-Negative Ranking Loss, Matryoshka2dLoss and SimilarityLoss.
 
 ## Full Model Architecture
 ```
@@ -100,7 +100,7 @@ test_evaluator(model, output_path="./")
 **Spearman score**
 | Model | [STSB] | [STS12]| [STS13] | [STS14] | [STS15] | [STS16] | [SICK] | Mean |
 |-----------------------------------------------------------|---------|----------|----------|----------|----------|----------|---------|--------|
-| [dangvantuan/vietnamese-embedding](https://huggingface.co/dangvantuan/vietnamese-embedding)
+| [dangvantuan/vietnamese-embedding](https://huggingface.co/dangvantuan/vietnamese-embedding) |84.84| 79.04| 85.30| 81.38| 87.06| 79.95| 79.58| 82.45|
 | [dangvantuan/vietnamese-embedding-LongContext](https://huggingface.co/dangvantuan/vietnamese-embedding-LongContext) |85.25| 75.77| 83.82| 81.69| 88.48| 81.5| 78.2| 82.10|
 
 ## Citation
@@ -120,7 +120,7 @@ test_evaluator(model, output_path="./")
 journal={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
 year={2020}
 }
-
+@article{thakur2020augmented,
 title={Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks},
 author={Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna},
 journal={arXiv e-prints},
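
Review note: the `Mean` column in the Spearman score table touched by this change can be cross-checked against the seven per-dataset scores. The sketch below assumes `Mean` is the plain arithmetic mean of the STSB, STS12 to STS16, and SICK columns, rounded to two decimals; the README does not state this explicitly. The helper name `mean_score` is hypothetical and only used for this check. The scores are copied from the two table rows in the diff.

```python
# Per-dataset Spearman scores copied from the table rows in the diff
# (order: STSB, STS12, STS13, STS14, STS15, STS16, SICK).
scores = {
    "dangvantuan/vietnamese-embedding": [84.84, 79.04, 85.30, 81.38, 87.06, 79.95, 79.58],
    "dangvantuan/vietnamese-embedding-LongContext": [85.25, 75.77, 83.82, 81.69, 88.48, 81.5, 78.2],
}

# Values reported in the Mean column of the same rows.
reported_mean = {
    "dangvantuan/vietnamese-embedding": 82.45,
    "dangvantuan/vietnamese-embedding-LongContext": 82.10,
}


def mean_score(values):
    """Arithmetic mean rounded to two decimals (assumed definition of Mean)."""
    return round(sum(values) / len(values), 2)


for name, values in scores.items():
    computed = mean_score(values)
    # A tolerance of half a unit in the last reported decimal place.
    matches = abs(computed - reported_mean[name]) < 0.005
    print(f"{name}: computed={computed:.2f} reported={reported_mean[name]:.2f} match={matches}")
```

Under that assumption both rows agree with their reported means, so the row added for `dangvantuan/vietnamese-embedding` is internally consistent with the existing LongContext row.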