Update README.md

README.md (changed)

```
@@ -37,7 +37,7 @@ SentenceTransformer(
 - Dataset: [STSB-fr and en]
 - Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library.

 ### Stage 4: Advanced Augmentation Fine-tuning

-- Dataset: STSB
+- Dataset: STSB with [silver samples generated from gold samples](https://www.sbert.net/examples/training/data_augmentation/README.html)
 - Method: Employed an advanced strategy using [Augmented SBERT](https://arxiv.org/abs/2010.08240) with Pair Sampling Strategies, integrating both Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by dynamically enriching the training data, enhancing the model's robustness and accuracy.

@@ -53,11 +53,10 @@ Then you can use the model like this:
 ```python
 from sentence_transformers import SentenceTransformer
-from pyvi.ViTokenizer import tokenize

 sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]

-model = SentenceTransformer('Lajavaness/bilingual-embedding-large', trust_remote_code=True)
+model = SentenceTransformer('Lajavaness/bilingual-embedding-large-8k', trust_remote_code=True)
 print(embeddings)
 ```
```