Update README.md

README.md (changed)

```
@@ -37,7 +37,7 @@ SentenceTransformer(
 - Dataset: [STSB-fr and en]
 - Method: Fine-tuning specifically for the semantic textual similarity benchmark using Siamese BERT-Networks configured with the 'sentence-transformers' library.

 ### Stage 4: Advanced Augmentation Fine-tuning

-- Dataset: STSB
+- Dataset: STSB with [silver samples generated from gold samples](https://www.sbert.net/examples/training/data_augmentation/README.html)
 - Method: Employed an advanced strategy using [Augmented SBERT](https://arxiv.org/abs/2010.08240) with Pair Sampling Strategies, integrating both Cross-Encoder and Bi-Encoder models. This stage further refined the embeddings by dynamically enriching the training data, enhancing the model's robustness and accuracy.

@@ -53,11 +53,10 @@ Then you can use the model like this:
 ```python
 from sentence_transformers import SentenceTransformer
-from pyvi.ViTokenizer import tokenize

 sentences = ["Paris est une capitale de la France", "Paris is a capital of France"]

-model = SentenceTransformer('Lajavaness/bilingual-embedding-large', trust_remote_code=True)
+model = SentenceTransformer('Lajavaness/bilingual-embedding-large-8k', trust_remote_code=True)
 print(embeddings)
 ```
```