bge-m3-ml-tr-specialized is a Sentence Transformer model optimized for scientific and technical machine learning texts in Turkish. Based on BAAI/bge-m3, the model has been fine-tuned for tasks such as sentence similarity, semantic search, conceptual matching, and meaning-based classification.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'architecture': 'XLMRobertaModel'})
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True})
(2): Normalize()
)
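The architecture printout above lists CLS-token pooling followed by a `Normalize()` module. As a rough illustration of what those two steps compute, here is a minimal pure-Python sketch. The helper names (`cls_pool`, `l2_normalize`, `dot`) and the 3-dimensional toy vectors are hypothetical and are not produced by the model, which uses 1024 dimensions. The sketch shows that once vectors are L2-normalized, a plain dot product equals cosine similarity.

```python
import math

def cls_pool(token_embeddings):
    """Return the vector at the first (CLS) position as the sentence embedding."""
    return token_embeddings[0]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy per-token embeddings for two inputs (hypothetical values).
tokens_a = [[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]
tokens_b = [[6.0, 8.0, 0.0], [0.0, 2.0, 5.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# Both CLS vectors point in the same direction, so their similarity is maximal.
similarity = dot(emb_a, emb_b)
print(similarity)
```

Because the outputs are unit-length, downstream code can, under this assumption, use dot products for ranking instead of recomputing norms.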
pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("dogukanvzr/bge-m3-ml-tr-specialized")

sentences = [
    "Accuracy refers to how close a model's predictions are to the actual values.",
    "Model accuracy indicates how well the predictions align with true labels.",
    "Feature engineering plays a critical role in machine learning pipelines.",
]

# Encode the sentences into 1024-dimensional, L2-normalized embeddings
embeddings = model.encode(sentences)

# Compare the first sentence with each of the other two
scores = cosine_similarity([embeddings[0]], embeddings[1:])
print(scores)  # array of shape (1, 2)
Dataset: ml-paraphrase-tr
Columns: sentence_0, sentence_1, label (float between 0.0 and 1.0 indicating similarity)
Loss function: CosineSimilarityLoss (internally uses MSELoss)

| Epoch | Step | Average Loss |
|---|---|---|
| 0.5 | 500 | 0.0338 |
| 1.0 | 1000 | 0.0188 |
| 1.5 | 1500 | 0.0147 |
| 2.0 | 2000 | 0.0127 |
| 2.5 | 2500 | 0.0105 |
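The loss values in the table come from CosineSimilarityLoss, which compares the cosine similarity of each sentence pair with its gold label using mean squared error. The following is a minimal pure-Python sketch of that objective, not the library's implementation. The function names and the 2-dimensional toy vectors are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between predicted cosine similarity and gold labels."""
    errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Two toy embedding pairs with gold similarity labels in [0.0, 1.0] (hypothetical).
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction, cosine = 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, cosine = 0.0
]
labels = [1.0, 0.5]

loss = cosine_similarity_loss(pairs, labels)
print(loss)  # (0.0 + 0.25) / 2 = 0.125
```

Under this objective, a falling average loss means the predicted cosine similarities are moving closer to the labeled similarity scores.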
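This model is particularly well-suited for Turkish NLP and ML tasks such as sentence similarity, semantic search, conceptual matching, and meaning-based classification. The example below compares one sentence against two others: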
from sklearn.metrics.pairwise import cosine_similarity

# Reuses the `model` object loaded in the usage example above
s1 = "Machine learning algorithms learn from past data to make future predictions."
s2 = "The model performs inference based on learned patterns."
s3 = "The size of the dataset can affect the generalization capacity of the model."

embs = model.encode([s1, s2, s3])

# Similarity of s1 to s2 and to s3
sim = cosine_similarity([embs[0]], embs[1:])
print(sim)
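For semantic search, the same similarity scores can be used to rank a corpus against a query. The sketch below is a minimal, pure-Python illustration that assumes unit-normalized embeddings, so a dot product stands in for cosine similarity. The function `rank_by_similarity` and the 2-dimensional toy vectors are hypothetical; in practice the vectors would come from `model.encode`.

```python
def rank_by_similarity(query, corpus, top_k=2):
    """Return (index, score) for the top_k corpus vectors most similar to the query.

    Assumes all vectors are already L2-normalized, so dot product == cosine.
    """
    scores = [sum(q * c for q, c in zip(query, doc)) for doc in corpus]
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

# Toy unit-length vectors standing in for sentence embeddings (hypothetical).
query = [1.0, 0.0]
corpus = [
    [0.0, 1.0],  # unrelated to the query
    [0.6, 0.8],  # partially related
    [1.0, 0.0],  # same direction as the query
]

hits = rank_by_similarity(query, corpus)
for index, score in hits:
    print(index, score)
```

For larger corpora, a vectorized or approximate nearest-neighbor search would likely be preferable to this linear scan.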
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
For bug reports, suggestions, or contributions: