Sentence Similarity
sentence-transformers
Safetensors
English
Russian
xlm-roberta
text-embeddings-inference
How to use seniichev/me5-wb with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("seniichev/me5-wb")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]

embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
Multilingual E5 WB
A fine-tuned version of the stock multilingual-e5-base model, made for WB DS School and a RAG project.
Evaluation Results
Because the model is used as a retriever, the goal was to improve the cosine similarity it assigns to matching question-answer pairs. On the given dataset of QA pairs, the score reported by EmbeddingSimilarityEvaluator improved from 0.62 to 0.78.
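To make the retrieval use case concrete, here is a minimal sketch of ranking candidate answers by cosine similarity to a question. The vectors are made-up 3-dimensional stand-ins for real embeddings (which would come from model.encode), and cosine_similarity and rank_answers are hypothetical helper names, not part of sentence-transformers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_answers(question_vec, answer_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine_similarity(question_vec, vec))
        for i, vec in enumerate(answer_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: illustrative only).
question = [1.0, 0.0, 0.0]
answers = [
    [0.0, 1.0, 0.0],  # orthogonal to the question
    [0.9, 0.1, 0.0],  # nearly parallel to the question
    [0.5, 0.5, 0.0],  # somewhere in between
]

ranking = rank_answers(question, answers)
print([index for index, _ in ranking])
```

In a real pipeline the same ranking would be applied to embeddings of the user question and the stored answer passages, so a higher cosine similarity for true QA pairs translates directly into better retrieval.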
Fine Tuning
DataLoader:
torch.utils.data.dataloader.DataLoader of length 790 with parameters:
{'batch_size': 12, 'sampler': 'torch.utils.data.sampler.SequentialSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
Loss:
sentence_transformers.losses.ContrastiveLoss.ContrastiveLoss with parameters:
{'distance_metric': 'SiameseDistanceMetric.COSINE_DISTANCE', 'margin': 0.5, 'size_average': True}
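As a rough illustration of what these loss settings mean, the sketch below computes the standard contrastive loss over cosine distance in plain Python. It assumes the usual formula, 0.5 * (y * d^2 + (1 - y) * max(0, margin - d)^2) with d = 1 - cosine similarity, which to our understanding matches what the library computes; the function names are hypothetical and the vectors are toy values.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, so identical directions give 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def contrastive_loss(pairs, margin=0.5, size_average=True):
    """pairs: list of (vec_a, vec_b, label); label 1 = similar, 0 = dissimilar."""
    losses = []
    for vec_a, vec_b, label in pairs:
        d = cosine_distance(vec_a, vec_b)
        # Similar pairs are penalized for being far apart.
        positive_term = label * d ** 2
        # Dissimilar pairs are penalized only while closer than the margin.
        negative_term = (1 - label) * max(0.0, margin - d) ** 2
        losses.append(0.5 * (positive_term + negative_term))
    total = sum(losses)
    return total / len(losses) if size_average else total


# Toy pairs (assumption: illustrative vectors, not real embeddings).
same = ([1.0, 0.0], [1.0, 0.0])
orthogonal = ([1.0, 0.0], [0.0, 1.0])

loss_good_positive = contrastive_loss([(*same, 1)])        # similar and close
loss_good_negative = contrastive_loss([(*orthogonal, 0)])  # dissimilar and far
loss_bad_negative = contrastive_loss([(*same, 0)])         # dissimilar but close
loss_bad_positive = contrastive_loss([(*orthogonal, 1)])   # similar but far
print(loss_good_positive, loss_good_negative, loss_bad_negative, loss_bad_positive)
```

With margin 0.5, a dissimilar pair stops contributing to the loss once its cosine distance reaches 0.5, that is, once its cosine similarity drops to 0.5 or below.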
Parameters of the fit()-Method:
{
    "epochs": 10,
    "evaluation_steps": 100,
    "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 790,
    "weight_decay": 0.01
}
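The learning-rate schedule implied by these parameters can be sketched as follows. This assumes that WarmupLinear means a linear ramp from 0 to the base rate over the warmup steps followed by a linear decay to 0, and that the total number of steps is 790 batches x 10 epochs = 7900 because steps_per_epoch is null; warmup_linear_lr is a hypothetical helper, not a library function.

```python
def warmup_linear_lr(step, base_lr=2e-5, warmup_steps=790, total_steps=7900):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at a few points: start, mid-warmup, peak, mid-decay, end.
for step in (0, 395, 790, 4345, 7900):
    print(step, warmup_linear_lr(step))
```

Under these assumptions the warmup covers exactly the first epoch, and the rate peaks at 2e-05 at step 790 before decaying over the remaining nine epochs.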