SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
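
The Pooling module averages the token embeddings produced by the Transformer module, ignoring padding via the attention mask, to yield one 384-dimensional vector per input. Below is a minimal sketch of that computation using the transformers library directly; it mirrors what the modules above do and is not a replacement for the SentenceTransformer pipeline:

import torch
from transformers import AutoTokenizer, AutoModel

# Load the underlying BertModel and tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained("mohsayed/para_tr_enar_1")
bert = AutoModel.from_pretrained("mohsayed/para_tr_enar_1")

sentences = ["stress formula 20 capsules", "ستريس فورميولا 20 كبسول"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state  # [batch, seq_len, 384]

# Mean pooling: sum the token embeddings and divide by the number of real tokens
mask = encoded["attention_mask"].unsqueeze(-1).float()    # [batch, seq_len, 1]
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([2, 384])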

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mohsayed/para_tr_enar_1")
# Run inference
sentences = [
    'stress formula 20 capsules',
    'ستريس فورميولا 20 كبسول',
    'كورتيكوفيوسيديك كريم موضعي 30 جم',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
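
Because the model aligns English and Arabic product names in the same vector space, it also supports cross-lingual semantic search. A minimal sketch, reusing the model above (the corpus and query strings are illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mohsayed/para_tr_enar_1")

# Illustrative Arabic corpus; in practice this would be a full product catalogue
corpus = [
    'ستريس فورميولا 20 كبسول',
    'كورتيكوفيوسيديك كريم موضعي 30 جم',
]
query = "stress formula 20 capsules"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Rank corpus entries by similarity to the query (cosine similarity by default)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
best = scores.argmax().item()
print(corpus[best])  # expected: the Arabic paraphrase of the query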

Training Details

Training Dataset

Unnamed Dataset

  • Size: 17,702 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    sentence1: string; min: 6 tokens, mean: 10.29 tokens, max: 20 tokens
    sentence2: string; min: 7 tokens, mean: 12.42 tokens, max: 25 tokens
  • Samples:
    sentence1 | sentence2
    azelast plus 125 / 50 mcg nasal spray 25 ml | azelast plus 125/50 mcg nasal spray 25 ml
    ticanase plus 125 / 50 mcg nasal spray 15 ml | ticanase plus 125/50 mcg nasal spray 15 ml
    nasostop 0.1% adult nasal drops 15 ml | nasostop 0.1% adult nasal drops 15 ml
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
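
With MultipleNegativesRankingLoss, each (sentence1, sentence2) pair is a positive and every other sentence2 in the batch serves as an in-batch negative; the scale of 20.0 multiplies the cosine similarities before the cross-entropy over candidates. A minimal sketch of how this loss is instantiated with the parameters above:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# scale=20.0 sharpens the softmax over the positive pair vs. in-batch negatives
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)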
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,771 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    sentence1: string; min: 6 tokens, mean: 12.13 tokens, max: 47 tokens
    sentence2: string; min: 4 tokens, mean: 12.44 tokens, max: 26 tokens
  • Samples:
    sentence1 | sentence2
    calcibella fortified liquid chocolate 200 gm | كالسيبيلا شيكولاته سائلة 200 جم
    glaryl 4 mg 30 tab | glaryl 4mg 30 tab.
    pixefresh mouth spray 60 ml | بيكسيفريش بخاخ للفم 60 مل
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
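
A minimal sketch of a training run with these non-default settings (the dataset loading, eval/save cadence, and output path are assumptions; the eval interval of 1000 steps matches the training logs below):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Hypothetical stand-in for the 17,702 training pairs described above
train_dataset = Dataset.from_dict({
    "sentence1": ["stress formula 20 capsules"],
    "sentence2": ["ستريس فورميولا 20 كبسول"],
})
eval_dataset = train_dataset  # placeholder; use the held-out 1,771 pairs

loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="para_tr_enar_1",   # assumed output path
    eval_strategy="steps",
    eval_steps=1000,               # matches the validation-loss cadence in the logs
    save_steps=1000,               # aligned with eval_steps for load_best_model_at_end
    logging_steps=100,             # matches the training-loss cadence in the logs
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=15,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()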

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0903 100 1.123 -
0.1807 200 0.2605 -
0.2710 300 0.1432 -
0.3613 400 0.1151 -
0.4517 500 0.09 -
0.5420 600 0.0666 -
0.6323 700 0.0534 -
0.7227 800 0.0593 -
0.8130 900 0.0484 -
0.9033 1000 0.0652 0.0302
0.9937 1100 0.0441 -
1.0840 1200 0.0333 -
1.1743 1300 0.0395 -
1.2647 1400 0.0357 -
1.3550 1500 0.0351 -
1.4453 1600 0.0338 -
1.5357 1700 0.0365 -
1.6260 1800 0.0518 -
1.7164 1900 0.0426 -
1.8067 2000 0.0312 0.0234
1.8970 2100 0.041 -
1.9874 2200 0.0401 -
2.0777 2300 0.0177 -
2.1680 2400 0.0216 -
2.2584 2500 0.0203 -
2.3487 2600 0.0184 -
2.4390 2700 0.0203 -
2.5294 2800 0.024 -
2.6197 2900 0.0154 -
2.7100 3000 0.0292 0.0147
2.8004 3100 0.025 -
2.8907 3200 0.02 -
2.9810 3300 0.0187 -
3.0714 3400 0.0264 -
3.1617 3500 0.0153 -
3.2520 3600 0.01 -
3.3424 3700 0.0156 -
3.4327 3800 0.014 -
3.5230 3900 0.027 -
3.6134 4000 0.014 0.0093
3.7037 4100 0.0134 -
3.7940 4200 0.0127 -
3.8844 4300 0.0223 -
3.9747 4400 0.0137 -
4.0650 4500 0.01 -
4.1554 4600 0.0135 -
4.2457 4700 0.0082 -
4.3360 4800 0.013 -
4.4264 4900 0.0075 -
4.5167 5000 0.0064 0.0060
4.6070 5100 0.0113 -
4.6974 5200 0.0109 -
4.7877 5300 0.0116 -
4.8780 5400 0.0105 -
4.9684 5500 0.0074 -
5.0587 5600 0.0084 -
5.1491 5700 0.0111 -
5.2394 5800 0.0027 -
5.3297 5900 0.0066 -
5.4201 6000 0.0064 0.0045
5.5104 6100 0.0044 -
5.6007 6200 0.0096 -
5.6911 6300 0.0065 -
5.7814 6400 0.0093 -
5.8717 6500 0.0136 -
5.9621 6600 0.0214 -
6.0524 6700 0.0054 -
6.1427 6800 0.0028 -
6.2331 6900 0.008 -
6.3234 7000 0.0115 0.0021
6.4137 7100 0.0045 -
6.5041 7200 0.0053 -
6.5944 7300 0.0083 -
6.6847 7400 0.0081 -
6.7751 7500 0.0035 -
6.8654 7600 0.0081 -
6.9557 7700 0.0063 -
7.0461 7800 0.0056 -
7.1364 7900 0.0034 -
7.2267 8000 0.0069 0.0025
7.3171 8100 0.0026 -
7.4074 8200 0.0047 -
7.4977 8300 0.0034 -
7.5881 8400 0.0052 -
7.6784 8500 0.0081 -
7.7687 8600 0.0023 -
7.8591 8700 0.004 -
7.9494 8800 0.004 -
8.0397 8900 0.003 -
8.1301 9000 0.0032 0.0031
8.2204 9100 0.0054 -
8.3107 9200 0.0058 -
8.4011 9300 0.0044 -
8.4914 9400 0.0029 -
8.5818 9500 0.0039 -
8.6721 9600 0.0033 -
8.7624 9700 0.0061 -
8.8528 9800 0.0029 -
8.9431 9900 0.0037 -
9.0334 10000 0.0024 0.0020
9.1238 10100 0.0046 -
9.2141 10200 0.0037 -
9.3044 10300 0.0041 -
9.3948 10400 0.0064 -
9.4851 10500 0.0058 -
9.5754 10600 0.0058 -
9.6658 10700 0.0031 -
9.7561 10800 0.0015 -
9.8464 10900 0.0037 -
9.9368 11000 0.0045 0.0013
10.0271 11100 0.0038 -
10.1174 11200 0.0027 -
10.2078 11300 0.0061 -
10.2981 11400 0.0046 -
10.3884 11500 0.0028 -
10.4788 11600 0.0021 -
10.5691 11700 0.0029 -
10.6594 11800 0.005 -
10.7498 11900 0.002 -
10.8401 12000 0.0058 0.0012
10.9304 12100 0.003 -
11.0208 12200 0.0005 -
11.1111 12300 0.0022 -
11.2014 12400 0.0046 -
11.2918 12500 0.0028 -
11.3821 12600 0.0016 -
11.4724 12700 0.0026 -
11.5628 12800 0.0025 -
11.6531 12900 0.0009 -
11.7435 13000 0.0022 0.0014
11.8338 13100 0.0021 -
11.9241 13200 0.0018 -
12.0145 13300 0.0032 -
12.1048 13400 0.0024 -
12.1951 13500 0.0029 -
12.2855 13600 0.0009 -
12.3758 13700 0.0009 -
12.4661 13800 0.002 -
12.5565 13900 0.0026 -
12.6468 14000 0.0008 0.0011
12.7371 14100 0.0016 -
12.8275 14200 0.0012 -
12.9178 14300 0.0009 -
13.0081 14400 0.0013 -
13.0985 14500 0.0013 -
13.1888 14600 0.004 -
13.2791 14700 0.0006 -
13.3695 14800 0.0025 -
13.4598 14900 0.0004 -
13.5501 15000 0.0021 0.0010
13.6405 15100 0.0023 -
13.7308 15200 0.0054 -
13.8211 15300 0.0014 -
13.9115 15400 0.0028 -
14.0018 15500 0.0008 -
14.0921 15600 0.0006 -
14.1825 15700 0.0015 -
14.2728 15800 0.0004 -
14.3631 15900 0.005 -
14.4535 16000 0.0009 0.0011
14.5438 16100 0.0022 -
14.6341 16200 0.0015 -
14.7245 16300 0.0021 -
14.8148 16400 0.0012 -
14.9051 16500 0.0005 -
14.9955 16600 0.0019 -
  • The saved checkpoint is the row with the lowest validation loss (epoch 13.5501, step 15000, validation loss 0.0010), since load_best_model_at_end is enabled.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.0.2
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}