SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
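
The Pooling module averages the token embeddings produced by the Transformer module, ignoring padding via the attention mask, to yield one 384-dimensional vector per input. Below is a minimal sketch of that computation using the transformers library directly; it mirrors what the modules above do and is not a replacement for the SentenceTransformer pipeline:

import torch
from transformers import AutoTokenizer, AutoModel

# Load the underlying BertModel and tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained("mohsayed/para_tr_enar_1")
bert = AutoModel.from_pretrained("mohsayed/para_tr_enar_1")

sentences = ["stress formula 20 capsules", "ستريس فورميولا 20 كبسول"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state  # [batch, seq_len, 384]

# Mean pooling: sum the token embeddings and divide by the number of real tokens
mask = encoded["attention_mask"].unsqueeze(-1).float()    # [batch, seq_len, 1]
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([2, 384])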

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mohsayed/para_tr_enar_1")
# Run inference
sentences = [
    'stress formula 20 capsules',
    'ستريس فورميولا 20 كبسول',
    'كورتيكوفيوسيديك كريم موضعي 30 جم',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
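
Because the model aligns English and Arabic product names in the same vector space, it also supports cross-lingual semantic search. A minimal sketch, reusing the model above (the corpus and query strings are illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mohsayed/para_tr_enar_1")

# Illustrative Arabic corpus; in practice this would be a full product catalogue
corpus = [
    'ستريس فورميولا 20 كبسول',
    'كورتيكوفيوسيديك كريم موضعي 30 جم',
]
query = "stress formula 20 capsules"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Rank corpus entries by similarity to the query (cosine similarity by default)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
best = scores.argmax().item()
print(corpus[best])  # expected: the Arabic paraphrase of the query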

Training Details

Training Dataset

Unnamed Dataset

  • Size: 17,702 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    sentence1: string; min: 6 tokens, mean: 10.29 tokens, max: 20 tokens
    sentence2: string; min: 7 tokens, mean: 12.42 tokens, max: 25 tokens
  • Samples:
    sentence1 | sentence2
    azelast plus 125 / 50 mcg nasal spray 25 ml | azelast plus 125/50 mcg nasal spray 25 ml
    ticanase plus 125 / 50 mcg nasal spray 15 ml | ticanase plus 125/50 mcg nasal spray 15 ml
    nasostop 0.1% adult nasal drops 15 ml | nasostop 0.1% adult nasal drops 15 ml
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
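
With MultipleNegativesRankingLoss, each (sentence1, sentence2) pair is a positive and every other sentence2 in the batch serves as an in-batch negative; the scale of 20.0 multiplies the cosine similarities before the cross-entropy over candidates. A minimal sketch of how this loss is instantiated with the parameters above:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# scale=20.0 sharpens the softmax over the positive pair vs. in-batch negatives
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)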
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,771 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    sentence1: string; min: 6 tokens, mean: 12.13 tokens, max: 47 tokens
    sentence2: string; min: 4 tokens, mean: 12.44 tokens, max: 26 tokens
  • Samples:
    sentence1 | sentence2
    calcibella fortified liquid chocolate 200 gm | كالسيبيلا شيكولاته سائلة 200 جم
    glaryl 4 mg 30 tab | glaryl 4mg 30 tab.
    pixefresh mouth spray 60 ml | بيكسيفريش بخاخ للفم 60 مل
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
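
A minimal sketch of a training run with these non-default settings (the dataset loading, eval/save cadence, and output path are assumptions; the eval interval of 1000 steps matches the training logs below):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Hypothetical stand-in for the 17,702 training pairs described above
train_dataset = Dataset.from_dict({
    "sentence1": ["stress formula 20 capsules"],
    "sentence2": ["ستريس فورميولا 20 كبسول"],
})
eval_dataset = train_dataset  # placeholder; use the held-out 1,771 pairs

loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="para_tr_enar_1",   # assumed output path
    eval_strategy="steps",
    eval_steps=1000,               # matches the validation-loss cadence in the logs
    save_steps=1000,               # aligned with eval_steps for load_best_model_at_end
    logging_steps=100,             # matches the training-loss cadence in the logs
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=15,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()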

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0903 100 1.123 -
0.1807 200 0.2605 -
0.2710 300 0.1432 -
0.3613 400 0.1151 -
0.4517 500 0.09 -
0.5420 600 0.0666 -
0.6323 700 0.0534 -
0.7227 800 0.0593 -
0.8130 900 0.0484 -
0.9033 1000 0.0652 0.0302
0.9937 1100 0.0441 -
1.0840 1200 0.0333 -
1.1743 1300 0.0395 -
1.2647 1400 0.0357 -
1.3550 1500 0.0351 -
1.4453 1600 0.0338 -
1.5357 1700 0.0365 -
1.6260 1800 0.0518 -
1.7164 1900 0.0426 -
1.8067 2000 0.0312 0.0234
1.8970 2100 0.041 -
1.9874 2200 0.0401 -
2.0777 2300 0.0177 -
2.1680 2400 0.0216 -
2.2584 2500 0.0203 -
2.3487 2600 0.0184 -
2.4390 2700 0.0203 -
2.5294 2800 0.024 -
2.6197 2900 0.0154 -
2.7100 3000 0.0292 0.0147
2.8004 3100 0.025 -
2.8907 3200 0.02 -
2.9810 3300 0.0187 -
3.0714 3400 0.0264 -
3.1617 3500 0.0153 -
3.2520 3600 0.01 -
3.3424 3700 0.0156 -
3.4327 3800 0.014 -
3.5230 3900 0.027 -
3.6134 4000 0.014 0.0093
3.7037 4100 0.0134 -
3.7940 4200 0.0127 -
3.8844 4300 0.0223 -
3.9747 4400 0.0137 -
4.0650 4500 0.01 -
4.1554 4600 0.0135 -
4.2457 4700 0.0082 -
4.3360 4800 0.013 -
4.4264 4900 0.0075 -
4.5167 5000 0.0064 0.0060
4.6070 5100 0.0113 -
4.6974 5200 0.0109 -
4.7877 5300 0.0116 -
4.8780 5400 0.0105 -
4.9684 5500 0.0074 -
5.0587 5600 0.0084 -
5.1491 5700 0.0111 -
5.2394 5800 0.0027 -
5.3297 5900 0.0066 -
5.4201 6000 0.0064 0.0045
5.5104 6100 0.0044 -
5.6007 6200 0.0096 -
5.6911 6300 0.0065 -
5.7814 6400 0.0093 -
5.8717 6500 0.0136 -
5.9621 6600 0.0214 -
6.0524 6700 0.0054 -
6.1427 6800 0.0028 -
6.2331 6900 0.008 -
6.3234 7000 0.0115 0.0021
6.4137 7100 0.0045 -
6.5041 7200 0.0053 -
6.5944 7300 0.0083 -
6.6847 7400 0.0081 -
6.7751 7500 0.0035 -
6.8654 7600 0.0081 -
6.9557 7700 0.0063 -
7.0461 7800 0.0056 -
7.1364 7900 0.0034 -
7.2267 8000 0.0069 0.0025
7.3171 8100 0.0026 -
7.4074 8200 0.0047 -
7.4977 8300 0.0034 -
7.5881 8400 0.0052 -
7.6784 8500 0.0081 -
7.7687 8600 0.0023 -
7.8591 8700 0.004 -
7.9494 8800 0.004 -
8.0397 8900 0.003 -
8.1301 9000 0.0032 0.0031
8.2204 9100 0.0054 -
8.3107 9200 0.0058 -
8.4011 9300 0.0044 -
8.4914 9400 0.0029 -
8.5818 9500 0.0039 -
8.6721 9600 0.0033 -
8.7624 9700 0.0061 -
8.8528 9800 0.0029 -
8.9431 9900 0.0037 -
9.0334 10000 0.0024 0.0020
9.1238 10100 0.0046 -
9.2141 10200 0.0037 -
9.3044 10300 0.0041 -
9.3948 10400 0.0064 -
9.4851 10500 0.0058 -
9.5754 10600 0.0058 -
9.6658 10700 0.0031 -
9.7561 10800 0.0015 -
9.8464 10900 0.0037 -
9.9368 11000 0.0045 0.0013
10.0271 11100 0.0038 -
10.1174 11200 0.0027 -
10.2078 11300 0.0061 -
10.2981 11400 0.0046 -
10.3884 11500 0.0028 -
10.4788 11600 0.0021 -
10.5691 11700 0.0029 -
10.6594 11800 0.005 -
10.7498 11900 0.002 -
10.8401 12000 0.0058 0.0012
10.9304 12100 0.003 -
11.0208 12200 0.0005 -
11.1111 12300 0.0022 -
11.2014 12400 0.0046 -
11.2918 12500 0.0028 -
11.3821 12600 0.0016 -
11.4724 12700 0.0026 -
11.5628 12800 0.0025 -
11.6531 12900 0.0009 -
11.7435 13000 0.0022 0.0014
11.8338 13100 0.0021 -
11.9241 13200 0.0018 -
12.0145 13300 0.0032 -
12.1048 13400 0.0024 -
12.1951 13500 0.0029 -
12.2855 13600 0.0009 -
12.3758 13700 0.0009 -
12.4661 13800 0.002 -
12.5565 13900 0.0026 -
12.6468 14000 0.0008 0.0011
12.7371 14100 0.0016 -
12.8275 14200 0.0012 -
12.9178 14300 0.0009 -
13.0081 14400 0.0013 -
13.0985 14500 0.0013 -
13.1888 14600 0.004 -
13.2791 14700 0.0006 -
13.3695 14800 0.0025 -
13.4598 14900 0.0004 -
13.5501 15000 0.0021 0.0010
13.6405 15100 0.0023 -
13.7308 15200 0.0054 -
13.8211 15300 0.0014 -
13.9115 15400 0.0028 -
14.0018 15500 0.0008 -
14.0921 15600 0.0006 -
14.1825 15700 0.0015 -
14.2728 15800 0.0004 -
14.3631 15900 0.005 -
14.4535 16000 0.0009 0.0011
14.5438 16100 0.0022 -
14.6341 16200 0.0015 -
14.7245 16300 0.0021 -
14.8148 16400 0.0012 -
14.9051 16500 0.0005 -
14.9955 16600 0.0019 -
  • The saved checkpoint is the row with the lowest validation loss (epoch 13.5501, step 15000, validation loss 0.0010), since load_best_model_at_end is enabled.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.0.2
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}