SentenceTransformer based on sentence-transformers/paraphrase-multilingual-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

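Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-mpnet-base-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions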

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
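
The Pooling module above enables only pooling_mode_mean_tokens, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of masked mean pooling, not the library's implementation; the function name mean_pool and the toy 2-dimensional vectors are illustrative assumptions (the real model pools 768-dimensional token embeddings).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


# Hypothetical toy input: two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector is ignored because its mask entry is 0, so padding a batch to a common length does not change the resulting embedding.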

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("spenccorp/frameR-marpor")
# Run inference
sentences = [
    'V prihajajočem obdobju, ko bo EU iskala rešitve in odgovore za svoj razvoj mora biti Slovenija aktivna in pozitivno naravnana država članica.',
    'Zavzemati se mora za svoje interese in pri tem upoštevati tudi specifike drugih držav.',
    'Wir haben den Rechtsanspruch auf einen Betreuungsplatz für unter dreijährige Kinder geschaffen.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.7975, -0.1152],
#         [ 0.7975,  1.0000, -0.2082],
#         [-0.1152, -0.2082,  1.0000]])
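
The similarity matrix above has 1.0000 on its diagonal, which is consistent with cosine similarity. As a rough sketch of how such scores can drive semantic search, the snippet below ranks a small corpus of precomputed vectors against a query vector. It assumes cosine similarity and uses hypothetical 2-dimensional vectors in place of real 768-dimensional embeddings; cosine and rank_by_similarity are illustrative helper names, not library functions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, corpus, top_k=2):
    """Return (corpus index, score) pairs for the top_k most similar vectors."""
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(corpus)]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Hypothetical stand-ins for model.encode(...) output.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.1]
ranking = rank_by_similarity(query, corpus)
print([idx for idx, _ in ranking])  # [0, 2]
```

With real embeddings, the same ranking step would run over the rows returned by model.encode.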

Training Details

Training Dataset

Unnamed Dataset

  • Size: 248,053 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
                sentence_0      sentence_1
    type        string          string
    min         4 tokens        4 tokens
    mean        28.56 tokens    29.47 tokens
    max         119 tokens      128 tokens
  • Samples:
    sentence_0: De fet, en el període 1992-2005, Madrid va acumular el 56% de la inversió quan només tenia un 22% del trànsit aeri estatal total.
    sentence_1: En canvi, Barcelona, amb un 15% del trànsit, només rebia un 15% de la inversió.

    sentence_0: Πάγωμα των δανείων στη διάρκεια της ανεργίας.
    sentence_1: Οι μέρες ανεργίας να ασφαλίζονται.

    sentence_0: e impedir que los fondos europeos sean utilizados para socavar la democracia y las libertades fundamentales en cualquier lugar del Planeta.
    sentence_1: Avanzar en el establecimiento de un marco europeo de colaboración en políticas de defensa y seguridad humana.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
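
To illustrate what these parameters control, here is a minimal pure-Python sketch of an in-batch-negatives ranking loss in the spirit of MultipleNegativesRankingLoss. It assumes that each anchor (sentence_0) is scored against every positive (sentence_1) in the batch with cosine similarity, that the scores are multiplied by scale, and that cross-entropy is taken with the matching pair as the target. The names cos_sim and mnr_loss and the toy vectors are illustrative; this is not the library's code.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Every other positive in the batch acts as a negative for this anchor.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical batch of two pairs with orthogonal embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Under these assumptions, a larger batch supplies more negatives per anchor, so per_device_train_batch_size affects the difficulty of the objective as well as throughput.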
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
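
With lr_scheduler_type linear, learning_rate 5e-05, and warmup_steps 0, the learning rate presumably decays in a straight line from 5e-05 to zero over the run. The sketch below reproduces that shape in plain Python. The total of 46,512 optimizer steps is an assumption, derived as ceil(248,053 / 16) × 3 for a single device; linear_lr is an illustrative helper, not a library function.

```python
def linear_lr(step, base_lr=5e-05, total_steps=46_512, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=46_512 is an assumed value, not one reported in this card.
for step in (0, 23_256, 46_512):
    print(step, f"{linear_lr(step):.2e}")
# 0 5.00e-05
# 23256 2.50e-05
# 46512 0.00e+00
```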

Training Logs

Epoch Step Training Loss
0.0322 500 1.5646
0.0645 1000 1.2545
0.0967 1500 1.0603
0.1290 2000 0.6574
0.1612 2500 0.3502
0.1935 3000 0.3025
0.2257 3500 0.3002
0.2580 4000 0.2684
0.2902 4500 0.2816
0.3225 5000 0.2845
0.3547 5500 0.2798
0.3870 6000 0.2783
0.4192 6500 0.2806
0.4515 7000 0.264
0.4837 7500 0.2429
0.5160 8000 0.2684
0.5482 8500 0.2648
0.5805 9000 0.2558
0.6127 9500 0.2516
0.6450 10000 0.2477
0.6772 10500 0.2479
0.7095 11000 0.2419
0.7417 11500 0.2387
0.7740 12000 0.2301
0.8062 12500 0.2216
0.8385 13000 0.2353
0.8707 13500 0.2415
0.9030 14000 0.222
0.9352 14500 0.2381
0.9675 15000 0.2263
0.9997 15500 0.2399
1.0320 16000 0.1883
1.0642 16500 0.2012
1.0965 17000 0.1903
1.1287 17500 0.1847
1.1610 18000 0.1845
1.1932 18500 0.1961
1.2255 19000 0.1886
1.2577 19500 0.1806
1.2900 20000 0.1736
1.3222 20500 0.1785
1.3545 21000 0.1835
1.3867 21500 0.187
1.4190 22000 0.177
1.4512 22500 0.1596
1.4835 23000 0.1729
1.5157 23500 0.172
1.5480 24000 0.1679
1.5802 24500 0.1872
1.6125 25000 0.1713
1.6447 25500 0.1654
1.6770 26000 0.1816
1.7092 26500 0.1789
1.7415 27000 0.1793
1.7737 27500 0.1766
1.8060 28000 0.1698
1.8382 28500 0.1628
1.8705 29000 0.1527
1.9027 29500 0.1622
1.9350 30000 0.15
1.9672 30500 0.1593
1.9995 31000 0.1669
2.0317 31500 0.1292
2.0640 32000 0.1249
2.0962 32500 0.1426
2.1285 33000 0.1436
2.1607 33500 0.1216
2.1930 34000 0.1304
2.2252 34500 0.1233
2.2575 35000 0.1268
2.2897 35500 0.1308
2.3220 36000 0.1275
2.3542 36500 0.1264
2.3865 37000 0.1252
2.4187 37500 0.1288
2.4510 38000 0.1289
2.4832 38500 0.1216
2.5155 39000 0.1247
2.5477 39500 0.1228
2.5800 40000 0.1252
2.6122 40500 0.128
2.6445 41000 0.1211
2.6767 41500 0.1237
2.7090 42000 0.1231
2.7412 42500 0.1317
2.7735 43000 0.1211
2.8057 43500 0.13
2.8380 44000 0.1118
2.8702 44500 0.117
2.9025 45000 0.112
2.9347 45500 0.121
2.9670 46000 0.1232
2.9992 46500 0.1257
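
The Epoch column can be checked against the dataset size and batch size reported above. The sketch below assumes training ran on a single device, which this card does not state; gradient_accumulation_steps of 1 and dataloader_drop_last of False come from the hyperparameter list. The helper epoch_at is illustrative.

```python
import math

train_samples = 248_053  # training set size
batch_size = 16          # per_device_train_batch_size
epochs = 3               # num_train_epochs

# Single-device assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def epoch_at(step):
    """Fractional epoch reached after a given number of steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 15504
print(total_steps)      # 46512
print(epoch_at(500))    # 0.0322
print(epoch_at(46500))  # 2.9992
```

The computed values match the first and last rows of the log, which supports the single-device assumption.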

Framework Versions

  • Python: 3.11.7
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.3
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}