SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 22.7M parameters (F32, Safetensors)

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
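
Concretely, the three modules are a BERT encoder, attention-mask-aware mean pooling, and L2 normalization (so dot products equal cosine similarities). Below is a minimal sketch of the equivalent computation in plain transformers; it assumes the repository root hosts the underlying BertModel weights, as auto-generated Sentence Transformers repositories normally do.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

repo = "Stergios-Konstantinidis/MNLP_M2_document_encoder"
tokenizer = AutoTokenizer.from_pretrained(repo)
bert = AutoModel.from_pretrained(repo)

batch = tokenizer(["An example sentence."], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state       # (0): Transformer
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(1) / mask.sum(1)      # (1): mean pooling
embedding = F.normalize(pooled, p=2, dim=1)                  # (2): Normalize
print(embedding.shape)  # torch.Size([1, 384])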

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Stergios-Konstantinidis/MNLP_M2_document_encoder")
# Run inference
sentences = [
    # The first two entries are identical on purpose: under the
    # ContrastiveTensionLoss used to train this model, an identical
    # pair is a positive example.
    'To generate queer warmth phrases, we employed persona prompting to adapt our SAE warmth phrases (see Table 4). Three distinct personas were designed and used as prompts to produce iterations of the 14 SAE warmth phrases. Each phrase was processed through all three persona prompts (see Table 8), resulting in a total of 42 unique queer warmth phrases. The final set of phrases is presented below.',
    'To generate queer warmth phrases, we employed persona prompting to adapt our SAE warmth phrases (see Table 4). Three distinct personas were designed and used as prompts to produce iterations of the 14 SAE warmth phrases. Each phrase was processed through all three persona prompts (see Table 8), resulting in a total of 42 unique queer warmth phrases. The final set of phrases is presented below.',
    '"title": "Always skip attention"',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
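
Since the card advertises semantic search, here is a minimal sketch of query-to-corpus retrieval with this model; the corpus and query below are made up for illustration.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Stergios-Konstantinidis/MNLP_M2_document_encoder")

# Hypothetical corpus and query, for illustration only
corpus = [
    "Persona prompting adapts a fixed phrase set to new speaker styles.",
    "The Matern hard-core process models static obstacles in indoor spaces.",
    "Contrastive Tension retunes sentence encoders without labeled data.",
]
query = "How can prompts be adapted to different personas?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Embeddings are L2-normalized, so cosine similarity is just a dot product
scores = model.similarity(query_embedding, corpus_embeddings)  # torch.Size([1, 3])
print(corpus[int(scores.argmax())])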

Training Details

Training Dataset

Unnamed Dataset

  • Size: 21,000 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 3 tokens, mean 173.22 tokens, max 256 tokens
    • sentence_1 (string): min 3 tokens, mean 170.67 tokens, max 256 tokens
    • label (int): 0: ~66.60%, 1: ~33.40%
  • Samples (first three rows shown):
    • Sample 1 (label: 1), sentence_0 = sentence_1: "the user may robustify the design by selecting a suitable Â. Only the choice of Â has an impact at an algorithmic level and, normally, Â is tuned to a set A that, in the user's mind, captures, and suitably describes, possible adversarial actions. Still, we remark that our results hold true for any choice of Â and A (with Â ⊆ A), so accommodating situations in which, e.g., the user envisages adversarial actions of a certain type and, yet, he is willing to theoretically test the robustness of the design against actions of higher magnitude. One simple example of this situation occurs when the design is done..."
    • Sample 2 (label: 0):
      sentence_0: "Aha Moment of R1-Reward. Through our task design and reward function formulation, the R1-Reward model effectively learns the reward modeling task structure during the SFT phase. Following reinforcement learning, it reduces the length of reasoning to enhance efficiency. Visual examples of the model's output appear in Figures 3 and 6. The model autonomously learns a process to assess response quality. It first defines the goal, analyzes the image, attempts to solve the problem, and provides an answer. Based on this, the model evaluates Response 1 and Response 2, compares the two outputs, and gives a final ranking. Simultaneously, the model demonstrates different reflection patterns. In Figure 3, the model encounters an error in its calculation, but after rechecking the bar chart, it recognizes the mistake and recalculates to obtain the correct result. In Figure 6, the model misunderstands the problem. However, after outputting "Wait, re-reading the ...
      sentence_1: "In an ideal case, the hole made after the punch doesn't move and keeps the size of the needle. Then the hole is filled with a subsequent paint layer, if it is not made in the top layer."
    • Sample 3 (label: 0):
      sentence_0: "In our search for the optimal parameters, we evaluated all possible combinations presented in Section 3.3. To do this, we aggregated the results for each specific parameter configuration and computed the mean metrics. This approach allowed us to isolate the effects of each parameter under evaluation."
      sentence_1: "We employ RWP to model the movement of humans within the indoor space and use the Matern hard-core process (MHCP) to model static obstacles, such as furniture or static humans, in the environment [15]."
  • Loss: ContrastiveTensionLoss
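
ContrastiveTensionLoss, from the Carlsson et al. paper cited at the end of this card, trains two independent copies of the encoder: one embeds sentence_0, the other sentence_1, and a pair is scored by the dot product of the two embeddings. With the binary labels above (1 for an identical pair, 0 for a mismatched pair), the objective is the binary cross-entropy

    L = -[ y * log σ(f1(s0) · f2(s1)) + (1 - y) * log(1 - σ(f1(s0) · f2(s1))) ]

One of the two encoders is an internal deep copy, so only the model you pass in is saved at the end of training.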

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 3
  • per_device_eval_batch_size: 3
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin
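
The four values above slot directly into SentenceTransformerTrainingArguments. As a rough, untested sketch, a training run consistent with this card might look like the following; only the dataset columns, loss, and hyperparameters are taken from the card, while the example rows and output directory are placeholders.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import ContrastiveTensionLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder rows in the card's format: label 1 = identical (positive) pair,
# label 0 = mismatched (negative) pair
train_dataset = Dataset.from_dict({
    "sentence_0": ["A sample paragraph.", "A sample paragraph.", "Another one."],
    "sentence_1": ["A sample paragraph.", "An unrelated paragraph.", "Another one."],
    "label": [1, 0, 1],
})

args = SentenceTransformerTrainingArguments(
    output_dir="mnlp_m2_document_encoder",  # placeholder
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    num_train_epochs=10,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=ContrastiveTensionLoss(model),
)
trainer.train()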

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 3
  • per_device_eval_batch_size: 3
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0714 500 1.8871
0.1429 1000 1.7445
0.2143 1500 1.7138
0.2857 2000 1.699
0.3571 2500 1.6729
0.4286 3000 1.6864
0.5 3500 1.6718
0.5714 4000 1.6754
0.6429 4500 1.6747
0.7143 5000 1.6709
0.7857 5500 1.6797
0.8571 6000 1.6768
0.9286 6500 1.6694
1.0 7000 1.6754
1.0714 7500 1.6632
1.1429 8000 1.6643
1.2143 8500 1.6553
1.2857 9000 1.6626
1.3571 9500 1.6734
1.4286 10000 1.673
1.5 10500 1.6611
1.5714 11000 1.671
1.6429 11500 1.6762
1.7143 12000 1.6717
1.7857 12500 1.6599
1.8571 13000 1.681
1.9286 13500 1.6715
2.0 14000 1.6815
2.0714 14500 1.6304
2.1429 15000 1.6351
2.2143 15500 1.648
2.2857 16000 1.6538
2.3571 16500 1.6396
2.4286 17000 1.632
2.5 17500 1.6497
2.5714 18000 1.6526
2.6429 18500 1.6346
2.7143 19000 1.6548
2.7857 19500 1.6549
2.8571 20000 1.6438
2.9286 20500 1.6448
3.0 21000 1.6435
3.0714 21500 1.589
3.1429 22000 1.6075
3.2143 22500 1.6084
3.2857 23000 1.6061
3.3571 23500 1.6121
3.4286 24000 1.6168
3.5 24500 1.6022
3.5714 25000 1.6164
3.6429 25500 1.6132
3.7143 26000 1.6036
3.7857 26500 1.6077
3.8571 27000 1.6241
3.9286 27500 1.6224
4.0 28000 1.6023
4.0714 28500 1.5479
4.1429 29000 1.5569
4.2143 29500 1.5611
4.2857 30000 1.5413
4.3571 30500 1.5568
4.4286 31000 1.5458
4.5 31500 1.5405
4.5714 32000 1.5707
4.6429 32500 1.557
4.7143 33000 1.5561
4.7857 33500 1.5698
4.8571 34000 1.546
4.9286 34500 1.5589
5.0 35000 1.5692
5.0714 35500 1.5029
5.1429 36000 1.5087
5.2143 36500 1.4882
5.2857 37000 1.5116
5.3571 37500 1.5016
5.4286 38000 1.4988
5.5 38500 1.5065
5.5714 39000 1.5089
5.6429 39500 1.5104
5.7143 40000 1.4937
5.7857 40500 1.4974
5.8571 41000 1.5095
5.9286 41500 1.5064
6.0 42000 1.5119
6.0714 42500 1.4572
6.1429 43000 1.4732
6.2143 43500 1.4534
6.2857 44000 1.4598
6.3571 44500 1.4626
6.4286 45000 1.4486
6.5 45500 1.4677
6.5714 46000 1.4705
6.6429 46500 1.4757
6.7143 47000 1.4724
6.7857 47500 1.4744
6.8571 48000 1.4571
6.9286 48500 1.4571
7.0 49000 1.4549
7.0714 49500 1.4198
7.1429 50000 1.4328
7.2143 50500 1.4322
7.2857 51000 1.4191
7.3571 51500 1.4355
7.4286 52000 1.4409
7.5 52500 1.4366
7.5714 53000 1.4378
7.6429 53500 1.4229
7.7143 54000 1.4386
7.7857 54500 1.453
7.8571 55000 1.419
7.9286 55500 1.4215
8.0 56000 1.4248
8.0714 56500 1.4184
8.1429 57000 1.4059
8.2143 57500 1.4011
8.2857 58000 1.3962
8.3571 58500 1.4134
8.4286 59000 1.4104
8.5 59500 1.3924
8.5714 60000 1.4062
8.6429 60500 1.4117
8.7143 61000 1.4192
8.7857 61500 1.402
8.8571 62000 1.3998
8.9286 62500 1.4087
9.0 63000 1.4203
9.0714 63500 1.389
9.1429 64000 1.4049
9.2143 64500 1.3897
9.2857 65000 1.3839
9.3571 65500 1.3712
9.4286 66000 1.3908
9.5 66500 1.3986
9.5714 67000 1.4014
9.6429 67500 1.3919
9.7143 68000 1.404
9.7857 68500 1.3921
9.8571 69000 1.3918
9.9286 69500 1.4046
10.0 70000 1.3923

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.0
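
To reproduce this environment, the listed versions can be pinned directly; the PyTorch build shown (2.5.1+cu124) additionally assumes a CUDA 12.4 wheel.

pip install sentence-transformers==3.4.1 transformers==4.51.3 torch==2.5.1 accelerate==1.3.0 datasets==3.6.0 tokenizers==0.21.0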

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveTensionLoss

@inproceedings{carlsson2021semantic,
    title={Semantic Re-tuning with Contrastive Tension},
    author={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\"a}{\"a} Hellqvist and Magnus Sahlgren},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=Ov_sMNau-PF}
}