Paper: [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084) (arXiv:1908.10084)
This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
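The Pooling and Normalize modules above amount to mean pooling over the token embeddings (ignoring padding, per the `pooling_mode_mean_tokens: True` setting) followed by L2 normalization. A minimal numpy sketch of those two steps, using random toy token embeddings and a hypothetical attention mask rather than real BERT outputs:

```python
import numpy as np

# Toy stand-ins: 2 sentences, 4 token positions, 384-dim token embeddings
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 384))
attention_mask = np.array([[1, 1, 1, 0],   # sentence 1: 3 real tokens, 1 pad
                           [1, 1, 0, 0]])  # sentence 2: 2 real tokens, 2 pads

# Mean pooling: average token embeddings over non-padding positions only
mask = attention_mask[:, :, None]               # (2, 4, 1)
summed = (token_embeddings * mask).sum(axis=1)  # (2, 384)
counts = mask.sum(axis=1)                       # (2, 1) token counts
pooled = summed / counts

# L2 normalization: every sentence embedding gets unit length
embeddings = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

print(embeddings.shape)                    # (2, 384)
print(np.linalg.norm(embeddings, axis=1))  # approximately [1. 1.]
```

Because of the final normalization step, dot products between embeddings are directly cosine similarities.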
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Stergios-Konstantinidis/MNLP_M2_document_encoder")

# Run inference
sentences = [
    ' "To generate queer warmth phrases, we employed persona prompting to adapt our SAE warmth phrases (see Table\\u00a04). Three distinct personas were designed and used as prompts to produce iterations of the 14 SAE warmth phrases. Each phrase was processed through all three persona prompts (see Table\\u00a08), resulting in a total of 42 unique queer warmth phrases. The final set of phrases is presented below.",',
    ' "To generate queer warmth phrases, we employed persona prompting to adapt our SAE warmth phrases (see Table\\u00a04). Three distinct personas were designed and used as prompts to produce iterations of the 14 SAE warmth phrases. Each phrase was processed through all three persona prompts (see Table\\u00a08), resulting in a total of 42 unique queer warmth phrases. The final set of phrases is presented below.",',
    ' "title": "Always skip attention",',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
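Since the model's final Normalize module outputs unit-length vectors, the similarity scores above are cosine similarities, and on normalized vectors cosine similarity reduces to a plain dot product. A small self-contained check of that equivalence, using toy 2-d vectors rather than real model outputs:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Normalize first; then the plain dot product equals the cosine similarity
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

print(cosine(a, b))             # 0.96
print(float(np.dot(a_n, b_n)))  # 0.96
```

This is why pre-normalized embeddings are convenient for large-scale semantic search: similarity lookups become pure matrix multiplications.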
The training dataset has three columns, sentence_0, sentence_1, and label:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | int |

Sample rows (in the first row, sentence_0 and sentence_1 are an identical positive pair):

| sentence_0 | sentence_1 | label |
|---|---|---|
| "the user may robustify the design by selecting a suitable $\widehat{A}$. Only the choice of $\widehat{A}$ has an impact at an algorithmic level and, normally, $\widehat{A}$ is tuned to a set $A$ that, in the user's mind, captures, and suitably describes, possible adversarial actions. Still, we remark that our results hold true for any choice of $\widehat{A}$ and $A$ (with $\widehat{A} \subseteq A$), so accommodating situations in which, e.g., the user envisages adversarial actions of a certain type and, yet, he is willing to theoretically test the robustness of the design against actions of higher magnitude. One simple example of this situation occurs when the design is done... | "the user may robustify the design by selecting a suitable $\widehat{A}$. Only the choice of $\widehat{A}$ has an impact at an algorithmic level and, normally, $\widehat{A}$ is tuned to a set $A$ that, in the user's mind, captures, and suitably describes, possible adversarial actions. Still, we remark that our results hold true for any choice of $\widehat{A}$ and $A$ (with $\widehat{A} \subseteq A$), so accommodating situations in which, e.g., the user envisages adversarial actions of a certain type and, yet, he is willing to theoretically test the robustness of the design against actions of higher magnitude. One simple example of this situation occurs when the design is done... | 1 |
| "Aha Moment of R1-Reward. Through our task design and reward function formulation, the R1-Reward model effectively learns the reward modeling task structure during the SFT phase. Following reinforcement learning, it reduces the length of reasoning to enhance efficiency. Visual examples of the model's output appear in Figures 3 and 6. The model autonomously learns a process to assess response quality. It first defines the goal, analyzes the image, attempts to solve the problem, and provides an answer. Based on this, the model evaluates Response 1 and Response 2, compares the two outputs, and gives a final ranking. Simultaneously, the model demonstrates different reflection patterns. In Figure 3, the model encounters an error in its calculation, but after rechecking the bar chart, it recognizes the mistake and recalculates to obtain the correct result. In Figure 6, the model misunderstands the problem. However, after outputting "Wait, re-reading the ... | "In an ideal case, the hole made after the punch doesn't move and keeps the size of the needle. Then the hole is filled with a subsequent paint layer, if it is not made in the top layer." | 0 |
| "In our search for the optimal parameters, we evaluated all possible combinations presented in Section 3.3. To do this, we aggregated the results for each specific parameter configuration and computed the mean metrics. This approach allowed us to isolate the effects of each parameter under evaluation." | "We employ RWP to model the movement of humans within the indoor space and use the Matern hard-core process (MHCP) to model static obstacles, such as furniture or static humans, in the environment [15]." | 0 |
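Pairs like these, labeled 1 for identical (positive) pairs and 0 for mismatched ones, are what Contrastive Tension-style training consumes: each pair is scored by a dot product of the two embeddings, and a binary cross-entropy loss pushes positive-pair scores up and negative-pair scores down. A minimal numpy sketch of that objective, using toy 2-d embeddings in place of the two encoders' real outputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def contrastive_tension_loss(emb_a, emb_b, labels):
    """Binary cross-entropy on the dot-product score of each pair (sketch)."""
    scores = sigmoid(np.sum(emb_a * emb_b, axis=1))  # one score per pair
    eps = 1e-12  # numerical guard for log(0)
    return float(np.mean(-(labels * np.log(scores + eps)
                           + (1 - labels) * np.log(1 - scores + eps))))

# Toy pairs: first pair identical (label 1), second orthogonal (label 0)
emb_a = np.array([[1.0, 0.0], [1.0, 0.0]])
emb_b = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([1.0, 0.0])

good = contrastive_tension_loss(emb_a, emb_b, labels)

# Flipping the labels must make the loss strictly worse
bad = contrastive_tension_loss(emb_a, emb_b, 1.0 - labels)
print(good < bad)  # True
```

This is only the scoring/loss step; the actual ContrastiveTensionLoss in sentence-transformers additionally maintains two independent copies of the encoder during training, per the Carlsson et al. (2021) paper cited below.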
Loss: ContrastiveTensionLoss

Non-default training hyperparameters:

```
per_device_train_batch_size: 3
per_device_eval_batch_size: 3
num_train_epochs: 10
multi_dataset_batch_sampler: round_robin
```

All hyperparameters:

```
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 3
per_device_eval_batch_size: 3
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 10
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters: 
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
```

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0714 | 500 | 1.8871 |
| 0.1429 | 1000 | 1.7445 |
| 0.2143 | 1500 | 1.7138 |
| 0.2857 | 2000 | 1.699 |
| 0.3571 | 2500 | 1.6729 |
| 0.4286 | 3000 | 1.6864 |
| 0.5 | 3500 | 1.6718 |
| 0.5714 | 4000 | 1.6754 |
| 0.6429 | 4500 | 1.6747 |
| 0.7143 | 5000 | 1.6709 |
| 0.7857 | 5500 | 1.6797 |
| 0.8571 | 6000 | 1.6768 |
| 0.9286 | 6500 | 1.6694 |
| 1.0 | 7000 | 1.6754 |
| 1.0714 | 7500 | 1.6632 |
| 1.1429 | 8000 | 1.6643 |
| 1.2143 | 8500 | 1.6553 |
| 1.2857 | 9000 | 1.6626 |
| 1.3571 | 9500 | 1.6734 |
| 1.4286 | 10000 | 1.673 |
| 1.5 | 10500 | 1.6611 |
| 1.5714 | 11000 | 1.671 |
| 1.6429 | 11500 | 1.6762 |
| 1.7143 | 12000 | 1.6717 |
| 1.7857 | 12500 | 1.6599 |
| 1.8571 | 13000 | 1.681 |
| 1.9286 | 13500 | 1.6715 |
| 2.0 | 14000 | 1.6815 |
| 2.0714 | 14500 | 1.6304 |
| 2.1429 | 15000 | 1.6351 |
| 2.2143 | 15500 | 1.648 |
| 2.2857 | 16000 | 1.6538 |
| 2.3571 | 16500 | 1.6396 |
| 2.4286 | 17000 | 1.632 |
| 2.5 | 17500 | 1.6497 |
| 2.5714 | 18000 | 1.6526 |
| 2.6429 | 18500 | 1.6346 |
| 2.7143 | 19000 | 1.6548 |
| 2.7857 | 19500 | 1.6549 |
| 2.8571 | 20000 | 1.6438 |
| 2.9286 | 20500 | 1.6448 |
| 3.0 | 21000 | 1.6435 |
| 3.0714 | 21500 | 1.589 |
| 3.1429 | 22000 | 1.6075 |
| 3.2143 | 22500 | 1.6084 |
| 3.2857 | 23000 | 1.6061 |
| 3.3571 | 23500 | 1.6121 |
| 3.4286 | 24000 | 1.6168 |
| 3.5 | 24500 | 1.6022 |
| 3.5714 | 25000 | 1.6164 |
| 3.6429 | 25500 | 1.6132 |
| 3.7143 | 26000 | 1.6036 |
| 3.7857 | 26500 | 1.6077 |
| 3.8571 | 27000 | 1.6241 |
| 3.9286 | 27500 | 1.6224 |
| 4.0 | 28000 | 1.6023 |
| 4.0714 | 28500 | 1.5479 |
| 4.1429 | 29000 | 1.5569 |
| 4.2143 | 29500 | 1.5611 |
| 4.2857 | 30000 | 1.5413 |
| 4.3571 | 30500 | 1.5568 |
| 4.4286 | 31000 | 1.5458 |
| 4.5 | 31500 | 1.5405 |
| 4.5714 | 32000 | 1.5707 |
| 4.6429 | 32500 | 1.557 |
| 4.7143 | 33000 | 1.5561 |
| 4.7857 | 33500 | 1.5698 |
| 4.8571 | 34000 | 1.546 |
| 4.9286 | 34500 | 1.5589 |
| 5.0 | 35000 | 1.5692 |
| 5.0714 | 35500 | 1.5029 |
| 5.1429 | 36000 | 1.5087 |
| 5.2143 | 36500 | 1.4882 |
| 5.2857 | 37000 | 1.5116 |
| 5.3571 | 37500 | 1.5016 |
| 5.4286 | 38000 | 1.4988 |
| 5.5 | 38500 | 1.5065 |
| 5.5714 | 39000 | 1.5089 |
| 5.6429 | 39500 | 1.5104 |
| 5.7143 | 40000 | 1.4937 |
| 5.7857 | 40500 | 1.4974 |
| 5.8571 | 41000 | 1.5095 |
| 5.9286 | 41500 | 1.5064 |
| 6.0 | 42000 | 1.5119 |
| 6.0714 | 42500 | 1.4572 |
| 6.1429 | 43000 | 1.4732 |
| 6.2143 | 43500 | 1.4534 |
| 6.2857 | 44000 | 1.4598 |
| 6.3571 | 44500 | 1.4626 |
| 6.4286 | 45000 | 1.4486 |
| 6.5 | 45500 | 1.4677 |
| 6.5714 | 46000 | 1.4705 |
| 6.6429 | 46500 | 1.4757 |
| 6.7143 | 47000 | 1.4724 |
| 6.7857 | 47500 | 1.4744 |
| 6.8571 | 48000 | 1.4571 |
| 6.9286 | 48500 | 1.4571 |
| 7.0 | 49000 | 1.4549 |
| 7.0714 | 49500 | 1.4198 |
| 7.1429 | 50000 | 1.4328 |
| 7.2143 | 50500 | 1.4322 |
| 7.2857 | 51000 | 1.4191 |
| 7.3571 | 51500 | 1.4355 |
| 7.4286 | 52000 | 1.4409 |
| 7.5 | 52500 | 1.4366 |
| 7.5714 | 53000 | 1.4378 |
| 7.6429 | 53500 | 1.4229 |
| 7.7143 | 54000 | 1.4386 |
| 7.7857 | 54500 | 1.453 |
| 7.8571 | 55000 | 1.419 |
| 7.9286 | 55500 | 1.4215 |
| 8.0 | 56000 | 1.4248 |
| 8.0714 | 56500 | 1.4184 |
| 8.1429 | 57000 | 1.4059 |
| 8.2143 | 57500 | 1.4011 |
| 8.2857 | 58000 | 1.3962 |
| 8.3571 | 58500 | 1.4134 |
| 8.4286 | 59000 | 1.4104 |
| 8.5 | 59500 | 1.3924 |
| 8.5714 | 60000 | 1.4062 |
| 8.6429 | 60500 | 1.4117 |
| 8.7143 | 61000 | 1.4192 |
| 8.7857 | 61500 | 1.402 |
| 8.8571 | 62000 | 1.3998 |
| 8.9286 | 62500 | 1.4087 |
| 9.0 | 63000 | 1.4203 |
| 9.0714 | 63500 | 1.389 |
| 9.1429 | 64000 | 1.4049 |
| 9.2143 | 64500 | 1.3897 |
| 9.2857 | 65000 | 1.3839 |
| 9.3571 | 65500 | 1.3712 |
| 9.4286 | 66000 | 1.3908 |
| 9.5 | 66500 | 1.3986 |
| 9.5714 | 67000 | 1.4014 |
| 9.6429 | 67500 | 1.3919 |
| 9.7143 | 68000 | 1.404 |
| 9.7857 | 68500 | 1.3921 |
| 9.8571 | 69000 | 1.3918 |
| 9.9286 | 69500 | 1.4046 |
| 10.0 | 70000 | 1.3923 |
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@inproceedings{carlsson2021semantic,
    title={Semantic Re-tuning with Contrastive Tension},
    author={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\"a}{\"a} Hellqvist and Magnus Sahlgren},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=Ov_sMNau-PF}
}
```
Base model: sentence-transformers/all-MiniLM-L6-v2