How to use Stergios-Konstantinidis/MNLP_M2_document_encoder with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Stergios-Konstantinidis/MNLP_M2_document_encoder")
sentences = [
    "The lemma follows by invoking Lemma 4.1 and Lemma A.1.\n\u220e",
    (
        r"To better address non-stationarity with changing uncertainty, we introduce Location-Scale Noise Model (LSNM) into DDPMs, "
        r"which relaxes the traditional Additive Noise Model (ANM) by incorporating a contextually changing variance: "
        r"$\mathbf{Y}=f(\mathbf{X})+\sqrt{g(\mathbf{X})}\boldsymbol{\epsilon}$, where $g(\mathbf{X})$ is an $\mathbf{X}$-dependent variance model. "
        r"LSNM is capable of modeling both the contextual mean through $f(\mathbf{X})$ and the contextual uncertainty through $\sqrt{g(\mathbf{X})}$. "
        r"In the special case where $g(\mathbf{X})\equiv 1$, this simplifies to the standard ANM. "
        r"Building upon this more flexible and expressive assumption, we propose the Non-stationary Diffusion Model (NsDiff) framework, "
        r"which provides an uncertainty-aware noise schedule for both forward and reverse diffusion processes. "
        r"In summary, our contributions are as: "
        r"• We observe that the ANM is inadequate for capturing the varying uncertainty and propose a novel framework that integrates LSNM to allow for explict uncertainty modeling. "
        r"This work is the first attempt to introduce LSNM into probabilistic time series forecasting. "
        r"• To fundamentally elevate the noise modeling capabilities of DDPM, we seamlessly integrate time-varying variances into the core diffusion process "
        r"through an uncertainty-aware noise schedule that dynamically adapts the noise variance at each step. "
        r"• Experimental results indicate that NsDiff achieves superior performance in capturing uncertainty. "
        r"Specifically, in comparison to the second-best recent baseline TMDM, NsDiff improves up to 66.3% on real-world datasets and 88.3% on synthetic datasets."
    ),
    (
        r"The deep neural network representation of the Bifrost simulations is highly compressed compared to the original Bifrost data: "
        r"the deep neural network has 44,261 floating point values whereas the Bifrost simulation cube has $96\cdot 96\cdot 64\cdot 20=11,796,480$ floating point values. "
        r"This corresponds to a compression by a factor of 267; this compression factor may be different for other numerical simulations and depends on their smoothness. "
        r"In addition, the deep neural network can be evaluated at any point in space and time covered by the simulations, "
        r"therefore enabling a trivial way to interpolate between grid points; furthermore, gradients are calculate with high efficiency with automatic differentiation. "
        r"As such, it might be worth considering releasing deep-neural-network representations of (magneto)hydrodynamic simulations."
    ),
    (
        r"$\epsilon_{y}(\mu)=\begin{cases}"
        r"\frac{1}{n_{t}}\sum\limits_{i=n_{k}}^{n_{t}}e_{y}(t_{i},\mu)"
        r"=\frac{1}{n_{t}}\sum\limits_{i=n_{k}}^{n_{t}}|\tilde{y}(t_{i},\mu)-y(t_{i},\mu)|"
        r"&\text{if }\frac{1}{n_{t}}\sum\limits_{i=n_{k}}^{n_{t}}|y(t_{i},\mu)|\leq 1,\\ "
        r"\frac{1}{n_{t}}\sum\limits_{i=n_{k}}^{n_{t}}e_{y,rel}(t_{i},\mu)"
        r"=\frac{1}{n_{t}}\sum\limits_{i=n_{k}}^{n_{t}}|\tilde{y}(t_{i},\mu)-y(t_{i},\mu)|/|y(t_{i},\mu)|"
        r"&\text{if }\frac{1}{n_{t}}\sum\limits_{i=n_{k}}^{n_{t}}|y(t_{i},\mu)|>1."
        r"\end{cases}$ (12)"
    ),
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
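The Pooling and Normalize stages listed above can be illustrated with a small standalone sketch. It assumes that mean pooling averages only the non-padding token vectors and that Normalize rescales the result to unit L2 norm; the function name `mean_pool_and_normalize` and the toy 4-dimensional vectors are hypothetical (the real model uses 384 dimensions), and this is not the library's own implementation.

```python
import math

def mean_pool_and_normalize(token_vectors, attention_mask):
    """Average the unmasked token vectors, then scale the result to unit L2 norm."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask removes every token")
    dim = len(kept[0])
    # Mean over the kept tokens, one dimension at a time.
    mean = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in mean] if norm > 0 else mean

# Toy input: three token vectors; the last position is padding and is ignored.
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool_and_normalize(tokens, mask)
print(sentence_vector)
```

Because the output has unit length, the dot product of two such vectors equals their cosine similarity.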
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Stergios-Konstantinidis/MNLP_M2_document_encoder")
# Run inference
sentences = [
' "To generate queer warmth phrases, we employed persona prompting to adapt our SAE warmth phrases (see Table\\u00a04). Three distinct personas were designed and used as prompts to produce iterations of the 14 SAE warmth phrases. Each phrase was processed through all three persona prompts (see Table\\u00a08), resulting in a total of 42 unique queer warmth phrases. The final set of phrases is presented below.",',
' "To generate queer warmth phrases, we employed persona prompting to adapt our SAE warmth phrases (see Table\\u00a04). Three distinct personas were designed and used as prompts to produce iterations of the 14 SAE warmth phrases. Each phrase was processed through all three persona prompts (see Table\\u00a08), resulting in a total of 42 unique queer warmth phrases. The final set of phrases is presented below.",',
' "title": "Always skip attention",',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
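To make the shape of the similarity output concrete, here is a minimal pure-Python sketch of a pairwise cosine-similarity matrix. It assumes the model uses cosine similarity (the Sentence Transformers default, as far as I know); the helper names and the 3-dimensional toy vectors are hypothetical stand-ins for real 384-dimensional embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """All-pairs similarity: entry [i][j] compares vectors[i] with vectors[j]."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy stand-ins for three embeddings; the first two are identical,
# mirroring the duplicated first two sentences in the example above.
toy = [[0.6, 0.8, 0.0], [0.6, 0.8, 0.0], [0.0, 0.6, 0.8]]
sims = similarity_matrix(toy)
for row in sims:
    print([round(value, 2) for value in row])
```

Identical inputs score 1.0 against each other, so in the real example the entry for the first two sentences should be at (or very near) the maximum.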
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | int |

Samples:
| sentence_0 | sentence_1 | label |
|---|---|---|
| the user may robustify the design by selecting a suitable $\widehat{A}$. Only the choice of $\widehat{A}$ has an impact at an algorithmic level and, normally, $\widehat{A}$ is tuned to a set $A$ that, in the user’s mind, captures, and suitably describes, possible adversarial actions. Still, we remark that our results hold true for any choice of $\widehat{A}$ and $A$ (with $\widehat{A}\subseteq A$), so accommodating situations in which, e.g., the user envisages adversarial actions of a certain type and, yet, he is willing to theoretically test the robustness of the design against actions of higher magnitude. One simple example of this situation occurs when the design is done... | the user may robustify the design by selecting a suitable $\widehat{A}$. Only the choice of $\widehat{A}$ has an impact at an algorithmic level and, normally, $\widehat{A}$ is tuned to a set $A$ that, in the user’s mind, captures, and suitably describes, possible adversarial actions. Still, we remark that our results hold true for any choice of $\widehat{A}$ and $A$ (with $\widehat{A}\subseteq A$), so accommodating situations in which, e.g., the user envisages adversarial actions of a certain type and, yet, he is willing to theoretically test the robustness of the design against actions of higher magnitude. One simple example of this situation occurs when the design is done... | 1 |
| Aha Moment of R1-Reward. Through our task design and reward function formulation, the R1-Reward model effectively learns the reward modeling task structure during the SFT phase. Following reinforcement learning, it reduces the length of reasoning to enhance efficiency. Visual examples of the model’s output appear in Figures 3 and 6. The model autonomously learns a process to assess response quality. It first defines the goal, analyzes the image, attempts to solve the problem, and provides an answer. Based on this, the model evaluates Response 1 and Response 2, compares the two outputs, and gives a final ranking. Simultaneously, the model demonstrates different reflection patterns. In Figure 3, the model encounters an error in its calculation, but after rechecking the bar chart, it recognizes the mistake and recalculates to obtain the correct result. In Figure 6, the model misunderstands the problem. However, after outputting “Wait, re-reading the ... | In an ideal case, the hole made after the punch doesn’t move and keeps the size of the needle. Then the hole is filled with a subsequent paint layer, if it is not made in the top layer. | 0 |
| In our search for the optimal parameters, we evaluated all possible combinations presented in Section 3.3. To do this, we aggregated the results for each specific parameter configuration and computed the mean metrics. This approach allowed us to isolate the effects of each parameter under evaluation. | We employ RWP to model the movement of humans within the indoor space and use the Matern hard-core process (MHCP) to model static obstacles, such as furniture or static humans, in the environment [15]. | 0 |
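The label column (1 for an identical pair, 0 for an unrelated pair) suggests how the Contrastive Tension objective consumes these samples. The sketch below is an assumption-laden illustration, not the library's code: it assumes the score for a pair is the dot product of the two embeddings (produced by two separate encoder copies in Carlsson et al., 2021) and that the loss is binary cross-entropy on that score, summed over the batch. The function name and the 2-dimensional toy embeddings are hypothetical.

```python
import math

def contrastive_tension_loss(pairs):
    """Sum of binary cross-entropy terms over (u, v, label) triples.

    u and v stand in for the embeddings of sentence_0 and sentence_1;
    the logit for a pair is their dot product.
    """
    total = 0.0
    for u, v, label in pairs:
        logit = sum(a * b for a, b in zip(u, v))
        # Numerically stable BCE with logits:
        # max(x, 0) - x * y + log(1 + exp(-|x|))
        total += max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
    return total

# Hypothetical embeddings for a batch of three pairs (batch size 3, as in training).
batch = [
    ([1.0, 0.0], [1.0, 0.0], 1),   # identical sentences, label 1
    ([1.0, 0.0], [0.0, 1.0], 0),   # different sentences, label 0
    ([0.0, 1.0], [-1.0, 0.0], 0),  # different sentences, label 0
]
loss = contrastive_tension_loss(batch)
print(loss)
```

Under these assumptions the loss falls when identical pairs get a high dot product and unrelated pairs get a low one.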
Loss: ContrastiveTensionLoss

Non-default hyperparameters:
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- num_train_epochs: 10
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0714 | 500 | 1.8871 |
| 0.1429 | 1000 | 1.7445 |
| 0.2143 | 1500 | 1.7138 |
| 0.2857 | 2000 | 1.699 |
| 0.3571 | 2500 | 1.6729 |
| 0.4286 | 3000 | 1.6864 |
| 0.5 | 3500 | 1.6718 |
| 0.5714 | 4000 | 1.6754 |
| 0.6429 | 4500 | 1.6747 |
| 0.7143 | 5000 | 1.6709 |
| 0.7857 | 5500 | 1.6797 |
| 0.8571 | 6000 | 1.6768 |
| 0.9286 | 6500 | 1.6694 |
| 1.0 | 7000 | 1.6754 |
| 1.0714 | 7500 | 1.6632 |
| 1.1429 | 8000 | 1.6643 |
| 1.2143 | 8500 | 1.6553 |
| 1.2857 | 9000 | 1.6626 |
| 1.3571 | 9500 | 1.6734 |
| 1.4286 | 10000 | 1.673 |
| 1.5 | 10500 | 1.6611 |
| 1.5714 | 11000 | 1.671 |
| 1.6429 | 11500 | 1.6762 |
| 1.7143 | 12000 | 1.6717 |
| 1.7857 | 12500 | 1.6599 |
| 1.8571 | 13000 | 1.681 |
| 1.9286 | 13500 | 1.6715 |
| 2.0 | 14000 | 1.6815 |
| 2.0714 | 14500 | 1.6304 |
| 2.1429 | 15000 | 1.6351 |
| 2.2143 | 15500 | 1.648 |
| 2.2857 | 16000 | 1.6538 |
| 2.3571 | 16500 | 1.6396 |
| 2.4286 | 17000 | 1.632 |
| 2.5 | 17500 | 1.6497 |
| 2.5714 | 18000 | 1.6526 |
| 2.6429 | 18500 | 1.6346 |
| 2.7143 | 19000 | 1.6548 |
| 2.7857 | 19500 | 1.6549 |
| 2.8571 | 20000 | 1.6438 |
| 2.9286 | 20500 | 1.6448 |
| 3.0 | 21000 | 1.6435 |
| 3.0714 | 21500 | 1.589 |
| 3.1429 | 22000 | 1.6075 |
| 3.2143 | 22500 | 1.6084 |
| 3.2857 | 23000 | 1.6061 |
| 3.3571 | 23500 | 1.6121 |
| 3.4286 | 24000 | 1.6168 |
| 3.5 | 24500 | 1.6022 |
| 3.5714 | 25000 | 1.6164 |
| 3.6429 | 25500 | 1.6132 |
| 3.7143 | 26000 | 1.6036 |
| 3.7857 | 26500 | 1.6077 |
| 3.8571 | 27000 | 1.6241 |
| 3.9286 | 27500 | 1.6224 |
| 4.0 | 28000 | 1.6023 |
| 4.0714 | 28500 | 1.5479 |
| 4.1429 | 29000 | 1.5569 |
| 4.2143 | 29500 | 1.5611 |
| 4.2857 | 30000 | 1.5413 |
| 4.3571 | 30500 | 1.5568 |
| 4.4286 | 31000 | 1.5458 |
| 4.5 | 31500 | 1.5405 |
| 4.5714 | 32000 | 1.5707 |
| 4.6429 | 32500 | 1.557 |
| 4.7143 | 33000 | 1.5561 |
| 4.7857 | 33500 | 1.5698 |
| 4.8571 | 34000 | 1.546 |
| 4.9286 | 34500 | 1.5589 |
| 5.0 | 35000 | 1.5692 |
| 5.0714 | 35500 | 1.5029 |
| 5.1429 | 36000 | 1.5087 |
| 5.2143 | 36500 | 1.4882 |
| 5.2857 | 37000 | 1.5116 |
| 5.3571 | 37500 | 1.5016 |
| 5.4286 | 38000 | 1.4988 |
| 5.5 | 38500 | 1.5065 |
| 5.5714 | 39000 | 1.5089 |
| 5.6429 | 39500 | 1.5104 |
| 5.7143 | 40000 | 1.4937 |
| 5.7857 | 40500 | 1.4974 |
| 5.8571 | 41000 | 1.5095 |
| 5.9286 | 41500 | 1.5064 |
| 6.0 | 42000 | 1.5119 |
| 6.0714 | 42500 | 1.4572 |
| 6.1429 | 43000 | 1.4732 |
| 6.2143 | 43500 | 1.4534 |
| 6.2857 | 44000 | 1.4598 |
| 6.3571 | 44500 | 1.4626 |
| 6.4286 | 45000 | 1.4486 |
| 6.5 | 45500 | 1.4677 |
| 6.5714 | 46000 | 1.4705 |
| 6.6429 | 46500 | 1.4757 |
| 6.7143 | 47000 | 1.4724 |
| 6.7857 | 47500 | 1.4744 |
| 6.8571 | 48000 | 1.4571 |
| 6.9286 | 48500 | 1.4571 |
| 7.0 | 49000 | 1.4549 |
| 7.0714 | 49500 | 1.4198 |
| 7.1429 | 50000 | 1.4328 |
| 7.2143 | 50500 | 1.4322 |
| 7.2857 | 51000 | 1.4191 |
| 7.3571 | 51500 | 1.4355 |
| 7.4286 | 52000 | 1.4409 |
| 7.5 | 52500 | 1.4366 |
| 7.5714 | 53000 | 1.4378 |
| 7.6429 | 53500 | 1.4229 |
| 7.7143 | 54000 | 1.4386 |
| 7.7857 | 54500 | 1.453 |
| 7.8571 | 55000 | 1.419 |
| 7.9286 | 55500 | 1.4215 |
| 8.0 | 56000 | 1.4248 |
| 8.0714 | 56500 | 1.4184 |
| 8.1429 | 57000 | 1.4059 |
| 8.2143 | 57500 | 1.4011 |
| 8.2857 | 58000 | 1.3962 |
| 8.3571 | 58500 | 1.4134 |
| 8.4286 | 59000 | 1.4104 |
| 8.5 | 59500 | 1.3924 |
| 8.5714 | 60000 | 1.4062 |
| 8.6429 | 60500 | 1.4117 |
| 8.7143 | 61000 | 1.4192 |
| 8.7857 | 61500 | 1.402 |
| 8.8571 | 62000 | 1.3998 |
| 8.9286 | 62500 | 1.4087 |
| 9.0 | 63000 | 1.4203 |
| 9.0714 | 63500 | 1.389 |
| 9.1429 | 64000 | 1.4049 |
| 9.2143 | 64500 | 1.3897 |
| 9.2857 | 65000 | 1.3839 |
| 9.3571 | 65500 | 1.3712 |
| 9.4286 | 66000 | 1.3908 |
| 9.5 | 66500 | 1.3986 |
| 9.5714 | 67000 | 1.4014 |
| 9.6429 | 67500 | 1.3919 |
| 9.7143 | 68000 | 1.404 |
| 9.7857 | 68500 | 1.3921 |
| 9.8571 | 69000 | 1.3918 |
| 9.9286 | 69500 | 1.4046 |
| 10.0 | 70000 | 1.3923 |
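The Epoch and Step columns can be cross-checked against the listed hyperparameters with a little arithmetic. The sketch below assumes a single training device (so one step consumes `per_device_train_batch_size` pairs) and a plain linear decay to zero for the `linear` scheduler with no warmup; the helper names are hypothetical, and the sample count is approximate because the last batch of an epoch may be partial.

```python
STEPS_PER_EPOCH = 7000   # from the log: step 7000 corresponds to epoch 1.0
BATCH_SIZE = 3           # per_device_train_batch_size
EPOCHS = 10              # num_train_epochs
BASE_LR = 5e-05          # learning_rate
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS

def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return step / STEPS_PER_EPOCH

def linear_lr(step):
    """Linear decay from BASE_LR to 0 with no warmup (warmup_steps: 0)."""
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / TOTAL_STEPS)

# Assumes one device and gradient_accumulation_steps: 1.
pairs_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE

print(round(epoch_at(500), 4))   # matches the first log row
print(TOTAL_STEPS)               # matches the last log row
print(pairs_per_epoch)           # approximate number of training pairs
print(linear_lr(35000))          # learning rate halfway through training
```

Under these assumptions the training set holds roughly 21,000 sentence pairs, and the learning rate at the midpoint of training is half its initial value.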
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@inproceedings{carlsson2021semantic,
title={Semantic Re-tuning with Contrastive Tension},
author={Fredrik Carlsson and Amaru Cuba Gyllensten and Evangelia Gogoulou and Erik Ylip{\"a}{\"a} Hellqvist and Magnus Sahlgren},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=Ov_sMNau-PF}
}
Base model: sentence-transformers/all-MiniLM-L6-v2