This is a sentence-transformers model finetuned from Master-thesis-NAP/ModernBert-DAPT-math. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
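The three modules above mean each text is run through the transformer, its token embeddings are mean-pooled, and the result is L2-normalized. A minimal NumPy sketch of the pooling and normalization steps (the function name and toy tensors are illustrative, not part of the library):

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Illustrative sketch of the Pooling + Normalize modules:
    mask-aware mean over token embeddings, then L2 normalization."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    mean = summed / counts
    return mean / np.linalg.norm(mean, axis=1, keepdims=True)

# Toy batch of 1, sequence length 3 (last token is padding), dim 2
tokens = np.array([[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
vec = mean_pool_and_normalize(tokens, mask)
print(np.linalg.norm(vec, axis=1))  # [1.] — output embeddings are unit length
```

Because padding is masked out, the padded token's values do not leak into the sentence embedding.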
First install the Sentence Transformers library:

```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Master-thesis-NAP/ModernBERT-DAPT-Embed-DAPT-Math")
# Run inference
sentences = [
    "Does Werner-Young's inequality imply that the convolution of two $L^p$ spaces is always $L^r$ for $1 < r < \\infty$?",
    "[Werner-Young's inequality]\\label{Young op-op}\nSuppose $S\\in \\cS^p$ and $T\\in \\cS^q$ with $1+r^{-1}=p^{-1}+q^{-1}$.\nThen $S\\star T\\in L^r(\\R^{2d})$ and\n\\begin{align*}\n \\|S\\star T\\|_{L^{r}}\\leq \\|S\\|_{\\cS^p}\\|T\\|_{\\cS^q}.\n\\end{align*}",
    '$\\cE^{(0)}_{p,\\alpha}$ satisfies the second Beurling-Deny criterion. If $1 < p_- \\leq p_+ < \\infty$, it is reflexive and satisfies the $\\Delta_2$-condition. \n %',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
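Because the model's final Normalize module makes every embedding unit-length, the cosine scores returned by `model.similarity` reduce to plain dot products. A small sketch with stand-in unit vectors (not real model outputs) makes this concrete:

```python
import numpy as np

# Stand-in L2-normalized "embeddings" (not real model outputs).
emb = np.array([[0.6, 0.8], [0.8, 0.6], [1.0, 0.0]])

# For unit vectors, cosine similarity is just the dot product,
# so the full similarity matrix is a single matrix multiply.
cosine = emb @ emb.T
print(cosine.shape)  # (3, 3); each diagonal entry is 1.0 (self-similarity)
```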
Evaluated with `InformationRetrievalEvaluator` on the `TESTING` dataset:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.568 |
| cosine_accuracy@3 | 0.6324 |
| cosine_accuracy@5 | 0.6586 |
| cosine_accuracy@10 | 0.6938 |
| cosine_precision@1 | 0.568 |
| cosine_precision@3 | 0.3649 |
| cosine_precision@5 | 0.2774 |
| cosine_precision@10 | 0.1819 |
| cosine_recall@1 | 0.0265 |
| cosine_recall@3 | 0.0487 |
| cosine_recall@5 | 0.0599 |
| cosine_recall@10 | 0.0752 |
| cosine_ndcg@10 | 0.2532 |
| cosine_mrr@10 | 0.607 |
| cosine_map@100 | 0.0742 |
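The gap between high accuracy@1 (0.568) and low recall@1 (0.0265) indicates that each query has many relevant documents, of which a single top hit retrieves only a small fraction. A short sketch of the two metrics on hypothetical data (the document IDs and counts are illustrative only):

```python
def accuracy_at_k(ranked, relevant, k):
    """1 if any relevant doc appears in the top-k, else 0; averaged over queries."""
    hits = [any(doc in rel for doc in docs[:k]) for docs, rel in zip(ranked, relevant)]
    return sum(hits) / len(hits)

def recall_at_k(ranked, relevant, k):
    """Fraction of each query's relevant docs found in the top-k; averaged over queries."""
    fracs = [len(set(docs[:k]) & rel) / len(rel) for docs, rel in zip(ranked, relevant)]
    return sum(fracs) / len(fracs)

# One hypothetical query with 20 relevant docs: a single correct top hit
# yields accuracy@1 = 1.0 but recall@1 = 0.05, mirroring the gap above.
ranked = [["d1", "d7", "d9"]]
relevant = [{"d1", *{f"r{i}" for i in range(19)}}]
print(accuracy_at_k(ranked, relevant, 1))  # 1.0
print(recall_at_k(ranked, relevant, 1))    # 0.05
```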
The training dataset has three columns: `anchor`, `positive`, and `negative`.

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
Samples:

| anchor | positive | negative |
|---|---|---|
| What is the limit of the proportion of 1's in the sequence $a_n$ as $n$ approaches infinity, given that $0 \leq 3g_n -2n \leq 4$? | Let $g_n$ be the number of $1$'s in the sequence $a_1 a_2 \cdots a_n$. | \label{thm:bounds_initial} |
| Does the statement of \textbf{ThmConjAreTrue} imply that the maximum genus of a locally Cohen-Macaulay curve in $\mathbb{P}^3_{\mathbb{C}}$ of degree $d$ that does not lie on a surface of degree $s-1$ is always equal to $g(d,s)$? | \label{ThmConjAreTrue} | [{\cite[Corollary 2.2.2 with $p=3$]{BSY}}] |
| \emph{Is the statement \emph{If $X$ is a compact Hausdorff space, then $X$ is normal}, proven in the first isomorphism theorem for topological groups, or is it a well-known result in topology?} | } | \label{prop:coherence} |
Loss: `TripletLoss` with these parameters:

```json
{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.1
}
```
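With the cosine distance metric, the triplet loss pushes each anchor at least `triplet_margin` closer (in cosine distance) to its positive than to its negative: `loss = max(d(a, p) - d(a, n) + margin, 0)` with `d(x, y) = 1 - cos(x, y)`. A minimal sketch of that formula (toy vectors, not library code):

```python
import numpy as np

def cosine_distance(x, y):
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def triplet_loss(anchor, positive, negative, margin=0.1):
    """max(d(a, p) - d(a, n) + margin, 0), matching
    TripletDistanceMetric.COSINE with triplet_margin = 0.1."""
    return max(cosine_distance(anchor, positive)
               - cosine_distance(anchor, negative) + margin, 0.0)

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])   # identical to anchor: d(a, p) = 0
n = np.array([0.0, 1.0])   # orthogonal to anchor: d(a, n) = 1
print(triplet_loss(a, p, n))  # 0.0 — the negative already clears the margin
```

When the negative is not yet farther than the positive by at least the margin, the loss is positive and gradients pull the embeddings apart.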
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 8
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | TESTING_cosine_ndcg@10 |
|---|---|---|---|
| 0.0160 | 10 | 1.1162 | - |
| 0.0320 | 20 | 1.0465 | - |
| 0.0481 | 30 | 0.9663 | - |
| 0.0641 | 40 | 0.8758 | - |
| 0.0801 | 50 | 0.8215 | - |
| 0.0961 | 60 | 0.7492 | - |
| 0.1122 | 70 | 0.6356 | - |
| 0.1282 | 80 | 0.3573 | - |
| 0.1442 | 90 | 0.166 | - |
| 0.1602 | 100 | 0.0797 | - |
| 0.1762 | 110 | 0.046 | - |
| 0.1923 | 120 | 0.0419 | - |
| 0.2083 | 130 | 0.025 | - |
| 0.2243 | 140 | 0.0233 | - |
| 0.2403 | 150 | 0.0205 | - |
| 0.2564 | 160 | 0.0142 | - |
| 0.2724 | 170 | 0.017 | - |
| 0.2884 | 180 | 0.0157 | - |
| 0.3044 | 190 | 0.0104 | - |
| 0.3204 | 200 | 0.0126 | - |
| 0.3365 | 210 | 0.019 | - |
| 0.3525 | 220 | 0.0153 | - |
| 0.3685 | 230 | 0.0171 | - |
| 0.3845 | 240 | 0.0124 | - |
| 0.4006 | 250 | 0.01 | - |
| 0.4166 | 260 | 0.0071 | - |
| 0.4326 | 270 | 0.0125 | - |
| 0.4486 | 280 | 0.0096 | - |
| 0.4647 | 290 | 0.0092 | - |
| 0.4807 | 300 | 0.0067 | - |
| 0.4967 | 310 | 0.0069 | - |
| 0.5127 | 320 | 0.0054 | - |
| 0.5287 | 330 | 0.0107 | - |
| 0.5448 | 340 | 0.0115 | - |
| 0.5608 | 350 | 0.0083 | - |
| 0.5768 | 360 | 0.0175 | - |
| 0.5928 | 370 | 0.0162 | - |
| 0.6089 | 380 | 0.0094 | - |
| 0.6249 | 390 | 0.0124 | - |
| 0.6409 | 400 | 0.0078 | - |
| 0.6569 | 410 | 0.014 | - |
| 0.6729 | 420 | 0.0117 | - |
| 0.6890 | 430 | 0.0097 | - |
| 0.7050 | 440 | 0.0094 | - |
| 0.7210 | 450 | 0.0077 | - |
| 0.7370 | 460 | 0.0103 | - |
| 0.7531 | 470 | 0.0099 | - |
| 0.7691 | 480 | 0.0123 | - |
| 0.7851 | 490 | 0.0103 | - |
| 0.8011 | 500 | 0.0098 | - |
| 0.8171 | 510 | 0.0059 | - |
| 0.8332 | 520 | 0.0031 | - |
| 0.8492 | 530 | 0.0075 | - |
| 0.8652 | 540 | 0.0101 | - |
| 0.8812 | 550 | 0.0099 | - |
| 0.8973 | 560 | 0.0098 | - |
| 0.9133 | 570 | 0.0072 | - |
| 0.9293 | 580 | 0.0057 | - |
| 0.9453 | 590 | 0.0074 | - |
| 0.9613 | 600 | 0.0038 | - |
| 0.9774 | 610 | 0.0127 | - |
| 0.9934 | 620 | 0.0098 | - |
| 1.0 | 625 | - | 0.2532 |
| 1.0080 | 630 | 0.0064 | - |
| 1.0240 | 640 | 0.0066 | - |
| 1.0401 | 650 | 0.0056 | - |
| 1.0561 | 660 | 0.0031 | - |
| 1.0721 | 670 | 0.0023 | - |
| 1.0881 | 680 | 0.0032 | - |
| 1.1041 | 690 | 0.0021 | - |
| 1.1202 | 700 | 0.0011 | - |
| 1.1362 | 710 | 0.006 | - |
| 1.1522 | 720 | 0.0045 | - |
| 1.1682 | 730 | 0.0041 | - |
| 1.1843 | 740 | 0.0026 | - |
| 1.2003 | 750 | 0.0019 | - |
| 1.2163 | 760 | 0.0058 | - |
| 1.2323 | 770 | 0.0054 | - |
| 1.2483 | 780 | 0.0066 | - |
| 1.2644 | 790 | 0.0033 | - |
| 1.2804 | 800 | 0.004 | - |
| 1.2964 | 810 | 0.0028 | - |
| 1.3124 | 820 | 0.0027 | - |
| 1.3285 | 830 | 0.0017 | - |
| 1.3445 | 840 | 0.0009 | - |
| 1.3605 | 850 | 0.0048 | - |
| 1.3765 | 860 | 0.0037 | - |
| 1.3925 | 870 | 0.0045 | - |
| 1.4086 | 880 | 0.0043 | - |
| 1.4246 | 890 | 0.0046 | - |
| 1.4406 | 900 | 0.0023 | - |
| 1.4566 | 910 | 0.0031 | - |
| 1.4727 | 920 | 0.0027 | - |
| 1.4887 | 930 | 0.0022 | - |
| 1.5047 | 940 | 0.0042 | - |
| 1.5207 | 950 | 0.0026 | - |
| 1.5368 | 960 | 0.0049 | - |
| 1.5528 | 970 | 0.0024 | - |
| 1.5688 | 980 | 0.0019 | - |
| 1.5848 | 990 | 0.0038 | - |
| 1.6008 | 1000 | 0.0036 | - |
| 1.6169 | 1010 | 0.0023 | - |
| 1.6329 | 1020 | 0.0021 | - |
| 1.6489 | 1030 | 0.0011 | - |
| 1.6649 | 1040 | 0.0025 | - |
| 1.6810 | 1050 | 0.0026 | - |
| 1.6970 | 1060 | 0.0034 | - |
| 1.7130 | 1070 | 0.0024 | - |
| 1.7290 | 1080 | 0.0038 | - |
| 1.7450 | 1090 | 0.002 | - |
| 1.7611 | 1100 | 0.0046 | - |
| 1.7771 | 1110 | 0.0003 | - |
| 1.7931 | 1120 | 0.0062 | - |
| 1.8091 | 1130 | 0.0057 | - |
| 1.8252 | 1140 | 0.0012 | - |
| 1.8412 | 1150 | 0.0021 | - |
| 1.8572 | 1160 | 0.0038 | - |
| 1.8732 | 1170 | 0.0024 | - |
| 1.8892 | 1180 | 0.0026 | - |
| 1.9053 | 1190 | 0.0034 | - |
| 1.9213 | 1200 | 0.0064 | - |
| 1.9373 | 1210 | 0.0041 | - |
| 1.9533 | 1220 | 0.0032 | - |
| 1.9694 | 1230 | 0.0028 | - |
| 1.9854 | 1240 | 0.0009 | - |
| 2.0 | 1250 | 0.0042 | 0.2488 |
| 2.0160 | 1260 | 0.0005 | - |
| 2.0320 | 1270 | 0.0018 | - |
| 2.0481 | 1280 | 0.0009 | - |
| 2.0641 | 1290 | 0.001 | - |
| 2.0801 | 1300 | 0.0024 | - |
| 2.0961 | 1310 | 0.0011 | - |
| 2.1122 | 1320 | 0.0008 | - |
| 2.1282 | 1330 | 0.0001 | - |
| 2.1442 | 1340 | 0.0006 | - |
| 2.1602 | 1350 | 0.0005 | - |
| 2.1762 | 1360 | 0.0003 | - |
| 2.1923 | 1370 | 0.0 | - |
| 2.2083 | 1380 | 0.0 | - |
| 2.2243 | 1390 | 0.0001 | - |
| 2.2403 | 1400 | 0.0001 | - |
| 2.2564 | 1410 | 0.0027 | - |
| 2.2724 | 1420 | 0.0005 | - |
| 2.2884 | 1430 | 0.0007 | - |
| 2.3044 | 1440 | 0.0001 | - |
| 2.3204 | 1450 | 0.0002 | - |
| 2.3365 | 1460 | 0.001 | - |
| 2.3525 | 1470 | 0.0003 | - |
| 2.3685 | 1480 | 0.001 | - |
| 2.3845 | 1490 | 0.0 | - |
| 2.4006 | 1500 | 0.0006 | - |
| 2.4166 | 1510 | 0.0007 | - |
| 2.4326 | 1520 | 0.0007 | - |
| 2.4486 | 1530 | 0.0004 | - |
| 2.4647 | 1540 | 0.0007 | - |
| 2.4807 | 1550 | 0.0012 | - |
| 2.4967 | 1560 | 0.0015 | - |
| 2.5127 | 1570 | 0.0014 | - |
| 2.5287 | 1580 | 0.0005 | - |
| 2.5448 | 1590 | 0.0005 | - |
| 2.5608 | 1600 | 0.0014 | - |
| 2.5768 | 1610 | 0.0016 | - |
| 2.5928 | 1620 | 0.0 | - |
| 2.6089 | 1630 | 0.0002 | - |
| 2.6249 | 1640 | 0.0006 | - |
| 2.6409 | 1650 | 0.0002 | - |
| 2.6569 | 1660 | 0.0003 | - |
| 2.6729 | 1670 | 0.0007 | - |
| 2.6890 | 1680 | 0.0005 | - |
| 2.7050 | 1690 | 0.0007 | - |
| 2.7210 | 1700 | 0.0 | - |
| 2.7370 | 1710 | 0.0008 | - |
| 2.7531 | 1720 | 0.0019 | - |
| 2.7691 | 1730 | 0.0017 | - |
| 2.7851 | 1740 | 0.0002 | - |
| 2.8011 | 1750 | 0.0002 | - |
| 2.8171 | 1760 | 0.0002 | - |
| 2.8332 | 1770 | 0.0014 | - |
| 2.8492 | 1780 | 0.0005 | - |
| 2.8652 | 1790 | 0.0021 | - |
| 2.8812 | 1800 | 0.002 | - |
| 2.8973 | 1810 | 0.0021 | - |
| 2.9133 | 1820 | 0.0007 | - |
| 2.9293 | 1830 | 0.0 | - |
| 2.9453 | 1840 | 0.0011 | - |
| 2.9613 | 1850 | 0.0006 | - |
| 2.9774 | 1860 | 0.0008 | - |
| 2.9934 | 1870 | 0.0001 | - |
| 3.0 | 1875 | - | 0.2516 |
| 3.0080 | 1880 | 0.0033 | - |
| 3.0240 | 1890 | 0.0 | - |
| 3.0401 | 1900 | 0.0 | - |
| 3.0561 | 1910 | 0.0009 | - |
| 3.0721 | 1920 | 0.0001 | - |
| 3.0881 | 1930 | 0.001 | - |
| 3.1041 | 1940 | 0.0001 | - |
| 3.1202 | 1950 | 0.0001 | - |
| 3.1362 | 1960 | 0.0 | - |
| 3.1522 | 1970 | 0.0003 | - |
| 3.1682 | 1980 | 0.0001 | - |
| 3.1843 | 1990 | 0.0005 | - |
| 3.2003 | 2000 | 0.0 | - |
| 3.2163 | 2010 | 0.0 | - |
| 3.2323 | 2020 | 0.0 | - |
| 3.2483 | 2030 | 0.0 | - |
| 3.2644 | 2040 | 0.0 | - |
| 3.2804 | 2050 | 0.0 | - |
| 3.2964 | 2060 | 0.0001 | - |
| 3.3124 | 2070 | 0.0001 | - |
| 3.3285 | 2080 | 0.0 | - |
| 3.3445 | 2090 | 0.0001 | - |
| 3.3605 | 2100 | 0.0 | - |
| 3.3765 | 2110 | 0.0005 | - |
| 3.3925 | 2120 | 0.0001 | - |
| 3.4086 | 2130 | 0.0 | - |
| 3.4246 | 2140 | 0.0 | - |
| 3.4406 | 2150 | 0.0004 | - |
| 3.4566 | 2160 | 0.0005 | - |
| 3.4727 | 2170 | 0.0 | - |
| 3.4887 | 2180 | 0.0006 | - |
| 3.5047 | 2190 | 0.0002 | - |
| 3.5207 | 2200 | 0.0007 | - |
| 3.5368 | 2210 | 0.0 | - |
| 3.5528 | 2220 | 0.0 | - |
| 3.5688 | 2230 | 0.0008 | - |
| 3.5848 | 2240 | 0.0001 | - |
| 3.6008 | 2250 | 0.0013 | - |
| 3.6169 | 2260 | 0.0004 | - |
| 3.6329 | 2270 | 0.0006 | - |
| 3.6489 | 2280 | 0.0001 | - |
| 3.6649 | 2290 | 0.0 | - |
| 3.6810 | 2300 | 0.0011 | - |
| 3.6970 | 2310 | 0.0005 | - |
| 3.7130 | 2320 | 0.0 | - |
| 3.7290 | 2330 | 0.0 | - |
| 3.7450 | 2340 | 0.0006 | - |
| 3.7611 | 2350 | 0.0 | - |
| 3.7771 | 2360 | 0.0002 | - |
| 3.7931 | 2370 | 0.0006 | - |
| 3.8091 | 2380 | 0.0002 | - |
| 3.8252 | 2390 | 0.0004 | - |
| 3.8412 | 2400 | 0.0 | - |
| 3.8572 | 2410 | 0.0007 | - |
| 3.8732 | 2420 | 0.0006 | - |
| 3.8892 | 2430 | 0.0002 | - |
| 3.9053 | 2440 | 0.0009 | - |
| 3.9213 | 2450 | 0.0009 | - |
| 3.9373 | 2460 | 0.0 | - |
| 3.9533 | 2470 | 0.0001 | - |
| 3.9694 | 2480 | 0.0012 | - |
| 3.9854 | 2490 | 0.0003 | - |
| 3.9950 | 2496 | - | 0.2524 |
| -1 | -1 | - | 0.2532 |
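With `per_device_train_batch_size: 16` and `gradient_accumulation_steps: 8`, the effective batch size per optimizer step is 128. Assuming single-device training and that the logged steps are optimizer steps, the 625 steps per epoch shown in the log imply roughly 80,000 training triplets per epoch. The arithmetic:

```python
# Effective batch size implied by the hyperparameters above.
per_device_batch = 16
grad_accum = 8
effective_batch = per_device_batch * grad_accum
print(effective_batch)  # 128

# The log reaches epoch 1.0 at step 625; assuming single-GPU training
# and that these are optimizer steps, the dataset size is roughly:
steps_per_epoch = 625
print(steps_per_epoch * effective_batch)  # 80000
```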
Citation — Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Citation — TripletLoss:

```bibtex
@misc{hermans2017defense,
    title = {In Defense of the Triplet Loss for Person Re-Identification},
    author = {Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year = {2017},
    eprint = {1703.07737},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
```
Base model: Master-thesis-NAP/ModernBert-DAPT-math