This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
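The Pooling and Normalize stages listed above can be illustrated with a minimal NumPy sketch. It assumes the transformer has already produced per-token vectors; the function name `mean_pool_and_normalize` and the random toy input are hypothetical and not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    # Broadcast the mask over the embedding dimension: (batch, seq, 1)
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # (batch, 1)
    pooled = summed / counts                            # mean over real tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # unit-length vectors

# Toy input: 2 sequences, 4 token positions, 768 dimensions.
# Random values stand in for real transformer output (assumption).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 768))
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])

sentence_embeddings = mean_pool_and_normalize(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
```

Because the output vectors have unit length, their dot product equals their cosine similarity.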
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'board cert agency code, Board Cert Agency Code',
'2nd board cert',
'comments',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.6759, -0.0045],
# [ 0.6759, 1.0000, 0.0552],
# [-0.0045, 0.0552, 1.0000]])
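One way to use such a similarity matrix is to look up the closest other sentence for each input. The sketch below reuses the scores printed in the example above as a NumPy array, so it runs without downloading the model; the helper name `best_matches` is hypothetical.

```python
import numpy as np

sentences = [
    "board cert agency code, Board Cert Agency Code",
    "2nd board cert",
    "comments",
]

# Similarity scores copied from the example output above
similarities = np.array([
    [1.0000, 0.6759, -0.0045],
    [0.6759, 1.0000, 0.0552],
    [-0.0045, 0.0552, 1.0000],
])

def best_matches(sim):
    """Return, for each row, the index of the most similar other row."""
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore self-similarity
    return masked.argmax(axis=1)

for i, j in enumerate(best_matches(similarities)):
    print(f"{sentences[i]!r} -> {sentences[j]!r} (score {similarities[i, j]:.4f})")
```

With a real model, `similarities` would come from `model.similarity(embeddings, embeddings)`, converted with `.numpy()` if NumPy operations are needed.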
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

| anchor | positive |
|---|---|
| accepting patients ind, Accepting Patients IND | primary spec accepting new patients for pcps and ob |
| accepting patients ind, Accepting Patients IND | accepting new patients (all practitioner types ongoing outpatient basis) (y n) (no blanks) |
| accepting patients ind, Accepting Patients IND | acc ind for pts |

MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
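To show what these two parameters do, here is a minimal NumPy sketch of a multiple-negatives ranking objective: each anchor is scored against every positive in the batch by cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the target. It illustrates the idea under those assumptions and is not the library's implementation; the toy vectors are random.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities with in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct positive for anchor i sits at column i
    return -np.mean(np.diag(log_probs))

# Toy batch: positives are slightly perturbed copies of their anchors (assumption)
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.05 * rng.normal(size=(8, 16))

loss_aligned = multiple_negatives_ranking_loss(anchors, positives)
# Shifting the positives by one row pairs every anchor with the wrong positive
loss_shuffled = multiple_negatives_ranking_loss(anchors, np.roll(positives, 1, axis=0))
print(loss_aligned, loss_shuffled)
```

Every other positive in the batch acts as a negative, so a text that appears twice in one batch would be treated as its own negative. That is a plausible reason for the `batch_sampler: no_duplicates` setting listed in the hyperparameters.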
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

| anchor | positive |
|---|---|
| accepting patients ind, Accepting Patients IND | open close panel |
| accepting patients ind, Accepting Patients IND | panel status |
| accepting patients ind, Accepting Patients IND | commercial panel status |

MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0258 | 50 | 0.8668 | - |
| 0.0517 | 100 | 0.7505 | 0.6548 |
| 0.0775 | 150 | 0.6506 | - |
| 0.1033 | 200 | 0.4672 | 0.4107 |
| 0.1291 | 250 | 0.403 | - |
| 0.1550 | 300 | 0.3284 | 0.2954 |
| 0.1808 | 350 | 0.3005 | - |
| 0.2066 | 400 | 0.2248 | 0.2149 |
| 0.2324 | 450 | 0.219 | - |
| 0.2583 | 500 | 0.1794 | 0.1685 |
| 0.2841 | 550 | 0.1441 | - |
| 0.3099 | 600 | 0.1522 | 0.1397 |
| 0.3357 | 650 | 0.1322 | - |
| 0.3616 | 700 | 0.1254 | 0.1283 |
| 0.3874 | 750 | 0.1194 | - |
| 0.4132 | 800 | 0.134 | 0.1140 |
| 0.4390 | 850 | 0.0932 | - |
| 0.4649 | 900 | 0.1025 | 0.0957 |
| 0.4907 | 950 | 0.1063 | - |
| 0.5165 | 1000 | 0.0956 | 0.0945 |
| 0.5424 | 1050 | 0.071 | - |
| 0.5682 | 1100 | 0.0727 | 0.0836 |
| 0.5940 | 1150 | 0.0895 | - |
| 0.6198 | 1200 | 0.0786 | 0.0750 |
| 0.6457 | 1250 | 0.0923 | - |
| 0.6715 | 1300 | 0.0905 | 0.0742 |
| 0.6973 | 1350 | 0.0522 | - |
| 0.7231 | 1400 | 0.0645 | 0.0693 |
| 0.7490 | 1450 | 0.0711 | - |
| 0.7748 | 1500 | 0.0655 | 0.0627 |
| 0.8006 | 1550 | 0.0532 | - |
| 0.8264 | 1600 | 0.0602 | 0.0615 |
| 0.8523 | 1650 | 0.0674 | - |
| 0.8781 | 1700 | 0.0537 | 0.0564 |
| 0.9039 | 1750 | 0.0578 | - |
| 0.9298 | 1800 | 0.0643 | 0.0533 |
| 0.9556 | 1850 | 0.0655 | - |
| 0.9814 | 1900 | 0.0562 | 0.0519 |
| 1.0072 | 1950 | 0.0538 | - |
| 1.0331 | 2000 | 0.043 | 0.0470 |
| 1.0589 | 2050 | 0.035 | - |
| 1.0847 | 2100 | 0.0412 | 0.0454 |
| 1.1105 | 2150 | 0.0362 | - |
| 1.1364 | 2200 | 0.0454 | 0.0449 |
| 1.1622 | 2250 | 0.0438 | - |
| 1.1880 | 2300 | 0.0453 | 0.0433 |
| 1.2138 | 2350 | 0.0298 | - |
| 1.2397 | 2400 | 0.0351 | 0.0444 |
| 1.2655 | 2450 | 0.0349 | - |
| 1.2913 | 2500 | 0.0391 | 0.0431 |
| 1.3171 | 2550 | 0.0404 | - |
| 1.3430 | 2600 | 0.0371 | 0.0423 |
| 1.3688 | 2650 | 0.0382 | - |
| 1.3946 | 2700 | 0.0325 | 0.0420 |
| 1.4205 | 2750 | 0.0394 | - |
| 1.4463 | 2800 | 0.0469 | 0.0421 |
| 1.4721 | 2850 | 0.0466 | - |
| 1.4979 | 2900 | 0.0374 | 0.0407 |
| 1.5238 | 2950 | 0.0321 | - |
| 1.5496 | 3000 | 0.022 | 0.0388 |
| 1.5754 | 3050 | 0.0229 | - |
| 1.6012 | 3100 | 0.0354 | 0.0367 |
| 1.6271 | 3150 | 0.0275 | - |
| 1.6529 | 3200 | 0.036 | 0.0358 |
| 1.6787 | 3250 | 0.0349 | - |
| 1.7045 | 3300 | 0.0359 | 0.0337 |
| 1.7304 | 3350 | 0.0386 | - |
| 1.7562 | 3400 | 0.029 | 0.0341 |
| 1.7820 | 3450 | 0.0348 | - |
| 1.8079 | 3500 | 0.0241 | 0.0342 |
| 1.8337 | 3550 | 0.0281 | - |
| 1.8595 | 3600 | 0.0239 | 0.0323 |
| 1.8853 | 3650 | 0.0281 | - |
| 1.9112 | 3700 | 0.0301 | 0.0323 |
| 1.9370 | 3750 | 0.0186 | - |
| 1.9628 | 3800 | 0.0246 | 0.0308 |
| 1.9886 | 3850 | 0.0315 | - |
| 2.0145 | 3900 | 0.0185 | 0.0302 |
| 2.0403 | 3950 | 0.0272 | - |
| 2.0661 | 4000 | 0.025 | 0.0304 |
| 2.0919 | 4050 | 0.0262 | - |
| 2.1178 | 4100 | 0.02 | 0.0306 |
| 2.1436 | 4150 | 0.0163 | - |
| 2.1694 | 4200 | 0.0301 | 0.0294 |
| 2.1952 | 4250 | 0.0176 | - |
| 2.2211 | 4300 | 0.0206 | 0.0297 |
| 2.2469 | 4350 | 0.0121 | - |
| 2.2727 | 4400 | 0.0206 | 0.0294 |
| 2.2986 | 4450 | 0.018 | - |
| 2.3244 | 4500 | 0.0178 | 0.0291 |
| 2.3502 | 4550 | 0.0153 | - |
| 2.3760 | 4600 | 0.0219 | 0.0288 |
| 2.4019 | 4650 | 0.0214 | - |
| 2.4277 | 4700 | 0.0212 | 0.0281 |
| 2.4535 | 4750 | 0.0183 | - |
| 2.4793 | 4800 | 0.0302 | 0.0280 |
| 2.5052 | 4850 | 0.0158 | - |
| 2.5310 | 4900 | 0.02 | 0.0274 |
| 2.5568 | 4950 | 0.0171 | - |
| 2.5826 | 5000 | 0.0275 | 0.0269 |
| 2.6085 | 5050 | 0.0193 | - |
| 2.6343 | 5100 | 0.0158 | 0.0269 |
| 2.6601 | 5150 | 0.0179 | - |
| 2.6860 | 5200 | 0.0214 | 0.0269 |
| 2.7118 | 5250 | 0.0225 | - |
| 2.7376 | 5300 | 0.0166 | 0.0264 |
| 2.7634 | 5350 | 0.0243 | - |
| 2.7893 | 5400 | 0.0154 | 0.0262 |
| 2.8151 | 5450 | 0.0245 | - |
| 2.8409 | 5500 | 0.0122 | 0.0261 |
| 2.8667 | 5550 | 0.0234 | - |
| 2.8926 | 5600 | 0.0217 | 0.0259 |
| 2.9184 | 5650 | 0.0166 | - |
| 2.9442 | 5700 | 0.0165 | 0.0258 |
| 2.9700 | 5750 | 0.0126 | - |
| 2.9959 | 5800 | 0.0201 | 0.0258 |
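The log above, read together with the hyperparameters (learning_rate 2e-05, warmup_ratio 0.1, lr_scheduler_type linear, num_train_epochs 3, batch size 32), allows a rough reconstruction of the run's length and learning-rate schedule. The sketch below is an estimate only: it assumes a single device, infers steps per epoch from one log row, and uses a hypothetical helper `lr_at` that follows the usual linear warmup and decay shape.

```python
import math

# (epoch, step) pair copied from the last row of the training log above
last_epoch, last_step = 2.9959, 5800

# Estimated optimizer steps per epoch (rounded; an inference, not a logged value)
steps_per_epoch = round(last_step / last_epoch)

# Values taken from the hyperparameter list
num_train_epochs = 3
batch_size = 32
learning_rate = 2e-05
warmup_ratio = 0.1

total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)

# Rough upper bound on training pairs per epoch, assuming one device and
# gradient_accumulation_steps = 1 (the final batch may be smaller)
approx_train_pairs = steps_per_epoch * batch_size

def lr_at(step):
    """Linear warmup from 0 to the peak rate, then linear decay back to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps, approx_train_pairs)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks roughly a third of the way through the first epoch and decays for the rest of the run. The validation loss in the table keeps falling throughout, most steeply during the first epoch.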
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}