Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub (replace the placeholder with this model's Hub repository id)
model = CrossEncoder("sentence_transformers_model_id")

# Get scores for pairs of texts
pairs = [
    ['enrollment statistics at southern arkansas university', 'The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi.'],
    ['burgos is in what province spain', 'The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants.'],
    ['most important customer service skills', 'Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them.'],
    ['what happens if we eat too many carbohydrates', 'What Happens If You Eat Too Many Carbs? We all know the feeling you get after eating a large bowl of pasta. Your stomach swells up and you feel like you just gained 10 pounds. Surprisingly carbohydrates are a very important fuel source for your body. Without them it would be hard to have any energy throughout the day. Even though there are risks to consuming no carbs at all, there are also risks to consuming too much! See the article below where we talk about what could happen if you eat too many carbs. You Will Gain Body Fat Sorry to say this but “yes” if you consume too many carbs than you will gain body fat. This isn’t all that bad though when it comes to building muscle that is. You need to be eating lots of calories throughout the day in order to spark muscle growth. Carbohydrates just happen to have a lot of calories in them.'],
    ['what county is wharton nj in', 'Sponsored Topics. Wharton is a Borough in Morris County, New Jersey, United States. As of the 2000 United States Census, the borough population was 6,298.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'enrollment statistics at southern arkansas university',
    [
        'The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi.',
        'The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants.',
        'Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them.',
        'What Happens If You Eat Too Many Carbs? We all know the feeling you get after eating a large bowl of pasta. Your stomach swells up and you feel like you just gained 10 pounds. Surprisingly carbohydrates are a very important fuel source for your body. Without them it would be hard to have any energy throughout the day. Even though there are risks to consuming no carbs at all, there are also risks to consuming too much! See the article below where we talk about what could happen if you eat too many carbs. You Will Gain Body Fat Sorry to say this but “yes” if you consume too many carbs than you will gain body fat. This isn’t all that bad though when it comes to building muscle that is. You need to be eating lots of calories throughout the day in order to spark muscle growth. Carbohydrates just happen to have a lot of calories in them.',
        'Sponsored Topics. Wharton is a Borough in Morris County, New Jersey, United States. As of the 2000 United States Census, the borough population was 6,298.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
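Conceptually, ranking with a cross encoder means scoring each (query, passage) pair independently and then sorting the passages by that score. The sketch below illustrates the idea in plain Python. `rerank` and `overlap_score` are hypothetical helpers, and `overlap_score` is only a toy word-overlap scorer standing in for the model; with the real model, the scores would come from `model.predict` on the list of pairs. The output mimics the `corpus_id`/`score` shape of the `rank` result above, but this is not the library's implementation.

```python
import re
from typing import Callable, Dict, List, Optional, Union

Ranking = List[Dict[str, Union[int, float]]]


def overlap_score(query: str, document: str) -> float:
    """Hypothetical stand-in scorer: number of shared lowercase word tokens."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    document_tokens = set(re.findall(r"\w+", document.lower()))
    return float(len(query_tokens & document_tokens))


def rerank(
    query: str,
    documents: List[str],
    score_pair: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> Ranking:
    """Score every (query, document) pair, then sort candidates by descending score."""
    scores = [score_pair(query, document) for document in documents]
    # Sort candidate indices by score; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": scores[i]} for i in order]


query = "what county is wharton nj in"
documents = [
    "Wharton is a Borough in Morris County, New Jersey, United States.",
    "The province of Burgos is a province of northern Spain.",
]
ranking = rerank(query, documents, overlap_score)
print(ranking)
```

Because each pair is scored jointly, a cross encoder cannot precompute document vectors. This is why it is usually applied to a short candidate list produced by a cheaper first-stage retriever.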

Datasets: train-eval, NanoMSMARCO, NanoNFCorpus and NanoNQ (evaluated with CERerankingEvaluator)

| Metric | train-eval | NanoMSMARCO | NanoNFCorpus | NanoNQ |
|---|---|---|---|---|
| map | 0.6582 | 0.6058 (+0.1162) | 0.3384 (+0.0680) | 0.6984 (+0.2778) |
| mrr@10 | 0.6556 | 0.5982 (+0.1207) | 0.5367 (+0.0368) | 0.7111 (+0.2844) |
| ndcg@10 | 0.7121 | 0.6699 (+0.1294) | 0.3760 (+0.0510) | 0.7469 (+0.2462) |
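
For a single query, all three metrics can be computed from the relevance labels of the candidates, listed in the order the model ranked them. The reported numbers are then averaged over queries. Below is a minimal sketch using common textbook definitions (linear gains, log2 position discount). It assumes every relevant passage appears somewhere in the reranked candidate list, and the evaluator's exact implementation may differ in details.

```python
import math
from typing import Sequence


def mrr_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k (0 if none)."""
    for position, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / position
    return 0.0


def ndcg_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """nDCG@k: DCG of the model's order divided by DCG of the ideal order."""
    def dcg(labels: Sequence[int]) -> float:
        # Each item's gain is discounted by log2(position + 1).
        return sum(label / math.log2(position + 1)
                   for position, label in enumerate(labels[:k], start=1))

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@i over the positions i that hold a relevant item."""
    hits, precision_sum = 0, 0.0
    for position, label in enumerate(ranked_labels, start=1):
        if label > 0:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: relevant passages ended up at positions 2 and 4.
example = [0, 1, 0, 1]
print(mrr_at_k(example), average_precision(example), ndcg_at_k(example))
```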

Dataset: NanoBEIR_mean (evaluated with CENanoBEIREvaluator)

| Metric | Value |
|---|---|
| map | 0.5476 (+0.1540) |
| mrr@10 | 0.6153 (+0.1473) |
| ndcg@10 | 0.5976 (+0.1422) |
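
NanoBEIR_mean appears to be the unweighted average of the three Nano datasets in the reranking table above. The sketch below recomputes it from the rounded per-dataset values as a consistency check. The assumption that the aggregate is a plain mean is ours. With rounded inputs, map comes out at 0.5475 rather than the reported 0.5476, presumably because the evaluator averages unrounded scores.

```python
from statistics import mean

# Per-dataset values copied from the reranking table (NanoMSMARCO, NanoNFCorpus, NanoNQ).
per_dataset = {
    "map": [0.6058, 0.3384, 0.6984],
    "mrr@10": [0.5982, 0.5367, 0.7111],
    "ndcg@10": [0.6699, 0.3760, 0.7469],
}
# The improvements shown in parentheses in the same table.
per_dataset_delta = {
    "map": [0.1162, 0.0680, 0.2778],
    "mrr@10": [0.1207, 0.0368, 0.2844],
    "ndcg@10": [0.1294, 0.0510, 0.2462],
}

# Unweighted (macro) average over the three datasets, rounded like the card.
macro_avg = {metric: round(mean(values), 4) for metric, values in per_dataset.items()}
macro_delta = {metric: round(mean(values), 4) for metric, values in per_dataset_delta.items()}
for metric in per_dataset:
    print(f"{metric}: {macro_avg[metric]:.4f} (+{macro_delta[metric]:.4f})")
```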

Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | int |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| enrollment statistics at southern arkansas university | The University of Southern Malawi also known as the Malawi University of Science and Technology(MUST) [edit]. The Malawi University of Science and Technology was established on 17th December 2012 by the Malawi University of Science and Technology Act No. 31 of 2012 as the fourth Public University in Malawi. | 0 |
| burgos is in what province spain | The province of Burgos is a province of northern Spain, in the northeastern part of the autonomous community of Castile and Leon. León it is bordered by the provinces Of, Palencia, Cantabria, Álava, Alava álava, La, Rioja, soria Segovia. And valladolid its capital is the City. of burgoshe province of Burgos is divided into 371 municipalities, being the Spanish province with the highest number, although many of them have fewer than 100 inhabitants. | 1 |
| most important customer service skills | Customer Service Skill #1: Empathy. Empathy gets thrown around a lot in support training, and for good reason: it might be the single most important customer service skill to develop. To help your customers be happy and successful, it’s important to understand what happiness and success mean to them. | 1 |
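
The samples pair a query with a passage and an integer label that appears to be binary (0 = not relevant, 1 = relevant). The card names the loss only as FitMixinLoss. A common objective for a single-score cross encoder with 0/1 labels is binary cross-entropy applied to the raw score (logit). The sketch below assumes that objective; the card does not confirm what the wrapper computes. `bce_with_logits` is a hypothetical helper, and the logits are made-up values, not model output.

```python
import math


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy on a raw score, in the numerically stable form.

    Equivalent to -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))],
    but it avoids overflow in exp() for large |z|.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Hypothetical logits for the three sample rows above (labels 0, 1, 1).
samples = [(-2.0, 0), (3.0, 1), (0.5, 1)]
losses = [bce_with_logits(z, y) for z, y in samples]
print([round(loss, 4) for loss in losses], round(sum(losses) / len(losses), 4))
```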

Loss: FitMixinLoss

Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 1
- fp16: True

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | train-eval_ndcg@10 | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
|---|---|---|---|---|---|---|---|
| -1 | -1 | - | 0.0488 | 0.0971 (-0.4433) | 0.2449 (-0.0802) | 0.0508 (-0.4498) | 0.1310 (-0.3244) |
| 0.016 | 500 | 1.1004 | - | - | - | - | - |
| 0.032 | 1000 | 0.7746 | - | - | - | - | - |
| 0.048 | 1500 | 0.543 | - | - | - | - | - |
| 0.064 | 2000 | 0.4508 | - | - | - | - | - |
| 0.08 | 2500 | 0.4112 | - | - | - | - | - |
| 0.096 | 3000 | 0.3949 | - | - | - | - | - |
| 0.112 | 3500 | 0.3793 | - | - | - | - | - |
| 0.128 | 4000 | 0.3584 | - | - | - | - | - |
| 0.144 | 4500 | 0.3725 | - | - | - | - | - |
| 0.16 | 5000 | 0.358 | 0.6634 | 0.6343 (+0.0939) | 0.3986 (+0.0735) | 0.7085 (+0.2078) | 0.5805 (+0.1251) |
| 0.176 | 5500 | 0.3442 | - | - | - | - | - |
| 0.192 | 6000 | 0.3355 | - | - | - | - | - |
| 0.208 | 6500 | 0.3423 | - | - | - | - | - |
| 0.224 | 7000 | 0.3253 | - | - | - | - | - |
| 0.24 | 7500 | 0.3256 | - | - | - | - | - |
| 0.256 | 8000 | 0.3231 | - | - | - | - | - |
| 0.272 | 8500 | 0.3218 | - | - | - | - | - |
| 0.288 | 9000 | 0.3119 | - | - | - | - | - |
| 0.304 | 9500 | 0.3056 | - | - | - | - | - |
| 0.32 | 10000 | 0.3125 | 0.6861 | 0.6423 (+0.1019) | 0.4197 (+0.0947) | 0.7333 (+0.2327) | 0.5985 (+0.1431) |
| 0.336 | 10500 | 0.3 | - | - | - | - | - |
| 0.352 | 11000 | 0.305 | - | - | - | - | - |
| 0.368 | 11500 | 0.3088 | - | - | - | - | - |
| 0.384 | 12000 | 0.2963 | - | - | - | - | - |
| 0.4 | 12500 | 0.3068 | - | - | - | - | - |
| 0.416 | 13000 | 0.299 | - | - | - | - | - |
| 0.432 | 13500 | 0.2962 | - | - | - | - | - |
| 0.448 | 14000 | 0.2942 | - | - | - | - | - |
| 0.464 | 14500 | 0.2969 | - | - | - | - | - |
| 0.48 | 15000 | 0.2956 | 0.6964 | 0.6397 (+0.0993) | 0.3773 (+0.0523) | 0.7140 (+0.2134) | 0.5770 (+0.1216) |
| 0.496 | 15500 | 0.2928 | - | - | - | - | - |
| 0.512 | 16000 | 0.2829 | - | - | - | - | - |
| 0.528 | 16500 | 0.2794 | - | - | - | - | - |
| 0.544 | 17000 | 0.2818 | - | - | - | - | - |
| 0.56 | 17500 | 0.2843 | - | - | - | - | - |
| 0.576 | 18000 | 0.2858 | - | - | - | - | - |
| 0.592 | 18500 | 0.2801 | - | - | - | - | - |
| 0.608 | 19000 | 0.2902 | - | - | - | - | - |
| 0.624 | 19500 | 0.2768 | - | - | - | - | - |
| 0.64 | 20000 | 0.2768 | 0.6963 | 0.6456 (+0.1052) | 0.3820 (+0.0570) | 0.7230 (+0.2224) | 0.5835 (+0.1282) |
| 0.656 | 20500 | 0.2744 | - | - | - | - | - |
| 0.672 | 21000 | 0.2753 | - | - | - | - | - |
| 0.688 | 21500 | 0.2632 | - | - | - | - | - |
| 0.704 | 22000 | 0.2818 | - | - | - | - | - |
| 0.72 | 22500 | 0.2668 | - | - | - | - | - |
| 0.736 | 23000 | 0.2673 | - | - | - | - | - |
| 0.752 | 23500 | 0.2663 | - | - | - | - | - |
| 0.768 | 24000 | 0.2612 | - | - | - | - | - |
| 0.784 | 24500 | 0.2655 | - | - | - | - | - |
| 0.8 | 25000 | 0.2592 | 0.7070 | 0.6614 (+0.1210) | 0.3803 (+0.0552) | 0.7482 (+0.2476) | 0.5966 (+0.1412) |
| 0.816 | 25500 | 0.2661 | - | - | - | - | - |
| 0.832 | 26000 | 0.2568 | - | - | - | - | - |
| 0.848 | 26500 | 0.2651 | - | - | - | - | - |
| 0.864 | 27000 | 0.2577 | - | - | - | - | - |
| 0.88 | 27500 | 0.2579 | - | - | - | - | - |
| 0.896 | 28000 | 0.2552 | - | - | - | - | - |
| 0.912 | 28500 | 0.2531 | - | - | - | - | - |
| 0.928 | 29000 | 0.255 | - | - | - | - | - |
| 0.944 | 29500 | 0.2565 | - | - | - | - | - |
| 0.96 | 30000 | 0.2534 | 0.7150 | 0.6647 (+0.1243) | 0.3745 (+0.0495) | 0.7479 (+0.2472) | 0.5957 (+0.1403) |
| 0.976 | 30500 | 0.2508 | - | - | - | - | - |
| 0.992 | 31000 | 0.2459 | - | - | - | - | - |
| 1.0 | 31250 | - | 0.7121 | 0.6699 (+0.1294) | 0.3760 (+0.0510) | 0.7469 (+0.2462) | 0.5976 (+0.1422) |
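
Several quantities in the log follow directly from the hyperparameters. The epoch column is step / 31250. The number of training pairs consumed is steps × batch size × gradient accumulation steps × number of devices. The linear scheduler with no warmup decays the learning rate from 5e-05 to zero over the run. The sketch below assumes a single training device (the card does not say) and the standard `transformers` linear schedule. The helper names are hypothetical.

```python
# Values copied from the hyperparameters and the final row of the training log.
per_device_train_batch_size = 64
gradient_accumulation_steps = 1
total_steps = 31_250
learning_rate = 5e-05
num_devices = 1  # assumption: the card does not state how many devices were used


def samples_seen(step: int) -> int:
    """Training pairs consumed after `step` optimizer steps (upper bound if the last batch is partial)."""
    return step * per_device_train_batch_size * gradient_accumulation_steps * num_devices


def epoch_at(step: int) -> float:
    """Fraction of the single training epoch completed after `step` steps."""
    return step / total_steps


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear warmup (none here, warmup_steps=0) followed by linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (500, 15_000, 31_250):
    print(step, samples_seen(step), epoch_at(step), linear_lr(step))
```

Under these assumptions the run saw about 2,000,000 pairs in its single epoch. Because dataloader_drop_last is False, the final batch may have been partial, which would put the training set between 1,999,937 and 2,000,000 pairs.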

Carbon emissions were measured using CodeCarbon.

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Base model: microsoft/MiniLM-L12-H384-uncased