Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a Cross Encoder model finetuned from BAAI/bge-reranker-v2-m3 using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub ("cross_encoder_model_id" is a placeholder for this model's repository id)
model = CrossEncoder("cross_encoder_model_id")
# Get scores for pairs of texts
pairs = [
['Who played Mr. Tucker of the show by the person who plays the old man in Waiting on A Woman?', "</s></s>Waitin' on a Woman. Paisley has referred to ``Waitin 'on a Woman ''as`` one of the most important songs'' that he's ever recorded. Because of the importance that he places on the song, Paisley asked Andy Griffith to star in the music video, as he felt that Griffith's personality matched the personality of the older man in the song. Griffith speaks the old man's lines in the video as well. Jim Shea and Peter Tilden directed the video.</s></s></s>"],
['What culture\'s arrival in the country the performer of Privilege is a citizen of is known as the "Davidian Revolution"?', '</s></s></s>Ivor Cutler. In 2014 a new play, "The Beautiful Cosmos of Ivor Cutler", a co-production by Vanishing Point and National Theatre of Scotland, was performed.</s></s>'],
['What position was held by the Republican candidate running for governor in the state Plum Hollow Country Club is located?', '</s></s>Gerald Hills. Gerald (Rusty) J. Hills, II is an American politician and educator in the state of Michigan and is currently the communications director for Michigan Attorney General Bill Schuette.</s></s></s>2018 Michigan gubernatorial election. Michigan gubernatorial election, 2018 ← 2014 November 6, 2018 2022 → Nominee Bill Schuette Gretchen Whitmer Bill Gelineau Party Republican Democratic Libertarian Running mate Lisa Posthumus Lyons Garlin Gilchrist II Angelique Thomas Incumbent Governor Rick Snyder Republican'],
['When did weed become legal where marble for the lincoln memorial was sourced?', '</s>Cannabis in California. Cannabis in California is permitted, subject to regulations, for both medical and recreational use. In recent decades the state has led the country in efforts to legalize cannabis, holding the first (unsuccessful) vote to decriminalize it in 1972 and, through Proposition 215, becoming the first state to legalize it for medical use in 1996. In the November 2016 election, voters passed an amendment legalizing recreational use of marijuana.</s></s></s>Timeline of cannabis laws in the United States. The legal history of cannabis in the United States began with state - level prohibition in the early 20th century, with the first major federal limitations occurring in 1937. Starting with Oregon in 1973, individual states began to liberalize cannabis laws through decriminalization. In 1996, California became the first state to legalize medical cannabis, sparking a trend that spread to a majority of states by 2016. In 2012, Colorado and Washington became the first states to legalize cannabis for recreational use.</s>'],
['Where is the Rio Grande river located in the country where Norbrook is located?', 'Norfolk Island. Norfolk Island is located in the South Pacific Ocean, east of the Australian mainland. Norfolk Island is the main island of the island group the territory encompasses and is located at 29°02′S 167°57′E\ufeff / \ufeff29.033°S 167.950°E\ufeff / -29.033; 167.950. It has an area of 34.6 square kilometres (13.4 sq mi), with no large-scale internal bodies of water and 32 km (20 mi) of coastline. The island\'s highest point is Mount Bates (319 metres (1,047 feet) above sea level), located in the northwest quadrant of the island. The majority of the terrain is suitable for farming and other agricultural uses. Phillip Island, the second largest island of the territory, is located at 29°07′S 167°57′E\ufeff / \ufeff29.117°S 167.950°E\ufeff / -29.117; 167.950, seven kilometres (4.3 miles) south of the main island.</s>Norfolk Island. Norfolk Island is located in the South Pacific Ocean, east of the Australian mainland. Norfolk Island is the main island of the island group the territory encompasses and is located at 29°02′S 167°57′E\ufeff / \ufeff29.033°S 167.950°E\ufeff / -29.033; 167.950. It has an area of 34.6 square kilometres (13.4 sq mi), with no large-scale internal bodies of water and 32 km (20 mi) of coastline. The island\'s highest point is Mount Bates (319 metres (1,047 feet) above sea level), located in the northwest quadrant of the island. The majority of the terrain is suitable for farming and other agricultural uses. Phillip Island, the second largest island of the territory, is located at 29°07′S 167°57′E\ufeff / \ufeff29.117°S 167.950°E\ufeff / -29.117; 167.950, seven kilometres (4.3 miles) south of the main island.</s></s>Norbrook. Norbrook is an upscale neighbourhood of the Kingston Metropolitan Area of Jamaica, with approximately 15,000 residents and is an important residential, shopping and commercial centre of the city itself. \nNorbrook is regarded as anywhere from the Immaculate Conception High School (in the South) to about 100m up "The Hill" (in the North).</s></s>Rio Grande (Jamaica). The Rio Grande is a river of Jamaica, found in the parish of Portland. It was named when the Spanish occupied Jamaica in the 15th and 16th centuries. One of the largest rivers in Jamaica, it was named ``Big River \'\'(Rio Grande) by the Spanish, and today is one of the many tourist attractions in Portland, mainly for rafting.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)
# Or rank different texts based on similarity to a single text
ranks = model.rank(
'Who played Mr. Tucker of the show by the person who plays the old man in Waiting on A Woman?',
[
"</s></s>Waitin' on a Woman. Paisley has referred to ``Waitin 'on a Woman ''as`` one of the most important songs'' that he's ever recorded. Because of the importance that he places on the song, Paisley asked Andy Griffith to star in the music video, as he felt that Griffith's personality matched the personality of the older man in the song. Griffith speaks the old man's lines in the video as well. Jim Shea and Peter Tilden directed the video.</s></s></s>",
'</s></s></s>Ivor Cutler. In 2014 a new play, "The Beautiful Cosmos of Ivor Cutler", a co-production by Vanishing Point and National Theatre of Scotland, was performed.</s></s>',
'</s></s>Gerald Hills. Gerald (Rusty) J. Hills, II is an American politician and educator in the state of Michigan and is currently the communications director for Michigan Attorney General Bill Schuette.</s></s></s>2018 Michigan gubernatorial election. Michigan gubernatorial election, 2018 ← 2014 November 6, 2018 2022 → Nominee Bill Schuette Gretchen Whitmer Bill Gelineau Party Republican Democratic Libertarian Running mate Lisa Posthumus Lyons Garlin Gilchrist II Angelique Thomas Incumbent Governor Rick Snyder Republican',
'</s>Cannabis in California. Cannabis in California is permitted, subject to regulations, for both medical and recreational use. In recent decades the state has led the country in efforts to legalize cannabis, holding the first (unsuccessful) vote to decriminalize it in 1972 and, through Proposition 215, becoming the first state to legalize it for medical use in 1996. In the November 2016 election, voters passed an amendment legalizing recreational use of marijuana.</s></s></s>Timeline of cannabis laws in the United States. The legal history of cannabis in the United States began with state - level prohibition in the early 20th century, with the first major federal limitations occurring in 1937. Starting with Oregon in 1973, individual states began to liberalize cannabis laws through decriminalization. In 1996, California became the first state to legalize medical cannabis, sparking a trend that spread to a majority of states by 2016. In 2012, Colorado and Washington became the first states to legalize cannabis for recreational use.</s>',
'Norfolk Island. Norfolk Island is located in the South Pacific Ocean, east of the Australian mainland. Norfolk Island is the main island of the island group the territory encompasses and is located at 29°02′S 167°57′E\ufeff / \ufeff29.033°S 167.950°E\ufeff / -29.033; 167.950. It has an area of 34.6 square kilometres (13.4 sq mi), with no large-scale internal bodies of water and 32 km (20 mi) of coastline. The island\'s highest point is Mount Bates (319 metres (1,047 feet) above sea level), located in the northwest quadrant of the island. The majority of the terrain is suitable for farming and other agricultural uses. Phillip Island, the second largest island of the territory, is located at 29°07′S 167°57′E\ufeff / \ufeff29.117°S 167.950°E\ufeff / -29.117; 167.950, seven kilometres (4.3 miles) south of the main island.</s>Norfolk Island. Norfolk Island is located in the South Pacific Ocean, east of the Australian mainland. Norfolk Island is the main island of the island group the territory encompasses and is located at 29°02′S 167°57′E\ufeff / \ufeff29.033°S 167.950°E\ufeff / -29.033; 167.950. It has an area of 34.6 square kilometres (13.4 sq mi), with no large-scale internal bodies of water and 32 km (20 mi) of coastline. The island\'s highest point is Mount Bates (319 metres (1,047 feet) above sea level), located in the northwest quadrant of the island. The majority of the terrain is suitable for farming and other agricultural uses. Phillip Island, the second largest island of the territory, is located at 29°07′S 167°57′E\ufeff / \ufeff29.117°S 167.950°E\ufeff / -29.117; 167.950, seven kilometres (4.3 miles) south of the main island.</s></s>Norbrook. Norbrook is an upscale neighbourhood of the Kingston Metropolitan Area of Jamaica, with approximately 15,000 residents and is an important residential, shopping and commercial centre of the city itself. \nNorbrook is regarded as anywhere from the Immaculate Conception High School (in the South) to about 100m up "The Hill" (in the North).</s></s>Rio Grande (Jamaica). The Rio Grande is a river of Jamaica, found in the parish of Portland. It was named when the Spanish occupied Jamaica in the 15th and 16th centuries. One of the largest rivers in Jamaica, it was named ``Big River \'\'(Rio Grande) by the Spanish, and today is one of the many tourist attractions in Portland, mainly for rafting.',
]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
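As we understand it, `rank` scores each (query, document) pair with the cross encoder and returns the documents ordered by descending score, in the dict format shown in the comment above. The sketch below reproduces only that ordering logic in plain Python and assumes nothing about the model itself: `overlap_score` and `rank_documents` are hypothetical names, and the word-overlap scorer merely stands in for `model.predict`, which needs the downloaded model.

```python
from typing import Callable, List


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & d_words) / len(q_words)


def rank_documents(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[dict]:
    """Score each (query, document) pair, then sort by descending score.

    Returns dicts shaped like [{'corpus_id': ..., 'score': ...}, ...].
    """
    scored = [
        {"corpus_id": idx, "score": score_fn(query, doc)}
        for idx, doc in enumerate(documents)
    ]
    # sorted() is stable, so equal scores keep their original corpus order
    return sorted(scored, key=lambda hit: hit["score"], reverse=True)


if __name__ == "__main__":
    docs = [
        "cannabis laws in california",
        "rio grande river in jamaica",
        "norbrook is a neighbourhood of kingston jamaica",
    ]
    for hit in rank_documents("river in jamaica", docs):
        print(hit)
```

Passing a different `score_fn` (for example, a wrapper around a real model's pairwise scores) changes the scoring without touching the sort; because Python's sort is stable, documents with equal scores keep their original order.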
Correlation on the `validation` and `train_subset` datasets, evaluated with `CECorrelationEvaluator`:

| Metric | validation | train_subset |
|---|---|---|
| pearson | 0.9038 | 0.8992 |
| spearman | 0.8998 | 0.896 |
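Pearson measures linear agreement between predicted scores and gold labels, while Spearman is the Pearson correlation of their ranks, so it only rewards getting the ordering right. The plain-Python sketch below illustrates both definitions (tied values receive averaged ranks); it is not `CECorrelationEvaluator`'s own implementation, and the toy scores in the usage block are made up.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    gold = [0.5, 0.3333, 0.6667, 1.0, 0.0]          # toy labels
    predicted = [0.55, 0.40, 0.60, 0.90, 0.10]      # toy model scores
    print(f"pearson={pearson(predicted, gold):.4f}")
    print(f"spearman={spearman(predicted, gold):.4f}")
```

In this toy case the predictions order the pairs exactly like the labels, so Spearman is 1 even though the raw values differ and Pearson is slightly below 1.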
Training dataset columns: `sentence_0`, `sentence_1`, and `label`.

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Who played Mr. Tucker of the show by the person who plays the old man in Waiting on A Woman? | Waitin' on a Woman. Paisley has referred to | 0.5 |
| What culture's arrival in the country the performer of Privilege is a citizen of is known as the "Davidian Revolution"? | Ivor Cutler. In 2014 a new play, "The Beautiful Cosmos of Ivor Cutler", a co-production by Vanishing Point and National Theatre of Scotland, was performed. | 0.3333333333333333 |
| What position was held by the Republican candidate running for governor in the state Plum Hollow Country Club is located? | Gerald Hills. Gerald (Rusty) J. Hills, II is an American politician and educator in the state of Michigan and is currently the communications director for Michigan Attorney General Bill Schuette.2018 Michigan gubernatorial election. Michigan gubernatorial election, 2018 ← 2014 November 6, 2018 2022 → Nominee Bill Schuette Gretchen Whitmer Bill Gelineau Party Republican Democratic Libertarian Running mate Lisa Posthumus Lyons Garlin Gilchrist II Angelique Thomas Incumbent Governor Rick Snyder Republican | 0.6666666666666666 |
Loss: BinaryCrossEntropyLoss with these parameters:
{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
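With `activation_fn` set to `Identity` and `pos_weight` unset, the loss operates on each pair's raw logit, with the float label as a possibly fractional target. Assuming `BinaryCrossEntropyLoss` follows the standard binary cross-entropy-with-logits formula (as PyTorch's `BCEWithLogitsLoss` does), a per-pair sketch in its numerically stable form looks like this; the function names and the logits in the usage block are illustrative only.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy of sigmoid(logit) against a target in [0, 1].

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)), which equals
    -(y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))) without overflow.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must lie in [0, 1]")
    z = logit
    return max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))


def mean_bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Average the per-pair losses over a batch (a 'mean' reduction)."""
    losses = [bce_with_logits(z, y) for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Soft labels like those in the sample table; the logits are made up.
    print(mean_bce_with_logits([0.0, -0.7, 0.7], [0.5, 1 / 3, 2 / 3]))
```

For a soft target such as 0.5 the loss cannot reach zero: it is minimized where sigmoid(z) equals the label, and that minimum is the binary entropy of the label (log 2 ≈ 0.693 for 0.5).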
Non-default hyperparameters:

- `eval_strategy`: steps
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2

All hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training logs:

| Epoch | Step | Training Loss | validation_spearman | train_subset_spearman |
|---|---|---|---|---|
| 0.0499 | 250 | - | 0.8300 | 0.8134 |
| 0.0997 | 500 | 0.517 | 0.8536 | 0.8442 |
| 0.1496 | 750 | - | 0.8496 | 0.8538 |
| 0.1995 | 1000 | 0.4395 | 0.8586 | 0.8600 |
| 0.2494 | 1250 | - | 0.8661 | 0.8649 |
| 0.2992 | 1500 | 0.4236 | 0.8770 | 0.8673 |
| 0.3491 | 1750 | - | 0.8788 | 0.8715 |
| 0.3990 | 2000 | 0.4354 | 0.8829 | 0.8765 |
| 0.4488 | 2250 | - | 0.8810 | 0.8766 |
| 0.4987 | 2500 | 0.4056 | 0.8835 | 0.8819 |
| 0.5486 | 2750 | - | 0.8857 | 0.8828 |
| 0.5984 | 3000 | 0.4093 | 0.8858 | 0.8842 |
| 0.6483 | 3250 | - | 0.8940 | 0.8858 |
| 0.6982 | 3500 | 0.4207 | 0.8905 | 0.8893 |
| 0.7481 | 3750 | - | 0.8954 | 0.8937 |
| 0.7979 | 4000 | 0.4006 | 0.8960 | 0.8942 |
| 0.8478 | 4250 | - | 0.8998 | 0.8960 |
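The run used `lr_scheduler_type: linear` with `learning_rate: 5e-05` and no warmup. Assuming the usual linear schedule in Hugging Face `transformers` (linear warmup to the peak rate, then linear decay to zero at the final training step), the per-step rate can be sketched as follows. `linear_lr` is a hypothetical helper, and since the card does not report the total number of training steps, the value in the usage block is illustrative only.

```python
def linear_lr(step: int, total_steps: int, peak_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup to `peak_lr`, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase (unused here, because warmup_steps was 0 in this run)
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    # Decay phase: fraction of post-warmup steps still left, clamped at 0
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    TOTAL_STEPS = 10_000  # illustrative only; not reported in the card
    for step in (0, 1000, 4250, TOTAL_STEPS):
        print(step, linear_lr(step, TOTAL_STEPS))
```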
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: BAAI/bge-reranker-v2-m3