Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper
•
1908.10084
•
Published
•
11
This is a Cross Encoder model finetuned from BAAI/bge-reranker-v2-m3 using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("cross_encoder_model_id")
# Get scores for pairs of texts
pairs = [
['Who was the writer of the musical for which Edwin Gagiano received his Best Actor Award for the role of Alf Bueller?', "Back to the 80s (musical). Back to the 80's [sic] is a musical written by Neil Gooding with the original musical adaption made by Scott Copeman. It was later re-orchestrated and arranged by Brett Foster in 2003, just prior to the Australian Production. It was originally staged by Neil Gooding Productions Pty Ltd in Sydney, Australia in 2004. It is a popular show for school productions in the English speaking world."],
['Are Stephens Cur and Smooth Collie both herding dogs?', 'Stephens Cur. The Stephens Cur (a.k.a. Stephens\' Stock Cur), is a scent hound that belongs to the Cur dog breed. They were originally bred by the Stephens family in southeastern Kentucky. The dogs known as "Little black dog" were bred by generations of that family for over a century. In 1970, they were recognized as separate and distinct breed of Cur. The dog is mostly black with white markings, but more than a third white is not permissible. It is good for hunting raccoon and squirrel, but can also be used to bay wild boar. They are registered with the United Kennel Club'],
['What was the name of the radio telescope the surpassed the Dwingeloo Radio Observatory?', 'Hoopoe-billed ʻakialoa. The hoopoe-billed ʻakialoa, ("Akialoa upupirostris"), was an extinct species of Hawaiian honeycreeper. Fossil remains have been found of this species in the Hawaiian islands of Kauai and Oahu. The species specific name, "upupirostris", is derived from the Latin "upupa", hoopoe, and "rostrum", bill, and refers to the long sickle-shaped bill which resembles that of the hoopoe. The species was apparently slightly larger than others in its genus. A similar but smaller bird has been discovered but is as yet undescribed from the island of Maui. The species presumably became extinct after the arrival of humans in Hawaii, and is known only from the fossil record.'],
['Are both Robert Stevenson and Am Rong a filmmaker?', 'Will Arnett. William Emerson Arnett ( ; born May 4, 1970) is a Canadian-American actor, voice actor and comedian. He is best known for his role as George Oscar "Gob" Bluth II in the Fox/Netflix series "Arrested Development" (2003–2006, 2013, 2018); as well as his titular role as BoJack Horseman in the Netflix Original Series of the same name (2014-present). He has appeared in films such as "Blades of Glory" (2007), "Hot Rod" (2007) and "Teenage Mutant Ninja Turtles" (2014).'],
['Who founded this American guitar manufacturer headquartered in Maryland that produced electric baritone guitars?', 'Baritone guitar. The baritone guitar is a guitar with a longer scale length, typically a larger body, and heavier internal bracing, so it can be tuned to a lower pitch. Gretsch, Fender, Gibson, Ibanez, ESP Guitars, PRS Guitars, Music Man, Danelectro, Schecter, Jerry Jones Guitars, Burns London and many other companies have produced electric baritone guitars since the 1960s, although always in small numbers due to low popularity. Tacoma, Santa Cruz, Taylor, Martin, Alvarez Guitars and others have made acoustic baritone guitars.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)
# Or rank different texts based on similarity to a single text
ranks = model.rank(
'Who was the writer of the musical for which Edwin Gagiano received his Best Actor Award for the role of Alf Bueller?',
[
"Back to the 80s (musical). Back to the 80's [sic] is a musical written by Neil Gooding with the original musical adaption made by Scott Copeman. It was later re-orchestrated and arranged by Brett Foster in 2003, just prior to the Australian Production. It was originally staged by Neil Gooding Productions Pty Ltd in Sydney, Australia in 2004. It is a popular show for school productions in the English speaking world.",
'Stephens Cur. The Stephens Cur (a.k.a. Stephens\' Stock Cur), is a scent hound that belongs to the Cur dog breed. They were originally bred by the Stephens family in southeastern Kentucky. The dogs known as "Little black dog" were bred by generations of that family for over a century. In 1970, they were recognized as separate and distinct breed of Cur. The dog is mostly black with white markings, but more than a third white is not permissible. It is good for hunting raccoon and squirrel, but can also be used to bay wild boar. They are registered with the United Kennel Club',
'Hoopoe-billed ʻakialoa. The hoopoe-billed ʻakialoa, ("Akialoa upupirostris"), was an extinct species of Hawaiian honeycreeper. Fossil remains have been found of this species in the Hawaiian islands of Kauai and Oahu. The species specific name, "upupirostris", is derived from the Latin "upupa", hoopoe, and "rostrum", bill, and refers to the long sickle-shaped bill which resembles that of the hoopoe. The species was apparently slightly larger than others in its genus. A similar but smaller bird has been discovered but is as yet undescribed from the island of Maui. The species presumably became extinct after the arrival of humans in Hawaii, and is known only from the fossil record.',
'Will Arnett. William Emerson Arnett ( ; born May 4, 1970) is a Canadian-American actor, voice actor and comedian. He is best known for his role as George Oscar "Gob" Bluth II in the Fox/Netflix series "Arrested Development" (2003–2006, 2013, 2018); as well as his titular role as BoJack Horseman in the Netflix Original Series of the same name (2014-present). He has appeared in films such as "Blades of Glory" (2007), "Hot Rod" (2007) and "Teenage Mutant Ninja Turtles" (2014).',
'Baritone guitar. The baritone guitar is a guitar with a longer scale length, typically a larger body, and heavier internal bracing, so it can be tuned to a lower pitch. Gretsch, Fender, Gibson, Ibanez, ESP Guitars, PRS Guitars, Music Man, Danelectro, Schecter, Jerry Jones Guitars, Burns London and many other companies have produced electric baritone guitars since the 1960s, although always in small numbers due to low popularity. Tacoma, Santa Cruz, Taylor, Martin, Alvarez Guitars and others have made acoustic baritone guitars.',
]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
validation and train_subsetCEBinaryClassificationEvaluator| Metric | validation | train_subset |
|---|---|---|
| accuracy | 0.99 | 0.942 |
| accuracy_threshold | 0.3649 | 0.7961 |
| f1 | 0.99 | 0.9437 |
| f1_threshold | 0.3649 | 0.6639 |
| precision | 0.9933 | 0.9346 |
| recall | 0.9867 | 0.9529 |
| average_precision | 0.9991 | 0.9847 |
sentence_0, sentence_1, and label| sentence_0 | sentence_1 | label | |
|---|---|---|---|
| type | string | string | float |
| details |
|
|
|
| sentence_0 | sentence_1 | label |
|---|---|---|
Who was the writer of the musical for which Edwin Gagiano received his Best Actor Award for the role of Alf Bueller? |
Back to the 80s (musical). Back to the 80's [sic] is a musical written by Neil Gooding with the original musical adaption made by Scott Copeman. It was later re-orchestrated and arranged by Brett Foster in 2003, just prior to the Australian Production. It was originally staged by Neil Gooding Productions Pty Ltd in Sydney, Australia in 2004. It is a popular show for school productions in the English speaking world. |
1.0 |
Are Stephens Cur and Smooth Collie both herding dogs? |
Stephens Cur. The Stephens Cur (a.k.a. Stephens' Stock Cur), is a scent hound that belongs to the Cur dog breed. They were originally bred by the Stephens family in southeastern Kentucky. The dogs known as "Little black dog" were bred by generations of that family for over a century. In 1970, they were recognized as separate and distinct breed of Cur. The dog is mostly black with white markings, but more than a third white is not permissible. It is good for hunting raccoon and squirrel, but can also be used to bay wild boar. They are registered with the United Kennel Club |
1.0 |
What was the name of the radio telescope the surpassed the Dwingeloo Radio Observatory? |
Hoopoe-billed ʻakialoa. The hoopoe-billed ʻakialoa, ("Akialoa upupirostris"), was an extinct species of Hawaiian honeycreeper. Fossil remains have been found of this species in the Hawaiian islands of Kauai and Oahu. The species specific name, "upupirostris", is derived from the Latin "upupa", hoopoe, and "rostrum", bill, and refers to the long sickle-shaped bill which resembles that of the hoopoe. The species was apparently slightly larger than others in its genus. A similar but smaller bird has been discovered but is as yet undescribed from the island of Maui. The species presumably became extinct after the arrival of humans in Hawaii, and is known only from the fossil record. |
0.0 |
BinaryCrossEntropyLoss with these parameters:{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
eval_strategy: stepsper_device_train_batch_size: 2per_device_eval_batch_size: 2overwrite_output_dir: Falsedo_predict: Falseeval_strategy: stepsprediction_loss_only: Trueper_device_train_batch_size: 2per_device_eval_batch_size: 2per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1num_train_epochs: 3max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.0warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Falsefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Falsehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseeval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseeval_use_gather_object: Falseprompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: proportionalrouter_mapping: {}learning_rate_mapping: {}| Epoch | Step | validation_average_precision | train_subset_average_precision |
|---|---|---|---|
| 0.025 | 100 | 0.9994 | 0.9911 |
| 0.05 | 200 | 0.9983 | 0.9895 |
| 0.075 | 300 | 0.9981 | 0.9887 |
| 0.1 | 400 | 0.9991 | 0.9847 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model
BAAI/bge-reranker-v2-m3