This is a Cross Encoder model finetuned from cross-encoder/nli-deberta-v3-base using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("cross_encoder_model_id")

# Get scores for pairs of texts
pairs = [
    ['Route 309 is a Connecticut State Highway in the northwestern Hartford suburbs from Canton to Simsbury .', 'Route 309 runs a Canton State Highway in the northwestern Connecticut suburbs from Hartford to Simsbury .'],
    ['During the competition she lost 50-25 to Zimbabwe , 84-16 to Tanzania , 58-24 to South Africa .', 'During the competition , they lost 50-25 to Zimbabwe , 84-16 to Tanzania , 58-24 to South Africa .'],
    ['The latter study is one of the few prospective demonstrations that environmental stress with high blood pressure and LVH remains associated .', 'The latter study remains one of the few prospective demonstrations that environmental stress with high blood pressure and LVH is associated .'],
    ['The Marignane is located at Marseille Airport in Provence .', 'The Marignane is located in Marseille Provence Airport .'],
    ['Birleffi was of Italian descent and Roman - Catholic in a predominantly Protestant state .', 'Birleffi was of Italian ethnicity and Roman Catholic in a predominantly Protestant state .'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'Route 309 is a Connecticut State Highway in the northwestern Hartford suburbs from Canton to Simsbury .',
    [
        'Route 309 runs a Canton State Highway in the northwestern Connecticut suburbs from Hartford to Simsbury .',
        'During the competition , they lost 50-25 to Zimbabwe , 84-16 to Tanzania , 58-24 to South Africa .',
        'The latter study remains one of the few prospective demonstrations that environmental stress with high blood pressure and LVH is associated .',
        'The Marignane is located in Marseille Provence Airport .',
        'Birleffi was of Italian ethnicity and Roman Catholic in a predominantly Protestant state .',
    ],
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
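Conceptually, `rank` scores each (query, candidate) pair with `predict` and then sorts the candidates by descending score. A minimal pure-Python sketch of that sorting step, using made-up scores rather than real model output:

```python
# Sketch of what CrossEncoder.rank does after predict(): sort candidate
# indices by descending score. The scores here are illustrative only.
def rank_by_score(scores):
    """Return [{'corpus_id': ..., 'score': ...}] sorted best-first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]

scores = [0.93, 0.02, 0.11, 0.47, 0.05]  # hypothetical predict() output
ranking = rank_by_score(scores)
print([r["corpus_id"] for r in ranking])  # -> [0, 3, 2, 4, 1]
```

The `corpus_id` in each result is the index of the candidate in the input list, so the original texts can be recovered after ranking.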
Binary classification on `paws-val-judge`, evaluated with `CEBinaryClassificationEvaluator`:

| Metric | Value |
|---|---|
| accuracy | 0.9646 |
| accuracy_threshold | 0.0871 |
| f1 | 0.9605 |
| f1_threshold | 0.0871 |
| precision | 0.947 |
| recall | 0.9743 |
| average_precision | 0.987 |
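The evaluator derives these metrics by thresholding the model's scores against the gold labels. A self-contained sketch on toy scores and labels (not the actual PAWS data) shows how accuracy, precision, recall, and F1 follow from a threshold:

```python
def binary_metrics(scores, labels, threshold):
    """Compute accuracy/precision/recall/F1 by thresholding scores."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example with four pairs, thresholded at 0.0871 as in the table above.
print(binary_metrics([0.9, 0.03, 0.5, 0.04], [1, 0, 1, 1], 0.0871))
```

The `*_threshold` rows in the table report the score cutoffs that maximize accuracy and F1 respectively, which is why both metrics share the same 0.0871 here.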
The training data has columns sentence_0, sentence_1, and label:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Route 309 is a Connecticut State Highway in the northwestern Hartford suburbs from Canton to Simsbury . | Route 309 runs a Canton State Highway in the northwestern Connecticut suburbs from Hartford to Simsbury . | 0.0 |
| During the competition she lost 50-25 to Zimbabwe , 84-16 to Tanzania , 58-24 to South Africa . | During the competition , they lost 50-25 to Zimbabwe , 84-16 to Tanzania , 58-24 to South Africa . | 1.0 |
| The latter study is one of the few prospective demonstrations that environmental stress with high blood pressure and LVH remains associated . | The latter study remains one of the few prospective demonstrations that environmental stress with high blood pressure and LVH is associated . | 1.0 |
Loss: BinaryCrossEntropyLoss with these parameters:

```json
{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}
```
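With `pos_weight: null` and an Identity activation, this loss reduces to standard binary cross-entropy applied to the model's raw logit for each pair, as in `torch.nn.BCEWithLogitsLoss`. A pure-Python sketch of the per-pair loss (assuming one unnormalized score per pair):

```python
import math

def bce_with_logits(logit, label):
    """Binary cross-entropy on a raw logit: sigmoid, then negative
    log-likelihood. Mirrors BCEWithLogitsLoss with pos_weight=None."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid maps the logit into (0, 1)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# A confident correct prediction gives a small loss; a confident wrong one
# gives a large loss.
print(bce_with_logits(4.0, 1.0))  # small
print(bce_with_logits(4.0, 0.0))  # large
```

In practice PyTorch fuses the sigmoid and log for numerical stability, but the quantity computed is the same.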
Non-default hyperparameters:

- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | paws-val-judge_average_precision |
|---|---|---|---|
| 0.1852 | 500 | 0.3758 | - |
| 0.3704 | 1000 | 0.226 | - |
| 0.5556 | 1500 | 0.2176 | - |
| 0.7407 | 2000 | 0.1778 | - |
| 0.9259 | 2500 | 0.1757 | - |
| 1.0 | 2700 | - | 0.9826 |
| 1.1111 | 3000 | 0.1494 | - |
| 1.2963 | 3500 | 0.1271 | - |
| 1.4815 | 4000 | 0.1197 | - |
| 1.6667 | 4500 | 0.1263 | - |
| 1.8519 | 5000 | 0.116 | - |
| 2.0 | 5400 | - | 0.9852 |
| 2.0370 | 5500 | 0.1084 | - |
| 2.2222 | 6000 | 0.0707 | - |
| 2.4074 | 6500 | 0.0741 | - |
| 2.5926 | 7000 | 0.0713 | - |
| 2.7778 | 7500 | 0.0723 | - |
| 2.9630 | 8000 | 0.0727 | - |
| 3.0 | 8100 | - | 0.9870 |
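With `lr_scheduler_type: linear`, no warmup, and 8,100 total steps (3 epochs of 2,700 steps each, as the log shows), the learning rate decays linearly from the base 5e-05 to 0. A sketch of that schedule:

```python
def linear_lr(step, total_steps=8100, base_lr=5e-05, warmup_steps=0):
    """Linear schedule: ramp up over warmup_steps, then decay to 0
    at total_steps. Matches the run above, which uses no warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(linear_lr(0))     # full base LR at the start (warmup is 0 here)
print(linear_lr(4050))  # half the base LR at the midpoint
print(linear_lr(8100))  # 0.0 at the final step
```

This explains why the later training-loss rows shrink more slowly: by epoch 3 the optimizer is taking much smaller steps.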
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Base model: microsoft/deberta-v3-base