Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper • 1908.10084 • Published • 12
This is a Cross Encoder model finetuned from cross-encoder/ms-marco-MiniLM-L6-v2 using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("cross_encoder_model_id")
# Get scores for pairs of texts
pairs = [
['How to implement error handling', 'Implementation guide for error handling. Step-by-step instructions with code examples. Covers setup, configuration, and best practices.'],
['Where is UserModel defined', 'Source code location and module structure. Class definitions and interface documentation. File paths and import statements.'],
['Add metrics collection to scheduler', 'Feature implementation guide with API extensions. Configuration options and customization points. Testing requirements.'],
['Fix bug in data processor', 'Company holiday schedule and PTO policy. HR contact information.'],
['Refactor authentication middleware for better readability', 'Refactoring patterns and code improvement strategies. Before/after examples with measurable improvements. Migration guide.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)
# Or rank different texts based on similarity to a single text
ranks = model.rank(
'How to implement error handling',
[
'Implementation guide for error handling. Step-by-step instructions with code examples. Covers setup, configuration, and best practices.',
'Source code location and module structure. Class definitions and interface documentation. File paths and import statements.',
'Feature implementation guide with API extensions. Configuration options and customization points. Testing requirements.',
'Company holiday schedule and PTO policy. HR contact information.',
'Refactoring patterns and code improvement strategies. Before/after examples with measurable improvements. Migration guide.',
]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
valCECorrelationEvaluator| Metric | Value |
|---|---|
| pearson | 0.9929 |
| spearman | 0.9384 |
sentence_0, sentence_1, and label| sentence_0 | sentence_1 | label | |
|---|---|---|---|
| type | string | string | float |
| details |
|
|
|
| sentence_0 | sentence_1 | label |
|---|---|---|
How to implement error handling |
Implementation guide for error handling. Step-by-step instructions with code examples. Covers setup, configuration, and best practices. |
1.0 |
Where is UserModel defined |
Source code location and module structure. Class definitions and interface documentation. File paths and import statements. |
1.0 |
Add metrics collection to scheduler |
Feature implementation guide with API extensions. Configuration options and customization points. Testing requirements. |
1.0 |
BinaryCrossEntropyLoss with these parameters:{
"activation_fn": "torch.nn.modules.linear.Identity",
"pos_weight": null
}
per_device_train_batch_size: 16eval_strategy: stepsper_device_eval_batch_size: 16per_device_train_batch_size: 16num_train_epochs: 3max_steps: -1learning_rate: 5e-05lr_scheduler_type: linearlr_scheduler_kwargs: Nonewarmup_steps: 0optim: adamw_torch_fusedoptim_args: Noneweight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08optim_target_modules: Nonegradient_accumulation_steps: 1average_tokens_across_devices: Truemax_grad_norm: 1label_smoothing_factor: 0.0bf16: Falsefp16: Falsebf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonegradient_checkpointing: Falsegradient_checkpointing_kwargs: Nonetorch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Noneuse_liger_kernel: Falseliger_kernel_config: Noneuse_cache: Falseneftune_noise_alpha: Nonetorch_empty_cache_steps: Noneauto_find_batch_size: Falselog_on_each_node: Truelogging_nan_inf_filter: Trueinclude_num_input_tokens_seen: nolog_level: passivelog_level_replica: warningdisable_tqdm: Falseproject: huggingfacetrackio_space_id: trackioeval_strategy: stepsper_device_eval_batch_size: 16prediction_loss_only: Trueeval_on_start: Falseeval_do_concat_batches: Trueeval_use_gather_object: Falseeval_accumulation_steps: Noneinclude_for_metrics: []batch_eval_metrics: Falsesave_only_model: Falsesave_on_each_node: Falseenable_jit_checkpoint: Falsepush_to_hub: Falsehub_private_repo: Nonehub_model_id: Nonehub_strategy: every_savehub_always_push: Falsehub_revision: Noneload_best_model_at_end: Falseignore_data_skip: Falserestore_callback_states_from_checkpoint: Falsefull_determinism: Falseseed: 42data_seed: Noneuse_cpu: Falseaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}parallelism_config: Nonedataloader_drop_last: Falsedataloader_num_workers: 0dataloader_pin_memory: Truedataloader_persistent_workers: Falsedataloader_prefetch_factor: Noneremove_unused_columns: Truelabel_names: Nonetrain_sampling_strategy: randomlength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falseddp_backend: Noneddp_timeout: 1800fsdp: []fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}deepspeed: Nonedebug: []skip_memory_metrics: Truedo_predict: Falseresume_from_checkpoint: Nonewarmup_ratio: Nonelocal_rank: -1prompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: proportionalrouter_mapping: {}learning_rate_mapping: {}| Epoch | Step | val_spearman |
|---|---|---|
| 1.0 | 100 | 0.9382 |
| 2.0 | 200 | 0.9384 |
| 3.0 | 300 | 0.9384 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model
microsoft/MiniLM-L12-H384-uncased