Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper
•
1908.10084
•
Published
•
10
This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the all-nli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
# Run inference
sentences = [
"The cat sat on the windowsill, watching the birds outside.",
"Quantum computing has the potential to revolutionize cryptography.",
"A delicious homemade pizza requires fresh ingredients and patience.",
"The stock market fluctuates based on economic and political events.",
"Machine learning models improve with more diverse and high-quality data.",
"Quantum computing SOLVES many problems in stock market."
]
f_embeddings = finetuned_model.encode(sentences)
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# Get the similarity scores for the embeddings
f_similarities = finetuned_model.similarity(f_embeddings, f_embeddings)
print(f_similarities)
Below are the cosine similarity matrices before and after fine-tuning:
tensor([[1.0000, 0.9052, 0.9002, 0.9080, 0.8959, 0.8925],
[0.9052, 1.0000, 0.8940, 0.9162, 0.9148, 0.9144],
[0.9002, 0.8940, 1.0000, 0.8995, 0.9033, 0.8940],
[0.9080, 0.9162, 0.8995, 1.0000, 0.9209, 0.9153],
[0.8959, 0.9148, 0.9033, 0.9209, 1.0000, 0.9142],
[0.8925, 0.9144, 0.8940, 0.9153, 0.9142, 1.0000]])
tensor([[1.0000, 0.3817, 0.3830, 0.3936, 0.3612, 0.4211],
[0.3817, 1.0000, 0.4469, 0.5501, 0.5800, 0.6188],
[0.3830, 0.4469, 1.0000, 0.4487, 0.4868, 0.5096],
[0.3936, 0.5501, 0.4487, 1.0000, 0.5981, 0.5528],
[0.3612, 0.5800, 0.4868, 0.5981, 1.0000, 0.5553],
[0.4211, 0.6188, 0.5096, 0.5528, 0.5553, 1.0000]])
Here is a heatmap of the embedding similarity matrix after fine-tuning:
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ravi259/ModernBERT-base-nli-v2")
# Run inference
sentences = [
'A young boy playing in the grass.',
'There is a child in the grass.',
'The boy is in the sand.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
sts-dev and sts-testEmbeddingSimilarityEvaluator| Metric | sts-dev | sts-test |
|---|---|---|
| pearson_cosine | 0.7501 | 0.696 |
| spearman_cosine | 0.7643 | 0.6893 |
anchor, positive, and negative| anchor | positive | negative | |
|---|---|---|---|
| type | string | string | string |
| details |
|
|
|
| anchor | positive | negative |
|---|---|---|
A person on a horse jumps over a broken down airplane. |
A person is outdoors, on a horse. |
A person is at a diner, ordering an omelette. |
Children smiling and waving at camera |
There are children present |
The kids are frowning |
A boy is jumping on skateboard in the middle of a red bridge. |
The boy does a skateboarding trick. |
The boy skates down the sidewalk. |
MultipleNegativesRankingLoss with these parameters:{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
anchor, positive, and negative| anchor | positive | negative | |
|---|---|---|---|
| type | string | string | string |
| details |
|
|
|
| anchor | positive | negative |
|---|---|---|
Two women are embracing while holding to go packages. |
Two woman are holding packages. |
The men are fighting outside a deli. |
Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. |
Two kids in numbered jerseys wash their hands. |
Two kids in jackets walk to school. |
A man selling donuts to a customer during a world exhibition event held in the city of Angeles |
A man selling donuts to a customer. |
A woman drinks her coffee in a small cafe. |
MultipleNegativesRankingLoss with these parameters:{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
eval_strategy: stepsper_device_train_batch_size: 128per_device_eval_batch_size: 128num_train_epochs: 1warmup_ratio: 0.1fp16: Truebatch_sampler: no_duplicatesoverwrite_output_dir: Falsedo_predict: Falseeval_strategy: stepsprediction_loss_only: Trueper_device_train_batch_size: 128per_device_eval_batch_size: 128per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1.0num_train_epochs: 1max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.1warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Truefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseeval_use_gather_object: Falseaverage_tokens_across_devices: Falseprompts: Nonebatch_sampler: no_duplicatesmulti_dataset_batch_sampler: proportional| Epoch | Step | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|---|---|---|---|---|
| -1 | -1 | - | 0.5566 | - |
| 0.1266 | 10 | 2.9276 | 0.7376 | - |
| 0.2532 | 20 | 1.6373 | 0.7721 | - |
| 0.3797 | 30 | 1.5806 | 0.7676 | - |
| 0.5063 | 40 | 1.7071 | 0.7613 | - |
| 0.6329 | 50 | 1.7604 | 0.7640 | - |
| 0.7595 | 60 | 1.7851 | 0.7665 | - |
| 0.8861 | 70 | 1.9029 | 0.7643 | - |
| -1 | -1 | - | - | 0.6893 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
answerdotai/ModernBERT-base