This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
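The Pooling module above has `pooling_mode_mean_tokens` set to True, so a sentence embedding is the average of its token embeddings. A minimal sketch of that idea in plain Python, assuming padding tokens are excluded through an attention mask; the function name `mean_pool` and the toy 3-dimensional vectors are illustrative and not the library's own code:

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy example: three real tokens and one padding token, 3 dimensions instead of 1024.
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 2.0, 2.0]
```

The padding vector `[9.0, 9.0, 9.0]` does not affect the result because its mask entry is 0.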
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "What factors are contributing to pressure on Apple's market share in China?",
    "The company forecast low-to-mid single-digit\nrevenue growth, in line with muted expectations. In China, Apple posted $16 billion in revenue, slightly\nabove forecasts, though competition from Huawei and slower AI\nrollout continue to pressure market share. If losses hold, Apple is on track to shed more than $150\nbillion in market value, while a bullish outlook from Microsoft\n<MSFT.O> earlier this week has helped the Windows-maker become\nthe world's most valuable company.",
    'With recent\nexchange rate fluctuations adding to the uncertainty, we are\ntaking a more cautious outlook for the near future." While Washington and Beijing on Monday agreed to slash\ntariffs for at least 90 days, the cheer over the temporary truce\nwas tempered by caution given a more permanent trade deal needs\nto be struck, while higher tariffs overall could still weigh on\nthe global economy. Most of the iPhones Foxconn makes for Apple are assembled in\nChina.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
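For semantic search, the similarity scores are used to rank passages against a query. The sketch below shows cosine similarity and ranking in plain Python, assuming cosine is the similarity in use (the loss configuration and metric names in this card refer to it). The vectors are hypothetical stand-ins for the output of `model.encode`, and `rank_passages` is an illustrative helper, not part of the library:

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query: List[float], passages: List[List[float]]) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, p)) for i, p in enumerate(passages)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional vectors standing in for 1024-dimensional embeddings.
query_vec = [1.0, 0.0, 1.0]
passage_vecs = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_passages(query_vec, passage_vecs)
print([index for index, _ in ranking])  # [1, 2, 0]
```

Passage 1 points in the same direction as the query, so it ranks first even though its magnitude differs.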
Evaluated with InformationRetrievalEvaluator:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3454 |
| cosine_accuracy@3 | 0.6057 |
| cosine_accuracy@5 | 0.7223 |
| cosine_accuracy@10 | 0.8465 |
| cosine_precision@1 | 0.3454 |
| cosine_precision@3 | 0.2019 |
| cosine_precision@5 | 0.1445 |
| cosine_precision@10 | 0.0847 |
| cosine_recall@1 | 0.3454 |
| cosine_recall@3 | 0.6057 |
| cosine_recall@5 | 0.7223 |
| cosine_recall@10 | 0.8465 |
| cosine_ndcg@10 | 0.5859 |
| cosine_mrr@10 | 0.5034 |
| cosine_map@100 | 0.5105 |
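A minimal sketch of how such metrics can be computed, assuming every query has exactly one relevant passage. That assumption is not stated in this card, but it is consistent with the table above, where recall@k equals accuracy@k and precision@k equals accuracy@k divided by k. The function name and the example ranks are hypothetical, and this is not the evaluator's implementation:

```python
import math
from typing import Dict, List


def metrics_at_k(ranks: List[int], k: int) -> Dict[str, float]:
    """ranks[i] is the 1-based rank of the single relevant passage for query i."""
    n = len(ranks)
    hits = [r for r in ranks if r <= k]
    accuracy = len(hits) / n              # relevant passage found in the top k
    precision = accuracy / k              # one relevant item among k results
    mrr = sum(1.0 / r for r in hits) / n  # reciprocal rank, 0 when missed
    # With one relevant passage the ideal DCG is 1, so NDCG is 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r in hits) / n
    return {"accuracy": accuracy, "precision": precision, "recall": accuracy,
            "mrr": mrr, "ndcg": ndcg}


# Hypothetical ranks for four queries (not taken from the evaluation data).
example = metrics_at_k([1, 3, 2, 12], k=10)
print(example["accuracy"], example["precision"])  # 0.75 0.075

# Compare accuracy@k / k with the reported precision@k values.
for k, acc in [(3, 0.6057), (5, 0.7223), (10, 0.8465)]:
    print(k, acc / k)
```

For instance, 0.6057 / 3 is about 0.2019, which matches the reported cosine_precision@3.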
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |

| sentence_0 | sentence_1 |
|---|---|
| By approximately what percentage did Meta's shares increase in after-hours trading following the announcement of its results? | Its shares |
| How have drugmakers responded to proposed tariffs on imported pharmaceutical products during the Commerce Department's investigation? | The move triggered a 21-day public comment period as part of |
| Which South American companies currently use the company's regional services, and what growth expectations does Estevez have for the area? | The company already has 36 regions and 114 availability |

MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
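The idea behind this loss can be sketched as cross-entropy over in-batch negatives: for each anchor, its own positive should score higher than every other positive in the batch, with cosine similarities multiplied by the scale of 20.0. The following is a plain-Python illustration under that reading, not the library's implementation; `in_batch_negatives_loss` and the toy vectors are hypothetical:

```python
import math
from typing import List


def cos_sim(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_negatives_loss(anchors: List[List[float]],
                            positives: List[List[float]],
                            scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of 3 pairs with 2-dimensional vectors.
anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned = [[2.0, 0.1], [0.1, 2.0], [1.0, 0.9]]
shuffled = [aligned[1], aligned[2], aligned[0]]
aligned_loss = in_batch_negatives_loss(anchors, aligned)
shuffled_loss = in_batch_negatives_loss(anchors, shuffled)
print(aligned_loss < shuffled_loss)  # True
```

With a batch of 3 pairs, each anchor is contrasted against only 2 in-batch negatives, so the number of negatives grows with the batch size.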
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 3
- per_device_eval_batch_size: 3
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|---|---|---|---|
| 0.0192 | 50 | - | 0.5170 |
| 0.0384 | 100 | - | 0.5279 |
| 0.0577 | 150 | - | 0.5324 |
| 0.0769 | 200 | - | 0.5336 |
| 0.0961 | 250 | - | 0.5456 |
| 0.1153 | 300 | - | 0.5535 |
| 0.1346 | 350 | - | 0.5507 |
| 0.1538 | 400 | - | 0.5532 |
| 0.1730 | 450 | - | 0.5591 |
| 0.1922 | 500 | 0.2091 | 0.5693 |
| 0.2115 | 550 | - | 0.5666 |
| 0.2307 | 600 | - | 0.5669 |
| 0.2499 | 650 | - | 0.5668 |
| 0.2691 | 700 | - | 0.5636 |
| 0.2884 | 750 | - | 0.5650 |
| 0.3076 | 800 | - | 0.5636 |
| 0.3268 | 850 | - | 0.5677 |
| 0.3460 | 900 | - | 0.5686 |
| 0.3652 | 950 | - | 0.5678 |
| 0.3845 | 1000 | 0.0546 | 0.5624 |
| 0.4037 | 1050 | - | 0.5659 |
| 0.4229 | 1100 | - | 0.5687 |
| 0.4421 | 1150 | - | 0.5704 |
| 0.4614 | 1200 | - | 0.5695 |
| 0.4806 | 1250 | - | 0.5702 |
| 0.4998 | 1300 | - | 0.5582 |
| 0.5190 | 1350 | - | 0.5703 |
| 0.5383 | 1400 | - | 0.5688 |
| 0.5575 | 1450 | - | 0.5722 |
| 0.5767 | 1500 | 0.0529 | 0.5673 |
| 0.5959 | 1550 | - | 0.5669 |
| 0.6151 | 1600 | - | 0.5597 |
| 0.6344 | 1650 | - | 0.5666 |
| 0.6536 | 1700 | - | 0.5626 |
| 0.6728 | 1750 | - | 0.5627 |
| 0.6920 | 1800 | - | 0.5641 |
| 0.7113 | 1850 | - | 0.5572 |
| 0.7305 | 1900 | - | 0.5632 |
| 0.7497 | 1950 | - | 0.5733 |
| 0.7689 | 2000 | 0.0478 | 0.5644 |
| 0.7882 | 2050 | - | 0.5658 |
| 0.8074 | 2100 | - | 0.5608 |
| 0.8266 | 2150 | - | 0.5687 |
| 0.8458 | 2200 | - | 0.5728 |
| 0.8651 | 2250 | - | 0.5581 |
| 0.8843 | 2300 | - | 0.5612 |
| 0.9035 | 2350 | - | 0.5616 |
| 0.9227 | 2400 | - | 0.5650 |
| 0.9419 | 2450 | - | 0.5626 |
| 0.9612 | 2500 | 0.0482 | 0.5665 |
| 0.9804 | 2550 | - | 0.5668 |
| 0.9996 | 2600 | - | 0.5552 |
| 1.0 | 2601 | - | 0.5556 |
| 1.0188 | 2650 | - | 0.5681 |
| 1.0381 | 2700 | - | 0.5620 |
| 1.0573 | 2750 | - | 0.5639 |
| 1.0765 | 2800 | - | 0.5646 |
| 1.0957 | 2850 | - | 0.5714 |
| 1.1150 | 2900 | - | 0.5748 |
| 1.1342 | 2950 | - | 0.5739 |
| 1.1534 | 3000 | 0.033 | 0.5630 |
| 1.1726 | 3050 | - | 0.5655 |
| 1.1918 | 3100 | - | 0.5711 |
| 1.2111 | 3150 | - | 0.5680 |
| 1.2303 | 3200 | - | 0.5742 |
| 1.2495 | 3250 | - | 0.5714 |
| 1.2687 | 3300 | - | 0.5657 |
| 1.2880 | 3350 | - | 0.5636 |
| 1.3072 | 3400 | - | 0.5701 |
| 1.3264 | 3450 | - | 0.5720 |
| 1.3456 | 3500 | 0.0276 | 0.5733 |
| 1.3649 | 3550 | - | 0.5738 |
| 1.3841 | 3600 | - | 0.5743 |
| 1.4033 | 3650 | - | 0.5702 |
| 1.4225 | 3700 | - | 0.5732 |
| 1.4418 | 3750 | - | 0.5705 |
| 1.4610 | 3800 | - | 0.5774 |
| 1.4802 | 3850 | - | 0.5735 |
| 1.4994 | 3900 | - | 0.5781 |
| 1.5186 | 3950 | - | 0.5691 |
| 1.5379 | 4000 | 0.0266 | 0.5729 |
| 1.5571 | 4050 | - | 0.5712 |
| 1.5763 | 4100 | - | 0.5685 |
| 1.5955 | 4150 | - | 0.5711 |
| 1.6148 | 4200 | - | 0.5712 |
| 1.6340 | 4250 | - | 0.5716 |
| 1.6532 | 4300 | - | 0.5762 |
| 1.6724 | 4350 | - | 0.5813 |
| 1.6917 | 4400 | - | 0.5822 |
| 1.7109 | 4450 | - | 0.5805 |
| 1.7301 | 4500 | 0.0337 | 0.5789 |
| 1.7493 | 4550 | - | 0.5745 |
| 1.7686 | 4600 | - | 0.5752 |
| 1.7878 | 4650 | - | 0.5780 |
| 1.8070 | 4700 | - | 0.5815 |
| 1.8262 | 4750 | - | 0.5833 |
| 1.8454 | 4800 | - | 0.5809 |
| 1.8647 | 4850 | - | 0.5711 |
| 1.8839 | 4900 | - | 0.5716 |
| 1.9031 | 4950 | - | 0.5816 |
| 1.9223 | 5000 | 0.0299 | 0.5815 |
| 1.9416 | 5050 | - | 0.5816 |
| 1.9608 | 5100 | - | 0.5847 |
| 1.9800 | 5150 | - | 0.5831 |
| 1.9992 | 5200 | - | 0.5847 |
| 2.0 | 5202 | - | 0.5859 |
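The Epoch column in the table above can be reproduced from the step count, and the same numbers give a rough picture of the training run. The sketch below assumes a single training device and the usual linear decay to zero for a linear scheduler with no warmup; the helper names are illustrative:

```python
STEPS_PER_EPOCH = 2601   # from the row "1.0 | 2601" in the table
TOTAL_STEPS = 5202       # from the final row "2.0 | 5202"
BATCH_SIZE = 3           # per_device_train_batch_size
BASE_LR = 5e-05          # learning_rate


def epoch_at(step: int) -> float:
    """Fraction of epochs completed after `step` optimizer steps."""
    return round(step / STEPS_PER_EPOCH, 4)


def linear_lr(step: int) -> float:
    """Linear decay from BASE_LR to 0 with no warmup (warmup_steps is 0)."""
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / TOTAL_STEPS)


print(epoch_at(50), epoch_at(500))   # 0.0192 0.1922
print(STEPS_PER_EPOCH * BATCH_SIZE)  # 7803
print(linear_lr(0), linear_lr(2601), linear_lr(5202))
```

Under these assumptions the training set holds roughly 7,800 sentence pairs (the last batch of an epoch may be partial), and the learning rate is half its initial value at the end of the first epoch.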
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}