This is a sentence-transformers model finetuned from Alibaba-NLP/gte-modernbert-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
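In the Pooling module above, only pooling_mode_cls_token is enabled, so the sentence embedding is taken from the first token's hidden state rather than from an average over all tokens. The following is a minimal pure-Python sketch of that difference; the helper names cls_pool and mean_pool and the toy 4-dimensional vectors are illustrative assumptions, not part of the library or of this model.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling: use the first token's vector as the sentence embedding."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    """Mean pooling, shown only for contrast (disabled in this model's config)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


# Toy hidden states for three tokens (hypothetical values, 4 dims instead of 768).
tokens = [
    [0.1, 0.2, 0.3, 0.4],  # first token ([CLS])
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]

cls_embedding = cls_pool(tokens)
mean_embedding = mean_pool(tokens)
print(cls_embedding)
print(mean_embedding)
```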
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Instruct: Identify solution for the email\nQuery: How to grant Placer.ai access via Entra group App_PlacerAI',
'::: table-wrap\n+----------------------+-----------------------------------------------------+\n| **Application Name** | Placer.ai |\n+----------------------+-----------------------------------------------------+\n| **Description** | Media tool |\n+----------------------+-----------------------------------------------------+\n| **Owner** | Lou Roberts |\n+----------------------+-----------------------------------------------------+\n| **Requester** | |\n+----------------------+-----------------------------------------------------+\n| **Service** | |\n+----------------------+-----------------------------------------------------+\n| **On this page** | ::: {.toc-macro .rbtoc1758184816272} |\n| | - [🚪 Access](#Placer.AI-Access) |\n| | - [🧑\u200d🤝\u200d🧑 User Management](#Placer.AI-UserManagement) |\n| | - [Known Issues](#Placer.AI-KnownIssues) |\n| | ::: |\n+----------------------+-----------------------------------------------------+\n:::\n\n## {.emoticon .emoticon-blue-star emoji-id="1f6aa" emoji-shortname=":door:" emoji-fallback="🚪" width="16" height="16" emoticon-name="blue-star"} Access {#Placer.AI-Access}\n\nAdd user to App_PlacerAI group in Entra.\n\nThere are no licensing limits, so you can add anyone from Media or related departments who requests access. If questions, ask the app owner above.\n\n## {.emoticon .emoticon-blue-star emoji-id="1f9d1-200d-1f91d-200d-1f9d1" emoji-shortname=":people_holding_hands:" emoji-fallback="🧑\u200d🤝\u200d🧑" width="16" height="16" emoticon-name="blue-star"} User Management {#Placer.AI-UserManagement}\n\n::: table-wrap\n -------------- -------------------------------------------------------------------\n **Action** **Function**\n Add_User Add user to group and have them sign in. It has JIT provisioning.\n Remove_User Manual via portal\n Disable_User Remove from App_PlacerAI group\n -------------- -------------------------------------------------------------------\n:::\n\n## {.emoticon .emoticon-broken-heart emoji-id="1f494" emoji-shortname=":broken_heart:" emoji-fallback="💔" width="16" height="16" emoticon-name="broken-heart"} Known Issues {#Placer.AI-KnownIssues}\n\n::: table-wrap\n ----------- -------------\n **Issue** **Details**\n \n \n ----------- -------------\n:::',
'## **Computer Setup** {#CreatingaUSBdrivewithOSDCloud-ComputerSetup}\n\n> NOTE: Your machine needs to be able to write unencrypted files to a USB drive. If your computer is not already in Device_USB_Encryption_Exclusion, add it.\n\nOn the machine that will create the USB drive:\n\nInstall Windows 11 24H2 ADK: [https://go.microsoft.com/fwlink/?linkid=2289980](https://go.microsoft.com/fwlink/?linkid=2289980){.external-link rel="nofollow"}\n\n- Install locally when prompted - we want to install on this machine\n\n- You can accept all the defaults when installing\n\nInstall Windows PE add-on: [https://go.microsoft.com/fwlink/?linkid=2289981](https://go.microsoft.com/fwlink/?linkid=2289981){.external-link rel="nofollow"}\n\n- There are no settings, so you can click Next to install.\n\n## **Configuring a USB drive with OSD Cloud** {#CreatingaUSBdrivewithOSDCloud-ConfiguringaUSBdrivewithOSDCloud}\n\n1. Copy this script on a PC:\n\n[https://glondon.sharepoint.com/:u:/r/sites/Group_IT/Shared%20Documents/General/OSDCloud/OSDCloud%20Automated%20USB%20with%20User%20Prompt.ps1?csf=1&web=1&e=6FRniH](https://glondon.sharepoint.com/:u:/r/sites/Group_IT/Shared%20Documents/General/OSDCloud/OSDCloud%20Automated%20USB%20with%20User%20Prompt.ps1?csf=1&web=1&e=6FRniH){.external-link card-appearance="inline" rel="nofollow"}\n\n::: {.confluence-information-macro .confluence-information-macro-information}\n[]{.aui-icon .aui-icon-small .aui-iconfont-info .confluence-information-macro-icon}\n\n::: confluence-information-macro-body\nIt\'s stored in the team "Gravity -- Group IT" so use your Gravity Global account to access the link\n:::\n:::\n\n2. Open PowerShell ISE as an administrator\n\n3. Paste the script in PowerShell ISE and run (or you can download the file and run it). This script will remind you to have a 32GB flash drive connected to the PC.\n\n1. NOTE: The initial popup may appear behind the ISE window. If it seems stuck, there is probably a popup hiding.\n\n2. If you use Windows PowerShell instead (as admin), the first popup message will fail. You can ignore the error and it will proceed automatically.\n\n::: {.confluence-information-macro .confluence-information-macro-information}\n[]{.aui-icon .aui-icon-small .aui-iconfont-info .confluence-information-macro-icon}\n\n::: confluence-information-macro-body\nIt\'s probably best to disconnect other USB flash drives to not accidentally format the wrong flash drive\n:::\n:::\n\n4. The script will also prompt for a name for the OSDCloud Template. Follow the naming scheme Win\\_(version of Windows)\\_(Build version)\\_(Current month and year).\n\n1. Example: Win_11_24H2_Feb25.\n\n5. In the console of PowerShell ISE, eventually you\'ll be asked to confirm which USB Drive you want to format.\n\n1. If you only have one USB drive connected, it\'s the only USB drive listed. If you have more than one USB drive connected, be careful and choose the correct drive.\n\n## **Dell BIOS Configuration Files** {#CreatingaUSBdrivewithOSDCloud-DellBIOSConfigurationFiles}\n\nAdd these two files to the root of the OSDCloud drive. (Note there are two partitions on the USB stick - you want OSDCloud)\n\n- CCTK Folder: [https://glondon.sharepoint.com/:f:/r/sites/Group_IT/Shared%20Documents/General/OSDCloud/CCTK?csf=1&web=1&e=q5sltN](https://glondon.sharepoint.com/:f:/r/sites/Group_IT/Shared%20Documents/General/OSDCloud/CCTK?csf=1&web=1&e=q5sltN){.external-link card-appearance="inline" rel="nofollow"}\n\n- BIOSChanges.ini: [https://glondon.sharepoint.com/:u:/r/sites/Group_IT/Shared%20Documents/General/OSDCloud/BIOSChanges.ini?csf=1&web=1&e=P8ufTf](https://glondon.sharepoint.com/:u:/r/sites/Group_IT/Shared%20Documents/General/OSDCloud/BIOSChanges.ini?csf=1&web=1&e=P8ufTf){.external-link card-appearance="inline" rel="nofollow"}\n\nBoth the folder and file should be placed in the root of the USB drive.\n\n## Dell Command Update and other drivers {#CreatingaUSBdrivewithOSDCloud-DellCommandUpdateandotherdrivers}\n\nOn your OSDCloud USB stick:\n\n- There are two partitions - choose the WINPE "drive" in Windows Explorer\n\n- Create a folder called "Dell_Drivers" in the root\n\n- Download and copy three files there:\n\n- Installer for latest Dell Command Update (Windows Universal Application) [https://www.dell.com/support/kbdoc/en-us/000177325/dell-command-update](https://www.dell.com/support/kbdoc/en-us/000177325/dell-command-update){.external-link card-appearance="inline" rel="nofollow"}\n\n- .NET Desktop Runtime (the version required by Dell Command Update; usually the LTS version): [https://dotnet.microsoft.com/en-us/download/dotnet/8.0](https://dotnet.microsoft.com/en-us/download/dotnet/8.0){.external-link card-appearance="inline" rel="nofollow"}\n\n- Note that you want the DESKTOP RUNTIME, not the SDK, nor the CLI runtime. Annoying.\n\n- Latest Intel NPU driver: [https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html](https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html){.external-link card-appearance="inline" rel="nofollow"}\n\n- This is only necessary so long as Dell Command Update still installs a version with bugs. This shouldn\'t be necessary past 2025, we hope.\n\n## Rebuild the computer {#CreatingaUSBdrivewithOSDCloud-Rebuildthecomputer}\n\nIf rebuilding a PC already enrolled in Gravity Global, follow this article:\n\n[PC Rebuild Procedure](https://sitesandservices.atlassian.net/wiki/spaces/IT/pages/4294672385/PC+Rebuild+Procedure){linked-resource-id="4294672385" linked-resource-version="11" linked-resource-type="page"}\n\nIf needing to rebuild a PC and enroll it in Gravity Global, follow this article:\n\n[Enrolling a PC in Gravity from other organizations](https://sitesandservices.atlassian.net/wiki/spaces/IT/pages/4294311952/Enrolling+a+PC+in+Gravity+from+other+organizations){linked-resource-id="4294311952" linked-resource-version="10" linked-resource-type="page"}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7865, 0.1898],
# [0.7865, 1.0000, 0.3530],
# [0.1898, 0.3530, 1.0000]])
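The first example sentence is a query wrapped in an "Instruct: ... / Query: ..." template, and the other two are documents. Below is a small sketch of how such a query string could be assembled and how the scores in the first row of the matrix above translate into a document ranking. The helper names build_query and rank_by_score are hypothetical, and treating this template as required for every query is an assumption based only on the example.

```python
from typing import List


def build_query(task: str, query: str) -> str:
    """Format a query the way the first example sentence is written."""
    return f"Instruct: {task}\nQuery: {query}"


def rank_by_score(scores: List[float]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


query_text = build_query(
    "Identify solution for the email",
    "How to grant Placer.ai access via Entra group App_PlacerAI",
)

# Query-vs-document scores copied from the first row of the similarity
# matrix above (columns 1 and 2, i.e. the two documents).
query_scores = [0.7865, 0.1898]
ranking = rank_by_score(query_scores)
print(ranking)
```

In this example the Placer.ai page (document 0) ranks above the OSDCloud page (document 1), which matches the topic of the query.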
Dataset: validation. Evaluated with EmbeddingSimilarityEvaluator.
| Metric | Value |
|---|---|
| pearson_cosine | nan |
| spearman_cosine | nan |
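Both metrics correlate the model's cosine similarities with reference scores. A correlation coefficient is undefined when either input has zero variance, which is one way a nan value can arise; whether that is what happened in this run is not stated in the card, so the sketch below only illustrates the general behaviour. The functions are a pure-Python approximation written for illustration, not the evaluator's implementation.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns nan when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


cosine_scores = [0.9, 0.2, 0.5]            # hypothetical model similarities
varied_labels = [1.0, 0.0, 0.5]            # hypothetical gold scores
constant_labels = [1.0, 1.0, 1.0]          # every pair has the same gold score

print(spearman(cosine_scores, varied_labels))
print(spearman(cosine_scores, constant_labels))
```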
Columns: sentence_0, sentence_1, and sentence_2
| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
| details | | | |
| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| Instruct: Identify solution for the email | ::: plugin_pagetree | For systems running in AWS Elastic Container Service (ECS), often in Docker (and Fargate, for searching), you sometimes need to run console or command line commands in the Docker container while it is running. |
| Instruct: Identify solution for the email | AWS Organizations made it simpler for us to have many accounts, one for each purpose, instead of trying to set up byzantine permissions to segregate people within one account. For example, creating a limited admin in an account meant for one thing is much, much easier than trying to let people do things like create everything (RDS, EC2, S3) but not be able to touch the ones other people create. | ::: table-wrap |
| Instruct: Identify solution for the email | When creating a new Purchase Order, you may need to get a quote from a supplier before placing the order, other times the price and product are already known. | ::: table-wrap |
MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
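To make the two numeric parameters concrete: with similarity_fct set to cos_sim and scale set to 20.0, the loss for one anchor can be read as a softmax cross-entropy over scaled cosine similarities, where the matching passage is the correct class and every other candidate is a negative. The following is a simplified single-anchor sketch in pure Python; the function names and toy 2-dimensional vectors are illustrative assumptions, and the library version works on batched tensors rather than Python lists.

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ranking_loss(anchor: Vector, candidates: List[Vector],
                 positive_index: int = 0, scale: float = 20.0) -> float:
    """Cross-entropy of the positive candidate under softmax(scale * cos_sim)."""
    logits = [scale * cos_sim(anchor, c) for c in candidates]
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum_exp - logits[positive_index]


anchor = [1.0, 0.0]
aligned = [2.0, 0.0]      # same direction as the anchor, cosine 1
orthogonal = [0.0, 3.0]   # unrelated direction, cosine 0

# Positive already closest to the anchor: loss is close to zero.
easy_loss = ranking_loss(anchor, [aligned, orthogonal], positive_index=0)
# Positive is the orthogonal vector: loss is close to scale * (1 - 0) = 20.
hard_loss = ranking_loss(anchor, [orthogonal, aligned], positive_index=0)
print(easy_loss, hard_loss)
```

A larger scale sharpens the softmax, so a mis-ranked positive is penalised more heavily. Note that with the three-column data shown above and a per-device batch size of 1, the candidate set for each anchor would presumably be only its own sentence_1 and sentence_2; that reading is an inference from the listed settings, not something the card states.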
Non-default hyperparameters:
- eval_strategy: steps
- per_device_train_batch_size: 1
- per_device_eval_batch_size: 1
- num_train_epochs: 5
- fp16: True
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 1
- per_device_eval_batch_size: 1
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:
| Epoch | Step | validation_spearman_cosine |
|---|---|---|
| 0.0736 | 50 | nan |
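The single log row is enough to back out a rough size for the training run. The arithmetic below assumes the logged epoch value is simply step divided by steps per epoch, rounded to four decimals, and that training ran on one device with the listed batch size of 1 and gradient_accumulation_steps of 1; none of these assumptions is confirmed by the card, so the results are estimates only.

```python
# Values copied from the training log row and the hyperparameter list.
logged_step = 50
logged_epoch = 0.0736
num_train_epochs = 5
per_device_train_batch_size = 1

# Estimated optimizer steps in one epoch (assumes epoch = step / steps_per_epoch).
steps_per_epoch = round(logged_step / logged_epoch)

# With batch size 1 on a single device, steps per epoch approximates the
# number of training samples (an assumption, see the note above).
approx_train_samples = steps_per_epoch * per_device_train_batch_size

# Estimated total steps for the full schedule.
approx_total_steps = steps_per_epoch * num_train_epochs

print(steps_per_epoch, approx_train_samples, approx_total_steps)
```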
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
answerdotai/ModernBERT-base