Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper: arXiv:1908.10084
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
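The Pooling module averages the token embeddings produced by the MPNet transformer (mean pooling), and the Normalize module L2-normalizes the result so that dot products equal cosine similarities. As a minimal sketch of the equivalent computation with the plain transformers library (the base model id is used here for illustration; substitute the fine-tuned checkpoint in practice):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Illustration only: the base model id; swap in the fine-tuned checkpoint.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-mpnet-base-v2")

sentences = ["An example sentence.", "Another example sentence."]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=384, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average token embeddings, ignoring padding positions.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Normalize: L2-normalize so dot product equals cosine similarity.
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
print(sentence_embeddings.shape)  # torch.Size([2, 768])
```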
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'specifically, the proposed regulations proposed the addition of new sec. 300.13 to the user fee regulations to establish a $67 user fee for issuing an estate tax closing letter for an estate.',
'additionally, the preamble to the proposed regulations explains the special benefits conferred by the issuance of estate tax closing letters and analyzes how the irs has computed that the full cost of issuing an estate tax closing letter is $67.',
'with respect to whether and how the partnership allocates the rehabilitation credit to partners, the comment specifically asked ``whether the partners are allocated 20 percent of the credit each year although all of the credit basis is reduced in the first year when the property is placed in service or whether, after the first year, the remaining four years over which the credit is spread is taken into account and applied solely at the partner level over those remaining years, consistent with the section 1.50-1 regulations.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
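Because the embeddings are normalized, cosine similarity can be used directly for semantic search over a document collection. A hedged sketch using the same placeholder model id as above (the corpus and query strings are purely illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id, as above

# Illustrative corpus and query; replace with your own documents.
corpus = [
    "the irs charges a $67 user fee for issuing an estate tax closing letter.",
    "the partnership allocates the rehabilitation credit to its partners.",
]
query = "how much does an estate tax closing letter cost?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# semantic_search returns, per query, a ranked list of {'corpus_id', 'score'} dicts.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")
```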
Columns: sentence_0 and sentence_1

|  | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |

Samples:
| sentence_0 | sentence_1 |
|---|---|
| in such cases, the staff will accept the use of the simplified method for only some but not all share option grants. | if a company uses this simplified method, the company should disclose in the notes to its financial statements the use of the method, the reason why the method was used, the types of share option grants for which the method was used if the method was not used for all share option grants, and the periods for which the method was used if the method was not used in all periods. |
| background subject to various exceptions, section 6033(a)(1) of the internal revenue code (code) requires every organization exempt from taxation under section 501(a) (tax-exempt organization) to file an annual return, stating specifically the items of gross income, receipts, and disbursements, and such other information for the purpose of carrying out the internal revenue laws as the secretary of the treasury or his delegate (secretary) may by forms or regulations prescribe, and keep such records, render under oath such statements, make such other returns, and comply with such rules and regulations as the secretary may from time to time prescribe. | the annual information returns required under section 6033 are forms 990, |
| interpretive response: no. before becoming a public entity, company a did not use the fair-value-based method for either its share options or its liability awards. | \12\ \12\ this view is consistent with the fasb's basis for rejecting full retrospective application of fasb asc topic 718 as described in the basis for conclusions of statement 123r, paragraph b251. |
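The training samples above are plain (sentence_0, sentence_1) text pairs. As a minimal sketch, assuming the pairs are available as in-memory Python lists (the strings below are placeholders), such a pair dataset can be assembled with the datasets library for use with the trainer:

```python
from datasets import Dataset

# Hypothetical in-memory pairs; in practice these are loaded from your own source.
pairs = {
    "sentence_0": ["first anchor sentence ...", "second anchor sentence ..."],
    "sentence_1": ["its related sentence ...", "its related sentence ..."],
}
train_dataset = Dataset.from_dict(pairs)
print(train_dataset)  # Dataset({features: ['sentence_0', 'sentence_1'], num_rows: 2})
```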
MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
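MultipleNegativesRankingLoss treats each sentence_1 as the positive for its paired sentence_0 and uses the other sentence_1 entries in the same batch as in-batch negatives. A minimal sketch of instantiating it with the parameters above (scale 20.0, cosine similarity):

```python
from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Ranks the paired sentence_1 above all other in-batch sentence_1 entries,
# scoring pairs with cosine similarity scaled by 20.0.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
```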
Non-default training hyperparameters:

- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- fp16: True
- multi_dataset_batch_sampler: round_robin

All training hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

Training loss by epoch and step:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0833 | 500 | 0.5984 |
| 0.1666 | 1000 | 0.419 |
| 0.2500 | 1500 | 0.3454 |
| 0.3333 | 2000 | 0.3111 |
| 0.4166 | 2500 | 0.2628 |
| 0.4999 | 3000 | 0.2747 |
| 0.5832 | 3500 | 0.2567 |
| 0.6666 | 4000 | 0.2184 |
| 0.7499 | 4500 | 0.1802 |
| 0.8332 | 5000 | 0.1796 |
| 0.9165 | 5500 | 0.174 |
| 0.9998 | 6000 | 0.1742 |
| 1.0832 | 6500 | 0.1043 |
| 1.1665 | 7000 | 0.1011 |
| 1.2498 | 7500 | 0.1193 |
| 1.3331 | 8000 | 0.1167 |
| 1.4164 | 8500 | 0.1037 |
| 1.4998 | 9000 | 0.1097 |
| 1.5831 | 9500 | 0.1018 |
| 1.6664 | 10000 | 0.1017 |
| 1.7497 | 10500 | 0.1028 |
| 1.8330 | 11000 | 0.0854 |
| 1.9163 | 11500 | 0.088 |
| 1.9997 | 12000 | 0.1027 |
| 2.0830 | 12500 | 0.0778 |
| 2.1663 | 13000 | 0.0645 |
| 2.2496 | 13500 | 0.0503 |
| 2.3329 | 14000 | 0.0822 |
| 2.4163 | 14500 | 0.0616 |
| 2.4996 | 15000 | 0.0688 |
| 2.5829 | 15500 | 0.0543 |
| 2.6662 | 16000 | 0.0678 |
| 2.7495 | 16500 | 0.0565 |
| 2.8329 | 17000 | 0.0683 |
| 2.9162 | 17500 | 0.0412 |
| 2.9995 | 18000 | 0.0726 |
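For reference, a hedged sketch of a training run consistent with the loss and hyperparameters reported above (batch size 4, 3 epochs, fp16, learning rate 5e-05, logging every 500 steps); the output directory, placeholder dataset contents, and save path are assumptions, not values taken from this card:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Placeholder pairs; in practice this is the (sentence_0, sentence_1) dataset described earlier.
train_dataset = Dataset.from_dict({
    "sentence_0": ["anchor sentence one ...", "anchor sentence two ..."],
    "sentence_1": ["related sentence one ...", "related sentence two ..."],
})

loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="output",              # assumption: any local checkpoint directory
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=5e-5,
    warmup_ratio=0.0,
    weight_decay=0.0,
    fp16=True,                        # fp16 mixed precision requires a GPU
    seed=42,
    logging_steps=500,                # matches the 500-step logging interval in the table above
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save("output/final")            # assumption: where to store the fine-tuned model
```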
Citation:

@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: sentence-transformers/all-mpnet-base-v2