This is a sentence-transformers model fine-tuned from intfloat/e5-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
The model runs a BERT encoder (maximum sequence length 512 tokens), mean-pools the token embeddings, and L2-normalizes the result. Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
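The Pooling and Normalize modules above can be sketched in NumPy as follows. This is an illustrative re-implementation, not the library's code. The function names `mean_pool` and `l2_normalize` and the toy shapes are hypothetical. Every embedding leaves `Normalize()` with unit length, so a plain dot product between two embeddings equals their cosine similarity.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length (what Normalize() does)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

# Toy input: 2 sequences, 3 token positions, 4 dims (the real model uses 768 dims).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])  # the second sequence ends with one padding token
emb = l2_normalize(mean_pool(tokens, mask))
print(np.linalg.norm(emb, axis=1))  # both norms are 1 (up to floating-point rounding)
```

The padding position is excluded from the average, so sequences of different lengths are pooled fairly.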
First install the Sentence Transformers library:

```
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("shrijayan/all-mpnet-base-v2-sample")
# Run inference
sentences = [
    'Is the Danish National Hospital Register a valuable study base for epidemiologic research in febrile seizures?',
    'The Danish National Hospital Register is a valuable tool for epidemiologic research in febrile seizures.',
    'Ans. is c i.e., Presence of depression Good prognostic factors Acute onset late onset onset after 35 years of age Presence of precipitating stressor Good premorbid adjustment catatonic best prognosis Paranoid 2nd best sho duration 6 months Married Positive symptoms Presence of depression family history of mood disorder first episode pyknic fat physique female sex good treatment compliance good response to treatment good social suppo presence of confusion or perplexity normal brain CT Scan outpatient treatment.',
]
embeddings = model.encode(sentences)  # NumPy array, one 768-dim row per sentence
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch tensor of pairwise scores
print(similarities.shape)
# torch.Size([3, 3])
```
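By default, `model.similarity` scores pairs with cosine similarity. The sketch below assumes that default and shows how such scores can rank a corpus for semantic search, given embeddings you already obtained from `model.encode`. `top_k_cosine` is a hypothetical helper, and the 3-dimensional vectors are toy stand-ins for real 768-dimensional embeddings. Sentence Transformers also ships its own `util.semantic_search` for this task.

```python
import numpy as np

def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list:
    """Return (corpus_index, cosine_score) pairs for the k most similar corpus rows."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity of every corpus row to the query
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for model.encode(...) output.
corpus_emb = np.array([[1.0, 0.0, 0.0],
                       [0.7, 0.7, 0.0],
                       [0.0, 0.0, 1.0]])
query_emb = np.array([1.0, 0.1, 0.0])
result = top_k_cosine(query_emb, corpus_emb, k=2)
print(result)  # corpus row 0 ranks first, row 1 second
```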
Triplet evaluation on eval-dataset and test-dataset, computed with TripletEvaluator:

| Metric | eval-dataset | test-dataset |
|---|---|---|
| cosine_accuracy | 1.0 | 0.97 |
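`cosine_accuracy` is the share of (anchor, positive, negative) triplets in which the anchor embedding is more cosine-similar to the positive than to the negative. A test score of 0.97 therefore means 97% of test triplets were ranked correctly. The NumPy sketch below computes that metric. The helper names are hypothetical, and the toy 2-dimensional vectors replace real embeddings.

```python
import numpy as np

def cosine_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets whose anchor is closer (by cosine) to the positive than to the negative."""
    pos = cosine_rows(anchors, positives)
    neg = cosine_rows(anchors, negatives)
    return float(np.mean(pos > neg))

# Toy triplets: the second one is ranked wrongly because its negative is closer to the anchor.
anchors   = np.array([[1.0, 0.0], [1.0, 0.0]])
positives = np.array([[0.9, 0.1], [0.0, 1.0]])
negatives = np.array([[0.0, 1.0], [1.0, 0.1]])
acc = triplet_cosine_accuracy(anchors, positives, negatives)
print(acc)  # 0.5
```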
The training dataset has three columns: sentence1, sentence2, and label.

| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| Triad of biotin deficiency is | Dermatitis, glossitis, Alopecia 407 H 314 Basic pathology 8th Biotin deficiency clinical features Adult Mental changes depression, hallucination , paresthesia, anorexia, nausea, A scaling, seborrheic and erythematous rash may occur around the eye, nose, mouth, as well as extremities 407 H Infant hypotonia, lethargy, apathy, alopecia and a characteristic rash that includes the ears.Symptoms of biotin deficiency includes Anaemia, loss of apepite dermatitis, glossitis 150 U. Satyanarayan Symptoms of biotin deficiency Dermatitis spectacle eyed appearance due to circumocular alopecia, pallor of skin membrane, depression, Lassitude, somnolence, anemia and hypercholesterolaemia 173 Rana Shinde 6th | 1.0 |
| Drug responsible for the below condition | Thalidomide given to pregnant lady can lead to hypoplasia of limbs called as Phocomelia . | 1.0 |
| Is benefit from procarbazine , lomustine , and vincristine in oligodendroglial tumors associated with mutation of IDH? | IDH mutational status identified patients with oligodendroglial tumors who did and did not benefit from alkylating agent chemotherapy with RT. Although patients with codeleted tumors lived longest, patients with noncodeleted IDH mutated tumors also lived longer after CRT. | 1.0 |
Loss: MultipleNegativesRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
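MultipleNegativesRankingLoss treats each (sentence1, sentence2) pair in a batch as a positive. It uses the other sentence2 entries in the same batch as negatives. The loss is cross-entropy over the scaled similarity matrix, with the matching pair on the diagonal as the target.

The NumPy sketch below assumes `scale=20.0` and cosine similarity, as configured above. It ignores the label column and optional hard negatives, and it is not the library's PyTorch implementation. The helper names are hypothetical.

```python
import numpy as np

def cos_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: entry (i, j) compares a[i] with b[j]."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives loss: positive i is the target for anchor i, all other positives are negatives."""
    scores = scale * cos_sim_matrix(anchors, positives)       # (batch, batch)
    scores = scores - scores.max(axis=1, keepdims=True)       # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))  # row-wise log-softmax
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())                 # cross-entropy on the diagonal

eye = np.eye(3)
print(mnrl_loss(eye, eye))        # close to 0: every anchor matches its own positive
print(mnrl_loss(eye, eye[::-1]))  # roughly 13.33: two of three anchors point at the wrong positive
```

The `scale` factor sharpens the softmax. Cosine scores lie in [-1, 1], so without scaling the correct pair could never dominate the row strongly.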
The evaluation dataset has three columns: question, answer, and hard_negative.

| | question | answer | hard_negative |
|---|---|---|---|
| type | string | string | NoneType |

Samples:

| question | answer | hard_negative |
|---|---|---|
| Hutchinsons secondaries In skull are due to tumors in | Adrenal neuroblastomas are malig8nant neoplasms arising from sympathetic neuroblsts in Medulla of adrenal gland Neuroblastoma is a cancer that develops from immature nerve cells found in several areas of the body.Neuroblastoma most commonly arises in and around the adrenalglands, which have similar origins to nerve cells and sit atop the kidneys. | None |
| Proliferative glomerular deposits in the kidney are found in | IgA nephropathy or Berger s disease immune complex mediated glomerulonephritis defined by the presence of diffuse mesangial IgA deposits often associated with mesangial hypercellularity. Male preponderance, peak incidence in the second and third decades of life.Clinical and laboratory findings Two most common presentations recurrent episodes of macroscopic hematuria during or immediately following an upper respiratory infection often accompanied by proteinuria or persistent asymptomatic microscopic hematuriaIgA deposited in the mesangium is typically polymeric and of the IgA1 subclass. IgM, IgG, C3, or immunoglobulin light chains may be codistributed with IgAPresence of elevated serum IgA levels in 20 50 of patients, IgA deposition in skin biopsies in 15 55 of patients, elevated levels of secretory IgA and IgA fibronectin complexesIgA nephropathy is a benign disease mostly, 5 30 of patients go into a complete remission, with others having hematuria but well preserved renal functionAbou... | None |
| Does meconium aspiration induce oxidative injury in the hippocampus of newborn piglets? | Our data thus suggest that oxidative injury associated with pulmonary, but not systemic, hemodynamic disturbances may contribute to hippocampal damage after meconium aspiration in newborns. | None |
Loss: MultipleNegativesRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
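MultipleNegativesRankingLoss uses the other answers in a batch as negatives. If two rows in one batch share the same answer text, each row's correct answer is also scored as a negative for the other row. The `no_duplicates` batch sampler in the hyperparameters below is meant to prevent this.

The pure-Python sketch below illustrates the idea under simplifying assumptions. It fills batches greedily and defers any row that would repeat a text already in the batch. `no_duplicate_batches` is a hypothetical helper, not the library's `NoDuplicatesBatchSampler`.

```python
def no_duplicate_batches(rows, batch_size):
    """Greedily group rows (tuples of texts) into batches where no text appears twice.

    Rows that would introduce a duplicate, or that do not fit, are deferred to a later batch.
    """
    remaining = list(rows)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for row in remaining:
            texts = set(row)
            if len(batch) < batch_size and not (texts & seen):
                batch.append(row)
                seen |= texts
            else:
                deferred.append(row)
        batches.append(batch)  # the first remaining row always fits, so the loop terminates
        remaining = deferred
    return batches

rows = [("q1", "a1"), ("q2", "a1"), ("q3", "a3"), ("q4", "a4")]
batches = no_duplicate_batches(rows, batch_size=2)
print(batches)  # ("q1", "a1") and ("q2", "a1") end up in different batches
```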
Non-default hyperparameters:

- do_predict: True
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: True
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | eval-dataset_cosine_accuracy | test-dataset_cosine_accuracy |
|---|---|---|---|
| 0 | 0 | 1.0 | - |
| 1.0 | 25 | - | 0.97 |
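The run used `lr_scheduler_type: linear`, `warmup_ratio: 0.1`, and `learning_rate: 5e-05`. Under these settings the learning rate ramps up from 0 during warmup and then decays linearly back to 0. The training log shows 25 optimizer steps for the single epoch.

The sketch below assumes the Hugging Face convention of rounding `warmup_ratio × total steps` up, which gives 3 warmup steps here. It is written as a plain function that mirrors the shape of `transformers`' `get_linear_schedule_with_warmup`, but it is not that function. `linear_warmup_lr` is a hypothetical name.

```python
import math

def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step under linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: ratio is rounded up
    if step < warmup_steps:
        factor = step / max(1, warmup_steps)  # ramp from 0 up to base_lr
    else:
        factor = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))  # decay to 0
    return base_lr * factor

total = 25  # steps in the single training epoch, per the training log
for step in (0, 1, 3, 14, 25):
    print(step, linear_warmup_lr(step, total))
```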
BibTeX for Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

BibTeX for MultipleNegativesRankingLoss:

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```