metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:10836
- loss:TripletLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: >-
how has lack of testing availability led to underreporting of true
incidence of Covid-19?
sentences:
- >-
can an effective sars-cov-2 vaccine be developed for the older
population [SEP] the emergence of sars-cov-2 and its inordinately rapid
spread is posing severe challenges to the wellbeing of millions of
people worldwide, health care systems and the global economy. we aim to
provide a platform exclusively for discussions of individual and age
differences in susceptibility and immune responses to covid caused by
sars-cov-2 infection and how to prevent or reduce severity of disease in
older adults.
- >-
the impact of changes in diagnostic testing practices on estimates of
covid-19 transmission in the united states [SEP] estimates of the
reproductive number for novel pathogens such as sars-cov-2 are essential
for understanding the potential trajectory of the epidemic and the level
of intervention that is needed to bring the epidemic under control.
however, most methods for estimating the basic reproductive number
(r(0)) and time-varying effective reproductive number (r(t)) assume that
the fraction of cases detected and reported is constant through time.
- nan
- source_sentence: >-
will SARS-CoV2 infected people develop immunity? Is cross protection
possible?
sentences:
- nan
- >-
medical ethics in disasters [SEP] disasters frequently create demands
that outstrip available existing medical and societal resources.
disaster may, for example, not only strike care providers and hospital
facilities directly; they may decimate communities capacities to provide
food to the population and carry out critical waste disposal services.
- >-
sars coronavirus pathogenesis: host innate immune responses and viral
antagonism of interferon [SEP] sars-cov is a pathogenic coronavirus that
emerged from a zoonotic reservoir, leading to global dissemination of
the virus. the association sars-cov with aberrant cytokine, chemokine,
and interferon stimulated gene (isg) responses in patients provided
evidence that sars-cov pathogenesis is at least partially controlled by
innate immune signaling.
- source_sentence: >-
what kinds of complications related to COVID-19 are associated with
diabetes
sentences:
- >-
recommendation to optimize safety of elective surgical care while
limiting the spread of covid-19: primum non nocere [SEP] covid-19 has
drastically altered our lives in an unprecedented manner, shuttering
industries, and leaving most of the country in isolation as we adapt to
the evolving crisis. the optimal solution of how to effectively balance
the resumption of standard surgical care while doing everything possible
to limit the spread of covid-19 is undetermined, and could include
strategies such as social distancing, screening forms and tests
including temperature screening, segregation of inpatient and outpatient
teams, proper use of protective gear, and the use of ambulatory surgery
centers (ascs) to provide elective, yet ultimately essential, surgical
care while conserving resources and protecting the health of patients
and health-care providers.
- >-
upper airway symptoms in coronavirus disease 2019 (covid-19) [SEP] upper
airway symptoms in coronavirus disease 2019 (covid-19)
- >-
diabetes mellitus is associated with increased mortality and severity of
disease in covid-19 pneumonia a systematic review, meta-analysis, and
meta-regression [SEP] background and aims diabetes mellitus (dm) is
chronic conditions with devastating multi-systemic complication and may
be associated with severe form of coronavirus disease 2019 (covid-19).
subgroup analysis showed that the association was weaker in studies with
median age 55 years-old (rr 1.92) compared to 55 years-old (rr 3.48),
and in prevalence of hypertension 25 (rr 1.93) compared to 25 (rr 3.06).
- source_sentence: coronavirus early symptoms
sentences:
- >-
the common cold in frail older persons: impact of rhinovirus and
coronavirus in a senior daycare center [SEP] objective: to evaluate the
incidence and impact of rhinovirus and coronavirus infections in older
persons attending daycare. patients: frail older persons and staff
members of the daycare centers who developed signs or symptoms of an
acute respiratory illness measurements: demographic, medical, and
physical findings were recorded on subjects at baseline and during
respiratory illness.
- >-
epidemiology, clinical course, and outcomes of critically ill adults
with covid-19 in new york city: a prospective cohort study [SEP]
background: nearly 30,000 patients with coronavirus disease-2019
(covid-19) have been hospitalized in new york city as of april 14th,
2020. results: of 1,150 adults hospitalized with covid-19 during the
study period, 257 (22) were critically ill.
- >-
coronavirus disease (covid-19): a primer for emergency physicians [SEP]
introduction: rapid worldwide spread of coronavirus disease 2019
(covid-19) has resulted in a global pandemic. discussion: severe acute
respiratory syndrome coronavirus 2 (sars-cov-2), the virus responsible
for causing covid-19, is primarily transmitted from person-to-person
through close contact (approximately 6 ft) by respiratory droplets.
- source_sentence: what types of rapid testing for Covid-19 have been developed?
sentences:
- >-
on the assessment of more reliable covid-19 infected number: the italian
case. [SEP] covid-19 (sars-cov-2) is the most recent pandemic disease
the world is currently managing. patients affected by covid-19 are
identified employing medical swabs applied mainly to (i) citizens with
covid-19 symptoms such as flu or high temperature, or (ii) citizens that
had contacts with covid-19 patients.
- >-
lack of antiviral activity of darunavir against sars-cov-2 [SEP] given
the high need and the absence of specific antivirals for treatment of
covid-19 (the disease caused by severe acute respiratory
syndrome-associated coronavirus-2 sars-cov-2), human immunodeficiency
virus (hiv) protease inhibitors are being considered as therapeutic
alternatives. overall, the data do not support the use of drv for
treatment of covid-19.
- >-
the covid-19 pandemic: important considerations for contact lens
practitioners [SEP] a novel coronavirus (cov), the severe acute
respiratory syndrome coronavirus - 2 (sars-cov-2), results in the
coronavirus disease 2019 (covid-19). thus, it is imperative cl wearers
are reminded of the steps they should follow to minimise their risk of
complications, to reduce their need to leave isolation and seek care.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: val
type: val
metrics:
- type: cosine_accuracy@1
value: 0.6
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9333333333333333
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9333333333333333
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.5777777777777778
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.5733333333333334
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.48666666666666664
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.0037118073861730316
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.011399309808564868
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.019975486198167695
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.033174913852812835
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.5158660061527193
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7155555555555556
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.18187688764934176
name: Cosine Map@100
SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 384 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
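The last two modules in the architecture above turn per-token vectors into one sentence vector: mean pooling (`pooling_mode_mean_tokens: True`) followed by `Normalize()`. The sketch below illustrates that idea in plain Python on toy 2-dimensional vectors. It is not the library's implementation; the helper names `mean_pool` and `l2_normalize` are made up for this example.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input (hypothetical): three token vectors of dimension 2; the last is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)    # mean of the two unmasked vectors
embedding = l2_normalize(pooled)    # same direction, length 1
print(pooled, embedding)
```

Because the output is unit length, the cosine similarity between two embeddings equals their dot product.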
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'what types of rapid testing for Covid-19 have been developed?',
'on the assessment of more reliable covid-19 infected number: the italian case. [SEP] covid-19 (sars-cov-2) is the most recent pandemic disease the world is currently managing. patients affected by covid-19 are identified employing medical swabs applied mainly to (i) citizens with covid-19 symptoms such as flu or high temperature, or (ii) citizens that had contacts with covid-19 patients.',
'lack of antiviral activity of darunavir against sars-cov-2 [SEP] given the high need and the absence of specific antivirals for treatment of covid-19 (the disease caused by severe acute respiratory syndrome-associated coronavirus-2 sars-cov-2), human immunodeficiency virus (hiv) protease inhibitors are being considered as therapeutic alternatives. overall, the data do not support the use of drv for treatment of covid-19.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
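For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step on hypothetical unit-length vectors standing in for the output of `model.encode(...)`. The function name `rank_by_similarity` is an assumption for illustration, not part of the library.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical unit-length vectors; real embeddings from this model have 768 dimensions.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]

ranking = rank_by_similarity(query, docs)
print(ranking)  # document 2 first, then document 1, then document 0
```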
Evaluation
Metrics
Information Retrieval
- Dataset: val
- Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.6 |
| cosine_accuracy@3 | 0.8 |
| cosine_accuracy@5 | 0.9333 |
| cosine_accuracy@10 | 0.9333 |
| cosine_precision@1 | 0.6 |
| cosine_precision@3 | 0.5778 |
| cosine_precision@5 | 0.5733 |
| cosine_precision@10 | 0.4867 |
| cosine_recall@1 | 0.0037 |
| cosine_recall@3 | 0.0114 |
| cosine_recall@5 | 0.02 |
| cosine_recall@10 | 0.0332 |
| cosine_ndcg@10 | 0.5159 |
| cosine_mrr@10 | 0.7156 |
| cosine_map@100 | 0.1819 |
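The table pairs fairly high precision@k with very low recall@k. One plausible reading is that each validation query has many relevant documents, so even a top-10 list full of hits covers only a small share of them. The sketch below computes the per-query quantities behind these metrics under their usual definitions. It is an illustration with made-up document ids, not the evaluator's own code.

```python
def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query accuracy@k, precision@k, recall@k and reciprocal rank@k."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)
    # Rank (1-based) of the first relevant document in the top k, if any.
    first_hit_rank = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }

# Hypothetical query with 100 relevant documents, 3 of which appear in the top 5.
relevant = {f"r{i}" for i in range(97)} | {"d2", "d4", "d1"}
ranked = ["d7", "d2", "d9", "d4", "d1"]

metrics = ir_metrics_at_k(ranked, relevant, k=5)
print(metrics)
```

In this toy case precision@5 is 0.6 while recall@5 is only 0.03, the same pattern as in the table. The reported values are these per-query numbers averaged over all validation queries.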
Training Details
Training Dataset
Unnamed Dataset
- Size: 10,836 training samples
- Columns: sentence_0, sentence_1, and sentence_2
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
| details | min: 5 tokens, mean: 18.36 tokens, max: 50 tokens | min: 3 tokens, mean: 87.23 tokens, max: 219 tokens | min: 3 tokens, mean: 81.52 tokens, max: 252 tokens |
- Samples:

| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| coronavirus origin | the origin, transmission and clinical therapies on coronavirus disease 2019 (covid-19) outbreak an update on the status [SEP] an acute respiratory disease, caused by a novel coronavirus (sars-cov-2, previously known as 2019-ncov), the coronavirus disease 2019 (covid-19) has spread throughout china and received worldwide attention. the emergence of sars-cov-2, since the severe acute respiratory syndrome coronavirus (sars-cov) in 2002 and middle east respiratory syndrome coronavirus (mers-cov) in 2012, marked the third introduction of a highly pathogenic and large-scale epidemic coronavirus into the human population in the twenty-first century. | challenges in developing methods for quantifying the effects of weather and climate on water-associated diseases: a systematic review [SEP] infectious diseases attributable to unsafe water supply, sanitation and hygiene (e.g. cholera, leptospirosis, giardiasis) remain an important cause of morbidity and mortality, especially in low-income countries. furthermore, the methods often did not distinguish among the multiple sources of time-lags (e.g. patient physiology, reporting bias, healthcare access) between environmental drivers/exposures and disease detection. |
| Seeking information on best practices for activities and duration of quarantine for those exposed and/ infected to COVID-19 virus. | recommendation to optimize safety of elective surgical care while limiting the spread of covid-19: primum non nocere [SEP] covid-19 has drastically altered our lives in an unprecedented manner, shuttering industries, and leaving most of the country in isolation as we adapt to the evolving crisis. the optimal solution of how to effectively balance the resumption of standard surgical care while doing everything possible to limit the spread of covid-19 is undetermined, and could include strategies such as social distancing, screening forms and tests including temperature screening, segregation of inpatient and outpatient teams, proper use of protective gear, and the use of ambulatory surgery centers (ascs) to provide elective, yet ultimately essential, surgical care while conserving resources and protecting the health of patients and health-care providers. | killing more than pain: etiology and remedy for an opioid crisis [SEP] the search for effective pain relief has been ever present across human history. this chapter describes the etiology and epidemiology of the opioid crisis using public health and health belief model frameworks and reviews approaches that have been applied to address supply (e.g., overprescribing) and demand (e.g., medication treatments) sides of the equation. |
| coronavirus early symptoms | nan | impact of antibacterials on subsequent resistance and clinical outcomes in adult patients with viral pneumonia: an opportunity for stewardship [SEP] introduction: respiratory viruses are increasingly recognized as significant etiologies of pneumonia among hospitalized patients. method: this was a single-center retrospective cohort study to evaluate the impact of antibacterials in viral pneumonia on clinical outcomes and subsequent multidrug-resistant organism (mdro) infections/colonization. |

- Loss: TripletLoss with these parameters: { "distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5 }
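With the Euclidean distance metric and a margin of 5, the standard triplet objective for an anchor `a`, positive `p`, and negative `n` is `max(d(a, p) - d(a, n) + margin, 0)`. The sketch below evaluates that formula on toy unit-length vectors. The vectors and the helper names `euclidean` and `triplet_loss` are assumptions for illustration, not the library's code.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on (distance to positive) - (distance to negative) + margin."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# Hypothetical unit-length embeddings in 2 dimensions.
anchor = [1.0, 0.0]

# Best case on the unit sphere: positive identical, negative diametrically opposite.
best_case = triplet_loss(anchor, [1.0, 0.0], [-1.0, 0.0])   # 0 - 2 + 5

# Uninformative case: positive and negative are equally far from the anchor.
tie_case = triplet_loss(anchor, [0.0, 1.0], [0.0, 1.0])     # d - d + 5

print(best_case, tie_case)
```

If the `Normalize()` layer shown in the architecture was active during training (an assumption here), two embeddings are never more than 2 apart, so with a margin of 5 the per-triplet loss cannot drop below 3. That would be consistent with the logged training loss of 4.4901.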
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
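The settings `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` describe a learning rate that starts at its peak and decays linearly toward zero. The sketch below shows that shape. It assumes 678 optimizer steps per epoch (the step count logged at epoch 1.0) and that all 3 configured epochs were run, which the logs do not confirm. The function name `linear_lr` is made up for this example.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

# Assumption: 678 steps per epoch for 3 epochs.
total_steps = 3 * 678

lr_start = linear_lr(0, total_steps)                   # peak learning rate
lr_middle = linear_lr(total_steps // 2, total_steps)   # half of the peak
lr_end = linear_lr(total_steps, total_steps)           # decayed to zero
print(lr_start, lr_middle, lr_end)
```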
Training Logs
| Epoch | Step | Training Loss | val_cosine_ndcg@10 |
|---|---|---|---|
| 0.7375 | 500 | 4.4901 | - |
| 1.0 | 678 | - | 0.5159 |
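The step and epoch columns follow from the dataset size and batch size reported above. The short calculation below reproduces them, assuming a single training device; `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` are taken from the hyperparameter list.

```python
import math

dataset_size = 10836   # training samples
batch_size = 16        # per_device_train_batch_size, assuming one device

# The last partial batch is kept (dataloader_drop_last: False), so round up.
steps_per_epoch = math.ceil(dataset_size / batch_size)

# Fraction of the first epoch completed when the loss was logged at step 500.
epoch_at_step_500 = round(500 / steps_per_epoch, 4)

print(steps_per_epoch, epoch_at_step_500)
```

This gives 678 steps per epoch and an epoch value of 0.7375 at step 500, matching both rows of the table.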
Framework Versions
- Python: 3.11.12
- Sentence Transformers: 3.4.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}