metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:75822
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-small-en-v1.5
widget:
- source_sentence: blood clots
sentences:
- >-
Herbal infusions as a source of calcium, magnesium, iron, zinc and
copper in human nutrition.
The study material consisted of five herbs: chamomile (flowers), mint
(leaves), St John's wort (flowers and leaves), sage (leaves) and nettle
(leaves), sourced from three producers. The calcium, magnesium, iron,
zinc and copper contents were determined for both dried herb samples and
prepared infusions, and the extraction rates were calculated. Mineral
components were determined using atomic absorption spectrometry
- >-
Vegetarian diets and incidence of diabetes in the Adventist Health
Study-2
Aim To evaluate the relationship of diet to incident diabetes among
non-Black and Black participants in the Adventist Health Study-2.
Methods and Results Participants were 15,200 men and 26,187 women (17.3%
Blacks) across the U.S. and Canada who were free of diabetes and who
provided demographic, anthropometric, lifestyle and dietary data.
Participants were grouped as vegan, lacto ovo vegetarian, pesco
vegetarian, semi-vegetarian or
- >-
Green tea: nature's defense against malignancies.
The current practice of introducing phytochemicals to support the immune
system or fight against diseases is based on centuries old traditions.
Nutritional support is a recent advancement in the domain of diet-based
therapies; green tea and its constituents are one of the important
components of these strategies to prevent and cure various malignancies.
The anti-carcinogenic and anti-mutagenic activities of green tea were
highlighted some years ago suggestin
- source_sentence: carcinogens
sentences:
- >-
Vitamin B12 sources and bioavailability.
The usual dietary sources of vitamin B(12) are animal foods, meat, milk,
egg, fish, and shellfish. As the intrinsic factor-mediated intestinal
absorption system is estimated to be saturated at about 1.5-2.0 microg
per meal under physiologic conditions, vitamin B(12) bioavailability
significantly decreases with increasing intake of vitamin B(12) per
meal. The bioavailability of vitamin B(12) in healthy humans from fish
meat, sheep meat, and chicken meat averaged 42%,
- >-
Dietary intake of nitrate and nitrite and risk of renal cell carcinoma
in the NIH-AARP Diet and Health Study
Background: Nitrate and nitrite are present in many foods and are
precursors of N-nitroso compounds, known animal carcinogens and
potential human carcinogens. We prospectively investigated the
association between nitrate and nitrite intake from dietary sources and
risk of renal cell carcinoma (RCC) overall and clear cell and papillary
histological subtypes in the NIH-AARP Diet and Health Study. Metho
- >-
A 21-day Daniel fast with or without krill oil supplementation improves
anthropometric parameters and the cardiometabolic profile in men and
women
Background The Daniel Fast is a vegan diet that prohibits the
consumption of animal products, refined foods, white flour,
preservatives, additives, sweeteners, flavorings, caffeine, and alcohol.
Following this dietary plan for 21 days has been demonstrated to improve
blood pressure, LDL-C, and certain markers of oxidative stress, but it
has also been shown to low
- source_sentence: Is Distilled Fish Oil Toxin-Free?
sentences:
- >-
Sniffer dogs as part of a bimodal bionic research approach to develop a
lung cancer screening
Lung cancer (LC) continues to represent a heavy burden for health care
systems worldwide. Epidemiological studies predict that its role will
increase in the near future. While patient prognosis is strongly
associated with tumour stage and early detection of disease, no
screening test exists so far. It has been suggested that electronic
sensor devices, commonly referred to as ‘electronic noses’, may be
applicable to
- >-
Efficacy of omega-3 fatty acid supplements (eicosapentaenoic acid and
docosahexaenoic acid) in the secondary prevention of cardiovascular
disease: ...
BACKGROUND: Although previous randomized, double-blind,
placebo-controlled trials reported the efficacy of omega-3 fatty acid
supplements in the secondary prevention of cardiovascular disease (CVD),
the evidence remains inconclusive. Using a meta-analysis, we
investigated the efficacy of eicosapentaenoic acid and docosahexaenoic
acid in the secondary preventi
- >-
A Prospective Study of Long-term Intake of Dietary Fiber and Risk of
Crohn’s Disease and Ulcerative Colitis
Background & Aims Increased intake of dietary fiber has been proposed to
reduce risk of inflammatory bowel diseases (Crohn’s disease [CD],
ulcerative colitis [UC]). However, few prospective studies have examined
associations between long-term intake of dietary fiber and risk of
incident CD or UC. Methods We collected and analyzed data from 170,776
women, followed over 26 y, who participated in the Nur
- source_sentence: trans fats
sentences:
- >-
Laboratory, Epidemiological, and Human Intervention Studies Show That
Tea (Camellia sinensis) May Be Useful in the Prevention of Obesity
Tea (Camellia sinensis, Theaceae) and tea polyphenols have been studied
for the prevention of chronic diseases, including obesity. Obesity
currently affects >20% of adults in the United States and is a risk
factor for chronic diseases such as type II diabetes, cardiovascular
disease, and cancer. Given this increasing public health concern, the
use of dietary agents for the
- >-
Dietary intake of nitrate and nitrite and risk of renal cell carcinoma
in the NIH-AARP Diet and Health Study
Background: Nitrate and nitrite are present in many foods and are
precursors of N-nitroso compounds, known animal carcinogens and
potential human carcinogens. We prospectively investigated the
association between nitrate and nitrite intake from dietary sources and
risk of renal cell carcinoma (RCC) overall and clear cell and papillary
histological subtypes in the NIH-AARP Diet and Health Study. Metho
- >-
Vegetarian and vegan diets in type 2 diabetes management.
Vegetarian and vegan diets offer significant benefits for diabetes
management. In observational studies, individuals following vegetarian
diets are about half as likely to develop diabetes, compared with
non-vegetarians. In clinical trials in individuals with type 2 diabetes,
low-fat vegan diets improve glycemic control to a greater extent than
conventional diabetes diets. Although this effect is primarily
attributable to greater weight loss, evidenc
- source_sentence: poisonous plants
sentences:
- >-
Creating public awareness: state 2025 diabetes forecasts.
The incidence and prevalence of diabetes (primarily type 2 diabetes) has
risen sharply since 1990. It is projected to increase another 64%
between 2010 and 2025, affecting 53.1 million people and resulting in
medical and societal costs of a half trillion dollars a year. We know
how to prevent many cases of diabetes and how to treat it effectively.
Early appropriate treatment makes a significant difference in preventing
major complications and reducin
- >-
Dietary sources of inorganic microparticles and their intake in healthy
subjects and patients with Crohn's disease.
Dietary microparticles are non-biological, bacterial-sized particles.
Endogenous sources are derived from intestinal Ca and phosphate
secretion. Exogenous sources are mainly titanium dioxide (TiO2) and
mixed silicates (Psil); they are resistant to degradation and accumulate
in human Peyer's patch macrophages and there is some evidence that they
exacerbate inflammation in Crohn's disease (CD).
- >-
Antioxidant, antimutagenic, and antitumor effects of pine needles (Pinus
densiflora).
Pine needles (Pinus densiflora Siebold et Zuccarini) have long been used
as a traditional health-promoting medicinal food in Korea. To
investigate their potential anticancer effects, antioxidant,
antimutagenic, and antitumor activities were assessed in vitro and/or in
vivo. Pine needle ethanol extract (PNE) significantly inhibited
Fe(2+)-induced lipid peroxidation and scavenged 1,1-diphenyl-
2-picrylhydrazyl radical in vit
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
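The architecture listing above indicates CLS-token pooling (`pooling_mode_cls_token: True`) followed by a `Normalize()` step. As a rough illustration only, the following plain-Python sketch mimics those two steps on tiny made-up vectors; the function names `cls_pool` and `l2_normalize` are hypothetical and are not part of the Sentence Transformers API, and the real model works with 384-dimensional vectors.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy input: 3 tokens with 4 dimensions each (illustrative values only).
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] position
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
]
sentence_embedding = l2_normalize(cls_pool(tokens))
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output has unit length, the cosine similarity of two such embeddings reduces to their dot product.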
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'poisonous plants',
'Antioxidant, antimutagenic, and antitumor effects of pine needles (Pinus densiflora).\nPine needles (Pinus densiflora Siebold et Zuccarini) have long been used as a traditional health-promoting medicinal food in Korea. To investigate their potential anticancer effects, antioxidant, antimutagenic, and antitumor activities were assessed in vitro and/or in vivo. Pine needle ethanol extract (PNE) significantly inhibited Fe(2+)-induced lipid peroxidation and scavenged 1,1-diphenyl- 2-picrylhydrazyl radical in vit',
"Dietary sources of inorganic microparticles and their intake in healthy subjects and patients with Crohn's disease.\nDietary microparticles are non-biological, bacterial-sized particles. Endogenous sources are derived from intestinal Ca and phosphate secretion. Exogenous sources are mainly titanium dioxide (TiO2) and mixed silicates (Psil); they are resistant to degradation and accumulate in human Peyer's patch macrophages and there is some evidence that they exacerbate inflammation in Crohn's disease (CD). ",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5729, 0.4656],
# [0.5729, 1.0000, 0.5740],
# [0.4656, 0.5740, 1.0000]])
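For semantic search, the first row of the similarity matrix above can be read as the query's score against each candidate passage. The sketch below ranks the two passages using the scores printed above; `rank_candidates` is a hypothetical helper and the short labels are shorthand for the two abstracts, not model output.

```python
def rank_candidates(query_scores, labels):
    """Sort candidate labels by similarity score, highest first."""
    return sorted(zip(labels, query_scores), key=lambda pair: pair[1], reverse=True)

# First row of the similarity matrix shown above, without the self-similarity entry.
scores = [0.5729, 0.4656]
labels = ["pine needles abstract", "dietary microparticles abstract"]

for label, score in rank_candidates(scores, labels):
    print(f"{score:.4f}  {label}")
```

With these scores, the pine needles abstract ranks first for the query 'poisonous plants'.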
Training Details
Training Dataset
Unnamed Dataset
- Size: 75,822 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 3 tokens; mean: 7.12 tokens; max: 37 tokens | min: 28 tokens; mean: 109.88 tokens; max: 177 tokens |

- Samples:

| sentence_0 | sentence_1 |
|---|---|
| serotonin | The potential toxicity of artificial sweeteners. Since their discovery, the safety of artificial sweeteners has been controversial. Artificial sweeteners provide the sweetness of sugar without the calories. As public health attention has turned to reversing the obesity epidemic in the United States, more individuals of all ages are choosing to use these products. These choices may be beneficial for those who cannot tolerate sugar in their diets (e.g., diabetics). However, scientists disagree about the relat |
| industrial toxins | Marine Food Pollutants as a Risk Factor for Hypoinsulinemia and Type 2 Diabetes Background Some persistent environmental chemicals are suspected of causing an increased risk of type 2 diabetes mellitus, a disease particularly common after age 70. This concern was examined in a cross-sectional study of elderly subjects in a population with elevated contaminant exposures from seafood species high in the food chain. Methods Clinical examinations of 713 Faroese residents aged 70-74 years (64% of eligible popula |
| Update on Herbalife® | Bioavailability of vitamin D₂ from UV-B-irradiated button mushrooms in healthy adults deficient in serum 25-hydroxyvitamin D: a randomized controll... BACKGROUND/OBJECTIVES: Mushrooms contain very little or any vitamin D(2) but are abundant in ergosterol, which can be converted into vitamin D(2) by ultraviolet (UV) irradiation. Our objective was to investigate the bioavailability of vitamin D(2) from vitamin D(2)-enhanced mushrooms by UV-B in humans, and comparing it with a vitamin D(2) supplement. SUBJECTS |

- Loss: MultipleNegativesRankingLoss with these parameters:

    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }

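The loss parameters above (`scale` of 20.0, `cos_sim`, a single `query_to_doc` direction) describe an in-batch-negatives objective: each query's paired passage is the positive, and the other passages in the batch act as negatives. The following is a simplified plain-Python sketch of that idea, not the library's implementation; `cosine` and `mnr_loss` are hypothetical names, and the two-dimensional vectors are toy data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(queries, docs, scale=20.0):
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other doc in the batch serves as a negative."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, doc) for doc in docs]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own passage
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the wrong passage

print(mnr_loss(queries, aligned))  # close to 0
print(mnr_loss(queries, swapped))  # close to 20
```

For orientation, a batch of 64 passages scored uniformly would give a loss of ln 64 ≈ 4.16 under this formulation.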
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: None
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- enable_jit_checkpoint: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- use_cpu: False
- seed: 42
- data_seed: None
- bf16: False
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: -1
- ddp_backend: None
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_for_metrics: []
- eval_do_concat_batches: True
- auto_find_batch_size: False
- full_determinism: False
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- use_cache: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
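The settings `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` suggest a learning rate that decays linearly from its peak toward zero over training. The sketch below illustrates that shape; `linear_lr` is a hypothetical helper, and the total of 2370 optimizer steps is an assumption (75,822 samples, batch size 64, 2 epochs, one device), not a value stated in this card.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 2370  # assumed: ceil(75822 / 64) steps per epoch * 2 epochs

for step in (0, 500, 1185, 2000, 2370):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate is about half its peak at the end of the first epoch and reaches zero at the final step.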
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.4219 | 500 | 3.5716 |
| 0.8439 | 1000 | 3.2683 |
| 1.2658 | 1500 | 3.1075 |
| 1.6878 | 2000 | 3.0246 |
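The Epoch column in the log can be reproduced from the dataset size and batch size reported earlier. This short check assumes training on a single device with no gradient accumulation (consistent with `gradient_accumulation_steps: 1`) and a final partial batch that is kept (`dataloader_drop_last: False`).

```python
import math

train_samples = 75_822
batch_size = 64  # per_device_train_batch_size; a single device is assumed

steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)  # 1185

# Fractional epoch reached at each logged step.
for step in (500, 1000, 1500, 2000):
    print(step, round(step / steps_per_epoch, 4))
```

The resulting values (0.4219, 0.8439, 1.2658, 1.6878) match the Epoch column of the table.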
Framework Versions
- Python: 3.12.13
- Sentence Transformers: 5.3.0
- Transformers: 5.0.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}