SentenceTransformer based on cambridgeltl/SapBERT-from-PubMedBERT-fulltext

This is a sentence-transformers model fine-tuned from cambridgeltl/SapBERT-from-PubMedBERT-fulltext on the bc5_cdr_me_sh2015_complete dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cambridgeltl/SapBERT-from-PubMedBERT-fulltext
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Training Dataset: bc5_cdr_me_sh2015_complete

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
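
The Pooling module above has only pooling_mode_cls_token enabled, so the sentence embedding is the hidden state of the first token ([CLS]) rather than an average over tokens. The following is a minimal sketch of that difference on toy vectors; the function names cls_pool and mean_pool are illustrative and are not part of the sentence-transformers API.

```python
from typing import List, Sequence

Vector = List[float]


def cls_pool(token_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Return the embedding at position 0, where BERT-style tokenizers place [CLS]."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> Vector:
    """Average the embeddings of non-padding tokens (shown for contrast only)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask removed every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy sequence: [CLS], one real token, one padding token (2-dim for brevity).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

print(cls_pool(tokens))         # [1.0, 2.0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

With CLS pooling the padding and the remaining tokens influence the result only through the transformer layers, not through the pooling step itself.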

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Stevenf232/SapBERT_MultipleNegativesRankingLoss_BC5CDR_Context")
# Run inference
sentences = [
    'insomnia [SEP] pressive symptoms was admitted to a psychiatric hospital due to insomnia, loss of appetite, exhaustion, and agitation. Medical treatment',
    'Sleep Initiation and Maintenance Disorders [SEP] Disorders characterized by impairment of the ability to initiate or maintain sleep. This may occur as a primary disorder or in a',
    'Atrioventricular Block [SEP] Impaired impulse conduction from HEART ATRIA to HEART VENTRICLES. AV block can mean delayed or completely blocked impulse conduc',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8093, 0.1453],
#         [0.8093, 1.0000, 0.1948],
#         [0.1453, 0.1948, 1.0000]])
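
In the output above, the insomnia mention scores 0.8093 against the Sleep Initiation and Maintenance Disorders entry and 0.1453 against Atrioventricular Block, so ranking candidates by similarity picks the matching entry. The sketch below shows that ranking step in plain Python on toy vectors, so it runs without downloading the model. The helpers build_input, cosine, and rank_candidates are hypothetical names. The "term [SEP] text" layout is inferred from the example sentences, and this card does not specify how the context or definition text is truncated.

```python
import math
from typing import List, Sequence, Tuple


def build_input(term: str, text: str) -> str:
    """Join a mention (or entry name) with its context (or definition)."""
    return f"{term} [SEP] {text}"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec: Sequence[float],
                    candidate_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dim vectors standing in for model.encode(...) outputs.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

print(build_input("insomnia", "admitted due to insomnia"))
print([index for index, _ in rank_candidates(query, candidates)])  # [2, 1, 0]
```

In practice the vectors would come from model.encode on strings built in the same layout as the training pairs.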

Training Details

Training Dataset

bc5_cdr_me_sh2015_complete

  • Dataset: bc5_cdr_me_sh2015_complete at f40f655
  • Size: 5,424 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 9 tokens, mean 29.07 tokens, max 79 tokens
    • sentence2 (string): min 4 tokens, mean 25.04 tokens, max 43 tokens
    • label (int): 1 for 100.00% of samples
  • Samples:
    • Sample 1
      sentence1: Naloxone [SEP] Naloxone reverses the antihypertensive effect of clonidine.
      sentence2: Naloxone [SEP] A specific opiate antagonist that has no agonist activity. It is a competitive antagonist at mu, delta, and kappa opioid recepto
      label: 1
    • Sample 2
      sentence1: clonidine [SEP] Naloxone reverses the antihypertensive effect of clonidine.
      sentence2: Clonidine [SEP] An imidazoline sympatholytic agent that stimulates ALPHA-2 ADRENERGIC RECEPTORS and central IMIDAZOLINE RECEPTORS. It is commonl
      label: 1
    • Sample 3
      sentence1: hypertensive [SEP] In unanesthetized, spontaneously hypertensive rats the decrease in blood pressure and heart rate produced by
      sentence2: Hypertension [SEP] Persistently high systemic arterial BLOOD PRESSURE. Based on multiple readings (BLOOD PRESSURE DETERMINATION), hypertension is c
      label: 1
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
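
MultipleNegativesRankingLoss treats every other sentence2 in the batch as a negative for a given sentence1: it scores each pair with cosine similarity, multiplies by the scale (20.0 here), and applies cross-entropy with the matching pair as the target. The block below is a plain-Python sketch of that computation under those assumptions, not the library's implementation; mnr_loss and cosine are illustrative names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every positive in the batch.
        logits = [scale * cosine(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Well-separated pairs give a loss near zero; indistinguishable pairs give ln(batch size).
print(mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]) < 1e-6)  # True
print(round(mnr_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]), 4))  # 0.6931
```

By the same reasoning, a full batch of 64 pairs with uninformative (all equal) scores would give a loss of ln 64 ≈ 4.16, which offers a rough reference point for reading loss values.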
    

Evaluation Dataset

bc5_cdr_me_sh2015_complete

  • Dataset: bc5_cdr_me_sh2015_complete at f40f655
  • Size: 5,445 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 11 tokens, mean 30.69 tokens, max 166 tokens
    • sentence2 (string): min 4 tokens, mean 24.66 tokens, max 62 tokens
    • label (int): 1 for 100.00% of samples
  • Samples:
    • Sample 1
      sentence1: Tricuspid valve regurgitation [SEP] Tricuspid valve regurgitation and lithium carbonate toxicity in a newborn infant.
      sentence2: Tricuspid Valve Insufficiency [SEP] Backflow of blood from the RIGHT VENTRICLE into the RIGHT ATRIUM due to imperfect closure of the TRICUSPID VALVE.
      label: 1
    • Sample 2
      sentence1: lithium carbonate [SEP] Tricuspid valve regurgitation and lithium carbonate toxicity in a newborn infant.
      sentence2: Lithium Carbonate [SEP] A lithium salt, classified as a mood-stabilizing agent. Lithium ion alters the metabolism of BIOGENIC MONOAMINES in the CENTRAL
      label: 1
    • Sample 3
      sentence1: toxicity [SEP] Tricuspid valve regurgitation and lithium carbonate toxicity in a newborn infant.
      sentence2: Drug-Related Side Effects and Adverse Reactions [SEP] Disorders that result from the intended use of PHARMACEUTICAL PREPARATIONS. Included in this heading are a broad variety of chem
      label: 1
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • max_steps: 200
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • fp16: True
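
To make the interaction of learning_rate, max_steps, and the warmup setting concrete, the sketch below computes a linear warmup followed by linear decay (the shape used by lr_scheduler_type: linear). It assumes the fractional warmup value of 0.1 is interpreted as 10% of max_steps, i.e. 20 warmup steps; the card does not state this explicitly. linear_lr is an illustrative helper, not a library function.

```python
def linear_lr(step: int,
              base_lr: float = 2e-5,
              total_steps: int = 200,
              warmup_steps: int = 20) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 10, 20, 110, 200):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 2e-05 at step 20 and falls back to half that value by step 110.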

All Hyperparameters

  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: 200
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss
0.1176 10 2.6695 2.4324
0.2353 20 2.2030 1.8628
0.3529 30 1.6394 1.5455
0.4706 40 1.5937 1.3570
0.5882 50 1.3294 1.2489
0.7059 60 1.2576 1.1594
0.8235 70 1.0213 1.1042
0.9412 80 1.0295 1.0672
1.0588 90 0.8890 1.0293
1.1765 100 0.9259 1.0030
1.2941 110 0.8096 0.9743
1.4118 120 0.7438 0.9587
1.5294 130 0.7797 0.9442
1.6471 140 0.7999 0.9265
1.7647 150 0.7323 0.9142
1.8824 160 0.7510 0.9070
2.0 170 0.7297 0.9032
2.1176 180 0.6434 0.8985
2.2353 190 0.5984 0.8967
2.3529 200 0.6603 0.8959
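
The Epoch column follows from the dataset size and batch size: 5,424 training samples at 64 per batch give 85 steps per epoch, so step 10 corresponds to epoch 10/85 ≈ 0.1176. Because max_steps is 200, training stops at about 2.35 epochs instead of the configured 3. The sketch below reproduces that arithmetic, assuming a single device with gradient_accumulation_steps of 1 and dataloader_drop_last set to False; the helper names are illustrative.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of optimizer steps needed to see every sample once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def epoch_at(step: int, num_samples: int = 5424, batch_size: int = 64) -> float:
    """Fractional epoch reached after a given number of steps."""
    return step / steps_per_epoch(num_samples, batch_size)


print(steps_per_epoch(5424, 64))  # 85
print(round(epoch_at(10), 4))     # 0.1176
print(round(epoch_at(200), 4))    # 2.3529
```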

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}