abhinand/MedEmbed-training-triplets-v1
How to use lion-ai/embeddinggemma-300m-medembed-triplets with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("lion-ai/embeddinggemma-300m-medembed-triplets")
sentences = [
    "What were the results of the functional study using RNA and cDNA derivatives of the BRCA1 c.5074+3A>C variant?",
    "The patient was discharged on [date of discharge] and his most recent HbA1c level was 7.1% with a few episodes of hypoglycemia.",
    "The patient was diagnosed with stage IV NSCLC (malignant pleural effusion) in December 2014. Adenocarcinoma cells from pleural effusion were found, and immunohistochemistry analysis demonstrated positivity in TTF-1 and negativity in CK 5/6 and P63.",
    "Based on the results of the functional study using RNA and cDNA derivatives of the BRCA1 c.5074+3A>C variant, it is a likely pathogenic variant.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the med_embed-training-triplets-v1 dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
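The stages after the Transformer in the printout above (mean pooling, two bias-free linear projections with identity activation, then L2 normalization) can be sketched in plain Python. This is an illustration only, not the library's implementation: the function names are hypothetical, the weights are random stand-ins, and the dimensions are shrunk (8 and 32 in place of 768 and 3072) to keep the example fast.

```python
import math
import random


def mean_pool(token_states, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_states[0])
    kept = [vec for vec, keep in zip(token_states, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def linear_no_bias(vector, weight):
    """Apply a bias-free linear map; `weight` is a list of output rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def embed(token_states, attention_mask, w_up, w_down):
    """Pooling -> Dense (up) -> Dense (down) -> Normalize."""
    pooled = mean_pool(token_states, attention_mask)
    return l2_normalize(linear_no_bias(linear_no_bias(pooled, w_up), w_down))


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
DIM, WIDE = 8, 32                     # toy stand-ins for 768 and 3072
w_up = random_matrix(WIDE, DIM)       # hypothetical weights, DIM -> WIDE
w_down = random_matrix(DIM, WIDE)     # hypothetical weights, WIDE -> DIM
token_states = random_matrix(5, DIM)  # 5 tokens, the last 2 are padding
attention_mask = [1, 1, 1, 0, 0]

embedding = embed(token_states, attention_mask, w_up, w_down)
print(len(embedding), round(sum(x * x for x in embedding), 6))
```

Because the final stage normalizes every vector to unit length, cosine similarity between two embeddings equals their dot product.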
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("lion-ai/embeddinggemma-300m-medembed-triplets")
# Run inference
queries = [
    "What was the outcome of the second surgical debulking procedure?",
]
documents = [
    'Although the treatment initially showed signs of efficacy, the tumor progressed rapidly, and the patient died three months after the second surgical debulking procedure.',
    'The patient last followed-up two months after surgery. Proptosis had completely subsided but the patient did have mild ptosis. Nevertheless, the patient was very satisfied with the outcome.',
    'The patient is advised to seek medical attention if they experience any COVID-19 related symptoms, such as fever, cough, and dyspnea.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.6088, 0.3697, 0.0330]])
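To turn scores like these into a retrieval result, the documents can be sorted by similarity. The sketch below assumes the scores are cosine similarities (the sentence-transformers default, which this card's metric and loss settings also suggest); the helper names `cosine_similarity` and `rank_documents` are hypothetical, and the scores are copied from the printed tensor above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(scores):
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Scores for the single query, copied from the printed tensor above.
scores = [0.6088, 0.3697, 0.0330]
ranking = rank_documents(scores)
print(ranking)  # [0, 1, 2]
```

Here the first document, which actually describes the second debulking procedure, ranks highest.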
medembed-triplets-dev-100, evaluated with TripletEvaluator:

| Metric | Value |
|---|---|
| cosine_accuracy | 0.74 |
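As a rough illustration of what a triplet cosine accuracy measures, the sketch below counts the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. It is a minimal reimplementation under that assumption, not the evaluator's own code, and the two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy embeddings (hypothetical): three triplets are ranked correctly, one is not.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.2, 0.8], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),  # negative is closer than positive
]
print(cosine_accuracy(triplets))  # 0.75
```

Read this way, a value of 0.74 means the positive outranked the negative in 74% of the evaluation triplets.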
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

| anchor | positive | negative |
|---|---|---|
| MTHFR homozygous A223V mutation symptoms | The patient was admitted to our institution where MTHFR homozygous A223V mutation was identified. Folic acid intake was increased to 800 mcg/d, and no other coagulation tests were abnormal. | The patient had several symptoms of MM, including hypercalcemia, bone fractures, anemia, and renal insufficiency. A biopsy showed atypical clonal plasma cells with Cluster of Differentiation (CD)138 positive infiltration. |
| Causes of spindle cell malignancy in the duodenal wall | Histological analysis revealed a spindle cell malignancy that was positive for CD21, CD23, and vimentin, but negative for CD20, CD34, CD35, CD117, DOG 1, and smooth muscle actin. | Based on immunohistochemical analysis of the tumor cells, the primary buttock tumor was diagnosed as a skeletal muscle metastasis of the primary small intestine gastrointestinal stromal tumor (GIST). |
| What was the patient's main complaint during hospital admission? | This 27-year-old pregnant woman was admitted to the hospital at 36 weeks gestation with acute vision loss in her left eye and severe onset headache. | The patient was discharged the next day |
MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
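To make these parameters concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking objective on triplets: each anchor is scored against every positive and negative in the batch with cosine similarity, the scores are multiplied by the scale of 20.0, and cross-entropy rewards picking the anchor's own positive. It assumes a single device (matching `gather_across_devices: false`); the function name and toy vectors are hypothetical, and this is not the library's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy of each anchor selecting its own positive
    among all positives and negatives in the batch."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]  # the target is candidate i
    return total / len(anchors)


# Toy batch (hypothetical vectors): each positive points roughly along its anchor.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [0.1, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = multiple_negatives_ranking_loss(anchors, positives, negatives)
swapped = multiple_negatives_ranking_loss(anchors, positives[::-1], negatives)
print(aligned < swapped)  # True
```

Every other positive in the batch acts as an extra negative for a given anchor, so a batch that repeats the same text would create false negatives.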
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

| anchor | positive | negative |
|---|---|---|
| What was the initial presentation of the patient? | The 45-year-old female patient presented to the department with an enlarging lesion in her upper abdomen. | The patient was transferred to this hospital for further evaluation. |
| giant omphalocele symptoms | The patient, a 9-year-old female, presented to the hospital with a large lump in the anterior abdominal wall extending from the xiphisternum to the level of iliac crest. | The patient presented with bilateral nasovestibular lumps which grew in size over several months, occluding nasal entrance and protruding outside the nose. |
| granulomatous lymphocytic interstitial lung disease treatment | The patient had clubbing and chronic lung findings, and thorax CT revealed extended and severe bronchiectasis with thickened bronchial walls, some granulomatous nodules and mosaic appearance, compatible with granulomatous lymphocytic interstitial lung disease (GLILD). Regular intravenous immunoglobulin (IVIG) replacement was started. | The patient was treated with methylprednisolone pulse therapy followed by oral prednisolone (PSL) and cyclophosphamide intravenously. After treatment, arthralgia, renal function, proteinuria, and skin manifestations improved. |
MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 8
- learning_rate: 2e-05
- weight_decay: 0.01
- num_train_epochs: 1
- warmup_ratio: 0.1
- dataloader_num_workers: 4
- load_best_model_at_end: True
- ddp_find_unused_parameters: False
- prompts: task: sentence similarity | query: 
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: False
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: task: sentence similarity | query: 
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | Validation Loss | medembed-triplets-dev-100_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.6100 |
| 0.0111 | 10 | 2.4148 | - | - |
| 0.0222 | 20 | 2.2998 | - | - |
| 0.0333 | 30 | 2.1778 | - | - |
| 0.0445 | 40 | 2.0189 | - | - |
| 0.0556 | 50 | 1.8635 | - | - |
| 0.0667 | 60 | 1.7367 | - | - |
| 0.0778 | 70 | 1.6461 | - | - |
| 0.0889 | 80 | 1.5031 | - | - |
| 0.1000 | 90 | 1.4991 | - | - |
| 0.1111 | 100 | 1.3934 | 1.3675 | 0.6700 |
| 0.1222 | 110 | 1.4089 | - | - |
| 0.1334 | 120 | 1.348 | - | - |
| 0.1445 | 130 | 1.348 | - | - |
| 0.1556 | 140 | 1.3034 | - | - |
| 0.1667 | 150 | 1.2936 | - | - |
| 0.1778 | 160 | 1.2916 | - | - |
| 0.1889 | 170 | 1.1942 | - | - |
| 0.2000 | 180 | 1.2397 | - | - |
| 0.2111 | 190 | 1.2626 | - | - |
| 0.2223 | 200 | 1.2502 | 1.1623 | 0.6800 |
| 0.2334 | 210 | 1.2267 | - | - |
| 0.2445 | 220 | 1.2234 | - | - |
| 0.2556 | 230 | 1.1737 | - | - |
| 0.2667 | 240 | 1.1432 | - | - |
| 0.2778 | 250 | 1.0871 | - | - |
| 0.2889 | 260 | 1.1874 | - | - |
| 0.3000 | 270 | 1.1004 | - | - |
| 0.3112 | 280 | 1.1237 | - | - |
| 0.3223 | 290 | 1.1089 | - | - |
| 0.3334 | 300 | 1.0465 | 1.0819 | 0.7000 |
| 0.3445 | 310 | 1.1186 | - | - |
| 0.3556 | 320 | 1.1047 | - | - |
| 0.3667 | 330 | 1.1235 | - | - |
| 0.3778 | 340 | 1.1269 | - | - |
| 0.3889 | 350 | 1.1004 | - | - |
| 0.4001 | 360 | 1.1414 | - | - |
| 0.4112 | 370 | 1.0982 | - | - |
| 0.4223 | 380 | 1.077 | - | - |
| 0.4334 | 390 | 1.0781 | - | - |
| 0.4445 | 400 | 1.0856 | 1.0317 | 0.7100 |
| 0.4556 | 410 | 1.0473 | - | - |
| 0.4667 | 420 | 1.1216 | - | - |
| 0.4778 | 430 | 1.0943 | - | - |
| 0.4890 | 440 | 1.0587 | - | - |
| 0.5001 | 450 | 1.0297 | - | - |
| 0.5112 | 460 | 1.0463 | - | - |
| 0.5223 | 470 | 1.0405 | - | - |
| 0.5334 | 480 | 1.085 | - | - |
| 0.5445 | 490 | 1.0685 | - | - |
| 0.5556 | 500 | 1.047 | 1.0063 | 0.7400 |
| 0.5667 | 510 | 1.0331 | - | - |
| 0.5779 | 520 | 1.0309 | - | - |
| 0.5890 | 530 | 1.0146 | - | - |
| 0.6001 | 540 | 1.018 | - | - |
| 0.6112 | 550 | 1.0098 | - | - |
| 0.6223 | 560 | 1.0213 | - | - |
| 0.6334 | 570 | 1.0467 | - | - |
| 0.6445 | 580 | 1.0771 | - | - |
| 0.6556 | 590 | 1.0876 | - | - |
| 0.6668 | 600 | 1.0605 | 0.9780 | 0.7200 |
| 0.6779 | 610 | 1.0287 | - | - |
| 0.6890 | 620 | 1.0296 | - | - |
| 0.7001 | 630 | 1.0052 | - | - |
| 0.7112 | 640 | 1.0105 | - | - |
| 0.7223 | 650 | 0.9932 | - | - |
| 0.7334 | 660 | 0.9831 | - | - |
| 0.7445 | 670 | 1.0151 | - | - |
| 0.7557 | 680 | 1.0012 | - | - |
| 0.7668 | 690 | 0.9714 | - | - |
| 0.7779 | 700 | 1.0313 | 0.9797 | 0.7300 |
| 0.7890 | 710 | 1.0415 | - | - |
| 0.8001 | 720 | 1.0029 | - | - |
| 0.8112 | 730 | 1.0331 | - | - |
| 0.8223 | 740 | 1.0312 | - | - |
| 0.8334 | 750 | 1.041 | - | - |
| 0.8446 | 760 | 0.9796 | - | - |
| 0.8557 | 770 | 1.0296 | - | - |
| 0.8668 | 780 | 0.9824 | - | - |
| 0.8779 | 790 | 1.0317 | - | - |
| 0.8890 | 800 | 1.0647 | 0.9780 | 0.7400 |
| 0.9001 | 810 | 0.9536 | - | - |
| 0.9112 | 820 | 1.0211 | - | - |
| 0.9224 | 830 | 1.0131 | - | - |
| 0.9335 | 840 | 1.0236 | - | - |
| 0.9446 | 850 | 0.9874 | - | - |
| 0.9557 | 860 | 1.0107 | - | - |
| 0.9668 | 870 | 0.9533 | - | - |
| 0.9779 | 880 | 0.9635 | - | - |
| 0.9890 | 890 | 0.996 | - | - |
| 1.0 | 900 | 0.9829 | 0.9730 | 0.7400 |
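The listed hyperparameters and the 900 logged steps imply a particular learning-rate curve and effective batch size. The sketch below works these out under stated assumptions: a linear warmup followed by linear decay to zero (matching `lr_scheduler_type: linear` and `warmup_ratio: 0.1`), a warmup length of 10% of 900 steps, and a per-device view of the batch size because the number of devices is not stated. The function name is hypothetical.

```python
import math


def linear_schedule_lr(step, base_lr=2e-5, total_steps=900, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Samples consumed per optimizer step on one device:
# per_device_train_batch_size * gradient_accumulation_steps.
effective_batch_per_device = 16 * 8

for step in (0, 45, 90, 495, 900):
    print(step, linear_schedule_lr(step))
print(effective_batch_per_device)  # 128
```

Under these assumptions the rate peaks at 2e-05 at step 90, the same point at which the logged training loss has fallen to about 1.5, and then decays to zero by step 900.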
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
google/embeddinggemma-300m