SentenceTransformer

This is a sentence-transformers model trained on two French triplet datasets, french triplet ds and french custom triplet ds. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Datasets:
    • french triplet ds
    • french custom triplet ds

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
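
The Pooling and Normalize modules above can be illustrated with a minimal pure-Python sketch: mean pooling averages the token vectors that are not padding, and normalization rescales the result to unit length. The toy vectors, the mask, and the helper names `mean_pool` and `l2_normalize` are assumptions made for illustration; the library itself works on batched tensors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector] if norm > 0 else list(vector)

# Toy input: 3 token positions, 4 dimensions, last position is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)  # [2.0, 1.0, 0.0, 2.0]
print(sentence_embedding)
```

In the real model each token vector has 384 dimensions, matching word_embedding_dimension in the Pooling configuration.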

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("thomasavare/all-MiniLM-L6-v2-med-v0")
# Run inference
sentences = [
    'Plan de soins post-opératoires après une chirurgie de kyste osseux anévrismaux',
    'Les exercices de renforcement et de musculation de la hanche ont été commencés tôt, et à la cinquième semaine, on a commencé à marcher avec des béquilles, et quatre semaines plus tard, on a abandonné les béquilles et on a encouragé le patient à marcher de façon autonome.',
    "Le patient a été conseillé de continuer à faire un suivi régulier auprès de son fournisseur de soins de santé primaires et de ses dentistes pour gérer les problèmes postopératoires et s'assurer qu'il n'y a pas de récidive de la maladie.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.3373, 0.4268],
#         [0.3373, 1.0000, 0.4083],
#         [0.4268, 0.4083, 1.0000]])
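
Because the last module of the model is Normalize(), the embeddings have unit length, so cosine similarity reduces to a plain dot product. The sketch below ranks a small corpus against a query on that basis. The 2-D vectors are hypothetical stand-ins for the output of model.encode, and `rank_by_similarity` is a helper name introduced here, not part of the library.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_by_similarity(query, corpus):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical unit-length vectors standing in for model.encode(...) output.
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_by_similarity(query, corpus)
print(ranking)  # [(2, 1.0), (1, 0.6), (0, 0.0)]
```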

Training Details

Training Datasets

french triplet ds

  • Dataset: french triplet ds
  • Size: 232,684 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor        positive       negative
    type      string        string         string
    min       6 tokens      9 tokens       10 tokens
    mean      18.1 tokens   55.16 tokens   54.65 tokens
    max       42 tokens     239 tokens     228 tokens
  • Samples:
    • anchor: masse kystique sur l' utérus
      positive: Le patient s'est présenté aux urgences avec une douleur abdominale sévère dans le quadrant inférieur gauche, qui a été diagnostiquée comme une masse kystique sur la paroi antérieure gauche de l'utérus.
      negative: Symptômes: masse pelvienne à croissance rapide et taux sériques accrus de marqueurs tumoraux
    • anchor: Plan de soins post-démarrage pour les patients atteints d' une BPAN
      positive: La patiente sera suivie en consultation externe avec une surveillance étroite de sa nutrition et de ses habitudes de comportement pour s'assurer qu'elle ne revienne pas à ses comportements antérieurs.
      negative: Le patient a été libéré le 30e jour d'hospitalisation avec de l'aspirine seule.
    • anchor: Comment l'état du patient a- t- il réagi au traitement?
      positive: Le patient a répondu positivement au traitement prescrit par la diéthylcarbamazine et aucun suivi n' est nécessaire.
      negative: Les symptômes du patient se sont résolus après avoir reçu un traitement et subi des échocardiogrammes de suivi.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
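
As a rough illustration of the loss configured above, the sketch below computes a multiple-negatives ranking loss in pure Python: each anchor is scored against every positive and every negative in the batch with cosine similarity, the scores are multiplied by the scale (20.0), and cross-entropy rewards picking the anchor's own positive. The toy 2-D vectors and the function names are assumptions for illustration; this is a simplified sketch, not the library's implementation.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives first, then negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D batch of two triplets (hypothetical values, not from the datasets).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good = mnr_loss(anchors, positives, negatives)
bad = mnr_loss(anchors, positives[::-1], negatives)  # positives swapped
print(good, bad)
```

The correctly paired batch yields a loss close to zero, while swapping the positives makes the loss large, which is the behaviour the training objective relies on.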
    

french custom triplet ds

  • Dataset: french custom triplet ds
  • Size: 251,939 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor        positive       negative
    type      string        string         string
    min       3 tokens      3 tokens       3 tokens
    mean      16.31 tokens  16.44 tokens   15.43 tokens
    max       74 tokens     72 tokens      61 tokens
  • Samples:
    • anchor: Cholera
      positive: Maladie infectieuse
      negative: Mittelschmerz
    • anchor: Choléra
      positive: Cholera
      negative: Astringents et détergents locaux
    • anchor: Maladie infectieuse
      positive: Choléra
      negative: collision avec tout objet, fixe ou mobile ou en mouvement
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 512
  • learning_rate: 2e-05
  • warmup_steps: 0.05
  • bf16: True
  • project: icd10-embeddings
  • trackio_space_id: thomasavare/icd10-embeddings
  • warmup_ratio: 0.05
  • prompts: {'anchor': 'Instruct : Represent the disease in a standardized clinical concept\nQuery :', 'positive': 'Instruct : Represent the disease in a standardized clinical concept\nQuery :', 'negative': 'Instruct : Represent the disease in a standardized clinical concept\nQuery :'}
  • batch_sampler: no_duplicates
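
Since the same instruction prompt was applied to all three columns during training, it may help to apply it at inference time as well. The sketch below shows one way to do that, assuming the prompt is simply prepended to each text. The helper names `with_prompt` and `encode_with_prompt` are hypothetical, and whether the saved model already stores this prompt in its configuration is not stated here.

```python
# Prompt string copied from the `prompts` hyperparameter above.
PROMPT = "Instruct : Represent the disease in a standardized clinical concept\nQuery :"

def with_prompt(texts, prompt=PROMPT):
    """Prepend the training prompt to each text (plain string concatenation)."""
    return [prompt + text for text in texts]

def encode_with_prompt(texts, model_name="thomasavare/all-MiniLM-L6-v2-med-v0"):
    """Encode texts with the training prompt; needs sentence-transformers installed."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer(model_name)
    # `prompt=` asks encode() to prepend the string to every input text.
    return model.encode(texts, prompt=PROMPT)

# Show what the model would see for a single input.
print(with_prompt(["Choléra"])[0])
```

Comparing similarities with and without the prompt on a few known pairs is a quick way to check whether it matters for a given use case.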

All Hyperparameters

  • per_device_train_batch_size: 512
  • num_train_epochs: 3
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.05
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: icd10-embeddings
  • trackio_space_id: thomasavare/icd10-embeddings
  • eval_strategy: no
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: 0.05
  • local_rank: -1
  • prompts: {'anchor': 'Instruct : Represent the disease in a standardized clinical concept\nQuery :', 'positive': 'Instruct : Represent the disease in a standardized clinical concept\nQuery :', 'negative': 'Instruct : Represent the disease in a standardized clinical concept\nQuery :'}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
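
The combination of learning_rate 2e-05, lr_scheduler_type linear, and a warmup value of 0.05 can be sketched as follows, assuming the warmup value is read as a fraction of the total number of optimizer steps. The function name and the round total of 1000 steps are illustrative assumptions, not values taken from the training run.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_fraction=0.05):
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_fraction)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# total_steps=1000 is a hypothetical round number chosen for readability.
for step in (0, 25, 50, 525, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```

The rate climbs from zero to its peak over the first 5% of steps and then falls linearly back to zero at the final step.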

Training Logs

Epoch    Step   Training Loss
0.1055   100    5.7088
0.2110   200    4.9171
0.3165   300    4.8061
0.4219   400    4.7313
0.5274   500    4.7004
0.6329   600    4.6518
0.7384   700    4.6307
0.8439   800    4.6992
0.9494   900    4.5778
1.0549   1000   4.4946
1.1603   1100   4.6055
1.2658   1200   4.5647
1.3713   1300   4.5245
1.4768   1400   4.5631
1.5823   1500   4.5186
1.6878   1600   4.5509
1.7932   1700   4.5756
1.8987   1800   4.6112
2.0042   1900   4.4410
2.1097   2000   4.6082
2.2152   2100   4.5329
2.3207   2200   4.5414
2.4262   2300   4.5330
2.5316   2400   4.5384
2.6371   2500   4.5075
2.7426   2600   4.5156
2.8481   2700   4.5750
2.9536   2800   4.6071
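
The epoch fractions in the log can be cross-checked against the dataset sizes and batch size reported earlier. The sketch below assumes a single device and that each dataset is batched separately with its last partial batch kept. Both are assumptions, although the resulting step count reproduces the logged epoch values.

```python
import math

dataset_sizes = {"french triplet ds": 232_684, "french custom triplet ds": 251_939}
batch_size = 512  # per_device_train_batch_size; a single device is assumed

# Assumption: each dataset is batched separately and keeps its last partial batch
# (dataloader_drop_last is False), so batch counts are rounded up per dataset.
steps_per_epoch = sum(math.ceil(n / batch_size) for n in dataset_sizes.values())
print(steps_per_epoch)  # 948

# Epoch fraction reached at the first logged step.
print(round(100 / steps_per_epoch, 4))  # 0.1055
```

Under the same assumptions, three epochs correspond to 2844 optimizer steps, which is consistent with the last logged step (2800) falling at epoch 2.9536.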

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 5.2.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 22.7M params (Safetensors), tensor type BF16