miriad-embedding / README.md
mtien/miriad-embedding
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:2000
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
  - source_sentence: >
      What methods have been attempted to improve resin bond strength to
      irradiated dentin?
    sentences:
      - >-
        Patients with BHD syndrome may have concerns about communicating genetic
        risk to their family members, especially if their family has different
        communication patterns or cultural norms. Some patients may find it
        difficult to share information about an inherited, potentially lethal
        disorder with their family members. It is observed that families in
        which affected members have experienced significant morbidity are more
        likely to pursue genetic testing and surveillance. However, this
        phenomenon has not been systematically studied in the BHD population.
        Patients may also worry that their family members are not motivated to
        pursue genetic testing and surveillance. In these situations, patients
        can share medical papers and handouts with their family members and
        inform them about the process to obtain genetic testing. Additionally,
        patients can encourage their family members to attend scientific
        meetings and connect with other BHD families through resources like the
        Myrovlytis website. Cancer Genetic Counselors (CGC) and/or Advanced
        Practice Nurses in Genetics (APNG) can also provide support and guidance
        to patients and their families in coping with the psychosocial
        ramifications of BHD.
      - >-
        Psychological stress has been found to have a significant impact on
        medical illness, including ocular disease. While vision researchers have
        not fully embraced the approach of psychoneuroimmunology in addressing
        ocular disease, it is clear that no organ system is protected from the
        effects of negative emotional states. Stress is more prevalent among the
        elderly, and conditions such as retirement, chronic illness, loss of
        loved ones, and caregiver's stress can induce chronic debilitating
        stress. Ophthalmologists should prioritize time with patients to
        establish a compassionate rapport and address emotional factors that may
        contribute to ocular conditions. Failure to do so compromises the
        individual's opportunity for healing.
      - >-
        Many researchers have attempted to improve resin bond strength to
        irradiated dentin by removing the denatured layer mechanically and
        chemically. However, efficient methods for clinical application have not
        yet been established. The reduction of dentin bonding strength is
        believed to be due to the denatured layer of dentin surface, which has
        led to the exploration of various techniques to remove or mitigate its
        effects.
  - source_sentence: |
      What are the clinical features of peripheral ossifying fibroma?
    sentences:
      - >-
        The management of intracranial hemorrhage after thrombolysis is still
        uncertain. It is unclear whether patients with severe intracranial
        hemorrhage soon after thrombolytic therapy should receive only
        supportive medical care or should be aggressively managed with treatment
        of increased intracranial pressure, ventriculostomy, or neurosurgical
        evacuation. The use of clinical decision-making aids, such as Figure 1,
        may assist clinicians in making empirical decisions for these patients.
      - >-
        When the diagnosis of HIT is confirmed, therapeutic doses of alternative
        non-heparin anticoagulants are usually required. Heparin treatments must
        be stopped immediately, including heparin-bonded catheters and heparin
        flushes. Patients should be given a non-heparin anticoagulant such as
        direct thrombin inhibitors like Bivalirudin, Argatroban, or Lepirudin.
        These inhibitors directly inhibit the actions of thrombin and do not
        require a cofactor. They are active against both free and clot-bound
        thrombin and do not interact with or produce heparin-dependent
        antibodies.
      - >-
        Histopathological evaluation of biopsy specimens of peripheral ossifying
        fibroma typically reveals intact or ulcerated stratified squamous
        surface epithelium, potentially mature mineralized material, epithelial
        proliferation, benign fibrous connective tissue with varying fibroblast
        content, myofibroblasts and collagen, lamellar or woven osteoid, and
        cement-like material or dystrophic calcifications. The presence of acute
        and chronic inflammatory cells may also be observed.
  - source_sentence: >
      What are the common clinical features and diagnostic criteria of relapsing
      polychondritis?
    sentences:
      - >-
        Lethal complications of relapsing polychondritis are often associated
        with airway or cardiovascular involvement. This can include
        complications such as aortic incompetence, mitral regurgitation,
        pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis
        of the central nervous system, phlebitis, and Raynaud's phenomenon.
        Neurological and renal system involvement can also occur, although it is
        rare. Regular follow-up and management are important to monitor and
        prevent potential complications in patients with relapsing
        polychondritis.
      - >-
        Media focus can contribute to the risk of burnout in managers. Burnout
        is a prolonged response to chronic emotional and interpersonal stressors
        at work. The pressure and scrutiny from the media can lead to feelings
        of exhaustion, cynicism, and inefficacy, which are the three dimensions
        of burnout. Managers may respond to increased pressure by becoming
        avoidant, narrow-minded, and hard on themselves, their subordinates, and
        their families. They may also try to establish emotional and cognitive
        distance from the pressuring situation. Ultimately, the exposure to
        negative media focus with elements of personification can increase the
        risk of burnout in some managers.
      - >-
        Intrathymic injection of MBP has potential applications in various
        medical treatments. It can be used in surgical brain injuries caused by
        cutting, electric coagulation, suction, and traction to alleviate the
        secondary attack to the brain tissue and reduce the auto-inflammation
        process triggered by the exposure of autoantigens. It may also be
        beneficial for elective surgeries, such as intracranial tumor
        operations, to induce immune tolerance and alleviate auto-inflammation.
        With the development of minimally invasive operation techniques,
        intrathymic injection without exposing the thorax can become a simple,
        efficient, and safe procedure. Further studies are needed to investigate
        the potential applications of intrathymic injection of MBP in vivo.
  - source_sentence: >
      What are some potential mechanisms by which quercetin may protect against
      cancer?
    sentences:
      - >-
        There is a significant correlation between serum B2M levels and some
        biochemical parameters, such as ALK, bilirubin, and INR, in patients
        with liver disease. However, no significant correlation has been found
        between serum B2M levels and viral load among patients with liver
        disease.
      - >-
        When the diagnosis of HIT is confirmed, therapeutic doses of alternative
        non-heparin anticoagulants are usually required. Heparin treatments must
        be stopped immediately, including heparin-bonded catheters and heparin
        flushes. Patients should be given a non-heparin anticoagulant such as
        direct thrombin inhibitors like Bivalirudin, Argatroban, or Lepirudin.
        These inhibitors directly inhibit the actions of thrombin and do not
        require a cofactor. They are active against both free and clot-bound
        thrombin and do not interact with or produce heparin-dependent
        antibodies.
      - >-
        Silymarin and Ginkgo biloba extract have been found to possess
        hepatoprotective effects against NDEA-induced hepatocarcinogenesis.
        These extracts can scavenge free radicals, prevent hepatocellular
        damage, and suppress the leakage of enzymes through plasma membranes.
        They may also modify the biotransformation/detoxification of NDEA,
        reducing its liver toxicity. Additionally, silymarin can reduce
        intracellular ROS levels, prevent oxidative stress-induced cellular
        damage, and stimulate hepatic cell proliferation for liver regeneration.
        These effects make silymarin and Ginkgo biloba extract strong candidates
        as chemopreventive agents for liver cancer.
  - source_sentence: >
      What are the molecular mechanisms involved in the synergistic induction of
      SAA by IL-1, TNF-α, and IL-6?
    sentences:
      - >-
        The complex formation of STAT3, NF-κB p65, and p300 is involved in the
        transcriptional activity of the SAA1 gene. STAT3 and p300 are recruited
        to the SAA1 promoter region in response to IL-6 or IL-1β + IL-6
        stimulation. Co-expression of wild type p300 with wild type STAT3
        enhances the luciferase activity of the SAA1 gene in a dose-dependent
        manner. This suggests that the heteromeric complex formation of STAT3,
        NF-κB p65, and p300 contributes to the transcriptional activity of the
        SAA1 gene.
      - >-
        Intrathymic injection of MBP has potential applications in various
        medical treatments. It can be used in surgical brain injuries caused by
        cutting, electric coagulation, suction, and traction to alleviate the
        secondary attack to the brain tissue and reduce the auto-inflammation
        process triggered by the exposure of autoantigens. It may also be
        beneficial for elective surgeries, such as intracranial tumor
        operations, to induce immune tolerance and alleviate auto-inflammation.
        With the development of minimally invasive operation techniques,
        intrathymic injection without exposing the thorax can become a simple,
        efficient, and safe procedure. Further studies are needed to investigate
        the potential applications of intrathymic injection of MBP in vivo.
      - >-
        Phenotypic screens of approved drug collections and synergistic
        combinations can be a useful approach for rapid identification of new
        therapeutics for drug-resistant bacteria. This approach can also be
        applied to emerging outbreaks of infectious diseases where vaccines and
        therapeutic agents are unavailable or unrealistic to develop in a short
        period of time. By screening existing drugs and combinations, new
        therapeutics can be identified and potentially repurposed for the
        treatment of drug-resistant infections.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.7775
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8885
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.917
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.947
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7775
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.29616666666666663
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.18340000000000004
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09470000000000002
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7775
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8885
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.917
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.947
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8637977392462012
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8369255952380947
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8394380047776188
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.7785
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8825
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.917
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.944
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7785
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.29416666666666663
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.18340000000000004
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09440000000000003
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7785
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8825
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.917
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.944
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8623716893141778
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8360055555555553
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8388749447751291
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.7555
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8655
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9145
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.943
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7555
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2884999999999999
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.18290000000000003
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09430000000000001
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7555
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8655
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9145
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.943
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8499528413626729
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8199301587301584
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8224780775804242
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.714
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8365
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.877
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9285
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.714
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.27883333333333327
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1754
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09285
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.714
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8365
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.877
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9285
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8195584918161248
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7848236111111104
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7878148778237813
            name: Cosine Map@100

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

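This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss at 768, 512, 128 and 64 dimensions, its embeddings can also be truncated to those smaller sizes at a modest cost in retrieval quality (see Evaluation).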

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
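The Pooling and Normalize modules above can be illustrated in plain Python: mean pooling averages the token embeddings that the attention mask marks as real tokens, and Normalize rescales the result to unit L2 length, so a plain dot product between two outputs equals their cosine similarity. This is a minimal sketch with invented 3-dimensional token vectors (the real model uses 768), not the library's implementation.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1; padding positions are skipped."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    return [x / max(count, 1) for x in summed]

def l2_normalize(vec):
    """Rescale to unit Euclidean length (the role of the Normalize() module)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Toy 3-dimensional token embeddings; trailing positions with mask 0 are padding.
tokens_a = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 1.0, 1.0], [9.0, 9.0, 9.0]]
tokens_b = [[0.0, 1.0, 1.0], [2.0, 1.0, 3.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mask_a = [1, 1, 1, 0]
mask_b = [1, 1, 0, 0]

pooled_a = mean_pool(tokens_a, mask_a)  # [2.0, 1.0, 1.0]
pooled_b = mean_pool(tokens_b, mask_b)  # [1.0, 1.0, 2.0]
emb_a, emb_b = l2_normalize(pooled_a), l2_normalize(pooled_b)

# After normalization, the dot product equals the cosine of the pooled vectors.
print(dot(emb_a, emb_b), cosine(pooled_a, pooled_b))
```

This is why the model card lists cosine similarity as the similarity function: with Normalize() as the last module, cosine and dot-product scores coincide.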

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# Replace the placeholder below with this model's Hub repository id
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'What are the molecular mechanisms involved in the synergistic induction of SAA by IL-1, TNF-α, and IL-6?\n',
    'The complex formation of STAT3, NF-κB p65, and p300 is involved in the transcriptional activity of the SAA1 gene. STAT3 and p300 are recruited to the SAA1 promoter region in response to IL-6 or IL-1β + IL-6 stimulation. Co-expression of wild type p300 with wild type STAT3 enhances the luciferase activity of the SAA1 gene in a dose-dependent manner. This suggests that the heteromeric complex formation of STAT3, NF-κB p65, and p300 contributes to the transcriptional activity of the SAA1 gene.',
    'Phenotypic screens of approved drug collections and synergistic combinations can be a useful approach for rapid identification of new therapeutics for drug-resistant bacteria. This approach can also be applied to emerging outbreaks of infectious diseases where vaccines and therapeutic agents are unavailable or unrealistic to develop in a short period of time. By screening existing drugs and combinations, new therapeutics can be identified and potentially repurposed for the treatment of drug-resistant infections.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7925, 0.1356],
#         [0.7925, 1.0000, 0.1694],
#         [0.1356, 0.1694, 1.0000]])
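Because the model was trained with MatryoshkaLoss at 768, 512, 128 and 64 dimensions, you can keep only the first k coordinates of each embedding to get smaller vectors. The library exposes this through the `truncate_dim` argument of `SentenceTransformer`; the plain-Python sketch below only illustrates the arithmetic on invented 8-dimensional vectors and is not the library's code. It shows that a truncated prefix is no longer unit length, but cosine scores are unaffected by re-normalizing it.

```python
import math

def truncate(vec, k):
    """Keep only the first k coordinates (the Matryoshka prefix)."""
    return vec[:k]

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Hypothetical unit-length 8-dimensional vectors standing in for 768-dimensional embeddings.
query = normalize([0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.0])
passage = normalize([0.8, 0.4, 0.1, 0.2, 0.0, 0.03, 0.0, 0.02])

for k in (8, 4, 2):
    q, p = truncate(query, k), truncate(passage, k)
    # A truncated prefix has squared norm below 1, but cosine similarity is
    # scale-invariant, so re-normalizing it leaves the score unchanged.
    print(k, round(sum(x * x for x in q), 4), round(cosine(q, p), 4),
          round(cosine(normalize(q), normalize(p)), 4))
```

Re-normalizing only matters if you score truncated vectors with a raw dot product; `model.similarity` uses cosine similarity here. With the library, loading the model as `SentenceTransformer("sentence_transformers_model_id", truncate_dim=128)` should return 128-dimensional embeddings directly.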

Evaluation

Metrics

Information Retrieval

Datasets: dim_768, dim_512, dim_128 and dim_64; each column reports the metric with embeddings truncated to that many dimensions.

Metric                 dim_768  dim_512  dim_128  dim_64
cosine_accuracy@1      0.7775   0.7785   0.7555   0.7140
cosine_accuracy@3      0.8885   0.8825   0.8655   0.8365
cosine_accuracy@5      0.9170   0.9170   0.9145   0.8770
cosine_accuracy@10     0.9470   0.9440   0.9430   0.9285
cosine_precision@1     0.7775   0.7785   0.7555   0.7140
cosine_precision@3     0.2962   0.2942   0.2885   0.2788
cosine_precision@5     0.1834   0.1834   0.1829   0.1754
cosine_precision@10    0.0947   0.0944   0.0943   0.0929
cosine_recall@1        0.7775   0.7785   0.7555   0.7140
cosine_recall@3        0.8885   0.8825   0.8655   0.8365
cosine_recall@5        0.9170   0.9170   0.9145   0.8770
cosine_recall@10       0.9470   0.9440   0.9430   0.9285
cosine_ndcg@10         0.8638   0.8624   0.8500   0.8196
cosine_mrr@10          0.8369   0.8360   0.8199   0.7848
cosine_map@100         0.8394   0.8389   0.8225   0.7878
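The reported numbers are consistent with each query having exactly one relevant passage: in all four evaluations accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.8885 / 3 ≈ 0.2962 at 768 dimensions). The sketch below shows how these metrics reduce in that single-relevant case. It is a simplified illustration with invented ranks, not the evaluator that produced the numbers above.

```python
import math

def single_relevant_metrics(ranks, k_values=(1, 3, 5, 10)):
    """Retrieval metrics when every query has exactly one relevant passage.

    ranks: 1-based rank of the relevant passage per query, or None if it was not retrieved.
    """
    n = len(ranks)
    found = [r for r in ranks if r is not None]
    out = {}
    for k in k_values:
        hits = sum(1 for r in found if r <= k)
        out[f"accuracy@{k}"] = hits / n
        out[f"recall@{k}"] = hits / n           # one relevant item, so recall@k == accuracy@k
        out[f"precision@{k}"] = hits / (n * k)  # at most 1 of the top k can be relevant
    top10 = [r for r in found if r <= 10]
    out["mrr@10"] = sum(1 / r for r in top10) / n
    # The ideal DCG is 1 with a single relevant item, so nDCG@10 = 1 / log2(rank + 1).
    out["ndcg@10"] = sum(1 / math.log2(r + 1) for r in top10) / n
    return out

# Hypothetical ranks for five queries (the last query's passage was not retrieved).
metrics = single_relevant_metrics([1, 2, 1, 4, None])
for name, value in metrics.items():
    print(f"{name:<14} {value:.4f}")
```

Under the same single-relevant assumption, MAP@100 is simply MRR computed over the top 100, which is why it sits slightly above MRR@10 in every column.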

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,000 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
                anchor                                               positive
      type      string                                               string
      details   min: 8 tokens, mean: 20.92 tokens, max: 51 tokens   min: 30 tokens, mean: 116.22 tokens, max: 227 tokens
  • Samples:
    anchor: What are the common clinical features and diagnostic criteria of relapsing polychondritis?
    positive: Lethal complications of relapsing polychondritis are often associated with airway or cardiovascular involvement. This can include complications such as aortic incompetence, mitral regurgitation, pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis of the central nervous system, phlebitis, and Raynaud's phenomenon. Neurological and renal system involvement can also occur, although it is rare. Regular follow-up and management are important to monitor and prevent potential complications in patients with relapsing polychondritis.

    anchor: What are the treatment options for relapsing polychondritis?
    positive: Lethal complications of relapsing polychondritis are often associated with airway or cardiovascular involvement. This can include complications such as aortic incompetence, mitral regurgitation, pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis of the central nervous system, phlebitis, and Raynaud's phenomenon. Neurological and renal system involvement can also occur, although it is rare. Regular follow-up and management are important to monitor and prevent potential complications in patients with relapsing polychondritis.

    anchor: What are the potential complications associated with relapsing polychondritis?
    positive: Lethal complications of relapsing polychondritis are often associated with airway or cardiovascular involvement. This can include complications such as aortic incompetence, mitral regurgitation, pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis of the central nervous system, phlebitis, and Raynaud's phenomenon. Neurological and renal system involvement can also occur, although it is rare. Regular follow-up and management are important to monitor and prevent potential complications in patients with relapsing polychondritis.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
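MultipleNegativesRankingLoss treats, for each anchor in a batch, its own positive as the correct class and every other positive in the batch as a negative, applying cross-entropy to scaled cosine similarities. MatryoshkaLoss evaluates that loss on the first 768, 512, 128 and 64 dimensions and sums the results with the weights above. The pure-Python sketch below illustrates this on tiny invented vectors; the scale of 20 is an assumption matching the library's default for MultipleNegativesRankingLoss, and this is not the library's implementation (which operates on tensors and backpropagates).

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def mnrl(anchors, positives, scale=20.0):
    """In-batch negatives: cross-entropy where anchor i should pick positive i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        # log-sum-exp, shifted by the max logit for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]
    return total / len(anchors)

def matryoshka_mnrl(anchors, positives, dims, weights):
    """Weighted sum of MNRL computed on truncated embedding prefixes."""
    loss = 0.0
    for d, w in zip(dims, weights):
        loss += w * mnrl([a[:d] for a in anchors], [p[:d] for p in positives])
    return loss

# Hypothetical 4-dimensional embeddings for a batch of three (anchor, positive) pairs.
anchors = [[1.0, 0.1, 0.0, 0.2], [0.0, 1.0, 0.2, 0.1], [0.1, 0.0, 1.0, 0.3]]
positives = [[0.9, 0.2, 0.1, 0.0], [0.1, 0.8, 0.0, 0.3], [0.0, 0.1, 0.9, 0.2]]

# Dims 4 and 2 stand in for the card's 768/512/128/64; all weights are 1, as above.
print(matryoshka_mnrl(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

Larger batches supply more in-batch negatives per anchor, which is one reason the no_duplicates batch sampler is used: it keeps identical texts from appearing as false negatives within a batch.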
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
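The hyperparameters above determine the step count in the training logs: 16 pairs per device batch times 4 accumulation steps gives 64 pairs per optimizer step, and 2,000 samples give 125 micro-batches, or 32 optimizer steps for the single epoch, with each step advancing the logged epoch by 0.032. The sketch below reproduces that arithmetic and the warmup-plus-cosine learning-rate curve in the form used by Hugging Face's `get_cosine_schedule_with_warmup` with its default half cycle. Single-device training and 4 warmup steps (a 0.1 ratio of 32 steps, rounded up) are assumptions, not values stated in the card.

```python
import math

num_samples = 2000
per_device_batch = 16
grad_accum = 4
num_devices = 1  # assumption: single-device training

effective_batch = per_device_batch * grad_accum * num_devices  # 64 pairs per optimizer step
micro_batches = math.ceil(num_samples / per_device_batch)     # 125 (dataloader_drop_last is False)
steps_per_epoch = math.ceil(micro_batches / grad_accum)        # 32 optimizer steps
epoch_per_step = grad_accum * per_device_batch / num_samples   # 0.032, the log's Epoch increment

warmup_steps = math.ceil(0.1 * steps_per_epoch)  # assumption: ratio rounded up, giving 4

def lr_at(step, base_lr=2e-5, warmup=warmup_steps, total=steps_per_epoch):
    """Linear warmup from 0, then half-cosine decay to 0 at the last step."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 2, warmup_steps, 16, steps_per_epoch):
    print(step, f"{lr_at(step):.2e}")
```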

All Hyperparameters

  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 8
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
-1 -1 - 0.8142 0.8058 0.7676 0.7053
0.032 1 1.5764 0.8146 0.8055 0.7669 0.7049
0.064 2 2.6620 0.8162 0.8077 0.7690 0.7086
0.096 3 1.9032 0.8204 0.8126 0.7759 0.7173
0.128 4 1.6601 0.8252 0.8177 0.7849 0.7282
0.16 5 1.1083 0.8315 0.8251 0.7902 0.7419
0.192 6 2.7345 0.8361 0.8317 0.7970 0.7510
0.224 7 1.2922 0.8375 0.8351 0.8025 0.7620
0.256 8 1.6647 0.8399 0.8367 0.8080 0.7686
0.288 9 1.1997 0.8425 0.8398 0.8133 0.7754
0.32 10 0.8064 0.8441 0.8419 0.8181 0.7799
0.352 11 1.1935 0.8468 0.8442 0.8220 0.7843
0.384 12 0.7776 0.8482 0.8462 0.8242 0.7886
0.416 13 0.9272 0.8494 0.8484 0.8261 0.7940
0.448 14 1.2406 0.8510 0.8502 0.8294 0.7978
0.48 15 1.0830 0.8520 0.8518 0.8325 0.7999
0.512 16 1.9336 0.8534 0.8532 0.8340 0.8017
0.544 17 1.2190 0.8541 0.8537 0.8360 0.8026
0.576 18 1.7060 0.8554 0.8545 0.8388 0.8063
0.608 19 1.4131 0.8571 0.8561 0.8412 0.8084
0.64 20 1.1700 0.8581 0.8569 0.8429 0.8101
0.672 21 0.5671 0.8599 0.8580 0.8445 0.8118
0.704 22 1.4699 0.8613 0.8596 0.8455 0.8140
0.736 23 1.6544 0.8620 0.8608 0.8463 0.8158
0.768 24 2.0854 0.8624 0.8614 0.8476 0.8169
0.8 25 0.9175 0.8630 0.8616 0.8484 0.8180
0.832 26 1.3673 0.8632 0.8615 0.8485 0.8182
0.864 27 1.2114 0.8637 0.8617 0.8491 0.8190
0.896 28 0.9807 0.8637 0.8620 0.8497 0.8190
0.928 29 0.9052 0.8635 0.8620 0.8497 0.8192
0.96 30 1.7420 0.8640 0.8624 0.8500 0.8194
0.992 31 1.3071 0.8640 0.8622 0.8497 0.8193
1.0 32 1.3117 0.8638 0.8624 0.8500 0.8196

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.3
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}