--- tags: - sentence-transformers - sentence-similarity - feature-extraction - dense - generated_from_trainer - dataset_size:2000 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss base_model: sentence-transformers/all-mpnet-base-v2 widget: - source_sentence: 'What methods have been attempted to improve resin bond strength to irradiated dentin? ' sentences: - Patients with BHD syndrome may have concerns about communicating genetic risk to their family members, especially if their family has different communication patterns or cultural norms. Some patients may find it difficult to share information about an inherited, potentially lethal disorder with their family members. It is observed that families in which affected members have experienced significant morbidity are more likely to pursue genetic testing and surveillance. However, this phenomenon has not been systematically studied in the BHD population. Patients may also worry that their family members are not motivated to pursue genetic testing and surveillance. In these situations, patients can share medical papers and handouts with their family members and inform them about the process to obtain genetic testing. Additionally, patients can encourage their family members to attend scientific meetings and connect with other BHD families through resources like the Myrovlytis website. Cancer Genetic Counselors (CGC) and/or Advanced Practice Nurses in Genetics (APNG) can also provide support and guidance to patients and their families in coping with the psychosocial ramifications of BHD. - Psychological stress has been found to have a significant impact on medical illness, including ocular disease. While vision researchers have not fully embraced the approach of psychoneuroimmunology in addressing ocular disease, it is clear that no organ system is protected from the effects of negative emotional states. 
Stress is more prevalent among the elderly, and conditions such as retirement, chronic illness, loss of loved ones, and caregiver's stress can induce chronic debilitating stress. Ophthalmologists should prioritize time with patients to establish a compassionate rapport and address emotional factors that may contribute to ocular conditions. Failure to do so compromises the individual's opportunity for healing. - Many researchers have attempted to improve resin bond strength to irradiated dentin by removing the denatured layer mechanically and chemically. However, efficient methods for clinical application have not yet been established. The reduction of dentin bonding strength is believed to be due to the denatured layer of dentin surface, which has led to the exploration of various techniques to remove or mitigate its effects. - source_sentence: 'What are the clinical features of peripheral ossifying fibroma? ' sentences: - The management of intracranial hemorrhage after thrombolysis is still uncertain. It is unclear whether patients with severe intracranial hemorrhage soon after thrombolytic therapy should receive only supportive medical care or should be aggressively managed with treatment of increased intracranial pressure, ventriculostomy, or neurosurgical evacuation. The use of clinical decision-making aids, such as Figure 1, may assist clinicians in making empirical decisions for these patients. - When the diagnosis of HIT is confirmed, therapeutic doses of alternative non-heparin anticoagulants are usually required. Heparin treatments must be stopped immediately, including heparin-bonded catheters and heparin flushes. Patients should be given a non-heparin anticoagulant such as direct thrombin inhibitors like Bivalirudin, Argatroban, or Lepirudin. These inhibitors directly inhibit the actions of thrombin and do not require a cofactor. They are active against both free and clot-bound thrombin and do not interact with or produce heparin-dependent antibodies. 
- Histopathological evaluation of biopsy specimens of peripheral ossifying fibroma typically reveals intact or ulcerated stratified squamous surface epithelium, potentially mature mineralized material, epithelial proliferation, benign fibrous connective tissue with varying fibroblast content, myofibroblasts and collagen, lamellar or woven osteoid, and cement-like material or dystrophic calcifications. The presence of acute and chronic inflammatory cells may also be observed. - source_sentence: 'What are the common clinical features and diagnostic criteria of relapsing polychondritis? ' sentences: - Lethal complications of relapsing polychondritis are often associated with airway or cardiovascular involvement. This can include complications such as aortic incompetence, mitral regurgitation, pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis of the central nervous system, phlebitis, and Raynaud's phenomenon. Neurological and renal system involvement can also occur, although it is rare. Regular follow-up and management are important to monitor and prevent potential complications in patients with relapsing polychondritis. - Media focus can contribute to the risk of burnout in managers. Burnout is a prolonged response to chronic emotional and interpersonal stressors at work. The pressure and scrutiny from the media can lead to feelings of exhaustion, cynicism, and inefficacy, which are the three dimensions of burnout. Managers may respond to increased pressure by becoming avoidant, narrow-minded, and hard on themselves, their subordinates, and their families. They may also try to establish emotional and cognitive distance from the pressuring situation. Ultimately, the exposure to negative media focus with elements of personification can increase the risk of burnout in some managers. - Intrathymic injection of MBP has potential applications in various medical treatments. 
It can be used in surgical brain injuries caused by cutting, electric coagulation, suction, and traction to alleviate the secondary attack to the brain tissue and reduce the auto-inflammation process triggered by the exposure of autoantigens. It may also be beneficial for elective surgeries, such as intracranial tumor operations, to induce immune tolerance and alleviate auto-inflammation. With the development of minimally invasive operation techniques, intrathymic injection without exposing the thorax can become a simple, efficient, and safe procedure. Further studies are needed to investigate the potential applications of intrathymic injection of MBP in vivo. - source_sentence: 'What are some potential mechanisms by which quercetin may protect against cancer? ' sentences: - There is a significant correlation between serum B2M levels and some biochemical parameters, such as ALK, bilirubin, and INR, in patients with liver disease. However, no significant correlation has been found between serum B2M levels and viral load among patients with liver disease. - When the diagnosis of HIT is confirmed, therapeutic doses of alternative non-heparin anticoagulants are usually required. Heparin treatments must be stopped immediately, including heparin-bonded catheters and heparin flushes. Patients should be given a non-heparin anticoagulant such as direct thrombin inhibitors like Bivalirudin, Argatroban, or Lepirudin. These inhibitors directly inhibit the actions of thrombin and do not require a cofactor. They are active against both free and clot-bound thrombin and do not interact with or produce heparin-dependent antibodies. - Silymarin and Ginkgo biloba extract have been found to possess hepatoprotective effects against NDEA-induced hepatocarcinogenesis. These extracts can scavenge free radicals, prevent hepatocellular damage, and suppress the leakage of enzymes through plasma membranes. 
They may also modify the biotransformation/detoxification of NDEA, reducing its liver toxicity. Additionally, silymarin can reduce intracellular ROS levels, prevent oxidative stress-induced cellular damage, and stimulate hepatic cell proliferation for liver regeneration. These effects make silymarin and Ginkgo biloba extract strong candidates as chemopreventive agents for liver cancer. - source_sentence: 'What are the molecular mechanisms involved in the synergistic induction of SAA by IL-1, TNF-α, and IL-6? ' sentences: - The complex formation of STAT3, NF-κB p65, and p300 is involved in the transcriptional activity of the SAA1 gene. STAT3 and p300 are recruited to the SAA1 promoter region in response to IL-6 or IL-1β + IL-6 stimulation. Co-expression of wild type p300 with wild type STAT3 enhances the luciferase activity of the SAA1 gene in a dose-dependent manner. This suggests that the heteromeric complex formation of STAT3, NF-κB p65, and p300 contributes to the transcriptional activity of the SAA1 gene. - Intrathymic injection of MBP has potential applications in various medical treatments. It can be used in surgical brain injuries caused by cutting, electric coagulation, suction, and traction to alleviate the secondary attack to the brain tissue and reduce the auto-inflammation process triggered by the exposure of autoantigens. It may also be beneficial for elective surgeries, such as intracranial tumor operations, to induce immune tolerance and alleviate auto-inflammation. With the development of minimally invasive operation techniques, intrathymic injection without exposing the thorax can become a simple, efficient, and safe procedure. Further studies are needed to investigate the potential applications of intrathymic injection of MBP in vivo. - Phenotypic screens of approved drug collections and synergistic combinations can be a useful approach for rapid identification of new therapeutics for drug-resistant bacteria. 
This approach can also be applied to emerging outbreaks of infectious diseases where vaccines and therapeutic agents are unavailable or unrealistic to develop in a short period of time. By screening existing drugs and combinations, new therapeutics can be identified and potentially repurposed for the treatment of drug-resistant infections. pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 model-index: - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2 results: - task: type: information-retrieval name: Information Retrieval dataset: name: dim 768 type: dim_768 metrics: - type: cosine_accuracy@1 value: 0.7775 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.8885 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.917 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.947 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7775 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.29616666666666663 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.18340000000000004 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09470000000000002 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7775 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.8885 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.917 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.947 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8637977392462012 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8369255952380947 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8394380047776188 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval 
dataset: name: dim 512 type: dim_512 metrics: - type: cosine_accuracy@1 value: 0.7785 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.8825 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.917 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.944 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7785 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.29416666666666663 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.18340000000000004 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09440000000000003 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7785 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.8825 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.917 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.944 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8623716893141778 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8360055555555553 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8388749447751291 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy@1 value: 0.7555 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.8655 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.9145 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.943 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.7555 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.2884999999999999 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.18290000000000003 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09430000000000001 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.7555 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.8655 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.9145 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.943 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 
0.8499528413626729 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.8199301587301584 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.8224780775804242 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy@1 value: 0.714 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.8365 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.877 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.9285 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.714 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.27883333333333327 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.1754 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09285 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.714 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.8365 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.877 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.9285 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.8195584918161248 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.7848236111111104 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.7878148778237813 name: Cosine Map@100
---

# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
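
The card's metadata lists `MatryoshkaLoss`, and the evaluation reports results at 768, 512, 128 and 64 dimensions, so the 768-dimensional vectors can presumably be shortened to a leading prefix at some cost in retrieval quality. The following is a minimal sketch of that idea in plain Python. It assumes truncation keeps the first `dim` components and re-normalizes them. The helper names `truncate_and_normalize` and `cosine` are hypothetical, and the 8-dimensional vectors are made-up toy values, not outputs of this model.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (assumption: real vectors have 768 entries).
query = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2]
doc = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, -0.2, -0.1]

# Compare the similarity at the full size and at two shorter prefixes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

In Sentence Transformers itself, the `truncate_dim` argument of `SentenceTransformer` is the usual way to request shorter embeddings. Whether 64 or 128 dimensions are adequate depends on the application, and the per-dimension metrics in the Evaluation section are the place to check.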
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference
sentences = [
    'What are the molecular mechanisms involved in the synergistic induction of SAA by IL-1, TNF-α, and IL-6?\n',
    'The complex formation of STAT3, NF-κB p65, and p300 is involved in the transcriptional activity of the SAA1 gene. STAT3 and p300 are recruited to the SAA1 promoter region in response to IL-6 or IL-1β + IL-6 stimulation. Co-expression of wild type p300 with wild type STAT3 enhances the luciferase activity of the SAA1 gene in a dose-dependent manner. This suggests that the heteromeric complex formation of STAT3, NF-κB p65, and p300 contributes to the transcriptional activity of the SAA1 gene.',
    'Phenotypic screens of approved drug collections and synergistic combinations can be a useful approach for rapid identification of new therapeutics for drug-resistant bacteria. This approach can also be applied to emerging outbreaks of infectious diseases where vaccines and therapeutic agents are unavailable or unrealistic to develop in a short period of time. By screening existing drugs and combinations, new therapeutics can be identified and potentially repurposed for the treatment of drug-resistant infections.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7925, 0.1356],
#         [0.7925, 1.0000, 0.1694],
#         [0.1356, 0.1694, 1.0000]])
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 768
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7775     |
| cosine_accuracy@3   | 0.8885     |
| cosine_accuracy@5   | 0.917      |
| cosine_accuracy@10  | 0.947      |
| cosine_precision@1  | 0.7775     |
| cosine_precision@3  | 0.2962     |
| cosine_precision@5  | 0.1834     |
| cosine_precision@10 | 0.0947     |
| cosine_recall@1     | 0.7775     |
| cosine_recall@3     | 0.8885     |
| cosine_recall@5     | 0.917      |
| cosine_recall@10    | 0.947      |
| **cosine_ndcg@10**  | **0.8638** |
| cosine_mrr@10       | 0.8369     |
| cosine_map@100      | 0.8394     |

#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 512
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7785     |
| cosine_accuracy@3   | 0.8825     |
| cosine_accuracy@5   | 0.917      |
| cosine_accuracy@10  | 0.944      |
| cosine_precision@1  | 0.7785     |
| cosine_precision@3  | 0.2942     |
| cosine_precision@5  | 0.1834     |
| cosine_precision@10 | 0.0944     |
| cosine_recall@1     | 0.7785     |
| cosine_recall@3     | 0.8825     |
| cosine_recall@5     | 0.917      |
| cosine_recall@10    | 0.944      |
| **cosine_ndcg@10**  | **0.8624** |
| cosine_mrr@10       | 0.836      |
| cosine_map@100      | 0.8389     |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 128
  }
  ```

| Metric              | Value    |
|:--------------------|:---------|
| cosine_accuracy@1   | 0.7555   |
| cosine_accuracy@3   | 0.8655   |
| cosine_accuracy@5   | 0.9145   |
| cosine_accuracy@10  | 0.943    |
| cosine_precision@1  | 0.7555   |
| cosine_precision@3  | 0.2885   |
| cosine_precision@5  | 0.1829   |
| cosine_precision@10 | 0.0943   |
| cosine_recall@1     | 0.7555   |
| cosine_recall@3     | 0.8655   |
| cosine_recall@5     | 0.9145   |
| cosine_recall@10    | 0.943    |
| **cosine_ndcg@10**  | **0.85** |
| cosine_mrr@10       | 0.8199   |
| cosine_map@100      | 0.8225   |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 64
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.714      |
| cosine_accuracy@3   | 0.8365     |
| cosine_accuracy@5   | 0.877      |
| cosine_accuracy@10  | 0.9285     |
| cosine_precision@1  | 0.714      |
| cosine_precision@3  | 0.2788     |
| cosine_precision@5  | 0.1754     |
| cosine_precision@10 | 0.0929     |
| cosine_recall@1     | 0.714      |
| cosine_recall@3     | 0.8365     |
| cosine_recall@5     | 0.877      |
| cosine_recall@10    | 0.9285     |
| **cosine_ndcg@10**  | **0.8196** |
| cosine_mrr@10       | 0.7848     |
| cosine_map@100      | 0.7878     |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 2,000 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:

|         | anchor | positive |
|:--------|:-------|:---------|
| type    | string | string   |
| details |        |          |

* Samples:

| anchor | positive |
|:-------|:---------|
| What are the common clinical features and diagnostic criteria of relapsing polychondritis? | Lethal complications of relapsing polychondritis are often associated with airway or cardiovascular involvement. This can include complications such as aortic incompetence, mitral regurgitation, pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis of the central nervous system, phlebitis, and Raynaud's phenomenon. Neurological and renal system involvement can also occur, although it is rare. Regular follow-up and management are important to monitor and prevent potential complications in patients with relapsing polychondritis. |
| What are the treatment options for relapsing polychondritis? | Lethal complications of relapsing polychondritis are often associated with airway or cardiovascular involvement. This can include complications such as aortic incompetence, mitral regurgitation, pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis of the central nervous system, phlebitis, and Raynaud's phenomenon. Neurological and renal system involvement can also occur, although it is rare. Regular follow-up and management are important to monitor and prevent potential complications in patients with relapsing polychondritis. |
| What are the potential complications associated with relapsing polychondritis? | Lethal complications of relapsing polychondritis are often associated with airway or cardiovascular involvement. This can include complications such as aortic incompetence, mitral regurgitation, pericarditis, cardiac ischemia, aneurysms of large arteries, vasculitis of the central nervous system, phlebitis, and Raynaud's phenomenon. Neurological and renal system involvement can also occur, although it is rare. Regular follow-up and management are important to monitor and prevent potential complications in patients with relapsing polychondritis. |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 4
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `warmup_steps`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 8
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.1
- `warmup_steps`: 0.1
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `enable_jit_checkpoint`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `use_cpu`: False
- `seed`: 42
- `data_seed`: None
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: -1
- `ddp_backend`: None
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `auto_find_batch_size`: False
- `full_determinism`: False
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `use_cache`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}
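
The batch-size settings above determine how many optimizer steps one epoch takes. The arithmetic can be sketched as follows, assuming single-device training (the card does not state the device count) and assuming that a final, partially filled accumulation group still counts as one step. The variable names are illustrative only.

```python
import math

train_samples = 2000    # "Size: 2,000 training samples"
per_device_batch = 16   # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
num_devices = 1         # assumption: not stated in the card

# Samples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices

# Forward/backward passes per epoch (the last batch may be partial,
# since dataloader_drop_last is False).
micro_batches = math.ceil(train_samples / (per_device_batch * num_devices))

# Optimizer steps per epoch, counting a trailing partial group as a step.
optimizer_steps = math.ceil(micro_batches / grad_accum)

# Fraction of an epoch covered by one full optimizer step.
epoch_per_step = effective_batch / train_samples

print(effective_batch, micro_batches, optimizer_steps, epoch_per_step)
```

Under these assumptions the effective batch size is 64, one full step covers 0.032 of an epoch, and the epoch finishes at step 32, which agrees with the step and epoch columns in the Training Logs.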
### Training Logs

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-----:|:----:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| -1    | -1   | -             | 0.8142                 | 0.8058                 | 0.7676                 | 0.7053                |
| 0.032 | 1    | 1.5764        | 0.8146                 | 0.8055                 | 0.7669                 | 0.7049                |
| 0.064 | 2    | 2.6620        | 0.8162                 | 0.8077                 | 0.7690                 | 0.7086                |
| 0.096 | 3    | 1.9032        | 0.8204                 | 0.8126                 | 0.7759                 | 0.7173                |
| 0.128 | 4    | 1.6601        | 0.8252                 | 0.8177                 | 0.7849                 | 0.7282                |
| 0.16  | 5    | 1.1083        | 0.8315                 | 0.8251                 | 0.7902                 | 0.7419                |
| 0.192 | 6    | 2.7345        | 0.8361                 | 0.8317                 | 0.7970                 | 0.7510                |
| 0.224 | 7    | 1.2922        | 0.8375                 | 0.8351                 | 0.8025                 | 0.7620                |
| 0.256 | 8    | 1.6647        | 0.8399                 | 0.8367                 | 0.8080                 | 0.7686                |
| 0.288 | 9    | 1.1997        | 0.8425                 | 0.8398                 | 0.8133                 | 0.7754                |
| 0.32  | 10   | 0.8064        | 0.8441                 | 0.8419                 | 0.8181                 | 0.7799                |
| 0.352 | 11   | 1.1935        | 0.8468                 | 0.8442                 | 0.8220                 | 0.7843                |
| 0.384 | 12   | 0.7776        | 0.8482                 | 0.8462                 | 0.8242                 | 0.7886                |
| 0.416 | 13   | 0.9272        | 0.8494                 | 0.8484                 | 0.8261                 | 0.7940                |
| 0.448 | 14   | 1.2406        | 0.8510                 | 0.8502                 | 0.8294                 | 0.7978                |
| 0.48  | 15   | 1.0830        | 0.8520                 | 0.8518                 | 0.8325                 | 0.7999                |
| 0.512 | 16   | 1.9336        | 0.8534                 | 0.8532                 | 0.8340                 | 0.8017                |
| 0.544 | 17   | 1.2190        | 0.8541                 | 0.8537                 | 0.8360                 | 0.8026                |
| 0.576 | 18   | 1.7060        | 0.8554                 | 0.8545                 | 0.8388                 | 0.8063                |
| 0.608 | 19   | 1.4131        | 0.8571                 | 0.8561                 | 0.8412                 | 0.8084                |
| 0.64  | 20   | 1.1700        | 0.8581                 | 0.8569                 | 0.8429                 | 0.8101                |
| 0.672 | 21   | 0.5671        | 0.8599                 | 0.8580                 | 0.8445                 | 0.8118                |
| 0.704 | 22   | 1.4699        | 0.8613                 | 0.8596                 | 0.8455                 | 0.8140                |
| 0.736 | 23   | 1.6544        | 0.8620                 | 0.8608                 | 0.8463                 | 0.8158                |
| 0.768 | 24   | 2.0854        | 0.8624                 | 0.8614                 | 0.8476                 | 0.8169                |
| 0.8   | 25   | 0.9175        | 0.8630                 | 0.8616                 | 0.8484                 | 0.8180                |
| 0.832 | 26   | 1.3673        | 0.8632                 | 0.8615                 | 0.8485                 | 0.8182                |
| 0.864 | 27   | 1.2114        | 0.8637                 | 0.8617                 | 0.8491                 | 0.8190                |
| 0.896 | 28   | 0.9807        | 0.8637                 | 0.8620                 | 0.8497                 | 0.8190                |
| 0.928 | 29   | 0.9052        | 0.8635                 | 0.8620                 | 0.8497                 | 0.8192                |
| 0.96  | 30   | 1.7420        | 0.8640                 | 0.8624                 | 0.8500                 | 0.8194                |
| 0.992 | 31   | 1.3071        | 0.8640                 | 0.8622                 | 0.8497                 | 0.8193                |
| 1.0   | 32   | 1.3117        | 0.8638                 | 0.8624                 | 0.8500                 | 0.8196                |

### Framework Versions

- Python: 3.12.12
- Sentence Transformers: 5.2.3
- Transformers: 5.0.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.12.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```