SentenceTransformer

This is a sentence-transformers model trained on the mnri_dataset and contrastive_dataset datasets. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 2500 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Datasets:
    • mnri_dataset
    • contrastive_dataset

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2500, 'do_lower_case': False, 'architecture': 'Qwen3Model'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
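
The last two stages of the architecture above (last-token pooling followed by normalization) can be illustrated on toy arrays. This is a minimal NumPy sketch, not the library's implementation: the helper names `last_token_pool` and `l2_normalize` are hypothetical, the hidden size is 4 instead of 1024, and right padding is assumed.

```python
import numpy as np

def last_token_pool(token_embeddings, attention_mask):
    """Select the embedding of the last non-padding token of each sequence.

    Assumes right padding, i.e. the attention mask looks like 1, 1, ..., 1, 0, ..., 0.
    """
    last_idx = attention_mask.sum(axis=1) - 1
    batch_idx = np.arange(token_embeddings.shape[0])
    return token_embeddings[batch_idx, last_idx]

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

# Toy batch: 2 sequences, 3 token positions, 4 hidden dimensions.
tokens = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])  # the second sequence ends with one padding token

pooled = last_token_pool(tokens, mask)
embeddings = l2_normalize(pooled)
print(pooled)
print(np.linalg.norm(embeddings, axis=1))
```

Because the final stage produces unit-length vectors, the cosine similarity between two embeddings reduces to a plain dot product.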

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
queries = [
    # One long query, split into adjacent string literals that Python concatenates.
    "Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age: 57\nSex: Male\nCancer type: Gastric adenocarcinoma (HER2\u2011amplified, intestinal type)\nHistology: Moderately differentiated intestinal\u2011type adenocarcinoma\nCurrent extent: Metastatic disease with hepatic metastases (dominant lesion ~4\u202fcm in segment\u202fVII, multiple smaller lesions) and diffuse peritoneal carcinomatosis; bulky omental implants and moderate ascites; persistent circumferential thickening of the gastric cardia (~2.5\u202fcm). No longer receiving disease\u2011directed therapy; transitioned to best\u2011supportive hospice care (ECOG\u202f\u2248\u202f4).\nBiomarkers: HER2 IHC\u202f3+, HER2 (ERBB2) amplification confirmed by FISH (ratio\u202f5.2); Microsatellite stable (MSS); KRAS G12V (pathogenic); TP53 R273C (pathogenic); CDH1 G274E (likely pathogenic); PIK3CA E542K (activating); MET amplification (Copy number\u202f\u223c6); FGFR2 amplification (Copy number\u202f\u223c5); CDKN2A homozygous deletion; Tumor mutational burden\u202f\u2248\u202f8\u202fMut/Mb; Ki\u201167 \u223c45%; CK7\u207a/CK20\u207b; retained MLH1,PMS2,MSH2,MSH6.\nTreatment history:\n# 1/2018\u2011mid\u20112018: Front\u2011line trastuzumab (loading 8\u202fmg/kg then 6\u202fmg/kg q21\u202fd)\u202f+\u202fcisplatin 80\u202fmg/m\u00b2 iv day\u202f1 q21\u202fd\u202f+\u202fcontinuous infusional 5\u2011fu 1000\u202fmg/m\u00b2 days\u202f1\u20114 q21\u202fd (four cycles). "
    "Best response: Partial response (shrinkage of gastric wall thickening and hepatic lesions).\n# Late\u202f2018 (cycles\u202f5\u20116): Continuation of the same triplet regimen to complete six cycles; maintained partial response.\n# 12/2018: Restaging CT confirmed ongoing partial response.\n# Early\u202f2019: Planned radical gastrectomy aborted intra\u2011operatively due to diffuse peritoneal disease; feeding jejunostomy placed.\n# 5/2019: Elective laparoscopic right hemicolectomy for synchronous ascending colon adenocarcinoma (pT1a\u202fN0\u202fM0, R0). No adjuvant therapy required.\n# 6/2019 onward: Second\u2011line ramucirumab 10\u202fmg/kg i.v q2\u2009wks\u202f+\u202fweekly paclitaxel 80\u202fmg/m\u00b2 (initiated after progression on first\u2011line). Delivered two cycles; development of grade\u202f2 peripheral sensory neuropathy led to dose reduction of paclitaxel to 70\u202fmg/m\u00b2 and eventual cessation after disease progression.\n# 10/2019: Further disease progression evidenced by growing hepatic metastases and new peritoneal implants; cardiology consulted after NSTEMI; recommendation to cease cytotoxic and anti\u2011angiogenic agents.\n# 1/2020: Evaluated for experimental HER2/PD\u2011L1 bispecific antibody trial; found ineligible due to therapeutic anticoagulation (apixaban) begun shortly after NSTEMI.\n# 4/2020: Transitioned to hospice/best\u2011supportive care; all systemic anticancer agents (incl. trastuzumab) discontinued. "
    "Focus shifted to symptom control (opioids, anti\u2011emetics, nutritional support via jejunostomy) and advance\u2011directive implementation.\nCancer type: Ascending colon adenocarcinoma\nHistology: Well\u2011to\u2011moderately differentiated tubular adenocarcinoma, grade\u202f1\nCurrent extent: Resected, pathologic stage\u202fIA (pT1a\u202fN0\u202fM0), R0 margins; no evidence of disease on surveillance; no further therapy required.\nBiomarkers: KRAS wild\u2011type; BRAF V600E negative; MSI stable; CK20\u207a, CDX2\u207a.\nTreatment history:\n# 5/2019: Laparoscopic right hemicolectomy with regional node sampling (0/15 positive); postoperative course uncomplicated.",
]
documents = [
    'Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 18-75 years. Sex allowed: Male and Female. Cancer type allowed: Any solid malignant tumor. Histology allowed: NA. Cancer burden allowed: Advanced unresectable or metastatic disease where standard therapies have failed, are intolerable, or ineffective. Prior treatment required: Progression after standard anticancer therapy. Prior treatment excluded: Any prior gene or cell therapy product or any prior therapy directly targeting KRAS G12V mutation (e.g., KRAS\u202fG12V‐specific small‑molecule inhibitor or cellular therapy). Biomarkers required: KRAS G12V mutation and HLA‑A*11:01 positivity (both assessed during screening). Biomarkers excluded: NA.',
    'Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: NA. Sex allowed: Male and Female. Cancer type allowed: Non‑small cell lung cancer. Histology allowed: Adenocarcinoma, large\u202fcell, neuroendocrine or any non‑squamous histology (predominantly squamous histology excluded). Cancer burden allowed: Unresectable, locally advanced, or metastatic disease. Prior treatment required: Disease refractory to or progressive after all standard‐of‑care therapies demonstrating clinical benefit, unless no applicable standard therapy exists or the patient chooses to decline. Prior treatment excluded: Receipt of concurrent systemic anticancer therapy other than protocol‑permitted localized palliative radiation or hormone ablative therapy. Biomarkers required: NA. Biomarkers excluded: NA.',
    'Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 18-75 years. Sex allowed: Male and Female. Cancer type allowed: Any solid malignant tumor. Histology allowed: NA. Cancer burden allowed: Locally advanced or metastatic disease. Prior treatment required: NA. Prior treatment excluded: NA. Biomarkers required: NECTIN4 gene amplification positive. Biomarkers excluded: NA.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 1024) (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.4265, 0.1800, 0.3230]])
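
For retrieval, the similarity row printed above is usually turned into a ranking of the candidate trials. The sketch below does this in plain Python using the three scores from the example output; the helper name `rank_documents` is hypothetical and not part of the Sentence Transformers API.

```python
def rank_documents(scores):
    """Return (document_index, score) pairs sorted from most to least similar."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Scores copied from the example output above (one query against three documents).
example_scores = [0.4265, 0.1800, 0.3230]
ranking = rank_documents(example_scores)
for index, score in ranking:
    print(f"document {index}: {score:.4f}")
```

With a tensor returned by `model.similarity`, the row for the first query could be passed as `similarities[0].tolist()`.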

Training Details

Training Datasets

mnri_dataset

  • Dataset: mnri_dataset
  • Size: 913,338 training samples
  • Columns: patient_summary_trunc and this_space_trunc
  • Approximate statistics based on the first 1000 samples:
    • patient_summary_trunc (string): min 264 tokens, mean 805.57 tokens, max 1600 tokens
    • this_space_trunc (string): min 112 tokens, mean 170.21 tokens, max 388 tokens
  • Samples:
    Each sample lists patient_summary_trunc first, then this_space_trunc:
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age: 15
    Sex: Female
    Cancer type: B‑cell acute lymphoblastic leukemia (precursor B‑cell ALL)
    Histology: Precursor B‑lymphoblasts (flow: CD19⁺, CD22⁺, CD10⁺, CD34⁺, TdT⁺; CD20−, MPO−, surface IgM−)
    Current extent: Systemic disease with diffuse bone‑marrow involvement; now in partial remission after induction (marrow blasts ≈15 %, MRD low‑level positive ∼0.02 %)
    Biomarkers: TCF3‑PBX1 fusion (t(1;19)), hyperdiploid karyotype (extra chromosomes 4, 10, 17), PAX5 truncating mutation (p.R38*), CDKN2A/CDKN2B homozygous deletion, NRAS activating mutation (p.Q61K), low‑frequency CREBBP p.R1746H (subclonal), negative for BCR‑ABL1 and ETV6‑RUNX1; immunophenotype as above
    Treatment history:
    # 08/2017 – 09/2017 (Day 1‑28 of induction): Multidrug induction per COG AALL1131 – vincristine ...
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 0 through 45 years. Sex allowed: both sexes. Cancer type allowed: acute lymphoblastic leukaemia. Histology allowed: B‑cell precursor acute lymphoblastic leukaemia. Cancer burden allowed: newly diagnosed, never relapsed, de novo disease. Prior treatment required: NA. Prior treatment excluded: systemic corticosteroids ≥10 mg/m²/day prednisone equivalents for longer than one week before diagnosis; any chemotherapeutic agent administered within four weeks before diagnosis. Biomarkers required: surface immunoglobulin negative phenotype; IG::MYC rearrangement accepted only when BCL2 and BCL6 rearrangements are absent. Biomarkers excluded: KMT2A‑rearranged B‑cell precursor ALL in patients younger than 1 year; Philadelphia chromosome‑positive (t[9;22]/BCR‑ABL) ...
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age: 53
    Sex: Male
    Cancer type: Lung cancer
    Histology: Poorly differentiated squamous cell carcinoma (left lower lobe)
    Current extent: Metastatic disease – primary left lower‑lobar lesion ~3.5 cm, new left adrenal metastasis (1.8 cm), right hilar lymph node enlargement; overall progressive disease per RECIST 1.1
    Biomarkers: PD‑L1 tumor proportion score 10 %; HER2 (ERBB2) amplification (≈9 copies); TP53 missense mutation p.R273C (likely pathogenic); CDKN2A homozygous deletion; FGFR1 amplification (≈7 copies); PIK3CA H1047R activating mutation; NOTCH1 L1575P variant of uncertain significance; KRAS wild‑type; EGFR wild‑type
    Treatment history:
    # 1/Jan 2017: Diagnostic bronchoscopic forceps biopsies of left lower lobar endobronchial lesion → confirmation of poorly differentiate...
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 18-75 years. Sex allowed: Male and Female. Cancer type allowed: Advanced solid tumor. Histology allowed: Any. Cancer burden allowed: Locally advanced or metastatic disease refractory to standard therapy or lacking effective treatment. Prior treatment required: Progressive disease after standard systemic therapy. Prior treatment excluded: Prior receipt of targeted ROR1 inhibitor therapy. Biomarkers required: ROR1 positive (assessment planned during screening). Biomarkers excluded: NA.
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age: 9
    Sex: Male
    Cancer type: B‑myeloid Mixed Phenotype Acute Leukemia (MPAL)
    Histology: Biphasic B‑cell/myeloid acute leukemia
    Current extent: Post‑second allogeneic HSCT, in remission; flow MRD < 0.05 % (0.02 %), low‑level FLT3‑ITD persistence (allelic ratio ∼0.38)
    Biomarkers: FLT3‑ITD (persisting, AR ∼ 0.38); WT1 truncating mutation p.Arg430*; NRAS p.Gly12Asp; DNMT3A p.Arg882His; CDKN2A/B homozygous deletion (9p21); IDH2 p.Arg140Gln (VAF ≈ 21 %→7 % across samples); NUP98‑NSD1 fusion (detected at diagnosis, subsequently undetectable); Additional routine panels negative for KMT2A rearrangement, BCR‑ABL1, other actionable hits.
    Treatment history:
    # 1/2017‑5/2017: FLAG‑IDA induction (Fludarabine, Cytarabine, G‑CSF, Idarubicin) + intrathecal Methotrexate; best response comp...
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 1 Year to 39 Years. Sex allowed: Male and Female. Cancer type allowed: Mixed phenotype acute leukemia, B‑myeloid. Histology allowed: B‑myeloid mixed phenotype acute leukemia. Cancer burden allowed: Relapsed or refractory disease. Prior treatment required: Full recovery from prior hematopoietic stem cell transplantation or anthracycline exposure. Prior treatment excluded: Current administration of anticancer agents (except intrathecal agents or hydroxyurea). Biomarkers required: NA. Biomarkers excluded: KMT2A rearrangement, Philadelphia chromosome/BCR‑ABL1 fusion.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
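
The parameters above (scale 20.0, cosine similarity) can be read as a cross-entropy over in-batch candidates: each patient summary should score highest against its own paired trial text, and every other trial text in the batch acts as a negative. The following NumPy sketch illustrates that idea under those assumptions. It is not the library's implementation, and `cos_sim` and `mnr_loss` are hypothetical helper names.

```python
import numpy as np

def cos_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where the correct 'class' for anchor i is positive i."""
    logits = cos_sim(anchors, positives) * scale           # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))             # toy embeddings, 8 dimensions
aligned = mnr_loss(anchors, anchors)          # every anchor paired with a perfect match
shuffled = mnr_loss(anchors, anchors[::-1])   # pairs deliberately mismatched
print(aligned, shuffled)
```

The aligned batch yields the lower loss, which is the behaviour the training objective rewards.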
    

contrastive_dataset

  • Dataset: contrastive_dataset
  • Size: 1,526,022 training samples
  • Columns: patient_summary_trunc, this_space_trunc, and label
  • Approximate statistics based on the first 1000 samples:
    • patient_summary_trunc (string): min 264 tokens, mean 840.28 tokens, max 1600 tokens
    • this_space_trunc (string): min 121 tokens, mean 185.75 tokens, max 388 tokens
    • label (float): min -1.0, mean -0.4, max 1.0
  • Samples:
    Each sample lists patient_summary_trunc first, then this_space_trunc, with the label at the end of the trial text:
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age: 15
    Sex: Female
    Cancer type: B‑cell acute lymphoblastic leukemia (precursor B‑cell ALL)
    Histology: Precursor B‑lymphoblasts (flow: CD19⁺, CD22⁺, CD10⁺, CD34⁺, TdT⁺; CD20−, MPO−, surface IgM−)
    Current extent: Systemic disease with diffuse bone‑marrow involvement; now in partial remission after induction (marrow blasts ≈15 %, MRD low‑level positive ∼0.02 %)
    Biomarkers: TCF3‑PBX1 fusion (t(1;19)), hyperdiploid karyotype (extra chromosomes 4, 10, 17), PAX5 truncating mutation (p.R38*), CDKN2A/CDKN2B homozygous deletion, NRAS activating mutation (p.Q61K), low‑frequency CREBBP p.R1746H (subclonal), negative for BCR‑ABL1 and ETV6‑RUNX1; immunophenotype as above
    Treatment history:
    # 08/2017 – 09/2017 (Day 1‑28 of induction): Multidrug induction per COG AALL1131 – vincristine ...
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 0 through 45 years. Sex allowed: both sexes. Cancer type allowed: acute lymphoblastic leukaemia. Histology allowed: B‑cell precursor acute lymphoblastic leukaemia. Cancer burden allowed: newly diagnosed, never relapsed, de novo disease. Prior treatment required: NA. Prior treatment excluded: systemic corticosteroids ≥10 mg/m²/day prednisone equivalents for longer than one week before diagnosis; any chemotherapeutic agent administered within four weeks before diagnosis. Biomarkers required: surface immunoglobulin negative phenotype; IG::MYC rearrangement accepted only when BCL2 and BCL6 rearrangements are absent. Biomarkers excluded: KMT2A‑rearranged B‑cell precursor ALL in patients younger than 1 year; Philadelphia chromosome‑positive (t[9;22]/BCR‑ABL) ... 0.6000000000000001
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age: 53
    Sex: Male
    Cancer type: Lung cancer
    Histology: Poorly differentiated squamous cell carcinoma (left lower lobe)
    Current extent: Metastatic disease – primary left lower‑lobar lesion ~3.5 cm, new left adrenal metastasis (1.8 cm), right hilar lymph node enlargement; overall progressive disease per RECIST 1.1
    Biomarkers: PD‑L1 tumor proportion score 10 %; HER2 (ERBB2) amplification (≈9 copies); TP53 missense mutation p.R273C (likely pathogenic); CDKN2A homozygous deletion; FGFR1 amplification (≈7 copies); PIK3CA H1047R activating mutation; NOTCH1 L1575P variant of uncertain significance; KRAS wild‑type; EGFR wild‑type
    Treatment history:
    # 1/Jan 2017: Diagnostic bronchoscopic forceps biopsies of left lower lobar endobronchial lesion → confirmation of poorly differentiate...
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 18-75 years. Sex allowed: Male and Female. Cancer type allowed: Advanced solid tumor. Histology allowed: Any. Cancer burden allowed: Locally advanced or metastatic disease refractory to standard therapy or lacking effective treatment. Prior treatment required: Progressive disease after standard systemic therapy. Prior treatment excluded: Prior receipt of targeted ROR1 inhibitor therapy. Biomarkers required: ROR1 positive (assessment planned during screening). Biomarkers excluded: NA. 0.19999999999999996
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age: 9
    Sex: Male
    Cancer type: B‑myeloid Mixed Phenotype Acute Leukemia (MPAL)
    Histology: Biphasic B‑cell/myeloid acute leukemia
    Current extent: Post‑second allogeneic HSCT, in remission; flow MRD < 0.05 % (0.02 %), low‑level FLT3‑ITD persistence (allelic ratio ∼0.38)
    Biomarkers: FLT3‑ITD (persisting, AR ∼ 0.38); WT1 truncating mutation p.Arg430*; NRAS p.Gly12Asp; DNMT3A p.Arg882His; CDKN2A/B homozygous deletion (9p21); IDH2 p.Arg140Gln (VAF ≈ 21 %→7 % across samples); NUP98‑NSD1 fusion (detected at diagnosis, subsequently undetectable); Additional routine panels negative for KMT2A rearrangement, BCR‑ABL1, other actionable hits.
    Treatment history:
    # 1/2017‑5/2017: FLAG‑IDA induction (Fludarabine, Cytarabine, G‑CSF, Idarubicin) + intrathecal Methotrexate; best response comp...
    Instruct: Given a cancer patient summary, retrieve clinical trial options that are reasonable for that patient; or, given a clinical trial option, retrieve cancer patients who are reasonable candidates for that trial. Age range allowed: 1 Year to 39 Years. Sex allowed: Male and Female. Cancer type allowed: Mixed phenotype acute leukemia, B‑myeloid. Histology allowed: B‑myeloid mixed phenotype acute leukemia. Cancer burden allowed: Relapsed or refractory disease. Prior treatment required: Full recovery from prior hematopoietic stem cell transplantation or anthracycline exposure. Prior treatment excluded: Current administration of anticancer agents (except intrathecal agents or hydroxyurea). Biomarkers required: NA. Biomarkers excluded: KMT2A rearrangement, Philadelphia chromosome/BCR‑ABL1 fusion. 0.19999999999999996
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
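
CoSENT is a ranking objective: for any two pairs in a batch, the pair with the higher label should also receive the higher cosine similarity. A minimal NumPy sketch of that idea follows, assuming the similarities have already been computed. `cosent_loss` is a hypothetical helper, not the library's code, and the toy labels simply reuse the -1.0 to 1.0 range reported in the statistics above.

```python
import numpy as np

def cosent_loss(sims, labels, scale=20.0):
    """log(1 + sum of exp(scale * (s_low - s_high))) over all label-ordered pairs."""
    sims = np.asarray(sims, dtype=np.float64) * scale
    labels = np.asarray(labels, dtype=np.float64)
    # diff[i, j] = s_i - s_j; keep entries where pair i has a lower label than pair j.
    diff = sims[:, None] - sims[None, :]
    pair_diffs = diff[labels[:, None] < labels[None, :]]
    terms = np.concatenate(([0.0], pair_diffs))   # the leading 0 gives the "1 +" term
    m = terms.max()
    return float(m + np.log(np.exp(terms - m).sum()))  # stable log-sum-exp

labels = [1.0, 0.2, -1.0]
well_ordered = cosent_loss([0.9, 0.3, -0.5], labels)   # similarities follow the labels
mis_ordered = cosent_loss([-0.5, 0.3, 0.9], labels)    # similarities contradict them
print(well_ordered, mis_ordered)
```

Only the relative order of the labels matters in this formulation, so graded labels such as 0.6 and 0.2 contribute a preference between pairs rather than a regression target.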
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 10
  • learning_rate: 2e-05
  • warmup_ratio: 0.01
  • bf16: True
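
To make the learning-rate settings concrete, the sketch below computes a linear schedule with warmup from `learning_rate`, `warmup_ratio`, and the linear `lr_scheduler_type` listed under All Hyperparameters. It is an illustration rather than the trainer's code: `linear_warmup_lr` is a hypothetical helper, the warmup length is rounded up, and the total step count is an estimate. The training log pairs step 30500 with epoch 1.0003, which suggests roughly 30,490 steps per epoch, and 3 epochs are configured.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.01):
    """Linear ramp from 0 to base_lr during warmup, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# Estimated run length (an assumption derived from the training log, not a stated value).
total_steps = 3 * 30_490
warmup_steps = math.ceil(total_steps * 0.01)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

Under these assumptions the warmup covers about 915 steps, so the peak rate of 2e-05 is reached early in the first epoch.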

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.01
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 2
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
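
With `multi_dataset_batch_sampler: proportional`, batches are drawn from each dataset in proportion to its size. The rough share of batches per dataset can be estimated from the sizes reported above. This is a back-of-the-envelope sketch: `batch_shares` is a hypothetical helper, and it ignores how batches are split across devices.

```python
def batch_shares(dataset_sizes, batch_size, drop_last=True):
    """Return (share of batches per dataset, number of batches per dataset)."""
    batches = {}
    for name, n_samples in dataset_sizes.items():
        full, remainder = divmod(n_samples, batch_size)
        batches[name] = full if (drop_last or remainder == 0) else full + 1
    total = sum(batches.values())
    shares = {name: count / total for name, count in batches.items()}
    return shares, batches

# Dataset sizes and batch settings taken from this card.
sizes = {"mnri_dataset": 913_338, "contrastive_dataset": 1_526_022}
shares, batches = batch_shares(sizes, batch_size=10, drop_last=True)
for name in sizes:
    print(f"{name}: {batches[name]} batches, {shares[name]:.1%} of all batches")
```

By this estimate a little over a third of the batches use MultipleNegativesRankingLoss and the remainder use CoSENTLoss.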

Training Logs

Epoch Step Training Loss
0.0033 100 2.0831
0.0066 200 2.0847
0.0098 300 2.097
0.0131 400 2.0245
0.0164 500 2.0934
0.0197 600 2.0194
0.0230 700 2.0852
0.0262 800 2.1862
0.0295 900 2.0248
0.0328 1000 2.1063
0.0361 1100 2.128
0.0394 1200 2.0274
0.0426 1300 2.1446
0.0459 1400 2.0915
0.0492 1500 2.103
0.0525 1600 2.0583
0.0558 1700 2.1763
0.0590 1800 2.1259
0.0623 1900 2.0833
0.0656 2000 2.0816
0.0689 2100 2.0925
0.0722 2200 2.1179
0.0754 2300 2.0641
0.0787 2400 2.1368
0.0820 2500 2.1445
0.0853 2600 2.1387
0.0886 2700 2.1183
0.0918 2800 2.0845
0.0951 2900 2.084
0.0984 3000 2.1198
0.1017 3100 2.1777
0.1049 3200 2.0943
0.1082 3300 2.1166
0.1115 3400 2.0513
0.1148 3500 2.1159
0.1181 3600 2.0959
0.1213 3700 2.1227
0.1246 3800 2.1153
0.1279 3900 2.1422
0.1312 4000 2.0548
0.1345 4100 2.0591
0.1377 4200 2.0763
0.1410 4300 2.111
0.1443 4400 2.0537
0.1476 4500 2.1053
0.1509 4600 2.0623
0.1541 4700 2.1636
0.1574 4800 2.0333
0.1607 4900 2.0741
0.1640 5000 2.1379
0.1673 5100 2.1269
0.1705 5200 2.0973
0.1738 5300 2.07
0.1771 5400 2.0978
0.1804 5500 2.1343
0.1837 5600 2.0909
0.1869 5700 2.1077
0.1902 5800 2.1039
0.1935 5900 2.1409
0.1968 6000 2.0468
0.2001 6100 2.1135
0.2033 6200 2.1275
0.2066 6300 2.1131
0.2099 6400 2.0956
0.2132 6500 2.1259
0.2165 6600 2.1312
0.2197 6700 2.0809
0.2230 6800 2.0918
0.2263 6900 2.1624
0.2296 7000 2.1218
0.2329 7100 2.1044
0.2361 7200 2.119
0.2394 7300 2.0472
0.2427 7400 2.0815
0.2460 7500 2.105
0.2493 7600 2.134
0.2525 7700 2.0661
0.2558 7800 2.1011
0.2591 7900 2.0386
0.2624 8000 2.0698
0.2657 8100 2.1363
0.2689 8200 2.1183
0.2722 8300 2.1191
0.2755 8400 2.0374
0.2788 8500 2.1293
0.2821 8600 2.0872
0.2853 8700 2.0638
0.2886 8800 2.1166
0.2919 8900 2.099
0.2952 9000 2.0727
0.2984 9100 2.1259
0.3017 9200 2.1358
0.3050 9300 2.1272
0.3083 9400 2.145
0.3116 9500 2.0753
0.3148 9600 2.0871
0.3181 9700 2.1388
0.3214 9800 2.0986
0.3247 9900 2.1013
0.3280 10000 2.106
0.3312 10100 2.0948
0.3345 10200 2.1102
0.3378 10300 2.1457
0.3411 10400 2.0821
0.3444 10500 2.132
0.3476 10600 2.0262
0.3509 10700 2.0818
0.3542 10800 2.103
0.3575 10900 2.0715
0.3608 11000 2.1061
0.3640 11100 2.052
0.3673 11200 2.1026
0.3706 11300 2.0743
0.3739 11400 2.0773
0.3772 11500 2.1342
0.3804 11600 2.0608
0.3837 11700 2.1129
0.3870 11800 2.124
0.3903 11900 2.0204
0.3936 12000 2.1111
0.3968 12100 2.0998
0.4001 12200 2.1395
0.4034 12300 2.0428
0.4067 12400 2.0757
0.4100 12500 2.0887
0.4132 12600 2.0103
0.4165 12700 2.1141
0.4198 12800 2.0516
0.4231 12900 2.0486
0.4264 13000 2.0743
0.4296 13100 2.0659
0.4329 13200 2.0763
0.4362 13300 2.0994
0.4395 13400 2.066
0.4428 13500 2.0853
0.4460 13600 2.0957
0.4493 13700 2.094
0.4526 13800 2.1339
0.4559 13900 2.0468
0.4592 14000 2.0546
0.4624 14100 2.072
0.4657 14200 2.0703
0.4690 14300 2.0714
0.4723 14400 2.0622
0.4756 14500 2.1081
0.4788 14600 2.0513
0.4821 14700 2.0728
0.4854 14800 2.1158
0.4887 14900 2.0409
0.4919 15000 1.996
0.4952 15100 2.1065
0.4985 15200 2.0746
0.5018 15300 2.0526
0.5051 15400 2.088
0.5083 15500 2.0513
0.5116 15600 2.1005
0.5149 15700 2.0703
0.5182 15800 2.1393
0.5215 15900 2.0152
0.5247 16000 2.0655
0.5280 16100 2.0516
0.5313 16200 2.0734
0.5346 16300 2.0802
0.5379 16400 2.0612
0.5411 16500 2.1198
0.5444 16600 2.0871
0.5477 16700 2.0723
0.5510 16800 2.0433
0.5543 16900 2.059
0.5575 17000 2.0619
0.5608 17100 2.0962
0.5641 17200 2.1011
0.5674 17300 2.0652
0.5707 17400 2.0472
0.5739 17500 2.1834
0.5772 17600 2.0879
0.5805 17700 2.0406
0.5838 17800 2.0764
0.5871 17900 2.1096
0.5903 18000 2.0538
0.5936 18100 2.1295
0.5969 18200 2.0927
0.6002 18300 2.0113
0.6035 18400 1.9882
0.6067 18500 2.0406
0.6100 18600 2.0342
0.6133 18700 2.0202
0.6166 18800 2.0612
0.6199 18900 2.028
0.6231 19000 2.0788
0.6264 19100 2.0548
0.6297 19200 2.109
0.6330 19300 2.094
0.6363 19400 2.084
0.6395 19500 2.1271
0.6428 19600 2.0722
0.6461 19700 2.0255
0.6494 19800 2.1328
0.6527 19900 2.0356
0.6559 20000 2.0345
0.6592 20100 2.0739
0.6625 20200 2.0954
0.6658 20300 2.0697
0.6690 20400 2.0646
0.6723 20500 2.0475
0.6756 20600 2.1127
0.6789 20700 1.9988
0.6822 20800 2.0889
0.6854 20900 2.0684
0.6887 21000 2.1226
0.6920 21100 2.1362
0.6953 21200 2.0598
0.6986 21300 2.0518
0.7018 21400 2.0737
0.7051 21500 2.1033
0.7084 21600 2.0717
0.7117 21700 2.1137
0.7150 21800 2.0343
0.7182 21900 2.0134
0.7215 22000 2.0762
0.7248 22100 2.0494
0.7281 22200 2.0879
0.7314 22300 2.0972
0.7346 22400 2.0359
0.7379 22500 2.0689
0.7412 22600 2.0644
0.7445 22700 2.0473
0.7478 22800 2.0074
0.7510 22900 2.0321
0.7543 23000 2.0549
0.7576 23100 2.0483
0.7609 23200 2.0566
0.7642 23300 2.0063
0.7674 23400 2.0682
0.7707 23500 2.0937
0.7740 23600 2.0763
0.7773 23700 2.0564
0.7806 23800 2.0722
0.7838 23900 2.0313
0.7871 24000 1.986
0.7904 24100 1.995
0.7937 24200 2.0377
0.7970 24300 2.1
0.8002 24400 2.0417
0.8035 24500 2.0112
0.8068 24600 2.0283
0.8101 24700 2.0271
0.8134 24800 2.1089
0.8166 24900 2.0634
0.8199 25000 2.0508
0.8232 25100 2.0922
0.8265 25200 2.0939
0.8298 25300 2.0848
0.8330 25400 1.9946
0.8363 25500 2.0836
0.8396 25600 2.0013
0.8429 25700 2.0319
0.8462 25800 2.1079
0.8494 25900 2.1042
0.8527 26000 2.0973
0.8560 26100 2.0648
0.8593 26200 2.0273
0.8625 26300 2.1004
0.8658 26400 2.0812
0.8691 26500 2.1113
0.8724 26600 2.0413
0.8757 26700 2.0857
0.8789 26800 2.0575
0.8822 26900 2.0931
0.8855 27000 2.0806
0.8888 27100 2.0996
0.8921 27200 2.0014
0.8953 27300 2.0291
0.8986 27400 2.028
0.9019 27500 2.0799
0.9052 27600 1.9772
0.9085 27700 2.0451
0.9117 27800 2.0455
0.9150 27900 1.9754
0.9183 28000 2.0859
0.9216 28100 2.0352
0.9249 28200 2.0758
0.9281 28300 2.0445
0.9314 28400 2.052
0.9347 28500 2.0244
0.9380 28600 2.1101
0.9413 28700 2.0423
0.9445 28800 2.0078
0.9478 28900 2.0554
0.9511 29000 2.0204
0.9544 29100 2.0509
0.9577 29200 1.9927
0.9609 29300 1.999
0.9642 29400 1.9829
0.9675 29500 2.0003
0.9708 29600 2.0076
0.9741 29700 2.0568
0.9773 29800 1.9611
0.9806 29900 2.0457
0.9839 30000 2.0547
0.9872 30100 2.02
0.9905 30200 1.9795
0.9937 30300 2.0051
0.9970 30400 2.0794
1.0003 30500 2.0667
1.0036 30600 1.9406
1.0069 30700 2.0212
1.0101 30800 2.0533
1.0134 30900 1.9644
1.0167 31000 2.0073
1.0200 31100 1.9806
1.0233 31200 2.0171
1.0265 31300 2.0708
1.0298 31400 1.9352
1.0331 31500 1.9848
1.0364 31600 1.996
1.0397 31700 1.9329
1.0429 31800 1.9882
1.0462 31900 1.9278
1.0495 32000 1.9672
1.0528 32100 1.9495
1.0560 32200 2.0224
1.0593 32300 1.9632
1.0626 32400 1.9348
1.0659 32500 1.9285
1.0692 32600 1.9388
1.0724 32700 1.9608
1.0757 32800 1.9267
1.0790 32900 1.98
1.0823 33000 1.9691
1.0856 33100 1.998
1.0888 33200 1.9501
1.0921 33300 1.9353
1.0954 33400 1.9338
1.0987 33500 1.9413
1.1020 33600 2.0409
1.1052 33700 1.9417
1.1085 33800 1.9552
1.1118 33900 1.8979
1.1151 34000 1.9533
1.1184 34100 1.937
1.1216 34200 1.9782
1.1249 34300 1.9474
1.1282 34400 1.9821
1.1315 34500 1.8978
1.1348 34600 1.8783
1.1380 34700 1.9186
1.1413 34800 1.9101
1.1446 34900 1.8959
1.1479 35000 1.9551
1.1512 35100 1.8813
1.1544 35200 2.0067
1.1577 35300 1.8722
1.1610 35400 1.9121
1.1643 35500 1.9411
1.1676 35600 1.9647
1.1708 35700 1.9368
1.1741 35800 1.9224
1.1774 35900 1.9519
1.1807 36000 1.911
1.1840 36100 1.9401
1.1872 36200 1.9059
1.1905 36300 1.9713
1.1938 36400 1.9322
1.1971 36500 1.8975
1.2004 36600 1.9329
1.2036 36700 1.9436
1.2069 36800 1.9657
1.2102 36900 1.9324
1.2135 37000 1.9498
1.2168 37100 1.9496
1.2200 37200 1.9208
1.2233 37300 1.8935
1.2266 37400 1.9956
1.2299 37500 1.9417
1.2332 37600 1.9551
1.2364 37700 1.9525
1.2397 37800 1.8884
1.2430 37900 1.8992
1.2463 38000 1.9455
1.2495 38100 1.9799
1.2528 38200 1.8973
1.2561 38300 1.9368
1.2594 38400 1.8638
1.2627 38500 1.8783
1.2659 38600 1.9521
1.2692 38700 1.9587
1.2725 38800 1.9217
1.2758 38900 1.8796
1.2791 39000 1.9181
1.2823 39100 1.911
1.2856 39200 1.906
1.2889 39300 1.9309
1.2922 39400 1.9247
1.2955 39500 1.8877
1.2987 39600 1.9543
1.3020 39700 1.9842
1.3053 39800 1.94
1.3086 39900 1.9499
1.3119 40000 1.8988
1.3151 40100 1.9099
1.3184 40200 1.9682
1.3217 40300 1.9114
1.3250 40400 1.902
1.3283 40500 1.9025
1.3315 40600 1.9609
1.3348 40700 1.9385
1.3381 40800 1.9485
1.3414 40900 1.9176
1.3447 41000 1.9352
1.3479 41100 1.8728
1.3512 41200 1.8923
1.3545 41300 1.9536
1.3578 41400 1.8894
1.3611 41500 1.9193
1.3643 41600 1.8939
1.3676 41700 1.9187
1.3709 41800 1.9014
1.3742 41900 1.8843
1.3775 42000 1.9761
1.3807 42100 1.8594
1.3840 42200 1.9446
1.3873 42300 1.9486
1.3906 42400 1.8255
1.3939 42500 1.93
1.3971 42600 1.8953
1.4004 42700 1.9685
1.4037 42800 1.8485
1.4070 42900 1.9112
1.4103 43000 1.8845
1.4135 43100 1.8515
1.4168 43200 1.9322
1.4201 43300 1.8673
1.4234 43400 1.8668
1.4267 43500 1.8961
1.4299 43600 1.8979
1.4332 43700 1.9091
1.4365 43800 1.9181
1.4398 43900 1.9033
1.4430 44000 1.918
1.4463 44100 1.9252
1.4496 44200 1.9255
1.4529 44300 1.9293
1.4562 44400 1.8432
1.4594 44500 1.9051
1.4627 44600 1.9006
1.4660 44700 1.9191
1.4693 44800 1.9071
1.4726 44900 1.8956
1.4758 45000 1.9097
1.4791 45100 1.883
1.4824 45200 1.9219
1.4857 45300 1.9312
1.4890 45400 1.856
1.4922 45500 1.8255
1.4955 45600 1.9186
1.4988 45700 1.8755
1.5021 45800 1.8742
1.5054 45900 1.9239
1.5086 46000 1.8842
1.5119 46100 1.9221
1.5152 46200 1.8837
1.5185 46300 1.9597
1.5218 46400 1.8564
1.5250 46500 1.8904
1.5283 46600 1.861
1.5316 46700 1.897
1.5349 46800 1.8951
1.5382 46900 1.8838
1.5414 47000 1.9156
1.5447 47100 1.9058
1.5480 47200 1.9225
1.5513 47300 1.8461
1.5546 47400 1.8857
1.5578 47500 1.9042
1.5611 47600 1.9145
1.5644 47700 1.9117
1.5677 47800 1.8948
1.5710 47900 1.8315
1.5742 48000 2.0098
1.5775 48100 1.9015
1.5808 48200 1.8483
1.5841 48300 1.8915
1.5874 48400 1.9237
1.5906 48500 1.8882
1.5939 48600 1.9438
1.5972 48700 1.8874
1.6005 48800 1.8094
1.6038 48900 1.8296
1.6070 49000 1.8631
1.6103 49100 1.8423
1.6136 49200 1.8384
1.6169 49300 1.8685
1.6202 49400 1.8621
1.6234 49500 1.8841
1.6267 49600 1.8625
1.6300 49700 1.9441
1.6333 49800 1.9338
1.6365 49900 1.9017
1.6398 50000 1.9243
1.6431 50100 1.899
1.6464 50200 1.8616
1.6497 50300 1.9886
1.6529 50400 1.8361
1.6562 50500 1.8767
1.6595 50600 1.8793
1.6628 50700 1.9453
1.6661 50800 1.8792
1.6693 50900 1.8666
1.6726 51000 1.8334
1.6759 51100 1.9263
1.6792 51200 1.815
1.6825 51300 1.9027
1.6857 51400 1.8991
1.6890 51500 1.9383
1.6923 51600 1.9796
1.6956 51700 1.8553
1.6989 51800 1.8651
1.7021 51900 1.923
1.7054 52000 1.9168
1.7087 52100 1.9046
1.7120 52200 1.9175
1.7153 52300 1.862
1.7185 52400 1.8409
1.7218 52500 1.9012
1.7251 52600 1.8872
1.7284 52700 1.91
1.7317 52800 1.9113
1.7349 52900 1.8294
1.7382 53000 1.8987
1.7415 53100 1.8792
1.7448 53200 1.8593
1.7481 53300 1.8387
1.7513 53400 1.8481
1.7546 53500 1.848
1.7579 53600 1.9255
1.7612 53700 1.873
1.7645 53800 1.8494
1.7677 53900 1.8634
1.7710 54000 1.9154
1.7743 54100 1.88
1.7776 54200 1.8747
1.7809 54300 1.8803
1.7841 54400 1.8646
1.7874 54500 1.791
1.7907 54600 1.8555
1.7940 54700 1.846
1.7973 54800 1.9235
1.8005 54900 1.8634
1.8038 55000 1.8466
1.8071 55100 1.8079
1.8104 55200 1.87
1.8136 55300 1.9149
1.8169 55400 1.8907
1.8202 55500 1.8717
1.8235 55600 1.9281
1.8268 55700 1.929
1.8300 55800 1.9038
1.8333 55900 1.8671
1.8366 56000 1.8556
1.8399 56100 1.8317
1.8432 56200 1.8681
1.8464 56300 1.9238
1.8497 56400 1.9257
1.8530 56500 1.9114
1.8563 56600 1.8595
1.8596 56700 1.8448
1.8628 56800 1.9383
1.8661 56900 1.9138
1.8694 57000 1.9337
1.8727 57100 1.8714
1.8760 57200 1.9025
1.8792 57300 1.8922
1.8825 57400 1.8981
1.8858 57500 1.9164
1.8891 57600 1.9137
1.8924 57700 1.8394
1.8956 57800 1.838
1.8989 57900 1.8642
1.9022 58000 1.9035
1.9055 58100 1.8386
1.9088 58200 1.8522
1.9120 58300 1.8524
1.9153 58400 1.8262
1.9186 58500 1.8586
1.9219 58600 1.8708
1.9252 58700 1.9007
1.9284 58800 1.8483
1.9317 58900 1.8699
1.9350 59000 1.8653
1.9383 59100 1.8931
1.9416 59200 1.8571
1.9448 59300 1.8384
1.9481 59400 1.8838
1.9514 59500 1.8135
1.9547 59600 1.8838
1.9580 59700 1.8001
1.9612 59800 1.824
1.9645 59900 1.8156
1.9678 60000 1.8505
1.9711 60100 1.8066
1.9744 60200 1.8712
1.9776 60300 1.7907
1.9809 60400 1.8729
1.9842 60500 1.8874
1.9875 60600 1.8291
1.9908 60700 1.7899
1.9940 60800 1.8457
1.9973 60900 1.8888
2.0006 61000 1.9008
2.0039 61100 1.7372
2.0071 61200 1.86
2.0104 61300 1.8764
2.0137 61400 1.7489
2.0170 61500 1.8152
2.0203 61600 1.7834
2.0235 61700 1.8174
2.0268 61800 1.8868
2.0301 61900 1.7351
2.0334 62000 1.7656
2.0367 62100 1.8076
2.0399 62200 1.7437
2.0432 62300 1.7684
2.0465 62400 1.7179
2.0498 62500 1.7759
2.0531 62600 1.7629
2.0563 62700 1.7665
2.0596 62800 1.7714
2.0629 62900 1.7474
2.0662 63000 1.7615
2.0695 63100 1.7437
2.0727 63200 1.7409
2.0760 63300 1.7373
2.0793 63400 1.7705
2.0826 63500 1.7524
2.0859 63600 1.8012
2.0891 63700 1.7344
2.0924 63800 1.7459
2.0957 63900 1.7463
2.0990 64000 1.7453
2.1023 64100 1.8076
2.1055 64200 1.7679
2.1088 64300 1.724
2.1121 64400 1.7006
2.1154 64500 1.7291
2.1187 64600 1.7212
2.1219 64700 1.7828
2.1252 64800 1.741
2.1285 64900 1.7645
2.1318 65000 1.7174
2.1351 65100 1.6568
2.1383 65200 1.7193
2.1416 65300 1.7015
2.1449 65400 1.7144
2.1482 65500 1.7418
2.1515 65600 1.7042
2.1547 65700 1.7734
2.1580 65800 1.6707
2.1613 65900 1.7292
2.1646 66000 1.7242
2.1679 66100 1.7666
2.1711 66200 1.7508
2.1744 66300 1.761
2.1777 66400 1.7547
2.1810 66500 1.6977
2.1843 66600 1.7322
2.1875 66700 1.6875
2.1908 66800 1.7769
2.1941 66900 1.7078
2.1974 67000 1.6932
2.2006 67100 1.7323
2.2039 67200 1.7489
2.2072 67300 1.7572
2.2105 67400 1.7205
2.2138 67500 1.78
2.2170 67600 1.7437
2.2203 67700 1.6845
2.2236 67800 1.6908
2.2269 67900 1.7702
2.2302 68000 1.7179
2.2334 68100 1.7214
2.2367 68200 1.7673
2.2400 68300 1.6757
2.2433 68400 1.7045
2.2466 68500 1.7644
2.2498 68600 1.7922
2.2531 68700 1.7284
2.2564 68800 1.7098
2.2597 68900 1.6664
2.2630 69000 1.7052
2.2662 69100 1.7184
2.2695 69200 1.758
2.2728 69300 1.6877
2.2761 69400 1.6478
2.2794 69500 1.6849
2.2826 69600 1.6985
2.2859 69700 1.7169
2.2892 69800 1.7246
2.2925 69900 1.6989
2.2958 70000 1.6752
2.2990 70100 1.7734
2.3023 70200 1.754
2.3056 70300 1.7526
2.3089 70400 1.7231
2.3122 70500 1.7254
2.3154 70600 1.6972
2.3187 70700 1.7795
2.3220 70800 1.6778
2.3253 70900 1.7071
2.3286 71000 1.72
2.3318 71100 1.7381
2.3351 71200 1.7221
2.3384 71300 1.781
2.3417 71400 1.7077
2.3450 71500 1.7387
2.3482 71600 1.6725
2.3515 71700 1.6935
2.3548 71800 1.7562
2.3581 71900 1.6878
2.3614 72000 1.6909
2.3646 72100 1.712
2.3679 72200 1.6854
2.3712 72300 1.7083
2.3745 72400 1.6975
2.3778 72500 1.7578
2.3810 72600 1.6652
2.3843 72700 1.7657
2.3876 72800 1.6973
2.3909 72900 1.6335
2.3941 73000 1.7062
2.3974 73100 1.6915
2.4007 73200 1.7723
2.4040 73300 1.6457
2.4073 73400 1.7151
2.4105 73500 1.6733
2.4138 73600 1.6606
2.4171 73700 1.6922
2.4204 73800 1.6836
2.4237 73900 1.6484
2.4269 74000 1.6948
2.4302 74100 1.6837
2.4335 74200 1.7202
2.4368 74300 1.7299
2.4401 74400 1.6856
2.4433 74500 1.706
2.4466 74600 1.7266
2.4499 74700 1.7354
2.4532 74800 1.7301
2.4565 74900 1.6377
2.4597 75000 1.6948
2.4630 75100 1.6806
2.4663 75200 1.7143
2.4696 75300 1.6908
2.4729 75400 1.6828
2.4761 75500 1.7243
2.4794 75600 1.6892
2.4827 75700 1.7265
2.4860 75800 1.6987
2.4893 75900 1.6442
2.4925 76000 1.6118
2.4958 76100 1.7129
2.4991 76200 1.6515
2.5024 76300 1.664
2.5057 76400 1.7259
2.5089 76500 1.7128
2.5122 76600 1.7069
2.5155 76700 1.6877
2.5188 76800 1.7571
2.5221 76900 1.6714
2.5253 77000 1.6727
2.5286 77100 1.6848
2.5319 77200 1.7195
2.5352 77300 1.6569
2.5385 77400 1.6996
2.5417 77500 1.6831
2.5450 77600 1.7039
2.5483 77700 1.7171
2.5516 77800 1.6838
2.5549 77900 1.6893
2.5581 78000 1.7016
2.5614 78100 1.7456
2.5647 78200 1.7281
2.5680 78300 1.6918
2.5713 78400 1.6715
2.5745 78500 1.7708
2.5778 78600 1.7068
2.5811 78700 1.6602
2.5844 78800 1.676
2.5876 78900 1.7335
2.5909 79000 1.7131
2.5942 79100 1.7628
2.5975 79200 1.6806
2.6008 79300 1.6172
2.6040 79400 1.6649
2.6073 79500 1.685
2.6106 79600 1.6553
2.6139 79700 1.6667
2.6172 79800 1.6635
2.6204 79900 1.6562
2.6237 80000 1.6848
2.6270 80100 1.6907
2.6303 80200 1.7314
2.6336 80300 1.7627
2.6368 80400 1.6962
2.6401 80500 1.7257
2.6434 80600 1.6903
2.6467 80700 1.7103
2.6500 80800 1.7612
2.6532 80900 1.6416
2.6565 81000 1.704
2.6598 81100 1.6984
2.6631 81200 1.7549
2.6664 81300 1.7114
2.6696 81400 1.7018
2.6729 81500 1.6451
2.6762 81600 1.7034
2.6795 81700 1.6557
2.6828 81800 1.7173
2.6860 81900 1.6907
2.6893 82000 1.7692
2.6926 82100 1.7491
2.6959 82200 1.6443
2.6992 82300 1.6867
2.7024 82400 1.7279
2.7057 82500 1.7552
2.7090 82600 1.6773
2.7123 82700 1.7345
2.7156 82800 1.6649
2.7188 82900 1.675
2.7221 83000 1.7177
2.7254 83100 1.7093
2.7287 83200 1.705
2.7320 83300 1.7497
2.7352 83400 1.6442
2.7385 83500 1.7147
2.7418 83600 1.7238
2.7451 83700 1.6653
2.7484 83800 1.6418
2.7516 83900 1.6701
2.7549 84000 1.6879
2.7582 84100 1.7377
2.7615 84200 1.6991
2.7648 84300 1.6793
2.7680 84400 1.6906
2.7713 84500 1.7543
2.7746 84600 1.7044
2.7779 84700 1.6963
2.7811 84800 1.7193
2.7844 84900 1.6883
2.7877 85000 1.6649
2.7910 85100 1.6911
2.7943 85200 1.7253
2.7975 85300 1.7344
2.8008 85400 1.6826
2.8041 85500 1.6767
2.8074 85600 1.6364
2.8107 85700 1.7168
2.8139 85800 1.7622
2.8172 85900 1.7312
2.8205 86000 1.6972
2.8238 86100 1.7845
2.8271 86200 1.7589
2.8303 86300 1.7576
2.8336 86400 1.7414
2.8369 86500 1.6898
2.8402 86600 1.6809
2.8435 86700 1.7099
2.8467 86800 1.7643
2.8500 86900 1.7538
2.8533 87000 1.782
2.8566 87100 1.7008
2.8599 87200 1.6826
2.8631 87300 1.7683
2.8664 87400 1.799
2.8697 87500 1.7653
2.8730 87600 1.7152
2.8763 87700 1.7335
2.8795 87800 1.7513
2.8828 87900 1.7244
2.8861 88000 1.7766
2.8894 88100 1.7826
2.8927 88200 1.7202
2.8959 88300 1.6805
2.8992 88400 1.7176
2.9025 88500 1.7359
2.9058 88600 1.7159
2.9091 88700 1.709
2.9123 88800 1.7137
2.9156 88900 1.7372
2.9189 89000 1.7192
2.9222 89100 1.7137
2.9255 89200 1.7737
2.9287 89300 1.719
2.9320 89400 1.7047
2.9353 89500 1.7314
2.9386 89600 1.7385
2.9419 89700 1.7207
2.9451 89800 1.7229
2.9484 89900 1.7533
2.9517 90000 1.6814
2.9550 90100 1.7411
2.9582 90200 1.6637
2.9615 90300 1.7148
2.9648 90400 1.6791
2.9681 90500 1.7154
2.9714 90600 1.6958
2.9746 90700 1.7396
2.9779 90800 1.656
2.9812 90900 1.763
2.9845 91000 1.7397
2.9878 91100 1.7194
2.9910 91200 1.6691
2.9943 91300 1.7505
2.9976 91400 1.7788
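
The logged loss is reported every 100 steps and is fairly noisy from row to row, so the trend is easier to read after light smoothing. The following is a minimal sketch, not part of the original training code, that parses rows in the "epoch step loss" layout used above, estimates the number of optimizer steps per epoch from the epoch/step columns, and computes a trailing moving average. The helper names (`parse_rows`, `steps_per_epoch`, `moving_average`) and the window size are illustrative assumptions; only four rows from the log are included to keep the example short.

```python
from statistics import mean

# A few rows copied from the log above: epoch, step, training loss.
ROWS = """\
0.9249 28200 2.0758
1.0003 30500 2.0667
2.0006 61000 1.9008
2.9976 91400 1.7788
"""


def parse_rows(text):
    """Parse whitespace-separated 'epoch step loss' lines into tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        rows.append((float(parts[0]), int(parts[1]), float(parts[2])))
    return rows


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch as the mean of step / epoch."""
    return mean(step / epoch for epoch, step, _ in rows)


def moving_average(values, window):
    """Trailing moving average; yields len(values) - window + 1 points."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


rows = parse_rows(ROWS)
est = steps_per_epoch(rows)
smoothed = moving_average([loss for _, _, loss in rows], window=2)

print(f"approx. steps per epoch: {est:.0f}")
print("smoothed losses:", [round(x, 4) for x in smoothed])
```

With the full log pasted into `ROWS`, a larger window (for example 10 rows, i.e. 1,000 steps) gives a smoother curve; the choice of window is a readability trade-off rather than anything prescribed by the training setup.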

Framework Versions

  • Python: 3.13.12
  • Sentence Transformers: 5.2.3
  • Transformers: 4.57.6
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.6.1
  • Tokenizers: 0.22.2
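
To compare a local environment against the versions listed above, a small check along the following lines may help. This is a sketch using only the standard library; the mapping from the names in the list to PyPI distribution names (for example, "PyTorch" to `torch`) is an assumption, and the helper names are hypothetical. A difference in version, or in a local build suffix such as `+cu128`, does not necessarily mean the model will fail to load.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this model card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "sentence-transformers": "5.2.3",
    "transformers": "4.57.6",
    "torch": "2.9.1+cu128",
    "accelerate": "1.13.0",
    "datasets": "4.6.1",
    "tokenizers": "0.22.2",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each distribution to a tuple (expected, installed, matches)."""
    result = {}
    for name, wanted in expected.items():
        have = installed_version(name)
        result[name] = (wanted, have, have == wanted)
    return result


print(f"Python: {platform.python_version()} (card lists 3.13.12)")
for name, (wanted, have, matches) in compare_versions(CARD_VERSIONS).items():
    status = "match" if matches else "differs"
    print(f"{name}: card={wanted} installed={have} [{status}]")
```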

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

CoSENTLoss

@article{10531646,
    author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
    year={2024},
    doi={10.1109/TASLP.2024.3402087}
}