SentenceTransformer based on jhu-clsp/mmBERT-base

This is a sentence-transformers model finetuned from jhu-clsp/mmBERT-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jhu-clsp/mmBERT-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
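The Pooling module above uses mean pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged over the non-padding positions indicated by the attention mask. A minimal sketch of that operation in PyTorch (the function name is illustrative, not part of the library):

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # sum embeddings of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per sentence
    return summed / counts                           # (batch, 768)

# Toy check: padding positions (mask == 0) do not affect the result
emb = torch.ones(1, 4, 768)
mask = torch.tensor([[1, 1, 0, 0]])
print(mean_pooling(emb, mask).shape)  # torch.Size([1, 768])
```

Because padded positions are masked out before the sum, sentences of different lengths in one batch produce comparable sentence vectors.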

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Bheri/ithasa-mmbert-v3")
# Run inference
sentences = [
    'attenuated vaccines:',
    'कम संवेदनशील टीकेः',
    '६.५% दसादशे',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.3723, 0.1543],
#         [0.3723, 1.0000, 0.2746],
#         [0.1543, 0.2746, 1.0000]])
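`model.similarity` applies the model's configured similarity function, here cosine similarity. The same score matrix can be reproduced by normalizing the embeddings to unit length and taking dot products; a small NumPy sketch (the helper name is illustrative):

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    # Normalize each row to unit length; the dot product of unit vectors is cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normed = embeddings / np.clip(norms, 1e-12, None)
    return normed @ normed.T

# Toy vectors: the diagonal (self-similarity) is always 1.0
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(np.round(cosine_similarity_matrix(vecs), 4))
```

This is why the diagonal of the tensor above is exactly 1.0: each embedding has cosine similarity 1 with itself.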

Evaluation

Metrics

Translation

Metric            Value
src2trg_accuracy  0.616
trg2src_accuracy  0.604
mean_accuracy     0.610
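These scores come from a bilingual retrieval check: for each source sentence, the nearest neighbor among all target embeddings should be its translation (src2trg), and symmetrically for trg2src; mean_accuracy averages the two. A minimal sketch of that computation, assuming cosine similarity (illustrative function, not the library's evaluator itself):

```python
import numpy as np

def translation_accuracy(src_emb: np.ndarray, trg_emb: np.ndarray) -> dict:
    # Cosine similarity between every source and every target sentence.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    trg = trg_emb / np.linalg.norm(trg_emb, axis=1, keepdims=True)
    sims = src @ trg.T                       # (n_src, n_trg)
    correct = np.arange(len(src))            # pair i should match pair i
    src2trg = float(np.mean(sims.argmax(axis=1) == correct))
    trg2src = float(np.mean(sims.argmax(axis=0) == correct))
    return {"src2trg_accuracy": src2trg,
            "trg2src_accuracy": trg2src,
            "mean_accuracy": (src2trg + trg2src) / 2}

# Toy check: identical source/target embeddings retrieve perfectly
print(translation_accuracy(np.eye(3), np.eye(3))["mean_accuracy"])  # 1.0
```

So a mean_accuracy of 0.61 means roughly 61% of sentence pairs retrieve their exact counterpart across the language boundary.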

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,749,530 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 12 tokens, mean: 31.26 tokens, max: 88 tokens
    • sentence2: string; min: 19 tokens, mean: 67.93 tokens, max: 128 tokens
  • Samples:
    • sentence1: There was no Mughal tradition of primogeniture, the systematic passing of rule, upon an emperor's death, to his eldest son.
      sentence2: चक्रवर्तिनः मृत्योः अनन्तरं तस्य शासनस्य व्यवस्थितरूपेण सङ्क्रमणस्य, मुघलपरम्परायाः ज्येष्ठपुत्राधिकारपद्धतिः नासीत्।
    • sentence1: The four sons of Shah Jahan all held governorships during their father's reign.
      sentence2: शाह्-जहाँ-नामकस्य चत्वारः पुत्राः, सर्वे पितुः शासनकाले शासकपदम् अधारयन्।
    • sentence1: In this regard he discusses the correlation between social opportunities of education and health and how both of these complement economic and political freedoms as a healthy and well-educated person is better suited to make informed economic decisions and be involved in fruitful political demonstrations etc.
      sentence2: अस्मिन् विषये सः शिक्षणस्य स्वास्थ्यस्य च सामाजिकावकाशानाम् अन्योन्य-सम्बन्धस्य, तथा च एतद्द्वयम् अपि आर्थिक-राजनैतिक-स्वातन्त्र्ययोः कथं पूरकं भवतः इति च चर्चां करोति, यतोहि स्वस्था सुशिक्षिता च व्यक्तिः ज्ञानपूर्वम् आर्थिकविषयान् निर्णेतुं तथा फलप्रदेषु राजनैतिकेषु प्रतिपादनादिषु संलग्नः भवितुं च अधिकारी भवति इति।
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    
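MultipleNegativesRankingLoss treats each (sentence1, sentence2) pair in a batch as the positive example and every other sentence2 in the same batch as an in-batch negative: similarities are scaled by the `scale` factor (20.0 here) and scored with cross-entropy against the diagonal. A minimal PyTorch sketch of this idea (illustrative, not the library's implementation):

```python
import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(emb1: torch.Tensor, emb2: torch.Tensor,
                                    scale: float = 20.0) -> torch.Tensor:
    # Cosine similarity between every sentence1 and every sentence2 in the batch.
    sims = F.cosine_similarity(emb1.unsqueeze(1), emb2.unsqueeze(0), dim=-1) * scale
    # The true pair for row i sits at column i; all other columns are in-batch negatives.
    labels = torch.arange(emb1.size(0))
    return F.cross_entropy(sims, labels)

# Toy batch of 4 pairs with 8-dimensional embeddings
loss = multiple_negatives_ranking_loss(torch.randn(4, 8), torch.randn(4, 8))
print(loss.item())  # a positive scalar; near zero when pairs are perfectly aligned
```

Larger batches give more negatives per positive, which is why this loss benefits from the relatively large effective batch size used in training.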

Evaluation Dataset

Unnamed Dataset

  • Size: 1,000 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 5 tokens, mean: 11.9 tokens, max: 67 tokens
    • sentence2: string; min: 5 tokens, mean: 23.13 tokens, max: 128 tokens
  • Samples:
    • sentence1: plus 2 tempered glass screen protectors: 6
      sentence2: पश्चात तापाभिसंतप्तॊ विदुर समार कर्शितः
    • sentence1: "Take sadaqah (alms) from their wealth in order to purify them with it." (p.
      sentence2: अप्येकाङ्गेऽप्यधोवस्तुमिच्छामि च सुकुत्सिते" ॥
    • sentence1: "Who could it possibly be?"
      sentence2: कश्च तासेः सम्भवति ?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 5
  • max_steps: 12000
  • learning_rate: 2e-05
  • warmup_steps: 500
  • gradient_accumulation_steps: 4
  • bf16: True
  • eval_strategy: steps
  • load_best_model_at_end: True
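Together these settings give an effective batch size of 32 × 4 = 128 per device. A hypothetical reconstruction of the training setup from these hyperparameters, using the Sentence Transformers v3+ trainer API; the toy dataset, `output_dir`, and `save_steps`/`eval_steps` values are assumptions (the logs suggest evaluation every 1000 steps):

```python
# Hypothetical reconstruction of the training run; dataset and paths are placeholders.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("jhu-clsp/mmBERT-base")
loss = MultipleNegativesRankingLoss(model, scale=20.0)

# Column names must match the card: sentence1 (English), sentence2 (Sanskrit).
train_dataset = Dataset.from_dict({
    "sentence1": ["The four sons of Shah Jahan all held governorships during their father's reign."],
    "sentence2": ["शाह्-जहाँ-नामकस्य चत्वारः पुत्राः, सर्वे पितुः शासनकाले शासकपदम् अधारयन्।"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",                # placeholder
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,      # effective batch size 128
    num_train_epochs=5,
    max_steps=12000,
    learning_rate=2e-5,
    warmup_steps=500,
    bf16=True,
    eval_strategy="steps",
    eval_steps=1000,                    # assumption, matching the training logs
    save_steps=1000,                    # assumption; required by load_best_model_at_end
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,         # placeholder; the card uses 1,000 held-out pairs
    loss=loss,
)
trainer.train()
```

Note that running this downloads the base model and trains for 12,000 steps; it is a configuration sketch, not a faithful reproduction of the original 3.7M-pair run.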

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 5
  • max_steps: 12000
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 500
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 4
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss eval-en-sa_mean_accuracy
0.0034 100 3.1353 - -
0.0068 200 2.7273 - -
0.0102 300 1.8263 - -
0.0137 400 1.1810 - -
0.0171 500 0.8952 - -
0.0205 600 0.7068 - -
0.0239 700 0.5979 - -
0.0273 800 0.5412 - -
0.0307 900 0.5255 - -
0.0341 1000 0.4847 0.2013 0.5045
0.0376 1100 0.4752 - -
0.0410 1200 0.4645 - -
0.0444 1300 0.4173 - -
0.0478 1400 0.4220 - -
0.0512 1500 0.4163 - -
0.0546 1600 0.3978 - -
0.0580 1700 0.3895 - -
0.0614 1800 0.3778 - -
0.0649 1900 0.3904 - -
0.0683 2000 0.3656 0.1436 0.563
0.0717 2100 0.3565 - -
0.0751 2200 0.3526 - -
0.0785 2300 0.3632 - -
0.0819 2400 0.3468 - -
0.0853 2500 0.3506 - -
0.0888 2600 0.3505 - -
0.0922 2700 0.3466 - -
0.0956 2800 0.3422 - -
0.0990 2900 0.3393 - -
0.1024 3000 0.3345 0.1240 0.587
0.1058 3100 0.3238 - -
0.1092 3200 0.3230 - -
0.1127 3300 0.3281 - -
0.1161 3400 0.3246 - -
0.1195 3500 0.3111 - -
0.1229 3600 0.3092 - -
0.1263 3700 0.3187 - -
0.1297 3800 0.3293 - -
0.1331 3900 0.3246 - -
0.1366 4000 0.3174 0.1165 0.598
0.1400 4100 0.3213 - -
0.1434 4200 0.3167 - -
0.1468 4300 0.3142 - -
0.1502 4400 0.3070 - -
0.1536 4500 0.3094 - -
0.1570 4600 0.3084 - -
0.1604 4700 0.3068 - -
0.1639 4800 0.3060 - -
0.1673 4900 0.3020 - -
0.1707 5000 0.3072 0.1133 0.6045
0.1741 5100 0.3151 - -
0.1775 5200 0.3121 - -
0.1809 5300 0.3059 - -
0.1843 5400 0.3069 - -
0.1878 5500 0.3069 - -
0.1912 5600 0.3134 - -
0.1946 5700 0.3017 - -
0.1980 5800 0.3088 - -
0.2014 5900 0.3011 - -
0.2048 6000 0.3075 0.1109 0.608
0.2082 6100 0.2957 - -
0.2117 6200 0.3049 - -
0.2151 6300 0.2994 - -
0.2185 6400 0.2951 - -
0.2219 6500 0.3116 - -
0.2253 6600 0.3155 - -
0.2287 6700 0.2938 - -
0.2321 6800 0.2824 - -
0.2355 6900 0.2973 - -
0.2390 7000 0.3111 0.1100 0.6065
0.2424 7100 0.2973 - -
0.2458 7200 0.2995 - -
0.2492 7300 0.2962 - -
0.2526 7400 0.2994 - -
0.2560 7500 0.2964 - -
0.2594 7600 0.2997 - -
0.2629 7700 0.2932 - -
0.2663 7800 0.2993 - -
0.2697 7900 0.2987 - -
0.2731 8000 0.2898 0.1084 0.6085
0.2765 8100 0.3007 - -
0.2799 8200 0.2935 - -
0.2833 8300 0.2885 - -
0.2868 8400 0.3021 - -
0.2902 8500 0.2958 - -
0.2936 8600 0.3056 - -
0.2970 8700 0.2908 - -
0.3004 8800 0.3096 - -
0.3038 8900 0.2924 - -
0.3072 9000 0.3019 0.1077 0.607
0.3107 9100 0.2985 - -
0.3141 9200 0.2906 - -
0.3175 9300 0.2961 - -
0.3209 9400 0.3044 - -
0.3243 9500 0.3005 - -
0.3277 9600 0.2943 - -
0.3311 9700 0.2948 - -
0.3345 9800 0.3046 - -
0.3380 9900 0.2948 - -
0.3414 10000 0.3060 0.1083 0.608
0.3448 10100 0.2906 - -
0.3482 10200 0.2958 - -
0.3516 10300 0.2919 - -
0.3550 10400 0.3041 - -
0.3584 10500 0.3055 - -
0.3619 10600 0.2975 - -
0.3653 10700 0.2984 - -
0.3687 10800 0.2883 - -
0.3721 10900 0.2949 - -
0.3755 11000 0.2987 0.1083 0.6085
0.3789 11100 0.2938 - -
0.3823 11200 0.2942 - -
0.3858 11300 0.2879 - -
0.3892 11400 0.2909 - -
0.3926 11500 0.2899 - -
0.3960 11600 0.2921 - -
0.3994 11700 0.2944 - -
0.4028 11800 0.2985 - -
0.4062 11900 0.3027 - -
0.4097 12000 0.2988 0.1082 0.61
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.2.3
  • Transformers: 5.2.0
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 3.3.2
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 0.3B parameters (BF16, Safetensors). Model ID: Bheri/ithasa-mmbert-v3, finetuned from jhu-clsp/mmBERT-base.