SentenceTransformer based on jhu-clsp/mmBERT-base
This is a sentence-transformers model finetuned from jhu-clsp/mmBERT-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: jhu-clsp/mmBERT-base
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
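The Pooling module above builds the sentence embedding by mean pooling: token embeddings are averaged, with padding tokens excluded via the attention mask. A minimal NumPy sketch of that operation, using toy 2-dimensional embeddings rather than the model's actual 768:

```python
import numpy as np

# Toy token embeddings for one sentence: shape (num_tokens, hidden_dim).
# In the real model, hidden_dim = 768 and sequences run up to 128 tokens.
token_embeddings = np.array([[1.0, 2.0],
                             [3.0, 4.0],
                             [0.0, 0.0]])   # last row is a padding token
attention_mask = np.array([1, 1, 0])        # 1 = real token, 0 = padding

# Mean pooling: sum the real tokens' embeddings, divide by their count
mask = attention_mask[:, None]              # shape (num_tokens, 1)
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()
print(sentence_embedding)                   # [2. 3.]
```

The padding row contributes nothing: only the two real tokens are averaged.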
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference
sentences = [
    'attenuated vaccines:',
    'कम संवेदनशील टीकेः',
    '६.५% दसादशे',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings (cosine similarity by default)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
Evaluation
Metrics
Translation
| Metric            | Value     |
|:------------------|:----------|
| src2trg_accuracy  | 0.616     |
| trg2src_accuracy  | 0.604     |
| **mean_accuracy** | **0.61**  |
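The translation metrics above come from a bidirectional retrieval check: for each source sentence, the nearest target embedding by cosine similarity should be its own translation (src2trg), and symmetrically for trg2src; mean_accuracy averages the two. A NumPy sketch of that computation (the function name is illustrative, not the evaluator's API):

```python
import numpy as np

def translation_accuracy(src_emb, trg_emb):
    """Illustrative re-creation of the translation metric: embedding i of
    src_emb and embedding i of trg_emb are assumed to be translations."""
    # Normalize so the dot product equals cosine similarity
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    trg = trg_emb / np.linalg.norm(trg_emb, axis=1, keepdims=True)
    sims = src @ trg.T                                 # cosine similarity matrix
    # Accuracy = fraction of rows/columns whose argmax lies on the diagonal
    src2trg = float((sims.argmax(axis=1) == np.arange(len(src))).mean())
    trg2src = float((sims.argmax(axis=0) == np.arange(len(trg))).mean())
    return src2trg, trg2src, (src2trg + trg2src) / 2
```

With perfectly aligned embeddings all three values are 1.0; each misranked translation lowers the corresponding direction's accuracy by 1/N.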
Training Details
Training Dataset
Unnamed Dataset
Evaluation Dataset
Unnamed Dataset
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 32
- num_train_epochs: 5
- max_steps: 12000
- learning_rate: 2e-05
- warmup_steps: 500
- gradient_accumulation_steps: 4
- bf16: True
- eval_strategy: steps
- load_best_model_at_end: True
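Note that with gradient accumulation, the batch size per optimizer step is larger than the per-device figure. A quick check of the settings above, assuming single-device training:

```python
# Effective batch size per optimizer step (assumes a single training device)
per_device_train_batch_size = 32
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 128
```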
All Hyperparameters
- per_device_train_batch_size: 32
- num_train_epochs: 5
- max_steps: 12000
- learning_rate: 2e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 500
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 4
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: True
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: steps
- per_device_eval_batch_size: 8
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: True
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | eval-en-sa_mean_accuracy |
|:------:|:-----:|:-------------:|:---------------:|:------------------------:|
| 0.0034 | 100 | 3.1353 | - | - |
| 0.0068 | 200 | 2.7273 | - | - |
| 0.0102 | 300 | 1.8263 | - | - |
| 0.0137 | 400 | 1.1810 | - | - |
| 0.0171 | 500 | 0.8952 | - | - |
| 0.0205 | 600 | 0.7068 | - | - |
| 0.0239 | 700 | 0.5979 | - | - |
| 0.0273 | 800 | 0.5412 | - | - |
| 0.0307 | 900 | 0.5255 | - | - |
| 0.0341 | 1000 | 0.4847 | 0.2013 | 0.5045 |
| 0.0376 | 1100 | 0.4752 | - | - |
| 0.0410 | 1200 | 0.4645 | - | - |
| 0.0444 | 1300 | 0.4173 | - | - |
| 0.0478 | 1400 | 0.4220 | - | - |
| 0.0512 | 1500 | 0.4163 | - | - |
| 0.0546 | 1600 | 0.3978 | - | - |
| 0.0580 | 1700 | 0.3895 | - | - |
| 0.0614 | 1800 | 0.3778 | - | - |
| 0.0649 | 1900 | 0.3904 | - | - |
| 0.0683 | 2000 | 0.3656 | 0.1436 | 0.563 |
| 0.0717 | 2100 | 0.3565 | - | - |
| 0.0751 | 2200 | 0.3526 | - | - |
| 0.0785 | 2300 | 0.3632 | - | - |
| 0.0819 | 2400 | 0.3468 | - | - |
| 0.0853 | 2500 | 0.3506 | - | - |
| 0.0888 | 2600 | 0.3505 | - | - |
| 0.0922 | 2700 | 0.3466 | - | - |
| 0.0956 | 2800 | 0.3422 | - | - |
| 0.0990 | 2900 | 0.3393 | - | - |
| 0.1024 | 3000 | 0.3345 | 0.1240 | 0.587 |
| 0.1058 | 3100 | 0.3238 | - | - |
| 0.1092 | 3200 | 0.3230 | - | - |
| 0.1127 | 3300 | 0.3281 | - | - |
| 0.1161 | 3400 | 0.3246 | - | - |
| 0.1195 | 3500 | 0.3111 | - | - |
| 0.1229 | 3600 | 0.3092 | - | - |
| 0.1263 | 3700 | 0.3187 | - | - |
| 0.1297 | 3800 | 0.3293 | - | - |
| 0.1331 | 3900 | 0.3246 | - | - |
| 0.1366 | 4000 | 0.3174 | 0.1165 | 0.598 |
| 0.1400 | 4100 | 0.3213 | - | - |
| 0.1434 | 4200 | 0.3167 | - | - |
| 0.1468 | 4300 | 0.3142 | - | - |
| 0.1502 | 4400 | 0.3070 | - | - |
| 0.1536 | 4500 | 0.3094 | - | - |
| 0.1570 | 4600 | 0.3084 | - | - |
| 0.1604 | 4700 | 0.3068 | - | - |
| 0.1639 | 4800 | 0.3060 | - | - |
| 0.1673 | 4900 | 0.3020 | - | - |
| 0.1707 | 5000 | 0.3072 | 0.1133 | 0.6045 |
| 0.1741 | 5100 | 0.3151 | - | - |
| 0.1775 | 5200 | 0.3121 | - | - |
| 0.1809 | 5300 | 0.3059 | - | - |
| 0.1843 | 5400 | 0.3069 | - | - |
| 0.1878 | 5500 | 0.3069 | - | - |
| 0.1912 | 5600 | 0.3134 | - | - |
| 0.1946 | 5700 | 0.3017 | - | - |
| 0.1980 | 5800 | 0.3088 | - | - |
| 0.2014 | 5900 | 0.3011 | - | - |
| 0.2048 | 6000 | 0.3075 | 0.1109 | 0.608 |
| 0.2082 | 6100 | 0.2957 | - | - |
| 0.2117 | 6200 | 0.3049 | - | - |
| 0.2151 | 6300 | 0.2994 | - | - |
| 0.2185 | 6400 | 0.2951 | - | - |
| 0.2219 | 6500 | 0.3116 | - | - |
| 0.2253 | 6600 | 0.3155 | - | - |
| 0.2287 | 6700 | 0.2938 | - | - |
| 0.2321 | 6800 | 0.2824 | - | - |
| 0.2355 | 6900 | 0.2973 | - | - |
| 0.2390 | 7000 | 0.3111 | 0.1100 | 0.6065 |
| 0.2424 | 7100 | 0.2973 | - | - |
| 0.2458 | 7200 | 0.2995 | - | - |
| 0.2492 | 7300 | 0.2962 | - | - |
| 0.2526 | 7400 | 0.2994 | - | - |
| 0.2560 | 7500 | 0.2964 | - | - |
| 0.2594 | 7600 | 0.2997 | - | - |
| 0.2629 | 7700 | 0.2932 | - | - |
| 0.2663 | 7800 | 0.2993 | - | - |
| 0.2697 | 7900 | 0.2987 | - | - |
| 0.2731 | 8000 | 0.2898 | 0.1084 | 0.6085 |
| 0.2765 | 8100 | 0.3007 | - | - |
| 0.2799 | 8200 | 0.2935 | - | - |
| 0.2833 | 8300 | 0.2885 | - | - |
| 0.2868 | 8400 | 0.3021 | - | - |
| 0.2902 | 8500 | 0.2958 | - | - |
| 0.2936 | 8600 | 0.3056 | - | - |
| 0.2970 | 8700 | 0.2908 | - | - |
| 0.3004 | 8800 | 0.3096 | - | - |
| 0.3038 | 8900 | 0.2924 | - | - |
| 0.3072 | 9000 | 0.3019 | 0.1077 | 0.607 |
| 0.3107 | 9100 | 0.2985 | - | - |
| 0.3141 | 9200 | 0.2906 | - | - |
| 0.3175 | 9300 | 0.2961 | - | - |
| 0.3209 | 9400 | 0.3044 | - | - |
| 0.3243 | 9500 | 0.3005 | - | - |
| 0.3277 | 9600 | 0.2943 | - | - |
| 0.3311 | 9700 | 0.2948 | - | - |
| 0.3345 | 9800 | 0.3046 | - | - |
| 0.3380 | 9900 | 0.2948 | - | - |
| 0.3414 | 10000 | 0.3060 | 0.1083 | 0.608 |
| 0.3448 | 10100 | 0.2906 | - | - |
| 0.3482 | 10200 | 0.2958 | - | - |
| 0.3516 | 10300 | 0.2919 | - | - |
| 0.3550 | 10400 | 0.3041 | - | - |
| 0.3584 | 10500 | 0.3055 | - | - |
| 0.3619 | 10600 | 0.2975 | - | - |
| 0.3653 | 10700 | 0.2984 | - | - |
| 0.3687 | 10800 | 0.2883 | - | - |
| 0.3721 | 10900 | 0.2949 | - | - |
| 0.3755 | 11000 | 0.2987 | 0.1083 | 0.6085 |
| 0.3789 | 11100 | 0.2938 | - | - |
| 0.3823 | 11200 | 0.2942 | - | - |
| 0.3858 | 11300 | 0.2879 | - | - |
| 0.3892 | 11400 | 0.2909 | - | - |
| 0.3926 | 11500 | 0.2899 | - | - |
| 0.3960 | 11600 | 0.2921 | - | - |
| 0.3994 | 11700 | 0.2944 | - | - |
| 0.4028 | 11800 | 0.2985 | - | - |
| 0.4062 | 11900 | 0.3027 | - | - |
| **0.4097** | **12000** | **0.2988** | **0.1082** | **0.61** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.18
- Sentence Transformers: 5.2.3
- Transformers: 5.2.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.12.0
- Datasets: 3.3.2
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
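The MultipleNegativesRankingLoss cited above scores each (anchor, positive) pair against every other in-batch positive as a negative: scaled cosine similarities are fed to a cross-entropy whose target is the matching pair. A minimal NumPy sketch (an illustrative reimplementation, not the library's code; `scale=20.0` mirrors the Sentence Transformers default):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """Illustrative sketch: embedding i of `anchors` and embedding i of
    `positives` form a positive pair; every other positive in the batch
    serves as an in-batch negative."""
    # Normalize so the dot product equals cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)            # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (the true pairs) as the target class
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

When anchors and positives are perfectly aligned the loss is near zero; shuffling the positives so no pair matches drives it up sharply.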