SentenceTransformer based on buddhist-nlp/buddhist-sentence-similarity

This is a sentence-transformers model fine-tuned from buddhist-nlp/buddhist-sentence-similarity. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
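
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 4-dimensional vectors are illustrative assumptions (the real model works on 768-dimensional tensors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: 3 token positions, 4 dimensions; the last position is padding.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)
```

Because the padded position is masked out, it does not influence the resulting sentence vector.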

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LilNomto/labse_oi_bo")
# Run inference
sentences = [
    "ji ltar chos smra ba de'i lus mi ngal bar 'gyur ba dang| lus kyi dbang po bde bar 'gyur ba dang| rab tu dga' ba skye bar 'gyur ba dang| gang gi slad du sangs rgyas brgya stong dag la dge ba'i rtsa ba bskrun pa'i sems can rnams kyi don gyi slad du| gser 'od dam pa mdo sde'i dbang po'i rgyal po 'di 'dzam bu'i gling du yun ring du gnas shing myur du nub par mi 'gyur ba dang| sems can rnams kyang gser 'od dam pa mdo sde'i dbang po'i rgyal po 'di nyan par 'gyur ba dang| ye shes kyi phung po bsam gyis mi khyab pa thob par 'gyur ba dang| shes rab dang ldan par 'gyur ba dang| bsod nams kyi phung po rab tu 'dzin par 'gyur ba dang| ma 'ongs pa'i dus na bskal pa bye ba khrag khrig brgya stong phrag du mar lha dang mi'i bde ba bsam gyis mi khyab pa myong bar 'gyur ba dang| de bzhin gshegs pa dang 'grogs par 'gyur ba dang| ma 'ongs pa'i dus na bla na med pa yang dag par rdzogs pa'i byang chub mngon par rdzogs par 'tshang rgya bar 'gyur ba dang| sems can dmyal ba dang| dud 'gro'i skye gnas dang| gshin rje'i 'jig rten gyi sdug bsngal thams cad shin tu rgyun chad par 'gyur bar de'i spu'i bu ga rnams su mdangs stsal bar bgyi'o||",
    'yamaru nom ӧgüüleqči dgeslong töüni beye ülü alzoulun üyiledün: beyeyin erketü-yi amuγuulang bolγon: sayitur bayasxan üyiledkü kigēd: keni tulada zoun mingγan burxan-noγoudtu buyani ündüsü öüskeqsen amitan-noγoudiyin tusayin tulada: suduriyin aimagiyin erketü xān dēdü altan gerel öüni: ‘zambutib-tu önidö orošiulun ötör ülü ecüdken üyiledkü kigēd: amitan-noγoudčü suduriyin ayimagiyin erketü xān dēdü altan gerel öüni sonosun üyiledkü kigēd: belge biligiyin coqco sedkiši ügei olun üyiledkü: biliq-lügē tögüsün üyiledkü kigēd: buyani coqco sayitur barin üyiledkü: irē ödüi caqtu olon zoun mingγan kraq kriq ǯeva γalab-tu: kümün tenggeriyin amuγuulang sedkiši ügei edlen üiledkü kigēd: tögünčilen boluqsan-luγā nӧkücün üyiledkü: irē ödüi caqtu dēre ügei sayitur dousuqsan bodhi-du ilerkei dousun burxan bolxu kigēd tamuyin amitan kigēd adousuni tӧrӧkui oron erligiyin yertünčüyin zobolong xamugi tasulun üyiledküye: tӧüni šara üsün-noγoud- tu önggü ogün üyiledümüi:',
    'dusul xōsun zoun üzüq:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
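
The `model.similarity` call above returns a matrix of pairwise scores. Assuming the model is configured for cosine similarity (the same function listed as `similarity_fct` for the training loss), each entry can be understood through the sketch below. The helper names and toy 2-dimensional vectors are illustrative assumptions, not part of the library.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities: one row and one column per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
scores = similarity_matrix(toy_embeddings)
for row in scores:
    print([round(value, 4) for value in row])
```

The diagonal is 1 because every vector is compared with itself, and the matrix is symmetric.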

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,779 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: type string; min 3 tokens, mean 56.43 tokens, max 426 tokens
    • sentence_1: type string; min 7 tokens, mean 57.5 tokens, max 397 tokens
  • Samples:
    sentence_0 sentence_1
    bdag ni khyod kyi srog bcod du 'ongs pas gsung rab mdo sde’i sgra thos pa tsam gyi bdag gi mthu stobs kyang rab tu nyams zhe sdang rnams kyang rab tu zhi zhing nyams las
    gang gi phyir ni 'byung ba mi 'byung ba
    sgyu ma smig rgyu lta bu chags med pa
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
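
To illustrate what these two parameters do, here is a rough pure-Python sketch of an in-batch-negatives ranking loss: each `sentence_0` is scored against every `sentence_1` in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy rewards the matching pair. This is a simplified illustration under those assumptions, not the library's code; the function names and toy vectors are hypothetical.

```python
import math
from typing import List


def cos_sim(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_ranking_loss(anchors: List[List[float]],
                          positives: List[List[float]],
                          scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two pairs with 2-dimensional embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

With correctly aligned pairs the loss is close to zero, while swapping the positives makes it large. A larger batch supplies more negatives per anchor.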
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 80
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
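
As a quick sanity check, the number of optimizer steps implied by these settings can be worked out from the dataset size. The sketch below assumes a single device, no gradient accumulation, and that the final partial batch is kept; the variable names are illustrative.

```python
import math

train_samples = 3779   # training set size reported above
batch_size = 16        # per_device_train_batch_size
epochs = 80            # num_train_epochs

# The last batch of each epoch is partial, so round up.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)                    # steps in one epoch
print(total_steps)                        # steps for the full schedule
print(round(500 / steps_per_epoch, 4))    # fractional epoch reached at step 500
```

Under these assumptions one epoch is 237 steps, which agrees with the training log entry for epoch 1.0.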

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 80
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
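
With `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and no warmup, the learning rate decays in a straight line from its initial value to zero over the planned run. The sketch below illustrates that shape; the function name `linear_lr` is hypothetical, and the default `total_steps=18960` is an assumption derived from 237 steps per epoch times 80 epochs.

```python
def linear_lr(step: int,
              base_lr: float = 5e-05,
              total_steps: int = 18960,
              warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 4740, 9480, 18960):
    print(step, linear_lr(step))
```

Halfway through the planned schedule the rate has dropped to half of its starting value.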

Training Logs

Epoch Step Training Loss
1.0 237 -
1.2658 300 -
2.0 474 -
2.1097 500 1.3285
2.5316 600 -
3.0 711 -
3.7975 900 -
4.0 948 -
4.2194 1000 0.4782
5.0 1185 -
5.0633 1200 -
6.0 1422 -
6.3291 1500 0.2195
7.0 1659 -
7.5949 1800 -
8.0 1896 -
8.4388 2000 0.1024
8.8608 2100 -
9.0 2133 -
10.0 2370 -
10.1266 2400 -
10.5485 2500 0.054
11.0 2607 -
11.3924 2700 -
12.0 2844 -
12.6582 3000 0.0277
13.0 3081 -
13.9241 3300 -
14.0 3318 -
14.7679 3500 0.0205
15.0 3555 -
15.1899 3600 -
16.0 3792 -
16.4557 3900 -
16.8776 4000 0.0173
17.0 4029 -
17.7215 4200 -
18.0 4266 -
18.9873 4500 0.0177
19.0 4503 -
20.0 4740 -
20.2532 4800 -
21.0 4977 -
21.0970 5000 0.0114
21.5190 5100 -
22.0 5214 -
22.7848 5400 -
23.0 5451 -
23.2068 5500 0.0115
24.0 5688 -
24.0506 5700 -
25.0 5925 -
25.3165 6000 0.0095
26.0 6162 -
26.5823 6300 -
27.0 6399 -
27.4262 6500 0.0123
27.8481 6600 -
28.0 6636 -
29.0 6873 -
29.1139 6900 -
29.5359 7000 0.0087
30.0 7110 -
30.3797 7200 -
31.0 7347 -
31.6456 7500 0.0074
32.0 7584 -
32.9114 7800 -
33.0 7821 -
33.7553 8000 0.0108
34.0 8058 -
34.1772 8100 -
35.0 8295 -
35.4430 8400 -
35.8650 8500 0.0074
36.0 8532 -
36.7089 8700 -
37.0 8769 -
37.9747 9000 0.0068
38.0 9006 -
39.0 9243 -
39.2405 9300 -
40.0 9480 -
40.0844 9500 0.0053
40.5063 9600 -
41.0 9717 -
41.7722 9900 -
42.0 9954 -
42.1941 10000 0.0066
43.0 10191 -
43.0380 10200 -
44.0 10428 -
44.3038 10500 0.0073
45.0 10665 -
45.5696 10800 -
46.0 10902 -
46.4135 11000 0.0067
46.8354 11100 -
47.0 11139 -
48.0 11376 -
48.1013 11400 -
48.5232 11500 0.0061
49.0 11613 -
49.3671 11700 -
50.0 11850 -
50.6329 12000 0.0062
51.0 12087 -
51.8987 12300 -
52.0 12324 -
52.7426 12500 0.0051
53.0 12561 -
53.1646 12600 -
54.0 12798 -
54.4304 12900 -
54.8523 13000 0.0052
55.0 13035 -
55.6962 13200 -
56.0 13272 -
56.9620 13500 0.004
57.0 13509 -
58.0 13746 -
58.2278 13800 -
59.0 13983 -
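
Most rows in the log carry no loss value (shown as `-`). A small parser like the one below can pull out only the rows where a training loss was recorded. It is a sketch using a handful of rows copied from the table above; the function name `parse_log` is an illustrative assumption.

```python
from typing import List, Tuple

# A few rows copied from the training log: epoch, step, training loss.
rows = """\
2.1097 500 1.3285
2.5316 600 -
4.2194 1000 0.4782
6.3291 1500 0.2195
8.4388 2000 0.1024
56.9620 13500 0.004
"""


def parse_log(text: str) -> List[Tuple[float, int, float]]:
    """Return (epoch, step, loss) for every row that has a recorded loss."""
    points = []
    for line in text.splitlines():
        epoch, step, loss = line.split()
        if loss == "-":
            continue  # no training loss was logged at this step
        points.append((float(epoch), int(step), float(loss)))
    return points


logged = parse_log(rows)
for epoch, step, loss in logged:
    print(f"step {step:>6}  epoch {epoch:>8.4f}  loss {loss}")
```

In this excerpt the loss appears only at steps that are multiples of 500, which would be consistent with a 500-step logging interval.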

Framework Versions

  • Python: 3.12.1
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.6.0.dev20240923+cu121
  • Accelerate: 1.12.0
  • Datasets: 4.4.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 0.5B params (Safetensors, tensor type F32)

Model tree for LilNomto/labse_oi_bo: finetuned from buddhist-nlp/buddhist-sentence-similarity