This is a sentence-transformers model fine-tuned from buddhist-nlp/buddhist-sentence-similarity. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
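The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the real model uses 768 dimensions and tensor operations).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


# Toy example: three token slots, the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Masking matters because sentences in a batch are padded to a common length; averaging over padding positions would distort the embedding of shorter sentences.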
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'ན་མོ་དེ་དག་ལ་སོགས་པ་ཕྱོགས་བཅུ′ཨི་′ཇིག་རྟེན་གྱི་ཁམས་ཐམས་ཅད་ན༑་དེ་བཞིན་གཤེགས་པ་དགྲ་བཅོམ་པ་ཡང་དག་པར་རྫོགས་པ′ཨི་སངས་རྒྱས་བཅོམ་ལྡན་′དས་གང་ཇི་སྙེད་ཅིག་བཞུགས་ཏེ་′ཚོ་ཞིང་གཞེས་པ′ཨི་སངས་རྒྱས་བཅོམ་ལྡན་′དས་དེ་དག་ཐམས་ཅད་བདག་ལ་དགོངས་སུ་གསོལ༑',
'tede terigüüten arban zügiyin xamuq yertüncüyin oron-du ilaγun tögüsün üleqsen tögünčilen boluqsan dayini durun sayitur dousuqsan burxad ali kedüi soun-yin tālaxui xamuq ilaγun tögüsüq/sen burxad namai ayiladun soyirxo:',
'subudi ali bodhi-sadv-nar eyin kemēn: bi oroni zoҟōl-noγoudi bütēmüi kemēn ögüülekülē: töüni basa tögünčilen ögüülen bü üyiled:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
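To make the `[3, 3]` similarity matrix concrete, here is a small sketch of a pairwise similarity computation. It assumes cosine similarity, which is the usual choice for models trained with a cosine-based loss but is not confirmed for this model's configuration. The helper names and the toy 2-dimensional vectors are hypothetical stand-ins for the real 768-dimensional embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares item i with item j."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 4) for value in row])
```

The diagonal is 1 because each embedding is compared with itself, and the matrix is symmetric, so only the entries above the diagonal carry new information.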
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| sentence_0 | sentence_1 |
|---|---|
| དེ་ལྟ་བུ་དེས༑་ཐོ་རངས༑་སྔ་དྲོ༑་ཕྱི་དྲོ༑་སྲོད་དེ་ཐུན་བཞི་ལ་བསྒོམ་པར་བྱ་སྟེ༑་དེ་ཡང་དང་པོར་ཡུན་རིངས་ན་བྱིང་རྒོད་ཀྱི་དབང་དུ་འགྲོ་སླ་ཞིང༑་དེ་ལ་གོམས་ན་བློ་འཆོས་དཀའ་བས༑་ཐུན་ཆུང་ལ་གྲངས་མང་བར་བྱ༑ | tere metü töügēr örlȫbür erte oroi üdeši bür tere dörbön xübidü bišilγan üyiled tere čü urida öni odxulā čibkü doqšir xoxuyin erkēr odxu kilbar bolun: töün-dü bišilxulā oyou-bēn yasaxu kerke yin tula mün: xubi oxor {olo} üyiled: |
| འཛམ་བུའི་གླིང་ན་ངེད་ཀྱི་བུ༑༑་འདི་ལྟར་ཆོས་ལྡན་རྒྱལ་པོ་ཡིན༑༑་ཆོས་ཀྱི་ཡུལ་འཁོར་དེར་བསྟན་ནས༑༑་ལེགས་པར་བྱེད་ལ་སྐྱེ་བོ་བཀོད༑༑ | zambutib/-tu mani küböün: ene metü nom/toi xān mün: nomiyin oron orčin tende üzüülēd: sayitur üyiledküi-dü tӧrӧlkitӧni zoҟōxu:: |
| ཌེ་བཞིན་གཤེགས་པ་བརྒྱད་ཀྱི་རིག་སྔགས་′དི་བརྗོད་པ་ཙམ་གྱིས་′གྲོ་བ་རིགས་དྲུག་གི་སེམས་ཅན་རྣམས་སོ་སོ′ཨི་སྡུག་བསྔལ་རྣམས་ཞི་ནས་བདེ་ལེགས་སུ་གྱུར་ཅིག༑ | nayiman bodhi sadv-yin zarliq tögünčilen ireqsen züreken öü/ni ögüüleqseni tödüi-gēr zur/γān züyil xamuq amitani tus-buri/yin zobolong amurlīd: amuγuu/lang xotolo tögüskü boltuγai:: |
MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
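As a rough illustration of what these two parameters do, the sketch below computes an in-batch ranking loss in pure Python: each `sentence_0` is scored against every `sentence_1` in the batch with cosine similarity multiplied by `scale`, and cross-entropy rewards the matching pair. This is a simplified reading of the loss described by Henderson et al. (2017), not the library's code; the function names and toy vectors are assumptions.

```python
import math
from typing import Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        top = max(logits)  # subtract the maximum for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch: correctly aligned pairs give a loss near zero,
# while swapped pairs are penalised heavily.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

Because cosine similarity is bounded in [-1, 1], the `scale` factor sharpens the softmax; with a scale of 1 even a perfect match could not drive the loss close to zero.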
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 40
- fp16: True
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 40
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss |
|---|---|---|
| 1.0 | 231 | - |
| 1.2987 | 300 | - |
| 2.0 | 462 | - |
| 2.1645 | 500 | 0.6161 |
| 2.5974 | 600 | - |
| 3.0 | 693 | - |
| 3.8961 | 900 | - |
| 4.0 | 924 | - |
| 4.3290 | 1000 | 0.0751 |
| 5.0 | 1155 | - |
| 5.1948 | 1200 | - |
| 6.0 | 1386 | - |
| 6.4935 | 1500 | 0.0292 |
| 7.0 | 1617 | - |
| 7.7922 | 1800 | - |
| 8.0 | 1848 | - |
| 8.6580 | 2000 | 0.0158 |
| 9.0 | 2079 | - |
| 9.0909 | 2100 | - |
| 10.0 | 2310 | - |
| 10.3896 | 2400 | - |
| 10.8225 | 2500 | 0.011 |
| 11.0 | 2541 | - |
| 11.6883 | 2700 | - |
| 12.0 | 2772 | - |
| 12.9870 | 3000 | 0.0102 |
| 13.0 | 3003 | - |
| 14.0 | 3234 | - |
| 14.2857 | 3300 | - |
| 15.0 | 3465 | - |
| 15.1515 | 3500 | 0.0121 |
| 15.5844 | 3600 | - |
| 16.0 | 3696 | - |
| 16.8831 | 3900 | - |
| 17.0 | 3927 | - |
| 17.3160 | 4000 | 0.0087 |
| 18.0 | 4158 | - |
| 18.1818 | 4200 | - |
| 19.0 | 4389 | - |
| 19.4805 | 4500 | 0.0078 |
| 20.0 | 4620 | - |
| 20.7792 | 4800 | - |
| 21.0 | 4851 | - |
| 21.6450 | 5000 | 0.0083 |
| 22.0 | 5082 | - |
| 22.0779 | 5100 | - |
| 23.0 | 5313 | - |
| 23.3766 | 5400 | - |
| 23.8095 | 5500 | 0.0083 |
| 24.0 | 5544 | - |
| 24.6753 | 5700 | - |
| 25.0 | 5775 | - |
| 25.9740 | 6000 | 0.0065 |
| 26.0 | 6006 | - |
| 27.0 | 6237 | - |
| 27.2727 | 6300 | - |
| 28.0 | 6468 | - |
| 28.1385 | 6500 | 0.0059 |
| 28.5714 | 6600 | - |
| 29.0 | 6699 | - |
| 29.8701 | 6900 | - |
| 30.0 | 6930 | - |
| 30.3030 | 7000 | 0.007 |
| 31.0 | 7161 | - |
| 31.1688 | 7200 | - |
| 32.0 | 7392 | - |
| 32.4675 | 7500 | 0.0058 |
| 33.0 | 7623 | - |
| 33.7662 | 7800 | - |
| 34.0 | 7854 | - |
| 34.6320 | 8000 | 0.0043 |
| 35.0 | 8085 | - |
| 35.0649 | 8100 | - |
| 36.0 | 8316 | - |
| 36.3636 | 8400 | - |
| 36.7965 | 8500 | 0.0044 |
| 37.0 | 8547 | - |
| 37.6623 | 8700 | - |
| 38.0 | 8778 | - |
| 38.9610 | 9000 | 0.0059 |
| 39.0 | 9009 | - |
| 40.0 | 9240 | - |
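The Epoch column in the table follows directly from the step count: epoch 1.0 ends at step 231, so each fractional epoch is the step divided by 231. The sketch below reproduces that arithmetic and also estimates the learning rate at a given step. It assumes a single device, no gradient accumulation, and that the `linear` scheduler with zero warmup decays the rate from 5e-05 to 0 over all 9240 steps; the function names are illustrative only.

```python
STEPS_PER_EPOCH = 231   # from the table: epoch 1.0 ends at step 231
NUM_EPOCHS = 40
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 9240, the final row of the table
BATCH_SIZE = 16
BASE_LR = 5e-05


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / STEPS_PER_EPOCH


def linear_lr(step: int) -> float:
    """Learning rate under an assumed linear decay to zero with no warmup."""
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / TOTAL_STEPS)


def dataset_size_bounds() -> tuple:
    """Range of training-set sizes n with ceil(n / BATCH_SIZE) == STEPS_PER_EPOCH,
    assuming one device and that the last partial batch is kept."""
    return ((STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1, STEPS_PER_EPOCH * BATCH_SIZE)


print(round(epoch_at(500), 4))    # matches the 2.1645 row in the table
print(linear_lr(TOTAL_STEPS // 2))
print(dataset_size_bounds())
```

Under these assumptions the training set holds between 3681 and 3696 sentence pairs, which gives a sense of how many passes over a small dataset the 40 epochs represent.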
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}