Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper
•
1908.10084
•
Published
•
12
This is a sentence-transformers model finetuned from intfloat/e5-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'query: God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> \'El</i> , from a word meaning to be strong; (2) of <i> \'Eloah_, plural _\'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: <li> The a priori argument, which is the testimony afforded by reason. <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) The cosmological, by which it is proved that there must be a First Cause of all things, for every effect must have a cause. (b) The teleological, or the argument from design. We see everywhere the operations of an intelligent Cause in nature. (c) The moral argument, called also the anthropological argument, based on the moral consciousness and the history of mankind, which exhibits a moral order and purpose which can only be explained on the supposition of the existence of God. Conscience and human history testify that "verily there is a God that judgeth in the earth." The attributes of God are set forth in order by Moses in Exodus 34:6 Exodus 34:7 . (see also Deuteronomy 6:4 ; 10:17 ; Numbers 16:22 ; Exodus 15:11 ; 33:19 ; Isaiah 44:6 ; Habakkuk 3:6 ; Psalms 102:26 ; Job 34:12 .) They are also systematically classified in Revelation 5:12 and 7:12 . God\'s attributes are spoken of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.; and relative, i.e., such as are ascribed to him with relation to his creatures. Others distinguish them into communicable, i.e., those which can be imparted in degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable, which cannot be so imparted: independence, immutability, immensity, and eternity. They are by some also divided into natural attributes, eternity, immensity, etc.; and moral, holiness, goodness, etc.',
'passage: How long, Lord, must I call for help,\n but you do not listen?\nOr cry out to you, “Violence!”\n but you do not save?',
'passage: Then each man grabbed his opponent by the head and thrust his dagger into his opponent’s side, and they fell down together. So that place in Gibeon was called Helkath Hazzurim.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4670, 0.3140],
# [0.4670, 1.0000, 0.4137],
# [0.3140, 0.4137, 1.0000]])
sentence_0, sentence_1, and label| sentence_0 | sentence_1 | label | |
|---|---|---|---|
| type | string | string | float |
| details |
|
|
|
| sentence_0 | sentence_1 | label |
|---|---|---|
query: To those who sold doves he said, “Get these out of here! Stop turning my Father’s house into a market!” |
passage: His disciples remembered that it is written: “Zeal for your house will consume me.” |
1.0 |
query: Joseph (son of Jacob) |
passage: Joseph found favor in his eyes and became his attendant. Potiphar put him in charge of his household, and he entrusted to his care everything he owned. |
1.0 |
query: Divination meaning |
passage: He sacrificed his children in the fire in the Valley of Ben Hinnom, practiced divination and witchcraft, sought omens, and consulted mediums and spiritists. He did much evil in the eyes of the Lord, arousing his anger. |
1.0 |
MultipleNegativesRankingLoss with these parameters:{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false
}
per_device_train_batch_size: 32per_device_eval_batch_size: 32num_train_epochs: 1max_steps: 100multi_dataset_batch_sampler: round_robinoverwrite_output_dir: Falsedo_predict: Falseeval_strategy: noprediction_loss_only: Trueper_device_train_batch_size: 32per_device_eval_batch_size: 32per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1num_train_epochs: 1max_steps: 100lr_scheduler_type: linearlr_scheduler_kwargs: Nonewarmup_ratio: 0.0warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falsebf16: Falsefp16: Falsefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}parallelism_config: Nonedeepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torch_fusedoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthproject: huggingfacetrackio_space_id: trackioddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsehub_revision: Nonegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: noneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseliger_kernel_config: Noneeval_use_gather_object: Falseaverage_tokens_across_devices: Trueprompts: Nonebatch_sampler: batch_samplermulti_dataset_batch_sampler: round_robinrouter_mapping: {}learning_rate_mapping: {}@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
intfloat/e5-base-v2