---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:262023
- loss:MultipleNegativesRankingLoss
base_model: intfloat/e5-base-v2
widget:
- source_sentence: "query: A discerning person keeps wisdom in view,\n but a fool’s eyes wander to the ends of the earth."
  sentences:
  - "passage: A foolish son brings grief to his father\n and bitterness to the mother who bore him."
  - 'passage: But whoever lives by the truth comes into the light, so that it may be seen plainly that what they have done has been done in the sight of God.'
  - 'passage: In the past, while Saul was king over us, you were the one who led Israel on their military campaigns. And the Lord said to you, ‘You will shepherd my people Israel, and you will become their ruler.’”'
- source_sentence: 'query: Who was Joanna in the Bible?'
  sentences:
  - 'passage: Joanna the wife of Chuza, the manager of Herod’s household; Susanna; and many others. These women were helping to support them out of their own means.'
  - 'passage: Meanwhile, Horam king of Gezer had come up to help Lachish, but Joshua defeated him and his army—until no survivors were left.'
  - 'passage: As they were going out, they met a man from Cyrene, named Simon, and they forced him to carry the cross.'
- source_sentence: 'query: Girdle meaning'
  sentences:
  - 'passage: But Joseph said, “Far be it from me to do such a thing! Only the man who was found to have the cup will become my slave. The rest of you, go back to your father in peace.”'
  - "passage: He takes off the shackles put on by kings\n and ties a loincloth around their waist."
  - 'passage: In the tent of meeting, outside the curtain that shields the ark of the covenant law, Aaron and his sons are to keep the lamps burning before the Lord from evening till morning. This is to be a lasting ordinance among the Israelites for the generations to come.'
- source_sentence: 'query: The event ''Blind Man Healed'' as recorded in Scripture, involving Jesus.'
  sentences:
  - 'passage: Then he said: “Praise be to the Lord, the God of Israel, who with his own hand has fulfilled what he promised with his own mouth to my father David. For he said,'
  - 'passage: After Terah had lived 70 years, he became the father of Abram, Nahor and Haran.'
  - 'passage: Jesus said, “For judgment I have come into this world, so that the blind will see and those who see will become blind.”'
- source_sentence: 'query: Law meaning'
  sentences:
  - "passage: “I will record Rahab and Babylon\n among those who acknowledge me—\nPhilistia too, and Tyre, along with Cush—\n and will say, ‘This one was born in Zion.’”"
  - "passage: Your plunder, O nations, is harvested as by young locusts;\n like a swarm of locusts people pounce on it."
  - 'passage: For truly I tell you, until heaven and earth disappear, not the smallest letter, not the least stroke of a pen, will by any means disappear from the Law until everything is accomplished.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on intfloat/e5-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'query: Law meaning',
    'passage: For truly I tell you, until heaven and earth disappear, not the smallest letter, not the least stroke of a pen, will by any means disappear from the Law until everything is accomplished.',
    'passage: “I will record Rahab and Babylon\n among those who acknowledge me—\nPhilistia too, and Tyre, along with Cush—\n and will say, ‘This one was born in Zion.’”',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7034, 0.5718],
#         [0.7034, 1.0000, 0.6188],
#         [0.5718, 0.6188, 1.0000]])
```

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 262,023 training samples
* Columns: `sentence_0`, `sentence_1`, and `label`
* Approximate statistics based on the first 1000 samples:

  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string     | string     | float |
  | details |            |            |       |

* Samples:

  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | query: Messiah: (Heb. mashiah), in all the thirty-nine instances of its occurring in the Old Testament, is rendered by the LXX. “Christos.” It means anointed. Thus priests (Ex. 28:41; 40:15; Num. 3:3), prophets (1 Kings 19:16), and kings (1 Sam. 9:16; 16:3; 2 Sam. 12:7) were anointed with oil, and so consecrated to their respective offices. The great Messiah is anointed “above his fellows” (Ps. 45:7); i.e., he embraces in himself all the three offices. | passage: Anoint them just as you anointed their father, so they may serve me as priests. Their anointing will be to a priesthood that will continue throughout their generations.” | 1.0 |
  | query: who was Toi | passage: he sent his son Joram to King David to greet him and congratulate him on his victory in battle over Hadadezer, who had been at war with Tou. Joram brought with him articles of silver, of gold and of bronze. | 1.0 |
  | query: God | passage: Bring the grain offering made of these things to the Lord; present it to the priest, who shall take it to the altar. | 1.0 |

* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 1
- `max_steps`: 25
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: 25
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
### Framework Versions
- Python: 3.11.14
- Sentence Transformers: 5.2.0
- Transformers: 4.57.6
- PyTorch: 2.10.0+cpu
- Accelerate: 1.12.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```