hf-e5-bible-25 / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:262023
  - loss:MultipleNegativesRankingLoss
base_model: intfloat/e5-base-v2
widget:
  - source_sentence: |-
      query: A discerning person keeps wisdom in view,
          but a fool’s eyes wander to the ends of the earth.
    sentences:
      - |-
        passage: A foolish son brings grief to his father
            and bitterness to the mother who bore him.
      - >-
        passage: But whoever lives by the truth comes into the light, so that it
        may be seen plainly that what they have done has been done in the sight
        of God.
      - >-
        passage: In the past, while Saul was king over us, you were the one who
        led Israel on their military campaigns. And the Lord said to you, ‘You
        will shepherd my people Israel, and you will become their ruler.’”
  - source_sentence: 'query: Who was Joanna in the Bible?'
    sentences:
      - >-
        passage: Joanna the wife of Chuza, the manager of Herod’s household;
        Susanna; and many others. These women were helping to support them out
        of their own means.
      - >-
        passage: Meanwhile, Horam king of Gezer had come up to help Lachish, but
        Joshua defeated him and his army—until no survivors were left.
      - >-
        passage: As they were going out, they met a man from Cyrene, named
        Simon, and they forced him to carry the cross.
  - source_sentence: 'query: Girdle meaning'
    sentences:
      - >-
        passage: But Joseph said, “Far be it from me to do such a thing! Only
        the man who was found to have the cup will become my slave. The rest of
        you, go back to your father in peace.”
      - |-
        passage: He takes off the shackles put on by kings
            and ties a loincloth around their waist.
      - >-
        passage: In the tent of meeting, outside the curtain that shields the
        ark of the covenant law, Aaron and his sons are to keep the lamps
        burning before the Lord from evening till morning. This is to be a
        lasting ordinance among the Israelites for the generations to come.
  - source_sentence: >-
      query: The event 'Blind Man Healed' as recorded in Scripture, involving
      Jesus.
    sentences:
      - >-
        passage: Then he said:

        “Praise be to the Lord, the God of Israel, who with his own hand has
        fulfilled what he promised with his own mouth to my father David. For he
        said,
      - >-
        passage: After Terah had lived 70 years, he became the father of Abram,
        Nahor and Haran.
      - >-
        passage: Jesus said, “For judgment I have come into this world, so that
        the blind will see and those who see will become blind.”
  - source_sentence: 'query: Law meaning'
    sentences:
      - |-
        passage: “I will record Rahab and Babylon
            among those who acknowledge me—
        Philistia too, and Tyre, along with Cush—
            and will say, ‘This one was born in Zion.’”
      - |-
        passage: Your plunder, O nations, is harvested as by young locusts;
            like a swarm of locusts people pounce on it.
      - >-
        passage: For truly I tell you, until heaven and earth disappear, not the
        smallest letter, not the least stroke of a pen, will by any means
        disappear from the Law until everything is accomplished.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on intfloat/e5-base-v2

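This is a sentence-transformers model fine-tuned from intfloat/e5-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. The training pairs and widget examples all carry the "query: " and "passage: " prefixes of the E5 base model, so inputs at inference time should use the same prefixes.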

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-base-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
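
The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes the token embeddings and attention mask have already been produced by the Transformer module. The function names `mean_pool` and `l2_normalize` are illustrative and not part of the Sentence Transformers API, and the toy vectors have 3 dimensions rather than 768.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in summed]


def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(value * value for value in vector))
    if norm == 0.0:
        return list(vector)
    return [value / norm for value in vector]


# Toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)    # padding row is excluded from the mean
embedding = l2_normalize(pooled)    # unit-length sentence embedding
print(pooled)
print(embedding)
```

Because the final module normalizes every embedding, cosine similarity and dot product give the same ranking for this model.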

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'query: Law meaning',
    'passage: For truly I tell you, until heaven and earth disappear, not the smallest letter, not the least stroke of a pen, will by any means disappear from the Law until everything is accomplished.',
    'passage: “I will record Rahab and Babylon\n    among those who acknowledge me—\nPhilistia too, and Tyre, along with Cush—\n    and will say, ‘This one was born in Zion.’”',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7034, 0.5718],
#         [0.7034, 1.0000, 0.6188],
#         [0.5718, 0.6188, 1.0000]])
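
For retrieval, a query is usually compared against passages only, and each input needs its E5-style prefix. The sketch below shows one way to add the prefixes and rank passages by score. The helpers `with_prefix` and `rank_passages` are hypothetical, the passage texts are abbreviated, and the scores are copied from the first row of the similarity matrix above rather than computed here.

```python
def with_prefix(texts, kind):
    """Prepend the 'query: ' or 'passage: ' prefix unless it is already present."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    prefix = f"{kind}: "
    return [t if t.startswith(prefix) else prefix + t for t in texts]


def rank_passages(scores, passages):
    """Return (passage, score) pairs sorted from most to least similar."""
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


queries = with_prefix(["Law meaning"], "query")
passages = with_prefix(
    [
        "I will record Rahab and Babylon among those who acknowledge me",
        "For truly I tell you, until heaven and earth disappear",
    ],
    "passage",
)

# With a loaded model, the scores would come from something like:
#   model.similarity(model.encode(queries), model.encode(passages))[0].tolist()
# Here they are taken from the matrix printed in the example above.
scores = [0.5718, 0.7034]

ranked = rank_passages(scores, passages)
for passage, score in ranked:
    print(f"{score:.4f}  {passage}")
```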

Training Details

Training Dataset

Unnamed Dataset

  • Size: 262,023 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 | label |
    |---------|------------|------------|-------|
    | type    | string     | string     | float |
    | details | min: 5 tokens; mean: 29.07 tokens; max: 256 tokens | min: 8 tokens; mean: 34.62 tokens; max: 94 tokens | min: 1.0; mean: 1.0; max: 1.0 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |------------|------------|-------|
    | query: Messiah: (Heb. mashiah), in all the thirty-nine instances of its occurring in the Old Testament, is rendered by the LXX. “Christos.” It means anointed. Thus priests (Ex. 28:41; 40:15; Num. 3:3), prophets (1 Kings 19:16), and kings (1 Sam. 9:16; 16:3; 2 Sam. 12:7) were anointed with oil, and so consecrated to their respective offices. The great Messiah is anointed “above his fellows” (Ps. 45:7); i.e., he embraces in himself all the three offices. | passage: Anoint them just as you anointed their father, so they may serve me as priests. Their anointing will be to a priesthood that will continue throughout their generations.” | 1.0 |
    | query: who was Toi | passage: he sent his son Joram to King David to greet him and congratulate him on his victory in battle over Hadadezer, who had been at war with Tou. Joram brought with him articles of silver, of gold and of bronze. | 1.0 |
    | query: God | passage: Bring the grain offering made of these things to the Lord; present it to the priest, who shall take it to the altar. | 1.0 |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
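
As a rough illustration of what these parameters control, the following pure-Python sketch computes a multiple-negatives ranking objective on toy vectors: each query is scored against every passage in the batch by cosine similarity, the scores are multiplied by `scale`, and a cross-entropy loss treats the paired passage as the correct class. This is a simplified sketch, not the library implementation. The names `cos_sim` and `mnr_loss` are illustrative, and multi-device gathering is not modeled.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings: each anchor is closest to its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]

aligned = mnr_loss(anchors, positives)          # pairs match: loss near zero
shuffled = mnr_loss(anchors, positives[::-1])   # pairs swapped: large loss
print(aligned, shuffled)
```

A larger batch supplies more in-batch negatives per query, which is why the batch size matters for this loss.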
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 1
  • max_steps: 25
  • multi_dataset_batch_sampler: round_robin
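
When `max_steps` is positive, the Hugging Face Trainer stops after that many optimizer steps regardless of `num_train_epochs`. The arithmetic below estimates how much of the dataset such a run sees. It assumes a single training device, which this card does not state, and uses the `gradient_accumulation_steps` value of 1 from the full hyperparameter list.

```python
import math

dataset_size = 262_023            # training samples reported in this card
per_device_batch_size = 32        # per_device_train_batch_size
max_steps = 25                    # max_steps
gradient_accumulation_steps = 1   # from the full hyperparameter list
num_devices = 1                   # assumption: single device

samples_seen = (
    max_steps * per_device_batch_size * gradient_accumulation_steps * num_devices
)
steps_per_epoch = math.ceil(dataset_size / (per_device_batch_size * num_devices))
fraction_seen = samples_seen / dataset_size

print(samples_seen)               # 800
print(steps_per_epoch)            # 8189
print(f"{fraction_seen:.2%}")     # 0.31%
```

Under these assumptions the run covers about 800 pairs, a small fraction of one epoch.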

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: 25
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
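
With `lr_scheduler_type: linear`, `warmup_steps: 0`, `learning_rate: 5e-05`, and `max_steps: 25`, the learning rate should decay linearly from the base value toward zero over the run. The sketch below reproduces that shape in plain Python. The function `linear_lr` is illustrative and is not the scheduler object the trainer actually builds.

```python
def linear_lr(step, base_lr=5e-05, total_steps=25, warmup_steps=0):
    """Learning rate at a given step for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Learning rate used for each of the 25 optimizer steps (steps 0..24).
schedule = [linear_lr(step) for step in range(25)]

print(schedule[0])     # first step uses the full base rate
print(schedule[12])    # roughly half the base rate mid-run
print(linear_lr(25))   # reaches zero once all steps are done
```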

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cpu
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}