hf-e5-bible-75 / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:262023
  - loss:MultipleNegativesRankingLoss
base_model: intfloat/e5-base-v2
widget:
  - source_sentence: 'query: what happened at holy week'
    sentences:
      - >-
        passage: He replied, “Go into the city to a certain man and tell him,
        ‘The Teacher says: My appointed time is near. I am going to celebrate
        the Passover with my disciples at your house.’”
      - |-
        passage: Many nations will come and say,
        “Come, let us go up to the mountain of the Lord,
            to the temple of the God of Jacob.
        He will teach us his ways,
            so that we may walk in his paths.”
        The law will go out from Zion,
            the word of the Lord from Jerusalem.
      - >-
        passage: But seek first his kingdom and his righteousness, and all these
        things will be given to you as well.
  - source_sentence: 'query: what is Cheek'
    sentences:
      - >-
        passage: But I tell you, do not resist an evil person. If anyone slaps
        you on the right cheek, turn to them the other cheek also.
      - |-
        passage: “I am the Lord; that is my name!
            I will not yield my glory to another
            or my praise to idols.
      - 'passage: Ham'
  - source_sentence: 'query: what happened at prophecies of isaiah'
    sentences:
      - >-
        passage: The Israelites who were present in Jerusalem celebrated the
        Festival of Unleavened Bread for seven days with great rejoicing, while
        the Levites and priests praised the Lord every day with resounding
        instruments dedicated to the Lord.
      - |-
        passage: The blacksmith takes a tool
            and works with it in the coals;
        he shapes an idol with hammers,
            he forges it with the might of his arm.
        He gets hungry and loses his strength;
            he drinks no water and grows faint.
      - >-
        passage: “Take a census of the whole Israelite community by their clans
        and families, listing every man by name, one by one.
  - source_sentence: 'query: who was God'
    sentences:
      - >-
        passage: he will not listen to you. Then I will lay my hand on Egypt and
        with mighty acts of judgment I will bring out my divisions, my people
        the Israelites.
      - >-
        passage: I looked, and there before me was a white horse! Its rider held
        a bow, and he was given a crown, and he rode out as a conqueror bent on
        conquest.
      - >-
        passage: The Lord your God will put all these curses on your enemies who
        hate and persecute you.
  - source_sentence: 'query: Moses'
    sentences:
      - >-
        passage: King Ahaz cut off the side panels and removed the basins from
        the movable stands. He removed the Sea from the bronze bulls that
        supported it and set it on a stone base.
      - >-
        passage: “Gad will have one portion; it will border the territory of
        Zebulun from east to west.
      - >-
        passage: Now Joshua son of Nun was filled with the spirit of wisdom
        because Moses had laid his hands on him. So the Israelites listened to
        him and did what the Lord had commanded Moses.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on intfloat/e5-base-v2

This is a sentence-transformers model fine-tuned from intfloat/e5-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-base-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
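The architecture above applies mean pooling over the transformer's token embeddings, then L2-normalizes the result. The pooling and normalization steps can be sketched numerically; the token embeddings and attention mask below are made-up stand-ins for what the BertModel stage would produce:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, counting only non-padding tokens.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid div-by-zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, matching the Normalize() module."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Dummy batch: 2 sequences, 4 tokens, 768-dim embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0],   # last token of the first sequence is padding
                 [1, 1, 1, 1]])

emb = l2_normalize(mean_pool(tokens, mask))
print(emb.shape)                    # (2, 768)
print(np.linalg.norm(emb, axis=1))  # both rows have unit norm
```

Because the final embeddings are unit-normalized, cosine similarity reduces to a dot product, which is why cosine is listed as the similarity function above.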

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dpshade22/hf-e5-bible-75")
# Run inference
sentences = [
    'query: Moses',
    'passage: Now Joshua son of Nun was filled with the spirit of wisdom because Moses had laid his hands on him. So the Israelites listened to him and did what the Lord had commanded Moses.',
    'passage: King Ahaz cut off the side panels and removed the basins from the movable stands. He removed the Sea from the bronze bulls that supported it and set it on a stone base.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6713, 0.4499],
#         [0.6713, 1.0000, 0.4162],
#         [0.4499, 0.4162, 1.0000]])
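Note the `query: ` and `passage: ` prefixes in the sentences above: like its base model intfloat/e5-base-v2, this model was trained with E5-style prefixes, so search queries and corpus passages should carry them before encoding. A small helper keeps that from being forgotten (the function names here are illustrative, not part of the library):

```python
def add_query_prefix(texts):
    """Prefix search queries in the E5 training format."""
    return [f"query: {t}" for t in texts]

def add_passage_prefix(texts):
    """Prefix corpus passages in the E5 training format."""
    return [f"passage: {t}" for t in texts]

queries = add_query_prefix(["Moses", "what happened at holy week"])
passages = add_passage_prefix(
    ["Now Joshua son of Nun was filled with the spirit of wisdom..."]
)
print(queries[0])  # query: Moses
# The prefixed strings are what you pass to model.encode(...)
```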

Training Details

Training Dataset

Unnamed Dataset

  • Size: 262,023 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0                                         | sentence_1                                        | label                         |
    |---------|----------------------------------------------------|---------------------------------------------------|-------------------------------|
    | type    | string                                             | string                                            | float                         |
    | details | min: 5 tokens, mean: 26.15 tokens, max: 256 tokens | min: 5 tokens, mean: 35.44 tokens, max: 93 tokens | min: 1.0, mean: 1.0, max: 1.0 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |------------|------------|-------|
    | query: story of prophecies of isaiah | passage: The oxen and donkeys that work the soil will eat fodder and mash, spread out with fork and shovel. | 1.0 |
    | query: Abraham and Lot | passage: Now Lot, who was moving about with Abram, also had flocks and herds and tents. | 1.0 |
    | query: Why were the blind and lame not allowed to “enter the house” (2 Samuel 5:8)? | passage: And Mephibosheth lived in Jerusalem, because he always ate at the king’s table; he was lame in both feet. | 1.0 |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
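MultipleNegativesRankingLoss treats each in-batch (query, passage) pair as the positive and every other passage in the batch as a negative: cosine similarities are scaled by 20.0 and fed to a cross-entropy loss whose target for row i is column i. A minimal NumPy sketch of that computation (not the library's implementation) looks like this:

```python
import numpy as np

def mnr_loss(query_emb: np.ndarray, passage_emb: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; row i's positive is column i."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    scores = scale * (q @ p.T)                   # (batch, batch) similarity matrix
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    return float(-log_probs[idx, idx].mean())    # diagonal entries are the positives

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
aligned = mnr_loss(q, q)                 # matching pairs -> low loss
shuffled = mnr_loss(q, q[::-1].copy())   # mismatched pairs -> higher loss
print(aligned < shuffled)                # True
```

With a per-device batch size of 32, each query is contrasted against 31 in-batch negatives, which is why the dataset only needs positive pairs (label 1.0 everywhere).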
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 1
  • max_steps: 75
  • multi_dataset_batch_sampler: round_robin
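The non-default values above can be reproduced with the Sentence Transformers trainer; this is a hedged sketch of the training-arguments fragment (the output directory name is made up), not the exact script used:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="hf-e5-bible-75",  # hypothetical output path
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    max_steps=75,  # takes precedence over num_train_epochs
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```

With `max_steps=75` and a batch size of 32, training stops after roughly 2,400 of the 262,023 pairs, well short of one full epoch.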

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: 75
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cpu
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}