e5-base-bible-50 / README.md
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:70323
  - loss:CosineSimilarityLoss
base_model: intfloat/e5-base-v2
widget:
  - source_sentence: 'Birth of Cainan | participants: cainan_534, enos_1193'
    sentences:
      - >-
        The mother of Sisera looked out at a window, and cried through the
        lattice, Why is his chariot so long in coming? why tarry the wheels of
        his chariots?
      - >-
        Therefore, behold, the days come, that I will do judgment upon the
        graven images of Babylon: and her whole land shall be confounded, and
        all her slain shall fall in the midst of her.
      - >-
        Which was the son of Mathusala, which was the son of Enoch, which was
        the son of Jared, which was the son of Maleleel, which was the son of
        Cainan,
  - source_sentence: >-
      Jerusalem Council | participants: silas_2740, judas_1759, james_719,
      peter_2745, barnabas_1722, paul_2479
    sentences:
      - >-
        What ailed thee, O thou sea, that thou fleddest? thou Jordan, that thou
        wast driven back?
      - >-
        We have sent therefore Judas and Silas, who shall also tell you the same
        things by mouth.
      - >-
        The Spirit itself beareth witness with our spirit, that we are the
        children of God:
  - source_sentence: >-
      But he that is married careth for the things that are of the world, how he
      may please his wife.
    sentences:
      - >-
        But she had brought them up to the roof of the house, and hid them with
        the stalks of flax, which she had laid in order upon the roof.
      - >-
        And their whole body, and their backs, and their hands, and their wings,
        and the wheels, were full of eyes round about, even the wheels that they
        four had.
      - >-
        There is difference also between a wife and a virgin. The unmarried
        woman careth for the things of the Lord, that she may be holy both in
        body and in spirit: but she that is married careth for the things of the
        world, how she may please her husband.
  - source_sentence: And the little owl, and the cormorant, and the great owl,
    sentences:
      - And the swan, and the pelican, and the gier eagle,
      - >-
        Take Aaron and his sons with him, and the garments, and the anointing
        oil, and a bullock for the sin offering, and two rams, and a basket of
        unleavened bread;
      - >-
        And his power shall be mighty, but not by his own power: and he shall
        destroy wonderfully, and shall prosper, and practise, and shall destroy
        the mighty and the holy people.
  - source_sentence: John's Witness
    sentences:
      - >-
        And they asked him, and said unto him, Why baptizest thou then, if thou
        be not that Christ, nor Elias, neither that prophet?
      - >-
        Then I took Jaazaniah the son of Jeremiah, the son of Habaziniah, and
        his brethren, and all his sons, and the whole house of the Rechabites;
      - >-
        But he turned, and said unto Peter, Get thee behind me, Satan: thou art
        an offence unto me: for thou savourest not the things that be of God,
        but those that be of men.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on intfloat/e5-base-v2

This is a sentence-transformers model finetuned from intfloat/e5-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-base-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
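In other words, the Transformer produces one vector per token, the Pooling module averages the vectors of non-padding tokens, and Normalize scales the result to unit length. A toy numpy sketch of those last two steps, with made-up numbers and tiny dimensions (the real model works with up to 128 tokens of 768 dimensions each):

```python
import numpy as np

# Toy "token embeddings" for one sentence: 4 token positions, dimension 3
token_embeddings = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 4.0, 0.0],
    [0.0, 2.0, 2.0],
    [2.0, 2.0, 0.0],
])
attention_mask = np.array([1, 1, 1, 0])  # last position is padding

# Pooling (pooling_mode_mean_tokens): average over non-padding tokens only
mask = attention_mask[:, None]
pooled = (token_embeddings * mask).sum(axis=0) / mask.sum()

# Normalize: rescale to unit L2 norm, so dot products equal cosine similarities
embedding = pooled / np.linalg.norm(pooled)

print(np.round(embedding, 4))  # [0.4851 0.7276 0.4851]
```

Masking before averaging matters: without it, padding positions would drag the mean toward zero and make the embedding depend on how much the batch was padded.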

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "John's Witness",
    'And they asked him, and said unto him, Why baptizest thou then, if thou be not that Christ, nor Elias, neither that prophet?',
    'Then I took Jaazaniah the son of Jeremiah, the son of Habaziniah, and his brethren, and all his sons, and the whole house of the Rechabites;',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7469, 0.7488],
#         [0.7469, 1.0000, 0.8236],
#         [0.7488, 0.8236, 1.0000]])
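Because the model ends in a Normalize module, the embeddings are unit-length, so the cosine similarity that `model.similarity` computes reduces to a plain matrix product. A toy numpy illustration with made-up unit vectors (not real embeddings):

```python
import numpy as np

# Pretend these are three unit-normalized sentence embeddings (dim 4 for brevity)
emb = np.array([
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])

# For unit vectors, cosine similarity is just the dot product,
# so the full similarity matrix is a single matrix product.
similarities = emb @ emb.T

print(similarities)
# The diagonal is 1.0: each embedding has cosine similarity 1 with itself,
# matching the similarity matrix printed above.
```

This is also why the diagonal of the printed tensor above is exactly 1.0000.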

Training Details

Training Dataset

Unnamed Dataset

  • Size: 70,323 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0 | sentence_1 | label |
    |---------|------------|------------|-------|
    | type    | string     | string     | float |
    | details | min: 3 tokens, mean: 53.54 tokens, max: 128 tokens | min: 5 tokens, mean: 35.99 tokens, max: 85 tokens | min: 0.0, mean: 0.99, max: 1.0 |
  • Samples:

    | sentence_0 | sentence_1 | label |
    |------------|------------|-------|
    | Prophecies of Jeremiah \| participants: jeremiah_853 | In his days Judah shall be saved, and Israel shall dwell safely: and this is his name whereby he shall be called, The Lord Our Righteousness. | 1.0 |
    | God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew 'El , from a word meaning to be strong; (2) of 'Eloah_, plural _'Elohim . The singular form, Eloah , is used only in poetry. The plural form is more commonly used in all parts of the Bible. The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: the a priori argument, which is the testimony afforded by reason; the a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) T... | And if ye offer the blind for sacrifice, is it not evil? and if ye offer the lame and sick, is it not evil? offer it now unto thy governor; will he be pleased with thee, or accept thy person? saith the Lord of hosts. | 1.0 |
    | Holy Week | For in those days shall be affliction, such as was not from the beginning of the creation which God created unto this time, neither shall be. | 1.0 |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
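CosineSimilarityLoss computes the cosine similarity between the two sentence embeddings of a pair and regresses it onto the label with the `loss_fct` above (MSE). A numpy sketch of the per-pair loss, using hypothetical embeddings and a label of 1.0:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity of two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for sentence_0 and sentence_1, plus their label
u = np.array([0.2, 0.9, 0.4])
v = np.array([0.3, 0.8, 0.5])
label = 1.0

# CosineSimilarityLoss with MSELoss: squared error between cos-sim and label
cos = cosine_similarity(u, v)
loss = (cos - label) ** 2
print(cos, loss)  # cos ≈ 0.985, loss ≈ 0.0002
```

With labels of 1.0 (as in the samples above), the loss pushes paired embeddings toward a cosine similarity of 1, i.e. toward pointing in the same direction.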
    

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 1
  • max_steps: 50
  • multi_dataset_batch_sampler: round_robin
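The three non-default values above map directly onto Sentence Transformers' training arguments. A configuration sketch, assuming Sentence Transformers v3+ (where `SentenceTransformerTrainingArguments` exists); the output directory is a placeholder:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Reproduces the non-default hyperparameters from this card;
# everything else stays at its documented default.
args = SentenceTransformerTrainingArguments(
    output_dir="output/e5-base-bible-50",       # placeholder path
    num_train_epochs=1,
    max_steps=50,                               # takes precedence: training stops after 50 steps
    multi_dataset_batch_sampler="round_robin",
)
```

Note that with `max_steps=50` and a batch size of 8, only about 400 of the 70,323 training samples are ever seen.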

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: 50
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Framework Versions

  • Python: 3.11.14
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cpu
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}