---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:70323
  - loss:CosineSimilarityLoss
base_model: intfloat/e5-base-v2
widget:
  - source_sentence: Suffer me that I may speak; and after that I have spoken, mock on.
    sentences:
      - >-
        And Peleg lived after he begat Reu two hundred and nine years, and begat
        sons and daughters.
      - >-
        And to offer a sacrifice according to that which is said in the law of
        the Lord, A pair of turtledoves, or two young pigeons.
      - >-
        As for me, is my complaint to man? and if it were so, why should not my
        spirit be troubled?
  - source_sentence: >-
      Jesus Christ: anointed, the Greek translation of the Hebrew word rendered
      "Messiah" (q.v.), the official title of our Lord, occurring five hundred
      and fourteen times in the New Testament. It denotes that he was anointed
      or consecrated to his great redemptive work as Prophet, Priest, and King
      of his people. He is Jesus the Christ (Acts 17:3; 18:5; Matthew 22:42),
      the Anointed One. He is thus spoken of by (Isaiah 61:1), and by
      (Daniel 9:24-26), who styles him "Messiah the Prince." The Messiah is
      the same person as "the seed of the woman" (Genesis 3:15), "the seed of
      Abraham" (Genesis 22:18), the "Prophet like unto Moses"
      (Deuteronomy 18:15), "the priest after the order of Melchizedek"
      (Psalms 110:4), "the rod out of the stem of Jesse" (Isaiah 11:1;
      Isaiah 11:10), the "Immanuel," the virgin's son (Isaiah 7:14), "the
      branch of Jehovah" (Isaiah 4:2), and "the messenger of the covenant"
      (Malachi 3:1). This is he "of whom Moses in the law and the prophets did
      write." The Old Testament Scripture is full of prophetic declarations
      regarding the Great Deliverer and the work he was to accomplish. Jesus
      the Christ is Jesus the Great Deliverer, the Anointed One, the Saviour
      of men. This name denotes that Jesus was divinely appointed,
      commissioned, and accredited as the Saviour of men (Hebrews 5:4;
      Isaiah 11:2-4; 49:6; John 5:37; Acts 2:22). To believe that "Jesus is
      the Christ" is to believe that he is the Anointed, the Messiah of the
      prophets, the Saviour sent of God, that he was, in a word, what he
      claimed to be. This is to believe the gospel, by the faith of which
      alone men can be brought unto God. That Jesus is the Christ is the
      testimony of God, and the faith of this constitutes a Christian
      (1 Corinthians 12:3; 1 John 5:1).
    sentences:
      - >-
        And he took thereof in his hands, and went on eating, and came to his
        father and mother, and he gave them, and they did eat: but he told not
        them that he had taken the honey out of the carcase of the lion.
      - >-
        And Jesus said unto him, Forbid him not: for he that is not against us
        is for us.
      - >-
        And thou shalt put it under the compass of the altar beneath, that the
        net may be even to the midst of the altar.
  - source_sentence: >-
      And, behold, seven thin ears and blasted with the east wind sprung up
      after them.
    sentences:
      - >-
        When they were but a few men in number; yea, very few, and strangers in
        it.
      - Till the Lord look down, and behold from heaven.
      - >-
        And the seven thin ears devoured the seven rank and full ears. And
        Pharaoh awoke, and, behold, it was a dream.
  - source_sentence: >-
      And he shall dwell in that city, until he stand before the congregation
      for judgment, and until the death of the high priest that shall be in
      those days: then shall the slayer return, and come unto his own city, and
      unto his own house, unto the city from whence he fled.
    sentences:
      - >-
        And they appointed Kedesh in Galilee in mount Naphtali, and Shechem in
        mount Ephraim, and Kirjatharba, which is Hebron, in the mountain of
        Judah.
      - >-
        For the time past of our life may suffice us to have wrought the will of
        the Gentiles, when we walked in lasciviousness, lusts, excess of wine,
        revellings, banquetings, and abominable idolatries:
      - >-
        Where are the gods of Hamath and Arphad? where are the gods of
        Sepharvaim? and have they delivered Samaria out of my hand?
  - source_sentence: Gath
    sentences:
      - >-
        And the cities which the Philistines had taken from Israel were restored
        to Israel, from Ekron even unto Gath; and the coasts thereof did Israel
        deliver out of the hands of the Philistines. And there was peace between
        Israel and the Amorites.
      - >-
        And as we tarried there many days, there came down from Judaea a certain
        prophet, named Agabus.
      - >-
        And the priests consented to receive no more money of the people,
        neither to repair the breaches of the house.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

SentenceTransformer based on intfloat/e5-base-v2

This is a sentence-transformers model finetuned from intfloat/e5-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-base-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
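Concretely, the Pooling and Normalize modules amount to a masked mean over the token embeddings (ignoring padding) followed by L2 normalization. A minimal NumPy sketch with random stand-in token embeddings, not real model outputs:

```python
import numpy as np

# Stand-in for the transformer's token embeddings of one sentence:
# (seq_len, 768), with an attention mask marking real tokens vs. padding.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 768))
attention_mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding

# (1) Pooling: mean over the unmasked tokens (pooling_mode_mean_tokens=True)
mask = attention_mask[:, None].astype(float)
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()

# (2) Normalize: scale to unit L2 norm, so dot products become cosine similarities
sentence_embedding /= np.linalg.norm(sentence_embedding)

print(sentence_embedding.shape)  # (768,)
print(np.isclose(np.linalg.norm(sentence_embedding), 1.0))  # True
```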

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dpshade22/e5-base-john-10")
# Run inference
sentences = [
    'Gath',
    'And the cities which the Philistines had taken from Israel were restored to Israel, from Ekron even unto Gath; and the coasts thereof did Israel deliver out of the hands of the Philistines. And there was peace between Israel and the Amorites.',
    'And as we tarried there many days, there came down from Judaea a certain prophet, named Agabus.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7385, 0.7175],
#         [0.7385, 1.0000, 0.7856],
#         [0.7175, 0.7856, 1.0000]])
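Because the final Normalize() module emits unit-norm vectors, the cosine matrix that model.similarity returns is equivalent to a plain matrix product of the embeddings. A NumPy sketch with dummy unit-normalized vectors (assumed shapes only, no model involved):

```python
import numpy as np

# Dummy stand-ins for model.encode output: 3 embeddings of dimension 768,
# already L2-normalized, as the model's final Normalize() module guarantees.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(3, 768))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# For unit-norm vectors, cosine similarity is just a dot product, which is
# what model.similarity computes with its default "cosine" score function.
similarities = embeddings @ embeddings.T

print(similarities.shape)  # (3, 3)
print(np.allclose(np.diag(similarities), 1.0))  # True: self-similarity is 1
```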

Training Details

Training Dataset

Unnamed Dataset

  • Size: 70,323 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0 | sentence_1 | label |
    |---------|------------|------------|-------|
    | type    | string | string | float |
    | details | min: 3 tokens, mean: 55.11 tokens, max: 128 tokens | min: 8 tokens, mean: 35.91 tokens, max: 91 tokens | min: 0.0, mean: 0.99, max: 1.0 |
  • Samples:

    | sentence_0 | sentence_1 | label |
    |------------|------------|-------|
    | The family of the house of Levi apart, and their wives apart; the family of Shimei apart, and their wives apart; | All the families that remain, every family apart, and their wives apart. | 1.0 |
    | And I will make thee to pass with thine enemies into a land which thou knowest not: for a fire is kindled in mine anger, which shall burn upon you. | O Lord, thou knowest: remember me, and visit me, and revenge me of my persecutors; take me not away in thy longsuffering: know that for thy sake I have suffered rebuke. | 1.0 |
    | God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew 'El, from a word meaning to be strong; (2) of 'Eloah, plural 'Elohim. The singular form, Eloah, is used only in poetry. The plural form is more commonly used in all parts of the Bible. The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding (Psalms 14:1). The arguments generally adduced by theologians in proof of the being of God are: (1) the a priori argument, which is the testimony afforded by reason; (2) the a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) T... | Thou hast forsaken me, saith the Lord, thou art gone backward: therefore will I stretch out my hand against thee, and destroy thee; I am weary with repenting. | 1.0 |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
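In other words, this loss scores each pair by the cosine similarity of its two sentence embeddings and applies the configured loss_fct (MSELoss) against the gold label. A rough NumPy sketch of that computation, not the library's implementation:

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, labels):
    """Sketch of CosineSimilarityLoss with an MSE loss_fct."""
    # Cosine similarity per pair of sentence embeddings.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    scores = (a * b).sum(axis=1)
    # MSELoss between the predicted similarity and the gold label in [0, 1].
    return np.mean((scores - labels) ** 2)

# Identical embeddings with label 1.0 give zero loss.
x = np.ones((2, 4))
print(cosine_similarity_loss(x, x, np.array([1.0, 1.0])))  # 0.0
```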
    

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 1
  • max_steps: 10
  • multi_dataset_batch_sampler: round_robin
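If you want to mirror these settings, they map onto SentenceTransformerTrainingArguments roughly as below. This is an untested configuration sketch; the output_dir value is a hypothetical placeholder, not taken from the card.

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Hypothetical reproduction of the non-default hyperparameters above;
# "models/e5-base-john-10" is an assumed output path.
args = SentenceTransformerTrainingArguments(
    output_dir="models/e5-base-john-10",
    num_train_epochs=1,
    max_steps=10,  # overrides num_train_epochs: training stops after 10 steps
    multi_dataset_batch_sampler="round_robin",
)
```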

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: 10
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Framework Versions

  • Python: 3.13.11
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cpu
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}