e5-base-john / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:2633
  - loss:CosineSimilarityLoss
base_model: intfloat/e5-base-v2
widget:
  - source_sentence: >-
      Many therefore of his disciples, when they had heard this, said, This is
      an hard saying; who can hear it?
    sentences:
      - >-
        If ye keep my commandments, ye shall abide in my love; even as I have
        kept my Father's commandments, and abide in his love.
      - >-
        When Jesus knew in himself that his disciples murmured at it, he said
        unto them, Doth this offend you?
      - >-
        He said, I am the voice of one crying in the wilderness, Make straight
        the way of the Lord, as said the prophet Esaias.
  - source_sentence: 'Jesus and Nicodemus | participants: jesus_905, nicodemus_2204'
    sentences:
      - >-
        And as Moses lifted up the serpent in the wilderness, even so must the
        Son of man be lifted up:
      - >-
        Then when Mary was come where Jesus was, and saw him, she fell down at
        his feet, saying unto him, Lord, if thou hadst been here, my brother had
        not died.
      - >-
        They answered him, Jesus of Nazareth. Jesus saith unto them, I am he.
        And Judas also, which betrayed him, stood with them.
  - source_sentence: >-
      For he whom God hath sent speaketh the words of God: for God giveth not
      the Spirit by measure unto him.
    sentences:
      - Then said Jesus unto the twelve, Will ye also go away?
      - The Father loveth the Son, and hath given all things into his hand.
      - >-
        Why askest thou me? ask them which heard me, what I have said unto them:
        behold, they know what I said.
  - source_sentence: >-
      Lazarus Raised form the Dead | participants: jesus_905, mary_1939,
      lazarus_1812
    sentences:
      - But he saith unto them, It is I; be not afraid.
      - >-
        But some of them went their ways to the Pharisees, and told them what
        things Jesus had done.
      - >-
        Jesus answered and said unto them, Destroy this temple, and in three
        days I will raise it up.
  - source_sentence: >-
      God:  (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine
      Being. It is the rendering (1) of the Hebrew <i> 'El</i> , from a word
      meaning to be strong; (2) of <i> 'Eloah_, plural _'Elohim</i> . The
      singular form, <i> Eloah</i> , is used only in poetry. The plural form is
      more commonly used in all parts of the Bible, The Hebrew word Jehovah
      (q.v.), the only other word generally employed to denote the Supreme
      Being, is uniformly rendered in the Authorized Version by "LORD," printed
      in small capitals. The existence of God is taken for granted in the Bible.
      There is nowhere any argument to prove it. He who disbelieves this truth
      is spoken of as one devoid of understanding (  Psalms 14:1  ).    The
      arguments generally adduced by theologians in proof of the being of God
      are:   <li> The a priori argument, which is the testimony afforded by
      reason.    <li> The a posteriori argument, by which we proceed logically
      from the facts of experience to causes. These arguments are,    (a) The
      cosmological, by which it is proved that there must be a First Cause of
      all things, for every effect must have a cause.   (b) The teleological, or
      the argument from design. We see everywhere the operations of an
      intelligent Cause in nature.   (c) The moral argument, called also the
      anthropological argument, based on the moral consciousness and the history
      of mankind, which exhibits a moral order and purpose which can only be
      explained on the supposition of the existence of God. Conscience and human
      history testify that "verily there is a God that judgeth in the earth."  
      The attributes of God are set forth in order by Moses in   Exodus 34:6  
      Exodus 34:7  . (see also   Deuteronomy 6:4  ;   10:17  ;   Numbers 16:22 
      ;   Exodus 15:11  ;   33:19  ;   Isaiah 44:6  ;   Habakkuk 3:6  ;   Psalms
      102:26  ;   Job 34:12  .) They are also systematically classified in  
      Revelation 5:12   and   7:12  .    God's attributes are spoken of by some
      as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.;
      and relative, i.e., such as are ascribed to him with relation to his
      creatures. Others distinguish them into communicable, i.e., those which
      can be imparted in degree to his creatures: goodness, holiness, wisdom,
      etc.; and incommunicable, which cannot be so imparted: independence,
      immutability, immensity, and eternity. They are by some also divided into
      natural attributes, eternity, immensity, etc.; and moral, holiness,
      goodness, etc.
    sentences:
      - As he spake these words, many believed on him.
      - >-
        Jesus said unto them, If God were your Father, ye would love me: for I
        proceeded forth and came from God; neither came I of myself, but he sent
        me.
      - >-
        Jesus answered them, I told you, and ye believed not: the works that I
        do in my Father's name, they bear witness of me.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on intfloat/e5-base-v2

This is a sentence-transformers model fine-tuned from intfloat/e5-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-base-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
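
The Pooling and Normalize stages listed above can be illustrated with a minimal sketch. It is not the library's implementation: the helper names `mean_pool` and `l2_normalize` are hypothetical, and the toy input uses 2 dimensions instead of 768. It shows mean pooling over non-padded tokens (`pooling_mode_mean_tokens: True`) followed by scaling to unit length.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: 3 token positions (the last is padding), 2 dimensions.
token_vectors = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

pooled = mean_pool(token_vectors, attention_mask)  # [3.0, 4.0]
embedding = l2_normalize(pooled)                   # [0.6, 0.8]
print(pooled, embedding)
```

Because the output has unit length, the cosine similarity between two embeddings equals their dot product.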

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# "sentence_transformers_model_id" is a placeholder; replace it with this model's Hub ID
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'God:  (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> \'El</i> , from a word meaning to be strong; (2) of <i> \'Eloah_, plural _\'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding (  Psalms 14:1  ).    The arguments generally adduced by theologians in proof of the being of God are:   <li> The a priori argument, which is the testimony afforded by reason.    <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are,    (a) The cosmological, by which it is proved that there must be a First Cause of all things, for every effect must have a cause.   (b) The teleological, or the argument from design. We see everywhere the operations of an intelligent Cause in nature.   (c) The moral argument, called also the anthropological argument, based on the moral consciousness and the history of mankind, which exhibits a moral order and purpose which can only be explained on the supposition of the existence of God. Conscience and human history testify that "verily there is a God that judgeth in the earth."   The attributes of God are set forth in order by Moses in   Exodus 34:6   Exodus 34:7  . (see also   Deuteronomy 6:4  ;   10:17  ;   Numbers 16:22  ;   Exodus 15:11  ;   33:19  ;   Isaiah 44:6  ;   Habakkuk 3:6  ;   Psalms 102:26  ;   Job 34:12  .) They are also systematically classified in   Revelation 5:12   and   7:12  .    God\'s attributes are spoken of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.; and relative, i.e., such as are ascribed to him with relation to his creatures. Others distinguish them into communicable, i.e., those which can be imparted in degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable, which cannot be so imparted: independence, immutability, immensity, and eternity. They are by some also divided into natural attributes, eternity, immensity, etc.; and moral, holiness, goodness, etc.',
    'Jesus said unto them, If God were your Father, ye would love me: for I proceeded forth and came from God; neither came I of myself, but he sent me.',
    'As he spake these words, many believed on him.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7557, 0.7462],
#         [0.7557, 1.0000, 0.7852],
#         [0.7462, 0.7852, 1.0000]])
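
For semantic search, each row of the similarity matrix can be sorted to rank the other sentences against one query. The sketch below is only an illustration: it copies the scores printed above into a plain nested list, and `rank_others` is a hypothetical helper, not part of Sentence Transformers. With the real tensor, `similarities.tolist()` yields the same nested-list shape.

```python
# Similarity matrix copied from the example output above.
similarities = [
    [1.0000, 0.7557, 0.7462],
    [0.7557, 1.0000, 0.7852],
    [0.7462, 0.7852, 1.0000],
]

def rank_others(similarities, query_index):
    """Return (index, score) pairs for every other sentence, best match first."""
    scores = [
        (i, score)
        for i, score in enumerate(similarities[query_index])
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = rank_others(similarities, query_index=0)
print(ranking)
# [(1, 0.7557), (2, 0.7462)]
```

Here the dictionary entry (index 0) scores slightly closer to the verse that mentions God (index 1) than to the other verse (index 2).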

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,633 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 | label |
    |---------|------------|------------|-------|
    | type    | string     | string     | float |
    | details | min: 3 tokens; mean: 81.92 tokens; max: 256 tokens | min: 9 tokens; mean: 30.06 tokens; max: 73 tokens | min: 1.0; mean: 1.0; max: 1.0 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |------------|------------|-------|
    | God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew 'El , from a word meaning to be strong; (2) of 'Eloah_, plural _'Elohim . The singular form, Eloah , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: • The a priori argument, which is the testimony afforded by reason. • The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) T... | For as the Father hath life in himself; so hath he given to the Son to have life in himself; | 1.0 |
    | Bread of Life Sermon \| participants: jesus_905, peter_2745 | Jesus therefore answered and said unto them, Murmur not among yourselves. | 1.0 |
    | Verily, verily, I say unto thee, When thou wast young, thou girdest thyself, and walkedst whither thou wouldest: but when thou shalt be old, thou shalt stretch forth thy hands, and another shall gird thee, and carry thee whither thou wouldest not. | This spake he, signifying by what death he should glorify God. And when he had spoken this, he saith unto him, Follow me. | 1.0 |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
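
    CosineSimilarityLoss scores each (sentence_0, sentence_1) pair by the cosine similarity of the two embeddings and compares that score with the label using the configured loss function, here mean squared error. The following is a minimal sketch of that computation on toy 2-dimensional vectors; the function names and the vectors are illustrative assumptions, not the library's code or real embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_mse_loss(pairs, labels):
    """Mean squared error between predicted cosine similarity and the label."""
    errors = [
        (cosine_similarity(u, v) - label) ** 2
        for (u, v), label in zip(pairs, labels)
    ]
    return sum(errors) / len(errors)

pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction, cosine = 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, cosine = 0.0
]
labels = [1.0, 1.0]  # the label statistics above report 1.0 throughout

loss = cosine_mse_loss(pairs, labels)
print(loss)
# 0.5
```

    The first pair already matches its label and contributes no error; the second contributes (0.0 - 1.0)² = 1.0, so the mean over the two pairs is 0.5.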
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • max_steps: 5
  • multi_dataset_batch_sampler: round_robin
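
In the Hugging Face Trainer, a positive max_steps takes precedence over num_train_epochs, so the values above imply how much of the dataset was visited. The arithmetic below is a rough sketch that assumes a single device and gradient_accumulation_steps of 1; the variable names are illustrative.

```python
import math

dataset_size = 2633  # "Size: 2,633 training samples"
batch_size = 16      # per_device_train_batch_size
max_steps = 5        # max_steps

# Assumptions: one device, gradient_accumulation_steps = 1.
steps_per_epoch = math.ceil(dataset_size / batch_size)
samples_seen = max_steps * batch_size
fraction_seen = samples_seen / dataset_size

print(steps_per_epoch, samples_seen, round(fraction_seen * 100, 1))
# 165 80 3.0
```

Under those assumptions, a full epoch would take 165 steps, while 5 steps cover 80 samples, about 3% of the training set.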

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: 5
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
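
With learning_rate 5e-05, lr_scheduler_type linear, warmup_steps 0, and max_steps 5, the learning rate would fall in equal decrements at each step. The sketch below assumes the usual linear warmup-then-decay formula that ends at zero after the last step; `linear_schedule` is a hypothetical helper written for illustration, not a function from Transformers.

```python
def linear_schedule(base_lr, total_steps, warmup_steps=0):
    """Learning rate used at each optimizer step under linear warmup and decay."""
    rates = []
    for step in range(total_steps):
        if step < warmup_steps:
            # Ramp up from 0 to base_lr during warmup.
            factor = step / max(1, warmup_steps)
        else:
            # Decay linearly so the rate would reach 0 at total_steps.
            factor = (total_steps - step) / max(1, total_steps - warmup_steps)
        rates.append(base_lr * factor)
    return rates

rates = linear_schedule(base_lr=5e-05, total_steps=5)
print(rates)
```

Under that assumption the five steps use roughly 5e-05, 4e-05, 3e-05, 2e-05, and 1e-05.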

Framework Versions

  • Python: 3.13.11
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.6
  • PyTorch: 2.10.0+cpu
  • Accelerate: 1.12.0
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}