dpshade22 committed on
Commit
c63a053
·
verified ·
1 Parent(s): 6a5904d

Upload e5-base-john-10 embedding model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
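The pooling config above enables only `pooling_mode_mean_tokens`: the token vectors are averaged, counting only positions the attention mask marks as real (so padding does not dilute the mean). A minimal pure-Python sketch of that operation (the real module works on torch tensors of shape `(batch, seq_len, 768)`):

```python
def mean_pool(token_embeddings, attention_mask):
    """Masked mean pooling: average only the token vectors whose
    attention-mask entry is 1, ignoring padding positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [x / count for x in total]

# The third "token" is padding (mask 0) and is ignored:
print(mean_pool([[2.0, 4.0], [4.0, 8.0], [9.0, 9.0]], [1, 1, 0]))
# [3.0, 6.0]
```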
README.md ADDED
@@ -0,0 +1,386 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:70323
+ - loss:CosineSimilarityLoss
+ base_model: intfloat/e5-base-v2
+ widget:
+ - source_sentence: Suffer me that I may speak; and after that I have spoken, mock
+     on.
+   sentences:
+   - And Peleg lived after he begat Reu two hundred and nine years, and begat sons
+     and daughters.
+   - And to offer a sacrifice according to that which is said in the law of the Lord,
+     A pair of turtledoves, or two young pigeons.
+   - As for me, is my complaint to man? and if it were so, why should not my spirit
+     be troubled?
+ - source_sentence: 'Jesus Christ: anointed, the Greek translation of the Hebrew word
+     rendered "Messiah" (q.v.), the official title of our Lord, occurring five hundred
+     and fourteen times in the New Testament. It denotes that he was anointed or consecrated
+     to his great redemptive work as Prophet, Priest, and King of his people. He is
+     Jesus the Christ ( Acts 17:3 ; 18:5 ; Matthew 22:42 ), the Anointed One.
+     He is thus spoken of by ( Isaiah 61:1 ), and by ( Daniel 9:24-26 ), who styles
+     him "Messiah the Prince." The Messiah is the same person as "the seed of the
+     woman" ( Genesis 3:15 ), "the seed of Abraham" ( Genesis 22:18 ), the "Prophet
+     like unto Moses" ( Deuteronomy 18:15 ), "the priest after the order of Melchizedek"
+     ( Psalms 110:4 ), "the rod out of the stem of Jesse" ( Isaiah 11:1 Isaiah
+     11:10 ), the "Immanuel," the virgin''s son ( Isaiah 7:14 ), "the branch of
+     Jehovah" ( Isaiah 4:2 ), and "the messenger of the covenant" ( Malachi 3:1 ).
+     This is he "of whom Moses in the law and the prophets did write." The Old Testament
+     Scripture is full of prophetic declarations regarding the Great Deliverer and
+     the work he was to accomplish. Jesus the Christ is Jesus the Great Deliverer,
+     the Anointed One, the Saviour of men. This name denotes that Jesus was divinely
+     appointed, commissioned, and accredited as the Saviour of men ( Hebrews 5:4 ; Isaiah
+     11:2-4 ; 49:6 ; John 5:37 ; Acts 2:22 ). To believe that "Jesus is
+     the Christ" is to believe that he is the Anointed, the Messiah of the prophets,
+     the Saviour sent of God, that he was, in a word, what he claimed to be. This is
+     to believe the gospel, by the faith of which alone men can be brought unto God.
+     That Jesus is the Christ is the testimony of God, and the faith of this constitutes
+     a Christian ( 1 Corinthians 12:3 ; 1 John 5:1 ).'
+   sentences:
+   - 'And he took thereof in his hands, and went on eating, and came to his father
+     and mother, and he gave them, and they did eat: but he told not them that he had
+     taken the honey out of the carcase of the lion.'
+   - 'And Jesus said unto him, Forbid him not: for he that is not against us is for
+     us.'
+   - And thou shalt put it under the compass of the altar beneath, that the net may
+     be even to the midst of the altar.
+ - source_sentence: And, behold, seven thin ears and blasted with the east wind sprung
+     up after them.
+   sentences:
+   - When they were but a few men in number; yea, very few, and strangers in it.
+   - Till the Lord look down, and behold from heaven.
+   - And the seven thin ears devoured the seven rank and full ears. And Pharaoh awoke,
+     and, behold, it was a dream.
+ - source_sentence: 'And he shall dwell in that city, until he stand before the congregation
+     for judgment, and until the death of the high priest that shall be in those days:
+     then shall the slayer return, and come unto his own city, and unto his own house,
+     unto the city from whence he fled.'
+   sentences:
+   - And they appointed Kedesh in Galilee in mount Naphtali, and Shechem in mount Ephraim,
+     and Kirjatharba, which is Hebron, in the mountain of Judah.
+   - 'For the time past of our life may suffice us to have wrought the will of the
+     Gentiles, when we walked in lasciviousness, lusts, excess of wine, revellings,
+     banquetings, and abominable idolatries:'
+   - Where are the gods of Hamath and Arphad? where are the gods of Sepharvaim? and
+     have they delivered Samaria out of my hand?
+ - source_sentence: Gath
+   sentences:
+   - And the cities which the Philistines had taken from Israel were restored to Israel,
+     from Ekron even unto Gath; and the coasts thereof did Israel deliver out of the
+     hands of the Philistines. And there was peace between Israel and the Amorites.
+   - And as we tarried there many days, there came down from Judaea a certain prophet,
+     named Agabus.
+   - And the priests consented to receive no more money of the people, neither to repair
+     the breaches of the house.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on intfloat/e5-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
+ - **Maximum Sequence Length:** 128 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
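The final `Normalize()` module in the architecture L2-normalizes each pooled vector, which is why the card lists cosine similarity as the similarity function: for unit-length vectors, a plain dot product *is* the cosine similarity. A small sketch of that step with made-up 2-d vectors:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
# After normalization, dot product == cosine similarity:
dot = sum(x * y for x, y in zip(a, b))
print(round(dot, 4))
# 0.96
```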
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'Gath',
+     'And the cities which the Philistines had taken from Israel were restored to Israel, from Ekron even unto Gath; and the coasts thereof did Israel deliver out of the hands of the Philistines. And there was peace between Israel and the Amorites.',
+     'And as we tarried there many days, there came down from Judaea a certain prophet, named Agabus.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.7385, 0.7175],
+ #         [0.7385, 1.0000, 0.7856],
+ #         [0.7175, 0.7856, 1.0000]])
+ ```
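The `model.similarity(...)` call in the usage example returns a pairwise cosine-similarity matrix. Because this model's embeddings come out of the `Normalize()` module already unit-length, that matrix is just all pairwise dot products; a toy sketch with made-up unit vectors (not real model output):

```python
def similarity_matrix(embs):
    """Pairwise dot products; equals cosine similarity when the
    input vectors are already L2-normalized."""
    return [[sum(x * y for x, y in zip(u, v)) for v in embs] for u in embs]

embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy unit vectors
sims = similarity_matrix(embs)
print(sims[0][2], sims[1][2])
# 0.6 0.8
```

The diagonal of the matrix is 1.0 for unit vectors, matching the tensor shown in the usage block above.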
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 70,323 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 | label |
+   |:--------|:-----------|:-----------|:------|
+   | type    | string     | string     | float |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 55.11 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 35.91 tokens</li><li>max: 91 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.99</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 | label |
+   |:-----------|:-----------|:------|
+   | <code>The family of the house of Levi apart, and their wives apart; the family of Shimei apart, and their wives apart;</code> | <code>All the families that remain, every family apart, and their wives apart.</code> | <code>1.0</code> |
+   | <code>And I will make thee to pass with thine enemies into a land which thou knowest not: for a fire is kindled in mine anger, which shall burn upon you.</code> | <code>O Lord, thou knowest: remember me, and visit me, and revenge me of my persecutors; take me not away in thy longsuffering: know that for thy sake I have suffered rebuke.</code> | <code>1.0</code> |
+   | <code>God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> 'El</i> , from a word meaning to be strong; (2) of <i> 'Eloah_, plural _'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: <li> The a priori argument, which is the testimony afforded by reason. <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) T...</code> | <code>Thou hast forsaken me, saith the Lord, thou art gone backward: therefore will I stretch out my hand against thee, and destroy thee; I am weary with repenting.</code> | <code>1.0</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
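`CosineSimilarityLoss` with an `MSELoss` inner criterion, as configured above, embeds both sentences of a pair and penalizes the squared error between their cosine similarity and the gold label. A dependency-free sketch of the per-pair objective (toy vectors, not the real torch implementation):

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def cosine_similarity_mse_loss(emb1, emb2, label):
    """Per-pair CosineSimilarityLoss with MSE: (cos(u, v) - label)^2."""
    return (cosine(emb1, emb2) - label) ** 2

# A perfectly aligned pair labeled 1.0 incurs zero loss:
print(cosine_similarity_mse_loss([1.0, 0.0], [2.0, 0.0], 1.0))
# 0.0
```

Since nearly all labels in this dataset are 1.0 (mean 0.99 per the statistics table), training mostly pulls paired embeddings toward cosine similarity 1.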
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `num_train_epochs`: 1
+ - `max_steps`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: 10
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: None
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: no
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: True
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.13.11
+ - Sentence Transformers: 5.2.0
+ - Transformers: 4.57.6
+ - PyTorch: 2.10.0+cpu
+ - Accelerate: 1.12.0
+ - Datasets: 4.5.0
+ - Tokenizers: 0.22.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.57.6",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
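The dimensions in this config (standard BERT-base shape) imply roughly 109.5M float32 parameters, which is a useful sanity check against the 437,951,328-byte `model.safetensors` below. A back-of-the-envelope count (assuming the usual BertModel layout with biases, LayerNorms, and pooler; the small leftover in the file is the safetensors header):

```python
# Dimensions from config.json
hidden, layers, inter = 768, 12, 3072
vocab, pos, typ = 30522, 512, 2

# Embeddings: word + position + token-type tables, plus one LayerNorm
emb = (vocab + pos + typ) * hidden + 2 * hidden
# Per encoder layer: Q/K/V/O projections, FFN up/down, two LayerNorms
per_layer = (4 * (hidden * hidden + hidden)        # attention Q, K, V, O
             + 2 * hidden * inter + inter + hidden  # FFN up + down
             + 4 * hidden)                          # 2 LayerNorms
pooler = hidden * hidden + hidden                   # BertModel pooler head

total = emb + layers * per_layer + pooler
print(total)      # 109482240 parameters
print(total * 4)  # 437928960 bytes as float32, vs. 437,951,328 on disk
```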
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.2.0",
+     "transformers": "4.57.6",
+     "pytorch": "2.10.0+cpu"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e88ef30926d092005c35686915d4e9450c9292a5bb2c4d5f10ccae7374de5eb
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
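`modules.json` wires the model as an ordered pipeline: Transformer → Pooling → Normalize, each module consuming the previous one's output. A toy stand-in showing the same composition, with a fake per-token encoder in place of the real BertModel (the character arithmetic is purely illustrative):

```python
import math

def transformer(text):
    """Stand-in for module 0 (BertModel): one fake 2-d vector per token."""
    return [[float(ord(c) % 5 + 1)] * 2 for c in text]

def pooling(token_vecs):
    """Module 1: mean over token vectors."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]

def normalize(vec):
    """Module 2: L2-normalize the pooled vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def encode(text):
    return normalize(pooling(transformer(text)))

emb = encode("abc")
print(round(sum(x * x for x in emb), 6))  # 1.0 (unit length, as expected)
```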
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
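The `max_seq_length` of 128 here means inputs longer than 128 tokens are truncated before encoding (the model card's token statistics top out at 128 for the same reason). A trivial sketch of the effect, with a plain token list standing in for real WordPiece output:

```python
MAX_SEQ_LENGTH = 128  # from sentence_bert_config.json

def truncate_tokens(tokens, max_len=MAX_SEQ_LENGTH):
    """Keep only the first max_len tokens; shorter inputs pass through."""
    return tokens[:max_len]

print(len(truncate_tokens(["tok"] * 200)))
# 128
```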
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff