---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:262023
- loss:MultipleNegativesRankingLoss
base_model: intfloat/e5-base-v2
widget:
- source_sentence: 'query: Handkerchief: Only once in Authorized Version (Acts 19:12).
The Greek word (sudarion) so rendered means properly “a sweat-cloth.” It is rendered
“napkin” in John 11:44; 20:7; Luke 19:20.'
sentences:
- 'passage: as well as the cloth that had been wrapped around Jesus’ head. The cloth
was still lying in its place, separate from the linen.'
- 'passage: “On that day I will make the clans of Judah like a firepot in a woodpile,
like a flaming torch among sheaves. They will consume all the surrounding peoples
right and left, but Jerusalem will remain intact in her place.'
- 'passage: and the borders of Canaan reached from Sidon toward Gerar as far as
Gaza, and then toward Sodom, Gomorrah, Admah and Zeboyim, as far as Lasha.'
- source_sentence: 'query: what happened to Job'
sentences:
- "passage: Remember, O God, that my life is but a breath;\n my eyes will never\
\ see happiness again."
- 'passage: So he prepared a great feast for them, and after they had finished eating
and drinking, he sent them away, and they returned to their master. So the bands
from Aram stopped raiding Israel’s territory.'
- 'passage: of Ater (through Hezekiah) 98'
- source_sentence: 'query: what happened to Jesus'
sentences:
- 'passage: The Lord wrote on these tablets what he had written before, the Ten
Commandments he had proclaimed to you on the mountain, out of the fire, on the
day of the assembly. And the Lord gave them to me.'
- 'passage: “Make a tree good and its fruit will be good, or make a tree bad and
its fruit will be bad, for a tree is recognized by its fruit.'
- 'passage: So Joshua and his whole army came against them suddenly at the Waters
of Merom and attacked them,'
- source_sentence: 'query: what is Games'
sentences:
- 'passage: In Hebron he reigned over Judah seven years and six months, and in Jerusalem
he reigned over all Israel and Judah thirty-three years.'
- 'passage: Their surrounding villages were Etam, Ain, Rimmon, Token and Ashan—five
towns—'
- 'passage: Fight the good fight of the faith. Take hold of the eternal life to
which you were called when you made your good confession in the presence of many
witnesses.'
- source_sentence: 'query: God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name
of the Divine Being. It is the rendering (1) of the Hebrew <i> ''El</i> , from
a word meaning to be strong; (2) of <i> ''Eloah_, plural _''Elohim</i> . The singular
form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly
used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other
word generally employed to denote the Supreme Being, is uniformly rendered in
the Authorized Version by "LORD," printed in small capitals. The existence of
God is taken for granted in the Bible. There is nowhere any argument to prove
it. He who disbelieves this truth is spoken of as one devoid of understanding
( Psalms 14:1 ). The arguments generally adduced by theologians in proof
of the being of God are: <li> The a priori argument, which is the testimony
afforded by reason. <li> The a posteriori argument, by which we proceed logically
from the facts of experience to causes. These arguments are, (a) The cosmological,
by which it is proved that there must be a First Cause of all things, for every
effect must have a cause. (b) The teleological, or the argument from design.
We see everywhere the operations of an intelligent Cause in nature. (c) The
moral argument, called also the anthropological argument, based on the moral consciousness
and the history of mankind, which exhibits a moral order and purpose which can
only be explained on the supposition of the existence of God. Conscience and human
history testify that "verily there is a God that judgeth in the earth." The
attributes of God are set forth in order by Moses in Exodus 34:6 Exodus 34:7 .
(see also Deuteronomy 6:4 ; 10:17 ; Numbers 16:22 ; Exodus 15:11 ; 33:19 ; Isaiah
44:6 ; Habakkuk 3:6 ; Psalms 102:26 ; Job 34:12 .) They are also systematically
classified in Revelation 5:12 and 7:12 . God''s attributes are spoken
of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.;
and relative, i.e., such as are ascribed to him with relation to his creatures.
Others distinguish them into communicable, i.e., those which can be imparted in
degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable,
which cannot be so imparted: independence, immutability, immensity, and eternity.
They are by some also divided into natural attributes, eternity, immensity, etc.;
and moral, holiness, goodness, etc.'
sentences:
- 'passage: Then each man grabbed his opponent by the head and thrust his dagger
into his opponent’s side, and they fell down together. So that place in Gibeon
was called Helkath Hazzurim.'
- 'passage: and I saw the glory of the God of Israel coming from the east. His voice
was like the roar of rushing waters, and the land was radiant with his glory.'
- "passage: How long, Lord, must I call for help,\n but you do not listen?\n\
Or cry out to you, “Violence!”\n but you do not save?"
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on intfloat/e5-base-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
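The `Pooling` and `Normalize` modules above amount to a masked mean over token vectors followed by L2 normalization. A minimal pure-Python sketch of that idea, using toy 2-dimensional vectors rather than real 768-dimensional model outputs (the function names `mean_pool` and `l2_normalize` are illustrative, not library APIs):

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average only the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [value / max(count, 1) for value in summed]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]

# Toy input: three token vectors; the last one is padding and is masked out.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final vectors have unit length, cosine similarity between two embeddings reduces to a plain dot product.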
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")  # placeholder: substitute this model's Hub repository ID
# Run inference
sentences = [
'query: God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> \'El</i> , from a word meaning to be strong; (2) of <i> \'Eloah_, plural _\'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: <li> The a priori argument, which is the testimony afforded by reason. <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) The cosmological, by which it is proved that there must be a First Cause of all things, for every effect must have a cause. (b) The teleological, or the argument from design. We see everywhere the operations of an intelligent Cause in nature. (c) The moral argument, called also the anthropological argument, based on the moral consciousness and the history of mankind, which exhibits a moral order and purpose which can only be explained on the supposition of the existence of God. Conscience and human history testify that "verily there is a God that judgeth in the earth." The attributes of God are set forth in order by Moses in Exodus 34:6 Exodus 34:7 . (see also Deuteronomy 6:4 ; 10:17 ; Numbers 16:22 ; Exodus 15:11 ; 33:19 ; Isaiah 44:6 ; Habakkuk 3:6 ; Psalms 102:26 ; Job 34:12 .) They are also systematically classified in Revelation 5:12 and 7:12 . God\'s attributes are spoken of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.; and relative, i.e., such as are ascribed to him with relation to his creatures. Others distinguish them into communicable, i.e., those which can be imparted in degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable, which cannot be so imparted: independence, immutability, immensity, and eternity. They are by some also divided into natural attributes, eternity, immensity, etc.; and moral, holiness, goodness, etc.',
'passage: How long, Lord, must I call for help,\n but you do not listen?\nOr cry out to you, “Violence!”\n but you do not save?',
'passage: Then each man grabbed his opponent by the head and thrust his dagger into his opponent’s side, and they fell down together. So that place in Gibeon was called Helkath Hazzurim.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4670, 0.3140],
# [0.4670, 1.0000, 0.4137],
# [0.3140, 0.4137, 1.0000]])
```
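The training samples in this card all carry a `query: ` or `passage: ` prefix, so retrieval inputs presumably need the same prefixes at inference time. The sketch below shows one way to apply them and rank passages by dot product. `rank_passages` and `toy_encode` are hypothetical helpers, and the toy encoder (a normalized keyword count over made-up strings) only stands in for the model so that the example runs without downloading weights:

```python
import math

def rank_passages(query, passages, encode):
    """Prefix the inputs, embed them, and sort passages by similarity to the query.

    `encode` is any callable that maps a list of strings to unit-length vectors.
    """
    texts = [f"query: {query}"] + [f"passage: {p}" for p in passages]
    vectors = encode(texts)
    query_vec, passage_vecs = vectors[0], vectors[1:]
    # For unit-length vectors the dot product is the cosine similarity.
    scores = [sum(a * b for a, b in zip(query_vec, vec)) for vec in passage_vecs]
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)

def toy_encode(texts):
    """Stand-in encoder (assumption): normalized counts over a tiny vocabulary."""
    vocab = ["job", "suffering", "feast", "jerusalem"]
    vectors = []
    for text in texts:
        words = text.lower().replace(":", " ").split()
        counts = [float(words.count(term)) for term in vocab]
        norm = math.sqrt(sum(c * c for c in counts)) or 1.0
        vectors.append([c / norm for c in counts])
    return vectors

ranked = rank_passages(
    "the suffering of Job",
    ["A great feast in Jerusalem", "Job spoke of his suffering"],
    toy_encode,
)
for passage, score in ranked:
    print(f"{score:.3f}  {passage}")
```

With the real model, passing `model.encode` as the `encode` argument should work the same way, since the architecture ends in a `Normalize()` module and therefore returns unit-length vectors.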
<!--
### Direct Usage (Transformers)
<details><summary>Click to see the direct usage in Transformers</summary>
</details>
-->
<!--
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
<details><summary>Click to expand</summary>
</details>
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 262,023 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 | label |
|:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------|
| type | string | string | float |
| details | <ul><li>min: 5 tokens</li><li>mean: 27.82 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 35.93 tokens</li><li>max: 87 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
* Samples:
| sentence_0 | sentence_1 | label |
|:----------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
| <code>query: To those who sold doves he said, “Get these out of here! Stop turning my Father’s house into a market!”</code> | <code>passage: His disciples remembered that it is written: “Zeal for your house will consume me.”</code> | <code>1.0</code> |
| <code>query: Joseph (son of Jacob)</code> | <code>passage: Joseph found favor in his eyes and became his attendant. Potiphar put him in charge of his household, and he entrusted to his care everything he owned.</code> | <code>1.0</code> |
| <code>query: Divination meaning</code> | <code>passage: He sacrificed his children in the fire in the Valley of Ben Hinnom, practiced divination and witchcraft, sought omens, and consulted mediums and spiritists. He did much evil in the eyes of the Lord, arousing his anger.</code> | <code>1.0</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false
}
```
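With these parameters, the loss treats every other `sentence_1` in a batch as a negative for a given `sentence_0`: cosine similarities are multiplied by `scale` and passed through a cross-entropy whose target is the matching pair. A simplified pure-Python sketch of that computation on toy vectors (an illustration of the idea, not the library implementation; `mnr_loss` is a hypothetical name):

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp over all in-batch candidates.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive

print(mnr_loss(anchors, aligned))
print(mnr_loss(anchors, swapped))
```

The aligned batch yields a loss near zero and the swapped batch a loss near `scale`, which illustrates how a larger `scale` sharpens the penalty for ranking an in-batch negative above the true pair.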
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 1
- `max_steps`: 100
- `multi_dataset_batch_sampler`: round_robin
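In the Transformers trainer, a positive `max_steps` overrides `num_train_epochs`, so this run stops well before one full epoch. The arithmetic below works out how much of the 262,023-sample dataset 100 steps cover, assuming a single device (the card reports a CPU build of PyTorch) and `gradient_accumulation_steps` of 1:

```python
import math

dataset_size = 262_023
per_device_train_batch_size = 32
max_steps = 100
num_devices = 1  # assumption: single-device training

samples_seen = per_device_train_batch_size * num_devices * max_steps
fraction_of_epoch = samples_seen / dataset_size
steps_per_full_epoch = math.ceil(
    dataset_size / (per_device_train_batch_size * num_devices)
)

print(f"samples seen:         {samples_seen}")
print(f"fraction of an epoch: {fraction_of_epoch:.4f}")
print(f"steps per full epoch: {steps_per_full_epoch}")
```

Under these assumptions the model sees 3,200 pairs, roughly 1.2% of the training set, compared with 8,189 steps for a complete epoch.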
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: 100
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}
</details>
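The listed `learning_rate` (5e-05), `lr_scheduler_type` (linear), `warmup_steps` (0), and `max_steps` (100) together imply a learning rate that decays linearly from its initial value to zero over the run. A small sketch of that schedule, written from the standard linear-with-warmup formula rather than taken from the training code (`linear_lr` is a hypothetical helper):

```python
def linear_lr(step, base_lr=5e-05, total_steps=100, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup (unused when warmup_steps=0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 25, 50, 75, 100):
    print(step, linear_lr(step))
```

With no warmup, the first update already uses the full 5e-05, and the rate is halved by step 50.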
### Framework Versions
- Python: 3.11.14
- Sentence Transformers: 5.2.0
- Transformers: 4.57.6
- PyTorch: 2.10.0+cpu
- Accelerate: 1.12.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->