hf-e5-bible-100 / README.md

	---
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- dense
	- generated_from_trainer
	- dataset_size:262023
	- loss:MultipleNegativesRankingLoss
	base_model: intfloat/e5-base-v2
	widget:
	- source_sentence: 'query: Handkerchief: Only once in Authorized Version (Acts 19:12).
	The Greek word (sudarion) so rendered means properly “a sweat-cloth.” It is rendered
	“napkin” in John 11:44; 20:7; Luke 19:20.'
	sentences:
	- 'passage: as well as the cloth that had been wrapped around Jesus’ head. The cloth
	was still lying in its place, separate from the linen.'
	- 'passage: “On that day I will make the clans of Judah like a firepot in a woodpile,
	like a flaming torch among sheaves. They will consume all the surrounding peoples
	right and left, but Jerusalem will remain intact in her place.'
	- 'passage: and the borders of Canaan reached from Sidon toward Gerar as far as
	Gaza, and then toward Sodom, Gomorrah, Admah and Zeboyim, as far as Lasha.'
	- source_sentence: 'query: what happened to Job'
	sentences:
	- "passage: Remember, O God, that my life is but a breath;\n my eyes will never\
	\ see happiness again."
	- 'passage: So he prepared a great feast for them, and after they had finished eating
	and drinking, he sent them away, and they returned to their master. So the bands
	from Aram stopped raiding Israel’s territory.'
	- 'passage: of Ater (through Hezekiah) 98'
	- source_sentence: 'query: what happened to Jesus'
	sentences:
	- 'passage: The Lord wrote on these tablets what he had written before, the Ten
	Commandments he had proclaimed to you on the mountain, out of the fire, on the
	day of the assembly. And the Lord gave them to me.'
	- 'passage: “Make a tree good and its fruit will be good, or make a tree bad and
	its fruit will be bad, for a tree is recognized by its fruit.'
	- 'passage: So Joshua and his whole army came against them suddenly at the Waters
	of Merom and attacked them,'
	- source_sentence: 'query: what is Games'
	sentences:
	- 'passage: In Hebron he reigned over Judah seven years and six months, and in Jerusalem
	he reigned over all Israel and Judah thirty-three years.'
	- 'passage: Their surrounding villages were Etam, Ain, Rimmon, Token and Ashan—five
	towns—'
	- 'passage: Fight the good fight of the faith. Take hold of the eternal life to
	which you were called when you made your good confession in the presence of many
	witnesses.'
	- source_sentence: 'query: God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name
	of the Divine Being. It is the rendering (1) of the Hebrew <i> ''El</i> , from
	a word meaning to be strong; (2) of <i> ''Eloah_, plural _''Elohim</i> . The singular
	form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly
	used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other
	word generally employed to denote the Supreme Being, is uniformly rendered in
	the Authorized Version by "LORD," printed in small capitals. The existence of
	God is taken for granted in the Bible. There is nowhere any argument to prove
	it. He who disbelieves this truth is spoken of as one devoid of understanding
	( Psalms 14:1 ). The arguments generally adduced by theologians in proof
	of the being of God are: <li> The a priori argument, which is the testimony
	afforded by reason. <li> The a posteriori argument, by which we proceed logically
	from the facts of experience to causes. These arguments are, (a) The cosmological,
	by which it is proved that there must be a First Cause of all things, for every
	effect must have a cause. (b) The teleological, or the argument from design.
	We see everywhere the operations of an intelligent Cause in nature. (c) The
	moral argument, called also the anthropological argument, based on the moral consciousness
	and the history of mankind, which exhibits a moral order and purpose which can
	only be explained on the supposition of the existence of God. Conscience and human
	history testify that "verily there is a God that judgeth in the earth." The
	attributes of God are set forth in order by Moses in Exodus 34:6 Exodus 34:7 .
	(see also Deuteronomy 6:4 ; 10:17 ; Numbers 16:22 ; Exodus 15:11 ; 33:19 ; Isaiah
	44:6 ; Habakkuk 3:6 ; Psalms 102:26 ; Job 34:12 .) They are also systematically
	classified in Revelation 5:12 and 7:12 . God''s attributes are spoken
	of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.;
	and relative, i.e., such as are ascribed to him with relation to his creatures.
	Others distinguish them into communicable, i.e., those which can be imparted in
	degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable,
	which cannot be so imparted: independence, immutability, immensity, and eternity.
	They are by some also divided into natural attributes, eternity, immensity, etc.;
	and moral, holiness, goodness, etc.'
	sentences:
	- 'passage: Then each man grabbed his opponent by the head and thrust his dagger
	into his opponent’s side, and they fell down together. So that place in Gibeon
	was called Helkath Hazzurim.'
	- 'passage: and I saw the glory of the God of Israel coming from the east. His voice
	was like the roar of rushing waters, and the land was radiant with his glory.'
	- "passage: How long, Lord, must I call for help,\n but you do not listen?\n\
	Or cry out to you, “Violence!”\n but you do not save?"
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	---

	# SentenceTransformer based on intfloat/e5-base-v2

	This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
	- Maximum Sequence Length: 256 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	<!-- - Training Dataset: Unknown -->
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	)
	```
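The last two modules in the architecture above are mean pooling over token vectors followed by L2 normalization. The following is a minimal pure-Python sketch of those two steps on toy data; the function names `mean_pool` and `l2_normalize` are hypothetical and this is not the library's implementation, which operates on batched tensors.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: three 2-dimensional token vectors; the last one is padding.
tokens = [[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)     # mean of the two unmasked vectors
embedding = l2_normalize(pooled)     # unit-length sentence embedding
print(pooled, embedding)
```

Because the output is unit length, cosine similarity between two embeddings reduces to a plain dot product.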

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("sentence_transformers_model_id")
	# Run inference
	sentences = [
	'query: God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> \'El</i> , from a word meaning to be strong; (2) of <i> \'Eloah_, plural _\'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: <li> The a priori argument, which is the testimony afforded by reason. <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) The cosmological, by which it is proved that there must be a First Cause of all things, for every effect must have a cause. (b) The teleological, or the argument from design. We see everywhere the operations of an intelligent Cause in nature. (c) The moral argument, called also the anthropological argument, based on the moral consciousness and the history of mankind, which exhibits a moral order and purpose which can only be explained on the supposition of the existence of God. Conscience and human history testify that "verily there is a God that judgeth in the earth." The attributes of God are set forth in order by Moses in Exodus 34:6 Exodus 34:7 . (see also Deuteronomy 6:4 ; 10:17 ; Numbers 16:22 ; Exodus 15:11 ; 33:19 ; Isaiah 44:6 ; Habakkuk 3:6 ; Psalms 102:26 ; Job 34:12 .) They are also systematically classified in Revelation 5:12 and 7:12 . God\'s attributes are spoken of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.; and relative, i.e., such as are ascribed to him with relation to his creatures. Others distinguish them into communicable, i.e., those which can be imparted in degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable, which cannot be so imparted: independence, immutability, immensity, and eternity. They are by some also divided into natural attributes, eternity, immensity, etc.; and moral, holiness, goodness, etc.',
	'passage: How long, Lord, must I call for help,\n but you do not listen?\nOr cry out to you, “Violence!”\n but you do not save?',
	'passage: Then each man grabbed his opponent by the head and thrust his dagger into his opponent’s side, and they fell down together. So that place in Gibeon was called Helkath Hazzurim.',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 768)

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities)
	# tensor([[1.0000, 0.4670, 0.3140],
	#         [0.4670, 1.0000, 0.4137],
	#         [0.3140, 0.4137, 1.0000]])
	```
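The widget and training samples in this card all carry the E5-style `query: ` and `passage: ` prefixes, so inputs at inference time should probably carry them too. The sketch below shows one way to add the prefixes and rank passages for a query. The helper names (`as_query`, `as_passage`, `rank_passages`) are hypothetical, and the toy vectors stand in for real `model.encode(...)` output so the example runs without downloading the model.

```python
def as_query(text):
    """Prefix a search query the way the training data does."""
    return f"query: {text}"

def as_passage(text):
    """Prefix a passage the way the training data does."""
    return f"passage: {text}"

def rank_passages(query_embedding, passage_embeddings):
    """Return passage indices sorted by dot product, best match first.

    Assumes unit-length embeddings, so the dot product is the cosine similarity.
    """
    scores = [
        sum(q * p for q, p in zip(query_embedding, emb))
        for emb in passage_embeddings
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

print(as_query("what happened to Job"))

# Toy unit-length vectors standing in for model.encode(...) output.
query_vec = [1.0, 0.0]
passage_vecs = [[0.0, 1.0], [0.8, 0.6], [0.6, 0.8]]
order, scores = rank_passages(query_vec, passage_vecs)
print(order)
```

With the real model, `query_vec` and `passage_vecs` would come from encoding the prefixed strings; `model.similarity` from the example above computes the same scores in one call.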

	## Training Details

	### Training Dataset

	#### Unnamed Dataset

	* Size: 262,023 training samples
	* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
	* Approximate statistics based on the first 1000 samples:
	| | sentence_0 | sentence_1 | label |
	|:--------|:--------|:--------|:--------|
	| type | string | string | float |
	| details | <ul><li>min: 5 tokens</li><li>mean: 27.82 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 35.93 tokens</li><li>max: 87 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
	* Samples:
	| sentence_0 | sentence_1 | label |
	|:--------|:--------|:--------|
	| <code>query: To those who sold doves he said, “Get these out of here! Stop turning my Father’s house into a market!”</code> | <code>passage: His disciples remembered that it is written: “Zeal for your house will consume me.”</code> | <code>1.0</code> |
	| <code>query: Joseph (son of Jacob)</code> | <code>passage: Joseph found favor in his eyes and became his attendant. Potiphar put him in charge of his household, and he entrusted to his care everything he owned.</code> | <code>1.0</code> |
	| <code>query: Divination meaning</code> | <code>passage: He sacrificed his children in the fire in the Valley of Ben Hinnom, practiced divination and witchcraft, sought omens, and consulted mediums and spiritists. He did much evil in the eyes of the Lord, arousing his anger.</code> | <code>1.0</code> |
	* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
	```json
	{
	    "scale": 20.0,
	    "similarity_fct": "cos_sim",
	    "gather_across_devices": false
	}
	```
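To make the loss parameters above concrete, here is a minimal pure-Python sketch of the in-batch-negatives objective: each query is scored against every passage in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the query's own passage as the correct class. The function names are hypothetical and the toy embeddings are made up; the library's implementation works on batched tensors.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(query_embs, passage_embs, scale=20.0):
    """Mean cross-entropy where row i's correct class is passage i."""
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [scale * cos_sim(q, p) for p in passage_embs]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(query_embs)

# Toy batch of two pairs: perfectly aligned vs. deliberately swapped.
aligned = [[1.0, 0.0], [0.0, 1.0]]
loss_good = mnr_loss(aligned, aligned)        # near 0: each query matches its own passage
loss_bad = mnr_loss(aligned, aligned[::-1])   # near 20: each query matches the wrong passage
print(loss_good, loss_bad)
```

Every other passage in the batch acts as a negative, which is why the batch size of 32 used here also sets the number of negatives per query (31).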

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 32
	- `num_train_epochs`: 1
	- `max_steps`: 100
	- `multi_dataset_batch_sampler`: round_robin
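In Transformers' `TrainingArguments`, a positive `max_steps` overrides `num_train_epochs`, so this run stops after 100 optimizer steps rather than a full epoch. The arithmetic below estimates how much of the 262,023-sample dataset was seen; the single-device assumption (`num_devices = 1`) is not stated in this card and is labeled as such.

```python
dataset_size = 262_023            # training samples, from the dataset section
per_device_train_batch_size = 32  # from the hyperparameters
max_steps = 100                   # from the hyperparameters
gradient_accumulation_steps = 1   # from the full hyperparameter list
num_devices = 1                   # assumption: single-device training

samples_seen = (
    max_steps
    * per_device_train_batch_size
    * gradient_accumulation_steps
    * num_devices
)
fraction_of_epoch = samples_seen / dataset_size

print(samples_seen)                 # 3200
print(f"{fraction_of_epoch:.2%}")   # 1.22%
```

Under that assumption the model saw roughly 1.2% of the training pairs, so it is best read as a lightly adapted version of the base model.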

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: no
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 32
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 1
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 5e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1
	- `num_train_epochs`: 1
	- `max_steps`: 100
	- `lr_scheduler_type`: linear
	- `lr_scheduler_kwargs`: None
	- `warmup_ratio`: 0.0
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `bf16`: False
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: None
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: False
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `parallelism_config`: None
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch_fused
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `project`: huggingface
	- `trackio_space_id`: trackio
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: None
	- `hub_always_push`: False
	- `hub_revision`: None
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `include_for_metrics`: []
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: no
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `use_liger_kernel`: False
	- `liger_kernel_config`: None
	- `eval_use_gather_object`: False
	- `average_tokens_across_devices`: True
	- `prompts`: None
	- `batch_sampler`: batch_sampler
	- `multi_dataset_batch_sampler`: round_robin
	- `router_mapping`: {}
	- `learning_rate_mapping`: {}

	</details>
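The full list specifies `lr_scheduler_type: linear` with `warmup_steps: 0`, `learning_rate: 5e-05`, and `max_steps: 100`. A minimal sketch of what that schedule implies, written as a standalone function (the name `linear_lr` is hypothetical, and this mirrors the usual linear-decay-to-zero rule rather than quoting the trainer's code):

```python
def linear_lr(step, base_lr=5e-05, total_steps=100, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 50, 100):
    print(step, linear_lr(step))
```

With no warmup, the learning rate starts at its peak on the first step and reaches zero at step 100.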

	### Framework Versions
	- Python: 3.11.14
	- Sentence Transformers: 5.2.0
	- Transformers: 4.57.6
	- PyTorch: 2.10.0+cpu
	- Accelerate: 1.12.0
	- Datasets: 4.5.0
	- Tokenizers: 0.22.2

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	    author = "Reimers, Nils and Gurevych, Iryna",
	    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	    month = "11",
	    year = "2019",
	    publisher = "Association for Computational Linguistics",
	    url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MultipleNegativesRankingLoss
	```bibtex
	@misc{henderson2017efficient,
	    title={Efficient Natural Language Response Suggestion for Smart Reply},
	    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
	    year={2017},
	    eprint={1705.00652},
	    archivePrefix={arXiv},
	    primaryClass={cs.CL}
	}
	```
