---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:70323
- loss:CosineSimilarityLoss
base_model: intfloat/e5-base-v2
widget:
- source_sentence: 'Birth of Cainan | participants: cainan_534, enos_1193'
sentences:
- >-
The mother of Sisera looked out at a window, and cried through the
lattice, Why is his chariot so long in coming? why tarry the wheels of
his chariots?
- >-
Therefore, behold, the days come, that I will do judgment upon the
graven images of Babylon: and her whole land shall be confounded, and
all her slain shall fall in the midst of her.
- >-
Which was the son of Mathusala, which was the son of Enoch, which was
the son of Jared, which was the son of Maleleel, which was the son of
Cainan,
- source_sentence: >-
Jerusalem Council | participants: silas_2740, judas_1759, james_719,
peter_2745, barnabas_1722, paul_2479
sentences:
- >-
What ailed thee, O thou sea, that thou fleddest? thou Jordan, that thou
wast driven back?
- >-
We have sent therefore Judas and Silas, who shall also tell you the same
things by mouth.
- >-
The Spirit itself beareth witness with our spirit, that we are the
children of God:
- source_sentence: >-
But he that is married careth for the things that are of the world, how he
may please his wife.
sentences:
- >-
But she had brought them up to the roof of the house, and hid them with
the stalks of flax, which she had laid in order upon the roof.
- >-
And their whole body, and their backs, and their hands, and their wings,
and the wheels, were full of eyes round about, even the wheels that they
four had.
- >-
There is difference also between a wife and a virgin. The unmarried
woman careth for the things of the Lord, that she may be holy both in
body and in spirit: but she that is married careth for the things of the
world, how she may please her husband.
- source_sentence: And the little owl, and the cormorant, and the great owl,
sentences:
- And the swan, and the pelican, and the gier eagle,
- >-
Take Aaron and his sons with him, and the garments, and the anointing
oil, and a bullock for the sin offering, and two rams, and a basket of
unleavened bread;
- >-
And his power shall be mighty, but not by his own power: and he shall
destroy wonderfully, and shall prosper, and practise, and shall destroy
the mighty and the holy people.
- source_sentence: John's Witness
sentences:
- >-
And they asked him, and said unto him, Why baptizest thou then, if thou
be not that Christ, nor Elias, neither that prophet?
- >-
Then I took Jaazaniah the son of Jeremiah, the son of Habaziniah, and
his brethren, and all his sons, and the whole house of the Rechabites;
- >-
But he turned, and said unto Peter, Get thee behind me, Satan: thou art
an offence unto me: for thou savourest not the things that be of God,
but those that be of men.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on intfloat/e5-base-v2
This is a sentence-transformers model finetuned from intfloat/e5-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: intfloat/e5-base-v2
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
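The last two modules in the stack can be illustrated without running the model. The sketch below (using random stand-in tensors, not a real forward pass) shows what mean pooling over the attention mask and the final `Normalize()` step compute:

```python
import numpy as np

# Hypothetical token embeddings and attention mask, standing in for the
# Transformer module's output. Shapes follow the architecture above.
rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((1, 6, 768))   # (batch, seq_len, hidden)
attention_mask = np.array([[1, 1, 1, 1, 0, 0]])       # last two tokens are padding

mask = attention_mask[..., None].astype(float)        # (batch, seq_len, 1)
summed = (token_embeddings * mask).sum(axis=1)        # sum over real tokens only
counts = mask.sum(axis=1).clip(min=1e-9)              # number of real tokens
mean_pooled = summed / counts                         # pooling_mode_mean_tokens
embedding = mean_pooled / np.linalg.norm(mean_pooled, axis=1, keepdims=True)  # Normalize()

print(embedding.shape)                                # (1, 768)
```

Because of the final normalization, every embedding has unit L2 norm, which is why cosine similarity between embeddings reduces to a dot product.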
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "John's Witness",
    'And they asked him, and said unto him, Why baptizest thou then, if thou be not that Christ, nor Elias, neither that prophet?',
    'Then I took Jaazaniah the son of Jeremiah, the son of Habaziniah, and his brethren, and all his sons, and the whole house of the Rechabites;',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7469, 0.7488],
#         [0.7469, 1.0000, 0.8236],
#         [0.7488, 0.8236, 1.0000]])
```
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 70,323 training samples
- Columns: `sentence_0`, `sentence_1`, and `label`
- Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string | string | float |
  | details | min: 3 tokens<br>mean: 53.54 tokens<br>max: 128 tokens | min: 5 tokens<br>mean: 35.99 tokens<br>max: 85 tokens | min: 0.0<br>mean: 0.99<br>max: 1.0 |
- Samples:

  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | <code>Prophecies of Jeremiah \| participants: jeremiah_853</code> | <code>In his days Judah shall be saved, and Israel shall dwell safely: and this is his name whereby he shall be called, The Lord Our Righteousness.</code> | <code>1.0</code> |
  | <code>God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew 'El, from a word meaning to be strong; (2) of 'Eloah, plural 'Elohim. The singular form, Eloah, is used only in poetry. The plural form is more commonly used in all parts of the Bible. The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding (Psalms 14:1). The arguments generally adduced by theologians in proof of the being of God are: The a priori argument, which is the testimony afforded by reason. The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) T...</code> | <code>And if ye offer the blind for sacrifice, is it not evil? and if ye offer the lame and sick, is it not evil? offer it now unto thy governor; will he be pleased with thee, or accept thy person? saith the Lord of hosts.</code> | <code>1.0</code> |
  | <code>Holy Week</code> | <code>For in those days shall be affliction, such as was not from the beginning of the creation which God created unto this time, neither shall be.</code> | <code>1.0</code> |
- Loss: `CosineSimilarityLoss` with these parameters:

  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
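A minimal sketch of what `CosineSimilarityLoss` with an `MSELoss` loss function computes: the cosine similarity between the two sentence embeddings, regressed onto the float label. The embeddings below are random stand-ins for illustration, not real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal((2, 768))   # embeddings of a sentence_0 batch
v = rng.standard_normal((2, 768))   # embeddings of a sentence_1 batch
labels = np.array([1.0, 0.0])       # target similarity scores

# Cosine similarity per pair, then mean squared error against the labels
cos = (u * v).sum(axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
loss = np.mean((cos - labels) ** 2)
print(loss >= 0.0)  # True
```

During training, gradients of this loss pull each pair's cosine similarity toward its label.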
### Training Hyperparameters

#### Non-Default Hyperparameters

- `num_train_epochs`: 1
- `max_steps`: 50
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: 50
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
### Framework Versions
- Python: 3.11.14
- Sentence Transformers: 5.2.0
- Transformers: 4.57.6
- PyTorch: 2.10.0+cpu
- Accelerate: 1.12.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```