SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
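
The last two modules, mean-token pooling followed by normalization, are simple enough to sketch in plain Python. This is a minimal illustration rather than the library's implementation: `mean_pool` and `l2_normalize` are hypothetical helper names, and the token vectors are made-up 3-dimensional values instead of real 384-dimensional BertModel outputs.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the vectors of non-padding tokens (mask value 1)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, as a normalization step would."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy token embeddings: two real tokens followed by one padding position.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 6.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)
```

Because the result has unit length, the cosine similarity between two such embeddings reduces to a dot product.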

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-4-epochs")
# Run inference
sentences = [
    "What is the process called that can increase solar energy in areas further away from the earth's equator?",
    'Geography effects solar energy potential because areas that are closer to the equator have a greater amount of solar radiation. However, the use of photovoltaics that can follow the position of the sun can significantly increase the solar energy potential in areas that are farther from the equator. Time variation effects the potential of solar energy because during the nighttime there is little solar radiation on the surface of the Earth for solar panels to absorb. This limits the amount of energy that solar panels can absorb in one day. Cloud cover can effect the potential of solar panels because clouds block incoming light from the sun and reduce the light available for solar cells.',
    'Geography effects solar energy potential because areas that are closer to the equator have a greater amount of solar radiation. However, the use of photovoltaics that can follow the position of the sun can significantly increase the solar energy potential in areas that are farther from the equator. Time variation effects the potential of solar energy because during the nighttime there is little solar radiation on the surface of the Earth for solar panels to absorb. This limits the amount of energy that solar panels can absorb in one day. Cloud cover can effect the potential of solar panels because clouds block incoming light from the sun and reduce the light available for solar cells.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
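
For semantic search, a typical next step is to rank candidate passages against a query using one row of the similarity matrix. The sketch below assumes the scores have already been converted to a plain Python list. `rank_passages` is a hypothetical helper, and the score values are invented for illustration; they are not output from this model.

```python
def rank_passages(scores, top_k=2):
    """Return (index, score) pairs sorted from most to least similar."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Invented similarity scores between one query and three passages.
query_scores = [0.12, 0.71, 0.35]

for index, score in rank_passages(query_scores):
    print(f"passage {index}: {score:.2f}")
```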

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.4092
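
For a triplet evaluation, cosine accuracy is commonly read as the fraction of (question, context, negative) triplets in which the question is more similar to its context than to the negative. The sketch below illustrates that reading with invented 2-dimensional vectors. `cosine` and `triplet_cosine_accuracy` are hypothetical helpers, not the evaluator's actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_cosine_accuracy(triplets):
    """Share of triplets where the positive beats the negative (strict)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)

# Invented (anchor, positive, negative) vectors.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer
    ([1.0, 0.0], [0.0, 1.0], [0.8, 0.2]),   # negative is closer
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),  # positive is closer
]
print(triplet_cosine_accuracy(triplets))
```

Under this reading, the reported 0.4092 would mean that roughly 41% of evaluation triplets were ranked correctly. If the comparison is strict, as assumed in this sketch, a triplet whose negative is identical to its context can never count as correct.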

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,285 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
    column    type    min        mean           max
    question  string  6 tokens   14.6 tokens    40 tokens
    context   string  32 tokens  144.83 tokens  256 tokens
    negative  string  29 tokens  151.57 tokens  256 tokens
  • Samples:
    question context negative
    What are two cellular organelles which contain genetic material? Some organisms have multiple copies of chromosomes: diploid, triploid, tetraploid and so on. In classical genetics, in a sexually reproducing organism (typically eukarya) the gamete has half the number of chromosomes of the somatic cell and the genome is a full set of chromosomes in a diploid cell. The halving of the genetic material in gametes is accomplished by the segregation of homologous chromosomes during meiosis. In haploid organisms, including cells of bacteria, archaea, and in organelles including mitochondria and chloroplasts, or viruses, that similarly contain genes, the single or set of circular or linear chains of DNA (or RNA for some viruses), likewise constitute the genome. The term genome can be applied specifically to mean what is stored on a complete set of nuclear DNA (i.e., the "nuclear genome") but can also be applied to what is stored within organelles that contain their own DNA, as with the "mitochondrial genome" or the "chloroplast genome". Additionally, the gen... In modern molecular biology and genetics, the genome is the genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes both the genes and the non-coding sequences of the DNA/RNA.
    The modern Theravada school stablished itself in what country? Only the Theravada school does not include the Mahayana scriptures in its canon. As the modern Theravada school is descended from a branch of Buddhism that diverged and established itself in Sri Lanka prior to the emergence of the Mahayana texts, debate exists as to whether the Theravada were historically included in the hinayana designation; in the modern era, this label is seen as derogatory, and is generally avoided. Theravada ("Doctrine of the Elders", or "Ancient Doctrine") is the oldest surviving Buddhist school. It is relatively conservative, and generally closest to early Buddhism. The name Theravāda comes from the ancestral Sthāvirīya, one of the early Buddhist schools, from which the Theravadins claim descent. After unsuccessfully trying to modify the Vinaya, a small group of "elderly members", i.e. sthaviras, broke away from the majority Mahāsāṃghika during the Second Buddhist council, giving rise to the Sthavira sect. Sinhalese Buddhist reformers in the late nineteenth and early twentieth centuries portrayed the Pali Canon as the original version of scripture. They also emphasized Theravada being rational and scientific.
    Where in Antarctica has warming been noticed? Some of Antarctica has been warming up; particularly strong warming has been noted on the Antarctic Peninsula. A study by Eric Steig published in 2009 noted for the first time that the continent-wide average surface temperature trend of Antarctica is slightly positive at >0.05 °C (0.09 °F) per decade from 1957 to 2006. This study also noted that West Antarctica has warmed by more than 0.1 °C (0.2 °F) per decade in the last 50 years, and this warming is strongest in winter and spring. This is partly offset by autumn cooling in East Antarctica. There is evidence from one study that Antarctica is warming as a result of human carbon dioxide emissions, but this remains ambiguous. The amount of surface warming in West Antarctica, while large, has not led to appreciable melting at the surface, and is not directly affecting the West Antarctic Ice Sheet's contribution to sea level. Instead the recent increases in glacier outflow are believed to be due to an inflow of warm water from the deep oc... Antarctica is colder than the Arctic for three reasons. First, much of the continent is more than 3,000 m (9,800 ft) above sea level, and temperature decreases with elevation in the troposphere. Second, the Arctic Ocean covers the north polar zone: the ocean's relative warmth is transferred through the icepack and prevents temperatures in the Arctic regions from reaching the extremes typical of the land surface of Antarctica. Third, the Earth is at aphelion in July (i.e., the Earth is farthest from the Sun in the Antarctic winter), and the Earth is at perihelion in January (i.e., the Earth is closest to the Sun in the Antarctic summer). The orbital distance contributes to a colder Antarctic winter (and a warmer Antarctic summer) but the first two effects have more impact.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
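
As a rough illustration of what these parameters mean: this loss is usually described as a cross-entropy over scaled cosine similarities, where each question should score its own context higher than every other candidate in the batch. The sketch below follows that description with `scale = 20.0` and invented 2-dimensional vectors. `mnr_loss` and `cosine` are hypothetical names, and this is not the library's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, candidates, scale=20.0):
    """Mean cross-entropy where candidates[i] is the match for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, cand) for cand in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum - logits[i]
    return total / len(anchors)

# Invented vectors: two questions, their two contexts, one extra negative.
questions = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.7]]

print(mnr_loss(questions, candidates))
```

A larger scale sharpens the softmax, so a candidate that is only slightly more similar than the correct context is penalized more heavily.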
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
    column      type    min        mean           max
    question    string  3 tokens   14.63 tokens   39 tokens
    context     string  32 tokens  145.56 tokens  256 tokens
    negative_1  string  32 tokens  142.88 tokens  256 tokens
  • Samples:
    question context negative_1
    What weather condition increased? Sea levels began to rise during the Jurassic, which was probably caused by an increase in seafloor spreading. The formation of new crust beneath the surface displaced ocean waters by as much as 200 m (656 ft) more than today, which flooded coastal areas. Furthermore, Pangaea began to rift into smaller divisions, bringing more land area in contact with the ocean by forming the Tethys Sea. Temperatures continued to increase and began to stabilize. Humidity also increased with the proximity of water, and deserts retreated. Oklahoma City has a humid subtropical climate (Köppen: Cfa), with frequent variations in weather daily and seasonally, except during the consistently hot and humid summer months. Prolonged and severe droughts (sometimes leading to wildfires in the vicinity) as well as very heavy rainfall leading to flash flooding and flooding occur with some regularity. Consistent winds, usually from the south or south-southeast during the summer, help temper the hotter weather. Consistent northerly winds during the winter can intensify cold periods. Severe ice storms and snowstorms happen sporadically during the winter.
    When is Oklahoma city sever weather season begin? Oklahoma City has a very active severe weather season from March through June, especially during April and May. Being in the center of what is colloquially referred to as Tornado Alley, it is prone to especially frequent and severe tornadoes, as well as very severe hailstorms and occasional derechoes. Tornadoes have occurred in every month of the year and a secondary smaller peak also occurs during autumn, especially October. The Oklahoma City metropolitan area is one of the most tornado-prone major cities in the world, with about 150 tornadoes striking within the city limits since 1890. Since the time weather records have been kept, Oklahoma City has been struck by thirteen violent tornadoes, eleven F/EF4s and two F/EF5. On May 3, 1999 parts of southern Oklahoma City and nearby suburban communities suffered from one of the most powerful tornadoes on record, an F5 on the Fujita scale, with wind speeds estimated by radar at 318 mph (510 km/h). On May 20, 2013, far southwest Oklahoma Cit... Oklahoma City has a very active severe weather season from March through June, especially during April and May. Being in the center of what is colloquially referred to as Tornado Alley, it is prone to especially frequent and severe tornadoes, as well as very severe hailstorms and occasional derechoes. Tornadoes have occurred in every month of the year and a secondary smaller peak also occurs during autumn, especially October. The Oklahoma City metropolitan area is one of the most tornado-prone major cities in the world, with about 150 tornadoes striking within the city limits since 1890. Since the time weather records have been kept, Oklahoma City has been struck by thirteen violent tornadoes, eleven F/EF4s and two F/EF5. On May 3, 1999 parts of southern Oklahoma City and nearby suburban communities suffered from one of the most powerful tornadoes on record, an F5 on the Fujita scale, with wind speeds estimated by radar at 318 mph (510 km/h). On May 20, 2013, far southwest Oklahoma Cit...
    What is the most intensive type of enclosure system used in the poultry business? In free-range husbandry, the birds can roam freely outdoors for at least part of the day. Often, this is in large enclosures, but the birds have access to natural conditions and can exhibit their normal behaviours. A more intensive system is yarding, in which the birds have access to a fenced yard and poultry house at a higher stocking rate. Poultry can also be kept in a barn system, with no access to the open air, but with the ability to move around freely inside the building. The most intensive system for egg-laying chickens is battery cages, often set in multiple tiers. In these, several birds share a small cage which restricts their ability to move around and behave in a normal manner. The eggs are laid on the floor of the cage and roll into troughs outside for ease of collection. Battery cages for hens have been illegal in the EU since January 1, 2012. In free-range husbandry, the birds can roam freely outdoors for at least part of the day. Often, this is in large enclosures, but the birds have access to natural conditions and can exhibit their normal behaviours. A more intensive system is yarding, in which the birds have access to a fenced yard and poultry house at a higher stocking rate. Poultry can also be kept in a barn system, with no access to the open air, but with the ability to move around freely inside the building. The most intensive system for egg-laying chickens is battery cages, often set in multiple tiers. In these, several birds share a small cage which restricts their ability to move around and behave in a normal manner. The eggs are laid on the floor of the cage and roll into troughs outside for ease of collection. Battery cages for hens have been illegal in the EU since January 1, 2012.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
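
Together with the 44,285 training samples reported for the training dataset, these settings imply the step counts that appear in the training logs. The arithmetic below is a back-of-the-envelope sketch that assumes a single device, no dropped last batch, and warmup steps rounded up. The variable names are illustrative.

```python
import math

train_samples = 44_285
batch_size = 128      # per_device_train_batch_size
epochs = 4            # num_train_epochs
warmup_ratio = 0.1

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # rounding is an assumption

print(steps_per_epoch, total_steps, warmup_steps)

# Fractional epoch reached after 100 optimizer steps.
print(round(100 / steps_per_epoch, 4))
```

The fractional epoch computed for step 100 agrees with the first logged row (epoch 0.2890 at step 100).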

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  Validation Loss  gooqa-dev_cosine_accuracy
-1      -1    -              -                0.3286
0.2890  100   0.4345         0.7989           0.3810
0.5780  200   0.4048         0.7681           0.4030
0.8671  300   0.3749         0.7587           0.4074
1.1561  400   0.3187         0.7573           0.4056
1.4451  500   0.2756         0.7515           0.4042
1.7341  600   0.2765         0.7450           0.4084
2.0231  700   0.2666         0.7468           0.4088
2.3121  800   0.2031         0.7424           0.4102
2.6012  900   0.215          0.7426           0.4132
2.8902  1000  0.2107         0.7423           0.4070
3.1792  1100  0.1866         0.7413           0.4128
3.4682  1200  0.1737         0.7453           0.4068
3.7572  1300  0.1707         0.7410           0.4108
-1      -1    -              -                0.4092

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model Size

  • Parameters: 22.7M
  • Tensor type: F32 (Safetensors)