SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-all-MiniLM-L6-v2-squad-5-epochs")
# Run inference
sentences = [
    'Is the strength of the modulus of rupture or elasticity increased more when wood is dried?',
    'The greatest strength increase due to drying is in the ultimate crushing strength, and strength at elastic limit in endwise compression; these are followed by the modulus of rupture, and stress at elastic limit in cross-bending, while the modulus of elasticity is least affected.',
    'The greatest strength increase due to drying is in the ultimate crushing strength, and strength at elastic limit in endwise compression; these are followed by the modulus of rupture, and stress at elastic limit in cross-bending, while the modulus of elasticity is least affected.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.4106

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,287 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
    question context negative
    type string string string
    details
    • min: 6 tokens
    • mean: 14.53 tokens
    • max: 39 tokens
    • min: 28 tokens
    • mean: 149.98 tokens
    • max: 256 tokens
    • min: 30 tokens
    • mean: 146.58 tokens
    • max: 256 tokens
  • Samples:
    question context negative
    When is a beer at its most flavorful? Drinking chilled beer began with the development of artificial refrigeration and by the 1870s, was spread in those countries that concentrated on brewing pale lager. Chilling beer makes it more refreshing, though below 15.5 °C the chilling starts to reduce taste awareness and reduces it significantly below 10 °C (50 °F). Beer served unchilled—either cool or at room temperature, reveal more of their flavours. Cask Marque, a non-profit UK beer organisation, has set a temperature standard range of 12°–14 °C (53°–57 °F) for cask ales to be served. The process of making beer is known as brewing. A dedicated building for the making of beer is called a brewery, though beer can be made in the home and has been for much of its history. A company that makes beer is called either a brewery or a brewing company. Beer made on a domestic scale for non-commercial reasons is classified as homebrewing regardless of where it is made, though most homebrewed beer is made in the home. Brewing beer is subject to legislation and taxation in developed countries, which from the late 19th century largely restricted brewing to a commercial operation only. However, the UK government relaxed legislation in 1963, followed by Australia in 1972 and the US in 1978, allowing homebrewing to become a popular hobby.
    When was the BeiDou-1C satellite launched? The first satellite, BeiDou-1A, was launched on 30 October 2000, followed by BeiDou-1B on 20 December 2000. The third satellite, BeiDou-1C (a backup satellite), was put into orbit on 25 May 2003. The successful launch of BeiDou-1C also meant the establishment of the BeiDou-1 navigation system. The first satellite, BeiDou-1A, was launched on October 31, 2000. The second satellite, BeiDou-1B, was successfully launched on December 21, 2000. The last operational satellite of the constellation, BeiDou-1C, was launched on May 25, 2003.
    The hotel that sold for the most money in 2014 was which in NYC? Manhattan was on track to have an estimated 90,000 hotel rooms at the end of 2014, a 10% increase from 2013. In October 2014, the Anbang Insurance Group, based in China, purchased the Waldorf Astoria New York for US$1.95 billion, making it the world's most expensive hotel ever sold. Real estate is a major force in the city's economy, as the total value of all New York City property was assessed at US$914.8 billion for the 2015 fiscal year. The Time Warner Center is the property with the highest-listed market value in the city, at US$1.1 billion in 2006. New York City is home to some of the nation's—and the world's—most valuable real estate. 450 Park Avenue was sold on July 2, 2007 for US$510 million, about $1,589 per square foot ($17,104/m²), breaking the barely month-old record for an American office building of $1,476 per square foot ($15,887/m²) set in the June 2007 sale of 660 Madison Avenue. According to Forbes, in 2014, Manhattan was home to six of the top ten zip codes in the United States by median housing price.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
    question context negative_1
    type string string string
    details
    • min: 7 tokens
    • mean: 14.65 tokens
    • max: 31 tokens
    • min: 31 tokens
    • mean: 147.8 tokens
    • max: 256 tokens
    • min: 28 tokens
    • mean: 147.17 tokens
    • max: 256 tokens
  • Samples:
    question context negative_1
    What percentage of biodiversity has the planet lost since 1970 In absolute terms, the planet has lost 52% of its biodiversity since 1970 according to a 2014 study by the World Wildlife Fund. The Living Planet Report 2014 claims that "the number of mammals, birds, reptiles, amphibians and fish across the globe is, on average, about half the size it was 40 years ago". Of that number, 39% accounts for the terrestrial wildlife gone, 39% for the marine wildlife gone, and 76% for the freshwater wildlife gone. Biodiversity took the biggest hit in Latin America, plummeting 83 percent. High-income countries showed a 10% increase in biodiversity, which was canceled out by a loss in low-income countries. This is despite the fact that high-income countries use five times the ecological resources of low-income countries, which was explained as a result of process whereby wealthy nations are outsourcing resource depletion to poorer nations, which are suffering the greatest ecosystem losses. In absolute terms, the planet has lost 52% of its biodiversity since 1970 according to a 2014 study by the World Wildlife Fund. The Living Planet Report 2014 claims that "the number of mammals, birds, reptiles, amphibians and fish across the globe is, on average, about half the size it was 40 years ago". Of that number, 39% accounts for the terrestrial wildlife gone, 39% for the marine wildlife gone, and 76% for the freshwater wildlife gone. Biodiversity took the biggest hit in Latin America, plummeting 83 percent. High-income countries showed a 10% increase in biodiversity, which was canceled out by a loss in low-income countries. This is despite the fact that high-income countries use five times the ecological resources of low-income countries, which was explained as a result of process whereby wealthy nations are outsourcing resource depletion to poorer nations, which are suffering the greatest ecosystem losses.
    What is the per capita income of Delhi as of 2013? New Delhi is the largest commercial city in northern India. It has an estimated net State Domestic Product (FY 2010) of ₹1595 billion (US$23 billion) in nominal terms and ~₹6800 billion (US$100 billion) in PPP terms. As of 2013, the per capita income of Delhi was Rs. 230000, second highest in India after Goa. GSDP in Delhi at the current prices for 2012-13 is estimated at Rs 3.88 trillion (short scale) against Rs 3.11 trillion (short scale) in 2011-12. The Government of National Capital Territory of Delhi does not release any economic figures specifically for New Delhi but publishes an official economic report on the whole of Delhi annually. According to the Economic Survey of Delhi, the metropolis has a net State Domestic Product (SDP) of Rs. 83,085 crores (for the year 2004–05) and a per capita income of Rs. 53,976($1,200). In the year 2008–09 New Delhi had a Per Capita Income of Rs.1,16,886 ($2,595).It grew by 16.2% to reach Rs.1,35,814 ($3,018) in 2009–10 fiscal. New Delhi's Per Capita GDP (at PPP) was at $6,860 during 2009–10 fiscal, making it one of the richest cities in India. The tertiary sector contributes 78.4% of Delhi's gross SDP followed by secondary and primary sectors with 20.2% and 1.4% contribution respectively.
    Which Prussian general commanded the attack against the French at St. Privat? By 16:50, with the Prussian southern attacks in danger of breaking up, the Prussian 3rd Guards Infantry Brigade of the Second Army opened an attack against the French positions at St. Privat which were commanded by General Canrobert. At 17:15, the Prussian 4th Guards Infantry Brigade joined the advance followed at 17:45 by the Prussian 1st Guards Infantry Brigade. All of the Prussian Guard attacks were pinned down by lethal French gunfire from the rifle pits and trenches. At 18:15 the Prussian 2nd Guards Infantry Brigade, the last of the 1st Guards Infantry Division, was committed to the attack on St. Privat while Steinmetz committed the last of the reserves of the First Army across the Mance Ravine. By 18:30, a considerable portion of the VII and VIII Corps disengaged from the fighting and withdrew towards the Prussian positions at Rezonville. With the defeat of the First Army, Prince Frederick Charles ordered a massed artillery attack against Canrobert's position at St. Privat to prevent the Guards attack from failing too. At 19:00 the 3rd Division of Fransecky's II Corps of the Second Army advanced across Ravine while the XII Corps cleared out the nearby town of Roncourt and with the survivors of the 1st Guards Infantry Division launched a fresh attack against the ruins of St. Privat. At 20:00, the arrival of the Prussian 4th Infantry Division of the II Corps and with the Prussian right flank on Mance Ravine, the line stabilised. By then, the Prussians of the 1st Guards Infantry Division and the XII and II Corps captured St. Privat forcing the decimated French forces to withdraw. With the Prussians exhausted from the fighting, the French were now able to mount a counter-attack. General Bourbaki, however, refused to commit the reserves of the French Old Guard to the battle because, by that time, he considered the overall si...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooqa-dev_cosine_accuracy
-1 -1 - - 0.3282
0.5780 100 0.548 0.8841 0.3910
1.1561 200 0.4777 0.8530 0.4024
1.7341 300 0.4008 0.8470 0.4040
2.3121 400 0.3563 0.8390 0.4090
2.8902 500 0.3234 0.8276 0.4064
3.4682 600 0.2866 0.8325 0.4068
4.0462 700 0.274 0.8348 0.4028
4.6243 800 0.2512 0.8284 0.4062
-1 -1 - - 0.4106

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Downloads last month
1
Safetensors
Model size
22.7M params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ayushexel/emb-all-MiniLM-L6-v2-squad-5-epochs

Finetuned
(820)
this model

Papers for ayushexel/emb-all-MiniLM-L6-v2-squad-5-epochs

Evaluation results