SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-all-MiniLM-L6-v2-squad-5-epochs")
# Run inference
sentences = [
    'Is the strength of the modulus of rupture or elasticity increased more when wood is dried?',
    'The greatest strength increase due to drying is in the ultimate crushing strength, and strength at elastic limit in endwise compression; these are followed by the modulus of rupture, and stress at elastic limit in cross-bending, while the modulus of elasticity is least affected.',
    'The greatest strength increase due to drying is in the ultimate crushing strength, and strength at elastic limit in endwise compression; these are followed by the modulus of rupture, and stress at elastic limit in cross-bending, while the modulus of elasticity is least affected.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Dataset: gooqa-dev
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.4106

Training Details

Training Dataset

Unnamed Dataset

Size: 44,287 training samples
Columns: question, context, and negative

Approximate statistics based on the first 1000 samples:

	question	context	negative
type	string	string	string
details	min: 6 tokens mean: 14.53 tokens max: 39 tokens	min: 28 tokens mean: 149.98 tokens max: 256 tokens	min: 30 tokens mean: 146.58 tokens max: 256 tokens

Samples:

question	context	negative
`When is a beer at its most flavorful?`	Drinking chilled beer began with the development of artificial refrigeration and by the 1870s, was spread in those countries that concentrated on brewing pale lager. Chilling beer makes it more refreshing, though below 15.5 °C the chilling starts to reduce taste awareness and reduces it significantly below 10 °C (50 °F). Beer served unchilled—either cool or at room temperature, reveal more of their flavours. Cask Marque, a non-profit UK beer organisation, has set a temperature standard range of 12°–14 °C (53°–57 °F) for cask ales to be served.	The process of making beer is known as brewing. A dedicated building for the making of beer is called a brewery, though beer can be made in the home and has been for much of its history. A company that makes beer is called either a brewery or a brewing company. Beer made on a domestic scale for non-commercial reasons is classified as homebrewing regardless of where it is made, though most homebrewed beer is made in the home. Brewing beer is subject to legislation and taxation in developed countries, which from the late 19th century largely restricted brewing to a commercial operation only. However, the UK government relaxed legislation in 1963, followed by Australia in 1972 and the US in 1978, allowing homebrewing to become a popular hobby.
`When was the BeiDou-1C satellite launched?`	`The first satellite, BeiDou-1A, was launched on 30 October 2000, followed by BeiDou-1B on 20 December 2000. The third satellite, BeiDou-1C (a backup satellite), was put into orbit on 25 May 2003. The successful launch of BeiDou-1C also meant the establishment of the BeiDou-1 navigation system.`	`The first satellite, BeiDou-1A, was launched on October 31, 2000. The second satellite, BeiDou-1B, was successfully launched on December 21, 2000. The last operational satellite of the constellation, BeiDou-1C, was launched on May 25, 2003.`
`The hotel that sold for the most money in 2014 was which in NYC?`	`Manhattan was on track to have an estimated 90,000 hotel rooms at the end of 2014, a 10% increase from 2013. In October 2014, the Anbang Insurance Group, based in China, purchased the Waldorf Astoria New York for US$1.95 billion, making it the world's most expensive hotel ever sold.`	Real estate is a major force in the city's economy, as the total value of all New York City property was assessed at US$914.8 billion for the 2015 fiscal year. The Time Warner Center is the property with the highest-listed market value in the city, at US$1.1 billion in 2006. New York City is home to some of the nation's—and the world's—most valuable real estate. 450 Park Avenue was sold on July 2, 2007 for US$510 million, about $1,589 per square foot ($17,104/m²), breaking the barely month-old record for an American office building of $1,476 per square foot ($15,887/m²) set in the June 2007 sale of 660 Madison Avenue. According to Forbes, in 2014, Manhattan was home to six of the top ten zip codes in the United States by median housing price.

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Evaluation Dataset

Unnamed Dataset

Size: 5,000 evaluation samples
Columns: question, context, and negative_1

Approximate statistics based on the first 1000 samples:

	question	context	negative_1
type	string	string	string
details	min: 7 tokens mean: 14.65 tokens max: 31 tokens	min: 31 tokens mean: 147.8 tokens max: 256 tokens	min: 28 tokens mean: 147.17 tokens max: 256 tokens

Samples:

question	context	negative_1
`What percentage of biodiversity has the planet lost since 1970`	In absolute terms, the planet has lost 52% of its biodiversity since 1970 according to a 2014 study by the World Wildlife Fund. The Living Planet Report 2014 claims that "the number of mammals, birds, reptiles, amphibians and fish across the globe is, on average, about half the size it was 40 years ago". Of that number, 39% accounts for the terrestrial wildlife gone, 39% for the marine wildlife gone, and 76% for the freshwater wildlife gone. Biodiversity took the biggest hit in Latin America, plummeting 83 percent. High-income countries showed a 10% increase in biodiversity, which was canceled out by a loss in low-income countries. This is despite the fact that high-income countries use five times the ecological resources of low-income countries, which was explained as a result of process whereby wealthy nations are outsourcing resource depletion to poorer nations, which are suffering the greatest ecosystem losses.	In absolute terms, the planet has lost 52% of its biodiversity since 1970 according to a 2014 study by the World Wildlife Fund. The Living Planet Report 2014 claims that "the number of mammals, birds, reptiles, amphibians and fish across the globe is, on average, about half the size it was 40 years ago". Of that number, 39% accounts for the terrestrial wildlife gone, 39% for the marine wildlife gone, and 76% for the freshwater wildlife gone. Biodiversity took the biggest hit in Latin America, plummeting 83 percent. High-income countries showed a 10% increase in biodiversity, which was canceled out by a loss in low-income countries. This is despite the fact that high-income countries use five times the ecological resources of low-income countries, which was explained as a result of process whereby wealthy nations are outsourcing resource depletion to poorer nations, which are suffering the greatest ecosystem losses.
`What is the per capita income of Delhi as of 2013?`	`New Delhi is the largest commercial city in northern India. It has an estimated net State Domestic Product (FY 2010) of ₹1595 billion (US$23 billion) in nominal terms and ~₹6800 billion (US$100 billion) in PPP terms. As of 2013, the per capita income of Delhi was Rs. 230000, second highest in India after Goa. GSDP in Delhi at the current prices for 2012-13 is estimated at Rs 3.88 trillion (short scale) against Rs 3.11 trillion (short scale) in 2011-12.`	The Government of National Capital Territory of Delhi does not release any economic figures specifically for New Delhi but publishes an official economic report on the whole of Delhi annually. According to the Economic Survey of Delhi, the metropolis has a net State Domestic Product (SDP) of Rs. 83,085 crores (for the year 2004–05) and a per capita income of Rs. 53,976($1,200). In the year 2008–09 New Delhi had a Per Capita Income of Rs.1,16,886 ($2,595).It grew by 16.2% to reach Rs.1,35,814 ($3,018) in 2009–10 fiscal. New Delhi's Per Capita GDP (at PPP) was at $6,860 during 2009–10 fiscal, making it one of the richest cities in India. The tertiary sector contributes 78.4% of Delhi's gross SDP followed by secondary and primary sectors with 20.2% and 1.4% contribution respectively.
`Which Prussian general commanded the attack against the French at St. Privat?`	By 16:50, with the Prussian southern attacks in danger of breaking up, the Prussian 3rd Guards Infantry Brigade of the Second Army opened an attack against the French positions at St. Privat which were commanded by General Canrobert. At 17:15, the Prussian 4th Guards Infantry Brigade joined the advance followed at 17:45 by the Prussian 1st Guards Infantry Brigade. All of the Prussian Guard attacks were pinned down by lethal French gunfire from the rifle pits and trenches. At 18:15 the Prussian 2nd Guards Infantry Brigade, the last of the 1st Guards Infantry Division, was committed to the attack on St. Privat while Steinmetz committed the last of the reserves of the First Army across the Mance Ravine. By 18:30, a considerable portion of the VII and VIII Corps disengaged from the fighting and withdrew towards the Prussian positions at Rezonville.	With the defeat of the First Army, Prince Frederick Charles ordered a massed artillery attack against Canrobert's position at St. Privat to prevent the Guards attack from failing too. At 19:00 the 3rd Division of Fransecky's II Corps of the Second Army advanced across Ravine while the XII Corps cleared out the nearby town of Roncourt and with the survivors of the 1st Guards Infantry Division launched a fresh attack against the ruins of St. Privat. At 20:00, the arrival of the Prussian 4th Infantry Division of the II Corps and with the Prussian right flank on Mance Ravine, the line stabilised. By then, the Prussians of the 1st Guards Infantry Division and the XII and II Corps captured St. Privat forcing the decimated French forces to withdraw. With the Prussians exhausted from the fighting, the French were now able to mount a counter-attack. General Bourbaki, however, refused to commit the reserves of the French Old Guard to the battle because, by that time, he considered the overall si...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 256
per_device_eval_batch_size: 256
num_train_epochs: 5
warmup_ratio: 0.1
fp16: True
batch_sampler: no_duplicates

All Hyperparameters

Click to expand

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 256
per_device_eval_batch_size: 256
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	gooqa-dev_cosine_accuracy
-1	-1	-	-	0.3282
0.5780	100	0.548	0.8841	0.3910
1.1561	200	0.4777	0.8530	0.4024
1.7341	300	0.4008	0.8470	0.4040
2.3121	400	0.3563	0.8390	0.4090
2.8902	500	0.3234	0.8276	0.4064
3.4682	600	0.2866	0.8325	0.4068
4.0462	700	0.274	0.8348	0.4028
4.6243	800	0.2512	0.8284	0.4062
-1	-1	-	-	0.4106

Framework Versions

Python: 3.11.0
Sentence Transformers: 4.0.1
Transformers: 4.50.3
PyTorch: 2.6.0+cu124
Accelerate: 1.5.2
Datasets: 3.5.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Downloads last month: 1

Safetensors

Model size

22.7M params

Tensor type

F32

Model tree for ayushexel/emb-all-MiniLM-L6-v2-squad-5-epochs

Base model

sentence-transformers/all-MiniLM-L6-v2

Finetuned

(820)

this model

Papers for ayushexel/emb-all-MiniLM-L6-v2-squad-5-epochs

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Paper • 1908.10084 • Published Aug 27, 2019 • 12

Efficient Natural Language Response Suggestion for Smart Reply

Paper • 1705.00652 • Published May 1, 2017

Evaluation results

Cosine Accuracy on gooqa dev
self-reported

0.411