SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 22.7M parameters (F32, Safetensors)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
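The trailing Normalize() module means every embedding leaves the model with unit length, so cosine similarity reduces to a plain dot product. The following sketch illustrates that property with random stand-in vectors (not real model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 384))                          # stand-in raw embeddings
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)   # what Normalize() does

dot = emb @ emb.T                                        # plain dot products
norms = np.linalg.norm(emb, axis=1)
cos = dot / (norms[:, None] * norms[None, :])            # explicit cosine similarity

print(np.allclose(dot, cos))  # True: on unit vectors the two are identical
```

This is why downstream code can index these embeddings with a dot-product-based vector store and still get cosine-similarity rankings.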

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-7-epochs")
# Run inference
sentences = [
    'Hokkien is usually written using what characters?',
    "Hokkien dialects are typically written using Chinese characters (漢字, Hàn-jī). However, the written script was and remains adapted to the literary form, which is based on classical Chinese, not the vernacular and spoken form. Furthermore, the character inventory used for Mandarin (standard written Chinese) does not correspond to Hokkien words, and there are a large number of informal characters (替字, thè-jī or thòe-jī; 'substitute characters') which are unique to Hokkien (as is the case with Cantonese). For instance, about 20 to 25% of Taiwanese morphemes lack an appropriate or standard Chinese character.",
    'In the 1990s, marked by the liberalization of language development and mother tongue movement in Taiwan, Taiwanese Hokkien had undergone a fast pace in its development. In 1993, Taiwan became the first region in the world to implement the teaching of Taiwanese Hokkien in Taiwanese schools. In 2001, the local Taiwanese language program was further extended to all schools in Taiwan, and Taiwanese Hokkien became one of the compulsory local Taiwanese languages to be learned in schools. The mother tongue movement in Taiwan even influenced Xiamen (Amoy) to the point that in 2010, Xiamen also began to implement the teaching of Hokkien dialect in its schools. In 2007, the Ministry of Education in Taiwan also completed the standardization of Chinese characters used for writing Hokkien and developed Tai-lo as the standard Hokkien pronunciation and romanization guide. A number of universities in Taiwan also offer Hokkien degree courses for training Hokkien-fluent talents to work for the Hokkien media industry and education. Taiwan also has its own Hokkien literary and cultural circles whereby Hokkien poets and writers compose poetry or literature in Hokkien on a regular basis.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
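Beyond pairwise similarity, the embeddings can drive a simple semantic search. The sketch below retrieves the top-5 nearest corpus entries for a query with plain numpy; the random vectors are stand-ins for the unit-norm output of model.encode():

```python
import numpy as np

# Stand-in corpus of 100 unit-norm embeddings (assumption: same shape and
# normalization as this model's output).
rng = np.random.default_rng(42)
corpus_emb = rng.normal(size=(100, 384))
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)

# A query embedding deliberately placed near corpus item 7.
query_emb = corpus_emb[7] + 0.01 * rng.normal(size=384)
query_emb /= np.linalg.norm(query_emb)

scores = corpus_emb @ query_emb      # cosine scores, shape (100,)
top_k = np.argsort(-scores)[:5]      # indices of the 5 best matches
print(top_k[0])                      # 7, the nearest corpus item
```

With real text, you would encode the corpus once, keep the matrix in memory (or a vector store), and encode only the query at search time.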

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.4082
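cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is more similar to the positive than to the negative. A minimal numpy sketch of the computation, using synthetic vectors in place of model outputs:

```python
import numpy as np

def cosine(a, b):
    """Row-wise cosine similarity between two (n, d) matrices."""
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 384))
positive = anchor + 0.1 * rng.normal(size=(8, 384))  # close to the anchor
negative = rng.normal(size=(8, 384))                 # unrelated

# Accuracy = share of triplets ranked correctly.
accuracy = np.mean(cosine(anchor, positive) > cosine(anchor, negative))
print(accuracy)  # 1.0 for these synthetic triplets
```

The reported 0.4082 means fewer than half of the evaluation triplets were ranked correctly, which is worth keeping in mind when interpreting this model's retrieval quality.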

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,283 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
         question          context           negative
    type string            string            string
    min  6 tokens          28 tokens         29 tokens
    mean 14.67 tokens      145.59 tokens     150.57 tokens
    max  40 tokens         256 tokens        256 tokens
  • Samples:
    1. question: The presence of what substance at alkaline pH makes it difficult to precipitate uranium as phosphate?
       context: In nature, uranium(VI) forms highly soluble carbonate complexes at alkaline pH. This leads to an increase in mobility and availability of uranium to groundwater and soil from nuclear wastes which leads to health hazards. However, it is difficult to precipitate uranium as phosphate in the presence of excess carbonate at alkaline pH. A Sphingomonas sp. strain BSAR-1 has been found to express a high activity alkaline phosphatase (PhoK) that has been applied for bioprecipitation of uranium as uranyl phosphate species from alkaline solutions. The precipitation ability was enhanced by overexpressing PhoK protein in E. coli.
       negative: Uranium metal reacts with almost all non-metal elements (with an exception of the noble gases) and their compounds, with reactivity increasing with temperature. Hydrochloric and nitric acids dissolve uranium, but non-oxidizing acids other than hydrochloric acid attack the element very slowly. When finely divided, it can react with cold water; in air, uranium metal becomes coated with a dark layer of uranium oxide. Uranium in ores is extracted chemically and converted into uranium dioxide or other chemical forms usable in industry.
    2. question: What UK firm approves pharmaceutical drugs?
       context: In the UK, the Medicines and Healthcare Products Regulatory Agency approves drugs for use, though the evaluation is done by the European Medicines Agency, an agency of the European Union based in London. Normally an approval in the UK and other European countries comes later than one in the USA. Then it is the National Institute for Health and Care Excellence (NICE), for England and Wales, who decides if and how the National Health Service (NHS) will allow (in the sense of paying for) their use. The British National Formulary is the core guide for pharmacists and clinicians.
       negative: On 2 July 2012, GlaxoSmithKline pleaded guilty to criminal charges and agreed to a $3 billion settlement of the largest health-care fraud case in the U.S. and the largest payment by a drug company. The settlement is related to the company's illegal promotion of prescription drugs, its failure to report safety data, bribing doctors, and promoting medicines for uses for which they were not licensed. The drugs involved were Paxil, Wellbutrin, Advair, Lamictal, and Zofran for off-label, non-covered uses. Those and the drugs Imitrex, Lotronex, Flovent, and Valtrex were involved in the kickback scheme.
    3. question: Which book was touted for establishing regulations for church and government?
       context: Presbyterian history is part of the history of Christianity, but the beginning of Presbyterianism as a distinct movement occurred during the 16th-century Protestant Reformation. As the Catholic Church resisted the reformers, several different theological movements splintered from the Church and bore different denominations. Presbyterianism was especially influenced by the French theologian John Calvin, who is credited with the development of Reformed theology, and the work of John Knox, a Scotsman who studied with Calvin in Geneva, Switzerland and brought his teachings back to Scotland. The Presbyterian church traces its ancestry back primarily to England and Scotland. In August 1560 the Parliament of Scotland adopted the Scots Confession as the creed of the Scottish Kingdom. In December 1560, the First Book of Discipline was published, outlining important doctrinal issues but also establishing regulations for church government, including the creation of ten ecclesiastical districts wi...
       negative: John Knox (1505–1572), a Scot who had spent time studying under Calvin in Geneva, returned to Scotland and urged his countrymen to reform the Church in line with Calvinist doctrines. After a period of religious convulsion and political conflict culminating in a victory for the Protestant party at the Siege of Leith the authority of the Church of Rome was abolished in favour of Reformation by the legislation of the Scottish Reformation Parliament in 1560. The Church was eventually organised by Andrew Melville along Presbyterian lines to become the national Church of Scotland. King James VI and I moved the Church of Scotland towards an episcopal form of government, and in 1637, James' successor, Charles I and William Laud, the Archbishop of Canterbury, attempted to force the Church of Scotland to use the Book of Common Prayer. What resulted was an armed insurrection, with many Scots signing the Solemn League and Covenant. The Covenanters would serve as the government of Scotland for near...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
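MultipleNegativesRankingLoss treats each question's own context as the positive and every other context in the batch as an in-batch negative: cosine similarities are multiplied by scale=20.0 and scored with a softmax cross-entropy whose target is the matching (diagonal) context. A numpy sketch with random stand-in embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 384))              # question embeddings (stand-ins)
c = q + 0.1 * rng.normal(size=(4, 384))    # matching context embeddings

# Unit-normalize so the dot product is cosine similarity.
q /= np.linalg.norm(q, axis=1, keepdims=True)
c /= np.linalg.norm(c, axis=1, keepdims=True)

scale = 20.0
scores = scale * (q @ c.T)                 # (4, 4): row i vs every context

# Cross-entropy with labels [0, 1, 2, 3]: row i should peak at column i.
m = scores.max(axis=1, keepdims=True)      # numerically stable log-softmax
log_probs = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(loss)
```

Because every other row in the batch supplies negatives for free, larger batch sizes (here 128) generally make this loss more informative.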
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
         question          context           negative_1
    type string            string            string
    min  6 tokens          31 tokens         31 tokens
    mean 14.73 tokens      149.37 tokens     146.56 tokens
    max  39 tokens         256 tokens        256 tokens
  • Samples:
    1. question: Classical musicians continued to use many instruments from what era?
       context: Classical musicians continued to use many of instruments from the Baroque era, such as the cello, contrabass, recorder, trombone, timpani, fortepiano and organ. While some Baroque instruments fell into disuse (e.g., the theorbo and rackett), many Baroque instruments were changed into the versions that are still in use today, such as the Baroque violin (which became the violin), the Baroque oboe (which became the oboe) and the Baroque trumpet, which transitioned to the regular valved trumpet.
       negative_1: The instruments currently used in most classical music were largely invented before the mid-19th century (often much earlier) and codified in the 18th and 19th centuries. They consist of the instruments found in an orchestra or in a concert band, together with several other solo instruments (such as the piano, harpsichord, and organ). The symphony orchestra is the most widely known medium for classical music and includes members of the string, woodwind, brass, and percussion families of instruments. The concert band consists of members of the woodwind, brass, and percussion families. It generally has a larger variety and amount of woodwind and brass instruments than the orchestra but does not have a string section. However, many concert bands use a double bass. The vocal practices changed a great deal over the classical period, from the single line monophonic Gregorian chant done by monks in the Medieval period to the complex, polyphonic choral works of the Renaissance and subsequent p...
    2. question: What was von Braun's role in the army's rocket program during during World War II?
       context: During the Second World War, General Dornberger was the military head of the army's rocket program, Zanssen became the commandant of the Peenemünde army rocket centre, and von Braun was the technical director of the ballistic missile program. They would lead the team that built the Aggregate-4 (A-4) rocket, which became the first vehicle to reach outer space during its test flight program in 1942 and 1943. By 1943, Germany began mass-producing the A-4 as the Vergeltungswaffe 2 ("Vengeance Weapon" 2, or more commonly, V2), a ballistic missile with a 320 kilometers (200 mi) range carrying a 1,130 kilograms (2,490 lb) warhead at 4,000 kilometers per hour (2,500 mph). Its supersonic speed meant there was no defense against it, and radar detection provided little warning. Germany used the weapon to bombard southern England and parts of Allied-liberated western Europe from 1944 until 1945. After the war, the V-2 became the basis of early American and Soviet rocket designs.
       negative_1: Von Braun and his team were sent to the United States Army's White Sands Proving Ground, located in New Mexico, in 1945. They set about assembling the captured V2s and began a program of launching them and instructing American engineers in their operation. These tests led to the first rocket to take photos from outer space, and the first two-stage rocket, the WAC Corporal-V2 combination, in 1949. The German rocket team was moved from Fort Bliss to the Army's new Redstone Arsenal, located in Huntsville, Alabama, in 1950. From here, von Braun and his team would develop the Army's first operational medium-range ballistic missile, the Redstone rocket, that would, in slightly modified versions, launch both America's first satellite, and the first piloted Mercury space missions. It became the basis for both the Jupiter and Saturn family of rockets.
    3. question: Which king failed to execute Goring's freehold document before fleeing to London?
       context: Possibly the first house erected within the site was that of a Sir William Blake, around 1624. The next owner was Lord Goring, who from 1633 extended Blake's house and developed much of today's garden, then known as Goring Great Garden. He did not, however, obtain the freehold interest in the mulberry garden. Unbeknown to Goring, in 1640 the document "failed to pass the Great Seal before King Charles I fled London, which it needed to do for legal execution". It was this critical omission that helped the British royal family regain the freehold under King George III.
       negative_1: Possibly the first house erected within the site was that of a Sir William Blake, around 1624. The next owner was Lord Goring, who from 1633 extended Blake's house and developed much of today's garden, then known as Goring Great Garden. He did not, however, obtain the freehold interest in the mulberry garden. Unbeknown to Goring, in 1640 the document "failed to pass the Great Seal before King Charles I fled London, which it needed to do for legal execution". It was this critical omission that helped the British royal family regain the freehold under King George III.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 7
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
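As a sketch (not the author's original training script), the non-default values above map onto SentenceTransformerTrainingArguments roughly as follows; the output directory is a placeholder:

```python
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

# Reproduces only the non-default hyperparameters listed above.
# "output/" is a placeholder, not the directory used for this model.
args = SentenceTransformerTrainingArguments(
    output_dir="output/",
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    num_train_epochs=7,
    warmup_ratio=0.1,
    fp16=True,  # requires a CUDA GPU; drop on CPU-only machines
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```

The no_duplicates batch sampler matters for MultipleNegativesRankingLoss: it keeps repeated texts out of a batch so that in-batch negatives are never accidental positives.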

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 7
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooqa-dev_cosine_accuracy
-1 -1 - - 0.3282
0.2890 100 0.4377 0.7989 0.3856
0.5780 200 0.4002 0.7684 0.3982
0.8671 300 0.3856 0.7632 0.3994
1.1561 400 0.3278 0.7679 0.4000
1.4451 500 0.2967 0.7551 0.4004
1.7341 600 0.2908 0.7548 0.4014
2.0231 700 0.2847 0.7545 0.4038
2.3121 800 0.2088 0.7515 0.4066
2.6012 900 0.2096 0.7464 0.4110
2.8902 1000 0.2136 0.7433 0.4120
3.1792 1100 0.1838 0.7491 0.4122
3.4682 1200 0.1633 0.7465 0.4072
3.7572 1300 0.1661 0.7540 0.4098
4.0462 1400 0.1621 0.7525 0.4090
4.3353 1500 0.1331 0.7589 0.4040
4.6243 1600 0.1366 0.7505 0.4088
4.9133 1700 0.1402 0.7551 0.4098
5.2023 1800 0.1233 0.7524 0.4094
5.4913 1900 0.1185 0.7543 0.4104
5.7803 2000 0.1197 0.7512 0.4108
6.0694 2100 0.1168 0.7537 0.4104
6.3584 2200 0.1088 0.7552 0.4118
6.6474 2300 0.1074 0.7550 0.4152
6.9364 2400 0.1055 0.7552 0.4132
-1 -1 - - 0.4082

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}