SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 22.7M parameters (F32, Safetensors)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
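The trailing Normalize() module means every embedding leaves the model with unit length, so cosine similarity reduces to a plain dot product. The following sketch illustrates that property with random stand-in vectors (not real model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 384))                          # stand-in raw embeddings
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)   # what Normalize() does

dot = emb @ emb.T                                        # plain dot products
norms = np.linalg.norm(emb, axis=1)
cos = dot / (norms[:, None] * norms[None, :])            # explicit cosine similarity

print(np.allclose(dot, cos))  # True: on unit vectors the two are identical
```

This is why downstream code can index these embeddings with a dot-product-based vector store and still get cosine-similarity rankings.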

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-7-epochs")
# Run inference
sentences = [
    'Hokkien is usually written using what characters?',
    "Hokkien dialects are typically written using Chinese characters (漢字, Hàn-jī). However, the written script was and remains adapted to the literary form, which is based on classical Chinese, not the vernacular and spoken form. Furthermore, the character inventory used for Mandarin (standard written Chinese) does not correspond to Hokkien words, and there are a large number of informal characters (替字, thè-jī or thòe-jī; 'substitute characters') which are unique to Hokkien (as is the case with Cantonese). For instance, about 20 to 25% of Taiwanese morphemes lack an appropriate or standard Chinese character.",
    'In the 1990s, marked by the liberalization of language development and mother tongue movement in Taiwan, Taiwanese Hokkien had undergone a fast pace in its development. In 1993, Taiwan became the first region in the world to implement the teaching of Taiwanese Hokkien in Taiwanese schools. In 2001, the local Taiwanese language program was further extended to all schools in Taiwan, and Taiwanese Hokkien became one of the compulsory local Taiwanese languages to be learned in schools. The mother tongue movement in Taiwan even influenced Xiamen (Amoy) to the point that in 2010, Xiamen also began to implement the teaching of Hokkien dialect in its schools. In 2007, the Ministry of Education in Taiwan also completed the standardization of Chinese characters used for writing Hokkien and developed Tai-lo as the standard Hokkien pronunciation and romanization guide. A number of universities in Taiwan also offer Hokkien degree courses for training Hokkien-fluent talents to work for the Hokkien media industry and education. Taiwan also has its own Hokkien literary and cultural circles whereby Hokkien poets and writers compose poetry or literature in Hokkien on a regular basis.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
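Beyond pairwise similarity, the embeddings can drive a simple semantic search. The sketch below retrieves the top-5 nearest corpus entries for a query with plain numpy; the random vectors are stand-ins for the unit-norm output of model.encode():

```python
import numpy as np

# Stand-in corpus of 100 unit-norm embeddings (assumption: same shape and
# normalization as this model's output).
rng = np.random.default_rng(42)
corpus_emb = rng.normal(size=(100, 384))
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)

# A query embedding deliberately placed near corpus item 7.
query_emb = corpus_emb[7] + 0.01 * rng.normal(size=384)
query_emb /= np.linalg.norm(query_emb)

scores = corpus_emb @ query_emb      # cosine scores, shape (100,)
top_k = np.argsort(-scores)[:5]      # indices of the 5 best matches
print(top_k[0])                      # 7, the nearest corpus item
```

With real text, you would encode the corpus once, keep the matrix in memory (or a vector store), and encode only the query at search time.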

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.4082
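cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is more similar to the positive than to the negative. A minimal numpy sketch of the computation, using synthetic vectors in place of model outputs:

```python
import numpy as np

def cosine(a, b):
    """Row-wise cosine similarity between two (n, d) matrices."""
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 384))
positive = anchor + 0.1 * rng.normal(size=(8, 384))  # close to the anchor
negative = rng.normal(size=(8, 384))                 # unrelated

# Accuracy = share of triplets ranked correctly.
accuracy = np.mean(cosine(anchor, positive) > cosine(anchor, negative))
print(accuracy)  # 1.0 for these synthetic triplets
```

The reported 0.4082 means fewer than half of the evaluation triplets were ranked correctly, which is worth keeping in mind when interpreting this model's retrieval quality.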

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,283 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
         question          context           negative
    type string            string            string
    min  6 tokens          28 tokens         29 tokens
    mean 14.67 tokens      145.59 tokens     150.57 tokens
    max  40 tokens         256 tokens        256 tokens
  • Samples:
    1. question: The presence of what substance at alkaline pH makes it difficult to precipitate uranium as phosphate?
       context: In nature, uranium(VI) forms highly soluble carbonate complexes at alkaline pH. This leads to an increase in mobility and availability of uranium to groundwater and soil from nuclear wastes which leads to health hazards. However, it is difficult to precipitate uranium as phosphate in the presence of excess carbonate at alkaline pH. A Sphingomonas sp. strain BSAR-1 has been found to express a high activity alkaline phosphatase (PhoK) that has been applied for bioprecipitation of uranium as uranyl phosphate species from alkaline solutions. The precipitation ability was enhanced by overexpressing PhoK protein in E. coli.
       negative: Uranium metal reacts with almost all non-metal elements (with an exception of the noble gases) and their compounds, with reactivity increasing with temperature. Hydrochloric and nitric acids dissolve uranium, but non-oxidizing acids other than hydrochloric acid attack the element very slowly. When finely divided, it can react with cold water; in air, uranium metal becomes coated with a dark layer of uranium oxide. Uranium in ores is extracted chemically and converted into uranium dioxide or other chemical forms usable in industry.
    2. question: What UK firm approves pharmaceutical drugs?
       context: In the UK, the Medicines and Healthcare Products Regulatory Agency approves drugs for use, though the evaluation is done by the European Medicines Agency, an agency of the European Union based in London. Normally an approval in the UK and other European countries comes later than one in the USA. Then it is the National Institute for Health and Care Excellence (NICE), for England and Wales, who decides if and how the National Health Service (NHS) will allow (in the sense of paying for) their use. The British National Formulary is the core guide for pharmacists and clinicians.
       negative: On 2 July 2012, GlaxoSmithKline pleaded guilty to criminal charges and agreed to a $3 billion settlement of the largest health-care fraud case in the U.S. and the largest payment by a drug company. The settlement is related to the company's illegal promotion of prescription drugs, its failure to report safety data, bribing doctors, and promoting medicines for uses for which they were not licensed. The drugs involved were Paxil, Wellbutrin, Advair, Lamictal, and Zofran for off-label, non-covered uses. Those and the drugs Imitrex, Lotronex, Flovent, and Valtrex were involved in the kickback scheme.
    3. question: Which book was touted for establishing regulations for church and government?
       context: Presbyterian history is part of the history of Christianity, but the beginning of Presbyterianism as a distinct movement occurred during the 16th-century Protestant Reformation. As the Catholic Church resisted the reformers, several different theological movements splintered from the Church and bore different denominations. Presbyterianism was especially influenced by the French theologian John Calvin, who is credited with the development of Reformed theology, and the work of John Knox, a Scotsman who studied with Calvin in Geneva, Switzerland and brought his teachings back to Scotland. The Presbyterian church traces its ancestry back primarily to England and Scotland. In August 1560 the Parliament of Scotland adopted the Scots Confession as the creed of the Scottish Kingdom. In December 1560, the First Book of Discipline was published, outlining important doctrinal issues but also establishing regulations for church government, including the creation of ten ecclesiastical districts wi...
       negative: John Knox (1505–1572), a Scot who had spent time studying under Calvin in Geneva, returned to Scotland and urged his countrymen to reform the Church in line with Calvinist doctrines. After a period of religious convulsion and political conflict culminating in a victory for the Protestant party at the Siege of Leith the authority of the Church of Rome was abolished in favour of Reformation by the legislation of the Scottish Reformation Parliament in 1560. The Church was eventually organised by Andrew Melville along Presbyterian lines to become the national Church of Scotland. King James VI and I moved the Church of Scotland towards an episcopal form of government, and in 1637, James' successor, Charles I and William Laud, the Archbishop of Canterbury, attempted to force the Church of Scotland to use the Book of Common Prayer. What resulted was an armed insurrection, with many Scots signing the Solemn League and Covenant. The Covenanters would serve as the government of Scotland for near...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
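MultipleNegativesRankingLoss treats each question's own context as the positive and every other context in the batch as an in-batch negative: cosine similarities are multiplied by scale=20.0 and scored with a softmax cross-entropy whose target is the matching (diagonal) context. A numpy sketch with random stand-in embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 384))              # question embeddings (stand-ins)
c = q + 0.1 * rng.normal(size=(4, 384))    # matching context embeddings

# Unit-normalize so the dot product is cosine similarity.
q /= np.linalg.norm(q, axis=1, keepdims=True)
c /= np.linalg.norm(c, axis=1, keepdims=True)

scale = 20.0
scores = scale * (q @ c.T)                 # (4, 4): row i vs every context

# Cross-entropy with labels [0, 1, 2, 3]: row i should peak at column i.
m = scores.max(axis=1, keepdims=True)      # numerically stable log-softmax
log_probs = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(loss)
```

Because every other row in the batch supplies negatives for free, larger batch sizes (here 128) generally make this loss more informative.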
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
         question          context           negative_1
    type string            string            string
    min  6 tokens          31 tokens         31 tokens
    mean 14.73 tokens      149.37 tokens     146.56 tokens
    max  39 tokens         256 tokens        256 tokens
  • Samples:
    1. question: Classical musicians continued to use many instruments from what era?
       context: Classical musicians continued to use many of instruments from the Baroque era, such as the cello, contrabass, recorder, trombone, timpani, fortepiano and organ. While some Baroque instruments fell into disuse (e.g., the theorbo and rackett), many Baroque instruments were changed into the versions that are still in use today, such as the Baroque violin (which became the violin), the Baroque oboe (which became the oboe) and the Baroque trumpet, which transitioned to the regular valved trumpet.
       negative_1: The instruments currently used in most classical music were largely invented before the mid-19th century (often much earlier) and codified in the 18th and 19th centuries. They consist of the instruments found in an orchestra or in a concert band, together with several other solo instruments (such as the piano, harpsichord, and organ). The symphony orchestra is the most widely known medium for classical music and includes members of the string, woodwind, brass, and percussion families of instruments. The concert band consists of members of the woodwind, brass, and percussion families. It generally has a larger variety and amount of woodwind and brass instruments than the orchestra but does not have a string section. However, many concert bands use a double bass. The vocal practices changed a great deal over the classical period, from the single line monophonic Gregorian chant done by monks in the Medieval period to the complex, polyphonic choral works of the Renaissance and subsequent p...
    2. question: What was von Braun's role in the army's rocket program during during World War II?
       context: During the Second World War, General Dornberger was the military head of the army's rocket program, Zanssen became the commandant of the Peenemünde army rocket centre, and von Braun was the technical director of the ballistic missile program. They would lead the team that built the Aggregate-4 (A-4) rocket, which became the first vehicle to reach outer space during its test flight program in 1942 and 1943. By 1943, Germany began mass-producing the A-4 as the Vergeltungswaffe 2 ("Vengeance Weapon" 2, or more commonly, V2), a ballistic missile with a 320 kilometers (200 mi) range carrying a 1,130 kilograms (2,490 lb) warhead at 4,000 kilometers per hour (2,500 mph). Its supersonic speed meant there was no defense against it, and radar detection provided little warning. Germany used the weapon to bombard southern England and parts of Allied-liberated western Europe from 1944 until 1945. After the war, the V-2 became the basis of early American and Soviet rocket designs.
       negative_1: Von Braun and his team were sent to the United States Army's White Sands Proving Ground, located in New Mexico, in 1945. They set about assembling the captured V2s and began a program of launching them and instructing American engineers in their operation. These tests led to the first rocket to take photos from outer space, and the first two-stage rocket, the WAC Corporal-V2 combination, in 1949. The German rocket team was moved from Fort Bliss to the Army's new Redstone Arsenal, located in Huntsville, Alabama, in 1950. From here, von Braun and his team would develop the Army's first operational medium-range ballistic missile, the Redstone rocket, that would, in slightly modified versions, launch both America's first satellite, and the first piloted Mercury space missions. It became the basis for both the Jupiter and Saturn family of rockets.
    3. question: Which king failed to execute Goring's freehold document before fleeing to London?
       context: Possibly the first house erected within the site was that of a Sir William Blake, around 1624. The next owner was Lord Goring, who from 1633 extended Blake's house and developed much of today's garden, then known as Goring Great Garden. He did not, however, obtain the freehold interest in the mulberry garden. Unbeknown to Goring, in 1640 the document "failed to pass the Great Seal before King Charles I fled London, which it needed to do for legal execution". It was this critical omission that helped the British royal family regain the freehold under King George III.
       negative_1: Possibly the first house erected within the site was that of a Sir William Blake, around 1624. The next owner was Lord Goring, who from 1633 extended Blake's house and developed much of today's garden, then known as Goring Great Garden. He did not, however, obtain the freehold interest in the mulberry garden. Unbeknown to Goring, in 1640 the document "failed to pass the Great Seal before King Charles I fled London, which it needed to do for legal execution". It was this critical omission that helped the British royal family regain the freehold under King George III.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 7
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
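As a sketch (not the author's original training script), the non-default values above map onto SentenceTransformerTrainingArguments roughly as follows; the output directory is a placeholder:

```python
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

# Reproduces only the non-default hyperparameters listed above.
# "output/" is a placeholder, not the directory used for this model.
args = SentenceTransformerTrainingArguments(
    output_dir="output/",
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    num_train_epochs=7,
    warmup_ratio=0.1,
    fp16=True,  # requires a CUDA GPU; drop on CPU-only machines
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```

The no_duplicates batch sampler matters for MultipleNegativesRankingLoss: it keeps repeated texts out of a batch so that in-batch negatives are never accidental positives.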

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 7
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooqa-dev_cosine_accuracy
-1 -1 - - 0.3282
0.2890 100 0.4377 0.7989 0.3856
0.5780 200 0.4002 0.7684 0.3982
0.8671 300 0.3856 0.7632 0.3994
1.1561 400 0.3278 0.7679 0.4000
1.4451 500 0.2967 0.7551 0.4004
1.7341 600 0.2908 0.7548 0.4014
2.0231 700 0.2847 0.7545 0.4038
2.3121 800 0.2088 0.7515 0.4066
2.6012 900 0.2096 0.7464 0.4110
2.8902 1000 0.2136 0.7433 0.4120
3.1792 1100 0.1838 0.7491 0.4122
3.4682 1200 0.1633 0.7465 0.4072
3.7572 1300 0.1661 0.7540 0.4098
4.0462 1400 0.1621 0.7525 0.4090
4.3353 1500 0.1331 0.7589 0.4040
4.6243 1600 0.1366 0.7505 0.4088
4.9133 1700 0.1402 0.7551 0.4098
5.2023 1800 0.1233 0.7524 0.4094
5.4913 1900 0.1185 0.7543 0.4104
5.7803 2000 0.1197 0.7512 0.4108
6.0694 2100 0.1168 0.7537 0.4104
6.3584 2200 0.1088 0.7552 0.4118
6.6474 2300 0.1074 0.7550 0.4152
6.9364 2400 0.1055 0.7552 0.4132
-1 -1 - - 0.4082

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}