Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper
• 1908.10084 • Published
• 12
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/embed-all-MiniLM-L6-v2-squad-3-epochs")
# Run inference
sentences = [
'What did Tom Parsons consider as the risk factor for strong future quakes?',
'In a United States Geological Survey (USGS) study, preliminary rupture models of the earthquake indicated displacement of up to 9 meters along a fault approximately 240 km long by 20 km deep. The earthquake generated deformations of the surface greater than 3 meters and increased the stress (and probability of occurrence of future events) at the northeastern and southwestern ends of the fault. On May 20, USGS seismologist Tom Parsons warned that there is "high risk" of a major M>7 aftershock over the next weeks or months.',
'The earthquake was the worst to strike the Sichuan area in over 30 years. Following the quake, experts and the general public sought information on whether or not the earthquake could have been predicted in advance, and whether or not studying statistics related to the quake could result in better prediction of earthquakes in the future. Earthquake prediction is not yet established science; there was no consensus within the scientific community that earthquake "prediction" is possible.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
gooqa-devTripletEvaluator| Metric | Value |
|---|---|
| cosine_accuracy | 0.4034 |
question, context, and negative| question | context | negative | |
|---|---|---|---|
| type | string | string | string |
| details |
|
|
|
| question | context | negative |
|---|---|---|
Some foods that are okay for people to eat are what to dogs? |
A number of common human foods and household ingestibles are toxic to dogs, including chocolate solids (theobromine poisoning), onion and garlic (thiosulphate, sulfoxide or disulfide poisoning), grapes and raisins, macadamia nuts, xylitol, as well as various plants and other potentially ingested materials. The nicotine in tobacco can also be dangerous. Dogs can get it by scavenging in garbage or ashtrays; eating cigars and cigarettes. Signs can be vomiting of large amounts (e.g., from eating cigar butts) or diarrhea. Some other signs are abdominal pain, loss of coordination, collapse, or death. Dogs are highly susceptible to theobromine poisoning, typically from ingestion of chocolate. Theobromine is toxic to dogs because, although the dog's metabolism is capable of breaking down the chemical, the process is so slow that even small amounts of chocolate can be fatal, especially dark chocolate. |
Greek cuisine is characteristic of the healthy Mediterranean diet, which is epitomized by dishes of Crete. Greek cuisine incorporates fresh ingredients into a variety of local dishes such as moussaka, stifado, Greek salad, fasolada, spanakopita and souvlaki. Some dishes can be traced back to ancient Greece like skordalia (a thick purée of walnuts, almonds, crushed garlic and olive oil), lentil soup, retsina (white or rosé wine sealed with pine resin) and pasteli (candy bar with sesame seeds baked with honey). Throughout Greece people often enjoy eating from small dishes such as meze with various dips such as tzatziki, grilled octopus and small fish, feta cheese, dolmades (rice, currants and pine kernels wrapped in vine leaves), various pulses, olives and cheese. Olive oil is added to almost every dish. |
Which city is often referred to as Australia's garden city? |
Melbourne is often referred to as Australia's garden city, and the state of Victoria was once known as the garden state. There is an abundance of parks and gardens in Melbourne, many close to the CBD with a variety of common and rare plant species amid landscaped vistas, pedestrian pathways and tree-lined avenues. Melbourne's parks are often considered the best public parks in all of Australia's major cities. There are also many parks in the surrounding suburbs of Melbourne, such as in the municipalities of Stonnington, Boroondara and Port Phillip, south east of the central business district. The extensive area covered by urban Melbourne is formally divided into hundreds of suburbs (for addressing and postal purposes), and administered as local government areas 31 of which are located within the metropolitan area. |
Melbourne is typical of Australian capital cities in that after the turn of the 20th century, it expanded with the underlying notion of a 'quarter acre home and garden' for every family, often referred to locally as the Australian Dream. This, coupled with the popularity of the private automobile after 1945, led to the auto-centric urban structure now present today in the middle and outer suburbs. Much of metropolitan Melbourne is accordingly characterised by low density sprawl, whilst its inner city areas feature predominantly medium-density, transit-oriented urban forms. The city centre, Docklands, St. Kilda Road and Southbank areas feature high-density forms. |
How much is Israel's water technology industry worth? |
Israel is one of the world's technological leaders in water technology. In 2011, its water technology industry was worth around $2 billion a year with annual exports of products and services in the tens of millions of dollars. The ongoing shortage of water in the country has spurred innovation in water conservation techniques, and a substantial agricultural modernization, drip irrigation, was invented in Israel. Israel is also at the technological forefront of desalination and water recycling. The Ashkelon seawater reverse osmosis (SWRO) plant, the largest in the world, was voted 'Desalination Plant of the Year' in the Global Water Awards in 2006. Israel hosts an annual Water Technology Exhibition and Conference (WaTec) that attracts thousands of people from across the world. By 2014, Israel's desalination programs provided roughly 35% of Israel's drinking water and it is expected to supply 40% by 2015 and 70% by 2050. As of May 29, 2015 more than 50 percent of the water for Israeli ho... |
Israel is a leading country in the development of solar energy. Israel is a global leader in water conservation and geothermal energy, and its development of cutting-edge technologies in software, communications and the life sciences have evoked comparisons with Silicon Valley. According to the OECD, Israel is also ranked 1st in the world in expenditure on Research and Development (R&D) as a percentage of GDP. Intel and Microsoft built their first overseas research and development centers in Israel, and other high-tech multi-national corporations, such as IBM, Google, Apple, HP, Cisco Systems, and Motorola, have opened R&D facilities in the country. |
MultipleNegativesRankingLoss with these parameters:{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
question, context, and negative_1| question | context | negative_1 | |
|---|---|---|---|
| type | string | string | string |
| details |
|
|
|
| question | context | negative_1 |
|---|---|---|
What language form differs in the amount of vowel reduction? |
Catalan has inherited the typical vowel system of Vulgar Latin, with seven stressed phonemes: /a ɛ e i ɔ o u/, a common feature in Western Romance, except Spanish. Balearic has also instances of stressed /ə/. Dialects differ in the different degrees of vowel reduction, and the incidence of the pair /ɛ e/. |
In Central Catalan, unstressed vowels reduce to three: /a e ɛ/ > [ə]; /o ɔ u/ > [u]; /i/ remains distinct. The other dialects have different vowel reduction processes (see the section pronunciation of dialects in this article). |
Who released the first collection of Chopin's works? |
Chopin's original publishers included Maurice Schlesinger and Camille Pleyel. His works soon began to appear in popular 19th-century piano anthologies. The first collected edition was by Breitkopf & Härtel (1878–1902). Among modern scholarly editions of Chopin's works are the version under the name of Paderewski published between 1937 and 1966 and the more recent Polish "National Edition", edited by Jan Ekier, both of which contain detailed explanations and discussions regarding choices and sources. |
Possibly the first venture into fictional treatments of Chopin's life was a fanciful operatic version of some of its events. Chopin was written by Giacomo Orefice and produced in Milan in 1901. All the music is derived from that of Chopin. |
How many students in New York partcipate in higher education? |
Over 600,000 students are enrolled in New York City's over 120 higher education institutions, the highest number of any city in the United States, including over half million in the City University of New York (CUNY) system alone in 2014. In 2005, three out of five Manhattan residents were college graduates, and one out of four had a postgraduate degree, forming one of the highest concentrations of highly educated people in any American city. New York City is home to such notable private universities as Barnard College, Columbia University, Cooper Union, Fordham University, New York University, New York Institute of Technology, Pace University, and Yeshiva University. The public CUNY system is one of the largest universities in the nation, comprising 24 institutions across all five boroughs: senior colleges, community colleges, and other graduate/professional schools. The public State University of New York (SUNY) system also serves New York City, as well as the rest of the state. The ... |
Over 600,000 students are enrolled in New York City's over 120 higher education institutions, the highest number of any city in the United States, including over half million in the City University of New York (CUNY) system alone in 2014. In 2005, three out of five Manhattan residents were college graduates, and one out of four had a postgraduate degree, forming one of the highest concentrations of highly educated people in any American city. New York City is home to such notable private universities as Barnard College, Columbia University, Cooper Union, Fordham University, New York University, New York Institute of Technology, Pace University, and Yeshiva University. The public CUNY system is one of the largest universities in the nation, comprising 24 institutions across all five boroughs: senior colleges, community colleges, and other graduate/professional schools. The public State University of New York (SUNY) system also serves New York City, as well as the rest of the state. The ... |
MultipleNegativesRankingLoss with these parameters:{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
eval_strategy: stepsper_device_train_batch_size: 128per_device_eval_batch_size: 128warmup_ratio: 0.1fp16: Truebatch_sampler: no_duplicatesoverwrite_output_dir: Falsedo_predict: Falseeval_strategy: stepsprediction_loss_only: Trueper_device_train_batch_size: 128per_device_eval_batch_size: 128per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1.0num_train_epochs: 3max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.1warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Truefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}tp_size: 0fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters: auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseeval_use_gather_object: Falseaverage_tokens_across_devices: Falseprompts: Nonebatch_sampler: no_duplicatesmulti_dataset_batch_sampler: proportional| Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.3264 |
| 0.2890 | 100 | 0.4336 | 0.7852 | 0.3846 |
| 0.5780 | 200 | 0.3867 | 0.7628 | 0.3992 |
| 0.8671 | 300 | 0.3847 | 0.7571 | 0.3930 |
| 1.1561 | 400 | 0.3119 | 0.7450 | 0.4062 |
| 1.4451 | 500 | 0.2727 | 0.7444 | 0.4076 |
| 1.7341 | 600 | 0.2769 | 0.7392 | 0.4052 |
| 2.0231 | 700 | 0.269 | 0.7371 | 0.4032 |
| 2.3121 | 800 | 0.2084 | 0.7373 | 0.4010 |
| 2.6012 | 900 | 0.2095 | 0.7354 | 0.4032 |
| 2.8902 | 1000 | 0.213 | 0.7354 | 0.4096 |
| -1 | -1 | - | - | 0.4034 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
sentence-transformers/all-MiniLM-L6-v2