Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
How to use seongil-dn/further_trainset_large_v2 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("seongil-dn/further_trainset_large_v2")

sentences = [
    "When was Jacques-Louis David born?",
    "Jacques-Louis David was born into a prosperous French family in Paris on 30 August 1748. When he was about nine his father was killed in a duel and his mother left him with his well-off architect uncles. They saw to it that he received an excellent education at the Collège des Quatre-Nations, University of Paris, but he was never a good student -- he had a facial tumor that impeded his speech, and he was always preoccupied with drawing. He covered his notebooks with drawings, and he once said, \"I was always hiding behind the instructor's chair, drawing for the duration of the class\". Soon, he desired to be a painter, but his uncles and mother wanted him to be an architect. He overcame the opposition, and went to learn from François Boucher (1703–1770), the leading painter of the time, who was also a distant relative. Boucher was a Rococo painter, but tastes were changing, and the fashion for Rococo was giving way to a more classical style. Boucher decided that instead of taking over David's tutelage, he would send David to his friend, Joseph-Marie Vien (1716–1809), a painter who embraced the classical reaction to Rococo. There, David attended the Royal Academy, based in what is now the Louvre.",
    "Jacques Louis Antoine Marie David (22 December 1930 – 19 December 2018) was a French Roman Catholic bishop.\nDavid was born in France and was ordained to the priesthood in 1956. He served as titular bishop of \"Girba\" and as auxiliary bishop of the Roman Catholic Archdiocese of Bordeaux, France, from 1981 to 1986. He served as bishop of the Roman Catholic Diocese of La Rochelle and Saintes, France, from 1986 to 1996. David served as bishop of the Roman Catholic Diocese of Évreux, France, from 1996 to 2006.",
    "Jérôme David was born in Rome, Italy on 30 June 1823, nominal grandson of the painter Jacques-Louis David, and godson of Jérôme Bonaparte, King of Westphalia and Catharina of Württemberg, his wife.\nHe was the natural son of King Jérôme.\nHis family destined him for the navy, where he served from 1835 to 1837, but he took a dislike to this service and chose to join the army instead.\nHe graduated from the École de Saint-Cyr on 1 October 1844 as second lieutenant of the Zouaves.",
    "Garneray was born in Paris (on Rue Saint-Andre-des-arts, in the Latin Quarter) on 19 February 1783. He was the elder son of Jean-François Garneray (1755–1837), painter of the king, who was pupil of Jacques-Louis David. At thirteen, he joined the Navy as a seaman, encouraged by his cousin, Beaulieu-Leloup, commander of the frigate \"Forte\" (\"the Stout one\"). Garneray sailed from Rochefort to the Indian Ocean with the frigate division under Sercey, to which the \"Forte\" belonged.",
    "It was moved there from its original location after the artist's death on 25 December 1825 where his body had been resting in the old churchyard of the St. Michael and St. Gudula collegiate church of the Leopold Quarter of Brussels while waiting for posthumous repatriation to France. However, as a notable participant of the Reign of Terror, his body was not accepted for repatriation, and the lead-lined oak casket was left where it was. Thanks to an initiative by Jobard, a monument was erected with the text \"À Jacques-Louis David restaurateur de l'école moderne de peinture de France ici dessous\" (\"To Jacques-Louis David, restorer of the school of modern art of France, buried here\"). In 1882 a grandson requested that the monument be moved to a more prominent location and the body was re-buried at the Mayor's circle in the city cemetery of Evere."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [6, 6]
```

This is a sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
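For semantic search, an encoded corpus can be scored against an encoded query with a plain dot product, since the model's output vectors are L2-normalized (see the Normalize() module in the architecture). A minimal sketch, using random unit vectors in place of real `model.encode()` output so it runs without downloading the model:

```python
import numpy as np

# Random unit vectors stand in for model.encode() output; the real model
# returns L2-normalized 1024-dimensional vectors, so dot product == cosine.
rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

corpus_emb = normalize(rng.standard_normal((5, 1024)))   # 5 encoded documents
query_emb = normalize(rng.standard_normal((1, 1024)))    # 1 encoded query

scores = query_emb @ corpus_emb.T      # cosine similarities, shape (1, 5)
top_k = np.argsort(-scores[0])[:3]     # indices of the 3 best matches
print(top_k.shape)  # (3,)
```

With real data, `corpus_emb` would be computed once and cached; only the query needs encoding at search time.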
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
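The Pooling module above has `pooling_mode_cls_token=True`, so the sentence embedding is simply the transformer's first ([CLS]) token vector, which the final Normalize() module scales to unit length. A minimal numpy sketch of those two steps, using a fake token-embedding tensor in place of real transformer output:

```python
import numpy as np

# Fake transformer output: [batch, seq_len, hidden] = [2, 8, 1024].
token_embeddings = np.random.default_rng(1).standard_normal((2, 8, 1024))

# CLS pooling: keep only the first token's vector per sequence.
cls = token_embeddings[:, 0, :]

# Normalize(): L2-normalize so dot products are cosine similarities.
sentence_emb = cls / np.linalg.norm(cls, axis=1, keepdims=True)

print(sentence_emb.shape)  # (2, 1024)
```

Mean pooling would instead average the token vectors under the attention mask; this model uses the CLS token only.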
First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seongil-dn/further_trainset_large_v2")

# Run inference
sentences = [
    'When were bluebonnets named the state flower of Texas?',
    'Bluebonnet is a name given to any number of blue-flowered species of the genus "Lupinus" predominantly found in southwestern United States and is collectively the state flower of Texas. The shape of the petals on the flower resembles the bonnet worn by pioneer women to shield them from the sun.\nSpecies often called bluebonnets include:On March 7, 1901, "Lupinus subcarnosus" became the only species of bluebonnet recognized as the state flower of Texas; however, "Lupinus texensis" emerged as the favorite of most Texans. So, in 1971, the Texas Legislature made any similar species of "Lupinus" that could be found in Texas the state flower.',
    'The second major festival hosted in Ennis is the Bluebonnet Trails Festival, celebrating the state flower of Texas and the vibrant bloom of wildflowers in the surrounding countryside. The event attracts tens of thousands of tourists each year to events including sightseeing excursions and a festival in downtown. The festival is held on the third weekend of April, and the Bluebonnet Trails are hosted for the entire month. First hosted along the Kachina Prairie Park\'s historic mile-long trail system in 1938, the Bluebonnet Trails have since expanded into a route map of several dozen miles along rural farm roads throughout the surrounding countryside east and northeast of the city. The routes for these sightseeing excursions have been officially hosted and mapped out by the Ennis Garden Club since 1951. To commemorate the popularity of the Bluebonnet Trails Festival and the efforts made to celebrate and preserve the state flower of Texas, Ennis was designated by the 1997 Texas State Legislature as the "Official Bluebonnet City of Texas" and home to the "Official Bluebonnet Trail of Texas."',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
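The similarity matrix returned by `model.similarity()` can drive paraphrase mining directly: for each text, take the highest-scoring entry other than itself. A sketch with a stand-in 3×3 matrix shaped like the output above (in practice `similarities` is typically a torch tensor; `.numpy()` would convert it):

```python
import numpy as np

# Stand-in 3x3 similarity matrix (self-similarity of 1.0 on the diagonal).
similarities = np.array([
    [1.00, 0.72, 0.31],
    [0.72, 1.00, 0.28],
    [0.31, 0.28, 1.00],
])

# Exclude self-matches, then take the best remaining match per row.
masked = similarities - 2.0 * np.eye(3)
best = masked.argmax(axis=1)
print(best)  # [1 0 0]
```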
The training dataset has six string columns: `anchor`, `positive`, `negative`, `negative_2`, `negative_3`, and `negative_4`.

| | anchor | positive | negative | negative_2 | negative_3 | negative_4 |
|---|---|---|---|---|---|---|
| type | string | string | string | string | string | string |

Sample rows:

- **anchor:** When was quantum field theory developed?
- **positive:** The third thread in the development of quantum field theory was the need to handle the statistics of many-particle systems consistently and with ease. In 1927, Pascual Jordan tried to extend the canonical quantization of fields to the many-body wave functions of identical particles using a formalism which is known as statistical transformation theory; this procedure is now sometimes called second quantization. In 1928, Jordan and Eugene Wigner found that the quantum field describing electrons, or other fermions, had to be expanded using anti-commuting creation and annihilation operators due to the Pauli exclusion principle (see Jordan–Wigner transformation). This thread of development was incorporated into many-body theory and strongly influenced condensed matter physics and nuclear physics.
- **negative:** The application of the new quantum theory to electromagnetism resulted in quantum field theory, which was developed starting around 1930. Quantum field theory has driven the development of more sophisticated formulations of quantum mechanics, of which the ones presented here are simple special cases.
- **negative_2:** Two classic text-books from the 1960s, James D. Bjorken, Sidney David Drell, "Relativistic Quantum Mechanics" (1964) and J. J. Sakurai, "Advanced Quantum Mechanics" (1967), thoroughly developed the Feynman graph expansion techniques using physically intuitive and practical methods following from the correspondence principle, without worrying about the technicalities involved in deriving the Feynman rules from the superstructure of quantum field theory itself. Although both Feynman's heuristic and pictorial style of dealing with the infinities, as well as the formal methods of Tomonaga and Schwinger, worked extremely well, and gave spectacularly accurate answers, the true analytical nature of the question of "renormalizability", that is, whether ANY theory formulated as a "quantum field theory" would give finite answers, was not worked-out until much later, when the urgency of trying to formulate finite theories for the strong and electro-weak (and gravitational interactions) demanded i...
- **negative_3:** It was evident from the beginning that a proper quantum treatment of the electromagnetic field had to somehow incorporate Einstein's relativity theory, which had grown out of the study of classical electromagnetism. This need to put together relativity and quantum mechanics was the second major motivation in the development of quantum field theory. Pascual Jordan and Wolfgang Pauli showed in 1928 that quantum fields could be made to behave in the way predicted by special relativity during coordinate transformations (specifically, they showed that the field commutators were Lorentz invariant). A further boost for quantum field theory came with the discovery of the Dirac equation, which was originally formulated and interpreted as a single-particle equation analogous to the Schrödinger equation, but unlike the Schrödinger equation, the Dirac equation satisfies both the Lorentz invariance, that is, the requirements of special relativity, and the rules of quantum mechanics.
- **negative_4:** Through the works of Born, Heisenberg, and Pascual Jordan in 1925-1926, a quantum theory of the free electromagnetic field (one with no interactions with matter) was developed via canonical quantization by treating the electromagnetic field as a set of quantum harmonic oscillators. With the exclusion of interactions, however, such a theory was yet incapable of making quantitative predictions about the real world.

- **anchor:** Was there a year 0?
- **positive:** Cassini gave the following reasons for using a year 0:
- **negative:** Games Def Interceptions Fumbles Sacks & Tackles
- **negative_2:** This enzyme belongs to the family of oxidoreductases, specifically those acting on paired donors, with O2 as oxidant and incorporation or reduction of oxygen. The oxygen incorporated need not be derived from O2 with 2-oxoglutarate as one donor, and incorporation of one atom o oxygen into each donor. The systematic name of this enzyme class is N6,N6,N6-trimethyl-L-lysine,2-oxoglutarate:oxygen oxidoreductase (3-hydroxylating). Other names in common use include trimethyllysine alpha-ketoglutarate dioxygenase, TML-alpha-ketoglutarate dioxygenase, TML hydroxylase, 6-N,6-N,6-N-trimethyl-L-lysine,2-oxoglutarate:oxygen oxidoreductase, and (3-hydroxylating). This enzyme participates in lysine degradation and L-carnitine biosynthesis and requires the presence of iron and ascorbate.
- **negative_3:** ㅜ is one of the Korean hangul. The Unicode for ㅜ is U+315C.
- **negative_4:** ㅌ is one of the Korean hangul. The Unicode for ㅌ is U+314C.

- **anchor:** When is the dialectical method used?
- **positive:** The Dialect Test was created by A.J. Ellis in February 1879, and was used in the fieldwork for his work "On Early English Pronunciation". It stands as one of the earliest methods of identifying vowel sounds and features of speech. The aim was to capture the main vowel sounds of an individual dialect by listening to the reading of a short passage. All the categories of West Saxon words and vowels were included in the test so that comparisons could be made with the historic West Saxon speech as well as with various other dialects.
- **negative:** Karl Popper has attacked the dialectic repeatedly. In 1937, he wrote and delivered a paper entitled "What Is Dialectic?" in which he attacked the dialectical method for its willingness "to put up with contradictions". Popper concluded the essay with these words: "The whole development of dialectic should be a warning against the dangers inherent in philosophical system-building. It should remind us that philosophy should not be made a basis for any sort of scientific system and that philosophers should be much more modest in their claims. One task which they can fulfill quite usefully is the study of the critical methods of science" (Ibid., p. 335).
- **negative_2:** He was one of the first to apply Labovian methods in Britain with his research in 1970-1 on the speech of Bradford, Halifax and Huddersfield. He concluded that the speech detailed in most of dialectology (e.g. A. J. Ellis, the Survey of English Dialects) had virtually disappeared, having found only one speaker out of his sample of 106 speakers who regularly used dialect. However, he found that differences in speech persisted as an indicator of social class, age and gender. This PhD dissertation was later adapted into a book, "Dialect and Accent in Industrial West Yorkshire". The work was criticised by Graham Shorrocks on the grounds that the sociolinguistic methods used were inappropriate for recording the traditional vernacular and that there was an inadequate basis for comparison with earlier dialect studies in West Yorkshire.
- **negative_3:** The Institute also attempted to reformulate dialectics as a concrete method. The use of such a dialectical method can be traced back to the philosophy of Hegel, who conceived dialectic as the tendency of a notion to pass over into its own negation as the result of conflict between its inherent contradictory aspects. In opposition to previous modes of thought, which viewed things in abstraction, each by itself and as though endowed with fixed properties, Hegelian dialectic has the ability to consider ideas according to their movement and change in time, as well as according to their interrelations and interactions.
- **negative_4:** For Marx, dialectics is not a formula for generating predetermined outcomes but is a method for the empirical study of social processes in terms of interrelations, development, and transformation. In his introduction to the Penguin edition of Marx's "Capital", Ernest Mandel writes, "When the dialectical method is applied to the study of economic problems, economic phenomena are not viewed separately from each other, by bits and pieces, but in their inner connection as an integrated totality, structured around, and by, a basic predominant mode of production."
Loss: `CachedGISTEmbedLoss` with these parameters:

```
{'guide': SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), 'temperature': 0.01}
```
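The idea behind GISTEmbed-style losses is an in-batch InfoNCE objective in which a guide model's similarity scores are used to mask out likely false negatives. A hedged, numpy-only simplification (not the sentence-transformers implementation; the vectors are random stand-ins and the same vectors double as "guide" scores):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

batch, dim, temperature = 4, 16, 0.01  # temperature matches the config above
anchors = normalize(rng.standard_normal((batch, dim)))
candidates = normalize(rng.standard_normal((batch, dim)))  # row i = positive for anchor i

sim = anchors @ candidates.T / temperature  # student similarities
guide_sim = anchors @ candidates.T          # stand-in for the guide model's scores

# Mask off-diagonal candidates the guide rates above the true positive:
# these are treated as probable false negatives and removed from the loss.
pos_guide = np.diag(guide_sim)[:, None]
false_neg = (guide_sim > pos_guide) & ~np.eye(batch, dtype=bool)
sim = np.where(false_neg, -np.inf, sim)

# Cross-entropy with the diagonal (the labeled positive) as the target class.
m = sim.max(axis=1, keepdims=True)
log_probs = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
loss = -np.mean(np.diag(log_probs))
print(f"masked {int(false_neg.sum())} false negatives; loss = {loss:.4f}")
```

The "Cached" variant of the real loss additionally uses gradient caching so the effective batch size (here 1024) is decoupled from GPU memory.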
Non-default hyperparameters:

- `per_device_train_batch_size`: 1024
- `learning_rate`: 3e-05
- `weight_decay`: 0.01
- `num_train_epochs`: 6
- `warmup_ratio`: 0.05
- `bf16`: True
- `batch_sampler`: no_duplicates

All hyperparameters:

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 1024
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 3e-05
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 6
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.05
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.04 | 1 | 0.1495 |
| 0.08 | 2 | 0.1625 |
| 0.12 | 3 | 0.1623 |
| 0.16 | 4 | 0.1883 |
| 0.2 | 5 | 0.1559 |
| 0.24 | 6 | 0.15 |
| 0.28 | 7 | 0.1513 |
| 0.32 | 8 | 0.1646 |
| 0.36 | 9 | 0.1608 |
| 0.4 | 10 | 0.1774 |
| 0.44 | 11 | 0.1545 |
| 0.48 | 12 | 0.1646 |
| 0.52 | 13 | 0.1901 |
| 0.56 | 14 | 0.1791 |
| 0.6 | 15 | 0.1621 |
| 0.64 | 16 | 0.1539 |
| 0.68 | 17 | 0.1623 |
| 0.72 | 18 | 0.1802 |
| 0.76 | 19 | 0.1613 |
| 0.8 | 20 | 0.172 |
| 0.84 | 21 | 0.1481 |
| 0.88 | 22 | 0.1783 |
| 0.92 | 23 | 0.1636 |
| 0.96 | 24 | 0.1753 |
| 1.0 | 25 | 0.1763 |
| 1.04 | 26 | 0.1302 |
| 1.08 | 27 | 0.1172 |
| 1.12 | 28 | 0.1207 |
| 1.16 | 29 | 0.1259 |
| 1.2 | 30 | 0.1241 |
| 1.24 | 31 | 0.1208 |
| 1.28 | 32 | 0.1155 |
| 1.32 | 33 | 0.1168 |
| 1.36 | 34 | 0.0985 |
| 1.4 | 35 | 0.119 |
| 1.44 | 36 | 0.1063 |
| 1.48 | 37 | 0.1277 |
| 1.52 | 38 | 0.1071 |
| 1.56 | 39 | 0.1234 |
| 1.6 | 40 | 0.106 |
| 1.64 | 41 | 0.109 |
| 1.68 | 42 | 0.1149 |
| 1.72 | 43 | 0.1068 |
| 1.76 | 44 | 0.1035 |
| 1.8 | 45 | 0.1221 |
| 1.84 | 46 | 0.1007 |
| 1.88 | 47 | 0.1001 |
| 1.92 | 48 | 0.1105 |
| 1.96 | 49 | 0.1144 |
| 2.0 | 50 | 0.099 |
| 2.04 | 51 | 0.0873 |
| 2.08 | 52 | 0.0845 |
| 2.12 | 53 | 0.0848 |
| 2.16 | 54 | 0.0815 |
| 2.2 | 55 | 0.073 |
| 2.24 | 56 | 0.0915 |
| 2.28 | 57 | 0.0833 |
| 2.32 | 58 | 0.0808 |
| 2.36 | 59 | 0.0837 |
| 2.4 | 60 | 0.0889 |
| 2.44 | 61 | 0.0829 |
| 2.48 | 62 | 0.0887 |
| 2.52 | 63 | 0.0898 |
| 2.56 | 64 | 0.0679 |
| 2.6 | 65 | 0.0835 |
| 2.64 | 66 | 0.0736 |
| 2.68 | 67 | 0.0813 |
| 2.72 | 68 | 0.0832 |
| 2.76 | 69 | 0.0785 |
| 2.8 | 70 | 0.076 |
| 2.84 | 71 | 0.0833 |
| 2.88 | 72 | 0.0891 |
| 2.92 | 73 | 0.0709 |
| 2.96 | 74 | 0.0825 |
| 3.0 | 75 | 0.0695 |
| 3.04 | 76 | 0.0553 |
| 3.08 | 77 | 0.0612 |
| 3.12 | 78 | 0.0663 |
| 3.16 | 79 | 0.0663 |
| 3.2 | 80 | 0.0585 |
| 3.24 | 81 | 0.0647 |
| 3.28 | 82 | 0.0631 |
| 3.32 | 83 | 0.0676 |
| 3.36 | 84 | 0.0708 |
| 3.4 | 85 | 0.0599 |
| 3.44 | 86 | 0.0629 |
| 3.48 | 87 | 0.0618 |
| 3.52 | 88 | 0.0529 |
| 3.56 | 89 | 0.0572 |
| 3.6 | 90 | 0.0641 |
| 3.64 | 91 | 0.0636 |
| 3.68 | 92 | 0.0538 |
| 3.72 | 93 | 0.061 |
| 3.76 | 94 | 0.0541 |
| 3.8 | 95 | 0.0671 |
| 3.84 | 96 | 0.0589 |
| 3.88 | 97 | 0.0575 |
| 3.92 | 98 | 0.0639 |
| 3.96 | 99 | 0.059 |
| 4.0 | 100 | 0.0593 |
| 4.04 | 101 | 0.0481 |
| 4.08 | 102 | 0.0496 |
| 4.12 | 103 | 0.0519 |
| 4.16 | 104 | 0.0536 |
| 4.2 | 105 | 0.0481 |
| 4.24 | 106 | 0.0521 |
| 4.28 | 107 | 0.0551 |
| 4.32 | 108 | 0.0495 |
| 4.36 | 109 | 0.0524 |
| 4.4 | 110 | 0.0463 |
| 4.44 | 111 | 0.0572 |
| 4.48 | 112 | 0.0419 |
| 4.52 | 113 | 0.0524 |
| 4.56 | 114 | 0.053 |
| 4.6 | 115 | 0.0503 |
| 4.64 | 116 | 0.0522 |
| 4.68 | 117 | 0.0388 |
| 4.72 | 118 | 0.0436 |
| 4.76 | 119 | 0.0527 |
| 4.8 | 120 | 0.0454 |
| 4.84 | 121 | 0.0503 |
| 4.88 | 122 | 0.053 |
| 4.92 | 123 | 0.0566 |
| 4.96 | 124 | 0.0534 |
| 5.0 | 125 | 0.0455 |
| 5.04 | 126 | 0.0471 |
| 5.08 | 127 | 0.0446 |
| 5.12 | 128 | 0.0469 |
| 5.16 | 129 | 0.0495 |
| 5.2 | 130 | 0.0412 |
| 5.24 | 131 | 0.0482 |
| 5.28 | 132 | 0.0425 |
| 5.32 | 133 | 0.0389 |
| 5.36 | 134 | 0.0468 |
| 5.4 | 135 | 0.046 |
| 5.44 | 136 | 0.0438 |
| 5.48 | 137 | 0.0465 |
| 5.52 | 138 | 0.0418 |
| 5.56 | 139 | 0.0453 |
| 5.6 | 140 | 0.0463 |
| 5.64 | 141 | 0.0439 |
| 5.68 | 142 | 0.0447 |
| 5.72 | 143 | 0.0464 |
| 5.76 | 144 | 0.0413 |
| 5.8 | 145 | 0.0388 |
| 5.84 | 146 | 0.0468 |
| 5.88 | 147 | 0.0416 |
| 5.92 | 148 | 0.0441 |
| 5.96 | 149 | 0.0446 |
| 6.0 | 150 | 0.0469 |
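Averaging the logged values for the first and last epochs (numbers copied from the table above) quantifies the convergence: mean training loss drops by roughly 3.7× over the six epochs.

```python
# Mean logged training loss for epoch 1 (steps 1-25) and epoch 6
# (steps 126-150), values copied verbatim from the table above.
first_epoch = [0.1495, 0.1625, 0.1623, 0.1883, 0.1559, 0.1500, 0.1513,
               0.1646, 0.1608, 0.1774, 0.1545, 0.1646, 0.1901, 0.1791,
               0.1621, 0.1539, 0.1623, 0.1802, 0.1613, 0.1720, 0.1481,
               0.1783, 0.1636, 0.1753, 0.1763]
last_epoch = [0.0471, 0.0446, 0.0469, 0.0495, 0.0412, 0.0482, 0.0425,
              0.0389, 0.0468, 0.0460, 0.0438, 0.0465, 0.0418, 0.0453,
              0.0463, 0.0439, 0.0447, 0.0464, 0.0413, 0.0388, 0.0468,
              0.0416, 0.0441, 0.0446, 0.0469]

print(round(sum(first_epoch) / len(first_epoch), 4))  # 0.1658
print(round(sum(last_epoch) / len(last_epoch), 4))    # 0.0446
```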
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```