How to use amene-gafsi/MNLP_M3_document_encoder with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("amene-gafsi/MNLP_M3_document_encoder")
sentences = [
"Which of the following statements accurately describes the relationship between gene compaction and locus volume in genomic loci?\n\nA) Increased locus volume correlates with higher gene compaction.\n\nB) A high level of compaction is associated with a high volume of the genomic locus.\n\nC) Gene compaction is directly proportional to locus volume.\n\nD) Loci with low volume exhibit high levels of compaction.\n\n**Correct Answer: D) Loci with low volume exhibit high levels of compaction.**",
"The QoS negotiation is supported by the PRACK request, that starts resource reservation in the calling party network, and it is answered by a 2XX response code. Once this response has been sent, the called party has selected the codec too, and starts resource reservation on its side. Subsequent UPDATE requests are sent to inform about the reservation progress, and they are answered by 2XX response codes. In a typical offer/answer exchange, one UPDATE will be sent by the calling party when its reservation is completed, then the called party will respond and eventually finish allocating the resources. It is then, when all the resources for the call are in place, when the caller is alerted.\nIf the individual has undergone stenting, an anticoagulant will be a necessity to prevent build-up around the stent(s), as the body will perceive the foreign body as a wound and attempt to heal it. Some patients who had alternate corrective surgery, such as the Mustard or Senning procedure, may have issues with SA and VA nodal transmissions in later life. Typical symptoms include palpitations and problems with low heart rates. This is commonly solved with a Pacemaker unit, providing scar tissue from the original operation does not block its functionality. More recently, ACE inhibitors have been prescribed to patients in the hope of relieving stress on the heart.\nUsing this method which results in a relatively high control of size and shape, semiconductor nanostructures could be synthesized in the form of dots, tubes, wires and other forms which show interesting optic and electronic size-dependent properties. Since the synergistic properties resulting from the intimate contact and interaction between the core and shell, CSSNCs can provide novel functions and enhanced properties which are not observed in single nanoparticles.The size of core materials and the thickness of shell can be controlled during synthesis. 
For example, in the synthesis of CdSe core nanocrystals, the volume of H2S gas can determine the size of core nanocrystals.",
"In mathematics, the Chang number of an irreducible representation of a simple complex Lie algebra is its dimension modulo 1 + h, where h is the Coxeter number. Chang numbers are named after Chang (1982), who rediscovered an element of order h + 1 found by Kac (1981). Kac (1981) showed that there is a unique class of regular elements σ of order h + 1, in the complex points of the corresponding Chevalley group. He showed that the trace of σ on an irreducible representation is −1, 0, or +1, and if h + 1 is prime then the trace is congruent to the dimension mod h+1. This implies that the dimension of an irreducible representation is always −1, 0, or +1 mod h + 1 whenever h + 1 is prime.\nMosquito bite allergies are informally classified as 1) the skeeter syndrome, i.e., severe local skin reactions sometimes associated with low-grade fever; 2) systemic reactions that range from high-grade fever, lymphadenopathy, abdominal pain, and/or diarrhea to, very rarely, life-threatening symptoms of anaphylaxis; and 3) severe and often systemic reactions occurring in individuals that have an Epstein-Barr virus-associated lymphoproliferative disease, Epstein-Barr virus-negative lymphoid malignancy, or another predisposing condition such as eosinophilic cellulitis or chronic lymphocytic leukemia. The term papular urticaria is commonly used for a reaction to mosquito bites that is dominated by widely spread hives.",
"All LIRR bilevel passenger rail cars have two wide quarter-point doors on each side, for high level platforms only. The bilevel cars used by NJ Transit and Exo have four doors on each side, two quarter-point doors at high level platform height and one at each end vestibule, with traps used to reach low level platforms. The bilevel cars used by MBTA have side doors with traps at each end vestibule.\nFor 128 bits of security and the smallest signature size in a Rainbow multivariate quadratic equation signature scheme, Petzoldt, Bulygin and Buchmann, recommend using equations in F 31 {\\displaystyle \\mathbb {F} _{31}} with a public key size of just over 991,000 bits, a private key of just over 740,000 bits and digital signatures which are 424 bits in length.\nA 2020 study identified a habitat-specific and relatively abundant core microbiome in the manuka phyllosphere, which was persistent across all samples. In contrast, non-core phyllosphere microorganisms exhibited significant variation across individual host trees and populations that was strongly driven by environmental and spatial factors. The results demonstrated the existence of a dominant and ubiquitous core microbiome in the phyllosphere of manuka.\nIt seems that weak polarizations are ordinarily unable to form a component of a vector soliton. However, due to the cross-polarization modulation between strong and weak polarization components, a \"weak soliton\" could also be formed. It thus demonstrates that the soliton obtained is not a \"scalar\" soliton with a linear polarization mode, but rather a vector soliton with a large ellipticity.\nThe GAMtools command gamtools compaction can be used to calculate an estimation of chromatin compaction. Compaction is a value assigned to a gene that represents how large the gene is. The level of compaction is inversely proportional to the locus volume. 
Genomic loci with a low volume are said to have a high level of compaction, and loci with a high volume have a low level of compaction."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2 on the json dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
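The module list above reads as a three-stage pipeline: token embeddings from the transformer, mean pooling over tokens (`pooling_mode_mean_tokens: True`), then L2 normalization. The following is a minimal pure-Python sketch of the last two stages on toy data, not the library's implementation. The helper names `mean_pool` and `l2_normalize` and the 2-dimensional vectors are illustrative assumptions; the real model produces 384-dimensional vectors.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, so padding is ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / max(count, 1) for t in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

# Toy input: three token vectors of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled, sentence_embedding)
```

Because the output is unit-length, the dot product of two such embeddings equals their cosine similarity.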
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("amene-gafsi/minilm-finetuned-embedding")
# Run inference
sentences = [
'Which of the following statements accurately describes a key implication of the parameter μ in population dynamics?\n\nA) If μ equals 1, the population will certainly grow indefinitely. \nB) If μ is less than 1, the population will face a high probability of extinction. \nC) If μ is greater than 1, the population will go extinct with certainty. \nD) If μ is equal to 0, the population will persist indefinitely.\n\n**Correct Answer: B) If μ is less than 1, the population will face a high probability of extinction.**',
"Micro-encapsulation allows for metabolism within the membrane, exchange of small molecules and prevention of passage of large substances across it. The main advantages of encapsulation include improved mimicry in the body, increased solubility of the cargo and decreased immune responses. Notably, artificial cells have been clinically successful in hemoperfusion.\nIf μ < 1, then the expected number of individuals goes rapidly to zero, which implies ultimate extinction with probability 1 by Markov's inequality. Alternatively, if μ > 1, then the probability of ultimate extinction is less than 1 (but not necessarily zero; consider a process where each individual either has 0 or 100 children with equal probability.\nThe decision to tolerate up to 10 μg/liter of “nonrelevant” metabolites in groundwater and drinking water is politically highly contentious in Europe. Some consider the higher limit acceptable as no imminent health risk can be proven, whereas others regard it as a fundamental deviation from the precautionary principle. == References ==\nInformally, dynamical systems describe the time evolution of the phase space of some mechanical system. Commonly, such evolution is given by some differential equations, or quite often in terms of discrete time steps. However, in the present case, instead of focusing on the time evolution of discrete points, one shifts attention to the time evolution of collections of points.\nCommercial Crew Development (CCDev) is a human spaceflight development program that is funded by the U.S. government and administered by NASA. CCDev will result in US and international astronauts flying to the International Space Station (ISS) on privately operated crew vehicles. Operational contracts to fly astronauts were awarded in September 2014 to SpaceX and Boeing.\nTo do so, one needs precise disease definitions and a probabilistic analysis of symptoms and molecular profiles. 
Physicists have been studying similar problems for years, utilizing microscopic elements and their interactions to extract macroscopic states of various physical systems. Physics inspired machine learning approaches can thus be applied to study disease processes and to perform biomarker analysis.\nDuring the second stage, the light-independent reactions use these products to capture and reduce carbon dioxide. Most organisms that use oxygenic photosynthesis use visible light for the light-dependent reactions, although at least three use shortwave infrared or, more specifically, far-red radiation.Some organisms employ even more radical variants of photosynthesis.",
'Structuring elements are particular cases of binary images, usually being small and simple. In mathematical morphology, binary images are subsets of a Euclidean space Rd or the integer grid Zd, for some dimension d. Here are some examples of widely used structuring elements (denoted by B): Let E=R2; B is an open disk of radius r, centered at the origin. Let E=Z2; B is a 3x3 square, that is, B={(-1,-1),(-1,0),(-1,1),(0,-1),(0,0),(0,1),(1,-1),(1,0),(1,1)}. Let E=Z2; B is the "cross" given by: B={(-1,0),(0,-1),(0,0),(0,1),(1,0)}.In the discrete case, a structuring element can also be represented as a set of pixels on a grid, assuming the values 1 (if the pixel belongs to the structuring element) or 0 (otherwise). When used by a hit-or-miss transform, usually the structuring element is a composite of two disjoint sets (two simple structuring elements), one associated to the foreground, and one associated to the background of the image to be probed. In this case, an alternative representation of the composite structuring element is as a set of pixels which are either set (1, associated to the foreground), not set (0, associated to the background) or "don\'t care".',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
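In the example above, the first entry is a question and the remaining entries are candidate passages, which is the shape of a retrieval task. The sketch below shows how passages could be ranked against a query once embeddings are available. It uses small hand-written unit vectors as stand-ins for `model.encode(...)` output, and the helper names `dot` and `top_k` are hypothetical. It relies on the embeddings being unit-length, as the `Normalize()` module suggests, so the dot product is the cosine similarity.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    scored = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical unit-length vectors standing in for real embeddings.
query = [1.0, 0.0]
docs = [[0.6, 0.8], [0.0, 1.0], [0.8, 0.6]]

ranking = top_k(query, docs, k=2)
print(ranking)
```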
Dataset: baseline, evaluated with InformationRetrievalEvaluator.

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.68 |
| cosine_accuracy@3 | 0.78 |
| cosine_accuracy@5 | 0.78 |
| cosine_accuracy@10 | 0.8 |
| cosine_precision@1 | 0.68 |
| cosine_precision@3 | 0.26 |
| cosine_precision@5 | 0.156 |
| cosine_precision@10 | 0.08 |
| cosine_recall@1 | 0.68 |
| cosine_recall@3 | 0.78 |
| cosine_recall@5 | 0.78 |
| cosine_recall@10 | 0.8 |
| cosine_ndcg@10 | 0.744 |
| cosine_mrr@10 | 0.7257 |
| cosine_map@100 | 0.7281 |
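To make the metric names in the table concrete, here is a small sketch that computes accuracy@k, precision@k, recall@k, MRR@k and NDCG@k for a single query. It is an illustration under stated assumptions, not the evaluator's code: the function name `ir_metrics`, the document IDs and the binary relevance setup are all hypothetical.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k):
    """Binary-relevance retrieval metrics for one query at cutoff k."""
    top = ranked_ids[:k]
    hits = [1 if doc in relevant_ids else 0 for doc in top]

    accuracy = 1.0 if any(hits) else 0.0        # at least one relevant doc in the top k
    precision = sum(hits) / k                   # fraction of the k slots that are relevant
    recall = sum(hits) / len(relevant_ids)      # fraction of relevant docs retrieved

    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank                    # reciprocal rank of the first hit
            break

    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mrr": mrr, "ndcg": ndcg}

# Toy query with one relevant document, retrieved at rank 2.
metrics = ir_metrics(["d7", "d2", "d9"], {"d2"}, k=3)
print(metrics)
```

When each query has exactly one relevant document, precision@k reduces to accuracy@k divided by k and recall@k equals accuracy@k. The table is consistent with that pattern (0.78 / 3 = 0.26, 0.78 / 5 = 0.156, 0.8 / 10 = 0.08), which suggests, but does not prove, a single positive per query.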
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

Samples:

| anchor | positive |
|---|---|
| Which of the following milestones in tin smelting occurred in 1978? | The work then proceeded to smelting tin concentrates (1975) and then sulfidic tin concentrates (1977).MIM and ER&S jointly funded the 1975 Port Kembla converter slag treatment trials and MIM’s involvement continued with the slag treatment work in Townsville and Mount Isa.In parallel with the copper slag treatment work, the CSIRO was continuing to work in tin smelting. Projects included a five tonne ("t") plant for recovering tin from slag being installed at Associated Tin Smelters in 1978, and the first sulfidic smelting test work being done in collaboration with Aberfoyle Limited, in which tin was fumed from pyritic tin ore and from mixed tin and copper concentrates. Aberfoyle was investigating the possibility of using the Sirosmelt lance approach to improve the recovery of tin from complex ores, such as its mine at Cleveland, Tasmania, and the Queen Hill ore zone near Zeehan in Tasmania.The Aberfoyle work led to the construction and operation in late 1980 of a four t/h tin matte fumi... |
| Which of the following conditions is necessary for the application of Theorem GF3 in the context of the product defined recursively by ( f_n(z) = z(1 + g_n(z)) )? | z g_n(z) |
| Which of the following statements correctly describes the relationship between axonometry and axonometric projection? | Images drawn in parallel projection rely upon the technique of axonometry ("to measure along axes"), as described in Pohlke's theorem. In general, the resulting image is oblique (the rays are not perpendicular to the image plane); but in special cases the result is orthographic (the rays are perpendicular to the image plane). Axonometry should not be confused with axonometric projection, as in English literature the latter usually refers only to a specific class of pictorials (see below). |
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
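The loss configuration above can be read as cross-entropy over scaled cosine similarities, where each anchor's own positive is the target and the other positives in the batch act as negatives. The sketch below illustrates that idea in pure Python with `scale=20.0`. It is a simplified illustration rather than the library's code, and the function names `cos_sim` and `mnr_loss` and the toy vectors are assumptions.

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i should match positive i within the batch."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs: aligned pairs give a near-zero loss, swapped pairs a large one.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]

loss_aligned = mnr_loss(anchors, positives)
loss_swapped = mnr_loss(anchors, positives[::-1])
print(loss_aligned, loss_swapped)
```

A larger scale sharpens the softmax, so a mismatch between an anchor and its positive is penalized more strongly.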
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 32
- gradient_accumulation_steps: 8
- learning_rate: 1e-06
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-06
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | baseline_cosine_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.7290 |
| 0.8276 | 3 | - | 0.7365 |
| 1.8276 | 6 | - | 0.7427 |
| 2.8276 | 9 | - | 0.7420 |
| 3.2759 | 10 | 7.0507 | - |
| 3.8276 | 12 | - | 0.744 |
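The listed hyperparameters combine into an effective batch size and a learning-rate curve that are easy to work out by hand. The sketch below does so in pure Python as a rough illustration, not as the trainer's implementation. It assumes a single device, takes 12 total optimizer steps from the last row of the training log, rounds the warmup length up, and uses a half-cosine decay to zero. The names `lr_at` and `schedule` are hypothetical.

```python
import math

LEARNING_RATE = 1e-06     # learning_rate
WARMUP_RATIO = 0.1        # warmup_ratio
TOTAL_STEPS = 12          # assumption: last optimizer step shown in the training log
PER_DEVICE_BATCH = 32     # per_device_train_batch_size
GRAD_ACCUM = 8            # gradient_accumulation_steps

# Samples contributing to one optimizer update, assuming one device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

# Assumption: the warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at(step):
    """Linear warmup to the peak rate, then half-cosine decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(step) for step in range(TOTAL_STEPS + 1)]
print(effective_batch, warmup_steps)
print(schedule)
```

Under these assumptions the peak rate of 1e-06 is reached at step 2 and held for no steps before decaying, so most of the 12 updates run below the nominal learning rate.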
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
nreimers/MiniLM-L6-H384-uncased