How to use zacbrld/MNLP_M3_document_encoder with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("zacbrld/MNLP_M3_document_encoder")
sentences = [
"For example, t ∈ { 0 , 1 , … , N } , N 0 , or {\\mbox{ or }}[0,+\\infty ).} Similarly, a filtered probability space (also known as a stochastic basis) ( Ω , F , { F t } t ≥ 0 , P ) {\\displaystyle \\left(\\Omega ,{\\mathcal {F}},\\left\\{{\\mathcal {F}}_{t}\\right\\}_{t\\geq 0},\\mathbb {P} \\right)} , is a probability space equipped with the filtration { F t } t ≥ 0 {\\displaystyle \\left\\{{\\mathcal {F}}_{t}\\right\\}_{t\\geq 0}} of its σ {\\displaystyle \\sigma } -algebra F {\\displaystyle {\\mathcal {F}}} . A filtered probability space is said to satisfy the usual conditions if it is complete (i.e., F 0 {\\displaystyle {\\mathcal {F}}_{0}} contains all P {\\displaystyle \\mathbb {P} } -null sets) and right-continuous (i.e. F t = F t + := ⋂ s > t F s {\\displaystyle {\\mathcal {F}}_{t}={\\mathcal {F}}_{t+}:=\\bigcap _{s>t}{\\mathcal {F}}_{s}} for all times t {\\displaystyle t} ).It is also useful (in the case of an unbounded index set) to define F ∞ {\\displaystyle {\\mathcal {F}}_{\\infty }} as the σ {\\displaystyle \\sigma } -algebra generated by the infinite union of the F t {\\displaystyle {\\mathcal {F}}_{t}} 's, which is contained in F {\\displaystyle {\\mathcal {F}}}: F ∞ = σ ( ⋃ t ≥ 0 F t ) ⊆ F .",
"These individuals can experience these symptoms from failed attempts of depression like symptoms.Narcissistic personality disorder is characterized as feelings of superiority, a sense of grandiosity, exhibitionism, charming but also exploitive behaviors in the interpersonal domain, success, beauty, feelings of entitlement and a lack of empathy. Those with this disorder often engage in assertive self enhancement and antagonistic self protection. All of these factors can lead an individual with narcissistic personality disorder to manipulate others.",
"{\\displaystyle {\\mathcal {F}}_{\\infty }=\\sigma \\left(\\bigcup _{t\\geq 0}{\\mathcal {F}}_{t}\\right)\\subseteq {\\mathcal {F}}.} A σ-algebra defines the set of events that can be measured, which in a probability context is equivalent to events that can be discriminated, or \"questions that can be answered at time t {\\displaystyle t} \". Therefore, a filtration is often used to represent the change in the set of events that can be measured, through gain or loss of information. A typical example is in mathematical finance, where a filtration represents the information available up to and including each time t {\\displaystyle t} , and is more and more precise (the set of measurable events is staying the same or increasing) as more information from the evolution of the stock price becomes available.",
"Section: Structure and dynamics > Composition. Like microtubules, neurotubules are made up of protein polymers of α-tubulin and β-tubulin, globular proteins that are closely related. They join together to form a dimer, called tubulin. Neurotubules are generally assembled by 13 protofilaments which are polymerized from tubulin dimers. As a tubulin dimer consists of one α-tubulin and one β-tubulin, one end of the neurotubule is exposed with the α-tubulin and the other end with β-tubulin, these two ends contribute to the polarity of the neurotubule – the plus (+) end and the minus (-) end. The β-tubulin subunit is exposed on the plus (+) end. The two ends differ in their growth rate: plus (+) end is the fast-growing end while minus (-) end is the slow-growing end. Both ends have their own rate of polymerization and depolymerization of tubulin dimers, net polymerization causes the assembly of tubulin, hence the length of the neurotubules."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 350, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
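The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the real model uses 384 dimensions and tensors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask value 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [s / count for s in summed]


# Toy input: two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padding token is excluded from the average, which is why the attention mask is passed alongside the token vectors.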
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M3_document_encoder_kaggle")
# Run inference
sentences = [
'If X {\\displaystyle X} is a linear space and g {\\displaystyle g} are constants, the system is said to be subject to additive noise, otherwise it is said to be subject to multiplicative noise. This term is somewhat misleading as it has come to mean the general case even though it appears to imply the limited case in which g ( x ) ∝ x {\\displaystyle g(x)\\propto x} . For a fixed configuration of noise, SDE has a unique solution differentiable with respect to the initial condition.',
'Nontriviality of stochastic case shows up when one tries to average various objects of interest over noise configurations. In this sense, an SDE is not a uniquely defined entity when noise is multiplicative and when the SDE is understood as a continuous time limit of a stochastic difference equation. In this case, SDE must be complemented by what is known as "interpretations of SDE" such as Itô or a Stratonovich interpretations of SDEs.',
'Article: RNA-Seq technology and its application in fish transcriptomics.. High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
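To make the `[3, 3]` similarity matrix above more concrete, here is a small pure-Python sketch of a pairwise cosine-similarity matrix. It assumes the model's similarity function is cosine similarity, which this card does not state explicitly; the helper name `cosine_similarity_matrix` and the toy vectors are illustrative only.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the matrix of cosine similarities between all pairs of vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


# Toy 2-dimensional "embeddings": the first and third point in the same direction.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
scores = cosine_similarity_matrix(toy)
for row in scores:
    print(row)
```

Entry `[i][j]` compares input `i` with input `j`, so the matrix is symmetric and its diagonal is 1.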
Columns: sentence_0, sentence_1, and sentence_2

| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |

| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| Most hard-bodied insect specimens and some other hard-bodied invertebrates such as certain Arachnida, are preserved as pinned specimens. Either while still fresh, or after rehydrating them if necessary because they had dried out, specimens are transfixed by special stainless steel entomological pins. As the insect dries the internal tissues solidify and, possibly aided to some extent by the integument, they grip the pin and secure the specimen in place on the pin. Very small, delicate specimens may instead be secured by fine steel points driven into slips of card, or glued to card points or similar attachments that in turn are pinned in the same way as entire mounted insects. | The pins offer a means of handling the specimens without damage, and they also bear labels for descriptive and reference data. Once dried, the specimens may be kept in conveniently sized open trays. The bottoms of the trays are lined with a material suited to receiving and holding entomological pins securely and conveniently. | Article: Interruption of People in Human-Computer Interaction: A General Unifying Definition of Human Interruption and Taxonomy. Abstract : User-interruption in human-computer interaction (HCI) is an increasingly important problem. Many of the useful advances in intelligent and multitasking computer systems have the significant side effect of greatly increasing user-interruption. This previously innocuous HCI problem has become critical to the successful function of many kinds of modern computer systems. Unfortunately, no HCI design guidelines exist for solving this problem. In fact, theoretical tools do not yet exist for investigating the HCI problem of user-interruption in a comprehensive and generalizable way. This report asserts that a single unifying definition of user-interruption and the accompanying practical taxonomy would be useful theoretical tools for driving effective investigation of this crucial HCI problem. These theoretical tools are constructed here. A comprehensive a... |
| In strike-slip tectonic settings, deformation of the lithosphere occurs primarily in the plane of Earth as a result of near horizontal maximum and minimum principal stresses. Faults associated with these plate boundaries are primarily vertical. Wherever these vertical fault planes encounter bends, movement along the fault can create local areas of compression or tension. When the curve in the fault plane moves apart, a region of transtension occurs and sometimes is large enough and long-lived enough to create a sedimentary basin often called a pull-apart basin or strike-slip basin. | These basins are often roughly rhombohedral in shape and may be called a rhombochasm. A classic rhombochasm is illustrated by the Dead Sea rift, where northward movement of the Arabian Plate relative to the Anatolian Plate has created a strike slip basin. The opposite effect is that of transpression, where converging movement of a curved fault plane causes collision of the opposing sides of the fault. An example is the San Bernardino Mountains north of Los Angeles, which result from convergence along a curve in the San Andreas fault system. The Northridge earthquake was caused by vertical movement along local thrust and reverse faults "bunching up" against the bend in the otherwise strike-slip fault environment. | This was the first interpretation and prediction of a particle and corresponding antiparticle. See Dirac spinor and bispinor for further description of these spinors. In the non-relativistic limit the Dirac equation reduces to the Pauli equation (see Dirac equation for how). |
| M1: This was used by seacoast artillery for major-caliber seacoast guns. It computed continuous firing data for a battery of two guns that were separated by not more than 1,000 feet (300 m). It utilised the same type of input data furnished by a range section with the then-current (1940) types of position-finding and fire-control equipment. M3: This was used in conjunction with the M9 and M10 directors to compute all required firing data, i.e. azimuth, elevation and fuze time. | The computations were made continuously, so that the gun was at all times correctly pointed and the fuze correctly timed for firing at any instant. The computer was mounted in the M13 or M14 director trailer. | Section: Industry > Semiconductors. A semiconductor is a material that has a resistivity between a conductor and insulator. Modern day electronics run on semiconductors, and the industry had an estimated US$530 billion market in 2021. Its electronic properties can be greatly altered through intentionally introducing impurities in a process referred to as doping. Semiconductor materials are used to build diodes, transistors, light-emitting diodes (LEDs), and analog and digital electric circuits, among their many uses. Semiconductor devices have replaced thermionic devices like vacuum tubes in most applications. Semiconductor devices are manufactured both as single discrete devices and as integrated circuits (ICs), which consist of a number—from a few to millions—of devices manufactured and interconnected on a single semiconductor substrate. Of all the semiconductors in use today, silicon makes up the largest portion both by quantity and commercial value. Monocrystalline silicon is used ... |

TripletLoss with these parameters:
{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
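With a Euclidean distance metric and a margin of 5, the per-triplet loss takes the standard form max(d(anchor, positive) - d(anchor, negative) + margin, 0). The sketch below illustrates that formula in plain Python. It assumes sentence_0 is the anchor, sentence_1 the positive, and sentence_2 the negative, which matches the samples shown above but is not stated explicitly in this card; the function names and toy vectors are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the anchor
near_negative = [0.0, 6.0]   # distance 6: only 1 farther than the positive
far_negative = [0.0, 12.0]   # distance 12: 7 farther than the positive

print(triplet_loss(anchor, positive, near_negative))  # 4.0
print(triplet_loss(anchor, positive, far_negative))   # 0.0
```

The near negative still incurs a loss because it is less than the margin farther away than the positive, while the far negative contributes nothing.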
Non-default hyperparameters:
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 10
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss |
|---|---|---|
| 0.1896 | 500 | 2.189 |
| 0.3792 | 1000 | 0.2668 |
| 0.5688 | 1500 | 0.1869 |
| 0.7584 | 2000 | 0.1456 |
| 0.9480 | 2500 | 0.1123 |
| 1.1377 | 3000 | 0.0978 |
| 1.3273 | 3500 | 0.0735 |
| 1.5169 | 4000 | 0.0842 |
| 1.7065 | 4500 | 0.0756 |
| 1.8961 | 5000 | 0.0577 |
| 2.0857 | 5500 | 0.0512 |
| 2.2753 | 6000 | 0.0308 |
| 2.4649 | 6500 | 0.0271 |
| 2.6545 | 7000 | 0.0303 |
| 2.8441 | 7500 | 0.0324 |
| 3.0338 | 8000 | 0.0325 |
| 3.2234 | 8500 | 0.0112 |
| 3.4130 | 9000 | 0.0136 |
| 3.6026 | 9500 | 0.0123 |
| 3.7922 | 10000 | 0.0117 |
| 3.9818 | 10500 | 0.0148 |
| 4.1714 | 11000 | 0.0085 |
| 4.3610 | 11500 | 0.0066 |
| 4.5506 | 12000 | 0.0053 |
| 4.7402 | 12500 | 0.0078 |
| 4.9298 | 13000 | 0.006 |
| 5.1195 | 13500 | 0.0058 |
| 5.3091 | 14000 | 0.0043 |
| 5.4987 | 14500 | 0.0027 |
| 5.6883 | 15000 | 0.0036 |
| 5.8779 | 15500 | 0.0035 |
| 6.0675 | 16000 | 0.0029 |
| 6.2571 | 16500 | 0.0031 |
| 6.4467 | 17000 | 0.0015 |
| 6.6363 | 17500 | 0.0025 |
| 6.8259 | 18000 | 0.0021 |
| 7.0155 | 18500 | 0.0032 |
| 7.2052 | 19000 | 0.0011 |
| 7.3948 | 19500 | 0.001 |
| 7.5844 | 20000 | 0.0012 |
| 7.7740 | 20500 | 0.0011 |
| 7.9636 | 21000 | 0.0013 |
| 8.1532 | 21500 | 0.0002 |
| 8.3428 | 22000 | 0.001 |
| 8.5324 | 22500 | 0.0006 |
| 8.7220 | 23000 | 0.0003 |
| 8.9116 | 23500 | 0.0007 |
| 9.1013 | 24000 | 0.0003 |
| 9.2909 | 24500 | 0.0002 |
| 9.4805 | 25000 | 0.0005 |
| 9.6701 | 25500 | 0.0005 |
| 9.8597 | 26000 | 0.0005 |
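
The Epoch and Step columns above can be used to estimate the size of the training set, which this card does not report. The arithmetic below is a rough sketch: it assumes training on a single device with the listed batch size of 16 and gradient_accumulation_steps of 1, and it treats the last batch of each epoch as full, so the result is an approximation rather than an exact count.

```python
# Values taken from the first row of the training log table.
logged_step = 500
logged_epoch = 0.1896
batch_size = 16  # per_device_train_batch_size, assuming a single device

# Steps needed to complete one epoch, rounded to the nearest whole step.
steps_per_epoch = round(logged_step / logged_epoch)

# Approximate number of training triplets seen per epoch.
approx_triplets = steps_per_epoch * batch_size

print(steps_per_epoch)   # 2637
print(approx_triplets)   # 42192
```

As a consistency check, 26000 steps divided by 2637 steps per epoch gives roughly 9.86 epochs, which agrees with the last row of the table.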
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Base model
nreimers/MiniLM-L6-H384-uncased