--- tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:5489 - loss:MultipleNegativesRankingLoss base_model: zacbrld/MNLP_M2_document_encoder widget: - source_sentence: Military activity affects the physical geology. This was first noted through the intensive shelling on the Western Front during World War I, which caused the shattering of the bedrock and changed the rocks' permeability. New minerals, rocks, and land-forms are also a byproduct of nuclear testing. sentences: - 'Silicon can form sigma bonds to other silicon atoms (and disilane is the parent of this class of compounds). However, it is difficult to prepare and isolate SinH2n+2 (analogous to the saturated alkane hydrocarbons) with n greater than about 8, as their thermal stability decreases with increases in the number of silicon atoms. Silanes higher in molecular weight than disilane decompose to polymeric polysilicon hydride and hydrogen. But with a suitable pair of organic substituents in place of hydrogen on each silicon it is possible to prepare polysilanes (sometimes, erroneously called polysilenes) that are analogues of alkanes. These long chain compounds have surprising electronic properties - high electrical conductivity, for example - arising from sigma delocalization of the electrons in the chain. Even silicon–silicon pi bonds are possible. However, these bonds are less stable than the carbon analogues. Disilane and longer silanes are quite reactive compared to alkanes. Disilene and disilynes are quite rare, unlike alkenes and alkynes. Examples of disilynes, long thought to be too unstable to be isolated were reported in 2004.' - 'The increasing sophistication of brain-reading technologies has led many to investigate their potential applications for lie detection. 
Legally required brain scans arguably violate “the guarantee against self-incrimination” because they differ from acceptable forms of bodily evidence, such as fingerprints or blood samples, in an important way: they are not simply physical, hard evidence, but evidence that is intimately linked to the defendant''s mind. Under US law, brain-scanning technologies might also raise implications for the Fourth Amendment, calling into question whether they constitute an unreasonable search and seizure.' - Military activity affects the physical geology. This was first noted through the intensive shelling on the Western Front during World War I, which caused the shattering of the bedrock and changed the rocks' permeability. New minerals, rocks, and land-forms are also a byproduct of nuclear testing. - source_sentence: Right after a bombing in Moscow on September 6, 1999, several anti-nuclear activists were detained under suspicion. Vladimir Slivyak was one of the three arrested under suspicion. He was an activist in the anti-nuclear movement and a Voronezh action camp organizer. After the bombing Slivyak was pushed into a car by several men who claimed to be Moscow police. The police interrogated and threatened Slivyak for around ninety minutes before letting him go. The Moscow police thought environmentalists from the anti-nuclear movement were associated with the bombing since an earlier bombing occurred on August 31 at Manezh Palace in Moscow . After the incident, on August 31, several more bombings occurred which agitated many people, leading to the racially profiled arrest of dark-skinned Muscovites and visitors to the Russian capital. sentences: - The technique works backwards from the target to identify a precursor molecule and an enzyme that converts it into the target, and then a second precursor that can produce the first and so on until a simple, inexpensive molecule becomes the beginning of the series. 
For each precursor, the enzyme is evolved using induced mutations and natural selection to produce a more productive version. The evolutionary process can be repeated over multiple generations until acceptable productivity is achieved. The process does not require high temperature, high pressure, the use of exotic catalysts or other elements that can increase costs. The enzyme "optimizations" that increase the production of one precursor from another are cumulative in that the same precursor productivity improvements can potentially be leveraged across multiple target molecules. - Right after a bombing in Moscow on September 6, 1999, several anti-nuclear activists were detained under suspicion. Vladimir Slivyak was one of the three arrested under suspicion. He was an activist in the anti-nuclear movement and a Voronezh action camp organizer. After the bombing Slivyak was pushed into a car by several men who claimed to be Moscow police. The police interrogated and threatened Slivyak for around ninety minutes before letting him go. The Moscow police thought environmentalists from the anti-nuclear movement were associated with the bombing since an earlier bombing occurred on August 31 at Manezh Palace in Moscow . After the incident, on August 31, several more bombings occurred which agitated many people, leading to the racially profiled arrest of dark-skinned Muscovites and visitors to the Russian capital. - One of the main sources of information about the Earth's composition comes from understanding the relationship between peridotite and basalt melting. Peridotite makes up most of Earth's mantle. Basalt, which is highly concentrated in the Earth's oceanic crust, is formed when magma reaches the Earth's surface and cools down at a very fast rate. When magma cools, different minerals crystallize at different times depending on the cooling temperature of that respective mineral. 
This ultimately changes the chemical composition of the melt as different minerals begin to crystallize. Fractional crystallization of elements in basaltic liquids has also been studied to observe the composition of lava in the upper mantle. This concept can be applied by scientists to give insight on the evolution of Earth's mantle and how concentrations of lithophile trace elements have varied over the last 3.5 billion years. - source_sentence: 'The group designs numerous structural concepts such as frameworks and floors like Dalle O''Portune and D-Dalle. The timber design office of excellence is an entity specializing in the design and optimization of wood construction projects. It stands out for its ability to meet the highest demands in terms of performance, durability and aesthetics, and is thus recognized for its contribution to the realization of ambitious projects in the field of timber construction.' sentences: - 'The group designs numerous structural concepts such as frameworks and floors like Dalle O''Portune and D-Dalle. The timber design office of excellence is an entity specializing in the design and optimization of wood construction projects. It stands out for its ability to meet the highest demands in terms of performance, durability and aesthetics, and is thus recognized for its contribution to the realization of ambitious projects in the field of timber construction.' - 'In waterways, the term bridge strike may be used when a water vessel collides with a bridge. This may include a collision to the bridge span or a collision to the bridge support structure such as a pier. Bridge protection systems are used to mitigate the effects of a ship strike. In 2014, the United States Coast Guard published statistics that it investigated 205 bridge strikes in the eleven years prior to the publication. All of those collisions involved involved a fixed, swing, lift or draw bridge. 
That number was 1.2% of all vessel collision incidents investigated by the Coast Guard. The primary causal factor was the lack of accurate air draft data, the distance between water surface to the top most part of the vessel.' - 'Post, Stephen Garrard. Encyclopedia of bioethics. Third edition. Macmillan Reference USA, 2003. ISBN 0028657748. ISSN 0950-4125; DOI:10.1108/09504120510573477. (5-Volume Set; 3062 pages). Reich, Warren Thomas Encyclopedia of Bioethics. First edition. New York: Free Press, 1978. ISBN 0029261805. ISBN 978-0029261804. (4-Volume Set; 1933 pages) Reich, Warren Thomas Encyclopedia of Bioethics. Second edition. New York: Free Press, 1982. (5-Volume Set; 2950 pages) Reich, Warren Thomas Encyclopedia of Bioethics. Third edition. New York: Simon & Schuster Macmillan, 1995; London: Simon and Schuster and Prentice Hall International, c1995. Rev. ed. (5-Volume Set; 2950 pages; 464 articles) ISBN 0028973550. ISBN 978-0028973555.' - source_sentence: 'Regression is used to make predictions based on the retrieved data through statistical trends and statistical modeling. Different uses of this technique are used for fetching Photometric redshifts and measurements of physical parameters of stars. The approaches are listed below: Artificial neural network (ANN) Support vector regression (SVR) Decision tree Random forest k-nearest neighbors regression Kernel regression Principal component regression (PCR) Gaussian process Least squared regression (LSR) Partial least squares regression' sentences: - 'Regression is used to make predictions based on the retrieved data through statistical trends and statistical modeling. Different uses of this technique are used for fetching Photometric redshifts and measurements of physical parameters of stars. 
The approaches are listed below: Artificial neural network (ANN) Support vector regression (SVR) Decision tree Random forest k-nearest neighbors regression Kernel regression Principal component regression (PCR) Gaussian process Least squared regression (LSR) Partial least squares regression' - 'Clandestine chemistry is not limited to drugs; it is also associated with explosives, and other illegal chemicals. Of the explosives manufactured illegally, nitroglycerin and acetone peroxide are easiest to produce due to the ease with which the precursors can be acquired. Uncle Fester is a writer who commonly writes about different aspects of clandestine chemistry. Secrets of Methamphetamine Manufacture is among his most popular books, and is considered required reading for DEA agents. More of his books deal with other aspects of clandestine chemistry, including explosives, and poisons. Fester is, however, considered by many to be a faulty and unreliable source for information in regard to the clandestine manufacture of chemicals.' - A novel input representation has been developed consisting of a combination of sparse encoding, Blosum encoding, and input derived from hidden Markov models. this method predicts T-cell epitopes for the genome of hepatitis C virus and discuss possible applications of the prediction method to guide the process of rational vaccine design. - source_sentence: 'Burray and The Barriers Undiscovered Scotland: The Churchill Barriers Our Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine Okneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine' sentences: - "For a neuron, in the limit of \n \n \n \n b\n =\n \ \ 0\n \n \n {\\displaystyle b=0}\n \n, the map becomes 1D, since\ \ \n \n \n \n y\n \n \n {\\displaystyle y}\n \n converges\ \ to a constant. 
If the parameter \n \n \n \n b\n \n \n\ \ {\\displaystyle b}\n \n is scanned in a range, different orbits will be\ \ seen, some periodic, others chaotic, that appear between two fixed points, one\ \ at \n \n \n \n x\n =\n 1\n \n \n {\\\ displaystyle x=1}\n \n ; \n \n \n \n y\n =\n 1\n\ \ \n \n {\\displaystyle y=1}\n \n and the other close to the value\ \ of \n \n \n \n k\n \n \n {\\displaystyle k}\n \n\ \ (which would be the regime excitable).\n\n\n== References =="
  - 'Cerebellar Purkinje neurons have been proposed to have two distinct bursting modes: dendritically driven, by dendritic Ca2+ spikes, and somatically driven, wherein the persistent Na+ current is the burst initiator and the SK K+ current is the burst terminator. Purkinje neurons may utilise these bursting forms in information coding to the deep cerebellar nuclei.'
  - 'Burray and The Barriers Undiscovered Scotland: The Churchill Barriers Our Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine Okneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on zacbrld/MNLP_M2_document_encoder

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
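
As a rough illustration of the semantic search use case mentioned above, the sketch below ranks candidate vectors by cosine similarity to a query vector. It is a minimal pure-Python sketch, not the library's implementation: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the 3-dimensional vectors are made-up stand-ins for the 384-dimensional embeddings the model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
documents = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # points in nearly the same direction as the query
    [-1.0, 0.0, 0.0],  # points in the opposite direction
]

ranking = rank_by_similarity(query, documents)
print([index for index, _ in ranking])
```

With real embeddings the vectors would come from the model, and the ranking step would stay the same.
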
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")
# Run inference
sentences = [
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Cerebellar Purkinje neurons have been proposed to have two distinct bursting modes: dendritically driven, by dendritic Ca2+ spikes, and somatically driven, wherein the persistent Na+ current is the burst initiator and the SK K+ current is the burst terminator. Purkinje neurons may utilise these bursting forms in information coding to the deep cerebellar nuclei.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 5,489 training samples
* Columns: sentence_0 and sentence_1
* Approximate statistics based on the first 1000 samples:

|         | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type    | string     | string     |
| details |            |            |

* Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy. They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359. | In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy. They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359. |
| Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits. Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment. | Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits. Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment. |
| The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree. | The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree. |

* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 5
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
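
The training loss listed above is MultipleNegativesRankingLoss with `scale` 20.0 and `cos_sim`. The idea can be sketched in plain Python, assuming the usual in-batch-negatives formulation: each anchor is scored against every positive in the batch, the scores are multiplied by the scale, and cross-entropy treats the matching positive as the correct class. The function names and toy vectors below are hypothetical, and this is an illustrative sketch rather than the library's code.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-dimensional embeddings for a batch of two pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]

# Each anchor paired with an identical positive: the loss is close to zero.
identical = multiple_negatives_ranking_loss(anchors, anchors)

# Positives swapped so each anchor matches the wrong row: the loss is large.
swapped = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])

print(identical, swapped)
```

The first call pairs each anchor with an identical positive, as in the sample rows shown earlier, and gives a loss near zero because the other text in the batch is dissimilar. The second call shows how the loss grows when the matching positive scores lower than an in-batch negative.
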
### Training Logs

| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 1.4535 | 500  | 0.0002        |
| 2.9070 | 1000 | 0.0           |
| 4.3605 | 1500 | 0.0007        |

### Framework Versions

- Python: 3.10.11
- Sentence Transformers: 3.4.1
- Transformers: 4.51.3
- PyTorch: 2.6.0
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```