🔁 Fine-tuned on custom STEM corpus
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5489
  - loss:MultipleNegativesRankingLoss
base_model: zacbrld/MNLP_M2_document_encoder
widget:
  - source_sentence: >-
      Military activity affects the physical geology. This was first noted
      through the intensive shelling on the Western Front during World War I,
      which caused the shattering of the bedrock and changed the rocks'
      permeability. New minerals, rocks, and land-forms are also a byproduct of
      nuclear testing.
    sentences:
      - >-
        Silicon can form sigma bonds to other silicon atoms (and disilane is the
        parent of this class of compounds). However, it is difficult to prepare
        and isolate SinH2n+2 (analogous to the saturated alkane hydrocarbons)
        with n greater than about 8, as their thermal stability decreases with
        increases in the number of silicon atoms.  Silanes higher in molecular
        weight than disilane decompose to polymeric polysilicon hydride and
        hydrogen.  But with a suitable pair of organic substituents in place of
        hydrogen on each silicon it is possible to prepare polysilanes
        (sometimes, erroneously called polysilenes) that are analogues of
        alkanes. These long chain compounds have surprising electronic
        properties - high electrical conductivity, for example - arising from
        sigma delocalization of the electrons in the chain.

        Even silicon–silicon pi bonds are possible. However, these bonds are
        less stable than the carbon analogues. Disilane and longer silanes are
        quite reactive compared to alkanes. Disilene and disilynes are quite
        rare, unlike alkenes and alkynes. Examples of disilynes, long thought to
        be too unstable to be isolated were reported in 2004.
      - >-
        The increasing sophistication of brain-reading technologies has led many
        to investigate their potential applications for lie detection. Legally
        required brain scans arguably violate “the guarantee against
        self-incrimination” because they differ from acceptable forms of bodily
        evidence, such as fingerprints or blood samples, in an important way:
        they are not simply physical, hard evidence, but evidence that is
        intimately linked to the defendant's mind. Under US law, brain-scanning
        technologies might also raise implications for the Fourth Amendment,
        calling into question whether they constitute an unreasonable search and
        seizure.
      - >-
        Military activity affects the physical geology. This was first noted
        through the intensive shelling on the Western Front during World War I,
        which caused the shattering of the bedrock and changed the rocks'
        permeability. New minerals, rocks, and land-forms are also a byproduct
        of nuclear testing.
  - source_sentence: >-
      Right after a bombing in Moscow on September 6, 1999, several anti-nuclear
      activists were detained under suspicion. Vladimir Slivyak was one of the
      three arrested under suspicion. He was an activist in the anti-nuclear
      movement and a Voronezh action camp organizer. After the bombing Slivyak
      was pushed into a car by several men who claimed to be Moscow police. The
      police interrogated and threatened Slivyak for around ninety minutes
      before letting him go. The Moscow police thought environmentalists from
      the anti-nuclear movement were associated with the bombing since an
      earlier bombing occurred on August 31 at Manezh Palace in Moscow . After
      the incident, on August 31, several more bombings occurred which agitated
      many people, leading to the racially profiled arrest of dark-skinned
      Muscovites and visitors to the Russian capital.
    sentences:
      - >-
        The technique works backwards from the target to identify a precursor
        molecule and an enzyme that converts it into the target, and then a
        second precursor that can produce the first and so on until a simple,
        inexpensive molecule becomes the beginning of the series. For each
        precursor, the enzyme is evolved using induced mutations and natural
        selection to produce a more productive version. The evolutionary process
        can be repeated over multiple generations until acceptable productivity
        is achieved. The process does not require high temperature, high
        pressure, the use of exotic catalysts or other elements that can
        increase costs. The enzyme "optimizations" that increase the production
        of one precursor from another are cumulative in that the same precursor
        productivity improvements can potentially be leveraged across multiple
        target molecules.
      - >-
        Right after a bombing in Moscow on September 6, 1999, several
        anti-nuclear activists were detained under suspicion. Vladimir Slivyak
        was one of the three arrested under suspicion. He was an activist in the
        anti-nuclear movement and a Voronezh action camp organizer. After the
        bombing Slivyak was pushed into a car by several men who claimed to be
        Moscow police. The police interrogated and threatened Slivyak for around
        ninety minutes before letting him go. The Moscow police thought
        environmentalists from the anti-nuclear movement were associated with
        the bombing since an earlier bombing occurred on August 31 at Manezh
        Palace in Moscow . After the incident, on August 31, several more
        bombings occurred which agitated many people, leading to the racially
        profiled arrest of dark-skinned Muscovites and visitors to the Russian
        capital.
      - >-
        One of the main sources of information about the Earth's composition
        comes from understanding the relationship between peridotite and basalt
        melting. Peridotite makes up most of Earth's mantle. Basalt, which is
        highly concentrated in the Earth's oceanic crust, is formed when magma
        reaches the Earth's surface and cools down at a very fast rate. When
        magma cools, different minerals crystallize at different times depending
        on the cooling temperature of that respective mineral. This ultimately
        changes the chemical composition of the melt as different minerals begin
        to crystallize. Fractional crystallization of elements in basaltic
        liquids has also been studied to observe the composition of lava in the
        upper mantle. This concept can be applied by scientists to give insight
        on the evolution of Earth's mantle and how concentrations of lithophile
        trace elements have varied over the last 3.5 billion years.
  - source_sentence: >-
      The group designs numerous structural concepts such as frameworks and
      floors like Dalle O'Portune and D-Dalle.

      The timber design office of excellence is an entity specializing in the
      design and optimization of wood construction projects. It stands out for
      its ability to meet the highest demands in terms of performance,
      durability and aesthetics, and is thus recognized for its contribution to
      the realization of ambitious projects in the field of timber construction.
    sentences:
      - >-
        The group designs numerous structural concepts such as frameworks and
        floors like Dalle O'Portune and D-Dalle.

        The timber design office of excellence is an entity specializing in the
        design and optimization of wood construction projects. It stands out for
        its ability to meet the highest demands in terms of performance,
        durability and aesthetics, and is thus recognized for its contribution
        to the realization of ambitious projects in the field of timber
        construction.
      - >-
        In waterways, the term bridge strike may be used when a water vessel
        collides with a bridge. This may include a collision to the bridge span
        or a collision to the bridge support structure such as a pier. Bridge
        protection systems are used to mitigate the effects of a ship strike.

        In 2014, the United States Coast Guard published statistics that it
        investigated 205 bridge strikes in the eleven years prior to the
        publication. All of those collisions involved involved a fixed, swing,
        lift or draw bridge. That number was 1.2% of all vessel collision
        incidents investigated by the Coast Guard. The primary causal factor was
        the lack of accurate air draft data, the distance between water surface
        to the top most part of the vessel.
      - >-
        Post, Stephen Garrard. Encyclopedia of bioethics. Third edition.
        Macmillan Reference USA, 2003. ISBN 0028657748. ISSN 0950-4125;
        DOI:10.1108/09504120510573477.  (5-Volume Set; 3062 pages).

        Reich, Warren Thomas Encyclopedia of Bioethics. First edition.  New
        York: Free Press, 1978.  ISBN 0029261805.  ISBN 978-0029261804. 
        (4-Volume Set; 1933 pages)

        Reich, Warren Thomas Encyclopedia of Bioethics. Second edition.  New
        York: Free Press, 1982.  (5-Volume Set; 2950 pages)

        Reich, Warren Thomas Encyclopedia of Bioethics. Third edition.  New
        York: Simon & Schuster Macmillan, 1995; London: Simon and Schuster and
        Prentice Hall International, c1995. Rev. ed. (5-Volume Set; 2950 pages;
        464 articles) ISBN 0028973550. ISBN 978-0028973555.
  - source_sentence: >-
      Regression is used to make predictions based on the retrieved data through
      statistical trends and statistical modeling. Different uses of this
      technique are used for fetching Photometric redshifts and measurements of
      physical parameters of stars. The approaches are listed below:


      Artificial neural network (ANN)

      Support vector regression (SVR)

      Decision tree

      Random forest

      k-nearest neighbors regression

      Kernel regression

      Principal component regression (PCR)

      Gaussian process

      Least squared regression (LSR)

      Partial least squares regression
    sentences:
      - >-
        Regression is used to make predictions based on the retrieved data
        through statistical trends and statistical modeling. Different uses of
        this technique are used for fetching Photometric redshifts and
        measurements of physical parameters of stars. The approaches are listed
        below:


        Artificial neural network (ANN)

        Support vector regression (SVR)

        Decision tree

        Random forest

        k-nearest neighbors regression

        Kernel regression

        Principal component regression (PCR)

        Gaussian process

        Least squared regression (LSR)

        Partial least squares regression
      - >-
        Clandestine chemistry is not limited to drugs; it is also associated
        with explosives, and other illegal chemicals. Of the explosives
        manufactured illegally, nitroglycerin and acetone peroxide are easiest
        to produce due to the ease with which the precursors can be acquired.

        Uncle Fester is a writer who commonly writes about different aspects of
        clandestine chemistry. Secrets of Methamphetamine Manufacture is among
        his most popular books, and is considered required reading for DEA
        agents. More of his books deal with other aspects of clandestine
        chemistry, including explosives, and poisons. Fester is, however,
        considered by many to be a faulty and unreliable source for information
        in regard to the clandestine manufacture of chemicals.
      - >-
        A novel input representation has been developed consisting of a
        combination of sparse encoding, Blosum encoding, and input derived from
        hidden Markov models. this method predicts T-cell epitopes for the
        genome of hepatitis C virus and discuss possible applications of the
        prediction method to guide the process of rational vaccine design.
  - source_sentence: >-
      Burray and The Barriers

      Undiscovered Scotland: The Churchill Barriers

      Our Past History: The Churchill Barriers Archived 17 December 2006 at the
      Wayback Machine

      Okneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback
      Machine
    sentences:
      - |-
        For a neuron, in the limit of $b=0$, the map becomes 1D, since $y$ converges to a constant. If the parameter $b$ is scanned in a range, different orbits will be seen, some periodic, others chaotic, that appear between two fixed points, one at $x=1$; $y=1$ and the other close to the value of $k$ (which would be the regime excitable).


        == References ==
      - >-
        Cerebellar Purkinje neurons have been proposed to have two distinct
        bursting modes: dendritically driven, by dendritic Ca2+ spikes, and
        somatically driven, wherein the persistent Na+ current is the burst
        initiator and the SK K+ current is the burst terminator. Purkinje
        neurons may utilise these bursting forms in information coding to the
        deep cerebellar nuclei.
      - >-
        Burray and The Barriers

        Undiscovered Scotland: The Churchill Barriers

        Our Past History: The Churchill Barriers Archived 17 December 2006 at
        the Wayback Machine

        Okneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback
        Machine
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on zacbrld/MNLP_M2_document_encoder

This is a sentence-transformers model fine-tuned from zacbrld/MNLP_M2_document_encoder. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: zacbrld/MNLP_M2_document_encoder
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
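The three modules listed above amount to: encode tokens, average the non-padding token vectors, then scale the result to unit length. The following is a minimal pure-Python sketch of the last two stages only, on toy 4-dimensional vectors (the model itself uses 384 dimensions). The function names `mean_pool` and `l2_normalize` are illustrative and are not part of the library.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped).

    Assumes at least one token is unmasked.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: three token vectors, the last one is padding and must be ignored.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_vector = l2_normalize(pooled)
print(pooled)
print(sentence_vector)
```

Because the output is unit length, the cosine similarity between two sentence vectors reduces to their dot product.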

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")
# Run inference
sentences = [
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Cerebellar Purkinje neurons have been proposed to have two distinct bursting modes: dendritically driven, by dendritic Ca2+ spikes, and somatically driven, wherein the persistent Na+ current is the burst initiator and the SK K+ current is the burst terminator. Purkinje neurons may utilise these bursting forms in information coding to the deep cerebellar nuclei.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
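The similarity scores computed above use cosine similarity, and a common next step is ranking candidate passages against a query. Here is a small pure-Python sketch of that ranking on toy 3-dimensional vectors. In practice the vectors would come from `model.encode(...)`. The helper names `cosine_similarity` and `rank_documents` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for encoded texts.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_documents(query, docs)
for index, score in ranking:
    print(index, round(score, 4))
```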

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,489 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 |
    |---------|------------|------------|
    | type    | string     | string     |
    | details | min: 34 tokens, mean: 144.23 tokens, max: 256 tokens | min: 34 tokens, mean: 144.23 tokens, max: 256 tokens |
  • Samples:
    | sentence_0 | sentence_1 |
    |------------|------------|
    | In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy.<br>They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359. | In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy.<br>They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359. |
    | Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits.<br>Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment. | Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits.<br>Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment. |
    | The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree. | The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree. |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
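To make the loss parameters concrete, here is a pure-Python sketch of how a multiple-negatives ranking loss can be computed with `scale = 20.0` and cosine similarity. Each anchor is scored against every positive in the batch, and cross-entropy rewards the matching pair. This is an illustrative reimplementation under those assumptions, not the library's code, and the function names are hypothetical. The sample rows shown above have identical `sentence_0` and `sentence_1`. If that holds for the whole dataset, the sketch suggests why the loss would be close to zero from the start.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    All other positives in the batch act as negatives for anchor i.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two orthogonal embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]

identical_loss = multiple_negatives_ranking_loss(anchors, anchors)        # pairs are identical
shuffled_loss = multiple_negatives_ranking_loss(anchors, anchors[::-1])  # pairs are mismatched
print(identical_loss, shuffled_loss)
```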
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
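As a rough illustration of what `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` imply together, the sketch below computes the learning rate at a given optimizer step. It assumes a single training device, which the card does not state, and uses the dataset size, batch size, and epoch count listed above. The function `linear_lr` is a hypothetical helper that mirrors the usual linear-decay-with-warmup rule.

```python
import math

DATASET_SIZE = 5489   # training samples, from the card
BATCH_SIZE = 16       # per_device_train_batch_size (single device assumed)
EPOCHS = 5            # num_train_epochs
BASE_LR = 5e-05       # learning_rate
WARMUP_STEPS = 0      # warmup_steps

# dataloader_drop_last is False, so the last partial batch counts as a step.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def linear_lr(step):
    """Learning rate at an optimizer step under linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - WARMUP_STEPS)

for step in (0, 500, 1000, 1500, total_steps):
    print(step, linear_lr(step))
```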

Training Logs

| Epoch  | Step | Training Loss |
|--------|------|---------------|
| 1.4535 | 500  | 0.0002        |
| 2.9070 | 1000 | 0.0           |
| 4.3605 | 1500 | 0.0007        |
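The fractional epoch values in the log can be reproduced from the dataset size and batch size. A short check, assuming one training device and no gradient accumulation (the card lists `gradient_accumulation_steps: 1`); `epoch_at` is a hypothetical helper:

```python
import math

# 5,489 samples in batches of 16, keeping the final partial batch.
steps_per_epoch = math.ceil(5489 / 16)

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

for step in (500, 1000, 1500):
    print(step, epoch_at(step))
```

Under these assumptions the computed values agree with the logged epochs, which is consistent with 344 steps per epoch.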

Framework Versions

  • Python: 3.10.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1
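To compare a local environment against the versions listed above, one option is a small standard-library check. This is only a convenience sketch. The distribution names in `CARD_VERSIONS` are the usual PyPI names and are assumed here, and a local build suffix (for example on PyTorch) will be reported as a difference.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
CARD_VERSIONS = {
    "sentence-transformers": "3.4.1",
    "transformers": "4.51.3",
    "torch": "2.6.0",
    "accelerate": "1.7.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def compare_versions(expected):
    """Map each package to a (version on the card, installed version) pair."""
    return {name: (wanted, installed_version(name)) for name, wanted in expected.items()}

for name, (wanted, found) in compare_versions(CARD_VERSIONS).items():
    status = "ok" if found == wanted else "differs"
    print(f"{name}: card={wanted} installed={found} ({status})")
```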

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}