MNLP_M3_document_encoder / README.md

zacbrld

Add new SentenceTransformer model.

c90ed0d verified 8 months ago

metadata

tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:42185
  - loss:TripletLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: >-
      For example, t ∈ { 0 , 1 , … , N } , N 0 ,  or {\mbox{ or }}[0,+\infty ).}
      Similarly, a filtered probability space (also known as a stochastic basis)
      ( Ω , F , { F t } t ≥ 0 , P ) {\displaystyle \left(\Omega ,{\mathcal
      {F}},\left\{{\mathcal {F}}_{t}\right\}_{t\geq 0},\mathbb {P} \right)} , is
      a probability space equipped with the filtration { F t } t ≥ 0
      {\displaystyle \left\{{\mathcal {F}}_{t}\right\}_{t\geq 0}} of its σ
      {\displaystyle \sigma } -algebra F {\displaystyle {\mathcal {F}}} . A
      filtered probability space is said to satisfy the usual conditions if it
      is complete (i.e., F 0 {\displaystyle {\mathcal {F}}_{0}} contains all P
      {\displaystyle \mathbb {P} } -null sets) and right-continuous (i.e. F t =
      F t + := ⋂ s > t F s {\displaystyle {\mathcal {F}}_{t}={\mathcal
      {F}}_{t+}:=\bigcap _{s>t}{\mathcal {F}}_{s}} for all times t
      {\displaystyle t} ).It is also useful (in the case of an unbounded index
      set) to define F ∞ {\displaystyle {\mathcal {F}}_{\infty }} as the σ
      {\displaystyle \sigma } -algebra generated by the infinite union of the F
      t {\displaystyle {\mathcal {F}}_{t}} 's, which is contained in F
      {\displaystyle {\mathcal {F}}}: F ∞ = σ ( ⋃ t ≥ 0 F t ) ⊆ F .
    sentences:
      - >-
        These individuals can experience these symptoms from failed attempts of
        depression like symptoms.Narcissistic personality disorder is
        characterized as feelings of superiority, a sense of grandiosity,
        exhibitionism, charming but also exploitive behaviors in the
        interpersonal domain, success, beauty, feelings of entitlement and a
        lack of empathy. Those with this disorder often engage in assertive self
        enhancement and antagonistic self protection. All of these factors can
        lead an individual with narcissistic personality disorder to manipulate
        others.
      - >-
        {\displaystyle {\mathcal {F}}_{\infty }=\sigma \left(\bigcup _{t\geq
        0}{\mathcal {F}}_{t}\right)\subseteq {\mathcal {F}}.} A σ-algebra
        defines the set of events that can be measured, which in a probability
        context is equivalent to events that can be discriminated, or "questions
        that can be answered at time t {\displaystyle t} ". Therefore, a
        filtration is often used to represent the change in the set of events
        that can be measured, through gain or loss of information. A typical
        example is in mathematical finance, where a filtration represents the
        information available up to and including each time t {\displaystyle t}
        , and is more and more precise (the set of measurable events is staying
        the same or increasing) as more information from the evolution of the
        stock price becomes available.
      - >-
        Section: Structure and dynamics > Composition. Like microtubules,
        neurotubules are made up of protein polymers of α-tubulin and β-tubulin,
        globular proteins that are closely related. They join together to form a
        dimer, called tubulin. Neurotubules are generally assembled by 13
        protofilaments which are polymerized from tubulin dimers. As a tubulin
        dimer consists of one α-tubulin and one β-tubulin, one end of the
        neurotubule is exposed with the α-tubulin and the other end with
        β-tubulin, these two ends contribute to the polarity of the neurotubule
        – the plus (+) end and the minus (-) end. The β-tubulin subunit is
        exposed on the plus (+) end. The two ends differ in their growth rate:
        plus (+) end is the fast-growing end while minus (-) end is the
        slow-growing end. Both ends have their own rate of polymerization and
        depolymerization of tubulin dimers, net polymerization causes the
        assembly of tubulin, hence the length of the neurotubules.
  - source_sentence: >-
      We want to find the value of $X$ in the given situation. We are told that
      James has $X$ apples, and 4 of them are red and 3 of them are green. We
      want to find the probability that both apples he chooses are green. The
      total number of apples James has is $X$, and the total number of green
      apples is 3. To find the probability, we can use the formula: In this
      case, the number of favorable outcomes is choosing 2 green apples out of
      the 3 available green apples. The total number of possible outcomes is
      choosing any 2 apples out of the $X$ total apples. So, the probability is:
      Probability = (Number of ways to choose 2 green apples) / (Number of ways
      to choose 2 apples) Since we are given that the probability is
      $\frac{1}{7}$, we can write: $\frac{1}{7} = \frac{3 \choose 2}{X \choose
      2}$ Simplifying, we have: $\frac{1}{7} = \frac{3}{\frac{X(X-1)}{2}}$
    sentences:
      - >-
        Article: A common type system for clinical natural language processing.
        One challenge in reusing clinical data stored in electronic medical
        records is that these data are heterogenous. Clinical Natural Language
        Processing (NLP) plays an important role in transforming information in
        clinical text to a standard representation that is comparable and
        interoperable. Information may be processed and shared when a type
        system specifies the allowable data structures. Therefore, we aim to
        define a common type system for clinical NLP that enables
        interoperability between structured and unstructured data generated in
        different clinical settings. We describe a common type system for
        clinical NLP that has an end target of deep semantics based on Clinical
        Element Models (CEMs), thus interoperating with structured data and
        accommodating diverse NLP approaches. The type system has been
        implemented in UIMA (Unstructured Information Management Architecture)
        and is fully functional in a popular open-source clinical NLP system,
        cTAKES (clinical Text Analysis and Knowledge Extraction System) versions
        2.0 and later. We have created a type system that targets deep
        semantics, thereby allowing for NLP systems to encapsulate knowledge
        from text and share it alongside heterogenous clinical data sources.
        Rather than surface semantics that are typically the end product of NLP
        algorithms, CEM-based semantics explicitly build in deep clinical
        semantics as the point of interoperability with more structured data
        types.
      - >-
        Furthermore, the majority of all the male skeletons from the European
        Neolithic period have so far yielded Y-DNA belonging to this haplogroup.
        The oldest skeletons confirmed by ancient DNA testing as carrying
        haplogroup G2a were five found in the Avellaner cave burial site in
        Catalonia, Spain and were dated by radiocarbon dating to about 5000 BCE.
        Haplogroup I-M253 (I1) at 4,3% of which L22, Z58 and Z63. According to a
        study published in 2010, I-M253 originated between 3,170 and 5,000 years
        ago, in Chalcolithic Europe. A 2014 study in Hungary uncovered remains
        of two individuals from the Linear Pottery culture, one of whom was
        found to have carried the M253 SNP which defines Haplogroup I1. This
        culture is thought to have been present between 7,500 and 6,500 years
        ago. Finally, there are also some other Y-DNA Haplogroups presented at a
        lower levels among Bulgarians ~ 10% all together, as J-M267 (J1) at
        ~3.5%, E-M34 (E1b1b1b2a1) at ~2%, T-M70 (T1a) at ~1.5%, at less than 1%
        Haplogroup C-M217 (C2), H-M82 (H1a1), N-M231 (N), Q-M242 (Q), L-M61 (L),
        I-M170 (I*), E-M96 (E*) excl.
      - >-
        So, the probability is: Probability = (Number of ways to Multiplying
        both sides of the equation by $\frac{X(X-1)}{2}$, we get:
        $\frac{X(X-1)}{2} = 3 \times 7$ $X(X-1) = 6 \times 7$ $X(X-1) = 42$
        Expanding the equation, we have: $X^2 - X = 42$ Rearranging the
        equation, we get: $X^2 - X - 42 = 0$ This is a quadratic equation that
        can be factored as: $(X - 7)(X + 6) = 0$ Setting each factor equal to
        zero, we have two possible solutions: $X - 7 = 0$ or $X + 6 = 0$ Solving
        for $X$, we find: $X = 7$ or $X = -6$ Since the number of apples cannot
        be negative, the value of $X$ is 7.
  - source_sentence: >-
      Section: Model. The Oppenheimer–Snyder model of continued gravitational
      collapse is described by the line element d s 2 = − d τ 2 + A 2 ( η ) ( d
      R 2 1 − 2 M R − 2 R b 2 1 R + + R 2 d Ω 2 ) {\displaystyle ds^{2}=-d\tau
      ^{2}+A^{2}(\eta )\left({\frac {dR^{2}}{1-2M{\frac
      {R_{-}^{2}}{R_{b}^{2}}}{\frac {1}{R_{+}}}}}+R^{2}d\Omega ^{2}\right)} The
      quantities appearing in this expression are as follows: The coordinates
      are ( τ , R , θ , ϕ ) {\displaystyle (\tau ,R,\theta ,\phi )} where θ , ϕ
      {\displaystyle \theta ,\phi } are coordinates for the 2-sphere. R b
      {\displaystyle R_{b}} is a positive quantity, the "boundary radius",
      representing the boundary of the matter region. M {\displaystyle M} is a
      positive quantity, the mass.
    sentences:
      - >-
        A standard demonstration in general relativity is to show how, in the
        "Newtonian limit" (i.e. the particles are moving slowly, the
        gravitational field is weak, and the field is static), curvature of time
        alone is sufficient to derive Newton's law of gravity. : 101–106
        Newtonian gravitation is a theory of curved time. General relativity is
        a theory of curved time and curved space. Given G as the gravitational
        constant, M as the mass of a Newtonian star, and orbiting bodies of
        insignificant mass at distance r from the star, the spacetime interval
        for Newtonian gravitation is one for which only the time coefficient is
        variable:: 229–232 Δ s 2 = ( 1 − 2 G M c 2 r ) ( c Δ t ) 2 − ( Δ x ) 2 −
        ( Δ y ) 2 − ( Δ z ) 2 {\displaystyle \Delta s^{2}=\left(1-{\frac
        {2GM}{c^{2}r}}\right)(c\Delta t)^{2}-\,(\Delta x)^{2}-(\Delta
        y)^{2}-(\Delta z)^{2}}
      - >-
        Section: Examples > Example 1. s ( t ) = A cos ⁡ ( ω t + θ ) ,
        {\displaystyle s(t)=A\cos(\omega t+\theta ),} where ω > 0. s a ( t ) = A
        e j ( ω t + θ ) , φ ( t ) = ω t + θ . {\displaystyle
        {\begin{aligned}s_{\mathrm {a} }(t)&=Ae^{j(\omega t+\theta )},\\\varphi
        (t)&=\omega t+\theta .\end{aligned}}} In this simple sinusoidal example,
        the constant θ is also commonly referred to as phase or phase offset.
        φ(t) is a function of time; θ is not. In the next example, we also see
        that the phase offset of a real-valued sinusoid is ambiguous unless a
        reference (sin or cos) is specified. φ(t) is unambiguously defined.
      - >-
        M {\displaystyle M} is a positive quantity, the mass. R − = m i n ( R ,
        R b ) {\displaystyle R_{-}=\mathrm {min} (R,R_{b})} and R + = m a x ( R
        , R b ) {\displaystyle R_{+}=\mathrm {max} (R,R_{b})} . η {\displaystyle
        \eta } is defined implicitly by the equation τ ( η , R ) = 1 2 R + 3 2 M
        ( η + sin ⁡ η ) . {\displaystyle \tau (\eta ,R)={\frac {1}{2}}{\sqrt
        {\frac {R_{+}^{3}}{2M}}}(\eta +\sin \eta ).} A ( η ) = 1 + cos ⁡ η 2
        {\displaystyle A(\eta )={\frac {1+\cos \eta }{2}}} . This expression is
        valid both in the matter region R < R b {\displaystyle R<R_{b}} , and
        the vacuum region R > R b {\displaystyle R>R_{b}} , and continuously
        transitions between the two.
  - source_sentence: >-
      Section: Properties and parameters > Plasma potential. Since plasmas are
      very good electrical conductors, electric potentials play an important
      role. The average potential in the space between charged particles,
      independent of how it can be measured, is called the "plasma potential",
      or the "space potential". If an electrode is inserted into a plasma, its
      potential will generally lie considerably below the plasma potential due
      to what is termed a Debye sheath. The good electrical conductivity of
      plasmas makes their electric fields very small. This results in the
      important concept of "quasineutrality", which says the density of negative
      charges is approximately equal to the density of positive charges over
      large volumes of the plasma ( n e = ⟨ Z ⟩ n i {\displaystyle n_{e}=\langle
      Z\rangle n_{i}} ), but on the scale of the Debye length, there can be
      charge imbalance. In the special case that double layers are formed, the
      charge separation can extend some tens of Debye lengths. The magnitude of
      the potentials and electric fields must be determined by means other than
      simply finding the net charge density. A common example is to assume that
      the electrons satisfy the Boltzmann relation: n e ∝ exp ⁡ ( e Φ / k B T e
      ) . {\displaystyle n_{e}\propto \exp(e\Phi /k_{\text{B}}T_{e}).}
      Differentiating this relation provides a means to calculate the electric
      field from the density: E → = k B T e e ∇ n e n e .
    sentences:
      - >-
        When the integers a and b are coprime, the standard way of expressing
        this fact in mathematical notation is to indicate that their greatest
        common divisor is one, by the formula gcd(a, b) = 1 or (a, b) = 1. In
        their 1989 textbook Concrete Mathematics, Ronald Graham, Donald Knuth,
        and Oren Patashnik proposed an alternative notation a ⊥ b {\displaystyle
        a\perp b} to indicate that a and b are relatively prime and that the
        term "prime" be used instead of coprime (as in a is prime to b). A fast
        way to determine whether two numbers are coprime is given by the
        Euclidean algorithm and its faster variants such as binary GCD algorithm
        or Lehmer's GCD algorithm. The number of integers coprime with a
        positive integer n, between 1 and n, is given by Euler's totient
        function, also known as Euler's phi function, φ(n). A set of integers
        can also be called coprime if its elements share no common positive
        factor except 1. A stronger condition on a set of integers is pairwise
        coprime, which means that a and b are coprime for every pair (a, b) of
        different integers in the set. The set {2, 3, 4} is coprime, but it is
        not pairwise coprime since 2 and 4 are not relatively prime.
      - >-
        Let's assume the number of cans of corn Beth bought is C. Twice the
        number of cans of corn she bought would be 2C. So, 15 more than twice
        the number of cans of corn she bought would be 2C + 15. We know that
        Beth purchased 35 cans of peas, so we can set up the equation: 2C + 15 =
        35. To isolate C, we can subtract 15 from both sides of the equation: 2C
        = 35 - 15 = 20. Dividing both sides of the equation by 2, we get C =
        20/2 = 10. Therefore, Beth bought 10 cans of corn.
      - >-
        {\displaystyle n_{e}\propto \exp(e\Phi /k_{\text{B}}T_{e}).}
        Differentiating this relation provides a means to calculate the electric
        field from the density: E → = k B T e e ∇ n e n e . {\displaystyle {\vec
        {E}}={\frac {k_{\text{B}}T_{e}}{e}}{\frac {\nabla n_{e}}{n_{e}}}.} It is
        possible to produce a plasma that is not quasineutral. An electron beam,
        for example, has only negative charges. The density of a non-neutral
        plasma must generally be very low, or it must be very small, otherwise,
        it will be dissipated by the repulsive electrostatic force.
  - source_sentence: >-
      If X {\displaystyle X} is a linear space and g {\displaystyle g} are
      constants, the system is said to be subject to additive noise, otherwise
      it is said to be subject to multiplicative noise. This term is somewhat
      misleading as it has come to mean the general case even though it appears
      to imply the limited case in which g ( x ) ∝ x {\displaystyle g(x)\propto
      x} . For a fixed configuration of noise, SDE has a unique solution
      differentiable with respect to the initial condition.
    sentences:
      - >-
        Nontriviality of stochastic case shows up when one tries to average
        various objects of interest over noise configurations. In this sense, an
        SDE is not a uniquely defined entity when noise is multiplicative and
        when the SDE is understood as a continuous time limit of a stochastic
        difference equation. In this case, SDE must be complemented by what is
        known as "interpretations of SDE" such as Itô or a Stratonovich
        interpretations of SDEs.
      - >-
        Article: RNA-Seq technology and its application in fish
        transcriptomics.. High-throughput sequencing technologies, also known as
        next-generation sequencing (NGS) technologies, have revolutionized the
        way that genomic research is advancing. In addition to the static
        genome, these state-of-art technologies have been recently exploited to
        analyze the dynamic transcriptome, and the resulting technology is
        termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations
        of other transcriptomic approaches, such as microarray and tag-based
        sequencing method. Although RNA-seq has only been available for a short
        time, studies using this method have completely changed our perspective
        of the breadth and depth of eukaryotic transcriptomes. In terms of the
        transcriptomics of teleost fishes, both model and non-model species have
        benefited from the RNA-seq approach and have undergone tremendous
        advances in the past several years. RNA-seq has helped not only in
        mapping and annotating fish transcriptome but also in our understanding
        of many biological processes in fish, such as development, adaptive
        evolution, host immune response, and stress response. In this review, we
        first provide an overview of each step of RNA-seq from library
        construction to the bioinformatic analysis of the data. We then
        summarize and discuss the recent biological insights obtained from the
        RNA-seq studies in a variety of fish species.
      - >-
        This is the σ-algebra generated by the singletons of X . {\displaystyle
        X.} Note: "countable" includes finite or empty. The collection of all
        unions of sets in a countable partition of X {\displaystyle X} is a
        σ-algebra.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 350 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
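
As a rough illustration of the similarity function listed above, cosine similarity can be sketched in plain Python. The toy 3-dimensional vectors stand in for the model's 384-dimensional embeddings, and the function name `cosine_similarity` and the zero-vector convention are assumptions made for this example, not part of the library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)

# Toy vectors standing in for 384-dimensional embeddings.
same_direction = cosine_similarity([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.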

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 350, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
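
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence embedding is the average of its token embeddings. A minimal sketch of that idea follows, assuming that padding positions are excluded through an attention mask; the function name `mean_pool` and the 2-dimensional toy vectors are illustrative and do not come from the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [value / count for value in total]

# Two real tokens followed by one padding token (toy 2-dimensional vectors).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

In the actual model the same averaging would run over up to 350 token vectors of 384 dimensions each.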

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M3_document_encoder_kaggle")
# Run inference
sentences = [
    'If X {\\displaystyle X} is a linear space and g {\\displaystyle g} are constants, the system is said to be subject to additive noise, otherwise it is said to be subject to multiplicative noise. This term is somewhat misleading as it has come to mean the general case even though it appears to imply the limited case in which g ( x ) ∝ x {\\displaystyle g(x)\\propto x} . For a fixed configuration of noise, SDE has a unique solution differentiable with respect to the initial condition.',
    'Nontriviality of stochastic case shows up when one tries to average various objects of interest over noise configurations. In this sense, an SDE is not a uniquely defined entity when noise is multiplicative and when the SDE is understood as a continuous time limit of a stochastic difference equation. In this case, SDE must be complemented by what is known as "interpretations of SDE" such as Itô or a Stratonovich interpretations of SDEs.',
    'Article: RNA-Seq technology and its application in fish transcriptomics.. High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
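
One way to use a similarity matrix like the one computed above is to rank the other texts against a chosen query row. The sketch below is a plain-Python illustration: the helper `rank_by_similarity` is hypothetical, and the scores in `toy_similarities` are made up for the example, not produced by the model.

```python
def rank_by_similarity(similarity_row, query_index):
    """Return the other indices sorted from most to least similar."""
    candidates = [i for i in range(len(similarity_row)) if i != query_index]
    return sorted(candidates, key=lambda i: similarity_row[i], reverse=True)

# Illustrative values only; these are not scores produced by the model.
toy_similarities = [
    [1.00, 0.80, 0.10],
    [0.80, 1.00, 0.05],
    [0.10, 0.05, 1.00],
]
ranking = rank_by_similarity(toy_similarities[0], query_index=0)
print(ranking)
```

With a real matrix, each row would first be converted to a list (for example with `.tolist()`) before being passed to the helper.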

Training Details

Training Dataset

Unnamed Dataset

Size: 42,185 training samples
Columns: sentence_0, sentence_1, and sentence_2

Approximate statistics based on the first 1000 samples:

	sentence_0	sentence_1	sentence_2
type	string	string	string
details	min: 13 tokens, mean: 194.02 tokens, max: 350 tokens	min: 14 tokens, mean: 182.63 tokens, max: 350 tokens	min: 12 tokens, mean: 230.56 tokens, max: 350 tokens

Samples:

sentence_0	sentence_1	sentence_2
Most hard-bodied insect specimens and some other hard-bodied invertebrates such as certain Arachnida, are preserved as pinned specimens. Either while still fresh, or after rehydrating them if necessary because they had dried out, specimens are transfixed by special stainless steel entomological pins. As the insect dries the internal tissues solidify and, possibly aided to some extent by the integument, they grip the pin and secure the specimen in place on the pin. Very small, delicate specimens may instead be secured by fine steel points driven into slips of card, or glued to card points or similar attachments that in turn are pinned in the same way as entire mounted insects.	The pins offer a means of handling the specimens without damage, and they also bear labels for descriptive and reference data. Once dried, the specimens may be kept in conveniently sized open trays. The bottoms of the trays are lined with a material suited to receiving and holding entomological pins securely and conveniently.	Article: Interruption of People in Human-Computer Interaction: A General Unifying Definition of Human Interruption and Taxonomy. Abstract : User-interruption in human-computer interaction (HCI) is an increasingly important problem. Many of the useful advances in intelligent and multitasking computer systems have the significant side effect of greatly increasing user-interruption. This previously innocuous HCI problem has become critical to the successful function of many kinds of modern computer systems. Unfortunately, no HCI design guidelines exist for solving this problem. In fact, theoretical tools do not yet exist for investigating the HCI problem of user-interruption in a comprehensive and generalizable way. This report asserts that a single unifying definition of user-interruption and the accompanying practical taxonomy would be useful theoretical tools for driving effective investigation of this crucial HCI problem. These theoretical tools are constructed here. A comprehensive a...
In strike-slip tectonic settings, deformation of the lithosphere occurs primarily in the plane of Earth as a result of near horizontal maximum and minimum principal stresses. Faults associated with these plate boundaries are primarily vertical. Wherever these vertical fault planes encounter bends, movement along the fault can create local areas of compression or tension. When the curve in the fault plane moves apart, a region of transtension occurs and sometimes is large enough and long-lived enough to create a sedimentary basin often called a pull-apart basin or strike-slip basin.	These basins are often roughly rhombohedral in shape and may be called a rhombochasm. A classic rhombochasm is illustrated by the Dead Sea rift, where northward movement of the Arabian Plate relative to the Anatolian Plate has created a strike slip basin. The opposite effect is that of transpression, where converging movement of a curved fault plane causes collision of the opposing sides of the fault. An example is the San Bernardino Mountains north of Los Angeles, which result from convergence along a curve in the San Andreas fault system. The Northridge earthquake was caused by vertical movement along local thrust and reverse faults "bunching up" against the bend in the otherwise strike-slip fault environment.	This was the first interpretation and prediction of a particle and corresponding antiparticle. See Dirac spinor and bispinor for further description of these spinors. In the non-relativistic limit the Dirac equation reduces to the Pauli equation (see Dirac equation for how).
M1: This was used by seacoast artillery for major-caliber seacoast guns. It computed continuous firing data for a battery of two guns that were separated by not more than 1,000 feet (300 m). It utilised the same type of input data furnished by a range section with the then-current (1940) types of position-finding and fire-control equipment. M3: This was used in conjunction with the M9 and M10 directors to compute all required firing data, i.e. azimuth, elevation and fuze time.	The computations were made continuously, so that the gun was at all times correctly pointed and the fuze correctly timed for firing at any instant. The computer was mounted in the M13 or M14 director trailer.	Section: Industry > Semiconductors. A semiconductor is a material that has a resistivity between a conductor and insulator. Modern day electronics run on semiconductors, and the industry had an estimated US$530 billion market in 2021. Its electronic properties can be greatly altered through intentionally introducing impurities in a process referred to as doping. Semiconductor materials are used to build diodes, transistors, light-emitting diodes (LEDs), and analog and digital electric circuits, among their many uses. Semiconductor devices have replaced thermionic devices like vacuum tubes in most applications. Semiconductor devices are manufactured both as single discrete devices and as integrated circuits (ICs), which consist of a number—from a few to millions—of devices manufactured and interconnected on a single semiconductor substrate. Of all the semiconductors in use today, silicon makes up the largest portion both by quantity and commercial value. Monocrystalline silicon is used ...

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
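
To make the loss parameters above concrete, here is a minimal sketch of a triplet loss with Euclidean distance and a margin of 5, computed as max(d(anchor, positive) - d(anchor, negative) + margin, 0) for a single triplet. It assumes that sentence_0, sentence_1, and sentence_2 play the roles of anchor, positive, and negative; the function names and the 2-dimensional toy vectors are illustrative only.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [0.0, 0.0]
positive = [3.0, 4.0]       # distance 5 from the anchor
far_negative = [6.0, 8.0]   # distance 10 from the anchor
near_negative = [0.0, 1.0]  # distance 1 from the anchor

satisfied = triplet_loss(anchor, positive, far_negative)   # 5 - 10 + 5 = 0
violated = triplet_loss(anchor, positive, near_negative)   # 5 - 1 + 5 = 9
print(satisfied, violated)
```

The loss reaches zero only once the negative is at least the margin farther from the anchor than the positive, which is why a large margin such as 5 pushes unrelated passages far apart in Euclidean distance.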

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 10
multi_dataset_batch_sampler: round_robin

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 10
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
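
The combination of `learning_rate: 5e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` suggests a learning rate that decays linearly from its initial value to zero over training. The sketch below illustrates that shape in plain Python; the function `linear_lr` is hypothetical, and the total of 26,370 steps is an assumption derived from 42,185 samples, a batch size of 16, and 10 epochs on a single device.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 26370  # assumed: ceil(42185 / 16) * 10 on a single device

for step in (0, 13185, 26370):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate is 5e-05 at the first step, half that at the midpoint, and zero at the final step.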

Training Logs

Epoch	Step	Training Loss
0.1896	500	2.189
0.3792	1000	0.2668
0.5688	1500	0.1869
0.7584	2000	0.1456
0.9480	2500	0.1123
1.1377	3000	0.0978
1.3273	3500	0.0735
1.5169	4000	0.0842
1.7065	4500	0.0756
1.8961	5000	0.0577
2.0857	5500	0.0512
2.2753	6000	0.0308
2.4649	6500	0.0271
2.6545	7000	0.0303
2.8441	7500	0.0324
3.0338	8000	0.0325
3.2234	8500	0.0112
3.4130	9000	0.0136
3.6026	9500	0.0123
3.7922	10000	0.0117
3.9818	10500	0.0148
4.1714	11000	0.0085
4.3610	11500	0.0066
4.5506	12000	0.0053
4.7402	12500	0.0078
4.9298	13000	0.006
5.1195	13500	0.0058
5.3091	14000	0.0043
5.4987	14500	0.0027
5.6883	15000	0.0036
5.8779	15500	0.0035
6.0675	16000	0.0029
6.2571	16500	0.0031
6.4467	17000	0.0015
6.6363	17500	0.0025
6.8259	18000	0.0021
7.0155	18500	0.0032
7.2052	19000	0.0011
7.3948	19500	0.001
7.5844	20000	0.0012
7.7740	20500	0.0011
7.9636	21000	0.0013
8.1532	21500	0.0002
8.3428	22000	0.001
8.5324	22500	0.0006
8.7220	23000	0.0003
8.9116	23500	0.0007
9.1013	24000	0.0003
9.2909	24500	0.0002
9.4805	25000	0.0005
9.6701	25500	0.0005
9.8597	26000	0.0005
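
The Epoch column in the log above can be reproduced from the dataset size and batch size. The following sketch assumes training on a single device; together with `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` this gives ceil(42185 / 16) = 2637 optimizer steps per epoch. The helper name `epoch_at` is illustrative.

```python
import math

num_samples = 42185
batch_size = 16
num_epochs = 10

# The last, partial batch is kept, so round up.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)
print(epoch_at(500), epoch_at(26000))
```

The computed values for steps 500 and 26000 agree with the first and last rows of the log, which is consistent with the single-device assumption.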

Framework Versions

Python: 3.12.8
Sentence Transformers: 3.4.1
Transformers: 4.52.2
PyTorch: 2.7.0+cu126
Accelerate: 1.3.0
Datasets: 3.2.0
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}