zacbrld committed on
Commit c90ed0d · verified · 1 Parent(s): 23a179d

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
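This pooling config enables mean pooling (`pooling_mode_mean_tokens: true`) over 384-dimensional token embeddings. As a rough sketch of what masked mean pooling does, with a hypothetical `mean_pool` helper and toy inputs (not part of this repository):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (seq_len, dim); attention_mask: (seq_len,) of 0/1.
    # Zero out padding positions, sum over the sequence, divide by real-token count.
    mask = attention_mask[:, None].astype(float)
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # guard against an all-padding input
    return summed / count

# Toy example: 4 token embeddings of dimension 384, last position is padding.
tokens = np.ones((4, 384))
mask = np.array([1, 1, 1, 0])
vec = mean_pool(tokens, mask)
print(vec.shape)  # (384,)
```

Padding tokens are excluded from the average, which is why the attention mask participates in the pooling rather than a plain `mean` over the sequence axis.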
README.md ADDED
@@ -0,0 +1,597 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:42185
+ - loss:TripletLoss
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ widget:
+ - source_sentence: 'For example, t ∈ { 0 , 1 , … , N } , N 0 , or {\mbox{ or }}[0,+\infty
+   ).} Similarly, a filtered probability space (also known as a stochastic basis)
+   ( Ω , F , { F t } t ≥ 0 , P ) {\displaystyle \left(\Omega ,{\mathcal {F}},\left\{{\mathcal
+   {F}}_{t}\right\}_{t\geq 0},\mathbb {P} \right)} , is a probability space equipped
+   with the filtration { F t } t ≥ 0 {\displaystyle \left\{{\mathcal {F}}_{t}\right\}_{t\geq
+   0}} of its σ {\displaystyle \sigma } -algebra F {\displaystyle {\mathcal {F}}}
+   . A filtered probability space is said to satisfy the usual conditions if it is
+   complete (i.e., F 0 {\displaystyle {\mathcal {F}}_{0}} contains all P {\displaystyle
+   \mathbb {P} } -null sets) and right-continuous (i.e. F t = F t + := ⋂ s > t F
+   s {\displaystyle {\mathcal {F}}_{t}={\mathcal {F}}_{t+}:=\bigcap _{s>t}{\mathcal
+   {F}}_{s}} for all times t {\displaystyle t} ).It is also useful (in the case of
+   an unbounded index set) to define F ∞ {\displaystyle {\mathcal {F}}_{\infty }}
+   as the σ {\displaystyle \sigma } -algebra generated by the infinite union of the
+   F t {\displaystyle {\mathcal {F}}_{t}} ''s, which is contained in F {\displaystyle
+   {\mathcal {F}}}: F ∞ = σ ( ⋃ t ≥ 0 F t ) ⊆ F .'
+   sentences:
+   - These individuals can experience these symptoms from failed attempts of depression
+     like symptoms.Narcissistic personality disorder is characterized as feelings of
+     superiority, a sense of grandiosity, exhibitionism, charming but also exploitive
+     behaviors in the interpersonal domain, success, beauty, feelings of entitlement
+     and a lack of empathy. Those with this disorder often engage in assertive self
+     enhancement and antagonistic self protection. All of these factors can lead an
+     individual with narcissistic personality disorder to manipulate others.
+   - '{\displaystyle {\mathcal {F}}_{\infty }=\sigma \left(\bigcup _{t\geq 0}{\mathcal
+     {F}}_{t}\right)\subseteq {\mathcal {F}}.} A σ-algebra defines the set of events
+     that can be measured, which in a probability context is equivalent to events that
+     can be discriminated, or "questions that can be answered at time t {\displaystyle
+     t} ". Therefore, a filtration is often used to represent the change in the set
+     of events that can be measured, through gain or loss of information. A typical
+     example is in mathematical finance, where a filtration represents the information
+     available up to and including each time t {\displaystyle t} , and is more and
+     more precise (the set of measurable events is staying the same or increasing)
+     as more information from the evolution of the stock price becomes available.'
+   - 'Section: Structure and dynamics > Composition. Like microtubules, neurotubules
+     are made up of protein polymers of α-tubulin and β-tubulin, globular proteins
+     that are closely related. They join together to form a dimer, called tubulin.
+     Neurotubules are generally assembled by 13 protofilaments which are polymerized
+     from tubulin dimers. As a tubulin dimer consists of one α-tubulin and one β-tubulin,
+     one end of the neurotubule is exposed with the α-tubulin and the other end with
+     β-tubulin, these two ends contribute to the polarity of the neurotubule – the
+     plus (+) end and the minus (-) end. The β-tubulin subunit is exposed on the plus
+     (+) end. The two ends differ in their growth rate: plus (+) end is the fast-growing
+     end while minus (-) end is the slow-growing end. Both ends have their own rate
+     of polymerization and depolymerization of tubulin dimers, net polymerization causes
+     the assembly of tubulin, hence the length of the neurotubules.'
+ - source_sentence: 'We want to find the value of $X$ in the given situation. We are
+     told that James has $X$ apples, and 4 of them are red and 3 of them are green.
+     We want to find the probability that both apples he chooses are green. The total
+     number of apples James has is $X$, and the total number of green apples is 3.
+     To find the probability, we can use the formula: In this case, the number of favorable
+     outcomes is choosing 2 green apples out of the 3 available green apples. The total
+     number of possible outcomes is choosing any 2 apples out of the $X$ total apples.
+     So, the probability is: Probability = (Number of ways to choose 2 green apples)
+     / (Number of ways to choose 2 apples) Since we are given that the probability
+     is $\frac{1}{7}$, we can write: $\frac{1}{7} = \frac{3 \choose 2}{X \choose 2}$
+     Simplifying, we have: $\frac{1}{7} = \frac{3}{\frac{X(X-1)}{2}}$'
+   sentences:
+   - 'Article: A common type system for clinical natural language processing. One challenge
+     in reusing clinical data stored in electronic medical records is that these data
+     are heterogenous. Clinical Natural Language Processing (NLP) plays an important
+     role in transforming information in clinical text to a standard representation
+     that is comparable and interoperable. Information may be processed and shared
+     when a type system specifies the allowable data structures. Therefore, we aim
+     to define a common type system for clinical NLP that enables interoperability
+     between structured and unstructured data generated in different clinical settings.
+     We describe a common type system for clinical NLP that has an end target of deep
+     semantics based on Clinical Element Models (CEMs), thus interoperating with structured
+     data and accommodating diverse NLP approaches. The type system has been implemented
+     in UIMA (Unstructured Information Management Architecture) and is fully functional
+     in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and
+     Knowledge Extraction System) versions 2.0 and later. We have created a type system
+     that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge
+     from text and share it alongside heterogenous clinical data sources. Rather than
+     surface semantics that are typically the end product of NLP algorithms, CEM-based
+     semantics explicitly build in deep clinical semantics as the point of interoperability
+     with more structured data types.'
+   - Furthermore, the majority of all the male skeletons from the European Neolithic
+     period have so far yielded Y-DNA belonging to this haplogroup. The oldest skeletons
+     confirmed by ancient DNA testing as carrying haplogroup G2a were five found in
+     the Avellaner cave burial site in Catalonia, Spain and were dated by radiocarbon
+     dating to about 5000 BCE. Haplogroup I-M253 (I1) at 4,3% of which L22, Z58 and
+     Z63. According to a study published in 2010, I-M253 originated between 3,170 and
+     5,000 years ago, in Chalcolithic Europe. A 2014 study in Hungary uncovered remains
+     of two individuals from the Linear Pottery culture, one of whom was found to have
+     carried the M253 SNP which defines Haplogroup I1. This culture is thought to have
+     been present between 7,500 and 6,500 years ago. Finally, there are also some other
+     Y-DNA Haplogroups presented at a lower levels among Bulgarians ~ 10% all together,
+     as J-M267 (J1) at ~3.5%, E-M34 (E1b1b1b2a1) at ~2%, T-M70 (T1a) at ~1.5%, at less
+     than 1% Haplogroup C-M217 (C2), H-M82 (H1a1), N-M231 (N), Q-M242 (Q), L-M61 (L),
+     I-M170 (I*), E-M96 (E*) excl.
+   - 'So, the probability is: Probability = (Number of ways to Multiplying both sides
+     of the equation by $\frac{X(X-1)}{2}$, we get: $\frac{X(X-1)}{2} = 3 \times 7$
+     $X(X-1) = 6 \times 7$ $X(X-1) = 42$ Expanding the equation, we have: $X^2 - X
+     = 42$ Rearranging the equation, we get: $X^2 - X - 42 = 0$ This is a quadratic
+     equation that can be factored as: $(X - 7)(X + 6) = 0$ Setting each factor equal
+     to zero, we have two possible solutions: $X - 7 = 0$ or $X + 6 = 0$ Solving for
+     $X$, we find: $X = 7$ or $X = -6$ Since the number of apples cannot be negative,
+     the value of $X$ is 7.'
+ - source_sentence: 'Section: Model. The Oppenheimer–Snyder model of continued gravitational
+     collapse is described by the line element d s 2 = − d τ 2 + A 2 ( η ) ( d R 2
+     1 − 2 M R − 2 R b 2 1 R + + R 2 d Ω 2 ) {\displaystyle ds^{2}=-d\tau ^{2}+A^{2}(\eta
+     )\left({\frac {dR^{2}}{1-2M{\frac {R_{-}^{2}}{R_{b}^{2}}}{\frac {1}{R_{+}}}}}+R^{2}d\Omega
+     ^{2}\right)} The quantities appearing in this expression are as follows: The coordinates
+     are ( τ , R , θ , ϕ ) {\displaystyle (\tau ,R,\theta ,\phi )} where θ , ϕ {\displaystyle
+     \theta ,\phi } are coordinates for the 2-sphere. R b {\displaystyle R_{b}} is
+     a positive quantity, the "boundary radius", representing the boundary of the matter
+     region. M {\displaystyle M} is a positive quantity, the mass.'
+   sentences:
+   - 'A standard demonstration in general relativity is to show how, in the "Newtonian
+     limit" (i.e. the particles are moving slowly, the gravitational field is weak,
+     and the field is static), curvature of time alone is sufficient to derive Newton''s
+     law of gravity. : 101–106 Newtonian gravitation is a theory of curved time. General
+     relativity is a theory of curved time and curved space. Given G as the gravitational
+     constant, M as the mass of a Newtonian star, and orbiting bodies of insignificant
+     mass at distance r from the star, the spacetime interval for Newtonian gravitation
+     is one for which only the time coefficient is variable:: 229–232 Δ s 2 = ( 1 −
+     2 G M c 2 r ) ( c Δ t ) 2 − ( Δ x ) 2 − ( Δ y ) 2 − ( Δ z ) 2 {\displaystyle \Delta
+     s^{2}=\left(1-{\frac {2GM}{c^{2}r}}\right)(c\Delta t)^{2}-\,(\Delta x)^{2}-(\Delta
+     y)^{2}-(\Delta z)^{2}}'
+   - 'Section: Examples > Example 1. s ( t ) = A cos ⁡ ( ω t + θ ) , {\displaystyle
+     s(t)=A\cos(\omega t+\theta ),} where ω > 0. s a ( t ) = A e j ( ω t + θ ) , φ
+     ( t ) = ω t + θ . {\displaystyle {\begin{aligned}s_{\mathrm {a} }(t)&=Ae^{j(\omega
+     t+\theta )},\\\varphi (t)&=\omega t+\theta .\end{aligned}}} In this simple sinusoidal
+     example, the constant θ is also commonly referred to as phase or phase offset.
+     φ(t) is a function of time; θ is not. In the next example, we also see that the
+     phase offset of a real-valued sinusoid is ambiguous unless a reference (sin or
+     cos) is specified. φ(t) is unambiguously defined.'
+   - M {\displaystyle M} is a positive quantity, the mass. R − = m i n ( R , R b )
+     {\displaystyle R_{-}=\mathrm {min} (R,R_{b})} and R + = m a x ( R , R b ) {\displaystyle
+     R_{+}=\mathrm {max} (R,R_{b})} . η {\displaystyle \eta } is defined implicitly
+     by the equation τ ( η , R ) = 1 2 R + 3 2 M ( η + sin ⁡ η ) . {\displaystyle \tau
+     (\eta ,R)={\frac {1}{2}}{\sqrt {\frac {R_{+}^{3}}{2M}}}(\eta +\sin \eta ).} A
+     ( η ) = 1 + cos ⁡ η 2 {\displaystyle A(\eta )={\frac {1+\cos \eta }{2}}} . This
+     expression is valid both in the matter region R < R b {\displaystyle R<R_{b}}
+     , and the vacuum region R > R b {\displaystyle R>R_{b}} , and continuously transitions
+     between the two.
+ - source_sentence: 'Section: Properties and parameters > Plasma potential. Since plasmas
+     are very good electrical conductors, electric potentials play an important role.
+     The average potential in the space between charged particles, independent of how
+     it can be measured, is called the "plasma potential", or the "space potential".
+     If an electrode is inserted into a plasma, its potential will generally lie considerably
+     below the plasma potential due to what is termed a Debye sheath. The good electrical
+     conductivity of plasmas makes their electric fields very small. This results in
+     the important concept of "quasineutrality", which says the density of negative
+     charges is approximately equal to the density of positive charges over large volumes
+     of the plasma ( n e = ⟨ Z ⟩ n i {\displaystyle n_{e}=\langle Z\rangle n_{i}} ),
+     but on the scale of the Debye length, there can be charge imbalance. In the special
+     case that double layers are formed, the charge separation can extend some tens
+     of Debye lengths. The magnitude of the potentials and electric fields must be
+     determined by means other than simply finding the net charge density. A common
+     example is to assume that the electrons satisfy the Boltzmann relation: n e ∝
+     exp ⁡ ( e Φ / k B T e ) . {\displaystyle n_{e}\propto \exp(e\Phi /k_{\text{B}}T_{e}).}
+     Differentiating this relation provides a means to calculate the electric field
+     from the density: E → = k B T e e ∇ n e n e .'
+   sentences:
+   - When the integers a and b are coprime, the standard way of expressing this fact
+     in mathematical notation is to indicate that their greatest common divisor is
+     one, by the formula gcd(a, b) = 1 or (a, b) = 1. In their 1989 textbook Concrete
+     Mathematics, Ronald Graham, Donald Knuth, and Oren Patashnik proposed an alternative
+     notation a ⊥ b {\displaystyle a\perp b} to indicate that a and b are relatively
+     prime and that the term "prime" be used instead of coprime (as in a is prime to
+     b). A fast way to determine whether two numbers are coprime is given by the Euclidean
+     algorithm and its faster variants such as binary GCD algorithm or Lehmer's GCD
+     algorithm. The number of integers coprime with a positive integer n, between 1
+     and n, is given by Euler's totient function, also known as Euler's phi function,
+     φ(n). A set of integers can also be called coprime if its elements share no common
+     positive factor except 1. A stronger condition on a set of integers is pairwise
+     coprime, which means that a and b are coprime for every pair (a, b) of different
+     integers in the set. The set {2, 3, 4} is coprime, but it is not pairwise coprime
+     since 2 and 4 are not relatively prime.
+   - 'Let''s assume the number of cans of corn Beth bought is C. Twice the number of
+     cans of corn she bought would be 2C. So, 15 more than twice the number of cans
+     of corn she bought would be 2C + 15. We know that Beth purchased 35 cans of peas,
+     so we can set up the equation: 2C + 15 = 35. To isolate C, we can subtract 15
+     from both sides of the equation: 2C = 35 - 15 = 20. Dividing both sides of the
+     equation by 2, we get C = 20/2 = 10. Therefore, Beth bought 10 cans of corn.'
+   - '{\displaystyle n_{e}\propto \exp(e\Phi /k_{\text{B}}T_{e}).} Differentiating
+     this relation provides a means to calculate the electric field from the density:
+     E → = k B T e e ∇ n e n e . {\displaystyle {\vec {E}}={\frac {k_{\text{B}}T_{e}}{e}}{\frac
+     {\nabla n_{e}}{n_{e}}}.} It is possible to produce a plasma that is not quasineutral.
+     An electron beam, for example, has only negative charges. The density of a non-neutral
+     plasma must generally be very low, or it must be very small, otherwise, it will
+     be dissipated by the repulsive electrostatic force.'
+ - source_sentence: If X {\displaystyle X} is a linear space and g {\displaystyle g}
+     are constants, the system is said to be subject to additive noise, otherwise it
+     is said to be subject to multiplicative noise. This term is somewhat misleading
+     as it has come to mean the general case even though it appears to imply the limited
+     case in which g ( x ) ∝ x {\displaystyle g(x)\propto x} . For a fixed configuration
+     of noise, SDE has a unique solution differentiable with respect to the initial
+     condition.
+   sentences:
+   - Nontriviality of stochastic case shows up when one tries to average various objects
+     of interest over noise configurations. In this sense, an SDE is not a uniquely
+     defined entity when noise is multiplicative and when the SDE is understood as
+     a continuous time limit of a stochastic difference equation. In this case, SDE
+     must be complemented by what is known as "interpretations of SDE" such as Itô
+     or a Stratonovich interpretations of SDEs.
+   - 'Article: RNA-Seq technology and its application in fish transcriptomics.. High-throughput
+     sequencing technologies, also known as next-generation sequencing (NGS) technologies,
+     have revolutionized the way that genomic research is advancing. In addition to
+     the static genome, these state-of-art technologies have been recently exploited
+     to analyze the dynamic transcriptome, and the resulting technology is termed RNA
+     sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic
+     approaches, such as microarray and tag-based sequencing method. Although RNA-seq
+     has only been available for a short time, studies using this method have completely
+     changed our perspective of the breadth and depth of eukaryotic transcriptomes.
+     In terms of the transcriptomics of teleost fishes, both model and non-model species
+     have benefited from the RNA-seq approach and have undergone tremendous advances
+     in the past several years. RNA-seq has helped not only in mapping and annotating
+     fish transcriptome but also in our understanding of many biological processes
+     in fish, such as development, adaptive evolution, host immune response, and stress
+     response. In this review, we first provide an overview of each step of RNA-seq
+     from library construction to the bioinformatic analysis of the data. We then summarize
+     and discuss the recent biological insights obtained from the RNA-seq studies in
+     a variety of fish species.'
+   - 'This is the σ-algebra generated by the singletons of X . {\displaystyle X.} Note:
+     "countable" includes finite or empty. The collection of all unions of sets in
+     a countable partition of X {\displaystyle X} is a σ-algebra.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+ - **Maximum Sequence Length:** 350 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 350, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("zacbrld/MNLP_M3_document_encoder_kaggle")
+ # Run inference
+ sentences = [
+     'If X {\\displaystyle X} is a linear space and g {\\displaystyle g} are constants, the system is said to be subject to additive noise, otherwise it is said to be subject to multiplicative noise. This term is somewhat misleading as it has come to mean the general case even though it appears to imply the limited case in which g ( x ) ∝ x {\\displaystyle g(x)\\propto x} . For a fixed configuration of noise, SDE has a unique solution differentiable with respect to the initial condition.',
+     'Nontriviality of stochastic case shows up when one tries to average various objects of interest over noise configurations. In this sense, an SDE is not a uniquely defined entity when noise is multiplicative and when the SDE is understood as a continuous time limit of a stochastic difference equation. In this case, SDE must be complemented by what is known as "interpretations of SDE" such as Itô or a Stratonovich interpretations of SDEs.',
+     'Article: RNA-Seq technology and its application in fish transcriptomics.. High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
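For reference, `model.similarity` here uses the Cosine Similarity function listed in the model details. A minimal NumPy sketch of that computation (the `cosine_similarity_matrix` helper and the toy embeddings are illustrative, not part of this repository):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length, then pairwise dot products give cosines.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Toy 2-dimensional embeddings standing in for the model's 384-dimensional output.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = cosine_similarity_matrix(emb)
print(sims.shape)  # (3, 3)
```

The diagonal is 1.0 (each embedding is maximally similar to itself), and off-diagonal entries lie in [-1, 1].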
295
+
296
+ <!--
297
+ ### Direct Usage (Transformers)
298
+
299
+ <details><summary>Click to see the direct usage in Transformers</summary>
300
+
301
+ </details>
302
+ -->
303
+
304
+ <!--
305
+ ### Downstream Usage (Sentence Transformers)
306
+
307
+ You can finetune this model on your own dataset.
308
+
309
+ <details><summary>Click to expand</summary>
310
+
311
+ </details>
312
+ -->
313
+
314
+ <!--
315
+ ### Out-of-Scope Use
316
+
317
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
318
+ -->
319
+
320
+ <!--
321
+ ## Bias, Risks and Limitations
322
+
323
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
324
+ -->
325
+
326
+ <!--
327
+ ### Recommendations
328
+
329
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
330
+ -->
331
+
332
+ ## Training Details
333
+
334
+ ### Training Dataset
335
+
336
+ #### Unnamed Dataset
337
+
338
+ * Size: 42,185 training samples
339
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
340
+ * Approximate statistics based on the first 1000 samples:
341
+ | | sentence_0 | sentence_1 | sentence_2 |
342
+ |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
343
+ | type | string | string | string |
344
+ | details | <ul><li>min: 13 tokens</li><li>mean: 194.02 tokens</li><li>max: 350 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 182.63 tokens</li><li>max: 350 tokens</li></ul> | <ul><li>min: 12 tokens</li><li>mean: 230.56 tokens</li><li>max: 350 tokens</li></ul> |
345
+ * Samples:
346
+ | sentence_0 | sentence_1 | sentence_2 |
347
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
348
+ | <code>Most hard-bodied insect specimens and some other hard-bodied invertebrates such as certain Arachnida, are preserved as pinned specimens. Either while still fresh, or after rehydrating them if necessary because they had dried out, specimens are transfixed by special stainless steel entomological pins. As the insect dries the internal tissues solidify and, possibly aided to some extent by the integument, they grip the pin and secure the specimen in place on the pin. Very small, delicate specimens may instead be secured by fine steel points driven into slips of card, or glued to card points or similar attachments that in turn are pinned in the same way as entire mounted insects.</code> | <code>The pins offer a means of handling the specimens without damage, and they also bear labels for descriptive and reference data. Once dried, the specimens may be kept in conveniently sized open trays. The bottoms of the trays are lined with a material suited to receiving and holding entomological pins securely and conveniently.</code> | <code>Article: Interruption of People in Human-Computer Interaction: A General Unifying Definition of Human Interruption and Taxonomy. Abstract : User-interruption in human-computer interaction (HCI) is an increasingly important problem. Many of the useful advances in intelligent and multitasking computer systems have the significant side effect of greatly increasing user-interruption. This previously innocuous HCI problem has become critical to the successful function of many kinds of modern computer systems. Unfortunately, no HCI design guidelines exist for solving this problem. In fact, theoretical tools do not yet exist for investigating the HCI problem of user-interruption in a comprehensive and generalizable way. This report asserts that a single unifying definition of user-interruption and the accompanying practical taxonomy would be useful theoretical tools for driving effective investigation of this crucial HCI problem. 
These theoretical tools are constructed here. A comprehensive a...</code> |
349
+ | <code>In strike-slip tectonic settings, deformation of the lithosphere occurs primarily in the plane of Earth as a result of near horizontal maximum and minimum principal stresses. Faults associated with these plate boundaries are primarily vertical. Wherever these vertical fault planes encounter bends, movement along the fault can create local areas of compression or tension. When the curve in the fault plane moves apart, a region of transtension occurs and sometimes is large enough and long-lived enough to create a sedimentary basin often called a pull-apart basin or strike-slip basin.</code> | <code>These basins are often roughly rhombohedral in shape and may be called a rhombochasm. A classic rhombochasm is illustrated by the Dead Sea rift, where northward movement of the Arabian Plate relative to the Anatolian Plate has created a strike slip basin. The opposite effect is that of transpression, where converging movement of a curved fault plane causes collision of the opposing sides of the fault. An example is the San Bernardino Mountains north of Los Angeles, which result from convergence along a curve in the San Andreas fault system. The Northridge earthquake was caused by vertical movement along local thrust and reverse faults "bunching up" against the bend in the otherwise strike-slip fault environment.</code> | <code>This was the first interpretation and prediction of a particle and corresponding antiparticle. See Dirac spinor and bispinor for further description of these spinors. In the non-relativistic limit the Dirac equation reduces to the Pauli equation (see Dirac equation for how).</code> |
+ | <code>M1: This was used by seacoast artillery for major-caliber seacoast guns. It computed continuous firing data for a battery of two guns that were separated by not more than 1,000 feet (300 m). It utilised the same type of input data furnished by a range section with the then-current (1940) types of position-finding and fire-control equipment. M3: This was used in conjunction with the M9 and M10 directors to compute all required firing data, i.e. azimuth, elevation and fuze time.</code> | <code>The computations were made continuously, so that the gun was at all times correctly pointed and the fuze correctly timed for firing at any instant. The computer was mounted in the M13 or M14 director trailer.</code> | <code>Section: Industry > Semiconductors. A semiconductor is a material that has a resistivity between a conductor and insulator. Modern day electronics run on semiconductors, and the industry had an estimated US$530 billion market in 2021. Its electronic properties can be greatly altered through intentionally introducing impurities in a process referred to as doping. Semiconductor materials are used to build diodes, transistors, light-emitting diodes (LEDs), and analog and digital electric circuits, among their many uses. Semiconductor devices have replaced thermionic devices like vacuum tubes in most applications. Semiconductor devices are manufactured both as single discrete devices and as integrated circuits (ICs), which consist of a number—from a few to millions—of devices manufactured and interconnected on a single semiconductor substrate. Of all the semiconductors in use today, silicon makes up the largest portion both by quantity and commercial value. Monocrystalline silicon is used ...</code> |
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
+ ```json
+ {
+     "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
+     "triplet_margin": 5
+ }
+ ```
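For reference, `TripletDistanceMetric.EUCLIDEAN` with `triplet_margin: 5` means each training triplet is scored as max(‖a − p‖ − ‖a − n‖ + 5, 0), so the negative is pushed at least 5 units farther from the anchor than the positive. A minimal pure-Python sketch of that computation on toy vectors (an illustration, not the library implementation):

```python
import math

def euclidean(u, v):
    # Plain Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    # Hinge on (anchor-positive distance) - (anchor-negative distance) + margin;
    # the loss is zero once the negative is at least `margin` farther away.
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

print(triplet_loss((0, 0), (1, 0), (3, 0)))  # 1 - 3 + 5 = 3.0
print(triplet_loss((0, 0), (1, 0), (7, 0)))  # 1 - 7 + 5 < 0, clamped to 0.0
```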
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ 
+ </details>
+ 
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:------:|:-----:|:-------------:|
+ | 0.1896 | 500 | 2.189 |
+ | 0.3792 | 1000 | 0.2668 |
+ | 0.5688 | 1500 | 0.1869 |
+ | 0.7584 | 2000 | 0.1456 |
+ | 0.9480 | 2500 | 0.1123 |
+ | 1.1377 | 3000 | 0.0978 |
+ | 1.3273 | 3500 | 0.0735 |
+ | 1.5169 | 4000 | 0.0842 |
+ | 1.7065 | 4500 | 0.0756 |
+ | 1.8961 | 5000 | 0.0577 |
+ | 2.0857 | 5500 | 0.0512 |
+ | 2.2753 | 6000 | 0.0308 |
+ | 2.4649 | 6500 | 0.0271 |
+ | 2.6545 | 7000 | 0.0303 |
+ | 2.8441 | 7500 | 0.0324 |
+ | 3.0338 | 8000 | 0.0325 |
+ | 3.2234 | 8500 | 0.0112 |
+ | 3.4130 | 9000 | 0.0136 |
+ | 3.6026 | 9500 | 0.0123 |
+ | 3.7922 | 10000 | 0.0117 |
+ | 3.9818 | 10500 | 0.0148 |
+ | 4.1714 | 11000 | 0.0085 |
+ | 4.3610 | 11500 | 0.0066 |
+ | 4.5506 | 12000 | 0.0053 |
+ | 4.7402 | 12500 | 0.0078 |
+ | 4.9298 | 13000 | 0.006 |
+ | 5.1195 | 13500 | 0.0058 |
+ | 5.3091 | 14000 | 0.0043 |
+ | 5.4987 | 14500 | 0.0027 |
+ | 5.6883 | 15000 | 0.0036 |
+ | 5.8779 | 15500 | 0.0035 |
+ | 6.0675 | 16000 | 0.0029 |
+ | 6.2571 | 16500 | 0.0031 |
+ | 6.4467 | 17000 | 0.0015 |
+ | 6.6363 | 17500 | 0.0025 |
+ | 6.8259 | 18000 | 0.0021 |
+ | 7.0155 | 18500 | 0.0032 |
+ | 7.2052 | 19000 | 0.0011 |
+ | 7.3948 | 19500 | 0.001 |
+ | 7.5844 | 20000 | 0.0012 |
+ | 7.7740 | 20500 | 0.0011 |
+ | 7.9636 | 21000 | 0.0013 |
+ | 8.1532 | 21500 | 0.0002 |
+ | 8.3428 | 22000 | 0.001 |
+ | 8.5324 | 22500 | 0.0006 |
+ | 8.7220 | 23000 | 0.0003 |
+ | 8.9116 | 23500 | 0.0007 |
+ | 9.1013 | 24000 | 0.0003 |
+ | 9.2909 | 24500 | 0.0002 |
+ | 9.4805 | 25000 | 0.0005 |
+ | 9.6701 | 25500 | 0.0005 |
+ | 9.8597 | 26000 | 0.0005 |
+ 
+ 
+ ### Framework Versions
+ - Python: 3.12.8
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.52.2
+ - PyTorch: 2.7.0+cu126
+ - Accelerate: 1.3.0
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### TripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+     title={In Defense of the Triplet Loss for Person Re-Identification},
+     author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+     year={2017},
+     eprint={1703.07737},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.52.2",
+     "pytorch": "2.7.0+cu126"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3f6e7268b4bed5b2f4fa4f7dd209c42a10090dfe952f160feb92b645c4e7b3c
+ size 90864192
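As a back-of-the-envelope check (my own arithmetic, not stated in the card): the dimensions in `config.json` above imply roughly 22.7M parameters, which at float32 (4 bytes each) accounts for almost all of the 90,864,192-byte safetensors file; the small remainder is presumably the safetensors header. A sketch, assuming the standard BERT layout with the pooler weights included:

```python
# Figures taken from config.json; the pooler term is an assumption about
# exactly which tensors are serialized.
V, P, T = 30522, 512, 2   # vocab size, max positions, token type vocab
H, I, L = 384, 1536, 6    # hidden size, intermediate size, hidden layers

embeddings = V * H + P * H + T * H + 2 * H          # three embedding tables + LayerNorm
attention  = 3 * (H * H + H) + (H * H + H) + 2 * H  # Q/K/V, output projection, LayerNorm
ffn        = (H * I + I) + (I * H + H) + 2 * H      # up/down projections + LayerNorm
pooler     = H * H + H                              # dense pooler head

total = embeddings + L * (attention + ffn) + pooler
print(total)       # ~22.7M parameters
print(total * 4)   # float32 bytes, just under the 90,864,192-byte file
```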
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
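`modules.json` chains two modules: the Transformer produces one embedding per token, and the Pooling module in `1_Pooling` (configured for mean pooling) averages those token embeddings into a single 384-dimensional sentence vector, skipping padding positions. A pure-Python sketch of that pooling step on toy 2-dimensional token vectors (illustration only, not the library code):

```python
def mean_pool(token_embeddings, attention_mask):
    # Average only the positions the attention mask marks as real tokens,
    # so padding does not dilute the sentence embedding.
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            totals = [t + x for t, x in zip(totals, vec)]
            count += 1
    return [t / max(count, 1) for t in totals]

tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]  # last row is padding
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```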
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 350,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 350,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff