sucharush committed
Commit 4efe6e2 · verified · 1 Parent(s): 2580ec9

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
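The pooling config above enables only `pooling_mode_mean_tokens`: token embeddings are averaged over the non-padding positions indicated by the attention mask. As a rough illustration (plain NumPy, not the actual sentence-transformers `Pooling` module), mean pooling can be sketched as:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions (pooling_mode_mean_tokens)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

# Toy example: batch of 1, seq_len 3 (last position is padding), dim 384
emb = np.random.randn(1, 3, 384)
mask = np.array([[1, 1, 0]])
pooled = mean_pool(emb, mask)
print(pooled.shape)  # (1, 384)
```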
README.md ADDED
@@ -0,0 +1,774 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:98112
+ - loss:MultipleNegativesRankingLoss
+ base_model: thenlper/gte-small
+ widget:
+ - source_sentence: How does a photocell control outdoor lighting?
+   sentences:
+   - 'To solve this problem, we can use the binomial probability formula:
+
+
+     P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
+
+
+     where:
+
+     - P(X = k) is the probability of exactly k successes (faulty keyboards) in n trials
+     (laptops produced)
+
+     - C(n, k) is the number of combinations of n items taken k at a time (n! / (k!(n-k)!))
+
+     - p is the probability of success (5% or 0.05)
+
+     - n is the number of trials (400 laptops)
+
+     - k is the number of successes (20 faulty keyboards)
+
+
+     However, we want to find the probability of at least 20 faulty keyboards, so we
+     need to find the sum of probabilities for k = 20, 21, 22, ..., 400.
+
+
+     P(X >= 20) = 1 - P(X < 20) = 1 - Σ P(X = k) for k = 0 to 19
+
+
+     Now, we can calculate the probabilities for each value of k and sum them up:
+
+
+     P(X >= 20) = 1 - Σ C(400, k) * 0.05^k * 0.95^(400-k) for k = 0 to 19
+
+
+     Using a calculator or software to compute the sum, we get:
+
+
+     P(X >= 20) ≈ 1 - 0.0184 = 0.9816
+
+
+     So, the probability that at least 20 laptops will have a faulty keyboard is approximately
+     98.16%.'
+   - A photocell controls outdoor lighting by detecting the level of ambient light.
+     It automatically turns the lights on when it becomes dark and off when it becomes
+     light, functioning as a light-dependent switch for energy efficiency and convenience.
+   - 'Glycosylation with β-N-acetylglucosamine (O-GlcNAcylation) is one of the most
+     complex post-translational modifications. The cycling of O-GlcNAc is controlled
+     by two enzymes: UDP-NAc transferase (OGT) and O-GlcNAcase (OGA). We recently reported
+     that endothelin-1 (ET-1) augments vascular levels of O-GlcNAcylated proteins.
+     Here we tested the hypothesis that O-GlcNAcylation contributes to the vascular
+     effects of ET-1 via activation of the RhoA/Rho-kinase pathway. Incubation of vascular
+     smooth muscle cells (VSMCs) with ET-1 (0.1 μM) produces a time-dependent increase
+     in O-GlcNAc levels. ET-1-induced O-GlcNAcylation is not observed when VSMCs are
+     previously transfected with OGT siRNA, treated with ST045849 (OGT inhibitor) or
+     atrasentan (ET(A) antagonist). ET-1 as well as PugNAc (OGA inhibitor) augmented
+     contractions to phenylephrine in endothelium-denuded rat aortas, an effect that
+     was abolished by the Rho kinase inhibitor Y-27632. Incubation of VSMCs with ET-1
+     increased expression of the phosphorylated forms of myosin phosphatase target
+     subunit 1 (MYPT-1), protein kinase C-potentiated protein phosphatase 1 inhibitor
+     protein (protein kinase C-potentiated phosphatase inhibitor-17), and myosin light
+     chain (MLC) and RhoA expression and activity, and this effect was abolished by
+     both OGT siRNA transfection or OGT inhibition and atrasentan. ET-1 also augmented
+     expression of PDZ-Rho GEF (guanine nucleotide exchange factor) and p115-Rho GEF
+     in VSMCs and this was prevented by OGT siRNA, ST045849, and atrasentan.'
+ - source_sentence: A torus has a major radius of 5 cm and a minor radius of 3 cm.
+     Find the volume of the torus.
+   sentences:
+   - 'To find the Hausdorff dimension of the Koch curve, we can use the formula:
+
+
+     Hausdorff dimension (D) = log(N) / log(1/s)
+
+
+     where N is the number of self-similar pieces and s is the scaling factor.
+
+
+     For the Koch curve, each line segment is divided into four segments, each of which
+     is 1/3 the length of the original segment. Therefore, N = 4 and s = 1/3.
+
+
+     Now, we can plug these values into the formula:
+
+
+     D = log(4) / log(1/3)
+
+
+     D ≈ 1.2619
+
+
+     So, the Hausdorff dimension of the Koch curve is approximately 1.2619.'
+   - 'To find the volume of a torus, we can use the formula:
+
+
+     Volume = (π * minor_radius^2) * (2 * π * major_radius)
+
+
+     where minor_radius is the minor radius of the torus and major_radius is the major
+     radius of the torus.
+
+
+     Given that the major radius is 5 cm and the minor radius is 3 cm, we can plug
+     these values into the formula:
+
+
+     Volume = (π * 3^2) * (2 * π * 5)
+
+
+     Volume = (π * 9) * (10 * π)
+
+
+     Volume = 90 * π^2
+
+
+     The volume of the torus is approximately 282.74 cubic centimeters.'
+   - The purpose of the present study was to elucidate the mechanisms of action mediating
+     enhancement of basal glucose uptake in skeletal muscle cells by seven medicinal
+     plant products recently identified from the pharmacopeia of native Canadian populations
+     (Spoor et al., 2006). Activity of the major signaling pathways that regulate glucose
+     uptake was assessed by western immunoblot in C2C12 muscle cells treated with extracts
+     from these plant species. Effects of extracts on mitochondrial function were assessed
+     by respirometry in isolated rat liver mitochondria. Metabolic stress induced by
+     extracts was assessed by measuring ATP concentration and rate of cell medium acidification
+     in C2C12 myotubes and H4IIE hepatocytes. Extracts were applied at a dose of 15-100
+     microg/ml. The effect of all seven products was achieved through a common mechanism
+     mediated not by the insulin signaling pathway but rather by the AMP-activated
+     protein kinase (AMPK) pathway in response to the disruption of mitochondrial function
+     and ensuing metabolic stress. Disruption of mitochondrial function occurred in
+     the form of uncoupling of oxidative phosphorylation and/or inhibition of ATPsynthase.
+     Activity of the AMPK pathway, in some instances comparable to that stimulated
+     by 4mM of the AMP-mimetic AICAR, was in several cases sustained for at least 18h
+     post-treatment. Duration of metabolic stress, however, was in most cases in the
+     order of 1h.
+ - source_sentence: Consider the elliptic curve given by the equation $y^2=x^3-2x+5$
+     over the field of rational numbers $\mathbb{Q}$. Let $P=(1,2)$ and $Q=(-1,2)$
+     be two points on the curve. Find the equation of the line passing through $P$
+     and $Q$ and show that it intersects the curve at another point $R$. Then, find
+     the coordinates of the point $R$.
+   sentences:
+   - Fifteen novel derivatives of D-DIBOA, including aromatic ring modifications and
+     the addition of side chains in positions C-2 and N-4, had previously been synthesised
+     and their phytotoxicity on standard target species (STS) evaluated. This strategy
+     combined steric, electronic, solubility and lipophilicity requirements to achieve
+     the maximum phytotoxic activity. An evaluation of the bioactivity of these compounds
+     on the systems Oryza sativa-Echinochloa crus-galli and Triticum aestivum-Avena
+     fatua is reported here. All compounds showed inhibition profiles on the two species
+     Echinochloa crus-galli (L.) Beauv. and Avena fatua L. The most marked effects
+     were caused by 6F-4Pr-D-DIBOA, 6F-4Val-D-DIBOA, 6Cl-4Pr-D-DIBOA and 6Cl-4Val-D-DIBOA.
+     The IC(50) values for the systems Echinochloa crus-galli-Oryza sativa and Avena
+     fatua-Triticum aestivum for all compounds were compared. The compound that showed
+     the greatest selectivity for the system Echinochloa crus-galli-Oryza sativa was
+     8Cl-4Pr-D-DIBOA, which was 15 times more selective than the commercial herbicide
+     propanil (Cotanil-35). With regard to the system Avena fatua-Triticum aestivum,
+     the compounds that showed the highest selectivities were 8Cl-4Val-D-DIBOA and
+     6F-4Pr-D-DIBOA. The results obtained for 6F-4Pr-D-DIBOA are of great interest
+     because of the high phytotoxicity to Avena fatua (IC(50) = 6 µM, r(2) = 0.9616).
+   - 'To find the equation of the line passing through points $P=(1,2)$ and $Q=(-1,2)$,
+     we first find the slope of the line. Since the y-coordinates of both points are
+     the same, the slope is 0. Therefore, the line is horizontal and its equation is
+     given by:
+
+
+     $y = 2$
+
+
+     Now, we want to find the point $R$ where this line intersects the elliptic curve
+     $y^2 = x^3 - 2x + 5$. Since we know that $y=2$, we can substitute this value into
+     the equation of the curve:
+
+
+     $(2)^2 = x^3 - 2x + 5$
+
+
+     Simplifying, we get:
+
+
+     $4 = x^3 - 2x + 5$
+
+
+     Rearranging the terms, we have:
+
+
+     $x^3 - 2x + 1 = 0$
+
+
+     We know that $x=1$ and $x=-1$ are solutions to this equation since they correspond
+     to the points $P$ and $Q$. To find the third solution, we can use synthetic division
+     or factor the polynomial. Factoring, we get:
+
+
+     $(x-1)(x+1)(x-1) = 0$
+
+
+     So, the third solution is $x=1$. Substituting this value back into the equation
+     of the line, we find the corresponding y-coordinate:
+
+
+     $y = 2$
+
+
+     Thus, the third point of intersection is $R=(1,2)$. However, in the context of
+     elliptic curves, we should take the "sum" of the points $P$ and $Q$ as the negative
+     of the third intersection point. Since $R=(1,2)$, the negative of this point is
+     given by $-R=(1,-2)$. Therefore, the "sum" of the points $P$ and $Q$ on the elliptic
+     curve is:
+
+
+     $P + Q = -R = (1,-2)$.'
+   - The use of geospatial analysis may be subject to regulatory compliance depending
+     on the specific application and the jurisdiction in which it is used. For example,
+     the use of geospatial data for marketing purposes may be subject to privacy regulations,
+     and the use of geospatial data for land use planning may be subject to environmental
+     regulations. It is important to consult with legal counsel to ensure compliance
+     with all applicable laws and regulations.
+ - source_sentence: Does sLEDAI-2K Conceal Worsening in a Particular System When There
+     Is Overall Improvement?
+   sentences:
+   - To determine whether the Systemic Lupus Erythematosus Disease Activity Index 2000
+     (SLEDAI-2K) is valid in identifying patients who had a clinically important overall
+     improvement with no worsening in other descriptors/systems. Consecutive patients
+     with systemic lupus erythematosus with active disease who attended the Lupus Clinic
+     between 2000 and 2012 were studied. Based on the change in the total SLEDAI-2K
+     scores on last visit, patients were grouped as improved, flared/worsened, and
+     unchanged. Patients showing improvement were evaluated for the presence of new
+     active descriptors at last visit compared with baseline visit. Of the 158 patients
+     studied, 109 patients had improved, 38 remained unchanged, and 11 flared/worsened
+     at last visit. In the improved group, 11 patients had a new laboratory descriptor
+     that was not present at baseline visit. In those 11 patients, this new laboratory
+     descriptor was not clinically significant and did not require a change in disease
+     management.
+   - 'To find the dot product of two vectors using their magnitudes, angle between
+     them, and trigonometry, we can use the formula:
+
+
+     Dot product = |A| * |B| * cos(θ)
+
+
+     where |A| and |B| are the magnitudes of the vectors, and θ is the angle between
+     them.
+
+
+     In this case, |A| = 5 units, |B| = 8 units, and θ = 60 degrees.
+
+
+     First, we need to convert the angle from degrees to radians:
+
+
+     θ = 60 * (π / 180) = π / 3 radians
+
+
+     Now, we can find the dot product:
+
+
+     Dot product = 5 * 8 * cos(π / 3)
+
+     Dot product = 40 * (1/2)
+
+     Dot product = 20
+
+
+     So, the dot product of the two vectors is 20.'
+   - To determine if hospitals that routinely discharge patients early after lobectomy
+     have increased readmissions. Hospitals are increasingly motivated to reduce length
+     of stay (LOS) after lung cancer surgery, yet it is unclear if a routine of early
+     discharge is associated with increased readmissions. The relationship between
+     hospital discharge practices and readmission rates is therefore of tremendous
+     clinical and financial importance. The National Cancer Database was queried for
+     patients undergoing lobectomy for lung cancer from 2004 to 2013 at Commission
+     on Cancer-accredited hospitals, which performed at least 25 lobectomies in a 2-year
+     period. Facility discharge practices were characterized by a facility's median
+     LOS relative to the median LOS for all patients in that same time period. In all,
+     59,734 patients met inclusion criteria; 2687 (4.5%) experienced an unplanned readmission.
+     In a hierarchical logistic regression model, a routine of early discharge (defined
+     as a facility's tendency to discharge patients faster than the population median
+     in the same time period) was not associated with increased risk of readmission
+     (odds ratio 1.12, 95% confidence interval 0.97-1.28, P = 0.12). In a risk-adjusted
+     hospital readmission rate analysis, hospitals that discharged patients early did
+     not experience more readmissions (P = 0.39). The lack of effect of early discharge
+     practices on readmission rates was observed for both minimally invasive and thoracotomy
+     approaches.
+ - source_sentence: Does systemic administration of urocortin after intracerebral hemorrhage
+     reduce neurological deficits and neuroinflammation in rats?
+   sentences:
+   - Intracerebral hemorrhage (ICH) remains a serious clinical problem lacking effective
+     treatment. Urocortin (UCN), a novel anti-inflammatory neuropeptide, protects injured
+     cardiomyocytes and dopaminergic neurons. Our preliminary studies indicate UCN
+     alleviates ICH-induced brain injury when administered intracerebroventricularly
+     (ICV). The present study examines the therapeutic effect of UCN on ICH-induced
+     neurological deficits and neuroinflammation when administered by the more convenient
+     intraperitoneal (i.p.) route. ICH was induced in male Sprague-Dawley rats by intrastriatal
+     infusion of bacterial collagenase VII-S or autologous blood. UCN (2.5 or 25 μg/kg)
+     was administered i.p. at 60 minutes post-ICH. Penetration of i.p. administered
+     fluorescently labeled UCN into the striatum was examined by fluorescence microscopy.
+     Neurological deficits were evaluated by modified neurological severity score (mNSS).
+     Brain edema was assessed using the dry/wet method. Blood-brain barrier (BBB) disruption
+     was assessed using the Evans blue assay. Hemorrhagic volume and lesion volume
+     were assessed by Drabkin's method and morphometric assay, respectively. Pro-inflammatory
+     cytokine (TNF-α, IL-1β, and IL-6) expression was evaluated by enzyme-linked immunosorbent
+     assay (ELISA). Microglial activation and neuronal loss were evaluated by immunohistochemistry.
+     Administration of UCN reduced neurological deficits from 1 to 7 days post-ICH.
+     Surprisingly, although a higher dose (25 μg/kg, i.p.) also reduced the functional
+     deficits associated with ICH, it is significantly less effective than the lower
+     dose (2.5 μg/kg, i.p.). Beneficial results with the low dose of UCN included a
+     reduction in neurological deficits from 1 to 7 days post-ICH, as well as a reduction
+     in brain edema, BBB disruption, lesion volume, microglial activation and neuronal
+     loss 3 days post-ICH, and suppression of TNF-α, IL-1β, and IL-6 production 1,
+     3 and 7 days post-ICH.
+   - 'A perfect number is a positive integer that is equal to the sum of its proper
+     divisors (excluding itself). The first perfect numbers are 6, 28, 496, and 8128.
+     Perfect numbers can be generated using the formula 2^(p-1) * (2^p - 1), where
+     p and 2^p - 1 are both prime numbers.
+
+
+     The first five (p, 2^p - 1) pairs are:
+
+     (2, 3) - 6
+
+     (3, 7) - 28
+
+     (5, 31) - 496
+
+     (7, 127) - 8128
+
+     (13, 8191) - 33,550,336
+
+
+     To find the 6th perfect number, we need to find the next prime number p such that
+     2^p - 1 is also prime. The next such pair is (17, 131071). Using the formula:
+
+
+     2^(17-1) * (2^17 - 1) = 2^16 * 131071 = 65,536 * 131071 = 8,589,869,056
+
+
+     So, the 6th perfect number is 8,589,869,056.'
+   - 'In type theory, the successor function $S$ is used to represent the next number
+     in the sequence. When you apply the successor function $S$ three times to the
+     number 0, you get:
+
+
+     1. $S(0)$, which represents 1.
+
+     2. $S(S(0))$, which represents 2.
+
+     3. $S(S(S(0)))$, which represents 3.
+
+
+     So, the result of applying the successor function $S$ three times to the number
+     0 in type theory is 3.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on thenlper/gte-small
+   results:
+   - task:
+       type: logging
+       name: Logging
+     dataset:
+       name: ir eval
+       type: ir-eval
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9291020819957809
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9819315784646427
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9933963129413923
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9984407961111621
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9291020819957809
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.32731052615488093
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19867926258827848
+       name: Cosine Precision@5
+     - type: cosine_recall@1
+       value: 0.9291020819957809
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9819315784646427
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9933963129413923
+       name: Cosine Recall@5
+     - type: cosine_ndcg@10
+       value: 0.9670096227619588
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9565327512887825
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9565967419425125
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on thenlper/gte-small
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [thenlper/gte-small](https://huggingface.co/thenlper/gte-small). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) <!-- at revision 17e1f347d17fe144873b1201da91788898c639cd -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sucharush/gte_MNR")
+ # Run inference
+ sentences = [
+     'Does systemic administration of urocortin after intracerebral hemorrhage reduce neurological deficits and neuroinflammation in rats?',
+     "Intracerebral hemorrhage (ICH) remains a serious clinical problem lacking effective treatment. Urocortin (UCN), a novel anti-inflammatory neuropeptide, protects injured cardiomyocytes and dopaminergic neurons. Our preliminary studies indicate UCN alleviates ICH-induced brain injury when administered intracerebroventricularly (ICV). The present study examines the therapeutic effect of UCN on ICH-induced neurological deficits and neuroinflammation when administered by the more convenient intraperitoneal (i.p.) route. ICH was induced in male Sprague-Dawley rats by intrastriatal infusion of bacterial collagenase VII-S or autologous blood. UCN (2.5 or 25 μg/kg) was administered i.p. at 60 minutes post-ICH. Penetration of i.p. administered fluorescently labeled UCN into the striatum was examined by fluorescence microscopy. Neurological deficits were evaluated by modified neurological severity score (mNSS). Brain edema was assessed using the dry/wet method. Blood-brain barrier (BBB) disruption was assessed using the Evans blue assay. Hemorrhagic volume and lesion volume were assessed by Drabkin's method and morphometric assay, respectively. Pro-inflammatory cytokine (TNF-α, IL-1β, and IL-6) expression was evaluated by enzyme-linked immunosorbent assay (ELISA). Microglial activation and neuronal loss were evaluated by immunohistochemistry. Administration of UCN reduced neurological deficits from 1 to 7 days post-ICH. Surprisingly, although a higher dose (25 μg/kg, i.p.) also reduced the functional deficits associated with ICH, it is significantly less effective than the lower dose (2.5 μg/kg, i.p.). Beneficial results with the low dose of UCN included a reduction in neurological deficits from 1 to 7 days post-ICH, as well as a reduction in brain edema, BBB disruption, lesion volume, microglial activation and neuronal loss 3 days post-ICH, and suppression of TNF-α, IL-1β, and IL-6 production 1, 3 and 7 days post-ICH.",
+     'In type theory, the successor function $S$ is used to represent the next number in the sequence. When you apply the successor function $S$ three times to the number 0, you get:\n\n1. $S(0)$, which represents 1.\n2. $S(S(0))$, which represents 2.\n3. $S(S(S(0)))$, which represents 3.\n\nSo, the result of applying the successor function $S$ three times to the number 0 in type theory is 3.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
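Because the card's final `Normalize()` module produces unit-length vectors, cosine similarity reduces to a plain dot product. A minimal retrieval sketch over precomputed embeddings (plain NumPy; the random vectors below are stand-ins for real `model.encode` output):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """Unit-normalize rows, mirroring the model's Normalize() module."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for model.encode(...) output: unit-normalized 384-d vectors
query_emb = normalize(rng.standard_normal((1, 384)))
corpus_emb = normalize(rng.standard_normal((5, 384)))

# With unit vectors, cosine similarity is a matrix product
scores = query_emb @ corpus_emb.T        # (1, 5)
ranking = np.argsort(-scores[0])         # best match first
print(ranking[0], scores[0, ranking[0]])
```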
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Logging
+
+ * Dataset: `ir-eval`
+ * Evaluated with <code>__main__.LoggingEvaluator</code>
+
+ | Metric             | Value     |
+ |:-------------------|:----------|
+ | cosine_accuracy@1  | 0.9291    |
+ | cosine_accuracy@3  | 0.9819    |
+ | cosine_accuracy@5  | 0.9934    |
+ | cosine_accuracy@10 | 0.9984    |
+ | cosine_precision@1 | 0.9291    |
+ | cosine_precision@3 | 0.3273    |
+ | cosine_precision@5 | 0.1987    |
+ | cosine_recall@1    | 0.9291    |
+ | cosine_recall@3    | 0.9819    |
+ | cosine_recall@5    | 0.9934    |
+ | **cosine_ndcg@10** | **0.967** |
+ | cosine_mrr@10      | 0.9565    |
+ | cosine_map@100     | 0.9566    |
+
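For reference, ranking metrics like the ones in the table above can be computed from a query-by-document similarity matrix. A simplified sketch assuming exactly one relevant document per query (which is why accuracy@k and recall@k coincide here); this is not the evaluator's actual implementation:

```python
import numpy as np

def accuracy_at_k(sim: np.ndarray, relevant: np.ndarray, k: int) -> float:
    """Fraction of queries whose single relevant doc appears in the top-k."""
    topk = np.argsort(-sim, axis=1)[:, :k]
    return float(np.mean([relevant[i] in topk[i] for i in range(len(relevant))]))

def mrr_at_k(sim: np.ndarray, relevant: np.ndarray, k: int = 10) -> float:
    """Mean reciprocal rank of the relevant doc, counting only the top-k."""
    ranks = np.argsort(-sim, axis=1)[:, :k]
    rr = []
    for i, rel in enumerate(relevant):
        hits = np.where(ranks[i] == rel)[0]
        rr.append(1.0 / (hits[0] + 1) if hits.size else 0.0)
    return float(np.mean(rr))

# Toy 3-query, 4-doc similarity matrix; relevant doc indices per query below
sim = np.array([[0.9, 0.1, 0.2, 0.3],
                [0.2, 0.3, 0.8, 0.1],   # query 1's relevant doc is ranked 2nd
                [0.1, 0.2, 0.3, 0.7]])
relevant = np.array([0, 1, 3])
print(accuracy_at_k(sim, relevant, 1))  # 2/3: queries 0 and 2 hit at rank 1
print(mrr_at_k(sim, relevant))          # (1 + 1/2 + 1) / 3
```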
539
+ <!--
540
+ ## Bias, Risks and Limitations
541
+
542
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
543
+ -->
544
+
545
+ <!--
546
+ ### Recommendations
547
+
548
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
549
+ -->
550
+
551
+ ## Training Details
552
+
553
+ ### Training Dataset
554
+
555
+ #### Unnamed Dataset
556
+
557
+ * Size: 98,112 training samples
558
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
559
+ * Approximate statistics based on the first 1000 samples:
560
+ | | sentence_0 | sentence_1 |
561
+ |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
562
+ | type | string | string |
563
+ | details | <ul><li>min: 6 tokens</li><li>mean: 44.14 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 12 tokens</li><li>mean: 321.5 tokens</li><li>max: 512 tokens</li></ul> |
564
+ * Samples:
565
+ | sentence_0 | sentence_1 |
566
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------|
567
+ | <code>Are transcobalamin II receptor polymorphisms associated with increased risk for neural tube defects?</code> | <code>Women who have low cobalamin (vitamin B(12)) levels are at increased risk for having children with neural tube defects (NTDs). The transcobalamin II receptor (TCblR) mediates uptake of cobalamin into cells. Inherited variants in the TCblR gene as NTD risk factors were evaluated. Case-control and family-based tests of association were used to screen common variation in TCblR as genetic risk factors for NTDs in a large Irish group. A confirmatory group of NTD triads was used to test positive findings. 2 tightly linked variants associated with NTDs in a recessive model were found: TCblR rs2336573 (G220R; p(corr)=0.0080, corrected for multiple hypothesis testing) and TCblR rs9426 (p(corr)=0.0279). These variants were also associated with NTDs in a family-based test before multiple test correction (log-linear analysis of a recessive model: rs2336573 (G220R; RR=6.59, p=0.0037) and rs9426 (RR=6.71, p=0.0035)). A copy number variant distal to TCblR and two previously unreported exonic insertio...</code> |
+ | <code>A company produces three products: Product A, B, and C. The monthly sales figures and marketing expenses (in thousands of dollars) for each product for the last six months are given below:<br><br>| Product | Sales1 | Sales2 | Sales3 | Sales4 | Sales5 | Sales6 | Marketing Expense1 | Marketing Expense2 | Marketing Expense3 | Marketing Expense4 | Marketing Expense5 | Marketing Expense6 |<br>|---------|--------|--------|--------|--------|--------|--------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|<br>| A | 50 | 45 | 55 | 52 | 48 | 56 | 20 | 18 | 25 | 22 | 19 | 30 |<br>| B | 40 | 48 | 35 | 37 | 45 | 38 | 12 | 15 | 10 | 14 | 17 | 11 |<br>| C | 60 | 65 | ...</code> | <code>To calculate the covariance between the sales of Product A and Product B, we first need to find the mean sales for both products. Then, we will calculate the deviations from the mean for each month's sales and multiply these deviations for both products. Finally, we will sum these products and divide by the number of months minus 1.<br><br>Mean sales for Product A:<br>(50 + 45 + 55 + 52 + 48 + 56) / 6 = 306 / 6 = 51<br><br>Mean sales for Product B:<br>(40 + 48 + 35 + 37 + 45 + 38) / 6 = 243 / 6 = 40.5<br><br>Deviations from the mean for Product A:<br>(50 - 51, 45 - 51, 55 - 51, 52 - 51, 48 - 51, 56 - 51) = (-1, -6, 4, 1, -3, 5)<br><br>Deviations from the mean for Product B:<br>(40 - 40.5, 48 - 40.5, 35 - 40.5, 37 - 40.5, 45 - 40.5, 38 - 40.5) = (-0.5, 7.5, -5.5, -3.5, 4.5, -2.5)<br><br>Multiply the deviations for both products:<br>(-1 * -0.5, -6 * 7.5, 4 * -5.5, 1 * -3.5, -3 * 4.5, 5 * -2.5) = (0.5, 45, -22, -3.5, -13.5, -12.5)<br><br>Sum of the products of deviations:<br>0.5 + 45 - 22 - 3.5 - 13.5 - 12.5 = -6<br><br>Covariance between the sale...</code> |
+ | <code>Consider a basketball player who has a free-throw shooting percentage of 80%. The player attempts 10 free throws in a game. <br><br>If the player makes a free throw, there is an 80% chance that they will make their next free throw attempt. If they miss a free throw, there's a 60% chance that they will make their next free throw attempt.<br><br>What is the probability that the player makes exactly 7 out of their 10 free throw attempts?</code> | <code>To solve this problem, we can use the concept of conditional probability and the binomial theorem. Let's denote the probability of making a free throw after a successful attempt as P(S) = 0.8 and the probability of making a free throw after a missed attempt as P(M) = 0.6.<br><br>We need to find the probability of making exactly 7 out of 10 free throw attempts. There are multiple ways this can happen, and we need to consider all possible sequences of 7 successes (S) and 3 misses (M). We can represent these sequences as a string of S and M, for example, SSSSSSSMMM.<br><br>There are C(10, 7) = 10! / (7! * 3!) = 120 ways to arrange 7 successes and 3 misses in a sequence of 10 attempts. For each of these sequences, we can calculate the probability of that specific sequence occurring and then sum up the probabilities of all sequences.<br><br>Let's calculate the probability of a specific sequence. For example, consider the sequence SSSSSSSMMM. The probability of this sequence occurring is:<br><br>P(SSSSSSSMMM) = P(S...</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
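For reference, MultipleNegativesRankingLoss scores each anchor against every in-batch positive, scales the cosine similarities by `scale` (20.0 here), and applies cross-entropy with the matching pair as the target. A minimal pure-Python sketch of that computation (illustrative only, not the sentence-transformers implementation; the toy 2-D vectors are made up):

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i should rank positive i
    above every other positive in the batch (in-batch negatives)."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_z - logits[i])  # -log softmax at the true index
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(mnr_loss(anchors, positives))  # close to zero: each anchor matches its own positive
```

With a larger batch, every extra pair adds one more in-batch negative per anchor, which is why this loss benefits from large batch sizes and the `no_duplicates` batch sampler used below.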
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `num_train_epochs`: 1
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | ir-eval_cosine_ndcg@10 |
+ |:------:|:----:|:-------------:|:----------------------:|
+ | 0.1631 | 500  | 0.0634        | 0.9563                 |
+ | 0.3262 | 1000 | 0.005         | 0.9627                 |
+ | 0.4892 | 1500 | 0.0037        | 0.9631                 |
+ | 0.6523 | 2000 | 0.0029        | 0.9660                 |
+ | 0.8154 | 2500 | 0.0033        | 0.9663                 |
+ | 1.0    | 3066 | -             | 0.9670                 |
+ | 0.9785 | 3000 | 0.0027        | 0.9670                 |
+
+ ### Framework Versions
+ - Python: 3.12.8
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.51.3
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title = {Efficient Natural Language Response Suggestion for Smart Reply},
+     author = {Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year = {2017},
+     eprint = {1705.00652},
+     archivePrefix = {arXiv},
+     primaryClass = {cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
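As a sanity check, these dimensions imply roughly 33.4M parameters under the standard BERT layout (pooler included), consistent with the ~133 MB float32 `model.safetensors` in this commit. The sketch below is a back-of-the-envelope count, not an exact audit of the checkpoint (the file size also includes the safetensors header):

```python
# Dimensions from config.json above
hidden, layers, ffn = 384, 12, 1536
vocab, positions, types = 30522, 512, 2

embeddings = hidden * (vocab + positions + types) + 2 * hidden  # word/pos/type tables + LayerNorm
attention = 4 * (hidden * hidden + hidden)                      # Q, K, V, output projections (+ biases)
ffn_block = (hidden * ffn + ffn) + (ffn * hidden + hidden)      # up- and down-projection
layer = attention + ffn_block + 2 * 2 * hidden                  # + two LayerNorms per layer
pooler = hidden * hidden + hidden                               # BertModel's pooler head

total = embeddings + layers * layer + pooler
print(total, total * 4)  # parameter count, and its float32 byte size
```

This lands at 33,360,000 parameters, i.e. about 133.4 MB in float32, matching the 133,462,128-byte `model.safetensors` below to within the header overhead.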
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.51.3",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c51e170326117229b63b29a3cfc35d8ac3fe1c089636235a2ab3acf19f76b5f8
+ size 133462128
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
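Per this config, each sequence is wrapped in `[CLS]` … `[SEP]` (IDs 101 and 102) and right-padded with `[PAD]` (ID 0) to the batch width. A schematic sketch of that batching step; the wordpiece IDs in the example are hypothetical, real IDs come from `vocab.txt`:

```python
CLS, SEP, PAD = 101, 102, 0  # special-token IDs from tokenizer_config.json

def encode_batch(sequences, pad_id=PAD):
    """Wrap each ID sequence with [CLS]/[SEP], then right-pad the batch."""
    wrapped = [[CLS] + seq + [SEP] for seq in sequences]
    width = max(len(w) for w in wrapped)
    input_ids = [w + [pad_id] * (width - len(w)) for w in wrapped]
    attention_mask = [[1] * len(w) + [0] * (width - len(w)) for w in wrapped]
    return input_ids, attention_mask

ids, mask = encode_batch([[7592, 2088], [7592]])  # hypothetical wordpiece IDs
print(ids)   # [[101, 7592, 2088, 102], [101, 7592, 102, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The attention mask is what the Pooling module uses to exclude `[PAD]` positions when mean-pooling token embeddings.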
vocab.txt ADDED
The diff for this file is too large to render. See raw diff