Commit 9b08750 (verified), committed by amentaphd · 1 parent: 18159dd

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ checkpoint-1000/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
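The pooling configuration above enables only CLS-token pooling (`pooling_mode_cls_token: true`) over 768-dimensional token embeddings. To illustrate what that setting does, here is a minimal NumPy sketch, not the sentence-transformers implementation: it takes each sequence's first-token hidden state as the sentence vector and then L2-normalizes it, mirroring the `Normalize()` module listed in the model card's architecture. The random `hidden` array is a hypothetical stand-in for transformer output, and `mean_pool` is included only to contrast with the disabled `pooling_mode_mean_tokens` option.

```python
import numpy as np


def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """CLS pooling: keep the hidden state of the first token of each sequence.

    token_embeddings has shape (batch, seq_len, hidden_dim).
    """
    return token_embeddings[:, 0, :]


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean pooling (disabled in this config; shown only for contrast)."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


# Hypothetical transformer output: 2 sequences, 5 tokens each, 768 dims.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 5, 768))
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 1, 0, 0]])  # second sequence ends with 2 padding tokens

sentence_vecs = l2_normalize(cls_pool(hidden))
print(sentence_vecs.shape)  # (2, 768)

# With unit-length vectors, cosine similarity reduces to a dot product.
cosine = sentence_vecs @ sentence_vecs.T
print(np.round(np.diag(cosine), 6))  # [1. 1.]
```

Because CLS pooling reads only position 0, padding tokens never enter the sentence vector; mean pooling, by contrast, needs the attention mask to exclude them.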
README.md ADDED
@@ -0,0 +1,873 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:46338
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-m-v2.0
+ widget:
+ - source_sentence: What role does ESMA play in the development of guidelines and regulatory
+     technical standards related to cooperation arrangements with third countries as
+     mentioned in the text?
+   sentences:
+   - 'If a planned change is implemented notwithstanding the first and second subparagraphs,
+     or if an unplanned change has taken place pursuant to which the AIFM’s management
+     of the AIF would no longer comply with this Directive or the AIFM otherwise would
+     no longer comply with this Directive, the competent authorities of the home Member
+     State of the AIFM shall take all due measures in accordance with Article 46, including,
+     if necessary, the express prohibition of marketing of the AIF.
+
+
+     If the changes are acceptable because they do not affect the compliance of the
+     AIFM’s management of the AIF with this Directive, or the compliance by the AIFM
+     with this Directive otherwise, the competent authorities of the home Member State
+     of the AIFM shall, without delay, inform ESMA in so far as the changes concern
+     the termination of the marketing of certain AIFs or additional AIFs marketed and,
+     if applicable, the competent authorities of the host Member States of the AIFM
+     of those changes.
+
+
+     11.
+
+
+     The Commission shall adopt, by means of delegated acts in accordance with Article
+     56 and subject to the conditions of Articles 57 and 58, measures regarding the
+     cooperation arrangements referred to in point (a) of paragraph 2 in order to design
+     a common framework to facilitate the establishment of those cooperation arrangements
+     with third countries.
+
+
+     12.
+
+
+     In order to ensure uniform application of this Article, ESMA may develop guidelines
+     to determine the conditions of application of the measures adopted by the Commission
+     regarding the cooperation arrangements referred to in point (a) of paragraph 2.
+
+
+     13.
+
+
+     ESMA shall develop draft regulatory technical standards to determine the minimum
+     content of the cooperation arrangements referred to in point (a) of paragraph
+     2 so as to ensure that both the competent authorities of the home and the host
+     Member States receive sufficient information in order to be able to exercise their
+     supervisory and investigatory powers under this Directive.
+
+
+     Power is delegated to the Commission to adopt the regulatory technical standards
+     referred to in the first subparagraph in accordance with Article 10 to 14 of Regulation
+     (EU) No 1095/2010.
+
+
+     14.'
+   - (23) This Regulation should also apply to Union institutions, bodies, offices
+     and agencies when acting as a provider or deployer of an AI system.
+   - An operator that is a natural person or a microenterprise may mandate the next
+     operator or trader further down the supply chain that is not a natural person
+     or a microenterprise to act as an authorised representative. Such next operator
+     or trader further down the supply chain shall not place or make available relevant
+     products on the market or export them without submitting the due diligence statement
+     pursuant to Article 4(2) on behalf of that operator. In such cases, the operator
+     that is a natural person or a microenterprise shall retain responsibility for
+     compliance of the relevant product with Article 3, and shall communicate to that
+     next operator or trader further down the supply chain all information necessary
+     to confirm that due
+ - source_sentence: A review is scheduled for June 2019 to determine if the regulations
+     regarding hazardous substances should be broadened, based on practical experiences.
+     Additionally, the Commission aims to promote alternatives to animal testing by
+     reassessing testing requirements, potentially leading to amendments that prioritize
+     health and environmental safety.
+   sentences:
+   - '18 June 1994, until such plant and machinery is disposed of; (b) in the case
+     of the maintenance of plant and machinery already in service within a Member State
+     on 18 June 1994. For the purposes of point (a) Member States may, on grounds of
+     human health protection and environmental protection, prohibit within their territory
+     the use of such plant or machinery before it is disposed of. 25. Monomethyl-dichloro-diphenyl
+     methane Trade name: Ugilec 121 Ugilec 21 Shall not be placed on the market, or
+     used, as a substance or in mixtures. Articles containing the substance shall not
+     be placed on the market. 26. Monomethyl-dibromo-diphenyl methane bromobenzylbromotoluene,
+     mixture of isomers Trade name: DBBT CAS No 99688-47-8 Shall not be placed on'
+   - (35) | The fight against litter is a shared effort between competent authorities,
+     producers and consumers. Public authorities, including the Union institutions,
+     should lead by example.
+   - '7.
+
+
+     By 1 June 2013 the Commission shall carry out a review to assess whether or not,
+     taking into account latest developments in scientific knowledge, to extend the
+     scope of Article 60(3) to substances identified under Article 57(f) as having
+     endocrine disrupting properties. On the basis of that review the Commission may,
+     if appropriate, present legislative proposals.
+
+
+     8.
+
+
+     By 1 June 2019, the Commission shall carry out a review to assess whether or not
+     to extend the scope of Article 33 to cover other dangerous substances, taking
+     into account the practical experience in implementing that Article. On the basis
+     of that review, the Commission may, if appropriate, present legislative proposals
+     to extend that obligation.
+
+
+     9.
+
+
+     In accordance with the objective of promoting non-animal testing and the replacement,
+     reduction or refinement of animal testing required under this Regulation, the
+     Commission shall review the testing requirements of Section 8.7 of Annex VIII
+     by 1 June 2019. On the basis of this review, while ensuring a high level of protection
+     of health and the environment, the Commission may propose an amendment in accordance
+     with the procedure referred to in Article 133(4).
+
+
+     Article 139
+
+
+     Repeals
+
+
+     Directive 91/155/EEC shall be repealed.
+
+
+     Directives 93/105/EC and 2000/21/EC and Regulations (EEC) No 793/93 and (EC) No
+     1488/94 shall be repealed with effect from 1 June 2008.
+
+
+     Directive 93/67/EEC shall be repealed with effect from 1 August 2008.
+
+
+     Directive 76/769/EEC shall be repealed with effect from 1 June 2009.
+
+
+     References to the repealed acts shall be construed as references to this Regulation.
+
+
+     Article 140
+
+
+     Amendment of Directive 1999/45/EC
+
+
+     Article 14 of Directive 1999/45/EC shall be deleted.
+
+
+     Article 141
+
+
+     Entry into force and application
+
+
+     1.
+
+
+     This Regulation shall enter into force on 1 June 2007.
+
+
+     2.
+
+
+     Titles II, III, V, VI, VII, XI and XII as well as Articles 128 and 136 shall apply
+     from 1 June 2008.
+
+
+     3.
+
+
+     Article 135 shall apply from 1 August 2008.
+
+
+     4.
+
+
+     Title VIII and Annex XVII shall apply from 1 June 2009.
+
+
+     This Regulation shall be binding in its entirety and directly applicable in all
+     Member States.
+
+
+     LIST OF ANNEXES
+
+
+     ANNEX I GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING CHEMICAL SAFETY
+     REPORTS ANNEX II REQUIREMENTS FOR THE COMPILATION OF SAFETY DATA SHEETS ANNEX
+     III CRITERIA FOR SUBSTANCES REGISTERED IN QUANTITIES BETWEEN 1 AND 10 TONNES ANNEX
+     IV EXEMPTIONS FROM THE OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE 2(7)(a)
+     ANNEX V EXEMPTIONS FROM THE OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE
+     2(7)(b) ANNEX VI INFORMATION REQUIREMENTS REFERRED TO IN ARTICLE 10 ANNEX VII
+     STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES
+     OF ONE TONNE OR MORE ANNEX VIII STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES
+     MANUFACTURED OR IMPORTED IN QUANTITIES OF 10 TONNES OR MORE ANNEX IX STANDARD
+     INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES
+     OF 100 TONNES OR MORE ANNEX X STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES
+     MANUFACTURED OR IMPORTED IN QUANTITIES OF 1 000 TONNES OR MORE ANNEX XI GENERAL
+     RULES FOR ADAPTATION OF THE STANDARD TESTING REGIME SET OUT IN ANNEXES VII TO
+     X ANNEX XII GENERAL PROVISIONS FOR DOWNSTREAM USERS TO ASSESS SUBSTANCES AND PREPARE
+     CHEMICAL SAFETY REPORTS ANNEX XIII CRITERIA FOR THE IDENTIFICATION OF PERSISTENT,
+     BIOACCUMULATIVE AND TOXIC SUBSTANCES, AND VERY PERSISTENT AND VERY BIOACCUMULATIVE
+     SUBSTANCES ANNEX XIV LIST OF SUBSTANCES SUBJECT TO AUTHORISATION ANNEX XV DOSSIERS
+     ANNEX XVI SOCIO-ECONOMIC ANALYSIS ANNEX XVII RESTRICTIONS ON THE MANUFACTURE,
+     PLACING ON THE MARKET AND USE OF CERTAIN DANGEROUS SUBSTANCES, MIXTURES AND ARTICLES
+
+
+     ANNEX I
+
+
+     GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING CHEMICAL SAFETY REPORTS
+
+
+     0. INTRODUCTION
+
+
+     ▼M51'
+ - source_sentence: What actions must the Commission take if the economic operator
+     does not provide commitments or if the provided commitments are deemed inappropriate
+     or insufficient to address the distortion?
+   sentences:
+   - '2.
+
+
+     Where the economic operator concerned does not offer commitments or where the
+     Commission considers that the commitments referred to in paragraph 1 are neither
+     appropriate nor sufficient to fully and effectively remedy the distortion, the
+     Commission shall adopt an implementing act in the form of a decision prohibiting
+     the award of the contract to the economic operator concerned (‘decision prohibiting
+     the award of the contract’). That implementing act shall be adopted in accordance
+     with the advisory procedure referred to in Article 48(2). Following that decision,
+     the contracting authority or contracting entity shall reject the tender.
+
+
+     3.'
+   - 6,5 8,9 (1) The values for biogas production from manure include negative emissions
+     for emissions saved from raw manure management. The value of esca considered is
+     equal to – 45 g CO2eq/MJ manure used in anaerobic digestion. (2) Maize whole
+     plant means maize harvested as fodder and ensiled for preservation. (3) Transport
+     of agricultural raw materials to the transformation plant is, according to the
+     methodology provided in the Commission's report of 25 February 2010 on sustainability
+     requirements for the use of solid and gaseous biomass sources in electricity,
+     heating and cooling, included in the ‘cultivation’ value. The value for transport
+     of maize silage accounts for 0,4 g CO2eq/MJ biogas.
+   - reduction in the consumption of lightweight plastic carrier bags. It should be
+     possible for Member States, while observing the general rules laid down in the
+     TFEU and acting in accordance with this Regulation, to adopt provisions which
+     go beyond the minimum waste prevention targets set out in this Regulation. When
+     implementing such measures, Member States should be aware of the risk of a shift
+     from heavier to lighter packaging materials and should prioritise measures that
+     minimise that risk.
+ - source_sentence: The content provides a comprehensive overview of numerous chemical
+     substances, including their structural formulas and potential applications. It
+     emphasizes the significance of specific compounds like acrylamide and thioacetamide,
+     while also addressing mixtures derived from coal tar. The information reflects
+     the intricate nature of chemical synthesis and the importance of understanding
+     the properties and uses of these compounds in various industrial contexts.
+   sentences:
+   - '2.
+
+
+     Each Member State shall ensure that a producer as defined in Article 3(1)(f)(iv)
+     and established on its territory, which sells EEE to another Member State in which
+     it is not established, appoints an authorised representative in that Member State
+     as the person responsible for fulfilling the obligations of that producer, pursuant
+     to this Directive, on the territory of that Member State.
+
+
+     3.
+
+
+     Appointment of an authorised representative shall be by written mandate.
+
+
+     Article 18
+
+
+     Administrative cooperation and exchange of information'
+   - '(a) display to customers and potential customers, in a visible manner, the labels
+     provided in accordance with Article 32(1), point (b) or (c); (b) make reference
+     to the information included on the labels provided in accordance with Article
+     32(1), point (b) or (c), in visual advertisements or in technical promotional
+     material for a specific model, in accordance with the applicable delegated acts
+     adopted pursuant to Article 4; and --- --- (c) not provide or display other labels,
+     marks, symbols or inscriptions that are likely to mislead or confuse customers
+     and potential customers with regard to the information included on the label regarding
+     ecodesign requirements. --- ---
+
+
+     Article 32
+
+
+     Obligations related to labels'
+   - '[2] 612-196-00-0 202-441-6 [1] 221-627-8 [2] 95-69-2 [1] 3165-93-3 [2] ►M5 —
+     ◄ 2,4,5-Trimethylaniline [1] 2,4,5-trimethylaniline hydrochloride [2] 612-197-00-6
+     205-282-0 [1] -[2] 137-17-7 [1] 21436-97-5 [2] ►M5 — ◄ 4,4''-Thiodianiline [1]
+     and its salts 612-198-00-1 205-370-9 [1] 139-65-1 [1] ►M5 — ◄ 4,4''-Oxydianiline
+     [1] and its salts p-Aminophenyl ether [1] 612-199-00-7 202-977-0 [1] 101-80-4
+     [1] ►M5 — ◄ 2,4-Diaminoanisole [1] 4-methoxy-m-phenylenediamine 2,4-diaminoanisole
+     sulphate [2] 612-200-00-0 210-406-1 [1] 254-323-9 [2] 615-05-4 [1] 39156-41-7
+     [2] N, N,N'',N''-tetramethyl-4,4''-methylendianiline 612-201-00-6 202-959-2 101-61-1
+     C.I. Basic Violet 3 with ≥ 0,1 % of Michler''s ketone (EC No 202-027-5) 612-205-00-8
+     208-953-6 548-62-9 ►M5 — ◄ 6-Methoxy-m-toluidine p-cresidine 612-209-00-X 204-419-1
+     120-71-8 ►M5 — ◄ [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
+     "32012R0109: INSERTED") Biphenyl-3,3′,4,4′-tetrayltetraamine; Diaminobenzidine
+     612-239-00-3 202-110-6 91-95-2 (2-chloroethyl)(3-hydroxypropyl)ammonium chloride
+     612-246-00-1 429-740-6 40722-80-3 3-Amino-9-ethyl carbazole; 9-Ethylcarbazol-3-ylamine
+     612-280-00-7 205-057-7 132-32-1 [▼M49](./../../../legal-content/EN/AUTO/?uri=celex:32018R0675
+     "32018R0675: INSERTED") Reaction products of paraformaldehyde and 2-hydroxypropylamine
+     (ratio 3:2); [formaldehyde released from 3,3′-methylenebis[5-methyloxazolidine];
+     formaldehyde released from oxazolidin]; [MBO] 612-290-00-1 — — Reaction products
+     of paraformaldehyde with 2-hydroxypropylamine (ratio 1:1); [formaldehyde released
+     from α,α,α-trimethyl-1,3,5-triazine-1,3,5(2H,4H,6H)-triethanol]; [HPT] 612-291-00-7
+     — — Methylhydrazine 612-292-00-2 200-471-4 60-34-4 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
+     "32006R1907R(01): REPLACED") Ethyleneimine; aziridine 613-001-00-1 205-793-9 151-56-4
+     2-Methylaziridine; propyleneimine 613-033-00-6 200-878-7 75-55-8 ►M5 — ◄ Captafol
+     (ISO); 1,2,3,6-tetrahydro-N-(1,1,2,2-tetrachloroethylthio) phthalimide 613-046-00-7
+     219-363-3 2425-06-1 Carbadox (INN); methyl 3-(quinoxalin-2-ylmethylene)carbazate
+     1,4-dioxide; 2-(methoxycarbonylhydrazonomethyl) quinoxaline 1,4-dioxide 613-050-00-9
+     229-879-0 6804-07-5 A mixture of: 1,3,5-tris(3-aminomethylphenyl)-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione;
+     a mixture of oligomers of 3,5-bis(3-aminomethylphenyl)-1-poly[3,5-bis(3-aminomethylphenyl)-2,4,6-trioxo-1,3,5-(1H,3H,5H)-triazin-1-yl]-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione
+     613-199-00-X 421-550-1 — [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
+     "32012R0109: INSERTED") Quinoline 613-281-00-5 202-051-6 91-22-5 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
+     "32006R1907R(01): REPLACED") Acrylamide 616-003-00-0 201-173-7 79-06-1 [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
+     "32021R2204: INSERTED") Butanone oxime; ethyl methyl ketoxime; ethyl methyl ketone
+     oxime 616-014-00-0 202-496-6 96-29-7 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
+     "32006R1907R(01): REPLACED") Thioacetamide 616-026-00-6 200-541-4 62-55-5 A mixture
+     of: N-[3-hydroxy-2-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide;
+     N-[2,3-Bis-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide; methacrylamide;
+     2-methyl-N-(2-methyl-acryloylaminomethoxymethyl)-acrylamide; N-2,3-dihydroxypropoxymethyl)-2-methylacrylamide
+     616-057-00-5 412-790-8 — [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
+     "32012R0109: INSERTED") N-[6,9-dihydro-9-[[2-hydroxy-1-(hydroxymethyl)ethoxy]methyl]-6-oxo-1H-purin-2-yl]acetamide
+     616-148-00-X 424-550-1 84245-12-5 [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
+     "32021R2204: INSERTED") N-(hydroxymethyl)acrylamide; methylolacrylamide; [NMA]
+     616-230-00-5 213-103-2 924-42-5 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
+     "32006R1907R(01): REPLACED") Distillates (coal tar), benzole fraction; Light oil
+     (A complex combination of hydrocarbons obtained by the distillation of coal tar.
+     It consists of hydrocarbons having carbon numbers primarily in the range of C4
+     to C10 and distilling in the approximate range of 80 to 160 °C.) 648-001-00-0
+     283-482-7 84650-02-2 Tar oils, brown-coal; Light oil (The distillate from lignite
+     tar boiling in the range of approximately 80 to 250 °C. Composed primarily of
+     aliphatic and aromatic hydrocarbons and monobasic phenols.) 648-002-00-6 302-674-4
+     94114-40-6 J Benzol forerunnings (coal); Light oil redistillate, low boiling'
+ - source_sentence: How does the new Eurostat methodology differ in scope from the
+     indicators used in this Directive for calculating energy consumption?
+   sentences:
+   - (29) The methodology for calculation of primary energy consumption and final energy
+     consumption is aligned with the new Eurostat methodology, but the indicators used
+     for the purpose of this Directive have a different scope, in that they exclude
+     ambient energy and include energy consumption in international aviation for the
+     targets in primary energy consumption and final energy consumption. The use of
+     new indicators also implies that any changes in energy consumption of blast furnaces
+     are now only reflected in primary energy consumption.
+   - (92) InvestEU is the Union flagship programme to boost investment, especially
+     the green and digital transition, by providing financing and technical assistance,
+     for instance through blending mechanisms. Such an approach contributes to crowd
+     in additional public and private capital. Moreover, Member States are encouraged
+     to contribute to the InvestEU Member State compartment to support financial products
+     available to net-zero technology manufacturing, without prejudice to applicable
+     State aid rules.
+   - be used, filled or transported through the system; --- --- (iii) specify the terms
+     and conditions for proper handling and packaging use; --- --- (iv) specify detailed
+     requirements for packaging reconditioning; --- --- (v) specify the requirements
+     for packaging collection; --- --- (vi) specify the requirements for packaging
+     storage; --- --- (vii) specify the requirements for packaging filling or uploading;
+     --- --- (viii) specify rules to ensure the effective and efficient collection
+     of reusable packaging, including by providing for incentives for end users to
+     return the packaging to the collection points or grouped collection system; ---
+     --- (ix) specify rules to ensure equal and fair access to the re-use system, including
+     for vulnerable
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.7136198860693941
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9243915069911963
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9589159330226135
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.981874676333506
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.7136198860693941
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.30813050233039874
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1917831866045227
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09818746763335057
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.7136198860693941
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9243915069911963
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9589159330226135
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.981874676333506
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8626251072928146
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.8227635844026309
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.8236564067385257
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0) <!-- at revision 5d1bbbdf0d1c2772eff7961f4cdc32b8426dac69 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: GteModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub (replace the placeholder with this model's repo id).
+ # The base model's card loads it with trust_remote_code=True because the
+ # GteModel backbone is provided as custom code.
+ model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)
+ # Run inference
+ sentences = [
+     'How does the new Eurostat methodology differ in scope from the indicators used in this Directive for calculating energy consumption?',
+     '(29) The methodology for calculation of primary energy consumption and final energy consumption is aligned with the new Eurostat methodology, but the indicators used for the purpose of this Directive have a different scope, in that they exclude ambient energy and include energy consumption in international aviation for the targets in primary energy consumption and final energy consumption. The use of new indicators also implies that any changes in energy consumption of blast furnaces are now only reflected in primary energy consumption.',
+     '(92) InvestEU is the Union flagship programme to boost investment, especially the green and digital transition, by providing financing and technical assistance, for instance through blending mechanisms. Such an approach contributes to crowd in additional public and private capital. Moreover, Member States are encouraged to contribute to the InvestEU Member State compartment to support financial products available to net-zero technology manufacturing, without prejudice to applicable State aid rules.',
+ ]
+ embeddings = model.encode(sentences)  # NumPy array, one row per sentence
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.7136     |
+ | cosine_accuracy@3   | 0.9244     |
+ | cosine_accuracy@5   | 0.9589     |
+ | cosine_accuracy@10  | 0.9819     |
+ | cosine_precision@1  | 0.7136     |
+ | cosine_precision@3  | 0.3081     |
+ | cosine_precision@5  | 0.1918     |
+ | cosine_precision@10 | 0.0982     |
+ | cosine_recall@1     | 0.7136     |
+ | cosine_recall@3     | 0.9244     |
+ | cosine_recall@5     | 0.9589     |
+ | cosine_recall@10    | 0.9819     |
+ | **cosine_ndcg@10**  | **0.8626** |
+ | cosine_mrr@10       | 0.8228     |
+ | cosine_map@100      | 0.8237     |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 46,338 training samples
+ * Columns: <code>query_text</code> and <code>doc_text</code>
+ * Approximate statistics based on the first 1000 samples:
+ |         | query_text | doc_text |
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+ | type    | string | string |
+ | details | <ul><li>min: 9 tokens</li><li>mean: 39.44 tokens</li><li>max: 311 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 233.15 tokens</li><li>max: 1900 tokens</li></ul> |
+ * Samples:
+ | query_text | doc_text |
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
598
+ | <code>The regulation's applicability extends to various stakeholders involved in AI systems, including providers, deployers, importers, and manufacturers, regardless of their location. It specifically addresses high-risk AI systems and outlines the limitations of its scope, particularly concerning national security and military applications. Additionally, it clarifies that it does not interfere with the responsibilities of member states regarding national security or the operations of public authorities and international organizations in specific contexts.</code> | <code>(180) The European Data Protection Supervisor and the European Data Protection Board were consulted in accordance with Article 42(1) and (2) of Regulation (EU) 2018/1725 and delivered their joint opinion on 18 June 2021,<br><br>HAVE ADOPTED THIS REGULATION:<br><br>CHAPTER I<br><br>GENERAL PROVISIONS<br><br>Article 1<br><br>Subject matter`<br><br>1. The purpose of this Regulation is to improve the functioning of the internal market and promote the uptake of human-centric and trustworthy artificial intelligence (AI), while ensuring a high level of protection of health, safety, fundamental rights enshrined in the Charter, including democracy, the rule of law and environmental protection, against the harmful effects of AI systems in the Union and supporting innovation.<br><br>2. This Regulation lays down:<br><br>(a) harmonised rules for the placing on the market, the putting into service, and the use of AI systems in the Union; (b) prohibitions of certain AI practices; --- --- (c) specific requirements for high-risk AI systems and oblig...</code> |
599
+ | <code>How should loans with unknown use of proceeds be allocated in terms of sectors and alignment metrics?</code> | <code>instruments. For loans whose use of proceeds is known, the value shall be included for the relevant sector and alignment metric. For loans whose use of proceeds is unknown, the gross carrying amount of the exposure shall be allocated to the relevant sectors and alignment metrics based on the counterparties’ activity distribution, including by counterparties’ turnover by activity. Institutions shall add a row in the template for each relevant combination of sectors disclosed in column (b) and alignment metrics included in column (d). ---|--- (f) | Column (f): the point in time distance of the column (d) metric(s) to the 2030 data points of the Net Zero Emissions by 2050 Scenario (NZE2050), shall be expressed in percentage points. That</code> |
600
+ | <code>What measures must AIFMs implement to ensure they do not rely solely on credit ratings for assessing the creditworthiness of AIFs' assets?</code> | <code>▼M1<br><br>The measures specifying the risk-management systems referred to in point (a) of the first subparagraph shall ensure that the AIFMs are prevented from relying solely or mechanistically on credit ratings, as referred to in the first subparagraph of paragraph 2, for assessing the creditworthiness of the AIFs’ assets.<br><br>▼B<br><br>Article 16<br><br>Liquidity management<br><br>1.<br><br>AIFMs shall, for each AIF that they manage which is not an unleveraged closed- ended AIF, employ an appropriate liquidity management system and adopt procedures which enable them to monitor the liquidity risk of the AIF and to ensure that the liquidity profile of the investments of the AIF complies with its underlying obligations.</code> |
601
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
602
+ ```json
603
+ {
604
+ "loss": "MultipleNegativesRankingLoss",
605
+ "matryoshka_dims": [
606
+ 768,
607
+ 512,
608
+ 256,
609
+ 128,
610
+ 64
611
+ ],
612
+ "matryoshka_weights": [
613
+ 1,
614
+ 1,
615
+ 1,
616
+ 1,
617
+ 1
618
+ ],
619
+ "n_dims_per_step": -1
620
+ }
621
+ ```
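With the parameters above, `MatryoshkaLoss` applies `MultipleNegativesRankingLoss` to the embeddings truncated to each of 768, 512, 256, 128 and 64 dimensions, then sums the results with equal weights. Setting `n_dims_per_step: -1` means every dimension is used at every step. The plain-Python sketch below reproduces that idea on toy vectors. It assumes the Sentence Transformers defaults for `MultipleNegativesRankingLoss`: cosine similarity multiplied by a scale of 20, with the other documents in the batch used as negatives. It is an illustration rather than the library's implementation, and the helper names are hypothetical.

```python
import math

SCALE = 20.0  # assumed default scale of MultipleNegativesRankingLoss

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnrl_loss(queries, docs):
    """In-batch negatives: query i should score doc i above every other doc j."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [SCALE * cosine(q, d) for d in docs]
        # Cross-entropy with the matching document as the target class,
        # computed with a numerically stable log-sum-exp.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

def matryoshka_loss(queries, docs, dims, weights=None):
    """Weighted sum of the base loss on embeddings truncated to each dimension."""
    weights = weights or [1.0] * len(dims)
    loss = 0.0
    for dim, weight in zip(dims, weights):
        q_trunc = [q[:dim] for q in queries]
        d_trunc = [d[:dim] for d in docs]
        loss += weight * mnrl_loss(q_trunc, d_trunc)
    return loss

# Hypothetical 4-dimensional toy embeddings. The real model uses 768 dims
# and the dimension list [768, 512, 256, 128, 64].
queries = [[1.0, 0.2, 0.5, 0.1], [0.1, 1.0, 0.3, 0.4], [0.6, -0.4, 1.0, 0.2]]
aligned = matryoshka_loss(queries, queries, dims=[4, 2])
shuffled = matryoshka_loss(queries, queries[1:] + queries[:1], dims=[4, 2])
print(round(aligned, 4), round(shuffled, 4))
```

Pairing each query with its own document gives a lower loss than a shuffled pairing at every truncation size. This is the property the Matryoshka objective encourages at each of the listed dimensions.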
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `load_best_model_at_end`: True
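These settings, together with `per_device_train_batch_size: 8` and `gradient_accumulation_steps: 1` from the full hyperparameter list, determine the length of the schedule. Assume a single device and a dataloader that keeps the final partial batch (`dataloader_drop_last: False`). Under those assumptions, 46,338 samples give ⌈46,338 / 8⌉ = 5,793 optimizer steps per epoch, so 4 epochs give 23,172 steps. This agrees with the training logs, where step 1000 corresponds to epoch 1000 / 5793 ≈ 0.1726.

The sketch below reproduces that arithmetic and the linear warmup/decay schedule (`lr_scheduler_type: linear`, `warmup_ratio: 0.1`). It assumes the Hugging Face convention of rounding the warmup step count up. The helper name is hypothetical.

```python
import math

NUM_SAMPLES = 46_338  # training set size from the card
BATCH_SIZE = 8        # per_device_train_batch_size (single device assumed)
GRAD_ACCUM = 1        # gradient_accumulation_steps
EPOCHS = 4            # num_train_epochs
WARMUP_RATIO = 0.1    # warmup_ratio
BASE_LR = 2e-5        # learning_rate

# The last partial batch is kept (dataloader_drop_last=False), hence the ceil.
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE) // GRAD_ACCUM
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def linear_lr(step):
    """Linear warmup from 0 to BASE_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    return BASE_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)
print(round(1000 / steps_per_epoch, 4), round(23000 / steps_per_epoch, 4))
```

With these numbers, the learning rate peaks at 2e-05 after 2,318 warmup steps and then decays linearly. It reaches zero at step 23,172, shortly after the last logged step (23,000).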
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch      | Step     | Training Loss | cosine_ndcg@10 |
+ |:----------:|:--------:|:-------------:|:--------------:|
+ | -1         | -1       | -             | 0.7763         |
+ | 0.0863     | 500      | 0.2343        | -              |
+ | **0.1726** | **1000** | **0.1259**    | **0.814**      |
+ | 0.2589     | 1500     | 0.1027        | -              |
+ | 0.3452     | 2000     | 0.0757        | 0.8288         |
+ | 0.4316     | 2500     | 0.0617        | -              |
+ | 0.5179     | 3000     | 0.0651        | 0.8288         |
+ | 0.6042     | 3500     | 0.0863        | -              |
+ | 0.6905     | 4000     | 0.06          | 0.8376         |
+ | 0.7768     | 4500     | 0.0579        | -              |
+ | 0.8631     | 5000     | 0.0593        | 0.8342         |
+ | 0.9494     | 5500     | 0.0485        | -              |
+ | 1.0357     | 6000     | 0.0465        | 0.8384         |
+ | 1.1220     | 6500     | 0.0276        | -              |
+ | 1.2084     | 7000     | 0.0353        | 0.8392         |
+ | 1.2947     | 7500     | 0.0335        | -              |
+ | 1.3810     | 8000     | 0.0292        | 0.8436         |
+ | 1.4673     | 8500     | 0.0276        | -              |
+ | 1.5536     | 9000     | 0.0404        | 0.8485         |
+ | 1.6399     | 9500     | 0.0476        | -              |
+ | 1.7262     | 10000    | 0.0265        | 0.8601         |
+ | 1.8125     | 10500    | 0.017         | -              |
+ | 1.8988     | 11000    | 0.0217        | 0.8549         |
+ | 1.9852     | 11500    | 0.0329        | -              |
+ | 2.0715     | 12000    | 0.0207        | 0.8577         |
+ | 2.1578     | 12500    | 0.0199        | -              |
+ | 2.2441     | 13000    | 0.015         | 0.8544         |
+ | 2.3304     | 13500    | 0.0143        | -              |
+ | 2.4167     | 14000    | 0.0117        | 0.8574         |
+ | 2.5030     | 14500    | 0.0204        | -              |
+ | 2.5893     | 15000    | 0.0141        | 0.8595         |
+ | 2.6756     | 15500    | 0.0123        | -              |
+ | 2.7620     | 16000    | 0.0211        | 0.8538         |
+ | 2.8483     | 16500    | 0.0207        | -              |
+ | 2.9346     | 17000    | 0.0134        | 0.8562         |
+ | 3.0209     | 17500    | 0.0276        | -              |
+ | 3.1072     | 18000    | 0.0106        | 0.8552         |
+ | 3.1935     | 18500    | 0.0129        | -              |
+ | 3.2798     | 19000    | 0.0157        | 0.8582         |
+ | 3.3661     | 19500    | 0.0164        | -              |
+ | 3.4524     | 20000    | 0.0192        | 0.8614         |
+ | 3.5388     | 20500    | 0.0138        | -              |
+ | 3.6251     | 21000    | 0.0141        | 0.8601         |
+ | 3.7114     | 21500    | 0.0109        | -              |
+ | 3.7977     | 22000    | 0.0178        | 0.8605         |
+ | 3.8840     | 22500    | 0.0088        | -              |
+ | 3.9703     | 23000    | 0.0255        | 0.8626         |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.10.15
+ - Sentence Transformers: 4.0.2
+ - Transformers: 4.49.0
+ - PyTorch: 2.6.0+cu126
+ - Accelerate: 0.26.0
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoint-1000/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
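This pooling configuration enables only `pooling_mode_cls_token`. The 768-dimensional sentence embedding is therefore the output vector at the first (CLS) token position, with mean, max, mean-sqrt-length, weighted-mean and last-token pooling all disabled. The model was trained with `MatryoshkaLoss` over 768, 512, 256, 128 and 64 dimensions, so a common way to use it is to keep only the leading dimensions of that vector and re-normalize them.

The plain-Python sketch below illustrates both steps on toy numbers. The helper names and values are hypothetical. In the Sentence Transformers library, truncation is usually requested through the `truncate_dim` argument of `SentenceTransformer`, so check the documentation of your installed version rather than relying on this sketch.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: return the first token's output vector unchanged."""
    return list(token_embeddings[0])

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

# Hypothetical encoder output for one sentence: one row per token,
# with the CLS token first. The real model produces 768-dim rows.
token_embeddings = [
    [0.6, 0.8, 0.0, 0.3],
    [0.1, 0.2, 0.9, 0.4],
    [0.5, 0.5, 0.5, 0.5],
]
sentence_embedding = cls_pool(token_embeddings)
short_embedding = truncate_and_normalize(sentence_embedding, 2)
print(sentence_embedding, short_embedding)
```

Re-normalizing after truncation keeps dot products equal to cosine similarities. This matters if downstream code compares the shortened embeddings with a plain dot product.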
checkpoint-1000/README.md ADDED
@@ -0,0 +1,828 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:46338
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-m-v2.0
+ widget:
+ - source_sentence: What role does ESMA play in the development of guidelines and regulatory
+ technical standards related to cooperation arrangements with third countries as
+ mentioned in the text?
+ sentences:
+ - 'If a planned change is implemented notwithstanding the first and second subparagraphs,
+ or if an unplanned change has taken place pursuant to which the AIFM’s management
+ of the AIF would no longer comply with this Directive or the AIFM otherwise would
+ no longer comply with this Directive, the competent authorities of the home Member
+ State of the AIFM shall take all due measures in accordance with Article 46, including,
+ if necessary, the express prohibition of marketing of the AIF.
+
+
+ If the changes are acceptable because they do not affect the compliance of the
+ AIFM’s management of the AIF with this Directive, or the compliance by the AIFM
+ with this Directive otherwise, the competent authorities of the home Member State
+ of the AIFM shall, without delay, inform ESMA in so far as the changes concern
+ the termination of the marketing of certain AIFs or additional AIFs marketed and,
+ if applicable, the competent authorities of the host Member States of the AIFM
+ of those changes.
+
+
+ 11.
+
+
+ The Commission shall adopt, by means of delegated acts in accordance with Article
+ 56 and subject to the conditions of Articles 57 and 58, measures regarding the
+ cooperation arrangements referred to in point (a) of paragraph 2 in order to design
+ a common framework to facilitate the establishment of those cooperation arrangements
+ with third countries.
+
+
+ 12.
+
+
+ In order to ensure uniform application of this Article, ESMA may develop guidelines
+ to determine the conditions of application of the measures adopted by the Commission
+ regarding the cooperation arrangements referred to in point (a) of paragraph 2.
+
+
+ 13.
+
+
+ ESMA shall develop draft regulatory technical standards to determine the minimum
+ content of the cooperation arrangements referred to in point (a) of paragraph
+ 2 so as to ensure that both the competent authorities of the home and the host
+ Member States receive sufficient information in order to be able to exercise their
+ supervisory and investigatory powers under this Directive.
+
+
+ Power is delegated to the Commission to adopt the regulatory technical standards
+ referred to in the first subparagraph in accordance with Article 10 to 14 of Regulation
+ (EU) No 1095/2010.
+
+
+ 14.'
+ - (23) This Regulation should also apply to Union institutions, bodies, offices
+ and agencies when acting as a provider or deployer of an AI system.
+ - An operator that is a natural person or a microenterprise may mandate the next
+ operator or trader further down the supply chain that is not a natural person
+ or a microenterprise to act as an authorised representative. Such next operator
+ or trader further down the supply chain shall not place or make available relevant
+ products on the market or export them without submitting the due diligence statement
+ pursuant to Article 4(2) on behalf of that operator. In such cases, the operator
+ that is a natural person or a microenterprise shall retain responsibility for
+ compliance of the relevant product with Article 3, and shall communicate to that
+ next operator or trader further down the supply chain all information necessary
+ to confirm that due
+ - source_sentence: A review is scheduled for June 2019 to determine if the regulations
+ regarding hazardous substances should be broadened, based on practical experiences.
+ Additionally, the Commission aims to promote alternatives to animal testing by
+ reassessing testing requirements, potentially leading to amendments that prioritize
+ health and environmental safety.
+ sentences:
+ - '18 June 1994, until such plant and machinery is disposed of; (b) in the case
+ of the maintenance of plant and machinery already in service within a Member State
+ on 18 June 1994. For the purposes of point (a) Member States may, on grounds of
+ human health protection and environmental protection, prohibit within their territory
+ the use of such plant or machinery before it is disposed of. 25. Monomethyl-dichloro-diphenyl
+ methane Trade name: Ugilec 121 Ugilec 21 Shall not be placed on the market, or
+ used, as a substance or in mixtures. Articles containing the substance shall not
+ be placed on the market. 26. Monomethyl-dibromo-diphenyl methane bromobenzylbromotoluene,
+ mixture of isomers Trade name: DBBT CAS No 99688-47-8 Shall not be placed on'
+ - (35) | The fight against litter is a shared effort between competent authorities,
+ producers and consumers. Public authorities, including the Union institutions,
+ should lead by example.
+ - '7.
+
+
+ By 1 June 2013 the Commission shall carry out a review to assess whether or not,
+ taking into account latest developments in scientific knowledge, to extend the
+ scope of Article 60(3) to substances identified under Article 57(f) as having
+ endocrine disrupting properties. On the basis of that review the Commission may,
+ if appropriate, present legislative proposals.
+
+
+ 8.
+
+
+ By 1 June 2019, the Commission shall carry out a review to assess whether or not
+ to extend the scope of Article 33 to cover other dangerous substances, taking
+ into account the practical experience in implementing that Article. On the basis
+ of that review, the Commission may, if appropriate, present legislative proposals
+ to extend that obligation.
+
+
+ 9.
+
+
+ In accordance with the objective of promoting non-animal testing and the replacement,
+ reduction or refinement of animal testing required under this Regulation, the
+ Commission shall review the testing requirements of Section 8.7 of Annex VIII
+ by 1 June 2019. On the basis of this review, while ensuring a high level of protection
+ of health and the environment, the Commission may propose an amendment in accordance
+ with the procedure referred to in Article 133(4).
+
+
+ Article 139
+
+
+ Repeals
+
+
+ Directive 91/155/EEC shall be repealed.
+
+
+ Directives 93/105/EC and 2000/21/EC and Regulations (EEC) No 793/93 and (EC) No
+ 1488/94 shall be repealed with effect from 1 June 2008.
+
+
+ Directive 93/67/EEC shall be repealed with effect from 1 August 2008.
+
+
+ Directive 76/769/EEC shall be repealed with effect from 1 June 2009.
+
+
+ References to the repealed acts shall be construed as references to this Regulation.
+
+
+ Article 140
+
+
+ Amendment of Directive 1999/45/EC
+
+
+ Article 14 of Directive 1999/45/EC shall be deleted.
+
+
+ Article 141
+
+
+ Entry into force and application
+
+
+ 1.
+
+
+ This Regulation shall enter into force on 1 June 2007.
+
+
+ 2.
+
+
+ Titles II, III, V, VI, VII, XI and XII as well as Articles 128 and 136 shall apply
+ from 1 June 2008.
+
+
+ 3.
+
+
+ Article 135 shall apply from 1 August 2008.
+
+
+ 4.
+
+
+ Title VIII and Annex XVII shall apply from 1 June 2009.
+
+
+ This Regulation shall be binding in its entirety and directly applicable in all
+ Member States.
+
+
+ LIST OF ANNEXES
+
+
+ ANNEX I GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING CHEMICAL SAFETY
+ REPORTS ANNEX II REQUIREMENTS FOR THE COMPILATION OF SAFETY DATA SHEETS ANNEX
+ III CRITERIA FOR SUBSTANCES REGISTERED IN QUANTITIES BETWEEN 1 AND 10 TONNES ANNEX
+ IV EXEMPTIONS FROM THE OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE 2(7)(a)
+ ANNEX V EXEMPTIONS FROM THE OBLIGATION TO REGISTER IN ACCORDANCE WITH ARTICLE
+ 2(7)(b) ANNEX VI INFORMATION REQUIREMENTS REFERRED TO IN ARTICLE 10 ANNEX VII
+ STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES
+ OF ONE TONNE OR MORE ANNEX VIII STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES
+ MANUFACTURED OR IMPORTED IN QUANTITIES OF 10 TONNES OR MORE ANNEX IX STANDARD
+ INFORMATION REQUIREMENTS FOR SUBSTANCES MANUFACTURED OR IMPORTED IN QUANTITIES
+ OF 100 TONNES OR MORE ANNEX X STANDARD INFORMATION REQUIREMENTS FOR SUBSTANCES
+ MANUFACTURED OR IMPORTED IN QUANTITIES OF 1 000 TONNES OR MORE ANNEX XI GENERAL
+ RULES FOR ADAPTATION OF THE STANDARD TESTING REGIME SET OUT IN ANNEXES VII TO
+ X ANNEX XII GENERAL PROVISIONS FOR DOWNSTREAM USERS TO ASSESS SUBSTANCES AND PREPARE
+ CHEMICAL SAFETY REPORTS ANNEX XIII CRITERIA FOR THE IDENTIFICATION OF PERSISTENT,
+ BIOACCUMULATIVE AND TOXIC SUBSTANCES, AND VERY PERSISTENT AND VERY BIOACCUMULATIVE
+ SUBSTANCES ANNEX XIV LIST OF SUBSTANCES SUBJECT TO AUTHORISATION ANNEX XV DOSSIERS
+ ANNEX XVI SOCIO-ECONOMIC ANALYSIS ANNEX XVII RESTRICTIONS ON THE MANUFACTURE,
+ PLACING ON THE MARKET AND USE OF CERTAIN DANGEROUS SUBSTANCES, MIXTURES AND ARTICLES
+
+
+ ANNEX I
+
+
+ GENERAL PROVISIONS FOR ASSESSING SUBSTANCES AND PREPARING CHEMICAL SAFETY REPORTS
+
+
+ 0. INTRODUCTION
+
+
+ ▼M51'
+ - source_sentence: What actions must the Commission take if the economic operator
+ does not provide commitments or if the provided commitments are deemed inappropriate
+ or insufficient to address the distortion?
+ sentences:
+ - '2.
+
+
+ Where the economic operator concerned does not offer commitments or where the
+ Commission considers that the commitments referred to in paragraph 1 are neither
+ appropriate nor sufficient to fully and effectively remedy the distortion, the
+ Commission shall adopt an implementing act in the form of a decision prohibiting
+ the award of the contract to the economic operator concerned (‘decision prohibiting
+ the award of the contract’). That implementing act shall be adopted in accordance
+ with the advisory procedure referred to in Article 48(2). Following that decision,
+ the contracting authority or contracting entity shall reject the tender.
+
+
+ 3.'
+ - 6,5 8,9 (1) The values for biogas production from manure include negative emissions
+ for emissions saved from raw manure management. The value of esca considered is
+ equal to – 45 g CO2eq/MJ manure used in anaerobic digestion. (2) Maize whole
+ plant means maize harvested as fodder and ensiled for preservation. (3) Transport
+ of agricultural raw materials to the transformation plant is, according to the
+ methodology provided in the Commission's report of 25 February 2010 on sustainability
+ requirements for the use of solid and gaseous biomass sources in electricity,
+ heating and cooling, included in the ‘cultivation’ value. The value for transport
+ of maize silage accounts for 0,4 g CO2eq/MJ biogas.
+ - reduction in the consumption of lightweight plastic carrier bags. It should be
+ possible for Member States, while observing the general rules laid down in the
+ TFEU and acting in accordance with this Regulation, to adopt provisions which
+ go beyond the minimum waste prevention targets set out in this Regulation. When
+ implementing such measures, Member States should be aware of the risk of a shift
+ from heavier to lighter packaging materials and should prioritise measures that
+ minimise that risk.
+ - source_sentence: The content provides a comprehensive overview of numerous chemical
+ substances, including their structural formulas and potential applications. It
+ emphasizes the significance of specific compounds like acrylamide and thioacetamide,
+ while also addressing mixtures derived from coal tar. The information reflects
+ the intricate nature of chemical synthesis and the importance of understanding
+ the properties and uses of these compounds in various industrial contexts.
+ sentences:
+ - '2.
+
+
+ Each Member State shall ensure that a producer as defined in Article 3(1)(f)(iv)
+ and established on its territory, which sells EEE to another Member State in which
+ it is not established, appoints an authorised representative in that Member State
+ as the person responsible for fulfilling the obligations of that producer, pursuant
+ to this Directive, on the territory of that Member State.
+
+
+ 3.
+
+
+ Appointment of an authorised representative shall be by written mandate.
+
+
+ Article 18
+
+
+ Administrative cooperation and exchange of information'
+ - '(a) display to customers and potential customers, in a visible manner, the labels
+ provided in accordance with Article 32(1), point (b) or (c); (b) make reference
+ to the information included on the labels provided in accordance with Article
+ 32(1), point (b) or (c), in visual advertisements or in technical promotional
+ material for a specific model, in accordance with the applicable delegated acts
+ adopted pursuant to Article 4; and --- --- (c) not provide or display other labels,
+ marks, symbols or inscriptions that are likely to mislead or confuse customers
+ and potential customers with regard to the information included on the label regarding
+ ecodesign requirements. --- ---
+
+
+ Article 32
+
+
+ Obligations related to labels'
+ - '[2] 612-196-00-0 202-441-6 [1] 221-627-8 [2] 95-69-2 [1] 3165-93-3 [2] ►M5 —
+ ◄ 2,4,5-Trimethylaniline [1] 2,4,5-trimethylaniline hydrochloride [2] 612-197-00-6
+ 205-282-0 [1] -[2] 137-17-7 [1] 21436-97-5 [2] ►M5 — ◄ 4,4''-Thiodianiline [1]
+ and its salts 612-198-00-1 205-370-9 [1] 139-65-1 [1] ►M5 — ◄ 4,4''-Oxydianiline
+ [1] and its salts p-Aminophenyl ether [1] 612-199-00-7 202-977-0 [1] 101-80-4
+ [1] ►M5 — ◄ 2,4-Diaminoanisole [1] 4-methoxy-m-phenylenediamine 2,4-diaminoanisole
+ sulphate [2] 612-200-00-0 210-406-1 [1] 254-323-9 [2] 615-05-4 [1] 39156-41-7
+ [2] N, N,N'',N''-tetramethyl-4,4''-methylendianiline 612-201-00-6 202-959-2 101-61-1
+ C.I. Basic Violet 3 with ≥ 0,1 % of Michler''s ketone (EC No 202-027-5) 612-205-00-8
+ 208-953-6 548-62-9 ►M5 — ◄ 6-Methoxy-m-toluidine p-cresidine 612-209-00-X 204-419-1
+ 120-71-8 ►M5 — ◄ [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
+ "32012R0109: INSERTED") Biphenyl-3,3′,4,4′-tetrayltetraamine; Diaminobenzidine
+ 612-239-00-3 202-110-6 91-95-2 (2-chloroethyl)(3-hydroxypropyl)ammonium chloride
+ 612-246-00-1 429-740-6 40722-80-3 3-Amino-9-ethyl carbazole; 9-Ethylcarbazol-3-ylamine
+ 612-280-00-7 205-057-7 132-32-1 [▼M49](./../../../legal-content/EN/AUTO/?uri=celex:32018R0675
+ "32018R0675: INSERTED") Reaction products of paraformaldehyde and 2-hydroxypropylamine
+ (ratio 3:2); [formaldehyde released from 3,3′-methylenebis[5-methyloxazolidine];
+ formaldehyde released from oxazolidin]; [MBO] 612-290-00-1 — — Reaction products
+ of paraformaldehyde with 2-hydroxypropylamine (ratio 1:1); [formaldehyde released
+ from α,α,α-trimethyl-1,3,5-triazine-1,3,5(2H,4H,6H)-triethanol]; [HPT] 612-291-00-7
+ — — Methylhydrazine 612-292-00-2 200-471-4 60-34-4 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
+ "32006R1907R(01): REPLACED") Ethyleneimine; aziridine 613-001-00-1 205-793-9 151-56-4
+ 2-Methylaziridine; propyleneimine 613-033-00-6 200-878-7 75-55-8 ►M5 — ◄ Captafol
+ (ISO); 1,2,3,6-tetrahydro-N-(1,1,2,2-tetrachloroethylthio) phthalimide 613-046-00-7
+ 219-363-3 2425-06-1 Carbadox (INN); methyl 3-(quinoxalin-2-ylmethylene)carbazate
+ 1,4-dioxide; 2-(methoxycarbonylhydrazonomethyl) quinoxaline 1,4-dioxide 613-050-00-9
+ 229-879-0 6804-07-5 A mixture of: 1,3,5-tris(3-aminomethylphenyl)-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione;
+ a mixture of oligomers of 3,5-bis(3-aminomethylphenyl)-1-poly[3,5-bis(3-aminomethylphenyl)-2,4,6-trioxo-1,3,5-(1H,3H,5H)-triazin-1-yl]-1,3,5-(1H,3H,5H)-triazine-2,4,6-trione
332
+ 613-199-00-X 421-550-1 — [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
333
+ "32012R0109: INSERTED") Quinoline 613-281-00-5 202-051-6 91-22-5 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
334
+ "32006R1907R(01): REPLACED") Acrylamide 616-003-00-0 201-173-7 79-06-1 [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
335
+ "32021R2204: INSERTED") Butanone oxime; ethyl methyl ketoxime; ethyl methyl ketone
336
+ oxime 616-014-00-0 202-496-6 96-29-7 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
337
+ "32006R1907R(01): REPLACED") Thioacetamide 616-026-00-6 200-541-4 62-55-5 A mixture
338
+ of: N-[3-hydroxy-2-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide;
339
+ N-[2,3-Bis-(2-methylacryloylamino-methoxy)propoxymethyl]-2-methylacrylamide; methacrylamide;
340
+ 2-methyl-N-(2-methyl-acryloylaminomethoxymethyl)-acrylamide; N-2,3-dihydroxypropoxymethyl)-2-methylacrylamide
341
+ 616-057-00-5 412-790-8 — [▼M14](./../../../legal-content/EN/AUTO/?uri=celex:32012R0109
342
+ "32012R0109: INSERTED") N-[6,9-dihydro-9-[[2-hydroxy-1-(hydroxymethyl)ethoxy]methyl]-6-oxo-1H-purin-2-yl]acetamide
343
+ 616-148-00-X 424-550-1 84245-12-5 [▼M69](./../../../legal-content/EN/AUTO/?uri=celex:32021R2204
344
+ "32021R2204: INSERTED") N-(hydroxymethyl)acrylamide; methylolacrylamide; [NMA]
345
+ 616-230-00-5 213-103-2 924-42-5 [▼C1](./../../../legal-content/EN/AUTO/?uri=celex:32006R1907R%2801%29
346
+ "32006R1907R(01): REPLACED") Distillates (coal tar), benzole fraction; Light oil
347
+ (A complex combination of hydrocarbons obtained by the distillation of coal tar.
348
+ It consists of hydrocarbons having carbon numbers primarily in the range of C4
349
+ to C10 and distilling in the approximate range of 80 to 160 °C.) 648-001-00-0
350
+ 283-482-7 84650-02-2 Tar oils, brown-coal; Light oil (The distillate from lignite
351
+ tar boiling in the range of approximately 80 to 250 °C. Composed primarily of
352
+ aliphatic and aromatic hydrocarbons and monobasic phenols.) 648-002-00-6 302-674-4
353
+ 94114-40-6 J Benzol forerunnings (coal); Light oil redistillate, low boiling'
354
+ - source_sentence: How does the new Eurostat methodology differ in scope from the
+ indicators used in this Directive for calculating energy consumption?
+ sentences:
+ - (29) The methodology for calculation of primary energy consumption and final energy
+ consumption is aligned with the new Eurostat methodology, but the indicators used
+ for the purpose of this Directive have a different scope, in that they exclude
+ ambient energy and include energy consumption in international aviation for the
+ targets in primary energy consumption and final energy consumption. The use of
+ new indicators also implies that any changes in energy consumption of blast furnaces
+ are now only reflected in primary energy consumption.
+ - (92) InvestEU is the Union flagship programme to boost investment, especially
+ the green and digital transition, by providing financing and technical assistance,
+ for instance through blending mechanisms. Such an approach contributes to crowd
+ in additional public and private capital. Moreover, Member States are encouraged
+ to contribute to the InvestEU Member State compartment to support financial products
+ available to net-zero technology manufacturing, without prejudice to applicable
+ State aid rules.
+ - be used, filled or transported through the system; --- --- (iii) specify the terms
+ and conditions for proper handling and packaging use; --- --- (iv) specify detailed
+ requirements for packaging reconditioning; --- --- (v) specify the requirements
+ for packaging collection; --- --- (vi) specify the requirements for packaging
+ storage; --- --- (vii) specify the requirements for packaging filling or uploading;
+ --- --- (viii) specify rules to ensure the effective and efficient collection
+ of reusable packaging, including by providing for incentives for end users to
+ return the packaging to the collection points or grouped collection system; ---
+ --- (ix) specify rules to ensure equal and fair access to the re-use system, including
+ for vulnerable
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
+ results:
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: Unknown
+ type: unknown
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.6452615225271879
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8750215777662697
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.923701018470568
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9582254445019851
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.6452615225271879
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2916738592554232
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.18474020369411356
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.0958225444501985
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.6452615225271879
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8750215777662697
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.923701018470568
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9582254445019851
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8140391807550109
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7664228721582445
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7683024440426897
+ name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-m-v2.0
+
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0) <!-- at revision 5d1bbbdf0d1c2772eff7961f4cdc32b8426dac69 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: GteModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ )
+ ```
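Read together, modules (1) and (2) keep only the output vector of the first token (`pooling_mode_cls_token: True`; the tokenizer's `cls_token` is `<s>`) and rescale it to unit L2 norm, so the cosine similarity between two embeddings equals their plain dot product. The pure-Python sketch below illustrates that arithmetic on made-up 4-dimensional token vectors. `cls_pool` and `l2_normalize` are hypothetical helpers, not the sentence-transformers `Pooling`/`Normalize` modules, which operate on batched tensors.

```python
import math
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Keep only the hidden state at position 0 (the CLS / <s> token)."""
    return list(token_embeddings[0])


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit Euclidean length, as a Normalize step does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical hidden states: 3 tokens x 4 dims per text (the real model uses 768 dims).
text_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 1.0, 1.0, 1.0]]
text_b = [[0.0, 4.0, 3.0, 0.0], [5.0, 5.0, 5.0, 5.0], [2.0, 2.0, 2.0, 2.0]]

emb_a = l2_normalize(cls_pool(text_a))
emb_b = l2_normalize(cls_pool(text_b))

print(emb_a)                        # [0.6, 0.8, 0.0, 0.0]
print(round(dot(emb_a, emb_b), 6))  # 0.64, same as the cosine of the raw CLS vectors
```

Only the first token's vector matters here: the other token rows in each toy input could change arbitrarily without affecting the embedding.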
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub. The GteModel backbone is loaded from custom modeling code,
+ # so trust_remote_code=True is required.
+ model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)
+ # Run inference
+ # For retrieval, encode queries separately with model.encode(queries, prompt_name="query")
+ # so that the "query: " prompt defined in config_sentence_transformers.json is prepended.
+ sentences = [
+ 'How does the new Eurostat methodology differ in scope from the indicators used in this Directive for calculating energy consumption?',
+ '(29) The methodology for calculation of primary energy consumption and final energy consumption is aligned with the new Eurostat methodology, but the indicators used for the purpose of this Directive have a different scope, in that they exclude ambient energy and include energy consumption in international aviation for the targets in primary energy consumption and final energy consumption. The use of new indicators also implies that any changes in energy consumption of blast furnaces are now only reflected in primary energy consumption.',
+ '(92) InvestEU is the Union flagship programme to boost investment, especially the green and digital transition, by providing financing and technical assistance, for instance through blending mechanisms. Such an approach contributes to crowd in additional public and private capital. Moreover, Member States are encouraged to contribute to the InvestEU Member State compartment to support financial products available to net-zero technology manufacturing, without prejudice to applicable State aid rules.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
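Because every embedding leaves the `Normalize()` module with unit length, ranking documents for a query by cosine similarity amounts to sorting them by dot product. A minimal sketch of top-k retrieval over precomputed embeddings follows, assuming hypothetical 3-dimensional vectors in place of real `model.encode` outputs; `top_k` is an illustrative helper, not a sentence-transformers function.

```python
import math
from typing import List, Tuple

Vector = List[float]


def normalize(vec: Vector) -> Vector:
    """Scale to unit length (the real model's outputs already are)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query: Vector, docs: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest-scoring documents.

    Assumes query and docs are unit-normalized, so the dot product equals
    cosine similarity.
    """
    scores = [(i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(docs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical embeddings standing in for 768-dim model outputs.
query_emb = normalize([1.0, 0.2, 0.0])
doc_embs = [
    normalize([0.0, 1.0, 0.0]),  # doc 0: mostly unrelated
    normalize([0.9, 0.3, 0.1]),  # doc 1: close to the query
    normalize([0.5, 0.5, 0.5]),  # doc 2: partially related
]

for idx, score in top_k(query_emb, doc_embs, k=2):
    print(idx, round(score, 3))
```

The position at which the relevant document appears in such a ranking is what the retrieval metrics in the Evaluation section are computed from.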
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:----------|
+ | cosine_accuracy@1 | 0.6453 |
+ | cosine_accuracy@3 | 0.875 |
+ | cosine_accuracy@5 | 0.9237 |
+ | cosine_accuracy@10 | 0.9582 |
+ | cosine_precision@1 | 0.6453 |
+ | cosine_precision@3 | 0.2917 |
+ | cosine_precision@5 | 0.1847 |
+ | cosine_precision@10 | 0.0958 |
+ | cosine_recall@1 | 0.6453 |
+ | cosine_recall@3 | 0.875 |
+ | cosine_recall@5 | 0.9237 |
+ | cosine_recall@10 | 0.9582 |
+ | **cosine_ndcg@10** | **0.814** |
+ | cosine_mrr@10 | 0.7664 |
+ | cosine_map@100 | 0.7683 |
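The values in this table are consistent with exactly one relevant document per query: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for example, 0.9582 / 10 ≈ 0.0958). Under that single-relevant-document assumption, every metric can be derived from the rank at which the relevant document is retrieved. The sketch below applies the standard definitions to hypothetical ranks. It is not the `InformationRetrievalEvaluator` source, and `retrieval_metrics` is an illustrative name.

```python
import math
from typing import Dict, List, Optional, Tuple


def retrieval_metrics(
    ranks: List[Optional[int]], ks: Tuple[int, ...] = (1, 3, 5, 10)
) -> Dict[str, float]:
    """IR metrics for queries that each have exactly one relevant document.

    ranks[i] is the 1-based position of query i's relevant document in the
    ranked results, or None if it was not retrieved.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None]
    out: Dict[str, float] = {}
    for k in ks:
        acc = sum(1 for r in hits if r <= k) / n
        out[f"accuracy@{k}"] = acc
        out[f"precision@{k}"] = acc / k  # at most 1 relevant doc among k results
        out[f"recall@{k}"] = acc  # finding the single relevant doc is full recall
    out["mrr@10"] = sum(1 / r for r in hits if r <= 10) / n
    # The ideal DCG is 1 (relevant doc at rank 1), so nDCG@10 equals DCG@10.
    out["ndcg@10"] = sum(1 / math.log2(r + 1) for r in hits if r <= 10) / n
    # With one relevant doc, average precision is 1 / rank.
    out["map@100"] = sum(1 / r for r in hits if r <= 100) / n
    return out


# Hypothetical ranks of the relevant document for four queries.
metrics = retrieval_metrics([1, 2, 4, None])
for name in ("accuracy@1", "accuracy@3", "precision@3", "mrr@10", "ndcg@10"):
    print(name, round(metrics[name], 4))
```

This framing also explains why map@100 (0.7683) sits just above mrr@10 (0.7664). Both average 1/rank, but MAP@100 still credits relevant documents found at ranks 11 to 100.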
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 46,338 training samples
+ * Columns: <code>query_text</code> and <code>doc_text</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | query_text | doc_text |
+ |:--------|:---|:---|
+ | type | string | string |
+ | details | <ul><li>min: 9 tokens</li><li>mean: 39.44 tokens</li><li>max: 311 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 233.15 tokens</li><li>max: 1900 tokens</li></ul> |
+ * Samples:
+ | query_text | doc_text |
+ |:---|:---|
+ | <code>The regulation's applicability extends to various stakeholders involved in AI systems, including providers, deployers, importers, and manufacturers, regardless of their location. It specifically addresses high-risk AI systems and outlines the limitations of its scope, particularly concerning national security and military applications. Additionally, it clarifies that it does not interfere with the responsibilities of member states regarding national security or the operations of public authorities and international organizations in specific contexts.</code> | <code>(180) The European Data Protection Supervisor and the European Data Protection Board were consulted in accordance with Article 42(1) and (2) of Regulation (EU) 2018/1725 and delivered their joint opinion on 18 June 2021,<br><br>HAVE ADOPTED THIS REGULATION:<br><br>CHAPTER I<br><br>GENERAL PROVISIONS<br><br>Article 1<br><br>Subject matter`<br><br>1. The purpose of this Regulation is to improve the functioning of the internal market and promote the uptake of human-centric and trustworthy artificial intelligence (AI), while ensuring a high level of protection of health, safety, fundamental rights enshrined in the Charter, including democracy, the rule of law and environmental protection, against the harmful effects of AI systems in the Union and supporting innovation.<br><br>2. This Regulation lays down:<br><br>(a) harmonised rules for the placing on the market, the putting into service, and the use of AI systems in the Union; (b) prohibitions of certain AI practices; --- --- (c) specific requirements for high-risk AI systems and oblig...</code> |
+ | <code>How should loans with unknown use of proceeds be allocated in terms of sectors and alignment metrics?</code> | <code>instruments. For loans whose use of proceeds is known, the value shall be included for the relevant sector and alignment metric. For loans whose use of proceeds is unknown, the gross carrying amount of the exposure shall be allocated to the relevant sectors and alignment metrics based on the counterparties’ activity distribution, including by counterparties’ turnover by activity. Institutions shall add a row in the template for each relevant combination of sectors disclosed in column (b) and alignment metrics included in column (d). ---|--- (f) | Column (f): the point in time distance of the column (d) metric(s) to the 2030 data points of the Net Zero Emissions by 2050 Scenario (NZE2050), shall be expressed in percentage points. That</code> |
+ | <code>What measures must AIFMs implement to ensure they do not rely solely on credit ratings for assessing the creditworthiness of AIFs' assets?</code> | <code>▼M1<br><br>The measures specifying the risk-management systems referred to in point (a) of the first subparagraph shall ensure that the AIFMs are prevented from relying solely or mechanistically on credit ratings, as referred to in the first subparagraph of paragraph 2, for assessing the creditworthiness of the AIFs’ assets.<br><br>▼B<br><br>Article 16<br><br>Liquidity management<br><br>1.<br><br>AIFMs shall, for each AIF that they manage which is not an unleveraged closed- ended AIF, employ an appropriate liquidity management system and adopt procedures which enable them to monitor the liquidity risk of the AIF and to ensure that the liquidity profile of the investments of the AIF complies with its underlying obligations.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+ ```json
+ {
+ "loss": "MultipleNegativesRankingLoss",
+ "matryoshka_dims": [
+ 768,
+ 512,
+ 256,
+ 128,
+ 64
+ ],
+ "matryoshka_weights": [
+ 1,
+ 1,
+ 1,
+ 1,
+ 1
+ ],
+ "n_dims_per_step": -1
+ }
+ ```
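Conceptually, this configuration evaluates MultipleNegativesRankingLoss on the embedding truncated to each of the five sizes and sums the five losses with weight 1 (`n_dims_per_step: -1` means every size is used at every step). MultipleNegativesRankingLoss treats each query's paired document as the positive and every other document in the batch as a negative. It applies cross-entropy to scaled cosine similarities, with the matching pair as the target. The pure-Python sketch below shows both pieces on a toy batch. It assumes a similarity scale of 20.0 (the library's default, which this card does not state). The 6-dimensional vectors and `dims=[6, 3]` are hypothetical stand-ins for the real 768-dimensional embeddings and `[768, 512, 256, 128, 64]`.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss: anchor i should score highest against positive i.

    scale=20.0 is an assumption (the sentence-transformers default).
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def matryoshka_loss(anchors: List[Vector], positives: List[Vector],
                    dims: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of the inner loss on embeddings truncated to each prefix length."""
    return sum(
        w * mnrl([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )


# Hypothetical 6-dim embeddings for a batch of 3 (query, document) pairs.
queries = [
    [1.0, 0.1, 0.0, 0.2, 0.0, 0.1],
    [0.0, 1.0, 0.1, 0.0, 0.3, 0.0],
    [0.1, 0.0, 1.0, 0.1, 0.0, 0.2],
]
docs = [
    [0.9, 0.0, 0.1, 0.3, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0, 0.2, 0.1],
    [0.0, 0.1, 0.8, 0.0, 0.1, 0.3],
]

dims, weights = [6, 3], [1, 1]  # stand-ins for [768, 512, 256, 128, 64] and [1] * 5
aligned = matryoshka_loss(queries, docs, dims, weights)
shifted = matryoshka_loss(queries, docs[1:] + docs[:1], dims, weights)  # mismatched pairs
print(round(aligned, 4), round(shifted, 4))
```

The correctly paired batch gives a near-zero loss, while shifting documents by one position makes every target a poor match and the loss jumps. Because the truncated prefixes are scored too, training pushes the leading dimensions to carry most of the retrieval signal.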
622
+
623
+ ### Training Hyperparameters
624
+ #### Non-Default Hyperparameters
625
+
626
+ - `eval_strategy`: steps
627
+ - `learning_rate`: 2e-05
628
+ - `num_train_epochs`: 4
629
+ - `warmup_ratio`: 0.1
630
+ - `fp16`: True
631
+ - `load_best_model_at_end`: True
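With `lr_scheduler_type: linear` (listed under All Hyperparameters), `warmup_ratio: 0.1` means the learning rate ramps linearly from 0 to the peak of 2e-05 over the first 10% of optimizer steps, then decays linearly back to 0 by the last step. The step counts can be cross-checked against the training logs. Assuming a single device and no gradient accumulation, 46,338 samples at a batch size of 8 give ceil(46338 / 8) = 5,793 steps per epoch, and 1000 / 5793 ≈ 0.1726 is the epoch logged at step 1000. The sketch below reproduces that arithmetic. Rounding the warmup up with `ceil` follows how the Hugging Face Trainer converts `warmup_ratio` into steps. `linear_warmup_lr` is an illustrative helper, not the Trainer's own scheduler object.

```python
import math

NUM_SAMPLES = 46_338  # training samples (Training Dataset section)
BATCH_SIZE = 8        # per_device_train_batch_size, assuming one device
NUM_EPOCHS = 4
WARMUP_RATIO = 0.1
PEAK_LR = 2e-05

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)  # dataloader_drop_last=False keeps the partial batch
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_warmup_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 5793 23172 2318
print(round(1000 / steps_per_epoch, 4))            # 0.1726, the epoch logged at step 1000
print(f"{linear_warmup_lr(1000):.3e}")             # 8.628e-06
```

By this arithmetic, the best checkpoint (step 1000) was saved while the learning rate was still in its warmup phase, well below the 2e-05 peak.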
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | cosine_ndcg@10 |
+ |:------:|:----:|:-------------:|:--------------:|
+ | -1 | -1 | - | 0.7763 |
+ | 0.0863 | 500 | 0.2343 | - |
+ | 0.1726 | 1000 | 0.1259 | 0.8140 |
+
+
+ ### Framework Versions
+ - Python: 3.10.15
+ - Sentence Transformers: 4.0.2
+ - Transformers: 4.49.0
+ - PyTorch: 2.6.0+cu126
+ - Accelerate: 0.26.0
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+ title={Matryoshka Representation Learning},
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+ year={2024},
+ eprint={2205.13147},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoint-1000/config.json ADDED
@@ -0,0 +1,36 @@
+ {
+ "_name_or_path": "Snowflake/snowflake-arctic-embed-m-v2.0",
+ "architectures": [
+ "GteModel"
+ ],
+ "attention_probs_dropout_prob": 0.0,
+ "auto_map": {
+ "AutoConfig": "Snowflake/snowflake-arctic-embed-m-v2.0--configuration_hf_alibaba_nlp_gte.GteConfig",
+ "AutoModel": "Snowflake/snowflake-arctic-embed-m-v2.0--modeling_hf_alibaba_nlp_gte.GteModel"
+ },
+ "classifier_dropout": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-12,
+ "layer_norm_type": "layer_norm",
+ "logn_attention_clip1": false,
+ "logn_attention_scale": false,
+ "max_position_embeddings": 8192,
+ "model_type": "gte",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pack_qkv": true,
+ "pad_token_id": 1,
+ "position_embedding_type": "rope",
+ "rope_scaling": null,
+ "rope_theta": 160000,
+ "torch_dtype": "float32",
+ "transformers_version": "4.49.0",
+ "type_vocab_size": 1,
+ "unpad_inputs": "true",
+ "use_memory_efficient_attention": "true",
+ "vocab_size": 250048
+ }
checkpoint-1000/config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "__version__": {
+ "sentence_transformers": "4.0.2",
+ "transformers": "4.49.0",
+ "pytorch": "2.6.0+cu126"
+ },
+ "prompts": {
+ "query": "query: "
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
checkpoint-1000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e133f4ec12d5dd29912e392500017fee9677b995ae86f33284b22f8b48a97f6
+ size 1221487872
checkpoint-1000/modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
checkpoint-1000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d0420754c980be666386e2e951094f019ce8ffb2bf94b573d8bba1eb9823d73
+ size 2443060474
checkpoint-1000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98f3a2511c37815d4c0f0cbdd936e7c6c7ecd9e13da0cdaa863b20f39ef0256b
+ size 14244
checkpoint-1000/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:849c1bd155c23173464280c951875ce40008ed02ce90370d63d04673b1da8d08
+ size 988
checkpoint-1000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66b99bd7b456394e4dbb1005e4fc464c4aaf11da4afba141a0c56a33f8a3279b
+ size 1064
checkpoint-1000/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 8192,
+ "do_lower_case": false
+ }
checkpoint-1000/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-1000/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa7a6ad87a7ce8fe196787355f6af7d03aee94d19c54a5eb1392ed18c8ef451a
3
+ size 17082988
checkpoint-1000/tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "250001": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "max_length": 512,
+ "model_max_length": 32768,
+ "pad_to_multiple_of": null,
+ "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "</s>",
+ "stride": 0,
+ "tokenizer_class": "XLMRobertaTokenizerFast",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "<unk>"
+ }
checkpoint-1000/trainer_state.json ADDED
@@ -0,0 +1,69 @@
+ {
+ "best_metric": 0.6452615225271879,
+ "best_model_checkpoint": "embedding_model_output\\checkpoint-1000",
+ "epoch": 0.17262213015708613,
+ "eval_steps": 1000,
+ "global_step": 1000,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.08631106507854307,
+ "grad_norm": 0.5810384154319763,
+ "learning_rate": 4.288179465056083e-06,
+ "loss": 0.2343,
+ "step": 500
+ },
+ {
+ "epoch": 0.17262213015708613,
+ "grad_norm": 0.08727596700191498,
+ "learning_rate": 8.593615185504747e-06,
+ "loss": 0.1259,
+ "step": 1000
+ },
+ {
+ "epoch": 0.17262213015708613,
+ "eval_cosine_accuracy@1": 0.6452615225271879,
+ "eval_cosine_accuracy@10": 0.9582254445019851,
+ "eval_cosine_accuracy@3": 0.8750215777662697,
+ "eval_cosine_accuracy@5": 0.923701018470568,
+ "eval_cosine_map@100": 0.7683024440426897,
+ "eval_cosine_mrr@10": 0.7664228721582445,
+ "eval_cosine_ndcg@10": 0.8140391807550109,
+ "eval_cosine_precision@1": 0.6452615225271879,
+ "eval_cosine_precision@10": 0.0958225444501985,
+ "eval_cosine_precision@3": 0.2916738592554232,
+ "eval_cosine_precision@5": 0.18474020369411356,
+ "eval_cosine_recall@1": 0.6452615225271879,
+ "eval_cosine_recall@10": 0.9582254445019851,
+ "eval_cosine_recall@3": 0.8750215777662697,
+ "eval_cosine_recall@5": 0.923701018470568,
+ "eval_runtime": 9.5713,
+ "eval_samples_per_second": 0.0,
+ "eval_steps_per_second": 0.0,
+ "step": 1000
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 23172,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 4,
+ "save_steps": 1000,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 8,
+ "trial_name": null,
+ "trial_params": null
+ }
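The `epoch` values in `trainer_state.json` follow from `max_steps` and `num_train_epochs`. Spreading 23172 optimizer steps over 4 epochs gives 5793 steps per epoch, so step 1000 lands at 1000 / 5793 ≈ 0.1726. The sketch below works through that arithmetic. It assumes every epoch has the same number of optimizer steps, which the logged values imply. It does not try to recover the dataset size, because that would also depend on gradient accumulation and device count, and neither is visible here.

```python
MAX_STEPS = 23172       # "max_steps" in trainer_state.json
NUM_TRAIN_EPOCHS = 4    # "num_train_epochs"

# Assumes an equal number of optimizer steps in every epoch.
steps_per_epoch = MAX_STEPS // NUM_TRAIN_EPOCHS

def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch

print(steps_per_epoch)  # 5793
print(epoch_at(1000))   # about 0.1726, the logged epoch at step 1000
```

The same relation reproduces the `epoch` column of the evaluation CSV. For example, step 6000 is 1 full epoch plus 207 / 5793.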
checkpoint-1000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24eaf2bec03129904592472d2b5e74cc9b678a5865b48f4e783edad2d50e50df
+ size 5560
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+ "_name_or_path": "Snowflake/snowflake-arctic-embed-m-v2.0",
+ "architectures": [
+ "GteModel"
+ ],
+ "attention_probs_dropout_prob": 0.0,
+ "auto_map": {
+ "AutoConfig": "Snowflake/snowflake-arctic-embed-m-v2.0--configuration_hf_alibaba_nlp_gte.GteConfig",
+ "AutoModel": "Snowflake/snowflake-arctic-embed-m-v2.0--modeling_hf_alibaba_nlp_gte.GteModel"
+ },
+ "classifier_dropout": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-12,
+ "layer_norm_type": "layer_norm",
+ "logn_attention_clip1": false,
+ "logn_attention_scale": false,
+ "max_position_embeddings": 8192,
+ "model_type": "gte",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pack_qkv": true,
+ "pad_token_id": 1,
+ "position_embedding_type": "rope",
+ "rope_scaling": null,
+ "rope_theta": 160000,
+ "torch_dtype": "float32",
+ "transformers_version": "4.49.0",
+ "type_vocab_size": 1,
+ "unpad_inputs": "true",
+ "use_memory_efficient_attention": "true",
+ "vocab_size": 250048
+ }
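Several values in `config.json` can be checked against the other files in this commit:

- `hidden_size` 768 split across 12 attention heads gives a 64-dimensional head.
- 768 also matches `word_embedding_dimension` in `1_Pooling/config.json`.
- `pad_token_id` 1 is the id that `tokenizer_config.json` assigns to `<pad>`.

The sketch below restates those checks over values copied by hand from the configs. It does not load the files.

```python
# Values copied from config.json, 1_Pooling/config.json and tokenizer_config.json.
model_config = {"hidden_size": 768, "num_attention_heads": 12, "intermediate_size": 3072,
                "pad_token_id": 1, "vocab_size": 250048}
pooling_dim = 768  # "word_embedding_dimension"
tokenizer_ids = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "<mask>": 250001}

head_dim, remainder = divmod(model_config["hidden_size"], model_config["num_attention_heads"])
assert remainder == 0, "hidden_size must split evenly across attention heads"
assert pooling_dim == model_config["hidden_size"], "pooling width must match the encoder"
assert tokenizer_ids["<pad>"] == model_config["pad_token_id"], "pad ids disagree"
# Every special-token id must index a row of the embedding table.
assert max(tokenizer_ids.values()) < model_config["vocab_size"]

print(head_dim)  # 64
```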
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "__version__": {
+ "sentence_transformers": "4.0.2",
+ "transformers": "4.49.0",
+ "pytorch": "2.6.0+cu126"
+ },
+ "prompts": {
+ "query": "query: "
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
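`config_sentence_transformers.json` registers one prompt, `"query": "query: "`. It also sets `default_prompt_name` to null, so nothing is prepended unless the caller asks for it. In sentence-transformers the prompt is requested with `model.encode(texts, prompt_name="query")`, and documents are encoded without a prompt. The string handling amounts to the following sketch. `apply_prompt` is a hypothetical helper that mirrors that behaviour; it is not the library's actual code.

```python
from typing import List, Optional

PROMPTS = {"query": "query: "}  # "prompts" in config_sentence_transformers.json
DEFAULT_PROMPT_NAME = None      # "default_prompt_name": null

def apply_prompt(texts: List[str], prompt_name: Optional[str] = None) -> List[str]:
    """Prepend the named prompt to each text (hypothetical helper)."""
    name = prompt_name if prompt_name is not None else DEFAULT_PROMPT_NAME
    if name is None:
        return list(texts)  # no prompt requested and no default configured
    prefix = PROMPTS[name]
    return [prefix + text for text in texts]

print(apply_prompt(["what is a pooling layer?"], prompt_name="query"))
# ['query: what is a pooling layer?']
print(apply_prompt(["A pooling layer reduces token embeddings to one vector."]))
# ['A pooling layer reduces token embeddings to one vector.']
```

Queries and documents therefore reach the encoder in slightly different forms. Encoding a query without `prompt_name="query"` would not match how the model was evaluated here.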
eval/Information-Retrieval_evaluation_results.csv ADDED
@@ -0,0 +1,24 @@
+ epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
+ 0.17262213015708613,1000,0.6452615225271879,0.8750215777662697,0.923701018470568,0.9582254445019851,0.6452615225271879,0.6452615225271879,0.2916738592554232,0.8750215777662697,0.18474020369411356,0.923701018470568,0.0958225444501985,0.9582254445019851,0.7664228721582445,0.8140391807550109,0.7683024440426897
+ 0.34524426031417227,2000,0.6639047125841533,0.8883134817883652,0.9356119454514069,0.9691006387018816,0.6639047125841533,0.6639047125841533,0.2961044939294551,0.8883134817883652,0.18712238909028134,0.9356119454514069,0.09691006387018813,0.9691006387018816,0.7824048728761323,0.8287714922901859,0.7837979737909052
+ 0.5178663904712584,3000,0.6630416019333678,0.8902123252200932,0.9333678577593647,0.9691006387018816,0.6630416019333678,0.6630416019333678,0.2967374417400311,0.8902123252200932,0.18667357155187292,0.9333678577593647,0.09691006387018813,0.9691006387018816,0.7824150110012364,0.8288118552714098,0.7838137096104553
+ 0.6904885206283445,4000,0.6784049715173486,0.8991886759882617,0.9413084757465907,0.9713447263939237,0.6784049715173486,0.6784049715173486,0.2997295586627539,0.8991886759882617,0.18826169514931812,0.9413084757465907,0.09713447263939236,0.9713447263939237,0.7932798615734924,0.8376229510207887,0.7947166969348967
+ 0.8631106507854307,5000,0.6723631969618505,0.8940100120835491,0.939754876575177,0.9696185050923528,0.6723631969618505,0.6723631969618505,0.29800333736118306,0.8940100120835491,0.18795097531503538,0.939754876575177,0.09696185050923527,0.9696185050923528,0.7892472853115005,0.8341799204343456,0.7907848709590974
+ 1.0357327809425168,6000,0.6751251510443639,0.9007422751596754,0.9419989642672191,0.9747971689970655,0.6751251510443639,0.6751251510443639,0.30024742505322516,0.9007422751596754,0.1883997928534438,0.9419989642672191,0.09747971689970655,0.9747971689970655,0.7931643006474701,0.8383839598425359,0.7943360363143344
+ 1.208354911099603,7000,0.6778871051268772,0.9029863628517176,0.9432073191783187,0.9728983255653375,0.6778871051268772,0.6778871051268772,0.30099545428390584,0.9029863628517176,0.1886414638356637,0.9432073191783187,0.09728983255653373,0.9728983255653375,0.7948279806772833,0.8392460798584618,0.7961575887260864
+ 1.380977041256689,8000,0.6809943034697048,0.9079924046262731,0.9490764716036596,0.9761781460383221,0.6809943034697048,0.6809943034697048,0.3026641348754244,0.9079924046262731,0.1898152943207319,0.9490764716036596,0.09761781460383219,0.9761781460383221,0.7994906277143463,0.8436464764966085,0.8006890748516114
+ 1.5535991714137753,9000,0.6925599861902296,0.9110996029691006,0.9490764716036596,0.9763507681684792,0.6925599861902296,0.6925599861902296,0.30369986765636686,0.9110996029691006,0.1898152943207319,0.9490764716036596,0.09763507681684791,0.9763507681684792,0.8058929222186606,0.8484696678941415,0.8070560716249376
+ 1.7262213015708614,10000,0.7098221992059383,0.9219747971689971,0.9544277576385293,0.9817020542033489,0.7098221992059383,0.7098221992059383,0.30732493238966574,0.9219747971689971,0.19088555152770584,0.9544277576385293,0.09817020542033487,0.9817020542033489,0.8195256179461256,0.8600785648793432,0.8204184538729643
+ 1.8988434317279475,11000,0.7001553599171414,0.9211116865182116,0.9520110478163301,0.9787674779906784,0.7001553599171414,0.7001553599171414,0.3070372288394039,0.9211116865182116,0.19040220956326598,0.9520110478163301,0.09787674779906784,0.9787674779906784,0.8134861724193683,0.8548735445529405,0.8145655546345327
+ 2.0714655618850335,12000,0.7070602451234248,0.9211116865182116,0.9552908682893146,0.9782496116002072,0.7070602451234248,0.7070602451234248,0.3070372288394039,0.9211116865182116,0.19105817365786293,0.9552908682893146,0.09782496116002071,0.9782496116002072,0.81739263862516,0.8576989575668822,0.8185888259425895
+ 2.24408769204212,13000,0.7006732263076126,0.9169687553944416,0.9516658035560159,0.9785948558605213,0.7006732263076126,0.7006732263076126,0.30565625179814715,0.9169687553944416,0.19033316071120315,0.9516658035560159,0.0978594855860521,0.9785948558605213,0.8129143273627983,0.8543586806062912,0.8139810607548555
+ 2.416709822199206,14000,0.7058518902123252,0.9228379078197825,0.955808734679786,0.9784222337303642,0.7058518902123252,0.7058518902123252,0.3076126359399275,0.9228379078197825,0.1911617469359572,0.955808734679786,0.09784222337303641,0.9784222337303642,0.8169426565723812,0.8574321642904301,0.8180545800262955
+ 2.589331952356292,15000,0.7120662868979803,0.92180217503884,0.9566718453305714,0.9772138788192646,0.7120662868979803,0.7120662868979803,0.3072673916796133,0.92180217503884,0.19133436906611426,0.9566718453305714,0.09772138788192644,0.9772138788192646,0.8200491014059098,0.8594759642138574,0.8212433461863269
+ 2.761954082513378,16000,0.7001553599171414,0.9166235111341274,0.9532194027274297,0.9772138788192646,0.7001553599171414,0.7001553599171414,0.30554117037804246,0.9166235111341274,0.1906438805454859,0.9532194027274297,0.09772138788192644,0.9772138788192646,0.8125648360500767,0.853797494487907,0.813704481944789
+ 2.934576212670464,17000,0.7023994476091835,0.9197307094769549,0.9570170895908856,0.9798032107716209,0.7023994476091835,0.7023994476091835,0.306576903158985,0.9197307094769549,0.19140341791817705,0.9570170895908856,0.09798032107716208,0.9798032107716209,0.814817965853698,0.8561541567644952,0.8158167847748476
+ 3.1071983428275507,18000,0.7008458484377698,0.9188675988261695,0.9561539789401001,0.9782496116002072,0.7008458484377698,0.7008458484377698,0.3062891996087232,0.9188675988261695,0.19123079578801996,0.9561539789401001,0.09782496116002068,0.9782496116002072,0.8140465778347701,0.8552288676846805,0.815111891849899
+ 3.2798204729846367,19000,0.7060245123424823,0.9224926635594684,0.9561539789401001,0.9791127222509926,0.7060245123424823,0.7060245123424823,0.3074975545198228,0.9224926635594684,0.19123079578801996,0.9561539789401001,0.09791127222509922,0.9791127222509926,0.8177931630676311,0.8582467043710136,0.8188411665480363
+ 3.452442603141723,20000,0.7117210426376661,0.9224926635594684,0.9580528223718281,0.9813568099430346,0.7117210426376661,0.7117210426376661,0.3074975545198228,0.9224926635594684,0.1916105644743656,0.9580528223718281,0.09813568099430345,0.9813568099430346,0.8212620458736453,0.8613536718596451,0.822185152646577
+ 3.625064733298809,21000,0.7084412221646815,0.923701018470568,0.9575349559813569,0.9817020542033489,0.7084412221646815,0.7084412221646815,0.3079003394901893,0.923701018470568,0.19150699119627135,0.9575349559813569,0.09817020542033487,0.9817020542033489,0.8194419099131679,0.8600711311402389,0.8203510755969329
+ 3.797686863455895,22000,0.7084412221646815,0.9240462627308821,0.959779043673399,0.9815294320731918,0.7084412221646815,0.7084412221646815,0.30801542091029405,0.9240462627308821,0.19195580873467974,0.959779043673399,0.09815294320731917,0.9815294320731918,0.8199812855690634,0.8604835778775234,0.8208969749809484
+ 3.9703089936129814,23000,0.7136198860693941,0.9243915069911963,0.9589159330226135,0.981874676333506,0.7136198860693941,0.7136198860693941,0.30813050233039874,0.9243915069911963,0.1917831866045227,0.9589159330226135,0.09818746763335057,0.981874676333506,0.8227635844026309,0.8626251072928146,0.8236564067385257
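The evaluation log can be scanned for the strongest checkpoint. The sketch below reads an excerpt of `eval/Information-Retrieval_evaluation_results.csv` (three rows, five of its columns, values copied verbatim) and picks the step with the highest `cosine-NDCG@10`.

It also checks a pattern visible throughout the file: Recall@k equals Accuracy@k, and Precision@k equals Accuracy@k / k. This is what one would expect if every evaluation query has exactly one relevant document. That reading is an inference from the numbers; the file does not state it.

```python
import csv
import io

# Excerpt of the evaluation CSV: rows for steps 10000, 20000 and 23000.
EXCERPT = """steps,cosine-Accuracy@3,cosine-Precision@3,cosine-Recall@3,cosine-NDCG@10
10000,0.9219747971689971,0.30732493238966574,0.9219747971689971,0.8600785648793432
20000,0.9224926635594684,0.3074975545198228,0.9224926635594684,0.8613536718596451
23000,0.9243915069911963,0.30813050233039874,0.9243915069911963,0.8626251072928146
"""

rows = [{key: float(value) for key, value in row.items()}
        for row in csv.DictReader(io.StringIO(EXCERPT))]

# Best checkpoint in the excerpt by NDCG@10.
best = max(rows, key=lambda row: row["cosine-NDCG@10"])
best_step = int(best["steps"])

def single_relevant_consistent(row: dict, k: int = 3, tol: float = 1e-9) -> bool:
    """With one relevant document per query: recall@k == accuracy@k, precision@k == accuracy@k / k."""
    acc = row[f"cosine-Accuracy@{k}"]
    return (abs(row[f"cosine-Recall@{k}"] - acc) < tol
            and abs(row[f"cosine-Precision@{k}"] - acc / k) < tol)

print(best_step)  # 23000
print(all(single_relevant_consistent(row) for row in rows))  # True
```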
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e133f4ec12d5dd29912e392500017fee9677b995ae86f33284b22f8b48a97f6
+ size 1221487872
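The `model.safetensors` pointer records a size of 1,221,487,872 bytes, and `config.json` gives `torch_dtype` float32 (4 bytes per value). Dividing the two gives a rough parameter count of about 305M. The estimate is slightly high, because a safetensors file also contains a small JSON header. About 192M of those parameters are the token-embedding table (`vocab_size` 250048 × `hidden_size` 768). A worked version of that arithmetic:

```python
FILE_SIZE_BYTES = 1_221_487_872  # "size" in the model.safetensors LFS pointer
BYTES_PER_PARAM = 4              # torch_dtype "float32"
VOCAB_SIZE = 250_048             # config.json "vocab_size"
HIDDEN_SIZE = 768                # config.json "hidden_size"

# Upper-bound estimate: the file also holds a JSON header, so the true count is a bit lower.
approx_params = FILE_SIZE_BYTES // BYTES_PER_PARAM
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
# Everything other than the token-embedding table (plus the header overhead).
remaining_params = approx_params - embedding_params

print(f"{approx_params:,}")     # 305,371,968
print(f"{embedding_params:,}")  # 192,036,864
print(f"{remaining_params:,}")  # 113,335,104
```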
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
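`modules.json` chains three modules:

1. the Transformer encoder;
2. a Pooling module, configured in `1_Pooling/config.json` to take the CLS (first) token;
3. a Normalize module that scales each embedding to unit L2 norm.

Because of the last step, the cosine similarity named in `config_sentence_transformers.json` reduces to a dot product. The dependency-free sketch below applies the two post-encoder steps to toy vectors. The 3-dimensional inputs are made up for illustration; the real model emits 768-dimensional token embeddings.

```python
import math
from typing import List

def cls_pool(token_embeddings: List[List[float]]) -> List[float]:
    """CLS pooling: keep the embedding at the first position (<s>) and drop the rest."""
    return list(token_embeddings[0])

def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (no zero-vector guard in this sketch)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings for two inputs; the first row stands in for the <s> position.
query_tokens = [[3.0, 0.0, 4.0], [1.0, 1.0, 1.0]]
doc_tokens = [[0.0, 6.0, 8.0], [2.0, 2.0, 2.0], [5.0, 5.0, 5.0]]

q = l2_normalize(cls_pool(query_tokens))  # [0.6, 0.0, 0.8]
d = l2_normalize(cls_pool(doc_tokens))    # [0.0, 0.6, 0.8]

# On unit vectors, cosine similarity is just the dot product.
print(round(dot(q, d), 6))  # 0.64
```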
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 8192,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa7a6ad87a7ce8fe196787355f6af7d03aee94d19c54a5eb1392ed18c8ef451a
+ size 17082988
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "250001": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "max_length": 512,
+ "model_max_length": 32768,
+ "pad_to_multiple_of": null,
+ "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "</s>",
+ "stride": 0,
+ "tokenizer_class": "XLMRobertaTokenizerFast",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "<unk>"
+ }