Commit c4bbc52 (verified) by vrnP66 · 1 parent: a0369ef

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
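
Only `pooling_mode_mean_tokens` is enabled in this config, so the sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the model itself uses 1024 dimensions), and skipping padding tokens via an attention mask is assumed to follow the usual convention.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of the tokens whose mask entry is 1."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                sums[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in sums]


# Two real tokens followed by one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector is ignored, so the result depends only on the two unmasked tokens.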
README.md ADDED
@@ -0,0 +1,733 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:40374
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: यथोवाच भगवान् धन्वन्तरिः ||२||
+   sentences:
+   - '**Ashtanga Hridayam, Uttara Sthana, chapter 22, sutra 106**
+
+
+     **Sutra**:
+
+     पटोल-निम्ब-यष्ट्य्-आह्व-वासा-जात्य्-अरिमेदसाम् । खदिरस्य वरायाश् च पृथग् एवं प्रकल्पना
+     ॥ १०६ ॥
+
+
+     **English Transliteration**:
+
+     paṭola-nimba-yaṣṭy-āhva-vāsā-jāty-arimedasām | khadirasya varāyāś ca pṛthag evaṁ
+     prakalpanā || 106 ||
+
+
+     **English Translation**:
+
+     Thus, a separate preparation should be made from patola, nimba, licorice, vasa,
+     jati, arimedasa, khadira, and vara.'
+   - '**Susrut Samhita, Sharira Sthana, chapter 9, sutra 2**
+
+
+     **Sutra**:
+
+     यथोवाच भगवान् धन्वन्तरिः ||२||
+
+
+     **English Transliteration**:
+
+     yathovāca bhagavān dhanvantariḥ ||2||
+
+
+     **English Translation**:
+
+     Thus spoke the venerable Dhanvantari.'
+   - '**Susrut Samhita, Chikitsa Sthana, chapter 24, sutra 85**
+
+
+     **Sutra**:
+
+     सुखं वातं प्रसेवेत ग्रीष्मे शरदि मानवः | निवातं ह्यायुषे सेव्यमारोग्याय च सर्वदा
+     ||८५||
+
+
+     **English Transliteration**:
+
+     sukhaṃ vātaṃ praseveta grīṣme śaradi mānavaḥ | nivātaṃ hyāyuṣe sevyamārogyāya
+     ca sarvadā ||85||
+
+
+     **English Translation**:
+
+     A person should enjoy pleasant wind in summer and autumn. Absence of wind is always
+     beneficial for longevity and health.'
+ - source_sentence: विशीर्यते कूर्चकस्तु दन्तकाष्ठगते विषे | जिह्वादन्तौष्ठमांसानां
+     श्वयथुश्चोपजायते ||४८||
+   sentences:
+   - '**Susrut Samhita, Chikitsa Sthana, chapter 28, sutra 26**
+
+
+     **Sutra**:
+
+     पाप्मानं नाशयन्त्येता दद्युश्चौषधयः श्रियम् | कुर्युर्नागबलं चापि मनुष्यममरोपमम्
+     ||२६||
+
+
+     **English Transliteration**:
+
+     pāpmānaṃ nāśayantyetā dadyuścauṣadhayaḥ śriyam | kuryurnāgabalaṃ cāpi manuṣyamamaropamam
+     ||26||
+
+
+     **English Translation**:
+
+     These herbs destroy sin, bestow prosperity, and also create strength like that
+     of serpents, making a human being comparable to the gods.'
+   - '**Susrut Samhita, Kalpa Sthana, chapter 1, sutra 48**
+
+
+     **Sutra**:
+
+     विशीर्यते कूर्चकस्तु दन्तकाष्ठगते विषे | जिह्वादन्तौष्ठमांसानां श्वयथुश्चोपजायते
+     ||४८||
+
+
+     **English Transliteration**:
+
+     viśīryate kūrcakastu dantakāṣṭhagate viṣe | jihvādantāuṣṭhamāṃsānāṃ śvayathuścopajāyate
+     ||48||
+
+
+     **English Translation**:
+
+     When poison is present in the tooth-stick, the brush-like end disintegrates, and
+     swelling arises in the tongue, teeth, lips, and gums.'
+   - '**Charak-Samhita, chikitsa sthana, chapter 2, sutra 38**
+
+
+     **Sutra**:
+
+     गत्वा स्नात्वा पयः पीत्वा रसं वाऽनु शयीत ना| तथाऽस्याप्यायते भूयः शुक्रं च बलमेव
+     च||३८||
+
+
+     **English Transliteration**:
+
+     gatvā snātvā payaḥ pītvā rasaṃ vā''nu śayīta nā| tathā''syāpyāyate bhūyaḥ śukraṃ
+     ca balameva ca||38||
+
+
+     **English Translation**:
+
+     After intercourse, one should bathe, drink milk or juice, and then not sleep immediately.
+     Thus, his semen, strength, and nourishment are increased again.'
+ - source_sentence: कृच्छ्रोन्मीले पुराणाज्यं द्राक्षा-कल्काम्बु-साधितम् । स-सितं योजयेत्
+     स्निग्धं नस्य-धूमाञ्जनादि च ॥ १ ॥
+   sentences:
+   - '**Ashtanga Hridayam, Uttara Sthana, chapter 9, sutra 1**
+
+
+     **Sutra**:
+
+     कृच्छ्रोन्मीले पुराणाज्यं द्राक्षा-कल्काम्बु-साधितम् । स-सितं योजयेत् स्निग्धं
+     नस्य-धूमाञ्जनादि च ॥ १ ॥
+
+
+     **English Transliteration**:
+
+     kṛcchronmīle purāṇājyaṃ drākṣā-kalkāmbu-sādhitam | sa-sitaṃ yojayet snigdhaṃ nasya-dhūmāñjanādi
+     ca || 1 ||
+
+
+     **English Translation**:
+
+     In difficult opening of the eyes, old ghee, processed with grape-paste-water;
+     with sugar, apply it smoothly, nasal drops, fumigation, collyrium, and so on.'
+   - '**Charak-Samhita, chikitsa sthana, chapter 30, sutra 339**
+
+
+     **Sutra**:
+
+     देशे देशे च यत् सात्म्यं यथा वैद्योऽपराध्यति| चिकित्सा चापि निर्दिष्टा दोषाणां
+     गूढचारिणाम्||३३९||
+
+
+     **English Transliteration**:
+
+     dēśē dēśē ca yat sātmyaṁ yathā vaidyōparādhyati| cikitsā cāpi nirdiṣṭā dōṣāṇāṁ
+     gūḍhacāriṇām||339||
+
+
+     **English Translation**:
+
+     The suitability for different regions, how a physician errs, and the treatment
+     of hidden diseases are prescribed.'
+   - '**Ashtanga Hridayam, Uttara Sthana, chapter 17, sutra 14**
+
+
+     **Sutra**:
+
+     खादन्तो जन्तवः कुर्युस् तीव्रां स कृमि-कर्णकः । श्रोत्र-कण्डूयनाज् जाते क्षते
+     स्यात् पूर्व-लक्षणः ॥ १४ ॥
+
+
+     **English Transliteration**:
+
+     khādanto jantavaḥ kuryus tīvrāṃ sa kṛmi-karṇakaḥ | śrotra-kaṇḍūyanāj jāte kṣate
+     syāt pūrva-lakṣaṇaḥ || 14 ||
+
+
+     **English Translation**:
+
+     Biting creatures cause intense pain; that is a worm-infested ear; from scratching
+     the ear, when a wound arises, the previous symptoms manifest.'
+ - source_sentence: श्वासः कासः प्रतिश्यायो मुखशोषोऽतिपार्श्वरुक्| कफहीने पित्तमध्ये
+     लिङ्गं वाताधिके मतम्||१०१||
+   sentences:
+   - '**Susrut Samhita, Uttara tantra, chapter 42, sutra 80**
+
+
+     **Sutra**:
+
+     वायुः प्रकुपितः कोष्ठे शूलं सञ्जनयेद्भृशम् | निरुच्छ्वासी भवेत्तेन वेदनापीडितो
+     नरः ||८०||
+
+
+     **English Transliteration**:
+
+     vāyuḥ prakupitaḥ koṣṭhe śūlaṃ sañjanayedbhṛśam | nirucchvāsī bhavettena vedanāpīḍito
+     naraḥ ||80||
+
+
+     **English Translation**:
+
+     Aggravated Vata in the abdomen intensely generates pain (shula). Due to that pain,
+     the person becomes breathless and afflicted by suffering.'
+   - '**Charak-Samhita, chikitsa sthana, chapter 3, sutra 101**
+
+
+     **Sutra**:
+
+     श्वासः कासः प्रतिश्यायो मुखशोषोऽतिपार्श्वरुक्| कफहीने पित्तमध्ये लिङ्गं वाताधिके
+     मतम्||१०१||
+
+
+     **English Transliteration**:
+
+     śvāsaḥ kāsaḥ pratiśyāyo mukhaśoṣo''tipārśvaruk| kaphahīne pittamadhye liṅgaṃ vātādhike
+     matam||101||
+
+
+     **English Translation**:
+
+     Shortness of breath, cough, coryza (common cold), dryness of the mouth, and severe
+     pain in the sides are the signs of increased Vata, with diminished Kapha and moderate
+     Pitta.'
+   - '**Charak-Samhita, chikitsa sthana, chapter 23, sutra 86**
+
+
+     **Sutra**:
+
+     आनद्धे गुदलेपो योनौ लेपश्च मूढगर्भाणाम्| मूर्च्छार्तिषु च ललाटे प्रलेपनमाहुः प्रधानतमम्||८६||
+
+
+     **English Transliteration**:
+
+     ānaddhe gudalepo yonau lepaśca mūḍhagarbhāṇām| mūrchārtisu ca lalāṭe pralepanamāhuḥ
+     pradhānatamam||86||
+
+
+     **English Translation**:
+
+     It is said that this is an excellent application for flatulence, as a vaginal
+     application for obstructed labor, and as a paste on the forehead for fainting
+     and pain.'
+ - source_sentence: वातातपाध्व-यानादि-परिहार्येष्व् अ-यन्त्रणम् । प्रयोज्यं सु-कुमाराणाम्
+     ईश्वराणाम् सुखात्मनाम् ॥ ४५ ॥
+   sentences:
+   - '**Ashtanga Hridayam, Sutra Sthana, Sutra Sthana, chapter 6, sutra 129**
+
+
+     **Sutra**:
+
+     गुर्व् आम्रं वात-जित् पक्वं स्वाद्व् अम्लं कफ-शुक्र-कृत् । वृक्षाम्लं ग्राहि रूक्षोष्णं
+     वात-श्लेष्म-हरं लघु ॥ १२९ ॥
+
+
+     **English Transliteration**:
+
+     gurv āmraṃ vāta-jit pakvaṃ svādv amlaṃ kapha-śukra-kṛt । vṛkṣāmlaṃ grāhi rūkṣoṣṇaṃ
+     vāta-śleṣma-haraṃ laghu ॥ 129 ॥
+
+
+     **English Translation**:
+
+     Heavy mango vata-conquering ripe sweet-sour kapha-semen-doing. Garcinia astringent
+     dry-hot vata-phlegm-removing light.'
+   - '**Ashtanga Hridayam, Chikitsa Sthana, chapter 13, sutra 45**
+
+
+     **Sutra**:
+
+     वातातपाध्व-यानादि-परिहार्येष्व् अ-यन्त्रणम् । प्रयोज्यं सु-कुमाराणाम् ईश्वराणाम्
+     सुखात्मनाम् ॥ ४५ ॥
+
+
+     **English Transliteration**:
+
+     vātātapādhva-yānādi-parihāryeṣv a-yantraṇam | prayojyaṃ su-kumārāṇām īśvarāṇām
+     sukhātmanām || 45 ||
+
+
+     **English Translation**:
+
+     Without restrictions regarding avoidance of wind, sun, travel, etc., it can be
+     used by delicate, wealthy, and happy individuals.'
+   - '**Ashtanga Hridayam, Sutra Sthana, chapter 22, sutra 34**
+
+
+     **Sutra**:
+
+     कच-सदन-सित-त्व-पिञ्जर-त्वं परिफुटनं शिरसः समीर-रोगान् । जयति जनयतीन्द्रिय-प्रसादं
+     स्वर-हनु-मूर्द्ध-बलं च मूर्द्ध-तैलम् ॥ ३४ ॥
+
+
+     **English Transliteration**:
+
+     kaca-sadana-sita-tva-piñjara-tvaṃ parisphuṭanaṃ śirasaḥ samīra-rogān । jayati
+     janayatīndriya-prasādaṃ svara-hanu-mūrddha-balaṃ ca mūrddha-tailam ॥ 34 ॥
+
+
+     **English Translation**:
+
+     Hair-falling-white-ness-yellowish-ness splitting of head wind-diseases overcomes
+     generates sense-organ-pleasure voice-jaw-head-strength and head-oil.'
+ datasets:
+ - vrnP66/Inhouse_Devanagari
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ model-index:
+ - name: SentenceTransformer
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: Embedding Dataset Dev
+       type: Embedding_Dataset_Dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9996037483215332
+       name: Cosine Accuracy
+ ---
+
+ # SentenceTransformer
+
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on the [inhouse_devanagari](https://huggingface.co/datasets/vrnP66/Inhouse_Devanagari) dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [inhouse_devanagari](https://huggingface.co/datasets/vrnP66/Inhouse_Devanagari)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ queries = [
+ "वातातपाध्व-यानादि-परिहार्येष्व् अ-यन्त्रणम् । प्रयोज्यं सु-कुमाराणाम् ईश्वराणाम् सुखात्मनाम् ॥ ४५ ॥",
+ ]
+ documents = [
+ '**Ashtanga Hridayam, Chikitsa Sthana, chapter 13, sutra 45**\n\n**Sutra**:\nवातातपाध्व-यानादि-परिहार्येष्व् अ-यन्त्रणम् । प्रयोज्यं सु-कुमाराणाम् ईश्वराणाम् सुखात्मनाम् ॥ ४५ ॥\n\n**English Transliteration**:\nvātātapādhva-yānādi-parihāryeṣv a-yantraṇam | prayojyaṃ su-kumārāṇām īśvarāṇām sukhātmanām || 45 ||\n\n**English Translation**:\nWithout restrictions regarding avoidance of wind, sun, travel, etc., it can be used by delicate, wealthy, and happy individuals.',
+ '**Ashtanga Hridayam, Sutra Sthana, chapter 22, sutra 34**\n\n**Sutra**:\nकच-सदन-सित-त्व-पिञ्जर-त्वं परिफुटनं शिरसः समीर-रोगान् । जयति जनयतीन्द्रिय-प्रसादं स्वर-हनु-मूर्द्ध-बलं च मूर्द्ध-तैलम् ॥ ३४ ॥\n\n**English Transliteration**:\nkaca-sadana-sita-tva-piñjara-tvaṃ parisphuṭanaṃ śirasaḥ samīra-rogān । jayati janayatīndriya-prasādaṃ svara-hanu-mūrddha-balaṃ ca mūrddha-tailam ॥ 34 ॥\n\n**English Translation**:\nHair-falling-white-ness-yellowish-ness splitting of head wind-diseases overcomes generates sense-organ-pleasure voice-jaw-head-strength and head-oil.',
+ '**Ashtanga Hridayam, Sutra Sthana, Sutra Sthana, chapter 6, sutra 129**\n\n**Sutra**:\nगुर्व् आम्रं वात-जित् पक्वं स्वाद्व् अम्लं कफ-शुक्र-कृत् । वृक्षाम्लं ग्राहि रूक्षोष्णं वात-श्लेष्म-हरं लघु ॥ १२९ ॥\n\n**English Transliteration**:\ngurv āmraṃ vāta-jit pakvaṃ svādv amlaṃ kapha-śukra-kṛt । vṛkṣāmlaṃ grāhi rūkṣoṣṇaṃ vāta-śleṣma-haraṃ laghu ॥ 129 ॥\n\n**English Translation**:\nHeavy mango vata-conquering ripe sweet-sour kapha-semen-doing. Garcinia astringent dry-hot vata-phlegm-removing light.',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 1024] [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[ 0.7478, -0.0367, 0.0590]])
+ ```
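
The similarity scores in the README snippet are cosine similarities, and retrieval amounts to sorting documents by that score. The sketch below illustrates both steps in plain Python without loading the model. The helper names `cosine_similarity` and `rank_by_score` are hypothetical, and the score list is copied from the example output shown in the README.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_score(scores: List[float]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Scores for the single query against the three documents, as printed in the README.
readme_scores = [0.7478, -0.0367, 0.0590]
print(rank_by_score(readme_scores))  # [0, 2, 1]
```

With the README's scores, the first document (the passage that contains the query sutra) ranks first by a wide margin.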
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Dataset: `Embedding_Dataset_Dev`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | **cosine_accuracy** | **0.9996** |
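
As a rough illustration of what `cosine_accuracy` measures: a triplet counts as correct when the query is more similar to its positive passage than to its negative passage. The sketch below assumes that definition and uses toy 2-dimensional vectors. The function name `triplet_cosine_accuracy` is hypothetical and this is not the evaluator's own code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_cosine_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (query, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for query, positive, negative in triplets
        if cosine_similarity(query, positive) > cosine_similarity(query, negative)
    )
    return correct / len(triplets)


# Toy triplets: the first two are ranked correctly, the third is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),
]
print(triplet_cosine_accuracy(toy_triplets))
```

The reported value of 0.9996037 is numerically consistent with 5,045 of the 5,047 evaluation triplets being ranked correctly (5045 / 5047 ≈ 0.99960), although the card does not state that count.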
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### inhouse_devanagari
+
+ * Dataset: [inhouse_devanagari](https://huggingface.co/datasets/vrnP66/Inhouse_Devanagari) at [9076844](https://huggingface.co/datasets/vrnP66/Inhouse_Devanagari/tree/9076844d6cc74a40a8d079b482f74061b2087185)
+ * Size: 40,374 training samples
+ * Columns: <code>query</code>, <code>positive_pair</code>, and <code>negative_pair</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | query | positive_pair | negative_pair |
+ |:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+ | type | string | string | string |
+ | details | <ul><li>min: 11 tokens</li><li>mean: 53.25 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 81 tokens</li><li>mean: 191.71 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 80 tokens</li><li>mean: 191.47 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+ | query | positive_pair | negative_pair |
+ |:------|:--------------|:--------------|
+ | <code><br> नैते सृती पार्थ जानन्योगी मुह्यति कश्चन। तस्मात्सर्वेषु कालेषु योगयुक्तो भवार्जुन।।8.27।।</code> | <code>**Shloka:**<br> नैते सृती पार्थ जानन्योगी मुह्यति कश्चन। तस्मात्सर्वेषु कालेषु योगयुक्तो भवार्जुन।।8.27।।<br><br>**Transliteration:**<br> naite sṛtī pārtha jānanyogī muhyati kaścana\| tasmātsarveṣu kāleṣu yogayukto bhavārjuna\|\|8.27\|\|<br><br>**English Translation by Shri Purohit Swami:**<br>O Arjuna! The saint knowing these paths is not confused. Therefore meditate perpetually.<br><br>**English Translation Of Sri Shankaracharya's Sanskrit Commentary By Swami Gambirananda:**<br>O son of Prtha, na kascana yogi, no yogi whosoever; janan, has known; ete srti, these two courses as described-that one leads to worldly life, and the other to Liberation; muhyati, becomes deluded. Tasmat, therefore; O Arjuna, bhava, be you; yoga-yuktah, steadfast in Yoga; sarvesu kalesu, at all times. Here about the greatness of that yoga:</code> | <code>**Shloka:**<br> यज्ञार्थात्कर्मणोऽन्यत्र लोकोऽयं कर्मबन्धनः। तदर्थं कर्म कौन्तेय मुक्तसंगः समाचर।।3.9।।<br><br>**Transliteration:**<br> yajñārthātkarmaṇo'nyatra loko'yaṃ karmabandhanaḥ\| tadarthaṃ karma kaunteya muktasaṃgaḥ samācara\|\|3.9\|\|<br><br>**English Translation by Shri Purohit Swami:**<br>In this world people are fettered by action, unless it is performed as a sacrifice. Therefore, O Arjuna, let thy acts be done without attachment, as sacrifice only.<br><br>**English Translation Of Sri Shankaracharya's Sanskrit Commentary By Swami Gambirananda:**<br>Ayam, this; lokah, man, the one who is eligible for action; karma-bandhanah, becomes bound by actions- the person who has karma as his bondage (bandhana) is karma-bandhanah-; anyatra, other than; that karmanah, action; yajnarthat, meant for Got not by that meant for God. According to the Vedic text, 'Sacrifice is verily Visnu' (Tai. Sam. 1.7.4), yajnah means God; whatever is done for Him is yajnartham. Therefore, mukta-sangah, without being attached, being free fr...</code> |
+ | <code>Specifically, in the *shataponaka* type, the physician should create wounds within the tracts. After these have healed, the remaining tracts should be treated.</code> | <code>**Susrut Samhita, Chikitsa Sthana, chapter 8, sutra 5**<br><br>**Sutra**:<br>विशेषतस्तु- नाड्यन्तरे व्रणान् कुर्याद्भिषक् तु शतपोनके \| ततस्तेषूपरूढेषु शेषा नाडीरुपाचरेत् \|\|५\|\|<br><br>**English Transliteration**:<br>viśeṣatastu- nāḍyantare vraṇān kuryādbhīṣak tu śataponake \| tatasteṣūparūḍheṣu śeṣā nāḍīrupācaret \|\|5\|\|<br><br>**English Translation**:<br>Specifically, in the *shataponaka* type, the physician should create wounds within the tracts. After these have healed, the remaining tracts should be treated.</code> | <code>**Susrut Samhita, Uttara tantra, chapter 39, sutra 306**<br><br>**Sutra**:<br>चूर्णितैस्त्रिफलाश्यामात्रिवृत्पिप्पलिसंयुतैः \| सक्षौद्रः शर्करायुक्तो विरेकस्तु प्रशस्यते \|\|३०६\|\|<br><br>**English Transliteration**:<br>cūrṇitaistriphalāśyāmātrivṛtpippalisaṃyutaiḥ \| sakṣaudraḥ śarkarāyukto virekastu praśasyate \|\|306\|\|<br><br>**English Translation**:<br>A purgative (vireka) is recommended when prepared with powdered Triphala, Shyama, Trivrit, and Pippali, mixed with honey and sugar.</code> |
+ | <code>अथ पुण्ये ऽह्नि संपूज्य पूज्यांस् तां प्रविशेच् छुचिः । तत्र संशोधनैः शुद्धः सुखी जात-बलः पुनः ॥ ८ ॥</code> | <code>**Ashtanga Hridayam, Uttara Sthana, chapter 39, sutra 8**<br><br>**Sutra**:<br>अथ पुण्ये ऽह्नि संपूज्य पूज्यांस् तां प्रविशेच् छुचिः । तत्र संशोधनैः शुद्धः सुखी जात-बलः पुनः ॥ ८ ॥<br><br>**English Transliteration**:<br>atha puṇye 'hni saṃpūjya pūjyāṃs tāṃ praviśec chuchiḥ \| tatra saṃśodhanaiḥ śuddhaḥ sukhī jāta-balaḥ punaḥ \|\| 8 \|\|<br><br>**English Translation**:<br>Then, on an auspicious day, having worshipped the worshipful, the pure one should enter it; there, purified by cleansing therapies, he becomes happy and regains strength.</code> | <code>**Ashtanga Hridayam, Uttara Sthana, chapter 40, sutra 82**<br><br>**Sutra**:<br>दीर्घ-जीवितम् आरोग्यं धर्मम् अर्थं सुखं यशः । पाठावबोधानुष्ठानैर् अधिगच्छत्य् अतो ध्रुवम् ॥ ८२ ॥<br><br>**English Transliteration**:<br>dīrgha-jīvitam ārogyaṁ dharmam arthaṁ sukhaṁ yaśaḥ \| pāṭhāvabodhānuṣṭhānair adhigacchaty ato dhruvam \|\| 82 \|\|<br><br>**English Translation**:<br>Long life, health, righteousness, wealth, happiness, and fame, one attains surely through reading, understanding, and practicing this.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+   "scale": 20.0,
+   "similarity_fct": "cos_sim",
+   "gather_across_devices": false
+ }
+ ```
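
To make the two main parameters concrete: with `similarity_fct` set to `cos_sim` and `scale` set to 20.0, each query is scored against every candidate passage in the batch, the cosine scores are multiplied by 20, and a softmax cross-entropy pushes the query toward its own positive. The sketch below is a simplified pure-Python rendering of that idea under the assumption that the candidates are all positives followed by all explicit negatives. The name `mnr_loss` and the toy vectors are hypothetical, and this is not the library's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(queries: List[Vector], positives: List[Vector],
             negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where query i should pick positive i among all candidates."""
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
print(mnr_loss(queries, positives, negatives))        # close to 0: every query matches its positive
print(mnr_loss(queries, positives[::-1], negatives))  # close to 20: positives are swapped
```

The scale acts like an inverse temperature: larger values sharpen the softmax, so small differences in cosine similarity produce larger gradients.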
+
+ ### Evaluation Dataset
+
+ #### inhouse_devanagari
+
+ * Dataset: [inhouse_devanagari](https://huggingface.co/datasets/vrnP66/Inhouse_Devanagari) at [9076844](https://huggingface.co/datasets/vrnP66/Inhouse_Devanagari/tree/9076844d6cc74a40a8d079b482f74061b2087185)
+ * Size: 5,047 evaluation samples
+ * Columns: <code>query</code>, <code>positive_pair</code>, and <code>negative_pair</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | query | positive_pair | negative_pair |
+ |:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+ | type | string | string | string |
+ | details | <ul><li>min: 10 tokens</li><li>mean: 51.78 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 74 tokens</li><li>mean: 190.96 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 81 tokens</li><li>mean: 194.94 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+ | query | positive_pair | negative_pair |
+ |:------|:--------------|:--------------|
+ | <code>Marma-destroyed separately not-said their flesh-etc.-depending-on. Generally with-foreign-body but agitating by action with-pain.</code> | <code>**Ashtanga Hridayam, Sutra Sthana, chapter 28, sutra 17**<br><br>**Sutra**:<br>मर्म-नष्टं पृथङ् नोक्तं तेषां मांसादि-संश्रयात् । सामान्येन स-शल्यं तु क्षोभिण्या क्रियया स-रुक् ॥ १७ ॥<br><br>**English Transliteration**:<br>marma-naṣṭaṃ pṛthaṅ noktaṃ teṣāṃ māṃsādi-saṃśrayāt । sāmānyena sa-śalyaṃ tu kṣobhiṇyā kriyayā sa-ruk ॥ 17 ॥<br><br>**English Translation**:<br>Marma-destroyed separately not-said their flesh-etc.-depending-on. Generally with-foreign-body but agitating by action with-pain.</code> | <code>**Ashtanga Hridayam, Chikitsa Sthana, chapter 6, sutra 34**<br><br>**Sutra**:<br>पञ्च-कोल-शठी-पथ्या-गुड-बीजाह्व-पौष्करम् । वारुणी-कल्कितं भृष्टं यमके लवणान्वितम् ॥ ३४ ॥<br><br>**English Transliteration**:<br>pañca-kola-śaṭhī-pathyā-guḍa-bījāhva-pauṣkaram । vāruṇī-kalkitaṃ bhṛṣṭaṃ yamake lavaṇānvitam ॥ 34 ॥<br><br>**English Translation**:<br>Five-kolas, shathi, pathya, jaggery, bija, and pushkara, ground with varuni, fried in clarified butter, and mixed with salt.</code> |
+ | <code>प्राचीनामलकं चैव दोषघ्नं गरहारि च\| ऐङ्गुदं तिक्तमधुरं स्निग्धोष्णं कफवातजित्\|\|१४६\|\|</code> | <code>**Charak-Samhita, sutra sthana, chapter 27, sutra 146**<br><br>**Sutra**:<br>प्राचीनामलकं चैव दोषघ्नं गरहारि च\| ऐङ्गुदं तिक्तमधुरं स्निग्धोष्णं कफवातजित्\|\|१४६\|\|<br><br>**English Transliteration**:<br>prācīnāmalakaṃ caiva doṣaghnaṃ garahāri ca\| aiṅgudaṃ tiktamadhuraṃ snigdhoṣṇaṃ kaphavātajit\|\|146\|\|<br><br>**English Translation**:<br>Pracinamalaka eliminates the doshas and counteracts poison. Inguda is bitter and sweet, unctuous and hot, and conquers Kapha and Vata.</code> | <code>**Charak-Samhita, chikitsa sthana, chapter 15, sutra 65**<br><br>**Sutra**:<br>कट्वजीर्णविदाह्यम्लक्षाराद्यैः पित्तमुल्बणम्\| अग्निमाप्लावयद्धन्ति जलं तप्तमिवानलम्\|\|६५\|\|<br><br>**English Transliteration**:<br>kaṭvajīrṇavidāhyamlākṣārādyaiḥ pittamulbaṇam\| agnimāplāvayaddhanti jalaṃ taptamivānalam\|\|65\|\|<br><br>**English Translation**:<br>*Pitta* (bile) aggravated by pungent, indigestible, burning, sour, alkaline, and other substances, overwhelms the *agni* (digestive fire) and destroys it, just as hot water extinguishes a fire.</code> |
+ | <code>*Vāta* becomes aggravated by excessive consumption of dry food, overeating, exposure to easterly winds, dew, sexual intercourse, suppression of natural urges, exertion, and exercise.</code> | <code>**Charak-Samhita, siddhi sthana, chapter 9, sutra 74**<br><br>**Sutra**:<br>रूक्षात्यध्यशनात् पूर्ववातावश्यायमैथुनैः\| वेगसन्धारणायासव्यायामैः कुपितोऽनिलः\|\|७४\|\|<br><br>**English Transliteration**:<br>rūkṣātyadhyaśanāt pūrvavātāvaśyāyamaithunaiḥ\| vegasaṃdhāraṇāyāsavyāyāmaiḥ kupito'nilaḥ\|\|74\|\|<br><br>**English Translation**:<br>*Vāta* becomes aggravated by excessive consumption of dry food, overeating, exposure to easterly winds, dew, sexual intercourse, suppression of natural urges, exertion, and exercise.</code> | <code>**Charak-Samhita, sharira sthana, chapter 4, sutra 4**<br><br>**Sutra**:<br>मातृतः पितृत आत्मतः सात्म्यतो रसतः सत्त्वत इत्येतेभ्यो भावेभ्यः समुदितेभ्यो गर्भः सम्भवति\| तस्य ये येऽवयवा यतो यतः सम्भवतः सम्भवन्ति तान् विभज्य मातृजादीनवयवान् पृथक् पृथगुक्तमग्रे\|\|४\|\|<br><br>**English Transliteration**:<br>mātṛtaḥ pitṛta ātmatas sāmyato rasataḥ sattvata ityetebhyo bhāvebhyaḥ samuditebhyo garbhaḥ sambhavati\| tasya ye ye'vayavā yato yataḥ sambhavataḥ sambhavanti tān vibhajya mātṛjādīnavayavān pṛthak pṛthaguktamagre\|\|4\|\|<br><br>**English Translation**:<br>The embryo originates from the combined factors of the mother, the father, the self, suitability, nutrition, and the mind. The specific components of it that originate from each of these sources will be described separately in the following sections, distinguishing the maternal and other components.</code> |
493
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
494
+ ```json
495
+ {
496
+ "scale": 20.0,
497
+ "similarity_fct": "cos_sim",
498
+ "gather_across_devices": false
499
+ }
500
+ ```
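As a rough illustration of what these parameters mean, the loss can be sketched in plain Python: each anchor is scored against every positive in the batch with cosine similarity (`similarity_fct`), the scores are multiplied by `scale` (20.0 here), and cross-entropy is taken with the matching positive as the target. The function names `cos_sim` and `mnr_loss` are illustrative only and are not the Sentence Transformers API; the library's implementation works on tensors and can also take explicit hard negatives.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss: positives[i] is the target for anchors[i],
    every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the i-th positive as the correct class.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)
```

With a larger `scale`, small differences in cosine similarity turn into sharper softmax distributions, which is why a well-separated batch drives this loss close to zero.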
501
+
502
+ ### Training Hyperparameters
503
+ #### Non-Default Hyperparameters
504
+
505
+ - `eval_strategy`: steps
506
+ - `per_device_train_batch_size`: 32
507
+ - `per_device_eval_batch_size`: 32
508
+ - `num_train_epochs`: 4
509
+ - `warmup_ratio`: 0.1
510
+ - `fp16`: True
511
+ - `batch_sampler`: no_duplicates
512
+
513
+ #### All Hyperparameters
514
+ <details><summary>Click to expand</summary>
515
+
516
+ - `overwrite_output_dir`: False
517
+ - `do_predict`: False
518
+ - `eval_strategy`: steps
519
+ - `prediction_loss_only`: True
520
+ - `per_device_train_batch_size`: 32
521
+ - `per_device_eval_batch_size`: 32
522
+ - `per_gpu_train_batch_size`: None
523
+ - `per_gpu_eval_batch_size`: None
524
+ - `gradient_accumulation_steps`: 1
525
+ - `eval_accumulation_steps`: None
526
+ - `torch_empty_cache_steps`: None
527
+ - `learning_rate`: 5e-05
528
+ - `weight_decay`: 0.0
529
+ - `adam_beta1`: 0.9
530
+ - `adam_beta2`: 0.999
531
+ - `adam_epsilon`: 1e-08
532
+ - `max_grad_norm`: 1.0
533
+ - `num_train_epochs`: 4
534
+ - `max_steps`: -1
535
+ - `lr_scheduler_type`: linear
536
+ - `lr_scheduler_kwargs`: {}
537
+ - `warmup_ratio`: 0.1
538
+ - `warmup_steps`: 0
539
+ - `log_level`: passive
540
+ - `log_level_replica`: warning
541
+ - `log_on_each_node`: True
542
+ - `logging_nan_inf_filter`: True
543
+ - `save_safetensors`: True
544
+ - `save_on_each_node`: False
545
+ - `save_only_model`: False
546
+ - `restore_callback_states_from_checkpoint`: False
547
+ - `no_cuda`: False
548
+ - `use_cpu`: False
549
+ - `use_mps_device`: False
550
+ - `seed`: 42
551
+ - `data_seed`: None
552
+ - `jit_mode_eval`: False
553
+ - `bf16`: False
554
+ - `fp16`: True
555
+ - `fp16_opt_level`: O1
556
+ - `half_precision_backend`: auto
557
+ - `bf16_full_eval`: False
558
+ - `fp16_full_eval`: False
559
+ - `tf32`: None
560
+ - `local_rank`: 0
561
+ - `ddp_backend`: None
562
+ - `tpu_num_cores`: None
563
+ - `tpu_metrics_debug`: False
564
+ - `debug`: []
565
+ - `dataloader_drop_last`: False
566
+ - `dataloader_num_workers`: 0
567
+ - `dataloader_prefetch_factor`: None
568
+ - `past_index`: -1
569
+ - `disable_tqdm`: False
570
+ - `remove_unused_columns`: True
571
+ - `label_names`: None
572
+ - `load_best_model_at_end`: False
573
+ - `ignore_data_skip`: False
574
+ - `fsdp`: []
575
+ - `fsdp_min_num_params`: 0
576
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
577
+ - `fsdp_transformer_layer_cls_to_wrap`: None
578
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
579
+ - `parallelism_config`: None
580
+ - `deepspeed`: None
581
+ - `label_smoothing_factor`: 0.0
582
+ - `optim`: adamw_torch_fused
583
+ - `optim_args`: None
584
+ - `adafactor`: False
585
+ - `group_by_length`: False
586
+ - `length_column_name`: length
587
+ - `project`: huggingface
588
+ - `trackio_space_id`: trackio
589
+ - `ddp_find_unused_parameters`: None
590
+ - `ddp_bucket_cap_mb`: None
591
+ - `ddp_broadcast_buffers`: False
592
+ - `dataloader_pin_memory`: True
593
+ - `dataloader_persistent_workers`: False
594
+ - `skip_memory_metrics`: True
595
+ - `use_legacy_prediction_loop`: False
596
+ - `push_to_hub`: False
597
+ - `resume_from_checkpoint`: None
598
+ - `hub_model_id`: None
599
+ - `hub_strategy`: every_save
600
+ - `hub_private_repo`: None
601
+ - `hub_always_push`: False
602
+ - `hub_revision`: None
603
+ - `gradient_checkpointing`: False
604
+ - `gradient_checkpointing_kwargs`: None
605
+ - `include_inputs_for_metrics`: False
606
+ - `include_for_metrics`: []
607
+ - `eval_do_concat_batches`: True
608
+ - `fp16_backend`: auto
609
+ - `push_to_hub_model_id`: None
610
+ - `push_to_hub_organization`: None
611
+ - `mp_parameters`:
612
+ - `auto_find_batch_size`: False
613
+ - `full_determinism`: False
614
+ - `torchdynamo`: None
615
+ - `ray_scope`: last
616
+ - `ddp_timeout`: 1800
617
+ - `torch_compile`: False
618
+ - `torch_compile_backend`: None
619
+ - `torch_compile_mode`: None
620
+ - `include_tokens_per_second`: False
621
+ - `include_num_input_tokens_seen`: no
622
+ - `neftune_noise_alpha`: None
623
+ - `optim_target_modules`: None
624
+ - `batch_eval_metrics`: False
625
+ - `eval_on_start`: False
626
+ - `use_liger_kernel`: False
627
+ - `liger_kernel_config`: None
628
+ - `eval_use_gather_object`: False
629
+ - `average_tokens_across_devices`: True
630
+ - `prompts`: None
631
+ - `batch_sampler`: no_duplicates
632
+ - `multi_dataset_batch_sampler`: proportional
633
+ - `router_mapping`: {}
634
+ - `learning_rate_mapping`: {}
635
+
636
+ </details>
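The learning-rate values recorded in `trainer_state.json` can be reproduced from the settings above. The sketch below assumes the usual linear warmup followed by linear decay, takes `learning_rate` (5e-05) and `warmup_ratio` (0.1) from the list, and takes the total of 5048 steps from `global_step` in `trainer_state.json`. The helper name `linear_lr` is hypothetical, and the observation that the value logged at step N matches the schedule after N-1 scheduler steps is inferred from the logged numbers, not stated by the source.

```python
import math

BASE_LR = 5e-05       # `learning_rate`
TOTAL_STEPS = 5048    # `global_step` at the end of training
WARMUP_RATIO = 0.1    # `warmup_ratio`
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 505


def linear_lr(scheduler_step: int) -> float:
    """Learning rate after `scheduler_step` scheduler updates."""
    if scheduler_step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR during warmup.
        factor = scheduler_step / max(1, WARMUP_STEPS)
    else:
        # Linear decay from BASE_LR down to 0 at TOTAL_STEPS.
        factor = max(
            0.0,
            (TOTAL_STEPS - scheduler_step) / max(1, TOTAL_STEPS - WARMUP_STEPS),
        )
    return BASE_LR * factor


print(linear_lr(99))   # compare with the value logged at step 100
print(linear_lr(599))  # compare with the value logged at step 600
```

Under these assumptions the peak learning rate is reached at step 505, which is consistent with the logged values rising through step 500 and falling from step 600 onward.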
637
+
638
+ ### Training Logs
639
+ | Epoch | Step | Training Loss | Validation Loss | Embedding_Dataset_Dev_cosine_accuracy |
640
+ |:------:|:----:|:-------------:|:---------------:|:-------------------------------------:|
641
+ | -1 | -1 | - | - | 0.9907 |
642
+ | 1.2678 | 1600 | 0.0032 | 0.0054 | 0.9998 |
643
+ | 1.3471 | 1700 | 0.0017 | 0.0060 | 0.9994 |
644
+ | 1.4263 | 1800 | 0.0032 | 0.0059 | 0.9994 |
645
+ | 1.5055 | 1900 | 0.0072 | 0.0061 | 0.9996 |
646
+ | 1.5848 | 2000 | 0.0077 | 0.0074 | 0.9994 |
647
+ | 1.6640 | 2100 | 0.0068 | 0.0879 | 0.9952 |
648
+ | 1.7433 | 2200 | 0.0056 | 0.0061 | 0.9996 |
649
+ | 1.8225 | 2300 | 0.0087 | 0.0052 | 1.0 |
650
+ | 1.9017 | 2400 | 0.0112 | 0.0050 | 0.9998 |
651
+ | 1.9810 | 2500 | 0.0036 | 0.0039 | 0.9994 |
652
+ | 2.0602 | 2600 | 0.0047 | 0.0047 | 0.9994 |
653
+ | 2.1395 | 2700 | 0.0054 | 0.0072 | 0.9998 |
654
+ | 2.2187 | 2800 | 0.0052 | 0.0047 | 0.9998 |
655
+ | 2.2979 | 2900 | 0.0044 | 0.0059 | 0.9996 |
656
+ | 2.3772 | 3000 | 0.0051 | 0.0046 | 0.9996 |
657
+ | 2.4564 | 3100 | 0.0068 | 0.0082 | 0.9996 |
658
+ | 2.5357 | 3200 | 0.0051 | 0.0046 | 0.9996 |
659
+ | 2.6149 | 3300 | 0.0025 | 0.0050 | 0.9998 |
660
+ | 2.6941 | 3400 | 0.004 | 0.0052 | 0.9992 |
661
+ | 2.7734 | 3500 | 0.0019 | 0.0048 | 0.9996 |
662
+ | 2.8526 | 3600 | 0.0039 | 0.0042 | 1.0 |
663
+ | 2.9319 | 3700 | 0.0045 | 0.0049 | 0.9998 |
664
+ | 3.0111 | 3800 | 0.002 | 0.0046 | 0.9996 |
665
+ | 3.0903 | 3900 | 0.0028 | 0.0050 | 0.9996 |
666
+ | 3.1696 | 4000 | 0.0033 | 0.0049 | 0.9992 |
667
+ | 3.2488 | 4100 | 0.0052 | 0.0048 | 0.9996 |
668
+ | 3.3281 | 4200 | 0.0026 | 0.0049 | 0.9994 |
669
+ | 3.4073 | 4300 | 0.0043 | 0.0044 | 1.0 |
670
+ | 3.4865 | 4400 | 0.0038 | 0.0041 | 0.9998 |
671
+ | 3.5658 | 4500 | 0.003 | 0.0043 | 1.0 |
672
+ | 3.6450 | 4600 | 0.003 | 0.0045 | 1.0 |
673
+ | 3.7242 | 4700 | 0.003 | 0.0045 | 0.9998 |
674
+ | 3.8035 | 4800 | 0.0009 | 0.0041 | 0.9998 |
675
+ | 3.8827 | 4900 | 0.0048 | 0.0042 | 0.9998 |
676
+ | 3.9620 | 5000 | 0.0035 | 0.0042 | 0.9996 |
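For readers unfamiliar with the last column, the following is a minimal sketch of a triplet-style cosine accuracy, assuming the metric counts the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The helper names are illustrative; the evaluator that produced the numbers above may differ in details such as tie handling.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_accuracy(anchors, positives, negatives):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for a, p, n in zip(anchors, positives, negatives)
        if cosine(a, p) > cosine(a, n)
    )
    return correct / len(anchors)
```

A value of 1.0 therefore means every dev triplet was ranked correctly, and values just below 1.0 correspond to a handful of mis-ranked triplets.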
677
+
678
+
679
+ ### Framework Versions
680
+ - Python: 3.12.11
681
+ - Sentence Transformers: 5.1.1
682
+ - Transformers: 4.57.0
683
+ - PyTorch: 2.8.0+cu128
684
+ - Accelerate: 1.10.1
685
+ - Datasets: 4.2.0
686
+ - Tokenizers: 0.22.1
687
+
688
+ ## Citation
689
+
690
+ ### BibTeX
691
+
692
+ #### Sentence Transformers
693
+ ```bibtex
694
+ @inproceedings{reimers-2019-sentence-bert,
695
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
696
+ author = "Reimers, Nils and Gurevych, Iryna",
697
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
698
+ month = "11",
699
+ year = "2019",
700
+ publisher = "Association for Computational Linguistics",
701
+ url = "https://arxiv.org/abs/1908.10084",
702
+ }
703
+ ```
704
+
705
+ #### MultipleNegativesRankingLoss
706
+ ```bibtex
707
+ @misc{henderson2017efficient,
708
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
709
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
710
+ year={2017},
711
+ eprint={1705.00652},
712
+ archivePrefix={arXiv},
713
+ primaryClass={cs.CL}
714
+ }
715
+ ```
716
+
717
+ <!--
718
+ ## Glossary
719
+
720
+ *Clearly define terms in order to be accessible across audiences.*
721
+ -->
722
+
723
+ <!--
724
+ ## Model Card Authors
725
+
726
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
727
+ -->
728
+
729
+ <!--
730
+ ## Model Card Contact
731
+
732
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
733
+ -->
config.json ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ "architectures": [
3
+ "XLMRobertaModel"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "bos_token_id": 0,
7
+ "classifier_dropout": null,
8
+ "dtype": "float32",
9
+ "eos_token_id": 2,
10
+ "hidden_act": "gelu",
11
+ "hidden_dropout_prob": 0.1,
12
+ "hidden_size": 1024,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 4096,
15
+ "layer_norm_eps": 1e-05,
16
+ "max_position_embeddings": 514,
17
+ "model_type": "xlm-roberta",
18
+ "num_attention_heads": 16,
19
+ "num_hidden_layers": 24,
20
+ "output_past": true,
21
+ "pad_token_id": 1,
22
+ "position_embedding_type": "absolute",
23
+ "transformers_version": "4.57.0",
24
+ "type_vocab_size": 1,
25
+ "use_cache": true,
26
+ "vocab_size": 286270
27
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SentenceTransformer",
3
+ "__version__": {
4
+ "sentence_transformers": "5.1.1",
5
+ "transformers": "4.57.0",
6
+ "pytorch": "2.8.0+cu128"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "cosine"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:40d105576c14a09ed895198f7b168ba54553c5a74ecfc3353f6c6b24bbe95402
3
+ size 2388160904
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
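The two modules listed here run in sequence: the Transformer produces one vector per token, and the Pooling module reduces them to a single sentence embedding. Since `1_Pooling/config.json` in this commit enables `pooling_mode_mean_tokens`, the reduction can be sketched as a masked mean. The function name `mean_pool` is hypothetical, and the real module operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of equal-length float lists, one per token.
    attention_mask:   list of 0/1 flags, 0 marking padding tokens.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for j, value in enumerate(vector):
                total[j] += value
    kept = max(kept, 1)  # avoid division by zero for an all-padding input
    return [t / kept for t in total]


# The padded third token is ignored, so only the first two are averaged.
print(mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
```

Masking matters because padded positions would otherwise pull the average toward the padding vector and make embeddings depend on batch composition.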
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:66955d7f6148863429c5aab7089f8ad0b7fd7a86171ae411bbda6c77eeab6ea8
3
+ size 4768175234
rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5de680bb9ab70c55828ca504b5a071838f36e51fec74a9e6d2f80afa30d4ab63
3
+ size 14645
scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:66f76fffcfe1c4382c89bafa7e7b2706f6a59b4d73f34c1fab6c5c7554236aaf
3
+ size 1383
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:97a224a88ced7f582eea1b473921835f75c3fa1a0a3876ea2abaafd1e39db014
3
+ size 1465
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "<s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "</s>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "<mask>",
25
+ "lstrip": true,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "<pad>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "</s>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "<unk>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7b3f177750598cb2d9c737abeee48b8f22d9552e5788f25c9304c70a5f0b57d2
3
+ size 25437877
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
trainer_state.json ADDED
@@ -0,0 +1,834 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 4.0,
6
+ "eval_steps": 100,
7
+ "global_step": 5048,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.07923930269413629,
14
+ "grad_norm": 0.36957958340644836,
15
+ "learning_rate": 9.801980198019804e-06,
16
+ "loss": 1.0261,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.07923930269413629,
21
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
22
+ "eval_loss": 0.009768067859113216,
23
+ "eval_runtime": 34.4825,
24
+ "eval_samples_per_second": 146.364,
25
+ "eval_steps_per_second": 4.582,
26
+ "step": 100
27
+ },
28
+ {
29
+ "epoch": 0.15847860538827258,
30
+ "grad_norm": 0.15150195360183716,
31
+ "learning_rate": 1.9702970297029703e-05,
32
+ "loss": 0.0086,
33
+ "step": 200
34
+ },
35
+ {
36
+ "epoch": 0.15847860538827258,
37
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
38
+ "eval_loss": 0.006445930805057287,
39
+ "eval_runtime": 33.42,
40
+ "eval_samples_per_second": 151.017,
41
+ "eval_steps_per_second": 4.728,
42
+ "step": 200
43
+ },
44
+ {
45
+ "epoch": 0.23771790808240886,
46
+ "grad_norm": 0.24285651743412018,
47
+ "learning_rate": 2.9603960396039603e-05,
48
+ "loss": 0.0043,
49
+ "step": 300
50
+ },
51
+ {
52
+ "epoch": 0.23771790808240886,
53
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
54
+ "eval_loss": 0.00557651836425066,
55
+ "eval_runtime": 32.8039,
56
+ "eval_samples_per_second": 153.854,
57
+ "eval_steps_per_second": 4.816,
58
+ "step": 300
59
+ },
60
+ {
61
+ "epoch": 0.31695721077654515,
62
+ "grad_norm": 0.061441030353307724,
63
+ "learning_rate": 3.950495049504951e-05,
64
+ "loss": 0.0051,
65
+ "step": 400
66
+ },
67
+ {
68
+ "epoch": 0.31695721077654515,
69
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
70
+ "eval_loss": 0.005679984577000141,
71
+ "eval_runtime": 32.9416,
72
+ "eval_samples_per_second": 153.21,
73
+ "eval_steps_per_second": 4.796,
74
+ "step": 400
75
+ },
76
+ {
77
+ "epoch": 0.39619651347068147,
78
+ "grad_norm": 0.09473396837711334,
79
+ "learning_rate": 4.9405940594059405e-05,
80
+ "loss": 0.0068,
81
+ "step": 500
82
+ },
83
+ {
84
+ "epoch": 0.39619651347068147,
85
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 1.0,
86
+ "eval_loss": 0.008005364798009396,
87
+ "eval_runtime": 32.9704,
88
+ "eval_samples_per_second": 153.077,
89
+ "eval_steps_per_second": 4.792,
90
+ "step": 500
91
+ },
92
+ {
93
+ "epoch": 0.4754358161648177,
94
+ "grad_norm": 2.07448148727417,
95
+ "learning_rate": 4.8965441338322695e-05,
96
+ "loss": 0.0078,
97
+ "step": 600
98
+ },
99
+ {
100
+ "epoch": 0.4754358161648177,
101
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
102
+ "eval_loss": 0.007311029359698296,
103
+ "eval_runtime": 33.4995,
104
+ "eval_samples_per_second": 150.659,
105
+ "eval_steps_per_second": 4.716,
106
+ "step": 600
107
+ },
108
+ {
109
+ "epoch": 0.554675118858954,
110
+ "grad_norm": 0.44440600275993347,
111
+ "learning_rate": 4.7864847017389395e-05,
112
+ "loss": 0.0066,
113
+ "step": 700
114
+ },
115
+ {
116
+ "epoch": 0.554675118858954,
117
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
118
+ "eval_loss": 0.00900042150169611,
119
+ "eval_runtime": 33.8286,
120
+ "eval_samples_per_second": 149.193,
121
+ "eval_steps_per_second": 4.671,
122
+ "step": 700
123
+ },
124
+ {
125
+ "epoch": 0.6339144215530903,
126
+ "grad_norm": 0.06913918256759644,
127
+ "learning_rate": 4.676425269645609e-05,
128
+ "loss": 0.0081,
129
+ "step": 800
130
+ },
131
+ {
132
+ "epoch": 0.6339144215530903,
133
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
134
+ "eval_loss": 0.0071762725710868835,
135
+ "eval_runtime": 33.3998,
136
+ "eval_samples_per_second": 151.109,
137
+ "eval_steps_per_second": 4.731,
138
+ "step": 800
139
+ },
140
+ {
141
+ "epoch": 0.7131537242472267,
142
+ "grad_norm": 0.7136378884315491,
143
+ "learning_rate": 4.566365837552278e-05,
144
+ "loss": 0.0096,
145
+ "step": 900
146
+ },
147
+ {
148
+ "epoch": 0.7131537242472267,
149
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
150
+ "eval_loss": 0.008276536129415035,
151
+ "eval_runtime": 33.755,
152
+ "eval_samples_per_second": 149.519,
153
+ "eval_steps_per_second": 4.681,
154
+ "step": 900
155
+ },
156
+ {
157
+ "epoch": 0.7923930269413629,
158
+ "grad_norm": 0.5450271964073181,
159
+ "learning_rate": 4.456306405458948e-05,
160
+ "loss": 0.0088,
161
+ "step": 1000
162
+ },
163
+ {
164
+ "epoch": 0.7923930269413629,
165
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
166
+ "eval_loss": 0.006255414802581072,
167
+ "eval_runtime": 33.084,
168
+ "eval_samples_per_second": 152.551,
169
+ "eval_steps_per_second": 4.776,
170
+ "step": 1000
171
+ },
172
+ {
173
+ "epoch": 0.8716323296354992,
174
+ "grad_norm": 0.026704631745815277,
175
+ "learning_rate": 4.346246973365617e-05,
176
+ "loss": 0.0087,
177
+ "step": 1100
178
+ },
179
+ {
180
+ "epoch": 0.8716323296354992,
181
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
182
+ "eval_loss": 0.006418165750801563,
183
+ "eval_runtime": 32.6589,
184
+ "eval_samples_per_second": 154.537,
185
+ "eval_steps_per_second": 4.838,
186
+ "step": 1100
187
+ },
188
+ {
189
+ "epoch": 0.9508716323296355,
190
+ "grad_norm": 2.8012053966522217,
191
+ "learning_rate": 4.236187541272287e-05,
192
+ "loss": 0.0096,
193
+ "step": 1200
194
+ },
195
+ {
196
+ "epoch": 0.9508716323296355,
197
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
198
+ "eval_loss": 0.006508174352347851,
199
+ "eval_runtime": 33.0205,
200
+ "eval_samples_per_second": 152.844,
201
+ "eval_steps_per_second": 4.785,
202
+ "step": 1200
203
+ },
204
+ {
205
+ "epoch": 1.0301109350237718,
206
+ "grad_norm": 0.059912703931331635,
207
+ "learning_rate": 4.1261281091789564e-05,
208
+ "loss": 0.0085,
209
+ "step": 1300
210
+ },
211
+ {
212
+ "epoch": 1.0301109350237718,
213
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
214
+ "eval_loss": 0.005652877036482096,
215
+ "eval_runtime": 32.9331,
216
+ "eval_samples_per_second": 153.25,
217
+ "eval_steps_per_second": 4.798,
218
+ "step": 1300
219
+ },
220
+ {
221
+ "epoch": 1.109350237717908,
222
+ "grad_norm": 0.039631109684705734,
223
+ "learning_rate": 4.0160686770856264e-05,
224
+ "loss": 0.0049,
225
+ "step": 1400
226
+ },
227
+ {
228
+ "epoch": 1.109350237717908,
229
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
230
+ "eval_loss": 0.004980658181011677,
231
+ "eval_runtime": 34.2128,
232
+ "eval_samples_per_second": 147.518,
233
+ "eval_steps_per_second": 4.618,
234
+ "step": 1400
235
+ },
236
+ {
237
+ "epoch": 1.1885895404120443,
238
+ "grad_norm": 0.02145099826157093,
239
+ "learning_rate": 3.9060092449922957e-05,
240
+ "loss": 0.0048,
241
+ "step": 1500
242
+ },
243
+ {
244
+ "epoch": 1.1885895404120443,
245
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
246
+ "eval_loss": 0.005376764573156834,
247
+ "eval_runtime": 32.9215,
248
+ "eval_samples_per_second": 153.304,
249
+ "eval_steps_per_second": 4.799,
250
+ "step": 1500
251
+ },
252
+ {
253
+ "epoch": 1.2678288431061806,
254
+ "grad_norm": 1.3420361280441284,
255
+ "learning_rate": 3.7959498128989656e-05,
256
+ "loss": 0.0032,
257
+ "step": 1600
258
+ },
259
+ {
260
+ "epoch": 1.2678288431061806,
261
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
262
+ "eval_loss": 0.005436555948108435,
263
+ "eval_runtime": 66.7901,
264
+ "eval_samples_per_second": 75.565,
265
+ "eval_steps_per_second": 2.366,
266
+ "step": 1600
267
+ },
268
+ {
269
+ "epoch": 1.3470681458003169,
270
+ "grad_norm": 0.040875934064388275,
271
+ "learning_rate": 3.685890380805635e-05,
272
+ "loss": 0.0017,
273
+ "step": 1700
274
+ },
275
+ {
276
+ "epoch": 1.3470681458003169,
277
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
278
+ "eval_loss": 0.006017347332090139,
279
+ "eval_runtime": 67.8654,
280
+ "eval_samples_per_second": 74.368,
281
+ "eval_steps_per_second": 2.328,
282
+ "step": 1700
283
+ },
284
+ {
285
+ "epoch": 1.4263074484944531,
286
+ "grad_norm": 0.030839553102850914,
287
+ "learning_rate": 3.575830948712305e-05,
288
+ "loss": 0.0032,
289
+ "step": 1800
290
+ },
291
+ {
292
+ "epoch": 1.4263074484944531,
293
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
294
+ "eval_loss": 0.0059111895971000195,
295
+ "eval_runtime": 67.895,
296
+ "eval_samples_per_second": 74.335,
297
+ "eval_steps_per_second": 2.327,
298
+ "step": 1800
299
+ },
300
+ {
301
+ "epoch": 1.5055467511885894,
302
+ "grad_norm": 0.06008416414260864,
303
+ "learning_rate": 3.465771516618974e-05,
304
+ "loss": 0.0072,
305
+ "step": 1900
306
+ },
307
+ {
308
+ "epoch": 1.5055467511885894,
309
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
310
+ "eval_loss": 0.006080140359699726,
311
+ "eval_runtime": 67.931,
312
+ "eval_samples_per_second": 74.296,
313
+ "eval_steps_per_second": 2.326,
314
+ "step": 1900
315
+ },
316
+ {
317
+ "epoch": 1.5847860538827259,
318
+ "grad_norm": 0.02880307100713253,
319
+ "learning_rate": 3.355712084525644e-05,
320
+ "loss": 0.0077,
321
+ "step": 2000
322
+ },
323
+ {
324
+ "epoch": 1.5847860538827259,
325
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
326
+ "eval_loss": 0.007432046812027693,
327
+ "eval_runtime": 67.8369,
328
+ "eval_samples_per_second": 74.399,
329
+ "eval_steps_per_second": 2.329,
330
+ "step": 2000
331
+ },
332
+ {
333
+ "epoch": 1.6640253565768621,
334
+ "grad_norm": 0.1070467010140419,
335
+ "learning_rate": 3.245652652432314e-05,
336
+ "loss": 0.0068,
337
+ "step": 2100
338
+ },
339
+ {
340
+ "epoch": 1.6640253565768621,
341
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9952446818351746,
342
+ "eval_loss": 0.08792955428361893,
343
+ "eval_runtime": 67.8539,
344
+ "eval_samples_per_second": 74.38,
345
+ "eval_steps_per_second": 2.329,
346
+ "step": 2100
347
+ },
348
+ {
349
+ "epoch": 1.7432646592709984,
350
+ "grad_norm": 0.05642708018422127,
351
+ "learning_rate": 3.135593220338983e-05,
352
+ "loss": 0.0056,
353
+ "step": 2200
354
+ },
355
+ {
356
+ "epoch": 1.7432646592709984,
357
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
358
+ "eval_loss": 0.006125820800662041,
359
+ "eval_runtime": 67.8381,
360
+ "eval_samples_per_second": 74.398,
361
+ "eval_steps_per_second": 2.329,
362
+ "step": 2200
363
+ },
364
+ {
365
+ "epoch": 1.8225039619651349,
366
+ "grad_norm": 0.04333237186074257,
367
+ "learning_rate": 3.0255337882456532e-05,
368
+ "loss": 0.0087,
369
+ "step": 2300
370
+ },
371
+ {
372
+ "epoch": 1.8225039619651349,
373
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 1.0,
374
+ "eval_loss": 0.005210440140217543,
375
+ "eval_runtime": 68.0258,
376
+ "eval_samples_per_second": 74.192,
377
+ "eval_steps_per_second": 2.323,
378
+ "step": 2300
379
+ },
380
+ {
381
+ "epoch": 1.9017432646592711,
382
+ "grad_norm": 0.10529103130102158,
383
+ "learning_rate": 2.9154743561523225e-05,
384
+ "loss": 0.0112,
385
+ "step": 2400
386
+ },
387
+ {
388
+ "epoch": 1.9017432646592711,
389
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
390
+ "eval_loss": 0.004964636173099279,
391
+ "eval_runtime": 67.9671,
392
+ "eval_samples_per_second": 74.257,
393
+ "eval_steps_per_second": 2.325,
394
+ "step": 2400
395
+ },
396
+ {
397
+ "epoch": 1.9809825673534074,
398
+ "grad_norm": 0.030080392956733704,
399
+ "learning_rate": 2.805414924058992e-05,
400
+ "loss": 0.0036,
401
+ "step": 2500
402
+ },
403
+ {
404
+ "epoch": 1.9809825673534074,
405
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
406
+ "eval_loss": 0.003934075124561787,
407
+ "eval_runtime": 67.9768,
408
+ "eval_samples_per_second": 74.246,
409
+ "eval_steps_per_second": 2.324,
410
+ "step": 2500
411
+ },
412
+ {
413
+ "epoch": 2.0602218700475436,
414
+ "grad_norm": 0.05913154035806656,
415
+ "learning_rate": 2.6953554919656613e-05,
416
+ "loss": 0.0047,
417
+ "step": 2600
418
+ },
419
+ {
420
+ "epoch": 2.0602218700475436,
421
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
422
+ "eval_loss": 0.004718616604804993,
423
+ "eval_runtime": 68.1157,
424
+ "eval_samples_per_second": 74.095,
425
+ "eval_steps_per_second": 2.32,
426
+ "step": 2600
427
+ },
428
+ {
429
+ "epoch": 2.13946117274168,
430
+ "grad_norm": 0.02771185152232647,
431
+ "learning_rate": 2.5852960598723313e-05,
432
+ "loss": 0.0054,
433
+ "step": 2700
434
+ },
435
+ {
436
+ "epoch": 2.13946117274168,
437
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
438
+ "eval_loss": 0.007168024778366089,
439
+ "eval_runtime": 67.8508,
440
+ "eval_samples_per_second": 74.384,
441
+ "eval_steps_per_second": 2.329,
442
+ "step": 2700
443
+ },
444
+ {
445
+ "epoch": 2.218700475435816,
446
+ "grad_norm": 0.023437298834323883,
447
+ "learning_rate": 2.4752366277790006e-05,
448
+ "loss": 0.0052,
449
+ "step": 2800
450
+ },
451
+ {
452
+ "epoch": 2.218700475435816,
453
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
454
+ "eval_loss": 0.004668584559112787,
455
+ "eval_runtime": 67.8782,
456
+ "eval_samples_per_second": 74.354,
457
+ "eval_steps_per_second": 2.328,
458
+ "step": 2800
459
+ },
460
+ {
461
+ "epoch": 2.2979397781299524,
462
+ "grad_norm": 0.11916761100292206,
463
+ "learning_rate": 2.36517719568567e-05,
464
+ "loss": 0.0044,
465
+ "step": 2900
466
+ },
467
+ {
468
+ "epoch": 2.2979397781299524,
469
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
470
+ "eval_loss": 0.0058599598705768585,
471
+ "eval_runtime": 67.9158,
472
+ "eval_samples_per_second": 74.313,
473
+ "eval_steps_per_second": 2.326,
474
+ "step": 2900
475
+ },
476
+ {
477
+ "epoch": 2.3771790808240887,
478
+ "grad_norm": 0.10791371762752533,
479
+ "learning_rate": 2.2551177635923398e-05,
480
+ "loss": 0.0051,
481
+ "step": 3000
482
+ },
483
+ {
484
+ "epoch": 2.3771790808240887,
485
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
486
+ "eval_loss": 0.00463093351572752,
487
+ "eval_runtime": 68.0279,
488
+ "eval_samples_per_second": 74.19,
489
+ "eval_steps_per_second": 2.323,
490
+ "step": 3000
491
+ },
492
+ {
493
+ "epoch": 2.456418383518225,
494
+ "grad_norm": 6.349626064300537,
495
+ "learning_rate": 2.1450583314990097e-05,
496
+ "loss": 0.0068,
497
+ "step": 3100
498
+ },
499
+ {
500
+ "epoch": 2.456418383518225,
501
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
502
+ "eval_loss": 0.008199479430913925,
503
+ "eval_runtime": 68.0218,
504
+ "eval_samples_per_second": 74.197,
505
+ "eval_steps_per_second": 2.323,
506
+ "step": 3100
507
+ },
508
+ {
509
+ "epoch": 2.535657686212361,
510
+ "grad_norm": 0.07518544048070908,
511
+ "learning_rate": 2.0349988994056793e-05,
512
+ "loss": 0.0051,
513
+ "step": 3200
514
+ },
515
+ {
516
+ "epoch": 2.535657686212361,
517
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
518
+ "eval_loss": 0.004618997685611248,
519
+ "eval_runtime": 67.9844,
520
+ "eval_samples_per_second": 74.238,
521
+ "eval_steps_per_second": 2.324,
522
+ "step": 3200
523
+ },
524
+ {
525
+ "epoch": 2.6148969889064975,
526
+ "grad_norm": 0.6720037460327148,
527
+ "learning_rate": 1.924939467312349e-05,
528
+ "loss": 0.0025,
529
+ "step": 3300
530
+ },
531
+ {
532
+ "epoch": 2.6148969889064975,
533
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
534
+ "eval_loss": 0.005006096325814724,
535
+ "eval_runtime": 67.9382,
536
+ "eval_samples_per_second": 74.288,
537
+ "eval_steps_per_second": 2.326,
538
+ "step": 3300
539
+ },
540
+ {
541
+ "epoch": 2.6941362916006337,
542
+ "grad_norm": 0.0643150731921196,
543
+ "learning_rate": 1.8148800352190185e-05,
544
+ "loss": 0.004,
545
+ "step": 3400
546
+ },
547
+ {
548
+ "epoch": 2.6941362916006337,
549
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9992074370384216,
550
+ "eval_loss": 0.005211703013628721,
551
+ "eval_runtime": 67.9947,
552
+ "eval_samples_per_second": 74.226,
553
+ "eval_steps_per_second": 2.324,
554
+ "step": 3400
555
+ },
556
+ {
557
+ "epoch": 2.7733755942947704,
558
+ "grad_norm": 0.07725568860769272,
559
+ "learning_rate": 1.704820603125688e-05,
560
+ "loss": 0.0019,
561
+ "step": 3500
562
+ },
563
+ {
564
+ "epoch": 2.7733755942947704,
565
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
566
+ "eval_loss": 0.004846959374845028,
567
+ "eval_runtime": 68.0179,
568
+ "eval_samples_per_second": 74.201,
569
+ "eval_steps_per_second": 2.323,
570
+ "step": 3500
571
+ },
572
+ {
573
+ "epoch": 2.8526148969889062,
574
+ "grad_norm": 0.024744508787989616,
575
+ "learning_rate": 1.5947611710323578e-05,
576
+ "loss": 0.0039,
577
+ "step": 3600
578
+ },
579
+ {
580
+ "epoch": 2.8526148969889062,
581
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 1.0,
582
+ "eval_loss": 0.0042335595935583115,
583
+ "eval_runtime": 68.2641,
584
+ "eval_samples_per_second": 73.933,
585
+ "eval_steps_per_second": 2.315,
586
+ "step": 3600
587
+ },
+ {
+ "epoch": 2.931854199683043,
+ "grad_norm": 0.015782877802848816,
+ "learning_rate": 1.4847017389390272e-05,
+ "loss": 0.0045,
+ "step": 3700
+ },
+ {
+ "epoch": 2.931854199683043,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
+ "eval_loss": 0.004929765127599239,
+ "eval_runtime": 67.901,
+ "eval_samples_per_second": 74.329,
+ "eval_steps_per_second": 2.327,
+ "step": 3700
+ },
+ {
+ "epoch": 3.011093502377179,
+ "grad_norm": 2.817779779434204,
+ "learning_rate": 1.3746423068456968e-05,
+ "loss": 0.002,
+ "step": 3800
+ },
+ {
+ "epoch": 3.011093502377179,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
+ "eval_loss": 0.004618987441062927,
+ "eval_runtime": 67.8421,
+ "eval_samples_per_second": 74.393,
+ "eval_steps_per_second": 2.329,
+ "step": 3800
+ },
+ {
+ "epoch": 3.0903328050713155,
+ "grad_norm": 0.011266672052443027,
+ "learning_rate": 1.2645828747523664e-05,
+ "loss": 0.0028,
+ "step": 3900
+ },
+ {
+ "epoch": 3.0903328050713155,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
+ "eval_loss": 0.004977984819561243,
+ "eval_runtime": 67.9701,
+ "eval_samples_per_second": 74.253,
+ "eval_steps_per_second": 2.325,
+ "step": 3900
+ },
+ {
+ "epoch": 3.1695721077654517,
+ "grad_norm": 0.030996697023510933,
+ "learning_rate": 1.154523442659036e-05,
+ "loss": 0.0033,
+ "step": 4000
+ },
+ {
+ "epoch": 3.1695721077654517,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9992074370384216,
+ "eval_loss": 0.004901626612991095,
+ "eval_runtime": 68.0995,
+ "eval_samples_per_second": 74.112,
+ "eval_steps_per_second": 2.32,
+ "step": 4000
+ },
+ {
+ "epoch": 3.248811410459588,
+ "grad_norm": 0.011963835917413235,
+ "learning_rate": 1.0444640105657054e-05,
+ "loss": 0.0052,
+ "step": 4100
+ },
+ {
+ "epoch": 3.248811410459588,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
+ "eval_loss": 0.0047536250203847885,
+ "eval_runtime": 67.9001,
+ "eval_samples_per_second": 74.33,
+ "eval_steps_per_second": 2.327,
+ "step": 4100
+ },
+ {
+ "epoch": 3.3280507131537242,
+ "grad_norm": 0.00867912545800209,
+ "learning_rate": 9.34404578472375e-06,
+ "loss": 0.0026,
+ "step": 4200
+ },
+ {
+ "epoch": 3.3280507131537242,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.999405562877655,
+ "eval_loss": 0.004853234626352787,
+ "eval_runtime": 67.9652,
+ "eval_samples_per_second": 74.259,
+ "eval_steps_per_second": 2.325,
+ "step": 4200
+ },
+ {
+ "epoch": 3.4072900158478605,
+ "grad_norm": 0.012174161151051521,
+ "learning_rate": 8.243451463790447e-06,
+ "loss": 0.0043,
+ "step": 4300
+ },
+ {
+ "epoch": 3.4072900158478605,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 1.0,
+ "eval_loss": 0.004362211097031832,
+ "eval_runtime": 68.0327,
+ "eval_samples_per_second": 74.185,
+ "eval_steps_per_second": 2.322,
+ "step": 4300
+ },
+ {
+ "epoch": 3.4865293185419968,
+ "grad_norm": 1.0564968585968018,
+ "learning_rate": 7.142857142857143e-06,
+ "loss": 0.0038,
+ "step": 4400
+ },
+ {
+ "epoch": 3.4865293185419968,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
+ "eval_loss": 0.004115838557481766,
+ "eval_runtime": 67.8268,
+ "eval_samples_per_second": 74.41,
+ "eval_steps_per_second": 2.329,
+ "step": 4400
+ },
+ {
+ "epoch": 3.565768621236133,
+ "grad_norm": 0.018350793048739433,
+ "learning_rate": 6.04226282192384e-06,
+ "loss": 0.003,
+ "step": 4500
+ },
+ {
+ "epoch": 3.565768621236133,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 1.0,
+ "eval_loss": 0.0043433657847344875,
+ "eval_runtime": 68.0255,
+ "eval_samples_per_second": 74.193,
+ "eval_steps_per_second": 2.323,
+ "step": 4500
+ },
+ {
+ "epoch": 3.6450079239302693,
+ "grad_norm": 0.0198595579713583,
+ "learning_rate": 4.941668500990535e-06,
+ "loss": 0.003,
+ "step": 4600
+ },
+ {
+ "epoch": 3.6450079239302693,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 1.0,
+ "eval_loss": 0.004538228269666433,
+ "eval_runtime": 67.8043,
+ "eval_samples_per_second": 74.435,
+ "eval_steps_per_second": 2.33,
+ "step": 4600
+ },
+ {
+ "epoch": 3.7242472266244055,
+ "grad_norm": 0.014352944679558277,
+ "learning_rate": 3.841074180057231e-06,
+ "loss": 0.003,
+ "step": 4700
+ },
+ {
+ "epoch": 3.7242472266244055,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
+ "eval_loss": 0.004474899731576443,
+ "eval_runtime": 67.8863,
+ "eval_samples_per_second": 74.345,
+ "eval_steps_per_second": 2.327,
+ "step": 4700
+ },
+ {
+ "epoch": 3.8034865293185423,
+ "grad_norm": 0.05333567038178444,
+ "learning_rate": 2.740479859123927e-06,
+ "loss": 0.0009,
+ "step": 4800
+ },
+ {
+ "epoch": 3.8034865293185423,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
+ "eval_loss": 0.0041495212353765965,
+ "eval_runtime": 68.1146,
+ "eval_samples_per_second": 74.096,
+ "eval_steps_per_second": 2.32,
+ "step": 4800
+ },
+ {
+ "epoch": 3.882725832012678,
+ "grad_norm": 0.017361685633659363,
+ "learning_rate": 1.639885538190623e-06,
+ "loss": 0.0048,
+ "step": 4900
+ },
+ {
+ "epoch": 3.882725832012678,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9998018741607666,
+ "eval_loss": 0.004217915236949921,
+ "eval_runtime": 67.5299,
+ "eval_samples_per_second": 74.737,
+ "eval_steps_per_second": 2.34,
+ "step": 4900
+ },
+ {
+ "epoch": 3.9619651347068148,
+ "grad_norm": 0.021525979042053223,
+ "learning_rate": 5.392912172573191e-07,
+ "loss": 0.0035,
+ "step": 5000
+ },
+ {
+ "epoch": 3.9619651347068148,
+ "eval_Embedding_Dataset_Dev_cosine_accuracy": 0.9996037483215332,
+ "eval_loss": 0.004212545696645975,
+ "eval_runtime": 67.678,
+ "eval_samples_per_second": 74.574,
+ "eval_steps_per_second": 2.335,
+ "step": 5000
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 5048,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 4,
+ "save_steps": 100,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f727731b67728abf8ead49f572102e7325b7ce8d3b6383b7845533f7afe16df6
+ size 6225