hellosindh committed (verified) · Commit ec371f4 · 1 Parent(s): 6d396cd

Upload folder using huggingface_hub
README.md ADDED

---
language:
- sd
license: mit
tags:
- sindhi
- bert
- masked-language-modeling
- from-scratch
---

# Sindhi-BERT-base

The first BERT-style language model trained from scratch on Sindhi text, using a custom Sindhi BPE tokenizer with 32,000 pure Sindhi tokens.

## Model Details

| Detail | Value |
|---|---|
| Architecture | RoBERTa-base |
| Vocabulary | 32,000 tokens (pure Sindhi BPE) |
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Max length | 512 tokens |
| Parameters | ~111M |
| Language | Sindhi (sd) |
| License | MIT |

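The parameter count can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes the standard Hugging Face RoBERTa MLM layout (embeddings, 12 transformer layers, tied LM head); the shapes are derived from the config values in the table, not read from the checkpoint itself:

```python
# Rough parameter count for a RoBERTa-base masked LM with this config.
# Layer shapes are assumptions based on the standard HF RoBERTa layout.
V, H, L, I, P = 32001, 768, 12, 3072, 514  # vocab (incl. <mask>), hidden, layers, FFN, positions

embeddings = V * H + P * H + 1 * H + 2 * H   # word + position + token-type + LayerNorm
per_layer = (
    4 * (H * H + H)        # Q, K, V, and attention output projections
    + 2 * H                # attention LayerNorm
    + (H * I + I)          # FFN up-projection
    + (I * H + H)          # FFN down-projection
    + 2 * H                # output LayerNorm
)
lm_head = (H * H + H) + 2 * H + V            # dense + LayerNorm + decoder bias (weights tied)

total = embeddings + L * per_layer + lm_head
print(f"{total:,}")  # ≈ 110.7M
```

This lands at roughly 110.7M weights, which matches the ~442 MB float32 `model.safetensors` in this repo (442,633,884 bytes ≈ 110.7M params × 4 bytes, plus a small metadata header).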

## Training Details

| Detail | Value |
|---|---|
| Training data | 500K Sindhi sentences |
| Full corpus size | 447 MB clean Sindhi text |
| Epochs | 5 |
| Batch size | 256 (effective) |
| Learning rate | 1e-4 |
| Hardware | A100 GPU |
| Training time | 301.7 minutes |
| Final eval loss | 4.358 |
| Final perplexity | 78.10 |

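The final perplexity in the table is simply the exponential of the final eval loss:

```python
import math

eval_loss = 4.358                 # final eval loss from the table above
perplexity = math.exp(eval_loss)  # perplexity = e^loss for a masked-LM cross-entropy loss
print(round(perplexity, 2))       # ≈ 78.1, the value reported above
```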

## Tokenizer

A custom Sindhi BPE tokenizer with a 32,000-token vocabulary was trained specifically for the Sindhi script. Every token is a real Sindhi word or subword, unlike multilingual models such as mBERT or XLM-R, which give Sindhi very limited vocabulary coverage.

Each Sindhi word stays as ONE whole token:

Input : سنڌي ٻولي دنيا جي قديم ٻولين مان هڪ آهي

Tokens : ['سنڌي', 'ٻولي', 'دنيا', 'جي', 'قديم', 'ٻولين', 'مان', 'هڪ', 'آهي']

Count : 9 words = 9 tokens

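The word side of the "9 words = 9 tokens" claim can be checked mechanically: the example sentence has nine whitespace-separated words. (Verifying the token side end to end would require loading the actual tokenizer, e.g. via `AutoTokenizer.from_pretrained("hellosindh/sindhi-bert-base")`; the snippet below only counts words.)

```python
# Count whitespace-separated words in the example sentence from the section above.
sentence = "سنڌي ٻولي دنيا جي قديم ٻولين مان هڪ آهي"
words = sentence.split()
print(len(words))  # 9, matching the 9 tokens reported above
```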

## Fill-Mask Results

Tested on 10 Sindhi sentences after 5 epochs of training (7 shown below):

| Input | Top Prediction | Score | Quality |
|---|---|---|---|
| سنڌي ___ دنيا جي قديم ٻولين | ٻولي (language) | 15.47% | Perfect |
| شاهه لطيف سنڌي ___ جو وڏو شاعر | ادب (literature) | 16.48% | Perfect |
| استاد شاگردن کي ___ سيکاري | تعليم (education) | 5.58% | Perfect |
| دنيا ___ گھڻي مصروف آھي | ۾ (in) | 27.98% | Correct |
| سنڌ جي ___ ڏاڍي پراڻي آهي | تاريخ (history) | Top 2 | Good |
| ڪراچي سنڌ جو سڀ کان وڏو ___ آهي | شهر (city) | Top 3 | Close |
| ٻار ___ ۾ پڙهن ٿا | گهر (home) | Top 2 | Close |

Overall: 50% top-1 accuracy after 5 epochs on 500K sentences. Results are expected to improve with further training.


## Comparison With Other Models

| Model | Type | Perplexity | Fill-mask Quality |
|---|---|---|---|
| mBERT fine-tuned | Multilingual | 4.19 | Poor: predicts punctuation |
| XLM-R fine-tuned | Multilingual | 5.88 | Good: 80% correct |
| Sindhi-BERT scratch | Sindhi only | 78.10 | 50%, still improving |

Note: Perplexity is not directly comparable between from-scratch and fine-tuned models. Sindhi-BERT starts from zero knowledge, while mBERT and XLM-R start from pre-trained multilingual weights. Sindhi-BERT predictions are always real Sindhi words, never punctuation.


## Roadmap

- [x] Train custom Sindhi BPE tokenizer (32K vocab)
- [x] Session 1: 500K lines, 5 epochs, A100
- [ ] Session 2: full corpus, 2.1M lines
- [ ] Session 3: more epochs, lower learning rate
- [ ] Fine-tune for spell checking
- [ ] Fine-tune for next-word prediction
- [ ] Fine-tune for named entity recognition
- [ ] Sindhi chatbot


## Citation

If you use this model, please cite:

    @misc{sindhibert2026,
      title = {Sindhi-BERT: A Sindhi Language Model Trained From Scratch},
      year  = {2026},
      url   = {https://huggingface.co/hellosindh/sindhi-bert-base}
    }


## About

This model is part of a larger effort to build complete NLP tools for the Sindhi language, one of the oldest languages in the world with over 30 million speakers across Pakistan and India.
added_tokens.json ADDED
{
  "<mask>": 32000
}
checkpoint-8793/config.json ADDED
{
  "add_cross_attention": false,
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "classifier_dropout": null,
  "dtype": "float32",
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "tie_word_embeddings": true,
  "transformers_version": "5.0.0",
  "type_vocab_size": 1,
  "use_cache": false,
  "vocab_size": 32001
}
checkpoint-8793/model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:3a069d2f9ec70fe11d2f46141d99a3fc93a3b124370216b9a27984e81da60567
size 442633884
checkpoint-8793/optimizer.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:142c6ffe3c929b578dbde40c4a4cba70fdf48a82e83b25dd290da63160a3a637
size 885391563
checkpoint-8793/rng_state.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:255bdd7e7348d0b37b004db134ccf0e1c1680b0561813fd020977a0be34422f6
size 14645
checkpoint-8793/scaler.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:c96a74477f86632a25ad022c886ebc100a473a6980f09203c40db9c93ef40b9d
size 1383
checkpoint-8793/scheduler.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:78468c8fd1aa362c2a80b94d82a5cfc6bbb09adf44953225aa2a5052d06f485f
size 1465
checkpoint-8793/trainer_state.json ADDED
{
  "best_global_step": 8793,
  "best_metric": 4.369416236877441,
  "best_model_checkpoint": "sindhibert_scratch/checkpoint-8793",
  "epoch": 4.500224,
  "eval_steps": 977,
  "global_step": 8793,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.0512, "grad_norm": 18.65768051147461, "learning_rate": 9.900000000000002e-06, "loss": 77.516953125, "step": 100},
    {"epoch": 0.1024, "grad_norm": 13.30678939819336, "learning_rate": 1.9900000000000003e-05, "loss": 67.7811669921875, "step": 200},
    {"epoch": 0.1536, "grad_norm": 9.597126007080078, "learning_rate": 2.9900000000000002e-05, "loss": 60.4075341796875, "step": 300},
    {"epoch": 0.2048, "grad_norm": 8.602116584777832, "learning_rate": 3.99e-05, "loss": 57.3657763671875, "step": 400},
    {"epoch": 0.256, "grad_norm": 9.814536094665527, "learning_rate": 4.99e-05, "loss": 56.28333984375, "step": 500},
    {"epoch": 0.3072, "grad_norm": 11.049890518188477, "learning_rate": 5.99e-05, "loss": 55.6001611328125, "step": 600},
    {"epoch": 0.3584, "grad_norm": 11.56328010559082, "learning_rate": 6.99e-05, "loss": 55.2201416015625, "step": 700},
    {"epoch": 0.4096, "grad_norm": 13.450942993164062, "learning_rate": 7.99e-05, "loss": 54.76716796875, "step": 800},
    {"epoch": 0.4608, "grad_norm": 11.986808776855469, "learning_rate": 8.99e-05, "loss": 54.2532373046875, "step": 900},
    {"epoch": 0.500224, "eval_loss": 6.677970886230469, "eval_runtime": 53.9539, "eval_samples_per_second": 370.687, "eval_steps_per_second": 11.584, "step": 977},
    {"epoch": 0.512, "grad_norm": 11.47065258026123, "learning_rate": 9.99e-05, "loss": 53.8267431640625, "step": 1000},
    {"epoch": 0.5632, "grad_norm": 14.508301734924316, "learning_rate": 9.887115165336375e-05, "loss": 53.4335546875, "step": 1100},
    {"epoch": 0.6144, "grad_norm": 13.515727043151855, "learning_rate": 9.773090079817561e-05, "loss": 53.126240234375, "step": 1200},
    {"epoch": 0.6656, "grad_norm": 13.126218795776367, "learning_rate": 9.659064994298746e-05, "loss": 52.5950341796875, "step": 1300},
    {"epoch": 0.7168, "grad_norm": 21.854541778564453, "learning_rate": 9.545039908779932e-05, "loss": 52.1828076171875, "step": 1400},
    {"epoch": 0.768, "grad_norm": 18.417003631591797, "learning_rate": 9.431014823261119e-05, "loss": 51.821767578125, "step": 1500},
    {"epoch": 0.8192, "grad_norm": 12.346606254577637, "learning_rate": 9.316989737742304e-05, "loss": 51.36373046875, "step": 1600},
    {"epoch": 0.8704, "grad_norm": 22.9099063873291, "learning_rate": 9.202964652223489e-05, "loss": 50.9531591796875, "step": 1700},
    {"epoch": 0.9216, "grad_norm": 18.55419158935547, "learning_rate": 9.088939566704675e-05, "loss": 50.487490234375, "step": 1800},
    {"epoch": 0.9728, "grad_norm": 16.247209548950195, "learning_rate": 8.974914481185861e-05, "loss": 50.037265625, "step": 1900},
    {"epoch": 1.0, "eval_loss": 6.126930236816406, "eval_runtime": 54.5716, "eval_samples_per_second": 366.491, "eval_steps_per_second": 11.453, "step": 1954},
    {"epoch": 1.023552, "grad_norm": 15.577176094055176, "learning_rate": 8.860889395667046e-05, "loss": 49.1747216796875, "step": 2000},
    {"epoch": 1.074752, "grad_norm": 14.534530639648438, "learning_rate": 8.746864310148233e-05, "loss": 48.833271484375, "step": 2100},
    {"epoch": 1.125952, "grad_norm": 21.150440216064453, "learning_rate": 8.632839224629419e-05, "loss": 47.9638330078125, "step": 2200},
    {"epoch": 1.177152, "grad_norm": 14.35093879699707, "learning_rate": 8.518814139110604e-05, "loss": 47.6843505859375, "step": 2300},
    {"epoch": 1.228352, "grad_norm": 18.37192726135254, "learning_rate": 8.40478905359179e-05, "loss": 47.1163623046875, "step": 2400},
    {"epoch": 1.279552, "grad_norm": 15.366902351379395, "learning_rate": 8.290763968072977e-05, "loss": 46.6763720703125, "step": 2500},
    {"epoch": 1.330752, "grad_norm": 18.58373260498047, "learning_rate": 8.176738882554162e-05, "loss": 46.071142578125, "step": 2600},
    {"epoch": 1.381952, "grad_norm": 16.35076141357422, "learning_rate": 8.062713797035348e-05, "loss": 45.5451220703125, "step": 2700},
    {"epoch": 1.433152, "grad_norm": 18.8570556640625, "learning_rate": 7.948688711516535e-05, "loss": 45.1469384765625, "step": 2800},
    {"epoch": 1.484352, "grad_norm": 17.72637367248535, "learning_rate": 7.83466362599772e-05, "loss": 44.6100927734375, "step": 2900},
    {"epoch": 1.500224, "eval_loss": 5.475055694580078, "eval_runtime": 54.4157, "eval_samples_per_second": 367.541, "eval_steps_per_second": 11.486, "step": 2931},
    {"epoch": 1.535552, "grad_norm": 14.798540115356445, "learning_rate": 7.720638540478906e-05, "loss": 44.670400390625, "step": 3000},
    {"epoch": 1.5867520000000002, "grad_norm": 17.05501365661621, "learning_rate": 7.606613454960093e-05, "loss": 44.11970703125, "step": 3100},
    {"epoch": 1.6379519999999999, "grad_norm": 15.10769271850586, "learning_rate": 7.492588369441278e-05, "loss": 43.6269775390625, "step": 3200},
    {"epoch": 1.689152, "grad_norm": 16.199745178222656, "learning_rate": 7.378563283922463e-05, "loss": 43.4040771484375, "step": 3300},
    {"epoch": 1.7403520000000001, "grad_norm": 19.104358673095703, "learning_rate": 7.264538198403649e-05, "loss": 42.9161328125, "step": 3400},
    {"epoch": 1.791552, "grad_norm": 17.44623374938965, "learning_rate": 7.150513112884834e-05, "loss": 42.778564453125, "step": 3500},
    {"epoch": 1.842752, "grad_norm": 16.97149658203125, "learning_rate": 7.03648802736602e-05, "loss": 42.4598193359375, "step": 3600},
    {"epoch": 1.893952, "grad_norm": 14.990788459777832, "learning_rate": 6.922462941847207e-05, "loss": 42.2105419921875, "step": 3700},
    {"epoch": 1.945152, "grad_norm": 17.827381134033203, "learning_rate": 6.808437856328392e-05, "loss": 42.019033203125, "step": 3800},
    {"epoch": 1.996352, "grad_norm": 20.906902313232422, "learning_rate": 6.694412770809578e-05, "loss": 41.5656201171875, "step": 3900},
    {"epoch": 2.0, "eval_loss": 5.107455253601074, "eval_runtime": 54.6023, "eval_samples_per_second": 366.285, "eval_steps_per_second": 11.446, "step": 3908},
    {"epoch": 2.047104, "grad_norm": 16.346454620361328, "learning_rate": 6.580387685290765e-05, "loss": 40.89203369140625, "step": 4000},
    {"epoch": 2.098304, "grad_norm": 16.84693145751953, "learning_rate": 6.46636259977195e-05, "loss": 41.0606884765625, "step": 4100},
    {"epoch": 2.149504, "grad_norm": 18.569360733032227, "learning_rate": 6.352337514253136e-05, "loss": 40.92314697265625, "step": 4200},
    {"epoch": 2.200704, "grad_norm": 15.775079727172852, "learning_rate": 6.238312428734322e-05, "loss": 40.7518408203125, "step": 4300},
    {"epoch": 2.251904, "grad_norm": 18.271591186523438, "learning_rate": 6.124287343215507e-05, "loss": 40.4889990234375, "step": 4400},
    {"epoch": 2.303104, "grad_norm": 20.265701293945312, "learning_rate": 6.010262257696694e-05, "loss": 40.1488427734375, "step": 4500},
    {"epoch": 2.354304, "grad_norm": 19.53594398498535, "learning_rate": 5.8962371721778794e-05, "loss": 40.10376708984375, "step": 4600},
    {"epoch": 2.405504, "grad_norm": 19.707582473754883, "learning_rate": 5.782212086659066e-05, "loss": 39.953662109375, "step": 4700},
    {"epoch": 2.456704, "grad_norm": 15.16901683807373, "learning_rate": 5.6681870011402515e-05, "loss": 39.77469970703125, "step": 4800},
    {"epoch": 2.5002240000000002, "eval_loss": 4.868845462799072, "eval_runtime": 54.8341, "eval_samples_per_second": 364.737, "eval_steps_per_second": 11.398, "step": 4885},
    {"epoch": 2.507904, "grad_norm": 17.12852668762207, "learning_rate": 5.554161915621437e-05, "loss": 39.64916015625, "step": 4900},
    {"epoch": 2.559104, "grad_norm": 19.869260787963867, "learning_rate": 5.440136830102622e-05, "loss": 39.51341552734375, "step": 5000},
    {"epoch": 2.610304, "grad_norm": 17.342073440551758, "learning_rate": 5.326111744583808e-05, "loss": 39.350986328125, "step": 5100},
    {"epoch": 2.661504, "grad_norm": 19.635601043701172, "learning_rate": 5.212086659064994e-05, "loss": 39.2056787109375, "step": 5200},
    {"epoch": 2.712704, "grad_norm": 16.2427921295166, "learning_rate": 5.09806157354618e-05, "loss": 38.98402587890625, "step": 5300},
    {"epoch": 2.763904, "grad_norm": 21.025632858276367, "learning_rate": 4.984036488027366e-05, "loss": 38.84795166015625, "step": 5400},
    {"epoch": 2.815104, "grad_norm": 16.225173950195312, "learning_rate": 4.870011402508552e-05, "loss": 38.63864013671875, "step": 5500},
    {"epoch": 2.866304, "grad_norm": 19.175825119018555, "learning_rate": 4.755986316989738e-05, "loss": 38.5738232421875, "step": 5600},
    {"epoch": 2.917504, "grad_norm": 18.190704345703125, "learning_rate": 4.6419612314709235e-05, "loss": 38.47998046875, "step": 5700},
    {"epoch": 2.968704, "grad_norm": 17.764493942260742, "learning_rate": 4.52793614595211e-05, "loss": 38.36706298828125, "step": 5800},
    {"epoch": 3.0, "eval_loss": 4.690184116363525, "eval_runtime": 54.4789, "eval_samples_per_second": 367.115, "eval_steps_per_second": 11.472, "step": 5862},
    {"epoch": 3.019456, "grad_norm": 16.051603317260742, "learning_rate": 4.4139110604332956e-05, "loss": 37.86203369140625, "step": 5900},
    {"epoch": 3.070656, "grad_norm": 16.511327743530273, "learning_rate": 4.299885974914481e-05, "loss": 38.09601318359375, "step": 6000},
    {"epoch": 3.121856, "grad_norm": 16.089813232421875, "learning_rate": 4.1858608893956676e-05, "loss": 37.88030029296875, "step": 6100},
    {"epoch": 3.173056, "grad_norm": 18.612686157226562, "learning_rate": 4.0718358038768533e-05, "loss": 37.66849853515625, "step": 6200},
    {"epoch": 3.224256, "grad_norm": 17.79659652709961, "learning_rate": 3.957810718358039e-05, "loss": 37.43648681640625, "step": 6300},
    {"epoch": 3.275456, "grad_norm": 16.967939376831055, "learning_rate": 3.843785632839225e-05, "loss": 37.43171142578125, "step": 6400},
    {"epoch": 3.326656, "grad_norm": 16.4842529296875, "learning_rate": 3.7297605473204104e-05, "loss": 37.26192138671875, "step": 6500},
    {"epoch": 3.377856, "grad_norm": 16.261512756347656, "learning_rate": 3.615735461801597e-05, "loss": 37.26702392578125, "step": 6600},
    {"epoch": 3.429056, "grad_norm": 17.182903289794922, "learning_rate": 3.5017103762827825e-05, "loss": 37.16359619140625, "step": 6700},
    {"epoch": 3.480256, "grad_norm": 17.765832901000977, "learning_rate": 3.387685290763968e-05, "loss": 37.05822998046875, "step": 6800},
    {"epoch": 3.5002240000000002, "eval_loss": 4.552316188812256, "eval_runtime": 54.6295, "eval_samples_per_second": 366.103, "eval_steps_per_second": 11.441, "step": 6839},
    {"epoch": 3.531456, "grad_norm": 18.267107009887695, "learning_rate": 3.2736602052451546e-05, "loss": 36.907744140625, "step": 6900},
    {"epoch": 3.582656, "grad_norm": 20.562347412109375, "learning_rate": 3.15963511972634e-05, "loss": 36.8392919921875, "step": 7000},
    {"epoch": 3.6338559999999998, "grad_norm": 16.440811157226562, "learning_rate": 3.0456100342075257e-05, "loss": 36.639384765625, "step": 7100},
    {"epoch": 3.685056, "grad_norm": 17.070350646972656, "learning_rate": 2.9315849486887114e-05, "loss": 36.63670166015625, "step": 7200},
    {"epoch": 3.736256, "grad_norm": 17.15755844116211, "learning_rate": 2.8175598631698974e-05, "loss": 36.6428759765625, "step": 7300},
    {"epoch": 3.787456, "grad_norm": 18.988977432250977, "learning_rate": 2.7035347776510834e-05, "loss": 36.578349609375, "step": 7400},
    {"epoch": 3.838656, "grad_norm": 17.87462615966797, "learning_rate": 2.589509692132269e-05, "loss": 36.4393994140625, "step": 7500},
    {"epoch": 3.889856, "grad_norm": 18.244951248168945, "learning_rate": 2.4754846066134552e-05, "loss": 36.40805908203125, "step": 7600},
    {"epoch": 3.941056, "grad_norm": 16.166940689086914, "learning_rate": 2.361459521094641e-05, "loss": 36.23387451171875, "step": 7700},
    {"epoch": 3.9922560000000002, "grad_norm": 19.10250473022461, "learning_rate": 2.2474344355758266e-05, "loss": 36.08807373046875, "step": 7800},
    {"epoch": 4.0, "eval_loss": 4.433136940002441, "eval_runtime": 54.6448, "eval_samples_per_second": 366.0, "eval_steps_per_second": 11.438, "step": 7816},
    {"epoch": 4.043008, "grad_norm": 17.705839157104492, "learning_rate": 2.1334093500570126e-05, "loss": 35.8281103515625, "step": 7900},
    {"epoch": 4.094208, "grad_norm": 16.15859031677246, "learning_rate": 2.0193842645381987e-05, "loss": 35.8823193359375, "step": 8000},
    {"epoch": 4.145408, "grad_norm": 17.96906852722168, "learning_rate": 1.9053591790193844e-05, "loss": 35.968681640625, "step": 8100},
    {"epoch": 4.196608, "grad_norm": 16.74631118774414, "learning_rate": 1.79133409350057e-05, "loss": 35.894248046875, "step": 8200},
    {"epoch": 4.247808, "grad_norm": 16.067609786987305, "learning_rate": 1.677309007981756e-05, "loss": 35.78274169921875, "step": 8300},
    {"epoch": 4.299008, "grad_norm": 17.308250427246094, "learning_rate": 1.563283922462942e-05, "loss": 35.8246630859375, "step": 8400},
    {"epoch": 4.350208, "grad_norm": 17.30617332458496, "learning_rate": 1.4492588369441278e-05, "loss": 35.7324560546875, "step": 8500},
    {"epoch": 4.401408, "grad_norm": 16.38850975036621, "learning_rate": 1.3352337514253135e-05, "loss": 35.88875244140625, "step": 8600},
    {"epoch": 4.452608, "grad_norm": 20.229633331298828, "learning_rate": 1.2212086659064994e-05, "loss": 35.7011962890625, "step": 8700},
    {"epoch": 4.500224, "eval_loss": 4.369416236877441, "eval_runtime": 54.3995, "eval_samples_per_second": 367.651, "eval_steps_per_second": 11.489, "step": 8793}
  ],
  "logging_steps": 100,
  "max_steps": 9770,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 5,
  "save_steps": 977,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 5.922505310959043e+17,
  "train_batch_size": 32,
  "trial_name": null,
  "trial_params": null
}
checkpoint-8793/training_args.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:3de483099e0f14e67b25caaa2bbb1cb1097bf08c651d7169f2211a8fd2657c92
size 5137
checkpoint-9770/config.json ADDED
{
  "add_cross_attention": false,
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "classifier_dropout": null,
  "dtype": "float32",
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "tie_word_embeddings": true,
  "transformers_version": "5.0.0",
  "type_vocab_size": 1,
  "use_cache": false,
  "vocab_size": 32001
}
checkpoint-9770/model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:e5b34a93e4edb5d17560bd29970e86848d3ed25c9ed758b749e1f9bcbfa93606
size 442633884
checkpoint-9770/optimizer.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:bb5140302ad069df5391a6c10e4947229e01016097e341ad319e95a6d2d4a464
size 885391563
checkpoint-9770/rng_state.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:8e33e97e42ce504bb7381d1a5edb67658ff1bba022f7e975f7aec6c923c715e6
size 14645
checkpoint-9770/scaler.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:097c870922de740a0d869ff90de8d3da3716914f2348a314e93c86df9b598877
size 1383
checkpoint-9770/scheduler.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:45758048cbf361ad1cd5ce61de07bc242994aea178f2413fee70ffd4d9ed3000
size 1465
checkpoint-9770/trainer_state.json ADDED
{
  "best_global_step": 9770,
  "best_metric": 4.354217052459717,
  "best_model_checkpoint": "sindhibert_scratch/checkpoint-9770",
  "epoch": 5.0,
  "eval_steps": 977,
  "global_step": 9770,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.0512, "grad_norm": 18.65768051147461, "learning_rate": 9.900000000000002e-06, "loss": 77.516953125, "step": 100},
    {"epoch": 0.1024, "grad_norm": 13.30678939819336, "learning_rate": 1.9900000000000003e-05, "loss": 67.7811669921875, "step": 200},
    {"epoch": 0.1536, "grad_norm": 9.597126007080078, "learning_rate": 2.9900000000000002e-05, "loss": 60.4075341796875, "step": 300},
    {"epoch": 0.2048, "grad_norm": 8.602116584777832, "learning_rate": 3.99e-05, "loss": 57.3657763671875, "step": 400},
    {"epoch": 0.256, "grad_norm": 9.814536094665527, "learning_rate": 4.99e-05, "loss": 56.28333984375, "step": 500},
    {"epoch": 0.3072, "grad_norm": 11.049890518188477, "learning_rate": 5.99e-05, "loss": 55.6001611328125, "step": 600},
    {"epoch": 0.3584, "grad_norm": 11.56328010559082, "learning_rate": 6.99e-05, "loss": 55.2201416015625, "step": 700},
    {"epoch": 0.4096, "grad_norm": 13.450942993164062, "learning_rate": 7.99e-05, "loss": 54.76716796875, "step": 800},
    {"epoch": 0.4608, "grad_norm": 11.986808776855469, "learning_rate": 8.99e-05, "loss": 54.2532373046875, "step": 900},
    {"epoch": 0.500224, "eval_loss": 6.677970886230469, "eval_runtime": 53.9539, "eval_samples_per_second": 370.687, "eval_steps_per_second": 11.584, "step": 977},
    {"epoch": 0.512, "grad_norm": 11.47065258026123, "learning_rate": 9.99e-05, "loss": 53.8267431640625, "step": 1000},
    {"epoch": 0.5632, "grad_norm": 14.508301734924316, "learning_rate": 9.887115165336375e-05, "loss": 53.4335546875, "step": 1100},
    {"epoch": 0.6144, "grad_norm": 13.515727043151855, "learning_rate": 9.773090079817561e-05, "loss": 53.126240234375, "step": 1200},
    {"epoch": 0.6656, "grad_norm": 13.126218795776367, "learning_rate": 9.659064994298746e-05, "loss": 52.5950341796875, "step": 1300},
    {"epoch": 0.7168, "grad_norm": 21.854541778564453, "learning_rate": 9.545039908779932e-05, "loss": 52.1828076171875, "step": 1400},
    {"epoch": 0.768, "grad_norm": 18.417003631591797, "learning_rate": 9.431014823261119e-05, "loss": 51.821767578125, "step": 1500},
    {"epoch": 0.8192, "grad_norm": 12.346606254577637, "learning_rate": 9.316989737742304e-05, "loss": 51.36373046875, "step": 1600},
    {"epoch": 0.8704, "grad_norm": 22.9099063873291, "learning_rate": 9.202964652223489e-05, "loss": 50.9531591796875, "step": 1700},
    {"epoch": 0.9216, "grad_norm": 18.55419158935547, "learning_rate": 9.088939566704675e-05, "loss": 50.487490234375, "step": 1800},
    {"epoch": 0.9728, "grad_norm": 16.247209548950195, "learning_rate": 8.974914481185861e-05, "loss": 50.037265625, "step": 1900},
    {"epoch": 1.0, "eval_loss": 6.126930236816406, "eval_runtime": 54.5716, "eval_samples_per_second": 366.491, "eval_steps_per_second": 11.453, "step": 1954},
    {"epoch": 1.023552, "grad_norm": 15.577176094055176, "learning_rate": 8.860889395667046e-05, "loss": 49.1747216796875, "step": 2000},
    {"epoch": 1.074752, "grad_norm": 14.534530639648438, "learning_rate": 8.746864310148233e-05, "loss": 48.833271484375, "step": 2100},
    {"epoch": 1.125952, "grad_norm": 21.150440216064453, "learning_rate": 8.632839224629419e-05, "loss": 47.9638330078125, "step": 2200},
    {"epoch": 1.177152, "grad_norm": 14.35093879699707, "learning_rate": 8.518814139110604e-05, "loss": 47.6843505859375, "step": 2300}
+ "step": 2300
188
+ },
189
+ {
190
+ "epoch": 1.228352,
191
+ "grad_norm": 18.37192726135254,
192
+ "learning_rate": 8.40478905359179e-05,
193
+ "loss": 47.1163623046875,
194
+ "step": 2400
195
+ },
196
+ {
197
+ "epoch": 1.279552,
198
+ "grad_norm": 15.366902351379395,
199
+ "learning_rate": 8.290763968072977e-05,
200
+ "loss": 46.6763720703125,
201
+ "step": 2500
202
+ },
203
+ {
204
+ "epoch": 1.330752,
205
+ "grad_norm": 18.58373260498047,
206
+ "learning_rate": 8.176738882554162e-05,
207
+ "loss": 46.071142578125,
208
+ "step": 2600
209
+ },
210
+ {
211
+ "epoch": 1.381952,
212
+ "grad_norm": 16.35076141357422,
213
+ "learning_rate": 8.062713797035348e-05,
214
+ "loss": 45.5451220703125,
215
+ "step": 2700
216
+ },
217
+ {
218
+ "epoch": 1.433152,
219
+ "grad_norm": 18.8570556640625,
220
+ "learning_rate": 7.948688711516535e-05,
221
+ "loss": 45.1469384765625,
222
+ "step": 2800
223
+ },
224
+ {
225
+ "epoch": 1.484352,
226
+ "grad_norm": 17.72637367248535,
227
+ "learning_rate": 7.83466362599772e-05,
228
+ "loss": 44.6100927734375,
229
+ "step": 2900
230
+ },
231
+ {
232
+ "epoch": 1.500224,
233
+ "eval_loss": 5.475055694580078,
234
+ "eval_runtime": 54.4157,
235
+ "eval_samples_per_second": 367.541,
236
+ "eval_steps_per_second": 11.486,
237
+ "step": 2931
238
+ },
239
+ {
240
+ "epoch": 1.535552,
241
+ "grad_norm": 14.798540115356445,
242
+ "learning_rate": 7.720638540478906e-05,
243
+ "loss": 44.670400390625,
244
+ "step": 3000
245
+ },
246
+ {
247
+ "epoch": 1.5867520000000002,
248
+ "grad_norm": 17.05501365661621,
249
+ "learning_rate": 7.606613454960093e-05,
250
+ "loss": 44.11970703125,
251
+ "step": 3100
252
+ },
253
+ {
254
+ "epoch": 1.6379519999999999,
255
+ "grad_norm": 15.10769271850586,
256
+ "learning_rate": 7.492588369441278e-05,
257
+ "loss": 43.6269775390625,
258
+ "step": 3200
259
+ },
260
+ {
261
+ "epoch": 1.689152,
262
+ "grad_norm": 16.199745178222656,
263
+ "learning_rate": 7.378563283922463e-05,
264
+ "loss": 43.4040771484375,
265
+ "step": 3300
266
+ },
267
+ {
268
+ "epoch": 1.7403520000000001,
269
+ "grad_norm": 19.104358673095703,
270
+ "learning_rate": 7.264538198403649e-05,
271
+ "loss": 42.9161328125,
272
+ "step": 3400
273
+ },
274
+ {
275
+ "epoch": 1.791552,
276
+ "grad_norm": 17.44623374938965,
277
+ "learning_rate": 7.150513112884834e-05,
278
+ "loss": 42.778564453125,
279
+ "step": 3500
280
+ },
281
+ {
282
+ "epoch": 1.842752,
283
+ "grad_norm": 16.97149658203125,
284
+ "learning_rate": 7.03648802736602e-05,
285
+ "loss": 42.4598193359375,
286
+ "step": 3600
287
+ },
288
+ {
289
+ "epoch": 1.893952,
290
+ "grad_norm": 14.990788459777832,
291
+ "learning_rate": 6.922462941847207e-05,
292
+ "loss": 42.2105419921875,
293
+ "step": 3700
294
+ },
295
+ {
296
+ "epoch": 1.945152,
297
+ "grad_norm": 17.827381134033203,
298
+ "learning_rate": 6.808437856328392e-05,
299
+ "loss": 42.019033203125,
300
+ "step": 3800
301
+ },
302
+ {
303
+ "epoch": 1.996352,
304
+ "grad_norm": 20.906902313232422,
305
+ "learning_rate": 6.694412770809578e-05,
306
+ "loss": 41.5656201171875,
307
+ "step": 3900
308
+ },
309
+ {
310
+ "epoch": 2.0,
311
+ "eval_loss": 5.107455253601074,
312
+ "eval_runtime": 54.6023,
313
+ "eval_samples_per_second": 366.285,
314
+ "eval_steps_per_second": 11.446,
315
+ "step": 3908
316
+ },
317
+ {
318
+ "epoch": 2.047104,
319
+ "grad_norm": 16.346454620361328,
320
+ "learning_rate": 6.580387685290765e-05,
321
+ "loss": 40.89203369140625,
322
+ "step": 4000
323
+ },
324
+ {
325
+ "epoch": 2.098304,
326
+ "grad_norm": 16.84693145751953,
327
+ "learning_rate": 6.46636259977195e-05,
328
+ "loss": 41.0606884765625,
329
+ "step": 4100
330
+ },
331
+ {
332
+ "epoch": 2.149504,
333
+ "grad_norm": 18.569360733032227,
334
+ "learning_rate": 6.352337514253136e-05,
335
+ "loss": 40.92314697265625,
336
+ "step": 4200
337
+ },
338
+ {
339
+ "epoch": 2.200704,
340
+ "grad_norm": 15.775079727172852,
341
+ "learning_rate": 6.238312428734322e-05,
342
+ "loss": 40.7518408203125,
343
+ "step": 4300
344
+ },
345
+ {
346
+ "epoch": 2.251904,
347
+ "grad_norm": 18.271591186523438,
348
+ "learning_rate": 6.124287343215507e-05,
349
+ "loss": 40.4889990234375,
350
+ "step": 4400
351
+ },
352
+ {
353
+ "epoch": 2.303104,
354
+ "grad_norm": 20.265701293945312,
355
+ "learning_rate": 6.010262257696694e-05,
356
+ "loss": 40.1488427734375,
357
+ "step": 4500
358
+ },
359
+ {
360
+ "epoch": 2.354304,
361
+ "grad_norm": 19.53594398498535,
362
+ "learning_rate": 5.8962371721778794e-05,
363
+ "loss": 40.10376708984375,
364
+ "step": 4600
365
+ },
366
+ {
367
+ "epoch": 2.405504,
368
+ "grad_norm": 19.707582473754883,
369
+ "learning_rate": 5.782212086659066e-05,
370
+ "loss": 39.953662109375,
371
+ "step": 4700
372
+ },
373
+ {
374
+ "epoch": 2.456704,
375
+ "grad_norm": 15.16901683807373,
376
+ "learning_rate": 5.6681870011402515e-05,
377
+ "loss": 39.77469970703125,
378
+ "step": 4800
379
+ },
380
+ {
381
+ "epoch": 2.5002240000000002,
382
+ "eval_loss": 4.868845462799072,
383
+ "eval_runtime": 54.8341,
384
+ "eval_samples_per_second": 364.737,
385
+ "eval_steps_per_second": 11.398,
386
+ "step": 4885
387
+ },
388
+ {
389
+ "epoch": 2.507904,
390
+ "grad_norm": 17.12852668762207,
391
+ "learning_rate": 5.554161915621437e-05,
392
+ "loss": 39.64916015625,
393
+ "step": 4900
394
+ },
395
+ {
396
+ "epoch": 2.559104,
397
+ "grad_norm": 19.869260787963867,
398
+ "learning_rate": 5.440136830102622e-05,
399
+ "loss": 39.51341552734375,
400
+ "step": 5000
401
+ },
402
+ {
403
+ "epoch": 2.610304,
404
+ "grad_norm": 17.342073440551758,
405
+ "learning_rate": 5.326111744583808e-05,
406
+ "loss": 39.350986328125,
407
+ "step": 5100
408
+ },
409
+ {
410
+ "epoch": 2.661504,
411
+ "grad_norm": 19.635601043701172,
412
+ "learning_rate": 5.212086659064994e-05,
413
+ "loss": 39.2056787109375,
414
+ "step": 5200
415
+ },
416
+ {
417
+ "epoch": 2.712704,
418
+ "grad_norm": 16.2427921295166,
419
+ "learning_rate": 5.09806157354618e-05,
420
+ "loss": 38.98402587890625,
421
+ "step": 5300
422
+ },
423
+ {
424
+ "epoch": 2.763904,
425
+ "grad_norm": 21.025632858276367,
426
+ "learning_rate": 4.984036488027366e-05,
427
+ "loss": 38.84795166015625,
428
+ "step": 5400
429
+ },
430
+ {
431
+ "epoch": 2.815104,
432
+ "grad_norm": 16.225173950195312,
433
+ "learning_rate": 4.870011402508552e-05,
434
+ "loss": 38.63864013671875,
435
+ "step": 5500
436
+ },
437
+ {
438
+ "epoch": 2.866304,
439
+ "grad_norm": 19.175825119018555,
440
+ "learning_rate": 4.755986316989738e-05,
441
+ "loss": 38.5738232421875,
442
+ "step": 5600
443
+ },
444
+ {
445
+ "epoch": 2.917504,
446
+ "grad_norm": 18.190704345703125,
447
+ "learning_rate": 4.6419612314709235e-05,
448
+ "loss": 38.47998046875,
449
+ "step": 5700
450
+ },
451
+ {
452
+ "epoch": 2.968704,
453
+ "grad_norm": 17.764493942260742,
454
+ "learning_rate": 4.52793614595211e-05,
455
+ "loss": 38.36706298828125,
456
+ "step": 5800
457
+ },
458
+ {
459
+ "epoch": 3.0,
460
+ "eval_loss": 4.690184116363525,
461
+ "eval_runtime": 54.4789,
462
+ "eval_samples_per_second": 367.115,
463
+ "eval_steps_per_second": 11.472,
464
+ "step": 5862
465
+ },
466
+ {
467
+ "epoch": 3.019456,
468
+ "grad_norm": 16.051603317260742,
469
+ "learning_rate": 4.4139110604332956e-05,
470
+ "loss": 37.86203369140625,
471
+ "step": 5900
472
+ },
473
+ {
474
+ "epoch": 3.070656,
475
+ "grad_norm": 16.511327743530273,
476
+ "learning_rate": 4.299885974914481e-05,
477
+ "loss": 38.09601318359375,
478
+ "step": 6000
479
+ },
480
+ {
481
+ "epoch": 3.121856,
482
+ "grad_norm": 16.089813232421875,
483
+ "learning_rate": 4.1858608893956676e-05,
484
+ "loss": 37.88030029296875,
485
+ "step": 6100
486
+ },
487
+ {
488
+ "epoch": 3.173056,
489
+ "grad_norm": 18.612686157226562,
490
+ "learning_rate": 4.0718358038768533e-05,
491
+ "loss": 37.66849853515625,
492
+ "step": 6200
493
+ },
494
+ {
495
+ "epoch": 3.224256,
496
+ "grad_norm": 17.79659652709961,
497
+ "learning_rate": 3.957810718358039e-05,
498
+ "loss": 37.43648681640625,
499
+ "step": 6300
500
+ },
501
+ {
502
+ "epoch": 3.275456,
503
+ "grad_norm": 16.967939376831055,
504
+ "learning_rate": 3.843785632839225e-05,
505
+ "loss": 37.43171142578125,
506
+ "step": 6400
507
+ },
508
+ {
509
+ "epoch": 3.326656,
510
+ "grad_norm": 16.4842529296875,
511
+ "learning_rate": 3.7297605473204104e-05,
512
+ "loss": 37.26192138671875,
513
+ "step": 6500
514
+ },
515
+ {
516
+ "epoch": 3.377856,
517
+ "grad_norm": 16.261512756347656,
518
+ "learning_rate": 3.615735461801597e-05,
519
+ "loss": 37.26702392578125,
520
+ "step": 6600
521
+ },
522
+ {
523
+ "epoch": 3.429056,
524
+ "grad_norm": 17.182903289794922,
525
+ "learning_rate": 3.5017103762827825e-05,
526
+ "loss": 37.16359619140625,
527
+ "step": 6700
528
+ },
529
+ {
530
+ "epoch": 3.480256,
531
+ "grad_norm": 17.765832901000977,
532
+ "learning_rate": 3.387685290763968e-05,
533
+ "loss": 37.05822998046875,
534
+ "step": 6800
535
+ },
536
+ {
537
+ "epoch": 3.5002240000000002,
538
+ "eval_loss": 4.552316188812256,
539
+ "eval_runtime": 54.6295,
540
+ "eval_samples_per_second": 366.103,
541
+ "eval_steps_per_second": 11.441,
542
+ "step": 6839
543
+ },
544
+ {
545
+ "epoch": 3.531456,
546
+ "grad_norm": 18.267107009887695,
547
+ "learning_rate": 3.2736602052451546e-05,
548
+ "loss": 36.907744140625,
549
+ "step": 6900
550
+ },
551
+ {
552
+ "epoch": 3.582656,
553
+ "grad_norm": 20.562347412109375,
554
+ "learning_rate": 3.15963511972634e-05,
555
+ "loss": 36.8392919921875,
556
+ "step": 7000
557
+ },
558
+ {
559
+ "epoch": 3.6338559999999998,
560
+ "grad_norm": 16.440811157226562,
561
+ "learning_rate": 3.0456100342075257e-05,
562
+ "loss": 36.639384765625,
563
+ "step": 7100
564
+ },
565
+ {
566
+ "epoch": 3.685056,
567
+ "grad_norm": 17.070350646972656,
568
+ "learning_rate": 2.9315849486887114e-05,
569
+ "loss": 36.63670166015625,
570
+ "step": 7200
571
+ },
572
+ {
573
+ "epoch": 3.736256,
574
+ "grad_norm": 17.15755844116211,
575
+ "learning_rate": 2.8175598631698974e-05,
576
+ "loss": 36.6428759765625,
577
+ "step": 7300
578
+ },
579
+ {
580
+ "epoch": 3.787456,
581
+ "grad_norm": 18.988977432250977,
582
+ "learning_rate": 2.7035347776510834e-05,
583
+ "loss": 36.578349609375,
584
+ "step": 7400
585
+ },
586
+ {
587
+ "epoch": 3.838656,
588
+ "grad_norm": 17.87462615966797,
589
+ "learning_rate": 2.589509692132269e-05,
590
+ "loss": 36.4393994140625,
591
+ "step": 7500
592
+ },
593
+ {
594
+ "epoch": 3.889856,
595
+ "grad_norm": 18.244951248168945,
596
+ "learning_rate": 2.4754846066134552e-05,
597
+ "loss": 36.40805908203125,
598
+ "step": 7600
599
+ },
600
+ {
601
+ "epoch": 3.941056,
602
+ "grad_norm": 16.166940689086914,
603
+ "learning_rate": 2.361459521094641e-05,
604
+ "loss": 36.23387451171875,
605
+ "step": 7700
606
+ },
607
+ {
608
+ "epoch": 3.9922560000000002,
609
+ "grad_norm": 19.10250473022461,
610
+ "learning_rate": 2.2474344355758266e-05,
611
+ "loss": 36.08807373046875,
612
+ "step": 7800
613
+ },
614
+ {
615
+ "epoch": 4.0,
616
+ "eval_loss": 4.433136940002441,
617
+ "eval_runtime": 54.6448,
618
+ "eval_samples_per_second": 366.0,
619
+ "eval_steps_per_second": 11.438,
620
+ "step": 7816
621
+ },
622
+ {
623
+ "epoch": 4.043008,
624
+ "grad_norm": 17.705839157104492,
625
+ "learning_rate": 2.1334093500570126e-05,
626
+ "loss": 35.8281103515625,
627
+ "step": 7900
628
+ },
629
+ {
630
+ "epoch": 4.094208,
631
+ "grad_norm": 16.15859031677246,
632
+ "learning_rate": 2.0193842645381987e-05,
633
+ "loss": 35.8823193359375,
634
+ "step": 8000
635
+ },
636
+ {
637
+ "epoch": 4.145408,
638
+ "grad_norm": 17.96906852722168,
639
+ "learning_rate": 1.9053591790193844e-05,
640
+ "loss": 35.968681640625,
641
+ "step": 8100
642
+ },
643
+ {
644
+ "epoch": 4.196608,
645
+ "grad_norm": 16.74631118774414,
646
+ "learning_rate": 1.79133409350057e-05,
647
+ "loss": 35.894248046875,
648
+ "step": 8200
649
+ },
650
+ {
651
+ "epoch": 4.247808,
652
+ "grad_norm": 16.067609786987305,
653
+ "learning_rate": 1.677309007981756e-05,
654
+ "loss": 35.78274169921875,
655
+ "step": 8300
656
+ },
657
+ {
658
+ "epoch": 4.299008,
659
+ "grad_norm": 17.308250427246094,
660
+ "learning_rate": 1.563283922462942e-05,
661
+ "loss": 35.8246630859375,
662
+ "step": 8400
663
+ },
664
+ {
665
+ "epoch": 4.350208,
666
+ "grad_norm": 17.30617332458496,
667
+ "learning_rate": 1.4492588369441278e-05,
668
+ "loss": 35.7324560546875,
669
+ "step": 8500
670
+ },
671
+ {
672
+ "epoch": 4.401408,
673
+ "grad_norm": 16.38850975036621,
674
+ "learning_rate": 1.3352337514253135e-05,
675
+ "loss": 35.88875244140625,
676
+ "step": 8600
677
+ },
678
+ {
679
+ "epoch": 4.452608,
680
+ "grad_norm": 20.229633331298828,
681
+ "learning_rate": 1.2212086659064994e-05,
682
+ "loss": 35.7011962890625,
683
+ "step": 8700
684
+ },
685
+ {
686
+ "epoch": 4.500224,
687
+ "eval_loss": 4.369416236877441,
688
+ "eval_runtime": 54.3995,
689
+ "eval_samples_per_second": 367.651,
690
+ "eval_steps_per_second": 11.489,
691
+ "step": 8793
692
+ },
693
+ {
694
+ "epoch": 4.503808,
695
+ "grad_norm": 18.37810707092285,
696
+ "learning_rate": 1.1071835803876854e-05,
697
+ "loss": 35.69582763671875,
698
+ "step": 8800
699
+ },
700
+ {
701
+ "epoch": 4.555008,
702
+ "grad_norm": 16.187055587768555,
703
+ "learning_rate": 9.931584948688711e-06,
704
+ "loss": 35.73132080078125,
705
+ "step": 8900
706
+ },
707
+ {
708
+ "epoch": 4.606208,
709
+ "grad_norm": 16.89447021484375,
710
+ "learning_rate": 8.79133409350057e-06,
711
+ "loss": 35.47484375,
712
+ "step": 9000
713
+ },
714
+ {
715
+ "epoch": 4.657408,
716
+ "grad_norm": 17.172351837158203,
717
+ "learning_rate": 7.651083238312429e-06,
718
+ "loss": 35.4910205078125,
719
+ "step": 9100
720
+ },
721
+ {
722
+ "epoch": 4.708608,
723
+ "grad_norm": 18.75235939025879,
724
+ "learning_rate": 6.5108323831242875e-06,
725
+ "loss": 35.401552734375,
726
+ "step": 9200
727
+ },
728
+ {
729
+ "epoch": 4.759808,
730
+ "grad_norm": 17.79449462890625,
731
+ "learning_rate": 5.370581527936146e-06,
732
+ "loss": 35.37335693359375,
733
+ "step": 9300
734
+ },
735
+ {
736
+ "epoch": 4.811008,
737
+ "grad_norm": 16.077747344970703,
738
+ "learning_rate": 4.230330672748005e-06,
739
+ "loss": 35.395380859375,
740
+ "step": 9400
741
+ },
742
+ {
743
+ "epoch": 4.862208,
744
+ "grad_norm": 17.572834014892578,
745
+ "learning_rate": 3.0900798175598636e-06,
746
+ "loss": 35.3202685546875,
747
+ "step": 9500
748
+ },
749
+ {
750
+ "epoch": 4.913408,
751
+ "grad_norm": 17.8610897064209,
752
+ "learning_rate": 1.949828962371722e-06,
753
+ "loss": 35.46013916015625,
754
+ "step": 9600
755
+ },
756
+ {
757
+ "epoch": 4.964608,
758
+ "grad_norm": 18.012996673583984,
759
+ "learning_rate": 8.095781071835805e-07,
760
+ "loss": 35.363046875,
761
+ "step": 9700
762
+ },
763
+ {
764
+ "epoch": 5.0,
765
+ "eval_loss": 4.354217052459717,
766
+ "eval_runtime": 54.5977,
767
+ "eval_samples_per_second": 366.316,
768
+ "eval_steps_per_second": 11.447,
769
+ "step": 9770
770
+ }
771
+ ],
772
+ "logging_steps": 100,
773
+ "max_steps": 9770,
774
+ "num_input_tokens_seen": 0,
775
+ "num_train_epochs": 5,
776
+ "save_steps": 977,
777
+ "stateful_callbacks": {
778
+ "TrainerControl": {
779
+ "args": {
780
+ "should_epoch_stop": false,
781
+ "should_evaluate": false,
782
+ "should_log": false,
783
+ "should_save": true,
784
+ "should_training_stop": true
785
+ },
786
+ "attributes": {}
787
+ }
788
+ },
789
+ "total_flos": 6.5802339072e+17,
790
+ "train_batch_size": 32,
791
+ "trial_name": null,
792
+ "trial_params": null
793
+ }
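The eval losses logged above map directly onto the perplexity figures quoted in the model card, since perplexity is just exp of the cross-entropy loss. The logged learning rates are also consistent with linear warmup to the 1e-4 peak over the first 1,000 steps, then linear decay to zero at step 9,770 (the `lr_at` function below is a reconstruction from the logged values, not the original training code):

```python
import math

# Perplexity = exp(cross-entropy loss). The final eval_loss of ~4.354 at
# epoch 5.0 corresponds to a perplexity of about 77.8.
final_eval_loss = 4.354217052459717
print(f"perplexity = {math.exp(final_eval_loss):.2f}")


def lr_at(step, peak=1e-4, warmup=1000, total=9770):
    """Learning rate implied by the log above (reconstruction): linear
    warmup for `warmup` steps, then linear decay to zero at `total`.
    The value logged at `step` reflects the state after step-1 updates."""
    s = step - 1
    if s < warmup:
        return peak * s / warmup
    return peak * (total - s) / (total - warmup)


print(lr_at(200))   # matches the logged 1.99e-05
print(lr_at(1100))  # matches the logged 9.887115...e-05
```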
checkpoint-9770/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3de483099e0f14e67b25caaa2bbb1cb1097bf08c651d7169f2211a8fd2657c92
+ size 5137
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "add_cross_attention": false,
+ "architectures": [
+ "RobertaForMaskedLM"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 1,
+ "classifier_dropout": null,
+ "dtype": "float32",
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "is_decoder": false,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 514,
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "tie_word_embeddings": true,
+ "transformers_version": "5.0.0",
+ "type_vocab_size": 1,
+ "use_cache": false,
+ "vocab_size": 32001
+ }
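One detail worth noting in this config: RoBERTa-style checkpoints conventionally allocate two extra position-embedding slots (an artifact of position ids starting past the padding index), so the `max_position_embeddings` of 514 here corresponds to the 512-token maximum length quoted in the model card. A quick sanity check:

```python
# RoBERTa convention: position slots = max sequence length + 2,
# so config.json's 514 implies the card's 512-token max length.
max_position_embeddings = 514  # from config.json above
max_length = max_position_embeddings - 2
print(max_length)  # 512
```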
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5b34a93e4edb5d17560bd29970e86848d3ed25c9ed758b749e1f9bcbfa93606
+ size 442633884
sindhi_bpe_32k.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4b0bb0c4dfcc9dac594b288c5eb6bb103388fc39f75d40003d4d6a2ddf8cf46
+ size 644934
tokenizer_config.json ADDED
@@ -0,0 +1,53 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32000": {
+ "content": "<mask>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": null,
+ "backend": "custom",
+ "bos_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<pad>",
+ "tokenizer_class": "SindhiTokenizer",
+ "unk_token": "<unk>"
+ }
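The vocabulary arithmetic behind this file: the BPE model holds 32,000 pieces, and `<mask>` is appended at id 32000, which is why config.json declares a `vocab_size` of 32,001. (Note that the tokenizer places `<s>`/`</s>` at ids 2/3 while config.json sets `bos_token_id` 1 and `eos_token_id` 2; anyone wiring the checkpoint up by hand should check which convention the custom `SindhiTokenizer` actually follows.)

```python
# Special-token ids as declared in tokenizer_config.json.
special_tokens = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3, "<mask>": 32000}

# 32,000 BPE pieces plus <mask> appended at id 32000 give the model's
# vocab_size of 32,001 (see config.json).
vocab_size = max(special_tokens.values()) + 1
print(vocab_size)  # 32001
```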
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3de483099e0f14e67b25caaa2bbb1cb1097bf08c651d7169f2211a8fd2657c92
+ size 5137