hellosindh commited on
Commit
0ad343a
·
verified ·
1 Parent(s): b500d27

Upload folder using huggingface_hub

Browse files
README.md CHANGED
@@ -11,73 +11,37 @@ tags:
11
 
12
  # Sindhi-BERT-base
13
 
14
- The first BERT-style language model trained from scratch on Sindhi text, using a custom Sindhi BPE tokenizer with 32,000 pure Sindhi tokens.
15
 
16
  ## Training History
17
 
18
- | Session | Data | Epochs | Perplexity | Time |
19
  |---|---|---|---|---|
20
- | Session 1 | 500K lines | 5 | 78.10 | 301 min |
21
- | Session 2 | 1.5M lines | 3 | 41.62 | 359 min |
 
22
 
23
  ## Model Details
24
 
25
  | Detail | Value |
26
  |---|---|
27
  | Architecture | RoBERTa-base |
28
- | Vocabulary | 32,000 tokens (pure Sindhi BPE) |
29
- | Hidden size | 768 |
30
- | Layers | 12 |
31
- | Attention heads | 12 |
32
- | Max length | 512 tokens |
33
  | Parameters | ~125M |
34
  | Language | Sindhi (sd) |
35
  | License | MIT |
36
 
37
- ## Fill-Mask Quality
38
 
39
- | Session | Score |
40
- |---|---|
41
- | Session 1 | 50% (5/10) |
42
- | Session 2 | 70% (7/10) |
43
-
44
- ## Fill-Mask Examples (Session 2)
45
-
46
- | Input | Top Prediction | Confidence | Quality |
47
- |---|---|---|---|
48
- | پاڪستان ۾ سنڌي ___ گهڻي تعداد | ماڻهو (people) | 49.78% | Perfect |
49
- | سنڌي ___ دنيا جي قديم ٻولين | ٻولي (language) | 22.25% | Perfect |
50
- | شاهه لطيف سنڌي ___ جو وڏو شاعر | شاعريءَ | 22.22% | Perfect |
51
- | استاد شاگردن کي ___ سيکاري | تعليم (education) | 10.61% | Good |
52
- | ڪراچي سنڌ جو سڀ کان وڏو ___ | شهر (city) | 9.04% | Perfect |
53
- | سنڌ جي ___ ڏاڍي پراڻي آهي | تاريخ (history) | 7.48% | Perfect |
54
- | دنيا ___ گھڻي مصروف آھي | ۾ (in) | 38.99% | Perfect |
55
-
56
- ## Tokenizer
57
-
58
- Custom Sindhi BPE tokenizer — every Sindhi word stays as ONE token:
59
-
60
- Input : سنڌي ٻولي دنيا جي قديم
61
- Tokens : ['سنڌي', 'ٻولي', 'دنيا', 'جي', 'قديم']
62
- Count : 5 words = 5 tokens
63
-
64
- ## Comparison With Other Models
65
-
66
- | Model | Type | Perplexity | Fill-mask |
67
- |---|---|---|---|
68
- | mBERT fine-tuned | Multilingual | 4.19 | Poor — predicts punctuation |
69
- | XLM-R fine-tuned | Multilingual | 5.88 | 80% correct |
70
- | Sindhi-BERT S1 | Sindhi only | 78.10 | 50% |
71
- | Sindhi-BERT S2 | Sindhi only | 41.62 | 70% |
72
-
73
- ## Roadmap
74
-
75
- - [x] Custom Sindhi BPE tokenizer (32K vocab)
76
- - [x] Session 1 — 500K lines, 5 epochs, perplexity 78
77
- - [x] Session 2 — 1.5M lines, 3 epochs, perplexity 41
78
- - [ ] Session 3 — new data + full corpus
79
- - [ ] Session 4 — lower LR, fine-tuning
80
- - [ ] Spell checker fine-tuning
81
- - [ ] Next word prediction
82
- - [ ] Sindhi chatbot
83
11
 
12
  # Sindhi-BERT-base
13
 
14
+ The first BERT-style language model trained from scratch on Sindhi text.
15
 
16
  ## Training History
17
 
18
+ | Session | Data | Epochs | PPL | Time |
19
  |---|---|---|---|---|
20
+ | Session 1 | 500K lines | 5 | 78.10 | 301 min |
21
+ | Session 2 | 1.5M lines | 3 | 41.62 | 359 min |
22
+ | Session 3 | 74M words | 2 | 27.99 | 224 min |
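
The PPL column appears to be the exponential of the evaluation cross-entropy loss: the best Session 3 checkpoint in this commit's `trainer_state.json` reports `eval_loss` ≈ 3.3484, and exp(3.3484) ≈ 28.46, in the same range as the reported 27.99 (the small gap is presumably a different eval split or averaging). A minimal sketch of the conversion, assuming natural-log cross-entropy:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp of the (natural-log) cross-entropy loss."""
    return math.exp(cross_entropy_loss)

# eval_loss of the best Session 3 checkpoint (from trainer_state.json below)
print(round(perplexity(3.3484463691711426), 2))  # → 28.46
```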
23
 
24
  ## Model Details
25
 
26
  | Detail | Value |
27
  |---|---|
28
  | Architecture | RoBERTa-base |
29
+ | Vocabulary | 32,000 pure Sindhi BPE tokens |
30
  | Parameters | ~125M |
31
  | Language | Sindhi (sd) |
32
  | License | MIT |
33
 
34
+ ## Usage
35
 
36
+ ```python
37
+ from transformers import AutoModelForMaskedLM
38
+ import sentencepiece as spm, torch
39
+ import torch.nn.functional as F
40
+ from huggingface_hub import hf_hub_download
41
 
42
+ model = AutoModelForMaskedLM.from_pretrained("hellosindh/sindhi-bert-base")
43
+ sp_path = hf_hub_download("hellosindh/sindhi-bert-base", "sindhi_bpe_32k.model")
44
+ sp = spm.SentencePieceProcessor()
45
+ sp.Load(sp_path)
46
+ MASK_ID = 32000
47
+ ```
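
The usage snippet above stops after loading the model and tokenizer; the fill-mask step itself is not shown in this diff, but it presumably runs the model on a masked input and softmaxes the logits at the `[MASK]` position (which is why `torch.nn.functional` is imported). A hypothetical, self-contained sketch of that scoring step in plain Python; the candidate tokens and logit values here are made up for illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits at the [MASK] position for three candidate tokens
candidates = ["ماڻهو", "ٻولي", "شهر"]
logits = [4.2, 3.5, 1.1]

probs = softmax(logits)
best = max(range(len(candidates)), key=lambda i: probs[i])
print(candidates[best], round(probs[best], 4))  # → ماڻهو 0.6487
```

With the real model, the same idea applies to the row of `model(**inputs).logits` at the mask index, followed by a top-k over the 32,001-entry vocabulary.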
checkpoint-11542/config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "add_cross_attention": false,
3
+ "architectures": [
4
+ "RobertaForMaskedLM"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 1,
8
+ "classifier_dropout": null,
9
+ "dtype": "float32",
10
+ "eos_token_id": 2,
11
+ "hidden_act": "gelu",
12
+ "hidden_dropout_prob": 0.1,
13
+ "hidden_size": 768,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 3072,
16
+ "is_decoder": false,
17
+ "layer_norm_eps": 1e-12,
18
+ "max_position_embeddings": 514,
19
+ "model_type": "roberta",
20
+ "num_attention_heads": 12,
21
+ "num_hidden_layers": 12,
22
+ "pad_token_id": 0,
23
+ "tie_word_embeddings": true,
24
+ "transformers_version": "5.0.0",
25
+ "type_vocab_size": 1,
26
+ "use_cache": false,
27
+ "vocab_size": 32001
28
+ }
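
One detail worth noting in this config: `vocab_size` is 32001, which appears to be the 32,000-token Sindhi BPE vocabulary plus one extra id. That matches `MASK_ID = 32000` in the README's usage snippet, i.e. ids 0 to 31999 come from the SentencePiece model and the mask token takes the first id past the BPE range. A quick sanity check of that arithmetic (this reading of the extra id is an assumption, not stated in the repo):

```python
import json

# A trimmed copy of the checkpoint config (values taken from config.json above)
config = json.loads("""
{
  "model_type": "roberta",
  "hidden_size": 768,
  "num_hidden_layers": 12,
  "num_attention_heads": 12,
  "vocab_size": 32001
}
""")

BPE_VOCAB = 32000      # custom Sindhi BPE tokenizer size
MASK_ID = BPE_VOCAB    # the README's usage snippet sets MASK_ID = 32000
assert config["vocab_size"] == BPE_VOCAB + 1  # one extra id reserved for the mask
print(config["vocab_size"])  # → 32001
```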
checkpoint-11542/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:af548243f2a2884a4a369c6b04c497110cb9a587cea0a5041e9a0820c72889ef
3
+ size 442633860
checkpoint-11542/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:77744d027c70a5ef30bd24f999356c00ee3fe802c5ecaba8358378793b993c8b
3
+ size 885391563
checkpoint-11542/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c9053f5f5001dda2a9608877e8245dbe87ed3333b99010161983641bb611e08e
3
+ size 14645
checkpoint-11542/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:30654f83d141e16e96d98bb3d11de7ded54c6c2d945e1c8c11c08529794cf46c
3
+ size 1465
checkpoint-11542/trainer_state.json ADDED
@@ -0,0 +1,864 @@
1
+ {
2
+ "best_global_step": 11542,
3
+ "best_metric": 3.3484463691711426,
4
+ "best_model_checkpoint": "sindhibert_session3/checkpoint-11542",
5
+ "epoch": 2.0,
6
+ "eval_steps": 5771,
7
+ "global_step": 11542,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.017328019407381736,
14
+ "grad_norm": 9.74232006072998,
15
+ "learning_rate": 5.147313691507799e-06,
16
+ "loss": 16.534342041015623,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.03465603881476347,
21
+ "grad_norm": 9.413031578063965,
22
+ "learning_rate": 1.0346620450606586e-05,
23
+ "loss": 16.06064208984375,
24
+ "step": 200
25
+ },
26
+ {
27
+ "epoch": 0.05198405822214521,
28
+ "grad_norm": 9.366157531738281,
29
+ "learning_rate": 1.554592720970537e-05,
30
+ "loss": 15.73246337890625,
31
+ "step": 300
32
+ },
33
+ {
34
+ "epoch": 0.06931207762952694,
35
+ "grad_norm": 8.934579849243164,
36
+ "learning_rate": 2.074523396880416e-05,
37
+ "loss": 15.634798583984375,
38
+ "step": 400
39
+ },
40
+ {
41
+ "epoch": 0.08664009703690868,
42
+ "grad_norm": 9.873139381408691,
43
+ "learning_rate": 2.594454072790295e-05,
44
+ "loss": 15.491142578125,
45
+ "step": 500
46
+ },
47
+ {
48
+ "epoch": 0.10396811644429042,
49
+ "grad_norm": 9.112743377685547,
50
+ "learning_rate": 2.9999702019626288e-05,
51
+ "loss": 15.47271728515625,
52
+ "step": 600
53
+ },
54
+ {
55
+ "epoch": 0.12129613585167215,
56
+ "grad_norm": 8.721996307373047,
57
+ "learning_rate": 2.999083739047451e-05,
58
+ "loss": 15.291612548828125,
59
+ "step": 700
60
+ },
61
+ {
62
+ "epoch": 0.1386241552590539,
63
+ "grad_norm": 8.849467277526855,
64
+ "learning_rate": 2.9969667845201166e-05,
65
+ "loss": 15.32687255859375,
66
+ "step": 800
67
+ },
68
+ {
69
+ "epoch": 0.15595217466643563,
70
+ "grad_norm": 8.970343589782715,
71
+ "learning_rate": 2.9936210760385845e-05,
72
+ "loss": 15.221800537109376,
73
+ "step": 900
74
+ },
75
+ {
76
+ "epoch": 0.17328019407381737,
77
+ "grad_norm": 9.423188209533691,
78
+ "learning_rate": 2.9890493598578603e-05,
79
+ "loss": 15.21154541015625,
80
+ "step": 1000
81
+ },
82
+ {
83
+ "epoch": 0.1906082134811991,
84
+ "grad_norm": 10.529290199279785,
85
+ "learning_rate": 2.9832553885757926e-05,
86
+ "loss": 15.091610107421875,
87
+ "step": 1100
88
+ },
89
+ {
90
+ "epoch": 0.20793623288858085,
91
+ "grad_norm": 8.895530700683594,
92
+ "learning_rate": 2.97624391805283e-05,
93
+ "loss": 15.116024169921875,
94
+ "step": 1200
95
+ },
96
+ {
97
+ "epoch": 0.22526425229596256,
98
+ "grad_norm": 9.481012344360352,
99
+ "learning_rate": 2.968020703508272e-05,
100
+ "loss": 15.086820068359375,
101
+ "step": 1300
102
+ },
103
+ {
104
+ "epoch": 0.2425922717033443,
105
+ "grad_norm": 8.957283020019531,
106
+ "learning_rate": 2.9585924947962195e-05,
107
+ "loss": 15.09182373046875,
108
+ "step": 1400
109
+ },
110
+ {
111
+ "epoch": 0.25992029111072606,
112
+ "grad_norm": 8.475807189941406,
113
+ "learning_rate": 2.9479670308650942e-05,
114
+ "loss": 14.974696044921876,
115
+ "step": 1500
116
+ },
117
+ {
118
+ "epoch": 0.2772483105181078,
119
+ "grad_norm": 8.860872268676758,
120
+ "learning_rate": 2.9361530334052883e-05,
121
+ "loss": 14.967041015625,
122
+ "step": 1600
123
+ },
124
+ {
125
+ "epoch": 0.29457632992548954,
126
+ "grad_norm": 8.990629196166992,
127
+ "learning_rate": 2.9231601996901433e-05,
128
+ "loss": 14.9465673828125,
129
+ "step": 1700
130
+ },
131
+ {
132
+ "epoch": 0.31190434933287126,
133
+ "grad_norm": 9.683910369873047,
134
+ "learning_rate": 2.9089991946161484e-05,
135
+ "loss": 14.9761962890625,
136
+ "step": 1800
137
+ },
138
+ {
139
+ "epoch": 0.32923236874025297,
140
+ "grad_norm": 9.044540405273438,
141
+ "learning_rate": 2.89368164194888e-05,
142
+ "loss": 14.89200927734375,
143
+ "step": 1900
144
+ },
145
+ {
146
+ "epoch": 0.34656038814763473,
147
+ "grad_norm": 8.935420036315918,
148
+ "learning_rate": 2.8772201147818787e-05,
149
+ "loss": 14.9054736328125,
150
+ "step": 2000
151
+ },
152
+ {
153
+ "epoch": 0.36388840755501645,
154
+ "grad_norm": 8.12104320526123,
155
+ "learning_rate": 2.8596281252162868e-05,
156
+ "loss": 14.8011767578125,
157
+ "step": 2100
158
+ },
159
+ {
160
+ "epoch": 0.3812164269623982,
161
+ "grad_norm": 9.633867263793945,
162
+ "learning_rate": 2.840920113269721e-05,
163
+ "loss": 14.789473876953124,
164
+ "step": 2200
165
+ },
166
+ {
167
+ "epoch": 0.3985444463697799,
168
+ "grad_norm": 9.07466983795166,
169
+ "learning_rate": 2.8211114350234873e-05,
170
+ "loss": 14.80165283203125,
171
+ "step": 2300
172
+ },
173
+ {
174
+ "epoch": 0.4158724657771617,
175
+ "grad_norm": 9.412736892700195,
176
+ "learning_rate": 2.8002183500178594e-05,
177
+ "loss": 14.746627197265624,
178
+ "step": 2400
179
+ },
180
+ {
181
+ "epoch": 0.4332004851845434,
182
+ "grad_norm": 9.755793571472168,
183
+ "learning_rate": 2.7782580079057772e-05,
184
+ "loss": 14.778804931640625,
185
+ "step": 2500
186
+ },
187
+ {
188
+ "epoch": 0.4505285045919251,
189
+ "grad_norm": 9.882634162902832,
190
+ "learning_rate": 2.7552484343759096e-05,
191
+ "loss": 14.704544677734376,
192
+ "step": 2600
193
+ },
194
+ {
195
+ "epoch": 0.4678565239993069,
196
+ "grad_norm": 9.305146217346191,
197
+ "learning_rate": 2.731208516356645e-05,
198
+ "loss": 14.75770751953125,
199
+ "step": 2700
200
+ },
201
+ {
202
+ "epoch": 0.4851845434066886,
203
+ "grad_norm": 9.269790649414062,
204
+ "learning_rate": 2.7061579865131508e-05,
205
+ "loss": 14.68646484375,
206
+ "step": 2800
207
+ },
208
+ {
209
+ "epoch": 0.5025125628140703,
210
+ "grad_norm": 9.310648918151855,
211
+ "learning_rate": 2.6801174070502248e-05,
212
+ "loss": 14.635621337890624,
213
+ "step": 2900
214
+ },
215
+ {
216
+ "epoch": 0.5198405822214521,
217
+ "grad_norm": 9.239577293395996,
218
+ "learning_rate": 2.653108152834241e-05,
219
+ "loss": 14.71250732421875,
220
+ "step": 3000
221
+ },
222
+ {
223
+ "epoch": 0.5371686016288338,
224
+ "grad_norm": 9.674842834472656,
225
+ "learning_rate": 2.6251523938480346e-05,
226
+ "loss": 14.602254638671875,
227
+ "step": 3100
228
+ },
229
+ {
230
+ "epoch": 0.5544966210362156,
231
+ "grad_norm": 10.178524017333984,
232
+ "learning_rate": 2.5962730769931346e-05,
233
+ "loss": 14.558492431640625,
234
+ "step": 3200
235
+ },
236
+ {
237
+ "epoch": 0.5718246404435973,
238
+ "grad_norm": 9.312729835510254,
239
+ "learning_rate": 2.5664939072542787e-05,
240
+ "loss": 14.588648681640626,
241
+ "step": 3300
242
+ },
243
+ {
244
+ "epoch": 0.5891526598509791,
245
+ "grad_norm": 9.438308715820312,
246
+ "learning_rate": 2.5358393282416714e-05,
247
+ "loss": 14.535865478515625,
248
+ "step": 3400
249
+ },
250
+ {
251
+ "epoch": 0.6064806792583608,
252
+ "grad_norm": 8.51146125793457,
253
+ "learning_rate": 2.5043345021269554e-05,
254
+ "loss": 14.5489208984375,
255
+ "step": 3500
256
+ },
257
+ {
258
+ "epoch": 0.6238086986657425,
259
+ "grad_norm": 9.856837272644043,
260
+ "learning_rate": 2.4720052889893698e-05,
261
+ "loss": 14.565177001953124,
262
+ "step": 3600
263
+ },
264
+ {
265
+ "epoch": 0.6411367180731242,
266
+ "grad_norm": 9.223260879516602,
267
+ "learning_rate": 2.4388782255890405e-05,
268
+ "loss": 14.452093505859375,
269
+ "step": 3700
270
+ },
271
+ {
272
+ "epoch": 0.6584647374805059,
273
+ "grad_norm": 9.016181945800781,
274
+ "learning_rate": 2.404980503584838e-05,
275
+ "loss": 14.49298828125,
276
+ "step": 3800
277
+ },
278
+ {
279
+ "epoch": 0.6757927568878878,
280
+ "grad_norm": 9.865802764892578,
281
+ "learning_rate": 2.370339947214669e-05,
282
+ "loss": 14.474598388671875,
283
+ "step": 3900
284
+ },
285
+ {
286
+ "epoch": 0.6931207762952695,
287
+ "grad_norm": 8.965621948242188,
288
+ "learning_rate": 2.3349849904565318e-05,
289
+ "loss": 14.46911376953125,
290
+ "step": 4000
291
+ },
292
+ {
293
+ "epoch": 0.7104487957026512,
294
+ "grad_norm": 8.362798690795898,
295
+ "learning_rate": 2.2989446536890786e-05,
296
+ "loss": 14.390712890625,
297
+ "step": 4100
298
+ },
299
+ {
300
+ "epoch": 0.7277768151100329,
301
+ "grad_norm": 10.564478874206543,
302
+ "learning_rate": 2.2622485198708445e-05,
303
+ "loss": 14.45989501953125,
304
+ "step": 4200
305
+ },
306
+ {
307
+ "epoch": 0.7451048345174146,
308
+ "grad_norm": 9.188340187072754,
309
+ "learning_rate": 2.2249267102576903e-05,
310
+ "loss": 14.422335205078125,
311
+ "step": 4300
312
+ },
313
+ {
314
+ "epoch": 0.7624328539247964,
315
+ "grad_norm": 9.867836952209473,
316
+ "learning_rate": 2.1870098596784012e-05,
317
+ "loss": 14.341461181640625,
318
+ "step": 4400
319
+ },
320
+ {
321
+ "epoch": 0.7797608733321781,
322
+ "grad_norm": 9.469503402709961,
323
+ "learning_rate": 2.148529091388725e-05,
324
+ "loss": 14.42570556640625,
325
+ "step": 4500
326
+ },
327
+ {
328
+ "epoch": 0.7970888927395599,
329
+ "grad_norm": 9.195992469787598,
330
+ "learning_rate": 2.1095159915244956e-05,
331
+ "loss": 14.3226025390625,
332
+ "step": 4600
333
+ },
334
+ {
335
+ "epoch": 0.8144169121469416,
336
+ "grad_norm": 9.930395126342773,
337
+ "learning_rate": 2.070002583174816e-05,
338
+ "loss": 14.317152099609375,
339
+ "step": 4700
340
+ },
341
+ {
342
+ "epoch": 0.8317449315543234,
343
+ "grad_norm": 9.45024299621582,
344
+ "learning_rate": 2.0300213000965707e-05,
345
+ "loss": 14.355799560546876,
346
+ "step": 4800
347
+ },
348
+ {
349
+ "epoch": 0.8490729509617051,
350
+ "grad_norm": 9.889897346496582,
351
+ "learning_rate": 1.989604960091854e-05,
352
+ "loss": 14.314393310546874,
353
+ "step": 4900
354
+ },
355
+ {
356
+ "epoch": 0.8664009703690868,
357
+ "grad_norm": 10.8844575881958,
358
+ "learning_rate": 1.948786738070162e-05,
359
+ "loss": 14.279014892578125,
360
+ "step": 5000
361
+ },
362
+ {
363
+ "epoch": 0.8837289897764685,
364
+ "grad_norm": 9.387309074401855,
365
+ "learning_rate": 1.9076001388174608e-05,
366
+ "loss": 14.240478515625,
367
+ "step": 5100
368
+ },
369
+ {
370
+ "epoch": 0.9010570091838502,
371
+ "grad_norm": 10.535667419433594,
372
+ "learning_rate": 1.866078969494479e-05,
373
+ "loss": 14.26585205078125,
374
+ "step": 5200
375
+ },
376
+ {
377
+ "epoch": 0.918385028591232,
378
+ "grad_norm": 9.147391319274902,
379
+ "learning_rate": 1.8242573118868094e-05,
380
+ "loss": 14.309058837890625,
381
+ "step": 5300
382
+ },
383
+ {
384
+ "epoch": 0.9357130479986138,
385
+ "grad_norm": 9.556977272033691,
386
+ "learning_rate": 1.7821694944295836e-05,
387
+ "loss": 14.21564453125,
388
+ "step": 5400
389
+ },
390
+ {
391
+ "epoch": 0.9530410674059955,
392
+ "grad_norm": 9.025933265686035,
393
+ "learning_rate": 1.7398500640296928e-05,
394
+ "loss": 14.192568359375,
395
+ "step": 5500
396
+ },
397
+ {
398
+ "epoch": 0.9703690868133772,
399
+ "grad_norm": 9.630436897277832,
400
+ "learning_rate": 1.6973337577086803e-05,
401
+ "loss": 14.193314208984376,
402
+ "step": 5600
403
+ },
404
+ {
405
+ "epoch": 0.987697106220759,
406
+ "grad_norm": 9.064878463745117,
407
+ "learning_rate": 1.6546554740895815e-05,
408
+ "loss": 14.1739111328125,
409
+ "step": 5700
410
+ },
411
+ {
412
+ "epoch": 1.0,
413
+ "eval_loss": 3.394857168197632,
414
+ "eval_runtime": 22.6074,
415
+ "eval_samples_per_second": 660.048,
416
+ "eval_steps_per_second": 10.351,
417
+ "step": 5771
418
+ },
419
+ {
420
+ "epoch": 1.0050251256281406,
421
+ "grad_norm": 10.425875663757324,
422
+ "learning_rate": 1.611850244751118e-05,
423
+ "loss": 14.170721435546875,
424
+ "step": 5800
425
+ },
426
+ {
427
+ "epoch": 1.0223531450355225,
428
+ "grad_norm": 8.938867568969727,
429
+ "learning_rate": 1.5689532054727568e-05,
430
+ "loss": 14.155902099609374,
431
+ "step": 5900
432
+ },
433
+ {
434
+ "epoch": 1.0396811644429043,
435
+ "grad_norm": 9.677651405334473,
436
+ "learning_rate": 1.525999567394238e-05,
437
+ "loss": 14.137279052734375,
438
+ "step": 6000
439
+ },
440
+ {
441
+ "epoch": 1.057009183850286,
442
+ "grad_norm": 9.292587280273438,
443
+ "learning_rate": 1.4830245881132463e-05,
444
+ "loss": 14.072491455078126,
445
+ "step": 6100
446
+ },
447
+ {
448
+ "epoch": 1.0743372032576677,
449
+ "grad_norm": 9.849185943603516,
450
+ "learning_rate": 1.4400635427449486e-05,
451
+ "loss": 14.121292724609376,
452
+ "step": 6200
453
+ },
454
+ {
455
+ "epoch": 1.0916652226650494,
456
+ "grad_norm": 9.886627197265625,
457
+ "learning_rate": 1.3971516949671474e-05,
458
+ "loss": 14.058907470703126,
459
+ "step": 6300
460
+ },
461
+ {
462
+ "epoch": 1.108993242072431,
463
+ "grad_norm": 10.207225799560547,
464
+ "learning_rate": 1.3543242680748322e-05,
465
+ "loss": 14.07645263671875,
466
+ "step": 6400
467
+ },
468
+ {
469
+ "epoch": 1.1263212614798128,
470
+ "grad_norm": 10.271860122680664,
471
+ "learning_rate": 1.311616416067868e-05,
472
+ "loss": 14.01097412109375,
473
+ "step": 6500
474
+ },
475
+ {
476
+ "epoch": 1.1436492808871945,
477
+ "grad_norm": 9.155773162841797,
478
+ "learning_rate": 1.2690631947955715e-05,
479
+ "loss": 14.044959716796875,
480
+ "step": 6600
481
+ },
482
+ {
483
+ "epoch": 1.1609773002945762,
484
+ "grad_norm": 9.51415729522705,
485
+ "learning_rate": 1.2266995331818446e-05,
486
+ "loss": 14.045927734375,
487
+ "step": 6700
488
+ },
489
+ {
490
+ "epoch": 1.1783053197019582,
491
+ "grad_norm": 9.813915252685547,
492
+ "learning_rate": 1.184560204554501e-05,
493
+ "loss": 14.03373291015625,
494
+ "step": 6800
495
+ },
496
+ {
497
+ "epoch": 1.1956333391093399,
498
+ "grad_norm": 9.063379287719727,
499
+ "learning_rate": 1.1426797981023001e-05,
500
+ "loss": 14.052874755859374,
501
+ "step": 6900
502
+ },
503
+ {
504
+ "epoch": 1.2129613585167216,
505
+ "grad_norm": 9.815446853637695,
506
+ "learning_rate": 1.1010926904831378e-05,
507
+ "loss": 14.02966552734375,
508
+ "step": 7000
509
+ },
510
+ {
511
+ "epoch": 1.2302893779241033,
512
+ "grad_norm": 9.000835418701172,
513
+ "learning_rate": 1.0598330176066803e-05,
514
+ "loss": 14.0574609375,
515
+ "step": 7100
516
+ },
517
+ {
518
+ "epoch": 1.247617397331485,
519
+ "grad_norm": 9.734049797058105,
520
+ "learning_rate": 1.0189346466146175e-05,
521
+ "loss": 13.99876953125,
522
+ "step": 7200
523
+ },
524
+ {
525
+ "epoch": 1.2649454167388667,
526
+ "grad_norm": 9.00645923614502,
527
+ "learning_rate": 9.784311480815246e-06,
528
+ "loss": 14.043223876953125,
529
+ "step": 7300
530
+ },
531
+ {
532
+ "epoch": 1.2822734361462484,
533
+ "grad_norm": 9.783398628234863,
534
+ "learning_rate": 9.38355768459158e-06,
535
+ "loss": 13.970205078125,
536
+ "step": 7400
537
+ },
538
+ {
539
+ "epoch": 1.2996014555536302,
540
+ "grad_norm": 10.050580024719238,
541
+ "learning_rate": 8.98741402786796e-06,
542
+ "loss": 13.93072021484375,
543
+ "step": 7500
544
+ },
545
+ {
546
+ "epoch": 1.3169294749610119,
547
+ "grad_norm": 8.815546035766602,
548
+ "learning_rate": 8.596205676900367e-06,
549
+ "loss": 14.004686279296875,
550
+ "step": 7600
551
+ },
552
+ {
553
+ "epoch": 1.3342574943683938,
554
+ "grad_norm": 10.487207412719727,
555
+ "learning_rate": 8.210253746901994e-06,
556
+ "loss": 13.99391845703125,
557
+ "step": 7700
558
+ },
559
+ {
560
+ "epoch": 1.3515855137757753,
561
+ "grad_norm": 9.005860328674316,
562
+ "learning_rate": 7.829875038462556e-06,
563
+ "loss": 13.9050439453125,
564
+ "step": 7800
565
+ },
566
+ {
567
+ "epoch": 1.3689135331831572,
568
+ "grad_norm": 9.685938835144043,
569
+ "learning_rate": 7.4553817775091135e-06,
570
+ "loss": 13.942437744140625,
571
+ "step": 7900
572
+ },
573
+ {
574
+ "epoch": 1.386241552590539,
575
+ "grad_norm": 8.673903465270996,
576
+ "learning_rate": 7.087081359021974e-06,
577
+ "loss": 13.9566064453125,
578
+ "step": 8000
579
+ },
580
+ {
581
+ "epoch": 1.4035695719979207,
582
+ "grad_norm": 9.947382926940918,
583
+ "learning_rate": 6.7252760947158586e-06,
584
+ "loss": 13.9610302734375,
585
+ "step": 8100
586
+ },
587
+ {
588
+ "epoch": 1.4208975914053024,
589
+ "grad_norm": 10.123763084411621,
590
+ "learning_rate": 6.370262964893738e-06,
591
+ "loss": 13.928218994140625,
592
+ "step": 8200
593
+ },
594
+ {
595
+ "epoch": 1.438225610812684,
596
+ "grad_norm": 10.419230461120605,
597
+ "learning_rate": 6.0223333746766456e-06,
598
+ "loss": 13.940389404296875,
599
+ "step": 8300
600
+ },
601
+ {
602
+ "epoch": 1.4555536302200658,
603
+ "grad_norm": 9.33340835571289,
604
+ "learning_rate": 5.6817729148099585e-06,
605
+ "loss": 13.91553955078125,
606
+ "step": 8400
607
+ },
608
+ {
609
+ "epoch": 1.4728816496274475,
610
+ "grad_norm": 9.930194854736328,
611
+ "learning_rate": 5.3488611272421005e-06,
612
+ "loss": 13.920137939453125,
613
+ "step": 8500
614
+ },
615
+ {
616
+ "epoch": 1.4902096690348294,
617
+ "grad_norm": 9.439746856689453,
618
+ "learning_rate": 5.023871275668458e-06,
619
+ "loss": 13.894053955078125,
620
+ "step": 8600
621
+ },
622
+ {
623
+ "epoch": 1.507537688442211,
624
+ "grad_norm": 10.869328498840332,
625
+ "learning_rate": 4.707070121228482e-06,
626
+ "loss": 13.908199462890625,
627
+ "step": 8700
628
+ },
629
+ {
630
+ "epoch": 1.5248657078495929,
631
+ "grad_norm": 9.644942283630371,
632
+ "learning_rate": 4.398717703540468e-06,
633
+ "loss": 13.870057373046874,
634
+ "step": 8800
635
+ },
636
+ {
637
+ "epoch": 1.5421937272569746,
638
+ "grad_norm": 9.200098991394043,
639
+ "learning_rate": 4.099067127253367e-06,
640
+ "loss": 13.87569580078125,
641
+ "step": 8900
642
+ },
643
+ {
644
+ "epoch": 1.5595217466643563,
645
+ "grad_norm": 9.218724250793457,
646
+ "learning_rate": 3.8083643542912018e-06,
647
+ "loss": 13.833634033203126,
648
+ "step": 9000
649
+ },
650
+ {
651
+ "epoch": 1.576849766071738,
652
+ "grad_norm": 9.068521499633789,
653
+ "learning_rate": 3.526848001960283e-06,
654
+ "loss": 13.915274658203124,
655
+ "step": 9100
656
+ },
657
+ {
658
+ "epoch": 1.5941777854791197,
659
+ "grad_norm": 9.659659385681152,
660
+ "learning_rate": 3.2547491470852124e-06,
661
+ "loss": 13.857677001953125,
662
+ "step": 9200
663
+ },
664
+ {
665
+ "epoch": 1.6115058048865016,
666
+ "grad_norm": 9.85434341430664,
667
+ "learning_rate": 2.992291136334279e-06,
668
+ "loss": 13.899166259765625,
669
+ "step": 9300
670
+ },
671
+ {
672
+ "epoch": 1.6288338242938831,
673
+ "grad_norm": 9.265457153320312,
674
+ "learning_rate": 2.7396894028900064e-06,
675
+ "loss": 13.8499951171875,
676
+ "step": 9400
677
+ },
678
+ {
679
+ "epoch": 1.646161843701265,
680
+ "grad_norm": 9.825538635253906,
681
+ "learning_rate": 2.497151289615319e-06,
682
+ "loss": 13.880186767578126,
683
+ "step": 9500
684
+ },
685
+ {
686
+ "epoch": 1.6634898631086465,
687
+ "grad_norm": 9.972418785095215,
688
+ "learning_rate": 2.2648758788604805e-06,
689
+ "loss": 13.867176513671875,
690
+ "step": 9600
691
+ },
692
+ {
693
+ "epoch": 1.6808178825160285,
694
+ "grad_norm": 8.635448455810547,
695
+ "learning_rate": 2.043053829050502e-06,
696
+ "loss": 13.825604248046876,
697
+ "step": 9700
698
+ },
699
+ {
700
+ "epoch": 1.6981459019234102,
701
+ "grad_norm": 9.8888578414917,
702
+ "learning_rate": 1.8318672181871465e-06,
703
+ "loss": 13.817935791015625,
704
+ "step": 9800
705
+ },
706
+ {
707
+ "epoch": 1.715473921330792,
708
+ "grad_norm": 8.874034881591797,
709
+ "learning_rate": 1.631489394394005e-06,
710
+ "loss": 13.843865966796875,
711
+ "step": 9900
712
+ },
713
+ {
714
+ "epoch": 1.7328019407381736,
715
+ "grad_norm": 10.087613105773926,
716
+ "learning_rate": 1.4420848336272991e-06,
717
+ "loss": 13.775987548828125,
718
+ "step": 10000
719
+ },
720
+ {
721
+ "epoch": 1.7501299601455553,
722
+ "grad_norm": 9.76197624206543,
723
+ "learning_rate": 1.2638090046692313e-06,
724
+ "loss": 13.89355224609375,
725
+ "step": 10100
726
+ },
727
+ {
728
+ "epoch": 1.7674579795529373,
729
+ "grad_norm": 9.227717399597168,
730
+ "learning_rate": 1.0968082415146735e-06,
731
+ "loss": 13.792811279296876,
732
+ "step": 10200
733
+ },
734
+ {
735
+ "epoch": 1.7847859989603188,
736
+ "grad_norm": 9.555807113647461,
737
+ "learning_rate": 9.412196232559611e-07,
738
+ "loss": 13.825599365234375,
739
+ "step": 10300
740
+ },
741
+ {
742
+ "epoch": 1.8021140183677007,
743
+ "grad_norm": 9.286114692687988,
744
+ "learning_rate": 7.971708615643874e-07,
745
+ "loss": 13.7984130859375,
746
+ "step": 10400
747
+ },
748
+ {
749
+ "epoch": 1.8194420377750822,
750
+ "grad_norm": 10.93297290802002,
751
+ "learning_rate": 6.647801958607236e-07,
752
+ "loss": 13.85096435546875,
753
+ "step": 10500
754
+ },
755
+ {
756
+ "epoch": 1.836770057182464,
757
+ "grad_norm": 9.38337516784668,
758
+ "learning_rate": 5.441562962608837e-07,
759
+ "loss": 13.84087158203125,
760
+ "step": 10600
761
+ },
762
+ {
763
+ "epoch": 1.8540980765898458,
764
+ "grad_norm": 9.067867279052734,
765
+ "learning_rate": 4.353981743762975e-07,
766
+ "loss": 13.902230224609376,
767
+ "step": 10700
768
+ },
769
+ {
770
+ "epoch": 1.8714260959972275,
771
+ "grad_norm": 9.294976234436035,
772
+ "learning_rate": 3.385951020423256e-07,
773
+ "loss": 13.867879638671875,
774
+ "step": 10800
775
+ },
776
+ {
777
+ "epoch": 1.8887541154046092,
778
+ "grad_norm": 8.976114273071289,
779
+ "learning_rate": 2.5382653804130686e-07,
780
+ "loss": 13.81992919921875,
781
+ "step": 10900
782
+ },
783
+ {
784
+ "epoch": 1.906082134811991,
785
+ "grad_norm": 9.376502990722656,
786
+ "learning_rate": 1.8116206288049885e-07,
787
+ "loss": 13.830848388671875,
788
+ "step": 11000
789
+ },
790
+ {
791
+ "epoch": 1.923410154219373,
792
+ "grad_norm": 9.49721622467041,
793
+ "learning_rate": 1.2066132167835253e-07,
794
+ "loss": 13.855355224609376,
795
+ "step": 11100
796
+ },
797
+ {
798
+ "epoch": 1.9407381736267544,
799
+ "grad_norm": 10.23469352722168,
800
+ "learning_rate": 7.237397520607147e-08,
801
+ "loss": 13.8263525390625,
802
+ "step": 11200
803
+ },
804
+ {
805
+ "epoch": 1.9580661930341363,
806
+ "grad_norm": 9.898947715759277,
807
+ "learning_rate": 3.633965912460069e-08,
808
+ "loss": 13.86751708984375,
809
+ "step": 11300
810
+ },
811
+ {
812
+ "epoch": 1.9753942124415178,
813
+ "grad_norm": 9.46378231048584,
814
+ "learning_rate": 1.2587951450517832e-08,
815
+ "loss": 13.787728271484376,
816
+ "step": 11400
817
+ },
818
+ {
819
+ "epoch": 1.9927222318488997,
820
+ "grad_norm": 9.18976879119873,
821
+ "learning_rate": 1.1383482775406685e-09,
822
+ "loss": 13.8797900390625,
823
+ "step": 11500
824
+ },
825
+ {
826
+ "epoch": 2.0,
827
+ "eval_loss": 3.3484463691711426,
828
+ "eval_runtime": 22.6234,
829
+ "eval_samples_per_second": 659.583,
830
+ "eval_steps_per_second": 10.343,
831
+ "step": 11542
832
+ }
833
+ ],
834
+ "logging_steps": 100,
835
+ "max_steps": 11542,
836
+ "num_input_tokens_seen": 0,
837
+ "num_train_epochs": 2,
838
+ "save_steps": 5771,
839
+ "stateful_callbacks": {
840
+ "EarlyStoppingCallback": {
841
+ "args": {
842
+ "early_stopping_patience": 3,
843
+ "early_stopping_threshold": 0.0
844
+ },
845
+ "attributes": {
846
+ "early_stopping_patience_counter": 0
847
+ }
848
+ },
849
+ "TrainerControl": {
850
+ "args": {
851
+ "should_epoch_stop": false,
852
+ "should_evaluate": false,
853
+ "should_log": false,
854
+ "should_save": true,
855
+ "should_training_stop": true
856
+ },
857
+ "attributes": {}
858
+ }
859
+ },
860
+ "total_flos": 7.776973151621775e+17,
861
+ "train_batch_size": 64,
862
+ "trial_name": null,
863
+ "trial_params": null
864
+ }
checkpoint-11542/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dc38b2eea3f8755ab49032af3c555b4a3e9c23274e629dd4c763171401716a57
3
+ size 5137
checkpoint-5771/config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "add_cross_attention": false,
3
+ "architectures": [
4
+ "RobertaForMaskedLM"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 1,
8
+ "classifier_dropout": null,
9
+ "dtype": "float32",
10
+ "eos_token_id": 2,
11
+ "hidden_act": "gelu",
12
+ "hidden_dropout_prob": 0.1,
13
+ "hidden_size": 768,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 3072,
16
+ "is_decoder": false,
17
+ "layer_norm_eps": 1e-12,
18
+ "max_position_embeddings": 514,
19
+ "model_type": "roberta",
20
+ "num_attention_heads": 12,
21
+ "num_hidden_layers": 12,
22
+ "pad_token_id": 0,
23
+ "tie_word_embeddings": true,
24
+ "transformers_version": "5.0.0",
25
+ "type_vocab_size": 1,
26
+ "use_cache": false,
27
+ "vocab_size": 32001
28
+ }
checkpoint-5771/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c446d493e4771bd3c0ee3729569fe99e72cf71e45c7fada2910b06d5aa75c215
3
+ size 442633860
checkpoint-5771/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d80fadf1a8f82adbfd9ae54024726c57f633b321ed9989d56fd6cca4ec11a46f
3
+ size 885391563
checkpoint-5771/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0e2b801084fdda512ba00d57bbf7cd029bcfcdb660ba9c623a5fa28bda34145
+ size 14645
checkpoint-5771/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c583f8644128464678235f329e6e2302b0eaed80eefaecf5a0470cc3faa7d1c
+ size 1465
checkpoint-5771/trainer_state.json ADDED
@@ -0,0 +1,450 @@
+ {
+ "best_global_step": 5771,
+ "best_metric": 3.394857168197632,
+ "best_model_checkpoint": "sindhibert_session3/checkpoint-5771",
+ "epoch": 1.0,
+ "eval_steps": 5771,
+ "global_step": 5771,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.017328019407381736,
+ "grad_norm": 9.74232006072998,
+ "learning_rate": 5.147313691507799e-06,
+ "loss": 16.534342041015623,
+ "step": 100
+ },
+ {
+ "epoch": 0.03465603881476347,
+ "grad_norm": 9.413031578063965,
+ "learning_rate": 1.0346620450606586e-05,
+ "loss": 16.06064208984375,
+ "step": 200
+ },
+ {
+ "epoch": 0.05198405822214521,
+ "grad_norm": 9.366157531738281,
+ "learning_rate": 1.554592720970537e-05,
+ "loss": 15.73246337890625,
+ "step": 300
+ },
+ {
+ "epoch": 0.06931207762952694,
+ "grad_norm": 8.934579849243164,
+ "learning_rate": 2.074523396880416e-05,
+ "loss": 15.634798583984375,
+ "step": 400
+ },
+ {
+ "epoch": 0.08664009703690868,
+ "grad_norm": 9.873139381408691,
+ "learning_rate": 2.594454072790295e-05,
+ "loss": 15.491142578125,
+ "step": 500
+ },
+ {
+ "epoch": 0.10396811644429042,
+ "grad_norm": 9.112743377685547,
+ "learning_rate": 2.9999702019626288e-05,
+ "loss": 15.47271728515625,
+ "step": 600
+ },
+ {
+ "epoch": 0.12129613585167215,
+ "grad_norm": 8.721996307373047,
+ "learning_rate": 2.999083739047451e-05,
+ "loss": 15.291612548828125,
+ "step": 700
+ },
+ {
+ "epoch": 0.1386241552590539,
+ "grad_norm": 8.849467277526855,
+ "learning_rate": 2.9969667845201166e-05,
+ "loss": 15.32687255859375,
+ "step": 800
+ },
+ {
+ "epoch": 0.15595217466643563,
+ "grad_norm": 8.970343589782715,
+ "learning_rate": 2.9936210760385845e-05,
+ "loss": 15.221800537109376,
+ "step": 900
+ },
+ {
+ "epoch": 0.17328019407381737,
+ "grad_norm": 9.423188209533691,
+ "learning_rate": 2.9890493598578603e-05,
+ "loss": 15.21154541015625,
+ "step": 1000
+ },
+ {
+ "epoch": 0.1906082134811991,
+ "grad_norm": 10.529290199279785,
+ "learning_rate": 2.9832553885757926e-05,
+ "loss": 15.091610107421875,
+ "step": 1100
+ },
+ {
+ "epoch": 0.20793623288858085,
+ "grad_norm": 8.895530700683594,
+ "learning_rate": 2.97624391805283e-05,
+ "loss": 15.116024169921875,
+ "step": 1200
+ },
+ {
+ "epoch": 0.22526425229596256,
+ "grad_norm": 9.481012344360352,
+ "learning_rate": 2.968020703508272e-05,
+ "loss": 15.086820068359375,
+ "step": 1300
+ },
+ {
+ "epoch": 0.2425922717033443,
+ "grad_norm": 8.957283020019531,
+ "learning_rate": 2.9585924947962195e-05,
+ "loss": 15.09182373046875,
+ "step": 1400
+ },
+ {
+ "epoch": 0.25992029111072606,
+ "grad_norm": 8.475807189941406,
+ "learning_rate": 2.9479670308650942e-05,
+ "loss": 14.974696044921876,
+ "step": 1500
+ },
+ {
+ "epoch": 0.2772483105181078,
+ "grad_norm": 8.860872268676758,
+ "learning_rate": 2.9361530334052883e-05,
+ "loss": 14.967041015625,
+ "step": 1600
+ },
+ {
+ "epoch": 0.29457632992548954,
+ "grad_norm": 8.990629196166992,
+ "learning_rate": 2.9231601996901433e-05,
+ "loss": 14.9465673828125,
+ "step": 1700
+ },
+ {
+ "epoch": 0.31190434933287126,
+ "grad_norm": 9.683910369873047,
+ "learning_rate": 2.9089991946161484e-05,
+ "loss": 14.9761962890625,
+ "step": 1800
+ },
+ {
+ "epoch": 0.32923236874025297,
+ "grad_norm": 9.044540405273438,
+ "learning_rate": 2.89368164194888e-05,
+ "loss": 14.89200927734375,
+ "step": 1900
+ },
+ {
+ "epoch": 0.34656038814763473,
+ "grad_norm": 8.935420036315918,
+ "learning_rate": 2.8772201147818787e-05,
+ "loss": 14.9054736328125,
+ "step": 2000
+ },
+ {
+ "epoch": 0.36388840755501645,
+ "grad_norm": 8.12104320526123,
+ "learning_rate": 2.8596281252162868e-05,
+ "loss": 14.8011767578125,
+ "step": 2100
+ },
+ {
+ "epoch": 0.3812164269623982,
+ "grad_norm": 9.633867263793945,
+ "learning_rate": 2.840920113269721e-05,
+ "loss": 14.789473876953124,
+ "step": 2200
+ },
+ {
+ "epoch": 0.3985444463697799,
+ "grad_norm": 9.07466983795166,
+ "learning_rate": 2.8211114350234873e-05,
+ "loss": 14.80165283203125,
+ "step": 2300
+ },
+ {
+ "epoch": 0.4158724657771617,
+ "grad_norm": 9.412736892700195,
+ "learning_rate": 2.8002183500178594e-05,
+ "loss": 14.746627197265624,
+ "step": 2400
+ },
+ {
+ "epoch": 0.4332004851845434,
+ "grad_norm": 9.755793571472168,
+ "learning_rate": 2.7782580079057772e-05,
+ "loss": 14.778804931640625,
+ "step": 2500
+ },
+ {
+ "epoch": 0.4505285045919251,
+ "grad_norm": 9.882634162902832,
+ "learning_rate": 2.7552484343759096e-05,
+ "loss": 14.704544677734376,
+ "step": 2600
+ },
+ {
+ "epoch": 0.4678565239993069,
+ "grad_norm": 9.305146217346191,
+ "learning_rate": 2.731208516356645e-05,
+ "loss": 14.75770751953125,
+ "step": 2700
+ },
+ {
+ "epoch": 0.4851845434066886,
+ "grad_norm": 9.269790649414062,
+ "learning_rate": 2.7061579865131508e-05,
+ "loss": 14.68646484375,
+ "step": 2800
+ },
+ {
+ "epoch": 0.5025125628140703,
+ "grad_norm": 9.310648918151855,
+ "learning_rate": 2.6801174070502248e-05,
+ "loss": 14.635621337890624,
+ "step": 2900
+ },
+ {
+ "epoch": 0.5198405822214521,
+ "grad_norm": 9.239577293395996,
+ "learning_rate": 2.653108152834241e-05,
+ "loss": 14.71250732421875,
+ "step": 3000
+ },
+ {
+ "epoch": 0.5371686016288338,
+ "grad_norm": 9.674842834472656,
+ "learning_rate": 2.6251523938480346e-05,
+ "loss": 14.602254638671875,
+ "step": 3100
+ },
+ {
+ "epoch": 0.5544966210362156,
+ "grad_norm": 10.178524017333984,
+ "learning_rate": 2.5962730769931346e-05,
+ "loss": 14.558492431640625,
+ "step": 3200
+ },
+ {
+ "epoch": 0.5718246404435973,
+ "grad_norm": 9.312729835510254,
+ "learning_rate": 2.5664939072542787e-05,
+ "loss": 14.588648681640626,
+ "step": 3300
+ },
+ {
+ "epoch": 0.5891526598509791,
+ "grad_norm": 9.438308715820312,
+ "learning_rate": 2.5358393282416714e-05,
+ "loss": 14.535865478515625,
+ "step": 3400
+ },
+ {
+ "epoch": 0.6064806792583608,
+ "grad_norm": 8.51146125793457,
+ "learning_rate": 2.5043345021269554e-05,
+ "loss": 14.5489208984375,
+ "step": 3500
+ },
+ {
+ "epoch": 0.6238086986657425,
+ "grad_norm": 9.856837272644043,
+ "learning_rate": 2.4720052889893698e-05,
+ "loss": 14.565177001953124,
+ "step": 3600
+ },
+ {
+ "epoch": 0.6411367180731242,
+ "grad_norm": 9.223260879516602,
+ "learning_rate": 2.4388782255890405e-05,
+ "loss": 14.452093505859375,
+ "step": 3700
+ },
+ {
+ "epoch": 0.6584647374805059,
+ "grad_norm": 9.016181945800781,
+ "learning_rate": 2.404980503584838e-05,
+ "loss": 14.49298828125,
+ "step": 3800
+ },
+ {
+ "epoch": 0.6757927568878878,
+ "grad_norm": 9.865802764892578,
+ "learning_rate": 2.370339947214669e-05,
+ "loss": 14.474598388671875,
+ "step": 3900
+ },
+ {
+ "epoch": 0.6931207762952695,
+ "grad_norm": 8.965621948242188,
+ "learning_rate": 2.3349849904565318e-05,
+ "loss": 14.46911376953125,
+ "step": 4000
+ },
+ {
+ "epoch": 0.7104487957026512,
+ "grad_norm": 8.362798690795898,
+ "learning_rate": 2.2989446536890786e-05,
+ "loss": 14.390712890625,
+ "step": 4100
+ },
+ {
+ "epoch": 0.7277768151100329,
+ "grad_norm": 10.564478874206543,
+ "learning_rate": 2.2622485198708445e-05,
+ "loss": 14.45989501953125,
+ "step": 4200
+ },
+ {
+ "epoch": 0.7451048345174146,
+ "grad_norm": 9.188340187072754,
+ "learning_rate": 2.2249267102576903e-05,
+ "loss": 14.422335205078125,
+ "step": 4300
+ },
+ {
+ "epoch": 0.7624328539247964,
+ "grad_norm": 9.867836952209473,
+ "learning_rate": 2.1870098596784012e-05,
+ "loss": 14.341461181640625,
+ "step": 4400
+ },
+ {
+ "epoch": 0.7797608733321781,
+ "grad_norm": 9.469503402709961,
+ "learning_rate": 2.148529091388725e-05,
+ "loss": 14.42570556640625,
+ "step": 4500
+ },
+ {
+ "epoch": 0.7970888927395599,
+ "grad_norm": 9.195992469787598,
+ "learning_rate": 2.1095159915244956e-05,
+ "loss": 14.3226025390625,
+ "step": 4600
+ },
+ {
+ "epoch": 0.8144169121469416,
+ "grad_norm": 9.930395126342773,
+ "learning_rate": 2.070002583174816e-05,
+ "loss": 14.317152099609375,
+ "step": 4700
+ },
+ {
+ "epoch": 0.8317449315543234,
+ "grad_norm": 9.45024299621582,
+ "learning_rate": 2.0300213000965707e-05,
+ "loss": 14.355799560546876,
+ "step": 4800
+ },
+ {
+ "epoch": 0.8490729509617051,
+ "grad_norm": 9.889897346496582,
+ "learning_rate": 1.989604960091854e-05,
+ "loss": 14.314393310546874,
+ "step": 4900
+ },
+ {
+ "epoch": 0.8664009703690868,
+ "grad_norm": 10.8844575881958,
+ "learning_rate": 1.948786738070162e-05,
+ "loss": 14.279014892578125,
+ "step": 5000
+ },
+ {
+ "epoch": 0.8837289897764685,
+ "grad_norm": 9.387309074401855,
+ "learning_rate": 1.9076001388174608e-05,
+ "loss": 14.240478515625,
+ "step": 5100
+ },
+ {
+ "epoch": 0.9010570091838502,
+ "grad_norm": 10.535667419433594,
+ "learning_rate": 1.866078969494479e-05,
+ "loss": 14.26585205078125,
+ "step": 5200
+ },
+ {
+ "epoch": 0.918385028591232,
+ "grad_norm": 9.147391319274902,
+ "learning_rate": 1.8242573118868094e-05,
+ "loss": 14.309058837890625,
+ "step": 5300
+ },
+ {
+ "epoch": 0.9357130479986138,
+ "grad_norm": 9.556977272033691,
+ "learning_rate": 1.7821694944295836e-05,
+ "loss": 14.21564453125,
+ "step": 5400
+ },
+ {
+ "epoch": 0.9530410674059955,
+ "grad_norm": 9.025933265686035,
+ "learning_rate": 1.7398500640296928e-05,
+ "loss": 14.192568359375,
+ "step": 5500
+ },
+ {
+ "epoch": 0.9703690868133772,
+ "grad_norm": 9.630436897277832,
+ "learning_rate": 1.6973337577086803e-05,
+ "loss": 14.193314208984376,
+ "step": 5600
+ },
+ {
+ "epoch": 0.987697106220759,
+ "grad_norm": 9.064878463745117,
+ "learning_rate": 1.6546554740895815e-05,
+ "loss": 14.1739111328125,
+ "step": 5700
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 3.394857168197632,
+ "eval_runtime": 22.6074,
+ "eval_samples_per_second": 660.048,
+ "eval_steps_per_second": 10.351,
+ "step": 5771
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 11542,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 2,
+ "save_steps": 5771,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 3,
+ "early_stopping_threshold": 0.0
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.888486575810888e+17,
+ "train_batch_size": 64,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-5771/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc38b2eea3f8755ab49032af3c555b4a3e9c23274e629dd4c763171401716a57
+ size 5137
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:792f46cb1378b8ab1a168296ccf3cff6636948e4023ba6f87849d3969770c012
+ oid sha256:af548243f2a2884a4a369c6b04c497110cb9a587cea0a5041e9a0820c72889ef
  size 442633860
tokenizer_config.json CHANGED
@@ -1,17 +1,4 @@
  {
- "add_prefix_space": true,
- "backend": "custom",
- "bos_token": "<s>",
- "cls_token": "<s>",
- "eos_token": "</s>",
- "extra_special_tokens": [],
- "mask_token": "<mask>",
- "model_max_length": 512,
- "pad_token": "<pad>",
- "sep_token": "</s>",
- "unk_token": "<unk>",
- "unk_id": 1,
- "tokenizer_class": "XLMRobertaTokenizer",
  "added_tokens_decoder": {
  "0": {
  "content": "<pad>",
@@ -47,11 +34,20 @@
  },
  "32000": {
  "content": "<mask>",
- "lstrip": true,
+ "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
  }
- }
- }
+ },
+ "additional_special_tokens": null,
+ "backend": "custom",
+ "bos_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<pad>",
+ "tokenizer_class": "SindhiTokenizer",
+ "unk_token": "<unk>"
+ }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:eaec524752280134c2e0387d7f5f1e2cc6d34eaa3f289327a642e9b1d7d2b9c9
+ oid sha256:dc38b2eea3f8755ab49032af3c555b4a3e9c23274e629dd4c763171401716a57
  size 5137