ojo committed on
Commit
8c0fb70
·
verified ·
1 Parent(s): 2c4294c

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,3 @@
+ Copyright (C) 2024 Xiaomi Corporation.
+
+ Licensed under the [Gemma](https://ai.google.dev/gemma/terms).
README.md CHANGED
@@ -1,3 +1,71 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: gemma
+ license_name: license
+ license_link: LICENSE
+ base_model:
+ - ModelSpace/GemmaX2-28-2B-v0.1
+ pipeline_tag: translation
+ library_name: transformers
+ tags:
+ - text-generation
+ language:
+ - de
+ - en
+ - fr
+ - es
+ ---
+ ## Model Summary
+
+ OLaPh is a large language model for phonemization, finetuned from GemmaX2-28-2B-v0.1.
+ Its tokenizer was extended with 1,024 phoneme tokens, derived from a BPE tokenizer trained on phoneme sequences generated by the OLaPh framework (to be released).
+
+ The model was then finetuned for grapheme-to-phoneme conversion on a multilingual dataset (English, German, French, Spanish), created by phonemizing text from HuggingFaceFW/fineweb and HuggingFaceFW/fineweb-2 using the OLaPh framework.
+
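The BPE derivation step above can be sketched in a few lines. This is an illustrative toy, not the OLaPh pipeline: the function name and the four-string corpus are made up, and a real run would use the `tokenizers` library over large phonemized corpora.

```python
# Illustrative sketch only (not the OLaPh code): greedily learning BPE merges
# over phoneme sequences, the kind of procedure that yields reusable phoneme
# tokens such as "aɪ̯n" before they are added to the base tokenizer.
from collections import Counter

def learn_bpe_merges(sequences, num_merges):
    """Repeatedly merge the most frequent adjacent symbol pair."""
    corpus = [list(seq) for seq in sequences]  # start from single symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            for pair in zip(seq, seq[1:]):
                pairs[pair] += 1
        if not pairs:
            break
        a, b = max(pairs, key=pairs.get)  # on ties, first pair seen wins
        merges.append(a + b)
        merged_corpus = []
        for seq in corpus:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                    out.append(a + b)  # apply the new merge everywhere
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged_corpus.append(out)
        corpus = merged_corpus
    return merges

# Tiny toy corpus of German IPA strings (placeholders, not training data).
print(learn_bpe_merges(["aɪ̯n", "aɪ̯nə", "ʊnt", "aɪ̯nəm"], 3))
# ['aɪ', 'aɪ̯', 'aɪ̯n']
```

The resulting tokens would then be registered with `tokenizer.add_tokens(...)` and the model grown with `model.resize_token_embeddings(len(tokenizer))`.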
+ - **Finetuned by**: Institute for Information Systems of Hof University
+ - **Model type**: Text-to-Text
+ - **Language(s)**: English, French, German, Spanish
+ - **License**: Gemma (Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms)
+ - **Release Date**: September 25, 2025
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ lang = "English"  # or "German", "French", "Spanish"
+ sentence = "But we are not sorry, for the rain is delightful."
+
+ model_id = "iisys-hof/olaph"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
+
+ # Stop on either the end-of-sequence token or the first "." token.
+ stop_tokens = [tokenizer.eos_token_id, tokenizer.encode(".", add_special_tokens=False)[0]]
+
+ prompt = f"Translate this from {lang} to Phones:\n{lang}: "
+ inputs = tokenizer(f"{prompt}{sentence}\nPhones:", return_tensors="pt").to("cuda")
+
+ outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=stop_tokens)
+ phonemized = tokenizer.decode(outputs[0], skip_special_tokens=False)
+ phonemized = phonemized.split("\n")[-1].replace("Phones:", "").strip()
+
+ print(phonemized)
+ ```
+
+ ## Caveats
+
+ ## Citation
+ ```bibtex
+ @misc{wirth2025olaphoptimallanguagephonemizer,
+   title={OLaPh: Optimal Language Phonemizer},
+   author={Johannes Wirth},
+   year={2025},
+   eprint={2509.20086},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2509.20086},
+ }
+ ```
added_tokens.json ADDED
@@ -0,0 +1,608 @@
1
+ {
2
+ " avɛ": 256208,
3
+ " avɛk": 256343,
4
+ " aʁ": 256368,
5
+ " aʊ̯": 256073,
6
+ " aʊ̯f": 256165,
7
+ " aʊ̯s": 256266,
8
+ " aʊ̯x": 256211,
9
+ " baɪ̯": 256333,
10
+ " bjɛ̃": 256478,
11
+ " bə": 256075,
12
+ " bɪk": 256505,
13
+ " bɪs": 256533,
14
+ " bɹ": 256256,
15
+ " dɑ̃": 256175,
16
+ " dɔx": 256523,
17
+ " dɔ̃": 256509,
18
+ " dɛn": 256601,
19
+ " dɛs": 256199,
20
+ " dɪs": 256361,
21
+ " dɹ": 256404,
22
+ " dʊʁç": 256413,
23
+ " d‍ʒ": 256188,
24
+ " etɛ": 256277,
25
+ " fwe": 256511,
26
+ " fɔn": 256146,
27
+ " fɔʁ": 256507,
28
+ " fɛ": 256513,
29
+ " fɛɐ̯": 256091,
30
+ " fɛʁ": 256545,
31
+ " fɹ": 256144,
32
+ " fʁ": 256388,
33
+ " gʁ": 256389,
34
+ " kaʁ": 256458,
35
+ " komo": 256320,
36
+ " kɔ": 256599,
37
+ " kɔm": 256184,
38
+ " kɔ̃": 256135,
39
+ " kə": 256085,
40
+ " kəm": 256311,
41
+ " kən": 256461,
42
+ " kəns": 256600,
43
+ " kɛl": 256517,
44
+ " kɹ": 256315,
45
+ " lœʁ": 256459,
46
+ " lə": 256062,
47
+ " lɥi": 256345,
48
+ " maʁ": 256393,
49
+ " mwa": 256563,
50
+ " mɔ̃": 256330,
51
+ " mɛ": 256227,
52
+ " mɛm": 256550,
53
+ " mɪt": 256147,
54
+ " mɪç": 256407,
55
+ " nɔx": 256324,
56
+ " nɔ̃": 256498,
57
+ " nə": 256224,
58
+ " nɪçt": 256140,
59
+ " otʁ": 256602,
60
+ " paɾ": 256192,
61
+ " paɾa": 256291,
62
+ " paʁ": 256111,
63
+ " peɾ": 256501,
64
+ " peɾo": 256548,
65
+ " poɾ": 256177,
66
+ " puʁ": 256196,
67
+ " pø": 256521,
68
+ " pɔ": 256568,
69
+ " pə": 256514,
70
+ " pɛʁ": 256328,
71
+ " pɹ": 256142,
72
+ " pɹə": 256394,
73
+ " pɾi": 256540,
74
+ " pɾo": 256321,
75
+ " pʁ": 256103,
76
+ " pʁe": 256444,
77
+ " pʁɔ": 256284,
78
+ " seɲ": 256604,
79
+ " stɹ": 256334,
80
+ " syʁ": 256235,
81
+ " sɑ̃": 256289,
82
+ " sɔ̃": 256172,
83
+ " sɛ": 256373,
84
+ " sɛt": 256258,
85
+ " sɛʁ": 256580,
86
+ " sɛ̃": 256453,
87
+ " s̪": 256183,
88
+ " tje": 256520,
89
+ " tɑ̃": 256574,
90
+ " tɹ": 256202,
91
+ " tɾa": 256552,
92
+ " tʁ": 256236,
93
+ " tʁa": 256484,
94
+ " t͡s": 256049,
95
+ " t͡su": 256074,
96
+ " t͡sʊm": 256490,
97
+ " t‍ʃ": 256226,
98
+ " vwa": 256588,
99
+ " vɛn": 256375,
100
+ " vɪ": 256405,
101
+ " vɪɐ̯t": 256406,
102
+ " zaɪ̯n": 256367,
103
+ " zɪnt": 256359,
104
+ " zɪç": 256143,
105
+ " ðə": 256007,
106
+ " œ̃": 256121,
107
+ " ɐ": 256057,
108
+ " ɐb": 256383,
109
+ " ɐd": 256587,
110
+ " ɐl": 256559,
111
+ " ɐɡ": 256472,
112
+ " ɑ̃": 256056,
113
+ " ɑ̃tʁ": 256581,
114
+ " ɔ": 256210,
115
+ " ɔʁ": 256370,
116
+ " ɔ̃": 256254,
117
+ " ənt": 256482,
118
+ " əp": 256538,
119
+ " ɛks": 256259,
120
+ " ɛl": 256193,
121
+ " ɛnt": 256279,
122
+ " ɛs": 256116,
123
+ " ɛst": 256101,
124
+ " ɛtʁ": 256554,
125
+ " ɛɐ̯": 256145,
126
+ " ɛ̃": 256162,
127
+ " ɡ": 256016,
128
+ " ɡl": 256582,
129
+ " ɡə": 256052,
130
+ " ɡɹ": 256214,
131
+ " ɪm": 256158,
132
+ " ɪn": 256053,
133
+ " ɪst": 256138,
134
+ " ɪç": 256120,
135
+ " ɹ": 256039,
136
+ " ɹɪs": 256547,
137
+ " ʁ": 256042,
138
+ " ʁa": 256419,
139
+ " ʁe": 256185,
140
+ " ʁi": 256596,
141
+ " ʁə": 256218,
142
+ " ʃ": 256031,
143
+ " ʃo": 256440,
144
+ " ʃta": 256510,
145
+ " ʊm": 256310,
146
+ " ʊn": 256414,
147
+ " ʊns": 256593,
148
+ " ʊnt": 256035,
149
+ " ʌn": 256519,
150
+ " ʒ": 256079,
151
+ " ʒə": 256215,
152
+ " ʝ": 256391,
153
+ " ʼ": 256428,
154
+ " ˈ": 256000,
155
+ " ˌ": 256064,
156
+ "alə": 256603,
157
+ "ant͡s": 256409,
158
+ "asjɔ̃": 256247,
159
+ "að": 256067,
160
+ "aða": 256305,
161
+ "aðo": 256213,
162
+ "aðos": 256549,
163
+ "aɪ̯": 256004,
164
+ "aɪ̯l": 256357,
165
+ "aɪ̯n": 256018,
166
+ "aɪ̯nɐ": 256249,
167
+ "aɪ̯nə": 256114,
168
+ "aɪ̯nəm": 256293,
169
+ "aɪ̯nən": 256230,
170
+ "aɪ̯nəs": 256558,
171
+ "aɪ̯t": 256137,
172
+ "aɾ": 256043,
173
+ "aʁ": 256026,
174
+ "aʊ̯": 256083,
175
+ "aʊ̯f": 256312,
176
+ "aʊ̯s": 256245,
177
+ "a‍": 256047,
178
+ "a‍ɪ": 256009,
179
+ "a‍ɪd": 256233,
180
+ "a‍ɪf": 256522,
181
+ "a‍ɪk": 256402,
182
+ "a‍ɪl": 256355,
183
+ "a‍ɪm": 256319,
184
+ "a‍ɪn": 256322,
185
+ "a‍ɪnd": 256358,
186
+ "a‍ɪt": 256168,
187
+ "a‍ɪv": 256515,
188
+ "a‍ɪz": 256381,
189
+ "a‍ɪ‍ə": 256282,
190
+ "a‍ʊ": 256051,
191
+ "a‍ʊn": 256339,
192
+ "a‍ʊnd": 256372,
193
+ "a‍ʊt": 256186,
194
+ "a‍ʊ‍ə": 256349,
195
+ "baɪ̯": 256283,
196
+ "bm̩": 256278,
197
+ "bn̩": 256171,
198
+ "bɐ": 256094,
199
+ "bə": 256159,
200
+ "bə‍l": 256222,
201
+ "bɝ": 256429,
202
+ "bɹ": 256539,
203
+ "bʁ": 256248,
204
+ "dn̩": 256131,
205
+ "dɐ": 256136,
206
+ "dɔɪ̯": 256592,
207
+ "dən": 256351,
208
+ "dɛ": 256543,
209
+ "dɪŋ": 256433,
210
+ "dʁ": 256356,
211
+ "dʒ": 256187,
212
+ "d‍ʒ": 256197,
213
+ "eɪ": 256027,
214
+ "eɾ": 256033,
215
+ "eɾa": 256366,
216
+ "eɾo": 256273,
217
+ "e‍ə": 256077,
218
+ "e‍ɪ": 256011,
219
+ "e‍ɪd": 256264,
220
+ "e‍ɪk": 256250,
221
+ "e‍ɪl": 256557,
222
+ "e‍ɪm": 256220,
223
+ "e‍ɪn": 256296,
224
+ "e‍ɪnd": 256398,
225
+ "e‍ɪnd‍ʒ": 256566,
226
+ "e‍ɪs": 256287,
227
+ "e‍ɪt": 256128,
228
+ "e‍ɪtɪd": 256382,
229
+ "e‍ɪv": 256526,
230
+ "e‍ɪʃən": 256167,
231
+ "fn̩": 256269,
232
+ "ftɐ": 256481,
233
+ "fɔl": 256387,
234
+ "fɔʁ": 256465,
235
+ "fə": 256497,
236
+ "fɛɐ̯": 256524,
237
+ "fʁ": 256426,
238
+ "hatə": 256395,
239
+ "haɪ̯": 256508,
240
+ "haɪ̯t": 256493,
241
+ "hæv": 256191,
242
+ "ið": 256238,
243
+ "iðo": 256445,
244
+ "iɾ": 256489,
245
+ "i‍ə": 256126,
246
+ "jeɾ": 256295,
247
+ "jœʁ": 256589,
248
+ "jɔ̃": 256098,
249
+ "jɛ": 256534,
250
+ "jɛʁ": 256288,
251
+ "jɛ̃": 256200,
252
+ "jʊ": 256309,
253
+ "kaɪ̯t": 256492,
254
+ "ktə": 256265,
255
+ "kœ": 256512,
256
+ "kɑ": 256438,
257
+ "kɔ": 256401,
258
+ "kə": 256263,
259
+ "kɹ": 256317,
260
+ "laŋ": 256487,
261
+ "laɪ̯": 256239,
262
+ "laɪ̯ç": 256571,
263
+ "laʊ̯": 256441,
264
+ "lɐ": 256415,
265
+ "lən": 256272,
266
+ "lɛ": 256344,
267
+ "lɪç": 256088,
268
+ "lɪçn̩": 256443,
269
+ "lɪçə": 256494,
270
+ "l̩": 256130,
271
+ "maɪ̯": 256584,
272
+ "mɑ̃": 256113,
273
+ "mən": 256152,
274
+ "mənt": 256243,
275
+ "mɛ": 256286,
276
+ "mɛn": 256466,
277
+ "mɪt": 256304,
278
+ "m̩": 256205,
279
+ "ndn̩": 256451,
280
+ "ndɐ": 256298,
281
+ "ndə": 256237,
282
+ "noʊ": 256274,
283
+ "ntə": 256377,
284
+ "nt͡s": 256369,
285
+ "nɐ": 256595,
286
+ "nə": 256209,
287
+ "nən": 256189,
288
+ "nəs": 256442,
289
+ "nɛ": 256446,
290
+ "nɪk": 256486,
291
+ "n̩": 256005,
292
+ "oɾ": 256048,
293
+ "oɾa": 256480,
294
+ "oʊ": 256078,
295
+ "pə‍l": 256555,
296
+ "pɹ": 256225,
297
+ "pʁ": 256569,
298
+ "pʁi": 256536,
299
+ "p͡f": 256416,
300
+ "sjɔ̃": 256448,
301
+ "sn̩": 256280,
302
+ "stn̩": 256260,
303
+ "stɐ": 256410,
304
+ "stə": 256240,
305
+ "tn̩": 256076,
306
+ "tɐ": 256100,
307
+ "tɑ̃": 256585,
308
+ "tə": 256024,
309
+ "tən": 256503,
310
+ "tət": 256462,
311
+ "tɛ": 256161,
312
+ "tɛʁ": 256570,
313
+ "tɪd": 256470,
314
+ "tʁ": 256072,
315
+ "tʁa": 256474,
316
+ "tʃ": 256149,
317
+ "tʃa": 256542,
318
+ "tʃo": 256464,
319
+ "t͡": 256430,
320
+ "t͡s": 256028,
321
+ "t͡si": 256198,
322
+ "t͡si̯o": 256340,
323
+ "t͡sn̩": 256537,
324
+ "t͡su": 256253,
325
+ "t‍ʃ": 256216,
326
+ "uʁ": 256099,
327
+ "vaɪ̯": 256180,
328
+ "vɑ̃": 256544,
329
+ "vɔl": 256467,
330
+ "vɔʁ": 256516,
331
+ "vən": 256551,
332
+ "vɛ": 256097,
333
+ "vɛl": 256432,
334
+ "vɛʁ": 256166,
335
+ "vɛʁdn̩": 256352,
336
+ "vɪs": 256535,
337
+ "vɪʁ": 256562,
338
+ "vʊʁ": 256275,
339
+ "vʊʁdə": 256376,
340
+ "waʁ": 256255,
341
+ "weɾ": 256431,
342
+ "wɛ̃": 256418,
343
+ "wɝ": 256203,
344
+ "xn̩": 256485,
345
+ "xtə": 256576,
346
+ "yʁ": 256125,
347
+ "zaɪ̯nə": 256457,
348
+ "zn̩": 256262,
349
+ "zɐ": 256353,
350
+ "zɔ": 256567,
351
+ "zɔl": 256437,
352
+ "zə": 256182,
353
+ "zɛ": 256400,
354
+ "ækt": 256338,
355
+ "æm": 256299,
356
+ "æp": 256403,
357
+ "æz": 256095,
358
+ "æŋ": 256528,
359
+ "æɫ": 256553,
360
+ "æɹ": 256412,
361
+ "çn̩": 256374,
362
+ "çt": 256323,
363
+ "çə": 256439,
364
+ "ðað": 256341,
365
+ "ðe": 256297,
366
+ "ðo": 256153,
367
+ "ðos": 256495,
368
+ "ðɐ": 256190,
369
+ "ŋg": 256469,
370
+ "ŋk": 256155,
371
+ "œl": 256556,
372
+ "œʁ": 256106,
373
+ "œ̃": 256112,
374
+ "ɐn": 256117,
375
+ "ɐ̯": 256002,
376
+ "ɐ̯t": 256169,
377
+ "ɐ̯tə": 256477,
378
+ "ɑɹ": 256115,
379
+ "ɑ̃": 256006,
380
+ "ɑ̃d": 256217,
381
+ "ɑ̃s": 256148,
382
+ "ɑ̃t": 256246,
383
+ "ɒd": 256506,
384
+ "ɒf": 256435,
385
+ "ɒl": 256160,
386
+ "ɒls": 256348,
387
+ "ɒlsə‍ʊ": 256362,
388
+ "ɒm": 256179,
389
+ "ɒn": 256063,
390
+ "ɒnt": 256578,
391
+ "ɒp": 256360,
392
+ "ɒt": 256092,
393
+ "ɒv": 256022,
394
+ "ɒz": 256080,
395
+ "ɒŋ": 256327,
396
+ "ɒɹ": 256499,
397
+ "ɔk": 256560,
398
+ "ɔl": 256084,
399
+ "ɔm": 256082,
400
+ "ɔmən": 256463,
401
+ "ɔn": 256065,
402
+ "ɔs": 256436,
403
+ "ɔt": 256527,
404
+ "ɔx": 256173,
405
+ "ɔɪ̯": 256086,
406
+ "ɔɹ": 256229,
407
+ "ɔʁ": 256038,
408
+ "ɔʁt": 256223,
409
+ "ɔ̃": 256014,
410
+ "ɔ‍ɪ": 256261,
411
+ "əd": 256252,
412
+ "ənd": 256232,
413
+ "əns": 256178,
414
+ "ənt": 256066,
415
+ "ənz": 256335,
416
+ "əs": 256044,
417
+ "əz": 256292,
418
+ "əɹ": 256194,
419
+ "əɹi": 256575,
420
+ "ə‍": 256008,
421
+ "ə‍l": 256037,
422
+ "ə‍li": 256530,
423
+ "ə‍ʊ": 256017,
424
+ "ə‍ʊk": 256475,
425
+ "ə‍ʊl": 256399,
426
+ "ə‍ʊld": 256326,
427
+ "ə‍ʊn": 256212,
428
+ "ə‍ʊnli": 256488,
429
+ "ə‍ʊp": 256586,
430
+ "ə‍ʊst": 256385,
431
+ "ə‍ʊt": 256541,
432
+ "ə‍ʊv": 256420,
433
+ "ə‍ʊvɐ": 256591,
434
+ "ə‍ʊz": 256285,
435
+ "ɛd": 256127,
436
+ "ɛf": 256491,
437
+ "ɛk": 256154,
438
+ "ɛks": 256422,
439
+ "ɛkt": 256195,
440
+ "ɛl": 256032,
441
+ "ɛlf": 256379,
442
+ "ɛlp": 256479,
443
+ "ɛlt": 256276,
444
+ "ɛm": 256122,
445
+ "ɛn": 256030,
446
+ "ɛnd": 256150,
447
+ "ɛni": 256331,
448
+ "ɛns": 256598,
449
+ "ɛnt": 256105,
450
+ "ɛp": 256449,
451
+ "ɛs": 256054,
452
+ "ɛst": 256124,
453
+ "ɛt": 256050,
454
+ "ɛtʁ": 256565,
455
+ "ɛt͡": 256308,
456
+ "ɛt͡st": 256455,
457
+ "ɛv": 256151,
458
+ "ɛvɐ": 256336,
459
+ "ɛz": 256281,
460
+ "ɛŋ": 256417,
461
+ "ɛɐ̯": 256071,
462
+ "ɛɹ": 256396,
463
+ "ɛɹi": 256363,
464
+ "ɛʁ": 256020,
465
+ "ɛʁn": 256386,
466
+ "ɛ̃": 256059,
467
+ "ɝz": 256206,
468
+ "ɡa": 256347,
469
+ "ɡe": 256201,
470
+ "ɡi": 256471,
471
+ "ɡn̩": 256081,
472
+ "ɡz": 256529,
473
+ "ɡɐ": 256427,
474
+ "ɡə": 256069,
475
+ "ɡɹ": 256371,
476
+ "ɡʁ": 256397,
477
+ "ɡʁo": 256605,
478
+ "ɣa": 256579,
479
+ "ɣo": 256329,
480
+ "ɥi": 256119,
481
+ "ɪd": 256041,
482
+ "ɪd‍ʒ": 256244,
483
+ "ɪf": 256102,
484
+ "ɪk": 256034,
485
+ "ɪks": 256561,
486
+ "ɪkt": 256450,
487
+ "ɪkə‍l": 256502,
488
+ "ɪl": 256058,
489
+ "ɪm": 256046,
490
+ "ɪmɐ": 256518,
491
+ "ɪn": 256010,
492
+ "ɪnd": 256176,
493
+ "ɪns": 256546,
494
+ "ɪnt": 256109,
495
+ "ɪntʊ": 256434,
496
+ "ɪp": 256207,
497
+ "ɪs": 256040,
498
+ "ɪst": 256068,
499
+ "ɪt": 256013,
500
+ "ɪti": 256234,
501
+ "ɪts": 256228,
502
+ "ɪtə‍l": 256531,
503
+ "ɪt‍ʃ": 256204,
504
+ "ɪv": 256118,
505
+ "ɪz": 256023,
506
+ "ɪç": 256015,
507
+ "ɪçt": 256096,
508
+ "ɪð": 256108,
509
+ "ɪŋ": 256019,
510
+ "ɪŋk": 256456,
511
+ "ɪŋz": 256583,
512
+ "ɪɡ": 256181,
513
+ "ɪɡn̩": 256316,
514
+ "ɪɡə": 256337,
515
+ "ɪɫ": 256421,
516
+ "ɪɹ": 256392,
517
+ "ɪʁ": 256270,
518
+ "ɪʃ": 256089,
519
+ "ɪʃn̩": 256257,
520
+ "ɪʃə": 256390,
521
+ "ɪʃən": 256301,
522
+ "ɪ̯": 256003,
523
+ "ɫa": 256476,
524
+ "ɫi": 256384,
525
+ "ɫz": 256354,
526
+ "ɹi": 256087,
527
+ "ɹə": 256110,
528
+ "ɹɪ": 256300,
529
+ "ɹɪŋ": 256468,
530
+ "ɾa": 256093,
531
+ "ɾan": 256424,
532
+ "ɾas": 256408,
533
+ "ɾe": 256104,
534
+ "ɾes": 256303,
535
+ "ɾi": 256141,
536
+ "ɾia": 256572,
537
+ "ɾo": 256134,
538
+ "ɾos": 256504,
539
+ "ʁa": 256036,
540
+ "ʁaɪ̯": 256163,
541
+ "ʁaʊ̯": 256219,
542
+ "ʁe": 256170,
543
+ "ʁi": 256061,
544
+ "ʁo": 256174,
545
+ "ʁu": 256473,
546
+ "ʁy": 256306,
547
+ "ʁɐ": 256452,
548
+ "ʁɔ": 256365,
549
+ "ʁə": 256157,
550
+ "ʁən": 256133,
551
+ "ʁɛ": 256090,
552
+ "ʁʊŋ": 256423,
553
+ "ʃa": 256290,
554
+ "ʃaft": 256594,
555
+ "ʃe": 256525,
556
+ "ʃi": 256346,
557
+ "ʃn̩": 256500,
558
+ "ʃp": 256267,
559
+ "ʃt": 256107,
560
+ "ʃta": 256271,
561
+ "ʃtɛl": 256577,
562
+ "ʃən": 256060,
563
+ "ʃənz": 256447,
564
+ "ʃə‍l": 256590,
565
+ "ʊd": 256129,
566
+ "ʊk": 256314,
567
+ "ʊl": 256532,
568
+ "ʊm": 256139,
569
+ "ʊn": 256156,
570
+ "ʊnt": 256025,
571
+ "ʊntɐ": 256307,
572
+ "ʊs": 256251,
573
+ "ʊt": 256597,
574
+ "ʊŋ": 256070,
575
+ "ʊŋən": 256325,
576
+ "ʊʁ": 256132,
577
+ "ʊʁç": 256294,
578
+ "ʊ̯": 256021,
579
+ "ʌm": 256164,
580
+ "ʌn": 256242,
581
+ "ʌnt": 256425,
582
+ "ʌp": 256302,
583
+ "ʌs": 256364,
584
+ "ʌst": 256268,
585
+ "ʌt": 256123,
586
+ "ʌt‍ʃ": 256342,
587
+ "ʌv": 256378,
588
+ "ʌðɐ": 256332,
589
+ "ʌŋ": 256496,
590
+ "ʎa": 256350,
591
+ "ʎe": 256483,
592
+ "ʏk": 256380,
593
+ "ʏn": 256573,
594
+ "ʏʁ": 256318,
595
+ "ʒe": 256460,
596
+ "ʒi": 256454,
597
+ "ː.": 256241,
598
+ "ːˈ": 256221,
599
+ "ːˌ": 256411,
600
+ "̯o": 256231,
601
+ "͡f": 256313,
602
+ "͡s": 256012,
603
+ "‍ə": 256029,
604
+ "‍ən": 256564,
605
+ "‍ɪ": 256001,
606
+ "‍ʃ": 256045,
607
+ "‍ʒ": 256055
608
+ }
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 1.0,
+   "total_flos": 2097228602327040.0,
+   "train_loss": 0.050915672732994244,
+   "train_runtime": 738268.7103,
+   "train_samples_per_second": 10.388,
+   "train_steps_per_second": 0.325
+ }
chat_template.jinja ADDED
@@ -0,0 +1 @@
+ {% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ system_message }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ content }}{% elif message['role'] == 'assistant' %}{{ content }}{% endif %}{% endfor %}
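The template above simply emits an optional system message followed by the raw user/assistant contents, with no role markers. A small standalone render (assuming the `jinja2` package, outside of `transformers`) makes that behavior visible; the example messages are illustrative:

```python
# Render the chat_template.jinja logic directly with Jinja2 to show that it
# concatenates the system message and the message contents verbatim.
# (Note: in Jinja2, `set` inside an `if` block is visible afterwards,
# because `if` does not introduce a new scope.)
from jinja2 import Template

CHAT_TEMPLATE = (
    "{% if messages[0]['role'] == 'system' %}"
    "{% set loop_messages = messages[1:] %}"
    "{% set system_message = messages[0]['content'] %}"
    "{% else %}{% set loop_messages = messages %}{% endif %}"
    "{% if system_message is defined %}{{ system_message }}{% endif %}"
    "{% for message in loop_messages %}"
    "{% set content = message['content'] %}"
    "{% if message['role'] == 'user' %}{{ content }}"
    "{% elif message['role'] == 'assistant' %}{{ content }}{% endif %}"
    "{% endfor %}"
)

# Illustrative conversation in the model's prompt format.
messages = [
    {"role": "system", "content": "Translate this from English to Phones:\n"},
    {"role": "user", "content": "English: But we are not sorry.\nPhones:"},
]
print(Template(CHAT_TEMPLATE).render(messages=messages))
```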
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "architectures": [
+     "Gemma2ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attn_logit_softcapping": 50.0,
+   "bos_token_id": 2,
+   "cache_implementation": "hybrid",
+   "eos_token_id": 1,
+   "final_logit_softcapping": 30.0,
+   "head_dim": 256,
+   "hidden_act": "gelu_pytorch_tanh",
+   "hidden_activation": "gelu_pytorch_tanh",
+   "hidden_size": 2304,
+   "initializer_range": 0.02,
+   "intermediate_size": 9216,
+   "max_position_embeddings": 8192,
+   "model_type": "gemma2",
+   "num_attention_heads": 8,
+   "num_hidden_layers": 26,
+   "num_key_value_heads": 4,
+   "pad_token_id": 0,
+   "query_pre_attn_scalar": 256,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 10000.0,
+   "sliding_window": 4096,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.52.1",
+   "use_cache": false,
+   "vocab_size": 256606
+ }
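A quick back-of-envelope reading of this config (illustrative arithmetic, not repo metadata) shows why the extended vocabulary dominates the checkpoint size in the shards below:

```python
# Size of the token embedding matrix implied by config.json above,
# stored in bfloat16 (2 bytes per parameter).
vocab_size = 256606   # from config.json
hidden_size = 2304    # from config.json

embed_params = vocab_size * hidden_size
embed_bytes_bf16 = embed_params * 2

print(embed_params)      # 591220224 parameters
print(embed_bytes_bf16)  # 1182440448 bytes, roughly 1.1 GiB
```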
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 2,
+   "cache_implementation": "hybrid",
+   "eos_token_id": 1,
+   "pad_token_id": 0,
+   "transformers_version": "4.52.1",
+   "use_cache": false
+ }
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b5e987b6ff50d86c9c9bad2d803bb1dd1f10baebc1e4df65fa16f1d275bab3f
+ size 4990818208
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5189aa23cd8d9d48e7bce61a6e1a0106a8cda856ccd3c938e84908f96c3790bb
+ size 240691728
model.safetensors.index.json ADDED
@@ -0,0 +1,295 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 5231476224
4
+ },
5
+ "weight_map": {
6
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
7
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
8
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
9
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
10
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
11
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
12
+ "model.layers.0.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
13
+ "model.layers.0.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
16
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
17
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
18
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
19
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
20
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
21
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
22
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
23
+ "model.layers.1.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
24
+ "model.layers.1.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
25
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
26
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
27
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
28
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
29
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
30
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
31
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
32
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
33
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
34
+ "model.layers.10.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
35
+ "model.layers.10.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
36
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
37
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
38
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
39
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
40
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
41
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
42
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
43
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
44
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
45
+ "model.layers.11.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
46
+ "model.layers.11.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
47
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
48
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
49
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
50
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
51
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
52
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
53
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
54
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
55
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
56
+ "model.layers.12.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
57
+ "model.layers.12.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
58
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
59
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
60
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
61
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
62
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
63
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
64
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
65
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
66
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
67
+ "model.layers.13.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
68
+ "model.layers.13.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
69
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
70
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
71
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
72
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
73
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
74
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
75
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
76
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
77
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
78
+ "model.layers.14.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
79
+ "model.layers.14.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
80
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
81
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
82
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
83
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
84
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
85
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
86
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
87
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
88
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
89
+ "model.layers.15.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
90
+ "model.layers.15.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
91
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
92
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
93
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
94
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
95
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
96
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
97
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
98
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
99
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
100
+ "model.layers.16.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
101
+ "model.layers.16.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
102
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
103
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
104
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
105
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
106
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
107
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
108
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
109
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
110
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
111
+ "model.layers.17.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
112
+ "model.layers.17.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
113
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
114
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
115
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
116
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
117
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.post_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.pre_feedforward_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.post_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.pre_feedforward_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.norm.weight": "model-00002-of-00002.safetensors"
+ }
+ }
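The `weight_map` in `model.safetensors.index.json` tells loaders which shard file stores each parameter; note that layer 24 is split across both shards (its attention projections stay in shard 1 while most of its MLP lands in shard 2). A minimal sketch of how such an index can be grouped by shard, using a small hypothetical excerpt of the map above:

```python
from collections import defaultdict

# Hypothetical excerpt of the weight_map shown in the diff above.
index = {
    "weight_map": {
        "model.layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
        "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
        "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
        "model.norm.weight": "model-00002-of-00002.safetensors",
    }
}

def weights_per_shard(index: dict) -> dict:
    """Group parameter names by the shard file that stores them."""
    shards = defaultdict(list)
    for name, shard in index["weight_map"].items():
        shards[shard].append(name)
    return dict(shards)

shards = weights_per_shard(index)
print(sorted(shards))  # the two shard filenames
```

Loaders such as `transformers.from_pretrained` resolve each tensor through exactly this mapping, opening only the shards that contain the requested weights.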
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<bos>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<eos>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9bc615c6c14973dda6a5058c6ee0d3693d581fa5067b99c2c7af6388101fbb8
+ size 34472979
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61a7b147390c64585d6c3543dd6fc636906c9af3865a5548f27f31aee1d4c8e2
+ size 4241003
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "total_flos": 2097228602327040.0,
+ "train_loss": 0.050915672732994244,
+ "train_runtime": 738268.7103,
+ "train_samples_per_second": 10.388,
+ "train_steps_per_second": 0.325
+ }
trainer_log.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:144360cec1f136fe47a8c04783405f9d6a6f86f02e4caaf557091c1f6aebd8e5
+ size 7544