rdz-falcon committed
Commit fcc3e39 · verified · 1 parent: f56606b

stage2: epoch 2 (loss=2.0370)
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 stage1/latest/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+stage2/latest/tokenizer.json filter=lfs diff=lfs merge=lfs -text
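The hunk above adds one more path to the set of patterns Git LFS tracks in this repo. As a rough sketch of how such patterns select files, here is a stdlib approximation using `fnmatch` (real gitattributes matching has additional rules, so this is illustrative only; `is_lfs_tracked` is a hypothetical helper, not part of any tool):

```python
from fnmatch import fnmatch

# LFS-tracked patterns from .gitattributes after this commit (excerpt).
lfs_patterns = [
    "*.zst",
    "*tfevents*",
    "stage1/latest/tokenizer.json",
    "stage2/latest/tokenizer.json",
]

def is_lfs_tracked(path: str) -> bool:
    """Approximate gitattributes matching: patterns without a slash match the
    basename, patterns with a slash match the full repo-relative path."""
    basename = path.rsplit("/", 1)[-1]
    return any(
        fnmatch(path if "/" in pat else basename, pat)
        for pat in lfs_patterns
    )
```

With these patterns, the newly added `stage2/latest/tokenizer.json` is LFS-tracked while small JSON files such as `training_state.json` stay in plain git.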
stage2/latest/README.md ADDED
@@ -0,0 +1,207 @@
+---
+base_model: google/gemma-3-270m
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:google/gemma-3-270m
+- lora
+- transformers
+---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+### Framework versions
+
+- PEFT 0.18.1
stage2/latest/adapter_config.json ADDED
@@ -0,0 +1,49 @@
+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "google/gemma-3-270m",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "embed_tokens",
+    "lm_head"
+  ],
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "gate_proj",
+    "down_proj",
+    "o_proj",
+    "v_proj",
+    "q_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
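For readers skimming the config: with `lora_alpha` 64 and `r` 32, standard (non-rsLoRA) LoRA scales each low-rank update by alpha / r = 2, all seven attention and MLP projections carry LoRA weights, and `modules_to_save` means the embedding matrix and output head are trained and stored in full, which is likely why the adapter file below is so large for a 270M base model. A small sketch over an excerpt of the config:

```python
import json

# Excerpt of stage2/latest/adapter_config.json (values copied from the commit above).
config = json.loads("""
{
  "base_model_name_or_path": "google/gemma-3-270m",
  "r": 32,
  "lora_alpha": 64,
  "lora_dropout": 0.1,
  "use_rslora": false,
  "target_modules": ["k_proj", "gate_proj", "down_proj", "o_proj",
                     "v_proj", "q_proj", "up_proj"],
  "modules_to_save": ["embed_tokens", "lm_head"]
}
""")

# Classic LoRA scaling: the low-rank update BA is multiplied by alpha / r.
scaling = config["lora_alpha"] / config["r"]

# Number of projection modules that receive LoRA weights per layer.
n_lora_targets = len(config["target_modules"])
```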
stage2/latest/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e6ee785ca40224b5a7f353c56716356a8dfa79ddd84bdc39634d31d09c226a88
+size 4064208176
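The three lines above are not the weights themselves but a Git LFS pointer file: a `version` line, the SHA-256 `oid` of the real blob, and its `size` in bytes (here roughly 3.8 GiB). A minimal parser for this format:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a git-lfs pointer file into its key/value fields."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

# The pointer committed for stage2/latest/adapter_model.safetensors.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e6ee785ca40224b5a7f353c56716356a8dfa79ddd84bdc39634d31d09c226a88\n"
    "size 4064208176\n"
)

size_gib = int(pointer["size"]) / 2**30  # roughly 3.8 GiB
```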
stage2/latest/added_tokens.json ADDED
@@ -0,0 +1,475 @@
+{
+  "<M0>": 262151,
+  "<M100>": 262152,
+  "<M101>": 262153,
+  "<M102>": 262154,
+  "<M103>": 262155,
+  "<M104>": 262156,
+  "<M105>": 262157,
+  "<M106>": 262158,
+  "<M107>": 262159,
+  "<M108>": 262160,
+  "<M109>": 262161,
+  "<M10>": 262162,
+  "<M110>": 262163,
+  "<M111>": 262164,
+  "<M112>": 262165,
+  "<M113>": 262166,
+  "<M114>": 262167,
+  "<M115>": 262168,
+  "<M116>": 262169,
+  "<M117>": 262170,
+  "<M118>": 262171,
+  "<M119>": 262172,
+  "<M11>": 262173,
+  "<M120>": 262174,
+  "<M121>": 262175,
+  "<M122>": 262176,
+  "<M123>": 262177,
+  "<M124>": 262178,
+  "<M125>": 262179,
+  "<M126>": 262180,
+  "<M127>": 262181,
+  "<M128>": 262182,
+  "<M129>": 262183,
+  "<M12>": 262184,
+  "<M130>": 262185,
+  "<M131>": 262186,
+  "<M132>": 262187,
+  "<M133>": 262188,
+  "<M134>": 262189,
+  "<M135>": 262190,
+  "<M136>": 262191,
+  "<M137>": 262192,
+  "<M138>": 262193,
+  "<M139>": 262194,
+  "<M13>": 262195,
+  "<M140>": 262196,
+  "<M141>": 262197,
+  "<M142>": 262198,
+  "<M143>": 262199,
+  "<M144>": 262200,
+  "<M145>": 262201,
+  "<M146>": 262202,
+  "<M147>": 262203,
+  "<M148>": 262204,
+  "<M149>": 262205,
+  "<M14>": 262206,
+  "<M150>": 262207,
+  "<M151>": 262208,
+  "<M152>": 262209,
+  "<M153>": 262210,
+  "<M154>": 262211,
+  "<M155>": 262212,
+  "<M156>": 262213,
+  "<M157>": 262214,
+  "<M158>": 262215,
+  "<M159>": 262216,
+  "<M15>": 262217,
+  "<M160>": 262218,
+  "<M161>": 262219,
+  "<M162>": 262220,
+  "<M163>": 262221,
+  "<M164>": 262222,
+  "<M165>": 262223,
+  "<M166>": 262224,
+  "<M167>": 262225,
+  "<M168>": 262226,
+  "<M169>": 262227,
+  "<M16>": 262228,
+  "<M170>": 262229,
+  "<M171>": 262230,
+  "<M172>": 262231,
+  "<M173>": 262232,
+  "<M174>": 262233,
+  "<M175>": 262234,
+  "<M176>": 262235,
+  "<M177>": 262236,
+  "<M178>": 262237,
+  "<M179>": 262238,
+  "<M17>": 262239,
+  "<M180>": 262240,
+  "<M181>": 262241,
+  "<M182>": 262242,
+  "<M183>": 262243,
+  "<M184>": 262244,
+  "<M185>": 262245,
+  "<M186>": 262246,
+  "<M187>": 262247,
+  "<M188>": 262248,
+  "<M18>": 262249,
+  "<M190>": 262250,
+  "<M191>": 262251,
+  "<M192>": 262252,
+  "<M194>": 262253,
+  "<M196>": 262254,
+  "<M197>": 262255,
+  "<M198>": 262256,
+  "<M199>": 262257,
+  "<M19>": 262258,
+  "<M1>": 262259,
+  "<M200>": 262260,
+  "<M205>": 262261,
+  "<M206>": 262262,
+  "<M207>": 262263,
+  "<M208>": 262264,
+  "<M209>": 262265,
+  "<M20>": 262266,
+  "<M210>": 262267,
+  "<M211>": 262268,
+  "<M212>": 262269,
+  "<M213>": 262270,
+  "<M214>": 262271,
+  "<M215>": 262272,
+  "<M216>": 262273,
+  "<M217>": 262274,
+  "<M218>": 262275,
+  "<M219>": 262276,
+  "<M21>": 262277,
+  "<M220>": 262278,
+  "<M221>": 262279,
+  "<M222>": 262280,
+  "<M223>": 262281,
+  "<M224>": 262282,
+  "<M225>": 262283,
+  "<M226>": 262284,
+  "<M227>": 262285,
+  "<M228>": 262286,
+  "<M229>": 262287,
+  "<M22>": 262288,
+  "<M230>": 262289,
+  "<M231>": 262290,
+  "<M232>": 262291,
+  "<M233>": 262292,
+  "<M234>": 262293,
+  "<M235>": 262294,
+  "<M236>": 262295,
+  "<M238>": 262296,
+  "<M239>": 262297,
+  "<M23>": 262298,
+  "<M240>": 262299,
+  "<M241>": 262300,
+  "<M242>": 262301,
+  "<M243>": 262302,
+  "<M244>": 262303,
+  "<M245>": 262304,
+  "<M246>": 262305,
+  "<M247>": 262306,
+  "<M248>": 262307,
+  "<M24>": 262308,
+  "<M250>": 262309,
+  "<M252>": 262310,
+  "<M254>": 262311,
+  "<M256>": 262312,
+  "<M257>": 262313,
+  "<M258>": 262314,
+  "<M259>": 262315,
+  "<M25>": 262316,
+  "<M260>": 262317,
+  "<M261>": 262318,
+  "<M262>": 262319,
+  "<M263>": 262320,
+  "<M264>": 262321,
+  "<M265>": 262322,
+  "<M266>": 262323,
+  "<M267>": 262324,
+  "<M268>": 262325,
+  "<M269>": 262326,
+  "<M26>": 262327,
+  "<M270>": 262328,
+  "<M271>": 262329,
+  "<M272>": 262330,
+  "<M273>": 262331,
+  "<M274>": 262332,
+  "<M275>": 262333,
+  "<M276>": 262334,
+  "<M277>": 262335,
+  "<M278>": 262336,
+  "<M27>": 262337,
+  "<M280>": 262338,
+  "<M282>": 262339,
+  "<M283>": 262340,
+  "<M284>": 262341,
+  "<M285>": 262342,
+  "<M286>": 262343,
+  "<M287>": 262344,
+  "<M289>": 262345,
+  "<M28>": 262346,
+  "<M290>": 262347,
+  "<M291>": 262348,
+  "<M292>": 262349,
+  "<M293>": 262350,
+  "<M295>": 262351,
+  "<M296>": 262352,
+  "<M297>": 262353,
+  "<M298>": 262354,
+  "<M29>": 262355,
+  "<M2>": 262356,
+  "<M300>": 262357,
+  "<M303>": 262358,
+  "<M304>": 262359,
+  "<M306>": 262360,
+  "<M307>": 262361,
+  "<M308>": 262362,
+  "<M309>": 262363,
+  "<M30>": 262364,
+  "<M310>": 262365,
+  "<M311>": 262366,
+  "<M312>": 262367,
+  "<M313>": 262368,
+  "<M315>": 262369,
+  "<M316>": 262370,
+  "<M317>": 262371,
+  "<M318>": 262372,
+  "<M319>": 262373,
+  "<M31>": 262374,
+  "<M320>": 262375,
+  "<M321>": 262376,
+  "<M322>": 262377,
+  "<M323>": 262378,
+  "<M324>": 262379,
+  "<M325>": 262380,
+  "<M326>": 262381,
+  "<M327>": 262382,
+  "<M328>": 262383,
+  "<M329>": 262384,
+  "<M32>": 262385,
+  "<M330>": 262386,
+  "<M332>": 262387,
+  "<M333>": 262388,
+  "<M334>": 262389,
+  "<M335>": 262390,
+  "<M336>": 262391,
+  "<M337>": 262392,
+  "<M338>": 262393,
+  "<M33>": 262394,
+  "<M342>": 262395,
+  "<M343>": 262396,
+  "<M345>": 262397,
+  "<M346>": 262398,
+  "<M347>": 262399,
+  "<M348>": 262400,
+  "<M349>": 262401,
+  "<M34>": 262402,
+  "<M350>": 262403,
+  "<M351>": 262404,
+  "<M352>": 262405,
+  "<M353>": 262406,
+  "<M354>": 262407,
+  "<M355>": 262408,
+  "<M356>": 262409,
+  "<M357>": 262410,
+  "<M359>": 262411,
+  "<M35>": 262412,
+  "<M360>": 262413,
+  "<M362>": 262414,
+  "<M363>": 262415,
+  "<M364>": 262416,
+  "<M365>": 262417,
+  "<M366>": 262418,
+  "<M367>": 262419,
+  "<M368>": 262420,
+  "<M369>": 262421,
+  "<M36>": 262422,
+  "<M370>": 262423,
+  "<M371>": 262424,
+  "<M372>": 262425,
+  "<M373>": 262426,
+  "<M374>": 262427,
+  "<M375>": 262428,
+  "<M376>": 262429,
+  "<M378>": 262430,
+  "<M379>": 262431,
+  "<M37>": 262432,
+  "<M380>": 262433,
+  "<M381>": 262434,
+  "<M382>": 262435,
+  "<M383>": 262436,
+  "<M384>": 262437,
+  "<M385>": 262438,
+  "<M386>": 262439,
+  "<M387>": 262440,
+  "<M388>": 262441,
+  "<M389>": 262442,
+  "<M38>": 262443,
+  "<M390>": 262444,
+  "<M391>": 262445,
+  "<M392>": 262446,
+  "<M393>": 262447,
+  "<M395>": 262448,
+  "<M396>": 262449,
+  "<M398>": 262450,
+  "<M39>": 262451,
+  "<M3>": 262452,
+  "<M402>": 262453,
+  "<M403>": 262454,
+  "<M404>": 262455,
+  "<M405>": 262456,
+  "<M406>": 262457,
+  "<M408>": 262458,
+  "<M409>": 262459,
+  "<M40>": 262460,
+  "<M410>": 262461,
+  "<M411>": 262462,
+  "<M412>": 262463,
+  "<M413>": 262464,
+  "<M414>": 262465,
+  "<M416>": 262466,
+  "<M418>": 262467,
+  "<M419>": 262468,
+  "<M41>": 262469,
+  "<M421>": 262470,
+  "<M422>": 262471,
+  "<M423>": 262472,
+  "<M424>": 262473,
+  "<M426>": 262474,
+  "<M427>": 262475,
+  "<M428>": 262476,
+  "<M429>": 262477,
+  "<M42>": 262478,
+  "<M430>": 262479,
+  "<M431>": 262480,
+  "<M432>": 262481,
+  "<M433>": 262482,
+  "<M434>": 262483,
+  "<M435>": 262484,
+  "<M436>": 262485,
+  "<M437>": 262486,
+  "<M438>": 262487,
+  "<M439>": 262488,
+  "<M43>": 262489,
+  "<M440>": 262490,
+  "<M441>": 262491,
+  "<M442>": 262492,
+  "<M443>": 262493,
+  "<M444>": 262494,
+  "<M445>": 262495,
+  "<M446>": 262496,
+  "<M447>": 262497,
+  "<M448>": 262498,
+  "<M449>": 262499,
+  "<M44>": 262500,
+  "<M450>": 262501,
+  "<M451>": 262502,
+  "<M452>": 262503,
+  "<M454>": 262504,
+  "<M455>": 262505,
+  "<M456>": 262506,
+  "<M457>": 262507,
+  "<M458>": 262508,
+  "<M459>": 262509,
+  "<M45>": 262510,
+  "<M460>": 262511,
+  "<M461>": 262512,
+  "<M462>": 262513,
+  "<M463>": 262514,
+  "<M464>": 262515,
+  "<M465>": 262516,
+  "<M466>": 262517,
+  "<M467>": 262518,
+  "<M468>": 262519,
+  "<M469>": 262520,
+  "<M46>": 262521,
+  "<M470>": 262522,
+  "<M471>": 262523,
+  "<M472>": 262524,
+  "<M473>": 262525,
+  "<M474>": 262526,
+  "<M475>": 262527,
+  "<M477>": 262528,
+  "<M479>": 262529,
+  "<M47>": 262530,
+  "<M481>": 262531,
+  "<M483>": 262532,
+  "<M484>": 262533,
+  "<M485>": 262534,
+  "<M487>": 262535,
+  "<M489>": 262536,
+  "<M48>": 262537,
+  "<M490>": 262538,
+  "<M491>": 262539,
+  "<M492>": 262540,
+  "<M493>": 262541,
+  "<M494>": 262542,
+  "<M495>": 262543,
+  "<M496>": 262544,
+  "<M497>": 262545,
+  "<M498>": 262546,
+  "<M499>": 262547,
+  "<M49>": 262548,
+  "<M4>": 262549,
+  "<M500>": 262550,
+  "<M501>": 262551,
+  "<M502>": 262552,
+  "<M503>": 262553,
+  "<M504>": 262554,
+  "<M505>": 262555,
+  "<M506>": 262556,
+  "<M507>": 262557,
+  "<M508>": 262558,
+  "<M509>": 262559,
+  "<M50>": 262560,
+  "<M510>": 262561,
+  "<M511>": 262562,
+  "<M51>": 262563,
+  "<M52>": 262564,
+  "<M53>": 262565,
+  "<M54>": 262566,
+  "<M55>": 262567,
+  "<M56>": 262568,
+  "<M57>": 262569,
+  "<M58>": 262570,
+  "<M59>": 262571,
+  "<M5>": 262572,
+  "<M60>": 262573,
+  "<M61>": 262574,
+  "<M62>": 262575,
+  "<M63>": 262576,
+  "<M64>": 262577,
+  "<M65>": 262578,
+  "<M66>": 262579,
+  "<M67>": 262580,
+  "<M68>": 262581,
+  "<M69>": 262582,
+  "<M6>": 262583,
+  "<M70>": 262584,
+  "<M71>": 262585,
+  "<M72>": 262586,
+  "<M73>": 262587,
+  "<M74>": 262588,
+  "<M75>": 262589,
+  "<M76>": 262590,
+  "<M77>": 262591,
+  "<M78>": 262592,
+  "<M79>": 262593,
+  "<M7>": 262594,
+  "<M80>": 262595,
+  "<M81>": 262596,
+  "<M82>": 262597,
+  "<M83>": 262598,
+  "<M84>": 262599,
+  "<M85>": 262600,
+  "<M86>": 262601,
+  "<M87>": 262602,
+  "<M88>": 262603,
+  "<M89>": 262604,
+  "<M8>": 262605,
+  "<M90>": 262606,
+  "<M91>": 262607,
+  "<M92>": 262608,
+  "<M93>": 262609,
+  "<M94>": 262610,
+  "<M95>": 262611,
+  "<M96>": 262612,
+  "<M97>": 262613,
+  "<M98>": 262614,
+  "<M99>": 262615,
+  "<M9>": 262616,
+  "<M_END>": 262147,
+  "<M_START>": 262146,
+  "<PAD>": 262145,
+  "<image_soft_token>": 262144,
+  "<|LEN_LONG|>": 262150,
+  "<|LEN_MEDIUM|>": 262149,
+  "<|LEN_SHORT|>": 262148
+}
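The added vocabulary above begins at id 262144 and, across the whole file, the ids run contiguously up to 262616 (473 entries), which suggests they were appended directly after the base Gemma-3 vocabulary with no gaps or reuse. A sketch over an excerpt that checks that property:

```python
# Excerpt of stage2/latest/added_tokens.json: the eight smallest ids.
added_tokens = {
    "<image_soft_token>": 262144,
    "<PAD>": 262145,
    "<M_START>": 262146,
    "<M_END>": 262147,
    "<|LEN_SHORT|>": 262148,
    "<|LEN_MEDIUM|>": 262149,
    "<|LEN_LONG|>": 262150,
    "<M0>": 262151,
}

# Sorted ids should form an unbroken run starting at the smallest id.
ids = sorted(added_tokens.values())
contiguous = ids == list(range(ids[0], ids[0] + len(ids)))
```

Because the ids extend past the base vocabulary, any code loading this adapter must use the accompanying tokenizer (and a resized embedding, which the adapter's `modules_to_save` already provides).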
stage2/latest/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c9c04f171d6ef0c3776cef4b38adc696581d6ce71dc755f4d652e72a30671c83
+size 2750164363
stage2/latest/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f14b2e65070f75934c887817130bc50c3234b38f2e0b24c590a932a3f3836f32
+size 1593
stage2/latest/special_tokens_map.json ADDED
@@ -0,0 +1,70 @@
+{
+  "additional_special_tokens": [
+    {
+      "content": "<M_START>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<M_END>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|LEN_SHORT|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|LEN_MEDIUM|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|LEN_LONG|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<eos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<PAD>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
stage2/latest/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:31abc1ffa28ad72cf54cf10c4225248b4e31d402da9f11b58e061a24d941f7da
+size 33470858
stage2/latest/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
+size 4689074
stage2/latest/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
stage2/latest/training_state.json ADDED
@@ -0,0 +1,10 @@
+{
+  "stage": "stage2",
+  "epoch_completed": 2,
+  "total_epochs": 30,
+  "avg_loss": 2.036984529169244,
+  "global_step": 1512,
+  "saved_at": "2026-02-10T23:12:59.105976Z",
+  "is_lora": true,
+  "base_model": "google/gemma-3-270m"
+}
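training_state.json ties the checkpoint back to the commit message: `loss=2.0370` is just `avg_loss` rounded to four decimals, and two of the thirty planned epochs are complete. A sketch (the steps-per-epoch figure assumes `global_step` counts cumulatively across epochs, which the file does not state):

```python
import json

# Contents of stage2/latest/training_state.json, copied from the commit above
# (timestamp omitted).
state = json.loads("""
{
  "stage": "stage2",
  "epoch_completed": 2,
  "total_epochs": 30,
  "avg_loss": 2.036984529169244,
  "global_step": 1512,
  "is_lora": true,
  "base_model": "google/gemma-3-270m"
}
""")

loss_tag = f"loss={state['avg_loss']:.4f}"   # same rounding as the commit message
progress = state["epoch_completed"] / state["total_epochs"]

# Assumption: global_step is cumulative, so per-epoch steps = total / epochs done.
steps_per_epoch = state["global_step"] // state["epoch_completed"]
```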