valteu committed on
Commit d349b51 · verified · 1 parent: fc08c4f

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ experiment_config.json filter=lfs diff=lfs merge=lfs -text
+ logs.jsonl filter=lfs diff=lfs merge=lfs -text
+ profiler_cache.csv filter=lfs diff=lfs merge=lfs -text
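The three added lines route the new experiment artifacts through Git LFS. Each `.gitattributes` line is a path pattern followed by attributes: a bare name sets the attribute, `-name` unsets it, `!name` leaves it unspecified, and `name=value` assigns a value. A minimal sketch of that breakdown (the helper `parse_gitattributes_line` is hypothetical; quoted patterns and macro attributes are not handled):

```python
from __future__ import annotations


def parse_gitattributes_line(line: str) -> tuple[str, dict[str, object]]:
    """Split one .gitattributes line into its path pattern and attribute map.

    Simplified: assumes an unquoted pattern and whitespace-separated attributes.
    """
    pattern, *attrs = line.split()
    parsed: dict[str, object] = {}
    for attr in attrs:
        if attr.startswith("-"):
            parsed[attr[1:]] = False  # "-text": attribute explicitly unset
        elif attr.startswith("!"):
            parsed[attr[1:]] = None  # "!attr": attribute left unspecified
        elif "=" in attr:
            key, value = attr.split("=", 1)
            parsed[key] = value  # "filter=lfs": attribute set to a string value
        else:
            parsed[attr] = True  # bare name: attribute set
    return pattern, parsed


pattern, attrs = parse_gitattributes_line("logs.jsonl filter=lfs diff=lfs merge=lfs -text")
print(pattern, attrs)
```

In this reading, `filter=lfs` is what makes Git hand the file to the LFS clean/smudge filter, and `-text` stops Git from applying end-of-line normalization to it.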
README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: meta-llama/Llama-3.2-1B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:meta-llama/Llama-3.2-1B-Instruct
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.2.dev0
adapter_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "gate_proj",
+ "k_proj",
+ "o_proj",
+ "down_proj",
+ "up_proj",
+ "q_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": true
+ }
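With `"use_rslora": true`, the adapter uses rank-stabilized LoRA scaling: the low-rank update is multiplied by `lora_alpha / sqrt(r)` rather than the standard `lora_alpha / r`. A minimal sketch of the two rules applied to this config's `lora_alpha = 16` and `r = 16` (the function name is illustrative, not a PEFT API):

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    # rsLoRA divides by sqrt(r) instead of r, so the scale shrinks more slowly as rank grows.
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


# Values taken from adapter_config.json above.
print(lora_scaling(16, 16, use_rslora=True))   # 4.0
print(lora_scaling(16, 16, use_rslora=False))  # 1.0
```

So at this rank the adapter's update is weighted four times more heavily than it would be under standard LoRA scaling with the same `lora_alpha`.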
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b67fd0924525c82460894368a8284375d9a41631e1a0eddd65f21e0cddf3ed8
+ size 45118424
experiment_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1340278164b6b90c547d3f1ce3023a5abdb53aa96cbfd5ba9912f7daaa5107f2
+ size 42241621
logs.jsonl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3776989caa1368ccb74d035a012f3b1e75249352e1737616ae6ec7d1ded37381
+ size 20195290
profiler_cache.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edf429385a4e2639616b14304c8b96414200fb9c923ea93b70a089f87e958ecb
+ size 13759800
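Each of the four binary artifacts is committed as a Git LFS pointer instead of its contents. A pointer consists of a `version` line, an `oid` (here a SHA-256 digest), and the object's `size` in bytes. Below is a minimal standard-library sketch for reading such a pointer and checking downloaded bytes against it. The function names are illustrative and are not part of Git LFS or `huggingface_hub`.

```python
import hashlib

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer into its oid algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "oid": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's size and SHA-256 digest."""
    if pointer["algorithm"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algorithm']}")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]


# Pointer for logs.jsonl, copied from the diff above.
logs_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3776989caa1368ccb74d035a012f3b1e75249352e1737616ae6ec7d1ded37381\n"
    "size 20195290\n"
)
print(logs_pointer["size"])  # 20195290
```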
results.json ADDED
@@ -0,0 +1,359 @@
+ {
+ "results": {
+ "nlg_e2e_nlg": [
+ {
+ "rougeL": {
+ "precision": 0.5371735641271278,
+ "recall": 0.48904862125499576,
+ "fmeasure": 0.5040126052119368
+ },
+ "rouge1": {
+ "precision": 0.7578962855797341,
+ "recall": 0.6914204132283206,
+ "fmeasure": 0.7122474241851529
+ },
+ "rouge2": {
+ "precision": 0.45588092304505795,
+ "recall": 0.41478348796625203,
+ "fmeasure": 0.4274643923193028
+ }
+ },
+ [
22
+ null,
23
+ null,
24
+ null,
25
+ null,
26
+ null,
27
+ null,
28
+ null,
29
+ null,
30
+ null,
31
+ null,
32
+ null,
33
+ null,
34
+ null,
35
+ null,
36
+ null,
37
+ null,
38
+ null,
39
+ null,
40
+ null,
41
+ null,
42
+ null,
43
+ null,
44
+ null,
45
+ null,
46
+ null,
47
+ null,
48
+ null,
49
+ null,
50
+ null,
51
+ null,
52
+ null,
53
+ null,
54
+ null,
55
+ null,
56
+ null,
57
+ null,
58
+ null,
59
+ null,
60
+ null,
61
+ null,
62
+ null,
63
+ null,
64
+ null,
65
+ null,
66
+ null,
67
+ null,
68
+ null,
69
+ null,
70
+ null,
71
+ null,
72
+ null,
73
+ null,
74
+ null,
75
+ null,
76
+ null,
77
+ null,
78
+ null,
79
+ null,
80
+ null,
81
+ null,
82
+ null,
83
+ null,
84
+ null,
85
+ null,
86
+ null,
87
+ null,
88
+ null,
89
+ null,
90
+ null,
91
+ null,
92
+ null,
93
+ null,
94
+ null,
95
+ null,
96
+ null,
97
+ null,
98
+ null,
99
+ null,
100
+ null,
101
+ null,
102
+ null,
103
+ null,
104
+ null,
105
+ null,
106
+ null,
107
+ null,
108
+ null,
109
+ null,
110
+ null,
111
+ null,
112
+ null,
113
+ null,
114
+ null,
115
+ null,
116
+ null,
117
+ null,
118
+ null,
119
+ null,
120
+ null,
121
+ null,
122
+ null,
123
+ null,
124
+ null,
125
+ null,
126
+ null,
127
+ null,
128
+ null,
129
+ null,
130
+ null,
131
+ null,
132
+ null,
133
+ null,
134
+ null,
135
+ null,
136
+ null,
137
+ null
+ ]
+ ],
+ "nlg_web_nlg": [
+ {
+ "rougeL": {
+ "precision": 0.5868617102915145,
+ "recall": 0.555414080511597,
+ "fmeasure": 0.5649414515302166
+ },
+ "rouge1": {
+ "precision": 0.7637758306213537,
+ "recall": 0.7244423652811981,
+ "fmeasure": 0.7361209162883644
+ },
+ "rouge2": {
+ "precision": 0.4879895284862594,
+ "recall": 0.4628162316455768,
+ "fmeasure": 0.47002322991445167
+ }
+ },
+ [
159
+ null,
160
+ null,
161
+ null,
162
+ null,
163
+ null,
164
+ null,
165
+ null,
166
+ null,
167
+ null,
168
+ null,
169
+ null,
170
+ null,
171
+ null,
172
+ null,
173
+ null,
174
+ null,
175
+ null,
176
+ null,
177
+ null,
178
+ null,
179
+ null,
180
+ null,
181
+ null,
182
+ null,
183
+ null,
184
+ null,
185
+ null,
186
+ null,
187
+ null,
188
+ null,
189
+ null,
190
+ null,
191
+ null,
192
+ null,
193
+ null,
194
+ null,
195
+ null,
196
+ null,
197
+ null,
198
+ null,
199
+ null,
200
+ null,
201
+ null,
202
+ null,
203
+ null,
204
+ null,
205
+ null,
206
+ null,
207
+ null,
208
+ null,
209
+ null,
210
+ null,
211
+ null,
212
+ null,
213
+ null,
214
+ null,
215
+ null,
216
+ null,
217
+ null,
218
+ null,
219
+ null,
220
+ null,
221
+ null,
222
+ null,
223
+ null,
224
+ null,
225
+ null,
226
+ null,
227
+ null,
228
+ null,
229
+ null,
230
+ null,
231
+ null,
232
+ null,
233
+ null,
234
+ null,
235
+ null,
236
+ null,
237
+ null,
238
+ null,
239
+ null,
240
+ null,
241
+ null,
242
+ null,
243
+ null,
244
+ null,
245
+ null,
246
+ null,
247
+ null,
248
+ null,
249
+ null,
250
+ null,
251
+ null,
252
+ null,
253
+ null,
254
+ null,
255
+ null,
256
+ null,
257
+ null,
258
+ null,
259
+ null,
260
+ null,
261
+ null,
262
+ null,
263
+ null,
264
+ null,
265
+ null,
266
+ null,
267
+ null,
268
+ null,
269
+ null,
270
+ null
+ ]
+ ],
+ "nlg_samsum": [
+ {
+ "rougeL": {
+ "precision": 0.49804943588324846,
+ "recall": 0.40078467768304066,
+ "fmeasure": 0.4208985426291896
+ },
+ "rouge1": {
+ "precision": 0.5931447491247541,
+ "recall": 0.47793397368524376,
+ "fmeasure": 0.5015238367923469
+ },
+ "rouge2": {
+ "precision": 0.30716322766601073,
+ "recall": 0.24605675599787333,
+ "fmeasure": 0.25841659934082817
+ }
+ },
+ [
292
+ null,
293
+ null,
294
+ null,
295
+ null,
296
+ null,
297
+ null,
298
+ null,
299
+ null,
300
+ null,
301
+ null,
302
+ null,
303
+ null,
304
+ null,
305
+ null,
306
+ null,
307
+ null,
308
+ null,
309
+ null,
310
+ null,
311
+ null,
312
+ null,
313
+ null,
314
+ null,
315
+ null,
316
+ null,
317
+ null,
318
+ null,
319
+ null,
320
+ null,
321
+ null,
322
+ null,
323
+ null,
324
+ null,
325
+ null,
326
+ null,
327
+ null,
328
+ null,
329
+ null,
330
+ null,
331
+ null,
332
+ null,
333
+ null,
334
+ null,
335
+ null,
336
+ null,
337
+ null,
338
+ null,
339
+ null,
340
+ null,
341
+ null,
342
+ null,
343
+ null
+ ]
+ ],
+ "summary": {
+ "precision": 0.5647761461576035,
+ "recall": 0.5519065582340063,
+ "fmeasure": 0.5478005596574774
+ }
+ },
+ "energy": {
+ "total": 1754635.7367900002,
+ "train": 1341910.9901,
+ "eval": 412724.74669000006
+ },
+ "train_energy": 1341910.9901,
+ "eval_energy": 412724.74669000006
+ }
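ROUGE F-measure with beta = 1 is the harmonic mean of precision and recall. Applied to the averaged `rougeL` precision and recall for `nlg_e2e_nlg`, the harmonic mean comes out near 0.512, while the reported `fmeasure` is 0.504. That gap suggests, though the commit does not say so, that `fmeasure` is the mean of per-example F-scores and not the F-score of the mean precision and recall. A small sketch of the comparison:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (ROUGE F-measure with beta = 1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# rougeL values for nlg_e2e_nlg, copied from results.json above.
p, r, reported_f = 0.5371735641271278, 0.48904862125499576, 0.5040126052119368
f_of_means = f_measure(p, r)
print(f"F of mean P/R: {f_of_means:.4f}  reported fmeasure: {reported_f:.4f}")
```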
summary.json ADDED
@@ -0,0 +1,23 @@
+ {
+ "flops": {
+ "eval": 107584127351353600,
+ "train": 2.4337468166316173e+17,
+ "total": 3.509588090145153e+17
+ },
+ "total": {
+ "total": 1754635.7367900002,
+ "train": 1341910.9901,
+ "eval": 412724.74669000006
+ },
+ "best_evals": {
+ "pplx": {
+ "score": 6.12011772775794,
+ "step": 23526
+ },
+ "rougel": {
+ "precision": 0.5647761461576035,
+ "recall": 0.5519065582340063,
+ "fmeasure": 0.5478005596574774
+ }
+ }
+ }
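The `total` fields in `summary.json` equal the sum of the train and eval phases, for both FLOPs and energy. The energy unit is not stated anywhere in the commit. The best perplexity can also be converted back to a mean cross-entropy, because perplexity is `exp(mean negative log-likelihood)`. That conversion assumes `pplx` is a per-token perplexity, which the commit does not state. A short check:

```python
import math

# Values copied from summary.json above; the energy unit is not stated in the commit.
flops = {"eval": 107584127351353600, "train": 2.4337468166316173e17, "total": 3.509588090145153e17}
energy = {"total": 1754635.7367900002, "train": 1341910.9901, "eval": 412724.74669000006}
best_pplx = 6.12011772775794

# Each "total" is the sum of the train and eval phases.
assert math.isclose(flops["train"] + flops["eval"], flops["total"], rel_tol=1e-12)
assert math.isclose(energy["train"] + energy["eval"], energy["total"], rel_tol=1e-12)

# Assuming per-token perplexity: mean cross-entropy (in nats) is ln(perplexity).
mean_nll = math.log(best_pplx)
print(f"mean cross-entropy at step 23526: {mean_nll:.4f} nats/token")
```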