Popipopi93 committed
Commit a0aee3b · verified · 1 Parent(s): 7603597

Upload folder using huggingface_hub

README-template.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ license: apache-2.0
+ base_model: google/vit-base-patch16-224
+ tags:
+ - Image Regression
+ datasets:
+ - "-"
+ metrics:
+ - accuracy
+ model-index:
+ - name: "-"
+   results: []
+ ---
+
+ # Title
+ ## Image Regression Model
+
+ This model was trained with [Image Regression Model Trainer](https://github.com/TonyAssi/ImageRegression/tree/main). It takes an image as input and outputs a float value.
+
+ ```python
+ from ImageRegression import predict
+ predict(repo_id='-', image_path='image.jpg')
+ ```
+
+ ---
+
+ ## Dataset
+ Dataset:\
+ Value Column:\
+ Train Test Split:
+
+ ---
+
+ ## Training
+ Base Model: [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)\
+ Epochs:\
+ Learning Rate:
+
+ ---
+
+ ## Usage
+
+ ### Download
+ ```bash
+ git clone https://github.com/TonyAssi/ImageRegression.git
+ cd ImageRegression
+ ```
+
+ ### Installation
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Import
+ ```python
+ from ImageRegression import train_model, upload_model, predict
+ ```
+
+ ### Inference (Prediction)
+ - **repo_id** 🤗 repo id of the model
+ - **image_path** path to the image
+ ```python
+ predict(repo_id='-',
+         image_path='image.jpg')
+ ```
+ The first call to this function downloads the safetensors model. Subsequent calls run faster.
+
+ ### Train Model
+ - **dataset_id** 🤗 dataset id
+ - **value_column_name** name of the dataset column that holds the prediction values
+ - **test_split** fraction of the dataset held out as the test split
+ - **output_dir** the directory where the checkpoints will be saved
+ - **num_train_epochs** number of training epochs
+ - **learning_rate** learning rate
+ ```python
+ train_model(dataset_id='-',
+             value_column_name='-',
+             test_split=-,
+             output_dir='./results',
+             num_train_epochs=-,
+             learning_rate=-)
+
+ ```
+ The trainer saves the checkpoints in the output_dir location. The model.safetensors file in a checkpoint holds the trained weights you'll use for inference (prediction).
+
+ ### Upload Model
+ This function will upload your model to the 🤗 Hub.
+ - **model_id** the id (name) of the model
+ - **token** your 🤗 token; go [here](https://huggingface.co/settings/tokens) to create a new one
+ - **checkpoint_dir** checkpoint folder that will be uploaded
+ ```python
+ upload_model(model_id='-',
+              token='YOUR_HF_TOKEN',
+              checkpoint_dir='./results/checkpoint-940')
+ ```
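The template marks run-specific values with `-` or leaves them blank, and the same fields appear in metadata.json. The commit does not show how the trainer fills the template, so the following is only a sketch of how the Dataset and Training summaries could be rendered from those fields; `render_summary` and its exact layout are assumptions made for illustration.

```python
def render_summary(meta: dict) -> str:
    """Render the Dataset/Training summary lines of the model card.

    `meta` is assumed to carry the keys found in metadata.json:
    dataset_id, value_column_name, test_split, num_train_epochs, learning_rate.
    """
    lines = [
        "## Dataset",
        # A trailing backslash is a Markdown hard line break, as in the template.
        f"Dataset: {meta['dataset_id']}\\",
        f"Value Column: '{meta['value_column_name']}'\\",
        f"Train Test Split: {meta['test_split']}",
        "",
        "## Training",
        f"Epochs: {meta['num_train_epochs']}\\",
        f"Learning Rate: {meta['learning_rate']}",
    ]
    return "\n".join(lines)


# Example values copied from metadata.json in this commit.
example_meta = {
    "dataset_id": "Popipopi93/bottle_finder",
    "value_column_name": "level",
    "test_split": 0.1,
    "num_train_epochs": 20,
    "learning_rate": 0.0001,
}
summary = render_summary(example_meta)
```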
README.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ license: apache-2.0
+ base_model: google/vit-base-patch16-224
+ tags:
+ - Image Regression
+ datasets:
+ - "Popipopi93/bottle_finder"
+ metrics:
+ - accuracy
+ model-index:
+ - name: "model_colab_20_bis"
+   results: []
+ ---
+
+ # model_colab_20_bis
+ ## Image Regression Model
+
+ This model was trained with [Image Regression Model Trainer](https://github.com/TonyAssi/ImageRegression/tree/main). It takes an image as input and outputs a float value.
+
+ ```python
+ from ImageRegression import predict
+ predict(repo_id='Popipopi93/model_colab_20_bis', image_path='image.jpg')
+ ```
+
+ ---
+
+ ## Dataset
+ Dataset: Popipopi93/bottle_finder\
+ Value Column: 'level'\
+ Train Test Split: 0.1
+
+ ---
+
+ ## Training
+ Base Model: [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)\
+ Epochs: 20\
+ Learning Rate: 0.0001
+
+ ---
+
+ ## Usage
+
+ ### Download
+ ```bash
+ git clone https://github.com/TonyAssi/ImageRegression.git
+ cd ImageRegression
+ ```
+
+ ### Installation
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Import
+ ```python
+ from ImageRegression import train_model, upload_model, predict
+ ```
+
+ ### Inference (Prediction)
+ - **repo_id** 🤗 repo id of the model
+ - **image_path** path to the image
+ ```python
+ predict(repo_id='Popipopi93/model_colab_20_bis',
+         image_path='image.jpg')
+ ```
+ The first call to this function downloads the safetensors model. Subsequent calls run faster.
+
+ ### Train Model
+ - **dataset_id** 🤗 dataset id
+ - **value_column_name** name of the dataset column that holds the prediction values
+ - **test_split** fraction of the dataset held out as the test split
+ - **output_dir** the directory where the checkpoints will be saved
+ - **num_train_epochs** number of training epochs
+ - **learning_rate** learning rate
+ ```python
+ train_model(dataset_id='Popipopi93/bottle_finder',
+             value_column_name='level',
+             test_split=0.1,
+             output_dir='./results',
+             num_train_epochs=20,
+             learning_rate=0.0001)
+
+ ```
+ The trainer saves the checkpoints in the output_dir location. The model.safetensors file in a checkpoint holds the trained weights you'll use for inference (prediction).
+
+ ### Upload Model
+ This function will upload your model to the 🤗 Hub.
+ - **model_id** the id (name) of the model
+ - **token** your 🤗 token; go [here](https://huggingface.co/settings/tokens) to create a new one
+ - **checkpoint_dir** checkpoint folder that will be uploaded
+ ```python
+ upload_model(model_id='model_colab_20_bis',
+              token='YOUR_HF_TOKEN',
+              checkpoint_dir='./results/checkpoint-940')
+ ```
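To score several images, the `predict` call documented in this README can be wrapped in a loop. This is a minimal sketch and not part of the ImageRegression package: `predict_folder` is a hypothetical helper, and it receives the prediction function as an argument so that it relies only on the `repo_id` and `image_path` keyword arguments shown in the README.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed set of image extensions; adjust to the files you actually have.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def predict_folder(predict_fn: Callable[..., float],
                   repo_id: str,
                   folder: str) -> Dict[str, float]:
    """Call predict_fn once per image file in `folder`, in name order."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = predict_fn(repo_id=repo_id,
                                            image_path=str(path))
    return results


# Intended usage (requires the ImageRegression repo on the import path):
# from ImageRegression import predict
# levels = predict_folder(predict, 'Popipopi93/model_colab_20_bis', './images')
```

Passing the function in keeps the helper easy to try with a stub before the model has been downloaded.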
metadata.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "dataset_id": "Popipopi93/bottle_finder",
+   "value_column_name": "level",
+   "test_split": 0.1,
+   "num_train_epochs": 20,
+   "learning_rate": 0.0001,
+   "max_value": 1.0
+ }
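metadata.json stores the training settings plus a `max_value` field. The commit does not show the code that reads this file. One plausible use, stated here purely as an assumption, is to scale targets into [0, 1] for training and to scale model outputs back afterwards. The helper names below are hypothetical.

```python
import json


def normalize(value: float, max_value: float) -> float:
    """Scale a raw target into [0, 1] (assumed training-time transform)."""
    return value / max_value


def denormalize(output: float, max_value: float) -> float:
    """Map a model output back to the original units (assumed inference-time transform)."""
    return output * max_value


# Contents of metadata.json as added in this commit.
METADATA_JSON = """{
  "dataset_id": "Popipopi93/bottle_finder",
  "value_column_name": "level",
  "test_split": 0.1,
  "num_train_epochs": 20,
  "learning_rate": 0.0001,
  "max_value": 1.0
}"""

meta = json.loads(METADATA_JSON)
# With max_value equal to 1.0 the rescaling leaves the output unchanged.
raw = denormalize(0.42, meta["max_value"])
```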
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5779d24e8ada28193263577cce44f8ebaa0cdc473edb1f03fac9598de9e7ce98
+ size 345583444
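The model.safetensors entry is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a name chosen for this example, not a Git LFS API.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text copied from the model.safetensors hunk.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:5779d24e8ada28193263577cce44f8ebaa0cdc473edb1f03fac9598de9e7ce98
size 345583444
"""

info = parse_lfs_pointer(POINTER)
size_mib = info["size_bytes"] / 2**20  # about 329.6 MiB
```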
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66202774450a90a2018b51401996dd5da972086797d026eebc5871b567690edb
+ size 686562746
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:289fef051795377d49c655aebabbd2a059abf20869cd8ff3db6589fc9191aaf7
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c3e81fd065498afdffbac649d956b8dfda003ccf6ce8894b8c036e844497086
+ size 1064
trainer_state.json ADDED
@@ -0,0 +1,611 @@
+ {
+   "best_global_step": null,
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 33.333333333333336,
+   "eval_steps": 500,
+   "global_step": 400,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.8333333333333334,
+       "grad_norm": 13.07447624206543,
+       "learning_rate": 9.916666666666667e-05,
+       "loss": 0.56,
+       "step": 10
+     },
+     {
+       "epoch": 1.0,
+       "eval_loss": 0.025616448372602463,
+       "eval_mse": 0.025616448372602463,
+       "eval_runtime": 1.0225,
+       "eval_samples_per_second": 9.78,
+       "eval_steps_per_second": 1.956,
+       "step": 12
+     },
+     {
+       "epoch": 1.6666666666666665,
+       "grad_norm": 10.742788314819336,
+       "learning_rate": 9.833333333333333e-05,
+       "loss": 0.0214,
+       "step": 20
+     },
+     {
+       "epoch": 2.0,
+       "eval_loss": 0.07678178697824478,
+       "eval_mse": 0.07678178697824478,
+       "eval_runtime": 1.6318,
+       "eval_samples_per_second": 6.128,
+       "eval_steps_per_second": 1.226,
+       "step": 24
+     },
+     {
+       "epoch": 2.5,
+       "grad_norm": 11.96033763885498,
+       "learning_rate": 9.75e-05,
+       "loss": 0.0595,
+       "step": 30
+     },
+     {
+       "epoch": 3.0,
+       "eval_loss": 0.0654543861746788,
+       "eval_mse": 0.0654543861746788,
+       "eval_runtime": 0.9333,
+       "eval_samples_per_second": 10.715,
+       "eval_steps_per_second": 2.143,
+       "step": 36
+     },
+     {
+       "epoch": 3.3333333333333335,
+       "grad_norm": 15.946290016174316,
+       "learning_rate": 9.666666666666667e-05,
+       "loss": 0.0701,
+       "step": 40
+     },
+     {
+       "epoch": 4.0,
+       "eval_loss": 0.016676615923643112,
+       "eval_mse": 0.016676615923643112,
+       "eval_runtime": 0.9363,
+       "eval_samples_per_second": 10.68,
+       "eval_steps_per_second": 2.136,
+       "step": 48
+     },
+     {
+       "epoch": 4.166666666666667,
+       "grad_norm": 2.655425786972046,
+       "learning_rate": 9.583333333333334e-05,
+       "loss": 0.0557,
+       "step": 50
+     },
+     {
+       "epoch": 5.0,
+       "grad_norm": 2.651514768600464,
+       "learning_rate": 9.5e-05,
+       "loss": 0.0163,
+       "step": 60
+     },
+     {
+       "epoch": 5.0,
+       "eval_loss": 0.017047178000211716,
+       "eval_mse": 0.017047178000211716,
+       "eval_runtime": 0.9851,
+       "eval_samples_per_second": 10.151,
+       "eval_steps_per_second": 2.03,
+       "step": 60
+     },
+     {
+       "epoch": 5.833333333333333,
+       "grad_norm": 0.5332293510437012,
+       "learning_rate": 9.416666666666667e-05,
+       "loss": 0.0098,
+       "step": 70
+     },
+     {
+       "epoch": 6.0,
+       "eval_loss": 0.010980404913425446,
+       "eval_mse": 0.01098040584474802,
+       "eval_runtime": 0.9298,
+       "eval_samples_per_second": 10.756,
+       "eval_steps_per_second": 2.151,
+       "step": 72
+     },
+     {
+       "epoch": 6.666666666666667,
+       "grad_norm": 8.73061752319336,
+       "learning_rate": 9.333333333333334e-05,
+       "loss": 0.0176,
+       "step": 80
+     },
+     {
+       "epoch": 7.0,
+       "eval_loss": 0.057986367493867874,
+       "eval_mse": 0.057986367493867874,
+       "eval_runtime": 1.6366,
+       "eval_samples_per_second": 6.11,
+       "eval_steps_per_second": 1.222,
+       "step": 84
+     },
+     {
+       "epoch": 7.5,
+       "grad_norm": 5.609442234039307,
+       "learning_rate": 9.250000000000001e-05,
+       "loss": 0.0213,
+       "step": 90
+     },
+     {
+       "epoch": 8.0,
+       "eval_loss": 0.009501439519226551,
+       "eval_mse": 0.009501439519226551,
+       "eval_runtime": 0.9348,
+       "eval_samples_per_second": 10.697,
+       "eval_steps_per_second": 2.139,
+       "step": 96
+     },
+     {
+       "epoch": 8.333333333333334,
+       "grad_norm": 1.0403064489364624,
+       "learning_rate": 9.166666666666667e-05,
+       "loss": 0.009,
+       "step": 100
+     },
+     {
+       "epoch": 9.0,
+       "eval_loss": 0.015423273667693138,
+       "eval_mse": 0.015423273667693138,
+       "eval_runtime": 0.9222,
+       "eval_samples_per_second": 10.843,
+       "eval_steps_per_second": 2.169,
+       "step": 108
+     },
+     {
+       "epoch": 9.166666666666666,
+       "grad_norm": 4.6708173751831055,
+       "learning_rate": 9.083333333333334e-05,
+       "loss": 0.0059,
+       "step": 110
+     },
+     {
+       "epoch": 10.0,
+       "grad_norm": 3.153209686279297,
+       "learning_rate": 9e-05,
+       "loss": 0.0076,
+       "step": 120
+     },
+     {
+       "epoch": 10.0,
+       "eval_loss": 0.013172024860978127,
+       "eval_mse": 0.013172025792300701,
+       "eval_runtime": 1.0158,
+       "eval_samples_per_second": 9.844,
+       "eval_steps_per_second": 1.969,
+       "step": 120
+     },
+     {
+       "epoch": 10.833333333333334,
+       "grad_norm": 5.980560302734375,
+       "learning_rate": 8.916666666666667e-05,
+       "loss": 0.0085,
+       "step": 130
+     },
+     {
+       "epoch": 11.0,
+       "eval_loss": 0.014298426918685436,
+       "eval_mse": 0.014298425987362862,
+       "eval_runtime": 0.943,
+       "eval_samples_per_second": 10.604,
+       "eval_steps_per_second": 2.121,
+       "step": 132
+     },
+     {
+       "epoch": 11.666666666666666,
+       "grad_norm": 0.19257651269435883,
+       "learning_rate": 8.833333333333333e-05,
+       "loss": 0.0057,
+       "step": 140
+     },
+     {
+       "epoch": 12.0,
+       "eval_loss": 0.00729939341545105,
+       "eval_mse": 0.00729939341545105,
+       "eval_runtime": 1.485,
+       "eval_samples_per_second": 6.734,
+       "eval_steps_per_second": 1.347,
+       "step": 144
+     },
+     {
+       "epoch": 12.5,
+       "grad_norm": 0.6362859606742859,
+       "learning_rate": 8.75e-05,
+       "loss": 0.0011,
+       "step": 150
+     },
+     {
+       "epoch": 13.0,
+       "eval_loss": 0.0055463844910264015,
+       "eval_mse": 0.0055463844910264015,
+       "eval_runtime": 0.9257,
+       "eval_samples_per_second": 10.802,
+       "eval_steps_per_second": 2.16,
+       "step": 156
+     },
+     {
+       "epoch": 13.333333333333334,
+       "grad_norm": 1.1605561971664429,
+       "learning_rate": 8.666666666666667e-05,
+       "loss": 0.0009,
+       "step": 160
+     },
+     {
+       "epoch": 14.0,
+       "eval_loss": 0.006429512985050678,
+       "eval_mse": 0.006429512985050678,
+       "eval_runtime": 1.2335,
+       "eval_samples_per_second": 8.107,
+       "eval_steps_per_second": 1.621,
+       "step": 168
+     },
+     {
+       "epoch": 14.166666666666666,
+       "grad_norm": 0.21375474333763123,
+       "learning_rate": 8.583333333333334e-05,
+       "loss": 0.0008,
+       "step": 170
+     },
+     {
+       "epoch": 15.0,
+       "grad_norm": 1.4405007362365723,
+       "learning_rate": 8.5e-05,
+       "loss": 0.0009,
+       "step": 180
+     },
+     {
+       "epoch": 15.0,
+       "eval_loss": 0.005366006400436163,
+       "eval_mse": 0.005366006400436163,
+       "eval_runtime": 1.8814,
+       "eval_samples_per_second": 5.315,
+       "eval_steps_per_second": 1.063,
+       "step": 180
+     },
+     {
+       "epoch": 15.833333333333334,
+       "grad_norm": 1.5905839204788208,
+       "learning_rate": 8.416666666666668e-05,
+       "loss": 0.0017,
+       "step": 190
+     },
+     {
+       "epoch": 16.0,
+       "eval_loss": 0.005547891370952129,
+       "eval_mse": 0.005547891836613417,
+       "eval_runtime": 0.9239,
+       "eval_samples_per_second": 10.824,
+       "eval_steps_per_second": 2.165,
+       "step": 192
+     },
+     {
+       "epoch": 16.666666666666668,
+       "grad_norm": 1.6494227647781372,
+       "learning_rate": 8.333333333333334e-05,
+       "loss": 0.0029,
+       "step": 200
+     },
+     {
+       "epoch": 17.0,
+       "eval_loss": 0.005408396478742361,
+       "eval_mse": 0.005408396478742361,
+       "eval_runtime": 0.9668,
+       "eval_samples_per_second": 10.343,
+       "eval_steps_per_second": 2.069,
+       "step": 204
+     },
+     {
+       "epoch": 17.5,
+       "grad_norm": 0.61334627866745,
+       "learning_rate": 8.25e-05,
+       "loss": 0.0031,
+       "step": 210
+     },
+     {
+       "epoch": 18.0,
+       "eval_loss": 0.007544847670942545,
+       "eval_mse": 0.007544847670942545,
+       "eval_runtime": 0.9278,
+       "eval_samples_per_second": 10.778,
+       "eval_steps_per_second": 2.156,
+       "step": 216
+     },
+     {
+       "epoch": 18.333333333333332,
+       "grad_norm": 0.8942199349403381,
+       "learning_rate": 8.166666666666667e-05,
+       "loss": 0.0015,
+       "step": 220
+     },
+     {
+       "epoch": 19.0,
+       "eval_loss": 0.008514616638422012,
+       "eval_mse": 0.008514615707099438,
+       "eval_runtime": 1.6517,
+       "eval_samples_per_second": 6.055,
+       "eval_steps_per_second": 1.211,
+       "step": 228
+     },
+     {
+       "epoch": 19.166666666666668,
+       "grad_norm": 1.6490943431854248,
+       "learning_rate": 8.083333333333334e-05,
+       "loss": 0.0015,
+       "step": 230
+     },
+     {
+       "epoch": 20.0,
+       "grad_norm": 1.4641326665878296,
+       "learning_rate": 8e-05,
+       "loss": 0.0014,
+       "step": 240
+     },
+     {
+       "epoch": 20.0,
+       "eval_loss": 0.008396068587899208,
+       "eval_mse": 0.008396068587899208,
+       "eval_runtime": 1.058,
+       "eval_samples_per_second": 9.452,
+       "eval_steps_per_second": 1.89,
+       "step": 240
+     },
+     {
+       "epoch": 20.833333333333332,
+       "grad_norm": 2.196040153503418,
+       "learning_rate": 7.916666666666666e-05,
+       "loss": 0.0018,
+       "step": 250
+     },
+     {
+       "epoch": 21.0,
+       "eval_loss": 0.008127102628350258,
+       "eval_mse": 0.008127102628350258,
+       "eval_runtime": 0.9631,
+       "eval_samples_per_second": 10.383,
+       "eval_steps_per_second": 2.077,
+       "step": 252
+     },
+     {
+       "epoch": 21.666666666666668,
+       "grad_norm": 1.2884770631790161,
+       "learning_rate": 7.833333333333333e-05,
+       "loss": 0.0021,
+       "step": 260
+     },
+     {
+       "epoch": 22.0,
+       "eval_loss": 0.007328727748245001,
+       "eval_mse": 0.007328727748245001,
+       "eval_runtime": 0.9238,
+       "eval_samples_per_second": 10.824,
+       "eval_steps_per_second": 2.165,
+       "step": 264
+     },
+     {
+       "epoch": 22.5,
+       "grad_norm": 0.9456672668457031,
+       "learning_rate": 7.75e-05,
+       "loss": 0.0008,
+       "step": 270
+     },
+     {
+       "epoch": 23.0,
+       "eval_loss": 0.004704700317233801,
+       "eval_mse": 0.004704700317233801,
+       "eval_runtime": 0.9456,
+       "eval_samples_per_second": 10.576,
+       "eval_steps_per_second": 2.115,
+       "step": 276
+     },
+     {
+       "epoch": 23.333333333333332,
+       "grad_norm": 0.35770225524902344,
+       "learning_rate": 7.666666666666667e-05,
+       "loss": 0.0006,
+       "step": 280
+     },
+     {
+       "epoch": 24.0,
+       "eval_loss": 0.004532460123300552,
+       "eval_mse": 0.004532460123300552,
+       "eval_runtime": 0.9918,
+       "eval_samples_per_second": 10.083,
+       "eval_steps_per_second": 2.017,
+       "step": 288
+     },
+     {
+       "epoch": 24.166666666666668,
+       "grad_norm": 1.7567228078842163,
+       "learning_rate": 7.583333333333334e-05,
+       "loss": 0.0006,
+       "step": 290
+     },
+     {
+       "epoch": 25.0,
+       "grad_norm": 0.47638174891471863,
+       "learning_rate": 7.500000000000001e-05,
+       "loss": 0.0007,
+       "step": 300
+     },
+     {
+       "epoch": 25.0,
+       "eval_loss": 0.00664373766630888,
+       "eval_mse": 0.00664373766630888,
+       "eval_runtime": 1.874,
+       "eval_samples_per_second": 5.336,
+       "eval_steps_per_second": 1.067,
+       "step": 300
+     },
+     {
+       "epoch": 25.833333333333332,
+       "grad_norm": 2.5677366256713867,
+       "learning_rate": 7.416666666666668e-05,
+       "loss": 0.0017,
+       "step": 310
+     },
+     {
+       "epoch": 26.0,
+       "eval_loss": 0.007896892726421356,
+       "eval_mse": 0.007896892726421356,
+       "eval_runtime": 0.9578,
+       "eval_samples_per_second": 10.441,
+       "eval_steps_per_second": 2.088,
+       "step": 312
+     },
+     {
+       "epoch": 26.666666666666668,
+       "grad_norm": 0.6687202453613281,
+       "learning_rate": 7.333333333333333e-05,
+       "loss": 0.0006,
+       "step": 320
+     },
+     {
+       "epoch": 27.0,
+       "eval_loss": 0.00685582309961319,
+       "eval_mse": 0.00685582309961319,
+       "eval_runtime": 0.9377,
+       "eval_samples_per_second": 10.664,
+       "eval_steps_per_second": 2.133,
+       "step": 324
+     },
+     {
+       "epoch": 27.5,
+       "grad_norm": 1.1073424816131592,
+       "learning_rate": 7.25e-05,
+       "loss": 0.0007,
+       "step": 330
+     },
+     {
+       "epoch": 28.0,
+       "eval_loss": 0.006450907792896032,
+       "eval_mse": 0.006450907792896032,
+       "eval_runtime": 0.9419,
+       "eval_samples_per_second": 10.617,
+       "eval_steps_per_second": 2.123,
+       "step": 336
+     },
+     {
+       "epoch": 28.333333333333332,
+       "grad_norm": 0.6167001128196716,
+       "learning_rate": 7.166666666666667e-05,
+       "loss": 0.0004,
+       "step": 340
+     },
+     {
+       "epoch": 29.0,
+       "eval_loss": 0.005170729011297226,
+       "eval_mse": 0.005170729476958513,
+       "eval_runtime": 0.9258,
+       "eval_samples_per_second": 10.801,
+       "eval_steps_per_second": 2.16,
+       "step": 348
+     },
+     {
+       "epoch": 29.166666666666668,
+       "grad_norm": 0.26554998755455017,
+       "learning_rate": 7.083333333333334e-05,
+       "loss": 0.0004,
+       "step": 350
+     },
+     {
+       "epoch": 30.0,
+       "grad_norm": 0.755969762802124,
+       "learning_rate": 7e-05,
+       "loss": 0.0003,
+       "step": 360
+     },
+     {
+       "epoch": 30.0,
+       "eval_loss": 0.005034693516790867,
+       "eval_mse": 0.005034693516790867,
+       "eval_runtime": 1.7333,
+       "eval_samples_per_second": 5.769,
+       "eval_steps_per_second": 1.154,
+       "step": 360
+     },
+     {
+       "epoch": 30.833333333333332,
+       "grad_norm": 0.7286393046379089,
+       "learning_rate": 6.916666666666666e-05,
+       "loss": 0.0004,
+       "step": 370
+     },
+     {
+       "epoch": 31.0,
+       "eval_loss": 0.0060684266500175,
+       "eval_mse": 0.0060684266500175,
+       "eval_runtime": 0.9883,
+       "eval_samples_per_second": 10.118,
+       "eval_steps_per_second": 2.024,
+       "step": 372
+     },
+     {
+       "epoch": 31.666666666666668,
+       "grad_norm": 1.2103056907653809,
+       "learning_rate": 6.833333333333333e-05,
+       "loss": 0.0006,
+       "step": 380
+     },
+     {
+       "epoch": 32.0,
+       "eval_loss": 0.005998858716338873,
+       "eval_mse": 0.005998858250677586,
+       "eval_runtime": 0.9499,
+       "eval_samples_per_second": 10.527,
+       "eval_steps_per_second": 2.105,
+       "step": 384
+     },
+     {
+       "epoch": 32.5,
+       "grad_norm": 0.4944589138031006,
+       "learning_rate": 6.750000000000001e-05,
+       "loss": 0.0006,
+       "step": 390
+     },
+     {
+       "epoch": 33.0,
+       "eval_loss": 0.006172865629196167,
+       "eval_mse": 0.006172865629196167,
+       "eval_runtime": 0.9416,
+       "eval_samples_per_second": 10.621,
+       "eval_steps_per_second": 2.124,
+       "step": 396
+     },
+     {
+       "epoch": 33.333333333333336,
+       "grad_norm": 1.0801491737365723,
+       "learning_rate": 6.666666666666667e-05,
+       "loss": 0.0006,
+       "step": 400
+     }
+   ],
+   "logging_steps": 10,
+   "max_steps": 1200,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 100,
+   "save_steps": 10,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 0.0,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
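The logged learning rates are consistent with a linear decay from the initial rate of 1e-4 (from metadata.json) to zero over `max_steps` = 1200 with no warmup: at step 10, `1e-4 * (1 - 10/1200)` is about 9.9167e-05, and at step 400, `1e-4 * (1 - 400/1200)` is about 6.6667e-05, matching the first and last log entries. Evaluation is logged every 12 steps, so 400 steps correspond to 400/12, or about 33.33 epochs. In the full log the lowest `eval_mse` is about 0.00453, at step 288 (epoch 24). The sketch below checks these relations on a hand-copied excerpt of `log_history`; the helper names are illustrative and not part of any library, and the schedule is inferred from the numbers rather than read from the training arguments.

```python
import math

BASE_LR = 1e-4        # learning_rate from metadata.json
MAX_STEPS = 1200      # max_steps from trainer_state.json
STEPS_PER_EPOCH = 12  # an eval entry is logged every 12 steps


def linear_lr(step: int, base_lr: float = BASE_LR, max_steps: int = MAX_STEPS) -> float:
    """Learning rate under linear decay to zero, assuming no warmup."""
    return base_lr * (1 - step / max_steps)


def best_eval(log_history: list) -> dict:
    """Return the entry with the lowest eval_mse, skipping training-only entries."""
    evals = [entry for entry in log_history if "eval_mse" in entry]
    return min(evals, key=lambda entry: entry["eval_mse"])


# Excerpt copied from log_history (not the full list).
log_excerpt = [
    {"epoch": 0.8333333333333334, "learning_rate": 9.916666666666667e-05, "loss": 0.56, "step": 10},
    {"epoch": 1.0, "eval_mse": 0.025616448372602463, "step": 12},
    {"epoch": 23.0, "eval_mse": 0.004704700317233801, "step": 276},
    {"epoch": 24.0, "eval_mse": 0.004532460123300552, "step": 288},
    {"epoch": 33.0, "eval_mse": 0.006172865629196167, "step": 396},
    {"epoch": 33.333333333333336, "learning_rate": 6.666666666666667e-05, "loss": 0.0006, "step": 400},
]

for entry in log_excerpt:
    if "learning_rate" in entry:
        # The logged rate should match the assumed linear schedule.
        assert math.isclose(linear_lr(entry["step"]), entry["learning_rate"], rel_tol=1e-9)
    # The fractional epoch is the step count divided by steps per epoch.
    assert math.isclose(entry["step"] / STEPS_PER_EPOCH, entry["epoch"], rel_tol=1e-9)

best = best_eval(log_excerpt)
```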
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a4dba23053dac499d8d450e9b5f3e36297a28b03ccf516e82dff9469a0f364e
+ size 5304