2nzi committed (verified)
Commit aa4ff38 · Parent(s): 6d78b1d

Model save

Files changed (3)
  1. README.md +92 -0
  2. model.safetensors +1 -1
  3. trainer_state.json +2274 -0
README.md ADDED
@@ -0,0 +1,92 @@
---
license: cc-by-nc-4.0
base_model: MCG-NJU/videomae-base
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: videomae-surf-analytics-runpod4
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# videomae-surf-analytics-runpod4

This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7259
- Accuracy: 0.9016
- F1: 0.9021

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 2760
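The linear schedule with a 0.1 warmup ratio can be sketched in a few lines (a minimal reimplementation for illustration, not the Trainer's own code; it reproduces the learning-rate values logged in trainer_state.json, e.g. 1.8116e-06 at step 10):

```python
def linear_warmup_lr(step, base_lr=5e-05, total_steps=2760, warmup_ratio=0.1):
    """HF-style linear schedule: ramp up to base_lr, then decay linearly to 0."""
    warmup_steps = int(total_steps * warmup_ratio)  # 276 steps for this run
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # after warmup: fraction of remaining steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# peak learning rate is reached at the end of warmup
print(linear_warmup_lr(276))  # 5e-05
```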
### Training results

| Training Loss | Epoch   | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-------:|:----:|:---------------:|:--------:|:------:|
| 1.2463        | 0.0337  | 93   | 1.2389          | 0.4344   | 0.2949 |
| 1.024         | 1.0337  | 186  | 1.0343          | 0.5492   | 0.5254 |
| 0.6932        | 2.0337  | 279  | 0.8280          | 0.6639   | 0.6360 |
| 0.4467        | 3.0337  | 372  | 0.7665          | 0.7459   | 0.7338 |
| 0.4449        | 4.0337  | 465  | 0.8715          | 0.7131   | 0.6741 |
| 0.1371        | 5.0337  | 558  | 1.0560          | 0.7295   | 0.7158 |
| 0.1789        | 6.0337  | 651  | 0.8218          | 0.7869   | 0.7877 |
| 0.2125        | 7.0337  | 744  | 0.7612          | 0.7869   | 0.7812 |
| 0.1561        | 8.0337  | 837  | 0.6051          | 0.8525   | 0.8498 |
| 0.2297        | 9.0337  | 930  | 0.6321          | 0.8770   | 0.8766 |
| 0.0692        | 10.0337 | 1023 | 0.7128          | 0.8443   | 0.8455 |
| 0.0495        | 11.0337 | 1116 | 0.7738          | 0.8361   | 0.8353 |
| 0.1059        | 12.0337 | 1209 | 0.6213          | 0.8525   | 0.8524 |
| 0.1672        | 13.0337 | 1302 | 0.7888          | 0.8443   | 0.8409 |
| 0.0178        | 14.0337 | 1395 | 0.6488          | 0.8689   | 0.8658 |
| 0.0165        | 15.0337 | 1488 | 0.6845          | 0.8770   | 0.8773 |
| 0.0166        | 16.0337 | 1581 | 0.8649          | 0.8525   | 0.8445 |
| 0.0014        | 17.0337 | 1674 | 0.7866          | 0.8525   | 0.8516 |
| 0.0473        | 18.0337 | 1767 | 0.6390          | 0.8770   | 0.8776 |
| 0.0441        | 19.0337 | 1860 | 0.8235          | 0.8361   | 0.8342 |
| 0.0006        | 20.0337 | 1953 | 0.6014          | 0.8852   | 0.8856 |
| 0.0005        | 21.0337 | 2046 | 0.7581          | 0.8689   | 0.8672 |
| 0.0032        | 22.0337 | 2139 | 0.6454          | 0.8770   | 0.8772 |
| 0.0565        | 23.0337 | 2232 | 0.8096          | 0.8525   | 0.8542 |
| 0.011         | 24.0337 | 2325 | 0.6807          | 0.8852   | 0.8858 |
| 0.0146        | 25.0337 | 2418 | 0.7754          | 0.8689   | 0.8696 |
| 0.0004        | 26.0337 | 2511 | 0.7246          | 0.8852   | 0.8857 |
| 0.0004        | 27.0337 | 2604 | 0.7165          | 0.8934   | 0.8942 |
| 0.0003        | 28.0337 | 2697 | 0.7232          | 0.9016   | 0.9021 |
| 0.0177        | 29.0228 | 2760 | 0.7259          | 0.9016   | 0.9021 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
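The Accuracy and F1 columns are standard multi-class classification metrics. A minimal pure-Python sketch of how such numbers are computed (assuming the reported F1 is the support-weighted average over classes; the exact averaging mode used by this training script is not stated in the card):

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with class-support weights
    (equivalent to sklearn's f1_score(average='weighted'))."""
    support = Counter(y_true)
    labels = set(y_true) | set(y_pred)
    total = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += f1 * support[c] / len(y_true)
    return total
```

With imbalanced classes, weighted F1 can sit slightly above or below accuracy, which is consistent with the paired values in the table (e.g. 0.9016 accuracy vs 0.9021 F1).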
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:33ea6e5c683d9e75a7c0b01511a1752bf41fdb9a06a45d705e357ccd0b7368e0
+ oid sha256:c7a0d32932f833cdc9432629e3cdd01bd83ad86ece67a76f09ef1c2f71fbad53
  size 344943528
trainer_state.json ADDED
@@ -0,0 +1,2274 @@
{
  "best_metric": 0.90210919178688,
  "best_model_checkpoint": "videomae-surf-analytics-runpod4/checkpoint-2697",
  "epoch": 29.02282608695652,
  "eval_steps": 500,
  "global_step": 2760,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.0036231884057971015,
      "grad_norm": 9.726770401000977,
      "learning_rate": 1.8115942028985508e-06,
      "loss": 1.3861,
      "step": 10
    },
    {
      "epoch": 0.007246376811594203,
      "grad_norm": 8.901580810546875,
      "learning_rate": 3.6231884057971017e-06,
      "loss": 1.3917,
      "step": 20
    },
    {
      "epoch": 0.010869565217391304,
      "grad_norm": 5.0848612785339355,
      "learning_rate": 5.4347826086956525e-06,
      "loss": 1.3179,
      "step": 30
    },
    {
      "epoch": 0.014492753623188406,
      "grad_norm": 8.99917221069336,
      "learning_rate": 7.246376811594203e-06,
      "loss": 1.2512,
      "step": 40
    },
    {
      "epoch": 0.018115942028985508,
      "grad_norm": 11.607219696044922,
      "learning_rate": 9.057971014492753e-06,
      "loss": 1.2267,
      "step": 50
    },
    {
      "epoch": 0.021739130434782608,
      "grad_norm": 6.150553226470947,
      "learning_rate": 1.0869565217391305e-05,
      "loss": 1.2952,
      "step": 60
    },
    {
      "epoch": 0.025362318840579712,
      "grad_norm": 7.3294358253479,
      "learning_rate": 1.2681159420289857e-05,
      "loss": 1.0431,
      "step": 70
    },
    {
      "epoch": 0.028985507246376812,
      "grad_norm": 4.637725830078125,
      "learning_rate": 1.4492753623188407e-05,
      "loss": 1.2039,
      "step": 80
    },
    {
      "epoch": 0.03260869565217391,
      "grad_norm": 7.160071849822998,
      "learning_rate": 1.630434782608696e-05,
      "loss": 1.2463,
      "step": 90
    },
    {
      "epoch": 0.03369565217391304,
      "eval_accuracy": 0.4344262295081967,
      "eval_f1": 0.29487747520534413,
      "eval_loss": 1.2388594150543213,
      "eval_runtime": 42.3507,
      "eval_samples_per_second": 2.881,
      "eval_steps_per_second": 0.378,
      "step": 93
    },
    {
      "epoch": 1.002536231884058,
      "grad_norm": 4.499807834625244,
      "learning_rate": 1.8115942028985507e-05,
      "loss": 1.1553,
      "step": 100
    },
    {
      "epoch": 1.0061594202898552,
      "grad_norm": 5.520775318145752,
      "learning_rate": 1.992753623188406e-05,
      "loss": 1.1873,
      "step": 110
    },
    {
      "epoch": 1.0097826086956523,
      "grad_norm": 7.460188388824463,
      "learning_rate": 2.173913043478261e-05,
      "loss": 1.0111,
      "step": 120
    },
    {
      "epoch": 1.0134057971014492,
      "grad_norm": 7.1730875968933105,
      "learning_rate": 2.355072463768116e-05,
      "loss": 1.0118,
      "step": 130
    },
    {
      "epoch": 1.0170289855072463,
      "grad_norm": 6.370150566101074,
      "learning_rate": 2.5362318840579714e-05,
      "loss": 0.9757,
      "step": 140
    },
    {
      "epoch": 1.0206521739130434,
      "grad_norm": 12.998475074768066,
      "learning_rate": 2.7173913043478262e-05,
      "loss": 0.9456,
      "step": 150
    },
    {
      "epoch": 1.0242753623188405,
      "grad_norm": 19.65399169921875,
      "learning_rate": 2.8985507246376814e-05,
      "loss": 1.0014,
      "step": 160
    },
    {
      "epoch": 1.0278985507246376,
      "grad_norm": 11.76526927947998,
      "learning_rate": 3.079710144927536e-05,
      "loss": 1.0805,
      "step": 170
    },
    {
      "epoch": 1.0315217391304348,
      "grad_norm": 8.409660339355469,
      "learning_rate": 3.260869565217392e-05,
      "loss": 1.024,
      "step": 180
    },
    {
      "epoch": 1.0336956521739131,
      "eval_accuracy": 0.5491803278688525,
      "eval_f1": 0.5254458656098001,
      "eval_loss": 1.0342785120010376,
      "eval_runtime": 42.2598,
      "eval_samples_per_second": 2.887,
      "eval_steps_per_second": 0.379,
      "step": 186
    },
    {
      "epoch": 2.0014492753623188,
      "grad_norm": 8.813424110412598,
      "learning_rate": 3.4420289855072465e-05,
      "loss": 0.9424,
      "step": 190
    },
    {
      "epoch": 2.005072463768116,
      "grad_norm": 18.248958587646484,
      "learning_rate": 3.6231884057971014e-05,
      "loss": 0.8009,
      "step": 200
    },
    {
      "epoch": 2.008695652173913,
      "grad_norm": 9.742342948913574,
      "learning_rate": 3.804347826086957e-05,
      "loss": 0.6643,
      "step": 210
    },
    {
      "epoch": 2.0123188405797103,
      "grad_norm": 10.787181854248047,
      "learning_rate": 3.985507246376812e-05,
      "loss": 1.0492,
      "step": 220
    },
    {
      "epoch": 2.0159420289855072,
      "grad_norm": 10.376043319702148,
      "learning_rate": 4.166666666666667e-05,
      "loss": 0.6274,
      "step": 230
    },
    {
      "epoch": 2.0195652173913046,
      "grad_norm": 8.305957794189453,
      "learning_rate": 4.347826086956522e-05,
      "loss": 0.7027,
      "step": 240
    },
    {
      "epoch": 2.0231884057971015,
      "grad_norm": 13.777029991149902,
      "learning_rate": 4.528985507246377e-05,
      "loss": 0.6191,
      "step": 250
    },
    {
      "epoch": 2.0268115942028984,
      "grad_norm": 10.486043930053711,
      "learning_rate": 4.710144927536232e-05,
      "loss": 0.8711,
      "step": 260
    },
    {
      "epoch": 2.0304347826086957,
      "grad_norm": 5.285229682922363,
      "learning_rate": 4.891304347826087e-05,
      "loss": 0.6932,
      "step": 270
    },
    {
      "epoch": 2.033695652173913,
      "eval_accuracy": 0.6639344262295082,
      "eval_f1": 0.6359864440646367,
      "eval_loss": 0.8279891610145569,
      "eval_runtime": 36.4539,
      "eval_samples_per_second": 3.347,
      "eval_steps_per_second": 0.439,
      "step": 279
    },
    {
      "epoch": 3.0003623188405797,
      "grad_norm": 7.227272033691406,
      "learning_rate": 4.99194847020934e-05,
      "loss": 0.8223,
      "step": 280
    },
    {
      "epoch": 3.003985507246377,
      "grad_norm": 6.091291427612305,
      "learning_rate": 4.9718196457326895e-05,
      "loss": 0.7407,
      "step": 290
    },
    {
      "epoch": 3.007608695652174,
      "grad_norm": 12.283949851989746,
      "learning_rate": 4.9516908212560386e-05,
      "loss": 0.5408,
      "step": 300
    },
    {
      "epoch": 3.011231884057971,
      "grad_norm": 12.24530029296875,
      "learning_rate": 4.9315619967793884e-05,
      "loss": 0.5201,
      "step": 310
    },
    {
      "epoch": 3.014855072463768,
      "grad_norm": 13.709362983703613,
      "learning_rate": 4.911433172302738e-05,
      "loss": 0.5266,
      "step": 320
    },
    {
      "epoch": 3.018478260869565,
      "grad_norm": 8.324898719787598,
      "learning_rate": 4.891304347826087e-05,
      "loss": 0.5397,
      "step": 330
    },
    {
      "epoch": 3.0221014492753624,
      "grad_norm": 13.807954788208008,
      "learning_rate": 4.871175523349436e-05,
      "loss": 0.5297,
      "step": 340
    },
    {
      "epoch": 3.0257246376811593,
      "grad_norm": 12.566025733947754,
      "learning_rate": 4.851046698872786e-05,
      "loss": 0.4034,
      "step": 350
    },
    {
      "epoch": 3.0293478260869566,
      "grad_norm": 15.83183479309082,
      "learning_rate": 4.830917874396135e-05,
      "loss": 0.4654,
      "step": 360
    },
    {
      "epoch": 3.0329710144927535,
      "grad_norm": 1.7397154569625854,
      "learning_rate": 4.810789049919485e-05,
      "loss": 0.4467,
      "step": 370
    },
    {
      "epoch": 3.033695652173913,
      "eval_accuracy": 0.7459016393442623,
      "eval_f1": 0.7338377280393545,
      "eval_loss": 0.7664637565612793,
      "eval_runtime": 36.3739,
      "eval_samples_per_second": 3.354,
      "eval_steps_per_second": 0.44,
      "step": 372
    },
    {
      "epoch": 4.0028985507246375,
      "grad_norm": 2.2322423458099365,
      "learning_rate": 4.790660225442835e-05,
      "loss": 0.3515,
      "step": 380
    },
    {
      "epoch": 4.006521739130434,
      "grad_norm": 17.386234283447266,
      "learning_rate": 4.770531400966184e-05,
      "loss": 0.2997,
      "step": 390
    },
    {
      "epoch": 4.010144927536232,
      "grad_norm": 11.770244598388672,
      "learning_rate": 4.7504025764895335e-05,
      "loss": 0.2963,
      "step": 400
    },
    {
      "epoch": 4.013768115942029,
      "grad_norm": 17.996971130371094,
      "learning_rate": 4.7302737520128826e-05,
      "loss": 0.2062,
      "step": 410
    },
    {
      "epoch": 4.017391304347826,
      "grad_norm": 0.812595009803772,
      "learning_rate": 4.710144927536232e-05,
      "loss": 0.1948,
      "step": 420
    },
    {
      "epoch": 4.021014492753623,
      "grad_norm": 9.69245433807373,
      "learning_rate": 4.6900161030595815e-05,
      "loss": 0.5129,
      "step": 430
    },
    {
      "epoch": 4.024637681159421,
      "grad_norm": 9.770214080810547,
      "learning_rate": 4.669887278582931e-05,
      "loss": 0.3938,
      "step": 440
    },
    {
      "epoch": 4.028260869565218,
      "grad_norm": 16.608795166015625,
      "learning_rate": 4.64975845410628e-05,
      "loss": 0.3251,
      "step": 450
    },
    {
      "epoch": 4.0318840579710145,
      "grad_norm": 14.123062133789062,
      "learning_rate": 4.62962962962963e-05,
      "loss": 0.4449,
      "step": 460
    },
    {
      "epoch": 4.033695652173913,
      "eval_accuracy": 0.7131147540983607,
      "eval_f1": 0.6741469908968126,
      "eval_loss": 0.871498703956604,
      "eval_runtime": 36.9705,
      "eval_samples_per_second": 3.3,
      "eval_steps_per_second": 0.433,
      "step": 465
    },
    {
      "epoch": 5.0018115942028984,
      "grad_norm": 6.473958969116211,
      "learning_rate": 4.609500805152979e-05,
      "loss": 0.1426,
      "step": 470
    },
    {
      "epoch": 5.005434782608695,
      "grad_norm": 0.1059931218624115,
      "learning_rate": 4.589371980676328e-05,
      "loss": 0.1327,
      "step": 480
    },
    {
      "epoch": 5.009057971014493,
      "grad_norm": 0.7832779884338379,
      "learning_rate": 4.569243156199678e-05,
      "loss": 0.1692,
      "step": 490
    },
    {
      "epoch": 5.01268115942029,
      "grad_norm": 12.166410446166992,
      "learning_rate": 4.549114331723028e-05,
      "loss": 0.2135,
      "step": 500
    },
    {
      "epoch": 5.016304347826087,
      "grad_norm": 4.051369667053223,
      "learning_rate": 4.528985507246377e-05,
      "loss": 0.2939,
      "step": 510
    },
    {
      "epoch": 5.019927536231884,
      "grad_norm": 1.8354891538619995,
      "learning_rate": 4.5088566827697266e-05,
      "loss": 0.2449,
      "step": 520
    },
    {
      "epoch": 5.023550724637682,
      "grad_norm": 0.3607825040817261,
      "learning_rate": 4.488727858293076e-05,
      "loss": 0.0841,
      "step": 530
    },
    {
      "epoch": 5.0271739130434785,
      "grad_norm": 7.8428239822387695,
      "learning_rate": 4.4685990338164255e-05,
      "loss": 0.4005,
      "step": 540
    },
    {
      "epoch": 5.030797101449275,
      "grad_norm": 0.2001497596502304,
      "learning_rate": 4.4484702093397746e-05,
      "loss": 0.1371,
      "step": 550
    },
    {
      "epoch": 5.033695652173913,
      "eval_accuracy": 0.7295081967213115,
      "eval_f1": 0.7157775700184823,
      "eval_loss": 1.0560429096221924,
      "eval_runtime": 42.0495,
      "eval_samples_per_second": 2.901,
      "eval_steps_per_second": 0.381,
      "step": 558
    },
    {
      "epoch": 6.000724637681159,
      "grad_norm": 0.06493417173624039,
      "learning_rate": 4.428341384863124e-05,
      "loss": 0.2275,
      "step": 560
    },
    {
      "epoch": 6.004347826086956,
      "grad_norm": 11.751331329345703,
      "learning_rate": 4.408212560386474e-05,
      "loss": 0.2252,
      "step": 570
    },
    {
      "epoch": 6.007971014492754,
      "grad_norm": 19.441425323486328,
      "learning_rate": 4.388083735909823e-05,
      "loss": 0.2761,
      "step": 580
    },
    {
      "epoch": 6.011594202898551,
      "grad_norm": 0.09549690037965775,
      "learning_rate": 4.367954911433172e-05,
      "loss": 0.2724,
      "step": 590
    },
    {
      "epoch": 6.015217391304348,
      "grad_norm": 12.359782218933105,
      "learning_rate": 4.347826086956522e-05,
      "loss": 0.1888,
      "step": 600
    },
    {
      "epoch": 6.018840579710145,
      "grad_norm": 20.348411560058594,
      "learning_rate": 4.327697262479871e-05,
      "loss": 0.4012,
      "step": 610
    },
    {
      "epoch": 6.022463768115942,
      "grad_norm": 0.05766362324357033,
      "learning_rate": 4.307568438003221e-05,
      "loss": 0.1362,
      "step": 620
    },
    {
      "epoch": 6.026086956521739,
      "grad_norm": 18.2602481842041,
      "learning_rate": 4.2874396135265707e-05,
      "loss": 0.1788,
      "step": 630
    },
    {
      "epoch": 6.029710144927536,
      "grad_norm": 0.44831833243370056,
      "learning_rate": 4.26731078904992e-05,
      "loss": 0.1337,
      "step": 640
    },
    {
      "epoch": 6.033333333333333,
      "grad_norm": 13.512539863586426,
      "learning_rate": 4.247181964573269e-05,
      "loss": 0.1789,
      "step": 650
    },
    {
      "epoch": 6.033695652173913,
      "eval_accuracy": 0.7868852459016393,
      "eval_f1": 0.7876811278368901,
      "eval_loss": 0.8218082189559937,
      "eval_runtime": 41.3621,
      "eval_samples_per_second": 2.95,
      "eval_steps_per_second": 0.387,
      "step": 651
    },
    {
      "epoch": 7.003260869565217,
      "grad_norm": 0.31983035802841187,
      "learning_rate": 4.2270531400966186e-05,
      "loss": 0.1123,
      "step": 660
    },
    {
      "epoch": 7.006884057971014,
      "grad_norm": 3.5525496006011963,
      "learning_rate": 4.206924315619968e-05,
      "loss": 0.1231,
      "step": 670
    },
    {
      "epoch": 7.010507246376812,
      "grad_norm": 17.89982795715332,
      "learning_rate": 4.1867954911433174e-05,
      "loss": 0.1998,
      "step": 680
    },
    {
      "epoch": 7.014130434782609,
      "grad_norm": 14.844808578491211,
      "learning_rate": 4.166666666666667e-05,
      "loss": 0.0988,
      "step": 690
    },
    {
      "epoch": 7.017753623188406,
      "grad_norm": 1.0570082664489746,
      "learning_rate": 4.146537842190016e-05,
      "loss": 0.1069,
      "step": 700
    },
    {
      "epoch": 7.021376811594203,
      "grad_norm": 58.255123138427734,
      "learning_rate": 4.126409017713366e-05,
      "loss": 0.2484,
      "step": 710
    },
    {
      "epoch": 7.025,
      "grad_norm": 0.30179139971733093,
      "learning_rate": 4.106280193236715e-05,
      "loss": 0.0665,
      "step": 720
    },
    {
      "epoch": 7.028623188405797,
      "grad_norm": 36.053836822509766,
      "learning_rate": 4.086151368760064e-05,
      "loss": 0.2122,
      "step": 730
    },
    {
      "epoch": 7.032246376811594,
      "grad_norm": 0.23371699452400208,
      "learning_rate": 4.066022544283414e-05,
      "loss": 0.2125,
      "step": 740
    },
    {
      "epoch": 7.033695652173913,
      "eval_accuracy": 0.7868852459016393,
      "eval_f1": 0.7812358925016187,
      "eval_loss": 0.7612058520317078,
      "eval_runtime": 38.7607,
      "eval_samples_per_second": 3.148,
      "eval_steps_per_second": 0.413,
      "step": 744
    },
    {
      "epoch": 8.002173913043478,
      "grad_norm": 0.2972787022590637,
      "learning_rate": 4.045893719806764e-05,
      "loss": 0.0607,
      "step": 750
    },
    {
      "epoch": 8.005797101449275,
      "grad_norm": 0.18866966664791107,
      "learning_rate": 4.025764895330113e-05,
      "loss": 0.1634,
      "step": 760
    },
    {
      "epoch": 8.009420289855072,
      "grad_norm": 0.39748045802116394,
      "learning_rate": 4.0056360708534626e-05,
      "loss": 0.0266,
      "step": 770
    },
    {
      "epoch": 8.013043478260869,
      "grad_norm": 8.73737907409668,
      "learning_rate": 3.985507246376812e-05,
      "loss": 0.0885,
      "step": 780
    },
    {
      "epoch": 8.016666666666667,
      "grad_norm": 0.15234462916851044,
      "learning_rate": 3.965378421900161e-05,
      "loss": 0.0031,
      "step": 790
    },
    {
      "epoch": 8.020289855072464,
      "grad_norm": 12.640290260314941,
      "learning_rate": 3.9452495974235105e-05,
      "loss": 0.2264,
      "step": 800
    },
    {
      "epoch": 8.023913043478261,
      "grad_norm": 0.857113778591156,
      "learning_rate": 3.92512077294686e-05,
      "loss": 0.2147,
      "step": 810
    },
    {
      "epoch": 8.027536231884058,
      "grad_norm": 22.054996490478516,
      "learning_rate": 3.9049919484702094e-05,
      "loss": 0.1488,
      "step": 820
    },
    {
      "epoch": 8.031159420289855,
      "grad_norm": 0.0397706963121891,
      "learning_rate": 3.884863123993559e-05,
      "loss": 0.1561,
      "step": 830
    },
    {
      "epoch": 8.033695652173913,
      "eval_accuracy": 0.8524590163934426,
      "eval_f1": 0.8498439299955695,
      "eval_loss": 0.6051312685012817,
      "eval_runtime": 37.4453,
      "eval_samples_per_second": 3.258,
      "eval_steps_per_second": 0.427,
      "step": 837
    },
    {
      "epoch": 9.001086956521739,
      "grad_norm": 21.391626358032227,
      "learning_rate": 3.864734299516908e-05,
      "loss": 0.0403,
      "step": 840
    },
    {
      "epoch": 9.004710144927536,
      "grad_norm": 29.436418533325195,
      "learning_rate": 3.844605475040258e-05,
      "loss": 0.0919,
      "step": 850
    },
    {
      "epoch": 9.008333333333333,
      "grad_norm": 0.7489465475082397,
      "learning_rate": 3.824476650563607e-05,
      "loss": 0.1226,
      "step": 860
    },
    {
      "epoch": 9.01195652173913,
      "grad_norm": 0.17123177647590637,
      "learning_rate": 3.804347826086957e-05,
      "loss": 0.1048,
      "step": 870
    },
    {
      "epoch": 9.015579710144927,
      "grad_norm": 0.16690145432949066,
      "learning_rate": 3.784219001610306e-05,
      "loss": 0.019,
      "step": 880
    },
    {
      "epoch": 9.019202898550725,
      "grad_norm": 0.029665417969226837,
      "learning_rate": 3.764090177133656e-05,
      "loss": 0.0916,
      "step": 890
    },
    {
      "epoch": 9.022826086956522,
      "grad_norm": 0.02787993662059307,
      "learning_rate": 3.743961352657005e-05,
      "loss": 0.1313,
      "step": 900
    },
    {
      "epoch": 9.02644927536232,
      "grad_norm": 0.020100735127925873,
      "learning_rate": 3.7238325281803546e-05,
      "loss": 0.0665,
      "step": 910
    },
    {
      "epoch": 9.030072463768116,
      "grad_norm": 0.46395331621170044,
      "learning_rate": 3.7037037037037037e-05,
      "loss": 0.1576,
      "step": 920
    },
    {
      "epoch": 9.033695652173913,
      "grad_norm": 15.341463088989258,
      "learning_rate": 3.6835748792270534e-05,
      "loss": 0.2297,
      "step": 930
    },
    {
      "epoch": 9.033695652173913,
      "eval_accuracy": 0.8770491803278688,
      "eval_f1": 0.8766394106755289,
      "eval_loss": 0.6320860385894775,
      "eval_runtime": 36.4362,
      "eval_samples_per_second": 3.348,
      "eval_steps_per_second": 0.439,
      "step": 930
    },
    {
      "epoch": 10.003623188405797,
      "grad_norm": 0.1273241639137268,
      "learning_rate": 3.663446054750403e-05,
      "loss": 0.023,
      "step": 940
    },
    {
      "epoch": 10.007246376811594,
      "grad_norm": 7.185577869415283,
      "learning_rate": 3.643317230273752e-05,
      "loss": 0.0476,
      "step": 950
    },
    {
      "epoch": 10.01086956521739,
      "grad_norm": 0.06820656359195709,
      "learning_rate": 3.6231884057971014e-05,
      "loss": 0.098,
      "step": 960
    },
    {
      "epoch": 10.014492753623188,
      "grad_norm": 3.738891124725342,
      "learning_rate": 3.603059581320451e-05,
      "loss": 0.0214,
      "step": 970
    },
    {
      "epoch": 10.018115942028986,
      "grad_norm": 0.026407288387417793,
      "learning_rate": 3.5829307568438e-05,
      "loss": 0.0211,
      "step": 980
    },
    {
      "epoch": 10.021739130434783,
      "grad_norm": 0.013281815685331821,
      "learning_rate": 3.56280193236715e-05,
      "loss": 0.0539,
      "step": 990
    },
    {
      "epoch": 10.02536231884058,
      "grad_norm": 0.016630737110972404,
      "learning_rate": 3.5426731078905e-05,
      "loss": 0.0013,
      "step": 1000
    },
    {
      "epoch": 10.028985507246377,
      "grad_norm": 0.013293278403580189,
      "learning_rate": 3.522544283413849e-05,
      "loss": 0.0542,
      "step": 1010
    },
    {
      "epoch": 10.032608695652174,
      "grad_norm": 0.021008765324950218,
      "learning_rate": 3.502415458937198e-05,
      "loss": 0.0692,
      "step": 1020
    },
    {
      "epoch": 10.033695652173913,
      "eval_accuracy": 0.8442622950819673,
      "eval_f1": 0.8454522438128995,
      "eval_loss": 0.7127842903137207,
      "eval_runtime": 37.7495,
      "eval_samples_per_second": 3.232,
      "eval_steps_per_second": 0.424,
      "step": 1023
    },
    {
      "epoch": 11.002536231884058,
      "grad_norm": 1.3483757972717285,
      "learning_rate": 3.482286634460548e-05,
      "loss": 0.0038,
      "step": 1030
    },
    {
      "epoch": 11.006159420289855,
      "grad_norm": 3.5823521614074707,
      "learning_rate": 3.462157809983897e-05,
      "loss": 0.0591,
      "step": 1040
    },
    {
      "epoch": 11.009782608695652,
      "grad_norm": 0.010481027886271477,
      "learning_rate": 3.4420289855072465e-05,
      "loss": 0.1033,
      "step": 1050
    },
    {
      "epoch": 11.013405797101449,
      "grad_norm": 0.022550148889422417,
      "learning_rate": 3.421900161030596e-05,
      "loss": 0.073,
      "step": 1060
    },
    {
      "epoch": 11.017028985507247,
      "grad_norm": 0.22509634494781494,
      "learning_rate": 3.4017713365539454e-05,
      "loss": 0.0726,
      "step": 1070
    },
    {
      "epoch": 11.020652173913044,
      "grad_norm": 0.409952849149704,
      "learning_rate": 3.381642512077295e-05,
      "loss": 0.02,
      "step": 1080
    },
    {
      "epoch": 11.024275362318841,
      "grad_norm": 0.022929221391677856,
      "learning_rate": 3.361513687600644e-05,
      "loss": 0.0079,
      "step": 1090
    },
    {
      "epoch": 11.027898550724638,
      "grad_norm": 0.009896630421280861,
      "learning_rate": 3.341384863123993e-05,
      "loss": 0.0277,
      "step": 1100
    },
    {
      "epoch": 11.031521739130435,
      "grad_norm": 0.04593189060688019,
      "learning_rate": 3.321256038647343e-05,
      "loss": 0.0495,
      "step": 1110
    },
    {
      "epoch": 11.033695652173913,
      "eval_accuracy": 0.8360655737704918,
      "eval_f1": 0.8352522796879587,
      "eval_loss": 0.7737651467323303,
      "eval_runtime": 37.723,
      "eval_samples_per_second": 3.234,
      "eval_steps_per_second": 0.424,
      "step": 1116
    },
    {
      "epoch": 12.001449275362319,
      "grad_norm": 0.4977063536643982,
      "learning_rate": 3.301127214170693e-05,
      "loss": 0.068,
      "step": 1120
    },
    {
      "epoch": 12.005072463768116,
      "grad_norm": 9.698254585266113,
      "learning_rate": 3.280998389694042e-05,
      "loss": 0.0108,
      "step": 1130
    },
    {
      "epoch": 12.008695652173913,
      "grad_norm": 37.918338775634766,
      "learning_rate": 3.260869565217392e-05,
      "loss": 0.0108,
      "step": 1140
    },
    {
      "epoch": 12.01231884057971,
      "grad_norm": 0.012104889377951622,
      "learning_rate": 3.240740740740741e-05,
      "loss": 0.0524,
      "step": 1150
    },
    {
      "epoch": 12.015942028985508,
      "grad_norm": 40.3338623046875,
      "learning_rate": 3.22061191626409e-05,
      "loss": 0.0867,
      "step": 1160
    },
    {
      "epoch": 12.019565217391305,
      "grad_norm": 37.17512130737305,
      "learning_rate": 3.2004830917874396e-05,
      "loss": 0.058,
      "step": 1170
    },
    {
      "epoch": 12.023188405797102,
      "grad_norm": 0.11629298329353333,
      "learning_rate": 3.1803542673107894e-05,
      "loss": 0.0826,
      "step": 1180
    },
    {
      "epoch": 12.026811594202899,
      "grad_norm": 0.01565568707883358,
      "learning_rate": 3.1602254428341385e-05,
      "loss": 0.1293,
      "step": 1190
    },
    {
      "epoch": 12.030434782608696,
      "grad_norm": 0.0614808052778244,
      "learning_rate": 3.140096618357488e-05,
      "loss": 0.1059,
      "step": 1200
    },
    {
      "epoch": 12.033695652173913,
      "eval_accuracy": 0.8524590163934426,
      "eval_f1": 0.8524029704357573,
      "eval_loss": 0.6213375329971313,
      "eval_runtime": 37.3424,
977
+ "eval_samples_per_second": 3.267,
978
+ "eval_steps_per_second": 0.428,
979
+ "step": 1209
980
+ },
981
+ {
982
+ "epoch": 13.00036231884058,
983
+ "grad_norm": 0.01347822230309248,
984
+ "learning_rate": 3.119967793880837e-05,
985
+ "loss": 0.002,
986
+ "step": 1210
987
+ },
988
+ {
989
+ "epoch": 13.003985507246377,
990
+ "grad_norm": 0.02086636610329151,
991
+ "learning_rate": 3.099838969404187e-05,
992
+ "loss": 0.0658,
993
+ "step": 1220
994
+ },
995
+ {
996
+ "epoch": 13.007608695652173,
997
+ "grad_norm": 0.18250229954719543,
998
+ "learning_rate": 3.079710144927536e-05,
999
+ "loss": 0.018,
1000
+ "step": 1230
1001
+ },
1002
+ {
1003
+ "epoch": 13.01123188405797,
1004
+ "grad_norm": 0.01864694245159626,
1005
+ "learning_rate": 3.059581320450886e-05,
1006
+ "loss": 0.0016,
1007
+ "step": 1240
1008
+ },
1009
+ {
1010
+ "epoch": 13.014855072463767,
1011
+ "grad_norm": 0.03139445558190346,
1012
+ "learning_rate": 3.0394524959742354e-05,
1013
+ "loss": 0.0011,
1014
+ "step": 1250
1015
+ },
1016
+ {
1017
+ "epoch": 13.018478260869566,
1018
+ "grad_norm": 0.013789031654596329,
1019
+ "learning_rate": 3.0193236714975848e-05,
1020
+ "loss": 0.0016,
1021
+ "step": 1260
1022
+ },
1023
+ {
1024
+ "epoch": 13.022101449275363,
1025
+ "grad_norm": 0.03744020313024521,
1026
+ "learning_rate": 2.9991948470209342e-05,
1027
+ "loss": 0.001,
1028
+ "step": 1270
1029
+ },
1030
+ {
1031
+ "epoch": 13.02572463768116,
1032
+ "grad_norm": 0.007898851297795773,
1033
+ "learning_rate": 2.9790660225442833e-05,
1034
+ "loss": 0.0106,
1035
+ "step": 1280
1036
+ },
1037
+ {
1038
+ "epoch": 13.029347826086957,
1039
+ "grad_norm": 3.897127866744995,
1040
+ "learning_rate": 2.9589371980676327e-05,
1041
+ "loss": 0.0454,
1042
+ "step": 1290
1043
+ },
1044
+ {
1045
+ "epoch": 13.032971014492754,
1046
+ "grad_norm": 0.009393475018441677,
1047
+ "learning_rate": 2.938808373590982e-05,
1048
+ "loss": 0.1672,
1049
+ "step": 1300
1050
+ },
1051
+ {
1052
+ "epoch": 13.033695652173913,
1053
+ "eval_accuracy": 0.8442622950819673,
1054
+ "eval_f1": 0.8408580352564015,
1055
+ "eval_loss": 0.7887758612632751,
1056
+ "eval_runtime": 37.7507,
1057
+ "eval_samples_per_second": 3.232,
1058
+ "eval_steps_per_second": 0.424,
1059
+ "step": 1302
1060
+ },
1061
+ {
1062
+ "epoch": 14.002898550724638,
1063
+ "grad_norm": 0.034189604222774506,
1064
+ "learning_rate": 2.918679549114332e-05,
1065
+ "loss": 0.044,
1066
+ "step": 1310
1067
+ },
1068
+ {
1069
+ "epoch": 14.006521739130434,
1070
+ "grad_norm": 0.02449285238981247,
1071
+ "learning_rate": 2.8985507246376814e-05,
1072
+ "loss": 0.0904,
1073
+ "step": 1320
1074
+ },
1075
+ {
1076
+ "epoch": 14.010144927536231,
1077
+ "grad_norm": 0.027986012399196625,
1078
+ "learning_rate": 2.8784219001610308e-05,
1079
+ "loss": 0.0059,
1080
+ "step": 1330
1081
+ },
1082
+ {
1083
+ "epoch": 14.013768115942028,
1084
+ "grad_norm": 0.017405090853571892,
1085
+ "learning_rate": 2.8582930756843802e-05,
1086
+ "loss": 0.0016,
1087
+ "step": 1340
1088
+ },
1089
+ {
1090
+ "epoch": 14.017391304347827,
1091
+ "grad_norm": 0.009259097278118134,
1092
+ "learning_rate": 2.8381642512077293e-05,
1093
+ "loss": 0.1116,
1094
+ "step": 1350
1095
+ },
1096
+ {
1097
+ "epoch": 14.021014492753624,
1098
+ "grad_norm": 0.27058884501457214,
1099
+ "learning_rate": 2.8180354267310787e-05,
1100
+ "loss": 0.0011,
1101
+ "step": 1360
1102
+ },
1103
+ {
1104
+ "epoch": 14.02463768115942,
1105
+ "grad_norm": 0.04683419317007065,
1106
+ "learning_rate": 2.7979066022544288e-05,
1107
+ "loss": 0.0022,
1108
+ "step": 1370
1109
+ },
1110
+ {
1111
+ "epoch": 14.028260869565218,
1112
+ "grad_norm": 0.011938877403736115,
1113
+ "learning_rate": 2.777777777777778e-05,
1114
+ "loss": 0.092,
1115
+ "step": 1380
1116
+ },
1117
+ {
1118
+ "epoch": 14.031884057971014,
1119
+ "grad_norm": 0.020922021940350533,
1120
+ "learning_rate": 2.7576489533011273e-05,
1121
+ "loss": 0.0178,
1122
+ "step": 1390
1123
+ },
1124
+ {
1125
+ "epoch": 14.033695652173913,
1126
+ "eval_accuracy": 0.8688524590163934,
1127
+ "eval_f1": 0.8657779177289288,
1128
+ "eval_loss": 0.6488391757011414,
1129
+ "eval_runtime": 37.3683,
1130
+ "eval_samples_per_second": 3.265,
1131
+ "eval_steps_per_second": 0.428,
1132
+ "step": 1395
1133
+ },
1134
+ {
1135
+ "epoch": 15.001811594202898,
1136
+ "grad_norm": 0.014293194748461246,
1137
+ "learning_rate": 2.7375201288244768e-05,
1138
+ "loss": 0.0616,
1139
+ "step": 1400
1140
+ },
1141
+ {
1142
+ "epoch": 15.005434782608695,
1143
+ "grad_norm": 0.012990676797926426,
1144
+ "learning_rate": 2.7173913043478262e-05,
1145
+ "loss": 0.0038,
1146
+ "step": 1410
1147
+ },
1148
+ {
1149
+ "epoch": 15.009057971014492,
1150
+ "grad_norm": 0.021259872242808342,
1151
+ "learning_rate": 2.6972624798711753e-05,
1152
+ "loss": 0.0008,
1153
+ "step": 1420
1154
+ },
1155
+ {
1156
+ "epoch": 15.01268115942029,
1157
+ "grad_norm": 0.02497517503798008,
1158
+ "learning_rate": 2.6771336553945254e-05,
1159
+ "loss": 0.0016,
1160
+ "step": 1430
1161
+ },
1162
+ {
1163
+ "epoch": 15.016304347826088,
1164
+ "grad_norm": 7.305609703063965,
1165
+ "learning_rate": 2.6570048309178748e-05,
1166
+ "loss": 0.1195,
1167
+ "step": 1440
1168
+ },
1169
+ {
1170
+ "epoch": 15.019927536231885,
1171
+ "grad_norm": 65.574462890625,
1172
+ "learning_rate": 2.636876006441224e-05,
1173
+ "loss": 0.0579,
1174
+ "step": 1450
1175
+ },
1176
+ {
1177
+ "epoch": 15.023550724637682,
1178
+ "grad_norm": 0.010529917664825916,
1179
+ "learning_rate": 2.6167471819645733e-05,
1180
+ "loss": 0.0296,
1181
+ "step": 1460
1182
+ },
1183
+ {
1184
+ "epoch": 15.027173913043478,
1185
+ "grad_norm": 0.010935317724943161,
1186
+ "learning_rate": 2.5966183574879227e-05,
1187
+ "loss": 0.0616,
1188
+ "step": 1470
1189
+ },
1190
+ {
1191
+ "epoch": 15.030797101449275,
1192
+ "grad_norm": 0.03048408031463623,
1193
+ "learning_rate": 2.576489533011272e-05,
1194
+ "loss": 0.0165,
1195
+ "step": 1480
1196
+ },
1197
+ {
1198
+ "epoch": 15.033695652173913,
1199
+ "eval_accuracy": 0.8770491803278688,
1200
+ "eval_f1": 0.8772784869363629,
1201
+ "eval_loss": 0.68454909324646,
1202
+ "eval_runtime": 38.539,
1203
+ "eval_samples_per_second": 3.166,
1204
+ "eval_steps_per_second": 0.415,
1205
+ "step": 1488
1206
+ },
1207
+ {
1208
+ "epoch": 16.00072463768116,
1209
+ "grad_norm": 0.026206759735941887,
1210
+ "learning_rate": 2.556360708534622e-05,
1211
+ "loss": 0.0027,
1212
+ "step": 1490
1213
+ },
1214
+ {
1215
+ "epoch": 16.004347826086956,
1216
+ "grad_norm": 0.17636388540267944,
1217
+ "learning_rate": 2.5362318840579714e-05,
1218
+ "loss": 0.0013,
1219
+ "step": 1500
1220
+ },
1221
+ {
1222
+ "epoch": 16.007971014492753,
1223
+ "grad_norm": 0.009907973930239677,
1224
+ "learning_rate": 2.5161030595813208e-05,
1225
+ "loss": 0.0008,
1226
+ "step": 1510
1227
+ },
1228
+ {
1229
+ "epoch": 16.01159420289855,
1230
+ "grad_norm": 0.007697808090597391,
1231
+ "learning_rate": 2.49597423510467e-05,
1232
+ "loss": 0.0012,
1233
+ "step": 1520
1234
+ },
1235
+ {
1236
+ "epoch": 16.015217391304347,
1237
+ "grad_norm": 0.012929815798997879,
1238
+ "learning_rate": 2.4758454106280193e-05,
1239
+ "loss": 0.0008,
1240
+ "step": 1530
1241
+ },
1242
+ {
1243
+ "epoch": 16.018840579710144,
1244
+ "grad_norm": 0.03589075058698654,
1245
+ "learning_rate": 2.455716586151369e-05,
1246
+ "loss": 0.0006,
1247
+ "step": 1540
1248
+ },
1249
+ {
1250
+ "epoch": 16.02246376811594,
1251
+ "grad_norm": 0.012779805809259415,
1252
+ "learning_rate": 2.435587761674718e-05,
1253
+ "loss": 0.1136,
1254
+ "step": 1550
1255
+ },
1256
+ {
1257
+ "epoch": 16.026086956521738,
1258
+ "grad_norm": 51.02324676513672,
1259
+ "learning_rate": 2.4154589371980676e-05,
1260
+ "loss": 0.1051,
1261
+ "step": 1560
1262
+ },
1263
+ {
1264
+ "epoch": 16.029710144927535,
1265
+ "grad_norm": 0.01224570907652378,
1266
+ "learning_rate": 2.3953301127214173e-05,
1267
+ "loss": 0.0355,
1268
+ "step": 1570
1269
+ },
1270
+ {
1271
+ "epoch": 16.033333333333335,
1272
+ "grad_norm": 0.1042768731713295,
1273
+ "learning_rate": 2.3752012882447668e-05,
1274
+ "loss": 0.0166,
1275
+ "step": 1580
1276
+ },
1277
+ {
1278
+ "epoch": 16.033695652173915,
1279
+ "eval_accuracy": 0.8524590163934426,
1280
+ "eval_f1": 0.8445057204972595,
1281
+ "eval_loss": 0.8649476766586304,
1282
+ "eval_runtime": 42.3787,
1283
+ "eval_samples_per_second": 2.879,
1284
+ "eval_steps_per_second": 0.378,
1285
+ "step": 1581
1286
+ },
1287
+ {
1288
+ "epoch": 17.003260869565217,
1289
+ "grad_norm": 0.007085030898451805,
1290
+ "learning_rate": 2.355072463768116e-05,
1291
+ "loss": 0.0251,
1292
+ "step": 1590
1293
+ },
1294
+ {
1295
+ "epoch": 17.006884057971014,
1296
+ "grad_norm": 0.015758316963911057,
1297
+ "learning_rate": 2.3349436392914656e-05,
1298
+ "loss": 0.0598,
1299
+ "step": 1600
1300
+ },
1301
+ {
1302
+ "epoch": 17.01050724637681,
1303
+ "grad_norm": 0.0069606369361281395,
1304
+ "learning_rate": 2.314814814814815e-05,
1305
+ "loss": 0.004,
1306
+ "step": 1610
1307
+ },
1308
+ {
1309
+ "epoch": 17.014130434782608,
1310
+ "grad_norm": 0.008627723902463913,
1311
+ "learning_rate": 2.294685990338164e-05,
1312
+ "loss": 0.0065,
1313
+ "step": 1620
1314
+ },
1315
+ {
1316
+ "epoch": 17.017753623188405,
1317
+ "grad_norm": 0.009509687311947346,
1318
+ "learning_rate": 2.274557165861514e-05,
1319
+ "loss": 0.0005,
1320
+ "step": 1630
1321
+ },
1322
+ {
1323
+ "epoch": 17.0213768115942,
1324
+ "grad_norm": 0.014063004404306412,
1325
+ "learning_rate": 2.2544283413848633e-05,
1326
+ "loss": 0.0011,
1327
+ "step": 1640
1328
+ },
1329
+ {
1330
+ "epoch": 17.025,
1331
+ "grad_norm": 14.068120956420898,
1332
+ "learning_rate": 2.2342995169082127e-05,
1333
+ "loss": 0.0636,
1334
+ "step": 1650
1335
+ },
1336
+ {
1337
+ "epoch": 17.028623188405795,
1338
+ "grad_norm": 0.08216149359941483,
1339
+ "learning_rate": 2.214170692431562e-05,
1340
+ "loss": 0.1043,
1341
+ "step": 1660
1342
+ },
1343
+ {
1344
+ "epoch": 17.032246376811596,
1345
+ "grad_norm": 0.14345373213291168,
1346
+ "learning_rate": 2.1940418679549116e-05,
1347
+ "loss": 0.0014,
1348
+ "step": 1670
1349
+ },
1350
+ {
1351
+ "epoch": 17.033695652173915,
1352
+ "eval_accuracy": 0.8524590163934426,
1353
+ "eval_f1": 0.8516181034534225,
1354
+ "eval_loss": 0.7865917086601257,
1355
+ "eval_runtime": 40.0693,
1356
+ "eval_samples_per_second": 3.045,
1357
+ "eval_steps_per_second": 0.399,
1358
+ "step": 1674
1359
+ },
1360
+ {
1361
+ "epoch": 18.002173913043478,
1362
+ "grad_norm": 0.006833807565271854,
1363
+ "learning_rate": 2.173913043478261e-05,
1364
+ "loss": 0.0039,
1365
+ "step": 1680
1366
+ },
1367
+ {
1368
+ "epoch": 18.005797101449275,
1369
+ "grad_norm": 0.014425553381443024,
1370
+ "learning_rate": 2.1537842190016104e-05,
1371
+ "loss": 0.022,
1372
+ "step": 1690
1373
+ },
1374
+ {
1375
+ "epoch": 18.009420289855072,
1376
+ "grad_norm": 0.05315766856074333,
1377
+ "learning_rate": 2.13365539452496e-05,
1378
+ "loss": 0.0055,
1379
+ "step": 1700
1380
+ },
1381
+ {
1382
+ "epoch": 18.01304347826087,
1383
+ "grad_norm": 0.006013238336890936,
1384
+ "learning_rate": 2.1135265700483093e-05,
1385
+ "loss": 0.0725,
1386
+ "step": 1710
1387
+ },
1388
+ {
1389
+ "epoch": 18.016666666666666,
1390
+ "grad_norm": 0.009230862371623516,
1391
+ "learning_rate": 2.0933977455716587e-05,
1392
+ "loss": 0.0005,
1393
+ "step": 1720
1394
+ },
1395
+ {
1396
+ "epoch": 18.020289855072463,
1397
+ "grad_norm": 0.007369612343609333,
1398
+ "learning_rate": 2.073268921095008e-05,
1399
+ "loss": 0.0005,
1400
+ "step": 1730
1401
+ },
1402
+ {
1403
+ "epoch": 18.02391304347826,
1404
+ "grad_norm": 0.005340999457985163,
1405
+ "learning_rate": 2.0531400966183576e-05,
1406
+ "loss": 0.0043,
1407
+ "step": 1740
1408
+ },
1409
+ {
1410
+ "epoch": 18.027536231884056,
1411
+ "grad_norm": 5.2305498123168945,
1412
+ "learning_rate": 2.033011272141707e-05,
1413
+ "loss": 0.0677,
1414
+ "step": 1750
1415
+ },
1416
+ {
1417
+ "epoch": 18.031159420289853,
1418
+ "grad_norm": 0.008544102311134338,
1419
+ "learning_rate": 2.0128824476650564e-05,
1420
+ "loss": 0.0473,
1421
+ "step": 1760
1422
+ },
1423
+ {
1424
+ "epoch": 18.033695652173915,
1425
+ "eval_accuracy": 0.8770491803278688,
1426
+ "eval_f1": 0.8776382319776725,
1427
+ "eval_loss": 0.6390149593353271,
1428
+ "eval_runtime": 38.4182,
1429
+ "eval_samples_per_second": 3.176,
1430
+ "eval_steps_per_second": 0.416,
1431
+ "step": 1767
1432
+ },
1433
+ {
1434
+ "epoch": 19.00108695652174,
1435
+ "grad_norm": 0.007366463541984558,
1436
+ "learning_rate": 1.992753623188406e-05,
1437
+ "loss": 0.0005,
1438
+ "step": 1770
1439
+ },
1440
+ {
1441
+ "epoch": 19.004710144927536,
1442
+ "grad_norm": 2.3047983646392822,
1443
+ "learning_rate": 1.9726247987117553e-05,
1444
+ "loss": 0.0185,
1445
+ "step": 1780
1446
+ },
1447
+ {
1448
+ "epoch": 19.008333333333333,
1449
+ "grad_norm": 0.005975569132715464,
1450
+ "learning_rate": 1.9524959742351047e-05,
1451
+ "loss": 0.0007,
1452
+ "step": 1790
1453
+ },
1454
+ {
1455
+ "epoch": 19.01195652173913,
1456
+ "grad_norm": 0.007045724429190159,
1457
+ "learning_rate": 1.932367149758454e-05,
1458
+ "loss": 0.0008,
1459
+ "step": 1800
1460
+ },
1461
+ {
1462
+ "epoch": 19.015579710144927,
1463
+ "grad_norm": 0.005632986314594746,
1464
+ "learning_rate": 1.9122383252818036e-05,
1465
+ "loss": 0.001,
1466
+ "step": 1810
1467
+ },
1468
+ {
1469
+ "epoch": 19.019202898550724,
1470
+ "grad_norm": 0.0067955972626805305,
1471
+ "learning_rate": 1.892109500805153e-05,
1472
+ "loss": 0.0006,
1473
+ "step": 1820
1474
+ },
1475
+ {
1476
+ "epoch": 19.02282608695652,
1477
+ "grad_norm": 0.0061930399388074875,
1478
+ "learning_rate": 1.8719806763285024e-05,
1479
+ "loss": 0.0154,
1480
+ "step": 1830
1481
+ },
1482
+ {
1483
+ "epoch": 19.026449275362317,
1484
+ "grad_norm": 0.005985133349895477,
1485
+ "learning_rate": 1.8518518518518518e-05,
1486
+ "loss": 0.0029,
1487
+ "step": 1840
1488
+ },
1489
+ {
1490
+ "epoch": 19.030072463768114,
1491
+ "grad_norm": 0.005963871255517006,
1492
+ "learning_rate": 1.8317230273752016e-05,
1493
+ "loss": 0.0407,
1494
+ "step": 1850
1495
+ },
1496
+ {
1497
+ "epoch": 19.033695652173915,
1498
+ "grad_norm": 0.004762581083923578,
1499
+ "learning_rate": 1.8115942028985507e-05,
1500
+ "loss": 0.0441,
1501
+ "step": 1860
1502
+ },
1503
+ {
1504
+ "epoch": 19.033695652173915,
1505
+ "eval_accuracy": 0.8360655737704918,
1506
+ "eval_f1": 0.834206674473068,
1507
+ "eval_loss": 0.8234833478927612,
1508
+ "eval_runtime": 37.3263,
1509
+ "eval_samples_per_second": 3.268,
1510
+ "eval_steps_per_second": 0.429,
1511
+ "step": 1860
1512
+ },
1513
+ {
1514
+ "epoch": 20.003623188405797,
1515
+ "grad_norm": 0.12766918540000916,
1516
+ "learning_rate": 1.7914653784219e-05,
1517
+ "loss": 0.0067,
1518
+ "step": 1870
1519
+ },
1520
+ {
1521
+ "epoch": 20.007246376811594,
1522
+ "grad_norm": 0.031153714284300804,
1523
+ "learning_rate": 1.77133655394525e-05,
1524
+ "loss": 0.0007,
1525
+ "step": 1880
1526
+ },
1527
+ {
1528
+ "epoch": 20.01086956521739,
1529
+ "grad_norm": 0.0049373493529856205,
1530
+ "learning_rate": 1.751207729468599e-05,
1531
+ "loss": 0.0148,
1532
+ "step": 1890
1533
+ },
1534
+ {
1535
+ "epoch": 20.014492753623188,
1536
+ "grad_norm": 0.04706621542572975,
1537
+ "learning_rate": 1.7310789049919484e-05,
1538
+ "loss": 0.0516,
1539
+ "step": 1900
1540
+ },
1541
+ {
1542
+ "epoch": 20.018115942028984,
1543
+ "grad_norm": 0.011202222667634487,
1544
+ "learning_rate": 1.710950080515298e-05,
1545
+ "loss": 0.0073,
1546
+ "step": 1910
1547
+ },
1548
+ {
1549
+ "epoch": 20.02173913043478,
1550
+ "grad_norm": 0.004066763911396265,
1551
+ "learning_rate": 1.6908212560386476e-05,
1552
+ "loss": 0.0065,
1553
+ "step": 1920
1554
+ },
1555
+ {
1556
+ "epoch": 20.02536231884058,
1557
+ "grad_norm": 0.5498021245002747,
1558
+ "learning_rate": 1.6706924315619967e-05,
1559
+ "loss": 0.0549,
1560
+ "step": 1930
1561
+ },
1562
+ {
1563
+ "epoch": 20.028985507246375,
1564
+ "grad_norm": 0.011782179586589336,
1565
+ "learning_rate": 1.6505636070853464e-05,
1566
+ "loss": 0.0009,
1567
+ "step": 1940
1568
+ },
1569
+ {
1570
+ "epoch": 20.032608695652176,
1571
+ "grad_norm": 0.0048751975409686565,
1572
+ "learning_rate": 1.630434782608696e-05,
1573
+ "loss": 0.0006,
1574
+ "step": 1950
1575
+ },
1576
+ {
1577
+ "epoch": 20.033695652173915,
1578
+ "eval_accuracy": 0.8852459016393442,
1579
+ "eval_f1": 0.88558791751199,
1580
+ "eval_loss": 0.6014039516448975,
1581
+ "eval_runtime": 37.9636,
1582
+ "eval_samples_per_second": 3.214,
1583
+ "eval_steps_per_second": 0.421,
1584
+ "step": 1953
1585
+ },
1586
+ {
1587
+ "epoch": 21.002536231884058,
1588
+ "grad_norm": 0.009569565765559673,
1589
+ "learning_rate": 1.610305958132045e-05,
1590
+ "loss": 0.0004,
1591
+ "step": 1960
1592
+ },
1593
+ {
1594
+ "epoch": 21.006159420289855,
1595
+ "grad_norm": 0.0038696015253663063,
1596
+ "learning_rate": 1.5901771336553947e-05,
1597
+ "loss": 0.0005,
1598
+ "step": 1970
1599
+ },
1600
+ {
1601
+ "epoch": 21.00978260869565,
1602
+ "grad_norm": 0.012325268238782883,
1603
+ "learning_rate": 1.570048309178744e-05,
1604
+ "loss": 0.0248,
1605
+ "step": 1980
1606
+ },
1607
+ {
1608
+ "epoch": 21.01340579710145,
1609
+ "grad_norm": 0.927451491355896,
1610
+ "learning_rate": 1.5499194847020936e-05,
1611
+ "loss": 0.0384,
1612
+ "step": 1990
1613
+ },
1614
+ {
1615
+ "epoch": 21.017028985507245,
1616
+ "grad_norm": 0.005916436202824116,
1617
+ "learning_rate": 1.529790660225443e-05,
1618
+ "loss": 0.0004,
1619
+ "step": 2000
1620
+ },
1621
+ {
1622
+ "epoch": 21.020652173913042,
1623
+ "grad_norm": 0.00484825111925602,
1624
+ "learning_rate": 1.5096618357487924e-05,
1625
+ "loss": 0.0028,
1626
+ "step": 2010
1627
+ },
1628
+ {
1629
+ "epoch": 21.02427536231884,
1630
+ "grad_norm": 0.05984441190958023,
1631
+ "learning_rate": 1.4895330112721417e-05,
1632
+ "loss": 0.0635,
1633
+ "step": 2020
1634
+ },
1635
+ {
1636
+ "epoch": 21.027898550724636,
1637
+ "grad_norm": 0.006016437895596027,
1638
+ "learning_rate": 1.469404186795491e-05,
1639
+ "loss": 0.0005,
1640
+ "step": 2030
1641
+ },
1642
+ {
1643
+ "epoch": 21.031521739130437,
1644
+ "grad_norm": 0.006119464058429003,
1645
+ "learning_rate": 1.4492753623188407e-05,
1646
+ "loss": 0.0005,
1647
+ "step": 2040
1648
+ },
1649
+ {
1650
+ "epoch": 21.033695652173915,
1651
+ "eval_accuracy": 0.8688524590163934,
1652
+ "eval_f1": 0.8672487891684085,
1653
+ "eval_loss": 0.758141815662384,
1654
+ "eval_runtime": 38.1916,
1655
+ "eval_samples_per_second": 3.194,
1656
+ "eval_steps_per_second": 0.419,
1657
+ "step": 2046
1658
+ },
1659
+ {
1660
+ "epoch": 22.00144927536232,
1661
+ "grad_norm": 0.008889817632734776,
1662
+ "learning_rate": 1.4291465378421901e-05,
1663
+ "loss": 0.0009,
1664
+ "step": 2050
1665
+ },
1666
+ {
1667
+ "epoch": 22.005072463768116,
1668
+ "grad_norm": 0.004777275491505861,
1669
+ "learning_rate": 1.4090177133655394e-05,
1670
+ "loss": 0.0159,
1671
+ "step": 2060
1672
+ },
1673
+ {
1674
+ "epoch": 22.008695652173913,
1675
+ "grad_norm": 0.005355835892260075,
1676
+ "learning_rate": 1.388888888888889e-05,
1677
+ "loss": 0.0066,
1678
+ "step": 2070
1679
+ },
1680
+ {
1681
+ "epoch": 22.01231884057971,
1682
+ "grad_norm": 0.006906128488481045,
1683
+ "learning_rate": 1.3687600644122384e-05,
1684
+ "loss": 0.0005,
1685
+ "step": 2080
1686
+ },
1687
+ {
1688
+ "epoch": 22.015942028985506,
1689
+ "grad_norm": 0.0049617839977145195,
1690
+ "learning_rate": 1.3486312399355876e-05,
1691
+ "loss": 0.0007,
1692
+ "step": 2090
1693
+ },
1694
+ {
1695
+ "epoch": 22.019565217391303,
1696
+ "grad_norm": 0.004387282766401768,
1697
+ "learning_rate": 1.3285024154589374e-05,
1698
+ "loss": 0.0004,
1699
+ "step": 2100
1700
+ },
1701
+ {
1702
+ "epoch": 22.0231884057971,
1703
+ "grad_norm": 0.004814252723008394,
1704
+ "learning_rate": 1.3083735909822867e-05,
1705
+ "loss": 0.0041,
1706
+ "step": 2110
1707
+ },
1708
+ {
1709
+ "epoch": 22.026811594202897,
1710
+ "grad_norm": 0.007489080540835857,
1711
+ "learning_rate": 1.288244766505636e-05,
1712
+ "loss": 0.0442,
1713
+ "step": 2120
1714
+ },
1715
+ {
1716
+ "epoch": 22.030434782608694,
1717
+ "grad_norm": 0.00951230525970459,
1718
+ "learning_rate": 1.2681159420289857e-05,
1719
+ "loss": 0.0032,
1720
+ "step": 2130
1721
+ },
1722
+ {
1723
+ "epoch": 22.033695652173915,
1724
+ "eval_accuracy": 0.8770491803278688,
1725
+ "eval_f1": 0.8771730619791396,
1726
+ "eval_loss": 0.6454241871833801,
1727
+ "eval_runtime": 37.8713,
1728
+ "eval_samples_per_second": 3.221,
1729
+ "eval_steps_per_second": 0.422,
1730
+ "step": 2139
1731
+ },
1732
+ {
1733
+ "epoch": 23.00036231884058,
1734
+ "grad_norm": 0.005327664315700531,
1735
+ "learning_rate": 1.247987117552335e-05,
1736
+ "loss": 0.027,
1737
+ "step": 2140
1738
+ },
1739
+ {
1740
+ "epoch": 23.003985507246377,
1741
+ "grad_norm": 0.006378709804266691,
1742
+ "learning_rate": 1.2278582930756845e-05,
1743
+ "loss": 0.0005,
1744
+ "step": 2150
1745
+ },
1746
+ {
1747
+ "epoch": 23.007608695652173,
1748
+ "grad_norm": 0.004449001979082823,
1749
+ "learning_rate": 1.2077294685990338e-05,
1750
+ "loss": 0.0166,
1751
+ "step": 2160
1752
+ },
1753
+ {
1754
+ "epoch": 23.01123188405797,
1755
+ "grad_norm": 0.011220560409128666,
1756
+ "learning_rate": 1.1876006441223834e-05,
1757
+ "loss": 0.013,
1758
+ "step": 2170
1759
+ },
1760
+ {
1761
+ "epoch": 23.014855072463767,
1762
+ "grad_norm": 0.007521830964833498,
1763
+ "learning_rate": 1.1674718196457328e-05,
1764
+ "loss": 0.0004,
1765
+ "step": 2180
1766
+ },
1767
+ {
1768
+ "epoch": 23.018478260869564,
1769
+ "grad_norm": 0.06200418993830681,
1770
+ "learning_rate": 1.147342995169082e-05,
1771
+ "loss": 0.0204,
1772
+ "step": 2190
1773
+ },
1774
+ {
1775
+ "epoch": 23.02210144927536,
1776
+ "grad_norm": 0.005162249319255352,
1777
+ "learning_rate": 1.1272141706924317e-05,
1778
+ "loss": 0.0005,
1779
+ "step": 2200
1780
+ },
1781
+ {
1782
+ "epoch": 23.025724637681158,
1783
+ "grad_norm": 0.004768849816173315,
1784
+ "learning_rate": 1.107085346215781e-05,
1785
+ "loss": 0.0963,
1786
+ "step": 2210
1787
+ },
1788
+ {
1789
+ "epoch": 23.029347826086955,
1790
+ "grad_norm": 0.006896668113768101,
1791
+ "learning_rate": 1.0869565217391305e-05,
1792
+ "loss": 0.0209,
1793
+ "step": 2220
1794
+ },
1795
+ {
1796
+ "epoch": 23.032971014492755,
1797
+ "grad_norm": 0.5598499178886414,
1798
+ "learning_rate": 1.06682769726248e-05,
1799
+ "loss": 0.0565,
1800
+ "step": 2230
1801
+ },
1802
+ {
1803
+ "epoch": 23.033695652173915,
1804
+ "eval_accuracy": 0.8524590163934426,
1805
+ "eval_f1": 0.8542214345493033,
1806
+ "eval_loss": 0.8096156120300293,
1807
+ "eval_runtime": 39.1323,
1808
+ "eval_samples_per_second": 3.118,
1809
+ "eval_steps_per_second": 0.409,
1810
+ "step": 2232
1811
+ },
1812
+ {
1813
+ "epoch": 24.002898550724638,
1814
+ "grad_norm": 0.017325986176729202,
1815
+ "learning_rate": 1.0466988727858294e-05,
1816
+ "loss": 0.0005,
1817
+ "step": 2240
1818
+ },
1819
+ {
1820
+ "epoch": 24.006521739130434,
1821
+ "grad_norm": 0.2567996382713318,
1822
+ "learning_rate": 1.0265700483091788e-05,
1823
+ "loss": 0.011,
1824
+ "step": 2250
1825
+ },
1826
+ {
1827
+ "epoch": 24.01014492753623,
1828
+ "grad_norm": 0.008284560404717922,
1829
+ "learning_rate": 1.0064412238325282e-05,
1830
+ "loss": 0.0105,
1831
+ "step": 2260
1832
+ },
1833
+ {
1834
+ "epoch": 24.013768115942028,
1835
+ "grad_norm": 0.021722067147493362,
1836
+ "learning_rate": 9.863123993558776e-06,
1837
+ "loss": 0.0129,
1838
+ "step": 2270
1839
+ },
1840
+ {
1841
+ "epoch": 24.017391304347825,
1842
+ "grad_norm": 0.01306977029889822,
1843
+ "learning_rate": 9.66183574879227e-06,
1844
+ "loss": 0.093,
1845
+ "step": 2280
1846
+ },
1847
+ {
1848
+ "epoch": 24.021014492753622,
1849
+ "grad_norm": 0.006157809402793646,
1850
+ "learning_rate": 9.460547504025765e-06,
1851
+ "loss": 0.0005,
1852
+ "step": 2290
1853
+ },
1854
+ {
1855
+ "epoch": 24.02463768115942,
1856
+ "grad_norm": 0.006750662811100483,
1857
+ "learning_rate": 9.259259259259259e-06,
1858
+ "loss": 0.0004,
1859
+ "step": 2300
1860
+ },
1861
+ {
1862
+ "epoch": 24.028260869565216,
1863
+ "grad_norm": 0.004828326869755983,
1864
+ "learning_rate": 9.057971014492753e-06,
1865
+ "loss": 0.0095,
1866
+ "step": 2310
1867
+ },
1868
+ {
1869
+ "epoch": 24.031884057971016,
1870
+ "grad_norm": 1.7074612379074097,
1871
+ "learning_rate": 8.85668276972625e-06,
1872
+ "loss": 0.011,
1873
+ "step": 2320
1874
+ },
1875
+ {
1876
+ "epoch": 24.033695652173915,
1877
+ "eval_accuracy": 0.8852459016393442,
1878
+ "eval_f1": 0.8858382568953512,
1879
+ "eval_loss": 0.6807242631912231,
1880
+ "eval_runtime": 41.3656,
1881
+ "eval_samples_per_second": 2.949,
1882
+ "eval_steps_per_second": 0.387,
1883
+ "step": 2325
1884
+ },
1885
+ {
1886
+ "epoch": 25.0018115942029,
1887
+ "grad_norm": 0.0043255784548819065,
1888
+ "learning_rate": 8.655394524959742e-06,
1889
+ "loss": 0.0022,
1890
+ "step": 2330
1891
+ },
1892
+ {
1893
+ "epoch": 25.005434782608695,
1894
+ "grad_norm": 0.007067692466080189,
1895
+ "learning_rate": 8.454106280193238e-06,
1896
+ "loss": 0.0073,
1897
+ "step": 2340
1898
+ },
1899
+ {
1900
+ "epoch": 25.009057971014492,
1901
+ "grad_norm": 1.005963683128357,
1902
+ "learning_rate": 8.252818035426732e-06,
1903
+ "loss": 0.0011,
1904
+ "step": 2350
1905
+ },
1906
+ {
1907
+ "epoch": 25.01268115942029,
1908
+ "grad_norm": 0.004243878182023764,
1909
+ "learning_rate": 8.051529790660225e-06,
1910
+ "loss": 0.0004,
1911
+ "step": 2360
1912
+ },
1913
+ {
1914
+ "epoch": 25.016304347826086,
1915
+ "grad_norm": 0.0035646618343889713,
1916
+ "learning_rate": 7.85024154589372e-06,
1917
+ "loss": 0.0004,
1918
+ "step": 2370
1919
+ },
1920
+ {
1921
+ "epoch": 25.019927536231883,
1922
+ "grad_norm": 0.004057039972394705,
1923
+ "learning_rate": 7.648953301127215e-06,
1924
+ "loss": 0.0004,
1925
+ "step": 2380
1926
+ },
1927
+ {
1928
+ "epoch": 25.02355072463768,
1929
+ "grad_norm": 0.010123531334102154,
1930
+ "learning_rate": 7.447665056360708e-06,
1931
+ "loss": 0.0025,
1932
+ "step": 2390
1933
+ },
1934
+ {
1935
+ "epoch": 25.027173913043477,
1936
+ "grad_norm": 1.058449387550354,
1937
+ "learning_rate": 7.246376811594203e-06,
1938
+ "loss": 0.0228,
1939
+ "step": 2400
1940
+ },
1941
+ {
1942
+ "epoch": 25.030797101449274,
1943
+ "grad_norm": 2.1814162731170654,
1944
+ "learning_rate": 7.045088566827697e-06,
1945
+ "loss": 0.0146,
1946
+ "step": 2410
1947
+ },
1948
+ {
1949
+ "epoch": 25.033695652173915,
1950
+ "eval_accuracy": 0.8688524590163934,
1951
+ "eval_f1": 0.8695984040246334,
1952
+ "eval_loss": 0.7754350900650024,
1953
+ "eval_runtime": 41.8119,
1954
+ "eval_samples_per_second": 2.918,
1955
+ "eval_steps_per_second": 0.383,
1956
+ "step": 2418
1957
+ },
1958
+ {
1959
+ "epoch": 26.00072463768116,
1960
+ "grad_norm": 0.00446619326248765,
1961
+ "learning_rate": 6.843800322061192e-06,
1962
+ "loss": 0.0469,
1963
+ "step": 2420
1964
+ },
1965
+ {
1966
+ "epoch": 26.004347826086956,
1967
+ "grad_norm": 0.005227618385106325,
1968
+ "learning_rate": 6.642512077294687e-06,
1969
+ "loss": 0.0105,
1970
+ "step": 2430
1971
+ },
1972
+ {
1973
+ "epoch": 26.007971014492753,
1974
+ "grad_norm": 2.0163729190826416,
1975
+ "learning_rate": 6.44122383252818e-06,
1976
+ "loss": 0.0116,
1977
+ "step": 2440
1978
+ },
1979
+ {
1980
+ "epoch": 26.01159420289855,
1981
+ "grad_norm": 0.0069722458720207214,
1982
+ "learning_rate": 6.239935587761675e-06,
1983
+ "loss": 0.0079,
1984
+ "step": 2450
1985
+ },
1986
+ {
1987
+ "epoch": 26.015217391304347,
1988
+ "grad_norm": 0.006805983372032642,
1989
+ "learning_rate": 6.038647342995169e-06,
1990
+ "loss": 0.0004,
1991
+ "step": 2460
1992
+ },
1993
+ {
1994
+ "epoch": 26.018840579710144,
1995
+ "grad_norm": 0.005044913850724697,
1996
+ "learning_rate": 5.837359098228664e-06,
1997
+ "loss": 0.0036,
1998
+ "step": 2470
1999
+ },
2000
+ {
2001
+ "epoch": 26.02246376811594,
2002
+ "grad_norm": 0.005564519669860601,
2003
+ "learning_rate": 5.636070853462158e-06,
2004
+ "loss": 0.0003,
2005
+ "step": 2480
2006
+ },
2007
+ {
2008
+       "epoch": 26.026086956521738,
+       "grad_norm": 0.003706958144903183,
+       "learning_rate": 5.4347826086956525e-06,
+       "loss": 0.0004,
+       "step": 2490
+     },
+     {
+       "epoch": 26.029710144927535,
+       "grad_norm": 0.02714346908032894,
+       "learning_rate": 5.233494363929147e-06,
+       "loss": 0.0191,
+       "step": 2500
+     },
+     {
+       "epoch": 26.033333333333335,
+       "grad_norm": 0.008597085252404213,
+       "learning_rate": 5.032206119162641e-06,
+       "loss": 0.0004,
+       "step": 2510
+     },
+     {
+       "epoch": 26.033695652173915,
+       "eval_accuracy": 0.8852459016393442,
+       "eval_f1": 0.885748980830948,
+       "eval_loss": 0.7246056199073792,
+       "eval_runtime": 39.2721,
+       "eval_samples_per_second": 3.107,
+       "eval_steps_per_second": 0.407,
+       "step": 2511
+     },
+     {
+       "epoch": 27.003260869565217,
+       "grad_norm": 0.004729451611638069,
+       "learning_rate": 4.830917874396135e-06,
+       "loss": 0.0004,
+       "step": 2520
+     },
+     {
+       "epoch": 27.006884057971014,
+       "grad_norm": 0.0047379061579704285,
+       "learning_rate": 4.6296296296296296e-06,
+       "loss": 0.0169,
+       "step": 2530
+     },
+     {
+       "epoch": 27.01050724637681,
+       "grad_norm": 0.0059508224949240685,
+       "learning_rate": 4.428341384863125e-06,
+       "loss": 0.0003,
+       "step": 2540
+     },
+     {
+       "epoch": 27.014130434782608,
+       "grad_norm": 0.004985982086509466,
+       "learning_rate": 4.227053140096619e-06,
+       "loss": 0.0004,
+       "step": 2550
+     },
+     {
+       "epoch": 27.017753623188405,
+       "grad_norm": 0.004285223316401243,
+       "learning_rate": 4.025764895330112e-06,
+       "loss": 0.0064,
+       "step": 2560
+     },
+     {
+       "epoch": 27.0213768115942,
+       "grad_norm": 0.00481518916785717,
+       "learning_rate": 3.8244766505636074e-06,
+       "loss": 0.0036,
+       "step": 2570
+     },
+     {
+       "epoch": 27.025,
+       "grad_norm": 0.005598173942416906,
+       "learning_rate": 3.6231884057971017e-06,
+       "loss": 0.0265,
+       "step": 2580
+     },
+     {
+       "epoch": 27.028623188405795,
+       "grad_norm": 0.005482200998812914,
+       "learning_rate": 3.421900161030596e-06,
+       "loss": 0.0003,
+       "step": 2590
+     },
+     {
+       "epoch": 27.032246376811596,
+       "grad_norm": 0.0035128567833453417,
+       "learning_rate": 3.22061191626409e-06,
+       "loss": 0.0004,
+       "step": 2600
+     },
+     {
+       "epoch": 27.033695652173915,
+       "eval_accuracy": 0.8934426229508197,
+       "eval_f1": 0.8942102524069736,
+       "eval_loss": 0.7165006995201111,
+       "eval_runtime": 37.9801,
+       "eval_samples_per_second": 3.212,
+       "eval_steps_per_second": 0.421,
+       "step": 2604
+     },
+     {
+       "epoch": 28.002173913043478,
+       "grad_norm": 0.004918423481285572,
+       "learning_rate": 3.0193236714975845e-06,
+       "loss": 0.0003,
+       "step": 2610
+     },
+     {
+       "epoch": 28.005797101449275,
+       "grad_norm": 0.00445589842274785,
+       "learning_rate": 2.818035426731079e-06,
+       "loss": 0.0096,
+       "step": 2620
+     },
+     {
+       "epoch": 28.009420289855072,
+       "grad_norm": 0.0045196604914963245,
+       "learning_rate": 2.6167471819645734e-06,
+       "loss": 0.0159,
+       "step": 2630
+     },
+     {
+       "epoch": 28.01304347826087,
+       "grad_norm": 0.005592667497694492,
+       "learning_rate": 2.4154589371980677e-06,
+       "loss": 0.0135,
+       "step": 2640
+     },
+     {
+       "epoch": 28.016666666666666,
+       "grad_norm": 0.004337575286626816,
+       "learning_rate": 2.2141706924315623e-06,
+       "loss": 0.0003,
+       "step": 2650
+     },
+     {
+       "epoch": 28.020289855072463,
+       "grad_norm": 0.014842136763036251,
+       "learning_rate": 2.012882447665056e-06,
+       "loss": 0.0004,
+       "step": 2660
+     },
+     {
+       "epoch": 28.02391304347826,
+       "grad_norm": 2.614138603210449,
+       "learning_rate": 1.8115942028985508e-06,
+       "loss": 0.0216,
+       "step": 2670
+     },
+     {
+       "epoch": 28.027536231884056,
+       "grad_norm": 0.004238471854478121,
+       "learning_rate": 1.610305958132045e-06,
+       "loss": 0.0048,
+       "step": 2680
+     },
+     {
+       "epoch": 28.031159420289853,
+       "grad_norm": 0.006333829369395971,
+       "learning_rate": 1.4090177133655396e-06,
+       "loss": 0.0003,
+       "step": 2690
+     },
+     {
+       "epoch": 28.033695652173915,
+       "eval_accuracy": 0.9016393442622951,
+       "eval_f1": 0.90210919178688,
+       "eval_loss": 0.723217248916626,
+       "eval_runtime": 42.3468,
+       "eval_samples_per_second": 2.881,
+       "eval_steps_per_second": 0.378,
+       "step": 2697
+     },
+     {
+       "epoch": 29.00108695652174,
+       "grad_norm": 0.00406339205801487,
+       "learning_rate": 1.2077294685990338e-06,
+       "loss": 0.0003,
+       "step": 2700
+     },
+     {
+       "epoch": 29.004710144927536,
+       "grad_norm": 0.00470451544970274,
+       "learning_rate": 1.006441223832528e-06,
+       "loss": 0.0052,
+       "step": 2710
+     },
+     {
+       "epoch": 29.008333333333333,
+       "grad_norm": 0.0051088836044073105,
+       "learning_rate": 8.051529790660226e-07,
+       "loss": 0.0003,
+       "step": 2720
+     },
+     {
+       "epoch": 29.01195652173913,
+       "grad_norm": 0.00631117494776845,
+       "learning_rate": 6.038647342995169e-07,
+       "loss": 0.0054,
+       "step": 2730
+     },
+     {
+       "epoch": 29.015579710144927,
+       "grad_norm": 0.006218273192644119,
+       "learning_rate": 4.025764895330113e-07,
+       "loss": 0.0003,
+       "step": 2740
+     },
+     {
+       "epoch": 29.019202898550724,
+       "grad_norm": 0.004932409152388573,
+       "learning_rate": 2.0128824476650564e-07,
+       "loss": 0.0043,
+       "step": 2750
+     },
+     {
+       "epoch": 29.02282608695652,
+       "grad_norm": 0.00396711053326726,
+       "learning_rate": 0.0,
+       "loss": 0.0177,
+       "step": 2760
+     },
+     {
+       "epoch": 29.02282608695652,
+       "eval_accuracy": 0.9016393442622951,
+       "eval_f1": 0.90210919178688,
+       "eval_loss": 0.7259094715118408,
+       "eval_runtime": 42.2094,
+       "eval_samples_per_second": 2.89,
+       "eval_steps_per_second": 0.379,
+       "step": 2760
+     },
+     {
+       "epoch": 29.02282608695652,
+       "step": 2760,
+       "total_flos": 2.7405187314155127e+19,
+       "train_loss": 0.1748103732693657,
+       "train_runtime": 10578.7771,
+       "train_samples_per_second": 2.087,
+       "train_steps_per_second": 0.261
+     }
+   ],
+   "logging_steps": 10,
+   "max_steps": 2760,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 9223372036854775807,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 2.7405187314155127e+19,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }