zqTensor committed on
Commit
a9451f9
·
verified ·
1 Parent(s): 525fdfc

End of training

Files changed (5)
  1. README.md +3 -1
  2. all_results.json +11 -11
  3. eval_results.json +6 -6
  4. train_results.json +6 -6
  5. trainer_state.json +2194 -14
README.md CHANGED
@@ -3,6 +3,8 @@ library_name: transformers
3
  license: apache-2.0
4
  base_model: google/vit-base-patch16-224-in21k
5
  tags:
 
 
6
  - generated_from_trainer
7
  metrics:
8
  - accuracy
@@ -16,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
16
 
17
  # vit-base-beans
18
 
19
- This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on an unknown dataset.
20
  It achieves the following results on the evaluation set:
21
  - Loss: 0.0079
22
  - Accuracy: 1.0
 
3
  license: apache-2.0
4
  base_model: google/vit-base-patch16-224-in21k
5
  tags:
6
+ - image-classification
7
+ - vision
8
  - generated_from_trainer
9
  metrics:
10
  - accuracy
 
18
 
19
  # vit-base-beans
20
 
21
+ This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the beans dataset.
22
  It achieves the following results on the evaluation set:
23
  - Loss: 0.0079
24
  - Accuracy: 1.0
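
The updated README describes an image-classification checkpoint fine-tuned on the beans dataset. A minimal usage sketch follows, assuming the `transformers` image-classification pipeline. The function name `classify_leaf` and both of its parameters are hypothetical, and the commit does not state the repository id, so `model_id` has to be supplied by the caller.

```python
def classify_leaf(image_path, model_id):
    """Classify one bean-leaf image with a fine-tuned ViT checkpoint.

    model_id is a placeholder argument: pass the Hub repository id or a local
    directory containing this checkpoint (the commit does not state it).
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    predictions = classifier(image_path)  # list of {"label": ..., "score": ...}
    # Return the highest-scoring prediction.
    return max(predictions, key=lambda p: p["score"])
```

Calling `classify_leaf("leaf.jpg", model_id)` would download the checkpoint on first use. The evaluation accuracy of 1.0 reported above covers only the evaluation split, so predictions on other images may be less reliable.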
all_results.json CHANGED
@@ -1,13 +1,13 @@
1
  {
2
- "epoch": 5.0,
3
- "eval_accuracy": 0.9924812030075187,
4
- "eval_loss": 0.06560193002223969,
5
- "eval_runtime": 0.8157,
6
- "eval_samples_per_second": 163.052,
7
- "eval_steps_per_second": 11.034,
8
- "total_flos": 4.006371770595533e+17,
9
- "train_loss": 0.0,
10
- "train_runtime": 3.5904,
11
- "train_samples_per_second": 1439.946,
12
- "train_steps_per_second": 90.519
13
  }
 
1
  {
2
+ "epoch": 50.0,
3
+ "eval_accuracy": 1.0,
4
+ "eval_loss": 0.007947824895381927,
5
+ "eval_runtime": 0.6215,
6
+ "eval_samples_per_second": 213.993,
7
+ "eval_steps_per_second": 14.481,
8
+ "total_flos": 3.6243328994998477e+18,
9
+ "train_loss": 0.03743308119131968,
10
+ "train_runtime": 550.1735,
11
+ "train_samples_per_second": 93.97,
12
+ "train_steps_per_second": 5.907
13
  }
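
The throughput figures in the updated `all_results.json` can be cross-checked against each other. The sketch below multiplies each rate by its runtime to recover approximate sample and step counts. The helper name `implied_counts` is hypothetical, and because the logged metrics are rounded, the results are estimates rather than exact values.

```python
# Values copied from the updated all_results.json in this commit.
results = {
    "epoch": 50.0,
    "eval_runtime": 0.6215,
    "eval_samples_per_second": 213.993,
    "eval_steps_per_second": 14.481,
    "train_runtime": 550.1735,
    "train_samples_per_second": 93.97,
    "train_steps_per_second": 5.907,
}


def implied_counts(r):
    """Recover approximate sample and step counts from throughput metrics."""
    train_samples_total = r["train_runtime"] * r["train_samples_per_second"]
    return {
        "train_samples_per_epoch": round(train_samples_total / r["epoch"]),
        "train_steps": round(r["train_runtime"] * r["train_steps_per_second"]),
        "eval_samples": round(r["eval_runtime"] * r["eval_samples_per_second"]),
        "eval_steps": round(r["eval_runtime"] * r["eval_steps_per_second"]),
    }


counts = implied_counts(results)
print(counts)
```

The implied number of training steps comes out at 3250, which matches the `global_step` recorded in `trainer_state.json`. The implied evaluation set size of 133 is consistent with the logged accuracy of 0.9924812030075187, which equals 132/133.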
eval_results.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
- "epoch": 5.0,
3
- "eval_accuracy": 0.9924812030075187,
4
- "eval_loss": 0.06560193002223969,
5
- "eval_runtime": 0.8157,
6
- "eval_samples_per_second": 163.052,
7
- "eval_steps_per_second": 11.034
8
  }
 
1
  {
2
+ "epoch": 50.0,
3
+ "eval_accuracy": 1.0,
4
+ "eval_loss": 0.007947824895381927,
5
+ "eval_runtime": 0.6215,
6
+ "eval_samples_per_second": 213.993,
7
+ "eval_steps_per_second": 14.481
8
  }
train_results.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
- "epoch": 5.0,
3
- "total_flos": 4.006371770595533e+17,
4
- "train_loss": 0.0,
5
- "train_runtime": 3.5904,
6
- "train_samples_per_second": 1439.946,
7
- "train_steps_per_second": 90.519
8
  }
 
1
  {
2
+ "epoch": 50.0,
3
+ "total_flos": 3.6243328994998477e+18,
4
+ "train_loss": 0.03743308119131968,
5
+ "train_runtime": 550.1735,
6
+ "train_samples_per_second": 93.97,
7
+ "train_steps_per_second": 5.907
8
  }
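
The per-step entries that this commit adds to `trainer_state.json` log a `learning_rate` and an `epoch` for every tenth step. Those values are consistent with a linear decay to zero over 3250 steps with no warmup, and with 65 optimizer steps per epoch. Both the base rate of 2e-5 and the schedule shape are inferred from the logged numbers and are not stated in the commit. The function names below are hypothetical.

```python
BASE_LR = 2e-5       # assumption: inferred from the logged values
TOTAL_STEPS = 3250   # "global_step" at the end of the 50-epoch run
NUM_EPOCHS = 50
STEPS_PER_EPOCH = TOTAL_STEPS // NUM_EPOCHS  # 65


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Learning rate under linear decay to zero, assuming no warmup."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


def epoch_at(step, steps_per_epoch=STEPS_PER_EPOCH):
    """Fractional epoch reached after the given optimizer step."""
    return step / steps_per_epoch


# Compare against (step, learning_rate) pairs copied from the log.
logged = [
    (660, 1.593846153846154e-05),
    (780, 1.5200000000000002e-05),
    (1300, 1.2e-05),
    (1630, 9.96923076923077e-06),
]
for step, lr in logged:
    assert abs(linear_lr(step) - lr) < 1e-12, (step, lr)
```

If the run used a different base rate or a warmup phase, the comparison loop would fail on these logged pairs, so it doubles as a quick check of the assumption.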
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
- "best_metric": 0.06558172404766083,
3
- "best_model_checkpoint": "./beans_outputs/checkpoint-520",
4
- "epoch": 5.0,
5
  "eval_steps": 500,
6
- "global_step": 650,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
@@ -509,19 +509,2199 @@
509
  "step": 650
510
  },
511
  {
512
- "epoch": 5.0,
513
- "step": 650,
514
- "total_flos": 4.006371770595533e+17,
515
- "train_loss": 0.0,
516
- "train_runtime": 3.5904,
517
- "train_samples_per_second": 1439.946,
518
- "train_steps_per_second": 90.519
519
  }
520
  ],
521
  "logging_steps": 10,
522
- "max_steps": 325,
523
  "num_input_tokens_seen": 0,
524
- "num_train_epochs": 5,
525
  "save_steps": 500,
526
  "stateful_callbacks": {
527
  "TrainerControl": {
@@ -535,7 +2715,7 @@
535
  "attributes": {}
536
  }
537
  },
538
- "total_flos": 4.006371770595533e+17,
539
  "train_batch_size": 8,
540
  "trial_name": null,
541
  "trial_params": null
 
1
  {
2
+ "best_metric": 0.007947824895381927,
3
+ "best_model_checkpoint": "./beans_outputs/checkpoint-3250",
4
+ "epoch": 50.0,
5
  "eval_steps": 500,
6
+ "global_step": 3250,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
509
  "step": 650
510
  },
511
  {
512
+ "epoch": 10.153846153846153,
513
+ "grad_norm": 2.079591751098633,
514
+ "learning_rate": 1.593846153846154e-05,
515
+ "loss": 0.0956,
516
+ "step": 660
517
+ },
518
+ {
519
+ "epoch": 10.307692307692308,
520
+ "grad_norm": 0.22365273535251617,
521
+ "learning_rate": 1.587692307692308e-05,
522
+ "loss": 0.0704,
523
+ "step": 670
524
+ },
525
+ {
526
+ "epoch": 10.461538461538462,
527
+ "grad_norm": 1.2620346546173096,
528
+ "learning_rate": 1.5815384615384616e-05,
529
+ "loss": 0.0927,
530
+ "step": 680
531
+ },
532
+ {
533
+ "epoch": 10.615384615384615,
534
+ "grad_norm": 0.17169518768787384,
535
+ "learning_rate": 1.5753846153846154e-05,
536
+ "loss": 0.1318,
537
+ "step": 690
538
+ },
539
+ {
540
+ "epoch": 10.76923076923077,
541
+ "grad_norm": 1.01598060131073,
542
+ "learning_rate": 1.5692307692307693e-05,
543
+ "loss": 0.1102,
544
+ "step": 700
545
+ },
546
+ {
547
+ "epoch": 10.923076923076923,
548
+ "grad_norm": 1.3003897666931152,
549
+ "learning_rate": 1.5630769230769232e-05,
550
+ "loss": 0.098,
551
+ "step": 710
552
+ },
553
+ {
554
+ "epoch": 11.0,
555
+ "eval_accuracy": 0.9924812030075187,
556
+ "eval_loss": 0.056982800364494324,
557
+ "eval_runtime": 0.6404,
558
+ "eval_samples_per_second": 207.697,
559
+ "eval_steps_per_second": 14.055,
560
+ "step": 715
561
+ },
562
+ {
563
+ "epoch": 11.076923076923077,
564
+ "grad_norm": 2.616173505783081,
565
+ "learning_rate": 1.556923076923077e-05,
566
+ "loss": 0.1061,
567
+ "step": 720
568
+ },
569
+ {
570
+ "epoch": 11.23076923076923,
571
+ "grad_norm": 0.20168966054916382,
572
+ "learning_rate": 1.550769230769231e-05,
573
+ "loss": 0.1275,
574
+ "step": 730
575
+ },
576
+ {
577
+ "epoch": 11.384615384615385,
578
+ "grad_norm": 2.059192419052124,
579
+ "learning_rate": 1.544615384615385e-05,
580
+ "loss": 0.0463,
581
+ "step": 740
582
+ },
583
+ {
584
+ "epoch": 11.538461538461538,
585
+ "grad_norm": 0.1697787493467331,
586
+ "learning_rate": 1.5384615384615387e-05,
587
+ "loss": 0.0759,
588
+ "step": 750
589
+ },
590
+ {
591
+ "epoch": 11.692307692307692,
592
+ "grad_norm": 0.8658211827278137,
593
+ "learning_rate": 1.5323076923076926e-05,
594
+ "loss": 0.1983,
595
+ "step": 760
596
+ },
597
+ {
598
+ "epoch": 11.846153846153847,
599
+ "grad_norm": 0.4407813251018524,
600
+ "learning_rate": 1.5261538461538465e-05,
601
+ "loss": 0.1181,
602
+ "step": 770
603
+ },
604
+ {
605
+ "epoch": 12.0,
606
+ "grad_norm": 6.601933479309082,
607
+ "learning_rate": 1.5200000000000002e-05,
608
+ "loss": 0.0935,
609
+ "step": 780
610
+ },
611
+ {
612
+ "epoch": 12.0,
613
+ "eval_accuracy": 1.0,
614
+ "eval_loss": 0.04177865758538246,
615
+ "eval_runtime": 0.6316,
616
+ "eval_samples_per_second": 210.569,
617
+ "eval_steps_per_second": 14.249,
618
+ "step": 780
619
+ },
620
+ {
621
+ "epoch": 12.153846153846153,
622
+ "grad_norm": 1.3233654499053955,
623
+ "learning_rate": 1.5138461538461539e-05,
624
+ "loss": 0.0949,
625
+ "step": 790
626
+ },
627
+ {
628
+ "epoch": 12.307692307692308,
629
+ "grad_norm": 10.496395111083984,
630
+ "learning_rate": 1.5076923076923078e-05,
631
+ "loss": 0.1141,
632
+ "step": 800
633
+ },
634
+ {
635
+ "epoch": 12.461538461538462,
636
+ "grad_norm": 0.40288811922073364,
637
+ "learning_rate": 1.5015384615384617e-05,
638
+ "loss": 0.0662,
639
+ "step": 810
640
+ },
641
+ {
642
+ "epoch": 12.615384615384615,
643
+ "grad_norm": 1.3594541549682617,
644
+ "learning_rate": 1.4953846153846154e-05,
645
+ "loss": 0.0672,
646
+ "step": 820
647
+ },
648
+ {
649
+ "epoch": 12.76923076923077,
650
+ "grad_norm": 0.15957416594028473,
651
+ "learning_rate": 1.4892307692307692e-05,
652
+ "loss": 0.1198,
653
+ "step": 830
654
+ },
655
+ {
656
+ "epoch": 12.923076923076923,
657
+ "grad_norm": 3.9073538780212402,
658
+ "learning_rate": 1.4830769230769233e-05,
659
+ "loss": 0.0907,
660
+ "step": 840
661
+ },
662
+ {
663
+ "epoch": 13.0,
664
+ "eval_accuracy": 0.9699248120300752,
665
+ "eval_loss": 0.10930211842060089,
666
+ "eval_runtime": 0.6314,
667
+ "eval_samples_per_second": 210.658,
668
+ "eval_steps_per_second": 14.255,
669
+ "step": 845
670
+ },
671
+ {
672
+ "epoch": 13.076923076923077,
673
+ "grad_norm": 4.819633483886719,
674
+ "learning_rate": 1.4769230769230772e-05,
675
+ "loss": 0.0678,
676
+ "step": 850
677
+ },
678
+ {
679
+ "epoch": 13.23076923076923,
680
+ "grad_norm": 0.1750306636095047,
681
+ "learning_rate": 1.4707692307692309e-05,
682
+ "loss": 0.0498,
683
+ "step": 860
684
+ },
685
+ {
686
+ "epoch": 13.384615384615385,
687
+ "grad_norm": 0.7398102879524231,
688
+ "learning_rate": 1.4646153846153848e-05,
689
+ "loss": 0.112,
690
+ "step": 870
691
+ },
692
+ {
693
+ "epoch": 13.538461538461538,
694
+ "grad_norm": 1.1426323652267456,
695
+ "learning_rate": 1.4584615384615386e-05,
696
+ "loss": 0.061,
697
+ "step": 880
698
+ },
699
+ {
700
+ "epoch": 13.692307692307692,
701
+ "grad_norm": 0.16307789087295532,
702
+ "learning_rate": 1.4523076923076923e-05,
703
+ "loss": 0.0405,
704
+ "step": 890
705
+ },
706
+ {
707
+ "epoch": 13.846153846153847,
708
+ "grad_norm": 3.81508207321167,
709
+ "learning_rate": 1.4461538461538462e-05,
710
+ "loss": 0.0768,
711
+ "step": 900
712
+ },
713
+ {
714
+ "epoch": 14.0,
715
+ "grad_norm": 0.482637494802475,
716
+ "learning_rate": 1.4400000000000001e-05,
717
+ "loss": 0.0947,
718
+ "step": 910
719
+ },
720
+ {
721
+ "epoch": 14.0,
722
+ "eval_accuracy": 1.0,
723
+ "eval_loss": 0.03473825752735138,
724
+ "eval_runtime": 0.6324,
725
+ "eval_samples_per_second": 210.312,
726
+ "eval_steps_per_second": 14.232,
727
+ "step": 910
728
+ },
729
+ {
730
+ "epoch": 14.153846153846153,
731
+ "grad_norm": 0.18653550744056702,
732
+ "learning_rate": 1.4338461538461538e-05,
733
+ "loss": 0.0853,
734
+ "step": 920
735
+ },
736
+ {
737
+ "epoch": 14.307692307692308,
738
+ "grad_norm": 1.822533130645752,
739
+ "learning_rate": 1.4276923076923077e-05,
740
+ "loss": 0.1008,
741
+ "step": 930
742
+ },
743
+ {
744
+ "epoch": 14.461538461538462,
745
+ "grad_norm": 0.32205572724342346,
746
+ "learning_rate": 1.4215384615384617e-05,
747
+ "loss": 0.0638,
748
+ "step": 940
749
+ },
750
+ {
751
+ "epoch": 14.615384615384615,
752
+ "grad_norm": 0.5714139342308044,
753
+ "learning_rate": 1.4153846153846156e-05,
754
+ "loss": 0.1159,
755
+ "step": 950
756
+ },
757
+ {
758
+ "epoch": 14.76923076923077,
759
+ "grad_norm": 5.901656627655029,
760
+ "learning_rate": 1.4092307692307693e-05,
761
+ "loss": 0.091,
762
+ "step": 960
763
+ },
764
+ {
765
+ "epoch": 14.923076923076923,
766
+ "grad_norm": 2.119544744491577,
767
+ "learning_rate": 1.4030769230769232e-05,
768
+ "loss": 0.1259,
769
+ "step": 970
770
+ },
771
+ {
772
+ "epoch": 15.0,
773
+ "eval_accuracy": 0.9849624060150376,
774
+ "eval_loss": 0.07099475711584091,
775
+ "eval_runtime": 0.6277,
776
+ "eval_samples_per_second": 211.88,
777
+ "eval_steps_per_second": 14.338,
778
+ "step": 975
779
+ },
780
+ {
781
+ "epoch": 15.076923076923077,
782
+ "grad_norm": 0.14739356935024261,
783
+ "learning_rate": 1.3969230769230771e-05,
784
+ "loss": 0.082,
785
+ "step": 980
786
+ },
787
+ {
788
+ "epoch": 15.23076923076923,
789
+ "grad_norm": 0.12192496657371521,
790
+ "learning_rate": 1.3907692307692308e-05,
791
+ "loss": 0.0558,
792
+ "step": 990
793
+ },
794
+ {
795
+ "epoch": 15.384615384615385,
796
+ "grad_norm": 0.11841657012701035,
797
+ "learning_rate": 1.3846153846153847e-05,
798
+ "loss": 0.0852,
799
+ "step": 1000
800
+ },
801
+ {
802
+ "epoch": 15.538461538461538,
803
+ "grad_norm": 0.8062542080879211,
804
+ "learning_rate": 1.3784615384615386e-05,
805
+ "loss": 0.0793,
806
+ "step": 1010
807
+ },
808
+ {
809
+ "epoch": 15.692307692307692,
810
+ "grad_norm": 0.11781653016805649,
811
+ "learning_rate": 1.3723076923076923e-05,
812
+ "loss": 0.0436,
813
+ "step": 1020
814
+ },
815
+ {
816
+ "epoch": 15.846153846153847,
817
+ "grad_norm": 0.4233933389186859,
818
+ "learning_rate": 1.3661538461538461e-05,
819
+ "loss": 0.0448,
820
+ "step": 1030
821
+ },
822
+ {
823
+ "epoch": 16.0,
824
+ "grad_norm": 0.3999457061290741,
825
+ "learning_rate": 1.3600000000000002e-05,
826
+ "loss": 0.0325,
827
+ "step": 1040
828
+ },
829
+ {
830
+ "epoch": 16.0,
831
+ "eval_accuracy": 0.9774436090225563,
832
+ "eval_loss": 0.05867745727300644,
833
+ "eval_runtime": 0.6418,
834
+ "eval_samples_per_second": 207.244,
835
+ "eval_steps_per_second": 14.024,
836
+ "step": 1040
837
+ },
838
+ {
839
+ "epoch": 16.153846153846153,
840
+ "grad_norm": 0.36126333475112915,
841
+ "learning_rate": 1.353846153846154e-05,
842
+ "loss": 0.0373,
843
+ "step": 1050
844
+ },
845
+ {
846
+ "epoch": 16.307692307692307,
847
+ "grad_norm": 2.5870630741119385,
848
+ "learning_rate": 1.3476923076923078e-05,
849
+ "loss": 0.0746,
850
+ "step": 1060
851
+ },
852
+ {
853
+ "epoch": 16.46153846153846,
854
+ "grad_norm": 10.347090721130371,
855
+ "learning_rate": 1.3415384615384617e-05,
856
+ "loss": 0.047,
857
+ "step": 1070
858
+ },
859
+ {
860
+ "epoch": 16.615384615384617,
861
+ "grad_norm": 0.5827314257621765,
862
+ "learning_rate": 1.3353846153846155e-05,
863
+ "loss": 0.0832,
864
+ "step": 1080
865
+ },
866
+ {
867
+ "epoch": 16.76923076923077,
868
+ "grad_norm": 0.1550850123167038,
869
+ "learning_rate": 1.3292307692307692e-05,
870
+ "loss": 0.1211,
871
+ "step": 1090
872
+ },
873
+ {
874
+ "epoch": 16.923076923076923,
875
+ "grad_norm": 0.8602961897850037,
876
+ "learning_rate": 1.3230769230769231e-05,
877
+ "loss": 0.1397,
878
+ "step": 1100
879
+ },
880
+ {
881
+ "epoch": 17.0,
882
+ "eval_accuracy": 0.9924812030075187,
883
+ "eval_loss": 0.049453094601631165,
884
+ "eval_runtime": 0.6291,
885
+ "eval_samples_per_second": 211.407,
886
+ "eval_steps_per_second": 14.306,
887
+ "step": 1105
888
+ },
889
+ {
890
+ "epoch": 17.076923076923077,
891
+ "grad_norm": 1.7802873849868774,
892
+ "learning_rate": 1.316923076923077e-05,
893
+ "loss": 0.0908,
894
+ "step": 1110
895
+ },
896
+ {
897
+ "epoch": 17.23076923076923,
898
+ "grad_norm": 0.10594117641448975,
899
+ "learning_rate": 1.3107692307692307e-05,
900
+ "loss": 0.0698,
901
+ "step": 1120
902
+ },
903
+ {
904
+ "epoch": 17.384615384615383,
905
+ "grad_norm": 0.21712522208690643,
906
+ "learning_rate": 1.3046153846153846e-05,
907
+ "loss": 0.1116,
908
+ "step": 1130
909
+ },
910
+ {
911
+ "epoch": 17.53846153846154,
912
+ "grad_norm": 6.615525245666504,
913
+ "learning_rate": 1.2984615384615386e-05,
914
+ "loss": 0.0767,
915
+ "step": 1140
916
+ },
917
+ {
918
+ "epoch": 17.692307692307693,
919
+ "grad_norm": 0.10564947128295898,
920
+ "learning_rate": 1.2923076923076925e-05,
921
+ "loss": 0.0522,
922
+ "step": 1150
923
+ },
924
+ {
925
+ "epoch": 17.846153846153847,
926
+ "grad_norm": 1.2303924560546875,
927
+ "learning_rate": 1.2861538461538462e-05,
928
+ "loss": 0.0558,
929
+ "step": 1160
930
+ },
931
+ {
932
+ "epoch": 18.0,
933
+ "grad_norm": 0.10486020892858505,
934
+ "learning_rate": 1.2800000000000001e-05,
935
+ "loss": 0.0456,
936
+ "step": 1170
937
+ },
938
+ {
939
+ "epoch": 18.0,
940
+ "eval_accuracy": 0.9774436090225563,
941
+ "eval_loss": 0.051864467561244965,
942
+ "eval_runtime": 0.6367,
943
+ "eval_samples_per_second": 208.899,
944
+ "eval_steps_per_second": 14.136,
945
+ "step": 1170
946
+ },
947
+ {
948
+ "epoch": 18.153846153846153,
949
+ "grad_norm": 0.09572620689868927,
950
+ "learning_rate": 1.273846153846154e-05,
951
+ "loss": 0.0446,
952
+ "step": 1180
953
+ },
954
+ {
955
+ "epoch": 18.307692307692307,
956
+ "grad_norm": 0.12069143354892731,
957
+ "learning_rate": 1.2676923076923077e-05,
958
+ "loss": 0.0612,
959
+ "step": 1190
960
+ },
961
+ {
962
+ "epoch": 18.46153846153846,
963
+ "grad_norm": 3.15175724029541,
964
+ "learning_rate": 1.2615384615384616e-05,
965
+ "loss": 0.1213,
966
+ "step": 1200
967
+ },
968
+ {
969
+ "epoch": 18.615384615384617,
970
+ "grad_norm": 0.6527738571166992,
971
+ "learning_rate": 1.2553846153846155e-05,
972
+ "loss": 0.1027,
973
+ "step": 1210
974
+ },
975
+ {
976
+ "epoch": 18.76923076923077,
977
+ "grad_norm": 0.6189332604408264,
978
+ "learning_rate": 1.2492307692307692e-05,
979
+ "loss": 0.0262,
980
+ "step": 1220
981
+ },
982
+ {
983
+ "epoch": 18.923076923076923,
984
+ "grad_norm": 0.1209394633769989,
985
+ "learning_rate": 1.243076923076923e-05,
986
+ "loss": 0.0439,
987
+ "step": 1230
988
+ },
989
+ {
990
+ "epoch": 19.0,
991
+ "eval_accuracy": 1.0,
992
+ "eval_loss": 0.021639494225382805,
993
+ "eval_runtime": 0.6265,
994
+ "eval_samples_per_second": 212.296,
995
+ "eval_steps_per_second": 14.366,
996
+ "step": 1235
997
+ },
998
+ {
999
+ "epoch": 19.076923076923077,
1000
+ "grad_norm": 6.580589771270752,
1001
+ "learning_rate": 1.2369230769230771e-05,
1002
+ "loss": 0.0469,
1003
+ "step": 1240
1004
+ },
1005
+ {
1006
+ "epoch": 19.23076923076923,
1007
+ "grad_norm": 8.734590530395508,
1008
+ "learning_rate": 1.230769230769231e-05,
1009
+ "loss": 0.0297,
1010
+ "step": 1250
1011
+ },
1012
+ {
1013
+ "epoch": 19.384615384615383,
1014
+ "grad_norm": 1.7616609334945679,
1015
+ "learning_rate": 1.2246153846153847e-05,
1016
+ "loss": 0.1147,
1017
+ "step": 1260
1018
+ },
1019
+ {
1020
+ "epoch": 19.53846153846154,
1021
+ "grad_norm": 0.3767222464084625,
1022
+ "learning_rate": 1.2184615384615386e-05,
1023
+ "loss": 0.0499,
1024
+ "step": 1270
1025
+ },
1026
+ {
1027
+ "epoch": 19.692307692307693,
1028
+ "grad_norm": 0.1071651503443718,
1029
+ "learning_rate": 1.2123076923076924e-05,
1030
+ "loss": 0.0623,
1031
+ "step": 1280
1032
+ },
1033
+ {
1034
+ "epoch": 19.846153846153847,
1035
+ "grad_norm": 4.712902545928955,
1036
+ "learning_rate": 1.2061538461538462e-05,
1037
+ "loss": 0.0365,
1038
+ "step": 1290
1039
+ },
1040
+ {
1041
+ "epoch": 20.0,
1042
+ "grad_norm": 0.08926769345998764,
1043
+ "learning_rate": 1.2e-05,
1044
+ "loss": 0.0484,
1045
+ "step": 1300
1046
+ },
1047
+ {
1048
+ "epoch": 20.0,
1049
+ "eval_accuracy": 0.9924812030075187,
1050
+ "eval_loss": 0.03160810098052025,
1051
+ "eval_runtime": 0.6412,
1052
+ "eval_samples_per_second": 207.432,
1053
+ "eval_steps_per_second": 14.037,
1054
+ "step": 1300
1055
+ },
1056
+ {
1057
+ "epoch": 20.153846153846153,
1058
+ "grad_norm": 1.210523009300232,
1059
+ "learning_rate": 1.1938461538461539e-05,
1060
+ "loss": 0.0789,
1061
+ "step": 1310
1062
+ },
1063
+ {
1064
+ "epoch": 20.307692307692307,
1065
+ "grad_norm": 0.21540193259716034,
1066
+ "learning_rate": 1.1876923076923076e-05,
1067
+ "loss": 0.0214,
1068
+ "step": 1320
1069
+ },
1070
+ {
1071
+ "epoch": 20.46153846153846,
1072
+ "grad_norm": 0.08566620200872421,
1073
+ "learning_rate": 1.1815384615384617e-05,
1074
+ "loss": 0.0308,
1075
+ "step": 1330
1076
+ },
1077
+ {
1078
+ "epoch": 20.615384615384617,
1079
+ "grad_norm": 1.3403387069702148,
1080
+ "learning_rate": 1.1753846153846155e-05,
1081
+ "loss": 0.0656,
1082
+ "step": 1340
1083
+ },
1084
+ {
1085
+ "epoch": 20.76923076923077,
1086
+ "grad_norm": 1.3895039558410645,
1087
+ "learning_rate": 1.1692307692307694e-05,
1088
+ "loss": 0.0651,
1089
+ "step": 1350
1090
+ },
1091
+ {
1092
+ "epoch": 20.923076923076923,
1093
+ "grad_norm": 2.5756139755249023,
1094
+ "learning_rate": 1.1630769230769231e-05,
1095
+ "loss": 0.0276,
1096
+ "step": 1360
1097
+ },
1098
+ {
1099
+ "epoch": 21.0,
1100
+ "eval_accuracy": 1.0,
1101
+ "eval_loss": 0.019228629767894745,
1102
+ "eval_runtime": 0.6415,
1103
+ "eval_samples_per_second": 207.316,
1104
+ "eval_steps_per_second": 14.029,
1105
+ "step": 1365
1106
+ },
1107
+ {
1108
+ "epoch": 21.076923076923077,
1109
+ "grad_norm": 0.11552825570106506,
1110
+ "learning_rate": 1.156923076923077e-05,
1111
+ "loss": 0.0494,
1112
+ "step": 1370
1113
+ },
1114
+ {
1115
+ "epoch": 21.23076923076923,
1116
+ "grad_norm": 0.08301220834255219,
1117
+ "learning_rate": 1.1507692307692309e-05,
1118
+ "loss": 0.1079,
1119
+ "step": 1380
1120
+ },
1121
+ {
1122
+ "epoch": 21.384615384615383,
1123
+ "grad_norm": 0.08622205257415771,
1124
+ "learning_rate": 1.1446153846153846e-05,
1125
+ "loss": 0.0397,
1126
+ "step": 1390
1127
+ },
1128
+ {
1129
+ "epoch": 21.53846153846154,
1130
+ "grad_norm": 1.3071388006210327,
1131
+ "learning_rate": 1.1384615384615385e-05,
1132
+ "loss": 0.1094,
1133
+ "step": 1400
1134
+ },
1135
+ {
1136
+ "epoch": 21.692307692307693,
1137
+ "grad_norm": 17.1097469329834,
1138
+ "learning_rate": 1.1323076923076924e-05,
1139
+ "loss": 0.0298,
1140
+ "step": 1410
1141
+ },
1142
+ {
1143
+ "epoch": 21.846153846153847,
1144
+ "grad_norm": 0.08082272112369537,
1145
+ "learning_rate": 1.126153846153846e-05,
1146
+ "loss": 0.0196,
1147
+ "step": 1420
1148
+ },
1149
+ {
1150
+ "epoch": 22.0,
1151
+ "grad_norm": 0.08441055566072464,
1152
+ "learning_rate": 1.1200000000000001e-05,
1153
+ "loss": 0.0348,
1154
+ "step": 1430
1155
+ },
1156
+ {
1157
+ "epoch": 22.0,
1158
+ "eval_accuracy": 1.0,
1159
+ "eval_loss": 0.0177127867937088,
1160
+ "eval_runtime": 0.5805,
1161
+ "eval_samples_per_second": 229.097,
1162
+ "eval_steps_per_second": 15.503,
1163
+ "step": 1430
1164
+ },
1165
+ {
1166
+ "epoch": 22.153846153846153,
1167
+ "grad_norm": 0.6167936325073242,
1168
+ "learning_rate": 1.113846153846154e-05,
1169
+ "loss": 0.0675,
1170
+ "step": 1440
1171
+ },
1172
+ {
1173
+ "epoch": 22.307692307692307,
1174
+ "grad_norm": 6.9912638664245605,
1175
+ "learning_rate": 1.1076923076923079e-05,
1176
+ "loss": 0.0644,
1177
+ "step": 1450
1178
+ },
1179
+ {
1180
+ "epoch": 22.46153846153846,
1181
+ "grad_norm": 0.07652874290943146,
1182
+ "learning_rate": 1.1015384615384616e-05,
1183
+ "loss": 0.0572,
1184
+ "step": 1460
1185
+ },
1186
+ {
1187
+ "epoch": 22.615384615384617,
1188
+ "grad_norm": 0.08351747691631317,
1189
+ "learning_rate": 1.0953846153846155e-05,
1190
+ "loss": 0.051,
1191
+ "step": 1470
1192
+ },
1193
+ {
1194
+ "epoch": 22.76923076923077,
1195
+ "grad_norm": 0.11051066219806671,
1196
+ "learning_rate": 1.0892307692307693e-05,
1197
+ "loss": 0.032,
1198
+ "step": 1480
1199
+ },
1200
+ {
1201
+ "epoch": 22.923076923076923,
1202
+ "grad_norm": 10.532815933227539,
1203
+ "learning_rate": 1.083076923076923e-05,
1204
+ "loss": 0.0326,
1205
+ "step": 1490
1206
+ },
1207
+ {
1208
+ "epoch": 23.0,
1209
+ "eval_accuracy": 1.0,
1210
+ "eval_loss": 0.01754232682287693,
1211
+ "eval_runtime": 0.6431,
1212
+ "eval_samples_per_second": 206.82,
1213
+ "eval_steps_per_second": 13.995,
1214
+ "step": 1495
1215
+ },
1216
+ {
1217
+ "epoch": 23.076923076923077,
1218
+ "grad_norm": 0.08222197741270065,
1219
+ "learning_rate": 1.076923076923077e-05,
1220
+ "loss": 0.0462,
1221
+ "step": 1500
1222
+ },
1223
+ {
1224
+ "epoch": 23.23076923076923,
1225
+ "grad_norm": 0.0753277987241745,
1226
+ "learning_rate": 1.0707692307692308e-05,
1227
+ "loss": 0.0516,
1228
+ "step": 1510
1229
+ },
1230
+ {
1231
+ "epoch": 23.384615384615383,
1232
+ "grad_norm": 0.9446002840995789,
1233
+ "learning_rate": 1.0646153846153845e-05,
1234
+ "loss": 0.0214,
1235
+ "step": 1520
1236
+ },
1237
+ {
1238
+ "epoch": 23.53846153846154,
1239
+ "grad_norm": 0.0864008441567421,
1240
+ "learning_rate": 1.0584615384615386e-05,
1241
+ "loss": 0.0185,
1242
+ "step": 1530
1243
+ },
1244
+ {
1245
+ "epoch": 23.692307692307693,
1246
+ "grad_norm": 7.382317543029785,
1247
+ "learning_rate": 1.0523076923076924e-05,
1248
+ "loss": 0.0674,
1249
+ "step": 1540
1250
+ },
1251
+ {
1252
+ "epoch": 23.846153846153847,
1253
+ "grad_norm": 0.18953648209571838,
1254
+ "learning_rate": 1.0461538461538463e-05,
1255
+ "loss": 0.02,
1256
+ "step": 1550
1257
+ },
1258
+ {
1259
+ "epoch": 24.0,
1260
+ "grad_norm": 2.4876315593719482,
1261
+ "learning_rate": 1.04e-05,
1262
+ "loss": 0.1014,
1263
+ "step": 1560
1264
+ },
1265
+ {
1266
+ "epoch": 24.0,
1267
+ "eval_accuracy": 0.9924812030075187,
1268
+ "eval_loss": 0.02354033850133419,
1269
+ "eval_runtime": 0.6442,
1270
+ "eval_samples_per_second": 206.452,
1271
+ "eval_steps_per_second": 13.97,
1272
+ "step": 1560
1273
+ },
1274
+ {
1275
+ "epoch": 24.153846153846153,
1276
+ "grad_norm": 0.08115736395120621,
1277
+ "learning_rate": 1.033846153846154e-05,
1278
+ "loss": 0.014,
1279
+ "step": 1570
1280
+ },
1281
+ {
1282
+ "epoch": 24.307692307692307,
1283
+ "grad_norm": 0.06984913349151611,
1284
+ "learning_rate": 1.0276923076923078e-05,
1285
+ "loss": 0.0328,
1286
+ "step": 1580
1287
+ },
1288
+ {
1289
+ "epoch": 24.46153846153846,
1290
+ "grad_norm": 7.471628665924072,
1291
+ "learning_rate": 1.0215384615384615e-05,
1292
+ "loss": 0.0321,
1293
+ "step": 1590
1294
+ },
1295
+ {
1296
+ "epoch": 24.615384615384617,
1297
+ "grad_norm": 0.0844321921467781,
1298
+ "learning_rate": 1.0153846153846154e-05,
1299
+ "loss": 0.051,
1300
+ "step": 1600
1301
+ },
1302
+ {
1303
+ "epoch": 24.76923076923077,
1304
+ "grad_norm": 0.0683414489030838,
1305
+ "learning_rate": 1.0092307692307693e-05,
1306
+ "loss": 0.0428,
1307
+ "step": 1610
1308
+ },
1309
+ {
1310
+ "epoch": 24.923076923076923,
1311
+ "grad_norm": 0.07370961457490921,
1312
+ "learning_rate": 1.0030769230769231e-05,
1313
+ "loss": 0.0395,
1314
+ "step": 1620
1315
+ },
1316
+ {
1317
+ "epoch": 25.0,
1318
+ "eval_accuracy": 0.9849624060150376,
1319
+ "eval_loss": 0.04511820524930954,
1320
+ "eval_runtime": 0.6443,
1321
+ "eval_samples_per_second": 206.419,
1322
+ "eval_steps_per_second": 13.968,
1323
+ "step": 1625
1324
+ },
1325
+ {
1326
+ "epoch": 25.076923076923077,
1327
+ "grad_norm": 0.7060135006904602,
1328
+ "learning_rate": 9.96923076923077e-06,
1329
+ "loss": 0.0197,
1330
+ "step": 1630
1331
+ },
1332
+ {
1333
+ "epoch": 25.23076923076923,
1334
+ "grad_norm": 0.5647442936897278,
1335
+ "learning_rate": 9.907692307692309e-06,
1336
+ "loss": 0.0636,
1337
+ "step": 1640
1338
+ },
1339
+ {
1340
+ "epoch": 25.384615384615383,
1341
+ "grad_norm": 0.06951487809419632,
1342
+ "learning_rate": 9.846153846153848e-06,
1343
+ "loss": 0.0338,
1344
+ "step": 1650
1345
+ },
1346
+ {
1347
+ "epoch": 25.53846153846154,
1348
+ "grad_norm": 0.9578651785850525,
1349
+ "learning_rate": 9.784615384615387e-06,
1350
+ "loss": 0.049,
1351
+ "step": 1660
1352
+ },
1353
+ {
1354
+ "epoch": 25.692307692307693,
1355
+ "grad_norm": 0.1774640828371048,
1356
+ "learning_rate": 9.723076923076924e-06,
1357
+ "loss": 0.0135,
1358
+ "step": 1670
1359
+ },
1360
+ {
1361
+ "epoch": 25.846153846153847,
1362
+ "grad_norm": 0.06652193516492844,
1363
+ "learning_rate": 9.661538461538462e-06,
1364
+ "loss": 0.046,
1365
+ "step": 1680
1366
+ },
1367
+ {
1368
+ "epoch": 26.0,
1369
+ "grad_norm": 0.06518968194723129,
1370
+ "learning_rate": 9.600000000000001e-06,
1371
+ "loss": 0.0265,
1372
+ "step": 1690
1373
+ },
1374
+ {
1375
+ "epoch": 26.0,
1376
+ "eval_accuracy": 0.9924812030075187,
1377
+ "eval_loss": 0.0296646561473608,
1378
+ "eval_runtime": 0.5911,
1379
+ "eval_samples_per_second": 225.018,
1380
+ "eval_steps_per_second": 15.227,
1381
+ "step": 1690
1382
+ },
1383
+ {
1384
+ "epoch": 26.153846153846153,
1385
+ "grad_norm": 0.07430905103683472,
1386
+ "learning_rate": 9.53846153846154e-06,
1387
+ "loss": 0.0326,
1388
+ "step": 1700
1389
+ },
1390
+ {
1391
+ "epoch": 26.307692307692307,
1392
+ "grad_norm": 0.07437779009342194,
1393
+ "learning_rate": 9.476923076923079e-06,
1394
+ "loss": 0.0205,
1395
+ "step": 1710
1396
+ },
1397
+ {
1398
+ "epoch": 26.46153846153846,
1399
+ "grad_norm": 5.995608329772949,
1400
+ "learning_rate": 9.415384615384616e-06,
1401
+ "loss": 0.0725,
1402
+ "step": 1720
1403
+ },
1404
+ {
1405
+ "epoch": 26.615384615384617,
1406
+ "grad_norm": 0.09473489224910736,
1407
+ "learning_rate": 9.353846153846155e-06,
1408
+ "loss": 0.0142,
1409
+ "step": 1730
1410
+ },
1411
+ {
1412
+ "epoch": 26.76923076923077,
1413
+ "grad_norm": 0.05937571823596954,
1414
+ "learning_rate": 9.292307692307694e-06,
1415
+ "loss": 0.0212,
1416
+ "step": 1740
1417
+ },
1418
+ {
1419
+ "epoch": 26.923076923076923,
1420
+ "grad_norm": 19.040122985839844,
1421
+ "learning_rate": 9.230769230769232e-06,
1422
+ "loss": 0.0569,
1423
+ "step": 1750
1424
+ },
1425
+ {
1426
+ "epoch": 27.0,
1427
+ "eval_accuracy": 0.9924812030075187,
1428
+ "eval_loss": 0.026294343173503876,
1429
+ "eval_runtime": 0.642,
1430
+ "eval_samples_per_second": 207.155,
1431
+ "eval_steps_per_second": 14.018,
1432
+ "step": 1755
1433
+ },
1434
+ {
1435
+ "epoch": 27.076923076923077,
1436
+ "grad_norm": 0.5836367011070251,
1437
+ "learning_rate": 9.169230769230771e-06,
1438
+ "loss": 0.1035,
1439
+ "step": 1760
1440
+ },
1441
+ {
1442
+ "epoch": 27.23076923076923,
1443
+ "grad_norm": 0.06087055802345276,
1444
+ "learning_rate": 9.107692307692308e-06,
1445
+ "loss": 0.0518,
1446
+ "step": 1770
1447
+ },
1448
+ {
1449
+ "epoch": 27.384615384615383,
1450
+ "grad_norm": 0.06178651750087738,
1451
+ "learning_rate": 9.046153846153847e-06,
1452
+ "loss": 0.0477,
1453
+ "step": 1780
1454
+ },
1455
+ {
1456
+ "epoch": 27.53846153846154,
1457
+ "grad_norm": 0.0936984047293663,
1458
+ "learning_rate": 8.984615384615386e-06,
1459
+ "loss": 0.012,
1460
+ "step": 1790
1461
+ },
1462
+ {
1463
+ "epoch": 27.692307692307693,
1464
+ "grad_norm": 0.056670334190130234,
1465
+ "learning_rate": 8.923076923076925e-06,
1466
+ "loss": 0.0363,
1467
+ "step": 1800
1468
+ },
1469
+ {
1470
+ "epoch": 27.846153846153847,
1471
+ "grad_norm": 0.13299456238746643,
1472
+ "learning_rate": 8.861538461538463e-06,
1473
+ "loss": 0.013,
1474
+ "step": 1810
1475
+ },
1476
+ {
1477
+ "epoch": 28.0,
1478
+ "grad_norm": 0.0713280737400055,
1479
+ "learning_rate": 8.8e-06,
1480
+ "loss": 0.0666,
1481
+ "step": 1820
1482
+ },
1483
+ {
1484
+ "epoch": 28.0,
1485
+ "eval_accuracy": 0.9849624060150376,
1486
+ "eval_loss": 0.02451479434967041,
1487
+ "eval_runtime": 0.6311,
1488
+ "eval_samples_per_second": 210.727,
1489
+ "eval_steps_per_second": 14.26,
1490
+ "step": 1820
1491
+ },
1492
+ {
1493
+ "epoch": 28.153846153846153,
1494
+ "grad_norm": 0.08556320518255234,
1495
+ "learning_rate": 8.73846153846154e-06,
1496
+ "loss": 0.0119,
1497
+ "step": 1830
1498
+ },
1499
+ {
1500
+ "epoch": 28.307692307692307,
1501
+ "grad_norm": 0.05891846865415573,
1502
+ "learning_rate": 8.676923076923078e-06,
1503
+ "loss": 0.0456,
1504
+ "step": 1840
1505
+ },
1506
+ {
1507
+ "epoch": 28.46153846153846,
1508
+ "grad_norm": 0.05940578132867813,
1509
+ "learning_rate": 8.615384615384617e-06,
1510
+ "loss": 0.0541,
1511
+ "step": 1850
1512
+ },
1513
+ {
1514
+ "epoch": 28.615384615384617,
1515
+ "grad_norm": 0.06775704026222229,
1516
+ "learning_rate": 8.553846153846156e-06,
1517
+ "loss": 0.0162,
1518
+ "step": 1860
1519
+ },
1520
+ {
1521
+ "epoch": 28.76923076923077,
1522
+ "grad_norm": 0.058256104588508606,
1523
+ "learning_rate": 8.492307692307693e-06,
1524
+ "loss": 0.0354,
1525
+ "step": 1870
1526
+ },
1527
+ {
1528
+ "epoch": 28.923076923076923,
1529
+ "grad_norm": 1.286210060119629,
1530
+ "learning_rate": 8.430769230769231e-06,
1531
+ "loss": 0.0285,
1532
+ "step": 1880
1533
+ },
1534
+ {
1535
+ "epoch": 29.0,
1536
+ "eval_accuracy": 0.9774436090225563,
1537
+ "eval_loss": 0.041793130338191986,
1538
+ "eval_runtime": 0.6391,
1539
+ "eval_samples_per_second": 208.111,
1540
+ "eval_steps_per_second": 14.083,
1541
+ "step": 1885
1542
+ },
1543
+ {
1544
+ "epoch": 29.076923076923077,
1545
+ "grad_norm": 2.135648727416992,
1546
+ "learning_rate": 8.36923076923077e-06,
1547
+ "loss": 0.0197,
1548
+ "step": 1890
1549
+ },
1550
+ {
1551
+ "epoch": 29.23076923076923,
1552
+ "grad_norm": 1.5485310554504395,
1553
+ "learning_rate": 8.307692307692309e-06,
1554
+ "loss": 0.0129,
1555
+ "step": 1900
1556
+ },
1557
+ {
1558
+ "epoch": 29.384615384615383,
1559
+ "grad_norm": 1.2594960927963257,
1560
+ "learning_rate": 8.246153846153848e-06,
1561
+ "loss": 0.0964,
1562
+ "step": 1910
1563
+ },
1564
+ {
1565
+ "epoch": 29.53846153846154,
1566
+ "grad_norm": 0.13048137724399567,
1567
+ "learning_rate": 8.184615384615385e-06,
1568
+ "loss": 0.0111,
1569
+ "step": 1920
1570
+ },
1571
+ {
1572
+ "epoch": 29.692307692307693,
1573
+ "grad_norm": 0.38255247473716736,
1574
+ "learning_rate": 8.123076923076924e-06,
1575
+ "loss": 0.0292,
1576
+ "step": 1930
1577
+ },
1578
+ {
1579
+ "epoch": 29.846153846153847,
1580
+ "grad_norm": 2.1401822566986084,
1581
+ "learning_rate": 8.061538461538463e-06,
1582
+ "loss": 0.0417,
1583
+ "step": 1940
1584
+ },
1585
+ {
1586
+ "epoch": 30.0,
1587
+ "grad_norm": 0.2889564633369446,
1588
+ "learning_rate": 8.000000000000001e-06,
1589
+ "loss": 0.0892,
1590
+ "step": 1950
1591
+ },
1592
+ {
1593
+ "epoch": 30.0,
1594
+ "eval_accuracy": 0.9924812030075187,
1595
+ "eval_loss": 0.020448315888643265,
1596
+ "eval_runtime": 0.5776,
1597
+ "eval_samples_per_second": 230.273,
1598
+ "eval_steps_per_second": 15.582,
1599
+ "step": 1950
1600
+ },
1601
+ {
1602
+ "epoch": 30.153846153846153,
1603
+ "grad_norm": 3.4330976009368896,
1604
+ "learning_rate": 7.93846153846154e-06,
1605
+ "loss": 0.0466,
1606
+ "step": 1960
1607
+ },
1608
+ {
1609
+ "epoch": 30.307692307692307,
1610
+ "grad_norm": 0.05678678676486015,
1611
+ "learning_rate": 7.876923076923077e-06,
1612
+ "loss": 0.0701,
1613
+ "step": 1970
1614
+ },
1615
+ {
1616
+ "epoch": 30.46153846153846,
1617
+ "grad_norm": 0.05310628563165665,
1618
+ "learning_rate": 7.815384615384616e-06,
1619
+ "loss": 0.0118,
1620
+ "step": 1980
1621
+ },
1622
+ {
1623
+ "epoch": 30.615384615384617,
1624
+ "grad_norm": 0.07134439796209335,
1625
+ "learning_rate": 7.753846153846155e-06,
1626
+ "loss": 0.0412,
1627
+ "step": 1990
1628
+ },
1629
+ {
1630
+ "epoch": 30.76923076923077,
1631
+ "grad_norm": 0.06206020340323448,
1632
+ "learning_rate": 7.692307692307694e-06,
1633
+ "loss": 0.0254,
1634
+ "step": 2000
1635
+ },
1636
+ {
1637
+ "epoch": 30.923076923076923,
1638
+ "grad_norm": 0.0549427792429924,
1639
+ "learning_rate": 7.630769230769232e-06,
1640
+ "loss": 0.0371,
1641
+ "step": 2010
1642
+ },
1643
+ {
1644
+ "epoch": 31.0,
1645
+ "eval_accuracy": 0.9849624060150376,
1646
+ "eval_loss": 0.03390338271856308,
1647
+ "eval_runtime": 0.6391,
1648
+ "eval_samples_per_second": 208.12,
1649
+ "eval_steps_per_second": 14.083,
1650
+ "step": 2015
1651
+ },
1652
+ {
1653
+ "epoch": 31.076923076923077,
1654
+ "grad_norm": 0.05257405340671539,
1655
+ "learning_rate": 7.5692307692307695e-06,
1656
+ "loss": 0.0119,
1657
+ "step": 2020
1658
+ },
1659
+ {
1660
+ "epoch": 31.23076923076923,
1661
+ "grad_norm": 1.0072262287139893,
1662
+ "learning_rate": 7.507692307692308e-06,
1663
+ "loss": 0.0131,
1664
+ "step": 2030
1665
+ },
1666
+ {
1667
+ "epoch": 31.384615384615383,
1668
+ "grad_norm": 0.054051704704761505,
1669
+ "learning_rate": 7.446153846153846e-06,
1670
+ "loss": 0.0309,
1671
+ "step": 2040
1672
+ },
1673
+ {
1674
+ "epoch": 31.53846153846154,
1675
+ "grad_norm": 0.059842586517333984,
1676
+ "learning_rate": 7.384615384615386e-06,
1677
+ "loss": 0.0699,
1678
+ "step": 2050
1679
+ },
1680
+ {
1681
+ "epoch": 31.692307692307693,
1682
+ "grad_norm": 0.4310505986213684,
1683
+ "learning_rate": 7.323076923076924e-06,
1684
+ "loss": 0.0105,
1685
+ "step": 2060
1686
+ },
1687
+ {
1688
+ "epoch": 31.846153846153847,
1689
+ "grad_norm": 0.05525004491209984,
1690
+ "learning_rate": 7.261538461538462e-06,
1691
+ "loss": 0.0654,
1692
+ "step": 2070
1693
+ },
1694
+ {
1695
+ "epoch": 32.0,
1696
+ "grad_norm": 0.06051107123494148,
1697
+ "learning_rate": 7.2000000000000005e-06,
1698
+ "loss": 0.0105,
1699
+ "step": 2080
1700
+ },
1701
+ {
1702
+ "epoch": 32.0,
1703
+ "eval_accuracy": 1.0,
1704
+ "eval_loss": 0.01434730738401413,
1705
+ "eval_runtime": 0.5855,
1706
+ "eval_samples_per_second": 227.144,
1707
+ "eval_steps_per_second": 15.371,
1708
+ "step": 2080
1709
+ },
1710
+ {
1711
+ "epoch": 32.15384615384615,
1712
+ "grad_norm": 0.04868883639574051,
1713
+ "learning_rate": 7.1384615384615385e-06,
1714
+ "loss": 0.032,
1715
+ "step": 2090
1716
+ },
1717
+ {
1718
+ "epoch": 32.30769230769231,
1719
+ "grad_norm": 0.10859151929616928,
1720
+ "learning_rate": 7.076923076923078e-06,
1721
+ "loss": 0.084,
1722
+ "step": 2100
1723
+ },
1724
+ {
1725
+ "epoch": 32.46153846153846,
1726
+ "grad_norm": 0.05709298700094223,
1727
+ "learning_rate": 7.015384615384616e-06,
1728
+ "loss": 0.0189,
1729
+ "step": 2110
1730
+ },
1731
+ {
1732
+ "epoch": 32.61538461538461,
1733
+ "grad_norm": 0.08583523333072662,
1734
+ "learning_rate": 6.953846153846154e-06,
1735
+ "loss": 0.0124,
1736
+ "step": 2120
1737
+ },
1738
+ {
1739
+ "epoch": 32.76923076923077,
1740
+ "grad_norm": 0.7491576671600342,
1741
+ "learning_rate": 6.892307692307693e-06,
1742
+ "loss": 0.0213,
1743
+ "step": 2130
1744
+ },
1745
+ {
1746
+ "epoch": 32.92307692307692,
1747
+ "grad_norm": 0.3305934965610504,
1748
+ "learning_rate": 6.830769230769231e-06,
1749
+ "loss": 0.0563,
1750
+ "step": 2140
1751
+ },
1752
+ {
1753
+ "epoch": 33.0,
1754
+ "eval_accuracy": 1.0,
1755
+ "eval_loss": 0.014035705476999283,
1756
+ "eval_runtime": 0.6445,
1757
+ "eval_samples_per_second": 206.373,
1758
+ "eval_steps_per_second": 13.965,
1759
+ "step": 2145
1760
+ },
1761
+ {
1762
+ "epoch": 33.07692307692308,
1763
+ "grad_norm": 0.049748744815588,
1764
+ "learning_rate": 6.76923076923077e-06,
1765
+ "loss": 0.0104,
1766
+ "step": 2150
1767
+ },
1768
+ {
1769
+ "epoch": 33.23076923076923,
1770
+ "grad_norm": 0.05033630132675171,
1771
+ "learning_rate": 6.707692307692308e-06,
1772
+ "loss": 0.0548,
1773
+ "step": 2160
1774
+ },
1775
+ {
1776
+ "epoch": 33.38461538461539,
1777
+ "grad_norm": 0.054612692445516586,
1778
+ "learning_rate": 6.646153846153846e-06,
1779
+ "loss": 0.0356,
1780
+ "step": 2170
1781
+ },
1782
+ {
1783
+ "epoch": 33.53846153846154,
1784
+ "grad_norm": 0.05341866612434387,
1785
+ "learning_rate": 6.584615384615385e-06,
1786
+ "loss": 0.0133,
1787
+ "step": 2180
1788
+ },
1789
+ {
1790
+ "epoch": 33.69230769230769,
1791
+ "grad_norm": 0.04895515367388725,
1792
+ "learning_rate": 6.523076923076923e-06,
1793
+ "loss": 0.0213,
1794
+ "step": 2190
1795
+ },
1796
+ {
1797
+ "epoch": 33.84615384615385,
1798
+ "grad_norm": 0.063107430934906,
1799
+ "learning_rate": 6.461538461538463e-06,
1800
+ "loss": 0.0112,
1801
+ "step": 2200
1802
+ },
1803
+ {
1804
+ "epoch": 34.0,
1805
+ "grad_norm": 7.929271221160889,
1806
+ "learning_rate": 6.4000000000000006e-06,
1807
+ "loss": 0.0573,
1808
+ "step": 2210
1809
+ },
1810
+ {
1811
+ "epoch": 34.0,
1812
+ "eval_accuracy": 1.0,
1813
+ "eval_loss": 0.010156131349503994,
1814
+ "eval_runtime": 0.6308,
1815
+ "eval_samples_per_second": 210.855,
1816
+ "eval_steps_per_second": 14.268,
1817
+ "step": 2210
1818
+ },
1819
+ {
1820
+ "epoch": 34.15384615384615,
1821
+ "grad_norm": 0.11311448365449905,
1822
+ "learning_rate": 6.3384615384615385e-06,
1823
+ "loss": 0.0271,
1824
+ "step": 2220
1825
+ },
1826
+ {
1827
+ "epoch": 34.30769230769231,
1828
+ "grad_norm": 0.17935284972190857,
1829
+ "learning_rate": 6.276923076923077e-06,
1830
+ "loss": 0.0444,
1831
+ "step": 2230
1832
+ },
1833
+ {
1834
+ "epoch": 34.46153846153846,
1835
+ "grad_norm": 0.07903819531202316,
1836
+ "learning_rate": 6.215384615384615e-06,
1837
+ "loss": 0.039,
1838
+ "step": 2240
1839
+ },
1840
+ {
1841
+ "epoch": 34.61538461538461,
1842
+ "grad_norm": 0.07042822986841202,
1843
+ "learning_rate": 6.153846153846155e-06,
1844
+ "loss": 0.0617,
1845
+ "step": 2250
1846
+ },
1847
+ {
1848
+ "epoch": 34.76923076923077,
1849
+ "grad_norm": 0.05035420507192612,
1850
+ "learning_rate": 6.092307692307693e-06,
1851
+ "loss": 0.0505,
1852
+ "step": 2260
1853
+ },
1854
+ {
1855
+ "epoch": 34.92307692307692,
1856
+ "grad_norm": 0.04776820167899132,
1857
+ "learning_rate": 6.030769230769231e-06,
1858
+ "loss": 0.0409,
1859
+ "step": 2270
1860
+ },
1861
+ {
1862
+ "epoch": 35.0,
1863
+ "eval_accuracy": 1.0,
1864
+ "eval_loss": 0.009572061710059643,
1865
+ "eval_runtime": 0.6399,
1866
+ "eval_samples_per_second": 207.852,
1867
+ "eval_steps_per_second": 14.065,
1868
+ "step": 2275
1869
+ },
1870
+ {
1871
+ "epoch": 35.07692307692308,
1872
+ "grad_norm": 20.547210693359375,
1873
+ "learning_rate": 5.9692307692307695e-06,
1874
+ "loss": 0.0212,
1875
+ "step": 2280
1876
+ },
1877
+ {
1878
+ "epoch": 35.23076923076923,
1879
+ "grad_norm": 16.86725616455078,
1880
+ "learning_rate": 5.907692307692308e-06,
1881
+ "loss": 0.0421,
1882
+ "step": 2290
1883
+ },
1884
+ {
1885
+ "epoch": 35.38461538461539,
1886
+ "grad_norm": 1.5179917812347412,
1887
+ "learning_rate": 5.846153846153847e-06,
1888
+ "loss": 0.1226,
1889
+ "step": 2300
1890
+ },
1891
+ {
1892
+ "epoch": 35.53846153846154,
1893
+ "grad_norm": 0.045909151434898376,
1894
+ "learning_rate": 5.784615384615385e-06,
1895
+ "loss": 0.0092,
1896
+ "step": 2310
1897
+ },
1898
+ {
1899
+ "epoch": 35.69230769230769,
1900
+ "grad_norm": 0.04946780204772949,
1901
+ "learning_rate": 5.723076923076923e-06,
1902
+ "loss": 0.0505,
1903
+ "step": 2320
1904
+ },
1905
+ {
1906
+ "epoch": 35.84615384615385,
1907
+ "grad_norm": 0.055896684527397156,
1908
+ "learning_rate": 5.661538461538462e-06,
1909
+ "loss": 0.0159,
1910
+ "step": 2330
1911
+ },
1912
+ {
1913
+ "epoch": 36.0,
1914
+ "grad_norm": 0.04570392891764641,
1915
+ "learning_rate": 5.600000000000001e-06,
1916
+ "loss": 0.0523,
1917
+ "step": 2340
1918
+ },
1919
+ {
1920
+ "epoch": 36.0,
1921
+ "eval_accuracy": 0.9924812030075187,
1922
+ "eval_loss": 0.01487450860440731,
1923
+ "eval_runtime": 0.6368,
1924
+ "eval_samples_per_second": 208.852,
1925
+ "eval_steps_per_second": 14.133,
1926
+ "step": 2340
1927
+ },
1928
+ {
1929
+ "epoch": 36.15384615384615,
1930
+ "grad_norm": 0.06980939954519272,
1931
+ "learning_rate": 5.538461538461539e-06,
1932
+ "loss": 0.0219,
1933
+ "step": 2350
1934
+ },
1935
+ {
1936
+ "epoch": 36.30769230769231,
1937
+ "grad_norm": 0.05014393478631973,
1938
+ "learning_rate": 5.476923076923077e-06,
1939
+ "loss": 0.0352,
1940
+ "step": 2360
1941
+ },
1942
+ {
1943
+ "epoch": 36.46153846153846,
1944
+ "grad_norm": 0.04635272175073624,
1945
+ "learning_rate": 5.415384615384615e-06,
1946
+ "loss": 0.0328,
1947
+ "step": 2370
1948
+ },
1949
+ {
1950
+ "epoch": 36.61538461538461,
1951
+ "grad_norm": 0.04787183925509453,
1952
+ "learning_rate": 5.353846153846154e-06,
1953
+ "loss": 0.0106,
1954
+ "step": 2380
1955
+ },
1956
+ {
1957
+ "epoch": 36.76923076923077,
1958
+ "grad_norm": 0.05444607138633728,
1959
+ "learning_rate": 5.292307692307693e-06,
1960
+ "loss": 0.0443,
1961
+ "step": 2390
1962
+ },
1963
+ {
1964
+ "epoch": 36.92307692307692,
1965
+ "grad_norm": 0.046256501227617264,
1966
+ "learning_rate": 5.230769230769232e-06,
1967
+ "loss": 0.0131,
1968
+ "step": 2400
1969
+ },
1970
+ {
1971
+ "epoch": 37.0,
1972
+ "eval_accuracy": 0.9924812030075187,
1973
+ "eval_loss": 0.0196556244045496,
1974
+ "eval_runtime": 0.641,
1975
+ "eval_samples_per_second": 207.498,
1976
+ "eval_steps_per_second": 14.041,
1977
+ "step": 2405
1978
+ },
1979
+ {
1980
+ "epoch": 37.07692307692308,
1981
+ "grad_norm": 0.045623164623975754,
1982
+ "learning_rate": 5.16923076923077e-06,
1983
+ "loss": 0.0112,
1984
+ "step": 2410
1985
+ },
1986
+ {
1987
+ "epoch": 37.23076923076923,
1988
+ "grad_norm": 0.04630188271403313,
1989
+ "learning_rate": 5.1076923076923075e-06,
1990
+ "loss": 0.0129,
1991
+ "step": 2420
1992
+ },
1993
+ {
1994
+ "epoch": 37.38461538461539,
1995
+ "grad_norm": 1.0303974151611328,
1996
+ "learning_rate": 5.046153846153846e-06,
1997
+ "loss": 0.0523,
1998
+ "step": 2430
1999
+ },
2000
+ {
2001
+ "epoch": 37.53846153846154,
2002
+ "grad_norm": 0.8218058347702026,
2003
+ "learning_rate": 4.984615384615385e-06,
2004
+ "loss": 0.0532,
2005
+ "step": 2440
2006
+ },
2007
+ {
2008
+ "epoch": 37.69230769230769,
2009
+ "grad_norm": 1.5203208923339844,
2010
+ "learning_rate": 4.923076923076924e-06,
2011
+ "loss": 0.0345,
2012
+ "step": 2450
2013
+ },
2014
+ {
2015
+ "epoch": 37.84615384615385,
2016
+ "grad_norm": 0.05802327021956444,
2017
+ "learning_rate": 4.861538461538462e-06,
2018
+ "loss": 0.0251,
2019
+ "step": 2460
2020
+ },
2021
+ {
2022
+ "epoch": 38.0,
2023
+ "grad_norm": 0.06536999344825745,
2024
+ "learning_rate": 4.800000000000001e-06,
2025
+ "loss": 0.0329,
2026
+ "step": 2470
2027
+ },
2028
+ {
2029
+ "epoch": 38.0,
2030
+ "eval_accuracy": 1.0,
2031
+ "eval_loss": 0.010934116318821907,
2032
+ "eval_runtime": 0.6397,
2033
+ "eval_samples_per_second": 207.921,
2034
+ "eval_steps_per_second": 14.07,
2035
+ "step": 2470
2036
+ },
2037
+ {
2038
+ "epoch": 38.15384615384615,
2039
+ "grad_norm": 0.07708264887332916,
2040
+ "learning_rate": 4.738461538461539e-06,
2041
+ "loss": 0.0339,
2042
+ "step": 2480
2043
+ },
2044
+ {
2045
+ "epoch": 38.30769230769231,
2046
+ "grad_norm": 0.05018337070941925,
2047
+ "learning_rate": 4.676923076923077e-06,
2048
+ "loss": 0.0371,
2049
+ "step": 2490
2050
+ },
2051
+ {
2052
+ "epoch": 38.46153846153846,
2053
+ "grad_norm": 2.005122423171997,
2054
+ "learning_rate": 4.615384615384616e-06,
2055
+ "loss": 0.0493,
2056
+ "step": 2500
2057
+ },
2058
+ {
2059
+ "epoch": 38.61538461538461,
2060
+ "grad_norm": 0.04191539064049721,
2061
+ "learning_rate": 4.553846153846154e-06,
2062
+ "loss": 0.056,
2063
+ "step": 2510
2064
+ },
2065
+ {
2066
+ "epoch": 38.76923076923077,
2067
+ "grad_norm": 0.08912540227174759,
2068
+ "learning_rate": 4.492307692307693e-06,
2069
+ "loss": 0.0675,
2070
+ "step": 2520
2071
+ },
2072
+ {
2073
+ "epoch": 38.92307692307692,
2074
+ "grad_norm": 4.123304843902588,
2075
+ "learning_rate": 4.430769230769232e-06,
2076
+ "loss": 0.0577,
2077
+ "step": 2530
2078
+ },
2079
+ {
2080
+ "epoch": 39.0,
2081
+ "eval_accuracy": 1.0,
2082
+ "eval_loss": 0.00963473692536354,
2083
+ "eval_runtime": 0.6269,
2084
+ "eval_samples_per_second": 212.155,
2085
+ "eval_steps_per_second": 14.356,
2086
+ "step": 2535
2087
+ },
2088
+ {
2089
+ "epoch": 39.07692307692308,
2090
+ "grad_norm": 0.12956681847572327,
2091
+ "learning_rate": 4.36923076923077e-06,
2092
+ "loss": 0.0348,
2093
+ "step": 2540
2094
+ },
2095
+ {
2096
+ "epoch": 39.23076923076923,
2097
+ "grad_norm": 0.0469551756978035,
2098
+ "learning_rate": 4.307692307692308e-06,
2099
+ "loss": 0.047,
2100
+ "step": 2550
2101
+ },
2102
+ {
2103
+ "epoch": 39.38461538461539,
2104
+ "grad_norm": 0.0566897876560688,
2105
+ "learning_rate": 4.246153846153846e-06,
2106
+ "loss": 0.0305,
2107
+ "step": 2560
2108
+ },
2109
+ {
2110
+ "epoch": 39.53846153846154,
2111
+ "grad_norm": 0.04538924619555473,
2112
+ "learning_rate": 4.184615384615385e-06,
2113
+ "loss": 0.0083,
2114
+ "step": 2570
2115
+ },
2116
+ {
2117
+ "epoch": 39.69230769230769,
2118
+ "grad_norm": 0.1393657773733139,
2119
+ "learning_rate": 4.123076923076924e-06,
2120
+ "loss": 0.0087,
2121
+ "step": 2580
2122
+ },
2123
+ {
2124
+ "epoch": 39.84615384615385,
2125
+ "grad_norm": 0.04170211777091026,
2126
+ "learning_rate": 4.061538461538462e-06,
2127
+ "loss": 0.008,
2128
+ "step": 2590
2129
+ },
2130
+ {
2131
+ "epoch": 40.0,
2132
+ "grad_norm": 0.04205217584967613,
2133
+ "learning_rate": 4.000000000000001e-06,
2134
+ "loss": 0.0085,
2135
+ "step": 2600
2136
+ },
2137
+ {
2138
+ "epoch": 40.0,
2139
+ "eval_accuracy": 0.9924812030075187,
2140
+ "eval_loss": 0.014666187576949596,
2141
+ "eval_runtime": 0.5786,
2142
+ "eval_samples_per_second": 229.849,
2143
+ "eval_steps_per_second": 15.554,
2144
+ "step": 2600
2145
+ },
2146
+ {
2147
+ "epoch": 40.15384615384615,
2148
+ "grad_norm": 0.04466895014047623,
2149
+ "learning_rate": 3.938461538461539e-06,
2150
+ "loss": 0.0376,
2151
+ "step": 2610
2152
+ },
2153
+ {
2154
+ "epoch": 40.30769230769231,
2155
+ "grad_norm": 0.04949569329619408,
2156
+ "learning_rate": 3.876923076923077e-06,
2157
+ "loss": 0.0621,
2158
+ "step": 2620
2159
+ },
2160
+ {
2161
+ "epoch": 40.46153846153846,
2162
+ "grad_norm": 0.0461997464299202,
2163
+ "learning_rate": 3.815384615384616e-06,
2164
+ "loss": 0.0107,
2165
+ "step": 2630
2166
+ },
2167
+ {
2168
+ "epoch": 40.61538461538461,
2169
+ "grad_norm": 0.048004575073719025,
2170
+ "learning_rate": 3.753846153846154e-06,
2171
+ "loss": 0.0093,
2172
+ "step": 2640
2173
+ },
2174
+ {
2175
+ "epoch": 40.76923076923077,
2176
+ "grad_norm": 0.04209740087389946,
2177
+ "learning_rate": 3.692307692307693e-06,
2178
+ "loss": 0.0342,
2179
+ "step": 2650
2180
+ },
2181
+ {
2182
+ "epoch": 40.92307692307692,
2183
+ "grad_norm": 0.15323477983474731,
2184
+ "learning_rate": 3.630769230769231e-06,
2185
+ "loss": 0.0618,
2186
+ "step": 2660
2187
+ },
2188
+ {
2189
+ "epoch": 41.0,
2190
+ "eval_accuracy": 1.0,
2191
+ "eval_loss": 0.009433195926249027,
2192
+ "eval_runtime": 0.6376,
2193
+ "eval_samples_per_second": 208.608,
2194
+ "eval_steps_per_second": 14.116,
2195
+ "step": 2665
2196
+ },
2197
+ {
2198
+ "epoch": 41.07692307692308,
2199
+ "grad_norm": 0.040465518832206726,
2200
+ "learning_rate": 3.5692307692307692e-06,
2201
+ "loss": 0.0079,
2202
+ "step": 2670
2203
+ },
2204
+ {
2205
+ "epoch": 41.23076923076923,
2206
+ "grad_norm": 0.06956275552511215,
2207
+ "learning_rate": 3.507692307692308e-06,
2208
+ "loss": 0.0278,
2209
+ "step": 2680
2210
+ },
2211
+ {
2212
+ "epoch": 41.38461538461539,
2213
+ "grad_norm": 0.04409582540392876,
2214
+ "learning_rate": 3.4461538461538464e-06,
2215
+ "loss": 0.0079,
2216
+ "step": 2690
2217
+ },
2218
+ {
2219
+ "epoch": 41.53846153846154,
2220
+ "grad_norm": 13.665828704833984,
2221
+ "learning_rate": 3.384615384615385e-06,
2222
+ "loss": 0.0187,
2223
+ "step": 2700
2224
+ },
2225
+ {
2226
+ "epoch": 41.69230769230769,
2227
+ "grad_norm": 1.3187448978424072,
2228
+ "learning_rate": 3.323076923076923e-06,
2229
+ "loss": 0.1204,
2230
+ "step": 2710
2231
+ },
2232
+ {
2233
+ "epoch": 41.84615384615385,
2234
+ "grad_norm": 0.04300126060843468,
2235
+ "learning_rate": 3.2615384615384615e-06,
2236
+ "loss": 0.0198,
2237
+ "step": 2720
2238
+ },
2239
+ {
2240
+ "epoch": 42.0,
2241
+ "grad_norm": 0.0438438281416893,
2242
+ "learning_rate": 3.2000000000000003e-06,
2243
+ "loss": 0.0847,
2244
+ "step": 2730
2245
+ },
2246
+ {
2247
+ "epoch": 42.0,
2248
+ "eval_accuracy": 0.9924812030075187,
2249
+ "eval_loss": 0.019689705222845078,
2250
+ "eval_runtime": 0.634,
2251
+ "eval_samples_per_second": 209.767,
2252
+ "eval_steps_per_second": 14.195,
2253
+ "step": 2730
2254
+ },
2255
+ {
2256
+ "epoch": 42.15384615384615,
2257
+ "grad_norm": 0.055344920605421066,
2258
+ "learning_rate": 3.1384615384615386e-06,
2259
+ "loss": 0.0343,
2260
+ "step": 2740
2261
+ },
2262
+ {
2263
+ "epoch": 42.30769230769231,
2264
+ "grad_norm": 0.04840540513396263,
2265
+ "learning_rate": 3.0769230769230774e-06,
2266
+ "loss": 0.0091,
2267
+ "step": 2750
2268
+ },
2269
+ {
2270
+ "epoch": 42.46153846153846,
2271
+ "grad_norm": 0.0416325181722641,
2272
+ "learning_rate": 3.0153846153846154e-06,
2273
+ "loss": 0.0379,
2274
+ "step": 2760
2275
+ },
2276
+ {
2277
+ "epoch": 42.61538461538461,
2278
+ "grad_norm": 0.04047630727291107,
2279
+ "learning_rate": 2.953846153846154e-06,
2280
+ "loss": 0.0344,
2281
+ "step": 2770
2282
+ },
2283
+ {
2284
+ "epoch": 42.76923076923077,
2285
+ "grad_norm": 0.039694271981716156,
2286
+ "learning_rate": 2.8923076923076925e-06,
2287
+ "loss": 0.0556,
2288
+ "step": 2780
2289
+ },
2290
+ {
2291
+ "epoch": 42.92307692307692,
2292
+ "grad_norm": 0.0425509512424469,
2293
+ "learning_rate": 2.830769230769231e-06,
2294
+ "loss": 0.0291,
2295
+ "step": 2790
2296
+ },
2297
+ {
2298
+ "epoch": 43.0,
2299
+ "eval_accuracy": 1.0,
2300
+ "eval_loss": 0.008893251419067383,
2301
+ "eval_runtime": 0.6271,
2302
+ "eval_samples_per_second": 212.079,
2303
+ "eval_steps_per_second": 14.351,
2304
+ "step": 2795
2305
+ },
2306
+ {
2307
+ "epoch": 43.07692307692308,
2308
+ "grad_norm": 0.04988468438386917,
2309
+ "learning_rate": 2.7692307692307697e-06,
2310
+ "loss": 0.0369,
2311
+ "step": 2800
2312
+ },
2313
+ {
2314
+ "epoch": 43.23076923076923,
2315
+ "grad_norm": 0.07137361913919449,
2316
+ "learning_rate": 2.7076923076923076e-06,
2317
+ "loss": 0.0434,
2318
+ "step": 2810
2319
+ },
2320
+ {
2321
+ "epoch": 43.38461538461539,
2322
+ "grad_norm": 7.0051679611206055,
2323
+ "learning_rate": 2.6461538461538464e-06,
2324
+ "loss": 0.0291,
2325
+ "step": 2820
2326
+ },
2327
+ {
2328
+ "epoch": 43.53846153846154,
2329
+ "grad_norm": 0.045469850301742554,
2330
+ "learning_rate": 2.584615384615385e-06,
2331
+ "loss": 0.0338,
2332
+ "step": 2830
2333
+ },
2334
+ {
2335
+ "epoch": 43.69230769230769,
2336
+ "grad_norm": 0.08003593236207962,
2337
+ "learning_rate": 2.523076923076923e-06,
2338
+ "loss": 0.0099,
2339
+ "step": 2840
2340
+ },
2341
+ {
2342
+ "epoch": 43.84615384615385,
2343
+ "grad_norm": 0.04380409047007561,
2344
+ "learning_rate": 2.461538461538462e-06,
2345
+ "loss": 0.0111,
2346
+ "step": 2850
2347
+ },
2348
+ {
2349
+ "epoch": 44.0,
2350
+ "grad_norm": 13.31029224395752,
2351
+ "learning_rate": 2.4000000000000003e-06,
2352
+ "loss": 0.0568,
2353
+ "step": 2860
2354
+ },
2355
+ {
2356
+ "epoch": 44.0,
2357
+ "eval_accuracy": 1.0,
2358
+ "eval_loss": 0.008692615665495396,
2359
+ "eval_runtime": 0.585,
2360
+ "eval_samples_per_second": 227.347,
2361
+ "eval_steps_per_second": 15.384,
2362
+ "step": 2860
2363
+ },
2364
+ {
2365
+ "epoch": 44.15384615384615,
2366
+ "grad_norm": 0.04654600843787193,
2367
+ "learning_rate": 2.3384615384615387e-06,
2368
+ "loss": 0.0087,
2369
+ "step": 2870
2370
+ },
2371
+ {
2372
+ "epoch": 44.30769230769231,
2373
+ "grad_norm": 7.452500343322754,
2374
+ "learning_rate": 2.276923076923077e-06,
2375
+ "loss": 0.0108,
2376
+ "step": 2880
2377
+ },
2378
+ {
2379
+ "epoch": 44.46153846153846,
2380
+ "grad_norm": 13.458589553833008,
2381
+ "learning_rate": 2.215384615384616e-06,
2382
+ "loss": 0.0274,
2383
+ "step": 2890
2384
+ },
2385
+ {
2386
+ "epoch": 44.61538461538461,
2387
+ "grad_norm": 0.044014327228069305,
2388
+ "learning_rate": 2.153846153846154e-06,
2389
+ "loss": 0.0299,
2390
+ "step": 2900
2391
+ },
2392
+ {
2393
+ "epoch": 44.76923076923077,
2394
+ "grad_norm": 0.041860181838274,
2395
+ "learning_rate": 2.0923076923076926e-06,
2396
+ "loss": 0.0112,
2397
+ "step": 2910
2398
+ },
2399
+ {
2400
+ "epoch": 44.92307692307692,
2401
+ "grad_norm": 0.04078350216150284,
2402
+ "learning_rate": 2.030769230769231e-06,
2403
+ "loss": 0.0077,
2404
+ "step": 2920
2405
+ },
2406
+ {
2407
+ "epoch": 45.0,
2408
+ "eval_accuracy": 1.0,
2409
+ "eval_loss": 0.010402214713394642,
2410
+ "eval_runtime": 0.6383,
2411
+ "eval_samples_per_second": 208.376,
2412
+ "eval_steps_per_second": 14.101,
2413
+ "step": 2925
2414
+ },
2415
+ {
2416
+ "epoch": 45.07692307692308,
2417
+ "grad_norm": 0.7966273427009583,
2418
+ "learning_rate": 1.9692307692307693e-06,
2419
+ "loss": 0.0432,
2420
+ "step": 2930
2421
+ },
2422
+ {
2423
+ "epoch": 45.23076923076923,
2424
+ "grad_norm": 0.04070662334561348,
2425
+ "learning_rate": 1.907692307692308e-06,
2426
+ "loss": 0.0549,
2427
+ "step": 2940
2428
+ },
2429
+ {
2430
+ "epoch": 45.38461538461539,
2431
+ "grad_norm": 0.042289331555366516,
2432
+ "learning_rate": 1.8461538461538465e-06,
2433
+ "loss": 0.0364,
2434
+ "step": 2950
2435
+ },
2436
+ {
2437
+ "epoch": 45.53846153846154,
2438
+ "grad_norm": 0.04655339941382408,
2439
+ "learning_rate": 1.7846153846153846e-06,
2440
+ "loss": 0.0114,
2441
+ "step": 2960
2442
+ },
2443
+ {
2444
+ "epoch": 45.69230769230769,
2445
+ "grad_norm": 0.04026507958769798,
2446
+ "learning_rate": 1.7230769230769232e-06,
2447
+ "loss": 0.0078,
2448
+ "step": 2970
2449
+ },
2450
+ {
2451
+ "epoch": 45.84615384615385,
2452
+ "grad_norm": 0.048073675483465195,
2453
+ "learning_rate": 1.6615384615384616e-06,
2454
+ "loss": 0.0155,
2455
+ "step": 2980
2456
+ },
2457
+ {
2458
+ "epoch": 46.0,
2459
+ "grad_norm": 0.1250167340040207,
2460
+ "learning_rate": 1.6000000000000001e-06,
2461
+ "loss": 0.008,
2462
+ "step": 2990
2463
+ },
2464
+ {
2465
+ "epoch": 46.0,
2466
+ "eval_accuracy": 1.0,
2467
+ "eval_loss": 0.013788605108857155,
2468
+ "eval_runtime": 0.5799,
2469
+ "eval_samples_per_second": 229.355,
2470
+ "eval_steps_per_second": 15.52,
2471
+ "step": 2990
2472
+ },
2473
+ {
2474
+ "epoch": 46.15384615384615,
2475
+ "grad_norm": 0.3202461302280426,
2476
+ "learning_rate": 1.5384615384615387e-06,
2477
+ "loss": 0.0352,
2478
+ "step": 3000
2479
+ },
2480
+ {
2481
+ "epoch": 46.30769230769231,
2482
+ "grad_norm": 2.8946588039398193,
2483
+ "learning_rate": 1.476923076923077e-06,
2484
+ "loss": 0.0196,
2485
+ "step": 3010
2486
+ },
2487
+ {
2488
+ "epoch": 46.46153846153846,
2489
+ "grad_norm": 0.0777769535779953,
2490
+ "learning_rate": 1.4153846153846155e-06,
2491
+ "loss": 0.0079,
2492
+ "step": 3020
2493
+ },
2494
+ {
2495
+ "epoch": 46.61538461538461,
2496
+ "grad_norm": 1.6607468128204346,
2497
+ "learning_rate": 1.3538461538461538e-06,
2498
+ "loss": 0.0102,
2499
+ "step": 3030
2500
+ },
2501
+ {
2502
+ "epoch": 46.76923076923077,
2503
+ "grad_norm": 0.04541005194187164,
2504
+ "learning_rate": 1.2923076923076924e-06,
2505
+ "loss": 0.0085,
2506
+ "step": 3040
2507
+ },
2508
+ {
2509
+ "epoch": 46.92307692307692,
2510
+ "grad_norm": 0.041475191712379456,
2511
+ "learning_rate": 1.230769230769231e-06,
2512
+ "loss": 0.0272,
2513
+ "step": 3050
2514
+ },
2515
+ {
2516
+ "epoch": 47.0,
2517
+ "eval_accuracy": 1.0,
2518
+ "eval_loss": 0.00810349639505148,
2519
+ "eval_runtime": 0.6353,
2520
+ "eval_samples_per_second": 209.365,
2521
+ "eval_steps_per_second": 14.168,
2522
+ "step": 3055
2523
+ },
2524
+ {
2525
+ "epoch": 47.07692307692308,
2526
+ "grad_norm": 0.048281800001859665,
2527
+ "learning_rate": 1.1692307692307693e-06,
2528
+ "loss": 0.008,
2529
+ "step": 3060
2530
+ },
2531
+ {
2532
+ "epoch": 47.23076923076923,
2533
+ "grad_norm": 0.03975387290120125,
2534
+ "learning_rate": 1.107692307692308e-06,
2535
+ "loss": 0.0431,
2536
+ "step": 3070
2537
+ },
2538
+ {
2539
+ "epoch": 47.38461538461539,
2540
+ "grad_norm": 0.040405042469501495,
2541
+ "learning_rate": 1.0461538461538463e-06,
2542
+ "loss": 0.0355,
2543
+ "step": 3080
2544
+ },
2545
+ {
2546
+ "epoch": 47.53846153846154,
2547
+ "grad_norm": 0.04081344977021217,
2548
+ "learning_rate": 9.846153846153847e-07,
2549
+ "loss": 0.0078,
2550
+ "step": 3090
2551
+ },
2552
+ {
2553
+ "epoch": 47.69230769230769,
2554
+ "grad_norm": 0.045139458030462265,
2555
+ "learning_rate": 9.230769230769232e-07,
2556
+ "loss": 0.0096,
2557
+ "step": 3100
2558
+ },
2559
+ {
2560
+ "epoch": 47.84615384615385,
2561
+ "grad_norm": 0.06699339300394058,
2562
+ "learning_rate": 8.615384615384616e-07,
2563
+ "loss": 0.0402,
2564
+ "step": 3110
2565
+ },
2566
+ {
2567
+ "epoch": 48.0,
2568
+ "grad_norm": 0.04077847674489021,
2569
+ "learning_rate": 8.000000000000001e-07,
2570
+ "loss": 0.008,
2571
+ "step": 3120
2572
+ },
2573
+ {
2574
+ "epoch": 48.0,
2575
+ "eval_accuracy": 1.0,
2576
+ "eval_loss": 0.008442863821983337,
2577
+ "eval_runtime": 0.6316,
2578
+ "eval_samples_per_second": 210.576,
2579
+ "eval_steps_per_second": 14.25,
2580
+ "step": 3120
2581
+ },
2582
+ {
2583
+ "epoch": 48.15384615384615,
2584
+ "grad_norm": 0.14330124855041504,
2585
+ "learning_rate": 7.384615384615385e-07,
2586
+ "loss": 0.0396,
2587
+ "step": 3130
2588
+ },
2589
+ {
2590
+ "epoch": 48.30769230769231,
2591
+ "grad_norm": 0.04051917791366577,
2592
+ "learning_rate": 6.769230769230769e-07,
2593
+ "loss": 0.0613,
2594
+ "step": 3140
2595
+ },
2596
+ {
2597
+ "epoch": 48.46153846153846,
2598
+ "grad_norm": 0.03945121914148331,
2599
+ "learning_rate": 6.153846153846155e-07,
2600
+ "loss": 0.0092,
2601
+ "step": 3150
2602
+ },
2603
+ {
2604
+ "epoch": 48.61538461538461,
2605
+ "grad_norm": 0.04850227013230324,
2606
+ "learning_rate": 5.53846153846154e-07,
2607
+ "loss": 0.0395,
2608
+ "step": 3160
2609
+ },
2610
+ {
2611
+ "epoch": 48.76923076923077,
2612
+ "grad_norm": 0.611132800579071,
2613
+ "learning_rate": 4.923076923076923e-07,
2614
+ "loss": 0.015,
2615
+ "step": 3170
2616
+ },
2617
+ {
2618
+ "epoch": 48.92307692307692,
2619
+ "grad_norm": 0.039736129343509674,
2620
+ "learning_rate": 4.307692307692308e-07,
2621
+ "loss": 0.0112,
2622
+ "step": 3180
2623
+ },
2624
+ {
2625
+ "epoch": 49.0,
2626
+ "eval_accuracy": 1.0,
2627
+ "eval_loss": 0.008192874491214752,
2628
+ "eval_runtime": 0.594,
2629
+ "eval_samples_per_second": 223.907,
2630
+ "eval_steps_per_second": 15.152,
2631
+ "step": 3185
2632
+ },
2633
+ {
2634
+ "epoch": 49.07692307692308,
2635
+ "grad_norm": 0.0747324600815773,
2636
+ "learning_rate": 3.6923076923076927e-07,
2637
+ "loss": 0.018,
2638
+ "step": 3190
2639
+ },
2640
+ {
2641
+ "epoch": 49.23076923076923,
2642
+ "grad_norm": 0.03965931013226509,
2643
+ "learning_rate": 3.0769230769230774e-07,
2644
+ "loss": 0.0948,
2645
+ "step": 3200
2646
+ },
2647
+ {
2648
+ "epoch": 49.38461538461539,
2649
+ "grad_norm": 3.6046483516693115,
2650
+ "learning_rate": 2.4615384615384616e-07,
2651
+ "loss": 0.0302,
2652
+ "step": 3210
2653
+ },
2654
+ {
2655
+ "epoch": 49.53846153846154,
2656
+ "grad_norm": 0.03949074074625969,
2657
+ "learning_rate": 1.8461538461538464e-07,
2658
+ "loss": 0.0094,
2659
+ "step": 3220
2660
+ },
2661
+ {
2662
+ "epoch": 49.69230769230769,
2663
+ "grad_norm": 0.03980456292629242,
2664
+ "learning_rate": 1.2307692307692308e-07,
2665
+ "loss": 0.0388,
2666
+ "step": 3230
2667
+ },
2668
+ {
2669
+ "epoch": 49.84615384615385,
2670
+ "grad_norm": 8.264440536499023,
2671
+ "learning_rate": 6.153846153846154e-08,
2672
+ "loss": 0.0143,
2673
+ "step": 3240
2674
+ },
2675
+ {
2676
+ "epoch": 50.0,
2677
+ "grad_norm": 0.043733663856983185,
2678
+ "learning_rate": 0.0,
2679
+ "loss": 0.013,
2680
+ "step": 3250
2681
+ },
2682
+ {
2683
+ "epoch": 50.0,
2684
+ "eval_accuracy": 1.0,
2685
+ "eval_loss": 0.007947824895381927,
2686
+ "eval_runtime": 0.6285,
2687
+ "eval_samples_per_second": 211.608,
2688
+ "eval_steps_per_second": 14.319,
2689
+ "step": 3250
2690
+ },
2691
+ {
2692
+ "epoch": 50.0,
2693
+ "step": 3250,
2694
+ "total_flos": 3.6243328994998477e+18,
2695
+ "train_loss": 0.03743308119131968,
2696
+ "train_runtime": 550.1735,
2697
+ "train_samples_per_second": 93.97,
2698
+ "train_steps_per_second": 5.907
2699
  }
2700
  ],
2701
  "logging_steps": 10,
2702
+ "max_steps": 3250,
2703
  "num_input_tokens_seen": 0,
2704
+ "num_train_epochs": 50,
2705
  "save_steps": 500,
2706
  "stateful_callbacks": {
2707
  "TrainerControl": {
 
2715
  "attributes": {}
2716
  }
2717
  },
2718
+ "total_flos": 3.6243328994998477e+18,
2719
  "train_batch_size": 8,
2720
  "trial_name": null,
2721
  "trial_params": null