skrd3 committed (verified)
Commit 79ed931 · Parent(s): a07b418

Training in progress, step 375, checkpoint
last-checkpoint/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:59a26b2d6eb001c62d416f2e319486e7143ddfc05085d1588b1bcc566c552c19
+oid sha256:3ff898de4f88712fb50a5522ea47c598e5a58b6eb9f332c668b6443b61696a09
 size 2200119864
last-checkpoint/trainer_state.json CHANGED
@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.0536254865215511,
   "eval_steps": 1734,
-  "global_step": 93,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -675,6 +675,1988 @@
   "eval_samples_per_second": 130.347,
   "eval_steps_per_second": 2.803,
   "step": 93
   }
  ],
  "logging_steps": 1,
@@ -694,7 +2676,7 @@
    "attributes": {}
   }
  },
- "total_flos": 1.4185419194027213e+17,
 "train_batch_size": 60,
 "trial_name": null,
 "trial_params": null
 
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 0.21623180049012541,
  "eval_steps": 1734,
+ "global_step": 375,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,

  "eval_samples_per_second": 130.347,
  "eval_steps_per_second": 2.803,
  "step": 93
678
+ },
679
+ {
680
+ "epoch": 0.054202104656191435,
681
+ "grad_norm": 0.6640625,
682
+ "learning_rate": 9.995587006107607e-05,
683
+ "loss": 0.9885,
684
+ "step": 94
685
+ },
686
+ {
687
+ "epoch": 0.05477872279083177,
688
+ "grad_norm": 0.69921875,
689
+ "learning_rate": 9.995480051351141e-05,
690
+ "loss": 0.9397,
691
+ "step": 95
692
+ },
693
+ {
694
+ "epoch": 0.055355340925472105,
695
+ "grad_norm": 0.62109375,
696
+ "learning_rate": 9.99537181680915e-05,
697
+ "loss": 0.9706,
698
+ "step": 96
699
+ },
700
+ {
701
+ "epoch": 0.05593195906011244,
702
+ "grad_norm": 0.671875,
703
+ "learning_rate": 9.995262302521265e-05,
704
+ "loss": 0.9613,
705
+ "step": 97
706
+ },
707
+ {
708
+ "epoch": 0.056508577194752774,
709
+ "grad_norm": 0.67578125,
710
+ "learning_rate": 9.995151508527583e-05,
711
+ "loss": 0.9783,
712
+ "step": 98
713
+ },
714
+ {
715
+ "epoch": 0.05708519532939311,
716
+ "grad_norm": 0.64453125,
717
+ "learning_rate": 9.995039434868667e-05,
718
+ "loss": 0.9635,
719
+ "step": 99
720
+ },
721
+ {
722
+ "epoch": 0.05766181346403344,
723
+ "grad_norm": 0.62109375,
724
+ "learning_rate": 9.994926081585551e-05,
725
+ "loss": 0.95,
726
+ "step": 100
727
+ },
728
+ {
729
+ "epoch": 0.05823843159867378,
730
+ "grad_norm": 0.59375,
731
+ "learning_rate": 9.994811448719736e-05,
732
+ "loss": 0.9274,
733
+ "step": 101
734
+ },
735
+ {
736
+ "epoch": 0.05881504973331411,
737
+ "grad_norm": 0.64453125,
738
+ "learning_rate": 9.994695536313194e-05,
739
+ "loss": 0.964,
740
+ "step": 102
741
+ },
742
+ {
743
+ "epoch": 0.05939166786795445,
744
+ "grad_norm": 0.64453125,
745
+ "learning_rate": 9.994578344408361e-05,
746
+ "loss": 0.9469,
747
+ "step": 103
748
+ },
749
+ {
750
+ "epoch": 0.05996828600259478,
751
+ "grad_norm": 0.640625,
752
+ "learning_rate": 9.994459873048146e-05,
753
+ "loss": 0.979,
754
+ "step": 104
755
+ },
756
+ {
757
+ "epoch": 0.060544904137235116,
758
+ "grad_norm": 0.640625,
759
+ "learning_rate": 9.99434012227592e-05,
760
+ "loss": 0.9555,
761
+ "step": 105
762
+ },
763
+ {
764
+ "epoch": 0.06112152227187545,
765
+ "grad_norm": 0.6796875,
766
+ "learning_rate": 9.994219092135533e-05,
767
+ "loss": 0.9661,
768
+ "step": 106
769
+ },
770
+ {
771
+ "epoch": 0.061698140406515785,
772
+ "grad_norm": 0.640625,
773
+ "learning_rate": 9.994096782671293e-05,
774
+ "loss": 0.9808,
775
+ "step": 107
776
+ },
777
+ {
778
+ "epoch": 0.06227475854115612,
779
+ "grad_norm": 0.625,
780
+ "learning_rate": 9.993973193927984e-05,
781
+ "loss": 0.9517,
782
+ "step": 108
783
+ },
784
+ {
785
+ "epoch": 0.06285137667579645,
786
+ "grad_norm": 0.625,
787
+ "learning_rate": 9.993848325950852e-05,
788
+ "loss": 0.9701,
789
+ "step": 109
790
+ },
791
+ {
792
+ "epoch": 0.06342799481043679,
793
+ "grad_norm": 0.6484375,
794
+ "learning_rate": 9.993722178785616e-05,
795
+ "loss": 1.0023,
796
+ "step": 110
797
+ },
798
+ {
799
+ "epoch": 0.06400461294507712,
800
+ "grad_norm": 0.65234375,
801
+ "learning_rate": 9.993594752478461e-05,
802
+ "loss": 1.0208,
803
+ "step": 111
804
+ },
805
+ {
806
+ "epoch": 0.06458123107971746,
807
+ "grad_norm": 0.61328125,
808
+ "learning_rate": 9.993466047076041e-05,
809
+ "loss": 0.9391,
810
+ "step": 112
811
+ },
812
+ {
813
+ "epoch": 0.06515784921435779,
814
+ "grad_norm": 0.66015625,
815
+ "learning_rate": 9.99333606262548e-05,
816
+ "loss": 1.0017,
817
+ "step": 113
818
+ },
819
+ {
820
+ "epoch": 0.06573446734899813,
821
+ "grad_norm": 0.60546875,
822
+ "learning_rate": 9.993204799174367e-05,
823
+ "loss": 0.9599,
824
+ "step": 114
825
+ },
826
+ {
827
+ "epoch": 0.06631108548363845,
828
+ "grad_norm": 0.6640625,
829
+ "learning_rate": 9.993072256770759e-05,
830
+ "loss": 0.973,
831
+ "step": 115
832
+ },
833
+ {
834
+ "epoch": 0.0668877036182788,
835
+ "grad_norm": 0.609375,
836
+ "learning_rate": 9.992938435463189e-05,
837
+ "loss": 0.9892,
838
+ "step": 116
839
+ },
840
+ {
841
+ "epoch": 0.06746432175291912,
842
+ "grad_norm": 0.6484375,
843
+ "learning_rate": 9.992803335300647e-05,
844
+ "loss": 0.9819,
845
+ "step": 117
846
+ },
847
+ {
848
+ "epoch": 0.06804093988755947,
849
+ "grad_norm": 0.62109375,
850
+ "learning_rate": 9.992666956332599e-05,
851
+ "loss": 0.9517,
852
+ "step": 118
853
+ },
854
+ {
855
+ "epoch": 0.06861755802219979,
856
+ "grad_norm": 0.625,
857
+ "learning_rate": 9.992529298608977e-05,
858
+ "loss": 0.9475,
859
+ "step": 119
860
+ },
861
+ {
862
+ "epoch": 0.06919417615684013,
863
+ "grad_norm": 0.609375,
864
+ "learning_rate": 9.99239036218018e-05,
865
+ "loss": 0.9438,
866
+ "step": 120
867
+ },
868
+ {
869
+ "epoch": 0.06977079429148046,
870
+ "grad_norm": 0.61328125,
871
+ "learning_rate": 9.992250147097076e-05,
872
+ "loss": 0.9475,
873
+ "step": 121
874
+ },
875
+ {
876
+ "epoch": 0.0703474124261208,
877
+ "grad_norm": 0.625,
878
+ "learning_rate": 9.992108653411e-05,
879
+ "loss": 0.9157,
880
+ "step": 122
881
+ },
882
+ {
883
+ "epoch": 0.07092403056076113,
884
+ "grad_norm": 0.6171875,
885
+ "learning_rate": 9.991965881173764e-05,
886
+ "loss": 0.9425,
887
+ "step": 123
888
+ },
889
+ {
890
+ "epoch": 0.07150064869540147,
891
+ "grad_norm": 0.62890625,
892
+ "learning_rate": 9.991821830437631e-05,
893
+ "loss": 0.9465,
894
+ "step": 124
895
+ },
896
+ {
897
+ "epoch": 0.0720772668300418,
898
+ "grad_norm": 0.66015625,
899
+ "learning_rate": 9.991676501255346e-05,
900
+ "loss": 0.9809,
901
+ "step": 125
902
+ },
903
+ {
904
+ "epoch": 0.07265388496468214,
905
+ "grad_norm": 0.63671875,
906
+ "learning_rate": 9.991529893680121e-05,
907
+ "loss": 0.952,
908
+ "step": 126
909
+ },
910
+ {
911
+ "epoch": 0.07323050309932247,
912
+ "grad_norm": 0.640625,
913
+ "learning_rate": 9.991382007765626e-05,
914
+ "loss": 0.9734,
915
+ "step": 127
916
+ },
917
+ {
918
+ "epoch": 0.07380712123396281,
919
+ "grad_norm": 0.63671875,
920
+ "learning_rate": 9.991232843566011e-05,
921
+ "loss": 0.9436,
922
+ "step": 128
923
+ },
924
+ {
925
+ "epoch": 0.07438373936860314,
926
+ "grad_norm": 0.66015625,
927
+ "learning_rate": 9.991082401135887e-05,
928
+ "loss": 0.9497,
929
+ "step": 129
930
+ },
931
+ {
932
+ "epoch": 0.07496035750324348,
933
+ "grad_norm": 0.640625,
934
+ "learning_rate": 9.990930680530335e-05,
935
+ "loss": 0.973,
936
+ "step": 130
937
+ },
938
+ {
939
+ "epoch": 0.07553697563788381,
940
+ "grad_norm": 0.640625,
941
+ "learning_rate": 9.990777681804903e-05,
942
+ "loss": 0.9116,
943
+ "step": 131
944
+ },
945
+ {
946
+ "epoch": 0.07611359377252415,
947
+ "grad_norm": 0.640625,
948
+ "learning_rate": 9.990623405015612e-05,
949
+ "loss": 0.9332,
950
+ "step": 132
951
+ },
952
+ {
953
+ "epoch": 0.07669021190716448,
954
+ "grad_norm": 0.62109375,
955
+ "learning_rate": 9.990467850218938e-05,
956
+ "loss": 0.9892,
957
+ "step": 133
958
+ },
959
+ {
960
+ "epoch": 0.07726683004180482,
961
+ "grad_norm": 0.64453125,
962
+ "learning_rate": 9.990311017471842e-05,
963
+ "loss": 0.9529,
964
+ "step": 134
965
+ },
966
+ {
967
+ "epoch": 0.07784344817644515,
968
+ "grad_norm": 0.640625,
969
+ "learning_rate": 9.990152906831743e-05,
970
+ "loss": 0.9713,
971
+ "step": 135
972
+ },
973
+ {
974
+ "epoch": 0.07842006631108549,
975
+ "grad_norm": 0.62890625,
976
+ "learning_rate": 9.989993518356526e-05,
977
+ "loss": 0.9428,
978
+ "step": 136
979
+ },
980
+ {
981
+ "epoch": 0.07899668444572581,
982
+ "grad_norm": 0.6640625,
983
+ "learning_rate": 9.98983285210455e-05,
984
+ "loss": 0.9495,
985
+ "step": 137
986
+ },
987
+ {
988
+ "epoch": 0.07957330258036616,
989
+ "grad_norm": 0.6171875,
990
+ "learning_rate": 9.989670908134637e-05,
991
+ "loss": 0.9647,
992
+ "step": 138
993
+ },
994
+ {
995
+ "epoch": 0.08014992071500648,
996
+ "grad_norm": 0.671875,
997
+ "learning_rate": 9.989507686506082e-05,
998
+ "loss": 0.9841,
999
+ "step": 139
1000
+ },
1001
+ {
1002
+ "epoch": 0.08072653884964683,
1003
+ "grad_norm": 0.62109375,
1004
+ "learning_rate": 9.98934318727864e-05,
1005
+ "loss": 0.9468,
1006
+ "step": 140
1007
+ },
1008
+ {
1009
+ "epoch": 0.08130315698428715,
1010
+ "grad_norm": 0.625,
1011
+ "learning_rate": 9.989177410512543e-05,
1012
+ "loss": 0.9491,
1013
+ "step": 141
1014
+ },
1015
+ {
1016
+ "epoch": 0.0818797751189275,
1017
+ "grad_norm": 0.6015625,
1018
+ "learning_rate": 9.989010356268484e-05,
1019
+ "loss": 0.8968,
1020
+ "step": 142
1021
+ },
1022
+ {
1023
+ "epoch": 0.08245639325356782,
1024
+ "grad_norm": 0.63671875,
1025
+ "learning_rate": 9.988842024607625e-05,
1026
+ "loss": 0.9615,
1027
+ "step": 143
1028
+ },
1029
+ {
1030
+ "epoch": 0.08303301138820816,
1031
+ "grad_norm": 0.6484375,
1032
+ "learning_rate": 9.9886724155916e-05,
1033
+ "loss": 0.9638,
1034
+ "step": 144
1035
+ },
1036
+ {
1037
+ "epoch": 0.08360962952284849,
1038
+ "grad_norm": 0.63671875,
1039
+ "learning_rate": 9.988501529282504e-05,
1040
+ "loss": 0.9358,
1041
+ "step": 145
1042
+ },
1043
+ {
1044
+ "epoch": 0.08418624765748883,
1045
+ "grad_norm": 0.6875,
1046
+ "learning_rate": 9.988329365742903e-05,
1047
+ "loss": 0.9658,
1048
+ "step": 146
1049
+ },
1050
+ {
1051
+ "epoch": 0.08476286579212916,
1052
+ "grad_norm": 0.625,
1053
+ "learning_rate": 9.988155925035832e-05,
1054
+ "loss": 0.9569,
1055
+ "step": 147
1056
+ },
1057
+ {
1058
+ "epoch": 0.0853394839267695,
1059
+ "grad_norm": 0.6171875,
1060
+ "learning_rate": 9.987981207224793e-05,
1061
+ "loss": 0.9371,
1062
+ "step": 148
1063
+ },
1064
+ {
1065
+ "epoch": 0.08591610206140983,
1066
+ "grad_norm": 0.65234375,
1067
+ "learning_rate": 9.98780521237375e-05,
1068
+ "loss": 0.9796,
1069
+ "step": 149
1070
+ },
1071
+ {
1072
+ "epoch": 0.08649272019605017,
1073
+ "grad_norm": 0.6015625,
1074
+ "learning_rate": 9.987627940547145e-05,
1075
+ "loss": 0.9195,
1076
+ "step": 150
1077
+ },
1078
+ {
1079
+ "epoch": 0.0870693383306905,
1080
+ "grad_norm": 0.64453125,
1081
+ "learning_rate": 9.987449391809878e-05,
1082
+ "loss": 0.9468,
1083
+ "step": 151
1084
+ },
1085
+ {
1086
+ "epoch": 0.08764595646533084,
1087
+ "grad_norm": 0.66015625,
1088
+ "learning_rate": 9.987269566227322e-05,
1089
+ "loss": 0.9438,
1090
+ "step": 152
1091
+ },
1092
+ {
1093
+ "epoch": 0.08822257459997117,
1094
+ "grad_norm": 0.61328125,
1095
+ "learning_rate": 9.987088463865317e-05,
1096
+ "loss": 0.9673,
1097
+ "step": 153
1098
+ },
1099
+ {
1100
+ "epoch": 0.08879919273461151,
1101
+ "grad_norm": 0.6484375,
1102
+ "learning_rate": 9.986906084790164e-05,
1103
+ "loss": 0.9103,
1104
+ "step": 154
1105
+ },
1106
+ {
1107
+ "epoch": 0.08937581086925184,
1108
+ "grad_norm": 0.58203125,
1109
+ "learning_rate": 9.986722429068644e-05,
1110
+ "loss": 0.918,
1111
+ "step": 155
1112
+ },
1113
+ {
1114
+ "epoch": 0.08995242900389218,
1115
+ "grad_norm": 0.62890625,
1116
+ "learning_rate": 9.986537496767993e-05,
1117
+ "loss": 0.9399,
1118
+ "step": 156
1119
+ },
1120
+ {
1121
+ "epoch": 0.0905290471385325,
1122
+ "grad_norm": 0.6328125,
1123
+ "learning_rate": 9.986351287955922e-05,
1124
+ "loss": 0.9584,
1125
+ "step": 157
1126
+ },
1127
+ {
1128
+ "epoch": 0.09110566527317285,
1129
+ "grad_norm": 0.6484375,
1130
+ "learning_rate": 9.986163802700604e-05,
1131
+ "loss": 0.9801,
1132
+ "step": 158
1133
+ },
1134
+ {
1135
+ "epoch": 0.09168228340781318,
1136
+ "grad_norm": 0.6171875,
1137
+ "learning_rate": 9.985975041070683e-05,
1138
+ "loss": 0.9435,
1139
+ "step": 159
1140
+ },
1141
+ {
1142
+ "epoch": 0.09225890154245352,
1143
+ "grad_norm": 0.625,
1144
+ "learning_rate": 9.985785003135272e-05,
1145
+ "loss": 0.9362,
1146
+ "step": 160
1147
+ },
1148
+ {
1149
+ "epoch": 0.09283551967709384,
1150
+ "grad_norm": 0.6328125,
1151
+ "learning_rate": 9.985593688963948e-05,
1152
+ "loss": 1.0124,
1153
+ "step": 161
1154
+ },
1155
+ {
1156
+ "epoch": 0.09341213781173417,
1157
+ "grad_norm": 0.6015625,
1158
+ "learning_rate": 9.985401098626754e-05,
1159
+ "loss": 0.9245,
1160
+ "step": 162
1161
+ },
1162
+ {
1163
+ "epoch": 0.09398875594637451,
1164
+ "grad_norm": 0.59765625,
1165
+ "learning_rate": 9.985207232194205e-05,
1166
+ "loss": 0.9019,
1167
+ "step": 163
1168
+ },
1169
+ {
1170
+ "epoch": 0.09456537408101484,
1171
+ "grad_norm": 0.62890625,
1172
+ "learning_rate": 9.985012089737278e-05,
1173
+ "loss": 0.9756,
1174
+ "step": 164
1175
+ },
1176
+ {
1177
+ "epoch": 0.09514199221565518,
1178
+ "grad_norm": 0.60546875,
1179
+ "learning_rate": 9.98481567132742e-05,
1180
+ "loss": 0.9209,
1181
+ "step": 165
1182
+ },
1183
+ {
1184
+ "epoch": 0.09571861035029551,
1185
+ "grad_norm": 0.609375,
1186
+ "learning_rate": 9.98461797703655e-05,
1187
+ "loss": 0.9595,
1188
+ "step": 166
1189
+ },
1190
+ {
1191
+ "epoch": 0.09629522848493585,
1192
+ "grad_norm": 0.6328125,
1193
+ "learning_rate": 9.98441900693704e-05,
1194
+ "loss": 0.9633,
1195
+ "step": 167
1196
+ },
1197
+ {
1198
+ "epoch": 0.09687184661957618,
1199
+ "grad_norm": 0.65234375,
1200
+ "learning_rate": 9.984218761101744e-05,
1201
+ "loss": 0.9551,
1202
+ "step": 168
1203
+ },
1204
+ {
1205
+ "epoch": 0.09744846475421652,
1206
+ "grad_norm": 0.61328125,
1207
+ "learning_rate": 9.984017239603978e-05,
1208
+ "loss": 0.927,
1209
+ "step": 169
1210
+ },
1211
+ {
1212
+ "epoch": 0.09802508288885685,
1213
+ "grad_norm": 0.59375,
1214
+ "learning_rate": 9.98381444251752e-05,
1215
+ "loss": 0.9283,
1216
+ "step": 170
1217
+ },
1218
+ {
1219
+ "epoch": 0.09860170102349719,
1220
+ "grad_norm": 0.65625,
1221
+ "learning_rate": 9.983610369916621e-05,
1222
+ "loss": 0.9794,
1223
+ "step": 171
1224
+ },
1225
+ {
1226
+ "epoch": 0.09917831915813752,
1227
+ "grad_norm": 0.64453125,
1228
+ "learning_rate": 9.983405021875998e-05,
1229
+ "loss": 0.9629,
1230
+ "step": 172
1231
+ },
1232
+ {
1233
+ "epoch": 0.09975493729277786,
1234
+ "grad_norm": 0.6171875,
1235
+ "learning_rate": 9.983198398470834e-05,
1236
+ "loss": 0.9343,
1237
+ "step": 173
1238
+ },
1239
+ {
1240
+ "epoch": 0.10033155542741819,
1241
+ "grad_norm": 0.62109375,
1242
+ "learning_rate": 9.982990499776782e-05,
1243
+ "loss": 0.9536,
1244
+ "step": 174
1245
+ },
1246
+ {
1247
+ "epoch": 0.10090817356205853,
1248
+ "grad_norm": 0.625,
1249
+ "learning_rate": 9.982781325869952e-05,
1250
+ "loss": 0.9283,
1251
+ "step": 175
1252
+ },
1253
+ {
1254
+ "epoch": 0.10148479169669886,
1255
+ "grad_norm": 0.609375,
1256
+ "learning_rate": 9.982570876826934e-05,
1257
+ "loss": 0.9128,
1258
+ "step": 176
1259
+ },
1260
+ {
1261
+ "epoch": 0.1020614098313392,
1262
+ "grad_norm": 0.6328125,
1263
+ "learning_rate": 9.982359152724777e-05,
1264
+ "loss": 0.9672,
1265
+ "step": 177
1266
+ },
1267
+ {
1268
+ "epoch": 0.10263802796597953,
1269
+ "grad_norm": 0.6171875,
1270
+ "learning_rate": 9.982146153640997e-05,
1271
+ "loss": 0.9524,
1272
+ "step": 178
1273
+ },
1274
+ {
1275
+ "epoch": 0.10321464610061987,
1276
+ "grad_norm": 0.60546875,
1277
+ "learning_rate": 9.981931879653582e-05,
1278
+ "loss": 0.9426,
1279
+ "step": 179
1280
+ },
1281
+ {
1282
+ "epoch": 0.1037912642352602,
1283
+ "grad_norm": 0.63671875,
1284
+ "learning_rate": 9.981716330840977e-05,
1285
+ "loss": 0.9469,
1286
+ "step": 180
1287
+ },
1288
+ {
1289
+ "epoch": 0.10436788236990054,
1290
+ "grad_norm": 0.625,
1291
+ "learning_rate": 9.981499507282109e-05,
1292
+ "loss": 0.8872,
1293
+ "step": 181
1294
+ },
1295
+ {
1296
+ "epoch": 0.10494450050454086,
1297
+ "grad_norm": 0.60546875,
1298
+ "learning_rate": 9.981281409056358e-05,
1299
+ "loss": 0.9412,
1300
+ "step": 182
1301
+ },
1302
+ {
1303
+ "epoch": 0.1055211186391812,
1304
+ "grad_norm": 0.60546875,
1305
+ "learning_rate": 9.981062036243573e-05,
1306
+ "loss": 0.9138,
1307
+ "step": 183
1308
+ },
1309
+ {
1310
+ "epoch": 0.10609773677382153,
1311
+ "grad_norm": 0.6484375,
1312
+ "learning_rate": 9.980841388924076e-05,
1313
+ "loss": 0.9755,
1314
+ "step": 184
1315
+ },
1316
+ {
1317
+ "epoch": 0.10667435490846187,
1318
+ "grad_norm": 0.59765625,
1319
+ "learning_rate": 9.980619467178646e-05,
1320
+ "loss": 0.9032,
1321
+ "step": 185
1322
+ },
1323
+ {
1324
+ "epoch": 0.1072509730431022,
1325
+ "grad_norm": 0.625,
1326
+ "learning_rate": 9.980396271088544e-05,
1327
+ "loss": 0.9294,
1328
+ "step": 186
1329
+ },
1330
+ {
1331
+ "epoch": 0.10782759117774254,
1332
+ "grad_norm": 0.609375,
1333
+ "learning_rate": 9.98017180073548e-05,
1334
+ "loss": 0.9219,
1335
+ "step": 187
1336
+ },
1337
+ {
1338
+ "epoch": 0.10840420931238287,
1339
+ "grad_norm": 0.625,
1340
+ "learning_rate": 9.97994605620164e-05,
1341
+ "loss": 0.9394,
1342
+ "step": 188
1343
+ },
1344
+ {
1345
+ "epoch": 0.10898082744702321,
1346
+ "grad_norm": 0.58984375,
1347
+ "learning_rate": 9.979719037569677e-05,
1348
+ "loss": 0.9266,
1349
+ "step": 189
1350
+ },
1351
+ {
1352
+ "epoch": 0.10955744558166354,
1353
+ "grad_norm": 0.6015625,
1354
+ "learning_rate": 9.979490744922706e-05,
1355
+ "loss": 0.9293,
1356
+ "step": 190
1357
+ },
1358
+ {
1359
+ "epoch": 0.11013406371630388,
1360
+ "grad_norm": 0.6171875,
1361
+ "learning_rate": 9.979261178344313e-05,
1362
+ "loss": 0.9627,
1363
+ "step": 191
1364
+ },
1365
+ {
1366
+ "epoch": 0.11071068185094421,
1367
+ "grad_norm": 0.609375,
1368
+ "learning_rate": 9.979030337918544e-05,
1369
+ "loss": 0.9455,
1370
+ "step": 192
1371
+ },
1372
+ {
1373
+ "epoch": 0.11128729998558455,
1374
+ "grad_norm": 0.62890625,
1375
+ "learning_rate": 9.978798223729921e-05,
1376
+ "loss": 0.9105,
1377
+ "step": 193
1378
+ },
1379
+ {
1380
+ "epoch": 0.11186391812022488,
1381
+ "grad_norm": 0.625,
1382
+ "learning_rate": 9.978564835863424e-05,
1383
+ "loss": 0.9269,
1384
+ "step": 194
1385
+ },
1386
+ {
1387
+ "epoch": 0.11244053625486522,
1388
+ "grad_norm": 0.5859375,
1389
+ "learning_rate": 9.978330174404504e-05,
1390
+ "loss": 0.8918,
1391
+ "step": 195
1392
+ },
1393
+ {
1394
+ "epoch": 0.11301715438950555,
1395
+ "grad_norm": 0.64453125,
1396
+ "learning_rate": 9.978094239439072e-05,
1397
+ "loss": 0.9321,
1398
+ "step": 196
1399
+ },
1400
+ {
1401
+ "epoch": 0.11359377252414589,
1402
+ "grad_norm": 0.59765625,
1403
+ "learning_rate": 9.977857031053517e-05,
1404
+ "loss": 0.881,
1405
+ "step": 197
1406
+ },
1407
+ {
1408
+ "epoch": 0.11417039065878622,
1409
+ "grad_norm": 0.609375,
1410
+ "learning_rate": 9.977618549334684e-05,
1411
+ "loss": 0.9129,
1412
+ "step": 198
1413
+ },
1414
+ {
1415
+ "epoch": 0.11474700879342656,
1416
+ "grad_norm": 0.609375,
1417
+ "learning_rate": 9.977378794369885e-05,
1418
+ "loss": 0.9191,
1419
+ "step": 199
1420
+ },
1421
+ {
1422
+ "epoch": 0.11532362692806689,
1423
+ "grad_norm": 0.609375,
1424
+ "learning_rate": 9.977137766246905e-05,
1425
+ "loss": 0.9051,
1426
+ "step": 200
1427
+ },
1428
+ {
1429
+ "epoch": 0.11590024506270723,
1430
+ "grad_norm": 0.6015625,
1431
+ "learning_rate": 9.976895465053986e-05,
1432
+ "loss": 0.9171,
1433
+ "step": 201
1434
+ },
1435
+ {
1436
+ "epoch": 0.11647686319734755,
1437
+ "grad_norm": 0.62109375,
1438
+ "learning_rate": 9.976651890879842e-05,
1439
+ "loss": 0.9307,
1440
+ "step": 202
1441
+ },
1442
+ {
1443
+ "epoch": 0.1170534813319879,
1444
+ "grad_norm": 0.640625,
1445
+ "learning_rate": 9.976407043813654e-05,
1446
+ "loss": 1.0037,
1447
+ "step": 203
1448
+ },
1449
+ {
1450
+ "epoch": 0.11763009946662822,
1451
+ "grad_norm": 0.63671875,
1452
+ "learning_rate": 9.976160923945063e-05,
1453
+ "loss": 0.8872,
1454
+ "step": 204
1455
+ },
1456
+ {
1457
+ "epoch": 0.11820671760126857,
1458
+ "grad_norm": 0.58203125,
1459
+ "learning_rate": 9.975913531364185e-05,
1460
+ "loss": 0.9196,
1461
+ "step": 205
1462
+ },
1463
+ {
1464
+ "epoch": 0.1187833357359089,
1465
+ "grad_norm": 0.609375,
1466
+ "learning_rate": 9.975664866161594e-05,
1467
+ "loss": 0.9431,
1468
+ "step": 206
1469
+ },
1470
+ {
1471
+ "epoch": 0.11935995387054923,
1472
+ "grad_norm": 0.62890625,
1473
+ "learning_rate": 9.975414928428331e-05,
1474
+ "loss": 0.9603,
1475
+ "step": 207
1476
+ },
1477
+ {
1478
+ "epoch": 0.11993657200518956,
1479
+ "grad_norm": 0.62890625,
1480
+ "learning_rate": 9.975163718255906e-05,
1481
+ "loss": 0.9841,
1482
+ "step": 208
1483
+ },
1484
+ {
1485
+ "epoch": 0.1205131901398299,
1486
+ "grad_norm": 0.6171875,
1487
+ "learning_rate": 9.974911235736295e-05,
1488
+ "loss": 0.9446,
1489
+ "step": 209
1490
+ },
1491
+ {
1492
+ "epoch": 0.12108980827447023,
1493
+ "grad_norm": 0.6171875,
1494
+ "learning_rate": 9.974657480961938e-05,
1495
+ "loss": 0.9353,
1496
+ "step": 210
1497
+ },
1498
+ {
1499
+ "epoch": 0.12166642640911057,
1500
+ "grad_norm": 0.6171875,
1501
+ "learning_rate": 9.97440245402574e-05,
1502
+ "loss": 0.914,
1503
+ "step": 211
1504
+ },
1505
+ {
1506
+ "epoch": 0.1222430445437509,
1507
+ "grad_norm": 0.6484375,
1508
+ "learning_rate": 9.974146155021074e-05,
1509
+ "loss": 0.9604,
1510
+ "step": 212
1511
+ },
1512
+ {
1513
+ "epoch": 0.12281966267839124,
1514
+ "grad_norm": 0.609375,
1515
+ "learning_rate": 9.973888584041776e-05,
1516
+ "loss": 0.9225,
1517
+ "step": 213
1518
+ },
1519
+ {
1520
+ "epoch": 0.12339628081303157,
1521
+ "grad_norm": 0.5859375,
1522
+ "learning_rate": 9.973629741182151e-05,
1523
+ "loss": 0.8992,
1524
+ "step": 214
1525
+ },
1526
+ {
1527
+ "epoch": 0.12397289894767191,
1528
+ "grad_norm": 0.60546875,
1529
+ "learning_rate": 9.973369626536968e-05,
1530
+ "loss": 0.9262,
1531
+ "step": 215
1532
+ },
1533
+ {
1534
+ "epoch": 0.12454951708231224,
1535
+ "grad_norm": 0.59375,
1536
+ "learning_rate": 9.973108240201461e-05,
1537
+ "loss": 0.9602,
1538
+ "step": 216
1539
+ },
1540
+ {
1541
+ "epoch": 0.12512613521695257,
1542
+ "grad_norm": 0.62109375,
1543
+ "learning_rate": 9.97284558227133e-05,
1544
+ "loss": 0.9615,
1545
+ "step": 217
1546
+ },
1547
+ {
1548
+ "epoch": 0.1257027533515929,
1549
+ "grad_norm": 0.59375,
1550
+ "learning_rate": 9.972581652842743e-05,
1551
+ "loss": 0.9679,
1552
+ "step": 218
1553
+ },
1554
+ {
1555
+ "epoch": 0.12627937148623325,
1556
+ "grad_norm": 0.59375,
1557
+ "learning_rate": 9.972316452012327e-05,
1558
+ "loss": 0.9701,
1559
+ "step": 219
1560
+ },
1561
+ {
1562
+ "epoch": 0.12685598962087358,
1563
+ "grad_norm": 0.61328125,
1564
+ "learning_rate": 9.972049979877183e-05,
1565
+ "loss": 0.92,
1566
+ "step": 220
1567
+ },
1568
+ {
1569
+ "epoch": 0.1274326077555139,
1570
+ "grad_norm": 0.59375,
1571
+ "learning_rate": 9.971782236534872e-05,
1572
+ "loss": 0.9352,
1573
+ "step": 221
1574
+ },
1575
+ {
1576
+ "epoch": 0.12800922589015423,
1577
+ "grad_norm": 0.59765625,
1578
+ "learning_rate": 9.971513222083423e-05,
1579
+ "loss": 0.8721,
1580
+ "step": 222
1581
+ },
1582
+ {
1583
+ "epoch": 0.1285858440247946,
1584
+ "grad_norm": 0.62109375,
1585
+ "learning_rate": 9.971242936621328e-05,
1586
+ "loss": 0.9416,
1587
+ "step": 223
1588
+ },
1589
+ {
1590
+ "epoch": 0.12916246215943492,
1591
+ "grad_norm": 0.6171875,
1592
+ "learning_rate": 9.970971380247545e-05,
1593
+ "loss": 0.9501,
1594
+ "step": 224
1595
+ },
1596
+ {
1597
+ "epoch": 0.12973908029407524,
1598
+ "grad_norm": 0.59765625,
1599
+ "learning_rate": 9.970698553061497e-05,
1600
+ "loss": 0.947,
1601
+ "step": 225
1602
+ },
1603
+ {
1604
+ "epoch": 0.13031569842871557,
1605
+ "grad_norm": 0.67578125,
1606
+ "learning_rate": 9.970424455163074e-05,
1607
+ "loss": 0.8953,
1608
+ "step": 226
1609
+ },
1610
+ {
1611
+ "epoch": 0.13089231656335593,
1612
+ "grad_norm": 0.6015625,
1613
+ "learning_rate": 9.970149086652631e-05,
1614
+ "loss": 0.9052,
1615
+ "step": 227
1616
+ },
1617
+ {
1618
+ "epoch": 0.13146893469799625,
1619
+ "grad_norm": 0.60546875,
1620
+ "learning_rate": 9.969872447630988e-05,
1621
+ "loss": 0.9875,
1622
+ "step": 228
1623
+ },
1624
+ {
1625
+ "epoch": 0.13204555283263658,
1626
+ "grad_norm": 0.59375,
1627
+ "learning_rate": 9.969594538199429e-05,
1628
+ "loss": 0.8976,
1629
+ "step": 229
1630
+ },
1631
+ {
1632
+ "epoch": 0.1326221709672769,
1633
+ "grad_norm": 0.60546875,
1634
+ "learning_rate": 9.969315358459704e-05,
1635
+ "loss": 0.9022,
1636
+ "step": 230
1637
+ },
1638
+ {
1639
+ "epoch": 0.13319878910191726,
1640
+ "grad_norm": 0.62109375,
1641
+ "learning_rate": 9.969034908514026e-05,
1642
+ "loss": 0.8922,
1643
+ "step": 231
1644
+ },
1645
+ {
1646
+ "epoch": 0.1337754072365576,
1647
+ "grad_norm": 0.60546875,
1648
+ "learning_rate": 9.968753188465077e-05,
1649
+ "loss": 0.9523,
1650
+ "step": 232
1651
+ },
1652
+ {
1653
+ "epoch": 0.13435202537119792,
1654
+ "grad_norm": 0.59375,
1655
+ "learning_rate": 9.968470198416e-05,
1656
+ "loss": 0.9256,
1657
+ "step": 233
1658
+ },
1659
+ {
1660
+ "epoch": 0.13492864350583825,
1661
+ "grad_norm": 0.59765625,
1662
+ "learning_rate": 9.968185938470409e-05,
1663
+ "loss": 0.9132,
1664
+ "step": 234
1665
+ },
1666
+ {
1667
+ "epoch": 0.1355052616404786,
1668
+ "grad_norm": 0.58984375,
1669
+ "learning_rate": 9.967900408732373e-05,
1670
+ "loss": 0.8912,
1671
+ "step": 235
1672
+ },
1673
+ {
1674
+ "epoch": 0.13608187977511893,
1675
+ "grad_norm": 0.6328125,
1676
+ "learning_rate": 9.967613609306439e-05,
1677
+ "loss": 0.9169,
1678
+ "step": 236
1679
+ },
1680
+ {
1681
+ "epoch": 0.13665849790975926,
1682
+ "grad_norm": 0.61328125,
1683
+ "learning_rate": 9.967325540297608e-05,
1684
+ "loss": 0.9064,
1685
+ "step": 237
1686
+ },
1687
+ {
1688
+ "epoch": 0.13723511604439959,
1689
+ "grad_norm": 0.58984375,
1690
+ "learning_rate": 9.967036201811346e-05,
1691
+ "loss": 0.9257,
1692
+ "step": 238
1693
+ },
1694
+ {
1695
+ "epoch": 0.13781173417903994,
1696
+ "grad_norm": 0.6328125,
1697
+ "learning_rate": 9.966745593953593e-05,
1698
+ "loss": 0.9532,
1699
+ "step": 239
1700
+ },
1701
+ {
1702
+ "epoch": 0.13838835231368027,
1703
+ "grad_norm": 0.63671875,
1704
+ "learning_rate": 9.966453716830743e-05,
1705
+ "loss": 0.9413,
1706
+ "step": 240
1707
+ },
1708
+ {
1709
+ "epoch": 0.1389649704483206,
1710
+ "grad_norm": 0.62109375,
1711
+ "learning_rate": 9.966160570549666e-05,
1712
+ "loss": 0.9554,
1713
+ "step": 241
1714
+ },
1715
+ {
1716
+ "epoch": 0.13954158858296092,
1717
+ "grad_norm": 0.62109375,
1718
+ "learning_rate": 9.965866155217685e-05,
1719
+ "loss": 0.938,
1720
+ "step": 242
1721
+ },
1722
+ {
1723
+ "epoch": 0.14011820671760128,
1724
+ "grad_norm": 0.625,
1725
+ "learning_rate": 9.965570470942594e-05,
1726
+ "loss": 0.8946,
1727
+ "step": 243
1728
+ },
1729
+ {
1730
+ "epoch": 0.1406948248522416,
1731
+ "grad_norm": 0.6171875,
1732
+ "learning_rate": 9.965273517832652e-05,
1733
+ "loss": 0.9377,
1734
+ "step": 244
1735
+ },
1736
+ {
1737
+ "epoch": 0.14127144298688193,
1738
+ "grad_norm": 0.625,
1739
+ "learning_rate": 9.964975295996582e-05,
1740
+ "loss": 0.9558,
1741
+ "step": 245
1742
+ },
1743
+ {
1744
+ "epoch": 0.14184806112152226,
1745
+ "grad_norm": 0.6484375,
1746
+ "learning_rate": 9.964675805543569e-05,
1747
+ "loss": 0.9083,
1748
+ "step": 246
1749
+ },
1750
+ {
1751
+ "epoch": 0.14242467925616262,
1752
+ "grad_norm": 0.62890625,
1753
+ "learning_rate": 9.964375046583265e-05,
1754
+ "loss": 0.9289,
1755
+ "step": 247
1756
+ },
1757
+ {
1758
+ "epoch": 0.14300129739080294,
1759
+ "grad_norm": 0.609375,
1760
+ "learning_rate": 9.964073019225784e-05,
1761
+ "loss": 0.9139,
1762
+ "step": 248
1763
+ },
1764
+ {
1765
+ "epoch": 0.14357791552544327,
1766
+ "grad_norm": 0.66796875,
1767
+ "learning_rate": 9.963769723581709e-05,
1768
+ "loss": 0.9616,
1769
+ "step": 249
1770
+ },
1771
+ {
1772
+ "epoch": 0.1441545336600836,
1773
+ "grad_norm": 0.609375,
1774
+ "learning_rate": 9.963465159762082e-05,
1775
+ "loss": 0.9082,
1776
+ "step": 250
1777
+ },
1778
+ {
1779
+ "epoch": 0.14473115179472396,
1780
+ "grad_norm": 0.625,
1781
+ "learning_rate": 9.963159327878411e-05,
1782
+ "loss": 0.8906,
1783
+ "step": 251
1784
+ },
1785
+ {
1786
+ "epoch": 0.14530776992936428,
1787
+ "grad_norm": 0.63671875,
1788
+ "learning_rate": 9.962852228042671e-05,
1789
+ "loss": 0.9086,
1790
+ "step": 252
1791
+ },
1792
+ {
1793
+ "epoch": 0.1458843880640046,
1794
+ "grad_norm": 0.6171875,
1795
+ "learning_rate": 9.962543860367299e-05,
1796
+ "loss": 0.914,
1797
+ "step": 253
1798
+ },
1799
+ {
1800
+ "epoch": 0.14646100619864494,
1801
+ "grad_norm": 0.609375,
1802
+ "learning_rate": 9.962234224965196e-05,
1803
+ "loss": 0.955,
1804
+ "step": 254
1805
+ },
1806
+ {
1807
+ "epoch": 0.1470376243332853,
1808
+ "grad_norm": 0.59765625,
1809
+ "learning_rate": 9.961923321949729e-05,
1810
+ "loss": 0.9088,
1811
+ "step": 255
1812
+ },
1813
+ {
1814
+ "epoch": 0.14761424246792562,
1815
+ "grad_norm": 0.59765625,
1816
+ "learning_rate": 9.961611151434722e-05,
1817
+ "loss": 0.9276,
1818
+ "step": 256
1819
+ },
1820
+ {
1821
+ "epoch": 0.14819086060256595,
1822
+ "grad_norm": 0.60546875,
1823
+ "learning_rate": 9.961297713534476e-05,
1824
+ "loss": 0.9369,
1825
+ "step": 257
1826
+ },
1827
+ {
1828
+ "epoch": 0.14876747873720628,
1829
+ "grad_norm": 0.59765625,
1830
+ "learning_rate": 9.960983008363745e-05,
1831
+ "loss": 0.9465,
1832
+ "step": 258
1833
+ },
1834
+ {
1835
+ "epoch": 0.14934409687184663,
1836
+ "grad_norm": 0.58203125,
1837
+ "learning_rate": 9.96066703603775e-05,
1838
+ "loss": 0.9,
1839
+ "step": 259
1840
+ },
1841
+ {
1842
+ "epoch": 0.14992071500648696,
1843
+ "grad_norm": 0.609375,
1844
+ "learning_rate": 9.960349796672177e-05,
1845
+ "loss": 0.9486,
1846
+ "step": 260
1847
+ },
1848
+ {
1849
+ "epoch": 0.1504973331411273,
1850
+ "grad_norm": 0.57421875,
1851
+ "learning_rate": 9.960031290383179e-05,
1852
+ "loss": 0.9293,
1853
+ "step": 261
1854
+ },
1855
+ {
1856
+ "epoch": 0.15107395127576762,
1857
+ "grad_norm": 0.5859375,
1858
+ "learning_rate": 9.959711517287364e-05,
1859
+ "loss": 0.8732,
1860
+ "step": 262
1861
+ },
1862
+ {
1863
+ "epoch": 0.15165056941040797,
1864
+ "grad_norm": 0.609375,
1865
+ "learning_rate": 9.959390477501814e-05,
1866
+ "loss": 0.9095,
1867
+ "step": 263
1868
+ },
1869
+ {
1870
+ "epoch": 0.1522271875450483,
1871
+ "grad_norm": 0.59375,
1872
+ "learning_rate": 9.959068171144063e-05,
1873
+ "loss": 0.9583,
1874
+ "step": 264
1875
+ },
1876
+ {
1877
+ "epoch": 0.15280380567968863,
1878
+ "grad_norm": 0.56640625,
1879
+ "learning_rate": 9.958744598332126e-05,
1880
+ "loss": 0.9175,
1881
+ "step": 265
1882
+ },
1883
+ {
1884
+ "epoch": 0.15338042381432895,
1885
+ "grad_norm": 0.6328125,
1886
+ "learning_rate": 9.958419759184463e-05,
1887
+ "loss": 0.9034,
1888
+ "step": 266
1889
+ },
1890
+ {
1891
+ "epoch": 0.1539570419489693,
1892
+ "grad_norm": 0.57421875,
1893
+ "learning_rate": 9.95809365382001e-05,
1894
+ "loss": 0.8857,
1895
+ "step": 267
1896
+ },
1897
+ {
1898
+ "epoch": 0.15453366008360964,
1899
+ "grad_norm": 0.59765625,
1900
+ "learning_rate": 9.957766282358161e-05,
1901
+ "loss": 0.9408,
1902
+ "step": 268
1903
+ },
1904
+ {
1905
+ "epoch": 0.15511027821824996,
1906
+ "grad_norm": 0.609375,
1907
+ "learning_rate": 9.957437644918775e-05,
1908
+ "loss": 0.9403,
1909
+ "step": 269
1910
+ },
1911
+ {
1912
+ "epoch": 0.1556868963528903,
1913
+ "grad_norm": 0.640625,
1914
+ "learning_rate": 9.957107741622176e-05,
1915
+ "loss": 0.8624,
1916
+ "step": 270
1917
+ },
1918
+ {
1919
+ "epoch": 0.15626351448753062,
1920
+ "grad_norm": 0.57421875,
1921
+ "learning_rate": 9.956776572589148e-05,
1922
+ "loss": 0.9076,
1923
+ "step": 271
1924
+ },
1925
+ {
1926
+ "epoch": 0.15684013262217097,
1927
+ "grad_norm": 0.59765625,
1928
+ "learning_rate": 9.956444137940943e-05,
1929
+ "loss": 0.9275,
1930
+ "step": 272
1931
+ },
1932
+ {
1933
+ "epoch": 0.1574167507568113,
1934
+ "grad_norm": 0.5859375,
1935
+ "learning_rate": 9.956110437799273e-05,
1936
+ "loss": 0.8866,
1937
+ "step": 273
1938
+ },
1939
+ {
1940
+ "epoch": 0.15799336889145163,
1941
+ "grad_norm": 0.58984375,
1942
+ "learning_rate": 9.955775472286315e-05,
1943
+ "loss": 0.9608,
1944
+ "step": 274
1945
+ },
1946
+ {
1947
+ "epoch": 0.15856998702609196,
1948
+ "grad_norm": 0.65234375,
1949
+ "learning_rate": 9.955439241524707e-05,
1950
+ "loss": 0.9419,
1951
+ "step": 275
1952
+ },
1953
+ {
1954
+ "epoch": 0.1591466051607323,
1955
+ "grad_norm": 0.61328125,
1956
+ "learning_rate": 9.955101745637552e-05,
1957
+ "loss": 0.951,
1958
+ "step": 276
1959
+ },
1960
+ {
1961
+ "epoch": 0.15972322329537264,
1962
+ "grad_norm": 0.6328125,
1963
+ "learning_rate": 9.954762984748413e-05,
1964
+ "loss": 0.9377,
1965
+ "step": 277
1966
+ },
1967
+ {
1968
+ "epoch": 0.16029984143001297,
1969
+ "grad_norm": 0.6171875,
1970
+ "learning_rate": 9.954422958981326e-05,
1971
+ "loss": 0.8692,
1972
+ "step": 278
1973
+ },
1974
+ {
1975
+ "epoch": 0.1608764595646533,
1976
+ "grad_norm": 0.58203125,
1977
+ "learning_rate": 9.95408166846078e-05,
1978
+ "loss": 0.9131,
1979
+ "step": 279
1980
+ },
1981
+ {
1982
+ "epoch": 0.16145307769929365,
1983
+ "grad_norm": 0.58984375,
1984
+ "learning_rate": 9.953739113311726e-05,
1985
+ "loss": 0.9578,
1986
+ "step": 280
1987
+ },
1988
+ {
1989
+ "epoch": 0.16202969583393398,
1990
+ "grad_norm": 0.6328125,
1991
+ "learning_rate": 9.953395293659591e-05,
1992
+ "loss": 0.9629,
1993
+ "step": 281
1994
+ },
1995
+ {
1996
+ "epoch": 0.1626063139685743,
1997
+ "grad_norm": 0.61328125,
1998
+ "learning_rate": 9.953050209630249e-05,
1999
+ "loss": 0.9274,
2000
+ "step": 282
2001
+ },
2002
+ {
2003
+ "epoch": 0.16318293210321463,
2004
+ "grad_norm": 0.61328125,
2005
+ "learning_rate": 9.952703861350046e-05,
2006
+ "loss": 0.8947,
2007
+ "step": 283
2008
+ },
2009
+ {
2010
+ "epoch": 0.163759550237855,
2011
+ "grad_norm": 0.58203125,
2012
+ "learning_rate": 9.952356248945791e-05,
2013
+ "loss": 0.9305,
2014
+ "step": 284
2015
+ },
2016
+ {
2017
+ "epoch": 0.16433616837249532,
2018
+ "grad_norm": 0.5625,
2019
+ "learning_rate": 9.952007372544751e-05,
2020
+ "loss": 0.898,
2021
+ "step": 285
2022
+ },
2023
+ {
2024
+ "epoch": 0.16491278650713564,
2025
+ "grad_norm": 0.5859375,
2026
+ "learning_rate": 9.951657232274662e-05,
2027
+ "loss": 0.9065,
2028
+ "step": 286
2029
+ },
2030
+ {
2031
+ "epoch": 0.16548940464177597,
2032
+ "grad_norm": 0.57421875,
2033
+ "learning_rate": 9.951305828263715e-05,
2034
+ "loss": 0.9443,
2035
+ "step": 287
2036
+ },
2037
+ {
2038
+ "epoch": 0.16606602277641633,
2039
+ "grad_norm": 0.5859375,
2040
+ "learning_rate": 9.950953160640573e-05,
2041
+ "loss": 0.8977,
2042
+ "step": 288
2043
+ },
2044
+ {
2045
+ "epoch": 0.16664264091105666,
2046
+ "grad_norm": 0.59765625,
2047
+ "learning_rate": 9.950599229534354e-05,
2048
+ "loss": 0.9149,
2049
+ "step": 289
2050
+ },
2051
+ {
2052
+ "epoch": 0.16721925904569698,
2053
+ "grad_norm": 0.59375,
2054
+ "learning_rate": 9.950244035074641e-05,
2055
+ "loss": 0.8839,
2056
+ "step": 290
2057
+ },
2058
+ {
2059
+ "epoch": 0.1677958771803373,
2060
+ "grad_norm": 0.58984375,
2061
+ "learning_rate": 9.949887577391482e-05,
2062
+ "loss": 0.9395,
2063
+ "step": 291
2064
+ },
2065
+ {
2066
+ "epoch": 0.16837249531497767,
2067
+ "grad_norm": 0.61328125,
2068
+ "learning_rate": 9.949529856615382e-05,
2069
+ "loss": 0.9681,
2070
+ "step": 292
2071
+ },
2072
+ {
2073
+ "epoch": 0.168949113449618,
2074
+ "grad_norm": 0.58203125,
2075
+ "learning_rate": 9.949170872877314e-05,
2076
+ "loss": 0.9334,
2077
+ "step": 293
2078
+ },
2079
+ {
2080
+ "epoch": 0.16952573158425832,
2081
+ "grad_norm": 0.59765625,
2082
+ "learning_rate": 9.948810626308711e-05,
2083
+ "loss": 0.8906,
2084
+ "step": 294
2085
+ },
2086
+ {
2087
+ "epoch": 0.17010234971889865,
2088
+ "grad_norm": 0.59765625,
2089
+ "learning_rate": 9.94844911704147e-05,
2090
+ "loss": 0.936,
2091
+ "step": 295
2092
+ },
2093
+ {
2094
+ "epoch": 0.170678967853539,
2095
+ "grad_norm": 0.59765625,
2096
+ "learning_rate": 9.948086345207945e-05,
2097
+ "loss": 0.9193,
2098
+ "step": 296
2099
+ },
2100
+ {
2101
+ "epoch": 0.17125558598817933,
2102
+ "grad_norm": 0.59765625,
2103
+ "learning_rate": 9.94772231094096e-05,
2104
+ "loss": 0.924,
2105
+ "step": 297
2106
+ },
2107
+ {
2108
+ "epoch": 0.17183220412281966,
2109
+ "grad_norm": 0.59765625,
2110
+ "learning_rate": 9.947357014373797e-05,
2111
+ "loss": 0.921,
2112
+ "step": 298
2113
+ },
2114
+ {
2115
+ "epoch": 0.17240882225746,
2116
+ "grad_norm": 0.60546875,
2117
+ "learning_rate": 9.946990455640197e-05,
2118
+ "loss": 0.9262,
2119
+ "step": 299
2120
+ },
2121
+ {
2122
+ "epoch": 0.17298544039210034,
2123
+ "grad_norm": 0.578125,
2124
+ "learning_rate": 9.94662263487437e-05,
2125
+ "loss": 0.9545,
2126
+ "step": 300
2127
+ },
2128
+ {
2129
+ "epoch": 0.17356205852674067,
2130
+ "grad_norm": 0.60546875,
2131
+ "learning_rate": 9.946253552210984e-05,
2132
+ "loss": 0.9526,
2133
+ "step": 301
2134
+ },
2135
+ {
2136
+ "epoch": 0.174138676661381,
2137
+ "grad_norm": 0.578125,
2138
+ "learning_rate": 9.94588320778517e-05,
2139
+ "loss": 0.8684,
2140
+ "step": 302
2141
+ },
2142
+ {
2143
+ "epoch": 0.17471529479602133,
2144
+ "grad_norm": 0.60546875,
2145
+ "learning_rate": 9.945511601732518e-05,
2146
+ "loss": 0.933,
2147
+ "step": 303
2148
+ },
2149
+ {
2150
+ "epoch": 0.17529191293066168,
2151
+ "grad_norm": 0.58984375,
2152
+ "learning_rate": 9.945138734189088e-05,
2153
+ "loss": 0.9365,
2154
+ "step": 304
2155
+ },
2156
+ {
2157
+ "epoch": 0.175868531065302,
2158
+ "grad_norm": 0.59765625,
2159
+ "learning_rate": 9.94476460529139e-05,
2160
+ "loss": 0.9153,
2161
+ "step": 305
2162
+ },
2163
+ {
2164
+ "epoch": 0.17644514919994234,
2165
+ "grad_norm": 0.58203125,
2166
+ "learning_rate": 9.944389215176406e-05,
2167
+ "loss": 0.9356,
2168
+ "step": 306
2169
+ },
2170
+ {
2171
+ "epoch": 0.17702176733458266,
2172
+ "grad_norm": 0.59765625,
2173
+ "learning_rate": 9.944012563981575e-05,
2174
+ "loss": 0.9362,
2175
+ "step": 307
2176
+ },
2177
+ {
2178
+ "epoch": 0.17759838546922302,
2179
+ "grad_norm": 0.54296875,
2180
+ "learning_rate": 9.9436346518448e-05,
2181
+ "loss": 0.9098,
2182
+ "step": 308
2183
+ },
2184
+ {
2185
+ "epoch": 0.17817500360386335,
2186
+ "grad_norm": 0.58984375,
2187
+ "learning_rate": 9.943255478904444e-05,
2188
+ "loss": 0.9465,
2189
+ "step": 309
2190
+ },
2191
+ {
2192
+ "epoch": 0.17875162173850367,
2193
+ "grad_norm": 0.58984375,
2194
+ "learning_rate": 9.942875045299331e-05,
2195
+ "loss": 0.8589,
2196
+ "step": 310
2197
+ },
2198
+ {
2199
+ "epoch": 0.179328239873144,
2200
+ "grad_norm": 0.5859375,
2201
+ "learning_rate": 9.942493351168747e-05,
2202
+ "loss": 0.9228,
2203
+ "step": 311
2204
+ },
2205
+ {
2206
+ "epoch": 0.17990485800778436,
2207
+ "grad_norm": 0.57421875,
2208
+ "learning_rate": 9.942110396652442e-05,
2209
+ "loss": 0.925,
2210
+ "step": 312
2211
+ },
2212
+ {
2213
+ "epoch": 0.18048147614242468,
2214
+ "grad_norm": 0.61328125,
2215
+ "learning_rate": 9.941726181890625e-05,
2216
+ "loss": 0.9151,
2217
+ "step": 313
2218
+ },
2219
+ {
2220
+ "epoch": 0.181058094277065,
2221
+ "grad_norm": 0.58203125,
2222
+ "learning_rate": 9.941340707023965e-05,
2223
+ "loss": 0.9018,
2224
+ "step": 314
2225
+ },
2226
+ {
2227
+ "epoch": 0.18163471241170534,
2228
+ "grad_norm": 0.5859375,
2229
+ "learning_rate": 9.940953972193596e-05,
2230
+ "loss": 0.9353,
2231
+ "step": 315
2232
+ },
2233
+ {
2234
+ "epoch": 0.1822113305463457,
2235
+ "grad_norm": 0.66015625,
2236
+ "learning_rate": 9.940565977541112e-05,
2237
+ "loss": 0.9979,
2238
+ "step": 316
2239
+ },
2240
+ {
2241
+ "epoch": 0.18278794868098602,
2242
+ "grad_norm": 0.60546875,
2243
+ "learning_rate": 9.940176723208569e-05,
2244
+ "loss": 0.9077,
2245
+ "step": 317
2246
+ },
2247
+ {
2248
+ "epoch": 0.18336456681562635,
2249
+ "grad_norm": 0.60546875,
2250
+ "learning_rate": 9.93978620933848e-05,
2251
+ "loss": 0.9142,
2252
+ "step": 318
2253
+ },
2254
+ {
2255
+ "epoch": 0.18394118495026668,
2256
+ "grad_norm": 0.6015625,
2257
+ "learning_rate": 9.939394436073824e-05,
2258
+ "loss": 0.9162,
2259
+ "step": 319
2260
+ },
2261
+ {
2262
+ "epoch": 0.18451780308490703,
2263
+ "grad_norm": 0.609375,
2264
+ "learning_rate": 9.939001403558036e-05,
2265
+ "loss": 0.9548,
2266
+ "step": 320
2267
+ },
2268
+ {
2269
+ "epoch": 0.18509442121954736,
2270
+ "grad_norm": 0.59765625,
2271
+ "learning_rate": 9.938607111935024e-05,
2272
+ "loss": 0.9308,
2273
+ "step": 321
2274
+ },
2275
+ {
2276
+ "epoch": 0.1856710393541877,
2277
+ "grad_norm": 0.6328125,
2278
+ "learning_rate": 9.938211561349137e-05,
2279
+ "loss": 0.9203,
2280
+ "step": 322
2281
+ },
2282
+ {
2283
+ "epoch": 0.18624765748882802,
2284
+ "grad_norm": 0.58203125,
2285
+ "learning_rate": 9.937814751945207e-05,
2286
+ "loss": 0.8803,
2287
+ "step": 323
2288
+ },
2289
+ {
2290
+ "epoch": 0.18682427562346834,
2291
+ "grad_norm": 0.60546875,
2292
+ "learning_rate": 9.937416683868508e-05,
2293
+ "loss": 0.8631,
2294
+ "step": 324
2295
+ },
2296
+ {
2297
+ "epoch": 0.1874008937581087,
2298
+ "grad_norm": 0.609375,
2299
+ "learning_rate": 9.937017357264786e-05,
2300
+ "loss": 0.9173,
2301
+ "step": 325
2302
+ },
2303
+ {
2304
+ "epoch": 0.18797751189274903,
2305
+ "grad_norm": 0.60546875,
2306
+ "learning_rate": 9.936616772280247e-05,
2307
+ "loss": 0.9349,
2308
+ "step": 326
2309
+ },
2310
+ {
2311
+ "epoch": 0.18855413002738936,
2312
+ "grad_norm": 0.578125,
2313
+ "learning_rate": 9.936214929061552e-05,
2314
+ "loss": 0.9278,
2315
+ "step": 327
2316
+ },
2317
+ {
2318
+ "epoch": 0.18913074816202968,
2319
+ "grad_norm": 0.5859375,
2320
+ "learning_rate": 9.935811827755827e-05,
2321
+ "loss": 0.9365,
2322
+ "step": 328
2323
+ },
2324
+ {
2325
+ "epoch": 0.18970736629667004,
2326
+ "grad_norm": 0.6015625,
2327
+ "learning_rate": 9.935407468510658e-05,
2328
+ "loss": 0.8856,
2329
+ "step": 329
2330
+ },
2331
+ {
2332
+ "epoch": 0.19028398443131037,
2333
+ "grad_norm": 0.59765625,
2334
+ "learning_rate": 9.935001851474093e-05,
2335
+ "loss": 0.9021,
2336
+ "step": 330
2337
+ },
2338
+ {
2339
+ "epoch": 0.1908606025659507,
2340
+ "grad_norm": 0.6015625,
2341
+ "learning_rate": 9.934594976794638e-05,
2342
+ "loss": 0.9066,
2343
+ "step": 331
2344
+ },
2345
+ {
2346
+ "epoch": 0.19143722070059102,
2347
+ "grad_norm": 0.60546875,
2348
+ "learning_rate": 9.93418684462126e-05,
2349
+ "loss": 0.9428,
2350
+ "step": 332
2351
+ },
2352
+ {
2353
+ "epoch": 0.19201383883523138,
2354
+ "grad_norm": 0.58203125,
2355
+ "learning_rate": 9.933777455103385e-05,
2356
+ "loss": 0.9251,
2357
+ "step": 333
2358
+ },
2359
+ {
2360
+ "epoch": 0.1925904569698717,
2361
+ "grad_norm": 0.59765625,
2362
+ "learning_rate": 9.933366808390904e-05,
2363
+ "loss": 0.9136,
2364
+ "step": 334
2365
+ },
2366
+ {
2367
+ "epoch": 0.19316707510451203,
2368
+ "grad_norm": 0.60546875,
2369
+ "learning_rate": 9.932954904634165e-05,
2370
+ "loss": 0.9046,
2371
+ "step": 335
2372
+ },
2373
+ {
2374
+ "epoch": 0.19374369323915236,
2375
+ "grad_norm": 0.59375,
2376
+ "learning_rate": 9.932541743983974e-05,
2377
+ "loss": 0.9416,
2378
+ "step": 336
2379
+ },
2380
+ {
2381
+ "epoch": 0.19432031137379271,
2382
+ "grad_norm": 0.59765625,
2383
+ "learning_rate": 9.9321273265916e-05,
2384
+ "loss": 0.8579,
2385
+ "step": 337
2386
+ },
2387
+ {
2388
+ "epoch": 0.19489692950843304,
2389
+ "grad_norm": 0.58984375,
2390
+ "learning_rate": 9.931711652608777e-05,
2391
+ "loss": 0.9288,
2392
+ "step": 338
2393
+ },
2394
+ {
2395
+ "epoch": 0.19547354764307337,
2396
+ "grad_norm": 0.5625,
2397
+ "learning_rate": 9.931294722187689e-05,
2398
+ "loss": 0.8769,
2399
+ "step": 339
2400
+ },
2401
+ {
2402
+ "epoch": 0.1960501657777137,
2403
+ "grad_norm": 0.609375,
2404
+ "learning_rate": 9.930876535480986e-05,
2405
+ "loss": 0.9835,
2406
+ "step": 340
2407
+ },
2408
+ {
2409
+ "epoch": 0.19662678391235405,
2410
+ "grad_norm": 0.6015625,
2411
+ "learning_rate": 9.930457092641778e-05,
2412
+ "loss": 0.9814,
2413
+ "step": 341
2414
+ },
2415
+ {
2416
+ "epoch": 0.19720340204699438,
2417
+ "grad_norm": 0.609375,
2418
+ "learning_rate": 9.930036393823632e-05,
2419
+ "loss": 0.9236,
2420
+ "step": 342
2421
+ },
2422
+ {
2423
+ "epoch": 0.1977800201816347,
2424
+ "grad_norm": 0.578125,
2425
+ "learning_rate": 9.92961443918058e-05,
2426
+ "loss": 0.8814,
2427
+ "step": 343
2428
+ },
2429
+ {
2430
+ "epoch": 0.19835663831627504,
2431
+ "grad_norm": 0.578125,
2432
+ "learning_rate": 9.929191228867105e-05,
2433
+ "loss": 0.901,
2434
+ "step": 344
2435
+ },
2436
+ {
2437
+ "epoch": 0.1989332564509154,
2438
+ "grad_norm": 0.5859375,
2439
+ "learning_rate": 9.928766763038162e-05,
2440
+ "loss": 0.897,
2441
+ "step": 345
2442
+ },
2443
+ {
2444
+ "epoch": 0.19950987458555572,
2445
+ "grad_norm": 0.578125,
2446
+ "learning_rate": 9.928341041849154e-05,
2447
+ "loss": 0.915,
2448
+ "step": 346
2449
+ },
2450
+ {
2451
+ "epoch": 0.20008649272019605,
2452
+ "grad_norm": 0.6015625,
2453
+ "learning_rate": 9.927914065455952e-05,
2454
+ "loss": 0.9253,
2455
+ "step": 347
2456
+ },
2457
+ {
2458
+ "epoch": 0.20066311085483637,
2459
+ "grad_norm": 0.58984375,
2460
+ "learning_rate": 9.927485834014878e-05,
2461
+ "loss": 0.9277,
2462
+ "step": 348
2463
+ },
2464
+ {
2465
+ "epoch": 0.20123972898947673,
2466
+ "grad_norm": 0.60546875,
2467
+ "learning_rate": 9.927056347682724e-05,
2468
+ "loss": 0.8951,
2469
+ "step": 349
2470
+ },
2471
+ {
2472
+ "epoch": 0.20181634712411706,
2473
+ "grad_norm": 0.609375,
2474
+ "learning_rate": 9.92662560661673e-05,
2475
+ "loss": 0.9207,
2476
+ "step": 350
2477
+ },
2478
+ {
2479
+ "epoch": 0.20239296525875738,
2480
+ "grad_norm": 0.63671875,
2481
+ "learning_rate": 9.92619361097461e-05,
2482
+ "loss": 0.9634,
2483
+ "step": 351
2484
+ },
2485
+ {
2486
+ "epoch": 0.2029695833933977,
2487
+ "grad_norm": 0.56640625,
2488
+ "learning_rate": 9.925760360914523e-05,
2489
+ "loss": 0.919,
2490
+ "step": 352
2491
+ },
2492
+ {
2493
+ "epoch": 0.20354620152803807,
2494
+ "grad_norm": 0.57421875,
2495
+ "learning_rate": 9.925325856595091e-05,
2496
+ "loss": 0.8943,
2497
+ "step": 353
2498
+ },
2499
+ {
2500
+ "epoch": 0.2041228196626784,
2501
+ "grad_norm": 0.6171875,
2502
+ "learning_rate": 9.924890098175403e-05,
2503
+ "loss": 0.8739,
2504
+ "step": 354
2505
+ },
2506
+ {
2507
+ "epoch": 0.20469943779731872,
2508
+ "grad_norm": 0.58984375,
2509
+ "learning_rate": 9.924453085814997e-05,
2510
+ "loss": 0.946,
2511
+ "step": 355
2512
+ },
2513
+ {
2514
+ "epoch": 0.20527605593195905,
2515
+ "grad_norm": 0.59765625,
2516
+ "learning_rate": 9.924014819673877e-05,
2517
+ "loss": 0.8982,
2518
+ "step": 356
2519
+ },
2520
+ {
2521
+ "epoch": 0.2058526740665994,
2522
+ "grad_norm": 0.64453125,
2523
+ "learning_rate": 9.923575299912504e-05,
2524
+ "loss": 0.9856,
2525
+ "step": 357
2526
+ },
2527
+ {
2528
+ "epoch": 0.20642929220123973,
2529
+ "grad_norm": 0.62109375,
2530
+ "learning_rate": 9.923134526691795e-05,
2531
+ "loss": 0.8953,
2532
+ "step": 358
2533
+ },
2534
+ {
2535
+ "epoch": 0.20700591033588006,
2536
+ "grad_norm": 0.60546875,
2537
+ "learning_rate": 9.92269250017313e-05,
2538
+ "loss": 0.9021,
2539
+ "step": 359
2540
+ },
2541
+ {
2542
+ "epoch": 0.2075825284705204,
2543
+ "grad_norm": 0.5859375,
2544
+ "learning_rate": 9.922249220518345e-05,
2545
+ "loss": 0.9057,
2546
+ "step": 360
2547
+ },
2548
+ {
2549
+ "epoch": 0.20815914660516074,
2550
+ "grad_norm": 0.59375,
2551
+ "learning_rate": 9.921804687889738e-05,
2552
+ "loss": 0.9107,
2553
+ "step": 361
2554
+ },
2555
+ {
2556
+ "epoch": 0.20873576473980107,
2557
+ "grad_norm": 0.59765625,
2558
+ "learning_rate": 9.921358902450062e-05,
2559
+ "loss": 0.9374,
2560
+ "step": 362
2561
+ },
2562
+ {
2563
+ "epoch": 0.2093123828744414,
2564
+ "grad_norm": 0.60546875,
2565
+ "learning_rate": 9.920911864362534e-05,
2566
+ "loss": 0.9108,
2567
+ "step": 363
2568
+ },
2569
+ {
2570
+ "epoch": 0.20988900100908173,
2571
+ "grad_norm": 0.5859375,
2572
+ "learning_rate": 9.920463573790822e-05,
2573
+ "loss": 0.8881,
2574
+ "step": 364
2575
+ },
2576
+ {
2577
+ "epoch": 0.21046561914372208,
2578
+ "grad_norm": 0.5859375,
2579
+ "learning_rate": 9.920014030899058e-05,
2580
+ "loss": 0.9157,
2581
+ "step": 365
2582
+ },
2583
+ {
2584
+ "epoch": 0.2110422372783624,
2585
+ "grad_norm": 0.5859375,
2586
+ "learning_rate": 9.919563235851835e-05,
2587
+ "loss": 0.8968,
2588
+ "step": 366
2589
+ },
2590
+ {
2591
+ "epoch": 0.21161885541300274,
2592
+ "grad_norm": 0.63671875,
2593
+ "learning_rate": 9.919111188814192e-05,
2594
+ "loss": 0.9315,
2595
+ "step": 367
2596
+ },
2597
+ {
2598
+ "epoch": 0.21219547354764307,
2599
+ "grad_norm": 0.59765625,
2600
+ "learning_rate": 9.918657889951645e-05,
2601
+ "loss": 0.9069,
2602
+ "step": 368
2603
+ },
2604
+ {
2605
+ "epoch": 0.21277209168228342,
2606
+ "grad_norm": 0.578125,
2607
+ "learning_rate": 9.918203339430152e-05,
2608
+ "loss": 0.8905,
2609
+ "step": 369
2610
+ },
2611
+ {
2612
+ "epoch": 0.21334870981692375,
2613
+ "grad_norm": 0.60546875,
2614
+ "learning_rate": 9.917747537416138e-05,
2615
+ "loss": 0.914,
2616
+ "step": 370
2617
+ },
2618
+ {
2619
+ "epoch": 0.21392532795156408,
2620
+ "grad_norm": 0.578125,
2621
+ "learning_rate": 9.917290484076485e-05,
2622
+ "loss": 0.8891,
2623
+ "step": 371
2624
+ },
2625
+ {
2626
+ "epoch": 0.2145019460862044,
2627
+ "grad_norm": 0.58203125,
2628
+ "learning_rate": 9.91683217957853e-05,
2629
+ "loss": 0.898,
2630
+ "step": 372
2631
+ },
2632
+ {
2633
+ "epoch": 0.21507856422084476,
2634
+ "grad_norm": 0.58203125,
2635
+ "learning_rate": 9.916372624090071e-05,
2636
+ "loss": 0.9283,
2637
+ "step": 373
2638
+ },
2639
+ {
2640
+ "epoch": 0.2156551823554851,
2641
+ "grad_norm": 0.58203125,
2642
+ "learning_rate": 9.915911817779362e-05,
2643
+ "loss": 0.932,
2644
+ "step": 374
2645
+ },
2646
+ {
2647
+ "epoch": 0.21623180049012541,
2648
+ "grad_norm": 0.58984375,
2649
+ "learning_rate": 9.915449760815117e-05,
2650
+ "loss": 0.9047,
2651
+ "step": 375
2652
+ },
2653
+ {
2654
+ "epoch": 0.21623180049012541,
2655
+ "eval_loss": 0.9402498006820679,
2656
+ "eval_runtime": 1.4105,
2657
+ "eval_samples_per_second": 131.872,
2658
+ "eval_steps_per_second": 2.836,
2659
+ "step": 375
2660
  }
2661
  ],
2662
  "logging_steps": 1,
 
2676
  "attributes": {}
2677
  }
2678
  },
2679
+ "total_flos": 5.717261656522752e+17,
2680
  "train_batch_size": 60,
2681
  "trial_name": null,
2682
  "trial_params": null