m2l-diffusion / train_cosine_uncond.out
/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'. If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
Schedule: cosine
Cfg: False
Output path: /scratch/shared/beegfs/gabrijel/m2l/mini
Patch Size: 4
Device: cuda:3
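The "Schedule: cosine" line above refers to a cosine noise schedule. The run's exact implementation is not shown in this log; a minimal sketch, assuming the common Nichol & Dhariwal formulation (function names here are illustrative):

```python
import math

def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal-retention schedule alpha_bar(t) for t in [0, 1].
    Starts near 1 (almost clean data) and decays to 0 (pure noise)."""
    return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

def cosine_betas(num_steps: int = 1000, max_beta: float = 0.999) -> list[float]:
    """Discrete per-step noise rates derived from the continuous alpha_bar,
    clipped at max_beta to avoid a degenerate final step."""
    out = []
    for i in range(num_steps):
        a1 = cosine_alpha_bar(i / num_steps)
        a2 = cosine_alpha_bar((i + 1) / num_steps)
        out.append(min(1.0 - a2 / a1, max_beta))
    return out
```

"Cfg: False" and the `uncond` in the filename indicate classifier-free guidance is disabled, so no conditioning-dropout logic is needed alongside the schedule.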
=====================================================================================
Layer (type:depth-idx) Param #
=====================================================================================
DiT 18,816
├─PatchEmbed: 1-1 --
│ └─Conv2d: 2-1 6,528
├─TimestepEmbedder: 1-2 --
│ └─Mlp: 2-2 --
│ │ └─Linear: 3-1 98,688
│ │ └─SiLU: 3-2 --
│ │ └─Linear: 3-3 147,840
├─ModuleList: 1-3 --
│ └─DiTBlock: 2-3 --
│ │ └─LayerNorm: 3-4 --
│ │ └─MultiheadAttention: 3-5 591,360
│ │ └─LayerNorm: 3-6 --
│ │ └─Mlp: 3-7 1,181,568
│ │ └─Sequential: 3-8 887,040
│ └─DiTBlock: 2-4 --
│ │ └─LayerNorm: 3-9 --
│ │ └─MultiheadAttention: 3-10 591,360
│ │ └─LayerNorm: 3-11 --
│ │ └─Mlp: 3-12 1,181,568
│ │ └─Sequential: 3-13 887,040
│ └─DiTBlock: 2-5 --
│ │ └─LayerNorm: 3-14 --
│ │ └─MultiheadAttention: 3-15 591,360
│ │ └─LayerNorm: 3-16 --
│ │ └─Mlp: 3-17 1,181,568
│ │ └─Sequential: 3-18 887,040
│ └─DiTBlock: 2-6 --
│ │ └─LayerNorm: 3-19 --
│ │ └─MultiheadAttention: 3-20 591,360
│ │ └─LayerNorm: 3-21 --
│ │ └─Mlp: 3-22 1,181,568
│ │ └─Sequential: 3-23 887,040
│ └─DiTBlock: 2-7 --
│ │ └─LayerNorm: 3-24 --
│ │ └─MultiheadAttention: 3-25 591,360
│ │ └─LayerNorm: 3-26 --
│ │ └─Mlp: 3-27 1,181,568
│ │ └─Sequential: 3-28 887,040
│ └─DiTBlock: 2-8 --
│ │ └─LayerNorm: 3-29 --
│ │ └─MultiheadAttention: 3-30 591,360
│ │ └─LayerNorm: 3-31 --
│ │ └─Mlp: 3-32 1,181,568
│ │ └─Sequential: 3-33 887,040
├─FinalLayer: 1-4 --
│ └─LayerNorm: 2-9 --
│ └─Linear: 2-10 6,160
│ └─Sequential: 2-11 --
│ │ └─SiLU: 3-34 --
│ │ └─Linear: 3-35 295,680
├─Unpatchify: 1-5 --
=====================================================================================
Total params: 16,533,520
Trainable params: 16,514,704
Non-trainable params: 18,816
=====================================================================================
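The totals in the summary can be cross-checked by summing the per-layer counts printed above. A small sanity-check sketch (the grouping labels are interpretations — e.g. the per-block `Sequential` is presumably the adaLN modulation MLP — but every number is taken verbatim from the table):

```python
# Per-layer parameter counts as printed in the summary above.
pos_embed = 18_816          # DiT top-level params (the non-trainable portion)
patch_conv = 6_528          # PatchEmbed Conv2d
t_embed = 98_688 + 147_840  # TimestepEmbedder Mlp (two Linear layers)
per_block = 591_360 + 1_181_568 + 887_040  # attention + Mlp + Sequential
blocks = 6 * per_block      # six identical DiTBlocks
final = 6_160 + 295_680     # FinalLayer Linear + Sequential Linear

total = pos_embed + patch_conv + t_embed + blocks + final
print(total)          # matches "Total params: 16,533,520"
print(total - pos_embed)  # matches "Trainable params: 16,514,704"
```

The 18,816 non-trainable parameters line up exactly with the top-level DiT count, consistent with fixed (non-learned) positional embeddings.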
EPOCH: 1
Loss at step 0: 1.003037691116333
Loss at step 50: 0.27661845088005066
Loss at step 100: 0.1424816995859146
Loss at step 150: 0.149506077170372
Loss at step 200: 0.12479458749294281
Loss at step 250: 0.11426719278097153
Loss at step 300: 0.1121714636683464
Loss at step 350: 0.1525452435016632
Loss at step 400: 0.13195544481277466
Loss at step 450: 0.13427531719207764
Loss at step 500: 0.11618398129940033
Loss at step 550: 0.10502675175666809
Loss at step 600: 0.0915534719824791
Loss at step 650: 0.09573911130428314
Loss at step 700: 0.1133173331618309
Loss at step 750: 0.1329326033592224
Loss at step 800: 0.10761799663305283
Loss at step 850: 0.11455381661653519
Loss at step 900: 0.12381163984537125
Mean training loss after epoch 1: 0.1539469359716627
EPOCH: 2
Loss at step 0: 0.10565946251153946
Loss at step 50: 0.10705439746379852
Loss at step 100: 0.10383749008178711
Loss at step 150: 0.06799700856208801
Loss at step 200: 0.10335274040699005
Loss at step 250: 0.06992191821336746
Loss at step 300: 0.08639557659626007
Loss at step 350: 0.075557179749012
Loss at step 400: 0.08633825182914734
Loss at step 450: 0.07788940519094467
Loss at step 500: 0.07782382518053055
Loss at step 550: 0.08034727722406387
Loss at step 600: 0.08342839032411575
Loss at step 650: 0.07229728251695633
Loss at step 700: 0.1125161275267601
Loss at step 750: 0.07861600071191788
Loss at step 800: 0.07409120351076126
Loss at step 850: 0.078849658370018
Loss at step 900: 0.094577357172966
Mean training loss after epoch 2: 0.0890754868369748
EPOCH: 3
Loss at step 0: 0.09418325871229172
Loss at step 50: 0.07725059241056442
Loss at step 100: 0.07888046652078629
Loss at step 150: 0.07631003856658936
Loss at step 200: 0.08261995762586594
Loss at step 250: 0.08296551555395126
Loss at step 300: 0.06751654297113419
Loss at step 350: 0.056437570601701736
Loss at step 400: 0.08921732008457184
Loss at step 450: 0.07375114411115646
Loss at step 500: 0.0711858719587326
Loss at step 550: 0.07505157589912415
Loss at step 600: 0.08232348412275314
Loss at step 650: 0.06679479032754898
Loss at step 700: 0.08148904889822006
Loss at step 750: 0.08080136030912399
Loss at step 800: 0.08312830328941345
Loss at step 850: 0.07133378833532333
Loss at step 900: 0.09072605520486832
Mean training loss after epoch 3: 0.0814713569544653
EPOCH: 4
Loss at step 0: 0.07232233136892319
Loss at step 50: 0.07750030606985092
Loss at step 100: 0.07092481851577759
Loss at step 150: 0.07385428249835968
Loss at step 200: 0.0765489712357521
Loss at step 250: 0.08501236885786057
Loss at step 300: 0.0753922238945961
Loss at step 350: 0.09025567770004272
Loss at step 400: 0.07097692042589188
Loss at step 450: 0.07708340883255005
Loss at step 500: 0.09128562361001968
Loss at step 550: 0.06018142029643059
Loss at step 600: 0.09149488806724548
Loss at step 650: 0.06561129540205002
Loss at step 700: 0.06980118900537491
Loss at step 750: 0.08550712466239929
Loss at step 800: 0.09240806102752686
Loss at step 850: 0.07110103964805603
Loss at step 900: 0.07148086279630661
Mean training loss after epoch 4: 0.0787551273812236
EPOCH: 5
Loss at step 0: 0.06837005913257599
Loss at step 50: 0.08118490874767303
Loss at step 100: 0.07443185895681381
Loss at step 150: 0.07052610069513321
Loss at step 200: 0.06819907575845718
Loss at step 250: 0.07496345043182373
Loss at step 300: 0.06800892949104309
Loss at step 350: 0.06302531063556671
Loss at step 400: 0.08613581210374832
Loss at step 450: 0.08716695010662079
Loss at step 500: 0.06726646423339844
Loss at step 550: 0.08373793959617615
Loss at step 600: 0.06744256615638733
Loss at step 650: 0.06358864903450012
Loss at step 700: 0.07609168440103531
Loss at step 750: 0.08561772853136063
Loss at step 800: 0.08098775148391724
Loss at step 850: 0.06596729904413223
Loss at step 900: 0.09818398952484131
Mean training loss after epoch 5: 0.07621333939132532
EPOCH: 6
Loss at step 0: 0.07262420654296875
Loss at step 50: 0.08250504732131958
Loss at step 100: 0.0771770104765892
Loss at step 150: 0.10342646390199661
Loss at step 200: 0.058930426836013794
Loss at step 250: 0.07842208445072174
Loss at step 300: 0.07260201871395111
Loss at step 350: 0.08667268604040146
Loss at step 400: 0.07582581788301468
Loss at step 450: 0.06609180569648743
Loss at step 500: 0.06838462501764297
Loss at step 550: 0.0686202421784401
Loss at step 600: 0.07450604438781738
Loss at step 650: 0.06995474547147751
Loss at step 700: 0.09052643924951553
Loss at step 750: 0.06410960853099823
Loss at step 800: 0.06000955030322075
Loss at step 850: 0.058537937700748444
Loss at step 900: 0.05873030051589012
Mean training loss after epoch 6: 0.07148564022296527
EPOCH: 7
Loss at step 0: 0.08014527708292007
Loss at step 50: 0.06883670389652252
Loss at step 100: 0.05498388782143593
Loss at step 150: 0.049575336277484894
Loss at step 200: 0.07284802198410034
Loss at step 250: 0.05627802759408951
Loss at step 300: 0.08170035481452942
Loss at step 350: 0.07256447523832321
Loss at step 400: 0.0588550940155983
Loss at step 450: 0.07103971391916275
Loss at step 500: 0.07324987649917603
Loss at step 550: 0.06593427062034607
Loss at step 600: 0.0544898696243763
Loss at step 650: 0.07234711199998856
Loss at step 700: 0.0544438362121582
Loss at step 750: 0.057596754282712936
Loss at step 800: 0.05363701656460762
Loss at step 850: 0.06061040610074997
Loss at step 900: 0.050478413701057434
Mean training loss after epoch 7: 0.061715417988360055
EPOCH: 8
Loss at step 0: 0.06242280825972557
Loss at step 50: 0.06322493404150009
Loss at step 100: 0.06131700053811073
Loss at step 150: 0.0666111558675766
Loss at step 200: 0.05284688621759415
Loss at step 250: 0.05521265044808388
Loss at step 300: 0.0472453273832798
Loss at step 350: 0.05958862975239754
Loss at step 400: 0.05755053088068962
Loss at step 450: 0.05801082402467728
Loss at step 500: 0.05060916021466255
Loss at step 550: 0.07028011977672577
Loss at step 600: 0.06796812266111374
Loss at step 650: 0.0757439136505127
Loss at step 700: 0.07297901809215546
Loss at step 750: 0.04786565154790878
Loss at step 800: 0.04735714942216873
Loss at step 850: 0.050904281437397
Loss at step 900: 0.06405418366193771
Mean training loss after epoch 8: 0.05874664374569586
EPOCH: 9
Loss at step 0: 0.05896533280611038
Loss at step 50: 0.045270130038261414
Loss at step 100: 0.06076472997665405
Loss at step 150: 0.06235165521502495
Loss at step 200: 0.05219578742980957
Loss at step 250: 0.08207492530345917
Loss at step 300: 0.0685364380478859
Loss at step 350: 0.05931883305311203
Loss at step 400: 0.08336658030748367
Loss at step 450: 0.0543605200946331
Loss at step 500: 0.06624364852905273
Loss at step 550: 0.05275069177150726
Loss at step 600: 0.08901050686836243
Loss at step 650: 0.059423163533210754
Loss at step 700: 0.05075065791606903
Loss at step 750: 0.06457599997520447
Loss at step 800: 0.07323766499757767
Loss at step 850: 0.05342152714729309
Loss at step 900: 0.048495594412088394
Mean training loss after epoch 9: 0.057180338576912626
EPOCH: 10
Loss at step 0: 0.05593691021203995
Loss at step 50: 0.0404839850962162
Loss at step 100: 0.05115879699587822
Loss at step 150: 0.06267639249563217
Loss at step 200: 0.05636795982718468
Loss at step 250: 0.04891287907958031
Loss at step 300: 0.05107974261045456
Loss at step 350: 0.04856284707784653
Loss at step 400: 0.04412970319390297
Loss at step 450: 0.060254551470279694
Loss at step 500: 0.051702968776226044
Loss at step 550: 0.05409117415547371
Loss at step 600: 0.051070645451545715
Loss at step 650: 0.052753277122974396
Loss at step 700: 0.047309085726737976
Loss at step 750: 0.06630449742078781
Loss at step 800: 0.05757919326424599
Loss at step 850: 0.06830720603466034
Loss at step 900: 0.06392861902713776
Mean training loss after epoch 10: 0.055850355824364276
EPOCH: 11
Loss at step 0: 0.05673862248659134
Loss at step 50: 0.07288111001253128
Loss at step 100: 0.055279891937971115
Loss at step 150: 0.0829085037112236
Loss at step 200: 0.05173470079898834
Loss at step 250: 0.07002110034227371
Loss at step 300: 0.041786838322877884
Loss at step 350: 0.05969792976975441
Loss at step 400: 0.05089668184518814
Loss at step 450: 0.05513685569167137
Loss at step 500: 0.06826495379209518
Loss at step 550: 0.050237782299518585
Loss at step 600: 0.045402273535728455
Loss at step 650: 0.037458840757608414
Loss at step 700: 0.047057949006557465
Loss at step 750: 0.04279854893684387
Loss at step 800: 0.05576750636100769
Loss at step 850: 0.06578527390956879
Loss at step 900: 0.053406111896038055
Mean training loss after epoch 11: 0.05481477373682741
EPOCH: 12
Loss at step 0: 0.07697451114654541
Loss at step 50: 0.07025545835494995
Loss at step 100: 0.04223531857132912
Loss at step 150: 0.042284440249204636
Loss at step 200: 0.05219224467873573
Loss at step 250: 0.04662042483687401
Loss at step 300: 0.04564218968153
Loss at step 350: 0.05778402090072632
Loss at step 400: 0.04940832406282425
Loss at step 450: 0.06825084984302521
Loss at step 500: 0.047397445887327194
Loss at step 550: 0.05655599385499954
Loss at step 600: 0.038294047117233276
Loss at step 650: 0.05758621171116829
Loss at step 700: 0.05851387232542038
Loss at step 750: 0.04812590777873993
Loss at step 800: 0.05467564985156059
Loss at step 850: 0.03674978390336037
Loss at step 900: 0.048017699271440506
Mean training loss after epoch 12: 0.05442187320719014
EPOCH: 13
Loss at step 0: 0.05527647212147713
Loss at step 50: 0.05649548023939133
Loss at step 100: 0.043758414685726166
Loss at step 150: 0.03759016841650009
Loss at step 200: 0.04890500754117966
Loss at step 250: 0.05938975140452385
Loss at step 300: 0.06087874621152878
Loss at step 350: 0.05134967714548111
Loss at step 400: 0.05763311684131622
Loss at step 450: 0.055892378091812134
Loss at step 500: 0.0512239933013916
Loss at step 550: 0.04450003057718277
Loss at step 600: 0.04901410639286041
Loss at step 650: 0.05313221737742424
Loss at step 700: 0.05558817461133003
Loss at step 750: 0.0535665825009346
Loss at step 800: 0.08542494475841522
Loss at step 850: 0.05859728157520294
Loss at step 900: 0.06013469025492668
Mean training loss after epoch 13: 0.05281311084530247
EPOCH: 14
Loss at step 0: 0.0647217184305191
Loss at step 50: 0.05076614022254944
Loss at step 100: 0.03850740194320679
Loss at step 150: 0.055288080126047134
Loss at step 200: 0.040912121534347534
Loss at step 250: 0.04788177087903023
Loss at step 300: 0.06873010843992233
Loss at step 350: 0.05255446210503578
Loss at step 400: 0.0451323427259922
Loss at step 450: 0.04819340631365776
Loss at step 500: 0.059297483414411545
Loss at step 550: 0.045939523726701736
Loss at step 600: 0.06561025977134705
Loss at step 650: 0.04447270184755325
Loss at step 700: 0.06097499653697014
Loss at step 750: 0.059666190296411514
Loss at step 800: 0.04177537560462952
Loss at step 850: 0.0449681282043457
Loss at step 900: 0.05002983286976814
Mean training loss after epoch 14: 0.052944248951256656
EPOCH: 15
Loss at step 0: 0.037823475897312164
Loss at step 50: 0.07952471077442169
Loss at step 100: 0.04094187170267105
Loss at step 150: 0.060179922729730606
Loss at step 200: 0.04950674995779991
Loss at step 250: 0.06051012873649597
Loss at step 300: 0.051027752459049225
Loss at step 350: 0.08003382384777069
Loss at step 400: 0.054900161921978
Loss at step 450: 0.03677508607506752
Loss at step 500: 0.07743576914072037
Loss at step 550: 0.05444803088903427
Loss at step 600: 0.05422133579850197
Loss at step 650: 0.04789598658680916
Loss at step 700: 0.043794967234134674
Loss at step 750: 0.04442184790968895
Loss at step 800: 0.04888957738876343
Loss at step 850: 0.03554798662662506
Loss at step 900: 0.054901354014873505
Mean training loss after epoch 15: 0.05222853389916135
EPOCH: 16
Loss at step 0: 0.04499037191271782
Loss at step 50: 0.04059568792581558
Loss at step 100: 0.03880501538515091
Loss at step 150: 0.04736030101776123
Loss at step 200: 0.054259732365608215
Loss at step 250: 0.056340526789426804
Loss at step 300: 0.04246404767036438
Loss at step 350: 0.05467449873685837
Loss at step 400: 0.038072679191827774
Loss at step 450: 0.05155707895755768
Loss at step 500: 0.04925785958766937
Loss at step 550: 0.04731006547808647
Loss at step 600: 0.05020074173808098
Loss at step 650: 0.04935452342033386
Loss at step 700: 0.05027550831437111
Loss at step 750: 0.04625604674220085
Loss at step 800: 0.07112246006727219
Loss at step 850: 0.04084265977144241
Loss at step 900: 0.05504539981484413
Mean training loss after epoch 16: 0.05157376733670103
EPOCH: 17
Loss at step 0: 0.07536608725786209
Loss at step 50: 0.042563579976558685
Loss at step 100: 0.035157352685928345
Loss at step 150: 0.05038196220993996
Loss at step 200: 0.049875665456056595
Loss at step 250: 0.03781870752573013
Loss at step 300: 0.056721512228250504
Loss at step 350: 0.04967416077852249
Loss at step 400: 0.03947357088327408
Loss at step 450: 0.053190600126981735
Loss at step 500: 0.0426432341337204
Loss at step 550: 0.05922308564186096
Loss at step 600: 0.054390799254179
Loss at step 650: 0.08069872856140137
Loss at step 700: 0.04753753915429115
Loss at step 750: 0.036334265023469925
Loss at step 800: 0.04727853834629059
Loss at step 850: 0.0459805391728878
Loss at step 900: 0.0718865841627121
Mean training loss after epoch 17: 0.05112017348392813
EPOCH: 18
Loss at step 0: 0.04541482776403427
Loss at step 50: 0.06357073783874512
Loss at step 100: 0.05133732780814171
Loss at step 150: 0.048947181552648544
Loss at step 200: 0.04920993372797966
Loss at step 250: 0.07129635661840439
Loss at step 300: 0.041890259832143784
Loss at step 350: 0.04461813345551491
Loss at step 400: 0.06786632537841797
Loss at step 450: 0.03864361345767975
Loss at step 500: 0.05946999043226242
Loss at step 550: 0.047408659011125565
Loss at step 600: 0.04371762275695801
Loss at step 650: 0.05137581750750542
Loss at step 700: 0.039694201201200485
Loss at step 750: 0.0421057753264904
Loss at step 800: 0.056266531348228455
Loss at step 850: 0.04970719292759895
Loss at step 900: 0.04250839352607727
Mean training loss after epoch 18: 0.05031145928002624
EPOCH: 19
Loss at step 0: 0.04314997047185898
Loss at step 50: 0.041050687432289124
Loss at step 100: 0.04350142553448677
Loss at step 150: 0.04735090211033821
Loss at step 200: 0.042304959148168564
Loss at step 250: 0.0446118600666523
Loss at step 300: 0.057678062468767166
Loss at step 350: 0.05146310478448868
Loss at step 400: 0.04377828165888786
Loss at step 450: 0.03493072837591171
Loss at step 500: 0.06859597563743591
Loss at step 550: 0.046260152012109756
Loss at step 600: 0.043077658861875534
Loss at step 650: 0.055410489439964294
Loss at step 700: 0.05613521486520767
Loss at step 750: 0.051349785178899765
Loss at step 800: 0.04863078147172928
Loss at step 850: 0.03905788064002991
Loss at step 900: 0.050505876541137695
Mean training loss after epoch 19: 0.04981041669662891
EPOCH: 20
Loss at step 0: 0.05344609543681145
Loss at step 50: 0.04180081561207771
Loss at step 100: 0.04135894775390625
Loss at step 150: 0.04881883040070534
Loss at step 200: 0.05157637596130371
Loss at step 250: 0.06887549161911011
Loss at step 300: 0.04278310760855675
Loss at step 350: 0.05660872906446457
Loss at step 400: 0.04872731491923332
Loss at step 450: 0.04364878311753273
Loss at step 500: 0.06668026745319366
Loss at step 550: 0.046359237283468246
Loss at step 600: 0.03858613967895508
Loss at step 650: 0.0463811419904232
Loss at step 700: 0.0656844824552536
Loss at step 750: 0.04059052839875221
Loss at step 800: 0.04606105759739876
Loss at step 850: 0.04518028721213341
Loss at step 900: 0.0448128879070282
Mean training loss after epoch 20: 0.04971912971088119
EPOCH: 21
Loss at step 0: 0.04147127643227577
Loss at step 50: 0.06674125045537949
Loss at step 100: 0.046588215976953506
Loss at step 150: 0.04992952197790146
Loss at step 200: 0.05683210864663124
Loss at step 250: 0.06446004658937454
Loss at step 300: 0.07351561635732651
Loss at step 350: 0.048821087926626205
Loss at step 400: 0.06500155478715897
Loss at step 450: 0.045991450548172
Loss at step 500: 0.04785887897014618
Loss at step 550: 0.048360276967287064
Loss at step 600: 0.06245497241616249
Loss at step 650: 0.04141472652554512
Loss at step 700: 0.04987093433737755
Loss at step 750: 0.0448041595518589
Loss at step 800: 0.039200447499752045
Loss at step 850: 0.061862435191869736
Loss at step 900: 0.0374799408018589
Mean training loss after epoch 21: 0.04948336795083622
EPOCH: 22
Loss at step 0: 0.04231114313006401
Loss at step 50: 0.051757242530584335
Loss at step 100: 0.052523884922266006
Loss at step 150: 0.04511893540620804
Loss at step 200: 0.047847066074609756
Loss at step 250: 0.03973916918039322
Loss at step 300: 0.04602658003568649
Loss at step 350: 0.04559744521975517
Loss at step 400: 0.038105208426713943
Loss at step 450: 0.04447225108742714
Loss at step 500: 0.052992742508649826
Loss at step 550: 0.04886539652943611
Loss at step 600: 0.04174038767814636
Loss at step 650: 0.0403764434158802
Loss at step 700: 0.037883441895246506
Loss at step 750: 0.04011991620063782
Loss at step 800: 0.04163298383355141
Loss at step 850: 0.0440143346786499
Loss at step 900: 0.04741881787776947
Mean training loss after epoch 22: 0.04886792954812045
EPOCH: 23
Loss at step 0: 0.043948426842689514
Loss at step 50: 0.04617948457598686
Loss at step 100: 0.041180990636348724
Loss at step 150: 0.07108350843191147
Loss at step 200: 0.05402369797229767
Loss at step 250: 0.04757066071033478
Loss at step 300: 0.04691469669342041
Loss at step 350: 0.056709371507167816
Loss at step 400: 0.08028485625982285
Loss at step 450: 0.03732737526297569
Loss at step 500: 0.0625799298286438
Loss at step 550: 0.06315683573484421
Loss at step 600: 0.06575877964496613
Loss at step 650: 0.03729895502328873
Loss at step 700: 0.04726678505539894
Loss at step 750: 0.052553772926330566
Loss at step 800: 0.04982905834913254
Loss at step 850: 0.04016149044036865
Loss at step 900: 0.0371665395796299
Mean training loss after epoch 23: 0.04921432044793929
EPOCH: 24
Loss at step 0: 0.06877944618463516
Loss at step 50: 0.04670701548457146
Loss at step 100: 0.05103413760662079
Loss at step 150: 0.04245966300368309
Loss at step 200: 0.0422213077545166
Loss at step 250: 0.05819988250732422
Loss at step 300: 0.03808581456542015
Loss at step 350: 0.044201094657182693
Loss at step 400: 0.04023940488696098
Loss at step 450: 0.03730885311961174
Loss at step 500: 0.0679292157292366
Loss at step 550: 0.04315111041069031
Loss at step 600: 0.040241699665784836
Loss at step 650: 0.048743754625320435
Loss at step 700: 0.06133408471941948
Loss at step 750: 0.035759568214416504
Loss at step 800: 0.07543523609638214
Loss at step 850: 0.041768599301576614
Loss at step 900: 0.0580260194838047
Mean training loss after epoch 24: 0.04907009113572045
EPOCH: 25
Loss at step 0: 0.04890033230185509
Loss at step 50: 0.04610893502831459
Loss at step 100: 0.062320057302713394
Loss at step 150: 0.04533512517809868
Loss at step 200: 0.04547395929694176
Loss at step 250: 0.05030956491827965
Loss at step 300: 0.06314963847398758
Loss at step 350: 0.04957621917128563
Loss at step 400: 0.04225068539381027
Loss at step 450: 0.060565605759620667
Loss at step 500: 0.04107939824461937
Loss at step 550: 0.04405638575553894
Loss at step 600: 0.035555195063352585
Loss at step 650: 0.04223763942718506
Loss at step 700: 0.05385289713740349
Loss at step 750: 0.04406140744686127
Loss at step 800: 0.04753494635224342
Loss at step 850: 0.03908270597457886
Loss at step 900: 0.056173697113990784
Mean training loss after epoch 25: 0.04854819401185205
EPOCH: 26
Loss at step 0: 0.04513881728053093
Loss at step 50: 0.04251381754875183
Loss at step 100: 0.056283511221408844
Loss at step 150: 0.03885107487440109
Loss at step 200: 0.04387012869119644
Loss at step 250: 0.08682840317487717
Loss at step 300: 0.042384058237075806
Loss at step 350: 0.04147539287805557
Loss at step 400: 0.037699781358242035
Loss at step 450: 0.08088959008455276
Loss at step 500: 0.045566800981760025
Loss at step 550: 0.04850791022181511
Loss at step 600: 0.03581390157341957
Loss at step 650: 0.036787454038858414
Loss at step 700: 0.04770814999938011
Loss at step 750: 0.044783566147089005
Loss at step 800: 0.031522780656814575
Loss at step 850: 0.04192390292882919
Loss at step 900: 0.053765568882226944
Mean training loss after epoch 26: 0.048748386362547684
EPOCH: 27
Loss at step 0: 0.062231797724962234
Loss at step 50: 0.05955164507031441
Loss at step 100: 0.039682600647211075
Loss at step 150: 0.06752344220876694
Loss at step 200: 0.061354734003543854
Loss at step 250: 0.037967052310705185
Loss at step 300: 0.04776729643344879
Loss at step 350: 0.0504353865981102
Loss at step 400: 0.04052473604679108
Loss at step 450: 0.05573800951242447
Loss at step 500: 0.04068378359079361
Loss at step 550: 0.03938471898436546
Loss at step 600: 0.04409739375114441
Loss at step 650: 0.06390330195426941
Loss at step 700: 0.04190912842750549
Loss at step 750: 0.04705492779612541
Loss at step 800: 0.05942341312766075
Loss at step 850: 0.054906755685806274
Loss at step 900: 0.04091959819197655
Mean training loss after epoch 27: 0.04785118063590102
EPOCH: 28
Loss at step 0: 0.040065884590148926
Loss at step 50: 0.041816335171461105
Loss at step 100: 0.04768802598118782
Loss at step 150: 0.04224841296672821
Loss at step 200: 0.0600610077381134
Loss at step 250: 0.04674086347222328
Loss at step 300: 0.03749881684780121
Loss at step 350: 0.05725927650928497
Loss at step 400: 0.03463057056069374
Loss at step 450: 0.04139943793416023
Loss at step 500: 0.036785632371902466
Loss at step 550: 0.037280261516571045
Loss at step 600: 0.042486049234867096
Loss at step 650: 0.039891812950372696
Loss at step 700: 0.045055538415908813
Loss at step 750: 0.044806648045778275
Loss at step 800: 0.035296935588121414
Loss at step 850: 0.03997200354933739
Loss at step 900: 0.037372902035713196
Mean training loss after epoch 28: 0.04802111568433771
EPOCH: 29
Loss at step 0: 0.04029547795653343
Loss at step 50: 0.0373159721493721
Loss at step 100: 0.03195224702358246
Loss at step 150: 0.03967324644327164
Loss at step 200: 0.043493278324604034
Loss at step 250: 0.0421735905110836
Loss at step 300: 0.05834754556417465
Loss at step 350: 0.04977051913738251
Loss at step 400: 0.04890020564198494
Loss at step 450: 0.06144270300865173
Loss at step 500: 0.031416650861501694
Loss at step 550: 0.05089873448014259
Loss at step 600: 0.0559379979968071
Loss at step 650: 0.0431060828268528
Loss at step 700: 0.07970152795314789
Loss at step 750: 0.056120067834854126
Loss at step 800: 0.03743378818035126
Loss at step 850: 0.04540432617068291
Loss at step 900: 0.03750884160399437
Mean training loss after epoch 29: 0.0479990868275163
EPOCH: 30
Loss at step 0: 0.06987270712852478
Loss at step 50: 0.04634237289428711
Loss at step 100: 0.04522722586989403
Loss at step 150: 0.037521373480558395
Loss at step 200: 0.04052366316318512
Loss at step 250: 0.047215189784765244
Loss at step 300: 0.03429022431373596
Loss at step 350: 0.042049288749694824
Loss at step 400: 0.04067468270659447
Loss at step 450: 0.04121248796582222
Loss at step 500: 0.041117824614048004
Loss at step 550: 0.04192543774843216
Loss at step 600: 0.0396721288561821
Loss at step 650: 0.0341549776494503
Loss at step 700: 0.0468466654419899
Loss at step 750: 0.047854892909526825
Loss at step 800: 0.04301496222615242
Loss at step 850: 0.06360778212547302
Loss at step 900: 0.04396238178014755
Mean training loss after epoch 30: 0.047711722947943055
EPOCH: 31
Loss at step 0: 0.046192727982997894
Loss at step 50: 0.056457825005054474
Loss at step 100: 0.04437893256545067
Loss at step 150: 0.03571672737598419
Loss at step 200: 0.041461531072854996
Loss at step 250: 0.038135726004838943
Loss at step 300: 0.04517757520079613
Loss at step 350: 0.03788148984313011
Loss at step 400: 0.04530879482626915
Loss at step 450: 0.04953639209270477
Loss at step 500: 0.04382310435175896
Loss at step 550: 0.04811768978834152
Loss at step 600: 0.05481934919953346
Loss at step 650: 0.04873424395918846
Loss at step 700: 0.03968692570924759
Loss at step 750: 0.0569743812084198
Loss at step 800: 0.034903742372989655
Loss at step 850: 0.041342843323946
Loss at step 900: 0.04470614343881607
Mean training loss after epoch 31: 0.04718852380135738
EPOCH: 32
Loss at step 0: 0.04153997823596001
Loss at step 50: 0.05805188789963722
Loss at step 100: 0.041979819536209106
Loss at step 150: 0.056203462183475494
Loss at step 200: 0.0658581554889679
Loss at step 250: 0.047670237720012665
Loss at step 300: 0.04966076835989952
Loss at step 350: 0.03970366343855858
Loss at step 400: 0.04131742939352989
Loss at step 450: 0.0559941828250885
Loss at step 500: 0.0515303798019886
Loss at step 550: 0.03879852220416069
Loss at step 600: 0.04554179310798645
Loss at step 650: 0.041446492075920105
Loss at step 700: 0.045973826199769974
Loss at step 750: 0.05146137624979019
Loss at step 800: 0.050301820039749146
Loss at step 850: 0.05406748503446579
Loss at step 900: 0.047541599720716476
Mean training loss after epoch 32: 0.04734661062555844
EPOCH: 33
Loss at step 0: 0.04255426675081253
Loss at step 50: 0.045197855681180954
Loss at step 100: 0.04791636765003204
Loss at step 150: 0.053649891167879105
Loss at step 200: 0.04649053514003754
Loss at step 250: 0.06254279613494873
Loss at step 300: 0.04070403426885605
Loss at step 350: 0.06130688264966011
Loss at step 400: 0.033172477036714554
Loss at step 450: 0.0585973784327507
Loss at step 500: 0.03862085938453674
Loss at step 550: 0.05066192150115967
Loss at step 600: 0.04806030914187431
Loss at step 650: 0.040029771625995636
Loss at step 700: 0.05965670570731163
Loss at step 750: 0.04943963512778282
Loss at step 800: 0.04053017124533653
Loss at step 850: 0.0498194620013237
Loss at step 900: 0.0408606119453907
Mean training loss after epoch 33: 0.0466234264577598
EPOCH: 34
Loss at step 0: 0.03834282234311104
Loss at step 50: 0.04476098716259003
Loss at step 100: 0.03126645088195801
Loss at step 150: 0.04436739161610603
Loss at step 200: 0.035322658717632294
Loss at step 250: 0.04000015929341316
Loss at step 300: 0.03503677248954773
Loss at step 350: 0.036471109837293625
Loss at step 400: 0.041077565401792526
Loss at step 450: 0.05009006708860397
Loss at step 500: 0.03923311457037926
Loss at step 550: 0.027845796197652817
Loss at step 600: 0.044510193169116974
Loss at step 650: 0.04472494497895241
Loss at step 700: 0.036923471838235855
Loss at step 750: 0.03947778046131134
Loss at step 800: 0.04731610417366028
Loss at step 850: 0.0408109687268734
Loss at step 900: 0.04986851289868355
Mean training loss after epoch 34: 0.046696712082224104
EPOCH: 35
Loss at step 0: 0.05347377434372902
Loss at step 50: 0.04718519747257233
Loss at step 100: 0.03765302151441574
Loss at step 150: 0.04094993695616722
Loss at step 200: 0.03967397287487984
Loss at step 250: 0.04097428172826767
Loss at step 300: 0.044033098965883255
Loss at step 350: 0.04170210286974907
Loss at step 400: 0.0495026521384716
Loss at step 450: 0.04580632597208023
Loss at step 500: 0.03558526188135147
Loss at step 550: 0.03231578320264816
Loss at step 600: 0.03808858245611191
Loss at step 650: 0.07824570685625076
Loss at step 700: 0.054205626249313354
Loss at step 750: 0.04045689105987549
Loss at step 800: 0.05172043293714523
Loss at step 850: 0.04387115314602852
Loss at step 900: 0.041183892637491226
Mean training loss after epoch 35: 0.046853596549123716
EPOCH: 36
Loss at step 0: 0.033817023038864136
Loss at step 50: 0.044775836169719696
Loss at step 100: 0.04400419443845749
Loss at step 150: 0.03964809328317642
Loss at step 200: 0.0711473822593689
Loss at step 250: 0.05198086053133011
Loss at step 300: 0.04352946951985359
Loss at step 350: 0.04083481431007385
Loss at step 400: 0.05258826166391373
Loss at step 450: 0.05412430688738823
Loss at step 500: 0.036205705255270004
Loss at step 550: 0.051293227821588516
Loss at step 600: 0.039234597235918045
Loss at step 650: 0.04175081476569176
Loss at step 700: 0.044233884662389755
Loss at step 750: 0.047170866280794144
Loss at step 800: 0.06287672370672226
Loss at step 850: 0.045048587024211884
Loss at step 900: 0.047904081642627716
Mean training loss after epoch 36: 0.04643794561007511
EPOCH: 37
Loss at step 0: 0.04380636289715767
Loss at step 50: 0.05075736716389656
Loss at step 100: 0.03939420357346535
Loss at step 150: 0.041874587535858154
Loss at step 200: 0.03474506363272667
Loss at step 250: 0.04335370287299156
Loss at step 300: 0.03736516833305359
Loss at step 350: 0.04953424260020256
Loss at step 400: 0.034158531576395035
Loss at step 450: 0.09222576022148132
Loss at step 500: 0.03409889340400696
Loss at step 550: 0.05518524721264839
Loss at step 600: 0.04299849644303322
Loss at step 650: 0.03976324200630188
Loss at step 700: 0.04704044759273529
Loss at step 750: 0.05765819549560547
Loss at step 800: 0.06325271725654602
Loss at step 850: 0.041912227869033813
Loss at step 900: 0.03947821259498596
Mean training loss after epoch 37: 0.046509189958146006
EPOCH: 38
Loss at step 0: 0.04105685278773308
Loss at step 50: 0.03853827714920044
Loss at step 100: 0.0615091510117054
Loss at step 150: 0.037095218896865845
Loss at step 200: 0.07169077545404434
Loss at step 250: 0.04093547165393829
Loss at step 300: 0.03914790228009224
Loss at step 350: 0.041914209723472595
Loss at step 400: 0.04600575566291809
Loss at step 450: 0.04167331010103226
Loss at step 500: 0.05215302109718323
Loss at step 550: 0.057997964322566986
Loss at step 600: 0.04561407491564751
Loss at step 650: 0.035688042640686035
Loss at step 700: 0.041937340050935745
Loss at step 750: 0.042008861899375916
Loss at step 800: 0.043819013983011246
Loss at step 850: 0.06488332897424698
Loss at step 900: 0.04783814772963524
Mean training loss after epoch 38: 0.04656100857343628
EPOCH: 39
Loss at step 0: 0.046811822801828384
Loss at step 50: 0.03458044305443764
Loss at step 100: 0.03519456088542938
Loss at step 150: 0.04023684188723564
Loss at step 200: 0.0593542717397213
Loss at step 250: 0.04351789876818657
Loss at step 300: 0.03666255623102188
Loss at step 350: 0.042385295033454895
Loss at step 400: 0.04535805433988571
Loss at step 450: 0.040133584290742874
Loss at step 500: 0.04300827905535698
Loss at step 550: 0.03428584709763527
Loss at step 600: 0.043937813490629196
Loss at step 650: 0.04107518866658211
Loss at step 700: 0.0656203031539917
Loss at step 750: 0.04333982616662979
Loss at step 800: 0.04911601543426514
Loss at step 850: 0.04030092433094978
Loss at step 900: 0.037120744585990906
Mean training loss after epoch 39: 0.045584303543352876
EPOCH: 40
Loss at step 0: 0.043902307748794556
Loss at step 50: 0.04687623307108879
Loss at step 100: 0.02807818166911602
Loss at step 150: 0.0314083956182003
Loss at step 200: 0.03794383257627487
Loss at step 250: 0.06447502225637436
Loss at step 300: 0.03832196444272995
Loss at step 350: 0.04955507069826126
Loss at step 400: 0.04458783194422722
Loss at step 450: 0.06182854250073433
Loss at step 500: 0.034158360213041306
Loss at step 550: 0.05126744136214256
Loss at step 600: 0.04173079878091812
Loss at step 650: 0.047507818788290024
Loss at step 700: 0.039553675800561905
Loss at step 750: 0.036442361772060394
Loss at step 800: 0.03931163251399994
Loss at step 850: 0.03706500306725502
Loss at step 900: 0.041083257645368576
Mean training loss after epoch 40: 0.04583525314712639
EPOCH: 41
Loss at step 0: 0.04119272530078888
Loss at step 50: 0.03651197627186775
Loss at step 100: 0.04471709579229355
Loss at step 150: 0.047409866005182266
Loss at step 200: 0.057044386863708496
Loss at step 250: 0.03891262039542198
Loss at step 300: 0.058914415538311005
Loss at step 350: 0.04932182654738426
Loss at step 400: 0.03527601808309555
Loss at step 450: 0.05954860895872116
Loss at step 500: 0.05854950472712517
Loss at step 550: 0.04339790716767311
Loss at step 600: 0.04158823564648628
Loss at step 650: 0.07653175294399261
Loss at step 700: 0.06495469808578491
Loss at step 750: 0.043791264295578
Loss at step 800: 0.036664631217718124
Loss at step 850: 0.04060354083776474
Loss at step 900: 0.05105862393975258
Mean training loss after epoch 41: 0.04578510606919588
EPOCH: 42
Loss at step 0: 0.036811619997024536
Loss at step 50: 0.03967383876442909
Loss at step 100: 0.046618491411209106
Loss at step 150: 0.04137549176812172
Loss at step 200: 0.04536155238747597
Loss at step 250: 0.0344829335808754
Loss at step 300: 0.04808695986866951
Loss at step 350: 0.034518640488386154
Loss at step 400: 0.040801674127578735
Loss at step 450: 0.04482355713844299
Loss at step 500: 0.05345488339662552
Loss at step 550: 0.042088404297828674
Loss at step 600: 0.03969200327992439
Loss at step 650: 0.03745196387171745
Loss at step 700: 0.05043499544262886
Loss at step 750: 0.03882772848010063
Loss at step 800: 0.040638964623212814
Loss at step 850: 0.059426199644804
Loss at step 900: 0.055314578115940094
Mean training loss after epoch 42: 0.0457447647595647
EPOCH: 43
Loss at step 0: 0.039296090602874756
Loss at step 50: 0.03757215291261673
Loss at step 100: 0.045208077877759933
Loss at step 150: 0.05504849553108215
Loss at step 200: 0.03858547285199165
Loss at step 250: 0.052830155938863754
Loss at step 300: 0.03214762732386589
Loss at step 350: 0.054610736668109894
Loss at step 400: 0.072967529296875
Loss at step 450: 0.045324429869651794
Loss at step 500: 0.05956553295254707
Loss at step 550: 0.040348950773477554
Loss at step 600: 0.038659922778606415
Loss at step 650: 0.04288933053612709
Loss at step 700: 0.05102803558111191
Loss at step 750: 0.04348177835345268
Loss at step 800: 0.03679901733994484
Loss at step 850: 0.04044809564948082
Loss at step 900: 0.05503632500767708
Mean training loss after epoch 43: 0.04555305257550816
EPOCH: 44
Loss at step 0: 0.03301229327917099
Loss at step 50: 0.03727089241147041
Loss at step 100: 0.042083825916051865
Loss at step 150: 0.03446154296398163
Loss at step 200: 0.0325627475976944
Loss at step 250: 0.03536694869399071
Loss at step 300: 0.03226042538881302
Loss at step 350: 0.029795240610837936
Loss at step 400: 0.06270813196897507
Loss at step 450: 0.052609071135520935
Loss at step 500: 0.04123945161700249
Loss at step 550: 0.04485645145177841
Loss at step 600: 0.03443359211087227
Loss at step 650: 0.05156237632036209
Loss at step 700: 0.05249813199043274
Loss at step 750: 0.04118892922997475
Loss at step 800: 0.042128149420022964
Loss at step 850: 0.05222824215888977
Loss at step 900: 0.04898892343044281
Mean training loss after epoch 44: 0.045909754842567416
EPOCH: 45
Loss at step 0: 0.03726803511381149
Loss at step 50: 0.041821129620075226
Loss at step 100: 0.04788229241967201
Loss at step 150: 0.07195736467838287
Loss at step 200: 0.04190758243203163
Loss at step 250: 0.04406964033842087
Loss at step 300: 0.045066285878419876
Loss at step 350: 0.08542950451374054
Loss at step 400: 0.036356471478939056
Loss at step 450: 0.05726455897092819
Loss at step 500: 0.04086785390973091
Loss at step 550: 0.04460451379418373
Loss at step 600: 0.04124069958925247
Loss at step 650: 0.03595242276787758
Loss at step 700: 0.05711403861641884
Loss at step 750: 0.04426044225692749
Loss at step 800: 0.03155608847737312
Loss at step 850: 0.04087338596582413
Loss at step 900: 0.03957609087228775
Mean training loss after epoch 45: 0.04559232201625798
EPOCH: 46
Loss at step 0: 0.03411095589399338
Loss at step 50: 0.04670671746134758
Loss at step 100: 0.05237060412764549
Loss at step 150: 0.036262497305870056
Loss at step 200: 0.057636260986328125
Loss at step 250: 0.04624015465378761
Loss at step 300: 0.0441092886030674
Loss at step 350: 0.053948499262332916
Loss at step 400: 0.0504290908575058
Loss at step 450: 0.058462247252464294
Loss at step 500: 0.03294298052787781
Loss at step 550: 0.03885740414261818
Loss at step 600: 0.03582470864057541
Loss at step 650: 0.03290253505110741
Loss at step 700: 0.034320082515478134
Loss at step 750: 0.04563366621732712
Loss at step 800: 0.037790633738040924
Loss at step 850: 0.04004897549748421
Loss at step 900: 0.05734499916434288
Mean training loss after epoch 46: 0.045401750487892995
EPOCH: 47
Loss at step 0: 0.057515017688274384
Loss at step 50: 0.05918058380484581
Loss at step 100: 0.06794729083776474
Loss at step 150: 0.04365697130560875
Loss at step 200: 0.03533811867237091
Loss at step 250: 0.0432097464799881
Loss at step 300: 0.04337077960371971
Loss at step 350: 0.05455371364951134
Loss at step 400: 0.0480773001909256
Loss at step 450: 0.03549404814839363
Loss at step 500: 0.04109460487961769
Loss at step 550: 0.05278811976313591
Loss at step 600: 0.044051751494407654
Loss at step 650: 0.03891625255346298
Loss at step 700: 0.033852893859148026
Loss at step 750: 0.05946379527449608
Loss at step 800: 0.03839148208498955
Loss at step 850: 0.05594377964735031
Loss at step 900: 0.04837467148900032
Mean training loss after epoch 47: 0.04540569746672218
EPOCH: 48
Loss at step 0: 0.03606181964278221
Loss at step 50: 0.0393998920917511
Loss at step 100: 0.0373966284096241
Loss at step 150: 0.03726385161280632
Loss at step 200: 0.053862374275922775
Loss at step 250: 0.039281781762838364
Loss at step 300: 0.052047569304704666
Loss at step 350: 0.04218338802456856
Loss at step 400: 0.054648831486701965
Loss at step 450: 0.03355717286467552
Loss at step 500: 0.04461154714226723
Loss at step 550: 0.05380718782544136
Loss at step 600: 0.03750714287161827
Loss at step 650: 0.04356677457690239
Loss at step 700: 0.056141939014196396
Loss at step 750: 0.05722740665078163
Loss at step 800: 0.05378589779138565
Loss at step 850: 0.04318094253540039
Loss at step 900: 0.03716384992003441
Mean training loss after epoch 48: 0.04575384488261775
EPOCH: 49
Loss at step 0: 0.03405068814754486
Loss at step 50: 0.0324208065867424
Loss at step 100: 0.058766111731529236
Loss at step 150: 0.050890326499938965
Loss at step 200: 0.07835625112056732
Loss at step 250: 0.06103396788239479
Loss at step 300: 0.045262910425662994
Loss at step 350: 0.039874423295259476
Loss at step 400: 0.04232581704854965
Loss at step 450: 0.0406830869615078
Loss at step 500: 0.034022778272628784
Loss at step 550: 0.042880669236183167
Loss at step 600: 0.04260297492146492
Loss at step 650: 0.03611469268798828
Loss at step 700: 0.04044531658291817
Loss at step 750: 0.04906051978468895
Loss at step 800: 0.039296798408031464
Loss at step 850: 0.07345636188983917
Loss at step 900: 0.03570806607604027
Mean training loss after epoch 49: 0.04555136628591938
EPOCH: 50
Loss at step 0: 0.03376990929245949
Loss at step 50: 0.03915917128324509
Loss at step 100: 0.045188069343566895
Loss at step 150: 0.03396744653582573
Loss at step 200: 0.037946101278066635
Loss at step 250: 0.048230335116386414
Loss at step 300: 0.03957950696349144
Loss at step 350: 0.06345433741807938
Loss at step 400: 0.05446555092930794
Loss at step 450: 0.041967470198869705
Loss at step 500: 0.06003414839506149
Loss at step 550: 0.04475271329283714
Loss at step 600: 0.0462694950401783
Loss at step 650: 0.03884352743625641
Loss at step 700: 0.03437794744968414
Loss at step 750: 0.03485594317317009
Loss at step 800: 0.03861059248447418
Loss at step 850: 0.04278019815683365
Loss at step 900: 0.036629848182201385
Mean training loss after epoch 50: 0.04574327027675376
EPOCH: 51
Loss at step 0: 0.04773973673582077
Loss at step 50: 0.04592078924179077
Loss at step 100: 0.04153735190629959
Loss at step 150: 0.04289252310991287
Loss at step 200: 0.05378266051411629
Loss at step 250: 0.05587004870176315
Loss at step 300: 0.04163934290409088
Loss at step 350: 0.0363738052546978
Loss at step 400: 0.037308674305677414
Loss at step 450: 0.03993014246225357
Loss at step 500: 0.038397662341594696
Loss at step 550: 0.03834199905395508
Loss at step 600: 0.04541454836726189
Loss at step 650: 0.03435764089226723
Loss at step 700: 0.049492184072732925
Loss at step 750: 0.06151849403977394
Loss at step 800: 0.04823997616767883
Loss at step 850: 0.05454571172595024
Loss at step 900: 0.05095576122403145
Mean training loss after epoch 51: 0.04479894658792883
EPOCH: 52
Loss at step 0: 0.05793466791510582
Loss at step 50: 0.0341513492166996
Loss at step 100: 0.031205492094159126
Loss at step 150: 0.04038567468523979
Loss at step 200: 0.04924270883202553
Loss at step 250: 0.03751857206225395
Loss at step 300: 0.03824424743652344
Loss at step 350: 0.03565317019820213
Loss at step 400: 0.03829163685441017
Loss at step 450: 0.04630826786160469
Loss at step 500: 0.036059457808732986
Loss at step 550: 0.0380818210542202
Loss at step 600: 0.05621505156159401
Loss at step 650: 0.04004345089197159
Loss at step 700: 0.04257235303521156
Loss at step 750: 0.050609856843948364
Loss at step 800: 0.03490433469414711
Loss at step 850: 0.06108823046088219
Loss at step 900: 0.03759215399622917
Mean training loss after epoch 52: 0.04466682242781623
EPOCH: 53
Loss at step 0: 0.07153818756341934
Loss at step 50: 0.037624310702085495
Loss at step 100: 0.04795879125595093
Loss at step 150: 0.042059168219566345
Loss at step 200: 0.04592517390847206
Loss at step 250: 0.047940898686647415
Loss at step 300: 0.03760400414466858
Loss at step 350: 0.058457329869270325
Loss at step 400: 0.04059361293911934
Loss at step 450: 0.03498782962560654
Loss at step 500: 0.03895006701350212
Loss at step 550: 0.03143053501844406
Loss at step 600: 0.04286786541342735
Loss at step 650: 0.02890370972454548
Loss at step 700: 0.03054078109562397
Loss at step 750: 0.0536319725215435
Loss at step 800: 0.03379025310277939
Loss at step 850: 0.041429825127124786
Loss at step 900: 0.036586616188287735
Mean training loss after epoch 53: 0.045109023038210516
EPOCH: 54
Loss at step 0: 0.034632205963134766
Loss at step 50: 0.050900544971227646
Loss at step 100: 0.055058639496564865
Loss at step 150: 0.03736027702689171
Loss at step 200: 0.03886985033750534
Loss at step 250: 0.04547104611992836
Loss at step 300: 0.03748707473278046
Loss at step 350: 0.04005163908004761
Loss at step 400: 0.04667902737855911
Loss at step 450: 0.058137696236371994
Loss at step 500: 0.036225397139787674
Loss at step 550: 0.04980180785059929
Loss at step 600: 0.0319833941757679
Loss at step 650: 0.04132211208343506
Loss at step 700: 0.031180264428257942
Loss at step 750: 0.055976614356040955
Loss at step 800: 0.040415141731500626
Loss at step 850: 0.04537496343255043
Loss at step 900: 0.06559032946825027
Mean training loss after epoch 54: 0.0447263324688842
EPOCH: 55
Loss at step 0: 0.03894679620862007
Loss at step 50: 0.055382825434207916
Loss at step 100: 0.05500679835677147
Loss at step 150: 0.03706794232130051
Loss at step 200: 0.03788752481341362
Loss at step 250: 0.044823382049798965
Loss at step 300: 0.04081863537430763
Loss at step 350: 0.051558658480644226
Loss at step 400: 0.04245118051767349
Loss at step 450: 0.032435957342386246
Loss at step 500: 0.043013881891965866
Loss at step 550: 0.06566750258207321
Loss at step 600: 0.03832860663533211
Loss at step 650: 0.04228802025318146
Loss at step 700: 0.06671218574047089
Loss at step 750: 0.03785625100135803
Loss at step 800: 0.036910444498062134
Loss at step 850: 0.0428953543305397
Loss at step 900: 0.04201735183596611
Mean training loss after epoch 55: 0.0454020186234067
EPOCH: 56
Loss at step 0: 0.04074004292488098
Loss at step 50: 0.037089236080646515
Loss at step 100: 0.038650039583444595
Loss at step 150: 0.0470503568649292
Loss at step 200: 0.0335656963288784
Loss at step 250: 0.05130336433649063
Loss at step 300: 0.041321609169244766
Loss at step 350: 0.05360228195786476
Loss at step 400: 0.054425470530986786
Loss at step 450: 0.034286580979824066
Loss at step 500: 0.05873318016529083
Loss at step 550: 0.04183107987046242
Loss at step 600: 0.04275026172399521
Loss at step 650: 0.036619555205106735
Loss at step 700: 0.04658738896250725
Loss at step 750: 0.044796522706747055
Loss at step 800: 0.06018178537487984
Loss at step 850: 0.06020748242735863
Loss at step 900: 0.04875624179840088
Mean training loss after epoch 56: 0.04514047994947573
EPOCH: 57
Loss at step 0: 0.037959903478622437
Loss at step 50: 0.045734651386737823
Loss at step 100: 0.03870757669210434
Loss at step 150: 0.053085923194885254
Loss at step 200: 0.043345555663108826
Loss at step 250: 0.04010351002216339
Loss at step 300: 0.036371856927871704
Loss at step 350: 0.037596240639686584
Loss at step 400: 0.03472841903567314
Loss at step 450: 0.043242860585451126
Loss at step 500: 0.04697151854634285
Loss at step 550: 0.03386232256889343
Loss at step 600: 0.05848577991127968
Loss at step 650: 0.037965212017297745
Loss at step 700: 0.0595490001142025
Loss at step 750: 0.07245443761348724
Loss at step 800: 0.04417521506547928
Loss at step 850: 0.03808566927909851
Loss at step 900: 0.040217895060777664
Mean training loss after epoch 57: 0.04452560655772686
EPOCH: 58
Loss at step 0: 0.04217497631907463
Loss at step 50: 0.04118611663579941
Loss at step 100: 0.03760745748877525
Loss at step 150: 0.03788227215409279
Loss at step 200: 0.036340679973363876
Loss at step 250: 0.044277723878622055
Loss at step 300: 0.07555166631937027
Loss at step 350: 0.039096567779779434
Loss at step 400: 0.0501922108232975
Loss at step 450: 0.04151548445224762
Loss at step 500: 0.04863559827208519
Loss at step 550: 0.04006786644458771
Loss at step 600: 0.03473867475986481
Loss at step 650: 0.03917982801795006
Loss at step 700: 0.038434259593486786
Loss at step 750: 0.04143731668591499
Loss at step 800: 0.04229950159788132
Loss at step 850: 0.04580509290099144
Loss at step 900: 0.04491458460688591
Mean training loss after epoch 58: 0.044204890307015195
EPOCH: 59
Loss at step 0: 0.04443920776247978
Loss at step 50: 0.038615334779024124
Loss at step 100: 0.044301483780145645
Loss at step 150: 0.04094228893518448
Loss at step 200: 0.039516597986221313
Loss at step 250: 0.04826976731419563
Loss at step 300: 0.06281120330095291
Loss at step 350: 0.04962306469678879
Loss at step 400: 0.0349610410630703
Loss at step 450: 0.04815901815891266
Loss at step 500: 0.04071445018053055
Loss at step 550: 0.03885610029101372
Loss at step 600: 0.04688568040728569
Loss at step 650: 0.03309250622987747
Loss at step 700: 0.0372539721429348
Loss at step 750: 0.06692992895841599
Loss at step 800: 0.05041303485631943
Loss at step 850: 0.049872804433107376
Loss at step 900: 0.036385852843523026
Mean training loss after epoch 59: 0.044467081796369955
EPOCH: 60
Loss at step 0: 0.04014415666460991
Loss at step 50: 0.08164425194263458
Loss at step 100: 0.039430782198905945
Loss at step 150: 0.03921978920698166
Loss at step 200: 0.04293350130319595
Loss at step 250: 0.04878218099474907
Loss at step 300: 0.039910636842250824
Loss at step 350: 0.042387500405311584
Loss at step 400: 0.056811921298503876
Loss at step 450: 0.03985873982310295
Loss at step 500: 0.046190857887268066
Loss at step 550: 0.04720006883144379
Loss at step 600: 0.04626326635479927
Loss at step 650: 0.05943738669157028
Loss at step 700: 0.07274050265550613
Loss at step 750: 0.03711295872926712
Loss at step 800: 0.04901483654975891
Loss at step 850: 0.05160197988152504
Loss at step 900: 0.04616200923919678
Mean training loss after epoch 60: 0.04484942184487131
EPOCH: 61
Loss at step 0: 0.034708376973867416
Loss at step 50: 0.04810373857617378
Loss at step 100: 0.031009318307042122
Loss at step 150: 0.04073278605937958
Loss at step 200: 0.037685640156269073
Loss at step 250: 0.03710269555449486
Loss at step 300: 0.03902563825249672
Loss at step 350: 0.03862561285495758
Loss at step 400: 0.04300009831786156
Loss at step 450: 0.059608832001686096
Loss at step 500: 0.041195113211870193
Loss at step 550: 0.05418688431382179
Loss at step 600: 0.054096776992082596
Loss at step 650: 0.040527116507291794
Loss at step 700: 0.05888514593243599
Loss at step 750: 0.053690355271101
Loss at step 800: 0.04354054108262062
Loss at step 850: 0.03404700756072998
Loss at step 900: 0.06319855898618698
Mean training loss after epoch 61: 0.04449913012924225
EPOCH: 62
Loss at step 0: 0.038908716291189194
Loss at step 50: 0.034311648458242416
Loss at step 100: 0.05498092994093895
Loss at step 150: 0.0483626127243042
Loss at step 200: 0.053626175969839096
Loss at step 250: 0.07578056305646896
Loss at step 300: 0.03906629979610443
Loss at step 350: 0.04471297562122345
Loss at step 400: 0.05710723251104355
Loss at step 450: 0.06131591647863388
Loss at step 500: 0.03654533997178078
Loss at step 550: 0.03932672366499901
Loss at step 600: 0.03606953099370003
Loss at step 650: 0.03935694694519043
Loss at step 700: 0.05097009614109993
Loss at step 750: 0.05328104645013809
Loss at step 800: 0.04115912318229675
Loss at step 850: 0.05281233787536621
Loss at step 900: 0.041887253522872925
Mean training loss after epoch 62: 0.043596246160630354
EPOCH: 63
Loss at step 0: 0.029015716165304184
Loss at step 50: 0.04414261877536774
Loss at step 100: 0.03649734705686569
Loss at step 150: 0.03882032260298729
Loss at step 200: 0.04342050477862358
Loss at step 250: 0.04559728503227234
Loss at step 300: 0.05425933748483658
Loss at step 350: 0.04868466034531593
Loss at step 400: 0.03153557702898979
Loss at step 450: 0.0450434684753418
Loss at step 500: 0.04320262745022774
Loss at step 550: 0.03585299476981163
Loss at step 600: 0.06846386194229126
Loss at step 650: 0.047943733632564545
Loss at step 700: 0.03418514505028725
Loss at step 750: 0.03347398713231087
Loss at step 800: 0.05686632916331291
Loss at step 850: 0.06825472414493561
Loss at step 900: 0.04557757452130318
Mean training loss after epoch 63: 0.04486468083052429
EPOCH: 64
Loss at step 0: 0.038345180451869965
Loss at step 50: 0.04878535494208336
Loss at step 100: 0.03994043916463852
Loss at step 150: 0.03193584829568863
Loss at step 200: 0.04250764101743698
Loss at step 250: 0.042187292128801346
Loss at step 300: 0.03934232145547867
Loss at step 350: 0.0445597879588604
Loss at step 400: 0.05384596064686775
Loss at step 450: 0.05301209166646004
Loss at step 500: 0.05442678555846214
Loss at step 550: 0.04261470213532448
Loss at step 600: 0.07980488240718842
Loss at step 650: 0.04727376252412796
Loss at step 700: 0.05658571049571037
Loss at step 750: 0.033086925745010376
Loss at step 800: 0.03404126316308975
Loss at step 850: 0.03186289966106415
Loss at step 900: 0.05820830911397934
Mean training loss after epoch 64: 0.04441411661973068
EPOCH: 65
Loss at step 0: 0.06885399669408798
Loss at step 50: 0.05599125474691391
Loss at step 100: 0.04768454283475876
Loss at step 150: 0.053254634141922
Loss at step 200: 0.03931909427046776
Loss at step 250: 0.036698125302791595
Loss at step 300: 0.05797281861305237
Loss at step 350: 0.04019729793071747
Loss at step 400: 0.04408475384116173
Loss at step 450: 0.04160203039646149
Loss at step 500: 0.04312726855278015
Loss at step 550: 0.03871629014611244
Loss at step 600: 0.03298605978488922
Loss at step 650: 0.036178793758153915
Loss at step 700: 0.05048339441418648
Loss at step 750: 0.05091601237654686
Loss at step 800: 0.04056558758020401
Loss at step 850: 0.03996409475803375
Loss at step 900: 0.05364915356040001
Mean training loss after epoch 65: 0.044274648685238636
EPOCH: 66
Loss at step 0: 0.04305730387568474
Loss at step 50: 0.03969784080982208
Loss at step 100: 0.035962287336587906
Loss at step 150: 0.041350722312927246
Loss at step 200: 0.04764682427048683
Loss at step 250: 0.052914198487997055
Loss at step 300: 0.052789609879255295
Loss at step 350: 0.04125749692320824
Loss at step 400: 0.05972205847501755
Loss at step 450: 0.03646932169795036
Loss at step 500: 0.04054495692253113
Loss at step 550: 0.057925526052713394
Loss at step 600: 0.05502486601471901
Loss at step 650: 0.06423813849687576
Loss at step 700: 0.041651081293821335
Loss at step 750: 0.03151623159646988
Loss at step 800: 0.037293124943971634
Loss at step 850: 0.05984169617295265
Loss at step 900: 0.036552462726831436
Mean training loss after epoch 66: 0.04449663154963555
EPOCH: 67
Loss at step 0: 0.03824847564101219
Loss at step 50: 0.03565288707613945
Loss at step 100: 0.039270512759685516
Loss at step 150: 0.03623337671160698
Loss at step 200: 0.050690602511167526
Loss at step 250: 0.03742752596735954
Loss at step 300: 0.04467052221298218
Loss at step 350: 0.03912796080112457
Loss at step 400: 0.04786604270339012
Loss at step 450: 0.03586800396442413
Loss at step 500: 0.05089021101593971
Loss at step 550: 0.045541878789663315
Loss at step 600: 0.03662842512130737
Loss at step 650: 0.03804018348455429
Loss at step 700: 0.06523645669221878
Loss at step 750: 0.043222736567258835
Loss at step 800: 0.05228416249155998
Loss at step 850: 0.044387586414813995
Loss at step 900: 0.03197185695171356
Mean training loss after epoch 67: 0.044540597860619965
EPOCH: 68
Loss at step 0: 0.03530889376997948
Loss at step 50: 0.05193200334906578
Loss at step 100: 0.04188147187232971
Loss at step 150: 0.044340841472148895
Loss at step 200: 0.039617713540792465
Loss at step 250: 0.029141953215003014
Loss at step 300: 0.059967041015625
Loss at step 350: 0.0396573543548584
Loss at step 400: 0.05593840777873993
Loss at step 450: 0.0386933758854866
Loss at step 500: 0.034154921770095825
Loss at step 550: 0.03156885504722595
Loss at step 600: 0.036909282207489014
Loss at step 650: 0.04749900475144386
Loss at step 700: 0.05062949284911156
Loss at step 750: 0.04943801835179329
Loss at step 800: 0.04978271201252937
Loss at step 850: 0.04305383935570717
Loss at step 900: 0.04517525061964989
Mean training loss after epoch 68: 0.04403649984416105
EPOCH: 69
Loss at step 0: 0.04713786393404007
Loss at step 50: 0.03688153252005577
Loss at step 100: 0.04463440924882889
Loss at step 150: 0.05203818157315254
Loss at step 200: 0.03601740673184395
Loss at step 250: 0.05964969843626022
Loss at step 300: 0.04440165311098099
Loss at step 350: 0.03362405300140381
Loss at step 400: 0.03362003713846207
Loss at step 450: 0.05673094093799591
Loss at step 500: 0.05858657509088516
Loss at step 550: 0.0479864664375782
Loss at step 600: 0.04205408692359924
Loss at step 650: 0.04245060309767723
Loss at step 700: 0.047696929425001144
Loss at step 750: 0.054844941943883896
Loss at step 800: 0.0449918769299984
Loss at step 850: 0.038500912487506866
Loss at step 900: 0.036807581782341
Mean training loss after epoch 69: 0.044288720746935684
EPOCH: 70
Loss at step 0: 0.033517539501190186
Loss at step 50: 0.03352891281247139
Loss at step 100: 0.03971768915653229
Loss at step 150: 0.037568893283605576
Loss at step 200: 0.066883884370327
Loss at step 250: 0.04035981744527817
Loss at step 300: 0.04199424386024475
Loss at step 350: 0.04756559059023857
Loss at step 400: 0.03550230711698532
Loss at step 450: 0.05051594227552414
Loss at step 500: 0.03901894763112068
Loss at step 550: 0.04193321615457535
Loss at step 600: 0.039895620197057724
Loss at step 650: 0.03211652860045433
Loss at step 700: 0.040562212467193604
Loss at step 750: 0.0446825809776783
Loss at step 800: 0.0380869135260582
Loss at step 850: 0.04290211945772171
Loss at step 900: 0.0694216936826706
Mean training loss after epoch 70: 0.0440076610275995
EPOCH: 71
Loss at step 0: 0.062086571007966995
Loss at step 50: 0.03848670795559883
Loss at step 100: 0.04382505640387535
Loss at step 150: 0.05180523172020912
Loss at step 200: 0.057804908603429794
Loss at step 250: 0.0323488749563694
Loss at step 300: 0.034228235483169556
Loss at step 350: 0.03648495674133301
Loss at step 400: 0.06315921247005463
Loss at step 450: 0.04083339869976044
Loss at step 500: 0.07379228621721268
Loss at step 550: 0.03787954896688461
Loss at step 600: 0.05260041356086731
Loss at step 650: 0.059406060725450516
Loss at step 700: 0.037485867738723755
Loss at step 750: 0.05460409075021744
Loss at step 800: 0.034081242978572845
Loss at step 850: 0.04085755720734596
Loss at step 900: 0.040349770337343216
Mean training loss after epoch 71: 0.04361870941092401
EPOCH: 72
Loss at step 0: 0.029254566878080368
Loss at step 50: 0.04164859652519226
Loss at step 100: 0.055361270904541016
Loss at step 150: 0.034679099917411804
Loss at step 200: 0.057330336421728134
Loss at step 250: 0.03006073087453842
Loss at step 300: 0.04034830629825592
Loss at step 350: 0.0592540018260479
Loss at step 400: 0.050872188061475754
Loss at step 450: 0.04640054702758789
Loss at step 500: 0.04122794792056084
Loss at step 550: 0.03518728166818619
Loss at step 600: 0.03226521611213684
Loss at step 650: 0.05286054685711861
Loss at step 700: 0.056503985077142715
Loss at step 750: 0.056091148406267166
Loss at step 800: 0.0377902016043663
Loss at step 850: 0.03336455300450325
Loss at step 900: 0.053792499005794525
Mean training loss after epoch 72: 0.04410509114215242
EPOCH: 73
Loss at step 0: 0.038552019745111465
Loss at step 50: 0.05634484440088272
Loss at step 100: 0.04676920548081398
Loss at step 150: 0.03462210297584534
Loss at step 200: 0.03854707255959511
Loss at step 250: 0.03935917094349861
Loss at step 300: 0.03562043607234955
Loss at step 350: 0.053162477910518646
Loss at step 400: 0.037162214517593384
Loss at step 450: 0.04485498368740082
Loss at step 500: 0.03977523371577263
Loss at step 550: 0.04587831348180771
Loss at step 600: 0.03899795562028885
Loss at step 650: 0.04515599459409714
Loss at step 700: 0.04103762283921242
Loss at step 750: 0.039450667798519135
Loss at step 800: 0.03686264902353287
Loss at step 850: 0.04491258040070534
Loss at step 900: 0.0488051176071167
Mean training loss after epoch 73: 0.04355169675632644
EPOCH: 74
Loss at step 0: 0.04988710954785347
Loss at step 50: 0.027748363092541695
Loss at step 100: 0.04155075177550316
Loss at step 150: 0.03735537454485893
Loss at step 200: 0.032628364861011505
Loss at step 250: 0.03547734394669533
Loss at step 300: 0.037861354649066925
Loss at step 350: 0.04635513946413994
Loss at step 400: 0.048436835408210754
Loss at step 450: 0.032090190798044205
Loss at step 500: 0.03548084571957588
Loss at step 550: 0.03390015661716461
Loss at step 600: 0.047148313373327255
Loss at step 650: 0.05798763036727905
Loss at step 700: 0.04385918006300926
Loss at step 750: 0.03126253932714462
Loss at step 800: 0.036953262984752655
Loss at step 850: 0.04472682997584343
Loss at step 900: 0.04753943160176277
Mean training loss after epoch 74: 0.0431889588994258
EPOCH: 75
Loss at step 0: 0.04028462618589401
Loss at step 50: 0.0357113741338253
Loss at step 100: 0.036158740520477295
Loss at step 150: 0.03210321068763733
Loss at step 200: 0.05170506611466408
Loss at step 250: 0.03830462694168091
Loss at step 300: 0.03860298916697502
Loss at step 350: 0.04723210632801056
Loss at step 400: 0.03571812063455582
Loss at step 450: 0.039360903203487396
Loss at step 500: 0.05320487916469574
Loss at step 550: 0.04517296701669693
Loss at step 600: 0.06515920162200928
Loss at step 650: 0.04200473055243492
Loss at step 700: 0.04345053806900978
Loss at step 750: 0.0318668894469738
Loss at step 800: 0.03969453275203705
Loss at step 850: 0.03803049027919769
Loss at step 900: 0.05230528861284256
Mean training loss after epoch 75: 0.04395143901194527
EPOCH: 76
Loss at step 0: 0.05301972106099129
Loss at step 50: 0.03598293662071228
Loss at step 100: 0.034400515258312225
Loss at step 150: 0.056835055351257324
Loss at step 200: 0.034850649535655975
Loss at step 250: 0.05500826612114906
Loss at step 300: 0.034377194941043854
Loss at step 350: 0.04097644239664078
Loss at step 400: 0.040672894567251205
Loss at step 450: 0.04145702347159386
Loss at step 500: 0.04885398969054222
Loss at step 550: 0.048190776258707047
Loss at step 600: 0.05448161065578461
Loss at step 650: 0.03598455339670181
Loss at step 700: 0.039347682148218155
Loss at step 750: 0.04168093949556351
Loss at step 800: 0.05553968623280525
Loss at step 850: 0.041357167065143585
Loss at step 900: 0.05268019810318947
Mean training loss after epoch 76: 0.04369923243247497
EPOCH: 77
Loss at step 0: 0.05658970773220062
Loss at step 50: 0.04428544640541077
Loss at step 100: 0.043661028146743774
Loss at step 150: 0.0697946846485138
Loss at step 200: 0.03846244886517525
Loss at step 250: 0.03738200664520264
Loss at step 300: 0.0409412682056427
Loss at step 350: 0.03415956720709801
Loss at step 400: 0.04453108832240105
Loss at step 450: 0.07758844643831253
Loss at step 500: 0.04702244699001312
Loss at step 550: 0.048941150307655334
Loss at step 600: 0.033970486372709274
Loss at step 650: 0.042242471128702164
Loss at step 700: 0.035817310214042664
Loss at step 750: 0.039164334535598755
Loss at step 800: 0.049258679151535034
Loss at step 850: 0.03764650970697403
Loss at step 900: 0.037753425538539886
Mean training loss after epoch 77: 0.04369013253122822
EPOCH: 78
Loss at step 0: 0.03975120559334755
Loss at step 50: 0.054156817495822906
Loss at step 100: 0.037778813391923904
Loss at step 150: 0.04597756266593933
Loss at step 200: 0.05560242384672165
Loss at step 250: 0.038838405162096024
Loss at step 300: 0.04676831513643265
Loss at step 350: 0.0348576083779335
Loss at step 400: 0.0406932570040226
Loss at step 450: 0.03799765929579735
Loss at step 500: 0.056081756949424744
Loss at step 550: 0.036105211824178696
Loss at step 600: 0.041250597685575485
Loss at step 650: 0.05187376216053963
Loss at step 700: 0.039942413568496704
Loss at step 750: 0.03421250730752945
Loss at step 800: 0.04465855285525322
Loss at step 850: 0.04682165011763573
Loss at step 900: 0.035167813301086426
Mean training loss after epoch 78: 0.043499736446958745
EPOCH: 79
Loss at step 0: 0.04953160509467125
Loss at step 50: 0.032434191554784775
Loss at step 100: 0.060896556824445724
Loss at step 150: 0.04916340485215187
Loss at step 200: 0.04294075071811676
Loss at step 250: 0.03761627897620201
Loss at step 300: 0.05153829976916313
Loss at step 350: 0.04079978168010712
Loss at step 400: 0.03249897435307503
Loss at step 450: 0.04026184603571892
Loss at step 500: 0.04575842246413231
Loss at step 550: 0.04249049723148346
Loss at step 600: 0.043605584651231766
Loss at step 650: 0.036947764456272125
Loss at step 700: 0.04640074074268341
Loss at step 750: 0.05125853419303894
Loss at step 800: 0.053849928081035614
Loss at step 850: 0.05566077306866646
Loss at step 900: 0.045117028057575226
Mean training loss after epoch 79: 0.04346864549383553
EPOCH: 80
Loss at step 0: 0.06714697927236557
Loss at step 50: 0.04404205456376076
Loss at step 100: 0.04033733159303665
Loss at step 150: 0.05670079216361046
Loss at step 200: 0.043069370090961456
Loss at step 250: 0.06097843125462532
Loss at step 300: 0.028987368568778038
Loss at step 350: 0.039095696061849594
Loss at step 400: 0.05482931435108185
Loss at step 450: 0.03586386889219284
Loss at step 500: 0.041629496961832047
Loss at step 550: 0.03990510106086731
Loss at step 600: 0.054544877260923386
Loss at step 650: 0.03798103332519531
Loss at step 700: 0.04047699645161629
Loss at step 750: 0.037949416786432266
Loss at step 800: 0.0359608419239521
Loss at step 850: 0.054671015590429306
Loss at step 900: 0.038619618862867355
Mean training loss after epoch 80: 0.043645817013993574
EPOCH: 81
Loss at step 0: 0.041961781680583954
Loss at step 50: 0.040087319910526276
Loss at step 100: 0.05675582215189934
Loss at step 150: 0.05277867987751961
Loss at step 200: 0.03047415055334568
Loss at step 250: 0.03613193705677986
Loss at step 300: 0.043731123208999634
Loss at step 350: 0.04356745257973671
Loss at step 400: 0.03996061906218529
Loss at step 450: 0.03311565890908241
Loss at step 500: 0.04055457562208176
Loss at step 550: 0.05245433375239372
Loss at step 600: 0.038351498544216156
Loss at step 650: 0.05595117434859276
Loss at step 700: 0.03396110609173775
Loss at step 750: 0.0379931665956974
Loss at step 800: 0.03337615355849266
Loss at step 850: 0.0378812812268734
Loss at step 900: 0.03898458927869797
Mean training loss after epoch 81: 0.0442062398152692
EPOCH: 82
Loss at step 0: 0.06833680719137192
Loss at step 50: 0.04500220715999603
Loss at step 100: 0.03772377222776413
Loss at step 150: 0.053365763276815414
Loss at step 200: 0.03693930804729462
Loss at step 250: 0.04482037574052811
Loss at step 300: 0.049105748534202576
Loss at step 350: 0.04576117917895317
Loss at step 400: 0.03354388102889061
Loss at step 450: 0.04362288862466812
Loss at step 500: 0.037595588713884354
Loss at step 550: 0.06382860988378525
Loss at step 600: 0.03677208349108696
Loss at step 650: 0.05604931339621544
Loss at step 700: 0.05555162578821182
Loss at step 750: 0.027041040360927582
Loss at step 800: 0.04180339723825455
Loss at step 850: 0.07140879333019257
Loss at step 900: 0.04118446260690689
Mean training loss after epoch 82: 0.04348311074443463
EPOCH: 83
Loss at step 0: 0.03259948268532753
Loss at step 50: 0.05375359207391739
Loss at step 100: 0.03813811019062996
Loss at step 150: 0.04393210634589195
Loss at step 200: 0.05306578427553177
Loss at step 250: 0.05070703849196434
Loss at step 300: 0.042646415531635284
Loss at step 350: 0.052681658416986465
Loss at step 400: 0.039171285927295685
Loss at step 450: 0.050351619720458984
Loss at step 500: 0.036338597536087036
Loss at step 550: 0.04029726982116699
Loss at step 600: 0.04205435514450073
Loss at step 650: 0.0543961375951767
Loss at step 700: 0.027284376323223114
Loss at step 750: 0.036568958312273026
Loss at step 800: 0.04350341111421585
Loss at step 850: 0.05256534367799759
Loss at step 900: 0.03388596326112747
Mean training loss after epoch 83: 0.04297601644084779
EPOCH: 84
Loss at step 0: 0.04750841483473778
Loss at step 50: 0.04352778568863869
Loss at step 100: 0.03605922311544418
Loss at step 150: 0.040445245802402496
Loss at step 200: 0.044017449021339417
Loss at step 250: 0.036079443991184235
Loss at step 300: 0.03524768352508545
Loss at step 350: 0.058157481253147125
Loss at step 400: 0.03466608375310898
Loss at step 450: 0.04185834154486656
Loss at step 500: 0.039541035890579224
Loss at step 550: 0.03365383669734001
Loss at step 600: 0.032277606427669525
Loss at step 650: 0.038126878440380096
Loss at step 700: 0.06580884009599686
Loss at step 750: 0.029152007773518562
Loss at step 800: 0.05536936596035957
Loss at step 850: 0.03807036951184273
Loss at step 900: 0.03619582951068878
Mean training loss after epoch 84: 0.043629243781269866
EPOCH: 85
Loss at step 0: 0.049788717180490494
Loss at step 50: 0.038022421300411224
Loss at step 100: 0.032273098826408386
Loss at step 150: 0.03269397094845772
Loss at step 200: 0.035025231540203094
Loss at step 250: 0.05248252674937248
Loss at step 300: 0.03256775811314583
Loss at step 350: 0.03763484209775925
Loss at step 400: 0.04265531152486801
Loss at step 450: 0.03436531126499176
Loss at step 500: 0.03978480398654938
Loss at step 550: 0.03378177434206009
Loss at step 600: 0.06930387765169144
Loss at step 650: 0.03307141363620758
Loss at step 700: 0.053521107882261276
Loss at step 750: 0.053345534950494766
Loss at step 800: 0.037429336458444595
Loss at step 850: 0.043063025921583176
Loss at step 900: 0.03620496019721031
Mean training loss after epoch 85: 0.04344467626118075
EPOCH: 86
Loss at step 0: 0.07604561746120453
Loss at step 50: 0.03270219638943672
Loss at step 100: 0.048404015600681305
Loss at step 150: 0.03520134463906288
Loss at step 200: 0.04617032781243324
Loss at step 250: 0.04424238204956055
Loss at step 300: 0.05500755086541176
Loss at step 350: 0.03861868381500244
Loss at step 400: 0.041497714817523956
Loss at step 450: 0.040724895894527435
Loss at step 500: 0.05506446212530136
Loss at step 550: 0.050323523581027985
Loss at step 600: 0.04069007933139801
Loss at step 650: 0.04165627062320709
Loss at step 700: 0.03901602327823639
Loss at step 750: 0.05073634907603264
Loss at step 800: 0.057049982249736786
Loss at step 850: 0.041814662516117096
Loss at step 900: 0.050912611186504364
Mean training loss after epoch 86: 0.04339223529603372
EPOCH: 87
Loss at step 0: 0.03832190856337547
Loss at step 50: 0.03612116724252701
Loss at step 100: 0.046430062502622604
Loss at step 150: 0.0354364775121212
Loss at step 200: 0.04589875787496567
Loss at step 250: 0.046340517699718475
Loss at step 300: 0.03358977660536766
Loss at step 350: 0.05608464032411575
Loss at step 400: 0.03659365326166153
Loss at step 450: 0.031970780342817307
Loss at step 500: 0.05655407905578613
Loss at step 550: 0.035570062696933746
Loss at step 600: 0.050886280834674835
Loss at step 650: 0.035846032202243805
Loss at step 700: 0.033220771700143814
Loss at step 750: 0.05016414076089859
Loss at step 800: 0.03249731287360191
Loss at step 850: 0.051579348742961884
Loss at step 900: 0.04383271932601929
Mean training loss after epoch 87: 0.043103055565802656
EPOCH: 88
Loss at step 0: 0.059718307107686996
Loss at step 50: 0.03696296736598015
Loss at step 100: 0.05220537632703781
Loss at step 150: 0.059678904712200165
Loss at step 200: 0.036867622286081314
Loss at step 250: 0.03667591139674187
Loss at step 300: 0.034308452159166336
Loss at step 350: 0.04058242589235306
Loss at step 400: 0.03458724543452263
Loss at step 450: 0.03771441802382469
Loss at step 500: 0.03839738294482231
Loss at step 550: 0.04058973491191864
Loss at step 600: 0.039897527545690536
Loss at step 650: 0.04819701239466667
Loss at step 700: 0.039945267140865326
Loss at step 750: 0.039837781339883804
Loss at step 800: 0.03418014571070671
Loss at step 850: 0.0561252199113369
Loss at step 900: 0.03720290958881378
Mean training loss after epoch 88: 0.043711327423434906
EPOCH: 89
Loss at step 0: 0.0622403658926487
Loss at step 50: 0.03501512110233307
Loss at step 100: 0.03378031775355339
Loss at step 150: 0.03363748639822006
Loss at step 200: 0.04308519512414932
Loss at step 250: 0.04319875314831734
Loss at step 300: 0.06976722180843353
Loss at step 350: 0.03761878237128258
Loss at step 400: 0.03650447353720665
Loss at step 450: 0.03223029896616936
Loss at step 500: 0.0345304012298584
Loss at step 550: 0.04749958589673042
Loss at step 600: 0.04270829260349274
Loss at step 650: 0.05911377817392349
Loss at step 700: 0.03142249584197998
Loss at step 750: 0.04553729668259621
Loss at step 800: 0.039227329194545746
Loss at step 850: 0.0500837042927742
Loss at step 900: 0.038529522716999054
Mean training loss after epoch 89: 0.043296830720707044
EPOCH: 90
Loss at step 0: 0.03306594118475914
Loss at step 50: 0.04650263860821724
Loss at step 100: 0.04312523081898689
Loss at step 150: 0.034545619040727615
Loss at step 200: 0.04598233103752136
Loss at step 250: 0.03660457953810692
Loss at step 300: 0.04116194322705269
Loss at step 350: 0.05145394057035446
Loss at step 400: 0.05626528337597847
Loss at step 450: 0.042840324342250824
Loss at step 500: 0.03150201216340065
Loss at step 550: 0.03885718807578087
Loss at step 600: 0.04799865931272507
Loss at step 650: 0.03733006864786148
Loss at step 700: 0.03675772249698639
Loss at step 750: 0.0403936468064785
Loss at step 800: 0.060140881687402725
Loss at step 850: 0.056603480130434036
Loss at step 900: 0.03747309371829033
Mean training loss after epoch 90: 0.04427998538202505
EPOCH: 91
Loss at step 0: 0.04910198971629143
Loss at step 50: 0.04037807509303093
Loss at step 100: 0.03771810233592987
Loss at step 150: 0.034219253808259964
Loss at step 200: 0.04347480833530426
Loss at step 250: 0.047169867902994156
Loss at step 300: 0.041771892458200455
Loss at step 350: 0.038890402764081955
Loss at step 400: 0.03771951049566269
Loss at step 450: 0.027332905679941177
Loss at step 500: 0.0472487211227417
Loss at step 550: 0.03344062715768814
Loss at step 600: 0.039007991552352905
Loss at step 650: 0.03601592034101486
Loss at step 700: 0.03445816412568092
Loss at step 750: 0.040622398257255554
Loss at step 800: 0.03840038552880287
Loss at step 850: 0.05284346267580986
Loss at step 900: 0.046714216470718384
Mean training loss after epoch 91: 0.04356519312166901
EPOCH: 92
Loss at step 0: 0.07394585013389587
Loss at step 50: 0.03972001373767853
Loss at step 100: 0.03388422727584839
Loss at step 150: 0.043385058641433716
Loss at step 200: 0.0548548698425293
Loss at step 250: 0.03832437843084335
Loss at step 300: 0.05707750469446182
Loss at step 350: 0.041515324264764786
Loss at step 400: 0.031445201486349106
Loss at step 450: 0.06388327479362488
Loss at step 500: 0.037730999290943146
Loss at step 550: 0.05023184418678284
Loss at step 600: 0.0382477305829525
Loss at step 650: 0.043892908841371536
Loss at step 700: 0.03901918977499008
Loss at step 750: 0.04354777932167053
Loss at step 800: 0.040302760899066925
Loss at step 850: 0.04244803264737129
Loss at step 900: 0.05645808205008507
Mean training loss after epoch 92: 0.042986952499158855
EPOCH: 93
Loss at step 0: 0.03539813309907913
Loss at step 50: 0.048581965267658234
Loss at step 100: 0.04163219407200813
Loss at step 150: 0.03968052938580513
Loss at step 200: 0.03193528577685356
Loss at step 250: 0.0411832369863987
Loss at step 300: 0.03856983408331871
Loss at step 350: 0.038298722356557846
Loss at step 400: 0.04309234768152237
Loss at step 450: 0.040360040962696075
Loss at step 500: 0.04634615406394005
Loss at step 550: 0.06071379780769348
Loss at step 600: 0.04664158821105957
Loss at step 650: 0.03374982625246048
Loss at step 700: 0.052322570234537125
Loss at step 750: 0.04002566263079643
Loss at step 800: 0.03766759857535362
Loss at step 850: 0.05379931628704071
Loss at step 900: 0.037691425532102585
Mean training loss after epoch 93: 0.043927824297455204
EPOCH: 94
Loss at step 0: 0.06536293029785156
Loss at step 50: 0.030876314267516136
Loss at step 100: 0.03512885048985481
Loss at step 150: 0.05157601460814476
Loss at step 200: 0.07189564406871796
Loss at step 250: 0.038722675293684006
Loss at step 300: 0.03872859477996826
Loss at step 350: 0.03599252551794052
Loss at step 400: 0.042109329253435135
Loss at step 450: 0.04384028911590576
Loss at step 500: 0.036185961216688156
Loss at step 550: 0.062151797115802765
Loss at step 600: 0.03850793465971947
Loss at step 650: 0.034010306000709534
Loss at step 700: 0.04265720024704933
Loss at step 750: 0.038343705236911774
Loss at step 800: 0.0559229701757431
Loss at step 850: 0.037223100662231445
Loss at step 900: 0.03685881197452545
Mean training loss after epoch 94: 0.04346638769983673
EPOCH: 95
Loss at step 0: 0.07410895079374313
Loss at step 50: 0.030644426122307777
Loss at step 100: 0.03632679581642151
Loss at step 150: 0.04430529475212097
Loss at step 200: 0.03993538022041321
Loss at step 250: 0.033615030348300934
Loss at step 300: 0.03780102729797363
Loss at step 350: 0.03041430376470089
Loss at step 400: 0.04900200292468071
Loss at step 450: 0.04082078859210014
Loss at step 500: 0.04287412762641907
Loss at step 550: 0.04590524733066559
Loss at step 600: 0.03649120777845383
Loss at step 650: 0.032129764556884766
Loss at step 700: 0.07543421536684036
Loss at step 750: 0.037203382700681686
Loss at step 800: 0.048887595534324646
Loss at step 850: 0.036267053335905075
Loss at step 900: 0.036513570696115494
Mean training loss after epoch 95: 0.04287138573928619
EPOCH: 96
Loss at step 0: 0.03439642861485481
Loss at step 50: 0.032979678362607956
Loss at step 100: 0.03874444216489792
Loss at step 150: 0.03373781591653824
Loss at step 200: 0.03582676500082016
Loss at step 250: 0.03851651772856712
Loss at step 300: 0.03581823408603668
Loss at step 350: 0.036770474165678024
Loss at step 400: 0.03284694626927376
Loss at step 450: 0.04079399257898331
Loss at step 500: 0.04206319525837898
Loss at step 550: 0.03856229409575462
Loss at step 600: 0.04491313174366951
Loss at step 650: 0.03701549395918846
Loss at step 700: 0.053752969950437546
Loss at step 750: 0.05399594083428383
Loss at step 800: 0.0453498549759388
Loss at step 850: 0.038302745670080185
Loss at step 900: 0.04211289435625076
Mean training loss after epoch 96: 0.0431959849816046
EPOCH: 97
Loss at step 0: 0.03641681745648384
Loss at step 50: 0.03312516584992409
Loss at step 100: 0.05817564204335213
Loss at step 150: 0.03705442696809769
Loss at step 200: 0.0319865457713604
Loss at step 250: 0.040375035256147385
Loss at step 300: 0.059476666152477264
Loss at step 350: 0.048445623368024826
Loss at step 400: 0.03463175892829895
Loss at step 450: 0.037774257361888885
Loss at step 500: 0.0523386187851429
Loss at step 550: 0.0657649040222168
Loss at step 600: 0.04200076311826706
Loss at step 650: 0.0315200611948967
Loss at step 700: 0.04242639243602753
Loss at step 750: 0.03878338262438774
Loss at step 800: 0.05410474166274071
Loss at step 850: 0.03390758857131004
Loss at step 900: 0.051311835646629333
Mean training loss after epoch 97: 0.04318891692040826
EPOCH: 98
Loss at step 0: 0.046574775129556656
Loss at step 50: 0.05641099810600281
Loss at step 100: 0.03675806149840355
Loss at step 150: 0.04064656049013138
Loss at step 200: 0.041404858231544495
Loss at step 250: 0.046953871846199036
Loss at step 300: 0.03851976990699768
Loss at step 350: 0.058378178626298904
Loss at step 400: 0.04649720713496208
Loss at step 450: 0.03760731220245361
Loss at step 500: 0.03642148897051811
Loss at step 550: 0.050206299871206284
Loss at step 600: 0.04067458212375641
Loss at step 650: 0.04598100110888481
Loss at step 700: 0.06879585981369019
Loss at step 750: 0.039883676916360855
Loss at step 800: 0.05262867361307144
Loss at step 850: 0.03709295392036438
Loss at step 900: 0.03144507482647896
Mean training loss after epoch 98: 0.043273623160986124
EPOCH: 99
Loss at step 0: 0.05169651284813881
Loss at step 50: 0.04774666577577591
Loss at step 100: 0.04283786192536354
Loss at step 150: 0.05346093699336052
Loss at step 200: 0.06267943233251572
Loss at step 250: 0.029984652996063232
Loss at step 300: 0.04070792719721794
Loss at step 350: 0.042360469698905945
Loss at step 400: 0.04965902864933014
Loss at step 450: 0.034532371908426285
Loss at step 500: 0.031091198325157166
Loss at step 550: 0.0353475958108902
Loss at step 600: 0.03562759980559349
Loss at step 650: 0.05511445179581642
Loss at step 700: 0.04308035969734192
Loss at step 750: 0.039943136274814606
Loss at step 800: 0.03640330582857132
Loss at step 850: 0.052483201026916504
Loss at step 900: 0.038298215717077255
Mean training loss after epoch 99: 0.04306189255761122
EPOCH: 100
Loss at step 0: 0.03868278115987778
Loss at step 50: 0.033273063600063324
Loss at step 100: 0.06904282420873642
Loss at step 150: 0.04511026293039322
Loss at step 200: 0.03361647576093674
Loss at step 250: 0.04274805635213852
Loss at step 300: 0.04436466470360756
Loss at step 350: 0.06252449005842209
Loss at step 400: 0.04299504682421684
Loss at step 450: 0.0542532280087471
Loss at step 500: 0.0670432448387146
Loss at step 550: 0.0696505531668663
Loss at step 600: 0.059786610305309296
Loss at step 650: 0.041346460580825806
Loss at step 700: 0.05354555323719978
Loss at step 750: 0.048287492245435715
Loss at step 800: 0.03784614056348801
Loss at step 850: 0.03387393057346344
Loss at step 900: 0.03863274306058884
Mean training loss after epoch 100: 0.04288483024842894
EPOCH: 101
Loss at step 0: 0.0320252887904644
Loss at step 50: 0.03704806789755821
Loss at step 100: 0.04036383330821991
Loss at step 150: 0.04344441741704941
Loss at step 200: 0.036835040897130966
Loss at step 250: 0.03519628942012787
Loss at step 300: 0.038933347910642624
Loss at step 350: 0.040308840572834015
Loss at step 400: 0.06893625110387802
Loss at step 450: 0.03442879393696785
Loss at step 500: 0.06981664896011353
Loss at step 550: 0.05139239877462387
Loss at step 600: 0.045361023396253586
Loss at step 650: 0.03686761483550072
Loss at step 700: 0.03900550678372383
Loss at step 750: 0.033268269151449203
Loss at step 800: 0.036813121289014816
Loss at step 850: 0.03814901039004326
Loss at step 900: 0.05475571006536484
Mean training loss after epoch 101: 0.04321747353828665
EPOCH: 102
Loss at step 0: 0.05565916746854782
Loss at step 50: 0.0469290092587471
Loss at step 100: 0.03592206537723541
Loss at step 150: 0.03643639758229256
Loss at step 200: 0.06259587407112122
Loss at step 250: 0.0463993102312088
Loss at step 300: 0.03069981187582016
Loss at step 350: 0.06601865589618683
Loss at step 400: 0.03276149928569794
Loss at step 450: 0.02996491827070713
Loss at step 500: 0.0397891029715538
Loss at step 550: 0.04958747699856758
Loss at step 600: 0.05864010751247406
Loss at step 650: 0.04672401770949364
Loss at step 700: 0.04763220250606537
Loss at step 750: 0.03473920747637749
Loss at step 800: 0.037696756422519684
Loss at step 850: 0.0555654838681221
Loss at step 900: 0.05053749680519104
Mean training loss after epoch 102: 0.043392531085274876
EPOCH: 103
Loss at step 0: 0.03874469920992851
Loss at step 50: 0.05485078692436218
Loss at step 100: 0.03334563225507736
Loss at step 150: 0.03602603077888489
Loss at step 200: 0.05366092175245285
Loss at step 250: 0.05623263493180275
Loss at step 300: 0.04819313436746597
Loss at step 350: 0.05220837891101837
Loss at step 400: 0.03978671878576279
Loss at step 450: 0.05293075740337372
Loss at step 500: 0.03283224254846573
Loss at step 550: 0.03934606537222862
Loss at step 600: 0.039841100573539734
Loss at step 650: 0.038126807659864426
Loss at step 700: 0.04651801660656929
Loss at step 750: 0.03205832839012146
Loss at step 800: 0.03260406479239464
Loss at step 850: 0.04802095517516136
Loss at step 900: 0.07081270962953568
Mean training loss after epoch 103: 0.04249289364758522
EPOCH: 104
Loss at step 0: 0.03115161508321762
Loss at step 50: 0.03379862755537033
Loss at step 100: 0.041275929659605026
Loss at step 150: 0.039807382971048355
Loss at step 200: 0.04076811298727989
Loss at step 250: 0.03608894720673561
Loss at step 300: 0.03752580285072327
Loss at step 350: 0.046833109110593796
Loss at step 400: 0.051159657537937164
Loss at step 450: 0.03741542249917984
Loss at step 500: 0.038290105760097504
Loss at step 550: 0.041247326880693436
Loss at step 600: 0.04058607295155525
Loss at step 650: 0.03616589680314064
Loss at step 700: 0.04535121098160744
Loss at step 750: 0.03494952619075775
Loss at step 800: 0.03949456289410591
Loss at step 850: 0.040163662284612656
Loss at step 900: 0.03252166882157326
Mean training loss after epoch 104: 0.043015945414855664
EPOCH: 105
Loss at step 0: 0.036242932081222534
Loss at step 50: 0.03272474184632301
Loss at step 100: 0.04492475464940071
Loss at step 150: 0.045704569667577744
Loss at step 200: 0.036552704870700836
Loss at step 250: 0.041963860392570496
Loss at step 300: 0.03921288624405861
Loss at step 350: 0.04874899238348007
Loss at step 400: 0.04045712947845459
Loss at step 450: 0.030772091820836067
Loss at step 500: 0.05106096714735031
Loss at step 550: 0.027127612382173538
Loss at step 600: 0.05295705795288086
Loss at step 650: 0.04706268757581711
Loss at step 700: 0.03845628723502159
Loss at step 750: 0.03371085971593857
Loss at step 800: 0.04142371937632561
Loss at step 850: 0.052407145500183105
Loss at step 900: 0.037026021629571915
Mean training loss after epoch 105: 0.04317484780558264
EPOCH: 106
Loss at step 0: 0.029048895463347435
Loss at step 50: 0.04750058054924011
Loss at step 100: 0.03844717517495155
Loss at step 150: 0.032337259501218796
Loss at step 200: 0.03150492161512375
Loss at step 250: 0.05508747696876526
Loss at step 300: 0.04623647406697273
Loss at step 350: 0.03420146927237511
Loss at step 400: 0.054901327937841415
Loss at step 450: 0.044081784784793854
Loss at step 500: 0.05747245252132416
Loss at step 550: 0.04383234679698944
Loss at step 600: 0.03540794178843498
Loss at step 650: 0.049649763852357864
Loss at step 700: 0.03573313355445862
Loss at step 750: 0.044091228395700455
Loss at step 800: 0.03443525359034538
Loss at step 850: 0.04229153320193291
Loss at step 900: 0.058335255831480026
Mean training loss after epoch 106: 0.04258429711219916
EPOCH: 107
Loss at step 0: 0.0381803885102272
Loss at step 50: 0.03685297816991806
Loss at step 100: 0.034790556877851486
Loss at step 150: 0.05032115802168846
Loss at step 200: 0.033282600343227386
Loss at step 250: 0.04043008014559746
Loss at step 300: 0.03712046146392822
Loss at step 350: 0.047968342900276184
Loss at step 400: 0.043439656496047974
Loss at step 450: 0.043730784207582474
Loss at step 500: 0.03656981140375137
Loss at step 550: 0.042178064584732056
Loss at step 600: 0.04220736026763916
Loss at step 650: 0.03219039365649223
Loss at step 700: 0.03509557992219925
Loss at step 750: 0.04011481627821922
Loss at step 800: 0.029533641412854195
Loss at step 850: 0.0380370207130909
Loss at step 900: 0.03835464268922806
Mean training loss after epoch 107: 0.042455973341933954
EPOCH: 108
Loss at step 0: 0.053215205669403076
Loss at step 50: 0.03992394357919693
Loss at step 100: 0.03500846028327942
Loss at step 150: 0.05199325084686279
Loss at step 200: 0.03509242460131645
Loss at step 250: 0.033662501722574234
Loss at step 300: 0.0520416721701622
Loss at step 350: 0.03401225432753563
Loss at step 400: 0.04268940910696983
Loss at step 450: 0.036937568336725235
Loss at step 500: 0.053192269057035446
Loss at step 550: 0.029121991246938705
Loss at step 600: 0.035446252673864365
Loss at step 650: 0.05657226964831352
Loss at step 700: 0.040138814598321915
Loss at step 750: 0.042686160653829575
Loss at step 800: 0.03484735265374184
Loss at step 850: 0.03614305332303047
Loss at step 900: 0.04524796083569527
Mean training loss after epoch 108: 0.04264132607021311
EPOCH: 109
Loss at step 0: 0.03732350468635559
Loss at step 50: 0.03490496426820755
Loss at step 100: 0.039503492414951324
Loss at step 150: 0.03732747957110405
Loss at step 200: 0.036649059504270554
Loss at step 250: 0.04840293154120445
Loss at step 300: 0.04136611893773079
Loss at step 350: 0.03097102977335453
Loss at step 400: 0.03997035697102547
Loss at step 450: 0.040247365832328796
Loss at step 500: 0.033872444182634354
Loss at step 550: 0.043919648975133896
Loss at step 600: 0.03842050954699516
Loss at step 650: 0.03633081167936325
Loss at step 700: 0.03185936436057091
Loss at step 750: 0.031556472182273865
Loss at step 800: 0.04246347025036812
Loss at step 850: 0.03699137642979622
Loss at step 900: 0.038073956966400146
Mean training loss after epoch 109: 0.042871232231908134
EPOCH: 110
Loss at step 0: 0.044787343591451645
Loss at step 50: 0.04396039992570877
Loss at step 100: 0.03304486349225044
Loss at step 150: 0.0329318642616272
Loss at step 200: 0.03712211921811104
Loss at step 250: 0.03624889254570007
Loss at step 300: 0.039950963109731674
Loss at step 350: 0.05183451250195503
Loss at step 400: 0.033772554248571396
Loss at step 450: 0.03334091603755951
Loss at step 500: 0.04075763002038002
Loss at step 550: 0.06279843300580978
Loss at step 600: 0.05321997031569481
Loss at step 650: 0.037117939442396164
Loss at step 700: 0.03801078721880913
Loss at step 750: 0.032297197729349136
Loss at step 800: 0.049598902463912964
Loss at step 850: 0.06499156355857849
Loss at step 900: 0.034701526165008545
Mean training loss after epoch 110: 0.04253625459095308
EPOCH: 111
Loss at step 0: 0.04842279478907585
Loss at step 50: 0.030813124030828476
Loss at step 100: 0.03582942485809326
Loss at step 150: 0.04401121661067009
Loss at step 200: 0.03099776618182659
Loss at step 250: 0.039740726351737976
Loss at step 300: 0.049139585345983505
Loss at step 350: 0.032573338598012924
Loss at step 400: 0.0320868156850338
Loss at step 450: 0.03244072571396828
Loss at step 500: 0.036660704761743546
Loss at step 550: 0.05817306786775589
Loss at step 600: 0.046441372483968735
Loss at step 650: 0.04924897104501724
Loss at step 700: 0.032720550894737244
Loss at step 750: 0.03674580901861191
Loss at step 800: 0.0348411463201046
Loss at step 850: 0.0344211645424366
Loss at step 900: 0.03544338792562485
Mean training loss after epoch 111: 0.04321895086808182
EPOCH: 112
Loss at step 0: 0.02806468866765499
Loss at step 50: 0.05090704187750816
Loss at step 100: 0.04020220413804054
Loss at step 150: 0.04250871017575264
Loss at step 200: 0.041078586131334305
Loss at step 250: 0.049902867525815964
Loss at step 300: 0.03141561150550842
Loss at step 350: 0.04201183468103409
Loss at step 400: 0.055819183588027954
Loss at step 450: 0.03885535150766373
Loss at step 500: 0.06769788265228271
Loss at step 550: 0.039124418050050735
Loss at step 600: 0.0550897940993309
Loss at step 650: 0.05477220192551613
Loss at step 700: 0.03775335103273392
Loss at step 750: 0.03932926058769226
Loss at step 800: 0.03385027498006821
Loss at step 850: 0.03438795357942581
Loss at step 900: 0.04016561433672905
Mean training loss after epoch 112: 0.042442821055603056
EPOCH: 113
Loss at step 0: 0.03439236432313919
Loss at step 50: 0.04079783707857132
Loss at step 100: 0.04035534709692001
Loss at step 150: 0.02992972545325756
Loss at step 200: 0.03763053938746452
Loss at step 250: 0.03948152810335159
Loss at step 300: 0.05204397067427635
Loss at step 350: 0.03488073870539665
Loss at step 400: 0.039366453886032104
Loss at step 450: 0.07258787006139755
Loss at step 500: 0.03774897754192352
Loss at step 550: 0.03692486137151718
Loss at step 600: 0.03373141214251518
Loss at step 650: 0.038297783583402634
Loss at step 700: 0.03423190861940384
Loss at step 750: 0.036938928067684174
Loss at step 800: 0.02934008091688156
Loss at step 850: 0.03978004679083824
Loss at step 900: 0.05455329269170761
Mean training loss after epoch 113: 0.04289359403357132
EPOCH: 114
Loss at step 0: 0.04929478093981743
Loss at step 50: 0.036822691559791565
Loss at step 100: 0.033721890300512314
Loss at step 150: 0.034397274255752563
Loss at step 200: 0.044635675847530365
Loss at step 250: 0.04207928106188774
Loss at step 300: 0.039686571806669235
Loss at step 350: 0.05589856952428818
Loss at step 400: 0.03962375968694687
Loss at step 450: 0.02821052446961403
Loss at step 500: 0.059994060546159744
Loss at step 550: 0.035869430750608444
Loss at step 600: 0.035884320735931396
Loss at step 650: 0.03565568849444389
Loss at step 700: 0.03159397095441818
Loss at step 750: 0.038982052356004715
Loss at step 800: 0.03397154062986374
Loss at step 850: 0.03521180897951126
Loss at step 900: 0.036220405250787735
Mean training loss after epoch 114: 0.042887798532335236
EPOCH: 115
Loss at step 0: 0.03161928057670593
Loss at step 50: 0.03745435178279877
Loss at step 100: 0.03620803356170654
Loss at step 150: 0.061530373990535736
Loss at step 200: 0.03712744638323784
Loss at step 250: 0.053335703909397125
Loss at step 300: 0.03177780285477638
Loss at step 350: 0.04157339781522751
Loss at step 400: 0.04165922477841377
Loss at step 450: 0.035079650580883026
Loss at step 500: 0.048865292221307755
Loss at step 550: 0.03243790939450264
Loss at step 600: 0.04753963276743889
Loss at step 650: 0.03399026021361351
Loss at step 700: 0.04165761172771454
Loss at step 750: 0.03889331966638565
Loss at step 800: 0.0683445930480957
Loss at step 850: 0.029029441997408867
Loss at step 900: 0.0362982414662838
Mean training loss after epoch 115: 0.04242513539480058
EPOCH: 116
Loss at step 0: 0.04920484870672226
Loss at step 50: 0.05231902003288269
Loss at step 100: 0.05306987091898918
Loss at step 150: 0.05240674689412117
Loss at step 200: 0.030451014637947083
Loss at step 250: 0.03890807926654816
Loss at step 300: 0.05659289285540581
Loss at step 350: 0.036650046706199646
Loss at step 400: 0.0346701517701149
Loss at step 450: 0.03688763082027435
Loss at step 500: 0.036767538636922836
Loss at step 550: 0.032335709780454636
Loss at step 600: 0.05310001224279404
Loss at step 650: 0.05066174641251564
Loss at step 700: 0.05107351019978523
Loss at step 750: 0.03427013009786606
Loss at step 800: 0.04448677599430084
Loss at step 850: 0.034152865409851074
Loss at step 900: 0.03578594699501991
Mean training loss after epoch 116: 0.04271252417583456
EPOCH: 117
Loss at step 0: 0.041914403438568115
Loss at step 50: 0.03944316506385803
Loss at step 100: 0.036425188183784485
Loss at step 150: 0.052166588604450226
Loss at step 200: 0.03263081610202789
Loss at step 250: 0.05043771117925644
Loss at step 300: 0.042433347553014755
Loss at step 350: 0.04409367963671684
Loss at step 400: 0.05648966133594513
Loss at step 450: 0.03183215856552124
Loss at step 500: 0.0522194467484951
Loss at step 550: 0.03536583483219147
Loss at step 600: 0.037089429795742035
Loss at step 650: 0.03506674990057945
Loss at step 700: 0.07340752333402634
Loss at step 750: 0.03434182330965996
Loss at step 800: 0.05077645555138588
Loss at step 850: 0.03686201199889183
Loss at step 900: 0.04532172158360481
Mean training loss after epoch 117: 0.04204082982872785
EPOCH: 118
Loss at step 0: 0.0334637425839901
Loss at step 50: 0.04286329075694084
Loss at step 100: 0.039089079946279526
Loss at step 150: 0.034637052565813065
Loss at step 200: 0.03227698802947998
Loss at step 250: 0.042662475258111954
Loss at step 300: 0.04539114609360695
Loss at step 350: 0.046212777495384216
Loss at step 400: 0.04341699928045273
Loss at step 450: 0.03234335035085678
Loss at step 500: 0.03549254313111305
Loss at step 550: 0.035518109798431396
Loss at step 600: 0.05670395493507385
Loss at step 650: 0.05329463258385658
Loss at step 700: 0.05610251426696777
Loss at step 750: 0.03034931793808937
Loss at step 800: 0.05315934866666794
Loss at step 850: 0.03849770873785019
Loss at step 900: 0.043866049498319626
Mean training loss after epoch 118: 0.04233125694540899
EPOCH: 119
Loss at step 0: 0.041059985756874084
Loss at step 50: 0.03781072795391083
Loss at step 100: 0.058324579149484634
Loss at step 150: 0.05159909278154373
Loss at step 200: 0.03973580151796341
Loss at step 250: 0.034565798938274384
Loss at step 300: 0.03539511188864708
Loss at step 350: 0.04175569489598274
Loss at step 400: 0.033448122441768646
Loss at step 450: 0.04465514048933983
Loss at step 500: 0.035944875329732895
Loss at step 550: 0.03353596478700638
Loss at step 600: 0.03798174858093262
Loss at step 650: 0.03126048669219017
Loss at step 700: 0.035585448145866394
Loss at step 750: 0.04394625499844551
Loss at step 800: 0.03175988048315048
Loss at step 850: 0.048157915472984314
Loss at step 900: 0.05564304068684578
Mean training loss after epoch 119: 0.04225547817438396
EPOCH: 120
Loss at step 0: 0.03654998168349266
Loss at step 50: 0.04451896995306015
Loss at step 100: 0.03440498933196068
Loss at step 150: 0.045890774577856064
Loss at step 200: 0.06556855142116547
Loss at step 250: 0.04978479444980621
Loss at step 300: 0.05234746262431145
Loss at step 350: 0.048763785511255264
Loss at step 400: 0.03686203435063362
Loss at step 450: 0.0392894446849823
Loss at step 500: 0.02784913033246994
Loss at step 550: 0.060626059770584106
Loss at step 600: 0.028441544622182846
Loss at step 650: 0.03267563879489899
Loss at step 700: 0.07281859219074249
Loss at step 750: 0.03018386848270893
Loss at step 800: 0.05207233875989914
Loss at step 850: 0.03853049501776695
Loss at step 900: 0.03876703232526779
Mean training loss after epoch 120: 0.04244164083558105
EPOCH: 121
Loss at step 0: 0.032931819558143616
Loss at step 50: 0.03577417507767677
Loss at step 100: 0.05528498440980911
Loss at step 150: 0.043831828981637955
Loss at step 200: 0.040254004299640656
Loss at step 250: 0.03872199356555939
Loss at step 300: 0.05530036985874176
Loss at step 350: 0.047019802033901215
Loss at step 400: 0.05679032951593399
Loss at step 450: 0.051629483699798584
Loss at step 500: 0.04071091488003731
Loss at step 550: 0.032426148653030396
Loss at step 600: 0.03054644539952278
Loss at step 650: 0.035769470036029816
Loss at step 700: 0.04064173623919487
Loss at step 750: 0.03792702406644821
Loss at step 800: 0.0382523275911808
Loss at step 850: 0.04808889329433441
Loss at step 900: 0.03021102026104927
Mean training loss after epoch 121: 0.042562772758177984
EPOCH: 122
Loss at step 0: 0.03836292773485184
Loss at step 50: 0.037795569747686386
Loss at step 100: 0.037729810923337936
Loss at step 150: 0.05760970711708069
Loss at step 200: 0.03326259180903435
Loss at step 250: 0.03908127546310425
Loss at step 300: 0.05211610719561577
Loss at step 350: 0.03992561250925064
Loss at step 400: 0.03279055282473564
Loss at step 450: 0.054718513041734695
Loss at step 500: 0.028107738122344017
Loss at step 550: 0.037835340946912766
Loss at step 600: 0.0400877371430397
Loss at step 650: 0.040323756635189056
Loss at step 700: 0.050479333847761154
Loss at step 750: 0.03604499623179436
Loss at step 800: 0.061530329287052155
Loss at step 850: 0.0346403643488884
Loss at step 900: 0.042586684226989746
Mean training loss after epoch 122: 0.042738208541277245
EPOCH: 123
Loss at step 0: 0.037863537669181824
Loss at step 50: 0.04166312888264656
Loss at step 100: 0.0541631244122982
Loss at step 150: 0.03443480655550957
Loss at step 200: 0.03567218780517578
Loss at step 250: 0.07678977400064468
Loss at step 300: 0.035183656960725784
Loss at step 350: 0.04932482913136482
Loss at step 400: 0.03603409230709076
Loss at step 450: 0.045590754598379135
Loss at step 500: 0.050587721168994904
Loss at step 550: 0.03978006914258003
Loss at step 600: 0.03327057883143425
Loss at step 650: 0.032649945467710495
Loss at step 700: 0.040164392441511154
Loss at step 750: 0.054399918764829636
Loss at step 800: 0.03373962640762329
Loss at step 850: 0.037361081689596176
Loss at step 900: 0.040843669325113297
Mean training loss after epoch 123: 0.04323216850188241
EPOCH: 124
Loss at step 0: 0.054330259561538696
Loss at step 50: 0.034485798329114914
Loss at step 100: 0.03967555612325668
Loss at step 150: 0.033644016832113266
Loss at step 200: 0.08355532586574554
Loss at step 250: 0.040038276463747025
Loss at step 300: 0.059040505439043045
Loss at step 350: 0.036361414939165115
Loss at step 400: 0.032022830098867416
Loss at step 450: 0.06843051314353943
Loss at step 500: 0.03649810701608658
Loss at step 550: 0.03632408380508423
Loss at step 600: 0.04380091652274132
Loss at step 650: 0.049652453511953354
Loss at step 700: 0.031190911307930946
Loss at step 750: 0.03930390998721123
Loss at step 800: 0.03655650466680527
Loss at step 850: 0.0601528100669384
Loss at step 900: 0.07293304800987244
Mean training loss after epoch 124: 0.043018826279542975
EPOCH: 125
Loss at step 0: 0.03883155435323715
Loss at step 50: 0.06450485438108444
Loss at step 100: 0.046584781259298325
Loss at step 150: 0.050592802464962006
Loss at step 200: 0.033204685896635056
Loss at step 250: 0.03923625871539116
Loss at step 300: 0.03889136016368866
Loss at step 350: 0.03396601602435112
Loss at step 400: 0.036183927208185196
Loss at step 450: 0.03738848865032196
Loss at step 500: 0.050185903906822205
Loss at step 550: 0.05953901633620262
Loss at step 600: 0.05247611179947853
Loss at step 650: 0.04176788777112961
Loss at step 700: 0.03954014182090759
Loss at step 750: 0.051179856061935425
Loss at step 800: 0.029204268008470535
Loss at step 850: 0.03406751900911331
Loss at step 900: 0.04037805274128914
Mean training loss after epoch 125: 0.042219243697456714
EPOCH: 126
Loss at step 0: 0.05898353084921837
Loss at step 50: 0.040512777864933014
Loss at step 100: 0.0387885719537735
Loss at step 150: 0.05783606320619583
Loss at step 200: 0.0370011106133461
Loss at step 250: 0.034667015075683594
Loss at step 300: 0.07945484668016434
Loss at step 350: 0.05463087558746338
Loss at step 400: 0.05935561656951904
Loss at step 450: 0.03419971093535423
Loss at step 500: 0.0355793833732605
Loss at step 550: 0.05645984038710594
Loss at step 600: 0.049390655010938644
Loss at step 650: 0.05601374804973602
Loss at step 700: 0.03460899740457535
Loss at step 750: 0.047203876078128815
Loss at step 800: 0.043314892798662186
Loss at step 850: 0.03529394418001175
Loss at step 900: 0.03737284615635872
Mean training loss after epoch 126: 0.0425018967548286
EPOCH: 127
Loss at step 0: 0.03094486892223358
Loss at step 50: 0.033535122871398926
Loss at step 100: 0.03193407505750656
Loss at step 150: 0.0369366817176342
Loss at step 200: 0.02800428494811058
Loss at step 250: 0.061115555465221405
Loss at step 300: 0.03858532756567001
Loss at step 350: 0.053538210690021515
Loss at step 400: 0.04869364574551582
Loss at step 450: 0.055324409157037735
Loss at step 500: 0.03902880474925041
Loss at step 550: 0.030761603266000748
Loss at step 600: 0.04287640005350113
Loss at step 650: 0.03342721611261368
Loss at step 700: 0.053739070892333984
Loss at step 750: 0.03543848916888237
Loss at step 800: 0.04534701257944107
Loss at step 850: 0.04458793252706528
Loss at step 900: 0.05748144909739494
Mean training loss after epoch 127: 0.042397225750232935
EPOCH: 128
Loss at step 0: 0.04223093017935753
Loss at step 50: 0.060702186077833176
Loss at step 100: 0.05094020068645477
Loss at step 150: 0.05862560123205185
Loss at step 200: 0.05206675827503204
Loss at step 250: 0.03660311549901962
Loss at step 300: 0.033535201102495193
Loss at step 350: 0.037460967898368835
Loss at step 400: 0.053232479840517044
Loss at step 450: 0.05418254807591438
Loss at step 500: 0.03766205161809921
Loss at step 550: 0.039794545620679855
Loss at step 600: 0.04074326157569885
Loss at step 650: 0.03707185387611389
Loss at step 700: 0.04918089509010315
Loss at step 750: 0.035559263080358505
Loss at step 800: 0.038658492267131805
Loss at step 850: 0.03465277701616287
Loss at step 900: 0.055036671459674835
Mean training loss after epoch 128: 0.04215331066812851
EPOCH: 129
Loss at step 0: 0.05584968626499176
Loss at step 50: 0.044562358409166336
Loss at step 100: 0.04858553409576416
Loss at step 150: 0.035875383764505386
Loss at step 200: 0.03444959968328476
Loss at step 250: 0.049522798508405685
Loss at step 300: 0.038845498114824295
Loss at step 350: 0.0346490778028965
Loss at step 400: 0.0353156253695488
Loss at step 450: 0.04465166851878166
Loss at step 500: 0.051781926304101944
Loss at step 550: 0.04664802551269531
Loss at step 600: 0.04371103271842003
Loss at step 650: 0.03156234323978424
Loss at step 700: 0.03441200777888298
Loss at step 750: 0.04080637916922569
Loss at step 800: 0.04525822773575783
Loss at step 850: 0.038173094391822815
Loss at step 900: 0.03883713483810425
Mean training loss after epoch 129: 0.042144837785265975
EPOCH: 130
Loss at step 0: 0.03796138986945152
Loss at step 50: 0.02935968153178692
Loss at step 100: 0.0350450836122036
Loss at step 150: 0.03702370822429657
Loss at step 200: 0.04717714712023735
Loss at step 250: 0.037151504307985306
Loss at step 300: 0.030538195744156837
Loss at step 350: 0.035160936415195465
Loss at step 400: 0.056185707449913025
Loss at step 450: 0.038169149309396744
Loss at step 500: 0.03497578948736191
Loss at step 550: 0.033725254237651825
Loss at step 600: 0.03654317557811737
Loss at step 650: 0.05485529825091362
Loss at step 700: 0.035201311111450195
Loss at step 750: 0.05112731456756592
Loss at step 800: 0.0363975428044796
Loss at step 850: 0.05157846212387085
Loss at step 900: 0.053440894931554794
Mean training loss after epoch 130: 0.04201112081135895
EPOCH: 131
Loss at step 0: 0.05593106895685196
Loss at step 50: 0.04118822142481804
Loss at step 100: 0.05003751814365387
Loss at step 150: 0.0459161177277565
Loss at step 200: 0.03542783111333847
Loss at step 250: 0.03617458790540695
Loss at step 300: 0.046611104160547256
Loss at step 350: 0.05079322308301926
Loss at step 400: 0.05026945844292641
Loss at step 450: 0.03160965070128441
Loss at step 500: 0.03939694166183472
Loss at step 550: 0.042725078761577606
Loss at step 600: 0.039472032338380814
Loss at step 650: 0.043966952711343765
Loss at step 700: 0.05131252482533455
Loss at step 750: 0.05381901189684868
Loss at step 800: 0.04540075361728668
Loss at step 850: 0.03819124028086662
Loss at step 900: 0.03499366715550423
Mean training loss after epoch 131: 0.04209081147477698
EPOCH: 132
Loss at step 0: 0.03549477830529213
Loss at step 50: 0.04868175461888313
Loss at step 100: 0.06251970678567886
Loss at step 150: 0.03984671086072922
Loss at step 200: 0.03522738069295883
Loss at step 250: 0.041131217032670975
Loss at step 300: 0.03893343731760979
Loss at step 350: 0.029867224395275116
Loss at step 400: 0.03499307483434677
Loss at step 450: 0.04396430775523186
Loss at step 500: 0.039177216589450836
Loss at step 550: 0.034836992621421814
Loss at step 600: 0.0452614389359951
Loss at step 650: 0.03610033541917801
Loss at step 700: 0.03932995721697807
Loss at step 750: 0.03963834419846535
Loss at step 800: 0.03673923760652542
Loss at step 850: 0.03914159908890724
Loss at step 900: 0.0560661219060421
Mean training loss after epoch 132: 0.042048414718351766
EPOCH: 133
Loss at step 0: 0.037345174700021744
Loss at step 50: 0.035683225840330124
Loss at step 100: 0.042083919048309326
Loss at step 150: 0.036224111914634705
Loss at step 200: 0.03736821562051773
Loss at step 250: 0.04696378856897354
Loss at step 300: 0.030614478513598442
Loss at step 350: 0.04149696230888367
Loss at step 400: 0.03438796103000641
Loss at step 450: 0.029761843383312225
Loss at step 500: 0.04799075052142143
Loss at step 550: 0.03374233841896057
Loss at step 600: 0.03222092241048813
Loss at step 650: 0.03481902927160263
Loss at step 700: 0.032734550535678864
Loss at step 750: 0.03693768009543419
Loss at step 800: 0.057545632123947144
Loss at step 850: 0.03527749702334404
Loss at step 900: 0.08574411273002625
Mean training loss after epoch 133: 0.0421965813387368
EPOCH: 134
Loss at step 0: 0.033097926527261734
Loss at step 50: 0.031262170523405075
Loss at step 100: 0.03769355267286301
Loss at step 150: 0.042281635105609894
Loss at step 200: 0.04366563260555267
Loss at step 250: 0.056263700127601624
Loss at step 300: 0.0445694737136364
Loss at step 350: 0.060232821851968765
Loss at step 400: 0.043241508305072784
Loss at step 450: 0.054095227271318436
Loss at step 500: 0.03962637484073639
Loss at step 550: 0.03208627551794052
Loss at step 600: 0.036742426455020905
Loss at step 650: 0.07225906848907471
Loss at step 700: 0.031165501102805138
Loss at step 750: 0.040376145392656326
Loss at step 800: 0.031242134049534798
Loss at step 850: 0.07587699592113495
Loss at step 900: 0.03190525993704796
Mean training loss after epoch 134: 0.04234294356829894
EPOCH: 135
Loss at step 0: 0.04674546420574188
Loss at step 50: 0.03515247255563736
Loss at step 100: 0.030674539506435394
Loss at step 150: 0.03536396101117134
Loss at step 200: 0.03699110075831413
Loss at step 250: 0.05407152697443962
Loss at step 300: 0.03668119013309479
Loss at step 350: 0.05391205474734306
Loss at step 400: 0.035997532308101654
Loss at step 450: 0.0386599525809288
Loss at step 500: 0.05625467747449875
Loss at step 550: 0.040797557681798935
Loss at step 600: 0.03620133921504021
Loss at step 650: 0.041961271315813065
Loss at step 700: 0.04411289095878601
Loss at step 750: 0.04163749888539314
Loss at step 800: 0.03395351395010948
Loss at step 850: 0.035000745207071304
Loss at step 900: 0.049789972603321075
Mean training loss after epoch 135: 0.04251719271339206
EPOCH: 136
Loss at step 0: 0.03721001744270325
Loss at step 50: 0.0380161888897419
Loss at step 100: 0.03318065032362938
Loss at step 150: 0.05025745928287506
Loss at step 200: 0.02983713522553444
Loss at step 250: 0.041543181985616684
Loss at step 300: 0.03690667450428009
Loss at step 350: 0.03452766686677933
Loss at step 400: 0.06734222173690796
Loss at step 450: 0.03532548248767853
Loss at step 500: 0.0402277372777462
Loss at step 550: 0.02923762984573841
Loss at step 600: 0.03848644345998764
Loss at step 650: 0.029214609414339066
Loss at step 700: 0.03156350180506706
Loss at step 750: 0.03646155819296837
Loss at step 800: 0.030136052519083023
Loss at step 850: 0.04044833034276962
Loss at step 900: 0.03588998317718506
Mean training loss after epoch 136: 0.042193242827299304
EPOCH: 137
Loss at step 0: 0.04032415151596069
Loss at step 50: 0.04570217803120613
Loss at step 100: 0.041807111352682114
Loss at step 150: 0.03328615799546242
Loss at step 200: 0.03072769194841385
Loss at step 250: 0.033465202897787094
Loss at step 300: 0.033506300300359726
Loss at step 350: 0.03391355648636818
Loss at step 400: 0.04394785314798355
Loss at step 450: 0.05369193106889725
Loss at step 500: 0.038760554045438766
Loss at step 550: 0.052724115550518036
Loss at step 600: 0.048280686140060425
Loss at step 650: 0.05860856920480728
Loss at step 700: 0.0367179699242115
Loss at step 750: 0.037339795380830765
Loss at step 800: 0.042971864342689514
Loss at step 850: 0.03581308200955391
Loss at step 900: 0.03864269703626633
Mean training loss after epoch 137: 0.04252129275280275
EPOCH: 138
Loss at step 0: 0.03007676638662815
Loss at step 50: 0.03923787921667099
Loss at step 100: 0.03948834165930748
Loss at step 150: 0.03789016231894493
Loss at step 200: 0.046873949468135834
Loss at step 250: 0.056466713547706604
Loss at step 300: 0.03907657042145729
Loss at step 350: 0.03223132714629173
Loss at step 400: 0.04866752400994301
Loss at step 450: 0.03630037605762482
Loss at step 500: 0.029274195432662964
Loss at step 550: 0.047705937176942825
Loss at step 600: 0.05141275003552437
Loss at step 650: 0.0336599238216877
Loss at step 700: 0.03687726706266403
Loss at step 750: 0.03609692305326462
Loss at step 800: 0.03537045046687126
Loss at step 850: 0.030984627082943916
Loss at step 900: 0.032818686217069626
Mean training loss after epoch 138: 0.04185507736051642
EPOCH: 139
Loss at step 0: 0.039597246795892715
Loss at step 50: 0.03734810650348663
Loss at step 100: 0.03580016270279884
Loss at step 150: 0.03648393228650093
Loss at step 200: 0.03841892629861832
Loss at step 250: 0.03909728676080704
Loss at step 300: 0.04612003266811371
Loss at step 350: 0.0468924380838871
Loss at step 400: 0.05363857373595238
Loss at step 450: 0.05658219754695892
Loss at step 500: 0.03320447728037834
Loss at step 550: 0.054929088801145554
Loss at step 600: 0.07501211762428284
Loss at step 650: 0.041361112147569656
Loss at step 700: 0.034978266805410385
Loss at step 750: 0.0490257628262043
Loss at step 800: 0.04843713343143463
Loss at step 850: 0.052945345640182495
Loss at step 900: 0.031348828226327896
Mean training loss after epoch 139: 0.04195261146547571
EPOCH: 140
Loss at step 0: 0.038359805941581726
Loss at step 50: 0.043496184051036835
Loss at step 100: 0.03575534746050835
Loss at step 150: 0.03715290129184723
Loss at step 200: 0.05395453795790672
Loss at step 250: 0.04312689229846001
Loss at step 300: 0.039455071091651917
Loss at step 350: 0.05315985530614853
Loss at step 400: 0.04129849001765251
Loss at step 450: 0.04127022251486778
Loss at step 500: 0.04420750215649605
Loss at step 550: 0.03587833046913147
Loss at step 600: 0.038123976439237595
Loss at step 650: 0.03822450339794159
Loss at step 700: 0.040262363851070404
Loss at step 750: 0.04293157905340195
Loss at step 800: 0.03924191743135452
Loss at step 850: 0.051242876797914505
Loss at step 900: 0.030860869213938713
Mean training loss after epoch 140: 0.041628152157849214
EPOCH: 141
Loss at step 0: 0.0357309952378273
Loss at step 50: 0.0405811183154583
Loss at step 100: 0.03416278213262558
Loss at step 150: 0.03351762890815735
Loss at step 200: 0.030533114448189735
Loss at step 250: 0.03451740741729736
Loss at step 300: 0.03665946051478386
Loss at step 350: 0.05478139594197273
Loss at step 400: 0.05036437511444092
Loss at step 450: 0.05603531748056412
Loss at step 500: 0.03402724489569664
Loss at step 550: 0.03750381991267204
Loss at step 600: 0.040972087532281876
Loss at step 650: 0.042260900139808655
Loss at step 700: 0.03811081871390343
Loss at step 750: 0.036845553666353226
Loss at step 800: 0.0415690578520298
Loss at step 850: 0.03965042531490326
Loss at step 900: 0.03870100900530815
Mean training loss after epoch 141: 0.04202673006763082
EPOCH: 142
Loss at step 0: 0.03593452647328377
Loss at step 50: 0.0351838618516922
Loss at step 100: 0.06876881420612335
Loss at step 150: 0.038944195955991745
Loss at step 200: 0.040944647043943405
Loss at step 250: 0.039801549166440964
Loss at step 300: 0.03724919632077217
Loss at step 350: 0.03012264519929886
Loss at step 400: 0.028339020907878876
Loss at step 450: 0.043429140001535416
Loss at step 500: 0.039767246693372726
Loss at step 550: 0.034304797649383545
Loss at step 600: 0.0415533110499382
Loss at step 650: 0.04240945726633072
Loss at step 700: 0.03782442957162857
Loss at step 750: 0.038175780326128006
Loss at step 800: 0.0390368290245533
Loss at step 850: 0.04891939461231232
Loss at step 900: 0.04027196764945984
Mean training loss after epoch 142: 0.042164087712542334
EPOCH: 143
Loss at step 0: 0.05464017391204834
Loss at step 50: 0.04778584465384483
Loss at step 100: 0.043277475982904434
Loss at step 150: 0.030136309564113617
Loss at step 200: 0.038401342928409576
Loss at step 250: 0.03655426949262619
Loss at step 300: 0.03619815781712532
Loss at step 350: 0.03555699810385704
Loss at step 400: 0.06355958431959152
Loss at step 450: 0.03683250769972801
Loss at step 500: 0.031722113490104675
Loss at step 550: 0.051293518394231796
Loss at step 600: 0.04472873732447624
Loss at step 650: 0.04285021498799324
Loss at step 700: 0.03676331043243408
Loss at step 750: 0.053878091275691986
Loss at step 800: 0.039428550750017166
Loss at step 850: 0.029044492170214653
Loss at step 900: 0.046056412160396576
Mean training loss after epoch 143: 0.04196451393875486
EPOCH: 144
Loss at step 0: 0.03647203743457794
Loss at step 50: 0.03546447679400444
Loss at step 100: 0.028027458116412163
Loss at step 150: 0.03706343099474907
Loss at step 200: 0.02946298196911812
Loss at step 250: 0.04093487188220024
Loss at step 300: 0.03480026498436928
Loss at step 350: 0.0481966957449913
Loss at step 400: 0.04102826490998268
Loss at step 450: 0.035431042313575745
Loss at step 500: 0.05099305883049965
Loss at step 550: 0.042340170592069626
Loss at step 600: 0.05557699128985405
Loss at step 650: 0.03197412192821503
Loss at step 700: 0.035774651914834976
Loss at step 750: 0.03557714819908142
Loss at step 800: 0.042168546468019485
Loss at step 850: 0.02944686822593212
Loss at step 900: 0.034017935395240784
Mean training loss after epoch 144: 0.04199131450323916
EPOCH: 145
Loss at step 0: 0.0357891246676445
Loss at step 50: 0.05408082529902458
Loss at step 100: 0.03127046301960945
Loss at step 150: 0.035092175006866455
Loss at step 200: 0.04589945450425148
Loss at step 250: 0.03216953203082085
Loss at step 300: 0.04434205964207649
Loss at step 350: 0.06629910320043564
Loss at step 400: 0.04920700937509537
Loss at step 450: 0.042137082666158676
Loss at step 500: 0.031183796003460884
Loss at step 550: 0.03542826697230339
Loss at step 600: 0.04166002199053764
Loss at step 650: 0.037325046956539154
Loss at step 700: 0.04343381151556969
Loss at step 750: 0.03837285563349724
Loss at step 800: 0.0291228536516428
Loss at step 850: 0.03319094330072403
Loss at step 900: 0.036391112953424454
Mean training loss after epoch 145: 0.04157962997569077
EPOCH: 146
Loss at step 0: 0.03486700356006622
Loss at step 50: 0.04799339547753334
Loss at step 100: 0.053472988307476044
Loss at step 150: 0.049768686294555664
Loss at step 200: 0.04211004450917244
Loss at step 250: 0.08333371579647064
Loss at step 300: 0.03735140338540077
Loss at step 350: 0.0437965951859951
Loss at step 400: 0.04522951692342758
Loss at step 450: 0.050311364233493805
Loss at step 500: 0.05300811678171158
Loss at step 550: 0.037035051733255386
Loss at step 600: 0.03540425002574921
Loss at step 650: 0.03272141516208649
Loss at step 700: 0.04344155266880989
Loss at step 750: 0.04171430692076683
Loss at step 800: 0.03223220631480217
Loss at step 850: 0.04631401598453522
Loss at step 900: 0.034416262060403824
Mean training loss after epoch 146: 0.04228796660383818
EPOCH: 147
Loss at step 0: 0.04061895236372948
Loss at step 50: 0.06510394811630249
Loss at step 100: 0.034833405166864395
Loss at step 150: 0.04406037554144859
Loss at step 200: 0.045396748930215836
Loss at step 250: 0.04945330694317818
Loss at step 300: 0.0697096735239029
Loss at step 350: 0.040425848215818405
Loss at step 400: 0.03286696970462799
Loss at step 450: 0.030878452584147453
Loss at step 500: 0.052155978977680206
Loss at step 550: 0.050235211849212646
Loss at step 600: 0.03293605148792267
Loss at step 650: 0.05016927048563957
Loss at step 700: 0.03877097740769386
Loss at step 750: 0.051481712609529495
Loss at step 800: 0.0331539511680603
Loss at step 850: 0.0463896170258522
Loss at step 900: 0.050032272934913635
Mean training loss after epoch 147: 0.041787002301181174
EPOCH: 148
Loss at step 0: 0.0357489250600338
Loss at step 50: 0.05856280401349068
Loss at step 100: 0.0320919044315815
Loss at step 150: 0.0821550190448761
Loss at step 200: 0.04068780690431595
Loss at step 250: 0.03332633152604103
Loss at step 300: 0.0528477281332016
Loss at step 350: 0.042828939855098724
Loss at step 400: 0.04491664841771126
Loss at step 450: 0.04121486842632294
Loss at step 500: 0.03646986186504364
Loss at step 550: 0.040154021233320236
Loss at step 600: 0.03764384612441063
Loss at step 650: 0.05622980371117592
Loss at step 700: 0.0694388672709465
Loss at step 750: 0.026673797518014908
Loss at step 800: 0.033385686576366425
Loss at step 850: 0.051487911492586136
Loss at step 900: 0.03869228437542915
Mean training loss after epoch 148: 0.0420387861078609
EPOCH: 149
Loss at step 0: 0.032291069626808167
Loss at step 50: 0.04381278157234192
Loss at step 100: 0.05175172910094261
Loss at step 150: 0.039617761969566345
Loss at step 200: 0.04136640951037407
Loss at step 250: 0.06429476290941238
Loss at step 300: 0.06880918890237808
Loss at step 350: 0.0365937314927578
Loss at step 400: 0.04158145561814308
Loss at step 450: 0.04357963427901268
Loss at step 500: 0.03778946399688721
Loss at step 550: 0.02897937409579754
Loss at step 600: 0.03430233523249626
Loss at step 650: 0.04783837869763374
Loss at step 700: 0.033608194440603256
Loss at step 750: 0.04544506222009659
Loss at step 800: 0.048203710466623306
Loss at step 850: 0.04275382310152054
Loss at step 900: 0.041548918932676315
Mean training loss after epoch 149: 0.042700570968311355
EPOCH: 150
Loss at step 0: 0.049964435398578644
Loss at step 50: 0.048390697687864304
Loss at step 100: 0.05685277283191681
Loss at step 150: 0.036489762365818024
Loss at step 200: 0.05473584309220314
Loss at step 250: 0.03579915314912796
Loss at step 300: 0.03483206406235695
Loss at step 350: 0.049429502338171005
Loss at step 400: 0.050645049661397934
Loss at step 450: 0.035665128380060196
Loss at step 500: 0.055477336049079895
Loss at step 550: 0.036538902670145035
Loss at step 600: 0.034477416425943375
Loss at step 650: 0.029415998607873917
Loss at step 700: 0.036369092762470245
Loss at step 750: 0.03708411753177643
Loss at step 800: 0.04495839774608612
Loss at step 850: 0.03702973574399948
Loss at step 900: 0.03885873034596443
Mean training loss after epoch 150: 0.041629522208815446
EPOCH: 151
Loss at step 0: 0.03872460499405861
Loss at step 50: 0.035674791783094406
Loss at step 100: 0.03539711609482765
Loss at step 150: 0.035271402448415756
Loss at step 200: 0.038769178092479706
Loss at step 250: 0.05824952945113182
Loss at step 300: 0.035957783460617065
Loss at step 350: 0.03432445973157883
Loss at step 400: 0.05303642526268959
Loss at step 450: 0.030788464471697807
Loss at step 500: 0.03501858562231064
Loss at step 550: 0.04111636430025101
Loss at step 600: 0.032229967415332794
Loss at step 650: 0.03571541979908943
Loss at step 700: 0.03328194469213486
Loss at step 750: 0.03248866647481918
Loss at step 800: 0.03424318879842758
Loss at step 850: 0.039510175585746765
Loss at step 900: 0.05264480784535408
Mean training loss after epoch 151: 0.041698548992448396
EPOCH: 152
Loss at step 0: 0.043211840093135834
Loss at step 50: 0.03340320661664009
Loss at step 100: 0.048411767929792404
Loss at step 150: 0.056272272020578384
Loss at step 200: 0.051562827080488205
Loss at step 250: 0.034891560673713684
Loss at step 300: 0.05148777738213539
Loss at step 350: 0.03595224395394325
Loss at step 400: 0.028903206810355186
Loss at step 450: 0.038183365017175674
Loss at step 500: 0.03731566295027733
Loss at step 550: 0.0356873981654644
Loss at step 600: 0.0313677042722702
Loss at step 650: 0.03932942450046539
Loss at step 700: 0.0665639266371727
Loss at step 750: 0.03317286819219589
Loss at step 800: 0.03564203903079033
Loss at step 850: 0.03477342799305916
Loss at step 900: 0.04680752009153366
Mean training loss after epoch 152: 0.041610624573664115
EPOCH: 153
Loss at step 0: 0.030752496793866158
Loss at step 50: 0.03984297439455986
Loss at step 100: 0.031056780368089676
Loss at step 150: 0.03939105570316315
Loss at step 200: 0.06974080204963684
Loss at step 250: 0.03848470747470856
Loss at step 300: 0.03297055512666702
Loss at step 350: 0.04547300189733505
Loss at step 400: 0.03682214766740799
Loss at step 450: 0.03131253272294998
Loss at step 500: 0.04345107078552246
Loss at step 550: 0.04177214205265045
Loss at step 600: 0.030688272789120674
Loss at step 650: 0.048584792762994766
Loss at step 700: 0.038229744881391525
Loss at step 750: 0.031089046970009804
Loss at step 800: 0.03985687717795372
Loss at step 850: 0.041947998106479645
Loss at step 900: 0.044597942382097244
Mean training loss after epoch 153: 0.042380063657535674
EPOCH: 154
Loss at step 0: 0.05345641076564789
Loss at step 50: 0.042899783700704575
Loss at step 100: 0.03357723355293274
Loss at step 150: 0.05140424147248268
Loss at step 200: 0.02819659747183323
Loss at step 250: 0.04493157938122749
Loss at step 300: 0.03656427562236786
Loss at step 350: 0.030955562368035316
Loss at step 400: 0.0369301363825798
Loss at step 450: 0.040071986615657806
Loss at step 500: 0.05197841301560402
Loss at step 550: 0.03778393194079399
Loss at step 600: 0.03276905044913292
Loss at step 650: 0.03182387724518776
Loss at step 700: 0.03351379930973053
Loss at step 750: 0.033173974603414536
Loss at step 800: 0.0386836901307106
Loss at step 850: 0.03227386251091957
Loss at step 900: 0.035299576818943024
Mean training loss after epoch 154: 0.04212207293539032
EPOCH: 155
Loss at step 0: 0.05649520829319954
Loss at step 50: 0.03102577105164528
Loss at step 100: 0.032175131142139435
Loss at step 150: 0.0337371900677681
Loss at step 200: 0.03588712960481644
Loss at step 250: 0.047558512538671494
Loss at step 300: 0.05539235100150108
Loss at step 350: 0.0677049309015274
Loss at step 400: 0.04687006399035454
Loss at step 450: 0.042771343141794205
Loss at step 500: 0.03482664376497269
Loss at step 550: 0.044766876846551895
Loss at step 600: 0.045993443578481674
Loss at step 650: 0.041110921651124954
Loss at step 700: 0.03490821272134781
Loss at step 750: 0.04456814005970955
Loss at step 800: 0.05052883177995682
Loss at step 850: 0.06528449058532715
Loss at step 900: 0.036868300288915634
Mean training loss after epoch 155: 0.04227992473269449
EPOCH: 156
Loss at step 0: 0.05328263342380524
Loss at step 50: 0.05558167025446892
Loss at step 100: 0.035837870091199875
Loss at step 150: 0.03872068226337433
Loss at step 200: 0.062104836106300354
Loss at step 250: 0.038871198892593384
Loss at step 300: 0.035116370767354965
Loss at step 350: 0.0337044931948185
Loss at step 400: 0.035277001559734344
Loss at step 450: 0.05462329462170601
Loss at step 500: 0.03315630555152893
Loss at step 550: 0.03951284661889076
Loss at step 600: 0.04857729747891426
Loss at step 650: 0.042670510709285736
Loss at step 700: 0.030172094702720642
Loss at step 750: 0.035838935524225235
Loss at step 800: 0.05214604362845421
Loss at step 850: 0.041880179196596146
Loss at step 900: 0.03618021681904793
Mean training loss after epoch 156: 0.04151801192469752
EPOCH: 157
Loss at step 0: 0.053754597902297974
Loss at step 50: 0.04583209753036499
Loss at step 100: 0.048923052847385406
Loss at step 150: 0.037170544266700745
Loss at step 200: 0.05567541718482971
Loss at step 250: 0.04202282056212425
Loss at step 300: 0.0530812032520771
Loss at step 350: 0.036701470613479614
Loss at step 400: 0.04105161875486374
Loss at step 450: 0.035315290093421936
Loss at step 500: 0.03592460975050926
Loss at step 550: 0.05100354924798012
Loss at step 600: 0.03835726156830788
Loss at step 650: 0.040117476135492325
Loss at step 700: 0.03319944441318512
Loss at step 750: 0.04874331131577492
Loss at step 800: 0.03211164474487305
Loss at step 850: 0.03677653148770332
Loss at step 900: 0.03921843320131302
Mean training loss after epoch 157: 0.04208396263778019
EPOCH: 158
Loss at step 0: 0.053142696619033813
Loss at step 50: 0.039625342935323715
Loss at step 100: 0.041874464601278305
Loss at step 150: 0.04172403737902641
Loss at step 200: 0.03345603868365288
Loss at step 250: 0.03492490202188492
Loss at step 300: 0.038030609488487244
Loss at step 350: 0.028495457023382187
Loss at step 400: 0.04284537583589554
Loss at step 450: 0.034388937056064606
Loss at step 500: 0.055605459958314896
Loss at step 550: 0.04129524528980255
Loss at step 600: 0.03616258502006531
Loss at step 650: 0.0285470113158226
Loss at step 700: 0.0391588918864727
Loss at step 750: 0.0373045951128006
Loss at step 800: 0.036178652197122574
Loss at step 850: 0.04029553756117821
Loss at step 900: 0.030573036521673203
Mean training loss after epoch 158: 0.041759737425728014
EPOCH: 159
Loss at step 0: 0.03646078705787659
Loss at step 50: 0.03891812264919281
Loss at step 100: 0.04230709373950958
Loss at step 150: 0.038509808480739594
Loss at step 200: 0.053831759840250015
Loss at step 250: 0.045457784086465836
Loss at step 300: 0.035048820078372955
Loss at step 350: 0.0402105338871479
Loss at step 400: 0.040246471762657166
Loss at step 450: 0.03213101997971535
Loss at step 500: 0.039840925484895706
Loss at step 550: 0.034487396478652954
Loss at step 600: 0.05263527110219002
Loss at step 650: 0.03569900244474411
Loss at step 700: 0.03732169046998024
Loss at step 750: 0.03432917222380638
Loss at step 800: 0.029399283230304718
Loss at step 850: 0.03557441011071205
Loss at step 900: 0.05276878550648689
Mean training loss after epoch 159: 0.04198013888056408
EPOCH: 160
Loss at step 0: 0.03272593766450882
Loss at step 50: 0.03808004409074783
Loss at step 100: 0.04068906977772713
Loss at step 150: 0.03765731677412987
Loss at step 200: 0.044079214334487915
Loss at step 250: 0.03938356041908264
Loss at step 300: 0.05161362513899803
Loss at step 350: 0.04128038510680199
Loss at step 400: 0.05006210878491402
Loss at step 450: 0.03323276713490486
Loss at step 500: 0.04292380064725876
Loss at step 550: 0.03615903854370117
Loss at step 600: 0.04243318736553192
Loss at step 650: 0.03709695115685463
Loss at step 700: 0.05068572610616684
Loss at step 750: 0.03722544386982918
Loss at step 800: 0.056304723024368286
Loss at step 850: 0.05459215119481087
Loss at step 900: 0.035317618399858475
Mean training loss after epoch 160: 0.04215756298970185
EPOCH: 161
Loss at step 0: 0.042033180594444275
Loss at step 50: 0.04185538738965988
Loss at step 100: 0.03522934764623642
Loss at step 150: 0.05067601427435875
Loss at step 200: 0.03864051774144173
Loss at step 250: 0.04868760704994202
Loss at step 300: 0.03932592645287514
Loss at step 350: 0.04293591156601906
Loss at step 400: 0.0446302592754364
Loss at step 450: 0.03664654493331909
Loss at step 500: 0.05163183808326721
Loss at step 550: 0.044752635061740875
Loss at step 600: 0.03847457468509674
Loss at step 650: 0.05418030917644501
Loss at step 700: 0.04585876315832138
Loss at step 750: 0.03625411167740822
Loss at step 800: 0.035573311150074005
Loss at step 850: 0.04612034559249878
Loss at step 900: 0.03266897052526474
Mean training loss after epoch 161: 0.041682401608461254
EPOCH: 162
Loss at step 0: 0.03443225100636482
Loss at step 50: 0.05295828357338905
Loss at step 100: 0.044075947254896164
Loss at step 150: 0.039766378700733185
Loss at step 200: 0.03175616264343262
Loss at step 250: 0.03700171783566475
Loss at step 300: 0.06151432916522026
Loss at step 350: 0.03539503365755081
Loss at step 400: 0.035971418023109436
Loss at step 450: 0.06043902039527893
Loss at step 500: 0.026486312970519066
Loss at step 550: 0.03172525390982628
Loss at step 600: 0.05635107308626175
Loss at step 650: 0.049782294780015945
Loss at step 700: 0.03821375221014023
Loss at step 750: 0.03226267918944359
Loss at step 800: 0.04801420122385025
Loss at step 850: 0.037035200744867325
Loss at step 900: 0.0541158989071846
Mean training loss after epoch 162: 0.041881437564312396
EPOCH: 163
Loss at step 0: 0.046113207936286926
Loss at step 50: 0.0443924181163311
Loss at step 100: 0.03236755356192589
Loss at step 150: 0.03812683746218681
Loss at step 200: 0.03635641187429428
Loss at step 250: 0.062125999480485916
Loss at step 300: 0.05177285149693489
Loss at step 350: 0.04134644195437431
Loss at step 400: 0.051762331277132034
Loss at step 450: 0.038076095283031464
Loss at step 500: 0.048686202615499496
Loss at step 550: 0.04135816916823387
Loss at step 600: 0.04114864766597748
Loss at step 650: 0.036704305559396744
Loss at step 700: 0.049851950258016586
Loss at step 750: 0.03393217548727989
Loss at step 800: 0.03302428126335144
Loss at step 850: 0.036545637995004654
Loss at step 900: 0.042946405708789825
Mean training loss after epoch 163: 0.04166837653188881
EPOCH: 164
Loss at step 0: 0.05492968484759331
Loss at step 50: 0.03143102303147316
Loss at step 100: 0.03686508163809776
Loss at step 150: 0.03780407831072807
Loss at step 200: 0.03192293271422386
Loss at step 250: 0.06838148087263107
Loss at step 300: 0.05755019187927246
Loss at step 350: 0.050360266119241714
Loss at step 400: 0.036883942782878876
Loss at step 450: 0.05594587326049805
Loss at step 500: 0.0486307218670845
Loss at step 550: 0.04816119745373726
Loss at step 600: 0.05513868108391762
Loss at step 650: 0.040407393127679825
Loss at step 700: 0.035788264125585556
Loss at step 750: 0.029059693217277527
Loss at step 800: 0.03355870768427849
Loss at step 850: 0.06275414675474167
Loss at step 900: 0.04140639677643776
Mean training loss after epoch 164: 0.0419985556732744
EPOCH: 165
Loss at step 0: 0.05084880441427231
Loss at step 50: 0.0543023943901062
Loss at step 100: 0.056270111352205276
Loss at step 150: 0.034627776592969894
Loss at step 200: 0.029606739059090614
Loss at step 250: 0.04984544217586517
Loss at step 300: 0.033280953764915466
Loss at step 350: 0.04561571031808853
Loss at step 400: 0.034940462559461594
Loss at step 450: 0.04139413684606552
Loss at step 500: 0.040452610701322556
Loss at step 550: 0.04350801184773445
Loss at step 600: 0.03656912222504616
Loss at step 650: 0.04006418213248253
Loss at step 700: 0.04635033383965492
Loss at step 750: 0.03196395933628082
Loss at step 800: 0.03807682916522026
Loss at step 850: 0.030541956424713135
Loss at step 900: 0.05490187183022499
Mean training loss after epoch 165: 0.04234211670477062
EPOCH: 166
Loss at step 0: 0.0435408279299736
Loss at step 50: 0.04618099704384804
Loss at step 100: 0.05463716760277748
Loss at step 150: 0.05916711688041687
Loss at step 200: 0.05074099823832512
Loss at step 250: 0.03297647461295128
Loss at step 300: 0.03944069892168045
Loss at step 350: 0.06627862900495529
Loss at step 400: 0.03429859131574631
Loss at step 450: 0.039032090455293655
Loss at step 500: 0.0400698184967041
Loss at step 550: 0.0376395508646965
Loss at step 600: 0.03616385906934738
Loss at step 650: 0.05470064654946327
Loss at step 700: 0.04045157879590988
Loss at step 750: 0.07541951537132263
Loss at step 800: 0.05270006135106087
Loss at step 850: 0.041193705052137375
Loss at step 900: 0.03363768011331558
Mean training loss after epoch 166: 0.041681498600872974
EPOCH: 167
Loss at step 0: 0.03260944038629532
Loss at step 50: 0.031192278489470482
Loss at step 100: 0.030275052413344383
Loss at step 150: 0.0319354273378849
Loss at step 200: 0.038708776235580444
Loss at step 250: 0.05172756686806679
Loss at step 300: 0.0345488116145134
Loss at step 350: 0.04113924130797386
Loss at step 400: 0.035771701484918594
Loss at step 450: 0.03828283026814461
Loss at step 500: 0.0383487194776535
Loss at step 550: 0.06837842613458633
Loss at step 600: 0.03420782834291458
Loss at step 650: 0.03368475288152695
Loss at step 700: 0.036766745150089264
Loss at step 750: 0.03165431320667267
Loss at step 800: 0.04517103731632233
Loss at step 850: 0.051145922392606735
Loss at step 900: 0.03939900919795036
Mean training loss after epoch 167: 0.04159986691227727
EPOCH: 168
Loss at step 0: 0.04851570725440979
Loss at step 50: 0.06281198561191559
Loss at step 100: 0.04415833204984665
Loss at step 150: 0.03673063963651657
Loss at step 200: 0.03636414185166359
Loss at step 250: 0.03222019225358963
Loss at step 300: 0.0397556908428669
Loss at step 350: 0.03364252671599388
Loss at step 400: 0.03710572049021721
Loss at step 450: 0.02969525195658207
Loss at step 500: 0.08423370122909546
Loss at step 550: 0.0415620356798172
Loss at step 600: 0.027658242732286453
Loss at step 650: 0.042391855269670486
Loss at step 700: 0.03817259520292282
Loss at step 750: 0.03259177878499031
Loss at step 800: 0.03972531110048294
Loss at step 850: 0.0303537305444479
Loss at step 900: 0.03652387112379074
Mean training loss after epoch 168: 0.04140357141579583
EPOCH: 169
Loss at step 0: 0.05354360118508339
Loss at step 50: 0.03195581212639809
Loss at step 100: 0.06500847637653351
Loss at step 150: 0.06610570102930069
Loss at step 200: 0.04113024100661278
Loss at step 250: 0.027583038434386253
Loss at step 300: 0.03656454011797905
Loss at step 350: 0.052170515060424805
Loss at step 400: 0.03852315619587898
Loss at step 450: 0.03867664188146591
Loss at step 500: 0.04367469251155853
Loss at step 550: 0.03310701623558998
Loss at step 600: 0.06611268222332001
Loss at step 650: 0.03098241798579693
Loss at step 700: 0.04891456663608551
Loss at step 750: 0.03942761942744255
Loss at step 800: 0.041297849267721176
Loss at step 850: 0.052537478506565094
Loss at step 900: 0.0340694934129715
Mean training loss after epoch 169: 0.04182671864967801
EPOCH: 170
Loss at step 0: 0.0305585078895092
Loss at step 50: 0.044676389545202255
Loss at step 100: 0.03262167051434517
Loss at step 150: 0.03465871140360832
Loss at step 200: 0.053631119430065155
Loss at step 250: 0.04847809672355652
Loss at step 300: 0.03438197821378708
Loss at step 350: 0.03563510254025459
Loss at step 400: 0.041701238602399826
Loss at step 450: 0.03720324486494064
Loss at step 500: 0.03245695307850838
Loss at step 550: 0.050643812865018845
Loss at step 600: 0.0421716682612896
Loss at step 650: 0.031076569110155106
Loss at step 700: 0.03496023640036583
Loss at step 750: 0.041589297354221344
Loss at step 800: 0.0360647588968277
Loss at step 850: 0.03295659273862839
Loss at step 900: 0.03453189507126808
Mean training loss after epoch 170: 0.04185354243169652
EPOCH: 171
Loss at step 0: 0.053923431783914566
Loss at step 50: 0.05165528133511543
Loss at step 100: 0.05359581485390663
Loss at step 150: 0.03814880549907684
Loss at step 200: 0.057905055582523346
Loss at step 250: 0.05054771900177002
Loss at step 300: 0.029955018311738968
Loss at step 350: 0.04426371306180954
Loss at step 400: 0.037701405584812164
Loss at step 450: 0.04641666263341904
Loss at step 500: 0.05583236366510391
Loss at step 550: 0.03619731217622757
Loss at step 600: 0.03259444236755371
Loss at step 650: 0.060233090072870255
Loss at step 700: 0.04138468950986862
Loss at step 750: 0.03163965418934822
Loss at step 800: 0.032825879752635956
Loss at step 850: 0.05752290412783623
Loss at step 900: 0.03443465754389763
Mean training loss after epoch 171: 0.04144824135389282
EPOCH: 172
Loss at step 0: 0.03209567442536354
Loss at step 50: 0.05818139761686325
Loss at step 100: 0.05136827379465103
Loss at step 150: 0.027739521116018295
Loss at step 200: 0.042954180389642715
Loss at step 250: 0.030690737068653107
Loss at step 300: 0.03409494832158089
Loss at step 350: 0.027100542560219765
Loss at step 400: 0.03840995579957962
Loss at step 450: 0.03903039172291756
Loss at step 500: 0.04468401148915291
Loss at step 550: 0.03910224139690399
Loss at step 600: 0.0376056544482708
Loss at step 650: 0.035212963819503784
Loss at step 700: 0.03200206905603409
Loss at step 750: 0.035134561359882355
Loss at step 800: 0.03554246947169304
Loss at step 850: 0.05527525022625923
Loss at step 900: 0.03256995603442192
Mean training loss after epoch 172: 0.04208877019441204
EPOCH: 173
Loss at step 0: 0.035287100821733475
Loss at step 50: 0.039894670248031616
Loss at step 100: 0.03790206089615822
Loss at step 150: 0.03075706586241722
Loss at step 200: 0.05236460268497467
Loss at step 250: 0.04161772504448891
Loss at step 300: 0.033940721303224564
Loss at step 350: 0.06850095838308334
Loss at step 400: 0.03070612996816635
Loss at step 450: 0.045446161180734634
Loss at step 500: 0.032258786261081696
Loss at step 550: 0.03494522348046303
Loss at step 600: 0.045330144464969635
Loss at step 650: 0.036904022097587585
Loss at step 700: 0.032146237790584564
Loss at step 750: 0.034715086221694946
Loss at step 800: 0.03468701243400574
Loss at step 850: 0.033052023500204086
Loss at step 900: 0.034720417112112045
Mean training loss after epoch 173: 0.042048089476282406
EPOCH: 174
Loss at step 0: 0.03656395152211189
Loss at step 50: 0.04568571224808693
Loss at step 100: 0.03481351211667061
Loss at step 150: 0.038224395364522934
Loss at step 200: 0.058359503746032715
Loss at step 250: 0.03833071514964104
Loss at step 300: 0.043960943818092346
Loss at step 350: 0.04375429451465607
Loss at step 400: 0.03465664014220238
Loss at step 450: 0.029857201501727104
Loss at step 500: 0.044843051582574844
Loss at step 550: 0.05128093063831329
Loss at step 600: 0.06645429879426956
Loss at step 650: 0.03523625433444977
Loss at step 700: 0.03666084632277489
Loss at step 750: 0.04340723156929016
Loss at step 800: 0.03913688659667969
Loss at step 850: 0.055996235460042953
Loss at step 900: 0.032421406358480453
Mean training loss after epoch 174: 0.04112288686655351
EPOCH: 175
Loss at step 0: 0.028204411268234253
Loss at step 50: 0.030527090653777122
Loss at step 100: 0.039564263075590134
Loss at step 150: 0.03648807108402252
Loss at step 200: 0.042801812291145325
Loss at step 250: 0.07026997953653336
Loss at step 300: 0.053135115653276443
Loss at step 350: 0.0397661030292511
Loss at step 400: 0.047537870705127716
Loss at step 450: 0.03553953766822815
Loss at step 500: 0.0405605211853981
Loss at step 550: 0.04098905622959137
Loss at step 600: 0.05999923497438431
Loss at step 650: 0.05660012736916542
Loss at step 700: 0.03235200420022011
Loss at step 750: 0.03290760517120361
Loss at step 800: 0.034900546073913574
Loss at step 850: 0.05151025578379631
Loss at step 900: 0.041141875088214874
Mean training loss after epoch 175: 0.04161210253096021
EPOCH: 176
Loss at step 0: 0.03776821494102478
Loss at step 50: 0.04021279513835907
Loss at step 100: 0.041916023939847946
Loss at step 150: 0.03895966708660126
Loss at step 200: 0.032551463693380356
Loss at step 250: 0.05833392217755318
Loss at step 300: 0.048504963517189026
Loss at step 350: 0.05007625371217728
Loss at step 400: 0.03597918152809143
Loss at step 450: 0.030169488862156868
Loss at step 500: 0.03819960728287697
Loss at step 550: 0.037356436252593994
Loss at step 600: 0.03851252421736717
Loss at step 650: 0.06501714140176773
Loss at step 700: 0.05189223960042
Loss at step 750: 0.03168368339538574
Loss at step 800: 0.03951912373304367
Loss at step 850: 0.03564509004354477
Loss at step 900: 0.03276278078556061
Mean training loss after epoch 176: 0.04125692002546749
EPOCH: 177
Loss at step 0: 0.03515172004699707
Loss at step 50: 0.03367337957024574
Loss at step 100: 0.0318077951669693
Loss at step 150: 0.03582939878106117
Loss at step 200: 0.05992460623383522
Loss at step 250: 0.0508149079978466
Loss at step 300: 0.041243456304073334
Loss at step 350: 0.05429394170641899
Loss at step 400: 0.048130929470062256
Loss at step 450: 0.035247184336185455
Loss at step 500: 0.03542075678706169
Loss at step 550: 0.055204953998327255
Loss at step 600: 0.052897412329912186
Loss at step 650: 0.06927145272493362
Loss at step 700: 0.04794507846236229
Loss at step 750: 0.03416883945465088
Loss at step 800: 0.04960576817393303
Loss at step 850: 0.03425171971321106
Loss at step 900: 0.03997402638196945
Mean training loss after epoch 177: 0.04163434571707681
EPOCH: 178
Loss at step 0: 0.04990256577730179
Loss at step 50: 0.0538577102124691
Loss at step 100: 0.03529036417603493
Loss at step 150: 0.03500866889953613
Loss at step 200: 0.03199452906847
Loss at step 250: 0.036851897835731506
Loss at step 300: 0.029136331751942635
Loss at step 350: 0.030358897522091866
Loss at step 400: 0.051947467029094696
Loss at step 450: 0.037148743867874146
Loss at step 500: 0.03686618059873581
Loss at step 550: 0.036732930690050125
Loss at step 600: 0.04222632199525833
Loss at step 650: 0.03982901945710182
Loss at step 700: 0.03642851859331131
Loss at step 750: 0.04014425352215767
Loss at step 800: 0.0999036505818367
Loss at step 850: 0.038242340087890625
Loss at step 900: 0.035852327942848206
Mean training loss after epoch 178: 0.04207657766875936
EPOCH: 179
Loss at step 0: 0.04087178036570549
Loss at step 50: 0.04753424972295761
Loss at step 100: 0.03290014714002609
Loss at step 150: 0.03695341572165489
Loss at step 200: 0.03138614073395729
Loss at step 250: 0.040319520980119705
Loss at step 300: 0.049994390457868576
Loss at step 350: 0.05205334722995758
Loss at step 400: 0.06807678192853928
Loss at step 450: 0.051330793648958206
Loss at step 500: 0.03462173044681549
Loss at step 550: 0.03375495225191116
Loss at step 600: 0.036820266395807266
Loss at step 650: 0.03800901770591736
Loss at step 700: 0.030175233259797096
Loss at step 750: 0.05230529233813286
Loss at step 800: 0.03545716404914856
Loss at step 850: 0.052242521196603775
Loss at step 900: 0.030589766800403595
Mean training loss after epoch 179: 0.04223245648003972
EPOCH: 180
Loss at step 0: 0.03807156905531883
Loss at step 50: 0.05775832384824753
Loss at step 100: 0.03275652229785919
Loss at step 150: 0.03259390965104103
Loss at step 200: 0.06442693620920181
Loss at step 250: 0.04907452315092087
Loss at step 300: 0.033002205193042755
Loss at step 350: 0.05629289895296097
Loss at step 400: 0.039508040994405746
Loss at step 450: 0.03669936582446098
Loss at step 500: 0.0339655764400959
Loss at step 550: 0.035679228603839874
Loss at step 600: 0.039923977106809616
Loss at step 650: 0.05427751690149307
Loss at step 700: 0.037826668471097946
Loss at step 750: 0.035374172031879425
Loss at step 800: 0.06559602171182632
Loss at step 850: 0.03814015910029411
Loss at step 900: 0.03909355774521828
Mean training loss after epoch 180: 0.04139347126218937
EPOCH: 181
Loss at step 0: 0.03391748294234276
Loss at step 50: 0.04056994244456291
Loss at step 100: 0.049423567950725555
Loss at step 150: 0.046806685626506805
Loss at step 200: 0.03752606734633446
Loss at step 250: 0.03340506553649902
Loss at step 300: 0.04189173877239227
Loss at step 350: 0.031038936227560043
Loss at step 400: 0.0705329179763794
Loss at step 450: 0.03440277650952339
Loss at step 500: 0.0336567685008049
Loss at step 550: 0.03429238125681877
Loss at step 600: 0.031661275774240494
Loss at step 650: 0.05041368678212166
Loss at step 700: 0.030251847580075264
Loss at step 750: 0.042865362018346786
Loss at step 800: 0.033039726316928864
Loss at step 850: 0.058725807815790176
Loss at step 900: 0.03608013689517975
Mean training loss after epoch 181: 0.04149253477157751
EPOCH: 182
Loss at step 0: 0.03710400313138962
Loss at step 50: 0.03371499851346016
Loss at step 100: 0.0341544970870018
Loss at step 150: 0.04096578434109688
Loss at step 200: 0.05157192423939705
Loss at step 250: 0.03875488415360451
Loss at step 300: 0.03593416139483452
Loss at step 350: 0.03340543434023857
Loss at step 400: 0.05967196449637413
Loss at step 450: 0.03931530937552452
Loss at step 500: 0.05755196511745453
Loss at step 550: 0.03740415349602699
Loss at step 600: 0.03923661261796951
Loss at step 650: 0.040235571563243866
Loss at step 700: 0.04200943186879158
Loss at step 750: 0.03610384464263916
Loss at step 800: 0.031481098383665085
Loss at step 850: 0.0516573041677475
Loss at step 900: 0.047245364636182785
Mean training loss after epoch 182: 0.041903669381939145
EPOCH: 183
Loss at step 0: 0.04019775986671448
Loss at step 50: 0.04790247604250908
Loss at step 100: 0.04489818215370178
Loss at step 150: 0.03816897049546242
Loss at step 200: 0.033736344426870346
Loss at step 250: 0.03466064855456352
Loss at step 300: 0.040574636310338974
Loss at step 350: 0.034679412841796875
Loss at step 400: 0.034291334450244904
Loss at step 450: 0.05978235602378845
Loss at step 500: 0.04911112040281296
Loss at step 550: 0.03342486545443535
Loss at step 600: 0.052635472267866135
Loss at step 650: 0.03890468180179596
Loss at step 700: 0.03596392273902893
Loss at step 750: 0.03404191508889198
Loss at step 800: 0.05040137469768524
Loss at step 850: 0.034710999578237534
Loss at step 900: 0.038780633360147476
Mean training loss after epoch 183: 0.041807248219371096
EPOCH: 184
Loss at step 0: 0.05211412534117699
Loss at step 50: 0.04960509017109871
Loss at step 100: 0.03784497454762459
Loss at step 150: 0.03517859801650047
Loss at step 200: 0.03709854185581207
Loss at step 250: 0.03380673751235008
Loss at step 300: 0.03791601583361626
Loss at step 350: 0.04009474813938141
Loss at step 400: 0.03905599191784859
Loss at step 450: 0.03254836052656174
Loss at step 500: 0.05125289410352707
Loss at step 550: 0.038431569933891296
Loss at step 600: 0.03682177513837814
Loss at step 650: 0.03914055600762367
Loss at step 700: 0.03562482073903084
Loss at step 750: 0.04414800554513931
Loss at step 800: 0.03894803300499916
Loss at step 850: 0.03770125284790993
Loss at step 900: 0.030288131907582283
Mean training loss after epoch 184: 0.04138687876527752
EPOCH: 185
Loss at step 0: 0.03609955310821533
Loss at step 50: 0.05682943016290665
Loss at step 100: 0.038018904626369476
Loss at step 150: 0.030884547159075737
Loss at step 200: 0.041269998997449875
Loss at step 250: 0.04373003542423248
Loss at step 300: 0.035877473652362823
Loss at step 350: 0.037849120795726776
Loss at step 400: 0.043793316930532455
Loss at step 450: 0.03450731933116913
Loss at step 500: 0.04273483157157898
Loss at step 550: 0.03578299283981323
Loss at step 600: 0.032646261155605316
Loss at step 650: 0.0462360754609108
Loss at step 700: 0.054228439927101135
Loss at step 750: 0.05155330151319504
Loss at step 800: 0.03744790330529213
Loss at step 850: 0.06680920720100403
Loss at step 900: 0.03223875164985657
Mean training loss after epoch 185: 0.041660312590981595
EPOCH: 186
Loss at step 0: 0.033333949744701385
Loss at step 50: 0.05429576709866524
Loss at step 100: 0.03890854865312576
Loss at step 150: 0.03497329354286194
Loss at step 200: 0.034743137657642365
Loss at step 250: 0.0356815941631794
Loss at step 300: 0.05399752035737038
Loss at step 350: 0.03176051005721092
Loss at step 400: 0.03740697726607323
Loss at step 450: 0.04082540050148964
Loss at step 500: 0.05412736162543297
Loss at step 550: 0.03972582891583443
Loss at step 600: 0.038021307438611984
Loss at step 650: 0.04753376543521881
Loss at step 700: 0.05196103826165199
Loss at step 750: 0.04077177867293358
Loss at step 800: 0.03566981106996536
Loss at step 850: 0.03721268102526665
Loss at step 900: 0.05761560797691345
Mean training loss after epoch 186: 0.04170741890269175
EPOCH: 187
Loss at step 0: 0.03566679731011391
Loss at step 50: 0.11628812551498413
Loss at step 100: 0.03040335141122341
Loss at step 150: 0.03489764779806137
Loss at step 200: 0.05575336515903473
Loss at step 250: 0.046886786818504333
Loss at step 300: 0.035252176225185394
Loss at step 350: 0.043795567005872726
Loss at step 400: 0.028728723526000977
Loss at step 450: 0.036280758678913116
Loss at step 500: 0.04253661632537842
Loss at step 550: 0.05303025618195534
Loss at step 600: 0.03752442076802254
Loss at step 650: 0.031123116612434387
Loss at step 700: 0.03173130005598068
Loss at step 750: 0.04688844829797745
Loss at step 800: 0.036983974277973175
Loss at step 850: 0.04824747145175934
Loss at step 900: 0.039069145917892456
Mean training loss after epoch 187: 0.04183094795626491
EPOCH: 188
Loss at step 0: 0.031547438353300095
Loss at step 50: 0.04061029478907585
Loss at step 100: 0.04006023705005646
Loss at step 150: 0.029827365651726723
Loss at step 200: 0.040378764271736145
Loss at step 250: 0.03698677942156792
Loss at step 300: 0.03617756441235542
Loss at step 350: 0.03405392915010452
Loss at step 400: 0.048338402062654495
Loss at step 450: 0.030941160395741463
Loss at step 500: 0.04532836750149727
Loss at step 550: 0.041155822575092316
Loss at step 600: 0.045638855546712875
Loss at step 650: 0.029842229560017586
Loss at step 700: 0.03459912911057472
Loss at step 750: 0.043139681220054626
Loss at step 800: 0.033857397735118866
Loss at step 850: 0.037075892090797424
Loss at step 900: 0.0419020876288414
Mean training loss after epoch 188: 0.04069976957201132
EPOCH: 189
Loss at step 0: 0.033498603850603104
Loss at step 50: 0.029143836349248886
Loss at step 100: 0.05419186130166054
Loss at step 150: 0.04173846170306206
Loss at step 200: 0.037460848689079285
Loss at step 250: 0.03811207413673401
Loss at step 300: 0.07409770786762238
Loss at step 350: 0.05379877984523773
Loss at step 400: 0.0445668064057827
Loss at step 450: 0.04734111204743385
Loss at step 500: 0.03445335105061531
Loss at step 550: 0.07331730425357819
Loss at step 600: 0.03149990737438202
Loss at step 650: 0.04199940711259842
Loss at step 700: 0.05004751309752464
Loss at step 750: 0.04749492183327675
Loss at step 800: 0.03820522874593735
Loss at step 850: 0.03572584688663483
Loss at step 900: 0.04117851331830025
Mean training loss after epoch 189: 0.041296352620429196
EPOCH: 190
Loss at step 0: 0.03617745265364647
Loss at step 50: 0.03318284451961517
Loss at step 100: 0.03383420780301094
Loss at step 150: 0.048314373940229416
Loss at step 200: 0.03600124642252922
Loss at step 250: 0.07072673738002777
Loss at step 300: 0.04310063272714615
Loss at step 350: 0.03930129110813141
Loss at step 400: 0.042464740574359894
Loss at step 450: 0.03321487456560135
Loss at step 500: 0.0326659195125103
Loss at step 550: 0.03674949333071709
Loss at step 600: 0.04988384619355202
Loss at step 650: 0.03373335674405098
Loss at step 700: 0.04392938315868378
Loss at step 750: 0.0330236591398716
Loss at step 800: 0.041451502591371536
Loss at step 850: 0.03775708004832268
Loss at step 900: 0.027917273342609406
Mean training loss after epoch 190: 0.04126379593039182
EPOCH: 191
Loss at step 0: 0.03522908687591553
Loss at step 50: 0.056289736181497574
Loss at step 100: 0.04285778850317001
Loss at step 150: 0.036079779267311096
Loss at step 200: 0.030695710331201553
Loss at step 250: 0.03508748486638069
Loss at step 300: 0.06639096885919571
Loss at step 350: 0.050350844860076904
Loss at step 400: 0.035781338810920715
Loss at step 450: 0.045949894934892654
Loss at step 500: 0.02796305902302265
Loss at step 550: 0.03321106359362602
Loss at step 600: 0.0372857004404068
Loss at step 650: 0.06294380873441696
Loss at step 700: 0.05089360848069191
Loss at step 750: 0.038762085139751434
Loss at step 800: 0.039276279509067535
Loss at step 850: 0.060114022344350815
Loss at step 900: 0.04756484180688858
Mean training loss after epoch 191: 0.041512265670369426
EPOCH: 192
Loss at step 0: 0.034350521862506866
Loss at step 50: 0.04084480553865433
Loss at step 100: 0.0338444747030735
Loss at step 150: 0.054738275706768036
Loss at step 200: 0.03252284228801727
Loss at step 250: 0.05226755887269974
Loss at step 300: 0.03735841438174248
Loss at step 350: 0.03251798450946808
Loss at step 400: 0.051244113594293594
Loss at step 450: 0.03884818032383919
Loss at step 500: 0.04193374887108803
Loss at step 550: 0.033448901027441025
Loss at step 600: 0.033464811742305756
Loss at step 650: 0.039219241589307785
Loss at step 700: 0.03877369686961174
Loss at step 750: 0.031385522335767746
Loss at step 800: 0.03320163115859032
Loss at step 850: 0.05189124494791031
Loss at step 900: 0.04130826145410538
Mean training loss after epoch 192: 0.041748421510923776
EPOCH: 193
Loss at step 0: 0.04113989695906639
Loss at step 50: 0.03168807551264763
Loss at step 100: 0.035098232328891754
Loss at step 150: 0.034963954240083694
Loss at step 200: 0.05798143520951271
Loss at step 250: 0.03390754386782646
Loss at step 300: 0.04076037555932999
Loss at step 350: 0.0436348132789135
Loss at step 400: 0.03736273944377899
Loss at step 450: 0.038291774690151215
Loss at step 500: 0.040943972766399384
Loss at step 550: 0.04079652577638626
Loss at step 600: 0.04327332228422165
Loss at step 650: 0.03669068217277527
Loss at step 700: 0.037559207528829575
Loss at step 750: 0.032399732619524
Loss at step 800: 0.034300658851861954
Loss at step 850: 0.03065013885498047
Loss at step 900: 0.040528304874897
Mean training loss after epoch 193: 0.04117203234776314
EPOCH: 194
Loss at step 0: 0.06797874718904495
Loss at step 50: 0.052924975752830505
Loss at step 100: 0.05212656781077385
Loss at step 150: 0.03927493467926979
Loss at step 200: 0.043678004294633865
Loss at step 250: 0.036535218358039856
Loss at step 300: 0.03414019197225571
Loss at step 350: 0.03480542451143265
Loss at step 400: 0.04807673394680023
Loss at step 450: 0.03477676957845688
Loss at step 500: 0.05026653781533241
Loss at step 550: 0.049459606409072876
Loss at step 600: 0.05336325243115425
Loss at step 650: 0.05143073946237564
Loss at step 700: 0.029971925541758537
Loss at step 750: 0.040802404284477234
Loss at step 800: 0.037547655403614044
Loss at step 850: 0.03940505161881447
Loss at step 900: 0.029753219336271286
Mean training loss after epoch 194: 0.04150237108090285
EPOCH: 195
Loss at step 0: 0.03520885854959488
Loss at step 50: 0.03639410436153412
Loss at step 100: 0.034854043275117874
Loss at step 150: 0.03551192209124565
Loss at step 200: 0.025638144463300705
Loss at step 250: 0.07378534972667694
Loss at step 300: 0.058501437306404114
Loss at step 350: 0.06586998701095581
Loss at step 400: 0.0365779846906662
Loss at step 450: 0.036176346242427826
Loss at step 500: 0.04393264651298523
Loss at step 550: 0.03411954268813133
Loss at step 600: 0.03807177394628525
Loss at step 650: 0.04445924982428551
Loss at step 700: 0.06824284046888351
Loss at step 750: 0.039268508553504944
Loss at step 800: 0.02649448812007904
Loss at step 850: 0.05371391400694847
Loss at step 900: 0.06416977196931839
Mean training loss after epoch 195: 0.04103545524449999
EPOCH: 196
Loss at step 0: 0.038097675889730453
Loss at step 50: 0.044847916811704636
Loss at step 100: 0.03804577887058258
Loss at step 150: 0.03618156909942627
Loss at step 200: 0.04062632471323013
Loss at step 250: 0.04594080522656441
Loss at step 300: 0.03366316854953766
Loss at step 350: 0.032435376197099686
Loss at step 400: 0.0293735284358263
Loss at step 450: 0.034268464893102646
Loss at step 500: 0.03657921031117439
Loss at step 550: 0.040299192070961
Loss at step 600: 0.03601702302694321
Loss at step 650: 0.04306451603770256
Loss at step 700: 0.03470020368695259
Loss at step 750: 0.0481594018638134
Loss at step 800: 0.04332111403346062
Loss at step 850: 0.048539843410253525
Loss at step 900: 0.040765244513750076
Mean training loss after epoch 196: 0.04173837025671689
EPOCH: 197
Loss at step 0: 0.05736205354332924
Loss at step 50: 0.027131736278533936
Loss at step 100: 0.03562883287668228
Loss at step 150: 0.03762797266244888
Loss at step 200: 0.07137754559516907
Loss at step 250: 0.06394952535629272
Loss at step 300: 0.04078604280948639
Loss at step 350: 0.04103813320398331
Loss at step 400: 0.040748171508312225
Loss at step 450: 0.049523528665304184
Loss at step 500: 0.04065272584557533
Loss at step 550: 0.05001842975616455
Loss at step 600: 0.05851582810282707
Loss at step 650: 0.0581602081656456
Loss at step 700: 0.02705790475010872
Loss at step 750: 0.040607865899801254
Loss at step 800: 0.03715815767645836
Loss at step 850: 0.03969978168606758
Loss at step 900: 0.043329522013664246
Mean training loss after epoch 197: 0.04137408531614458
EPOCH: 198
Loss at step 0: 0.03971967101097107
Loss at step 50: 0.03631794452667236
Loss at step 100: 0.0338369719684124
Loss at step 150: 0.04944315552711487
Loss at step 200: 0.04070665314793587
Loss at step 250: 0.051509302109479904
Loss at step 300: 0.05552169308066368
Loss at step 350: 0.03374495357275009
Loss at step 400: 0.03621584549546242
Loss at step 450: 0.03770939260721207
Loss at step 500: 0.03463056683540344
Loss at step 550: 0.02788911759853363
Loss at step 600: 0.04447398707270622
Loss at step 650: 0.05413061007857323
Loss at step 700: 0.04044802114367485
Loss at step 750: 0.07355301827192307
Loss at step 800: 0.03673677518963814
Loss at step 850: 0.037305835634469986
Loss at step 900: 0.0418182909488678
Mean training loss after epoch 198: 0.041858961478050453
EPOCH: 199
Loss at step 0: 0.04305747523903847
Loss at step 50: 0.03307000920176506
Loss at step 100: 0.033796682953834534
Loss at step 150: 0.0348166823387146
Loss at step 200: 0.045635029673576355
Loss at step 250: 0.03464725613594055
Loss at step 300: 0.07720781862735748
Loss at step 350: 0.024645846337080002
Loss at step 400: 0.03345625847578049
Loss at step 450: 0.03154450282454491
Loss at step 500: 0.045259904116392136
Loss at step 550: 0.03695496916770935
Loss at step 600: 0.0499909333884716
Loss at step 650: 0.04106828570365906
Loss at step 700: 0.032556742429733276
Loss at step 750: 0.03082137182354927
Loss at step 800: 0.03785950317978859
Loss at step 850: 0.05331188812851906
Loss at step 900: 0.027672991156578064
Mean training loss after epoch 199: 0.041491717618427425
EPOCH: 200
Loss at step 0: 0.035790350288152695
Loss at step 50: 0.041217826306819916
Loss at step 100: 0.034450091421604156
Loss at step 150: 0.03561009466648102
Loss at step 200: 0.03485352545976639
Loss at step 250: 0.029213659465312958
Loss at step 300: 0.04888638108968735
Loss at step 350: 0.049630194902420044
Loss at step 400: 0.040246348828077316
Loss at step 450: 0.051989808678627014
Loss at step 500: 0.03532884642481804
Loss at step 550: 0.05131925269961357
Loss at step 600: 0.07050848752260208
Loss at step 650: 0.051529932767152786
Loss at step 700: 0.031611260026693344
Loss at step 750: 0.05834163725376129
Loss at step 800: 0.03199863061308861
Loss at step 850: 0.050334375351667404
Loss at step 900: 0.048048146069049835
Mean training loss after epoch 200: 0.04144771575633842
EPOCH: 201
Loss at step 0: 0.049343012273311615
Loss at step 50: 0.03183600679039955
Loss at step 100: 0.048347923904657364
Loss at step 150: 0.05308234319090843
Loss at step 200: 0.05167638882994652
Loss at step 250: 0.04280967637896538
Loss at step 300: 0.0367121621966362
Loss at step 350: 0.05468958988785744
Loss at step 400: 0.03841932862997055
Loss at step 450: 0.034620095044374466
Loss at step 500: 0.057135775685310364
Loss at step 550: 0.041183676570653915
Loss at step 600: 0.040930718183517456
Loss at step 650: 0.03261233866214752
Loss at step 700: 0.03489133343100548
Loss at step 750: 0.02763674221932888
Loss at step 800: 0.03225076198577881
Loss at step 850: 0.05398840829730034
Loss at step 900: 0.049016717821359634
Mean training loss after epoch 201: 0.041507416116848175
EPOCH: 202
Loss at step 0: 0.03353920951485634
Loss at step 50: 0.03718918189406395
Loss at step 100: 0.03364242613315582
Loss at step 150: 0.03372986242175102
Loss at step 200: 0.056242551654577255
Loss at step 250: 0.04020671918988228
Loss at step 300: 0.03130819648504257
Loss at step 350: 0.03483444079756737
Loss at step 400: 0.0482604056596756
Loss at step 450: 0.03938969224691391
Loss at step 500: 0.05265633016824722
Loss at step 550: 0.03554932773113251
Loss at step 600: 0.03541647270321846
Loss at step 650: 0.029953882098197937
Loss at step 700: 0.03256510570645332
Loss at step 750: 0.04815101623535156
Loss at step 800: 0.038196250796318054
Loss at step 850: 0.03245275840163231
Loss at step 900: 0.03632979467511177
Mean training loss after epoch 202: 0.0410298561172953
EPOCH: 203
Loss at step 0: 0.030459241941571236
Loss at step 50: 0.043799057602882385
Loss at step 100: 0.047694768756628036
Loss at step 150: 0.03889485448598862
Loss at step 200: 0.06752520054578781
Loss at step 250: 0.02685130387544632
Loss at step 300: 0.036014627665281296
Loss at step 350: 0.037845950573682785
Loss at step 400: 0.033889371901750565
Loss at step 450: 0.031847402453422546
Loss at step 500: 0.038399528712034225
Loss at step 550: 0.03463640436530113
Loss at step 600: 0.039087750017642975
Loss at step 650: 0.03970421105623245
Loss at step 700: 0.03944718837738037
Loss at step 750: 0.03693550452589989
Loss at step 800: 0.037040844559669495
Loss at step 850: 0.0331515371799469
Loss at step 900: 0.04205691069364548
Mean training loss after epoch 203: 0.0412048772275289
EPOCH: 204
Loss at step 0: 0.056759487837553024
Loss at step 50: 0.0413278266787529
Loss at step 100: 0.040270932018756866
Loss at step 150: 0.0361940898001194
Loss at step 200: 0.07127005606889725
Loss at step 250: 0.05027209222316742
Loss at step 300: 0.04283789172768593
Loss at step 350: 0.05113120749592781
Loss at step 400: 0.04532406106591225
Loss at step 450: 0.04405355453491211
Loss at step 500: 0.057780180126428604
Loss at step 550: 0.037918925285339355
Loss at step 600: 0.03008476458489895
Loss at step 650: 0.04085720703005791
Loss at step 700: 0.04501824453473091
Loss at step 750: 0.03670662268996239
Loss at step 800: 0.034831445664167404
Loss at step 850: 0.035773713141679764
Loss at step 900: 0.03600168228149414
Mean training loss after epoch 204: 0.04139673354616488
EPOCH: 205
Loss at step 0: 0.040792644023895264
Loss at step 50: 0.039271481335163116
Loss at step 100: 0.04059242457151413
Loss at step 150: 0.03818690404295921
Loss at step 200: 0.04294337332248688
Loss at step 250: 0.057097338140010834
Loss at step 300: 0.03699008747935295
Loss at step 350: 0.05602728947997093
Loss at step 400: 0.036347582936286926
Loss at step 450: 0.03644917160272598
Loss at step 500: 0.03737808018922806
Loss at step 550: 0.03688392788171768
Loss at step 600: 0.03842123597860336
Loss at step 650: 0.031956832855939865
Loss at step 700: 0.0357840359210968
Loss at step 750: 0.037642884999513626
Loss at step 800: 0.031830888241529465
Loss at step 850: 0.03746681660413742
Loss at step 900: 0.04127761349081993
Mean training loss after epoch 205: 0.04121694114329273
EPOCH: 206
Loss at step 0: 0.03936978057026863
Loss at step 50: 0.03137960657477379
Loss at step 100: 0.028490858152508736
Loss at step 150: 0.038255635648965836
Loss at step 200: 0.035686857998371124
Loss at step 250: 0.04872118681669235
Loss at step 300: 0.05316340923309326
Loss at step 350: 0.037066906690597534
Loss at step 400: 0.034964218735694885
Loss at step 450: 0.034100595861673355
Loss at step 500: 0.030275076627731323
Loss at step 550: 0.05340017378330231
Loss at step 600: 0.03532257676124573
Loss at step 650: 0.039942771196365356
Loss at step 700: 0.04474440962076187
Loss at step 750: 0.0567655973136425
Loss at step 800: 0.038071874529123306
Loss at step 850: 0.03440165892243385
Loss at step 900: 0.03704487904906273
Mean training loss after epoch 206: 0.04125204876557723
EPOCH: 207
Loss at step 0: 0.038039278239011765
Loss at step 50: 0.04785553738474846
Loss at step 100: 0.042042892426252365
Loss at step 150: 0.038614820688962936
Loss at step 200: 0.030518142506480217
Loss at step 250: 0.03909027948975563
Loss at step 300: 0.03875470906496048
Loss at step 350: 0.03967742994427681
Loss at step 400: 0.03191816061735153
Loss at step 450: 0.046760812401771545
Loss at step 500: 0.048585861921310425
Loss at step 550: 0.0373445563018322
Loss at step 600: 0.04253246262669563
Loss at step 650: 0.043145034462213516
Loss at step 700: 0.03523382544517517
Loss at step 750: 0.05494747310876846
Loss at step 800: 0.030589915812015533
Loss at step 850: 0.03915799781680107
Loss at step 900: 0.04673401266336441
Mean training loss after epoch 207: 0.04131905022841781
EPOCH: 208
Loss at step 0: 0.04305552691221237
Loss at step 50: 0.034908805042505264
Loss at step 100: 0.03438510000705719
Loss at step 150: 0.05595008656382561
Loss at step 200: 0.036673370748758316
Loss at step 250: 0.061115823686122894
Loss at step 300: 0.03532424941658974
Loss at step 350: 0.03127359226346016
Loss at step 400: 0.036795247346162796
Loss at step 450: 0.03030930832028389
Loss at step 500: 0.047763705253601074
Loss at step 550: 0.03661363571882248
Loss at step 600: 0.05410737171769142
Loss at step 650: 0.06448393315076828
Loss at step 700: 0.04093127325177193
Loss at step 750: 0.05752434581518173
Loss at step 800: 0.03656463697552681
Loss at step 850: 0.03313460201025009
Loss at step 900: 0.038126688450574875
Mean training loss after epoch 208: 0.04117663974351466
EPOCH: 209
Loss at step 0: 0.03493553772568703
Loss at step 50: 0.03722948580980301
Loss at step 100: 0.05398183688521385
Loss at step 150: 0.0524870790541172
Loss at step 200: 0.037257131189107895
Loss at step 250: 0.04640546068549156
Loss at step 300: 0.056868430227041245
Loss at step 350: 0.04795893654227257
Loss at step 400: 0.03179486468434334
Loss at step 450: 0.03765290603041649
Loss at step 500: 0.03589199110865593
Loss at step 550: 0.0380818247795105
Loss at step 600: 0.04972294718027115
Loss at step 650: 0.029078220948576927
Loss at step 700: 0.029039815068244934
Loss at step 750: 0.03712788224220276
Loss at step 800: 0.059975765645504
Loss at step 850: 0.046831708401441574
Loss at step 900: 0.062213387340307236
Mean training loss after epoch 209: 0.04130176584254196
EPOCH: 210
Loss at step 0: 0.03962980955839157
Loss at step 50: 0.03791427984833717
Loss at step 100: 0.03497166559100151
Loss at step 150: 0.031263984739780426
Loss at step 200: 0.051545023918151855
Loss at step 250: 0.03977767378091812
Loss at step 300: 0.04709793999791145
Loss at step 350: 0.05070801451802254
Loss at step 400: 0.05128360167145729
Loss at step 450: 0.02927989326417446
Loss at step 500: 0.04444617033004761
Loss at step 550: 0.03739051893353462
Loss at step 600: 0.05879747122526169
Loss at step 650: 0.053738340735435486
Loss at step 700: 0.033911995589733124
Loss at step 750: 0.04100339114665985
Loss at step 800: 0.04177099093794823
Loss at step 850: 0.05663182586431503
Loss at step 900: 0.06499599665403366
Mean training loss after epoch 210: 0.04155924991527791
EPOCH: 211
Loss at step 0: 0.030548926442861557
Loss at step 50: 0.03719782829284668
Loss at step 100: 0.02642173133790493
Loss at step 150: 0.035055577754974365
Loss at step 200: 0.047963909804821014
Loss at step 250: 0.030586957931518555
Loss at step 300: 0.03406506031751633
Loss at step 350: 0.034562744200229645
Loss at step 400: 0.05038195103406906
Loss at step 450: 0.03493376076221466
Loss at step 500: 0.040189724415540695
Loss at step 550: 0.03963726758956909
Loss at step 600: 0.05514547601342201
Loss at step 650: 0.03667889162898064
Loss at step 700: 0.03222879767417908
Loss at step 750: 0.035416096448898315
Loss at step 800: 0.041518017649650574
Loss at step 850: 0.038832809776067734
Loss at step 900: 0.04651348665356636
Mean training loss after epoch 211: 0.040945966876963814
EPOCH: 212
Loss at step 0: 0.03327246010303497
Loss at step 50: 0.05043778568506241
Loss at step 100: 0.04172353446483612
Loss at step 150: 0.03507307171821594
Loss at step 200: 0.029459193348884583
Loss at step 250: 0.04598444327712059
Loss at step 300: 0.03865790367126465
Loss at step 350: 0.033216699957847595
Loss at step 400: 0.03927202522754669
Loss at step 450: 0.053678590804338455
Loss at step 500: 0.0357988066971302
Loss at step 550: 0.03844377398490906
Loss at step 600: 0.040219828486442566
Loss at step 650: 0.053833648562431335
Loss at step 700: 0.05301403999328613
Loss at step 750: 0.03721046820282936
Loss at step 800: 0.030222177505493164
Loss at step 850: 0.03824635595083237
Loss at step 900: 0.03393498808145523
Mean training loss after epoch 212: 0.04123707259816529
EPOCH: 213
Loss at step 0: 0.05674567446112633
Loss at step 50: 0.03383493423461914
Loss at step 100: 0.034547653049230576
Loss at step 150: 0.042903732508420944
Loss at step 200: 0.03742729872465134
Loss at step 250: 0.04211706295609474
Loss at step 300: 0.03560095280408859
Loss at step 350: 0.03154923394322395
Loss at step 400: 0.03430347889661789
Loss at step 450: 0.035916153341531754
Loss at step 500: 0.04978935420513153
Loss at step 550: 0.02993885800242424
Loss at step 600: 0.0533515140414238
Loss at step 650: 0.04010763391852379
Loss at step 700: 0.03576485812664032
Loss at step 750: 0.03653561323881149
Loss at step 800: 0.034997548907995224
Loss at step 850: 0.056449100375175476
Loss at step 900: 0.0658298134803772
Mean training loss after epoch 213: 0.0411296719149041
EPOCH: 214
Loss at step 0: 0.03312665596604347
Loss at step 50: 0.03246608376502991
Loss at step 100: 0.051583290100097656
Loss at step 150: 0.04974968358874321
Loss at step 200: 0.05574967712163925
Loss at step 250: 0.04247640445828438
Loss at step 300: 0.0370447002351284
Loss at step 350: 0.03886210918426514
Loss at step 400: 0.05328872799873352
Loss at step 450: 0.03179633617401123
Loss at step 500: 0.05542878434062004
Loss at step 550: 0.044399116188287735
Loss at step 600: 0.034311443567276
Loss at step 650: 0.03347017616033554
Loss at step 700: 0.033897753804922104
Loss at step 750: 0.04861360043287277
Loss at step 800: 0.03839123249053955
Loss at step 850: 0.06305564939975739
Loss at step 900: 0.03083120472729206
Mean training loss after epoch 214: 0.041287495432965664
EPOCH: 215
Loss at step 0: 0.03692357987165451
Loss at step 50: 0.03876027837395668
Loss at step 100: 0.038466617465019226
Loss at step 150: 0.03568306565284729
Loss at step 200: 0.032258838415145874
Loss at step 250: 0.033888742327690125
Loss at step 300: 0.031359050422906876
Loss at step 350: 0.0355716198682785
Loss at step 400: 0.036228060722351074
Loss at step 450: 0.034554749727249146
Loss at step 500: 0.052061960101127625
Loss at step 550: 0.03502297401428223
Loss at step 600: 0.03877376392483711
Loss at step 650: 0.07062792778015137
Loss at step 700: 0.03748231753706932
Loss at step 750: 0.048514317721128464
Loss at step 800: 0.03904194012284279
Loss at step 850: 0.042537979781627655
Loss at step 900: 0.03580579534173012
Mean training loss after epoch 215: 0.04120782274665482
EPOCH: 216
Loss at step 0: 0.03698866814374924
Loss at step 50: 0.04032306373119354
Loss at step 100: 0.04292820021510124
Loss at step 150: 0.043672092258930206
Loss at step 200: 0.04129884019494057
Loss at step 250: 0.06413669139146805
Loss at step 300: 0.03333058953285217
Loss at step 350: 0.036104969680309296
Loss at step 400: 0.04874488711357117
Loss at step 450: 0.039387352764606476
Loss at step 500: 0.043956201523542404
Loss at step 550: 0.02769467793405056
Loss at step 600: 0.04720773547887802
Loss at step 650: 0.05882936343550682
Loss at step 700: 0.04028856381773949
Loss at step 750: 0.06633619964122772
Loss at step 800: 0.04738272726535797
Loss at step 850: 0.03966781124472618
Loss at step 900: 0.05900513380765915
Mean training loss after epoch 216: 0.04053880623988569
EPOCH: 217
Loss at step 0: 0.03476103022694588
Loss at step 50: 0.04007076099514961
Loss at step 100: 0.03548238426446915
Loss at step 150: 0.0420597568154335
Loss at step 200: 0.05475492775440216
Loss at step 250: 0.03957177326083183
Loss at step 300: 0.046720318496227264
Loss at step 350: 0.037851348519325256
Loss at step 400: 0.035817135125398636
Loss at step 450: 0.03365528956055641
Loss at step 500: 0.041856925934553146
Loss at step 550: 0.03881705924868584
Loss at step 600: 0.04341663047671318
Loss at step 650: 0.0302233025431633
Loss at step 700: 0.04495518282055855
Loss at step 750: 0.07632829248905182
Loss at step 800: 0.033838625997304916
Loss at step 850: 0.058509018272161484
Loss at step 900: 0.0263651292771101
Mean training loss after epoch 217: 0.04165038813961976
EPOCH: 218
Loss at step 0: 0.03161349147558212
Loss at step 50: 0.03128485754132271
Loss at step 100: 0.04100488871335983
Loss at step 150: 0.029969152063131332
Loss at step 200: 0.04331596568226814
Loss at step 250: 0.03943278640508652
Loss at step 300: 0.033166710287332535
Loss at step 350: 0.029378816485404968
Loss at step 400: 0.0342346727848053
Loss at step 450: 0.036455534398555756
Loss at step 500: 0.05356838181614876
Loss at step 550: 0.03679025173187256
Loss at step 600: 0.03293558210134506
Loss at step 650: 0.03905247524380684
Loss at step 700: 0.049808014184236526
Loss at step 750: 0.05253978446125984
Loss at step 800: 0.04768317937850952
Loss at step 850: 0.03520479053258896
Loss at step 900: 0.05699870362877846
Mean training loss after epoch 218: 0.04153027399770741
EPOCH: 219
Loss at step 0: 0.033535972237586975
Loss at step 50: 0.05234470218420029
Loss at step 100: 0.02671230025589466
Loss at step 150: 0.03986935690045357
Loss at step 200: 0.04880509898066521
Loss at step 250: 0.04045942798256874
Loss at step 300: 0.029259618371725082
Loss at step 350: 0.04278722032904625
Loss at step 400: 0.04514151066541672
Loss at step 450: 0.04280414804816246
Loss at step 500: 0.03450876101851463
Loss at step 550: 0.04778549447655678
Loss at step 600: 0.051221564412117004
Loss at step 650: 0.03971226140856743
Loss at step 700: 0.02804519422352314
Loss at step 750: 0.03532065451145172
Loss at step 800: 0.05204001069068909
Loss at step 850: 0.0395912267267704
Loss at step 900: 0.033726051449775696
Mean training loss after epoch 219: 0.041007235062433714
EPOCH: 220
Loss at step 0: 0.0667114332318306
Loss at step 50: 0.047925014048814774
Loss at step 100: 0.03692636638879776
Loss at step 150: 0.030362550169229507
Loss at step 200: 0.035843972116708755
Loss at step 250: 0.0372978039085865
Loss at step 300: 0.03436199575662613
Loss at step 350: 0.04721400514245033
Loss at step 400: 0.034291476011276245
Loss at step 450: 0.03911425918340683
Loss at step 500: 0.03957908973097801
Loss at step 550: 0.03273768350481987
Loss at step 600: 0.033866237848997116
Loss at step 650: 0.035394810140132904
Loss at step 700: 0.05452428758144379
Loss at step 750: 0.04825645312666893
Loss at step 800: 0.04440128430724144
Loss at step 850: 0.04680107533931732
Loss at step 900: 0.03324844688177109
Mean training loss after epoch 220: 0.04129367575311521
EPOCH: 221
Loss at step 0: 0.032262157648801804
Loss at step 50: 0.038194503635168076
Loss at step 100: 0.07236742973327637
Loss at step 150: 0.06887383759021759
Loss at step 200: 0.04868302121758461
Loss at step 250: 0.03233625367283821
Loss at step 300: 0.04105150327086449
Loss at step 350: 0.029542885720729828
Loss at step 400: 0.03776249662041664
Loss at step 450: 0.03501143306493759
Loss at step 500: 0.03716370835900307
Loss at step 550: 0.03228867053985596
Loss at step 600: 0.034387730062007904
Loss at step 650: 0.05523118004202843
Loss at step 700: 0.034401074051856995
Loss at step 750: 0.04115273803472519
Loss at step 800: 0.03586626797914505
Loss at step 850: 0.04088988155126572
Loss at step 900: 0.03924999013543129
Mean training loss after epoch 221: 0.04116944233174009
EPOCH: 222
Loss at step 0: 0.04622489586472511
Loss at step 50: 0.034178245812654495
Loss at step 100: 0.03427834063768387
Loss at step 150: 0.038221172988414764
Loss at step 200: 0.04681423306465149
Loss at step 250: 0.038031622767448425
Loss at step 300: 0.039281293749809265
Loss at step 350: 0.034416165202856064
Loss at step 400: 0.07188202440738678
Loss at step 450: 0.038608696311712265
Loss at step 500: 0.03562668710947037
Loss at step 550: 0.04744146391749382
Loss at step 600: 0.03219371289014816
Loss at step 650: 0.029797524213790894
Loss at step 700: 0.036422885954380035
Loss at step 750: 0.038234759122133255
Loss at step 800: 0.04142580181360245
Loss at step 850: 0.03657116740942001
Loss at step 900: 0.02977190725505352
Mean training loss after epoch 222: 0.041052178505545996
EPOCH: 223
Loss at step 0: 0.034760016947984695
Loss at step 50: 0.03509802743792534
Loss at step 100: 0.04074249789118767
Loss at step 150: 0.03870717063546181
Loss at step 200: 0.04823608696460724
Loss at step 250: 0.02629156783223152
Loss at step 300: 0.034971971064805984
Loss at step 350: 0.03723495826125145
Loss at step 400: 0.03717128559947014
Loss at step 450: 0.039042744785547256
Loss at step 500: 0.04182994365692139
Loss at step 550: 0.03563202545046806
Loss at step 600: 0.0682126060128212
Loss at step 650: 0.04124102368950844
Loss at step 700: 0.033520348370075226
Loss at step 750: 0.0550982840359211
Loss at step 800: 0.05311400443315506
Loss at step 850: 0.041202362626791
Loss at step 900: 0.07201080769300461
Mean training loss after epoch 223: 0.04102069054291383
EPOCH: 224
Loss at step 0: 0.037563711404800415
Loss at step 50: 0.05556620657444
Loss at step 100: 0.04725373536348343
Loss at step 150: 0.029180817306041718
Loss at step 200: 0.06255805492401123
Loss at step 250: 0.05550690367817879
Loss at step 300: 0.032185718417167664
Loss at step 350: 0.03732905164361
Loss at step 400: 0.03720862418413162
Loss at step 450: 0.05634133890271187
Loss at step 500: 0.037644654512405396
Loss at step 550: 0.041919078677892685
Loss at step 600: 0.04104840010404587
Loss at step 650: 0.04214031249284744
Loss at step 700: 0.03333045169711113
Loss at step 750: 0.035872429609298706
Loss at step 800: 0.055177945643663406
Loss at step 850: 0.06531449407339096
Loss at step 900: 0.04556897282600403
Mean training loss after epoch 224: 0.041014108520898734
EPOCH: 225
Loss at step 0: 0.03936973586678505
Loss at step 50: 0.03350040689110756
Loss at step 100: 0.037846505641937256
Loss at step 150: 0.03458798676729202
Loss at step 200: 0.0477675199508667
Loss at step 250: 0.03837289661169052
Loss at step 300: 0.03477032855153084
Loss at step 350: 0.03714022412896156
Loss at step 400: 0.03415047377347946
Loss at step 450: 0.03395478427410126
Loss at step 500: 0.0354052372276783
Loss at step 550: 0.030104510486125946
Loss at step 600: 0.043730005621910095
Loss at step 650: 0.05497589707374573
Loss at step 700: 0.05407802760601044
Loss at step 750: 0.03253479674458504
Loss at step 800: 0.03486725315451622
Loss at step 850: 0.03352895379066467
Loss at step 900: 0.03306787833571434
Mean training loss after epoch 225: 0.0410375071050071
EPOCH: 226
Loss at step 0: 0.03648871183395386
Loss at step 50: 0.0366818942129612
Loss at step 100: 0.037782009690999985
Loss at step 150: 0.047991879284381866
Loss at step 200: 0.04437065124511719
Loss at step 250: 0.05026853084564209
Loss at step 300: 0.048000041395425797
Loss at step 350: 0.031056242063641548
Loss at step 400: 0.035417258739471436
Loss at step 450: 0.04013531655073166
Loss at step 500: 0.05145081505179405
Loss at step 550: 0.04228727146983147
Loss at step 600: 0.05478085204958916
Loss at step 650: 0.04012715443968773
Loss at step 700: 0.05043533816933632
Loss at step 750: 0.038202911615371704
Loss at step 800: 0.036201126873493195
Loss at step 850: 0.03652939572930336
Loss at step 900: 0.026116954162716866
Mean training loss after epoch 226: 0.04136124605547263
EPOCH: 227
Loss at step 0: 0.04205761104822159
Loss at step 50: 0.04090822488069534
Loss at step 100: 0.043332166969776154
Loss at step 150: 0.05705716460943222
Loss at step 200: 0.03839666768908501
Loss at step 250: 0.029485201463103294
Loss at step 300: 0.056229788810014725
Loss at step 350: 0.03689929470419884
Loss at step 400: 0.030563808977603912
Loss at step 450: 0.04460404813289642
Loss at step 500: 0.03188980743288994
Loss at step 550: 0.029740963131189346
Loss at step 600: 0.035941313952207565
Loss at step 650: 0.05650283768773079
Loss at step 700: 0.04136836528778076
Loss at step 750: 0.05103546008467674
Loss at step 800: 0.049101848155260086
Loss at step 850: 0.04119361191987991
Loss at step 900: 0.034654609858989716
Mean training loss after epoch 227: 0.04193418883263811
EPOCH: 228
Loss at step 0: 0.038860369473695755
Loss at step 50: 0.03692442551255226
Loss at step 100: 0.030994482338428497
Loss at step 150: 0.037316933274269104
Loss at step 200: 0.03066915273666382
Loss at step 250: 0.03554951027035713
Loss at step 300: 0.05594007298350334
Loss at step 350: 0.05369291454553604
Loss at step 400: 0.05064915493130684
Loss at step 450: 0.04152677580714226
Loss at step 500: 0.056227780878543854
Loss at step 550: 0.04867443069815636
Loss at step 600: 0.03939269483089447
Loss at step 650: 0.0381941944360733
Loss at step 700: 0.046426430344581604
Loss at step 750: 0.03176429122686386
Loss at step 800: 0.041528403759002686
Loss at step 850: 0.039497584104537964
Loss at step 900: 0.037410248070955276
Mean training loss after epoch 228: 0.04146860102648293
EPOCH: 229
Loss at step 0: 0.03637472540140152
Loss at step 50: 0.034893423318862915
Loss at step 100: 0.04737719148397446
Loss at step 150: 0.033185504376888275
Loss at step 200: 0.0315871462225914
Loss at step 250: 0.03322930634021759
Loss at step 300: 0.03275240957736969
Loss at step 350: 0.034512899816036224
Loss at step 400: 0.041698455810546875
Loss at step 450: 0.03479326516389847
Loss at step 500: 0.04883550852537155
Loss at step 550: 0.0731632336974144
Loss at step 600: 0.06841642409563065
Loss at step 650: 0.03672165796160698
Loss at step 700: 0.042949128895998
Loss at step 750: 0.03123210184276104
Loss at step 800: 0.03435734286904335
Loss at step 850: 0.029435977339744568
Loss at step 900: 0.04522766172885895
Mean training loss after epoch 229: 0.04051139461063246
EPOCH: 230
Loss at step 0: 0.03913334757089615
Loss at step 50: 0.055999431759119034
Loss at step 100: 0.05603145435452461
Loss at step 150: 0.03465709835290909
Loss at step 200: 0.03714108467102051
Loss at step 250: 0.049302857369184494
Loss at step 300: 0.051496539264917374
Loss at step 350: 0.03816322982311249
Loss at step 400: 0.04686564579606056
Loss at step 450: 0.05239836499094963
Loss at step 500: 0.04643171653151512
Loss at step 550: 0.03184344619512558
Loss at step 600: 0.06275387108325958
Loss at step 650: 0.0721859559416771
Loss at step 700: 0.036288753151893616
Loss at step 750: 0.029158808290958405
Loss at step 800: 0.041492756456136703
Loss at step 850: 0.039844442158937454
Loss at step 900: 0.03264705464243889
Mean training loss after epoch 230: 0.04095347651413509
EPOCH: 231
Loss at step 0: 0.03575587272644043
Loss at step 50: 0.03215237334370613
Loss at step 100: 0.05879320204257965
Loss at step 150: 0.03561503812670708
Loss at step 200: 0.04932316765189171
Loss at step 250: 0.031335704028606415
Loss at step 300: 0.03901561349630356
Loss at step 350: 0.038512203842401505
Loss at step 400: 0.03318621963262558
Loss at step 450: 0.04640994220972061
Loss at step 500: 0.032737333327531815
Loss at step 550: 0.048904407769441605
Loss at step 600: 0.03512667119503021
Loss at step 650: 0.03650322183966637
Loss at step 700: 0.038705259561538696
Loss at step 750: 0.04180947691202164
Loss at step 800: 0.04233972355723381
Loss at step 850: 0.04108935967087746
Loss at step 900: 0.03689875453710556
Mean training loss after epoch 231: 0.04083216685388706
EPOCH: 232
Loss at step 0: 0.04886999726295471
Loss at step 50: 0.05551153048872948
Loss at step 100: 0.03532155975699425
Loss at step 150: 0.038711488246917725
Loss at step 200: 0.03422563523054123
Loss at step 250: 0.0415818989276886
Loss at step 300: 0.052041277289390564
Loss at step 350: 0.031493090093135834
Loss at step 400: 0.041716963052749634
Loss at step 450: 0.045627281069755554
Loss at step 500: 0.03781002387404442
Loss at step 550: 0.04126285761594772
Loss at step 600: 0.055617816746234894
Loss at step 650: 0.030421976000070572
Loss at step 700: 0.05503896623849869
Loss at step 750: 0.04677780345082283
Loss at step 800: 0.05122390016913414
Loss at step 850: 0.05126957967877388
Loss at step 900: 0.038854584097862244
Mean training loss after epoch 232: 0.04083698726237329
EPOCH: 233
Loss at step 0: 0.03805645555257797
Loss at step 50: 0.05763966590166092
Loss at step 100: 0.04127464070916176
Loss at step 150: 0.03656071051955223
Loss at step 200: 0.04233010485768318
Loss at step 250: 0.06715351343154907
Loss at step 300: 0.056858159601688385
Loss at step 350: 0.038870055228471756
Loss at step 400: 0.04549760743975639
Loss at step 450: 0.03370843455195427
Loss at step 500: 0.036580368876457214
Loss at step 550: 0.030094511806964874
Loss at step 600: 0.027368923649191856
Loss at step 650: 0.036164265125989914
Loss at step 700: 0.03536178916692734
Loss at step 750: 0.03123387135565281
Loss at step 800: 0.03417902812361717
Loss at step 850: 0.06495968252420425
Loss at step 900: 0.033987708389759064
Mean training loss after epoch 233: 0.04114598854343647
EPOCH: 234
Loss at step 0: 0.034131698310375214
Loss at step 50: 0.03304879739880562
Loss at step 100: 0.0375090166926384
Loss at step 150: 0.04361898824572563
Loss at step 200: 0.03519846498966217
Loss at step 250: 0.03362388163805008
Loss at step 300: 0.04630090668797493
Loss at step 350: 0.03330394998192787
Loss at step 400: 0.0410035215318203
Loss at step 450: 0.03329356387257576
Loss at step 500: 0.0334615595638752
Loss at step 550: 0.0367317870259285
Loss at step 600: 0.031206030398607254
Loss at step 650: 0.0366726778447628
Loss at step 700: 0.041628316044807434
Loss at step 750: 0.03506317362189293
Loss at step 800: 0.0323176383972168
Loss at step 850: 0.03223805129528046
Loss at step 900: 0.04771678149700165
Mean training loss after epoch 234: 0.0408894696604532
EPOCH: 235
Loss at step 0: 0.036682069301605225
Loss at step 50: 0.03544003888964653
Loss at step 100: 0.03550875186920166
Loss at step 150: 0.038712434470653534
Loss at step 200: 0.0328836590051651
Loss at step 250: 0.03970502316951752
Loss at step 300: 0.033910173922777176
Loss at step 350: 0.037062786519527435
Loss at step 400: 0.07173572480678558
Loss at step 450: 0.059028297662734985
Loss at step 500: 0.033802054822444916
Loss at step 550: 0.03551178053021431
Loss at step 600: 0.050311341881752014
Loss at step 650: 0.02960454858839512
Loss at step 700: 0.044369637966156006
Loss at step 750: 0.03446542099118233
Loss at step 800: 0.026896147057414055
Loss at step 850: 0.0351390466094017
Loss at step 900: 0.03585878759622574
Mean training loss after epoch 235: 0.0412483900519354
EPOCH: 236
Loss at step 0: 0.03225017711520195
Loss at step 50: 0.033339180052280426
Loss at step 100: 0.035093218088150024
Loss at step 150: 0.041725076735019684
Loss at step 200: 0.039611514657735825
Loss at step 250: 0.03811977803707123
Loss at step 300: 0.04116872325539589
Loss at step 350: 0.04522959887981415
Loss at step 400: 0.034504808485507965
Loss at step 450: 0.03280076012015343
Loss at step 500: 0.052272897213697433
Loss at step 550: 0.035520099103450775
Loss at step 600: 0.029213305562734604
Loss at step 650: 0.03851242735981941
Loss at step 700: 0.041083406656980515
Loss at step 750: 0.047109778970479965
Loss at step 800: 0.046369973570108414
Loss at step 850: 0.03564689680933952
Loss at step 900: 0.03542470932006836
Mean training loss after epoch 236: 0.04041676042970818
EPOCH: 237
Loss at step 0: 0.03667047992348671
Loss at step 50: 0.05486058071255684
Loss at step 100: 0.03827936202287674
Loss at step 150: 0.0426441989839077
Loss at step 200: 0.03697700798511505
Loss at step 250: 0.034738317131996155
Loss at step 300: 0.03901257738471031
Loss at step 350: 0.04829375445842743
Loss at step 400: 0.0641462430357933
Loss at step 450: 0.055503036826848984
Loss at step 500: 0.0356808640062809
Loss at step 550: 0.0351221039891243
Loss at step 600: 0.04857305809855461
Loss at step 650: 0.035922374576330185
Loss at step 700: 0.0438312292098999
Loss at step 750: 0.028933558613061905
Loss at step 800: 0.049483731389045715
Loss at step 850: 0.0362832210958004
Loss at step 900: 0.03917879983782768
Mean training loss after epoch 237: 0.04058157685580157
EPOCH: 238
Loss at step 0: 0.05662114545702934
Loss at step 50: 0.07133477181196213
Loss at step 100: 0.05366216599941254
Loss at step 150: 0.05503072589635849
Loss at step 200: 0.0364266000688076
Loss at step 250: 0.03105452097952366
Loss at step 300: 0.0429583378136158
Loss at step 350: 0.03313998878002167
Loss at step 400: 0.0366363525390625
Loss at step 450: 0.02768549509346485
Loss at step 500: 0.03483203426003456
Loss at step 550: 0.04857843741774559
Loss at step 600: 0.03954724594950676
Loss at step 650: 0.035047147423028946
Loss at step 700: 0.0341925173997879
Loss at step 750: 0.03897126764059067
Loss at step 800: 0.041156038641929626
Loss at step 850: 0.042894527316093445
Loss at step 900: 0.040707413107156754
Mean training loss after epoch 238: 0.04040791804808925
EPOCH: 239
Loss at step 0: 0.032015059143304825
Loss at step 50: 0.049798812717199326
Loss at step 100: 0.029131358489394188
Loss at step 150: 0.03209542855620384
Loss at step 200: 0.039810456335544586
Loss at step 250: 0.04038550332188606
Loss at step 300: 0.040614526718854904
Loss at step 350: 0.03396230190992355
Loss at step 400: 0.05594230443239212
Loss at step 450: 0.05109965801239014
Loss at step 500: 0.03540639579296112
Loss at step 550: 0.03735024482011795
Loss at step 600: 0.03928496688604355
Loss at step 650: 0.03327350690960884
Loss at step 700: 0.04737665131688118
Loss at step 750: 0.030339328572154045
Loss at step 800: 0.03868836164474487
Loss at step 850: 0.04888433218002319
Loss at step 900: 0.054284047335386276
Mean training loss after epoch 239: 0.040756947924889354
EPOCH: 240
Loss at step 0: 0.03621514514088631
Loss at step 50: 0.037960972636938095
Loss at step 100: 0.04036441445350647
Loss at step 150: 0.04665497690439224
Loss at step 200: 0.04599844664335251
Loss at step 250: 0.03350859135389328
Loss at step 300: 0.03978145867586136
Loss at step 350: 0.04724019765853882
Loss at step 400: 0.046361032873392105
Loss at step 450: 0.04888570308685303
Loss at step 500: 0.03416692838072777
Loss at step 550: 0.0357830747961998
Loss at step 600: 0.032979223877191544
Loss at step 650: 0.04058676213026047
Loss at step 700: 0.04173377901315689
Loss at step 750: 0.0481884628534317
Loss at step 800: 0.049226146191358566
Loss at step 850: 0.03608490899205208
Loss at step 900: 0.04782750457525253
Mean training loss after epoch 240: 0.04125069013274491
EPOCH: 241
Loss at step 0: 0.02865777723491192
Loss at step 50: 0.030689535662531853
Loss at step 100: 0.03503900393843651
Loss at step 150: 0.043451108038425446
Loss at step 200: 0.03308131545782089
Loss at step 250: 0.03468373790383339
Loss at step 300: 0.04644292965531349
Loss at step 350: 0.043308407068252563
Loss at step 400: 0.05058228597044945
Loss at step 450: 0.0519171804189682
Loss at step 500: 0.04207949712872505
Loss at step 550: 0.04765189439058304
Loss at step 600: 0.0473044328391552
Loss at step 650: 0.042676154524087906
Loss at step 700: 0.032991547137498856
Loss at step 750: 0.03649139404296875
Loss at step 800: 0.06195422634482384
Loss at step 850: 0.03607760742306709
Loss at step 900: 0.03868535906076431
Mean training loss after epoch 241: 0.04128943655345994
EPOCH: 242
Loss at step 0: 0.06794758886098862
Loss at step 50: 0.03913401439785957
Loss at step 100: 0.03793284669518471
Loss at step 150: 0.04773392528295517
Loss at step 200: 0.050653185695409775
Loss at step 250: 0.025860438123345375
Loss at step 300: 0.031997546553611755
Loss at step 350: 0.03752926364541054
Loss at step 400: 0.03538594767451286
Loss at step 450: 0.029974274337291718
Loss at step 500: 0.06059102341532707
Loss at step 550: 0.039873912930488586
Loss at step 600: 0.03409953415393829
Loss at step 650: 0.03079247660934925
Loss at step 700: 0.04495513439178467
Loss at step 750: 0.03563304618000984
Loss at step 800: 0.06306657195091248
Loss at step 850: 0.03369851037859917
Loss at step 900: 0.03801652416586876
Mean training loss after epoch 242: 0.04038986093652591
EPOCH: 243
Loss at step 0: 0.04689265042543411
Loss at step 50: 0.03451554477214813
Loss at step 100: 0.03268691524863243
Loss at step 150: 0.03130830079317093
Loss at step 200: 0.03496095910668373
Loss at step 250: 0.04636874422430992
Loss at step 300: 0.037325453013181686
Loss at step 350: 0.03737280145287514
Loss at step 400: 0.0473397895693779
Loss at step 450: 0.05233558267354965
Loss at step 500: 0.03210145980119705
Loss at step 550: 0.03945494815707207
Loss at step 600: 0.03543688729405403
Loss at step 650: 0.03888096660375595
Loss at step 700: 0.04111270606517792
Loss at step 750: 0.029099611565470695
Loss at step 800: 0.03430042788386345
Loss at step 850: 0.03829982504248619
Loss at step 900: 0.039772406220436096
Mean training loss after epoch 243: 0.040650974677156795
EPOCH: 244
Loss at step 0: 0.03601839020848274
Loss at step 50: 0.03880251199007034
Loss at step 100: 0.03493691235780716
Loss at step 150: 0.02994871512055397
Loss at step 200: 0.032999180257320404
Loss at step 250: 0.03631100431084633
Loss at step 300: 0.033189717680215836
Loss at step 350: 0.03836946189403534
Loss at step 400: 0.0370786115527153
Loss at step 450: 0.033676642924547195
Loss at step 500: 0.0403299480676651
Loss at step 550: 0.03196725621819496
Loss at step 600: 0.04806798696517944
Loss at step 650: 0.04054030403494835
Loss at step 700: 0.03655192628502846
Loss at step 750: 0.042547885328531265
Loss at step 800: 0.03825114667415619
Loss at step 850: 0.03418620303273201
Loss at step 900: 0.031493254005908966
Mean training loss after epoch 244: 0.0404629212424858
EPOCH: 245
Loss at step 0: 0.037501245737075806
Loss at step 50: 0.036022938787937164
Loss at step 100: 0.06744180619716644
Loss at step 150: 0.04214358702301979
Loss at step 200: 0.034945204854011536
Loss at step 250: 0.029832666739821434
Loss at step 300: 0.054127588868141174
Loss at step 350: 0.027778994292020798
Loss at step 400: 0.03888265788555145
Loss at step 450: 0.03413667902350426
Loss at step 500: 0.039878036826848984
Loss at step 550: 0.0383932963013649
Loss at step 600: 0.03586817532777786
Loss at step 650: 0.04729319363832474
Loss at step 700: 0.060726214200258255
Loss at step 750: 0.032855063676834106
Loss at step 800: 0.03362588211894035
Loss at step 850: 0.0399589017033577
Loss at step 900: 0.03648059815168381
Mean training loss after epoch 245: 0.040341950649185096
EPOCH: 246
Loss at step 0: 0.031979892402887344
Loss at step 50: 0.06765910238027573
Loss at step 100: 0.0613054521381855
Loss at step 150: 0.03503995016217232
Loss at step 200: 0.03813648223876953
Loss at step 250: 0.03782317787408829
Loss at step 300: 0.03368839621543884
Loss at step 350: 0.03500468283891678
Loss at step 400: 0.04959193989634514
Loss at step 450: 0.0331171415746212
Loss at step 500: 0.03166046738624573
Loss at step 550: 0.037088554352521896
Loss at step 600: 0.03945896402001381
Loss at step 650: 0.03642991930246353
Loss at step 700: 0.03628809377551079
Loss at step 750: 0.05210085213184357
Loss at step 800: 0.03868551924824715
Loss at step 850: 0.041673678904771805
Loss at step 900: 0.046727459877729416
Mean training loss after epoch 246: 0.04055890298164539
EPOCH: 247
Loss at step 0: 0.050526224076747894
Loss at step 50: 0.03471115231513977
Loss at step 100: 0.02761066146194935
Loss at step 150: 0.03715616837143898
Loss at step 200: 0.06270107626914978
Loss at step 250: 0.036140527576208115
Loss at step 300: 0.03348053991794586
Loss at step 350: 0.04763561859726906
Loss at step 400: 0.04335961118340492
Loss at step 450: 0.03691563382744789
Loss at step 500: 0.03904525935649872
Loss at step 550: 0.04260946810245514
Loss at step 600: 0.049258239567279816
Loss at step 650: 0.03189285472035408
Loss at step 700: 0.0375247485935688
Loss at step 750: 0.03473655879497528
Loss at step 800: 0.03262423351407051
Loss at step 850: 0.046003468334674835
Loss at step 900: 0.0298653244972229
Mean training loss after epoch 247: 0.04082909842401044
EPOCH: 248
Loss at step 0: 0.037177521735429764
Loss at step 50: 0.03851611167192459
Loss at step 100: 0.02362552471458912
Loss at step 150: 0.04929950833320618
Loss at step 200: 0.03481481969356537
Loss at step 250: 0.030810419470071793
Loss at step 300: 0.060464564710855484
Loss at step 350: 0.03210290148854256
Loss at step 400: 0.03180817514657974
Loss at step 450: 0.03282691538333893
Loss at step 500: 0.04492408409714699
Loss at step 550: 0.04005492478609085
Loss at step 600: 0.03255147859454155
Loss at step 650: 0.039729081094264984
Loss at step 700: 0.0332181416451931
Loss at step 750: 0.03630407899618149
Loss at step 800: 0.025422068312764168
Loss at step 850: 0.028221523389220238
Loss at step 900: 0.03473244234919548
Mean training loss after epoch 248: 0.040842697498942614
EPOCH: 249
Loss at step 0: 0.04472397640347481
Loss at step 50: 0.04787271097302437
Loss at step 100: 0.04805320128798485
Loss at step 150: 0.034147415310144424
Loss at step 200: 0.03937199339270592
Loss at step 250: 0.032839335501194
Loss at step 300: 0.030625566840171814
Loss at step 350: 0.06075826659798622
Loss at step 400: 0.030696650967001915
Loss at step 450: 0.043999478220939636
Loss at step 500: 0.05251993238925934
Loss at step 550: 0.03596007078886032
Loss at step 600: 0.03571072220802307
Loss at step 650: 0.03692999854683876
Loss at step 700: 0.05128655582666397
Loss at step 750: 0.03889640048146248
Loss at step 800: 0.0332813523709774
Loss at step 850: 0.05229688435792923
Loss at step 900: 0.03806335851550102
Mean training loss after epoch 249: 0.04073733212088725
EPOCH: 250
Loss at step 0: 0.03640764579176903
Loss at step 50: 0.029529381543397903
Loss at step 100: 0.03215811029076576
Loss at step 150: 0.03508197143673897
Loss at step 200: 0.06036895886063576
Loss at step 250: 0.05026475340127945
Loss at step 300: 0.03762553259730339
Loss at step 350: 0.039728518575429916
Loss at step 400: 0.051400668919086456
Loss at step 450: 0.03622720018029213
Loss at step 500: 0.04798509180545807
Loss at step 550: 0.036011118441820145
Loss at step 600: 0.03086777590215206
Loss at step 650: 0.02947012148797512
Loss at step 700: 0.033220358192920685
Loss at step 750: 0.04179084300994873
Loss at step 800: 0.03159976005554199
Loss at step 850: 0.0499589778482914
Loss at step 900: 0.05065244436264038
Mean training loss after epoch 250: 0.04118522900396954
EPOCH: 251
Loss at step 0: 0.02755182981491089
Loss at step 50: 0.049285098910331726
Loss at step 100: 0.03654266893863678
Loss at step 150: 0.04497296363115311
Loss at step 200: 0.03556593507528305
Loss at step 250: 0.03584232181310654
Loss at step 300: 0.032394710928201675
Loss at step 350: 0.03097252920269966
Loss at step 400: 0.03149956464767456
Loss at step 450: 0.034771617501974106
Loss at step 500: 0.03608296439051628
Loss at step 550: 0.04529433697462082
Loss at step 600: 0.033899255096912384
Loss at step 650: 0.041714031249284744
Loss at step 700: 0.03436104208230972
Loss at step 750: 0.03629880025982857
Loss at step 800: 0.038352809846401215
Loss at step 850: 0.048439860343933105
Loss at step 900: 0.040245577692985535
Mean training loss after epoch 251: 0.04107680348063837
EPOCH: 252
Loss at step 0: 0.04162813723087311
Loss at step 50: 0.03678770735859871
Loss at step 100: 0.036010533571243286
Loss at step 150: 0.05083076283335686
Loss at step 200: 0.04758510738611221
Loss at step 250: 0.04673706740140915
Loss at step 300: 0.03488384932279587
Loss at step 350: 0.04749500751495361
Loss at step 400: 0.03849106281995773
Loss at step 450: 0.0354052409529686
Loss at step 500: 0.032203562557697296
Loss at step 550: 0.046269387006759644
Loss at step 600: 0.03831486403942108
Loss at step 650: 0.04683487117290497
Loss at step 700: 0.03641248121857643
Loss at step 750: 0.03526393696665764
Loss at step 800: 0.039593227207660675
Loss at step 850: 0.037872251123189926
Loss at step 900: 0.03107110597193241
Mean training loss after epoch 252: 0.040780991514417914
EPOCH: 253
Loss at step 0: 0.04510458558797836
Loss at step 50: 0.053014062345027924
Loss at step 100: 0.032169047743082047
Loss at step 150: 0.040137842297554016
Loss at step 200: 0.03011404722929001
Loss at step 250: 0.049014877527952194
Loss at step 300: 0.04769476503133774
Loss at step 350: 0.06785585731267929
Loss at step 400: 0.05150467902421951
Loss at step 450: 0.037310872226953506
Loss at step 500: 0.03545456752181053
Loss at step 550: 0.0527246855199337
Loss at step 600: 0.034180376678705215
Loss at step 650: 0.03249421715736389
Loss at step 700: 0.03964534029364586
Loss at step 750: 0.04362183064222336
Loss at step 800: 0.033075109124183655
Loss at step 850: 0.037886276841163635
Loss at step 900: 0.04572896659374237
Mean training loss after epoch 253: 0.040567862515304004
EPOCH: 254
Loss at step 0: 0.03452959656715393
Loss at step 50: 0.03886321932077408
Loss at step 100: 0.04838445782661438
Loss at step 150: 0.03837645798921585
Loss at step 200: 0.03629623353481293
Loss at step 250: 0.030103985220193863
Loss at step 300: 0.03791821002960205
Loss at step 350: 0.03373400866985321
Loss at step 400: 0.0403708778321743
Loss at step 450: 0.034411486238241196
Loss at step 500: 0.03582341596484184
Loss at step 550: 0.03185276687145233
Loss at step 600: 0.05909956619143486
Loss at step 650: 0.03586168587207794
Loss at step 700: 0.03290408104658127
Loss at step 750: 0.035262443125247955
Loss at step 800: 0.038654476404190063
Loss at step 850: 0.03391736373305321
Loss at step 900: 0.03784594684839249
Mean training loss after epoch 254: 0.04157071075698079
EPOCH: 255
Loss at step 0: 0.036882199347019196
Loss at step 50: 0.03993494436144829
Loss at step 100: 0.04192814603447914
Loss at step 150: 0.05400025099515915
Loss at step 200: 0.036772482097148895
Loss at step 250: 0.03589187189936638
Loss at step 300: 0.041940346360206604
Loss at step 350: 0.03448191285133362
Loss at step 400: 0.029735693708062172
Loss at step 450: 0.033656343817710876
Loss at step 500: 0.055729810148477554
Loss at step 550: 0.03558502718806267
Loss at step 600: 0.03889984264969826
Loss at step 650: 0.049456886947155
Loss at step 700: 0.036641478538513184
Loss at step 750: 0.048241570591926575
Loss at step 800: 0.0453965850174427
Loss at step 850: 0.03907885402441025
Loss at step 900: 0.06074392423033714
Mean training loss after epoch 255: 0.041012863235385305
EPOCH: 256
Loss at step 0: 0.03528229892253876
Loss at step 50: 0.03847229480743408
Loss at step 100: 0.04104534536600113
Loss at step 150: 0.03357851132750511
Loss at step 200: 0.04962961748242378
Loss at step 250: 0.054125647991895676
Loss at step 300: 0.0447879284620285
Loss at step 350: 0.04007836803793907
Loss at step 400: 0.04048490524291992
Loss at step 450: 0.03496857360005379
Loss at step 500: 0.05139027535915375
Loss at step 550: 0.044430509209632874
Loss at step 600: 0.049299247562885284
Loss at step 650: 0.04760231077671051
Loss at step 700: 0.05970483645796776
Loss at step 750: 0.030023494735360146
Loss at step 800: 0.051494043320417404
Loss at step 850: 0.047272972762584686
Loss at step 900: 0.0323207825422287
Mean training loss after epoch 256: 0.041227073287531765
Schedule: cosine
Cfg: False
Output path: /scratch/shared/beegfs/gabrijel/m2l/mini
Patch Size: 2
Device: cuda:3
=====================================================================================
Layer (type:depth-idx) Param #
=====================================================================================
DiT 75,264
├─PatchEmbed: 1-1 --
│ └─Conv2d: 2-1 1,920
├─TimestepEmbedder: 1-2 --
│ └─Mlp: 2-2 --
│ │ └─Linear: 3-1 98,688
│ │ └─SiLU: 3-2 --
│ │ └─Linear: 3-3 147,840
├─ModuleList: 1-3 --
│ └─DiTBlock: 2-3 --
│ │ └─LayerNorm: 3-4 --
│ │ └─MultiheadAttention: 3-5 591,360
│ │ └─LayerNorm: 3-6 --
│ │ └─Mlp: 3-7 1,181,568
│ │ └─Sequential: 3-8 887,040
│ └─DiTBlock: 2-4 --
│ │ └─LayerNorm: 3-9 --
│ │ └─MultiheadAttention: 3-10 591,360
│ │ └─LayerNorm: 3-11 --
│ │ └─Mlp: 3-12 1,181,568
│ │ └─Sequential: 3-13 887,040
│ └─DiTBlock: 2-5 --
│ │ └─LayerNorm: 3-14 --
│ │ └─MultiheadAttention: 3-15 591,360
│ │ └─LayerNorm: 3-16 --
│ │ └─Mlp: 3-17 1,181,568
│ │ └─Sequential: 3-18 887,040
│ └─DiTBlock: 2-6 --
│ │ └─LayerNorm: 3-19 --
│ │ └─MultiheadAttention: 3-20 591,360
│ │ └─LayerNorm: 3-21 --
│ │ └─Mlp: 3-22 1,181,568
│ │ └─Sequential: 3-23 887,040
│ └─DiTBlock: 2-7 --
│ │ └─LayerNorm: 3-24 --
│ │ └─MultiheadAttention: 3-25 591,360
│ │ └─LayerNorm: 3-26 --
│ │ └─Mlp: 3-27 1,181,568
│ │ └─Sequential: 3-28 887,040
│ └─DiTBlock: 2-8 --
│ │ └─LayerNorm: 3-29 --
│ │ └─MultiheadAttention: 3-30 591,360
│ │ └─LayerNorm: 3-31 --
│ │ └─Mlp: 3-32 1,181,568
│ │ └─Sequential: 3-33 887,040
├─FinalLayer: 1-4 --
│ └─LayerNorm: 2-9 --
│ └─Linear: 2-10 1,540
│ └─Sequential: 2-11 --
│ │ └─SiLU: 3-34 --
│ │ └─Linear: 3-35 295,680
├─Unpatchify: 1-5 --
=====================================================================================
Total params: 16,580,740
Trainable params: 16,505,476
Non-trainable params: 75,264
=====================================================================================
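The per-layer parameter counts in the summary above are mutually consistent with a hidden size of 384, an MLP ratio of 4, a 6-way adaLN modulation per block, and a frozen 196-token positional embedding (a 28x28 single-channel latent at patch size 2). None of these hyperparameters are printed in the log; they are inferred from the counts, so treat this as a sanity-check sketch rather than the training script's actual configuration:

```python
# Arithmetic check of the torchinfo summary above. All hyperparameters
# below are INFERRED from the printed counts, not read from the log.
hidden = 384            # inferred: 147,840 = 384*384 + 384 (second TimestepEmbedder Linear)
freq_dim = 256          # inferred timestep-frequency embedding width
patch, channels = 2, 1  # "Patch Size: 2"; single channel inferred from the Conv2d count
n_blocks = 6            # DiTBlock entries 2-3 through 2-8

def linear(n_in, n_out):
    """Parameters of a bias-carrying Linear layer."""
    return n_in * n_out + n_out

patch_embed = linear(channels * patch * patch, hidden)        # Conv2d: 1,920
t_mlp = linear(freq_dim, hidden) + linear(hidden, hidden)     # 98,688 + 147,840

# One DiTBlock: fused qkv + out projection, 4x MLP, SiLU->Linear adaLN(6*hidden)
attn = 3 * linear(hidden, hidden) + linear(hidden, hidden)    # 591,360
mlp = linear(hidden, 4 * hidden) + linear(4 * hidden, hidden) # 1,181,568
adaln = linear(hidden, 6 * hidden)                            # 887,040

final_linear = linear(hidden, patch * patch * channels)       # 1,540
final_adaln = linear(hidden, 2 * hidden)                      # 295,680
pos_embed = (28 // patch) ** 2 * hidden                       # frozen, 196 * 384 = 75,264

total = (pos_embed + patch_embed + t_mlp
         + n_blocks * (attn + mlp + adaln)
         + final_linear + final_adaln)
print(total)                  # 16580740, matching "Total params"
print(total - pos_embed)      # 16505476, matching "Trainable params"
```

The 75,264 non-trainable parameters line up exactly with the top-level `DiT` entry, which is what a fixed (e.g. sinusoidal) positional embedding registered as a buffer-like frozen parameter would produce.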
EPOCH: 1
Loss at step 0: 0.9927976131439209
Loss at step 50: 0.2647687792778015
Loss at step 100: 0.1879013627767563
Loss at step 150: 0.12602929770946503
Loss at step 200: 0.11745952069759369
Loss at step 250: 0.10707128793001175
Loss at step 300: 0.1432727724313736
Loss at step 350: 0.11516840010881424
Loss at step 400: 0.10792354494333267
Loss at step 450: 0.09377487003803253
Loss at step 500: 0.10157567262649536
Loss at step 550: 0.10117540508508682
Loss at step 600: 0.11454605311155319
Loss at step 650: 0.09202694892883301
Loss at step 700: 0.08804260939359665
Loss at step 750: 0.08932791650295258
Loss at step 800: 0.08639156073331833
Loss at step 850: 0.09178578108549118
Loss at step 900: 0.09026865661144257
Mean training loss after epoch 1: 0.14250233578783617
EPOCH: 2
Loss at step 0: 0.08553174138069153
Loss at step 50: 0.0918794572353363
Loss at step 100: 0.07219964265823364
Loss at step 150: 0.07668644934892654
Loss at step 200: 0.08180388808250427
Loss at step 250: 0.07124952226877213
Loss at step 300: 0.08224574476480484
Loss at step 350: 0.08902806788682938
Loss at step 400: 0.0827031284570694
Loss at step 450: 0.07146904617547989
Loss at step 500: 0.07812539488077164
Loss at step 550: 0.06646260619163513
Loss at step 600: 0.06209966167807579
Loss at step 650: 0.07814020663499832
Loss at step 700: 0.058781880885362625
Loss at step 750: 0.051821518689394
Loss at step 800: 0.07093804329633713
Loss at step 850: 0.0550219789147377
Loss at step 900: 0.058788903057575226
Mean training loss after epoch 2: 0.07401431160472603
EPOCH: 3
Loss at step 0: 0.06717473268508911
Loss at step 50: 0.07093988358974457
Loss at step 100: 0.06632817536592484
Loss at step 150: 0.07580439746379852
Loss at step 200: 0.06400690972805023
Loss at step 250: 0.07392463833093643
Loss at step 300: 0.048493415117263794
Loss at step 350: 0.0506267324090004
Loss at step 400: 0.07013332098722458
Loss at step 450: 0.05782090872526169
Loss at step 500: 0.07724333554506302
Loss at step 550: 0.05080138146877289
Loss at step 600: 0.06723678857088089
Loss at step 650: 0.057337675243616104
Loss at step 700: 0.05216258764266968
Loss at step 750: 0.05217692628502846
Loss at step 800: 0.05593561381101608
Loss at step 850: 0.07336627691984177
Loss at step 900: 0.05595633387565613
Mean training loss after epoch 3: 0.06080897097219663
EPOCH: 4
Loss at step 0: 0.04946884885430336
Loss at step 50: 0.054812829941511154
Loss at step 100: 0.06162804737687111
Loss at step 150: 0.05907006561756134
Loss at step 200: 0.04258119687438011
Loss at step 250: 0.06596523523330688
Loss at step 300: 0.049682050943374634
Loss at step 350: 0.05187582969665527
Loss at step 400: 0.045258037745952606
Loss at step 450: 0.050041262060403824
Loss at step 500: 0.06738606095314026
Loss at step 550: 0.057629767805337906
Loss at step 600: 0.05861373245716095
Loss at step 650: 0.054706621915102005
Loss at step 700: 0.046074602752923965
Loss at step 750: 0.04815784841775894
Loss at step 800: 0.054592981934547424
Loss at step 850: 0.06745585799217224
Loss at step 900: 0.04226946830749512
Mean training loss after epoch 4: 0.05705311590198006
EPOCH: 5
Loss at step 0: 0.04679422453045845
Loss at step 50: 0.05274326354265213
Loss at step 100: 0.060322683304548264
Loss at step 150: 0.047847017645835876
Loss at step 200: 0.050057556480169296
Loss at step 250: 0.04592166468501091
Loss at step 300: 0.06508708745241165
Loss at step 350: 0.06251583248376846
Loss at step 400: 0.06060163676738739
Loss at step 450: 0.04539049416780472
Loss at step 500: 0.04933956637978554
Loss at step 550: 0.044366367161273956
Loss at step 600: 0.05056780204176903
Loss at step 650: 0.05007366091012955
Loss at step 700: 0.04736810550093651
Loss at step 750: 0.0458783358335495
Loss at step 800: 0.061146657913923264
Loss at step 850: 0.04192593693733215
Loss at step 900: 0.05574965849518776
Mean training loss after epoch 5: 0.054514974121377666
EPOCH: 6
Loss at step 0: 0.05552354082465172
Loss at step 50: 0.06919590383768082
Loss at step 100: 0.051523979753255844
Loss at step 150: 0.07512925565242767
Loss at step 200: 0.0469990149140358
Loss at step 250: 0.04151985049247742
Loss at step 300: 0.054267555475234985
Loss at step 350: 0.046799663454294205
Loss at step 400: 0.043437469750642776
Loss at step 450: 0.0556630864739418
Loss at step 500: 0.07886315882205963
Loss at step 550: 0.07932259142398834
Loss at step 600: 0.06433942168951035
Loss at step 650: 0.05824227258563042
Loss at step 700: 0.04445641115307808
Loss at step 750: 0.05480702966451645
Loss at step 800: 0.05086228623986244
Loss at step 850: 0.050280384719371796
Loss at step 900: 0.048203449696302414
Mean training loss after epoch 6: 0.05360761121995668
EPOCH: 7
Loss at step 0: 0.05120495334267616
Loss at step 50: 0.041711125522851944
Loss at step 100: 0.03893940523266792
Loss at step 150: 0.043815772980451584
Loss at step 200: 0.04661528393626213
Loss at step 250: 0.04535030573606491
Loss at step 300: 0.052326902747154236
Loss at step 350: 0.04767979308962822
Loss at step 400: 0.05384628102183342
Loss at step 450: 0.03882434964179993
Loss at step 500: 0.05370200797915459
Loss at step 550: 0.05347057804465294
Loss at step 600: 0.04494878277182579
Loss at step 650: 0.04447983577847481
Loss at step 700: 0.04920825734734535
Loss at step 750: 0.057055290788412094
Loss at step 800: 0.10477184504270554
Loss at step 850: 0.043160244822502136
Loss at step 900: 0.04887056723237038
Mean training loss after epoch 7: 0.05228752429599066
EPOCH: 8
Loss at step 0: 0.05011899024248123
Loss at step 50: 0.06245705857872963
Loss at step 100: 0.06631136685609818
Loss at step 150: 0.04895845800638199
Loss at step 200: 0.050268225371837616
Loss at step 250: 0.04723553732037544
Loss at step 300: 0.07536876201629639
Loss at step 350: 0.06294748932123184
Loss at step 400: 0.048326920717954636
Loss at step 450: 0.04759557917714119
Loss at step 500: 0.05410340428352356
Loss at step 550: 0.05210983008146286
Loss at step 600: 0.04399741441011429
Loss at step 650: 0.041283536702394485
Loss at step 700: 0.04357845336198807
Loss at step 750: 0.060434866696596146
Loss at step 800: 0.04585731029510498
Loss at step 850: 0.04889417067170143
Loss at step 900: 0.04579473286867142
Mean training loss after epoch 8: 0.05117087174993334
EPOCH: 9
Loss at step 0: 0.047287292778491974
Loss at step 50: 0.04318024218082428
Loss at step 100: 0.046293068677186966
Loss at step 150: 0.03605443984270096
Loss at step 200: 0.045547377318143845
Loss at step 250: 0.0459781251847744
Loss at step 300: 0.04060017317533493
Loss at step 350: 0.045362140983343124
Loss at step 400: 0.04601544514298439
Loss at step 450: 0.05350184813141823
Loss at step 500: 0.04330907389521599
Loss at step 550: 0.043803706765174866
Loss at step 600: 0.049809280782938004
Loss at step 650: 0.04990199953317642
Loss at step 700: 0.05597102642059326
Loss at step 750: 0.04491954296827316
Loss at step 800: 0.03813697397708893
Loss at step 850: 0.043534938246011734
Loss at step 900: 0.052817512303590775
Mean training loss after epoch 9: 0.050545577774010995
EPOCH: 10
Loss at step 0: 0.06345701217651367
Loss at step 50: 0.04983758181333542
Loss at step 100: 0.058084748685359955
Loss at step 150: 0.057511843740940094
Loss at step 200: 0.04950873926281929
Loss at step 250: 0.044199153780937195
Loss at step 300: 0.049990322440862656
Loss at step 350: 0.042405661195516586
Loss at step 400: 0.0410495363175869
Loss at step 450: 0.050903402268886566
Loss at step 500: 0.03998948261141777
Loss at step 550: 0.04161235690116882
Loss at step 600: 0.04452834278345108
Loss at step 650: 0.04524111747741699
Loss at step 700: 0.05646144971251488
Loss at step 750: 0.039474230259656906
Loss at step 800: 0.042933832854032516
Loss at step 850: 0.04933589696884155
Loss at step 900: 0.036208804696798325
Mean training loss after epoch 10: 0.049290040768047515
EPOCH: 11
Loss at step 0: 0.03956316038966179
Loss at step 50: 0.047969572246074677
Loss at step 100: 0.06317726522684097
Loss at step 150: 0.04703586921095848
Loss at step 200: 0.04931914806365967
Loss at step 250: 0.047175485640764236
Loss at step 300: 0.04173384979367256
Loss at step 350: 0.04106055572628975
Loss at step 400: 0.04758727177977562
Loss at step 450: 0.04685395956039429
Loss at step 500: 0.06105189397931099
Loss at step 550: 0.06041964516043663
Loss at step 600: 0.055400021374225616
Loss at step 650: 0.04655126482248306
Loss at step 700: 0.06059831380844116
Loss at step 750: 0.06167222931981087
Loss at step 800: 0.042501598596572876
Loss at step 850: 0.042257603257894516
Loss at step 900: 0.045965805649757385
Mean training loss after epoch 11: 0.04964323368654259
EPOCH: 12
Loss at step 0: 0.034203123301267624
Loss at step 50: 0.05352773517370224
Loss at step 100: 0.043338775634765625
Loss at step 150: 0.04150288924574852
Loss at step 200: 0.05577273294329643
Loss at step 250: 0.055770426988601685
Loss at step 300: 0.045309942215681076
Loss at step 350: 0.04029889404773712
Loss at step 400: 0.03722796589136124
Loss at step 450: 0.04935304448008537
Loss at step 500: 0.05218170955777168
Loss at step 550: 0.061936669051647186
Loss at step 600: 0.07341843843460083
Loss at step 650: 0.0434122271835804
Loss at step 700: 0.050429198890924454
Loss at step 750: 0.054148439317941666
Loss at step 800: 0.04618339240550995
Loss at step 850: 0.039194535464048386
Loss at step 900: 0.05453008413314819
Mean training loss after epoch 12: 0.04853169766586345
EPOCH: 13
Loss at step 0: 0.05767318978905678
Loss at step 50: 0.04307478666305542
Loss at step 100: 0.039039019495248795
Loss at step 150: 0.050793424248695374
Loss at step 200: 0.05458158627152443
Loss at step 250: 0.0410580076277256
Loss at step 300: 0.03630630671977997
Loss at step 350: 0.04801056906580925
Loss at step 400: 0.038546375930309296
Loss at step 450: 0.04437912628054619
Loss at step 500: 0.04366833716630936
Loss at step 550: 0.07518304139375687
Loss at step 600: 0.04526819288730621
Loss at step 650: 0.07137490063905716
Loss at step 700: 0.056305669248104095
Loss at step 750: 0.05747681111097336
Loss at step 800: 0.05244934558868408
Loss at step 850: 0.04294973239302635
Loss at step 900: 0.03596752509474754
Mean training loss after epoch 13: 0.048368280024718505
EPOCH: 14
Loss at step 0: 0.059222716838121414
Loss at step 50: 0.04363219439983368
Loss at step 100: 0.04555397853255272
Loss at step 150: 0.03750060126185417
Loss at step 200: 0.044108372181653976
Loss at step 250: 0.05813005939126015
Loss at step 300: 0.03639712557196617
Loss at step 350: 0.07586781680583954
Loss at step 400: 0.050042495131492615
Loss at step 450: 0.054030727595090866
Loss at step 500: 0.08736606687307358
Loss at step 550: 0.0347326397895813
Loss at step 600: 0.03835294768214226
Loss at step 650: 0.04757542535662651
Loss at step 700: 0.044310227036476135
Loss at step 750: 0.03899923712015152
Loss at step 800: 0.0389823392033577
Loss at step 850: 0.046352993696928024
Loss at step 900: 0.04369539022445679
Mean training loss after epoch 14: 0.047218455241989095
EPOCH: 15
Loss at step 0: 0.061945103108882904
Loss at step 50: 0.040272314101457596
Loss at step 100: 0.05934322997927666
Loss at step 150: 0.061598729342222214
Loss at step 200: 0.03942662104964256
Loss at step 250: 0.07209756970405579
Loss at step 300: 0.056602660566568375
Loss at step 350: 0.04495527222752571
Loss at step 400: 0.054804615676403046
Loss at step 450: 0.060904551297426224
Loss at step 500: 0.052522361278533936
Loss at step 550: 0.04878659546375275
Loss at step 600: 0.04031701013445854
Loss at step 650: 0.04928752779960632
Loss at step 700: 0.04112618789076805
Loss at step 750: 0.05811849236488342
Loss at step 800: 0.05275645852088928
Loss at step 850: 0.05110972002148628
Loss at step 900: 0.05189656466245651
Mean training loss after epoch 15: 0.04713583987420683
EPOCH: 16
Loss at step 0: 0.05751290172338486
Loss at step 50: 0.05349951237440109
Loss at step 100: 0.061340536922216415
Loss at step 150: 0.04630905017256737
Loss at step 200: 0.040518227964639664
Loss at step 250: 0.04509662091732025
Loss at step 300: 0.054649755358695984
Loss at step 350: 0.04015738144516945
Loss at step 400: 0.05614438280463219
Loss at step 450: 0.04284175485372543
Loss at step 500: 0.0447845458984375
Loss at step 550: 0.04490595683455467
Loss at step 600: 0.045310527086257935
Loss at step 650: 0.04410380497574806
Loss at step 700: 0.038343433290719986
Loss at step 750: 0.05487282946705818
Loss at step 800: 0.050881609320640564
Loss at step 850: 0.05533342808485031
Loss at step 900: 0.07102000713348389
Mean training loss after epoch 16: 0.04699640631524802
EPOCH: 17
Loss at step 0: 0.0580948069691658
Loss at step 50: 0.03580525517463684
Loss at step 100: 0.03325394168496132
Loss at step 150: 0.0388118177652359
Loss at step 200: 0.039655230939388275
Loss at step 250: 0.04357254132628441
Loss at step 300: 0.0388103649020195
Loss at step 350: 0.062123239040374756
Loss at step 400: 0.0393681563436985
Loss at step 450: 0.045537207275629044
Loss at step 500: 0.04149201512336731
Loss at step 550: 0.056296877562999725
Loss at step 600: 0.041269417852163315
Loss at step 650: 0.047324035316705704
Loss at step 700: 0.04247691109776497
Loss at step 750: 0.033218417316675186
Loss at step 800: 0.03631206229329109
Loss at step 850: 0.03608255833387375
Loss at step 900: 0.04036899283528328
Mean training loss after epoch 17: 0.047017275433201014
EPOCH: 18
Loss at step 0: 0.037257082760334015
Loss at step 50: 0.03367447480559349
Loss at step 100: 0.042098648846149445
Loss at step 150: 0.039509519934654236
Loss at step 200: 0.05941949412226677
Loss at step 250: 0.038624610751867294
Loss at step 300: 0.06093718111515045
Loss at step 350: 0.044842202216386795
Loss at step 400: 0.04626288637518883
Loss at step 450: 0.05908516049385071
Loss at step 500: 0.04012873023748398
Loss at step 550: 0.038510892540216446
Loss at step 600: 0.03817109763622284
Loss at step 650: 0.04379972070455551
Loss at step 700: 0.04390494152903557
Loss at step 750: 0.03912144526839256
Loss at step 800: 0.037638112902641296
Loss at step 850: 0.04086960479617119
Loss at step 900: 0.045219577848911285
Mean training loss after epoch 18: 0.04582779359485485
EPOCH: 19
Loss at step 0: 0.041718464344739914
Loss at step 50: 0.041183002293109894
Loss at step 100: 0.04069666936993599
Loss at step 150: 0.037516038864851
Loss at step 200: 0.047058697789907455
Loss at step 250: 0.04376176744699478
Loss at step 300: 0.042468659579753876
Loss at step 350: 0.04458988457918167
Loss at step 400: 0.04017651826143265
Loss at step 450: 0.07378578186035156
Loss at step 500: 0.0400574654340744
Loss at step 550: 0.039062704890966415
Loss at step 600: 0.038819320499897
Loss at step 650: 0.033067211508750916
Loss at step 700: 0.042712289839982986
Loss at step 750: 0.03872883319854736
Loss at step 800: 0.05610249936580658
Loss at step 850: 0.03668671473860741
Loss at step 900: 0.043801337480545044
Mean training loss after epoch 19: 0.04611589505410652
EPOCH: 20
Loss at step 0: 0.04827193170785904
Loss at step 50: 0.05359834060072899
Loss at step 100: 0.04653705283999443
Loss at step 150: 0.05003025382757187
Loss at step 200: 0.04673366993665695
Loss at step 250: 0.040243230760097504
Loss at step 300: 0.03828419744968414
Loss at step 350: 0.04028535634279251
Loss at step 400: 0.04643638804554939
Loss at step 450: 0.03311799094080925
Loss at step 500: 0.04406367987394333
Loss at step 550: 0.03698456659913063
Loss at step 600: 0.058640990406274796
Loss at step 650: 0.040676768869161606
Loss at step 700: 0.0424061119556427
Loss at step 750: 0.05697501823306084
Loss at step 800: 0.03824044018983841
Loss at step 850: 0.066072478890419
Loss at step 900: 0.041484538465738297
Mean training loss after epoch 20: 0.04569254003004479
EPOCH: 21
Loss at step 0: 0.052745524793863297
Loss at step 50: 0.04679976403713226
Loss at step 100: 0.06415614485740662
Loss at step 150: 0.03858758881688118
Loss at step 200: 0.03995379060506821
Loss at step 250: 0.04070864990353584
Loss at step 300: 0.047886040061712265
Loss at step 350: 0.04501399025321007
Loss at step 400: 0.03356078267097473
Loss at step 450: 0.03801573067903519
Loss at step 500: 0.037307173013687134
Loss at step 550: 0.04218755289912224
Loss at step 600: 0.05996202677488327
Loss at step 650: 0.04204526171088219
Loss at step 700: 0.07033933699131012
Loss at step 750: 0.03862342983484268
Loss at step 800: 0.038705822080373764
Loss at step 850: 0.03514530509710312
Loss at step 900: 0.03651607781648636
Mean training loss after epoch 21: 0.045421224570016995
EPOCH: 22
Loss at step 0: 0.05221446231007576
Loss at step 50: 0.04250719025731087
Loss at step 100: 0.05661725997924805
Loss at step 150: 0.03511292487382889
Loss at step 200: 0.040635790675878525
Loss at step 250: 0.04838637635111809
Loss at step 300: 0.0390804298222065
Loss at step 350: 0.0534442774951458
Loss at step 400: 0.038561638444662094
Loss at step 450: 0.06379273533821106
Loss at step 500: 0.035927664488554
Loss at step 550: 0.06136680021882057
Loss at step 600: 0.031272273510694504
Loss at step 650: 0.045360028743743896
Loss at step 700: 0.04633624106645584
Loss at step 750: 0.04046519100666046
Loss at step 800: 0.04161432385444641
Loss at step 850: 0.042055003345012665
Loss at step 900: 0.05448092147707939
Mean training loss after epoch 22: 0.04483226262358651
EPOCH: 23
Loss at step 0: 0.04911420866847038
Loss at step 50: 0.0415814071893692
Loss at step 100: 0.04444144666194916
Loss at step 150: 0.03530682623386383
Loss at step 200: 0.0463300421833992
Loss at step 250: 0.038280509412288666
Loss at step 300: 0.05271584540605545
Loss at step 350: 0.05282475799322128
Loss at step 400: 0.04728353023529053
Loss at step 450: 0.03833460435271263
Loss at step 500: 0.057132475078105927
Loss at step 550: 0.0409892238676548
Loss at step 600: 0.040609076619148254
Loss at step 650: 0.040882013738155365
Loss at step 700: 0.04064009711146355
Loss at step 750: 0.04380909353494644
Loss at step 800: 0.03657805919647217
Loss at step 850: 0.03418336808681488
Loss at step 900: 0.034976694732904434
Mean training loss after epoch 23: 0.04488808057630367
EPOCH: 24
Loss at step 0: 0.048248641192913055
Loss at step 50: 0.05135813355445862
Loss at step 100: 0.050695184618234634
Loss at step 150: 0.05454166978597641
Loss at step 200: 0.040228258818387985
Loss at step 250: 0.045158132910728455
Loss at step 300: 0.03014983795583248
Loss at step 350: 0.030783720314502716
Loss at step 400: 0.05381038039922714
Loss at step 450: 0.03955896943807602
Loss at step 500: 0.057066842913627625
Loss at step 550: 0.036507830023765564
Loss at step 600: 0.07247436791658401
Loss at step 650: 0.0378834530711174
Loss at step 700: 0.0714469701051712
Loss at step 750: 0.040530942380428314
Loss at step 800: 0.04141752049326897
Loss at step 850: 0.034526366740465164
Loss at step 900: 0.03569827228784561
Mean training loss after epoch 24: 0.04518484828600497
EPOCH: 25
Loss at step 0: 0.03326995298266411
Loss at step 50: 0.033842310309410095
Loss at step 100: 0.03736729547381401
Loss at step 150: 0.0668163076043129
Loss at step 200: 0.03536819666624069
Loss at step 250: 0.03721887618303299
Loss at step 300: 0.0470084510743618
Loss at step 350: 0.04588761925697327
Loss at step 400: 0.0489131323993206
Loss at step 450: 0.04456213489174843
Loss at step 500: 0.03591672703623772
Loss at step 550: 0.05648083612322807
Loss at step 600: 0.04321033135056496
Loss at step 650: 0.04505500942468643
Loss at step 700: 0.03846661001443863
Loss at step 750: 0.03818375989794731
Loss at step 800: 0.036762554198503494
Loss at step 850: 0.05255861207842827
Loss at step 900: 0.04515066742897034
Mean training loss after epoch 25: 0.044807881285656874
EPOCH: 26
Loss at step 0: 0.0373607873916626
Loss at step 50: 0.043030742555856705
Loss at step 100: 0.045759208500385284
Loss at step 150: 0.04089202731847763
Loss at step 200: 0.036131978034973145
Loss at step 250: 0.05533413961529732
Loss at step 300: 0.03699621185660362
Loss at step 350: 0.03552429378032684
Loss at step 400: 0.042007893323898315
Loss at step 450: 0.04304102063179016
Loss at step 500: 0.05209362506866455
Loss at step 550: 0.06375249475240707
Loss at step 600: 0.038298461586236954
Loss at step 650: 0.04088416323065758
Loss at step 700: 0.05234269052743912
Loss at step 750: 0.053741395473480225
Loss at step 800: 0.04723315313458443
Loss at step 850: 0.057126112282276154
Loss at step 900: 0.042302802205085754
Mean training loss after epoch 26: 0.04456539857171492
EPOCH: 27
Loss at step 0: 0.04674964025616646
Loss at step 50: 0.04263751581311226
Loss at step 100: 0.03667943552136421
Loss at step 150: 0.040494099259376526
Loss at step 200: 0.0386606901884079
Loss at step 250: 0.040493160486221313
Loss at step 300: 0.03662142530083656
Loss at step 350: 0.03603795915842056
Loss at step 400: 0.037907931953668594
Loss at step 450: 0.05081484094262123
Loss at step 500: 0.0563945472240448
Loss at step 550: 0.0526045560836792
Loss at step 600: 0.06008012220263481
Loss at step 650: 0.051519934087991714
Loss at step 700: 0.05095212161540985
Loss at step 750: 0.038178347051143646
Loss at step 800: 0.04066329076886177
Loss at step 850: 0.038567233830690384
Loss at step 900: 0.052257291972637177
Mean training loss after epoch 27: 0.04509363583584965
EPOCH: 28
Loss at step 0: 0.029913390055298805
Loss at step 50: 0.03645753860473633
Loss at step 100: 0.034360963851213455
Loss at step 150: 0.03566978871822357
Loss at step 200: 0.03904988244175911
Loss at step 250: 0.03659699112176895
Loss at step 300: 0.038922570645809174
Loss at step 350: 0.033944662660360336
Loss at step 400: 0.0383438803255558
Loss at step 450: 0.05149287357926369
Loss at step 500: 0.03719630837440491
Loss at step 550: 0.03484205901622772
Loss at step 600: 0.03765057399868965
Loss at step 650: 0.06854096800088882
Loss at step 700: 0.0372895784676075
Loss at step 750: 0.044852498918771744
Loss at step 800: 0.033244166523218155
Loss at step 850: 0.04573490098118782
Loss at step 900: 0.03735152631998062
Mean training loss after epoch 28: 0.044933726687405286
EPOCH: 29
Loss at step 0: 0.029447682201862335
Loss at step 50: 0.0398627370595932
Loss at step 100: 0.053118184208869934
Loss at step 150: 0.03270377963781357
Loss at step 200: 0.04028083756566048
Loss at step 250: 0.057774025946855545
Loss at step 300: 0.04933173581957817
Loss at step 350: 0.054515670984983444
Loss at step 400: 0.05123170465230942
Loss at step 450: 0.042456742376089096
Loss at step 500: 0.040116313844919205
Loss at step 550: 0.04087410494685173
Loss at step 600: 0.03468229994177818
Loss at step 650: 0.039522793143987656
Loss at step 700: 0.03545898199081421
Loss at step 750: 0.04619523510336876
Loss at step 800: 0.05729640647768974
Loss at step 850: 0.051194336265325546
Loss at step 900: 0.038833290338516235
Mean training loss after epoch 29: 0.04402439670760367
EPOCH: 30
Loss at step 0: 0.04906666278839111
Loss at step 50: 0.04361962527036667
Loss at step 100: 0.04044247418642044
Loss at step 150: 0.05557110905647278
Loss at step 200: 0.03803514316678047
Loss at step 250: 0.04414329677820206
Loss at step 300: 0.04065090790390968
Loss at step 350: 0.03852323070168495
Loss at step 400: 0.03416415676474571
Loss at step 450: 0.05052315816283226
Loss at step 500: 0.056460876017808914
Loss at step 550: 0.041875824332237244
Loss at step 600: 0.039723336696624756
Loss at step 650: 0.03889099508523941
Loss at step 700: 0.0717279314994812
Loss at step 750: 0.05256667360663414
Loss at step 800: 0.04998709261417389
Loss at step 850: 0.03653858229517937
Loss at step 900: 0.06387999653816223
Mean training loss after epoch 30: 0.0444086283079978
EPOCH: 31
Loss at step 0: 0.06570300459861755
Loss at step 50: 0.03289508447051048
Loss at step 100: 0.033931732177734375
Loss at step 150: 0.05701424181461334
Loss at step 200: 0.0524381622672081
Loss at step 250: 0.04885663837194443
Loss at step 300: 0.04105885699391365
Loss at step 350: 0.03617656230926514
Loss at step 400: 0.07422742247581482
Loss at step 450: 0.04127587378025055
Loss at step 500: 0.04956771805882454
Loss at step 550: 0.032986026257276535
Loss at step 600: 0.039893150329589844
Loss at step 650: 0.03447552025318146
Loss at step 700: 0.039409857243299484
Loss at step 750: 0.051296427845954895
Loss at step 800: 0.043854985386133194
Loss at step 850: 0.058668699115514755
Loss at step 900: 0.037739936262369156
Mean training loss after epoch 31: 0.04468885555998412
EPOCH: 32
Loss at step 0: 0.04160969331860542
Loss at step 50: 0.03463306650519371
Loss at step 100: 0.035780616104602814
Loss at step 150: 0.05247391015291214
Loss at step 200: 0.043705083429813385
Loss at step 250: 0.0600837767124176
Loss at step 300: 0.037433646619319916
Loss at step 350: 0.05155890807509422
Loss at step 400: 0.07468236237764359
Loss at step 450: 0.06288442015647888
Loss at step 500: 0.04347878322005272
Loss at step 550: 0.04345081374049187
Loss at step 600: 0.04002153500914574
Loss at step 650: 0.030901411548256874
Loss at step 700: 0.03812825679779053
Loss at step 750: 0.039223719388246536
Loss at step 800: 0.03850940614938736
Loss at step 850: 0.04800470918416977
Loss at step 900: 0.03695946931838989
Mean training loss after epoch 32: 0.04362768248152504
EPOCH: 33
Loss at step 0: 0.07225683331489563
Loss at step 50: 0.039506975561380386
Loss at step 100: 0.04103633388876915
Loss at step 150: 0.036195915192365646
Loss at step 200: 0.034182168543338776
Loss at step 250: 0.052266839891672134
Loss at step 300: 0.0427260585129261
Loss at step 350: 0.039806365966796875
Loss at step 400: 0.046187058091163635
Loss at step 450: 0.035503558814525604
Loss at step 500: 0.0356760174036026
Loss at step 550: 0.03603162616491318
Loss at step 600: 0.0453050471842289
Loss at step 650: 0.04466850683093071
Loss at step 700: 0.03671862930059433
Loss at step 750: 0.032369568943977356
Loss at step 800: 0.039522137492895126
Loss at step 850: 0.05315655469894409
Loss at step 900: 0.034537188708782196
Mean training loss after epoch 33: 0.04295502561551612
EPOCH: 34
Loss at step 0: 0.029321929439902306
Loss at step 50: 0.034134674817323685
Loss at step 100: 0.04984445869922638
Loss at step 150: 0.035927992314100266
Loss at step 200: 0.04887424409389496
Loss at step 250: 0.037316031754016876
Loss at step 300: 0.04465991631150246
Loss at step 350: 0.05477762967348099
Loss at step 400: 0.053357016295194626
Loss at step 450: 0.050971269607543945
Loss at step 500: 0.0499236024916172
Loss at step 550: 0.04749893397092819
Loss at step 600: 0.04637804254889488
Loss at step 650: 0.04684502258896828
Loss at step 700: 0.06652657687664032
Loss at step 750: 0.043038755655288696
Loss at step 800: 0.03603661060333252
Loss at step 850: 0.04426427185535431
Loss at step 900: 0.05061193183064461
Mean training loss after epoch 34: 0.04389402393037195
EPOCH: 35
Loss at step 0: 0.052734531462192535
Loss at step 50: 0.07013734430074692
Loss at step 100: 0.055571962147951126
Loss at step 150: 0.03865116834640503
Loss at step 200: 0.0372077152132988
Loss at step 250: 0.04977886378765106
Loss at step 300: 0.04531254991889
Loss at step 350: 0.05584704130887985
Loss at step 400: 0.0377432182431221
Loss at step 450: 0.04084886610507965
Loss at step 500: 0.07135910540819168
Loss at step 550: 0.03363371267914772
Loss at step 600: 0.03212135657668114
Loss at step 650: 0.03952178731560707
Loss at step 700: 0.041552383452653885
Loss at step 750: 0.03404490277171135
Loss at step 800: 0.03914336860179901
Loss at step 850: 0.043152667582035065
Loss at step 900: 0.03232409805059433
Mean training loss after epoch 35: 0.04302508351819983
EPOCH: 36
Loss at step 0: 0.04851457104086876
Loss at step 50: 0.04562768712639809
Loss at step 100: 0.03531981259584427
Loss at step 150: 0.043214645236730576
Loss at step 200: 0.03744692727923393
Loss at step 250: 0.042286891490221024
Loss at step 300: 0.049430038779973984
Loss at step 350: 0.036436405032873154
Loss at step 400: 0.034777041524648666
Loss at step 450: 0.04153888300061226
Loss at step 500: 0.03553176298737526
Loss at step 550: 0.031961582601070404
Loss at step 600: 0.05224454030394554
Loss at step 650: 0.04388105496764183
Loss at step 700: 0.040361806750297546
Loss at step 750: 0.040422216057777405
Loss at step 800: 0.03775598108768463
Loss at step 850: 0.05734693259000778
Loss at step 900: 0.04077355936169624
Mean training loss after epoch 36: 0.04312853543743142
EPOCH: 37
Loss at step 0: 0.03714748099446297
Loss at step 50: 0.050660282373428345
Loss at step 100: 0.03697645664215088
Loss at step 150: 0.0827973410487175
Loss at step 200: 0.04317590221762657
Loss at step 250: 0.03618314117193222
Loss at step 300: 0.037134528160095215
Loss at step 350: 0.03909546509385109
Loss at step 400: 0.038076251745224
Loss at step 450: 0.04257143288850784
Loss at step 500: 0.03321288153529167
Loss at step 550: 0.03748180717229843
Loss at step 600: 0.06636057794094086
Loss at step 650: 0.03770602494478226
Loss at step 700: 0.08009560406208038
Loss at step 750: 0.03693768009543419
Loss at step 800: 0.03650718182325363
Loss at step 850: 0.04664753004908562
Loss at step 900: 0.03936724364757538
Mean training loss after epoch 37: 0.04336765655941928
EPOCH: 38
Loss at step 0: 0.03465455397963524
Loss at step 50: 0.030911577865481377
Loss at step 100: 0.05364152789115906
Loss at step 150: 0.03758525103330612
Loss at step 200: 0.03997034952044487
Loss at step 250: 0.0362134613096714
Loss at step 300: 0.03510035201907158
Loss at step 350: 0.03568132966756821
Loss at step 400: 0.03927675634622574
Loss at step 450: 0.043984606862068176
Loss at step 500: 0.044523004442453384
Loss at step 550: 0.036992426961660385
Loss at step 600: 0.0432291254401207
Loss at step 650: 0.04972879961133003
Loss at step 700: 0.040408678352832794
Loss at step 750: 0.03769862279295921
Loss at step 800: 0.03676699846982956
Loss at step 850: 0.03363250568509102
Loss at step 900: 0.050196655094623566
Mean training loss after epoch 38: 0.043665299007395054
EPOCH: 39
Loss at step 0: 0.04297603666782379
Loss at step 50: 0.03462286666035652
Loss at step 100: 0.03346511349081993
Loss at step 150: 0.04194847494363785
Loss at step 200: 0.036040548235177994
Loss at step 250: 0.03550048545002937
Loss at step 300: 0.04542850703001022
Loss at step 350: 0.05183808133006096
Loss at step 400: 0.03676900267601013
Loss at step 450: 0.049038663506507874
Loss at step 500: 0.03395945951342583
Loss at step 550: 0.0338163748383522
Loss at step 600: 0.033632803708314896
Loss at step 650: 0.03675498068332672
Loss at step 700: 0.041949450969696045
Loss at step 750: 0.04227084293961525
Loss at step 800: 0.04024386778473854
Loss at step 850: 0.03923187032341957
Loss at step 900: 0.034244369715452194
Mean training loss after epoch 39: 0.04297231583754772
EPOCH: 40
Loss at step 0: 0.030180353671312332
Loss at step 50: 0.049267880618572235
Loss at step 100: 0.03916803002357483
Loss at step 150: 0.056771960109472275
Loss at step 200: 0.039305031299591064
Loss at step 250: 0.039471086114645004
Loss at step 300: 0.03026297129690647
Loss at step 350: 0.0378076434135437
Loss at step 400: 0.055273085832595825
Loss at step 450: 0.04198889806866646
Loss at step 500: 0.05805264040827751
Loss at step 550: 0.04977709427475929
Loss at step 600: 0.0806681215763092
Loss at step 650: 0.08839074522256851
Loss at step 700: 0.041415825486183167
Loss at step 750: 0.0352671816945076
Loss at step 800: 0.03809533268213272
Loss at step 850: 0.03589823096990585
Loss at step 900: 0.03772462159395218
Mean training loss after epoch 40: 0.04321250221781385
EPOCH: 41
Loss at step 0: 0.05648473650217056
Loss at step 50: 0.03883674740791321
Loss at step 100: 0.03668802231550217
Loss at step 150: 0.03581535816192627
Loss at step 200: 0.05020207539200783
Loss at step 250: 0.03667880967259407
Loss at step 300: 0.042587947100400925
Loss at step 350: 0.03846283629536629
Loss at step 400: 0.036278266459703445
Loss at step 450: 0.039367277175188065
Loss at step 500: 0.03347210958600044
Loss at step 550: 0.042285192757844925
Loss at step 600: 0.03368893265724182
Loss at step 650: 0.03954387828707695
Loss at step 700: 0.04458700865507126
Loss at step 750: 0.04066156595945358
Loss at step 800: 0.036489538848400116
Loss at step 850: 0.03715793415904045
Loss at step 900: 0.06632266938686371
Mean training loss after epoch 41: 0.042843369641966784
EPOCH: 42
Loss at step 0: 0.049074843525886536
Loss at step 50: 0.031039627268910408
Loss at step 100: 0.0409608855843544
Loss at step 150: 0.0379287414252758
Loss at step 200: 0.03141274303197861
Loss at step 250: 0.03718740865588188
Loss at step 300: 0.04370876029133797
Loss at step 350: 0.02973066456615925
Loss at step 400: 0.03484489768743515
Loss at step 450: 0.05998479947447777
Loss at step 500: 0.04298393055796623
Loss at step 550: 0.0414208248257637
Loss at step 600: 0.05147707462310791
Loss at step 650: 0.06301922351121902
Loss at step 700: 0.04647999256849289
Loss at step 750: 0.0380011722445488
Loss at step 800: 0.03523740544915199
Loss at step 850: 0.07007824629545212
Loss at step 900: 0.04118829593062401
Mean training loss after epoch 42: 0.042861232980847486
EPOCH: 43
Loss at step 0: 0.028600996360182762
Loss at step 50: 0.03958134725689888
Loss at step 100: 0.0379943922162056
Loss at step 150: 0.0403524711728096
Loss at step 200: 0.038039691746234894
Loss at step 250: 0.05107349529862404
Loss at step 300: 0.04073113203048706
Loss at step 350: 0.043610718101263046
Loss at step 400: 0.07114291191101074
Loss at step 450: 0.04253702610731125
Loss at step 500: 0.05893601104617119
Loss at step 550: 0.0509442500770092
Loss at step 600: 0.044168759137392044
Loss at step 650: 0.054990626871585846
Loss at step 700: 0.042283836752176285
Loss at step 750: 0.03912976384162903
Loss at step 800: 0.051697466522455215
Loss at step 850: 0.05673801526427269
Loss at step 900: 0.035775937139987946
Mean training loss after epoch 43: 0.04222319512836524
EPOCH: 44
Loss at step 0: 0.037387020885944366
Loss at step 50: 0.03670916706323624
Loss at step 100: 0.03850468993186951
Loss at step 150: 0.04527543485164642
Loss at step 200: 0.04270525649189949
Loss at step 250: 0.0406525693833828
Loss at step 300: 0.0392395555973053
Loss at step 350: 0.03703998774290085
Loss at step 400: 0.04435901716351509
Loss at step 450: 0.030216777697205544
Loss at step 500: 0.03501291945576668
Loss at step 550: 0.05483980476856232
Loss at step 600: 0.03645576909184456
Loss at step 650: 0.03959187492728233
Loss at step 700: 0.035912465304136276
Loss at step 750: 0.033680666238069534
Loss at step 800: 0.032813236117362976
Loss at step 850: 0.03100864589214325
Loss at step 900: 0.038795165717601776
Mean training loss after epoch 44: 0.04228815817431029
EPOCH: 45
Loss at step 0: 0.05061938613653183
Loss at step 50: 0.06159723922610283
Loss at step 100: 0.04928106442093849
Loss at step 150: 0.04391185939311981
Loss at step 200: 0.03841090202331543
Loss at step 250: 0.03663986548781395
Loss at step 300: 0.033025361597537994
Loss at step 350: 0.05331796035170555
Loss at step 400: 0.03757380694150925
Loss at step 450: 0.0336463637650013
Loss at step 500: 0.03897778317332268
Loss at step 550: 0.03989659994840622
Loss at step 600: 0.038776081055402756
Loss at step 650: 0.05733334645628929
Loss at step 700: 0.03490510582923889
Loss at step 750: 0.039473410695791245
Loss at step 800: 0.05830595642328262
Loss at step 850: 0.05670051649212837
Loss at step 900: 0.040327806025743484
Mean training loss after epoch 45: 0.042787098425077094
EPOCH: 46
Loss at step 0: 0.036934878677129745
Loss at step 50: 0.033364858478307724
Loss at step 100: 0.03347126394510269
Loss at step 150: 0.041521187871694565
Loss at step 200: 0.0378275141119957
Loss at step 250: 0.03971440717577934
Loss at step 300: 0.03849451243877411
Loss at step 350: 0.030927833169698715
Loss at step 400: 0.03408275172114372
Loss at step 450: 0.04135040193796158
Loss at step 500: 0.03571862354874611
Loss at step 550: 0.039906665682792664
Loss at step 600: 0.03439576178789139
Loss at step 650: 0.052114088088274
Loss at step 700: 0.04131833836436272
Loss at step 750: 0.03150162473320961
Loss at step 800: 0.053609561175107956
Loss at step 850: 0.07118377834558487
Loss at step 900: 0.038397371768951416
Mean training loss after epoch 46: 0.0424582164174617
EPOCH: 47
Loss at step 0: 0.0376482829451561
Loss at step 50: 0.033311713486909866
Loss at step 100: 0.046062029898166656
Loss at step 150: 0.07413475960493088
Loss at step 200: 0.06404237449169159
Loss at step 250: 0.04188799113035202
Loss at step 300: 0.044554203748703
Loss at step 350: 0.03766021877527237
Loss at step 400: 0.05552181974053383
Loss at step 450: 0.037229984998703
Loss at step 500: 0.03611749783158302
Loss at step 550: 0.03990129381418228
Loss at step 600: 0.04216839373111725
Loss at step 650: 0.0565626285970211
Loss at step 700: 0.05230112373828888
Loss at step 750: 0.03784063085913658
Loss at step 800: 0.03894263133406639
Loss at step 850: 0.033128369599580765
Loss at step 900: 0.030786553397774696
Mean training loss after epoch 47: 0.042213420318102025
EPOCH: 48
Loss at step 0: 0.06113245338201523
Loss at step 50: 0.038732144981622696
Loss at step 100: 0.04146707057952881
Loss at step 150: 0.05363892391324043
Loss at step 200: 0.04317339509725571
Loss at step 250: 0.04006329923868179
Loss at step 300: 0.038389842957258224
Loss at step 350: 0.04166923090815544
Loss at step 400: 0.050251998007297516
Loss at step 450: 0.03234206885099411
Loss at step 500: 0.05430510640144348
Loss at step 550: 0.05468246340751648
Loss at step 600: 0.03641614317893982
Loss at step 650: 0.0358758419752121
Loss at step 700: 0.04045901075005531
Loss at step 750: 0.02902292087674141
Loss at step 800: 0.037241075187921524
Loss at step 850: 0.0750131905078888
Loss at step 900: 0.0332866832613945
Mean training loss after epoch 48: 0.04252630450538417
EPOCH: 49
Loss at step 0: 0.04021422937512398
Loss at step 50: 0.03641494736075401
Loss at step 100: 0.038768261671066284
Loss at step 150: 0.04296419024467468
Loss at step 200: 0.036955609917640686
Loss at step 250: 0.040370337665081024
Loss at step 300: 0.03228134289383888
Loss at step 350: 0.048863768577575684
Loss at step 400: 0.03823704645037651
Loss at step 450: 0.05663494020700455
Loss at step 500: 0.03468574583530426
Loss at step 550: 0.04946057125926018
Loss at step 600: 0.04328424111008644
Loss at step 650: 0.0349988117814064
Loss at step 700: 0.04406754672527313
Loss at step 750: 0.037463266402482986
Loss at step 800: 0.04004407674074173
Loss at step 850: 0.03687194362282753
Loss at step 900: 0.03602185472846031
Mean training loss after epoch 49: 0.042539564672230024
EPOCH: 50
Loss at step 0: 0.03601893410086632
Loss at step 50: 0.03488951548933983
Loss at step 100: 0.04303381219506264
Loss at step 150: 0.03584350645542145
Loss at step 200: 0.03999720513820648
Loss at step 250: 0.037075553089380264
Loss at step 300: 0.04014355316758156
Loss at step 350: 0.03799928352236748
Loss at step 400: 0.03307428956031799
Loss at step 450: 0.03780652582645416
Loss at step 500: 0.04645475372672081
Loss at step 550: 0.035657599568367004
Loss at step 600: 0.0703330934047699
Loss at step 650: 0.0441756397485733
Loss at step 700: 0.03868475556373596
Loss at step 750: 0.036150313913822174
Loss at step 800: 0.05835910141468048
Loss at step 850: 0.035950593650341034
Loss at step 900: 0.033544495701789856
Mean training loss after epoch 50: 0.04233735294412893
EPOCH: 51
Loss at step 0: 0.03536150977015495
Loss at step 50: 0.05265863612294197
Loss at step 100: 0.053060974925756454
Loss at step 150: 0.03569833189249039
Loss at step 200: 0.03769461810588837
Loss at step 250: 0.03149081766605377
Loss at step 300: 0.03774293139576912
Loss at step 350: 0.03972206637263298
Loss at step 400: 0.03773874789476395
Loss at step 450: 0.040503572672605515
Loss at step 500: 0.04576042294502258
Loss at step 550: 0.0350419245660305
Loss at step 600: 0.04863131046295166
Loss at step 650: 0.05377759039402008
Loss at step 700: 0.03654267638921738
Loss at step 750: 0.032249435782432556
Loss at step 800: 0.04548071324825287
Loss at step 850: 0.05506904795765877
Loss at step 900: 0.03483176603913307
Mean training loss after epoch 51: 0.04182767730031504
EPOCH: 52
Loss at step 0: 0.035643722862005234
Loss at step 50: 0.035993482917547226
Loss at step 100: 0.039494823664426804
Loss at step 150: 0.03567099943757057
Loss at step 200: 0.0390964038670063
Loss at step 250: 0.035363804548978806
Loss at step 300: 0.04085114598274231
Loss at step 350: 0.03391866385936737
Loss at step 400: 0.048653654754161835
Loss at step 450: 0.03442519158124924
Loss at step 500: 0.035034891217947006
Loss at step 550: 0.05182870849967003
Loss at step 600: 0.038956109434366226
Loss at step 650: 0.04642537236213684
Loss at step 700: 0.03629075363278389
Loss at step 750: 0.03320210427045822
Loss at step 800: 0.03982256352901459
Loss at step 850: 0.05380026251077652
Loss at step 900: 0.04151134565472603
Mean training loss after epoch 52: 0.041800402694626024
EPOCH: 53
Loss at step 0: 0.05759120360016823
Loss at step 50: 0.0414610281586647
Loss at step 100: 0.03528021648526192
Loss at step 150: 0.058663200587034225
Loss at step 200: 0.03417353704571724
Loss at step 250: 0.05214337259531021
Loss at step 300: 0.030978785827755928
Loss at step 350: 0.03184428811073303
Loss at step 400: 0.04949423298239708
Loss at step 450: 0.046923208981752396
Loss at step 500: 0.04200230911374092
Loss at step 550: 0.05963723734021187
Loss at step 600: 0.048732444643974304
Loss at step 650: 0.03525372967123985
Loss at step 700: 0.042062558233737946
Loss at step 750: 0.0699642226099968
Loss at step 800: 0.03157135844230652
Loss at step 850: 0.03461334854364395
Loss at step 900: 0.03986768424510956
Mean training loss after epoch 53: 0.04206207239710446
EPOCH: 54
Loss at step 0: 0.043047960847616196
Loss at step 50: 0.050478626042604446
Loss at step 100: 0.03833431750535965
Loss at step 150: 0.037333905696868896
Loss at step 200: 0.07576791942119598
Loss at step 250: 0.041819144040346146
Loss at step 300: 0.05899069085717201
Loss at step 350: 0.038387831300497055
Loss at step 400: 0.04553873464465141
Loss at step 450: 0.03330902010202408
Loss at step 500: 0.03797665983438492
Loss at step 550: 0.05493336543440819
Loss at step 600: 0.038320817053318024
Loss at step 650: 0.03387630730867386
Loss at step 700: 0.03648149594664574
Loss at step 750: 0.05657254159450531
Loss at step 800: 0.0527743361890316
Loss at step 850: 0.031990937888622284
Loss at step 900: 0.05744639039039612
Mean training loss after epoch 54: 0.042270664306385305
EPOCH: 55
Loss at step 0: 0.04634654521942139
Loss at step 50: 0.038860127329826355
Loss at step 100: 0.0369553305208683
Loss at step 150: 0.04601672664284706
Loss at step 200: 0.026475802063941956
Loss at step 250: 0.040818262845277786
Loss at step 300: 0.050957221537828445
Loss at step 350: 0.05213375389575958
Loss at step 400: 0.03895549848675728
Loss at step 450: 0.039853379130363464
Loss at step 500: 0.0456225723028183
Loss at step 550: 0.03201737627387047
Loss at step 600: 0.04900722950696945
Loss at step 650: 0.033515963703393936
Loss at step 700: 0.045205309987068176
Loss at step 750: 0.05664574354887009
Loss at step 800: 0.05195131152868271
Loss at step 850: 0.03943021968007088
Loss at step 900: 0.051934707909822464
Mean training loss after epoch 55: 0.041760075203915524
EPOCH: 56
Loss at step 0: 0.035669680684804916
Loss at step 50: 0.03731066361069679
Loss at step 100: 0.03839431330561638
Loss at step 150: 0.04437437653541565
Loss at step 200: 0.03604341298341751
Loss at step 250: 0.03787315636873245
Loss at step 300: 0.04129105433821678
Loss at step 350: 0.0681724026799202
Loss at step 400: 0.037422459572553635
Loss at step 450: 0.04147394374012947
Loss at step 500: 0.030557721853256226
Loss at step 550: 0.05729537457227707
Loss at step 600: 0.044497594237327576
Loss at step 650: 0.04281434789299965
Loss at step 700: 0.03463006392121315
Loss at step 750: 0.03795117139816284
Loss at step 800: 0.03903064504265785
Loss at step 850: 0.05226927250623703
Loss at step 900: 0.044425107538700104
Mean training loss after epoch 56: 0.04198346188518284
EPOCH: 57
Loss at step 0: 0.040313441306352615
Loss at step 50: 0.04098132625222206
Loss at step 100: 0.037053175270557404
Loss at step 150: 0.03720230981707573
Loss at step 200: 0.03237070143222809
Loss at step 250: 0.042227283120155334
Loss at step 300: 0.05192343145608902
Loss at step 350: 0.03451365604996681
Loss at step 400: 0.03476078435778618
Loss at step 450: 0.03759030997753143
Loss at step 500: 0.03982206806540489
Loss at step 550: 0.04919624701142311
Loss at step 600: 0.028733564540743828
Loss at step 650: 0.036303646862506866
Loss at step 700: 0.040104515850543976
Loss at step 750: 0.040087904781103134
Loss at step 800: 0.0365641824901104
Loss at step 850: 0.03147405758500099
Loss at step 900: 0.04074550420045853
Mean training loss after epoch 57: 0.0418718279198384
EPOCH: 58
Loss at step 0: 0.027126234024763107
Loss at step 50: 0.03955811262130737
Loss at step 100: 0.03852409869432449
Loss at step 150: 0.04588547348976135
Loss at step 200: 0.0327930673956871
Loss at step 250: 0.03407374769449234
Loss at step 300: 0.0346791073679924
Loss at step 350: 0.03419278562068939
Loss at step 400: 0.04033670201897621
Loss at step 450: 0.037295371294021606
Loss at step 500: 0.036423176527023315
Loss at step 550: 0.03983192518353462
Loss at step 600: 0.03562218323349953
Loss at step 650: 0.057547878473997116
Loss at step 700: 0.04399918019771576
Loss at step 750: 0.06756962835788727
Loss at step 800: 0.0535319484770298
Loss at step 850: 0.040021199733018875
Loss at step 900: 0.05363428220152855
Mean training loss after epoch 58: 0.04227230368035117
EPOCH: 59
Loss at step 0: 0.054008278995752335
Loss at step 50: 0.029929330572485924
Loss at step 100: 0.043072547763586044
Loss at step 150: 0.04712403565645218
Loss at step 200: 0.04172234237194061
Loss at step 250: 0.0380144938826561
Loss at step 300: 0.06645963340997696
Loss at step 350: 0.03476366773247719
Loss at step 400: 0.044849783182144165
Loss at step 450: 0.03969757631421089
Loss at step 500: 0.041757795959711075
Loss at step 550: 0.03557545319199562
Loss at step 600: 0.03668488934636116
Loss at step 650: 0.034634947776794434
Loss at step 700: 0.036369070410728455
Loss at step 750: 0.027953891083598137
Loss at step 800: 0.05794013664126396
Loss at step 850: 0.0434873141348362
Loss at step 900: 0.04062079265713692
Mean training loss after epoch 59: 0.04175872626557533
EPOCH: 60
Loss at step 0: 0.05230647698044777
Loss at step 50: 0.034911252558231354
Loss at step 100: 0.038966234773397446
Loss at step 150: 0.04930701106786728
Loss at step 200: 0.039878372102975845
Loss at step 250: 0.032641004770994186
Loss at step 300: 0.05420990288257599
Loss at step 350: 0.035432182252407074
Loss at step 400: 0.047906797379255295
Loss at step 450: 0.03237815201282501
Loss at step 500: 0.05809643119573593
Loss at step 550: 0.03982365503907204
Loss at step 600: 0.03338718041777611
Loss at step 650: 0.06101324036717415
Loss at step 700: 0.05832071974873543
Loss at step 750: 0.05672917887568474
Loss at step 800: 0.035312261432409286
Loss at step 850: 0.03269599750638008
Loss at step 900: 0.03432564437389374
Mean training loss after epoch 60: 0.04145304879336469
EPOCH: 61
Loss at step 0: 0.05218600109219551
Loss at step 50: 0.028676575049757957
Loss at step 100: 0.05183856189250946
Loss at step 150: 0.04828957840800285
Loss at step 200: 0.03797846660017967
Loss at step 250: 0.04018789157271385
Loss at step 300: 0.029013371095061302
Loss at step 350: 0.046787358820438385
Loss at step 400: 0.03374910354614258
Loss at step 450: 0.04779788479208946
Loss at step 500: 0.03813735023140907
Loss at step 550: 0.06082974001765251
Loss at step 600: 0.031995780766010284
Loss at step 650: 0.036440473049879074
Loss at step 700: 0.03242507576942444
Loss at step 750: 0.047881174832582474
Loss at step 800: 0.03955503925681114
Loss at step 850: 0.039465587586164474
Loss at step 900: 0.032889991998672485
Mean training loss after epoch 61: 0.04225412909505464
EPOCH: 62
Loss at step 0: 0.07850071787834167
Loss at step 50: 0.03590470924973488
Loss at step 100: 0.03401835262775421
Loss at step 150: 0.03575047105550766
Loss at step 200: 0.03225778415799141
Loss at step 250: 0.03880147635936737
Loss at step 300: 0.03870739787817001
Loss at step 350: 0.042368412017822266
Loss at step 400: 0.03864947706460953
Loss at step 450: 0.03193790838122368
Loss at step 500: 0.043149128556251526
Loss at step 550: 0.03357163816690445
Loss at step 600: 0.043366439640522
Loss at step 650: 0.04025929793715477
Loss at step 700: 0.035577934235334396
Loss at step 750: 0.03176751732826233
Loss at step 800: 0.030962109565734863
Loss at step 850: 0.03757867589592934
Loss at step 900: 0.03229587897658348
Mean training loss after epoch 62: 0.04198516900144787
EPOCH: 63
Loss at step 0: 0.057671867311000824
Loss at step 50: 0.04842713847756386
Loss at step 100: 0.030468273907899857
Loss at step 150: 0.05220678821206093
Loss at step 200: 0.04170433431863785
Loss at step 250: 0.05136094242334366
Loss at step 300: 0.04169003665447235
Loss at step 350: 0.031596481800079346
Loss at step 400: 0.04273003712296486
Loss at step 450: 0.061871133744716644
Loss at step 500: 0.04741637408733368
Loss at step 550: 0.03889886289834976
Loss at step 600: 0.0323534719645977
Loss at step 650: 0.03431367874145508
Loss at step 700: 0.03811480104923248
Loss at step 750: 0.03434113413095474
Loss at step 800: 0.049015481024980545
Loss at step 850: 0.04040595889091492
Loss at step 900: 0.03862114995718002
Mean training loss after epoch 63: 0.04131053804334547
EPOCH: 64
Loss at step 0: 0.030019661411643028
Loss at step 50: 0.03455144912004471
Loss at step 100: 0.04964865371584892
Loss at step 150: 0.03217058628797531
Loss at step 200: 0.03193427622318268
Loss at step 250: 0.03496089577674866
Loss at step 300: 0.036626748740673065
Loss at step 350: 0.0424736887216568
Loss at step 400: 0.03943726047873497
Loss at step 450: 0.040541987866163254
Loss at step 500: 0.03838001936674118
Loss at step 550: 0.03629530221223831
Loss at step 600: 0.033054422587156296
Loss at step 650: 0.0398416668176651
Loss at step 700: 0.050053808838129044
Loss at step 750: 0.035665612667798996
Loss at step 800: 0.03748640790581703
Loss at step 850: 0.04931091517210007
Loss at step 900: 0.035654835402965546
Mean training loss after epoch 64: 0.041265397145947035
EPOCH: 65
Loss at step 0: 0.03760325163602829
Loss at step 50: 0.03528781980276108
Loss at step 100: 0.0328512042760849
Loss at step 150: 0.03513569384813309
Loss at step 200: 0.032193057239055634
Loss at step 250: 0.038936797529459
Loss at step 300: 0.03411776199936867
Loss at step 350: 0.050534818321466446
Loss at step 400: 0.05266318842768669
Loss at step 450: 0.03857654333114624
Loss at step 500: 0.03783871978521347
Loss at step 550: 0.03197762742638588
Loss at step 600: 0.03530753776431084
Loss at step 650: 0.038086168467998505
Loss at step 700: 0.03254755958914757
Loss at step 750: 0.033772069960832596
Loss at step 800: 0.03325922414660454
Loss at step 850: 0.037996433675289154
Loss at step 900: 0.044859953224658966
Mean training loss after epoch 65: 0.041470230835031215
EPOCH: 66
Loss at step 0: 0.039558980613946915
Loss at step 50: 0.03798419609665871
Loss at step 100: 0.0470040962100029
Loss at step 150: 0.03549093380570412
Loss at step 200: 0.06594263017177582
Loss at step 250: 0.04572553560137749
Loss at step 300: 0.05195481330156326
Loss at step 350: 0.03912719711661339
Loss at step 400: 0.033564552664756775
Loss at step 450: 0.06788386404514313
Loss at step 500: 0.04538029804825783
Loss at step 550: 0.0429832860827446
Loss at step 600: 0.044070933014154434
Loss at step 650: 0.03674892336130142
Loss at step 700: 0.04859331250190735
Loss at step 750: 0.04238688573241234
Loss at step 800: 0.05128232762217522
Loss at step 850: 0.026888985186815262
Loss at step 900: 0.054580822587013245
Mean training loss after epoch 66: 0.04168821637159281
EPOCH: 67
Loss at step 0: 0.048112139105796814
Loss at step 50: 0.03597401827573776
Loss at step 100: 0.040139153599739075
Loss at step 150: 0.03657975420355797
Loss at step 200: 0.03313577175140381
Loss at step 250: 0.03261137753725052
Loss at step 300: 0.032174672931432724
Loss at step 350: 0.03411630168557167
Loss at step 400: 0.05067075043916702
Loss at step 450: 0.03755392134189606
Loss at step 500: 0.049025941640138626
Loss at step 550: 0.053921084851026535
Loss at step 600: 0.03517812862992287
Loss at step 650: 0.03399652615189552
Loss at step 700: 0.04195648804306984
Loss at step 750: 0.05345770716667175
Loss at step 800: 0.03349097818136215
Loss at step 850: 0.0440969280898571
Loss at step 900: 0.03051835112273693
Mean training loss after epoch 67: 0.04172010423698977
EPOCH: 68
Loss at step 0: 0.029953550547361374
Loss at step 50: 0.04670592024922371
Loss at step 100: 0.035323210060596466
Loss at step 150: 0.046807896345853806
Loss at step 200: 0.062253061681985855
Loss at step 250: 0.03276786580681801
Loss at step 300: 0.033763352781534195
Loss at step 350: 0.02836601622402668
Loss at step 400: 0.03322148323059082
Loss at step 450: 0.02729177102446556
Loss at step 500: 0.04363562911748886
Loss at step 550: 0.04207039624452591
Loss at step 600: 0.04303210973739624
Loss at step 650: 0.04108305275440216
Loss at step 700: 0.033733610063791275
Loss at step 750: 0.03041158616542816
Loss at step 800: 0.03126723691821098
Loss at step 850: 0.049652907997369766
Loss at step 900: 0.04426473379135132
Mean training loss after epoch 68: 0.04149038618855448
EPOCH: 69
Loss at step 0: 0.05051280930638313
Loss at step 50: 0.032365623861551285
Loss at step 100: 0.035237330943346024
Loss at step 150: 0.03612693399190903
Loss at step 200: 0.03822920098900795
Loss at step 250: 0.029838262125849724
Loss at step 300: 0.038925401866436005
Loss at step 350: 0.03142131119966507
Loss at step 400: 0.0644800141453743
Loss at step 450: 0.03479154035449028
Loss at step 500: 0.05621056631207466
Loss at step 550: 0.035777267068624496
Loss at step 600: 0.03171798586845398
Loss at step 650: 0.03767917677760124
Loss at step 700: 0.03505399078130722
Loss at step 750: 0.035311244428157806
Loss at step 800: 0.03133586049079895
Loss at step 850: 0.03778800740838051
Loss at step 900: 0.03941011801362038
Mean training loss after epoch 69: 0.04125585565879655
EPOCH: 70
Loss at step 0: 0.033629123121500015
Loss at step 50: 0.047444071620702744
Loss at step 100: 0.05690629407763481
Loss at step 150: 0.036379892379045486
Loss at step 200: 0.03223971277475357
Loss at step 250: 0.03425416722893715
Loss at step 300: 0.05382617935538292
Loss at step 350: 0.03412552550435066
Loss at step 400: 0.03555070981383324
Loss at step 450: 0.03165304288268089
Loss at step 500: 0.056805454194545746
Loss at step 550: 0.04862965643405914
Loss at step 600: 0.045455142855644226
Loss at step 650: 0.04065563902258873
Loss at step 700: 0.034020159393548965
Loss at step 750: 0.03897823393344879
Loss at step 800: 0.04187474772334099
Loss at step 850: 0.03641607239842415
Loss at step 900: 0.04018361493945122
Mean training loss after epoch 70: 0.04117715101379321
EPOCH: 71
Loss at step 0: 0.031739939004182816
Loss at step 50: 0.03575426712632179
Loss at step 100: 0.041028402745723724
Loss at step 150: 0.03342348709702492
Loss at step 200: 0.03431496396660805
Loss at step 250: 0.03221401944756508
Loss at step 300: 0.048587020486593246
Loss at step 350: 0.03845015913248062
Loss at step 400: 0.04822957515716553
Loss at step 450: 0.031151946634054184
Loss at step 500: 0.033823247998952866
Loss at step 550: 0.032001834362745285
Loss at step 600: 0.04520208016037941
Loss at step 650: 0.03050629422068596
Loss at step 700: 0.05655837431550026
Loss at step 750: 0.04253324121236801
Loss at step 800: 0.05259395018219948
Loss at step 850: 0.039520565420389175
Loss at step 900: 0.052242618054151535
Mean training loss after epoch 71: 0.04180061946243747
EPOCH: 72
Loss at step 0: 0.03948841243982315
Loss at step 50: 0.053918808698654175
Loss at step 100: 0.03918600454926491
Loss at step 150: 0.05136851593852043
Loss at step 200: 0.03424490615725517
Loss at step 250: 0.05601086467504501
Loss at step 300: 0.03227918967604637
Loss at step 350: 0.05336814001202583
Loss at step 400: 0.050874777138233185
Loss at step 450: 0.037637416273355484
Loss at step 500: 0.02950294502079487
Loss at step 550: 0.04061894118785858
Loss at step 600: 0.030604930594563484
Loss at step 650: 0.051947418600320816
Loss at step 700: 0.03727438300848007
Loss at step 750: 0.03555244579911232
Loss at step 800: 0.043268777430057526
Loss at step 850: 0.03692298382520676
Loss at step 900: 0.048946987837553024
Mean training loss after epoch 72: 0.04081341141521105
EPOCH: 73
Loss at step 0: 0.02806745283305645
Loss at step 50: 0.029986565932631493
Loss at step 100: 0.030133534222841263
Loss at step 150: 0.04718175157904625
Loss at step 200: 0.07402471452951431
Loss at step 250: 0.038135360926389694
Loss at step 300: 0.037517957389354706
Loss at step 350: 0.039786506444215775
Loss at step 400: 0.04700249806046486
Loss at step 450: 0.03678031638264656
Loss at step 500: 0.03160851448774338
Loss at step 550: 0.033725056797266006
Loss at step 600: 0.0418279692530632
Loss at step 650: 0.040318895131349564
Loss at step 700: 0.03881711885333061
Loss at step 750: 0.035916246473789215
Loss at step 800: 0.04123677313327789
Loss at step 850: 0.04058227315545082
Loss at step 900: 0.03485511615872383
Mean training loss after epoch 73: 0.04174466865824293
EPOCH: 74
Loss at step 0: 0.02899175137281418
Loss at step 50: 0.03824091702699661
Loss at step 100: 0.038165558129549026
Loss at step 150: 0.03265064209699631
Loss at step 200: 0.032838623970746994
Loss at step 250: 0.03530019149184227
Loss at step 300: 0.03534851595759392
Loss at step 350: 0.04351358860731125
Loss at step 400: 0.04078420624136925
Loss at step 450: 0.0450337752699852
Loss at step 500: 0.03650393709540367
Loss at step 550: 0.05287676304578781
Loss at step 600: 0.0422467403113842
Loss at step 650: 0.037526313215494156
Loss at step 700: 0.03549263998866081
Loss at step 750: 0.032682713121175766
Loss at step 800: 0.03373374044895172
Loss at step 850: 0.03157420828938484
Loss at step 900: 0.03660622984170914
Mean training loss after epoch 74: 0.04113919925548311
EPOCH: 75
Loss at step 0: 0.030831068754196167
Loss at step 50: 0.03200463950634003
Loss at step 100: 0.03574752062559128
Loss at step 150: 0.051985789090394974
Loss at step 200: 0.02577095478773117
Loss at step 250: 0.03551045060157776
Loss at step 300: 0.036199092864990234
Loss at step 350: 0.031102705746889114
Loss at step 400: 0.03810339793562889
Loss at step 450: 0.05419360101222992
Loss at step 500: 0.05693572759628296
Loss at step 550: 0.03602773696184158
Loss at step 600: 0.03415317088365555
Loss at step 650: 0.04872572049498558
Loss at step 700: 0.039838407188653946
Loss at step 750: 0.03620055317878723
Loss at step 800: 0.04124224931001663
Loss at step 850: 0.03605523332953453
Loss at step 900: 0.05318422615528107
Mean training loss after epoch 75: 0.04112100785872194
EPOCH: 76
Loss at step 0: 0.05268241837620735
Loss at step 50: 0.038748666644096375
Loss at step 100: 0.05225672572851181
Loss at step 150: 0.02982509881258011
Loss at step 200: 0.037055786699056625
Loss at step 250: 0.036166444420814514
Loss at step 300: 0.042469609528779984
Loss at step 350: 0.03820810094475746
Loss at step 400: 0.036943428218364716
Loss at step 450: 0.04263583943247795
Loss at step 500: 0.042788028717041016
Loss at step 550: 0.052858270704746246
Loss at step 600: 0.030450038611888885
Loss at step 650: 0.04485831782221794
Loss at step 700: 0.06034138798713684
Loss at step 750: 0.032158125191926956
Loss at step 800: 0.03894932195544243
Loss at step 850: 0.05779428780078888
Loss at step 900: 0.0441526398062706
Mean training loss after epoch 76: 0.04154504369944334
EPOCH: 77
Loss at step 0: 0.039356961846351624
Loss at step 50: 0.028416398912668228
Loss at step 100: 0.03396177664399147
Loss at step 150: 0.04099806770682335
Loss at step 200: 0.06283032149076462
Loss at step 250: 0.04138881340622902
Loss at step 300: 0.061035893857479095
Loss at step 350: 0.04593011736869812
Loss at step 400: 0.05210472643375397
Loss at step 450: 0.03505970537662506
Loss at step 500: 0.032422494143247604
Loss at step 550: 0.05127855762839317
Loss at step 600: 0.036935120820999146
Loss at step 650: 0.03441636264324188
Loss at step 700: 0.03484315797686577
Loss at step 750: 0.03388001397252083
Loss at step 800: 0.037996839731931686
Loss at step 850: 0.03551885113120079
Loss at step 900: 0.033726323395967484
Mean training loss after epoch 77: 0.04113060351747122
EPOCH: 78
Loss at step 0: 0.032344914972782135
Loss at step 50: 0.035145945847034454
Loss at step 100: 0.03198443725705147
Loss at step 150: 0.05172690376639366
Loss at step 200: 0.04912606626749039
Loss at step 250: 0.03330213204026222
Loss at step 300: 0.04923785477876663
Loss at step 350: 0.04205572232604027
Loss at step 400: 0.03545050695538521
Loss at step 450: 0.04081381857395172
Loss at step 500: 0.04776203632354736
Loss at step 550: 0.03815259784460068
Loss at step 600: 0.05272816866636276
Loss at step 650: 0.032447852194309235
Loss at step 700: 0.0346088781952858
Loss at step 750: 0.03316102921962738
Loss at step 800: 0.03658919408917427
Loss at step 850: 0.03636911138892174
Loss at step 900: 0.04945993423461914
Mean training loss after epoch 78: 0.041068576987205285
EPOCH: 79
Loss at step 0: 0.03452589362859726
Loss at step 50: 0.034600645303726196
Loss at step 100: 0.06494107842445374
Loss at step 150: 0.036988429725170135
Loss at step 200: 0.027170222252607346
Loss at step 250: 0.052600547671318054
Loss at step 300: 0.0349196158349514
Loss at step 350: 0.031728651374578476
Loss at step 400: 0.032518353313207626
Loss at step 450: 0.03552037850022316
Loss at step 500: 0.03851528465747833
Loss at step 550: 0.03267049416899681
Loss at step 600: 0.02810988575220108
Loss at step 650: 0.0534193180501461
Loss at step 700: 0.02989153005182743
Loss at step 750: 0.04602940008044243
Loss at step 800: 0.03656817600131035
Loss at step 850: 0.045206040143966675
Loss at step 900: 0.03326031565666199
Mean training loss after epoch 79: 0.0412763856264002
EPOCH: 80
Loss at step 0: 0.03640799969434738
Loss at step 50: 0.03698626905679703
Loss at step 100: 0.036901723593473434
Loss at step 150: 0.02941797859966755
Loss at step 200: 0.05399346724152565
Loss at step 250: 0.038527294993400574
Loss at step 300: 0.03790619969367981
Loss at step 350: 0.03662220761179924
Loss at step 400: 0.034307632595300674
Loss at step 450: 0.04109351709485054
Loss at step 500: 0.03656264394521713
Loss at step 550: 0.033020373433828354
Loss at step 600: 0.031626660376787186
Loss at step 650: 0.037163589149713516
Loss at step 700: 0.03343282267451286
Loss at step 750: 0.03530639782547951
Loss at step 800: 0.07811196148395538
Loss at step 850: 0.037117309868335724
Loss at step 900: 0.036749448627233505
Mean training loss after epoch 80: 0.04114491054252076
EPOCH: 81
Loss at step 0: 0.05479753762483597
Loss at step 50: 0.03132958337664604
Loss at step 100: 0.037393227219581604
Loss at step 150: 0.03205867111682892
Loss at step 200: 0.07083668559789658
Loss at step 250: 0.04586269333958626
Loss at step 300: 0.060223255306482315
Loss at step 350: 0.03582262247800827
Loss at step 400: 0.033127203583717346
Loss at step 450: 0.02675713784992695
Loss at step 500: 0.035183269530534744
Loss at step 550: 0.034161537885665894
Loss at step 600: 0.03505062684416771
Loss at step 650: 0.05540237948298454
Loss at step 700: 0.03116615302860737
Loss at step 750: 0.028590409085154533
Loss at step 800: 0.03489331901073456
Loss at step 850: 0.03400343284010887
Loss at step 900: 0.06202857568860054
Mean training loss after epoch 81: 0.04102237092684517
EPOCH: 82
Loss at step 0: 0.03725624457001686
Loss at step 50: 0.03783930093050003
Loss at step 100: 0.03537292778491974
Loss at step 150: 0.04294422268867493
Loss at step 200: 0.03996248543262482
Loss at step 250: 0.039540305733680725
Loss at step 300: 0.033119600266218185
Loss at step 350: 0.048681121319532394
Loss at step 400: 0.05702071264386177
Loss at step 450: 0.0402117483317852
Loss at step 500: 0.040380872786045074
Loss at step 550: 0.05610261484980583
Loss at step 600: 0.035563670098781586
Loss at step 650: 0.03173322603106499
Loss at step 700: 0.03283904865384102
Loss at step 750: 0.04451049864292145
Loss at step 800: 0.03426874428987503
Loss at step 850: 0.04587911069393158
Loss at step 900: 0.039216335862874985
Mean training loss after epoch 82: 0.041730244616185554
EPOCH: 83
Loss at step 0: 0.04744863510131836
Loss at step 50: 0.03610475733876228
Loss at step 100: 0.035557087510824203
Loss at step 150: 0.03861255571246147
Loss at step 200: 0.030603276565670967
Loss at step 250: 0.04871303215622902
Loss at step 300: 0.07224379479885101
Loss at step 350: 0.03971085324883461
Loss at step 400: 0.03391997143626213
Loss at step 450: 0.044155556708574295
Loss at step 500: 0.034668054431676865
Loss at step 550: 0.05175238102674484
Loss at step 600: 0.049534719437360764
Loss at step 650: 0.03303040564060211
Loss at step 700: 0.029844136908650398
Loss at step 750: 0.04341820254921913
Loss at step 800: 0.03215205669403076
Loss at step 850: 0.03988838940858841
Loss at step 900: 0.05346183106303215
Mean training loss after epoch 83: 0.041038075031073235
EPOCH: 84
Loss at step 0: 0.0528486967086792
Loss at step 50: 0.051950205117464066
Loss at step 100: 0.040812257677316666
Loss at step 150: 0.033961568027734756
Loss at step 200: 0.048738449811935425
Loss at step 250: 0.0713721364736557
Loss at step 300: 0.031749922782182693
Loss at step 350: 0.0486324168741703
Loss at step 400: 0.04092682898044586
Loss at step 450: 0.045225370675325394
Loss at step 500: 0.03676753491163254
Loss at step 550: 0.033339500427246094
Loss at step 600: 0.04189130291342735
Loss at step 650: 0.045919470489025116
Loss at step 700: 0.03852545842528343
Loss at step 750: 0.05126319080591202
Loss at step 800: 0.052123866975307465
Loss at step 850: 0.03563039004802704
Loss at step 900: 0.032997358590364456
Mean training loss after epoch 84: 0.041086105345837724
EPOCH: 85
Loss at step 0: 0.058152444660663605
Loss at step 50: 0.038068030029535294
Loss at step 100: 0.03797232359647751
Loss at step 150: 0.04052959382534027
Loss at step 200: 0.050743792206048965
Loss at step 250: 0.03145845606923103
Loss at step 300: 0.03338661044836044
Loss at step 350: 0.03677058219909668
Loss at step 400: 0.03495321795344353
Loss at step 450: 0.039931200444698334
Loss at step 500: 0.03238952159881592
Loss at step 550: 0.02847830392420292
Loss at step 600: 0.05000064894556999
Loss at step 650: 0.0393374003469944
Loss at step 700: 0.03686242923140526
Loss at step 750: 0.04062030091881752
Loss at step 800: 0.0347592793405056
Loss at step 850: 0.031245408579707146
Loss at step 900: 0.038717370480298996
Mean training loss after epoch 85: 0.04107632083909662
EPOCH: 86
Loss at step 0: 0.03973394259810448
Loss at step 50: 0.030584797263145447
Loss at step 100: 0.06259417533874512
Loss at step 150: 0.03153851255774498
Loss at step 200: 0.055176712572574615
Loss at step 250: 0.038454148918390274
Loss at step 300: 0.04394076392054558
Loss at step 350: 0.03507055342197418
Loss at step 400: 0.03341430053114891
Loss at step 450: 0.04914618283510208
Loss at step 500: 0.04736623913049698
Loss at step 550: 0.03985471650958061
Loss at step 600: 0.03954179957509041
Loss at step 650: 0.03194885700941086
Loss at step 700: 0.035082027316093445
Loss at step 750: 0.0482264868915081
Loss at step 800: 0.03496174141764641
Loss at step 850: 0.03850436955690384
Loss at step 900: 0.057177163660526276
Mean training loss after epoch 86: 0.04068803318194362
EPOCH: 87
Loss at step 0: 0.04399581253528595
Loss at step 50: 0.03488341346383095
Loss at step 100: 0.03428840637207031
Loss at step 150: 0.04180179163813591
Loss at step 200: 0.03338738903403282
Loss at step 250: 0.03731721267104149
Loss at step 300: 0.03757598251104355
Loss at step 350: 0.034594327211380005
Loss at step 400: 0.036474548280239105
Loss at step 450: 0.04442448914051056
Loss at step 500: 0.027858780696988106
Loss at step 550: 0.05683402344584465
Loss at step 600: 0.03499235212802887
Loss at step 650: 0.05560460314154625
Loss at step 700: 0.036094002425670624
Loss at step 750: 0.046759847551584244
Loss at step 800: 0.04085097834467888
Loss at step 850: 0.027367902919650078
Loss at step 900: 0.034118689596652985
Mean training loss after epoch 87: 0.04076247038975009
EPOCH: 88
Loss at step 0: 0.05531320720911026
Loss at step 50: 0.03014991246163845
Loss at step 100: 0.05170894414186478
Loss at step 150: 0.04032305255532265
Loss at step 200: 0.03658030927181244
Loss at step 250: 0.030780110508203506
Loss at step 300: 0.034393101930618286
Loss at step 350: 0.03057672642171383
Loss at step 400: 0.04050949215888977
Loss at step 450: 0.0360785648226738
Loss at step 500: 0.03719230368733406
Loss at step 550: 0.03799004480242729
Loss at step 600: 0.03628097102046013
Loss at step 650: 0.03763250261545181
Loss at step 700: 0.036424919962882996
Loss at step 750: 0.04473429173231125
Loss at step 800: 0.04693128913640976
Loss at step 850: 0.04265372082591057
Loss at step 900: 0.031387049704790115
Mean training loss after epoch 88: 0.04099947016146074
EPOCH: 89
Loss at step 0: 0.034374941140413284
Loss at step 50: 0.0402178056538105
Loss at step 100: 0.03843294084072113
Loss at step 150: 0.036413002759218216
Loss at step 200: 0.039501357823610306
Loss at step 250: 0.029510509222745895
Loss at step 300: 0.036525655537843704
Loss at step 350: 0.04213511198759079
Loss at step 400: 0.037092261016368866
Loss at step 450: 0.03379799425601959
Loss at step 500: 0.03295033797621727
Loss at step 550: 0.031381383538246155
Loss at step 600: 0.0407857745885849
Loss at step 650: 0.03752834349870682
Loss at step 700: 0.03810695558786392
Loss at step 750: 0.03579120337963104
Loss at step 800: 0.0520140565931797
Loss at step 850: 0.038909491151571274
Loss at step 900: 0.03678249195218086
Mean training loss after epoch 89: 0.040811247470329944
EPOCH: 90
Loss at step 0: 0.04095301404595375
Loss at step 50: 0.05082303658127785
Loss at step 100: 0.04138334468007088
Loss at step 150: 0.03449038416147232
Loss at step 200: 0.038363419473171234
Loss at step 250: 0.04477785900235176
Loss at step 300: 0.03956672176718712
Loss at step 350: 0.04098595306277275
Loss at step 400: 0.03981344401836395
Loss at step 450: 0.06300228089094162
Loss at step 500: 0.051460955291986465
Loss at step 550: 0.03568987548351288
Loss at step 600: 0.039645079523324966
Loss at step 650: 0.037908464670181274
Loss at step 700: 0.03197165206074715
Loss at step 750: 0.031707290560007095
Loss at step 800: 0.049262162297964096
Loss at step 850: 0.0344785712659359
Loss at step 900: 0.034114595502614975
Mean training loss after epoch 90: 0.040405405840195065
EPOCH: 91
Loss at step 0: 0.034299690276384354
Loss at step 50: 0.049692295491695404
Loss at step 100: 0.04133080318570137
Loss at step 150: 0.03282630443572998
Loss at step 200: 0.03252938389778137
Loss at step 250: 0.05053752288222313
Loss at step 300: 0.0312495119869709
Loss at step 350: 0.03907566890120506
Loss at step 400: 0.0547638013958931
Loss at step 450: 0.036120086908340454
Loss at step 500: 0.04640704020857811
Loss at step 550: 0.038280077278614044
Loss at step 600: 0.031406719237565994
Loss at step 650: 0.034109827131032944
Loss at step 700: 0.037395015358924866
Loss at step 750: 0.03358292579650879
Loss at step 800: 0.04851951450109482
Loss at step 850: 0.03339794650673866
Loss at step 900: 0.035246942192316055
Mean training loss after epoch 91: 0.04033133036085665
EPOCH: 92
Loss at step 0: 0.032269254326820374
Loss at step 50: 0.031161149963736534
Loss at step 100: 0.060024768114089966
Loss at step 150: 0.05082898959517479
Loss at step 200: 0.050425995141267776
Loss at step 250: 0.04740491509437561
Loss at step 300: 0.054153922945261
Loss at step 350: 0.03820345178246498
Loss at step 400: 0.03505898267030716
Loss at step 450: 0.03790952265262604
Loss at step 500: 0.033213060349226
Loss at step 550: 0.03766785189509392
Loss at step 600: 0.03966356813907623
Loss at step 650: 0.03682432696223259
Loss at step 700: 0.03281429782509804
Loss at step 750: 0.04580431431531906
Loss at step 800: 0.04174647107720375
Loss at step 850: 0.052905965596437454
Loss at step 900: 0.03535788506269455
Mean training loss after epoch 92: 0.04086645406438534
EPOCH: 93
Loss at step 0: 0.031763963401317596
Loss at step 50: 0.04610699787735939
Loss at step 100: 0.03144437074661255
Loss at step 150: 0.036727018654346466
Loss at step 200: 0.029585840180516243
Loss at step 250: 0.042516931891441345
Loss at step 300: 0.04100366309285164
Loss at step 350: 0.0635860487818718
Loss at step 400: 0.03528447076678276
Loss at step 450: 0.04262498766183853
Loss at step 500: 0.04326464980840683
Loss at step 550: 0.06723412871360779
Loss at step 600: 0.03085038810968399
Loss at step 650: 0.03192776069045067
Loss at step 700: 0.03921186178922653
Loss at step 750: 0.03242463245987892
Loss at step 800: 0.054762352257966995
Loss at step 850: 0.035244882106781006
Loss at step 900: 0.03946463763713837
Mean training loss after epoch 93: 0.04065965882329735
EPOCH: 94
Loss at step 0: 0.04889944940805435
Loss at step 50: 0.03690377622842789
Loss at step 100: 0.03591502830386162
Loss at step 150: 0.052436813712120056
Loss at step 200: 0.034925758838653564
Loss at step 250: 0.0353705957531929
Loss at step 300: 0.061986085027456284
Loss at step 350: 0.04875928536057472
Loss at step 400: 0.040921784937381744
Loss at step 450: 0.03871002793312073
Loss at step 500: 0.03833552077412605
Loss at step 550: 0.036446262151002884
Loss at step 600: 0.039174746721982956
Loss at step 650: 0.03723873943090439
Loss at step 700: 0.03249559924006462
Loss at step 750: 0.033896006643772125
Loss at step 800: 0.03666757792234421
Loss at step 850: 0.03137712925672531
Loss at step 900: 0.051495201885700226
Mean training loss after epoch 94: 0.04074181405418336
EPOCH: 95
Loss at step 0: 0.0352119505405426
Loss at step 50: 0.03640684485435486
Loss at step 100: 0.032651618123054504
Loss at step 150: 0.04401347041130066
Loss at step 200: 0.040212228894233704
Loss at step 250: 0.07343073934316635
Loss at step 300: 0.03445030748844147
Loss at step 350: 0.05565089359879494
Loss at step 400: 0.03517284616827965
Loss at step 450: 0.03394188731908798
Loss at step 500: 0.032530587166547775
Loss at step 550: 0.05683201923966408
Loss at step 600: 0.03782414272427559
Loss at step 650: 0.03312551975250244
Loss at step 700: 0.03550421819090843
Loss at step 750: 0.051708586513996124
Loss at step 800: 0.04553850367665291
Loss at step 850: 0.05390772596001625
Loss at step 900: 0.029858194291591644
Mean training loss after epoch 95: 0.040436128432403746
EPOCH: 96
Loss at step 0: 0.03860674425959587
Loss at step 50: 0.04928284138441086
Loss at step 100: 0.030112681910395622
Loss at step 150: 0.030009303241968155
Loss at step 200: 0.03227289393544197
Loss at step 250: 0.035471826791763306
Loss at step 300: 0.045503050088882446
Loss at step 350: 0.03439325466752052
Loss at step 400: 0.03252046927809715
Loss at step 450: 0.041864316910505295
Loss at step 500: 0.06249577924609184
Loss at step 550: 0.032730989158153534
Loss at step 600: 0.050127070397138596
Loss at step 650: 0.05265422537922859
Loss at step 700: 0.033094629645347595
Loss at step 750: 0.03569718822836876
Loss at step 800: 0.03472054749727249
Loss at step 850: 0.048938579857349396
Loss at step 900: 0.04076386243104935
Mean training loss after epoch 96: 0.0409784188537773
EPOCH: 97
Loss at step 0: 0.039326950907707214
Loss at step 50: 0.032597895711660385
Loss at step 100: 0.06816505640745163
Loss at step 150: 0.03475397452712059
Loss at step 200: 0.05419183522462845
Loss at step 250: 0.0331057570874691
Loss at step 300: 0.04599350318312645
Loss at step 350: 0.031561993062496185
Loss at step 400: 0.03588414564728737
Loss at step 450: 0.044904645532369614
Loss at step 500: 0.03464886173605919
Loss at step 550: 0.041914213448762894
Loss at step 600: 0.03834737092256546
Loss at step 650: 0.051248062402009964
Loss at step 700: 0.03469805046916008
Loss at step 750: 0.03618532046675682
Loss at step 800: 0.0414600595831871
Loss at step 850: 0.03126781806349754
Loss at step 900: 0.04621324688196182
Mean training loss after epoch 97: 0.04076281934778001
EPOCH: 98
Loss at step 0: 0.03838682174682617
Loss at step 50: 0.05533002316951752
Loss at step 100: 0.030290279537439346
Loss at step 150: 0.05067134276032448
Loss at step 200: 0.04985364153981209
Loss at step 250: 0.06334389001131058
Loss at step 300: 0.05746433138847351
Loss at step 350: 0.031397782266139984
Loss at step 400: 0.03370814770460129
Loss at step 450: 0.034410182386636734
Loss at step 500: 0.05077733099460602
Loss at step 550: 0.03374994918704033
Loss at step 600: 0.042185988277196884
Loss at step 650: 0.03396277129650116
Loss at step 700: 0.036716412752866745
Loss at step 750: 0.03274477645754814
Loss at step 800: 0.033424075692892075
Loss at step 850: 0.04032216593623161
Loss at step 900: 0.03501669317483902
Mean training loss after epoch 98: 0.040670416941409555
EPOCH: 99
Loss at step 0: 0.03827624395489693
Loss at step 50: 0.048487599939107895
Loss at step 100: 0.049376241862773895
Loss at step 150: 0.0495765246450901
Loss at step 200: 0.031661536544561386
Loss at step 250: 0.03776783123612404
Loss at step 300: 0.0347079262137413
Loss at step 350: 0.05254741013050079
Loss at step 400: 0.05152810364961624
Loss at step 450: 0.035800751298666
Loss at step 500: 0.06741048395633698
Loss at step 550: 0.03415104001760483
Loss at step 600: 0.03682698681950569
Loss at step 650: 0.03967992216348648
Loss at step 700: 0.027164705097675323
Loss at step 750: 0.050535522401332855
Loss at step 800: 0.02985253557562828
Loss at step 850: 0.038912225514650345
Loss at step 900: 0.035812925547361374
Mean training loss after epoch 99: 0.04073020558891647
EPOCH: 100
Loss at step 0: 0.032607365399599075
Loss at step 50: 0.06099787354469299
Loss at step 100: 0.05005550757050514
Loss at step 150: 0.03425469622015953
Loss at step 200: 0.04151573032140732
Loss at step 250: 0.048218220472335815
Loss at step 300: 0.03169530630111694
Loss at step 350: 0.028787700459361076
Loss at step 400: 0.029403982684016228
Loss at step 450: 0.03593473508954048
Loss at step 500: 0.046153709292411804
Loss at step 550: 0.03828473761677742
Loss at step 600: 0.03990305960178375
Loss at step 650: 0.062365055084228516
Loss at step 700: 0.038013067096471786
Loss at step 750: 0.033276695758104324
Loss at step 800: 0.03590114414691925
Loss at step 850: 0.040111057460308075
Loss at step 900: 0.04882941395044327
Mean training loss after epoch 100: 0.04036760972792914
EPOCH: 101
Loss at step 0: 0.03448909521102905
Loss at step 50: 0.03724827989935875
Loss at step 100: 0.06898139417171478
Loss at step 150: 0.02924741432070732
Loss at step 200: 0.03393682464957237
Loss at step 250: 0.03785952553153038
Loss at step 300: 0.03147255256772041
Loss at step 350: 0.0666293352842331
Loss at step 400: 0.05227957293391228
Loss at step 450: 0.032412849366664886
Loss at step 500: 0.028628254309296608
Loss at step 550: 0.03578814119100571
Loss at step 600: 0.052394889295101166
Loss at step 650: 0.0314219631254673
Loss at step 700: 0.03494124859571457
Loss at step 750: 0.03220677748322487
Loss at step 800: 0.0704156830906868
Loss at step 850: 0.038153473287820816
Loss at step 900: 0.04836264252662659
Mean training loss after epoch 101: 0.04046514103494918
EPOCH: 102
Loss at step 0: 0.05540713295340538
Loss at step 50: 0.06708371639251709
Loss at step 100: 0.038608841598033905
Loss at step 150: 0.03573644906282425
Loss at step 200: 0.039247605949640274
Loss at step 250: 0.032899416983127594
Loss at step 300: 0.03535171225667
Loss at step 350: 0.03774742782115936
Loss at step 400: 0.03619568049907684
Loss at step 450: 0.032232049852609634
Loss at step 500: 0.05181793496012688
Loss at step 550: 0.03434407338500023
Loss at step 600: 0.03519069403409958
Loss at step 650: 0.036724913865327835
Loss at step 700: 0.038438115268945694
Loss at step 750: 0.044145528227090836
Loss at step 800: 0.046652257442474365
Loss at step 850: 0.0551580972969532
Loss at step 900: 0.03391639143228531
Mean training loss after epoch 102: 0.04041277822742521
EPOCH: 103
Loss at step 0: 0.03112451732158661
Loss at step 50: 0.04157055541872978
Loss at step 100: 0.047702889889478683
Loss at step 150: 0.03303459286689758
Loss at step 200: 0.03123403526842594
Loss at step 250: 0.046796660870313644
Loss at step 300: 0.06682164967060089
Loss at step 350: 0.030688513070344925
Loss at step 400: 0.04190076142549515
Loss at step 450: 0.05001092329621315
Loss at step 500: 0.03804301470518112
Loss at step 550: 0.058865346014499664
Loss at step 600: 0.05053896829485893
Loss at step 650: 0.03335892781615257
Loss at step 700: 0.05287832021713257
Loss at step 750: 0.05739011988043785
Loss at step 800: 0.03445238992571831
Loss at step 850: 0.061511073261499405
Loss at step 900: 0.029992185533046722
Mean training loss after epoch 103: 0.04018082712759087
EPOCH: 104
Loss at step 0: 0.05760432034730911
Loss at step 50: 0.05041252821683884
Loss at step 100: 0.027177143841981888
Loss at step 150: 0.039961639791727066
Loss at step 200: 0.0325581319630146
Loss at step 250: 0.05665874481201172
Loss at step 300: 0.03727357089519501
Loss at step 350: 0.028202036395668983
Loss at step 400: 0.03974505886435509
Loss at step 450: 0.03842462599277496
Loss at step 500: 0.03134937211871147
Loss at step 550: 0.04020616412162781
Loss at step 600: 0.04058777168393135
Loss at step 650: 0.03416196629405022
Loss at step 700: 0.06070096045732498
Loss at step 750: 0.03729524463415146
Loss at step 800: 0.04380563274025917
Loss at step 850: 0.03157120570540428
Loss at step 900: 0.03815029188990593
Mean training loss after epoch 104: 0.040242427686002974
EPOCH: 105
Loss at step 0: 0.0319080725312233
Loss at step 50: 0.03261766955256462
Loss at step 100: 0.04951752722263336
Loss at step 150: 0.031716614961624146
Loss at step 200: 0.046612419188022614
Loss at step 250: 0.0327620655298233
Loss at step 300: 0.042399682104587555
Loss at step 350: 0.04419102892279625
Loss at step 400: 0.032193947583436966
Loss at step 450: 0.03805965930223465
Loss at step 500: 0.049167387187480927
Loss at step 550: 0.03526442497968674
Loss at step 600: 0.034376244992017746
Loss at step 650: 0.0354459322988987
Loss at step 700: 0.038397882133722305
Loss at step 750: 0.05540142208337784
Loss at step 800: 0.030147867277264595
Loss at step 850: 0.060497745871543884
Loss at step 900: 0.041193459182977676
Mean training loss after epoch 105: 0.04030951934614416
EPOCH: 106
Loss at step 0: 0.03829112648963928
Loss at step 50: 0.026048794388771057
Loss at step 100: 0.0357942208647728
Loss at step 150: 0.05229229852557182
Loss at step 200: 0.04837630316615105
Loss at step 250: 0.051231976598501205
Loss at step 300: 0.04065598174929619
Loss at step 350: 0.03193920850753784
Loss at step 400: 0.0320788212120533
Loss at step 450: 0.05464372783899307
Loss at step 500: 0.026041459292173386
Loss at step 550: 0.035844311118125916
Loss at step 600: 0.029609395191073418
Loss at step 650: 0.03459584712982178
Loss at step 700: 0.036759935319423676
Loss at step 750: 0.0362151637673378
Loss at step 800: 0.0336429588496685
Loss at step 850: 0.046769093722105026
Loss at step 900: 0.03425263985991478
Mean training loss after epoch 106: 0.04064369485822759
EPOCH: 107
Loss at step 0: 0.07431833446025848
Loss at step 50: 0.0602848082780838
Loss at step 100: 0.029872529208660126
Loss at step 150: 0.035330332815647125
Loss at step 200: 0.045894455164670944
Loss at step 250: 0.030223330482840538
Loss at step 300: 0.036641091108322144
Loss at step 350: 0.0410873144865036
Loss at step 400: 0.044177498668432236
Loss at step 450: 0.03326188400387764
Loss at step 500: 0.040481384843587875
Loss at step 550: 0.04043734073638916
Loss at step 600: 0.03910863399505615
Loss at step 650: 0.04387463256716728
Loss at step 700: 0.03435670956969261
Loss at step 750: 0.0295183677226305
Loss at step 800: 0.03602305427193642
Loss at step 850: 0.02967149391770363
Loss at step 900: 0.03301510587334633
Mean training loss after epoch 107: 0.040982513295323736
EPOCH: 108
Loss at step 0: 0.05611558258533478
Loss at step 50: 0.035307254642248154
Loss at step 100: 0.02783018909394741
Loss at step 150: 0.039593588560819626
Loss at step 200: 0.06363610923290253
Loss at step 250: 0.03429553285241127
Loss at step 300: 0.03413379192352295
Loss at step 350: 0.028662709519267082
Loss at step 400: 0.039821330457925797
Loss at step 450: 0.03959920257329941
Loss at step 500: 0.036259181797504425
Loss at step 550: 0.05385353043675423
Loss at step 600: 0.026031946763396263
Loss at step 650: 0.03385019302368164
Loss at step 700: 0.028812086209654808
Loss at step 750: 0.03397990018129349
Loss at step 800: 0.03166824206709862
Loss at step 850: 0.0317801870405674
Loss at step 900: 0.03626967966556549
Mean training loss after epoch 108: 0.040677893166142357
EPOCH: 109
Loss at step 0: 0.04203055799007416
Loss at step 50: 0.028228793293237686
Loss at step 100: 0.04802173748612404
Loss at step 150: 0.0384133942425251
Loss at step 200: 0.0397668182849884
Loss at step 250: 0.03989081457257271
Loss at step 300: 0.03930295258760452
Loss at step 350: 0.03303258866071701
Loss at step 400: 0.05263761058449745
Loss at step 450: 0.03365996852517128
Loss at step 500: 0.05358096584677696
Loss at step 550: 0.04638715088367462
Loss at step 600: 0.03853611648082733
Loss at step 650: 0.03495076298713684
Loss at step 700: 0.06538189202547073
Loss at step 750: 0.028637005016207695
Loss at step 800: 0.035671524703502655
Loss at step 850: 0.038928259164094925
Loss at step 900: 0.030003154650330544
Mean training loss after epoch 109: 0.040187274618173584
EPOCH: 110
Loss at step 0: 0.03870801627635956
Loss at step 50: 0.03549281880259514
Loss at step 100: 0.05035026744008064
Loss at step 150: 0.04582242667675018
Loss at step 200: 0.03825569152832031
Loss at step 250: 0.030541272833943367
Loss at step 300: 0.039547670632600784
Loss at step 350: 0.033150963485240936
Loss at step 400: 0.033019401133060455
Loss at step 450: 0.03229507431387901
Loss at step 500: 0.03577958792448044
Loss at step 550: 0.03508676588535309
Loss at step 600: 0.032196156680583954
Loss at step 650: 0.039787814021110535
Loss at step 700: 0.05168083682656288
Loss at step 750: 0.04113023355603218
Loss at step 800: 0.03392178192734718
Loss at step 850: 0.029219821095466614
Loss at step 900: 0.04385785758495331
Mean training loss after epoch 110: 0.04062215005283925
EPOCH: 111
Loss at step 0: 0.0526747927069664
Loss at step 50: 0.038879457861185074
Loss at step 100: 0.03004498779773712
Loss at step 150: 0.03682366758584976
Loss at step 200: 0.03794126212596893
Loss at step 250: 0.0487305149435997
Loss at step 300: 0.04967457801103592
Loss at step 350: 0.035163119435310364
Loss at step 400: 0.03427344933152199
Loss at step 450: 0.03088877536356449
Loss at step 500: 0.03169834613800049
Loss at step 550: 0.0374300479888916
Loss at step 600: 0.033706407994031906
Loss at step 650: 0.035374246537685394
Loss at step 700: 0.039848484098911285
Loss at step 750: 0.06113433837890625
Loss at step 800: 0.04747779294848442
Loss at step 850: 0.03599917143583298
Loss at step 900: 0.03615652397274971
Mean training loss after epoch 111: 0.039936521320915554
EPOCH: 112
Loss at step 0: 0.0373336486518383
Loss at step 50: 0.05114150419831276
Loss at step 100: 0.05242919921875
Loss at step 150: 0.03499115630984306
Loss at step 200: 0.04813041538000107
Loss at step 250: 0.03306647390127182
Loss at step 300: 0.0505673922598362
Loss at step 350: 0.03386365622282028
Loss at step 400: 0.03137662261724472
Loss at step 450: 0.03400233015418053
Loss at step 500: 0.03519631177186966
Loss at step 550: 0.04086479917168617
Loss at step 600: 0.03908710181713104
Loss at step 650: 0.05335430055856705
Loss at step 700: 0.036864008754491806
Loss at step 750: 0.051715828478336334
Loss at step 800: 0.051827117800712585
Loss at step 850: 0.03942806273698807
Loss at step 900: 0.04879195615649223
Mean training loss after epoch 112: 0.04063963858104909
EPOCH: 113
Loss at step 0: 0.03795037046074867
Loss at step 50: 0.0358436293900013
Loss at step 100: 0.04843265190720558
Loss at step 150: 0.05632951855659485
Loss at step 200: 0.03601030632853508
Loss at step 250: 0.03268331661820412
Loss at step 300: 0.038877252489328384
Loss at step 350: 0.045339521020650864
Loss at step 400: 0.032848652452230453
Loss at step 450: 0.04735710099339485
Loss at step 500: 0.04660831019282341
Loss at step 550: 0.05198223516345024
Loss at step 600: 0.04441724345088005
Loss at step 650: 0.05480624735355377
Loss at step 700: 0.06743644922971725
Loss at step 750: 0.03541141003370285
Loss at step 800: 0.03227515518665314
Loss at step 850: 0.03646775335073471
Loss at step 900: 0.03438189998269081
Mean training loss after epoch 113: 0.04005539193471421
EPOCH: 114
Loss at step 0: 0.036579687148332596
Loss at step 50: 0.03507820516824722
Loss at step 100: 0.029598532244563103
Loss at step 150: 0.03884422406554222
Loss at step 200: 0.05779637023806572
Loss at step 250: 0.04017007723450661
Loss at step 300: 0.04302921146154404
Loss at step 350: 0.033810097724199295
Loss at step 400: 0.03218913450837135
Loss at step 450: 0.03484483063220978
Loss at step 500: 0.05256728455424309
Loss at step 550: 0.032125137746334076
Loss at step 600: 0.034768857061862946
Loss at step 650: 0.036406390368938446
Loss at step 700: 0.05086382478475571
Loss at step 750: 0.03705102577805519
Loss at step 800: 0.03796803578734398
Loss at step 850: 0.037255316972732544
Loss at step 900: 0.03930399939417839
Mean training loss after epoch 114: 0.03997340493960612
EPOCH: 115
Loss at step 0: 0.03761971741914749
Loss at step 50: 0.02730981633067131
Loss at step 100: 0.029183940961956978
Loss at step 150: 0.029592841863632202
Loss at step 200: 0.06739065796136856
Loss at step 250: 0.038498494774103165
Loss at step 300: 0.0405089445412159
Loss at step 350: 0.036972276866436005
Loss at step 400: 0.034451935440301895
Loss at step 450: 0.045539602637290955
Loss at step 500: 0.03423967957496643
Loss at step 550: 0.03492945432662964
Loss at step 600: 0.04334418848156929
Loss at step 650: 0.03467882424592972
Loss at step 700: 0.03075747936964035
Loss at step 750: 0.038405630737543106
Loss at step 800: 0.05285104736685753
Loss at step 850: 0.03415987268090248
Loss at step 900: 0.03387216106057167
Mean training loss after epoch 115: 0.04025659561435233
EPOCH: 116
Loss at step 0: 0.036216430366039276
Loss at step 50: 0.05867926776409149
Loss at step 100: 0.04880305752158165
Loss at step 150: 0.024054808542132378
Loss at step 200: 0.03405758738517761
Loss at step 250: 0.0369703583419323
Loss at step 300: 0.043173935264348984
Loss at step 350: 0.03779338672757149
Loss at step 400: 0.0379754900932312
Loss at step 450: 0.04397716745734215
Loss at step 500: 0.02539883553981781
Loss at step 550: 0.03261855989694595
Loss at step 600: 0.036792535334825516
Loss at step 650: 0.03261512145400047
Loss at step 700: 0.048828668892383575
Loss at step 750: 0.03268284723162651
Loss at step 800: 0.05022961646318436
Loss at step 850: 0.04113951325416565
Loss at step 900: 0.03498257324099541
Mean training loss after epoch 116: 0.04056224513894269
EPOCH: 117
Loss at step 0: 0.0348486453294754
Loss at step 50: 0.04878906533122063
Loss at step 100: 0.044236086308956146
Loss at step 150: 0.03162962570786476
Loss at step 200: 0.03578827157616615
Loss at step 250: 0.030955882743000984
Loss at step 300: 0.03401421010494232
Loss at step 350: 0.03497142344713211
Loss at step 400: 0.035778697580099106
Loss at step 450: 0.03437361121177673
Loss at step 500: 0.04439618065953255
Loss at step 550: 0.03746693581342697
Loss at step 600: 0.034327432513237
Loss at step 650: 0.033807624131441116
Loss at step 700: 0.03443426638841629
Loss at step 750: 0.031515758484601974
Loss at step 800: 0.03196234628558159
Loss at step 850: 0.037271417677402496
Loss at step 900: 0.030908746644854546
Mean training loss after epoch 117: 0.03997806832194328
EPOCH: 118
Loss at step 0: 0.06313231587409973
Loss at step 50: 0.06051144748926163
Loss at step 100: 0.04856443777680397
Loss at step 150: 0.03401831164956093
Loss at step 200: 0.03721427172422409
Loss at step 250: 0.03343229740858078
Loss at step 300: 0.03522949293255806
Loss at step 350: 0.04258454963564873
Loss at step 400: 0.06014687940478325
Loss at step 450: 0.03544767573475838
Loss at step 500: 0.039289381355047226
Loss at step 550: 0.0608576163649559
Loss at step 600: 0.03271527960896492
Loss at step 650: 0.04725121706724167
Loss at step 700: 0.04453769326210022
Loss at step 750: 0.02915090322494507
Loss at step 800: 0.05556664988398552
Loss at step 850: 0.02730882354080677
Loss at step 900: 0.030028440058231354
Mean training loss after epoch 118: 0.04058234326279303
EPOCH: 119
Loss at step 0: 0.037382058799266815
Loss at step 50: 0.05149533599615097
Loss at step 100: 0.03135047107934952
Loss at step 150: 0.030940087512135506
Loss at step 200: 0.03823164850473404
Loss at step 250: 0.03651123121380806
Loss at step 300: 0.03233299031853676
Loss at step 350: 0.036667048931121826
Loss at step 400: 0.03249666467308998
Loss at step 450: 0.03559266775846481
Loss at step 500: 0.0458882637321949
Loss at step 550: 0.03821825981140137
Loss at step 600: 0.033053841441869736
Loss at step 650: 0.03599870204925537
Loss at step 700: 0.029937120154500008
Loss at step 750: 0.03885503485798836
Loss at step 800: 0.03379277139902115
Loss at step 850: 0.04492877423763275
Loss at step 900: 0.04187827929854393
Mean training loss after epoch 119: 0.04040607897195417
EPOCH: 120
Loss at step 0: 0.028167858719825745
Loss at step 50: 0.04726026579737663
Loss at step 100: 0.0361696258187294
Loss at step 150: 0.030510244891047478
Loss at step 200: 0.0330553762614727
Loss at step 250: 0.03535635769367218
Loss at step 300: 0.05735327675938606
Loss at step 350: 0.03452423959970474
Loss at step 400: 0.04450173303484917
Loss at step 450: 0.03576704487204552
Loss at step 500: 0.046488843858242035
Loss at step 550: 0.04781826585531235
Loss at step 600: 0.04467238113284111
Loss at step 650: 0.04754528030753136
Loss at step 700: 0.043728072196245193
Loss at step 750: 0.04102247580885887
Loss at step 800: 0.054334353655576706
Loss at step 850: 0.04936356842517853
Loss at step 900: 0.044593654572963715
Mean training loss after epoch 120: 0.03998779853198256
EPOCH: 121
Loss at step 0: 0.05285077169537544
Loss at step 50: 0.028145067393779755
Loss at step 100: 0.029211612418293953
Loss at step 150: 0.04488343000411987
Loss at step 200: 0.029937224462628365
Loss at step 250: 0.05586611479520798
Loss at step 300: 0.047047797590494156
Loss at step 350: 0.052246637642383575
Loss at step 400: 0.05250212177634239
Loss at step 450: 0.046561263501644135
Loss at step 500: 0.029796786606311798
Loss at step 550: 0.03729530796408653
Loss at step 600: 0.035116907209157944
Loss at step 650: 0.038523416966199875
Loss at step 700: 0.055753014981746674
Loss at step 750: 0.030627282336354256
Loss at step 800: 0.036290332674980164
Loss at step 850: 0.03514028340578079
Loss at step 900: 0.0453634150326252
Mean training loss after epoch 121: 0.040457542389948996
EPOCH: 122
Loss at step 0: 0.031591176986694336
Loss at step 50: 0.03447073698043823
Loss at step 100: 0.041707564145326614
Loss at step 150: 0.06623993813991547
Loss at step 200: 0.03063139319419861
Loss at step 250: 0.033697038888931274
Loss at step 300: 0.05644533410668373
Loss at step 350: 0.06907740980386734
Loss at step 400: 0.036720581352710724
Loss at step 450: 0.0389428436756134
Loss at step 500: 0.02795400097966194
Loss at step 550: 0.03139829635620117
Loss at step 600: 0.03670325130224228
Loss at step 650: 0.04355520382523537
Loss at step 700: 0.03025864250957966
Loss at step 750: 0.0512089803814888
Loss at step 800: 0.0327790305018425
Loss at step 850: 0.03410333767533302
Loss at step 900: 0.034768156707286835
Mean training loss after epoch 122: 0.04006110403790022
EPOCH: 123
Loss at step 0: 0.037590596824884415
Loss at step 50: 0.05157771334052086
Loss at step 100: 0.05391733720898628
Loss at step 150: 0.05016305297613144
Loss at step 200: 0.04134446755051613
Loss at step 250: 0.037453681230545044
Loss at step 300: 0.05308147147297859
Loss at step 350: 0.028448354452848434
Loss at step 400: 0.03311985731124878
Loss at step 450: 0.047355495393276215
Loss at step 500: 0.03463435545563698
Loss at step 550: 0.034671586006879807
Loss at step 600: 0.03247098624706268
Loss at step 650: 0.048032741993665695
Loss at step 700: 0.03710617870092392
Loss at step 750: 0.031204132363200188
Loss at step 800: 0.038457345217466354
Loss at step 850: 0.043605733662843704
Loss at step 900: 0.03489955887198448
Mean training loss after epoch 123: 0.04032010728044551
EPOCH: 124
Loss at step 0: 0.033983953297138214
Loss at step 50: 0.04215710237622261
Loss at step 100: 0.033741917461156845
Loss at step 150: 0.037536218762397766
Loss at step 200: 0.035390716046094894
Loss at step 250: 0.033167146146297455
Loss at step 300: 0.03190934285521507
Loss at step 350: 0.03862353414297104
Loss at step 400: 0.03355590999126434
Loss at step 450: 0.03378209099173546
Loss at step 500: 0.03174929693341255
Loss at step 550: 0.031022535637021065
Loss at step 600: 0.04976525530219078
Loss at step 650: 0.04322446137666702
Loss at step 700: 0.03885587304830551
Loss at step 750: 0.037186697125434875
Loss at step 800: 0.04601681977510452
Loss at step 850: 0.03291945904493332
Loss at step 900: 0.042094048112630844
Mean training loss after epoch 124: 0.040002900197593644
EPOCH: 125
Loss at step 0: 0.0500323623418808
Loss at step 50: 0.05065305903553963
Loss at step 100: 0.033764030784368515
Loss at step 150: 0.050167616456747055
Loss at step 200: 0.030769914388656616
Loss at step 250: 0.04813048616051674
Loss at step 300: 0.03515201061964035
Loss at step 350: 0.03375561162829399
Loss at step 400: 0.050982020795345306
Loss at step 450: 0.035507962107658386
Loss at step 500: 0.03745051100850105
Loss at step 550: 0.03788281977176666
Loss at step 600: 0.03262507915496826
Loss at step 650: 0.032084278762340546
Loss at step 700: 0.040814898908138275
Loss at step 750: 0.03551555424928665
Loss at step 800: 0.03950602188706398
Loss at step 850: 0.03731091693043709
Loss at step 900: 0.034966662526130676
Mean training loss after epoch 125: 0.040024525072894244
EPOCH: 126
Loss at step 0: 0.034171294420957565
Loss at step 50: 0.033561307936906815
Loss at step 100: 0.0351739265024662
Loss at step 150: 0.0548534132540226
Loss at step 200: 0.03751838952302933
Loss at step 250: 0.045688968151807785
Loss at step 300: 0.04876801744103432
Loss at step 350: 0.052135396748781204
Loss at step 400: 0.034007418900728226
Loss at step 450: 0.03813653811812401
Loss at step 500: 0.06047852709889412
Loss at step 550: 0.06819915026426315
Loss at step 600: 0.03307413309812546
Loss at step 650: 0.035687174648046494
Loss at step 700: 0.03782489895820618
Loss at step 750: 0.03208120912313461
Loss at step 800: 0.062991663813591
Loss at step 850: 0.04625096544623375
Loss at step 900: 0.0389222614467144
Mean training loss after epoch 126: 0.03970333728303851
EPOCH: 127
Loss at step 0: 0.03681530803442001
Loss at step 50: 0.04381631314754486
Loss at step 100: 0.05013523995876312
Loss at step 150: 0.06218918785452843
Loss at step 200: 0.03095322847366333
Loss at step 250: 0.03979748860001564
Loss at step 300: 0.06336919218301773
Loss at step 350: 0.03274674341082573
Loss at step 400: 0.04084060713648796
Loss at step 450: 0.03177779167890549
Loss at step 500: 0.03313849866390228
Loss at step 550: 0.038118936121463776
Loss at step 600: 0.03555034101009369
Loss at step 650: 0.0577574260532856
Loss at step 700: 0.03207573667168617
Loss at step 750: 0.020524565130472183
Loss at step 800: 0.03095831535756588
Loss at step 850: 0.026542063802480698
Loss at step 900: 0.051946502178907394
Mean training loss after epoch 127: 0.039454418913657856
EPOCH: 128
Loss at step 0: 0.03673482686281204
Loss at step 50: 0.03247862681746483
Loss at step 100: 0.031764429062604904
Loss at step 150: 0.02867070399224758
Loss at step 200: 0.039199747145175934
Loss at step 250: 0.05398840084671974
Loss at step 300: 0.06494774669408798
Loss at step 350: 0.05154864490032196
Loss at step 400: 0.0476166307926178
Loss at step 450: 0.06569583714008331
Loss at step 500: 0.03495299443602562
Loss at step 550: 0.03356633707880974
Loss at step 600: 0.046172380447387695
Loss at step 650: 0.047938257455825806
Loss at step 700: 0.036930497735738754
Loss at step 750: 0.038262851536273956
Loss at step 800: 0.03517230600118637
Loss at step 850: 0.034791771322488785
Loss at step 900: 0.04119005799293518
Mean training loss after epoch 128: 0.04036835033788101
EPOCH: 129
Loss at step 0: 0.03706024959683418
Loss at step 50: 0.03425043448805809
Loss at step 100: 0.02833772636950016
Loss at step 150: 0.03949480876326561
Loss at step 200: 0.03533528372645378
Loss at step 250: 0.03487789258360863
Loss at step 300: 0.048765428364276886
Loss at step 350: 0.044877734035253525
Loss at step 400: 0.037002693861722946
Loss at step 450: 0.04018561169505119
Loss at step 500: 0.03295544534921646
Loss at step 550: 0.049722786992788315
Loss at step 600: 0.055026598274707794
Loss at step 650: 0.03380793333053589
Loss at step 700: 0.05094536021351814
Loss at step 750: 0.03653072938323021
Loss at step 800: 0.03610605001449585
Loss at step 850: 0.040753696113824844
Loss at step 900: 0.03856388479471207
Mean training loss after epoch 129: 0.039871999711942066
EPOCH: 130
Loss at step 0: 0.048286810517311096
Loss at step 50: 0.047162044793367386
Loss at step 100: 0.038353268057107925
Loss at step 150: 0.0325942263007164
Loss at step 200: 0.03514799103140831
Loss at step 250: 0.03141734004020691
Loss at step 300: 0.04474622756242752
Loss at step 350: 0.03544167801737785
Loss at step 400: 0.038670141249895096
Loss at step 450: 0.046402618288993835
Loss at step 500: 0.03563407063484192
Loss at step 550: 0.04327638819813728
Loss at step 600: 0.040320947766304016
Loss at step 650: 0.040106456726789474
Loss at step 700: 0.03996186703443527
Loss at step 750: 0.037617284804582596
Loss at step 800: 0.039552800357341766
Loss at step 850: 0.06186339259147644
Loss at step 900: 0.03492647036910057
Mean training loss after epoch 130: 0.03991808345926596
EPOCH: 131
Loss at step 0: 0.036616213619709015
Loss at step 50: 0.04928402975201607
Loss at step 100: 0.04600382223725319
Loss at step 150: 0.05377383902668953
Loss at step 200: 0.04120911657810211
Loss at step 250: 0.0662044882774353
Loss at step 300: 0.04024979844689369
Loss at step 350: 0.030393140390515327
Loss at step 400: 0.042372897267341614
Loss at step 450: 0.037753261625766754
Loss at step 500: 0.031086349859833717
Loss at step 550: 0.03298509865999222
Loss at step 600: 0.03297559171915054
Loss at step 650: 0.030049888417124748
Loss at step 700: 0.035726603120565414
Loss at step 750: 0.04066278785467148
Loss at step 800: 0.03545417636632919
Loss at step 850: 0.03513539209961891
Loss at step 900: 0.0556241013109684
Mean training loss after epoch 131: 0.0402852121804124
EPOCH: 132
Loss at step 0: 0.039677299559116364
Loss at step 50: 0.04367973282933235
Loss at step 100: 0.05246155336499214
Loss at step 150: 0.034604262560606
Loss at step 200: 0.049703508615493774
Loss at step 250: 0.03727739304304123
Loss at step 300: 0.049680761992931366
Loss at step 350: 0.0293569453060627
Loss at step 400: 0.0443883016705513
Loss at step 450: 0.0507032610476017
Loss at step 500: 0.03775497153401375
Loss at step 550: 0.0356106273829937
Loss at step 600: 0.06872064620256424
Loss at step 650: 0.03747250884771347
Loss at step 700: 0.03364582359790802
Loss at step 750: 0.05006100982427597
Loss at step 800: 0.049780216068029404
Loss at step 850: 0.035216640681028366
Loss at step 900: 0.03414424508810043
Mean training loss after epoch 132: 0.03970895380750775
EPOCH: 133
Loss at step 0: 0.036541033536195755
Loss at step 50: 0.03570520505309105
Loss at step 100: 0.03586961328983307
Loss at step 150: 0.033113136887550354
Loss at step 200: 0.031563837081193924
Loss at step 250: 0.03401624783873558
Loss at step 300: 0.0378023199737072
Loss at step 350: 0.03832368552684784
Loss at step 400: 0.02344367839396
Loss at step 450: 0.04743019491434097
Loss at step 500: 0.038056880235672
Loss at step 550: 0.04119366407394409
Loss at step 600: 0.03679410368204117
Loss at step 650: 0.04440126568078995
Loss at step 700: 0.0363314226269722
Loss at step 750: 0.03746131435036659
Loss at step 800: 0.05385933071374893
Loss at step 850: 0.031379908323287964
Loss at step 900: 0.042474813759326935
Mean training loss after epoch 133: 0.039679838541045245
EPOCH: 134
Loss at step 0: 0.027622492983937263
Loss at step 50: 0.04098658636212349
Loss at step 100: 0.03279804810881615
Loss at step 150: 0.04508473724126816
Loss at step 200: 0.0342780165374279
Loss at step 250: 0.030243201181292534
Loss at step 300: 0.028790751472115517
Loss at step 350: 0.03409457579255104
Loss at step 400: 0.03327581286430359
Loss at step 450: 0.037851233035326004
Loss at step 500: 0.031168678775429726
Loss at step 550: 0.0319376140832901
Loss at step 600: 0.04381627216935158
Loss at step 650: 0.04569874703884125
Loss at step 700: 0.033138565719127655
Loss at step 750: 0.03507937490940094
Loss at step 800: 0.04184393957257271
Loss at step 850: 0.03176693618297577
Loss at step 900: 0.030614210292696953
Mean training loss after epoch 134: 0.03981186858594799
EPOCH: 135
Loss at step 0: 0.03553043678402901
Loss at step 50: 0.03979399800300598
Loss at step 100: 0.043166384100914
Loss at step 150: 0.04016077518463135
Loss at step 200: 0.04015089198946953
Loss at step 250: 0.038076434284448624
Loss at step 300: 0.032090965658426285
Loss at step 350: 0.05516954883933067
Loss at step 400: 0.0338265635073185
Loss at step 450: 0.05055184289813042
Loss at step 500: 0.0306999534368515
Loss at step 550: 0.046033430844545364
Loss at step 600: 0.03614010661840439
Loss at step 650: 0.03513707220554352
Loss at step 700: 0.055435363203287125
Loss at step 750: 0.033791683614254
Loss at step 800: 0.040268998593091965
Loss at step 850: 0.046138305217027664
Loss at step 900: 0.036269623786211014
Mean training loss after epoch 135: 0.040519412690356596
EPOCH: 136
Loss at step 0: 0.031145719811320305
Loss at step 50: 0.034706104546785355
Loss at step 100: 0.04843144491314888
Loss at step 150: 0.06583622097969055
Loss at step 200: 0.05224967747926712
Loss at step 250: 0.04415404796600342
Loss at step 300: 0.03567051142454147
Loss at step 350: 0.027567612007260323
Loss at step 400: 0.05148737505078316
Loss at step 450: 0.04981483146548271
Loss at step 500: 0.04140967130661011
Loss at step 550: 0.028734488412737846
Loss at step 600: 0.03323344141244888
Loss at step 650: 0.05476273223757744
Loss at step 700: 0.04226115345954895
Loss at step 750: 0.03669494390487671
Loss at step 800: 0.03222007676959038
Loss at step 850: 0.025839785113930702
Loss at step 900: 0.03102259710431099
Mean training loss after epoch 136: 0.040111369352096686
EPOCH: 137
Loss at step 0: 0.035307832062244415
Loss at step 50: 0.035859256982803345
Loss at step 100: 0.04564115032553673
Loss at step 150: 0.08037211000919342
Loss at step 200: 0.044677089899778366
Loss at step 250: 0.04060159623622894
Loss at step 300: 0.05918145924806595
Loss at step 350: 0.033060211688280106
Loss at step 400: 0.038067109882831573
Loss at step 450: 0.03555935248732567
Loss at step 500: 0.04896773397922516
Loss at step 550: 0.03716621175408363
Loss at step 600: 0.03253568708896637
Loss at step 650: 0.0386538952589035
Loss at step 700: 0.03709159418940544
Loss at step 750: 0.047217704355716705
Loss at step 800: 0.03762747347354889
Loss at step 850: 0.033402878791093826
Loss at step 900: 0.03984730318188667
Mean training loss after epoch 137: 0.0398748926123354
EPOCH: 138
Loss at step 0: 0.0351131334900856
Loss at step 50: 0.05153409764170647
Loss at step 100: 0.05654660984873772
Loss at step 150: 0.03171384707093239
Loss at step 200: 0.044744957238435745
Loss at step 250: 0.04206939414143562
Loss at step 300: 0.05348724126815796
Loss at step 350: 0.05097969248890877
Loss at step 400: 0.0321083664894104
Loss at step 450: 0.04224591329693794
Loss at step 500: 0.03608119115233421
Loss at step 550: 0.054212745279073715
Loss at step 600: 0.03262381628155708
Loss at step 650: 0.048176102340221405
Loss at step 700: 0.03508109226822853
Loss at step 750: 0.03597771376371384
Loss at step 800: 0.04042951017618179
Loss at step 850: 0.03498770296573639
Loss at step 900: 0.03275318816304207
Mean training loss after epoch 138: 0.04044409448316674
EPOCH: 139
Loss at step 0: 0.05238460749387741
Loss at step 50: 0.029966481029987335
Loss at step 100: 0.031206198036670685
Loss at step 150: 0.03607112914323807
Loss at step 200: 0.0320424921810627
Loss at step 250: 0.03686009347438812
Loss at step 300: 0.04129014536738396
Loss at step 350: 0.05626775696873665
Loss at step 400: 0.03750087693333626
Loss at step 450: 0.03866837918758392
Loss at step 500: 0.04019865393638611
Loss at step 550: 0.030603976920247078
Loss at step 600: 0.035492926836013794
Loss at step 650: 0.031438861042261124
Loss at step 700: 0.044327959418296814
Loss at step 750: 0.029902664944529533
Loss at step 800: 0.03317349776625633
Loss at step 850: 0.040507715195417404
Loss at step 900: 0.03306630626320839
Mean training loss after epoch 139: 0.03977981074524523
EPOCH: 140
Loss at step 0: 0.06723352521657944
Loss at step 50: 0.036315590143203735
Loss at step 100: 0.048940982669591904
Loss at step 150: 0.030496828258037567
Loss at step 200: 0.03590953350067139
Loss at step 250: 0.04100378230214119
Loss at step 300: 0.035402316600084305
Loss at step 350: 0.05464482679963112
Loss at step 400: 0.05612416937947273
Loss at step 450: 0.04811203107237816
Loss at step 500: 0.03293948248028755
Loss at step 550: 0.029463013634085655
Loss at step 600: 0.0330667719244957
Loss at step 650: 0.07245888561010361
Loss at step 700: 0.03625422343611717
Loss at step 750: 0.029948584735393524
Loss at step 800: 0.03603658825159073
Loss at step 850: 0.038006313145160675
Loss at step 900: 0.04900289326906204
Mean training loss after epoch 140: 0.03966290574433453
EPOCH: 141
Loss at step 0: 0.05389322340488434
Loss at step 50: 0.03978116065263748
Loss at step 100: 0.028913181275129318
Loss at step 150: 0.03143258020281792
Loss at step 200: 0.05115267261862755
Loss at step 250: 0.04871513321995735
Loss at step 300: 0.04140704870223999
Loss at step 350: 0.029615212231874466
Loss at step 400: 0.04913806915283203
Loss at step 450: 0.05245078727602959
Loss at step 500: 0.03655010834336281
Loss at step 550: 0.03750515356659889
Loss at step 600: 0.03648988902568817
Loss at step 650: 0.04609394073486328
Loss at step 700: 0.04012591764330864
Loss at step 750: 0.031112534925341606
Loss at step 800: 0.04239273443818092
Loss at step 850: 0.0286885853856802
Loss at step 900: 0.04067554697394371
Mean training loss after epoch 141: 0.04047492779950216
EPOCH: 142
Loss at step 0: 0.0341968834400177
Loss at step 50: 0.027987387031316757
Loss at step 100: 0.04370037466287613
Loss at step 150: 0.03250158205628395
Loss at step 200: 0.03320758789777756
Loss at step 250: 0.04050271958112717
Loss at step 300: 0.05204188823699951
Loss at step 350: 0.03396312892436981
Loss at step 400: 0.03425634652376175
Loss at step 450: 0.050483450293540955
Loss at step 500: 0.06107752397656441
Loss at step 550: 0.03602050989866257
Loss at step 600: 0.030771542340517044
Loss at step 650: 0.03399882838129997
Loss at step 700: 0.04370621219277382
Loss at step 750: 0.06062234938144684
Loss at step 800: 0.030899103730916977
Loss at step 850: 0.035059645771980286
Loss at step 900: 0.05461152642965317
Mean training loss after epoch 142: 0.03912595202769044
EPOCH: 143
Loss at step 0: 0.03069254197180271
Loss at step 50: 0.028873268514871597
Loss at step 100: 0.028288697823882103
Loss at step 150: 0.044573914259672165
Loss at step 200: 0.035409681499004364
Loss at step 250: 0.045168306678533554
Loss at step 300: 0.03995560109615326
Loss at step 350: 0.0302441269159317
Loss at step 400: 0.03496329486370087
Loss at step 450: 0.03343826159834862
Loss at step 500: 0.034524980932474136
Loss at step 550: 0.03748397156596184
Loss at step 600: 0.07241212576627731
Loss at step 650: 0.040051836520433426
Loss at step 700: 0.06440601497888565
Loss at step 750: 0.032848332077264786
Loss at step 800: 0.043311264365911484
Loss at step 850: 0.031081559136509895
Loss at step 900: 0.038504716008901596
Mean training loss after epoch 143: 0.039704615202571535
EPOCH: 144
Loss at step 0: 0.034515753388404846
Loss at step 50: 0.038643963634967804
Loss at step 100: 0.04919477179646492
Loss at step 150: 0.06632418185472488
Loss at step 200: 0.03776159882545471
Loss at step 250: 0.03524239733815193
Loss at step 300: 0.03229808807373047
Loss at step 350: 0.03772842139005661
Loss at step 400: 0.07176003605127335
Loss at step 450: 0.0438353456556797
Loss at step 500: 0.03729798272252083
Loss at step 550: 0.034596312791109085
Loss at step 600: 0.05316857621073723
Loss at step 650: 0.04838914796710014
Loss at step 700: 0.03394513204693794
Loss at step 750: 0.05002584308385849
Loss at step 800: 0.04865674301981926
Loss at step 850: 0.030374780297279358
Loss at step 900: 0.03489556908607483
Mean training loss after epoch 144: 0.040014465794221424
EPOCH: 145
Loss at step 0: 0.05304267257452011
Loss at step 50: 0.03572936728596687
Loss at step 100: 0.03844626620411873
Loss at step 150: 0.051656000316143036
Loss at step 200: 0.0419786162674427
Loss at step 250: 0.03228844702243805
Loss at step 300: 0.03717874363064766
Loss at step 350: 0.03276830166578293
Loss at step 400: 0.05304082855582237
Loss at step 450: 0.0370134636759758
Loss at step 500: 0.03658261150121689
Loss at step 550: 0.05247374251484871
Loss at step 600: 0.05243967846035957
Loss at step 650: 0.035890765488147736
Loss at step 700: 0.029134448617696762
Loss at step 750: 0.04950794205069542
Loss at step 800: 0.031324274837970734
Loss at step 850: 0.04002757743000984
Loss at step 900: 0.05876762419939041
Mean training loss after epoch 145: 0.03966649990282588
EPOCH: 146
Loss at step 0: 0.03870001062750816
Loss at step 50: 0.0340142585337162
Loss at step 100: 0.032529689371585846
Loss at step 150: 0.04962790012359619
Loss at step 200: 0.06994400918483734
Loss at step 250: 0.04276866093277931
Loss at step 300: 0.054688457399606705
Loss at step 350: 0.051464974880218506
Loss at step 400: 0.0338805727660656
Loss at step 450: 0.04141691327095032
Loss at step 500: 0.04455658793449402
Loss at step 550: 0.04114391282200813
Loss at step 600: 0.03700781613588333
Loss at step 650: 0.03334641084074974
Loss at step 700: 0.037004128098487854
Loss at step 750: 0.045164212584495544
Loss at step 800: 0.04131518676877022
Loss at step 850: 0.04450450465083122
Loss at step 900: 0.05313121899962425
Mean training loss after epoch 146: 0.040262286505624176
EPOCH: 147
Loss at step 0: 0.03320532664656639
Loss at step 50: 0.0375794917345047
Loss at step 100: 0.03766798973083496
Loss at step 150: 0.051330532878637314
Loss at step 200: 0.03195059671998024
Loss at step 250: 0.04082850366830826
Loss at step 300: 0.0481402613222599
Loss at step 350: 0.030774451792240143
Loss at step 400: 0.03871845453977585
Loss at step 450: 0.05088238790631294
Loss at step 500: 0.03520108014345169
Loss at step 550: 0.04160549119114876
Loss at step 600: 0.038120336830616
Loss at step 650: 0.040947865694761276
Loss at step 700: 0.06268515437841415
Loss at step 750: 0.02965262345969677
Loss at step 800: 0.0288058090955019
Loss at step 850: 0.03575523570179939
Loss at step 900: 0.041679564863443375
Mean training loss after epoch 147: 0.0400226015303689
EPOCH: 148
Loss at step 0: 0.03675468638539314
Loss at step 50: 0.03356386721134186
Loss at step 100: 0.05130539834499359
Loss at step 150: 0.058995168656110764
Loss at step 200: 0.06680206954479218
Loss at step 250: 0.029005855321884155
Loss at step 300: 0.037632159888744354
Loss at step 350: 0.032792042940855026
Loss at step 400: 0.03614620864391327
Loss at step 450: 0.03251120075583458
Loss at step 500: 0.04506837949156761
Loss at step 550: 0.034124311059713364
Loss at step 600: 0.0380428172647953
Loss at step 650: 0.04019821807742119
Loss at step 700: 0.033369410783052444
Loss at step 750: 0.05180910602211952
Loss at step 800: 0.03275255858898163
Loss at step 850: 0.035289227962493896
Loss at step 900: 0.03412739187479019
Mean training loss after epoch 148: 0.03986330867123439
EPOCH: 149
Loss at step 0: 0.03967485949397087
Loss at step 50: 0.0405656173825264
Loss at step 100: 0.049536626785993576
Loss at step 150: 0.0467953085899353
Loss at step 200: 0.038718242198228836
Loss at step 250: 0.03391097113490105
Loss at step 300: 0.027988167479634285
Loss at step 350: 0.030590718612074852
Loss at step 400: 0.037134964019060135
Loss at step 450: 0.04939684644341469
Loss at step 500: 0.028641855344176292
Loss at step 550: 0.049792949110269547
Loss at step 600: 0.06914780288934708
Loss at step 650: 0.028473980724811554
Loss at step 700: 0.04541952535510063
Loss at step 750: 0.056351106613874435
Loss at step 800: 0.03810013830661774
Loss at step 850: 0.031036950647830963
Loss at step 900: 0.02989376336336136
Mean training loss after epoch 149: 0.03950454026405046
EPOCH: 150
Loss at step 0: 0.0497492291033268
Loss at step 50: 0.04664904624223709
Loss at step 100: 0.030672777444124222
Loss at step 150: 0.037179891020059586
Loss at step 200: 0.037141673266887665
Loss at step 250: 0.033430758863687515
Loss at step 300: 0.030099069699645042
Loss at step 350: 0.04658585414290428
Loss at step 400: 0.03212271258234978
Loss at step 450: 0.037536367774009705
Loss at step 500: 0.03360248729586601
Loss at step 550: 0.029031341895461082
Loss at step 600: 0.04348360747098923
Loss at step 650: 0.032128266990184784
Loss at step 700: 0.04996559023857117
Loss at step 750: 0.03784072399139404
Loss at step 800: 0.034901782870292664
Loss at step 850: 0.036399587988853455
Loss at step 900: 0.032460201531648636
Mean training loss after epoch 150: 0.03932684691531508
EPOCH: 151
Loss at step 0: 0.038559745997190475
Loss at step 50: 0.034698486328125
Loss at step 100: 0.034396491944789886
Loss at step 150: 0.03460189327597618
Loss at step 200: 0.03432363644242287
Loss at step 250: 0.044413454830646515
Loss at step 300: 0.0633033886551857
Loss at step 350: 0.028063831850886345
Loss at step 400: 0.03233586624264717
Loss at step 450: 0.050064317882061005
Loss at step 500: 0.04057862237095833
Loss at step 550: 0.03515806421637535
Loss at step 600: 0.03948560357093811
Loss at step 650: 0.03225551173090935
Loss at step 700: 0.0390482172369957
Loss at step 750: 0.03579146787524223
Loss at step 800: 0.03390445560216904
Loss at step 850: 0.05053498223423958
Loss at step 900: 0.04960924759507179
Mean training loss after epoch 151: 0.0394894066931946
EPOCH: 152
Loss at step 0: 0.04069296270608902
Loss at step 50: 0.0362466461956501
Loss at step 100: 0.03903277963399887
Loss at step 150: 0.03854478523135185
Loss at step 200: 0.030529094859957695
Loss at step 250: 0.03584692254662514
Loss at step 300: 0.03763008117675781
Loss at step 350: 0.03907984122633934
Loss at step 400: 0.032714176923036575
Loss at step 450: 0.04580497741699219
Loss at step 500: 0.030848801136016846
Loss at step 550: 0.03909658268094063
Loss at step 600: 0.0393528938293457
Loss at step 650: 0.034250885248184204
Loss at step 700: 0.039407070726156235
Loss at step 750: 0.06522706151008606
Loss at step 800: 0.02698972448706627
Loss at step 850: 0.033097878098487854
Loss at step 900: 0.039493318647146225
Mean training loss after epoch 152: 0.039979385349081395
EPOCH: 153
Loss at step 0: 0.03830374777317047
Loss at step 50: 0.03738465532660484
Loss at step 100: 0.03408047929406166
Loss at step 150: 0.08162634074687958
Loss at step 200: 0.047909438610076904
Loss at step 250: 0.04590803012251854
Loss at step 300: 0.03924799710512161
Loss at step 350: 0.049378473311662674
Loss at step 400: 0.03161664679646492
Loss at step 450: 0.060903649777173996
Loss at step 500: 0.0318431593477726
Loss at step 550: 0.030885569751262665
Loss at step 600: 0.04783306270837784
Loss at step 650: 0.03933698311448097
Loss at step 700: 0.03513640910387039
Loss at step 750: 0.03802374005317688
Loss at step 800: 0.03579352796077728
Loss at step 850: 0.039611611515283585
Loss at step 900: 0.032021258026361465
Mean training loss after epoch 153: 0.03981151960806043
EPOCH: 154
Loss at step 0: 0.03201507031917572
Loss at step 50: 0.03376217558979988
Loss at step 100: 0.030808014795184135
Loss at step 150: 0.05287822335958481
Loss at step 200: 0.035732246935367584
Loss at step 250: 0.03311564028263092
Loss at step 300: 0.033874545246362686
Loss at step 350: 0.0348997600376606
Loss at step 400: 0.03685653209686279
Loss at step 450: 0.034120168536901474
Loss at step 500: 0.027468038722872734
Loss at step 550: 0.03726698458194733
Loss at step 600: 0.03796930983662605
Loss at step 650: 0.052408505231142044
Loss at step 700: 0.03583231568336487
Loss at step 750: 0.038083259016275406
Loss at step 800: 0.03880752623081207
Loss at step 850: 0.03209945186972618
Loss at step 900: 0.04039257392287254
Mean training loss after epoch 154: 0.03945487745201537
EPOCH: 155
Loss at step 0: 0.03866823762655258
Loss at step 50: 0.031757935881614685
Loss at step 100: 0.03913172706961632
Loss at step 150: 0.04874129965901375
Loss at step 200: 0.05382993072271347
Loss at step 250: 0.041536904871463776
Loss at step 300: 0.030705511569976807
Loss at step 350: 0.03425120934844017
Loss at step 400: 0.03446534648537636
Loss at step 450: 0.04940277710556984
Loss at step 500: 0.03989547863602638
Loss at step 550: 0.03524135425686836
Loss at step 600: 0.045029882341623306
Loss at step 650: 0.0432940311729908
Loss at step 700: 0.039333123713731766
Loss at step 750: 0.04918549954891205
Loss at step 800: 0.052956581115722656
Loss at step 850: 0.03641032800078392
Loss at step 900: 0.03382905200123787
Mean training loss after epoch 155: 0.039653815565380585
EPOCH: 156
Loss at step 0: 0.052970923483371735
Loss at step 50: 0.05297328904271126
Loss at step 100: 0.04523034766316414
Loss at step 150: 0.052893735468387604
Loss at step 200: 0.06359665840864182
Loss at step 250: 0.03374951332807541
Loss at step 300: 0.033798668533563614
Loss at step 350: 0.03770531713962555
Loss at step 400: 0.06374736875295639
Loss at step 450: 0.04122309386730194
Loss at step 500: 0.03582649677991867
Loss at step 550: 0.03951054438948631
Loss at step 600: 0.03408455848693848
Loss at step 650: 0.03591909632086754
Loss at step 700: 0.028890229761600494
Loss at step 750: 0.050827328115701675
Loss at step 800: 0.03168226033449173
Loss at step 850: 0.06453513354063034
Loss at step 900: 0.05081882327795029
Mean training loss after epoch 156: 0.03980678563981232
EPOCH: 157
Loss at step 0: 0.03517033904790878
Loss at step 50: 0.03029661625623703
Loss at step 100: 0.04061397910118103
Loss at step 150: 0.04112451896071434
Loss at step 200: 0.05511137843132019
Loss at step 250: 0.034754179418087006
Loss at step 300: 0.03483182564377785
Loss at step 350: 0.02881348878145218
Loss at step 400: 0.03589953854680061
Loss at step 450: 0.040302760899066925
Loss at step 500: 0.02812015265226364
Loss at step 550: 0.03429786115884781
Loss at step 600: 0.042274147272109985
Loss at step 650: 0.03722405433654785
Loss at step 700: 0.06251759082078934
Loss at step 750: 0.03280168026685715
Loss at step 800: 0.033976029604673386
Loss at step 850: 0.03572479635477066
Loss at step 900: 0.03401172161102295
Mean training loss after epoch 157: 0.03961787642494066
EPOCH: 158
Loss at step 0: 0.038926295936107635
Loss at step 50: 0.06311167031526566
Loss at step 100: 0.039787448942661285
Loss at step 150: 0.03861629217863083
Loss at step 200: 0.04008471593260765
Loss at step 250: 0.03857661783695221
Loss at step 300: 0.03470795974135399
Loss at step 350: 0.05390108749270439
Loss at step 400: 0.047356195747852325
Loss at step 450: 0.04332069307565689
Loss at step 500: 0.03188915550708771
Loss at step 550: 0.06753725558519363
Loss at step 600: 0.05447079613804817
Loss at step 650: 0.039038512855768204
Loss at step 700: 0.034410372376441956
Loss at step 750: 0.0353931188583374
Loss at step 800: 0.040720392018556595
Loss at step 850: 0.030527109280228615
Loss at step 900: 0.03377766162157059
Mean training loss after epoch 158: 0.03973492824121007
EPOCH: 159
Loss at step 0: 0.04395056143403053
Loss at step 50: 0.03635787218809128
Loss at step 100: 0.052479084581136703
Loss at step 150: 0.0331464521586895
Loss at step 200: 0.03089253231883049
Loss at step 250: 0.038187939673662186
Loss at step 300: 0.030138317495584488
Loss at step 350: 0.03846345096826553
Loss at step 400: 0.049408961087465286
Loss at step 450: 0.05199102312326431
Loss at step 500: 0.03939513862133026
Loss at step 550: 0.025730779394507408
Loss at step 600: 0.040308210998773575
Loss at step 650: 0.04811836779117584
Loss at step 700: 0.03713226318359375
Loss at step 750: 0.03334973752498627
Loss at step 800: 0.054803516715765
Loss at step 850: 0.038321588188409805
Loss at step 900: 0.04296765848994255
Mean training loss after epoch 159: 0.039721751105048254
EPOCH: 160
Loss at step 0: 0.042155567556619644
Loss at step 50: 0.03570934012532234
Loss at step 100: 0.03494516387581825
Loss at step 150: 0.03171156346797943
Loss at step 200: 0.03144378587603569
Loss at step 250: 0.04876674711704254
Loss at step 300: 0.04549702629446983
Loss at step 350: 0.036487117409706116
Loss at step 400: 0.04203188419342041
Loss at step 450: 0.038777709007263184
Loss at step 500: 0.05074477568268776
Loss at step 550: 0.037049807608127594
Loss at step 600: 0.03319176658987999
Loss at step 650: 0.04365323483943939
Loss at step 700: 0.0491318553686142
Loss at step 750: 0.04497558996081352
Loss at step 800: 0.050616584718227386
Loss at step 850: 0.03367560729384422
Loss at step 900: 0.031395260244607925
Mean training loss after epoch 160: 0.03994718715071932
EPOCH: 161
Loss at step 0: 0.03787631168961525
Loss at step 50: 0.03632188215851784
Loss at step 100: 0.031822752207517624
Loss at step 150: 0.03723493218421936
Loss at step 200: 0.03751960024237633
Loss at step 250: 0.03722744435071945
Loss at step 300: 0.03279047831892967
Loss at step 350: 0.038861632347106934
Loss at step 400: 0.03202925994992256
Loss at step 450: 0.03544725105166435
Loss at step 500: 0.030634867027401924
Loss at step 550: 0.034504830837249756
Loss at step 600: 0.03699138015508652
Loss at step 650: 0.03438907489180565
Loss at step 700: 0.032897673547267914
Loss at step 750: 0.039568059146404266
Loss at step 800: 0.02716151438653469
Loss at step 850: 0.04132404923439026
Loss at step 900: 0.02859344705939293
Mean training loss after epoch 161: 0.03965161397838707
EPOCH: 162
Loss at step 0: 0.044058606028556824
Loss at step 50: 0.03533129766583443
Loss at step 100: 0.03633712977170944
Loss at step 150: 0.03775249794125557
Loss at step 200: 0.0342252217233181
Loss at step 250: 0.05188572406768799
Loss at step 300: 0.0261610709130764
Loss at step 350: 0.04466400295495987
Loss at step 400: 0.030652252957224846
Loss at step 450: 0.06206464022397995
Loss at step 500: 0.041849203407764435
Loss at step 550: 0.04211672022938728
Loss at step 600: 0.04404137656092644
Loss at step 650: 0.037888288497924805
Loss at step 700: 0.03815712034702301
Loss at step 750: 0.0420524999499321
Loss at step 800: 0.0497308187186718
Loss at step 850: 0.03643658757209778
Loss at step 900: 0.045854780822992325
Mean training loss after epoch 162: 0.03955368834065158
EPOCH: 163
Loss at step 0: 0.037637751549482346
Loss at step 50: 0.03754173591732979
Loss at step 100: 0.030969874933362007
Loss at step 150: 0.04777289181947708
Loss at step 200: 0.034243032336235046
Loss at step 250: 0.030788376927375793
Loss at step 300: 0.032340116798877716
Loss at step 350: 0.06834275275468826
Loss at step 400: 0.03906245157122612
Loss at step 450: 0.04159681126475334
Loss at step 500: 0.04897071421146393
Loss at step 550: 0.032994478940963745
Loss at step 600: 0.05192587152123451
Loss at step 650: 0.05311141908168793
Loss at step 700: 0.05736926198005676
Loss at step 750: 0.03613514453172684
Loss at step 800: 0.05063817650079727
Loss at step 850: 0.04703342542052269
Loss at step 900: 0.03148123249411583
Mean training loss after epoch 163: 0.03973168544153542
EPOCH: 164
Loss at step 0: 0.03522387892007828
Loss at step 50: 0.0354304239153862
Loss at step 100: 0.034576959908008575
Loss at step 150: 0.04489249363541603
Loss at step 200: 0.04102586582303047
Loss at step 250: 0.04082818701863289
Loss at step 300: 0.048556774854660034
Loss at step 350: 0.028994642198085785
Loss at step 400: 0.03966813161969185
Loss at step 450: 0.05221426114439964
Loss at step 500: 0.04641151428222656
Loss at step 550: 0.051005858927965164
Loss at step 600: 0.04715345799922943
Loss at step 650: 0.06296427547931671
Loss at step 700: 0.036891594529151917
Loss at step 750: 0.030493998900055885
Loss at step 800: 0.037916600704193115
Loss at step 850: 0.04149070009589195
Loss at step 900: 0.03550049662590027
Mean training loss after epoch 164: 0.0396854249514274
EPOCH: 165
Loss at step 0: 0.042220719158649445
Loss at step 50: 0.03374775871634483
Loss at step 100: 0.037937846034765244
Loss at step 150: 0.07947231829166412
Loss at step 200: 0.061438754200935364
Loss at step 250: 0.03606860339641571
Loss at step 300: 0.041866056621074677
Loss at step 350: 0.04028409719467163
Loss at step 400: 0.03585618734359741
Loss at step 450: 0.05177586153149605
Loss at step 500: 0.03241375833749771
Loss at step 550: 0.040101245045661926
Loss at step 600: 0.03462972491979599
Loss at step 650: 0.03558705002069473
Loss at step 700: 0.05475455895066261
Loss at step 750: 0.04315561428666115
Loss at step 800: 0.034266598522663116
Loss at step 850: 0.03024034947156906
Loss at step 900: 0.050545837730169296
Mean training loss after epoch 165: 0.03933941557217064
EPOCH: 166
Loss at step 0: 0.03422791138291359
Loss at step 50: 0.0342918299138546
Loss at step 100: 0.040826451033353806
Loss at step 150: 0.04773464798927307
Loss at step 200: 0.051334526389837265
Loss at step 250: 0.030052201822400093
Loss at step 300: 0.034113090485334396
Loss at step 350: 0.030852695927023888
Loss at step 400: 0.03177307918667793
Loss at step 450: 0.02808055840432644
Loss at step 500: 0.04130341112613678
Loss at step 550: 0.05406184867024422
Loss at step 600: 0.03329135850071907
Loss at step 650: 0.039528608322143555
Loss at step 700: 0.027340373024344444
Loss at step 750: 0.05258629098534584
Loss at step 800: 0.03488999977707863
Loss at step 850: 0.03175424784421921
Loss at step 900: 0.029860569164156914
Mean training loss after epoch 166: 0.0395889686686652
EPOCH: 167
Loss at step 0: 0.038849908858537674
Loss at step 50: 0.04868149012327194
Loss at step 100: 0.03719170764088631
Loss at step 150: 0.050611406564712524
Loss at step 200: 0.0537850558757782
Loss at step 250: 0.03979859873652458
Loss at step 300: 0.049695856869220734
Loss at step 350: 0.030528413131833076
Loss at step 400: 0.026854444295167923
Loss at step 450: 0.0315445214509964
Loss at step 500: 0.03543714061379433
Loss at step 550: 0.03773332014679909
Loss at step 600: 0.03539112210273743
Loss at step 650: 0.05348960682749748
Loss at step 700: 0.0349346399307251
Loss at step 750: 0.05097145959734917
Loss at step 800: 0.03509731963276863
Loss at step 850: 0.06194102019071579
Loss at step 900: 0.03483329713344574
Mean training loss after epoch 167: 0.03931984007914564
EPOCH: 168
Loss at step 0: 0.0442483015358448
Loss at step 50: 0.03344307094812393
Loss at step 100: 0.04714402183890343
Loss at step 150: 0.036182697862386703
Loss at step 200: 0.03450503572821617
Loss at step 250: 0.04949743673205376
Loss at step 300: 0.04699171707034111
Loss at step 350: 0.03356409817934036
Loss at step 400: 0.04914213716983795
Loss at step 450: 0.03713100776076317
Loss at step 500: 0.034673936665058136
Loss at step 550: 0.039094239473342896
Loss at step 600: 0.039589714258909225
Loss at step 650: 0.055458154529333115
Loss at step 700: 0.025569973513484
Loss at step 750: 0.035783909261226654
Loss at step 800: 0.0652955025434494
Loss at step 850: 0.03070709854364395
Loss at step 900: 0.030941976234316826
Mean training loss after epoch 168: 0.03940760542644557
EPOCH: 169
Loss at step 0: 0.03163331374526024
Loss at step 50: 0.03836061432957649
Loss at step 100: 0.04068070277571678
Loss at step 150: 0.030763309448957443
Loss at step 200: 0.036739546805620193
Loss at step 250: 0.029106061905622482
Loss at step 300: 0.032068174332380295
Loss at step 350: 0.032730091363191605
Loss at step 400: 0.05216984823346138
Loss at step 450: 0.04650463908910751
Loss at step 500: 0.040319133549928665
Loss at step 550: 0.030558280646800995
Loss at step 600: 0.041635192930698395
Loss at step 650: 0.03210076317191124
Loss at step 700: 0.025039872154593468
Loss at step 750: 0.04510484263300896
Loss at step 800: 0.04591650143265724
Loss at step 850: 0.03117387555539608
Loss at step 900: 0.03997860476374626
Mean training loss after epoch 169: 0.039096843947702124
EPOCH: 170
Loss at step 0: 0.04807845503091812
Loss at step 50: 0.03216809406876564
Loss at step 100: 0.052466776221990585
Loss at step 150: 0.05527348443865776
Loss at step 200: 0.04938875138759613
Loss at step 250: 0.029139377176761627
Loss at step 300: 0.03292334824800491
Loss at step 350: 0.031563788652420044
Loss at step 400: 0.030839821323752403
Loss at step 450: 0.04812578111886978
Loss at step 500: 0.03261122480034828
Loss at step 550: 0.03332837298512459
Loss at step 600: 0.029706457629799843
Loss at step 650: 0.05334772169589996
Loss at step 700: 0.038234107196331024
Loss at step 750: 0.03391353040933609
Loss at step 800: 0.03238190710544586
Loss at step 850: 0.045458365231752396
Loss at step 900: 0.031078660860657692
Mean training loss after epoch 170: 0.03927428957058995
EPOCH: 171
Loss at step 0: 0.03604511916637421
Loss at step 50: 0.02860851213335991
Loss at step 100: 0.030669499188661575
Loss at step 150: 0.03728640079498291
Loss at step 200: 0.03493388369679451
Loss at step 250: 0.03077596053481102
Loss at step 300: 0.02700110711157322
Loss at step 350: 0.04018703103065491
Loss at step 400: 0.029676662757992744
Loss at step 450: 0.031749628484249115
Loss at step 500: 0.04076163098216057
Loss at step 550: 0.0323580838739872
Loss at step 600: 0.05565223842859268
Loss at step 650: 0.036014098674058914
Loss at step 700: 0.03889094665646553
Loss at step 750: 0.04819978028535843
Loss at step 800: 0.035391367971897125
Loss at step 850: 0.04018506780266762
Loss at step 900: 0.03758300840854645
Mean training loss after epoch 171: 0.03903876623309561
EPOCH: 172
Loss at step 0: 0.040134530514478683
Loss at step 50: 0.030954569578170776
Loss at step 100: 0.032502155750989914
Loss at step 150: 0.047871530055999756
Loss at step 200: 0.03520025312900543
Loss at step 250: 0.033117249608039856
Loss at step 300: 0.04297640919685364
Loss at step 350: 0.03170880302786827
Loss at step 400: 0.03357839956879616
Loss at step 450: 0.03541633114218712
Loss at step 500: 0.10076101869344711
Loss at step 550: 0.04049905017018318
Loss at step 600: 0.03794636204838753
Loss at step 650: 0.05078333243727684
Loss at step 700: 0.032922934740781784
Loss at step 750: 0.03392742574214935
Loss at step 800: 0.034097425639629364
Loss at step 850: 0.0374067984521389
Loss at step 900: 0.03601730614900589
Mean training loss after epoch 172: 0.039571806728871645
EPOCH: 173
Loss at step 0: 0.051600147038698196
Loss at step 50: 0.03650055453181267
Loss at step 100: 0.03364503011107445
Loss at step 150: 0.06648823618888855
Loss at step 200: 0.032441750168800354
Loss at step 250: 0.03592945635318756
Loss at step 300: 0.03565963730216026
Loss at step 350: 0.03935347869992256
Loss at step 400: 0.05051932856440544
Loss at step 450: 0.04962983727455139
Loss at step 500: 0.051245179027318954
Loss at step 550: 0.03936230018734932
Loss at step 600: 0.04920635372400284
Loss at step 650: 0.03439559042453766
Loss at step 700: 0.044927652925252914
Loss at step 750: 0.05031128227710724
Loss at step 800: 0.03805975615978241
Loss at step 850: 0.03496336564421654
Loss at step 900: 0.04957449063658714
Mean training loss after epoch 173: 0.03956725788928235
EPOCH: 174
Loss at step 0: 0.027480291202664375
Loss at step 50: 0.03697388619184494
Loss at step 100: 0.03777739405632019
Loss at step 150: 0.03281679376959801
Loss at step 200: 0.03794809803366661
Loss at step 250: 0.0320734940469265
Loss at step 300: 0.0341799259185791
Loss at step 350: 0.051777079701423645
Loss at step 400: 0.0558072105050087
Loss at step 450: 0.033157266676425934
Loss at step 500: 0.04589252918958664
Loss at step 550: 0.030640888959169388
Loss at step 600: 0.032301343977451324
Loss at step 650: 0.03360531106591225
Loss at step 700: 0.03500358387827873
Loss at step 750: 0.0347106046974659
Loss at step 800: 0.028394857421517372
Loss at step 850: 0.03694157674908638
Loss at step 900: 0.04794781282544136
Mean training loss after epoch 174: 0.03932879738279307
EPOCH: 175
Loss at step 0: 0.03555794060230255
Loss at step 50: 0.02648364193737507
Loss at step 100: 0.03459541127085686
Loss at step 150: 0.03992311656475067
Loss at step 200: 0.024893423542380333
Loss at step 250: 0.047270677983760834
Loss at step 300: 0.03519446775317192
Loss at step 350: 0.09180052578449249
Loss at step 400: 0.02917756326496601
Loss at step 450: 0.034309279173612595
Loss at step 500: 0.03961813449859619
Loss at step 550: 0.04994247853755951
Loss at step 600: 0.04075918719172478
Loss at step 650: 0.031336862593889236
Loss at step 700: 0.03452292084693909
Loss at step 750: 0.05241398140788078
Loss at step 800: 0.027623998001217842
Loss at step 850: 0.02912183292210102
Loss at step 900: 0.03710698336362839
Mean training loss after epoch 175: 0.039300479576674735
EPOCH: 176
Loss at step 0: 0.036183230578899384
Loss at step 50: 0.0666927918791771
Loss at step 100: 0.0493197925388813
Loss at step 150: 0.03803710639476776
Loss at step 200: 0.03835935518145561
Loss at step 250: 0.031230589374899864
Loss at step 300: 0.03340180218219757
Loss at step 350: 0.040427062660455704
Loss at step 400: 0.03160826861858368
Loss at step 450: 0.044037673622369766
Loss at step 500: 0.03656746819615364
Loss at step 550: 0.036755748093128204
Loss at step 600: 0.059527214616537094
Loss at step 650: 0.040880002081394196
Loss at step 700: 0.051435764878988266
Loss at step 750: 0.059903018176555634
Loss at step 800: 0.03034154139459133
Loss at step 850: 0.037557195872068405
Loss at step 900: 0.034874048084020615
Mean training loss after epoch 176: 0.040028610705598586
EPOCH: 177
Loss at step 0: 0.03473574295639992
Loss at step 50: 0.05093587562441826
Loss at step 100: 0.02982262894511223
Loss at step 150: 0.030545443296432495
Loss at step 200: 0.03678588569164276
Loss at step 250: 0.036256153136491776
Loss at step 300: 0.04075793921947479
Loss at step 350: 0.03677338734269142
Loss at step 400: 0.03143291920423508
Loss at step 450: 0.04609094187617302
Loss at step 500: 0.04759456589818001
Loss at step 550: 0.03229653090238571
Loss at step 600: 0.037571556866168976
Loss at step 650: 0.03082422725856304
Loss at step 700: 0.06481219828128815
Loss at step 750: 0.06181209906935692
Loss at step 800: 0.036778099834918976
Loss at step 850: 0.05306657403707504
Loss at step 900: 0.027700865641236305
Mean training loss after epoch 177: 0.03933695467042008
EPOCH: 178
Loss at step 0: 0.03726204112172127
Loss at step 50: 0.03066483698785305
Loss at step 100: 0.03189198300242424
Loss at step 150: 0.03496934846043587
Loss at step 200: 0.03675440698862076
Loss at step 250: 0.05385400727391243
Loss at step 300: 0.06269004940986633
Loss at step 350: 0.0388777069747448
Loss at step 400: 0.040307316929101944
Loss at step 450: 0.04561549797654152
Loss at step 500: 0.032421860843896866
Loss at step 550: 0.043146658688783646
Loss at step 600: 0.04828475043177605
Loss at step 650: 0.03869273141026497
Loss at step 700: 0.027358677238225937
Loss at step 750: 0.05097721889615059
Loss at step 800: 0.03179521858692169
Loss at step 850: 0.029768288135528564
Loss at step 900: 0.03665889427065849
Mean training loss after epoch 178: 0.03920851013402758
EPOCH: 179
Loss at step 0: 0.0432792492210865
Loss at step 50: 0.03617312014102936
Loss at step 100: 0.06370623409748077
Loss at step 150: 0.03260226547718048
Loss at step 200: 0.0277892854064703
Loss at step 250: 0.032151203602552414
Loss at step 300: 0.03364957869052887
Loss at step 350: 0.036430612206459045
Loss at step 400: 0.02578619495034218
Loss at step 450: 0.03799568489193916
Loss at step 500: 0.030682628974318504
Loss at step 550: 0.030547630041837692
Loss at step 600: 0.032164983451366425
Loss at step 650: 0.048926692456007004
Loss at step 700: 0.03040003776550293
Loss at step 750: 0.04074382036924362
Loss at step 800: 0.03304985165596008
Loss at step 850: 0.046545110642910004
Loss at step 900: 0.035846710205078125
Mean training loss after epoch 179: 0.03970178106088819
EPOCH: 180
Loss at step 0: 0.041110847145318985
Loss at step 50: 0.03983008489012718
Loss at step 100: 0.038944534957408905
Loss at step 150: 0.03114279918372631
Loss at step 200: 0.035370562225580215
Loss at step 250: 0.05519220605492592
Loss at step 300: 0.049281731247901917
Loss at step 350: 0.039435915648937225
Loss at step 400: 0.046951428055763245
Loss at step 450: 0.03373648226261139
Loss at step 500: 0.034319255501031876
Loss at step 550: 0.0509195514023304
Loss at step 600: 0.0303181242197752
Loss at step 650: 0.055917058140039444
Loss at step 700: 0.03461319953203201
Loss at step 750: 0.07615417242050171
Loss at step 800: 0.08306694775819778
Loss at step 850: 0.039755940437316895
Loss at step 900: 0.05004990100860596
Mean training loss after epoch 180: 0.04022076416776569
EPOCH: 181
Loss at step 0: 0.029874665662646294
Loss at step 50: 0.029927344992756844
Loss at step 100: 0.03604910150170326
Loss at step 150: 0.03975125402212143
Loss at step 200: 0.03120443783700466
Loss at step 250: 0.04868396371603012
Loss at step 300: 0.049526773393154144
Loss at step 350: 0.03383695334196091
Loss at step 400: 0.034840282052755356
Loss at step 450: 0.05584263056516647
Loss at step 500: 0.0310691948980093
Loss at step 550: 0.04415060952305794
Loss at step 600: 0.032994214445352554
Loss at step 650: 0.03248259797692299
Loss at step 700: 0.03337256610393524
Loss at step 750: 0.035838667303323746
Loss at step 800: 0.050576042383909225
Loss at step 850: 0.06149858236312866
Loss at step 900: 0.035496558994054794
Mean training loss after epoch 181: 0.03953436933267218
EPOCH: 182
Loss at step 0: 0.03431776538491249
Loss at step 50: 0.04044733941555023
Loss at step 100: 0.049678489565849304
Loss at step 150: 0.052443064749240875
Loss at step 200: 0.03474302589893341
Loss at step 250: 0.03703489899635315
Loss at step 300: 0.04352117329835892
Loss at step 350: 0.03962820768356323
Loss at step 400: 0.037225913256406784
Loss at step 450: 0.05509385094046593
Loss at step 500: 0.03221511468291283
Loss at step 550: 0.04929589107632637
Loss at step 600: 0.0626567006111145
Loss at step 650: 0.02498018741607666
Loss at step 700: 0.03269823268055916
Loss at step 750: 0.04034951329231262
Loss at step 800: 0.038959357887506485
Loss at step 850: 0.045052941888570786
Loss at step 900: 0.030315130949020386
Mean training loss after epoch 182: 0.03948396699292573
EPOCH: 183
Loss at step 0: 0.05397326871752739
Loss at step 50: 0.03509332984685898
Loss at step 100: 0.04604775458574295
Loss at step 150: 0.03923487663269043
Loss at step 200: 0.04648161679506302
Loss at step 250: 0.07260828465223312
Loss at step 300: 0.04022159054875374
Loss at step 350: 0.04950942099094391
Loss at step 400: 0.05045710504055023
Loss at step 450: 0.0487130731344223
Loss at step 500: 0.03563311696052551
Loss at step 550: 0.06734095513820648
Loss at step 600: 0.031074846163392067
Loss at step 650: 0.05064196512103081
Loss at step 700: 0.06535758823156357
Loss at step 750: 0.04022606834769249
Loss at step 800: 0.0402403250336647
Loss at step 850: 0.035112425684928894
Loss at step 900: 0.048231590539216995
Mean training loss after epoch 183: 0.03999095734979298
EPOCH: 184
Loss at step 0: 0.06572314351797104
Loss at step 50: 0.040734417736530304
Loss at step 100: 0.033031612634658813
Loss at step 150: 0.04663529992103577
Loss at step 200: 0.028599590063095093
Loss at step 250: 0.03933103755116463
Loss at step 300: 0.029691778123378754
Loss at step 350: 0.034657254815101624
Loss at step 400: 0.0519569106400013
Loss at step 450: 0.04233112558722496
Loss at step 500: 0.03395779803395271
Loss at step 550: 0.036949314177036285
Loss at step 600: 0.036554135382175446
Loss at step 650: 0.030880143865942955
Loss at step 700: 0.044782429933547974
Loss at step 750: 0.04911411926150322
Loss at step 800: 0.037866562604904175
Loss at step 850: 0.06262623518705368
Loss at step 900: 0.03039371594786644
Mean training loss after epoch 184: 0.039548526968814925
EPOCH: 185
Loss at step 0: 0.0805806964635849
Loss at step 50: 0.027917854487895966
Loss at step 100: 0.03495066240429878
Loss at step 150: 0.027538307011127472
Loss at step 200: 0.03303145989775658
Loss at step 250: 0.031799983233213425
Loss at step 300: 0.06004394590854645
Loss at step 350: 0.030346397310495377
Loss at step 400: 0.031851932406425476
Loss at step 450: 0.05824219435453415
Loss at step 500: 0.028569376096129417
Loss at step 550: 0.027987563982605934
Loss at step 600: 0.042291879653930664
Loss at step 650: 0.03323879837989807
Loss at step 700: 0.059478759765625
Loss at step 750: 0.03162425383925438
Loss at step 800: 0.033148087561130524
Loss at step 850: 0.03013218380510807
Loss at step 900: 0.0303540900349617
Mean training loss after epoch 185: 0.03991742543518734
EPOCH: 186
Loss at step 0: 0.03823243826627731
Loss at step 50: 0.04821484163403511
Loss at step 100: 0.0335773304104805
Loss at step 150: 0.03713402897119522
Loss at step 200: 0.059686921536922455
Loss at step 250: 0.030698001384735107
Loss at step 300: 0.05005711689591408
Loss at step 350: 0.03683238849043846
Loss at step 400: 0.031736306846141815
Loss at step 450: 0.02648918889462948
Loss at step 500: 0.03295756131410599
Loss at step 550: 0.055000800639390945
Loss at step 600: 0.03838678076863289
Loss at step 650: 0.03734767809510231
Loss at step 700: 0.034189265221357346
Loss at step 750: 0.031962718814611435
Loss at step 800: 0.04328812658786774
Loss at step 850: 0.04030853882431984
Loss at step 900: 0.030331796035170555
Mean training loss after epoch 186: 0.04037082401602698
EPOCH: 187
Loss at step 0: 0.03479158505797386
Loss at step 50: 0.027288708835840225
Loss at step 100: 0.03628791123628616
Loss at step 150: 0.04474860802292824
Loss at step 200: 0.046368908137083054
Loss at step 250: 0.03773191571235657
Loss at step 300: 0.03852430358529091
Loss at step 350: 0.043728701770305634
Loss at step 400: 0.03673646226525307
Loss at step 450: 0.04937160760164261
Loss at step 500: 0.03505728766322136
Loss at step 550: 0.03285621479153633
Loss at step 600: 0.03919614851474762
Loss at step 650: 0.03945038095116615
Loss at step 700: 0.03538677096366882
Loss at step 750: 0.044934406876564026
Loss at step 800: 0.03320298716425896
Loss at step 850: 0.04768715798854828
Loss at step 900: 0.03354670852422714
Mean training loss after epoch 187: 0.039411708287624664
EPOCH: 188
Loss at step 0: 0.03753814473748207
Loss at step 50: 0.04097013920545578
Loss at step 100: 0.030687859281897545
Loss at step 150: 0.0322105698287487
Loss at step 200: 0.03676756098866463
Loss at step 250: 0.038250818848609924
Loss at step 300: 0.03754312917590141
Loss at step 350: 0.03725780174136162
Loss at step 400: 0.047191228717565536
Loss at step 450: 0.06354070454835892
Loss at step 500: 0.032046183943748474
Loss at step 550: 0.04977399855852127
Loss at step 600: 0.036721959710121155
Loss at step 650: 0.05434718355536461
Loss at step 700: 0.038213517516851425
Loss at step 750: 0.024722011759877205
Loss at step 800: 0.02971751242876053
Loss at step 850: 0.0317879281938076
Loss at step 900: 0.04744962230324745
Mean training loss after epoch 188: 0.04035358981235322
EPOCH: 189
Loss at step 0: 0.040479883551597595
Loss at step 50: 0.036622148007154465
Loss at step 100: 0.034205641597509384
Loss at step 150: 0.04935614764690399
Loss at step 200: 0.029448114335536957
Loss at step 250: 0.03502771258354187
Loss at step 300: 0.05513657256960869
Loss at step 350: 0.04774866998195648
Loss at step 400: 0.0404169000685215
Loss at step 450: 0.06562421470880508
Loss at step 500: 0.035340260714292526
Loss at step 550: 0.0556841604411602
Loss at step 600: 0.030720310285687447
Loss at step 650: 0.036314480006694794
Loss at step 700: 0.03438333421945572
Loss at step 750: 0.033112671226263046
Loss at step 800: 0.03592928498983383
Loss at step 850: 0.03132610768079758
Loss at step 900: 0.03413686528801918
Mean training loss after epoch 189: 0.03904189117379916
EPOCH: 190
Loss at step 0: 0.04827755317091942
Loss at step 50: 0.029142968356609344
Loss at step 100: 0.036558929830789566
Loss at step 150: 0.0487666018307209
Loss at step 200: 0.052352845668792725
Loss at step 250: 0.05458885058760643
Loss at step 300: 0.03482016548514366
Loss at step 350: 0.029485084116458893
Loss at step 400: 0.039250362664461136
Loss at step 450: 0.051643602550029755
Loss at step 500: 0.03167468681931496
Loss at step 550: 0.04167293757200241
Loss at step 600: 0.03159556910395622
Loss at step 650: 0.04445411637425423
Loss at step 700: 0.04958534613251686
Loss at step 750: 0.03268647566437721
Loss at step 800: 0.06649065017700195
Loss at step 850: 0.06429065763950348
Loss at step 900: 0.04084879532456398
Mean training loss after epoch 190: 0.03958941305250819
EPOCH: 191
Loss at step 0: 0.035234544426202774
Loss at step 50: 0.037936512380838394
Loss at step 100: 0.043479036539793015
Loss at step 150: 0.03322908282279968
Loss at step 200: 0.049596164375543594
Loss at step 250: 0.04965569078922272
Loss at step 300: 0.0495617538690567
Loss at step 350: 0.03661232441663742
Loss at step 400: 0.03508910536766052
Loss at step 450: 0.036798376590013504
Loss at step 500: 0.054994553327560425
Loss at step 550: 0.03120352514088154
Loss at step 600: 0.048640795052051544
Loss at step 650: 0.0348392091691494
Loss at step 700: 0.055915024131536484
Loss at step 750: 0.05253498628735542
Loss at step 800: 0.031080756336450577
Loss at step 850: 0.04723038524389267
Loss at step 900: 0.03465425223112106
Mean training loss after epoch 191: 0.039508505227929876
EPOCH: 192
Loss at step 0: 0.028439735993742943
Loss at step 50: 0.04162871092557907
Loss at step 100: 0.031034814193844795
Loss at step 150: 0.05161105841398239
Loss at step 200: 0.03534523397684097
Loss at step 250: 0.03421344980597496
Loss at step 300: 0.03441846743226051
Loss at step 350: 0.05163728818297386
Loss at step 400: 0.038883548229932785
Loss at step 450: 0.031683698296546936
Loss at step 500: 0.044283077120780945
Loss at step 550: 0.0333872064948082
Loss at step 600: 0.03577182814478874
Loss at step 650: 0.035094004124403
Loss at step 700: 0.052114710211753845
Loss at step 750: 0.028549348935484886
Loss at step 800: 0.031279418617486954
Loss at step 850: 0.04974686726927757
Loss at step 900: 0.06214940547943115
Mean training loss after epoch 192: 0.03946408471747884
EPOCH: 193
Loss at step 0: 0.02789539098739624
Loss at step 50: 0.04128626734018326
Loss at step 100: 0.031125077977776527
Loss at step 150: 0.054317064583301544
Loss at step 200: 0.035284653306007385
Loss at step 250: 0.030020158737897873
Loss at step 300: 0.0426630899310112
Loss at step 350: 0.02814929373562336
Loss at step 400: 0.03576153889298439
Loss at step 450: 0.030557071790099144
Loss at step 500: 0.07169098407030106
Loss at step 550: 0.046338628977537155
Loss at step 600: 0.07676282525062561
Loss at step 650: 0.027175115421414375
Loss at step 700: 0.03226439654827118
Loss at step 750: 0.05648811534047127
Loss at step 800: 0.05417701229453087
Loss at step 850: 0.034749649465084076
Loss at step 900: 0.05131750926375389
Mean training loss after epoch 193: 0.0390426356639308
EPOCH: 194
Loss at step 0: 0.031341664493083954
Loss at step 50: 0.034224752336740494
Loss at step 100: 0.030920159071683884
Loss at step 150: 0.03192133828997612
Loss at step 200: 0.05085880681872368
Loss at step 250: 0.030830208212137222
Loss at step 300: 0.032128382474184036
Loss at step 350: 0.03746500611305237
Loss at step 400: 0.03606203570961952
Loss at step 450: 0.034037210047245026
Loss at step 500: 0.032156843692064285
Loss at step 550: 0.05800786241889
Loss at step 600: 0.03398191183805466
Loss at step 650: 0.034924451261758804
Loss at step 700: 0.03250403329730034
Loss at step 750: 0.02987152710556984
Loss at step 800: 0.03198182210326195
Loss at step 850: 0.03203301131725311
Loss at step 900: 0.027829548344016075
Mean training loss after epoch 194: 0.03956125608123124
EPOCH: 195
Loss at step 0: 0.03269430249929428
Loss at step 50: 0.040413130074739456
Loss at step 100: 0.037335216999053955
Loss at step 150: 0.037506330758333206
Loss at step 200: 0.04935978725552559
Loss at step 250: 0.04005909338593483
Loss at step 300: 0.051902711391448975
Loss at step 350: 0.04994789883494377
Loss at step 400: 0.06731879711151123
Loss at step 450: 0.041173528879880905
Loss at step 500: 0.03356955945491791
Loss at step 550: 0.03218766674399376
Loss at step 600: 0.03400886803865433
Loss at step 650: 0.046790432184934616
Loss at step 700: 0.0261378213763237
Loss at step 750: 0.04084371030330658
Loss at step 800: 0.03878629207611084
Loss at step 850: 0.031192457303404808
Loss at step 900: 0.04036341607570648
Mean training loss after epoch 195: 0.03942633575714156
EPOCH: 196
Loss at step 0: 0.05267094075679779
Loss at step 50: 0.051652032881975174
Loss at step 100: 0.030373184010386467
Loss at step 150: 0.04002588614821434
Loss at step 200: 0.033278606832027435
Loss at step 250: 0.03507424145936966
Loss at step 300: 0.037266410887241364
Loss at step 350: 0.039247218519449234
Loss at step 400: 0.029236620292067528
Loss at step 450: 0.03595879673957825
Loss at step 500: 0.042438700795173645
Loss at step 550: 0.04551468789577484
Loss at step 600: 0.04240725561976433
Loss at step 650: 0.03342801705002785
Loss at step 700: 0.0306172464042902
Loss at step 750: 0.03251747041940689
Loss at step 800: 0.037676624953746796
Loss at step 850: 0.03884509578347206
Loss at step 900: 0.05178022384643555
Mean training loss after epoch 196: 0.03961125413166371
EPOCH: 197
Loss at step 0: 0.03382141515612602
Loss at step 50: 0.036528680473566055
Loss at step 100: 0.03256684169173241
Loss at step 150: 0.04024352878332138
Loss at step 200: 0.03290350362658501
Loss at step 250: 0.05601493641734123
Loss at step 300: 0.03842312470078468
Loss at step 350: 0.03384149447083473
Loss at step 400: 0.03558668866753578
Loss at step 450: 0.03347795829176903
Loss at step 500: 0.03716802969574928
Loss at step 550: 0.03866909071803093
Loss at step 600: 0.04315279796719551
Loss at step 650: 0.030964933335781097
Loss at step 700: 0.031178439036011696
Loss at step 750: 0.030622651800513268
Loss at step 800: 0.03555544838309288
Loss at step 850: 0.04361952841281891
Loss at step 900: 0.029841801151633263
Mean training loss after epoch 197: 0.03926740740058519
EPOCH: 198
Loss at step 0: 0.03077380359172821
Loss at step 50: 0.0331268385052681
Loss at step 100: 0.028607361018657684
Loss at step 150: 0.027219180017709732
Loss at step 200: 0.03859898820519447
Loss at step 250: 0.043153245002031326
Loss at step 300: 0.037560559809207916
Loss at step 350: 0.03113088570535183
Loss at step 400: 0.05178512632846832
Loss at step 450: 0.03197965398430824
Loss at step 500: 0.036399465054273605
Loss at step 550: 0.039263416081666946
Loss at step 600: 0.03154432773590088
Loss at step 650: 0.033167921006679535
Loss at step 700: 0.03971502184867859
Loss at step 750: 0.029375536367297173
Loss at step 800: 0.08838345855474472
Loss at step 850: 0.054228276014328
Loss at step 900: 0.034299567341804504
Mean training loss after epoch 198: 0.039270908790213596
EPOCH: 199
Loss at step 0: 0.032876938581466675
Loss at step 50: 0.04957316815853119
Loss at step 100: 0.04838715121150017
Loss at step 150: 0.06222357973456383
Loss at step 200: 0.03767170011997223
Loss at step 250: 0.03373800590634346
Loss at step 300: 0.03463589772582054
Loss at step 350: 0.036685265600681305
Loss at step 400: 0.037816766649484634
Loss at step 450: 0.04994241148233414
Loss at step 500: 0.05309994891285896
Loss at step 550: 0.050136446952819824
Loss at step 600: 0.034272655844688416
Loss at step 650: 0.04841490089893341
Loss at step 700: 0.032551348209381104
Loss at step 750: 0.04803447797894478
Loss at step 800: 0.027424441650509834
Loss at step 850: 0.03509342297911644
Loss at step 900: 0.04698752239346504
Mean training loss after epoch 199: 0.039368446850811624
EPOCH: 200
Loss at step 0: 0.033516522496938705
Loss at step 50: 0.033314839005470276
Loss at step 100: 0.028926312923431396
Loss at step 150: 0.045188531279563904
Loss at step 200: 0.03707105666399002
Loss at step 250: 0.04511125758290291
Loss at step 300: 0.041642624884843826
Loss at step 350: 0.03727297484874725
Loss at step 400: 0.04820782691240311
Loss at step 450: 0.05873133987188339
Loss at step 500: 0.03943607211112976
Loss at step 550: 0.03903554007411003
Loss at step 600: 0.03445577993988991
Loss at step 650: 0.03482406958937645
Loss at step 700: 0.03471191227436066
Loss at step 750: 0.0357685312628746
Loss at step 800: 0.046298760920763016
Loss at step 850: 0.03532588854432106
Loss at step 900: 0.04996367171406746
Mean training loss after epoch 200: 0.03945373677646618
EPOCH: 201
Loss at step 0: 0.035214416682720184
Loss at step 50: 0.035346150398254395
Loss at step 100: 0.034103572368621826
Loss at step 150: 0.038458410650491714
Loss at step 200: 0.038615066558122635
Loss at step 250: 0.03179283067584038
Loss at step 300: 0.035539235919713974
Loss at step 350: 0.03436723351478577
Loss at step 400: 0.05592014268040657
Loss at step 450: 0.02996472455561161
Loss at step 500: 0.03577533736824989
Loss at step 550: 0.025224490091204643
Loss at step 600: 0.03710426017642021
Loss at step 650: 0.04781566187739372
Loss at step 700: 0.03749268129467964
Loss at step 750: 0.03389711305499077
Loss at step 800: 0.03144014626741409
Loss at step 850: 0.038093969225883484
Loss at step 900: 0.04576469212770462
Mean training loss after epoch 201: 0.03969399575620636
EPOCH: 202
Loss at step 0: 0.0476631335914135
Loss at step 50: 0.031282223761081696
Loss at step 100: 0.044516369700431824
Loss at step 150: 0.06186264380812645
Loss at step 200: 0.0393553301692009
Loss at step 250: 0.03279627859592438
Loss at step 300: 0.0629049614071846
Loss at step 350: 0.030579999089241028
Loss at step 400: 0.031155675649642944
Loss at step 450: 0.06583017855882645
Loss at step 500: 0.05303068459033966
Loss at step 550: 0.036140043288469315
Loss at step 600: 0.03231840208172798
Loss at step 650: 0.05249636247754097
Loss at step 700: 0.051017142832279205
Loss at step 750: 0.04280295595526695
Loss at step 800: 0.03303103521466255
Loss at step 850: 0.03351632133126259
Loss at step 900: 0.038658663630485535
Mean training loss after epoch 202: 0.03924750817268451
EPOCH: 203
Loss at step 0: 0.04230416938662529
Loss at step 50: 0.03197874128818512
Loss at step 100: 0.034516096115112305
Loss at step 150: 0.0965784564614296
Loss at step 200: 0.036621060222387314
Loss at step 250: 0.033483728766441345
Loss at step 300: 0.05883985385298729
Loss at step 350: 0.03732655197381973
Loss at step 400: 0.03222108259797096
Loss at step 450: 0.0372038297355175
Loss at step 500: 0.031740378588438034
Loss at step 550: 0.032624147832393646
Loss at step 600: 0.035175345838069916
Loss at step 650: 0.0395626574754715
Loss at step 700: 0.04586372524499893
Loss at step 750: 0.032561711966991425
Loss at step 800: 0.06468833237886429
Loss at step 850: 0.03629869595170021
Loss at step 900: 0.030181996524333954
Mean training loss after epoch 203: 0.03954556779161509
EPOCH: 204
Loss at step 0: 0.029719999060034752
Loss at step 50: 0.03435388207435608
Loss at step 100: 0.035306770354509354
Loss at step 150: 0.04551118612289429
Loss at step 200: 0.03435719758272171
Loss at step 250: 0.034615252166986465
Loss at step 300: 0.030884413048624992
Loss at step 350: 0.050709690898656845
Loss at step 400: 0.038177862763404846
Loss at step 450: 0.031270384788513184
Loss at step 500: 0.035266853868961334
Loss at step 550: 0.07006863504648209
Loss at step 600: 0.03182707354426384
Loss at step 650: 0.05542439594864845
Loss at step 700: 0.04853573441505432
Loss at step 750: 0.027383018285036087
Loss at step 800: 0.03046734444797039
Loss at step 850: 0.04207098111510277
Loss at step 900: 0.03459654748439789
Mean training loss after epoch 204: 0.03929966339456247
EPOCH: 205
Loss at step 0: 0.0312785767018795
Loss at step 50: 0.036261215806007385
Loss at step 100: 0.04925072193145752
Loss at step 150: 0.04340329393744469
Loss at step 200: 0.039639346301555634
Loss at step 250: 0.03776752948760986
Loss at step 300: 0.042972952127456665
Loss at step 350: 0.030083203688263893
Loss at step 400: 0.04385426640510559
Loss at step 450: 0.0570792555809021
Loss at step 500: 0.05148709565401077
Loss at step 550: 0.0342877060174942
Loss at step 600: 0.037329480051994324
Loss at step 650: 0.03023405931890011
Loss at step 700: 0.027055177837610245
Loss at step 750: 0.026842977851629257
Loss at step 800: 0.050824299454689026
Loss at step 850: 0.032287370413541794
Loss at step 900: 0.02937036007642746
Mean training loss after epoch 205: 0.03916449878197997
EPOCH: 206
Loss at step 0: 0.03259656950831413
Loss at step 50: 0.04985092580318451
Loss at step 100: 0.02949802204966545
Loss at step 150: 0.036611951887607574
Loss at step 200: 0.039992108941078186
Loss at step 250: 0.0345536433160305
Loss at step 300: 0.032152608036994934
Loss at step 350: 0.04637967050075531
Loss at step 400: 0.02919534221291542
Loss at step 450: 0.031057648360729218
Loss at step 500: 0.03315690532326698
Loss at step 550: 0.05003923177719116
Loss at step 600: 0.045779258012771606
Loss at step 650: 0.02811567671597004
Loss at step 700: 0.038607172667980194
Loss at step 750: 0.04060652107000351
Loss at step 800: 0.0453539676964283
Loss at step 850: 0.035857975482940674
Loss at step 900: 0.031072260811924934
Mean training loss after epoch 206: 0.038921635300079895
EPOCH: 207
Loss at step 0: 0.03965787962079048
Loss at step 50: 0.0446898527443409
Loss at step 100: 0.028009677305817604
Loss at step 150: 0.038405463099479675
Loss at step 200: 0.06563003361225128
Loss at step 250: 0.03391384705901146
Loss at step 300: 0.04727596417069435
Loss at step 350: 0.03483771160244942
Loss at step 400: 0.036268409341573715
Loss at step 450: 0.030389249324798584
Loss at step 500: 0.0310251172631979
Loss at step 550: 0.0379030704498291
Loss at step 600: 0.040424078702926636
Loss at step 650: 0.0521022230386734
Loss at step 700: 0.033831410109996796
Loss at step 750: 0.033530574291944504
Loss at step 800: 0.036986444145441055
Loss at step 850: 0.032420702278614044
Loss at step 900: 0.03470515087246895
Mean training loss after epoch 207: 0.03863498132858576
EPOCH: 208
Loss at step 0: 0.033557191491127014
Loss at step 50: 0.03412749990820885
Loss at step 100: 0.048487868160009384
Loss at step 150: 0.05572287738323212
Loss at step 200: 0.04506176337599754
Loss at step 250: 0.06600682437419891
Loss at step 300: 0.05004393309354782
Loss at step 350: 0.05113781616091728
Loss at step 400: 0.03928277641534805
Loss at step 450: 0.056767385452985764
Loss at step 500: 0.04085297882556915
Loss at step 550: 0.0346493199467659
Loss at step 600: 0.031214237213134766
Loss at step 650: 0.04923402890563011
Loss at step 700: 0.06293520331382751
Loss at step 750: 0.03206819295883179
Loss at step 800: 0.037167709320783615
Loss at step 850: 0.03659123182296753
Loss at step 900: 0.06300326436758041
Mean training loss after epoch 208: 0.03929860922279579
EPOCH: 209
Loss at step 0: 0.03361137583851814
Loss at step 50: 0.026428446173667908
Loss at step 100: 0.034164272248744965
Loss at step 150: 0.0364573635160923
Loss at step 200: 0.041490085422992706
Loss at step 250: 0.05014254152774811
Loss at step 300: 0.04537137597799301
Loss at step 350: 0.03958693891763687
Loss at step 400: 0.03602161258459091
Loss at step 450: 0.028751106932759285
Loss at step 500: 0.04826076701283455
Loss at step 550: 0.02965088002383709
Loss at step 600: 0.03778247535228729
Loss at step 650: 0.05167883262038231
Loss at step 700: 0.029958970844745636
Loss at step 750: 0.028692517429590225
Loss at step 800: 0.029456930235028267
Loss at step 850: 0.04437252879142761
Loss at step 900: 0.03329407423734665
Mean training loss after epoch 209: 0.03891603271367707
EPOCH: 210
Loss at step 0: 0.034966133534908295
Loss at step 50: 0.03536030650138855
Loss at step 100: 0.07317616790533066
Loss at step 150: 0.0346195213496685
Loss at step 200: 0.027271442115306854
Loss at step 250: 0.024927347898483276
Loss at step 300: 0.031957872211933136
Loss at step 350: 0.033988021314144135
Loss at step 400: 0.0324539951980114
Loss at step 450: 0.027275538071990013
Loss at step 500: 0.034581758081912994
Loss at step 550: 0.05167417973279953
Loss at step 600: 0.03554034233093262
Loss at step 650: 0.03346965089440346
Loss at step 700: 0.03680897504091263
Loss at step 750: 0.03390275314450264
Loss at step 800: 0.03475474938750267
Loss at step 850: 0.03505280613899231
Loss at step 900: 0.030334360897541046
Mean training loss after epoch 210: 0.03880712026352885
EPOCH: 211
Loss at step 0: 0.036252912133932114
Loss at step 50: 0.039842527359724045
Loss at step 100: 0.05772003158926964
Loss at step 150: 0.03693200275301933
Loss at step 200: 0.04155736789107323
Loss at step 250: 0.07334703207015991
Loss at step 300: 0.03494369238615036
Loss at step 350: 0.03264397382736206
Loss at step 400: 0.029918232932686806
Loss at step 450: 0.02940177358686924
Loss at step 500: 0.03305615857243538
Loss at step 550: 0.038510505110025406
Loss at step 600: 0.03817708045244217
Loss at step 650: 0.06494858860969543
Loss at step 700: 0.030305013060569763
Loss at step 750: 0.031748753041028976
Loss at step 800: 0.03335447981953621
Loss at step 850: 0.029417529702186584
Loss at step 900: 0.03409936651587486
Mean training loss after epoch 211: 0.03944876101582861
EPOCH: 212
Loss at step 0: 0.03463529050350189
Loss at step 50: 0.04601624608039856
Loss at step 100: 0.03157833218574524
Loss at step 150: 0.036684878170490265
Loss at step 200: 0.02828829362988472
Loss at step 250: 0.032570887356996536
Loss at step 300: 0.03535538911819458
Loss at step 350: 0.03101705014705658
Loss at step 400: 0.04566393792629242
Loss at step 450: 0.03168395534157753
Loss at step 500: 0.0468168742954731
Loss at step 550: 0.03592550754547119
Loss at step 600: 0.030794337391853333
Loss at step 650: 0.039325911551713943
Loss at step 700: 0.03656806796789169
Loss at step 750: 0.03873779997229576
Loss at step 800: 0.03337247669696808
Loss at step 850: 0.05102049559354782
Loss at step 900: 0.03313032537698746
Mean training loss after epoch 212: 0.038925153630247504
EPOCH: 213
Loss at step 0: 0.03312905132770538
Loss at step 50: 0.034856270998716354
Loss at step 100: 0.03638290613889694
Loss at step 150: 0.030590932816267014
Loss at step 200: 0.03178534656763077
Loss at step 250: 0.03628033027052879
Loss at step 300: 0.03542640805244446
Loss at step 350: 0.03868675231933594
Loss at step 400: 0.03446945920586586
Loss at step 450: 0.031632207334041595
Loss at step 500: 0.03109968639910221
Loss at step 550: 0.03217053413391113
Loss at step 600: 0.031699612736701965
Loss at step 650: 0.031050097197294235
Loss at step 700: 0.03135169669985771
Loss at step 750: 0.05426996573805809
Loss at step 800: 0.031804606318473816
Loss at step 850: 0.037588391453027725
Loss at step 900: 0.058043185621500015
Mean training loss after epoch 213: 0.038873043518934426
EPOCH: 214
Loss at step 0: 0.028386587277054787
Loss at step 50: 0.050295133143663406
Loss at step 100: 0.046243395656347275
Loss at step 150: 0.03635299205780029
Loss at step 200: 0.032567527145147324
Loss at step 250: 0.026220694184303284
Loss at step 300: 0.029877593740820885
Loss at step 350: 0.036172349005937576
Loss at step 400: 0.028854016214609146
Loss at step 450: 0.033716216683387756
Loss at step 500: 0.022720973938703537
Loss at step 550: 0.03392207995057106
Loss at step 600: 0.03281420096755028
Loss at step 650: 0.06516284495592117
Loss at step 700: 0.038289912045001984
Loss at step 750: 0.03463584557175636
Loss at step 800: 0.03736652433872223
Loss at step 850: 0.049522798508405685
Loss at step 900: 0.03491751104593277
Mean training loss after epoch 214: 0.0391494517685222
EPOCH: 215
Loss at step 0: 0.028759721666574478
Loss at step 50: 0.0389941930770874
Loss at step 100: 0.030850136652588844
Loss at step 150: 0.0318099707365036
Loss at step 200: 0.034005057066679
Loss at step 250: 0.052801866084337234
Loss at step 300: 0.035044100135564804
Loss at step 350: 0.052397601306438446
Loss at step 400: 0.04918781667947769
Loss at step 450: 0.0334693007171154
Loss at step 500: 0.0463688038289547
Loss at step 550: 0.033775392919778824
Loss at step 600: 0.030683718621730804
Loss at step 650: 0.03664135932922363
Loss at step 700: 0.03418098762631416
Loss at step 750: 0.03744911029934883
Loss at step 800: 0.029192639514803886
Loss at step 850: 0.05244458466768265
Loss at step 900: 0.03115343302488327
Mean training loss after epoch 215: 0.039243683344059026
EPOCH: 216
Loss at step 0: 0.0490545816719532
Loss at step 50: 0.04258950799703598
Loss at step 100: 0.03089594468474388
Loss at step 150: 0.051975492388010025
Loss at step 200: 0.051118168979883194
Loss at step 250: 0.03654303401708603
Loss at step 300: 0.05530314892530441
Loss at step 350: 0.048319995403289795
Loss at step 400: 0.03275396302342415
Loss at step 450: 0.03487536683678627
Loss at step 500: 0.03248892351984978
Loss at step 550: 0.03192398324608803
Loss at step 600: 0.0315568670630455
Loss at step 650: 0.033777620643377304
Loss at step 700: 0.04205819591879845
Loss at step 750: 0.05035315454006195
Loss at step 800: 0.032998502254486084
Loss at step 850: 0.03232625871896744
Loss at step 900: 0.04717819020152092
Mean training loss after epoch 216: 0.039436278183624815
EPOCH: 217
Loss at step 0: 0.03221813589334488
Loss at step 50: 0.03685177117586136
Loss at step 100: 0.03466470539569855
Loss at step 150: 0.031917549669742584
Loss at step 200: 0.03797820210456848
Loss at step 250: 0.04220956936478615
Loss at step 300: 0.02914607711136341
Loss at step 350: 0.05011358857154846
Loss at step 400: 0.03618037700653076
Loss at step 450: 0.03652055934071541
Loss at step 500: 0.029459647834300995
Loss at step 550: 0.031061850488185883
Loss at step 600: 0.03643409535288811
Loss at step 650: 0.04160930588841438
Loss at step 700: 0.03050972893834114
Loss at step 750: 0.03366422280669212
Loss at step 800: 0.04819341376423836
Loss at step 850: 0.04103286191821098
Loss at step 900: 0.03597785905003548
Mean training loss after epoch 217: 0.03884100024379901
EPOCH: 218
Loss at step 0: 0.034691356122493744
Loss at step 50: 0.0512816347181797
Loss at step 100: 0.0498422309756279
Loss at step 150: 0.034464772790670395
Loss at step 200: 0.03690893575549126
Loss at step 250: 0.027062783017754555
Loss at step 300: 0.037605658173561096
Loss at step 350: 0.034629516303539276
Loss at step 400: 0.033738873898983
Loss at step 450: 0.05193618685007095
Loss at step 500: 0.0349096916615963
Loss at step 550: 0.03417463228106499
Loss at step 600: 0.03664156794548035
Loss at step 650: 0.050338178873062134
Loss at step 700: 0.04119217023253441
Loss at step 750: 0.04932699352502823
Loss at step 800: 0.032882045954465866
Loss at step 850: 0.05195748060941696
Loss at step 900: 0.056095194071531296
Mean training loss after epoch 218: 0.039159961821221466
EPOCH: 219
Loss at step 0: 0.03401932865381241
Loss at step 50: 0.033970147371292114
Loss at step 100: 0.02930084988474846
Loss at step 150: 0.038355208933353424
Loss at step 200: 0.02973788045346737
Loss at step 250: 0.031476765871047974
Loss at step 300: 0.03217240050435066
Loss at step 350: 0.03983880579471588
Loss at step 400: 0.035470906645059586
Loss at step 450: 0.06286390125751495
Loss at step 500: 0.03506041690707207
Loss at step 550: 0.033520910888910294
Loss at step 600: 0.03043210878968239
Loss at step 650: 0.029626408591866493
Loss at step 700: 0.03785299137234688
Loss at step 750: 0.029762817546725273
Loss at step 800: 0.03253250569105148
Loss at step 850: 0.03220551460981369
Loss at step 900: 0.03191516175866127
Mean training loss after epoch 219: 0.03961604257533227
EPOCH: 220
Loss at step 0: 0.035007499158382416
Loss at step 50: 0.03676850348711014
Loss at step 100: 0.04776357114315033
Loss at step 150: 0.05322731286287308
Loss at step 200: 0.0326763354241848
Loss at step 250: 0.02881242334842682
Loss at step 300: 0.06452496349811554
Loss at step 350: 0.04654304310679436
Loss at step 400: 0.033597029745578766
Loss at step 450: 0.06089720129966736
Loss at step 500: 0.05325775966048241
Loss at step 550: 0.034849777817726135
Loss at step 600: 0.03910364955663681
Loss at step 650: 0.04495980218052864
Loss at step 700: 0.03320116922259331
Loss at step 750: 0.031099630519747734
Loss at step 800: 0.02967253513634205
Loss at step 850: 0.05618506297469139
Loss at step 900: 0.023359069600701332
Mean training loss after epoch 220: 0.03917579456711057
EPOCH: 221
Loss at step 0: 0.033655982464551926
Loss at step 50: 0.03283160179853439
Loss at step 100: 0.06974532455205917
Loss at step 150: 0.03525441884994507
Loss at step 200: 0.033504560589790344
Loss at step 250: 0.05242004245519638
Loss at step 300: 0.049112677574157715
Loss at step 350: 0.03913629800081253
Loss at step 400: 0.030687332153320312
Loss at step 450: 0.038405779749155045
Loss at step 500: 0.055703528225421906
Loss at step 550: 0.030176103115081787
Loss at step 600: 0.0338451974093914
Loss at step 650: 0.03483753651380539
Loss at step 700: 0.046717628836631775
Loss at step 750: 0.030152037739753723
Loss at step 800: 0.051735568791627884
Loss at step 850: 0.037322036921978
Loss at step 900: 0.03671986982226372
Mean training loss after epoch 221: 0.039529672089114246
EPOCH: 222
Loss at step 0: 0.040746379643678665
Loss at step 50: 0.03421521559357643
Loss at step 100: 0.03173847496509552
Loss at step 150: 0.027398396283388138
Loss at step 200: 0.06909073889255524
Loss at step 250: 0.02943158522248268
Loss at step 300: 0.04270821437239647
Loss at step 350: 0.036493346095085144
Loss at step 400: 0.03201095387339592
Loss at step 450: 0.03346439078450203
Loss at step 500: 0.03585893660783768
Loss at step 550: 0.03641887009143829
Loss at step 600: 0.029378972947597504
Loss at step 650: 0.03823823109269142
Loss at step 700: 0.03458854556083679
Loss at step 750: 0.03295740485191345
Loss at step 800: 0.03287142887711525
Loss at step 850: 0.027882471680641174
Loss at step 900: 0.037205036729574203
Mean training loss after epoch 222: 0.038900254218817265
EPOCH: 223
Loss at step 0: 0.0465879812836647
Loss at step 50: 0.04563435539603233
Loss at step 100: 0.04606280103325844
Loss at step 150: 0.03840605169534683
Loss at step 200: 0.05436091497540474
Loss at step 250: 0.03147437050938606
Loss at step 300: 0.053010765463113785
Loss at step 350: 0.048824213445186615
Loss at step 400: 0.03178839758038521
Loss at step 450: 0.05775563046336174
Loss at step 500: 0.0351368747651577
Loss at step 550: 0.03580415993928909
Loss at step 600: 0.03660053759813309
Loss at step 650: 0.04339035227894783
Loss at step 700: 0.029888346791267395
Loss at step 750: 0.06932703405618668
Loss at step 800: 0.06762956827878952
Loss at step 850: 0.047050874680280685
Loss at step 900: 0.0409468412399292
Mean training loss after epoch 223: 0.03930669576962238
EPOCH: 224
Loss at step 0: 0.03919639065861702
Loss at step 50: 0.03524477407336235
Loss at step 100: 0.0306867565959692
Loss at step 150: 0.036834586411714554
Loss at step 200: 0.039752665907144547
Loss at step 250: 0.03726755827665329
Loss at step 300: 0.035303812474012375
Loss at step 350: 0.052437279373407364
Loss at step 400: 0.04689083993434906
Loss at step 450: 0.05424738675355911
Loss at step 500: 0.03356577455997467
Loss at step 550: 0.033016446977853775
Loss at step 600: 0.04336623102426529
Loss at step 650: 0.03267418593168259
Loss at step 700: 0.0531155988574028
Loss at step 750: 0.03437129035592079
Loss at step 800: 0.05175966024398804
Loss at step 850: 0.028837081044912338
Loss at step 900: 0.05922437086701393
Mean training loss after epoch 224: 0.0386647883195009
EPOCH: 225
Loss at step 0: 0.0487726666033268
Loss at step 50: 0.03738081455230713
Loss at step 100: 0.03074018657207489
Loss at step 150: 0.03496900945901871
Loss at step 200: 0.04817022383213043
Loss at step 250: 0.04730185121297836
Loss at step 300: 0.034258127212524414
Loss at step 350: 0.034974534064531326
Loss at step 400: 0.04086778685450554
Loss at step 450: 0.03467993810772896
Loss at step 500: 0.033085793256759644
Loss at step 550: 0.09303003549575806
Loss at step 600: 0.035393036901950836
Loss at step 650: 0.035260267555713654
Loss at step 700: 0.02982688695192337
Loss at step 750: 0.032229792326688766
Loss at step 800: 0.0330289825797081
Loss at step 850: 0.03003479540348053
Loss at step 900: 0.029361525550484657
Mean training loss after epoch 225: 0.03890162737551592
EPOCH: 226
Loss at step 0: 0.032596834003925323
Loss at step 50: 0.03327403590083122
Loss at step 100: 0.05183485522866249
Loss at step 150: 0.035328250378370285
Loss at step 200: 0.03449290245771408
Loss at step 250: 0.03304436802864075
Loss at step 300: 0.0349038764834404
Loss at step 350: 0.034197621047496796
Loss at step 400: 0.03215165063738823
Loss at step 450: 0.05494612455368042
Loss at step 500: 0.03127453476190567
Loss at step 550: 0.039228979498147964
Loss at step 600: 0.03493810445070267
Loss at step 650: 0.031990375369787216
Loss at step 700: 0.05123075097799301
Loss at step 750: 0.02953650988638401
Loss at step 800: 0.06226984038949013
Loss at step 850: 0.0387241505086422
Loss at step 900: 0.04094972461462021
Mean training loss after epoch 226: 0.03866499135576522
EPOCH: 227
Loss at step 0: 0.036986760795116425
Loss at step 50: 0.03572157025337219
Loss at step 100: 0.03125032037496567
Loss at step 150: 0.03796594962477684
Loss at step 200: 0.03207015246152878
Loss at step 250: 0.04670996591448784
Loss at step 300: 0.04626971855759621
Loss at step 350: 0.032793719321489334
Loss at step 400: 0.05023602768778801
Loss at step 450: 0.04747440665960312
Loss at step 500: 0.029485540464520454
Loss at step 550: 0.032925404608249664
Loss at step 600: 0.04897836595773697
Loss at step 650: 0.03635069355368614
Loss at step 700: 0.031118150800466537
Loss at step 750: 0.04991304501891136
Loss at step 800: 0.03546535223722458
Loss at step 850: 0.04668305441737175
Loss at step 900: 0.03488590568304062
Mean training loss after epoch 227: 0.03901965910398058
EPOCH: 228
Loss at step 0: 0.04974370822310448
Loss at step 50: 0.03373951464891434
Loss at step 100: 0.033895570784807205
Loss at step 150: 0.03443300724029541
Loss at step 200: 0.03293636441230774
Loss at step 250: 0.04655670002102852
Loss at step 300: 0.02601798251271248
Loss at step 350: 0.03469035029411316
Loss at step 400: 0.02628219872713089
Loss at step 450: 0.034008219838142395
Loss at step 500: 0.037605393677949905
Loss at step 550: 0.04931284487247467
Loss at step 600: 0.047764576971530914
Loss at step 650: 0.03390823304653168
Loss at step 700: 0.036051489412784576
Loss at step 750: 0.03190223500132561
Loss at step 800: 0.027517585083842278
Loss at step 850: 0.026676075533032417
Loss at step 900: 0.035612836480140686
Mean training loss after epoch 228: 0.039303831541652616
EPOCH: 229
Loss at step 0: 0.0288917887955904
Loss at step 50: 0.06252633035182953
Loss at step 100: 0.03233402222394943
Loss at step 150: 0.03316564857959747
Loss at step 200: 0.034948062151670456
Loss at step 250: 0.02844369038939476
Loss at step 300: 0.031051771715283394
Loss at step 350: 0.04054409638047218
Loss at step 400: 0.03256020322442055
Loss at step 450: 0.03426603227853775
Loss at step 500: 0.033789653331041336
Loss at step 550: 0.027340136468410492
Loss at step 600: 0.03498152643442154
Loss at step 650: 0.033199094235897064
Loss at step 700: 0.04883769899606705
Loss at step 750: 0.041650619357824326
Loss at step 800: 0.04393523558974266
Loss at step 850: 0.03471892699599266
Loss at step 900: 0.04146659001708031
Mean training loss after epoch 229: 0.03924599493831905
EPOCH: 230
Loss at step 0: 0.03501119837164879
Loss at step 50: 0.031957730650901794
Loss at step 100: 0.06566370278596878
Loss at step 150: 0.03400876373052597
Loss at step 200: 0.03895947337150574
Loss at step 250: 0.03220095485448837
Loss at step 300: 0.03845254331827164
Loss at step 350: 0.03425909951329231
Loss at step 400: 0.03328506276011467
Loss at step 450: 0.03391508758068085
Loss at step 500: 0.0358009859919548
Loss at step 550: 0.03223380073904991
Loss at step 600: 0.028488216921687126
Loss at step 650: 0.03499921038746834
Loss at step 700: 0.028844734653830528
Loss at step 750: 0.04680425301194191
Loss at step 800: 0.029129402711987495
Loss at step 850: 0.05072740092873573
Loss at step 900: 0.03400781750679016
Mean training loss after epoch 230: 0.03910631895152681
EPOCH: 231
Loss at step 0: 0.029594002291560173
Loss at step 50: 0.033388398587703705
Loss at step 100: 0.030222948640584946
Loss at step 150: 0.03679189831018448
Loss at step 200: 0.034237805753946304
Loss at step 250: 0.03863455355167389
Loss at step 300: 0.03193581849336624
Loss at step 350: 0.04040062427520752
Loss at step 400: 0.042988717555999756
Loss at step 450: 0.048941951245069504
Loss at step 500: 0.036740753799676895
Loss at step 550: 0.04671098291873932
Loss at step 600: 0.02967640571296215
Loss at step 650: 0.03679099678993225
Loss at step 700: 0.03040301240980625
Loss at step 750: 0.048260949552059174
Loss at step 800: 0.04979100823402405
Loss at step 850: 0.03750582039356232
Loss at step 900: 0.0348072350025177
Mean training loss after epoch 231: 0.03864938818784093
EPOCH: 232
Loss at step 0: 0.04080262780189514
Loss at step 50: 0.0347711406648159
Loss at step 100: 0.04852057993412018
Loss at step 150: 0.035218965262174606
Loss at step 200: 0.03393496945500374
Loss at step 250: 0.047152820974588394
Loss at step 300: 0.03276395425200462
Loss at step 350: 0.03969300538301468
Loss at step 400: 0.031070873141288757
Loss at step 450: 0.0518072172999382
Loss at step 500: 0.03238266333937645
Loss at step 550: 0.03421289101243019
Loss at step 600: 0.04663264751434326
Loss at step 650: 0.033952854573726654
Loss at step 700: 0.050659772008657455
Loss at step 750: 0.02905084565281868
Loss at step 800: 0.0381062813103199
Loss at step 850: 0.03878077492117882
Loss at step 900: 0.033104512840509415
Mean training loss after epoch 232: 0.03919413653033565
EPOCH: 233
Loss at step 0: 0.030844537541270256
Loss at step 50: 0.05130849406123161
Loss at step 100: 0.06363049894571304
Loss at step 150: 0.032386112958192825
Loss at step 200: 0.03350917622447014
Loss at step 250: 0.04133734479546547
Loss at step 300: 0.03743404150009155
Loss at step 350: 0.030576646327972412
Loss at step 400: 0.03490522503852844
Loss at step 450: 0.046362556517124176
Loss at step 500: 0.03323175013065338
Loss at step 550: 0.042132873088121414
Loss at step 600: 0.05004531517624855
Loss at step 650: 0.036296043545007706
Loss at step 700: 0.03690037876367569
Loss at step 750: 0.041600458323955536
Loss at step 800: 0.047059088945388794
Loss at step 850: 0.03368044272065163
Loss at step 900: 0.04677114263176918
Mean training loss after epoch 233: 0.039063130267091524
EPOCH: 234
Loss at step 0: 0.03360550105571747
Loss at step 50: 0.03290083259344101
Loss at step 100: 0.034932930022478104
Loss at step 150: 0.03593425080180168
Loss at step 200: 0.0440426729619503
Loss at step 250: 0.05007448047399521
Loss at step 300: 0.036359675228595734
Loss at step 350: 0.031192978844046593
Loss at step 400: 0.02669164352118969
Loss at step 450: 0.0325494147837162
Loss at step 500: 0.02928798645734787
Loss at step 550: 0.037536706775426865
Loss at step 600: 0.04039769247174263
Loss at step 650: 0.03147507831454277
Loss at step 700: 0.05367570370435715
Loss at step 750: 0.044787436723709106
Loss at step 800: 0.028223801404237747
Loss at step 850: 0.05118735879659653
Loss at step 900: 0.03116217814385891
Mean training loss after epoch 234: 0.03894189336120701
EPOCH: 235
Loss at step 0: 0.03430939093232155
Loss at step 50: 0.03122139908373356
Loss at step 100: 0.03365548327565193
Loss at step 150: 0.029249092563986778
Loss at step 200: 0.055865611881017685
Loss at step 250: 0.04746852442622185
Loss at step 300: 0.0332975871860981
Loss at step 350: 0.04182635620236397
Loss at step 400: 0.0324370302259922
Loss at step 450: 0.03309211879968643
Loss at step 500: 0.03604499250650406
Loss at step 550: 0.02579680271446705
Loss at step 600: 0.0383726991713047
Loss at step 650: 0.03601831570267677
Loss at step 700: 0.03624669089913368
Loss at step 750: 0.03240237012505531
Loss at step 800: 0.037177249789237976
Loss at step 850: 0.033456508070230484
Loss at step 900: 0.031031902879476547
Mean training loss after epoch 235: 0.03901019287921155
EPOCH: 236
Loss at step 0: 0.04972589761018753
Loss at step 50: 0.044508837163448334
Loss at step 100: 0.035206809639930725
Loss at step 150: 0.03639085963368416
Loss at step 200: 0.04725930094718933
Loss at step 250: 0.05124089494347572
Loss at step 300: 0.05921793729066849
Loss at step 350: 0.0671318843960762
Loss at step 400: 0.03563682362437248
Loss at step 450: 0.03774426132440567
Loss at step 500: 0.05267181992530823
Loss at step 550: 0.03393133357167244
Loss at step 600: 0.03175343573093414
Loss at step 650: 0.05360681936144829
Loss at step 700: 0.05347166955471039
Loss at step 750: 0.035956475883722305
Loss at step 800: 0.03781764209270477
Loss at step 850: 0.031263165175914764
Loss at step 900: 0.03862721472978592
Mean training loss after epoch 236: 0.03922938913512014
EPOCH: 237
Loss at step 0: 0.06566672772169113
Loss at step 50: 0.031792644411325455
Loss at step 100: 0.03735481947660446
Loss at step 150: 0.034779392182826996
Loss at step 200: 0.06874541938304901
Loss at step 250: 0.05580836907029152
Loss at step 300: 0.04530530422925949
Loss at step 350: 0.03510119020938873
Loss at step 400: 0.03770944103598595
Loss at step 450: 0.06360543519258499
Loss at step 500: 0.05799640715122223
Loss at step 550: 0.03166048228740692
Loss at step 600: 0.04853644594550133
Loss at step 650: 0.03736702352762222
Loss at step 700: 0.03405768796801567
Loss at step 750: 0.03659890219569206
Loss at step 800: 0.025363899767398834
Loss at step 850: 0.03285384550690651
Loss at step 900: 0.027758479118347168
Mean training loss after epoch 237: 0.03887855290953539
EPOCH: 238
Loss at step 0: 0.03370613232254982
Loss at step 50: 0.04008691385388374
Loss at step 100: 0.03578205406665802
Loss at step 150: 0.05115322396159172
Loss at step 200: 0.05261386185884476
Loss at step 250: 0.03729558736085892
Loss at step 300: 0.03199843317270279
Loss at step 350: 0.05246369168162346
Loss at step 400: 0.0468270368874073
Loss at step 450: 0.03319402039051056
Loss at step 500: 0.03305567055940628
Loss at step 550: 0.05719682201743126
Loss at step 600: 0.051412034779787064
Loss at step 650: 0.03428887948393822
Loss at step 700: 0.039233673363924026
Loss at step 750: 0.028749994933605194
Loss at step 800: 0.035130735486745834
Loss at step 850: 0.03354177996516228
Loss at step 900: 0.035492267459630966
Mean training loss after epoch 238: 0.03845155325287314
EPOCH: 239
Loss at step 0: 0.03153827786445618
Loss at step 50: 0.028729312121868134
Loss at step 100: 0.06446194648742676
Loss at step 150: 0.05377689749002457
Loss at step 200: 0.08363740891218185
Loss at step 250: 0.028110457584261894
Loss at step 300: 0.03177076205611229
Loss at step 350: 0.02922712452709675
Loss at step 400: 0.03213167190551758
Loss at step 450: 0.03212027996778488
Loss at step 500: 0.03912307322025299
Loss at step 550: 0.032081134617328644
Loss at step 600: 0.04772243648767471
Loss at step 650: 0.029768118634819984
Loss at step 700: 0.05186186730861664
Loss at step 750: 0.0529065765440464
Loss at step 800: 0.0341126024723053
Loss at step 850: 0.051606934517621994
Loss at step 900: 0.06367186456918716
Mean training loss after epoch 239: 0.03928863341961779
EPOCH: 240
Loss at step 0: 0.03887162730097771
Loss at step 50: 0.050880786031484604
Loss at step 100: 0.039547499269247055
Loss at step 150: 0.028909575194120407
Loss at step 200: 0.03536584973335266
Loss at step 250: 0.03659897297620773
Loss at step 300: 0.033108536154031754
Loss at step 350: 0.02675941213965416
Loss at step 400: 0.03612198680639267
Loss at step 450: 0.05426546931266785
Loss at step 500: 0.0330321379005909
Loss at step 550: 0.05149666219949722
Loss at step 600: 0.03463984280824661
Loss at step 650: 0.028366481885313988
Loss at step 700: 0.044816501438617706
Loss at step 750: 0.04147651046514511
Loss at step 800: 0.033853594213724136
Loss at step 850: 0.03742264583706856
Loss at step 900: 0.03427162766456604
Mean training loss after epoch 240: 0.03903400417028079
EPOCH: 241
Loss at step 0: 0.03618745505809784
Loss at step 50: 0.03541552275419235
Loss at step 100: 0.03832630813121796
Loss at step 150: 0.031097810715436935
Loss at step 200: 0.05100856348872185
Loss at step 250: 0.05207272991538048
Loss at step 300: 0.03732670843601227
Loss at step 350: 0.03722361475229263
Loss at step 400: 0.035946499556303024
Loss at step 450: 0.04949504882097244
Loss at step 500: 0.04036898910999298
Loss at step 550: 0.03357747942209244
Loss at step 600: 0.03259134665131569
Loss at step 650: 0.037669405341148376
Loss at step 700: 0.03358788415789604
Loss at step 750: 0.05227615311741829
Loss at step 800: 0.03881647065281868
Loss at step 850: 0.033830676227808
Loss at step 900: 0.04052143916487694
Mean training loss after epoch 241: 0.03923677153257864
EPOCH: 242
Loss at step 0: 0.031095046550035477
Loss at step 50: 0.036116208881139755
Loss at step 100: 0.029616231098771095
Loss at step 150: 0.04071303829550743
Loss at step 200: 0.05126992613077164
Loss at step 250: 0.02957015484571457
Loss at step 300: 0.04773343354463577
Loss at step 350: 0.045865584164857864
Loss at step 400: 0.03263732045888901
Loss at step 450: 0.032689858227968216
Loss at step 500: 0.04566469416022301
Loss at step 550: 0.0382232740521431
Loss at step 600: 0.0350029319524765
Loss at step 650: 0.032325662672519684
Loss at step 700: 0.029781801626086235
Loss at step 750: 0.049002304673194885
Loss at step 800: 0.05032792314887047
Loss at step 850: 0.04957038536667824
Loss at step 900: 0.03759152814745903
Mean training loss after epoch 242: 0.03931932861029085
EPOCH: 243
Loss at step 0: 0.038250263780355453
Loss at step 50: 0.03833189606666565
Loss at step 100: 0.03076758421957493
Loss at step 150: 0.037498589605093
Loss at step 200: 0.03554268926382065
Loss at step 250: 0.029882796108722687
Loss at step 300: 0.035731032490730286
Loss at step 350: 0.030183272436261177
Loss at step 400: 0.049779996275901794
Loss at step 450: 0.040445808321237564
Loss at step 500: 0.047706544399261475
Loss at step 550: 0.047556981444358826
Loss at step 600: 0.03681233897805214
Loss at step 650: 0.02908683754503727
Loss at step 700: 0.02920670434832573
Loss at step 750: 0.030341751873493195
Loss at step 800: 0.030939605087041855
Loss at step 850: 0.03304276987910271
Loss at step 900: 0.03191913291811943
Mean training loss after epoch 243: 0.03905273881206698
EPOCH: 244
Loss at step 0: 0.06598794460296631
Loss at step 50: 0.02893797680735588
Loss at step 100: 0.03549230843782425
Loss at step 150: 0.03681396320462227
Loss at step 200: 0.03673600032925606
Loss at step 250: 0.04525090008974075
Loss at step 300: 0.03606670722365379
Loss at step 350: 0.024963827803730965
Loss at step 400: 0.05136997625231743
Loss at step 450: 0.06354724615812302
Loss at step 500: 0.03802937641739845
Loss at step 550: 0.04526302590966225
Loss at step 600: 0.03441905602812767
Loss at step 650: 0.04524055868387222
Loss at step 700: 0.02656574547290802
Loss at step 750: 0.07117734104394913
Loss at step 800: 0.036995913833379745
Loss at step 850: 0.03178925812244415
Loss at step 900: 0.037087518721818924
Mean training loss after epoch 244: 0.039370466573739736
EPOCH: 245
Loss at step 0: 0.048373542726039886
Loss at step 50: 0.033451057970523834
Loss at step 100: 0.028194300830364227
Loss at step 150: 0.047540027648210526
Loss at step 200: 0.035591401159763336
Loss at step 250: 0.047648560255765915
Loss at step 300: 0.02961515262722969
Loss at step 350: 0.036826133728027344
Loss at step 400: 0.036649350076913834
Loss at step 450: 0.046033795922994614
Loss at step 500: 0.038278304040431976
Loss at step 550: 0.05737805739045143
Loss at step 600: 0.034337978810071945
Loss at step 650: 0.047907229512929916
Loss at step 700: 0.029077300801873207
Loss at step 750: 0.026141535490751266
Loss at step 800: 0.040920063853263855
Loss at step 850: 0.03886491060256958
Loss at step 900: 0.03947722166776657
Mean training loss after epoch 245: 0.0392171511811806
EPOCH: 246
Loss at step 0: 0.03227032348513603
Loss at step 50: 0.03560794144868851
Loss at step 100: 0.03743302822113037
Loss at step 150: 0.030393380671739578
Loss at step 200: 0.05443951487541199
Loss at step 250: 0.050018344074487686
Loss at step 300: 0.0319996252655983
Loss at step 350: 0.03510141000151634
Loss at step 400: 0.05677584186196327
Loss at step 450: 0.05292211100459099
Loss at step 500: 0.04954027011990547
Loss at step 550: 0.036719441413879395
Loss at step 600: 0.037654343992471695
Loss at step 650: 0.045069120824337006
Loss at step 700: 0.04070558398962021
Loss at step 750: 0.03152972832322121
Loss at step 800: 0.032653797417879105
Loss at step 850: 0.032190416008234024
Loss at step 900: 0.03875408694148064
Mean training loss after epoch 246: 0.03970623201827632
EPOCH: 247
Loss at step 0: 0.03510020300745964
Loss at step 50: 0.028460653498768806
Loss at step 100: 0.027597110718488693
Loss at step 150: 0.0368226058781147
Loss at step 200: 0.03382598236203194
Loss at step 250: 0.0317443311214447
Loss at step 300: 0.04087996855378151
Loss at step 350: 0.029916564002633095
Loss at step 400: 0.03555725887417793
Loss at step 450: 0.047546543180942535
Loss at step 500: 0.03285599499940872
Loss at step 550: 0.03534551337361336
Loss at step 600: 0.06219349056482315
Loss at step 650: 0.0399542972445488
Loss at step 700: 0.028029421344399452
Loss at step 750: 0.0445200577378273
Loss at step 800: 0.03424978256225586
Loss at step 850: 0.025731278583407402
Loss at step 900: 0.038073569536209106
Mean training loss after epoch 247: 0.039303670599580065
EPOCH: 248
Loss at step 0: 0.030523089691996574
Loss at step 50: 0.03858974575996399
Loss at step 100: 0.03324702009558678
Loss at step 150: 0.033646613359451294
Loss at step 200: 0.028181303292512894
Loss at step 250: 0.03686067834496498
Loss at step 300: 0.02987145632505417
Loss at step 350: 0.02894950844347477
Loss at step 400: 0.06689367443323135
Loss at step 450: 0.029212500900030136
Loss at step 500: 0.03293319419026375
Loss at step 550: 0.032734330743551254
Loss at step 600: 0.028700964525341988
Loss at step 650: 0.0392322838306427
Loss at step 700: 0.048850029706954956
Loss at step 750: 0.05006673187017441
Loss at step 800: 0.03270864486694336
Loss at step 850: 0.03546704351902008
Loss at step 900: 0.03319983929395676
Mean training loss after epoch 248: 0.03888359318163667
EPOCH: 249
Loss at step 0: 0.036848295480012894
Loss at step 50: 0.03344952315092087
Loss at step 100: 0.030375657603144646
Loss at step 150: 0.04395264759659767
Loss at step 200: 0.032356906682252884
Loss at step 250: 0.03715647757053375
Loss at step 300: 0.047168467193841934
Loss at step 350: 0.028079552575945854
Loss at step 400: 0.06304143369197845
Loss at step 450: 0.03134738653898239
Loss at step 500: 0.03943207114934921
Loss at step 550: 0.02837996557354927
Loss at step 600: 0.043990012258291245
Loss at step 650: 0.05276433750987053
Loss at step 700: 0.03396638482809067
Loss at step 750: 0.02971942536532879
Loss at step 800: 0.034827522933483124
Loss at step 850: 0.05165525898337364
Loss at step 900: 0.04828884080052376
Mean training loss after epoch 249: 0.03892020832127663
EPOCH: 250
Loss at step 0: 0.03388892486691475
Loss at step 50: 0.03131557255983353
Loss at step 100: 0.05106263980269432
Loss at step 150: 0.034251436591148376
Loss at step 200: 0.05417611449956894
Loss at step 250: 0.035362519323825836
Loss at step 300: 0.03308453783392906
Loss at step 350: 0.03788948431611061
Loss at step 400: 0.03544377163052559
Loss at step 450: 0.036667026579380035
Loss at step 500: 0.05221394822001457
Loss at step 550: 0.04554745554924011
Loss at step 600: 0.030167769640684128
Loss at step 650: 0.0340573750436306
Loss at step 700: 0.04917588457465172
Loss at step 750: 0.03632596880197525
Loss at step 800: 0.03073965013027191
Loss at step 850: 0.027795109897851944
Loss at step 900: 0.03306938707828522
Mean training loss after epoch 250: 0.03891344393081248
EPOCH: 251
Loss at step 0: 0.039425771683454514
Loss at step 50: 0.030994482338428497
Loss at step 100: 0.05115703493356705
Loss at step 150: 0.03247106820344925
Loss at step 200: 0.036715537309646606
Loss at step 250: 0.04827232286334038
Loss at step 300: 0.04005703702569008
Loss at step 350: 0.03334115818142891
Loss at step 400: 0.03590434789657593
Loss at step 450: 0.035763200372457504
Loss at step 500: 0.04415174201130867
Loss at step 550: 0.040398284792900085
Loss at step 600: 0.0336516872048378
Loss at step 650: 0.06026959791779518
Loss at step 700: 0.05111640691757202
Loss at step 750: 0.03100166656076908
Loss at step 800: 0.030193939805030823
Loss at step 850: 0.03528062254190445
Loss at step 900: 0.0353427454829216
Mean training loss after epoch 251: 0.03915926600014096
EPOCH: 252
Loss at step 0: 0.047453004866838455
Loss at step 50: 0.03374917432665825
Loss at step 100: 0.028373297303915024
Loss at step 150: 0.06039762496948242
Loss at step 200: 0.030490513890981674
Loss at step 250: 0.026922443881630898
Loss at step 300: 0.023987047374248505
Loss at step 350: 0.03156132623553276
Loss at step 400: 0.0331733338534832
Loss at step 450: 0.028657851740717888
Loss at step 500: 0.03813005983829498
Loss at step 550: 0.028067875653505325
Loss at step 600: 0.038628771901130676
Loss at step 650: 0.033749524503946304
Loss at step 700: 0.035677965730428696
Loss at step 750: 0.028826430439949036
Loss at step 800: 0.035348471254110336
Loss at step 850: 0.03522353619337082
Loss at step 900: 0.03503499925136566
Mean training loss after epoch 252: 0.03881135373028802
EPOCH: 253
Loss at step 0: 0.05614131689071655
Loss at step 50: 0.02495838701725006
Loss at step 100: 0.047003425657749176
Loss at step 150: 0.029494477435946465
Loss at step 200: 0.030162688344717026
Loss at step 250: 0.06391389667987823
Loss at step 300: 0.02871713973581791
Loss at step 350: 0.03513010963797569
Loss at step 400: 0.06640415638685226
Loss at step 450: 0.03930817171931267
Loss at step 500: 0.029120804741978645
Loss at step 550: 0.02939414419233799
Loss at step 600: 0.0732555091381073
Loss at step 650: 0.030200641602277756
Loss at step 700: 0.03482096642255783
Loss at step 750: 0.030384305864572525
Loss at step 800: 0.034591879695653915
Loss at step 850: 0.03804127871990204
Loss at step 900: 0.04978170618414879
Mean training loss after epoch 253: 0.038376694193669854
EPOCH: 254
Loss at step 0: 0.03105386532843113
Loss at step 50: 0.04386408254504204
Loss at step 100: 0.0490780733525753
Loss at step 150: 0.035760875791311264
Loss at step 200: 0.03169526532292366
Loss at step 250: 0.033175766468048096
Loss at step 300: 0.04693169519305229
Loss at step 350: 0.033079493790864944
Loss at step 400: 0.0317985899746418
Loss at step 450: 0.03733246773481369
Loss at step 500: 0.07064293324947357
Loss at step 550: 0.03856551647186279
Loss at step 600: 0.04498601704835892
Loss at step 650: 0.050545915961265564
Loss at step 700: 0.032745424658060074
Loss at step 750: 0.03153373301029205
Loss at step 800: 0.031938981264829636
Loss at step 850: 0.06265679746866226
Loss at step 900: 0.06484922766685486
Mean training loss after epoch 254: 0.038683842222240056
EPOCH: 255
Loss at step 0: 0.06329744309186935
Loss at step 50: 0.04864448308944702
Loss at step 100: 0.04800715669989586
Loss at step 150: 0.05042517930269241
Loss at step 200: 0.03493422269821167
Loss at step 250: 0.033742208033800125
Loss at step 300: 0.04552479088306427
Loss at step 350: 0.03966813534498215
Loss at step 400: 0.035213813185691833
Loss at step 450: 0.037367723882198334
Loss at step 500: 0.03437703102827072
Loss at step 550: 0.036713361740112305
Loss at step 600: 0.04268074035644531
Loss at step 650: 0.03121114708483219
Loss at step 700: 0.04252978786826134
Loss at step 750: 0.028949955478310585
Loss at step 800: 0.03465810418128967
Loss at step 850: 0.030163034796714783
Loss at step 900: 0.03492124006152153
Mean training loss after epoch 255: 0.03891279746387114
EPOCH: 256
Loss at step 0: 0.034355781972408295
Loss at step 50: 0.051526568830013275
Loss at step 100: 0.05003800615668297
Loss at step 150: 0.03800094127655029
Loss at step 200: 0.06754325330257416
Loss at step 250: 0.03478672355413437
Loss at step 300: 0.051567237824201584
Loss at step 350: 0.0353977233171463
Loss at step 400: 0.050872646272182465
Loss at step 450: 0.04051677882671356
Loss at step 500: 0.03876348212361336
Loss at step 550: 0.04580125957727432
Loss at step 600: 0.03618134185671806
Loss at step 650: 0.025788437575101852
Loss at step 700: 0.03462769463658333
Loss at step 750: 0.0296726506203413
Loss at step 800: 0.035372376441955566
Loss at step 850: 0.06778371334075928
Loss at step 900: 0.04379701986908913
Mean training loss after epoch 256: 0.039542128389546356