/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? warn( Schedule: linear Cfg: True Output path: /scratch/shared/beegfs/gabrijel/m2l/mini Patch Size: 4 Device: cuda:6 ===================================================================================== Layer (type:depth-idx) Param # ===================================================================================== DiT 18,816 ├─PatchEmbed: 1-1 -- │ └─Conv2d: 2-1 6,528 ├─TimestepEmbedder: 1-2 -- │ └─Mlp: 2-2 -- │ │ └─Linear: 3-1 98,688 │ │ └─SiLU: 3-2 -- │ │ └─Linear: 3-3 147,840 ├─LabelEmbedder: 1-3 -- │ └─Embedding: 2-3 4,224 ├─ModuleList: 1-4 -- │ └─DiTBlock: 2-4 -- │ │ └─LayerNorm: 3-4 -- │ │ └─MultiheadAttention: 3-5 591,360 │ │ └─LayerNorm: 3-6 -- │ │ └─Mlp: 3-7 1,181,568 │ │ └─Sequential: 3-8 887,040 │ └─DiTBlock: 2-5 -- │ │ └─LayerNorm: 3-9 -- │ │ └─MultiheadAttention: 3-10 591,360 │ │ └─LayerNorm: 3-11 -- │ │ └─Mlp: 3-12 1,181,568 │ │ └─Sequential: 3-13 887,040 │ └─DiTBlock: 2-6 -- │ │ └─LayerNorm: 3-14 -- │ │ └─MultiheadAttention: 3-15 591,360 │ │ └─LayerNorm: 3-16 -- │ │ └─Mlp: 3-17 1,181,568 │ │ └─Sequential: 3-18 887,040 │ └─DiTBlock: 2-7 -- │ │ └─LayerNorm: 3-19 -- │ │ └─MultiheadAttention: 3-20 591,360 │ │ └─LayerNorm: 3-21 -- │ │ └─Mlp: 3-22 1,181,568 │ │ └─Sequential: 3-23 887,040 │ └─DiTBlock: 2-8 -- │ │ └─LayerNorm: 3-24 -- │ │ └─MultiheadAttention: 3-25 591,360 │ │ └─LayerNorm: 3-26 -- │ │ └─Mlp: 3-27 1,181,568 │ │ └─Sequential: 3-28 887,040 │ └─DiTBlock: 2-9 -- │ │ └─LayerNorm: 3-29 -- │ │ └─MultiheadAttention: 3-30 591,360 │ │ └─LayerNorm: 3-31 -- │ │ └─Mlp: 3-32 1,181,568 │ │ └─Sequential: 3-33 887,040 ├─FinalLayer: 1-5 -- │ └─LayerNorm: 2-10 -- │ └─Linear: 2-11 6,160 │ └─Sequential: 2-12 -- │ │ └─SiLU: 3-34 -- │ │ └─Linear: 3-35 295,680 ├─Unpatchify: 1-6 -- ===================================================================================== Total params: 16,537,744 Trainable params: 16,518,928 Non-trainable params: 18,816 ===================================================================================== EPOCH: 1 Loss at step 0: 1.0038596391677856 Loss at step 50: 0.2918556332588196 Loss at step 100: 0.18047790229320526 Loss at step 150: 0.17205047607421875 Loss at step 200: 0.1530153602361679 Loss at step 250: 0.14763256907463074 Loss at step 300: 0.1537582129240036 Loss at step 350: 0.13850589096546173 Loss at step 400: 0.13459451496601105 Loss at step 450: 0.14671212434768677 Loss at step 500: 0.14125366508960724 Loss at step 550: 0.14454728364944458 Loss at step 600: 0.11684989929199219 Loss at step 650: 0.11971846967935562 Loss at step 700: 0.1362464278936386 Loss at step 750: 0.13384230434894562 Loss at step 800: 0.12015460431575775 Loss at step 850: 0.10912631452083588 Loss at step 900: 0.1076808050274849 Mean training loss after epoch 1: 0.17505858766276444 EPOCH: 2 Loss at step 0: 0.09970872849225998 Loss at step 50: 0.11535374075174332 Loss at step 100: 0.08777402341365814 Loss at step 150: 0.094089075922966 Loss at step 200: 0.09134402126073837 Loss at step 250: 0.10842675715684891 Loss at step 300: 0.10726024210453033 Loss at step 350: 0.10613532364368439 Loss at step 400: 0.08223216980695724 Loss at step 450: 0.10601098835468292 Loss at step 500: 0.10789067298173904 Loss at step 550: 0.08660466223955154 Loss at step 600: 0.09686139971017838 Loss at step 650: 0.11000780761241913 Loss at step 700: 0.11315062642097473 Loss at step 750: 0.11132150888442993 Loss at step 800: 0.09282153099775314 Loss at step 850: 0.13458004593849182 Loss at step 900: 0.10913736373186111 Mean training loss after epoch 2: 0.10543004641019459 EPOCH: 3 Loss at step 0: 0.0981135442852974 Loss at step 50: 0.08520153164863586 Loss at step 100: 0.10994406044483185 Loss at step 150: 0.10289637744426727 Loss at step 200: 0.08958550542593002 Loss at step 250: 0.08982916921377182 Loss at step 300: 0.08209376037120819 Loss at step 350: 0.10257779061794281 Loss at step 400: 0.08951161801815033 Loss at step 450: 0.0889066532254219 Loss at step 500: 0.08889805525541306 Loss at step 550: 0.08664335310459137 Loss at step 600: 0.09030818939208984 Loss at step 650: 0.08433490991592407 Loss at step 700: 0.10038868337869644 Loss at step 750: 0.0875115692615509 Loss at step 800: 0.0819409042596817 Loss at step 850: 0.09024179726839066 Loss at step 900: 0.11753983050584793 Mean training loss after epoch 3: 0.09616819024880303 EPOCH: 4 Loss at step 0: 0.11792279779911041 Loss at step 50: 0.09628928452730179 Loss at step 100: 0.09041044116020203 Loss at step 150: 0.0792309045791626 Loss at step 200: 0.10769736766815186 Loss at step 250: 0.09465845674276352 Loss at step 300: 0.08893964439630508 Loss at step 350: 0.07811836153268814 Loss at step 400: 0.09439677745103836 Loss at step 450: 0.08513002842664719 Loss at step 500: 0.08040136098861694 Loss at step 550: 0.08757427334785461 Loss at step 600: 0.09534512460231781 Loss at step 650: 0.09617640823125839 Loss at step 700: 0.09664308279752731 Loss at step 750: 0.0812828540802002 Loss at step 800: 0.10377979278564453 Loss at step 850: 0.08702073246240616 Loss at step 900: 0.08119349181652069 Mean training loss after epoch 4: 0.09236355998050938 EPOCH: 5 Loss at step 0: 0.10263672471046448 Loss at step 50: 0.07507966458797455 Loss at step 100: 0.09297768771648407 Loss at step 150: 0.07523129880428314 Loss at step 200: 0.09416799992322922 Loss at step 250: 0.09460284560918808 Loss at step 300: 0.10723645985126495 Loss at step 350: 0.07740838080644608 Loss at step 400: 0.07607519626617432 Loss at step 450: 0.07258065044879913 Loss at step 500: 0.09733420610427856 Loss at step 550: 0.07502114027738571 Loss at step 600: 0.09033571928739548 Loss at step 650: 0.08851262181997299 Loss at step 700: 0.06531966477632523 Loss at step 750: 0.06647845357656479 Loss at step 800: 0.08263806998729706 Loss at step 850: 0.08561213314533234 Loss at step 900: 0.08004341274499893 Mean training loss after epoch 5: 0.08205677618159414 EPOCH: 6 Loss at step 0: 0.08822717517614365 Loss at step 50: 0.06571617722511292 Loss at step 100: 0.08278663456439972 Loss at step 150: 0.06746267527341843 Loss at step 200: 0.08326079696416855 Loss at step 250: 0.08169841766357422 Loss at step 300: 0.08837293088436127 Loss at step 350: 0.061628058552742004 Loss at step 400: 0.0885109081864357 Loss at step 450: 0.08764369785785675 Loss at step 500: 0.0808992013335228 Loss at step 550: 0.06908903270959854 Loss at step 600: 0.06753462553024292 Loss at step 650: 0.06954791396856308 Loss at step 700: 0.0895712673664093 Loss at step 750: 0.07059746235609055 Loss at step 800: 0.07869650423526764 Loss at step 850: 0.07138389348983765 Loss at step 900: 0.07961047440767288 Mean training loss after epoch 6: 0.07296937145689912 EPOCH: 7 Loss at step 0: 0.0778558999300003 Loss at step 50: 0.06913076341152191 Loss at step 100: 0.08280720561742783 Loss at step 150: 0.05486404150724411 Loss at step 200: 0.05064312368631363 Loss at step 250: 0.0649154931306839 Loss at step 300: 0.07472442090511322 Loss at step 350: 0.05659928917884827 Loss at step 400: 0.08627942204475403 Loss at step 450: 0.06120423600077629 Loss at step 500: 0.07798167318105698 Loss at step 550: 0.05884985998272896 Loss at step 600: 0.08704529702663422 Loss at step 650: 0.06152969226241112 Loss at step 700: 0.060185108333826065 Loss at step 750: 0.06525897234678268 Loss at step 800: 0.09210710972547531 Loss at step 850: 0.05309052765369415 Loss at step 900: 0.07554454356431961 Mean training loss after epoch 7: 0.06944493800878271 EPOCH: 8 Loss at step 0: 0.0724152997136116 Loss at step 50: 0.0804850235581398 Loss at step 100: 0.05591423064470291 Loss at step 150: 0.08297447115182877 Loss at step 200: 0.0653320774435997 Loss at step 250: 0.05818537250161171 Loss at step 300: 0.06279119849205017 Loss at step 350: 0.07660610973834991 Loss at step 400: 0.05760108679533005 Loss at step 450: 0.07492105662822723 Loss at step 500: 0.05119981989264488 Loss at step 550: 0.07763509452342987 Loss at step 600: 0.05997240170836449 Loss at step 650: 0.08027646690607071 Loss at step 700: 0.06130778044462204 Loss at step 750: 0.0692250058054924 Loss at step 800: 0.05851806700229645 Loss at step 850: 0.07936737686395645 Loss at step 900: 0.05879294499754906 Mean training loss after epoch 8: 0.06655077190239674 EPOCH: 9 Loss at step 0: 0.055418383330106735 Loss at step 50: 0.08309626579284668 Loss at step 100: 0.05595369264483452 Loss at step 150: 0.05822937563061714 Loss at step 200: 0.059979554265737534 Loss at step 250: 0.07474486529827118 Loss at step 300: 0.05400432273745537 Loss at step 350: 0.06315609812736511 Loss at step 400: 0.05557180941104889 Loss at step 450: 0.06566519290208817 Loss at step 500: 0.05524474009871483 Loss at step 550: 0.09070457518100739 Loss at step 600: 0.053429923951625824 Loss at step 650: 0.050164055079221725 Loss at step 700: 0.05440102145075798 Loss at step 750: 0.05381234362721443 Loss at step 800: 0.07054930925369263 Loss at step 850: 0.061064671725034714 Loss at step 900: 0.09384455531835556 Mean training loss after epoch 9: 0.06508908228976513 EPOCH: 10 Loss at step 0: 0.06141402944922447 Loss at step 50: 0.0563209094107151 Loss at step 100: 0.07178563624620438 Loss at step 150: 0.0800415426492691 Loss at step 200: 0.05884641408920288 Loss at step 250: 0.0751492902636528 Loss at step 300: 0.05504678189754486 Loss at step 350: 0.06084134429693222 Loss at step 400: 0.07245956361293793 Loss at step 450: 0.04683047905564308 Loss at step 500: 0.061160437762737274 Loss at step 550: 0.062285907566547394 Loss at step 600: 0.062828928232193 Loss at step 650: 0.07543252408504486 Loss at step 700: 0.06915324181318283 Loss at step 750: 0.05954427644610405 Loss at step 800: 0.05505345016717911 Loss at step 850: 0.061922844499349594 Loss at step 900: 0.07354437559843063 Mean training loss after epoch 10: 0.06344215572674645 EPOCH: 11 Loss at step 0: 0.066066674888134 Loss at step 50: 0.05696279928088188 Loss at step 100: 0.051848508417606354 Loss at step 150: 0.07049939036369324 Loss at step 200: 0.06011848896741867 Loss at step 250: 0.06228229030966759 Loss at step 300: 0.05815320834517479 Loss at step 350: 0.09037895500659943 Loss at step 400: 0.06471290439367294 Loss at step 450: 0.06570009887218475 Loss at step 500: 0.05334730073809624 Loss at step 550: 0.06077706068754196 Loss at step 600: 0.05727175995707512 Loss at step 650: 0.0621584989130497 Loss at step 700: 0.055851925164461136 Loss at step 750: 0.057651665061712265 Loss at step 800: 0.05519440770149231 Loss at step 850: 0.06233476474881172 Loss at step 900: 0.06170308589935303 Mean training loss after epoch 11: 0.06235234221336303 EPOCH: 12 Loss at step 0: 0.05750640481710434 Loss at step 50: 0.05570727959275246 Loss at step 100: 0.05462135374546051 Loss at step 150: 0.055836841464042664 Loss at step 200: 0.06629905849695206 Loss at step 250: 0.04664478078484535 Loss at step 300: 0.06300309300422668 Loss at step 350: 0.05601122975349426 Loss at step 400: 0.060761407017707825 Loss at step 450: 0.05825050547719002 Loss at step 500: 0.06259450316429138 Loss at step 550: 0.07740748673677444 Loss at step 600: 0.05285445600748062 Loss at step 650: 0.06447727233171463 Loss at step 700: 0.05040965601801872 Loss at step 750: 0.058349307626485825 Loss at step 800: 0.06772365421056747 Loss at step 850: 0.04651504009962082 Loss at step 900: 0.051043275743722916 Mean training loss after epoch 12: 0.06125331478618355 EPOCH: 13 Loss at step 0: 0.07440580427646637 Loss at step 50: 0.054357703775167465 Loss at step 100: 0.05317981168627739 Loss at step 150: 0.07111787050962448 Loss at step 200: 0.10059788823127747 Loss at step 250: 0.08734046667814255 Loss at step 300: 0.0575638972222805 Loss at step 350: 0.06608319282531738 Loss at step 400: 0.05960310623049736 Loss at step 450: 0.08260594308376312 Loss at step 500: 0.0715898647904396 Loss at step 550: 0.06309525668621063 Loss at step 600: 0.051694974303245544 Loss at step 650: 0.058969646692276 Loss at step 700: 0.04857061430811882 Loss at step 750: 0.0624336339533329 Loss at step 800: 0.045061659067869186 Loss at step 850: 0.07230044901371002 Loss at step 900: 0.0814652368426323 Mean training loss after epoch 13: 0.05995856191732609 EPOCH: 14 Loss at step 0: 0.0533636137843132 Loss at step 50: 0.06731968373060226 Loss at step 100: 0.053199898451566696 Loss at step 150: 0.05677398294210434 Loss at step 200: 0.09122058749198914 Loss at step 250: 0.0725795179605484 Loss at step 300: 0.07084371149539948 Loss at step 350: 0.07957903295755386 Loss at step 400: 0.0708513930439949 Loss at step 450: 0.0755411684513092 Loss at step 500: 0.04351196438074112 Loss at step 550: 0.05537440627813339 Loss at step 600: 0.05488364025950432 Loss at step 650: 0.06846979260444641 Loss at step 700: 0.0554904043674469 Loss at step 750: 0.05296259745955467 Loss at step 800: 0.07148228585720062 Loss at step 850: 0.0633445531129837 Loss at step 900: 0.055035416036844254 Mean training loss after epoch 14: 0.05996675037943732 EPOCH: 15 Loss at step 0: 0.06165888160467148 Loss at step 50: 0.05083753168582916 Loss at step 100: 0.052202798426151276 Loss at step 150: 0.08322983235120773 Loss at step 200: 0.05534429848194122 Loss at step 250: 0.060774676501750946 Loss at step 300: 0.06331092119216919 Loss at step 350: 0.046184320002794266 Loss at step 400: 0.052470020949840546 Loss at step 450: 0.060590557754039764 Loss at step 500: 0.06334500014781952 Loss at step 550: 0.05489909276366234 Loss at step 600: 0.052383169531822205 Loss at step 650: 0.047400590032339096 Loss at step 700: 0.07057070732116699 Loss at step 750: 0.06825336813926697 Loss at step 800: 0.06965584307909012 Loss at step 850: 0.03980647400021553 Loss at step 900: 0.056994251906871796 Mean training loss after epoch 15: 0.059404475674001395 EPOCH: 16 Loss at step 0: 0.047754909843206406 Loss at step 50: 0.05513385683298111 Loss at step 100: 0.06287753582000732 Loss at step 150: 0.054692547768354416 Loss at step 200: 0.048051867634058 Loss at step 250: 0.06173555925488472 Loss at step 300: 0.052458878606557846 Loss at step 350: 0.047613900154829025 Loss at step 400: 0.04803905636072159 Loss at step 450: 0.04426709935069084 Loss at step 500: 0.05184407904744148 Loss at step 550: 0.05601557716727257 Loss at step 600: 0.0557762049138546 Loss at step 650: 0.07856906205415726 Loss at step 700: 0.057997409254312515 Loss at step 750: 0.06079675629734993 Loss at step 800: 0.05392754077911377 Loss at step 850: 0.05010053142905235 Loss at step 900: 0.06751120835542679 Mean training loss after epoch 16: 0.05834122999771826 EPOCH: 17 Loss at step 0: 0.05351615324616432 Loss at step 50: 0.0692821592092514 Loss at step 100: 0.06206255778670311 Loss at step 150: 0.05497032031416893 Loss at step 200: 0.07043731212615967 Loss at step 250: 0.05092174559831619 Loss at step 300: 0.06052644923329353 Loss at step 350: 0.052424702793359756 Loss at step 400: 0.04660550504922867 Loss at step 450: 0.059270936995744705 Loss at step 500: 0.060510698705911636 Loss at step 550: 0.054464999586343765 Loss at step 600: 0.07570153474807739 Loss at step 650: 0.055116117000579834 Loss at step 700: 0.054772503674030304 Loss at step 750: 0.06182297319173813 Loss at step 800: 0.056405533105134964 Loss at step 850: 0.047215867787599564 Loss at step 900: 0.06804905831813812 Mean training loss after epoch 17: 0.05840665373657304 EPOCH: 18 Loss at step 0: 0.08443731814622879 Loss at step 50: 0.05124342069029808 Loss at step 100: 0.04487810283899307 Loss at step 150: 0.05078786611557007 Loss at step 200: 0.05463608726859093 Loss at step 250: 0.05625702440738678 Loss at step 300: 0.0432308055460453 Loss at step 350: 0.04748641699552536 Loss at step 400: 0.0595870316028595 Loss at step 450: 0.0445963591337204 Loss at step 500: 0.05696718767285347 Loss at step 550: 0.06103126332163811 Loss at step 600: 0.05491870641708374 Loss at step 650: 0.058987367898225784 Loss at step 700: 0.06525983661413193 Loss at step 750: 0.08430144190788269 Loss at step 800: 0.05236179009079933 Loss at step 850: 0.06731224805116653 Loss at step 900: 0.051377072930336 Mean training loss after epoch 18: 0.05797024472340592 EPOCH: 19 Loss at step 0: 0.0587436817586422 Loss at step 50: 0.05327077582478523 Loss at step 100: 0.06921906024217606 Loss at step 150: 0.04971097409725189 Loss at step 200: 0.050753261893987656 Loss at step 250: 0.05687187984585762 Loss at step 300: 0.07781829684972763 Loss at step 350: 0.054319459944963455 Loss at step 400: 0.08025463670492172 Loss at step 450: 0.05237594246864319 Loss at step 500: 0.046143461018800735 Loss at step 550: 0.04757945239543915 Loss at step 600: 0.08994781225919724 Loss at step 650: 0.059235502034425735 Loss at step 700: 0.06530774384737015 Loss at step 750: 0.05232860520482063 Loss at step 800: 0.04445043206214905 Loss at step 850: 0.057392507791519165 Loss at step 900: 0.06377249211072922 Mean training loss after epoch 19: 0.057138487112833494 EPOCH: 20 Loss at step 0: 0.07780443131923676 Loss at step 50: 0.04817727208137512 Loss at step 100: 0.05957942083477974 Loss at step 150: 0.05988895148038864 Loss at step 200: 0.05702034756541252 Loss at step 250: 0.050547607243061066 Loss at step 300: 0.048354312777519226 Loss at step 350: 0.04091460257768631 Loss at step 400: 0.048479460179805756 Loss at step 450: 0.05884775519371033 Loss at step 500: 0.06794746220111847 Loss at step 550: 0.06735465675592422 Loss at step 600: 0.059218063950538635 Loss at step 650: 0.047898028045892715 Loss at step 700: 0.05585755407810211 Loss at step 750: 0.039810944348573685 Loss at step 800: 0.053026411682367325 Loss at step 850: 0.05412264168262482 Loss at step 900: 0.04698286950588226 Mean training loss after epoch 20: 0.056786933786738145 EPOCH: 21 Loss at step 0: 0.057807959616184235 Loss at step 50: 0.04883316159248352 Loss at step 100: 0.05270577222108841 Loss at step 150: 0.05998265743255615 Loss at step 200: 0.07046522945165634 Loss at step 250: 0.052538059651851654 Loss at step 300: 0.07746506482362747 Loss at step 350: 0.051633890718221664 Loss at step 400: 0.05229220539331436 Loss at step 450: 0.08403375744819641 Loss at step 500: 0.04291244223713875 Loss at step 550: 0.06407346576452255 Loss at step 600: 0.04788174852728844 Loss at step 650: 0.06721252202987671 Loss at step 700: 0.04440097510814667 Loss at step 750: 0.04826749488711357 Loss at step 800: 0.0564039945602417 Loss at step 850: 0.06968333572149277 Loss at step 900: 0.04944631829857826 Mean training loss after epoch 21: 0.05656386621153431 EPOCH: 22 Loss at step 0: 0.06197817251086235 Loss at step 50: 0.04409079626202583 Loss at step 100: 0.0533909946680069 Loss at step 150: 0.06652917712926865 Loss at step 200: 0.05246768891811371 Loss at step 250: 0.04395615682005882 Loss at step 300: 0.05014088749885559 Loss at step 350: 0.06042720377445221 Loss at step 400: 0.0751257911324501 Loss at step 450: 0.05865512415766716 Loss at step 500: 0.04442135989665985 Loss at step 550: 0.04846680164337158 Loss at step 600: 0.05935171619057655 Loss at step 650: 0.04699040204286575 Loss at step 700: 0.04791724681854248 Loss at step 750: 0.05543459579348564 Loss at step 800: 0.07073624432086945 Loss at step 850: 0.04412810876965523 Loss at step 900: 0.06565819680690765 Mean training loss after epoch 22: 0.055763628163031424 EPOCH: 23 Loss at step 0: 0.05416282266378403 Loss at step 50: 0.04240821301937103 Loss at step 100: 0.050583887845277786 Loss at step 150: 0.06552695482969284 Loss at step 200: 0.05410993844270706 Loss at step 250: 0.05199902132153511 Loss at step 300: 0.04144735634326935 Loss at step 350: 0.04426245763897896 Loss at step 400: 0.051021307706832886 Loss at step 450: 0.05197690799832344 Loss at step 500: 0.06443125754594803 Loss at step 550: 0.07182082533836365 Loss at step 600: 0.049390703439712524 Loss at step 650: 0.05035512149333954 Loss at step 700: 0.04825124517083168 Loss at step 750: 0.043123360723257065 Loss at step 800: 0.0702730119228363 Loss at step 850: 0.0739336684346199 Loss at step 900: 0.04254326596856117 Mean training loss after epoch 23: 0.05587576825354399 EPOCH: 24 Loss at step 0: 0.05578822270035744 Loss at step 50: 0.0565260574221611 Loss at step 100: 0.04038270562887192 Loss at step 150: 0.05259323865175247 Loss at step 200: 0.05878034234046936 Loss at step 250: 0.06186293438076973 Loss at step 300: 0.0606803335249424 Loss at step 350: 0.07158032804727554 Loss at step 400: 0.04844276234507561 Loss at step 450: 0.045417290180921555 Loss at step 500: 0.04860280454158783 Loss at step 550: 0.04883131757378578 Loss at step 600: 0.06629784405231476 Loss at step 650: 0.0655255913734436 Loss at step 700: 0.053581807762384415 Loss at step 750: 0.05660425126552582 Loss at step 800: 0.04831215739250183 Loss at step 850: 0.06627508997917175 Loss at step 900: 0.04953145235776901 Mean training loss after epoch 24: 0.0558088175547339 EPOCH: 25 Loss at step 0: 0.06486007571220398 Loss at step 50: 0.049438633024692535 Loss at step 100: 0.0520949549973011 Loss at step 150: 0.07686775177717209 Loss at step 200: 0.05523598939180374 Loss at step 250: 0.0625782385468483 Loss at step 300: 0.051135655492544174 Loss at step 350: 0.049014825373888016 Loss at step 400: 0.05322359874844551 Loss at step 450: 0.045524563640356064 Loss at step 500: 0.052215781062841415 Loss at step 550: 0.05354513227939606 Loss at step 600: 0.07494399696588516 Loss at step 650: 0.06042160838842392 Loss at step 700: 0.05432344973087311 Loss at step 750: 0.05213435739278793 Loss at step 800: 0.047573838382959366 Loss at step 850: 0.04453451186418533 Loss at step 900: 0.07137525826692581 Mean training loss after epoch 25: 0.05541786125728062 EPOCH: 26 Loss at step 0: 0.04548483341932297 Loss at step 50: 0.05989157781004906 Loss at step 100: 0.04591457545757294 Loss at step 150: 0.0610799565911293 Loss at step 200: 0.06120055913925171 Loss at step 250: 0.04591960087418556 Loss at step 300: 0.07416683435440063 Loss at step 350: 0.050304245203733444 Loss at step 400: 0.05474277213215828 Loss at step 450: 0.08211886882781982 Loss at step 500: 0.04335871338844299 Loss at step 550: 0.04403204843401909 Loss at step 600: 0.05541656166315079 Loss at step 650: 0.04761388152837753 Loss at step 700: 0.06374719738960266 Loss at step 750: 0.04973401874303818 Loss at step 800: 0.06757289171218872 Loss at step 850: 0.045605726540088654 Loss at step 900: 0.05641040951013565 Mean training loss after epoch 26: 0.05483940118617976 EPOCH: 27 Loss at step 0: 0.041867662221193314 Loss at step 50: 0.04870697855949402 Loss at step 100: 0.09304147958755493 Loss at step 150: 0.04961562529206276 Loss at step 200: 0.043587878346443176 Loss at step 250: 0.04243595525622368 Loss at step 300: 0.052153293043375015 Loss at step 350: 0.04960649833083153 Loss at step 400: 0.05281135067343712 Loss at step 450: 0.050468795001506805 Loss at step 500: 0.0512566938996315 Loss at step 550: 0.0785260945558548 Loss at step 600: 0.05762406811118126 Loss at step 650: 0.04962537810206413 Loss at step 700: 0.04903369024395943 Loss at step 750: 0.05728844925761223 Loss at step 800: 0.0867062360048294 Loss at step 850: 0.06505534052848816 Loss at step 900: 0.050107233226299286 Mean training loss after epoch 27: 0.054998563542993845 EPOCH: 28 Loss at step 0: 0.047736089676618576 Loss at step 50: 0.05952451378107071 Loss at step 100: 0.05091068521142006 Loss at step 150: 0.044760167598724365 Loss at step 200: 0.04659377038478851 Loss at step 250: 0.038576625287532806 Loss at step 300: 0.05428094044327736 Loss at step 350: 0.06567058712244034 Loss at step 400: 0.04903387278318405 Loss at step 450: 0.08076394349336624 Loss at step 500: 0.05038176104426384 Loss at step 550: 0.05302233621478081 Loss at step 600: 0.05289943143725395 Loss at step 650: 0.05580538883805275 Loss at step 700: 0.046349260956048965 Loss at step 750: 0.04074665531516075 Loss at step 800: 0.051596611738204956 Loss at step 850: 0.04613744467496872 Loss at step 900: 0.05582321807742119 Mean training loss after epoch 28: 0.0545806220567811 EPOCH: 29 Loss at step 0: 0.055963754653930664 Loss at step 50: 0.05352073535323143 Loss at step 100: 0.05439290776848793 Loss at step 150: 0.060438308864831924 Loss at step 200: 0.053520265966653824 Loss at step 250: 0.0604640357196331 Loss at step 300: 0.04805484041571617 Loss at step 350: 0.05180160328745842 Loss at step 400: 0.052958741784095764 Loss at step 450: 0.052045051008462906 Loss at step 500: 0.05218474566936493 Loss at step 550: 0.05290327966213226 Loss at step 600: 0.049060624092817307 Loss at step 650: 0.06915456801652908 Loss at step 700: 0.04746730998158455 Loss at step 750: 0.04791809618473053 Loss at step 800: 0.04840479791164398 Loss at step 850: 0.045148156583309174 Loss at step 900: 0.03663087636232376 Mean training loss after epoch 29: 0.054162899715337415 EPOCH: 30 Loss at step 0: 0.04772668331861496 Loss at step 50: 0.07788056880235672 Loss at step 100: 0.06185344606637955 Loss at step 150: 0.05172247812151909 Loss at step 200: 0.06506790965795517 Loss at step 250: 0.041413258761167526 Loss at step 300: 0.047469720244407654 Loss at step 350: 0.06524337083101273 Loss at step 400: 0.060637909919023514 Loss at step 450: 0.05373001471161842 Loss at step 500: 0.05846158042550087 Loss at step 550: 0.07577697932720184 Loss at step 600: 0.05092581734061241 Loss at step 650: 0.05859656631946564 Loss at step 700: 0.0693199634552002 Loss at step 750: 0.05934913828969002 Loss at step 800: 0.048118624836206436 Loss at step 850: 0.05650344491004944 Loss at step 900: 0.05311756953597069 Mean training loss after epoch 30: 0.054990000558147296 EPOCH: 31 Loss at step 0: 0.06625262647867203 Loss at step 50: 0.03297751396894455 Loss at step 100: 0.045132461935281754 Loss at step 150: 0.04686988145112991 Loss at step 200: 0.04410744085907936 Loss at step 250: 0.0581258162856102 Loss at step 300: 0.0649137794971466 Loss at step 350: 0.049532849341630936 Loss at step 400: 0.04994118958711624 Loss at step 450: 0.04786548763513565 Loss at step 500: 0.0503094345331192 Loss at step 550: 0.04206209257245064 Loss at step 600: 0.07669640332460403 Loss at step 650: 0.05510042980313301 Loss at step 700: 0.05862889811396599 Loss at step 750: 0.07545014470815659 Loss at step 800: 0.07135963439941406 Loss at step 850: 0.05121200159192085 Loss at step 900: 0.06599773466587067 Mean training loss after epoch 31: 0.054277379399360115 EPOCH: 32 Loss at step 0: 0.0633576363325119 Loss at step 50: 0.06285562366247177 Loss at step 100: 0.050578854978084564 Loss at step 150: 0.04903421550989151 Loss at step 200: 0.060103289783000946 Loss at step 250: 0.06002804636955261 Loss at step 300: 0.05972660332918167 Loss at step 350: 0.05613353103399277 Loss at step 400: 0.05059187114238739 Loss at step 450: 0.046884823590517044 Loss at step 500: 0.052705246955156326 Loss at step 550: 0.05232221633195877 Loss at step 600: 0.04427788034081459 Loss at step 650: 0.06953568011522293 Loss at step 700: 0.05432060733437538 Loss at step 750: 0.05515534430742264 Loss at step 800: 0.046122852712869644 Loss at step 850: 0.05044335499405861 Loss at step 900: 0.05616920068860054 Mean training loss after epoch 32: 0.05364529747984557 EPOCH: 33 Loss at step 0: 0.06812635809183121 Loss at step 50: 0.047215841710567474 Loss at step 100: 0.04166899248957634 Loss at step 150: 0.051177382469177246 Loss at step 200: 0.049012914299964905 Loss at step 250: 0.07571645081043243 Loss at step 300: 0.051014676690101624 Loss at step 350: 0.05759096145629883 Loss at step 400: 0.05065140500664711 Loss at step 450: 0.04851923882961273 Loss at step 500: 0.07313142716884613 Loss at step 550: 0.05073375999927521 Loss at step 600: 0.042708367109298706 Loss at step 650: 0.05150367692112923 Loss at step 700: 0.0667271688580513 Loss at step 750: 0.04989415034651756 Loss at step 800: 0.06956489384174347 Loss at step 850: 0.05254995822906494 Loss at step 900: 0.04482956603169441 Mean training loss after epoch 33: 0.05375767324064205 EPOCH: 34 Loss at step 0: 0.06896451860666275 Loss at step 50: 0.046478066593408585 Loss at step 100: 0.045656491070985794 Loss at step 150: 0.0686042383313179 Loss at step 200: 0.041586849838495255 Loss at step 250: 0.06663074344396591 Loss at step 300: 0.044195640832185745 Loss at step 350: 0.05305260047316551 Loss at step 400: 0.06499624997377396 Loss at step 450: 0.06060551106929779 Loss at step 500: 0.05288441851735115 Loss at step 550: 0.04520798474550247 Loss at step 600: 0.04967028647661209 Loss at step 650: 0.07121746987104416 Loss at step 700: 0.04694961756467819 Loss at step 750: 0.04730687662959099 Loss at step 800: 0.04450223222374916 Loss at step 850: 0.04835294187068939 Loss at step 900: 0.047444675117731094 Mean training loss after epoch 34: 0.05390497635422485 EPOCH: 35 Loss at step 0: 0.04912083223462105 Loss at step 50: 0.045205529779195786 Loss at step 100: 0.04346419870853424 Loss at step 150: 0.044354408979415894 Loss at step 200: 0.045831941068172455 Loss at step 250: 0.04960440471768379 Loss at step 300: 0.055836938321590424 Loss at step 350: 0.06396603584289551 Loss at step 400: 0.05111861601471901 Loss at step 450: 0.07385130971670151 Loss at step 500: 0.047456711530685425 Loss at step 550: 0.04401195049285889 Loss at step 600: 0.0892113447189331 Loss at step 650: 0.05048896744847298 Loss at step 700: 0.044888418167829514 Loss at step 750: 0.04775058478116989 Loss at step 800: 0.04808545485138893 Loss at step 850: 0.06727813929319382 Loss at step 900: 0.047582004219293594 Mean training loss after epoch 35: 0.05338834169338634 EPOCH: 36 Loss at step 0: 0.04541529342532158 Loss at step 50: 0.04370921850204468 Loss at step 100: 0.04396657645702362 Loss at step 150: 0.04812668263912201 Loss at step 200: 0.047855425626039505 Loss at step 250: 0.06259651482105255 Loss at step 300: 0.05623249337077141 Loss at step 350: 0.04796265438199043 Loss at step 400: 0.05409134924411774 Loss at step 450: 0.046287283301353455 Loss at step 500: 0.04403143376111984 Loss at step 550: 0.04554906487464905 Loss at step 600: 0.05372817441821098 Loss at step 650: 0.05676133185625076 Loss at step 700: 0.07154540717601776 Loss at step 750: 0.04281185567378998 Loss at step 800: 0.04032238572835922 Loss at step 850: 0.04623551294207573 Loss at step 900: 0.04021555185317993 Mean training loss after epoch 36: 0.053192674672838725 EPOCH: 37 Loss at step 0: 0.045762062072753906 Loss at step 50: 0.05676257237792015 Loss at step 100: 0.05107102543115616 Loss at step 150: 0.06450726836919785 Loss at step 200: 0.054469574242830276 Loss at step 250: 0.04686671122908592 Loss at step 300: 0.05743500962853432 Loss at step 350: 0.05544617772102356 Loss at step 400: 0.04504067450761795 Loss at step 450: 0.03776704519987106 Loss at step 500: 0.043384842574596405 Loss at step 550: 0.06852810829877853 Loss at step 600: 0.044159650802612305 Loss at step 650: 0.051487989723682404 Loss at step 700: 0.051597028970718384 Loss at step 750: 0.046976156532764435 Loss at step 800: 0.050152573734521866 Loss at step 850: 0.05712180212140083 Loss at step 900: 0.051823604851961136 Mean training loss after epoch 37: 0.05276597742396377 EPOCH: 38 Loss at step 0: 0.06339272111654282 Loss at step 50: 0.05461210384964943 Loss at step 100: 0.04900922626256943 Loss at step 150: 0.05558406934142113 Loss at step 200: 0.06925886124372482 Loss at step 250: 0.04936544969677925 Loss at step 300: 0.06626597791910172 Loss at step 350: 0.06773914396762848 Loss at step 400: 0.0761856883764267 Loss at step 450: 0.048226140439510345 Loss at step 500: 0.048241209238767624 Loss at step 550: 0.05734432861208916 Loss at step 600: 0.049652099609375 Loss at step 650: 0.04865444451570511 Loss at step 700: 0.047797683626413345 Loss at step 750: 0.0472390316426754 Loss at step 800: 0.06560857594013214 Loss at step 850: 0.05306312441825867 Loss at step 900: 0.04709041863679886 Mean training loss after epoch 38: 0.053274497580426586 EPOCH: 39 Loss at step 0: 0.06951993703842163 Loss at step 50: 0.06394577026367188 Loss at step 100: 0.047820109874010086 Loss at step 150: 0.05408184602856636 Loss at step 200: 0.07596515864133835 Loss at step 250: 0.04721921309828758 Loss at step 300: 0.04239613935351372 Loss at step 350: 0.05378371849656105 Loss at step 400: 0.043496035039424896 Loss at step 450: 0.043344225734472275 Loss at step 500: 0.04851044341921806 Loss at step 550: 0.049583181738853455 Loss at step 600: 0.06192677840590477 Loss at step 650: 0.0689084529876709 Loss at step 700: 0.07021592557430267 Loss at step 750: 0.048737261444330215 Loss at step 800: 0.051356635987758636 Loss at step 850: 0.0562000498175621 Loss at step 900: 0.05968385189771652 Mean training loss after epoch 39: 0.053126662230885614 EPOCH: 40 Loss at step 0: 0.0484396331012249 Loss at step 50: 0.05094202607870102 Loss at step 100: 0.04496720805764198 Loss at step 150: 0.04922233149409294 Loss at step 200: 0.0467192642390728 Loss at step 250: 0.06523916870355606 Loss at step 300: 0.06212552636861801 Loss at step 350: 0.052773166447877884 Loss at step 400: 0.0399431511759758 Loss at step 450: 0.06167420744895935 Loss at step 500: 0.06588062644004822 Loss at step 550: 0.051238760352134705 Loss at step 600: 0.05138370767235756 Loss at step 650: 0.0613965205848217 Loss at step 700: 0.03715534508228302 Loss at step 750: 0.05500869080424309 Loss at step 800: 0.07757848501205444 Loss at step 850: 0.06786368042230606 Loss at step 900: 0.0420406349003315 Mean training loss after epoch 40: 0.05336185254410767 EPOCH: 41 Loss at step 0: 0.04463280364871025 Loss at step 50: 0.042441241443157196 Loss at step 100: 0.04412682726979256 Loss at step 150: 0.07652612030506134 Loss at step 200: 0.04528549686074257 Loss at step 250: 0.042522359639406204 Loss at step 300: 0.055694784969091415 Loss at step 350: 0.05537368729710579 Loss at step 400: 0.05099685862660408 Loss at step 450: 0.07990245521068573 Loss at step 500: 0.042299382388591766 Loss at step 550: 0.050495702773332596 Loss at step 600: 0.05253590643405914 Loss at step 650: 0.042639926075935364 Loss at step 700: 0.04351387917995453 Loss at step 750: 0.050791047513484955 Loss at step 800: 0.05065562576055527 Loss at step 850: 0.056836359202861786 Loss at step 900: 0.04871829226613045 Mean training loss after epoch 41: 0.05192941717152148 EPOCH: 42 Loss at step 0: 0.05419398844242096 Loss at step 50: 0.061052631586790085 Loss at step 100: 0.04561636224389076 Loss at step 150: 0.04703420400619507 Loss at step 200: 0.04350092262029648 Loss at step 250: 0.052190735936164856 Loss at step 300: 0.05374899134039879 Loss at step 350: 0.049110352993011475 Loss at step 400: 0.05230622738599777 Loss at step 450: 0.04520367830991745 Loss at step 500: 0.04741860181093216 Loss at step 550: 0.061318445950746536 Loss at step 600: 0.040016982704401016 Loss at step 650: 0.08398542553186417 Loss at step 700: 0.04936767369508743 Loss at step 750: 0.054242901504039764 Loss at step 800: 0.06486527621746063 Loss at step 850: 0.040605101734399796 Loss at step 900: 0.04274871200323105 Mean training loss after epoch 42: 0.05313773624011194 EPOCH: 43 Loss at step 0: 0.07220982015132904 Loss at step 50: 0.04995967447757721 Loss at step 100: 0.05277238413691521 Loss at step 150: 0.04592238366603851 Loss at step 200: 0.03931228071451187 Loss at step 250: 0.04937140643596649 Loss at step 300: 0.05456802248954773 Loss at step 350: 0.05300365015864372 Loss at step 400: 0.06010204181075096 Loss at step 450: 0.04582206532359123 Loss at step 500: 0.0443120077252388 Loss at step 550: 0.04928284138441086 Loss at step 600: 0.05886274203658104 Loss at step 650: 0.03908972814679146 Loss at step 700: 0.053044721484184265 Loss at step 750: 0.05071185529232025 Loss at step 800: 0.04959995672106743 Loss at step 850: 0.04500409588217735 Loss at step 900: 0.060250043869018555 Mean training loss after epoch 43: 0.05242794456242371 EPOCH: 44 Loss at step 0: 0.06881649792194366 Loss at step 50: 0.06501111388206482 Loss at step 100: 0.04589168354868889 Loss at step 150: 0.052366703748703 Loss at step 200: 0.05281873419880867 Loss at step 250: 0.04559116065502167 Loss at step 300: 0.050950787961483 Loss at step 350: 0.04548067972064018 Loss at step 400: 0.06596677750349045 Loss at step 450: 0.04409659653902054 Loss at step 500: 0.05103202164173126 Loss at step 550: 0.04238968342542648 Loss at step 600: 0.05037658289074898 Loss at step 650: 0.062277957797050476 Loss at step 700: 0.0449141263961792 Loss at step 750: 0.06508783996105194 Loss at step 800: 0.0647672489285469 Loss at step 850: 0.040505051612854004 Loss at step 900: 0.04051791876554489 Mean training loss after epoch 44: 0.052238932316268936 EPOCH: 45 Loss at step 0: 0.04335169866681099 Loss at step 50: 0.05988471210002899 Loss at step 100: 0.0565020889043808 Loss at step 150: 0.03942599147558212 Loss at step 200: 0.04371253401041031 Loss at step 250: 0.04010947793722153 Loss at step 300: 0.054537948220968246 Loss at step 350: 0.043327514082193375 Loss at step 400: 0.044455356895923615 Loss at step 450: 0.04105052724480629 Loss at step 500: 0.050342559814453125 Loss at step 550: 0.05141071230173111 Loss at step 600: 0.047359269112348557 Loss at step 650: 0.05273369327187538 Loss at step 700: 0.05100448057055473 Loss at step 750: 0.05109340324997902 Loss at step 800: 0.05621657148003578 Loss at step 850: 0.049869049340486526 Loss at step 900: 0.05222681537270546 Mean training loss after epoch 45: 0.0520110372136206 EPOCH: 46 Loss at step 0: 0.04744008183479309 Loss at step 50: 0.09220264852046967 Loss at step 100: 0.04865335673093796 Loss at step 150: 0.0445791557431221 Loss at step 200: 0.04327036440372467 Loss at step 250: 0.040208108723163605 Loss at step 300: 0.04335105046629906 Loss at step 350: 0.06127457693219185 Loss at step 400: 0.04563213139772415 Loss at step 450: 0.04399702325463295 Loss at step 500: 0.06041733920574188 Loss at step 550: 0.08049262315034866 Loss at step 600: 0.050656549632549286 Loss at step 650: 0.04378483071923256 Loss at step 700: 0.06066746637225151 Loss at step 750: 0.053949978202581406 Loss at step 800: 0.060890331864356995 Loss at step 850: 0.04763341322541237 Loss at step 900: 0.045564115047454834 Mean training loss after epoch 46: 0.05214744370613398 EPOCH: 47 Loss at step 0: 0.06334493309259415 Loss at step 50: 0.043001845479011536 Loss at step 100: 0.041305288672447205 Loss at step 150: 0.07116203755140305 Loss at step 200: 0.05902948975563049 Loss at step 250: 0.03352877125144005 Loss at step 300: 0.06669014692306519 Loss at step 350: 0.05279366672039032 Loss at step 400: 0.0584820955991745 Loss at step 450: 0.04021114483475685 Loss at step 500: 0.04710566997528076 Loss at step 550: 0.046046867966651917 Loss at step 600: 0.06527300924062729 Loss at step 650: 0.0601801872253418 Loss at step 700: 0.04708945378661156 Loss at step 750: 0.04673488810658455 Loss at step 800: 0.04641760513186455 Loss at step 850: 0.04443147033452988 Loss at step 900: 0.07035021483898163 Mean training loss after epoch 47: 0.052334080571368305 EPOCH: 48 Loss at step 0: 0.048678286373615265 Loss at step 50: 0.04798053950071335 Loss at step 100: 0.05344396457076073 Loss at step 150: 0.06464274972677231 Loss at step 200: 0.04617174342274666 Loss at step 250: 0.06757453829050064 Loss at step 300: 0.05741308629512787 Loss at step 350: 0.06421269476413727 Loss at step 400: 0.043978359550237656 Loss at step 450: 0.04808632284402847 Loss at step 500: 0.06918931752443314 Loss at step 550: 0.04354654252529144 Loss at step 600: 0.0409148745238781 Loss at step 650: 0.04661271348595619 Loss at step 700: 0.042187172919511795 Loss at step 750: 0.043993350118398666 Loss at step 800: 0.04342823475599289 Loss at step 850: 0.041872017085552216 Loss at step 900: 0.050419386476278305 Mean training loss after epoch 48: 0.05174199299120318 EPOCH: 49 Loss at step 0: 0.059843577444553375 Loss at step 50: 0.06304727494716644 Loss at step 100: 0.049113962799310684 Loss at step 150: 0.06738635152578354 Loss at step 200: 0.0517294742166996 Loss at step 250: 0.0801587849855423 Loss at step 300: 0.06058383360505104 Loss at step 350: 0.04067930579185486 Loss at step 400: 0.052246954292058945 Loss at step 450: 0.07528789341449738 Loss at step 500: 0.08474183082580566 Loss at step 550: 0.0542464479804039 Loss at step 600: 0.046418387442827225 Loss at step 650: 0.051564257591962814 Loss at step 700: 0.06672357767820358 Loss at step 750: 0.04466120898723602 Loss at step 800: 0.043668415397405624 Loss at step 850: 0.05778983235359192 Loss at step 900: 0.056089360266923904 Mean training loss after epoch 49: 0.05222476798413532 EPOCH: 50 Loss at step 0: 0.07730726897716522 Loss at step 50: 0.039322927594184875 Loss at step 100: 0.06098679080605507 Loss at step 150: 0.04104066267609596 Loss at step 200: 0.04662679508328438 Loss at step 250: 0.06167072057723999 Loss at step 300: 0.05266662314534187 Loss at step 350: 0.04234598949551582 Loss at step 400: 0.04490789398550987 Loss at step 450: 0.054746489971876144 Loss at step 500: 0.04013847932219505 Loss at step 550: 0.05154792591929436 Loss at step 600: 0.05065382272005081 Loss at step 650: 0.04308348149061203 Loss at step 700: 0.04812760278582573 Loss at step 750: 0.040205128490924835 Loss at step 800: 0.050986457616090775 Loss at step 850: 0.04447365552186966 Loss at step 900: 0.050267353653907776 Mean training loss after epoch 50: 0.051064737971181044 EPOCH: 51 Loss at step 0: 0.05700429528951645 Loss at step 50: 0.06748168170452118 Loss at step 100: 0.05973386764526367 Loss at step 150: 0.04503392428159714 Loss at step 200: 0.06442605704069138 Loss at step 250: 0.051454853266477585 Loss at step 300: 0.057949457317590714 Loss at step 350: 0.07098916918039322 Loss at step 400: 0.05279741436243057 Loss at step 450: 0.05992886796593666 Loss at step 500: 0.04893937334418297 Loss at step 550: 0.04913891851902008 Loss at step 600: 0.0436481349170208 Loss at step 650: 0.0482383631169796 Loss at step 700: 0.0519651398062706 Loss at step 750: 0.05052124708890915 Loss at step 800: 0.0582507848739624 Loss at step 850: 0.04967383295297623 Loss at step 900: 0.05037849023938179 Mean training loss after epoch 51: 0.05160942872259409 EPOCH: 52 Loss at step 0: 0.04921506345272064 Loss at step 50: 0.04847161844372749 Loss at step 100: 0.051314450800418854 Loss at step 150: 0.06377735733985901 Loss at step 200: 0.057962074875831604 Loss at step 250: 0.06625580787658691 Loss at step 300: 0.048517704010009766 Loss at step 350: 0.06590788066387177 Loss at step 400: 0.04050039127469063 Loss at step 450: 0.0497954897582531 Loss at step 500: 0.03971201553940773 Loss at step 550: 0.05543612688779831 Loss at step 600: 0.044844381511211395 Loss at step 650: 0.05199620872735977 Loss at step 700: 0.059054743498563766 Loss at step 750: 0.0392836295068264 Loss at step 800: 0.04268956556916237 Loss at step 850: 0.0572165846824646 Loss at step 900: 0.048795077949762344 Mean training loss after epoch 52: 0.051823142093858486 EPOCH: 53 Loss at step 0: 0.04877086356282234 Loss at step 50: 0.051732469350099564 Loss at step 100: 0.05237935855984688 Loss at step 150: 0.053616929799318314 Loss at step 200: 0.0594794787466526 Loss at step 250: 0.0621161051094532 Loss at step 300: 0.05698757618665695 Loss at step 350: 0.040153756737709045 Loss at step 400: 0.06672519445419312 Loss at step 450: 0.04268636554479599 Loss at step 500: 0.05624125152826309 Loss at step 550: 0.04814796894788742 Loss at step 600: 0.05454114079475403 Loss at step 650: 0.048263661563396454 Loss at step 700: 0.049506936222314835 Loss at step 750: 0.045945215970277786 Loss at step 800: 0.045099228620529175 Loss at step 850: 0.037148721516132355 Loss at step 900: 0.049464672803878784 Mean training loss after epoch 53: 0.05160045661508783 EPOCH: 54 Loss at step 0: 0.06570709496736526 Loss at step 50: 0.05618491768836975 Loss at step 100: 0.07387585937976837 Loss at step 150: 0.047460924834012985 Loss at step 200: 0.044740986078977585 Loss at step 250: 0.0416225790977478 Loss at step 300: 0.062166985124349594 Loss at step 350: 0.050919823348522186 Loss at step 400: 0.0498194694519043 Loss at step 450: 0.04187632352113724 Loss at step 500: 0.04341103881597519 Loss at step 550: 0.046307068318128586 Loss at step 600: 0.04182589799165726 Loss at step 650: 0.043032802641391754 Loss at step 700: 0.05058400332927704 Loss at step 750: 0.04318546503782272 Loss at step 800: 0.06491613388061523 Loss at step 850: 0.0738903135061264 Loss at step 900: 0.048315685242414474 Mean training loss after epoch 54: 0.0518916003120098 EPOCH: 55 Loss at step 0: 0.04703742638230324 Loss at step 50: 0.05610983446240425 Loss at step 100: 0.06372210383415222 Loss at step 150: 0.04502696171402931 Loss at step 200: 0.047079894691705704 Loss at step 250: 0.05695986747741699 Loss at step 300: 0.045650701969861984 Loss at step 350: 0.053284358233213425 Loss at step 400: 0.06193862482905388 Loss at step 450: 0.044673822820186615 Loss at step 500: 0.05352337285876274 Loss at step 550: 0.06254230439662933 Loss at step 600: 0.044199880212545395 Loss at step 650: 0.042015090584754944 Loss at step 700: 0.07393958419561386 Loss at step 750: 0.04949575290083885 Loss at step 800: 0.04628092795610428 Loss at step 850: 0.03855917975306511 Loss at step 900: 0.05910501629114151 Mean training loss after epoch 55: 0.05107430944532982 EPOCH: 56 Loss at step 0: 0.054718635976314545 Loss at step 50: 0.03806263953447342 Loss at step 100: 0.06221146881580353 Loss at step 150: 0.060800950974226 Loss at step 200: 0.06683429330587387 Loss at step 250: 0.049812715500593185 Loss at step 300: 0.04348970949649811 Loss at step 350: 0.04575766250491142 Loss at step 400: 0.04081176593899727 Loss at step 450: 0.04893069341778755 Loss at step 500: 0.04088420048356056 Loss at step 550: 0.0433179996907711 Loss at step 600: 0.04609419405460358 Loss at step 650: 0.05773158371448517 Loss at step 700: 0.05205998942255974 Loss at step 750: 0.04020344838500023 Loss at step 800: 0.049087412655353546 Loss at step 850: 0.04753759503364563 Loss at step 900: 0.06387466937303543 Mean training loss after epoch 56: 0.051701201967942687 EPOCH: 57 Loss at step 0: 0.05704663693904877 Loss at step 50: 0.06344162672758102 Loss at step 100: 0.04397514835000038 Loss at step 150: 0.046032972633838654 Loss at step 200: 0.040545135736465454 Loss at step 250: 0.05314444378018379 Loss at step 300: 0.03840856999158859 Loss at step 350: 0.04263267293572426 Loss at step 400: 0.047070279717445374 Loss at step 450: 0.04869265854358673 Loss at step 500: 0.04961506277322769 Loss at step 550: 0.049489885568618774 Loss at step 600: 0.04505577310919762 Loss at step 650: 0.0442231222987175 Loss at step 700: 0.055069323629140854 Loss at step 750: 0.03864406421780586 Loss at step 800: 0.0516878142952919 Loss at step 850: 0.04796821251511574 Loss at step 900: 0.04297705367207527 Mean training loss after epoch 57: 0.05092442163160996 EPOCH: 58 Loss at step 0: 0.03791898116469383 Loss at step 50: 0.04859341308474541 Loss at step 100: 0.050501830875873566 Loss at step 150: 0.058060478419065475 Loss at step 200: 0.061533961445093155 Loss at step 250: 0.04703228548169136 Loss at step 300: 0.045163776725530624 Loss at step 350: 0.044098835438489914 Loss at step 400: 0.050891242921352386 Loss at step 450: 0.05651549622416496 Loss at step 500: 0.040559321641922 Loss at step 550: 0.05831410735845566 Loss at step 600: 0.060178034007549286 Loss at step 650: 0.04794180393218994 Loss at step 700: 0.0498415045440197 Loss at step 750: 0.04138218238949776 Loss at step 800: 0.05206992104649544 Loss at step 850: 0.04769282788038254 Loss at step 900: 0.04517361521720886 Mean training loss after epoch 58: 0.050956997881407166 EPOCH: 59 Loss at step 0: 0.05800165235996246 Loss at step 50: 0.049462053924798965 Loss at step 100: 0.04753778502345085 Loss at step 150: 0.04435642436146736 Loss at step 200: 0.06326772272586823 Loss at step 250: 0.057381901890039444 Loss at step 300: 0.05919411778450012 Loss at step 350: 0.045620258897542953 Loss at step 400: 0.04668216034770012 Loss at step 450: 0.08037187904119492 Loss at step 500: 0.04911748692393303 Loss at step 550: 0.05818548798561096 Loss at step 600: 0.04258112981915474 Loss at step 650: 0.03999834880232811 Loss at step 700: 0.03783874958753586 Loss at step 750: 0.04801853373646736 Loss at step 800: 0.048125430941581726 Loss at step 850: 0.05036113038659096 Loss at step 900: 0.052047595381736755 Mean training loss after epoch 59: 0.050994112972479896 EPOCH: 60 Loss at step 0: 0.04704003408551216 Loss at step 50: 0.04999922588467598 Loss at step 100: 0.045486971735954285 Loss at step 150: 0.05661618709564209 Loss at step 200: 0.05969623103737831 Loss at step 250: 0.06106581538915634 Loss at step 300: 0.051666975021362305 Loss at step 350: 0.04490900784730911 Loss at step 400: 0.05739257484674454 Loss at step 450: 0.06081131845712662 Loss at step 500: 0.08291563391685486 Loss at step 550: 0.0533590167760849 Loss at step 600: 0.04949339106678963 Loss at step 650: 0.06067138537764549 Loss at step 700: 0.05453011393547058 Loss at step 750: 0.07079144567251205 Loss at step 800: 0.04292576387524605 Loss at step 850: 0.03816090151667595 Loss at step 900: 0.04948538914322853 Mean training loss after epoch 60: 0.05122591244545319 EPOCH: 61 Loss at step 0: 0.0529257208108902 Loss at step 50: 0.04618692398071289 Loss at step 100: 0.05143215134739876 Loss at step 150: 0.04695028066635132 Loss at step 200: 0.04216445982456207 Loss at step 250: 0.045071735978126526 Loss at step 300: 0.04147830232977867 Loss at step 350: 0.04242280125617981 Loss at step 400: 0.042706020176410675 Loss at step 450: 0.04796634614467621 Loss at step 500: 0.05429748818278313 Loss at step 550: 0.07536093145608902 Loss at step 600: 0.05044069141149521 Loss at step 650: 0.045819614082574844 Loss at step 700: 0.04423454776406288 Loss at step 750: 0.06065993383526802 Loss at step 800: 0.05541303753852844 Loss at step 850: 0.0486580915749073 Loss at step 900: 0.05072345584630966 Mean training loss after epoch 61: 0.05076696010810861 EPOCH: 62 Loss at step 0: 0.045170072466135025 Loss at step 50: 0.039409443736076355 Loss at step 100: 0.05257018655538559 Loss at step 150: 0.04666901007294655 Loss at step 200: 0.046109214425086975 Loss at step 250: 0.05578288063406944 Loss at step 300: 0.046791426837444305 Loss at step 350: 0.04945089668035507 Loss at step 400: 0.04336920008063316 Loss at step 450: 0.06196383014321327 Loss at step 500: 0.04986429959535599 Loss at step 550: 0.05044066160917282 Loss at step 600: 0.058090247213840485 Loss at step 650: 0.04784657806158066 Loss at step 700: 0.06890707463026047 Loss at step 750: 0.050859760493040085 Loss at step 800: 0.04876793920993805 Loss at step 850: 0.04607706516981125 Loss at step 900: 0.05400692671537399 Mean training loss after epoch 62: 0.05129054434168567 EPOCH: 63 Loss at step 0: 0.04717352241277695 Loss at step 50: 0.08160953223705292 Loss at step 100: 0.04645571485161781 Loss at step 150: 0.066143698990345 Loss at step 200: 0.04273189604282379 Loss at step 250: 0.043605394661426544 Loss at step 300: 0.04916206747293472 Loss at step 350: 0.043965376913547516 Loss at step 400: 0.04549300670623779 Loss at step 450: 0.0490669310092926 Loss at step 500: 0.0407918356359005 Loss at step 550: 0.07341572642326355 Loss at step 600: 0.041574135422706604 Loss at step 650: 0.04465575888752937 Loss at step 700: 0.044661745429039 Loss at step 750: 0.03733194246888161 Loss at step 800: 0.04462084546685219 Loss at step 850: 0.05270441249012947 Loss at step 900: 0.05932937189936638 Mean training loss after epoch 63: 0.05034899110717179 EPOCH: 64 Loss at step 0: 0.05004839971661568 Loss at step 50: 0.048374880105257034 Loss at step 100: 0.045799620449543 Loss at step 150: 0.044760771095752716 Loss at step 200: 0.050215184688568115 Loss at step 250: 0.04337754473090172 Loss at step 300: 0.04679477959871292 Loss at step 350: 0.044668179005384445 Loss at step 400: 0.04020514339208603 Loss at step 450: 0.04759356379508972 Loss at step 500: 0.04927915707230568 Loss at step 550: 0.04045950248837471 Loss at step 600: 0.04013300687074661 Loss at step 650: 0.04482993111014366 Loss at step 700: 0.04367061331868172 Loss at step 750: 0.03685653209686279 Loss at step 800: 0.05268407240509987 Loss at step 850: 0.0475052185356617 Loss at step 900: 0.05607149377465248 Mean training loss after epoch 64: 0.05036599384442067 EPOCH: 65 Loss at step 0: 0.040364135056734085 Loss at step 50: 0.04769054055213928 Loss at step 100: 0.05866328999400139 Loss at step 150: 0.058180127292871475 Loss at step 200: 0.03929576650261879 Loss at step 250: 0.0378921814262867 Loss at step 300: 0.041699592024087906 Loss at step 350: 0.045517511665821075 Loss at step 400: 0.05446157604455948 Loss at step 450: 0.03979659825563431 Loss at step 500: 0.043068427592515945 Loss at step 550: 0.05807185173034668 Loss at step 600: 0.038154877722263336 Loss at step 650: 0.05153799057006836 Loss at step 700: 0.044144146144390106 Loss at step 750: 0.04160083830356598 Loss at step 800: 0.05352494493126869 Loss at step 850: 0.04288729652762413 Loss at step 900: 0.04979894682765007 Mean training loss after epoch 65: 0.050891414924121615 EPOCH: 66 Loss at step 0: 0.049171630293130875 Loss at step 50: 0.048286695033311844 Loss at step 100: 0.04526142776012421 Loss at step 150: 0.048606593161821365 Loss at step 200: 0.04630963131785393 Loss at step 250: 0.042576391249895096 Loss at step 300: 0.04235804080963135 Loss at step 350: 0.06159910559654236 Loss at step 400: 0.04631379246711731 Loss at step 450: 0.05314553529024124 Loss at step 500: 0.043132685124874115 Loss at step 550: 0.05759816989302635 Loss at step 600: 0.04580684006214142 Loss at step 650: 0.045405931770801544 Loss at step 700: 0.05848602578043938 Loss at step 750: 0.05385895073413849 Loss at step 800: 0.048791028559207916 Loss at step 850: 0.04013132303953171 Loss at step 900: 0.07109563052654266 Mean training loss after epoch 66: 0.050346224936150284 EPOCH: 67 Loss at step 0: 0.05570570006966591 Loss at step 50: 0.05974671617150307 Loss at step 100: 0.04547185078263283 Loss at step 150: 0.044866882264614105 Loss at step 200: 0.06495455652475357 Loss at step 250: 0.04716600105166435 Loss at step 300: 0.04231267794966698 Loss at step 350: 0.041276246309280396 Loss at step 400: 0.04982661455869675 Loss at step 450: 0.04563838988542557 Loss at step 500: 0.08127034455537796 Loss at step 550: 0.053776271641254425 Loss at step 600: 0.04744301736354828 Loss at step 650: 0.045860376209020615 Loss at step 700: 0.044031914323568344 Loss at step 750: 0.0574486181139946 Loss at step 800: 0.060910891741514206 Loss at step 850: 0.044506363570690155 Loss at step 900: 0.05276281014084816 Mean training loss after epoch 67: 0.050118425432076334 EPOCH: 68 Loss at step 0: 0.07838599383831024 Loss at step 50: 0.057270050048828125 Loss at step 100: 0.04398617520928383 Loss at step 150: 0.06875047087669373 Loss at step 200: 0.04419751837849617 Loss at step 250: 0.0422881618142128 Loss at step 300: 0.04767818748950958 Loss at step 350: 0.03940465301275253 Loss at step 400: 0.03851163014769554 Loss at step 450: 0.06456851959228516 Loss at step 500: 0.04567818343639374 Loss at step 550: 0.043894339352846146 Loss at step 600: 0.04671262204647064 Loss at step 650: 0.06549642980098724 Loss at step 700: 0.042775511741638184 Loss at step 750: 0.07136005908250809 Loss at step 800: 0.0682968944311142 Loss at step 850: 0.05361362174153328 Loss at step 900: 0.04421563073992729 Mean training loss after epoch 68: 0.0498697652793261 EPOCH: 69 Loss at step 0: 0.043934568762779236 Loss at step 50: 0.04894100874662399 Loss at step 100: 0.033745866268873215 Loss at step 150: 0.047060973942279816 Loss at step 200: 0.048971932381391525 Loss at step 250: 0.04393519088625908 Loss at step 300: 0.04938068985939026 Loss at step 350: 0.051268115639686584 Loss at step 400: 0.0446104072034359 Loss at step 450: 0.0622677244246006 Loss at step 500: 0.06359981745481491 Loss at step 550: 0.043566033244132996 Loss at step 600: 0.03748597204685211 Loss at step 650: 0.04594617709517479 Loss at step 700: 0.04788847267627716 Loss at step 750: 0.04379519447684288 Loss at step 800: 0.06489232927560806 Loss at step 850: 0.04078242555260658 Loss at step 900: 0.043725382536649704 Mean training loss after epoch 69: 0.05008557119539806 EPOCH: 70 Loss at step 0: 0.04628583416342735 Loss at step 50: 0.04489534720778465 Loss at step 100: 0.0601724311709404 Loss at step 150: 0.06418289989233017 Loss at step 200: 0.04378928244113922 Loss at step 250: 0.06021778658032417 Loss at step 300: 0.062019895762205124 Loss at step 350: 0.05925728380680084 Loss at step 400: 0.05062420293688774 Loss at step 450: 0.04968784376978874 Loss at step 500: 0.0562630370259285 Loss at step 550: 0.0545557364821434 Loss at step 600: 0.041002679616212845 Loss at step 650: 0.047820571810007095 Loss at step 700: 0.05764731392264366 Loss at step 750: 0.059197794646024704 Loss at step 800: 0.05873227119445801 Loss at step 850: 0.04714258760213852 Loss at step 900: 0.04861563816666603 Mean training loss after epoch 70: 0.050214311417772065 EPOCH: 71 Loss at step 0: 0.05924751237034798 Loss at step 50: 0.04485459625720978 Loss at step 100: 0.04354957491159439 Loss at step 150: 0.05759981647133827 Loss at step 200: 0.04213394224643707 Loss at step 250: 0.049675170332193375 Loss at step 300: 0.04087544232606888 Loss at step 350: 0.06799086928367615 Loss at step 400: 0.07687286287546158 Loss at step 450: 0.04114937782287598 Loss at step 500: 0.05114371329545975 Loss at step 550: 0.04506329447031021 Loss at step 600: 0.05867122858762741 Loss at step 650: 0.04098004475235939 Loss at step 700: 0.04890831932425499 Loss at step 750: 0.04448964446783066 Loss at step 800: 0.04816422984004021 Loss at step 850: 0.04242493584752083 Loss at step 900: 0.043583497405052185 Mean training loss after epoch 71: 0.04991116140013946 EPOCH: 72 Loss at step 0: 0.06295810639858246 Loss at step 50: 0.06067740172147751 Loss at step 100: 0.03900166228413582 Loss at step 150: 0.05908537283539772 Loss at step 200: 0.06486482173204422 Loss at step 250: 0.04703989624977112 Loss at step 300: 0.043847743421792984 Loss at step 350: 0.04073580726981163 Loss at step 400: 0.042772311717271805 Loss at step 450: 0.05016903206706047 Loss at step 500: 0.04429326578974724 Loss at step 550: 0.05105995014309883 Loss at step 600: 0.0523739792406559 Loss at step 650: 0.04315879940986633 Loss at step 700: 0.05653160437941551 Loss at step 750: 0.047688547521829605 Loss at step 800: 0.042826175689697266 Loss at step 850: 0.047797296196222305 Loss at step 900: 0.044900815933942795 Mean training loss after epoch 72: 0.05014316279337858 EPOCH: 73 Loss at step 0: 0.049324881285429 Loss at step 50: 0.04234444350004196 Loss at step 100: 0.05287063866853714 Loss at step 150: 0.03627130389213562 Loss at step 200: 0.047968290746212006 Loss at step 250: 0.04452960193157196 Loss at step 300: 0.09241139143705368 Loss at step 350: 0.048379041254520416 Loss at step 400: 0.04883043095469475 Loss at step 450: 0.04453295096755028 Loss at step 500: 0.04316273331642151 Loss at step 550: 0.038302771747112274 Loss at step 600: 0.04349696636199951 Loss at step 650: 0.04451757296919823 Loss at step 700: 0.04050501435995102 Loss at step 750: 0.07538346946239471 Loss at step 800: 0.04725679010152817 Loss at step 850: 0.048073552548885345 Loss at step 900: 0.055838022381067276 Mean training loss after epoch 73: 0.05015706202225772 EPOCH: 74 Loss at step 0: 0.04782593995332718 Loss at step 50: 0.04040204733610153 Loss at step 100: 0.046974629163742065 Loss at step 150: 0.05674521252512932 Loss at step 200: 0.038781046867370605 Loss at step 250: 0.044871777296066284 Loss at step 300: 0.0670565590262413 Loss at step 350: 0.04893114045262337 Loss at step 400: 0.05826184153556824 Loss at step 450: 0.039714232087135315 Loss at step 500: 0.043467286974191666 Loss at step 550: 0.04774037003517151 Loss at step 600: 0.05850028246641159 Loss at step 650: 0.04323713853955269 Loss at step 700: 0.05983903259038925 Loss at step 750: 0.04518348351120949 Loss at step 800: 0.037743713706731796 Loss at step 850: 0.04765409231185913 Loss at step 900: 0.0647517517209053 Mean training loss after epoch 74: 0.0500479978102166 EPOCH: 75 Loss at step 0: 0.046374864876270294 Loss at step 50: 0.07098346203565598 Loss at step 100: 0.05029915273189545 Loss at step 150: 0.04307358339428902 Loss at step 200: 0.04149695858359337 Loss at step 250: 0.04551580548286438 Loss at step 300: 0.056261420249938965 Loss at step 350: 0.0652204155921936 Loss at step 400: 0.06650262326002121 Loss at step 450: 0.03607139736413956 Loss at step 500: 0.0779443234205246 Loss at step 550: 0.06874851137399673 Loss at step 600: 0.048428554087877274 Loss at step 650: 0.05616293102502823 Loss at step 700: 0.041777513921260834 Loss at step 750: 0.0465385764837265 Loss at step 800: 0.04477789252996445 Loss at step 850: 0.05935507267713547 Loss at step 900: 0.04809953644871712 Mean training loss after epoch 75: 0.050210673294500754 EPOCH: 76 Loss at step 0: 0.044063106179237366 Loss at step 50: 0.04791790992021561 Loss at step 100: 0.049018532037734985 Loss at step 150: 0.06660467386245728 Loss at step 200: 0.060037996619939804 Loss at step 250: 0.0532044880092144 Loss at step 300: 0.04531312733888626 Loss at step 350: 0.057441093027591705 Loss at step 400: 0.03851018100976944 Loss at step 450: 0.0543062761425972 Loss at step 500: 0.04034740850329399 Loss at step 550: 0.04283927008509636 Loss at step 600: 0.047690559178590775 Loss at step 650: 0.04616939648985863 Loss at step 700: 0.04882442578673363 Loss at step 750: 0.04310169070959091 Loss at step 800: 0.0474732331931591 Loss at step 850: 0.05042954906821251 Loss at step 900: 0.04356149584054947 Mean training loss after epoch 76: 0.04911110982664231 EPOCH: 77 Loss at step 0: 0.04581378027796745 Loss at step 50: 0.0488450862467289 Loss at step 100: 0.04347705841064453 Loss at step 150: 0.048986878246068954 Loss at step 200: 0.04546577110886574 Loss at step 250: 0.0455501526594162 Loss at step 300: 0.0610041618347168 Loss at step 350: 0.06500419229269028 Loss at step 400: 0.06105761602520943 Loss at step 450: 0.04236435145139694 Loss at step 500: 0.04703521728515625 Loss at step 550: 0.045128654688596725 Loss at step 600: 0.04426829516887665 Loss at step 650: 0.04098552092909813 Loss at step 700: 0.05003681778907776 Loss at step 750: 0.06544378399848938 Loss at step 800: 0.05895969644188881 Loss at step 850: 0.04944916069507599 Loss at step 900: 0.04170158877968788 Mean training loss after epoch 77: 0.05033854774034608 EPOCH: 78 Loss at step 0: 0.0401393286883831 Loss at step 50: 0.06864287704229355 Loss at step 100: 0.05438418313860893 Loss at step 150: 0.048720281571149826 Loss at step 200: 0.04838241636753082 Loss at step 250: 0.038070909678936005 Loss at step 300: 0.06045425683259964 Loss at step 350: 0.05647020787000656 Loss at step 400: 0.07525329291820526 Loss at step 450: 0.05821820721030235 Loss at step 500: 0.033526282757520676 Loss at step 550: 0.04264465719461441 Loss at step 600: 0.047454677522182465 Loss at step 650: 0.050573304295539856 Loss at step 700: 0.05161435529589653 Loss at step 750: 0.054218824952840805 Loss at step 800: 0.04025213420391083 Loss at step 850: 0.04642380028963089 Loss at step 900: 0.05001845210790634 Mean training loss after epoch 78: 0.04962524947231767 EPOCH: 79 Loss at step 0: 0.04150118678808212 Loss at step 50: 0.04279084876179695 Loss at step 100: 0.061266377568244934 Loss at step 150: 0.04423912242054939 Loss at step 200: 0.03968709334731102 Loss at step 250: 0.04060054197907448 Loss at step 300: 0.03931966796517372 Loss at step 350: 0.04212937876582146 Loss at step 400: 0.035971298813819885 Loss at step 450: 0.05450601875782013 Loss at step 500: 0.04486501216888428 Loss at step 550: 0.057187024503946304 Loss at step 600: 0.08405883610248566 Loss at step 650: 0.05945008993148804 Loss at step 700: 0.040889546275138855 Loss at step 750: 0.045303236693143845 Loss at step 800: 0.04918999597430229 Loss at step 850: 0.049786366522312164 Loss at step 900: 0.06428909301757812 Mean training loss after epoch 79: 0.04925928820472664 EPOCH: 80 Loss at step 0: 0.046271633356809616 Loss at step 50: 0.04907194525003433 Loss at step 100: 0.044059451669454575 Loss at step 150: 0.0745888352394104 Loss at step 200: 0.04555227607488632 Loss at step 250: 0.05743687227368355 Loss at step 300: 0.04701891541481018 Loss at step 350: 0.06384223699569702 Loss at step 400: 0.04667115584015846 Loss at step 450: 0.039980385452508926 Loss at step 500: 0.050276853144168854 Loss at step 550: 0.043712519109249115 Loss at step 600: 0.0549771785736084 Loss at step 650: 0.047554198652505875 Loss at step 700: 0.04244942590594292 Loss at step 750: 0.068080835044384 Loss at step 800: 0.05198397487401962 Loss at step 850: 0.059530626982450485 Loss at step 900: 0.04206497594714165 Mean training loss after epoch 80: 0.05009438978758321 EPOCH: 81 Loss at step 0: 0.06408950686454773 Loss at step 50: 0.04227837547659874 Loss at step 100: 0.04312210530042648 Loss at step 150: 0.0597798191010952 Loss at step 200: 0.048537690192461014 Loss at step 250: 0.05681542679667473 Loss at step 300: 0.054121118038892746 Loss at step 350: 0.04852880537509918 Loss at step 400: 0.04138527065515518 Loss at step 450: 0.03872168809175491 Loss at step 500: 0.045597679913043976 Loss at step 550: 0.04516393318772316 Loss at step 600: 0.06174073740839958 Loss at step 650: 0.06342548131942749 Loss at step 700: 0.044436853379011154 Loss at step 750: 0.04963374137878418 Loss at step 800: 0.051497682929039 Loss at step 850: 0.061029598116874695 Loss at step 900: 0.0525282621383667 Mean training loss after epoch 81: 0.04967194809111705 EPOCH: 82 Loss at step 0: 0.04469631239771843 Loss at step 50: 0.046690892428159714 Loss at step 100: 0.056132588535547256 Loss at step 150: 0.06796795129776001 Loss at step 200: 0.048907581716775894 Loss at step 250: 0.04631473869085312 Loss at step 300: 0.05341147631406784 Loss at step 350: 0.03813350573182106 Loss at step 400: 0.05712549015879631 Loss at step 450: 0.04514017701148987 Loss at step 500: 0.04591641202569008 Loss at step 550: 0.0357416607439518 Loss at step 600: 0.040903665125370026 Loss at step 650: 0.03916526213288307 Loss at step 700: 0.041303250938653946 Loss at step 750: 0.04796610400080681 Loss at step 800: 0.05697270855307579 Loss at step 850: 0.04189692810177803 Loss at step 900: 0.038887809962034225 Mean training loss after epoch 82: 0.04978321601888899 EPOCH: 83 Loss at step 0: 0.044696688652038574 Loss at step 50: 0.042571213096380234 Loss at step 100: 0.044071078300476074 Loss at step 150: 0.04886649549007416 Loss at step 200: 0.04747084900736809 Loss at step 250: 0.047336727380752563 Loss at step 300: 0.04314280301332474 Loss at step 350: 0.05387262627482414 Loss at step 400: 0.04965066909790039 Loss at step 450: 0.0399225614964962 Loss at step 500: 0.04800769314169884 Loss at step 550: 0.04022686183452606 Loss at step 600: 0.03997340053319931 Loss at step 650: 0.05505065992474556 Loss at step 700: 0.05618844926357269 Loss at step 750: 0.0399673730134964 Loss at step 800: 0.05837350711226463 Loss at step 850: 0.05255882069468498 Loss at step 900: 0.061643972992897034 Mean training loss after epoch 83: 0.049807082500253154 EPOCH: 84 Loss at step 0: 0.04430685192346573 Loss at step 50: 0.05718126893043518 Loss at step 100: 0.04967713728547096 Loss at step 150: 0.06849541515111923 Loss at step 200: 0.06359919905662537 Loss at step 250: 0.04295890033245087 Loss at step 300: 0.0778447762131691 Loss at step 350: 0.04167753830552101 Loss at step 400: 0.05978678539395332 Loss at step 450: 0.04790279641747475 Loss at step 500: 0.04586710035800934 Loss at step 550: 0.04812353476881981 Loss at step 600: 0.06456448137760162 Loss at step 650: 0.050504736602306366 Loss at step 700: 0.05369085073471069 Loss at step 750: 0.037896715104579926 Loss at step 800: 0.04759923368692398 Loss at step 850: 0.046865977346897125 Loss at step 900: 0.0404953770339489 Mean training loss after epoch 84: 0.049343062135805964 EPOCH: 85 Loss at step 0: 0.05046042054891586 Loss at step 50: 0.055002376437187195 Loss at step 100: 0.05485739931464195 Loss at step 150: 0.04909391701221466 Loss at step 200: 0.051128629595041275 Loss at step 250: 0.07906811684370041 Loss at step 300: 0.04183168336749077 Loss at step 350: 0.04271380603313446 Loss at step 400: 0.041309162974357605 Loss at step 450: 0.05367245525121689 Loss at step 500: 0.06387744843959808 Loss at step 550: 0.04261847957968712 Loss at step 600: 0.045111194252967834 Loss at step 650: 0.04453716427087784 Loss at step 700: 0.03961269184947014 Loss at step 750: 0.04325570911169052 Loss at step 800: 0.05692153424024582 Loss at step 850: 0.044681746512651443 Loss at step 900: 0.041538305580616 Mean training loss after epoch 85: 0.04996878141835173 EPOCH: 86 Loss at step 0: 0.04415849596261978 Loss at step 50: 0.04128764569759369 Loss at step 100: 0.05729440227150917 Loss at step 150: 0.03987518325448036 Loss at step 200: 0.046550750732421875 Loss at step 250: 0.04656359180808067 Loss at step 300: 0.044602178037166595 Loss at step 350: 0.0752747654914856 Loss at step 400: 0.0633234977722168 Loss at step 450: 0.04294883832335472 Loss at step 500: 0.038414642214775085 Loss at step 550: 0.06052756682038307 Loss at step 600: 0.044098999351263046 Loss at step 650: 0.04006928578019142 Loss at step 700: 0.039461780339479446 Loss at step 750: 0.05038025975227356 Loss at step 800: 0.06242853030562401 Loss at step 850: 0.04324846714735031 Loss at step 900: 0.0538313128054142 Mean training loss after epoch 86: 0.04963183211984792 EPOCH: 87 Loss at step 0: 0.042789362370967865 Loss at step 50: 0.04530739039182663 Loss at step 100: 0.06257033348083496 Loss at step 150: 0.04159552976489067 Loss at step 200: 0.053391534835100174 Loss at step 250: 0.0404144749045372 Loss at step 300: 0.048430196940898895 Loss at step 350: 0.04687448963522911 Loss at step 400: 0.037999026477336884 Loss at step 450: 0.0454571433365345 Loss at step 500: 0.056420568376779556 Loss at step 550: 0.044810499995946884 Loss at step 600: 0.04330715909600258 Loss at step 650: 0.04449573904275894 Loss at step 700: 0.06225510314106941 Loss at step 750: 0.049395203590393066 Loss at step 800: 0.04462554305791855 Loss at step 850: 0.047608599066734314 Loss at step 900: 0.04538687318563461 Mean training loss after epoch 87: 0.049688750758830676 EPOCH: 88 Loss at step 0: 0.06456145644187927 Loss at step 50: 0.050503961741924286 Loss at step 100: 0.04618368297815323 Loss at step 150: 0.03939589485526085 Loss at step 200: 0.0474473237991333 Loss at step 250: 0.04578113555908203 Loss at step 300: 0.04663315415382385 Loss at step 350: 0.05692466348409653 Loss at step 400: 0.057414110749959946 Loss at step 450: 0.04246346652507782 Loss at step 500: 0.058897893875837326 Loss at step 550: 0.05395405367016792 Loss at step 600: 0.06350165605545044 Loss at step 650: 0.04406236112117767 Loss at step 700: 0.07270924001932144 Loss at step 750: 0.04284822568297386 Loss at step 800: 0.052527036517858505 Loss at step 850: 0.043290283530950546 Loss at step 900: 0.04459693282842636 Mean training loss after epoch 88: 0.049538824695355095 EPOCH: 89 Loss at step 0: 0.037260450422763824 Loss at step 50: 0.043813761323690414 Loss at step 100: 0.039857037365436554 Loss at step 150: 0.05514853075146675 Loss at step 200: 0.045483849942684174 Loss at step 250: 0.0457334965467453 Loss at step 300: 0.041142646223306656 Loss at step 350: 0.051438428461551666 Loss at step 400: 0.03642463684082031 Loss at step 450: 0.03718217834830284 Loss at step 500: 0.053569719195365906 Loss at step 550: 0.08104965835809708 Loss at step 600: 0.058939989656209946 Loss at step 650: 0.04228615015745163 Loss at step 700: 0.05895977467298508 Loss at step 750: 0.058878570795059204 Loss at step 800: 0.0628143772482872 Loss at step 850: 0.05810512229800224 Loss at step 900: 0.035853948444128036 Mean training loss after epoch 89: 0.050128749208346104 EPOCH: 90 Loss at step 0: 0.04607094079256058 Loss at step 50: 0.049979373812675476 Loss at step 100: 0.043149206787347794 Loss at step 150: 0.04139300808310509 Loss at step 200: 0.055736564099788666 Loss at step 250: 0.05848607048392296 Loss at step 300: 0.042085159569978714 Loss at step 350: 0.05474313721060753 Loss at step 400: 0.06437666714191437 Loss at step 450: 0.04497305303812027 Loss at step 500: 0.04660419374704361 Loss at step 550: 0.042044732719659805 Loss at step 600: 0.04332390055060387 Loss at step 650: 0.04653332009911537 Loss at step 700: 0.040376175194978714 Loss at step 750: 0.04832889512181282 Loss at step 800: 0.06275078654289246 Loss at step 850: 0.04724928364157677 Loss at step 900: 0.049332208931446075 Mean training loss after epoch 90: 0.049903877530254916 EPOCH: 91 Loss at step 0: 0.044104136526584625 Loss at step 50: 0.06215718016028404 Loss at step 100: 0.0427938848733902 Loss at step 150: 0.039437200874090195 Loss at step 200: 0.041375286877155304 Loss at step 250: 0.08824264258146286 Loss at step 300: 0.047282714396715164 Loss at step 350: 0.043333664536476135 Loss at step 400: 0.05817312374711037 Loss at step 450: 0.0665164589881897 Loss at step 500: 0.04573878273367882 Loss at step 550: 0.04910000041127205 Loss at step 600: 0.03893563151359558 Loss at step 650: 0.05897514894604683 Loss at step 700: 0.041741225868463516 Loss at step 750: 0.04443982243537903 Loss at step 800: 0.04583890363574028 Loss at step 850: 0.04302506521344185 Loss at step 900: 0.04091271385550499 Mean training loss after epoch 91: 0.049480246078135615 EPOCH: 92 Loss at step 0: 0.042101986706256866 Loss at step 50: 0.03859606012701988 Loss at step 100: 0.04027479514479637 Loss at step 150: 0.0411132387816906 Loss at step 200: 0.04963397979736328 Loss at step 250: 0.04326982796192169 Loss at step 300: 0.04471549019217491 Loss at step 350: 0.05481642857193947 Loss at step 400: 0.04154738411307335 Loss at step 450: 0.061191070824861526 Loss at step 500: 0.049981359392404556 Loss at step 550: 0.03990795090794563 Loss at step 600: 0.04980159178376198 Loss at step 650: 0.060116443783044815 Loss at step 700: 0.05402575060725212 Loss at step 750: 0.04082254320383072 Loss at step 800: 0.058263253420591354 Loss at step 850: 0.04322369024157524 Loss at step 900: 0.0461888313293457 Mean training loss after epoch 92: 0.049126974098495584 EPOCH: 93 Loss at step 0: 0.0459405779838562 Loss at step 50: 0.04385087266564369 Loss at step 100: 0.05670112371444702 Loss at step 150: 0.04358655586838722 Loss at step 200: 0.05368026718497276 Loss at step 250: 0.048020731657743454 Loss at step 300: 0.04861794412136078 Loss at step 350: 0.048971861600875854 Loss at step 400: 0.04269807040691376 Loss at step 450: 0.04159032925963402 Loss at step 500: 0.05909647047519684 Loss at step 550: 0.04345737025141716 Loss at step 600: 0.04127160459756851 Loss at step 650: 0.07942552119493484 Loss at step 700: 0.058795537799596786 Loss at step 750: 0.043718282133340836 Loss at step 800: 0.04370647296309471 Loss at step 850: 0.0775584876537323 Loss at step 900: 0.0595923587679863 Mean training loss after epoch 93: 0.04921821293546193 EPOCH: 94 Loss at step 0: 0.04134402051568031 Loss at step 50: 0.05048449710011482 Loss at step 100: 0.0536898672580719 Loss at step 150: 0.04136862978339195 Loss at step 200: 0.04599060118198395 Loss at step 250: 0.09084892272949219 Loss at step 300: 0.04266970977187157 Loss at step 350: 0.04893401265144348 Loss at step 400: 0.044866327196359634 Loss at step 450: 0.03999479115009308 Loss at step 500: 0.05388679355382919 Loss at step 550: 0.06818437576293945 Loss at step 600: 0.05689013749361038 Loss at step 650: 0.04596896469593048 Loss at step 700: 0.05052827298641205 Loss at step 750: 0.051583535969257355 Loss at step 800: 0.07066348940134048 Loss at step 850: 0.04286623001098633 Loss at step 900: 0.04519067704677582 Mean training loss after epoch 94: 0.049636554882438705 EPOCH: 95 Loss at step 0: 0.04553544521331787 Loss at step 50: 0.037725165486335754 Loss at step 100: 0.04108651727437973 Loss at step 150: 0.05293981358408928 Loss at step 200: 0.043984558433294296 Loss at step 250: 0.04061182588338852 Loss at step 300: 0.049995165318250656 Loss at step 350: 0.043118786066770554 Loss at step 400: 0.06520935148000717 Loss at step 450: 0.047619905322790146 Loss at step 500: 0.039747800678014755 Loss at step 550: 0.04617295786738396 Loss at step 600: 0.05013127624988556 Loss at step 650: 0.07285774499177933 Loss at step 700: 0.04267391189932823 Loss at step 750: 0.044371817260980606 Loss at step 800: 0.07955112308263779 Loss at step 850: 0.04106201231479645 Loss at step 900: 0.05531475320458412 Mean training loss after epoch 95: 0.04902714889234444 EPOCH: 96 Loss at step 0: 0.0396571084856987 Loss at step 50: 0.047130685299634933 Loss at step 100: 0.06466418504714966 Loss at step 150: 0.06302312016487122 Loss at step 200: 0.048206787556409836 Loss at step 250: 0.060839857906103134 Loss at step 300: 0.04497721046209335 Loss at step 350: 0.037976425141096115 Loss at step 400: 0.06313156336545944 Loss at step 450: 0.06082936003804207 Loss at step 500: 0.06272584199905396 Loss at step 550: 0.059597212821245193 Loss at step 600: 0.043508388102054596 Loss at step 650: 0.04391218721866608 Loss at step 700: 0.03952464461326599 Loss at step 750: 0.056175172328948975 Loss at step 800: 0.04919031634926796 Loss at step 850: 0.045863695442676544 Loss at step 900: 0.06201969087123871 Mean training loss after epoch 96: 0.048931560973559364 EPOCH: 97 Loss at step 0: 0.055770643055438995 Loss at step 50: 0.04662568122148514 Loss at step 100: 0.05256309360265732 Loss at step 150: 0.04946652799844742 Loss at step 200: 0.04325781762599945 Loss at step 250: 0.04095741733908653 Loss at step 300: 0.042382922023534775 Loss at step 350: 0.04991372674703598 Loss at step 400: 0.042100079357624054 Loss at step 450: 0.04106783866882324 Loss at step 500: 0.04807097092270851 Loss at step 550: 0.05407888814806938 Loss at step 600: 0.04308953881263733 Loss at step 650: 0.062133874744176865 Loss at step 700: 0.03700052574276924 Loss at step 750: 0.041649386286735535 Loss at step 800: 0.04823816940188408 Loss at step 850: 0.04797515645623207 Loss at step 900: 0.0419154018163681 Mean training loss after epoch 97: 0.04946406047020767 EPOCH: 98 Loss at step 0: 0.06148543581366539 Loss at step 50: 0.03504558280110359 Loss at step 100: 0.046274397522211075 Loss at step 150: 0.0427582785487175 Loss at step 200: 0.04233979433774948 Loss at step 250: 0.04401496425271034 Loss at step 300: 0.04430773854255676 Loss at step 350: 0.046044591814279556 Loss at step 400: 0.04406341537833214 Loss at step 450: 0.04576988145709038 Loss at step 500: 0.04091288894414902 Loss at step 550: 0.04889476299285889 Loss at step 600: 0.04876663163304329 Loss at step 650: 0.06343076378107071 Loss at step 700: 0.05256517603993416 Loss at step 750: 0.05464605242013931 Loss at step 800: 0.03760198503732681 Loss at step 850: 0.04161202162504196 Loss at step 900: 0.04388168454170227 Mean training loss after epoch 98: 0.049113452049301884 EPOCH: 99 Loss at step 0: 0.03995278477668762 Loss at step 50: 0.04505302011966705 Loss at step 100: 0.05228656157851219 Loss at step 150: 0.07359860092401505 Loss at step 200: 0.04271768406033516 Loss at step 250: 0.04519430547952652 Loss at step 300: 0.04168752580881119 Loss at step 350: 0.04520302265882492 Loss at step 400: 0.04385397210717201 Loss at step 450: 0.04275738447904587 Loss at step 500: 0.04357877001166344 Loss at step 550: 0.055648792535066605 Loss at step 600: 0.048568855971097946 Loss at step 650: 0.045275717973709106 Loss at step 700: 0.040647901594638824 Loss at step 750: 0.043948639184236526 Loss at step 800: 0.04289211332798004 Loss at step 850: 0.05396602302789688 Loss at step 900: 0.06339140236377716 Mean training loss after epoch 99: 0.049555576420319616 EPOCH: 100 Loss at step 0: 0.042619138956069946 Loss at step 50: 0.040140554308891296 Loss at step 100: 0.04784046858549118 Loss at step 150: 0.048589076846838 Loss at step 200: 0.04020867496728897 Loss at step 250: 0.055588774383068085 Loss at step 300: 0.050348613411188126 Loss at step 350: 0.04708428308367729 Loss at step 400: 0.04293529689311981 Loss at step 450: 0.049777332693338394 Loss at step 500: 0.06527303904294968 Loss at step 550: 0.042571961879730225 Loss at step 600: 0.036007314920425415 Loss at step 650: 0.0694173127412796 Loss at step 700: 0.0682796761393547 Loss at step 750: 0.05578703060746193 Loss at step 800: 0.041744619607925415 Loss at step 850: 0.0407642163336277 Loss at step 900: 0.05935173109173775 Mean training loss after epoch 100: 0.04890836941908354 EPOCH: 101 Loss at step 0: 0.04987026005983353 Loss at step 50: 0.06078115478157997 Loss at step 100: 0.057693324983119965 Loss at step 150: 0.04776197299361229 Loss at step 200: 0.046843256801366806 Loss at step 250: 0.040905795991420746 Loss at step 300: 0.047711897641420364 Loss at step 350: 0.049732137471437454 Loss at step 400: 0.04095924645662308 Loss at step 450: 0.06000039726495743 Loss at step 500: 0.06848817318677902 Loss at step 550: 0.0716969445347786 Loss at step 600: 0.06592187285423279 Loss at step 650: 0.03623607009649277 Loss at step 700: 0.06397902220487595 Loss at step 750: 0.06425417214632034 Loss at step 800: 0.03839142248034477 Loss at step 850: 0.04181625321507454 Loss at step 900: 0.0407731719315052 Mean training loss after epoch 101: 0.04946132611507165 EPOCH: 102 Loss at step 0: 0.03850685432553291 Loss at step 50: 0.04663720726966858 Loss at step 100: 0.04934526979923248 Loss at step 150: 0.04820127412676811 Loss at step 200: 0.04352913796901703 Loss at step 250: 0.046520885080099106 Loss at step 300: 0.04891414940357208 Loss at step 350: 0.04998835548758507 Loss at step 400: 0.0490589439868927 Loss at step 450: 0.04011859744787216 Loss at step 500: 0.053185462951660156 Loss at step 550: 0.04940098151564598 Loss at step 600: 0.0519132986664772 Loss at step 650: 0.044007617980241776 Loss at step 700: 0.058221522718667984 Loss at step 750: 0.06942025572061539 Loss at step 800: 0.04221067205071449 Loss at step 850: 0.04795975610613823 Loss at step 900: 0.04430650919675827 Mean training loss after epoch 102: 0.04882913201189499 EPOCH: 103 Loss at step 0: 0.03960133343935013 Loss at step 50: 0.03917718678712845 Loss at step 100: 0.045689746737480164 Loss at step 150: 0.055453814566135406 Loss at step 200: 0.0636676624417305 Loss at step 250: 0.04916020855307579 Loss at step 300: 0.050267528742551804 Loss at step 350: 0.047037869691848755 Loss at step 400: 0.0510585680603981 Loss at step 450: 0.04494280740618706 Loss at step 500: 0.051097571849823 Loss at step 550: 0.04776797816157341 Loss at step 600: 0.045107826590538025 Loss at step 650: 0.04530348628759384 Loss at step 700: 0.0374017208814621 Loss at step 750: 0.053104598075151443 Loss at step 800: 0.04187878221273422 Loss at step 850: 0.07655023038387299 Loss at step 900: 0.061343323439359665 Mean training loss after epoch 103: 0.04867533911893299 EPOCH: 104 Loss at step 0: 0.04669943079352379 Loss at step 50: 0.05491526052355766 Loss at step 100: 0.06942932307720184 Loss at step 150: 0.061767108738422394 Loss at step 200: 0.040030572563409805 Loss at step 250: 0.06999941915273666 Loss at step 300: 0.04081505909562111 Loss at step 350: 0.05251864716410637 Loss at step 400: 0.05680739879608154 Loss at step 450: 0.036437977105379105 Loss at step 500: 0.04127573221921921 Loss at step 550: 0.03854331746697426 Loss at step 600: 0.04240141063928604 Loss at step 650: 0.038306813687086105 Loss at step 700: 0.03980494290590286 Loss at step 750: 0.0751243382692337 Loss at step 800: 0.06256326287984848 Loss at step 850: 0.052333880215883255 Loss at step 900: 0.041536495089530945 Mean training loss after epoch 104: 0.04886450591101957 EPOCH: 105 Loss at step 0: 0.04690930247306824 Loss at step 50: 0.043316323310136795 Loss at step 100: 0.05028771236538887 Loss at step 150: 0.046669188886880875 Loss at step 200: 0.06033322960138321 Loss at step 250: 0.07230734825134277 Loss at step 300: 0.05890059471130371 Loss at step 350: 0.03918857127428055 Loss at step 400: 0.04826260358095169 Loss at step 450: 0.04593419283628464 Loss at step 500: 0.041088689118623734 Loss at step 550: 0.04200593754649162 Loss at step 600: 0.030580302700400352 Loss at step 650: 0.04777055233716965 Loss at step 700: 0.08203510195016861 Loss at step 750: 0.056089214980602264 Loss at step 800: 0.05297306552529335 Loss at step 850: 0.05402449145913124 Loss at step 900: 0.057078082114458084 Mean training loss after epoch 105: 0.04909526086303153 EPOCH: 106 Loss at step 0: 0.04028727114200592 Loss at step 50: 0.04893907532095909 Loss at step 100: 0.0450473390519619 Loss at step 150: 0.04549811780452728 Loss at step 200: 0.046697478741407394 Loss at step 250: 0.03546193242073059 Loss at step 300: 0.06147328391671181 Loss at step 350: 0.03978927433490753 Loss at step 400: 0.06366751343011856 Loss at step 450: 0.042370881885290146 Loss at step 500: 0.043448854237794876 Loss at step 550: 0.03465500846505165 Loss at step 600: 0.04476481303572655 Loss at step 650: 0.046579066663980484 Loss at step 700: 0.0436222068965435 Loss at step 750: 0.049572817981243134 Loss at step 800: 0.08291643112897873 Loss at step 850: 0.043720752000808716 Loss at step 900: 0.05070682242512703 Mean training loss after epoch 106: 0.048765338631644686 EPOCH: 107 Loss at step 0: 0.04030218347907066 Loss at step 50: 0.04060344398021698 Loss at step 100: 0.03576940670609474 Loss at step 150: 0.042393431067466736 Loss at step 200: 0.04389498010277748 Loss at step 250: 0.04216749593615532 Loss at step 300: 0.049407850950956345 Loss at step 350: 0.05480605736374855 Loss at step 400: 0.051308177411556244 Loss at step 450: 0.04175540432333946 Loss at step 500: 0.05692284181714058 Loss at step 550: 0.05844518914818764 Loss at step 600: 0.03944043442606926 Loss at step 650: 0.0409339964389801 Loss at step 700: 0.046769879758358 Loss at step 750: 0.05138499289751053 Loss at step 800: 0.04983990266919136 Loss at step 850: 0.045317795127630234 Loss at step 900: 0.04733778536319733 Mean training loss after epoch 107: 0.04952236133883757 EPOCH: 108 Loss at step 0: 0.06505302339792252 Loss at step 50: 0.040447961539030075 Loss at step 100: 0.041972022503614426 Loss at step 150: 0.05929534137248993 Loss at step 200: 0.05047089606523514 Loss at step 250: 0.05999187007546425 Loss at step 300: 0.07150460034608841 Loss at step 350: 0.06041579693555832 Loss at step 400: 0.05207663029432297 Loss at step 450: 0.0395498163998127 Loss at step 500: 0.04243307560682297 Loss at step 550: 0.058856841176748276 Loss at step 600: 0.06496166437864304 Loss at step 650: 0.042524755001068115 Loss at step 700: 0.04318412393331528 Loss at step 750: 0.039470233023166656 Loss at step 800: 0.06378208100795746 Loss at step 850: 0.06586473435163498 Loss at step 900: 0.03804762661457062 Mean training loss after epoch 108: 0.049106057510891955 EPOCH: 109 Loss at step 0: 0.046232495456933975 Loss at step 50: 0.03647739812731743 Loss at step 100: 0.05083920806646347 Loss at step 150: 0.06045791134238243 Loss at step 200: 0.042983561754226685 Loss at step 250: 0.04336068034172058 Loss at step 300: 0.04110205918550491 Loss at step 350: 0.04164930060505867 Loss at step 400: 0.04276866093277931 Loss at step 450: 0.04803874343633652 Loss at step 500: 0.046752043068408966 Loss at step 550: 0.03728315234184265 Loss at step 600: 0.058778371661901474 Loss at step 650: 0.04618687555193901 Loss at step 700: 0.046224817633628845 Loss at step 750: 0.045793142169713974 Loss at step 800: 0.05921325832605362 Loss at step 850: 0.06457895785570145 Loss at step 900: 0.04175681248307228 Mean training loss after epoch 109: 0.04878384856852705 EPOCH: 110 Loss at step 0: 0.04968038946390152 Loss at step 50: 0.04226292297244072 Loss at step 100: 0.03723354637622833 Loss at step 150: 0.04843694344162941 Loss at step 200: 0.04766783490777016 Loss at step 250: 0.06083642691373825 Loss at step 300: 0.09079065918922424 Loss at step 350: 0.057187799364328384 Loss at step 400: 0.046779926866292953 Loss at step 450: 0.039857689291238785 Loss at step 500: 0.042120613157749176 Loss at step 550: 0.04693334922194481 Loss at step 600: 0.06589117646217346 Loss at step 650: 0.03495512902736664 Loss at step 700: 0.04711277410387993 Loss at step 750: 0.04943154752254486 Loss at step 800: 0.050080131739377975 Loss at step 850: 0.05709643289446831 Loss at step 900: 0.041808199137449265 Mean training loss after epoch 110: 0.04878664847566629 EPOCH: 111 Loss at step 0: 0.057815685868263245 Loss at step 50: 0.05347634106874466 Loss at step 100: 0.06409960985183716 Loss at step 150: 0.045633260160684586 Loss at step 200: 0.05606812611222267 Loss at step 250: 0.05957503989338875 Loss at step 300: 0.07925248146057129 Loss at step 350: 0.04473511874675751 Loss at step 400: 0.03767020255327225 Loss at step 450: 0.04691488668322563 Loss at step 500: 0.04331127554178238 Loss at step 550: 0.04786911606788635 Loss at step 600: 0.03290712833404541 Loss at step 650: 0.060080915689468384 Loss at step 700: 0.053698308765888214 Loss at step 750: 0.044264666736125946 Loss at step 800: 0.062248993664979935 Loss at step 850: 0.043780092149972916 Loss at step 900: 0.04737289994955063 Mean training loss after epoch 111: 0.04872489497383266 EPOCH: 112 Loss at step 0: 0.039848536252975464 Loss at step 50: 0.042560480535030365 Loss at step 100: 0.0639154464006424 Loss at step 150: 0.042356155812740326 Loss at step 200: 0.0729765072464943 Loss at step 250: 0.03674968704581261 Loss at step 300: 0.03186776489019394 Loss at step 350: 0.040285930037498474 Loss at step 400: 0.051610060036182404 Loss at step 450: 0.04130977392196655 Loss at step 500: 0.05851125717163086 Loss at step 550: 0.04433787614107132 Loss at step 600: 0.04782512038946152 Loss at step 650: 0.0473966971039772 Loss at step 700: 0.046860311180353165 Loss at step 750: 0.06769396364688873 Loss at step 800: 0.04326288029551506 Loss at step 850: 0.04840104654431343 Loss at step 900: 0.038936447352170944 Mean training loss after epoch 112: 0.04892842766683874 EPOCH: 113 Loss at step 0: 0.043252259492874146 Loss at step 50: 0.043869368731975555 Loss at step 100: 0.06973021477460861 Loss at step 150: 0.04265906661748886 Loss at step 200: 0.04190574958920479 Loss at step 250: 0.05722861364483833 Loss at step 300: 0.05630997195839882 Loss at step 350: 0.05879300832748413 Loss at step 400: 0.05065404996275902 Loss at step 450: 0.044739652425050735 Loss at step 500: 0.04535117372870445 Loss at step 550: 0.053969606757164 Loss at step 600: 0.04195964336395264 Loss at step 650: 0.04622307047247887 Loss at step 700: 0.03985252231359482 Loss at step 750: 0.04138439893722534 Loss at step 800: 0.053085096180438995 Loss at step 850: 0.042479436844587326 Loss at step 900: 0.041439350694417953 Mean training loss after epoch 113: 0.04910419215517702 EPOCH: 114 Loss at step 0: 0.04304268956184387 Loss at step 50: 0.05519329011440277 Loss at step 100: 0.046760961413383484 Loss at step 150: 0.05967749282717705 Loss at step 200: 0.03930860012769699 Loss at step 250: 0.0423293262720108 Loss at step 300: 0.04224036633968353 Loss at step 350: 0.06279461830854416 Loss at step 400: 0.04417447745800018 Loss at step 450: 0.042528774589300156 Loss at step 500: 0.09121984243392944 Loss at step 550: 0.04960979148745537 Loss at step 600: 0.04497240111231804 Loss at step 650: 0.0583343505859375 Loss at step 700: 0.043451808393001556 Loss at step 750: 0.04024875909090042 Loss at step 800: 0.06129460781812668 Loss at step 850: 0.0562470518052578 Loss at step 900: 0.0604911632835865 Mean training loss after epoch 114: 0.04887088127672545 EPOCH: 115 Loss at step 0: 0.042519811540842056 Loss at step 50: 0.04335087537765503 Loss at step 100: 0.04406960681080818 Loss at step 150: 0.054973576217889786 Loss at step 200: 0.05814209580421448 Loss at step 250: 0.04802197217941284 Loss at step 300: 0.051835671067237854 Loss at step 350: 0.049630552530288696 Loss at step 400: 0.04491092264652252 Loss at step 450: 0.06269073486328125 Loss at step 500: 0.03520191088318825 Loss at step 550: 0.04450923949480057 Loss at step 600: 0.06526888161897659 Loss at step 650: 0.041005466133356094 Loss at step 700: 0.0522814579308033 Loss at step 750: 0.04958881437778473 Loss at step 800: 0.05273706838488579 Loss at step 850: 0.04273068532347679 Loss at step 900: 0.03488602489233017 Mean training loss after epoch 115: 0.048966162899600416 EPOCH: 116 Loss at step 0: 0.039944786578416824 Loss at step 50: 0.0460820347070694 Loss at step 100: 0.06460567563772202 Loss at step 150: 0.04436538740992546 Loss at step 200: 0.04281877726316452 Loss at step 250: 0.06092846393585205 Loss at step 300: 0.04400229454040527 Loss at step 350: 0.06441931426525116 Loss at step 400: 0.06212672218680382 Loss at step 450: 0.049356527626514435 Loss at step 500: 0.0552033968269825 Loss at step 550: 0.04376041516661644 Loss at step 600: 0.05018463730812073 Loss at step 650: 0.04874347895383835 Loss at step 700: 0.04673641547560692 Loss at step 750: 0.041436176747083664 Loss at step 800: 0.056158397346735 Loss at step 850: 0.03869563713669777 Loss at step 900: 0.04955567419528961 Mean training loss after epoch 116: 0.048448064735830465 EPOCH: 117 Loss at step 0: 0.04943149909377098 Loss at step 50: 0.050077494233846664 Loss at step 100: 0.05832513049244881 Loss at step 150: 0.06729210168123245 Loss at step 200: 0.052060533314943314 Loss at step 250: 0.03730674088001251 Loss at step 300: 0.04581424221396446 Loss at step 350: 0.04273047670722008 Loss at step 400: 0.04203833267092705 Loss at step 450: 0.04501136392354965 Loss at step 500: 0.04355204105377197 Loss at step 550: 0.04271211847662926 Loss at step 600: 0.043606825172901154 Loss at step 650: 0.04525885730981827 Loss at step 700: 0.0664316937327385 Loss at step 750: 0.041386112570762634 Loss at step 800: 0.050622209906578064 Loss at step 850: 0.04522951319813728 Loss at step 900: 0.04584679752588272 Mean training loss after epoch 117: 0.04818040859708781 EPOCH: 118 Loss at step 0: 0.034839119762182236 Loss at step 50: 0.04122745245695114 Loss at step 100: 0.04522548243403435 Loss at step 150: 0.04677584767341614 Loss at step 200: 0.04607687145471573 Loss at step 250: 0.04960291460156441 Loss at step 300: 0.051647551357746124 Loss at step 350: 0.05028221383690834 Loss at step 400: 0.053716834634542465 Loss at step 450: 0.06144123896956444 Loss at step 500: 0.0550815612077713 Loss at step 550: 0.039255231618881226 Loss at step 600: 0.05884433165192604 Loss at step 650: 0.04376909136772156 Loss at step 700: 0.04610493779182434 Loss at step 750: 0.03936005011200905 Loss at step 800: 0.042407795786857605 Loss at step 850: 0.05704762786626816 Loss at step 900: 0.06470819562673569 Mean training loss after epoch 118: 0.048430874768986124 EPOCH: 119 Loss at step 0: 0.0567111037671566 Loss at step 50: 0.042898278683423996 Loss at step 100: 0.043276529759168625 Loss at step 150: 0.04436664283275604 Loss at step 200: 0.04696665704250336 Loss at step 250: 0.0615541934967041 Loss at step 300: 0.05104081332683563 Loss at step 350: 0.0647173672914505 Loss at step 400: 0.0422729067504406 Loss at step 450: 0.05883825942873955 Loss at step 500: 0.044646620750427246 Loss at step 550: 0.055214568972587585 Loss at step 600: 0.04495606943964958 Loss at step 650: 0.043223172426223755 Loss at step 700: 0.04783511161804199 Loss at step 750: 0.06108446046710014 Loss at step 800: 0.04684356600046158 Loss at step 850: 0.06944011896848679 Loss at step 900: 0.050889287143945694 Mean training loss after epoch 119: 0.04872648282123527 EPOCH: 120 Loss at step 0: 0.04811539500951767 Loss at step 50: 0.06313597410917282 Loss at step 100: 0.03966924175620079 Loss at step 150: 0.048546090722084045 Loss at step 200: 0.03894300386309624 Loss at step 250: 0.0406767912209034 Loss at step 300: 0.07337778061628342 Loss at step 350: 0.042906031012535095 Loss at step 400: 0.04507303237915039 Loss at step 450: 0.052193570882081985 Loss at step 500: 0.04565911367535591 Loss at step 550: 0.0451495498418808 Loss at step 600: 0.04028734192252159 Loss at step 650: 0.045540545135736465 Loss at step 700: 0.04972809925675392 Loss at step 750: 0.04414341226220131 Loss at step 800: 0.039158329367637634 Loss at step 850: 0.058330241590738297 Loss at step 900: 0.041156645864248276 Mean training loss after epoch 120: 0.04875098317384974 EPOCH: 121 Loss at step 0: 0.04470215365290642 Loss at step 50: 0.048543043434619904 Loss at step 100: 0.046490006148815155 Loss at step 150: 0.04412710666656494 Loss at step 200: 0.051241979002952576 Loss at step 250: 0.05778942257165909 Loss at step 300: 0.04941749572753906 Loss at step 350: 0.052952006459236145 Loss at step 400: 0.04837014526128769 Loss at step 450: 0.03905497491359711 Loss at step 500: 0.04348765313625336 Loss at step 550: 0.05124882981181145 Loss at step 600: 0.06307277083396912 Loss at step 650: 0.038353536278009415 Loss at step 700: 0.058547310531139374 Loss at step 750: 0.03746767342090607 Loss at step 800: 0.043770186603069305 Loss at step 850: 0.06271298974752426 Loss at step 900: 0.040334563702344894 Mean training loss after epoch 121: 0.04852884418086838 EPOCH: 122 Loss at step 0: 0.06182936951518059 Loss at step 50: 0.06565168499946594 Loss at step 100: 0.07490838319063187 Loss at step 150: 0.0717703327536583 Loss at step 200: 0.047581471502780914 Loss at step 250: 0.05467211455106735 Loss at step 300: 0.0464843325316906 Loss at step 350: 0.056970469653606415 Loss at step 400: 0.038349173963069916 Loss at step 450: 0.04014921560883522 Loss at step 500: 0.03847106173634529 Loss at step 550: 0.043343909084796906 Loss at step 600: 0.03836037963628769 Loss at step 650: 0.06525057554244995 Loss at step 700: 0.042454514652490616 Loss at step 750: 0.03860451281070709 Loss at step 800: 0.04114649072289467 Loss at step 850: 0.04198504984378815 Loss at step 900: 0.056722719222307205 Mean training loss after epoch 122: 0.04805394287493183 EPOCH: 123 Loss at step 0: 0.06943417340517044 Loss at step 50: 0.04312584176659584 Loss at step 100: 0.04711692035198212 Loss at step 150: 0.04146064817905426 Loss at step 200: 0.03884458169341087 Loss at step 250: 0.048305053263902664 Loss at step 300: 0.035895269364118576 Loss at step 350: 0.04626725986599922 Loss at step 400: 0.05452008172869682 Loss at step 450: 0.04145897179841995 Loss at step 500: 0.04111278057098389 Loss at step 550: 0.04597388952970505 Loss at step 600: 0.04333062842488289 Loss at step 650: 0.04713686183094978 Loss at step 700: 0.07415705919265747 Loss at step 750: 0.0510522723197937 Loss at step 800: 0.0566122829914093 Loss at step 850: 0.043810129165649414 Loss at step 900: 0.05466644465923309 Mean training loss after epoch 123: 0.04840215760618766 EPOCH: 124 Loss at step 0: 0.0425613634288311 Loss at step 50: 0.04526353254914284 Loss at step 100: 0.048045564442873 Loss at step 150: 0.07229669392108917 Loss at step 200: 0.04286525025963783 Loss at step 250: 0.03194453567266464 Loss at step 300: 0.05687033757567406 Loss at step 350: 0.04356146603822708 Loss at step 400: 0.04045304283499718 Loss at step 450: 0.04303451254963875 Loss at step 500: 0.04135030880570412 Loss at step 550: 0.045940328389406204 Loss at step 600: 0.07247654348611832 Loss at step 650: 0.04583844169974327 Loss at step 700: 0.03390170633792877 Loss at step 750: 0.04460088536143303 Loss at step 800: 0.07188066095113754 Loss at step 850: 0.04158822447061539 Loss at step 900: 0.06504479050636292 Mean training loss after epoch 124: 0.04891533395851345 EPOCH: 125 Loss at step 0: 0.05066239833831787 Loss at step 50: 0.0426696352660656 Loss at step 100: 0.05380329117178917 Loss at step 150: 0.046559449285268784 Loss at step 200: 0.043796148151159286 Loss at step 250: 0.06823095679283142 Loss at step 300: 0.04093124344944954 Loss at step 350: 0.03697185218334198 Loss at step 400: 0.0402502603828907 Loss at step 450: 0.04627738520503044 Loss at step 500: 0.05995035916566849 Loss at step 550: 0.043111402541399 Loss at step 600: 0.05396724119782448 Loss at step 650: 0.06508559733629227 Loss at step 700: 0.04958206042647362 Loss at step 750: 0.04655090719461441 Loss at step 800: 0.0532492995262146 Loss at step 850: 0.037808507680892944 Loss at step 900: 0.04625258967280388 Mean training loss after epoch 125: 0.049067387532101256 EPOCH: 126 Loss at step 0: 0.04309287294745445 Loss at step 50: 0.037295978516340256 Loss at step 100: 0.04052497819066048 Loss at step 150: 0.06257282197475433 Loss at step 200: 0.043346185237169266 Loss at step 250: 0.05827660486102104 Loss at step 300: 0.04505710303783417 Loss at step 350: 0.05021904408931732 Loss at step 400: 0.05236724019050598 Loss at step 450: 0.04331468045711517 Loss at step 500: 0.03872426226735115 Loss at step 550: 0.03708256036043167 Loss at step 600: 0.036741551011800766 Loss at step 650: 0.04717062786221504 Loss at step 700: 0.0353982038795948 Loss at step 750: 0.04976630583405495 Loss at step 800: 0.05311547964811325 Loss at step 850: 0.038135938346385956 Loss at step 900: 0.04831712320446968 Mean training loss after epoch 126: 0.04823600547288908 EPOCH: 127 Loss at step 0: 0.039447665214538574 Loss at step 50: 0.04751758649945259 Loss at step 100: 0.046387020498514175 Loss at step 150: 0.04138258099555969 Loss at step 200: 0.044335365295410156 Loss at step 250: 0.04048046097159386 Loss at step 300: 0.04244330897927284 Loss at step 350: 0.05827585980296135 Loss at step 400: 0.057214174419641495 Loss at step 450: 0.061907507479190826 Loss at step 500: 0.038360800594091415 Loss at step 550: 0.058634378015995026 Loss at step 600: 0.04540303349494934 Loss at step 650: 0.05697581171989441 Loss at step 700: 0.04678212106227875 Loss at step 750: 0.03870060294866562 Loss at step 800: 0.056095052510499954 Loss at step 850: 0.04334866255521774 Loss at step 900: 0.058345064520835876 Mean training loss after epoch 127: 0.04836863971182278 EPOCH: 128 Loss at step 0: 0.04446917027235031 Loss at step 50: 0.036230795085430145 Loss at step 100: 0.03390878438949585 Loss at step 150: 0.05076201632618904 Loss at step 200: 0.04533998295664787 Loss at step 250: 0.06388229876756668 Loss at step 300: 0.04310081526637077 Loss at step 350: 0.04730470851063728 Loss at step 400: 0.04927119240164757 Loss at step 450: 0.04337615892291069 Loss at step 500: 0.04499523714184761 Loss at step 550: 0.06709408760070801 Loss at step 600: 0.047208406031131744 Loss at step 650: 0.062088172882795334 Loss at step 700: 0.045349251478910446 Loss at step 750: 0.045405711978673935 Loss at step 800: 0.04638300836086273 Loss at step 850: 0.04633015766739845 Loss at step 900: 0.04517402499914169 Mean training loss after epoch 128: 0.04834019551788375 EPOCH: 129 Loss at step 0: 0.04730665683746338 Loss at step 50: 0.08005648851394653 Loss at step 100: 0.05609212443232536 Loss at step 150: 0.03608287125825882 Loss at step 200: 0.07246395945549011 Loss at step 250: 0.043266426771879196 Loss at step 300: 0.03814108297228813 Loss at step 350: 0.0400502048432827 Loss at step 400: 0.04808216542005539 Loss at step 450: 0.07416518032550812 Loss at step 500: 0.052690211683511734 Loss at step 550: 0.03688739612698555 Loss at step 600: 0.04007221385836601 Loss at step 650: 0.05895022675395012 Loss at step 700: 0.04804253578186035 Loss at step 750: 0.04299578443169594 Loss at step 800: 0.038000501692295074 Loss at step 850: 0.0356425978243351 Loss at step 900: 0.03870824724435806 Mean training loss after epoch 129: 0.04848297571203411 EPOCH: 130 Loss at step 0: 0.04697586968541145 Loss at step 50: 0.030932608991861343 Loss at step 100: 0.0630011335015297 Loss at step 150: 0.04436697065830231 Loss at step 200: 0.041259560734033585 Loss at step 250: 0.04434643313288689 Loss at step 300: 0.03745892643928528 Loss at step 350: 0.04469682648777962 Loss at step 400: 0.044172268360853195 Loss at step 450: 0.047252051532268524 Loss at step 500: 0.05004040524363518 Loss at step 550: 0.0402071438729763 Loss at step 600: 0.04100075364112854 Loss at step 650: 0.042602911591529846 Loss at step 700: 0.039173975586891174 Loss at step 750: 0.051787540316581726 Loss at step 800: 0.03999105095863342 Loss at step 850: 0.04970330372452736 Loss at step 900: 0.03868302330374718 Mean training loss after epoch 130: 0.048095945280783974 EPOCH: 131 Loss at step 0: 0.03945254161953926 Loss at step 50: 0.046345699578523636 Loss at step 100: 0.03901904448866844 Loss at step 150: 0.053221989423036575 Loss at step 200: 0.038796305656433105 Loss at step 250: 0.040886927396059036 Loss at step 300: 0.058965444564819336 Loss at step 350: 0.09325801581144333 Loss at step 400: 0.04412093758583069 Loss at step 450: 0.04441311955451965 Loss at step 500: 0.039930786937475204 Loss at step 550: 0.0433332584798336 Loss at step 600: 0.04773958399891853 Loss at step 650: 0.0525241382420063 Loss at step 700: 0.04139648750424385 Loss at step 750: 0.04376818612217903 Loss at step 800: 0.03579619526863098 Loss at step 850: 0.037825196981430054 Loss at step 900: 0.039991606026887894 Mean training loss after epoch 131: 0.04813931281688307 EPOCH: 132 Loss at step 0: 0.060901518911123276 Loss at step 50: 0.04412722960114479 Loss at step 100: 0.05966703221201897 Loss at step 150: 0.05515938624739647 Loss at step 200: 0.06949061155319214 Loss at step 250: 0.03898091986775398 Loss at step 300: 0.04608812555670738 Loss at step 350: 0.038992900401353836 Loss at step 400: 0.06122872978448868 Loss at step 450: 0.038341931998729706 Loss at step 500: 0.04099982976913452 Loss at step 550: 0.060087982565164566 Loss at step 600: 0.04502629116177559 Loss at step 650: 0.051901426166296005 Loss at step 700: 0.04355320706963539 Loss at step 750: 0.0422915518283844 Loss at step 800: 0.06365448981523514 Loss at step 850: 0.04554571956396103 Loss at step 900: 0.05126821622252464 Mean training loss after epoch 132: 0.0482410254683703 EPOCH: 133 Loss at step 0: 0.07518590986728668 Loss at step 50: 0.054417483508586884 Loss at step 100: 0.06153472512960434 Loss at step 150: 0.037731070071458817 Loss at step 200: 0.044344671070575714 Loss at step 250: 0.0471181757748127 Loss at step 300: 0.0506008043885231 Loss at step 350: 0.0532262809574604 Loss at step 400: 0.04656652361154556 Loss at step 450: 0.05597405508160591 Loss at step 500: 0.04102633520960808 Loss at step 550: 0.05982747673988342 Loss at step 600: 0.046552930027246475 Loss at step 650: 0.06984357535839081 Loss at step 700: 0.05233166366815567 Loss at step 750: 0.0637328252196312 Loss at step 800: 0.042027413845062256 Loss at step 850: 0.04495489224791527 Loss at step 900: 0.055648788809776306 Mean training loss after epoch 133: 0.0486821632312019 EPOCH: 134 Loss at step 0: 0.03275757282972336 Loss at step 50: 0.05485304072499275 Loss at step 100: 0.04942537844181061 Loss at step 150: 0.0440823957324028 Loss at step 200: 0.04479627311229706 Loss at step 250: 0.04602312296628952 Loss at step 300: 0.04875355586409569 Loss at step 350: 0.037963587790727615 Loss at step 400: 0.043428149074316025 Loss at step 450: 0.06420020759105682 Loss at step 500: 0.04984017834067345 Loss at step 550: 0.0486188605427742 Loss at step 600: 0.03966096416115761 Loss at step 650: 0.04057718813419342 Loss at step 700: 0.043027691543102264 Loss at step 750: 0.05656832829117775 Loss at step 800: 0.05872407928109169 Loss at step 850: 0.06269802153110504 Loss at step 900: 0.04432373493909836 Mean training loss after epoch 134: 0.04860290090666651 EPOCH: 135 Loss at step 0: 0.05233273655176163 Loss at step 50: 0.05610797181725502 Loss at step 100: 0.03993593528866768 Loss at step 150: 0.042799919843673706 Loss at step 200: 0.05465363338589668 Loss at step 250: 0.04657053202390671 Loss at step 300: 0.05145983025431633 Loss at step 350: 0.039075691252946854 Loss at step 400: 0.042336564511060715 Loss at step 450: 0.03943547233939171 Loss at step 500: 0.039916880428791046 Loss at step 550: 0.042199596762657166 Loss at step 600: 0.054571881890296936 Loss at step 650: 0.05190263316035271 Loss at step 700: 0.061620235443115234 Loss at step 750: 0.047968197613954544 Loss at step 800: 0.05276516452431679 Loss at step 850: 0.038781631737947464 Loss at step 900: 0.04355131834745407 Mean training loss after epoch 135: 0.04755938240587076 EPOCH: 136 Loss at step 0: 0.058247342705726624 Loss at step 50: 0.052558425813913345 Loss at step 100: 0.04809776693582535 Loss at step 150: 0.042935263365507126 Loss at step 200: 0.04989586025476456 Loss at step 250: 0.04435832053422928 Loss at step 300: 0.041102901101112366 Loss at step 350: 0.04911432042717934 Loss at step 400: 0.056255459785461426 Loss at step 450: 0.060185305774211884 Loss at step 500: 0.05894460529088974 Loss at step 550: 0.04147157818078995 Loss at step 600: 0.043802279978990555 Loss at step 650: 0.04800396040081978 Loss at step 700: 0.041861772537231445 Loss at step 750: 0.03566359728574753 Loss at step 800: 0.055345598608255386 Loss at step 850: 0.043186236172914505 Loss at step 900: 0.04311433434486389 Mean training loss after epoch 136: 0.048636297252513706 EPOCH: 137 Loss at step 0: 0.05066664144396782 Loss at step 50: 0.04629452899098396 Loss at step 100: 0.04978542774915695 Loss at step 150: 0.038557521998882294 Loss at step 200: 0.03935644403100014 Loss at step 250: 0.047433171421289444 Loss at step 300: 0.04238813742995262 Loss at step 350: 0.03924359008669853 Loss at step 400: 0.04746655747294426 Loss at step 450: 0.055975817143917084 Loss at step 500: 0.04477768391370773 Loss at step 550: 0.047749828547239304 Loss at step 600: 0.05821654200553894 Loss at step 650: 0.037011031061410904 Loss at step 700: 0.04028184339404106 Loss at step 750: 0.04351707175374031 Loss at step 800: 0.040833622217178345 Loss at step 850: 0.04764173924922943 Loss at step 900: 0.040814828127622604 Mean training loss after epoch 137: 0.04809693268923236 EPOCH: 138 Loss at step 0: 0.04736685752868652 Loss at step 50: 0.03835521265864372 Loss at step 100: 0.03589266911149025 Loss at step 150: 0.04202171787619591 Loss at step 200: 0.0428219698369503 Loss at step 250: 0.038759272545576096 Loss at step 300: 0.0466533862054348 Loss at step 350: 0.04689541086554527 Loss at step 400: 0.04181366413831711 Loss at step 450: 0.04345124959945679 Loss at step 500: 0.04273868352174759 Loss at step 550: 0.042811017483472824 Loss at step 600: 0.06171903759241104 Loss at step 650: 0.05848368629813194 Loss at step 700: 0.04884861409664154 Loss at step 750: 0.04454055055975914 Loss at step 800: 0.03974572941660881 Loss at step 850: 0.04441690444946289 Loss at step 900: 0.0434747077524662 Mean training loss after epoch 138: 0.04809947801170064 EPOCH: 139 Loss at step 0: 0.08527266979217529 Loss at step 50: 0.04639517515897751 Loss at step 100: 0.04607561603188515 Loss at step 150: 0.047110944986343384 Loss at step 200: 0.0541897788643837 Loss at step 250: 0.04011073708534241 Loss at step 300: 0.033517319709062576 Loss at step 350: 0.06222081929445267 Loss at step 400: 0.03806560859084129 Loss at step 450: 0.042660634964704514 Loss at step 500: 0.043633926659822464 Loss at step 550: 0.046489499509334564 Loss at step 600: 0.039683252573013306 Loss at step 650: 0.05128851532936096 Loss at step 700: 0.03956982493400574 Loss at step 750: 0.04005344957113266 Loss at step 800: 0.04997238889336586 Loss at step 850: 0.04967762529850006 Loss at step 900: 0.0502094104886055 Mean training loss after epoch 139: 0.04805197540932754 EPOCH: 140 Loss at step 0: 0.04959901422262192 Loss at step 50: 0.05228995159268379 Loss at step 100: 0.046477094292640686 Loss at step 150: 0.040125228464603424 Loss at step 200: 0.04821782931685448 Loss at step 250: 0.04473807290196419 Loss at step 300: 0.0428842268884182 Loss at step 350: 0.041385143995285034 Loss at step 400: 0.046500902622938156 Loss at step 450: 0.09133294224739075 Loss at step 500: 0.04446685314178467 Loss at step 550: 0.0497596301138401 Loss at step 600: 0.04933673515915871 Loss at step 650: 0.04097163677215576 Loss at step 700: 0.04848690330982208 Loss at step 750: 0.032619908452034 Loss at step 800: 0.04444914311170578 Loss at step 850: 0.04953921586275101 Loss at step 900: 0.04851660877466202 Mean training loss after epoch 140: 0.04822871221233405 EPOCH: 141 Loss at step 0: 0.04221989959478378 Loss at step 50: 0.07759493589401245 Loss at step 100: 0.05703134462237358 Loss at step 150: 0.04103032872080803 Loss at step 200: 0.05494566261768341 Loss at step 250: 0.06402164697647095 Loss at step 300: 0.06391774863004684 Loss at step 350: 0.06424076855182648 Loss at step 400: 0.047507546842098236 Loss at step 450: 0.04847937077283859 Loss at step 500: 0.06257084012031555 Loss at step 550: 0.04097563400864601 Loss at step 600: 0.041609734296798706 Loss at step 650: 0.06513803452253342 Loss at step 700: 0.04723398759961128 Loss at step 750: 0.036807216703891754 Loss at step 800: 0.04603337123990059 Loss at step 850: 0.05967361107468605 Loss at step 900: 0.039211906492710114 Mean training loss after epoch 141: 0.0478967134175556 EPOCH: 142 Loss at step 0: 0.03906850889325142 Loss at step 50: 0.04973115772008896 Loss at step 100: 0.056517649441957474 Loss at step 150: 0.039413951337337494 Loss at step 200: 0.04075133800506592 Loss at step 250: 0.03613419458270073 Loss at step 300: 0.044767651706933975 Loss at step 350: 0.04034607112407684 Loss at step 400: 0.04152689129114151 Loss at step 450: 0.060365621000528336 Loss at step 500: 0.05523533746600151 Loss at step 550: 0.05406200513243675 Loss at step 600: 0.04481928423047066 Loss at step 650: 0.03756895288825035 Loss at step 700: 0.040956903249025345 Loss at step 750: 0.04768558219075203 Loss at step 800: 0.0357772558927536 Loss at step 850: 0.046740975230932236 Loss at step 900: 0.04051882028579712 Mean training loss after epoch 142: 0.04823975722942907 EPOCH: 143 Loss at step 0: 0.041432712227106094 Loss at step 50: 0.03937713801860809 Loss at step 100: 0.052498962730169296 Loss at step 150: 0.043708980083465576 Loss at step 200: 0.05813673138618469 Loss at step 250: 0.04596243053674698 Loss at step 300: 0.04315383359789848 Loss at step 350: 0.06011993810534477 Loss at step 400: 0.04048530384898186 Loss at step 450: 0.04083188623189926 Loss at step 500: 0.049775589257478714 Loss at step 550: 0.044776055961847305 Loss at step 600: 0.04295695573091507 Loss at step 650: 0.04333944618701935 Loss at step 700: 0.04534582793712616 Loss at step 750: 0.05514002591371536 Loss at step 800: 0.03822556138038635 Loss at step 850: 0.0571867860853672 Loss at step 900: 0.036526959389448166 Mean training loss after epoch 143: 0.04780379378957662 EPOCH: 144 Loss at step 0: 0.045310068875551224 Loss at step 50: 0.04525734484195709 Loss at step 100: 0.05311569944024086 Loss at step 150: 0.04969799146056175 Loss at step 200: 0.05705079063773155 Loss at step 250: 0.04452880099415779 Loss at step 300: 0.04202457144856453 Loss at step 350: 0.04375694319605827 Loss at step 400: 0.0434386245906353 Loss at step 450: 0.05591442063450813 Loss at step 500: 0.05066610872745514 Loss at step 550: 0.04539495334029198 Loss at step 600: 0.05147344619035721 Loss at step 650: 0.0514625608921051 Loss at step 700: 0.043818891048431396 Loss at step 750: 0.038058750331401825 Loss at step 800: 0.06564795970916748 Loss at step 850: 0.04329938814043999 Loss at step 900: 0.04927845299243927 Mean training loss after epoch 144: 0.048110375826213275 EPOCH: 145 Loss at step 0: 0.04202912747859955 Loss at step 50: 0.03972046449780464 Loss at step 100: 0.046217381954193115 Loss at step 150: 0.03980931267142296 Loss at step 200: 0.05494968220591545 Loss at step 250: 0.04428204149007797 Loss at step 300: 0.056132812052965164 Loss at step 350: 0.04495026543736458 Loss at step 400: 0.04572979360818863 Loss at step 450: 0.044756144285202026 Loss at step 500: 0.0454598031938076 Loss at step 550: 0.051626548171043396 Loss at step 600: 0.039677321910858154 Loss at step 650: 0.03964082896709442 Loss at step 700: 0.05638100951910019 Loss at step 750: 0.056740764528512955 Loss at step 800: 0.03713207319378853 Loss at step 850: 0.05869666859507561 Loss at step 900: 0.04122999683022499 Mean training loss after epoch 145: 0.048078522366135995 EPOCH: 146 Loss at step 0: 0.040502823889255524 Loss at step 50: 0.05495402589440346 Loss at step 100: 0.03602927178144455 Loss at step 150: 0.057629313319921494 Loss at step 200: 0.04840699955821037 Loss at step 250: 0.04364636540412903 Loss at step 300: 0.04949730262160301 Loss at step 350: 0.04318000748753548 Loss at step 400: 0.04250628501176834 Loss at step 450: 0.05796785652637482 Loss at step 500: 0.051449161022901535 Loss at step 550: 0.04198930785059929 Loss at step 600: 0.04214757680892944 Loss at step 650: 0.05911919102072716 Loss at step 700: 0.04252789914608002 Loss at step 750: 0.03858495131134987 Loss at step 800: 0.043688032776117325 Loss at step 850: 0.039767369627952576 Loss at step 900: 0.051621872931718826 Mean training loss after epoch 146: 0.047909198805435635 EPOCH: 147 Loss at step 0: 0.06225462257862091 Loss at step 50: 0.03969653323292732 Loss at step 100: 0.04131756350398064 Loss at step 150: 0.04693257808685303 Loss at step 200: 0.038300156593322754 Loss at step 250: 0.05204145237803459 Loss at step 300: 0.08692367374897003 Loss at step 350: 0.05888795852661133 Loss at step 400: 0.07480836659669876 Loss at step 450: 0.04509488865733147 Loss at step 500: 0.03844299539923668 Loss at step 550: 0.042850859463214874 Loss at step 600: 0.046785663813352585 Loss at step 650: 0.04514247551560402 Loss at step 700: 0.042631346732378006 Loss at step 750: 0.041802626103162766 Loss at step 800: 0.04003133624792099 Loss at step 850: 0.04264260083436966 Loss at step 900: 0.04100607708096504 Mean training loss after epoch 147: 0.04840825801528593 EPOCH: 148 Loss at step 0: 0.04437281936407089 Loss at step 50: 0.04259352758526802 Loss at step 100: 0.044428080320358276 Loss at step 150: 0.042147621512413025 Loss at step 200: 0.05449044704437256 Loss at step 250: 0.042162563651800156 Loss at step 300: 0.042122259736061096 Loss at step 350: 0.04886902496218681 Loss at step 400: 0.07493843883275986 Loss at step 450: 0.04366179183125496 Loss at step 500: 0.043343883007764816 Loss at step 550: 0.038856472820043564 Loss at step 600: 0.060272857546806335 Loss at step 650: 0.05685914680361748 Loss at step 700: 0.038389164954423904 Loss at step 750: 0.05983494967222214 Loss at step 800: 0.04730628430843353 Loss at step 850: 0.047110751271247864 Loss at step 900: 0.04190683364868164 Mean training loss after epoch 148: 0.04748984102779297 EPOCH: 149 Loss at step 0: 0.06131398677825928 Loss at step 50: 0.057646993547677994 Loss at step 100: 0.04564312472939491 Loss at step 150: 0.04707563668489456 Loss at step 200: 0.046313896775245667 Loss at step 250: 0.05284058675169945 Loss at step 300: 0.049038827419281006 Loss at step 350: 0.053099777549505234 Loss at step 400: 0.05843466892838478 Loss at step 450: 0.053093500435352325 Loss at step 500: 0.042451437562704086 Loss at step 550: 0.04467247799038887 Loss at step 600: 0.045495402067899704 Loss at step 650: 0.04197118803858757 Loss at step 700: 0.0591651052236557 Loss at step 750: 0.04414018616080284 Loss at step 800: 0.05778920277953148 Loss at step 850: 0.04678281024098396 Loss at step 900: 0.0692528635263443 Mean training loss after epoch 149: 0.04815618381269578 EPOCH: 150 Loss at step 0: 0.04685692861676216 Loss at step 50: 0.06407912075519562 Loss at step 100: 0.04622413218021393 Loss at step 150: 0.04480960965156555 Loss at step 200: 0.046229783445596695 Loss at step 250: 0.0373903252184391 Loss at step 300: 0.03786709904670715 Loss at step 350: 0.04431575536727905 Loss at step 400: 0.03820958733558655 Loss at step 450: 0.045989472419023514 Loss at step 500: 0.035547781735658646 Loss at step 550: 0.04289945214986801 Loss at step 600: 0.04332456737756729 Loss at step 650: 0.048033807426691055 Loss at step 700: 0.04117652401328087 Loss at step 750: 0.0450725220143795 Loss at step 800: 0.06029557064175606 Loss at step 850: 0.05633963271975517 Loss at step 900: 0.044749338179826736 Mean training loss after epoch 150: 0.047823521624337124 EPOCH: 151 Loss at step 0: 0.04830772057175636 Loss at step 50: 0.0717492550611496 Loss at step 100: 0.0465884692966938 Loss at step 150: 0.03923540934920311 Loss at step 200: 0.07384171336889267 Loss at step 250: 0.03534482792019844 Loss at step 300: 0.05776512250304222 Loss at step 350: 0.04497353732585907 Loss at step 400: 0.044133175164461136 Loss at step 450: 0.05690193921327591 Loss at step 500: 0.04707585275173187 Loss at step 550: 0.04227139800786972 Loss at step 600: 0.051336392760276794 Loss at step 650: 0.07088176161050797 Loss at step 700: 0.04694501683115959 Loss at step 750: 0.04349277541041374 Loss at step 800: 0.05168856307864189 Loss at step 850: 0.039288997650146484 Loss at step 900: 0.03858185186982155 Mean training loss after epoch 151: 0.04805107841065634 EPOCH: 152 Loss at step 0: 0.044062741100788116 Loss at step 50: 0.047415852546691895 Loss at step 100: 0.04606248438358307 Loss at step 150: 0.04317641258239746 Loss at step 200: 0.051159899681806564 Loss at step 250: 0.0729796439409256 Loss at step 300: 0.049446724355220795 Loss at step 350: 0.04840434715151787 Loss at step 400: 0.04895773530006409 Loss at step 450: 0.07889073342084885 Loss at step 500: 0.03835040703415871 Loss at step 550: 0.06313285231590271 Loss at step 600: 0.04946298524737358 Loss at step 650: 0.04593770578503609 Loss at step 700: 0.05483170226216316 Loss at step 750: 0.052406881004571915 Loss at step 800: 0.060273509472608566 Loss at step 850: 0.03952197730541229 Loss at step 900: 0.04153937101364136 Mean training loss after epoch 152: 0.048407141568024024 EPOCH: 153 Loss at step 0: 0.07584570348262787 Loss at step 50: 0.051185932010412216 Loss at step 100: 0.043529365211725235 Loss at step 150: 0.05507998540997505 Loss at step 200: 0.04450291022658348 Loss at step 250: 0.04505030810832977 Loss at step 300: 0.037257637828588486 Loss at step 350: 0.0462719164788723 Loss at step 400: 0.05139850452542305 Loss at step 450: 0.05660077929496765 Loss at step 500: 0.04068489372730255 Loss at step 550: 0.043151188641786575 Loss at step 600: 0.0745161846280098 Loss at step 650: 0.04348268359899521 Loss at step 700: 0.04038102552294731 Loss at step 750: 0.04145857319235802 Loss at step 800: 0.055077046155929565 Loss at step 850: 0.03749913349747658 Loss at step 900: 0.038099102675914764 Mean training loss after epoch 153: 0.04789722627048681 EPOCH: 154 Loss at step 0: 0.046416349709033966 Loss at step 50: 0.03874007612466812 Loss at step 100: 0.07669641077518463 Loss at step 150: 0.043296072632074356 Loss at step 200: 0.03838653117418289 Loss at step 250: 0.04113393649458885 Loss at step 300: 0.05899520590901375 Loss at step 350: 0.04374735429883003 Loss at step 400: 0.04343133047223091 Loss at step 450: 0.05908855423331261 Loss at step 500: 0.07816362380981445 Loss at step 550: 0.03837007284164429 Loss at step 600: 0.04788869619369507 Loss at step 650: 0.04302465170621872 Loss at step 700: 0.046568211168050766 Loss at step 750: 0.054292209446430206 Loss at step 800: 0.04697525128722191 Loss at step 850: 0.044622473418712616 Loss at step 900: 0.04441223293542862 Mean training loss after epoch 154: 0.0480414333342235 EPOCH: 155 Loss at step 0: 0.03954365849494934 Loss at step 50: 0.038076214492321014 Loss at step 100: 0.042772721499204636 Loss at step 150: 0.03764300048351288 Loss at step 200: 0.045437734574079514 Loss at step 250: 0.045676346868276596 Loss at step 300: 0.037698086351156235 Loss at step 350: 0.05182648450136185 Loss at step 400: 0.04024353623390198 Loss at step 450: 0.047634441405534744 Loss at step 500: 0.03999991714954376 Loss at step 550: 0.04662137106060982 Loss at step 600: 0.06413653492927551 Loss at step 650: 0.046583957970142365 Loss at step 700: 0.039558958262205124 Loss at step 750: 0.06201162934303284 Loss at step 800: 0.06745882332324982 Loss at step 850: 0.04451286420226097 Loss at step 900: 0.03870999813079834 Mean training loss after epoch 155: 0.04833967745828349 EPOCH: 156 Loss at step 0: 0.04155467450618744 Loss at step 50: 0.04500237852334976 Loss at step 100: 0.06213785707950592 Loss at step 150: 0.04752960801124573 Loss at step 200: 0.04445803910493851 Loss at step 250: 0.045776404440402985 Loss at step 300: 0.038761187344789505 Loss at step 350: 0.053915683180093765 Loss at step 400: 0.0479012094438076 Loss at step 450: 0.04025132209062576 Loss at step 500: 0.05762089416384697 Loss at step 550: 0.04133060947060585 Loss at step 600: 0.04038073495030403 Loss at step 650: 0.03854462504386902 Loss at step 700: 0.04015268757939339 Loss at step 750: 0.04108273237943649 Loss at step 800: 0.04079483076930046 Loss at step 850: 0.05419192090630531 Loss at step 900: 0.04428043216466904 Mean training loss after epoch 156: 0.04794969440682102 EPOCH: 157 Loss at step 0: 0.04512400925159454 Loss at step 50: 0.042229048907756805 Loss at step 100: 0.03599534556269646 Loss at step 150: 0.0425695963203907 Loss at step 200: 0.039569009095430374 Loss at step 250: 0.03814446181058884 Loss at step 300: 0.03872933238744736 Loss at step 350: 0.04000912234187126 Loss at step 400: 0.05306512489914894 Loss at step 450: 0.048274777829647064 Loss at step 500: 0.04127361252903938 Loss at step 550: 0.037766072899103165 Loss at step 600: 0.038731738924980164 Loss at step 650: 0.042483437806367874 Loss at step 700: 0.04545659199357033 Loss at step 750: 0.04230126366019249 Loss at step 800: 0.03913111984729767 Loss at step 850: 0.05486308038234711 Loss at step 900: 0.037370823323726654 Mean training loss after epoch 157: 0.04773757931615498 EPOCH: 158 Loss at step 0: 0.034559279680252075 Loss at step 50: 0.04275784641504288 Loss at step 100: 0.049133382737636566 Loss at step 150: 0.04944677650928497 Loss at step 200: 0.07183919847011566 Loss at step 250: 0.04344380646944046 Loss at step 300: 0.0733976885676384 Loss at step 350: 0.05442284420132637 Loss at step 400: 0.048356521874666214 Loss at step 450: 0.045110445469617844 Loss at step 500: 0.043389663100242615 Loss at step 550: 0.047038398683071136 Loss at step 600: 0.04686042293906212 Loss at step 650: 0.03596083074808121 Loss at step 700: 0.03871665149927139 Loss at step 750: 0.047651465982198715 Loss at step 800: 0.04248692840337753 Loss at step 850: 0.044925980269908905 Loss at step 900: 0.046804752200841904 Mean training loss after epoch 158: 0.047782646575525625 EPOCH: 159 Loss at step 0: 0.05593869090080261 Loss at step 50: 0.058280397206544876 Loss at step 100: 0.04470512643456459 Loss at step 150: 0.040098875761032104 Loss at step 200: 0.04546288028359413 Loss at step 250: 0.03717805817723274 Loss at step 300: 0.0678711086511612 Loss at step 350: 0.046012673527002335 Loss at step 400: 0.044276461005210876 Loss at step 450: 0.044945765286684036 Loss at step 500: 0.043988797813653946 Loss at step 550: 0.06140179559588432 Loss at step 600: 0.04631127044558525 Loss at step 650: 0.042346224188804626 Loss at step 700: 0.05324014648795128 Loss at step 750: 0.04134303703904152 Loss at step 800: 0.033862341195344925 Loss at step 850: 0.0526561513543129 Loss at step 900: 0.049821846187114716 Mean training loss after epoch 159: 0.047809329596775044 EPOCH: 160 Loss at step 0: 0.04540388658642769 Loss at step 50: 0.05537613108754158 Loss at step 100: 0.04735409468412399 Loss at step 150: 0.08175001293420792 Loss at step 200: 0.041856955736875534 Loss at step 250: 0.045059289783239365 Loss at step 300: 0.03776615485548973 Loss at step 350: 0.037512101233005524 Loss at step 400: 0.03858927637338638 Loss at step 450: 0.04542721435427666 Loss at step 500: 0.04226672276854515 Loss at step 550: 0.038330450654029846 Loss at step 600: 0.04116389527916908 Loss at step 650: 0.04830767959356308 Loss at step 700: 0.04205837845802307 Loss at step 750: 0.0429217554628849 Loss at step 800: 0.05768262594938278 Loss at step 850: 0.04009474813938141 Loss at step 900: 0.04810440540313721 Mean training loss after epoch 160: 0.04765331817032305 EPOCH: 161 Loss at step 0: 0.04326077178120613 Loss at step 50: 0.0755653828382492 Loss at step 100: 0.04228503257036209 Loss at step 150: 0.06257589161396027 Loss at step 200: 0.04525024816393852 Loss at step 250: 0.07635457813739777 Loss at step 300: 0.06882943958044052 Loss at step 350: 0.056726206094026566 Loss at step 400: 0.04079444333910942 Loss at step 450: 0.044670745730400085 Loss at step 500: 0.06855488568544388 Loss at step 550: 0.04803077131509781 Loss at step 600: 0.05635075643658638 Loss at step 650: 0.06654441356658936 Loss at step 700: 0.03761180862784386 Loss at step 750: 0.04306430742144585 Loss at step 800: 0.06389497965574265 Loss at step 850: 0.07804830372333527 Loss at step 900: 0.05677548050880432 Mean training loss after epoch 161: 0.048063445860134765 EPOCH: 162 Loss at step 0: 0.07487881928682327 Loss at step 50: 0.046307116746902466 Loss at step 100: 0.05867844074964523 Loss at step 150: 0.050710443407297134 Loss at step 200: 0.047236617654561996 Loss at step 250: 0.03848901391029358 Loss at step 300: 0.04445381462574005 Loss at step 350: 0.08095137774944305 Loss at step 400: 0.06667964905500412 Loss at step 450: 0.04531310126185417 Loss at step 500: 0.039744146168231964 Loss at step 550: 0.04062851518392563 Loss at step 600: 0.05665161460638046 Loss at step 650: 0.04299502447247505 Loss at step 700: 0.03974420577287674 Loss at step 750: 0.0584307499229908 Loss at step 800: 0.036681436002254486 Loss at step 850: 0.03878191113471985 Loss at step 900: 0.04742102697491646 Mean training loss after epoch 162: 0.0478052657975285 EPOCH: 163 Loss at step 0: 0.04728839918971062 Loss at step 50: 0.06780382245779037 Loss at step 100: 0.037290189415216446 Loss at step 150: 0.04960430786013603 Loss at step 200: 0.041062887758016586 Loss at step 250: 0.04563622921705246 Loss at step 300: 0.04115348681807518 Loss at step 350: 0.039581798017024994 Loss at step 400: 0.03971965238451958 Loss at step 450: 0.042402684688568115 Loss at step 500: 0.04605765640735626 Loss at step 550: 0.04233759641647339 Loss at step 600: 0.05710843950510025 Loss at step 650: 0.058112747967243195 Loss at step 700: 0.04582667723298073 Loss at step 750: 0.037784647196531296 Loss at step 800: 0.04192262515425682 Loss at step 850: 0.04018480330705643 Loss at step 900: 0.04028730466961861 Mean training loss after epoch 163: 0.048498545727654815 EPOCH: 164 Loss at step 0: 0.03863983601331711 Loss at step 50: 0.07076840102672577 Loss at step 100: 0.03974257782101631 Loss at step 150: 0.039578523486852646 Loss at step 200: 0.06738995760679245 Loss at step 250: 0.05871117115020752 Loss at step 300: 0.038366302847862244 Loss at step 350: 0.07523320615291595 Loss at step 400: 0.03736096993088722 Loss at step 450: 0.044970594346523285 Loss at step 500: 0.04567759111523628 Loss at step 550: 0.03908020630478859 Loss at step 600: 0.03966924548149109 Loss at step 650: 0.07065963745117188 Loss at step 700: 0.039956122636795044 Loss at step 750: 0.04448740556836128 Loss at step 800: 0.038425132632255554 Loss at step 850: 0.03914375603199005 Loss at step 900: 0.039881378412246704 Mean training loss after epoch 164: 0.04818853178718832 EPOCH: 165 Loss at step 0: 0.04275514557957649 Loss at step 50: 0.05698924884200096 Loss at step 100: 0.06184988468885422 Loss at step 150: 0.06128988415002823 Loss at step 200: 0.043407171964645386 Loss at step 250: 0.04046672210097313 Loss at step 300: 0.06366463750600815 Loss at step 350: 0.039501454681158066 Loss at step 400: 0.04681350290775299 Loss at step 450: 0.05270976200699806 Loss at step 500: 0.04818674549460411 Loss at step 550: 0.047339219599962234 Loss at step 600: 0.0589040145277977 Loss at step 650: 0.044746316969394684 Loss at step 700: 0.04189896956086159 Loss at step 750: 0.0372002050280571 Loss at step 800: 0.05116238817572594 Loss at step 850: 0.04464266821742058 Loss at step 900: 0.05948268994688988 Mean training loss after epoch 165: 0.04771104191086376 EPOCH: 166 Loss at step 0: 0.04820879176259041 Loss at step 50: 0.061463139951229095 Loss at step 100: 0.04127024859189987 Loss at step 150: 0.07459872215986252 Loss at step 200: 0.03874204680323601 Loss at step 250: 0.05351409688591957 Loss at step 300: 0.04781927168369293 Loss at step 350: 0.04361966997385025 Loss at step 400: 0.04515915364027023 Loss at step 450: 0.03657384589314461 Loss at step 500: 0.045050181448459625 Loss at step 550: 0.0649053305387497 Loss at step 600: 0.061859130859375 Loss at step 650: 0.038574982434511185 Loss at step 700: 0.03965464234352112 Loss at step 750: 0.05017831176519394 Loss at step 800: 0.0432170145213604 Loss at step 850: 0.04410000145435333 Loss at step 900: 0.03287072852253914 Mean training loss after epoch 166: 0.04807026966858202 EPOCH: 167 Loss at step 0: 0.048311568796634674 Loss at step 50: 0.04444672167301178 Loss at step 100: 0.036667563021183014 Loss at step 150: 0.041704367846250534 Loss at step 200: 0.04485035687685013 Loss at step 250: 0.04258917644619942 Loss at step 300: 0.05487225949764252 Loss at step 350: 0.05188580974936485 Loss at step 400: 0.05310903117060661 Loss at step 450: 0.037933070212602615 Loss at step 500: 0.03933123126626015 Loss at step 550: 0.036960113793611526 Loss at step 600: 0.048559267073869705 Loss at step 650: 0.05926473066210747 Loss at step 700: 0.06105296313762665 Loss at step 750: 0.058266531676054 Loss at step 800: 0.044637687504291534 Loss at step 850: 0.051973022520542145 Loss at step 900: 0.039842259138822556 Mean training loss after epoch 167: 0.047765946847750054 EPOCH: 168 Loss at step 0: 0.03583192080259323 Loss at step 50: 0.05071759223937988 Loss at step 100: 0.044631410390138626 Loss at step 150: 0.04129388928413391 Loss at step 200: 0.05522938072681427 Loss at step 250: 0.047773320227861404 Loss at step 300: 0.045271698385477066 Loss at step 350: 0.053860727697610855 Loss at step 400: 0.04081827774643898 Loss at step 450: 0.03988466039299965 Loss at step 500: 0.040832191705703735 Loss at step 550: 0.040615834295749664 Loss at step 600: 0.06027018278837204 Loss at step 650: 0.0552295446395874 Loss at step 700: 0.03635886311531067 Loss at step 750: 0.04683920741081238 Loss at step 800: 0.040551040321588516 Loss at step 850: 0.06692285090684891 Loss at step 900: 0.041895121335983276 Mean training loss after epoch 168: 0.047546703257222676 EPOCH: 169 Loss at step 0: 0.06241433322429657 Loss at step 50: 0.0348331555724144 Loss at step 100: 0.041900936514139175 Loss at step 150: 0.04626212641596794 Loss at step 200: 0.05040445178747177 Loss at step 250: 0.06138380244374275 Loss at step 300: 0.042326390743255615 Loss at step 350: 0.062196675688028336 Loss at step 400: 0.06258014589548111 Loss at step 450: 0.03955921158194542 Loss at step 500: 0.044862572103738785 Loss at step 550: 0.061056047677993774 Loss at step 600: 0.05452129617333412 Loss at step 650: 0.03808721527457237 Loss at step 700: 0.04335397481918335 Loss at step 750: 0.0546102374792099 Loss at step 800: 0.044229913502931595 Loss at step 850: 0.04321904852986336 Loss at step 900: 0.04298299923539162 Mean training loss after epoch 169: 0.04792103273615336 EPOCH: 170 Loss at step 0: 0.053736090660095215 Loss at step 50: 0.054893698543310165 Loss at step 100: 0.0687706395983696 Loss at step 150: 0.0470273420214653 Loss at step 200: 0.05783786624670029 Loss at step 250: 0.04437735676765442 Loss at step 300: 0.06868226826190948 Loss at step 350: 0.044668275862932205 Loss at step 400: 0.042323894798755646 Loss at step 450: 0.05258039012551308 Loss at step 500: 0.04457084462046623 Loss at step 550: 0.04323815181851387 Loss at step 600: 0.05789602920413017 Loss at step 650: 0.0465211383998394 Loss at step 700: 0.05117054283618927 Loss at step 750: 0.04157792404294014 Loss at step 800: 0.055859778076410294 Loss at step 850: 0.03837772086262703 Loss at step 900: 0.042630866169929504 Mean training loss after epoch 170: 0.04782832482619199 EPOCH: 171 Loss at step 0: 0.03933744132518768 Loss at step 50: 0.04144204780459404 Loss at step 100: 0.044694267213344574 Loss at step 150: 0.03813988342881203 Loss at step 200: 0.044000037014484406 Loss at step 250: 0.04249131307005882 Loss at step 300: 0.06107486039400101 Loss at step 350: 0.03424929454922676 Loss at step 400: 0.04060051590204239 Loss at step 450: 0.03581353649497032 Loss at step 500: 0.045312508940696716 Loss at step 550: 0.04219142720103264 Loss at step 600: 0.043060287833213806 Loss at step 650: 0.060591720044612885 Loss at step 700: 0.046898990869522095 Loss at step 750: 0.046386510133743286 Loss at step 800: 0.039342839270830154 Loss at step 850: 0.044247936457395554 Loss at step 900: 0.04336980730295181 Mean training loss after epoch 171: 0.047326321904259575 EPOCH: 172 Loss at step 0: 0.048242926597595215 Loss at step 50: 0.046787794679403305 Loss at step 100: 0.04514370113611221 Loss at step 150: 0.04028860107064247 Loss at step 200: 0.061843667179346085 Loss at step 250: 0.03985725715756416 Loss at step 300: 0.045516032725572586 Loss at step 350: 0.05544756352901459 Loss at step 400: 0.05456496775150299 Loss at step 450: 0.05566051974892616 Loss at step 500: 0.03286433219909668 Loss at step 550: 0.04833296686410904 Loss at step 600: 0.04450564831495285 Loss at step 650: 0.048302553594112396 Loss at step 700: 0.06939399987459183 Loss at step 750: 0.04342791438102722 Loss at step 800: 0.04265208542346954 Loss at step 850: 0.08692354708909988 Loss at step 900: 0.04115518182516098 Mean training loss after epoch 172: 0.04735287215743365 EPOCH: 173 Loss at step 0: 0.08069363981485367 Loss at step 50: 0.03429450839757919 Loss at step 100: 0.04992176964879036 Loss at step 150: 0.04373497888445854 Loss at step 200: 0.07297713309526443 Loss at step 250: 0.05464089661836624 Loss at step 300: 0.03572789207100868 Loss at step 350: 0.043281782418489456 Loss at step 400: 0.04134601354598999 Loss at step 450: 0.04099157825112343 Loss at step 500: 0.03934439271688461 Loss at step 550: 0.03919655457139015 Loss at step 600: 0.03829936683177948 Loss at step 650: 0.04881497845053673 Loss at step 700: 0.049676161259412766 Loss at step 750: 0.04690747708082199 Loss at step 800: 0.0400041788816452 Loss at step 850: 0.0408165380358696 Loss at step 900: 0.060348741710186005 Mean training loss after epoch 173: 0.047448974752079834 EPOCH: 174 Loss at step 0: 0.05278756469488144 Loss at step 50: 0.046692173928022385 Loss at step 100: 0.04280523210763931 Loss at step 150: 0.046241018921136856 Loss at step 200: 0.0625358298420906 Loss at step 250: 0.04309813678264618 Loss at step 300: 0.0395776741206646 Loss at step 350: 0.053622547537088394 Loss at step 400: 0.039063770323991776 Loss at step 450: 0.03832712024450302 Loss at step 500: 0.04065966233611107 Loss at step 550: 0.048303812742233276 Loss at step 600: 0.03805069997906685 Loss at step 650: 0.043353673070669174 Loss at step 700: 0.03893181309103966 Loss at step 750: 0.038492463529109955 Loss at step 800: 0.051266323775053024 Loss at step 850: 0.041366592049598694 Loss at step 900: 0.04213057830929756 Mean training loss after epoch 174: 0.04726688832894508 EPOCH: 175 Loss at step 0: 0.0575219951570034 Loss at step 50: 0.04046965390443802 Loss at step 100: 0.05921709164977074 Loss at step 150: 0.042010754346847534 Loss at step 200: 0.03955774009227753 Loss at step 250: 0.056155428290367126 Loss at step 300: 0.04134117811918259 Loss at step 350: 0.040671493858098984 Loss at step 400: 0.03142482414841652 Loss at step 450: 0.042936746031045914 Loss at step 500: 0.043023109436035156 Loss at step 550: 0.05444042384624481 Loss at step 600: 0.04467251896858215 Loss at step 650: 0.03788699209690094 Loss at step 700: 0.05118957906961441 Loss at step 750: 0.04554032161831856 Loss at step 800: 0.06459875404834747 Loss at step 850: 0.042506709694862366 Loss at step 900: 0.045860692858695984 Mean training loss after epoch 175: 0.047596867996524135 EPOCH: 176 Loss at step 0: 0.03673785179853439 Loss at step 50: 0.06085589528083801 Loss at step 100: 0.04084182158112526 Loss at step 150: 0.07729913294315338 Loss at step 200: 0.05132095515727997 Loss at step 250: 0.04874790832400322 Loss at step 300: 0.04346683993935585 Loss at step 350: 0.037337351590394974 Loss at step 400: 0.03889841213822365 Loss at step 450: 0.04347096383571625 Loss at step 500: 0.037590544670820236 Loss at step 550: 0.04141179844737053 Loss at step 600: 0.03916460648179054 Loss at step 650: 0.05615443363785744 Loss at step 700: 0.056202225387096405 Loss at step 750: 0.04267645627260208 Loss at step 800: 0.06632127612829208 Loss at step 850: 0.0839352235198021 Loss at step 900: 0.03491226211190224 Mean training loss after epoch 176: 0.04734463965111196 EPOCH: 177 Loss at step 0: 0.0587606318295002 Loss at step 50: 0.0387873500585556 Loss at step 100: 0.06058310344815254 Loss at step 150: 0.05957905948162079 Loss at step 200: 0.05863909050822258 Loss at step 250: 0.058735329657793045 Loss at step 300: 0.0460820272564888 Loss at step 350: 0.03849330544471741 Loss at step 400: 0.04301401972770691 Loss at step 450: 0.04044269025325775 Loss at step 500: 0.04267947003245354 Loss at step 550: 0.038330283015966415 Loss at step 600: 0.0531792975962162 Loss at step 650: 0.056672241538763046 Loss at step 700: 0.04561460018157959 Loss at step 750: 0.04132943972945213 Loss at step 800: 0.038321930915117264 Loss at step 850: 0.04893188923597336 Loss at step 900: 0.049419865012168884 Mean training loss after epoch 177: 0.04735273427602007 EPOCH: 178 Loss at step 0: 0.04519001394510269 Loss at step 50: 0.05503096804022789 Loss at step 100: 0.05623124539852142 Loss at step 150: 0.04818836599588394 Loss at step 200: 0.050508517771959305 Loss at step 250: 0.04574983939528465 Loss at step 300: 0.05991625040769577 Loss at step 350: 0.04266383498907089 Loss at step 400: 0.0401555597782135 Loss at step 450: 0.04020598903298378 Loss at step 500: 0.04338383674621582 Loss at step 550: 0.03658461570739746 Loss at step 600: 0.041277553886175156 Loss at step 650: 0.037133023142814636 Loss at step 700: 0.04255005717277527 Loss at step 750: 0.04625401273369789 Loss at step 800: 0.036323707550764084 Loss at step 850: 0.04118512198328972 Loss at step 900: 0.05181794986128807 Mean training loss after epoch 178: 0.048210509544385396 EPOCH: 179 Loss at step 0: 0.03560945391654968 Loss at step 50: 0.04546826705336571 Loss at step 100: 0.03838157653808594 Loss at step 150: 0.06325048953294754 Loss at step 200: 0.046791765838861465 Loss at step 250: 0.05577459931373596 Loss at step 300: 0.05988738685846329 Loss at step 350: 0.057163212448358536 Loss at step 400: 0.03954584151506424 Loss at step 450: 0.04145480692386627 Loss at step 500: 0.07393773645162582 Loss at step 550: 0.07935898751020432 Loss at step 600: 0.05602613091468811 Loss at step 650: 0.05532299727201462 Loss at step 700: 0.05308080092072487 Loss at step 750: 0.04252909496426582 Loss at step 800: 0.04694265127182007 Loss at step 850: 0.03434552997350693 Loss at step 900: 0.06521283835172653 Mean training loss after epoch 179: 0.04760616249653068 EPOCH: 180 Loss at step 0: 0.038117047399282455 Loss at step 50: 0.04210633412003517 Loss at step 100: 0.04016827046871185 Loss at step 150: 0.039608415216207504 Loss at step 200: 0.05489107966423035 Loss at step 250: 0.03685580566525459 Loss at step 300: 0.045024625957012177 Loss at step 350: 0.040137290954589844 Loss at step 400: 0.04312143102288246 Loss at step 450: 0.03888474032282829 Loss at step 500: 0.039244163781404495 Loss at step 550: 0.057850319892168045 Loss at step 600: 0.04972587153315544 Loss at step 650: 0.03404628485441208 Loss at step 700: 0.04657438397407532 Loss at step 750: 0.054797857999801636 Loss at step 800: 0.04266953840851784 Loss at step 850: 0.03932221978902817 Loss at step 900: 0.06294462829828262 Mean training loss after epoch 180: 0.0474428842062635 EPOCH: 181 Loss at step 0: 0.047666214406490326 Loss at step 50: 0.04493202269077301 Loss at step 100: 0.04621574655175209 Loss at step 150: 0.039545316249132156 Loss at step 200: 0.053358159959316254 Loss at step 250: 0.06034937500953674 Loss at step 300: 0.039920296519994736 Loss at step 350: 0.046864718198776245 Loss at step 400: 0.03954041004180908 Loss at step 450: 0.039751939475536346 Loss at step 500: 0.06023193895816803 Loss at step 550: 0.044349733740091324 Loss at step 600: 0.057992108166217804 Loss at step 650: 0.07416155189275742 Loss at step 700: 0.0396910235285759 Loss at step 750: 0.035366322845220566 Loss at step 800: 0.0489087738096714 Loss at step 850: 0.050106845796108246 Loss at step 900: 0.05105986446142197 Mean training loss after epoch 181: 0.047331239215966096 EPOCH: 182 Loss at step 0: 0.04026282578706741 Loss at step 50: 0.05612548813223839 Loss at step 100: 0.03620382770895958 Loss at step 150: 0.04726428911089897 Loss at step 200: 0.04515327513217926 Loss at step 250: 0.06388495117425919 Loss at step 300: 0.04183843731880188 Loss at step 350: 0.062432996928691864 Loss at step 400: 0.03732362389564514 Loss at step 450: 0.039694517850875854 Loss at step 500: 0.05779682844877243 Loss at step 550: 0.048597659915685654 Loss at step 600: 0.04815478250384331 Loss at step 650: 0.04316947981715202 Loss at step 700: 0.04150030389428139 Loss at step 750: 0.05083782598376274 Loss at step 800: 0.04254673421382904 Loss at step 850: 0.04585887864232063 Loss at step 900: 0.05327890068292618 Mean training loss after epoch 182: 0.04774329603623861 EPOCH: 183 Loss at step 0: 0.04404857009649277 Loss at step 50: 0.047710102051496506 Loss at step 100: 0.04351811483502388 Loss at step 150: 0.05533577874302864 Loss at step 200: 0.04576675966382027 Loss at step 250: 0.056674547493457794 Loss at step 300: 0.036151062697172165 Loss at step 350: 0.035286709666252136 Loss at step 400: 0.029603268951177597 Loss at step 450: 0.047485724091529846 Loss at step 500: 0.04846247285604477 Loss at step 550: 0.049435727298259735 Loss at step 600: 0.03942878916859627 Loss at step 650: 0.05948914587497711 Loss at step 700: 0.04899653047323227 Loss at step 750: 0.052653077989816666 Loss at step 800: 0.04149549826979637 Loss at step 850: 0.03887404501438141 Loss at step 900: 0.04937179014086723 Mean training loss after epoch 183: 0.04786680022806628 EPOCH: 184 Loss at step 0: 0.03755056485533714 Loss at step 50: 0.06816501170396805 Loss at step 100: 0.03630031272768974 Loss at step 150: 0.044707152992486954 Loss at step 200: 0.041807811707258224 Loss at step 250: 0.04240897297859192 Loss at step 300: 0.05743900313973427 Loss at step 350: 0.0599166564643383 Loss at step 400: 0.044668927788734436 Loss at step 450: 0.04632830619812012 Loss at step 500: 0.05630120262503624 Loss at step 550: 0.05834256857633591 Loss at step 600: 0.04330079257488251 Loss at step 650: 0.04237429052591324 Loss at step 700: 0.039245057851076126 Loss at step 750: 0.04478013515472412 Loss at step 800: 0.04519081115722656 Loss at step 850: 0.06446610391139984 Loss at step 900: 0.051682326942682266 Mean training loss after epoch 184: 0.04745182287154485 EPOCH: 185 Loss at step 0: 0.04205906018614769 Loss at step 50: 0.050259850919246674 Loss at step 100: 0.046109504997730255 Loss at step 150: 0.05521038547158241 Loss at step 200: 0.046026334166526794 Loss at step 250: 0.04400377348065376 Loss at step 300: 0.045639995485544205 Loss at step 350: 0.0730522945523262 Loss at step 400: 0.041286420077085495 Loss at step 450: 0.04184383153915405 Loss at step 500: 0.042843468487262726 Loss at step 550: 0.04336606711149216 Loss at step 600: 0.043566226959228516 Loss at step 650: 0.06014822795987129 Loss at step 700: 0.04736814647912979 Loss at step 750: 0.04357977584004402 Loss at step 800: 0.03346090763807297 Loss at step 850: 0.05796531215310097 Loss at step 900: 0.03858127444982529 Mean training loss after epoch 185: 0.04762490522036992 EPOCH: 186 Loss at step 0: 0.06015775352716446 Loss at step 50: 0.034273549914360046 Loss at step 100: 0.044088419526815414 Loss at step 150: 0.0405399315059185 Loss at step 200: 0.04146652668714523 Loss at step 250: 0.038672931492328644 Loss at step 300: 0.06452177464962006 Loss at step 350: 0.04180072247982025 Loss at step 400: 0.04651358723640442 Loss at step 450: 0.03588107228279114 Loss at step 500: 0.05570957437157631 Loss at step 550: 0.045239679515361786 Loss at step 600: 0.056078217923641205 Loss at step 650: 0.045940566807985306 Loss at step 700: 0.04604792967438698 Loss at step 750: 0.059157516807317734 Loss at step 800: 0.03747933357954025 Loss at step 850: 0.04564405977725983 Loss at step 900: 0.042597945779561996 Mean training loss after epoch 186: 0.04755176525952211 EPOCH: 187 Loss at step 0: 0.04132939875125885 Loss at step 50: 0.05010872334241867 Loss at step 100: 0.058861467987298965 Loss at step 150: 0.0579364188015461 Loss at step 200: 0.03836563974618912 Loss at step 250: 0.04487018659710884 Loss at step 300: 0.039082977920770645 Loss at step 350: 0.04587705805897713 Loss at step 400: 0.0473325289785862 Loss at step 450: 0.05725112184882164 Loss at step 500: 0.04225655272603035 Loss at step 550: 0.043946653604507446 Loss at step 600: 0.04035954549908638 Loss at step 650: 0.04044673964381218 Loss at step 700: 0.059851184487342834 Loss at step 750: 0.05071434751152992 Loss at step 800: 0.04210760071873665 Loss at step 850: 0.03896353021264076 Loss at step 900: 0.05229070037603378 Mean training loss after epoch 187: 0.0473269669613096 EPOCH: 188 Loss at step 0: 0.05119980499148369 Loss at step 50: 0.04287783056497574 Loss at step 100: 0.057657647877931595 Loss at step 150: 0.042366448789834976 Loss at step 200: 0.049015842378139496 Loss at step 250: 0.055862512439489365 Loss at step 300: 0.03757524490356445 Loss at step 350: 0.04207034036517143 Loss at step 400: 0.045083366334438324 Loss at step 450: 0.059483736753463745 Loss at step 500: 0.04152573272585869 Loss at step 550: 0.04262612387537956 Loss at step 600: 0.051372673362493515 Loss at step 650: 0.040087610483169556 Loss at step 700: 0.055833570659160614 Loss at step 750: 0.04518987610936165 Loss at step 800: 0.044332440942525864 Loss at step 850: 0.05918114259839058 Loss at step 900: 0.043632928282022476 Mean training loss after epoch 188: 0.04717418309380569 EPOCH: 189 Loss at step 0: 0.047010838985443115 Loss at step 50: 0.03771784156560898 Loss at step 100: 0.03946223109960556 Loss at step 150: 0.03927743062376976 Loss at step 200: 0.04313114657998085 Loss at step 250: 0.04213150963187218 Loss at step 300: 0.04186033457517624 Loss at step 350: 0.031576432287693024 Loss at step 400: 0.041887857019901276 Loss at step 450: 0.04159478843212128 Loss at step 500: 0.04046766459941864 Loss at step 550: 0.047089964151382446 Loss at step 600: 0.05251268669962883 Loss at step 650: 0.0437716543674469 Loss at step 700: 0.07849439978599548 Loss at step 750: 0.03858314827084541 Loss at step 800: 0.055375199764966965 Loss at step 850: 0.05185810849070549 Loss at step 900: 0.06199009343981743 Mean training loss after epoch 189: 0.046852382386464685 EPOCH: 190 Loss at step 0: 0.04514359310269356 Loss at step 50: 0.03440997004508972 Loss at step 100: 0.05935356393456459 Loss at step 150: 0.041912730783224106 Loss at step 200: 0.04255301132798195 Loss at step 250: 0.04536569118499756 Loss at step 300: 0.05859140679240227 Loss at step 350: 0.04762500151991844 Loss at step 400: 0.03543528541922569 Loss at step 450: 0.033522430807352066 Loss at step 500: 0.039455339312553406 Loss at step 550: 0.03864050656557083 Loss at step 600: 0.057356882840394974 Loss at step 650: 0.046711619943380356 Loss at step 700: 0.04612146317958832 Loss at step 750: 0.062081705778837204 Loss at step 800: 0.03487958014011383 Loss at step 850: 0.03612062707543373 Loss at step 900: 0.047697700560092926 Mean training loss after epoch 190: 0.047026167627670236 EPOCH: 191 Loss at step 0: 0.039414435625076294 Loss at step 50: 0.04223354905843735 Loss at step 100: 0.05274349823594093 Loss at step 150: 0.06195962056517601 Loss at step 200: 0.03695555403828621 Loss at step 250: 0.040743954479694366 Loss at step 300: 0.06529859453439713 Loss at step 350: 0.06378206610679626 Loss at step 400: 0.05489561706781387 Loss at step 450: 0.041718628257513046 Loss at step 500: 0.05879081413149834 Loss at step 550: 0.06051637604832649 Loss at step 600: 0.08561386168003082 Loss at step 650: 0.05148482695221901 Loss at step 700: 0.040529463440179825 Loss at step 750: 0.07472296059131622 Loss at step 800: 0.04400051757693291 Loss at step 850: 0.05773424357175827 Loss at step 900: 0.040698159486055374 Mean training loss after epoch 191: 0.04786672354387894 EPOCH: 192 Loss at step 0: 0.041321203112602234 Loss at step 50: 0.04145752638578415 Loss at step 100: 0.0428769625723362 Loss at step 150: 0.04243357852101326 Loss at step 200: 0.051680319011211395 Loss at step 250: 0.039200298488140106 Loss at step 300: 0.046646252274513245 Loss at step 350: 0.06262180954217911 Loss at step 400: 0.047752588987350464 Loss at step 450: 0.04620083421468735 Loss at step 500: 0.04256881773471832 Loss at step 550: 0.05828232690691948 Loss at step 600: 0.03637267276644707 Loss at step 650: 0.05465107411146164 Loss at step 700: 0.060324572026729584 Loss at step 750: 0.04122084751725197 Loss at step 800: 0.04849835857748985 Loss at step 850: 0.07015468180179596 Loss at step 900: 0.05295858532190323 Mean training loss after epoch 192: 0.04733888250090547 EPOCH: 193 Loss at step 0: 0.041436925530433655 Loss at step 50: 0.04491574689745903 Loss at step 100: 0.05086640641093254 Loss at step 150: 0.043427594006061554 Loss at step 200: 0.05979708582162857 Loss at step 250: 0.049593325704336166 Loss at step 300: 0.04221150279045105 Loss at step 350: 0.051194801926612854 Loss at step 400: 0.043865837156772614 Loss at step 450: 0.040090057998895645 Loss at step 500: 0.0436972975730896 Loss at step 550: 0.04455951601266861 Loss at step 600: 0.08402562141418457 Loss at step 650: 0.04182561859488487 Loss at step 700: 0.03900632634758949 Loss at step 750: 0.06474176794290543 Loss at step 800: 0.04060227796435356 Loss at step 850: 0.04922721907496452 Loss at step 900: 0.04032108187675476 Mean training loss after epoch 193: 0.04722758357141064 EPOCH: 194 Loss at step 0: 0.04199838638305664 Loss at step 50: 0.040771134197711945 Loss at step 100: 0.036027662456035614 Loss at step 150: 0.056184981018304825 Loss at step 200: 0.03271754831075668 Loss at step 250: 0.05203456059098244 Loss at step 300: 0.0427439846098423 Loss at step 350: 0.0399836041033268 Loss at step 400: 0.06174096092581749 Loss at step 450: 0.05949702858924866 Loss at step 500: 0.038518719375133514 Loss at step 550: 0.0410148948431015 Loss at step 600: 0.04330739006400108 Loss at step 650: 0.06056107580661774 Loss at step 700: 0.0466480553150177 Loss at step 750: 0.05166352912783623 Loss at step 800: 0.04770488664507866 Loss at step 850: 0.04806135967373848 Loss at step 900: 0.04603643715381622 Mean training loss after epoch 194: 0.04752552920360682 EPOCH: 195 Loss at step 0: 0.05844394490122795 Loss at step 50: 0.07258730381727219 Loss at step 100: 0.04749195650219917 Loss at step 150: 0.036580298095941544 Loss at step 200: 0.03887391835451126 Loss at step 250: 0.04019351303577423 Loss at step 300: 0.04878673329949379 Loss at step 350: 0.0414201021194458 Loss at step 400: 0.059882279485464096 Loss at step 450: 0.03595207259058952 Loss at step 500: 0.05353190749883652 Loss at step 550: 0.05391766130924225 Loss at step 600: 0.03533376753330231 Loss at step 650: 0.054236188530921936 Loss at step 700: 0.045192066580057144 Loss at step 750: 0.04041789099574089 Loss at step 800: 0.06234394758939743 Loss at step 850: 0.04358278959989548 Loss at step 900: 0.05819908156991005 Mean training loss after epoch 195: 0.04736351688851172 EPOCH: 196 Loss at step 0: 0.05739545822143555 Loss at step 50: 0.03522986173629761 Loss at step 100: 0.044359274208545685 Loss at step 150: 0.04070301353931427 Loss at step 200: 0.04432486370205879 Loss at step 250: 0.048534441739320755 Loss at step 300: 0.041911330074071884 Loss at step 350: 0.05188979208469391 Loss at step 400: 0.046585943549871445 Loss at step 450: 0.040595196187496185 Loss at step 500: 0.05591829493641853 Loss at step 550: 0.042053140699863434 Loss at step 600: 0.04108443483710289 Loss at step 650: 0.042703598737716675 Loss at step 700: 0.04112744703888893 Loss at step 750: 0.04483233019709587 Loss at step 800: 0.03621356561779976 Loss at step 850: 0.04369891434907913 Loss at step 900: 0.06977087259292603 Mean training loss after epoch 196: 0.04745578305966564 EPOCH: 197 Loss at step 0: 0.07109194248914719 Loss at step 50: 0.06163087114691734 Loss at step 100: 0.04594847559928894 Loss at step 150: 0.042514555156230927 Loss at step 200: 0.07387527823448181 Loss at step 250: 0.05659754201769829 Loss at step 300: 0.04215007647871971 Loss at step 350: 0.04368995130062103 Loss at step 400: 0.0501178577542305 Loss at step 450: 0.04699409008026123 Loss at step 500: 0.04706057533621788 Loss at step 550: 0.0455315038561821 Loss at step 600: 0.03235261142253876 Loss at step 650: 0.0531482994556427 Loss at step 700: 0.03937156870961189 Loss at step 750: 0.047523051500320435 Loss at step 800: 0.043409910053014755 Loss at step 850: 0.05969436839222908 Loss at step 900: 0.04288965091109276 Mean training loss after epoch 197: 0.04675396697830035 EPOCH: 198 Loss at step 0: 0.03910886123776436 Loss at step 50: 0.040962666273117065 Loss at step 100: 0.04430381581187248 Loss at step 150: 0.043251629918813705 Loss at step 200: 0.04260077700018883 Loss at step 250: 0.036410704255104065 Loss at step 300: 0.04658989980816841 Loss at step 350: 0.03709014877676964 Loss at step 400: 0.0391421839594841 Loss at step 450: 0.038733113557100296 Loss at step 500: 0.033423468470573425 Loss at step 550: 0.06096374988555908 Loss at step 600: 0.07837387919425964 Loss at step 650: 0.04770457744598389 Loss at step 700: 0.04611004516482353 Loss at step 750: 0.04323802515864372 Loss at step 800: 0.051454924046993256 Loss at step 850: 0.04396731033921242 Loss at step 900: 0.037516966462135315 Mean training loss after epoch 198: 0.04732492997416301 EPOCH: 199 Loss at step 0: 0.0647946447134018 Loss at step 50: 0.05926031991839409 Loss at step 100: 0.043709881603717804 Loss at step 150: 0.046477336436510086 Loss at step 200: 0.04161691665649414 Loss at step 250: 0.039948053658008575 Loss at step 300: 0.03915286436676979 Loss at step 350: 0.039333511143922806 Loss at step 400: 0.040510762482881546 Loss at step 450: 0.07470471411943436 Loss at step 500: 0.04168612137436867 Loss at step 550: 0.043100569397211075 Loss at step 600: 0.05323639512062073 Loss at step 650: 0.04470862075686455 Loss at step 700: 0.06419889628887177 Loss at step 750: 0.04313807189464569 Loss at step 800: 0.06811791658401489 Loss at step 850: 0.04139979928731918 Loss at step 900: 0.06052303686738014 Mean training loss after epoch 199: 0.04719927428222732 EPOCH: 200 Loss at step 0: 0.04722476005554199 Loss at step 50: 0.04078567028045654 Loss at step 100: 0.05497278273105621 Loss at step 150: 0.06030789390206337 Loss at step 200: 0.042657218873500824 Loss at step 250: 0.040267445147037506 Loss at step 300: 0.03635428473353386 Loss at step 350: 0.057234786450862885 Loss at step 400: 0.045503079891204834 Loss at step 450: 0.04130933806300163 Loss at step 500: 0.03366037458181381 Loss at step 550: 0.03974539414048195 Loss at step 600: 0.059761714190244675 Loss at step 650: 0.05435216426849365 Loss at step 700: 0.04387698695063591 Loss at step 750: 0.04825926572084427 Loss at step 800: 0.05772475153207779 Loss at step 850: 0.04070121422410011 Loss at step 900: 0.05665159970521927 Mean training loss after epoch 200: 0.047546687645158535 EPOCH: 201 Loss at step 0: 0.05883101001381874 Loss at step 50: 0.04619805887341499 Loss at step 100: 0.044048357754945755 Loss at step 150: 0.037990376353263855 Loss at step 200: 0.047216884791851044 Loss at step 250: 0.03588378429412842 Loss at step 300: 0.08154052495956421 Loss at step 350: 0.04108566418290138 Loss at step 400: 0.039325907826423645 Loss at step 450: 0.04783937707543373 Loss at step 500: 0.05628166347742081 Loss at step 550: 0.04968461021780968 Loss at step 600: 0.04157556593418121 Loss at step 650: 0.057879600673913956 Loss at step 700: 0.04595500975847244 Loss at step 750: 0.04229682683944702 Loss at step 800: 0.04214996099472046 Loss at step 850: 0.06010609492659569 Loss at step 900: 0.04151468724012375 Mean training loss after epoch 201: 0.04705871738342524 EPOCH: 202 Loss at step 0: 0.04560292512178421 Loss at step 50: 0.0476008802652359 Loss at step 100: 0.04099060595035553 Loss at step 150: 0.043428339064121246 Loss at step 200: 0.0387798473238945 Loss at step 250: 0.04484549164772034 Loss at step 300: 0.033436279743909836 Loss at step 350: 0.044179633259773254 Loss at step 400: 0.04054473340511322 Loss at step 450: 0.055530477315187454 Loss at step 500: 0.06623927503824234 Loss at step 550: 0.048612307757139206 Loss at step 600: 0.03906761854887009 Loss at step 650: 0.046095769852399826 Loss at step 700: 0.034860629588365555 Loss at step 750: 0.05650866776704788 Loss at step 800: 0.040113870054483414 Loss at step 850: 0.04454489424824715 Loss at step 900: 0.04427119717001915 Mean training loss after epoch 202: 0.04685303686198586 EPOCH: 203 Loss at step 0: 0.04189928621053696 Loss at step 50: 0.04080265387892723 Loss at step 100: 0.03463473543524742 Loss at step 150: 0.04628198966383934 Loss at step 200: 0.04216468334197998 Loss at step 250: 0.05883002653717995 Loss at step 300: 0.043016113340854645 Loss at step 350: 0.053891900926828384 Loss at step 400: 0.04589606821537018 Loss at step 450: 0.04239628091454506 Loss at step 500: 0.04346850886940956 Loss at step 550: 0.055129628628492355 Loss at step 600: 0.044313959777355194 Loss at step 650: 0.0410446859896183 Loss at step 700: 0.03996780514717102 Loss at step 750: 0.036751117557287216 Loss at step 800: 0.04869403690099716 Loss at step 850: 0.044838204979896545 Loss at step 900: 0.04262921214103699 Mean training loss after epoch 203: 0.04722267224876357 EPOCH: 204 Loss at step 0: 0.045907728374004364 Loss at step 50: 0.05250387638807297 Loss at step 100: 0.04007772356271744 Loss at step 150: 0.04549560323357582 Loss at step 200: 0.043780550360679626 Loss at step 250: 0.047059670090675354 Loss at step 300: 0.042085714638233185 Loss at step 350: 0.03799083083868027 Loss at step 400: 0.05460008233785629 Loss at step 450: 0.041854243725538254 Loss at step 500: 0.040584612637758255 Loss at step 550: 0.045623779296875 Loss at step 600: 0.03949065878987312 Loss at step 650: 0.03711169213056564 Loss at step 700: 0.04408833384513855 Loss at step 750: 0.061064448207616806 Loss at step 800: 0.06394980102777481 Loss at step 850: 0.06175262853503227 Loss at step 900: 0.044184643775224686 Mean training loss after epoch 204: 0.04745814601368487 EPOCH: 205 Loss at step 0: 0.041288189589977264 Loss at step 50: 0.038685742765665054 Loss at step 100: 0.038626790046691895 Loss at step 150: 0.04322343319654465 Loss at step 200: 0.03561465069651604 Loss at step 250: 0.041365399956703186 Loss at step 300: 0.06124166026711464 Loss at step 350: 0.03973355516791344 Loss at step 400: 0.03603079542517662 Loss at step 450: 0.047554221004247665 Loss at step 500: 0.04057328402996063 Loss at step 550: 0.04764296114444733 Loss at step 600: 0.0435919351875782 Loss at step 650: 0.03940914571285248 Loss at step 700: 0.0384216234087944 Loss at step 750: 0.05625380203127861 Loss at step 800: 0.04380418732762337 Loss at step 850: 0.04211685433983803 Loss at step 900: 0.05546120926737785 Mean training loss after epoch 205: 0.04745016545891317 EPOCH: 206 Loss at step 0: 0.034765541553497314 Loss at step 50: 0.038723722100257874 Loss at step 100: 0.0454399473965168 Loss at step 150: 0.05223909765481949 Loss at step 200: 0.04094790667295456 Loss at step 250: 0.042603228241205215 Loss at step 300: 0.04334636405110359 Loss at step 350: 0.05328739807009697 Loss at step 400: 0.050095655024051666 Loss at step 450: 0.04107634723186493 Loss at step 500: 0.04035646840929985 Loss at step 550: 0.0454658679664135 Loss at step 600: 0.03692343086004257 Loss at step 650: 0.047351475805044174 Loss at step 700: 0.03826805576682091 Loss at step 750: 0.04734167829155922 Loss at step 800: 0.057458654046058655 Loss at step 850: 0.0445697121322155 Loss at step 900: 0.04746592789888382 Mean training loss after epoch 206: 0.04654643530331885 EPOCH: 207 Loss at step 0: 0.040064748376607895 Loss at step 50: 0.038938406854867935 Loss at step 100: 0.04217700660228729 Loss at step 150: 0.03899727016687393 Loss at step 200: 0.03661825880408287 Loss at step 250: 0.04186464846134186 Loss at step 300: 0.03987332805991173 Loss at step 350: 0.05648341402411461 Loss at step 400: 0.04927570000290871 Loss at step 450: 0.04263843595981598 Loss at step 500: 0.03791206330060959 Loss at step 550: 0.050119176506996155 Loss at step 600: 0.06256107240915298 Loss at step 650: 0.04496186971664429 Loss at step 700: 0.03763065114617348 Loss at step 750: 0.050966426730155945 Loss at step 800: 0.06336591392755508 Loss at step 850: 0.04951062053442001 Loss at step 900: 0.04180419445037842 Mean training loss after epoch 207: 0.047142072995779105 EPOCH: 208 Loss at step 0: 0.054582249373197556 Loss at step 50: 0.03830692544579506 Loss at step 100: 0.03415938839316368 Loss at step 150: 0.05044867843389511 Loss at step 200: 0.03850356489419937 Loss at step 250: 0.05492454022169113 Loss at step 300: 0.05722159519791603 Loss at step 350: 0.04443616047501564 Loss at step 400: 0.06136094033718109 Loss at step 450: 0.034933432936668396 Loss at step 500: 0.05902191996574402 Loss at step 550: 0.03807670995593071 Loss at step 600: 0.04836656153202057 Loss at step 650: 0.06500979512929916 Loss at step 700: 0.03956969082355499 Loss at step 750: 0.07490810751914978 Loss at step 800: 0.048251569271087646 Loss at step 850: 0.041129112243652344 Loss at step 900: 0.042175184935331345 Mean training loss after epoch 208: 0.04736662810561118 EPOCH: 209 Loss at step 0: 0.04029260203242302 Loss at step 50: 0.05940786376595497 Loss at step 100: 0.04576461389660835 Loss at step 150: 0.04029528424143791 Loss at step 200: 0.03479953482747078 Loss at step 250: 0.03907564654946327 Loss at step 300: 0.03972654044628143 Loss at step 350: 0.0501459576189518 Loss at step 400: 0.0418044775724411 Loss at step 450: 0.04144881293177605 Loss at step 500: 0.04213939234614372 Loss at step 550: 0.04146023094654083 Loss at step 600: 0.054995860904455185 Loss at step 650: 0.041662462055683136 Loss at step 700: 0.05348978936672211 Loss at step 750: 0.044034022837877274 Loss at step 800: 0.03991950303316116 Loss at step 850: 0.03975271061062813 Loss at step 900: 0.03791988268494606 Mean training loss after epoch 209: 0.04735934226783608 EPOCH: 210 Loss at step 0: 0.03790666162967682 Loss at step 50: 0.04619823023676872 Loss at step 100: 0.03762363642454147 Loss at step 150: 0.05611611157655716 Loss at step 200: 0.04840629920363426 Loss at step 250: 0.040922801941633224 Loss at step 300: 0.035502057522535324 Loss at step 350: 0.0424276739358902 Loss at step 400: 0.04930967092514038 Loss at step 450: 0.04652145877480507 Loss at step 500: 0.07391101121902466 Loss at step 550: 0.04691127687692642 Loss at step 600: 0.043226275593042374 Loss at step 650: 0.03128940239548683 Loss at step 700: 0.04555974528193474 Loss at step 750: 0.0455792136490345 Loss at step 800: 0.04832574352622032 Loss at step 850: 0.05956569314002991 Loss at step 900: 0.04815487936139107 Mean training loss after epoch 210: 0.04750805232785086 EPOCH: 211 Loss at step 0: 0.04331015422940254 Loss at step 50: 0.037901148200035095 Loss at step 100: 0.04370182380080223 Loss at step 150: 0.053254690021276474 Loss at step 200: 0.03575199469923973 Loss at step 250: 0.04344582557678223 Loss at step 300: 0.03810238465666771 Loss at step 350: 0.04735643044114113 Loss at step 400: 0.053177185356616974 Loss at step 450: 0.038579873740673065 Loss at step 500: 0.056145068258047104 Loss at step 550: 0.06175151467323303 Loss at step 600: 0.04711979627609253 Loss at step 650: 0.07187428325414658 Loss at step 700: 0.05031876638531685 Loss at step 750: 0.040170419961214066 Loss at step 800: 0.04001624509692192 Loss at step 850: 0.04776638373732567 Loss at step 900: 0.04222323000431061 Mean training loss after epoch 211: 0.046790155546386235 EPOCH: 212 Loss at step 0: 0.03550932928919792 Loss at step 50: 0.05087056756019592 Loss at step 100: 0.04115816950798035 Loss at step 150: 0.037096232175827026 Loss at step 200: 0.04893102869391441 Loss at step 250: 0.04111657664179802 Loss at step 300: 0.04713604226708412 Loss at step 350: 0.05851547047495842 Loss at step 400: 0.04332994297146797 Loss at step 450: 0.03713301569223404 Loss at step 500: 0.05493410304188728 Loss at step 550: 0.0368155911564827 Loss at step 600: 0.04319209232926369 Loss at step 650: 0.05126115679740906 Loss at step 700: 0.05852140486240387 Loss at step 750: 0.03431258350610733 Loss at step 800: 0.05106879770755768 Loss at step 850: 0.03759405016899109 Loss at step 900: 0.03601706027984619 Mean training loss after epoch 212: 0.04719335970077624 EPOCH: 213 Loss at step 0: 0.06436979025602341 Loss at step 50: 0.039839744567871094 Loss at step 100: 0.03833356499671936 Loss at step 150: 0.04300548881292343 Loss at step 200: 0.05223258584737778 Loss at step 250: 0.06075688824057579 Loss at step 300: 0.05535752326250076 Loss at step 350: 0.040697090327739716 Loss at step 400: 0.05013677850365639 Loss at step 450: 0.04562435299158096 Loss at step 500: 0.06300367414951324 Loss at step 550: 0.058297473937273026 Loss at step 600: 0.057210858911275864 Loss at step 650: 0.0429699644446373 Loss at step 700: 0.03951922431588173 Loss at step 750: 0.040372706949710846 Loss at step 800: 0.07243381440639496 Loss at step 850: 0.042357850819826126 Loss at step 900: 0.06275133043527603 Mean training loss after epoch 213: 0.04736104125041824 EPOCH: 214 Loss at step 0: 0.04823967441916466 Loss at step 50: 0.04271997511386871 Loss at step 100: 0.05867461487650871 Loss at step 150: 0.040006738156080246 Loss at step 200: 0.038602590560913086 Loss at step 250: 0.05543293058872223 Loss at step 300: 0.05980774760246277 Loss at step 350: 0.07868692278862 Loss at step 400: 0.037251830101013184 Loss at step 450: 0.06000708416104317 Loss at step 500: 0.035632725805044174 Loss at step 550: 0.07344222068786621 Loss at step 600: 0.07107151299715042 Loss at step 650: 0.044583722949028015 Loss at step 700: 0.05919918417930603 Loss at step 750: 0.07906430214643478 Loss at step 800: 0.03575844690203667 Loss at step 850: 0.05633951723575592 Loss at step 900: 0.03508196398615837 Mean training loss after epoch 214: 0.0471287766205413 EPOCH: 215 Loss at step 0: 0.04046102985739708 Loss at step 50: 0.042585551738739014 Loss at step 100: 0.04168212413787842 Loss at step 150: 0.05540783330798149 Loss at step 200: 0.04076763242483139 Loss at step 250: 0.05739479511976242 Loss at step 300: 0.046748191118240356 Loss at step 350: 0.03953908383846283 Loss at step 400: 0.0399739146232605 Loss at step 450: 0.05111340805888176 Loss at step 500: 0.03534373641014099 Loss at step 550: 0.06724732369184494 Loss at step 600: 0.043774232268333435 Loss at step 650: 0.042513035237789154 Loss at step 700: 0.03826793655753136 Loss at step 750: 0.044402606785297394 Loss at step 800: 0.06714445352554321 Loss at step 850: 0.048950813710689545 Loss at step 900: 0.04296063259243965 Mean training loss after epoch 215: 0.04774343337752481 EPOCH: 216 Loss at step 0: 0.056830473244190216 Loss at step 50: 0.06376022845506668 Loss at step 100: 0.0689898207783699 Loss at step 150: 0.0693066269159317 Loss at step 200: 0.03821583092212677 Loss at step 250: 0.048734769225120544 Loss at step 300: 0.037521447986364365 Loss at step 350: 0.06530138105154037 Loss at step 400: 0.03950732573866844 Loss at step 450: 0.05898682773113251 Loss at step 500: 0.07157939672470093 Loss at step 550: 0.07091155648231506 Loss at step 600: 0.04708830267190933 Loss at step 650: 0.03792092949151993 Loss at step 700: 0.05443757772445679 Loss at step 750: 0.039305295795202255 Loss at step 800: 0.03608972206711769 Loss at step 850: 0.04272271692752838 Loss at step 900: 0.03666987270116806 Mean training loss after epoch 216: 0.04704990969109001 EPOCH: 217 Loss at step 0: 0.042405858635902405 Loss at step 50: 0.05129184573888779 Loss at step 100: 0.08605532348155975 Loss at step 150: 0.04266434162855148 Loss at step 200: 0.054785583168268204 Loss at step 250: 0.05168217048048973 Loss at step 300: 0.059577781707048416 Loss at step 350: 0.05209672451019287 Loss at step 400: 0.04376287758350372 Loss at step 450: 0.0466315783560276 Loss at step 500: 0.0592544786632061 Loss at step 550: 0.05789228156208992 Loss at step 600: 0.03829116001725197 Loss at step 650: 0.05691918358206749 Loss at step 700: 0.08026427775621414 Loss at step 750: 0.0458269938826561 Loss at step 800: 0.05861802026629448 Loss at step 850: 0.04791343957185745 Loss at step 900: 0.05900823697447777 Mean training loss after epoch 217: 0.04705899012193624 EPOCH: 218 Loss at step 0: 0.042350687086582184 Loss at step 50: 0.03730512037873268 Loss at step 100: 0.04864880442619324 Loss at step 150: 0.038556136190891266 Loss at step 200: 0.04171710088849068 Loss at step 250: 0.038677360862493515 Loss at step 300: 0.039633769541978836 Loss at step 350: 0.04151154309511185 Loss at step 400: 0.06591847538948059 Loss at step 450: 0.04054322466254234 Loss at step 500: 0.05021519958972931 Loss at step 550: 0.03431762009859085 Loss at step 600: 0.042313456535339355 Loss at step 650: 0.03982793167233467 Loss at step 700: 0.055177148431539536 Loss at step 750: 0.045199424028396606 Loss at step 800: 0.04076436161994934 Loss at step 850: 0.043578606098890305 Loss at step 900: 0.04540648311376572 Mean training loss after epoch 218: 0.0472543411262667 EPOCH: 219 Loss at step 0: 0.049522917717695236 Loss at step 50: 0.06123467907309532 Loss at step 100: 0.04030996188521385 Loss at step 150: 0.049802009016275406 Loss at step 200: 0.042643506079912186 Loss at step 250: 0.05701090395450592 Loss at step 300: 0.038618676364421844 Loss at step 350: 0.04276908189058304 Loss at step 400: 0.07048887014389038 Loss at step 450: 0.03926249220967293 Loss at step 500: 0.04026566445827484 Loss at step 550: 0.037922125309705734 Loss at step 600: 0.04127132520079613 Loss at step 650: 0.042277563363313675 Loss at step 700: 0.07616514712572098 Loss at step 750: 0.055292509496212006 Loss at step 800: 0.03903445228934288 Loss at step 850: 0.055496495217084885 Loss at step 900: 0.059140875935554504 Mean training loss after epoch 219: 0.047198199111023055 EPOCH: 220 Loss at step 0: 0.042261574417352676 Loss at step 50: 0.04846461862325668 Loss at step 100: 0.03854244574904442 Loss at step 150: 0.042459603399038315 Loss at step 200: 0.04127642884850502 Loss at step 250: 0.04681336134672165 Loss at step 300: 0.062138285487890244 Loss at step 350: 0.03955625742673874 Loss at step 400: 0.03840973973274231 Loss at step 450: 0.06735293567180634 Loss at step 500: 0.06193742901086807 Loss at step 550: 0.04835178330540657 Loss at step 600: 0.0530577078461647 Loss at step 650: 0.04401564970612526 Loss at step 700: 0.05972939357161522 Loss at step 750: 0.04794591665267944 Loss at step 800: 0.04655498266220093 Loss at step 850: 0.053476426750421524 Loss at step 900: 0.0553446002304554 Mean training loss after epoch 220: 0.04722481643332284 EPOCH: 221 Loss at step 0: 0.04751558601856232 Loss at step 50: 0.03640862926840782 Loss at step 100: 0.03456695005297661 Loss at step 150: 0.03806763514876366 Loss at step 200: 0.04262298718094826 Loss at step 250: 0.050645485520362854 Loss at step 300: 0.03692036494612694 Loss at step 350: 0.046122144907712936 Loss at step 400: 0.06016391143202782 Loss at step 450: 0.049537308514118195 Loss at step 500: 0.038896940648555756 Loss at step 550: 0.038577236235141754 Loss at step 600: 0.047283269464969635 Loss at step 650: 0.06385765224695206 Loss at step 700: 0.04248689487576485 Loss at step 750: 0.05711031332612038 Loss at step 800: 0.07690203934907913 Loss at step 850: 0.03918411582708359 Loss at step 900: 0.04322792589664459 Mean training loss after epoch 221: 0.04713553050632225 EPOCH: 222 Loss at step 0: 0.06176179274916649 Loss at step 50: 0.035625845193862915 Loss at step 100: 0.04051368311047554 Loss at step 150: 0.04114648699760437 Loss at step 200: 0.050225503742694855 Loss at step 250: 0.03932324796915054 Loss at step 300: 0.05761918053030968 Loss at step 350: 0.05663652718067169 Loss at step 400: 0.03831227496266365 Loss at step 450: 0.055668383836746216 Loss at step 500: 0.050077080726623535 Loss at step 550: 0.0605497732758522 Loss at step 600: 0.042925599962472916 Loss at step 650: 0.0754731148481369 Loss at step 700: 0.060030557215213776 Loss at step 750: 0.03840659186244011 Loss at step 800: 0.04251925274729729 Loss at step 850: 0.040340449661016464 Loss at step 900: 0.06368248909711838 Mean training loss after epoch 222: 0.04751194085774899 EPOCH: 223 Loss at step 0: 0.07379432767629623 Loss at step 50: 0.044914498925209045 Loss at step 100: 0.04297406226396561 Loss at step 150: 0.05870857834815979 Loss at step 200: 0.0390847809612751 Loss at step 250: 0.047212712466716766 Loss at step 300: 0.03864661604166031 Loss at step 350: 0.04263768717646599 Loss at step 400: 0.04849701374769211 Loss at step 450: 0.03625960275530815 Loss at step 500: 0.04604107886552811 Loss at step 550: 0.0468415804207325 Loss at step 600: 0.049165043979883194 Loss at step 650: 0.034130413085222244 Loss at step 700: 0.050103381276130676 Loss at step 750: 0.04004396125674248 Loss at step 800: 0.051551856100559235 Loss at step 850: 0.058846957981586456 Loss at step 900: 0.06271244585514069 Mean training loss after epoch 223: 0.04676675359641057 EPOCH: 224 Loss at step 0: 0.04676776006817818 Loss at step 50: 0.04975375160574913 Loss at step 100: 0.03663460910320282 Loss at step 150: 0.04950935021042824 Loss at step 200: 0.057808466255664825 Loss at step 250: 0.03898349776864052 Loss at step 300: 0.0442899689078331 Loss at step 350: 0.04924413189291954 Loss at step 400: 0.03369944542646408 Loss at step 450: 0.059274252504110336 Loss at step 500: 0.05799891799688339 Loss at step 550: 0.04325135797262192 Loss at step 600: 0.041511859744787216 Loss at step 650: 0.05939384922385216 Loss at step 700: 0.04226621240377426 Loss at step 750: 0.04376399517059326 Loss at step 800: 0.040188051760196686 Loss at step 850: 0.0465223491191864 Loss at step 900: 0.0385562963783741 Mean training loss after epoch 224: 0.0470213487204204 EPOCH: 225 Loss at step 0: 0.09225140511989594 Loss at step 50: 0.03754953294992447 Loss at step 100: 0.04231557995080948 Loss at step 150: 0.03679513931274414 Loss at step 200: 0.045663103461265564 Loss at step 250: 0.04242528975009918 Loss at step 300: 0.04686900973320007 Loss at step 350: 0.04403219372034073 Loss at step 400: 0.043443210422992706 Loss at step 450: 0.03729455918073654 Loss at step 500: 0.038864556699991226 Loss at step 550: 0.03740460053086281 Loss at step 600: 0.0359228141605854 Loss at step 650: 0.04111127555370331 Loss at step 700: 0.03803587332367897 Loss at step 750: 0.03938001021742821 Loss at step 800: 0.04360322654247284 Loss at step 850: 0.034894105046987534 Loss at step 900: 0.06122999265789986 Mean training loss after epoch 225: 0.04680429039193369 EPOCH: 226 Loss at step 0: 0.03958696499466896 Loss at step 50: 0.05706845223903656 Loss at step 100: 0.03954337537288666 Loss at step 150: 0.044338759034872055 Loss at step 200: 0.04363829270005226 Loss at step 250: 0.037388335913419724 Loss at step 300: 0.06261762231588364 Loss at step 350: 0.05542841553688049 Loss at step 400: 0.04188565909862518 Loss at step 450: 0.04823734238743782 Loss at step 500: 0.040371887385845184 Loss at step 550: 0.0392460897564888 Loss at step 600: 0.04178383946418762 Loss at step 650: 0.05720229819417 Loss at step 700: 0.047035105526447296 Loss at step 750: 0.03595872223377228 Loss at step 800: 0.03900788724422455 Loss at step 850: 0.041382789611816406 Loss at step 900: 0.06022444739937782 Mean training loss after epoch 226: 0.04666900050951474 EPOCH: 227 Loss at step 0: 0.05519401654601097 Loss at step 50: 0.06846608966588974 Loss at step 100: 0.040014397352933884 Loss at step 150: 0.03845193237066269 Loss at step 200: 0.03896557539701462 Loss at step 250: 0.03243701905012131 Loss at step 300: 0.04257216677069664 Loss at step 350: 0.04316389188170433 Loss at step 400: 0.04275691509246826 Loss at step 450: 0.05983002111315727 Loss at step 500: 0.07126575708389282 Loss at step 550: 0.041043106466531754 Loss at step 600: 0.04291343688964844 Loss at step 650: 0.042597029358148575 Loss at step 700: 0.04034142196178436 Loss at step 750: 0.048591841012239456 Loss at step 800: 0.03519866243004799 Loss at step 850: 0.039252836257219315 Loss at step 900: 0.04995473474264145 Mean training loss after epoch 227: 0.04681701377741119 EPOCH: 228 Loss at step 0: 0.05774597078561783 Loss at step 50: 0.04212473705410957 Loss at step 100: 0.04690399765968323 Loss at step 150: 0.051266469061374664 Loss at step 200: 0.042083319276571274 Loss at step 250: 0.04396282881498337 Loss at step 300: 0.04231863468885422 Loss at step 350: 0.04843040183186531 Loss at step 400: 0.045190420001745224 Loss at step 450: 0.046595796942710876 Loss at step 500: 0.03327831253409386 Loss at step 550: 0.0530402809381485 Loss at step 600: 0.047428008168935776 Loss at step 650: 0.04647507891058922 Loss at step 700: 0.04182441160082817 Loss at step 750: 0.04246092215180397 Loss at step 800: 0.042235080152750015 Loss at step 850: 0.04309145361185074 Loss at step 900: 0.04195234179496765 Mean training loss after epoch 228: 0.04631303328830105 EPOCH: 229 Loss at step 0: 0.034200433641672134 Loss at step 50: 0.040905553847551346 Loss at step 100: 0.050026677548885345 Loss at step 150: 0.04150225967168808 Loss at step 200: 0.04820965602993965 Loss at step 250: 0.04052750766277313 Loss at step 300: 0.06049512326717377 Loss at step 350: 0.03785371780395508 Loss at step 400: 0.041913799941539764 Loss at step 450: 0.03622917830944061 Loss at step 500: 0.0625196024775505 Loss at step 550: 0.05518995225429535 Loss at step 600: 0.04004140570759773 Loss at step 650: 0.052394621074199677 Loss at step 700: 0.0760142132639885 Loss at step 750: 0.03969212993979454 Loss at step 800: 0.043161891400814056 Loss at step 850: 0.05260682478547096 Loss at step 900: 0.041100166738033295 Mean training loss after epoch 229: 0.04702144621142637 EPOCH: 230 Loss at step 0: 0.04274732992053032 Loss at step 50: 0.03655349090695381 Loss at step 100: 0.04154234379529953 Loss at step 150: 0.06222398951649666 Loss at step 200: 0.033603768795728683 Loss at step 250: 0.057350628077983856 Loss at step 300: 0.051227226853370667 Loss at step 350: 0.05017005652189255 Loss at step 400: 0.0463779978454113 Loss at step 450: 0.06788396835327148 Loss at step 500: 0.047457605600357056 Loss at step 550: 0.03680429980158806 Loss at step 600: 0.07308177649974823 Loss at step 650: 0.04451019689440727 Loss at step 700: 0.0436924546957016 Loss at step 750: 0.04094655066728592 Loss at step 800: 0.04321802780032158 Loss at step 850: 0.04240492731332779 Loss at step 900: 0.044324152171611786 Mean training loss after epoch 230: 0.04693595296951499 EPOCH: 231 Loss at step 0: 0.077011838555336 Loss at step 50: 0.049050260335206985 Loss at step 100: 0.035662878304719925 Loss at step 150: 0.056335240602493286 Loss at step 200: 0.03805972635746002 Loss at step 250: 0.0425637885928154 Loss at step 300: 0.04273020476102829 Loss at step 350: 0.04020824283361435 Loss at step 400: 0.03797215595841408 Loss at step 450: 0.046008333563804626 Loss at step 500: 0.05754195898771286 Loss at step 550: 0.059893928468227386 Loss at step 600: 0.050522077828645706 Loss at step 650: 0.04332813248038292 Loss at step 700: 0.05094119906425476 Loss at step 750: 0.04222907871007919 Loss at step 800: 0.05393718555569649 Loss at step 850: 0.058369655162096024 Loss at step 900: 0.04152904078364372 Mean training loss after epoch 231: 0.0467399971674818 EPOCH: 232 Loss at step 0: 0.056472018361091614 Loss at step 50: 0.044975947588682175 Loss at step 100: 0.038062889128923416 Loss at step 150: 0.04001207277178764 Loss at step 200: 0.03452861309051514 Loss at step 250: 0.04083997383713722 Loss at step 300: 0.058205023407936096 Loss at step 350: 0.03531161695718765 Loss at step 400: 0.04119256138801575 Loss at step 450: 0.05590355396270752 Loss at step 500: 0.04196659475564957 Loss at step 550: 0.06378086656332016 Loss at step 600: 0.036614347249269485 Loss at step 650: 0.04451005160808563 Loss at step 700: 0.04146326333284378 Loss at step 750: 0.04682595282793045 Loss at step 800: 0.056496769189834595 Loss at step 850: 0.07417181879281998 Loss at step 900: 0.0559571348130703 Mean training loss after epoch 232: 0.046746908007924366 EPOCH: 233 Loss at step 0: 0.052893638610839844 Loss at step 50: 0.04275893419981003 Loss at step 100: 0.03851613029837608 Loss at step 150: 0.05271649733185768 Loss at step 200: 0.05183817818760872 Loss at step 250: 0.07465900480747223 Loss at step 300: 0.045488640666007996 Loss at step 350: 0.048312194645404816 Loss at step 400: 0.05506879463791847 Loss at step 450: 0.0552791953086853 Loss at step 500: 0.04169054701924324 Loss at step 550: 0.04270520433783531 Loss at step 600: 0.0364527627825737 Loss at step 650: 0.037993062287569046 Loss at step 700: 0.03590629622340202 Loss at step 750: 0.03904951736330986 Loss at step 800: 0.0436675064265728 Loss at step 850: 0.07506541907787323 Loss at step 900: 0.06502348929643631 Mean training loss after epoch 233: 0.047188489407952276 EPOCH: 234 Loss at step 0: 0.03947950154542923 Loss at step 50: 0.06498251110315323 Loss at step 100: 0.038409341126680374 Loss at step 150: 0.07260207086801529 Loss at step 200: 0.045634813606739044 Loss at step 250: 0.039023272693157196 Loss at step 300: 0.04403722286224365 Loss at step 350: 0.044081345200538635 Loss at step 400: 0.037234559655189514 Loss at step 450: 0.04357670992612839 Loss at step 500: 0.04109257832169533 Loss at step 550: 0.048434965312480927 Loss at step 600: 0.04012172669172287 Loss at step 650: 0.04455825313925743 Loss at step 700: 0.046630244702100754 Loss at step 750: 0.040894124656915665 Loss at step 800: 0.06056270748376846 Loss at step 850: 0.05497825890779495 Loss at step 900: 0.05939118564128876 Mean training loss after epoch 234: 0.04720159347202859 EPOCH: 235 Loss at step 0: 0.043093569576740265 Loss at step 50: 0.04563622549176216 Loss at step 100: 0.040590908378362656 Loss at step 150: 0.04995432496070862 Loss at step 200: 0.03777633234858513 Loss at step 250: 0.0643155500292778 Loss at step 300: 0.044979654252529144 Loss at step 350: 0.039629269391298294 Loss at step 400: 0.03717920929193497 Loss at step 450: 0.04431942105293274 Loss at step 500: 0.03875933960080147 Loss at step 550: 0.06447777152061462 Loss at step 600: 0.03880549967288971 Loss at step 650: 0.05315857380628586 Loss at step 700: 0.05185626447200775 Loss at step 750: 0.04476068913936615 Loss at step 800: 0.03508048877120018 Loss at step 850: 0.0395304411649704 Loss at step 900: 0.049435894936323166 Mean training loss after epoch 235: 0.047100306063620395 EPOCH: 236 Loss at step 0: 0.03923726826906204 Loss at step 50: 0.03880602493882179 Loss at step 100: 0.0606842041015625 Loss at step 150: 0.0531562864780426 Loss at step 200: 0.046995580196380615 Loss at step 250: 0.04806024208664894 Loss at step 300: 0.042916130274534225 Loss at step 350: 0.04453442990779877 Loss at step 400: 0.05408308282494545 Loss at step 450: 0.054751403629779816 Loss at step 500: 0.046013180166482925 Loss at step 550: 0.03507464751601219 Loss at step 600: 0.04322716221213341 Loss at step 650: 0.044683437794446945 Loss at step 700: 0.043309200555086136 Loss at step 750: 0.04138895869255066 Loss at step 800: 0.05969798564910889 Loss at step 850: 0.057232458144426346 Loss at step 900: 0.0631415992975235 Mean training loss after epoch 236: 0.046600844842125615 EPOCH: 237 Loss at step 0: 0.04473394155502319 Loss at step 50: 0.04247753694653511 Loss at step 100: 0.040516749024391174 Loss at step 150: 0.043633781373500824 Loss at step 200: 0.03232303261756897 Loss at step 250: 0.07127030938863754 Loss at step 300: 0.05909910053014755 Loss at step 350: 0.03548896685242653 Loss at step 400: 0.03727230802178383 Loss at step 450: 0.049851272255182266 Loss at step 500: 0.053802113980054855 Loss at step 550: 0.03563284873962402 Loss at step 600: 0.04298397898674011 Loss at step 650: 0.045844148844480515 Loss at step 700: 0.0368964709341526 Loss at step 750: 0.03845261409878731 Loss at step 800: 0.039223503321409225 Loss at step 850: 0.04677176848053932 Loss at step 900: 0.04032231867313385 Mean training loss after epoch 237: 0.04664867008521931 EPOCH: 238 Loss at step 0: 0.04714509844779968 Loss at step 50: 0.05848288908600807 Loss at step 100: 0.05603637546300888 Loss at step 150: 0.04112808778882027 Loss at step 200: 0.05040885880589485 Loss at step 250: 0.036397550255060196 Loss at step 300: 0.04182954505085945 Loss at step 350: 0.03286342695355415 Loss at step 400: 0.05466403067111969 Loss at step 450: 0.048954788595438004 Loss at step 500: 0.04262811318039894 Loss at step 550: 0.03888767957687378 Loss at step 600: 0.040273357182741165 Loss at step 650: 0.04039822891354561 Loss at step 700: 0.04489091783761978 Loss at step 750: 0.038410257548093796 Loss at step 800: 0.041259873658418655 Loss at step 850: 0.03967861086130142 Loss at step 900: 0.03711875155568123 Mean training loss after epoch 238: 0.046608244650749 EPOCH: 239 Loss at step 0: 0.046363215893507004 Loss at step 50: 0.04513181000947952 Loss at step 100: 0.035453297197818756 Loss at step 150: 0.04779130965471268 Loss at step 200: 0.048856884241104126 Loss at step 250: 0.046283263713121414 Loss at step 300: 0.039166733622550964 Loss at step 350: 0.03872600197792053 Loss at step 400: 0.036717865616083145 Loss at step 450: 0.037559349089860916 Loss at step 500: 0.05537824705243111 Loss at step 550: 0.06261266767978668 Loss at step 600: 0.043737784028053284 Loss at step 650: 0.041823312640190125 Loss at step 700: 0.04186220094561577 Loss at step 750: 0.03983394801616669 Loss at step 800: 0.03957657516002655 Loss at step 850: 0.053769148886203766 Loss at step 900: 0.04092887043952942 Mean training loss after epoch 239: 0.04716429725162256 EPOCH: 240 Loss at step 0: 0.04034369811415672 Loss at step 50: 0.042229242622852325 Loss at step 100: 0.054793696850538254 Loss at step 150: 0.04283568263053894 Loss at step 200: 0.03461220860481262 Loss at step 250: 0.05163809657096863 Loss at step 300: 0.04255881533026695 Loss at step 350: 0.052954867482185364 Loss at step 400: 0.048525333404541016 Loss at step 450: 0.04447805508971214 Loss at step 500: 0.042359642684459686 Loss at step 550: 0.06020662933588028 Loss at step 600: 0.056461531668901443 Loss at step 650: 0.046871915459632874 Loss at step 700: 0.04288206994533539 Loss at step 750: 0.04234183952212334 Loss at step 800: 0.05304999276995659 Loss at step 850: 0.06013248860836029 Loss at step 900: 0.0751088559627533 Mean training loss after epoch 240: 0.0467108068888439 EPOCH: 241 Loss at step 0: 0.05541389808058739 Loss at step 50: 0.040862616151571274 Loss at step 100: 0.037520140409469604 Loss at step 150: 0.0429876372218132 Loss at step 200: 0.056682292371988297 Loss at step 250: 0.046041712164878845 Loss at step 300: 0.03782288730144501 Loss at step 350: 0.056302476674318314 Loss at step 400: 0.03928373008966446 Loss at step 450: 0.0392586775124073 Loss at step 500: 0.04364370182156563 Loss at step 550: 0.03423343226313591 Loss at step 600: 0.04018398001790047 Loss at step 650: 0.058480676263570786 Loss at step 700: 0.047578129917383194 Loss at step 750: 0.04933265596628189 Loss at step 800: 0.04072823375463486 Loss at step 850: 0.03966128081083298 Loss at step 900: 0.0420275516808033 Mean training loss after epoch 241: 0.046515306517450034 EPOCH: 242 Loss at step 0: 0.043020859360694885 Loss at step 50: 0.04488405957818031 Loss at step 100: 0.039506617933511734 Loss at step 150: 0.03761477768421173 Loss at step 200: 0.03995436057448387 Loss at step 250: 0.04392188787460327 Loss at step 300: 0.036959242075681686 Loss at step 350: 0.03620525822043419 Loss at step 400: 0.03635586425662041 Loss at step 450: 0.04001782089471817 Loss at step 500: 0.03787391632795334 Loss at step 550: 0.04044404625892639 Loss at step 600: 0.041857484728097916 Loss at step 650: 0.05722949281334877 Loss at step 700: 0.04221242666244507 Loss at step 750: 0.039986900985240936 Loss at step 800: 0.08643443882465363 Loss at step 850: 0.0417332649230957 Loss at step 900: 0.046174924820661545 Mean training loss after epoch 242: 0.047045410028907025 EPOCH: 243 Loss at step 0: 0.055201053619384766 Loss at step 50: 0.04406515508890152 Loss at step 100: 0.0323408767580986 Loss at step 150: 0.04328785836696625 Loss at step 200: 0.03778066858649254 Loss at step 250: 0.04081784933805466 Loss at step 300: 0.04484756663441658 Loss at step 350: 0.060414768755435944 Loss at step 400: 0.04330456256866455 Loss at step 450: 0.04083850234746933 Loss at step 500: 0.052029404789209366 Loss at step 550: 0.0488678440451622 Loss at step 600: 0.04068155586719513 Loss at step 650: 0.0447661392390728 Loss at step 700: 0.040958236902952194 Loss at step 750: 0.039343465119600296 Loss at step 800: 0.048806093633174896 Loss at step 850: 0.05552810803055763 Loss at step 900: 0.043618131428956985 Mean training loss after epoch 243: 0.04648467521868281 EPOCH: 244 Loss at step 0: 0.042332351207733154 Loss at step 50: 0.04474608972668648 Loss at step 100: 0.044558651745319366 Loss at step 150: 0.04339471086859703 Loss at step 200: 0.04256226867437363 Loss at step 250: 0.05371783673763275 Loss at step 300: 0.0492573156952858 Loss at step 350: 0.038463935256004333 Loss at step 400: 0.05593513697385788 Loss at step 450: 0.052630715072155 Loss at step 500: 0.06187453493475914 Loss at step 550: 0.04681934416294098 Loss at step 600: 0.05897469446063042 Loss at step 650: 0.04355793446302414 Loss at step 700: 0.037925753742456436 Loss at step 750: 0.0479317381978035 Loss at step 800: 0.04982185363769531 Loss at step 850: 0.039181407541036606 Loss at step 900: 0.051154956221580505 Mean training loss after epoch 244: 0.0466384645293294 EPOCH: 245 Loss at step 0: 0.043584395200014114 Loss at step 50: 0.02752978913486004 Loss at step 100: 0.06282732635736465 Loss at step 150: 0.04680536314845085 Loss at step 200: 0.05102938413619995 Loss at step 250: 0.0554245188832283 Loss at step 300: 0.04456479474902153 Loss at step 350: 0.0377659946680069 Loss at step 400: 0.04321426525712013 Loss at step 450: 0.04434271156787872 Loss at step 500: 0.054085828363895416 Loss at step 550: 0.04823891445994377 Loss at step 600: 0.040322136133909225 Loss at step 650: 0.05315065383911133 Loss at step 700: 0.0364287830889225 Loss at step 750: 0.0457751601934433 Loss at step 800: 0.047259584069252014 Loss at step 850: 0.047946520149707794 Loss at step 900: 0.05827641859650612 Mean training loss after epoch 245: 0.04614305966047209 EPOCH: 246 Loss at step 0: 0.04589905962347984 Loss at step 50: 0.0424797348678112 Loss at step 100: 0.03408568352460861 Loss at step 150: 0.036884963512420654 Loss at step 200: 0.06338883936405182 Loss at step 250: 0.043746206909418106 Loss at step 300: 0.04174879193305969 Loss at step 350: 0.035894837230443954 Loss at step 400: 0.08393120765686035 Loss at step 450: 0.03932879492640495 Loss at step 500: 0.037524834275245667 Loss at step 550: 0.05414777621626854 Loss at step 600: 0.042327918112277985 Loss at step 650: 0.04759960621595383 Loss at step 700: 0.04520302265882492 Loss at step 750: 0.04365616291761398 Loss at step 800: 0.046100106090307236 Loss at step 850: 0.05007437616586685 Loss at step 900: 0.06000826507806778 Mean training loss after epoch 246: 0.046613718105007466 EPOCH: 247 Loss at step 0: 0.0436004213988781 Loss at step 50: 0.040224067866802216 Loss at step 100: 0.0361749529838562 Loss at step 150: 0.04980206862092018 Loss at step 200: 0.039048559963703156 Loss at step 250: 0.03741644322872162 Loss at step 300: 0.05382959917187691 Loss at step 350: 0.03957396000623703 Loss at step 400: 0.05043027177453041 Loss at step 450: 0.04557007923722267 Loss at step 500: 0.051771774888038635 Loss at step 550: 0.059253934770822525 Loss at step 600: 0.03973615542054176 Loss at step 650: 0.04193766787648201 Loss at step 700: 0.04148926958441734 Loss at step 750: 0.04530224949121475 Loss at step 800: 0.04536725580692291 Loss at step 850: 0.04929728806018829 Loss at step 900: 0.047313105314970016 Mean training loss after epoch 247: 0.0467028823286009 EPOCH: 248 Loss at step 0: 0.04395918548107147 Loss at step 50: 0.04333007335662842 Loss at step 100: 0.03953546658158302 Loss at step 150: 0.03913760930299759 Loss at step 200: 0.04378338158130646 Loss at step 250: 0.035230159759521484 Loss at step 300: 0.04072575643658638 Loss at step 350: 0.05845290794968605 Loss at step 400: 0.045099806040525436 Loss at step 450: 0.06069492548704147 Loss at step 500: 0.03815390169620514 Loss at step 550: 0.0403125137090683 Loss at step 600: 0.050567470490932465 Loss at step 650: 0.041751179844141006 Loss at step 700: 0.03756287321448326 Loss at step 750: 0.07715876400470734 Loss at step 800: 0.040058109909296036 Loss at step 850: 0.040386416018009186 Loss at step 900: 0.04387412592768669 Mean training loss after epoch 248: 0.0466029432608184 EPOCH: 249 Loss at step 0: 0.0424538217484951 Loss at step 50: 0.07103290408849716 Loss at step 100: 0.053155262023210526 Loss at step 150: 0.041889920830726624 Loss at step 200: 0.03708808869123459 Loss at step 250: 0.061525918543338776 Loss at step 300: 0.057526372373104095 Loss at step 350: 0.051409341394901276 Loss at step 400: 0.042662862688302994 Loss at step 450: 0.04036589339375496 Loss at step 500: 0.04930693656206131 Loss at step 550: 0.05353317782282829 Loss at step 600: 0.03686092793941498 Loss at step 650: 0.050817620009183884 Loss at step 700: 0.05220216140151024 Loss at step 750: 0.04239517077803612 Loss at step 800: 0.06913333386182785 Loss at step 850: 0.04196493700146675 Loss at step 900: 0.04706023633480072 Mean training loss after epoch 249: 0.04695936385145002 EPOCH: 250 Loss at step 0: 0.053577352315187454 Loss at step 50: 0.08183886855840683 Loss at step 100: 0.04055812209844589 Loss at step 150: 0.03427647426724434 Loss at step 200: 0.07296517491340637 Loss at step 250: 0.041054867208004 Loss at step 300: 0.04080365598201752 Loss at step 350: 0.04531676322221756 Loss at step 400: 0.05988508090376854 Loss at step 450: 0.056230757385492325 Loss at step 500: 0.042878441512584686 Loss at step 550: 0.05008154362440109 Loss at step 600: 0.05312185734510422 Loss at step 650: 0.044707220047712326 Loss at step 700: 0.04215612635016441 Loss at step 750: 0.040725406259298325 Loss at step 800: 0.051624845713377 Loss at step 850: 0.0405830442905426 Loss at step 900: 0.06841656565666199 Mean training loss after epoch 250: 0.04655080900263431 EPOCH: 251 Loss at step 0: 0.04503781348466873 Loss at step 50: 0.04293043538928032 Loss at step 100: 0.04203268885612488 Loss at step 150: 0.043018095195293427 Loss at step 200: 0.04455701634287834 Loss at step 250: 0.06907901167869568 Loss at step 300: 0.054588064551353455 Loss at step 350: 0.039414286613464355 Loss at step 400: 0.04302799329161644 Loss at step 450: 0.04112699627876282 Loss at step 500: 0.0388084277510643 Loss at step 550: 0.04138350859284401 Loss at step 600: 0.04518958553671837 Loss at step 650: 0.05055014044046402 Loss at step 700: 0.04376186430454254 Loss at step 750: 0.04757804051041603 Loss at step 800: 0.041471898555755615 Loss at step 850: 0.04145321622490883 Loss at step 900: 0.05633110925555229 Mean training loss after epoch 251: 0.04692487044732517 EPOCH: 252 Loss at step 0: 0.039264194667339325 Loss at step 50: 0.044433195143938065 Loss at step 100: 0.08431346714496613 Loss at step 150: 0.04697263240814209 Loss at step 200: 0.050439223647117615 Loss at step 250: 0.03841196745634079 Loss at step 300: 0.04193376749753952 Loss at step 350: 0.03764447942376137 Loss at step 400: 0.06420301645994186 Loss at step 450: 0.03974910080432892 Loss at step 500: 0.054245129227638245 Loss at step 550: 0.047174159437417984 Loss at step 600: 0.0575483962893486 Loss at step 650: 0.045981522649526596 Loss at step 700: 0.040542807430028915 Loss at step 750: 0.03865841031074524 Loss at step 800: 0.06457774341106415 Loss at step 850: 0.04819154739379883 Loss at step 900: 0.041227903217077255 Mean training loss after epoch 252: 0.04676203306184522 EPOCH: 253 Loss at step 0: 0.047092460095882416 Loss at step 50: 0.04392985999584198 Loss at step 100: 0.041745856404304504 Loss at step 150: 0.039194490760564804 Loss at step 200: 0.04131277650594711 Loss at step 250: 0.03870227560400963 Loss at step 300: 0.04857826232910156 Loss at step 350: 0.062004320323467255 Loss at step 400: 0.061412565410137177 Loss at step 450: 0.04076113551855087 Loss at step 500: 0.045095499604940414 Loss at step 550: 0.07241609692573547 Loss at step 600: 0.03968914970755577 Loss at step 650: 0.03798513114452362 Loss at step 700: 0.07444627583026886 Loss at step 750: 0.047429393976926804 Loss at step 800: 0.03645297884941101 Loss at step 850: 0.058835308998823166 Loss at step 900: 0.07364003360271454 Mean training loss after epoch 253: 0.04624672042829459 EPOCH: 254 Loss at step 0: 0.04062104970216751 Loss at step 50: 0.04671759903430939 Loss at step 100: 0.04618627205491066 Loss at step 150: 0.03837088868021965 Loss at step 200: 0.05416730046272278 Loss at step 250: 0.05831977725028992 Loss at step 300: 0.05125100538134575 Loss at step 350: 0.04097776114940643 Loss at step 400: 0.0594232976436615 Loss at step 450: 0.05617247521877289 Loss at step 500: 0.03688930720090866 Loss at step 550: 0.042152997106313705 Loss at step 600: 0.03977837413549423 Loss at step 650: 0.03640800341963768 Loss at step 700: 0.035886120051145554 Loss at step 750: 0.04342680051922798 Loss at step 800: 0.04812350869178772 Loss at step 850: 0.0719207152724266 Loss at step 900: 0.04773256182670593 Mean training loss after epoch 254: 0.046415297965854724 EPOCH: 255 Loss at step 0: 0.04461383819580078 Loss at step 50: 0.04794487729668617 Loss at step 100: 0.045477475970983505 Loss at step 150: 0.03540829196572304 Loss at step 200: 0.0431925505399704 Loss at step 250: 0.05770278349518776 Loss at step 300: 0.042712077498435974 Loss at step 350: 0.04346690699458122 Loss at step 400: 0.03652375191450119 Loss at step 450: 0.057763006538152695 Loss at step 500: 0.03898652642965317 Loss at step 550: 0.049082666635513306 Loss at step 600: 0.055002231150865555 Loss at step 650: 0.04125279560685158 Loss at step 700: 0.05654747411608696 Loss at step 750: 0.04331617057323456 Loss at step 800: 0.04440425708889961 Loss at step 850: 0.076203353703022 Loss at step 900: 0.04607589170336723 Mean training loss after epoch 255: 0.046819980949290525 EPOCH: 256 Loss at step 0: 0.04553733393549919 Loss at step 50: 0.056667495518922806 Loss at step 100: 0.06272846460342407 Loss at step 150: 0.053540412336587906 Loss at step 200: 0.05639112740755081 Loss at step 250: 0.04135221242904663 Loss at step 300: 0.0559735931456089 Loss at step 350: 0.04206572845578194 Loss at step 400: 0.05807684734463692 Loss at step 450: 0.04875210300087929 Loss at step 500: 0.04416326433420181 Loss at step 550: 0.06146674230694771 Loss at step 600: 0.039454881101846695 Loss at step 650: 0.040558796375989914 Loss at step 700: 0.054772958159446716 Loss at step 750: 0.04770456254482269 Loss at step 800: 0.046651896089315414 Loss at step 850: 0.04274526983499527 Loss at step 900: 0.061705056577920914 Mean training loss after epoch 256: 0.04698430569663739 /athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? warn( Schedule: linear Cfg: True Output path: /scratch/shared/beegfs/gabrijel/m2l/mini Patch Size: 2 Device: cuda:6 ===================================================================================== Layer (type:depth-idx) Param # ===================================================================================== DiT 75,264 ├─PatchEmbed: 1-1 -- │ └─Conv2d: 2-1 1,920 ├─TimestepEmbedder: 1-2 -- │ └─Mlp: 2-2 -- │ │ └─Linear: 3-1 98,688 │ │ └─SiLU: 3-2 -- │ │ └─Linear: 3-3 147,840 ├─LabelEmbedder: 1-3 -- │ └─Embedding: 2-3 4,224 ├─ModuleList: 1-4 -- │ └─DiTBlock: 2-4 -- │ │ └─LayerNorm: 3-4 -- │ │ └─MultiheadAttention: 3-5 591,360 │ │ └─LayerNorm: 3-6 -- │ │ └─Mlp: 3-7 1,181,568 │ │ └─Sequential: 3-8 887,040 │ └─DiTBlock: 2-5 -- │ │ └─LayerNorm: 3-9 -- │ │ └─MultiheadAttention: 3-10 591,360 │ │ └─LayerNorm: 3-11 -- │ │ └─Mlp: 3-12 1,181,568 │ │ └─Sequential: 3-13 887,040 │ └─DiTBlock: 2-6 -- │ │ └─LayerNorm: 3-14 -- │ │ └─MultiheadAttention: 3-15 591,360 │ │ └─LayerNorm: 3-16 -- │ │ └─Mlp: 3-17 1,181,568 │ │ └─Sequential: 3-18 887,040 │ └─DiTBlock: 2-7 -- │ │ └─LayerNorm: 3-19 -- │ │ └─MultiheadAttention: 3-20 591,360 │ │ └─LayerNorm: 3-21 -- │ │ └─Mlp: 3-22 1,181,568 │ │ └─Sequential: 3-23 887,040 │ └─DiTBlock: 2-8 -- │ │ └─LayerNorm: 3-24 -- │ │ └─MultiheadAttention: 3-25 591,360 │ │ └─LayerNorm: 3-26 -- │ │ └─Mlp: 3-27 1,181,568 │ │ └─Sequential: 3-28 887,040 │ └─DiTBlock: 2-9 -- │ │ └─LayerNorm: 3-29 -- │ │ └─MultiheadAttention: 3-30 591,360 │ │ └─LayerNorm: 3-31 -- │ │ └─Mlp: 3-32 1,181,568 │ │ └─Sequential: 3-33 887,040 ├─FinalLayer: 1-5 -- │ └─LayerNorm: 2-10 -- │ └─Linear: 2-11 1,540 │ └─Sequential: 2-12 -- │ │ └─SiLU: 3-34 -- │ │ └─Linear: 3-35 295,680 ├─Unpatchify: 1-6 -- ===================================================================================== Total params: 16,584,964 Trainable params: 16,509,700 Non-trainable params: 75,264 ===================================================================================== EPOCH: 1 Loss at step 0: 0.9900221228599548 Loss at step 50: 0.26053664088249207 Loss at step 100: 0.1877361387014389 Loss at step 150: 0.16752423346042633 Loss at step 200: 0.13426224887371063 Loss at step 250: 0.16180236637592316 Loss at step 300: 0.14377865195274353 Loss at step 350: 0.13125309348106384 Loss at step 400: 0.12112417072057724 Loss at step 450: 0.11843827366828918 Loss at step 500: 0.12789297103881836 Loss at step 550: 0.11653564870357513 Loss at step 600: 0.10974673926830292 Loss at step 650: 0.11059190332889557 Loss at step 700: 0.10060910880565643 Loss at step 750: 0.09780599176883698 Loss at step 800: 0.10769357532262802 Loss at step 850: 0.08873921632766724 Loss at step 900: 0.08759167790412903 Mean training loss after epoch 1: 0.15817816070974 EPOCH: 2 Loss at step 0: 0.10410167276859283 Loss at step 50: 0.11506811529397964 Loss at step 100: 0.0794951543211937 Loss at step 150: 0.08876204490661621 Loss at step 200: 0.07578525692224503 Loss at step 250: 0.07880866527557373 Loss at step 300: 0.06640154123306274 Loss at step 350: 0.0734575092792511 Loss at step 400: 0.08329509198665619 Loss at step 450: 0.09704723954200745 Loss at step 500: 0.07437891513109207 Loss at step 550: 0.08141027390956879 Loss at step 600: 0.071867436170578 Loss at step 650: 0.09286921471357346 Loss at step 700: 0.07970961928367615 Loss at step 750: 0.0648030936717987 Loss at step 800: 0.07334072887897491 Loss at step 850: 0.06517716497182846 Loss at step 900: 0.07280310988426208 Mean training loss after epoch 2: 0.08173069849943937 EPOCH: 3 Loss at step 0: 0.08989976346492767 Loss at step 50: 0.06721095740795135 Loss at step 100: 0.06641840189695358 Loss at step 150: 0.06260210275650024 Loss at step 200: 0.058850307017564774 Loss at step 250: 0.08398021012544632 Loss at step 300: 0.09923350811004639 Loss at step 350: 0.0697428360581398 Loss at step 400: 0.07186858355998993 Loss at step 450: 0.0651426762342453 Loss at step 500: 0.05896075442433357 Loss at step 550: 0.06548614054918289 Loss at step 600: 0.07064562290906906 Loss at step 650: 0.06531503796577454 Loss at step 700: 0.06172043830156326 Loss at step 750: 0.08369684964418411 Loss at step 800: 0.0666690468788147 Loss at step 850: 0.07950199395418167 Loss at step 900: 0.07981020212173462 Mean training loss after epoch 3: 0.0731523244627821 EPOCH: 4 Loss at step 0: 0.06528396159410477 Loss at step 50: 0.0715896412730217 Loss at step 100: 0.05910539627075195 Loss at step 150: 0.059343717992305756 Loss at step 200: 0.07979817688465118 Loss at step 250: 0.08106307685375214 Loss at step 300: 0.0650542750954628 Loss at step 350: 0.10218542069196701 Loss at step 400: 0.08511245250701904 Loss at step 450: 0.06320608407258987 Loss at step 500: 0.06818963587284088 Loss at step 550: 0.06453108042478561 Loss at step 600: 0.07997074723243713 Loss at step 650: 0.09118005633354187 Loss at step 700: 0.0629737377166748 Loss at step 750: 0.06995861977338791 Loss at step 800: 0.06347789615392685 Loss at step 850: 0.07784655690193176 Loss at step 900: 0.0836794525384903 Mean training loss after epoch 4: 0.06976637033733732 EPOCH: 5 Loss at step 0: 0.06683993339538574 Loss at step 50: 0.08440276235342026 Loss at step 100: 0.060900069773197174 Loss at step 150: 0.07818977534770966 Loss at step 200: 0.057267721742391586 Loss at step 250: 0.06707001477479935 Loss at step 300: 0.06638924032449722 Loss at step 350: 0.061096593737602234 Loss at step 400: 0.06031990796327591 Loss at step 450: 0.05881478264927864 Loss at step 500: 0.06828563660383224 Loss at step 550: 0.05629601329565048 Loss at step 600: 0.05650922283530235 Loss at step 650: 0.05970834195613861 Loss at step 700: 0.06717745214700699 Loss at step 750: 0.06325725466012955 Loss at step 800: 0.05908304080367088 Loss at step 850: 0.05927037075161934 Loss at step 900: 0.06529387831687927 Mean training loss after epoch 5: 0.06656247441734332 EPOCH: 6 Loss at step 0: 0.07471832633018494 Loss at step 50: 0.053293004631996155 Loss at step 100: 0.06814439594745636 Loss at step 150: 0.06267695873975754 Loss at step 200: 0.07520017772912979 Loss at step 250: 0.05756623297929764 Loss at step 300: 0.08668671548366547 Loss at step 350: 0.05761696398258209 Loss at step 400: 0.05845346301794052 Loss at step 450: 0.05361559987068176 Loss at step 500: 0.0545581579208374 Loss at step 550: 0.06865722686052322 Loss at step 600: 0.05494900420308113 Loss at step 650: 0.05566301941871643 Loss at step 700: 0.07576145231723785 Loss at step 750: 0.056802812963724136 Loss at step 800: 0.07006783038377762 Loss at step 850: 0.06922551244497299 Loss at step 900: 0.07127238065004349 Mean training loss after epoch 6: 0.06240604378616632 EPOCH: 7 Loss at step 0: 0.05351852625608444 Loss at step 50: 0.058600328862667084 Loss at step 100: 0.05950756371021271 Loss at step 150: 0.0671197921037674 Loss at step 200: 0.06352273374795914 Loss at step 250: 0.05097755044698715 Loss at step 300: 0.05459808185696602 Loss at step 350: 0.05407089367508888 Loss at step 400: 0.0714799240231514 Loss at step 450: 0.05385363847017288 Loss at step 500: 0.07540430873632431 Loss at step 550: 0.055351242423057556 Loss at step 600: 0.057140953838825226 Loss at step 650: 0.06574184447526932 Loss at step 700: 0.06367585062980652 Loss at step 750: 0.06031997129321098 Loss at step 800: 0.05662156641483307 Loss at step 850: 0.058937765657901764 Loss at step 900: 0.04628606513142586 Mean training loss after epoch 7: 0.06069141083450587 EPOCH: 8 Loss at step 0: 0.06337548047304153 Loss at step 50: 0.04743544012308121 Loss at step 100: 0.05359313637018204 Loss at step 150: 0.06011583283543587 Loss at step 200: 0.06013641506433487 Loss at step 250: 0.05289948731660843 Loss at step 300: 0.05319839343428612 Loss at step 350: 0.07368673384189606 Loss at step 400: 0.050813931971788406 Loss at step 450: 0.05384664610028267 Loss at step 500: 0.05393756926059723 Loss at step 550: 0.049860429018735886 Loss at step 600: 0.04961004853248596 Loss at step 650: 0.05552338808774948 Loss at step 700: 0.06309756636619568 Loss at step 750: 0.04827840253710747 Loss at step 800: 0.05685289576649666 Loss at step 850: 0.05250624939799309 Loss at step 900: 0.052172064781188965 Mean training loss after epoch 8: 0.05963837629410504 EPOCH: 9 Loss at step 0: 0.05447672680020332 Loss at step 50: 0.05453566834330559 Loss at step 100: 0.055621594190597534 Loss at step 150: 0.05637390539050102 Loss at step 200: 0.05907885730266571 Loss at step 250: 0.05867369845509529 Loss at step 300: 0.04663334786891937 Loss at step 350: 0.06715775281190872 Loss at step 400: 0.052095770835876465 Loss at step 450: 0.05039728432893753 Loss at step 500: 0.052317939698696136 Loss at step 550: 0.053312335163354874 Loss at step 600: 0.0628514289855957 Loss at step 650: 0.05169586464762688 Loss at step 700: 0.0460006408393383 Loss at step 750: 0.05597761645913124 Loss at step 800: 0.05459621548652649 Loss at step 850: 0.07205402106046677 Loss at step 900: 0.08359582722187042 Mean training loss after epoch 9: 0.05765084499346295 EPOCH: 10 Loss at step 0: 0.05655226483941078 Loss at step 50: 0.048913758248090744 Loss at step 100: 0.05492245405912399 Loss at step 150: 0.05230483040213585 Loss at step 200: 0.05209646373987198 Loss at step 250: 0.05794297903776169 Loss at step 300: 0.05258149281144142 Loss at step 350: 0.052122872322797775 Loss at step 400: 0.049483589828014374 Loss at step 450: 0.051203303039073944 Loss at step 500: 0.047938209027051926 Loss at step 550: 0.0428009033203125 Loss at step 600: 0.052013326436281204 Loss at step 650: 0.04760659486055374 Loss at step 700: 0.0570572130382061 Loss at step 750: 0.06020286679267883 Loss at step 800: 0.07187081128358841 Loss at step 850: 0.05805287882685661 Loss at step 900: 0.04751009866595268 Mean training loss after epoch 10: 0.056692671690032934 EPOCH: 11 Loss at step 0: 0.05386165529489517 Loss at step 50: 0.05422254279255867 Loss at step 100: 0.057490553706884384 Loss at step 150: 0.044205863028764725 Loss at step 200: 0.06188994273543358 Loss at step 250: 0.04439062997698784 Loss at step 300: 0.05631803721189499 Loss at step 350: 0.05760139226913452 Loss at step 400: 0.048933807760477066 Loss at step 450: 0.06113451346755028 Loss at step 500: 0.056232962757349014 Loss at step 550: 0.06440309435129166 Loss at step 600: 0.05474826693534851 Loss at step 650: 0.05499072000384331 Loss at step 700: 0.05624644085764885 Loss at step 750: 0.054855700582265854 Loss at step 800: 0.04982941225171089 Loss at step 850: 0.05728418752551079 Loss at step 900: 0.051657743752002716 Mean training loss after epoch 11: 0.05565160312759342 EPOCH: 12 Loss at step 0: 0.053275711834430695 Loss at step 50: 0.06455083191394806 Loss at step 100: 0.04895901679992676 Loss at step 150: 0.04603651911020279 Loss at step 200: 0.04748505726456642 Loss at step 250: 0.04844500869512558 Loss at step 300: 0.051398929208517075 Loss at step 350: 0.05038746818900108 Loss at step 400: 0.05069694295525551 Loss at step 450: 0.05893038213253021 Loss at step 500: 0.062313683331012726 Loss at step 550: 0.06718035787343979 Loss at step 600: 0.048768989741802216 Loss at step 650: 0.0656282901763916 Loss at step 700: 0.051334448158741 Loss at step 750: 0.04673048481345177 Loss at step 800: 0.04831412434577942 Loss at step 850: 0.047983504831790924 Loss at step 900: 0.09374847263097763 Mean training loss after epoch 12: 0.05530917990420546 EPOCH: 13 Loss at step 0: 0.05954428017139435 Loss at step 50: 0.07312961667776108 Loss at step 100: 0.06694300472736359 Loss at step 150: 0.04554149508476257 Loss at step 200: 0.053379930555820465 Loss at step 250: 0.05023263394832611 Loss at step 300: 0.06807883083820343 Loss at step 350: 0.050012968480587006 Loss at step 400: 0.0535954013466835 Loss at step 450: 0.055984463542699814 Loss at step 500: 0.044035062193870544 Loss at step 550: 0.0764070451259613 Loss at step 600: 0.04339207336306572 Loss at step 650: 0.0500074066221714 Loss at step 700: 0.05217390134930611 Loss at step 750: 0.04710053279995918 Loss at step 800: 0.07175597548484802 Loss at step 850: 0.055902156978845596 Loss at step 900: 0.04478731006383896 Mean training loss after epoch 13: 0.05418686110025911 EPOCH: 14 Loss at step 0: 0.05249132215976715 Loss at step 50: 0.046420637518167496 Loss at step 100: 0.06697291135787964 Loss at step 150: 0.046893730759620667 Loss at step 200: 0.04473346844315529 Loss at step 250: 0.04798652604222298 Loss at step 300: 0.051287319511175156 Loss at step 350: 0.0490957610309124 Loss at step 400: 0.06688471883535385 Loss at step 450: 0.05241381376981735 Loss at step 500: 0.05634472146630287 Loss at step 550: 0.06949912011623383 Loss at step 600: 0.048228029161691666 Loss at step 650: 0.04295073077082634 Loss at step 700: 0.05262024328112602 Loss at step 750: 0.05323537066578865 Loss at step 800: 0.04349362850189209 Loss at step 850: 0.05458184704184532 Loss at step 900: 0.05282286927103996 Mean training loss after epoch 14: 0.05379633725817397 EPOCH: 15 Loss at step 0: 0.0491589792072773 Loss at step 50: 0.05669986829161644 Loss at step 100: 0.04453291743993759 Loss at step 150: 0.04781879484653473 Loss at step 200: 0.0710807591676712 Loss at step 250: 0.06223907321691513 Loss at step 300: 0.052085068076848984 Loss at step 350: 0.06384319812059402 Loss at step 400: 0.058555278927087784 Loss at step 450: 0.047311265021562576 Loss at step 500: 0.060577597469091415 Loss at step 550: 0.045497942715883255 Loss at step 600: 0.050803445279598236 Loss at step 650: 0.05076170712709427 Loss at step 700: 0.04958857595920563 Loss at step 750: 0.0437324121594429 Loss at step 800: 0.04795270413160324 Loss at step 850: 0.06756742298603058 Loss at step 900: 0.07460088282823563 Mean training loss after epoch 15: 0.053468067564372064 EPOCH: 16 Loss at step 0: 0.05459015443921089 Loss at step 50: 0.04945049807429314 Loss at step 100: 0.04973185807466507 Loss at step 150: 0.041980139911174774 Loss at step 200: 0.04770443215966225 Loss at step 250: 0.04414280131459236 Loss at step 300: 0.052286576479673386 Loss at step 350: 0.041010886430740356 Loss at step 400: 0.04782414436340332 Loss at step 450: 0.04761097952723503 Loss at step 500: 0.04825061187148094 Loss at step 550: 0.038099177181720734 Loss at step 600: 0.0480027012526989 Loss at step 650: 0.04800068587064743 Loss at step 700: 0.061499349772930145 Loss at step 750: 0.04146014526486397 Loss at step 800: 0.06098603457212448 Loss at step 850: 0.04748842120170593 Loss at step 900: 0.06296972930431366 Mean training loss after epoch 16: 0.05301589951681684 EPOCH: 17 Loss at step 0: 0.050900649279356 Loss at step 50: 0.05870142579078674 Loss at step 100: 0.04217936098575592 Loss at step 150: 0.04953073710203171 Loss at step 200: 0.049804799258708954 Loss at step 250: 0.07654105871915817 Loss at step 300: 0.04744258150458336 Loss at step 350: 0.049581244587898254 Loss at step 400: 0.04433496668934822 Loss at step 450: 0.07378721982240677 Loss at step 500: 0.0472632497549057 Loss at step 550: 0.04462122172117233 Loss at step 600: 0.047357019037008286 Loss at step 650: 0.0511186346411705 Loss at step 700: 0.06029724329710007 Loss at step 750: 0.08607412874698639 Loss at step 800: 0.06673901528120041 Loss at step 850: 0.049956243485212326 Loss at step 900: 0.05064079910516739 Mean training loss after epoch 17: 0.053277499953916334 EPOCH: 18 Loss at step 0: 0.05086199939250946 Loss at step 50: 0.046769943088293076 Loss at step 100: 0.06523244827985764 Loss at step 150: 0.0458143875002861 Loss at step 200: 0.04205577075481415 Loss at step 250: 0.049900200217962265 Loss at step 300: 0.04594661667943001 Loss at step 350: 0.06559468060731888 Loss at step 400: 0.04268786683678627 Loss at step 450: 0.06698732823133469 Loss at step 500: 0.06509141623973846 Loss at step 550: 0.04594078287482262 Loss at step 600: 0.05700868368148804 Loss at step 650: 0.04624931886792183 Loss at step 700: 0.04845418035984039 Loss at step 750: 0.052461400628089905 Loss at step 800: 0.04679504781961441 Loss at step 850: 0.04455115273594856 Loss at step 900: 0.04563405364751816 Mean training loss after epoch 18: 0.05261682159801536 EPOCH: 19 Loss at step 0: 0.05008582025766373 Loss at step 50: 0.0469428151845932 Loss at step 100: 0.042250920087099075 Loss at step 150: 0.05690677836537361 Loss at step 200: 0.04289969429373741 Loss at step 250: 0.04495471715927124 Loss at step 300: 0.0512433722615242 Loss at step 350: 0.06341147422790527 Loss at step 400: 0.08557368814945221 Loss at step 450: 0.050049830228090286 Loss at step 500: 0.051234953105449677 Loss at step 550: 0.04708549752831459 Loss at step 600: 0.045194361358881 Loss at step 650: 0.048760268837213516 Loss at step 700: 0.04330597445368767 Loss at step 750: 0.062037039548158646 Loss at step 800: 0.042576003819704056 Loss at step 850: 0.04331246018409729 Loss at step 900: 0.042293522506952286 Mean training loss after epoch 19: 0.05178125435387148 EPOCH: 20 Loss at step 0: 0.04085322842001915 Loss at step 50: 0.04718044772744179 Loss at step 100: 0.05507251247763634 Loss at step 150: 0.047603312879800797 Loss at step 200: 0.052473146468400955 Loss at step 250: 0.06438302993774414 Loss at step 300: 0.05676301568746567 Loss at step 350: 0.052687861025333405 Loss at step 400: 0.04799008369445801 Loss at step 450: 0.049181342124938965 Loss at step 500: 0.041534651070833206 Loss at step 550: 0.044799648225307465 Loss at step 600: 0.06126664578914642 Loss at step 650: 0.04045187309384346 Loss at step 700: 0.05458369478583336 Loss at step 750: 0.04129229113459587 Loss at step 800: 0.05073817819356918 Loss at step 850: 0.0450238436460495 Loss at step 900: 0.05734768882393837 Mean training loss after epoch 20: 0.0517661922228044 EPOCH: 21 Loss at step 0: 0.04736816883087158 Loss at step 50: 0.05123460292816162 Loss at step 100: 0.07180149108171463 Loss at step 150: 0.04614833742380142 Loss at step 200: 0.055986788123846054 Loss at step 250: 0.04713688790798187 Loss at step 300: 0.07680073380470276 Loss at step 350: 0.037171658128499985 Loss at step 400: 0.041306547820568085 Loss at step 450: 0.06588644534349442 Loss at step 500: 0.04721453785896301 Loss at step 550: 0.04810350015759468 Loss at step 600: 0.07714991271495819 Loss at step 650: 0.06845646351575851 Loss at step 700: 0.0885474681854248 Loss at step 750: 0.04188632220029831 Loss at step 800: 0.04813788831233978 Loss at step 850: 0.056454822421073914 Loss at step 900: 0.047110963612794876 Mean training loss after epoch 21: 0.05180830073588565 EPOCH: 22 Loss at step 0: 0.04491744935512543 Loss at step 50: 0.06945037841796875 Loss at step 100: 0.05069524794816971 Loss at step 150: 0.04872312769293785 Loss at step 200: 0.07165094465017319 Loss at step 250: 0.042615655809640884 Loss at step 300: 0.06838830560445786 Loss at step 350: 0.039251286536455154 Loss at step 400: 0.051442209631204605 Loss at step 450: 0.04868418723344803 Loss at step 500: 0.052500247955322266 Loss at step 550: 0.0614294707775116 Loss at step 600: 0.04364381358027458 Loss at step 650: 0.08213029056787491 Loss at step 700: 0.04516555741429329 Loss at step 750: 0.04714184254407883 Loss at step 800: 0.045882437378168106 Loss at step 850: 0.043889161199331284 Loss at step 900: 0.06544516235589981 Mean training loss after epoch 22: 0.05079023511822163 EPOCH: 23 Loss at step 0: 0.0436272919178009 Loss at step 50: 0.03963764011859894 Loss at step 100: 0.06861964613199234 Loss at step 150: 0.04127909243106842 Loss at step 200: 0.04526159539818764 Loss at step 250: 0.049396008253097534 Loss at step 300: 0.06150975823402405 Loss at step 350: 0.050574496388435364 Loss at step 400: 0.060202695429325104 Loss at step 450: 0.060561228543519974 Loss at step 500: 0.04676740989089012 Loss at step 550: 0.04325539618730545 Loss at step 600: 0.06235778331756592 Loss at step 650: 0.0405287966132164 Loss at step 700: 0.04523320123553276 Loss at step 750: 0.04291979968547821 Loss at step 800: 0.04659253731369972 Loss at step 850: 0.05818319320678711 Loss at step 900: 0.04298672452569008 Mean training loss after epoch 23: 0.051045390989766445 EPOCH: 24 Loss at step 0: 0.047377731651067734 Loss at step 50: 0.06946328282356262 Loss at step 100: 0.04813387989997864 Loss at step 150: 0.051848117262125015 Loss at step 200: 0.04167890548706055 Loss at step 250: 0.04727325960993767 Loss at step 300: 0.04914094880223274 Loss at step 350: 0.04801968112587929 Loss at step 400: 0.04385871812701225 Loss at step 450: 0.04802468419075012 Loss at step 500: 0.0376226007938385 Loss at step 550: 0.041958943009376526 Loss at step 600: 0.04319850355386734 Loss at step 650: 0.0450308658182621 Loss at step 700: 0.058143459260463715 Loss at step 750: 0.04116497188806534 Loss at step 800: 0.056865643709897995 Loss at step 850: 0.0419921875 Loss at step 900: 0.04536984860897064 Mean training loss after epoch 24: 0.05077524852555698 EPOCH: 25 Loss at step 0: 0.06290269643068314 Loss at step 50: 0.044282540678977966 Loss at step 100: 0.04541947692632675 Loss at step 150: 0.06099638342857361 Loss at step 200: 0.03880387172102928 Loss at step 250: 0.04267255216836929 Loss at step 300: 0.08341123163700104 Loss at step 350: 0.057711828500032425 Loss at step 400: 0.062459301203489304 Loss at step 450: 0.04647941514849663 Loss at step 500: 0.04681705683469772 Loss at step 550: 0.04867672920227051 Loss at step 600: 0.05859166756272316 Loss at step 650: 0.04239819571375847 Loss at step 700: 0.062004368752241135 Loss at step 750: 0.0419701486825943 Loss at step 800: 0.05037344992160797 Loss at step 850: 0.05640712380409241 Loss at step 900: 0.04209965839982033 Mean training loss after epoch 25: 0.05090450371569916 EPOCH: 26 Loss at step 0: 0.042142391204833984 Loss at step 50: 0.047991108149290085 Loss at step 100: 0.04590717703104019 Loss at step 150: 0.044671718031167984 Loss at step 200: 0.045167550444602966 Loss at step 250: 0.0460001640021801 Loss at step 300: 0.04420885816216469 Loss at step 350: 0.05234783515334129 Loss at step 400: 0.0453946515917778 Loss at step 450: 0.05281873419880867 Loss at step 500: 0.04344617947936058 Loss at step 550: 0.0661051794886589 Loss at step 600: 0.05057433992624283 Loss at step 650: 0.05572565272450447 Loss at step 700: 0.03902425616979599 Loss at step 750: 0.06489232927560806 Loss at step 800: 0.05262084677815437 Loss at step 850: 0.05673076957464218 Loss at step 900: 0.04644480347633362 Mean training loss after epoch 26: 0.05055352122679766 EPOCH: 27 Loss at step 0: 0.04452642425894737 Loss at step 50: 0.06435798108577728 Loss at step 100: 0.06569556891918182 Loss at step 150: 0.04142623394727707 Loss at step 200: 0.04948320984840393 Loss at step 250: 0.06049847975373268 Loss at step 300: 0.06083298474550247 Loss at step 350: 0.052206285297870636 Loss at step 400: 0.05401673540472984 Loss at step 450: 0.04744541645050049 Loss at step 500: 0.06511743366718292 Loss at step 550: 0.04501355439424515 Loss at step 600: 0.041242241859436035 Loss at step 650: 0.04379437863826752 Loss at step 700: 0.04015430063009262 Loss at step 750: 0.05899045616388321 Loss at step 800: 0.05839214473962784 Loss at step 850: 0.06337793916463852 Loss at step 900: 0.043618232011795044 Mean training loss after epoch 27: 0.05037758056161754 EPOCH: 28 Loss at step 0: 0.04514889419078827 Loss at step 50: 0.044290829449892044 Loss at step 100: 0.04405052214860916 Loss at step 150: 0.03945878893136978 Loss at step 200: 0.05663863942027092 Loss at step 250: 0.037913646548986435 Loss at step 300: 0.04983401671051979 Loss at step 350: 0.05962616950273514 Loss at step 400: 0.04913979023694992 Loss at step 450: 0.04188830032944679 Loss at step 500: 0.04532388597726822 Loss at step 550: 0.04382990300655365 Loss at step 600: 0.0424957349896431 Loss at step 650: 0.040769629180431366 Loss at step 700: 0.05011913552880287 Loss at step 750: 0.05189616233110428 Loss at step 800: 0.06069490686058998 Loss at step 850: 0.0606662780046463 Loss at step 900: 0.06339233368635178 Mean training loss after epoch 28: 0.05006565867281799 EPOCH: 29 Loss at step 0: 0.04655815660953522 Loss at step 50: 0.05161908268928528 Loss at step 100: 0.06573128700256348 Loss at step 150: 0.06455166637897491 Loss at step 200: 0.041213154792785645 Loss at step 250: 0.03638099506497383 Loss at step 300: 0.04306737706065178 Loss at step 350: 0.04878435656428337 Loss at step 400: 0.06398443877696991 Loss at step 450: 0.05853625014424324 Loss at step 500: 0.045773811638355255 Loss at step 550: 0.047761376947164536 Loss at step 600: 0.04002698138356209 Loss at step 650: 0.04743783548474312 Loss at step 700: 0.03869962692260742 Loss at step 750: 0.047647785395383835 Loss at step 800: 0.06634891033172607 Loss at step 850: 0.07956603169441223 Loss at step 900: 0.04754936695098877 Mean training loss after epoch 29: 0.049404000911091185 EPOCH: 30 Loss at step 0: 0.050915688276290894 Loss at step 50: 0.041026610881090164 Loss at step 100: 0.04445674270391464 Loss at step 150: 0.03953985869884491 Loss at step 200: 0.04294249415397644 Loss at step 250: 0.04661906883120537 Loss at step 300: 0.04330490529537201 Loss at step 350: 0.050236452370882034 Loss at step 400: 0.044032078236341476 Loss at step 450: 0.04203539341688156 Loss at step 500: 0.04053846746683121 Loss at step 550: 0.04489400237798691 Loss at step 600: 0.04497632756829262 Loss at step 650: 0.042391419410705566 Loss at step 700: 0.04372663423418999 Loss at step 750: 0.050727978348731995 Loss at step 800: 0.056840550154447556 Loss at step 850: 0.03669234737753868 Loss at step 900: 0.04258238524198532 Mean training loss after epoch 30: 0.04971318146281405 EPOCH: 31 Loss at step 0: 0.06484978646039963 Loss at step 50: 0.04175840690732002 Loss at step 100: 0.04312325268983841 Loss at step 150: 0.04282500222325325 Loss at step 200: 0.05401477962732315 Loss at step 250: 0.042297832667827606 Loss at step 300: 0.05250099301338196 Loss at step 350: 0.03834922984242439 Loss at step 400: 0.05337914451956749 Loss at step 450: 0.04185112193226814 Loss at step 500: 0.06487885862588882 Loss at step 550: 0.04757973551750183 Loss at step 600: 0.04829677566885948 Loss at step 650: 0.035463087260723114 Loss at step 700: 0.05495708808302879 Loss at step 750: 0.036232586950063705 Loss at step 800: 0.061512090265750885 Loss at step 850: 0.043556567281484604 Loss at step 900: 0.08011730015277863 Mean training loss after epoch 31: 0.049935012738079404 EPOCH: 32 Loss at step 0: 0.04344335198402405 Loss at step 50: 0.05721044912934303 Loss at step 100: 0.046312738209962845 Loss at step 150: 0.05228522792458534 Loss at step 200: 0.057067014276981354 Loss at step 250: 0.03289061039686203 Loss at step 300: 0.06629761308431625 Loss at step 350: 0.04323932155966759 Loss at step 400: 0.04615568369626999 Loss at step 450: 0.056625645607709885 Loss at step 500: 0.03862985223531723 Loss at step 550: 0.04294659569859505 Loss at step 600: 0.05523504316806793 Loss at step 650: 0.0397227443754673 Loss at step 700: 0.04306117817759514 Loss at step 750: 0.04495193064212799 Loss at step 800: 0.04435350373387337 Loss at step 850: 0.05286962166428566 Loss at step 900: 0.03940501809120178 Mean training loss after epoch 32: 0.049423977319619804 EPOCH: 33 Loss at step 0: 0.04250555858016014 Loss at step 50: 0.037470895797014236 Loss at step 100: 0.05101972073316574 Loss at step 150: 0.05711402744054794 Loss at step 200: 0.045323025435209274 Loss at step 250: 0.04399580880999565 Loss at step 300: 0.044076479971408844 Loss at step 350: 0.040897756814956665 Loss at step 400: 0.0451720766723156 Loss at step 450: 0.043176982551813126 Loss at step 500: 0.052592918276786804 Loss at step 550: 0.04394547641277313 Loss at step 600: 0.04949023202061653 Loss at step 650: 0.04767784848809242 Loss at step 700: 0.06310772150754929 Loss at step 750: 0.04441818967461586 Loss at step 800: 0.03678903728723526 Loss at step 850: 0.053998153656721115 Loss at step 900: 0.04527631029486656 Mean training loss after epoch 33: 0.04910826508694493 EPOCH: 34 Loss at step 0: 0.05105522647500038 Loss at step 50: 0.059240952134132385 Loss at step 100: 0.06297364085912704 Loss at step 150: 0.05741322040557861 Loss at step 200: 0.04099162295460701 Loss at step 250: 0.04615488275885582 Loss at step 300: 0.04576802998781204 Loss at step 350: 0.056760627776384354 Loss at step 400: 0.054047271609306335 Loss at step 450: 0.03682532161474228 Loss at step 500: 0.04004567489027977 Loss at step 550: 0.04781923070549965 Loss at step 600: 0.0550600066781044 Loss at step 650: 0.04479058459401131 Loss at step 700: 0.04241100698709488 Loss at step 750: 0.04358221963047981 Loss at step 800: 0.04976465180516243 Loss at step 850: 0.04104519262909889 Loss at step 900: 0.04504445195198059 Mean training loss after epoch 34: 0.04901465586523639 EPOCH: 35 Loss at step 0: 0.057444434612989426 Loss at step 50: 0.06277285516262054 Loss at step 100: 0.051594328135252 Loss at step 150: 0.06292452663183212 Loss at step 200: 0.06344478577375412 Loss at step 250: 0.04469678923487663 Loss at step 300: 0.04674845188856125 Loss at step 350: 0.05346088856458664 Loss at step 400: 0.04565180093050003 Loss at step 450: 0.04018675163388252 Loss at step 500: 0.037274982780218124 Loss at step 550: 0.047220826148986816 Loss at step 600: 0.0489717461168766 Loss at step 650: 0.05728902667760849 Loss at step 700: 0.06198665872216225 Loss at step 750: 0.05335989221930504 Loss at step 800: 0.04688859358429909 Loss at step 850: 0.04307933524250984 Loss at step 900: 0.043919213116168976 Mean training loss after epoch 35: 0.049011839140675216 EPOCH: 36 Loss at step 0: 0.03969353809952736 Loss at step 50: 0.04488852992653847 Loss at step 100: 0.039581749588251114 Loss at step 150: 0.04721924662590027 Loss at step 200: 0.04326583817601204 Loss at step 250: 0.0863625705242157 Loss at step 300: 0.07442335784435272 Loss at step 350: 0.053866345435380936 Loss at step 400: 0.043542373925447464 Loss at step 450: 0.06381448358297348 Loss at step 500: 0.042756859213113785 Loss at step 550: 0.0683085098862648 Loss at step 600: 0.056149888783693314 Loss at step 650: 0.049297116696834564 Loss at step 700: 0.04522441327571869 Loss at step 750: 0.03973071649670601 Loss at step 800: 0.041491225361824036 Loss at step 850: 0.0637526586651802 Loss at step 900: 0.058193858712911606 Mean training loss after epoch 36: 0.049057883426928316 EPOCH: 37 Loss at step 0: 0.0653361976146698 Loss at step 50: 0.05335010215640068 Loss at step 100: 0.044727008789777756 Loss at step 150: 0.03675035387277603 Loss at step 200: 0.04185955598950386 Loss at step 250: 0.05051618069410324 Loss at step 300: 0.05953283980488777 Loss at step 350: 0.04645645245909691 Loss at step 400: 0.03637448698282242 Loss at step 450: 0.07463718950748444 Loss at step 500: 0.05889231339097023 Loss at step 550: 0.04141048341989517 Loss at step 600: 0.04219688102602959 Loss at step 650: 0.04691004008054733 Loss at step 700: 0.053767286241054535 Loss at step 750: 0.08572646975517273 Loss at step 800: 0.06511525064706802 Loss at step 850: 0.05932161584496498 Loss at step 900: 0.045819077640771866 Mean training loss after epoch 37: 0.048769560882817704 EPOCH: 38 Loss at step 0: 0.04538530111312866 Loss at step 50: 0.07527995854616165 Loss at step 100: 0.04729525372385979 Loss at step 150: 0.05010809004306793 Loss at step 200: 0.05092030391097069 Loss at step 250: 0.04526155814528465 Loss at step 300: 0.04286094382405281 Loss at step 350: 0.043236829340457916 Loss at step 400: 0.04764321446418762 Loss at step 450: 0.04734548181295395 Loss at step 500: 0.04488454386591911 Loss at step 550: 0.04956037178635597 Loss at step 600: 0.0663757473230362 Loss at step 650: 0.04199524223804474 Loss at step 700: 0.050872236490249634 Loss at step 750: 0.04291040822863579 Loss at step 800: 0.04178263992071152 Loss at step 850: 0.037855736911296844 Loss at step 900: 0.045713480561971664 Mean training loss after epoch 38: 0.04907073359340747 EPOCH: 39 Loss at step 0: 0.03673163801431656 Loss at step 50: 0.0534173958003521 Loss at step 100: 0.04433878883719444 Loss at step 150: 0.04459472373127937 Loss at step 200: 0.0429384745657444 Loss at step 250: 0.049357492476701736 Loss at step 300: 0.044586073607206345 Loss at step 350: 0.0363384373486042 Loss at step 400: 0.031046107411384583 Loss at step 450: 0.043330032378435135 Loss at step 500: 0.055950116366147995 Loss at step 550: 0.0618477389216423 Loss at step 600: 0.05723656341433525 Loss at step 650: 0.06474528461694717 Loss at step 700: 0.04325956106185913 Loss at step 750: 0.04518519341945648 Loss at step 800: 0.059344902634620667 Loss at step 850: 0.04913629591464996 Loss at step 900: 0.04235805571079254 Mean training loss after epoch 39: 0.04861656298785449 EPOCH: 40 Loss at step 0: 0.059134311974048615 Loss at step 50: 0.036434974521398544 Loss at step 100: 0.05801485851407051 Loss at step 150: 0.047127652913331985 Loss at step 200: 0.05689512938261032 Loss at step 250: 0.04027149826288223 Loss at step 300: 0.06532489508390427 Loss at step 350: 0.042276132851839066 Loss at step 400: 0.0475594587624073 Loss at step 450: 0.05379931628704071 Loss at step 500: 0.05025225132703781 Loss at step 550: 0.03438549488782883 Loss at step 600: 0.04708230122923851 Loss at step 650: 0.03939519077539444 Loss at step 700: 0.03963244706392288 Loss at step 750: 0.04508259892463684 Loss at step 800: 0.06207893788814545 Loss at step 850: 0.07638747990131378 Loss at step 900: 0.042751844972372055 Mean training loss after epoch 40: 0.048425927184331516 EPOCH: 41 Loss at step 0: 0.045149724930524826 Loss at step 50: 0.040779367089271545 Loss at step 100: 0.046693481504917145 Loss at step 150: 0.05173959955573082 Loss at step 200: 0.05692810192704201 Loss at step 250: 0.034382231533527374 Loss at step 300: 0.04717053100466728 Loss at step 350: 0.038805440068244934 Loss at step 400: 0.04484940320253372 Loss at step 450: 0.04230780899524689 Loss at step 500: 0.04572199657559395 Loss at step 550: 0.04334509000182152 Loss at step 600: 0.04342447966337204 Loss at step 650: 0.0616951659321785 Loss at step 700: 0.046972498297691345 Loss at step 750: 0.04025867581367493 Loss at step 800: 0.03753625601530075 Loss at step 850: 0.04760415479540825 Loss at step 900: 0.045295197516679764 Mean training loss after epoch 41: 0.04846534076561806 EPOCH: 42 Loss at step 0: 0.040075745433568954 Loss at step 50: 0.04573138803243637 Loss at step 100: 0.05906829237937927 Loss at step 150: 0.045193739235401154 Loss at step 200: 0.06081918627023697 Loss at step 250: 0.041441235691308975 Loss at step 300: 0.04311012104153633 Loss at step 350: 0.05839284509420395 Loss at step 400: 0.04170873016119003 Loss at step 450: 0.04581957310438156 Loss at step 500: 0.048567526042461395 Loss at step 550: 0.060478199273347855 Loss at step 600: 0.045915063470602036 Loss at step 650: 0.05965419486165047 Loss at step 700: 0.05063694342970848 Loss at step 750: 0.0467405691742897 Loss at step 800: 0.04492557793855667 Loss at step 850: 0.04573036730289459 Loss at step 900: 0.041740208864212036 Mean training loss after epoch 42: 0.04856988888924946 EPOCH: 43 Loss at step 0: 0.04095011577010155 Loss at step 50: 0.03885659947991371 Loss at step 100: 0.043073128908872604 Loss at step 150: 0.036185137927532196 Loss at step 200: 0.046450089663267136 Loss at step 250: 0.05685682222247124 Loss at step 300: 0.03571143001317978 Loss at step 350: 0.03485193848609924 Loss at step 400: 0.05649569630622864 Loss at step 450: 0.04298286512494087 Loss at step 500: 0.04161464795470238 Loss at step 550: 0.039557427167892456 Loss at step 600: 0.042602986097335815 Loss at step 650: 0.035522449761629105 Loss at step 700: 0.03820335865020752 Loss at step 750: 0.0585138238966465 Loss at step 800: 0.04570963978767395 Loss at step 850: 0.03834442049264908 Loss at step 900: 0.07231715321540833 Mean training loss after epoch 43: 0.04821958671262396 EPOCH: 44 Loss at step 0: 0.034798916429281235 Loss at step 50: 0.04197883978486061 Loss at step 100: 0.05111131817102432 Loss at step 150: 0.04755363613367081 Loss at step 200: 0.058203745633363724 Loss at step 250: 0.06759943068027496 Loss at step 300: 0.03717445582151413 Loss at step 350: 0.06995828449726105 Loss at step 400: 0.03989290073513985 Loss at step 450: 0.04441496357321739 Loss at step 500: 0.04872564598917961 Loss at step 550: 0.04298517480492592 Loss at step 600: 0.04903782159090042 Loss at step 650: 0.038995109498500824 Loss at step 700: 0.05700412765145302 Loss at step 750: 0.062260664999485016 Loss at step 800: 0.06332585960626602 Loss at step 850: 0.04258022457361221 Loss at step 900: 0.07600081712007523 Mean training loss after epoch 44: 0.04870770010016938 EPOCH: 45 Loss at step 0: 0.044188424944877625 Loss at step 50: 0.04226576164364815 Loss at step 100: 0.04229314997792244 Loss at step 150: 0.059289153665304184 Loss at step 200: 0.04477671533823013 Loss at step 250: 0.04305263236165047 Loss at step 300: 0.04676917940378189 Loss at step 350: 0.04440266638994217 Loss at step 400: 0.043396756052970886 Loss at step 450: 0.04248253256082535 Loss at step 500: 0.042557645589113235 Loss at step 550: 0.05129760876297951 Loss at step 600: 0.04851830005645752 Loss at step 650: 0.05680913105607033 Loss at step 700: 0.045711394399404526 Loss at step 750: 0.053241752088069916 Loss at step 800: 0.0385541208088398 Loss at step 850: 0.03976454958319664 Loss at step 900: 0.0427694208920002 Mean training loss after epoch 45: 0.047767311924381425 EPOCH: 46 Loss at step 0: 0.03992181643843651 Loss at step 50: 0.04293971508741379 Loss at step 100: 0.03939201310276985 Loss at step 150: 0.05830605700612068 Loss at step 200: 0.06204408407211304 Loss at step 250: 0.04135732352733612 Loss at step 300: 0.061164457350969315 Loss at step 350: 0.04397903010249138 Loss at step 400: 0.0532187782227993 Loss at step 450: 0.0443677194416523 Loss at step 500: 0.04271259531378746 Loss at step 550: 0.04626317322254181 Loss at step 600: 0.04927317425608635 Loss at step 650: 0.045233193784952164 Loss at step 700: 0.04343508929014206 Loss at step 750: 0.054247207939624786 Loss at step 800: 0.049550238996744156 Loss at step 850: 0.05290677398443222 Loss at step 900: 0.05137857049703598 Mean training loss after epoch 46: 0.04831483051069637 EPOCH: 47 Loss at step 0: 0.041090186685323715 Loss at step 50: 0.05871324986219406 Loss at step 100: 0.049123216420412064 Loss at step 150: 0.037949852645397186 Loss at step 200: 0.03899049758911133 Loss at step 250: 0.07024305313825607 Loss at step 300: 0.046286843717098236 Loss at step 350: 0.04203737899661064 Loss at step 400: 0.04148729518055916 Loss at step 450: 0.04240085557103157 Loss at step 500: 0.04039154201745987 Loss at step 550: 0.06134286895394325 Loss at step 600: 0.05570222809910774 Loss at step 650: 0.06063522398471832 Loss at step 700: 0.03961898386478424 Loss at step 750: 0.05074620619416237 Loss at step 800: 0.03857548534870148 Loss at step 850: 0.05507887899875641 Loss at step 900: 0.045516569167375565 Mean training loss after epoch 47: 0.04786117012693938 EPOCH: 48 Loss at step 0: 0.03990032523870468 Loss at step 50: 0.051056742668151855 Loss at step 100: 0.061179183423519135 Loss at step 150: 0.04846607893705368 Loss at step 200: 0.054427407681941986 Loss at step 250: 0.05284188687801361 Loss at step 300: 0.03520993888378143 Loss at step 350: 0.05905189365148544 Loss at step 400: 0.07662802934646606 Loss at step 450: 0.042436257004737854 Loss at step 500: 0.04391404613852501 Loss at step 550: 0.05657761171460152 Loss at step 600: 0.04691973328590393 Loss at step 650: 0.04965410381555557 Loss at step 700: 0.04046349227428436 Loss at step 750: 0.047720734030008316 Loss at step 800: 0.04424591362476349 Loss at step 850: 0.04407589137554169 Loss at step 900: 0.05281566455960274 Mean training loss after epoch 48: 0.048559599736733225 EPOCH: 49 Loss at step 0: 0.04248611629009247 Loss at step 50: 0.06542706489562988 Loss at step 100: 0.036818962544202805 Loss at step 150: 0.04566914588212967 Loss at step 200: 0.06010732799768448 Loss at step 250: 0.05796084925532341 Loss at step 300: 0.06409075856208801 Loss at step 350: 0.03826560080051422 Loss at step 400: 0.04047199711203575 Loss at step 450: 0.043740324676036835 Loss at step 500: 0.056194137781858444 Loss at step 550: 0.06169737130403519 Loss at step 600: 0.0614679716527462 Loss at step 650: 0.05903233587741852 Loss at step 700: 0.050649963319301605 Loss at step 750: 0.051603130996227264 Loss at step 800: 0.052182961255311966 Loss at step 850: 0.04206417128443718 Loss at step 900: 0.049614641815423965 Mean training loss after epoch 49: 0.04802999187157606 EPOCH: 50 Loss at step 0: 0.037383776158094406 Loss at step 50: 0.04241829365491867 Loss at step 100: 0.059479039162397385 Loss at step 150: 0.05270376056432724 Loss at step 200: 0.04617966338992119 Loss at step 250: 0.04218583554029465 Loss at step 300: 0.04614056646823883 Loss at step 350: 0.05725904181599617 Loss at step 400: 0.036421194672584534 Loss at step 450: 0.04488678276538849 Loss at step 500: 0.060551952570676804 Loss at step 550: 0.049277305603027344 Loss at step 600: 0.05480537936091423 Loss at step 650: 0.03827526047825813 Loss at step 700: 0.03998871147632599 Loss at step 750: 0.041451502591371536 Loss at step 800: 0.0709061548113823 Loss at step 850: 0.041058316826820374 Loss at step 900: 0.03959430754184723 Mean training loss after epoch 50: 0.04820934032151567 EPOCH: 51 Loss at step 0: 0.041619203984737396 Loss at step 50: 0.04883274808526039 Loss at step 100: 0.045424070209264755 Loss at step 150: 0.05160556361079216 Loss at step 200: 0.05843271315097809 Loss at step 250: 0.04874180629849434 Loss at step 300: 0.05108042061328888 Loss at step 350: 0.05615350976586342 Loss at step 400: 0.06415876746177673 Loss at step 450: 0.03767374902963638 Loss at step 500: 0.04029324650764465 Loss at step 550: 0.04865525662899017 Loss at step 600: 0.04192189872264862 Loss at step 650: 0.039923734962940216 Loss at step 700: 0.051063574850559235 Loss at step 750: 0.06408775597810745 Loss at step 800: 0.042297735810279846 Loss at step 850: 0.07015858590602875 Loss at step 900: 0.04509627819061279 Mean training loss after epoch 51: 0.0476567621575172 EPOCH: 52 Loss at step 0: 0.04478397220373154 Loss at step 50: 0.04244489595293999 Loss at step 100: 0.04888368025422096 Loss at step 150: 0.06016533449292183 Loss at step 200: 0.051399730145931244 Loss at step 250: 0.05953222140669823 Loss at step 300: 0.04817875102162361 Loss at step 350: 0.04486919939517975 Loss at step 400: 0.047504130750894547 Loss at step 450: 0.04274759814143181 Loss at step 500: 0.041122015565633774 Loss at step 550: 0.048857755959033966 Loss at step 600: 0.04217732697725296 Loss at step 650: 0.056703366339206696 Loss at step 700: 0.05860498547554016 Loss at step 750: 0.056156985461711884 Loss at step 800: 0.05311962962150574 Loss at step 850: 0.04071011021733284 Loss at step 900: 0.05511891841888428 Mean training loss after epoch 52: 0.04819428236054967 EPOCH: 53 Loss at step 0: 0.05077159032225609 Loss at step 50: 0.053084906190633774 Loss at step 100: 0.03665981814265251 Loss at step 150: 0.03799162805080414 Loss at step 200: 0.037085629999637604 Loss at step 250: 0.042924895882606506 Loss at step 300: 0.04338213801383972 Loss at step 350: 0.045671381056308746 Loss at step 400: 0.06677921116352081 Loss at step 450: 0.049600083380937576 Loss at step 500: 0.042698781937360764 Loss at step 550: 0.05366478115320206 Loss at step 600: 0.05048254877328873 Loss at step 650: 0.043918296694755554 Loss at step 700: 0.055468734353780746 Loss at step 750: 0.05061151459813118 Loss at step 800: 0.04261021316051483 Loss at step 850: 0.0454380176961422 Loss at step 900: 0.05239158123731613 Mean training loss after epoch 53: 0.04770294610442701 EPOCH: 54 Loss at step 0: 0.04333365708589554 Loss at step 50: 0.0559883676469326 Loss at step 100: 0.037564121186733246 Loss at step 150: 0.0373581238090992 Loss at step 200: 0.045086342841386795 Loss at step 250: 0.04513102024793625 Loss at step 300: 0.07298414409160614 Loss at step 350: 0.04264046996831894 Loss at step 400: 0.04622480645775795 Loss at step 450: 0.061569806188344955 Loss at step 500: 0.041549116373062134 Loss at step 550: 0.06776107102632523 Loss at step 600: 0.06506755948066711 Loss at step 650: 0.042012769728899 Loss at step 700: 0.04169164225459099 Loss at step 750: 0.05260315537452698 Loss at step 800: 0.03175465017557144 Loss at step 850: 0.044237203896045685 Loss at step 900: 0.04644806310534477 Mean training loss after epoch 54: 0.0475714241287538 EPOCH: 55 Loss at step 0: 0.059405967593193054 Loss at step 50: 0.03870406746864319 Loss at step 100: 0.042681701481342316 Loss at step 150: 0.04306136816740036 Loss at step 200: 0.06196766346693039 Loss at step 250: 0.040188293904066086 Loss at step 300: 0.03733072429895401 Loss at step 350: 0.03666871413588524 Loss at step 400: 0.05433874949812889 Loss at step 450: 0.040527451783418655 Loss at step 500: 0.05207390710711479 Loss at step 550: 0.03889770805835724 Loss at step 600: 0.04664124175906181 Loss at step 650: 0.041562147438526154 Loss at step 700: 0.07040927559137344 Loss at step 750: 0.05096499249339104 Loss at step 800: 0.04143202304840088 Loss at step 850: 0.07190993428230286 Loss at step 900: 0.04007066786289215 Mean training loss after epoch 55: 0.04781148741359332 EPOCH: 56 Loss at step 0: 0.04074227064847946 Loss at step 50: 0.04614419862627983 Loss at step 100: 0.041476599872112274 Loss at step 150: 0.0518798865377903 Loss at step 200: 0.037053726613521576 Loss at step 250: 0.04328696057200432 Loss at step 300: 0.06059698015451431 Loss at step 350: 0.0382181853055954 Loss at step 400: 0.048399440944194794 Loss at step 450: 0.04355709254741669 Loss at step 500: 0.04544617235660553 Loss at step 550: 0.05161639675498009 Loss at step 600: 0.04733527824282646 Loss at step 650: 0.04416561871767044 Loss at step 700: 0.03827279061079025 Loss at step 750: 0.03590138629078865 Loss at step 800: 0.03504889830946922 Loss at step 850: 0.050474025309085846 Loss at step 900: 0.041742656379938126 Mean training loss after epoch 56: 0.04734459527527918 EPOCH: 57 Loss at step 0: 0.058345891535282135 Loss at step 50: 0.06117750704288483 Loss at step 100: 0.046317242085933685 Loss at step 150: 0.045685600489377975 Loss at step 200: 0.04756028205156326 Loss at step 250: 0.056808728724718094 Loss at step 300: 0.04321655258536339 Loss at step 350: 0.08190663158893585 Loss at step 400: 0.03924918174743652 Loss at step 450: 0.04386617988348007 Loss at step 500: 0.047922540456056595 Loss at step 550: 0.03665482997894287 Loss at step 600: 0.06892412155866623 Loss at step 650: 0.03585939109325409 Loss at step 700: 0.04524300992488861 Loss at step 750: 0.03950698301196098 Loss at step 800: 0.04373985156416893 Loss at step 850: 0.036136604845523834 Loss at step 900: 0.03782308101654053 Mean training loss after epoch 57: 0.04744744747241677 EPOCH: 58 Loss at step 0: 0.04976191744208336 Loss at step 50: 0.043196871876716614 Loss at step 100: 0.07170452922582626 Loss at step 150: 0.054760776460170746 Loss at step 200: 0.06260420382022858 Loss at step 250: 0.04796525835990906 Loss at step 300: 0.041735656559467316 Loss at step 350: 0.03981652110815048 Loss at step 400: 0.03943171724677086 Loss at step 450: 0.043901290744543076 Loss at step 500: 0.04369043558835983 Loss at step 550: 0.039904605597257614 Loss at step 600: 0.046015627682209015 Loss at step 650: 0.06019500643014908 Loss at step 700: 0.04324943944811821 Loss at step 750: 0.037860363721847534 Loss at step 800: 0.04226585105061531 Loss at step 850: 0.04079734906554222 Loss at step 900: 0.04691777750849724 Mean training loss after epoch 58: 0.04781080302057553 EPOCH: 59 Loss at step 0: 0.043843116611242294 Loss at step 50: 0.04664764553308487 Loss at step 100: 0.04755036532878876 Loss at step 150: 0.03984639421105385 Loss at step 200: 0.044170260429382324 Loss at step 250: 0.043402574956417084 Loss at step 300: 0.04149569198489189 Loss at step 350: 0.05489599332213402 Loss at step 400: 0.04855499416589737 Loss at step 450: 0.05261210724711418 Loss at step 500: 0.035756614059209824 Loss at step 550: 0.049778953194618225 Loss at step 600: 0.04196585714817047 Loss at step 650: 0.057926371693611145 Loss at step 700: 0.05905818194150925 Loss at step 750: 0.03722427040338516 Loss at step 800: 0.04345725476741791 Loss at step 850: 0.044434789568185806 Loss at step 900: 0.04761435464024544 Mean training loss after epoch 59: 0.04721634086372375 EPOCH: 60 Loss at step 0: 0.0397469736635685 Loss at step 50: 0.05075594782829285 Loss at step 100: 0.050635192543268204 Loss at step 150: 0.04306646063923836 Loss at step 200: 0.054759297519922256 Loss at step 250: 0.04501060023903847 Loss at step 300: 0.03430795297026634 Loss at step 350: 0.037742119282484055 Loss at step 400: 0.03947889432311058 Loss at step 450: 0.04035079479217529 Loss at step 500: 0.05425229296088219 Loss at step 550: 0.03946786746382713 Loss at step 600: 0.06366962194442749 Loss at step 650: 0.06009647622704506 Loss at step 700: 0.04294969141483307 Loss at step 750: 0.05300426483154297 Loss at step 800: 0.040468886494636536 Loss at step 850: 0.0453835092484951 Loss at step 900: 0.03727548569440842 Mean training loss after epoch 60: 0.04746106219714257 EPOCH: 61 Loss at step 0: 0.046405743807554245 Loss at step 50: 0.05212169513106346 Loss at step 100: 0.041797101497650146 Loss at step 150: 0.04695983603596687 Loss at step 200: 0.04504573345184326 Loss at step 250: 0.04500821232795715 Loss at step 300: 0.04783492162823677 Loss at step 350: 0.03992660716176033 Loss at step 400: 0.044272251427173615 Loss at step 450: 0.043237246572971344 Loss at step 500: 0.060797300189733505 Loss at step 550: 0.03859396651387215 Loss at step 600: 0.04310062900185585 Loss at step 650: 0.06064233183860779 Loss at step 700: 0.03894903138279915 Loss at step 750: 0.042323049157857895 Loss at step 800: 0.08278437703847885 Loss at step 850: 0.0362105630338192 Loss at step 900: 0.04523681849241257 Mean training loss after epoch 61: 0.04743666749106033 EPOCH: 62 Loss at step 0: 0.0396600067615509 Loss at step 50: 0.03983306512236595 Loss at step 100: 0.03684050217270851 Loss at step 150: 0.041904330253601074 Loss at step 200: 0.04916789010167122 Loss at step 250: 0.044871583580970764 Loss at step 300: 0.04830225929617882 Loss at step 350: 0.04024415835738182 Loss at step 400: 0.05647255852818489 Loss at step 450: 0.044032543897628784 Loss at step 500: 0.055675894021987915 Loss at step 550: 0.039186086505651474 Loss at step 600: 0.03810429573059082 Loss at step 650: 0.04003801569342613 Loss at step 700: 0.057398658245801926 Loss at step 750: 0.03869132697582245 Loss at step 800: 0.04732149466872215 Loss at step 850: 0.050904884934425354 Loss at step 900: 0.045972540974617004 Mean training loss after epoch 62: 0.04692970525656046 EPOCH: 63 Loss at step 0: 0.03953172639012337 Loss at step 50: 0.04539800062775612 Loss at step 100: 0.0438656322658062 Loss at step 150: 0.04525233060121536 Loss at step 200: 0.03740347921848297 Loss at step 250: 0.0426780991256237 Loss at step 300: 0.04247293248772621 Loss at step 350: 0.0419720858335495 Loss at step 400: 0.04736369848251343 Loss at step 450: 0.061872538179159164 Loss at step 500: 0.05501571297645569 Loss at step 550: 0.06272027641534805 Loss at step 600: 0.03688386455178261 Loss at step 650: 0.04682745039463043 Loss at step 700: 0.055788882076740265 Loss at step 750: 0.043454479426145554 Loss at step 800: 0.0406973659992218 Loss at step 850: 0.044083818793296814 Loss at step 900: 0.036606565117836 Mean training loss after epoch 63: 0.04697812778720342 EPOCH: 64 Loss at step 0: 0.04095274955034256 Loss at step 50: 0.04482478275895119 Loss at step 100: 0.046302907168865204 Loss at step 150: 0.04385954886674881 Loss at step 200: 0.042863037437200546 Loss at step 250: 0.06368609517812729 Loss at step 300: 0.06583148241043091 Loss at step 350: 0.05111365392804146 Loss at step 400: 0.03907451778650284 Loss at step 450: 0.041604746133089066 Loss at step 500: 0.03983514383435249 Loss at step 550: 0.045956823974847794 Loss at step 600: 0.06226634979248047 Loss at step 650: 0.04369053617119789 Loss at step 700: 0.04254018887877464 Loss at step 750: 0.040303755551576614 Loss at step 800: 0.04570179432630539 Loss at step 850: 0.0649692639708519 Loss at step 900: 0.05548830330371857 Mean training loss after epoch 64: 0.04767467499748349 EPOCH: 65 Loss at step 0: 0.047734152525663376 Loss at step 50: 0.0597100593149662 Loss at step 100: 0.059440728276968 Loss at step 150: 0.05727502703666687 Loss at step 200: 0.038514167070388794 Loss at step 250: 0.054700206965208054 Loss at step 300: 0.0377231128513813 Loss at step 350: 0.07502403110265732 Loss at step 400: 0.0425320528447628 Loss at step 450: 0.04494963586330414 Loss at step 500: 0.03536660224199295 Loss at step 550: 0.05629727989435196 Loss at step 600: 0.043928906321525574 Loss at step 650: 0.06952134519815445 Loss at step 700: 0.04920810088515282 Loss at step 750: 0.037532318383455276 Loss at step 800: 0.04342438653111458 Loss at step 850: 0.041775595396757126 Loss at step 900: 0.04680093377828598 Mean training loss after epoch 65: 0.047508182333730685 EPOCH: 66 Loss at step 0: 0.03914298117160797 Loss at step 50: 0.04533550515770912 Loss at step 100: 0.03584485873579979 Loss at step 150: 0.03203648701310158 Loss at step 200: 0.060460127890110016 Loss at step 250: 0.04005389288067818 Loss at step 300: 0.05664902180433273 Loss at step 350: 0.04251211881637573 Loss at step 400: 0.05594141408801079 Loss at step 450: 0.038950368762016296 Loss at step 500: 0.035225119441747665 Loss at step 550: 0.04369492456316948 Loss at step 600: 0.046434592455625534 Loss at step 650: 0.0351271815598011 Loss at step 700: 0.06532762944698334 Loss at step 750: 0.040061768144369125 Loss at step 800: 0.043671198189258575 Loss at step 850: 0.034423891454935074 Loss at step 900: 0.06415649503469467 Mean training loss after epoch 66: 0.04759214189388096 EPOCH: 67 Loss at step 0: 0.04689043387770653 Loss at step 50: 0.04020730406045914 Loss at step 100: 0.04667309299111366 Loss at step 150: 0.043307214975357056 Loss at step 200: 0.037202753126621246 Loss at step 250: 0.03939209505915642 Loss at step 300: 0.04974953085184097 Loss at step 350: 0.03526380658149719 Loss at step 400: 0.061402756720781326 Loss at step 450: 0.03961838781833649 Loss at step 500: 0.04030150547623634 Loss at step 550: 0.06970278173685074 Loss at step 600: 0.03788544982671738 Loss at step 650: 0.04417352378368378 Loss at step 700: 0.045198049396276474 Loss at step 750: 0.05938095971941948 Loss at step 800: 0.03898803889751434 Loss at step 850: 0.041472259908914566 Loss at step 900: 0.03720082342624664 Mean training loss after epoch 67: 0.04692958526488052 EPOCH: 68 Loss at step 0: 0.05670297145843506 Loss at step 50: 0.04052157700061798 Loss at step 100: 0.03783930838108063 Loss at step 150: 0.041044995188713074 Loss at step 200: 0.04051780700683594 Loss at step 250: 0.03903106227517128 Loss at step 300: 0.04628558084368706 Loss at step 350: 0.03781464695930481 Loss at step 400: 0.037941738963127136 Loss at step 450: 0.05132885277271271 Loss at step 500: 0.0466160923242569 Loss at step 550: 0.04678484424948692 Loss at step 600: 0.05577147752046585 Loss at step 650: 0.04776022210717201 Loss at step 700: 0.0739283338189125 Loss at step 750: 0.04235205426812172 Loss at step 800: 0.04734031483530998 Loss at step 850: 0.03397378697991371 Loss at step 900: 0.03687811642885208 Mean training loss after epoch 68: 0.04736001390431609 EPOCH: 69 Loss at step 0: 0.039057657122612 Loss at step 50: 0.057333722710609436 Loss at step 100: 0.037624578922986984 Loss at step 150: 0.04285532608628273 Loss at step 200: 0.044802140444517136 Loss at step 250: 0.041549913585186005 Loss at step 300: 0.03855804353952408 Loss at step 350: 0.04356319084763527 Loss at step 400: 0.04934385418891907 Loss at step 450: 0.0406583771109581 Loss at step 500: 0.03242161124944687 Loss at step 550: 0.039438243955373764 Loss at step 600: 0.0391414538025856 Loss at step 650: 0.05352361127734184 Loss at step 700: 0.04481195658445358 Loss at step 750: 0.05181654170155525 Loss at step 800: 0.03987474367022514 Loss at step 850: 0.040107693523168564 Loss at step 900: 0.0398813895881176 Mean training loss after epoch 69: 0.04724252622908176 EPOCH: 70 Loss at step 0: 0.06291109323501587 Loss at step 50: 0.04021677374839783 Loss at step 100: 0.038856908679008484 Loss at step 150: 0.03903401643037796 Loss at step 200: 0.041255153715610504 Loss at step 250: 0.0405704528093338 Loss at step 300: 0.04133491963148117 Loss at step 350: 0.04257418215274811 Loss at step 400: 0.04130098596215248 Loss at step 450: 0.039632875472307205 Loss at step 500: 0.05174471437931061 Loss at step 550: 0.041152168065309525 Loss at step 600: 0.03953293338418007 Loss at step 650: 0.04831678047776222 Loss at step 700: 0.040168195962905884 Loss at step 750: 0.058045148849487305 Loss at step 800: 0.05023400858044624 Loss at step 850: 0.04506324231624603 Loss at step 900: 0.03938381373882294 Mean training loss after epoch 70: 0.04706599061160898 EPOCH: 71 Loss at step 0: 0.048933133482933044 Loss at step 50: 0.06072348728775978 Loss at step 100: 0.044322431087493896 Loss at step 150: 0.0379573255777359 Loss at step 200: 0.046718206256628036 Loss at step 250: 0.03879999741911888 Loss at step 300: 0.055539947003126144 Loss at step 350: 0.04774976149201393 Loss at step 400: 0.06769908964633942 Loss at step 450: 0.04883067309856415 Loss at step 500: 0.03767262026667595 Loss at step 550: 0.04490840435028076 Loss at step 600: 0.04414452239871025 Loss at step 650: 0.03944578766822815 Loss at step 700: 0.04526679217815399 Loss at step 750: 0.04265045002102852 Loss at step 800: 0.049209292978048325 Loss at step 850: 0.051392339169979095 Loss at step 900: 0.04199763759970665 Mean training loss after epoch 71: 0.04647631058767279 EPOCH: 72 Loss at step 0: 0.037351757287979126 Loss at step 50: 0.05144992098212242 Loss at step 100: 0.049174852669239044 Loss at step 150: 0.04054522514343262 Loss at step 200: 0.05075554922223091 Loss at step 250: 0.0451522059738636 Loss at step 300: 0.0631338432431221 Loss at step 350: 0.03411368280649185 Loss at step 400: 0.05932801589369774 Loss at step 450: 0.05935049429535866 Loss at step 500: 0.03340667486190796 Loss at step 550: 0.04059337079524994 Loss at step 600: 0.03747294843196869 Loss at step 650: 0.042974457144737244 Loss at step 700: 0.043749187141656876 Loss at step 750: 0.05861726403236389 Loss at step 800: 0.03579092398285866 Loss at step 850: 0.038818247616291046 Loss at step 900: 0.046826623380184174 Mean training loss after epoch 72: 0.04672037470124678 EPOCH: 73 Loss at step 0: 0.04076121002435684 Loss at step 50: 0.04321940615773201 Loss at step 100: 0.035648319870233536 Loss at step 150: 0.04660576954483986 Loss at step 200: 0.04967901483178139 Loss at step 250: 0.03815003111958504 Loss at step 300: 0.036646630614995956 Loss at step 350: 0.04083497077226639 Loss at step 400: 0.05002129077911377 Loss at step 450: 0.04220649227499962 Loss at step 500: 0.04852573201060295 Loss at step 550: 0.04010987654328346 Loss at step 600: 0.049310389906167984 Loss at step 650: 0.05295303463935852 Loss at step 700: 0.04436635971069336 Loss at step 750: 0.04303748533129692 Loss at step 800: 0.03880155831575394 Loss at step 850: 0.05747796967625618 Loss at step 900: 0.06115530803799629 Mean training loss after epoch 73: 0.04707469783032309 EPOCH: 74 Loss at step 0: 0.04151436686515808 Loss at step 50: 0.045299433171749115 Loss at step 100: 0.040978703647851944 Loss at step 150: 0.039344724267721176 Loss at step 200: 0.040704455226659775 Loss at step 250: 0.04816163331270218 Loss at step 300: 0.06526995450258255 Loss at step 350: 0.038377951830625534 Loss at step 400: 0.04510173946619034 Loss at step 450: 0.06017175316810608 Loss at step 500: 0.05685875192284584 Loss at step 550: 0.043399810791015625 Loss at step 600: 0.041581250727176666 Loss at step 650: 0.05848102644085884 Loss at step 700: 0.05299646034836769 Loss at step 750: 0.04554300010204315 Loss at step 800: 0.0672788992524147 Loss at step 850: 0.036223456263542175 Loss at step 900: 0.03746917471289635 Mean training loss after epoch 74: 0.04663193005440967 EPOCH: 75 Loss at step 0: 0.06510418653488159 Loss at step 50: 0.0428827665746212 Loss at step 100: 0.04674754664301872 Loss at step 150: 0.034679852426052094 Loss at step 200: 0.04052438959479332 Loss at step 250: 0.04185913875699043 Loss at step 300: 0.03701731190085411 Loss at step 350: 0.05839525908231735 Loss at step 400: 0.08055532723665237 Loss at step 450: 0.03798747807741165 Loss at step 500: 0.03870858997106552 Loss at step 550: 0.04734209179878235 Loss at step 600: 0.04606061801314354 Loss at step 650: 0.03919639810919762 Loss at step 700: 0.035162199288606644 Loss at step 750: 0.044265419244766235 Loss at step 800: 0.04063871130347252 Loss at step 850: 0.046370990574359894 Loss at step 900: 0.039004627615213394 Mean training loss after epoch 75: 0.04658207347167771 EPOCH: 76 Loss at step 0: 0.050724539905786514 Loss at step 50: 0.04318808764219284 Loss at step 100: 0.06086809188127518 Loss at step 150: 0.03939744457602501 Loss at step 200: 0.04349702224135399 Loss at step 250: 0.050022173672914505 Loss at step 300: 0.038475800305604935 Loss at step 350: 0.04485173895955086 Loss at step 400: 0.056289877742528915 Loss at step 450: 0.038029033690690994 Loss at step 500: 0.03938248008489609 Loss at step 550: 0.042038168758153915 Loss at step 600: 0.05614514276385307 Loss at step 650: 0.048029765486717224 Loss at step 700: 0.04157460480928421 Loss at step 750: 0.03268986567854881 Loss at step 800: 0.04234771430492401 Loss at step 850: 0.03760245442390442 Loss at step 900: 0.05143234133720398 Mean training loss after epoch 76: 0.046328955574203405 EPOCH: 77 Loss at step 0: 0.04460402950644493 Loss at step 50: 0.03896404057741165 Loss at step 100: 0.04294845834374428 Loss at step 150: 0.04045379161834717 Loss at step 200: 0.04180868715047836 Loss at step 250: 0.05968054011464119 Loss at step 300: 0.057949941605329514 Loss at step 350: 0.03777265548706055 Loss at step 400: 0.04339759424328804 Loss at step 450: 0.04139368608593941 Loss at step 500: 0.04214297607541084 Loss at step 550: 0.04679353907704353 Loss at step 600: 0.0372079499065876 Loss at step 650: 0.07386253774166107 Loss at step 700: 0.06014396250247955 Loss at step 750: 0.05433698371052742 Loss at step 800: 0.04691873490810394 Loss at step 850: 0.03916292265057564 Loss at step 900: 0.06371419876813889 Mean training loss after epoch 77: 0.046980197257451665 EPOCH: 78 Loss at step 0: 0.0436321459710598 Loss at step 50: 0.040520790964365005 Loss at step 100: 0.03976666182279587 Loss at step 150: 0.04382176324725151 Loss at step 200: 0.06901687383651733 Loss at step 250: 0.04052455350756645 Loss at step 300: 0.039674922823905945 Loss at step 350: 0.0410609245300293 Loss at step 400: 0.04690007120370865 Loss at step 450: 0.04003782570362091 Loss at step 500: 0.035881590098142624 Loss at step 550: 0.043689124286174774 Loss at step 600: 0.06190581992268562 Loss at step 650: 0.05985833704471588 Loss at step 700: 0.0461273230612278 Loss at step 750: 0.040103521198034286 Loss at step 800: 0.044692158699035645 Loss at step 850: 0.05954989045858383 Loss at step 900: 0.04473769664764404 Mean training loss after epoch 78: 0.046493531941477935 EPOCH: 79 Loss at step 0: 0.03826908767223358 Loss at step 50: 0.04367157816886902 Loss at step 100: 0.048482201993465424 Loss at step 150: 0.03936955705285072 Loss at step 200: 0.0416095107793808 Loss at step 250: 0.06551491469144821 Loss at step 300: 0.04213517904281616 Loss at step 350: 0.04769870266318321 Loss at step 400: 0.04063646122813225 Loss at step 450: 0.05516039580106735 Loss at step 500: 0.04599534720182419 Loss at step 550: 0.04067381098866463 Loss at step 600: 0.05324804410338402 Loss at step 650: 0.07911417633295059 Loss at step 700: 0.03929920867085457 Loss at step 750: 0.0639362707734108 Loss at step 800: 0.04507226124405861 Loss at step 850: 0.05595323070883751 Loss at step 900: 0.03641363978385925 Mean training loss after epoch 79: 0.04596119451878676 EPOCH: 80 Loss at step 0: 0.043059855699539185 Loss at step 50: 0.04386511445045471 Loss at step 100: 0.03702298551797867 Loss at step 150: 0.044457364827394485 Loss at step 200: 0.038280218839645386 Loss at step 250: 0.03972554951906204 Loss at step 300: 0.032914794981479645 Loss at step 350: 0.05250844731926918 Loss at step 400: 0.040263935923576355 Loss at step 450: 0.03720948472619057 Loss at step 500: 0.039043232798576355 Loss at step 550: 0.04992212355136871 Loss at step 600: 0.05528197064995766 Loss at step 650: 0.058428358286619186 Loss at step 700: 0.040580231696367264 Loss at step 750: 0.04222964495420456 Loss at step 800: 0.03987835720181465 Loss at step 850: 0.03841922804713249 Loss at step 900: 0.03964025899767876 Mean training loss after epoch 80: 0.04675633649526438 EPOCH: 81 Loss at step 0: 0.04294147342443466 Loss at step 50: 0.03750874474644661 Loss at step 100: 0.04480570927262306 Loss at step 150: 0.054179415106773376 Loss at step 200: 0.05873093008995056 Loss at step 250: 0.04453163594007492 Loss at step 300: 0.04360324516892433 Loss at step 350: 0.03819626197218895 Loss at step 400: 0.04269474744796753 Loss at step 450: 0.051444508135318756 Loss at step 500: 0.06055659428238869 Loss at step 550: 0.04260946810245514 Loss at step 600: 0.047703638672828674 Loss at step 650: 0.04420706629753113 Loss at step 700: 0.0422188900411129 Loss at step 750: 0.05335630103945732 Loss at step 800: 0.03268652409315109 Loss at step 850: 0.07306642830371857 Loss at step 900: 0.04345092549920082 Mean training loss after epoch 81: 0.04696100746104712 EPOCH: 82 Loss at step 0: 0.04407839849591255 Loss at step 50: 0.04146694391965866 Loss at step 100: 0.03925240784883499 Loss at step 150: 0.04650525376200676 Loss at step 200: 0.0485885851085186 Loss at step 250: 0.05592464283108711 Loss at step 300: 0.051775697618722916 Loss at step 350: 0.04124452918767929 Loss at step 400: 0.05777931213378906 Loss at step 450: 0.03746679425239563 Loss at step 500: 0.04309296980500221 Loss at step 550: 0.0754709541797638 Loss at step 600: 0.04911469668149948 Loss at step 650: 0.056559968739748 Loss at step 700: 0.05650978535413742 Loss at step 750: 0.05455979332327843 Loss at step 800: 0.03872409090399742 Loss at step 850: 0.0484020933508873 Loss at step 900: 0.057744842022657394 Mean training loss after epoch 82: 0.04682268855223524 EPOCH: 83 Loss at step 0: 0.04085930809378624 Loss at step 50: 0.04331555962562561 Loss at step 100: 0.03998298570513725 Loss at step 150: 0.044699396938085556 Loss at step 200: 0.04715953394770622 Loss at step 250: 0.040661588311195374 Loss at step 300: 0.045970477163791656 Loss at step 350: 0.04109860211610794 Loss at step 400: 0.03988077491521835 Loss at step 450: 0.04544086754322052 Loss at step 500: 0.039380256086587906 Loss at step 550: 0.0602344386279583 Loss at step 600: 0.0581962987780571 Loss at step 650: 0.04821816086769104 Loss at step 700: 0.05307741090655327 Loss at step 750: 0.052487362176179886 Loss at step 800: 0.04843626916408539 Loss at step 850: 0.045977991074323654 Loss at step 900: 0.04208477959036827 Mean training loss after epoch 83: 0.04646959100395187 EPOCH: 84 Loss at step 0: 0.05132857710123062 Loss at step 50: 0.040353551506996155 Loss at step 100: 0.04923536255955696 Loss at step 150: 0.06066861376166344 Loss at step 200: 0.06053975224494934 Loss at step 250: 0.040289249271154404 Loss at step 300: 0.05039741098880768 Loss at step 350: 0.040825005620718 Loss at step 400: 0.04631093889474869 Loss at step 450: 0.04110642895102501 Loss at step 500: 0.03968242555856705 Loss at step 550: 0.0507030263543129 Loss at step 600: 0.03993452340364456 Loss at step 650: 0.037151508033275604 Loss at step 700: 0.06560724228620529 Loss at step 750: 0.03816938400268555 Loss at step 800: 0.041706979274749756 Loss at step 850: 0.04701434075832367 Loss at step 900: 0.052717216312885284 Mean training loss after epoch 84: 0.04636791844897941 EPOCH: 85 Loss at step 0: 0.0448119230568409 Loss at step 50: 0.05372151359915733 Loss at step 100: 0.047969501465559006 Loss at step 150: 0.04142036288976669 Loss at step 200: 0.045791205018758774 Loss at step 250: 0.04753302037715912 Loss at step 300: 0.040758222341537476 Loss at step 350: 0.057022903114557266 Loss at step 400: 0.04700716957449913 Loss at step 450: 0.06130004674196243 Loss at step 500: 0.04115890711545944 Loss at step 550: 0.04039519652724266 Loss at step 600: 0.0341905802488327 Loss at step 650: 0.041618600487709045 Loss at step 700: 0.03960268571972847 Loss at step 750: 0.04738497734069824 Loss at step 800: 0.03702997416257858 Loss at step 850: 0.045399561524391174 Loss at step 900: 0.054047051817178726 Mean training loss after epoch 85: 0.04602402822374662 EPOCH: 86 Loss at step 0: 0.07172813266515732 Loss at step 50: 0.053118422627449036 Loss at step 100: 0.058854833245277405 Loss at step 150: 0.042429227381944656 Loss at step 200: 0.04839654639363289 Loss at step 250: 0.0402103029191494 Loss at step 300: 0.04665457829833031 Loss at step 350: 0.039321739226579666 Loss at step 400: 0.03910680115222931 Loss at step 450: 0.041398126631975174 Loss at step 500: 0.03600377216935158 Loss at step 550: 0.04888325184583664 Loss at step 600: 0.0442972369492054 Loss at step 650: 0.057485610246658325 Loss at step 700: 0.040198929607868195 Loss at step 750: 0.0467190183699131 Loss at step 800: 0.05040740966796875 Loss at step 850: 0.037695158272981644 Loss at step 900: 0.03791056200861931 Mean training loss after epoch 86: 0.046677122857651986 EPOCH: 87 Loss at step 0: 0.048008032143116 Loss at step 50: 0.05880206823348999 Loss at step 100: 0.05426805838942528 Loss at step 150: 0.04468279331922531 Loss at step 200: 0.07699838280677795 Loss at step 250: 0.03907587751746178 Loss at step 300: 0.052888259291648865 Loss at step 350: 0.045319799333810806 Loss at step 400: 0.041423700749874115 Loss at step 450: 0.03447338566184044 Loss at step 500: 0.049211181700229645 Loss at step 550: 0.044402144849300385 Loss at step 600: 0.038957081735134125 Loss at step 650: 0.036828603595495224 Loss at step 700: 0.054644688963890076 Loss at step 750: 0.04475133493542671 Loss at step 800: 0.03951174393296242 Loss at step 850: 0.0445011667907238 Loss at step 900: 0.05303921923041344 Mean training loss after epoch 87: 0.04662589250462078 EPOCH: 88 Loss at step 0: 0.04095936194062233 Loss at step 50: 0.05740981921553612 Loss at step 100: 0.03356751799583435 Loss at step 150: 0.04101330786943436 Loss at step 200: 0.04165109246969223 Loss at step 250: 0.04774738848209381 Loss at step 300: 0.0368298701941967 Loss at step 350: 0.042243052273988724 Loss at step 400: 0.04973376914858818 Loss at step 450: 0.0388069674372673 Loss at step 500: 0.05148136243224144 Loss at step 550: 0.035082750022411346 Loss at step 600: 0.039241526275873184 Loss at step 650: 0.04062201827764511 Loss at step 700: 0.03812513127923012 Loss at step 750: 0.03865785524249077 Loss at step 800: 0.043402671813964844 Loss at step 850: 0.06126170605421066 Loss at step 900: 0.04055490717291832 Mean training loss after epoch 88: 0.0461914571844089 EPOCH: 89 Loss at step 0: 0.04994302615523338 Loss at step 50: 0.03743825480341911 Loss at step 100: 0.048832450062036514 Loss at step 150: 0.05880393087863922 Loss at step 200: 0.04024968668818474 Loss at step 250: 0.05558256059885025 Loss at step 300: 0.03903936594724655 Loss at step 350: 0.0444948673248291 Loss at step 400: 0.03698185831308365 Loss at step 450: 0.04081856459379196 Loss at step 500: 0.05662966147065163 Loss at step 550: 0.04961967468261719 Loss at step 600: 0.05946849659085274 Loss at step 650: 0.05116843432188034 Loss at step 700: 0.05528327450156212 Loss at step 750: 0.04257083311676979 Loss at step 800: 0.046960752457380295 Loss at step 850: 0.047767091542482376 Loss at step 900: 0.03987089917063713 Mean training loss after epoch 89: 0.04622526244838228 EPOCH: 90 Loss at step 0: 0.06063803285360336 Loss at step 50: 0.0424574613571167 Loss at step 100: 0.041625138372182846 Loss at step 150: 0.034588783979415894 Loss at step 200: 0.043866440653800964 Loss at step 250: 0.045798640698194504 Loss at step 300: 0.03853832557797432 Loss at step 350: 0.06382334977388382 Loss at step 400: 0.04653134569525719 Loss at step 450: 0.041496675461530685 Loss at step 500: 0.04409236088395119 Loss at step 550: 0.04770370200276375 Loss at step 600: 0.0462186373770237 Loss at step 650: 0.04067652300000191 Loss at step 700: 0.054041434079408646 Loss at step 750: 0.05379099026322365 Loss at step 800: 0.03475543111562729 Loss at step 850: 0.043173231184482574 Loss at step 900: 0.0550682470202446 Mean training loss after epoch 90: 0.04664755975013412 EPOCH: 91 Loss at step 0: 0.04082764685153961 Loss at step 50: 0.04821721464395523 Loss at step 100: 0.044878121465444565 Loss at step 150: 0.04182320460677147 Loss at step 200: 0.05699346214532852 Loss at step 250: 0.046205148100852966 Loss at step 300: 0.03751571103930473 Loss at step 350: 0.041176483035087585 Loss at step 400: 0.045942485332489014 Loss at step 450: 0.0438523031771183 Loss at step 500: 0.04302748292684555 Loss at step 550: 0.043287452310323715 Loss at step 600: 0.06685381382703781 Loss at step 650: 0.052904341369867325 Loss at step 700: 0.0414947010576725 Loss at step 750: 0.04428581893444061 Loss at step 800: 0.0412389300763607 Loss at step 850: 0.045413609594106674 Loss at step 900: 0.038029517978429794 Mean training loss after epoch 91: 0.04596302945460719 EPOCH: 92 Loss at step 0: 0.05232563614845276 Loss at step 50: 0.04430554807186127 Loss at step 100: 0.04119739681482315 Loss at step 150: 0.047370247542858124 Loss at step 200: 0.03939657285809517 Loss at step 250: 0.039717331528663635 Loss at step 300: 0.04332708567380905 Loss at step 350: 0.07481904327869415 Loss at step 400: 0.05585939809679985 Loss at step 450: 0.034339748322963715 Loss at step 500: 0.0591573603451252 Loss at step 550: 0.056375112384557724 Loss at step 600: 0.04369417950510979 Loss at step 650: 0.03970782458782196 Loss at step 700: 0.059591539204120636 Loss at step 750: 0.038719359785318375 Loss at step 800: 0.0540897510945797 Loss at step 850: 0.061056412756443024 Loss at step 900: 0.042261555790901184 Mean training loss after epoch 92: 0.04576610633011248 EPOCH: 93 Loss at step 0: 0.045835018157958984 Loss at step 50: 0.0443243682384491 Loss at step 100: 0.03608640655875206 Loss at step 150: 0.0347212478518486 Loss at step 200: 0.03769459202885628 Loss at step 250: 0.04713582992553711 Loss at step 300: 0.04511668533086777 Loss at step 350: 0.058783695101737976 Loss at step 400: 0.03647502139210701 Loss at step 450: 0.04561181366443634 Loss at step 500: 0.057211704552173615 Loss at step 550: 0.03964332863688469 Loss at step 600: 0.04506583884358406 Loss at step 650: 0.03565307334065437 Loss at step 700: 0.05165249481797218 Loss at step 750: 0.04046669229865074 Loss at step 800: 0.04181963950395584 Loss at step 850: 0.04751787334680557 Loss at step 900: 0.07055795192718506 Mean training loss after epoch 93: 0.04603372933442341 EPOCH: 94 Loss at step 0: 0.03797959163784981 Loss at step 50: 0.05107312276959419 Loss at step 100: 0.04748326912522316 Loss at step 150: 0.053900811821222305 Loss at step 200: 0.03657055273652077 Loss at step 250: 0.0494106151163578 Loss at step 300: 0.06723731011152267 Loss at step 350: 0.03884919360280037 Loss at step 400: 0.03814617916941643 Loss at step 450: 0.05970653146505356 Loss at step 500: 0.03925241157412529 Loss at step 550: 0.043273478746414185 Loss at step 600: 0.036642882972955704 Loss at step 650: 0.03756127133965492 Loss at step 700: 0.07361476868391037 Loss at step 750: 0.05761375278234482 Loss at step 800: 0.04954205080866814 Loss at step 850: 0.04357149079442024 Loss at step 900: 0.03794253244996071 Mean training loss after epoch 94: 0.04602277179016297 EPOCH: 95 Loss at step 0: 0.05163072049617767 Loss at step 50: 0.0443747378885746 Loss at step 100: 0.061171650886535645 Loss at step 150: 0.033554136753082275 Loss at step 200: 0.04577706754207611 Loss at step 250: 0.039306074380874634 Loss at step 300: 0.039763566106557846 Loss at step 350: 0.046032119542360306 Loss at step 400: 0.04351012781262398 Loss at step 450: 0.05555079132318497 Loss at step 500: 0.039365384727716446 Loss at step 550: 0.03652270510792732 Loss at step 600: 0.041159410029649734 Loss at step 650: 0.03522713854908943 Loss at step 700: 0.05416010320186615 Loss at step 750: 0.034960631281137466 Loss at step 800: 0.04422462731599808 Loss at step 850: 0.07029815763235092 Loss at step 900: 0.044007573276758194 Mean training loss after epoch 95: 0.04606622522240127 EPOCH: 96 Loss at step 0: 0.08471716195344925 Loss at step 50: 0.04083536937832832 Loss at step 100: 0.044068560004234314 Loss at step 150: 0.03515639156103134 Loss at step 200: 0.04729173704981804 Loss at step 250: 0.040371645241975784 Loss at step 300: 0.04628555476665497 Loss at step 350: 0.06131986528635025 Loss at step 400: 0.04641334339976311 Loss at step 450: 0.05580970644950867 Loss at step 500: 0.0496378168463707 Loss at step 550: 0.05687294900417328 Loss at step 600: 0.06893197447061539 Loss at step 650: 0.04078323021531105 Loss at step 700: 0.051332298666238785 Loss at step 750: 0.0398741215467453 Loss at step 800: 0.04457125440239906 Loss at step 850: 0.03423371538519859 Loss at step 900: 0.034485943615436554 Mean training loss after epoch 96: 0.04666423618293075 EPOCH: 97 Loss at step 0: 0.03193143755197525 Loss at step 50: 0.06092342361807823 Loss at step 100: 0.0350760892033577 Loss at step 150: 0.03877177834510803 Loss at step 200: 0.05231152102351189 Loss at step 250: 0.03888712078332901 Loss at step 300: 0.040509182959795 Loss at step 350: 0.039845727384090424 Loss at step 400: 0.05378155782818794 Loss at step 450: 0.039821185171604156 Loss at step 500: 0.06873445212841034 Loss at step 550: 0.04370138421654701 Loss at step 600: 0.05771337077021599 Loss at step 650: 0.041599780321121216 Loss at step 700: 0.041043348610401154 Loss at step 750: 0.04434209689497948 Loss at step 800: 0.04462100937962532 Loss at step 850: 0.03976351022720337 Loss at step 900: 0.044081054627895355 Mean training loss after epoch 97: 0.04593602270046785 EPOCH: 98 Loss at step 0: 0.0342780239880085 Loss at step 50: 0.04241218417882919 Loss at step 100: 0.04206021875143051 Loss at step 150: 0.07047586143016815 Loss at step 200: 0.06636768579483032 Loss at step 250: 0.04468633234500885 Loss at step 300: 0.04188190773129463 Loss at step 350: 0.06004686281085014 Loss at step 400: 0.05557779222726822 Loss at step 450: 0.04578510671854019 Loss at step 500: 0.04510294646024704 Loss at step 550: 0.03823684900999069 Loss at step 600: 0.04543498903512955 Loss at step 650: 0.05409776419401169 Loss at step 700: 0.036201052367687225 Loss at step 750: 0.03554457798600197 Loss at step 800: 0.03902563825249672 Loss at step 850: 0.051609884947538376 Loss at step 900: 0.03746648505330086 Mean training loss after epoch 98: 0.04613876398771938 EPOCH: 99 Loss at step 0: 0.03615928441286087 Loss at step 50: 0.040959153324365616 Loss at step 100: 0.034678857773542404 Loss at step 150: 0.045152079313993454 Loss at step 200: 0.05716494098305702 Loss at step 250: 0.05873488634824753 Loss at step 300: 0.05778392776846886 Loss at step 350: 0.03713604435324669 Loss at step 400: 0.04261068254709244 Loss at step 450: 0.06368521600961685 Loss at step 500: 0.05692901834845543 Loss at step 550: 0.054897211492061615 Loss at step 600: 0.04783918336033821 Loss at step 650: 0.04229700192809105 Loss at step 700: 0.03940588980913162 Loss at step 750: 0.03988191857933998 Loss at step 800: 0.05298018455505371 Loss at step 850: 0.0444318912923336 Loss at step 900: 0.05263909325003624 Mean training loss after epoch 99: 0.04553167308881275 EPOCH: 100 Loss at step 0: 0.03345146402716637 Loss at step 50: 0.04001449793577194 Loss at step 100: 0.041478097438812256 Loss at step 150: 0.035443346947431564 Loss at step 200: 0.04292285442352295 Loss at step 250: 0.056641362607479095 Loss at step 300: 0.04687800630927086 Loss at step 350: 0.052689146250486374 Loss at step 400: 0.05926014482975006 Loss at step 450: 0.06873384863138199 Loss at step 500: 0.04350966215133667 Loss at step 550: 0.05276675522327423 Loss at step 600: 0.03778418153524399 Loss at step 650: 0.041283462196588516 Loss at step 700: 0.04496544227004051 Loss at step 750: 0.04064444079995155 Loss at step 800: 0.0392770990729332 Loss at step 850: 0.06321597099304199 Loss at step 900: 0.04072505608201027 Mean training loss after epoch 100: 0.04593738991377959 EPOCH: 101 Loss at step 0: 0.06284533441066742 Loss at step 50: 0.037693191319704056 Loss at step 100: 0.03766042739152908 Loss at step 150: 0.0374579094350338 Loss at step 200: 0.059883106499910355 Loss at step 250: 0.03248827904462814 Loss at step 300: 0.05771387368440628 Loss at step 350: 0.032919157296419144 Loss at step 400: 0.04494740068912506 Loss at step 450: 0.03686508908867836 Loss at step 500: 0.05213222652673721 Loss at step 550: 0.035883303731679916 Loss at step 600: 0.04146915301680565 Loss at step 650: 0.057481247931718826 Loss at step 700: 0.04926550015807152 Loss at step 750: 0.0644112080335617 Loss at step 800: 0.03733290731906891 Loss at step 850: 0.03840302303433418 Loss at step 900: 0.03663436323404312 Mean training loss after epoch 101: 0.045610774655951494 EPOCH: 102 Loss at step 0: 0.03266073763370514 Loss at step 50: 0.05712069198489189 Loss at step 100: 0.03992948308587074 Loss at step 150: 0.05871712416410446 Loss at step 200: 0.05021010339260101 Loss at step 250: 0.03915297985076904 Loss at step 300: 0.05636504292488098 Loss at step 350: 0.05920378863811493 Loss at step 400: 0.051703859120607376 Loss at step 450: 0.05553581193089485 Loss at step 500: 0.04331844672560692 Loss at step 550: 0.03778453171253204 Loss at step 600: 0.03466461971402168 Loss at step 650: 0.05443223565816879 Loss at step 700: 0.038912177085876465 Loss at step 750: 0.03801083564758301 Loss at step 800: 0.053212933242321014 Loss at step 850: 0.04365288466215134 Loss at step 900: 0.039011143147945404 Mean training loss after epoch 102: 0.04544403986024386 EPOCH: 103 Loss at step 0: 0.03836321830749512 Loss at step 50: 0.03450578823685646 Loss at step 100: 0.043043240904808044 Loss at step 150: 0.037821900099515915 Loss at step 200: 0.035670142620801926 Loss at step 250: 0.05436832830309868 Loss at step 300: 0.05005837231874466 Loss at step 350: 0.03833141177892685 Loss at step 400: 0.04092288017272949 Loss at step 450: 0.04235215485095978 Loss at step 500: 0.03582450747489929 Loss at step 550: 0.05497898906469345 Loss at step 600: 0.05159313231706619 Loss at step 650: 0.043687283992767334 Loss at step 700: 0.03675663471221924 Loss at step 750: 0.03909031301736832 Loss at step 800: 0.058008261024951935 Loss at step 850: 0.06922484934329987 Loss at step 900: 0.039749786257743835 Mean training loss after epoch 103: 0.045992981815246 EPOCH: 104 Loss at step 0: 0.045963071286678314 Loss at step 50: 0.04501866549253464 Loss at step 100: 0.036801066249608994 Loss at step 150: 0.04234980419278145 Loss at step 200: 0.03912907466292381 Loss at step 250: 0.03137866035103798 Loss at step 300: 0.06587888300418854 Loss at step 350: 0.05509747564792633 Loss at step 400: 0.034431394189596176 Loss at step 450: 0.04458971321582794 Loss at step 500: 0.04710601642727852 Loss at step 550: 0.035531047731637955 Loss at step 600: 0.03829113021492958 Loss at step 650: 0.04536285996437073 Loss at step 700: 0.03908412531018257 Loss at step 750: 0.03482560068368912 Loss at step 800: 0.05889918655157089 Loss at step 850: 0.03735806792974472 Loss at step 900: 0.038946483284235 Mean training loss after epoch 104: 0.046232971247595386 EPOCH: 105 Loss at step 0: 0.03802655264735222 Loss at step 50: 0.043217793107032776 Loss at step 100: 0.040620628744363785 Loss at step 150: 0.03952966257929802 Loss at step 200: 0.044269394129514694 Loss at step 250: 0.03849523887038231 Loss at step 300: 0.04610207676887512 Loss at step 350: 0.05798592418432236 Loss at step 400: 0.062006112188100815 Loss at step 450: 0.05731591209769249 Loss at step 500: 0.04216567799448967 Loss at step 550: 0.0736042782664299 Loss at step 600: 0.039417725056409836 Loss at step 650: 0.06214912608265877 Loss at step 700: 0.04436998441815376 Loss at step 750: 0.036212705075740814 Loss at step 800: 0.05886728689074516 Loss at step 850: 0.05092043802142143 Loss at step 900: 0.05248633772134781 Mean training loss after epoch 105: 0.045883986916242125 EPOCH: 106 Loss at step 0: 0.04092259332537651 Loss at step 50: 0.052326418459415436 Loss at step 100: 0.05783850699663162 Loss at step 150: 0.04059632495045662 Loss at step 200: 0.054954130202531815 Loss at step 250: 0.03884883224964142 Loss at step 300: 0.03604060038924217 Loss at step 350: 0.03297986835241318 Loss at step 400: 0.04052901268005371 Loss at step 450: 0.03716326132416725 Loss at step 500: 0.03729403391480446 Loss at step 550: 0.04082081839442253 Loss at step 600: 0.04013705253601074 Loss at step 650: 0.04336664453148842 Loss at step 700: 0.05716712400317192 Loss at step 750: 0.03719208016991615 Loss at step 800: 0.04664986953139305 Loss at step 850: 0.04502305015921593 Loss at step 900: 0.0402035191655159 Mean training loss after epoch 106: 0.04613978178627583 EPOCH: 107 Loss at step 0: 0.039730705320835114 Loss at step 50: 0.059027642011642456 Loss at step 100: 0.03385939449071884 Loss at step 150: 0.045767102390527725 Loss at step 200: 0.04241578280925751 Loss at step 250: 0.03641652315855026 Loss at step 300: 0.04466358944773674 Loss at step 350: 0.04171045869588852 Loss at step 400: 0.04439990222454071 Loss at step 450: 0.04007254168391228 Loss at step 500: 0.04224420338869095 Loss at step 550: 0.042757969349622726 Loss at step 600: 0.046736668795347214 Loss at step 650: 0.05228203907608986 Loss at step 700: 0.03738444298505783 Loss at step 750: 0.03584491088986397 Loss at step 800: 0.0339224711060524 Loss at step 850: 0.0401468351483345 Loss at step 900: 0.062193479388952255 Mean training loss after epoch 107: 0.04537088549864699 EPOCH: 108 Loss at step 0: 0.04594706371426582 Loss at step 50: 0.048390794545412064 Loss at step 100: 0.07521775364875793 Loss at step 150: 0.04278415068984032 Loss at step 200: 0.048521630465984344 Loss at step 250: 0.03704987093806267 Loss at step 300: 0.042436111718416214 Loss at step 350: 0.0418018214404583 Loss at step 400: 0.0416070856153965 Loss at step 450: 0.04196631908416748 Loss at step 500: 0.03803296759724617 Loss at step 550: 0.0375557504594326 Loss at step 600: 0.039449434727430344 Loss at step 650: 0.03768813610076904 Loss at step 700: 0.03774433955550194 Loss at step 750: 0.055847425013780594 Loss at step 800: 0.05350764840841293 Loss at step 850: 0.05881110951304436 Loss at step 900: 0.041136108338832855 Mean training loss after epoch 108: 0.046230212263048076 EPOCH: 109 Loss at step 0: 0.04331596940755844 Loss at step 50: 0.041769735515117645 Loss at step 100: 0.03577038645744324 Loss at step 150: 0.04017090052366257 Loss at step 200: 0.0379541777074337 Loss at step 250: 0.06268075853586197 Loss at step 300: 0.03878922387957573 Loss at step 350: 0.041492413729429245 Loss at step 400: 0.03876432403922081 Loss at step 450: 0.04306156560778618 Loss at step 500: 0.04325302690267563 Loss at step 550: 0.04203340783715248 Loss at step 600: 0.04345481097698212 Loss at step 650: 0.04283579811453819 Loss at step 700: 0.04968598484992981 Loss at step 750: 0.03872764855623245 Loss at step 800: 0.03822629526257515 Loss at step 850: 0.043384552001953125 Loss at step 900: 0.059493325650691986 Mean training loss after epoch 109: 0.04608502546782051 EPOCH: 110 Loss at step 0: 0.04138829559087753 Loss at step 50: 0.04273611679673195 Loss at step 100: 0.05357480049133301 Loss at step 150: 0.0397087000310421 Loss at step 200: 0.0569712370634079 Loss at step 250: 0.045684970915317535 Loss at step 300: 0.04746640473604202 Loss at step 350: 0.05276178568601608 Loss at step 400: 0.057444941252470016 Loss at step 450: 0.03861095756292343 Loss at step 500: 0.03831413388252258 Loss at step 550: 0.03973749652504921 Loss at step 600: 0.04364781826734543 Loss at step 650: 0.057610202580690384 Loss at step 700: 0.04891800135374069 Loss at step 750: 0.04174337536096573 Loss at step 800: 0.04448842257261276 Loss at step 850: 0.046235837042331696 Loss at step 900: 0.058497507125139236 Mean training loss after epoch 110: 0.04628308412672551 EPOCH: 111 Loss at step 0: 0.05321190506219864 Loss at step 50: 0.050442442297935486 Loss at step 100: 0.04806714504957199 Loss at step 150: 0.040195535868406296 Loss at step 200: 0.04016339033842087 Loss at step 250: 0.053371697664260864 Loss at step 300: 0.05192591995000839 Loss at step 350: 0.041987672448158264 Loss at step 400: 0.0349150188267231 Loss at step 450: 0.06134936213493347 Loss at step 500: 0.03099333867430687 Loss at step 550: 0.04208878427743912 Loss at step 600: 0.03302982822060585 Loss at step 650: 0.038587506860494614 Loss at step 700: 0.041607875376939774 Loss at step 750: 0.041626885533332825 Loss at step 800: 0.0488191694021225 Loss at step 850: 0.041104208678007126 Loss at step 900: 0.047659970819950104 Mean training loss after epoch 111: 0.045597349766141444 EPOCH: 112 Loss at step 0: 0.06383563578128815 Loss at step 50: 0.04436638951301575 Loss at step 100: 0.06976839900016785 Loss at step 150: 0.03989202529191971 Loss at step 200: 0.043199654668569565 Loss at step 250: 0.041085466742515564 Loss at step 300: 0.037663910537958145 Loss at step 350: 0.03954296186566353 Loss at step 400: 0.043034475296735764 Loss at step 450: 0.036268774420022964 Loss at step 500: 0.047949761152267456 Loss at step 550: 0.04154909774661064 Loss at step 600: 0.05549217760562897 Loss at step 650: 0.048869624733924866 Loss at step 700: 0.039821796119213104 Loss at step 750: 0.03410565108060837 Loss at step 800: 0.04710477218031883 Loss at step 850: 0.04484633356332779 Loss at step 900: 0.04346233233809471 Mean training loss after epoch 112: 0.0453420280000326 EPOCH: 113 Loss at step 0: 0.07626193016767502 Loss at step 50: 0.0415835902094841 Loss at step 100: 0.045651957392692566 Loss at step 150: 0.0380973145365715 Loss at step 200: 0.03246360644698143 Loss at step 250: 0.033414047211408615 Loss at step 300: 0.055106669664382935 Loss at step 350: 0.05358094349503517 Loss at step 400: 0.04711718484759331 Loss at step 450: 0.04322366416454315 Loss at step 500: 0.040659353137016296 Loss at step 550: 0.03881227597594261 Loss at step 600: 0.055880963802337646 Loss at step 650: 0.03416518121957779 Loss at step 700: 0.053411707282066345 Loss at step 750: 0.05347869545221329 Loss at step 800: 0.04365544021129608 Loss at step 850: 0.03879184648394585 Loss at step 900: 0.03905371204018593 Mean training loss after epoch 113: 0.04619103259067418 EPOCH: 114 Loss at step 0: 0.06173821911215782 Loss at step 50: 0.044631946831941605 Loss at step 100: 0.03788907453417778 Loss at step 150: 0.040534861385822296 Loss at step 200: 0.060370057821273804 Loss at step 250: 0.06507488340139389 Loss at step 300: 0.03939327970147133 Loss at step 350: 0.03969823569059372 Loss at step 400: 0.03916063904762268 Loss at step 450: 0.038448918610811234 Loss at step 500: 0.038338176906108856 Loss at step 550: 0.036796946078538895 Loss at step 600: 0.05042095482349396 Loss at step 650: 0.04241005331277847 Loss at step 700: 0.03769886493682861 Loss at step 750: 0.03910194709897041 Loss at step 800: 0.04179013893008232 Loss at step 850: 0.040940720587968826 Loss at step 900: 0.05026422068476677 Mean training loss after epoch 114: 0.04623785242438316 EPOCH: 115 Loss at step 0: 0.04057649150490761 Loss at step 50: 0.041685063391923904 Loss at step 100: 0.04562162607908249 Loss at step 150: 0.042543817311525345 Loss at step 200: 0.052379410713911057 Loss at step 250: 0.03978654369711876 Loss at step 300: 0.03946053981781006 Loss at step 350: 0.035735804587602615 Loss at step 400: 0.040037207305431366 Loss at step 450: 0.04607934132218361 Loss at step 500: 0.03653736412525177 Loss at step 550: 0.04023975133895874 Loss at step 600: 0.06386446952819824 Loss at step 650: 0.07692032307386398 Loss at step 700: 0.0415915846824646 Loss at step 750: 0.04023095592856407 Loss at step 800: 0.06130256503820419 Loss at step 850: 0.036192335188388824 Loss at step 900: 0.07270359992980957 Mean training loss after epoch 115: 0.04626032051199408 EPOCH: 116 Loss at step 0: 0.04137524217367172 Loss at step 50: 0.05751585215330124 Loss at step 100: 0.040026068687438965 Loss at step 150: 0.0417439304292202 Loss at step 200: 0.03479935601353645 Loss at step 250: 0.04191172868013382 Loss at step 300: 0.041678331792354584 Loss at step 350: 0.0548802874982357 Loss at step 400: 0.03189847245812416 Loss at step 450: 0.043784238398075104 Loss at step 500: 0.04866683483123779 Loss at step 550: 0.04101727902889252 Loss at step 600: 0.03853071108460426 Loss at step 650: 0.03798483684659004 Loss at step 700: 0.04383529722690582 Loss at step 750: 0.04519215226173401 Loss at step 800: 0.03663557022809982 Loss at step 850: 0.03711738437414169 Loss at step 900: 0.04020150750875473 Mean training loss after epoch 116: 0.04539206308691995 EPOCH: 117 Loss at step 0: 0.04230361059308052 Loss at step 50: 0.03566981852054596 Loss at step 100: 0.05356087535619736 Loss at step 150: 0.037879131734371185 Loss at step 200: 0.040729232132434845 Loss at step 250: 0.04111870005726814 Loss at step 300: 0.035754069685935974 Loss at step 350: 0.0384984090924263 Loss at step 400: 0.0562249980866909 Loss at step 450: 0.05474273860454559 Loss at step 500: 0.05589917674660683 Loss at step 550: 0.04402327537536621 Loss at step 600: 0.03898777440190315 Loss at step 650: 0.042293012142181396 Loss at step 700: 0.0343497097492218 Loss at step 750: 0.040538374334573746 Loss at step 800: 0.0498085580766201 Loss at step 850: 0.0443001352250576 Loss at step 900: 0.051811911165714264 Mean training loss after epoch 117: 0.045026219721986796 EPOCH: 118 Loss at step 0: 0.049354154616594315 Loss at step 50: 0.04046230390667915 Loss at step 100: 0.042247429490089417 Loss at step 150: 0.07154253125190735 Loss at step 200: 0.04717350751161575 Loss at step 250: 0.03412480652332306 Loss at step 300: 0.05966733396053314 Loss at step 350: 0.041864052414894104 Loss at step 400: 0.03680933639407158 Loss at step 450: 0.04304802417755127 Loss at step 500: 0.050058793276548386 Loss at step 550: 0.04052836820483208 Loss at step 600: 0.05878746509552002 Loss at step 650: 0.04671039432287216 Loss at step 700: 0.04113098978996277 Loss at step 750: 0.042497146874666214 Loss at step 800: 0.041556790471076965 Loss at step 850: 0.0468997061252594 Loss at step 900: 0.042844533920288086 Mean training loss after epoch 118: 0.04565491621046941 EPOCH: 119 Loss at step 0: 0.03629811853170395 Loss at step 50: 0.039753109216690063 Loss at step 100: 0.04494662955403328 Loss at step 150: 0.04514935612678528 Loss at step 200: 0.054180391132831573 Loss at step 250: 0.03876584768295288 Loss at step 300: 0.036581311374902725 Loss at step 350: 0.045561112463474274 Loss at step 400: 0.06118316948413849 Loss at step 450: 0.05746641382575035 Loss at step 500: 0.0374116487801075 Loss at step 550: 0.04476442188024521 Loss at step 600: 0.04973207041621208 Loss at step 650: 0.04077382758259773 Loss at step 700: 0.044274888932704926 Loss at step 750: 0.05092695727944374 Loss at step 800: 0.03972302004694939 Loss at step 850: 0.04418878257274628 Loss at step 900: 0.03748190402984619 Mean training loss after epoch 119: 0.04595040716429446 EPOCH: 120 Loss at step 0: 0.04838099330663681 Loss at step 50: 0.04279475286602974 Loss at step 100: 0.04066827893257141 Loss at step 150: 0.05239912495017052 Loss at step 200: 0.10351400822401047 Loss at step 250: 0.04161537438631058 Loss at step 300: 0.04149562492966652 Loss at step 350: 0.05071291700005531 Loss at step 400: 0.03960766643285751 Loss at step 450: 0.0719488337635994 Loss at step 500: 0.07051458954811096 Loss at step 550: 0.04016199707984924 Loss at step 600: 0.050184909254312515 Loss at step 650: 0.0554337278008461 Loss at step 700: 0.03850797191262245 Loss at step 750: 0.059062663465738297 Loss at step 800: 0.05229046940803528 Loss at step 850: 0.037738896906375885 Loss at step 900: 0.07455455511808395 Mean training loss after epoch 120: 0.045788908496435514 EPOCH: 121 Loss at step 0: 0.0605679526925087 Loss at step 50: 0.05179459601640701 Loss at step 100: 0.04193218797445297 Loss at step 150: 0.039966851472854614 Loss at step 200: 0.0580020509660244 Loss at step 250: 0.039037011563777924 Loss at step 300: 0.05752614885568619 Loss at step 350: 0.03489096090197563 Loss at step 400: 0.042048584669828415 Loss at step 450: 0.04997346177697182 Loss at step 500: 0.04731182008981705 Loss at step 550: 0.05156144127249718 Loss at step 600: 0.05521991848945618 Loss at step 650: 0.04086447134613991 Loss at step 700: 0.04651058465242386 Loss at step 750: 0.051097799092531204 Loss at step 800: 0.040208932012319565 Loss at step 850: 0.042645104229450226 Loss at step 900: 0.04390297830104828 Mean training loss after epoch 121: 0.04601577716841817 EPOCH: 122 Loss at step 0: 0.043923601508140564 Loss at step 50: 0.03714654594659805 Loss at step 100: 0.03844394534826279 Loss at step 150: 0.05471894517540932 Loss at step 200: 0.07179094105958939 Loss at step 250: 0.033937808126211166 Loss at step 300: 0.07009420543909073 Loss at step 350: 0.04137634113430977 Loss at step 400: 0.03541381284594536 Loss at step 450: 0.03873599320650101 Loss at step 500: 0.0487188883125782 Loss at step 550: 0.03586108237504959 Loss at step 600: 0.036098774522542953 Loss at step 650: 0.05510465428233147 Loss at step 700: 0.041482146829366684 Loss at step 750: 0.049225449562072754 Loss at step 800: 0.043085575103759766 Loss at step 850: 0.03696538507938385 Loss at step 900: 0.0431535467505455 Mean training loss after epoch 122: 0.04582549276541291 EPOCH: 123 Loss at step 0: 0.04815653711557388 Loss at step 50: 0.04370001330971718 Loss at step 100: 0.03439382091164589 Loss at step 150: 0.08422527462244034 Loss at step 200: 0.038499169051647186 Loss at step 250: 0.04276833310723305 Loss at step 300: 0.03476071357727051 Loss at step 350: 0.04122304171323776 Loss at step 400: 0.035837721079587936 Loss at step 450: 0.05923902615904808 Loss at step 500: 0.04063846915960312 Loss at step 550: 0.042982183396816254 Loss at step 600: 0.03486408293247223 Loss at step 650: 0.05115692317485809 Loss at step 700: 0.058140404522418976 Loss at step 750: 0.04429341107606888 Loss at step 800: 0.055187735706567764 Loss at step 850: 0.04702796787023544 Loss at step 900: 0.05210493877530098 Mean training loss after epoch 123: 0.04538784502371987 EPOCH: 124 Loss at step 0: 0.06142456457018852 Loss at step 50: 0.040733739733695984 Loss at step 100: 0.04394620284438133 Loss at step 150: 0.03876916691660881 Loss at step 200: 0.038998156785964966 Loss at step 250: 0.07218591123819351 Loss at step 300: 0.03354524448513985 Loss at step 350: 0.04375628009438515 Loss at step 400: 0.04026757925748825 Loss at step 450: 0.04343396797776222 Loss at step 500: 0.05305086821317673 Loss at step 550: 0.045792609453201294 Loss at step 600: 0.04338764026761055 Loss at step 650: 0.05779615417122841 Loss at step 700: 0.03826998919248581 Loss at step 750: 0.04057617858052254 Loss at step 800: 0.039986502379179 Loss at step 850: 0.053070370107889175 Loss at step 900: 0.059538912028074265 Mean training loss after epoch 124: 0.04550087457097797 EPOCH: 125 Loss at step 0: 0.04200837016105652 Loss at step 50: 0.03761272504925728 Loss at step 100: 0.03739985451102257 Loss at step 150: 0.06605248153209686 Loss at step 200: 0.044975295662879944 Loss at step 250: 0.04247662425041199 Loss at step 300: 0.04585328325629234 Loss at step 350: 0.042718205600976944 Loss at step 400: 0.057040903717279434 Loss at step 450: 0.038020189851522446 Loss at step 500: 0.07280875742435455 Loss at step 550: 0.04592399299144745 Loss at step 600: 0.05157145485281944 Loss at step 650: 0.03785568103194237 Loss at step 700: 0.04340960457921028 Loss at step 750: 0.03745993599295616 Loss at step 800: 0.055159494280815125 Loss at step 850: 0.0517415776848793 Loss at step 900: 0.040050674229860306 Mean training loss after epoch 125: 0.045501244474829894 EPOCH: 126 Loss at step 0: 0.04763704910874367 Loss at step 50: 0.039280299097299576 Loss at step 100: 0.045587651431560516 Loss at step 150: 0.047382839024066925 Loss at step 200: 0.035892464220523834 Loss at step 250: 0.03947671130299568 Loss at step 300: 0.04084878787398338 Loss at step 350: 0.04034554213285446 Loss at step 400: 0.03943360596895218 Loss at step 450: 0.036898188292980194 Loss at step 500: 0.055732179433107376 Loss at step 550: 0.038164325058460236 Loss at step 600: 0.03902953863143921 Loss at step 650: 0.04754732921719551 Loss at step 700: 0.045771315693855286 Loss at step 750: 0.03638895973563194 Loss at step 800: 0.03966190665960312 Loss at step 850: 0.07703128457069397 Loss at step 900: 0.04331199452280998 Mean training loss after epoch 126: 0.04557279546433357 EPOCH: 127 Loss at step 0: 0.059333737939596176 Loss at step 50: 0.06586962938308716 Loss at step 100: 0.05987364798784256 Loss at step 150: 0.03762233257293701 Loss at step 200: 0.05424175038933754 Loss at step 250: 0.040750131011009216 Loss at step 300: 0.040018774569034576 Loss at step 350: 0.05387149378657341 Loss at step 400: 0.03840246796607971 Loss at step 450: 0.05691347271203995 Loss at step 500: 0.03817612677812576 Loss at step 550: 0.057353537529706955 Loss at step 600: 0.04899299889802933 Loss at step 650: 0.03684781491756439 Loss at step 700: 0.039131779223680496 Loss at step 750: 0.04168980196118355 Loss at step 800: 0.038554597645998 Loss at step 850: 0.07030532509088516 Loss at step 900: 0.04318450018763542 Mean training loss after epoch 127: 0.04599316375079884 EPOCH: 128 Loss at step 0: 0.0466042198240757 Loss at step 50: 0.07072354853153229 Loss at step 100: 0.041617561131715775 Loss at step 150: 0.05082223564386368 Loss at step 200: 0.0592975988984108 Loss at step 250: 0.03287239372730255 Loss at step 300: 0.04346143826842308 Loss at step 350: 0.03603116422891617 Loss at step 400: 0.038364943116903305 Loss at step 450: 0.04101430997252464 Loss at step 500: 0.06150265038013458 Loss at step 550: 0.04119909927248955 Loss at step 600: 0.03752235695719719 Loss at step 650: 0.045760560780763626 Loss at step 700: 0.04026544839143753 Loss at step 750: 0.04553986340761185 Loss at step 800: 0.03764986991882324 Loss at step 850: 0.0483270101249218 Loss at step 900: 0.04005664959549904 Mean training loss after epoch 128: 0.045383623705855185 EPOCH: 129 Loss at step 0: 0.053651321679353714 Loss at step 50: 0.03917371854186058 Loss at step 100: 0.05599095672369003 Loss at step 150: 0.054929040372371674 Loss at step 200: 0.04243887960910797 Loss at step 250: 0.04254559054970741 Loss at step 300: 0.03672843426465988 Loss at step 350: 0.0511045940220356 Loss at step 400: 0.039361849427223206 Loss at step 450: 0.039907995611429214 Loss at step 500: 0.057698700577020645 Loss at step 550: 0.033956773579120636 Loss at step 600: 0.05067725479602814 Loss at step 650: 0.05590909719467163 Loss at step 700: 0.06066897511482239 Loss at step 750: 0.057069331407547 Loss at step 800: 0.0603121742606163 Loss at step 850: 0.04482825845479965 Loss at step 900: 0.05580492317676544 Mean training loss after epoch 129: 0.04554252279823078 EPOCH: 130 Loss at step 0: 0.04340419918298721 Loss at step 50: 0.038668155670166016 Loss at step 100: 0.04678479954600334 Loss at step 150: 0.06984972208738327 Loss at step 200: 0.04738502949476242 Loss at step 250: 0.03660202771425247 Loss at step 300: 0.07697250694036484 Loss at step 350: 0.04514691233634949 Loss at step 400: 0.03893262520432472 Loss at step 450: 0.05270220339298248 Loss at step 500: 0.05586271360516548 Loss at step 550: 0.03882136568427086 Loss at step 600: 0.039893507957458496 Loss at step 650: 0.04328221455216408 Loss at step 700: 0.03795141726732254 Loss at step 750: 0.038523118942976 Loss at step 800: 0.03727137669920921 Loss at step 850: 0.04158678650856018 Loss at step 900: 0.059664588421583176 Mean training loss after epoch 130: 0.0457480407035999 EPOCH: 131 Loss at step 0: 0.05638613924384117 Loss at step 50: 0.03839738667011261 Loss at step 100: 0.03876176476478577 Loss at step 150: 0.041901782155036926 Loss at step 200: 0.050253286957740784 Loss at step 250: 0.0528360977768898 Loss at step 300: 0.0518767386674881 Loss at step 350: 0.03999554365873337 Loss at step 400: 0.04001703858375549 Loss at step 450: 0.04216305539011955 Loss at step 500: 0.04270217940211296 Loss at step 550: 0.03471309691667557 Loss at step 600: 0.03751356154680252 Loss at step 650: 0.05595651641488075 Loss at step 700: 0.03887197747826576 Loss at step 750: 0.040457408875226974 Loss at step 800: 0.042652927339076996 Loss at step 850: 0.041036684066057205 Loss at step 900: 0.056002430617809296 Mean training loss after epoch 131: 0.04539558447119015 EPOCH: 132 Loss at step 0: 0.04207568243145943 Loss at step 50: 0.0566919781267643 Loss at step 100: 0.04351049289107323 Loss at step 150: 0.04230139032006264 Loss at step 200: 0.034749437123537064 Loss at step 250: 0.040480904281139374 Loss at step 300: 0.039265699684619904 Loss at step 350: 0.04376889020204544 Loss at step 400: 0.05784851685166359 Loss at step 450: 0.05143562704324722 Loss at step 500: 0.04562538489699364 Loss at step 550: 0.07069180905818939 Loss at step 600: 0.041919611394405365 Loss at step 650: 0.05814877524971962 Loss at step 700: 0.03926195204257965 Loss at step 750: 0.03882830590009689 Loss at step 800: 0.04184386134147644 Loss at step 850: 0.037850987166166306 Loss at step 900: 0.04386572539806366 Mean training loss after epoch 132: 0.04502781874724606 EPOCH: 133 Loss at step 0: 0.041981443762779236 Loss at step 50: 0.044557683169841766 Loss at step 100: 0.0405050627887249 Loss at step 150: 0.04432527348399162 Loss at step 200: 0.0363587811589241 Loss at step 250: 0.05288492888212204 Loss at step 300: 0.03484547138214111 Loss at step 350: 0.05178239941596985 Loss at step 400: 0.06321636587381363 Loss at step 450: 0.05097784101963043 Loss at step 500: 0.051171835511922836 Loss at step 550: 0.03884849697351456 Loss at step 600: 0.042196448892354965 Loss at step 650: 0.05368613451719284 Loss at step 700: 0.03731679916381836 Loss at step 750: 0.040016379207372665 Loss at step 800: 0.06686176359653473 Loss at step 850: 0.04520504176616669 Loss at step 900: 0.04722175374627113 Mean training loss after epoch 133: 0.0453269995872114 EPOCH: 134 Loss at step 0: 0.055279213935136795 Loss at step 50: 0.040013182908296585 Loss at step 100: 0.04826372489333153 Loss at step 150: 0.042962219566106796 Loss at step 200: 0.05188482627272606 Loss at step 250: 0.056876007467508316 Loss at step 300: 0.03821921721100807 Loss at step 350: 0.04275687783956528 Loss at step 400: 0.035218246281147 Loss at step 450: 0.03679642826318741 Loss at step 500: 0.05798317864537239 Loss at step 550: 0.053573962301015854 Loss at step 600: 0.034900959581136703 Loss at step 650: 0.04363512247800827 Loss at step 700: 0.035073887556791306 Loss at step 750: 0.07189979404211044 Loss at step 800: 0.03670172765851021 Loss at step 850: 0.0559610053896904 Loss at step 900: 0.03790052607655525 Mean training loss after epoch 134: 0.045974428664201865 EPOCH: 135 Loss at step 0: 0.048522092401981354 Loss at step 50: 0.03976353257894516 Loss at step 100: 0.03863305225968361 Loss at step 150: 0.054090000689029694 Loss at step 200: 0.04041542857885361 Loss at step 250: 0.041321899741888046 Loss at step 300: 0.04464292526245117 Loss at step 350: 0.03554973751306534 Loss at step 400: 0.06174508109688759 Loss at step 450: 0.03710295632481575 Loss at step 500: 0.03672080859541893 Loss at step 550: 0.038844525814056396 Loss at step 600: 0.041508644819259644 Loss at step 650: 0.05704212933778763 Loss at step 700: 0.04607998579740524 Loss at step 750: 0.03549562767148018 Loss at step 800: 0.05155307427048683 Loss at step 850: 0.03882014378905296 Loss at step 900: 0.03972895070910454 Mean training loss after epoch 135: 0.04565108851043146 EPOCH: 136 Loss at step 0: 0.05280226469039917 Loss at step 50: 0.039051275700330734 Loss at step 100: 0.043916817754507065 Loss at step 150: 0.0367436483502388 Loss at step 200: 0.05519120395183563 Loss at step 250: 0.059185415506362915 Loss at step 300: 0.038331348448991776 Loss at step 350: 0.03815161809325218 Loss at step 400: 0.046509694308042526 Loss at step 450: 0.05401482805609703 Loss at step 500: 0.05526139587163925 Loss at step 550: 0.036430906504392624 Loss at step 600: 0.03996090218424797 Loss at step 650: 0.0424160473048687 Loss at step 700: 0.04245726019144058 Loss at step 750: 0.04163707047700882 Loss at step 800: 0.04257072135806084 Loss at step 850: 0.03982752189040184 Loss at step 900: 0.0390545018017292 Mean training loss after epoch 136: 0.04513865826464792 EPOCH: 137 Loss at step 0: 0.041630249470472336 Loss at step 50: 0.03990495204925537 Loss at step 100: 0.035573575645685196 Loss at step 150: 0.04011499136686325 Loss at step 200: 0.048975251615047455 Loss at step 250: 0.04927099868655205 Loss at step 300: 0.052106961607933044 Loss at step 350: 0.04262035712599754 Loss at step 400: 0.04712356626987457 Loss at step 450: 0.03765459358692169 Loss at step 500: 0.039250459522008896 Loss at step 550: 0.05373281612992287 Loss at step 600: 0.03461955860257149 Loss at step 650: 0.03782762587070465 Loss at step 700: 0.038719866424798965 Loss at step 750: 0.04533890262246132 Loss at step 800: 0.04096260666847229 Loss at step 850: 0.05749823525547981 Loss at step 900: 0.03692217916250229 Mean training loss after epoch 137: 0.04508865276959215 EPOCH: 138 Loss at step 0: 0.06945963948965073 Loss at step 50: 0.040057942271232605 Loss at step 100: 0.047803498804569244 Loss at step 150: 0.03815947100520134 Loss at step 200: 0.038313817232847214 Loss at step 250: 0.03740762919187546 Loss at step 300: 0.03993337228894234 Loss at step 350: 0.04766089841723442 Loss at step 400: 0.04580902308225632 Loss at step 450: 0.0398416630923748 Loss at step 500: 0.057695355266332626 Loss at step 550: 0.05464360862970352 Loss at step 600: 0.04387955740094185 Loss at step 650: 0.044662464410066605 Loss at step 700: 0.05483702942728996 Loss at step 750: 0.042811185121536255 Loss at step 800: 0.05621471256017685 Loss at step 850: 0.0402277335524559 Loss at step 900: 0.03834249824285507 Mean training loss after epoch 138: 0.045429797114323836 EPOCH: 139 Loss at step 0: 0.043225619941949844 Loss at step 50: 0.054096516221761703 Loss at step 100: 0.0365842804312706 Loss at step 150: 0.040682353079319 Loss at step 200: 0.059067703783512115 Loss at step 250: 0.05527551844716072 Loss at step 300: 0.04655776545405388 Loss at step 350: 0.0397433266043663 Loss at step 400: 0.04133676737546921 Loss at step 450: 0.046193573623895645 Loss at step 500: 0.04525919258594513 Loss at step 550: 0.04293420910835266 Loss at step 600: 0.050167959183454514 Loss at step 650: 0.04329539090394974 Loss at step 700: 0.0405573733150959 Loss at step 750: 0.033732008188962936 Loss at step 800: 0.04616639018058777 Loss at step 850: 0.041459355503320694 Loss at step 900: 0.03283211216330528 Mean training loss after epoch 139: 0.04565253498743592 EPOCH: 140 Loss at step 0: 0.05933477357029915 Loss at step 50: 0.04480671137571335 Loss at step 100: 0.043455131351947784 Loss at step 150: 0.04014158248901367 Loss at step 200: 0.03925960883498192 Loss at step 250: 0.05922736972570419 Loss at step 300: 0.038889359682798386 Loss at step 350: 0.038517266511917114 Loss at step 400: 0.05189181864261627 Loss at step 450: 0.053639013320207596 Loss at step 500: 0.043128762394189835 Loss at step 550: 0.05406291410326958 Loss at step 600: 0.05133901163935661 Loss at step 650: 0.05123307183384895 Loss at step 700: 0.04091839864850044 Loss at step 750: 0.03644350916147232 Loss at step 800: 0.04408593103289604 Loss at step 850: 0.04143134504556656 Loss at step 900: 0.03982198238372803 Mean training loss after epoch 140: 0.045726284063033966 EPOCH: 141 Loss at step 0: 0.07643990218639374 Loss at step 50: 0.042827509343624115 Loss at step 100: 0.03729494288563728 Loss at step 150: 0.04267359524965286 Loss at step 200: 0.0526069775223732 Loss at step 250: 0.03965418040752411 Loss at step 300: 0.03965660557150841 Loss at step 350: 0.05664416775107384 Loss at step 400: 0.03898712620139122 Loss at step 450: 0.05071273073554039 Loss at step 500: 0.039970893412828445 Loss at step 550: 0.054535917937755585 Loss at step 600: 0.058682337403297424 Loss at step 650: 0.0395549051463604 Loss at step 700: 0.03768622875213623 Loss at step 750: 0.04078089818358421 Loss at step 800: 0.04222925379872322 Loss at step 850: 0.09229809045791626 Loss at step 900: 0.04109574109315872 Mean training loss after epoch 141: 0.04526171847375662 EPOCH: 142 Loss at step 0: 0.06706027686595917 Loss at step 50: 0.04288400337100029 Loss at step 100: 0.0419747419655323 Loss at step 150: 0.04507150501012802 Loss at step 200: 0.03391042351722717 Loss at step 250: 0.04253320395946503 Loss at step 300: 0.04334145039319992 Loss at step 350: 0.04149409011006355 Loss at step 400: 0.03677082434296608 Loss at step 450: 0.04003981500864029 Loss at step 500: 0.043348681181669235 Loss at step 550: 0.06000783294439316 Loss at step 600: 0.06905491650104523 Loss at step 650: 0.03770330548286438 Loss at step 700: 0.04500021040439606 Loss at step 750: 0.04407598823308945 Loss at step 800: 0.05708540603518486 Loss at step 850: 0.03979479521512985 Loss at step 900: 0.052334755659103394 Mean training loss after epoch 142: 0.046000506639528246 EPOCH: 143 Loss at step 0: 0.03922393545508385 Loss at step 50: 0.07708627730607986 Loss at step 100: 0.04452862963080406 Loss at step 150: 0.04042590409517288 Loss at step 200: 0.04951493442058563 Loss at step 250: 0.039893023669719696 Loss at step 300: 0.052120305597782135 Loss at step 350: 0.036691006273031235 Loss at step 400: 0.04150749742984772 Loss at step 450: 0.06771457195281982 Loss at step 500: 0.05504632368683815 Loss at step 550: 0.05673797056078911 Loss at step 600: 0.0723184272646904 Loss at step 650: 0.05623732507228851 Loss at step 700: 0.033086147159338 Loss at step 750: 0.05448524281382561 Loss at step 800: 0.03592769056558609 Loss at step 850: 0.055780086666345596 Loss at step 900: 0.04763459041714668 Mean training loss after epoch 143: 0.04546483400375096 EPOCH: 144 Loss at step 0: 0.05236753076314926 Loss at step 50: 0.03856360912322998 Loss at step 100: 0.05341062694787979 Loss at step 150: 0.06998606026172638 Loss at step 200: 0.03969200700521469 Loss at step 250: 0.042679231613874435 Loss at step 300: 0.04525759816169739 Loss at step 350: 0.04260559380054474 Loss at step 400: 0.05723167210817337 Loss at step 450: 0.04439713805913925 Loss at step 500: 0.038396380841732025 Loss at step 550: 0.03396816924214363 Loss at step 600: 0.04347236827015877 Loss at step 650: 0.03931021690368652 Loss at step 700: 0.0402735136449337 Loss at step 750: 0.03282059729099274 Loss at step 800: 0.04266834259033203 Loss at step 850: 0.044804733246564865 Loss at step 900: 0.04567558690905571 Mean training loss after epoch 144: 0.04540096294246058 EPOCH: 145 Loss at step 0: 0.05981537699699402 Loss at step 50: 0.044027186930179596 Loss at step 100: 0.05124702304601669 Loss at step 150: 0.04432817921042442 Loss at step 200: 0.04773557931184769 Loss at step 250: 0.04199443385004997 Loss at step 300: 0.037892792373895645 Loss at step 350: 0.04425608366727829 Loss at step 400: 0.04401080310344696 Loss at step 450: 0.058971889317035675 Loss at step 500: 0.040399424731731415 Loss at step 550: 0.03519612178206444 Loss at step 600: 0.044670648872852325 Loss at step 650: 0.049837131053209305 Loss at step 700: 0.048635561019182205 Loss at step 750: 0.04474201798439026 Loss at step 800: 0.04917857423424721 Loss at step 850: 0.05089118331670761 Loss at step 900: 0.03693833947181702 Mean training loss after epoch 145: 0.04524336322578095 EPOCH: 146 Loss at step 0: 0.05509737506508827 Loss at step 50: 0.05514668673276901 Loss at step 100: 0.03898341953754425 Loss at step 150: 0.036114562302827835 Loss at step 200: 0.05524233356118202 Loss at step 250: 0.044069528579711914 Loss at step 300: 0.0578959621489048 Loss at step 350: 0.03757890686392784 Loss at step 400: 0.056066133081912994 Loss at step 450: 0.035422857850790024 Loss at step 500: 0.04168100655078888 Loss at step 550: 0.03915674611926079 Loss at step 600: 0.04030285030603409 Loss at step 650: 0.043299704790115356 Loss at step 700: 0.056268516927957535 Loss at step 750: 0.04520934820175171 Loss at step 800: 0.04452161490917206 Loss at step 850: 0.043545469641685486 Loss at step 900: 0.04091626778244972 Mean training loss after epoch 146: 0.04517968092908038 EPOCH: 147 Loss at step 0: 0.03717481717467308 Loss at step 50: 0.04272831231355667 Loss at step 100: 0.050952453166246414 Loss at step 150: 0.036012761294841766 Loss at step 200: 0.03929879516363144 Loss at step 250: 0.03773774206638336 Loss at step 300: 0.039233505725860596 Loss at step 350: 0.04369395971298218 Loss at step 400: 0.0398932583630085 Loss at step 450: 0.05194225162267685 Loss at step 500: 0.04488073289394379 Loss at step 550: 0.03971068561077118 Loss at step 600: 0.037073154002428055 Loss at step 650: 0.04490215703845024 Loss at step 700: 0.03524032235145569 Loss at step 750: 0.056882236152887344 Loss at step 800: 0.059731870889663696 Loss at step 850: 0.04750389978289604 Loss at step 900: 0.040222469717264175 Mean training loss after epoch 147: 0.04518091132137567 EPOCH: 148 Loss at step 0: 0.05383118614554405 Loss at step 50: 0.070344477891922 Loss at step 100: 0.038971684873104095 Loss at step 150: 0.0423809252679348 Loss at step 200: 0.06397809088230133 Loss at step 250: 0.03217413276433945 Loss at step 300: 0.03603808209300041 Loss at step 350: 0.039157118648290634 Loss at step 400: 0.05652608722448349 Loss at step 450: 0.05102754011750221 Loss at step 500: 0.03910079598426819 Loss at step 550: 0.0410158708691597 Loss at step 600: 0.04120100662112236 Loss at step 650: 0.061705753207206726 Loss at step 700: 0.04131576791405678 Loss at step 750: 0.06772387772798538 Loss at step 800: 0.0699528157711029 Loss at step 850: 0.04130713641643524 Loss at step 900: 0.0418078638613224 Mean training loss after epoch 148: 0.0455712422883428 EPOCH: 149 Loss at step 0: 0.04634777456521988 Loss at step 50: 0.041253238916397095 Loss at step 100: 0.04008399695158005 Loss at step 150: 0.04814977943897247 Loss at step 200: 0.0363667868077755 Loss at step 250: 0.0403866283595562 Loss at step 300: 0.036316774785518646 Loss at step 350: 0.06149269640445709 Loss at step 400: 0.033166296780109406 Loss at step 450: 0.04679546505212784 Loss at step 500: 0.04138525202870369 Loss at step 550: 0.045685410499572754 Loss at step 600: 0.04208800196647644 Loss at step 650: 0.04151197895407677 Loss at step 700: 0.054437167942523956 Loss at step 750: 0.041471440345048904 Loss at step 800: 0.04165365546941757 Loss at step 850: 0.03842727467417717 Loss at step 900: 0.038395073264837265 Mean training loss after epoch 149: 0.04573362713445352 EPOCH: 150 Loss at step 0: 0.04282116889953613 Loss at step 50: 0.043629392981529236 Loss at step 100: 0.05596429854631424 Loss at step 150: 0.03119789808988571 Loss at step 200: 0.048692360520362854 Loss at step 250: 0.06410976499319077 Loss at step 300: 0.04391242936253548 Loss at step 350: 0.037003882229328156 Loss at step 400: 0.039316438138484955 Loss at step 450: 0.05625694617629051 Loss at step 500: 0.03798418864607811 Loss at step 550: 0.05146384984254837 Loss at step 600: 0.04114704951643944 Loss at step 650: 0.043083686381578445 Loss at step 700: 0.0767633393406868 Loss at step 750: 0.04598135128617287 Loss at step 800: 0.0435275137424469 Loss at step 850: 0.04724422097206116 Loss at step 900: 0.04118693619966507 Mean training loss after epoch 150: 0.04494143853277794 EPOCH: 151 Loss at step 0: 0.04423930123448372 Loss at step 50: 0.04018191993236542 Loss at step 100: 0.05738671496510506 Loss at step 150: 0.05743098631501198 Loss at step 200: 0.060497164726257324 Loss at step 250: 0.06154698133468628 Loss at step 300: 0.05344966799020767 Loss at step 350: 0.045403216034173965 Loss at step 400: 0.06312426924705505 Loss at step 450: 0.035465486347675323 Loss at step 500: 0.038386791944503784 Loss at step 550: 0.0512588731944561 Loss at step 600: 0.04121686890721321 Loss at step 650: 0.04401252791285515 Loss at step 700: 0.03906235843896866 Loss at step 750: 0.03810534626245499 Loss at step 800: 0.039807841181755066 Loss at step 850: 0.03677043691277504 Loss at step 900: 0.054777592420578 Mean training loss after epoch 151: 0.045252184322012515 EPOCH: 152 Loss at step 0: 0.05806635692715645 Loss at step 50: 0.035181425511837006 Loss at step 100: 0.042128726840019226 Loss at step 150: 0.043391868472099304 Loss at step 200: 0.04172728955745697 Loss at step 250: 0.03902006894350052 Loss at step 300: 0.03561905771493912 Loss at step 350: 0.05256867781281471 Loss at step 400: 0.058148615062236786 Loss at step 450: 0.03650686517357826 Loss at step 500: 0.03790832683444023 Loss at step 550: 0.054602961987257004 Loss at step 600: 0.05786190927028656 Loss at step 650: 0.0895070731639862 Loss at step 700: 0.04103327542543411 Loss at step 750: 0.05321703478693962 Loss at step 800: 0.041732579469680786 Loss at step 850: 0.04743202403187752 Loss at step 900: 0.040576353669166565 Mean training loss after epoch 152: 0.045163730216989 EPOCH: 153 Loss at step 0: 0.04918451979756355 Loss at step 50: 0.044581182301044464 Loss at step 100: 0.03506239876151085 Loss at step 150: 0.036982737481594086 Loss at step 200: 0.05321277305483818 Loss at step 250: 0.05708350986242294 Loss at step 300: 0.03523658588528633 Loss at step 350: 0.07865951955318451 Loss at step 400: 0.037904612720012665 Loss at step 450: 0.049754124134778976 Loss at step 500: 0.04084399342536926 Loss at step 550: 0.05506086349487305 Loss at step 600: 0.04476074129343033 Loss at step 650: 0.042190708220005035 Loss at step 700: 0.04140375554561615 Loss at step 750: 0.04651135578751564 Loss at step 800: 0.04372303560376167 Loss at step 850: 0.053790464997291565 Loss at step 900: 0.05865241959691048 Mean training loss after epoch 153: 0.044561652532184935 EPOCH: 154 Loss at step 0: 0.057139765471220016 Loss at step 50: 0.04055776819586754 Loss at step 100: 0.044082965701818466 Loss at step 150: 0.05612867698073387 Loss at step 200: 0.03699573501944542 Loss at step 250: 0.056301578879356384 Loss at step 300: 0.04163125529885292 Loss at step 350: 0.05092141777276993 Loss at step 400: 0.038708385080099106 Loss at step 450: 0.037120964378118515 Loss at step 500: 0.045294389128685 Loss at step 550: 0.03878610581159592 Loss at step 600: 0.059239793568849564 Loss at step 650: 0.041102636605501175 Loss at step 700: 0.03927904739975929 Loss at step 750: 0.03888082504272461 Loss at step 800: 0.03796042874455452 Loss at step 850: 0.04006597027182579 Loss at step 900: 0.045492131263017654 Mean training loss after epoch 154: 0.045093357545146935 EPOCH: 155 Loss at step 0: 0.05144727975130081 Loss at step 50: 0.05564383044838905 Loss at step 100: 0.04785928875207901 Loss at step 150: 0.041502684354782104 Loss at step 200: 0.03820454329252243 Loss at step 250: 0.054117631167173386 Loss at step 300: 0.04392990097403526 Loss at step 350: 0.05677485838532448 Loss at step 400: 0.039841581135988235 Loss at step 450: 0.0558331198990345 Loss at step 500: 0.033879172056913376 Loss at step 550: 0.04350897669792175 Loss at step 600: 0.050638966262340546 Loss at step 650: 0.04028291627764702 Loss at step 700: 0.035014595836400986 Loss at step 750: 0.035447221249341965 Loss at step 800: 0.054362379014492035 Loss at step 850: 0.03870699554681778 Loss at step 900: 0.03903436288237572 Mean training loss after epoch 155: 0.04521909763198544 EPOCH: 156 Loss at step 0: 0.045506857335567474 Loss at step 50: 0.036485545337200165 Loss at step 100: 0.043250419199466705 Loss at step 150: 0.039637889713048935 Loss at step 200: 0.031968116760253906 Loss at step 250: 0.04212938994169235 Loss at step 300: 0.03247520700097084 Loss at step 350: 0.03485275059938431 Loss at step 400: 0.0412992388010025 Loss at step 450: 0.04109900817275047 Loss at step 500: 0.037688419222831726 Loss at step 550: 0.055701177567243576 Loss at step 600: 0.04417446628212929 Loss at step 650: 0.04994824901223183 Loss at step 700: 0.03200230747461319 Loss at step 750: 0.0446736179292202 Loss at step 800: 0.04309774935245514 Loss at step 850: 0.0446033850312233 Loss at step 900: 0.04056999832391739 Mean training loss after epoch 156: 0.04517605644402537 EPOCH: 157 Loss at step 0: 0.037095312029123306 Loss at step 50: 0.03319815173745155 Loss at step 100: 0.04379301518201828 Loss at step 150: 0.05878935381770134 Loss at step 200: 0.03608798235654831 Loss at step 250: 0.035521797835826874 Loss at step 300: 0.06459105759859085 Loss at step 350: 0.06066285818815231 Loss at step 400: 0.038824763149023056 Loss at step 450: 0.05673769488930702 Loss at step 500: 0.04635654017329216 Loss at step 550: 0.06733667105436325 Loss at step 600: 0.045455336570739746 Loss at step 650: 0.03464576229453087 Loss at step 700: 0.03845019266009331 Loss at step 750: 0.04295254126191139 Loss at step 800: 0.045532990247011185 Loss at step 850: 0.035553138703107834 Loss at step 900: 0.05329480022192001 Mean training loss after epoch 157: 0.04537811060585002 EPOCH: 158 Loss at step 0: 0.0369277223944664 Loss at step 50: 0.0565960556268692 Loss at step 100: 0.03274596855044365 Loss at step 150: 0.04789676517248154 Loss at step 200: 0.03991915285587311 Loss at step 250: 0.037666160613298416 Loss at step 300: 0.04278586804866791 Loss at step 350: 0.04168589785695076 Loss at step 400: 0.042697809636592865 Loss at step 450: 0.03964439779520035 Loss at step 500: 0.03852182999253273 Loss at step 550: 0.05026841536164284 Loss at step 600: 0.0633716732263565 Loss at step 650: 0.041244715452194214 Loss at step 700: 0.0523509718477726 Loss at step 750: 0.050148312002420425 Loss at step 800: 0.04236278682947159 Loss at step 850: 0.03888542205095291 Loss at step 900: 0.06856473535299301 Mean training loss after epoch 158: 0.04575330806550568 EPOCH: 159 Loss at step 0: 0.058583665639162064 Loss at step 50: 0.04570406302809715 Loss at step 100: 0.05466499179601669 Loss at step 150: 0.05179697647690773 Loss at step 200: 0.05455441400408745 Loss at step 250: 0.043174732476472855 Loss at step 300: 0.03637443855404854 Loss at step 350: 0.05753784999251366 Loss at step 400: 0.03829602897167206 Loss at step 450: 0.03489849716424942 Loss at step 500: 0.05396250635385513 Loss at step 550: 0.04547581449151039 Loss at step 600: 0.04038912430405617 Loss at step 650: 0.03943103179335594 Loss at step 700: 0.042903218418359756 Loss at step 750: 0.036858465522527695 Loss at step 800: 0.03432771563529968 Loss at step 850: 0.039740316569805145 Loss at step 900: 0.0359964594244957 Mean training loss after epoch 159: 0.044653533840738634 EPOCH: 160 Loss at step 0: 0.04313793033361435 Loss at step 50: 0.04902562126517296 Loss at step 100: 0.04139186814427376 Loss at step 150: 0.039588816463947296 Loss at step 200: 0.04033917933702469 Loss at step 250: 0.06279414892196655 Loss at step 300: 0.0330042727291584 Loss at step 350: 0.04404492303729057 Loss at step 400: 0.041888631880283356 Loss at step 450: 0.03896510601043701 Loss at step 500: 0.044775135815143585 Loss at step 550: 0.0408957339823246 Loss at step 600: 0.041120849549770355 Loss at step 650: 0.0419977530837059 Loss at step 700: 0.05292803794145584 Loss at step 750: 0.05475588142871857 Loss at step 800: 0.04420020058751106 Loss at step 850: 0.042813513427972794 Loss at step 900: 0.03962252661585808 Mean training loss after epoch 160: 0.04453597062710188 EPOCH: 161 Loss at step 0: 0.04346984997391701 Loss at step 50: 0.06921962648630142 Loss at step 100: 0.03423284366726875 Loss at step 150: 0.06012038514018059 Loss at step 200: 0.06300819665193558 Loss at step 250: 0.05199810490012169 Loss at step 300: 0.03783068433403969 Loss at step 350: 0.042337916791439056 Loss at step 400: 0.04122010990977287 Loss at step 450: 0.051785919815301895 Loss at step 500: 0.033950772136449814 Loss at step 550: 0.057830292731523514 Loss at step 600: 0.042520876973867416 Loss at step 650: 0.05031419172883034 Loss at step 700: 0.041341397911310196 Loss at step 750: 0.04464469850063324 Loss at step 800: 0.03740358725190163 Loss at step 850: 0.0358235128223896 Loss at step 900: 0.06903759390115738 Mean training loss after epoch 161: 0.04589811383065448 EPOCH: 162 Loss at step 0: 0.06228118762373924 Loss at step 50: 0.036720599979162216 Loss at step 100: 0.052344147115945816 Loss at step 150: 0.045240022242069244 Loss at step 200: 0.03988766670227051 Loss at step 250: 0.04472324624657631 Loss at step 300: 0.03517422452569008 Loss at step 350: 0.04895933344960213 Loss at step 400: 0.034063566476106644 Loss at step 450: 0.04699360579252243 Loss at step 500: 0.0419827364385128 Loss at step 550: 0.05346928536891937 Loss at step 600: 0.044737912714481354 Loss at step 650: 0.04173022881150246 Loss at step 700: 0.04145962372422218 Loss at step 750: 0.05889792740345001 Loss at step 800: 0.04171168804168701 Loss at step 850: 0.05337144806981087 Loss at step 900: 0.055150944739580154 Mean training loss after epoch 162: 0.04474590926059782 EPOCH: 163 Loss at step 0: 0.04805530607700348 Loss at step 50: 0.03818832337856293 Loss at step 100: 0.044808465987443924 Loss at step 150: 0.03218396380543709 Loss at step 200: 0.060345493257045746 Loss at step 250: 0.055748917162418365 Loss at step 300: 0.04493658244609833 Loss at step 350: 0.04465840756893158 Loss at step 400: 0.04412674158811569 Loss at step 450: 0.04227730631828308 Loss at step 500: 0.03790136054158211 Loss at step 550: 0.048960037529468536 Loss at step 600: 0.04284412041306496 Loss at step 650: 0.04502560943365097 Loss at step 700: 0.04821677505970001 Loss at step 750: 0.0426793247461319 Loss at step 800: 0.036286089569330215 Loss at step 850: 0.05409059673547745 Loss at step 900: 0.06297631561756134 Mean training loss after epoch 163: 0.04533423062588678 EPOCH: 164 Loss at step 0: 0.03790433332324028 Loss at step 50: 0.03961317613720894 Loss at step 100: 0.043957505375146866 Loss at step 150: 0.035413481295108795 Loss at step 200: 0.042927131056785583 Loss at step 250: 0.0439896285533905 Loss at step 300: 0.05656082555651665 Loss at step 350: 0.04485234245657921 Loss at step 400: 0.04038581997156143 Loss at step 450: 0.04925728589296341 Loss at step 500: 0.039945341646671295 Loss at step 550: 0.046423327177762985 Loss at step 600: 0.04233664646744728 Loss at step 650: 0.05188891291618347 Loss at step 700: 0.04988690838217735 Loss at step 750: 0.0401928573846817 Loss at step 800: 0.03773686662316322 Loss at step 850: 0.037738457322120667 Loss at step 900: 0.032886702567338943 Mean training loss after epoch 164: 0.044881119664861704 EPOCH: 165 Loss at step 0: 0.03795930743217468 Loss at step 50: 0.04019002243876457 Loss at step 100: 0.040128469467163086 Loss at step 150: 0.0427861288189888 Loss at step 200: 0.04058043286204338 Loss at step 250: 0.04106577858328819 Loss at step 300: 0.04070611670613289 Loss at step 350: 0.07410020381212234 Loss at step 400: 0.06480217725038528 Loss at step 450: 0.03282897546887398 Loss at step 500: 0.034894850105047226 Loss at step 550: 0.034078992903232574 Loss at step 600: 0.05498705059289932 Loss at step 650: 0.040249455720186234 Loss at step 700: 0.05763848498463631 Loss at step 750: 0.04198414459824562 Loss at step 800: 0.055930543690919876 Loss at step 850: 0.047943998128175735 Loss at step 900: 0.052265144884586334 Mean training loss after epoch 165: 0.04512808755087827 EPOCH: 166 Loss at step 0: 0.0371636264026165 Loss at step 50: 0.051955271512269974 Loss at step 100: 0.03980622813105583 Loss at step 150: 0.036819297820329666 Loss at step 200: 0.04422394186258316 Loss at step 250: 0.04338080435991287 Loss at step 300: 0.03552800416946411 Loss at step 350: 0.03632098808884621 Loss at step 400: 0.03490375354886055 Loss at step 450: 0.043271373957395554 Loss at step 500: 0.045901112258434296 Loss at step 550: 0.04607975482940674 Loss at step 600: 0.04123765975236893 Loss at step 650: 0.03876103460788727 Loss at step 700: 0.04552314803004265 Loss at step 750: 0.039844971150159836 Loss at step 800: 0.041845422238111496 Loss at step 850: 0.039886973798274994 Loss at step 900: 0.052534013986587524 Mean training loss after epoch 166: 0.04487656772351151 EPOCH: 167 Loss at step 0: 0.042515482753515244 Loss at step 50: 0.036949824541807175 Loss at step 100: 0.05021488666534424 Loss at step 150: 0.04379821568727493 Loss at step 200: 0.035356417298316956 Loss at step 250: 0.053139787167310715 Loss at step 300: 0.06796281784772873 Loss at step 350: 0.07853253930807114 Loss at step 400: 0.035891272127628326 Loss at step 450: 0.04436671733856201 Loss at step 500: 0.045565467327833176 Loss at step 550: 0.042891550809144974 Loss at step 600: 0.04448252171278 Loss at step 650: 0.04244929924607277 Loss at step 700: 0.05214785784482956 Loss at step 750: 0.05638536438345909 Loss at step 800: 0.056534722447395325 Loss at step 850: 0.0576387457549572 Loss at step 900: 0.05789962038397789 Mean training loss after epoch 167: 0.0449544004182508 EPOCH: 168 Loss at step 0: 0.03884321078658104 Loss at step 50: 0.07249975204467773 Loss at step 100: 0.03951134532690048 Loss at step 150: 0.03786768391728401 Loss at step 200: 0.05340374633669853 Loss at step 250: 0.042558200657367706 Loss at step 300: 0.038699921220541 Loss at step 350: 0.0631069466471672 Loss at step 400: 0.06125781685113907 Loss at step 450: 0.04031132906675339 Loss at step 500: 0.053958434611558914 Loss at step 550: 0.0420476458966732 Loss at step 600: 0.03338213264942169 Loss at step 650: 0.03919863700866699 Loss at step 700: 0.04096262529492378 Loss at step 750: 0.06386973708868027 Loss at step 800: 0.05964842438697815 Loss at step 850: 0.04585675895214081 Loss at step 900: 0.036834605038166046 Mean training loss after epoch 168: 0.04493701730622475 EPOCH: 169 Loss at step 0: 0.037322383373975754 Loss at step 50: 0.04599372670054436 Loss at step 100: 0.04076969623565674 Loss at step 150: 0.04358348995447159 Loss at step 200: 0.05261244997382164 Loss at step 250: 0.03960583731532097 Loss at step 300: 0.05687599256634712 Loss at step 350: 0.04662065953016281 Loss at step 400: 0.043231457471847534 Loss at step 450: 0.034571513533592224 Loss at step 500: 0.05706150084733963 Loss at step 550: 0.04137909412384033 Loss at step 600: 0.03791302815079689 Loss at step 650: 0.0533582903444767 Loss at step 700: 0.03930455818772316 Loss at step 750: 0.04063140228390694 Loss at step 800: 0.0522279255092144 Loss at step 850: 0.0773761123418808 Loss at step 900: 0.0542360357940197 Mean training loss after epoch 169: 0.044961434260789135 EPOCH: 170 Loss at step 0: 0.06077020987868309 Loss at step 50: 0.04344135895371437 Loss at step 100: 0.03787507861852646 Loss at step 150: 0.0351269394159317 Loss at step 200: 0.0676461011171341 Loss at step 250: 0.043764445930719376 Loss at step 300: 0.03691134601831436 Loss at step 350: 0.041500743478536606 Loss at step 400: 0.060653042048215866 Loss at step 450: 0.03650350496172905 Loss at step 500: 0.05107559263706207 Loss at step 550: 0.03269306570291519 Loss at step 600: 0.04152371734380722 Loss at step 650: 0.06715691834688187 Loss at step 700: 0.050732143223285675 Loss at step 750: 0.04120210185647011 Loss at step 800: 0.041679635643959045 Loss at step 850: 0.042040348052978516 Loss at step 900: 0.05703346058726311 Mean training loss after epoch 170: 0.045306254047225276 EPOCH: 171 Loss at step 0: 0.03913231939077377 Loss at step 50: 0.03670516610145569 Loss at step 100: 0.035496070981025696 Loss at step 150: 0.041489142924547195 Loss at step 200: 0.06871479004621506 Loss at step 250: 0.04436494782567024 Loss at step 300: 0.04429658129811287 Loss at step 350: 0.03626585751771927 Loss at step 400: 0.051591094583272934 Loss at step 450: 0.043016012758016586 Loss at step 500: 0.03883887827396393 Loss at step 550: 0.05507335811853409 Loss at step 600: 0.05791299045085907 Loss at step 650: 0.04087790101766586 Loss at step 700: 0.03721265494823456 Loss at step 750: 0.041336797177791595 Loss at step 800: 0.04449688643217087 Loss at step 850: 0.03942723944783211 Loss at step 900: 0.05507685989141464 Mean training loss after epoch 171: 0.04514510651379188 EPOCH: 172 Loss at step 0: 0.03944544866681099 Loss at step 50: 0.0535382516682148 Loss at step 100: 0.041857436299324036 Loss at step 150: 0.043972112238407135 Loss at step 200: 0.03472994267940521 Loss at step 250: 0.03893918916583061 Loss at step 300: 0.0320570208132267 Loss at step 350: 0.03827795758843422 Loss at step 400: 0.04127787798643112 Loss at step 450: 0.03436832129955292 Loss at step 500: 0.05759158357977867 Loss at step 550: 0.047414131462574005 Loss at step 600: 0.051036376506090164 Loss at step 650: 0.03663410246372223 Loss at step 700: 0.03801409900188446 Loss at step 750: 0.0345013402402401 Loss at step 800: 0.03504687175154686 Loss at step 850: 0.05460084229707718 Loss at step 900: 0.0377960279583931 Mean training loss after epoch 172: 0.04465780588887584 EPOCH: 173 Loss at step 0: 0.03905131295323372 Loss at step 50: 0.07217153906822205 Loss at step 100: 0.04105672985315323 Loss at step 150: 0.035745784640312195 Loss at step 200: 0.0411885567009449 Loss at step 250: 0.038246989250183105 Loss at step 300: 0.05221167206764221 Loss at step 350: 0.043755676597356796 Loss at step 400: 0.04112847521901131 Loss at step 450: 0.044805582612752914 Loss at step 500: 0.07119356840848923 Loss at step 550: 0.04142846167087555 Loss at step 600: 0.041867926716804504 Loss at step 650: 0.05168565362691879 Loss at step 700: 0.03702205792069435 Loss at step 750: 0.04131702333688736 Loss at step 800: 0.038887590169906616 Loss at step 850: 0.04208119586110115 Loss at step 900: 0.03953680768609047 Mean training loss after epoch 173: 0.04540624770758821 EPOCH: 174 Loss at step 0: 0.03728681430220604 Loss at step 50: 0.04520278051495552 Loss at step 100: 0.044506531208753586 Loss at step 150: 0.05480348691344261 Loss at step 200: 0.03661713749170303 Loss at step 250: 0.043457888066768646 Loss at step 300: 0.037857163697481155 Loss at step 350: 0.05054677277803421 Loss at step 400: 0.04079076275229454 Loss at step 450: 0.05187408626079559 Loss at step 500: 0.0400749035179615 Loss at step 550: 0.05640106648206711 Loss at step 600: 0.038121238350868225 Loss at step 650: 0.03378952294588089 Loss at step 700: 0.04456334933638573 Loss at step 750: 0.05235273763537407 Loss at step 800: 0.04061971232295036 Loss at step 850: 0.046062715351581573 Loss at step 900: 0.037213973701000214 Mean training loss after epoch 174: 0.04515980026210104 EPOCH: 175 Loss at step 0: 0.04041971266269684 Loss at step 50: 0.04416755586862564 Loss at step 100: 0.03968343883752823 Loss at step 150: 0.04334979131817818 Loss at step 200: 0.03476853296160698 Loss at step 250: 0.039381612092256546 Loss at step 300: 0.055342402309179306 Loss at step 350: 0.05493606626987457 Loss at step 400: 0.03886989504098892 Loss at step 450: 0.04168616607785225 Loss at step 500: 0.04194429889321327 Loss at step 550: 0.04917588457465172 Loss at step 600: 0.03969926759600639 Loss at step 650: 0.0356193371117115 Loss at step 700: 0.03869745507836342 Loss at step 750: 0.046061642467975616 Loss at step 800: 0.03734658658504486 Loss at step 850: 0.03790218383073807 Loss at step 900: 0.059502992779016495 Mean training loss after epoch 175: 0.04561641965625383 EPOCH: 176 Loss at step 0: 0.0438973605632782 Loss at step 50: 0.03801502287387848 Loss at step 100: 0.05283140391111374 Loss at step 150: 0.03342270106077194 Loss at step 200: 0.07433956861495972 Loss at step 250: 0.03262732923030853 Loss at step 300: 0.03676030412316322 Loss at step 350: 0.04657838121056557 Loss at step 400: 0.04034288600087166 Loss at step 450: 0.03682326152920723 Loss at step 500: 0.05552436783909798 Loss at step 550: 0.0703757107257843 Loss at step 600: 0.03970923647284508 Loss at step 650: 0.050401609390974045 Loss at step 700: 0.04095887392759323 Loss at step 750: 0.05756223946809769 Loss at step 800: 0.04552058130502701 Loss at step 850: 0.04162651672959328 Loss at step 900: 0.04205979406833649 Mean training loss after epoch 176: 0.04486434652146373 EPOCH: 177 Loss at step 0: 0.04057762026786804 Loss at step 50: 0.038913846015930176 Loss at step 100: 0.045906469225883484 Loss at step 150: 0.058392222970724106 Loss at step 200: 0.041061390191316605 Loss at step 250: 0.03889657184481621 Loss at step 300: 0.042890045791864395 Loss at step 350: 0.034926798194646835 Loss at step 400: 0.03627695515751839 Loss at step 450: 0.05198673531413078 Loss at step 500: 0.043665047734975815 Loss at step 550: 0.046865660697221756 Loss at step 600: 0.034785110503435135 Loss at step 650: 0.03818181902170181 Loss at step 700: 0.039786145091056824 Loss at step 750: 0.03778981417417526 Loss at step 800: 0.042405374348163605 Loss at step 850: 0.0317123644053936 Loss at step 900: 0.04232268035411835 Mean training loss after epoch 177: 0.04561708193582131 EPOCH: 178 Loss at step 0: 0.046510543674230576 Loss at step 50: 0.04347822442650795 Loss at step 100: 0.06058052182197571 Loss at step 150: 0.039572227746248245 Loss at step 200: 0.0376017764210701 Loss at step 250: 0.05838087573647499 Loss at step 300: 0.057596754282712936 Loss at step 350: 0.04388156533241272 Loss at step 400: 0.03788335248827934 Loss at step 450: 0.07211469858884811 Loss at step 500: 0.03736795857548714 Loss at step 550: 0.037261713296175 Loss at step 600: 0.06575857102870941 Loss at step 650: 0.038952507078647614 Loss at step 700: 0.03537405654788017 Loss at step 750: 0.0377969928085804 Loss at step 800: 0.062033407390117645 Loss at step 850: 0.03478017449378967 Loss at step 900: 0.04188700020313263 Mean training loss after epoch 178: 0.04497519669248097 EPOCH: 179 Loss at step 0: 0.059967316687107086 Loss at step 50: 0.04144581779837608 Loss at step 100: 0.03836938366293907 Loss at step 150: 0.05321413651108742 Loss at step 200: 0.035425715148448944 Loss at step 250: 0.03638635203242302 Loss at step 300: 0.04127886891365051 Loss at step 350: 0.03489508479833603 Loss at step 400: 0.03354688733816147 Loss at step 450: 0.03861060366034508 Loss at step 500: 0.05084632709622383 Loss at step 550: 0.03696342557668686 Loss at step 600: 0.043770838528871536 Loss at step 650: 0.038569312542676926 Loss at step 700: 0.03763270378112793 Loss at step 750: 0.03871903568506241 Loss at step 800: 0.04064547270536423 Loss at step 850: 0.03822667524218559 Loss at step 900: 0.03937992826104164 Mean training loss after epoch 179: 0.045007529634752 EPOCH: 180 Loss at step 0: 0.042903684079647064 Loss at step 50: 0.031873710453510284 Loss at step 100: 0.050054363906383514 Loss at step 150: 0.05196557939052582 Loss at step 200: 0.04554685205221176 Loss at step 250: 0.035816509276628494 Loss at step 300: 0.06207126006484032 Loss at step 350: 0.03341127559542656 Loss at step 400: 0.037607915699481964 Loss at step 450: 0.05046604946255684 Loss at step 500: 0.0626264289021492 Loss at step 550: 0.04584550857543945 Loss at step 600: 0.04838404431939125 Loss at step 650: 0.04887682572007179 Loss at step 700: 0.038784027099609375 Loss at step 750: 0.04283243045210838 Loss at step 800: 0.03820963203907013 Loss at step 850: 0.07433793693780899 Loss at step 900: 0.03854098916053772 Mean training loss after epoch 180: 0.04473149823124157 EPOCH: 181 Loss at step 0: 0.04769797995686531 Loss at step 50: 0.05833132192492485 Loss at step 100: 0.04888228699564934 Loss at step 150: 0.03764694929122925 Loss at step 200: 0.04069898650050163 Loss at step 250: 0.04297289624810219 Loss at step 300: 0.0381748229265213 Loss at step 350: 0.03979555889964104 Loss at step 400: 0.03818729519844055 Loss at step 450: 0.03959185630083084 Loss at step 500: 0.03633880242705345 Loss at step 550: 0.04006045684218407 Loss at step 600: 0.035344574600458145 Loss at step 650: 0.040042437613010406 Loss at step 700: 0.050252582877874374 Loss at step 750: 0.038052815943956375 Loss at step 800: 0.04089273512363434 Loss at step 850: 0.04157586768269539 Loss at step 900: 0.04185422882437706 Mean training loss after epoch 181: 0.04508357354477525 EPOCH: 182 Loss at step 0: 0.03518753871321678 Loss at step 50: 0.03506264463067055 Loss at step 100: 0.043282683938741684 Loss at step 150: 0.05627964809536934 Loss at step 200: 0.03603701665997505 Loss at step 250: 0.0693434625864029 Loss at step 300: 0.035020239651203156 Loss at step 350: 0.0531441830098629 Loss at step 400: 0.05295827239751816 Loss at step 450: 0.05775165557861328 Loss at step 500: 0.03738385811448097 Loss at step 550: 0.03496372699737549 Loss at step 600: 0.034062422811985016 Loss at step 650: 0.055209383368492126 Loss at step 700: 0.07243537902832031 Loss at step 750: 0.0427025705575943 Loss at step 800: 0.03934381529688835 Loss at step 850: 0.045297376811504364 Loss at step 900: 0.05486844480037689 Mean training loss after epoch 182: 0.04514537612869859 EPOCH: 183 Loss at step 0: 0.05578277260065079 Loss at step 50: 0.054883863776922226 Loss at step 100: 0.043947938829660416 Loss at step 150: 0.055015888065099716 Loss at step 200: 0.0445951484143734 Loss at step 250: 0.05216481164097786 Loss at step 300: 0.03621487691998482 Loss at step 350: 0.04049992561340332 Loss at step 400: 0.06191181018948555 Loss at step 450: 0.03954257443547249 Loss at step 500: 0.03847004845738411 Loss at step 550: 0.03769614174962044 Loss at step 600: 0.04031503573060036 Loss at step 650: 0.034376177936792374 Loss at step 700: 0.03602452576160431 Loss at step 750: 0.032602615654468536 Loss at step 800: 0.05360531061887741 Loss at step 850: 0.03787517547607422 Loss at step 900: 0.037630099803209305 Mean training loss after epoch 183: 0.044904542759235604 EPOCH: 184 Loss at step 0: 0.03566925972700119 Loss at step 50: 0.05731191858649254 Loss at step 100: 0.04462838172912598 Loss at step 150: 0.03549737110733986 Loss at step 200: 0.0435023307800293 Loss at step 250: 0.04403376951813698 Loss at step 300: 0.03603821620345116 Loss at step 350: 0.037171099334955215 Loss at step 400: 0.04279512166976929 Loss at step 450: 0.037279658019542694 Loss at step 500: 0.039825234562158585 Loss at step 550: 0.047310519963502884 Loss at step 600: 0.041668180376291275 Loss at step 650: 0.05263359472155571 Loss at step 700: 0.055043142288923264 Loss at step 750: 0.07477422058582306 Loss at step 800: 0.037778519093990326 Loss at step 850: 0.032246168702840805 Loss at step 900: 0.06303811073303223 Mean training loss after epoch 184: 0.04482485091031742 EPOCH: 185 Loss at step 0: 0.043932683765888214 Loss at step 50: 0.039595767855644226 Loss at step 100: 0.039315205067396164 Loss at step 150: 0.03800807520747185 Loss at step 200: 0.0414329431951046 Loss at step 250: 0.04020103067159653 Loss at step 300: 0.04062681645154953 Loss at step 350: 0.03425222262740135 Loss at step 400: 0.04015747830271721 Loss at step 450: 0.05243675410747528 Loss at step 500: 0.060402534902095795 Loss at step 550: 0.03726222366094589 Loss at step 600: 0.0639244019985199 Loss at step 650: 0.05324471741914749 Loss at step 700: 0.042684219777584076 Loss at step 750: 0.06055530533194542 Loss at step 800: 0.05538884177803993 Loss at step 850: 0.05424724519252777 Loss at step 900: 0.06907608360052109 Mean training loss after epoch 185: 0.04516900105596478 EPOCH: 186 Loss at step 0: 0.054311759769916534 Loss at step 50: 0.039730533957481384 Loss at step 100: 0.042144082486629486 Loss at step 150: 0.0686865895986557 Loss at step 200: 0.08182462304830551 Loss at step 250: 0.04291687160730362 Loss at step 300: 0.030537933111190796 Loss at step 350: 0.05605510622262955 Loss at step 400: 0.042675700038671494 Loss at step 450: 0.04619893804192543 Loss at step 500: 0.03707440569996834 Loss at step 550: 0.03921253979206085 Loss at step 600: 0.04132802039384842 Loss at step 650: 0.04900854825973511 Loss at step 700: 0.03101758472621441 Loss at step 750: 0.058900296688079834 Loss at step 800: 0.032254964113235474 Loss at step 850: 0.042196668684482574 Loss at step 900: 0.04243256151676178 Mean training loss after epoch 186: 0.04481762285227143 EPOCH: 187 Loss at step 0: 0.0409850999712944 Loss at step 50: 0.04665282368659973 Loss at step 100: 0.04172646254301071 Loss at step 150: 0.05424593761563301 Loss at step 200: 0.039626818150281906 Loss at step 250: 0.056469712406396866 Loss at step 300: 0.05572487413883209 Loss at step 350: 0.04946182295680046 Loss at step 400: 0.035019103437662125 Loss at step 450: 0.04167184978723526 Loss at step 500: 0.04191799461841583 Loss at step 550: 0.03499951586127281 Loss at step 600: 0.052424829453229904 Loss at step 650: 0.055010102689266205 Loss at step 700: 0.07735258340835571 Loss at step 750: 0.04067900404334068 Loss at step 800: 0.05906986445188522 Loss at step 850: 0.03792669251561165 Loss at step 900: 0.06813978403806686 Mean training loss after epoch 187: 0.04489128255267451 EPOCH: 188 Loss at step 0: 0.07105597853660583 Loss at step 50: 0.04358724504709244 Loss at step 100: 0.0636221170425415 Loss at step 150: 0.059660404920578 Loss at step 200: 0.04381810128688812 Loss at step 250: 0.04805498197674751 Loss at step 300: 0.036966025829315186 Loss at step 350: 0.045569662004709244 Loss at step 400: 0.04108558967709541 Loss at step 450: 0.03863092139363289 Loss at step 500: 0.04803810641169548 Loss at step 550: 0.04085911065340042 Loss at step 600: 0.04926976189017296 Loss at step 650: 0.03582033887505531 Loss at step 700: 0.03573821112513542 Loss at step 750: 0.0430586151778698 Loss at step 800: 0.045410048216581345 Loss at step 850: 0.0395040437579155 Loss at step 900: 0.04151354357600212 Mean training loss after epoch 188: 0.045020847192117526 EPOCH: 189 Loss at step 0: 0.03678242117166519 Loss at step 50: 0.04224667698144913 Loss at step 100: 0.05190664902329445 Loss at step 150: 0.0510210357606411 Loss at step 200: 0.0448233038187027 Loss at step 250: 0.03838038071990013 Loss at step 300: 0.04211239889264107 Loss at step 350: 0.06944701820611954 Loss at step 400: 0.0383700467646122 Loss at step 450: 0.035947974771261215 Loss at step 500: 0.03802557662129402 Loss at step 550: 0.03518896549940109 Loss at step 600: 0.0361245796084404 Loss at step 650: 0.05336225777864456 Loss at step 700: 0.03919314593076706 Loss at step 750: 0.04480776563286781 Loss at step 800: 0.04292784631252289 Loss at step 850: 0.040505893528461456 Loss at step 900: 0.051672160625457764 Mean training loss after epoch 189: 0.04416820053845199 EPOCH: 190 Loss at step 0: 0.037132564932107925 Loss at step 50: 0.03978399932384491 Loss at step 100: 0.04017455503344536 Loss at step 150: 0.02974758855998516 Loss at step 200: 0.034091174602508545 Loss at step 250: 0.042982712388038635 Loss at step 300: 0.04615750163793564 Loss at step 350: 0.03649500012397766 Loss at step 400: 0.03600305691361427 Loss at step 450: 0.04089406877756119 Loss at step 500: 0.03962417319417 Loss at step 550: 0.03398328647017479 Loss at step 600: 0.04663172364234924 Loss at step 650: 0.037379905581474304 Loss at step 700: 0.054393406957387924 Loss at step 750: 0.0380566269159317 Loss at step 800: 0.04001429304480553 Loss at step 850: 0.03456884250044823 Loss at step 900: 0.06186676397919655 Mean training loss after epoch 190: 0.04409443369226606 EPOCH: 191 Loss at step 0: 0.039473600685596466 Loss at step 50: 0.04987269267439842 Loss at step 100: 0.03175457939505577 Loss at step 150: 0.05496815964579582 Loss at step 200: 0.036864619702100754 Loss at step 250: 0.04583047330379486 Loss at step 300: 0.04147879034280777 Loss at step 350: 0.06286437809467316 Loss at step 400: 0.038204483687877655 Loss at step 450: 0.03880622237920761 Loss at step 500: 0.050679873675107956 Loss at step 550: 0.037858497351408005 Loss at step 600: 0.03397248685359955 Loss at step 650: 0.0411633737385273 Loss at step 700: 0.039710935205221176 Loss at step 750: 0.036119185388088226 Loss at step 800: 0.03969964757561684 Loss at step 850: 0.0554770790040493 Loss at step 900: 0.03859982267022133 Mean training loss after epoch 191: 0.045024714911225505 EPOCH: 192 Loss at step 0: 0.05441718176007271 Loss at step 50: 0.04275316372513771 Loss at step 100: 0.03651956468820572 Loss at step 150: 0.03748447075486183 Loss at step 200: 0.042151276022195816 Loss at step 250: 0.03811294585466385 Loss at step 300: 0.036658041179180145 Loss at step 350: 0.03500925377011299 Loss at step 400: 0.04387687146663666 Loss at step 450: 0.05901684984564781 Loss at step 500: 0.04263965040445328 Loss at step 550: 0.039237115532159805 Loss at step 600: 0.059497103095054626 Loss at step 650: 0.03794829547405243 Loss at step 700: 0.040827058255672455 Loss at step 750: 0.05759906768798828 Loss at step 800: 0.04497917741537094 Loss at step 850: 0.04010135680437088 Loss at step 900: 0.03718538582324982 Mean training loss after epoch 192: 0.044246014622625895 EPOCH: 193 Loss at step 0: 0.039787497371435165 Loss at step 50: 0.05037599429488182 Loss at step 100: 0.03623298555612564 Loss at step 150: 0.04062265530228615 Loss at step 200: 0.06584542989730835 Loss at step 250: 0.03863614425063133 Loss at step 300: 0.08427802473306656 Loss at step 350: 0.04203381761908531 Loss at step 400: 0.03830770030617714 Loss at step 450: 0.039331886917352676 Loss at step 500: 0.0702025443315506 Loss at step 550: 0.0394720658659935 Loss at step 600: 0.0536833181977272 Loss at step 650: 0.04961357265710831 Loss at step 700: 0.04249569773674011 Loss at step 750: 0.05418466776609421 Loss at step 800: 0.04909861087799072 Loss at step 850: 0.033759187906980515 Loss at step 900: 0.05396170914173126 Mean training loss after epoch 193: 0.04490992175673307 EPOCH: 194 Loss at step 0: 0.043888017535209656 Loss at step 50: 0.03819920867681503 Loss at step 100: 0.03595161437988281 Loss at step 150: 0.05687680467963219 Loss at step 200: 0.03393780067563057 Loss at step 250: 0.049421265721321106 Loss at step 300: 0.037861596792936325 Loss at step 350: 0.06901735812425613 Loss at step 400: 0.03178342059254646 Loss at step 450: 0.03472926467657089 Loss at step 500: 0.056202761828899384 Loss at step 550: 0.05190080404281616 Loss at step 600: 0.037387944757938385 Loss at step 650: 0.050456054508686066 Loss at step 700: 0.038684096187353134 Loss at step 750: 0.04527813568711281 Loss at step 800: 0.04017271474003792 Loss at step 850: 0.04028673097491264 Loss at step 900: 0.06016369163990021 Mean training loss after epoch 194: 0.04435515471462057 EPOCH: 195 Loss at step 0: 0.042644958943128586 Loss at step 50: 0.04013166204094887 Loss at step 100: 0.04052606225013733 Loss at step 150: 0.03470568731427193 Loss at step 200: 0.04545783996582031 Loss at step 250: 0.03575114533305168 Loss at step 300: 0.04380500316619873 Loss at step 350: 0.046571798622608185 Loss at step 400: 0.037060558795928955 Loss at step 450: 0.036900006234645844 Loss at step 500: 0.05433354899287224 Loss at step 550: 0.044129274785518646 Loss at step 600: 0.03778447210788727 Loss at step 650: 0.06086691841483116 Loss at step 700: 0.05877881124615669 Loss at step 750: 0.03515113890171051 Loss at step 800: 0.05753929540514946 Loss at step 850: 0.033424291759729385 Loss at step 900: 0.03396277874708176 Mean training loss after epoch 195: 0.044560606134678125 EPOCH: 196 Loss at step 0: 0.05090687423944473 Loss at step 50: 0.03411884233355522 Loss at step 100: 0.04391876980662346 Loss at step 150: 0.03864395618438721 Loss at step 200: 0.04558490216732025 Loss at step 250: 0.04708222672343254 Loss at step 300: 0.052004240453243256 Loss at step 350: 0.036968156695365906 Loss at step 400: 0.03956743702292442 Loss at step 450: 0.0406690314412117 Loss at step 500: 0.033270757645368576 Loss at step 550: 0.03827505558729172 Loss at step 600: 0.05641259253025055 Loss at step 650: 0.04077734425663948 Loss at step 700: 0.041342563927173615 Loss at step 750: 0.03640327230095863 Loss at step 800: 0.04145050048828125 Loss at step 850: 0.06099329888820648 Loss at step 900: 0.036867186427116394 Mean training loss after epoch 196: 0.04448432442563365 EPOCH: 197 Loss at step 0: 0.0328943245112896 Loss at step 50: 0.036947183310985565 Loss at step 100: 0.040807388722896576 Loss at step 150: 0.05726994574069977 Loss at step 200: 0.0404985249042511 Loss at step 250: 0.042285967618227005 Loss at step 300: 0.03843049705028534 Loss at step 350: 0.03923683613538742 Loss at step 400: 0.03886844962835312 Loss at step 450: 0.06920197606086731 Loss at step 500: 0.032337289303541183 Loss at step 550: 0.042542923241853714 Loss at step 600: 0.03785329684615135 Loss at step 650: 0.055230338126420975 Loss at step 700: 0.04026314616203308 Loss at step 750: 0.041575510054826736 Loss at step 800: 0.034413523972034454 Loss at step 850: 0.03325406834483147 Loss at step 900: 0.03191730007529259 Mean training loss after epoch 197: 0.04455303780432703 EPOCH: 198 Loss at step 0: 0.05046096071600914 Loss at step 50: 0.05143970251083374 Loss at step 100: 0.039546459913253784 Loss at step 150: 0.040689077228307724 Loss at step 200: 0.05650666356086731 Loss at step 250: 0.03989996016025543 Loss at step 300: 0.05325893312692642 Loss at step 350: 0.041523464024066925 Loss at step 400: 0.04482836648821831 Loss at step 450: 0.040449108928442 Loss at step 500: 0.046822626143693924 Loss at step 550: 0.0404198132455349 Loss at step 600: 0.03865204378962517 Loss at step 650: 0.04052437096834183 Loss at step 700: 0.03780702129006386 Loss at step 750: 0.040986139327287674 Loss at step 800: 0.03426042199134827 Loss at step 850: 0.03757645934820175 Loss at step 900: 0.04048733785748482 Mean training loss after epoch 198: 0.04481908426220928 EPOCH: 199 Loss at step 0: 0.04264058917760849 Loss at step 50: 0.04608980566263199 Loss at step 100: 0.0429585799574852 Loss at step 150: 0.04466039314866066 Loss at step 200: 0.0409453846514225 Loss at step 250: 0.04844653606414795 Loss at step 300: 0.04033463075757027 Loss at step 350: 0.06720422208309174 Loss at step 400: 0.05425972864031792 Loss at step 450: 0.057348527014255524 Loss at step 500: 0.050080034881830215 Loss at step 550: 0.07454514503479004 Loss at step 600: 0.03905537724494934 Loss at step 650: 0.0594196617603302 Loss at step 700: 0.04001199081540108 Loss at step 750: 0.040093302726745605 Loss at step 800: 0.04882723465561867 Loss at step 850: 0.036482810974121094 Loss at step 900: 0.04444018751382828 Mean training loss after epoch 199: 0.044767681437768915 EPOCH: 200 Loss at step 0: 0.036261703819036484 Loss at step 50: 0.050364479422569275 Loss at step 100: 0.05856318399310112 Loss at step 150: 0.039443183690309525 Loss at step 200: 0.05825747177004814 Loss at step 250: 0.048224084079265594 Loss at step 300: 0.045931681990623474 Loss at step 350: 0.03916923701763153 Loss at step 400: 0.04516158252954483 Loss at step 450: 0.03252314776182175 Loss at step 500: 0.032447606325149536 Loss at step 550: 0.03703015670180321 Loss at step 600: 0.03757529705762863 Loss at step 650: 0.04083080589771271 Loss at step 700: 0.03758569434285164 Loss at step 750: 0.0540434904396534 Loss at step 800: 0.05690935626626015 Loss at step 850: 0.050613537430763245 Loss at step 900: 0.03576168790459633 Mean training loss after epoch 200: 0.045134358084214524 EPOCH: 201 Loss at step 0: 0.04179003834724426 Loss at step 50: 0.03944104537367821 Loss at step 100: 0.039319902658462524 Loss at step 150: 0.05394578352570534 Loss at step 200: 0.04152711480855942 Loss at step 250: 0.03468654304742813 Loss at step 300: 0.044628024101257324 Loss at step 350: 0.052683185786008835 Loss at step 400: 0.05324063077569008 Loss at step 450: 0.03184512257575989 Loss at step 500: 0.03944956511259079 Loss at step 550: 0.04089953005313873 Loss at step 600: 0.03726722300052643 Loss at step 650: 0.04498681053519249 Loss at step 700: 0.04093097895383835 Loss at step 750: 0.033684078603982925 Loss at step 800: 0.03274279460310936 Loss at step 850: 0.045319993048906326 Loss at step 900: 0.031044499948620796 Mean training loss after epoch 201: 0.04450676275262319 EPOCH: 202 Loss at step 0: 0.046937622129917145 Loss at step 50: 0.04056081920862198 Loss at step 100: 0.03860340639948845 Loss at step 150: 0.03677826002240181 Loss at step 200: 0.055216941982507706 Loss at step 250: 0.03545172140002251 Loss at step 300: 0.04355182498693466 Loss at step 350: 0.055224306881427765 Loss at step 400: 0.05327907204627991 Loss at step 450: 0.03767799586057663 Loss at step 500: 0.04335283115506172 Loss at step 550: 0.03967515379190445 Loss at step 600: 0.05263349041342735 Loss at step 650: 0.04726291447877884 Loss at step 700: 0.0409478023648262 Loss at step 750: 0.05461839213967323 Loss at step 800: 0.040983255952596664 Loss at step 850: 0.04160749167203903 Loss at step 900: 0.0712745189666748 Mean training loss after epoch 202: 0.045210634096106615 EPOCH: 203 Loss at step 0: 0.038665469735860825 Loss at step 50: 0.04321262612938881 Loss at step 100: 0.03404640406370163 Loss at step 150: 0.041546422988176346 Loss at step 200: 0.054322678595781326 Loss at step 250: 0.04289986938238144 Loss at step 300: 0.056764889508485794 Loss at step 350: 0.039965204894542694 Loss at step 400: 0.0487280935049057 Loss at step 450: 0.0472584143280983 Loss at step 500: 0.03953580930829048 Loss at step 550: 0.03466639295220375 Loss at step 600: 0.0405547171831131 Loss at step 650: 0.05032387375831604 Loss at step 700: 0.0376773439347744 Loss at step 750: 0.05212203785777092 Loss at step 800: 0.052570778876543045 Loss at step 850: 0.039889540523290634 Loss at step 900: 0.03657597303390503 Mean training loss after epoch 203: 0.04529807742820111 EPOCH: 204 Loss at step 0: 0.06427709013223648 Loss at step 50: 0.04181193560361862 Loss at step 100: 0.04083321616053581 Loss at step 150: 0.042014431208372116 Loss at step 200: 0.04877964407205582 Loss at step 250: 0.053028445690870285 Loss at step 300: 0.03556166961789131 Loss at step 350: 0.04489937424659729 Loss at step 400: 0.04536342993378639 Loss at step 450: 0.04286002367734909 Loss at step 500: 0.05612441524863243 Loss at step 550: 0.045176610350608826 Loss at step 600: 0.042835768312215805 Loss at step 650: 0.041226454079151154 Loss at step 700: 0.05275183543562889 Loss at step 750: 0.04104991629719734 Loss at step 800: 0.03207405284047127 Loss at step 850: 0.037956640124320984 Loss at step 900: 0.039791788905858994 Mean training loss after epoch 204: 0.044798484655903345 EPOCH: 205 Loss at step 0: 0.04019807651638985 Loss at step 50: 0.03741755709052086 Loss at step 100: 0.03701980784535408 Loss at step 150: 0.0568574033677578 Loss at step 200: 0.03935074433684349 Loss at step 250: 0.06374751776456833 Loss at step 300: 0.040833793580532074 Loss at step 350: 0.03745630383491516 Loss at step 400: 0.0638345405459404 Loss at step 450: 0.0363474115729332 Loss at step 500: 0.042533304542303085 Loss at step 550: 0.041598957031965256 Loss at step 600: 0.04374314099550247 Loss at step 650: 0.043570905923843384 Loss at step 700: 0.03491118550300598 Loss at step 750: 0.04161426052451134 Loss at step 800: 0.057267192751169205 Loss at step 850: 0.04025007411837578 Loss at step 900: 0.03690268099308014 Mean training loss after epoch 205: 0.04474620818337207 EPOCH: 206 Loss at step 0: 0.0488518625497818 Loss at step 50: 0.04371650889515877 Loss at step 100: 0.04654466733336449 Loss at step 150: 0.037092164158821106 Loss at step 200: 0.057837437838315964 Loss at step 250: 0.05549275875091553 Loss at step 300: 0.04101499170064926 Loss at step 350: 0.047131478786468506 Loss at step 400: 0.034627363085746765 Loss at step 450: 0.05197343975305557 Loss at step 500: 0.033131327480077744 Loss at step 550: 0.04608553647994995 Loss at step 600: 0.03043905459344387 Loss at step 650: 0.08885003626346588 Loss at step 700: 0.03925390914082527 Loss at step 750: 0.04234819859266281 Loss at step 800: 0.042675167322158813 Loss at step 850: 0.03622400388121605 Loss at step 900: 0.0501653216779232 Mean training loss after epoch 206: 0.04483646082916239 EPOCH: 207 Loss at step 0: 0.041329286992549896 Loss at step 50: 0.05029821768403053 Loss at step 100: 0.037916090339422226 Loss at step 150: 0.042153358459472656 Loss at step 200: 0.04264974594116211 Loss at step 250: 0.06181373447179794 Loss at step 300: 0.05471928045153618 Loss at step 350: 0.05230170115828514 Loss at step 400: 0.04071968421339989 Loss at step 450: 0.04595133662223816 Loss at step 500: 0.06311923265457153 Loss at step 550: 0.03960271552205086 Loss at step 600: 0.03320460021495819 Loss at step 650: 0.04211987182497978 Loss at step 700: 0.04706708341836929 Loss at step 750: 0.05702006071805954 Loss at step 800: 0.04131157323718071 Loss at step 850: 0.043862149119377136 Loss at step 900: 0.04290897771716118 Mean training loss after epoch 207: 0.04467255848326853 EPOCH: 208 Loss at step 0: 0.03583333268761635 Loss at step 50: 0.042231831699609756 Loss at step 100: 0.05500105395913124 Loss at step 150: 0.04096812382340431 Loss at step 200: 0.05559578165411949 Loss at step 250: 0.03439216688275337 Loss at step 300: 0.03562644124031067 Loss at step 350: 0.04618927836418152 Loss at step 400: 0.03577146306633949 Loss at step 450: 0.04384012520313263 Loss at step 500: 0.03789527341723442 Loss at step 550: 0.04118846356868744 Loss at step 600: 0.0528583824634552 Loss at step 650: 0.04037889093160629 Loss at step 700: 0.06652871519327164 Loss at step 750: 0.04567474126815796 Loss at step 800: 0.04373334348201752 Loss at step 850: 0.07000480592250824 Loss at step 900: 0.035133954137563705 Mean training loss after epoch 208: 0.045489177352059755 EPOCH: 209 Loss at step 0: 0.040241826325654984 Loss at step 50: 0.045737285166978836 Loss at step 100: 0.03868098556995392 Loss at step 150: 0.05752106010913849 Loss at step 200: 0.03498556837439537 Loss at step 250: 0.03756824880838394 Loss at step 300: 0.046026039868593216 Loss at step 350: 0.04700861871242523 Loss at step 400: 0.04535992071032524 Loss at step 450: 0.0361664853990078 Loss at step 500: 0.03679213672876358 Loss at step 550: 0.04113169014453888 Loss at step 600: 0.04239748790860176 Loss at step 650: 0.03956149145960808 Loss at step 700: 0.06336464732885361 Loss at step 750: 0.036619655787944794 Loss at step 800: 0.041176386177539825 Loss at step 850: 0.03490028902888298 Loss at step 900: 0.03709738329052925 Mean training loss after epoch 209: 0.04465725022687841 EPOCH: 210 Loss at step 0: 0.05687613785266876 Loss at step 50: 0.05013085901737213 Loss at step 100: 0.04858029633760452 Loss at step 150: 0.03588380664587021 Loss at step 200: 0.0372549332678318 Loss at step 250: 0.04096882417798042 Loss at step 300: 0.04422140121459961 Loss at step 350: 0.038299620151519775 Loss at step 400: 0.055386170744895935 Loss at step 450: 0.04114128649234772 Loss at step 500: 0.0361589752137661 Loss at step 550: 0.03733664005994797 Loss at step 600: 0.044938746839761734 Loss at step 650: 0.04162149876356125 Loss at step 700: 0.05209438130259514 Loss at step 750: 0.046597305685281754 Loss at step 800: 0.05847260728478432 Loss at step 850: 0.052423544228076935 Loss at step 900: 0.059762533754110336 Mean training loss after epoch 210: 0.04478163022929227 EPOCH: 211 Loss at step 0: 0.06691908836364746 Loss at step 50: 0.035693906247615814 Loss at step 100: 0.05166630819439888 Loss at step 150: 0.06433721631765366 Loss at step 200: 0.03782796114683151 Loss at step 250: 0.034301064908504486 Loss at step 300: 0.03778960555791855 Loss at step 350: 0.05144350603222847 Loss at step 400: 0.03950542211532593 Loss at step 450: 0.053962159901857376 Loss at step 500: 0.0466562956571579 Loss at step 550: 0.035127975046634674 Loss at step 600: 0.03515129163861275 Loss at step 650: 0.05232604593038559 Loss at step 700: 0.03469924256205559 Loss at step 750: 0.05438494682312012 Loss at step 800: 0.03877584636211395 Loss at step 850: 0.06565561145544052 Loss at step 900: 0.033948276191949844 Mean training loss after epoch 211: 0.04442757227892942 EPOCH: 212 Loss at step 0: 0.042861003428697586 Loss at step 50: 0.045693833380937576 Loss at step 100: 0.0561738982796669 Loss at step 150: 0.0551237016916275 Loss at step 200: 0.04741625860333443 Loss at step 250: 0.054285962134599686 Loss at step 300: 0.05603278428316116 Loss at step 350: 0.036363329738378525 Loss at step 400: 0.03463932126760483 Loss at step 450: 0.04528743773698807 Loss at step 500: 0.05638975277543068 Loss at step 550: 0.04864133149385452 Loss at step 600: 0.05686035752296448 Loss at step 650: 0.052209604531526566 Loss at step 700: 0.038732223212718964 Loss at step 750: 0.040247853845357895 Loss at step 800: 0.039523351937532425 Loss at step 850: 0.053021881729364395 Loss at step 900: 0.04180256277322769 Mean training loss after epoch 212: 0.04472841997382674 EPOCH: 213 Loss at step 0: 0.041997626423835754 Loss at step 50: 0.053836364299058914 Loss at step 100: 0.03242935985326767 Loss at step 150: 0.036811575293540955 Loss at step 200: 0.03557124361395836 Loss at step 250: 0.042105428874492645 Loss at step 300: 0.05059794709086418 Loss at step 350: 0.039921753108501434 Loss at step 400: 0.06578107178211212 Loss at step 450: 0.0527103915810585 Loss at step 500: 0.03761640191078186 Loss at step 550: 0.05834657698869705 Loss at step 600: 0.06001506373286247 Loss at step 650: 0.039713054895401 Loss at step 700: 0.04013936594128609 Loss at step 750: 0.04072520136833191 Loss at step 800: 0.03933260217308998 Loss at step 850: 0.041448961943387985 Loss at step 900: 0.057059962302446365 Mean training loss after epoch 213: 0.0447038265366131 EPOCH: 214 Loss at step 0: 0.030479082837700844 Loss at step 50: 0.04288707301020622 Loss at step 100: 0.038969360291957855 Loss at step 150: 0.041547369211912155 Loss at step 200: 0.05549389868974686 Loss at step 250: 0.06761427968740463 Loss at step 300: 0.037302155047655106 Loss at step 350: 0.05018205940723419 Loss at step 400: 0.03970324993133545 Loss at step 450: 0.03680818900465965 Loss at step 500: 0.037354111671447754 Loss at step 550: 0.04247164726257324 Loss at step 600: 0.038211919367313385 Loss at step 650: 0.05702434107661247 Loss at step 700: 0.0384671613574028 Loss at step 750: 0.05089595168828964 Loss at step 800: 0.037770140916109085 Loss at step 850: 0.03341638669371605 Loss at step 900: 0.06726159900426865 Mean training loss after epoch 214: 0.044803932519641515 EPOCH: 215 Loss at step 0: 0.056243039667606354 Loss at step 50: 0.03829198330640793 Loss at step 100: 0.034186385571956635 Loss at step 150: 0.04691140353679657 Loss at step 200: 0.05554024875164032 Loss at step 250: 0.03711654245853424 Loss at step 300: 0.04364906996488571 Loss at step 350: 0.03453472629189491 Loss at step 400: 0.05391367897391319 Loss at step 450: 0.037369951605796814 Loss at step 500: 0.03188815712928772 Loss at step 550: 0.05440067499876022 Loss at step 600: 0.03814352676272392 Loss at step 650: 0.04470740258693695 Loss at step 700: 0.05335073545575142 Loss at step 750: 0.03624849393963814 Loss at step 800: 0.03799010440707207 Loss at step 850: 0.03769124299287796 Loss at step 900: 0.03659210354089737 Mean training loss after epoch 215: 0.04471600990988679 EPOCH: 216 Loss at step 0: 0.03753291815519333 Loss at step 50: 0.03743390738964081 Loss at step 100: 0.06047659367322922 Loss at step 150: 0.0415237657725811 Loss at step 200: 0.03962986543774605 Loss at step 250: 0.05908052623271942 Loss at step 300: 0.04326390102505684 Loss at step 350: 0.05845440551638603 Loss at step 400: 0.05858081206679344 Loss at step 450: 0.03763098642230034 Loss at step 500: 0.055912960320711136 Loss at step 550: 0.07363441586494446 Loss at step 600: 0.040607403963804245 Loss at step 650: 0.05869431421160698 Loss at step 700: 0.032413434237241745 Loss at step 750: 0.04082751274108887 Loss at step 800: 0.04092312231659889 Loss at step 850: 0.04544469714164734 Loss at step 900: 0.043841440230607986 Mean training loss after epoch 216: 0.044679132363657705 EPOCH: 217 Loss at step 0: 0.05519341677427292 Loss at step 50: 0.0554828941822052 Loss at step 100: 0.04626215994358063 Loss at step 150: 0.04870392754673958 Loss at step 200: 0.04054103419184685 Loss at step 250: 0.05489875748753548 Loss at step 300: 0.03392302617430687 Loss at step 350: 0.036958612501621246 Loss at step 400: 0.035467639565467834 Loss at step 450: 0.03539278358221054 Loss at step 500: 0.03581181541085243 Loss at step 550: 0.03322763368487358 Loss at step 600: 0.03546096384525299 Loss at step 650: 0.03469721972942352 Loss at step 700: 0.0344327911734581 Loss at step 750: 0.03204134479165077 Loss at step 800: 0.04161359369754791 Loss at step 850: 0.034608472138643265 Loss at step 900: 0.05326886102557182 Mean training loss after epoch 217: 0.04492616937803561 EPOCH: 218 Loss at step 0: 0.04289891570806503 Loss at step 50: 0.04113813117146492 Loss at step 100: 0.05178505927324295 Loss at step 150: 0.03500144183635712 Loss at step 200: 0.04575960710644722 Loss at step 250: 0.06521622836589813 Loss at step 300: 0.03964005038142204 Loss at step 350: 0.06534741818904877 Loss at step 400: 0.03503140062093735 Loss at step 450: 0.05881740525364876 Loss at step 500: 0.04038887470960617 Loss at step 550: 0.040449853986501694 Loss at step 600: 0.033399682492017746 Loss at step 650: 0.05029311031103134 Loss at step 700: 0.03451422229409218 Loss at step 750: 0.031593941152095795 Loss at step 800: 0.03843049705028534 Loss at step 850: 0.05415230616927147 Loss at step 900: 0.0408015251159668 Mean training loss after epoch 218: 0.0445497410435444 EPOCH: 219 Loss at step 0: 0.053106870502233505 Loss at step 50: 0.037226706743240356 Loss at step 100: 0.052297212183475494 Loss at step 150: 0.05023274943232536 Loss at step 200: 0.03262278437614441 Loss at step 250: 0.03546475991606712 Loss at step 300: 0.07248827069997787 Loss at step 350: 0.03974536806344986 Loss at step 400: 0.03986159712076187 Loss at step 450: 0.038130249828100204 Loss at step 500: 0.041532181203365326 Loss at step 550: 0.04032932594418526 Loss at step 600: 0.03708921745419502 Loss at step 650: 0.041134849190711975 Loss at step 700: 0.0422060526907444 Loss at step 750: 0.06252138316631317 Loss at step 800: 0.039511941373348236 Loss at step 850: 0.03460269048810005 Loss at step 900: 0.03706207126379013 Mean training loss after epoch 219: 0.04445110564269046 EPOCH: 220 Loss at step 0: 0.040481265634298325 Loss at step 50: 0.03706512972712517 Loss at step 100: 0.038593292236328125 Loss at step 150: 0.03473096713423729 Loss at step 200: 0.05504322052001953 Loss at step 250: 0.04259065166115761 Loss at step 300: 0.04101940989494324 Loss at step 350: 0.03838224336504936 Loss at step 400: 0.05303853377699852 Loss at step 450: 0.04808196797966957 Loss at step 500: 0.04001510515809059 Loss at step 550: 0.04007555544376373 Loss at step 600: 0.05779372155666351 Loss at step 650: 0.034573838114738464 Loss at step 700: 0.05608305707573891 Loss at step 750: 0.05965934693813324 Loss at step 800: 0.03866574913263321 Loss at step 850: 0.04101347178220749 Loss at step 900: 0.05478304252028465 Mean training loss after epoch 220: 0.04494452161559545 EPOCH: 221 Loss at step 0: 0.0368199422955513 Loss at step 50: 0.03767472133040428 Loss at step 100: 0.04108607769012451 Loss at step 150: 0.0377773717045784 Loss at step 200: 0.05094992369413376 Loss at step 250: 0.03807835280895233 Loss at step 300: 0.037324871867895126 Loss at step 350: 0.039368823170661926 Loss at step 400: 0.028165580704808235 Loss at step 450: 0.042976923286914825 Loss at step 500: 0.04643237590789795 Loss at step 550: 0.04107698053121567 Loss at step 600: 0.03591163456439972 Loss at step 650: 0.04017348960042 Loss at step 700: 0.05192948132753372 Loss at step 750: 0.036656223237514496 Loss at step 800: 0.07154205441474915 Loss at step 850: 0.037165142595767975 Loss at step 900: 0.0383647158741951 Mean training loss after epoch 221: 0.044025317461950694 EPOCH: 222 Loss at step 0: 0.04182730242609978 Loss at step 50: 0.04577288776636124 Loss at step 100: 0.03554667532444 Loss at step 150: 0.05106238275766373 Loss at step 200: 0.05746976658701897 Loss at step 250: 0.033837493509054184 Loss at step 300: 0.0384085476398468 Loss at step 350: 0.03922661766409874 Loss at step 400: 0.057755742222070694 Loss at step 450: 0.043094322085380554 Loss at step 500: 0.050574351102113724 Loss at step 550: 0.05048663541674614 Loss at step 600: 0.04478348791599274 Loss at step 650: 0.057520557194948196 Loss at step 700: 0.04994799196720123 Loss at step 750: 0.0636516734957695 Loss at step 800: 0.04923856258392334 Loss at step 850: 0.03554229065775871 Loss at step 900: 0.034735944122076035 Mean training loss after epoch 222: 0.044427500300999005 EPOCH: 223 Loss at step 0: 0.06132660061120987 Loss at step 50: 0.0543343760073185 Loss at step 100: 0.035205356776714325 Loss at step 150: 0.0341426283121109 Loss at step 200: 0.03658345341682434 Loss at step 250: 0.05199233442544937 Loss at step 300: 0.04163883626461029 Loss at step 350: 0.05298907309770584 Loss at step 400: 0.03930951654911041 Loss at step 450: 0.0401400551199913 Loss at step 500: 0.04041733592748642 Loss at step 550: 0.046828873455524445 Loss at step 600: 0.03263270482420921 Loss at step 650: 0.04142157733440399 Loss at step 700: 0.04249532148241997 Loss at step 750: 0.03847498446702957 Loss at step 800: 0.03887447342276573 Loss at step 850: 0.05396873876452446 Loss at step 900: 0.054782234132289886 Mean training loss after epoch 223: 0.04483108876197577 EPOCH: 224 Loss at step 0: 0.042199064046144485 Loss at step 50: 0.04361164569854736 Loss at step 100: 0.03491578623652458 Loss at step 150: 0.059895895421504974 Loss at step 200: 0.03664816543459892 Loss at step 250: 0.05232000723481178 Loss at step 300: 0.03586943820118904 Loss at step 350: 0.041511114686727524 Loss at step 400: 0.03577348217368126 Loss at step 450: 0.04632328078150749 Loss at step 500: 0.03640926629304886 Loss at step 550: 0.03210856765508652 Loss at step 600: 0.04217298701405525 Loss at step 650: 0.043735962361097336 Loss at step 700: 0.05557512119412422 Loss at step 750: 0.044240329414606094 Loss at step 800: 0.03677906095981598 Loss at step 850: 0.05325004830956459 Loss at step 900: 0.03961622342467308 Mean training loss after epoch 224: 0.044534495068606794 EPOCH: 225 Loss at step 0: 0.04622454568743706 Loss at step 50: 0.06210627779364586 Loss at step 100: 0.05201232433319092 Loss at step 150: 0.07234062999486923 Loss at step 200: 0.035236380994319916 Loss at step 250: 0.04349125549197197 Loss at step 300: 0.04192188382148743 Loss at step 350: 0.0352543368935585 Loss at step 400: 0.03887450322508812 Loss at step 450: 0.039311520755290985 Loss at step 500: 0.04055427759885788 Loss at step 550: 0.04948360100388527 Loss at step 600: 0.05527811124920845 Loss at step 650: 0.038460664451122284 Loss at step 700: 0.03765904903411865 Loss at step 750: 0.06473623216152191 Loss at step 800: 0.03736783191561699 Loss at step 850: 0.045657116919755936 Loss at step 900: 0.04801734536886215 Mean training loss after epoch 225: 0.04423962763028104 EPOCH: 226 Loss at step 0: 0.04440918564796448 Loss at step 50: 0.04214627668261528 Loss at step 100: 0.05155540257692337 Loss at step 150: 0.03420787677168846 Loss at step 200: 0.04445880651473999 Loss at step 250: 0.04135341942310333 Loss at step 300: 0.03764950856566429 Loss at step 350: 0.03555797412991524 Loss at step 400: 0.054966408759355545 Loss at step 450: 0.037553977221250534 Loss at step 500: 0.04235851392149925 Loss at step 550: 0.052324917167425156 Loss at step 600: 0.06798584759235382 Loss at step 650: 0.03689294308423996 Loss at step 700: 0.03582879900932312 Loss at step 750: 0.0461050420999527 Loss at step 800: 0.03891498222947121 Loss at step 850: 0.03825835511088371 Loss at step 900: 0.03618009015917778 Mean training loss after epoch 226: 0.04427418285317576 EPOCH: 227 Loss at step 0: 0.052530743181705475 Loss at step 50: 0.03534878045320511 Loss at step 100: 0.03571544960141182 Loss at step 150: 0.03456777706742287 Loss at step 200: 0.05414804443717003 Loss at step 250: 0.035873252898454666 Loss at step 300: 0.05691087990999222 Loss at step 350: 0.03897303715348244 Loss at step 400: 0.057685334235429764 Loss at step 450: 0.043648771941661835 Loss at step 500: 0.04307885095477104 Loss at step 550: 0.042105771601200104 Loss at step 600: 0.04015239328145981 Loss at step 650: 0.033844754099845886 Loss at step 700: 0.0431150421500206 Loss at step 750: 0.0348924845457077 Loss at step 800: 0.0334988534450531 Loss at step 850: 0.04003101587295532 Loss at step 900: 0.03588278219103813 Mean training loss after epoch 227: 0.04456468254351603 EPOCH: 228 Loss at step 0: 0.03684232011437416 Loss at step 50: 0.039001911878585815 Loss at step 100: 0.044856809079647064 Loss at step 150: 0.05445076897740364 Loss at step 200: 0.04170619696378708 Loss at step 250: 0.05851566419005394 Loss at step 300: 0.053710468113422394 Loss at step 350: 0.0749521404504776 Loss at step 400: 0.052428752183914185 Loss at step 450: 0.03739193454384804 Loss at step 500: 0.03277401253581047 Loss at step 550: 0.03896070271730423 Loss at step 600: 0.05095149949193001 Loss at step 650: 0.04222695156931877 Loss at step 700: 0.0360226184129715 Loss at step 750: 0.04675811156630516 Loss at step 800: 0.04110189899802208 Loss at step 850: 0.08451087027788162 Loss at step 900: 0.03852592781186104 Mean training loss after epoch 228: 0.04465478906102145 EPOCH: 229 Loss at step 0: 0.05108466371893883 Loss at step 50: 0.05194531008601189 Loss at step 100: 0.041818682104349136 Loss at step 150: 0.0384753979742527 Loss at step 200: 0.0405755415558815 Loss at step 250: 0.05246740207076073 Loss at step 300: 0.04146997258067131 Loss at step 350: 0.05254256725311279 Loss at step 400: 0.055959608405828476 Loss at step 450: 0.038753923028707504 Loss at step 500: 0.04197828844189644 Loss at step 550: 0.047750987112522125 Loss at step 600: 0.07170537114143372 Loss at step 650: 0.045384716242551804 Loss at step 700: 0.05604903772473335 Loss at step 750: 0.03814692422747612 Loss at step 800: 0.04901311919093132 Loss at step 850: 0.04933915287256241 Loss at step 900: 0.03400961682200432 Mean training loss after epoch 229: 0.044632837734322174 EPOCH: 230 Loss at step 0: 0.03669809550046921 Loss at step 50: 0.04018440097570419 Loss at step 100: 0.04279716685414314 Loss at step 150: 0.043286051601171494 Loss at step 200: 0.03525383397936821 Loss at step 250: 0.056565381586551666 Loss at step 300: 0.038992635905742645 Loss at step 350: 0.06072370335459709 Loss at step 400: 0.03555493429303169 Loss at step 450: 0.04520929604768753 Loss at step 500: 0.03647547587752342 Loss at step 550: 0.03704897686839104 Loss at step 600: 0.03750087693333626 Loss at step 650: 0.04368426650762558 Loss at step 700: 0.036438558250665665 Loss at step 750: 0.05498021841049194 Loss at step 800: 0.04066320136189461 Loss at step 850: 0.04316500201821327 Loss at step 900: 0.05370895564556122 Mean training loss after epoch 230: 0.04457185561778639 EPOCH: 231 Loss at step 0: 0.03770393133163452 Loss at step 50: 0.034805431962013245 Loss at step 100: 0.0444357730448246 Loss at step 150: 0.05281805247068405 Loss at step 200: 0.05921708047389984 Loss at step 250: 0.03885786607861519 Loss at step 300: 0.03273274376988411 Loss at step 350: 0.032848041504621506 Loss at step 400: 0.05635708570480347 Loss at step 450: 0.03409358486533165 Loss at step 500: 0.03834175318479538 Loss at step 550: 0.033972807228565216 Loss at step 600: 0.05163354054093361 Loss at step 650: 0.03665749356150627 Loss at step 700: 0.056586652994155884 Loss at step 750: 0.03766965866088867 Loss at step 800: 0.03514869138598442 Loss at step 850: 0.041739463806152344 Loss at step 900: 0.05937602370977402 Mean training loss after epoch 231: 0.044418961939606455 EPOCH: 232 Loss at step 0: 0.035402242094278336 Loss at step 50: 0.05405203625559807 Loss at step 100: 0.03572479635477066 Loss at step 150: 0.04400161653757095 Loss at step 200: 0.041011713445186615 Loss at step 250: 0.0564279779791832 Loss at step 300: 0.038131412118673325 Loss at step 350: 0.03998841345310211 Loss at step 400: 0.04402971640229225 Loss at step 450: 0.045692600309848785 Loss at step 500: 0.033435121178627014 Loss at step 550: 0.039971645921468735 Loss at step 600: 0.05079904943704605 Loss at step 650: 0.03339352831244469 Loss at step 700: 0.03948865458369255 Loss at step 750: 0.06508280336856842 Loss at step 800: 0.05607238784432411 Loss at step 850: 0.04319646209478378 Loss at step 900: 0.039168547838926315 Mean training loss after epoch 232: 0.044317588815763434 EPOCH: 233 Loss at step 0: 0.04465668275952339 Loss at step 50: 0.06741321831941605 Loss at step 100: 0.04849402606487274 Loss at step 150: 0.043645914644002914 Loss at step 200: 0.047043073922395706 Loss at step 250: 0.060325074940919876 Loss at step 300: 0.04393173009157181 Loss at step 350: 0.04822869226336479 Loss at step 400: 0.03814808279275894 Loss at step 450: 0.056601159274578094 Loss at step 500: 0.0417080782353878 Loss at step 550: 0.03766276314854622 Loss at step 600: 0.05231618881225586 Loss at step 650: 0.0407596118748188 Loss at step 700: 0.03828562796115875 Loss at step 750: 0.042527683079242706 Loss at step 800: 0.04012439027428627 Loss at step 850: 0.03502320125699043 Loss at step 900: 0.03792629763484001 Mean training loss after epoch 233: 0.04423730323579647 EPOCH: 234 Loss at step 0: 0.03807925432920456 Loss at step 50: 0.031229475513100624 Loss at step 100: 0.05022416263818741 Loss at step 150: 0.04872728884220123 Loss at step 200: 0.03880457952618599 Loss at step 250: 0.04016609117388725 Loss at step 300: 0.03846309706568718 Loss at step 350: 0.03860631585121155 Loss at step 400: 0.036757875233888626 Loss at step 450: 0.06192059814929962 Loss at step 500: 0.039699528366327286 Loss at step 550: 0.05446843057870865 Loss at step 600: 0.059930626302957535 Loss at step 650: 0.04280960187315941 Loss at step 700: 0.04683290049433708 Loss at step 750: 0.039119504392147064 Loss at step 800: 0.04021736979484558 Loss at step 850: 0.03949132561683655 Loss at step 900: 0.03685220330953598 Mean training loss after epoch 234: 0.04406493270932548 EPOCH: 235 Loss at step 0: 0.04134300351142883 Loss at step 50: 0.03510057553648949 Loss at step 100: 0.05569342151284218 Loss at step 150: 0.04845584183931351 Loss at step 200: 0.05550575256347656 Loss at step 250: 0.03954542055726051 Loss at step 300: 0.04809613153338432 Loss at step 350: 0.03877285495400429 Loss at step 400: 0.04220467060804367 Loss at step 450: 0.04409005492925644 Loss at step 500: 0.06992071121931076 Loss at step 550: 0.04231961816549301 Loss at step 600: 0.03597664833068848 Loss at step 650: 0.05228782072663307 Loss at step 700: 0.0393548309803009 Loss at step 750: 0.05624103546142578 Loss at step 800: 0.04046499729156494 Loss at step 850: 0.03583335131406784 Loss at step 900: 0.04333672299981117 Mean training loss after epoch 235: 0.04430317855092572 EPOCH: 236 Loss at step 0: 0.032516445964574814 Loss at step 50: 0.04008984565734863 Loss at step 100: 0.0352465845644474 Loss at step 150: 0.046643227338790894 Loss at step 200: 0.059407856315374374 Loss at step 250: 0.037334807217121124 Loss at step 300: 0.062338944524526596 Loss at step 350: 0.04171206057071686 Loss at step 400: 0.04033081233501434 Loss at step 450: 0.05590067803859711 Loss at step 500: 0.04499054700136185 Loss at step 550: 0.05553867295384407 Loss at step 600: 0.045173805207014084 Loss at step 650: 0.07901201397180557 Loss at step 700: 0.040791742503643036 Loss at step 750: 0.030054846778512 Loss at step 800: 0.03897227346897125 Loss at step 850: 0.04771837964653969 Loss at step 900: 0.03987076133489609 Mean training loss after epoch 236: 0.04483670831076118 EPOCH: 237 Loss at step 0: 0.0510728657245636 Loss at step 50: 0.0418047197163105 Loss at step 100: 0.03850840404629707 Loss at step 150: 0.056453071534633636 Loss at step 200: 0.062162622809410095 Loss at step 250: 0.03496583551168442 Loss at step 300: 0.05549657344818115 Loss at step 350: 0.04142916947603226 Loss at step 400: 0.04347146302461624 Loss at step 450: 0.0333283506333828 Loss at step 500: 0.03846332058310509 Loss at step 550: 0.048594940453767776 Loss at step 600: 0.029942316934466362 Loss at step 650: 0.04789109155535698 Loss at step 700: 0.06755321472883224 Loss at step 750: 0.033395860344171524 Loss at step 800: 0.03372882306575775 Loss at step 850: 0.035078924149274826 Loss at step 900: 0.055053096264600754 Mean training loss after epoch 237: 0.04417390492893676 EPOCH: 238 Loss at step 0: 0.05147320777177811 Loss at step 50: 0.039187587797641754 Loss at step 100: 0.055001165717840195 Loss at step 150: 0.03846966475248337 Loss at step 200: 0.03929845988750458 Loss at step 250: 0.03281696140766144 Loss at step 300: 0.032586198300123215 Loss at step 350: 0.04078591242432594 Loss at step 400: 0.0413895919919014 Loss at step 450: 0.03794271498918533 Loss at step 500: 0.03986990824341774 Loss at step 550: 0.05859588086605072 Loss at step 600: 0.04253333434462547 Loss at step 650: 0.06633122265338898 Loss at step 700: 0.04219958186149597 Loss at step 750: 0.04258506000041962 Loss at step 800: 0.038704387843608856 Loss at step 850: 0.04238678514957428 Loss at step 900: 0.041586652398109436 Mean training loss after epoch 238: 0.04494136210475395 EPOCH: 239 Loss at step 0: 0.041239745914936066 Loss at step 50: 0.04201152175664902 Loss at step 100: 0.05637505650520325 Loss at step 150: 0.03948371484875679 Loss at step 200: 0.04181200638413429 Loss at step 250: 0.03835396468639374 Loss at step 300: 0.03822080418467522 Loss at step 350: 0.04060918465256691 Loss at step 400: 0.035568591207265854 Loss at step 450: 0.04555300623178482 Loss at step 500: 0.04333491250872612 Loss at step 550: 0.03358202427625656 Loss at step 600: 0.05711157247424126 Loss at step 650: 0.04011398181319237 Loss at step 700: 0.05212013050913811 Loss at step 750: 0.035279735922813416 Loss at step 800: 0.04327112063765526 Loss at step 850: 0.038576412945985794 Loss at step 900: 0.05415380001068115 Mean training loss after epoch 239: 0.04447213252549614 EPOCH: 240 Loss at step 0: 0.06922220438718796 Loss at step 50: 0.04003436118364334 Loss at step 100: 0.03700927644968033 Loss at step 150: 0.03861986845731735 Loss at step 200: 0.03814982250332832 Loss at step 250: 0.03455357998609543 Loss at step 300: 0.03514722362160683 Loss at step 350: 0.053300146013498306 Loss at step 400: 0.04403198882937431 Loss at step 450: 0.037878118455410004 Loss at step 500: 0.03442619368433952 Loss at step 550: 0.03510485216975212 Loss at step 600: 0.040278252214193344 Loss at step 650: 0.03906048461794853 Loss at step 700: 0.03850758820772171 Loss at step 750: 0.053010500967502594 Loss at step 800: 0.05290995538234711 Loss at step 850: 0.045766741037368774 Loss at step 900: 0.04373601824045181 Mean training loss after epoch 240: 0.0441294677380814 EPOCH: 241 Loss at step 0: 0.042439430952072144 Loss at step 50: 0.040793824940919876 Loss at step 100: 0.04385942593216896 Loss at step 150: 0.05269765853881836 Loss at step 200: 0.051611509174108505 Loss at step 250: 0.04552067443728447 Loss at step 300: 0.037669532001018524 Loss at step 350: 0.03692082315683365 Loss at step 400: 0.04439778998494148 Loss at step 450: 0.038996513932943344 Loss at step 500: 0.03799392282962799 Loss at step 550: 0.03770509362220764 Loss at step 600: 0.04597807675600052 Loss at step 650: 0.04291791096329689 Loss at step 700: 0.035662781447172165 Loss at step 750: 0.05241561308503151 Loss at step 800: 0.034455034881830215 Loss at step 850: 0.039081890136003494 Loss at step 900: 0.04445033147931099 Mean training loss after epoch 241: 0.04406139072872746 EPOCH: 242 Loss at step 0: 0.04944797232747078 Loss at step 50: 0.05216164514422417 Loss at step 100: 0.04875577241182327 Loss at step 150: 0.0673205703496933 Loss at step 200: 0.04306604713201523 Loss at step 250: 0.03722129017114639 Loss at step 300: 0.03307698294520378 Loss at step 350: 0.0400797463953495 Loss at step 400: 0.0405728705227375 Loss at step 450: 0.049708083271980286 Loss at step 500: 0.051399968564510345 Loss at step 550: 0.05209653079509735 Loss at step 600: 0.05289535969495773 Loss at step 650: 0.053849395364522934 Loss at step 700: 0.039101503789424896 Loss at step 750: 0.03636116161942482 Loss at step 800: 0.0371871255338192 Loss at step 850: 0.03941494598984718 Loss at step 900: 0.03299860283732414 Mean training loss after epoch 242: 0.0445653403273174 EPOCH: 243 Loss at step 0: 0.041460614651441574 Loss at step 50: 0.04175207391381264 Loss at step 100: 0.054093774408102036 Loss at step 150: 0.040323127061128616 Loss at step 200: 0.03668613359332085 Loss at step 250: 0.03947689011693001 Loss at step 300: 0.04163401946425438 Loss at step 350: 0.03339975327253342 Loss at step 400: 0.03803552687168121 Loss at step 450: 0.03810008242726326 Loss at step 500: 0.040087487548589706 Loss at step 550: 0.039286136627197266 Loss at step 600: 0.04391473904252052 Loss at step 650: 0.04659884050488472 Loss at step 700: 0.04182414710521698 Loss at step 750: 0.04138852655887604 Loss at step 800: 0.03558828681707382 Loss at step 850: 0.041529227048158646 Loss at step 900: 0.054287709295749664 Mean training loss after epoch 243: 0.04456375608208782 EPOCH: 244 Loss at step 0: 0.03862342983484268 Loss at step 50: 0.03699956461787224 Loss at step 100: 0.04541103169322014 Loss at step 150: 0.054182205349206924 Loss at step 200: 0.035459570586681366 Loss at step 250: 0.06665021181106567 Loss at step 300: 0.03785838931798935 Loss at step 350: 0.04772230610251427 Loss at step 400: 0.07057550549507141 Loss at step 450: 0.04496711492538452 Loss at step 500: 0.03899592161178589 Loss at step 550: 0.03864012286067009 Loss at step 600: 0.05862945690751076 Loss at step 650: 0.03949219360947609 Loss at step 700: 0.03607963025569916 Loss at step 750: 0.03490065410733223 Loss at step 800: 0.039182137697935104 Loss at step 850: 0.036437973380088806 Loss at step 900: 0.038373224437236786 Mean training loss after epoch 244: 0.04429702969915323 EPOCH: 245 Loss at step 0: 0.0385575145483017 Loss at step 50: 0.035694535821676254 Loss at step 100: 0.03993331640958786 Loss at step 150: 0.04976116865873337 Loss at step 200: 0.07041259855031967 Loss at step 250: 0.043676652014255524 Loss at step 300: 0.037697285413742065 Loss at step 350: 0.05178793892264366 Loss at step 400: 0.041463788598775864 Loss at step 450: 0.03819473460316658 Loss at step 500: 0.04713521525263786 Loss at step 550: 0.030759654939174652 Loss at step 600: 0.03611500933766365 Loss at step 650: 0.055565204471349716 Loss at step 700: 0.054949138313531876 Loss at step 750: 0.04036974161863327 Loss at step 800: 0.0694025382399559 Loss at step 850: 0.07210106402635574 Loss at step 900: 0.054195452481508255 Mean training loss after epoch 245: 0.043982705857946294 EPOCH: 246 Loss at step 0: 0.046563711017370224 Loss at step 50: 0.04037541523575783 Loss at step 100: 0.06303931027650833 Loss at step 150: 0.04221680760383606 Loss at step 200: 0.05687280371785164 Loss at step 250: 0.0410008430480957 Loss at step 300: 0.037717729806900024 Loss at step 350: 0.038490865379571915 Loss at step 400: 0.04085135832428932 Loss at step 450: 0.04239315912127495 Loss at step 500: 0.05287067964673042 Loss at step 550: 0.054484669119119644 Loss at step 600: 0.038821443915367126 Loss at step 650: 0.06903842091560364 Loss at step 700: 0.04015694931149483 Loss at step 750: 0.04011210426688194 Loss at step 800: 0.0451258160173893 Loss at step 850: 0.04654308035969734 Loss at step 900: 0.03742603957653046 Mean training loss after epoch 246: 0.044348204621612264 EPOCH: 247 Loss at step 0: 0.04142046719789505 Loss at step 50: 0.033297233283519745 Loss at step 100: 0.046769872307777405 Loss at step 150: 0.037833306938409805 Loss at step 200: 0.036816515028476715 Loss at step 250: 0.05887838080525398 Loss at step 300: 0.059113334864377975 Loss at step 350: 0.05297510698437691 Loss at step 400: 0.04204738885164261 Loss at step 450: 0.032779913395643234 Loss at step 500: 0.057316750288009644 Loss at step 550: 0.03848464414477348 Loss at step 600: 0.03648789972066879 Loss at step 650: 0.0383773110806942 Loss at step 700: 0.04066190496087074 Loss at step 750: 0.03606988489627838 Loss at step 800: 0.04675501585006714 Loss at step 850: 0.038642220199108124 Loss at step 900: 0.03858107700943947 Mean training loss after epoch 247: 0.04431582274022641 EPOCH: 248 Loss at step 0: 0.05418825149536133 Loss at step 50: 0.03973623737692833 Loss at step 100: 0.05273979529738426 Loss at step 150: 0.0548342801630497 Loss at step 200: 0.041761841624975204 Loss at step 250: 0.03664913401007652 Loss at step 300: 0.03196101635694504 Loss at step 350: 0.041133616119623184 Loss at step 400: 0.04098746180534363 Loss at step 450: 0.03807763755321503 Loss at step 500: 0.040364254266023636 Loss at step 550: 0.03996184468269348 Loss at step 600: 0.04357937350869179 Loss at step 650: 0.0520884245634079 Loss at step 700: 0.030209943652153015 Loss at step 750: 0.041414737701416016 Loss at step 800: 0.053582292050123215 Loss at step 850: 0.03609104081988335 Loss at step 900: 0.040039170533418655 Mean training loss after epoch 248: 0.04434315699226122 EPOCH: 249 Loss at step 0: 0.03948770835995674 Loss at step 50: 0.0406712181866169 Loss at step 100: 0.04347073659300804 Loss at step 150: 0.04676469787955284 Loss at step 200: 0.07193492352962494 Loss at step 250: 0.03555160015821457 Loss at step 300: 0.03707060590386391 Loss at step 350: 0.046947382390499115 Loss at step 400: 0.03827507421374321 Loss at step 450: 0.04179251939058304 Loss at step 500: 0.04160206392407417 Loss at step 550: 0.07334379851818085 Loss at step 600: 0.04077136516571045 Loss at step 650: 0.052336789667606354 Loss at step 700: 0.04818994924426079 Loss at step 750: 0.051759831607341766 Loss at step 800: 0.045137569308280945 Loss at step 850: 0.03760245069861412 Loss at step 900: 0.05863180756568909 Mean training loss after epoch 249: 0.044215300583095946 EPOCH: 250 Loss at step 0: 0.038875363767147064 Loss at step 50: 0.037620995193719864 Loss at step 100: 0.03692841902375221 Loss at step 150: 0.03940868750214577 Loss at step 200: 0.03740055114030838 Loss at step 250: 0.03367472440004349 Loss at step 300: 0.038468316197395325 Loss at step 350: 0.037713922560214996 Loss at step 400: 0.036685965955257416 Loss at step 450: 0.03785112500190735 Loss at step 500: 0.04122714698314667 Loss at step 550: 0.03726314753293991 Loss at step 600: 0.03256077319383621 Loss at step 650: 0.055076733231544495 Loss at step 700: 0.03816482797265053 Loss at step 750: 0.055582866072654724 Loss at step 800: 0.08860261738300323 Loss at step 850: 0.03718943893909454 Loss at step 900: 0.05643482133746147 Mean training loss after epoch 250: 0.04433973120259387 EPOCH: 251 Loss at step 0: 0.06810788810253143 Loss at step 50: 0.041349876672029495 Loss at step 100: 0.03786836192011833 Loss at step 150: 0.032446783035993576 Loss at step 200: 0.041399210691452026 Loss at step 250: 0.03198905289173126 Loss at step 300: 0.03318614140152931 Loss at step 350: 0.05619623139500618 Loss at step 400: 0.03853699192404747 Loss at step 450: 0.03942260891199112 Loss at step 500: 0.06082315370440483 Loss at step 550: 0.03650325909256935 Loss at step 600: 0.04712916910648346 Loss at step 650: 0.035325147211551666 Loss at step 700: 0.03782686963677406 Loss at step 750: 0.03523416817188263 Loss at step 800: 0.03904144838452339 Loss at step 850: 0.041671376675367355 Loss at step 900: 0.051437921822071075 Mean training loss after epoch 251: 0.04508220699669392 EPOCH: 252 Loss at step 0: 0.041207026690244675 Loss at step 50: 0.038896676152944565 Loss at step 100: 0.04663672670722008 Loss at step 150: 0.03690395876765251 Loss at step 200: 0.03591988980770111 Loss at step 250: 0.0406290739774704 Loss at step 300: 0.03737616539001465 Loss at step 350: 0.03158732131123543 Loss at step 400: 0.03977910801768303 Loss at step 450: 0.0533040389418602 Loss at step 500: 0.04981403797864914 Loss at step 550: 0.04212784022092819 Loss at step 600: 0.03667016699910164 Loss at step 650: 0.038131147623062134 Loss at step 700: 0.03542795404791832 Loss at step 750: 0.043604787439107895 Loss at step 800: 0.038433175534009933 Loss at step 850: 0.055870309472084045 Loss at step 900: 0.0375676192343235 Mean training loss after epoch 252: 0.04482186316593941 EPOCH: 253 Loss at step 0: 0.05503140017390251 Loss at step 50: 0.03788676857948303 Loss at step 100: 0.055284880101680756 Loss at step 150: 0.0358143150806427 Loss at step 200: 0.05510799214243889 Loss at step 250: 0.04058799520134926 Loss at step 300: 0.0347650907933712 Loss at step 350: 0.0616508387029171 Loss at step 400: 0.03940384089946747 Loss at step 450: 0.03657587990164757 Loss at step 500: 0.05826427415013313 Loss at step 550: 0.041848715394735336 Loss at step 600: 0.06460398435592651 Loss at step 650: 0.039536163210868835 Loss at step 700: 0.05656956136226654 Loss at step 750: 0.05238109454512596 Loss at step 800: 0.04120216146111488 Loss at step 850: 0.0391661636531353 Loss at step 900: 0.04305194690823555 Mean training loss after epoch 253: 0.04483300463151512 EPOCH: 254 Loss at step 0: 0.03971615061163902 Loss at step 50: 0.05664581060409546 Loss at step 100: 0.05656798928976059 Loss at step 150: 0.05187411978840828 Loss at step 200: 0.04007023572921753 Loss at step 250: 0.037236735224723816 Loss at step 300: 0.039175327867269516 Loss at step 350: 0.055862776935100555 Loss at step 400: 0.04621698334813118 Loss at step 450: 0.04217910021543503 Loss at step 500: 0.061341363936662674 Loss at step 550: 0.07238207757472992 Loss at step 600: 0.033694226294755936 Loss at step 650: 0.037575799971818924 Loss at step 700: 0.05597231164574623 Loss at step 750: 0.034979090094566345 Loss at step 800: 0.03639863058924675 Loss at step 850: 0.0475204698741436 Loss at step 900: 0.05808389186859131 Mean training loss after epoch 254: 0.04437024319874071 EPOCH: 255 Loss at step 0: 0.039444949477910995 Loss at step 50: 0.04160661995410919 Loss at step 100: 0.04202868044376373 Loss at step 150: 0.04580506309866905 Loss at step 200: 0.04085221141576767 Loss at step 250: 0.04214933514595032 Loss at step 300: 0.058081235736608505 Loss at step 350: 0.03845188021659851 Loss at step 400: 0.04343468323349953 Loss at step 450: 0.037098903208971024 Loss at step 500: 0.03605657443404198 Loss at step 550: 0.0458068884909153 Loss at step 600: 0.056568074971437454 Loss at step 650: 0.03963596746325493 Loss at step 700: 0.051961176097393036 Loss at step 750: 0.04422375187277794 Loss at step 800: 0.03969626873731613 Loss at step 850: 0.03999221324920654 Loss at step 900: 0.04070429503917694 Mean training loss after epoch 255: 0.04411663428775029 EPOCH: 256 Loss at step 0: 0.05216563493013382 Loss at step 50: 0.05301661416888237 Loss at step 100: 0.0539783351123333 Loss at step 150: 0.04018509015440941 Loss at step 200: 0.03229360282421112 Loss at step 250: 0.03559143468737602 Loss at step 300: 0.0367603674530983 Loss at step 350: 0.053487278521060944 Loss at step 400: 0.054278429597616196 Loss at step 450: 0.03296155855059624 Loss at step 500: 0.03843662515282631 Loss at step 550: 0.05298559367656708 Loss at step 600: 0.04144495353102684 Loss at step 650: 0.04829194396734238 Loss at step 700: 0.04341008514165878 Loss at step 750: 0.04183496534824371 Loss at step 800: 0.05452213063836098 Loss at step 850: 0.040023621171712875 Loss at step 900: 0.03979317843914032 Mean training loss after epoch 256: 0.04401584711076736