/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? warn( Schedule: cosine Cfg: True Output path: /scratch/shared/beegfs/gabrijel/m2l/mini Patch Size: 4 Device: cuda:2 ===================================================================================== Layer (type:depth-idx) Param # ===================================================================================== DiT 18,816 ├─PatchEmbed: 1-1 -- │ └─Conv2d: 2-1 6,528 ├─TimestepEmbedder: 1-2 -- │ └─Mlp: 2-2 -- │ │ └─Linear: 3-1 98,688 │ │ └─SiLU: 3-2 -- │ │ └─Linear: 3-3 147,840 ├─LabelEmbedder: 1-3 -- │ └─Embedding: 2-3 4,224 ├─ModuleList: 1-4 -- │ └─DiTBlock: 2-4 -- │ │ └─LayerNorm: 3-4 -- │ │ └─MultiheadAttention: 3-5 591,360 │ │ └─LayerNorm: 3-6 -- │ │ └─Mlp: 3-7 1,181,568 │ │ └─Sequential: 3-8 887,040 │ └─DiTBlock: 2-5 -- │ │ └─LayerNorm: 3-9 -- │ │ └─MultiheadAttention: 3-10 591,360 │ │ └─LayerNorm: 3-11 -- │ │ └─Mlp: 3-12 1,181,568 │ │ └─Sequential: 3-13 887,040 │ └─DiTBlock: 2-6 -- │ │ └─LayerNorm: 3-14 -- │ │ └─MultiheadAttention: 3-15 591,360 │ │ └─LayerNorm: 3-16 -- │ │ └─Mlp: 3-17 1,181,568 │ │ └─Sequential: 3-18 887,040 │ └─DiTBlock: 2-7 -- │ │ └─LayerNorm: 3-19 -- │ │ └─MultiheadAttention: 3-20 591,360 │ │ └─LayerNorm: 3-21 -- │ │ └─Mlp: 3-22 1,181,568 │ │ └─Sequential: 3-23 887,040 │ └─DiTBlock: 2-8 -- │ │ └─LayerNorm: 3-24 -- │ │ └─MultiheadAttention: 3-25 591,360 │ │ └─LayerNorm: 3-26 -- │ │ └─Mlp: 3-27 1,181,568 │ │ └─Sequential: 3-28 887,040 │ └─DiTBlock: 2-9 -- │ │ └─LayerNorm: 3-29 -- │ │ └─MultiheadAttention: 3-30 591,360 │ │ └─LayerNorm: 3-31 -- │ │ └─Mlp: 3-32 1,181,568 │ │ └─Sequential: 3-33 887,040 ├─FinalLayer: 1-5 -- │ └─LayerNorm: 2-10 -- │ └─Linear: 2-11 6,160 │ └─Sequential: 2-12 -- │ │ └─SiLU: 3-34 -- │ │ └─Linear: 3-35 295,680 ├─Unpatchify: 1-6 -- ===================================================================================== Total params: 16,537,744 Trainable params: 16,518,928 Non-trainable params: 18,816 ===================================================================================== EPOCH: 1 Loss at step 0: 1.0032988786697388 Loss at step 50: 0.28100234270095825 Loss at step 100: 0.15790344774723053 Loss at step 150: 0.1532917022705078 Loss at step 200: 0.11566716432571411 Loss at step 250: 0.14305005967617035 Loss at step 300: 0.12596340477466583 Loss at step 350: 0.14476105570793152 Loss at step 400: 0.10752303153276443 Loss at step 450: 0.11066831648349762 Loss at step 500: 0.10961861163377762 Loss at step 550: 0.11502250283956528 Loss at step 600: 0.094396211206913 Loss at step 650: 0.10382624715566635 Loss at step 700: 0.09550666809082031 Loss at step 750: 0.09472190588712692 Loss at step 800: 0.09662456065416336 Loss at step 850: 0.08198032528162003 Loss at step 900: 0.08515343815088272 Mean training loss after epoch 1: 0.1540481198602902 EPOCH: 2 Loss at step 0: 0.11917328834533691 Loss at step 50: 0.08881281316280365 Loss at step 100: 0.08945430815219879 Loss at step 150: 0.07754352688789368 Loss at step 200: 0.09049089252948761 Loss at step 250: 0.11009038984775543 Loss at step 300: 0.1079808920621872 Loss at step 350: 0.08573401719331741 Loss at step 400: 0.08793515712022781 Loss at step 450: 0.0909033939242363 Loss at step 500: 0.07937399297952652 Loss at step 550: 0.1291026473045349 Loss at step 600: 0.08784221112728119 Loss at step 650: 0.07763306051492691 Loss at step 700: 0.0776451900601387 Loss at step 750: 0.07036770135164261 Loss at step 800: 0.07073037326335907 Loss at step 850: 0.07468012720346451 Loss at step 900: 0.08575412631034851 Mean training loss after epoch 2: 0.08906067006671226 EPOCH: 3 Loss at step 0: 0.09433022886514664 Loss at step 50: 0.07529403269290924 Loss at step 100: 0.0672491043806076 Loss at step 150: 0.09711700677871704 Loss at step 200: 0.08488614112138748 Loss at step 250: 0.07206659018993378 Loss at step 300: 0.1012728363275528 Loss at step 350: 0.09898691624403 Loss at step 400: 0.08937603235244751 Loss at step 450: 0.07042986899614334 Loss at step 500: 0.07934796065092087 Loss at step 550: 0.06464157998561859 Loss at step 600: 0.06833136826753616 Loss at step 650: 0.06996160745620728 Loss at step 700: 0.08818931877613068 Loss at step 750: 0.06187369301915169 Loss at step 800: 0.076381616294384 Loss at step 850: 0.07929016649723053 Loss at step 900: 0.07780424505472183 Mean training loss after epoch 3: 0.08227174257688812 EPOCH: 4 Loss at step 0: 0.08530061691999435 Loss at step 50: 0.08677544444799423 Loss at step 100: 0.08733125776052475 Loss at step 150: 0.07325705140829086 Loss at step 200: 0.05939679592847824 Loss at step 250: 0.057559989392757416 Loss at step 300: 0.07129830867052078 Loss at step 350: 0.08439402282238007 Loss at step 400: 0.0764765813946724 Loss at step 450: 0.06626784056425095 Loss at step 500: 0.07943372428417206 Loss at step 550: 0.0881531834602356 Loss at step 600: 0.09719893336296082 Loss at step 650: 0.09041012078523636 Loss at step 700: 0.06861002743244171 Loss at step 750: 0.07708974927663803 Loss at step 800: 0.07114200294017792 Loss at step 850: 0.09723278135061264 Loss at step 900: 0.07782142609357834 Mean training loss after epoch 4: 0.07874018226716437 EPOCH: 5 Loss at step 0: 0.06798771768808365 Loss at step 50: 0.06197076663374901 Loss at step 100: 0.07914486527442932 Loss at step 150: 0.07449166476726532 Loss at step 200: 0.0548844113945961 Loss at step 250: 0.061390623450279236 Loss at step 300: 0.08183029294013977 Loss at step 350: 0.08273658156394958 Loss at step 400: 0.0868213102221489 Loss at step 450: 0.07743315398693085 Loss at step 500: 0.061679523438215256 Loss at step 550: 0.06548775732517242 Loss at step 600: 0.0630403533577919 Loss at step 650: 0.06152629852294922 Loss at step 700: 0.0628242939710617 Loss at step 750: 0.09506598114967346 Loss at step 800: 0.06352828443050385 Loss at step 850: 0.05398479476571083 Loss at step 900: 0.08300777524709702 Mean training loss after epoch 5: 0.07554647528222883 EPOCH: 6 Loss at step 0: 0.08550689369440079 Loss at step 50: 0.06561055034399033 Loss at step 100: 0.10026326030492783 Loss at step 150: 0.058517780154943466 Loss at step 200: 0.0748281329870224 Loss at step 250: 0.07203217595815659 Loss at step 300: 0.08951158821582794 Loss at step 350: 0.09763558954000473 Loss at step 400: 0.07416783273220062 Loss at step 450: 0.07072515040636063 Loss at step 500: 0.051279231905937195 Loss at step 550: 0.07496016472578049 Loss at step 600: 0.053522173315286636 Loss at step 650: 0.06426257640123367 Loss at step 700: 0.05862976610660553 Loss at step 750: 0.06409177929162979 Loss at step 800: 0.05744431912899017 Loss at step 850: 0.05425829067826271 Loss at step 900: 0.07321682572364807 Mean training loss after epoch 6: 0.06662005164237546 EPOCH: 7 Loss at step 0: 0.0585484579205513 Loss at step 50: 0.0657687857747078 Loss at step 100: 0.05798777565360069 Loss at step 150: 0.0729501023888588 Loss at step 200: 0.052548374980688095 Loss at step 250: 0.05256379768252373 Loss at step 300: 0.051424648612737656 Loss at step 350: 0.07288046926259995 Loss at step 400: 0.07918024808168411 Loss at step 450: 0.05400697514414787 Loss at step 500: 0.05738596245646477 Loss at step 550: 0.06573101133108139 Loss at step 600: 0.044168710708618164 Loss at step 650: 0.07504048198461533 Loss at step 700: 0.0842641219496727 Loss at step 750: 0.05520966649055481 Loss at step 800: 0.0748949646949768 Loss at step 850: 0.05452551692724228 Loss at step 900: 0.05990322679281235 Mean training loss after epoch 7: 0.06117044287576858 EPOCH: 8 Loss at step 0: 0.053899168968200684 Loss at step 50: 0.05785033106803894 Loss at step 100: 0.07317903637886047 Loss at step 150: 0.06414580345153809 Loss at step 200: 0.061996642500162125 Loss at step 250: 0.04947176203131676 Loss at step 300: 0.05572287738323212 Loss at step 350: 0.07422453910112381 Loss at step 400: 0.06327630579471588 Loss at step 450: 0.04263661429286003 Loss at step 500: 0.04761233553290367 Loss at step 550: 0.060575030744075775 Loss at step 600: 0.057951945811510086 Loss at step 650: 0.07642796635627747 Loss at step 700: 0.05623459070920944 Loss at step 750: 0.05784178525209427 Loss at step 800: 0.04385168105363846 Loss at step 850: 0.07238267362117767 Loss at step 900: 0.051058147102594376 Mean training loss after epoch 8: 0.058506400060297836 EPOCH: 9 Loss at step 0: 0.06567651033401489 Loss at step 50: 0.05011646822094917 Loss at step 100: 0.0442323200404644 Loss at step 150: 0.07979139685630798 Loss at step 200: 0.059566106647253036 Loss at step 250: 0.04721330851316452 Loss at step 300: 0.05740227922797203 Loss at step 350: 0.04490107297897339 Loss at step 400: 0.07325353473424911 Loss at step 450: 0.05293695256114006 Loss at step 500: 0.0682472437620163 Loss at step 550: 0.05692541226744652 Loss at step 600: 0.04486086964607239 Loss at step 650: 0.05205613747239113 Loss at step 700: 0.0504949577152729 Loss at step 750: 0.060046520084142685 Loss at step 800: 0.04393153265118599 Loss at step 850: 0.041282135993242264 Loss at step 900: 0.04571003094315529 Mean training loss after epoch 9: 0.05718105828075775 EPOCH: 10 Loss at step 0: 0.04927707463502884 Loss at step 50: 0.052660632878541946 Loss at step 100: 0.05112871155142784 Loss at step 150: 0.04420886188745499 Loss at step 200: 0.053314704447984695 Loss at step 250: 0.05112135410308838 Loss at step 300: 0.04187316820025444 Loss at step 350: 0.05586050823330879 Loss at step 400: 0.04854384809732437 Loss at step 450: 0.0473695732653141 Loss at step 500: 0.05124212056398392 Loss at step 550: 0.044917549937963486 Loss at step 600: 0.05567365512251854 Loss at step 650: 0.049273595213890076 Loss at step 700: 0.05710634961724281 Loss at step 750: 0.055023930966854095 Loss at step 800: 0.0506950281560421 Loss at step 850: 0.06732702255249023 Loss at step 900: 0.04814585670828819 Mean training loss after epoch 10: 0.05586914197087034 EPOCH: 11 Loss at step 0: 0.05619771406054497 Loss at step 50: 0.04649684950709343 Loss at step 100: 0.055291302502155304 Loss at step 150: 0.05079489201307297 Loss at step 200: 0.04974418506026268 Loss at step 250: 0.04739781841635704 Loss at step 300: 0.0468808077275753 Loss at step 350: 0.07444166392087936 Loss at step 400: 0.04982239753007889 Loss at step 450: 0.08256076276302338 Loss at step 500: 0.05051376298069954 Loss at step 550: 0.06056930497288704 Loss at step 600: 0.037354350090026855 Loss at step 650: 0.045672960579395294 Loss at step 700: 0.045412372797727585 Loss at step 750: 0.07216699421405792 Loss at step 800: 0.07113915681838989 Loss at step 850: 0.05242380127310753 Loss at step 900: 0.049089837819337845 Mean training loss after epoch 11: 0.05427141437954359 EPOCH: 12 Loss at step 0: 0.05060136318206787 Loss at step 50: 0.05241239815950394 Loss at step 100: 0.06094280257821083 Loss at step 150: 0.04531657695770264 Loss at step 200: 0.03954005986452103 Loss at step 250: 0.04658504202961922 Loss at step 300: 0.07222633063793182 Loss at step 350: 0.04347982257604599 Loss at step 400: 0.053278278559446335 Loss at step 450: 0.0582684688270092 Loss at step 500: 0.04339516907930374 Loss at step 550: 0.05362709239125252 Loss at step 600: 0.055693499743938446 Loss at step 650: 0.047406770288944244 Loss at step 700: 0.0434655025601387 Loss at step 750: 0.061405330896377563 Loss at step 800: 0.059041887521743774 Loss at step 850: 0.043860405683517456 Loss at step 900: 0.0451459065079689 Mean training loss after epoch 12: 0.053681242182406025 EPOCH: 13 Loss at step 0: 0.06393380463123322 Loss at step 50: 0.04970129206776619 Loss at step 100: 0.05190184712409973 Loss at step 150: 0.07499688118696213 Loss at step 200: 0.053462058305740356 Loss at step 250: 0.03378462418913841 Loss at step 300: 0.05154848098754883 Loss at step 350: 0.06942163407802582 Loss at step 400: 0.043218836188316345 Loss at step 450: 0.04835328087210655 Loss at step 500: 0.054426927119493484 Loss at step 550: 0.06357688456773758 Loss at step 600: 0.05531872436404228 Loss at step 650: 0.043588414788246155 Loss at step 700: 0.04750918224453926 Loss at step 750: 0.047899167984724045 Loss at step 800: 0.06586626172065735 Loss at step 850: 0.05884872376918793 Loss at step 900: 0.04350794479250908 Mean training loss after epoch 13: 0.053086005919761876 EPOCH: 14 Loss at step 0: 0.06759799271821976 Loss at step 50: 0.049358248710632324 Loss at step 100: 0.04130908474326134 Loss at step 150: 0.054195791482925415 Loss at step 200: 0.04955044388771057 Loss at step 250: 0.03780859708786011 Loss at step 300: 0.05414270609617233 Loss at step 350: 0.042130932211875916 Loss at step 400: 0.05103324353694916 Loss at step 450: 0.040817949920892715 Loss at step 500: 0.04351351037621498 Loss at step 550: 0.03547561168670654 Loss at step 600: 0.044220708310604095 Loss at step 650: 0.04728607088327408 Loss at step 700: 0.041615214198827744 Loss at step 750: 0.047035444527864456 Loss at step 800: 0.04477541148662567 Loss at step 850: 0.05299461632966995 Loss at step 900: 0.037443045526742935 Mean training loss after epoch 14: 0.05151311736275901 EPOCH: 15 Loss at step 0: 0.05737945809960365 Loss at step 50: 0.0656358152627945 Loss at step 100: 0.053794510662555695 Loss at step 150: 0.04894326627254486 Loss at step 200: 0.04552307352423668 Loss at step 250: 0.049503348767757416 Loss at step 300: 0.06760121881961823 Loss at step 350: 0.06368258595466614 Loss at step 400: 0.06373651325702667 Loss at step 450: 0.06099700927734375 Loss at step 500: 0.05417845398187637 Loss at step 550: 0.05741911381483078 Loss at step 600: 0.06749982386827469 Loss at step 650: 0.042390890419483185 Loss at step 700: 0.053614623844623566 Loss at step 750: 0.07460471242666245 Loss at step 800: 0.06053284928202629 Loss at step 850: 0.047700025141239166 Loss at step 900: 0.08828172832727432 Mean training loss after epoch 15: 0.05189366042931705 EPOCH: 16 Loss at step 0: 0.041451841592788696 Loss at step 50: 0.04900481551885605 Loss at step 100: 0.07347466051578522 Loss at step 150: 0.050105903297662735 Loss at step 200: 0.045521337538957596 Loss at step 250: 0.04869288206100464 Loss at step 300: 0.04296572133898735 Loss at step 350: 0.053801193833351135 Loss at step 400: 0.05193058401346207 Loss at step 450: 0.04602538049221039 Loss at step 500: 0.057385608553886414 Loss at step 550: 0.04785994812846184 Loss at step 600: 0.06599785387516022 Loss at step 650: 0.06853920966386795 Loss at step 700: 0.037615641951560974 Loss at step 750: 0.06823594868183136 Loss at step 800: 0.04431406408548355 Loss at step 850: 0.04909656569361687 Loss at step 900: 0.04431888088583946 Mean training loss after epoch 16: 0.05114124746703263 EPOCH: 17 Loss at step 0: 0.0521666593849659 Loss at step 50: 0.051707684993743896 Loss at step 100: 0.05441327020525932 Loss at step 150: 0.057663094252347946 Loss at step 200: 0.07378870993852615 Loss at step 250: 0.050751421600580215 Loss at step 300: 0.062344908714294434 Loss at step 350: 0.05189160257577896 Loss at step 400: 0.05709874629974365 Loss at step 450: 0.04825239256024361 Loss at step 500: 0.05772879719734192 Loss at step 550: 0.06364763528108597 Loss at step 600: 0.05479339510202408 Loss at step 650: 0.04588032513856888 Loss at step 700: 0.043124567717313766 Loss at step 750: 0.05217563733458519 Loss at step 800: 0.04346119612455368 Loss at step 850: 0.07459196448326111 Loss at step 900: 0.054725296795368195 Mean training loss after epoch 17: 0.05056830106386498 EPOCH: 18 Loss at step 0: 0.05071704462170601 Loss at step 50: 0.034955546259880066 Loss at step 100: 0.04583731293678284 Loss at step 150: 0.04008633643388748 Loss at step 200: 0.06688159704208374 Loss at step 250: 0.047952525317668915 Loss at step 300: 0.043405793607234955 Loss at step 350: 0.04774263873696327 Loss at step 400: 0.046817176043987274 Loss at step 450: 0.03929746896028519 Loss at step 500: 0.079617939889431 Loss at step 550: 0.04334261268377304 Loss at step 600: 0.0471222884953022 Loss at step 650: 0.04792984947562218 Loss at step 700: 0.04215889424085617 Loss at step 750: 0.04638111591339111 Loss at step 800: 0.043817948549985886 Loss at step 850: 0.04502178728580475 Loss at step 900: 0.05787310749292374 Mean training loss after epoch 18: 0.049581253420569495 EPOCH: 19 Loss at step 0: 0.04228626564145088 Loss at step 50: 0.042197369039058685 Loss at step 100: 0.039931416511535645 Loss at step 150: 0.04972762241959572 Loss at step 200: 0.05912093073129654 Loss at step 250: 0.053235869854688644 Loss at step 300: 0.049150481820106506 Loss at step 350: 0.04091176018118858 Loss at step 400: 0.040499769151210785 Loss at step 450: 0.049341194331645966 Loss at step 500: 0.052930958569049835 Loss at step 550: 0.04528976231813431 Loss at step 600: 0.06338972598314285 Loss at step 650: 0.0580691359937191 Loss at step 700: 0.04884226247668266 Loss at step 750: 0.045226141810417175 Loss at step 800: 0.06865664571523666 Loss at step 850: 0.060370225459337234 Loss at step 900: 0.04114345461130142 Mean training loss after epoch 19: 0.04932376959009656 EPOCH: 20 Loss at step 0: 0.04478728771209717 Loss at step 50: 0.04779455438256264 Loss at step 100: 0.04231518134474754 Loss at step 150: 0.04103277996182442 Loss at step 200: 0.05444299802184105 Loss at step 250: 0.04654530808329582 Loss at step 300: 0.040753815323114395 Loss at step 350: 0.04871691018342972 Loss at step 400: 0.043512869626283646 Loss at step 450: 0.0640493705868721 Loss at step 500: 0.04292638972401619 Loss at step 550: 0.045018818229436874 Loss at step 600: 0.05216085538268089 Loss at step 650: 0.04202280193567276 Loss at step 700: 0.060490675270557404 Loss at step 750: 0.0422544851899147 Loss at step 800: 0.06875777244567871 Loss at step 850: 0.07055886089801788 Loss at step 900: 0.04233856126666069 Mean training loss after epoch 20: 0.049361435235785775 EPOCH: 21 Loss at step 0: 0.057242829352617264 Loss at step 50: 0.03898493945598602 Loss at step 100: 0.07967870682477951 Loss at step 150: 0.05630537495017052 Loss at step 200: 0.04372085630893707 Loss at step 250: 0.04970481991767883 Loss at step 300: 0.05237191542983055 Loss at step 350: 0.056337881833314896 Loss at step 400: 0.05814845487475395 Loss at step 450: 0.04169025272130966 Loss at step 500: 0.04148674011230469 Loss at step 550: 0.04293093457818031 Loss at step 600: 0.04959719255566597 Loss at step 650: 0.05694776400923729 Loss at step 700: 0.0667663961648941 Loss at step 750: 0.04168504476547241 Loss at step 800: 0.038989607244729996 Loss at step 850: 0.039551153779029846 Loss at step 900: 0.03581921383738518 Mean training loss after epoch 21: 0.04813995266329251 EPOCH: 22 Loss at step 0: 0.04820111393928528 Loss at step 50: 0.042721573263406754 Loss at step 100: 0.0398503914475441 Loss at step 150: 0.041564516723155975 Loss at step 200: 0.04333219677209854 Loss at step 250: 0.04821101948618889 Loss at step 300: 0.060613084584474564 Loss at step 350: 0.04699994996190071 Loss at step 400: 0.07057081162929535 Loss at step 450: 0.04254617914557457 Loss at step 500: 0.04675067961215973 Loss at step 550: 0.04271329194307327 Loss at step 600: 0.04587005823850632 Loss at step 650: 0.055867817252874374 Loss at step 700: 0.03579968214035034 Loss at step 750: 0.04521609842777252 Loss at step 800: 0.0396389439702034 Loss at step 850: 0.05075797066092491 Loss at step 900: 0.04018908366560936 Mean training loss after epoch 22: 0.0486397048784122 EPOCH: 23 Loss at step 0: 0.04985598102211952 Loss at step 50: 0.05661115422844887 Loss at step 100: 0.04682755470275879 Loss at step 150: 0.04256591945886612 Loss at step 200: 0.03815319389104843 Loss at step 250: 0.043400879949331284 Loss at step 300: 0.04790991172194481 Loss at step 350: 0.03745166212320328 Loss at step 400: 0.035074200481176376 Loss at step 450: 0.05088219419121742 Loss at step 500: 0.04689011350274086 Loss at step 550: 0.04227309301495552 Loss at step 600: 0.06955820322036743 Loss at step 650: 0.055936530232429504 Loss at step 700: 0.038835786283016205 Loss at step 750: 0.04778265953063965 Loss at step 800: 0.046287067234516144 Loss at step 850: 0.054521869868040085 Loss at step 900: 0.04016726091504097 Mean training loss after epoch 23: 0.047982386372951685 EPOCH: 24 Loss at step 0: 0.04150605574250221 Loss at step 50: 0.04815929755568504 Loss at step 100: 0.03120419941842556 Loss at step 150: 0.03939548134803772 Loss at step 200: 0.04055680334568024 Loss at step 250: 0.06227433681488037 Loss at step 300: 0.044589947909116745 Loss at step 350: 0.04759351164102554 Loss at step 400: 0.047371793538331985 Loss at step 450: 0.05117009952664375 Loss at step 500: 0.05378156527876854 Loss at step 550: 0.06377073377370834 Loss at step 600: 0.037574153393507004 Loss at step 650: 0.04335593432188034 Loss at step 700: 0.04391580447554588 Loss at step 750: 0.061173368245363235 Loss at step 800: 0.038800764828920364 Loss at step 850: 0.05175633355975151 Loss at step 900: 0.034596920013427734 Mean training loss after epoch 24: 0.04802751868391342 EPOCH: 25 Loss at step 0: 0.044158726930618286 Loss at step 50: 0.044421661645174026 Loss at step 100: 0.06192682683467865 Loss at step 150: 0.0386945977807045 Loss at step 200: 0.04094390943646431 Loss at step 250: 0.03774237632751465 Loss at step 300: 0.04544094204902649 Loss at step 350: 0.05073751509189606 Loss at step 400: 0.07915718853473663 Loss at step 450: 0.037253402173519135 Loss at step 500: 0.04396594315767288 Loss at step 550: 0.04149482771754265 Loss at step 600: 0.0447382852435112 Loss at step 650: 0.05713300779461861 Loss at step 700: 0.0406673289835453 Loss at step 750: 0.03966266289353371 Loss at step 800: 0.04038640856742859 Loss at step 850: 0.038789160549640656 Loss at step 900: 0.04273026064038277 Mean training loss after epoch 25: 0.04810257169451795 EPOCH: 26 Loss at step 0: 0.0738731250166893 Loss at step 50: 0.039041951298713684 Loss at step 100: 0.05356955900788307 Loss at step 150: 0.044214773923158646 Loss at step 200: 0.041814714670181274 Loss at step 250: 0.037034325301647186 Loss at step 300: 0.07149610668420792 Loss at step 350: 0.040252216160297394 Loss at step 400: 0.04628663882613182 Loss at step 450: 0.04761270061135292 Loss at step 500: 0.05770416185259819 Loss at step 550: 0.05762528255581856 Loss at step 600: 0.056418757885694504 Loss at step 650: 0.04045207053422928 Loss at step 700: 0.035491324961185455 Loss at step 750: 0.04037416726350784 Loss at step 800: 0.05804738029837608 Loss at step 850: 0.061591338366270065 Loss at step 900: 0.044310737401247025 Mean training loss after epoch 26: 0.047380719623808416 EPOCH: 27 Loss at step 0: 0.049038391560316086 Loss at step 50: 0.03846290707588196 Loss at step 100: 0.0712030827999115 Loss at step 150: 0.037491679191589355 Loss at step 200: 0.04702331870794296 Loss at step 250: 0.04332879185676575 Loss at step 300: 0.049409981817007065 Loss at step 350: 0.042355529963970184 Loss at step 400: 0.07228992879390717 Loss at step 450: 0.05311436578631401 Loss at step 500: 0.03335382044315338 Loss at step 550: 0.0439351424574852 Loss at step 600: 0.046124331653118134 Loss at step 650: 0.033601533621549606 Loss at step 700: 0.054854393005371094 Loss at step 750: 0.039578210562467575 Loss at step 800: 0.05442378669977188 Loss at step 850: 0.07322046905755997 Loss at step 900: 0.042541421949863434 Mean training loss after epoch 27: 0.04738947538647062 EPOCH: 28 Loss at step 0: 0.04530884325504303 Loss at step 50: 0.045461926609277725 Loss at step 100: 0.04069161415100098 Loss at step 150: 0.042209409177303314 Loss at step 200: 0.046075399965047836 Loss at step 250: 0.03671642020344734 Loss at step 300: 0.045606691390275955 Loss at step 350: 0.03427951782941818 Loss at step 400: 0.04707861319184303 Loss at step 450: 0.06953071802854538 Loss at step 500: 0.04575486108660698 Loss at step 550: 0.038307223469018936 Loss at step 600: 0.04492930322885513 Loss at step 650: 0.04327325522899628 Loss at step 700: 0.045061539858579636 Loss at step 750: 0.036779627203941345 Loss at step 800: 0.045676086097955704 Loss at step 850: 0.053789637982845306 Loss at step 900: 0.03851575776934624 Mean training loss after epoch 28: 0.04744897519689061 EPOCH: 29 Loss at step 0: 0.06447627395391464 Loss at step 50: 0.03987034782767296 Loss at step 100: 0.042642951011657715 Loss at step 150: 0.036218997091054916 Loss at step 200: 0.042858824133872986 Loss at step 250: 0.04978536069393158 Loss at step 300: 0.04909146577119827 Loss at step 350: 0.044921278953552246 Loss at step 400: 0.048842400312423706 Loss at step 450: 0.039353545755147934 Loss at step 500: 0.06456374377012253 Loss at step 550: 0.042786143720149994 Loss at step 600: 0.046684011816978455 Loss at step 650: 0.059550944715738297 Loss at step 700: 0.057465989142656326 Loss at step 750: 0.03953887149691582 Loss at step 800: 0.0403439886868 Loss at step 850: 0.05080438032746315 Loss at step 900: 0.035545192658901215 Mean training loss after epoch 29: 0.046624985676425604 EPOCH: 30 Loss at step 0: 0.05264035984873772 Loss at step 50: 0.04082728177309036 Loss at step 100: 0.039486490190029144 Loss at step 150: 0.048212796449661255 Loss at step 200: 0.058597903698682785 Loss at step 250: 0.04548010975122452 Loss at step 300: 0.03388722985982895 Loss at step 350: 0.044192615896463394 Loss at step 400: 0.03997396305203438 Loss at step 450: 0.04315615072846413 Loss at step 500: 0.040677353739738464 Loss at step 550: 0.06874975562095642 Loss at step 600: 0.04751063138246536 Loss at step 650: 0.06061069294810295 Loss at step 700: 0.04245021194219589 Loss at step 750: 0.03848959505558014 Loss at step 800: 0.04834328219294548 Loss at step 850: 0.038825634866952896 Loss at step 900: 0.036729175597429276 Mean training loss after epoch 30: 0.04731866832314206 EPOCH: 31 Loss at step 0: 0.04543890058994293 Loss at step 50: 0.05763975903391838 Loss at step 100: 0.03673163801431656 Loss at step 150: 0.05212445557117462 Loss at step 200: 0.040951650589704514 Loss at step 250: 0.048645202070474625 Loss at step 300: 0.039718713611364365 Loss at step 350: 0.04818684607744217 Loss at step 400: 0.047619834542274475 Loss at step 450: 0.03320758044719696 Loss at step 500: 0.0615844801068306 Loss at step 550: 0.052367113530635834 Loss at step 600: 0.044281940907239914 Loss at step 650: 0.04340628162026405 Loss at step 700: 0.04534591734409332 Loss at step 750: 0.0418778620660305 Loss at step 800: 0.04080505296587944 Loss at step 850: 0.04495676979422569 Loss at step 900: 0.058602336794137955 Mean training loss after epoch 31: 0.04682321637781508 EPOCH: 32 Loss at step 0: 0.03586552292108536 Loss at step 50: 0.044016048312187195 Loss at step 100: 0.038552794605493546 Loss at step 150: 0.05735073983669281 Loss at step 200: 0.039340462535619736 Loss at step 250: 0.03860184922814369 Loss at step 300: 0.0499936006963253 Loss at step 350: 0.04247073829174042 Loss at step 400: 0.042582735419273376 Loss at step 450: 0.04172196239233017 Loss at step 500: 0.04375220090150833 Loss at step 550: 0.049852389842271805 Loss at step 600: 0.05309749022126198 Loss at step 650: 0.03964278846979141 Loss at step 700: 0.04146329686045647 Loss at step 750: 0.041789550334215164 Loss at step 800: 0.04341278225183487 Loss at step 850: 0.061948101967573166 Loss at step 900: 0.040852051228284836 Mean training loss after epoch 32: 0.0462073688782545 EPOCH: 33 Loss at step 0: 0.051677390933036804 Loss at step 50: 0.04916350543498993 Loss at step 100: 0.03556419909000397 Loss at step 150: 0.0609838105738163 Loss at step 200: 0.06120261177420616 Loss at step 250: 0.04196576029062271 Loss at step 300: 0.03834303468465805 Loss at step 350: 0.04257172718644142 Loss at step 400: 0.05106821656227112 Loss at step 450: 0.039700672030448914 Loss at step 500: 0.04688606038689613 Loss at step 550: 0.06087465584278107 Loss at step 600: 0.04071589931845665 Loss at step 650: 0.043777648359537125 Loss at step 700: 0.039057888090610504 Loss at step 750: 0.04498720169067383 Loss at step 800: 0.05147164687514305 Loss at step 850: 0.04205849766731262 Loss at step 900: 0.0387691929936409 Mean training loss after epoch 33: 0.04616302853501809 EPOCH: 34 Loss at step 0: 0.04452984407544136 Loss at step 50: 0.03801630437374115 Loss at step 100: 0.03995673358440399 Loss at step 150: 0.04280686378479004 Loss at step 200: 0.05432606115937233 Loss at step 250: 0.03408088907599449 Loss at step 300: 0.031791165471076965 Loss at step 350: 0.0405244380235672 Loss at step 400: 0.03609772399067879 Loss at step 450: 0.032710716128349304 Loss at step 500: 0.060254134237766266 Loss at step 550: 0.04243450611829758 Loss at step 600: 0.044699620455503464 Loss at step 650: 0.04603554308414459 Loss at step 700: 0.0410064198076725 Loss at step 750: 0.04785141721367836 Loss at step 800: 0.04068530350923538 Loss at step 850: 0.044351059943437576 Loss at step 900: 0.0484464168548584 Mean training loss after epoch 34: 0.04589835291049246 EPOCH: 35 Loss at step 0: 0.03427721932530403 Loss at step 50: 0.03641103208065033 Loss at step 100: 0.03501199558377266 Loss at step 150: 0.04129161685705185 Loss at step 200: 0.0690116211771965 Loss at step 250: 0.04111175239086151 Loss at step 300: 0.05558272823691368 Loss at step 350: 0.048011116683483124 Loss at step 400: 0.03979776054620743 Loss at step 450: 0.038475219160318375 Loss at step 500: 0.04866974428296089 Loss at step 550: 0.06619524955749512 Loss at step 600: 0.05033188685774803 Loss at step 650: 0.05432315543293953 Loss at step 700: 0.03721868246793747 Loss at step 750: 0.07640575617551804 Loss at step 800: 0.04290306195616722 Loss at step 850: 0.03675772622227669 Loss at step 900: 0.03862721845507622 Mean training loss after epoch 35: 0.04624822235175732 EPOCH: 36 Loss at step 0: 0.041054368019104004 Loss at step 50: 0.03293498978018761 Loss at step 100: 0.05807509273290634 Loss at step 150: 0.042155671864748 Loss at step 200: 0.049861449748277664 Loss at step 250: 0.04087654873728752 Loss at step 300: 0.039814334362745285 Loss at step 350: 0.05217153578996658 Loss at step 400: 0.0432986244559288 Loss at step 450: 0.04100101813673973 Loss at step 500: 0.034710843116045 Loss at step 550: 0.05269300565123558 Loss at step 600: 0.03851809352636337 Loss at step 650: 0.04115372523665428 Loss at step 700: 0.04274921491742134 Loss at step 750: 0.035769023001194 Loss at step 800: 0.047649454325437546 Loss at step 850: 0.06335577368736267 Loss at step 900: 0.04677771404385567 Mean training loss after epoch 36: 0.045017869855120365 EPOCH: 37 Loss at step 0: 0.037055592983961105 Loss at step 50: 0.04689611494541168 Loss at step 100: 0.0393880158662796 Loss at step 150: 0.04150373861193657 Loss at step 200: 0.06105344370007515 Loss at step 250: 0.0548628568649292 Loss at step 300: 0.04269426688551903 Loss at step 350: 0.035322245210409164 Loss at step 400: 0.03930993750691414 Loss at step 450: 0.05941391736268997 Loss at step 500: 0.026936566457152367 Loss at step 550: 0.039630208164453506 Loss at step 600: 0.04083450883626938 Loss at step 650: 0.06535467505455017 Loss at step 700: 0.03930655121803284 Loss at step 750: 0.04082164913415909 Loss at step 800: 0.043574701994657516 Loss at step 850: 0.036989230662584305 Loss at step 900: 0.05570298060774803 Mean training loss after epoch 37: 0.046344118307966156 EPOCH: 38 Loss at step 0: 0.039952944964170456 Loss at step 50: 0.060863371938467026 Loss at step 100: 0.056630440056324005 Loss at step 150: 0.06893322616815567 Loss at step 200: 0.035041213035583496 Loss at step 250: 0.03354683518409729 Loss at step 300: 0.04462216794490814 Loss at step 350: 0.0508306547999382 Loss at step 400: 0.04117551073431969 Loss at step 450: 0.05938679724931717 Loss at step 500: 0.0429999977350235 Loss at step 550: 0.034874215722084045 Loss at step 600: 0.04204763472080231 Loss at step 650: 0.043633509427309036 Loss at step 700: 0.03138619288802147 Loss at step 750: 0.046009160578250885 Loss at step 800: 0.039435673505067825 Loss at step 850: 0.04889987036585808 Loss at step 900: 0.04411719739437103 Mean training loss after epoch 38: 0.04582460524875726 EPOCH: 39 Loss at step 0: 0.07640911638736725 Loss at step 50: 0.05208307132124901 Loss at step 100: 0.044912103563547134 Loss at step 150: 0.042954929172992706 Loss at step 200: 0.03926927223801613 Loss at step 250: 0.04031513258814812 Loss at step 300: 0.03811677545309067 Loss at step 350: 0.049038566648960114 Loss at step 400: 0.046485718339681625 Loss at step 450: 0.05104729160666466 Loss at step 500: 0.052266139537096024 Loss at step 550: 0.06923632323741913 Loss at step 600: 0.0438576303422451 Loss at step 650: 0.03655534237623215 Loss at step 700: 0.04258132725954056 Loss at step 750: 0.041627269238233566 Loss at step 800: 0.03921901434659958 Loss at step 850: 0.03420531004667282 Loss at step 900: 0.04845491051673889 Mean training loss after epoch 39: 0.045713012188132895 EPOCH: 40 Loss at step 0: 0.04033637046813965 Loss at step 50: 0.0448838546872139 Loss at step 100: 0.04793475568294525 Loss at step 150: 0.035842664539813995 Loss at step 200: 0.03394421190023422 Loss at step 250: 0.041135065257549286 Loss at step 300: 0.039013415575027466 Loss at step 350: 0.03755592182278633 Loss at step 400: 0.04762657359242439 Loss at step 450: 0.03159711882472038 Loss at step 500: 0.0345923975110054 Loss at step 550: 0.042622119188308716 Loss at step 600: 0.05832074582576752 Loss at step 650: 0.058192506432533264 Loss at step 700: 0.04100321605801582 Loss at step 750: 0.0526466928422451 Loss at step 800: 0.04709823802113533 Loss at step 850: 0.040037207305431366 Loss at step 900: 0.058162689208984375 Mean training loss after epoch 40: 0.04529354535241816 EPOCH: 41 Loss at step 0: 0.05011366680264473 Loss at step 50: 0.04204009845852852 Loss at step 100: 0.04977741837501526 Loss at step 150: 0.04213026538491249 Loss at step 200: 0.031131187453866005 Loss at step 250: 0.034031808376312256 Loss at step 300: 0.03464902937412262 Loss at step 350: 0.0667363777756691 Loss at step 400: 0.040202800184488297 Loss at step 450: 0.038813963532447815 Loss at step 500: 0.03280869126319885 Loss at step 550: 0.04071187227964401 Loss at step 600: 0.039284586906433105 Loss at step 650: 0.051950082182884216 Loss at step 700: 0.042672839015722275 Loss at step 750: 0.07990647852420807 Loss at step 800: 0.038235973566770554 Loss at step 850: 0.043944619596004486 Loss at step 900: 0.06580127030611038 Mean training loss after epoch 41: 0.045291238942983814 EPOCH: 42 Loss at step 0: 0.0772307962179184 Loss at step 50: 0.04207930341362953 Loss at step 100: 0.0437316969037056 Loss at step 150: 0.04159977287054062 Loss at step 200: 0.03747226297855377 Loss at step 250: 0.047048479318618774 Loss at step 300: 0.037064068019390106 Loss at step 350: 0.03982170298695564 Loss at step 400: 0.053306419402360916 Loss at step 450: 0.044104646891355515 Loss at step 500: 0.040247250348329544 Loss at step 550: 0.05121247470378876 Loss at step 600: 0.03265077993273735 Loss at step 650: 0.04923049733042717 Loss at step 700: 0.038922492414712906 Loss at step 750: 0.045636579394340515 Loss at step 800: 0.043634749948978424 Loss at step 850: 0.0371113084256649 Loss at step 900: 0.04238105192780495 Mean training loss after epoch 42: 0.045089150149462576 EPOCH: 43 Loss at step 0: 0.05043477192521095 Loss at step 50: 0.03907647356390953 Loss at step 100: 0.03702184930443764 Loss at step 150: 0.040008846670389175 Loss at step 200: 0.04169635847210884 Loss at step 250: 0.04811178892850876 Loss at step 300: 0.042245522141456604 Loss at step 350: 0.039020076394081116 Loss at step 400: 0.042415667325258255 Loss at step 450: 0.05365709215402603 Loss at step 500: 0.045845016837120056 Loss at step 550: 0.03969947621226311 Loss at step 600: 0.04653801769018173 Loss at step 650: 0.040986012667417526 Loss at step 700: 0.03863952308893204 Loss at step 750: 0.03537902608513832 Loss at step 800: 0.041785404086112976 Loss at step 850: 0.03621106594800949 Loss at step 900: 0.05040254816412926 Mean training loss after epoch 43: 0.04519384227264156 EPOCH: 44 Loss at step 0: 0.05104903504252434 Loss at step 50: 0.03660161420702934 Loss at step 100: 0.041797079145908356 Loss at step 150: 0.056867510080337524 Loss at step 200: 0.05402429774403572 Loss at step 250: 0.07812891900539398 Loss at step 300: 0.04420232027769089 Loss at step 350: 0.04686306044459343 Loss at step 400: 0.03163023293018341 Loss at step 450: 0.05214483663439751 Loss at step 500: 0.04913416504859924 Loss at step 550: 0.0477847084403038 Loss at step 600: 0.04103192314505577 Loss at step 650: 0.043909694999456406 Loss at step 700: 0.04345354810357094 Loss at step 750: 0.04103267937898636 Loss at step 800: 0.04657016322016716 Loss at step 850: 0.03583376109600067 Loss at step 900: 0.061176594346761703 Mean training loss after epoch 44: 0.045247607248853135 EPOCH: 45 Loss at step 0: 0.035292848944664 Loss at step 50: 0.03834798187017441 Loss at step 100: 0.04595617204904556 Loss at step 150: 0.03913861885666847 Loss at step 200: 0.03372667357325554 Loss at step 250: 0.03358614072203636 Loss at step 300: 0.08240310102701187 Loss at step 350: 0.03516072407364845 Loss at step 400: 0.03716601803898811 Loss at step 450: 0.04046781733632088 Loss at step 500: 0.039766740053892136 Loss at step 550: 0.0402786061167717 Loss at step 600: 0.039516471326351166 Loss at step 650: 0.040419578552246094 Loss at step 700: 0.052264198660850525 Loss at step 750: 0.054281365126371384 Loss at step 800: 0.028285956010222435 Loss at step 850: 0.06413894891738892 Loss at step 900: 0.04265845566987991 Mean training loss after epoch 45: 0.04522183960847763 EPOCH: 46 Loss at step 0: 0.03985786437988281 Loss at step 50: 0.04146808385848999 Loss at step 100: 0.06060174107551575 Loss at step 150: 0.04009333252906799 Loss at step 200: 0.03494905307888985 Loss at step 250: 0.045710645616054535 Loss at step 300: 0.03618847206234932 Loss at step 350: 0.03652859106659889 Loss at step 400: 0.052702657878398895 Loss at step 450: 0.04365064576268196 Loss at step 500: 0.039007291197776794 Loss at step 550: 0.04299476742744446 Loss at step 600: 0.04161442071199417 Loss at step 650: 0.03856796771287918 Loss at step 700: 0.051069967448711395 Loss at step 750: 0.041965968906879425 Loss at step 800: 0.040831904858350754 Loss at step 850: 0.05974890664219856 Loss at step 900: 0.036682914942502975 Mean training loss after epoch 46: 0.04505698912655875 EPOCH: 47 Loss at step 0: 0.04610956832766533 Loss at step 50: 0.058215584605932236 Loss at step 100: 0.03838493674993515 Loss at step 150: 0.04786888509988785 Loss at step 200: 0.037456244230270386 Loss at step 250: 0.045611798763275146 Loss at step 300: 0.04103695601224899 Loss at step 350: 0.035197313874959946 Loss at step 400: 0.07306955754756927 Loss at step 450: 0.04680854454636574 Loss at step 500: 0.045103028416633606 Loss at step 550: 0.045555759221315384 Loss at step 600: 0.0345003604888916 Loss at step 650: 0.03938684239983559 Loss at step 700: 0.039016298949718475 Loss at step 750: 0.044986143708229065 Loss at step 800: 0.03744972124695778 Loss at step 850: 0.03995202109217644 Loss at step 900: 0.03368764370679855 Mean training loss after epoch 47: 0.04513063731351133 EPOCH: 48 Loss at step 0: 0.04350524768233299 Loss at step 50: 0.060863398015499115 Loss at step 100: 0.0333227664232254 Loss at step 150: 0.03986271470785141 Loss at step 200: 0.05863669887185097 Loss at step 250: 0.05796366557478905 Loss at step 300: 0.04159175977110863 Loss at step 350: 0.0533243864774704 Loss at step 400: 0.043375324457883835 Loss at step 450: 0.0409679152071476 Loss at step 500: 0.03501559793949127 Loss at step 550: 0.03828950226306915 Loss at step 600: 0.06662052124738693 Loss at step 650: 0.04269475117325783 Loss at step 700: 0.042974043637514114 Loss at step 750: 0.032780494540929794 Loss at step 800: 0.05983198806643486 Loss at step 850: 0.05910483002662659 Loss at step 900: 0.045382242649793625 Mean training loss after epoch 48: 0.04481174193346488 EPOCH: 49 Loss at step 0: 0.04578622058033943 Loss at step 50: 0.036628738045692444 Loss at step 100: 0.041522443294525146 Loss at step 150: 0.04253847524523735 Loss at step 200: 0.04357890039682388 Loss at step 250: 0.03647568076848984 Loss at step 300: 0.033183008432388306 Loss at step 350: 0.04829011857509613 Loss at step 400: 0.03458862751722336 Loss at step 450: 0.03374733030796051 Loss at step 500: 0.03659965842962265 Loss at step 550: 0.04090893268585205 Loss at step 600: 0.041405949741601944 Loss at step 650: 0.0478866808116436 Loss at step 700: 0.04418734088540077 Loss at step 750: 0.03688157722353935 Loss at step 800: 0.05839547514915466 Loss at step 850: 0.03965429216623306 Loss at step 900: 0.05136933550238609 Mean training loss after epoch 49: 0.04456062049372618 EPOCH: 50 Loss at step 0: 0.03721202164888382 Loss at step 50: 0.058499548584222794 Loss at step 100: 0.04153984785079956 Loss at step 150: 0.04430457577109337 Loss at step 200: 0.04395736753940582 Loss at step 250: 0.03468909487128258 Loss at step 300: 0.05051280930638313 Loss at step 350: 0.054895345121622086 Loss at step 400: 0.05611054599285126 Loss at step 450: 0.04133675992488861 Loss at step 500: 0.07565674185752869 Loss at step 550: 0.040542490780353546 Loss at step 600: 0.04535050690174103 Loss at step 650: 0.06903442740440369 Loss at step 700: 0.049368467181921005 Loss at step 750: 0.03324176371097565 Loss at step 800: 0.04077507182955742 Loss at step 850: 0.049969982355833054 Loss at step 900: 0.04658673703670502 Mean training loss after epoch 50: 0.044827424265793774 EPOCH: 51 Loss at step 0: 0.041674789041280746 Loss at step 50: 0.07659906148910522 Loss at step 100: 0.0518621988594532 Loss at step 150: 0.03988570719957352 Loss at step 200: 0.0530693456530571 Loss at step 250: 0.06237221509218216 Loss at step 300: 0.05326875299215317 Loss at step 350: 0.03704452887177467 Loss at step 400: 0.038866326212882996 Loss at step 450: 0.05775166302919388 Loss at step 500: 0.05775808170437813 Loss at step 550: 0.05583389475941658 Loss at step 600: 0.04127657040953636 Loss at step 650: 0.044436946511268616 Loss at step 700: 0.04051068425178528 Loss at step 750: 0.058910299092531204 Loss at step 800: 0.04445767402648926 Loss at step 850: 0.04186765477061272 Loss at step 900: 0.03834076225757599 Mean training loss after epoch 51: 0.04438713311291199 EPOCH: 52 Loss at step 0: 0.052693191915750504 Loss at step 50: 0.03920779004693031 Loss at step 100: 0.03543086722493172 Loss at step 150: 0.04069437459111214 Loss at step 200: 0.04017770290374756 Loss at step 250: 0.0385870598256588 Loss at step 300: 0.03626216575503349 Loss at step 350: 0.06309404969215393 Loss at step 400: 0.0529196597635746 Loss at step 450: 0.04862214997410774 Loss at step 500: 0.058813415467739105 Loss at step 550: 0.03600115701556206 Loss at step 600: 0.043110061436891556 Loss at step 650: 0.060965459793806076 Loss at step 700: 0.05340484902262688 Loss at step 750: 0.038335170596838 Loss at step 800: 0.06913969665765762 Loss at step 850: 0.04643036425113678 Loss at step 900: 0.06210217624902725 Mean training loss after epoch 52: 0.044694147757820483 EPOCH: 53 Loss at step 0: 0.053369034081697464 Loss at step 50: 0.04277252405881882 Loss at step 100: 0.04557925835251808 Loss at step 150: 0.05085182934999466 Loss at step 200: 0.03728601336479187 Loss at step 250: 0.03649943694472313 Loss at step 300: 0.04009496793150902 Loss at step 350: 0.027651352807879448 Loss at step 400: 0.06505263596773148 Loss at step 450: 0.06590829789638519 Loss at step 500: 0.04075932502746582 Loss at step 550: 0.05507941171526909 Loss at step 600: 0.037964966148138046 Loss at step 650: 0.031593091785907745 Loss at step 700: 0.044959452003240585 Loss at step 750: 0.05538303032517433 Loss at step 800: 0.03053993731737137 Loss at step 850: 0.03970097005367279 Loss at step 900: 0.04476042091846466 Mean training loss after epoch 53: 0.04411795005988655 EPOCH: 54 Loss at step 0: 0.04348362982273102 Loss at step 50: 0.056138332933187485 Loss at step 100: 0.05467775836586952 Loss at step 150: 0.0384548045694828 Loss at step 200: 0.05451337620615959 Loss at step 250: 0.03390023112297058 Loss at step 300: 0.05078355595469475 Loss at step 350: 0.04042322561144829 Loss at step 400: 0.042468760162591934 Loss at step 450: 0.07247456908226013 Loss at step 500: 0.03948535397648811 Loss at step 550: 0.03673892468214035 Loss at step 600: 0.05291754752397537 Loss at step 650: 0.03432352468371391 Loss at step 700: 0.04061224311590195 Loss at step 750: 0.06771548092365265 Loss at step 800: 0.04963546618819237 Loss at step 850: 0.038799386471509933 Loss at step 900: 0.03607568517327309 Mean training loss after epoch 54: 0.04451905764234282 EPOCH: 55 Loss at step 0: 0.038764771074056625 Loss at step 50: 0.03729073703289032 Loss at step 100: 0.05500001832842827 Loss at step 150: 0.03877246379852295 Loss at step 200: 0.033786363899707794 Loss at step 250: 0.03781069442629814 Loss at step 300: 0.056109990924596786 Loss at step 350: 0.050873979926109314 Loss at step 400: 0.04471708834171295 Loss at step 450: 0.041386689990758896 Loss at step 500: 0.03778187930583954 Loss at step 550: 0.03396514803171158 Loss at step 600: 0.03287336602807045 Loss at step 650: 0.045914020389318466 Loss at step 700: 0.03805634006857872 Loss at step 750: 0.04088671877980232 Loss at step 800: 0.0631185993552208 Loss at step 850: 0.0301095899194479 Loss at step 900: 0.05724484845995903 Mean training loss after epoch 55: 0.04416637424665537 EPOCH: 56 Loss at step 0: 0.04013429954648018 Loss at step 50: 0.03995443135499954 Loss at step 100: 0.03449016064405441 Loss at step 150: 0.05941744148731232 Loss at step 200: 0.03430594131350517 Loss at step 250: 0.04036951810121536 Loss at step 300: 0.057713113725185394 Loss at step 350: 0.04798108711838722 Loss at step 400: 0.041298218071460724 Loss at step 450: 0.03666747361421585 Loss at step 500: 0.039826344698667526 Loss at step 550: 0.031247366219758987 Loss at step 600: 0.041256990283727646 Loss at step 650: 0.0510532408952713 Loss at step 700: 0.05131274834275246 Loss at step 750: 0.03973361849784851 Loss at step 800: 0.05414649471640587 Loss at step 850: 0.053432706743478775 Loss at step 900: 0.0535944439470768 Mean training loss after epoch 56: 0.04448502279444735 EPOCH: 57 Loss at step 0: 0.04106990993022919 Loss at step 50: 0.04753556102514267 Loss at step 100: 0.05139553174376488 Loss at step 150: 0.056965164840221405 Loss at step 200: 0.04258835315704346 Loss at step 250: 0.05772864818572998 Loss at step 300: 0.04447953775525093 Loss at step 350: 0.037768758833408356 Loss at step 400: 0.03535608947277069 Loss at step 450: 0.04180369898676872 Loss at step 500: 0.042708620429039 Loss at step 550: 0.04845503345131874 Loss at step 600: 0.03896493837237358 Loss at step 650: 0.056047532707452774 Loss at step 700: 0.038474343717098236 Loss at step 750: 0.053751710802316666 Loss at step 800: 0.03910607472062111 Loss at step 850: 0.054330892860889435 Loss at step 900: 0.03776271268725395 Mean training loss after epoch 57: 0.04434565956325038 EPOCH: 58 Loss at step 0: 0.037215717136859894 Loss at step 50: 0.032638631761074066 Loss at step 100: 0.05127355828881264 Loss at step 150: 0.047395095229148865 Loss at step 200: 0.035622186958789825 Loss at step 250: 0.05029334872961044 Loss at step 300: 0.031070074066519737 Loss at step 350: 0.03343849256634712 Loss at step 400: 0.044326964765787125 Loss at step 450: 0.049742553383111954 Loss at step 500: 0.06049850955605507 Loss at step 550: 0.035870060324668884 Loss at step 600: 0.035055339336395264 Loss at step 650: 0.03915105015039444 Loss at step 700: 0.0364338718354702 Loss at step 750: 0.05474480614066124 Loss at step 800: 0.04849613457918167 Loss at step 850: 0.04633571207523346 Loss at step 900: 0.05306543409824371 Mean training loss after epoch 58: 0.04476543783601413 EPOCH: 59 Loss at step 0: 0.04269300028681755 Loss at step 50: 0.036249835044145584 Loss at step 100: 0.03650631383061409 Loss at step 150: 0.03397783637046814 Loss at step 200: 0.06729648262262344 Loss at step 250: 0.03886761516332626 Loss at step 300: 0.03839433193206787 Loss at step 350: 0.03744744136929512 Loss at step 400: 0.06058405339717865 Loss at step 450: 0.0355243906378746 Loss at step 500: 0.03686053305864334 Loss at step 550: 0.03507908806204796 Loss at step 600: 0.03575192764401436 Loss at step 650: 0.040934592485427856 Loss at step 700: 0.0554889440536499 Loss at step 750: 0.06874354183673859 Loss at step 800: 0.049705274403095245 Loss at step 850: 0.059904176741838455 Loss at step 900: 0.05714351683855057 Mean training loss after epoch 59: 0.04417288842509741 EPOCH: 60 Loss at step 0: 0.037019938230514526 Loss at step 50: 0.0353710800409317 Loss at step 100: 0.037612151354551315 Loss at step 150: 0.055569928139448166 Loss at step 200: 0.04338991269469261 Loss at step 250: 0.0372660867869854 Loss at step 300: 0.05050265043973923 Loss at step 350: 0.034928590059280396 Loss at step 400: 0.05242051184177399 Loss at step 450: 0.03831075131893158 Loss at step 500: 0.03791706636548042 Loss at step 550: 0.03913911059498787 Loss at step 600: 0.044629912823438644 Loss at step 650: 0.0355302169919014 Loss at step 700: 0.04739176481962204 Loss at step 750: 0.0685577318072319 Loss at step 800: 0.041057705879211426 Loss at step 850: 0.03980248421430588 Loss at step 900: 0.040162138640880585 Mean training loss after epoch 60: 0.04341694105988436 EPOCH: 61 Loss at step 0: 0.05443108081817627 Loss at step 50: 0.06467065215110779 Loss at step 100: 0.03419261798262596 Loss at step 150: 0.037817951291799545 Loss at step 200: 0.03623431921005249 Loss at step 250: 0.039390068501234055 Loss at step 300: 0.039723411202430725 Loss at step 350: 0.04816046357154846 Loss at step 400: 0.039156123995780945 Loss at step 450: 0.04266924783587456 Loss at step 500: 0.05981460586190224 Loss at step 550: 0.03927464038133621 Loss at step 600: 0.03746930882334709 Loss at step 650: 0.03883933648467064 Loss at step 700: 0.0364956296980381 Loss at step 750: 0.04019448533654213 Loss at step 800: 0.036613985896110535 Loss at step 850: 0.03683716431260109 Loss at step 900: 0.06945187598466873 Mean training loss after epoch 61: 0.04420928464634523 EPOCH: 62 Loss at step 0: 0.04358210787177086 Loss at step 50: 0.03247460350394249 Loss at step 100: 0.03609451279044151 Loss at step 150: 0.04466622695326805 Loss at step 200: 0.060206882655620575 Loss at step 250: 0.05667277052998543 Loss at step 300: 0.057884328067302704 Loss at step 350: 0.05067787319421768 Loss at step 400: 0.032946374267339706 Loss at step 450: 0.03521326556801796 Loss at step 500: 0.04274969920516014 Loss at step 550: 0.045447807759046555 Loss at step 600: 0.03791704401373863 Loss at step 650: 0.03841477259993553 Loss at step 700: 0.033977847546339035 Loss at step 750: 0.05122512951493263 Loss at step 800: 0.06106852367520332 Loss at step 850: 0.039144162088632584 Loss at step 900: 0.03995443880558014 Mean training loss after epoch 62: 0.043732919402595265 EPOCH: 63 Loss at step 0: 0.03815672546625137 Loss at step 50: 0.050264883786439896 Loss at step 100: 0.04439234733581543 Loss at step 150: 0.06198287755250931 Loss at step 200: 0.04588256776332855 Loss at step 250: 0.03678915277123451 Loss at step 300: 0.04422404617071152 Loss at step 350: 0.05034128576517105 Loss at step 400: 0.05333563685417175 Loss at step 450: 0.04801738262176514 Loss at step 500: 0.033807236701250076 Loss at step 550: 0.04020163416862488 Loss at step 600: 0.04083223640918732 Loss at step 650: 0.04149266704916954 Loss at step 700: 0.04398471862077713 Loss at step 750: 0.04754597321152687 Loss at step 800: 0.060221485793590546 Loss at step 850: 0.04963650926947594 Loss at step 900: 0.041407033801078796 Mean training loss after epoch 63: 0.04385997945550027 EPOCH: 64 Loss at step 0: 0.040046464651823044 Loss at step 50: 0.03484063968062401 Loss at step 100: 0.035342779010534286 Loss at step 150: 0.03742619976401329 Loss at step 200: 0.03765399381518364 Loss at step 250: 0.059254616498947144 Loss at step 300: 0.03355315327644348 Loss at step 350: 0.050833262503147125 Loss at step 400: 0.046321380883455276 Loss at step 450: 0.03242103382945061 Loss at step 500: 0.04384983330965042 Loss at step 550: 0.028412621468305588 Loss at step 600: 0.05334277078509331 Loss at step 650: 0.03919610008597374 Loss at step 700: 0.055286046117544174 Loss at step 750: 0.039194878190755844 Loss at step 800: 0.04105188697576523 Loss at step 850: 0.054702192544937134 Loss at step 900: 0.05970655009150505 Mean training loss after epoch 64: 0.04382909899120773 EPOCH: 65 Loss at step 0: 0.03367344290018082 Loss at step 50: 0.0354597344994545 Loss at step 100: 0.035989440977573395 Loss at step 150: 0.03798596188426018 Loss at step 200: 0.0610586479306221 Loss at step 250: 0.04177812486886978 Loss at step 300: 0.030401073396205902 Loss at step 350: 0.03905997425317764 Loss at step 400: 0.03326280787587166 Loss at step 450: 0.034041862934827805 Loss at step 500: 0.0394960418343544 Loss at step 550: 0.03998494893312454 Loss at step 600: 0.0551016628742218 Loss at step 650: 0.058599479496479034 Loss at step 700: 0.06312040239572525 Loss at step 750: 0.05376887321472168 Loss at step 800: 0.046181660145521164 Loss at step 850: 0.04129238426685333 Loss at step 900: 0.053095243871212006 Mean training loss after epoch 65: 0.043387788544490394 EPOCH: 66 Loss at step 0: 0.05289135128259659 Loss at step 50: 0.06815432012081146 Loss at step 100: 0.03486901894211769 Loss at step 150: 0.03665197268128395 Loss at step 200: 0.03154068812727928 Loss at step 250: 0.04155510291457176 Loss at step 300: 0.042474210262298584 Loss at step 350: 0.03589129075407982 Loss at step 400: 0.04365568608045578 Loss at step 450: 0.047383975237607956 Loss at step 500: 0.048636239022016525 Loss at step 550: 0.034667667001485825 Loss at step 600: 0.03305600583553314 Loss at step 650: 0.052088744938373566 Loss at step 700: 0.04252149537205696 Loss at step 750: 0.04315147548913956 Loss at step 800: 0.03580925241112709 Loss at step 850: 0.03682782128453255 Loss at step 900: 0.0367286391556263 Mean training loss after epoch 66: 0.04358888258621382 EPOCH: 67 Loss at step 0: 0.057258546352386475 Loss at step 50: 0.03748401254415512 Loss at step 100: 0.05534794181585312 Loss at step 150: 0.0344119630753994 Loss at step 200: 0.03639748692512512 Loss at step 250: 0.0411214642226696 Loss at step 300: 0.05639127641916275 Loss at step 350: 0.03786665201187134 Loss at step 400: 0.034221503883600235 Loss at step 450: 0.0328063927590847 Loss at step 500: 0.03967868164181709 Loss at step 550: 0.03393542766571045 Loss at step 600: 0.03444861248135567 Loss at step 650: 0.04365673288702965 Loss at step 700: 0.05683682858943939 Loss at step 750: 0.03911632299423218 Loss at step 800: 0.04050537571310997 Loss at step 850: 0.03792897239327431 Loss at step 900: 0.03783582150936127 Mean training loss after epoch 67: 0.04331578487661411 EPOCH: 68 Loss at step 0: 0.0345102921128273 Loss at step 50: 0.05480239540338516 Loss at step 100: 0.044552598148584366 Loss at step 150: 0.05394255742430687 Loss at step 200: 0.036491990089416504 Loss at step 250: 0.04225953295826912 Loss at step 300: 0.03236745670437813 Loss at step 350: 0.055431634187698364 Loss at step 400: 0.04698576033115387 Loss at step 450: 0.05353078991174698 Loss at step 500: 0.0517372190952301 Loss at step 550: 0.07307250797748566 Loss at step 600: 0.04057915881276131 Loss at step 650: 0.05076027661561966 Loss at step 700: 0.07868611812591553 Loss at step 750: 0.057362716645002365 Loss at step 800: 0.041063278913497925 Loss at step 850: 0.04307812452316284 Loss at step 900: 0.03890369459986687 Mean training loss after epoch 68: 0.04351941579535826 EPOCH: 69 Loss at step 0: 0.04516177996993065 Loss at step 50: 0.04142066463828087 Loss at step 100: 0.04467983916401863 Loss at step 150: 0.04826439172029495 Loss at step 200: 0.034170184284448624 Loss at step 250: 0.04293690249323845 Loss at step 300: 0.045061588287353516 Loss at step 350: 0.059500064700841904 Loss at step 400: 0.05270945653319359 Loss at step 450: 0.044725194573402405 Loss at step 500: 0.039622776210308075 Loss at step 550: 0.06403230875730515 Loss at step 600: 0.04146086424589157 Loss at step 650: 0.03829912096261978 Loss at step 700: 0.03391618654131889 Loss at step 750: 0.044226061552762985 Loss at step 800: 0.04004054144024849 Loss at step 850: 0.0455927774310112 Loss at step 900: 0.03712063282728195 Mean training loss after epoch 69: 0.04342332778613705 EPOCH: 70 Loss at step 0: 0.0423751063644886 Loss at step 50: 0.030321940779685974 Loss at step 100: 0.03864332661032677 Loss at step 150: 0.043860290199518204 Loss at step 200: 0.04766250029206276 Loss at step 250: 0.03530235216021538 Loss at step 300: 0.038718342781066895 Loss at step 350: 0.03203143924474716 Loss at step 400: 0.05068623274564743 Loss at step 450: 0.040166180580854416 Loss at step 500: 0.05197145789861679 Loss at step 550: 0.056420858949422836 Loss at step 600: 0.06334302574396133 Loss at step 650: 0.04419821500778198 Loss at step 700: 0.05083312839269638 Loss at step 750: 0.03966745361685753 Loss at step 800: 0.0368986651301384 Loss at step 850: 0.037707842886447906 Loss at step 900: 0.037850260734558105 Mean training loss after epoch 70: 0.04336900550967404 EPOCH: 71 Loss at step 0: 0.03733111172914505 Loss at step 50: 0.030389469116926193 Loss at step 100: 0.03566594794392586 Loss at step 150: 0.06146685406565666 Loss at step 200: 0.04514341801404953 Loss at step 250: 0.030727380886673927 Loss at step 300: 0.057443056255578995 Loss at step 350: 0.05871230736374855 Loss at step 400: 0.03506730869412422 Loss at step 450: 0.03634852170944214 Loss at step 500: 0.04066995903849602 Loss at step 550: 0.04442782700061798 Loss at step 600: 0.037578314542770386 Loss at step 650: 0.029170960187911987 Loss at step 700: 0.05442596971988678 Loss at step 750: 0.03350013494491577 Loss at step 800: 0.05183800309896469 Loss at step 850: 0.03568757697939873 Loss at step 900: 0.05002089962363243 Mean training loss after epoch 71: 0.04379487072806686 EPOCH: 72 Loss at step 0: 0.05867093801498413 Loss at step 50: 0.03618363291025162 Loss at step 100: 0.03833587467670441 Loss at step 150: 0.04944595322012901 Loss at step 200: 0.03897697106003761 Loss at step 250: 0.04640533775091171 Loss at step 300: 0.04102756455540657 Loss at step 350: 0.038510724902153015 Loss at step 400: 0.03700847551226616 Loss at step 450: 0.04004450514912605 Loss at step 500: 0.042017776519060135 Loss at step 550: 0.03899156302213669 Loss at step 600: 0.04415284842252731 Loss at step 650: 0.04386115446686745 Loss at step 700: 0.039281342178583145 Loss at step 750: 0.04084251448512077 Loss at step 800: 0.050894834101200104 Loss at step 850: 0.040509793907403946 Loss at step 900: 0.046887610107660294 Mean training loss after epoch 72: 0.04331664659623017 EPOCH: 73 Loss at step 0: 0.034433797001838684 Loss at step 50: 0.04201598837971687 Loss at step 100: 0.059156645089387894 Loss at step 150: 0.038104016333818436 Loss at step 200: 0.04817300662398338 Loss at step 250: 0.04323570802807808 Loss at step 300: 0.03921014443039894 Loss at step 350: 0.0369136743247509 Loss at step 400: 0.048426855355501175 Loss at step 450: 0.03871035575866699 Loss at step 500: 0.05355559289455414 Loss at step 550: 0.042402710765600204 Loss at step 600: 0.04162994772195816 Loss at step 650: 0.06282988935709 Loss at step 700: 0.04596627503633499 Loss at step 750: 0.040705110877752304 Loss at step 800: 0.059697188436985016 Loss at step 850: 0.03585108742117882 Loss at step 900: 0.03499142825603485 Mean training loss after epoch 73: 0.043123359915051764 EPOCH: 74 Loss at step 0: 0.055764030665159225 Loss at step 50: 0.035522691905498505 Loss at step 100: 0.03944651037454605 Loss at step 150: 0.03404494747519493 Loss at step 200: 0.054167017340660095 Loss at step 250: 0.045357923954725266 Loss at step 300: 0.0798131600022316 Loss at step 350: 0.04721348360180855 Loss at step 400: 0.04061078652739525 Loss at step 450: 0.03321346640586853 Loss at step 500: 0.04168355464935303 Loss at step 550: 0.039873287081718445 Loss at step 600: 0.030855227261781693 Loss at step 650: 0.05285376310348511 Loss at step 700: 0.0395192913711071 Loss at step 750: 0.03783942013978958 Loss at step 800: 0.03440512716770172 Loss at step 850: 0.03771792724728584 Loss at step 900: 0.03853467479348183 Mean training loss after epoch 74: 0.04349475961003794 EPOCH: 75 Loss at step 0: 0.04388419911265373 Loss at step 50: 0.03633366897702217 Loss at step 100: 0.043419141322374344 Loss at step 150: 0.04856560006737709 Loss at step 200: 0.04187382385134697 Loss at step 250: 0.04095287248492241 Loss at step 300: 0.03438714146614075 Loss at step 350: 0.03767545148730278 Loss at step 400: 0.05222098529338837 Loss at step 450: 0.04703434556722641 Loss at step 500: 0.03931450843811035 Loss at step 550: 0.03860877826809883 Loss at step 600: 0.05392081290483475 Loss at step 650: 0.044611986726522446 Loss at step 700: 0.05068850889801979 Loss at step 750: 0.04055117070674896 Loss at step 800: 0.04954316467046738 Loss at step 850: 0.03628535196185112 Loss at step 900: 0.037140265107154846 Mean training loss after epoch 75: 0.04344091003240426 EPOCH: 76 Loss at step 0: 0.0343879759311676 Loss at step 50: 0.05361231788992882 Loss at step 100: 0.0380454920232296 Loss at step 150: 0.03423832729458809 Loss at step 200: 0.03713914379477501 Loss at step 250: 0.04609273746609688 Loss at step 300: 0.0515567809343338 Loss at step 350: 0.07265965640544891 Loss at step 400: 0.03148298338055611 Loss at step 450: 0.03703472018241882 Loss at step 500: 0.03727557882666588 Loss at step 550: 0.058256879448890686 Loss at step 600: 0.027964435517787933 Loss at step 650: 0.0567343533039093 Loss at step 700: 0.03368888795375824 Loss at step 750: 0.04406190291047096 Loss at step 800: 0.05421852320432663 Loss at step 850: 0.04743503779172897 Loss at step 900: 0.03985704481601715 Mean training loss after epoch 76: 0.0434630046834919 EPOCH: 77 Loss at step 0: 0.03183684125542641 Loss at step 50: 0.037883464246988297 Loss at step 100: 0.04415455460548401 Loss at step 150: 0.053632915019989014 Loss at step 200: 0.04381205886602402 Loss at step 250: 0.052479296922683716 Loss at step 300: 0.05958736687898636 Loss at step 350: 0.03841213881969452 Loss at step 400: 0.03418457508087158 Loss at step 450: 0.04150049015879631 Loss at step 500: 0.0394926555454731 Loss at step 550: 0.04740995168685913 Loss at step 600: 0.042850811034440994 Loss at step 650: 0.04100580886006355 Loss at step 700: 0.03945142403244972 Loss at step 750: 0.03582557663321495 Loss at step 800: 0.03739253804087639 Loss at step 850: 0.034367337822914124 Loss at step 900: 0.040952593088150024 Mean training loss after epoch 77: 0.04321287293384261 EPOCH: 78 Loss at step 0: 0.04119819402694702 Loss at step 50: 0.034810151904821396 Loss at step 100: 0.03249846398830414 Loss at step 150: 0.05472606047987938 Loss at step 200: 0.036216311156749725 Loss at step 250: 0.06442862749099731 Loss at step 300: 0.03778067231178284 Loss at step 350: 0.054383646696805954 Loss at step 400: 0.0353001169860363 Loss at step 450: 0.03855694457888603 Loss at step 500: 0.04806986451148987 Loss at step 550: 0.040879301726818085 Loss at step 600: 0.053744301199913025 Loss at step 650: 0.040531158447265625 Loss at step 700: 0.03517676889896393 Loss at step 750: 0.05384642630815506 Loss at step 800: 0.050301406532526016 Loss at step 850: 0.04929354786872864 Loss at step 900: 0.04053040221333504 Mean training loss after epoch 78: 0.04328321536649456 EPOCH: 79 Loss at step 0: 0.04194871336221695 Loss at step 50: 0.03198212385177612 Loss at step 100: 0.03828536346554756 Loss at step 150: 0.03559303656220436 Loss at step 200: 0.03919830173254013 Loss at step 250: 0.03772548958659172 Loss at step 300: 0.03586260974407196 Loss at step 350: 0.043631747364997864 Loss at step 400: 0.04422422870993614 Loss at step 450: 0.048818062990903854 Loss at step 500: 0.03155799210071564 Loss at step 550: 0.05843576043844223 Loss at step 600: 0.05522342398762703 Loss at step 650: 0.038866207003593445 Loss at step 700: 0.036214429885149 Loss at step 750: 0.03130275383591652 Loss at step 800: 0.0340677835047245 Loss at step 850: 0.04786337912082672 Loss at step 900: 0.031822774559259415 Mean training loss after epoch 79: 0.04343328970287845 EPOCH: 80 Loss at step 0: 0.03619969263672829 Loss at step 50: 0.03505738452076912 Loss at step 100: 0.0416874960064888 Loss at step 150: 0.0397004634141922 Loss at step 200: 0.04935501143336296 Loss at step 250: 0.035591986030340195 Loss at step 300: 0.034097764641046524 Loss at step 350: 0.0753348246216774 Loss at step 400: 0.045698486268520355 Loss at step 450: 0.05276000499725342 Loss at step 500: 0.039212241768836975 Loss at step 550: 0.034957434982061386 Loss at step 600: 0.04207983613014221 Loss at step 650: 0.04190154746174812 Loss at step 700: 0.03177616372704506 Loss at step 750: 0.04089004918932915 Loss at step 800: 0.04352838918566704 Loss at step 850: 0.035795073956251144 Loss at step 900: 0.04028657078742981 Mean training loss after epoch 80: 0.043427567684780684 EPOCH: 81 Loss at step 0: 0.042431291192770004 Loss at step 50: 0.04434075206518173 Loss at step 100: 0.03444292023777962 Loss at step 150: 0.034287624061107635 Loss at step 200: 0.04747597500681877 Loss at step 250: 0.07046405971050262 Loss at step 300: 0.053709570318460464 Loss at step 350: 0.044494032859802246 Loss at step 400: 0.03507664054632187 Loss at step 450: 0.05276798456907272 Loss at step 500: 0.0337027944624424 Loss at step 550: 0.03558002784848213 Loss at step 600: 0.03827740252017975 Loss at step 650: 0.06272587180137634 Loss at step 700: 0.0387282595038414 Loss at step 750: 0.036419566720724106 Loss at step 800: 0.03936458006501198 Loss at step 850: 0.0405547209084034 Loss at step 900: 0.042010705918073654 Mean training loss after epoch 81: 0.04371572843095514 EPOCH: 82 Loss at step 0: 0.03459695726633072 Loss at step 50: 0.059825580567121506 Loss at step 100: 0.05627289414405823 Loss at step 150: 0.03709516301751137 Loss at step 200: 0.03274582698941231 Loss at step 250: 0.03857039660215378 Loss at step 300: 0.05362662300467491 Loss at step 350: 0.028752855956554413 Loss at step 400: 0.0427774116396904 Loss at step 450: 0.03295108675956726 Loss at step 500: 0.035480353981256485 Loss at step 550: 0.0386529378592968 Loss at step 600: 0.05622414872050285 Loss at step 650: 0.036230552941560745 Loss at step 700: 0.037954311817884445 Loss at step 750: 0.04108721762895584 Loss at step 800: 0.029807908460497856 Loss at step 850: 0.03318273648619652 Loss at step 900: 0.032488565891981125 Mean training loss after epoch 82: 0.042967537912462696 EPOCH: 83 Loss at step 0: 0.03957919031381607 Loss at step 50: 0.05966862663626671 Loss at step 100: 0.03850177675485611 Loss at step 150: 0.03814758360385895 Loss at step 200: 0.05008736252784729 Loss at step 250: 0.03801500052213669 Loss at step 300: 0.03611404821276665 Loss at step 350: 0.05439632758498192 Loss at step 400: 0.03871960937976837 Loss at step 450: 0.036402326077222824 Loss at step 500: 0.028765158727765083 Loss at step 550: 0.048865433782339096 Loss at step 600: 0.03361959755420685 Loss at step 650: 0.0361751951277256 Loss at step 700: 0.04165593907237053 Loss at step 750: 0.067274309694767 Loss at step 800: 0.03976880759000778 Loss at step 850: 0.03604168817400932 Loss at step 900: 0.03472847864031792 Mean training loss after epoch 83: 0.043061317586060015 EPOCH: 84 Loss at step 0: 0.03923407942056656 Loss at step 50: 0.05730864033102989 Loss at step 100: 0.04049661010503769 Loss at step 150: 0.04807131364941597 Loss at step 200: 0.04765217378735542 Loss at step 250: 0.03335307538509369 Loss at step 300: 0.0422411747276783 Loss at step 350: 0.04837876930832863 Loss at step 400: 0.031468722969293594 Loss at step 450: 0.03963646665215492 Loss at step 500: 0.05229082331061363 Loss at step 550: 0.05960356444120407 Loss at step 600: 0.05173199251294136 Loss at step 650: 0.03792847320437431 Loss at step 700: 0.04248504340648651 Loss at step 750: 0.035815827548503876 Loss at step 800: 0.05281852185726166 Loss at step 850: 0.037402376532554626 Loss at step 900: 0.0452844500541687 Mean training loss after epoch 84: 0.042966231151716286 EPOCH: 85 Loss at step 0: 0.0366511344909668 Loss at step 50: 0.03707302361726761 Loss at step 100: 0.040821511298418045 Loss at step 150: 0.04278367757797241 Loss at step 200: 0.03191452845931053 Loss at step 250: 0.04186934605240822 Loss at step 300: 0.03885969892144203 Loss at step 350: 0.038280971348285675 Loss at step 400: 0.043265823274850845 Loss at step 450: 0.036392029374837875 Loss at step 500: 0.036326102912425995 Loss at step 550: 0.06019260734319687 Loss at step 600: 0.03924421966075897 Loss at step 650: 0.039656706154346466 Loss at step 700: 0.0372033566236496 Loss at step 750: 0.04699388146400452 Loss at step 800: 0.055457230657339096 Loss at step 850: 0.030086882412433624 Loss at step 900: 0.02990223467350006 Mean training loss after epoch 85: 0.042536891708924954 EPOCH: 86 Loss at step 0: 0.05130326747894287 Loss at step 50: 0.03300853446125984 Loss at step 100: 0.036591097712516785 Loss at step 150: 0.030882099643349648 Loss at step 200: 0.04031068831682205 Loss at step 250: 0.03158494457602501 Loss at step 300: 0.04480988532304764 Loss at step 350: 0.05302272364497185 Loss at step 400: 0.04243970662355423 Loss at step 450: 0.03975902870297432 Loss at step 500: 0.05695364251732826 Loss at step 550: 0.07452802360057831 Loss at step 600: 0.04246452823281288 Loss at step 650: 0.03339885547757149 Loss at step 700: 0.051078420132398605 Loss at step 750: 0.033357683569192886 Loss at step 800: 0.03185432776808739 Loss at step 850: 0.05841461196541786 Loss at step 900: 0.05896671861410141 Mean training loss after epoch 86: 0.04324152772185772 EPOCH: 87 Loss at step 0: 0.03475942835211754 Loss at step 50: 0.05159994959831238 Loss at step 100: 0.03108212910592556 Loss at step 150: 0.03585439547896385 Loss at step 200: 0.03803715854883194 Loss at step 250: 0.04172193631529808 Loss at step 300: 0.03311793506145477 Loss at step 350: 0.042666126042604446 Loss at step 400: 0.04034850746393204 Loss at step 450: 0.049102120101451874 Loss at step 500: 0.03519275784492493 Loss at step 550: 0.03459713235497475 Loss at step 600: 0.03990953043103218 Loss at step 650: 0.0728394091129303 Loss at step 700: 0.0427023246884346 Loss at step 750: 0.049997903406620026 Loss at step 800: 0.0453764833509922 Loss at step 850: 0.04252421855926514 Loss at step 900: 0.05488305911421776 Mean training loss after epoch 87: 0.04274524296167245 EPOCH: 88 Loss at step 0: 0.05471730977296829 Loss at step 50: 0.056022848933935165 Loss at step 100: 0.040596332401037216 Loss at step 150: 0.051124874502420425 Loss at step 200: 0.044127557426691055 Loss at step 250: 0.03722318261861801 Loss at step 300: 0.050390247255563736 Loss at step 350: 0.04235997423529625 Loss at step 400: 0.034908175468444824 Loss at step 450: 0.033560801297426224 Loss at step 500: 0.04489276185631752 Loss at step 550: 0.042110878974199295 Loss at step 600: 0.048575304448604584 Loss at step 650: 0.033107779920101166 Loss at step 700: 0.03481648489832878 Loss at step 750: 0.03892889246344566 Loss at step 800: 0.05361012741923332 Loss at step 850: 0.04063637927174568 Loss at step 900: 0.053515736013650894 Mean training loss after epoch 88: 0.04307139576124802 EPOCH: 89 Loss at step 0: 0.04820404574275017 Loss at step 50: 0.03489292412996292 Loss at step 100: 0.04863356426358223 Loss at step 150: 0.05758465453982353 Loss at step 200: 0.05572289600968361 Loss at step 250: 0.04746093973517418 Loss at step 300: 0.03288865089416504 Loss at step 350: 0.03116542473435402 Loss at step 400: 0.05131383612751961 Loss at step 450: 0.038161735981702805 Loss at step 500: 0.03702284395694733 Loss at step 550: 0.0480567067861557 Loss at step 600: 0.033356573432683945 Loss at step 650: 0.053302399814128876 Loss at step 700: 0.033832304179668427 Loss at step 750: 0.041473738849163055 Loss at step 800: 0.056586917489767075 Loss at step 850: 0.0345584861934185 Loss at step 900: 0.04435005411505699 Mean training loss after epoch 89: 0.04328680808332239 EPOCH: 90 Loss at step 0: 0.047606855630874634 Loss at step 50: 0.04018019512295723 Loss at step 100: 0.03586830943822861 Loss at step 150: 0.03451627492904663 Loss at step 200: 0.03475628048181534 Loss at step 250: 0.03728964924812317 Loss at step 300: 0.039771657437086105 Loss at step 350: 0.03414740040898323 Loss at step 400: 0.04197125509381294 Loss at step 450: 0.03973749279975891 Loss at step 500: 0.05776713788509369 Loss at step 550: 0.03985593840479851 Loss at step 600: 0.02793002501130104 Loss at step 650: 0.054175302386283875 Loss at step 700: 0.027408519759774208 Loss at step 750: 0.061310600489377975 Loss at step 800: 0.0400681234896183 Loss at step 850: 0.053323760628700256 Loss at step 900: 0.05491902679204941 Mean training loss after epoch 90: 0.04289123003106954 EPOCH: 91 Loss at step 0: 0.033909887075424194 Loss at step 50: 0.0437256321310997 Loss at step 100: 0.03694885969161987 Loss at step 150: 0.039778877049684525 Loss at step 200: 0.03751223161816597 Loss at step 250: 0.0417797788977623 Loss at step 300: 0.04823096841573715 Loss at step 350: 0.06883671879768372 Loss at step 400: 0.03908750042319298 Loss at step 450: 0.03905390575528145 Loss at step 500: 0.04777468740940094 Loss at step 550: 0.03328052908182144 Loss at step 600: 0.07238498330116272 Loss at step 650: 0.03659852594137192 Loss at step 700: 0.03590676560997963 Loss at step 750: 0.053549155592918396 Loss at step 800: 0.03421996906399727 Loss at step 850: 0.030873049050569534 Loss at step 900: 0.0393933430314064 Mean training loss after epoch 91: 0.04305210614056666 EPOCH: 92 Loss at step 0: 0.04011550545692444 Loss at step 50: 0.06091095507144928 Loss at step 100: 0.038123007863759995 Loss at step 150: 0.03346842899918556 Loss at step 200: 0.03723548725247383 Loss at step 250: 0.04279014840722084 Loss at step 300: 0.030391424894332886 Loss at step 350: 0.036060504615306854 Loss at step 400: 0.038071583956480026 Loss at step 450: 0.028947332873940468 Loss at step 500: 0.05115755274891853 Loss at step 550: 0.04308745637536049 Loss at step 600: 0.040355805307626724 Loss at step 650: 0.04092932119965553 Loss at step 700: 0.03406473621726036 Loss at step 750: 0.04425012320280075 Loss at step 800: 0.04782664030790329 Loss at step 850: 0.04364293813705444 Loss at step 900: 0.026517856866121292 Mean training loss after epoch 92: 0.04281410130896548 EPOCH: 93 Loss at step 0: 0.05114510655403137 Loss at step 50: 0.03565933555364609 Loss at step 100: 0.04560065269470215 Loss at step 150: 0.036366794258356094 Loss at step 200: 0.055939555168151855 Loss at step 250: 0.04558614641427994 Loss at step 300: 0.03254948928952217 Loss at step 350: 0.03610646724700928 Loss at step 400: 0.05478349328041077 Loss at step 450: 0.05785294994711876 Loss at step 500: 0.04104214161634445 Loss at step 550: 0.04105930030345917 Loss at step 600: 0.04729423671960831 Loss at step 650: 0.05462300777435303 Loss at step 700: 0.03686506301164627 Loss at step 750: 0.0352778285741806 Loss at step 800: 0.04227263107895851 Loss at step 850: 0.04231570288538933 Loss at step 900: 0.038680147379636765 Mean training loss after epoch 93: 0.04285589140504281 EPOCH: 94 Loss at step 0: 0.03377024456858635 Loss at step 50: 0.05313805490732193 Loss at step 100: 0.052344705909490585 Loss at step 150: 0.0549699068069458 Loss at step 200: 0.043661560863256454 Loss at step 250: 0.04525458812713623 Loss at step 300: 0.03621533513069153 Loss at step 350: 0.043058205395936966 Loss at step 400: 0.033526916056871414 Loss at step 450: 0.045574210584163666 Loss at step 500: 0.05113609880208969 Loss at step 550: 0.02983434684574604 Loss at step 600: 0.03884029760956764 Loss at step 650: 0.034773290157318115 Loss at step 700: 0.03184628114104271 Loss at step 750: 0.05141289159655571 Loss at step 800: 0.031959474086761475 Loss at step 850: 0.06084321811795235 Loss at step 900: 0.04533257707953453 Mean training loss after epoch 94: 0.04296914038103399 EPOCH: 95 Loss at step 0: 0.049454618245363235 Loss at step 50: 0.03432849794626236 Loss at step 100: 0.03927228972315788 Loss at step 150: 0.03987348824739456 Loss at step 200: 0.03219296783208847 Loss at step 250: 0.03891606628894806 Loss at step 300: 0.05109253525733948 Loss at step 350: 0.05382926017045975 Loss at step 400: 0.03941356763243675 Loss at step 450: 0.047886043787002563 Loss at step 500: 0.03733037784695625 Loss at step 550: 0.03691401705145836 Loss at step 600: 0.03794213756918907 Loss at step 650: 0.0335395373404026 Loss at step 700: 0.02686847373843193 Loss at step 750: 0.03974767401814461 Loss at step 800: 0.039435628801584244 Loss at step 850: 0.02982071414589882 Loss at step 900: 0.03484566882252693 Mean training loss after epoch 95: 0.0429740620792897 EPOCH: 96 Loss at step 0: 0.03344530984759331 Loss at step 50: 0.03988856449723244 Loss at step 100: 0.037878695875406265 Loss at step 150: 0.03148704767227173 Loss at step 200: 0.0553303137421608 Loss at step 250: 0.03375915810465813 Loss at step 300: 0.03673849627375603 Loss at step 350: 0.06203629821538925 Loss at step 400: 0.03654186427593231 Loss at step 450: 0.05359223484992981 Loss at step 500: 0.04555334150791168 Loss at step 550: 0.07236088067293167 Loss at step 600: 0.027772484347224236 Loss at step 650: 0.052784331142902374 Loss at step 700: 0.03660596162080765 Loss at step 750: 0.04406493902206421 Loss at step 800: 0.053940869867801666 Loss at step 850: 0.036954037845134735 Loss at step 900: 0.03537648543715477 Mean training loss after epoch 96: 0.04271107995267044 EPOCH: 97 Loss at step 0: 0.0571073554456234 Loss at step 50: 0.0507214292883873 Loss at step 100: 0.03628087043762207 Loss at step 150: 0.03885267674922943 Loss at step 200: 0.042158111929893494 Loss at step 250: 0.05556236952543259 Loss at step 300: 0.03981505706906319 Loss at step 350: 0.0548434779047966 Loss at step 400: 0.034418996423482895 Loss at step 450: 0.03472287207841873 Loss at step 500: 0.03555117920041084 Loss at step 550: 0.03816407918930054 Loss at step 600: 0.025759799405932426 Loss at step 650: 0.04199007526040077 Loss at step 700: 0.038333673030138016 Loss at step 750: 0.03289998322725296 Loss at step 800: 0.0702672153711319 Loss at step 850: 0.038640499114990234 Loss at step 900: 0.040598511695861816 Mean training loss after epoch 97: 0.042963363667096154 EPOCH: 98 Loss at step 0: 0.041190773248672485 Loss at step 50: 0.03610696643590927 Loss at step 100: 0.0334661602973938 Loss at step 150: 0.04453654959797859 Loss at step 200: 0.04337998852133751 Loss at step 250: 0.04634065553545952 Loss at step 300: 0.041758954524993896 Loss at step 350: 0.03860583156347275 Loss at step 400: 0.03955512121319771 Loss at step 450: 0.048036493360996246 Loss at step 500: 0.03650392219424248 Loss at step 550: 0.05565332621335983 Loss at step 600: 0.048426784574985504 Loss at step 650: 0.039550747722387314 Loss at step 700: 0.050359249114990234 Loss at step 750: 0.04194114729762077 Loss at step 800: 0.048837970942258835 Loss at step 850: 0.037034422159194946 Loss at step 900: 0.049725938588380814 Mean training loss after epoch 98: 0.04256112145970879 EPOCH: 99 Loss at step 0: 0.03797951340675354 Loss at step 50: 0.030993685126304626 Loss at step 100: 0.036409344524145126 Loss at step 150: 0.036334164440631866 Loss at step 200: 0.03806266561150551 Loss at step 250: 0.059645894914865494 Loss at step 300: 0.04319143667817116 Loss at step 350: 0.037388790398836136 Loss at step 400: 0.0379926823079586 Loss at step 450: 0.05213199183344841 Loss at step 500: 0.04928722605109215 Loss at step 550: 0.036855775862932205 Loss at step 600: 0.037726860493421555 Loss at step 650: 0.04194611310958862 Loss at step 700: 0.047788213938474655 Loss at step 750: 0.03666809946298599 Loss at step 800: 0.0394861213862896 Loss at step 850: 0.03480648621916771 Loss at step 900: 0.070110984146595 Mean training loss after epoch 99: 0.04252374041746102 EPOCH: 100 Loss at step 0: 0.06594480574131012 Loss at step 50: 0.03445781022310257 Loss at step 100: 0.05826606601476669 Loss at step 150: 0.04245076701045036 Loss at step 200: 0.03514834865927696 Loss at step 250: 0.033999983221292496 Loss at step 300: 0.03557480499148369 Loss at step 350: 0.04028984531760216 Loss at step 400: 0.030987918376922607 Loss at step 450: 0.04478059709072113 Loss at step 500: 0.039208147674798965 Loss at step 550: 0.0393022745847702 Loss at step 600: 0.03440580144524574 Loss at step 650: 0.03126935288310051 Loss at step 700: 0.037463799118995667 Loss at step 750: 0.03385145589709282 Loss at step 800: 0.035457681864500046 Loss at step 850: 0.05063025280833244 Loss at step 900: 0.031727395951747894 Mean training loss after epoch 100: 0.04250231359416107 EPOCH: 101 Loss at step 0: 0.054816994816064835 Loss at step 50: 0.05111993849277496 Loss at step 100: 0.03686879947781563 Loss at step 150: 0.03355054929852486 Loss at step 200: 0.04180608317255974 Loss at step 250: 0.038119226694107056 Loss at step 300: 0.06747231632471085 Loss at step 350: 0.03204899653792381 Loss at step 400: 0.03702154383063316 Loss at step 450: 0.03760283812880516 Loss at step 500: 0.05520860478281975 Loss at step 550: 0.054886963218450546 Loss at step 600: 0.05183286964893341 Loss at step 650: 0.0519767664372921 Loss at step 700: 0.04563475027680397 Loss at step 750: 0.031452666968107224 Loss at step 800: 0.04563373327255249 Loss at step 850: 0.03595368191599846 Loss at step 900: 0.04092846065759659 Mean training loss after epoch 101: 0.04266446885833545 EPOCH: 102 Loss at step 0: 0.03758480027318001 Loss at step 50: 0.048051171004772186 Loss at step 100: 0.03287939727306366 Loss at step 150: 0.050316717475652695 Loss at step 200: 0.042837388813495636 Loss at step 250: 0.04012921452522278 Loss at step 300: 0.028731361031532288 Loss at step 350: 0.037974998354911804 Loss at step 400: 0.04281613230705261 Loss at step 450: 0.0336131826043129 Loss at step 500: 0.03552795201539993 Loss at step 550: 0.03016221523284912 Loss at step 600: 0.053395941853523254 Loss at step 650: 0.03166235610842705 Loss at step 700: 0.043697696179151535 Loss at step 750: 0.03533635661005974 Loss at step 800: 0.04244111478328705 Loss at step 850: 0.03686143457889557 Loss at step 900: 0.04618969187140465 Mean training loss after epoch 102: 0.042582686170379615 EPOCH: 103 Loss at step 0: 0.03813011199235916 Loss at step 50: 0.034927256405353546 Loss at step 100: 0.031239565461874008 Loss at step 150: 0.042713697999715805 Loss at step 200: 0.0433226116001606 Loss at step 250: 0.05187712237238884 Loss at step 300: 0.04413282498717308 Loss at step 350: 0.04902511090040207 Loss at step 400: 0.03953693434596062 Loss at step 450: 0.039114054292440414 Loss at step 500: 0.04274632781744003 Loss at step 550: 0.0531548373401165 Loss at step 600: 0.03745201602578163 Loss at step 650: 0.031790003180503845 Loss at step 700: 0.053833868354558945 Loss at step 750: 0.04040380194783211 Loss at step 800: 0.04860929027199745 Loss at step 850: 0.03554920107126236 Loss at step 900: 0.038346171379089355 Mean training loss after epoch 103: 0.04251617089168095 EPOCH: 104 Loss at step 0: 0.038474924862384796 Loss at step 50: 0.044805858284235 Loss at step 100: 0.034493427723646164 Loss at step 150: 0.03356339782476425 Loss at step 200: 0.0352184996008873 Loss at step 250: 0.02936066873371601 Loss at step 300: 0.042006444185972214 Loss at step 350: 0.051808517426252365 Loss at step 400: 0.05232838913798332 Loss at step 450: 0.06221949681639671 Loss at step 500: 0.05095582455396652 Loss at step 550: 0.06535247713327408 Loss at step 600: 0.0405137836933136 Loss at step 650: 0.049897950142621994 Loss at step 700: 0.0437726192176342 Loss at step 750: 0.03346915915608406 Loss at step 800: 0.03773397207260132 Loss at step 850: 0.04173166677355766 Loss at step 900: 0.04306480661034584 Mean training loss after epoch 104: 0.04267875678233628 EPOCH: 105 Loss at step 0: 0.03925051912665367 Loss at step 50: 0.05954936146736145 Loss at step 100: 0.03243694081902504 Loss at step 150: 0.03322523832321167 Loss at step 200: 0.052616603672504425 Loss at step 250: 0.06096075847744942 Loss at step 300: 0.03955041244626045 Loss at step 350: 0.06198221817612648 Loss at step 400: 0.04186546429991722 Loss at step 450: 0.033763185143470764 Loss at step 500: 0.037474341690540314 Loss at step 550: 0.035403359681367874 Loss at step 600: 0.06787261366844177 Loss at step 650: 0.03845910727977753 Loss at step 700: 0.04604948312044144 Loss at step 750: 0.05642213672399521 Loss at step 800: 0.03913521394133568 Loss at step 850: 0.05649983882904053 Loss at step 900: 0.031140901148319244 Mean training loss after epoch 105: 0.04258796887428585 EPOCH: 106 Loss at step 0: 0.03972821682691574 Loss at step 50: 0.04153129830956459 Loss at step 100: 0.036973148584365845 Loss at step 150: 0.04552757740020752 Loss at step 200: 0.03672264888882637 Loss at step 250: 0.05741647630929947 Loss at step 300: 0.04311000928282738 Loss at step 350: 0.0379987433552742 Loss at step 400: 0.03218933939933777 Loss at step 450: 0.03928535059094429 Loss at step 500: 0.07216713577508926 Loss at step 550: 0.039698220789432526 Loss at step 600: 0.03438391909003258 Loss at step 650: 0.03522801771759987 Loss at step 700: 0.03284917399287224 Loss at step 750: 0.036407504230737686 Loss at step 800: 0.04317454993724823 Loss at step 850: 0.032883405685424805 Loss at step 900: 0.03503987565636635 Mean training loss after epoch 106: 0.04261471616076445 EPOCH: 107 Loss at step 0: 0.03846707195043564 Loss at step 50: 0.04331924766302109 Loss at step 100: 0.03615028038620949 Loss at step 150: 0.06391095370054245 Loss at step 200: 0.05727905035018921 Loss at step 250: 0.07090695202350616 Loss at step 300: 0.0342496857047081 Loss at step 350: 0.03432444483041763 Loss at step 400: 0.048370108008384705 Loss at step 450: 0.04175262153148651 Loss at step 500: 0.03698036074638367 Loss at step 550: 0.040640752762556076 Loss at step 600: 0.03526313975453377 Loss at step 650: 0.03848537430167198 Loss at step 700: 0.03649405390024185 Loss at step 750: 0.041827209293842316 Loss at step 800: 0.037235528230667114 Loss at step 850: 0.03328535333275795 Loss at step 900: 0.03950582444667816 Mean training loss after epoch 107: 0.04160137259875978 EPOCH: 108 Loss at step 0: 0.034566376358270645 Loss at step 50: 0.054515156894922256 Loss at step 100: 0.028471434488892555 Loss at step 150: 0.04659850522875786 Loss at step 200: 0.03646298125386238 Loss at step 250: 0.03687431663274765 Loss at step 300: 0.05480756610631943 Loss at step 350: 0.04384409636259079 Loss at step 400: 0.05794437974691391 Loss at step 450: 0.033320315182209015 Loss at step 500: 0.03170258179306984 Loss at step 550: 0.035993508994579315 Loss at step 600: 0.044657595455646515 Loss at step 650: 0.059966079890728 Loss at step 700: 0.03883882239460945 Loss at step 750: 0.054844051599502563 Loss at step 800: 0.037106383591890335 Loss at step 850: 0.07108719646930695 Loss at step 900: 0.033420123159885406 Mean training loss after epoch 108: 0.0427084030615273 EPOCH: 109 Loss at step 0: 0.03165261074900627 Loss at step 50: 0.036600880324840546 Loss at step 100: 0.036065876483917236 Loss at step 150: 0.04737785831093788 Loss at step 200: 0.03935203701257706 Loss at step 250: 0.050776977092027664 Loss at step 300: 0.037553731352090836 Loss at step 350: 0.05398639291524887 Loss at step 400: 0.0401119627058506 Loss at step 450: 0.03579001873731613 Loss at step 500: 0.07371081411838531 Loss at step 550: 0.03295370936393738 Loss at step 600: 0.03467197343707085 Loss at step 650: 0.03789252042770386 Loss at step 700: 0.03379783406853676 Loss at step 750: 0.04293002933263779 Loss at step 800: 0.05582861974835396 Loss at step 850: 0.04344412684440613 Loss at step 900: 0.03561467304825783 Mean training loss after epoch 109: 0.04232415787851823 EPOCH: 110 Loss at step 0: 0.04066146910190582 Loss at step 50: 0.03092169761657715 Loss at step 100: 0.04379524290561676 Loss at step 150: 0.046481452882289886 Loss at step 200: 0.035964298993349075 Loss at step 250: 0.03943020477890968 Loss at step 300: 0.040158461779356 Loss at step 350: 0.037916190922260284 Loss at step 400: 0.048341553658246994 Loss at step 450: 0.05997522547841072 Loss at step 500: 0.04303839057683945 Loss at step 550: 0.044007912278175354 Loss at step 600: 0.040273018181324005 Loss at step 650: 0.052575692534446716 Loss at step 700: 0.052247803658246994 Loss at step 750: 0.02987508475780487 Loss at step 800: 0.03995498642325401 Loss at step 850: 0.039623409509658813 Loss at step 900: 0.03085923194885254 Mean training loss after epoch 110: 0.0418662315091567 EPOCH: 111 Loss at step 0: 0.04087461158633232 Loss at step 50: 0.04566572979092598 Loss at step 100: 0.04353125020861626 Loss at step 150: 0.036665331572294235 Loss at step 200: 0.04791853949427605 Loss at step 250: 0.03176961466670036 Loss at step 300: 0.03411388397216797 Loss at step 350: 0.03534029796719551 Loss at step 400: 0.06946619600057602 Loss at step 450: 0.04892348498106003 Loss at step 500: 0.05122726783156395 Loss at step 550: 0.041164230555295944 Loss at step 600: 0.03580718860030174 Loss at step 650: 0.04979429394006729 Loss at step 700: 0.040910400450229645 Loss at step 750: 0.0392809733748436 Loss at step 800: 0.04240449517965317 Loss at step 850: 0.04202154651284218 Loss at step 900: 0.03828532248735428 Mean training loss after epoch 111: 0.042309815641929475 EPOCH: 112 Loss at step 0: 0.05732368305325508 Loss at step 50: 0.032697346061468124 Loss at step 100: 0.03879428654909134 Loss at step 150: 0.04335654526948929 Loss at step 200: 0.05167644843459129 Loss at step 250: 0.031387872993946075 Loss at step 300: 0.032151125371456146 Loss at step 350: 0.03301664814352989 Loss at step 400: 0.03214358910918236 Loss at step 450: 0.03987685963511467 Loss at step 500: 0.04216481000185013 Loss at step 550: 0.04103528708219528 Loss at step 600: 0.03057384490966797 Loss at step 650: 0.03793831542134285 Loss at step 700: 0.03568381443619728 Loss at step 750: 0.03574970364570618 Loss at step 800: 0.03824491426348686 Loss at step 850: 0.037126149982213974 Loss at step 900: 0.0429203025996685 Mean training loss after epoch 112: 0.04249270636040264 EPOCH: 113 Loss at step 0: 0.0373196043074131 Loss at step 50: 0.05041010305285454 Loss at step 100: 0.04375283792614937 Loss at step 150: 0.044493693858385086 Loss at step 200: 0.046422868967056274 Loss at step 250: 0.042558424174785614 Loss at step 300: 0.04969615116715431 Loss at step 350: 0.0355910025537014 Loss at step 400: 0.035893332213163376 Loss at step 450: 0.03937875106930733 Loss at step 500: 0.03531285747885704 Loss at step 550: 0.031121892854571342 Loss at step 600: 0.0344448946416378 Loss at step 650: 0.03561868891119957 Loss at step 700: 0.04948853701353073 Loss at step 750: 0.0521889291703701 Loss at step 800: 0.034127190709114075 Loss at step 850: 0.050092700868844986 Loss at step 900: 0.03754093497991562 Mean training loss after epoch 113: 0.04210959726781733 EPOCH: 114 Loss at step 0: 0.07794119417667389 Loss at step 50: 0.03785872459411621 Loss at step 100: 0.02951163798570633 Loss at step 150: 0.03614893555641174 Loss at step 200: 0.0334288589656353 Loss at step 250: 0.04454198107123375 Loss at step 300: 0.0302814282476902 Loss at step 350: 0.037919871509075165 Loss at step 400: 0.03609878569841385 Loss at step 450: 0.036125730723142624 Loss at step 500: 0.03330675885081291 Loss at step 550: 0.03959421440958977 Loss at step 600: 0.03536437451839447 Loss at step 650: 0.04799531400203705 Loss at step 700: 0.03922136500477791 Loss at step 750: 0.04748925194144249 Loss at step 800: 0.04288400709629059 Loss at step 850: 0.030749179422855377 Loss at step 900: 0.05282088741660118 Mean training loss after epoch 114: 0.04215258744551238 EPOCH: 115 Loss at step 0: 0.03269724175333977 Loss at step 50: 0.04127200320363045 Loss at step 100: 0.04298258572816849 Loss at step 150: 0.044186368584632874 Loss at step 200: 0.06728621572256088 Loss at step 250: 0.06409439444541931 Loss at step 300: 0.03680689260363579 Loss at step 350: 0.05468391627073288 Loss at step 400: 0.037299200892448425 Loss at step 450: 0.034279804676771164 Loss at step 500: 0.05238455533981323 Loss at step 550: 0.03664058446884155 Loss at step 600: 0.050791505724191666 Loss at step 650: 0.035236917436122894 Loss at step 700: 0.037347350269556046 Loss at step 750: 0.04255804792046547 Loss at step 800: 0.0385412760078907 Loss at step 850: 0.03693554922938347 Loss at step 900: 0.034885890781879425 Mean training loss after epoch 115: 0.0422524697145324 EPOCH: 116 Loss at step 0: 0.038392242044210434 Loss at step 50: 0.0498710460960865 Loss at step 100: 0.0316699743270874 Loss at step 150: 0.04003997519612312 Loss at step 200: 0.05826578289270401 Loss at step 250: 0.041681259870529175 Loss at step 300: 0.05601358413696289 Loss at step 350: 0.03729918971657753 Loss at step 400: 0.03318071737885475 Loss at step 450: 0.04293840751051903 Loss at step 500: 0.06227767467498779 Loss at step 550: 0.03678235039114952 Loss at step 600: 0.034046124666929245 Loss at step 650: 0.03284251317381859 Loss at step 700: 0.041909221559762955 Loss at step 750: 0.032798267900943756 Loss at step 800: 0.031638942658901215 Loss at step 850: 0.03910643979907036 Loss at step 900: 0.04234055057168007 Mean training loss after epoch 116: 0.04153785489197733 EPOCH: 117 Loss at step 0: 0.06445081532001495 Loss at step 50: 0.06357962638139725 Loss at step 100: 0.027501210570335388 Loss at step 150: 0.03932282328605652 Loss at step 200: 0.04287012293934822 Loss at step 250: 0.03253049775958061 Loss at step 300: 0.03618066757917404 Loss at step 350: 0.04000543802976608 Loss at step 400: 0.03892853856086731 Loss at step 450: 0.0451839454472065 Loss at step 500: 0.03603319451212883 Loss at step 550: 0.03950086981058121 Loss at step 600: 0.032469674944877625 Loss at step 650: 0.0363955944776535 Loss at step 700: 0.05247776582837105 Loss at step 750: 0.03969534859061241 Loss at step 800: 0.03632740303874016 Loss at step 850: 0.042407725006341934 Loss at step 900: 0.03060283325612545 Mean training loss after epoch 117: 0.04181724062153716 EPOCH: 118 Loss at step 0: 0.046983376145362854 Loss at step 50: 0.038296736776828766 Loss at step 100: 0.03750442713499069 Loss at step 150: 0.03925739601254463 Loss at step 200: 0.03386780992150307 Loss at step 250: 0.03454817831516266 Loss at step 300: 0.028822144493460655 Loss at step 350: 0.03317120671272278 Loss at step 400: 0.060472145676612854 Loss at step 450: 0.03656953200697899 Loss at step 500: 0.03465289995074272 Loss at step 550: 0.049657080322504044 Loss at step 600: 0.03733038529753685 Loss at step 650: 0.03379211574792862 Loss at step 700: 0.03200632706284523 Loss at step 750: 0.05074142664670944 Loss at step 800: 0.038302790373563766 Loss at step 850: 0.0523531474173069 Loss at step 900: 0.030052630230784416 Mean training loss after epoch 118: 0.04178073859291036 EPOCH: 119 Loss at step 0: 0.03358502313494682 Loss at step 50: 0.06789039075374603 Loss at step 100: 0.05482921376824379 Loss at step 150: 0.03166317939758301 Loss at step 200: 0.0713806003332138 Loss at step 250: 0.03563233092427254 Loss at step 300: 0.05284898355603218 Loss at step 350: 0.0413966104388237 Loss at step 400: 0.03635629639029503 Loss at step 450: 0.05459525063633919 Loss at step 500: 0.04127579182386398 Loss at step 550: 0.052134640514850616 Loss at step 600: 0.035580333322286606 Loss at step 650: 0.04784807935357094 Loss at step 700: 0.03520835191011429 Loss at step 750: 0.043256036937236786 Loss at step 800: 0.03310855105519295 Loss at step 850: 0.07495839893817902 Loss at step 900: 0.0649784654378891 Mean training loss after epoch 119: 0.04298082933918055 EPOCH: 120 Loss at step 0: 0.05655069649219513 Loss at step 50: 0.04514957219362259 Loss at step 100: 0.06655114889144897 Loss at step 150: 0.03106803260743618 Loss at step 200: 0.045833077281713486 Loss at step 250: 0.03953521326184273 Loss at step 300: 0.05246560648083687 Loss at step 350: 0.04267233982682228 Loss at step 400: 0.038346465677022934 Loss at step 450: 0.037754569202661514 Loss at step 500: 0.043163035064935684 Loss at step 550: 0.04140860587358475 Loss at step 600: 0.0406038872897625 Loss at step 650: 0.03359159827232361 Loss at step 700: 0.04936815798282623 Loss at step 750: 0.05438612401485443 Loss at step 800: 0.04213950037956238 Loss at step 850: 0.034508705139160156 Loss at step 900: 0.0675860270857811 Mean training loss after epoch 120: 0.042658466736533875 EPOCH: 121 Loss at step 0: 0.03897164389491081 Loss at step 50: 0.05522599816322327 Loss at step 100: 0.04948018491268158 Loss at step 150: 0.043886661529541016 Loss at step 200: 0.03704773262143135 Loss at step 250: 0.03725631162524223 Loss at step 300: 0.05728599801659584 Loss at step 350: 0.0416950061917305 Loss at step 400: 0.07756723463535309 Loss at step 450: 0.04351458325982094 Loss at step 500: 0.04274976998567581 Loss at step 550: 0.0296721663326025 Loss at step 600: 0.03947244957089424 Loss at step 650: 0.04766566678881645 Loss at step 700: 0.0404764823615551 Loss at step 750: 0.05349675938487053 Loss at step 800: 0.038787178695201874 Loss at step 850: 0.035913098603487015 Loss at step 900: 0.051250796765089035 Mean training loss after epoch 121: 0.04169183673420504 EPOCH: 122 Loss at step 0: 0.05119352787733078 Loss at step 50: 0.04652483016252518 Loss at step 100: 0.03692207857966423 Loss at step 150: 0.03007550723850727 Loss at step 200: 0.03636409342288971 Loss at step 250: 0.042276497930288315 Loss at step 300: 0.057180218398571014 Loss at step 350: 0.039825454354286194 Loss at step 400: 0.04095945507287979 Loss at step 450: 0.05761071667075157 Loss at step 500: 0.03229396045207977 Loss at step 550: 0.04794428125023842 Loss at step 600: 0.03873315453529358 Loss at step 650: 0.03666229173541069 Loss at step 700: 0.03922053426504135 Loss at step 750: 0.051435839384794235 Loss at step 800: 0.04893461614847183 Loss at step 850: 0.062272828072309494 Loss at step 900: 0.058558881282806396 Mean training loss after epoch 122: 0.042184871921280044 EPOCH: 123 Loss at step 0: 0.04936463385820389 Loss at step 50: 0.038175780326128006 Loss at step 100: 0.04319179803133011 Loss at step 150: 0.06064098700881004 Loss at step 200: 0.044131651520729065 Loss at step 250: 0.03260546177625656 Loss at step 300: 0.04953109472990036 Loss at step 350: 0.03585084155201912 Loss at step 400: 0.03319363296031952 Loss at step 450: 0.03386983647942543 Loss at step 500: 0.03440601006150246 Loss at step 550: 0.05421939119696617 Loss at step 600: 0.04438026621937752 Loss at step 650: 0.05095277354121208 Loss at step 700: 0.03414129093289375 Loss at step 750: 0.05175543576478958 Loss at step 800: 0.038732655346393585 Loss at step 850: 0.02856926992535591 Loss at step 900: 0.03633487597107887 Mean training loss after epoch 123: 0.041980016295478415 EPOCH: 124 Loss at step 0: 0.03989504277706146 Loss at step 50: 0.04353111609816551 Loss at step 100: 0.03718562424182892 Loss at step 150: 0.031707338988780975 Loss at step 200: 0.04026669263839722 Loss at step 250: 0.03404298424720764 Loss at step 300: 0.0534050315618515 Loss at step 350: 0.029968062415719032 Loss at step 400: 0.03694538772106171 Loss at step 450: 0.05466097220778465 Loss at step 500: 0.03984195366501808 Loss at step 550: 0.05208732932806015 Loss at step 600: 0.03710039705038071 Loss at step 650: 0.04038833826780319 Loss at step 700: 0.037802185863256454 Loss at step 750: 0.04496192932128906 Loss at step 800: 0.035395435988903046 Loss at step 850: 0.05032876133918762 Loss at step 900: 0.032192520797252655 Mean training loss after epoch 124: 0.0420913482263589 EPOCH: 125 Loss at step 0: 0.032458074390888214 Loss at step 50: 0.055535197257995605 Loss at step 100: 0.05189374461770058 Loss at step 150: 0.041088443249464035 Loss at step 200: 0.05803776532411575 Loss at step 250: 0.036880023777484894 Loss at step 300: 0.0345192588865757 Loss at step 350: 0.03455229476094246 Loss at step 400: 0.03948681056499481 Loss at step 450: 0.040654201060533524 Loss at step 500: 0.05414455011487007 Loss at step 550: 0.036972880363464355 Loss at step 600: 0.038368161767721176 Loss at step 650: 0.03491312265396118 Loss at step 700: 0.035577669739723206 Loss at step 750: 0.039977770298719406 Loss at step 800: 0.03572118282318115 Loss at step 850: 0.03331460803747177 Loss at step 900: 0.034765347838401794 Mean training loss after epoch 125: 0.04199916848948579 EPOCH: 126 Loss at step 0: 0.051006585359573364 Loss at step 50: 0.05238280072808266 Loss at step 100: 0.07898009568452835 Loss at step 150: 0.039006780833005905 Loss at step 200: 0.028127070516347885 Loss at step 250: 0.04001521319150925 Loss at step 300: 0.034793008118867874 Loss at step 350: 0.08402115851640701 Loss at step 400: 0.05133382976055145 Loss at step 450: 0.028179431334137917 Loss at step 500: 0.03849378973245621 Loss at step 550: 0.03267565369606018 Loss at step 600: 0.03785364329814911 Loss at step 650: 0.03719611093401909 Loss at step 700: 0.0354740172624588 Loss at step 750: 0.07399938255548477 Loss at step 800: 0.0404299721121788 Loss at step 850: 0.04966295510530472 Loss at step 900: 0.04146670177578926 Mean training loss after epoch 126: 0.0418950337896755 EPOCH: 127 Loss at step 0: 0.04349181056022644 Loss at step 50: 0.04604915529489517 Loss at step 100: 0.0789995938539505 Loss at step 150: 0.05610816180706024 Loss at step 200: 0.0388299934566021 Loss at step 250: 0.031251903623342514 Loss at step 300: 0.03407461196184158 Loss at step 350: 0.035196419805288315 Loss at step 400: 0.04049470275640488 Loss at step 450: 0.0353369303047657 Loss at step 500: 0.03746187686920166 Loss at step 550: 0.0338483527302742 Loss at step 600: 0.05165375396609306 Loss at step 650: 0.04033738002181053 Loss at step 700: 0.041242919862270355 Loss at step 750: 0.036316126585006714 Loss at step 800: 0.04027669131755829 Loss at step 850: 0.051455847918987274 Loss at step 900: 0.05169288069009781 Mean training loss after epoch 127: 0.04239633188112331 EPOCH: 128 Loss at step 0: 0.034746017307043076 Loss at step 50: 0.034719470888376236 Loss at step 100: 0.03823745250701904 Loss at step 150: 0.04462166130542755 Loss at step 200: 0.0378401018679142 Loss at step 250: 0.055615052580833435 Loss at step 300: 0.033228740096092224 Loss at step 350: 0.03287515416741371 Loss at step 400: 0.04448705166578293 Loss at step 450: 0.03881360590457916 Loss at step 500: 0.03505659103393555 Loss at step 550: 0.05110374093055725 Loss at step 600: 0.03396036475896835 Loss at step 650: 0.032783545553684235 Loss at step 700: 0.03914528340101242 Loss at step 750: 0.03648678958415985 Loss at step 800: 0.03869437426328659 Loss at step 850: 0.06087281554937363 Loss at step 900: 0.029765406623482704 Mean training loss after epoch 128: 0.04200040021819918 EPOCH: 129 Loss at step 0: 0.05055684968829155 Loss at step 50: 0.030315108597278595 Loss at step 100: 0.04236120358109474 Loss at step 150: 0.03397499397397041 Loss at step 200: 0.05119398608803749 Loss at step 250: 0.051564838737249374 Loss at step 300: 0.032033029943704605 Loss at step 350: 0.05339248850941658 Loss at step 400: 0.0373106524348259 Loss at step 450: 0.030532313510775566 Loss at step 500: 0.06623230129480362 Loss at step 550: 0.04921671748161316 Loss at step 600: 0.03587377071380615 Loss at step 650: 0.036296579986810684 Loss at step 700: 0.03638605400919914 Loss at step 750: 0.03656594455242157 Loss at step 800: 0.039779480546712875 Loss at step 850: 0.029600784182548523 Loss at step 900: 0.049143511801958084 Mean training loss after epoch 129: 0.04218049899442618 EPOCH: 130 Loss at step 0: 0.052545249462127686 Loss at step 50: 0.038528211414813995 Loss at step 100: 0.027567030861973763 Loss at step 150: 0.03429867699742317 Loss at step 200: 0.038811951875686646 Loss at step 250: 0.04634413495659828 Loss at step 300: 0.032937608659267426 Loss at step 350: 0.03211917728185654 Loss at step 400: 0.034982986748218536 Loss at step 450: 0.040340572595596313 Loss at step 500: 0.03976989910006523 Loss at step 550: 0.04129543900489807 Loss at step 600: 0.04864150658249855 Loss at step 650: 0.05638287961483002 Loss at step 700: 0.03687974810600281 Loss at step 750: 0.04878706857562065 Loss at step 800: 0.03704630210995674 Loss at step 850: 0.03765973076224327 Loss at step 900: 0.03913654386997223 Mean training loss after epoch 130: 0.04150379456873578 EPOCH: 131 Loss at step 0: 0.04336472228169441 Loss at step 50: 0.039639681577682495 Loss at step 100: 0.037212807685136795 Loss at step 150: 0.032861024141311646 Loss at step 200: 0.03451058268547058 Loss at step 250: 0.06621665507555008 Loss at step 300: 0.05730853229761124 Loss at step 350: 0.03239696845412254 Loss at step 400: 0.05622163787484169 Loss at step 450: 0.033836279064416885 Loss at step 500: 0.056446053087711334 Loss at step 550: 0.026984812691807747 Loss at step 600: 0.02471383474767208 Loss at step 650: 0.04270629584789276 Loss at step 700: 0.0501001812517643 Loss at step 750: 0.06488067656755447 Loss at step 800: 0.037917789071798325 Loss at step 850: 0.10470237582921982 Loss at step 900: 0.03097240999341011 Mean training loss after epoch 131: 0.04195304798967104 EPOCH: 132 Loss at step 0: 0.04962288588285446 Loss at step 50: 0.06931368261575699 Loss at step 100: 0.049940966069698334 Loss at step 150: 0.06847498565912247 Loss at step 200: 0.03371453285217285 Loss at step 250: 0.041119612753391266 Loss at step 300: 0.03662676736712456 Loss at step 350: 0.04187504202127457 Loss at step 400: 0.0364510677754879 Loss at step 450: 0.04922874644398689 Loss at step 500: 0.03917347639799118 Loss at step 550: 0.036016497761011124 Loss at step 600: 0.041930682957172394 Loss at step 650: 0.03447548672556877 Loss at step 700: 0.039227474480867386 Loss at step 750: 0.03534382954239845 Loss at step 800: 0.05443809926509857 Loss at step 850: 0.0599854402244091 Loss at step 900: 0.03519508242607117 Mean training loss after epoch 132: 0.041760940632935784 EPOCH: 133 Loss at step 0: 0.0473250113427639 Loss at step 50: 0.037641141563653946 Loss at step 100: 0.04980551451444626 Loss at step 150: 0.04431820660829544 Loss at step 200: 0.032616157084703445 Loss at step 250: 0.03402405604720116 Loss at step 300: 0.03094940260052681 Loss at step 350: 0.03868083283305168 Loss at step 400: 0.03758811205625534 Loss at step 450: 0.043539367616176605 Loss at step 500: 0.041610222309827805 Loss at step 550: 0.03396304324269295 Loss at step 600: 0.0355415977537632 Loss at step 650: 0.037485938519239426 Loss at step 700: 0.055193942040205 Loss at step 750: 0.050874676555395126 Loss at step 800: 0.034698810428380966 Loss at step 850: 0.052936941385269165 Loss at step 900: 0.038264304399490356 Mean training loss after epoch 133: 0.04197042437393401 EPOCH: 134 Loss at step 0: 0.03001808002591133 Loss at step 50: 0.035312578082084656 Loss at step 100: 0.033444952219724655 Loss at step 150: 0.0531051941215992 Loss at step 200: 0.03694118559360504 Loss at step 250: 0.059171855449676514 Loss at step 300: 0.03842538222670555 Loss at step 350: 0.03861884027719498 Loss at step 400: 0.05249006673693657 Loss at step 450: 0.04973667487502098 Loss at step 500: 0.0346035473048687 Loss at step 550: 0.029177922755479813 Loss at step 600: 0.04092540964484215 Loss at step 650: 0.04141342267394066 Loss at step 700: 0.036661289632320404 Loss at step 750: 0.03652454912662506 Loss at step 800: 0.036444712430238724 Loss at step 850: 0.03831741586327553 Loss at step 900: 0.039848729968070984 Mean training loss after epoch 134: 0.04186191032729995 EPOCH: 135 Loss at step 0: 0.030228322371840477 Loss at step 50: 0.04310741275548935 Loss at step 100: 0.03188135847449303 Loss at step 150: 0.03175545856356621 Loss at step 200: 0.04696660488843918 Loss at step 250: 0.04513179510831833 Loss at step 300: 0.03545892983675003 Loss at step 350: 0.04064309597015381 Loss at step 400: 0.06752579659223557 Loss at step 450: 0.06105548515915871 Loss at step 500: 0.04529595375061035 Loss at step 550: 0.05377986282110214 Loss at step 600: 0.042490482330322266 Loss at step 650: 0.029119232669472694 Loss at step 700: 0.04060959070920944 Loss at step 750: 0.03701868653297424 Loss at step 800: 0.0503336638212204 Loss at step 850: 0.04718286171555519 Loss at step 900: 0.04402776435017586 Mean training loss after epoch 135: 0.041762241239804446 EPOCH: 136 Loss at step 0: 0.03764044865965843 Loss at step 50: 0.035784490406513214 Loss at step 100: 0.06043146923184395 Loss at step 150: 0.034588154405355453 Loss at step 200: 0.04766268655657768 Loss at step 250: 0.046546097844839096 Loss at step 300: 0.031802643090486526 Loss at step 350: 0.035414449870586395 Loss at step 400: 0.03844719007611275 Loss at step 450: 0.04922423139214516 Loss at step 500: 0.05372387543320656 Loss at step 550: 0.0574822723865509 Loss at step 600: 0.03827773034572601 Loss at step 650: 0.04494352266192436 Loss at step 700: 0.07233446091413498 Loss at step 750: 0.06421661376953125 Loss at step 800: 0.044943779706954956 Loss at step 850: 0.0366646945476532 Loss at step 900: 0.038116566836833954 Mean training loss after epoch 136: 0.04148861685835286 EPOCH: 137 Loss at step 0: 0.0368124358355999 Loss at step 50: 0.03689468652009964 Loss at step 100: 0.038075949996709824 Loss at step 150: 0.04152105376124382 Loss at step 200: 0.03293756768107414 Loss at step 250: 0.034354064613580704 Loss at step 300: 0.05950964242219925 Loss at step 350: 0.03678474575281143 Loss at step 400: 0.03860022500157356 Loss at step 450: 0.041248176246881485 Loss at step 500: 0.03373798355460167 Loss at step 550: 0.029979437589645386 Loss at step 600: 0.03760131821036339 Loss at step 650: 0.02812694013118744 Loss at step 700: 0.03887851908802986 Loss at step 750: 0.032030824571847916 Loss at step 800: 0.04420321062207222 Loss at step 850: 0.031504228711128235 Loss at step 900: 0.03780015558004379 Mean training loss after epoch 137: 0.04162597493417482 EPOCH: 138 Loss at step 0: 0.03750850260257721 Loss at step 50: 0.0446242056787014 Loss at step 100: 0.04489576816558838 Loss at step 150: 0.03486185148358345 Loss at step 200: 0.03764588385820389 Loss at step 250: 0.054112598299980164 Loss at step 300: 0.03600380942225456 Loss at step 350: 0.042464736849069595 Loss at step 400: 0.038752783089876175 Loss at step 450: 0.036281924694776535 Loss at step 500: 0.03598717972636223 Loss at step 550: 0.03989803418517113 Loss at step 600: 0.04851244017481804 Loss at step 650: 0.036570772528648376 Loss at step 700: 0.034442074596881866 Loss at step 750: 0.035217348486185074 Loss at step 800: 0.037707939743995667 Loss at step 850: 0.040225084871053696 Loss at step 900: 0.04016699641942978 Mean training loss after epoch 138: 0.04153412301689069 EPOCH: 139 Loss at step 0: 0.04551554471254349 Loss at step 50: 0.032560113817453384 Loss at step 100: 0.03775615990161896 Loss at step 150: 0.03421652689576149 Loss at step 200: 0.03354345262050629 Loss at step 250: 0.053721800446510315 Loss at step 300: 0.04135346785187721 Loss at step 350: 0.0513974092900753 Loss at step 400: 0.04881959408521652 Loss at step 450: 0.03544827178120613 Loss at step 500: 0.05913611873984337 Loss at step 550: 0.039679333567619324 Loss at step 600: 0.050094157457351685 Loss at step 650: 0.03873215243220329 Loss at step 700: 0.038062117993831635 Loss at step 750: 0.03416633978486061 Loss at step 800: 0.03664739802479744 Loss at step 850: 0.04644625261425972 Loss at step 900: 0.03739149495959282 Mean training loss after epoch 139: 0.04111956674327601 EPOCH: 140 Loss at step 0: 0.03596428409218788 Loss at step 50: 0.042823463678359985 Loss at step 100: 0.04230928421020508 Loss at step 150: 0.042609747499227524 Loss at step 200: 0.05407872423529625 Loss at step 250: 0.05363228917121887 Loss at step 300: 0.05207119137048721 Loss at step 350: 0.048670560121536255 Loss at step 400: 0.03814106807112694 Loss at step 450: 0.03930611535906792 Loss at step 500: 0.03556457906961441 Loss at step 550: 0.03832629323005676 Loss at step 600: 0.05887491628527641 Loss at step 650: 0.03224894404411316 Loss at step 700: 0.04541449621319771 Loss at step 750: 0.055066440254449844 Loss at step 800: 0.03603579103946686 Loss at step 850: 0.04084492474794388 Loss at step 900: 0.056961894035339355 Mean training loss after epoch 140: 0.042075636034120505 EPOCH: 141 Loss at step 0: 0.0399349145591259 Loss at step 50: 0.04864419996738434 Loss at step 100: 0.03305085748434067 Loss at step 150: 0.03143967688083649 Loss at step 200: 0.03543752059340477 Loss at step 250: 0.062418945133686066 Loss at step 300: 0.04697494953870773 Loss at step 350: 0.03883915767073631 Loss at step 400: 0.042982928454875946 Loss at step 450: 0.03620343282818794 Loss at step 500: 0.049217771738767624 Loss at step 550: 0.0331941582262516 Loss at step 600: 0.03566623479127884 Loss at step 650: 0.03173791989684105 Loss at step 700: 0.036675065755844116 Loss at step 750: 0.03768469765782356 Loss at step 800: 0.04658212512731552 Loss at step 850: 0.05539308115839958 Loss at step 900: 0.0515570230782032 Mean training loss after epoch 141: 0.04171412966367024 EPOCH: 142 Loss at step 0: 0.03624924272298813 Loss at step 50: 0.03696461021900177 Loss at step 100: 0.04459252581000328 Loss at step 150: 0.03969615697860718 Loss at step 200: 0.03675790876150131 Loss at step 250: 0.0769742876291275 Loss at step 300: 0.05350521206855774 Loss at step 350: 0.035982292145490646 Loss at step 400: 0.05962669476866722 Loss at step 450: 0.03493090346455574 Loss at step 500: 0.042015235871076584 Loss at step 550: 0.030329391360282898 Loss at step 600: 0.045749954879283905 Loss at step 650: 0.05055365338921547 Loss at step 700: 0.02786894328892231 Loss at step 750: 0.03922906890511513 Loss at step 800: 0.033311016857624054 Loss at step 850: 0.043917424976825714 Loss at step 900: 0.03556983917951584 Mean training loss after epoch 142: 0.041388397367158804 EPOCH: 143 Loss at step 0: 0.03461332246661186 Loss at step 50: 0.04649975895881653 Loss at step 100: 0.037163153290748596 Loss at step 150: 0.0337543822824955 Loss at step 200: 0.05186038464307785 Loss at step 250: 0.029812298715114594 Loss at step 300: 0.05709000304341316 Loss at step 350: 0.037626028060913086 Loss at step 400: 0.03862740099430084 Loss at step 450: 0.03890547528862953 Loss at step 500: 0.03525479510426521 Loss at step 550: 0.05341339111328125 Loss at step 600: 0.041368093341588974 Loss at step 650: 0.03110479936003685 Loss at step 700: 0.03622483089566231 Loss at step 750: 0.047113243490457535 Loss at step 800: 0.05379750579595566 Loss at step 850: 0.043072398751974106 Loss at step 900: 0.041429538279771805 Mean training loss after epoch 143: 0.04150605164984587 EPOCH: 144 Loss at step 0: 0.037773117423057556 Loss at step 50: 0.05765533447265625 Loss at step 100: 0.04451600834727287 Loss at step 150: 0.03383982554078102 Loss at step 200: 0.05011241137981415 Loss at step 250: 0.04995810613036156 Loss at step 300: 0.036682143807411194 Loss at step 350: 0.06827323138713837 Loss at step 400: 0.03250201791524887 Loss at step 450: 0.03412742167711258 Loss at step 500: 0.0334237739443779 Loss at step 550: 0.048093199729919434 Loss at step 600: 0.03731585294008255 Loss at step 650: 0.03577633574604988 Loss at step 700: 0.0499037429690361 Loss at step 750: 0.052077312022447586 Loss at step 800: 0.036779262125492096 Loss at step 850: 0.03815673664212227 Loss at step 900: 0.041096482425928116 Mean training loss after epoch 144: 0.0413131038390243 EPOCH: 145 Loss at step 0: 0.0331590361893177 Loss at step 50: 0.05208244174718857 Loss at step 100: 0.04629826918244362 Loss at step 150: 0.03534118831157684 Loss at step 200: 0.05991370230913162 Loss at step 250: 0.051161717623472214 Loss at step 300: 0.03613453358411789 Loss at step 350: 0.04293036833405495 Loss at step 400: 0.028370732441544533 Loss at step 450: 0.04262422025203705 Loss at step 500: 0.03800327703356743 Loss at step 550: 0.030145244672894478 Loss at step 600: 0.03815841302275658 Loss at step 650: 0.02962811104953289 Loss at step 700: 0.039205390959978104 Loss at step 750: 0.03288600966334343 Loss at step 800: 0.03849463537335396 Loss at step 850: 0.03639960661530495 Loss at step 900: 0.04786938801407814 Mean training loss after epoch 145: 0.04152789463731907 EPOCH: 146 Loss at step 0: 0.06922589987516403 Loss at step 50: 0.027426032349467278 Loss at step 100: 0.0419546514749527 Loss at step 150: 0.07683013379573822 Loss at step 200: 0.039043329656124115 Loss at step 250: 0.04375987872481346 Loss at step 300: 0.049568191170692444 Loss at step 350: 0.05699385330080986 Loss at step 400: 0.048499494791030884 Loss at step 450: 0.0383453294634819 Loss at step 500: 0.05154740810394287 Loss at step 550: 0.03572075441479683 Loss at step 600: 0.027929428964853287 Loss at step 650: 0.05027191340923309 Loss at step 700: 0.038150355219841 Loss at step 750: 0.035848140716552734 Loss at step 800: 0.03224359452724457 Loss at step 850: 0.057315386831760406 Loss at step 900: 0.057250626385211945 Mean training loss after epoch 146: 0.04164098674800795 EPOCH: 147 Loss at step 0: 0.06581734120845795 Loss at step 50: 0.038791775703430176 Loss at step 100: 0.05927491560578346 Loss at step 150: 0.03419802337884903 Loss at step 200: 0.03612799942493439 Loss at step 250: 0.038876332342624664 Loss at step 300: 0.035981107503175735 Loss at step 350: 0.033888012170791626 Loss at step 400: 0.037413209676742554 Loss at step 450: 0.037917546927928925 Loss at step 500: 0.05330871790647507 Loss at step 550: 0.029614856466650963 Loss at step 600: 0.03355589136481285 Loss at step 650: 0.036023762077093124 Loss at step 700: 0.03964334726333618 Loss at step 750: 0.05343924090266228 Loss at step 800: 0.053697213530540466 Loss at step 850: 0.03371152654290199 Loss at step 900: 0.05578039586544037 Mean training loss after epoch 147: 0.04240527805655813 EPOCH: 148 Loss at step 0: 0.03936386480927467 Loss at step 50: 0.04210640490055084 Loss at step 100: 0.04268816486001015 Loss at step 150: 0.06183818355202675 Loss at step 200: 0.07075987011194229 Loss at step 250: 0.05519825592637062 Loss at step 300: 0.035694755613803864 Loss at step 350: 0.03371533378958702 Loss at step 400: 0.033945232629776 Loss at step 450: 0.05821037292480469 Loss at step 500: 0.04697125032544136 Loss at step 550: 0.061502858996391296 Loss at step 600: 0.03560385853052139 Loss at step 650: 0.04983074590563774 Loss at step 700: 0.05276470631361008 Loss at step 750: 0.03687663748860359 Loss at step 800: 0.042109232395887375 Loss at step 850: 0.03334448114037514 Loss at step 900: 0.0355062298476696 Mean training loss after epoch 148: 0.041936105562012584 EPOCH: 149 Loss at step 0: 0.047576889395713806 Loss at step 50: 0.044975463300943375 Loss at step 100: 0.03648234158754349 Loss at step 150: 0.03300274536013603 Loss at step 200: 0.042352344840765 Loss at step 250: 0.03916765749454498 Loss at step 300: 0.03359303995966911 Loss at step 350: 0.03628871962428093 Loss at step 400: 0.05589814856648445 Loss at step 450: 0.04867476224899292 Loss at step 500: 0.03508033603429794 Loss at step 550: 0.03953943029046059 Loss at step 600: 0.04848071560263634 Loss at step 650: 0.03590350225567818 Loss at step 700: 0.05390321463346481 Loss at step 750: 0.03665829449892044 Loss at step 800: 0.0287545807659626 Loss at step 850: 0.039661508053541183 Loss at step 900: 0.038975901901721954 Mean training loss after epoch 149: 0.041458570587037724 EPOCH: 150 Loss at step 0: 0.056643884629011154 Loss at step 50: 0.04595481976866722 Loss at step 100: 0.029688257724046707 Loss at step 150: 0.03850787132978439 Loss at step 200: 0.03398960456252098 Loss at step 250: 0.03484099358320236 Loss at step 300: 0.03894566372036934 Loss at step 350: 0.036098673939704895 Loss at step 400: 0.0428624153137207 Loss at step 450: 0.04429728165268898 Loss at step 500: 0.040077000856399536 Loss at step 550: 0.033312078565359116 Loss at step 600: 0.05234813690185547 Loss at step 650: 0.037106581032276154 Loss at step 700: 0.045419253408908844 Loss at step 750: 0.036966074258089066 Loss at step 800: 0.04133862629532814 Loss at step 850: 0.03744308277964592 Loss at step 900: 0.0384526252746582 Mean training loss after epoch 150: 0.041602980478533676 EPOCH: 151 Loss at step 0: 0.040540531277656555 Loss at step 50: 0.056973885744810104 Loss at step 100: 0.03904992714524269 Loss at step 150: 0.03366922214627266 Loss at step 200: 0.04054241627454758 Loss at step 250: 0.03384271264076233 Loss at step 300: 0.06538203358650208 Loss at step 350: 0.04228401929140091 Loss at step 400: 0.03054974041879177 Loss at step 450: 0.032168153673410416 Loss at step 500: 0.037130411714315414 Loss at step 550: 0.04935746267437935 Loss at step 600: 0.037488698959350586 Loss at step 650: 0.03225978463888168 Loss at step 700: 0.05651061609387398 Loss at step 750: 0.04557491093873978 Loss at step 800: 0.0365501269698143 Loss at step 850: 0.049771085381507874 Loss at step 900: 0.039190683513879776 Mean training loss after epoch 151: 0.042139969378916314 EPOCH: 152 Loss at step 0: 0.04227951541543007 Loss at step 50: 0.03895462676882744 Loss at step 100: 0.03314309939742088 Loss at step 150: 0.046858105808496475 Loss at step 200: 0.10066678375005722 Loss at step 250: 0.04834022745490074 Loss at step 300: 0.04364693909883499 Loss at step 350: 0.03455347195267677 Loss at step 400: 0.04103506729006767 Loss at step 450: 0.03935318440198898 Loss at step 500: 0.028242669999599457 Loss at step 550: 0.03538709506392479 Loss at step 600: 0.03963807597756386 Loss at step 650: 0.05471167340874672 Loss at step 700: 0.032676227390766144 Loss at step 750: 0.03609933704137802 Loss at step 800: 0.0532238632440567 Loss at step 850: 0.03841548040509224 Loss at step 900: 0.04774486646056175 Mean training loss after epoch 152: 0.041547004582245216 EPOCH: 153 Loss at step 0: 0.038035210222005844 Loss at step 50: 0.05306409299373627 Loss at step 100: 0.046712178736925125 Loss at step 150: 0.05315027013421059 Loss at step 200: 0.04282886162400246 Loss at step 250: 0.05324966832995415 Loss at step 300: 0.06066737696528435 Loss at step 350: 0.036959365010261536 Loss at step 400: 0.033942628651857376 Loss at step 450: 0.0355224534869194 Loss at step 500: 0.034113649278879166 Loss at step 550: 0.05700324848294258 Loss at step 600: 0.05441057309508324 Loss at step 650: 0.048028379678726196 Loss at step 700: 0.03308489918708801 Loss at step 750: 0.035109080374240875 Loss at step 800: 0.038110118359327316 Loss at step 850: 0.039251405745744705 Loss at step 900: 0.03400469198822975 Mean training loss after epoch 153: 0.04140030328970728 EPOCH: 154 Loss at step 0: 0.03536457195878029 Loss at step 50: 0.035665545612573624 Loss at step 100: 0.03440066799521446 Loss at step 150: 0.05249651521444321 Loss at step 200: 0.03746199607849121 Loss at step 250: 0.042931798845529556 Loss at step 300: 0.055315639823675156 Loss at step 350: 0.03905506432056427 Loss at step 400: 0.04804264008998871 Loss at step 450: 0.03962818533182144 Loss at step 500: 0.04109342768788338 Loss at step 550: 0.0552239716053009 Loss at step 600: 0.04724815860390663 Loss at step 650: 0.03302013501524925 Loss at step 700: 0.043438415974378586 Loss at step 750: 0.03739817067980766 Loss at step 800: 0.04706943780183792 Loss at step 850: 0.035197675228118896 Loss at step 900: 0.04516708478331566 Mean training loss after epoch 154: 0.04130730619515056 EPOCH: 155 Loss at step 0: 0.03467891365289688 Loss at step 50: 0.03850401192903519 Loss at step 100: 0.041584696620702744 Loss at step 150: 0.03654111921787262 Loss at step 200: 0.043707262724637985 Loss at step 250: 0.04661315679550171 Loss at step 300: 0.03193278983235359 Loss at step 350: 0.04390989989042282 Loss at step 400: 0.04261472076177597 Loss at step 450: 0.04774937406182289 Loss at step 500: 0.07615716010332108 Loss at step 550: 0.03692953288555145 Loss at step 600: 0.044842418283224106 Loss at step 650: 0.038759954273700714 Loss at step 700: 0.044489938765764236 Loss at step 750: 0.038699883967638016 Loss at step 800: 0.036361485719680786 Loss at step 850: 0.03979257121682167 Loss at step 900: 0.05347076803445816 Mean training loss after epoch 155: 0.04177116386012546 EPOCH: 156 Loss at step 0: 0.05879873409867287 Loss at step 50: 0.028947211802005768 Loss at step 100: 0.0397520549595356 Loss at step 150: 0.033991921693086624 Loss at step 200: 0.039479754865169525 Loss at step 250: 0.03921077027916908 Loss at step 300: 0.03595222160220146 Loss at step 350: 0.05635802447795868 Loss at step 400: 0.033633120357990265 Loss at step 450: 0.03170834109187126 Loss at step 500: 0.05231324955821037 Loss at step 550: 0.04040734842419624 Loss at step 600: 0.03696528822183609 Loss at step 650: 0.03061082400381565 Loss at step 700: 0.03297709301114082 Loss at step 750: 0.03979022428393364 Loss at step 800: 0.0386577844619751 Loss at step 850: 0.05775272473692894 Loss at step 900: 0.03511408343911171 Mean training loss after epoch 156: 0.04156894519774199 EPOCH: 157 Loss at step 0: 0.050811562687158585 Loss at step 50: 0.03690412640571594 Loss at step 100: 0.05122528225183487 Loss at step 150: 0.04289333522319794 Loss at step 200: 0.04267975687980652 Loss at step 250: 0.03571467101573944 Loss at step 300: 0.045765265822410583 Loss at step 350: 0.03954602777957916 Loss at step 400: 0.03697604313492775 Loss at step 450: 0.04773971810936928 Loss at step 500: 0.06028653681278229 Loss at step 550: 0.05740491673350334 Loss at step 600: 0.0323069728910923 Loss at step 650: 0.03556496649980545 Loss at step 700: 0.03418908268213272 Loss at step 750: 0.03536093980073929 Loss at step 800: 0.04835287481546402 Loss at step 850: 0.04068198427557945 Loss at step 900: 0.05368783324956894 Mean training loss after epoch 157: 0.04127082456427533 EPOCH: 158 Loss at step 0: 0.03775152191519737 Loss at step 50: 0.033143043518066406 Loss at step 100: 0.04213413968682289 Loss at step 150: 0.058427099138498306 Loss at step 200: 0.031781163066625595 Loss at step 250: 0.032437536865472794 Loss at step 300: 0.03743436932563782 Loss at step 350: 0.044606178998947144 Loss at step 400: 0.03823491558432579 Loss at step 450: 0.036084696650505066 Loss at step 500: 0.046157803386449814 Loss at step 550: 0.031442105770111084 Loss at step 600: 0.040949828922748566 Loss at step 650: 0.04821520671248436 Loss at step 700: 0.033100660890340805 Loss at step 750: 0.0539090670645237 Loss at step 800: 0.030354946851730347 Loss at step 850: 0.03413696214556694 Loss at step 900: 0.057310279458761215 Mean training loss after epoch 158: 0.04127330224770409 EPOCH: 159 Loss at step 0: 0.03428351506590843 Loss at step 50: 0.04459705203771591 Loss at step 100: 0.04183425009250641 Loss at step 150: 0.036627672612667084 Loss at step 200: 0.03811206668615341 Loss at step 250: 0.02782668173313141 Loss at step 300: 0.04908157140016556 Loss at step 350: 0.03867924213409424 Loss at step 400: 0.03195834904909134 Loss at step 450: 0.05242755264043808 Loss at step 500: 0.060414742678403854 Loss at step 550: 0.05232596769928932 Loss at step 600: 0.038364823907613754 Loss at step 650: 0.052884411066770554 Loss at step 700: 0.042744919657707214 Loss at step 750: 0.0516200065612793 Loss at step 800: 0.03257932513952255 Loss at step 850: 0.0459163524210453 Loss at step 900: 0.03473147377371788 Mean training loss after epoch 159: 0.04182222306625103 EPOCH: 160 Loss at step 0: 0.040808264166116714 Loss at step 50: 0.05758628994226456 Loss at step 100: 0.05205586552619934 Loss at step 150: 0.05736234039068222 Loss at step 200: 0.039956506341695786 Loss at step 250: 0.0343388132750988 Loss at step 300: 0.029595982283353806 Loss at step 350: 0.03062593936920166 Loss at step 400: 0.034775834530591965 Loss at step 450: 0.04531395062804222 Loss at step 500: 0.03862028568983078 Loss at step 550: 0.0501788854598999 Loss at step 600: 0.05932524800300598 Loss at step 650: 0.04029729962348938 Loss at step 700: 0.044166840612888336 Loss at step 750: 0.03580829128623009 Loss at step 800: 0.035127583891153336 Loss at step 850: 0.03489823266863823 Loss at step 900: 0.0411771759390831 Mean training loss after epoch 160: 0.04113637999907485 EPOCH: 161 Loss at step 0: 0.037108443677425385 Loss at step 50: 0.03884833678603172 Loss at step 100: 0.05023416504263878 Loss at step 150: 0.029741037636995316 Loss at step 200: 0.03433835506439209 Loss at step 250: 0.05427270755171776 Loss at step 300: 0.036310866475105286 Loss at step 350: 0.04984491690993309 Loss at step 400: 0.034500155597925186 Loss at step 450: 0.05055621266365051 Loss at step 500: 0.05776491016149521 Loss at step 550: 0.032918285578489304 Loss at step 600: 0.03631903976202011 Loss at step 650: 0.042213425040245056 Loss at step 700: 0.03325970098376274 Loss at step 750: 0.03703517094254494 Loss at step 800: 0.06844731420278549 Loss at step 850: 0.03261305391788483 Loss at step 900: 0.05112277716398239 Mean training loss after epoch 161: 0.0417790625319838 EPOCH: 162 Loss at step 0: 0.0719052255153656 Loss at step 50: 0.037063807249069214 Loss at step 100: 0.03580861538648605 Loss at step 150: 0.0347788967192173 Loss at step 200: 0.02989416942000389 Loss at step 250: 0.052019696682691574 Loss at step 300: 0.027014371007680893 Loss at step 350: 0.038413770496845245 Loss at step 400: 0.03156735375523567 Loss at step 450: 0.051059555262327194 Loss at step 500: 0.04005306959152222 Loss at step 550: 0.03175788372755051 Loss at step 600: 0.04830549284815788 Loss at step 650: 0.06640689820051193 Loss at step 700: 0.039580654352903366 Loss at step 750: 0.0357203409075737 Loss at step 800: 0.038485798984766006 Loss at step 850: 0.05260685086250305 Loss at step 900: 0.02665777876973152 Mean training loss after epoch 162: 0.041168077376240224 EPOCH: 163 Loss at step 0: 0.04725033417344093 Loss at step 50: 0.04516443982720375 Loss at step 100: 0.04181196540594101 Loss at step 150: 0.059215202927589417 Loss at step 200: 0.0511409193277359 Loss at step 250: 0.044708773493766785 Loss at step 300: 0.056504808366298676 Loss at step 350: 0.03039669431746006 Loss at step 400: 0.03736458346247673 Loss at step 450: 0.03143162652850151 Loss at step 500: 0.03512733429670334 Loss at step 550: 0.03089907206594944 Loss at step 600: 0.03306128829717636 Loss at step 650: 0.033669613301754 Loss at step 700: 0.05137248709797859 Loss at step 750: 0.035417888313531876 Loss at step 800: 0.04103937745094299 Loss at step 850: 0.03747738525271416 Loss at step 900: 0.028445962816476822 Mean training loss after epoch 163: 0.04109104012828201 EPOCH: 164 Loss at step 0: 0.05217801034450531 Loss at step 50: 0.05267312005162239 Loss at step 100: 0.031826451420784 Loss at step 150: 0.052346594631671906 Loss at step 200: 0.04511871188879013 Loss at step 250: 0.037728406488895416 Loss at step 300: 0.06702085584402084 Loss at step 350: 0.037826959043741226 Loss at step 400: 0.06176650524139404 Loss at step 450: 0.029300564900040627 Loss at step 500: 0.04240002855658531 Loss at step 550: 0.045189179480075836 Loss at step 600: 0.0361819788813591 Loss at step 650: 0.029073655605316162 Loss at step 700: 0.03859752044081688 Loss at step 750: 0.05829818919301033 Loss at step 800: 0.033017776906490326 Loss at step 850: 0.03646533936262131 Loss at step 900: 0.06864107400178909 Mean training loss after epoch 164: 0.0410780383567852 EPOCH: 165 Loss at step 0: 0.05876418575644493 Loss at step 50: 0.05110076069831848 Loss at step 100: 0.059062596410512924 Loss at step 150: 0.04298301413655281 Loss at step 200: 0.05270030349493027 Loss at step 250: 0.05030237138271332 Loss at step 300: 0.03602796047925949 Loss at step 350: 0.03711725026369095 Loss at step 400: 0.03948073089122772 Loss at step 450: 0.043773554265499115 Loss at step 500: 0.04284488782286644 Loss at step 550: 0.03366897627711296 Loss at step 600: 0.04140644520521164 Loss at step 650: 0.04177847504615784 Loss at step 700: 0.05340777337551117 Loss at step 750: 0.038395337760448456 Loss at step 800: 0.04130345955491066 Loss at step 850: 0.03217364102602005 Loss at step 900: 0.04178420826792717 Mean training loss after epoch 165: 0.04151375282968857 EPOCH: 166 Loss at step 0: 0.0419134795665741 Loss at step 50: 0.05736503750085831 Loss at step 100: 0.030650990083813667 Loss at step 150: 0.030072642490267754 Loss at step 200: 0.03552139177918434 Loss at step 250: 0.07287216931581497 Loss at step 300: 0.0375552773475647 Loss at step 350: 0.03761951997876167 Loss at step 400: 0.03759915381669998 Loss at step 450: 0.042520273476839066 Loss at step 500: 0.027038510888814926 Loss at step 550: 0.042743224650621414 Loss at step 600: 0.038364190608263016 Loss at step 650: 0.03768661990761757 Loss at step 700: 0.02832593023777008 Loss at step 750: 0.04041239246726036 Loss at step 800: 0.0353480726480484 Loss at step 850: 0.03948948159813881 Loss at step 900: 0.03328026086091995 Mean training loss after epoch 166: 0.04100758559120172 EPOCH: 167 Loss at step 0: 0.0262257419526577 Loss at step 50: 0.04143349081277847 Loss at step 100: 0.060030244290828705 Loss at step 150: 0.03262229636311531 Loss at step 200: 0.05317354202270508 Loss at step 250: 0.03562411293387413 Loss at step 300: 0.03788932040333748 Loss at step 350: 0.04144253954291344 Loss at step 400: 0.049303729087114334 Loss at step 450: 0.02967887930572033 Loss at step 500: 0.056796032935380936 Loss at step 550: 0.038984231650829315 Loss at step 600: 0.038679711520671844 Loss at step 650: 0.03297624737024307 Loss at step 700: 0.033573538064956665 Loss at step 750: 0.03411845862865448 Loss at step 800: 0.047023676335811615 Loss at step 850: 0.029027296230196953 Loss at step 900: 0.05361451208591461 Mean training loss after epoch 167: 0.041467136282449975 EPOCH: 168 Loss at step 0: 0.045670561492443085 Loss at step 50: 0.04795874282717705 Loss at step 100: 0.026450492441654205 Loss at step 150: 0.04658697545528412 Loss at step 200: 0.035420503467321396 Loss at step 250: 0.042320605367422104 Loss at step 300: 0.04044985771179199 Loss at step 350: 0.04170997813344002 Loss at step 400: 0.03641573339700699 Loss at step 450: 0.052922364324331284 Loss at step 500: 0.027648935094475746 Loss at step 550: 0.03640696778893471 Loss at step 600: 0.03652293235063553 Loss at step 650: 0.03035900928080082 Loss at step 700: 0.06778659671545029 Loss at step 750: 0.041280876845121384 Loss at step 800: 0.03142838552594185 Loss at step 850: 0.03208177536725998 Loss at step 900: 0.0396452471613884 Mean training loss after epoch 168: 0.04114864491768229 EPOCH: 169 Loss at step 0: 0.036086320877075195 Loss at step 50: 0.06495927274227142 Loss at step 100: 0.0443490669131279 Loss at step 150: 0.05400795117020607 Loss at step 200: 0.035492513328790665 Loss at step 250: 0.03413419798016548 Loss at step 300: 0.03728095442056656 Loss at step 350: 0.04830026626586914 Loss at step 400: 0.03588800132274628 Loss at step 450: 0.030070966109633446 Loss at step 500: 0.03209591656923294 Loss at step 550: 0.05534738674759865 Loss at step 600: 0.03457661718130112 Loss at step 650: 0.042635127902030945 Loss at step 700: 0.03610777109861374 Loss at step 750: 0.03900205343961716 Loss at step 800: 0.03707462549209595 Loss at step 850: 0.04740383103489876 Loss at step 900: 0.050756413489580154 Mean training loss after epoch 169: 0.04132169490850874 EPOCH: 170 Loss at step 0: 0.05271969735622406 Loss at step 50: 0.046714216470718384 Loss at step 100: 0.049631692469120026 Loss at step 150: 0.032844651490449905 Loss at step 200: 0.04006989672780037 Loss at step 250: 0.041742004454135895 Loss at step 300: 0.041226569563150406 Loss at step 350: 0.055178746581077576 Loss at step 400: 0.03863288834691048 Loss at step 450: 0.045994412153959274 Loss at step 500: 0.05180394649505615 Loss at step 550: 0.03305521234869957 Loss at step 600: 0.06343858689069748 Loss at step 650: 0.03838767111301422 Loss at step 700: 0.03147896006703377 Loss at step 750: 0.046315018087625504 Loss at step 800: 0.034339092671871185 Loss at step 850: 0.04254193231463432 Loss at step 900: 0.03785974904894829 Mean training loss after epoch 170: 0.04170199298361408 EPOCH: 171 Loss at step 0: 0.06740348041057587 Loss at step 50: 0.047490037977695465 Loss at step 100: 0.035187751054763794 Loss at step 150: 0.049540381878614426 Loss at step 200: 0.05009833723306656 Loss at step 250: 0.05929061397910118 Loss at step 300: 0.04258132353425026 Loss at step 350: 0.05649558827280998 Loss at step 400: 0.030305715277791023 Loss at step 450: 0.03909805417060852 Loss at step 500: 0.03631551191210747 Loss at step 550: 0.02946609817445278 Loss at step 600: 0.03998938202857971 Loss at step 650: 0.03758906200528145 Loss at step 700: 0.02867104485630989 Loss at step 750: 0.04174979776144028 Loss at step 800: 0.035577442497015 Loss at step 850: 0.03188648819923401 Loss at step 900: 0.05330802872776985 Mean training loss after epoch 171: 0.040822592387194316 EPOCH: 172 Loss at step 0: 0.033346861600875854 Loss at step 50: 0.036343757063150406 Loss at step 100: 0.055585041642189026 Loss at step 150: 0.032702941447496414 Loss at step 200: 0.04017132893204689 Loss at step 250: 0.029746584594249725 Loss at step 300: 0.05510460585355759 Loss at step 350: 0.034879546612501144 Loss at step 400: 0.06533102691173553 Loss at step 450: 0.042110148817300797 Loss at step 500: 0.03785338252782822 Loss at step 550: 0.052018292248249054 Loss at step 600: 0.03545716404914856 Loss at step 650: 0.04979861527681351 Loss at step 700: 0.03377075865864754 Loss at step 750: 0.03614296391606331 Loss at step 800: 0.054106034338474274 Loss at step 850: 0.03156478330492973 Loss at step 900: 0.037218037992715836 Mean training loss after epoch 172: 0.04138560169763657 EPOCH: 173 Loss at step 0: 0.04330369457602501 Loss at step 50: 0.04168296232819557 Loss at step 100: 0.034111060202121735 Loss at step 150: 0.06202607601881027 Loss at step 200: 0.035057954490184784 Loss at step 250: 0.03576508164405823 Loss at step 300: 0.03573876991868019 Loss at step 350: 0.060402777045965195 Loss at step 400: 0.03696572780609131 Loss at step 450: 0.04163713380694389 Loss at step 500: 0.0348895788192749 Loss at step 550: 0.03917588293552399 Loss at step 600: 0.03556079789996147 Loss at step 650: 0.046132419258356094 Loss at step 700: 0.03614691644906998 Loss at step 750: 0.03507085517048836 Loss at step 800: 0.035289958119392395 Loss at step 850: 0.03328322619199753 Loss at step 900: 0.031033160164952278 Mean training loss after epoch 173: 0.041041162202575567 EPOCH: 174 Loss at step 0: 0.02710857428610325 Loss at step 50: 0.03293299674987793 Loss at step 100: 0.03430355340242386 Loss at step 150: 0.042253073304891586 Loss at step 200: 0.03440471366047859 Loss at step 250: 0.04128659889101982 Loss at step 300: 0.05326399579644203 Loss at step 350: 0.04793737456202507 Loss at step 400: 0.04378213733434677 Loss at step 450: 0.03941934183239937 Loss at step 500: 0.03290329501032829 Loss at step 550: 0.0443020835518837 Loss at step 600: 0.04333820939064026 Loss at step 650: 0.04413630813360214 Loss at step 700: 0.03485510125756264 Loss at step 750: 0.03232556954026222 Loss at step 800: 0.03464951738715172 Loss at step 850: 0.06247880682349205 Loss at step 900: 0.03548730909824371 Mean training loss after epoch 174: 0.041309402826657175 EPOCH: 175 Loss at step 0: 0.06619555503129959 Loss at step 50: 0.057501547038555145 Loss at step 100: 0.03235335275530815 Loss at step 150: 0.041652921587228775 Loss at step 200: 0.05548626556992531 Loss at step 250: 0.03441996872425079 Loss at step 300: 0.034742917865514755 Loss at step 350: 0.049428023397922516 Loss at step 400: 0.03472219780087471 Loss at step 450: 0.0408954918384552 Loss at step 500: 0.03046843409538269 Loss at step 550: 0.04178620129823685 Loss at step 600: 0.03512788191437721 Loss at step 650: 0.028864067047834396 Loss at step 700: 0.03302551433444023 Loss at step 750: 0.051518216729164124 Loss at step 800: 0.035428158938884735 Loss at step 850: 0.03413361683487892 Loss at step 900: 0.032487619668245316 Mean training loss after epoch 175: 0.04128562736469927 EPOCH: 176 Loss at step 0: 0.035202424973249435 Loss at step 50: 0.04528702050447464 Loss at step 100: 0.04495453089475632 Loss at step 150: 0.04800651594996452 Loss at step 200: 0.036405108869075775 Loss at step 250: 0.043345704674720764 Loss at step 300: 0.03719569370150566 Loss at step 350: 0.041217539459466934 Loss at step 400: 0.04344746842980385 Loss at step 450: 0.03599061071872711 Loss at step 500: 0.033683083951473236 Loss at step 550: 0.04049773886799812 Loss at step 600: 0.03449854627251625 Loss at step 650: 0.04139147326350212 Loss at step 700: 0.030807020142674446 Loss at step 750: 0.032249607145786285 Loss at step 800: 0.035520970821380615 Loss at step 850: 0.034074101597070694 Loss at step 900: 0.039867300540208817 Mean training loss after epoch 176: 0.04119277500243647 EPOCH: 177 Loss at step 0: 0.057691194117069244 Loss at step 50: 0.035389818251132965 Loss at step 100: 0.044451769441366196 Loss at step 150: 0.05335525423288345 Loss at step 200: 0.044394008815288544 Loss at step 250: 0.02768351510167122 Loss at step 300: 0.049231935292482376 Loss at step 350: 0.050118301063776016 Loss at step 400: 0.03398577496409416 Loss at step 450: 0.03977412357926369 Loss at step 500: 0.03331071138381958 Loss at step 550: 0.0323900543153286 Loss at step 600: 0.035320207476615906 Loss at step 650: 0.035859931260347366 Loss at step 700: 0.03800678253173828 Loss at step 750: 0.03234219178557396 Loss at step 800: 0.049370381981134415 Loss at step 850: 0.03641175106167793 Loss at step 900: 0.03786250576376915 Mean training loss after epoch 177: 0.041183066975349174 EPOCH: 178 Loss at step 0: 0.04053865373134613 Loss at step 50: 0.04920952394604683 Loss at step 100: 0.03871060162782669 Loss at step 150: 0.034633755683898926 Loss at step 200: 0.04725545272231102 Loss at step 250: 0.031137259677052498 Loss at step 300: 0.04227209836244583 Loss at step 350: 0.03783777356147766 Loss at step 400: 0.03871219977736473 Loss at step 450: 0.03819052502512932 Loss at step 500: 0.039527300745248795 Loss at step 550: 0.05352233350276947 Loss at step 600: 0.0675860047340393 Loss at step 650: 0.03583545237779617 Loss at step 700: 0.03797457739710808 Loss at step 750: 0.04727550223469734 Loss at step 800: 0.04133649170398712 Loss at step 850: 0.02972884103655815 Loss at step 900: 0.040887437760829926 Mean training loss after epoch 178: 0.04102830078278078 EPOCH: 179 Loss at step 0: 0.0496118888258934 Loss at step 50: 0.038384564220905304 Loss at step 100: 0.051486846059560776 Loss at step 150: 0.03191399574279785 Loss at step 200: 0.039020683616399765 Loss at step 250: 0.04334458336234093 Loss at step 300: 0.046673376113176346 Loss at step 350: 0.0322888121008873 Loss at step 400: 0.04057774320244789 Loss at step 450: 0.03424479067325592 Loss at step 500: 0.03743143379688263 Loss at step 550: 0.04013605788350105 Loss at step 600: 0.05277823284268379 Loss at step 650: 0.03414126858115196 Loss at step 700: 0.03626846522092819 Loss at step 750: 0.060391299426555634 Loss at step 800: 0.056807905435562134 Loss at step 850: 0.03397446498274803 Loss at step 900: 0.03473489731550217 Mean training loss after epoch 179: 0.041886192136831375 EPOCH: 180 Loss at step 0: 0.03742155060172081 Loss at step 50: 0.04146679863333702 Loss at step 100: 0.030266009271144867 Loss at step 150: 0.03398127481341362 Loss at step 200: 0.03745659440755844 Loss at step 250: 0.03693356364965439 Loss at step 300: 0.03900127857923508 Loss at step 350: 0.0321943461894989 Loss at step 400: 0.033814482390880585 Loss at step 450: 0.05010516196489334 Loss at step 500: 0.03337934613227844 Loss at step 550: 0.03781595453619957 Loss at step 600: 0.03656068444252014 Loss at step 650: 0.04348238557577133 Loss at step 700: 0.04046044498682022 Loss at step 750: 0.03936430811882019 Loss at step 800: 0.03486889228224754 Loss at step 850: 0.03748004138469696 Loss at step 900: 0.033036306500434875 Mean training loss after epoch 180: 0.04087161542430742 EPOCH: 181 Loss at step 0: 0.03539994731545448 Loss at step 50: 0.0397937037050724 Loss at step 100: 0.06892500072717667 Loss at step 150: 0.03225782513618469 Loss at step 200: 0.0511590801179409 Loss at step 250: 0.03497767820954323 Loss at step 300: 0.038419418036937714 Loss at step 350: 0.05367429926991463 Loss at step 400: 0.04869798570871353 Loss at step 450: 0.03468048572540283 Loss at step 500: 0.037934135645627975 Loss at step 550: 0.04422640800476074 Loss at step 600: 0.031334321945905685 Loss at step 650: 0.0354008711874485 Loss at step 700: 0.04256727173924446 Loss at step 750: 0.03654541075229645 Loss at step 800: 0.026717044413089752 Loss at step 850: 0.03481684997677803 Loss at step 900: 0.03742305561900139 Mean training loss after epoch 181: 0.041199005234724424 EPOCH: 182 Loss at step 0: 0.042348023504018784 Loss at step 50: 0.0344836600124836 Loss at step 100: 0.051688820123672485 Loss at step 150: 0.07034405320882797 Loss at step 200: 0.031865011900663376 Loss at step 250: 0.029475703835487366 Loss at step 300: 0.035261332988739014 Loss at step 350: 0.04605374485254288 Loss at step 400: 0.03675435483455658 Loss at step 450: 0.03763430565595627 Loss at step 500: 0.032832708209753036 Loss at step 550: 0.04272369295358658 Loss at step 600: 0.026183418929576874 Loss at step 650: 0.04753145948052406 Loss at step 700: 0.032421816140413284 Loss at step 750: 0.046281423419713974 Loss at step 800: 0.05420113354921341 Loss at step 850: 0.033526744693517685 Loss at step 900: 0.03910479322075844 Mean training loss after epoch 182: 0.04143104658150343 EPOCH: 183 Loss at step 0: 0.03486000373959541 Loss at step 50: 0.04086889326572418 Loss at step 100: 0.04026981443166733 Loss at step 150: 0.031166674569249153 Loss at step 200: 0.03122401423752308 Loss at step 250: 0.03616003319621086 Loss at step 300: 0.036647699773311615 Loss at step 350: 0.050884950906038284 Loss at step 400: 0.03801600635051727 Loss at step 450: 0.03602983057498932 Loss at step 500: 0.09175577759742737 Loss at step 550: 0.03133425489068031 Loss at step 600: 0.06428657472133636 Loss at step 650: 0.030993375927209854 Loss at step 700: 0.03350936621427536 Loss at step 750: 0.03523945435881615 Loss at step 800: 0.029117180034518242 Loss at step 850: 0.043647605925798416 Loss at step 900: 0.04038027673959732 Mean training loss after epoch 183: 0.041085116166089265 EPOCH: 184 Loss at step 0: 0.05037994682788849 Loss at step 50: 0.06212716922163963 Loss at step 100: 0.03902390971779823 Loss at step 150: 0.052186399698257446 Loss at step 200: 0.05398682504892349 Loss at step 250: 0.04356851428747177 Loss at step 300: 0.04690634831786156 Loss at step 350: 0.045627791434526443 Loss at step 400: 0.04712317883968353 Loss at step 450: 0.036127470433712006 Loss at step 500: 0.035705190151929855 Loss at step 550: 0.041072484105825424 Loss at step 600: 0.034532904624938965 Loss at step 650: 0.03774843364953995 Loss at step 700: 0.02717268094420433 Loss at step 750: 0.05673539638519287 Loss at step 800: 0.03262036293745041 Loss at step 850: 0.05196235328912735 Loss at step 900: 0.037352561950683594 Mean training loss after epoch 184: 0.041593552185798376 EPOCH: 185 Loss at step 0: 0.034667786210775375 Loss at step 50: 0.03664986789226532 Loss at step 100: 0.04453791677951813 Loss at step 150: 0.032478589564561844 Loss at step 200: 0.04739055782556534 Loss at step 250: 0.0600055456161499 Loss at step 300: 0.04286332055926323 Loss at step 350: 0.03538944199681282 Loss at step 400: 0.0353425070643425 Loss at step 450: 0.031036892905831337 Loss at step 500: 0.03956003487110138 Loss at step 550: 0.0572323240339756 Loss at step 600: 0.07209392637014389 Loss at step 650: 0.041867148131132126 Loss at step 700: 0.06022677943110466 Loss at step 750: 0.038538120687007904 Loss at step 800: 0.03907705843448639 Loss at step 850: 0.03262921795248985 Loss at step 900: 0.03066466748714447 Mean training loss after epoch 185: 0.04143010784806346 EPOCH: 186 Loss at step 0: 0.03675982356071472 Loss at step 50: 0.03620411828160286 Loss at step 100: 0.04260580614209175 Loss at step 150: 0.03542805090546608 Loss at step 200: 0.04173298552632332 Loss at step 250: 0.038859572261571884 Loss at step 300: 0.0539051853120327 Loss at step 350: 0.037735700607299805 Loss at step 400: 0.05391717702150345 Loss at step 450: 0.03702409565448761 Loss at step 500: 0.03829742595553398 Loss at step 550: 0.03964691236615181 Loss at step 600: 0.05835528299212456 Loss at step 650: 0.03510157763957977 Loss at step 700: 0.037147484719753265 Loss at step 750: 0.04881170764565468 Loss at step 800: 0.03959803283214569 Loss at step 850: 0.04374637454748154 Loss at step 900: 0.0532485693693161 Mean training loss after epoch 186: 0.041114724868698035 EPOCH: 187 Loss at step 0: 0.03341015428304672 Loss at step 50: 0.03360753878951073 Loss at step 100: 0.037256088107824326 Loss at step 150: 0.037449076771736145 Loss at step 200: 0.05277197062969208 Loss at step 250: 0.03463160991668701 Loss at step 300: 0.04016861692070961 Loss at step 350: 0.048892393708229065 Loss at step 400: 0.05203116685152054 Loss at step 450: 0.0328763946890831 Loss at step 500: 0.03633170202374458 Loss at step 550: 0.03687275946140289 Loss at step 600: 0.041447803378105164 Loss at step 650: 0.039668574929237366 Loss at step 700: 0.028526468202471733 Loss at step 750: 0.029051868245005608 Loss at step 800: 0.03571057692170143 Loss at step 850: 0.031222984194755554 Loss at step 900: 0.06122225895524025 Mean training loss after epoch 187: 0.04135203498727413 EPOCH: 188 Loss at step 0: 0.035144176334142685 Loss at step 50: 0.04409192502498627 Loss at step 100: 0.03803578019142151 Loss at step 150: 0.031051304191350937 Loss at step 200: 0.03822355717420578 Loss at step 250: 0.03701493889093399 Loss at step 300: 0.04047181084752083 Loss at step 350: 0.052611492574214935 Loss at step 400: 0.04568652808666229 Loss at step 450: 0.0398855023086071 Loss at step 500: 0.03960626199841499 Loss at step 550: 0.031328439712524414 Loss at step 600: 0.05476410314440727 Loss at step 650: 0.03687070310115814 Loss at step 700: 0.04068216308951378 Loss at step 750: 0.03235265240073204 Loss at step 800: 0.03952326625585556 Loss at step 850: 0.03331683203577995 Loss at step 900: 0.029711971059441566 Mean training loss after epoch 188: 0.0409105112895306 EPOCH: 189 Loss at step 0: 0.03131984919309616 Loss at step 50: 0.03435014933347702 Loss at step 100: 0.036105986684560776 Loss at step 150: 0.055127598345279694 Loss at step 200: 0.05040585994720459 Loss at step 250: 0.03966245427727699 Loss at step 300: 0.03856890648603439 Loss at step 350: 0.03580622747540474 Loss at step 400: 0.035066716372966766 Loss at step 450: 0.04096367210149765 Loss at step 500: 0.03280964493751526 Loss at step 550: 0.05122831463813782 Loss at step 600: 0.03615153580904007 Loss at step 650: 0.038228102028369904 Loss at step 700: 0.041382741183042526 Loss at step 750: 0.031186502426862717 Loss at step 800: 0.033646564930677414 Loss at step 850: 0.03227361664175987 Loss at step 900: 0.03588711842894554 Mean training loss after epoch 189: 0.04143349572754046 EPOCH: 190 Loss at step 0: 0.0578099824488163 Loss at step 50: 0.03757844492793083 Loss at step 100: 0.038110170513391495 Loss at step 150: 0.05027920752763748 Loss at step 200: 0.03137429431080818 Loss at step 250: 0.052811674773693085 Loss at step 300: 0.036862388253211975 Loss at step 350: 0.050126731395721436 Loss at step 400: 0.05192907527089119 Loss at step 450: 0.03917970508337021 Loss at step 500: 0.03455211967229843 Loss at step 550: 0.05670452117919922 Loss at step 600: 0.03874848783016205 Loss at step 650: 0.055585578083992004 Loss at step 700: 0.042754366993904114 Loss at step 750: 0.043895136564970016 Loss at step 800: 0.041297804564237595 Loss at step 850: 0.06710420548915863 Loss at step 900: 0.03670049458742142 Mean training loss after epoch 190: 0.04147083937391035 EPOCH: 191 Loss at step 0: 0.06263330578804016 Loss at step 50: 0.03823355212807655 Loss at step 100: 0.034148454666137695 Loss at step 150: 0.04550662264227867 Loss at step 200: 0.0418052077293396 Loss at step 250: 0.0550265833735466 Loss at step 300: 0.030911820009350777 Loss at step 350: 0.052474506199359894 Loss at step 400: 0.02898096665740013 Loss at step 450: 0.05503057688474655 Loss at step 500: 0.06547393649816513 Loss at step 550: 0.04336470738053322 Loss at step 600: 0.04436841234564781 Loss at step 650: 0.03836284577846527 Loss at step 700: 0.06932152062654495 Loss at step 750: 0.03472721204161644 Loss at step 800: 0.04640382528305054 Loss at step 850: 0.03905215859413147 Loss at step 900: 0.05053381249308586 Mean training loss after epoch 191: 0.040970529898254476 EPOCH: 192 Loss at step 0: 0.03086368553340435 Loss at step 50: 0.05552278086543083 Loss at step 100: 0.036756258457899094 Loss at step 150: 0.03414138779044151 Loss at step 200: 0.03406932204961777 Loss at step 250: 0.03442727401852608 Loss at step 300: 0.049642808735370636 Loss at step 350: 0.03155268356204033 Loss at step 400: 0.0327191986143589 Loss at step 450: 0.03105553798377514 Loss at step 500: 0.03697984665632248 Loss at step 550: 0.05652059614658356 Loss at step 600: 0.041702114045619965 Loss at step 650: 0.05544755607843399 Loss at step 700: 0.07079564034938812 Loss at step 750: 0.03584619238972664 Loss at step 800: 0.03885915130376816 Loss at step 850: 0.049547575414180756 Loss at step 900: 0.04108596220612526 Mean training loss after epoch 192: 0.04109867157013432 EPOCH: 193 Loss at step 0: 0.03247745335102081 Loss at step 50: 0.03647783398628235 Loss at step 100: 0.05963939055800438 Loss at step 150: 0.03091658279299736 Loss at step 200: 0.031709276139736176 Loss at step 250: 0.04873473197221756 Loss at step 300: 0.06559808552265167 Loss at step 350: 0.02791413478553295 Loss at step 400: 0.044414401054382324 Loss at step 450: 0.03095756284892559 Loss at step 500: 0.03009779565036297 Loss at step 550: 0.03683574125170708 Loss at step 600: 0.052306465804576874 Loss at step 650: 0.029605967923998833 Loss at step 700: 0.03544901683926582 Loss at step 750: 0.04578153416514397 Loss at step 800: 0.041149698197841644 Loss at step 850: 0.031246203929185867 Loss at step 900: 0.03593813627958298 Mean training loss after epoch 193: 0.04117470388727656 EPOCH: 194 Loss at step 0: 0.06487667560577393 Loss at step 50: 0.042295027524232864 Loss at step 100: 0.030577773228287697 Loss at step 150: 0.03241685777902603 Loss at step 200: 0.037086743861436844 Loss at step 250: 0.04891343042254448 Loss at step 300: 0.039207398891448975 Loss at step 350: 0.04557713493704796 Loss at step 400: 0.03431115299463272 Loss at step 450: 0.029251424595713615 Loss at step 500: 0.05570143833756447 Loss at step 550: 0.037910863757133484 Loss at step 600: 0.05352575704455376 Loss at step 650: 0.03852808475494385 Loss at step 700: 0.04708243906497955 Loss at step 750: 0.026353036984801292 Loss at step 800: 0.03650936484336853 Loss at step 850: 0.03329676389694214 Loss at step 900: 0.0403190441429615 Mean training loss after epoch 194: 0.04060127881012047 EPOCH: 195 Loss at step 0: 0.04363727569580078 Loss at step 50: 0.035226691514253616 Loss at step 100: 0.04585903882980347 Loss at step 150: 0.03184151276946068 Loss at step 200: 0.033356472849845886 Loss at step 250: 0.04661582410335541 Loss at step 300: 0.03826885670423508 Loss at step 350: 0.0429307222366333 Loss at step 400: 0.033405471593141556 Loss at step 450: 0.04061249643564224 Loss at step 500: 0.0467667430639267 Loss at step 550: 0.030969833955168724 Loss at step 600: 0.05913640931248665 Loss at step 650: 0.03165191411972046 Loss at step 700: 0.050419941544532776 Loss at step 750: 0.03641653433442116 Loss at step 800: 0.03706834837794304 Loss at step 850: 0.03109755739569664 Loss at step 900: 0.06048808619379997 Mean training loss after epoch 195: 0.04081377769901808 EPOCH: 196 Loss at step 0: 0.04946375638246536 Loss at step 50: 0.03481105715036392 Loss at step 100: 0.03722474351525307 Loss at step 150: 0.05357375741004944 Loss at step 200: 0.034146830439567566 Loss at step 250: 0.0506136454641819 Loss at step 300: 0.03680326044559479 Loss at step 350: 0.06682924181222916 Loss at step 400: 0.03368544578552246 Loss at step 450: 0.032180093228816986 Loss at step 500: 0.06661216169595718 Loss at step 550: 0.05137762799859047 Loss at step 600: 0.056735388934612274 Loss at step 650: 0.03981880098581314 Loss at step 700: 0.062398482114076614 Loss at step 750: 0.031081395223736763 Loss at step 800: 0.03569858893752098 Loss at step 850: 0.03516566380858421 Loss at step 900: 0.04888331890106201 Mean training loss after epoch 196: 0.04106460441189852 EPOCH: 197 Loss at step 0: 0.033267904072999954 Loss at step 50: 0.026825478300452232 Loss at step 100: 0.0333157442510128 Loss at step 150: 0.034367743879556656 Loss at step 200: 0.0363345630466938 Loss at step 250: 0.03655015677213669 Loss at step 300: 0.052015479654073715 Loss at step 350: 0.033158812671899796 Loss at step 400: 0.03191790357232094 Loss at step 450: 0.05308828502893448 Loss at step 500: 0.03845398873090744 Loss at step 550: 0.04206359386444092 Loss at step 600: 0.036213863641023636 Loss at step 650: 0.03715220093727112 Loss at step 700: 0.042153824120759964 Loss at step 750: 0.032749272882938385 Loss at step 800: 0.03813325613737106 Loss at step 850: 0.03370192274451256 Loss at step 900: 0.07624957710504532 Mean training loss after epoch 197: 0.041273357392326473 EPOCH: 198 Loss at step 0: 0.03572851046919823 Loss at step 50: 0.03928769379854202 Loss at step 100: 0.053057439625263214 Loss at step 150: 0.033817801624536514 Loss at step 200: 0.03354412689805031 Loss at step 250: 0.03424428030848503 Loss at step 300: 0.039850182831287384 Loss at step 350: 0.03441798314452171 Loss at step 400: 0.03617231547832489 Loss at step 450: 0.04995562136173248 Loss at step 500: 0.032263126224279404 Loss at step 550: 0.03086046688258648 Loss at step 600: 0.04726880416274071 Loss at step 650: 0.05277194827795029 Loss at step 700: 0.03499541059136391 Loss at step 750: 0.05741386488080025 Loss at step 800: 0.053315434604883194 Loss at step 850: 0.03170153871178627 Loss at step 900: 0.0313238799571991 Mean training loss after epoch 198: 0.04143886743331832 EPOCH: 199 Loss at step 0: 0.0435892753303051 Loss at step 50: 0.0344247967004776 Loss at step 100: 0.03943983465433121 Loss at step 150: 0.028521453961730003 Loss at step 200: 0.04200650006532669 Loss at step 250: 0.03400389850139618 Loss at step 300: 0.0430566631257534 Loss at step 350: 0.03662889450788498 Loss at step 400: 0.03405138477683067 Loss at step 450: 0.03646203875541687 Loss at step 500: 0.038583800196647644 Loss at step 550: 0.03847289830446243 Loss at step 600: 0.05437365174293518 Loss at step 650: 0.030444391071796417 Loss at step 700: 0.03164180368185043 Loss at step 750: 0.04799719527363777 Loss at step 800: 0.03562280535697937 Loss at step 850: 0.03373323753476143 Loss at step 900: 0.04217617213726044 Mean training loss after epoch 199: 0.041037750730255264 EPOCH: 200 Loss at step 0: 0.04337261617183685 Loss at step 50: 0.038126084953546524 Loss at step 100: 0.029905274510383606 Loss at step 150: 0.034626610577106476 Loss at step 200: 0.053894128650426865 Loss at step 250: 0.033408645540475845 Loss at step 300: 0.07837864011526108 Loss at step 350: 0.03306877613067627 Loss at step 400: 0.04004674404859543 Loss at step 450: 0.052700597792863846 Loss at step 500: 0.050933729857206345 Loss at step 550: 0.038936447352170944 Loss at step 600: 0.03535982593894005 Loss at step 650: 0.07919874787330627 Loss at step 700: 0.032308802008628845 Loss at step 750: 0.04241956025362015 Loss at step 800: 0.03665988892316818 Loss at step 850: 0.06026674807071686 Loss at step 900: 0.04620642960071564 Mean training loss after epoch 200: 0.04048385464751136 EPOCH: 201 Loss at step 0: 0.03924839571118355 Loss at step 50: 0.04013318195939064 Loss at step 100: 0.06246310845017433 Loss at step 150: 0.06835775822401047 Loss at step 200: 0.03888271003961563 Loss at step 250: 0.030264925211668015 Loss at step 300: 0.03133225068449974 Loss at step 350: 0.05444778501987457 Loss at step 400: 0.033401113003492355 Loss at step 450: 0.04151122272014618 Loss at step 500: 0.053741589188575745 Loss at step 550: 0.033934518694877625 Loss at step 600: 0.03145533800125122 Loss at step 650: 0.055735740810632706 Loss at step 700: 0.04723541811108589 Loss at step 750: 0.05059937760233879 Loss at step 800: 0.06299877911806107 Loss at step 850: 0.04913831502199173 Loss at step 900: 0.03624095395207405 Mean training loss after epoch 201: 0.04096025045413071 EPOCH: 202 Loss at step 0: 0.03889895975589752 Loss at step 50: 0.05285234749317169 Loss at step 100: 0.02855575829744339 Loss at step 150: 0.03238051384687424 Loss at step 200: 0.05435951426625252 Loss at step 250: 0.03173847496509552 Loss at step 300: 0.03342343121767044 Loss at step 350: 0.03853283450007439 Loss at step 400: 0.053353164345026016 Loss at step 450: 0.03999951109290123 Loss at step 500: 0.06987713277339935 Loss at step 550: 0.029343143105506897 Loss at step 600: 0.0418456569314003 Loss at step 650: 0.03713199868798256 Loss at step 700: 0.03553243353962898 Loss at step 750: 0.04379253834486008 Loss at step 800: 0.03917234018445015 Loss at step 850: 0.028421802446246147 Loss at step 900: 0.05297483131289482 Mean training loss after epoch 202: 0.04110213666796875 EPOCH: 203 Loss at step 0: 0.060226794332265854 Loss at step 50: 0.05047924444079399 Loss at step 100: 0.05316202715039253 Loss at step 150: 0.05685155466198921 Loss at step 200: 0.03843305632472038 Loss at step 250: 0.04069899767637253 Loss at step 300: 0.03171660006046295 Loss at step 350: 0.03657343611121178 Loss at step 400: 0.03817500174045563 Loss at step 450: 0.044746845960617065 Loss at step 500: 0.03472840040922165 Loss at step 550: 0.027824951335787773 Loss at step 600: 0.03443307429552078 Loss at step 650: 0.029447097331285477 Loss at step 700: 0.051438942551612854 Loss at step 750: 0.04129990190267563 Loss at step 800: 0.05596796050667763 Loss at step 850: 0.03629111126065254 Loss at step 900: 0.03315466269850731 Mean training loss after epoch 203: 0.04105745639794989 EPOCH: 204 Loss at step 0: 0.042799364775419235 Loss at step 50: 0.031929537653923035 Loss at step 100: 0.047864317893981934 Loss at step 150: 0.0342666357755661 Loss at step 200: 0.06054456904530525 Loss at step 250: 0.036816492676734924 Loss at step 300: 0.05343258008360863 Loss at step 350: 0.06748872995376587 Loss at step 400: 0.02991330996155739 Loss at step 450: 0.04579372704029083 Loss at step 500: 0.03849664330482483 Loss at step 550: 0.047723498195409775 Loss at step 600: 0.030417494475841522 Loss at step 650: 0.03247131407260895 Loss at step 700: 0.032086338847875595 Loss at step 750: 0.04014010354876518 Loss at step 800: 0.032784026116132736 Loss at step 850: 0.03725343942642212 Loss at step 900: 0.05670224130153656 Mean training loss after epoch 204: 0.04122408698680304 EPOCH: 205 Loss at step 0: 0.03365284577012062 Loss at step 50: 0.04744843393564224 Loss at step 100: 0.03485788404941559 Loss at step 150: 0.04996902495622635 Loss at step 200: 0.03245384618639946 Loss at step 250: 0.06172189861536026 Loss at step 300: 0.03067169338464737 Loss at step 350: 0.03785982355475426 Loss at step 400: 0.035666145384311676 Loss at step 450: 0.038585010915994644 Loss at step 500: 0.05278969183564186 Loss at step 550: 0.03420484811067581 Loss at step 600: 0.035578034818172455 Loss at step 650: 0.05873752012848854 Loss at step 700: 0.052170529961586 Loss at step 750: 0.05123858526349068 Loss at step 800: 0.028626032173633575 Loss at step 850: 0.04657066985964775 Loss at step 900: 0.04766450449824333 Mean training loss after epoch 205: 0.04061389337581739 EPOCH: 206 Loss at step 0: 0.034108567982912064 Loss at step 50: 0.03980407863855362 Loss at step 100: 0.0626382902264595 Loss at step 150: 0.04959193989634514 Loss at step 200: 0.04817379638552666 Loss at step 250: 0.037478357553482056 Loss at step 300: 0.03300425782799721 Loss at step 350: 0.036181725561618805 Loss at step 400: 0.03484601154923439 Loss at step 450: 0.03569401055574417 Loss at step 500: 0.02932155877351761 Loss at step 550: 0.035176441073417664 Loss at step 600: 0.0315680094063282 Loss at step 650: 0.030586695298552513 Loss at step 700: 0.04044463112950325 Loss at step 750: 0.027646826580166817 Loss at step 800: 0.04065338522195816 Loss at step 850: 0.032977815717458725 Loss at step 900: 0.04012962058186531 Mean training loss after epoch 206: 0.04040954973914031 EPOCH: 207 Loss at step 0: 0.031025512143969536 Loss at step 50: 0.03509480878710747 Loss at step 100: 0.0297771655023098 Loss at step 150: 0.031918950378894806 Loss at step 200: 0.048496000468730927 Loss at step 250: 0.037156231701374054 Loss at step 300: 0.053501810878515244 Loss at step 350: 0.044525884091854095 Loss at step 400: 0.032685536891222 Loss at step 450: 0.038589347153902054 Loss at step 500: 0.0452488474547863 Loss at step 550: 0.03623788803815842 Loss at step 600: 0.06447766721248627 Loss at step 650: 0.036816906183958054 Loss at step 700: 0.0436214953660965 Loss at step 750: 0.03152133524417877 Loss at step 800: 0.03654224053025246 Loss at step 850: 0.029152270406484604 Loss at step 900: 0.038390688598155975 Mean training loss after epoch 207: 0.040911576085126224 EPOCH: 208 Loss at step 0: 0.03644964471459389 Loss at step 50: 0.03678005561232567 Loss at step 100: 0.054620370268821716 Loss at step 150: 0.051956068724393845 Loss at step 200: 0.03831140324473381 Loss at step 250: 0.0378035232424736 Loss at step 300: 0.04875868558883667 Loss at step 350: 0.03341371566057205 Loss at step 400: 0.05621539056301117 Loss at step 450: 0.06496856361627579 Loss at step 500: 0.03538235276937485 Loss at step 550: 0.03590003401041031 Loss at step 600: 0.04737596958875656 Loss at step 650: 0.04989689588546753 Loss at step 700: 0.0391291119158268 Loss at step 750: 0.03200911357998848 Loss at step 800: 0.03762365132570267 Loss at step 850: 0.03629754111170769 Loss at step 900: 0.046959009021520615 Mean training loss after epoch 208: 0.04091805286769038 EPOCH: 209 Loss at step 0: 0.03670303523540497 Loss at step 50: 0.032707370817661285 Loss at step 100: 0.02870992384850979 Loss at step 150: 0.035916294902563095 Loss at step 200: 0.03935522958636284 Loss at step 250: 0.030148180201649666 Loss at step 300: 0.035139184445142746 Loss at step 350: 0.03461229428648949 Loss at step 400: 0.046298567205667496 Loss at step 450: 0.036341067403554916 Loss at step 500: 0.043749138712882996 Loss at step 550: 0.0375540591776371 Loss at step 600: 0.03832840174436569 Loss at step 650: 0.049754347652196884 Loss at step 700: 0.06401419639587402 Loss at step 750: 0.03371839225292206 Loss at step 800: 0.09534450620412827 Loss at step 850: 0.048229265958070755 Loss at step 900: 0.03963363915681839 Mean training loss after epoch 209: 0.04082295423480811 EPOCH: 210 Loss at step 0: 0.03700472414493561 Loss at step 50: 0.038069579750299454 Loss at step 100: 0.045794859528541565 Loss at step 150: 0.05461961403489113 Loss at step 200: 0.050047628581523895 Loss at step 250: 0.029595399275422096 Loss at step 300: 0.05655821040272713 Loss at step 350: 0.053081512451171875 Loss at step 400: 0.04027143865823746 Loss at step 450: 0.03776984661817551 Loss at step 500: 0.03503318503499031 Loss at step 550: 0.045148659497499466 Loss at step 600: 0.04900876805186272 Loss at step 650: 0.05401453748345375 Loss at step 700: 0.03438520058989525 Loss at step 750: 0.03253091499209404 Loss at step 800: 0.05371645465493202 Loss at step 850: 0.029090404510498047 Loss at step 900: 0.03355353698134422 Mean training loss after epoch 210: 0.040992089714020935 EPOCH: 211 Loss at step 0: 0.03264230117201805 Loss at step 50: 0.049675457179546356 Loss at step 100: 0.03588966280221939 Loss at step 150: 0.03382323682308197 Loss at step 200: 0.04609042778611183 Loss at step 250: 0.038608118891716 Loss at step 300: 0.028927184641361237 Loss at step 350: 0.05451912805438042 Loss at step 400: 0.03432918339967728 Loss at step 450: 0.0514654815196991 Loss at step 500: 0.03658546134829521 Loss at step 550: 0.040894895792007446 Loss at step 600: 0.06478168815374374 Loss at step 650: 0.03764785826206207 Loss at step 700: 0.03644264489412308 Loss at step 750: 0.03765944018959999 Loss at step 800: 0.035398706793785095 Loss at step 850: 0.03518440201878548 Loss at step 900: 0.035572197288274765 Mean training loss after epoch 211: 0.04080253048762202 EPOCH: 212 Loss at step 0: 0.03544224053621292 Loss at step 50: 0.02887314185500145 Loss at step 100: 0.031207507476210594 Loss at step 150: 0.030686045065522194 Loss at step 200: 0.02721082791686058 Loss at step 250: 0.04124588891863823 Loss at step 300: 0.03821634128689766 Loss at step 350: 0.04851790890097618 Loss at step 400: 0.03651542589068413 Loss at step 450: 0.03496086224913597 Loss at step 500: 0.034377869218587875 Loss at step 550: 0.031918007880449295 Loss at step 600: 0.0316789485514164 Loss at step 650: 0.03203672170639038 Loss at step 700: 0.03798976540565491 Loss at step 750: 0.032948777079582214 Loss at step 800: 0.04611369967460632 Loss at step 850: 0.03311717510223389 Loss at step 900: 0.04931173473596573 Mean training loss after epoch 212: 0.041240211911817225 EPOCH: 213 Loss at step 0: 0.036282576620578766 Loss at step 50: 0.04555473476648331 Loss at step 100: 0.05341045930981636 Loss at step 150: 0.03163037449121475 Loss at step 200: 0.04202600196003914 Loss at step 250: 0.028469834476709366 Loss at step 300: 0.032680150121450424 Loss at step 350: 0.03266773745417595 Loss at step 400: 0.03356439247727394 Loss at step 450: 0.039978548884391785 Loss at step 500: 0.034787192940711975 Loss at step 550: 0.038250215351581573 Loss at step 600: 0.03187960386276245 Loss at step 650: 0.03330293297767639 Loss at step 700: 0.05375072732567787 Loss at step 750: 0.036330875009298325 Loss at step 800: 0.04288564622402191 Loss at step 850: 0.03796966373920441 Loss at step 900: 0.039988789707422256 Mean training loss after epoch 213: 0.041377121014699245 EPOCH: 214 Loss at step 0: 0.03678497299551964 Loss at step 50: 0.041952233761548996 Loss at step 100: 0.03371589630842209 Loss at step 150: 0.05489826574921608 Loss at step 200: 0.03933741897344589 Loss at step 250: 0.03126215934753418 Loss at step 300: 0.04887827858328819 Loss at step 350: 0.04601048305630684 Loss at step 400: 0.03451801836490631 Loss at step 450: 0.04811180755496025 Loss at step 500: 0.033375758677721024 Loss at step 550: 0.03335057571530342 Loss at step 600: 0.03660382330417633 Loss at step 650: 0.030024170875549316 Loss at step 700: 0.08038553595542908 Loss at step 750: 0.05596690624952316 Loss at step 800: 0.041750915348529816 Loss at step 850: 0.0345877967774868 Loss at step 900: 0.03914251551032066 Mean training loss after epoch 214: 0.04040251136906366 EPOCH: 215 Loss at step 0: 0.03518815338611603 Loss at step 50: 0.03783980756998062 Loss at step 100: 0.03582343831658363 Loss at step 150: 0.03563719615340233 Loss at step 200: 0.03279443085193634 Loss at step 250: 0.038114920258522034 Loss at step 300: 0.030715307220816612 Loss at step 350: 0.054344769567251205 Loss at step 400: 0.040084633976221085 Loss at step 450: 0.05290379747748375 Loss at step 500: 0.05472366884350777 Loss at step 550: 0.03370008245110512 Loss at step 600: 0.04700184613466263 Loss at step 650: 0.05084618180990219 Loss at step 700: 0.05089908093214035 Loss at step 750: 0.038635145872831345 Loss at step 800: 0.033889152109622955 Loss at step 850: 0.036383986473083496 Loss at step 900: 0.05539562553167343 Mean training loss after epoch 215: 0.04109134432126973 EPOCH: 216 Loss at step 0: 0.03574740141630173 Loss at step 50: 0.04133062809705734 Loss at step 100: 0.034683309495449066 Loss at step 150: 0.036068178713321686 Loss at step 200: 0.037907298654317856 Loss at step 250: 0.06637134402990341 Loss at step 300: 0.03382106497883797 Loss at step 350: 0.04029597342014313 Loss at step 400: 0.055082645267248154 Loss at step 450: 0.04917321354150772 Loss at step 500: 0.041472189128398895 Loss at step 550: 0.03399905562400818 Loss at step 600: 0.036293838173151016 Loss at step 650: 0.0507781058549881 Loss at step 700: 0.032008301466703415 Loss at step 750: 0.03810732439160347 Loss at step 800: 0.03786887601017952 Loss at step 850: 0.05404569208621979 Loss at step 900: 0.03898085653781891 Mean training loss after epoch 216: 0.0413106051164427 EPOCH: 217 Loss at step 0: 0.05661312863230705 Loss at step 50: 0.02945994772017002 Loss at step 100: 0.06003616005182266 Loss at step 150: 0.03934501111507416 Loss at step 200: 0.04473843798041344 Loss at step 250: 0.0633082389831543 Loss at step 300: 0.03692268952727318 Loss at step 350: 0.028801854699850082 Loss at step 400: 0.035873860120773315 Loss at step 450: 0.03431267663836479 Loss at step 500: 0.039170462638139725 Loss at step 550: 0.04494684562087059 Loss at step 600: 0.04163341969251633 Loss at step 650: 0.05725857987999916 Loss at step 700: 0.042347654700279236 Loss at step 750: 0.03535016253590584 Loss at step 800: 0.03905250504612923 Loss at step 850: 0.058260444551706314 Loss at step 900: 0.031771834939718246 Mean training loss after epoch 217: 0.04060951891198341 EPOCH: 218 Loss at step 0: 0.02915436588227749 Loss at step 50: 0.03229684755206108 Loss at step 100: 0.036171264946460724 Loss at step 150: 0.03890503942966461 Loss at step 200: 0.03237678110599518 Loss at step 250: 0.04330482706427574 Loss at step 300: 0.03708843141794205 Loss at step 350: 0.03246277570724487 Loss at step 400: 0.042354777455329895 Loss at step 450: 0.03352414071559906 Loss at step 500: 0.05508609488606453 Loss at step 550: 0.044320281594991684 Loss at step 600: 0.030840106308460236 Loss at step 650: 0.03831968829035759 Loss at step 700: 0.02537611499428749 Loss at step 750: 0.04614359512925148 Loss at step 800: 0.037960946559906006 Loss at step 850: 0.047622259706258774 Loss at step 900: 0.03617832437157631 Mean training loss after epoch 218: 0.04028225765188238 EPOCH: 219 Loss at step 0: 0.036237385123968124 Loss at step 50: 0.053000420331954956 Loss at step 100: 0.045414019376039505 Loss at step 150: 0.04280359670519829 Loss at step 200: 0.050488777458667755 Loss at step 250: 0.03622067719697952 Loss at step 300: 0.04424312710762024 Loss at step 350: 0.07010378688573837 Loss at step 400: 0.033997781574726105 Loss at step 450: 0.041352324187755585 Loss at step 500: 0.0505574569106102 Loss at step 550: 0.04738761484622955 Loss at step 600: 0.032967064529657364 Loss at step 650: 0.036979660391807556 Loss at step 700: 0.04071382060647011 Loss at step 750: 0.03467709943652153 Loss at step 800: 0.03316859155893326 Loss at step 850: 0.033415865153074265 Loss at step 900: 0.04367862641811371 Mean training loss after epoch 219: 0.04067866485327609 EPOCH: 220 Loss at step 0: 0.036585185676813126 Loss at step 50: 0.04451674595475197 Loss at step 100: 0.04751574993133545 Loss at step 150: 0.032734110951423645 Loss at step 200: 0.03369263559579849 Loss at step 250: 0.03185194730758667 Loss at step 300: 0.03537176921963692 Loss at step 350: 0.03204319626092911 Loss at step 400: 0.03527896851301193 Loss at step 450: 0.03393465653061867 Loss at step 500: 0.032165929675102234 Loss at step 550: 0.0340350866317749 Loss at step 600: 0.029470698907971382 Loss at step 650: 0.0314963161945343 Loss at step 700: 0.03525137901306152 Loss at step 750: 0.0354752279818058 Loss at step 800: 0.0589137002825737 Loss at step 850: 0.03656838834285736 Loss at step 900: 0.03468156233429909 Mean training loss after epoch 220: 0.04095931985040209 EPOCH: 221 Loss at step 0: 0.048009198158979416 Loss at step 50: 0.03965720161795616 Loss at step 100: 0.0327623188495636 Loss at step 150: 0.04059423506259918 Loss at step 200: 0.04813171923160553 Loss at step 250: 0.0363464280962944 Loss at step 300: 0.03229498118162155 Loss at step 350: 0.03783958777785301 Loss at step 400: 0.0393109992146492 Loss at step 450: 0.03206159919500351 Loss at step 500: 0.03763585165143013 Loss at step 550: 0.031863339245319366 Loss at step 600: 0.04703124612569809 Loss at step 650: 0.039514150470495224 Loss at step 700: 0.03863257169723511 Loss at step 750: 0.06303341686725616 Loss at step 800: 0.044238682836294174 Loss at step 850: 0.03570760041475296 Loss at step 900: 0.04780176654458046 Mean training loss after epoch 221: 0.04102550619748482 EPOCH: 222 Loss at step 0: 0.03219601884484291 Loss at step 50: 0.03331765905022621 Loss at step 100: 0.03110005334019661 Loss at step 150: 0.03601983189582825 Loss at step 200: 0.03793782368302345 Loss at step 250: 0.06809261441230774 Loss at step 300: 0.033426880836486816 Loss at step 350: 0.05162347853183746 Loss at step 400: 0.040187105536460876 Loss at step 450: 0.031734976917505264 Loss at step 500: 0.02812698855996132 Loss at step 550: 0.0368264801800251 Loss at step 600: 0.03218604624271393 Loss at step 650: 0.0666574090719223 Loss at step 700: 0.031853388994932175 Loss at step 750: 0.031155046075582504 Loss at step 800: 0.041822850704193115 Loss at step 850: 0.03040562942624092 Loss at step 900: 0.03476181626319885 Mean training loss after epoch 222: 0.040585690670446165 EPOCH: 223 Loss at step 0: 0.05190090090036392 Loss at step 50: 0.033086225390434265 Loss at step 100: 0.03847149759531021 Loss at step 150: 0.02937181480228901 Loss at step 200: 0.031451307237148285 Loss at step 250: 0.049538251012563705 Loss at step 300: 0.03875332325696945 Loss at step 350: 0.03406327962875366 Loss at step 400: 0.03891347348690033 Loss at step 450: 0.05313325300812721 Loss at step 500: 0.029113050550222397 Loss at step 550: 0.04073594510555267 Loss at step 600: 0.030082348734140396 Loss at step 650: 0.03769651800394058 Loss at step 700: 0.034968018531799316 Loss at step 750: 0.04075436294078827 Loss at step 800: 0.03300822526216507 Loss at step 850: 0.0325869619846344 Loss at step 900: 0.05560174211859703 Mean training loss after epoch 223: 0.04131811408242628 EPOCH: 224 Loss at step 0: 0.038598671555519104 Loss at step 50: 0.03439020738005638 Loss at step 100: 0.036015018820762634 Loss at step 150: 0.03827633708715439 Loss at step 200: 0.05665145069360733 Loss at step 250: 0.03604110702872276 Loss at step 300: 0.03394341096282005 Loss at step 350: 0.04257431626319885 Loss at step 400: 0.04357015714049339 Loss at step 450: 0.03115519881248474 Loss at step 500: 0.037695907056331635 Loss at step 550: 0.0394899919629097 Loss at step 600: 0.03302190080285072 Loss at step 650: 0.038717836141586304 Loss at step 700: 0.03684534877538681 Loss at step 750: 0.03798103332519531 Loss at step 800: 0.03686536103487015 Loss at step 850: 0.038721222430467606 Loss at step 900: 0.03423561900854111 Mean training loss after epoch 224: 0.040887425991216066 EPOCH: 225 Loss at step 0: 0.0350559838116169 Loss at step 50: 0.05149857699871063 Loss at step 100: 0.03987925872206688 Loss at step 150: 0.03651684895157814 Loss at step 200: 0.03998488560318947 Loss at step 250: 0.05090925097465515 Loss at step 300: 0.03307835012674332 Loss at step 350: 0.03626140579581261 Loss at step 400: 0.036097023636102676 Loss at step 450: 0.035245321691036224 Loss at step 500: 0.03443649783730507 Loss at step 550: 0.04304465278983116 Loss at step 600: 0.034231144934892654 Loss at step 650: 0.04270724952220917 Loss at step 700: 0.03977229446172714 Loss at step 750: 0.04652891308069229 Loss at step 800: 0.03666549548506737 Loss at step 850: 0.035559654235839844 Loss at step 900: 0.036141537129879 Mean training loss after epoch 225: 0.04059989765278502 EPOCH: 226 Loss at step 0: 0.041318703442811966 Loss at step 50: 0.04630459472537041 Loss at step 100: 0.051989611238241196 Loss at step 150: 0.04431668668985367 Loss at step 200: 0.038641273975372314 Loss at step 250: 0.03931621089577675 Loss at step 300: 0.0440056174993515 Loss at step 350: 0.06051286682486534 Loss at step 400: 0.038130324333906174 Loss at step 450: 0.03695256635546684 Loss at step 500: 0.032698579132556915 Loss at step 550: 0.03373121842741966 Loss at step 600: 0.03621514514088631 Loss at step 650: 0.040953610092401505 Loss at step 700: 0.03880661353468895 Loss at step 750: 0.03910220041871071 Loss at step 800: 0.042733803391456604 Loss at step 850: 0.05858441814780235 Loss at step 900: 0.048480771481990814 Mean training loss after epoch 226: 0.04105129703950844 EPOCH: 227 Loss at step 0: 0.03476101532578468 Loss at step 50: 0.043932873755693436 Loss at step 100: 0.04377767816185951 Loss at step 150: 0.03412061557173729 Loss at step 200: 0.032839011400938034 Loss at step 250: 0.03314913809299469 Loss at step 300: 0.047224752604961395 Loss at step 350: 0.03201420605182648 Loss at step 400: 0.042330123484134674 Loss at step 450: 0.035332586616277695 Loss at step 500: 0.03097502887248993 Loss at step 550: 0.05088911950588226 Loss at step 600: 0.051642902195453644 Loss at step 650: 0.03411383926868439 Loss at step 700: 0.030035002157092094 Loss at step 750: 0.03293563798069954 Loss at step 800: 0.03734913468360901 Loss at step 850: 0.03618234768509865 Loss at step 900: 0.05087332800030708 Mean training loss after epoch 227: 0.04022779434061508 EPOCH: 228 Loss at step 0: 0.04110071063041687 Loss at step 50: 0.032724037766456604 Loss at step 100: 0.03459864482283592 Loss at step 150: 0.042574286460876465 Loss at step 200: 0.035179343074560165 Loss at step 250: 0.0490020252764225 Loss at step 300: 0.05132973939180374 Loss at step 350: 0.03468789532780647 Loss at step 400: 0.028886545449495316 Loss at step 450: 0.03913021460175514 Loss at step 500: 0.05317218229174614 Loss at step 550: 0.03887658938765526 Loss at step 600: 0.05439882352948189 Loss at step 650: 0.03609241172671318 Loss at step 700: 0.029882274568080902 Loss at step 750: 0.034023791551589966 Loss at step 800: 0.030449653044342995 Loss at step 850: 0.034879419952631 Loss at step 900: 0.048086121678352356 Mean training loss after epoch 228: 0.04066776138331209 EPOCH: 229 Loss at step 0: 0.04594782367348671 Loss at step 50: 0.03350931033492088 Loss at step 100: 0.03459792584180832 Loss at step 150: 0.034039516001939774 Loss at step 200: 0.039268556982278824 Loss at step 250: 0.04344497621059418 Loss at step 300: 0.033134061843156815 Loss at step 350: 0.05308954045176506 Loss at step 400: 0.030663713812828064 Loss at step 450: 0.05199190974235535 Loss at step 500: 0.06791495531797409 Loss at step 550: 0.03276384621858597 Loss at step 600: 0.05020755156874657 Loss at step 650: 0.04773669317364693 Loss at step 700: 0.03743778541684151 Loss at step 750: 0.04412699490785599 Loss at step 800: 0.03982185572385788 Loss at step 850: 0.028231678530573845 Loss at step 900: 0.040991488844156265 Mean training loss after epoch 229: 0.04034455153130011 EPOCH: 230 Loss at step 0: 0.07370340079069138 Loss at step 50: 0.04149540886282921 Loss at step 100: 0.04249609634280205 Loss at step 150: 0.04175019636750221 Loss at step 200: 0.03379853442311287 Loss at step 250: 0.045653898268938065 Loss at step 300: 0.034466374665498734 Loss at step 350: 0.02741246670484543 Loss at step 400: 0.045745011419057846 Loss at step 450: 0.06563558429479599 Loss at step 500: 0.04089518263936043 Loss at step 550: 0.035255882889032364 Loss at step 600: 0.037620749324560165 Loss at step 650: 0.040671128779649734 Loss at step 700: 0.032700713723897934 Loss at step 750: 0.03918527439236641 Loss at step 800: 0.040701042860746384 Loss at step 850: 0.02955266274511814 Loss at step 900: 0.03290962800383568 Mean training loss after epoch 230: 0.04049838393894848 EPOCH: 231 Loss at step 0: 0.03640205040574074 Loss at step 50: 0.03952917084097862 Loss at step 100: 0.03547775000333786 Loss at step 150: 0.031360697001218796 Loss at step 200: 0.06404799222946167 Loss at step 250: 0.02789343148469925 Loss at step 300: 0.03425315022468567 Loss at step 350: 0.039719726890325546 Loss at step 400: 0.033137597143650055 Loss at step 450: 0.053653739392757416 Loss at step 500: 0.033349234610795975 Loss at step 550: 0.05908425897359848 Loss at step 600: 0.04200682416558266 Loss at step 650: 0.03911254182457924 Loss at step 700: 0.04086443781852722 Loss at step 750: 0.03599826991558075 Loss at step 800: 0.05697152763605118 Loss at step 850: 0.0363885723054409 Loss at step 900: 0.05578959733247757 Mean training loss after epoch 231: 0.04121733449701307 EPOCH: 232 Loss at step 0: 0.046681761741638184 Loss at step 50: 0.03475075215101242 Loss at step 100: 0.05200602859258652 Loss at step 150: 0.04910620301961899 Loss at step 200: 0.04948532581329346 Loss at step 250: 0.03690744936466217 Loss at step 300: 0.05005256086587906 Loss at step 350: 0.03267233073711395 Loss at step 400: 0.05046040564775467 Loss at step 450: 0.03491542115807533 Loss at step 500: 0.03435011953115463 Loss at step 550: 0.03899304196238518 Loss at step 600: 0.059160251170396805 Loss at step 650: 0.04317529872059822 Loss at step 700: 0.029207885265350342 Loss at step 750: 0.03262770548462868 Loss at step 800: 0.028827568516135216 Loss at step 850: 0.03712737560272217 Loss at step 900: 0.034314170479774475 Mean training loss after epoch 232: 0.04035682764563606 EPOCH: 233 Loss at step 0: 0.03284892067313194 Loss at step 50: 0.049067527055740356 Loss at step 100: 0.04263941943645477 Loss at step 150: 0.03516695275902748 Loss at step 200: 0.03370160236954689 Loss at step 250: 0.03362370282411575 Loss at step 300: 0.043473053723573685 Loss at step 350: 0.03507721796631813 Loss at step 400: 0.04045109450817108 Loss at step 450: 0.04194480553269386 Loss at step 500: 0.03533252701163292 Loss at step 550: 0.028089255094528198 Loss at step 600: 0.05269410461187363 Loss at step 650: 0.05129024758934975 Loss at step 700: 0.05418984219431877 Loss at step 750: 0.04270755872130394 Loss at step 800: 0.03565696254372597 Loss at step 850: 0.03896743059158325 Loss at step 900: 0.03765369951725006 Mean training loss after epoch 233: 0.04035992385632892 EPOCH: 234 Loss at step 0: 0.032117605209350586 Loss at step 50: 0.04209525138139725 Loss at step 100: 0.04184500128030777 Loss at step 150: 0.03609657660126686 Loss at step 200: 0.04544885829091072 Loss at step 250: 0.03819899633526802 Loss at step 300: 0.03512510284781456 Loss at step 350: 0.030431503430008888 Loss at step 400: 0.03644801676273346 Loss at step 450: 0.038362495601177216 Loss at step 500: 0.05434475839138031 Loss at step 550: 0.033140186220407486 Loss at step 600: 0.03765227273106575 Loss at step 650: 0.031903404742479324 Loss at step 700: 0.04018140211701393 Loss at step 750: 0.047592416405677795 Loss at step 800: 0.030580997467041016 Loss at step 850: 0.03485675901174545 Loss at step 900: 0.06050544232130051 Mean training loss after epoch 234: 0.04079634596162768 EPOCH: 235 Loss at step 0: 0.04100864753127098 Loss at step 50: 0.045375462621450424 Loss at step 100: 0.03317685425281525 Loss at step 150: 0.038317758589982986 Loss at step 200: 0.03303635120391846 Loss at step 250: 0.03378681465983391 Loss at step 300: 0.03232065215706825 Loss at step 350: 0.042727887630462646 Loss at step 400: 0.03047262318432331 Loss at step 450: 0.0421324148774147 Loss at step 500: 0.0328185148537159 Loss at step 550: 0.035245370119810104 Loss at step 600: 0.048470910638570786 Loss at step 650: 0.0406319685280323 Loss at step 700: 0.040761057287454605 Loss at step 750: 0.03468814864754677 Loss at step 800: 0.05839885026216507 Loss at step 850: 0.03739485517144203 Loss at step 900: 0.030294056981801987 Mean training loss after epoch 235: 0.040335652350918696 EPOCH: 236 Loss at step 0: 0.05109997093677521 Loss at step 50: 0.03329673409461975 Loss at step 100: 0.03527875244617462 Loss at step 150: 0.03676069527864456 Loss at step 200: 0.054956935346126556 Loss at step 250: 0.03587140142917633 Loss at step 300: 0.05181976035237312 Loss at step 350: 0.03496068716049194 Loss at step 400: 0.03678620606660843 Loss at step 450: 0.05036969110369682 Loss at step 500: 0.052894409745931625 Loss at step 550: 0.029731588438153267 Loss at step 600: 0.0547807402908802 Loss at step 650: 0.032831307500600815 Loss at step 700: 0.05393674224615097 Loss at step 750: 0.035394664853811264 Loss at step 800: 0.03782997652888298 Loss at step 850: 0.039177678525447845 Loss at step 900: 0.03182186186313629 Mean training loss after epoch 236: 0.04128357183251744 EPOCH: 237 Loss at step 0: 0.033669646829366684 Loss at step 50: 0.03389575332403183 Loss at step 100: 0.030661456286907196 Loss at step 150: 0.05719205364584923 Loss at step 200: 0.05400202423334122 Loss at step 250: 0.054050132632255554 Loss at step 300: 0.03732125461101532 Loss at step 350: 0.030867664143443108 Loss at step 400: 0.03047124668955803 Loss at step 450: 0.027111828327178955 Loss at step 500: 0.030552463605999947 Loss at step 550: 0.041529107838869095 Loss at step 600: 0.03469231724739075 Loss at step 650: 0.02823128178715706 Loss at step 700: 0.033046361058950424 Loss at step 750: 0.04491984099149704 Loss at step 800: 0.035873882472515106 Loss at step 850: 0.03740305081009865 Loss at step 900: 0.05181649327278137 Mean training loss after epoch 237: 0.040515869788364814 EPOCH: 238 Loss at step 0: 0.03407023474574089 Loss at step 50: 0.03716685622930527 Loss at step 100: 0.0499604269862175 Loss at step 150: 0.041235875338315964 Loss at step 200: 0.03628018870949745 Loss at step 250: 0.046335335820913315 Loss at step 300: 0.038733601570129395 Loss at step 350: 0.02899951860308647 Loss at step 400: 0.04672306776046753 Loss at step 450: 0.038131825625896454 Loss at step 500: 0.039655301719903946 Loss at step 550: 0.03420468792319298 Loss at step 600: 0.042391419410705566 Loss at step 650: 0.034384068101644516 Loss at step 700: 0.04296164587140083 Loss at step 750: 0.03288166597485542 Loss at step 800: 0.042064595967531204 Loss at step 850: 0.035995323210954666 Loss at step 900: 0.03905920311808586 Mean training loss after epoch 238: 0.04110643325218641 EPOCH: 239 Loss at step 0: 0.042766958475112915 Loss at step 50: 0.034336868673563004 Loss at step 100: 0.05833307281136513 Loss at step 150: 0.047431789338588715 Loss at step 200: 0.041414499282836914 Loss at step 250: 0.032134439796209335 Loss at step 300: 0.03527076542377472 Loss at step 350: 0.028329158201813698 Loss at step 400: 0.04356451332569122 Loss at step 450: 0.03153983876109123 Loss at step 500: 0.03495301306247711 Loss at step 550: 0.06634292006492615 Loss at step 600: 0.03570905327796936 Loss at step 650: 0.04344918578863144 Loss at step 700: 0.06957556307315826 Loss at step 750: 0.0414244681596756 Loss at step 800: 0.04601111263036728 Loss at step 850: 0.04992087930440903 Loss at step 900: 0.0490182563662529 Mean training loss after epoch 239: 0.04107364757991295 EPOCH: 240 Loss at step 0: 0.06585916131734848 Loss at step 50: 0.047516003251075745 Loss at step 100: 0.04976578429341316 Loss at step 150: 0.03511520475149155 Loss at step 200: 0.03421030938625336 Loss at step 250: 0.03885245323181152 Loss at step 300: 0.03844355791807175 Loss at step 350: 0.03810760751366615 Loss at step 400: 0.0391947403550148 Loss at step 450: 0.03753313049674034 Loss at step 500: 0.053552012890577316 Loss at step 550: 0.051457297056913376 Loss at step 600: 0.034328434616327286 Loss at step 650: 0.05210643634200096 Loss at step 700: 0.04153870418667793 Loss at step 750: 0.053673624992370605 Loss at step 800: 0.05263229086995125 Loss at step 850: 0.03452400118112564 Loss at step 900: 0.04052826762199402 Mean training loss after epoch 240: 0.040676945895909755 EPOCH: 241 Loss at step 0: 0.06603045016527176 Loss at step 50: 0.03743353486061096 Loss at step 100: 0.05328306928277016 Loss at step 150: 0.039216965436935425 Loss at step 200: 0.06239703670144081 Loss at step 250: 0.03253446891903877 Loss at step 300: 0.05133199691772461 Loss at step 350: 0.027969395741820335 Loss at step 400: 0.04014642909169197 Loss at step 450: 0.04667861387133598 Loss at step 500: 0.032404977828264236 Loss at step 550: 0.052615776658058167 Loss at step 600: 0.05234615132212639 Loss at step 650: 0.03000178560614586 Loss at step 700: 0.049136750400066376 Loss at step 750: 0.03488517552614212 Loss at step 800: 0.032697007060050964 Loss at step 850: 0.04658398777246475 Loss at step 900: 0.03619987890124321 Mean training loss after epoch 241: 0.04073687086997828 EPOCH: 242 Loss at step 0: 0.08837753534317017 Loss at step 50: 0.03601660579442978 Loss at step 100: 0.05402151122689247 Loss at step 150: 0.04023151472210884 Loss at step 200: 0.03392839431762695 Loss at step 250: 0.06174050271511078 Loss at step 300: 0.03861714154481888 Loss at step 350: 0.0401429645717144 Loss at step 400: 0.048623085021972656 Loss at step 450: 0.031068669632077217 Loss at step 500: 0.044573377817869186 Loss at step 550: 0.05028491094708443 Loss at step 600: 0.03808924928307533 Loss at step 650: 0.04322211071848869 Loss at step 700: 0.04736390337347984 Loss at step 750: 0.03251202404499054 Loss at step 800: 0.027967162430286407 Loss at step 850: 0.03763602301478386 Loss at step 900: 0.029998404905200005 Mean training loss after epoch 242: 0.04054957908838352 EPOCH: 243 Loss at step 0: 0.05490536615252495 Loss at step 50: 0.049660008400678635 Loss at step 100: 0.03286217898130417 Loss at step 150: 0.036375973373651505 Loss at step 200: 0.03979400172829628 Loss at step 250: 0.03045588545501232 Loss at step 300: 0.03500024974346161 Loss at step 350: 0.05770951136946678 Loss at step 400: 0.03377716988325119 Loss at step 450: 0.037807807326316833 Loss at step 500: 0.039293888956308365 Loss at step 550: 0.03406384587287903 Loss at step 600: 0.05588801950216293 Loss at step 650: 0.03701205551624298 Loss at step 700: 0.0533134788274765 Loss at step 750: 0.04013683646917343 Loss at step 800: 0.030643966048955917 Loss at step 850: 0.031996432691812515 Loss at step 900: 0.03502169996500015 Mean training loss after epoch 243: 0.04062247800547431 EPOCH: 244 Loss at step 0: 0.03444995731115341 Loss at step 50: 0.036059148609638214 Loss at step 100: 0.0301399864256382 Loss at step 150: 0.028246015310287476 Loss at step 200: 0.03636842966079712 Loss at step 250: 0.03803431987762451 Loss at step 300: 0.03876959905028343 Loss at step 350: 0.051011521369218826 Loss at step 400: 0.0383102111518383 Loss at step 450: 0.030653782188892365 Loss at step 500: 0.04666541516780853 Loss at step 550: 0.033463481813669205 Loss at step 600: 0.03481241688132286 Loss at step 650: 0.03974812477827072 Loss at step 700: 0.0426596999168396 Loss at step 750: 0.029317786917090416 Loss at step 800: 0.04094095155596733 Loss at step 850: 0.03516566380858421 Loss at step 900: 0.03830796852707863 Mean training loss after epoch 244: 0.04025751948038907 EPOCH: 245 Loss at step 0: 0.05015711113810539 Loss at step 50: 0.03536130487918854 Loss at step 100: 0.03635382652282715 Loss at step 150: 0.05867026746273041 Loss at step 200: 0.031406860798597336 Loss at step 250: 0.03355463221669197 Loss at step 300: 0.033481843769550323 Loss at step 350: 0.052395060658454895 Loss at step 400: 0.046079665422439575 Loss at step 450: 0.03515570983290672 Loss at step 500: 0.028048772364854813 Loss at step 550: 0.04184780269861221 Loss at step 600: 0.025456659495830536 Loss at step 650: 0.03420046716928482 Loss at step 700: 0.0326826386153698 Loss at step 750: 0.03399045020341873 Loss at step 800: 0.03406073525547981 Loss at step 850: 0.040993593633174896 Loss at step 900: 0.033859625458717346 Mean training loss after epoch 245: 0.04099307534719772 EPOCH: 246 Loss at step 0: 0.03609120100736618 Loss at step 50: 0.03703043982386589 Loss at step 100: 0.0365465022623539 Loss at step 150: 0.032843463122844696 Loss at step 200: 0.03396597504615784 Loss at step 250: 0.03837744519114494 Loss at step 300: 0.04299570620059967 Loss at step 350: 0.035457078367471695 Loss at step 400: 0.032963745296001434 Loss at step 450: 0.033488914370536804 Loss at step 500: 0.03309255093336105 Loss at step 550: 0.032369352877140045 Loss at step 600: 0.03163677826523781 Loss at step 650: 0.05191655829548836 Loss at step 700: 0.048612914979457855 Loss at step 750: 0.04112313687801361 Loss at step 800: 0.03423643857240677 Loss at step 850: 0.03122253715991974 Loss at step 900: 0.061019983142614365 Mean training loss after epoch 246: 0.040807000127062995 EPOCH: 247 Loss at step 0: 0.030693301931023598 Loss at step 50: 0.03418082371354103 Loss at step 100: 0.030874159187078476 Loss at step 150: 0.0276369359344244 Loss at step 200: 0.034766342490911484 Loss at step 250: 0.035995710641145706 Loss at step 300: 0.05377712473273277 Loss at step 350: 0.029132088646292686 Loss at step 400: 0.033872269093990326 Loss at step 450: 0.03130783140659332 Loss at step 500: 0.03842512518167496 Loss at step 550: 0.06892840564250946 Loss at step 600: 0.030725587159395218 Loss at step 650: 0.05028015747666359 Loss at step 700: 0.04937463626265526 Loss at step 750: 0.03966902196407318 Loss at step 800: 0.046255528926849365 Loss at step 850: 0.033612992614507675 Loss at step 900: 0.04044083133339882 Mean training loss after epoch 247: 0.040103690728529304 EPOCH: 248 Loss at step 0: 0.022905169054865837 Loss at step 50: 0.02825915440917015 Loss at step 100: 0.0354083776473999 Loss at step 150: 0.055598869919776917 Loss at step 200: 0.03948677331209183 Loss at step 250: 0.03647748380899429 Loss at step 300: 0.043860580772161484 Loss at step 350: 0.05674099177122116 Loss at step 400: 0.046612389385700226 Loss at step 450: 0.029206717386841774 Loss at step 500: 0.03615832328796387 Loss at step 550: 0.04181687906384468 Loss at step 600: 0.052520204335451126 Loss at step 650: 0.032829415053129196 Loss at step 700: 0.03388149291276932 Loss at step 750: 0.03407541662454605 Loss at step 800: 0.032868679612874985 Loss at step 850: 0.03385961800813675 Loss at step 900: 0.037560608237981796 Mean training loss after epoch 248: 0.04033867702650617 EPOCH: 249 Loss at step 0: 0.03196587786078453 Loss at step 50: 0.03017467074096203 Loss at step 100: 0.04060886800289154 Loss at step 150: 0.05703726038336754 Loss at step 200: 0.039137259125709534 Loss at step 250: 0.037107184529304504 Loss at step 300: 0.03888140991330147 Loss at step 350: 0.037356168031692505 Loss at step 400: 0.03527894243597984 Loss at step 450: 0.03414120525121689 Loss at step 500: 0.038231149315834045 Loss at step 550: 0.05978032574057579 Loss at step 600: 0.03419792279601097 Loss at step 650: 0.04076605290174484 Loss at step 700: 0.03359724208712578 Loss at step 750: 0.03526807576417923 Loss at step 800: 0.0678853839635849 Loss at step 850: 0.03583000972867012 Loss at step 900: 0.04986990988254547 Mean training loss after epoch 249: 0.04056774090522769 EPOCH: 250 Loss at step 0: 0.03321145102381706 Loss at step 50: 0.036163344979286194 Loss at step 100: 0.06365063786506653 Loss at step 150: 0.043113335967063904 Loss at step 200: 0.03192630410194397 Loss at step 250: 0.061995044350624084 Loss at step 300: 0.031983382999897 Loss at step 350: 0.03961939364671707 Loss at step 400: 0.0464235283434391 Loss at step 450: 0.03585807606577873 Loss at step 500: 0.037509240210056305 Loss at step 550: 0.03257523104548454 Loss at step 600: 0.03942615166306496 Loss at step 650: 0.036237914115190506 Loss at step 700: 0.04845541715621948 Loss at step 750: 0.035197626799345016 Loss at step 800: 0.0368398055434227 Loss at step 850: 0.03899800404906273 Loss at step 900: 0.037544772028923035 Mean training loss after epoch 250: 0.04045310073188627 EPOCH: 251 Loss at step 0: 0.037194546312093735 Loss at step 50: 0.03993240371346474 Loss at step 100: 0.03463948890566826 Loss at step 150: 0.05161776766180992 Loss at step 200: 0.03141540661454201 Loss at step 250: 0.0454881489276886 Loss at step 300: 0.06912378966808319 Loss at step 350: 0.03287580981850624 Loss at step 400: 0.04460114613175392 Loss at step 450: 0.03356293588876724 Loss at step 500: 0.03255254030227661 Loss at step 550: 0.05133015662431717 Loss at step 600: 0.06959903985261917 Loss at step 650: 0.034467048943042755 Loss at step 700: 0.038957349956035614 Loss at step 750: 0.05736846104264259 Loss at step 800: 0.06558053195476532 Loss at step 850: 0.04637990519404411 Loss at step 900: 0.03879043459892273 Mean training loss after epoch 251: 0.04082436600465701 EPOCH: 252 Loss at step 0: 0.03219963610172272 Loss at step 50: 0.05536222830414772 Loss at step 100: 0.0331990011036396 Loss at step 150: 0.041927989572286606 Loss at step 200: 0.03609161451458931 Loss at step 250: 0.028216199949383736 Loss at step 300: 0.02834271267056465 Loss at step 350: 0.03645806014537811 Loss at step 400: 0.033808644860982895 Loss at step 450: 0.06627167016267776 Loss at step 500: 0.040605492889881134 Loss at step 550: 0.03351821377873421 Loss at step 600: 0.034697841852903366 Loss at step 650: 0.06765446066856384 Loss at step 700: 0.0329233780503273 Loss at step 750: 0.04380008950829506 Loss at step 800: 0.040970005095005035 Loss at step 850: 0.03628602251410484 Loss at step 900: 0.040875885635614395 Mean training loss after epoch 252: 0.04052129749860019 EPOCH: 253 Loss at step 0: 0.031450606882572174 Loss at step 50: 0.039175134152173996 Loss at step 100: 0.03360241651535034 Loss at step 150: 0.033307407051324844 Loss at step 200: 0.03203302249312401 Loss at step 250: 0.06097416952252388 Loss at step 300: 0.052457477897405624 Loss at step 350: 0.037789829075336456 Loss at step 400: 0.035727884620428085 Loss at step 450: 0.03333180770277977 Loss at step 500: 0.035464078187942505 Loss at step 550: 0.053760211914777756 Loss at step 600: 0.044210035353899 Loss at step 650: 0.03467017784714699 Loss at step 700: 0.033193401992321014 Loss at step 750: 0.032985176891088486 Loss at step 800: 0.02771451510488987 Loss at step 850: 0.03529129549860954 Loss at step 900: 0.030993495136499405 Mean training loss after epoch 253: 0.04094931620286345 EPOCH: 254 Loss at step 0: 0.03597330302000046 Loss at step 50: 0.035820286720991135 Loss at step 100: 0.06784122437238693 Loss at step 150: 0.04478087276220322 Loss at step 200: 0.03867571800947189 Loss at step 250: 0.05590273439884186 Loss at step 300: 0.036192916333675385 Loss at step 350: 0.03621829301118851 Loss at step 400: 0.04097563773393631 Loss at step 450: 0.030006734654307365 Loss at step 500: 0.03598341718316078 Loss at step 550: 0.030993429943919182 Loss at step 600: 0.03479975834488869 Loss at step 650: 0.049355681985616684 Loss at step 700: 0.037697046995162964 Loss at step 750: 0.03279855102300644 Loss at step 800: 0.047051846981048584 Loss at step 850: 0.050114840269088745 Loss at step 900: 0.035594236105680466 Mean training loss after epoch 254: 0.041024236963676616 EPOCH: 255 Loss at step 0: 0.036270253360271454 Loss at step 50: 0.027531743049621582 Loss at step 100: 0.0329333171248436 Loss at step 150: 0.028931349515914917 Loss at step 200: 0.045733362436294556 Loss at step 250: 0.02787296660244465 Loss at step 300: 0.03179275617003441 Loss at step 350: 0.042607054114341736 Loss at step 400: 0.04592496529221535 Loss at step 450: 0.045363813638687134 Loss at step 500: 0.031503453850746155 Loss at step 550: 0.048280421644449234 Loss at step 600: 0.03866904601454735 Loss at step 650: 0.03599901497364044 Loss at step 700: 0.03498953580856323 Loss at step 750: 0.036033205687999725 Loss at step 800: 0.06665211915969849 Loss at step 850: 0.03595655411481857 Loss at step 900: 0.06520474702119827 Mean training loss after epoch 255: 0.04107395446360874 EPOCH: 256 Loss at step 0: 0.03461553156375885 Loss at step 50: 0.05570172145962715 Loss at step 100: 0.03918387368321419 Loss at step 150: 0.03947175294160843 Loss at step 200: 0.05647904425859451 Loss at step 250: 0.03421878442168236 Loss at step 300: 0.046404361724853516 Loss at step 350: 0.03359692543745041 Loss at step 400: 0.05041421949863434 Loss at step 450: 0.041159775108098984 Loss at step 500: 0.048340387642383575 Loss at step 550: 0.04249272868037224 Loss at step 600: 0.0327446348965168 Loss at step 650: 0.03699032589793205 Loss at step 700: 0.03066154755651951 Loss at step 750: 0.03852835297584534 Loss at step 800: 0.032849788665771484 Loss at step 850: 0.032775960862636566 Loss at step 900: 0.027619091793894768 Mean training loss after epoch 256: 0.04033945776077349 /athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/athenahomes/gabrijel/miniconda3/envs/track-generator/lib/python3.11/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source? warn( Schedule: cosine Cfg: True Output path: /scratch/shared/beegfs/gabrijel/m2l/mini Patch Size: 2 Device: cuda:2 ===================================================================================== Layer (type:depth-idx) Param # ===================================================================================== DiT 75,264 ├─PatchEmbed: 1-1 -- │ └─Conv2d: 2-1 1,920 ├─TimestepEmbedder: 1-2 -- │ └─Mlp: 2-2 -- │ │ └─Linear: 3-1 98,688 │ │ └─SiLU: 3-2 -- │ │ └─Linear: 3-3 147,840 ├─LabelEmbedder: 1-3 -- │ └─Embedding: 2-3 4,224 ├─ModuleList: 1-4 -- │ └─DiTBlock: 2-4 -- │ │ └─LayerNorm: 3-4 -- │ │ └─MultiheadAttention: 3-5 591,360 │ │ └─LayerNorm: 3-6 -- │ │ └─Mlp: 3-7 1,181,568 │ │ └─Sequential: 3-8 887,040 │ └─DiTBlock: 2-5 -- │ │ └─LayerNorm: 3-9 -- │ │ └─MultiheadAttention: 3-10 591,360 │ │ └─LayerNorm: 3-11 -- │ │ └─Mlp: 3-12 1,181,568 │ │ └─Sequential: 3-13 887,040 │ └─DiTBlock: 2-6 -- │ │ └─LayerNorm: 3-14 -- │ │ └─MultiheadAttention: 3-15 591,360 │ │ └─LayerNorm: 3-16 -- │ │ └─Mlp: 3-17 1,181,568 │ │ └─Sequential: 3-18 887,040 │ └─DiTBlock: 2-7 -- │ │ └─LayerNorm: 3-19 -- │ │ └─MultiheadAttention: 3-20 591,360 │ │ └─LayerNorm: 3-21 -- │ │ └─Mlp: 3-22 1,181,568 │ │ └─Sequential: 3-23 887,040 │ └─DiTBlock: 2-8 -- │ │ └─LayerNorm: 3-24 -- │ │ └─MultiheadAttention: 3-25 591,360 │ │ └─LayerNorm: 3-26 -- │ │ └─Mlp: 3-27 1,181,568 │ │ └─Sequential: 3-28 887,040 │ └─DiTBlock: 2-9 -- │ │ └─LayerNorm: 3-29 -- │ │ └─MultiheadAttention: 3-30 591,360 │ │ └─LayerNorm: 3-31 -- │ │ └─Mlp: 3-32 1,181,568 │ │ └─Sequential: 3-33 887,040 ├─FinalLayer: 1-5 -- │ └─LayerNorm: 2-10 -- │ └─Linear: 2-11 1,540 │ └─Sequential: 2-12 -- │ │ └─SiLU: 3-34 -- │ │ └─Linear: 3-35 295,680 ├─Unpatchify: 1-6 -- ===================================================================================== Total params: 16,584,964 Trainable params: 16,509,700 Non-trainable params: 75,264 ===================================================================================== EPOCH: 1 Loss at step 0: 0.9932535886764526 Loss at step 50: 0.2630425691604614 Loss at step 100: 0.1743299514055252 Loss at step 150: 0.12547697126865387 Loss at step 200: 0.13166950643062592 Loss at step 250: 0.125342458486557 Loss at step 300: 0.09910686314105988 Loss at step 350: 0.10796795785427094 Loss at step 400: 0.1221262663602829 Loss at step 450: 0.10806342214345932 Loss at step 500: 0.1035642996430397 Loss at step 550: 0.10258157551288605 Loss at step 600: 0.09074510633945465 Loss at step 650: 0.08687906712293625 Loss at step 700: 0.08930078893899918 Loss at step 750: 0.10323440283536911 Loss at step 800: 0.10141115635633469 Loss at step 850: 0.09151140600442886 Loss at step 900: 0.09285473078489304 Mean training loss after epoch 1: 0.14185436455997577 EPOCH: 2 Loss at step 0: 0.07944030314683914 Loss at step 50: 0.07709163427352905 Loss at step 100: 0.13487502932548523 Loss at step 150: 0.08502394706010818 Loss at step 200: 0.08500946313142776 Loss at step 250: 0.07470999658107758 Loss at step 300: 0.10066939145326614 Loss at step 350: 0.08946935087442398 Loss at step 400: 0.07538481801748276 Loss at step 450: 0.09641749411821365 Loss at step 500: 0.06413993239402771 Loss at step 550: 0.06424680352210999 Loss at step 600: 0.0533856600522995 Loss at step 650: 0.09370730817317963 Loss at step 700: 0.06133076176047325 Loss at step 750: 0.09496406465768814 Loss at step 800: 0.05922211334109306 Loss at step 850: 0.05885959044098854 Loss at step 900: 0.06334391981363297 Mean training loss after epoch 2: 0.07592437919110123 EPOCH: 3 Loss at step 0: 0.06674038618803024 Loss at step 50: 0.07711636275053024 Loss at step 100: 0.06107000634074211 Loss at step 150: 0.056310877203941345 Loss at step 200: 0.06908971071243286 Loss at step 250: 0.05715049430727959 Loss at step 300: 0.05947834998369217 Loss at step 350: 0.04663698002696037 Loss at step 400: 0.08153398334980011 Loss at step 450: 0.060365304350852966 Loss at step 500: 0.051955923438072205 Loss at step 550: 0.05961690470576286 Loss at step 600: 0.06051172316074371 Loss at step 650: 0.05173591151833534 Loss at step 700: 0.08113512396812439 Loss at step 750: 0.04647127911448479 Loss at step 800: 0.05414755269885063 Loss at step 850: 0.057110197842121124 Loss at step 900: 0.05876457691192627 Mean training loss after epoch 3: 0.06321073489100822 EPOCH: 4 Loss at step 0: 0.07289804518222809 Loss at step 50: 0.06844298541545868 Loss at step 100: 0.05744721740484238 Loss at step 150: 0.05562254786491394 Loss at step 200: 0.057991355657577515 Loss at step 250: 0.09215304255485535 Loss at step 300: 0.04858565330505371 Loss at step 350: 0.04806015267968178 Loss at step 400: 0.07025350630283356 Loss at step 450: 0.05072960630059242 Loss at step 500: 0.04687938466668129 Loss at step 550: 0.05058708414435387 Loss at step 600: 0.05769632011651993 Loss at step 650: 0.05611583590507507 Loss at step 700: 0.0539575032889843 Loss at step 750: 0.056668393313884735 Loss at step 800: 0.05010407045483589 Loss at step 850: 0.049048032611608505 Loss at step 900: 0.04864206910133362 Mean training loss after epoch 4: 0.057920614726094805 EPOCH: 5 Loss at step 0: 0.04823382943868637 Loss at step 50: 0.05461782589554787 Loss at step 100: 0.04084683209657669 Loss at step 150: 0.061489954590797424 Loss at step 200: 0.06747445464134216 Loss at step 250: 0.05822606384754181 Loss at step 300: 0.04529913142323494 Loss at step 350: 0.10371663421392441 Loss at step 400: 0.06650077551603317 Loss at step 450: 0.05847576633095741 Loss at step 500: 0.05942286178469658 Loss at step 550: 0.05403144285082817 Loss at step 600: 0.04998457431793213 Loss at step 650: 0.07330800592899323 Loss at step 700: 0.048948053270578384 Loss at step 750: 0.04250054433941841 Loss at step 800: 0.0428321547806263 Loss at step 850: 0.05379991605877876 Loss at step 900: 0.05059806630015373 Mean training loss after epoch 5: 0.0551789347002946 EPOCH: 6 Loss at step 0: 0.05581948161125183 Loss at step 50: 0.04329179227352142 Loss at step 100: 0.04308076202869415 Loss at step 150: 0.050919294357299805 Loss at step 200: 0.04057968780398369 Loss at step 250: 0.0447704941034317 Loss at step 300: 0.05596938356757164 Loss at step 350: 0.0597475990653038 Loss at step 400: 0.075344979763031 Loss at step 450: 0.04752616211771965 Loss at step 500: 0.058950815349817276 Loss at step 550: 0.049937713891267776 Loss at step 600: 0.060685060918331146 Loss at step 650: 0.050157614052295685 Loss at step 700: 0.048464685678482056 Loss at step 750: 0.04795868322253227 Loss at step 800: 0.05310791730880737 Loss at step 850: 0.0493113175034523 Loss at step 900: 0.06706822663545609 Mean training loss after epoch 6: 0.05340378787884834 EPOCH: 7 Loss at step 0: 0.055170636624097824 Loss at step 50: 0.05320220813155174 Loss at step 100: 0.04377249255776405 Loss at step 150: 0.0584837831556797 Loss at step 200: 0.050307661294937134 Loss at step 250: 0.04241902008652687 Loss at step 300: 0.05166137218475342 Loss at step 350: 0.05133924260735512 Loss at step 400: 0.045748207718133926 Loss at step 450: 0.056590888649225235 Loss at step 500: 0.050919681787490845 Loss at step 550: 0.050546955317258835 Loss at step 600: 0.04254843294620514 Loss at step 650: 0.0629238709807396 Loss at step 700: 0.053775761276483536 Loss at step 750: 0.049437426030635834 Loss at step 800: 0.046010784804821014 Loss at step 850: 0.04704846069216728 Loss at step 900: 0.057113878428936005 Mean training loss after epoch 7: 0.05232442173955918 EPOCH: 8 Loss at step 0: 0.040361832827329636 Loss at step 50: 0.06372988969087601 Loss at step 100: 0.047242458909749985 Loss at step 150: 0.056410349905490875 Loss at step 200: 0.048039451241493225 Loss at step 250: 0.0394059419631958 Loss at step 300: 0.048461101949214935 Loss at step 350: 0.046813201159238815 Loss at step 400: 0.04798107221722603 Loss at step 450: 0.04235563427209854 Loss at step 500: 0.03799458220601082 Loss at step 550: 0.04764721915125847 Loss at step 600: 0.05948030576109886 Loss at step 650: 0.052605487406253815 Loss at step 700: 0.047650016844272614 Loss at step 750: 0.04891142621636391 Loss at step 800: 0.06510608643293381 Loss at step 850: 0.044250987470149994 Loss at step 900: 0.06204114109277725 Mean training loss after epoch 8: 0.05154025904746897 EPOCH: 9 Loss at step 0: 0.04769778624176979 Loss at step 50: 0.04842979088425636 Loss at step 100: 0.051555998623371124 Loss at step 150: 0.06605996936559677 Loss at step 200: 0.044502753764390945 Loss at step 250: 0.059082720428705215 Loss at step 300: 0.0448286347091198 Loss at step 350: 0.043029773980379105 Loss at step 400: 0.042953841388225555 Loss at step 450: 0.05791505426168442 Loss at step 500: 0.04353957995772362 Loss at step 550: 0.03545535355806351 Loss at step 600: 0.04019762575626373 Loss at step 650: 0.0569273978471756 Loss at step 700: 0.0450766421854496 Loss at step 750: 0.04332722723484039 Loss at step 800: 0.04152940958738327 Loss at step 850: 0.047423381358385086 Loss at step 900: 0.056479621678590775 Mean training loss after epoch 9: 0.04982353130708943 EPOCH: 10 Loss at step 0: 0.052253469824790955 Loss at step 50: 0.059585221111774445 Loss at step 100: 0.03595810383558273 Loss at step 150: 0.05831580609083176 Loss at step 200: 0.06015695631504059 Loss at step 250: 0.045252226293087006 Loss at step 300: 0.0339214913547039 Loss at step 350: 0.04507286474108696 Loss at step 400: 0.0473455935716629 Loss at step 450: 0.04684752598404884 Loss at step 500: 0.0629328116774559 Loss at step 550: 0.04464546963572502 Loss at step 600: 0.053853973746299744 Loss at step 650: 0.043534062802791595 Loss at step 700: 0.05253700539469719 Loss at step 750: 0.06018798053264618 Loss at step 800: 0.04854215681552887 Loss at step 850: 0.04951377585530281 Loss at step 900: 0.0603373758494854 Mean training loss after epoch 10: 0.04955167105909922 EPOCH: 11 Loss at step 0: 0.06460548937320709 Loss at step 50: 0.05029812455177307 Loss at step 100: 0.05762842670083046 Loss at step 150: 0.0484875850379467 Loss at step 200: 0.0458974651992321 Loss at step 250: 0.04300424084067345 Loss at step 300: 0.061454303562641144 Loss at step 350: 0.04394501447677612 Loss at step 400: 0.05177269130945206 Loss at step 450: 0.05563750118017197 Loss at step 500: 0.04398886859416962 Loss at step 550: 0.06250730901956558 Loss at step 600: 0.0529044084250927 Loss at step 650: 0.06119568273425102 Loss at step 700: 0.055559467524290085 Loss at step 750: 0.046526629477739334 Loss at step 800: 0.062142759561538696 Loss at step 850: 0.04586857929825783 Loss at step 900: 0.039168428629636765 Mean training loss after epoch 11: 0.049766023617499926 EPOCH: 12 Loss at step 0: 0.04362691193819046 Loss at step 50: 0.046639323234558105 Loss at step 100: 0.07070167362689972 Loss at step 150: 0.03905355557799339 Loss at step 200: 0.050857119262218475 Loss at step 250: 0.03940210118889809 Loss at step 300: 0.04031214118003845 Loss at step 350: 0.044904954731464386 Loss at step 400: 0.04492027685046196 Loss at step 450: 0.05213872715830803 Loss at step 500: 0.03942315652966499 Loss at step 550: 0.04276583716273308 Loss at step 600: 0.04411996528506279 Loss at step 650: 0.06122054159641266 Loss at step 700: 0.03853228688240051 Loss at step 750: 0.04667110741138458 Loss at step 800: 0.03845730423927307 Loss at step 850: 0.0704936608672142 Loss at step 900: 0.05559678375720978 Mean training loss after epoch 12: 0.04843359212202431 EPOCH: 13 Loss at step 0: 0.04446792975068092 Loss at step 50: 0.055675115436315536 Loss at step 100: 0.062025487422943115 Loss at step 150: 0.04083752632141113 Loss at step 200: 0.06226179748773575 Loss at step 250: 0.03949614241719246 Loss at step 300: 0.044973455369472504 Loss at step 350: 0.05003393813967705 Loss at step 400: 0.05251089483499527 Loss at step 450: 0.05650754272937775 Loss at step 500: 0.04386991262435913 Loss at step 550: 0.059380169957876205 Loss at step 600: 0.06400784105062485 Loss at step 650: 0.04621538147330284 Loss at step 700: 0.045852627605199814 Loss at step 750: 0.05936848744750023 Loss at step 800: 0.04248848557472229 Loss at step 850: 0.04663265496492386 Loss at step 900: 0.044340021908283234 Mean training loss after epoch 13: 0.04769417119305779 EPOCH: 14 Loss at step 0: 0.04853847622871399 Loss at step 50: 0.03559271618723869 Loss at step 100: 0.03818679228425026 Loss at step 150: 0.05065193772315979 Loss at step 200: 0.03607818856835365 Loss at step 250: 0.0366886705160141 Loss at step 300: 0.048720892518758774 Loss at step 350: 0.05315859243273735 Loss at step 400: 0.037995558232069016 Loss at step 450: 0.04311233386397362 Loss at step 500: 0.041030850261449814 Loss at step 550: 0.040761057287454605 Loss at step 600: 0.0421484038233757 Loss at step 650: 0.061798591166734695 Loss at step 700: 0.04598328843712807 Loss at step 750: 0.054651569575071335 Loss at step 800: 0.04551733657717705 Loss at step 850: 0.05162295326590538 Loss at step 900: 0.03977259248495102 Mean training loss after epoch 14: 0.046762214187206996 EPOCH: 15 Loss at step 0: 0.04066102206707001 Loss at step 50: 0.03480575233697891 Loss at step 100: 0.042666856199502945 Loss at step 150: 0.032877564430236816 Loss at step 200: 0.044425565749406815 Loss at step 250: 0.05716993287205696 Loss at step 300: 0.048846252262592316 Loss at step 350: 0.046949345618486404 Loss at step 400: 0.050232741981744766 Loss at step 450: 0.042339760810136795 Loss at step 500: 0.0397944450378418 Loss at step 550: 0.03722016140818596 Loss at step 600: 0.03750521317124367 Loss at step 650: 0.04574858769774437 Loss at step 700: 0.048305295407772064 Loss at step 750: 0.04286493360996246 Loss at step 800: 0.042163312435150146 Loss at step 850: 0.053275126963853836 Loss at step 900: 0.04498055577278137 Mean training loss after epoch 15: 0.04649805390973018 EPOCH: 16 Loss at step 0: 0.05235812067985535 Loss at step 50: 0.050848338752985 Loss at step 100: 0.032529279589653015 Loss at step 150: 0.04161744937300682 Loss at step 200: 0.03343380242586136 Loss at step 250: 0.042494237422943115 Loss at step 300: 0.047855034470558167 Loss at step 350: 0.045861296355724335 Loss at step 400: 0.059920888394117355 Loss at step 450: 0.04518888518214226 Loss at step 500: 0.04230086877942085 Loss at step 550: 0.04409023001790047 Loss at step 600: 0.04298695549368858 Loss at step 650: 0.05603587254881859 Loss at step 700: 0.04736962541937828 Loss at step 750: 0.06492404639720917 Loss at step 800: 0.04676250368356705 Loss at step 850: 0.07509833574295044 Loss at step 900: 0.052781883627176285 Mean training loss after epoch 16: 0.046828101216348755 EPOCH: 17 Loss at step 0: 0.04518043249845505 Loss at step 50: 0.03734428063035011 Loss at step 100: 0.044894829392433167 Loss at step 150: 0.0556010864675045 Loss at step 200: 0.05231717973947525 Loss at step 250: 0.046200886368751526 Loss at step 300: 0.03805698826909065 Loss at step 350: 0.03610527142882347 Loss at step 400: 0.04848189279437065 Loss at step 450: 0.028467336669564247 Loss at step 500: 0.0361761711537838 Loss at step 550: 0.0370539166033268 Loss at step 600: 0.043320294469594955 Loss at step 650: 0.06098008155822754 Loss at step 700: 0.041510459035634995 Loss at step 750: 0.05186813697218895 Loss at step 800: 0.0351986400783062 Loss at step 850: 0.037117697298526764 Loss at step 900: 0.03967754915356636 Mean training loss after epoch 17: 0.04577488738543062 EPOCH: 18 Loss at step 0: 0.046201083809137344 Loss at step 50: 0.0443439818918705 Loss at step 100: 0.054377395659685135 Loss at step 150: 0.04721933230757713 Loss at step 200: 0.04115903377532959 Loss at step 250: 0.03896348178386688 Loss at step 300: 0.04418802261352539 Loss at step 350: 0.03786627948284149 Loss at step 400: 0.036552537232637405 Loss at step 450: 0.036637090146541595 Loss at step 500: 0.04750126972794533 Loss at step 550: 0.05561404302716255 Loss at step 600: 0.03819015994668007 Loss at step 650: 0.040957946330308914 Loss at step 700: 0.04186679422855377 Loss at step 750: 0.06359518319368362 Loss at step 800: 0.05069196969270706 Loss at step 850: 0.05281857028603554 Loss at step 900: 0.03926237300038338 Mean training loss after epoch 18: 0.04584055249569322 EPOCH: 19 Loss at step 0: 0.03931464999914169 Loss at step 50: 0.03443799167871475 Loss at step 100: 0.061100348830223083 Loss at step 150: 0.03554891422390938 Loss at step 200: 0.03427823260426521 Loss at step 250: 0.03775368630886078 Loss at step 300: 0.044856999069452286 Loss at step 350: 0.045644719153642654 Loss at step 400: 0.04319961741566658 Loss at step 450: 0.045037463307380676 Loss at step 500: 0.04194429889321327 Loss at step 550: 0.043922461569309235 Loss at step 600: 0.036279432475566864 Loss at step 650: 0.036259204149246216 Loss at step 700: 0.04240315034985542 Loss at step 750: 0.038166195154190063 Loss at step 800: 0.03892185166478157 Loss at step 850: 0.03561389073729515 Loss at step 900: 0.034657612442970276 Mean training loss after epoch 19: 0.04537794391896679 EPOCH: 20 Loss at step 0: 0.04980486258864403 Loss at step 50: 0.0399046316742897 Loss at step 100: 0.048348378390073776 Loss at step 150: 0.053706396371126175 Loss at step 200: 0.042325131595134735 Loss at step 250: 0.0398087315261364 Loss at step 300: 0.03794042393565178 Loss at step 350: 0.04836324602365494 Loss at step 400: 0.0561043880879879 Loss at step 450: 0.04225663095712662 Loss at step 500: 0.050038646906614304 Loss at step 550: 0.03533739224076271 Loss at step 600: 0.04639182984828949 Loss at step 650: 0.04378112033009529 Loss at step 700: 0.046276386827230453 Loss at step 750: 0.04993429034948349 Loss at step 800: 0.03606623411178589 Loss at step 850: 0.0573001466691494 Loss at step 900: 0.041887227445840836 Mean training loss after epoch 20: 0.04513302455936223 EPOCH: 21 Loss at step 0: 0.047755707055330276 Loss at step 50: 0.03141017630696297 Loss at step 100: 0.0471370629966259 Loss at step 150: 0.0551283098757267 Loss at step 200: 0.03729449585080147 Loss at step 250: 0.043121956288814545 Loss at step 300: 0.037084683775901794 Loss at step 350: 0.05062143877148628 Loss at step 400: 0.06583598256111145 Loss at step 450: 0.04296092689037323 Loss at step 500: 0.04771880432963371 Loss at step 550: 0.033939894288778305 Loss at step 600: 0.06935885548591614 Loss at step 650: 0.036335885524749756 Loss at step 700: 0.03918565809726715 Loss at step 750: 0.05254009738564491 Loss at step 800: 0.0372750461101532 Loss at step 850: 0.06892738491296768 Loss at step 900: 0.041418708860874176 Mean training loss after epoch 21: 0.04500063105440661 EPOCH: 22 Loss at step 0: 0.035148557275533676 Loss at step 50: 0.07158885896205902 Loss at step 100: 0.03762708231806755 Loss at step 150: 0.03869544714689255 Loss at step 200: 0.045696187764406204 Loss at step 250: 0.05302291736006737 Loss at step 300: 0.045397840440273285 Loss at step 350: 0.04022059217095375 Loss at step 400: 0.03989661484956741 Loss at step 450: 0.036009736359119415 Loss at step 500: 0.039285436272621155 Loss at step 550: 0.03704531490802765 Loss at step 600: 0.04350754991173744 Loss at step 650: 0.05553552135825157 Loss at step 700: 0.051583193242549896 Loss at step 750: 0.035207971930503845 Loss at step 800: 0.037553396075963974 Loss at step 850: 0.05680084973573685 Loss at step 900: 0.04120049253106117 Mean training loss after epoch 22: 0.0447093928550511 EPOCH: 23 Loss at step 0: 0.03783440962433815 Loss at step 50: 0.05536017566919327 Loss at step 100: 0.03400071710348129 Loss at step 150: 0.03878828510642052 Loss at step 200: 0.041732657700777054 Loss at step 250: 0.0433829165995121 Loss at step 300: 0.035492148250341415 Loss at step 350: 0.03446148708462715 Loss at step 400: 0.03248051553964615 Loss at step 450: 0.06526491791009903 Loss at step 500: 0.03704439476132393 Loss at step 550: 0.039148759096860886 Loss at step 600: 0.03867069631814957 Loss at step 650: 0.03926747664809227 Loss at step 700: 0.05320226401090622 Loss at step 750: 0.054149411618709564 Loss at step 800: 0.03957720100879669 Loss at step 850: 0.039691388607025146 Loss at step 900: 0.045606620609760284 Mean training loss after epoch 23: 0.044470616282700604 EPOCH: 24 Loss at step 0: 0.04650864377617836 Loss at step 50: 0.04052969440817833 Loss at step 100: 0.040921930223703384 Loss at step 150: 0.04234221577644348 Loss at step 200: 0.04005791246891022 Loss at step 250: 0.030816882848739624 Loss at step 300: 0.03885858505964279 Loss at step 350: 0.04244726523756981 Loss at step 400: 0.04633267968893051 Loss at step 450: 0.03887380287051201 Loss at step 500: 0.03060540370643139 Loss at step 550: 0.03648023679852486 Loss at step 600: 0.03821569308638573 Loss at step 650: 0.0390048511326313 Loss at step 700: 0.05740736797451973 Loss at step 750: 0.05048290640115738 Loss at step 800: 0.04146488755941391 Loss at step 850: 0.03429395332932472 Loss at step 900: 0.05217345803976059 Mean training loss after epoch 24: 0.04464735205509642 EPOCH: 25 Loss at step 0: 0.05005771294236183 Loss at step 50: 0.04012859985232353 Loss at step 100: 0.042899977415800095 Loss at step 150: 0.03493258357048035 Loss at step 200: 0.07089105248451233 Loss at step 250: 0.047123003751039505 Loss at step 300: 0.03955381363630295 Loss at step 350: 0.06360181421041489 Loss at step 400: 0.03887050971388817 Loss at step 450: 0.04412354156374931 Loss at step 500: 0.04529256001114845 Loss at step 550: 0.04032818228006363 Loss at step 600: 0.05209338292479515 Loss at step 650: 0.0425599068403244 Loss at step 700: 0.036708034574985504 Loss at step 750: 0.04328690469264984 Loss at step 800: 0.055562783032655716 Loss at step 850: 0.043608345091342926 Loss at step 900: 0.03736454248428345 Mean training loss after epoch 25: 0.04386292827297757 EPOCH: 26 Loss at step 0: 0.05515856668353081 Loss at step 50: 0.04354707524180412 Loss at step 100: 0.038486480712890625 Loss at step 150: 0.05573059618473053 Loss at step 200: 0.032634902745485306 Loss at step 250: 0.03851979598402977 Loss at step 300: 0.03949377313256264 Loss at step 350: 0.0471733957529068 Loss at step 400: 0.03927616775035858 Loss at step 450: 0.03867429122328758 Loss at step 500: 0.05111180990934372 Loss at step 550: 0.0520114004611969 Loss at step 600: 0.03997304290533066 Loss at step 650: 0.03773607313632965 Loss at step 700: 0.042556557804346085 Loss at step 750: 0.03986072540283203 Loss at step 800: 0.04331494867801666 Loss at step 850: 0.05162229388952255 Loss at step 900: 0.03865383565425873 Mean training loss after epoch 26: 0.04382917143182078 EPOCH: 27 Loss at step 0: 0.03430306911468506 Loss at step 50: 0.04677494987845421 Loss at step 100: 0.034297604113817215 Loss at step 150: 0.05992724746465683 Loss at step 200: 0.036741144955158234 Loss at step 250: 0.05155424028635025 Loss at step 300: 0.054275836795568466 Loss at step 350: 0.05484437942504883 Loss at step 400: 0.037665147334337234 Loss at step 450: 0.03397126495838165 Loss at step 500: 0.03523449972271919 Loss at step 550: 0.039269328117370605 Loss at step 600: 0.040531761944293976 Loss at step 650: 0.03606227785348892 Loss at step 700: 0.03627297654747963 Loss at step 750: 0.06994299590587616 Loss at step 800: 0.03893217071890831 Loss at step 850: 0.03401683270931244 Loss at step 900: 0.03729776293039322 Mean training loss after epoch 27: 0.04406887094484273 EPOCH: 28 Loss at step 0: 0.033236708492040634 Loss at step 50: 0.04051890969276428 Loss at step 100: 0.0386979766190052 Loss at step 150: 0.04833636060357094 Loss at step 200: 0.04366566613316536 Loss at step 250: 0.03755993768572807 Loss at step 300: 0.03423598036170006 Loss at step 350: 0.04578510671854019 Loss at step 400: 0.036333248019218445 Loss at step 450: 0.031157521530985832 Loss at step 500: 0.03514885902404785 Loss at step 550: 0.045480433851480484 Loss at step 600: 0.041569288820028305 Loss at step 650: 0.042210616171360016 Loss at step 700: 0.03702637925744057 Loss at step 750: 0.03146066516637802 Loss at step 800: 0.0388919822871685 Loss at step 850: 0.05460710823535919 Loss at step 900: 0.06076815724372864 Mean training loss after epoch 28: 0.04338920263768132 EPOCH: 29 Loss at step 0: 0.05113299936056137 Loss at step 50: 0.048249516636133194 Loss at step 100: 0.037999268621206284 Loss at step 150: 0.0373075008392334 Loss at step 200: 0.03410806506872177 Loss at step 250: 0.044861823320388794 Loss at step 300: 0.04133911058306694 Loss at step 350: 0.03280739486217499 Loss at step 400: 0.03223339840769768 Loss at step 450: 0.039397455751895905 Loss at step 500: 0.05658113211393356 Loss at step 550: 0.03554929047822952 Loss at step 600: 0.06543336808681488 Loss at step 650: 0.049131374806165695 Loss at step 700: 0.05585746094584465 Loss at step 750: 0.04687334597110748 Loss at step 800: 0.03191005811095238 Loss at step 850: 0.031938809901475906 Loss at step 900: 0.03746477887034416 Mean training loss after epoch 29: 0.044124060409711494 EPOCH: 30 Loss at step 0: 0.032684262841939926 Loss at step 50: 0.03756602853536606 Loss at step 100: 0.041453681886196136 Loss at step 150: 0.041892241686582565 Loss at step 200: 0.03094663843512535 Loss at step 250: 0.05277010798454285 Loss at step 300: 0.05041496083140373 Loss at step 350: 0.03589004650712013 Loss at step 400: 0.03318706154823303 Loss at step 450: 0.047618456184864044 Loss at step 500: 0.030878424644470215 Loss at step 550: 0.03503444790840149 Loss at step 600: 0.03603408485651016 Loss at step 650: 0.04117047041654587 Loss at step 700: 0.04891658574342728 Loss at step 750: 0.0349753238260746 Loss at step 800: 0.057963449507951736 Loss at step 850: 0.05564302206039429 Loss at step 900: 0.04321796074509621 Mean training loss after epoch 30: 0.04311676337314186 EPOCH: 31 Loss at step 0: 0.0543987974524498 Loss at step 50: 0.029258912429213524 Loss at step 100: 0.03925994038581848 Loss at step 150: 0.0369245707988739 Loss at step 200: 0.06379091739654541 Loss at step 250: 0.04208604618906975 Loss at step 300: 0.04706452786922455 Loss at step 350: 0.03690600395202637 Loss at step 400: 0.038193970918655396 Loss at step 450: 0.0689132958650589 Loss at step 500: 0.04634298384189606 Loss at step 550: 0.04795330390334129 Loss at step 600: 0.058100007474422455 Loss at step 650: 0.05691094323992729 Loss at step 700: 0.03779438138008118 Loss at step 750: 0.04262612387537956 Loss at step 800: 0.03465947136282921 Loss at step 850: 0.056568145751953125 Loss at step 900: 0.0351470410823822 Mean training loss after epoch 31: 0.043541477816397826 EPOCH: 32 Loss at step 0: 0.034166064113378525 Loss at step 50: 0.03686535358428955 Loss at step 100: 0.04622845724225044 Loss at step 150: 0.051726825535297394 Loss at step 200: 0.05568854138255119 Loss at step 250: 0.05901264026761055 Loss at step 300: 0.03604442626237869 Loss at step 350: 0.03537966310977936 Loss at step 400: 0.04667040705680847 Loss at step 450: 0.038151830434799194 Loss at step 500: 0.0585038997232914 Loss at step 550: 0.04130082204937935 Loss at step 600: 0.03147813305258751 Loss at step 650: 0.050793033093214035 Loss at step 700: 0.05356957018375397 Loss at step 750: 0.04346100240945816 Loss at step 800: 0.038438308984041214 Loss at step 850: 0.03511704504489899 Loss at step 900: 0.05421514809131622 Mean training loss after epoch 32: 0.04308389702927012 EPOCH: 33 Loss at step 0: 0.045620955526828766 Loss at step 50: 0.05138682201504707 Loss at step 100: 0.03177850693464279 Loss at step 150: 0.07932374626398087 Loss at step 200: 0.03234941512346268 Loss at step 250: 0.034716084599494934 Loss at step 300: 0.03937297686934471 Loss at step 350: 0.0359843410551548 Loss at step 400: 0.03958745300769806 Loss at step 450: 0.03357214853167534 Loss at step 500: 0.03532932326197624 Loss at step 550: 0.036571502685546875 Loss at step 600: 0.03673063963651657 Loss at step 650: 0.054065655916929245 Loss at step 700: 0.03816835954785347 Loss at step 750: 0.0355655699968338 Loss at step 800: 0.034650880843400955 Loss at step 850: 0.03534562140703201 Loss at step 900: 0.0441148616373539 Mean training loss after epoch 33: 0.04278865347562759 EPOCH: 34 Loss at step 0: 0.039698902517557144 Loss at step 50: 0.03754832595586777 Loss at step 100: 0.05479367822408676 Loss at step 150: 0.06518157571554184 Loss at step 200: 0.034476716071367264 Loss at step 250: 0.0375492163002491 Loss at step 300: 0.04562554508447647 Loss at step 350: 0.03773370012640953 Loss at step 400: 0.02849115990102291 Loss at step 450: 0.03514283150434494 Loss at step 500: 0.04999001696705818 Loss at step 550: 0.03032141923904419 Loss at step 600: 0.03616143763065338 Loss at step 650: 0.036334775388240814 Loss at step 700: 0.037501461803913116 Loss at step 750: 0.03822603076696396 Loss at step 800: 0.034096747636795044 Loss at step 850: 0.035960737615823746 Loss at step 900: 0.03675620257854462 Mean training loss after epoch 34: 0.0426741217229285 EPOCH: 35 Loss at step 0: 0.04510198161005974 Loss at step 50: 0.033604975789785385 Loss at step 100: 0.04437999427318573 Loss at step 150: 0.03433213755488396 Loss at step 200: 0.041162557899951935 Loss at step 250: 0.035488951951265335 Loss at step 300: 0.03492167964577675 Loss at step 350: 0.034426961094141006 Loss at step 400: 0.036935318261384964 Loss at step 450: 0.04038716107606888 Loss at step 500: 0.041272640228271484 Loss at step 550: 0.04316993057727814 Loss at step 600: 0.03621558099985123 Loss at step 650: 0.029870107769966125 Loss at step 700: 0.04963037744164467 Loss at step 750: 0.02950064279139042 Loss at step 800: 0.04474369436502457 Loss at step 850: 0.04101833328604698 Loss at step 900: 0.07014463096857071 Mean training loss after epoch 35: 0.04320456636453997 EPOCH: 36 Loss at step 0: 0.025993628427386284 Loss at step 50: 0.04897555336356163 Loss at step 100: 0.03934851288795471 Loss at step 150: 0.06618720293045044 Loss at step 200: 0.04162256792187691 Loss at step 250: 0.04871177673339844 Loss at step 300: 0.03265015035867691 Loss at step 350: 0.034183815121650696 Loss at step 400: 0.0524582602083683 Loss at step 450: 0.044084955006837845 Loss at step 500: 0.03780914098024368 Loss at step 550: 0.03755180537700653 Loss at step 600: 0.040510088205337524 Loss at step 650: 0.0388263538479805 Loss at step 700: 0.03912876173853874 Loss at step 750: 0.03538523241877556 Loss at step 800: 0.036377277225255966 Loss at step 850: 0.03163236752152443 Loss at step 900: 0.03611813858151436 Mean training loss after epoch 36: 0.04283352621368317 EPOCH: 37 Loss at step 0: 0.04283830150961876 Loss at step 50: 0.03797575458884239 Loss at step 100: 0.03365718200802803 Loss at step 150: 0.03667090833187103 Loss at step 200: 0.03226551041007042 Loss at step 250: 0.04704499617218971 Loss at step 300: 0.039291687309741974 Loss at step 350: 0.03681568056344986 Loss at step 400: 0.04530000686645508 Loss at step 450: 0.04133486747741699 Loss at step 500: 0.03882759064435959 Loss at step 550: 0.04363678768277168 Loss at step 600: 0.04127722233533859 Loss at step 650: 0.03316343575716019 Loss at step 700: 0.03651943430304527 Loss at step 750: 0.0345945842564106 Loss at step 800: 0.05486150458455086 Loss at step 850: 0.05823587253689766 Loss at step 900: 0.03287921100854874 Mean training loss after epoch 37: 0.04276241857741179 EPOCH: 38 Loss at step 0: 0.03594166412949562 Loss at step 50: 0.0297856442630291 Loss at step 100: 0.052678052335977554 Loss at step 150: 0.03669226914644241 Loss at step 200: 0.05482405424118042 Loss at step 250: 0.031136151403188705 Loss at step 300: 0.033809252083301544 Loss at step 350: 0.040372610092163086 Loss at step 400: 0.04138439521193504 Loss at step 450: 0.039938390254974365 Loss at step 500: 0.044176481664180756 Loss at step 550: 0.030343325808644295 Loss at step 600: 0.039482660591602325 Loss at step 650: 0.044072531163692474 Loss at step 700: 0.03978501632809639 Loss at step 750: 0.04050922393798828 Loss at step 800: 0.029906991869211197 Loss at step 850: 0.041337598115205765 Loss at step 900: 0.05419551953673363 Mean training loss after epoch 38: 0.042512604295571986 EPOCH: 39 Loss at step 0: 0.05286071449518204 Loss at step 50: 0.037840358912944794 Loss at step 100: 0.045970458537340164 Loss at step 150: 0.05866038054227829 Loss at step 200: 0.029906943440437317 Loss at step 250: 0.032362062484025955 Loss at step 300: 0.044867709279060364 Loss at step 350: 0.043735552579164505 Loss at step 400: 0.03765714168548584 Loss at step 450: 0.03833089396357536 Loss at step 500: 0.05327044427394867 Loss at step 550: 0.06563251465559006 Loss at step 600: 0.06387274712324142 Loss at step 650: 0.032957326620817184 Loss at step 700: 0.03684427961707115 Loss at step 750: 0.03934033587574959 Loss at step 800: 0.035481445491313934 Loss at step 850: 0.04319925606250763 Loss at step 900: 0.03513837605714798 Mean training loss after epoch 39: 0.042854417073748895 EPOCH: 40 Loss at step 0: 0.035943735390901566 Loss at step 50: 0.04122233763337135 Loss at step 100: 0.051782965660095215 Loss at step 150: 0.0410107783973217 Loss at step 200: 0.042942795902490616 Loss at step 250: 0.03462718427181244 Loss at step 300: 0.05719525367021561 Loss at step 350: 0.043004732578992844 Loss at step 400: 0.03742880001664162 Loss at step 450: 0.0474843792617321 Loss at step 500: 0.05895926058292389 Loss at step 550: 0.04313330724835396 Loss at step 600: 0.055879756808280945 Loss at step 650: 0.03325164318084717 Loss at step 700: 0.06301034986972809 Loss at step 750: 0.05393299087882042 Loss at step 800: 0.031038381159305573 Loss at step 850: 0.02856467105448246 Loss at step 900: 0.036049164831638336 Mean training loss after epoch 40: 0.0419790779452906 EPOCH: 41 Loss at step 0: 0.04042711481451988 Loss at step 50: 0.07221192866563797 Loss at step 100: 0.034279171377420425 Loss at step 150: 0.03359357267618179 Loss at step 200: 0.04015154391527176 Loss at step 250: 0.05523853749036789 Loss at step 300: 0.034587420523166656 Loss at step 350: 0.041955869644880295 Loss at step 400: 0.054641157388687134 Loss at step 450: 0.0579570010304451 Loss at step 500: 0.043109264224767685 Loss at step 550: 0.05567030981183052 Loss at step 600: 0.03030386194586754 Loss at step 650: 0.040729980915784836 Loss at step 700: 0.030166294425725937 Loss at step 750: 0.03686067834496498 Loss at step 800: 0.03158535063266754 Loss at step 850: 0.03705933690071106 Loss at step 900: 0.028601987287402153 Mean training loss after epoch 41: 0.043171245787443635 EPOCH: 42 Loss at step 0: 0.03731156885623932 Loss at step 50: 0.029425479471683502 Loss at step 100: 0.033947236835956573 Loss at step 150: 0.03268137574195862 Loss at step 200: 0.03908024728298187 Loss at step 250: 0.034402694553136826 Loss at step 300: 0.03026057779788971 Loss at step 350: 0.06623826920986176 Loss at step 400: 0.04798921197652817 Loss at step 450: 0.05254777520895004 Loss at step 500: 0.054360173642635345 Loss at step 550: 0.039399199187755585 Loss at step 600: 0.052537377923727036 Loss at step 650: 0.039711952209472656 Loss at step 700: 0.05434822291135788 Loss at step 750: 0.06079496443271637 Loss at step 800: 0.035274192690849304 Loss at step 850: 0.03495738282799721 Loss at step 900: 0.03135225549340248 Mean training loss after epoch 42: 0.04192801442410328 EPOCH: 43 Loss at step 0: 0.038916539400815964 Loss at step 50: 0.03431229665875435 Loss at step 100: 0.04213782772421837 Loss at step 150: 0.04049212858080864 Loss at step 200: 0.03142905607819557 Loss at step 250: 0.05235402286052704 Loss at step 300: 0.035686664283275604 Loss at step 350: 0.03123079054057598 Loss at step 400: 0.06660410761833191 Loss at step 450: 0.03638329729437828 Loss at step 500: 0.03898695856332779 Loss at step 550: 0.05099507048726082 Loss at step 600: 0.03619147464632988 Loss at step 650: 0.03628411889076233 Loss at step 700: 0.045269936323165894 Loss at step 750: 0.04766134172677994 Loss at step 800: 0.033041831105947495 Loss at step 850: 0.054613228887319565 Loss at step 900: 0.0532316192984581 Mean training loss after epoch 43: 0.042171207830500504 EPOCH: 44 Loss at step 0: 0.0389120914041996 Loss at step 50: 0.04755499213933945 Loss at step 100: 0.03901233524084091 Loss at step 150: 0.052873991429805756 Loss at step 200: 0.05334090068936348 Loss at step 250: 0.0397360622882843 Loss at step 300: 0.03453391417860985 Loss at step 350: 0.04875922575592995 Loss at step 400: 0.027675088495016098 Loss at step 450: 0.03275521472096443 Loss at step 500: 0.047318194061517715 Loss at step 550: 0.0374000184237957 Loss at step 600: 0.03966290503740311 Loss at step 650: 0.044589605182409286 Loss at step 700: 0.03750285506248474 Loss at step 750: 0.03201235830783844 Loss at step 800: 0.04894779995083809 Loss at step 850: 0.07314945012331009 Loss at step 900: 0.056103695183992386 Mean training loss after epoch 44: 0.041850487030407134 EPOCH: 45 Loss at step 0: 0.04078466817736626 Loss at step 50: 0.050380922853946686 Loss at step 100: 0.060176048427820206 Loss at step 150: 0.039128080010414124 Loss at step 200: 0.04138997197151184 Loss at step 250: 0.03647807240486145 Loss at step 300: 0.03359954059123993 Loss at step 350: 0.03585640713572502 Loss at step 400: 0.049841053783893585 Loss at step 450: 0.03362548351287842 Loss at step 500: 0.038759615272283554 Loss at step 550: 0.05031571537256241 Loss at step 600: 0.03527596965432167 Loss at step 650: 0.050249405205249786 Loss at step 700: 0.05455591157078743 Loss at step 750: 0.03572722151875496 Loss at step 800: 0.03366929292678833 Loss at step 850: 0.056674279272556305 Loss at step 900: 0.06866861879825592 Mean training loss after epoch 45: 0.04254473557175477 EPOCH: 46 Loss at step 0: 0.04400390386581421 Loss at step 50: 0.034666698426008224 Loss at step 100: 0.03618445247411728 Loss at step 150: 0.03352349251508713 Loss at step 200: 0.03571430966258049 Loss at step 250: 0.047723229974508286 Loss at step 300: 0.03910943493247032 Loss at step 350: 0.044531866908073425 Loss at step 400: 0.04934946447610855 Loss at step 450: 0.05399491265416145 Loss at step 500: 0.04363541677594185 Loss at step 550: 0.04296290874481201 Loss at step 600: 0.02955658920109272 Loss at step 650: 0.034045398235321045 Loss at step 700: 0.047498684376478195 Loss at step 750: 0.03900916129350662 Loss at step 800: 0.04976839944720268 Loss at step 850: 0.036251235753297806 Loss at step 900: 0.037588220089673996 Mean training loss after epoch 46: 0.04180497804414362 EPOCH: 47 Loss at step 0: 0.029707089066505432 Loss at step 50: 0.04782122001051903 Loss at step 100: 0.038219474256038666 Loss at step 150: 0.06238754466176033 Loss at step 200: 0.046214696019887924 Loss at step 250: 0.03892183303833008 Loss at step 300: 0.050494708120822906 Loss at step 350: 0.03603242337703705 Loss at step 400: 0.03870635852217674 Loss at step 450: 0.0663483589887619 Loss at step 500: 0.03352940082550049 Loss at step 550: 0.037131693214178085 Loss at step 600: 0.0331479050219059 Loss at step 650: 0.05511682108044624 Loss at step 700: 0.04531248286366463 Loss at step 750: 0.05891970545053482 Loss at step 800: 0.03977537155151367 Loss at step 850: 0.03603599965572357 Loss at step 900: 0.05687648430466652 Mean training loss after epoch 47: 0.0420258793630389 EPOCH: 48 Loss at step 0: 0.037126753479242325 Loss at step 50: 0.03995167091488838 Loss at step 100: 0.0385952852666378 Loss at step 150: 0.03452024608850479 Loss at step 200: 0.0516742467880249 Loss at step 250: 0.034860219806432724 Loss at step 300: 0.03432386741042137 Loss at step 350: 0.046457309275865555 Loss at step 400: 0.035395506769418716 Loss at step 450: 0.04863947629928589 Loss at step 500: 0.039817485958337784 Loss at step 550: 0.05280037224292755 Loss at step 600: 0.07884229719638824 Loss at step 650: 0.048403359949588776 Loss at step 700: 0.04747028648853302 Loss at step 750: 0.03548581153154373 Loss at step 800: 0.04720131307840347 Loss at step 850: 0.04094947874546051 Loss at step 900: 0.0464005284011364 Mean training loss after epoch 48: 0.042492505440961066 EPOCH: 49 Loss at step 0: 0.03580869734287262 Loss at step 50: 0.03737710416316986 Loss at step 100: 0.03427230194211006 Loss at step 150: 0.03864447399973869 Loss at step 200: 0.03881022334098816 Loss at step 250: 0.03382013365626335 Loss at step 300: 0.040609877556562424 Loss at step 350: 0.03813561797142029 Loss at step 400: 0.04044758901000023 Loss at step 450: 0.03480546921491623 Loss at step 500: 0.03296447917819023 Loss at step 550: 0.03343787416815758 Loss at step 600: 0.04355775564908981 Loss at step 650: 0.03697530925273895 Loss at step 700: 0.04083674028515816 Loss at step 750: 0.03362608328461647 Loss at step 800: 0.028002602979540825 Loss at step 850: 0.04565693438053131 Loss at step 900: 0.041313041001558304 Mean training loss after epoch 49: 0.0411302426329522 EPOCH: 50 Loss at step 0: 0.05600229278206825 Loss at step 50: 0.03500762954354286 Loss at step 100: 0.034702807664871216 Loss at step 150: 0.05114007741212845 Loss at step 200: 0.03988618776202202 Loss at step 250: 0.047436732798814774 Loss at step 300: 0.03933500126004219 Loss at step 350: 0.03835143893957138 Loss at step 400: 0.03729194030165672 Loss at step 450: 0.03136274591088295 Loss at step 500: 0.04722240939736366 Loss at step 550: 0.03327949717640877 Loss at step 600: 0.050426580011844635 Loss at step 650: 0.0342482328414917 Loss at step 700: 0.048615675419569016 Loss at step 750: 0.05062200501561165 Loss at step 800: 0.05415920540690422 Loss at step 850: 0.03786095231771469 Loss at step 900: 0.04295073449611664 Mean training loss after epoch 50: 0.04185620344071182 EPOCH: 51 Loss at step 0: 0.0365816093981266 Loss at step 50: 0.04206123948097229 Loss at step 100: 0.030760951340198517 Loss at step 150: 0.038841888308525085 Loss at step 200: 0.03523499146103859 Loss at step 250: 0.03495549038052559 Loss at step 300: 0.03674435615539551 Loss at step 350: 0.03339245915412903 Loss at step 400: 0.04231112077832222 Loss at step 450: 0.04484392702579498 Loss at step 500: 0.03611406311392784 Loss at step 550: 0.031036855652928352 Loss at step 600: 0.0323999859392643 Loss at step 650: 0.06802742928266525 Loss at step 700: 0.04545126482844353 Loss at step 750: 0.03140132129192352 Loss at step 800: 0.03638726472854614 Loss at step 850: 0.028535958379507065 Loss at step 900: 0.03602641075849533 Mean training loss after epoch 51: 0.041764994421953965 EPOCH: 52 Loss at step 0: 0.03624102473258972 Loss at step 50: 0.028261560946702957 Loss at step 100: 0.03621610626578331 Loss at step 150: 0.04555024206638336 Loss at step 200: 0.031065862625837326 Loss at step 250: 0.053021691739559174 Loss at step 300: 0.03265215456485748 Loss at step 350: 0.03725382685661316 Loss at step 400: 0.04863378778100014 Loss at step 450: 0.038839466869831085 Loss at step 500: 0.0398586243391037 Loss at step 550: 0.060210585594177246 Loss at step 600: 0.036141108721494675 Loss at step 650: 0.04793522134423256 Loss at step 700: 0.04301222041249275 Loss at step 750: 0.03841105476021767 Loss at step 800: 0.03010324016213417 Loss at step 850: 0.03730560839176178 Loss at step 900: 0.04455268010497093 Mean training loss after epoch 52: 0.041581095702676124 EPOCH: 53 Loss at step 0: 0.04127964749932289 Loss at step 50: 0.06336968392133713 Loss at step 100: 0.037742964923381805 Loss at step 150: 0.06472226977348328 Loss at step 200: 0.04113401472568512 Loss at step 250: 0.05065424367785454 Loss at step 300: 0.038604240864515305 Loss at step 350: 0.03237632289528847 Loss at step 400: 0.03383887931704521 Loss at step 450: 0.037273745983839035 Loss at step 500: 0.04051177576184273 Loss at step 550: 0.042496323585510254 Loss at step 600: 0.045325327664613724 Loss at step 650: 0.038446709513664246 Loss at step 700: 0.03412953019142151 Loss at step 750: 0.03683246672153473 Loss at step 800: 0.03079565055668354 Loss at step 850: 0.03118554688990116 Loss at step 900: 0.03316593915224075 Mean training loss after epoch 53: 0.04193888824464861 EPOCH: 54 Loss at step 0: 0.028049902990460396 Loss at step 50: 0.035439297556877136 Loss at step 100: 0.03789333999156952 Loss at step 150: 0.03792596235871315 Loss at step 200: 0.039335861802101135 Loss at step 250: 0.04706595465540886 Loss at step 300: 0.03727412223815918 Loss at step 350: 0.03848860412836075 Loss at step 400: 0.06445587426424026 Loss at step 450: 0.03863266482949257 Loss at step 500: 0.03326078876852989 Loss at step 550: 0.036470212042331696 Loss at step 600: 0.05505052208900452 Loss at step 650: 0.03974293917417526 Loss at step 700: 0.04287637770175934 Loss at step 750: 0.050755128264427185 Loss at step 800: 0.03950411453843117 Loss at step 850: 0.043459244072437286 Loss at step 900: 0.04161952808499336 Mean training loss after epoch 54: 0.0413095010702671 EPOCH: 55 Loss at step 0: 0.03458162024617195 Loss at step 50: 0.03692202642560005 Loss at step 100: 0.05423977971076965 Loss at step 150: 0.03795550763607025 Loss at step 200: 0.07981667667627335 Loss at step 250: 0.05591676011681557 Loss at step 300: 0.0461648553609848 Loss at step 350: 0.04785335063934326 Loss at step 400: 0.04649996757507324 Loss at step 450: 0.03327995911240578 Loss at step 500: 0.031735729426145554 Loss at step 550: 0.03526368737220764 Loss at step 600: 0.05189451947808266 Loss at step 650: 0.03438114747405052 Loss at step 700: 0.029040316119790077 Loss at step 750: 0.03763226419687271 Loss at step 800: 0.034916192293167114 Loss at step 850: 0.03035568818449974 Loss at step 900: 0.03768126666545868 Mean training loss after epoch 55: 0.041049283079262866 EPOCH: 56 Loss at step 0: 0.04755246639251709 Loss at step 50: 0.03922996297478676 Loss at step 100: 0.0300825834274292 Loss at step 150: 0.04195769876241684 Loss at step 200: 0.04849210008978844 Loss at step 250: 0.036399759352207184 Loss at step 300: 0.04363933205604553 Loss at step 350: 0.03892781212925911 Loss at step 400: 0.03183449059724808 Loss at step 450: 0.055824920535087585 Loss at step 500: 0.034369099885225296 Loss at step 550: 0.04003513976931572 Loss at step 600: 0.052378661930561066 Loss at step 650: 0.04061546549201012 Loss at step 700: 0.03651702404022217 Loss at step 750: 0.06485359370708466 Loss at step 800: 0.0504290908575058 Loss at step 850: 0.057005930691957474 Loss at step 900: 0.0482775904238224 Mean training loss after epoch 56: 0.04128387348968655 EPOCH: 57 Loss at step 0: 0.036981891840696335 Loss at step 50: 0.04764796048402786 Loss at step 100: 0.037701841443777084 Loss at step 150: 0.03614027053117752 Loss at step 200: 0.032591626048088074 Loss at step 250: 0.05211574211716652 Loss at step 300: 0.04101431369781494 Loss at step 350: 0.03576326742768288 Loss at step 400: 0.034557800740003586 Loss at step 450: 0.024434883147478104 Loss at step 500: 0.03391573578119278 Loss at step 550: 0.03371918201446533 Loss at step 600: 0.029611479490995407 Loss at step 650: 0.06152529641985893 Loss at step 700: 0.028848448768258095 Loss at step 750: 0.04482829570770264 Loss at step 800: 0.036650799214839935 Loss at step 850: 0.034292809665203094 Loss at step 900: 0.04071585461497307 Mean training loss after epoch 57: 0.041509310123143295 EPOCH: 58 Loss at step 0: 0.0313471257686615 Loss at step 50: 0.03477931022644043 Loss at step 100: 0.03902506083250046 Loss at step 150: 0.03168673813343048 Loss at step 200: 0.04132472723722458 Loss at step 250: 0.0303280558437109 Loss at step 300: 0.04293641075491905 Loss at step 350: 0.0368681438267231 Loss at step 400: 0.03749140724539757 Loss at step 450: 0.05166896432638168 Loss at step 500: 0.03380604833364487 Loss at step 550: 0.03452663868665695 Loss at step 600: 0.0734071433544159 Loss at step 650: 0.037792935967445374 Loss at step 700: 0.0323927104473114 Loss at step 750: 0.05200832709670067 Loss at step 800: 0.035054467618465424 Loss at step 850: 0.03371467813849449 Loss at step 900: 0.028349649161100388 Mean training loss after epoch 58: 0.04127057128623605 EPOCH: 59 Loss at step 0: 0.04468866065144539 Loss at step 50: 0.05357292294502258 Loss at step 100: 0.02892059087753296 Loss at step 150: 0.05689992010593414 Loss at step 200: 0.03328373655676842 Loss at step 250: 0.03268301114439964 Loss at step 300: 0.03351958841085434 Loss at step 350: 0.03822989761829376 Loss at step 400: 0.0435686893761158 Loss at step 450: 0.034768469631671906 Loss at step 500: 0.036948367953300476 Loss at step 550: 0.08341197669506073 Loss at step 600: 0.05658436194062233 Loss at step 650: 0.03157483786344528 Loss at step 700: 0.029805950820446014 Loss at step 750: 0.03255327045917511 Loss at step 800: 0.04871029779314995 Loss at step 850: 0.05279934033751488 Loss at step 900: 0.04329531267285347 Mean training loss after epoch 59: 0.04119420869510247 EPOCH: 60 Loss at step 0: 0.03733966872096062 Loss at step 50: 0.035529088228940964 Loss at step 100: 0.034515928477048874 Loss at step 150: 0.046241313219070435 Loss at step 200: 0.04218694195151329 Loss at step 250: 0.04049185663461685 Loss at step 300: 0.0434955470263958 Loss at step 350: 0.036528635770082474 Loss at step 400: 0.06843111664056778 Loss at step 450: 0.03414440155029297 Loss at step 500: 0.04126294329762459 Loss at step 550: 0.03592269867658615 Loss at step 600: 0.03903070464730263 Loss at step 650: 0.03635513409972191 Loss at step 700: 0.03650194779038429 Loss at step 750: 0.053711291402578354 Loss at step 800: 0.03540651872754097 Loss at step 850: 0.03359445556998253 Loss at step 900: 0.04086891561746597 Mean training loss after epoch 60: 0.041094930232889744 EPOCH: 61 Loss at step 0: 0.06793402135372162 Loss at step 50: 0.03945723921060562 Loss at step 100: 0.03619696572422981 Loss at step 150: 0.03008495457470417 Loss at step 200: 0.04227340966463089 Loss at step 250: 0.0383225679397583 Loss at step 300: 0.033168453723192215 Loss at step 350: 0.04032664746046066 Loss at step 400: 0.035941194742918015 Loss at step 450: 0.037784017622470856 Loss at step 500: 0.02421642281115055 Loss at step 550: 0.03739095479249954 Loss at step 600: 0.03398888185620308 Loss at step 650: 0.0663064643740654 Loss at step 700: 0.032433681190013885 Loss at step 750: 0.03640909865498543 Loss at step 800: 0.02945072203874588 Loss at step 850: 0.03605405241250992 Loss at step 900: 0.027247510850429535 Mean training loss after epoch 61: 0.04085269900384361 EPOCH: 62 Loss at step 0: 0.059905070811510086 Loss at step 50: 0.03902515396475792 Loss at step 100: 0.05490829423069954 Loss at step 150: 0.04435816407203674 Loss at step 200: 0.0391550250351429 Loss at step 250: 0.035643305629491806 Loss at step 300: 0.03209603577852249 Loss at step 350: 0.04543907195329666 Loss at step 400: 0.03586803376674652 Loss at step 450: 0.035627275705337524 Loss at step 500: 0.03927553817629814 Loss at step 550: 0.0516105554997921 Loss at step 600: 0.039572179317474365 Loss at step 650: 0.04354295879602432 Loss at step 700: 0.03606712818145752 Loss at step 750: 0.03259272500872612 Loss at step 800: 0.040457241237163544 Loss at step 850: 0.0515160858631134 Loss at step 900: 0.040600404143333435 Mean training loss after epoch 62: 0.04113285329296137 EPOCH: 63 Loss at step 0: 0.048996370285749435 Loss at step 50: 0.03926750645041466 Loss at step 100: 0.02985282801091671 Loss at step 150: 0.0465315580368042 Loss at step 200: 0.041508711874485016 Loss at step 250: 0.036188140511512756 Loss at step 300: 0.04599754884839058 Loss at step 350: 0.03635888919234276 Loss at step 400: 0.03991895169019699 Loss at step 450: 0.0426996573805809 Loss at step 500: 0.04454076290130615 Loss at step 550: 0.029493087902665138 Loss at step 600: 0.05105109140276909 Loss at step 650: 0.034425780177116394 Loss at step 700: 0.046669840812683105 Loss at step 750: 0.03993423283100128 Loss at step 800: 0.03684605658054352 Loss at step 850: 0.036167316138744354 Loss at step 900: 0.036040980368852615 Mean training loss after epoch 63: 0.04108375019387904 EPOCH: 64 Loss at step 0: 0.05066833272576332 Loss at step 50: 0.03873347491025925 Loss at step 100: 0.03917282447218895 Loss at step 150: 0.03228982910513878 Loss at step 200: 0.03253572806715965 Loss at step 250: 0.035392191261053085 Loss at step 300: 0.033903349190950394 Loss at step 350: 0.0356026366353035 Loss at step 400: 0.036322787404060364 Loss at step 450: 0.03833646699786186 Loss at step 500: 0.040528930723667145 Loss at step 550: 0.03655237331986427 Loss at step 600: 0.05067242309451103 Loss at step 650: 0.03306589648127556 Loss at step 700: 0.052661262452602386 Loss at step 750: 0.034153275191783905 Loss at step 800: 0.038508862257003784 Loss at step 850: 0.030761387199163437 Loss at step 900: 0.05371908098459244 Mean training loss after epoch 64: 0.0410062268312806 EPOCH: 65 Loss at step 0: 0.05412497743964195 Loss at step 50: 0.04372255504131317 Loss at step 100: 0.03498004004359245 Loss at step 150: 0.0381084568798542 Loss at step 200: 0.05362217500805855 Loss at step 250: 0.03024093434214592 Loss at step 300: 0.02933393232524395 Loss at step 350: 0.037758294492959976 Loss at step 400: 0.03605731949210167 Loss at step 450: 0.04783293232321739 Loss at step 500: 0.03717530518770218 Loss at step 550: 0.04843413084745407 Loss at step 600: 0.03646853566169739 Loss at step 650: 0.05176900327205658 Loss at step 700: 0.038385361433029175 Loss at step 750: 0.03823227062821388 Loss at step 800: 0.036260705441236496 Loss at step 850: 0.03468559682369232 Loss at step 900: 0.03480484336614609 Mean training loss after epoch 65: 0.04095594453682968 EPOCH: 66 Loss at step 0: 0.04693916440010071 Loss at step 50: 0.04649669677019119 Loss at step 100: 0.04137570783495903 Loss at step 150: 0.05023961886763573 Loss at step 200: 0.032473403960466385 Loss at step 250: 0.03336715325713158 Loss at step 300: 0.03639718517661095 Loss at step 350: 0.043084386736154556 Loss at step 400: 0.04095667973160744 Loss at step 450: 0.027839913964271545 Loss at step 500: 0.035986460745334625 Loss at step 550: 0.03168216720223427 Loss at step 600: 0.052309513092041016 Loss at step 650: 0.042334701865911484 Loss at step 700: 0.0339987650513649 Loss at step 750: 0.04019499570131302 Loss at step 800: 0.0642513707280159 Loss at step 850: 0.043473485857248306 Loss at step 900: 0.04131825640797615 Mean training loss after epoch 66: 0.04144210817772887 EPOCH: 67 Loss at step 0: 0.03357565030455589 Loss at step 50: 0.03570257127285004 Loss at step 100: 0.036334078758955 Loss at step 150: 0.03323247283697128 Loss at step 200: 0.03403712064027786 Loss at step 250: 0.057462427765131 Loss at step 300: 0.0384344644844532 Loss at step 350: 0.07729094475507736 Loss at step 400: 0.033858273178339005 Loss at step 450: 0.057260893285274506 Loss at step 500: 0.043454766273498535 Loss at step 550: 0.038910143077373505 Loss at step 600: 0.05510319024324417 Loss at step 650: 0.034722913056612015 Loss at step 700: 0.038881246000528336 Loss at step 750: 0.03272947296500206 Loss at step 800: 0.06547453999519348 Loss at step 850: 0.05286339297890663 Loss at step 900: 0.03724418953061104 Mean training loss after epoch 67: 0.04108985567064301 EPOCH: 68 Loss at step 0: 0.05025934427976608 Loss at step 50: 0.03896684944629669 Loss at step 100: 0.03575936704874039 Loss at step 150: 0.04318174347281456 Loss at step 200: 0.03338000550866127 Loss at step 250: 0.04982759431004524 Loss at step 300: 0.06833413243293762 Loss at step 350: 0.042229581624269485 Loss at step 400: 0.049448832869529724 Loss at step 450: 0.04598211497068405 Loss at step 500: 0.04985463619232178 Loss at step 550: 0.055369824171066284 Loss at step 600: 0.03471841663122177 Loss at step 650: 0.05153518542647362 Loss at step 700: 0.03576911240816116 Loss at step 750: 0.04154135286808014 Loss at step 800: 0.056176457554101944 Loss at step 850: 0.054225899279117584 Loss at step 900: 0.061161283403635025 Mean training loss after epoch 68: 0.04150341175147084 EPOCH: 69 Loss at step 0: 0.037399593740701675 Loss at step 50: 0.06563806533813477 Loss at step 100: 0.033014923334121704 Loss at step 150: 0.03403446450829506 Loss at step 200: 0.0331198051571846 Loss at step 250: 0.04146401211619377 Loss at step 300: 0.035946790128946304 Loss at step 350: 0.03914831951260567 Loss at step 400: 0.047792062163352966 Loss at step 450: 0.039913859218358994 Loss at step 500: 0.04979826509952545 Loss at step 550: 0.048698727041482925 Loss at step 600: 0.04997538402676582 Loss at step 650: 0.03111293353140354 Loss at step 700: 0.03876088187098503 Loss at step 750: 0.0424998477101326 Loss at step 800: 0.04018615931272507 Loss at step 850: 0.03831510990858078 Loss at step 900: 0.05775139480829239 Mean training loss after epoch 69: 0.04081546967384467 EPOCH: 70 Loss at step 0: 0.0448480024933815 Loss at step 50: 0.03368306905031204 Loss at step 100: 0.03058331087231636 Loss at step 150: 0.04635756090283394 Loss at step 200: 0.031106065958738327 Loss at step 250: 0.028866348788142204 Loss at step 300: 0.0359669029712677 Loss at step 350: 0.037562355399131775 Loss at step 400: 0.039168864488601685 Loss at step 450: 0.034104906022548676 Loss at step 500: 0.05071188509464264 Loss at step 550: 0.06088744103908539 Loss at step 600: 0.03136952593922615 Loss at step 650: 0.045640427619218826 Loss at step 700: 0.034992773085832596 Loss at step 750: 0.03277074545621872 Loss at step 800: 0.037216685712337494 Loss at step 850: 0.037963252514600754 Loss at step 900: 0.03827926889061928 Mean training loss after epoch 70: 0.04093026930787988 EPOCH: 71 Loss at step 0: 0.04572658613324165 Loss at step 50: 0.03692872077226639 Loss at step 100: 0.0348975844681263 Loss at step 150: 0.03698219358921051 Loss at step 200: 0.04490145668387413 Loss at step 250: 0.028855443000793457 Loss at step 300: 0.037085115909576416 Loss at step 350: 0.03344681113958359 Loss at step 400: 0.03997166082262993 Loss at step 450: 0.04058486595749855 Loss at step 500: 0.05359076336026192 Loss at step 550: 0.03302691504359245 Loss at step 600: 0.03402107581496239 Loss at step 650: 0.03605677932500839 Loss at step 700: 0.047627322375774384 Loss at step 750: 0.05416946858167648 Loss at step 800: 0.06014397740364075 Loss at step 850: 0.058849312365055084 Loss at step 900: 0.03739443048834801 Mean training loss after epoch 71: 0.0409836712033192 EPOCH: 72 Loss at step 0: 0.04265473783016205 Loss at step 50: 0.0323101244866848 Loss at step 100: 0.038401905447244644 Loss at step 150: 0.03111894242465496 Loss at step 200: 0.03982328623533249 Loss at step 250: 0.04645061492919922 Loss at step 300: 0.03536989912390709 Loss at step 350: 0.0634487196803093 Loss at step 400: 0.03881988674402237 Loss at step 450: 0.035606421530246735 Loss at step 500: 0.03894732892513275 Loss at step 550: 0.03749678656458855 Loss at step 600: 0.03382717818021774 Loss at step 650: 0.03612907975912094 Loss at step 700: 0.05400211736559868 Loss at step 750: 0.03785938397049904 Loss at step 800: 0.041893694549798965 Loss at step 850: 0.03943910449743271 Loss at step 900: 0.03425530716776848 Mean training loss after epoch 72: 0.04133278897592127 EPOCH: 73 Loss at step 0: 0.03869491443037987 Loss at step 50: 0.039199668914079666 Loss at step 100: 0.06549392640590668 Loss at step 150: 0.031088722869753838 Loss at step 200: 0.032884228974580765 Loss at step 250: 0.048584263771772385 Loss at step 300: 0.0363716185092926 Loss at step 350: 0.03723735734820366 Loss at step 400: 0.033744048327207565 Loss at step 450: 0.036768894642591476 Loss at step 500: 0.039916012436151505 Loss at step 550: 0.047769077122211456 Loss at step 600: 0.02882159687578678 Loss at step 650: 0.05318938195705414 Loss at step 700: 0.03097960352897644 Loss at step 750: 0.03811640664935112 Loss at step 800: 0.0397464781999588 Loss at step 850: 0.06003884598612785 Loss at step 900: 0.02968420274555683 Mean training loss after epoch 73: 0.0405391053076206 EPOCH: 74 Loss at step 0: 0.03649420291185379 Loss at step 50: 0.032308004796504974 Loss at step 100: 0.047303348779678345 Loss at step 150: 0.03292613476514816 Loss at step 200: 0.04039959982037544 Loss at step 250: 0.05027500540018082 Loss at step 300: 0.03606545552611351 Loss at step 350: 0.07029777765274048 Loss at step 400: 0.03697473555803299 Loss at step 450: 0.04910635948181152 Loss at step 500: 0.03588929772377014 Loss at step 550: 0.027478694915771484 Loss at step 600: 0.030547190457582474 Loss at step 650: 0.03733734041452408 Loss at step 700: 0.03662056103348732 Loss at step 750: 0.037164654582738876 Loss at step 800: 0.03266407549381256 Loss at step 850: 0.05308876186609268 Loss at step 900: 0.06803140789270401 Mean training loss after epoch 74: 0.040417188716961 EPOCH: 75 Loss at step 0: 0.055345918983221054 Loss at step 50: 0.032986272126436234 Loss at step 100: 0.05257772281765938 Loss at step 150: 0.04563840478658676 Loss at step 200: 0.030727950856089592 Loss at step 250: 0.037552978843450546 Loss at step 300: 0.06847788393497467 Loss at step 350: 0.03464344143867493 Loss at step 400: 0.04567990452051163 Loss at step 450: 0.03432216867804527 Loss at step 500: 0.043251946568489075 Loss at step 550: 0.0437377467751503 Loss at step 600: 0.03645789250731468 Loss at step 650: 0.056820984929800034 Loss at step 700: 0.03402060642838478 Loss at step 750: 0.034191813319921494 Loss at step 800: 0.042613349854946136 Loss at step 850: 0.029760945588350296 Loss at step 900: 0.049067385494709015 Mean training loss after epoch 75: 0.04055086274478418 EPOCH: 76 Loss at step 0: 0.03934860602021217 Loss at step 50: 0.0371619313955307 Loss at step 100: 0.055659420788288116 Loss at step 150: 0.036262303590774536 Loss at step 200: 0.0529065802693367 Loss at step 250: 0.04945623502135277 Loss at step 300: 0.05727596953511238 Loss at step 350: 0.035026904195547104 Loss at step 400: 0.0436706617474556 Loss at step 450: 0.03348938375711441 Loss at step 500: 0.040171168744564056 Loss at step 550: 0.04435765743255615 Loss at step 600: 0.03703637048602104 Loss at step 650: 0.04212481901049614 Loss at step 700: 0.036748748272657394 Loss at step 750: 0.0448739267885685 Loss at step 800: 0.033607594668865204 Loss at step 850: 0.04375181719660759 Loss at step 900: 0.03603624179959297 Mean training loss after epoch 76: 0.04041401032549041 EPOCH: 77 Loss at step 0: 0.04128456860780716 Loss at step 50: 0.03762640058994293 Loss at step 100: 0.03784097731113434 Loss at step 150: 0.03539128229022026 Loss at step 200: 0.05711248889565468 Loss at step 250: 0.032208021730184555 Loss at step 300: 0.03896225988864899 Loss at step 350: 0.0364358089864254 Loss at step 400: 0.04166363924741745 Loss at step 450: 0.052053481340408325 Loss at step 500: 0.04502061754465103 Loss at step 550: 0.0342751070857048 Loss at step 600: 0.029672076925635338 Loss at step 650: 0.046030443161726 Loss at step 700: 0.032568734139204025 Loss at step 750: 0.045311518013477325 Loss at step 800: 0.04912736266851425 Loss at step 850: 0.035343289375305176 Loss at step 900: 0.03532174229621887 Mean training loss after epoch 77: 0.04088964018581519 EPOCH: 78 Loss at step 0: 0.034219421446323395 Loss at step 50: 0.039517927914857864 Loss at step 100: 0.043103236705064774 Loss at step 150: 0.034503888338804245 Loss at step 200: 0.031066635623574257 Loss at step 250: 0.033845413476228714 Loss at step 300: 0.06853526830673218 Loss at step 350: 0.039914924651384354 Loss at step 400: 0.029369806870818138 Loss at step 450: 0.035847779363393784 Loss at step 500: 0.05417812988162041 Loss at step 550: 0.03971219062805176 Loss at step 600: 0.035403814166784286 Loss at step 650: 0.03439803048968315 Loss at step 700: 0.05482256039977074 Loss at step 750: 0.04046154022216797 Loss at step 800: 0.040404174476861954 Loss at step 850: 0.03185766562819481 Loss at step 900: 0.047124385833740234 Mean training loss after epoch 78: 0.04108673395283187 EPOCH: 79 Loss at step 0: 0.0292066540569067 Loss at step 50: 0.05637497082352638 Loss at step 100: 0.033106595277786255 Loss at step 150: 0.037283048033714294 Loss at step 200: 0.06435535103082657 Loss at step 250: 0.038874488323926926 Loss at step 300: 0.030614584684371948 Loss at step 350: 0.06488658487796783 Loss at step 400: 0.045286521315574646 Loss at step 450: 0.03548586368560791 Loss at step 500: 0.036548443138599396 Loss at step 550: 0.04465923458337784 Loss at step 600: 0.043890006840229034 Loss at step 650: 0.03705279156565666 Loss at step 700: 0.04273976385593414 Loss at step 750: 0.047070201486349106 Loss at step 800: 0.03513753414154053 Loss at step 850: 0.03871702402830124 Loss at step 900: 0.035829562693834305 Mean training loss after epoch 79: 0.04059115271252801 EPOCH: 80 Loss at step 0: 0.03206692263484001 Loss at step 50: 0.037654582411050797 Loss at step 100: 0.038878802210092545 Loss at step 150: 0.039210326969623566 Loss at step 200: 0.0504489429295063 Loss at step 250: 0.04905981197953224 Loss at step 300: 0.040067121386528015 Loss at step 350: 0.04287860915064812 Loss at step 400: 0.056837595999240875 Loss at step 450: 0.030337590724229813 Loss at step 500: 0.03159491717815399 Loss at step 550: 0.025158636271953583 Loss at step 600: 0.029796551913022995 Loss at step 650: 0.03474956005811691 Loss at step 700: 0.029329804703593254 Loss at step 750: 0.046252503991127014 Loss at step 800: 0.03895849362015724 Loss at step 850: 0.03130355477333069 Loss at step 900: 0.035585347563028336 Mean training loss after epoch 80: 0.040512821895243134 EPOCH: 81 Loss at step 0: 0.052062541246414185 Loss at step 50: 0.038238976150751114 Loss at step 100: 0.037346020340919495 Loss at step 150: 0.04015573859214783 Loss at step 200: 0.03852277994155884 Loss at step 250: 0.03047027252614498 Loss at step 300: 0.03593272343277931 Loss at step 350: 0.032499104738235474 Loss at step 400: 0.032538704574108124 Loss at step 450: 0.05285178869962692 Loss at step 500: 0.03977518156170845 Loss at step 550: 0.05382285639643669 Loss at step 600: 0.048433881253004074 Loss at step 650: 0.049364153295755386 Loss at step 700: 0.041338492184877396 Loss at step 750: 0.03147172927856445 Loss at step 800: 0.03539884835481644 Loss at step 850: 0.043928176164627075 Loss at step 900: 0.04099898412823677 Mean training loss after epoch 81: 0.0405722787853942 EPOCH: 82 Loss at step 0: 0.04728202894330025 Loss at step 50: 0.03753220662474632 Loss at step 100: 0.03720661997795105 Loss at step 150: 0.05796508863568306 Loss at step 200: 0.03124007023870945 Loss at step 250: 0.03203711286187172 Loss at step 300: 0.03969767689704895 Loss at step 350: 0.04587756097316742 Loss at step 400: 0.0516650527715683 Loss at step 450: 0.032214101403951645 Loss at step 500: 0.03950367122888565 Loss at step 550: 0.0373564139008522 Loss at step 600: 0.06261508911848068 Loss at step 650: 0.045457951724529266 Loss at step 700: 0.03577723354101181 Loss at step 750: 0.03626399487257004 Loss at step 800: 0.06346209347248077 Loss at step 850: 0.0363059937953949 Loss at step 900: 0.03522288054227829 Mean training loss after epoch 82: 0.0400778514160308 EPOCH: 83 Loss at step 0: 0.05544643476605415 Loss at step 50: 0.03140563145279884 Loss at step 100: 0.05331690236926079 Loss at step 150: 0.054754309356212616 Loss at step 200: 0.058940257877111435 Loss at step 250: 0.03803196921944618 Loss at step 300: 0.03877706453204155 Loss at step 350: 0.034086499363183975 Loss at step 400: 0.03601112961769104 Loss at step 450: 0.06806022673845291 Loss at step 500: 0.05294420197606087 Loss at step 550: 0.03984103724360466 Loss at step 600: 0.05012628063559532 Loss at step 650: 0.033755458891391754 Loss at step 700: 0.055153943598270416 Loss at step 750: 0.03309555724263191 Loss at step 800: 0.035536766052246094 Loss at step 850: 0.029573775827884674 Loss at step 900: 0.04920117184519768 Mean training loss after epoch 83: 0.0408117123154689 EPOCH: 84 Loss at step 0: 0.04207247123122215 Loss at step 50: 0.05615324154496193 Loss at step 100: 0.05292467027902603 Loss at step 150: 0.04186904430389404 Loss at step 200: 0.05707847699522972 Loss at step 250: 0.040155522525310516 Loss at step 300: 0.035927772521972656 Loss at step 350: 0.048871949315071106 Loss at step 400: 0.03666814789175987 Loss at step 450: 0.030357608571648598 Loss at step 500: 0.035020120441913605 Loss at step 550: 0.03662043437361717 Loss at step 600: 0.04401297867298126 Loss at step 650: 0.036190565675497055 Loss at step 700: 0.03145082667469978 Loss at step 750: 0.07448822259902954 Loss at step 800: 0.03716477006673813 Loss at step 850: 0.06238853558897972 Loss at step 900: 0.03587179630994797 Mean training loss after epoch 84: 0.04082293442881374 EPOCH: 85 Loss at step 0: 0.02904656156897545 Loss at step 50: 0.0356060229241848 Loss at step 100: 0.04782428592443466 Loss at step 150: 0.03371461108326912 Loss at step 200: 0.033603034913539886 Loss at step 250: 0.03414147347211838 Loss at step 300: 0.05430827662348747 Loss at step 350: 0.033129919320344925 Loss at step 400: 0.03058881126344204 Loss at step 450: 0.032881323248147964 Loss at step 500: 0.03734707087278366 Loss at step 550: 0.03624548017978668 Loss at step 600: 0.031804297119379044 Loss at step 650: 0.034315429627895355 Loss at step 700: 0.051231708377599716 Loss at step 750: 0.04130423814058304 Loss at step 800: 0.03633386269211769 Loss at step 850: 0.05054176598787308 Loss at step 900: 0.03765508532524109 Mean training loss after epoch 85: 0.04045379442025794 EPOCH: 86 Loss at step 0: 0.03516799211502075 Loss at step 50: 0.047536157071590424 Loss at step 100: 0.04891946166753769 Loss at step 150: 0.04499293863773346 Loss at step 200: 0.03445722907781601 Loss at step 250: 0.03891275078058243 Loss at step 300: 0.04263707622885704 Loss at step 350: 0.033308036625385284 Loss at step 400: 0.0325336791574955 Loss at step 450: 0.06835544854402542 Loss at step 500: 0.045047059655189514 Loss at step 550: 0.03585875779390335 Loss at step 600: 0.034199636429548264 Loss at step 650: 0.05098696053028107 Loss at step 700: 0.040409382432699203 Loss at step 750: 0.050622764974832535 Loss at step 800: 0.04750943183898926 Loss at step 850: 0.0647769570350647 Loss at step 900: 0.050917789340019226 Mean training loss after epoch 86: 0.040790052655568 EPOCH: 87 Loss at step 0: 0.03621918708086014 Loss at step 50: 0.03650108352303505 Loss at step 100: 0.03466832637786865 Loss at step 150: 0.03966951370239258 Loss at step 200: 0.04728352278470993 Loss at step 250: 0.04616722837090492 Loss at step 300: 0.05285411700606346 Loss at step 350: 0.03472704812884331 Loss at step 400: 0.04570436477661133 Loss at step 450: 0.03485792130231857 Loss at step 500: 0.06346246600151062 Loss at step 550: 0.03879791125655174 Loss at step 600: 0.040417902171611786 Loss at step 650: 0.04890981316566467 Loss at step 700: 0.03518608212471008 Loss at step 750: 0.04728652909398079 Loss at step 800: 0.055480632930994034 Loss at step 850: 0.03925785422325134 Loss at step 900: 0.05136599764227867 Mean training loss after epoch 87: 0.040338136484103795 EPOCH: 88 Loss at step 0: 0.05242814123630524 Loss at step 50: 0.04796266928315163 Loss at step 100: 0.05436587706208229 Loss at step 150: 0.038933560252189636 Loss at step 200: 0.04839233309030533 Loss at step 250: 0.03105120174586773 Loss at step 300: 0.04802002012729645 Loss at step 350: 0.03553696349263191 Loss at step 400: 0.03187737241387367 Loss at step 450: 0.049883753061294556 Loss at step 500: 0.04817438870668411 Loss at step 550: 0.03555983677506447 Loss at step 600: 0.039906058460474014 Loss at step 650: 0.04346184805035591 Loss at step 700: 0.051017116755247116 Loss at step 750: 0.04375026747584343 Loss at step 800: 0.03325865417718887 Loss at step 850: 0.05387381091713905 Loss at step 900: 0.05040017515420914 Mean training loss after epoch 88: 0.04090093306180383 EPOCH: 89 Loss at step 0: 0.03579762205481529 Loss at step 50: 0.04294757917523384 Loss at step 100: 0.03170525282621384 Loss at step 150: 0.031035082414746284 Loss at step 200: 0.03310935199260712 Loss at step 250: 0.03781760483980179 Loss at step 300: 0.035580240190029144 Loss at step 350: 0.03836481273174286 Loss at step 400: 0.04604678973555565 Loss at step 450: 0.052802518010139465 Loss at step 500: 0.021503133699297905 Loss at step 550: 0.03572698310017586 Loss at step 600: 0.03354794904589653 Loss at step 650: 0.039177291095256805 Loss at step 700: 0.05091559514403343 Loss at step 750: 0.03285715728998184 Loss at step 800: 0.033864084631204605 Loss at step 850: 0.05180785432457924 Loss at step 900: 0.03823590278625488 Mean training loss after epoch 89: 0.04008701584264159 EPOCH: 90 Loss at step 0: 0.03695370629429817 Loss at step 50: 0.037159111350774765 Loss at step 100: 0.04978851228952408 Loss at step 150: 0.042792223393917084 Loss at step 200: 0.04785523936152458 Loss at step 250: 0.0423269122838974 Loss at step 300: 0.03371042013168335 Loss at step 350: 0.04030280187726021 Loss at step 400: 0.03571828082203865 Loss at step 450: 0.04919782280921936 Loss at step 500: 0.03942182660102844 Loss at step 550: 0.028313415125012398 Loss at step 600: 0.034696027636528015 Loss at step 650: 0.03654114529490471 Loss at step 700: 0.031517013907432556 Loss at step 750: 0.027009544894099236 Loss at step 800: 0.038787923753261566 Loss at step 850: 0.03033806011080742 Loss at step 900: 0.03590017557144165 Mean training loss after epoch 90: 0.040392057944351294 EPOCH: 91 Loss at step 0: 0.046321600675582886 Loss at step 50: 0.0368216410279274 Loss at step 100: 0.07540422677993774 Loss at step 150: 0.037830669432878494 Loss at step 200: 0.040888119488954544 Loss at step 250: 0.04271574318408966 Loss at step 300: 0.05583266541361809 Loss at step 350: 0.031033698469400406 Loss at step 400: 0.031072791665792465 Loss at step 450: 0.03850722685456276 Loss at step 500: 0.04071514308452606 Loss at step 550: 0.03712037205696106 Loss at step 600: 0.03167273849248886 Loss at step 650: 0.03554568067193031 Loss at step 700: 0.045064691454172134 Loss at step 750: 0.03905215114355087 Loss at step 800: 0.03517136350274086 Loss at step 850: 0.02688581496477127 Loss at step 900: 0.06034184247255325 Mean training loss after epoch 91: 0.040211929776854734 EPOCH: 92 Loss at step 0: 0.048670217394828796 Loss at step 50: 0.02553282119333744 Loss at step 100: 0.03734314441680908 Loss at step 150: 0.0474412776529789 Loss at step 200: 0.0363517589867115 Loss at step 250: 0.03884747624397278 Loss at step 300: 0.053442977368831635 Loss at step 350: 0.03961704671382904 Loss at step 400: 0.052478060126304626 Loss at step 450: 0.0297246016561985 Loss at step 500: 0.04363180696964264 Loss at step 550: 0.03283963352441788 Loss at step 600: 0.03334625065326691 Loss at step 650: 0.03938412666320801 Loss at step 700: 0.04328535869717598 Loss at step 750: 0.039576053619384766 Loss at step 800: 0.034662164747714996 Loss at step 850: 0.03698847070336342 Loss at step 900: 0.05190076678991318 Mean training loss after epoch 92: 0.0403653561000599 EPOCH: 93 Loss at step 0: 0.04081479460000992 Loss at step 50: 0.03418498858809471 Loss at step 100: 0.02798306569457054 Loss at step 150: 0.03756929561495781 Loss at step 200: 0.02820855937898159 Loss at step 250: 0.036923982203006744 Loss at step 300: 0.036957722157239914 Loss at step 350: 0.037225961685180664 Loss at step 400: 0.04795173183083534 Loss at step 450: 0.03733421117067337 Loss at step 500: 0.044164326041936874 Loss at step 550: 0.03427265211939812 Loss at step 600: 0.03177943825721741 Loss at step 650: 0.046139899641275406 Loss at step 700: 0.03557630255818367 Loss at step 750: 0.04292155057191849 Loss at step 800: 0.02787667140364647 Loss at step 850: 0.03480743244290352 Loss at step 900: 0.04243965446949005 Mean training loss after epoch 93: 0.04055064068293012 EPOCH: 94 Loss at step 0: 0.03594643250107765 Loss at step 50: 0.033269692212343216 Loss at step 100: 0.0396963395178318 Loss at step 150: 0.030958646908402443 Loss at step 200: 0.05192381888628006 Loss at step 250: 0.04564175382256508 Loss at step 300: 0.03364739194512367 Loss at step 350: 0.04413637891411781 Loss at step 400: 0.03330669179558754 Loss at step 450: 0.04194950312376022 Loss at step 500: 0.05235005170106888 Loss at step 550: 0.04674359783530235 Loss at step 600: 0.035554829984903336 Loss at step 650: 0.044718511402606964 Loss at step 700: 0.04419019818305969 Loss at step 750: 0.031159212812781334 Loss at step 800: 0.037368424236774445 Loss at step 850: 0.04972722381353378 Loss at step 900: 0.04637891426682472 Mean training loss after epoch 94: 0.039845950373295524 EPOCH: 95 Loss at step 0: 0.03500945493578911 Loss at step 50: 0.0499107651412487 Loss at step 100: 0.031193625181913376 Loss at step 150: 0.03178275376558304 Loss at step 200: 0.04069007933139801 Loss at step 250: 0.04791006073355675 Loss at step 300: 0.03697551041841507 Loss at step 350: 0.03356834128499031 Loss at step 400: 0.04692601040005684 Loss at step 450: 0.03353650122880936 Loss at step 500: 0.05243617668747902 Loss at step 550: 0.03918429836630821 Loss at step 600: 0.03963415324687958 Loss at step 650: 0.0322025828063488 Loss at step 700: 0.04164209961891174 Loss at step 750: 0.06696214526891708 Loss at step 800: 0.033979710191488266 Loss at step 850: 0.03802863508462906 Loss at step 900: 0.06281917542219162 Mean training loss after epoch 95: 0.040050989606661966 EPOCH: 96 Loss at step 0: 0.062876857817173 Loss at step 50: 0.04466922953724861 Loss at step 100: 0.035194288939237595 Loss at step 150: 0.036158543080091476 Loss at step 200: 0.05215864256024361 Loss at step 250: 0.05362296104431152 Loss at step 300: 0.02842164970934391 Loss at step 350: 0.04300060495734215 Loss at step 400: 0.030633846297860146 Loss at step 450: 0.0328863300383091 Loss at step 500: 0.03523578494787216 Loss at step 550: 0.048109665513038635 Loss at step 600: 0.03623853251338005 Loss at step 650: 0.0325080044567585 Loss at step 700: 0.039300985634326935 Loss at step 750: 0.034248512238264084 Loss at step 800: 0.03062102198600769 Loss at step 850: 0.04759842902421951 Loss at step 900: 0.06728257238864899 Mean training loss after epoch 96: 0.04025086097872016 EPOCH: 97 Loss at step 0: 0.03604558855295181 Loss at step 50: 0.03586094081401825 Loss at step 100: 0.037453148514032364 Loss at step 150: 0.05949082970619202 Loss at step 200: 0.04595595970749855 Loss at step 250: 0.03523814678192139 Loss at step 300: 0.03561503812670708 Loss at step 350: 0.034665297716856 Loss at step 400: 0.04979550838470459 Loss at step 450: 0.04578594118356705 Loss at step 500: 0.03738207370042801 Loss at step 550: 0.033975016325712204 Loss at step 600: 0.0426042303442955 Loss at step 650: 0.05135397985577583 Loss at step 700: 0.040103811770677567 Loss at step 750: 0.05547140911221504 Loss at step 800: 0.03707880899310112 Loss at step 850: 0.05171624943614006 Loss at step 900: 0.05659947171807289 Mean training loss after epoch 97: 0.04026777711686994 EPOCH: 98 Loss at step 0: 0.04901181161403656 Loss at step 50: 0.03343966603279114 Loss at step 100: 0.05344740301370621 Loss at step 150: 0.05176970735192299 Loss at step 200: 0.03529363498091698 Loss at step 250: 0.038207679986953735 Loss at step 300: 0.02989993244409561 Loss at step 350: 0.035658884793519974 Loss at step 400: 0.03506334871053696 Loss at step 450: 0.029390836134552956 Loss at step 500: 0.035672180354595184 Loss at step 550: 0.04302973672747612 Loss at step 600: 0.03212498500943184 Loss at step 650: 0.05074365809559822 Loss at step 700: 0.03154989704489708 Loss at step 750: 0.048734329640865326 Loss at step 800: 0.04683579504489899 Loss at step 850: 0.06299743801355362 Loss at step 900: 0.03963327780365944 Mean training loss after epoch 98: 0.039652633168962975 EPOCH: 99 Loss at step 0: 0.033908911049366 Loss at step 50: 0.03945780172944069 Loss at step 100: 0.03597944229841232 Loss at step 150: 0.03968245908617973 Loss at step 200: 0.04228976368904114 Loss at step 250: 0.05131349340081215 Loss at step 300: 0.04730290547013283 Loss at step 350: 0.054106492549180984 Loss at step 400: 0.048031311482191086 Loss at step 450: 0.056564997881650925 Loss at step 500: 0.04713006690144539 Loss at step 550: 0.036680229008197784 Loss at step 600: 0.036483123898506165 Loss at step 650: 0.04585624486207962 Loss at step 700: 0.03316484019160271 Loss at step 750: 0.04711407050490379 Loss at step 800: 0.041108354926109314 Loss at step 850: 0.037704259157180786 Loss at step 900: 0.03709965571761131 Mean training loss after epoch 99: 0.04057291350853659 EPOCH: 100 Loss at step 0: 0.03885505720973015 Loss at step 50: 0.0364660881459713 Loss at step 100: 0.030998477712273598 Loss at step 150: 0.03358355164527893 Loss at step 200: 0.029647495597600937 Loss at step 250: 0.03730285167694092 Loss at step 300: 0.05705293267965317 Loss at step 350: 0.048707954585552216 Loss at step 400: 0.03184742107987404 Loss at step 450: 0.039552297443151474 Loss at step 500: 0.03112632781267166 Loss at step 550: 0.02541619911789894 Loss at step 600: 0.02927130088210106 Loss at step 650: 0.036506205797195435 Loss at step 700: 0.06400274485349655 Loss at step 750: 0.035191185772418976 Loss at step 800: 0.05189486965537071 Loss at step 850: 0.04187934473156929 Loss at step 900: 0.03587055206298828 Mean training loss after epoch 100: 0.03951049410203881 EPOCH: 101 Loss at step 0: 0.04608278349041939 Loss at step 50: 0.059671275317668915 Loss at step 100: 0.03808033838868141 Loss at step 150: 0.05196906998753548 Loss at step 200: 0.03131590411067009 Loss at step 250: 0.059665825217962265 Loss at step 300: 0.0483604334294796 Loss at step 350: 0.032256316393613815 Loss at step 400: 0.03246087208390236 Loss at step 450: 0.03289893642067909 Loss at step 500: 0.06762969493865967 Loss at step 550: 0.04613935947418213 Loss at step 600: 0.036531317979097366 Loss at step 650: 0.050805091857910156 Loss at step 700: 0.046931829303503036 Loss at step 750: 0.03377658128738403 Loss at step 800: 0.03567809984087944 Loss at step 850: 0.070270836353302 Loss at step 900: 0.038735322654247284 Mean training loss after epoch 101: 0.039873033674604604 EPOCH: 102 Loss at step 0: 0.029993660748004913 Loss at step 50: 0.05206090584397316 Loss at step 100: 0.03597714751958847 Loss at step 150: 0.0356283038854599 Loss at step 200: 0.045805949717760086 Loss at step 250: 0.040136922150850296 Loss at step 300: 0.04690312221646309 Loss at step 350: 0.0484488420188427 Loss at step 400: 0.03959207236766815 Loss at step 450: 0.0466473288834095 Loss at step 500: 0.052643727511167526 Loss at step 550: 0.043842654675245285 Loss at step 600: 0.03462978079915047 Loss at step 650: 0.04178337752819061 Loss at step 700: 0.03270316869020462 Loss at step 750: 0.03778819739818573 Loss at step 800: 0.034662868827581406 Loss at step 850: 0.03329360485076904 Loss at step 900: 0.02992893010377884 Mean training loss after epoch 102: 0.040307816527068996 EPOCH: 103 Loss at step 0: 0.028839770704507828 Loss at step 50: 0.032576169818639755 Loss at step 100: 0.046098269522190094 Loss at step 150: 0.03324437141418457 Loss at step 200: 0.036912575364112854 Loss at step 250: 0.06562089920043945 Loss at step 300: 0.038518793880939484 Loss at step 350: 0.04422292858362198 Loss at step 400: 0.05281831696629524 Loss at step 450: 0.030899688601493835 Loss at step 500: 0.03081597201526165 Loss at step 550: 0.03514919802546501 Loss at step 600: 0.03863156959414482 Loss at step 650: 0.04819471761584282 Loss at step 700: 0.02757849358022213 Loss at step 750: 0.03505969047546387 Loss at step 800: 0.050771504640579224 Loss at step 850: 0.03697929158806801 Loss at step 900: 0.03245064988732338 Mean training loss after epoch 103: 0.0396687470849103 EPOCH: 104 Loss at step 0: 0.037453245371580124 Loss at step 50: 0.03152114525437355 Loss at step 100: 0.049962665885686874 Loss at step 150: 0.03794638812541962 Loss at step 200: 0.029369976371526718 Loss at step 250: 0.0512501560151577 Loss at step 300: 0.028871268033981323 Loss at step 350: 0.029182739555835724 Loss at step 400: 0.0353519469499588 Loss at step 450: 0.034153975546360016 Loss at step 500: 0.03648490086197853 Loss at step 550: 0.03413955867290497 Loss at step 600: 0.042085278779268265 Loss at step 650: 0.03410710021853447 Loss at step 700: 0.03678492456674576 Loss at step 750: 0.039124734699726105 Loss at step 800: 0.03770854324102402 Loss at step 850: 0.03543799743056297 Loss at step 900: 0.06420788168907166 Mean training loss after epoch 104: 0.03980438330415279 EPOCH: 105 Loss at step 0: 0.030354788526892662 Loss at step 50: 0.036188848316669464 Loss at step 100: 0.038238782435655594 Loss at step 150: 0.04532600939273834 Loss at step 200: 0.03434334695339203 Loss at step 250: 0.030249161645770073 Loss at step 300: 0.0355229526758194 Loss at step 350: 0.04108721390366554 Loss at step 400: 0.04819981008768082 Loss at step 450: 0.036564890295267105 Loss at step 500: 0.03692222014069557 Loss at step 550: 0.05012466013431549 Loss at step 600: 0.038261499255895615 Loss at step 650: 0.049558911472558975 Loss at step 700: 0.02887756936252117 Loss at step 750: 0.03132330998778343 Loss at step 800: 0.03574202209711075 Loss at step 850: 0.03637055307626724 Loss at step 900: 0.03696170076727867 Mean training loss after epoch 105: 0.03931792791106744 EPOCH: 106 Loss at step 0: 0.02895914763212204 Loss at step 50: 0.03470218926668167 Loss at step 100: 0.03608347848057747 Loss at step 150: 0.03328566253185272 Loss at step 200: 0.03194018825888634 Loss at step 250: 0.039853960275650024 Loss at step 300: 0.03465244174003601 Loss at step 350: 0.04721842706203461 Loss at step 400: 0.04896251857280731 Loss at step 450: 0.03294207900762558 Loss at step 500: 0.03316209837794304 Loss at step 550: 0.038554515689611435 Loss at step 600: 0.03483927249908447 Loss at step 650: 0.03864853084087372 Loss at step 700: 0.05380488187074661 Loss at step 750: 0.03824802488088608 Loss at step 800: 0.045081984251737595 Loss at step 850: 0.055266037583351135 Loss at step 900: 0.03356846049427986 Mean training loss after epoch 106: 0.04033039600602282 EPOCH: 107 Loss at step 0: 0.03343590721487999 Loss at step 50: 0.040028467774391174 Loss at step 100: 0.030918775126338005 Loss at step 150: 0.035232074558734894 Loss at step 200: 0.02924354560673237 Loss at step 250: 0.036240655928850174 Loss at step 300: 0.03260454908013344 Loss at step 350: 0.028631126508116722 Loss at step 400: 0.03340171277523041 Loss at step 450: 0.054275911301374435 Loss at step 500: 0.03813685104250908 Loss at step 550: 0.03442586585879326 Loss at step 600: 0.03241576999425888 Loss at step 650: 0.05239550769329071 Loss at step 700: 0.03729248046875 Loss at step 750: 0.03661287575960159 Loss at step 800: 0.03836385905742645 Loss at step 850: 0.051482852548360825 Loss at step 900: 0.03634370118379593 Mean training loss after epoch 107: 0.04004901101483084 EPOCH: 108 Loss at step 0: 0.05109003931283951 Loss at step 50: 0.02608264610171318 Loss at step 100: 0.039076343178749084 Loss at step 150: 0.05490468814969063 Loss at step 200: 0.035528045147657394 Loss at step 250: 0.037927884608507156 Loss at step 300: 0.04951369762420654 Loss at step 350: 0.04343758523464203 Loss at step 400: 0.03904278948903084 Loss at step 450: 0.03650984540581703 Loss at step 500: 0.037677064538002014 Loss at step 550: 0.031686507165431976 Loss at step 600: 0.038852058351039886 Loss at step 650: 0.030935076996684074 Loss at step 700: 0.03871190547943115 Loss at step 750: 0.06702093780040741 Loss at step 800: 0.029941575601696968 Loss at step 850: 0.04882451519370079 Loss at step 900: 0.03624775633215904 Mean training loss after epoch 108: 0.04007373649333078 EPOCH: 109 Loss at step 0: 0.04383249953389168 Loss at step 50: 0.030085096135735512 Loss at step 100: 0.033767592161893845 Loss at step 150: 0.06283251196146011 Loss at step 200: 0.036935336887836456 Loss at step 250: 0.035300303250551224 Loss at step 300: 0.03438340127468109 Loss at step 350: 0.044885098934173584 Loss at step 400: 0.06037943437695503 Loss at step 450: 0.03868570923805237 Loss at step 500: 0.03664155304431915 Loss at step 550: 0.030856721103191376 Loss at step 600: 0.03619741275906563 Loss at step 650: 0.03256780654191971 Loss at step 700: 0.03119807317852974 Loss at step 750: 0.03271212801337242 Loss at step 800: 0.03587260842323303 Loss at step 850: 0.04169003665447235 Loss at step 900: 0.030415404587984085 Mean training loss after epoch 109: 0.04006141843373524 EPOCH: 110 Loss at step 0: 0.05496039614081383 Loss at step 50: 0.0386204719543457 Loss at step 100: 0.03214837610721588 Loss at step 150: 0.04659537598490715 Loss at step 200: 0.03392510488629341 Loss at step 250: 0.03253054991364479 Loss at step 300: 0.030138738453388214 Loss at step 350: 0.057351693511009216 Loss at step 400: 0.04540780559182167 Loss at step 450: 0.03278924152255058 Loss at step 500: 0.033986836671829224 Loss at step 550: 0.05008528009057045 Loss at step 600: 0.048306263983249664 Loss at step 650: 0.031286366283893585 Loss at step 700: 0.03290008008480072 Loss at step 750: 0.03337947279214859 Loss at step 800: 0.04888298362493515 Loss at step 850: 0.030231455340981483 Loss at step 900: 0.03091765195131302 Mean training loss after epoch 110: 0.04032340717476123 EPOCH: 111 Loss at step 0: 0.054753389209508896 Loss at step 50: 0.030340857803821564 Loss at step 100: 0.03798448294401169 Loss at step 150: 0.04007083922624588 Loss at step 200: 0.03326107934117317 Loss at step 250: 0.03548939898610115 Loss at step 300: 0.0340801477432251 Loss at step 350: 0.05384042114019394 Loss at step 400: 0.038668442517519 Loss at step 450: 0.047105152159929276 Loss at step 500: 0.03270631283521652 Loss at step 550: 0.033553946763277054 Loss at step 600: 0.0468083880841732 Loss at step 650: 0.032758068293333054 Loss at step 700: 0.0527087077498436 Loss at step 750: 0.032661087810993195 Loss at step 800: 0.04175620526075363 Loss at step 850: 0.041546840220689774 Loss at step 900: 0.04201667383313179 Mean training loss after epoch 111: 0.04014767684947961 EPOCH: 112 Loss at step 0: 0.0331437811255455 Loss at step 50: 0.051914554089307785 Loss at step 100: 0.05040278285741806 Loss at step 150: 0.038170043379068375 Loss at step 200: 0.04248329624533653 Loss at step 250: 0.050448957830667496 Loss at step 300: 0.0480661578476429 Loss at step 350: 0.044293470680713654 Loss at step 400: 0.03480818122625351 Loss at step 450: 0.03766563534736633 Loss at step 500: 0.04877613112330437 Loss at step 550: 0.03655528649687767 Loss at step 600: 0.03213350847363472 Loss at step 650: 0.0348079539835453 Loss at step 700: 0.04645334929227829 Loss at step 750: 0.04048634320497513 Loss at step 800: 0.03384030610322952 Loss at step 850: 0.04143141582608223 Loss at step 900: 0.044551536440849304 Mean training loss after epoch 112: 0.03972915529227778 EPOCH: 113 Loss at step 0: 0.03731623664498329 Loss at step 50: 0.036111876368522644 Loss at step 100: 0.04895376041531563 Loss at step 150: 0.03448157384991646 Loss at step 200: 0.05395231395959854 Loss at step 250: 0.029758598655462265 Loss at step 300: 0.035165295004844666 Loss at step 350: 0.06340741366147995 Loss at step 400: 0.03216756135225296 Loss at step 450: 0.030543548986315727 Loss at step 500: 0.03666659817099571 Loss at step 550: 0.04434603825211525 Loss at step 600: 0.04934762045741081 Loss at step 650: 0.03147423267364502 Loss at step 700: 0.02988319657742977 Loss at step 750: 0.034099943935871124 Loss at step 800: 0.035690028220415115 Loss at step 850: 0.050105053931474686 Loss at step 900: 0.05036727711558342 Mean training loss after epoch 113: 0.039910951901330496 EPOCH: 114 Loss at step 0: 0.04575241729617119 Loss at step 50: 0.03039500303566456 Loss at step 100: 0.03562726080417633 Loss at step 150: 0.04480719566345215 Loss at step 200: 0.03135523572564125 Loss at step 250: 0.03204413130879402 Loss at step 300: 0.034801606088876724 Loss at step 350: 0.03432801365852356 Loss at step 400: 0.03443780541419983 Loss at step 450: 0.03150886669754982 Loss at step 500: 0.06512052565813065 Loss at step 550: 0.030634431168437004 Loss at step 600: 0.03877680003643036 Loss at step 650: 0.03675153851509094 Loss at step 700: 0.05123452469706535 Loss at step 750: 0.03553891181945801 Loss at step 800: 0.036491744220256805 Loss at step 850: 0.04911104962229729 Loss at step 900: 0.03419262543320656 Mean training loss after epoch 114: 0.03932583956584049 EPOCH: 115 Loss at step 0: 0.037834931164979935 Loss at step 50: 0.03680989518761635 Loss at step 100: 0.03384866937994957 Loss at step 150: 0.052306655794382095 Loss at step 200: 0.03239380195736885 Loss at step 250: 0.034101855009794235 Loss at step 300: 0.02988138608634472 Loss at step 350: 0.036936160176992416 Loss at step 400: 0.03579823672771454 Loss at step 450: 0.03930773586034775 Loss at step 500: 0.04438015818595886 Loss at step 550: 0.026151835918426514 Loss at step 600: 0.03031524270772934 Loss at step 650: 0.028382038697600365 Loss at step 700: 0.03755122795701027 Loss at step 750: 0.030609527602791786 Loss at step 800: 0.041900672018527985 Loss at step 850: 0.03541675955057144 Loss at step 900: 0.034347839653491974 Mean training loss after epoch 115: 0.039802594229515424 EPOCH: 116 Loss at step 0: 0.05536920949816704 Loss at step 50: 0.033777933567762375 Loss at step 100: 0.029647989198565483 Loss at step 150: 0.03826063126325607 Loss at step 200: 0.05031178891658783 Loss at step 250: 0.034476444125175476 Loss at step 300: 0.028743013739585876 Loss at step 350: 0.04937901720404625 Loss at step 400: 0.06830427795648575 Loss at step 450: 0.06829162687063217 Loss at step 500: 0.05848719924688339 Loss at step 550: 0.04241336137056351 Loss at step 600: 0.0390176847577095 Loss at step 650: 0.030798722058534622 Loss at step 700: 0.05506971478462219 Loss at step 750: 0.0326869897544384 Loss at step 800: 0.051320888102054596 Loss at step 850: 0.04131399467587471 Loss at step 900: 0.032175444066524506 Mean training loss after epoch 116: 0.03980343399652794 EPOCH: 117 Loss at step 0: 0.04073479771614075 Loss at step 50: 0.04252466931939125 Loss at step 100: 0.0744202584028244 Loss at step 150: 0.05595950037240982 Loss at step 200: 0.03471630811691284 Loss at step 250: 0.035042792558670044 Loss at step 300: 0.028284024447202682 Loss at step 350: 0.034812550991773605 Loss at step 400: 0.0340358167886734 Loss at step 450: 0.05081040412187576 Loss at step 500: 0.038909394294023514 Loss at step 550: 0.03063329868018627 Loss at step 600: 0.03662952408194542 Loss at step 650: 0.03276791796088219 Loss at step 700: 0.05208593234419823 Loss at step 750: 0.03402816504240036 Loss at step 800: 0.037891607731580734 Loss at step 850: 0.03039601258933544 Loss at step 900: 0.03755423426628113 Mean training loss after epoch 117: 0.03969481655521624 EPOCH: 118 Loss at step 0: 0.05762284994125366 Loss at step 50: 0.03275303542613983 Loss at step 100: 0.03699730709195137 Loss at step 150: 0.0524105578660965 Loss at step 200: 0.0373607762157917 Loss at step 250: 0.051706284284591675 Loss at step 300: 0.04961675405502319 Loss at step 350: 0.030602816492319107 Loss at step 400: 0.039848703891038895 Loss at step 450: 0.027412720024585724 Loss at step 500: 0.04416043311357498 Loss at step 550: 0.03620737046003342 Loss at step 600: 0.03747519105672836 Loss at step 650: 0.03180905058979988 Loss at step 700: 0.0278563741594553 Loss at step 750: 0.027683768421411514 Loss at step 800: 0.028419815003871918 Loss at step 850: 0.035217415541410446 Loss at step 900: 0.044697314500808716 Mean training loss after epoch 118: 0.039538045250959615 EPOCH: 119 Loss at step 0: 0.046822331845760345 Loss at step 50: 0.040409281849861145 Loss at step 100: 0.03416061028838158 Loss at step 150: 0.03551073744893074 Loss at step 200: 0.03338408097624779 Loss at step 250: 0.03503516688942909 Loss at step 300: 0.035498205572366714 Loss at step 350: 0.033440135419368744 Loss at step 400: 0.03668081760406494 Loss at step 450: 0.030554620549082756 Loss at step 500: 0.03509295731782913 Loss at step 550: 0.039216965436935425 Loss at step 600: 0.03887547552585602 Loss at step 650: 0.05035317689180374 Loss at step 700: 0.02960081212222576 Loss at step 750: 0.03060799092054367 Loss at step 800: 0.04975055903196335 Loss at step 850: 0.030993852764368057 Loss at step 900: 0.039669476449489594 Mean training loss after epoch 119: 0.03914418635464935 EPOCH: 120 Loss at step 0: 0.05311105400323868 Loss at step 50: 0.056619614362716675 Loss at step 100: 0.03856995701789856 Loss at step 150: 0.031934138387441635 Loss at step 200: 0.036189254373311996 Loss at step 250: 0.039071522653102875 Loss at step 300: 0.035138342529535294 Loss at step 350: 0.03032342717051506 Loss at step 400: 0.03553370386362076 Loss at step 450: 0.03235636278986931 Loss at step 500: 0.027896877378225327 Loss at step 550: 0.032930176705121994 Loss at step 600: 0.032992299646139145 Loss at step 650: 0.056119002401828766 Loss at step 700: 0.04664117470383644 Loss at step 750: 0.03771108016371727 Loss at step 800: 0.03448104485869408 Loss at step 850: 0.03648252412676811 Loss at step 900: 0.03687531501054764 Mean training loss after epoch 120: 0.039342745918153066 EPOCH: 121 Loss at step 0: 0.0373818576335907 Loss at step 50: 0.03398089483380318 Loss at step 100: 0.03324371948838234 Loss at step 150: 0.03873438015580177 Loss at step 200: 0.05358034744858742 Loss at step 250: 0.026201050728559494 Loss at step 300: 0.03831248730421066 Loss at step 350: 0.02894914150238037 Loss at step 400: 0.05098544806241989 Loss at step 450: 0.038944061845541 Loss at step 500: 0.06779811531305313 Loss at step 550: 0.03747352212667465 Loss at step 600: 0.034456510096788406 Loss at step 650: 0.043319232761859894 Loss at step 700: 0.031052978709340096 Loss at step 750: 0.050076182931661606 Loss at step 800: 0.036640871316194534 Loss at step 850: 0.031347863376140594 Loss at step 900: 0.06967149674892426 Mean training loss after epoch 121: 0.04029602483153216 EPOCH: 122 Loss at step 0: 0.032164618372917175 Loss at step 50: 0.037187281996011734 Loss at step 100: 0.06261426955461502 Loss at step 150: 0.031779687851667404 Loss at step 200: 0.05010031536221504 Loss at step 250: 0.04943472892045975 Loss at step 300: 0.03739229962229729 Loss at step 350: 0.028652414679527283 Loss at step 400: 0.040482472628355026 Loss at step 450: 0.06316076964139938 Loss at step 500: 0.03193167597055435 Loss at step 550: 0.04684897139668465 Loss at step 600: 0.030669517815113068 Loss at step 650: 0.032912176102399826 Loss at step 700: 0.030892424285411835 Loss at step 750: 0.03469032794237137 Loss at step 800: 0.03827536106109619 Loss at step 850: 0.06514573097229004 Loss at step 900: 0.047702375799417496 Mean training loss after epoch 122: 0.03942630458662886 EPOCH: 123 Loss at step 0: 0.03647477179765701 Loss at step 50: 0.035446006804704666 Loss at step 100: 0.03321842849254608 Loss at step 150: 0.04978501796722412 Loss at step 200: 0.030861133709549904 Loss at step 250: 0.05148536339402199 Loss at step 300: 0.02919955551624298 Loss at step 350: 0.05153811722993851 Loss at step 400: 0.03864631429314613 Loss at step 450: 0.051382023841142654 Loss at step 500: 0.02576160617172718 Loss at step 550: 0.03659423440694809 Loss at step 600: 0.05013567954301834 Loss at step 650: 0.02838999778032303 Loss at step 700: 0.05561266466975212 Loss at step 750: 0.03207223862409592 Loss at step 800: 0.034148313105106354 Loss at step 850: 0.04415399953722954 Loss at step 900: 0.031937193125486374 Mean training loss after epoch 123: 0.03915354219882854 EPOCH: 124 Loss at step 0: 0.04839946702122688 Loss at step 50: 0.03287782520055771 Loss at step 100: 0.03596392273902893 Loss at step 150: 0.03851621598005295 Loss at step 200: 0.05554656311869621 Loss at step 250: 0.0404062494635582 Loss at step 300: 0.03724386543035507 Loss at step 350: 0.0341772735118866 Loss at step 400: 0.03929714113473892 Loss at step 450: 0.06781086325645447 Loss at step 500: 0.03279869258403778 Loss at step 550: 0.032746266573667526 Loss at step 600: 0.0346091166138649 Loss at step 650: 0.03212163969874382 Loss at step 700: 0.029754646122455597 Loss at step 750: 0.03289751708507538 Loss at step 800: 0.05357235297560692 Loss at step 850: 0.03909871354699135 Loss at step 900: 0.03817118704319 Mean training loss after epoch 124: 0.039234006618544745 EPOCH: 125 Loss at step 0: 0.030392801389098167 Loss at step 50: 0.04675878584384918 Loss at step 100: 0.035765547305345535 Loss at step 150: 0.03826824948191643 Loss at step 200: 0.02829303964972496 Loss at step 250: 0.03197896108031273 Loss at step 300: 0.03968275338411331 Loss at step 350: 0.07534510642290115 Loss at step 400: 0.04339231550693512 Loss at step 450: 0.06655878573656082 Loss at step 500: 0.05238719657063484 Loss at step 550: 0.049171749502420425 Loss at step 600: 0.03856230899691582 Loss at step 650: 0.03278575837612152 Loss at step 700: 0.036944057792425156 Loss at step 750: 0.05343270301818848 Loss at step 800: 0.0361625961959362 Loss at step 850: 0.03409438580274582 Loss at step 900: 0.03568330034613609 Mean training loss after epoch 125: 0.03979732663130392 EPOCH: 126 Loss at step 0: 0.053269945085048676 Loss at step 50: 0.048633527010679245 Loss at step 100: 0.029948212206363678 Loss at step 150: 0.03244273364543915 Loss at step 200: 0.050222307443618774 Loss at step 250: 0.03105895221233368 Loss at step 300: 0.03882803022861481 Loss at step 350: 0.04591241478919983 Loss at step 400: 0.032795775681734085 Loss at step 450: 0.039201706647872925 Loss at step 500: 0.03418349102139473 Loss at step 550: 0.03612521290779114 Loss at step 600: 0.045109085738658905 Loss at step 650: 0.036604929715394974 Loss at step 700: 0.038015253841876984 Loss at step 750: 0.03615311533212662 Loss at step 800: 0.040611233562231064 Loss at step 850: 0.051794733852148056 Loss at step 900: 0.0350944958627224 Mean training loss after epoch 126: 0.0395815702242606 EPOCH: 127 Loss at step 0: 0.0618874691426754 Loss at step 50: 0.027040809392929077 Loss at step 100: 0.03517129272222519 Loss at step 150: 0.034836456179618835 Loss at step 200: 0.033284999430179596 Loss at step 250: 0.03534623607993126 Loss at step 300: 0.04767061769962311 Loss at step 350: 0.03038816526532173 Loss at step 400: 0.03353381156921387 Loss at step 450: 0.03521526977419853 Loss at step 500: 0.03514536842703819 Loss at step 550: 0.05673908442258835 Loss at step 600: 0.0370648056268692 Loss at step 650: 0.03658979758620262 Loss at step 700: 0.03419577330350876 Loss at step 750: 0.03883873298764229 Loss at step 800: 0.03651333227753639 Loss at step 850: 0.060185886919498444 Loss at step 900: 0.030550435185432434 Mean training loss after epoch 127: 0.039405747858890845 EPOCH: 128 Loss at step 0: 0.03821711987257004 Loss at step 50: 0.05864926800131798 Loss at step 100: 0.028969813138246536 Loss at step 150: 0.04189308360219002 Loss at step 200: 0.042988795787096024 Loss at step 250: 0.034990474581718445 Loss at step 300: 0.03045235574245453 Loss at step 350: 0.03413374722003937 Loss at step 400: 0.032293252646923065 Loss at step 450: 0.024608304724097252 Loss at step 500: 0.053782153874635696 Loss at step 550: 0.06421202421188354 Loss at step 600: 0.04265635088086128 Loss at step 650: 0.03379753232002258 Loss at step 700: 0.03346531093120575 Loss at step 750: 0.03138613700866699 Loss at step 800: 0.044763680547475815 Loss at step 850: 0.07052712142467499 Loss at step 900: 0.03729238361120224 Mean training loss after epoch 128: 0.039433698775544604 EPOCH: 129 Loss at step 0: 0.02955583669245243 Loss at step 50: 0.0351250022649765 Loss at step 100: 0.03795075789093971 Loss at step 150: 0.04622901231050491 Loss at step 200: 0.03804639354348183 Loss at step 250: 0.035619787871837616 Loss at step 300: 0.03548955172300339 Loss at step 350: 0.05214408412575722 Loss at step 400: 0.039869122207164764 Loss at step 450: 0.03045550547540188 Loss at step 500: 0.03163490816950798 Loss at step 550: 0.05486064404249191 Loss at step 600: 0.047294847667217255 Loss at step 650: 0.03491346165537834 Loss at step 700: 0.03555524721741676 Loss at step 750: 0.028734855353832245 Loss at step 800: 0.035248763859272 Loss at step 850: 0.046891167759895325 Loss at step 900: 0.031013386324048042 Mean training loss after epoch 129: 0.03944409705762034 EPOCH: 130 Loss at step 0: 0.03365926817059517 Loss at step 50: 0.06629271060228348 Loss at step 100: 0.03606370463967323 Loss at step 150: 0.030420828610658646 Loss at step 200: 0.04769609868526459 Loss at step 250: 0.03749912604689598 Loss at step 300: 0.049227118492126465 Loss at step 350: 0.031046103686094284 Loss at step 400: 0.03232114017009735 Loss at step 450: 0.039081063121557236 Loss at step 500: 0.03998982906341553 Loss at step 550: 0.034319210797548294 Loss at step 600: 0.03146182373166084 Loss at step 650: 0.034599605947732925 Loss at step 700: 0.03418353572487831 Loss at step 750: 0.028567181900143623 Loss at step 800: 0.03475787118077278 Loss at step 850: 0.05721162632107735 Loss at step 900: 0.037084441632032394 Mean training loss after epoch 130: 0.03919941432940871 EPOCH: 131 Loss at step 0: 0.04942189157009125 Loss at step 50: 0.03365939483046532 Loss at step 100: 0.027852647006511688 Loss at step 150: 0.047222428023815155 Loss at step 200: 0.03616688773036003 Loss at step 250: 0.04715469852089882 Loss at step 300: 0.026201607659459114 Loss at step 350: 0.033973366022109985 Loss at step 400: 0.03621392324566841 Loss at step 450: 0.040546346455812454 Loss at step 500: 0.032363735139369965 Loss at step 550: 0.04279641807079315 Loss at step 600: 0.052475783973932266 Loss at step 650: 0.06121353060007095 Loss at step 700: 0.02716679684817791 Loss at step 750: 0.03327028453350067 Loss at step 800: 0.03253234550356865 Loss at step 850: 0.038443103432655334 Loss at step 900: 0.03776976838707924 Mean training loss after epoch 131: 0.0397078473545881 EPOCH: 132 Loss at step 0: 0.03770219162106514 Loss at step 50: 0.03506242111325264 Loss at step 100: 0.03144155442714691 Loss at step 150: 0.0354040302336216 Loss at step 200: 0.05009981617331505 Loss at step 250: 0.03772049397230148 Loss at step 300: 0.05776866525411606 Loss at step 350: 0.03293938562273979 Loss at step 400: 0.053352054208517075 Loss at step 450: 0.032124947756528854 Loss at step 500: 0.03288784250617027 Loss at step 550: 0.04658212512731552 Loss at step 600: 0.041854362934827805 Loss at step 650: 0.034500207751989365 Loss at step 700: 0.03226133808493614 Loss at step 750: 0.03156770020723343 Loss at step 800: 0.028153065592050552 Loss at step 850: 0.043809786438941956 Loss at step 900: 0.03347623720765114 Mean training loss after epoch 132: 0.039293870109437246 EPOCH: 133 Loss at step 0: 0.03174188360571861 Loss at step 50: 0.038746170699596405 Loss at step 100: 0.03673787787556648 Loss at step 150: 0.03268410265445709 Loss at step 200: 0.06379690021276474 Loss at step 250: 0.026786789298057556 Loss at step 300: 0.028228847309947014 Loss at step 350: 0.03646577522158623 Loss at step 400: 0.02791515551507473 Loss at step 450: 0.06170249357819557 Loss at step 500: 0.046977050602436066 Loss at step 550: 0.04851098731160164 Loss at step 600: 0.031120220199227333 Loss at step 650: 0.037626199424266815 Loss at step 700: 0.03415466845035553 Loss at step 750: 0.05338989198207855 Loss at step 800: 0.053004391491413116 Loss at step 850: 0.03651733323931694 Loss at step 900: 0.03708405792713165 Mean training loss after epoch 133: 0.03968962426307295 EPOCH: 134 Loss at step 0: 0.031423170119524 Loss at step 50: 0.03835438936948776 Loss at step 100: 0.04364590719342232 Loss at step 150: 0.03718087449669838 Loss at step 200: 0.030112434178590775 Loss at step 250: 0.03569090738892555 Loss at step 300: 0.0516793467104435 Loss at step 350: 0.03494222089648247 Loss at step 400: 0.034133993089199066 Loss at step 450: 0.02581144869327545 Loss at step 500: 0.06393831223249435 Loss at step 550: 0.025349831208586693 Loss at step 600: 0.05470648780465126 Loss at step 650: 0.03889636695384979 Loss at step 700: 0.04463161528110504 Loss at step 750: 0.03257147967815399 Loss at step 800: 0.03361441567540169 Loss at step 850: 0.03429478034377098 Loss at step 900: 0.04883019998669624 Mean training loss after epoch 134: 0.040071299990643065 EPOCH: 135 Loss at step 0: 0.048819735646247864 Loss at step 50: 0.03302542865276337 Loss at step 100: 0.04581547528505325 Loss at step 150: 0.03655993938446045 Loss at step 200: 0.03455425053834915 Loss at step 250: 0.03935471177101135 Loss at step 300: 0.034175023436546326 Loss at step 350: 0.03572332113981247 Loss at step 400: 0.03416956216096878 Loss at step 450: 0.037371937185525894 Loss at step 500: 0.03436872363090515 Loss at step 550: 0.0559718981385231 Loss at step 600: 0.027816057205200195 Loss at step 650: 0.04744337871670723 Loss at step 700: 0.03942970931529999 Loss at step 750: 0.054030727595090866 Loss at step 800: 0.03445924445986748 Loss at step 850: 0.0342407152056694 Loss at step 900: 0.031741995364427567 Mean training loss after epoch 135: 0.040007698637590225 EPOCH: 136 Loss at step 0: 0.03192661330103874 Loss at step 50: 0.051365189254283905 Loss at step 100: 0.027592819184064865 Loss at step 150: 0.04960818588733673 Loss at step 200: 0.033114392310380936 Loss at step 250: 0.03183818608522415 Loss at step 300: 0.03316628932952881 Loss at step 350: 0.0440492145717144 Loss at step 400: 0.036088209599256516 Loss at step 450: 0.034465935081243515 Loss at step 500: 0.036137793213129044 Loss at step 550: 0.03461097553372383 Loss at step 600: 0.029762331396341324 Loss at step 650: 0.051575858145952225 Loss at step 700: 0.04051172360777855 Loss at step 750: 0.06311260163784027 Loss at step 800: 0.042850006371736526 Loss at step 850: 0.03613436222076416 Loss at step 900: 0.03765690326690674 Mean training loss after epoch 136: 0.03945391030231519 EPOCH: 137 Loss at step 0: 0.04395096376538277 Loss at step 50: 0.050257641822099686 Loss at step 100: 0.02782345749437809 Loss at step 150: 0.054565224796533585 Loss at step 200: 0.023753153160214424 Loss at step 250: 0.04255859553813934 Loss at step 300: 0.03411520645022392 Loss at step 350: 0.040014415979385376 Loss at step 400: 0.03409161418676376 Loss at step 450: 0.03113659657537937 Loss at step 500: 0.08021800965070724 Loss at step 550: 0.0356910265982151 Loss at step 600: 0.06489501148462296 Loss at step 650: 0.05397685617208481 Loss at step 700: 0.03198349475860596 Loss at step 750: 0.053377822041511536 Loss at step 800: 0.03337441757321358 Loss at step 850: 0.03899423032999039 Loss at step 900: 0.034746699035167694 Mean training loss after epoch 137: 0.03952346904191381 EPOCH: 138 Loss at step 0: 0.036104995757341385 Loss at step 50: 0.027810009196400642 Loss at step 100: 0.03382234647870064 Loss at step 150: 0.04798336327075958 Loss at step 200: 0.037167731672525406 Loss at step 250: 0.032153934240341187 Loss at step 300: 0.03326819837093353 Loss at step 350: 0.05003686621785164 Loss at step 400: 0.05697597190737724 Loss at step 450: 0.045728810131549835 Loss at step 500: 0.06959401816129684 Loss at step 550: 0.035950131714344025 Loss at step 600: 0.0321117639541626 Loss at step 650: 0.03401661291718483 Loss at step 700: 0.05105699598789215 Loss at step 750: 0.0357053168118 Loss at step 800: 0.034585315734148026 Loss at step 850: 0.06274615228176117 Loss at step 900: 0.038454629480838776 Mean training loss after epoch 138: 0.039391084907771046 EPOCH: 139 Loss at step 0: 0.05037341266870499 Loss at step 50: 0.0302533321082592 Loss at step 100: 0.03672996163368225 Loss at step 150: 0.04855988919734955 Loss at step 200: 0.03576243668794632 Loss at step 250: 0.061005257070064545 Loss at step 300: 0.04025362804532051 Loss at step 350: 0.035697091370821 Loss at step 400: 0.03639305382966995 Loss at step 450: 0.038840338587760925 Loss at step 500: 0.032545316964387894 Loss at step 550: 0.03369078040122986 Loss at step 600: 0.06329631805419922 Loss at step 650: 0.03023676387965679 Loss at step 700: 0.03871874883770943 Loss at step 750: 0.026381155475974083 Loss at step 800: 0.04810655117034912 Loss at step 850: 0.03486458212137222 Loss at step 900: 0.033961161971092224 Mean training loss after epoch 139: 0.039095509505427596 EPOCH: 140 Loss at step 0: 0.032024532556533813 Loss at step 50: 0.037724222987890244 Loss at step 100: 0.0323198176920414 Loss at step 150: 0.03586667776107788 Loss at step 200: 0.03529367595911026 Loss at step 250: 0.03854746371507645 Loss at step 300: 0.03343571349978447 Loss at step 350: 0.031030818819999695 Loss at step 400: 0.03551212698221207 Loss at step 450: 0.04176176339387894 Loss at step 500: 0.029247712343931198 Loss at step 550: 0.03232535347342491 Loss at step 600: 0.03849416598677635 Loss at step 650: 0.059137772768735886 Loss at step 700: 0.042178407311439514 Loss at step 750: 0.03467922657728195 Loss at step 800: 0.04953860118985176 Loss at step 850: 0.033621009439229965 Loss at step 900: 0.06417721509933472 Mean training loss after epoch 140: 0.03956295682518467 EPOCH: 141 Loss at step 0: 0.031019700691103935 Loss at step 50: 0.03965692222118378 Loss at step 100: 0.033693525940179825 Loss at step 150: 0.03839253634214401 Loss at step 200: 0.04737725481390953 Loss at step 250: 0.06856204569339752 Loss at step 300: 0.05511469766497612 Loss at step 350: 0.027376102283596992 Loss at step 400: 0.03532049432396889 Loss at step 450: 0.031328268349170685 Loss at step 500: 0.04664536938071251 Loss at step 550: 0.03839164599776268 Loss at step 600: 0.03846563398838043 Loss at step 650: 0.03467755764722824 Loss at step 700: 0.03073793463408947 Loss at step 750: 0.04611557349562645 Loss at step 800: 0.03085605800151825 Loss at step 850: 0.05081420764327049 Loss at step 900: 0.04259518161416054 Mean training loss after epoch 141: 0.03890009620177275 EPOCH: 142 Loss at step 0: 0.037041887640953064 Loss at step 50: 0.030815064907073975 Loss at step 100: 0.04651723429560661 Loss at step 150: 0.03690178692340851 Loss at step 200: 0.04917485639452934 Loss at step 250: 0.04585995897650719 Loss at step 300: 0.05248291417956352 Loss at step 350: 0.03641911968588829 Loss at step 400: 0.0370466448366642 Loss at step 450: 0.046596039086580276 Loss at step 500: 0.038282692432403564 Loss at step 550: 0.03398037329316139 Loss at step 600: 0.030184363946318626 Loss at step 650: 0.039364419877529144 Loss at step 700: 0.025041282176971436 Loss at step 750: 0.035018227994441986 Loss at step 800: 0.03860177844762802 Loss at step 850: 0.028396092355251312 Loss at step 900: 0.028349613770842552 Mean training loss after epoch 142: 0.03887044592325621 EPOCH: 143 Loss at step 0: 0.07800904661417007 Loss at step 50: 0.03596172481775284 Loss at step 100: 0.03385293483734131 Loss at step 150: 0.029611460864543915 Loss at step 200: 0.03388464078307152 Loss at step 250: 0.0314510241150856 Loss at step 300: 0.03241944685578346 Loss at step 350: 0.031398653984069824 Loss at step 400: 0.04492209851741791 Loss at step 450: 0.04808317869901657 Loss at step 500: 0.03864818438887596 Loss at step 550: 0.0327545665204525 Loss at step 600: 0.03282344341278076 Loss at step 650: 0.02858343906700611 Loss at step 700: 0.04729243740439415 Loss at step 750: 0.054981477558612823 Loss at step 800: 0.048667341470718384 Loss at step 850: 0.05173594132065773 Loss at step 900: 0.03270960599184036 Mean training loss after epoch 143: 0.03935641843253679 EPOCH: 144 Loss at step 0: 0.04598182439804077 Loss at step 50: 0.033364444971084595 Loss at step 100: 0.04085895046591759 Loss at step 150: 0.054201699793338776 Loss at step 200: 0.03283211961388588 Loss at step 250: 0.04913542792201042 Loss at step 300: 0.040407951921224594 Loss at step 350: 0.03492480143904686 Loss at step 400: 0.03537715971469879 Loss at step 450: 0.040051206946372986 Loss at step 500: 0.0338791199028492 Loss at step 550: 0.031114110723137856 Loss at step 600: 0.0558236762881279 Loss at step 650: 0.04854753986001015 Loss at step 700: 0.03347047045826912 Loss at step 750: 0.04555104672908783 Loss at step 800: 0.05050625652074814 Loss at step 850: 0.034400638192892075 Loss at step 900: 0.04862940311431885 Mean training loss after epoch 144: 0.03965999963862111 EPOCH: 145 Loss at step 0: 0.053500883281230927 Loss at step 50: 0.042979948222637177 Loss at step 100: 0.034541741013526917 Loss at step 150: 0.08472888916730881 Loss at step 200: 0.03895635902881622 Loss at step 250: 0.03469546511769295 Loss at step 300: 0.037288323044776917 Loss at step 350: 0.043157368898391724 Loss at step 400: 0.03341991454362869 Loss at step 450: 0.030327077955007553 Loss at step 500: 0.032299868762493134 Loss at step 550: 0.03639330342411995 Loss at step 600: 0.03218500316143036 Loss at step 650: 0.03563271835446358 Loss at step 700: 0.041055385023355484 Loss at step 750: 0.030614376068115234 Loss at step 800: 0.0421602725982666 Loss at step 850: 0.07264039665460587 Loss at step 900: 0.03080025501549244 Mean training loss after epoch 145: 0.039527953409754644 EPOCH: 146 Loss at step 0: 0.038086384534835815 Loss at step 50: 0.03151174634695053 Loss at step 100: 0.06780578196048737 Loss at step 150: 0.035944465547800064 Loss at step 200: 0.04315976798534393 Loss at step 250: 0.03627829626202583 Loss at step 300: 0.044868070632219315 Loss at step 350: 0.06239329278469086 Loss at step 400: 0.03116191178560257 Loss at step 450: 0.030056767165660858 Loss at step 500: 0.04555276036262512 Loss at step 550: 0.035607095807790756 Loss at step 600: 0.03516821563243866 Loss at step 650: 0.04979132488369942 Loss at step 700: 0.029950257390737534 Loss at step 750: 0.043506622314453125 Loss at step 800: 0.03687719628214836 Loss at step 850: 0.024289464578032494 Loss at step 900: 0.05392932891845703 Mean training loss after epoch 146: 0.039193182740845024 EPOCH: 147 Loss at step 0: 0.0378408320248127 Loss at step 50: 0.02723839320242405 Loss at step 100: 0.030209243297576904 Loss at step 150: 0.029873475432395935 Loss at step 200: 0.0533607043325901 Loss at step 250: 0.030002551153302193 Loss at step 300: 0.03935733810067177 Loss at step 350: 0.033383894711732864 Loss at step 400: 0.03465532884001732 Loss at step 450: 0.02880512923002243 Loss at step 500: 0.04833516106009483 Loss at step 550: 0.036070566624403 Loss at step 600: 0.03941639885306358 Loss at step 650: 0.048679206520318985 Loss at step 700: 0.08777908235788345 Loss at step 750: 0.03170846775174141 Loss at step 800: 0.040402162820100784 Loss at step 850: 0.03725903108716011 Loss at step 900: 0.034787263721227646 Mean training loss after epoch 147: 0.039646158286773446 EPOCH: 148 Loss at step 0: 0.027730843052268028 Loss at step 50: 0.043807048350572586 Loss at step 100: 0.029920341446995735 Loss at step 150: 0.029993705451488495 Loss at step 200: 0.03400915861129761 Loss at step 250: 0.04301103204488754 Loss at step 300: 0.02902970463037491 Loss at step 350: 0.03378966450691223 Loss at step 400: 0.03222730755805969 Loss at step 450: 0.049026526510715485 Loss at step 500: 0.04590123891830444 Loss at step 550: 0.034457314759492874 Loss at step 600: 0.04091913625597954 Loss at step 650: 0.05898120254278183 Loss at step 700: 0.03195128217339516 Loss at step 750: 0.030757609754800797 Loss at step 800: 0.06246788427233696 Loss at step 850: 0.0350286141037941 Loss at step 900: 0.05706969276070595 Mean training loss after epoch 148: 0.039222197903038214 EPOCH: 149 Loss at step 0: 0.03730018809437752 Loss at step 50: 0.042346253991127014 Loss at step 100: 0.037642017006874084 Loss at step 150: 0.03183600306510925 Loss at step 200: 0.030109122395515442 Loss at step 250: 0.031424567103385925 Loss at step 300: 0.035488519817590714 Loss at step 350: 0.02787010557949543 Loss at step 400: 0.036141764372587204 Loss at step 450: 0.03689347580075264 Loss at step 500: 0.03262427821755409 Loss at step 550: 0.02633744291961193 Loss at step 600: 0.05149431526660919 Loss at step 650: 0.046467360109090805 Loss at step 700: 0.05140215903520584 Loss at step 750: 0.033662937581539154 Loss at step 800: 0.06175895407795906 Loss at step 850: 0.030706090852618217 Loss at step 900: 0.03915690258145332 Mean training loss after epoch 149: 0.039625167932067475 EPOCH: 150 Loss at step 0: 0.0635681301355362 Loss at step 50: 0.03859114646911621 Loss at step 100: 0.03359058499336243 Loss at step 150: 0.04951445758342743 Loss at step 200: 0.04027419537305832 Loss at step 250: 0.03632742539048195 Loss at step 300: 0.035029441118240356 Loss at step 350: 0.0336981937289238 Loss at step 400: 0.03739224746823311 Loss at step 450: 0.03403742238879204 Loss at step 500: 0.033260997384786606 Loss at step 550: 0.03288145735859871 Loss at step 600: 0.04006393626332283 Loss at step 650: 0.05073455721139908 Loss at step 700: 0.0312972329556942 Loss at step 750: 0.030890561640262604 Loss at step 800: 0.035060226917266846 Loss at step 850: 0.02834274247288704 Loss at step 900: 0.056624848395586014 Mean training loss after epoch 150: 0.03933214702442892 EPOCH: 151 Loss at step 0: 0.05048656836152077 Loss at step 50: 0.03354731202125549 Loss at step 100: 0.03805273026227951 Loss at step 150: 0.036541737616062164 Loss at step 200: 0.04793759807944298 Loss at step 250: 0.045414041727781296 Loss at step 300: 0.03187486156821251 Loss at step 350: 0.0512293204665184 Loss at step 400: 0.032727424055337906 Loss at step 450: 0.04914374649524689 Loss at step 500: 0.06363655626773834 Loss at step 550: 0.03607652336359024 Loss at step 600: 0.04093216732144356 Loss at step 650: 0.0568569079041481 Loss at step 700: 0.0362667553126812 Loss at step 750: 0.033561594784259796 Loss at step 800: 0.06503620743751526 Loss at step 850: 0.03378823399543762 Loss at step 900: 0.032519981265068054 Mean training loss after epoch 151: 0.03930958857668488 EPOCH: 152 Loss at step 0: 0.03550422191619873 Loss at step 50: 0.04223502799868584 Loss at step 100: 0.03614363446831703 Loss at step 150: 0.031053202226758003 Loss at step 200: 0.03926531970500946 Loss at step 250: 0.032965537160634995 Loss at step 300: 0.026286372914910316 Loss at step 350: 0.02918831631541252 Loss at step 400: 0.040757209062576294 Loss at step 450: 0.05696386471390724 Loss at step 500: 0.031654246151447296 Loss at step 550: 0.051633093506097794 Loss at step 600: 0.03519409894943237 Loss at step 650: 0.039070144295692444 Loss at step 700: 0.03182769566774368 Loss at step 750: 0.03688427433371544 Loss at step 800: 0.037481896579265594 Loss at step 850: 0.03385401517152786 Loss at step 900: 0.027971047908067703 Mean training loss after epoch 152: 0.03973303176065498 EPOCH: 153 Loss at step 0: 0.04514594003558159 Loss at step 50: 0.024968784302473068 Loss at step 100: 0.032275207340717316 Loss at step 150: 0.03340767323970795 Loss at step 200: 0.03128409385681152 Loss at step 250: 0.056861598044633865 Loss at step 300: 0.033244118094444275 Loss at step 350: 0.03236854448914528 Loss at step 400: 0.0381719172000885 Loss at step 450: 0.03750725835561752 Loss at step 500: 0.051320016384124756 Loss at step 550: 0.03580769523978233 Loss at step 600: 0.029731059446930885 Loss at step 650: 0.039082035422325134 Loss at step 700: 0.036009445786476135 Loss at step 750: 0.033434707671403885 Loss at step 800: 0.026327135041356087 Loss at step 850: 0.0509016215801239 Loss at step 900: 0.03406652435660362 Mean training loss after epoch 153: 0.0395354926804585 EPOCH: 154 Loss at step 0: 0.05581461638212204 Loss at step 50: 0.03425249084830284 Loss at step 100: 0.04120117798447609 Loss at step 150: 0.03463542461395264 Loss at step 200: 0.03750217705965042 Loss at step 250: 0.031635165214538574 Loss at step 300: 0.03139062598347664 Loss at step 350: 0.06818025559186935 Loss at step 400: 0.06229027733206749 Loss at step 450: 0.037896402180194855 Loss at step 500: 0.036053553223609924 Loss at step 550: 0.03459250554442406 Loss at step 600: 0.03319244459271431 Loss at step 650: 0.03027353435754776 Loss at step 700: 0.03464864194393158 Loss at step 750: 0.034393858164548874 Loss at step 800: 0.032815080136060715 Loss at step 850: 0.03894956037402153 Loss at step 900: 0.05316885933279991 Mean training loss after epoch 154: 0.039240895373337685 EPOCH: 155 Loss at step 0: 0.0310871209949255 Loss at step 50: 0.032070592045784 Loss at step 100: 0.08389531075954437 Loss at step 150: 0.05327474698424339 Loss at step 200: 0.028793562203645706 Loss at step 250: 0.03473593667149544 Loss at step 300: 0.03498676419258118 Loss at step 350: 0.030903948470950127 Loss at step 400: 0.04266294091939926 Loss at step 450: 0.03788764029741287 Loss at step 500: 0.040310509502887726 Loss at step 550: 0.04705553874373436 Loss at step 600: 0.027487406507134438 Loss at step 650: 0.048603251576423645 Loss at step 700: 0.03078928031027317 Loss at step 750: 0.03656581789255142 Loss at step 800: 0.03977557271718979 Loss at step 850: 0.0304188821464777 Loss at step 900: 0.03171367570757866 Mean training loss after epoch 155: 0.03976422408893546 EPOCH: 156 Loss at step 0: 0.03371843323111534 Loss at step 50: 0.05305543169379234 Loss at step 100: 0.03302857652306557 Loss at step 150: 0.06369593739509583 Loss at step 200: 0.05015632510185242 Loss at step 250: 0.05050581321120262 Loss at step 300: 0.07002273201942444 Loss at step 350: 0.030274443328380585 Loss at step 400: 0.04756508022546768 Loss at step 450: 0.036425795406103134 Loss at step 500: 0.0389801487326622 Loss at step 550: 0.028173215687274933 Loss at step 600: 0.03244103863835335 Loss at step 650: 0.036437876522541046 Loss at step 700: 0.04027947783470154 Loss at step 750: 0.032968707382678986 Loss at step 800: 0.03642258420586586 Loss at step 850: 0.03340383246541023 Loss at step 900: 0.03132117912173271 Mean training loss after epoch 156: 0.03897639409875247 EPOCH: 157 Loss at step 0: 0.057234928011894226 Loss at step 50: 0.027553504332900047 Loss at step 100: 0.04392443224787712 Loss at step 150: 0.06077493727207184 Loss at step 200: 0.03723381459712982 Loss at step 250: 0.03980686143040657 Loss at step 300: 0.0366029292345047 Loss at step 350: 0.03918832913041115 Loss at step 400: 0.04736342653632164 Loss at step 450: 0.03613236919045448 Loss at step 500: 0.03801778703927994 Loss at step 550: 0.029580578207969666 Loss at step 600: 0.03510025516152382 Loss at step 650: 0.035331495106220245 Loss at step 700: 0.04116532579064369 Loss at step 750: 0.06208318471908569 Loss at step 800: 0.038616690784692764 Loss at step 850: 0.05221186578273773 Loss at step 900: 0.03497852385044098 Mean training loss after epoch 157: 0.03924621167832981 EPOCH: 158 Loss at step 0: 0.03518567234277725 Loss at step 50: 0.03604546934366226 Loss at step 100: 0.036640558391809464 Loss at step 150: 0.06843830645084381 Loss at step 200: 0.043211791664361954 Loss at step 250: 0.061072979122400284 Loss at step 300: 0.0520750917494297 Loss at step 350: 0.025718245655298233 Loss at step 400: 0.056267280131578445 Loss at step 450: 0.029653271660208702 Loss at step 500: 0.03245259076356888 Loss at step 550: 0.034099020063877106 Loss at step 600: 0.041142288595438004 Loss at step 650: 0.0361553318798542 Loss at step 700: 0.054463841021060944 Loss at step 750: 0.03847633674740791 Loss at step 800: 0.037515804171562195 Loss at step 850: 0.031430359929800034 Loss at step 900: 0.03876139223575592 Mean training loss after epoch 158: 0.03919123150884851 EPOCH: 159 Loss at step 0: 0.043939609080553055 Loss at step 50: 0.053520455956459045 Loss at step 100: 0.052138444036245346 Loss at step 150: 0.03290197625756264 Loss at step 200: 0.03234424814581871 Loss at step 250: 0.051657792180776596 Loss at step 300: 0.03729550540447235 Loss at step 350: 0.0506618358194828 Loss at step 400: 0.0385521799325943 Loss at step 450: 0.037131600081920624 Loss at step 500: 0.03052428737282753 Loss at step 550: 0.036414749920368195 Loss at step 600: 0.050583191215991974 Loss at step 650: 0.027664225548505783 Loss at step 700: 0.030049961060285568 Loss at step 750: 0.05275664106011391 Loss at step 800: 0.03260257840156555 Loss at step 850: 0.03709041327238083 Loss at step 900: 0.03108915127813816 Mean training loss after epoch 159: 0.03924297018131531 EPOCH: 160 Loss at step 0: 0.05288207530975342 Loss at step 50: 0.039461344480514526 Loss at step 100: 0.03733035549521446 Loss at step 150: 0.05008985102176666 Loss at step 200: 0.04603055492043495 Loss at step 250: 0.037323225289583206 Loss at step 300: 0.03126620128750801 Loss at step 350: 0.03383047133684158 Loss at step 400: 0.0328173004090786 Loss at step 450: 0.03981626778841019 Loss at step 500: 0.03920019418001175 Loss at step 550: 0.0535724014043808 Loss at step 600: 0.07834642380475998 Loss at step 650: 0.03841550275683403 Loss at step 700: 0.0382724367082119 Loss at step 750: 0.050695884972810745 Loss at step 800: 0.06713271141052246 Loss at step 850: 0.0263232309371233 Loss at step 900: 0.03568645194172859 Mean training loss after epoch 160: 0.039470712145937406 EPOCH: 161 Loss at step 0: 0.0513257272541523 Loss at step 50: 0.034685760736465454 Loss at step 100: 0.04915265366435051 Loss at step 150: 0.03358159214258194 Loss at step 200: 0.033740002661943436 Loss at step 250: 0.04508654773235321 Loss at step 300: 0.036520522087812424 Loss at step 350: 0.0321122482419014 Loss at step 400: 0.028532158583402634 Loss at step 450: 0.03871700540184975 Loss at step 500: 0.036667924374341965 Loss at step 550: 0.06410471349954605 Loss at step 600: 0.053647808730602264 Loss at step 650: 0.03314018249511719 Loss at step 700: 0.028010109439492226 Loss at step 750: 0.04680185765028 Loss at step 800: 0.04235909879207611 Loss at step 850: 0.0374627523124218 Loss at step 900: 0.03358517214655876 Mean training loss after epoch 161: 0.03932493462411961 EPOCH: 162 Loss at step 0: 0.03617725893855095 Loss at step 50: 0.03734595328569412 Loss at step 100: 0.03455570712685585 Loss at step 150: 0.03798704966902733 Loss at step 200: 0.03124365396797657 Loss at step 250: 0.029349885880947113 Loss at step 300: 0.028867419809103012 Loss at step 350: 0.03217606991529465 Loss at step 400: 0.04024596884846687 Loss at step 450: 0.03250168636441231 Loss at step 500: 0.054221246391534805 Loss at step 550: 0.03538833186030388 Loss at step 600: 0.03723372519016266 Loss at step 650: 0.031777914613485336 Loss at step 700: 0.04728477820754051 Loss at step 750: 0.040914956480264664 Loss at step 800: 0.04542193189263344 Loss at step 850: 0.029963837936520576 Loss at step 900: 0.02079913578927517 Mean training loss after epoch 162: 0.03941607202635582 EPOCH: 163 Loss at step 0: 0.03298136964440346 Loss at step 50: 0.03542909771203995 Loss at step 100: 0.028272759169340134 Loss at step 150: 0.03729943931102753 Loss at step 200: 0.04108394682407379 Loss at step 250: 0.02899816818535328 Loss at step 300: 0.032750558108091354 Loss at step 350: 0.036313727498054504 Loss at step 400: 0.04136597365140915 Loss at step 450: 0.03960328921675682 Loss at step 500: 0.03230874985456467 Loss at step 550: 0.04867088422179222 Loss at step 600: 0.05001817271113396 Loss at step 650: 0.05419230833649635 Loss at step 700: 0.05996684357523918 Loss at step 750: 0.04545874521136284 Loss at step 800: 0.04341427981853485 Loss at step 850: 0.03400690481066704 Loss at step 900: 0.03298550471663475 Mean training loss after epoch 163: 0.03887515005343822 EPOCH: 164 Loss at step 0: 0.030659688636660576 Loss at step 50: 0.05481678992509842 Loss at step 100: 0.04064493253827095 Loss at step 150: 0.029294587671756744 Loss at step 200: 0.08567048609256744 Loss at step 250: 0.0467347614467144 Loss at step 300: 0.032512687146663666 Loss at step 350: 0.04507340118288994 Loss at step 400: 0.051033418625593185 Loss at step 450: 0.03646430745720863 Loss at step 500: 0.03835824877023697 Loss at step 550: 0.042812447994947433 Loss at step 600: 0.05038508400321007 Loss at step 650: 0.04679122194647789 Loss at step 700: 0.028881927952170372 Loss at step 750: 0.04263278841972351 Loss at step 800: 0.04901593551039696 Loss at step 850: 0.06753674894571304 Loss at step 900: 0.025265220552682877 Mean training loss after epoch 164: 0.039169049462371035 EPOCH: 165 Loss at step 0: 0.050334297120571136 Loss at step 50: 0.04915367811918259 Loss at step 100: 0.031108077615499496 Loss at step 150: 0.05546001344919205 Loss at step 200: 0.032186005264520645 Loss at step 250: 0.04261118918657303 Loss at step 300: 0.033219050616025925 Loss at step 350: 0.05272221937775612 Loss at step 400: 0.03478274866938591 Loss at step 450: 0.047346144914627075 Loss at step 500: 0.04917857423424721 Loss at step 550: 0.030005767941474915 Loss at step 600: 0.031224580481648445 Loss at step 650: 0.027899092063307762 Loss at step 700: 0.03437371551990509 Loss at step 750: 0.029487749561667442 Loss at step 800: 0.02987102046608925 Loss at step 850: 0.04677090793848038 Loss at step 900: 0.02810831554234028 Mean training loss after epoch 165: 0.038855627957564684 EPOCH: 166 Loss at step 0: 0.0349762886762619 Loss at step 50: 0.04614792764186859 Loss at step 100: 0.03358326107263565 Loss at step 150: 0.029618510976433754 Loss at step 200: 0.029864752665162086 Loss at step 250: 0.03437303006649017 Loss at step 300: 0.051478929817676544 Loss at step 350: 0.04301890358328819 Loss at step 400: 0.034064773470163345 Loss at step 450: 0.04761025309562683 Loss at step 500: 0.028954358771443367 Loss at step 550: 0.03937385231256485 Loss at step 600: 0.03357689827680588 Loss at step 650: 0.03834272176027298 Loss at step 700: 0.040788501501083374 Loss at step 750: 0.03717277571558952 Loss at step 800: 0.028320590034127235 Loss at step 850: 0.05199575424194336 Loss at step 900: 0.052021678537130356 Mean training loss after epoch 166: 0.03958347950702601 EPOCH: 167 Loss at step 0: 0.03851846233010292 Loss at step 50: 0.027202051132917404 Loss at step 100: 0.03668200224637985 Loss at step 150: 0.035710643976926804 Loss at step 200: 0.03407706320285797 Loss at step 250: 0.031828928738832474 Loss at step 300: 0.03363081440329552 Loss at step 350: 0.036198001354932785 Loss at step 400: 0.06014261394739151 Loss at step 450: 0.036499544978141785 Loss at step 500: 0.05670931935310364 Loss at step 550: 0.032829903066158295 Loss at step 600: 0.052797574549913406 Loss at step 650: 0.058645691722631454 Loss at step 700: 0.033004071563482285 Loss at step 750: 0.03863726183772087 Loss at step 800: 0.0497298538684845 Loss at step 850: 0.06951345503330231 Loss at step 900: 0.039329614490270615 Mean training loss after epoch 167: 0.03950740107253734 EPOCH: 168 Loss at step 0: 0.04812485724687576 Loss at step 50: 0.045562781393527985 Loss at step 100: 0.05294008553028107 Loss at step 150: 0.031224440783262253 Loss at step 200: 0.051217224448919296 Loss at step 250: 0.0352286733686924 Loss at step 300: 0.06751609593629837 Loss at step 350: 0.048404060304164886 Loss at step 400: 0.03235281631350517 Loss at step 450: 0.05293611064553261 Loss at step 500: 0.03771340847015381 Loss at step 550: 0.03256698325276375 Loss at step 600: 0.05058521404862404 Loss at step 650: 0.04209081456065178 Loss at step 700: 0.04569631442427635 Loss at step 750: 0.045270901173353195 Loss at step 800: 0.03682660311460495 Loss at step 850: 0.05838702619075775 Loss at step 900: 0.047535426914691925 Mean training loss after epoch 168: 0.038812527301977436 EPOCH: 169 Loss at step 0: 0.035249508917331696 Loss at step 50: 0.04865698888897896 Loss at step 100: 0.03341301158070564 Loss at step 150: 0.05195419862866402 Loss at step 200: 0.0326247476041317 Loss at step 250: 0.0534488707780838 Loss at step 300: 0.028785832226276398 Loss at step 350: 0.03274010866880417 Loss at step 400: 0.03166402876377106 Loss at step 450: 0.04065057635307312 Loss at step 500: 0.034363940358161926 Loss at step 550: 0.03997866064310074 Loss at step 600: 0.03224586322903633 Loss at step 650: 0.030992381274700165 Loss at step 700: 0.035040486603975296 Loss at step 750: 0.04140629991889 Loss at step 800: 0.025214888155460358 Loss at step 850: 0.026899171993136406 Loss at step 900: 0.046330489218235016 Mean training loss after epoch 169: 0.03878859086220325 EPOCH: 170 Loss at step 0: 0.03813978284597397 Loss at step 50: 0.047573428601026535 Loss at step 100: 0.03425794467329979 Loss at step 150: 0.046652909368276596 Loss at step 200: 0.034418776631355286 Loss at step 250: 0.04764286428689957 Loss at step 300: 0.06413517892360687 Loss at step 350: 0.030850959941744804 Loss at step 400: 0.044213276356458664 Loss at step 450: 0.03383059427142143 Loss at step 500: 0.04466385766863823 Loss at step 550: 0.037227801978588104 Loss at step 600: 0.03747207298874855 Loss at step 650: 0.029291454702615738 Loss at step 700: 0.04942853003740311 Loss at step 750: 0.03823840618133545 Loss at step 800: 0.035450272262096405 Loss at step 850: 0.05353355407714844 Loss at step 900: 0.03136174753308296 Mean training loss after epoch 170: 0.03907018407647099 EPOCH: 171 Loss at step 0: 0.03253631293773651 Loss at step 50: 0.036774586886167526 Loss at step 100: 0.036331068724393845 Loss at step 150: 0.037170059978961945 Loss at step 200: 0.03475722298026085 Loss at step 250: 0.03536279872059822 Loss at step 300: 0.032984524965286255 Loss at step 350: 0.07008861750364304 Loss at step 400: 0.03769521787762642 Loss at step 450: 0.04640931263566017 Loss at step 500: 0.027477040886878967 Loss at step 550: 0.029912931844592094 Loss at step 600: 0.03493645042181015 Loss at step 650: 0.03780147805809975 Loss at step 700: 0.03162743151187897 Loss at step 750: 0.03185141086578369 Loss at step 800: 0.032744985073804855 Loss at step 850: 0.042146794497966766 Loss at step 900: 0.03094305656850338 Mean training loss after epoch 171: 0.03935051010425157 EPOCH: 172 Loss at step 0: 0.03449912741780281 Loss at step 50: 0.03143830969929695 Loss at step 100: 0.03214815631508827 Loss at step 150: 0.03248504176735878 Loss at step 200: 0.05405086651444435 Loss at step 250: 0.040951650589704514 Loss at step 300: 0.052501268684864044 Loss at step 350: 0.03982722386717796 Loss at step 400: 0.035263411700725555 Loss at step 450: 0.029025916010141373 Loss at step 500: 0.04954290762543678 Loss at step 550: 0.037146568298339844 Loss at step 600: 0.03575003892183304 Loss at step 650: 0.03380720317363739 Loss at step 700: 0.035038143396377563 Loss at step 750: 0.038343656808137894 Loss at step 800: 0.03158945590257645 Loss at step 850: 0.036919865757226944 Loss at step 900: 0.03487055003643036 Mean training loss after epoch 172: 0.03889088288370544 EPOCH: 173 Loss at step 0: 0.03138274699449539 Loss at step 50: 0.025986801832914352 Loss at step 100: 0.03170880302786827 Loss at step 150: 0.04323652386665344 Loss at step 200: 0.03465811535716057 Loss at step 250: 0.045552611351013184 Loss at step 300: 0.030850393697619438 Loss at step 350: 0.039794258773326874 Loss at step 400: 0.04759243503212929 Loss at step 450: 0.04270853102207184 Loss at step 500: 0.04243195801973343 Loss at step 550: 0.03496856614947319 Loss at step 600: 0.03673814237117767 Loss at step 650: 0.029822615906596184 Loss at step 700: 0.035439055413007736 Loss at step 750: 0.03878561407327652 Loss at step 800: 0.03671463206410408 Loss at step 850: 0.054172221571207047 Loss at step 900: 0.03828703612089157 Mean training loss after epoch 173: 0.03913085260934858 EPOCH: 174 Loss at step 0: 0.03923233225941658 Loss at step 50: 0.0362083725631237 Loss at step 100: 0.027146954089403152 Loss at step 150: 0.045268718153238297 Loss at step 200: 0.028672441840171814 Loss at step 250: 0.04492820426821709 Loss at step 300: 0.03320549055933952 Loss at step 350: 0.03337933495640755 Loss at step 400: 0.03440156206488609 Loss at step 450: 0.05564742907881737 Loss at step 500: 0.03428972512483597 Loss at step 550: 0.036005955189466476 Loss at step 600: 0.04668712615966797 Loss at step 650: 0.046969570219516754 Loss at step 700: 0.04779266193509102 Loss at step 750: 0.03470974415540695 Loss at step 800: 0.03122764639556408 Loss at step 850: 0.03762230649590492 Loss at step 900: 0.03120492398738861 Mean training loss after epoch 174: 0.039515819358847924 EPOCH: 175 Loss at step 0: 0.05309366434812546 Loss at step 50: 0.05063268169760704 Loss at step 100: 0.03758813440799713 Loss at step 150: 0.04657798632979393 Loss at step 200: 0.03457564115524292 Loss at step 250: 0.035290032625198364 Loss at step 300: 0.030912622809410095 Loss at step 350: 0.06389304250478745 Loss at step 400: 0.03429973125457764 Loss at step 450: 0.03875140845775604 Loss at step 500: 0.03562680631875992 Loss at step 550: 0.06072109565138817 Loss at step 600: 0.0332050547003746 Loss at step 650: 0.030641604214906693 Loss at step 700: 0.02966175228357315 Loss at step 750: 0.02970394678413868 Loss at step 800: 0.03495590388774872 Loss at step 850: 0.03306683525443077 Loss at step 900: 0.05852912738919258 Mean training loss after epoch 175: 0.03916891178946251 EPOCH: 176 Loss at step 0: 0.03355959802865982 Loss at step 50: 0.03817852959036827 Loss at step 100: 0.029827676713466644 Loss at step 150: 0.049565061926841736 Loss at step 200: 0.035668980330228806 Loss at step 250: 0.028268609195947647 Loss at step 300: 0.03454212099313736 Loss at step 350: 0.03718159720301628 Loss at step 400: 0.042191654443740845 Loss at step 450: 0.03259550407528877 Loss at step 500: 0.028897596523165703 Loss at step 550: 0.03702004998922348 Loss at step 600: 0.05753730610013008 Loss at step 650: 0.035004790872335434 Loss at step 700: 0.04479010030627251 Loss at step 750: 0.03435611352324486 Loss at step 800: 0.0314345620572567 Loss at step 850: 0.03051929548382759 Loss at step 900: 0.049904949963092804 Mean training loss after epoch 176: 0.03916158326970997 EPOCH: 177 Loss at step 0: 0.0372232124209404 Loss at step 50: 0.061391156166791916 Loss at step 100: 0.02589123137295246 Loss at step 150: 0.04098399728536606 Loss at step 200: 0.03717169910669327 Loss at step 250: 0.028730250895023346 Loss at step 300: 0.03881428763270378 Loss at step 350: 0.05208694934844971 Loss at step 400: 0.03064451925456524 Loss at step 450: 0.04255562648177147 Loss at step 500: 0.03958858549594879 Loss at step 550: 0.03288872167468071 Loss at step 600: 0.03490692749619484 Loss at step 650: 0.05171559005975723 Loss at step 700: 0.04158690571784973 Loss at step 750: 0.04098907858133316 Loss at step 800: 0.03575713559985161 Loss at step 850: 0.03653305768966675 Loss at step 900: 0.02866176888346672 Mean training loss after epoch 177: 0.039235789370514564 EPOCH: 178 Loss at step 0: 0.04119231179356575 Loss at step 50: 0.03776135668158531 Loss at step 100: 0.03681959956884384 Loss at step 150: 0.03232073783874512 Loss at step 200: 0.025341227650642395 Loss at step 250: 0.034421276301145554 Loss at step 300: 0.03283604234457016 Loss at step 350: 0.03014027513563633 Loss at step 400: 0.03295137733221054 Loss at step 450: 0.034649476408958435 Loss at step 500: 0.03244210034608841 Loss at step 550: 0.06001987308263779 Loss at step 600: 0.028964921832084656 Loss at step 650: 0.05406300723552704 Loss at step 700: 0.06970390677452087 Loss at step 750: 0.03521111607551575 Loss at step 800: 0.056526873260736465 Loss at step 850: 0.03846585005521774 Loss at step 900: 0.03603179007768631 Mean training loss after epoch 178: 0.0388810425849834 EPOCH: 179 Loss at step 0: 0.03120381012558937 Loss at step 50: 0.04885181039571762 Loss at step 100: 0.057533055543899536 Loss at step 150: 0.02916364185512066 Loss at step 200: 0.03778975084424019 Loss at step 250: 0.03431578725576401 Loss at step 300: 0.03574364259839058 Loss at step 350: 0.033635325729846954 Loss at step 400: 0.04998693987727165 Loss at step 450: 0.04142377898097038 Loss at step 500: 0.03838814049959183 Loss at step 550: 0.05136236920952797 Loss at step 600: 0.05127251148223877 Loss at step 650: 0.033934950828552246 Loss at step 700: 0.03613833338022232 Loss at step 750: 0.03534485772252083 Loss at step 800: 0.03300854191184044 Loss at step 850: 0.03694523870944977 Loss at step 900: 0.02892526239156723 Mean training loss after epoch 179: 0.03904336244106166 EPOCH: 180 Loss at step 0: 0.05979617312550545 Loss at step 50: 0.03392496705055237 Loss at step 100: 0.03062225505709648 Loss at step 150: 0.03333626687526703 Loss at step 200: 0.052687518298625946 Loss at step 250: 0.04732373356819153 Loss at step 300: 0.056720294058322906 Loss at step 350: 0.03910677134990692 Loss at step 400: 0.031223593279719353 Loss at step 450: 0.040896810591220856 Loss at step 500: 0.03499598056077957 Loss at step 550: 0.04742546007037163 Loss at step 600: 0.05450151115655899 Loss at step 650: 0.041661858558654785 Loss at step 700: 0.03327474370598793 Loss at step 750: 0.033290062099695206 Loss at step 800: 0.04283091053366661 Loss at step 850: 0.03432048484683037 Loss at step 900: 0.033724892884492874 Mean training loss after epoch 180: 0.038939794110479764 EPOCH: 181 Loss at step 0: 0.03279447928071022 Loss at step 50: 0.04636192321777344 Loss at step 100: 0.031019365414977074 Loss at step 150: 0.04057184234261513 Loss at step 200: 0.040611717849969864 Loss at step 250: 0.034706905484199524 Loss at step 300: 0.06649104505777359 Loss at step 350: 0.033010490238666534 Loss at step 400: 0.031118985265493393 Loss at step 450: 0.032906051725149155 Loss at step 500: 0.03283214941620827 Loss at step 550: 0.028617560863494873 Loss at step 600: 0.0322125144302845 Loss at step 650: 0.03824643790721893 Loss at step 700: 0.04454563185572624 Loss at step 750: 0.03406640514731407 Loss at step 800: 0.02372257597744465 Loss at step 850: 0.036922357976436615 Loss at step 900: 0.05304741859436035 Mean training loss after epoch 181: 0.03884632851102395 EPOCH: 182 Loss at step 0: 0.0342755988240242 Loss at step 50: 0.05240946263074875 Loss at step 100: 0.03621063753962517 Loss at step 150: 0.03494064509868622 Loss at step 200: 0.03515654057264328 Loss at step 250: 0.05001012608408928 Loss at step 300: 0.06957817077636719 Loss at step 350: 0.05387880653142929 Loss at step 400: 0.05431636422872543 Loss at step 450: 0.029219942167401314 Loss at step 500: 0.03195520117878914 Loss at step 550: 0.040569860488176346 Loss at step 600: 0.046934232115745544 Loss at step 650: 0.03658497333526611 Loss at step 700: 0.048807621002197266 Loss at step 750: 0.03408190235495567 Loss at step 800: 0.03505171462893486 Loss at step 850: 0.037616655230522156 Loss at step 900: 0.032208580523729324 Mean training loss after epoch 182: 0.03900124043472476 EPOCH: 183 Loss at step 0: 0.031053226441144943 Loss at step 50: 0.03987487033009529 Loss at step 100: 0.04322532191872597 Loss at step 150: 0.03561634570360184 Loss at step 200: 0.03436218574643135 Loss at step 250: 0.030644109472632408 Loss at step 300: 0.033368855714797974 Loss at step 350: 0.03135083243250847 Loss at step 400: 0.03304379805922508 Loss at step 450: 0.053775086998939514 Loss at step 500: 0.038113951683044434 Loss at step 550: 0.03978734835982323 Loss at step 600: 0.06395132839679718 Loss at step 650: 0.029289614409208298 Loss at step 700: 0.040424637496471405 Loss at step 750: 0.046372637152671814 Loss at step 800: 0.04785748943686485 Loss at step 850: 0.03997105360031128 Loss at step 900: 0.04505618289113045 Mean training loss after epoch 183: 0.039033955181521904 EPOCH: 184 Loss at step 0: 0.04478374123573303 Loss at step 50: 0.03696471080183983 Loss at step 100: 0.03851635381579399 Loss at step 150: 0.03278617933392525 Loss at step 200: 0.0499626025557518 Loss at step 250: 0.03889112174510956 Loss at step 300: 0.03005954623222351 Loss at step 350: 0.032187268137931824 Loss at step 400: 0.05501941591501236 Loss at step 450: 0.03525257483124733 Loss at step 500: 0.03313101828098297 Loss at step 550: 0.04683452472090721 Loss at step 600: 0.04980849474668503 Loss at step 650: 0.04530256614089012 Loss at step 700: 0.03643593192100525 Loss at step 750: 0.03432920575141907 Loss at step 800: 0.029971560463309288 Loss at step 850: 0.03182389587163925 Loss at step 900: 0.05276381969451904 Mean training loss after epoch 184: 0.0388997272927878 EPOCH: 185 Loss at step 0: 0.033059317618608475 Loss at step 50: 0.03181805834174156 Loss at step 100: 0.03224778547883034 Loss at step 150: 0.03776165843009949 Loss at step 200: 0.04596346989274025 Loss at step 250: 0.033105283975601196 Loss at step 300: 0.029863011091947556 Loss at step 350: 0.051039889454841614 Loss at step 400: 0.03717060759663582 Loss at step 450: 0.05112181603908539 Loss at step 500: 0.03774179890751839 Loss at step 550: 0.03278662636876106 Loss at step 600: 0.023061348125338554 Loss at step 650: 0.027289168909192085 Loss at step 700: 0.03288483992218971 Loss at step 750: 0.0476837232708931 Loss at step 800: 0.03494765982031822 Loss at step 850: 0.03470047190785408 Loss at step 900: 0.031199805438518524 Mean training loss after epoch 185: 0.039518985688797574 EPOCH: 186 Loss at step 0: 0.030608385801315308 Loss at step 50: 0.05133853107690811 Loss at step 100: 0.035785410553216934 Loss at step 150: 0.027957605198025703 Loss at step 200: 0.04350670799612999 Loss at step 250: 0.035890549421310425 Loss at step 300: 0.028033733367919922 Loss at step 350: 0.05105113983154297 Loss at step 400: 0.05203131586313248 Loss at step 450: 0.03433821722865105 Loss at step 500: 0.04815547168254852 Loss at step 550: 0.03431757912039757 Loss at step 600: 0.040731120854616165 Loss at step 650: 0.029513956978917122 Loss at step 700: 0.03263835608959198 Loss at step 750: 0.043821342289447784 Loss at step 800: 0.03337934613227844 Loss at step 850: 0.05922061577439308 Loss at step 900: 0.03227822855114937 Mean training loss after epoch 186: 0.03909341554874296 EPOCH: 187 Loss at step 0: 0.048928674310445786 Loss at step 50: 0.05704346299171448 Loss at step 100: 0.026194971054792404 Loss at step 150: 0.03255079314112663 Loss at step 200: 0.04841430112719536 Loss at step 250: 0.05043070390820503 Loss at step 300: 0.05300204083323479 Loss at step 350: 0.03876351937651634 Loss at step 400: 0.042237620800733566 Loss at step 450: 0.036134324967861176 Loss at step 500: 0.10071895271539688 Loss at step 550: 0.05166502669453621 Loss at step 600: 0.03981539607048035 Loss at step 650: 0.03159171715378761 Loss at step 700: 0.04729178547859192 Loss at step 750: 0.03368173539638519 Loss at step 800: 0.03773368149995804 Loss at step 850: 0.03567412123084068 Loss at step 900: 0.05092141777276993 Mean training loss after epoch 187: 0.038983210718739766 EPOCH: 188 Loss at step 0: 0.04477743059396744 Loss at step 50: 0.029746346175670624 Loss at step 100: 0.03507571294903755 Loss at step 150: 0.03642008826136589 Loss at step 200: 0.04069194942712784 Loss at step 250: 0.034208133816719055 Loss at step 300: 0.030497226864099503 Loss at step 350: 0.025112565606832504 Loss at step 400: 0.03175532817840576 Loss at step 450: 0.04075225815176964 Loss at step 500: 0.03230821713805199 Loss at step 550: 0.03450753539800644 Loss at step 600: 0.03777884319424629 Loss at step 650: 0.03851217404007912 Loss at step 700: 0.03258569911122322 Loss at step 750: 0.03590553626418114 Loss at step 800: 0.04308297485113144 Loss at step 850: 0.049340177327394485 Loss at step 900: 0.047894176095724106 Mean training loss after epoch 188: 0.03880938190990674 EPOCH: 189 Loss at step 0: 0.03660089895129204 Loss at step 50: 0.04207656905055046 Loss at step 100: 0.03323345258831978 Loss at step 150: 0.03489391505718231 Loss at step 200: 0.05134592205286026 Loss at step 250: 0.036952078342437744 Loss at step 300: 0.029703164473176003 Loss at step 350: 0.05346686765551567 Loss at step 400: 0.0406339094042778 Loss at step 450: 0.04392090439796448 Loss at step 500: 0.03848660737276077 Loss at step 550: 0.032161299139261246 Loss at step 600: 0.04051768407225609 Loss at step 650: 0.029058972373604774 Loss at step 700: 0.053267430514097214 Loss at step 750: 0.03226928785443306 Loss at step 800: 0.03391336649656296 Loss at step 850: 0.02826833724975586 Loss at step 900: 0.035321079194545746 Mean training loss after epoch 189: 0.03930605392355019 EPOCH: 190 Loss at step 0: 0.05376546084880829 Loss at step 50: 0.037496741861104965 Loss at step 100: 0.038770999759435654 Loss at step 150: 0.02862187661230564 Loss at step 200: 0.03889872133731842 Loss at step 250: 0.04027478024363518 Loss at step 300: 0.0426960289478302 Loss at step 350: 0.0376674048602581 Loss at step 400: 0.03353482112288475 Loss at step 450: 0.04574622958898544 Loss at step 500: 0.04225154221057892 Loss at step 550: 0.03030192106962204 Loss at step 600: 0.03313294053077698 Loss at step 650: 0.03571409359574318 Loss at step 700: 0.02995031140744686 Loss at step 750: 0.03174382820725441 Loss at step 800: 0.04587981104850769 Loss at step 850: 0.044129628688097 Loss at step 900: 0.032038964331150055 Mean training loss after epoch 190: 0.03895416312706051 EPOCH: 191 Loss at step 0: 0.036378663033246994 Loss at step 50: 0.03015403263270855 Loss at step 100: 0.05445217341184616 Loss at step 150: 0.04559158533811569 Loss at step 200: 0.03661433979868889 Loss at step 250: 0.03651627153158188 Loss at step 300: 0.03637724369764328 Loss at step 350: 0.03501575440168381 Loss at step 400: 0.03614184632897377 Loss at step 450: 0.03464321047067642 Loss at step 500: 0.034569647163152695 Loss at step 550: 0.0379793643951416 Loss at step 600: 0.03503644838929176 Loss at step 650: 0.047934141010046005 Loss at step 700: 0.032989177852869034 Loss at step 750: 0.027233563363552094 Loss at step 800: 0.04051957651972771 Loss at step 850: 0.040287084877491 Loss at step 900: 0.03482608497142792 Mean training loss after epoch 191: 0.03891531796232343 EPOCH: 192 Loss at step 0: 0.04147909954190254 Loss at step 50: 0.028473179787397385 Loss at step 100: 0.03432323411107063 Loss at step 150: 0.0375903844833374 Loss at step 200: 0.03936471790075302 Loss at step 250: 0.06319249421358109 Loss at step 300: 0.02669503726065159 Loss at step 350: 0.061404433101415634 Loss at step 400: 0.056496769189834595 Loss at step 450: 0.05037005990743637 Loss at step 500: 0.03290452063083649 Loss at step 550: 0.0555347204208374 Loss at step 600: 0.03529955446720123 Loss at step 650: 0.04581906646490097 Loss at step 700: 0.034675613045692444 Loss at step 750: 0.04231615737080574 Loss at step 800: 0.03318033739924431 Loss at step 850: 0.03577800095081329 Loss at step 900: 0.05038568004965782 Mean training loss after epoch 192: 0.03941814621080404 EPOCH: 193 Loss at step 0: 0.035415735095739365 Loss at step 50: 0.046264443546533585 Loss at step 100: 0.04625854641199112 Loss at step 150: 0.032276708632707596 Loss at step 200: 0.05279934033751488 Loss at step 250: 0.04837778955698013 Loss at step 300: 0.03325385972857475 Loss at step 350: 0.027963122352957726 Loss at step 400: 0.03811400383710861 Loss at step 450: 0.03123643435537815 Loss at step 500: 0.04395969584584236 Loss at step 550: 0.03043503127992153 Loss at step 600: 0.03377535566687584 Loss at step 650: 0.03386563062667847 Loss at step 700: 0.0394895002245903 Loss at step 750: 0.03699750825762749 Loss at step 800: 0.03487136960029602 Loss at step 850: 0.03927285224199295 Loss at step 900: 0.03782561048865318 Mean training loss after epoch 193: 0.03819198376024519 EPOCH: 194 Loss at step 0: 0.03237533196806908 Loss at step 50: 0.035295262932777405 Loss at step 100: 0.03610394895076752 Loss at step 150: 0.037992507219314575 Loss at step 200: 0.03148764371871948 Loss at step 250: 0.0346674770116806 Loss at step 300: 0.029178127646446228 Loss at step 350: 0.046320077031850815 Loss at step 400: 0.038839809596538544 Loss at step 450: 0.04046128690242767 Loss at step 500: 0.05617505684494972 Loss at step 550: 0.023581495508551598 Loss at step 600: 0.036590371280908585 Loss at step 650: 0.05393571779131889 Loss at step 700: 0.069367915391922 Loss at step 750: 0.04336010664701462 Loss at step 800: 0.04497930780053139 Loss at step 850: 0.030787678435444832 Loss at step 900: 0.04442349076271057 Mean training loss after epoch 194: 0.0388704427063211 EPOCH: 195 Loss at step 0: 0.060611188411712646 Loss at step 50: 0.044154584407806396 Loss at step 100: 0.047029945999383926 Loss at step 150: 0.0409981943666935 Loss at step 200: 0.032900553196668625 Loss at step 250: 0.03581107407808304 Loss at step 300: 0.037890128791332245 Loss at step 350: 0.03587449714541435 Loss at step 400: 0.05791701003909111 Loss at step 450: 0.03251917287707329 Loss at step 500: 0.03649131581187248 Loss at step 550: 0.028912056237459183 Loss at step 600: 0.035127438604831696 Loss at step 650: 0.04722564294934273 Loss at step 700: 0.03341808170080185 Loss at step 750: 0.05109436437487602 Loss at step 800: 0.037090327590703964 Loss at step 850: 0.03302645683288574 Loss at step 900: 0.04707186296582222 Mean training loss after epoch 195: 0.03874821350304112 EPOCH: 196 Loss at step 0: 0.0337100550532341 Loss at step 50: 0.03762233257293701 Loss at step 100: 0.03844651207327843 Loss at step 150: 0.047390323132276535 Loss at step 200: 0.04727717116475105 Loss at step 250: 0.03370849788188934 Loss at step 300: 0.04930270463228226 Loss at step 350: 0.04576931893825531 Loss at step 400: 0.049766622483730316 Loss at step 450: 0.027108630165457726 Loss at step 500: 0.05714484676718712 Loss at step 550: 0.033943500369787216 Loss at step 600: 0.03327649086713791 Loss at step 650: 0.03375770524144173 Loss at step 700: 0.061523403972387314 Loss at step 750: 0.046659812331199646 Loss at step 800: 0.03266577795147896 Loss at step 850: 0.03372927010059357 Loss at step 900: 0.031558070331811905 Mean training loss after epoch 196: 0.03888965190561024 EPOCH: 197 Loss at step 0: 0.059007614850997925 Loss at step 50: 0.03997175022959709 Loss at step 100: 0.03405724838376045 Loss at step 150: 0.027337998151779175 Loss at step 200: 0.0340658538043499 Loss at step 250: 0.07053939253091812 Loss at step 300: 0.048851918429136276 Loss at step 350: 0.05203927680850029 Loss at step 400: 0.03326457366347313 Loss at step 450: 0.02583160065114498 Loss at step 500: 0.035522304475307465 Loss at step 550: 0.034742388874292374 Loss at step 600: 0.03925861045718193 Loss at step 650: 0.04817241430282593 Loss at step 700: 0.03287101536989212 Loss at step 750: 0.035301972180604935 Loss at step 800: 0.053145311772823334 Loss at step 850: 0.031259045004844666 Loss at step 900: 0.05004565417766571 Mean training loss after epoch 197: 0.038227055473987866 EPOCH: 198 Loss at step 0: 0.030746348202228546 Loss at step 50: 0.036975014954805374 Loss at step 100: 0.04746312275528908 Loss at step 150: 0.039533328264951706 Loss at step 200: 0.03317964822053909 Loss at step 250: 0.03495556861162186 Loss at step 300: 0.039996176958084106 Loss at step 350: 0.045914679765701294 Loss at step 400: 0.033994901925325394 Loss at step 450: 0.02843667007982731 Loss at step 500: 0.034881606698036194 Loss at step 550: 0.04047463834285736 Loss at step 600: 0.04381750896573067 Loss at step 650: 0.03301554173231125 Loss at step 700: 0.035445235669612885 Loss at step 750: 0.03355494514107704 Loss at step 800: 0.04971516877412796 Loss at step 850: 0.053038645535707474 Loss at step 900: 0.032453522086143494 Mean training loss after epoch 198: 0.03945544810056178 EPOCH: 199 Loss at step 0: 0.03342563286423683 Loss at step 50: 0.028049608692526817 Loss at step 100: 0.027587953954935074 Loss at step 150: 0.036282915621995926 Loss at step 200: 0.042191073298454285 Loss at step 250: 0.035924144089221954 Loss at step 300: 0.05328305810689926 Loss at step 350: 0.03483545407652855 Loss at step 400: 0.04354650154709816 Loss at step 450: 0.052058957517147064 Loss at step 500: 0.03887229040265083 Loss at step 550: 0.08071530610322952 Loss at step 600: 0.03114897385239601 Loss at step 650: 0.036469630897045135 Loss at step 700: 0.047903917729854584 Loss at step 750: 0.03672577440738678 Loss at step 800: 0.03693230077624321 Loss at step 850: 0.03923816606402397 Loss at step 900: 0.031142491847276688 Mean training loss after epoch 199: 0.03941454091019976 EPOCH: 200 Loss at step 0: 0.03448416292667389 Loss at step 50: 0.04204482212662697 Loss at step 100: 0.03497808799147606 Loss at step 150: 0.0527622252702713 Loss at step 200: 0.03832058981060982 Loss at step 250: 0.03115082159638405 Loss at step 300: 0.045666057616472244 Loss at step 350: 0.03208445385098457 Loss at step 400: 0.031334977596998215 Loss at step 450: 0.03196084499359131 Loss at step 500: 0.02949412912130356 Loss at step 550: 0.04885036498308182 Loss at step 600: 0.03381315618753433 Loss at step 650: 0.0339878648519516 Loss at step 700: 0.03733561933040619 Loss at step 750: 0.039158303290605545 Loss at step 800: 0.03603089600801468 Loss at step 850: 0.03200925514101982 Loss at step 900: 0.044060662388801575 Mean training loss after epoch 200: 0.03900172293130586 EPOCH: 201 Loss at step 0: 0.03608107194304466 Loss at step 50: 0.02876424230635166 Loss at step 100: 0.03640037402510643 Loss at step 150: 0.04781084135174751 Loss at step 200: 0.05266709625720978 Loss at step 250: 0.033672984689474106 Loss at step 300: 0.0269797183573246 Loss at step 350: 0.04562553018331528 Loss at step 400: 0.05222082883119583 Loss at step 450: 0.052432432770729065 Loss at step 500: 0.0430750846862793 Loss at step 550: 0.03657301142811775 Loss at step 600: 0.026580404490232468 Loss at step 650: 0.03239564970135689 Loss at step 700: 0.026605894789099693 Loss at step 750: 0.035307876765728 Loss at step 800: 0.03531424328684807 Loss at step 850: 0.044784389436244965 Loss at step 900: 0.038285993039608 Mean training loss after epoch 201: 0.03924384835297302 EPOCH: 202 Loss at step 0: 0.034410085529088974 Loss at step 50: 0.03751525282859802 Loss at step 100: 0.040865011513233185 Loss at step 150: 0.03878386691212654 Loss at step 200: 0.04949237406253815 Loss at step 250: 0.029782790690660477 Loss at step 300: 0.03072093427181244 Loss at step 350: 0.03746283799409866 Loss at step 400: 0.03612561523914337 Loss at step 450: 0.04694696143269539 Loss at step 500: 0.033120445907115936 Loss at step 550: 0.036482084542512894 Loss at step 600: 0.03173886612057686 Loss at step 650: 0.04167407751083374 Loss at step 700: 0.07123087346553802 Loss at step 750: 0.04836799576878548 Loss at step 800: 0.03589560091495514 Loss at step 850: 0.03556542843580246 Loss at step 900: 0.031001700088381767 Mean training loss after epoch 202: 0.03912837048178352 EPOCH: 203 Loss at step 0: 0.03857658803462982 Loss at step 50: 0.03261326625943184 Loss at step 100: 0.041330814361572266 Loss at step 150: 0.04896312579512596 Loss at step 200: 0.030943986028432846 Loss at step 250: 0.029989639297127724 Loss at step 300: 0.02920248545706272 Loss at step 350: 0.03491934388875961 Loss at step 400: 0.044427916407585144 Loss at step 450: 0.050978220999240875 Loss at step 500: 0.04809482768177986 Loss at step 550: 0.03796686232089996 Loss at step 600: 0.05045905336737633 Loss at step 650: 0.04250432923436165 Loss at step 700: 0.033959899097681046 Loss at step 750: 0.0571676641702652 Loss at step 800: 0.06508304178714752 Loss at step 850: 0.0528600849211216 Loss at step 900: 0.06640968471765518 Mean training loss after epoch 203: 0.03919593990047667 EPOCH: 204 Loss at step 0: 0.03565060719847679 Loss at step 50: 0.05724276229739189 Loss at step 100: 0.06661258637905121 Loss at step 150: 0.029290111735463142 Loss at step 200: 0.03637978434562683 Loss at step 250: 0.03921637311577797 Loss at step 300: 0.03259154409170151 Loss at step 350: 0.05091315135359764 Loss at step 400: 0.041239768266677856 Loss at step 450: 0.030525902286171913 Loss at step 500: 0.03625475615262985 Loss at step 550: 0.05059317126870155 Loss at step 600: 0.03669440373778343 Loss at step 650: 0.05405686795711517 Loss at step 700: 0.049178317189216614 Loss at step 750: 0.029772713780403137 Loss at step 800: 0.034465186297893524 Loss at step 850: 0.03069974295794964 Loss at step 900: 0.03190544620156288 Mean training loss after epoch 204: 0.03823181749113015 EPOCH: 205 Loss at step 0: 0.03676017001271248 Loss at step 50: 0.03938576951622963 Loss at step 100: 0.03508846461772919 Loss at step 150: 0.04729112610220909 Loss at step 200: 0.030423976480960846 Loss at step 250: 0.04223884642124176 Loss at step 300: 0.0319378525018692 Loss at step 350: 0.022482652217149734 Loss at step 400: 0.048247355967760086 Loss at step 450: 0.03762287646532059 Loss at step 500: 0.04755551740527153 Loss at step 550: 0.02959769032895565 Loss at step 600: 0.03311510384082794 Loss at step 650: 0.05192120000720024 Loss at step 700: 0.0575849749147892 Loss at step 750: 0.031640395522117615 Loss at step 800: 0.038834284991025925 Loss at step 850: 0.03184034302830696 Loss at step 900: 0.06182391196489334 Mean training loss after epoch 205: 0.038580602948377125 EPOCH: 206 Loss at step 0: 0.03930177539587021 Loss at step 50: 0.03205115720629692 Loss at step 100: 0.050223447382450104 Loss at step 150: 0.0298319049179554 Loss at step 200: 0.03390884771943092 Loss at step 250: 0.0328381210565567 Loss at step 300: 0.03294454514980316 Loss at step 350: 0.04002123698592186 Loss at step 400: 0.036691345274448395 Loss at step 450: 0.03283528983592987 Loss at step 500: 0.03423246368765831 Loss at step 550: 0.026798665523529053 Loss at step 600: 0.03723469376564026 Loss at step 650: 0.03705236315727234 Loss at step 700: 0.055472198873758316 Loss at step 750: 0.034867074340581894 Loss at step 800: 0.05657971650362015 Loss at step 850: 0.03195787966251373 Loss at step 900: 0.0483744777739048 Mean training loss after epoch 206: 0.038998424734817 EPOCH: 207 Loss at step 0: 0.036123067140579224 Loss at step 50: 0.03716934472322464 Loss at step 100: 0.03470231220126152 Loss at step 150: 0.05521228909492493 Loss at step 200: 0.032477233558893204 Loss at step 250: 0.06308622658252716 Loss at step 300: 0.04446502402424812 Loss at step 350: 0.03490413725376129 Loss at step 400: 0.0383303239941597 Loss at step 450: 0.03637818992137909 Loss at step 500: 0.04971726983785629 Loss at step 550: 0.033189382404088974 Loss at step 600: 0.03258603811264038 Loss at step 650: 0.028821464627981186 Loss at step 700: 0.034130848944187164 Loss at step 750: 0.031304992735385895 Loss at step 800: 0.029026390984654427 Loss at step 850: 0.05989857017993927 Loss at step 900: 0.02793525531888008 Mean training loss after epoch 207: 0.03880610405755386 EPOCH: 208 Loss at step 0: 0.046121250838041306 Loss at step 50: 0.03646767884492874 Loss at step 100: 0.04838304594159126 Loss at step 150: 0.0347878523170948 Loss at step 200: 0.03002927452325821 Loss at step 250: 0.036177463829517365 Loss at step 300: 0.03156193718314171 Loss at step 350: 0.04319934546947479 Loss at step 400: 0.03310208395123482 Loss at step 450: 0.031349729746580124 Loss at step 500: 0.048831384629011154 Loss at step 550: 0.03179125115275383 Loss at step 600: 0.029397141188383102 Loss at step 650: 0.04012276604771614 Loss at step 700: 0.03291450813412666 Loss at step 750: 0.03407580032944679 Loss at step 800: 0.03836648166179657 Loss at step 850: 0.02987813390791416 Loss at step 900: 0.03476279228925705 Mean training loss after epoch 208: 0.03903247418799507 EPOCH: 209 Loss at step 0: 0.035792458802461624 Loss at step 50: 0.0344044454395771 Loss at step 100: 0.038120657205581665 Loss at step 150: 0.03713478893041611 Loss at step 200: 0.03925497084856033 Loss at step 250: 0.033266931772232056 Loss at step 300: 0.04779272526502609 Loss at step 350: 0.047744739800691605 Loss at step 400: 0.0595935583114624 Loss at step 450: 0.044945500791072845 Loss at step 500: 0.05028168484568596 Loss at step 550: 0.053303834050893784 Loss at step 600: 0.05259212106466293 Loss at step 650: 0.0320139117538929 Loss at step 700: 0.034180693328380585 Loss at step 750: 0.04007092863321304 Loss at step 800: 0.0532253198325634 Loss at step 850: 0.058895643800497055 Loss at step 900: 0.030731121078133583 Mean training loss after epoch 209: 0.03883622211616621 EPOCH: 210 Loss at step 0: 0.029939644038677216 Loss at step 50: 0.05373511463403702 Loss at step 100: 0.030339794233441353 Loss at step 150: 0.03452230244874954 Loss at step 200: 0.04094143584370613 Loss at step 250: 0.0369291752576828 Loss at step 300: 0.037235379219055176 Loss at step 350: 0.035384539514780045 Loss at step 400: 0.035744886845350266 Loss at step 450: 0.025507070124149323 Loss at step 500: 0.03748571500182152 Loss at step 550: 0.030373048037290573 Loss at step 600: 0.042601700872182846 Loss at step 650: 0.0389503575861454 Loss at step 700: 0.040219951421022415 Loss at step 750: 0.053821317851543427 Loss at step 800: 0.028183812275528908 Loss at step 850: 0.03350718691945076 Loss at step 900: 0.03622322529554367 Mean training loss after epoch 210: 0.03861672730484941 EPOCH: 211 Loss at step 0: 0.03263504058122635 Loss at step 50: 0.031510088592767715 Loss at step 100: 0.029639696702361107 Loss at step 150: 0.03417270630598068 Loss at step 200: 0.033268772065639496 Loss at step 250: 0.03097095526754856 Loss at step 300: 0.03059498965740204 Loss at step 350: 0.033541303128004074 Loss at step 400: 0.024651994928717613 Loss at step 450: 0.03303629532456398 Loss at step 500: 0.032551638782024384 Loss at step 550: 0.03279938921332359 Loss at step 600: 0.04136810451745987 Loss at step 650: 0.029112065210938454 Loss at step 700: 0.03564835712313652 Loss at step 750: 0.05012630671262741 Loss at step 800: 0.047952260822057724 Loss at step 850: 0.031387630850076675 Loss at step 900: 0.03142992779612541 Mean training loss after epoch 211: 0.038534771490182834 EPOCH: 212 Loss at step 0: 0.03494802862405777 Loss at step 50: 0.03748581185936928 Loss at step 100: 0.03624940663576126 Loss at step 150: 0.04951558634638786 Loss at step 200: 0.04103242978453636 Loss at step 250: 0.02790713496506214 Loss at step 300: 0.04231208562850952 Loss at step 350: 0.03209036961197853 Loss at step 400: 0.028997186571359634 Loss at step 450: 0.029046665877103806 Loss at step 500: 0.03579002991318703 Loss at step 550: 0.03279267996549606 Loss at step 600: 0.05336654186248779 Loss at step 650: 0.03835752606391907 Loss at step 700: 0.04838625341653824 Loss at step 750: 0.04036305472254753 Loss at step 800: 0.028935911133885384 Loss at step 850: 0.030567439272999763 Loss at step 900: 0.04995988681912422 Mean training loss after epoch 212: 0.03873438690937976 EPOCH: 213 Loss at step 0: 0.0305633582174778 Loss at step 50: 0.031451284885406494 Loss at step 100: 0.033198531717061996 Loss at step 150: 0.034492358565330505 Loss at step 200: 0.03673641383647919 Loss at step 250: 0.05664115399122238 Loss at step 300: 0.031177658587694168 Loss at step 350: 0.03173810616135597 Loss at step 400: 0.03554634749889374 Loss at step 450: 0.05733977258205414 Loss at step 500: 0.03406798839569092 Loss at step 550: 0.03751193732023239 Loss at step 600: 0.030732285231351852 Loss at step 650: 0.04684927314519882 Loss at step 700: 0.04643711820244789 Loss at step 750: 0.032390985637903214 Loss at step 800: 0.05858895182609558 Loss at step 850: 0.039080142974853516 Loss at step 900: 0.03865396976470947 Mean training loss after epoch 213: 0.039167814177157145 EPOCH: 214 Loss at step 0: 0.034820254892110825 Loss at step 50: 0.06212383136153221 Loss at step 100: 0.035429082810878754 Loss at step 150: 0.0358148068189621 Loss at step 200: 0.032212674617767334 Loss at step 250: 0.034741975367069244 Loss at step 300: 0.03496982902288437 Loss at step 350: 0.03637605533003807 Loss at step 400: 0.05291757360100746 Loss at step 450: 0.04087560996413231 Loss at step 500: 0.034480463713407516 Loss at step 550: 0.037715550512075424 Loss at step 600: 0.05320248752832413 Loss at step 650: 0.0341896116733551 Loss at step 700: 0.05204067379236221 Loss at step 750: 0.04675726965069771 Loss at step 800: 0.03137475997209549 Loss at step 850: 0.03327544406056404 Loss at step 900: 0.03216896578669548 Mean training loss after epoch 214: 0.03909944554866313 EPOCH: 215 Loss at step 0: 0.03133048862218857 Loss at step 50: 0.03323538601398468 Loss at step 100: 0.030176354572176933 Loss at step 150: 0.06711596995592117 Loss at step 200: 0.050807952880859375 Loss at step 250: 0.043646421283483505 Loss at step 300: 0.04855671152472496 Loss at step 350: 0.03535478934645653 Loss at step 400: 0.03658371418714523 Loss at step 450: 0.02738657221198082 Loss at step 500: 0.04076555371284485 Loss at step 550: 0.0351577028632164 Loss at step 600: 0.03501153364777565 Loss at step 650: 0.0476493202149868 Loss at step 700: 0.03237270936369896 Loss at step 750: 0.05486316978931427 Loss at step 800: 0.03001241758465767 Loss at step 850: 0.03485102206468582 Loss at step 900: 0.042807675898075104 Mean training loss after epoch 215: 0.038794898108315115 EPOCH: 216 Loss at step 0: 0.038350146263837814 Loss at step 50: 0.029338212683796883 Loss at step 100: 0.02911316603422165 Loss at step 150: 0.028330136090517044 Loss at step 200: 0.030247842893004417 Loss at step 250: 0.04187742620706558 Loss at step 300: 0.02866005338728428 Loss at step 350: 0.031296275556087494 Loss at step 400: 0.04535824805498123 Loss at step 450: 0.032924585044384 Loss at step 500: 0.047757040709257126 Loss at step 550: 0.030043302103877068 Loss at step 600: 0.04660044610500336 Loss at step 650: 0.03011370822787285 Loss at step 700: 0.03834841027855873 Loss at step 750: 0.04171215742826462 Loss at step 800: 0.02969386987388134 Loss at step 850: 0.031841497868299484 Loss at step 900: 0.03773115575313568 Mean training loss after epoch 216: 0.038765641576699865 EPOCH: 217 Loss at step 0: 0.031741268932819366 Loss at step 50: 0.03626462444663048 Loss at step 100: 0.04653813689947128 Loss at step 150: 0.02959255874156952 Loss at step 200: 0.04867061227560043 Loss at step 250: 0.03144945204257965 Loss at step 300: 0.033888787031173706 Loss at step 350: 0.03249417990446091 Loss at step 400: 0.033298563212156296 Loss at step 450: 0.03513026982545853 Loss at step 500: 0.03576437383890152 Loss at step 550: 0.030613582581281662 Loss at step 600: 0.03500447794795036 Loss at step 650: 0.04801984503865242 Loss at step 700: 0.03224053233861923 Loss at step 750: 0.039313822984695435 Loss at step 800: 0.04416655749082565 Loss at step 850: 0.03519587591290474 Loss at step 900: 0.047773104161024094 Mean training loss after epoch 217: 0.03856635142935873 EPOCH: 218 Loss at step 0: 0.04131985455751419 Loss at step 50: 0.04961838945746422 Loss at step 100: 0.05226267874240875 Loss at step 150: 0.029256299138069153 Loss at step 200: 0.030823132023215294 Loss at step 250: 0.03272109478712082 Loss at step 300: 0.05129536613821983 Loss at step 350: 0.039657581597566605 Loss at step 400: 0.031544819474220276 Loss at step 450: 0.0317593477666378 Loss at step 500: 0.03580109030008316 Loss at step 550: 0.03561868518590927 Loss at step 600: 0.030231596902012825 Loss at step 650: 0.03684688359498978 Loss at step 700: 0.035572025924921036 Loss at step 750: 0.04066815972328186 Loss at step 800: 0.03826131671667099 Loss at step 850: 0.03254857659339905 Loss at step 900: 0.03815583884716034 Mean training loss after epoch 218: 0.03871198563075968 EPOCH: 219 Loss at step 0: 0.03232763335108757 Loss at step 50: 0.032373908907175064 Loss at step 100: 0.049574948847293854 Loss at step 150: 0.05158744379878044 Loss at step 200: 0.03188695013523102 Loss at step 250: 0.03646356612443924 Loss at step 300: 0.03662825748324394 Loss at step 350: 0.03089234232902527 Loss at step 400: 0.03349229320883751 Loss at step 450: 0.029799120500683784 Loss at step 500: 0.05702761933207512 Loss at step 550: 0.0669003576040268 Loss at step 600: 0.041376493871212006 Loss at step 650: 0.030917033553123474 Loss at step 700: 0.044996436685323715 Loss at step 750: 0.03363458067178726 Loss at step 800: 0.06501676142215729 Loss at step 850: 0.028591252863407135 Loss at step 900: 0.03952722251415253 Mean training loss after epoch 219: 0.038555985000897956 EPOCH: 220 Loss at step 0: 0.03197668120265007 Loss at step 50: 0.038616959005594254 Loss at step 100: 0.029931532219052315 Loss at step 150: 0.032169267535209656 Loss at step 200: 0.044209469109773636 Loss at step 250: 0.04873202368617058 Loss at step 300: 0.06394154578447342 Loss at step 350: 0.03337306156754494 Loss at step 400: 0.03250643238425255 Loss at step 450: 0.034415338188409805 Loss at step 500: 0.034679826349020004 Loss at step 550: 0.03250497207045555 Loss at step 600: 0.035618748515844345 Loss at step 650: 0.03961366042494774 Loss at step 700: 0.030626051127910614 Loss at step 750: 0.04939880967140198 Loss at step 800: 0.02721027098596096 Loss at step 850: 0.03401949629187584 Loss at step 900: 0.03158809244632721 Mean training loss after epoch 220: 0.0385771609727603 EPOCH: 221 Loss at step 0: 0.033307261765003204 Loss at step 50: 0.027653153985738754 Loss at step 100: 0.04842335730791092 Loss at step 150: 0.05061378702521324 Loss at step 200: 0.03869916871190071 Loss at step 250: 0.035439472645521164 Loss at step 300: 0.03720962628722191 Loss at step 350: 0.035217709839344025 Loss at step 400: 0.05012897029519081 Loss at step 450: 0.027615439146757126 Loss at step 500: 0.03626246377825737 Loss at step 550: 0.05002190172672272 Loss at step 600: 0.030623629689216614 Loss at step 650: 0.025477813556790352 Loss at step 700: 0.053467243909835815 Loss at step 750: 0.030479853972792625 Loss at step 800: 0.0320393368601799 Loss at step 850: 0.05970003083348274 Loss at step 900: 0.03120565414428711 Mean training loss after epoch 221: 0.03843559722688152 EPOCH: 222 Loss at step 0: 0.037890829145908356 Loss at step 50: 0.030479932203888893 Loss at step 100: 0.08171330392360687 Loss at step 150: 0.04287896677851677 Loss at step 200: 0.03289856016635895 Loss at step 250: 0.032602887600660324 Loss at step 300: 0.035880934447050095 Loss at step 350: 0.03647756949067116 Loss at step 400: 0.038081977516412735 Loss at step 450: 0.05719364434480667 Loss at step 500: 0.030570030212402344 Loss at step 550: 0.03147860988974571 Loss at step 600: 0.037512194365262985 Loss at step 650: 0.030427712947130203 Loss at step 700: 0.03863087296485901 Loss at step 750: 0.034135472029447556 Loss at step 800: 0.030253596603870392 Loss at step 850: 0.040759675204753876 Loss at step 900: 0.033469308167696 Mean training loss after epoch 222: 0.039099985175629036 EPOCH: 223 Loss at step 0: 0.03163981810212135 Loss at step 50: 0.027593662962317467 Loss at step 100: 0.0573481060564518 Loss at step 150: 0.053460631519556046 Loss at step 200: 0.036425571888685226 Loss at step 250: 0.08300038427114487 Loss at step 300: 0.029846543446183205 Loss at step 350: 0.031631071120500565 Loss at step 400: 0.049860309809446335 Loss at step 450: 0.06837867200374603 Loss at step 500: 0.06802817434072495 Loss at step 550: 0.029236018657684326 Loss at step 600: 0.039657142013311386 Loss at step 650: 0.034957949072122574 Loss at step 700: 0.03908216208219528 Loss at step 750: 0.0327768512070179 Loss at step 800: 0.04471376910805702 Loss at step 850: 0.035406459122896194 Loss at step 900: 0.02970868907868862 Mean training loss after epoch 223: 0.0390934235573228 EPOCH: 224 Loss at step 0: 0.02844778075814247 Loss at step 50: 0.03809535875916481 Loss at step 100: 0.033365823328495026 Loss at step 150: 0.049617569893598557 Loss at step 200: 0.030383432283997536 Loss at step 250: 0.024835946038365364 Loss at step 300: 0.03601111099123955 Loss at step 350: 0.0284729041159153 Loss at step 400: 0.03506002947688103 Loss at step 450: 0.06403302401304245 Loss at step 500: 0.05043325573205948 Loss at step 550: 0.05895457789301872 Loss at step 600: 0.030324801802635193 Loss at step 650: 0.034179799258708954 Loss at step 700: 0.039408691227436066 Loss at step 750: 0.04213355481624603 Loss at step 800: 0.03002680465579033 Loss at step 850: 0.03414057940244675 Loss at step 900: 0.024349508807063103 Mean training loss after epoch 224: 0.03854314730500679 EPOCH: 225 Loss at step 0: 0.07022244483232498 Loss at step 50: 0.05942574143409729 Loss at step 100: 0.04441233351826668 Loss at step 150: 0.03319178521633148 Loss at step 200: 0.061659928411245346 Loss at step 250: 0.054623764008283615 Loss at step 300: 0.02905896119773388 Loss at step 350: 0.03376946970820427 Loss at step 400: 0.0622660256922245 Loss at step 450: 0.03281800076365471 Loss at step 500: 0.03032631240785122 Loss at step 550: 0.04856538772583008 Loss at step 600: 0.037682436406612396 Loss at step 650: 0.0341029018163681 Loss at step 700: 0.035323429852724075 Loss at step 750: 0.0317775122821331 Loss at step 800: 0.05041889101266861 Loss at step 850: 0.03912316635251045 Loss at step 900: 0.06678944081068039 Mean training loss after epoch 225: 0.038688860124727685 EPOCH: 226 Loss at step 0: 0.07034695148468018 Loss at step 50: 0.03673253580927849 Loss at step 100: 0.030262352898716927 Loss at step 150: 0.033561330288648605 Loss at step 200: 0.033116813749074936 Loss at step 250: 0.03107394091784954 Loss at step 300: 0.05830236151814461 Loss at step 350: 0.03608950600028038 Loss at step 400: 0.028155099600553513 Loss at step 450: 0.031322818249464035 Loss at step 500: 0.03842392563819885 Loss at step 550: 0.03311774507164955 Loss at step 600: 0.030197488144040108 Loss at step 650: 0.035272642970085144 Loss at step 700: 0.02929736115038395 Loss at step 750: 0.04660622775554657 Loss at step 800: 0.03252449631690979 Loss at step 850: 0.03088449314236641 Loss at step 900: 0.03738436475396156 Mean training loss after epoch 226: 0.03955555847609666 EPOCH: 227 Loss at step 0: 0.029523007571697235 Loss at step 50: 0.06260868906974792 Loss at step 100: 0.04959005117416382 Loss at step 150: 0.03302012383937836 Loss at step 200: 0.056059591472148895 Loss at step 250: 0.04111262783408165 Loss at step 300: 0.048444513231515884 Loss at step 350: 0.02714165300130844 Loss at step 400: 0.03722955659031868 Loss at step 450: 0.05122249945998192 Loss at step 500: 0.030988814309239388 Loss at step 550: 0.06166931986808777 Loss at step 600: 0.041224509477615356 Loss at step 650: 0.034607186913490295 Loss at step 700: 0.03189275786280632 Loss at step 750: 0.03707089647650719 Loss at step 800: 0.03599362075328827 Loss at step 850: 0.030553320422768593 Loss at step 900: 0.033909816294908524 Mean training loss after epoch 227: 0.03898256691867736 EPOCH: 228 Loss at step 0: 0.04827951639890671 Loss at step 50: 0.03732552379369736 Loss at step 100: 0.041935425251722336 Loss at step 150: 0.02780473604798317 Loss at step 200: 0.02627200447022915 Loss at step 250: 0.028607193380594254 Loss at step 300: 0.030610159039497375 Loss at step 350: 0.034342437982559204 Loss at step 400: 0.051105283200740814 Loss at step 450: 0.03107832744717598 Loss at step 500: 0.03723703697323799 Loss at step 550: 0.03127643093466759 Loss at step 600: 0.050782810896635056 Loss at step 650: 0.03957713395357132 Loss at step 700: 0.02813553810119629 Loss at step 750: 0.04094870761036873 Loss at step 800: 0.05094950273633003 Loss at step 850: 0.05368144065141678 Loss at step 900: 0.040793485939502716 Mean training loss after epoch 228: 0.03846595825345468 EPOCH: 229 Loss at step 0: 0.04654906690120697 Loss at step 50: 0.03062305599451065 Loss at step 100: 0.05187810957431793 Loss at step 150: 0.04106740653514862 Loss at step 200: 0.04434585943818092 Loss at step 250: 0.03150363266468048 Loss at step 300: 0.03320511430501938 Loss at step 350: 0.046548813581466675 Loss at step 400: 0.045509375631809235 Loss at step 450: 0.030184946954250336 Loss at step 500: 0.030279263854026794 Loss at step 550: 0.029224146157503128 Loss at step 600: 0.04829501360654831 Loss at step 650: 0.03988927602767944 Loss at step 700: 0.04733537137508392 Loss at step 750: 0.03534760698676109 Loss at step 800: 0.03940034285187721 Loss at step 850: 0.034822508692741394 Loss at step 900: 0.06359588354825974 Mean training loss after epoch 229: 0.03859097589411016 EPOCH: 230 Loss at step 0: 0.042286865413188934 Loss at step 50: 0.04724636673927307 Loss at step 100: 0.03768102079629898 Loss at step 150: 0.048915181308984756 Loss at step 200: 0.053905826061964035 Loss at step 250: 0.04714168235659599 Loss at step 300: 0.03379489481449127 Loss at step 350: 0.033223796635866165 Loss at step 400: 0.04826541617512703 Loss at step 450: 0.03231550008058548 Loss at step 500: 0.026478756219148636 Loss at step 550: 0.051870252937078476 Loss at step 600: 0.06250990182161331 Loss at step 650: 0.04642006382346153 Loss at step 700: 0.03512794151902199 Loss at step 750: 0.05037693306803703 Loss at step 800: 0.03080512024462223 Loss at step 850: 0.06194986402988434 Loss at step 900: 0.029427601024508476 Mean training loss after epoch 230: 0.03902272368544963 EPOCH: 231 Loss at step 0: 0.035307157784700394 Loss at step 50: 0.03270497918128967 Loss at step 100: 0.05100880563259125 Loss at step 150: 0.04540596157312393 Loss at step 200: 0.03655621409416199 Loss at step 250: 0.02974519319832325 Loss at step 300: 0.0265053678303957 Loss at step 350: 0.0320795439183712 Loss at step 400: 0.03563934937119484 Loss at step 450: 0.03832893446087837 Loss at step 500: 0.03433196246623993 Loss at step 550: 0.08082190155982971 Loss at step 600: 0.034218937158584595 Loss at step 650: 0.03983144462108612 Loss at step 700: 0.05942707508802414 Loss at step 750: 0.03208743780851364 Loss at step 800: 0.03630710393190384 Loss at step 850: 0.031194031238555908 Loss at step 900: 0.04677927494049072 Mean training loss after epoch 231: 0.03862281920495572 EPOCH: 232 Loss at step 0: 0.06082414090633392 Loss at step 50: 0.04318132996559143 Loss at step 100: 0.03431619703769684 Loss at step 150: 0.034456703811883926 Loss at step 200: 0.03263910487294197 Loss at step 250: 0.03749016672372818 Loss at step 300: 0.030205905437469482 Loss at step 350: 0.037272196263074875 Loss at step 400: 0.08236411958932877 Loss at step 450: 0.03707977011799812 Loss at step 500: 0.04791644960641861 Loss at step 550: 0.03228437900543213 Loss at step 600: 0.040226876735687256 Loss at step 650: 0.04937141761183739 Loss at step 700: 0.03636350855231285 Loss at step 750: 0.04392433911561966 Loss at step 800: 0.02780865505337715 Loss at step 850: 0.035982441157102585 Loss at step 900: 0.06588919460773468 Mean training loss after epoch 232: 0.03923607193656377 EPOCH: 233 Loss at step 0: 0.034574441611766815 Loss at step 50: 0.05462922900915146 Loss at step 100: 0.033580001443624496 Loss at step 150: 0.03486957028508186 Loss at step 200: 0.0332089327275753 Loss at step 250: 0.030099935829639435 Loss at step 300: 0.0319361612200737 Loss at step 350: 0.031049080193042755 Loss at step 400: 0.04326486587524414 Loss at step 450: 0.0636681392788887 Loss at step 500: 0.028879398480057716 Loss at step 550: 0.03339890390634537 Loss at step 600: 0.03539077565073967 Loss at step 650: 0.040725916624069214 Loss at step 700: 0.04704589024186134 Loss at step 750: 0.055739887058734894 Loss at step 800: 0.026596875861287117 Loss at step 850: 0.06466401368379593 Loss at step 900: 0.029591111466288567 Mean training loss after epoch 233: 0.038573727228128706 EPOCH: 234 Loss at step 0: 0.03220190107822418 Loss at step 50: 0.03624245896935463 Loss at step 100: 0.028894346207380295 Loss at step 150: 0.03538462519645691 Loss at step 200: 0.028320934623479843 Loss at step 250: 0.0321003794670105 Loss at step 300: 0.024427611380815506 Loss at step 350: 0.03659152239561081 Loss at step 400: 0.04311208426952362 Loss at step 450: 0.0642971321940422 Loss at step 500: 0.03917315974831581 Loss at step 550: 0.05116429552435875 Loss at step 600: 0.03522849082946777 Loss at step 650: 0.03327084332704544 Loss at step 700: 0.03227394446730614 Loss at step 750: 0.036047425121068954 Loss at step 800: 0.0479966476559639 Loss at step 850: 0.03027443215250969 Loss at step 900: 0.03931645303964615 Mean training loss after epoch 234: 0.0383284426788722 EPOCH: 235 Loss at step 0: 0.037100132554769516 Loss at step 50: 0.03590553626418114 Loss at step 100: 0.049465227872133255 Loss at step 150: 0.05028872564435005 Loss at step 200: 0.03418462723493576 Loss at step 250: 0.029636060819029808 Loss at step 300: 0.03470972925424576 Loss at step 350: 0.04670199379324913 Loss at step 400: 0.03143817558884621 Loss at step 450: 0.03301733732223511 Loss at step 500: 0.039559245109558105 Loss at step 550: 0.03448006883263588 Loss at step 600: 0.03556208312511444 Loss at step 650: 0.035292185842990875 Loss at step 700: 0.03138948231935501 Loss at step 750: 0.03332112357020378 Loss at step 800: 0.046298425644636154 Loss at step 850: 0.044403813779354095 Loss at step 900: 0.026095671579241753 Mean training loss after epoch 235: 0.03916312653674627 EPOCH: 236 Loss at step 0: 0.06218736991286278 Loss at step 50: 0.044210802763700485 Loss at step 100: 0.035077303647994995 Loss at step 150: 0.03270825371146202 Loss at step 200: 0.04108499363064766 Loss at step 250: 0.04944796487689018 Loss at step 300: 0.04576771333813667 Loss at step 350: 0.034644704312086105 Loss at step 400: 0.03042898327112198 Loss at step 450: 0.03401826694607735 Loss at step 500: 0.06649397313594818 Loss at step 550: 0.03142006695270538 Loss at step 600: 0.03693798556923866 Loss at step 650: 0.028506331145763397 Loss at step 700: 0.03812647610902786 Loss at step 750: 0.02821643278002739 Loss at step 800: 0.055208731442689896 Loss at step 850: 0.03501010313630104 Loss at step 900: 0.03546426445245743 Mean training loss after epoch 236: 0.03910887272539995 EPOCH: 237 Loss at step 0: 0.03626174107193947 Loss at step 50: 0.03231123834848404 Loss at step 100: 0.039797767996788025 Loss at step 150: 0.047457147389650345 Loss at step 200: 0.03797951713204384 Loss at step 250: 0.0375983864068985 Loss at step 300: 0.04221799224615097 Loss at step 350: 0.04700365662574768 Loss at step 400: 0.06193067505955696 Loss at step 450: 0.03442120924592018 Loss at step 500: 0.030339643359184265 Loss at step 550: 0.03563765436410904 Loss at step 600: 0.03963075205683708 Loss at step 650: 0.04113083332777023 Loss at step 700: 0.03788047656416893 Loss at step 750: 0.03356481343507767 Loss at step 800: 0.0475701205432415 Loss at step 850: 0.027276229113340378 Loss at step 900: 0.02865569107234478 Mean training loss after epoch 237: 0.039007801350071106 EPOCH: 238 Loss at step 0: 0.03347187861800194 Loss at step 50: 0.03746746852993965 Loss at step 100: 0.05003228411078453 Loss at step 150: 0.049931589514017105 Loss at step 200: 0.05133192613720894 Loss at step 250: 0.03603018820285797 Loss at step 300: 0.05719320476055145 Loss at step 350: 0.04544191062450409 Loss at step 400: 0.028367027640342712 Loss at step 450: 0.036977194249629974 Loss at step 500: 0.048815805464982986 Loss at step 550: 0.0505886934697628 Loss at step 600: 0.04127909243106842 Loss at step 650: 0.043081238865852356 Loss at step 700: 0.03456343710422516 Loss at step 750: 0.032887279987335205 Loss at step 800: 0.03336777538061142 Loss at step 850: 0.03294628486037254 Loss at step 900: 0.037910137325525284 Mean training loss after epoch 238: 0.03837936322317957 EPOCH: 239 Loss at step 0: 0.03615179657936096 Loss at step 50: 0.025037020444869995 Loss at step 100: 0.03438402712345123 Loss at step 150: 0.036021217703819275 Loss at step 200: 0.03250397369265556 Loss at step 250: 0.049476515501737595 Loss at step 300: 0.03522685542702675 Loss at step 350: 0.04905928298830986 Loss at step 400: 0.02624732069671154 Loss at step 450: 0.03200606256723404 Loss at step 500: 0.05197629705071449 Loss at step 550: 0.050037819892168045 Loss at step 600: 0.04677993431687355 Loss at step 650: 0.042846888303756714 Loss at step 700: 0.0466928668320179 Loss at step 750: 0.03682347759604454 Loss at step 800: 0.04968541860580444 Loss at step 850: 0.047215528786182404 Loss at step 900: 0.04372040182352066 Mean training loss after epoch 239: 0.03861922594601475 EPOCH: 240 Loss at step 0: 0.03183655068278313 Loss at step 50: 0.02763986587524414 Loss at step 100: 0.04109526053071022 Loss at step 150: 0.04026518762111664 Loss at step 200: 0.051996853202581406 Loss at step 250: 0.028527425602078438 Loss at step 300: 0.029707537963986397 Loss at step 350: 0.03533201664686203 Loss at step 400: 0.03651180863380432 Loss at step 450: 0.030582789331674576 Loss at step 500: 0.02957247570157051 Loss at step 550: 0.034956831485033035 Loss at step 600: 0.02801669016480446 Loss at step 650: 0.03431185707449913 Loss at step 700: 0.031054919585585594 Loss at step 750: 0.029481979086995125 Loss at step 800: 0.028265465050935745 Loss at step 850: 0.033417247235774994 Loss at step 900: 0.04570600017905235 Mean training loss after epoch 240: 0.03830478865025776 EPOCH: 241 Loss at step 0: 0.03109552152454853 Loss at step 50: 0.030378544703125954 Loss at step 100: 0.03351382911205292 Loss at step 150: 0.040944118052721024 Loss at step 200: 0.0353764183819294 Loss at step 250: 0.04893406853079796 Loss at step 300: 0.040162328630685806 Loss at step 350: 0.0364413857460022 Loss at step 400: 0.026483720168471336 Loss at step 450: 0.03537702187895775 Loss at step 500: 0.03180855140089989 Loss at step 550: 0.029317481443285942 Loss at step 600: 0.03216392919421196 Loss at step 650: 0.030868923291563988 Loss at step 700: 0.03351496160030365 Loss at step 750: 0.03433990478515625 Loss at step 800: 0.030436938628554344 Loss at step 850: 0.029699428007006645 Loss at step 900: 0.030703607946634293 Mean training loss after epoch 241: 0.03813132326176235 EPOCH: 242 Loss at step 0: 0.040227845311164856 Loss at step 50: 0.032529816031455994 Loss at step 100: 0.049842268228530884 Loss at step 150: 0.027479950338602066 Loss at step 200: 0.05180365592241287 Loss at step 250: 0.035594504326581955 Loss at step 300: 0.03524905815720558 Loss at step 350: 0.03551099821925163 Loss at step 400: 0.048216234892606735 Loss at step 450: 0.034285057336091995 Loss at step 500: 0.038849711418151855 Loss at step 550: 0.02756931260228157 Loss at step 600: 0.029658813029527664 Loss at step 650: 0.05021066963672638 Loss at step 700: 0.03648877143859863 Loss at step 750: 0.04287077486515045 Loss at step 800: 0.03557495027780533 Loss at step 850: 0.04305817186832428 Loss at step 900: 0.033267419785261154 Mean training loss after epoch 242: 0.03794264143654533 EPOCH: 243 Loss at step 0: 0.028052018955349922 Loss at step 50: 0.057634565979242325 Loss at step 100: 0.03326396644115448 Loss at step 150: 0.031568024307489395 Loss at step 200: 0.02681826986372471 Loss at step 250: 0.02951916866004467 Loss at step 300: 0.03450676426291466 Loss at step 350: 0.060758814215660095 Loss at step 400: 0.03353974595665932 Loss at step 450: 0.03497523069381714 Loss at step 500: 0.04666362330317497 Loss at step 550: 0.035582661628723145 Loss at step 600: 0.03170207142829895 Loss at step 650: 0.04797527194023132 Loss at step 700: 0.039039187133312225 Loss at step 750: 0.05325045809149742 Loss at step 800: 0.06358706206083298 Loss at step 850: 0.04980865865945816 Loss at step 900: 0.028868576511740685 Mean training loss after epoch 243: 0.038302264214038595 EPOCH: 244 Loss at step 0: 0.02911030501127243 Loss at step 50: 0.0501345619559288 Loss at step 100: 0.029519718140363693 Loss at step 150: 0.03129797428846359 Loss at step 200: 0.02813171222805977 Loss at step 250: 0.033409107476472855 Loss at step 300: 0.03162944316864014 Loss at step 350: 0.03918871656060219 Loss at step 400: 0.04075954481959343 Loss at step 450: 0.031564101576805115 Loss at step 500: 0.042777013033628464 Loss at step 550: 0.032773640006780624 Loss at step 600: 0.031950756907463074 Loss at step 650: 0.04615243524312973 Loss at step 700: 0.024693867191672325 Loss at step 750: 0.04487437382340431 Loss at step 800: 0.027955463156104088 Loss at step 850: 0.044341884553432465 Loss at step 900: 0.03629644960165024 Mean training loss after epoch 244: 0.03890398460656786 EPOCH: 245 Loss at step 0: 0.037340197712183 Loss at step 50: 0.029318930581212044 Loss at step 100: 0.03091849945485592 Loss at step 150: 0.04658926650881767 Loss at step 200: 0.03120698221027851 Loss at step 250: 0.03539435565471649 Loss at step 300: 0.026544760912656784 Loss at step 350: 0.05479392036795616 Loss at step 400: 0.03966592624783516 Loss at step 450: 0.05040717497467995 Loss at step 500: 0.04143224284052849 Loss at step 550: 0.03432741016149521 Loss at step 600: 0.028360361233353615 Loss at step 650: 0.026759983971714973 Loss at step 700: 0.034197866916656494 Loss at step 750: 0.04659062251448631 Loss at step 800: 0.036194294691085815 Loss at step 850: 0.036296725273132324 Loss at step 900: 0.0557585135102272 Mean training loss after epoch 245: 0.03908835086963578 EPOCH: 246 Loss at step 0: 0.048487529158592224 Loss at step 50: 0.032858707010746 Loss at step 100: 0.0503917895257473 Loss at step 150: 0.03369736671447754 Loss at step 200: 0.04178541526198387 Loss at step 250: 0.037794653326272964 Loss at step 300: 0.03019559010863304 Loss at step 350: 0.049525436013936996 Loss at step 400: 0.03243931755423546 Loss at step 450: 0.03371049463748932 Loss at step 500: 0.024626167491078377 Loss at step 550: 0.03406745567917824 Loss at step 600: 0.050971776247024536 Loss at step 650: 0.032722216099500656 Loss at step 700: 0.03463369235396385 Loss at step 750: 0.06519994884729385 Loss at step 800: 0.06545976549386978 Loss at step 850: 0.037478622049093246 Loss at step 900: 0.04684297367930412 Mean training loss after epoch 246: 0.038927495981981634 EPOCH: 247 Loss at step 0: 0.042789679020643234 Loss at step 50: 0.03221210837364197 Loss at step 100: 0.03279988840222359 Loss at step 150: 0.03652225062251091 Loss at step 200: 0.0373283252120018 Loss at step 250: 0.03407222032546997 Loss at step 300: 0.042144209146499634 Loss at step 350: 0.03576641529798508 Loss at step 400: 0.028999431058764458 Loss at step 450: 0.05076942592859268 Loss at step 500: 0.0286117997020483 Loss at step 550: 0.03532993048429489 Loss at step 600: 0.02853424660861492 Loss at step 650: 0.049126870930194855 Loss at step 700: 0.04422701150178909 Loss at step 750: 0.03521481901407242 Loss at step 800: 0.03186199441552162 Loss at step 850: 0.04356918856501579 Loss at step 900: 0.03715118393301964 Mean training loss after epoch 247: 0.03880718420110722 EPOCH: 248 Loss at step 0: 0.028370682150125504 Loss at step 50: 0.026223866268992424 Loss at step 100: 0.03843769058585167 Loss at step 150: 0.03484419360756874 Loss at step 200: 0.07375449687242508 Loss at step 250: 0.04297425225377083 Loss at step 300: 0.039212215691804886 Loss at step 350: 0.04842158779501915 Loss at step 400: 0.030708642676472664 Loss at step 450: 0.03248598799109459 Loss at step 500: 0.04776611179113388 Loss at step 550: 0.049317553639411926 Loss at step 600: 0.04329526051878929 Loss at step 650: 0.035753797739744186 Loss at step 700: 0.03449887037277222 Loss at step 750: 0.033092305064201355 Loss at step 800: 0.027152828872203827 Loss at step 850: 0.03388499096035957 Loss at step 900: 0.04274323582649231 Mean training loss after epoch 248: 0.03935217853190739 EPOCH: 249 Loss at step 0: 0.034264273941516876 Loss at step 50: 0.039211515337228775 Loss at step 100: 0.03425781428813934 Loss at step 150: 0.043800510466098785 Loss at step 200: 0.036595020443201065 Loss at step 250: 0.03553929552435875 Loss at step 300: 0.030165458098053932 Loss at step 350: 0.035558976233005524 Loss at step 400: 0.03572900593280792 Loss at step 450: 0.049176447093486786 Loss at step 500: 0.030299728736281395 Loss at step 550: 0.02616003528237343 Loss at step 600: 0.0318768210709095 Loss at step 650: 0.029668428003787994 Loss at step 700: 0.03870300203561783 Loss at step 750: 0.03555506840348244 Loss at step 800: 0.02937936782836914 Loss at step 850: 0.03001921996474266 Loss at step 900: 0.05019860342144966 Mean training loss after epoch 249: 0.03870430404046324 EPOCH: 250 Loss at step 0: 0.0314297117292881 Loss at step 50: 0.03844265267252922 Loss at step 100: 0.036556657403707504 Loss at step 150: 0.03360800817608833 Loss at step 200: 0.04786539077758789 Loss at step 250: 0.02824132703244686 Loss at step 300: 0.03137164190411568 Loss at step 350: 0.02976536750793457 Loss at step 400: 0.03480188548564911 Loss at step 450: 0.03786173462867737 Loss at step 500: 0.04318583756685257 Loss at step 550: 0.031366243958473206 Loss at step 600: 0.04723876342177391 Loss at step 650: 0.0347898043692112 Loss at step 700: 0.045554373413324356 Loss at step 750: 0.04612750932574272 Loss at step 800: 0.025735393166542053 Loss at step 850: 0.03290900960564613 Loss at step 900: 0.0405740886926651 Mean training loss after epoch 250: 0.038979818625474914 EPOCH: 251 Loss at step 0: 0.044140949845314026 Loss at step 50: 0.03376059606671333 Loss at step 100: 0.04065614566206932 Loss at step 150: 0.03334932029247284 Loss at step 200: 0.03364711254835129 Loss at step 250: 0.03064453788101673 Loss at step 300: 0.03330123424530029 Loss at step 350: 0.03337063640356064 Loss at step 400: 0.035650379955768585 Loss at step 450: 0.048824962228536606 Loss at step 500: 0.03032439947128296 Loss at step 550: 0.036923617124557495 Loss at step 600: 0.03191971406340599 Loss at step 650: 0.030942482873797417 Loss at step 700: 0.03284404054284096 Loss at step 750: 0.035578109323978424 Loss at step 800: 0.04681437090039253 Loss at step 850: 0.03454016149044037 Loss at step 900: 0.045270197093486786 Mean training loss after epoch 251: 0.0384245455316675 EPOCH: 252 Loss at step 0: 0.03599514812231064 Loss at step 50: 0.051567282527685165 Loss at step 100: 0.06349392980337143 Loss at step 150: 0.03292436525225639 Loss at step 200: 0.07472285628318787 Loss at step 250: 0.04348371550440788 Loss at step 300: 0.04167739674448967 Loss at step 350: 0.04803667962551117 Loss at step 400: 0.05672234669327736 Loss at step 450: 0.03561023250222206 Loss at step 500: 0.032228026539087296 Loss at step 550: 0.037130650132894516 Loss at step 600: 0.03206946700811386 Loss at step 650: 0.039561402052640915 Loss at step 700: 0.03647386282682419 Loss at step 750: 0.03485389053821564 Loss at step 800: 0.033384568989276886 Loss at step 850: 0.029546715319156647 Loss at step 900: 0.029974499717354774 Mean training loss after epoch 252: 0.03880693005926129 EPOCH: 253 Loss at step 0: 0.03617330268025398 Loss at step 50: 0.03768526762723923 Loss at step 100: 0.04905659332871437 Loss at step 150: 0.03500690311193466 Loss at step 200: 0.043080758303403854 Loss at step 250: 0.04251527786254883 Loss at step 300: 0.05578237399458885 Loss at step 350: 0.046200916171073914 Loss at step 400: 0.033028751611709595 Loss at step 450: 0.06625927984714508 Loss at step 500: 0.03456371650099754 Loss at step 550: 0.0324789434671402 Loss at step 600: 0.035323016345500946 Loss at step 650: 0.03609853237867355 Loss at step 700: 0.033568497747182846 Loss at step 750: 0.026590745896100998 Loss at step 800: 0.03162870928645134 Loss at step 850: 0.04118257388472557 Loss at step 900: 0.04910838603973389 Mean training loss after epoch 253: 0.038444823243899515 EPOCH: 254 Loss at step 0: 0.045055218040943146 Loss at step 50: 0.03088836744427681 Loss at step 100: 0.03646702691912651 Loss at step 150: 0.047763608396053314 Loss at step 200: 0.02640809305012226 Loss at step 250: 0.032378047704696655 Loss at step 300: 0.06576403230428696 Loss at step 350: 0.04115092754364014 Loss at step 400: 0.0328705869615078 Loss at step 450: 0.03819030523300171 Loss at step 500: 0.05391668528318405 Loss at step 550: 0.04827481880784035 Loss at step 600: 0.053961608558893204 Loss at step 650: 0.050762780010700226 Loss at step 700: 0.037136971950531006 Loss at step 750: 0.03480202704668045 Loss at step 800: 0.044440142810344696 Loss at step 850: 0.052329014986753464 Loss at step 900: 0.04658179357647896 Mean training loss after epoch 254: 0.03798838559069486 EPOCH: 255 Loss at step 0: 0.03555431216955185 Loss at step 50: 0.03296544402837753 Loss at step 100: 0.03252720087766647 Loss at step 150: 0.03163224831223488 Loss at step 200: 0.046393923461437225 Loss at step 250: 0.03447677940130234 Loss at step 300: 0.033886607736349106 Loss at step 350: 0.051093898713588715 Loss at step 400: 0.06841552257537842 Loss at step 450: 0.04635843262076378 Loss at step 500: 0.03869534283876419 Loss at step 550: 0.03243042528629303 Loss at step 600: 0.035036344081163406 Loss at step 650: 0.03309624642133713 Loss at step 700: 0.02970036305487156 Loss at step 750: 0.03564947098493576 Loss at step 800: 0.037825874984264374 Loss at step 850: 0.03679411858320236 Loss at step 900: 0.053891636431217194 Mean training loss after epoch 255: 0.03837270241802626 EPOCH: 256 Loss at step 0: 0.0326099619269371 Loss at step 50: 0.06459227204322815 Loss at step 100: 0.04552460461854935 Loss at step 150: 0.028130022808909416 Loss at step 200: 0.05046702176332474 Loss at step 250: 0.04568394646048546 Loss at step 300: 0.04373426362872124 Loss at step 350: 0.03725827857851982 Loss at step 400: 0.037501025944948196 Loss at step 450: 0.04811300337314606 Loss at step 500: 0.035490117967128754 Loss at step 550: 0.06001218780875206 Loss at step 600: 0.030590670183300972 Loss at step 650: 0.028698492795228958 Loss at step 700: 0.032914407551288605 Loss at step 750: 0.034597624093294144 Loss at step 800: 0.03044790029525757 Loss at step 850: 0.03345989063382149 Loss at step 900: 0.040189728140830994 Mean training loss after epoch 256: 0.03845109197813501