TPT_10Mf_0 / training.log
xiulinyang
Upload folder using huggingface_hub
9f69db1 verified
01/08/2026 03:36:38 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: no
01/08/2026 03:36:38 - INFO - __main__ - Arguments:
01/08/2026 03:36:38 - INFO - __main__ - Namespace(train_file='data/preprocessed/dependency/train.sequential=False.random=False.convert_method=exponential.jsonl', validation_file='data/preprocessed/dependency/val.sequential=False.random=False.convert_method=exponential.jsonl', model_name_or_path=None, per_device_train_batch_size=32, per_device_eval_batch_size=32, learning_rate=0.0001, weight_decay=0.0, num_train_epochs=10, max_train_steps=None, gradient_accumulation_steps=1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, num_warmup_steps=0, output_dir='experiments/128/', seed=128, block_size=512, preprocessing_num_workers=None, overwrite_cache=False, trust_remote_code=False, checkpointing_steps='epoch', resume_from_checkpoint=None, with_tracking=True, report_to='wandb', low_cpu_mem_usage=False, n_positions=1024, n_embd=512, n_layer=4, n_head=8, n_inner=None, activation_function='gelu_new', resid_pdrop=0.1, embd_pdrop=0.1, attn_pdrop=0.1, layer_norm_epsilon=1e-05, initializer_range=0.02, attn_loss_weight=0.5, attn_loss_layers=[3], attn_loss_heads=[0], attn_loss_reduction='none')
01/08/2026 03:36:38 - INFO - __main__ - Training new model from scratch
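The Namespace above implies a small GPT-2-style model (n_embd=512, n_layer=4, n_head=8, n_positions=1024). A rough parameter count can be sketched from those values; the vocabulary size is an assumption read off the token ids in the logged samples (ids go up to 32769, suggesting a vocabulary of at least 32770), and the per-layer formula assumes the standard GPT-2 block layout with tied input/output embeddings:

```python
# Rough GPT-2 style parameter count from the logged config.
# Assumption: vocab_size = 32770 (token ids in the samples go up to 32769).
n_embd, n_layer, n_positions, vocab_size = 512, 4, 1024, 32770

# Per transformer block: qkv projection (d x 3d + 3d bias), attention output
# projection (d x d + d), MLP (d x 4d + 4d, then 4d x d + d), and two
# LayerNorms (2 * 2d each).
per_layer = (
    (n_embd * 3 * n_embd + 3 * n_embd)
    + (n_embd * n_embd + n_embd)
    + (n_embd * 4 * n_embd + 4 * n_embd)
    + (4 * n_embd * n_embd + n_embd)
    + 2 * 2 * n_embd
)

embeddings = vocab_size * n_embd + n_positions * n_embd  # wte + wpe
final_ln = 2 * n_embd

total = n_layer * per_layer + embeddings + final_ln
print(f"~{total / 1e6:.1f}M parameters")  # prints "~29.9M parameters"
```

Under those assumptions the model lands near 30M parameters, most of it in the token embedding matrix.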
01/08/2026 03:36:39 - INFO - __main__ - Sample 496287 of the training set: {'token_ids': [32768, 6067, 272, 515, 485, 318, 1388, 291, 32769], 'attn_matrix': [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0.0, 0.0, 0.0, 0.0, 0.0], [0.17487770452710943, 0.17487770452710943, 0.17487770452710943, 0.4753668864186718, 0.0, 0.0, 0.0, 0.0], [0.14884758120207758, 0.14884758120207758, 0.14884758120207758, 0.4046096751916898, 0.14884758120207758, 0.0, 0.0, 0.0], [0.07088509576696811, 0.07088509576696811, 0.07088509576696811, 0.1926856677319286, 0.07088509576696811, 0.523773949200199, 0.0, 0.0], [0.12366807520346387, 0.12366807520346387, 0.12366807520346387, 0.3361646815860826, 0.12366807520346387, 0.12366807520346387, 0.045494942396598195, 0.0], [0.1100574786562937, 0.1100574786562937, 0.1100574786562937, 0.2991672443174225, 0.1100574786562937, 0.1100574786562937, 0.04048788374481527, 0.1100574786562937]], 'word_token_membership_mask': [[True, False, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False, False], [False, False, True, False, False, False, False, False, False], [False, False, False, True, False, False, False, False, False], [False, False, False, False, True, False, False, False, False], [False, False, False, False, False, True, False, False, False], [False, False, False, False, False, False, True, False, False], [False, False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, False, True]], 'input_ids': [32768, 6067, 272, 515, 485, 318, 1388, 291, 32769], 'row_word_token_membership_mask': [[True, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False], [False, False, True, False, False, False, False, False], [False, False, False, True, False, False, False, False], [False, False, False, False, True, False, False, 
False], [False, False, False, False, False, True, False, False], [False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, True]], 'col_word_token_membership_mask': [[True, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False], [False, False, True, False, False, False, False, False], [False, False, False, True, False, False, False, False], [False, False, False, False, True, False, False, False], [False, False, False, False, False, True, False, False], [False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, True]]}.
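The attn_matrix rows in the sample above are consistent with a causal, row-wise softmax over small integer scores: for example row 4, [0.1749, 0.1749, 0.1749, 0.4754], matches softmax([0, 0, 0, 1]), since e/(3+e) ≈ 0.4754 and 1/(3+e) ≈ 0.1749. A minimal sketch of that conversion, assuming this is what convert_method=exponential denotes (the integer scores themselves would come from the dependency preprocessing):

```python
import math

def exponential_rows(scores):
    """Softmax each causal prefix of integer scores into an attention row.

    scores[i][: i + 1] are the (assumed) dependency-derived scores for row i;
    positions past i are masked out by never entering the softmax.
    """
    rows = []
    for i, row_scores in enumerate(scores):
        prefix = row_scores[: i + 1]
        exps = [math.exp(s) for s in prefix]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows

# Integer scores reproducing the first four rows of the sample above:
rows = exponential_rows([[0], [0, 0], [0, 0, 0], [0, 0, 0, 1]])
print(rows[3])  # last entry ≈ 0.4754, the others ≈ 0.1749
```

Each row sums to 1, so the matrix can be used directly as a target distribution for the attention-loss terms configured above (attn_loss_weight=0.5 on layer 3, head 0).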
01/08/2026 03:36:39 - INFO - __main__ - Sample 850736 of the training set: {'token_ids': [32768, 43, 1049, 314, 6556, 318, 267, 3950, 269, 32769], 'attn_matrix': [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0.0, 0.0, 0.0, 0.0, 0.0], [0.17487770452710943, 0.17487770452710943, 0.17487770452710943, 0.4753668864186718, 0.0, 0.0, 0.0, 0.0], [0.07629314247787525, 0.07629314247787525, 0.07629314247787525, 0.20738626283364517, 0.5637343097327291, 0.0, 0.0, 0.0], [0.04878906985862268, 0.04878906985862268, 0.04878906985862268, 0.13262244202411294, 0.36050517420000955, 0.36050517420000955, 0.0, 0.0], [0.13847545210043435, 0.13847545210043435, 0.13847545210043435, 0.37641530513226173, 0.13847545210043435, 0.018740614531330278, 0.050942271934670616, 0.0], [0.12163235653869706, 0.12163235653869706, 0.12163235653869706, 0.33063102453179205, 0.12163235653869706, 0.016461149422901226, 0.0447460433518215, 0.12163235653869706]], 'word_token_membership_mask': [[True, False, False, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False, False, False], [False, False, True, True, False, False, False, False, False, False], [False, False, False, False, True, False, False, False, False, False], [False, False, False, False, False, True, False, False, False, False], [False, False, False, False, False, False, True, False, False, False], [False, False, False, False, False, False, False, True, False, False], [False, False, False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, False, False, True]], 'input_ids': [32768, 43, 1049, 314, 6556, 318, 267, 3950, 269, 32769], 'row_word_token_membership_mask': [[True, False, False, False, False, False, False, False, False], [False, True, True, False, False, False, False, False, False], [False, False, False, True, False, False, False, False, False], [False, 
False, False, False, True, False, False, False, False], [False, False, False, False, False, True, False, False, False], [False, False, False, False, False, False, True, False, False], [False, False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, False, True]], 'col_word_token_membership_mask': [[True, False, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False, False], [False, False, True, True, False, False, False, False, False], [False, False, False, False, True, False, False, False, False], [False, False, False, False, False, True, False, False, False], [False, False, False, False, False, False, True, False, False], [False, False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, False, True]]}.
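In the second sample, word 2 spans two tokens (row [False, False, True, True, …] in word_token_membership_mask), so the masks align word-level dependency structure with subword tokens. A small sketch of building such a boolean mask from word-to-token spans; the span list here is illustrative, not read from the actual tokenizer:

```python
def membership_mask(n_tokens, word_spans):
    """One boolean row per word; True where the token belongs to that word.

    word_spans is a list of (start, end) token index ranges, end exclusive,
    as a subword tokenizer's word alignment would provide them.
    """
    mask = []
    for start, end in word_spans:
        row = [start <= t < end for t in range(n_tokens)]
        mask.append(row)
    return mask

# Words 0 and 1 are single tokens, word 2 covers tokens 2-3, mirroring the
# first rows of the sample above:
mask = membership_mask(10, [(0, 1), (1, 2), (2, 4)])
print(mask[2])  # [False, False, True, True, False, False, False, False, False, False]
```

Pooling token-level attention through such masks (rows for queries, columns for keys) is what lets a word-level target matrix supervise subword-level attention.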
01/08/2026 03:36:39 - INFO - __main__ - Sample 1072717 of the training set: {'token_ids': [32768, 1162, 1594, 313, 27833, 3442, 397, 1207, 1702, 485, 2260, 566, 872, 298, 2168, 311, 267, 1474, 365, 1707, 1159, 365, 408, 267, 29500, 764, 9869, 365, 298, 861, 405, 560, 10219, 269, 32769], 'attn_matrix': [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.26894142136999505, 0.731058578630005, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.09003057317038045, 0.2447284710547976, 0.665240955774822, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.03205860328008498, 0.08714431874203254, 0.23688281808991007, 0.6439142598879724, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.06745080586634483, 0.18335029990140392, 0.49839778846450256, 0.18335029990140392, 0.06745080586634483, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.04501528658519023, 0.12236423552739882, 0.33262047788741095, 0.12236423552739882, 0.04501528658519023, 0.33262047788741095, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.03377952487759491, 0.09182226864874651, 0.24959880431577242, 0.09182226864874651, 0.03377952487759491, 0.24959880431577242, 0.24959880431577242, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.02012506971027169, 0.054705611289903075, 0.14870526908408752, 0.054705611289903075, 
0.02012506971027169, 0.14870526908408752, 0.14870526908408752, 0.4042228307473879, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.009588884358741272, 0.02606529010756156, 0.0708528044528979, 0.02606529010756156, 0.009588884358741272, 0.0708528044528979, 0.0708528044528979, 0.1925978908396745, 0.5235353468690261, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.27065309628634215, 0.0995677098131401, 0.27065309628634215, 0.0995677098131401, 0.03662891344477831, 0.03662891344477831, 0.03662891344477831, 0.0995677098131401, 0.03662891344477831, 0.013475024208782176, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.15593208485436483, 0.057364208236921656, 0.15593208485436483, 0.057364208236921656, 0.021103112869440988, 0.021103112869440988, 0.021103112869440988, 0.057364208236921656, 0.021103112869440988, 0.007763401369387824, 0.4238673527333538, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.10951306984813354, 0.0402876069367005, 0.10951306984813354, 0.0402876069367005, 0.014820982326008102, 0.014820982326008102, 0.014820982326008102, 0.0402876069367005, 0.014820982326008102, 0.005452334695703684, 0.2976873877469477, 0.2976873877469477, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.06053127227234526, 0.022268210616946796, 0.06053127227234526, 0.022268210616946796, 0.008192016877650367, 0.008192016877650367, 0.008192016877650367, 0.022268210616946796, 0.008192016877650367, 0.003013674591017041, 0.16454105747142297, 0.16454105747142297, 0.44726896656000464, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.12271734636543012, 0.04514518880295676, 0.12271734636543012, 
0.04514518880295676, 0.016607986828410987, 0.016607986828410987, 0.016607986828410987, 0.04514518880295676, 0.016607986828410987, 0.006109736913418509, 0.33358033266186343, 0.04514518880295676, 0.12271734636543012, 0.04514518880295676, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.06435891335043971, 0.02367632107776104, 0.06435891335043971, 0.02367632107776104, 0.008710031767082373, 0.008710031767082373, 0.008710031767082373, 0.02367632107776104, 0.008710031767082373, 0.003204241619059774, 0.17494566465987052, 0.02367632107776104, 0.06435891335043971, 0.02367632107776104, 0.47555162121261596, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.02807143598771085, 0.010326904184038983, 0.02807143598771085, 0.010326904184038983, 0.0037990557402552906, 0.0037990557402552906, 0.0037990557402552906, 0.010326904184038983, 0.0037990557402552906, 0.0013975945027042763, 0.07630607434414569, 0.010326904184038983, 0.02807143598771085, 0.010326904184038983, 0.20742141529073618, 0.5638298640180653, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.01795044117880241, 0.006603598269638677, 0.01795044117880241, 0.006603598269638677, 0.00242932804115538, 0.00242932804115538, 0.00242932804115538, 0.006603598269638677, 0.00242932804115538, 0.0008936998422023558, 0.048794358069161554, 0.006603598269638677, 0.01795044117880241, 0.006603598269638677, 0.13263681687072582, 0.3605442490843441, 0.3605442490843441, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.013193573961951758, 0.004853644616176905, 0.013193573961951758, 0.004853644616176905, 0.0017855560690439394, 0.0017855560690439394, 0.0017855560690439394, 0.004853644616176905, 0.0017855560690439394, 0.0006568693688601616, 0.03586385235320387, 0.004853644616176905, 0.013193573961951758, 0.004853644616176905, 0.09748805815025224, 0.2650000169615894, 0.2650000169615894, 
0.2650000169615894, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.010429702596875434, 0.0038368731629228755, 0.010429702596875434, 0.0038368731629228755, 0.0014115067550217718, 0.0014115067550217718, 0.0014115067550217718, 0.0038368731629228755, 0.0014115067550217718, 0.0005192643162471252, 0.028350871045318606, 0.0038368731629228755, 0.010429702596875434, 0.0038368731629228755, 0.07706565758347526, 0.20948617660740781, 0.20948617660740781, 0.20948617660740781, 0.20948617660740781, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0086232507643292, 0.003172316672262639, 0.0086232507643292, 0.003172316672262639, 0.001167030084610829, 0.001167030084610829, 0.001167030084610829, 0.003172316672262639, 0.001167030084610829, 0.0004293263753568929, 0.023440425854921634, 0.003172316672262639, 0.0086232507643292, 0.003172316672262639, 0.06371768365277505, 0.17320262162484046, 0.17320262162484046, 0.17320262162484046, 0.17320262162484046, 0.17320262162484046, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.04824470580633239, 0.0177482354115142, 0.04824470580633239, 0.0177482354115142, 0.006529210924967047, 0.006529210924967047, 0.006529210924967047, 0.0177482354115142, 0.006529210924967047, 0.0024019624663673533, 0.13114270711270593, 0.0177482354115142, 0.04824470580633239, 0.0177482354115142, 0.35648283767939537, 0.13114270711270593, 0.0177482354115142, 0.0177482354115142, 0.0177482354115142, 0.0177482354115142, 0.04824470580633239, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.042651298994341065, 0.01569053603927429, 0.042651298994341065, 0.01569053603927429, 0.005772225629808603, 0.005772225629808603, 0.005772225629808603, 0.01569053603927429, 0.005772225629808603, 0.002123483139009465, 0.11593825101649086, 0.01569053603927429, 0.042651298994341065, 0.01569053603927429, 0.31515284096145063, 0.11593825101649086, 0.01569053603927429, 0.01569053603927429, 0.01569053603927429, 
0.01569053603927429, 0.042651298994341065, 0.11593825101649086, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.022971880611964247, 0.008450882602186497, 0.022971880611964247, 0.008450882602186497, 0.003108905969097833, 0.003108905969097833, 0.003108905969097833, 0.008450882602186497, 0.003108905969097833, 0.001143702590566272, 0.06244404563303305, 0.008450882602186497, 0.022971880611964247, 0.008450882602186497, 0.16974051453974115, 0.06244404563303305, 0.008450882602186497, 0.008450882602186497, 0.008450882602186497, 0.008450882602186497, 0.022971880611964247, 0.06244404563303305, 0.4614025562266667, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.015719064205880078, 0.005782720555797185, 0.015719064205880078, 0.005782720555797185, 0.002127344006517281, 0.002127344006517281, 0.002127344006517281, 0.005782720555797185, 0.002127344006517281, 0.0007826061242969943, 0.04272884659122482, 0.005782720555797185, 0.015719064205880078, 0.005782720555797185, 0.11614904723994066, 0.04272884659122482, 0.005782720555797185, 0.005782720555797185, 0.005782720555797185, 0.005782720555797185, 0.015719064205880078, 0.04272884659122482, 0.3157258445051619, 0.3157258445051619, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.011947066534815951, 0.004395080160466132, 0.011947066534815951, 0.004395080160466132, 0.0016168596333359737, 0.0016168596333359737, 0.0016168596333359737, 0.004395080160466132, 0.0016168596333359737, 0.0005948094183643012, 0.03247549386498137, 0.004395080160466132, 0.011947066534815951, 0.004395080160466132, 0.08827754484341206, 0.03247549386498137, 0.004395080160466132, 0.004395080160466132, 0.004395080160466132, 0.004395080160466132, 0.011947066534815951, 0.03247549386498137, 0.23996324600882557, 0.23996324600882557, 0.23996324600882557, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.09240574017894794, 0.03399417205806486, 0.09240574017894794, 0.03399417205806486, 0.012505757019806761, 0.012505757019806761, 0.012505757019806761, 
0.03399417205806486, 0.012505757019806761, 0.004600610903872354, 0.25118484437374217, 0.03399417205806486, 0.09240574017894794, 0.03399417205806486, 0.09240574017894794, 0.03399417205806486, 0.004600610903872354, 0.004600610903872354, 0.004600610903872354, 0.004600610903872354, 0.012505757019806761, 0.03399417205806486, 0.03399417205806486, 0.004600610903872354, 0.004600610903872354, 0.012505757019806761, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.08458921148090166, 0.031118631848727055, 0.08458921148090166, 0.031118631848727055, 0.011447904894529556, 0.011447904894529556, 0.011447904894529556, 0.031118631848727055, 0.011447904894529556, 0.0042114488551833525, 0.2299373164522143, 0.031118631848727055, 0.08458921148090166, 0.031118631848727055, 0.08458921148090166, 0.031118631848727055, 0.0042114488551833525, 0.0042114488551833525, 0.0042114488551833525, 0.0042114488551833525, 0.011447904894529556, 0.031118631848727055, 0.031118631848727055, 0.0042114488551833525, 0.0042114488551833525, 0.011447904894529556, 0.08458921148090166, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.07799193518198774, 0.028691629530629002, 0.07799193518198774, 0.028691629530629002, 0.010555060638025849, 0.010555060638025849, 0.010555060638025849, 0.028691629530629002, 0.010555060638025849, 0.003882989809047637, 0.21200406017155304, 0.028691629530629002, 0.07799193518198774, 0.028691629530629002, 0.07799193518198774, 0.028691629530629002, 0.003882989809047637, 0.003882989809047637, 0.003882989809047637, 0.003882989809047637, 0.010555060638025849, 0.028691629530629002, 0.028691629530629002, 0.003882989809047637, 0.003882989809047637, 0.010555060638025849, 0.07799193518198774, 0.07799193518198774, 0.0, 0.0, 0.0, 0.0, 0.0], [0.04947826496904969, 0.018202036466946552, 0.04947826496904969, 0.018202036466946552, 0.006696155003642512, 0.006696155003642512, 0.006696155003642512, 0.018202036466946552, 0.006696155003642512, 0.0024633777607373647, 0.13449586856904952, 0.018202036466946552, 0.04947826496904969, 
0.018202036466946552, 0.04947826496904969, 0.018202036466946552, 0.0024633777607373647, 0.0024633777607373647, 0.0024633777607373647, 0.0024633777607373647, 0.006696155003642512, 0.018202036466946552, 0.018202036466946552, 0.0024633777607373647, 0.0024633777607373647, 0.006696155003642512, 0.04947826496904969, 0.04947826496904969, 0.3655976755340633, 0.0, 0.0, 0.0, 0.0], [0.03623194873241091, 0.013328989052231675, 0.03623194873241091, 0.013328989052231675, 0.004903461043915262, 0.004903461043915262, 0.004903461043915262, 0.013328989052231675, 0.004903461043915262, 0.0018038825086414835, 0.09848864784897231, 0.013328989052231675, 0.03623194873241091, 0.013328989052231675, 0.03623194873241091, 0.013328989052231675, 0.0018038825086414835, 0.0018038825086414835, 0.0018038825086414835, 0.0018038825086414835, 0.004903461043915262, 0.013328989052231675, 0.013328989052231675, 0.0018038825086414835, 0.0018038825086414835, 0.004903461043915262, 0.03623194873241091, 0.03623194873241091, 0.2677199017573635, 0.2677199017573635, 0.0, 0.0, 0.0], [0.028580405405156724, 0.010514143568902325, 0.028580405405156724, 0.010514143568902325, 0.003867937260524102, 0.003867937260524102, 0.003867937260524102, 0.010514143568902325, 0.003867937260524102, 0.0014229345978878062, 0.0776895966628302, 0.010514143568902325, 0.028580405405156724, 0.010514143568902325, 0.028580405405156724, 0.010514143568902325, 0.0014229345978878062, 0.0014229345978878062, 0.0014229345978878062, 0.0014229345978878062, 0.003867937260524102, 0.010514143568902325, 0.010514143568902325, 0.0014229345978878062, 0.0014229345978878062, 0.003867937260524102, 0.028580405405156724, 0.028580405405156724, 0.21118221886888386, 0.21118221886888386, 0.21118221886888386, 0.0, 0.0], [0.06915443502737553, 0.02544049491239773, 0.06915443502737553, 0.02544049491239773, 0.009359035051497799, 0.009359035051497799, 0.009359035051497799, 0.02544049491239773, 0.009359035051497799, 0.0034429965846489515, 0.18798124409226669, 
0.02544049491239773, 0.06915443502737553, 0.02544049491239773, 0.06915443502737553, 0.02544049491239773, 0.0034429965846489515, 0.0034429965846489515, 0.0034429965846489515, 0.0034429965846489515, 0.009359035051497799, 0.02544049491239773, 0.02544049491239773, 0.0034429965846489515, 0.0034429965846489515, 0.009359035051497799, 0.06915443502737553, 0.06915443502737553, 0.06915443502737553, 0.009359035051497799, 0.009359035051497799, 0.02544049491239773, 0.0], [0.06468142745496337, 0.02379496738630311, 0.06468142745496337, 0.02379496738630311, 0.008753679304765885, 0.008753679304765885, 0.008753679304765885, 0.02379496738630311, 0.008753679304765885, 0.0032202986508312937, 0.17582234888961898, 0.02379496738630311, 0.06468142745496337, 0.02379496738630311, 0.06468142745496337, 0.02379496738630311, 0.0032202986508312937, 0.0032202986508312937, 0.0032202986508312937, 0.0032202986508312937, 0.008753679304765885, 0.02379496738630311, 0.02379496738630311, 0.0032202986508312937, 0.0032202986508312937, 0.008753679304765885, 0.06468142745496337, 0.06468142745496337, 0.06468142745496337, 0.008753679304765885, 0.008753679304765885, 0.02379496738630311, 0.06468142745496337]], 'word_token_membership_mask': [[True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, True, False, False, False, 
False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, 
True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 
False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 
False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]], 'input_ids': [32768, 1162, 1594, 313, 27833, 3442, 397, 1207, 1702, 485, 2260, 566, 872, 298, 2168, 311, 267, 1474, 365, 1707, 1159, 365, 408, 267, 29500, 764, 9869, 365, 298, 861, 405, 560, 10219, 269, 32769], 'row_word_token_membership_mask': [[True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 
False, False, False, False, False, False, False, False, False, False, False], [False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, 
False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, 
False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 
False, True, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]], 'col_word_token_membership_mask': [[True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, 
False, False], [False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, True, 
False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, 
False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False], 
[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False], [False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]]}.
01/08/2026 03:36:39 - INFO - __main__ - ***** Running training *****
01/08/2026 03:36:39 - INFO - __main__ - Num examples = 1132837
01/08/2026 03:36:39 - INFO - __main__ - Num Epochs = 10
01/08/2026 03:36:39 - INFO - __main__ - Instantaneous batch size per device = 32
01/08/2026 03:36:39 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 32
01/08/2026 03:36:39 - INFO - __main__ - Gradient Accumulation steps = 1
01/08/2026 03:36:39 - INFO - __main__ - Total optimization steps = 354020
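(Sanity check, not part of the original log: the logged step count is consistent with the run configuration above, assuming steps per epoch = ceil(num_examples / effective_batch_size), i.e. the last partial batch is kept. A minimal sketch:)

```python
import math

# Values taken from the log above.
num_examples = 1_132_837          # "Num examples"
effective_batch_size = 32         # per-device batch 32 * 1 process * 1 grad-accum step
num_epochs = 10                   # "Num Epochs"

# Assumes the dataloader keeps the final partial batch (drop_last=False).
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 35402 354020, matching "Total optimization steps"
```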
01/08/2026 03:36:39 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_0_init
01/08/2026 03:36:39 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 03:36:39 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_0_init/model.safetensors
01/08/2026 03:36:39 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_0_init/optimizer.bin
01/08/2026 03:36:39 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_0_init/scheduler.bin
01/08/2026 03:36:39 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_0_init/sampler.bin
01/08/2026 03:36:39 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_0_init/sampler_1.bin
01/08/2026 03:36:39 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_0_init/random_states_0.pkl
01/08/2026 03:53:49 - INFO - __main__ - epoch 0: perplexity: 61.45426781123526 eval_loss: 4.183549404144287 eval_nwp_loss: 4.118293285369873 eval_attn_loss: 0.13051171600818634
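(Sanity check, not part of the original log: the epoch-0 metrics are mutually consistent, assuming eval_loss = eval_nwp_loss + attn_loss_weight * eval_attn_loss with attn_loss_weight=0.5 from the logged Namespace, and perplexity = exp(eval_nwp_loss). A minimal sketch:)

```python
import math

# Epoch 0 values copied from the log line above.
eval_nwp_loss = 4.118293285369873
eval_attn_loss = 0.13051171600818634
attn_loss_weight = 0.5  # from the logged Namespace

# Assumed combination: next-word-prediction loss plus weighted attention loss.
eval_loss = eval_nwp_loss + attn_loss_weight * eval_attn_loss
perplexity = math.exp(eval_nwp_loss)  # perplexity over the NWP loss only

print(round(eval_loss, 5), round(perplexity, 3))  # ~4.18355 ~61.454
```

The same identity holds for every later epoch line, which confirms that the reported perplexity tracks only the language-modeling term, not the attention-supervision term.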
01/08/2026 03:53:49 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_1
01/08/2026 03:53:49 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 03:53:49 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_1/model.safetensors
01/08/2026 03:53:49 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_1/optimizer.bin
01/08/2026 03:53:49 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_1/scheduler.bin
01/08/2026 03:53:49 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_1/sampler.bin
01/08/2026 03:53:49 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_1/sampler_1.bin
01/08/2026 03:53:49 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_1/random_states_0.pkl
01/08/2026 04:12:15 - INFO - __main__ - epoch 1: perplexity: 54.16680628732148 eval_loss: 4.053577899932861 eval_nwp_loss: 3.992068290710449 eval_attn_loss: 0.12301884591579437
01/08/2026 04:12:15 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_2
01/08/2026 04:12:15 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 04:12:15 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_2/model.safetensors
01/08/2026 04:12:15 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_2/optimizer.bin
01/08/2026 04:12:15 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_2/scheduler.bin
01/08/2026 04:12:15 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_2/sampler.bin
01/08/2026 04:12:15 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_2/sampler_1.bin
01/08/2026 04:12:15 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_2/random_states_0.pkl
01/08/2026 04:30:23 - INFO - __main__ - epoch 2: perplexity: 51.11190373140643 eval_loss: 3.9937267303466797 eval_nwp_loss: 3.9340174198150635 eval_attn_loss: 0.11941886693239212
01/08/2026 04:30:23 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_3
01/08/2026 04:30:23 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 04:30:23 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_3/model.safetensors
01/08/2026 04:30:23 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_3/optimizer.bin
01/08/2026 04:30:23 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_3/scheduler.bin
01/08/2026 04:30:23 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_3/sampler.bin
01/08/2026 04:30:23 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_3/sampler_1.bin
01/08/2026 04:30:23 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_3/random_states_0.pkl
01/08/2026 04:49:04 - INFO - __main__ - epoch 3: perplexity: 49.0719534463731 eval_loss: 3.9519906044006348 eval_nwp_loss: 3.8932876586914062 eval_attn_loss: 0.11740673333406448
01/08/2026 04:49:04 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_4
01/08/2026 04:49:04 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 04:49:04 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_4/model.safetensors
01/08/2026 04:49:04 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_4/optimizer.bin
01/08/2026 04:49:04 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_4/scheduler.bin
01/08/2026 04:49:04 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_4/sampler.bin
01/08/2026 04:49:04 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_4/sampler_1.bin
01/08/2026 04:49:04 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_4/random_states_0.pkl
01/08/2026 05:07:52 - INFO - __main__ - epoch 4: perplexity: 48.164000025201815 eval_loss: 3.932680130004883 eval_nwp_loss: 3.8746118545532227 eval_attn_loss: 0.11613597720861435
01/08/2026 05:07:52 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_5
01/08/2026 05:07:52 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 05:07:52 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_5/model.safetensors
01/08/2026 05:07:52 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_5/optimizer.bin
01/08/2026 05:07:52 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_5/scheduler.bin
01/08/2026 05:07:52 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_5/sampler.bin
01/08/2026 05:07:52 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_5/sampler_1.bin
01/08/2026 05:07:52 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_5/random_states_0.pkl
01/08/2026 05:26:02 - INFO - __main__ - epoch 5: perplexity: 47.37661658975665 eval_loss: 3.9156994819641113 eval_nwp_loss: 3.858128786087036 eval_attn_loss: 0.11514072865247726
01/08/2026 05:26:02 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_6
01/08/2026 05:26:02 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 05:26:02 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_6/model.safetensors
01/08/2026 05:26:02 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_6/optimizer.bin
01/08/2026 05:26:02 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_6/scheduler.bin
01/08/2026 05:26:02 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_6/sampler.bin
01/08/2026 05:26:02 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_6/sampler_1.bin
01/08/2026 05:26:02 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_6/random_states_0.pkl
01/08/2026 05:44:03 - INFO - __main__ - epoch 6: perplexity: 47.01068597593352 eval_loss: 3.9075894355773926 eval_nwp_loss: 3.850374937057495 eval_attn_loss: 0.11442875117063522
01/08/2026 05:44:03 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_7
01/08/2026 05:44:03 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 05:44:03 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_7/model.safetensors
01/08/2026 05:44:03 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_7/optimizer.bin
01/08/2026 05:44:03 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_7/scheduler.bin
01/08/2026 05:44:03 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_7/sampler.bin
01/08/2026 05:44:03 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_7/sampler_1.bin
01/08/2026 05:44:03 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_7/random_states_0.pkl
01/08/2026 06:02:27 - INFO - __main__ - epoch 7: perplexity: 46.709559367074704 eval_loss: 3.900867223739624 eval_nwp_loss: 3.8439488410949707 eval_attn_loss: 0.11383768171072006
01/08/2026 06:02:27 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_8
01/08/2026 06:02:27 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 06:02:27 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_8/model.safetensors
01/08/2026 06:02:28 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_8/optimizer.bin
01/08/2026 06:02:28 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_8/scheduler.bin
01/08/2026 06:02:28 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_8/sampler.bin
01/08/2026 06:02:28 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_8/sampler_1.bin
01/08/2026 06:02:28 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_8/random_states_0.pkl
01/08/2026 06:17:03 - INFO - __main__ - epoch 8: perplexity: 46.623820848165956 eval_loss: 3.898854970932007 eval_nwp_loss: 3.842111587524414 eval_attn_loss: 0.11348710209131241
01/08/2026 06:17:03 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_9
01/08/2026 06:17:03 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 06:17:03 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_9/model.safetensors
01/08/2026 06:17:03 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_9/optimizer.bin
01/08/2026 06:17:03 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_9/scheduler.bin
01/08/2026 06:17:03 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_9/sampler.bin
01/08/2026 06:17:03 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_9/sampler_1.bin
01/08/2026 06:17:03 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_9/random_states_0.pkl
01/08/2026 06:33:33 - INFO - __main__ - epoch 9: perplexity: 46.621519896025845 eval_loss: 3.898709297180176 eval_nwp_loss: 3.84206223487854 eval_attn_loss: 0.11329396814107895
01/08/2026 06:33:33 - INFO - accelerate.accelerator - Saving current state to experiments/128/epoch_10
01/08/2026 06:33:33 - WARNING - accelerate.utils.other - Removed shared tensor {'lm_head.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
01/08/2026 06:33:33 - INFO - accelerate.checkpointing - Model weights saved in experiments/128/epoch_10/model.safetensors
01/08/2026 06:33:33 - INFO - accelerate.checkpointing - Optimizer state saved in experiments/128/epoch_10/optimizer.bin
01/08/2026 06:33:33 - INFO - accelerate.checkpointing - Scheduler state saved in experiments/128/epoch_10/scheduler.bin
01/08/2026 06:33:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in experiments/128/epoch_10/sampler.bin
01/08/2026 06:33:33 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in experiments/128/epoch_10/sampler_1.bin
01/08/2026 06:33:33 - INFO - accelerate.checkpointing - Random states saved in experiments/128/epoch_10/random_states_0.pkl